End-users must take joint responsibility for suffering downtime during the recent Amazon Web Services (AWS) outage, with businesses deploying amateur migrations based on little or no expert cloud knowledge.
Highlighting the need for more cloud consultancy partners in New Zealand, the customer chaos across the world serves as a harsh reminder of the immaturity of organisations creating corporate cloud agendas.
“This could have been quite easily mitigated,” Soltius Cloud Architect, Peter Joseph, told Reseller News. “There’s an end-user expectation with cloud that it is available 24 hours a day, seven days a week, 365 days a year.
“But they don’t fully understand what they are dealing with because they have the belief that cloud guarantees a business zero downtime. That is not true at all. Even IBM mainframes don’t guarantee zero downtime and the only way to ensure it is to deploy two IBM mainframes in two separate places.
“That’s common knowledge and S3 US East is a single instance of Amazon’s cloud storage service. It isn’t logical to expect to achieve zero downtime when you place all of your eggs in one basket.”
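The arithmetic behind the "two mainframes in two separate places" point can be sketched briefly. The example below is a minimal illustration, not anything AWS or Soltius published: it assumes each deployment fails independently of the others, and the 99.9 per cent figure is a made-up input chosen only to show the effect. The function name `combined_availability` is hypothetical.

```python
def combined_availability(single: float, copies: int) -> float:
    """Availability of `copies` independent deployments.

    `single` is the fraction of time one deployment is up (0.0 to 1.0).
    The whole system is down only when every copy is down at once,
    which assumes failures are independent (an idealisation).
    """
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return 1.0 - (1.0 - single) ** copies


if __name__ == "__main__":
    # Illustrative input only: one deployment that is up 99.9% of the time.
    for n in (1, 2, 3):
        print(n, f"{combined_availability(0.999, n):.9f}")
```

Under those assumptions a second, separately located copy moves the figure from three nines to roughly six, which is why a single region on its own cannot be expected to deliver zero downtime.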
As reported by Reseller News, AWS’ Simple Storage Service (S3) in the provider's Northern Virginia region experienced an 11-hour system failure, caused by a typo.
Other Amazon services in the US-EAST-1 region that rely on S3, such as Elastic Block Store, Lambda, and new instance launches for the Elastic Compute Cloud infrastructure-as-a-service offering, were all impacted by the outage.
AWS apologised for the incident in a postmortem, with the outage affecting the likes of Netflix, Reddit, Adobe, and Imgur.
“This points to a greater requirement for cloud consultants,” Joseph added. “Cloud is more complex than end-users realise and the technology as a whole can be its own worst enemy at times.
“Because cloud is very easy to set up through X and Y with a credit card and a cloud provider, there’s a misconception in the market. Businesses embark on cloud migrations without specific knowledge of what is required; the experience is lacking.
“It’s trial and error for the end-user as a result because they are not aware of any lessons to be learned or any common pitfalls to avoid.”
But in assessing the New Zealand market as an AWS Advanced Consulting Partner, Joseph acknowledged that it’s “very easy” to adopt cloud strategies in an instant through easy-to-configure and easy-to-deploy services and solutions.
“Fast forward a few years and a customer’s cloud practice could have evolved into something completely different,” Joseph cautioned. “And because of it the end-user hasn’t properly architected its cloud to the correct level required and that’s the fundamental problem in the industry today.”
With specialist experience lacking, Joseph stressed the stark difference between merely spinning up a few servers in the cloud and architecting for high availability.
“It requires experience,” he added. “Given that in-house skills are hard to acquire for New Zealand businesses, it’s crucial for organisations to seek outside help to ensure their cloud environments are fit for purpose.
“Traditionally this was carried out for businesses but today, cloud allows people to become amateur engineers on the spot. The bar has been significantly lowered but if you’re a business that doesn’t have much demand, or doesn’t care if it is down for a few hours on occasion, then that might be all you ever require.
“But for those reliant on having zero downtime, outside expertise is advised.”
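As a rough illustration of what architecting for high availability can involve at the application level, the sketch below tries a list of storage locations in order and falls back to the next one when a read fails. It is a simplified example under stated assumptions: the data is assumed to have been replicated to a second region already, the fetch functions are stand-ins rather than real AWS calls, and every name in it (`fetch_with_failover`, `primary_down`, `secondary_ok`, the `us-west-2` label) is hypothetical.

```python
from typing import Callable, Iterable, Tuple

Fetcher = Callable[[str], bytes]


def fetch_with_failover(
    key: str, sources: Iterable[Tuple[str, Fetcher]]
) -> Tuple[str, bytes]:
    """Return (source_name, data) from the first source that answers.

    `sources` is an ordered list of (name, fetch_function) pairs.
    Each failure is recorded and the next source is tried.
    """
    errors = []
    for name, fetch in sources:
        try:
            return name, fetch(key)
        except Exception as exc:  # any failure triggers the fallback
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all sources failed: " + "; ".join(errors))


# Stand-in fetchers for the example; a real system would call its
# storage client here instead.
def primary_down(key: str) -> bytes:
    raise ConnectionError("region unavailable")


def secondary_ok(key: str) -> bytes:
    return b"contents of " + key.encode()


if __name__ == "__main__":
    used, data = fetch_with_failover(
        "report.csv",
        [("us-east-1", primary_down), ("us-west-2", secondary_ok)],
    )
    print(used, data)
```

The read path is the easy part. Keeping the second copy current, deciding when to fail back, and handling writes during an outage are the areas where design experience tends to matter most.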
While acknowledging the fault of AWS in the outage, Joseph said the end-user community should also shoulder some of the blame for the disruption, insisting both were jointly responsible for suffering downtime.
“If my business couldn’t tolerate being down, then I would configure my cloud to accommodate this,” he explained. “There’s joint responsibility here in that businesses didn’t have the proper procedures and technology in place to cater for something that was well within the parameters of what was expected.
“Sure, the outage was longer than expected but in context, this particular service has been up with no particular drama for ten years so that’s why it’s such big news, because it’s been so reliable. If it were down every day nobody would care and the press wouldn’t be interested.
“If businesses read their SLAs, they’ll see 99.9999 per cent uptime and this is the one occasion AWS didn’t deliver. But AWS hasn’t made any 100 per cent guarantees; that is the role of the end-user.”
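To put an uptime percentage in concrete terms, it can be converted into the amount of downtime it still permits. The sketch below takes the 99.9999 per cent figure as quoted above at face value (it is not checked against any actual AWS agreement) and assumes a 365-day year. The function name `allowed_downtime_seconds` is hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a non-leap year


def allowed_downtime_seconds(
    uptime_percent: float, period_seconds: int = SECONDS_PER_YEAR
) -> float:
    """Seconds of downtime permitted in a period at a given uptime percentage."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    return period_seconds * (100.0 - uptime_percent) / 100.0


if __name__ == "__main__":
    # 99.9999 is the figure quoted in the article; the rest are for comparison.
    for pct in (99.9, 99.99, 99.9999):
        print(f"{pct}% uptime allows {allowed_downtime_seconds(pct):.1f} s per year")
```

Even at the quoted figure, roughly half a minute of downtime a year remains within the stated level, so any uptime figure below 100 per cent leaves the remaining risk with the customer.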
Despite AWS admitting that a typo caused the outage, Joseph said the root cause of the problem should be considered irrelevant from an end-user perspective.
“It really shouldn’t matter because this type of instance is in the realms of what is expected and what should have been planned for by users of the service,” he added.
“It was an oversimplification by the press because the S3 service outage was one of 16 parts of the service that was down.
“The service was available elsewhere globally and this should act as a wake-up call for businesses when building cloud services.”