With the recent launch of the AWS UK region and the government’s increasing uptake of the platform, Tuesday’s multi-hour disruption of AWS’s S3 storage service in the Northern Virginia region should serve as a stark warning to UK public sector departments that critical data cannot be stored with just one provider.
The outage took down a number of web services, and AWS later revealed that the cause was human error, a debugging issue, within its S3 storage service in its Northern Virginia data centre.
In response, the company said it is making changes to ensure that a similar human error does not happen again.
Beyond demonstrating that even the world’s largest cloud computing platform is vulnerable to periodic failure, the outage raises bigger questions around resilience and business continuity planning, particularly for mission-critical applications, as UK public sector organisations flock to the US provider in droves.
What happens when one site hosts too much of the internet?
AWS’s London data centres went live at the end of last year. London is the third AWS region to go live in Europe, and AWS now has 16 available regions and 42 availability zones within those regions.
No one can dispute the growth of AWS. As Bob Tarzey, services director and analyst at Quocirca, confirmed, 40 pence of every £1 spent on public cloud in the UK goes to AWS.
The AWS cloud is now used by UK public sector organisations such as HMRC, DVLA, the Ministry of Justice, Peterborough Council and a number of other local authorities that are looking to modernise their IT.
Even Liam Maxwell, the UK Government’s national technology adviser, has publicly backed the importance of the AWS cloud in the UK. Last July he said: “The Amazon region in the UK at the end of this year is going to bring a massive change to government technology. For so long people haven’t been using on-demand public cloud services because they feel they want to have data resident in the UK.”
Maxwell estimated that using a UK public cloud could save the government in the order of £100m over five years. This is a tremendous endorsement for public cloud uptake, but as we’ve seen with the AWS outage, an outage can have crippling consequences if public sector departments become too reliant on one provider using one technology stack.
Planning for business continuity
“You can have redundancy, but it costs money,” said Gartner research director Olive Huang. “People go to the public cloud very ill-prepared.”
Huang said that although IT departments running in-house systems might have business continuity and disaster recovery plans designed around the degree of systems failure a business could tolerate, such planning was often lacking when companies bought cloud services.
If you’re Slack, whose users were unable to share files during the AWS outage, it’s not an insurmountable problem. But for medical applications or NHS health records, loss of cloud connectivity could be a life or death issue.
No homogeneous technology stack will ever be 100% reliable. Therefore it makes sense to use different technologies where practical.
For example, at Memset we use OpenStack for our IaaS, but Hyper-V and Xen for our internal administration systems, so that we can still access customer systems in the unlikely event of a catastrophic OpenStack failure.
Organisations need to challenge and rethink the way they build systems in the cloud. As tightly linked complex systems become the norm, the software linking them all together becomes a single point of failure, causing a domino effect that takes out other server subsystems one by one, as AWS experienced with its outage.
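The principle of not depending on a single stack can be illustrated with a minimal sketch in Python. It assumes two independent storage backends and writes every object to both, so a read can fall back to the second when the first is down. All names here (InMemoryBackend, ReplicatedStore, ProviderUnavailable) are hypothetical stand-ins for illustration, not any provider's real API, and a production design would also need to handle consistency, retries and re-synchronisation after an outage.

```python
class ProviderUnavailable(Exception):
    """Raised when a backend (or every backend) cannot serve a request."""


class InMemoryBackend:
    """Hypothetical stand-in for one provider's object store."""

    def __init__(self, name):
        self.name = name
        self.available = True  # flip to False to simulate an outage
        self._objects = {}

    def put(self, key, data):
        if not self.available:
            raise ProviderUnavailable(self.name)
        self._objects[key] = data

    def get(self, key):
        if not self.available:
            raise ProviderUnavailable(self.name)
        return self._objects[key]


class ReplicatedStore:
    """Writes to every backend and reads from the first one that responds."""

    def __init__(self, backends):
        self.backends = list(backends)

    def put(self, key, data):
        # Attempt the write on every backend; succeed if at least one accepts it.
        written = []
        for backend in self.backends:
            try:
                backend.put(key, data)
                written.append(backend.name)
            except ProviderUnavailable:
                continue
        if not written:
            raise ProviderUnavailable("all providers are down")
        return written

    def get(self, key):
        # Try each backend in order, skipping any that is down or lacks the key.
        for backend in self.backends:
            try:
                return backend.get(key)
            except (ProviderUnavailable, KeyError):
                continue
        raise ProviderUnavailable(f"no provider could return {key!r}")


if __name__ == "__main__":
    primary = InMemoryBackend("provider-a")
    secondary = InMemoryBackend("provider-b")
    store = ReplicatedStore([primary, secondary])

    store.put("record-1", b"payload")
    primary.available = False      # simulate an outage at the first provider
    print(store.get("record-1"))   # still served, from the second provider
```

The trade-off is the one Huang describes: every object is stored twice, so the redundancy costs money, but an outage at one provider no longer makes the data unreachable.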
So what’s the alternative?
Some of the world’s largest companies are using OpenStack. It offers many of the benefits of AWS, and works in a similar way, but lets you maintain control. Importantly, it is an open cloud platform that allows workloads to be hosted or migrated between multiple cloud providers.
HMRC is using OpenStack as the foundation for its MultiChannel Digital Tax Platform, and the US Department of Defense is looking into the technology for both financial and security advantages, which suggests it is a feasible alternative for the public sector.
This outage will shake the blind faith that organisations are putting into just one provider, whether AWS, Azure or VMware. It shows that it doesn’t matter how many data centres a provider has or how much resilience it has in place: with tightly linked complex systems and a heavy reliance on the software layer to control the technology stack, people and businesses will continue to see glitches and outages bring the internet to its knees.
Sourced by Kate Craig-Wood, MD of Memset