Information Age: News, analysis & insight for IT & business leaders

Black swans in the cloud

23 May 2011  

An outage at Amazon Web Services and a catastrophic data breach at Sony show that when Internet-scale systems fail, they do so spectacularly

The increasing use of the Internet as a platform and delivery mechanism for computing services appears to be an inexorable march. But two recent incidents gave pause for thought, and revealed that when Internet-scale systems fail it can be catastrophic.

One of the central tenets of cloud computing is that a distributed system is less vulnerable to failure than one that relies on a single piece of hardware. And yet in April 2011 many Amazon Web Services (AWS) customers, including some high-profile websites, lost access to their systems following an outage at one of the company’s North American data centres.

The issue took three days to resolve, and at the end of it some customers were told that their data was lost forever.

This is precisely the kind of outage that was not supposed to happen in the cloud – Amazon’s highly distributed architecture and multiple data centres were meant to provide enough redundancy to reduce the risk to zero. It can therefore arguably be described as a ‘black swan’: an occurrence that is rare and, as a result, all the more disruptive when it does happen.

The same might also be said of the multiple data breaches that struck Sony’s online gaming services, which saw hackers get their hands on the private data of a staggering 100 million customers.

Cyber attacks are by no means a rarity, but for an electronics company with Sony’s reputation to be compromised so effectively was shocking. And the implications are huge, not least for its customers.

Whether or not these events discourage adoption of business cloud services like AWS or consumer cloud services such as Sony’s PlayStation Network, they are a reminder that handing data to a third party always involves a certain degree of risk, both for organisations and individuals. And they reveal that a provider’s track record is not an infallible measure of what that risk might be.

Alan Calder, CEO of security advisory IT Governance, says the Amazon outage and Sony breach prove that businesses and individuals need to take responsibility for their own data

These two events are proof that there are black swans in technology, just like there are black swans virtually everywhere else.

Both cases prove that whenever you hand your data to someone, you need to ask, “How safe is this?” And the fact that something hasn’t gone wrong in the past is no guarantee that it won’t happen in the future. Time and time again, people assume that they can take risks without assessing the potential outcome.

The other thing that the two events have in common is the inadequate incident response processes on the part of Sony and Amazon. In both cases, the response was muted and, to one extent or another, fed through lawyers. 


Simon Wardley, researcher at CSC’s Executive Leadership Forum, says that cloud computing need not be susceptible to black swans

To take full advantage of the cloud, you need to design for failure at every level – not just at the virtual machine level.
The solution to the risk of provider failure is a competitive marketplace of providers offering functionally equivalent services, with easy switching and semantic interoperability between them. In practice, however, these markets require a common, open source reference model and the first major attempt to achieve this, OpenStack, has only recently begun.

A combination of a marketplace of utility service providers, good enough components and designing systems for failure will create levels of resilience at a given price point that are unobtainable today. This will reduce the likelihood of such black swans far further.


Comments  [1]

Sarah Marsh
Wednesday 1st June 2011

The Amazon Web Services outage and the compromise of Sony’s PlayStation Network are far from being ‘black swan events’. A black swan is beyond comprehension, or to quote Nassim Nicholas Taleb, “an outlier…nothing in the past can convincingly point to its possibility… [making] us concoct explanations for its occurrence after the fact to make it explainable and predictable”. There have been numerous large-scale infrastructure failures. It was only a matter of time before a public cloud provider like Amazon succumbed, and the Sony incident is just another example in a long line of data breaches. The causes behind both incidents will undoubtedly recur, and the danger, in a cloud context, is that any failure could have a cascading effect, with one provider impacting several others (the failure of a PaaS provider could take down multiple IaaS and SaaS providers). At least AWS was isolated in that respect. The cloud isn’t infallible and the risk is still there. So perhaps we need to think ‘common house sparrow’ rather than ‘black swan’ and adapt our expectations and security demands accordingly.
(Posted on behalf of Richard Walters, CTO, Invictis Information Security).
