Black swans in the cloud

Is cloud computing prone to rarer but more disastrous failures?
An outage at Amazon Web Services and a catastrophic data breach at Sony show that when Internet-scale systems fail, they do so spectacularly
The increasing use of the Internet as a platform and delivery mechanism for computing services appears to be an inexorable march. But two recent incidents gave pause for thought, and revealed that when Internet-scale systems fail it can be catastrophic.
One of the central tenets of cloud computing is that a distributed system is less vulnerable to failure than one that relies on a single piece of hardware. And yet in April 2011 many Amazon Web Services (AWS) customers, including some high-profile websites, lost access to their systems following an outage at one of the company’s North American data centres.
The issue took three days to resolve, and at the end of it some customers were told that their data was lost forever.
This is precisely the kind of outage that was not supposed to happen in the cloud – Amazon’s highly distributed architecture and multiple data centres were meant to provide enough redundancy to reduce the risk to zero. It can therefore arguably be described as a ‘black swan’: an occurrence that is rare and, because it is so unexpected, all the more disruptive when it does happen.
The same might also be said of the multiple data breaches that struck Sony’s online gaming services, which saw hackers get their hands on the private data of a staggering 100 million customers.
Cyber attacks are by no means a rarity, but for an electronics company with Sony’s reputation to be compromised so effectively was shocking. And the implications are huge, not least for its customers.
Whether or not these events discourage adoption of business cloud services like AWS or consumer cloud services such as Sony’s PlayStation Network, they are a reminder that handing data to a third party always involves a certain degree of risk, both for organisations and individuals. And they reveal that a provider’s track record is not an infallible measure of what that risk might be.
Alan Calder, CEO of security advisory IT Governance, says the Amazon outage and Sony breach prove that businesses and individuals need to take responsibility for their own data
These two events are proof that there are black swans in technology, just like there are black swans virtually everywhere else.
Both cases prove that whenever you hand your data to someone, you need to ask, “How safe is this?” And the fact that something hasn’t gone wrong in the past is no guarantee that it won’t happen in the future. Time and time again, people assume that they can take risks without assessing the potential outcome.
The other thing that the two events have in common is the inadequate incident response processes on the part of Sony and Amazon. In both cases, the response was muted and, to one extent or another, fed through lawyers.
Simon Wardley, researcher at CSC’s Executive Leadership Forum, says that cloud computing need not be susceptible to black swans
To take full advantage of the cloud, you need to design for failure at every level – not just at the virtual machine level.
The solution to the risk of provider failure is a competitive marketplace of providers offering functionally equivalent services, with easy switching and semantic interoperability between them. In practice, however, these markets require a common, open source reference model and the first major attempt to achieve this, OpenStack, has only recently begun.
A combination of a marketplace of utility service providers, good enough components and designing systems for failure will create levels of resilience at a given price point that are unobtainable today. This will reduce the likelihood of such black swans far further.
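As a rough illustration of what designing for provider failure could look like in application code, the sketch below tries a list of functionally equivalent providers in turn and returns the first successful response. It is a minimal sketch under stated assumptions, not a description of how AWS, OpenStack or any real service works: `ProviderError`, `fetch_with_failover` and the two stub providers are hypothetical names invented for this example.

```python
from typing import Callable, List, Sequence


class ProviderError(Exception):
    """Hypothetical error raised when a provider cannot serve a request."""


def fetch_with_failover(key: str, providers: Sequence[Callable[[str], bytes]]) -> bytes:
    """Try each functionally equivalent provider in order; return the first success."""
    errors: List[ProviderError] = []
    for provider in providers:
        try:
            return provider(key)
        except ProviderError as exc:
            # Record the failure and fall through to the next provider.
            errors.append(exc)
    # Every provider failed: surface that explicitly instead of hiding it.
    raise ProviderError(f"all {len(providers)} providers failed for {key!r}: {errors}")


# Hypothetical stand-ins for two interchangeable storage providers.
def failing_provider(key: str) -> bytes:
    raise ProviderError("region unavailable")


def healthy_provider(key: str) -> bytes:
    return b"payload for " + key.encode()


if __name__ == "__main__":
    print(fetch_with_failover("report", [failing_provider, healthy_provider]))
```

The sketch only works if the providers really are interchangeable behind one interface, which is the role Wardley assigns to a common reference model.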






The Amazon Web Services outage and the compromise of Sony’s PlayStation Network are far from being ‘black swan events’. A black swan is beyond comprehension, or to quote Nassim Nicholas Taleb, “an outlier…nothing in the past can convincingly point to its possibility… [making] us concoct explanations for its occurrence after the fact to make it explainable and predictable”. There have been numerous large-scale infrastructure failures. It was only a matter of time before a public cloud provider like Amazon succumbed, and the Sony incident is just another example in a long line of data breaches. The causes behind both incidents will undoubtedly recur and the danger, in a cloud context, is that any failure could have a cascading effect, with one provider impacting several others (the failure of a PaaS provider could take down multiple IaaS and SaaS providers). At least AWS was isolated in that respect. The cloud isn’t infallible and the risk is still there. So perhaps we need to think ‘common house sparrow’ rather than ‘black swan’ and adapt our expectations and security demands accordingly.
(Posted on behalf of Richard Walters, CTO, Invictis Information Security).