Just a few years ago, the two biggest hurdles to cloud computing adoption were cloud security and reliability — cloud outages. Over time, we’ve learned the cloud can be as secure as (if not more secure than) on-premises IT. And while it took a few years to build up a strong track record, we now know it’s usually more reliable too.
But that isn’t to say the cloud is infallible. There are still plenty of significant outages. The trend in 2019 is interesting: platform-wide outages are highlighting the risk of relying on a single cloud.
Cloud risk, cloud outages and the oligopoly market
The cloud market is dominated by a small number of key players. AWS leads the way, holding 32.3% of market share in Q4 2018, with Azure second on 16.5% and Google Cloud in third place with 9.5%. A mix of others make up the rest of the market, including a couple of big names like Alibaba and IBM.
The shape of the market brings a conflicting mix of risk diversification and risk centralisation.
Firstly, you spread your risk because you’re not just dependent on a single server room or data centre in your office. Your IT is now somewhere else – away from your premises, reducing the risk associated with that location.
Secondly, most businesses don’t just use one cloud service. Even businesses that claim to be going ‘all-in’ on a single cloud for the benefits of a single platform and volume discounts will probably also use other SaaS services like CRM and payroll. That should mean your risk is reduced even further, because there is less chance of all systems failing at once.
But that isn’t always the case.
The downside of a small number of dominant cloud providers is that many of those SaaS-based cloud services might be hosted on the same platform. On the face of it, you are spreading your risk, but you’re also potentially putting all of your eggs in one basket. An AWS outage in February 2017 affected services like Spotify, Dropbox and Trello. A 2018 AWS outage affected Atlassian, Twilio and Slack. So your cloud services may not be as independent of each other as you expect.
The hyper-scale cloud providers have built their infrastructure to let customers make their systems and applications resilient. AWS, Azure and Google offer independent, isolated availability zones. If you build your infrastructure across at least two zones, you can reduce or eliminate single points of hardware failure.
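The principle of spreading a workload across at least two zones can be sketched in a few lines. This is a minimal illustration, not any provider’s SDK: the instance and zone names are hypothetical, and real deployments would be driven by an Infrastructure-as-Code tool rather than a placement function.

```python
from itertools import cycle

def place_instances(instances, zones):
    """Round-robin instances across availability zones so no single
    zone holds every copy of a workload."""
    if not zones:
        raise ValueError("at least one availability zone is required")
    placement = {}
    zone_cycle = cycle(zones)
    for instance in instances:
        placement[instance] = next(zone_cycle)
    return placement

# Spread four web servers across two zones: losing either zone
# still leaves half the fleet running.
placement = place_instances(
    ["web-1", "web-2", "web-3", "web-4"],
    ["eu-west-1a", "eu-west-1b"],
)
```

The point is that zone-level distribution removes single points of *hardware* failure — but, as the next section shows, not platform-wide ones.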
This isolation should make it impossible for issues to affect more than one zone or region. Unfortunately, this doesn’t always happen. Separate data centres protect against geographic risks like power outages or extreme weather, but not against platform-wide issues.
There is no cloud – it’s just someone else’s computer
This is exactly what happened to Google in the US on 2 June 2019. “A configuration change” intended for a “small number of servers in a single region” was applied to a “larger number of servers across several neighbouring regions.” The result was that “it caused those regions to stop using more than half of their available network capacity”. The impact was not just on Google’s own services like search, Gmail and YouTube, but also on customers using Google’s cloud.
Earlier this year, Google Cloud Platform suffered a similar issue. A code change led to problems with Google Cloud Console and Cloud Dataflow, which then caused errors on Google Cloud Storage globally. Also this year, Azure suffered a global outage as a result of a mistake in a DNS migration. That small issue extended to hit compute, storage, Active Directory identity services and SQL Database.
There haven’t been many data centre issues for cloud providers in 2019. But what we have seen are several platform-wide issues, usually due to human error. “There is no cloud – it’s just someone else’s computer” – and that “someone” is just as likely to err as we are.
At Databarracks, we’ve been running an annual IT survey for over 10 years and the top causes of data loss are consistently hardware failure and human error. The cloud lets us build our systems to deal with hardware failure but it’s not possible to eliminate human error.
Cloud security: Recommendations
Manage your supplier risk to limit cloud risk
Firstly, get a handle on where your cloud services are hosted.
For IaaS, that’s easy – you’ll already know which regions and zones you’re using. But don’t stop there. Supplier Risk Management means not just looking at the first tier of suppliers, but also looking deeper to see who your suppliers’ suppliers are.
Knowing that some services are hosted on the same platform doesn’t necessarily mean you need to substitute them – you may just accept the risk. The alternative SaaS (e.g. payroll software) might be so inferior that it’s not worth the risk-reduction to change. At least by investigating, you know what will be affected by an incident involving the cloud provider.
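One way to make this supplier-risk investigation concrete is to map each SaaS service to the platform it is (directly or indirectly) hosted on, then group services by platform to see the blast radius of a single provider outage. The services and mappings below are hypothetical examples, and in practice this data comes from questioning your suppliers, not from any API.

```python
from collections import defaultdict

# Hypothetical second-tier mapping: the underlying cloud platform
# each SaaS service in your portfolio is hosted on.
saas_platforms = {
    "crm": "aws",
    "payroll": "aws",
    "file-sharing": "aws",
    "chat": "gcp",
    "emergency-notification": "azure",
}

def platform_concentration(services):
    """Group services by underlying platform to reveal which would
    fail together in a platform-wide outage."""
    by_platform = defaultdict(list)
    for service, platform in services.items():
        by_platform[platform].append(service)
    return dict(by_platform)

blast_radius = platform_concentration(saas_platforms)
# In this example, an AWS incident would take out CRM, payroll
# and file-sharing at the same time.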
For other areas, it will mean making a change. An Emergency or Mass Notification tool can’t use the same cloud as your production systems, because it’s what you rely on to communicate during a disruption. Here, you need to dig into the second and third tier of suppliers to see which clouds are used. An Emergency Notification tool might be hosted on one cloud, but could also rely on a third-party SMS provider like Twilio to deliver its messages.
Use multiple cloud providers and reduce cloud outages
The final recommendation is to use more than one cloud provider. Even if you have built resilience into your systems on one cloud, making use of multiple availability zones and even regions, you should always keep at least a backup copy of your data outside that cloud.
Or, you might want to build your resilience across multiple cloud providers. You need to make sure you’re not tied into platform-specific, proprietary tools, and always build your applications to be portable. Containers and an Infrastructure-as-Code approach mean you can build and destroy environments quickly and repeatably. This approach will be more work, but the benefits extend beyond resilience. You’re also set to take advantage of pricing or performance differences between cloud providers and move between services more freely.
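The multi-cloud idea can be sketched as writing behind a provider-neutral interface and replicating to more than one backend. This is a toy sketch with in-memory stand-ins — a real implementation would wrap each provider’s storage SDK behind the same interface — but it shows how a platform-wide outage on one provider need not lose the write.

```python
class ObjectStore:
    """In-memory stand-in for a cloud object-storage client; a real
    implementation would wrap a provider SDK behind this interface."""
    def __init__(self, name, available=True):
        self.name = name
        self.available = available
        self._objects = {}

    def put(self, key, data):
        if not self.available:
            raise ConnectionError(f"{self.name} is unreachable")
        self._objects[key] = data

def replicated_put(key, data, stores):
    """Write to every provider; succeed as long as at least one write
    lands, so one platform-wide outage cannot lose the object."""
    succeeded = []
    for store in stores:
        try:
            store.put(key, data)
            succeeded.append(store.name)
        except ConnectionError:
            continue
    if not succeeded:
        raise RuntimeError("all providers unavailable")
    return succeeded

primary = ObjectStore("cloud-a")
secondary = ObjectStore("cloud-b")
primary.available = False  # simulate a platform-wide outage
written_to = replicated_put("backup.tar", b"...", [primary, secondary])
```

The design choice here is that portability lives in the interface, not the providers: because nothing above `ObjectStore` knows which cloud it is talking to, adding or swapping a provider is a one-line change.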
Peter Groucutt is the managing director of Databarracks