3 steps to avoiding outage disasters

Outages have been causing problems for decades, and today everyone is well aware of the pain they can cause.

In the last few years, Google, Apple and Salesforce have all had well-publicised outages that left unhappy customers stranded.

Many businesses have been proactive in trying to pre-empt outages – for example, most have improved their capacity planning capabilities and many have moved to the cloud to cope with scalability issues. Yet despite these steps, outages continue to happen, and the harm can be catastrophic.

First, there is the straightforward loss of revenue.

For retailers, every second your e-commerce store is offline and customers aren’t able to complete their purchases is money down the drain.

Second, there can be long-lasting reputational damage.

After the Salesforce outage in May, commentators openly questioned whether customers would be able to trust the company again.


In ultra-competitive markets, customers have dozens of options. If your site gets a reputation for being unreliable, customers are likely to take their business elsewhere: research shows that 59% of businesses have lost customers as a direct result of an outage.

A serious business systems outage doesn’t just affect customers; it can also have a huge operational impact, leading to missed deadlines, lost work and unacceptable delays in communication.

All of these dramatically reduce employee productivity. Outages also often mean that staff need to work overtime, incurring further expense. It’s no surprise, then, that Ponemon’s research estimated the cost of the average outage at around $15 million for a Global 5000 company.

So given these costs and damages to businesses, why are outages continuing to cause havoc? One reason is that businesses have failed to tackle a key cause of outages: certificate expiry.

Outages set to rise?

Almost every piece of technology, from mobile apps to company servers, relies on a valid encryption key or certificate to function.

They act as ‘passports’ for both machines and software, enabling trusted communication and transactions online. They underpin all e-commerce, and more broadly, any digital transaction.

The boom in the number of applications, servers and connected devices means that the number of certificates in use is skyrocketing. No one would now launch a new application that could not be uniquely authenticated, and so distinguished from an impostor set up by a hacker, or one whose communications were unencrypted.

As a result, organisations are struggling to keep track of and protect these keys and certificates, arguably now their most valuable cyber-assets. Unfortunately, certificates expire: unless they are renewed or replaced in time, the apps and devices that rely on them stop working, because they can no longer be trusted to communicate.
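To make the expiry problem concrete, here is a minimal sketch using only Python's standard library that reads a server's certificate expiry date over TLS and converts it into days remaining. The function names `days_until_expiry` and `fetch_not_after` are illustrative, not part of any product, and the sketch assumes the certificate is served on a reachable TLS endpoint (port 443 by default).

```python
import socket
import ssl
import time
from typing import Optional

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a certificate's notAfter time (negative once expired).

    `not_after` uses the text format returned by SSLSocket.getpeercert(),
    for example "Jan  5 09:34:43 2030 GMT".
    """
    expires_at = ssl.cert_time_to_seconds(not_after)  # seconds since the epoch, UTC
    if now is None:
        now = time.time()
    return (expires_at - now) / SECONDS_PER_DAY


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Open a verified TLS connection and return the server certificate's notAfter.

    With the default context, an already-expired certificate fails the
    handshake with ssl.SSLCertVerificationError, which is itself a warning sign.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

Running `days_until_expiry(fetch_not_after("example.com"))` from a scheduled job gives an early warning for a single host; the harder problem is knowing which hosts to check in the first place.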

In addition, the process of requesting, issuing, renewing and installing certificates is extremely complicated. Even Microsoft Azure has, in the past, failed to configure certificates correctly.


IT leaders report that, on average, every enterprise suffers two major business outages a year as a result of certificate expiry.

It’s no wonder: in one survey of IT security professionals, organisations that started a concerted effort to discover their certificates found, on average, 16,500 previously unknown certificates each.

As ever-more software, clouds and devices are launched, a surge in the number of certificates in use could result in a spike in the number of outages – unless businesses are able to take control of the issue.

Beyond the loss of business continuity, outages from certificate expirations reveal an even greater threat: a business with certificate expirations has no handle on what is trusted or not, friend or foe. That is a breach waiting to happen.

Gaining control

In many organisations, certificate ownership is not centrally managed, so certificates are forgotten and left to expire, causing outages.

With so many certificates in use, it is difficult to track expiry dates, particularly when some have shelf lives of a decade or more. But in many cases, companies have never gained proper visibility to begin with.

To illustrate, let’s imagine Company X hires a contractor to design a web app. That web app requires a certificate to run and so the contractor quickly grabs one from GoDaddy but forgets to mention it to anyone else at the company. Once the app goes live, the contractor leaves, taking with him all knowledge about the certificate that is now a ticking time-bomb.

To prevent these scenarios, businesses need to be able to discover, track and continuously monitor all their keys and certificates through comprehensive automation, keeping them safe and active. There are three key steps in this process.

Know your infrastructure

It sounds straightforward, but the first step in preventing technology outages is to map everything out.

The average enterprise has more than 23,000 certificates, so every device and every piece of software, from point-of-sale (POS) tills to apps, needs to be accounted for and its certificate status established.


As the number rises, this is becoming an increasing challenge for enterprises. It is critical to implement complete network, cloud and application discovery that keeps a continuously updated, complete inventory of certificates as they are changed or introduced.
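As a rough illustration of the network part of discovery, the sketch below scans a list of TLS endpoints, fingerprints each certificate it finds and merges the results into an inventory, so repeated scans report only certificates not seen before. It covers network endpoints only (cloud and application discovery would need other data sources). `discover`, `CertRecord` and `tls_fetch` are hypothetical names, and the endpoint list is assumed to come from elsewhere, such as an asset register or an address-range scan.

```python
import hashlib
import socket
import ssl
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set, Tuple

# A fetcher returns (DER-encoded certificate, notAfter text) for host:port.
Fetcher = Callable[[str, int], Tuple[bytes, str]]


@dataclass
class CertRecord:
    fingerprint: str                                  # SHA-256 of the DER bytes
    not_after: str                                    # expiry, getpeercert() format
    endpoints: Set[str] = field(default_factory=set)  # everywhere it was seen


def tls_fetch(host: str, port: int) -> Tuple[bytes, str]:
    """Default fetcher: a verified TLS handshake using only the standard library."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=5) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            der = tls.getpeercert(binary_form=True)
            return der, tls.getpeercert()["notAfter"]


def discover(
    endpoints: Iterable[Tuple[str, int]],
    inventory: Dict[str, CertRecord],
    fetch: Fetcher = tls_fetch,
) -> Tuple[List[str], List[str]]:
    """Scan endpoints and merge what is found into `inventory`.

    Returns (fingerprints not previously in the inventory, endpoints that failed).
    """
    new: List[str] = []
    failed: List[str] = []
    for host, port in endpoints:
        endpoint = f"{host}:{port}"
        try:
            der, not_after = fetch(host, port)
        except OSError:  # covers ssl.SSLError, refused connections and timeouts
            failed.append(endpoint)  # unreachable or untrusted: investigate
            continue
        fingerprint = hashlib.sha256(der).hexdigest()
        record = inventory.get(fingerprint)
        if record is None:
            record = CertRecord(fingerprint, not_after)
            inventory[fingerprint] = record
            new.append(fingerprint)
        record.endpoints.add(endpoint)
    return new, failed
```

Endpoints that fail the handshake are reported separately rather than ignored, since a certificate that no longer verifies is exactly what the scan should surface. Running the scan on a schedule is what keeps the inventory current.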

Take ownership

Building a complete picture of every key and certificate means responsibility and ownership can be clearly established.

Then, until full automation is introduced, the named owners can respond to notifications and replace certificates that are about to expire.

Without this step, there is little long-term hope of eliminating certificate-caused outages.

However, in a huge number of enterprises, certificates are tracked manually, or on spreadsheets known only to a few employees, which significantly increases the chance of an outage caused by human error.
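Once each certificate has a named owner, routing expiry warnings becomes a simple grouping exercise. The sketch below assumes an inventory of `Certificate` records with an optional `owner` field and a 30-day warning window; the field names, the window and the `UNASSIGNED` bucket are illustrative assumptions, the last one making unowned certificates visible instead of letting them slip through.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List, Optional

UNASSIGNED = "UNASSIGNED"  # hypothetical bucket for certificates nobody owns yet


@dataclass
class Certificate:
    name: str               # hostname, application or device label
    expires: datetime       # timezone-aware expiry time
    owner: Optional[str]    # responsible person or team; None if unknown


def expiry_notices(
    certs: Iterable[Certificate],
    now: Optional[datetime] = None,
    warn_days: int = 30,
) -> Dict[str, List[str]]:
    """Group certificates that expire within `warn_days` (or already have) by owner."""
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now + timedelta(days=warn_days)
    notices: Dict[str, List[str]] = defaultdict(list)
    for cert in certs:
        if cert.expires <= cutoff:
            notices[cert.owner or UNASSIGNED].append(cert.name)
    return dict(notices)
```

Each owner then receives one list of certificates to deal with, while anything under `UNASSIGNED` signals a gap in ownership that needs fixing first.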

Automate

After determining where all your keys and certificates are, establishing ownership, and enforcing a policy for issuance and renewal, the final step is to completely automate the process of issuance, renewal, and revocation.

This means keys and certificates are generated automatically, issued and renewed by trusted CAs, and safely installed and validated.
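The renewal loop at the heart of such automation can be sketched as follows. This is a minimal outline under stated assumptions, not any vendor's implementation: `issue`, `install` and `validate` are hypothetical hooks standing in for key generation plus a CA's issuance interface (for example an ACME client), deployment tooling and a post-install check, and the 30-day renewal window is an assumed policy.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, List, Optional, Tuple


@dataclass
class ManagedCert:
    name: str
    expires: datetime


# Hypothetical integration points; none of these is a real library API.
IssueFn = Callable[[str], datetime]  # new key + certificate from a trusted CA; returns new expiry
InstallFn = Callable[[str], None]    # deploy the new key and certificate where they are used
ValidateFn = Callable[[str], bool]   # check the service now presents the new certificate


def renew_due(
    certs: List[ManagedCert],
    issue: IssueFn,
    install: InstallFn,
    validate: ValidateFn,
    now: Optional[datetime] = None,
    renew_before: timedelta = timedelta(days=30),
) -> Tuple[List[str], List[str]]:
    """Renew every certificate inside the renewal window; return (renewed, failed)."""
    if now is None:
        now = datetime.now(timezone.utc)
    renewed: List[str] = []
    failed: List[str] = []
    for cert in certs:
        if cert.expires - now > renew_before:
            continue  # not due yet
        try:
            new_expiry = issue(cert.name)
            install(cert.name)
            ok = validate(cert.name)
        except Exception:  # any step failing: keep the old record and escalate
            ok = False
        if ok:
            cert.expires = new_expiry
            renewed.append(cert.name)
        else:
            failed.append(cert.name)
    return renewed, failed
```

Failures are collected rather than raised so that one broken renewal does not stop the rest of the batch; anything in the failed list still needs a human owner, which is why ownership comes before automation.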

This removes the kinds of configuration errors and failures that have tripped up even Microsoft.

Automation reduces the possibility of human error while also significantly lowering costs. Two people using spreadsheets might be able to track a few hundred certificates, and even then errors and failures remain likely; with an automated system, the same two people can orchestrate protection for more than 100,000 keys and certificates, far more securely.

Putting automation in place can save businesses in the real world. Recently, the certificate authority GlobalSign made an error that caused browsers to treat almost all of the certificates it had issued as untrusted.


With an automated system, an organisation could simply apply the fix suggested by GlobalSign or replace the affected certificates with others. Without automation, an organisation could be thrown into chaos for days while it struggles to understand where all its keys and certificates are and how they are used. Meanwhile, customers remain unable to access online services.
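With a complete inventory that records each certificate's issuer, replacing everything from a CA that browsers have stopped trusting becomes a query plus a loop. In the sketch below, `IssuedCert`, `replace_by_issuer` and the `reissue` hook are hypothetical, and `reissue` is assumed to obtain a replacement certificate from a different CA and return that CA's name.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class IssuedCert:
    name: str
    issuer: str  # issuing CA as recorded in the inventory


def replace_by_issuer(
    inventory: List[IssuedCert],
    distrusted_issuer: str,
    reissue: Callable[[str], str],
) -> List[str]:
    """Reissue every certificate from `distrusted_issuer`; return the names replaced."""
    affected = [cert for cert in inventory if cert.issuer == distrusted_issuer]
    for cert in affected:
        cert.issuer = reissue(cert.name)  # hypothetical hook: new cert from another CA
    return [cert.name for cert in affected]
```

Without the inventory, the first step alone (finding which certificates came from the affected CA) is the part that can take days.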

Human error is inevitable, and some outages are simply unforeseeable. However, the cost of an outage is so high that enterprises need to make sure they are doing everything they can to reduce the chance of a disastrous outage hitting unexpectedly.

Until the use of keys and certificates is automated, each day is a gamble that one of thousands isn’t about to expire without warning.


Sourced by Kevin Bocek, chief security strategist at Venafi


Nick Ismail

Nick Ismail is the editor for Information Age. He has a particular interest in smart technologies, AI and cyber security.