One certainty of IT disasters is that one will happen to you at some point. Fire, flood, hurricane, corrupted data: whatever the cause, a data centre will eventually experience a catastrophic event that puts revenues and customer satisfaction at risk.
The largest, most prestigious companies claim they’re too big and clever for it to happen to them, but it does. Vodafone’s data centre in Istanbul suffered a flash flood, a transformer exploded inside Canadian ISP Shaw’s data centre, and Go Daddy had a six-hour outage owing to bad data. No one is immune.
The other certainty of IT disasters is you’d better be prepared. Yet most organisations rely on manual, time-intensive and unreliable processes for disaster recovery (DR) planning.
Given that today’s organisations run heterogeneous environments with complex platform and application interdependencies, manual DR planning means they spend as much time hunting down the DR handbook as they do fixing the problem.
The other issue is that a DR strategy is often too heavily focused on the data components of business applications, such as data replication and backups, at the expense of the applications and infrastructure that depend on them.
Against this backdrop, what should businesses do? Here are the five imperatives for DR planning.
1. Failover quickly
There’s no such thing as an ideal disaster, but if there were, it would look like this: the disaster strikes and your systems fail over to a second DR site in the blink of an eye. It all happens transparently to your customers and without interruption to your service. Only your IT data centre managers notice that the service previously running in data centre X now runs in failover centre Y on the other side of town.
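As a rough illustration (not from the article itself), the failover decision can be sketched as a health check that routes traffic to the secondary site only when the primary stops responding. The site names and probe function here are hypothetical stand-ins:

```python
# A minimal failover-decision sketch. PRIMARY/SECONDARY and `probe`
# are hypothetical stand-ins for real health checks (HTTP pings, heartbeats).
PRIMARY = "dc-x.example.com"
SECONDARY = "dc-y.example.com"

def choose_active_site(probe, retries=3):
    """Return the site traffic should target.

    `probe(site)` returns True if the site answers. The primary is retried
    a few times before failing over, to avoid flapping on a single blip.
    """
    for _ in range(retries):
        if probe(PRIMARY):
            return PRIMARY
    return SECONDARY
```

In practice this decision would usually live in a load balancer or DNS health check rather than in application code; the point is that failover should be a fast, automated decision, not a manual procedure.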
2. Maintain visibility and control
When disaster strikes, you need a single, complete view of the DR process: visibility into the end-to-end sequence of events that occurs between your primary data centre failing and the second site taking over.
If you rely on manual processes, that visibility and control just isn’t there. Your knowledge is buried in a mass of manuals, dependencies scribbled down somewhere in spreadsheets, and complexity you simply cannot cope with. Without visibility and control, it is extremely unlikely you will be able to recover your IT service quickly.
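One way to get those spreadsheet dependencies out of people’s heads is to encode them as data and derive the recovery sequence automatically. A minimal sketch using Python’s standard-library `graphlib`; the service names and dependencies are hypothetical:

```python
from graphlib import TopologicalSorter

# Hypothetical runbook: each service maps to the services that must be
# recovered before it can start.
runbook = {
    "database": [],
    "app-server": ["database"],
    "web-server": ["app-server"],
    "load-balancer": ["web-server"],
}

def recovery_order(deps):
    """Return services in an order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())
```

Once the dependencies are machine-readable, the end-to-end sequence of events is visible to everyone, not just to whoever wrote the spreadsheet.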
3. Understand how long the recovery will take
The website that provides your primary source of revenue is down. The financial trading platform is dormant. Your customer service agents can’t access the system, and service enquiries are building up. In any of these scenarios, your users will ask one fundamental question: “When will the service come back on?”
If you rely on manual DR processes, you won’t have an answer. Your time and resources are consumed by pinpointing and fixing the disaster. And even when you have found the source, it is still easier to predict next week’s lottery numbers than to say precisely when the service will be back up and running.
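An automated runbook can at least give an estimate: if each recovery step carries a duration measured during rehearsals, the answer to “when will it be back?” is a simple sum over the remaining steps. A sketch with hypothetical step timings:

```python
# Hypothetical per-step durations (minutes), measured during DR rehearsals.
steps = [
    ("fail over storage", 10),
    ("restore database", 25),
    ("restart application servers", 10),
    ("redirect traffic", 5),
]

def estimated_recovery_minutes(plan):
    """Sum step durations to give users a rough 'service back by' figure."""
    return sum(minutes for _, minutes in plan)
```

The figure is only as good as the rehearsal data behind it, but a defensible estimate beats silence when customers are asking.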
4. Understand the root cause of the problem
It happens more frequently than you might imagine. Your IT service goes down. You fix what you believe to be the problem, only to find the service collapses again later the same day. It transpires that the flood in your data centre not only hit the network server; it also corrupted the web server two racks down. Without a complete picture of everything the disaster touched, you fix the symptom rather than the root cause.
5. Prove your DR plan works
For compliance, DR is about maintaining confidentiality, integrity, availability and accountability. A systematic, comprehensive set of procedures, together with processes that exercise those procedures, gives you control and accountability.
The thoroughness of those procedures means that if you smell smoke, you can decide to move to a secondary site and maintain the same levels of confidentiality and integrity while keeping the service available on temporary facilities.
Sourced from Vladi Shlesman, Automic Software