Information technology provides enormous value for organisations, but it can also represent a tremendous point of weakness. When markets are global, employees work around the clock and business is effectively always on, any interruption to application availability can quickly lead to lost revenue, lost productivity, damaged brand value and regulatory problems.
Taken to the extreme, extended downtime can even threaten the survival of a business. So how should organisations deal with this type of existential threat? As it stands, the painful reality is that most organisations do not deal with it well.
Many backup and recovery techniques were developed for relatively unsophisticated computing processes, back when we had regularly scheduled periods of time when no one would be using the system.
However, the way we do business today has changed dramatically from 20, 15 or even just 10 years ago, and most companies cannot afford to be unable to access data as and when required.
The cost of downtime in the current information-centric age is too high for the vast majority of businesses. The always-on applications that we now rely on need guaranteed continuous system availability and the elimination of the risk of data loss.
The result is an increasing interest in high-availability (HA) solutions, which are designed to be continuously operational and to have sufficient redundancy in their components.
Business continuity (BC) — the planning, preparation, and implementation of more resilient business systems in anticipation of unscheduled downtime — is often thought of as an IT problem, and most organisations leave it to the IT department to provide a fix.
This invariably leads to the deployment of a wide range of tactical solutions, with no overriding strategy providing guidance. In reality, as the term implies, BC is a business problem, and it requires a business approach and the involvement of users beyond the system administrators.
Today, low-cost, high-bandwidth networks are ubiquitous, to the point of being a business necessity. In addition, a wide variety of service providers make it simple to spin up virtual servers on demand at very low cost.
These infrastructure advances mean that HA technology is now available to more organisations at a much more modest price, helping them remain competitive at a time when even a few minutes’ downtime can cost millions of pounds.
A sound business continuity strategy needs input from several teams, but it doesn’t have to be a logistical and technical nightmare. Let’s have a look at the top 10 tips for disaster recovery (DR) and business continuity planning.
1. It’s about the business, not the technology
Before trying to work out how to implement HA and disaster recovery, you should start by talking to your company’s senior business management to understand their priorities. For some, it will be staff costs, for others productivity gains, for others billing cycles, and so on.
Then, identify which elements of the IT environment support these priorities (e.g. email, online order entry systems, SharePoint, databases). The point is you won’t know which systems are the most important unless you ask business users.
2. It’s a catastrophe, or maybe not
When you think about DR, you probably picture hurricanes, floods, terrorist attacks and the like – not a software upgrade gone wrong or a hardware error on a critical piece of networking equipment.
Planning for the worst-case scenario and then being tripped up by trivial day-to-day errors is very common. Your HA planning has to take into account all eventualities, from the ordinary to the cataclysmic. If your company is about to be audited, a lack of system availability will not be accepted as an excuse.
3. How can you assign budget without knowing the cost of downtime?
Too often, organisations set a budget for disaster recovery planning before determining the financial risk of downtime and data loss. That is a back-to-front approach.
Assess the likely cost of downtime first and take it from there. Don’t forget to include regulatory compliance in your calculations. There are often financial penalties for unmet legal obligations.
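As a rough illustration of that assessment, the sketch below adds up three components of a single outage: lost revenue, lost staff productivity and any compliance penalty. The function name, the parameters and every figure are hypothetical assumptions chosen for the example, not numbers from any real organisation, and a real calculation would likely include further items such as reputational damage.

```python
def downtime_cost(hours_down, revenue_per_hour, staff_affected,
                  hourly_staff_cost, productivity_loss=1.0,
                  compliance_penalty=0.0):
    """Estimate the cost of one outage (all inputs are assumptions).

    productivity_loss is the fraction of work that stops (0.0 to 1.0).
    """
    lost_revenue = hours_down * revenue_per_hour
    lost_productivity = (hours_down * staff_affected
                         * hourly_staff_cost * productivity_loss)
    return lost_revenue + lost_productivity + compliance_penalty


# Hypothetical outage: 4 hours, 10,000 of revenue per hour, 50 staff at
# 40 per hour working at half capacity, plus a 5,000 regulatory penalty.
estimate = downtime_cost(4, 10_000, 50, 40,
                         productivity_loss=0.5, compliance_penalty=5_000)
print(f"Estimated cost of outage: {estimate:,.0f}")
```

Even a crude estimate of this kind gives the business a figure to weigh against the price of a proposed HA or DR solution.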
4. It’s about measuring risk
Exactly what counts as a disaster can change from organisation to organisation, and even from department to department. When thinking about HA and disaster recovery, it is essential to ask: what are we trying to protect ourselves from? Don’t overlook the commonplace. Small losses from common problems mount up quickly.
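One common way to compare commonplace and catastrophic risks is to multiply how often an event is expected to happen by what it costs each time. The sketch below applies that idea; the threat names, frequencies and costs are invented purely for illustration, and the `Threat` class is a hypothetical helper rather than part of any product.

```python
from dataclasses import dataclass


@dataclass
class Threat:
    name: str
    events_per_year: float  # assumed frequency
    cost_per_event: float   # assumed loss each time it happens


def annual_expected_loss(threat):
    """Expected yearly loss: frequency multiplied by impact."""
    return threat.events_per_year * threat.cost_per_event


# Hypothetical figures for illustration only.
threats = [
    Threat("regional flood", 0.01, 1_500_000),
    Threat("failed software upgrade", 6, 8_000),
    Threat("network hardware fault", 2, 15_000),
]

ranked = sorted(threats, key=annual_expected_loss, reverse=True)
for t in ranked:
    print(f"{t.name}: {annual_expected_loss(t):,.0f} per year")
```

With these assumed numbers, the routine upgrade failures outrank the flood, which is the point about small, frequent losses mounting up.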
5. Do you have a plan?
As crazy as it sounds, a surprising number of organisations don’t have a disaster recovery plan. It is essential that you develop a formal document detailing all applications, hardware, facilities, service providers, personnel and priorities.
The plan must represent all functional areas and offer clear guidance on what happens before, during and after a disaster. And it’s imperative that everyone, up to and including top management, is aware of the extent and limitations of this plan.
After all, if the authorities were to issue a fine or if customers couldn’t be serviced, the repercussions would involve more than just the IT department.
6. We’ve got a plan, but we didn’t test it
A DR plan is only helpful if it works. The only way to ensure it does is to test it under simulated disaster conditions – this is essential but it can also be challenging. Look for data protection solutions that help you create environments for non-disruptive testing of your disaster recovery plan.
7. Who is responsible, and for what?
A real-life disaster event will be chaotic and confusing. If key staff do not understand their DR responsibilities, the recovery process will be long and fraught with problems. Your DR plan must clearly state the roles and responsibilities of everyone involved, including what to do if key personnel are not available.
8. Recovery point what? Recovery time who?
Two metrics are used to record an application’s tolerance of downtime and data loss: recovery point objective (RPO) and recovery time objective (RTO). RPO is a measure of data loss, usually expressed as a period of time. The larger the RPO, the more data loss the application can tolerate before it becomes a problem for the business.
RTO is a measure of recovery time. The smaller the RTO, the faster you have to work to get the application back online before the organisation starts to suffer significant losses. It’s crucial that the IT department involves the business users in setting both.
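To make the two metrics concrete, the sketch below checks one hypothetical incident against an application’s objectives. It assumes RPO is compared with the gap between the last good copy of the data and the failure, and RTO with the time taken to restore service. The function name, timestamps and targets are all illustrative assumptions.

```python
from datetime import datetime, timedelta


def check_objectives(last_good_copy, failure, service_restored, rpo, rto):
    """Compare one incident with the agreed RPO and RTO."""
    data_loss_window = failure - last_good_copy
    recovery_duration = service_restored - failure
    return {
        "data_loss_window": data_loss_window,
        "recovery_duration": recovery_duration,
        "rpo_met": data_loss_window <= rpo,
        "rto_met": recovery_duration <= rto,
    }


# Hypothetical incident: last backup at 02:00, failure at 09:30,
# service back at 11:00, with an RPO of 4 hours and an RTO of 2 hours.
result = check_objectives(
    last_good_copy=datetime(2024, 1, 15, 2, 0),
    failure=datetime(2024, 1, 15, 9, 30),
    service_restored=datetime(2024, 1, 15, 11, 0),
    rpo=timedelta(hours=4),
    rto=timedelta(hours=2),
)
print(result)
```

In this made-up case the recovery is fast enough for the RTO, but seven and a half hours of data are lost against a four-hour RPO, which would suggest more frequent backups or replication for that application.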
9. Recovery will take longer than you think
Understanding how long it will take to recover key business systems is essential. Can you restore data and rebuild application systems fast enough to satisfy business users? Do you have the bandwidth to recover data from a cloud service provider?
Understanding how long it takes to recover applications, and the effect of downtime on the business, may prompt you to make different technology choices.
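A back-of-the-envelope calculation can show whether bandwidth alone rules out a recovery target. The sketch below estimates the transfer time for a restore over a network link. The data volume, link speed and efficiency factor are hypothetical, and the estimate ignores the time needed to rebuild servers and applications, so the real figure will be higher.

```python
def restore_hours(data_gb, link_mbps, efficiency=0.8):
    """Estimate hours needed to pull data back over a network link.

    data_gb    -- data to restore, in decimal gigabytes
    link_mbps  -- nominal link speed in megabits per second
    efficiency -- assumed usable fraction of the nominal speed
    """
    megabits = data_gb * 8_000  # 1 GB = 8,000 megabits
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600


# Hypothetical restore: 2 TB from a cloud provider over a 100 Mbit/s link.
hours = restore_hours(2_000, 100)
print(f"Estimated transfer time: {hours:.1f} hours")
```

Under these assumptions the transfer alone takes more than two days, so an application with an RTO measured in hours would need a different approach, such as a faster link or a standby copy kept closer to production.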
10. Going home
The ability to fail back to production systems is every bit as important as the ability to fail over. Unless carefully planned, a backup data centre is unlikely to have the same capacity or performance as the production site.
Without a failback plan, you may perform a successful initial failover and then see losses mount as your business limps along for weeks operating from an inadequately provisioned backup site.
It’s no secret what successful high availability looks like: no application downtime and no application data loss. The maturing of HA products has brought the price within reach of both enterprises and SMBs. This, combined with lowered infrastructure costs – broadband, server virtualisation, multiple service providers – and dramatically improved usability, is making HA affordable for organisations of all sizes.
To implement HA successfully, talk to business owners to understand their priorities. It’s key that you understand which systems are the most important. Understanding the needs of the business will let you set priorities that dictate the disaster recovery technology choices that best fit your business continuity needs.
Sourced from Christophe Bertrand, Arcserve