How Toy Story 2 was nearly cancelled by a data backup disaster
Are you prepared for an 'oops' moment like the one that hit Pixar in 1998?
I bet you didn’t know that a single command line deleted most of the production files for Pixar’s Toy Story 2 from a studio server back in 1998. The studio made daily backups of its production files, but it didn’t realise the backup system had failed until it tried to restore the lost files.
The incident happened at a time when backup solutions were immensely complex and difficult, if not impossible, to 'test.' According to the story, the studio had seemingly been prepared for this unimaginable situation; however, it ultimately had to rely on blind luck to recover the lost files. An employee just happened to have a copy of the movie that she had taken home the week before, and that copy became the de facto backup.
Today, cloud-based disaster recovery solutions are quickly gaining enterprise-wide adoption as organisations seek to reduce hardware costs and improve flexibility in responding to unplanned downtime events.
Disaster Recovery as a Service (DRaaS) not only allows organisations to recover data quickly and easily but, more importantly, enables them to resume operations seamlessly during a disaster. Advances in cloud-based DR solutions let IT administrators set the level of protection server by server. Mission-critical servers can be set to recover instantly, while servers holding less critical data might be given a longer Recovery Time Objective (RTO).
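To make per-server protection levels concrete, here is a minimal Python sketch that checks the recovery times measured in a DR test against each server's RTO tier. The tier names, RTO targets and server names are all hypothetical; real DRaaS products expose their own settings, and the targets themselves should come from your own assessment of each system.

```python
from dataclasses import dataclass

# Hypothetical tiers and RTO targets in minutes (assumptions, not vendor defaults).
RTO_TARGETS_MIN = {
    "mission_critical": 15,
    "important": 240,
    "low_priority": 24 * 60,
}


@dataclass
class Server:
    name: str
    tier: str


def rto_violations(servers, measured_recovery_min):
    """Return names of servers whose measured recovery time exceeded their tier's RTO.

    A server with no measurement counts as a violation: an untested
    recovery is an unknown recovery.
    """
    failures = []
    for server in servers:
        target = RTO_TARGETS_MIN[server.tier]
        measured = measured_recovery_min.get(server.name)
        if measured is None or measured > target:
            failures.append(server.name)
    return failures


if __name__ == "__main__":
    fleet = [
        Server("orders-db", "mission_critical"),
        Server("intranet-wiki", "low_priority"),
        Server("reporting", "important"),
    ]
    # Minutes each server took to recover in the last DR test (made-up numbers).
    measured = {"orders-db": 12, "intranet-wiki": 90}
    print(rto_violations(fleet, measured))  # prints: ['reporting']
```

Treating an untested server as a failure is a deliberate choice: it keeps the report honest about systems nobody has tried to recover yet.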
Despite the benefits of cloud-based DR over traditional solutions, a DR program can only succeed if it is tested consistently. Regularly scheduled testing must cover communications, data recovery, and application recovery. DR testing in these areas is also needed to support planned maintenance and to train staff in disaster recovery procedures.
Traditionally, DR tests have been complex, disruptive and, consequently, unpopular. Too often, testing focuses on backup rather than recovery. While this approach ensures you have a copy, it does little to ensure the data, server, or application can be reinstated easily. To complicate matters further, many of the systems used in testing are needed to run day-to-day operations, and taking them down for a test is unacceptable.
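The gap between backing up and recovering is easy to demonstrate. The sketch below, using only Python's standard library, backs up a directory to a tar archive and then tests recovery by actually restoring the archive to a scratch directory and comparing SHA-256 digests file by file. It is a minimal illustration that assumes backups are plain file archives; the function names are hypothetical, and a real DR test would restore servers and applications, not just files.

```python
import hashlib
import tarfile
import tempfile
from pathlib import Path


def tree_digests(root: Path) -> dict:
    """Map each file's path (relative to root, POSIX style) to its SHA-256 digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def backup(source: Path, archive: Path) -> None:
    """Write a gzip-compressed tar archive of every file under `source`."""
    with tarfile.open(str(archive), "w:gz") as tar:
        for path in sorted(source.rglob("*")):
            if path.is_file():
                tar.add(str(path), arcname=path.relative_to(source).as_posix())


def verify_restore(source: Path, archive: Path) -> list:
    """Restore `archive` into a scratch directory and list files that differ from `source`.

    Missing, extra and modified files all count as differences; an empty
    list means every file came back byte-for-byte identical.
    """
    with tempfile.TemporaryDirectory() as scratch:
        with tarfile.open(str(archive), "r:gz") as tar:
            # Use the safer "data" extraction filter where this Python version has it.
            if hasattr(tarfile, "data_filter"):
                tar.extractall(scratch, filter="data")
            else:
                tar.extractall(scratch)
        original = tree_digests(source)
        restored = tree_digests(Path(scratch))
    return sorted(
        name
        for name in original.keys() | restored.keys()
        if original.get(name) != restored.get(name)
    )
```

A restore check like this catches a backup job that silently produces incomplete or empty archives, which a check that only confirms "a backup file exists" would miss.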
A hybrid-cloud approach to DR has changed the testing landscape for the better, combining public cloud and SaaS automation software to make continuity planning easier. Companies gain data backup, server failover and a secondary data centre at a different site for regional disaster recovery.
Here are four suggestions to make your DRaaS testing more efficient and productive.
Plan ahead and plan often
The problem with disasters is that they are unplanned and unexpected. If you’re not testing your DR frequently, you might find yourself hung out to dry when lightning strikes. With DRaaS, frequent testing is practical because it doesn’t carry the physical infrastructure and configuration synchronisation overhead associated with traditional disaster recovery.
With an automated DRaaS solution, you don’t need to schedule IT personnel to manually check system configurations. Recent innovations make it easy to create an on-demand recovery node that you can test quickly. Unlike a typical backup-only cloud storage solution, hybrid DRaaS solutions can maintain up-to-date, ready-to-run virtual machine clones of your critical systems that can run on an appliance or in the cloud.
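Keeping standby clones current and tests frequent is something you can monitor with a simple check. This Python sketch assumes a hypothetical inventory recording, for each critical system, when its standby clone was last refreshed and when it was last DR-tested, and flags anything outside two illustrative thresholds (one hour and seven days). The thresholds are assumptions; choose values that match your own requirements and test schedule.

```python
from datetime import datetime, timedelta, timezone

# Illustrative thresholds (assumptions, not vendor defaults).
MAX_CLONE_AGE = timedelta(hours=1)      # how stale a standby clone may get
MAX_TEST_INTERVAL = timedelta(days=7)   # how long may pass between DR tests


def readiness_report(systems, now):
    """Return (name, problem) pairs for systems that are not ready for recovery.

    `systems` maps a system name to a dict with "last_clone" and "last_test"
    timezone-aware datetimes, or None if that has never happened.
    """
    problems = []
    for name, info in sorted(systems.items()):
        last_clone = info.get("last_clone")
        last_test = info.get("last_test")
        if last_clone is None or now - last_clone > MAX_CLONE_AGE:
            problems.append((name, "standby clone is stale"))
        if last_test is None or now - last_test > MAX_TEST_INTERVAL:
            problems.append((name, "DR test overdue"))
    return problems


# Example with a hypothetical inventory:
now = datetime(2024, 1, 8, 12, 0, tzinfo=timezone.utc)
inventory = {
    "erp": {"last_clone": now - timedelta(minutes=20), "last_test": now - timedelta(days=2)},
    "mail": {"last_clone": now - timedelta(hours=3), "last_test": None},
}
print(readiness_report(inventory, now))
# prints: [('mail', 'standby clone is stale'), ('mail', 'DR test overdue')]
```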
Test your DRaaS in a sandbox
With DRaaS solutions, standby computing capacity is available to recover applications in the event of a disaster. That capability can be tested easily without affecting your production servers or disrupting the daily business routine. A sandbox copy is created in the cloud and is accessible only to the system administrator. These copies are created on demand, paid for only while in use and deleted once the test is complete.
This approach makes testing simple and cost-effective without disrupting business operations. Assuming you have the right DRaaS provider, you can test DR and applications every day without missing a beat.
Test cases can be run against the recovery nodes in as little as 15 minutes, depending on the application, often at no extra cost. The recovered applications and services are then immediately available for other uses, helping businesses adopt cloud infrastructure effectively or speed time to production for new applications and initiatives.
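The sandbox lifecycle (create on demand, pay while in use, delete when done) maps naturally onto a Python context manager, which guarantees teardown even when a test fails. In this sketch, `FakeCloud` is a hypothetical in-memory stand-in for a provider's API that only tracks active sandboxes and billed seconds; real DRaaS providers have their own APIs, and access control is not modelled here.

```python
import itertools
import time
from contextlib import contextmanager


class FakeCloud:
    """Hypothetical in-memory stand-in for a DRaaS provider's API."""

    def __init__(self):
        self.active = {}            # sandbox id -> start time (monotonic seconds)
        self.billed_seconds = 0.0   # total time sandboxes have existed
        self._ids = itertools.count(1)

    def create_sandbox(self, source_vm):
        sandbox_id = f"sandbox-{source_vm}-{next(self._ids)}"
        self.active[sandbox_id] = time.monotonic()
        return sandbox_id

    def delete_sandbox(self, sandbox_id):
        started = self.active.pop(sandbox_id)
        self.billed_seconds += time.monotonic() - started


@contextmanager
def dr_sandbox(cloud, source_vm):
    """Create an isolated sandbox copy of `source_vm` and always delete it afterwards."""
    sandbox_id = cloud.create_sandbox(source_vm)
    try:
        yield sandbox_id
    finally:
        # Runs even if a test inside the `with` block raises,
        # so no sandbox is left running (and accruing charges).
        cloud.delete_sandbox(sandbox_id)


cloud = FakeCloud()
with dr_sandbox(cloud, "orders-db") as sandbox:
    print("running DR test cases in", sandbox)  # prints: running DR test cases in sandbox-orders-db-1
print(cloud.active)  # prints: {}
```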
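The simplest test case against a recovery node is checking that a recovered service comes up and answers. This Python sketch polls a TCP port until a connection succeeds or a deadline passes and reports how long that took, a figure you could then compare with the server's RTO. The host name and port in the usage comment are hypothetical, and a fuller test would go on to exercise the application itself.

```python
import socket
import time


def port_responds(host, port, deadline_s=30.0, retry_every_s=1.0):
    """Poll until a TCP connection to host:port succeeds or the deadline passes.

    Returns the seconds it took to connect, or None if the service never came up.
    """
    start = time.monotonic()
    while True:
        try:
            with socket.create_connection((host, port), timeout=retry_every_s):
                return time.monotonic() - start
        except OSError:
            # Refused, unreachable or timed out: retry until the deadline.
            if time.monotonic() - start >= deadline_s:
                return None
            time.sleep(retry_every_s)


# Usage (hypothetical recovery node, 15-minute budget):
# elapsed = port_responds("recovery-node.example.internal", 443, deadline_s=15 * 60)
```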
Take advantage of a sliding scale
There are financial benefits to cloud-based testing. Service providers regularly offer sliding-scale pricing for DR testing. Putting your DR solution in the cloud also means you aren’t paying for redundant in-house infrastructure that sits unused most of the time.
The cloud gives small and medium-sized businesses (SMBs) the same capabilities as larger organisations. With a level playing field, SMBs have greater access to DR solutions and the ability to test frequently.
Entice regular employee participation
In traditional DR settings, employees may consider testing time-consuming and a distraction from already busy schedules. However, according to a survey by market research company Enterprise Strategy Group, respondents using cloud-based DR services were four times more likely to perform weekly DR tests than those self-hosting their business continuity and disaster recovery (BC/DR) solution.
People learn by repetition, so, just like fire drills, DR drills must be created and practised; they are a critical part of any DR plan. Companies that fail to conduct regular drills shouldn’t be shocked when their employees panic during a disaster.
As you consider these steps, you might find yourself among fellow sceptics who think drills are unnecessary and that the chances of disaster striking are still relatively slim. But according to a May 2014 study by the Aberdeen Group, the average US SMB experiences 1.7 unplanned downtime events a year, with an average of 6.7 hours of downtime per event. The average cost of downtime is estimated at $8,600 per hour, or about $100,000 a year.
Unplanned downtime events, whether caused by a natural disaster, human error, or hardware failure, can have immediate and long-term negative impacts. Take steps to ensure your business can quickly and easily recover its IT infrastructure and data, and minimise the impact by being prepared rather than relying on luck.
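The Aberdeen figures hang together with a little arithmetic: 1.7 events a year at 6.7 hours each is about 11.4 hours of downtime, which at $8,600 per hour comes to just under $98,000, consistent with the roughly $100,000 a year quoted. The Python snippet below simply reproduces that calculation from the cited numbers.

```python
# Figures from the May 2014 Aberdeen Group study (US SMB averages).
events_per_year = 1.7
hours_per_event = 6.7
cost_per_hour = 8_600  # US dollars

downtime_hours = events_per_year * hours_per_event  # about 11.4 hours a year
annual_cost = downtime_hours * cost_per_hour        # just under $98,000 a year
print(f"{downtime_hours:.2f} hours, ${annual_cost:,.0f} per year")
# prints: 11.39 hours, $97,954 per year
```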
Sourced from Kemal Balioglu, Vice President of Product, Quorum