Protecting the data

Modern business continuity thinking rightly stresses that organisations need to protect their business processes, not just their data. But even though every second counts in the online world, processes are mostly transient, and can usually be picked up again after a continuity problem. When data is lost, it can be expensive, embarrassing and even devastating.

Ask the CEO of Co-Operators Life Insurance in Canada. In January 2003, the company’s IT partner, IBM, admitted that it had mislaid the financial details of as many as 180,000 of the insurer’s clients. Eventually IBM found the data – but only after the CEO of Co-Operators Life, fearful of large losses, gave a humiliating press conference to apologise.

Until the mid-1990s, the mainstay of data back up was tape. Even today, many smaller companies use tape as a primary back-up medium – because it is cheap and reliable. If there is a disk failure, however, IT staff have to search for the correct back-up tape, then spool through the tape to locate the wanted data. This takes time.

Glossary

Imaging: Making a copy of a disk and its contents. Imaging is a batch process rather than a real-time process and involves copying the whole contents of the drive. Modern systems can restore from an image in just a few minutes.

Mirroring: Copying the contents of one or more disks to other drives to ensure they have the same files and structure. Mirroring can be one-way or two-way, so that if one server fails and a second takes over, the first can be updated with the contents of the second when it is restored.

Replication: The process of updating a second disk so that it is identical to the first disk. The level at which this is done depends on the system: some will copy changed files while others, particularly those aimed at database or Microsoft Exchange servers, will copy changes to parts of files as well.

Shadowing: Data is recorded on multiple devices and remains accessible when one device is unavailable: read and write operations to the volume continue transparently on the remaining device or devices, so a copy of the data is always available without interruption. This continuous availability prevents storage sub-system components from becoming a single point of failure that could interrupt system or application operations. Access to shadowed devices is achieved via a virtual disk mechanism.

Snapshot: A read-only view of the file system. A snapshot takes a ‘picture’ of what the system looks like at a particular instant; data changed after that point is stored separately, so the snapshot itself takes up very little disk space. Snapshots can help to minimise costs because they can be sent to remote sites for disaster recovery, reducing the need for redundant storage. Snapshotting also gives an organisation versioning on its back ups, so that if damage has been caused by a virus or disk corruption, it can recover data from before the infection or corruption. (A toy sketch of the copy-on-write idea behind snapshots follows this glossary.)

Virtualisation: Creating the illusion that two or more devices are only one device. Virtualisation simplifies management of storage devices and makes it easier for client devices to start using a different server after the main server becomes inaccessible.
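
As promised above, a toy Python sketch of the copy-on-write idea behind snapshots; the class and its block layout are illustrative inventions, not any vendor’s on-disk format. Taking a snapshot copies only a small map of block references, which is why snapshots consume so little space, and restoring one rolls the live view back to the captured instant.

```python
class CopyOnWriteStore:
    """Toy copy-on-write volume: a snapshot stores only a map of
    references to the current blocks, so it is almost free to take."""

    def __init__(self):
        self.blocks = {}     # live view: block id -> data
        self.snapshots = {}  # snapshot name -> frozen block map

    def write(self, block_id, data):
        self.blocks[block_id] = data  # later writes never touch snapshots

    def take_snapshot(self, name):
        # Shallow copy of the block map, not of the data itself:
        # this is why a snapshot takes up very little disk space.
        self.snapshots[name] = dict(self.blocks)

    def restore(self, name):
        # Roll the live view back to the captured instant, e.g. to
        # recover data from before a virus infection or corruption.
        self.blocks = dict(self.snapshots[name])

store = CopyOnWriteStore()
store.write("b1", "ledger v1")
store.take_snapshot("before-infection")
store.write("b1", "ledger corrupted")
store.restore("before-infection")
assert store.blocks["b1"] == "ledger v1"
```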

And while modern tape drives may be able to back up and restore as much as two terabytes (2,000GB) of data in less than an hour, according to Graham Hunt, product marketing manager, EMEA, at storage specialist Quantum, this is far slower than disk-to-disk copying.

Tape is prone to other problems, too. “If one file is missing, it affects the whole recovery,” says Tony Reid, director of enterprise systems for HDS EMEA.

For these reasons tape now takes second place to disk-based back-up systems, and is reserved for archiving and offline restores only. Most organisations choose direct copying from one disk to another local drive as their primary business continuity back up, providing the maximum possible speed for both back up and restore. “It’s very fast,” says Tarek Maliti, technical director of hosting company TDM. “You can choose which files you want, point to the version and it’s restored in seconds.”

A disk-based back up can also be organised at several levels, says Laurence James, disk business manager at StorageTek: at application or operating system level, with the application or operating system writing to both drives simultaneously; at an intermediate level, where a software or hardware agent monitors a server for disk writes and replicates them on the back-up drive; or at disk controller level, where any instruction to write to disk is duplicated by the hardware so the write is repeated on the second disk. In each case, batch back-ups, which go out of date quickly, are effectively replaced.
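
To make the first of those levels concrete, here is a minimal sketch, under the assumption of two locally mounted drives with hypothetical mount points: the application performs the mirroring itself by forcing every record onto both disks before reporting success. The agent and controller approaches do the same duplication further down the stack.

```python
import os

def dual_write(record: bytes, primary_path: str, mirror_path: str) -> None:
    """Application-level mirroring: append the record to files on two
    drives and flush each one to disk before returning."""
    for path in (primary_path, mirror_path):
        with open(path, "ab") as f:
            f.write(record)
            f.flush()
            os.fsync(f.fileno())  # don't trust the OS cache with the only copy

# Hypothetical mount points for two physical drives:
# dual_write(b"order #1234\n", "/mnt/disk0/journal.log", "/mnt/disk1/journal.log")
```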

The software-based approach is used by a number of software companies, including NeverFail, a Microsoft specialist. NeverFail sells software agents that monitor both SQL Server and Exchange Server and replicate their contents on a second server. Because the software agent sits at a higher level, it may not be as fast as the pure hardware approach, but it can integrate into systems management tools. It can also read hardware information from the operating system, giving it a more complete view of the state of the disk and allowing it to anticipate a hardware failure and initiate a switchover to a second server.
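
In outline, such an agent might look like the sketch below; this is a generic illustration, not NeverFail’s code, and read_drive_health() and promote_secondary() are hypothetical stand-ins for whatever the platform actually exposes.

```python
import random
import time

FAILURE_THRESHOLD = 0.8  # illustrative risk score above which we fail over

def read_drive_health() -> float:
    # Hypothetical: in a real agent this would be derived from hardware
    # information reported by the operating system, such as error counters.
    return random.random()

def promote_secondary() -> None:
    # Hypothetical: redirect clients to the replica server.
    print("Switching over to the secondary server")

def monitor(poll_seconds: float = 1.0) -> None:
    """Poll the drive's health and switch over *before* it fails."""
    while True:
        if read_drive_health() >= FAILURE_THRESHOLD:
            promote_secondary()
            break
        time.sleep(poll_seconds)

monitor()
```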

“There is a perception that a tape back-up or traditional disaster recovery will help in the event of disaster,” says Steve Stobo, European sales and marketing director at NeverFail. “But when Exchange Server goes down, it usually goes down quite messily – usually for a day or so.” Stobo says NeverFail for Exchange Server can spot problems and switch the servers before the data is corrupted.

Remote management

These strategies are all effective for dealing with the common problem of a local drive failure. But what if the business interruption stems from something more than a single sub-system problem – such as a flood or fire?

Only a remote back-up system can deal with this. In the event of a breakdown of the primary centre, the second centre can pick up the slack with a complete or near-complete copy of the original data.

“It’s not cheap, because you need to double everything,” says Miles Cunningham of SunGard Availability Services. “But you can get a gigabit circuit halfway across the UK for as little as GBP 50-60K a year.” An alternative, at the lower end, is to use a back-up service provider, such as Imperidata, which is able to ‘trickle’ a disk drive update to a remotely hosted copy over a standard broadband Internet connection without the user noticing.
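
Some rough arithmetic shows why trickling only the changed data is plausible over a domestic line; the uplink figure is an assumption for illustration, not Imperidata’s specification.

```python
# Assumed usable upstream bandwidth of a standard broadband connection.
uplink_mbit_s = 1.0

mb_per_day = uplink_mbit_s / 8 * 86_400  # megabytes movable in 24 hours
print(f"~{mb_per_day / 1024:.1f} GB of changed data per day")  # ~10.5 GB

# A full initial copy is another matter, which is why only deltas trickle:
full_copy_days = 100 * 1024 / (uplink_mbit_s / 8) / 86_400
print(f"A 100GB first copy would take ~{full_copy_days:.1f} days")  # ~9.5 days
```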

Using remote replication, disk drives can automatically be brought online. And the second servers do not need to be reconfigured, because the network address of the primary server can be virtualised using Cisco’s hot standby router protocol (HSRP) or the open-standard virtual router redundancy protocol (VRRP). A router holds the apparent network address of the primary server and forwards traffic to the real server as normal; in the event of a failure, it redirects traffic to the second server instead.
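
The effect can be illustrated with a toy health check in Python; this shows the idea of a virtual address, not the HSRP or VRRP protocols themselves, and the addresses are hypothetical.

```python
import socket

# Clients aim at one stable 'virtual' address; whatever holds it forwards
# traffic to whichever real server is currently answering.
PRIMARY = ("10.0.0.10", 5000)    # hypothetical primary server
SECONDARY = ("10.0.0.11", 5000)  # hypothetical standby server

def alive(addr, timeout: float = 1.0) -> bool:
    """True if a TCP connection to addr succeeds within the timeout."""
    try:
        with socket.create_connection(addr, timeout=timeout):
            return True
    except OSError:
        return False

def pick_server():
    # Forward to the primary as normal; redirect only on failure.
    return PRIMARY if alive(PRIMARY) else SECONDARY

print("Forwarding traffic to", pick_server())
```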

Virtualisation technology, whether of the network or of storage, has provided users with much more resilient systems all round. In the event of storage failure, for example, traffic from the still-functioning servers is simply redirected to the back-up network storage without any interruption in service.

When SANs were first introduced – during the 1990s – they solved some continuity problems but created others. Remote backing up of SANs was initially difficult: many SANs are based on fibre channel networking rather than standard Ethernet-based networking; they also often require expensive and proprietary virtualisation software, even for local back ups, to create easily manageable back-up processes.

This, however, has become easier. The advent of the iSCSI protocol, which forgoes the fibre channel of traditional SANs in favour of Gigabit Ethernet, has made it as easy to push iSCSI-based SAN data out over a leased line as it is to push out data from attached storage. “You’re better off deploying a SAN if you want to do remote back ups,” says Stephen Owens, EMEA product manager of Adaptec, a storage systems supplier. “It’s easier, you get the benefits of separate storage – you don’t impact the LAN – and you can deploy snapshotting, remote mirroring and storage virtualisation.”

Quantum’s Hunt concurs, adding that iSCSI will go where fibre channel can’t. “Fibre channel isn’t designed for long distances, but Ethernet has an unlimited range,” he says.

Remote back-up solutions face a barrier that local solutions do not face, however: the speed of light. Once servers are further apart than roughly 16 kilometres, there are noticeable lags in communication between primary and secondary sites that can affect performance, as messages confirming arrival and requests for more data are passed back and forth. With many companies now worried about large scale terrorist or natural disasters, this problem has moved from theoretical to practical.
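
The arithmetic behind that limit is easy to check: light in optical fibre travels at roughly 200,000km per second (about two-thirds of its speed in a vacuum), and a synchronous write is not acknowledged until a full round trip completes. A back-of-the-envelope sketch:

```python
SPEED_IN_FIBRE_KM_S = 200_000  # roughly two-thirds of c, in glass fibre

def round_trip_ms(distance_km: float) -> float:
    """Propagation delay alone for one write-and-acknowledge round trip."""
    return 2 * distance_km / SPEED_IN_FIBRE_KM_S * 1_000

for km in (16, 100, 500):
    print(f"{km:>4} km: {round_trip_ms(km):.2f} ms per synchronous write")
# The delay is paid on every acknowledged write, on top of switching and
# protocol overheads, which is what makes distance bite in practice.
```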

Various storage firms claim to have overcome the problem, including EDS and Hitachi Data Systems. “We offer an asynchronous solution that can overcome the lag,” says John Hickman, business continuity manager, EMEA, HDS. Rather than write to both primary and secondary server simultaneously, asynchronous back up allows for lags of a few seconds before data is successfully written to the secondary server. “When we send out the data, we include metadata that specifies the order in which disk operations should be performed. The secondary server won’t perform the writes until it knows it has received all the data.”

This also prevents disk corruption on the secondary server, since it will hold off applying writes from a transfer that is cut short by a breakdown in the primary server.
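
A minimal sketch of that ordering idea, with illustrative details rather than HDS’s implementation: each write arrives tagged with a sequence number, and the secondary applies only an unbroken prefix, so a stream cut off mid-transfer never leaves the replica half-applied.

```python
import heapq

class OrderedApplier:
    """Secondary-site sketch: buffer writes tagged with sequence numbers
    and apply them only once every earlier write has arrived."""

    def __init__(self):
        self.next_seq = 0  # next sequence number we may safely apply
        self.pending = []  # min-heap of (seq, block_id, data)
        self.disk = {}     # stand-in for the secondary volume

    def receive(self, seq: int, block_id: str, data: bytes) -> None:
        heapq.heappush(self.pending, (seq, block_id, data))
        # Apply the longest contiguous prefix; anything after a gap waits,
        # so an interrupted stream is never half-applied.
        while self.pending and self.pending[0][0] == self.next_seq:
            _, bid, d = heapq.heappop(self.pending)
            self.disk[bid] = d
            self.next_seq += 1

applier = OrderedApplier()
applier.receive(1, "b7", b"new")  # arrives out of order: buffered
applier.receive(0, "b3", b"old")  # gap filled: both writes now applied
assert applier.next_seq == 2
```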

“It is not uncommon for some organisations to synchronously mirror their data to a secondary site 10 kilometres away,” agrees Paul Hammond, director of solutions consulting at CNT. “Some then back up asynchronously to a third site, which could be up to 100 kilometres away.

“An added step that creates a very sophisticated ‘belt and braces’ business continuity system is to then run back ups of the data at the third site and keep these tapes in a fourth site.” Some organisations, at least, take their business continuity and back-up processes pretty seriously.
