There was a time when business continuity meant having a tape back-up to recover lost data. But with many organisations now looking for zero downtime and zero data loss, the requirements of business continuity have changed considerably.
From ensuring failover when applications crash to keeping at least one geographically separate duplicate of an entire infrastructure, business continuity now requires an extensive combination of technology and services. The traditional approach to application continuity is failover clustering: if a server goes down, another server running the same application, with either a duplicate of the data or access to shared storage, takes over from the first. Hot-swappable blade servers, which can be removed from their racks and replaced almost immediately, make it easy to have a replacement for the failed server ready within minutes.
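To illustrate the failover idea, here is a minimal active/passive sketch in Python. It assumes a simple heartbeat scheme in which each node periodically records that it is alive, and a standby is promoted once the active node has been silent for longer than a timeout. The `Node` and `FailoverCluster` classes and the five-second timeout are hypothetical, not any vendor's clustering product, and shared storage is simply assumed.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical cluster member able to run the application."""
    name: str
    last_heartbeat: float = 0.0

    def beat(self, now: float) -> None:
        # A healthy node periodically records that it is still alive.
        self.last_heartbeat = now


@dataclass
class FailoverCluster:
    """Active/passive cluster: one node serves, the others stand by."""
    nodes: list
    timeout: float = 5.0  # assumed: seconds of silence before failover
    active_index: int = 0
    log: list = field(default_factory=list)

    @property
    def active(self) -> Node:
        return self.nodes[self.active_index]

    def is_alive(self, node: Node, now: float) -> bool:
        return now - node.last_heartbeat <= self.timeout

    def check(self, now: float) -> Node:
        """Return the node that should be serving at time `now`."""
        if self.is_alive(self.active, now):
            return self.active
        failed = self.active.name
        # Promote the first standby that is still sending heartbeats. It is
        # assumed to see the same data via shared storage or a replica.
        for offset in range(1, len(self.nodes)):
            idx = (self.active_index + offset) % len(self.nodes)
            if self.is_alive(self.nodes[idx], now):
                self.active_index = idx
                self.log.append(f"{failed} failed; {self.active.name} took over")
                return self.active
        self.log.append(f"{failed} failed; no healthy standby available")
        return self.active


primary, standby = Node("primary"), Node("standby")
cluster = FailoverCluster([primary, standby], timeout=5.0)
primary.beat(0.0)
standby.beat(0.0)
print(cluster.check(3.0).name)  # primary: last heartbeat was 3 s ago
standby.beat(8.0)               # primary has gone quiet, standby has not
print(cluster.check(9.0).name)  # standby
```

Real clustering software adds quorum and fencing so that two nodes never both believe they are active; the sketch leaves that out to keep the takeover logic visible.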
Failover clustering has long been available in high-end Unix operating systems and OpenVMS, and is now also supported by Windows. In many cases, however, an application needs to be ‘cluster aware’ to take advantage of failover technology.
But a new range of point solutions is emerging, designed to ensure failover even among applications that might not be cluster aware. Most of these point solutions are designed for Windows environments, in part because of the relative newness of clustering to the Windows server world, but Forrester Research anticipates they will soon spread to Unix and other operating systems.
Grid and utility computing, both still immature, should improve on failover clustering by making the individual server almost irrelevant. According to Michael Hjalsted, marketing manager for servers at Unisys, grid computing will play a major role in business continuity, albeit not for at least two more years. By spreading the processing for an application across a number of servers, grid computing makes the failure of an application instance or server no more serious than the failure of a single processor in a multi-processor server. Similarly, utility computing combines the processing power and storage capacity of discrete servers into a single resource, with servers and storage redeployable as business needs fluctuate.
However, grid computing is severely hampered by a lack of solid standards and a shortage of ‘parallelised’ applications, that is, applications that can split their processing so it executes in parallel on different servers. Heterogeneous utility computing is unlikely to be commercially viable before 2010.
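Why a parallelised application tolerates server failure can be sketched in a few lines of Python. The job is split into chunks that are handed out round-robin; if a node fails, it is dropped and its chunk is requeued for another node, so losing one server costs only the work it was doing. `GridNode`, `ServerDown` and `grid_sum` are hypothetical names, and summing numbers stands in for real work; no actual grid middleware is modelled.

```python
from collections import deque


class ServerDown(Exception):
    """Raised when a (hypothetical) grid node has failed."""


class GridNode:
    def __init__(self, name: str, healthy: bool = True):
        self.name = name
        self.healthy = healthy

    def process(self, chunk: list) -> int:
        # Stand-in for real work: sum one slice of the data.
        if not self.healthy:
            raise ServerDown(self.name)
        return sum(chunk)


def grid_sum(data: list, nodes: list, chunk_size: int = 4) -> int:
    """Split the job into chunks, hand them out round-robin and requeue any
    chunk whose node fails, so a dead node costs only its own chunk."""
    queue = deque(data[i:i + chunk_size] for i in range(0, len(data), chunk_size))
    alive = deque(nodes)
    total = 0
    while queue:
        if not alive:
            raise RuntimeError("every node has failed")
        chunk = queue.popleft()
        node = alive[0]
        alive.rotate(-1)  # the next chunk goes to the next node
        try:
            total += node.process(chunk)
        except ServerDown:
            alive.remove(node)   # drop the failed node...
            queue.append(chunk)  # ...and give its chunk to another one
    return total


nodes = [GridNode("a"), GridNode("b", healthy=False), GridNode("c")]
print(grid_sum(list(range(1, 21)), nodes))  # 210, despite node "b" failing
```

The catch the article describes is visible here: the approach only works because the job divides into independent chunks. An application that cannot be split this way gains nothing from the grid.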
Other vendors, notably IBM, are exploring the development of so-called autonomic or ‘self-healing’ servers. Hot-swappable fans, hard drives and other components are already staples of most high-end servers from Sun, HP and others. Further out on the autonomic computing roadmap are chips that can predict impending failures or errors, warn monitoring systems and then switch themselves off.
Intel’s Itanium 2 processor is capable of running more than one operating system at the same time, a technique called partitioning, so that if one application or operating system fails, the other operating environment carries on. Software-based approaches to this can be seen in products such as VMware, while hardware-based partitioning reaches its zenith in IBM’s zSeries mainframes, which are capable of running up to 10,000 Linux images simultaneously.
Particularly since 9/11 and the Asian tsunami, and in light of compliance legislation, the focus of business continuity has been disaster recovery. Keeping a geographically separate duplicate of the main systems has been expensive, not just in raw hardware, staffing and facilities costs but also because of the high cost of bandwidth: a gigabit circuit halfway across the country can cost £50,000 to £60,000 per year, while a 100Mb link might cost £7,000. Fibre connectivity, the basis for many current high-bandwidth networks, remains very expensive and is currently a viable option only for the top 200 UK companies. But as network technologies such as IP become more widespread, argues Craig Parker, general manager for storage at BT Retail, synchronous data replication will become affordable for small and mid-sized enterprises (SMEs) as well. BT is currently looking at introducing IP-based storage products to the SME market, and at providing hosted data centres for organisations that do not find it cost-effective or practical to manage data storage themselves.
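The bandwidth prices quoted above can be put on a common footing with a little arithmetic. The sketch below uses only the quoted figures; the "GB per day" column is an idealised upper bound that assumes the link runs at full capacity around the clock with no protocol overhead, which real replication traffic never achieves.

```python
# Figures quoted above: annual cost in GBP and link capacity in Mb/s.
LINKS = {
    "1 Gb/s circuit (low quote)": (50_000, 1_000),
    "1 Gb/s circuit (high quote)": (60_000, 1_000),
    "100 Mb/s link": (7_000, 100),
}

SECONDS_PER_DAY = 86_400


def cost_per_mbps(annual_cost_gbp: float, capacity_mbps: float) -> float:
    """Annual cost of each Mb/s of capacity."""
    return annual_cost_gbp / capacity_mbps


def daily_capacity_gb(capacity_mbps: float) -> float:
    """Decimal gigabytes a link could carry in a day if run flat out
    (an idealised upper bound: no overhead, 100% utilisation)."""
    return capacity_mbps * SECONDS_PER_DAY / 8 / 1_000


for name, (cost, mbps) in LINKS.items():
    print(f"{name}: £{cost_per_mbps(cost, mbps):.0f} per Mb/s per year, "
          f"up to {daily_capacity_gb(mbps):,.0f} GB/day")
```

On these figures the gigabit circuit is actually cheaper per Mb/s (£50 to £60 against £70), but it costs roughly seven to nine times as much in absolute terms, which is what puts it out of reach for smaller organisations.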
Advances in SAN technology are also making data replication more viable. Wavelength division multiplexing, used by companies such as Ciena, allows multiple SANs in the same city to be connected over a single network using one optical fibre pair, and will increasingly be used by large organisations that want to build their own metropolitan area networks, according to research from RHK.
Over links of 25 miles or more, synchronous replication becomes impractical: each write must be confirmed by the remote site before it completes, and even at the speed of light the round trip cannot keep pace with the speed at which data is written to local disk. Yet two data centres within this distance of each other could both be destroyed, if they are on the same flood plain, for instance, and organisations with data centres in certain countries often want replicated data on another continent altogether.
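The distance penalty can be estimated from propagation delay alone. The sketch assumes light travels through optical fibre at roughly 200,000 km per second (about two-thirds of its speed in a vacuum) and that each synchronous write needs one network round trip; switching, protocol handshakes and the remote disk itself would all add to these figures, so they are lower bounds. The second function gives the ceiling for an application that issues dependent writes one at a time.

```python
FIBRE_KM_PER_S = 200_000  # assumed: light in glass travels at roughly 2/3 of c
KM_PER_MILE = 1.609344


def sync_write_penalty_ms(distance_miles: float, round_trips: int = 1) -> float:
    """Extra propagation delay each synchronous write pays, in milliseconds.
    Ignores switching, protocol overhead and the remote disk itself."""
    km = distance_miles * KM_PER_MILE
    return round_trips * 2 * km / FIBRE_KM_PER_S * 1000


def max_serial_writes_per_s(distance_miles: float) -> float:
    """Ceiling for one-at-a-time dependent writes, from propagation alone."""
    return 1000 / sync_write_penalty_ms(distance_miles)


for miles in (1, 25, 250, 3000):
    print(f"{miles:>5} miles: +{sync_write_penalty_ms(miles):.2f} ms per write, "
          f"at most {max_serial_writes_per_s(miles):,.0f} serial writes/s")
```

At 25 miles the penalty is only about 0.4 ms per write, but it grows linearly with distance; at intercontinental range every write would wait tens of milliseconds, which is why replication to another continent has to be asynchronous.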
Asynchronous replication has become possible in the last two years through products such as EMC’s Symmetrix Remote Data Facility. Meanwhile HDS, the pioneer of asynchronous replication, has just begun to deploy products that use a disk journaling approach, which is able to treat asynchronously replicated storage identically to local storage: changes made to data are recorded in journal files and sent periodically to the alternative site at the request of the storage system there. Analyst firm Gartner says that while this approach is superior to other vendors’, interoperability may be an issue.
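A minimal sketch of the journal-and-pull pattern follows. It is a generic illustration under simple assumptions (block-level writes held in memory, sequence-numbered journal entries), not HDS’s or EMC’s implementation, and `PrimaryVolume` and `RemoteVolume` are hypothetical names. Writes complete locally at once; the remote site asks for everything newer than the last sequence number it applied, replays the entries in order, and lets the primary discard what has been confirmed.

```python
from dataclasses import dataclass, field


@dataclass
class PrimaryVolume:
    """Hypothetical primary site: applies writes locally and journals them."""
    blocks: dict = field(default_factory=dict)
    journal: list = field(default_factory=list)  # (sequence, block, data)
    next_seq: int = 1

    def write(self, block: int, data: bytes) -> None:
        # The write completes locally straight away; replication happens
        # later, so the application never waits for the remote site.
        self.blocks[block] = data
        self.journal.append((self.next_seq, block, data))
        self.next_seq += 1

    def read_journal(self, after_seq: int) -> list:
        # Every entry newer than what the remote site already holds.
        return [entry for entry in self.journal if entry[0] > after_seq]

    def trim(self, upto_seq: int) -> None:
        # Discard entries the remote has applied, bounding journal size.
        self.journal = [entry for entry in self.journal if entry[0] > upto_seq]


@dataclass
class RemoteVolume:
    """Hypothetical remote site that periodically pulls changes."""
    blocks: dict = field(default_factory=dict)
    applied_seq: int = 0

    def pull(self, primary: PrimaryVolume) -> None:
        # Replay entries in sequence order, so the copy always matches the
        # primary as it stood at some earlier point in time.
        for seq, block, data in primary.read_journal(self.applied_seq):
            self.blocks[block] = data
            self.applied_seq = seq
        primary.trim(self.applied_seq)


primary, remote = PrimaryVolume(), RemoteVolume()
primary.write(0, b"ledger v1")
primary.write(7, b"index v1")
remote.pull(primary)            # remote now matches the primary
primary.write(0, b"ledger v2")  # remote lags until its next pull
print(remote.blocks[0])         # b'ledger v1'
```

Replaying strictly in sequence order is the point of the journal: if the primary site is lost, the remote copy may be slightly out of date but is never a mixture of old and new writes.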
Over time these technological incompatibilities will inevitably disappear as vendors match each other’s efforts. Operating system vendors will also start to include replication facilities with their systems, effectively making replication a commodity, and more application vendors will build failover clustering or similar facilities into their own products, mainly by licensing others’ technology. Vendors’ products will then be differentiated by their management and monitoring capabilities, through centralised administration consoles and autonomic features. Business continuity can only become more important, and easier, as time and technology progress.