Most large organisations have more than enough servers and resources to meet the average demands placed on them. There are failover servers in case servers break down. And there are additional servers available for when peak demands exceed server capabilities. The result? More servers than are actually needed, many sitting idle for much of the time. Even more frustrating for server administrators is when over-worked servers buckle under traffic loads and under-utilised servers are unable to pick up the task.
In 2002, a new vision has emerged – ‘utility computing'. The idea is that computers linked together over networks will pool their processing power in order to provide all the computing resources that an organisation needs, whenever it needs them. Servers will be just servers – not dedicated web servers, database servers or application servers – ready to be deployed as demand requires.
But what is actually needed to create a seamless computing pool, and to enable IT managers to take an end-to-end view of the network and its resources? Is it really possible to create a ‘worry-free' system that can allocate computing resources to different tasks according to need?
Large systems and software suppliers such as Hewlett-Packard (HP), IBM and Sun Microsystems say it is. But the approaches these suppliers are taking to utility computing differ widely. In broad terms, they can be split into two categories: ‘top-down' and ‘bottom-up'.
The bottom line
The bottom-up approach takes a large number of servers and provides server administrators with the ability to share loads among them, reallocating tasks as requirements change. This is the approach that companies such as HP and Dell are taking. An extra dimension to this approach is ‘grid computing' – not just allocating servers to particular tasks, but giving them parts of tasks to process before integrating the results.
Systems management tools such as HP's OpenView suite have long given users the ability to view, monitor and control network resources and infrastructure. However, the utility approach goes further. It gives both managers and administrators far greater control over individual components of the infrastructure. Indeed, the primary raison d'être of the utility computing concept is to be able to remotely re-task a server, even to the extent of changing its operating system, as requirements vary.
This is not a simple technological issue. Aside from meeting monitoring requirements, administrators need to be able to issue commands appropriate to heterogeneous hardware, ranging from relatively ‘dumb’ hardware such as firewalls to complex mainframes and high-end servers.
HP's Utility Data Center (UDC) is one of the few systems currently available that attempts to address these needs. Based on the company's OpenView systems management suite and a dedicated utility controller system, UDC adds a resource abstraction layer (RAL) to servers, which effectively functions as a driver.
"The bulk of [the RAL] is written in Java," explains Peter Hindle, senior technical consultant at HP. "That makes us hardware-agnostic. [The RAL] responds to a relatively small set of commands, and passes them down to the physical hardware."
Once the RAL receives simple instructions from the UDC, says Hindle, it consults a database of instructions to find those appropriate for the hardware to which the instruction is being sent: "The resource abstraction layer needs to be written for a particular system and is configuration-specific. The instructions to change a mount point on a two-way server, for example, may be different from those on an eight-way server. The RAL has to know that for a particular box, it needs a particular script for a particular command."
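The lookup Hindle describes can be pictured as a table keyed by hardware configuration and command. The sketch below is purely illustrative: the hardware identifiers, command names and script paths are hypothetical and are not taken from HP's UDC or its RAL.

```python
# Illustrative only: hardware names, commands and script paths are
# hypothetical, not HP's actual RAL commands or file layout.
SCRIPTS = {
    ("two-way-server", "change_mount_point"): "scripts/two_way/remount.sh",
    ("eight-way-server", "change_mount_point"): "scripts/eight_way/remount.sh",
    ("two-way-server", "reboot"): "scripts/common/reboot.sh",
    ("eight-way-server", "reboot"): "scripts/common/reboot.sh",
}


def resolve_script(hardware: str, command: str) -> str:
    """Return the script registered for this command on this hardware."""
    try:
        return SCRIPTS[(hardware, command)]
    except KeyError:
        # No entry means nobody has written a script for this box yet.
        raise LookupError(
            f"no script for command {command!r} on {hardware!r}"
        ) from None
```

The same generic command resolves to a different script on each configuration, which is the point Hindle makes about two-way and eight-way servers; hardware with no entry in the table simply cannot be controlled.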
Hardware can be integrated into a UDC environment provided three conditions are met: it must physically fit into HP's custom-built UDC racks (no simple matter for a mainframe, although HP executives claim that the company has already integrated high-end Sun Microsystems servers and others into UDCs); an OpenView client must exist for it; and HP's services division or the customer's own developers must be able to write a RAL for it.
‘Dumb' equipment, meanwhile, poses a different problem for the utility approach – security. Firewalls are not designed to be configurable across a network without a great deal of security and there are aspects that should not be controllable without direct physical access. While HP has written software for accessing a wide variety of such hardware, a direct connection via an RS-232 port is still needed for full redeployment capabilities, says Hindle. As a result, networking hardware that is distributed over a wide geographic area cannot come under the UDC umbrella.
Jim Cassell, an analyst at Gartner, agrees that the experience HP has of working with both hardware and software gives it a head start, perhaps as much as 18 months in some cases. "It's the only company with a tangible technology for heterogeneous networks," he claims.
Heterogeneity is a major challenge for other suppliers, as Lance Osborne, product marketing manager at Dell's Enterprise Systems Group, concedes. "To work with heterogeneous systems, you need wide experience with both hardware and software, which gives HP an advantage," he says.
Dell's president and COO Kevin Rollins has publicly said that the utility approach is "an old idea given a new name". However, that has not deterred Dell from making significant efforts in this area, says Osborne. The company's OpenManage software is capable of remotely deploying whole operating system images to Dell hardware, and will offer the same capabilities for other vendors' systems by the end of 2002, he says.
For the bottom-up utility computing approach to work, spare server capacity must be available for re-tasking, otherwise the unified system will simply divert resources from one processing-hungry task to another.
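The dependence on spare capacity can be made concrete with a small sketch. It assumes a hypothetical pool in which each server either has a role or is idle; the class, function and role names are invented for illustration and do not correspond to any vendor's tooling.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Server:
    name: str
    role: Optional[str] = None  # None means the server is idle (spare)


def retask(pool: List[Server], role: str) -> Optional[Server]:
    """Assign the first idle server to `role`; return None if no spare exists."""
    for server in pool:
        if server.role is None:
            server.role = role
            return server
    # No spare capacity: re-tasking now would mean robbing another task.
    return None
```

With one idle machine in the pool, the first request succeeds and the second returns nothing, leaving the administrator to choose which existing workload to starve.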
The HP approach to utility computing still leaves resources under-utilised, as Hindle admits. "If you must have your system tuned to the nth degree, you don't want UDC, because that fine-tuning is your key factor in terms of running your system. You're prepared to pay for it and sacrifice flexibility to get it."
So-called grid computing, which Gartner's Cassell predicts will become part of vendors' offerings by 2007, overcomes that issue by using the spare resources on a server for individual parts of tasks rather than whole tasks. Applications developed to take advantage of grid computing can distribute calculations and parts of tasks to other servers on a network or even on the Internet and then collate the answers to complete the task.
But, as Cassell points out, that will require applications to be rewritten, and there are many enterprise applications that will not benefit from the huge boost in processing power that grid computing offers. "You need to be able to parallelise your application so that parts of the same calculation can be performed out of order if necessary," explains Cassell. "That's a lot of work and most programs just don't need that kind of processing capability. It's really only the mathematically and computationally intensive applications that will benefit."
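A minimal sketch of the split-and-collate pattern follows. It is an illustration, not an OGSA or Globus program: local worker threads stand in for remote servers, and the function names and the choice of calculation (a sum of squares) are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def partial_sum_of_squares(start: int, stop: int) -> int:
    """One independent part of the task: sum of i*i for start <= i < stop."""
    return sum(i * i for i in range(start, stop))


def grid_sum_of_squares(n: int, parts: int = 4) -> int:
    """Split the range [0, n) into chunks, compute each separately, collate."""
    step = max(1, -(-n // parts))  # ceiling division, never less than 1
    chunks = [(lo, min(lo + step, n)) for lo in range(0, n, step)]
    total = 0
    # Worker threads stand in for the other servers on the grid.
    with ThreadPoolExecutor(max_workers=parts) as pool:
        futures = [pool.submit(partial_sum_of_squares, lo, hi) for lo, hi in chunks]
        # Results arrive in whatever order the workers finish. Addition does
        # not care about order, which is what makes this task parallelisable.
        for future in as_completed(futures):
            total += future.result()
    return total
```

The calculation works here only because each chunk is independent of the others. An application whose steps depend on one another's results cannot be divided this way, which is the rewriting burden Cassell describes.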
Nevertheless, IBM is building grid computing capabilities into its software and hardware systems, using the Open Grid Services Architecture (OGSA) standard developed by the Globus Project, a consortium of academics, public sector organisations, and industry partners including IBM and Microsoft.
Meanwhile, IBM's storage architecture based around the iSCSI Internet-protocol-based storage standard will offer distributed storage accessible across a network by grid-based applications. Most importantly, the company will offer grid computing through its eLiza programme.
Take it from the top
The top-down approach to utility computing is to have very powerful computers that are ultra-reliable and can be ‘partitioned' to appear as several different servers. These separate partitions can then be altered and have resources added to them or removed as appropriate.
To depend on fewer servers, however, organisations need to know that they are reliable. IBM's approach to utility computing – to have a far less heterogeneous network and more consolidation – depends on servers being ultra-reliable (even offering mainframe-class reliability) for the utility computing environment to work.
Paul Horn, senior vice president of research at IBM, says that IBM plans to develop ‘autonomic' computers – systems that identify and correct faults themselves, and that automatically adapt to the tasks required of them. IBM's research into autonomic systems, says Horn, is modelled on the human autonomic nervous system, which regulates and repairs the body, responding to changing conditions without any conscious effort on our part. He argues that if the current rate of expansion of digital technology continues, there will soon not be enough skilled people to keep the world's computer systems running. Far fewer and more reliable computers are the only option, he says.
One of IBM's early autonomic efforts is eLiza, a self-healing and dynamic workload management system announced in March 2001 and scheduled to first appear in the p690 series of Unix servers that was announced in October 2001.
Through the use of internal sensors, eLiza will monitor component health and automatically reallocate memory ‘on the fly'. Chip-kill, memory mirroring, and hot-swappable memory, disks and other components – all standard features of mainframe computing – will be introduced to IBM's entire range of high-end servers as part of its Enterprise X-architecture.
The X-architecture also offers an additional benefit, says Tikiri Wandarugala, a senior server consultant at IBM Europe. "You can build as you grow. You don't need to decide [immediately] how much power you're ever going to need." Rather than connecting to each other via backplanes connected to the PCI bus, systems based on the X-architecture connect directly into each other's memory controllers. Plugging a two-way system into a two-way system, for example, gives the user either two two-way systems or a four-way system indistinguishable from a purpose-built four-way system. A reboot of the system allows the administrator to switch between the two configurations. The difference is determined by application requirements: not all applications can take advantage of multiple processors or clustering, while licensing issues may limit the number of processors financially rather than technically.
Dell is taking a similar approach with its ‘bricks' technology, due out at the end of 2002. Although it does not offer the speed of a direct memory interconnect, this modular approach to server design – where the user can add power supplies, storage, processing power or other server components to an existing system as though they were Lego blocks – is offered on a ‘pay as you grow' basis.
Reliability issues aside, the ability to partition a single server into multiple ‘virtual’ servers remains an important requirement of top-down utility computing. Without it, a badly behaved hardware driver can bring down the whole operating system, no matter how big the computer; poorly written applications can ruin the performance of other applications on the same server; and hardware failures can render the whole server unavailable until the broken parts are replaced. Once the hardware and software are partitioned into virtual servers, a hardware failure brings down only the virtual servers that use the failed components; an overly hungry application can consume only the resources of the virtual server on which it runs; and an unstable operating system can be rebooted or replaced while processes running on the other virtual operating systems continue uninterrupted.
The difficulty for server administrators lies in deciding which resources should be allocated to which partitions. In response, most suppliers are aiming to introduce some form of ‘dynamic partitioning’, letting systems decide how to allocate resources to each partition on the fly. Sun product marketing manager Mark Lewis proudly claims that Sun's top-end Fire servers are the only ones capable of dynamic hardware partitioning, although software partitioning will only be available once the Solaris 9 operating system comes out. IBM's current systems require operating system restarts to achieve a change of partitioning, but through a technology deal with virtualisation specialist VMware the company can offer 20 dynamic software partitions even on its eSeries mid-range servers. Its zSeries mainframe, meanwhile, can mimic up to 10,000 Linux servers.
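One simple way a system might decide such an allocation is to share processors in proportion to each partition's current demand. The sketch below is a hypothetical policy written for illustration; it is not Sun's, IBM's or VMware's mechanism, and the partition names and demand figures are invented.

```python
from typing import Dict


def allocate_cpus(total_cpus: int, demand: Dict[str, float]) -> Dict[str, int]:
    """Share total_cpus among partitions in proportion to their demand."""
    total_demand = sum(demand.values())
    if total_demand == 0:
        return {name: 0 for name in demand}
    # Exact proportional shares, then whole CPUs by rounding down.
    exact = {name: total_cpus * d / total_demand for name, d in demand.items()}
    shares = {name: int(x) for name, x in exact.items()}
    # Hand the leftover CPUs to the partitions with the largest remainders.
    leftover = total_cpus - sum(shares.values())
    by_remainder = sorted(
        demand, key=lambda name: exact[name] - shares[name], reverse=True
    )
    for name in by_remainder[:leftover]:
        shares[name] += 1
    return shares
```

Calling the function again whenever the demand figures change gives a crude form of on-the-fly reallocation; a real system would also have to move memory and I/O, and would need to avoid thrashing between configurations.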
Although a top-down utility computing approach produces a more seamless utility computing environment by virtue of there being relatively few physical servers to manage – just virtual servers – it is an expensive option for most companies. A Sun Fire 15K costs over $1 million, equivalent to the price of several dozen mid-range and low-end servers from HP or Dell. Equally important is that existing servers can be incorporated into a bottom-up utility approach. Either way, the technology involved in utility computing is neither simple nor inexpensive.