The data centre is fast becoming a black hole for IT resources. The sheer number of applications, together with their supporting infrastructure and the tools to manage it all, makes it manpower-intensive to run. The well-paid staff who oversee the running of the data centre find their time taken up by routine tasks. It is a situation that wastes talent, threatens to strangle innovation and is inordinately inefficient.
Much of the current problem stems from the complexity of operations within the data centre: complexity rooted in the lack of interoperability between applications and the inherent difficulty of managing these incongruous systems.
Yet it was only when this started to hurt IT suppliers' bottom lines that they sought to do anything about it. "Everybody [in the vendor community] realised the complexity was threatening their customers' ability to buy more stuff," says Donna Scott, an analyst with research group Gartner.
That simple incentive has spurred management and infrastructure suppliers to find new ways to automate the mundanity of the data centre. IBM, a pioneer in this area, has defined a set of features for what it brands ‘autonomic’ computing, which it predicts will revolutionise systems management: systems of the future will be self-configuring, self-protecting, self-optimising and self-healing. The term autonomic comes from the idea of bodily functions, such as breathing, being regulated without conscious direction.
Today, there are many systems management tools which allow for remote systems configuration, for example obviating the need to manually install new software or create new user accounts. Microsoft's Automatic Update installs patches for the Windows operating system without users having to remember to download them, adding a basic level of security. IT departments have set up virtual server farms – enabling several individual boxes to be used as one resource – and automatic software provisioning to enable more capacity to be added to a business service as demand increases.
But self-healing systems have largely remained in the testing labs. There is clearly an addressable market: estimates indicate that around 70% of an IT department's budget is spent maintaining the existing infrastructure; IBM research has found that 70% of problems are ‘repeatable’ and so can be addressed programmatically. So where are the products?
Creating a self-healing system sounds deceptively simple: merely connect alerts and error reports to the management systems, and the manual intervention required to make basic fixes or deal with false alarms could be minimised. However, the technological challenge involves bridging more fundamental divides: between applications and infrastructure architects; between business units and IT; and between software vendors' product sets.
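The first half of that idea – routing known, repeatable error reports to automated fixes and escalating only the rest – can be sketched in a few lines. The error codes and remedies below are invented for illustration; they are not any vendor's API.

```python
# Minimal rule-based remediation triage: known, repeatable alerts map to
# automated fixes; anything unrecognised is escalated to a human operator.
# All error codes and actions are illustrative, not a real product's.

KNOWN_FIXES = {
    "DISK_FULL": "purge temp files",
    "SERVICE_DOWN": "restart service",
    "CERT_EXPIRING": "renew certificate",
}

def triage(alerts):
    """Split incoming alerts into automated fixes and human escalations."""
    automated, escalated = [], []
    for alert in alerts:
        action = KNOWN_FIXES.get(alert)
        if action:
            automated.append((alert, action))   # the 'repeatable' majority
        else:
            escalated.append(alert)             # the interesting remainder
    return automated, escalated

automated, escalated = triage(["DISK_FULL", "SERVICE_DOWN", "UNKNOWN_FAULT"])
print(automated)   # two alerts handled automatically
print(escalated)   # one flagged for a person
```

The hard part, as the article goes on to show, is not this lookup but agreeing on what the alerts mean across vendors' product sets.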
It also requires a shift in the way IT managers think about their systems. It is frequently human configuration that causes systems to malfunction, often the product of bored technicians working on repetitive tasks. But an autonomic system can cause problems when it thinks it is helping.
All this may sound too much like science fiction, with echoes of HAL 9000, the psychopathic self-aware computer in Kubrick's 2001: A Space Odyssey. But Alfred Beeler, Microsoft UK's security and management product marketing manager, describes how automation can already misfire in practice. One customer bought an expensive RAID array to ensure a server would never go down.
Yet when its first hard drive failed, it was so good at failing over that nobody noticed until the second drive also failed. That took the entire service it was supporting offline. "There will still be a need for [manual] monitoring," says Beeler. "Even the best self-healing system could kill itself."
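The lesson of that anecdote is that monitoring must alert on lost redundancy, not just lost service. A minimal sketch, assuming a simplified view of drive health (a real check would query the RAID controller):

```python
# Sketch: alert when redundancy degrades, even while the service is still up.
# Drive states are simplified booleans for illustration.

def redundancy_status(drives):
    """drives: list of booleans, True = healthy. Returns a status string."""
    failed = drives.count(False)
    if failed == 0:
        return "OK"
    if failed < len(drives):
        # The service is still running, but the next failure takes it down:
        # exactly the state that went unnoticed in Beeler's anecdote.
        return "DEGRADED: replace failed drive now"
    return "OFFLINE"

print(redundancy_status([True, True]))    # OK
print(redundancy_status([True, False]))   # DEGRADED: replace failed drive now
```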
Similarly, in the event of a denial of service attack, a well-meaning management system could just keep provisioning servers until a whole data centre was taken out, if it did not realise the demand was coming from an illegitimate source. This highlights the need for various management elements – like security and provisioning – to communicate with each other. "The end task is a system which can hide all the noise and the mundane tasks but when something more interesting happens it can flag it up," says Beeler.
Management software vendor BMC believes a starting point is to establish thresholds of behaviour. BMC recently released its Patrol Analytics tool which can establish a baseline of what constitutes normal fluctuations in server performance. Alarms are triggered only when the systems operate beyond those limits. "If you set a static threshold that sends an alarm whenever a server goes over 70% utilisation, every Monday morning you would get a false alarm [when utilisation goes up with everyone checking the weekend's emails]," says Kym Wood, BMC's business unit field manager for EMEA.
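The idea of a learned baseline can be illustrated with a short sketch. This is not BMC's algorithm, just a minimal dynamic threshold in the same spirit: learn the normal range for each hour-of-week slot from history, and alarm only when a reading falls outside it.

```python
# Sketch: per-slot behavioural baselines instead of one static threshold.
# A reading alarms only if it leaves mean +/- 3 standard deviations for
# its hour-of-week slot, so a routine Monday-morning spike stays quiet.
import statistics

def build_baseline(history):
    """history: {slot: [utilisation % samples]} -> {slot: (low, high)}."""
    baseline = {}
    for slot, samples in history.items():
        mean = statistics.mean(samples)
        sd = statistics.pstdev(samples)
        baseline[slot] = (mean - 3 * sd, mean + 3 * sd)
    return baseline

def is_alarm(baseline, slot, reading):
    low, high = baseline[slot]
    return not (low <= reading <= high)

# Monday 9am is always busy, so 85% there is normal, not a false alarm.
history = {"mon-09": [80, 85, 82, 88], "tue-03": [10, 12, 11, 9]}
baseline = build_baseline(history)
print(is_alarm(baseline, "mon-09", 85))   # False: expected Monday spike
print(is_alarm(baseline, "tue-03", 70))   # True: abnormal for 3am Tuesday
```

A static 70% rule would fire on every one of those Monday readings; the learned baseline stays silent until behaviour genuinely deviates.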
While hardly a revolution in self-healing systems, it is a good example of how making systems more intelligent can mean fewer people having to manually monitor server behaviour. "With the skill shortage in the IT industry, CIOs have to be careful how they deploy that intelligence," says Wood. But she warns that there is a fine line between usefully redeploying staff away from repetitive management tasks and automating to the point that errors could be introduced. "If a company automates too much it could introduce a major change into the IT infrastructure that would be impacting another service," she says.
In response to this, some infrastructure vendors are taking a wider-focused approach to automating IT management – by looking at business processes rather than technology. IBM, Computer Associates (CA), Hewlett-Packard (HP), Microsoft and Sychron all sell modelling software that allows organisations to map elements of the IT infrastructure to business processes – and then automate the management of the IT, ensuring service is maintained. IBM is also pushing an initiative aiming to map the dependencies between different elements of the infrastructure.
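At its simplest, such modelling is a dependency map from infrastructure elements to business processes, so an outage can be expressed in business terms. The map below is invented for illustration; the vendors' tools build comparable models from discovered data.

```python
# Sketch: map business processes to the infrastructure they depend on,
# then answer "which processes lose service if this component fails?".
# Component and process names are hypothetical.

DEPENDS_ON = {
    "order-processing": ["web-server-1", "app-server-1", "db-cluster"],
    "payroll":          ["app-server-2", "db-cluster"],
}

def impacted_processes(failed_component):
    """Business processes affected by the failure of one component."""
    return sorted(p for p, deps in DEPENDS_ON.items()
                  if failed_component in deps)

print(impacted_processes("db-cluster"))     # shared dependency: both hit
print(impacted_processes("web-server-1"))   # only order-processing hit
```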
Conflict of interest
But there remains a fundamental problem at the heart of all the vendors' self-healing systems initiatives: they are too far ahead of their would-be customers. Gartner's Scott says that few organisations have implemented even the early stages of autonomic computing, such as the automatic provisioning of web servers in response to increased demand. The challenge is cultural, not technological; most businesses' IT processes are too immature for them to understand what needs automating.
"If you don't have good controls you're not going to automate. If IT is not well managed it might be a multi-year project just to get to the base level," says Scott. The IT Infrastructure Library (ITIL) is the best service management model for IT departments aspiring to automation, she adds. Many vendors have now introduced its principles to their products.
Colin Bannister, a consulting manager at CA, says an accurate view of an organisation's assets, held in a management database, is a fundamental prerequisite for a self-managing infrastructure – and one that organisations can put in place today. He adds that achieving a full view and automating its management is made easier by standardising the infrastructure, with fewer customised applications.
Ian Curtis, HP's UK director of software strategy, goes further, adding that "most large enterprises have a plethora of different management tools" which, he claims, they want to consolidate to take "a more strategic approach". This may mean ripping and replacing existing software.
In effect, what is proposed is an argument for companies to buy all their software from the same vendor, says Scott: "Really a lot of this is about lock-in. All software for business processes is lock-in and if it is really ingrained in management processes that causes lock-in as well."
But there may be hope that vendors have recognised user reluctance to get locked in to a single provider. "Looking back over the last 40 or 50 years, it's as if in the beginning the aim was to create as many different formats as we can – and we were wildly successful," says IBM's head of autonomic computing, David Bartlett. "To solve this problem this has to be more than a single vendor initiative."
A number of standards initiatives aim to solve the fundamental problem of managing a heterogeneous environment. Currently there are several bodies looking at creating standards for representing IT configurations, creating an integration layer between web services-based management tools and the resources they command, and ways to represent event data. However, work is still at an early stage; it is far from clear how the results will pan out.
Still the standards lay only the barest of foundations for an autonomic future. Web services represent both symptom and cure of today's data centre management ills. Because web services touch many servers, operating systems, networking devices, databases and middleware, it is difficult to pin down the root cause when things go wrong. Standards should solve that problem. "As web services are more common and ubiquitous, they do offer greater potential for dynamic automation because of the way interfaces work between the components," says Gartner's Scott. "A lot of management vendors are making web services out of their own components, turning their management functionality into a web service."
While she bemoans the slow progress on standards, Scott notes that it is in everyone's best interests for vendors to settle their differences. Ultimately, says Microsoft's Beeler, "it's not a big competitive thing. We're all competing with the 70% of the budget that is wasted on management."