The next data deluge

It is a simple but unavoidable truth: the amount of storage capacity required by enterprises is set to expand exponentially for as long as anyone dares predict.

For starters, there is the existing base of corporate data, which is doubling each year. But the arrival of several new sources of data (voice-over-IP systems that can capture every call digitally and RFID tags transmitting their location throughout the supply chain are just two examples) means that data volumes are set for an even steeper climb.

Adding to this, stacks of new and pending legislation designed to enforce sound corporate governance mean that businesses and public sector organisations are now expected to keep hold of certain data for longer and to be able to lay their hands on it when required. The burgeoning volume of data – and the resulting demands it puts on storage resources – is creating a challenging situation for IT management.

On the one hand, the cost per gigabyte of physically storing data continues to fall. And even beyond today's disk technology, innovations set to hit the market, such as polymer storage and magneto-resistive random access memory, promise to keep that cost curve descending.

But on the other, even as the cost of storing data falls, IT executives are having to carve out an ever-larger share of their budgets to meet demand – and to present users with increasingly hefty bills for their storage resource consumption.

That leaves CIOs with key decisions to make when formulating short- and mid-term storage strategies that can build on legacy infrastructure while still providing the flexibility necessary for future storage architectures.

Real-time storage

The task of implementing a future-proof storage strategy is made all the more complex by the legacy environment. Traditionally storage has been introduced in a piecemeal fashion, with businesses adding more capacity with each new application.

But while this may have been effective in serving the storage needs of particular applications, the approach has created silos of information that restrict business responsiveness. And that, in the opinion of many analysts, has got to be a thing of the past if organisations are to establish the kind of business agility many are talking about. As Bob Passmore, VP of research at analyst group Gartner, says, the move towards a real-time business places different demands on storage systems, where information must be readily accessible across the enterprise.

On the application and server side, the drive towards the real-time enterprise has generated a buzz around concepts such as service-oriented architectures and grid computing. In this approach, components from a variety of different application libraries and data sets can be pulled together, assembled and re-assembled, to serve changing business processes. The processing power that drives this new real-time enterprise comes from a grid of standardised servers, with capacity allocated dynamically depending on the needs of the business.

It is an alluring vision, and one that is running parallel to developments in networked storage. Sector leaders, such as EMC, IBM and Network Appliance, all offer storage area networking (SAN) and network-attached storage (NAS) products that enable organisations to pool and allocate their storage resources based on changing demand.

That approach also helps drive down the proportion of wasted space on storage devices, says Stuart Gilks, European director of systems engineering at NetApp. In a typical silo application, only 20% of the disk resource is being used at any one time, he says. Such levels of utilisation mean that "for every 2 terabytes of data you have, you're going to need 10 terabytes of storage," he adds.
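To see how quickly that arithmetic bites, the short Python sketch below works through the same calculation. It is purely illustrative: the 20% utilisation figure is Gilks's, but the function name and example volumes are assumptions.

```python
def provisioned_capacity_tb(data_tb: float, utilisation: float = 0.20) -> float:
    """Raw capacity needed to hold `data_tb` of data at a given utilisation rate."""
    return data_tb / utilisation

# At 20% utilisation, 2TB of live data ties up 10TB of provisioned disk.
print(provisioned_capacity_tb(2.0))  # -> 10.0
```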

But the real-time enterprise throws up more issues for storage than simply managing utilisation. With data available on a network, being called on by any number of applications, data consistency becomes a real concern. "One of the biggest mistakes customers can make is to get hooked on the idea of not losing any transactions. Actually, data consistency is the most important factor," says Passmore.

And whereas grid computing delivers cost-effectiveness by aggregating the processing power of networks of relatively low-end servers, achieving the same economies by building new storage infrastructures from commodity boxes is not so simple. As Passmore explains, even the best mid-range storage systems cannot match high-end systems when it comes to data consistency.

That means businesses need to construct a storage hierarchy that applies different technologies to the storage of different types of data – powerful, highly reliable storage devices to handle the most important, most frequently accessed data; less expensive options for more static data.

"If you compare the costs of storing a gigabyte of stored data on different devices, typically the cost on mid-range systems is about 50% that of enterprise systems; ATA disk is around 20% to 25%; and tape library 2% to 3%. That ratio has been fairly stable for three decades," says Passmore.

The upshot is that in order to get the greatest value from storage systems, it is essential that data is prioritised and stored using the most appropriate medium.

But historically that has not been the practice. Typically, storage hierarchies have been built around application requirements. Organisations have deployed systems on the basis of how important various applications are to the running of their business, and how critical it is that those applications are delivered without interruption, says Dennis Ryan, European partner sales development manager at EMC. "Most businesses usually have a pretty good view of how long they can cope without an application before the blood starts running down the wall."

But simply establishing a storage hierarchy is not sufficient: while a real-time enterprise needs its data to be available to applications in a flexible way, it also needs that data to be carefully managed. A policy-driven approach is needed, so that as data becomes less important it is hived off to lower-cost media.
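What such a policy might look like in practice is sketched below – a deliberately bare-bones, age-based tiering rule in Python. Real information lifecycle management tools weigh far more factors (access frequency, business value, regulatory retention rules); the thresholds and tier names here are purely illustrative assumptions.

```python
# Illustrative only: a minimal age-based tiering policy. Thresholds, tier names
# and the data items are assumptions, not taken from any particular product.

from dataclasses import dataclass

@dataclass
class DataItem:
    name: str
    days_since_last_access: int

def assign_tier(item: DataItem) -> str:
    """Map a data item to a storage tier based purely on how recently it was used."""
    if item.days_since_last_access <= 30:
        return "enterprise disk"   # hot, business-critical data
    if item.days_since_last_access <= 180:
        return "ATA disk"          # cooler data on cheaper spindles
    return "tape library"          # archival and compliance retention

for item in (DataItem("orders.db", 3),
             DataItem("q1_report.doc", 90),
             DataItem("2001_mailboxes", 900)):
    print(f"{item.name} -> {assign_tier(item)}")
```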

In essence this is where the long-established notion of hierarchical storage management meets information lifecycle management (ILM) – the approach that Passmore says is easy to dismiss as merely "the latest buzzword", but which is in fact of critical importance to most organisations.

"Typically, most organisations store the same piece of data seven times. You don't want to be spending money on data that's not valuable to your business," he says.

Policy-driven, automated management is the most effective way of dealing with both the cost of storing data, and the regulatory framework that is building around corporate data handling, he adds.

The message is clear: even as they face escalating data volumes, organisations need to take a proactive approach to prioritising that data and optimally distributing it over a hierarchy of networked devices.

   
 

Storage hierarchy
Source: Gartner

Hype cycle for storage, 2004
