The amount of electronic data in the world is growing at an unprecedented rate as email becomes a prime method of communication, and data previously stored on paper, tape or film is digitised.
Recent changes in legislation have meant that the growth in the volumes of corporate data has been accompanied by a requirement to account for the veracity of that information. Compliance laws require that businesses can produce specific documents on demand, and prove that the document has not been altered.
In an attempt to meet the twin demands of burgeoning data volumes and increased accountability, IT vendors have developed the concept of content-addressed storage (CAS), a technique intended to simplify the storage and retrieval of fixed data.
The problem of storing static data is likely to be the greatest enterprise storage challenge for the next ten years, according to analyst group Gartner. Likewise, IDC predicts that CAS will account for as much as 8% of the storage market by 2006, a 5% rise from 2005.
The term CAS was first coined by EMC in 2002 when the storage giant released its Centera product. EMC had spotted an opportunity with the growing interest in information lifecycle management strategies to address the large segment of fixed data. "CAS has four major qualities: accessibility; longevity; manageability; authenticity," explains Mark Lewis, EMEA marketing manager at EMC.
CAS is a disk-based method of storing data that gives each data object a unique, location-independent, digital identifier. Metadata containing this address, the location and other identifiers (for example, the retention period) is stored in an index, which is accessed when users need the object. In effect, this provides a storage search engine, which makes it easier for users to locate documents.
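The mechanism described above can be sketched in a few lines of Python. This is a minimal illustration rather than any vendor's implementation: the use of SHA-256 to derive the identifier, the class name ContentStore, the metadata field names and the "node-0/slot-N" location format are all assumptions made for the example.

```python
import hashlib
import time


class ContentStore:
    """Toy in-memory content-addressed store (illustrative only)."""

    def __init__(self):
        self.blobs = {}  # hypothetical physical location -> raw bytes
        self.index = {}  # content address -> metadata record

    def put(self, data: bytes, retention_days: int) -> str:
        # The address is derived from the content, not from where it is kept.
        # SHA-256 is an assumption; the article does not name an algorithm.
        address = hashlib.sha256(data).hexdigest()
        if address not in self.index:
            location = f"node-0/slot-{len(self.blobs)}"  # made-up location format
            self.blobs[location] = data
            self.index[address] = {
                "location": location,
                "retention_days": retention_days,
                "stored_at": time.time(),
            }
        return address

    def get(self, address: str) -> bytes:
        # The index is consulted first; it says where the object lives.
        meta = self.index[address]
        return self.blobs[meta["location"]]


store = ContentStore()
addr = store.put(b"Signed contract, 14 March", retention_days=2555)
print(addr)
print(store.get(addr))
```

The caller only ever holds the address. The object could be moved to another location without that address changing, provided the index entry is updated.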
Within a storage hierarchy, CAS fits most naturally into the middle tier. It is less suited to the low end as each object is stored along with metadata, and identified with long, encrypted addresses, making it an inefficient way to store small or relatively unimportant files. Similarly, its addressing system is designed for fixed data; each object has a unique identifier, and any change to the object results in a new address being generated. High-end transactional data is subject to too many alterations to suit this model.
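The point that any change produces a new address is also what makes it possible to show that a document has not been altered. A rough sketch, again assuming a SHA-256 digest stands in for the address and using the hypothetical helper names address_of and is_unaltered:

```python
import hashlib


def address_of(data: bytes) -> str:
    """Derive an address from the content (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def is_unaltered(data: bytes, address: str) -> bool:
    """Recompute the address and compare it with the one recorded at storage time."""
    return address_of(data) == address


original = b"Quarterly report, final"
address = address_of(original)

tampered = b"Quarterly report, final (edited)"

print(is_unaltered(original, address))  # True
print(is_unaltered(tampered, address))  # False
```

Even a one-character edit yields a different digest, which is why frequently changing transactional data would generate a stream of new addresses.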
Using CAS can also reduce the total amount of storage used by businesses, and therefore costs. Because each document needs to be saved only once and is uniquely identifiable, CAS eliminates the problems associated with storing multiple copies of the same data.
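The saving comes from the fact that identical content always maps to the same address. The sketch below illustrates the idea; the class DedupStore and its methods are hypothetical names, and SHA-256 is assumed as the addressing function.

```python
import hashlib


class DedupStore:
    """Toy store that keeps one physical copy per distinct content."""

    def __init__(self):
        self.objects = {}     # address -> bytes (single physical copy)
        self.references = {}  # address -> how many times it was saved

    def save(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        if address not in self.objects:
            self.objects[address] = data  # first time this content is seen
        self.references[address] = self.references.get(address, 0) + 1
        return address

    def bytes_stored(self) -> int:
        return sum(len(blob) for blob in self.objects.values())


store = DedupStore()
attachment = b"x" * 1000  # stands in for an attachment mailed to five people

for _ in range(5):
    store.save(attachment)

print(store.bytes_stored())  # 1000, not 5000
```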
And although disk-based systems are more expensive than tape storage, the ability to search a CAS system may make it a more attractive option for businesses.
Holding on for tomorrow
The storage vendors have been keen to promote CAS as a vital technology in dealing with compliance issues. For example, in the US, the Securities and Exchange Commission's (SEC) Rule 17a-4 governs the retention of electronic data, requiring it to be stored in a non-rewritable, non-erasable form and to be easily retrievable. The US also has other laws targeting individual industries, such as medicine and finance; however, there are fundamental principles guiding the legislation: that information is retained for a prescribed period; that the integrity of stored information is guaranteed; and finally, that it can be accessed speedily.
While this approach has encouraged adoption of CAS in the US, it has yet to be repeated in Europe. "In Europe," says Nick Bunyan, a Director of Research at IDC's European Storage Group, "legislation comes way down the drivers for storage purchases. People have an attitude of 'we'll do it next year' – it's rather like the attitude to Y2K in 1998."
Another reason European customers are slower to embrace CAS is that European legislators have taken a different emphasis from their US counterparts, says Stephen Ellis, co-founder of storage company Permabit. Europe has gone down the road of legislating for individual protections rather than record management.
Nevertheless, vendors are confident that regulatory pressures will fuel sales of CAS. With this in mind, they have introduced CAS products branded for compliance. One of the main features of these products is the ability to digitally 'shred' information at the end of the statutory retention period. A retention period set in the data object's metadata can also ensure that the object cannot be erased during that time.
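Retention enforcement of this kind might look something like the following sketch. The names RetainedStore, RetentionError and shred are invented for illustration, and a real product would overwrite the underlying media rather than simply drop a dictionary entry.

```python
import hashlib
from datetime import datetime, timedelta, timezone


class RetentionError(Exception):
    """Raised when deletion is attempted before the retention period ends."""


class RetainedStore:
    def __init__(self):
        self.objects = {}  # address -> (data, expiry timestamp)

    def put(self, data: bytes, retention: timedelta, now: datetime) -> str:
        address = hashlib.sha256(data).hexdigest()  # SHA-256 is an assumption
        self.objects[address] = (data, now + retention)
        return address

    def shred(self, address: str, now: datetime) -> None:
        _, expires_at = self.objects[address]
        if now < expires_at:
            raise RetentionError(f"object is retained until {expires_at.date()}")
        # Stand-in for a secure erase; here the entry is just removed.
        del self.objects[address]


stored_on = datetime(2005, 1, 1, tzinfo=timezone.utc)
store = RetainedStore()
addr = store.put(b"trade confirmation", timedelta(days=6 * 365), now=stored_on)

try:
    store.shred(addr, now=stored_on + timedelta(days=30))
except RetentionError as err:
    print("refused:", err)

store.shred(addr, now=stored_on + timedelta(days=7 * 365))
print(addr in store.objects)  # False
```

Passing the current time in explicitly keeps the example deterministic; a production system would need a trusted clock instead.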
"Part of the power of CAS," says Permabit's Ellis, "is that back-up is a very costly and labour-intensive exercise. Over time businesses are going to want storage that will back itself up, and CAS can do that." Currently, whenever companies back up their systems, they tend to incorporate a lot of fixed content. This makes the process laborious and is not a cost-effective use of back-up. Back-up provides a snapshot of corporate information at any given time, but static data remains consistent regardless of when that snapshot is taken.
Using CAS, fixed data does not need to be backed up separately. This offers potential savings to the business, without risking data loss: all the vendors mirror stored data to protect against data loss from systems failure. Ellis admits most customers are not yet ready to take the plunge of relying on a CAS system to double up as a back-up mechanism for valuable data, but says that future CAS systems will have this functionality as standard.
One of the principal benefits that CAS delivers is automation, says Carl Greiner, senior VP at the Meta Group. Greiner estimates that one storage administrator could manage over 300 terabytes of fixed content using a CAS system – a massive advantage over current systems. He predicts those volumes will double by 2006.
Migrating to CAS disk systems can be expensive; however, this can be partly offset by reducing hardware costs. CAS is both media-neutral and scalable, providing users with a degree of flexibility and future-proofing. Users can migrate to new hardware without compromising content. Similarly, new nodes can be added, which the system then configures automatically.
This has major repercussions for ROI because full-scale migration can be an expensive business, responsible for up to 20% of the TCO in some cases. While traditional file systems reach their maximum capacity and then have to be migrated to something larger, CAS-based tools can scale to multiple petabytes, and as the demand for capacity increases, terabytes can be added in relatively small increments.
A document such as a medical record must be stored for as long as the patient is alive, but no hardware lasts that long. CAS, because it takes no account of physical media, could last for decades rather than just a few years. "The data can outlive the storage," says Ellis. Yet because the market is still in its formative years, there is as yet no proof either way of CAS's longevity.