Applying the ILM architecture

Storage costs have never been lower. Data volumes have never been higher. For many hard-pressed IT managers, that makes buying more hardware to accommodate burgeoning data seem a relatively easy option. In a recent survey by market research company Vanson Bourne, 80% of respondents admitted that they simply purchase more storage rather than investigate ways of getting more from existing resources.

The view that this approach is “pain-free”, however, could not be more wrong, says Jon Pavitt, professional services manager at StorageTek UK and Ireland. “It quickly becomes extremely expensive – not just in terms of capital expenditure but also in terms of management costs,” he says. By first rationalising the existing storage infrastructure to maximise its use, IT managers can delay or eliminate the need for new primary capacity to manage information growth.

Not only that, but efficiencies can also be achieved through eliminating redundant copies, moving less critical data to less expensive storage media and recovering over-reserved capacity.

That is where information lifecycle management (ILM) comes in. ILM concepts can be applied to assess data uses and storage assets, identify inefficiencies and adjust the infrastructure to maximise utilisation. It is much needed. According to research by IT market analyst company Gartner, utilisation of typical direct-attached storage rarely exceeds 50%. Highly efficient networked storage environments that are organised according to ILM principles, by contrast, can operate at 70% to 90% capacity utilisation.

But that is entirely dependent on an organisation having a good understanding of the types of data it holds, the value that data has, and where it is stored. ILM works on the principle that not all data is equal and that it is used in different ways at different stages of its lifecycle. For example, monthly financial reports may combine sales orders, shipments, inventory expense and other data for the month.

During the processing cycle, the finance department needs access to verify and analyse this data frequently and rapidly. After the reports are generated, however, the previous month’s data is referenced less frequently as the focus changes to data for the current month. Previous reports can then be migrated to less-available, lower-cost storage, thus freeing up primary, high-cost storage.

Few companies, as yet, have that level of visibility into their data environment and the needs of different data categories, says Pavitt. As a result, one of the main roles his team performs is to help companies to gain that insight. “Our job, in some ways, is to make a nuisance of ourselves,” he jokes. “We go into an organisation and ask lots of questions about what data they have and where it is kept. They frequently don’t have answers – but that’s the whole point.”

Migration mathematics

A mid-size manufacturing company holds 12 terabytes of primary storage and disk mirroring – 6TB is used for primary application purposes and 6TB for data protection.

The company finds that over 50% of that information is not referenced again and so can move 3TB of the application data from expensive performance disk to SATA disk or tape.

A business impact analysis, meanwhile, justifies moving protection copies off expensive primary disk as well. That means the 6TB used for data protection could also move to SATA. The end result: 3TB of primary data on high performance disk out of the original 12 terabytes. Assuming the purchase cost of SATA disk arrays is one-sixth the cost of primary disk, this process recovers a substantial value in primary capacity and lowers the overall cost per online gigabyte.

Nine additional terabytes of primary storage are now available. With online data growth of between 40% and 75% annually, this process enables the recovery of enough reserve capacity to handle the growth of high-performance business applications for several years.
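The arithmetic in the “Migration mathematics” example can be checked with a short script. This is only a sketch: the capacities and the one-sixth SATA cost ratio come from the example, while the function names, the use of relative cost units (primary disk = 1.0 per terabyte) and the assumption that high-performance data grows by compounding from the remaining 3TB are illustrative choices, not part of the original figures.

```python
# Figures taken from the "Migration mathematics" example.
TOTAL_PRIMARY_TB = 12      # primary disk, including mirrored copies
APPLICATION_TB = 6         # primary application data
PROTECTION_TB = 6          # data protection copies
UNREFERENCED_SHARE = 0.5   # share of application data not referenced again
SATA_COST_RATIO = 1 / 6    # SATA purchase cost relative to primary disk


def migration_summary():
    """Return capacities and relative costs before and after migration."""
    moved_tb = APPLICATION_TB * UNREFERENCED_SHARE + PROTECTION_TB
    remaining_primary_tb = TOTAL_PRIMARY_TB - moved_tb
    # Costs are in relative units: one terabyte of primary disk costs 1.0.
    cost_before = TOTAL_PRIMARY_TB * 1.0
    cost_after = remaining_primary_tb * 1.0 + moved_tb * SATA_COST_RATIO
    return {
        "moved_tb": moved_tb,
        "remaining_primary_tb": remaining_primary_tb,
        "cost_per_tb_before": cost_before / TOTAL_PRIMARY_TB,
        "cost_per_tb_after": cost_after / TOTAL_PRIMARY_TB,
    }


def years_of_headroom(base_tb, freed_tb, annual_growth):
    """Whole years of compound growth that fit within base plus freed capacity.

    Assumption: growth compounds annually from base_tb.
    """
    years = 0
    needed = base_tb
    while needed * (1 + annual_growth) <= base_tb + freed_tb:
        needed *= 1 + annual_growth
        years += 1
    return years


summary = migration_summary()
print(summary)
print(years_of_headroom(3, 9, 0.40), years_of_headroom(3, 9, 0.75))
```

Under these assumptions, 9TB moves off primary disk, the relative cost per online terabyte falls from 1.0 to 0.375, and the freed capacity covers four years of growth at 40% or two years at 75%.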

First steps

The first step in getting those answers is to review the current storage environment: evaluate the data, categorise it, and then apply business rules to each category.

“Data valuation starts with listing data types and ranking them based on how often the data is accessed, who uses it, and so on,” explains Pavitt. “IT managers can then map the current location of the data across a hierarchy of storage systems, from high-performance disk to low-cost tape systems.”

“It’s vital to analyse current storage usage before embarking on any major storage implementation or upgrade. You need to know what you’re holding, where you hold it and what data has specific compliance requirements,” says Nigel Ghent, UK marketing director at storage supplier EMC. “That is a huge job of work in itself,” he adds.

There are ways, however, to minimise the effort involved, says Phil Goodwin, an analyst at IT market research company the Meta Group. “Obviously, treating each data type individually is impractical. The number of associated policy implementations would be unmanageable,” he says.

Instead, he suggests classifying data elements according to specific attributes (see table, Sample data element attributes and categories). “The objective is to reduce the number of elements, and therefore the number of policies, to an amount that can be effectively implemented and managed,” he says.

From there, business rules can be applied to different data types (see table, Sample business rules matrix).
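Goodwin's point about reducing the number of policies can be illustrated with a small grouping sketch. The data types and the categories assigned to them below are hypothetical; only the category names are drawn from the Meta Group sample attributes. Data types that share the same attribute profile fall into one group and can share one policy.

```python
from collections import defaultdict

# Hypothetical data types, each tagged with
# (business continuity, response time, retention).
DATA_TYPES = {
    "sales orders": ("mission-critical", "online", "medium term"),
    "shipments": ("mission-critical", "online", "medium term"),
    "inventory expense": ("business-critical", "offline batch", "long term"),
    "monthly financial reports": ("business-critical", "offline batch", "long term"),
    "routine email": ("operational", "online", "short term"),
}


def group_by_profile(data_types):
    """Map each distinct attribute profile to the data types that share it."""
    groups = defaultdict(list)
    for name, profile in data_types.items():
        groups[profile].append(name)
    return dict(groups)


groups = group_by_profile(DATA_TYPES)
for profile, names in groups.items():
    print(profile, "->", names)
print(len(DATA_TYPES), "data types,", len(groups), "policies")
```

In this illustration, five data types collapse into three profiles, so three policies are needed rather than five.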

That auditing process, however, should not be too exhaustive, warns Simon Gay, consulting practice leader at systems integration company Computacenter: “The danger of the categorisation process is that some companies are all talk and no action. You could explore endlessly the kinds of data your organisation holds and how it should be stored, but until you start putting ILM into practice, you’re probably no closer to a compliance situation.”

Early wins can be gained, he says, by identifying the most important data a company holds and applying ILM policies to it, before moving on to less significant data groups.

Sample data element attributes and categories
ATTRIBUTE	CATEGORY
Business continuity	Mission-critical, business-critical, operational
Response time	Online processing, analytical processing, offline batch processing
Disaster recovery time	High, medium, low
Retention	Short term (<180 days), medium term (1 year), long term (7-20 years)
Confidentiality	High (government-specified), medium (company-specified), low
Source: Meta Group

Data/storage alignment

That assessment should give storage administrators a clear view of the requirements of different data types and how well these requirements are currently being met. They are then in a position to move data where appropriate to lower-cost and lower-performance storage classes, while still delivering adequate performance for mission-critical applications.

That requires an assessment of the tiers of storage that exist below primary, high-cost, high-performance disk, says Tim Mortimer, business manager at storage integration company InTechnology. “We generally advocate four levels of storage. First: fast access, high-performance primary disk. Second: low-cost disk such as SATA [serial advanced technology attachment] disk. Third: tape technology where data must be retained but is unlikely to be referenced again. Fourth: offline tape in a secure facility, possibly offsite, which can be manually reintroduced into a tape library in the very unlikely event that it needs to be recalled,” Mortimer says.

The emergence of SATA disks has done much to boost ILM efforts, he says. SATA arrays can store data at a fraction of the cost of high-performance disk. When shifting point-in-time copies of data, for example, SATA is often the obvious choice.

That is not to say, however, that tape technology is becoming redundant, argues Derek Lewis at Morse: “Despite advances in disk, there are still huge advantages to tape technology. I’ve recently seen tape technology that can hold around 1.5 terabytes on an £80 tape. The costs involved in storing large volumes of data that will probably not be accessed again on tape are now staggeringly low, and disk – even low-cost disk – still cannot match them.”

One StorageTek customer, for example, uses ILM to balance availability and cost by automating payroll data management and migration. Payroll processing is a mission-critical application, so it made sense to store the data on high-performance disk during the processing cycle and replicate it every two hours.

Once the pay cycle is complete, the automated management system now moves payroll data to mid-range SATA disk arrays. At this stage, users can access payroll data from the company’s web site for a period of three months.

After three months, the data is written to a tape library, which is on the same campus as the data archive. For disaster recovery protection, the data is replicated to a remote location, where it is stored on a back-up tape library.
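The payroll lifecycle described above amounts to a simple age-based placement rule, sketched below. The function name, the tier labels and the approximation of “three months” as 90 days are assumptions made for illustration; the customer's actual automation is not described in that detail.

```python
from datetime import date, timedelta

# Assumption: "three months" of web access is approximated as 90 days.
SATA_RETENTION = timedelta(days=90)


def payroll_tier(pay_cycle_end, today):
    """Return the storage tier for payroll data given the pay cycle end date."""
    if today <= pay_cycle_end:
        # Processing cycle still running: mission-critical placement.
        return "high-performance disk (replicated every two hours)"
    if today - pay_cycle_end <= SATA_RETENTION:
        # Pay cycle complete: users can still access data via the web site.
        return "mid-range SATA disk (web access)"
    # Older than three months: archived, with a remote copy for disaster recovery.
    return "tape library (replicated to remote back-up tape library)"


cycle_end = date(2024, 1, 31)
for day in (date(2024, 1, 15), date(2024, 3, 1), date(2024, 6, 1)):
    print(day, "->", payroll_tier(cycle_end, day))
```

A rule like this would typically run on a schedule, with the tier label driving the actual migration job.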

ILM is an ongoing process – data storage administrators will need to continually maintain a balance between data performance needs and storage options, says Pavitt of StorageTek. “The struggle is to get the client to realise that getting benefit out of ILM is only 20% about technology and 80% about business processes. It’s that kind of housework that drives the biggest savings,” he says.

Storage tiers and technology
	DISK			TAPE
TIER	ENTERPRISE	MODULAR	ONLINE CAPACITY / ARCHIVAL	AUTOMATED CAPACITY	MANUAL
Design	Monolithic	Modular	Modular	ATL	Rack
Drive interface	SCSI/FC drives	SCSI/FC	ATA/SATA/FATA	FC	People
Drive/media reliability: mean time between failure (hours)	1.2 million+	1.2 million+	600K+	1 million+	1 million+
Performance: rpm	10K – 15K	10K – 15K	7.2K
Performance: seek time	<6 ms.	<15 ms.	<1 sec.	<1 min.	Days
Key environments	Mission-critical/online transaction processing	Business-critical	Fixed content, WORM archival, back-up	Back-up, archival WORM	Archival, back-up
Source: Meta Group

Sample business rules matrix
APPLICATION	BUSINESS CONTINUITY	RESPONSE TIME	RECOVERY TIME	RETENTION	CONFIDENTIALITY
Routine email	Operational	Online	High	Short term	Low
HR email	Operational	Online	High	Long term	High
Order processing	Mission	Online	High	Medium term	Medium
Marketing analytics	Business	Analytical	Low	Short term	Medium
Financial processing	Business	Offline batch	Medium	Long term	High (special)
Source: Meta Group

Pete Swabey
