A question of quality
The value of business intelligence (BI) is undermined entirely if the data being analysed is inaccurate, incomplete or inconsistent. And data quality is more complex than simply checking that names and addresses are recorded properly: poor quality data threatens to undermine crucial business operations.
The extent of the problem is so severe that industry analyst group Gartner predicts that through 2007, more than half of all data warehouse projects will have limited acceptance, or be outright failures, as a direct result of a lack of attention to data quality issues.
For too long now, organisations have relied on extraction, transformation and loading (ETL) tools without looking at the underlying quality of the data, often taking data quality as a given, says Ed Wrazen, marketing vice president at Trillium Software: "ETL is more of a process for migrating and mapping the data; it doesn't really address data quality."
Infestation
One-off, isolated cases of bad quality data are easy to overlook or ignore, but a report by analysts Butler Group recently likened the problem to an infestation of woodworm. "A solitary insect will go unnoticed and not cause too many problems. However, without adequate protection, one can soon become an infestation."
As with many issues, preventative tactics are far more successful and cost-effective than having to engage in repairs. Data quality should be an ongoing process rather than an ongoing saga: it starts the moment data enters the organisation and should finish only when that data is erased for good.
Without measuring the quality of data as an asset to the company, it is hard to manage the data quality process. "We're still in the fire-fighting mode but we are actually starting to measure," says Stephen Brobst, chief technical officer at data warehousing firm Teradata. "But in my experience, no matter how bad people say their data is, it's worse," he adds.
The first step to resolving a problem is to admit you have one. One of those not pulling the wool over their own eyes is Amanda Hughes, senior manager of wholesale data management at Lloyds TSB: "We've been looking at data for the best part of last year and have been horrified at just how bad it is. We've got our work cut out." The bank has 50,000 customers with corporate relationships, and by calculating that poor data quality was costing it £200 million, Hughes got the budget she needed to start improving matters.
She blames the lack of incentives pushing staff to input data accurately and has added an element of competition, measuring by service centre, to spur better practice. "We do data quality by embarrassment," she says. When Lloyds TSB started looking at its data, 26% of all non-personal records had measured errors or were incomplete. Eighteen months later, that figure was down to 15%, but mainly because of the improved quality of brand new data.
Cleansing existing data poses its own set of problems, but nevertheless is unavoidable. Data cleansing tools - such as Trillium Software Discovery - typically work with data after it has been profiled, and areas needing correction or alteration have been identified. Wrazen advises clients to first profile, understand and explore their data, and then target inconsistencies either by correcting the business process from which they originate, or by applying automated rules to make any corrections.
Again, data cleansing is not a one-off exercise: data is in continuous flux, and its quality can deteriorate over time.
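The profile-then-correct approach Wrazen describes can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the record layout, the field names, the choice of required fields and the postcode rule are hypothetical, and it does not represent how Trillium's tools or any bank's systems actually work.

```python
from collections import Counter

# Hypothetical customer records; field names are illustrative only.
records = [
    {"id": 1, "name": "Acme Ltd", "postcode": "ec1a 1bb"},
    {"id": 2, "name": "", "postcode": "SW1A 2AA"},
    {"id": 3, "name": "Bolt plc", "postcode": None},
    {"id": 4, "name": "Crane & Co", "postcode": "M1  1AE"},
]

# Assumption: these are the fields a record needs to count as complete.
REQUIRED = ("name", "postcode")


def profile(rows):
    """Step 1, profiling: count missing values for each required field."""
    missing = Counter()
    for row in rows:
        for field in REQUIRED:
            if not row.get(field):
                missing[field] += 1
    return dict(missing)


def clean_postcode(value):
    """Step 2, an automated rule: upper-case and collapse stray whitespace."""
    if not value:
        return value  # nothing to correct; the gap has to be fixed at source
    return " ".join(value.upper().split())


def cleanse(rows):
    """Apply the automated rule to every record, leaving other fields alone."""
    return [{**row, "postcode": clean_postcode(row["postcode"])} for row in rows]


def error_rate(rows):
    """Share of records with at least one required field missing."""
    bad = sum(1 for row in rows if any(not row.get(f) for f in REQUIRED))
    return bad / len(rows)


if __name__ == "__main__":
    print(profile(records))
    print(cleanse(records))
    print(error_rate(records))
```

The automated rule tidies formatting but cannot invent a missing name or postcode; those gaps still show up in the error rate, which is why the source process has to be corrected as well. Re-running the profile on a schedule gives a simple way to track whether quality is improving or deteriorating.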
According to the Data Warehouse Institute, 76% of dirty data is a result of unsatisfactory data entry by employees. At Coca-Cola, Keith Henry, director of data warehousing, explains that the sheer number of acquisitions alone has forced his department to make rapid advances in dealing with data quality. "We even have four people just watching the feeder systems coming into the data warehouse every morning, and they conduct sample tests. Any problems and a message goes out to the sales centre in question, which is then given a deadline to re-submit their data," says Henry.
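A sample test on incoming feeds, of the kind Henry describes, might look something like the sketch below. It is not Coca-Cola's actual process: the feed structure, the validity check, the sample size and the 5% error threshold are all hypothetical choices made for illustration.

```python
import random


def sample_check(rows, is_valid, sample_size=100, max_error_rate=0.05, seed=None):
    """Test a random sample of a feed; return (passed, observed_error_rate).

    sample_size and max_error_rate are illustrative defaults, not
    recommendations. Assumption: an empty feed fails, since there is
    nothing to verify.
    """
    if not rows:
        return False, 0.0
    rng = random.Random(seed)
    # Check the whole feed when it is smaller than the requested sample.
    sample = rows if len(rows) <= sample_size else rng.sample(rows, sample_size)
    errors = sum(1 for row in sample if not is_valid(row))
    rate = errors / len(sample)
    return rate <= max_error_rate, rate


def centres_to_resubmit(feeds, is_valid, **kwargs):
    """Return the names of sales centres whose feed failed the sample test."""
    return [
        centre
        for centre, rows in feeds.items()
        if not sample_check(rows, is_valid, **kwargs)[0]
    ]


if __name__ == "__main__":
    # Hypothetical morning feeds, keyed by sales centre.
    feeds = {
        "North": [{"amount": 10}, {"amount": 5}],
        "South": [{"amount": None}, {"amount": 3}],
    }
    has_amount = lambda row: row["amount"] is not None
    print(centres_to_resubmit(feeds, has_amount))
```

In practice the list of failing centres would feed whatever notification and deadline process the organisation uses; that part is left out here because the article does not describe it.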
Certainly, by tackling the problem at the source, poor quality data can be remedied before it affects the enterprise's business intelligence procedures.
Analyst firm Forrester Research is predicting that the overall information quality market is on course to pass the $1 billion mark in 2008, an indicator that the market is both maturing and becoming more business critical. Its current growth rate is 14%, far higher than the 7% average growth forecast for other IT segments.
With data quality no longer seen as a problem to be left to the IT department, the prospects are good.