Data quality

Overview: What is data quality?

Corporate systems are awash with inaccurate, incomplete, out-of-date, redundant, meaningless and duplicated data. It is hardly surprising, therefore, that companies consistently make poor and ill-guided decisions. Or, as the mantra of the data or information quality industry puts it: 'You put garbage in, you get garbage out'.

That problem has escalated substantially over the last ten years as companies have implemented software such as enterprise resource planning (ERP), customer relationship management (CRM) and supply chain management to automate business processes and to capture information about business transactions in ever-burgeoning databases. As overall data volumes have grown, so has the proportion of 'dirty data'. The problem, moreover, is compounded by data acquired through mergers and acquisitions.

Poor quality data manifests itself in many ways. In a 2004 survey of 88 IT developers, managers and executives conducted by The Data Warehousing Institute and IT market research company Forrester Research, 30% of respondents indicated that data quality problems were serious enough to require or attract the attention of the executive function. Some 12% acknowledged missed deadlines in closing the company books and 10% said revenues had been improperly booked or credited due to data quality lapses.

However, more and more businesses now recognise that information is a valuable corporate asset, a vital tool in the struggle to improve customer service, identify new business opportunities and shorten the sale-to-cash cycle. That recognition is driving many of them to invest in information quality tools. These enable them to establish data consistency and quality through the use of data profiling, data standardisation and data matching technologies.

Data quality definition

Good quality data is accurate, relevant and up-to-date. Data quality tools identify data that does not meet these standards and remove it from corporate databases. By implementing these tools, managers can have greater confidence that the data they have will equip them to make better-informed business decisions.


The rising profile of data quality tools

A number of factors are pushing data quality to the top of the boardroom agenda, according to analysts at IT market research company Forrester Research:

° Data defects

Regulatory compliance requirements, such as the Sarbanes-Oxley Act, have refocused attention on information quality and raised it to a boardroom concern. Increasingly, missed deadlines in closing company accounting books and in statutory reporting have been blamed on data defects and quality issues.

° Poor quality of CRM systems

According to Forrester analyst Lou Agosta, information quality is the weak underbelly of CRM implementations and many systems fail to deliver an accurate, 360-degree view of the customer. Information quality tools need to identify individual customers across multiple datasets and eliminate duplications.

° Bad data creates costly, operational inefficiencies

Duplicated customer or product data creates redundant information that affects all downstream processes that use it. Backups, system interfaces and repeated verification of the same data increase the cost of daily storage management processes. Productivity is also hit as tasks are repeated.

° Mergers, acquisitions and reorganisation require data integration

Mergers and acquisitions of companies create critical compatibility issues between different information technology systems. If data is not carefully inventoried and evaluated, there is the risk of dysfunctional islands of information and data silos being created.

° Loss of trust

A lack of data and information quality across systems reduces the value of all systems to employees as it becomes difficult for them to judge which one is accurate.

Source: Forrester Research



Data profiling and data cleansing

Data quality tools fall into two broad categories: tools for data profiling and tools for data cleansing.

Data profiling enables a company to get an understanding of its data: what information is stored, where, its structure and the anomalies and inconsistencies it contains. These might include invalid data structures, incorrect values, missing values, duplicates, misplaced fields or inconsistencies between values stored in different systems. Trying to uncover such problems manually is time-consuming, prone to error and expensive.
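
As a rough illustration of what a profiling pass looks for, the minimal sketch below counts missing values per field and flags duplicate records in a small in-memory dataset. The function name `profile`, the field names and the sample records are all hypothetical; commercial profiling tools do far more than this.

```python
from collections import Counter


def profile(records, key_fields):
    """Summarise missing values per field and duplicated key combinations."""
    # Collect every field name that appears in at least one record.
    fields = sorted({name for rec in records for name in rec})

    # A value counts as missing if the field is absent, None or empty.
    missing = {
        f: sum(1 for rec in records if rec.get(f) in (None, ""))
        for f in fields
    }

    # Records sharing the same values in key_fields are treated as duplicates.
    keys = Counter(tuple(rec.get(f) for f in key_fields) for rec in records)
    duplicates = {key: count for key, count in keys.items() if count > 1}

    return {"rows": len(records), "missing": missing, "duplicates": duplicates}


# Hypothetical customer records, for illustration only.
customers = [
    {"id": 1, "name": "Ann Lee", "postcode": "AB1 2CD"},
    {"id": 2, "name": "Ann Lee", "postcode": "AB1 2CD"},
    {"id": 3, "name": "Bob Ray", "postcode": ""},
]

report = profile(customers, key_fields=("name", "postcode"))
print(report)
```

Here the report shows one missing postcode and one name-and-postcode combination that appears twice, which is the kind of finding a data manager would then turn into a cleansing rule.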

With a clear understanding of data quality problems gained from data profiling tools, data managers know precisely what corrective measures and cleansing rules need to be applied. Data cleansing software applies these rules automatically and ensures that the data is standardised and corrected: duplicates are removed, structures such as field lengths and formats are standardised, and values are corrected.
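
The cleansing step can be sketched in the same spirit. The example below assumes two simple, hypothetical rules (tidy the spacing and capitalisation of names, and store postcodes in upper case without spaces) and then removes records that become identical once standardised. The `cleanse` function and the sample data are illustrative and not drawn from any supplier's product.

```python
def cleanse(records):
    """Standardise name and postcode formats, then drop duplicate records."""
    seen = set()
    cleaned = []
    for rec in records:
        # Rule 1 (assumed): collapse repeated spaces and use title case.
        name = " ".join(rec["name"].split()).title()
        # Rule 2 (assumed): upper-case postcodes and strip internal spaces.
        postcode = rec["postcode"].replace(" ", "").upper()

        # Keep only the first record for each standardised combination.
        key = (name, postcode)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"name": name, "postcode": postcode})
    return cleaned


# Hypothetical raw records: the first two describe the same customer.
raw = [
    {"name": "ann  lee", "postcode": "ab1 2cd"},
    {"name": "Ann Lee ", "postcode": "AB12CD"},
    {"name": "bob ray", "postcode": "ef3 4gh"},
]

result = cleanse(raw)
print(result)
```

Standardising before matching is what allows the first two records to be recognised as duplicates; compared as raw strings they would look like different customers.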


Key data quality suppliers

° Ascential

Ascential was purchased by IBM in 2005. Its flagship product is the DataStage suite, aimed primarily at the data extraction, transformation and loading (ETL) markets.

° DataFlux/SAS Institute

DataFlux was bought by SAS Institute in June 2000. Its suite offers data profiling, data quality, integration, data enrichment and data monitoring capabilities.

° Firstlogic

Firstlogic's Information Quality Suite provides tools for data profiling, cleansing, enhancement and consolidation.

° Group 1 Software

Group 1 Software is a subsidiary of Pitney Bowes. Its product is aimed at improving CRM data.

° Similarity Systems

Similarity Systems' Athanor suite includes data profiling, standardising, matching, cleansing and data enrichment tools.

° Trillium

Trillium is owned by marketing specialists Harte-Hanks and specialises in data investigation, standardisation, information enrichment and data linking.



Dirty data: causes and effects

What problems does dirty data create?

° Extra time to reconcile transactions

° Delays in deploying new systems

° Loss of credibility in the system

° Loss of revenue

° Extra overhead costs

° Customer dissatisfaction

° Compliance problems

Source: The Data Warehousing Institute


What's new in data quality?

Data quality is growing in both sophistication and popularity, according to Forrester analyst Lou Agosta. Customers for data quality tools should watch for several important trends in 2005:

° An important growth area will be the maturing of data quality standards and methods to access data and its accompanying metadata. The lack of metadata quality is a major source of problems, as anomalies tend to skew data structures and their content.

° Policy-based information quality is also set to play a role in establishing better quality data. Policies that define standards for information quality will be related to an information quality methodology that sets out patterns for raising the capabilities and maturity of the enterprise's relationship to information quality. An integrated methodology will help achieve a defined, repeatable and measurable process to increase information quality and achieve improvements.

° Data standardisation and information quality are likely to become increasingly differentiated. Without an understanding of how data standards interact with raw data, standardisation can result in a loss of data. Agosta argues, therefore, that matching standards to end-user requirements is the ultimate goal and not the standard in itself.




Ben Rossi

Ben was Vitesse Media's editorial director, leading content creation and editorial strategy across all Vitesse products, including its market-leading B2B and consumer magazines, websites, research and...
