Overview: What is data quality?
Corporate systems are awash with inaccurate, incomplete, out-of-date, redundant, meaningless and duplicated data. It is hardly surprising, therefore, that companies consistently make poor and ill-guided decisions. Or, as the mantra of the data or information quality industry puts it: 'You put garbage in, you get garbage out'.
That problem has escalated substantially over the last ten years as companies have implemented software such as enterprise resource planning (ERP), customer relationship management (CRM) and supply chain management to automate business processes and to capture information about business transactions in ever-burgeoning databases. As overall data volumes have grown, so has the proportion of 'dirty data'. The problem, moreover, is compounded by data acquired through mergers and acquisitions.
Poor-quality data manifests itself in many ways. In a 2004 survey of 88 IT developers, managers and executives conducted by The Data Warehousing Institute and IT market research company Forrester Research, 30% of respondents indicated that data quality problems were serious enough to require or attract the attention of the executive function. Some 12% acknowledged missed deadlines in closing the company books and 10% said revenues had been improperly booked or credited due to data quality lapses.
However, more and more businesses now recognise that information is a valuable corporate asset, a vital tool in the struggle to improve customer service, identify new business opportunities and shorten the sale-to-cash cycle. That recognition is driving many of them to invest in information quality tools. These enable them to establish data consistency and quality through the use of data profiling, data standardisation and data matching technologies.
Data quality definition
Good-quality data is accurate, relevant and up-to-date. Data quality tools identify data that does not meet these standards and remove it from corporate databases. By implementing these tools, managers can have greater confidence that the data they have will equip them to make better-informed business decisions.
Data profiling and data cleansing
Data quality tools fall into two broad categories: tools for data profiling and tools for data cleansing.
Data profiling enables a company to understand its data: what information is stored, where it is stored, how it is structured, and what anomalies and inconsistencies it contains. These might include invalid data structures, incorrect values, missing values, duplicates, misplaced fields or inconsistencies between values stored in different systems. Trying to uncover such problems manually is time-consuming, prone to error and expensive.
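As a rough illustration of what a profiling pass looks for, the sketch below counts missing values per field and repeated key values in a handful of records. It is a minimal example and not how any particular commercial tool works; the function name profile, the field names and the sample customer records are all invented for illustration.

```python
from collections import Counter


def profile(records, key_field):
    """Summarise missing values per field and duplicated key values.

    Hypothetical helper for illustration only.
    """
    # Collect every field name that appears in at least one record.
    fields = sorted({name for record in records for name in record})

    # A value counts as missing if the field is absent, None or empty.
    missing = {
        name: sum(1 for record in records if record.get(name) in (None, ""))
        for name in fields
    }

    # Count how often each non-empty key value occurs.
    key_counts = Counter(
        record.get(key_field)
        for record in records
        if record.get(key_field) not in (None, "")
    )
    duplicates = {value: n for value, n in key_counts.items() if n > 1}

    return {"rows": len(records), "missing": missing, "duplicates": duplicates}


# Invented sample data: one blank email, one absent postcode, one repeated id.
customers = [
    {"id": "C001", "email": "ann@example.com", "postcode": "SW1A 1AA"},
    {"id": "C002", "email": "", "postcode": "sw1a1aa"},
    {"id": "C001", "email": "ann@example.com", "postcode": "SW1A 1AA"},
    {"id": "C003", "email": "bob@example.com"},
]

report = profile(customers, "id")
print(report)
```

A real profiling tool would go much further, for example by checking formats, value ranges and consistency across systems, but the report above shows the kind of summary that informs the cleansing rules.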
With a clear understanding of data quality problems gained from data profiling tools, data managers know precisely what corrective measures and cleansing rules need to be applied. Data cleansing software applies these rules automatically and ensures that the data is standardised and corrected: duplicates are removed, structures such as field lengths and formats are standardised, and values are corrected.
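The cleansing step described above can be sketched in a similar way. The example below applies two illustrative standardisation rules (lower-case, trimmed email addresses and upper-case postcodes without spaces) and then drops records whose key has already been seen. The rules, the function names standardise and cleanse, and the sample records are assumptions made for this sketch; real cleansing rules would come out of the profiling exercise.

```python
def standardise(record):
    """Apply illustrative format rules to one record (hypothetical rules)."""
    cleaned = dict(record)
    # Rule 1 (assumed): emails are trimmed and lower-cased.
    cleaned["email"] = (record.get("email") or "").strip().lower()
    # Rule 2 (assumed): postcodes are upper-cased with all whitespace removed.
    cleaned["postcode"] = "".join((record.get("postcode") or "").split()).upper()
    return cleaned


def cleanse(records, key_field):
    """Standardise every record and keep only the first one per key value."""
    seen = set()
    result = []
    for record in records:
        cleaned = standardise(record)
        key = cleaned.get(key_field)
        if key in seen:
            continue  # duplicate of an earlier record: drop it
        seen.add(key)
        result.append(cleaned)
    return result


# Invented sample data: the first two rows describe the same customer.
raw = [
    {"id": "C001", "email": " Ann@Example.com ", "postcode": "sw1a 1aa"},
    {"id": "C001", "email": "ann@example.com", "postcode": "SW1A1AA"},
    {"id": "C002", "email": "BOB@example.com", "postcode": "ec1a 1bb"},
]

clean = cleanse(raw, "id")
print(clean)
```

Keeping the first record per key is the simplest possible policy. In practice a tool might instead merge duplicates or pick the most recently updated record, which is a design decision for the data managers rather than something this sketch settles.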
Dirty data: causes and effects
What problems does dirty data create?
° Extra time to reconcile transactions
° Delays in deploying new systems
° Loss of credibility in the system
° Loss of revenue
° Extra overhead costs
° Customer dissatisfaction
° Compliance problems
Source: The Data Warehousing Institute