With the birth of the Internet and the pervasive nature of technology, it’s no wonder the majority of the world’s data has been generated over the last few years. As we continue to embrace the Internet of Things (IoT), it’s safe to say we’re on track to break data-generation records year on year.
This explosion of data is pushing enterprises in a more data-driven direction; organisations are performing complex analysis on their data to develop new revenue streams, streamline operations and enhance the customer experience.
One of the key concerns during this analysis is the quality of the data. With IT systems comprising legacy, cloud and standalone applications, plus the integration of social network and third-party feeds, keeping this data synchronised is a real challenge.
Factors affecting data quality
Over time, original reference data can become fragmented for myriad reasons. The three we see most commonly are:
- Master data being held across multiple applications, often with different data architectures;
- A dependency on the end user to keep their information updated, despite the user having no motivation to do so; and
- Data being updated in only one application when it should be updated across multiple systems in real time, without impacting the existing set-up.
As soon as the data is out of sync, the effort and money invested in data analytics is effectively wasted.
Improving data quality management
Data quality management poses its own challenges. Synchronising data across systems often requires complex string-comparison operations, and the process can demand costly changes to the data design of existing applications.
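As a rough illustration of the string-comparison work involved, the kind of fuzzy matching used to decide whether two systems hold the same record can be sketched with Python’s standard difflib module. The field values and the 0.8 threshold below are illustrative assumptions, not part of any specific product:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score between two normalised field values."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

# Two renderings of the same customer name held in different systems.
crm_name = "Jon A. Smith"
billing_name = "Jon Smith"

# A threshold (0.8 here is an arbitrary illustrative choice) decides whether
# the two records are treated as the same entity and flagged for reconciliation.
if similarity(crm_name, billing_name) > 0.8:
    print("probable match - flag for reconciliation")
```

In practice the threshold, the normalisation rules and the choice of matching algorithm all have to be tuned per field, which is part of what makes this synchronisation costly.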
However, there is already a solution that can be used to improve data quality, one that is rooted in existing best practices of software development.
In a typical software development project, multiple developers work on individual pieces of the software’s functionality. When the code from each developer is combined, these pieces come together into the single desired application.
Throughout this process, the code fragments are maintained under version control. This is not only a repository of code but also an intelligent application that allows developers to track which code blocks were written and why. It also records which version of a code fragment has been used, which is critical for tracking edits and for selective rollback, should it be required. The logic here is to apply the same process to data quality management.
The first step of this process is to identify the key data and baseline it for reference – this is known as the initial version. This version becomes the starting point within the ocean of data, and any subsequent change is identified by a number or letter code.
At periodic intervals, the version of each data point is checked across all applications; any mismatch indicates an out-of-sync data point. These are then brought back in sync by authorising the change (if required) and replicating it in all applicable applications.
The ideal scenario here is that the version of the data point is the same in all applications at any point in time.
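The baseline-and-compare workflow described above can be sketched as follows. The application names, data-point keys and version labels are hypothetical, and a real implementation would read the version stamps from the systems themselves rather than a literal dictionary:

```python
# Each application reports the version label it currently holds for each
# key data point; the baseline captured at the start would be version "v1".
versions = {
    "crm":     {"customer_42_address": "v3", "customer_42_email": "v2"},
    "billing": {"customer_42_address": "v3", "customer_42_email": "v1"},
    "support": {"customer_42_address": "v3", "customer_42_email": "v2"},
}

def out_of_sync(versions: dict) -> dict:
    """Return data points whose version label differs across applications."""
    mismatches = {}
    keys = {key for app in versions.values() for key in app}
    for key in keys:
        seen = {app: held[key] for app, held in versions.items() if key in held}
        if len(set(seen.values())) > 1:
            mismatches[key] = seen
    return mismatches

for key, seen in sorted(out_of_sync(versions).items()):
    print(f"{key}: {seen}")  # candidates for authorised replication
```

Here the address is at the same version everywhere, so only the email field is flagged; the mismatch report tells the team exactly which application holds the out-of-date copy that needs the authorised change replicated to it.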
In addition to keeping data across applications in sync, this method of data quality management brings other benefits:
- implementation is non-invasive and doesn’t require major system changes;
- the operational efficiency of all systems improves;
- data analytics teams become more productive; and
- rollback and data recovery are straightforward.
Deploying an effective data quality management programme is not an easy task, but the rewards make it worthwhile, especially in terms of productivity and customer service. This disciplined and novel approach will better serve any data-driven organisation that needs accurate, trustworthy data to succeed in business.
Sourced from Siddharatha Joshi, Technology Analyst, Tech Mahindra