The most common mistakes in data preparation

Data and analytics remain a top investment priority for CTOs. Time and again, research has shown that big data, advanced analytics and artificial intelligence can dramatically improve an organisation’s performance.

According to Gartner, businesses that embrace data and analytics at a transformational level enjoy increased agility, better integration with partners and suppliers, and easier use of advanced predictive and prescriptive forms of analytics. This all translates to competitive advantage and differentiation.

>See also: The success of artificial intelligence depends on data

But according to recent research among data professionals working closely with global organisations trying to use data to their best advantage, many fundamental mistakes happen very early, in the data preparation stages, and these hold progress back. Fortunately, these issues can be overcome:

1. Spending too much time preparing data – New figures suggest that 60% of IT professionals spend more than half of their time at work on data quality assurance, clean-up or preparation. Based on Glassdoor salary estimates and IDC’s estimate that there are 18 million IT operations and management professionals globally, that means organisations are spending more than $450 billion on data preparation – roughly $25,000 per professional per year. That is a lot of resource to spend before proving the value of a project. Instead, organisations should explore more cost-effective and efficient ways to reverse this pattern. Intelligent data preparation platforms are one method.
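The headline figure follows from simple arithmetic on the cited estimates. A minimal sketch, assuming an average fully loaded salary of around $50,000 (a figure implied by the totals, not stated explicitly in the research):

```python
professionals = 18_000_000   # IDC estimate of IT ops and management professionals
avg_salary = 50_000          # assumed average salary (Glassdoor-style estimate)
share_on_prep = 0.5          # "more than half" of working time spent on data prep

annual_spend = professionals * avg_salary * share_on_prep
print(f"${annual_spend / 1e9:.0f} billion")  # → $450 billion
```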

2. Relying too heavily on IT departments – Many data and analytics teams rely heavily on their IT department to source the data they need to run their projects. To be exact, 59% of data analysts say they are dependent on IT resources to prepare or access data. Given that IT departments are often focused on everything from keeping day-to-day operations running to complying with legislation and launching new products and services, this can cause significant delays to what should be fast-moving projects. 82% of analysts believe they would be able to drive increased value from their analysis projects with a decreased dependency on IT. To overcome this issue, big data solutions must become more ‘people friendly’, so that non-IT experts across different lines of business can use them.

>See also: Data scientists: What they do and why businesses need them

3. Preparing data without context of the use case – An in-depth understanding of the business use case at the preparation stage of any analytics initiative is crucial. This is another challenge when outsourcing data requirements to IT: while IT teams have the technical capabilities, they often lack the context and detail needed to identify relevant information about the data. Without that context, an organisation can spend untold cycles trying to achieve the best iteration of data required to make a project successful. Knowing upfront what is important to a particular use case means a business can maximise the outcome of the analysis.

4. Bringing data scientists into the preparation stages – We should keep front of mind that data scientists are highly trained powerhouses doing complicated work that generates real value. The average salary for a data scientist in London is around £65,000 per annum, which makes them a precious commodity to be deployed strategically. Yet they can spend the bulk of their time preparing data instead of doing the complex work they were hired for. 60% of IT professionals rightly consider themselves overqualified to be spending a large proportion of their time preparing data. Many of them go on to explain that their time would be better spent modelling, finding insights or designing programmes, but until their time is freed from the pain of data preparation, how can this be possible?

>See also: What are the real opportunities for big data in the digital world?

5. Preparing data manually – Manual data preparation tools like Microsoft Excel can hinder collaboration and efficiency, but they remain popular among analysts and IT professionals alike: 37% of data analysts and 30% of IT professionals use Excel more than any other tool to prepare data. This reliance on manually driven data preparation tools will continue to delay data initiatives and deter new insights. It is a no-brainer. Organisations like financial services provider Deutsche Börse have explored how data preparation platforms accelerate these processes to fast-track new data-led product development.

6. Not spotting data quality issues – When preparing data, it’s essential to ensure that the outcome is consistent, conformant, complete and current. Teams should check every dataset against these 4 C’s of data quality and identify any issues early, and often. Remediating data quality issues can significantly improve the end analysis. For example, marketing lead data is far more valuable when it has been enriched with external data to complete missing values. Or consider the difference between outdated and up-to-date data when predicting sales and calculating margins. In both instances, improperly prepared data can have a huge impact.
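The 4 C’s lend themselves to simple automated checks. A minimal sketch, using hypothetical marketing-lead records and made-up rule thresholds (the field names, country list and one-year freshness rule are illustrative assumptions, not from the research):

```python
from datetime import date, timedelta

# Hypothetical marketing-lead records; field names are illustrative.
leads = [
    {"email": "a@example.com", "country": "UK", "updated": date(2018, 1, 10)},
    {"email": None,            "country": "uk", "updated": date(2016, 3, 2)},
]

REQUIRED = ("email", "country", "updated")
VALID_COUNTRIES = {"UK", "DE", "FR"}   # conformance rule: agreed country codes
MAX_AGE = timedelta(days=365)          # currency rule: refreshed within a year

def quality_issues(record, today=date(2018, 6, 1)):
    """Return the 4 C's violations found in one record.
    (Consistency needs cross-record comparison, so it is omitted here.)"""
    issues = []
    if any(record.get(f) is None for f in REQUIRED):
        issues.append("incomplete")        # Complete: no missing values
    if record.get("country") not in VALID_COUNTRIES:
        issues.append("non-conformant")    # Conforms: matches the agreed format
    if record.get("updated") and today - record["updated"] > MAX_AGE:
        issues.append("stale")             # Current: recently refreshed
    return issues

for lead in leads:
    print(lead["email"], quality_issues(lead))
# → a@example.com []
# → None ['incomplete', 'non-conformant', 'stale']
```

Running rules like these against every dataset, early and often, is exactly the kind of routine work that data preparation platforms automate.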

>See also: The top five data trends coming in 2018

Sourced by Adam Wilson, CEO, Trifacta



Andrew Ross

As a reporter with Information Age, Andrew Ross writes articles for technology leaders, helping them manage business-critical issues both for today and in the future.