Why data lakes don’t always need to belong to data scientists

If your organisation has started collecting and consolidating data that was previously going to waste, and is now trying to mine it for value, you’re not alone.

Many businesses now recognise that big data is a valuable asset. By consolidating business information into one place and integrating it with third-party sources, companies can build a “data lake” that can offer unprecedented insights into customer behaviour, operations and more.

Yet, according to Gartner, data lakes need to be approached with caution. By breaking down the silos traditionally used to store data, businesses also lower the barriers to entry.

Qualified data scientists can strategically analyse disparate data sets, but other business users can tap that data too. While this can provide advantages, there are also risks.

If people apply incorrect methodologies to aggregated data, they can reach the wrong conclusions and base important decisions on false information, or even corrupt sensitive data.

Gartner’s advice about control of and access to data lakes is well taken, but that doesn’t mean companies should be dissuaded from building and maintaining data lakes.

Overall, consolidating data across systems and silos allows that information to be used and applied far more powerfully. Data lakes aren’t inherently good or bad – it’s the way they are accessed and used that’s key.


Unfortunately, there’s a shortage of experienced and genuinely skilled data scientists to direct this process for businesses. Without qualified people who know how to sort through a data lake – and draw meaningful conclusions from it – what is the point of collecting information?

How can companies build data lakes and benefit from them, if they only have a few people who can properly use the information?

This is why there needs to be an additional layer built on top of data lakes, something that can automatically process, refine and learn from the information. This layer would feed business-oriented applications, each applying selected data to reach a specific value-generating goal. By “packaging” data science in this way, these applications can become truly powerful and democratise access to data for all stakeholders.

Allowing everyone to tap directly into the data lake may not be the best approach, but that doesn’t mean data lakes should be ignored due to excessive worries over access.

Instead, there needs to be a system of controls and analytics built on top of the lake so that business users – not just data scientists – can safely access the information and benefit from actionable insights.

The art of refinement

Say that a communications service provider (CSP) is collecting and consolidating information about its networks and users. Its data lake would then hold information on networks, technical details, subscribers, services, locations and usage patterns.

Not only that – third-party data, such as web traffic, social media and even weather forecasts, is going to end up in the same data lake.

Without a team of data scientists to analyse all that data, it may end up sitting in the data lake unused. The sheer volume of information makes it impossible for an individual to manually or visually evaluate every possible data set and find all of the relevant opportunities and relationships it contains.

This is where a data refinement layer comes in. As data is consolidated, the refinement layer would process, evaluate, correlate and learn from the information passing through it, generating additional insights from the data and feeding them to the business applications described above to drive value.

By enhancing data lakes with a refinement layer, companies can ensure that every business user can explore those opportunities – even if they aren’t experts in data science.

If a new customer trend or a potential operational anomaly appears, the refinement layer would notify the right team automatically. Because much of the relevant information would already have been refined and analysed, this automation would reduce the need for data scientists in routine tasks and free them to focus on where they can generate the most value for the organisation.
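To make the idea of automatic notification a little more concrete, here is a minimal Python sketch of one way a refinement layer could flag an operational anomaly and route it to the owning team. It assumes a simple z-score test on daily metric values; the metric names, the `TEAM_FOR_METRIC` routing table and the three-standard-deviation threshold are all hypothetical, chosen for illustration rather than taken from any particular product.

```python
from statistics import mean, stdev

# Hypothetical routing table: which team owns which metric (illustrative only).
TEAM_FOR_METRIC = {
    "dropped_calls": "network-operations",
    "plan_upgrades": "marketing",
}


def detect_anomaly(history, latest, threshold=3.0):
    """Return True if `latest` lies more than `threshold` sample standard
    deviations from the mean of `history` (a simple z-score test)."""
    if len(history) < 2:
        return False  # not enough data to estimate the spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return latest != mu  # flat history: any change counts as unusual
    return abs(latest - mu) / sigma > threshold


def route_alerts(metrics, threshold=3.0):
    """Given {metric_name: (history, latest)}, return a list of
    (team, metric_name, latest) tuples for the anomalous metrics."""
    alerts = []
    for name, (history, latest) in metrics.items():
        if detect_anomaly(history, latest, threshold):
            # Unmapped metrics fall back to a hypothetical default owner.
            team = TEAM_FOR_METRIC.get(name, "data-science")
            alerts.append((team, name, latest))
    return alerts


if __name__ == "__main__":
    daily = {
        "dropped_calls": ([120, 118, 125, 122, 119, 121], 190),
        "plan_upgrades": ([40, 42, 39, 41, 43, 40], 41),
    }
    for team, metric, value in route_alerts(daily):
        print(f"notify {team}: {metric} = {value}")
```

With the sample data, only the spike in dropped calls is far enough from its recent average to trigger a notification to the network operations team; the steady upgrade figures raise nothing. A real refinement layer would use richer models than a single z-score, but the pattern of scoring each metric and routing the result to its owner is the same.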

This “data refinery” layer refines the unprocessed information in the data lake and turns it into relevant insights. Need to know which customers are most likely to purchase an upgraded data plan? Or perhaps you need to combine network, usage and social media data to identify customers who are likely to churn. All these solutions can run on a daily or even real-time basis.

From raw recommendations to automated action

Gartner might urge companies to be wary of data lakes because of the difficulty of building solid data governance strategies. But there is a way to maximise the use of data lakes while minimising risk, if you can build a layer on top of the lake that refines the information.


When done properly, data consolidation and integration can lead to far more benefits than harm. If a refining layer is built between the data lake and business users, then the business users will be able to leverage analytics-driven use cases rather than raw, unregulated data. 

A true data refinery will deliver recommendations that end users can apply to improve decision-making and drive actions in technical operations, marketing campaigns, sales processes and business decisions, shortening the time to action across the board.

And anyone in the business can use the findings immediately to meet their goals, moving the organisation forward and extracting real value from all that big data.


Sourced from Matti Aksela, Comptel


Ben Rossi

Ben was Vitesse Media's editorial director, leading content creation and editorial strategy across all Vitesse products, including its market-leading B2B and consumer magazines, websites, research and...
