Why open source can save companies from drowning in the data lake

Many companies are either struggling to tap their big data reserves, or are drowning in data overload


‘Open source opens up a world of access to the latest technology that would otherwise take extensive time and resources to develop’


The end goal of any big data initiative is to deliver key insights very quickly, if not in real time. While the first step of gathering data is challenging, today's technology is more than capable of handling it.

What comes next – extracting accurate insights in real time and turning them into foresight – is something enterprises have yet to nail.

When put to good use, data can provide endless opportunities for innovation and growth, saving money and time while also expediting services. Yet despite the opportunity to yield big insights from big data, many businesses fall into one of two camps: those unable to tap their big data reserves, and those drowning in data overload.

One of the major reasons is that big data is proving difficult to manage. Terms like ‘data lake’ have been coined to describe repositories for storing relevant data requiring analysis.

>See also: How to turn a data swamp into a data lake: best and worst practices

However, given the rapid accumulation of structured, unstructured and semi-structured data housed in data lakes, they increasingly resemble a data wasteland for companies without the faintest idea of how to use this information. These companies know they should be able to make decisions in real time, yet they struggle to integrate traditional and digital methods.

Many CIOs and IT teams reach for the quick fix of collecting data en masse, in the hope that analysing it will become easier later. This approach is not only inefficient and expensive; it can stop a data analytics project in its tracks. Instead, businesses should start these initiatives the other way around.

First – i.e. before collecting data – businesses should determine what insights they want and need from customers, competitors and allies, prioritising high business value.

This means that instead of trawling through data looking for common links or themes, businesses should be prepared with a strategy to discern the information that is most relevant to them, to effectively maximise their time and money.

By applying methodologies like design thinking and agile development, businesses can ensure that they focus on the right problems and develop highly viable and feasible solutions.

The best way to analyse the gathered information is to build a custom analytics solution, designed to deliver the insights required.

While a bespoke platform has historically required significant time and financial investment, the use of open source technology is changing the analytics landscape.

Open source big data platforms make it possible to build not only insight solutions but also forecasting and predictive analytics solutions, powered by mathematical and statistical models capable of crunching through large volumes of data.
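To make the idea concrete, here is a minimal, illustrative sketch (not taken from any specific vendor's platform) of the kind of statistical model involved: a least-squares trend fit that forecasts the next point in a series. Open source libraries such as scikit-learn or Spark MLlib apply the same principle, with far richer models, at big-data scale. The `orders` figures below are hypothetical.

```python
def fit_trend(values):
    """Fit y = a + b*x by ordinary least squares over x = 0..n-1."""
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    # Covariance of x and y, and variance of x (unnormalised is fine for the ratio)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    var = sum((x - mean_x) ** 2 for x in xs)
    b = cov / var
    a = mean_y - b * mean_x
    return a, b

def forecast_next(values):
    """Predict the value at the next time step from the fitted trend."""
    a, b = fit_trend(values)
    return a + b * len(values)

# Hypothetical monthly order counts
orders = [120, 132, 140, 151, 159]
print(round(forecast_next(orders)))  # → 170
```

The point is not the model itself but the economics: with open source, this kind of capability is assembled from freely available components rather than built, or bought, from scratch.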

Open source opens up a world of access to the latest technology that would otherwise take extensive time and resources to develop.

Where businesses used to rely on human intervention to process and compute data, open source means companies can effectively codify and automate significant pieces of their operations to make better use of their time and money while extracting better insights.

>See also: How a better understanding of open source can lower the risks

The technology is a strong choice for enterprises with a growing expectation of flexibility and faster results. There is no vendor lock-in, and the associated costs are lower than those of proprietary solutions.

But while open source throws open immense possibilities, beware of its biggest challenge: assuring security, access control and governance of the data lake.

There is also the risk that a poorly managed data lake will end up as an aggregate of data silos in one place. CIOs must caution teams about the need to train lay users in appreciating key nuances – contextual bias in data capture, incomplete nature of datasets, ways to merge and reconcile different data sources, and so on – which is a herculean task in every way.

Though untangling the web of big data may seem a daunting task, by determining which insights they want to extract at the beginning of the process and quickly building an analytics platform with the flexibility of open source technology, businesses can access actionable insights and foresight faster, more easily and more cost-effectively.


Sourced from Abdul Razack, SVP, head of big data and analytics, Infosys