
Hadoop: the rise of the modern data lake platform

Hadoop, according to Matt Hutton, director, R&D Think Big/Teradata, is difficult to get right.

Hadoop may be synonymous with big data, and it may be free to access and work with, but engineering teams globally will admit that behind every Hadoop undertaking is a major technical delivery project.

Failures are so commonplace that even the experts don’t have great expectations of 2017: at the recent Gartner Data & Analytics Summit in Sydney, research director Nick Heudecker claimed that 70% of Hadoop deployments in 2017 will fail to deliver either their estimated cost savings or their predicted revenue.

It shouldn’t come as a surprise. Hadoop was designed for big data storage, but it wasn’t designed as an actual big data application. Hadoop and Spark are incredible enabling technologies.


However, many have found it notoriously challenging to implement big data solutions on the Hadoop stack, owing to a shortage of engineering skills and big data experience, and to inflated expectations around time-to-value and cost savings.

So will 2017 see Heudecker’s predictions come true, or will companies break out of the vicious Hadoop failure loop and finally begin to realise consistent value and big data success from the elusive open source framework?

The Hadoop helpers: modern data lake platforms

While all organisations would agree that the business value potential of Hadoop is huge, getting to a point where that value can be realised has been difficult.

A key reason is that many of these companies’ use cases are built around the ability to bring data together. Up to now, there has been a high barrier to entry in ingesting data from many sources into Hadoop, making the business value difficult to realise.

The challenge of efficiently and reliably ingesting data in a governed manner is familiar to any organisation with a data warehouse.

Enter the new modern data lake platforms. These platforms are working to remove the barriers to data ingestion and discovery. Awareness and adoption of these solutions-based platforms are growing. They include the newly debuted open source Kylo, with its entirely flexible models; collaborative data science platforms such as Dataiku; and commercial offerings from Podium and Zaloni, which also speed up the solution development cycle on Hadoop, but in a more fixed, opinionated manner.


Though they vary in approach and flexibility, these next generation data lake platforms are enabling enterprise use cases and removing many of the risks of a custom-engineered approach, allowing companies to become quickly productive.

In addition, they are encouraging organisations to consider governance and best practices upfront to eliminate the common pitfalls of data lakes built on in-house developed solutions.

The key to modern data lake platform success

The Hadoop ecosystem is similar to a building foundation and a bag of useful carpentry tools: it still takes a highly skilled construction team to actually build a house. Modern data lake platforms provide the house, and companies just need to furnish it with data.

These new platforms can take on more complex enterprise use cases, providing a solution that lets organisations more easily exploit Hadoop and Spark for analytics. All these solutions, just like the Hadoop distributors themselves, are focused on simplifying and speeding up the solution development cycle on Hadoop, whether for ingestion or analytics modelling.


Hadoop has some compelling advantages that modern data lake platforms exploit:

 Schema on read, inexpensive Hadoop storage and parallel processing mean that IT data modellers and their carefully designed normalised schemas aren’t needed. Modern data lake platforms can shift the effort of data ingest from IT to business users.

 Relational database management systems (RDBMS) usually represent a stack of precious hardware resources with fixed capacity, carefully managed by IT. Classic ETL transformations are performed along the edge with proprietary software. ETL tools had to deal with complex transform gyrations such as populating star schemas.

Hadoop and Spark can transform data using inexpensive cluster resources. Spark data frames are well suited for the type of complex data transformations any analytics team would need. Modern platforms allow organisations to exploit the transformational power of Spark without any programming skills.

All the same governance, security, and data confidence challenges exist with Hadoop that were solved over time with data warehouse projects. Modern data lake platforms bring all the capabilities needed to navigate these challenges.
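To make the first of these advantages concrete, here is a minimal sketch of schema on read in plain Python. It is only an illustration of the idea, not how Hadoop, Spark or any of the platforms named here implement it: the sample data, the `read_with_schema` helper and the two "views" are all hypothetical. The raw data is stored exactly as it arrived, and each consumer applies its own schema only when it reads.

```python
import csv
import io

# Hypothetical raw landing data: stored as received, with no upfront modelling.
RAW_ORDERS = """order_id,customer,amount,country
1001,acme,250.00,UK
1002,globex,99.50,US
1003,acme,not_available,UK
"""


def read_with_schema(raw_text, schema):
    """Apply a schema (column name -> converter) at read time.

    Columns not named in the schema are ignored. A value that is missing
    or fails conversion becomes None rather than rejecting the whole row.
    """
    rows = []
    for record in csv.DictReader(io.StringIO(raw_text)):
        row = {}
        for column, convert in schema.items():
            try:
                row[column] = convert(record[column])
            except (KeyError, TypeError, ValueError):
                row[column] = None
        rows.append(row)
    return rows


# Two consumers read the same raw data with different schemas.
finance_view = read_with_schema(RAW_ORDERS, {"order_id": int, "amount": float})
marketing_view = read_with_schema(RAW_ORDERS, {"customer": str, "country": str})

# Rows with an unparseable amount are skipped at analysis time, not at ingest.
total = sum(r["amount"] for r in finance_view if r["amount"] is not None)
```

The point of the design is that ingest never blocks on data modelling: the malformed `amount` in the third row is loaded anyway, and each reader decides how to treat it.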

How will the modern data lake platform help Hadoop succeed?

All these platforms target common data lake use cases: enabling self-service data ingest, data preparation, metadata management, security and governance.


Some provide a framework for IT to extend capabilities to design and manage custom pipelines which can be used to integrate with enterprise systems, and others offer the application layer, step-by-step governance requirements and enforced best practices at the build stage.

Companies heavily invested in data lakes are recognising the value of the above combined assets, and are comparing the relatively low price points of the modern data lake platforms to the ongoing cost of Hadoop-savvy software engineers, who are difficult to find.

What companies are finding is that, to remain competitive and drive growth in the new big data-driven digital economy, these next generation data lake platforms are the future, and possibly just the helping hand they need to make a success of Hadoop.

 

Sourced by Matt Hutton, director, R&D Think Big/Teradata

 
