In the future, all new platforms for data will need to be built on open-source software that facilitates the storage and management of data in a single system – that was the prediction from Doug Cutting, co-founder of Hadoop and chief software architect at Cloudera, speaking to an audience of data professionals at the Strata conference in London that concluded this week.
In the past, businesses paid a fee again and again, often for the same software. Over time the price of the software would go up, and the whole business would be subject to the whims of the vendor. But people have grown wary of falling into that trap, argued Cutting, and have embraced open-source solutions.
As the amount of data generated by businesses increases exponentially, the opportunities around it are exploding in terms of both scale and variety – traditional data warehouses alone are not equipped to handle this cost-effectively.
‘Hadoop is essentially the kernel for an OS for big data,’ continued Cutting in an interview with Information Age.
‘We’re really seeing a trend towards people no longer keeping separate silos for data but putting them all in one. We call this the Enterprise Data Hub, and this is what we have been building towards. It is a functionality currently being used by advanced users of Cloudera, but in the next few years it will become the most common way for customers to approach big data.’
During their presentations, Cutting and his Cloudera colleagues outlined how Cloudera is expanding the scope of its software to serve as a hub for all of an enterprise’s data, taking over the workloads from traditional database management systems including data warehouses and document management systems.
Through the Enterprise Data Hub, data can be accessed by all and stored and managed in a single system, regardless of format or scale.
As Hadoop has matured, it has expanded its ecosystem and drastically improved in scalability, flexibility and security.
The Enterprise Data Hub is designed to support the widening uses that people are finding for Hadoop outside of MapReduce, with capabilities around machine learning and NoSQL key-value stores.
Cutting pointed to the expanding range of capabilities available with Hadoop and its ecosystem, such as the ability to make interactive SQL queries with Cloudera Impala and the enterprise search capabilities launched this year.
‘We will soon see support for streaming, in-memory databases, graph processing and many more types of workloads able to move to Hadoop,’ he said.
Projects like Google Spanner, a vast globally distributed database, have demonstrated that coordinating huge amounts of data on a global scale is possible.
‘If someone showed this can be done, it’s inevitable it will be added to our platform,’ said Cutting.
Those at Hadoop and Cloudera argue that the Enterprise Data Hub concept will allow for more types of advanced analytics on an unprecedented scale in future.
‘The more you can bring your computation to your data, the less you move your data, the more effective your system will be, enabling you to maximise your hardware resources and data.’
Cutting believes that big data will come of age through this model, but it will take some time for enterprises to align their structures and infrastructures with their business ambitions.
‘It will take some time and it’s a long process,’ Cutting told Information Age. ‘Businesses are structured in certain ways, and Hadoop is a different way of thinking at a high level and a fundamental change to IT infrastructure. Companies have a lot invested in legacy infrastructure and taking the leap is not easy.
‘But we’re seeing steady growth – Cloudera has doubled its size in terms of employees and revenues in the last few years, and we still feel like it’s early days as we start to get a foothold in a lot of enterprises.’