There is a new trend emerging of marrying open source and enterprise software – with harmonious outcomes. Businesses gain by combining the strengths of open source, such as community effort, security and openness, with the IP, superior support and accountability that come with enterprise propositions.
Open source is in vogue and undoubtedly transformational, but enterprise software is still incredibly important in picking up the slack in areas open source has not reached or where it is behind the curve.
Hadoop is a fantastic example of open source done well and has become almost synonymous with big data analytics. It is a popular framework (some would say the framework) that allows businesses to store, work with and query large data sets.
Hadoop has its origins in an open-source search system called Nutch. Around 2003, Nutch was struggling with the complexity of scaling to search billions of pages, when Google released two white papers describing its own massively scalable file system and a distributed processing model it used, called MapReduce.
The open-source community behind Nutch was consequently able to implement its own versions of the Google technologies, and successfully scaled its search engine as a result.
However, the Nutch team realised its technology could do much more than just search – it had scalability and versatility – and to reflect these new possibilities it decided to find a new name for the project. The project's original creator, Doug Cutting, suggested the name Hadoop after his son's stuffed yellow elephant.
Hadoop, then, makes sense. It has some real strengths, but it also has pitfalls.
As the pace of life, and indeed of business, speeds up, Hadoop falls short of delivering the 'real-time' insights needed to drive business success.
MapReduce batch jobs take time. Hadoop works best with long-running batch jobs – a 20-second start-up time on a five-hour batch run is immaterial, but a 20-second start-up time on a five-second query is a serious disadvantage. Hadoop really is not the right technology for real-time analysis.
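To make the batch model concrete, here is a minimal single-process sketch of the MapReduce pattern, using word counting as the illustrative job. This is not Hadoop's actual API – a real job ships these functions out across a cluster, which is where the scheduling and start-up overhead described above comes from.

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit (key, value) pairs - here, (word, 1) for each word."""
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, across every mapper."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine each key's values into a final result."""
    return {key: sum(values) for key, values in groups.items()}

documents = ["big data", "big analytics"]
counts = reduce_phase(shuffle_phase(map_phase(documents)))
# counts == {"big": 2, "data": 1, "analytics": 1}
```

The model's strength is that map and reduce parallelise across thousands of machines; its weakness is that every query, however small, pays the full job start-up and shuffle cost.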
This is where enterprise software can pick up the slack. By partnering open-source software like Hadoop with the dedicated support and effectiveness of robust, enterprise MPP (massively parallel processing) in-memory databases, businesses can create a super system.
Like all strong relationships, it works because the strengths of one partner offset the weaknesses of the other. By providing HDFS plugins and connectors, the latest breed of MPP databases can use all the power of Hadoop for data storage while fully capitalising on the ability to query that data in real time using familiar business intelligence tools or SQL.
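The pattern can be sketched as follows. Here Python's built-in sqlite3 in-memory mode stands in for a commercial MPP in-memory engine, and the table and column names are invented for the example; in a real deployment an HDFS connector would load or externally reference the data held in Hadoop. The point is that once the data is exposed as tables, analysts ask questions in plain SQL and get answers interactively rather than as batch jobs.

```python
import sqlite3

# Illustrative stand-in for an MPP in-memory database: data that would
# arrive via an HDFS connector is inserted here by hand.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE game_events (player TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO game_events VALUES (?, ?)",
    [("alice", 40), ("bob", 25), ("alice", 35)],
)

# A familiar SQL aggregation, answered immediately in memory.
rows = conn.execute(
    "SELECT player, SUM(score) FROM game_events "
    "GROUP BY player ORDER BY player"
).fetchall()
# rows == [("alice", 75), ("bob", 25)]
```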
We are seeing a number of customers take this road. A blending of open source and enterprise enables companies to run real-time business intelligence tools quickly and effectively on top of their Hadoop cluster.
A leading international mobile and casual games provider, for example, found a mix of the two was the only way it could keep on top of its ever-increasing game data. This listed company was using Hadoop for all its analytics, but this was simply not working – queries were taking a long time to process and the tools for analysing the data were crude. Data is the lifeblood of its business, so being able to analyse it effectively, in real time, was critical to its operation. Using open source alone was not enough.
Hadoop may take care of data storage and large-scale batch processing, but without the real-time performance of an MPP in-memory enterprise database – and the steadying hand of its accompanying, robust technical support – Hadoop swims wildly in a sea of data, with no real grounding in the here and now.
A coupling with enterprise makes Hadoop a smarter, quicker, much friendlier beast, and businesses will undoubtedly have to marry the two if they want to remain agile and responsive to the demands on them.
Sourced from Aaron Auld, CEO, EXASOL