How to do big data in 2014

Proprietary vs. Open Source 

There seems to be a bit of debate brewing. Professor Thomas Davenport of Babson College contends that enterprises are looking to proprietary technology for big data. Well, he's dead wrong. The future just isn't shaping up that way.

We see big data as the first technology wave to be led by open source innovation. Open source platforms have garnered billions of dollars in investment, and commercial vendors and their ecosystems keep adopting open source and releasing formerly proprietary technologies as open source.

We don't agree with Davenport that buying supported versions of open source is 'proprietary', and the title he chose is misleading. We do think that open source customers will pay for value through various services, including support, subscriptions and integration, as well as by building value on top.

Davenport said: 'When they were using Hadoop, it was from vendors offering proprietary, supported versions, such as Hortonworks and Cloudera.' Yet Hortonworks offers an entirely open source, non-proprietary version, and we see most adoption going to Cloudera's open source, non-proprietary distribution as well.

We see platform technologies moving decisively to open source. For tools, components, and applications, we see a more mixed story. While open source BI offerings like Pentaho are improving, there's still more adoption of commercial BI tools like Tableau.

Platforms win through widespread adoption, and the speed of adoption of open source is unbeatable, especially when customers expect technology to be open source and treat that as a key risk mitigation factor. That is why technologies built inside commercial companies keep being released as open source, including Hadoop, HBase, Impala and others.


The most important questions to consider when starting out are: what types of datasets do I have access to, which projects are feasible, and which one will provide the most strategic value the fastest? There are three 'must-dos' for any organisation at any stage of big data adoption:

Test and learn

An agile approach with rapid releases enables organizations to fine-tune their projects while they’re in progress. Legacy systems were better suited to a 'waterfall' approach, where technology was introduced all at once.

Big data projects should focus on specific business goals and allow cross-pollination of ideas to better understand what’s possible, which makes a 'rapid release' approach a much better fit than waterfall.

Incremental adoption

Build a center of competency and cross-pollinate expertise among business experts, data scientists, and data engineers. This enables business units to leverage a common talent pool and a shared approach, eliminating the risk of data silos, providing for common governance, and avoiding redundant storage and processing by different departments.

Change management

Identify the key stakeholders for the initiative, understand their concerns, get their buy-in, and invest in early pilot systems that demonstrate the value a big data investment can generate.

Big data represents a major opportunity for the enterprise. Projects take time, but by making the right choices you can get value quickly and avoid pitfalls. By moving forward thoughtfully and with determination, you have the opportunity to gain significant advantages.

Roadblocks and risk 

The space is becoming overly crowded with vendors who overpromise on capabilities and simplicity and underdeliver on both. Point tools are also dangerous, because they often can't stand alone without a lot of integration with data and infrastructure.

The greatest pitfall is thinking big data is one-and-done: that if you buy a big data platform, it will do magic. That's just not the case. Big data projects need to start small so value can be extracted and business sponsors can be engaged.

Ben Rossi

Ben was Vitesse Media's editorial director, leading content creation and editorial strategy across all Vitesse products, including its market-leading B2B and consumer magazines, websites, research and...