The IT industry’s favourite buzzword in 2011 was, without question, ‘big data’. And like most popular buzzwords, it was sufficiently vague for all manner of IT vendors to co-opt it to market their products.
Readers could be forgiven for a degree of scepticism about the term, which is used to describe the ever-growing volume and variety of data and the technologies on offer to process and analyse it, and about the way that big data is being presented as a new and burning issue.
The volume and diversity of data have been growing ever since the first computer was switched on. Managing that data has always been one of the IT department’s primary responsibilities, and converting it into actionable insight has been on the IT agenda since the emergence of ‘management information systems’ in the 1980s.
Nevertheless, behind the buzz lie some important technological developments that IT executives should be aware of. These technologies allow more varied data to be analysed at far greater scale than traditional business intelligence tools.
“Big data means extremely scalable analytics,” Forrester Research analyst James Kobielus told Information Age in October. “It means analysing petabytes of structured and unstructured data at high velocity. That’s what everybody’s talking about.”
The best known big data technology is the application of massively parallel processing to analytics. Splitting analytical workloads into smaller jobs that are then processed in parallel is not an intrinsically new idea.
However, the technology was advanced significantly by Google in the last decade when it unveiled MapReduce, a technique that it developed to process the unimaginable volumes of click data it collects, and by Yahoo!, when it built and open sourced a software framework for using MapReduce called Hadoop.
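The pattern of splitting a workload into smaller jobs, processing them in parallel and merging the results can be illustrated with a minimal Python sketch. This is not Hadoop's API or Google's implementation: the function names and the sample click log are hypothetical, and threads on one machine stand in for the nodes of a cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(chunk):
    """Map step: count clicks per URL within one chunk of the log."""
    return Counter(chunk)


def merge_counts(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right


def count_clicks(clicks, n_chunks=4):
    # Split the workload into roughly equal smaller jobs (ceiling division).
    size = max(1, -(-len(clicks) // n_chunks))
    chunks = [clicks[i:i + size] for i in range(0, len(clicks), size)]
    # Process the jobs in parallel; threads stand in for cluster nodes here.
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = list(pool.map(map_chunk, chunks))
    # Merge the partial results into the final answer.
    return reduce(merge_counts, partials, Counter())


# Hypothetical click log, for illustration only.
clicks = ["/home", "/pricing", "/home", "/blog", "/home", "/pricing"]
totals = count_clicks(clicks, n_chunks=3)
```

Because each chunk is counted independently, the map step could in principle run on as many machines as there are chunks; only the final merge needs to see all the partial results.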
One measure of the anticipated demand for Hadoop is the number of IT vendors that launched products based on the framework in 2011.
In May, IBM launched an analytics product called BigInsights that includes Hadoop, and EMC announced integration for its Greenplum data warehouse appliance with the framework.
In June, Yahoo! spun off its Hadoop centre of excellence as a start-up named Hortonworks, and in October Microsoft announced a partnership with the new company to work on Hadoop-related products for its database platforms.
Also in October, systems giant Oracle unveiled a Big Data Appliance, a pre-integrated stack of hardware and software that includes a version of Hadoop. Meanwhile, Hadoop-based start-ups such as Cloudera and MapR continued to attract venture capital funding during the year.
Kobielus likens Hadoop today to open source operating system Linux in the 1990s, in that various different ‘distributions’ are competing for dominance. “It will take several years, but this market will eventually shake down to two or three leading Hadoop distros,” he said.
Hadoop is not the only technology that sits under the big data umbrella, however. Other examples include columnar databases, which organise data by columns instead of rows and so lend themselves to analytical data warehousing and compression.
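A rough Python sketch can show why a column layout suits analytics and compression. It is a toy illustration, not how Vertica, HANA or any other product stores data: the meter readings, field names and the run-length encoder are all assumptions made for the example.

```python
# Row-oriented layout: each record is stored together (hypothetical readings).
rows = [
    {"meter_id": 1, "region": "north", "kwh": 3},
    {"meter_id": 2, "region": "north", "kwh": 5},
    {"meter_id": 3, "region": "south", "kwh": 4},
    {"meter_id": 4, "region": "south", "kwh": 6},
]

# Column-oriented layout: all values of one field are stored together.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregate over one field reads every record in the row layout...
total_from_rows = sum(row["kwh"] for row in rows)
# ...but only a single list in the column layout.
total_from_columns = sum(columns["kwh"])


def run_length_encode(values):
    """Compress runs of repeated values into [value, count] pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return encoded


# Values within a column tend to repeat, so they compress well.
region_encoded = run_length_encode(columns["region"])
```

Both totals come out the same; the difference is how much data has to be touched to compute them, which is what matters once tables hold billions of rows.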
In February 2011, Hewlett-Packard rebooted its business intelligence strategy by acquiring columnar database technology start-up Vertica. Meanwhile, SAP made HANA, the in-memory, columnar database platform based on technology it acquired along with Sybase in 2010, the centrepiece of its innovation strategy.
Another big data-related database technology is the non-relational database structure. Loosely termed NoSQL, this class of database technologies rejects the conventional relational design in order to achieve improved performance for high volumes of data, and greater scalability.
So, there is now an abundance of big data technologies on offer. But why would businesses need them?
There are certainly use cases where traditional analytics technology just won’t cut it. When data is produced in a constant stream over time, rather than periodically updating a single record, conventional database structures become unwieldy. Examples of this streaming data include smart meters, web clicks and GPS coordinates.
“This is data that can’t just be pushed into a structured repository,” Anjul Bhambhri, IBM’s big data chief, told Information Age in April.
Andy Jones, co-founder of UK-based sensor manufacturer Ibexis, agrees. “Once sensor data has been written, it never changes – that’s always going to be the reading for that point in time,” he explained in September. “And there are vast quantities of it, so you need a system that allows you to write lots of things. Traditional databases can get overwhelmed very quickly.”
Ibexis uses Amazon.com’s cloud-based, non-relational database service SimpleDB to collate the data from its products.
With the adoption of GPS growing and the cost of sensors plummeting, many are predicting that streaming data is only going to become more abundant in future. For example, Michael Saylor, CEO of business intelligence vendor MicroStrategy, believes that the proliferation of mobile devices means that many business processes that are currently manual are about to become software-based.
“For every business process that has been automated, there are at least two or three that have not,” he says. As businesses use mobile devices to convert those processes, he believes, a whole new wave of data will be produced.
Much of that data will be of the unstructured, streaming variety that big data technologies are designed for.
Listening to the IT industry in 2011, however, you might have got the impression that the rise of this kind of data is an urgent issue that businesses are desperate to address as soon as possible.
This is not the impression given by Information Age’s reader survey. Just 7% of respondents have adopted big data technology, it found, and 11% plan to adopt it in 2012 – making it the least adopted technology in the survey.
There was a degree of agreement with the statement “The volume, velocity and variety of data we need to analyse is overwhelming our current analytics tools” – 39% said that they either agreed somewhat or agreed strongly.
But when combined with the recent finding by EMC that 74% of organisations agree with the statement “We are able to quickly make decisions based on new data”, these figures hardly paint a picture of big data panic.
That means that while 2011 was the year that the industry got excited about big data, mainstream adoption is unlikely to materialise in 2012.