Big noise about big data

The term ‘big data’ was already in wide circulation in 2011, but in 2012 it became ubiquitous. So much so, in fact, that a US public broadcaster, NPR, nominated it for the ‘word of the year’.

Like many of the broad terms that dominate the technology discourse, big data means different things in different contexts – perhaps explaining its pervasiveness.

In IT circles, the phrase is often used to describe data whose ‘volume, velocity and variety’ mean that conventional, relational database systems are not up to the task of managing, processing and analysing it.

To begin with, the term was usually used to explain the use case for Hadoop, an open source framework for developing distributed data processing systems, but since then a pantheon of other big data technologies has emerged.

IT vendors of course see the currency of the buzzword as an opportunity to market their products, some more convincingly than others. Storage giant EMC has been particularly vocal on big data – its marketing strategy in 2012 included buying ads bearing the term on the side of London taxis. And little wonder: all this ‘big data’ has to be stored somewhere.

Growing awareness

In the wider culture, though, ‘big data’ has become a general term to describe the use of data and analytics in business, politics and society. This is nothing new, of course, but the popularity of the term outside the IT sector indicates growing public awareness of it.

The US presidential race, for example, was described as the first “big data election”, as both candidates made extensive use of voter and consumer data to profile potential supporters and direct their campaign resources accordingly (although the pioneer of this approach is often said to be George Bush’s campaign manager, Karl Rove).

Meanwhile, Facebook’s IPO focused the public’s attention on the business model of many of the giants of the web – accumulating as much personal data as possible with which to target advertisements.

When discussing big data, therefore, it may be useful to consider these two meanings as separate, albeit closely linked, trends: firstly, developments in data management technologies that allow organisations to process and analyse more, and more varied, data than they could previously; and secondly, a growing recognition of the potential uses of data in business.

In October, analyst company Gartner published a ‘big data’ market forecast which acknowledged that the term is now so broad that it can be said to impact almost every facet of the IT industry. “Big data is not a distinct, stand-alone market, but it represents an industrywide market force which must be addressed in products, practices and solution delivery,” said research VP Mark Beyer at the time.

The analyst company estimated that big data would ‘drive’ $28 billion of IT spending worldwide in 2012 – in other words, that much IT investment would have something to do with big data – and that the figure would rise 21% to $34 billion in 2013.

Big data technologies have had most impact so far on the fields of social media analysis and content analysis, Gartner found, but they will eventually find their way into information management tools ranging from storage software to business intelligence, it said.

“Because big data’s effects are pervasive, big data will evolve to become a standardised requirement in leading information architectural practices, forcing older practices and technology into early obsolescence,” said Beyer. “As a result, big data will once again become ‘just data’ by 2020 and architectural approaches, infrastructure and hardware/software that does not adapt to this ‘new normal’ will be retired.”

In other words, CIOs should expect big data technologies to appear on the product roadmaps of all their information management suppliers, and will need to make architectural decisions accordingly. That may mean recruiting or training staff with the skills to work with non-relational databases and distributed storage architectures. In return, those information management systems will have the capacity to process more data, faster.

The other side of the big data coin – that businesses will increasingly see the accumulation and exploitation of data as a possible solution to their problems – is more interesting.

It is not as though businesses are only beginning to realise the value of analysis. Enterprise organisations started building ‘executive information systems’ in the 1980s to present useful analysis to top management, while lower down the ranks employees have been using Microsoft Excel for years.

What may be different about the current excitement about data analysis is the recognition among non-IT staff of the value of sophisticated statistical analysis to help make business decisions.

This is arguably a separate category of analytics from business intelligence, in which preprepared reports showing historical data give employees a backward-looking view of what has happened. Instead, with more sophisticated statistical analysis, decision makers can be presented with predictions for future events.

Achieving this is arguably more of a personnel problem than a technological one, as the mathematical ability and domain knowledge of the analyst is a greater predictor of success than whatever technology they happen to be using.

In 2012, the term ‘data scientist’ became almost as pervasive as big data itself. Again, there is no agreed definition of this term, but it is generally understood to mean a statistician or analyst with some knowledge of data management technologies.

‘Data scientists’ were apparently in high demand in 2012, although historical comparisons to a time before the term was invented are meaningless. One company Information Age spoke to reported that they wanted to hire analysts with programming skills to close the communication gap that often occurs between statisticians and developers.

It is up for debate whether these ‘data scientists’ will be part of the IT department – it depends on each organisation’s understanding of the role. But there is clearly an opportunity for CIOs to bring their experience of data and information management to bear in their company’s wider data and analytics strategy.

There is another way that organisations may benefit from big data analytics, and that is through third-party information service providers. One example is DataSift, a UK start-up that has a licence to analyse the raw feed of posts from Twitter. It operates what it claims is one of the largest Hadoop clusters in Europe, but allows users to analyse its database using a simple SQL-like query language.

In October, European mobile telco Telefónica (owner of O2) launched what it calls a ‘big data business unit’, offering companies and government organisations insights into their customers and citizens gleaned from the telco’s own data sets. Its first offering will be ‘Smart Steps’, a service that shows organisations the footfall of mobile users through their premises.

Services such as these may allow organisations to glean the ‘big data’ insights they need without hiring data scientists or deploying their own Hadoop cluster. So under the banner of big data lie important developments in technology, culture, business management, and IT service provision.

There is no denying that these developments are important, but whether it is useful to wrap them all into a single, ineloquent buzzword is highly questionable. It will be interesting to see whether the term lasts another 12 months.

Pete Swabey

Pete was Editor of Information Age and head of technology research for Vitesse Media plc from 2005 to 2013, before moving on to be Senior Editor and then Editorial Director at The Economist Intelligence...
