DataSift adds the benefit of hindsight to Twitter analytics

With over 300 million users sharing their thoughts and feelings in 140 characters, Twitter is arguably the single biggest repository of public opinion available to man. It is unsurprising, then, that businesses want to know what is being said on the microblogging service about their products, brands and marketing campaigns.

Today there are a great number of social media monitoring services available, and to provide their customers with Twitter analytics, many of them rely on UK-based start-up DataSift.

DataSift is a spin-off from TweetMeme, a Twitter search engine that was set up "when there were literally 20 people on Twitter", says founder and CTO Nick Halstead. Thanks to that early relationship, it is now one of only two companies licensed to access the Twitter firehose (the complete stream of all Tweets).


The company has developed a scripting language, curated stream definition language (CSDL), that allows users to search and analyse Tweets as they happen, based not only on keywords but also on links, sentiment, and the gender and social influence of the Tweeter. Customers are charged according to the volume and complexity of their queries.

Until recently, this only allowed customers to monitor Twitter activity from the very recent past. DataSift’s searches went back just 30 days, while Twitter’s own search engine stretches back only a week.

In February 2012, however, DataSift added historical Twitter searches stretching back to January 2010. According to Halstead, this greatly improves the power of Twitter analytics.

When a company launches a product or marketing campaign, he says, it is nigh on impossible to predict all the information they will want to know about the public’s reaction. "No one is clever enough to identify all the things that might happen in the future, so it’s impossible to set up all the right real time queries in advance."

With the benefit of hindsight, however, businesses will be able to analyse the reaction among Twitter users to any event, using as many metrics as they can imagine. "Say something triggers negative publicity for your company – this will allow you to go back and analyse where you failed to respond quickly enough, and how it spread," says Halstead. 

“Customers are absolutely desperate for this,” he says. “There hasn’t been a single day in the last year when someone hasn’t asked me about historical data.”

Big data

If any system can accurately be described as "big data", it is DataSift’s new historical Twitter analytics. It sits on a Hadoop cluster with over half a petabyte’s worth of storage (500,000 GB). "The only people that are dealing with larger datasets than us are the financial trading firms," Halstead claims.

Customising Hadoop for DataSift’s purposes has involved wrestling with incredible complexity and some very immature technologies, he says. And yet the company’s ambition is to make its services as easy to use as possible.

One way it does this is by identifying significant events for customers to use in their analyses. "We currently have over 40,000 events that have happened on Twitter, ranging from IPOs to big cinema releases, and we’re adding more all the time," explains Halstead. "So if you had an advertising campaign running against a big sporting event, you can see what people were saying about your brand during the game."

And while Halstead says that any SQL programmer could pick up CSDL easily, the company is now working on a graphical interface to make it even easier. "We are doing everything we can to make big data easy to use," says Halstead.

Not everyone’s reaction to the launch of DataSift’s historical Twitter search was positive, however. Pressure group Privacy International, for example, saw the service as an invasion of Tweeters’ privacy, despite the fact that Tweets are broadcast publicly.

“As a Twitter user you expect that what you say will be accessible to others but you don’t expect it will be data mined,” argued Gus Hosein, Privacy International’s deputy director. “You don’t expect that your tweets over a two year period will be dissected to see your attitudes towards a company.”

The Daily Mail, meanwhile, was less considered: “Twitter secrets for sale” was its headline.

This response came as a surprise to Rob Bailey, DataSift’s CEO. He believes that reports of Twitter ‘selling’ Tweets, which Bailey considers inaccurate, triggered an emotional reaction.

“We haven’t bought anything,” he told Information Age after the launch. “We have a resyndication license, and it specifies that our customers cannot display Tweets on public websites. This is all about profiling trends, not individuals.”

Still, by DataSift’s own calculations, its detractors were in the minority. Of all the Tweets about the launch of its historical data service, 9.3% were positive in sentiment, compared to 3.7% that were negative (the majority were neutral). 

Nevertheless, with the launch of its historical Twitter analysis, DataSift has proved two things: firstly, that big data analytics need not be the preserve of statisticians with PhDs, and secondly, that it can be a privacy minefield.

Alan Dobie

Alan Dobie is assistant editor at Vitesse Media Plc. He has over 17 years of experience in the publishing industry and has held a number of senior writing, editing and sub-editing roles. Prior to his current...
