MySpace taps big data for turnaround
As visitor traffic wanes, the social networking site has turned to massively parallel database analytics to get inside the minds of its members
MySpace was one of the first social networks to penetrate mainstream consciousness, and between 2006 and 2008 it had more members than any other. In July 2005, the site was acquired by News Corporation for $580 million, and it celebrated its 100 millionth member the following year.
Since then, however, MySpace has been rather eclipsed by rival Facebook. According to web traffic analysis site Compete.com, Facebook attracts around 120 million unique visitors per month and is still growing, while MySpace’s figure of around 70 million a month is gradually falling.
Back in 2008, when Facebook first overtook its visitor traffic, MySpace decided to focus on what made it unique: the large number of bands and singers who use the site to promote themselves, and the fact that visitors therefore use it to discover new music.
That places discovery – the ability for users to find music they don’t know but may like based on their existing interests – at the heart of its comeback strategy. “Facebook is about people you already know; LinkedIn is about your professional contacts,” says Don Watters, MySpace’s chief data architect. “But MySpace is about finding people and music that you may not know yet. Our motto is ‘Discover and be discovered’.”
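To make “discovery” concrete, here is a minimal sketch of one common technique, item co-occurrence (“people who like X also like Y”). The article does not say which method MySpace uses, so treat this purely as an illustration: the data layout (a mapping from user to the set of artists they like) and the names `cooccurrence` and `discover` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(likes: dict[str, set[str]]) -> Counter:
    """Count how many users like each ordered pair of artists (hypothetical data model)."""
    counts: Counter = Counter()
    for artists in likes.values():
        for a, b in combinations(sorted(artists), 2):
            counts[(a, b)] += 1  # record the pair in both directions
            counts[(b, a)] += 1
    return counts


def discover(user: str, likes: dict[str, set[str]], top_n: int = 3) -> list[str]:
    """Suggest artists the user does not already like, ranked by co-occurrence."""
    known = likes[user]
    scores: Counter = Counter()
    for (a, b), n in cooccurrence(likes).items():
        # Credit an unknown artist b each time it co-occurs with an artist the user likes.
        if a in known and b not in known:
            scores[b] += n
    # Highest score first; break ties alphabetically so results are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [artist for artist, _ in ranked[:top_n]]


if __name__ == "__main__":
    likes = {
        "alice": {"Band A", "Band B"},
        "bob": {"Band A", "Band C"},
        "carol": {"Band B", "Band C", "Band D"},
        "dave": {"Band A", "Band D"},
        "eve": {"Band A", "Band C"},
    }
    print(discover("alice", likes))  # ['Band C', 'Band D']
```

Because every user’s likes feed the pair counts, this kind of approach is computed over the whole membership rather than a sample, which echoes the point Watters makes about not looking at just a set of the data.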
To enhance the potential for discovery on its site, MySpace had to become better acquainted with the tastes and behavioural patterns of individual members. Analysing the precise tastes and social connections of each user requires vast quantities of data – it’s not good enough to take a sample of the data and to generalise the findings to all users. “It’s incredibly important that we are not just looking at a set of the data,” says Watters.
At the time, MySpace’s data warehouse was not up to the task. “It had been built when we were growing like crazy, and the main thing was keeping the servers up,” says Watters.
In its search for new data warehousing technology, the company evaluated products from vendors including Teradata and Netezza. However, says Watters, “we didn’t think any of it could scale according to our needs.”
Instead it turned to Aster Data, whose “massively parallel” database technology combines SQL with MapReduce, the distributed processing model popularised by Google, and is designed to run on clusters of cheap, commodity hardware. That, Watters explains, means that the cluster can be scaled up cost-effectively and in a granular fashion.
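The MapReduce pattern itself can be sketched in a few lines. This is a generic, single-machine illustration in plain Python, not Aster Data’s SQL-MapReduce interface: the play-event format, the shard lists and the function names are all hypothetical, and a thread pool stands in for the separate servers of a real cluster. It counts plays per artist by mapping each shard independently, grouping the results by key, then reducing each group.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable

# Hypothetical event format: one (user_id, artist) pair per song play.
Event = tuple[str, str]


def map_shard(events: Iterable[Event]) -> list[tuple[str, int]]:
    """Map phase: emit (artist, 1) for every play stored on one shard."""
    return [(artist, 1) for _user, artist in events]


def shuffle(mapped: Iterable[list[tuple[str, int]]]) -> dict[str, list[int]]:
    """Shuffle phase: group every emitted value under its key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_group(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce phase: collapse one key's values into a single total."""
    return key, sum(values)


def plays_per_artist(shards: list[list[Event]]) -> dict[str, int]:
    # Each shard is mapped independently; in a real cluster each would sit on its own server.
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        mapped = list(pool.map(map_shard, shards))
    groups = shuffle(mapped)
    return dict(reduce_group(k, v) for k, v in groups.items())


if __name__ == "__main__":
    shards = [
        [("u1", "Band A"), ("u2", "Band B")],
        [("u3", "Band A")],
    ]
    print(plays_per_artist(shards))  # {'Band A': 2, 'Band B': 1}
```

Since the map step never needs to see more than one shard, capacity grows by adding shards and the machines that hold them, which is one way to read Watters’ point about scaling in a granular fashion.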
Today, MySpace’s data warehousing cluster consists of 120 servers and contains 190 terabytes of data. As well as supporting the music and content recommendations on the site, the deployment has allowed MySpace to introduce such functionality as audience analytics for bands and their record labels.
Clearly, however, it has not yet been enough to stem the gradual decline in MySpace’s visitor figures. And while MySpace has so far managed to generate greater advertising revenue than Facebook despite having fewer visitors, that too is reported to be on the wane.
It is nevertheless a good idea to keep one eye on the giants of the web as they wrestle with their data problems, because there is every chance that mainstream businesses will be faced with similar challenges before long.