The Internet has changed advertising from an art (of sorts) to a science. The digital nature of online advertising allows companies to analyse what works and what does not in ever-increasing detail.
Struq is a business built on the analysis of advertising data. The UK-based company, founded in 2008, has its roots in ‘retargeting’ – recording an individual’s behaviour on a website, e.g. what products they have looked at, and using that to serve ads they are likely to respond to. More recently, it has developed ‘ad personalisation’ technology that analyses many more factors to choose appropriate ads.
Its customers include high street retailers such as Debenhams and TopShop as well as big-name brands including Nike and Adidas.
Behind the simple act of serving an appropriate ad lies a complex but instantaneous technical operation. When an individual clicks on a website, the opportunity to serve an ad is auctioned via advertising networks such as Google Ad Network, AppNexus or Rubicon.
If Struq ‘knows’ the individual, based on their past interactions with its customers, it will calculate how much serving an ad would be worth, and what content it should display. The service makes an offer for the ad, and if it is the highest bid, its ad will be served.
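The article does not describe how Struq actually prices a bid, so the following is only a minimal sketch of the general idea, assuming the offer is the expected value of a response (estimated response probability multiplied by an assumed value per response). The names Candidate, choose_bid and wins_auction, and all the numbers, are hypothetical and are not taken from Struq's systems.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    """A hypothetical ad candidate for a known individual."""
    ad_id: str
    response_probability: float  # assumed estimate that the individual responds
    value_per_response: float    # assumed value of one response to the advertiser


def expected_value(candidate: Candidate) -> float:
    """Expected value of serving this ad once (an assumed pricing rule)."""
    return candidate.response_probability * candidate.value_per_response


def choose_bid(candidates: List[Candidate], known_user: bool) -> Optional[Tuple[str, float]]:
    """Pick the ad with the highest expected value and bid that amount.

    Returns None when the individual is unknown or there are no candidates,
    meaning no offer is made for this impression.
    """
    if not known_user or not candidates:
        return None
    best = max(candidates, key=expected_value)
    return best.ad_id, expected_value(best)


def wins_auction(our_bid: float, competing_bids: List[float]) -> bool:
    """The ad is served only if our offer is strictly the highest bid."""
    return all(our_bid > other for other in competing_bids)


if __name__ == "__main__":
    ads = [Candidate("shoes", 0.02, 1.5), Candidate("coat", 0.01, 4.0)]
    offer = choose_bid(ads, known_user=True)
    if offer is not None:
        ad_id, bid = offer
        print(ad_id, round(bid, 4), wins_auction(bid, [0.025, 0.031]))
```

In this sketch the coat ad wins the internal comparison because its expected value is higher, even though the shoes ad is more likely to get a response. A production bidder would have to account for far more than this, including budgets and the auction rules of each network.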
All this takes place as the web page is rendering, so Struq’s systems must operate at great speed. "We need to make sure that 98% of requests are processed within 100ms," explains VP for engineering Aaron McKee.
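As an illustration of what that target means in practice, the sketch below checks a batch of measured response times against a budget of 100ms and a target of 98%. It is an assumption about how such a check could be written, not a description of Struq's monitoring; the function names and sample figures are hypothetical.

```python
from typing import Sequence


def fraction_within(latencies_ms: Sequence[float], budget_ms: float = 100.0) -> float:
    """Share of requests that completed within the latency budget."""
    if not latencies_ms:
        raise ValueError("no latency samples supplied")
    on_time = sum(1 for latency in latencies_ms if latency <= budget_ms)
    return on_time / len(latencies_ms)


def meets_target(latencies_ms: Sequence[float],
                 budget_ms: float = 100.0,
                 target: float = 0.98) -> bool:
    """True when at least `target` of requests finished within `budget_ms`."""
    return fraction_within(latencies_ms, budget_ms) >= target


if __name__ == "__main__":
    # Hypothetical sample: 98 fast requests and 2 slow ones.
    samples = [40.0] * 98 + [250.0] * 2
    print(fraction_within(samples), meets_target(samples))
```

With 100 sampled requests, the target tolerates at most two that take longer than 100ms; a third slow request would breach it.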
The decision about which ad to serve and how much to offer is made instantly by Struq’s web-facing systems, but that decision is informed by ongoing statistical analysis of the vast quantities of data the company collects.
"We have around 2TB of raw event data coming in every day," says McKee. "Our probability engine analyses up to 10,000 features for every online transaction – the time of day, whether the individual was using an iPad, what the weather was like etc – to find the optimal advert for every individual."
In the past, data was stored for analysis in a SQL Server database, but the cost of the performance it delivered became too high as Struq’s data volumes increased. Last year, the company moved its data warehouse infrastructure to big data platform Hadoop and to Hive, a system that allows SQL-like queries using Hadoop.
"Calculating a probability matrix that took 27 hours with SQL is now taking about an hour using a cluster of Hive servers," says McKee. "It has been utterly brilliant for our business, as it means we can ask much more interesting questions of the data without worrying that it will block up the system."
SQL Server was also incapable of supporting the response times Struq needed to make snap decisions about which content to serve. Back in 2010, it adopted MongoDB, a non-relational database that McKee says supports far lower response times than SQL Server could offer.
Rapid physical provisioning
The speed at which Struq’s systems must operate also affects its hardware selection. For example, it uses only solid state drives (SSDs) – which perform faster than conventional hard drives – to support its live systems. "For things like the MongoDB instance, where we need massively fast random access, SSDs have been a lifesaver."
It also means that it cannot use virtual servers to support operational systems, says McKee. "Virtualised instances cannot deliver the performance we require," he says. "You just don’t get the level of [input / output] performance we need."
Nevertheless, the company requires the flexible scalability associated with virtualisation and cloud computing services. Although it can predict the volume of traffic for a given customer campaign quite accurately, it may need to onboard a new customer or launch a new campaign at the drop of a hat. It therefore needs to be able to provision new infrastructure as quickly as possible. "One of the sales guys might land a massive new campaign for a client, so he’ll need new servers in two days," explains McKee.
Until January this year, Struq’s infrastructure was hosted in a co-location facility operated by Rackspace. Unfortunately, Rackspace could only provision new physical servers in ten days, says McKee, severely limiting Struq’s ability to meet customer demand. "That ten day window was killing our business," he says.
Struq therefore migrated its IT infrastructure to hosting provider SoftLayer’s Amsterdam facility. According to McKee, SoftLayer offers much more immediate provisioning of physical equipment. "We’re able to get bare metal devices up in about two hours, ordering it directly through their portal," he says.
That speed of provisioning will come in handy as Struq uses its recent $8.5 million venture capital investment to expand its operations in the US, where SoftLayer has 11 data centres. “I have every confidence I’ll be able to deploy an East coast US and a West coast US data centre in about 12 hours,” says McKee.