UK-born RainStor applies military-grade compression to big data

In the early 2000s, the UK’s military development agency DERA began work on compression technology that would allow soldiers to crunch data from war game exercises on their PCs.

The problem was that those soldiers needed to analyse the data quickly. That meant it had to be loaded into memory, which at the time was extremely expensive. The project therefore sought to compress data in a fashion that was extremely efficient but that did not slow down retrieval.

The compression algorithms the DERA researchers developed are based on the idea of creating a network, or graph, of values. Every concept that appears in a database is represented once. Records in which two or more values are combined – such as a location and a time – are represented simply as a connection between those values.

This means highly repetitive databases can be shrunk down to a tiny fraction of their original size.
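The core idea can be illustrated with a minimal sketch (this is not RainStor's actual implementation, just the general technique it describes): each distinct value is stored exactly once in a shared pool, and each record becomes nothing more than a set of references connecting values in that pool. Repeated values then cost almost nothing:

```python
# Illustrative sketch of value deduplication: every distinct value is
# stored once, and each record is just a tuple of references (indices)
# into the shared pool.

def compress(records):
    """Deduplicate field values; records become tuples of pool indices."""
    pool = []      # each distinct value, stored exactly once
    index = {}     # value -> its position in the pool
    encoded = []
    for record in records:
        row = []
        for value in record:
            if value not in index:
                index[value] = len(pool)
                pool.append(value)
            row.append(index[value])
        encoded.append(tuple(row))
    return pool, encoded

def decompress(pool, encoded):
    """Rebuild the original records from the pool and the references."""
    return [tuple(pool[i] for i in row) for row in encoded]

records = [
    ("London", "2003-01-01", "exercise-A"),
    ("London", "2003-01-01", "exercise-B"),
    ("London", "2003-01-02", "exercise-A"),
]
pool, encoded = compress(records)
assert decompress(pool, encoded) == records
print(len(pool))  # 5 distinct values stand in for 9 stored fields
```

The more repetitive the data, the smaller the pool relative to the record count, which is why this style of encoding pays off so dramatically on the highly repetitive databases the article describes.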

The DERA project also devised a novel approach to data retrieval. To guide database queries to their desired records, relational databases include an index, a supplementary structure that gives each item an identifier. However, this adds to the storage required to contain the data.

Instead, the DERA database splits the network of values into separate files and analyses the contents of each. The resulting statistics mean that a query can tell whether or not the desired record is in a given file without having to search through it.
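A hedged sketch of how file-level statistics can stand in for an index (the article does not describe RainStor's exact statistics, so this uses simple per-column min/max summaries, the same idea as the "zone maps" used in data warehouses):

```python
# Illustrative sketch: keep per-file summary statistics (here, min/max
# per column) so a query can rule a file out without opening it.

def summarise(rows):
    """Compute (min, max) for each column of a file's rows."""
    cols = list(zip(*rows))
    return [(min(c), max(c)) for c in cols]

def file_may_contain(stats, column, value):
    """True if the file's stats cannot rule out the value."""
    lo, hi = stats[column]
    return lo <= value <= hi

file_a = [(10, 100), (15, 120), (20, 110)]
file_b = [(50, 300), (60, 310)]
stats_a, stats_b = summarise(file_a), summarise(file_b)

# Query: find rows where column 0 == 55.
print(file_may_contain(stats_a, 0, 55))  # False: 55 outside [10, 20], skip file_a
print(file_may_contain(stats_b, 0, 55))  # True: file_b must be searched
```

The statistics are tiny compared with a full index, which is how this approach avoids the storage overhead described above while still steering queries away from files that cannot contain the answer.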


"You can’t go to a bank with excuses for why you lost their data. There is no excuse."
John Bantleman


The project was a success, but when part of DERA was privatised to become Qinetiq, non-core military projects were left to fend for themselves. The researchers who had developed the technology worked for a few years as a consultancy, applying the intellectual property on a bespoke basis.

Eventually, though, the company decided to commercialise its technology. Chairman John Bantleman, a seasoned software executive, took over as CEO, and RainStor set about turning its IP into a product.

The result, Bantleman says, is a database designed for very large datasets that delivers high levels of compression without sacrificing retrieval speed. 

Hyper compression

The compression rates that Bantleman claims RainStor can achieve (unverified by Information Age) are staggering. 

“One customer came to us with a 10TB Oracle database,” he says. “The first thing we do is remove all the stuff that Oracle puts in to make it accessible and queryable, like the index. That took it down to 2.5TB.

“Then we compressed it, which took it down to 116GB.”
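Taking Bantleman's quoted figures at face value (and assuming decimal units, 1TB = 1,000GB), the arithmetic works out to roughly a 4x reduction from stripping out Oracle's overhead and a further 21.6x from compression, or about 86:1 overall:

```python
# Ratios implied by the figures quoted in the article
# (decimal units assumed: 1TB = 1,000GB).
raw_gb = 10 * 1000        # 10TB original Oracle database
stripped_gb = 2.5 * 1000  # after removing indexes and other overhead
final_gb = 116            # after compression

print(round(raw_gb / stripped_gb, 1))   # 4.0x from stripping overhead
print(round(stripped_gb / final_gb, 1)) # 21.6x from compression
print(round(raw_gb / final_gb, 1))      # 86.2x overall
```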

RainStor is decidedly not a transactional database, of the kind that might be used to support an ERP application, says Bantleman. Instead, it is best used as a highly accessible archive of historical data.

“A very common use case for RainStor is as a long-term archive against a data warehouse,” he explains. “Data warehouses, from the likes of IBM and Teradata, are great but they’re expensive.

“We have customers whose data is growing 50% a year, so they were having to move their historical data onto tape because it’s uneconomical to store in the data warehouse,” he explains.

Moving that archive data onto RainStor, he claims, changes the economics of data storage. “It becomes available, accessible and cost efficient.”

RainStor is now targeting the finance and telecommunications sectors, both of which have huge, repetitive datasets that are growing at a rate of knots.

The company, which has so far raised a total of $23.5 million in investment, has moved its headquarters to California. Engineering is still based in Gloucester, though, and Bantleman says he has no intention of moving it.

For a company whose technology is ten years old, RainStor is remarkably “on trend”. Bantleman says that he sees a huge number of start-ups trying to jump on the “big data bandwagon”.

What will separate the men from the boys, he says, is their ability to serve enterprise-grade customers.

“It’s one thing to sell to a few start-ups who are basically testing your stuff for you,” he says. “Being used in production by the largest and most demanding enterprises in the world is a whole different standard.”

“You can’t go to a bank with excuses for why you lost their data,” he says. “There is no excuse.”


Alan Dobie
