In the quest to make sense of data, companies have invested time and money in an array of technology solutions.
The likes of in-memory computing and traditional databases have proved successful in many areas.
However, some much-touted approaches are costly and inaccessible, while others have yet to address the expanding set of historical information.
This has become a critical issue for an increasing number of vertical sectors, which are looking to successes in financial services and the trading landscape as a guide for a new way of thinking.
Just because companies have access to data does not mean it is all automatically useful. And therein lies the first challenge – how to truly understand the data and, therefore, the real opportunities.
Much of it is an expanding collection of information that companies used to throw away but now store so they can mine it to benefit the business.
Making sense of complexity
To get meaningful results, companies need to engage with the data, slice and dice it in many different ways, and work out which parts are worth using.
Considering the enormity of the task, for many, this requires – above all – speed. If a statistician has to wait until the next day for results every time they want to try a different approach, it’s going to turn into a long, arduous and, ultimately, ineffective process.
In the energy sector, there is a proliferation of data due to the increasing digitisation of electricity delivery systems.
Wireless transformers on the smart grid are now pumping out streams of data that did not previously exist – and if companies can understand that data, they can deliver significant benefits.
Being able to measure the impact of extreme weather conditions, assess how equipment is being stressed and predict power surges means they can more accurately determine where and when to deliver voltage.
Data is also being used to customise sales initiatives, for example in the pharmaceutical industry. With insight into trends such as insurance claims and prescription filings pertaining to individual physicians, sales representatives can build accurate profiles and better target their products.
Data analytics is also helping the pharmaceutical industry to diagnose patients in real time, leading to a more efficient experience for the entire diagnostic ecosystem.
Dealing with structure
A large proportion of big data, particularly the interesting and useful element, is structured.
Even when there is unstructured data, it needs to be collated so users can analyse and engage with it, rather than simply collect it.
Trading firms have long had to tackle this type of structured data coming in from the financial markets.
Trading data, in particular, arrives in microsecond bursts, so users need a simple and consistent way of handling it if they are to process billions of records effectively.
While financial services have led the way in dealing with this, other sectors are increasingly tackling the same issue. This includes the energy market, where much of the data has similar characteristics.
Most of this structured data also has a time series component, recorded down to millisecond, microsecond and sometimes nanosecond increments.
Therefore, in order to move beyond amassing endless information, companies need technology suited to manipulating and aggregating structured, time-stamped data sets.
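The kind of manipulation described above – slicing a stream of time-stamped records into buckets and aggregating each one – can be sketched in a few lines. This is a minimal, illustrative example in Python using only the standard library; the tick records and field names are invented for the sketch, not drawn from any particular vendor’s format.

```python
from datetime import datetime
from collections import defaultdict

# Hypothetical tick records: (timestamp, symbol, price), with sub-second
# resolution in the timestamps, standing in for a market data feed.
ticks = [
    (datetime(2014, 6, 2, 9, 30, 0, 120), "ABC", 101.2),
    (datetime(2014, 6, 2, 9, 30, 0, 450), "ABC", 101.3),
    (datetime(2014, 6, 2, 9, 31, 2, 800), "ABC", 101.1),
]

def bucket_by_minute(records):
    """Aggregate time-stamped ticks into per-symbol, one-minute
    average-price buckets."""
    buckets = defaultdict(list)
    for ts, sym, price in records:
        # Truncate the timestamp to the minute to form the bucket key
        minute = ts.replace(second=0, microsecond=0)
        buckets[(sym, minute)].append(price)
    return {key: sum(prices) / len(prices) for key, prices in buckets.items()}

averages = bucket_by_minute(ticks)
```

At billions of records this naive loop would of course be replaced by a columnar, vectorised engine, but the shape of the operation – group by a truncated timestamp, then aggregate – is the same.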
One highly publicised approach to dealing with this structured and complex data is in-memory computing.
This delivers on both high performance and speed, allowing companies to analyse data dynamically and quickly.
There is certainly a place for in-memory computing and there are plenty of case studies proving its worth, most notably in financial services.
For many companies, however, it is not always the most feasible solution, because of the high cost of keeping and running huge data sets in-memory.
This becomes even more of an issue when dealing with terabytes of historical or legacy data sets which, for many, are the main priority.
Increasingly, decision-makers are demanding greater intelligence around this key component in order to anticipate events, rather than react – to make more informed decisions.
The historical data issue
In the case of utilities companies, for example, weather data is vast. Keeping even a few months of this data in-memory can become prohibitively expensive.
They may end up with a monstrous solution that far outweighs the need. Often, what they actually require is a simple snapshot, such as the readings around a specific weather meter.
Relying purely on in-memory computing means there is a limitation on how dynamically the historical analysis can be integrated.
Businesses are further constrained by having to make the best use of their existing computing infrastructure through pragmatic hybrid storage solutions.
If we consider that a year of NYSE TAQ tick-by-tick exchange data is around four terabytes, the reality of big data has already long arrived for some industries.
Hardware makers have been working to address storage limitations, and the era of many-terabyte machines is arriving on a wider scale sooner than people realise.
Even when the memory and storage problems are solved, those using more traditional database approaches to work with historical data will often struggle with speed.
Rather than perform intraday analytics, they have to resort to overnight batch processing.
The reason for this is the way they use technology to store and access data. It is not uncommon to have to extract millions of records from a database and move them to a separate process – which may even be on a separate machine – for further analysis.
This worked successfully when dealing with a few hundred thousand or a few million records – but now companies are looking at billions of records. Extracting this and analysing it in yet another program does not scale.
Traditional databases are incredibly reliable and ideally suited to tasks like keeping track of accounts or ensuring banks process ATM withdrawals efficiently.
However, they are often less equipped to deal with high performance analytics – simply because they were neither designed nor built for that purpose.
A hybrid approach
One of the main lessons to learn from the financial sector is that being faster delivers a huge competitive advantage.
As other industries look to mine data for similar gains, companies are in a prime position to pick the best of all worlds.
With a more hybrid approach, companies can combine the high performance and speed capabilities of in-memory while solving the storage issues by putting the vast historical data sets on disk.
By bridging available technologies, companies can deliver on all counts – including cost.
Crucially, by folding in a high performance programming language right in with the data, users can interact directly with their data in one place.
This puts a super-charged in-memory and on-disk historical database at their fingertips, with the ability to deliver results at speeds and levels of complexity previously unavailable.
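The hybrid layout described above can be sketched in miniature: recent data held in memory, historical data on disk, and a single query that spans both partitions. This is an illustrative design sketch only – the class name, schema and flush policy are assumptions, with SQLite again standing in for the on-disk historical store – not a description of any particular product.

```python
import sqlite3

class HybridStore:
    """Sketch of a hybrid store: recent records kept in a hot, in-memory
    partition; older records flushed to a cold, on-disk partition."""

    def __init__(self, disk_path=":memory:"):
        self.recent = []  # hot partition: latest ticks, held in memory
        self.disk = sqlite3.connect(disk_path)  # cold partition: history
        self.disk.execute(
            "CREATE TABLE IF NOT EXISTS hist (sym TEXT, price REAL)")

    def append(self, sym, price):
        """New data always lands in the in-memory partition first."""
        self.recent.append((sym, price))

    def flush(self):
        """Periodically migrate the in-memory partition to disk."""
        self.disk.executemany("INSERT INTO hist VALUES (?, ?)", self.recent)
        self.recent = []

    def avg_price(self, sym):
        """One query that transparently spans both partitions."""
        hist = [p for (p,) in self.disk.execute(
            "SELECT price FROM hist WHERE sym = ?", (sym,))]
        live = [p for (s, p) in self.recent if s == sym]
        prices = hist + live
        return sum(prices) / len(prices) if prices else None

store = HybridStore()
store.append("ABC", 100.0)
store.append("ABC", 102.0)
store.flush()                 # those two ticks now live on disk
store.append("ABC", 104.0)    # this one is still in memory
combined = store.avg_price("ABC")  # spans disk and memory
```

The caller never sees the partition boundary – which is precisely the point of folding the query capability in with the data rather than extracting records to analyse elsewhere.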
In a world where the most interesting information is also the trickiest to manage, this is without doubt the Holy Grail of data management.
Sourced from Simon Garland, chief strategist, Kx Systems