The era of big data won’t materialise without fast data
Fast data must emerge from the shadows of big data if the true value of data is to be realised
Big data has become quite the buzzword lately, and most people in the business world are at least somewhat familiar with the concept. Yet relatively few are familiar with the term ‘fast data’.
But they should be – size, after all, is only part of the equation. With data coming in at increasing speed from a growing number of sources – including mobile and wearable devices, sensors attached to equipment (i.e. the Internet of Things), the cloud and social media – data has become more unwieldy than ever before.
Perhaps fast data has taken a backseat to big data in mainstream conversation because the two are so conceptually intertwined that one is assumed to follow from the other.
You might think that, as we become able to collect more and more data, we would also be able to process it in tandem. In practice, this is not the case. Organisations are indeed amassing ever-larger datasets from a growing number of sources, but their ability to process that data has not kept pace.
Big data is only as good as fast data allows it to be
Think of it in terms of food production. We can now produce far more food per capita than at any time in the past, yet our bodies can’t digest it any faster than they could thousands of years ago. Likewise, big data may be a feast for analytics, but just as the human body’s digestive capacity has natural limits, so do the disk-based data processing methods we’ve relied on for the past several decades.
In real terms, enterprises may benefit no more from the ability to take in massive datasets than the human body benefits from eating an obscenely large meal. In fact, the result may be a bloated, sluggish effect more akin to how we feel after Christmas dinner.
Enterprises could spend years, even decades, making sense of the information they’re collecting. However, the current business climate demands agility above all else, and to remain competitive, organisations must be able to make decisions at near-real-time speed. Furthermore, as applications are required to meet increasingly demanding service-level agreements (SLAs), they must be able to process data as rapidly as it is generated. With traditional, disk-based computing, this isn’t possible.
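The gap between how fast data is generated and how fast it is processed can be made concrete with a toy queue model. The sketch below is purely illustrative: the function name `backlog_over_time` and the rates used are hypothetical, and it assumes records arrive at a constant rate and are drained up to a fixed capacity each second. Whenever arrivals outpace capacity, the unprocessed backlog grows every second and never recovers.

```python
def backlog_over_time(arrival_rate: int, processing_rate: int, seconds: int) -> list[int]:
    """Return the unprocessed backlog (in records) at the end of each second.

    Hypothetical model: `arrival_rate` records land in a queue every second,
    and the pipeline can process at most `processing_rate` records per second.
    """
    backlog = 0
    history = []
    for _ in range(seconds):
        backlog += arrival_rate                    # new records join the queue
        backlog -= min(backlog, processing_rate)   # drain as much as capacity allows
        history.append(backlog)
    return history


# Illustrative numbers only, not measurements.
print(backlog_over_time(arrival_rate=120_000, processing_rate=100_000, seconds=5))
print(backlog_over_time(arrival_rate=120_000, processing_rate=150_000, seconds=5))
```

With 120,000 records arriving per second against a capacity of 100,000, the backlog grows by 20,000 records every second (20,000, 40,000, ... 100,000 after five seconds); raising capacity to 150,000 keeps the queue empty. In this model, only processing at least as fast as data arrives prevents the backlog from growing.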
The need to process exponentially growing datasets instantaneously will undoubtedly prompt innovations that haven’t even entered the big data discussion yet.
However, the basic technology to process massive amounts of data in real time is in fact already in place, and gaining significant awareness in the marketplace, particularly in industries such as financial services, in which even split-second latency can mean the difference between profit and loss on any given trade.
In-memory computing (IMC) is, for all intents and purposes, fast data in today’s technology landscape. Processing data where it resides, in main memory (RAM), rather than repeatedly reading it from disk, can be hundreds of times faster than what traditional disk-based processing methods achieve.
Fortunately, the falling cost of DRAM, approximately 30% every 12 months, has made IMC an affordable option for just about any organisation. And unlike approaches that could be classed as supercomputing, IMC can run on commodity hardware.
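To see what a decline of roughly 30% a year means over several years, the reduction can be compounded. The sketch below assumes, as a simplification, that the rate stays constant from year to year; `relative_dram_cost` is a hypothetical helper, and the 30% figure is the approximation quoted above, not a forecast.

```python
def relative_dram_cost(annual_decline: float, years: int) -> float:
    """Price relative to today after `years`, assuming a constant annual decline.

    `annual_decline` is a fraction, e.g. 0.30 for a 30% drop per year.
    """
    if not 0 <= annual_decline < 1:
        raise ValueError("annual_decline must be in [0, 1)")
    return (1 - annual_decline) ** years


# Illustrative only: compound the ~30%-per-year figure over five years.
for years in range(1, 6):
    print(f"after {years} year(s): {relative_dram_cost(0.30, years):.1%} of today's price")
```

Under that assumption, after three years the same amount of DRAM would cost about 0.7³ ≈ 0.343 of today's price, roughly a two-thirds reduction.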
To be sure, more and more enterprises are investing in IMC, and the market continues to produce an expanding range of solutions. In fact, a recent survey within the financial services industry revealed that 58% of the respondents’ companies use in-memory technologies for real-time analytic applications, and 28% reported that they are used in a mission-critical capacity.
However, for IMC technologies to truly enable a world of fast data, a major shift will need to take place: from in-memory point solutions, such as databases with in-memory capability as an add-on feature, to more strategic platform solutions that can be applied far more widely. Ultimately, IMC must be able to support transactional, analytical and hybrid workloads across any application and data store if it is to change the speed at which companies can ingest and make sense of large amounts of data.
Organisations’ data infrastructures come in all shapes and sizes, and many have been built over the course of decades, often in a rather haphazard fashion. CIOs, CDOs and IT architects can’t afford to rebuild their infrastructures around IMC: the investment would be too high, and the process too disruptive to critical business functions.
Rather, IMC technology must be able to fit into and enhance what already exists to truly be feasible and maximise business value. In a sense, it must be as universally applicable as the commodity hardware that it runs on.
Additionally, it must be understood that, with a technology this new, even the savviest organisations may not always know what to do with it. In fact, there is probably no single thought leader or cutting-edge organisation that fully comprehends what can be accomplished when you can compute hundreds or even thousands of times faster. Developers, therefore, must be able to experiment with it without making a significant investment, something that can only be accomplished when the software is freely available.
Against this background, perhaps one of the most significant developments for IMC in 2014 was the acceptance of the core GridGain code base into the Apache Incubator programme.
Apache Ignite, which is the project name for this code base in the Incubator programme, delivers IMC in a form that can be applied to a nearly universal range of applications and data stores. Like Hadoop and other open-source projects, Apache Ignite will soon be made available for download by anyone for free, and applied in any way the developer sees fit.
While Apache Ignite is still in incubation at the Apache Software Foundation (ASF), it is no exaggeration to say that IMC now ultimately resides in the hands of the people: anyone can access it, and everyone can experiment with it.
What will happen now that data which once took days to sort through can be processed within a few seconds? The possibilities are limitless, and those of us who have invested our careers in developing leading-edge technologies will be watching the market with bated breath to see what comes next.
As fast data achieves the mainstream attention that big data has received, its contributions to industry, science and just about every other aspect of life will no doubt be just as profound.
Sourced from Abe Kleinfeld, CEO, GridGain