While big data projects may seem a complicated beast to many IT professionals, working on an Internet of Things (IoT) project will likely make the former seem simple.
The sheer velocity required for IoT projects is immense. The term “expanding velocity demands” is likely familiar to those who have worked on big data projects, where it comes up in relation to data storage and a system’s ability to handle an ever-increasing influx of data.
Entire architectures and technologies, Hadoop for example, have been created in response, enabling real-time storage of large data volumes.
When working on an IoT project, however, organisations need to keep in mind not only real-time storage requirements, but also the crucial need to enable real-time analysis and decision-making.
The velocity and volume of IoT data will make current big data examples pale in comparison. Twitter, for example, is often cited as a source of big data, since the number of tweets per day can reach hundreds of millions.
In contrast, for IoT, companies need to be able to ingest hundreds of thousands, or even millions, of events per second from their devices.
These organisations are looking for examples and use cases in the area of predictive maintenance and superior servitisation. This means they are aiming to architect for real-time predictive analytics and the ability to trigger the processes within seconds after certain critical patterns have been detected.
One big difference between big data and IoT projects is time. While in big data projects it is perfectly normal for data to rest before it is used in any kind of analysis, in any IoT project time is of the absolute essence.
IDC researcher John Gantz has indicated that the IoT solutions organisations are likely to build will demand a decision within one minute of detecting the situations the systems were designed to look for.
To complicate matters further, velocity raises several distinct considerations.
The first is that data coming from devices often arrives in a raw, simple format; to be of any use in analytical decision models, it needs to be organised, transformed and enriched.
Organisation refers to the fact that data may arrive out of order for analysis purposes, meaning it has to be re-shuffled on the fly.
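The article does not prescribe a mechanism for this re-shuffling, but one common streaming pattern is a watermark-based reorder buffer: events are held briefly and released in timestamp order once no older event can still arrive. The sketch below is a hypothetical illustration, with the class name, lateness bound and event shape all assumed for the example.

```python
import heapq

class ReorderBuffer:
    """Buffers out-of-order events and releases them in timestamp order
    once a watermark (the maximum tolerated lateness) has passed."""

    def __init__(self, max_lateness):
        self.max_lateness = max_lateness  # seconds of lateness we tolerate
        self.heap = []                    # min-heap keyed on event timestamp
        self.latest_seen = float("-inf")  # highest timestamp observed so far

    def push(self, timestamp, payload):
        """Accept one event; return any events now safe to emit in order."""
        heapq.heappush(self.heap, (timestamp, payload))
        self.latest_seen = max(self.latest_seen, timestamp)
        watermark = self.latest_seen - self.max_lateness
        released = []
        # Anything at or below the watermark can no longer be overtaken
        # by a later-arriving, earlier-stamped event.
        while self.heap and self.heap[0][0] <= watermark:
            released.append(heapq.heappop(self.heap))
        return released
```

The trade-off is latency for correctness: a larger lateness bound tolerates more disorder but delays every downstream decision by up to that bound.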
Transformation points to the fact that the analytical models that decisions rely on often don’t need the raw data from the device, but rather values derived from it.
If an organisation is looking to calculate the normal range for a certain time series dataset, it might want to perform a Bollinger Band calculation. However, the Bollinger Band might in turn need an exponentially weighted moving average (EWMA).
This EWMA can be seen as a derived value stream, constantly re-calculating itself with the arrival of every new event.
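A minimal sketch of such a derived value stream might look like the following: an incrementally updated EWMA and exponentially weighted variance, from which Bollinger-style bands are re-derived on every new event. The class name, smoothing factor and band width are assumptions for illustration, not part of the original text.

```python
class StreamingBands:
    """Maintains an EWMA and an exponentially weighted variance
    incrementally, deriving Bollinger-style bands per event."""

    def __init__(self, alpha=0.1, num_std=2.0):
        self.alpha = alpha      # smoothing factor: higher = more reactive
        self.num_std = num_std  # band width in standard deviations
        self.mean = None        # current EWMA (None until first event)
        self.var = 0.0          # exponentially weighted variance

    def update(self, value):
        """Fold one reading into the running statistics.
        Returns (lower_band, ewma, upper_band)."""
        if self.mean is None:
            self.mean = value   # seed the average with the first reading
        else:
            # Standard incremental EWMA / EW-variance recurrence.
            delta = value - self.mean
            self.mean += self.alpha * delta
            self.var = (1 - self.alpha) * (self.var + self.alpha * delta * delta)
        std = self.var ** 0.5
        return (self.mean - self.num_std * std,
                self.mean,
                self.mean + self.num_std * std)
```

Because each update touches only two stored numbers, this recalculation is cheap enough to run on every incoming event, which is exactly what a constantly self-recalculating derived stream requires.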
Enrichment is necessary if decision models need not just data from the devices but also data from enterprise sources – for example, which service level was agreed upon for this device? When was the last known maintenance? What historical reference data can be applied? Constantly retrieving this information would bring most enterprise applications to their knees.
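This is where the in-memory caching mentioned below earns its keep: rather than calling the enterprise system on every event, the stream processor keeps recently fetched records locally with a time-to-live. The sketch below is an assumed illustration; the `EnrichmentCache` name, the `loader` callback and the TTL value are all hypothetical.

```python
import time

class EnrichmentCache:
    """Caches slow enterprise lookups (e.g. agreed service level,
    last maintenance date) so the stream processor does not hit
    back-end systems on every single event."""

    def __init__(self, loader, ttl_seconds=300):
        self.loader = loader   # function: device_id -> enterprise record
        self.ttl = ttl_seconds
        self.entries = {}      # device_id -> (expiry_time, record)

    def get(self, device_id):
        now = time.monotonic()
        hit = self.entries.get(device_id)
        if hit and hit[0] > now:
            return hit[1]      # fresh cache hit: no back-end call made
        record = self.loader(device_id)  # slow path: query the back end
        self.entries[device_id] = (now + self.ttl, record)
        return record

def enrich(event, cache):
    """Attach cached enterprise context to a raw device event."""
    return {**event, **cache.get(event["device_id"])}
```

With a five-minute TTL, a device emitting hundreds of events per second generates one back-end lookup per five minutes instead of hundreds per second, at the cost of reference data being up to five minutes stale.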
In order to support these three capabilities in real time, organisations need advanced integration, analysis and in-memory caching capabilities. Forrester calls this technology domain ‘streaming analytics’.
It is fair to say that if, from a volume aspect, IoT might look like big data in disguise, then from a velocity viewpoint IoT is big data on steroids.
Sourced from Bart Schouw, director of IoT, Software AG