Oscar Wilde once said: “The truth is rarely pure and never simple”, a phrase that holds even greater significance in the new big data world, where in theory all the answers to life’s questions should be within our grasp. Yet is the deluge of data bringing us closer to the truth, or obscuring it from CIOs’ vision?
The pervasive growth in data analytics, across a variety of industries, holds the promise of unlocking significant insights on a wide array of important topics.
However, this creates an interesting phenomenon: as businesses increasingly use data analytics, they may actually distance themselves from “the truth” by creating more questions than they answer. This is not to say that the data is inaccurate, just that, at times, data may provide only part of the perspective.
Whose truth should be believed?
Consider the analogy of criminal witnesses: in some cases, there may only be a single eyewitness to a criminal act, and an alleged offender could be convicted based on that eyewitness testimony alone.
The press is full of stories of injustice in which a person convicted on the evidence of a single witness is eventually proven innocent.
Nor is the problem fully solved when the court has access to multiple witnesses: each witness offers their own version of the facts, and each version differs in some material way.
Both of these cases highlight that facts as offered by a data source can be entirely accurate from that source’s perspective, but may not represent the whole truth.
The same challenges exist for enterprises as they seek to use data analytics in order to truly understand a particular facet of their business.
Big data is often described as the combination of three Vs: volume, velocity, and variety. Much is written, and many supplier claims are made, about capabilities for handling volume and velocity, yet variety attracts far less attention. That does not diminish its importance, especially when the goal is seeking truth.
Gathering the evidence
As in the crime witness analogy, there are two aspects to consider regarding variety in data analytics.
The first is that variety is a necessary component of achieving real insights: in the case of data sources, more is more. Accurate conclusions are difficult to reach when only one data source is collected and analysed.
An effective data analytics strategy takes into account many different types of data streams, from a large array of data sources, in a variety of schemas and formats. The technology in place must be able to quickly transform this cacophony of information into a consistent set of usable data on which further analysis can be performed.
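The normalisation step described above can be sketched in a few lines. This is a minimal illustration, not a real pipeline: the two feeds, their field names, and the unified record shape are all invented for the example. The point is simply that each source keeps its own schema at the edge, and a small adapter per source maps it onto one consistent format.

```python
from datetime import datetime, timezone

# Two hypothetical feeds with different schemas (illustrative data only).
feed_a = [{"ts": "2024-01-05T10:00:00+00:00", "customer": "C1", "spend": "19.99"}]
feed_b = [{"epoch": 1704448800, "cust_id": "C2", "amount_pence": 1250}]

def normalise_a(rec):
    # Feed A uses ISO timestamps and decimal strings.
    return {
        "timestamp": datetime.fromisoformat(rec["ts"]),
        "customer_id": rec["customer"],
        "spend_gbp": float(rec["spend"]),
    }

def normalise_b(rec):
    # Feed B uses epoch seconds and integer pence.
    return {
        "timestamp": datetime.fromtimestamp(rec["epoch"], tz=timezone.utc),
        "customer_id": rec["cust_id"],
        "spend_gbp": rec["amount_pence"] / 100.0,
    }

# One consistent stream, whatever the origin of each record.
unified = [normalise_a(r) for r in feed_a] + [normalise_b(r) for r in feed_b]
for row in unified:
    print(row["customer_id"], row["spend_gbp"])
```

Downstream analysis then works against the unified shape alone, so adding a new source means writing one more adapter rather than reworking every query.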
The second aspect is that even with multiple data sources, the truth does not magically appear simply by putting the discrete pieces together. The art added to the science in this endeavour is the ability to efficiently and accurately combine, correlate, and analyse the distinct data elements, and to separate relevant information from extraneous data.
A detective must take the combined witness testimony, along with physical evidence from the crime and the intangible “gut feel” that years of experience provide, in order to weave together the truth about the crime.
Data analytics platforms must be capable of doing the same thing by taking multiple sources of information, some of which may be streaming in on a constant basis and some of which may be housed in other data repositories.
Combined with this must be a practical knowledge of the industry and the attributes under evaluation in order to add the real life perspective that goes beyond just the numbers.
In addition, to be most beneficial, the analysis should build and become more efficient over time. The good detective does not look only at the current case, but applies all the prior knowledge and experience gained in other investigations to home in more quickly on what is most likely in the current instance.
As seen frequently in TV crime dramas, “profilers” take patterns of behaviour and method from previous cases and apply them to the current one in order to narrow down the suspects and anticipate the criminal’s next action.
Efficient use of data analytics tools can perform exactly the same function. Data analytics should not be a one-off event but a dynamic environment in which prior knowledge and insights improve future analysis, creating a machine-learning environment.
Over time, as analysis and insights are developed, these can then be used to more clearly determine what data sources are most appropriate and how to fuse these sources in order to create increasingly relevant outputs.
These enhancements can be codified and used to consistently improve the overall workflow process so that the learning becomes institutionalised.
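One very simple way to picture this codified learning is a feedback loop in which each data source carries a weight that rises or falls according to how well its past estimates matched eventual outcomes. The source names, numbers, and update rule below are all invented for illustration; they stand in for whatever fusion and feedback mechanism a real platform would use.

```python
# Each hypothetical source starts with equal influence.
sources = {"billing": 1.0, "web_logs": 1.0, "call_centre": 1.0}

def fuse(estimates, weights):
    """Weighted average of per-source estimates."""
    total_w = sum(weights[s] for s in estimates)
    return sum(estimates[s] * weights[s] for s in estimates) / total_w

def learn(estimates, outcome, weights, rate=0.1):
    """Shrink the weight of sources whose estimate was far from the truth."""
    for s, est in estimates.items():
        error = abs(est - outcome)
        weights[s] = max(0.1, weights[s] * (1 - rate * error))

# Two rounds of analysis: "web_logs" is consistently off target,
# so its influence on the fused estimate shrinks over time.
for estimates, outcome in [
    ({"billing": 0.9, "web_logs": 0.3, "call_centre": 0.8}, 1.0),
    ({"billing": 1.1, "web_logs": 0.4, "call_centre": 0.9}, 1.0),
]:
    print(round(fuse(estimates, sources), 3))
    learn(estimates, outcome, sources)

# The learning is now institutionalised in the weights themselves:
# later analyses automatically lean on the sources that proved reliable.
```

The mechanics here are deliberately naive, but the shape matches the article’s point: the output of each analysis feeds back into the configuration of the next one, so the workflow improves without anyone rewriting it by hand.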
The truth can be obtained
As with solving complex crimes, the truth is often elusive: the process to reaching a conclusion is neither a straight line nor easy.
Effective data analytics at any level requires the basic technical ability to handle a large and ever increasing flow of data.
Beyond this, businesses need the ability to capture and analyse data from multiple sources, each of which provides only a piece of the story; only when they are combined can a full picture be generated.
The process of performing this requires the challenging combination of skills in the science of data analytics along with the art of understanding the industry and issue under evaluation.
Therefore, turning information into truth requires the effective application of the third “V”, variety, in order to gain the truly well-rounded perspective from which accurate operational decisions can be made.
Sourced from Rob Chimsky, VP Insights, Guavus