Any time we interact digitally with anyone or anything, we generate scads of data.
The global internet population is now estimated at well over 3.2 billion people, and that number keeps growing as access to connectivity expands.
Physical items like home thermostats and pacemakers are generating data too. That doesn’t even begin to touch the amount of log data generated every day by companies across the world.
By 2020, experts estimate that there will be 5,200 GB of data for every person on Earth.
With these numbers increasing exponentially as we continue down the road to digitisation, it’s clear that we’ve only begun to scratch the surface of what can be accomplished with a holistic view of all of that data.
However, the sheer volume of that data and its rapid growth can often overwhelm our ability to process the information.
To take advantage of the new insights possible from all types of data, companies have to find a way to staunch the flood.
Incoming data pools quickly in an organisation’s data lake. Just like a real lake fed by rivers and streams, a data lake is fed by data rivers and data streams: binaries, flat files, and Sybase, Oracle and MSSQL databases, among others.
In nature, when heavy rains fall or a waterway becomes choked, the river can quickly overflow its banks and wreak considerable mayhem and damage on the surrounding ecosystem.
The same thing happens with data.
When data arrives faster than you can read, process and analyse it, the surrounding environment can quickly become encumbered or disrupted. The damage takes the form of storage exhaustion, misinformed business intelligence, application development delays and production outages.
The same effects occur when other constraints restrict your data flow. These include handoff delays between departments in the ticketing system, the inability to refresh full data sets quickly, and cumbersome data rewind processes.
The issue is compounded as organisations increasingly rely on a nimble IT department following a DevOps model of application deployment.
With pressure to constantly update and improve application performance, IT teams can have up to 10 non-production tributaries for every production data river.
These are regularly used for development, testing, QA, staging, training and business intelligence initiatives.
Data will ebb and flow, often driven by external factors beyond our control.
The best you can do is be prepared and agile enough to adapt.
You must learn to swim.
Jumping into the Big Data Lake
Traditional approaches to data integration such as ESBs, EAIs and ETL came of age when ERPs were the heavyweights of enterprise software, and they were well suited to the needs of their time.
Large installation footprints, heavy reliance on network processing, and use of older languages and protocols cause them to fall short at a time when agility, above all else, is required to accommodate an ever-expanding number of data sources.
Propping up middleware implementations that are already drowning is a losing tactic. Organisations need to begin migrating to more modern, cloud-based models such as data virtualisation.
Big Data may be bigger than anything before it, but that doesn’t mean it has to be so unwieldy.
Multiple full copies of production environments could very quickly become exorbitantly expensive. However, losing granularity in that data would be detrimental to the many different teams within a company that all need copies.
Desktop virtualisation eases the strain that heavy application workloads place on PC performance. In much the same way, data virtualisation alleviates the mounting strain that provisioning large volumes of data places on network and storage systems.
By virtualising non-production data sources, an organisation can increase its storage capacity by up to 90%. That frees engineers to build “flood control systems” up to two times faster and adapt quickly to changing needs.
By giving individuals access to full virtual environments, an organisation ensures that its engineers know exactly how its systems will behave when called upon in live applications. As a result, it won’t find itself drowning in untested data.
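Data virtualisation products differ in how they work internally. One common building block is copy-on-write block sharing:

* Every virtual copy references the same underlying blocks.
* Each copy stores only the blocks it changes.

The minimal Python sketch below illustrates that idea under this assumption. `BlockStore`, `VirtualCopy` and `provision` are hypothetical names, not any product’s API. The numbers are illustrative rather than measured.

```python
class BlockStore:
    """A shared pool of immutable data blocks (a stand-in for the storage layer)."""

    def __init__(self) -> None:
        self.blocks: dict[int, bytes] = {}
        self._next_id = 0

    def put(self, data: bytes) -> int:
        block_id = self._next_id
        self._next_id += 1
        self.blocks[block_id] = data
        return block_id


class VirtualCopy:
    """A copy that owns only a map of block ids; the blocks themselves are shared."""

    def __init__(self, store: BlockStore, block_ids: list[int]) -> None:
        self.store = store
        self.block_ids = list(block_ids)  # private map, so edits never leak to other copies

    def read(self, index: int) -> bytes:
        return self.store.blocks[self.block_ids[index]]

    def write(self, index: int, data: bytes) -> None:
        # Copy-on-write: allocate a new block for this copy only; shared blocks stay untouched.
        self.block_ids[index] = self.store.put(data)

    def clone(self) -> "VirtualCopy":
        # Cloning copies the block map, not the data, so it is cheap regardless of dataset size.
        return VirtualCopy(self.store, self.block_ids)


def provision(source: list[bytes], n_copies: int, writes_per_copy: int):
    """Load a source dataset once, then hand out virtual copies that each change a few blocks."""
    store = BlockStore()
    prod = VirtualCopy(store, [store.put(block) for block in source])
    copies = [prod.clone() for _ in range(n_copies)]
    for i, vcopy in enumerate(copies):
        for j in range(writes_per_copy):
            vcopy.write(j, f"copy{i}-edit{j}".encode())
    return prod, copies, store


if __name__ == "__main__":
    source = [bytes([k % 256]) * 4 for k in range(1000)]
    prod, copies, store = provision(source, n_copies=10, writes_per_copy=5)
    physical = len(store.blocks)                   # 1,000 shared + 10 x 5 private blocks
    full_copies = len(source) * (1 + len(copies))  # production plus 10 full physical copies
    print(physical, full_copies)
```

The example uses 1,000 source blocks and 10 non-production copies, each editing 5 blocks. In that case the store holds 1,050 physical blocks, compared with the 11,000 that full physical copies would need. How much a real system saves depends on how much each copy diverges from its source.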
Big Data, Big Threats
A fundamental shift in the types of data we have at our disposal is coming.
Non-traditional data sources such as voice and video will take us beyond text-based data. We’ll also be able to glean more data from communication channels such as Slack, Yammer, Messenger and countless social media sites.
This level of personalisation in the data we have access to further expands the possibilities for insightful business decisions. It also drastically increases an organisation’s data security risk.
The value of big data increases with the amount of data provisioned onto the platform, but centralising all of this information on a single platform inherently increases the risk to that data.
Large-scale hacks of personal data have crippled massive organisations such as Target and Home Depot in recent years. Such attacks can be even more damaging to smaller businesses that are unable to bounce back quickly.
One way to help prevent the leak of sensitive customer information is data masking. This process hides the exact figures in a dataset and replaces them with synthetic but comparable data.
Masking allows IT teams to operate on a “business as usual” basis without moving sensitive information out of its secure environment. Otherwise, that information could end up where cyber-criminals can reach it through system vulnerabilities.
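To make the idea concrete, here is a minimal Python sketch of one common masking approach: deterministic, format-preserving substitution keyed by a secret. The function names, the `FAKE_NAMES` pool, the record fields and the 10% perturbation are hypothetical choices for illustration. This is not the method of any particular product.

```python
import hashlib
import random

# Hypothetical pool of synthetic names used as replacements.
FAKE_NAMES = ["Alex Smith", "Sam Jones", "Jordan Lee", "Taylor Brown", "Casey Green"]


def _rng_for(value: str, secret: str) -> random.Random:
    """Seed a private RNG from a keyed hash so the same input always masks the same way."""
    digest = hashlib.sha256((secret + "|" + value).encode("utf-8")).digest()
    return random.Random(int.from_bytes(digest, "big"))


def mask_digits(value: str, secret: str) -> str:
    """Replace every digit with a pseudo-random digit, keeping length and punctuation."""
    rng = _rng_for(value, secret)
    return "".join(str(rng.randrange(10)) if ch.isdigit() else ch for ch in value)


def mask_amount(amount: float, secret: str, spread: float = 0.1) -> float:
    """Scale a figure by a pseudo-random factor in [1 - spread, 1 + spread] so it stays comparable."""
    rng = _rng_for(repr(amount), secret)
    return round(amount * (1 + rng.uniform(-spread, spread)), 2)


def mask_name(name: str, secret: str) -> str:
    """Swap a real name for a synthetic one drawn from FAKE_NAMES."""
    return _rng_for(name, secret).choice(FAKE_NAMES)


def mask_record(record: dict, secret: str) -> dict:
    """Mask the sensitive fields of one customer record (field names are hypothetical)."""
    return {
        "name": mask_name(record["name"], secret),
        "card_number": mask_digits(record["card_number"], secret),
        "balance": mask_amount(record["balance"], secret),
    }


if __name__ == "__main__":
    row = {"name": "Jane Doe", "card_number": "4111-1111-1111-1111", "balance": 2500.00}
    print(mask_record(row, secret="keep-this-out-of-non-production"))
```

Masking is deterministic so that the same customer maps to the same synthetic values across tables and refreshes, which keeps joins and test cases consistent. The secret should stay in the secure environment. Anyone holding it could try to re-derive masked values for small domains such as short numbers.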
Sink or Swim
There’s no question that a flood of data is coming. Organisations must find tools that will help them brave the storm.
Sourced By Adam Bowen, Worldwide Field Innovation Lead, Delphix