Managing challenges of scale, speed and personal information in the big data era

Worldwide, 2.5 quintillion bytes of data are created every day and, with the expansion of the Internet of Things, that pace is increasing. Ninety percent of the data in the world today was generated in the last two years alone. Any forward-thinking, digitally transforming organisation is going to be dealing with data. A lot of data. Big data.

While simply collecting lots of data presents comparatively few problems, most businesses run into two significant roadblocks when they try to use it: extracting value, and handling data responsibly to the standard required by data privacy legislation like GDPR. What most people don’t appreciate is the sheer size and complexity of the data sets that organisations have to store, and the IT effort that comes with them. Teams of people work on processes to ensure that others can access the right data in the right way, when they need it, to drive essential business functions, all while ensuring personal information is treated appropriately.

The problem comes when you’ve got multiple teams around the world, all running to different beats, without synchronising. It’s a bit like different teams of builders, starting work independently, from different corners of a new house. If they have all got their own methods and bricks, then by the time they meet in the middle, their efforts won’t match up. It’s the same in the world of IT. If one team is successful, then all teams should be able to learn those lessons of best practice. Meanwhile, siloed behaviour can become “freeform development” where developers write code to suit a specific problem that their department is facing, without reference to similar or diverse problems that other departments may be experiencing.

In addition, there often simply aren’t enough builders to go around to turn these data projects around quickly, which is a problem in the face of rising business demand. In the scramble to get things done at the pace of modern business, there will at the very least be some duplication of effort. There is also a high chance of confusion, and of the foundations for future data storage and analysis not being firm. Creating a unified, standard approach to data processing is critical – as is finding a way to implement it with the lowest possible level of resource, at the fastest possible speed.

One of the ways businesses can organise data to meet the need for both standardisation and flexibility is in a data vault environment. This data warehousing methodology is designed to bring together information from multiple teams and systems into a centralised repository, providing a bedrock of information that teams can use to make decisions. It includes all of the data, all of the time, ensuring that no information is left out of the process.

However, while a data vault design is a good architect’s drawing, it alone won’t get the whole house built. Developers can still code and build it manually over time but, given its complexity, they certainly cannot do this quickly, and may not be able to do it in a way that stands up to the scrutiny of data protection regulations like GDPR. Building a data vault environment by hand, even using standard templates, can be incredibly laborious and error prone. This is where data vault automation comes in, taking care of the 90% or so of an organisation’s data infrastructure that fits standardised templates and the stringent requirements of the Data Vault 2.0 methodology.

Data vault automation can lay out the core landscape of a data vault, and use reliable, consistent metadata to ensure that information, including personal information, can be monitored both at its source and over time as records change. Meanwhile, in-house developer teams can focus their time and energy on the 5-10% of the data warehouse environment that requires a more bespoke approach, using wizard-driven development to combine individual expertise with the power of automation for the more complex or organisation-proprietary parts of the data landscape.
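
To make the idea of template-driven automation concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular automation product works: the SourceTable metadata record, the table and column naming, and the SQL types are all assumptions made for this example. It shows one metadata entry being expanded into a hub and a satellite table from fixed templates, with the columns flagged as personal data listed separately so they can be tracked.

```python
from dataclasses import dataclass, field


@dataclass
class SourceTable:
    """Hypothetical metadata record describing one source table."""
    name: str
    business_key: str
    attributes: dict[str, str]  # column name -> SQL type
    personal: set[str] = field(default_factory=set)  # columns flagged as personal data


# Illustrative templates: a hub holds the business key, a satellite holds
# the descriptive attributes and their history. Naming is an assumption.
HUB_TEMPLATE = """CREATE TABLE hub_{name} (
    {name}_hk CHAR(32) NOT NULL PRIMARY KEY,
    {business_key} VARCHAR(255) NOT NULL,
    load_date TIMESTAMP NOT NULL,
    record_source VARCHAR(100) NOT NULL
);"""

SAT_TEMPLATE = """CREATE TABLE sat_{name} (
    {name}_hk CHAR(32) NOT NULL,
    load_date TIMESTAMP NOT NULL,
    hash_diff CHAR(32) NOT NULL,
    record_source VARCHAR(100) NOT NULL,
{columns}
    PRIMARY KEY ({name}_hk, load_date)
);"""


def generate_ddl(table: SourceTable) -> list[str]:
    """Expand one metadata record into hub and satellite DDL statements."""
    columns = "\n".join(
        f"    {col} {sql_type}," for col, sql_type in table.attributes.items()
    )
    hub = HUB_TEMPLATE.format(name=table.name, business_key=table.business_key)
    sat = SAT_TEMPLATE.format(name=table.name, columns=columns)
    return [hub, sat]


def personal_data_register(tables: list[SourceTable]) -> list[str]:
    """List every generated column that the metadata flags as personal data."""
    return sorted(f"sat_{t.name}.{col}" for t in tables for col in t.personal)


if __name__ == "__main__":
    customer = SourceTable(
        name="customer",
        business_key="customer_number",
        attributes={"full_name": "VARCHAR(200)", "segment": "VARCHAR(50)"},
        personal={"full_name"},
    )
    for statement in generate_ddl(customer):
        print(statement, end="\n\n")
    print(personal_data_register([customer]))
```

The point of the sketch is that the structure lives in the templates and the specifics live in the metadata, so adding another source table means adding one more metadata record rather than writing more hand-built code.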

The results of automating a data vault speak for themselves. Using this approach, a global insurance corporation can now build in one week the data architecture that used to take six months, with the same number of people. A bank can now build in two hours what used to take two weeks. In the era of big data, when data streams change frequently and businesses want to be able to turn on a dime, this type of flexibility is critical for getting, and staying, ahead of the curve. In addition, automation ensures that personal data protection doesn’t become an afterthought in the race for business efficiency, by building in metadata tracking as a core part of the data ingestion process.

The risk reduction and business agility are obvious benefits for the organisation overall, but what’s in it for the IT team? It comes down to time to value. If you can offer a two-hour turnaround for tasks that used to take weeks or months, you are providing much-needed flexibility in a fast-paced business world. Business users can rely on the IT team as an invaluable resource for delivering results, based on data handled responsibly, because they know that building together means building better.

Written by Dan Linstedt, the inventor of Data Vault modelling
