How can you secure big data in the information age?

IDC predicts continued double-digit growth through 2020 for big data and business analytics, the technologies used to handle an ever-growing amount of data.

However, some of the big data that is stored and processed will undoubtedly be sensitive and will therefore need to be protected. That raises several questions: are the environment and the data vulnerable to cyber threats, and who has access to the data? There is also the issue of compliance.

Big data deployments are subject to the same compliance mandates (e.g., GDPR, HIPAA, PCI, and SOX) and require the same protection against breaches as traditional databases and their associated applications and infrastructure.

All the best practices for data security are still applicable for big data environments. The problem is how to achieve security and compliance for big data environments given the unique challenges they present.

Much of the challenge of securing big data lies in the nature of the data itself. Consider the impact on security of the well-known three V’s of big data:


Volume: Enormous volumes of data require security solutions built to handle them. This means highly scalable solutions that can, at a minimum, cope with an order of magnitude more data than those designed for traditional data environments.

Velocity: Your security solutions must be able to keep up with big data speeds. You’ll need to focus on data parsing and collection throughput, the degree of automation that is available, and the ability to deliver real-time visibility of policy violations and other events.

Variety: Mixing multiple sources and types of data with different access permissions compounds classification and policy-setting challenges, elevating the need for robust audit capabilities.
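To make the policy-setting problem concrete, here is a minimal Python sketch of one possible rule: a dataset derived from several sources is readable only by roles that every source allows, and each access decision is written to an audit trail. The names used here (`Source`, `AuditLog`, `combined_roles`, `can_read`) and the example roles are hypothetical illustrations, not part of any particular product, and real deployments may well choose a different rule.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Source:
    """A data source and the roles permitted to read it (hypothetical model)."""
    name: str
    allowed_roles: frozenset


@dataclass
class AuditLog:
    """In-memory audit trail; a real system would use durable, tamper-evident storage."""
    entries: list = field(default_factory=list)

    def record(self, user, role, dataset, granted):
        self.entries.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "role": role,
            "dataset": dataset,
            "granted": granted,
        })


def combined_roles(sources):
    """Roles allowed to read data derived from ALL of the given sources."""
    role_sets = [s.allowed_roles for s in sources]
    # Assumed rule: the most restrictive combination wins (set intersection).
    return frozenset.intersection(*role_sets) if role_sets else frozenset()


def can_read(user, role, dataset, sources, audit):
    """Decide access to a derived dataset and log the decision either way."""
    granted = role in combined_roles(sources)
    audit.record(user, role, dataset, granted)
    return granted


crm = Source("crm", frozenset({"analyst", "marketing", "admin"}))
payments = Source("payments", frozenset({"finance", "admin"}))
audit = AuditLog()

print(can_read("alice", "analyst", "crm_payments_join", [crm, payments], audit))
print(can_read("bob", "admin", "crm_payments_join", [crm, payments], audit))
```

Logging denied requests as well as granted ones is a deliberate choice in this sketch: refused attempts are often the more interesting signal when reviewing an audit trail.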

It’s not necessarily the infrastructure and technology within big data environments that make them more challenging to secure; it’s the multiplicity of those components, which dramatically increases complexity.

For example, the open source Hadoop framework has different layers of the stack serving a variety of purposes, from distributed storage at the bottom, to table and schema management, distributed programming, and querying/interface options at the middle tiers, and a wide range of management tools along the top. There is no single logical point of entry or resource to guard, but many different ones, each with an independent lifecycle.

Often big data environments will use multiple technologies for data storage and retrieval.

For example, it’s not uncommon for an implementation to include relational stores and query tools to support analytical workloads, non-relational (NoSQL) technologies for real-time, interactive workloads, or both.

Many big data environments include multiple instances or versions of the same core building blocks from different vendors, such as different Hadoop distributions and NoSQL offerings.

This means a greater amount of diversity and complexity to be addressed by security tools and staff.

Big data deployments typically have a multitude of geographically distributed data stores and, therefore, numerous physical nodes requiring protection. This inherently increases the potential for inconsistent security policies and practices, suggesting the need for solutions that feature strong, centralised administration capabilities.
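As a rough illustration of what centralised administration can buy you, the Python sketch below compares the security settings reported by each node against a single baseline and lists any drift. The setting names, node names, and the `find_drift` helper are all hypothetical; a real deployment would pull these values from its own configuration management or security tooling.

```python
# Hypothetical central baseline that every node is expected to match.
BASELINE = {
    "encryption_at_rest": True,
    "audit_logging": True,
    "min_tls_version": "1.2",
}


def find_drift(baseline, node_policies):
    """Return {node: {setting: (expected, actual)}} for nodes that deviate.

    A setting that a node does not report at all shows up with actual=None.
    """
    drift = {}
    for node, policy in node_policies.items():
        diffs = {
            key: (expected, policy.get(key))
            for key, expected in baseline.items()
            if policy.get(key) != expected
        }
        if diffs:
            drift[node] = diffs
    return drift


# Hypothetical settings reported by three geographically distributed nodes.
nodes = {
    "london-01": {"encryption_at_rest": True, "audit_logging": True, "min_tls_version": "1.2"},
    "frankfurt-02": {"encryption_at_rest": True, "audit_logging": False, "min_tls_version": "1.2"},
    "singapore-03": {"encryption_at_rest": True, "min_tls_version": "1.0"},
}

for node, diffs in find_drift(BASELINE, nodes).items():
    print(node, diffs)
```

The point of the sketch is the single source of truth: with one baseline, inconsistency becomes something you can detect mechanically rather than discover after an incident.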

Finally, there’s the challenge presented by the lack of security knowledge and understanding in the people working most closely with the data: data scientists and developers. Data scientists, with their skills and experience working with structured and unstructured data to deliver new insights, don’t necessarily think about the security of the data.

It’s not surprising given that new technologies have encouraged data scientists to view big data as a giant sandbox where they are the owners and can decide how the data will be used.

While most development projects rely on access to non-sensitive, test data instead of live, production data, big data application development by its nature often falls outside of the more secure processes set up within IT.
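One way to give big data developers something closer to the non-sensitive test data used elsewhere is to mask sensitive fields before records leave production. The Python sketch below replaces selected fields with a keyed hash (HMAC-SHA256 from the standard library), so the same input always maps to the same token and joins still work. The field names, the `mask_record` helper, and the demo key are assumptions for illustration; note too that keyed hashing is pseudonymisation rather than full anonymisation, so the output may still need protecting.

```python
import hashlib
import hmac

# Hypothetical list of fields treated as sensitive in this example.
SENSITIVE_FIELDS = {"name", "email", "card_number"}


def pseudonymise(value, key):
    """Replace a value with a short, deterministic keyed-hash token."""
    digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # truncated for readability in this sketch


def mask_record(record, key, sensitive_fields=SENSITIVE_FIELDS):
    """Return a copy of the record with sensitive fields pseudonymised."""
    return {
        field: pseudonymise(value, key) if field in sensitive_fields else value
        for field, value in record.items()
    }


# Demo key only; a real key would come from a secrets manager, not source code.
demo_key = b"demo-key-not-for-production"
production_row = {
    "name": "Jane Example",
    "email": "jane@example.com",
    "country": "UK",
    "basket_value": 42.50,
}

print(mask_record(production_row, demo_key))
```

Because the token depends on the key, keeping that key away from the development environment is what stops the masked values from being recomputed from guessed inputs.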

And with higher access privileges than many others in the organisation, developers also present a greater security risk, whether through accident or malicious intent.

There’s no time to waste when it comes to rethinking security for big data environments. The number and breadth of data breaches continue to grow unabated, with a 40% increase in data breaches in 2016 reported by the Identity Theft Resource Center.

Everyone from the CIO on down needs to understand and prioritise implementing better security for big data—after all, the last thing you want to hear is that there’s been a big breach in your big data.


Sourced by Terry Ray, chief product strategist at Imperva

Nick Ismail

Nick Ismail is a former editor for Information Age (from 2018 to 2022) before moving on to become Global Head of Brand Journalism at HCLTech. He has a particular interest in smart technologies, AI and...
