It’s no secret that data lakes have piqued the interest of organisations across the globe that are seeking to reduce costs, discover insights in their business, and boost productivity from their big data efforts.
In fact, Gartner recently reported that inquiries into data lakes have increased by 21% year-over-year. However, the same report found that businesses are growing skeptical of the hype. Given this skepticism, why do businesses still pursue data lakes?
The fact of the matter is, data lakes represent a valuable model for managing big data. But getting a data lake implementation right takes strategy, time, and effort. You can’t invest a little and suddenly get a lot back. You should make a solid investment to get a much bigger return.
Many new data lake users are seeking guidance on best practices that let them do more than just store their data and instead get the maximum value out of their data lake, and rightly so. Without the tools and procedures to shape a data lake into a shared enterprise resource, businesses risk their newest data architecture asset turning into a “data swamp.”
By knowing the secrets of the data lake, businesses can get a complete picture of their data and their operations, and make sure their data lake investment doesn’t go dry:
1. Complement, don’t replace, enterprise data warehouses and data marts
You have a data lake, but now what do you do with all your previous data storage investments? The answer is to keep them and have them complement your current data lake. The truth is, you will still find value in your data warehouses for specific types of queries and analytics, so you want to make sure you still retain the best tool for the job.
The modern data lake is great for enriching large data sets and correlating data sets that were previously spread across disparate sources. Technologies like Apache Hadoop are ideal for these huge environments because they offer lower costs around storage and processing.
And with so many integrated tools available, such as analyst-friendly SQL-on-Hadoop, businesses should reap the rewards of a cheaper storage option for their unstructured data, one that integrates with their enterprise data warehouses and data marts.
2. Visual analytics make the big picture accessible
Enterprise data lake users must be able to get insights without having to code. Otherwise, data lakes are just a private area reserved for technical teams. To make data analytics as accessible as possible to the larger business analyst community, enterprises must invest in a tool that permits them to visually display that information, ensuring a data lake isn’t a black box to less tech-savvy users.
This feature enables non-techies to drill down into data and derive insights, and even make predictions, through an intuitive interface.
Additionally, recent developments in machine learning and automation are doing even more to bring visualised insights to all employees. Capabilities such as natural language search and analytical recommendations help end users gain insights more easily. They are also powering predictive analytics, where visuals can proactively suggest actions to users.
3. Create a strong data culture
What is the use of having powerful visualised data if it can’t be shared? All businesses employing a data lake need to create a governance framework that enables collaboration. By creating a framework that allows shareable data sets and dashboards, everyone in the enterprise will be able to offer feedback on which models generate the most valuable insights.
Employees from senior executive leadership down to new hires should have the tools and training to access and share data insights. This will help drive a strong data culture that lets all employees feel comfortable using analytics tools. Comfort and familiarity with the tools will lead to deeper analysis that results in better decisions justified with compelling data points.
4. Have unified security and governance
It’s great to have shareable data, but it has to stay in the right hands. This topic becomes even more critical as more stringent regulations require more controls in the enterprise. For example, the imminent General Data Protection Regulation, which will affect any company that handles the personal data of people in the EU, will force anyone who deals with such data to keep security top of mind.
Businesses must know what kind of data they have, where their sensitive data resides, how to handle it, and how to see it.
To do so, they must take a unified approach to data governance before they land themselves in hot water.
This means that their data governance policies must try to simplify data management, and ensure data is not unnecessarily copied to other locations. Minimizing the number of data locations can simplify the overall governance effort.
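As a rough illustration of the inventory described above, the following minimal Python sketch assumes a hypothetical in-memory catalog in which each data set records where it is stored and whether it holds sensitive data. The names `Dataset`, `sensitive_datasets`, and `duplicated_datasets`, along with the sample entries, are invented for this example and do not correspond to any specific governance product.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Dataset:
    """One catalog entry (hypothetical structure, for illustration only)."""
    name: str
    sensitive: bool
    locations: List[str] = field(default_factory=list)


def sensitive_datasets(catalog: List[Dataset]) -> Dict[str, List[str]]:
    """Map each sensitive data set to the locations where it resides."""
    return {d.name: d.locations for d in catalog if d.sensitive}


def duplicated_datasets(catalog: List[Dataset]) -> List[str]:
    """Return the names of data sets stored in more than one location."""
    return [d.name for d in catalog if len(set(d.locations)) > 1]


# Example catalog with made-up entries.
catalog = [
    Dataset("customer_profiles", True, ["lake/raw", "warehouse/crm"]),
    Dataset("web_clickstream", False, ["lake/raw"]),
    Dataset("payroll", True, ["lake/secure"]),
]

print(sensitive_datasets(catalog))   # where sensitive data lives
print(duplicated_datasets(catalog))  # candidates for consolidation
```

Even a simple listing like this answers two of the questions raised here: where sensitive data resides, and which data sets have been copied to more than one place.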
With a data lake that has self-service tools that allow anyone to gain insights, security and governance are especially important. Governance is an issue that affects all users and stakeholders. Businesses can’t expect IT to be the only department responsible for data hygiene.
So while IT still needs to be involved, the proper governance framework will continue to enable user flexibility while promoting individual data responsibility throughout the organization.
5. Scale to support your hundreds, or thousands, of users
Coding-intensive tools that have traditionally been used on data platforms like Hadoop have limited the number of users who can access data in a data lake. This might have been fine before data became more democratised, but the advantages of a data lake are for naught if data insights are limited to a handful of people in a company.
Recent technologies have simplified the access to big data, allowing business analysts to use their familiar desktop analytics tools to analyse data in a data lake. But these technologies were designed primarily for smaller user groups doing ad-hoc queries, and do not handle the load from thousands of concurrent users.
Analytics platforms on data lakes need not only high scalability for data and users, but also query acceleration capabilities to deliver low-latency responses to a large number of concurrent users.
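One common way to accelerate queries is to compute an aggregate once and serve repeated requests from a cache, rather than rescanning raw data for every user. The Python sketch below is a minimal illustration of that general idea, not a description of any particular product. `RAW_EVENTS`, `sales_by_region`, and `scan_count` are hypothetical names, and the records are made up.

```python
from functools import lru_cache

# Made-up raw records standing in for a large table in the data lake.
RAW_EVENTS = [
    ("emea", 120.0),
    ("apac", 75.5),
    ("emea", 30.0),
    ("amer", 210.25),
]

scan_count = 0  # counts how often the raw data is actually scanned


@lru_cache(maxsize=128)
def sales_by_region(region: str) -> float:
    """Aggregate sales for one region; the result is cached per region."""
    global scan_count
    scan_count += 1
    return sum(amount for r, amount in RAW_EVENTS if r == region)


# Simulate many users asking the same dashboard question.
for _ in range(1000):
    total = sales_by_region("emea")

print(total, scan_count)
```

Here the raw records are scanned only on the first request; the remaining requests are answered from the cache. A real platform would also need to refresh or invalidate cached results when the underlying data changes.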
There is not enough time in the day or money in organizations’ budgets to justify the effort to accumulate petabytes of data if it can’t be accessed concurrently. To be a true shared enterprise resource, businesses need to invest in analytics tools that serve a large portion of their employee population.
While data lakes have tremendous potential, they are not silver bullets. Organizations need to set themselves up for success by accompanying their data lakes with the right analytics technologies so they make their data visual, accessible, shareable, secure, and scalable.
Too many data lakes have failed because the implementers assumed that all the pieces would naturally fall into place. In other words, they expected a high return on little investment. As with any other business initiative, you don’t get successes for free.
Learn not only from this article, but also from your peers at conferences and in online media. There are a lot of stories of failed data lakes, but there are great successes as well. No one will tell you how easy their data lake implementation was, so don’t dismiss the challenge.
But if you make it a priority and allocate proper resources, you will start down the path to getting much more value from your data. As you take on this journey, always remind yourself of complementary integration, data visualisations, collaboration and sharing, security and governance, and scale.
Sourced by Shant Hovsepian, co-founder and CTO, Arcadia Data