In the world of data lakes, people often throw around the phrase: “If you build it, they will come.”
With all of that data in one place, you would think everyone would flock to the lake and drink from it. But instead of thinking about people drinking from the lake, businesses should think of it more like fishing. Just because the lake is well stocked with data doesn’t mean you’re going to get the results you want.
Catching a lot of fish, and catching the right type of fish, requires the correct combination of skill, equipment and location. Examining data lakes in the same manner, you might ask: “Why haven’t I been able to help my analysts catch more, higher-value fish?”
Here are six of the biggest challenges with data lakes that hold organisations back from obtaining more value.
1. Alignment with the business
Even if you believe you’re creating a great data lake, the business teams may not agree. Or if they don’t have visibility into what’s being created, they may not have developed any adoption plans.
This misalignment between the data lake builders and the business teams, or fishermen, is the single biggest barrier to getting value out of your data lake.
There are a number of ways to align data lake stakeholders, who include:
• IT and analysts.
• Line-of-business leaders.
Use case discovery workshops and value-definition frameworks help bring together the virtual team to agree on priorities, requirements and business usage to drive adoption. In addition, staying focused on specific use cases and departments early in your big data journey can help your data lake show value and eventually drive adoption.
2. Lack of skills
The idea that you need a brand-new set of skills is a major fallacy surrounding data lakes. Yes, you do require some Hadoop skills to manage the data lake. But layering the proper platform with familiar, analyst-friendly interfaces lets your analyst team reuse existing skills and eliminates the need to find specialised analysts who can program against Hadoop or Spark.
3. Data quality and consistency are low
Organisations are spending a great deal of time reviewing data lakes for quality and consistency. As data lakes contain more data in varying formats, cleansing them can be a formidable task.
To remove this barrier, organisations need to avoid the trap of creating hand-coded routines to cleanse and manage data on Hadoop. These routines tend to be error-prone and require manual checks.
Modern, native-on-Hadoop data preparation tools dramatically reduce time spent on data quality and consistency and produce superior datasets for analytic teams faster.
4. A lack of true data curation
Simply finding the right datasets in data lakes is a tremendous challenge for analysts. There’s a large volume of data and it comes in different formats, some of which can be very complex and cryptic. If you don’t know where to fish, you won’t catch anything.
This is where the data analysts and stewards add their value to the data lake. These team members know the data very well and are able to properly curate it. Armed with a native-on-Hadoop data preparation tool, your data “masters” can get analysis-ready datasets to analysts to speed analytic cycles.
5. Governance that is limiting
With a tremendous volume of data sitting in a data lake, administrators will often overreact when it comes to data security, especially at the raw data level. With customer analytics as the number-one use case for data lakes, administrators often see the need to protect private data about customers.
Strong, granular and flexible governance in your big data analytics platform on your data lake can help you set the right policies at various levels of the data to ensure raw data is secure, while analysts can get to the curated data and business teams get to the analytic results they need.
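As a rough illustration of tiered governance, the sketch below models three levels of data (raw, curated and analytic results) and checks which roles may read each one. It is a minimal sketch under stated assumptions, not how any particular platform implements governance: the tier names, role names and the `can_read` helper are all hypothetical, and giving stewards access to raw data is an assumption made for the example.

```python
from enum import Enum


class Tier(Enum):
    """Hypothetical levels of data in the lake."""
    RAW = "raw"
    CURATED = "curated"
    RESULTS = "results"


# Hypothetical read policy: which roles may read each tier.
# Raw data is locked down, analysts reach curated data,
# and business teams reach only the analytic results.
READ_POLICY = {
    Tier.RAW: {"admin", "steward"},
    Tier.CURATED: {"admin", "steward", "analyst"},
    Tier.RESULTS: {"admin", "steward", "analyst", "business"},
}


def can_read(role: str, tier: Tier) -> bool:
    """Return True if the given role is allowed to read the given tier."""
    return role in READ_POLICY.get(tier, set())
```

With a policy like this, `can_read("analyst", Tier.CURATED)` is allowed while `can_read("analyst", Tier.RAW)` is refused, so the raw data stays protected without blocking analysts from the datasets prepared for them.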
6. Lack of operationalisation
Even if your analyst teams can get to the right data and perform their discovery and analytics quickly, these new insights don’t produce value on their own. The last step of getting the right data to the proper business teams is often a forgotten step in the analytic cycle.
Your data lake requires operationalisation features that enable analytic jobs to run as needed to feed data to the business. Capabilities such as scalable execution, detailed data management policies, flexible job scheduling and more will help deploy analytic data pipelines to the business.
There are a number of obstacles that impede organisations from getting the most value out of their data lakes. And often it isn’t one single challenge, but a combination. To take your data lake to the next level and reap the rewards of fruitful data fishing, pay close attention to these roadblocks.
Sourced by John Morrell, Sr. director of product marketing at Datameer