The intelligent data pipelines collaboration will enable customers to quickly ingest data directly into a managed data lake from hundreds of hybrid data sources. The collaboration involves Informatica’s Cloud Data Integration and Databricks’ Unified Analytics Platform.
Informatica is also announcing support for Delta Lake, the new open-source project from Databricks, which provides an analytics-ready place to store massive amounts of data.
The collaboration will enable “data engineers to easily discover the right datasets and ingest high volumes of data from multiple sources into Delta Lakes,” said Ali Ghodsi, co-founder and CEO of Databricks.
The combined products will simplify the creation of high-volume intelligent data pipelines and provide integrated data governance for intelligent data discovery and end-to-end lineage.
The companies say the product integrations will allow faster development and complete governance for data engineering workloads.
Data teams will be able to ‘easily’ create performant, scalable data pipelines for big data. Using Informatica’s visual drag-and-drop workflows, data teams can define pipelines that run on highly optimised Apache Spark clusters in Databricks, delivering high performance at scale.
Delta Lake provides ACID transactions and schema enforcement, which bring reliability at scale to data lakes and make high-quality datasets ready for downstream analytics.
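To make the idea of schema enforcement with all-or-nothing writes more concrete, here is a minimal pure-Python sketch. `ToyTable` and `SchemaMismatchError` are hypothetical names invented for illustration. This is not Delta Lake's API or implementation; it only loosely mimics the behaviour of rejecting an entire incoming batch whose columns or types do not match the table, rather than committing part of it.

```python
from dataclasses import dataclass, field


class SchemaMismatchError(ValueError):
    """Raised when an incoming batch does not match the table schema (hypothetical)."""


@dataclass
class ToyTable:
    # Hypothetical in-memory table: maps column name -> expected Python type.
    schema: dict
    rows: list = field(default_factory=list)

    def append(self, batch):
        # Validate every row before writing anything, so one bad row
        # rejects the whole batch (a loose analogy to an atomic commit).
        for row in batch:
            if set(row) != set(self.schema):
                raise SchemaMismatchError(
                    f"columns {sorted(row)} != {sorted(self.schema)}"
                )
            for col, expected in self.schema.items():
                if not isinstance(row[col], expected):
                    raise SchemaMismatchError(
                        f"column {col!r} expected {expected.__name__}, "
                        f"got {type(row[col]).__name__}"
                    )
        # Only reached when the whole batch is valid.
        self.rows.extend(batch)


table = ToyTable(schema={"id": int, "amount": float})
table.append([{"id": 1, "amount": 9.5}])
try:
    # The second row has the wrong type for "id", so neither row is written.
    table.append([{"id": 2, "amount": 3.0}, {"id": "x", "amount": 1.0}])
except SchemaMismatchError as err:
    print("rejected:", err)  # rejected: column 'id' expected int, got str
print(len(table.rows))  # 1
```

Validating the full batch before extending `rows` is what keeps a partially valid write from leaving the table in a mixed state; a real lake format also has to handle concurrent writers and durable commits, which this sketch does not attempt.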
“Trusted, high-quality data and efficient use of data users’ time are critical success factors for analytics and data science projects,” said Anil Chakravarthy, CEO of Informatica. “Informatica’s support for Databricks allows data engineers to rapidly build serverless pipelines to ingest and govern data from a variety of sources at scale, while empowering data scientists using Databricks to quickly find and prepare the data for their analytics and data science projects in a self-service fashion.”
Ghodsi added: “This means joint customers can use the reliability and performance at scale from Databricks to make data ready for analytics and machine learning – and get intelligent governance to find, track and audit that data from end to end.”