We're excited to be Gold Sponsors of the DATA + AI Summit this year! Check out our booth and session on May 28th!
Guaranteeing Data Quality SLAs with Deequ & Databand
May 28, 2021 11:05 AM (PT)
As the importance of data grows and its connection to business value becomes more direct, data engineering teams are increasingly adopting service level agreements (SLAs) for how they deliver data, covering new factors like data freshness, completeness, and accuracy.
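To make SLA factors like freshness and completeness concrete, here is a minimal sketch. It assumes an SLA can be expressed as a maximum staleness and a minimum row count per delivery. The names `DataSLA`, `Delivery`, and `sla_violations` are hypothetical and are not part of Deequ or Databand. Accuracy is left out because it usually depends on dataset-specific rules.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass(frozen=True)
class DataSLA:
    """Hypothetical SLA for one dataset: how fresh and how complete it must be."""
    dataset: str
    max_staleness: timedelta  # freshness: newest data may be at most this old
    min_row_count: int        # completeness: minimum rows expected per delivery


@dataclass(frozen=True)
class Delivery:
    """Hypothetical record of what was observed about the latest delivery."""
    dataset: str
    last_updated: datetime
    row_count: int


def sla_violations(sla: DataSLA, delivery: Delivery, now: datetime) -> List[str]:
    """Return human-readable violations; an empty list means the SLA is met."""
    violations = []
    staleness = now - delivery.last_updated
    if staleness > sla.max_staleness:
        violations.append(f"{sla.dataset}: stale by {staleness - sla.max_staleness}")
    if delivery.row_count < sla.min_row_count:
        violations.append(
            f"{sla.dataset}: {delivery.row_count} rows < {sla.min_row_count} expected"
        )
    return violations


if __name__ == "__main__":
    now = datetime(2021, 5, 28, 12, 0, tzinfo=timezone.utc)
    sla = DataSLA("orders", max_staleness=timedelta(hours=2), min_row_count=1000)
    late = Delivery("orders", last_updated=now - timedelta(hours=3), row_count=800)
    for violation in sla_violations(sla, late, now):
        print(violation)
```

Keeping the SLA as plain data separates the promise made to stakeholders from the code that measures each delivery, so thresholds can be reviewed without touching pipeline logic.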
In this session we’ll discuss how to use Deequ, a data quality library purpose-built for Spark, to build a data monitoring and QA system that helps you meet the SLAs you guarantee to analytics users, data scientists, and other business stakeholders. We’ll show how to use Deequ to create quality checks that report metrics and enforce rules on data arrivals, schemas, distributions, and custom metrics. We’ll then cover how to visualize, trend, and alert on those metrics using pipeline observability tools. Finally, we’ll discuss common challenges that teams face when setting up data quality logging infrastructure, along with best practices for adoption.
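The check-and-metric pattern described above can be sketched in plain Python. This is not Deequ's API: Deequ runs on Spark DataFrames, from Scala or through PyDeequ. The sketch is a hypothetical stand-in that computes a few metrics on a batch, evaluates a rule against each one, and applies a simple relative-change alert against the previous run's value. That last rule is the kind an observability tool might apply when trending metrics. The metric definitions and all names here are simplified assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical plain-Python stand-in for Deequ-style checks; not Deequ's API.
Row = Dict[str, object]


def size(rows: Sequence[Row]) -> float:
    """Metric: number of rows in the batch."""
    return float(len(rows))


def completeness(rows: Sequence[Row], column: str) -> float:
    """Metric: fraction of rows where `column` is present and not None."""
    if not rows:
        return 0.0
    return sum(r.get(column) is not None for r in rows) / len(rows)


def distinctness(rows: Sequence[Row], column: str) -> float:
    """Metric (simplified): distinct non-null values divided by non-null values."""
    values = [r[column] for r in rows if r.get(column) is not None]
    if not values:
        return 0.0
    return len(set(values)) / len(values)


@dataclass(frozen=True)
class Constraint:
    name: str
    metric: Callable[[Sequence[Row]], float]  # computes a metric from the batch
    assertion: Callable[[float], bool]        # rule the metric must satisfy


@dataclass(frozen=True)
class ConstraintResult:
    name: str
    value: float   # the metric is reported even when the rule passes
    passed: bool


def run_checks(rows: Sequence[Row], constraints: List[Constraint]) -> List[ConstraintResult]:
    """Compute every metric, evaluate its rule, and report both."""
    results = []
    for c in constraints:
        value = c.metric(rows)
        results.append(ConstraintResult(c.name, value, c.assertion(value)))
    return results


def relative_change_alert(current: float, previous: Optional[float], max_change: float) -> bool:
    """Alert when a metric moved by more than `max_change` (0.5 = 50%) since the last run."""
    if previous is None:
        return False  # no history yet, nothing to compare against
    if previous == 0:
        return current != 0  # any movement away from zero counts as a change
    return abs(current - previous) / abs(previous) > max_change


if __name__ == "__main__":
    batch = [
        {"id": 1, "email": "a@example.com"},
        {"id": 2, "email": None},
        {"id": 2, "email": "c@example.com"},
    ]
    checks = [
        Constraint("size >= 3", size, lambda v: v >= 3),
        Constraint("email completeness >= 0.9",
                   lambda rs: completeness(rs, "email"), lambda v: v >= 0.9),
        Constraint("id fully distinct",
                   lambda rs: distinctness(rs, "id"), lambda v: v == 1.0),
    ]
    for result in run_checks(batch, checks):
        status = "PASS" if result.passed else "FAIL"
        print(f"{status} {result.name}: {result.value:.2f}")
    # Row count dropped from 10 to 3 since the previous run: a 70% change.
    print("row-count alert:", relative_change_alert(3.0, 10.0, max_change=0.5))
```

In a real pipeline the metric values would be stored per run so they can be plotted and compared over time. Deequ provides its own metrics repository and anomaly-detection strategies for this, which the sketch does not attempt to reproduce.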
We’ll work through common examples, including machine learning, data transformation, and replication pipelines (for instance, moving data from S3 to Delta Lake).
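For a replication pipeline, one basic quality check is reconciliation: confirming that every record in the source arrived in the target exactly once. The sketch below assumes the primary keys have already been read from the source (for example, files in S3) and from the target table into Python iterables. In a Spark pipeline this comparison would more likely be done with DataFrame joins. `reconcile` and `ReconciliationReport` are hypothetical names; this is not a Delta Lake, Deequ, or Databand API.

```python
from collections import Counter
from dataclasses import dataclass
from typing import FrozenSet, Hashable, Iterable


@dataclass(frozen=True)
class ReconciliationReport:
    """Hypothetical result of comparing source keys with replicated target keys."""
    missing: FrozenSet[Hashable]     # keys in the source but not in the target
    unexpected: FrozenSet[Hashable]  # keys in the target with no source counterpart
    duplicated: FrozenSet[Hashable]  # keys that appear more than once in the target

    @property
    def ok(self) -> bool:
        return not (self.missing or self.unexpected or self.duplicated)


def reconcile(source_keys: Iterable[Hashable],
              target_keys: Iterable[Hashable]) -> ReconciliationReport:
    """Compare key sets; duplicates in the source itself are not checked here."""
    src = set(source_keys)
    target_counts = Counter(target_keys)  # count occurrences to spot double writes
    tgt = set(target_counts)
    return ReconciliationReport(
        missing=frozenset(src - tgt),
        unexpected=frozenset(tgt - src),
        duplicated=frozenset(k for k, n in target_counts.items() if n > 1),
    )


if __name__ == "__main__":
    report = reconcile([1, 2, 3, 4], [1, 2, 2, 3, 5])
    print(report, "ok:", report.ok)
```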
With these tools, you’ll be able to create more stable, reliable pipelines that your business can depend on.
In this session: Josh Benamram, Co-founder and CEO, Databand.ai; Michael Harper, Data Solution Architect, Databand.ai