The Ravit Show Q&A: How More Data Observability Leads to Better Governance
We recently had the opportunity to join an episode of The Ravit Show, a community for data science and AI professionals to upskill, grow, share, and learn from each other.
Ryan Yackel, Product Evangelist at Databand, and Kip Yego, Program Director at IBM, joined Ravit Jain to talk about all things data observability and data governance. During the show, Ryan, Kip, and Ravit spoke a lot about how observability is the “new kid on the block” that plays into the more well-defined governance discipline.
The conversation sparked an interesting Q&A from Ravit and the audience. You can watch the full show below, or read on for a recap of the Q&A.
Do you suggest any specific tools to support data observability or do you think it’s better to promote observability as a discipline within data governance? Or both?
Governance is a well-defined discipline that’s been around for a while, and data observability is something that can be factored into the overall governance story.
We’re discussing observability vs. governance because we often talk to people on governance teams who are hearing about data observability for the first time and trying to figure out how it fits into their work.
At Databand, we primarily talk to data engineering, platform, and data science teams, but we do have people come to us with governance titles (such as data stewards) too. That’s because data observability ties back to data quality and reliability, but the thing to note is it’s not the same quality and reliability that governance teams traditionally consider. With observability, it’s more about in-motion alerting and incident management.
The recent release of the Gartner Magic Quadrant for Data Quality Solutions underscores this relationship, as it highlights critical capabilities like monitoring, detection, automation, and augmentation that are all part of data observability. It’s a sign the industry now realizes that observability underpins data governance and that proactive notification and resolution of issues is really a necessity.
When we talk about the role of data governance and data observability in an organization, what does the information architecture look like?
There are a number of things to consider from an information architecture perspective. Take the point of view of a CDO: Their primary objective is to simplify the movement and migration of data and get it to end users as quickly as possible, all while adhering to regulations.
The first thing that data governance and observability help with is the dynamic nature of regulatory compliance. We’re all familiar with regulations like GDPR and CCPA, and new ones keep coming into effect that impact data governance and areas within it, such as AI governance. For example, Canada’s C-27 bill and the US AI Bill of Rights are two potential new regulations that organizations need to think about.
From an overall information architecture perspective, the question becomes how are organizations evolving dynamically with these changes and embedding those new needs into their governance processes so they’re not caught off guard. This is where automation, integration, and augmentation become key, and observability and governance are what work together to ensure that happens.
The other thing to consider about information architecture is the changing environment most organizations are dealing with. Multi-cloud and hybrid cloud environments are a reality for most organizations today, and that means you now have data strewn across not just different data estates, but also different jurisdictions. The question becomes, how do you govern data within that context? This is where the centralization of data intelligence comes into play. Importantly, you don’t need to centralize data to govern it; rather, you need to centralize the management of the intelligence about that data so you can continue to enforce governance across a distributed landscape without having to rely on different point solutions or worry about moving huge amounts of data. This is another area where observability complements data governance, especially when the movement of data happens across borders.
Ultimately, raising issues early in the process means they can be resolved well before data gets pushed downstream, which is why several information architecture considerations on the governance side have complementary components in data observability.
How is data monitoring different from data observability?
Monitoring is essentially a limited view of what’s going on in the system. If you have an alert that a service is down, that would be monitoring. But if you have an alert that a service is down and here’s the root cause, here’s the impact of the dependencies on that service, and here’s where you go to fix it, that’s observability.
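To make the distinction concrete, here is a minimal Python sketch of the two kinds of alert described above. It is an illustration only, not how Databand or any other product works: the service names, the `DOWNSTREAM` dependency graph, the error text, and the runbook location are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Alert:
    service: str
    message: str
    root_cause: Optional[str] = None                    # why it happened
    impacted: List[str] = field(default_factory=list)   # who else is affected
    runbook: Optional[str] = None                       # where to go to fix it


# Hypothetical dependency graph: each service feeds the services listed for it.
DOWNSTREAM: Dict[str, List[str]] = {
    "ingest_orders": ["clean_orders"],
    "clean_orders": ["orders_report", "revenue_model"],
    "orders_report": [],
    "revenue_model": [],
}


def monitoring_alert(service: str) -> Alert:
    """Monitoring: report only that something is down."""
    return Alert(service=service, message=f"{service} is down")


def impacted_services(service: str, graph: Dict[str, List[str]]) -> List[str]:
    """Breadth-first walk of the graph to list everything downstream."""
    seen, queue, order = {service}, deque([service]), []
    while queue:
        current = queue.popleft()
        for child in graph.get(current, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


def observability_alert(service: str, last_error: str,
                        graph: Dict[str, List[str]],
                        runbooks: Dict[str, str]) -> Alert:
    """Observability: the same alert plus root cause, impact, and next step."""
    alert = monitoring_alert(service)
    alert.root_cause = last_error
    alert.impacted = impacted_services(service, graph)
    alert.runbook = runbooks.get(service)
    return alert


if __name__ == "__main__":
    print(monitoring_alert("ingest_orders"))
    print(observability_alert(
        "ingest_orders",
        "schema change: column order_id missing",   # hypothetical error
        DOWNSTREAM,
        {"ingest_orders": "runbooks/ingest_orders"},  # hypothetical location
    ))
```

Both functions start from the same "service is down" signal. The difference is the context attached to it, which in a real system would come from logs, lineage metadata, and incident tooling rather than hard-coded values.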
Oftentimes people will say they’ve built some lightweight monitoring solution, but it doesn’t go deep enough: it tells them what happened, but not where or why. If you want to know how an issue impacts dependent systems, or you want to use things like ML anomaly detection and dynamic alerting to proactively fix issues before the data gets to a resting state, then you need observability.
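As a rough illustration of what dynamic alerting can mean, the sketch below flags a pipeline run when its row count deviates sharply from the runs just before it, instead of comparing against a fixed threshold. It uses a simple trailing z-score as a stand-in for ML anomaly detection. The function name, window size, threshold, and sample row counts are all assumptions made for this example, not anything taken from Databand.

```python
from statistics import mean, stdev
from typing import List


def dynamic_alerts(row_counts: List[float], window: int = 5,
                   z_threshold: float = 3.0) -> List[int]:
    """Return the indexes of runs that look anomalous relative to the
    `window` runs immediately before them."""
    flagged = []
    for i in range(window, len(row_counts)):
        history = row_counts[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: any change at all is unexpected.
            if row_counts[i] != mu:
                flagged.append(i)
            continue
        if abs(row_counts[i] - mu) / sigma > z_threshold:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Hypothetical row counts from eight consecutive pipeline runs.
    counts = [1000, 1010, 990, 1005, 995, 1002, 400, 1001]
    print(dynamic_alerts(counts))
```

Because the baseline moves with the data, the same code adapts to a pipeline whose normal volume grows over time. One known weakness of this simple approach is that an anomalous run inflates the baseline for the runs that follow it, which production systems typically handle with more robust statistics.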
Would you say an ideal situation would be to develop a data and AI strategy that conforms to all technological concerns that data engineers have in an organization?
This is something we see with many organizations. One of the starting points is to align your data strategy to your overall business strategy. Very often we see data strategies that focus on data for the sake of data projects, but you need to make sure you align everything to your overall business strategy to drive forward key objectives.
For example, one of the things you want to avoid is a scenario where you’re perpetually modernizing and monetizing. With an appropriate data strategy that’s aligned to a business strategy, you can bring in modular services and solutions that can solve for overall business objectives and incorporate important aspects of governance, whether that’s integration or observability. When those are built into your data strategy, they allow for a more natural evolution as the organization and the environment within it change.
So ultimately, yes: A holistic data strategy is certainly something that would capture the concerns of data engineers. At the same time, it also enables your CDO, CIO, and CTO to come together and work from the same page, which benefits data consumers within the organization.
How does Databand help modern data teams and data engineers get to the next level?
Databand is a cloud-first data observability solution. As an IBM company, we’re a part of the IBM Data Fabric, and one of the great things about this connection with IBM is that we can be a part of that ecosystem regardless of the different technologies our customers use.
When we work with organizations, we talk a lot about the technology that’s in their data stacks that may be used by different teams. Most companies don’t use a single vendor solution anymore. We see a lot of teams using tools like Apache Airflow or Spark, code-driven pipelines like those in Python and Java, dbt for transformations, and the list goes on. Databand can help monitor the processes and pipelines within those solutions.
For example, maybe an organization has a version of Apache Airflow that’s behind their own firewall, but they want to move to Google Cloud Composer or they want to use more of a managed service on Airflow. Databand can help make sure that as teams move these different workloads to the cloud, they can monitor each one. We support a lot of data migration use cases specifically on the data pipeline in production environments, where we can attach observability to every single thing that goes online to help teams get into the habit of monitoring what’s going on as they continue to add more and more into production.
Critically, Databand doesn’t replace anything for most organizations; it just makes everything more productive. We meet engineers and platform teams where they are to solve the issues around reliability, quality, and impact analysis that they run into regularly. Just consider the fact that data engineering teams spend almost half their time maintaining and fixing pipelines, and that’s without observability tied in, meaning they don’t even know what’s going on. Databand comes in to help detect issues earlier and resolve them faster so that modern data teams can deliver more trustworthy data.
Where do you see data observability in the next two to three years?
There are a lot of observability solutions out there from great companies in addition to Databand, like Bigeye, Monte Carlo, and Unravel Data, and IBM was the first to market in terms of acquiring a leading technology in the space. One of the reasons why IBM acquired Databand was that it’s the only solution that can provide application observability with Instana. Databand also does machine learning and AI model observability with OpenScale.
Going forward, we can expect to see more consolidation from companies like IBM adding these solutions to their observability stacks. And as more and more startups get acquired, we’ll see a fuller story around data observability. For instance, analysts will start talking about observability outside of application and performance monitoring to really get into full stack observability across the entire enterprise landscape. Once that happens, conversations about observability will stop getting tossed from the data team to the engineering team to the compliance team and instead start and end with the CIO and CTO. It will elevate observability to a higher level problem with more urgency to resolve.
If you look at the Gartner Magic Quadrant for Data Quality Solutions, we’re already seeing hints of what’s to come. Gartner specifically calls out that data quality solutions will transition from standalone applications into a set of integrated platforms, noting that features like data catalog, data observability, and data preparation should be part of an integrated platform. This call-out signals that the industry is heading toward a world where data governance and observability live within a single platform.
Would you recommend any books about data observability?
Go to IBM.com and download the Gartner Magic Quadrant for Data Quality Solutions. It’s hot off the press and gives a good overview of what we spoke about here regarding observability, including taking the first steps to embed data observability in conversations about data quality and reliability beyond just the governance or software testing level.
Looking for more on how data observability supports better governance? Click here to watch the full episode of The Ravit Show featuring Databand.