How to ensure data quality, value, and reliability
The quality of data downstream relies directly on data quality in the first mile. As early as ingestion, accurate and reliable data will ensure that the data used downstream for analytics, visualization, and data science will be of high value.
For a business, this makes all the difference between benefiting from the data and having it play second fiddle when making decisions. In this blog post, we describe the importance of data quality, how to audit and monitor your data, and how to get your leadership, colleagues, and board – on board.
- Proactive Data Observability
- Auditing Data for Quality
- Data Quality or Data Value?
- How to Approach the C-level and the Board
- How to Train Internally
- The Curse of the “Other”
- Best Practices for Getting Started: Ensuring Data Quality Across the Enterprise
Proactive Data Observability
Managing data is like running a marathon. Many factors determine the end result, and it is a long process. However, suppose a runner trips and hurts her ankle at that first mile. In that case, she will not successfully complete the marathon. Similarly, if data isn’t monitored as early as ingestion, the rest of the pipeline will be negatively impacted.
How can we ensure data governance during this first mile of the data journey?
Data enters the pipeline from various sources: external APIs, data drops from outside providers, pulling from a database, etc. Monitoring data at the ingestion points ensures data engineers can gain proactive observability of the data coming in.
This enables them to wrangle and fix data to ensure the process is healthy and reliable from the get-go.
By gaining proactive observability of data pipelines, data engineers can:
- Trust the data
- Easily identify breaking points
- Quickly fix issues before they arrive at the warehouse or dashboard
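To make the idea of checking data at the ingestion point concrete, here is a minimal sketch in plain Python. The field names, the `EXPECTED_FIELDS` schema, and the `check_batch` helper are all hypothetical and chosen only for illustration; a real pipeline would likely rely on its own schema definitions and tooling.

```python
from dataclasses import dataclass, field

# Hypothetical schema for illustration: field name -> expected Python type.
EXPECTED_FIELDS = {"customer_id": str, "country": str, "amount": float}


@dataclass
class IngestionReport:
    total: int = 0
    rejected: list = field(default_factory=list)  # (row index, reason) pairs

    @property
    def rejected_ratio(self) -> float:
        # Share of records in the batch that failed a check.
        return len(self.rejected) / self.total if self.total else 0.0


def check_batch(records, expected=EXPECTED_FIELDS):
    """Validate a list of dict records as they enter the pipeline."""
    report = IngestionReport(total=len(records))
    for i, rec in enumerate(records):
        for name, typ in expected.items():
            value = rec.get(name)
            if value is None:
                report.rejected.append((i, f"missing {name}"))
                break  # record one reason per record, then move on
            if not isinstance(value, typ):
                report.rejected.append((i, f"bad type for {name}"))
                break
    return report
```

A team could alert when `rejected_ratio` crosses a threshold it chooses, so problems surface before the batch reaches the warehouse or a dashboard.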
Auditing Data for Quality
Data engineers who want to review their pipeline or audit and monitor an external data source can use the following questions during their evaluation:
- What’s the coverage scope?
- How is the data being tracked?
- Is there a master data reference that includes requirements and metadata?
- Is the customer defined in the right way?
- Is there a common hierarchy?
- Do the taxonomies leverage the business requirements?
- Are geographies correctly set?
- Are there any duplicates?
- Was the data searched before creating new entities?
- Is the data structured to enable seamless integrations and interoperability?
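Two of the questions above, duplicates and geographies, lend themselves to simple automated checks. The sketch below is one possible approach in plain Python. The `VALID_COUNTRIES` set, the `name` and `country` keys, and both helper functions are assumptions made for this example; in practice the valid values would come from the master data reference.

```python
from collections import Counter

# Hypothetical reference list; a real audit would load this from master data.
VALID_COUNTRIES = {"US", "DE", "FR"}


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so near-identical names compare equal."""
    return " ".join(name.lower().split())


def find_duplicates(records, key="name"):
    """Return normalized names that appear more than once."""
    counts = Counter(normalize(r[key]) for r in records)
    return sorted(n for n, c in counts.items() if c > 1)


def invalid_geographies(records, key="country"):
    """Return records whose country is missing or not in the reference list."""
    return [r for r in records if r.get(key) not in VALID_COUNTRIES]
```

Normalizing before counting is a deliberate choice here: "Acme Corp" and "acme  corp" are usually the same entity entered twice, which is exactly the case the "was the data searched before creating new entities?" question is meant to catch.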
Now that we’ve covered how data engineers can approach data quality, let’s see how to get buy-in from additional stakeholders in the enterprise.
Data Quality or Data Value?
Data engineers often talk about the quality of data. However, by changing the conversation to the value of the data, additional stakeholders in the organization could be encouraged to take a more significant part in the data process. This is important for getting attention, resources, and ongoing assistance.
To do so, we recommend talking about how the data aligns with business objectives. Otherwise, external stakeholders might think the conversation revolves only around cleaning up data.
4 Criteria for Determining Data Value – for Engineers and the Business:
- Relevancy – Does the data meet the business objective?
- Coverage – Does the data cover the entire market, enabling the enterprise to put it into play?
- Structure – Is the data structured so the enterprise can use it?
- Accuracy – Is the data complete and correct?
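Relevancy and structure are judgment calls, but coverage and completeness can be roughly quantified, which helps when presenting data value to the business. The following is a sketch only. The `segment` key, the list of expected segments, and the two helper functions are hypothetical, and the definitions of "coverage" and "completeness" used here are simplifying assumptions rather than a standard.

```python
def coverage(records, expected_segments, key="segment"):
    """Share of expected segments that appear at least once in the data."""
    expected = set(expected_segments)
    seen = {r.get(key) for r in records}
    return len(expected & seen) / len(expected) if expected else 0.0


def completeness(records, required_fields):
    """Share of required field values that are present and non-empty."""
    total = len(records) * len(required_fields)
    if total == 0:
        return 0.0
    filled = sum(
        1
        for r in records
        for f in required_fields
        if r.get(f) not in (None, "")
    )
    return filled / total
```

Note that completeness says nothing about correctness: a field can be filled in and still wrong, so a score like this covers only part of the accuracy criterion.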
How to Approach the C-level and the Board
By shifting the conversation to the value of the data rather than its quality, the C-level and the board can be encouraged to invest more resources into the data pipeline. Here’s how to approach them:
- Begin with the reasons why managing data is of strategic importance to your enterprise. Show how data can help execute strategic intentions.
- Explain how managing and analyzing data can help the company get to where it needs to go. Show how data can grow, improve, and protect the business. You can weave in the four criteria from before to emphasize your points.
- Connect the data to specific departments. Show how data can help improve operational efficiency, grow sales and mitigate risk. No other department can claim to help grow, improve and protect all departments to the same extent that data engineering can.
- Do not focus on the process and the technology – otherwise, you will have a very small audience.
How to Train Internally
In addition to the company’s leadership, it’s also important to get people across the company on board. This will help with data analysis and monitoring. Data engineers often need the company’s employees to participate in the ongoing effort of maintaining data. For example, salespeople are required to fill out multiple fields in a CRM when adding a new opportunity.
We recommend investing time in people management, i.e., training and ensuring everyone is on the same page regarding the importance of data quality. For example, explain that when people update data consistently and comprehensively, a discrepancy is more likely to point to a real business anomaly than to a data anomaly caused by missing or inconsistent entries.
The Curse of the “Other”
Data value auditing is crucial because it directly impacts the ability to make decisions based on the data. If you need an example to convince employees to participate in data management, remind them of “the curse of the ‘other’.”
When business units like marketing, product, and sales monitor dashboards, and a big slice is titled “other”, they do not have all the data they need and their decision-making is impaired. This is the result of a lack of data management and data governance.
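One way to watch for this is to measure how much of a dimension falls outside the known categories. The sketch below assumes a flat list of category values and a hypothetical `other_share` helper. Treating missing or unrecognized values as "other" is an assumption about how a dashboard would bucket them.

```python
from collections import Counter


def other_share(values, known_categories, other_label="other"):
    """Fraction of values that would land in the 'other' slice of a chart."""
    known = set(known_categories)
    # Anything missing, empty, or unrecognized is bucketed as "other".
    counts = Counter(v if v in known else other_label for v in values)
    total = sum(counts.values())
    return counts[other_label] / total if total else 0.0
```

For example, given lead sources `["web", "email", None, "", "referral", "web"]` and known categories `{"web", "email"}`, half of the values fall into "other". Tracking this number over time shows whether data entry and governance are improving.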
Best Practices for Getting Started: Ensuring Data Quality Across the Enterprise
How can data engineers turn data quality from an abstract theory into practice? Let’s tie up everything we’ve covered into an actionable plan.
Step 1 – Audit the Data Situation
First, assess which domains should be covered and how well they are being managed. This includes data types like:
- Relationship data: with customers, vendors, partners, prospects, citizens, patients, and clients
- Brand data: products, services, offerings, banners, etc.
Identify the mistakes at the different pipeline stages, starting from ingestion.
Step 2 – Showcase the Data Pipeline
Present the data situation to the various stakeholders. Show how the data is managed from the entry point to the end product. Then, explain how the current data value is impacting their decisions. Present the error points and suggest ways to fix them.
Step 3 – Prioritize Issues to Fix
Build a prioritized plan for driving change. Determine which issues to fix first. Include identifying sources and how they send data, internal data management, and training employees. Get buy-in to the plan, and proceed to execute it.
Ensuring data quality is the responsibility of data engineers and the entire organization. Monitoring data quality starts at the source. However, by getting buy-in from employees and management, data engineers can ensure they will get the resources and attention needed to monitor and fix data issues throughout the pipeline, and help the business grow.