Shipper Cuts Data Incident Detection From Days To Minutes With Databand
Automated detection of schema changes, missing data sources, and critical failures.
Reduction of mean time to detection (MTTD) from days to minutes.
Visibility into data model changes from third-party APIs.
Shipper is one of the fastest-growing tech companies in Indonesia, working to digitize Indonesian logistics and enable cost-efficiencies at scale nationwide. Since its 2017 founding, Shipper has built a vast network of fulfillment centers and partnered with hundreds of local delivery companies across the country in pursuit of this goal.
In the highly competitive logistics industry, data uptime and data quality mean everything. Accurately calculating costs, delivery schedules, and inventory can make or break many businesses, especially those like Shipper that offer digital solutions.
Specifically, Shipper’s platform provides customers with a complete dashboard of metrics on shipment logistics so they can see all the pertinent information in a single place to make smarter decisions. As a result, it’s critical that this data is accessible and reliable.
As Shipper grew, the company not only expanded its customer base but also started to provide more data to those customers to increase the value they get from the platform. Of course, more data sources (and more data to track for a growing customer base) mean more complex data pipelines.
This increasing complexity forced Shipper to rebuild its pipeline, shifting from on-prem Airflow and Spark to the latest cloud infrastructure with Amazon and Databricks. Unfortunately, the new data platform left the Shipper team with one major blind spot: data observability. As the business continued to scale, Shipper’s ingestion processes became more and more complex, and catching issues before SLAs were missed became nearly impossible.
Fithrah Fauzan, Data Engineering Lead at Shipper, points to three critical challenges the team experienced:
- Failed data SLAs due to inaccurate or missing data in customer-facing dashboards
- Lack of visibility into data model changes from third-party APIs
- Heavy costs to the business due to weekly failed pipelines
“Due to the complexity of our ingestion process and the lack of observability in that area, we’d only know if there was some kind of issue with our pipelines after we’d missed our SLA. From there, the only thing we could do was to ask the operational manager to fix it and backfill the data, which could take two to three days. When this was happening on a weekly basis, it became extremely costly and difficult to deal with,” Fauzan explains.
Recognizing these challenges, the Shipper team knew they would need to find a solution sooner rather than later to continue growing the business effectively.
Their search for a solution that could help with end-to-end data observability led Shipper straight to Databand. In particular, they found value in Databand’s ability to support:
- Root cause analysis with automatic notification management, logging, and lineage
- Automated detection of schema changes, missing data sources, and critical failures
- Orchestrated remediation workflows for data issue notifications to their DevOps alerting system
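To make the idea of automated schema-change detection concrete, here is a minimal Python sketch. It is not Databand's implementation or API; the function names (`infer_schema`, `diff_schema`) and the shipment fields are hypothetical, chosen only to illustrate how a change in a third-party API payload could be caught by comparing an expected schema against what actually arrives.

```python
def infer_schema(record: dict) -> dict:
    """Map each field name in a record to the name of its Python type."""
    return {field: type(value).__name__ for field, value in record.items()}


def diff_schema(expected: dict, observed: dict) -> dict:
    """Report fields that were added, removed, or changed type."""
    expected_fields = set(expected)
    observed_fields = set(observed)
    return {
        "added": sorted(observed_fields - expected_fields),
        "removed": sorted(expected_fields - observed_fields),
        "retyped": sorted(
            field
            for field in expected_fields & observed_fields
            if expected[field] != observed[field]
        ),
    }


# Hypothetical schema the pipeline expects from a third-party API.
expected_schema = {"shipment_id": "str", "weight_kg": "float", "status": "str"}

# Hypothetical payload after the provider changed its data model.
incoming_record = {"shipment_id": "A1", "weight_kg": "2.5", "eta": "2023-01-01"}

changes = diff_schema(expected_schema, infer_schema(incoming_record))
if any(changes.values()):
    print("Schema change detected:", changes)
```

In this sketch a non-empty diff is what would trigger a notification; a production system would also need to handle nested fields and nullable values.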
According to Fauzan, implementing Databand had an immediate positive impact on the Shipper team’s ability to track pipeline errors, schema changes, and other data quality issues at scale. As a result, the team can identify issues before they miss any SLAs and resolve those issues faster.
Shipper’s customers feel the benefits of this visibility too. Fauzan shares: “Customers are using our dashboard to report shipment metrics for their business. If data pipelines fail and we miss our SLA, the dashboard will not be correct. Having a way to know whether the data will be delivered and in the right form is extremely important to our customers.”
Business Impact and Results
From tracking data more easily to resolving issues faster, the Shipper team reports that implementing Databand has had a significant, positive business impact.
Reduced Mean Time to Detection and Mean Time to Resolution for Greater System Uptime
Previously, the Shipper team could only detect problems in their data pipeline by manually QAing the data delivery or – worse yet – through customer or team complaints. That’s because pipeline failures weren’t a part of their resolution flow.
Now, with Databand, Shipper can set up pipeline alerts on their ingestion process, pipeline statuses, and anomalous run durations, which has reduced the mean time to detection (MTTD) on issues from three days to mere minutes.
This real-time capturing of data quality issues during ingestion has also empowered the team to improve their mean time to resolution (MTTR). Now, they can detect and resolve issues in real time thanks to the Databand alerts, which connect directly to the team’s existing workflows in Opsgenie and Jira.
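As an illustration of what an alert on anomalous run durations might check, the following Python sketch flags a pipeline run whose duration deviates strongly from recent history. It assumes a simple z-score rule with a threshold of 3 standard deviations; the function name, the threshold, and the sample durations are all hypothetical and do not describe how Databand computes its alerts.

```python
from statistics import mean, stdev


def is_anomalous_duration(history, current, threshold=3.0):
    """Return True if `current` is more than `threshold` standard
    deviations away from the mean of past run durations (in seconds)."""
    if len(history) < 2:
        # Not enough history to estimate spread; do not alert.
        return False
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # All past runs took exactly the same time; any change is unusual.
        return current != mu
    return abs(current - mu) / sigma > threshold


# Hypothetical durations of recent successful runs, in seconds.
recent_runs = [300, 310, 295, 305, 290]

for duration in (308, 900):
    if is_anomalous_duration(recent_runs, duration):
        print(f"ALERT: run took {duration}s, outside the normal range")
    else:
        print(f"OK: run took {duration}s")
```

A rule like this catches both stalled runs and suspiciously short ones, such as a job that finishes quickly because a source delivered no data.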
Once Databand detects an issue, the Shipper team can quickly conduct a root cause analysis. Specifically, the logs within Databand enable the team to diagnose the affected pipeline in minutes, rather than spending hours tracking down pipeline owners, searching through logs, and tracing source lineage.
Altogether, the visibility provided by Databand has dramatically improved system uptime and given the engineering team much-needed peace of mind.
Improved Data SLAs for Happier Customers
The lack of visibility the Shipper team had before Databand meant they couldn’t track progress toward meeting SLAs until after they had already missed those commitments. This meant Fauzan needed to manually track pipeline successes and failures retroactively to understand performance.
Databand has changed this entirely, making it easy for Shipper to measure and guarantee their SLAs in real time. Now, Fauzan can use the Databand dashboard to quickly see how the team is tracking toward their SLAs and visualize how much of an error budget they have left for the rest of the month.
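The error budget mentioned above is simple arithmetic, and a short Python sketch can show how it works. The numbers here are assumptions for illustration only (a 99% on-time target, 720 planned runs in a month, 3 failures so far); Shipper's actual SLA targets are not stated in this case study, and the function name is hypothetical.

```python
from fractions import Fraction
from math import floor


def error_budget(target: str, planned_runs: int, failed_runs: int) -> dict:
    """Compute how many failed runs an SLA target allows in a period
    and how many are still left.

    `target` is passed as a string such as "0.99" so it can be parsed
    exactly, avoiding floating-point rounding.
    """
    slo = Fraction(target)
    allowed = floor((1 - slo) * planned_runs)
    return {
        "allowed_failures": allowed,
        "remaining_failures": allowed - failed_runs,
        "sla_breached": failed_runs > allowed,
    }


# Assumed example: hourly pipeline over a 30-day month, 99% target.
budget = error_budget("0.99", planned_runs=720, failed_runs=3)
print(budget)
```

With these assumed inputs, 1% of 720 runs is 7.2, so 7 failures are allowed and 4 remain for the rest of the month.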
This improved ability to meet data SLAs for both external customers and internal data consumers has measurably improved the user experience, leading to happier customers engaging with Shipper dashboards.
Without Databand, we didn’t know we had problems until two or three days later. Databand helps us detect data quality issues faster so we can meet our data SLAs.