For Agile Data Engineering

An agile, open source framework for orchestrating and tracking your data pipelines.

DBND is a Python library, a set of APIs, and a CLI that lets you collect metadata from your workflows, create a system of record for runs, and easily orchestrate complex processes. Gain visibility into your data flows, and benefit from pipeline versioning, reusability, and easier automation.

In closed beta: contact us for access!

Features

Make pipelines fully automated and tracked across your suite of tools.

DBND integrates with the tools you use to run your data stack: Spark, Kubernetes, AWS Batch, and more.
Connect to and extend an existing Airflow deployment, or use DBND to build pipelines from scratch.

Orchestration

Use DBND's orchestration framework to turn any Python workflow into a DAG (directed acyclic graph) with parallelization, portability, and data management built in.
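To make the DAG idea concrete, here is a minimal sketch in plain Python. It is not DBND's API or implementation. The functions `extract`, `clean`, `report`, and `run_dag` are hypothetical. The `DEPENDENCIES` mapping declares the graph's edges, and the standard library's `graphlib.TopologicalSorter` (Python 3.9+) releases tasks in dependency order. Tasks that become ready at the same time run concurrently on a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter

# Hypothetical workflow steps; each receives the results of the steps that finished earlier.
def extract(results):
    return [3, 1, 2]

def clean(results):
    return sorted(results["extract"])

def report(results):
    return sum(results["clean"])

TASKS = {"extract": extract, "clean": clean, "report": report}
# Maps each task to the set of tasks it depends on (the edges of the DAG).
DEPENDENCIES = {"extract": set(), "clean": {"extract"}, "report": {"clean"}}

def run_dag(tasks, dependencies, max_workers=4):
    """Run tasks in dependency order; tasks that are ready together run in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            ready = sorter.get_ready()
            snapshot = dict(results)  # each task in this batch sees a copy of finished results
            futures = {name: pool.submit(tasks[name], snapshot) for name in ready}
            for name, future in futures.items():
                results[name] = future.result()
                sorter.done(name)  # unlocks tasks that depended on this one
    return results

print(run_dag(TASKS, DEPENDENCIES)["report"])  # → 6
```

This sketch gets parallelism one batch at a time, which keeps it simple. A real orchestrator would also handle retries, remote execution, and passing data between machines.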

Environment Control

Focus on your code and let DBND manage execution and storage: move runs from your local machine to AWS Batch, Dataproc, EMR, and other systems with a simple switch.

Data Tracking

Access the lineage of input/output data and all intermediate results for any pipeline run, automatically stored in your cloud or on-prem data lake.

Data Profiling

Track how dataset metadata changes across pipeline runs, including size, structure, schema, value distributions, and other dataset dimensions.
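As an illustration of what comparing profiles between runs can look like, here is a minimal sketch. It assumes a dataset represented as a list of dicts and is not how DBND computes profiles. The names `profile` and `diff_profiles` are hypothetical. A profile records row count, a per-column schema (the set of observed value types), and min/max/mean for numeric columns. The diff then reports size and schema changes between two runs.

```python
from statistics import mean

def profile(rows):
    """Summarize a list-of-dicts dataset: size, schema, and numeric ranges."""
    columns = sorted({key for row in rows for key in row})
    schema, stats = {}, {}
    for col in columns:
        values = [row[col] for row in rows if row.get(col) is not None]
        # Record every observed type so mixed-type columns show up in the schema.
        types = sorted({type(v).__name__ for v in values})
        schema[col] = "|".join(types) if types else "null"
        numeric = [v for v in values if isinstance(v, (int, float)) and not isinstance(v, bool)]
        if numeric:
            stats[col] = {"min": min(numeric), "max": max(numeric), "mean": mean(numeric)}
    return {"rows": len(rows), "schema": schema, "stats": stats}

def diff_profiles(old, new):
    """List human-readable size and schema differences between two profiles."""
    changes = []
    if old["rows"] != new["rows"]:
        changes.append(f"rows: {old['rows']} -> {new['rows']}")
    for col in sorted(set(old["schema"]) | set(new["schema"])):
        before, after = old["schema"].get(col), new["schema"].get(col)
        if before != after:
            changes.append(f"schema[{col}]: {before} -> {after}")
    return changes

run1 = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 20.0}]
run2 = [{"id": 1, "amount": "10"}, {"id": 2, "amount": 20.0}, {"id": 3, "amount": 5.0}]
print(diff_profiles(profile(run1), profile(run2)))
# → ['rows: 2 -> 3', 'schema[amount]: float -> float|str']
```

Here the second run gained a row, and a string sneaked into a numeric column. Both are the kind of drift that profiling across runs is meant to surface.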

Custom Metrics

Track user-defined metrics from your workflows, such as custom business KPIs or ML performance scores like R² (coefficient of determination), MAE (mean absolute error), and RMSE (root mean squared error).
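For reference, the three ML scores named above can be computed in plain Python as sketched below. This only shows what the metrics mean. It does not show how DBND receives them, and a plain dict stands in for the tracking backend. MAE is the mean of |y − ŷ|. RMSE is the square root of the mean squared error. R² is 1 − SS_res / SS_tot.

```python
import math

def _check(y_true, y_pred):
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")

def mae(y_true, y_pred):
    """Mean absolute error: average of |y - y_hat|."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the average squared error."""
    _check(y_true, y_pred)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    _check(y_true, y_pred)
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when y_true is constant")
    return 1 - ss_res / ss_tot

y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 8.0]
# A plain dict stands in for whatever tracking backend would store these per run.
run_metrics = {"mae": mae(y_true, y_pred), "rmse": rmse(y_true, y_pred), "r2": r2(y_true, y_pred)}
print({name: round(value, 3) for name, value in run_metrics.items()})
# → {'mae': 0.667, 'rmse': 0.816, 'r2': 0.75}
```

Because RMSE squares each error before averaging, it penalizes large misses more heavily than MAE does. Tracking both per run makes it easy to spot when a model starts producing occasional large errors.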

Run Versioning

Create snapshots of every pipeline run (code, data, and parameters) for reproducibility and faster debugging.
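A run snapshot can be sketched as content hashes of the three ingredients named above, as shown below. This is an illustrative assumption, not DBND's versioning scheme, and `snapshot_run` is a hypothetical name. The code is passed in as a string here, though in practice it might come from a source file or a git commit. Parameters are serialized with sorted keys so that key order does not change the result. The run version is a hash over the whole snapshot, so changing the code, the data, or any parameter yields a new version.

```python
import hashlib
import json

def _digest(payload: bytes) -> str:
    return hashlib.sha256(payload).hexdigest()

def snapshot_run(code: str, data: bytes, params: dict) -> dict:
    """Capture content hashes of a run's code and data, plus its parameters."""
    snapshot = {
        "code_sha256": _digest(code.encode("utf-8")),
        "data_sha256": _digest(data),
        # Round-trip through JSON to store a plain, serializable copy of the parameters.
        "params": json.loads(json.dumps(params, sort_keys=True)),
    }
    # sort_keys gives a canonical serialization, so equal snapshots hash identically.
    canonical = json.dumps(snapshot, sort_keys=True).encode("utf-8")
    snapshot["version"] = _digest(canonical)[:12]
    return snapshot

a = snapshot_run("def f(x): return x * 2", b"1,2,3", {"lr": 0.1, "epochs": 5})
b = snapshot_run("def f(x): return x * 2", b"1,2,3", {"epochs": 5, "lr": 0.1})
print(a["version"] == b["version"])  # → True
```

Two runs that share a version reproduce each other's inputs exactly. When versions differ, comparing the individual hashes shows whether the code, the data, or the parameters changed.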