Apache DBND

A framework for data pipeline operations.

DBND is a library, a set of APIs, and a CLI for tracking metadata from your workflows, creating a system of record for runs, debugging tasks, and adding automation to data processes.

DBND is a modular library with a range of DataOps capabilities. It can be used solely for tracking pipeline metadata, for orchestrating new pipelines, or for extending existing orchestrators such as Apache Airflow with additional capabilities.

Features

Make pipelines fully tracked and automated.

Easy Integration

Track workflows through decorators or simple logging APIs with minimal change to your projects.
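As a rough illustration of decorator-based tracking, the sketch below wraps a plain function so that each call records its inputs, output, duration, and any error. It is not DBND's API: the `track` decorator and the in-memory `RUN_LOG` list are hypothetical names used only to show the idea that the function body itself stays unchanged.

```python
import functools
import time

# Hypothetical in-memory store; a real tracker would send records to a backend.
RUN_LOG = []


def track(func):
    """Record inputs, output, duration, and errors for each call of func."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        record = {"task": func.__name__, "args": args, "kwargs": kwargs}
        start = time.perf_counter()
        try:
            record["result"] = func(*args, **kwargs)
            record["status"] = "success"
            return record["result"]
        except Exception as exc:
            record["status"] = "failed"
            record["error"] = repr(exc)
            raise  # tracking must not hide the original failure
        finally:
            # Runs on both success and failure, so every call is logged.
            record["seconds"] = time.perf_counter() - start
            RUN_LOG.append(record)

    return wrapper


@track
def clean(values):
    # The function body contains no tracking code.
    return [v for v in values if v is not None]


clean([1, None, 3])
```

After the call, `RUN_LOG` holds one record describing the run of `clean`. The only change to the project is the added decorator line.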

Metadata Tracking

Track application logs, errors, function input/output, performance metrics, and system resources.

Data Profiling

Automatically check schema changes, completeness, and custom measurements.
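To make the profiling idea concrete, here is a minimal sketch that computes per-column types and completeness for a batch of rows, then compares two profiles to detect schema changes. The `profile` and `schema_changes` helpers are hypothetical and use only the standard library; they are not part of DBND.

```python
def profile(rows):
    """Return observed types and completeness per column for a list of dict rows."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            col = columns.setdefault(name, {"types": set(), "non_null": 0})
            if value is not None:
                col["types"].add(type(value).__name__)
                col["non_null"] += 1
    total = len(rows)
    return {
        name: {
            "types": sorted(col["types"]),
            # Share of rows with a non-null value in this column.
            "completeness": col["non_null"] / total if total else 0.0,
        }
        for name, col in columns.items()
    }


def schema_changes(old, new):
    """List columns added, removed, or retyped between two profiles."""
    changes = []
    for name in sorted(set(old) | set(new)):
        if name not in old:
            changes.append(("added", name))
        elif name not in new:
            changes.append(("removed", name))
        elif old[name]["types"] != new[name]["types"]:
            changes.append(("retyped", name))
    return changes


yesterday = profile([{"id": 1, "price": 9.5}, {"id": 2, "price": None}])
today = profile([{"id": "1", "price": 9.5, "currency": "USD"}])
changes = schema_changes(yesterday, today)
```

Here `yesterday["price"]["completeness"]` is 0.5 because one of two rows has no price, and `changes` reports the new `currency` column and the `id` column switching from integer to string.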

Task Automation

Wire together workflow functions to create automated and reproducible pipelines.
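One way to picture wiring functions into a pipeline is a small runner that executes tasks in dependency order and feeds each task the results of its upstream tasks. The sketch below assumes Python 3.9+ for the standard-library `graphlib` module; `run_pipeline` and the three example tasks are hypothetical and do not reflect how DBND wires tasks internally.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies):
    """Run tasks in dependency order.

    tasks: maps a task name to a function.
    dependencies: maps a task name to the list of upstream task names,
    whose results are passed to the task as positional arguments.
    """
    results = {}
    for name in TopologicalSorter(dependencies).static_order():
        upstream = [results[dep] for dep in dependencies.get(name, [])]
        results[name] = tasks[name](*upstream)
    return results


def extract():
    return [3, 1, 2]


def transform(rows):
    return sorted(rows)


def load(rows):
    return f"loaded {len(rows)} rows"


results = run_pipeline(
    tasks={"extract": extract, "transform": transform, "load": load},
    dependencies={"extract": [], "transform": ["extract"], "load": ["transform"]},
)
```

Because the execution order is derived from the declared dependencies rather than from call order, the same definition produces the same run each time, which is what makes the pipeline reproducible.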

Portability

Run tasks and pipelines across compute environments including Kubernetes and Spark clusters.

Data Caching

Cache and reuse data from unchanged tasks to cut down on unnecessary compute time.
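The caching idea can be sketched as keying each task's output on a hash of its name and inputs, so a repeated call with the same inputs returns the stored result instead of recomputing it. This is a simplified, hypothetical illustration: `cached_task` and the in-memory `CACHE` are not DBND names, and a fuller version would also fingerprint the task's code and persist outputs to storage.

```python
import functools
import hashlib
import json

# Hypothetical in-memory cache; a real one would persist outputs to storage.
CACHE = {}


def cached_task(func):
    """Reuse a task's earlier output when its name and inputs are unchanged."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Build a stable key from the task name and its arguments.
        payload = json.dumps(
            [func.__qualname__, args, kwargs], sort_keys=True, default=repr
        )
        key = hashlib.sha256(payload.encode()).hexdigest()
        if key not in CACHE:
            CACHE[key] = func(*args, **kwargs)
        return CACHE[key]

    return wrapper


CALLS = []  # records real executions so cache hits are observable


@cached_task
def expensive_sum(values):
    CALLS.append(values)
    return sum(values)


first = expensive_sum([1, 2, 3])
second = expensive_sum([1, 2, 3])  # served from the cache, not recomputed
```

Both calls return the same value, but `CALLS` contains a single entry, showing that the second call skipped the computation.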