Open and extensible DataOps management

A core part of our DataOps platform, Databand’s open source library enables you to track data quality information, monitor pipeline health, and automate advanced DataOps processes. We keep the library open source so that users control how their data is tracked and can build custom extensions for any requirement.

Data quality monitoring

Run health checks on your data lake and database tables in systems such as S3, Snowflake, and Redshift. The checks are built with Databand’s open source library, which makes it easy to report data quality and performance metrics to Databand’s monitoring system or to your local logging system.

Automatic data tracking

Get out-of-the-box metrics for tracking data freshness, accuracy, and completeness.
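As a rough illustration of what these three metrics measure, here is a minimal pure-Python sketch over a list of row dictionaries. The function names (`completeness`, `freshness`, `accuracy`), the row format, and the sample data are assumptions for this example, not part of Databand’s library.

```python
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]


def completeness(rows: List[Row], column: str) -> float:
    """Fraction of rows in which `column` is present and not None."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(column) is not None)
    return filled / len(rows)


def freshness(rows: List[Row], ts_column: str, now: Optional[datetime] = None):
    """Age of the newest record as a timedelta. Raises ValueError on empty input."""
    now = now or datetime.now(timezone.utc)
    latest = max(row[ts_column] for row in rows)
    return now - latest


def accuracy(rows: List[Row], column: str, is_valid: Callable[[Any], bool]) -> float:
    """Fraction of non-null values in `column` that pass a validity rule."""
    values = [row[column] for row in rows if row.get(column) is not None]
    if not values:
        return 0.0
    return sum(1 for value in values if is_valid(value)) / len(values)


# Usage with a tiny in-memory table (hypothetical data).
rows = [
    {"amount": 10.0, "updated_at": datetime(2020, 6, 1, 12, tzinfo=timezone.utc)},
    {"amount": None, "updated_at": datetime(2020, 6, 1, 13, tzinfo=timezone.utc)},
    {"amount": -5.0, "updated_at": datetime(2020, 6, 1, 14, tzinfo=timezone.utc)},
]
print(completeness(rows, "amount"))  # 2 of 3 rows have a value
print(freshness(rows, "updated_at", now=datetime(2020, 6, 1, 15, tzinfo=timezone.utc)))  # 1:00:00
print(accuracy(rows, "amount", lambda v: v >= 0))  # 0.5
```

Here "accuracy" is approximated as the share of values passing a rule you supply (non-negative amounts), since what counts as accurate depends on the dataset.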

Fully configurable

Customize data trackers around your most important data quality checks.
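One way to picture configurable trackers is to declare checks as data and evaluate them against computed metrics. The sketch below assumes a hypothetical `Check` structure and `run_checks` helper with made-up metric names and thresholds; it does not show Databand’s own configuration format.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Supported comparisons between a metric value and a check threshold.
OPERATORS: Dict[str, Callable[[float, float], bool]] = {
    ">=": lambda value, limit: value >= limit,
    "<=": lambda value, limit: value <= limit,
}


@dataclass
class Check:
    """A user-defined data quality rule (hypothetical structure)."""
    name: str
    metric: str
    threshold: float
    comparison: str = ">="


def run_checks(metrics: Dict[str, float], checks: List[Check]) -> Dict[str, bool]:
    """Evaluate every check; a check whose metric is missing counts as failed."""
    results: Dict[str, bool] = {}
    for check in checks:
        value = metrics.get(check.metric)
        results[check.name] = (
            value is not None and OPERATORS[check.comparison](value, check.threshold)
        )
    return results


# Each team picks the checks that matter and tunes thresholds per table.
checks = [
    Check("orders_complete", metric="completeness", threshold=0.95),
    Check("orders_fresh", metric="freshness_hours", threshold=2.0, comparison="<="),
]
print(run_checks({"completeness": 0.97, "freshness_hours": 3.5}, checks))
```

Keeping checks as plain data means thresholds can change without touching the code that computes the metrics.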

Easy setup

Instantly set up data health tracking in Airflow DAGs, Spark jobs, and your data warehouse.

Pipeline logging and metrics tracking

Integrate Databand into pipelines to report metrics about your data quality and job performance.

Data health indicators

Automatically generate data profiles and statistics for data files and tables.
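To make "data profiling" concrete, the sketch below computes a few common per-column statistics (row count, nulls, distinct values, and min/max/mean for numeric columns) using only the standard library. The `profile_column` and `profile_table` helpers and the sample rows are hypothetical; the statistics Databand actually collects may differ.

```python
import statistics
from typing import Any, Dict, List


def _is_number(value: Any) -> bool:
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def profile_column(values: List[Any]) -> Dict[str, Any]:
    """Row count, null count, distinct count, plus min/max/mean for numeric columns."""
    non_null = [v for v in values if v is not None]
    profile: Dict[str, Any] = {
        "count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct": len(set(non_null)),  # assumes hashable values
    }
    if non_null and all(_is_number(v) for v in non_null):
        profile["min"] = min(non_null)
        profile["max"] = max(non_null)
        profile["mean"] = statistics.mean(non_null)
    return profile


def profile_table(rows: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Profile every column that appears in any row; absent keys count as nulls."""
    columns = sorted({key for row in rows for key in row})
    return {col: profile_column([row.get(col) for row in rows]) for col in columns}


rows = [
    {"city": "Oslo", "temp": 3},
    {"city": "Lima", "temp": None},
    {"city": "Oslo", "temp": 5},
]
for column, stats in profile_table(rows).items():
    print(column, stats)
```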

Custom metrics

Define and report any custom metric about your data every time your pipeline runs.
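Databand’s library provides its own metric-logging call (`log_metric` in dbnd) for sending values to Databand. The sketch below illustrates the same idea against a local logging system instead: a hypothetical `RunMetrics` class tags every metric with the run that produced it and writes it through Python’s standard `logging` module. The run ID and metric names are made up.

```python
import json
import logging
import time
from typing import Any, Dict, List

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("pipeline.metrics")


class RunMetrics:
    """Hypothetical collector that tags each metric with the run it came from."""

    def __init__(self, run_id: str) -> None:
        self.run_id = run_id
        self.records: List[Dict[str, Any]] = []

    def report(self, key: str, value: Any) -> None:
        record = {"run_id": self.run_id, "key": key, "value": value, "ts": time.time()}
        self.records.append(record)
        # Emit one JSON line per metric; default=str covers non-JSON values.
        logger.info(json.dumps(record, default=str))


metrics = RunMetrics(run_id="prepare_salad-2020-06-01")
metrics.report("rows_processed", 1250)
metrics.report("null_ratio", 0.02)
```

Because each record carries the run ID and a timestamp, the same metric can be compared across successive pipeline runs.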

Data input/output

Track workflow inputs and outputs, and the lineage of data across tasks and broader pipelines.
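Lineage falls out of recording which datasets each task reads and writes. The sketch below uses a hypothetical `LineageTracker` class and reuses the task names from the salad example on this page; the dataset file names are placeholders. `upstream` walks producer links backwards to find every dataset a given output was derived from.

```python
from collections import defaultdict
from typing import Dict, List, Set


class LineageTracker:
    """Hypothetical recorder of which task read and wrote which datasets."""

    def __init__(self) -> None:
        self.produced_by: Dict[str, str] = {}  # dataset -> task that wrote it
        self.inputs_of: Dict[str, List[str]] = defaultdict(list)  # task -> datasets it read

    def record(self, task: str, inputs: List[str], outputs: List[str]) -> None:
        self.inputs_of[task].extend(inputs)
        for dataset in outputs:
            self.produced_by[dataset] = task

    def upstream(self, dataset: str) -> Set[str]:
        """All datasets that `dataset` was derived from, transitively."""
        seen: Set[str] = set()
        stack = [dataset]
        while stack:
            current = stack.pop()
            task = self.produced_by.get(current)
            if task is None:
                continue  # a source dataset with no recorded producer
            for parent in self.inputs_of[task]:
                if parent not in seen:  # also guards against cycles
                    seen.add(parent)
                    stack.append(parent)
        return seen


tracker = LineageTracker()
tracker.record("buy_vegetables", inputs=["vegetables_list.csv"], outputs=["purchased.parquet"])
tracker.record("cut", inputs=["purchased.parquet"], outputs=["chopped.parquet"])
tracker.record("add_dressing", inputs=["chopped.parquet", "dressing.json"], outputs=["salad.parquet"])
print(sorted(tracker.upstream("salad.parquet")))
```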

Automation tools

Create advanced automation for data pipeline management and MLOps, including pipeline testing, model deployment, and retraining.

Central config and input management

Abstract the configuration of compute environments and data locations out of your pipeline code, so it’s easier to test, deploy, and iterate.
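A minimal sketch of the pattern, assuming hypothetical environment profiles: pipeline code asks for a profile by name and reads locations from it, so moving from a laptop to production changes only which profile is loaded. The `CONFIGS` table, the `PIPELINE_ENV` variable, the paths, and the bucket name are all placeholders, not Databand settings.

```python
import os
from typing import Dict, Optional

# Hypothetical environment profiles; bucket and paths are placeholders.
CONFIGS: Dict[str, Dict[str, str]] = {
    "dev": {"engine": "local", "vegetables_path": "/tmp/data/vegetables.csv"},
    "prod": {"engine": "spark", "vegetables_path": "s3://example-bucket/vegetables.csv"},
}


def load_config(env: Optional[str] = None) -> Dict[str, str]:
    """Return the profile for `env`, else $PIPELINE_ENV, else "dev"."""
    env = env or os.environ.get("PIPELINE_ENV", "dev")
    if env not in CONFIGS:
        raise KeyError(f"unknown environment: {env}")
    return dict(CONFIGS[env])  # copy so callers cannot mutate the shared profile


def describe_run(config: Dict[str, str]) -> str:
    # Pipeline code reads locations from config instead of hard-coding them.
    return f"reading {config['vegetables_path']} on {config['engine']}"


print(describe_run(load_config("dev")))
print(describe_run(load_config("prod")))
```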

Dynamic runs

Run different versions of pipelines based on changing data inputs, parameters, or model scores.
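As an illustration of choosing a pipeline variant at run time, the sketch below retrains when a model’s score drops below a threshold or when enough new rows have arrived, and otherwise only scores new data. The variant functions, thresholds, and parameter names are hypothetical placeholders for real pipelines.

```python
from typing import Any, Callable, Dict


def train_pipeline(params: Dict[str, Any]) -> str:
    # Placeholder for a real training pipeline.
    return f"trained with learning_rate={params['learning_rate']}"


def score_only_pipeline(params: Dict[str, Any]) -> str:
    # Placeholder for a cheaper pipeline that only scores new data.
    return "scored with existing model"


VARIANTS: Dict[str, Callable[[Dict[str, Any]], str]] = {
    "retrain": train_pipeline,
    "score_only": score_only_pipeline,
}


def choose_variant(model_score: float, new_rows: int,
                   min_score: float = 0.8, retrain_rows: int = 10_000) -> str:
    """Retrain if the model has degraded or enough new data has arrived."""
    if model_score < min_score or new_rows >= retrain_rows:
        return "retrain"
    return "score_only"


def dynamic_run(model_score: float, new_rows: int, params: Dict[str, Any]) -> str:
    variant = choose_variant(model_score, new_rows)
    return VARIANTS[variant](params)


print(dynamic_run(0.72, 500, {"learning_rate": 0.01}))
print(dynamic_run(0.91, 500, {"learning_rate": 0.01}))
```

Separating the routing rule (`choose_variant`) from the pipelines themselves keeps the decision easy to test on its own.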

Easier scaling

Easily execute pipelines across large Spark or Kubernetes clusters.

# Airflow version: the dbnd pipeline is instantiated inside an Airflow DAG.
from airflow import DAG
from dbnd import pipeline, task


@task
def buy_vegetables(veg_list):
    # `store` is a placeholder module from this example, not part of dbnd.
    from store import veg_store
    return veg_store.purchase(veg_list)


@task
def cut(vegetables):
    chopped = []
    for veg in vegetables:
        chopped.append(veg.dice())
    return [x + "\n" for x in chopped]


@task
def add_dressing(chopped_vegetables, dressing, salt_amount="low"):
    for veg in chopped_vegetables:
        veg.season(salt_amount)
    return chopped_vegetables


# `data_repo` is assumed to be defined elsewhere in your project.
@pipeline
def prepare_salad(vegetables_list=data_repo.vegetables, dressing="oil"):
    # Each task consumes the previous task's output.
    vegetables = buy_vegetables(vegetables_list)
    chopped = cut(vegetables)
    dressed = add_dressing(chopped, dressing)
    return dressed


with DAG(dag_id="prepare_salad") as dag:
    salad = prepare_salad()

# CLI: airflow backfill -s 2020-06-01 -e 2020-06-02 prepare_salad
# Standalone version: the same pipeline, run with the dbnd CLI.
from dbnd import pipeline, task


@task
def buy_vegetables(veg_list):
    # `store` is a placeholder module from this example, not part of dbnd.
    from store import veg_store
    return veg_store.purchase(veg_list)


@task
def cut(vegetables):
    chopped = []
    for veg in vegetables:
        chopped.append(veg.dice())
    return [x + "\n" for x in chopped]


@task
def add_dressing(chopped_vegetables, dressing, salt_amount="low"):
    for veg in chopped_vegetables:
        veg.season(salt_amount)
    return chopped_vegetables


# `data_repo` is assumed to be defined elsewhere in your project.
@pipeline
def prepare_salad(vegetables_list=data_repo.vegetables, dressing="oil"):
    # Each task consumes the previous task's output.
    vegetables = buy_vegetables(vegetables_list)
    chopped = cut(vegetables)
    dressed = add_dressing(chopped, dressing)
    return dressed

# CLI: dbnd run prepare_salad


Contributions to the community

We’ve benefited greatly from the work of other developers, and we want to share the love. These are some recent contributions we’ve made to the community.

GitHub

Find and fix data health issues fast

See how Databand can transform data observability at your organization today.