
Open and extensible DataOps management

A core part of our DataOps platform, Databand’s open source library enables you to track data quality information, monitor pipeline health, and automate advanced DataOps processes. We keep our library open source so users control how their data is tracked and can build custom extensions for any requirement.

Data quality monitoring

Run health checks on your data lakes and warehouse tables, such as S3, Snowflake, and Redshift. The checks are built on Databand’s open source library, which makes it easy to report data quality and performance metrics to Databand’s monitoring system or your local logging system.

Automatic data tracking

Get out-of-the-box metrics for tracking data freshness, accuracy, and completeness.
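Freshness, for example, comes down to how long ago a table last received new data, checked against an SLA. A minimal sketch in plain Python (the timestamps, function names, and 24-hour SLA are illustrative, not Databand's API):

```python
from datetime import datetime, timedelta, timezone

def freshness_hours(last_updated, now):
    """Hours since the table last received new data."""
    return (now - last_updated).total_seconds() / 3600

def is_stale(last_updated, now, sla_hours=24):
    """Flag a table that has missed its freshness SLA."""
    return freshness_hours(last_updated, now) > sla_hours

# Illustrative timestamps: a table last written 30 hours ago.
now = datetime(2021, 6, 2, 12, 0, tzinfo=timezone.utc)
updated = now - timedelta(hours=30)
print(is_stale(updated, now))  # True: a 30-hour-old table misses a 24-hour SLA
```

A monitor reports this number on every run, so a missed SLA shows up as soon as a pipeline falls behind.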

Fully configurable

Customize data trackers according to your most important data quality checks.

Easy setup

Instantly set up and run data tracking as Python scripts, Airflow DAGs, or cron jobs.

Pipeline logging and metrics tracking

Integrate Databand into pipelines to report metrics about your data quality and job performance.

Data health indicators

Automatically generate data profiling and statistics on data files and tables.
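The kind of profile this produces is per-column statistics such as row counts, null fractions, and distinct counts. A hand-rolled sketch of that idea in plain Python (the `profile` function and sample rows are illustrative, not Databand's API):

```python
def profile(rows):
    """Per-column completeness stats, the kind a data-health monitor reports."""
    columns = set().union(*(row.keys() for row in rows))
    stats = {}
    for col in sorted(columns):
        non_null = [row.get(col) for row in rows if row.get(col) is not None]
        stats[col] = {
            "count": len(rows),
            "null_fraction": 1 - len(non_null) / len(rows),
            "distinct": len(set(non_null)),
        }
    return stats

# Illustrative batch with one missing value in "amount".
rows = [
    {"user": "a", "amount": 10},
    {"user": "b", "amount": None},
    {"user": "a", "amount": 7},
]
print(profile(rows)["amount"])
```

Reported on every run, these statistics make gradual data drift (a creeping null fraction, a collapsing distinct count) visible before it breaks downstream jobs.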

Custom metrics

Define and report any custom metric about your data every time your pipeline runs.
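A custom metric is any number a task computes about its data and reports on each run. A minimal sketch of the pattern (`log_metric` here is a local stand-in for a metric reporter like the one in Databand's library, and `validate_orders` is an invented example task):

```python
def log_metric(key, value):
    # Stand-in reporter: a real run would ship this metric
    # to Databand's monitoring system or your logging system.
    print(f"metric {key}={value}")

def validate_orders(orders):
    """Report data-quality metrics about a batch, then pass clean rows on."""
    null_amounts = sum(1 for o in orders if o.get("amount") is None)
    log_metric("row_count", len(orders))
    log_metric("null_amounts", null_amounts)
    return [o for o in orders if o.get("amount") is not None]
```

Because the metric is just a name and a value, anything you can compute in the task (a null count, a checksum, a model score) can be tracked the same way.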

Data input/output

Track workflow inputs, outputs, and data lineage across tasks and broader pipelines.

Automation tools

Create advanced automation for data pipeline management and MLOps, including pipeline testing, model deployment, and retraining.

Central Config & Input Management

Abstract out configurations for compute environments and data locations so it’s easier to test, deploy, and iterate.
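Abstracting configuration out can be as simple as resolving data locations and compute targets per environment, so pipeline code never hard-codes a path. A hand-rolled sketch of the idea (the environment names, paths, and `PIPELINE_ENV` variable are illustrative, not Databand's config system):

```python
import os

# Illustrative per-environment config: swap data locations and
# compute engines without touching pipeline code.
CONFIGS = {
    "local": {"input_path": "data/sample.csv", "engine": "pandas"},
    "prod": {"input_path": "s3://warehouse/orders/", "engine": "spark"},
}

def get_config(env=None):
    """Resolve config from an explicit env or the PIPELINE_ENV variable."""
    env = env or os.environ.get("PIPELINE_ENV", "local")
    return CONFIGS[env]
```

The same pipeline then runs against a local sample during testing and the production bucket in deployment, with only the environment changing.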

Dynamic Runs

Run different versions of pipelines based on changing data inputs, parameters, or model scores.

Easier Scaling

Execute pipelines easily across large Spark or Kubernetes clusters.

Run with Airflow:

@task
def buy_vegetables(veg_list):
    from store import veg_store
    return veg_store.purchase(veg_list)

@task
def cut(vegetables):
    chopped = []
    for veg in vegetables:
        chopped.append(veg.dice())
    return [x + "\n" for x in chopped]

@task
def add_dressing(chopped_vegetables, dressing, salt_amount="low"):
    for veg in chopped_vegetables:
        veg.season(salt_amount)
    return chopped_vegetables

@pipeline
def prepare_salad(vegetables_list=data_repo.vegetables, dressing="oil"):
    vegetables = buy_vegetables(vegetables_list)
    chopped = cut(vegetables)
    dressed = add_dressing(chopped, dressing)
    return dressed

with DAG(dag_id="prepare_salad") as dag:
    salad = prepare_salad()

""" CLI: airflow backfill -s 2020-06-01 -e 2020-06-02 prepare_salad """
Run with the DBND CLI:

@task
def buy_vegetables(veg_list):
    from store import veg_store
    return veg_store.purchase(veg_list)

@task
def cut(vegetables):
    chopped = []
    for veg in vegetables:
        chopped.append(veg.dice())
    return [x + "\n" for x in chopped]

@task
def add_dressing(chopped_vegetables, dressing, salt_amount="low"):
    for veg in chopped_vegetables:
        veg.season(salt_amount)
    return chopped_vegetables

@pipeline
def prepare_salad(vegetables_list=data_repo.vegetables, dressing="oil"):
    vegetables = buy_vegetables(vegetables_list)
    chopped = cut(vegetables)
    dressed = add_dressing(chopped, dressing)
    return dressed

""" CLI: dbnd run prepare_salad """


Contributions to the community

We’ve benefited greatly from the work of other developers and we want to share the love. These are some recent contributions we’ve made to the community.


Find and fix data health issues fast

Get started for free with a trial, or request a product demo.