A library to standardize your data operations

Databand’s open source library helps teams establish their DataOps lifecycle. Track pipeline metadata, debug workflows, and add dynamic automation to data processes.

New features

We built custom operators that provide out-of-the-box functionality for pipeline tracking and optimization. Click the GitHub link to learn more about each operator and how to integrate it into your pipelines.

Apache Spark Run Operator

Track Redshift query metadata, performance metrics, and data profiles.

Snowflake Metadata Operator

Track Snowflake query metadata and performance metrics.

DataFrame Profiling

Report data statistics and distributions on Pandas and Spark dataframes.
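As a minimal stand-in sketch of the statistics such a profile reports (this uses plain Python, not dbnd's own reporting API):

```python
from statistics import mean, stdev

def profile_column(name, values):
    """Compute the summary statistics a data profile typically reports
    for one column: count, mean, spread, and range."""
    return {
        "column": name,
        "count": len(values),
        "mean": mean(values),
        "stdev": stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }

# A toy column standing in for one field of a Pandas or Spark dataframe.
profile = profile_column("age", [25, 32, 47])
```

In practice the library computes these statistics per column across a whole dataframe and reports them with each pipeline run.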

Pipeline logging and tracking

Store metadata so you always stay on top of your pipeline and data health.

Custom Metrics

Define and report any custom metric about your data every time your pipeline runs.
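The pattern looks roughly like this; `log_metric` here is a hypothetical stand-in for the library's metric logger, and the order-validation task is an invented example:

```python
# Hypothetical in-memory metric store standing in for the tracking backend.
metrics = {}

def log_metric(key, value):
    """Record a named metric for the current pipeline run (stand-in)."""
    metrics[key] = value

def validate_orders(orders):
    """An example pipeline step that reports custom metrics as it runs."""
    valid = [o for o in orders if o["amount"] > 0]
    log_metric("orders_total", len(orders))
    log_metric("orders_valid", len(valid))
    return valid

valid = validate_orders([{"amount": 5}, {"amount": -1}, {"amount": 3}])
```

Because the metrics are reported on every run, you can watch them trend over time and catch data issues early.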

Data Profiles

Automatically generate data profiling and statistics on data files and tables.

Input / Output

Track workflow inputs and outputs and lineage of data across tasks and broader pipelines.

# Import path for SnowflakeOperator varies by Airflow version
from airflow.contrib.operators.snowflake_operator import SnowflakeOperator
from dbnd_snowflake.airflow_operators import LogSnowflakeResourceOperator

select_query = 'select * from "SNOWFLAKE_SAMPLE_DATA"."TPCDS_SF100TCL"."CUSTOMER" limit 1000'

# Airflow snowflake_operator
get_customers_task = SnowflakeOperator(
    sql=select_query,
    snowflake_conn_id="test_snowflake_conn",
    task_id="get_customers",
)

# Databand operator for Snowflake resource tracking
# (database, schema, and account are assumed to be defined elsewhere)
log_snowflake_resources_task = LogSnowflakeResourceOperator(
    query_text=select_query,
    snowflake_conn_id="airflow_snowflake_conn",
    warehouse=None,
    database=database,
    schema=schema,
    account=account,
    task_id="log_snowflake_resources_task",
)
from dbnd_snowflake import log_snowflake_resource_usage

log_snowflake_resource_usage(
    query_text,
    database="DATABASE",
    user="user",
    connection_string="snowflake://<user>:<password>@<account>/",
    session_id=123456,
)

Integration points

Databand Open Source offers three ways to integrate with your data workflows.

Connectors

Plugins to connect with services like Apache Airflow, Azkaban, Deequ, MLFlow, and more.

Operators

Off-the-shelf tasks and templates for tracking metadata and enhanced code deployment.

Instrumentation

Logging methods and code annotations for reporting metrics and building dynamic workflows.

Optimized run orchestration

Leverage Databand operators and instrumentation to make building, running, and deploying pipelines easier and more dynamic. You can use the library directly to build Python workflows from scratch, or as an extension of Apache Airflow.

Dynamic Runs

Define different versions of pipelines to run based on changing data or parameters.

Central Configs

Abstract out configurations to compute environments and data locations.

Decorator Definition

Easily define flows by annotating your code, without making big changes to your workflow.

from airflow import DAG
from dbnd import pipeline, task

# data_repo is assumed to be defined elsewhere in the project

@task
def buy_vegetables(veg_list):
    from store import veg_store

    return veg_store.purchase(veg_list)

@task
def cut(vegetables):
    chopped = []
    for veg in vegetables:
        chopped.append(veg.dice())
    return [x + "\n" for x in chopped]

@task
def add_dressing(chopped_vegetables, dressing, salt_amount="low"):
    for veg in chopped_vegetables:
        veg.season(salt_amount)
    return chopped_vegetables

@pipeline
def prepare_salad(vegetables_list=data_repo.vegetables, dressing="oil"):
    vegetables = buy_vegetables(vegetables_list)
    chopped = cut(vegetables)
    dressed = add_dressing(chopped, dressing)
    return dressed

with DAG(dag_id="prepare_salad") as dag:
    salad = prepare_salad()

"""
CLI:
airflow backfill -s 2020-06-01 -e 2020-06-02 prepare_salad
"""
from dbnd import pipeline, task

# data_repo is assumed to be defined elsewhere in the project

@task
def buy_vegetables(veg_list):
    from store import veg_store

    return veg_store.purchase(veg_list)

@task
def cut(vegetables):
    chopped = []
    for veg in vegetables:
        chopped.append(veg.dice())
    return [x + "\n" for x in chopped]

@task
def add_dressing(chopped_vegetables, dressing, salt_amount="low"):
    for veg in chopped_vegetables:
        veg.season(salt_amount)
    return chopped_vegetables

@pipeline
def prepare_salad(vegetables_list=data_repo.vegetables, dressing="oil"):
    vegetables = buy_vegetables(vegetables_list)
    chopped = cut(vegetables)
    dressed = add_dressing(chopped, dressing)
    return dressed

"""
CLI:
dbnd run prepare_salad
"""

Start a free trial or demo

Contact us for a free trial or to see a demo of the solution in action.