A library to standardize your data operations
Databand’s open source library helps teams establish their DataOps lifecycle. Track pipeline metadata, debug workflows, and add dynamic automation to data processes.

New features
We built custom operators that provide out-of-the-box functionality for pipeline tracking and optimization. Follow the GitHub link to learn more about each operator and how to integrate it into your pipelines.
Track Redshift query metadata, performance metrics, and data profiles.
Track Snowflake query metadata and performance metrics.
Report data statistics and distributions on Pandas and Spark dataframes (see the sketch below).
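As a minimal sketch of dataframe reporting, assuming dbnd's log_dataframe helper; the customers.csv input is hypothetical, and the with_histograms option name is our assumption and may differ in your version:
import pandas as pd
from dbnd import log_dataframe

# Load any dataframe your pipeline produces (file name is hypothetical)
customers = pd.read_csv("customers.csv")

# Report the dataframe's shape, schema, and column-level statistics to Databand;
# with_histograms additionally captures value distributions
log_dataframe("customers", customers, with_histograms=True)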
Pipeline logging and tracking
Store metadata so you always stay on top of your pipeline and data health.
Define and report any custom metric about your data every time your pipeline runs (see the sketch below).
Automatically generate data profiling and statistics on data files and tables.
Track workflow inputs, outputs, and data lineage across tasks and broader pipelines.
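For the custom-metric item above, here is a minimal sketch assuming dbnd's task decorator and log_metric call; the validate_orders function and its quality checks are hypothetical:
from dbnd import task, log_metric

@task
def validate_orders(orders):
    # Report any custom metric on every run of the pipeline
    log_metric("row_count", len(orders))
    log_metric("null_customer_ids", int(orders["customer_id"].isna().sum()))
    return orders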
from dbnd_snowflake.airflow_operators import LogSnowflakeResourceOperator, log_snowflake_resource_operator
# Airflow's Snowflake operator (the import path depends on your Airflow version)
from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator

select_query = 'select * from "SNOWFLAKE_SAMPLE_DATA"."TPCDS_SF100TCL"."CUSTOMER" limit 1000'

# Airflow snowflake_operator
get_customers_task = SnowflakeOperator(
    sql=select_query,
    snowflake_conn_id="test_snowflake_conn",
    task_id="get_customers",
)

# Databand operator for Snowflake resource tracking
# (database, schema, and account hold your Snowflake connection details, defined elsewhere)
log_snowflake_resources_task = LogSnowflakeResourceOperator(
    query_text=select_query,
    snowflake_conn_id="airflow_snowflake_conn",
    warehouse=None,
    database=database,
    schema=schema,
    account=account,
    task_id="log_snowflake_resources_task",
)
from dbnd_snowflake import log_snowflake_resource_usage

log_snowflake_resource_usage(
    query_text,
    database="DATABASE",
    user="user",
    connection_string="snowflake://<user>:<password>@<account>/",
    session_id=123456,
)
Integration points
Databand Open Source offers three ways to integrate with your data workflows:
Plugins to connect with services like Apache Airflow, Azkaban, Deequ, MLFlow, and more.
Off-the-shelf tasks and templates for tracking metadata and enhanced code deployment.
Logging methods and code annotations for reporting metrics and building dynamic workflows.
Optimized run orchestration
Leverage Databand operators and instrumentation to make building, running, and deploying pipelines easier and more dynamic. You can use the library in Python workflows built from scratch or as an extension of Apache Airflow; the two examples below show the same pipeline annotated for Airflow and for the native dbnd CLI.
Define different versions of pipelines to run based on changing data or parameters.
Abstract out configurations to compute environments and data locations.
Easily define flows by annotating your code, without making big changes to your workflow.
from airflow import DAG
from dbnd import task

def buy_vegetables(veg_list):
    from store import veg_store
    return veg_store.purchase(veg_list)

# the dbnd @task decorator marks this step for tracking
@task
def cut(vegetables):
    chopped = []
    for veg in vegetables:
        chopped.append(veg.dice())
    return [x + "\n" for x in chopped]

def add_dressing(chopped_vegetables, dressing, salt_amount="low"):
    for veg in chopped_vegetables:
        veg.season(salt_amount)
    return chopped_vegetables

def prepare_salad(vegetables_list=data_repo.vegetables, dressing="oil"):
    vegetables = buy_vegetables(vegetables_list)
    chopped = cut(vegetables)
    dressed = add_dressing(chopped, dressing)
    return dressed

with DAG(dag_id="prepare_salad") as dag:
    salad = prepare_salad()

""" CLI:
airflow backfill -s 2020-06-01 -e 2020-06-02 prepare_salad
"""
# The same pipeline, run natively with the dbnd CLI instead of an Airflow DAG
from dbnd import task

def buy_vegetables(veg_list):
    from store import veg_store
    return veg_store.purchase(veg_list)

@task
def cut(vegetables):
    chopped = []
    for veg in vegetables:
        chopped.append(veg.dice())
    return [x + "\n" for x in chopped]

def add_dressing(chopped_vegetables, dressing, salt_amount="low"):
    for veg in chopped_vegetables:
        veg.season(salt_amount)
    return chopped_vegetables

def prepare_salad(vegetables_list=data_repo.vegetables, dressing="oil"):
    vegetables = buy_vegetables(vegetables_list)
    chopped = cut(vegetables)
    dressed = add_dressing(chopped, dressing)
    return dressed

""" CLI:
dbnd run prepare_salad
"""
Contributions to the community
We’ve benefited greatly from the work of other developers and we want to share the love. These are some recent contributions we’ve made to the community.
Recent contributions on GitHub: Scheduler Optimizations
Start a free trial or demo
Contact us for a free trial or to see a demo of the solution in action.