[video] How Oracle manages Airflow in research and production and highlights on Airflow 2.0

Databand
2021-03-12 08:44:42

Webinar: Mar 10, 2021

Online – Zoom

Gita Ferber | Senior Software Developer, Oracle
Evgeny Shulman | CTO and Co-founder, Databand

In this meetup session, our special guest speaker is Gita Ferber, Senior Software Developer at Oracle. Gita will take a deep dive into Apache Airflow at Oracle, covering Airflow in production and Airflow for research teams. Get a sneak peek into how Oracle built out its projects, and pick up tips and best practices to apply in your own Airflow project.

How Oracle manages Airflow in research and production:

Airflow for production:

  • Oracle’s use case and why they use Airflow
  • Working with Airflow over time: how their architecture changed
  • Airflow at large scale: how they run an Airflow cluster with over 30k tasks
  • Automating the CI/CD process

Airflow orchestration as a service for research teams:

  • The different requirements for research
  • Running heavy research pipelines using Airflow + DBND open source
  • Using multiple compute environments

Evgeny, Databand’s Co-founder and CTO, will then discuss the Airflow 1.0 pain points that Airflow 2.0 will solve.

Airflow then and now:

Apache Airflow 2.0 highlights and deep dive:

  • Functional DAGs
  • Airflow Scheduler (HA and Performance)
  • Serialized DAGs and Versioned DAGs
  • KubernetesExecutor and KEDA
  • Packaging
  • What’s the current status? What’s missing? What’s next?
