Airflow is familiar with your data pipelines. It knows all about your tasks, their statuses, and how long they take to run. In other words, it has awareness of execution. It doesn’t know anything about the data moving through your DAGs.
Plenty of issues can affect your data without ever surfacing in execution metadata. What if your data source doesn’t deliver any data for some reason? Airflow would show all green on the Webserver UI, but your data consumers would have stale data in their warehouse. What if data is delivered, but an entire column has missing values? Airflow says everything is good, but your data consumers have incomplete data. What if data is complete, but an unexpected transformation occurs? The task may not fail, but inaccurate data will still be delivered.
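To make those blind spots concrete, here is a minimal sketch of the kind of data-level check that execution metadata alone can’t give you. The function name, column names, and return shape are all hypothetical, not part of any Airflow API:

```python
# Hypothetical data-quality check: inspects the data itself, not task status.
def validate_batch(rows, required_columns):
    """Return a list of data-quality issues found in a batch of row dicts."""
    issues = []

    # Blind spot 1: the source delivered nothing, yet every task "succeeded".
    if not rows:
        issues.append("no rows delivered")
        return issues

    for col in required_columns:
        values = [row.get(col) for row in rows]
        # Blind spot 2: the column arrived, but every value in it is missing.
        if all(v is None for v in values):
            issues.append(f"column '{col}' is entirely null")

    return issues
```

A check like this could run as its own task at the end of a DAG, turning a silent data problem into a visible failure; the third blind spot (an unexpected transformation producing plausible-looking but wrong values) is harder and usually needs comparisons against historical distributions.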
You may be able to set some alerts around run and task duration that help flag when something is off. Even so, you wouldn’t have the flexibility to cover all of your blind spots, and you would still need to spend time diagnosing the root cause. This brings us to our next point.
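A duration-based alert of the kind described above might be sketched like this (the function name and threshold are hypothetical, purely illustrative):

```python
from statistics import mean

# Hypothetical duration alert: flag a run that takes much longer than usual.
def duration_alert(past_durations, latest_duration, factor=2.0):
    """Return True if the latest run exceeds `factor` x the historical mean."""
    if not past_durations:
        return False  # no baseline yet, nothing to compare against
    return latest_duration > factor * mean(past_durations)
```

Note what this catches and what it misses: it will fire if a slow upstream source stretches a run, but an empty delivery or an all-null column often makes a run *faster*, not slower, so a duration threshold alone tells you little about the data itself or about why the anomaly happened.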