Context manager DAG
DAGs can be used as context managers, which assigns each Operator/Sensor created inside the with block to that DAG automatically. This is helpful when a DAG has many tasks: you don't need to repeat dag=dag in every Operator/Sensor. The latest Airflow documentation recommends this style.
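For contrast, here is roughly what the same pair of tasks looks like without a context manager; every task has to be attached to the DAG by hand. This is a minimal sketch with a placeholder DAG id, not code from the previous page:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG("explicit_dag_example", start_date=datetime(2021, 12, 1))

# Without the context manager, every task must repeat dag=dag.
start = BashOperator(task_id="start", bash_command="echo start", dag=dag)
end = BashOperator(task_id="end", bash_command="echo stop", dag=dag)

start >> end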
Below is a modified version of the first DAG from the previous page. Create a file named 2_context_manager_dag.py that contains the following code:
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    "owner": "airflow",
    "depends_on_past": False,
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
    "email": ["airflow@example.com"],
    "email_on_failure": False,
    "email_on_retry": False,
}

with DAG(
    "2_context_manager_dag",
    default_args=default_args,
    description="context manager DAG",
    schedule_interval="0 12 * * *",
    start_date=datetime(2021, 12, 1),
    catchup=False,
    tags=["custom"],
) as dag:
    # Tasks created inside the with block are assigned to this DAG
    # automatically; no dag=dag argument is needed.
    start = BashOperator(
        task_id="start",
        bash_command="echo start",
    )
    end = BashOperator(
        task_id="end",
        bash_command="echo stop",
    )

    # end runs after start completes.
    start >> end
So far, the tasks in the two DAGs we have written run one after another. That works, but it may not be the most efficient arrangement. What if we have a pipeline in which some tasks could run in parallel?
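As a preview (a minimal sketch with placeholder DAG and task ids; the next page covers this properly), Airflow lets you point a task at a list of tasks, and the tasks in the list become eligible to run in parallel:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

with DAG(
    "parallel_preview",  # placeholder DAG id
    start_date=datetime(2021, 12, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extract")
    transform_a = BashOperator(task_id="transform_a", bash_command="echo a")
    transform_b = BashOperator(task_id="transform_b", bash_command="echo b")
    load = BashOperator(task_id="load", bash_command="echo load")

    # transform_a and transform_b both depend on extract and can run in
    # parallel; load waits for both of them to finish.
    extract >> [transform_a, transform_b] >> load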