Airflow 1.9.0 is queuing but not launching tasks
Airflow can be a bit tricky to setup.
- Do you have the
airflow scheduler
running? - Do you have the
airflow webserver
running? - Have you checked that all DAGs you want to run are set to On in the web ui?
- Do all the DAGs you want to run have a start date which is in the past?
- Do all the DAGs you want to run have a proper schedule which is shown in the web ui?
- If nothing else works, you can use the web ui to click on the dag, then on Graph View. Now select the first task and click on Task Instance. In the paragraph Task Instance Details you will see why a DAG is waiting or not running.
I've had for instance a DAG which was wrongly set to depends_on_past: True
which forbid the current instance to start correctly.
Also a great resource directly in the docs, which has a few more hints: Why isn't my task getting scheduled?.
I'm running a fork of the puckel/docker-airflow repo as well, mostly on Airflow 1.8 for about a year with 10M+ task instances. I think the issue persists in 1.9, but I'm not positive.
For whatever reason, there seems to be a long-standing issue with the Airflow scheduler where performance degrades over time. I've reviewed the scheduler code, but I'm still unclear on what exactly happens differently on a fresh start to kick it back into scheduling normally. One major difference is that scheduled and queued task states are rebuilt.
Scheduler Basics in the Airflow wiki provides a concise reference on how the scheduler works and its various states.
Most people solve the scheduler diminishing throughput problem by restarting the scheduler regularly. I've found success at a 1-hour interval personally, but have seen as frequently as every 5-10 minutes used too. Your task volume, task duration, and parallelism settings are worth considering when experimenting with a restart interval.
For more info see:
- Airflow: Tips, Tricks, and Pitfalls (section "The scheduler should be restarted frequently")
- Bug 1286825 - Airflow scheduler stopped working silently
- Airflow at WePay (section "Restart everything when deploying DAG changes.")
This used to be addressed by restarting every X runs using the SCHEDULER_RUNS
config setting, although that setting was recently removed from the default systemd scripts.
You might also consider posting to the Airflow dev mailing list. I know this has been discussed there a few times and one of the core contributors may be able to provide additional context.
Related Questions
- Airflow tasks get stuck at "queued" status and never gets running (especially see Bolke's answer here)
- Jobs not executing via Airflow that runs celery with RabbitMQ
Make sure you don't have datetime.now()
as your start_date
It's intuitive to think that if you tell your DAG to start "now" that it'll execute "now." BUT, that doesn't take into account how Airflow itself actually reads datetime.now()
.
For a DAG to be executed, the start_date must be a time in the past, otherwise Airflow will assume that it's not yet ready to execute. When Airflow evaluates your DAG file, it interprets datetime.now()
as the current timestamp (i.e. NOT a time in the past) and decides that it's not ready to run. Since this will happen every time Airflow heartbeats (evaluates your DAG) every 5-10 seconds, it'll never run.
To properly trigger your DAG to run, make sure to insert a fixed time in the past (e.g. datetime(2019,1,1)) and set catchup=False (unless you're looking to run a backfill).
By design, an Airflow DAG will execute at the completion of its schedule_interval
That means one schedule_interval AFTER the start date. An hourly DAG, for example, will execute its 2pm run when the clock strikes 3pm. The reasoning here is that Airflow can't ensure that all data corresponding to the 2pm interval is present until the end of that hourly interval.
This is a peculiar aspect to Airflow, but an important one to remember - especially if you're using default variables and macros.
Time in Airflow is in UTC by default
This shouldn't come as a surprise given that the rest of your databases and APIs most likely also adhere to this format, but it's worth clarifying.
Full article and source here
I also had a similar issue, but it is mostly related to SubDagOperator with more than 3000 task instances in total (30 tasks * 44 subdag tasks).
What I found out is that airflow scheduler
mainly responsible for putting your scheduled tasks in to "Queued Slots" (pool), while airflow celery workers
is the one who pick up your queued task and put it into the "Used Slots" (pool) and run it.
Based on your description, your scheduler
should work fine. I suggest you check your "celery workers" log to see whether there is any error, or restart it to see whether it helps or not. I experienced some issues that celery workers normally go on strike for a few minutes then start working again (especially on SubDagOperator)
I am facing the issue today and found that bullet point 4 from tobi6 answer below worked out and resolved the issue
*'Do all the DAGs you want to run have a start date which is in the past?'*
I am using airflow version v1.10.3