How to continue execution on failed task after fixing error in playbook?

When writing and debugging Ansible playbooks, the typical workflow is as follows:

  1. ansible-playbook ./main.yaml
  2. Playbook fails on some task
  3. Fix the task and repeat step 1, waiting for all the previous tasks to execute again, which takes a lot of time.

Ideally, I'd like to resume execution at the failed task, keeping the inventory and all the facts collected by the previous tasks. Is that even possible? How can I make playbook writing and debugging faster?


Take a look at "Executing playbooks for troubleshooting". If you want to start executing your playbook at a particular task, you can do so with the --start-at-task option:

ansible-playbook playbook.yml --start-at-task="install packages"

The above will start executing your playbook at a task named “install packages”.

Alternatively, take a look at this previous answer: "How to run only one task in ansible playbook?"

Finally, when a play fails, it usually gives you something along the lines of:

PLAY RECAP ******************************************************************** 
           to retry, use: --limit @/home/user/site.retry

Use that --limit command and it should retry from the failed task.


Future readers:

The --limit @/home/user/site.retry option would not help in this scenario. The .retry file stores only the failed hosts and nothing more, so the rerun simply executes all tasks again against those hosts.
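To see why, it helps to look at what a .retry file can give you. This is a minimal sketch with a hypothetical helper, retry_hosts_csv, that turns a retry file (one host name per line) into the comma-separated host list that --limit also accepts. Host names are the only thing it can recover: there is no task name and no facts in the file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a .retry file into a comma-separated --limit value.
# A .retry file holds host names only, one per line; nothing about tasks or facts.
retry_hosts_csv() {
  local retry_file="$1"
  # Skip blank lines, then join the remaining host names with commas.
  grep -v '^[[:space:]]*$' "$retry_file" | paste -s -d ',' -
}
```

For example, ansible-playbook site.yml --limit "$(retry_hosts_csv /home/user/site.retry)" targets the same hosts as --limit @/home/user/site.retry, and either way the play still starts from its first task.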

If you are using the latest version (Ansible 2.x), --start-at-task does not work for tasks defined inside roles.

You can achieve a similar effect with the --step flag, e.g. ansible-playbook playbook.yml --step. With --step, Ansible asks you before executing each task, and you can choose (N)o/(y)es/(c)ontinue.

With this approach you can selectively execute tasks as needed and, after applying your fixes, continue from the point where the play failed.


Future Future readers:

As of Ansible 2.4.2.0, --start-at-task works for tasks defined in the roles I created.

The Ansible team is not willing to address this issue; they suggest you keep your roles idempotent and replay the entire play. I don't have time for that. Unlike @JeremyWhiting, I don't rely on a large number of facts in my roles, so I can use the --start-at-task feature.

However, this is still a manual process, so I wrote an Ansible RPM with a "Resume" feature that follows these basic steps:

  • Enable the Ansible log in /etc/ansible/ansible.cfg (uncomment log_path)
  • Clear the log before each run
  • After a failure, the "Resume" feature greps this log for the last "TASK" line and uses sed to extract what is inside the "[]"
  • It then simply reruns the last-run play with --start-at-task="$start_at_task"
  • Ensure that you have "any_errors_fatal: true" in your roles so the play stops at the failing task you want to resume from
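The steps above can be sketched as a small bash file you source. This is a minimal sketch, not the RPM described here: it assumes log_path is enabled, reads the log location from ANSIBLE_LOG_PATH (Ansible's environment override for log_path, so point it at the same file) with /var/log/ansible.log as the fallback, and takes the playbook path as an argument instead of remembering the last-run play. The function names and the ANSIBLE_PLAYBOOK_CMD variable (which lets you swap in echo for a dry run) are hypothetical. The extracted name may carry a role prefix ("role : task"), which is how it appears in the log; check that your Ansible version matches --start-at-task against that form.

```shell
#!/usr/bin/env bash
# Sketch of a "Resume" wrapper around ansible-playbook (hypothetical names).
# ANSIBLE_LOG_PATH is Ansible's real override for log_path; ANSIBLE_PLAYBOOK_CMD
# is a made-up variable so the command can be replaced (e.g. with echo).
LOG_FILE="${ANSIBLE_LOG_PATH:-/var/log/ansible.log}"
PLAYBOOK_CMD="${ANSIBLE_PLAYBOOK_CMD:-ansible-playbook}"

# Print the text inside the brackets of the last "TASK [...]" header in a log.
last_task_name() {
  local log="$1"
  grep 'TASK \[' "$log" | tail -n 1 | sed -e 's/.*TASK \[\(.*\)\].*/\1/'
}

# Fresh run: clear the log first so the "last task" always belongs to this run.
run_playbook() {
  local playbook="$1"; shift
  : > "$LOG_FILE"
  "$PLAYBOOK_CMD" "$playbook" "$@"
}

# Resume: rerun the playbook starting at the last task recorded in the log.
resume_playbook() {
  local playbook="$1"; shift
  local start_at_task
  start_at_task="$(last_task_name "$LOG_FILE")"
  if [[ -z "$start_at_task" ]]; then
    echo "no TASK line found in $LOG_FILE" >&2
    return 1
  fi
  "$PLAYBOOK_CMD" "$playbook" --start-at-task="$start_at_task" "$@"
}
```

For example, after source resume.sh, start with run_playbook site.yml; once the failing task is fixed, call resume_playbook site.yml. Extra arguments such as -i inventory or --step are passed through to ansible-playbook. As noted above, this only works when the resumed tasks don't depend on facts or registered variables from the tasks that were skipped.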

The Ansible team is unwilling to add this basic (and very useful) feature, so the only option is to hack it together with some bash scripts.