How to detect why Ansible playbook hangs during execution
The most probable cause of your problem is the SSH connection. When a task takes a long time to run, the SSH connection can time out. I faced this problem once. To work around the SSH timeout, create an ansible.cfg in the directory from which you run Ansible and add the following:
[ssh_connection]
ssh_args = -o ServerAliveInterval=n
Where n is the ServerAliveInterval (in seconds) used when connecting to the server through SSH. Set it between 1 and 255. This makes the SSH client send null packets to the server every n seconds to avoid a connection timeout.
I was having the same problem with a playbook.
It ran perfectly up to a certain point and then stopped, so I added the async and poll parameters to avoid this behavior:
- name: update packages full into each server
  apt: upgrade=full
  ignore_errors: True
  async: 60
  poll: 60
and it worked like a charm! I really don't know what happened, but it seems Ansible now keeps track of what's going on and doesn't freeze anymore!
Hope it helps
I had the same issues and after a bit of fiddling around I found the problem to be in the step of gathering facts. Here are a few tips to better resolve any similar issue.
Disable fact-gathering in your playbook:
---
- hosts: myservers
  gather_facts: no
  ..
Rerun the playbook. If it works, the culprit is not SSH itself but rather the script that gathers the facts. We can debug that issue quite easily.
- SSH to the remote box
- Find the setup file somewhere in the .ansible folder.
- Run it with ./setup or python -B setup
If it hangs, then we know for sure that the problem is here. To find exactly what makes it hang, you can simply open the file with an editor and add print statements, mainly in the populate() method of Facts. Rerun the script and see how far it gets.
For me the issue seemed to be the attempt to resolve the hostname at the line self.facts['fqdn'] = socket.getfqdn(), and with a bit of googling it turned out to be an issue with resolving the remote hostname.
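To check the hostname lookup on its own, without editing the setup file, a minimal sketch like the one below can be run on the remote box. It only times the same socket.getfqdn() call mentioned above. The time_call helper and the 2-second threshold are assumptions made for illustration, not part of Ansible.

```python
import socket
import time


def time_call(func, *args):
    """Run func(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # The same standard-library call the fact-gathering line above makes.
    fqdn, seconds = time_call(socket.getfqdn)
    print(f"socket.getfqdn() returned {fqdn!r} in {seconds:.2f}s")

    # Hypothetical threshold, chosen only for illustration.
    if seconds > 2.0:
        print("Slow lookup: check /etc/hosts and the DNS settings for this hostname.")
```

If the call takes many seconds or never returns, the hang is most likely in name resolution rather than in Ansible itself.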
A totally different workaround for me. I had this from a Debian Jessie (Linux PwC-Deb64 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt25-2+deb8u3 (2016-07-02) x86_64 GNU/Linux) to another Debian image I was trying to build in AWS.
After many of the suggestions here didn't work for me, I became suspicious of the SSH "shared" connection. I went to my ansible.cfg, found the ssh_args lines, and set ControlMaster=no. This may now perform slowly because I've lost the SSH performance boost that connection sharing is supposed to give, but it seems like there is some interaction between this and apt-get that is causing the issue.
Your ansible.cfg could be in the directory that you run ansible from, or in /etc/ansible. If the latter, you may like to take a copy of it into a local directory before you start changing it!
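If you are not sure which ansible.cfg is actually in effect, the sketch below walks the usual search order: the ANSIBLE_CONFIG environment variable, then ansible.cfg in the current directory, then ~/.ansible.cfg, then /etc/ansible/ansible.cfg. The function name find_ansible_cfg is made up for this example, and the logic is a simplification (it skips extra checks Ansible performs, such as ignoring a config in a world-writable directory), so treat it as a rough guide only.

```python
import os


def find_ansible_cfg(cwd, home, environ, exists=os.path.isfile):
    """Return the first ansible.cfg candidate that exists, or None.

    Simplified version of Ansible's search order; hypothetical helper.
    """
    candidates = []
    if environ.get("ANSIBLE_CONFIG"):
        candidates.append(environ["ANSIBLE_CONFIG"])
    candidates += [
        os.path.join(cwd, "ansible.cfg"),
        os.path.join(home, ".ansible.cfg"),
        "/etc/ansible/ansible.cfg",
    ]
    for path in candidates:
        if exists(path):
            return path
    return None


if __name__ == "__main__":
    print(find_ansible_cfg(os.getcwd(), os.path.expanduser("~"), os.environ))
```

Running ansible --version also prints the config file in use, which is the more reliable check.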
In my case, ansible was "hanging forever" because apt-get was trying to ask me a question! How did I figure this out? I went to the target server, ran ps -aef | grep apt, and then did a kill on the appropriate "stuck" apt-get command.
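The ps and grep step can also be scripted. The sketch below is one possible way to list apt-related processes on the target server. The function name find_apt_processes and the set of program names it looks for are assumptions for this example. It only lists candidates and leaves the decision to kill anything to you.

```python
import subprocess


def find_apt_processes(ps_output):
    """Return (pid, command) pairs for apt/dpkg processes in `ps -eo pid,args` output."""
    matches = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        pid, command = int(parts[0]), parts[1]
        # Compare only the program name, so "grep apt" itself is not reported.
        program = command.split()[0].rsplit("/", 1)[-1]
        if program in ("apt", "apt-get", "dpkg", "aptitude"):
            matches.append((pid, command))
    return matches


if __name__ == "__main__":
    output = subprocess.run(
        ["ps", "-eo", "pid,args"], capture_output=True, text=True, check=True
    ).stdout
    for pid, command in find_apt_processes(output):
        print(pid, command)
```

An apt-get process that has been sitting there for as long as the playbook has been stuck is a good candidate for the one waiting on a prompt.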
Immediately after I did that, my ansible playbook sprang back to life and reported (with the ansible-playbook -vvv option given):
" ==> Deleted (by you or by a script) since installation.",
" ==> Package distributor has shipped an updated version.",
" What would you like to do about it ? Your options are:",
" Y or I : install the package maintainer's version",
" N or O : keep your currently-installed version",
" D : show the differences between the versions",
" Z : start a shell to examine the situation",
" The default action is to keep your current version.",
"*** buildinfo.txt (Y/I/N/O/D/Z) [default=N] ? "
After reading that helpful diagnostic output, I immediately realized I needed some appropriate dpkg options (see for example, this devops post). In my case, I chose:
apt:
  name: '{{ item }}'
  state: latest
  update_cache: yes
  # Force apt to always update to the newer config files in the package:
  dpkg_options: 'force-overwrite,force-confnew'
loop: '{{ my_packages }}'
Also, don't forget to clean up after your killed ansible session with something like this, or your install will still likely fail:
sudo dpkg --configure -a