Ansible Fails to Authenticate Sudo Even When Sudo Pass is Given
Problem
Using the latest, stable Ansible build, I have an odd problem where my playbook hangs on one server during "Gathering_Facts" but works fine on other similar servers when using Sudo. On the Ansible server, I run as my user (NIS user) and use sudo (as root) on the remote server to make changes. If I remove Sudo from this setup, everything works fine.
Setup
Software Versions
- OS: RHEL 6.4
- Ansible version: ansible 1.8.2
-
Sudo version:
Sudo version 1.8.6p3 Sudoers policy plugin version 1.8.6p3 Sudoers file grammar version 42 Sudoers I/O plugin version 1.8.6p3
- SSH version: OpenSSH_5.3p1, OpenSSL 1.0.0-fips 29 Mar 2010
Server Map
-------- User1@Server1: sudo -H -S -p (Hangs on Gathering_Facts) / User1@Ansible ---- \ -------- User1@Server2: sudo -H -S -p (Works fine)
Users
- User1: NIS accessible user on both Server1 & Server2.
- root: local root user for each server.
Ansible Configuration
Relevant parts of my ansible.cfg.
ansible.cfg
sudo = true
sudo_user = root
ask_sudo_pass = True
ask_pass = True
...
gathering = smart
....
# change this for alternative sudo implementations
sudo_exe = sudo
# what flags to pass to sudo
#sudo_flags = -H
...
# remote_user = ansible
Here is a simple test playbook to touch an empty file and then remove it. Really, I'm just wanting to test if I can get Ansible to correctly use sudo on the remote server. If the playbook runs at all, I'm in good shape.
TEST.yml
---
- hosts: Server1:Server2
vars:
- test_file: '/tmp/ansible_test_file.txt'
sudo: yes
tasks:
- name: create empty file to test connectivity and sudo access
file: dest={{ test_file }}
state=touch
owner=root group=root mode=0600
notify:
- clean
handlers:
- name: clean
file: dest={{ test_file }}
state=absent
Sudo Configuration
/etc/sudoers
Host_Alias SRV = Server1, Server2
User_Alias SUPPORT = User1, User2, User3
SUPPORT SRV=(root) ALL
This sudo configuration works just fine on BOTH servers. No problems with sudo itself.
How I Run It All
Very simple:
$ ansible-playbook test.yml SSH password: sudo password [defaults to SSH password]: PLAY [Server1:Server2] ********************************************** GATHERING FACTS *************************************************************** ok: [Server2] failed: [Server1] => {"failed": true, "parsed": false} Sorry, try again. [sudo via ansible, key=mxxiqyvztlfnbctwixzmgvhwfdarumtq] password: sudo: 1 incorrect password attempt TASK: [create empty file to test connectivity and sudo access] **************** changed: [Server2] NOTIFIED: [clean] ************************************************************* changed: [Server2] PLAY RECAP ******************************************************************** to retry, use: --limit @/home/User1/test.retry Server1 : ok=0 changed=0 unreachable=0 failed=1 Server2 : ok=3 changed=2 unreachable=0 failed=0
Fails regardless of if I explicitly enter both the SSH/Sudo passwords as well as implicitly (letting sudo pass default to SSH).
Remote Server Logs
Server1 (Fails)
/var/log/secure
Dec 31 15:21:10 Server1 sshd[27093]: Accepted password for User1 from x.x.x.x port 51446 ssh2 Dec 31 15:21:10 Server1 sshd[27093]: pam_unix(sshd:session): session opened for user User1 by (uid=0) Dec 31 15:21:11 Server1 sshd[27095]: subsystem request for sftp Dec 31 15:21:11 Server1 sudo: pam_unix(sudo:auth): authentication failure; logname=User1 uid=187 euid=0 tty=/dev/pts/1 ruser=User1 rhost= user=User1 Dec 31 15:26:13 Server1 sudo: pam_unix(sudo:auth): conversation failed Dec 31 15:26:13 Server1 sudo: pam_unix(sudo:auth): auth could not identify password for [User1] Dec 31 15:26:13 Server1 sudo: User1 : 1 incorrect password attempt ; TTY=pts/1 ; PWD=/home/User1 ; USER=root ; COMMAND=/bin/sh -c echo SUDO-SUCCESS-mxxiqyvztlfnbctwixzmgvhwfdarumtq; LANG=C LC_CTYPE=C /usr/bin/python /tmp/.ansible/tmp/ansible-tmp-1420039272.66-164754043073536/setup; rm -rf /tmp/.ansible/tmp/ansible-tmp-1420039272.66-164754043073536/ >/dev/null 2>&1 Dec 31 15:26:13 Server1 sshd[27093]: pam_unix(sshd:session): session closed for user User1
Server2 (runs fine)
/var/log/secure
Dec 31 15:21:12 Server2 sshd[31447]: Accepted password for User1 from x.x.x.x port 60346 ssh2 Dec 31 15:21:12 Server2 sshd[31447]: pam_unix(sshd:session): session opened for user User1 by (uid=0) Dec 31 15:21:12 Server2 sshd[31449]: subsystem request for sftp Dec 31 15:21:12 Server2 sudo: User1 : TTY=pts/2 ; PWD=/home/User1 ; USER=root ; COMMAND=/bin/sh -c echo SUDO-SUCCESS-vjaypzeocvrdlqalxflgcrcoezhnbibs; LANG=C LC_CTYPE=C /usr/bin/python /tmp/.ansible/tmp/ansible-tmp-1420039272.68-243930711246149/setup; rm -rf /tmp/.ansible/tmp/ansible-tmp-1420039272.68-243930711246149/ >/dev/null 2>&1 Dec 31 15:21:14 Server2 sshd[31447]: pam_unix(sshd:session): session closed for user User1
STrace Output
Here is the output from strace when targeting the root user's ansible command. Command:
while [[ -z $(ps -fu root|grep [a]nsible|awk '{print $2}') ]]; do
continue
done
strace -vfp $(ps -fu root|grep [a]nsible|awk '{print $2}') -o /root/strace.out`
Server1
23650 select(0, NULL, NULL, NULL, {1, 508055}) = 0 (Timeout) 23650 socket(PF_NETLINK, SOCK_RAW, 9) = 10 23650 fcntl(10, F_SETFD, FD_CLOEXEC) = 0 23650 readlink("/proc/self/exe", "/usr/bin/sudo", 4096) = 13 23650 sendto(10, "|\0\0\0L\4\5\0\1\0\0\0\0\0\0\0op=PAM:authentic"..., 124, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 124 23650 poll([{fd=10, events=POLLIN}], 1, 500) = 1 ([{fd=10, revents=POLLIN}]) 23650 recvfrom(10, "$\0\0\0\2\0\0\0\1\0\0\0b\\\0\0\0\0\0\0|\0\0\0L\4\5\0\1\0\0\0"..., 8988, MSG_PEEK|MSG_DONTWAIT, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 36 23650 recvfrom(10, "$\0\0\0\2\0\0\0\1\0\0\0b\\\0\0\0\0\0\0|\0\0\0L\4\5\0\1\0\0\0"..., 8988, MSG_DONTWAIT, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 36 23650 close(10) = 0 23650 write(2, "Sorry, try again.\n", 18) = 18 23650 gettimeofday({1420050850, 238344}, NULL) = 0 23650 socket(PF_FILE, SOCK_STREAM, 0) = 10 23650 connect(10, {sa_family=AF_FILE, path="/var/run/dbus/system_bus_socket"}, 33) = 0
Server2
6625 select(8, [5 7], [], NULL, NULL) = ? ERESTARTNOHAND (To be restarted) 6625 --- SIGCHLD (Child exited) @ 0 (0) --- 6625 write(8, "\21", 1) = 1 6625 rt_sigreturn(0x8) = -1 EINTR (Interrupted system call) 6625 select(8, [5 7], [], NULL, NULL) = 1 (in [7]) 6625 read(7, "\21", 1) = 1 6625 wait4(6636, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 6636 6625 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0 6625 socket(PF_NETLINK, SOCK_RAW, 9) = 6 6625 fcntl(6, F_SETFD, FD_CLOEXEC) = 0 6625 readlink("/proc/self/exe", "/usr/bin/sudo", 4096) = 13 6625 sendto(6, "x\0\0\0R\4\5\0\6\0\0\0\0\0\0\0op=PAM:session_c"..., 120, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 120 6625 poll([{fd=6, events=POLLIN}], 1, 500) = 1 ([{fd=6, revents=POLLIN}]) 6625 recvfrom(6, "$\0\0\0\2\0\0\0\6\0\0\0\330\355\377\377\0\0\0\0x\0\0\0R\4\5\0\6\0\0\0"..., 8988, MSG_PEEK|MSG_DONTWAIT, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 36 6625 recvfrom(6, "$\0\0\0\2\0\0\0\6\0\0\0\330\355\377\377\0\0\0\0x\0\0\0R\4\5\0\6\0\0\0"..., 8988, MSG_DONTWAIT, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 36 6625 close(6) = 0 6625 open("/etc/security/pam_env.conf", O_RDONLY) = 6 6625 fstat(6, {st_dev=makedev(253, 1), st_ino=521434, st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=2980, st_atime=2014/12/31-16:10:01, st_mtime=2012/10/15-08:23:52, st_ctime=2014/06/16-15:45:35}) = 0 6625 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbc3a59a000 6625 read(6, "#\n# This is the configuration fi"..., 4096) = 2980 6625 read(6, "", 4096) = 0 6625 close(6) = 0 6625 munmap(0x7fbc3a59a000, 4096) = 0 6625 open("/etc/environment", O_RDONLY) = 6
My Guess
Server1 isn't getting the password properly or is incorrectly asking/waiting for a password. This doesn't look like a Sudo or Ansible problem (alone, they both work just fine), but Server1 doesn't seem to receive the credentials (or adhere to them) in a similar fashion as Server2. Server1 & 2 serve different purposes, so it's possible that they have some authentication or package version differences, but they were both built from the same repository; therefore, they shouldn't be THAT different.
PAM Auth
I thought maybe the systems had different PAM configurations causing the passwords to be handled a bit differently. I compared the /etc/pam.d/ files (using md5sum [file]
), and they are the same between the two systems.
Tests
Sudo STDIN
Tested another issue where sudo wouldn't read a password from STDIN, but that worked fine on both servers.
Test Sudo Ad-Hoc
-bash-4.1$ ansible Server1 -m file -a "dest=/tmp/ansible_test.txt state=touch" -sK SSH password: sudo password [defaults to SSH password]: Server1 | success >> { "changed": true, "dest": "/tmp/ansible_test.txt", "gid": 0, "group": "root", "mode": "0644", "owner": "root", "size": 0, "state": "file", "uid": 0 }
Success! But why?!
TL;DR
- Server1 seems to be waiting on sudo password prompt while Server2 runs just fine.
- Running
ansible
"ad-hoc" on Server1 works fine. Running it as a playbook fails.
Question(s)
- What could cause my Ansible Sudo configuration to work fine on one server and be rejected on another?
- Does Ansible perform the password "pass" from local to remote machine differently when run ad-hoc versus playbook? I assumed they would be the same.
I'm thinking this is getting close to simply submitting a bug report to the GitHub page purely on the fact that sudo access has different results depending if I'm running ad-hoc or not.
What I would do is to use
strace -vfp `pidof sshd`
and see where it's failing.
Check the account as well, maybe it's restricted or something but my bet is that something is wrong with your /etc/hosts file or it does get changed in the process.
Using @lulian as a foothold in this answer, the problem came down to a rogue ansible_sudo_pass:
defined in the group_vars which was overriding the password entered for --ask-sudo-pass
.
Using the following:
while [[ -z $(ps -eaf|grep 'sshd: [U]ser1@pts/1') ]]; do
continue
done
strace -ff -vfp $(ps -eaf|grep 'sshd: [U]ser1@pts/1'|awk '{print $2}') -o /root/strace_sshd1_2.out
I was able to find that write(4, "{{ password }}\n", 15)
was being passed instead of the entered password. After some quick searching, I did indeed find ansible_sudo_pass
defined in my group_vars which was overriding my entered password.
As an FYI to everyone else, the ansible_sudo_pass:
definition seems to take precedence over --ask-sudo-pass
which, at first, seemed counter-intuitive. In the end, this is user error, but @lulian's methodology in debugging the SSH interaction as well as the relationship discovery between ansible_sudo_pass
and --ask-sudo-pass
should be very helpful for others out there. (Hopefully!)