Ansible Fails to Authenticate Sudo Even When Sudo Pass is Given

Problem

Using the latest, stable Ansible build, I have an odd problem where my playbook hangs on one server during "Gathering_Facts" but works fine on other similar servers when using Sudo. On the Ansible server, I run as my user (NIS user) and use sudo (as root) on the remote server to make changes. If I remove Sudo from this setup, everything works fine.

Setup

Software Versions

  • OS: RHEL 6.4
  • Ansible version: ansible 1.8.2
  • Sudo version:
    Sudo version 1.8.6p3
    Sudoers policy plugin version 1.8.6p3
    Sudoers file grammar version 42
    Sudoers I/O plugin version 1.8.6p3
    
  • SSH version: OpenSSH_5.3p1, OpenSSL 1.0.0-fips 29 Mar 2010

Server Map

                   -------- User1@Server1: sudo -H -S -p (Hangs on Gathering_Facts)
                  /
User1@Ansible ----
                  \
                   -------- User1@Server2: sudo -H -S -p (Works fine)

Users

  • User1: NIS accessible user on both Server1 & Server2.
  • root: local root user for each server.

Ansible Configuration

Relevant parts of my ansible.cfg.

ansible.cfg

sudo           = true
sudo_user      = root
ask_sudo_pass  = True
ask_pass       = True
...
gathering = smart
....
# change this for alternative sudo implementations
sudo_exe = sudo

# what flags to pass to sudo
#sudo_flags = -H
...
# remote_user = ansible

Here is a simple test playbook to touch an empty file and then remove it. Really, I'm just wanting to test if I can get Ansible to correctly use sudo on the remote server. If the playbook runs at all, I'm in good shape.

TEST.yml

---
- hosts: Server1:Server2
  vars:
  - test_file: '/tmp/ansible_test_file.txt'
  sudo: yes
  tasks:
  - name: create empty file to test connectivity and sudo access
    file: dest={{ test_file }}
          state=touch
          owner=root group=root mode=0600
    notify:
    - clean
  handlers:
  - name: clean
    file: dest={{ test_file }}
          state=absent

Sudo Configuration

/etc/sudoers

Host_Alias SRV     = Server1, Server2
User_Alias SUPPORT = User1, User2, User3
SUPPORT SRV=(root) ALL

This sudo configuration works just fine on BOTH servers. No problems with sudo itself.

How I Run It All

Very simple:

$ ansible-playbook test.yml
SSH password: 
sudo password [defaults to SSH password]:

PLAY [Server1:Server2] ********************************************** 

GATHERING FACTS *************************************************************** 
ok: [Server2]
failed: [Server1] => {"failed": true, "parsed": false}

Sorry, try again.
[sudo via ansible, key=mxxiqyvztlfnbctwixzmgvhwfdarumtq] password: 
sudo: 1 incorrect password attempt


TASK: [create empty file to test connectivity and sudo access] **************** 
changed: [Server2]

NOTIFIED: [clean] ************************************************************* 
changed: [Server2]

PLAY RECAP ******************************************************************** 
           to retry, use: --limit @/home/User1/test.retry

Server1                 : ok=0    changed=0    unreachable=0    failed=1   
Server2                 : ok=3    changed=2    unreachable=0    failed=0

Fails regardless of if I explicitly enter both the SSH/Sudo passwords as well as implicitly (letting sudo pass default to SSH).

Remote Server Logs

Server1 (Fails)

/var/log/secure

Dec 31 15:21:10 Server1 sshd[27093]: Accepted password for User1 from x.x.x.x port 51446 ssh2
Dec 31 15:21:10 Server1 sshd[27093]: pam_unix(sshd:session): session opened for user User1 by (uid=0)
Dec 31 15:21:11 Server1 sshd[27095]: subsystem request for sftp
Dec 31 15:21:11 Server1 sudo: pam_unix(sudo:auth): authentication failure; logname=User1 uid=187 euid=0 tty=/dev/pts/1 ruser=User1 rhost=  user=User1 
Dec 31 15:26:13 Server1 sudo: pam_unix(sudo:auth): conversation failed
Dec 31 15:26:13 Server1 sudo: pam_unix(sudo:auth): auth could not identify password for [User1]
Dec 31 15:26:13 Server1 sudo:  User1 : 1 incorrect password attempt ; TTY=pts/1 ; PWD=/home/User1 ; USER=root ; COMMAND=/bin/sh -c echo SUDO-SUCCESS-mxxiqyvztlfnbctwixzmgvhwfdarumtq; LANG=C LC_CTYPE=C /usr/bin/python /tmp/.ansible/tmp/ansible-tmp-1420039272.66-164754043073536/setup; rm -rf /tmp/.ansible/tmp/ansible-tmp-1420039272.66-164754043073536/ >/dev/null 2>&1
Dec 31 15:26:13 Server1 sshd[27093]: pam_unix(sshd:session): session closed for user User1 

Server2 (runs fine)

/var/log/secure

Dec 31 15:21:12 Server2 sshd[31447]: Accepted password for User1 from x.x.x.x port 60346 ssh2
Dec 31 15:21:12 Server2 sshd[31447]: pam_unix(sshd:session): session opened for user User1 by (uid=0)
Dec 31 15:21:12 Server2 sshd[31449]: subsystem request for sftp
Dec 31 15:21:12 Server2 sudo:  User1 : TTY=pts/2 ; PWD=/home/User1 ; USER=root ; COMMAND=/bin/sh -c echo SUDO-SUCCESS-vjaypzeocvrdlqalxflgcrcoezhnbibs; LANG=C LC_CTYPE=C /usr/bin/python /tmp/.ansible/tmp/ansible-tmp-1420039272.68-243930711246149/setup; rm -rf /tmp/.ansible/tmp/ansible-tmp-1420039272.68-243930711246149/ >/dev/null 2>&1
Dec 31 15:21:14 Server2 sshd[31447]: pam_unix(sshd:session): session closed for user User1 

STrace Output

Here is the output from strace when targeting the root user's ansible command. Command:

while [[ -z $(ps -fu root|grep [a]nsible|awk '{print $2}') ]]; do
    continue
done
strace -vfp $(ps -fu root|grep [a]nsible|awk '{print $2}') -o /root/strace.out`

Server1

23650 select(0, NULL, NULL, NULL, {1, 508055}) = 0 (Timeout)
23650 socket(PF_NETLINK, SOCK_RAW, 9)   = 10
23650 fcntl(10, F_SETFD, FD_CLOEXEC)    = 0
23650 readlink("/proc/self/exe", "/usr/bin/sudo", 4096) = 13
23650 sendto(10, "|\0\0\0L\4\5\0\1\0\0\0\0\0\0\0op=PAM:authentic"..., 124, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 124
23650 poll([{fd=10, events=POLLIN}], 1, 500) = 1 ([{fd=10, revents=POLLIN}])
23650 recvfrom(10, "$\0\0\0\2\0\0\0\1\0\0\0b\\\0\0\0\0\0\0|\0\0\0L\4\5\0\1\0\0\0"..., 8988, MSG_PEEK|MSG_DONTWAIT, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 36
23650 recvfrom(10, "$\0\0\0\2\0\0\0\1\0\0\0b\\\0\0\0\0\0\0|\0\0\0L\4\5\0\1\0\0\0"..., 8988, MSG_DONTWAIT, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 36
23650 close(10)                         = 0
23650 write(2, "Sorry, try again.\n", 18) = 18
23650 gettimeofday({1420050850, 238344}, NULL) = 0
23650 socket(PF_FILE, SOCK_STREAM, 0)   = 10
23650 connect(10, {sa_family=AF_FILE, path="/var/run/dbus/system_bus_socket"}, 33) = 0

Server2

6625  select(8, [5 7], [], NULL, NULL)  = ? ERESTARTNOHAND (To be restarted)
6625  --- SIGCHLD (Child exited) @ 0 (0) ---
6625  write(8, "\21", 1)                = 1
6625  rt_sigreturn(0x8)                 = -1 EINTR (Interrupted system call)
6625  select(8, [5 7], [], NULL, NULL)  = 1 (in [7])
6625  read(7, "\21", 1)                 = 1
6625  wait4(6636, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 6636
6625  rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
6625  socket(PF_NETLINK, SOCK_RAW, 9)   = 6
6625  fcntl(6, F_SETFD, FD_CLOEXEC)     = 0
6625  readlink("/proc/self/exe", "/usr/bin/sudo", 4096) = 13
6625  sendto(6, "x\0\0\0R\4\5\0\6\0\0\0\0\0\0\0op=PAM:session_c"..., 120, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 120
6625  poll([{fd=6, events=POLLIN}], 1, 500) = 1 ([{fd=6, revents=POLLIN}])
6625  recvfrom(6, "$\0\0\0\2\0\0\0\6\0\0\0\330\355\377\377\0\0\0\0x\0\0\0R\4\5\0\6\0\0\0"..., 8988, MSG_PEEK|MSG_DONTWAIT, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 36
6625  recvfrom(6, "$\0\0\0\2\0\0\0\6\0\0\0\330\355\377\377\0\0\0\0x\0\0\0R\4\5\0\6\0\0\0"..., 8988, MSG_DONTWAIT, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 36
6625  close(6)                          = 0
6625  open("/etc/security/pam_env.conf", O_RDONLY) = 6
6625  fstat(6, {st_dev=makedev(253, 1), st_ino=521434, st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=2980, st_atime=2014/12/31-16:10:01, st_mtime=2012/10/15-08:23:52, st_ctime=2014/06/16-15:45:35}) = 0
6625  mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbc3a59a000
6625  read(6, "#\n# This is the configuration fi"..., 4096) = 2980
6625  read(6, "", 4096)                 = 0
6625  close(6)                          = 0
6625  munmap(0x7fbc3a59a000, 4096)      = 0
6625  open("/etc/environment", O_RDONLY) = 6

My Guess

Server1 isn't getting the password properly or is incorrectly asking/waiting for a password. This doesn't look like a Sudo or Ansible problem (alone, they both work just fine), but Server1 doesn't seem to receive the credentials (or adhere to them) in a similar fashion as Server2. Server1 & 2 serve different purposes, so it's possible that they have some authentication or package version differences, but they were both built from the same repository; therefore, they shouldn't be THAT different.

PAM Auth

I thought maybe the systems had different PAM configurations causing the passwords to be handled a bit differently. I compared the /etc/pam.d/ files (using md5sum [file]), and they are the same between the two systems.

Tests

Sudo STDIN

Tested another issue where sudo wouldn't read a password from STDIN, but that worked fine on both servers.

Test Sudo Ad-Hoc

-bash-4.1$ ansible Server1 -m file -a "dest=/tmp/ansible_test.txt state=touch" -sK
SSH password: 
sudo password [defaults to SSH password]: 
Server1 | success >> {
    "changed": true, 
    "dest": "/tmp/ansible_test.txt", 
    "gid": 0, 
    "group": "root", 
    "mode": "0644", 
    "owner": "root", 
    "size": 0, 
    "state": "file", 
    "uid": 0
}

Success! But why?!

TL;DR

  1. Server1 seems to be waiting on sudo password prompt while Server2 runs just fine.
  2. Running ansible "ad-hoc" on Server1 works fine. Running it as a playbook fails.

Question(s)

  • What could cause my Ansible Sudo configuration to work fine on one server and be rejected on another?
  • Does Ansible perform the password "pass" from local to remote machine differently when run ad-hoc versus playbook? I assumed they would be the same.

I'm thinking this is getting close to simply submitting a bug report to the GitHub page purely on the fact that sudo access has different results depending if I'm running ad-hoc or not.


What I would do is to use

strace -vfp `pidof sshd`

and see where it's failing.

Check the account as well, maybe it's restricted or something but my bet is that something is wrong with your /etc/hosts file or it does get changed in the process.


Using @lulian as a foothold in this answer, the problem came down to a rogue ansible_sudo_pass: defined in the group_vars which was overriding the password entered for --ask-sudo-pass.

Using the following:

while [[ -z $(ps -eaf|grep 'sshd: [U]ser1@pts/1') ]]; do
    continue
done
strace -ff -vfp $(ps -eaf|grep 'sshd: [U]ser1@pts/1'|awk '{print $2}') -o /root/strace_sshd1_2.out

I was able to find that write(4, "{{ password }}\n", 15) was being passed instead of the entered password. After some quick searching, I did indeed find ansible_sudo_pass defined in my group_vars which was overriding my entered password.

As an FYI to everyone else, the ansible_sudo_pass: definition seems to take precedence over --ask-sudo-pass which, at first, seemed counter-intuitive. In the end, this is user error, but @lulian's methodology in debugging the SSH interaction as well as the relationship discovery between ansible_sudo_pass and --ask-sudo-pass should be very helpful for others out there. (Hopefully!)