Linux ping command exits early due to ICMP host unreachable

An automated script runs shutdown -r now on a machine and, after a 30s delay, uses ping to determine when the machine is available again. I've recently switched the OS from CentOS 5 to Oracle Linux 6 and found that the behaviour of ping has changed.

I use ping with a count (-c10), a deadline (-w360) and a per-reply timeout (-W1), which should wait up to six minutes (360 seconds) for ten successful replies from the machine.

I observe my own machine generating Destination Host Unreachable messages after 30 seconds, and these cause ping to exit after 3 errors, i.e. well before my desired deadline. For example, here it exits after ~37 seconds:

[cs@bst1 ~]# time ping -c10 -w360 -W1 hostother; echo $?
PING hostother (10.210.51.155) 56(84) bytes of data.
From bst1 (10.210.51.139) icmp_seq=36 Destination Host Unreachable
From bst1 (10.210.51.139) icmp_seq=37 Destination Host Unreachable
From bst1 (10.210.51.139) icmp_seq=38 Destination Host Unreachable

--- hostother ping statistics ---
38 packets transmitted, 0 received, +3 errors, 100% packet loss, time 37008ms
pipe 3

real    0m37.010s
user    0m0.001s
sys     0m0.000s
1

This seems to conflict with man ping:

If ping does not receive any reply packets at all it will exit with code 1. If a packet count and deadline are both specified, and fewer than count packets are received by the time the deadline has arrived, it will also exit with code 1. On other error it exits with code 2. Otherwise it exits with code 0. This makes it possible to use the exit code to see if a host is alive or not.

1) Is the behaviour of ping in the face of ICMP errors consistent with the man page? It seems the return code should be 2 under error conditions.

2) Is it possible to prevent my own machine from jumping in with these Destination Host Unreachable messages?

If I re-run ping a few times, it eventually sees the host and exits cleanly (return code 0).


Solution 1:

I suggest you move the timeout out of ping and use the timeout command (part of coreutils) instead:

timeout 300s bash -c "until ping -c10 hostother; do false; done"

You'll get 124 as the return code if the command timed out, i.e. if no ping run succeeded within 5 minutes, and 0 as soon as a ping run succeeds. Note that without a deadline, ping -c10 exits 0 if at least one of the ten requests was answered, not only when all ten were.
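If a pause between attempts is wanted, the same retry idea can be sketched as a small shell function. This is only a sketch: retry_until is a hypothetical name (it is not part of coreutils or iputils), and the time budget is only checked between attempts, so a slow command can overrun it.

```shell
# retry_until: hypothetical helper, not a standard command.
# Usage: retry_until <max_seconds> <pause_seconds> <command> [args...]
# Runs the command repeatedly until it succeeds or the budget is spent.
retry_until() {
    local max_seconds="$1"
    local pause_seconds="$2"
    shift 2
    local start now
    start=$(date +%s)
    until "$@"; do
        now=$(date +%s)
        # Give up once the overall budget has been used.
        if [ $((now - start)) -ge "$max_seconds" ]; then
            return 1
        fi
        sleep "$pause_seconds"
    done
    return 0
}
```

For example, retry_until 300 2 ping -c1 -W1 hostother would retry a single ping, pausing 2 seconds between attempts, for roughly 5 minutes; it returns 0 on the first success and 1 if the budget runs out.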

I know this does not really answer the question (I admit the ping man page is not crystal clear) but hopefully solves your immediate issue.

Solution 2:

1) Yes, the behaviour of PING is consistent. "Destination host unreachable" can mean a number of things, but one of them is "this host has an address that indicates it's on my LAN, but it does not respond to ARP requests, and I have no valid ARP cache entry for it".

Here's me PINGing something on my LAN, and showing that it has no ARP cache entry:

[me@risby]$ ping 192.168.3.244
PING 192.168.3.244 (192.168.3.244) 56(84) bytes of data.
From 192.168.3.11 icmp_seq=1 Destination Host Unreachable
From 192.168.3.11 icmp_seq=2 Destination Host Unreachable
From 192.168.3.11 icmp_seq=3 Destination Host Unreachable
[...]
[me@risby]$ arp -a -n|grep 244
? (192.168.3.244) at <incomplete> on p1p1

PING isn't producing error 2 because it's true that no reply packets are received. It's also true that this isn't PING's problem; it has asked the kernel to send ICMP echo requests, and the kernel has indicated that it can't do so. Here's an example of error 2, i.e. "I, PING, simply cannot carry out those instructions; I have dropped the ball":

[me@risby]$ ping -c 3 192.168.3.999
ping: unknown host 192.168.3.999
[me@risby]$ echo $?
2
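The distinction between the exit codes can be summarised in a small helper for scripts. This is a sketch only: explain_ping_status is a hypothetical name, and the meanings are taken from the man page excerpt quoted in the question, so other ping implementations may differ.

```shell
# explain_ping_status: hypothetical helper that maps a ping exit status
# to the meaning given in the man page excerpt quoted in the question.
explain_ping_status() {
    case "$1" in
        0) echo "replies received" ;;
        1) echo "no replies, or fewer than count before the deadline" ;;
        2) echo "ping itself failed (for example, unknown host)" ;;
        *) echo "unexpected exit status $1" ;;
    esac
}
```

A script could call it right after ping, for instance explain_ping_status "$?", to log why a check failed.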

2) No.

As others have intimated, you've picked the wrong way to test for a host being down, as opposed to ICMP-echo-request-unresponsive.
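One possible alternative, offered only as a sketch, is to test for a service the machine is expected to run rather than for ICMP replies. The helper below assumes bash built with /dev/tcp support and the coreutils timeout command; tcp_port_open is a hypothetical name, and the choice of port (for example 22, if the machine runs sshd) is an assumption about your setup.

```shell
# tcp_port_open: hypothetical helper, not a standard command.
# Usage: tcp_port_open <host> <port> [timeout_seconds]
# Returns 0 if a TCP connection can be opened, non-zero otherwise.
tcp_port_open() {
    # The connection attempt runs in a child bash so that the descriptor
    # is closed again as soon as the check finishes.
    timeout "${3:-2}" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}
```

For example, tcp_port_open hostother 22 succeeds only once something is accepting connections on port 22, which says more about the machine being usable than an echo reply does.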