DNS resolve timeout on RHEL 6.3 behind firewall
I have a server running Red Hat Enterprise Linux 6.3 inside a VMware virtual machine. There's a Juniper SSG 5 firewall between the server and the Internet. I'm trying to diagnose what looks like a DNS lookup timeout, but I'm afraid I don't have the knowledge required to do it.
Here's the output of tcpdump when I run wget www.google.com.br:
$ tcpdump -i eth0 -n -vvv not port ssh
14:15:15.361010 IP (tos 0x0, ttl 64, id 8975, offset 0, flags [DF], proto UDP (17), length 63)
    192.168.1.12.54835 > 200.196.66.30.domain: [bad udp cksum f4e1!] 47797+ A? www.google.com.br. (35)
14:15:15.361195 IP (tos 0x0, ttl 64, id 8976, offset 0, flags [DF], proto UDP (17), length 63)
    192.168.1.12.54835 > 200.196.66.30.domain: [bad udp cksum 4e62!] 8028+ AAAA? www.google.com.br. (35)
14:15:15.362122 IP (tos 0x0, ttl 61, id 25375, offset 0, flags [none], proto UDP (17), length 283)
    200.196.66.30.domain > 192.168.1.12.54835: [udp sum ok] 47797 q: A? www.google.com.br. 4/4/4 www.google.com.br. [1m] CNAME www-cctld.l.google.com., www-cctld.l.google.com. [4m47s] A 74.125.234.184, www-cctld.l.google.com. [4m47s] A 74.125.234.191, www-cctld.l.google.com. [4m47s] A 74.125.234.183 ns: google.com. [20h11m12s] NS ns1.google.com., google.com. [20h11m12s] NS ns2.google.com., google.com. [20h11m12s] NS ns3.google.com., google.com. [20h11m12s] NS ns4.google.com. ar: ns1.google.com. [3d1h39m] A 216.239.32.10, ns2.google.com. [3d23h47m54s] A 216.239.34.10, ns3.google.com. [3d21h48m53s] A 216.239.36.10, ns4.google.com. [3d14h11m28s] A 216.239.38.10 (255)
14:15:20.365434 IP (tos 0x0, ttl 64, id 8977, offset 0, flags [DF], proto UDP (17), length 63)
    192.168.1.12.54835 > 200.196.66.30.domain: [bad udp cksum f4e1!] 47797+ A? www.google.com.br. (35)
14:15:20.366657 IP (tos 0x0, ttl 61, id 25377, offset 0, flags [none], proto UDP (17), length 283)
    200.196.66.30.domain > 192.168.1.12.54835: [udp sum ok] 47797 q: A? www.google.com.br. 4/4/4 www.google.com.br. [55s] CNAME www-cctld.l.google.com., www-cctld.l.google.com. [4m42s] A 74.125.234.191, www-cctld.l.google.com. [4m42s] A 74.125.234.183, www-cctld.l.google.com. [4m42s] A 74.125.234.184 ns: google.com. [20h11m7s] NS ns4.google.com., google.com. [20h11m7s] NS ns2.google.com., google.com. [20h11m7s] NS ns3.google.com., google.com. [20h11m7s] NS ns1.google.com. ar: ns1.google.com. [3d1h38m55s] A 216.239.32.10, ns2.google.com. [3d23h47m49s] A 216.239.34.10, ns3.google.com. [3d21h48m48s] A 216.239.36.10, ns4.google.com. [3d14h11m23s] A 216.239.38.10 (255)
14:15:20.366760 IP (tos 0x0, ttl 64, id 8978, offset 0, flags [DF], proto UDP (17), length 63)
    192.168.1.12.54835 > 200.196.66.30.domain: [bad udp cksum 4e62!] 8028+ AAAA? www.google.com.br. (35)
14:15:20.368486 IP (tos 0x0, ttl 61, id 25378, offset 0, flags [none], proto UDP (17), length 263)
    200.196.66.30.domain > 192.168.1.12.54835: [udp sum ok] 8028 q: AAAA? www.google.com.br. 2/4/4 www.google.com.br. [55s] CNAME www-cctld.l.google.com., www-cctld.l.google.com. [3m7s] AAAA 2800:3f0:4001:805::1017 ns: google.com. [20h11m7s] NS ns3.google.com., google.com. [20h11m7s] NS ns2.google.com., google.com. [20h11m7s] NS ns4.google.com., google.com. [20h11m7s] NS ns1.google.com. ar: ns1.google.com. [3d1h38m55s] A 216.239.32.10, ns2.google.com. [3d23h47m49s] A 216.239.34.10, ns3.google.com. [3d21h48m48s] A 216.239.36.10, ns4.google.com. [3d14h11m23s] A 216.239.38.10 (235)
14:15:20.368936 IP (tos 0x0, ttl 64, id 36272, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.1.12.58407 > 74.125.234.191.http: Flags [S], cksum 0xa695 (correct), seq 2103294767, win 14600, options [mss 1460,sackOK,TS val 499988127 ecr 0,nop,wscale 7], length 0
14:15:20.370424 IP (tos 0x0, ttl 58, id 12210, offset 0, flags [none], proto TCP (6), length 60)
    74.125.234.191.http > 192.168.1.12.58407: Flags [S.], cksum 0x2a65 (correct), seq 1378505609, ack 2103294768, win 14180, options [mss 1430,sackOK,TS val 4016826562 ecr 499988127,nop,wscale 6], length 0
[...]
The 5-second delay between the third and fourth packets (14:15:15.362122 and 14:15:20.365434) happens consistently.
I cannot reproduce the delay using nslookup, dig or dig +dnssec, so I'm out of ideas.
Does anyone have any clue as to what could be the problem?
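This happens in dual-stack IPv4/IPv6 environments where the machine doing the DNS lookup sends the A and AAAA requests on the same socket and expects two replies back. This is the default behavior of relatively recent versions of glibc, and your trace shows it: queries 47797 (A) and 8028 (AAAA) both leave from source port 54835. The Juniper firewall, however, drops the connection after the first reply comes back, so the AAAA reply never arrives; the resolver waits out its default 5-second timeout and then retries, which is the delay you see. nslookup and dig don't reproduce it because they send one query at a time instead of going through glibc's resolver.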
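Since dig and nslookup bypass glibc, a reproduction has to go through getaddrinfo() instead. The sketch below times one lookup made with getent ahosts, which asks glibc for addresses of any family, so on a host that already sends AAAA queries (as this one does for wget) it should trigger the same A + AAAA pair. The function name time_glibc_lookup is hypothetical, and the millisecond arithmetic assumes GNU date (for %N).

```shell
#!/bin/sh
# Hypothetical helper: time one lookup that goes through glibc's getaddrinfo()
# via "getent ahosts", unlike dig/nslookup, which send their own single queries.
# Assumes GNU date (for %N nanoseconds) and glibc's getent.
time_glibc_lookup() {
    host="$1"
    start=$(date +%s%N)
    # ahosts requests any address family, so glibc may send both A and AAAA queries
    getent ahosts "$host" > /dev/null
    status=$?
    end=$(date +%s%N)
    # Print the elapsed wall-clock time in milliseconds
    echo $(( (end - start) / 1000000 ))
    return $status
}
```

For example, time_glibc_lookup www.google.com.br: behind the affected firewall one would expect a value a little over 5000, dropping to a few milliseconds once a workaround is in place.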
The Juniper knowledge base has an article describing how to configure the firewall to work around this issue.
As @BMDan noted, you can also add the following line to /etc/resolv.conf:
options single-request-reopen
This works around the broken firewall behavior: if the two requests sent from the same port are not both answered, the resolver closes the socket and opens a new one before sending the second request.
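A minimal sketch for applying the option idempotently, so running it twice does not duplicate the line. The function name add_single_request_reopen is hypothetical; it takes the file path as an argument so it can be tried on a copy before editing /etc/resolv.conf as root.

```shell
#!/bin/sh
# Hypothetical helper: append "options single-request-reopen" to a resolv.conf
# file unless an existing options line already enables it.
add_single_request_reopen() {
    conf="$1"
    # The option may share an "options" line with others (e.g. ndots:2)
    if grep -Eq '^[[:space:]]*options[[:space:]](.*[[:space:]])?single-request-reopen([[:space:]]|$)' "$conf"; then
        echo "already present in $conf"
    else
        echo "options single-request-reopen" >> "$conf"
        echo "added to $conf"
    fi
}

# Example (as root, once tested on a copy):
#   add_single_request_reopen /etc/resolv.conf
# glibc also reads resolver options from the RES_OPTIONS environment variable,
# which lets you try the option for a single command without editing the file:
#   RES_OPTIONS=single-request-reopen getent ahosts www.google.com.br
```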