Name resolution difference between CentOS and Debian
I have a small Java program that loops calling InetAddress.getByName("example.com") every second. When I run it on a CentOS 6.4 box using 'strace -f' I see that /etc/resolv.conf is opened and read once:
$ grep /etc/resolv.conf strace.out
[pid 24810] open("/etc/resolv.conf", O_RDONLY) = 6
When I run it on Debian 7 I see that /etc/resolv.conf is repeatedly opened or stat()'d:
$ grep /etc/resolv.conf strace.out
[pid 41821] open("/etc/resolv.conf", O_RDONLY) = 10
[pid 41821] stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0
[pid 41821] open("/etc/resolv.conf", O_RDONLY) = 10
[pid 41821] stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0
[pid 41821] stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0
Both systems have /etc/nsswitch.conf configured with
hosts: files dns
Neither system has a name caching daemon running.
I used the same version of the Oracle HotSpot JVM on both machines to rule out any Java differences.
The CentOS 6.4 box has glibc 2.12 installed. The Debian 7 box has glibc 2.13 installed.
What accounts for the different behavior between the two operating systems with regard to opening and reading /etc/resolv.conf?
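For reference, here is a minimal sketch of the kind of test loop described above. The class name ResolveLoop, the describeLookup helper, the bounded iteration count and the two cache TTL settings are assumptions for illustration, not the exact program that produced the traces. The TTL settings are there because the JVM keeps its own lookup cache, which could otherwise hide calls to the system resolver.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.security.Security;

class ResolveLoop {
    // Resolve the host once and return a printable "host -> address" line,
    // or a failure message if the lookup does not succeed.
    static String describeLookup(String host) {
        try {
            return host + " -> " + InetAddress.getByName(host).getHostAddress();
        } catch (UnknownHostException e) {
            return host + " -> lookup failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumption: turn off the JVM's positive and negative lookup caches
        // so that each iteration is passed on to the system resolver.
        Security.setProperty("networkaddress.cache.ttl", "0");
        Security.setProperty("networkaddress.cache.negative.ttl", "0");

        String host = args.length > 0 ? args[0] : "example.com";
        // Bounded here for convenience; the original test looped indefinitely.
        int iterations = args.length > 1 ? Integer.parseInt(args[1]) : 5;

        for (int i = 0; i < iterations; i++) {
            System.out.println(describeLookup(host));
            Thread.sleep(1000); // one lookup per second
        }
    }
}
```

Running it under 'strace -f -o strace.out java ResolveLoop example.com 30' and then grepping strace.out as shown above should make the per-lookup file accesses visible.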
The Red Hat glibc developers consider some bugs in their software not to be bugs. One of these is that resolv.conf is not re-read after it changes. glibc considers that the responsibility of the application, so each and every application needs its own logic for this.
Because this is absolutely bonkers, the eglibc developers have fixed this issue. So on non-eglibc systems your application will need to have its own logic for reinitializing nss_dns, or else it will need to be restarted after a resolv.conf change. On eglibc systems (Debian and things based on Debian), you get a less buggy libc.
We found this out the hard way after changing resolv.conf, decommissioning old DNS servers and then having to restart 1200+ MySQL servers. Needless to say, this is not fun.
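The stat() calls in the Debian trace suggest a check-the-file-for-changes approach. A rough sketch of the same idea at application level follows, assuming the application polls the modification time of /etc/resolv.conf itself. The class name ResolvConfWatcher and its methods are hypothetical. What to do when a change is seen is left open here, since the JVM has no standard API for reinitializing the libc resolver; a real application might call res_init() through native code or schedule a restart.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.FileTime;
import java.util.Objects;

class ResolvConfWatcher {
    private final Path file;
    private FileTime lastSeen;

    ResolvConfWatcher(Path file) {
        this.file = file;
        this.lastSeen = readMtime();
    }

    // Returns the file's modification time, or null if it cannot be read.
    private FileTime readMtime() {
        try {
            return Files.getLastModifiedTime(file);
        } catch (IOException e) {
            return null;
        }
    }

    // True if the modification time differs from the one seen at the
    // previous check; the new value is remembered for the next call.
    boolean changedSinceLastCheck() {
        FileTime now = readMtime();
        boolean changed = !Objects.equals(now, lastSeen);
        lastSeen = now;
        return changed;
    }

    public static void main(String[] args) throws InterruptedException {
        ResolvConfWatcher watcher = new ResolvConfWatcher(Paths.get("/etc/resolv.conf"));
        // Poll a few times; a real application would check before lookups
        // or on a timer for as long as it runs.
        for (int i = 0; i < 5; i++) {
            if (watcher.changedSinceLastCheck()) {
                // Hypothetical hook: reinitialize the resolver or flag a restart.
                System.out.println("resolv.conf changed, resolver state is stale");
            }
            Thread.sleep(1000);
        }
    }
}
```

This only detects the change. On a libc that reads resolv.conf once per process, the running process keeps using the old name servers until the resolver is reinitialized or the process is restarted.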
Not only are the C library versions different, but CentOS uses the GNU C library (glibc) whereas Debian uses Embedded GLIBC (eglibc), a variant of glibc that carries its own patches, so the implementation of the name lookup library functions can differ.
That would probably account for different system call behaviour between these two distributions.
I assume InetAddress.getByName translates into getaddrinfo(), which is a C library function rather than a system call. You could start by reading the source of that function in the relevant C library implementation and version.
Make sure you read the source from the actual package versions you are using. The packages in EL 6.4 have had over two years of improvements applied on top of their original upstream versions. I assume the same is true of the Debian packages.