How to troubleshoot Linux in-kernel dns_resolver
Linux provides a facility that lets the kernel and its modules resolve DNS names by calling out to user-space tools. CIFS, for example, uses it to follow referrals in DFS.
The problem I'm seeing is that I can't get the kernel to resolve a specific DNS name and I don't understand why it fails.
To find the root cause, I enabled debug output in both CIFS and the kernel DNS resolver by running the following commands:
echo "1" > /sys/module/dns_resolver/parameters/debug # dns_resolver
echo "7" > /proc/fs/cifs/cifsFYI # CIFS
Here's what I see in dmesg when the failure occurs:
fs/cifs/cifs_dfs_ref.c: DFS: ref path: \ESOTEST\dfstest\FS_SERV
fs/cifs/cifs_dfs_ref.c: DFS: node path: \FS\FS_SERV
fs/cifs/cifs_dfs_ref.c: DFS: fl: 2, srv_type: 0
fs/cifs/cifs_dfs_ref.c: DFS: ref_flags: 0, path_consumed: 24
fs/cifs/netmisc.c: address conversion returned 0 for FS
fs/cifs/netmisc.c: address conversion returned 0 for FS
[ls ] ==> dns_query((null),FS,2,(null))
fs/cifs/dns_resolve.c: dns_resolve_server_name_to_ip: unable to resolve: FS
fs/cifs/cifs_dfs_ref.c: cifs_compose_mount_options: Failed to resolve server part of \\FS\FS_SERV to IP: -22
And this is the output of a successful resolution:
fs/cifs/cifs_dfs_ref.c: DFS: node path: \ESOTEST\File-Server
fs/cifs/cifs_dfs_ref.c: DFS: fl: 2, srv_type: 0
fs/cifs/cifs_dfs_ref.c: DFS: ref_flags: 0, path_consumed: 28
fs/cifs/netmisc.c: address conversion returned 0 for ESOTEST
fs/cifs/netmisc.c: address conversion returned 0 for ESOTEST
[ls ] ==> dns_query((null),ESOTEST,7,(null))
[ls ] call request_key(,ESOTEST,)
[ls ] ==> dns_resolver_match(ESOTEST,ESOTEST)
[ls ] <== dns_resolver_match() = 1
[ls ] <== dns_query() = 14
fs/cifs/dns_resolve.c: dns_resolve_server_name_to_ip: resolved: ESOTEST to 192.168.56.102
fs/cifs/cifsfs.c: Devname: \\ESOTEST\File-Server flags: 0
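The difference between the two traces is easier to see when only the dns_query() entries are extracted. The following is a minimal sketch, not part of any existing tool: the function name summarize_dns_queries is hypothetical, and the regular expression assumes the debug lines look exactly like the ones pasted above. For each dns_query() entry it reports the name, the length argument, and whether a request_key upcall was logged before the next query.

```python
import re

# Matches the entry trace printed by dns_resolver with debug enabled,
# e.g. "[ls ] ==> dns_query((null),FS,2,(null))".
# Assumption: the line format is the one shown in the logs above.
DNS_QUERY_RE = re.compile(r"==> dns_query\(([^,]*),([^,]*),(\d+),")


def summarize_dns_queries(log_text):
    """Return one dict per dns_query() entry found in log_text."""
    results = []
    current = None
    for line in log_text.splitlines():
        match = DNS_QUERY_RE.search(line)
        if match:
            current = {
                "name": match.group(2),
                "namelen": int(match.group(3)),
                "upcall": False,
            }
            results.append(current)
        elif current is not None and "call request_key(" in line:
            # The kernel got far enough to ask user space for the key.
            current["upcall"] = True
    return results


sample = """\
[ls ] ==> dns_query((null),FS,2,(null))
[ls ] ==> dns_query((null),ESOTEST,7,(null))
[ls ] call request_key(,ESOTEST,)
[ls ] <== dns_query() = 14
"""

for query in summarize_dns_queries(sample):
    print(query)
```

On the lines above, the "FS" query never reaches request_key, while the "ESOTEST" query does.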
I am using Windows as the DNS server and I can resolve the name "FS" from the machine:
$ ping FS
PING FS.esodomain.com (192.168.56.104) 56(84) bytes of data.
64 bytes from fs.esodomain.com (192.168.56.104): icmp_seq=1 ttl=128 time=1.37 ms
64 bytes from fs.esodomain.com (192.168.56.104): icmp_seq=2 ttl=128 time=0.630 ms
I have also tried using key.dns_resolver to manually perform a test and it seems to work:
$ key.dns_resolver -vv -D "FS" 'hello'
I: Key description: 'dns_resolver;-1;-1;0;FS'
I: Callout info: 'hello'
D: Get A/AAAA RR for hostname:'FS', options:'hello'
D: Opt hello
D: Resolve 'FS' with 1ff
D: getaddrinfo = 0
D: RR: 0,2,1,6,10,(null)
D: append '192.168.56.104'
I: The key instantiation data is '192.168.56.104'
The contents of /etc/request-key.conf are:
create dns_resolver * * /sbin/key.dns_resolver %k
create user debug:* negate /bin/keyctl negate %k 30 %S
create user debug:* rejected /bin/keyctl reject %k 30 %c %S
create user debug:* expired /bin/keyctl reject %k 30 %c %S
create user debug:* revoked /bin/keyctl reject %k 30 %c %S
create user debug:loop:* * |/bin/cat
create user debug:* * /usr/share/keyutils/request-key-debug.sh %k %d %c %S
negate * * * /bin/keyctl negate %k 30 %S
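To check that this configuration would hand a dns_resolver request to key.dns_resolver, the rule lookup can be approximated as follows. This is a rough sketch under assumptions: it treats the first four columns as wildcard patterns for operation, key type, description and callout info, and it returns the first matching line. The helper name first_matching_rule is hypothetical, and fnmatch is used as a stand-in for the real matcher, which may differ in details.

```python
from fnmatch import fnmatchcase

CONF = """\
create dns_resolver * * /sbin/key.dns_resolver %k
create user debug:* negate /bin/keyctl negate %k 30 %S
create user debug:* rejected /bin/keyctl reject %k 30 %c %S
create user debug:* expired /bin/keyctl reject %k 30 %c %S
create user debug:* revoked /bin/keyctl reject %k 30 %c %S
create user debug:loop:* * |/bin/cat
create user debug:* * /usr/share/keyutils/request-key-debug.sh %k %d %c %S
negate * * * /bin/keyctl negate %k 30 %S
"""


def first_matching_rule(conf_text, op, key_type, description, callout_info=""):
    """Return the program and arguments of the first matching rule, or None."""
    values = (op, key_type, description, callout_info)
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 5:
            continue  # not a complete rule
        patterns, program = fields[:4], fields[4:]
        if all(fnmatchcase(v, p) for v, p in zip(values, patterns)):
            return program
    return None


print(first_matching_rule(CONF, "create", "dns_resolver", "FS"))
```

Under these assumptions the dns_resolver rule matches a request for "FS", so the configuration itself does not appear to be what blocks the lookup.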
The reason I am looking into this is that I'm trying to get a Windows DFS share to mount successfully. I can mount and access folders that are hosted on the root server, but when I try to access a sub-folder that refers to an external server, I get:
ls: cannot access /mnt/dfstest/FS_SERV/: Invalid argument
I'm on the 3.7.10 kernel:
Linux gentoo 3.7.10-gentoo-r1 #3 SMP Fri Apr 19 17:32:20 PDT 2013 x86_64 Intel(R) Xeon(R) CPU E5620 @ 2.40GHz GenuineIntel GNU/Linux
In a network capture I don't see any DNS requests for "FS", while I do see a request for "ESOTEST". This suggests that the request for "FS" is never made.
What next steps would you recommend to troubleshoot this?
This seems to be caused by the Linux kernel, specifically by dns_resolver: no resolution is even attempted for "FS".
The following lines in dns_resolver (net/dns_resolver/dns_query.c) seem to cause this:
if (namelen < 3)
	return -EINVAL;
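I don't know why this check exists, but it would explain the -22 (EINVAL) in the CIFS log and the "Invalid argument" reported by ls, since "FS" is only two characters long. I will try renaming the other server from "FS" to something longer, and I will also try recompiling the kernel with the check removed.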
UPDATE: Yes, that was the reason. It works after renaming the host to something longer.
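The length check can be mirrored in user space to spot server names that this kernel version would refuse before any DNS traffic is generated. The following is a minimal sketch, assuming the limit of 3 from the quoted check; the helper names kernel_would_reject and server_from_unc are made up for illustration and are not part of CIFS or keyutils.

```python
import errno

# Lower bound taken from the quoted check: if (namelen < 3) return -EINVAL;
MIN_NAMELEN = 3


def kernel_would_reject(name):
    """True if the quoted length check would reject this host name."""
    return len(name.encode()) < MIN_NAMELEN


def server_from_unc(path):
    # Strip the leading backslashes of a UNC-style path and keep the
    # first component, which is the server name.
    return path.lstrip("\\").split("\\", 1)[0]


for node in (r"\FS\FS_SERV", r"\ESOTEST\File-Server"):
    server = server_from_unc(node)
    if kernel_would_reject(server):
        print(f"{server}: rejected with -{errno.EINVAL} before any lookup")
    else:
        print(f"{server}: passes the length check")
```

Run against the two node paths from the logs, this flags "FS" and lets "ESOTEST" through, which agrees with the dmesg output and the network capture.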