Bind: "unexpected end of input" due to NS

I have stumbled upon an odd error in a master-slave(s) configuration of Bind.

The zone works fine on the master, but on the slaves I'm getting errors like this:

21-May-2014 19:06:07.573 general: info: zone example.com/IN: refresh: failure trying master 1.2.3.4#53 (source 0.0.0.0#0): unexpected end of input

This is what my zone file looks like:

@   IN SOA  ns1.example.com.    admin.example.com. (
    2014052116    ; Serial
    28800         ; Refresh
    180           ; Retry
    604800        ; Expire
    21600       ) ; Minimum

                86400   IN A        1.2.3.4
                86400   IN MX       10 mail.example.com.
                86400   IN MX       20 mail2.example.com.
                86400   IN NS       ns1.example.com.
                86400   IN NS       ns2.example.com.
                86400   IN NS       ns3.example.com.
                86400   IN NS       ns1.example.net.
                86400   IN NS       ns2.example.net.
                86400   IN NS       ns3.example.net.
                86400   IN NS       ns1.example.org.

; until here it works -- if I uncomment the below here, I'll get "end of input" failures.
;               86400   IN NS       ns2.example.org.
;               86400   IN NS       ns3.example.org.


*               86400   IN A        1.2.3.4
[...]

If I uncomment the two commented-out NS lines, I get "unexpected end of input" errors. If I keep them commented, everything works fine.

Is there a maximum number of NS records, or a maximum file size, that causes this?

Thanks.

Edit:

named-checkzone:

master # named-checkzone example.com example.com. 
zone example.com/IN: example.com/MX 'mail.example.com' is a CNAME (illegal)
zone example.com/IN: example.com/MX 'mail2.example.com' is a CNAME (illegal)
zone example.com/IN: loaded serial 2014052105
OK

global options:

options {
    directory "/var/cache/bind";
    auth-nxdomain no;    # conform to RFC1035
    listen-on-v6 { any; };
    listen-on { any; };
    dnssec-enable yes;
    recursion no;
    statistics-file "/var/log/named.stats";
    try-tcp-refresh yes;
};

Version (same on all three servers):

# named -v
BIND 9.8.4-rpz2+rl005.12-P1

I think you're running into the 512-byte maximum size of a plain (non-EDNS) DNS message over UDP. Before making the AXFR request (which runs over TCP, so the 512-byte limit does not apply), a slave server first sends an SOA query to compare serial numbers and to confirm that the master considers itself authoritative for the zone.

The problem is that the SOA response contains more than just the QUESTION and ANSWER sections:

  • The AUTHORITY section will contain all of your configured nameservers.
  • The ADDITIONAL section will contain all known A and AAAA records for those nameservers.

This is why changing your NS records, or their associated A/AAAA records, affects whether the zone transfer succeeds, while adding other record types has no effect. Your combined authority data is simply too large to fit in a 512-byte UDP response.
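To put rough numbers on the 512-byte argument, here is a minimal Python sketch that builds the SOA response in DNS wire format (with RFC 1035 name compression) and measures its length. It is an estimate under stated assumptions, not what BIND literally emits: the placeholder names from the question stand in for the real ones, the SOA rdata uses the question's values, every NS record goes into AUTHORITY, and the hypothetical `glue_a`/`glue_aaaa` parameters say how many A/AAAA records per nameserver end up in ADDITIONAL. A real server may compress differently or leave additional records out.

```python
import struct

TYPE_A, TYPE_NS, TYPE_SOA, TYPE_AAAA = 1, 2, 6, 28
ANSWER, AUTHORITY, ADDITIONAL = 1, 2, 3  # indexes into MessageBuilder.counts


class MessageBuilder:
    """Builds a DNS message in wire format using RFC 1035 name compression."""

    def __init__(self):
        self.buf = bytearray(12)     # header placeholder, filled in by finish()
        self.names = {}              # lowercased name suffix -> offset in buf
        self.counts = [0, 0, 0, 0]   # QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT

    def name(self, fqdn):
        """Append a domain name, replacing any already-seen suffix with a pointer."""
        labels = fqdn.rstrip(".").split(".")
        for i in range(len(labels)):
            suffix = ".".join(labels[i:]).lower()
            if suffix in self.names:
                self.buf += struct.pack("!H", 0xC000 | self.names[suffix])
                return
            if len(self.buf) < 0x4000:  # pointers only have 14 bits of offset
                self.names[suffix] = len(self.buf)
            label = labels[i].encode("ascii")
            self.buf += bytes([len(label)]) + label
        self.buf += b"\x00"  # root label ends a name written out in full

    def question(self, qname, qtype):
        self.name(qname)
        self.buf += struct.pack("!HH", qtype, 1)  # class IN
        self.counts[0] += 1

    def rr(self, section, owner, rtype, ttl, write_rdata):
        """Append one resource record; write_rdata() appends the RDATA bytes."""
        self.name(owner)
        self.buf += struct.pack("!HHI", rtype, 1, ttl)
        rdlen_at = len(self.buf)
        self.buf += b"\x00\x00"  # RDLENGTH placeholder
        write_rdata()
        struct.pack_into("!H", self.buf, rdlen_at, len(self.buf) - rdlen_at - 2)
        self.counts[section] += 1

    def finish(self, msg_id=0, flags=0x8400):  # QR=1 (response), AA=1
        struct.pack_into("!6H", self.buf, 0, msg_id, flags, *self.counts)
        return bytes(self.buf)


def soa_response_size(zone, nameservers, glue_a=0, glue_aaaa=0):
    """Bytes in an SOA answer with every NS in AUTHORITY and the given number
    of A/AAAA records per nameserver in ADDITIONAL (addresses are zeroed)."""
    b = MessageBuilder()
    b.question(zone, TYPE_SOA)

    def soa_rdata():
        b.name("ns1." + zone)
        b.name("admin." + zone)
        b.buf += struct.pack("!5I", 2014052116, 28800, 180, 604800, 21600)

    b.rr(ANSWER, zone, TYPE_SOA, 86400, soa_rdata)
    for ns in nameservers:
        b.rr(AUTHORITY, zone, TYPE_NS, 86400, lambda ns=ns: b.name(ns))
    for ns in nameservers:
        for _ in range(glue_a):
            b.rr(ADDITIONAL, ns, TYPE_A, 86400, lambda: b.buf.extend(bytes(4)))
        for _ in range(glue_aaaa):
            b.rr(ADDITIONAL, ns, TYPE_AAAA, 86400, lambda: b.buf.extend(bytes(16)))
    return len(b.finish())
```

With the placeholder names, nine NS records and no additional records come to 255 bytes. One A record per nameserver brings that to 399, and one A plus one AAAA per nameserver reaches 651, which is past the 512-byte limit. Each new domain suffix costs its full length once, so real nameserver names longer than these placeholders push the totals up quickly. Plugging in the actual names is therefore the more useful test.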

Unfortunately, I don't know of any workaround for this. The BIND Administrator's Reference Manual does describe a try-tcp-refresh option, but it defaults to yes, and your options set it to yes explicitly. I'm also not sure the zone transfer is the end of your problems. Even if it succeeded, any client whose query triggers a response containing your full AUTHORITY and ADDITIONAL sections would run into the same size problem. EDNS0 is designed to solve problems like this, but I suspect the AUTHORITY bloat happens at too low a level for it to help.
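The slave's refresh check can be reproduced by hand to see whether the master's UDP answer is actually being cut short. The following minimal Python sketch sends the SOA query over UDP and checks the TC ("truncated") bit in the reply header. If that bit is set, it retries over TCP, using the 2-byte length prefix that RFC 1035 requires for DNS over TCP.

This is roughly the kind of fallback the try-tcp-refresh option is described as providing. However, BIND's documentation frames that option as retrying when UDP queries fail, not specifically on truncation, and this is not BIND's implementation.

Assumptions: the function names are hypothetical, the sketch is IPv4 only, and it does not validate the reply's ID or source address.

```python
import socket
import struct

QTYPE_SOA, QCLASS_IN = 6, 1
TC_FLAG = 0x0200  # "truncated" bit in the header flags word


def build_soa_query(zone, msg_id=0x1234):
    """Wire-format SOA query for `zone` with all flags clear (RD not set)."""
    header = struct.pack("!6H", msg_id, 0x0000, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in zone.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", QTYPE_SOA, QCLASS_IN)


def parse_header(msg):
    """Return (id, flags, qdcount, ancount, nscount, arcount)."""
    return struct.unpack("!6H", msg[:12])


def is_truncated(msg):
    return bool(parse_header(msg)[1] & TC_FLAG)


def frame_for_tcp(msg):
    """DNS over TCP prefixes each message with its length as 2 bytes."""
    return struct.pack("!H", len(msg)) + msg


def recv_exact(sock, n):
    """Read exactly n bytes from a TCP socket."""
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("connection closed mid-message")
        data += chunk
    return data


def query_soa(server, zone, timeout=3.0):
    """Ask over UDP first; if the reply has TC set, repeat the query over TCP."""
    query = build_soa_query(zone)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as udp:
        udp.settimeout(timeout)
        udp.sendto(query, (server, 53))
        reply, _ = udp.recvfrom(65535)
    if not is_truncated(reply):
        return "udp", reply
    with socket.create_connection((server, 53), timeout=timeout) as tcp:
        tcp.sendall(frame_for_tcp(query))
        (length,) = struct.unpack("!H", recv_exact(tcp, 2))
        return "tcp", recv_exact(tcp, length)
```

Calling `query_soa("1.2.3.4", "example.com.")` from a slave returns which transport answered, plus the raw reply. Comparing `len(reply)` and the TC bit with the two extra NS lines commented and uncommented would show directly whether the size of the UDP answer is what changes between the working and failing configurations.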

Hopefully my analysis is wrong to some extent. I think you have a very interesting problem and I would like to see someone supply a better answer to this, because I'd stand to learn from it as well.