Tell bind to ignore SOA domain check on forwarder zones

I've a weird issue with bind.

Premise: I'm using bind (version 9.16_11) installed on pfSense, but despite this I can change almost anything in the bind configuration.

I've configured a simple forward zone. The configuration is something like this:

zone "dom001.my-domain.com" {
        type forward;
        forward only;
        forwarders { 192.168.29.10; };
};

Now, if I run nslookup for a host in this domain, I see an error. Example:

Non-authoritative answer:
Name:   mail2.dom001.my-domain.com
Address: 192.168.210.126
** server can't find mail2.dom001.my-domain.com: SERVFAIL

The weird thing is that the answer is received (you can see the address in the response), but despite this I still see the SERVFAIL error.

Another weird thing: dig doesn't report any error:

; <<>> DiG 9.16.6 <<>> mail2.dom001.my-domain.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 53129
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: 3218b8a1b8f64565eb9bd6636124bf73640809a4347f3bcf (good)
;; QUESTION SECTION:
;mail2.dom001.my-domain.com. IN A

;; ANSWER SECTION:
mail2.dom001.my-domain.com. 30 IN A  192.168.210.126

;; Query time: 30 msec
;; SERVER: 172.16.0.2#53(172.16.0.2)
;; WHEN: Tue Aug 24 11:44:19 CEST 2021
;; MSG SIZE  rcvd: 110

During these queries I see some 'warnings' on bind's logs:

Aug 24 10:42:58 named   19540   lame-servers: info: FORMERR resolving 'mail2.dom001.my-domain.com/AAAA/IN': 192.168.29.10#53
Aug 24 10:42:58 named   19540   resolver: notice: DNS format error from 192.168.29.10#53 resolving mail2.dom001.my-domain.com/AAAA for client 10.16.16.41#38299: Name cluster.local (SOA) not subdomain of zone dom001.my-domain.com -- invalid response

I've checked further and it seems that the issue is related to the SOA records on the forwarder server:

;; QUESTION SECTION:
;mail2.dom001.my-domain.com. IN SOA

;; ANSWER SECTION:
cluster.local.          30      IN      SOA     ns.dns.cluster.local. hostmaster.cluster.local. 1629766398 7200 1800 86400 30

In fact, the SOA in the answer belongs to cluster.local instead of dom001.my-domain.com.

This issue causes strange behavior depending on the OS used. For example, most Linux servers work fine, while some versions of Alpine Linux cannot resolve hostnames in that domain.

And even with the servers that work fine, bind's logs are full of errors due to this issue.

Unfortunately, I cannot control the forwarder server or change the SOA record.

My question is: how can I configure bind to ignore the SOA record of that forwarder and accept the answer even if the SOA is not coherent?

I know that's not the best solution, but I need to work around the misconfigured forwarder.

Thanks in advance for your help!


Solution 1:

I don't believe there are any options in BIND that will make it accept that answer, as it appears unrelated to the query.
That type of inconsistent answer is definitely not expected. If you truly want to (temporarily?) accept and pass on these answers (clients may not like them either, of course), you may have to look at other software that does not inspect the response contents in the same way.
(I suspect that dnsdist, being a proxy rather than a recursor, could do this for you.)

That said, I think I can somewhat clear up some of the confusion...

The nslookup behavior comes from how nslookup sends two separate queries by default, one for A and one for AAAA.
The A query succeeded and clearly returned the relevant A record. The AAAA query failed: presumably there is no AAAA record, and since negative responses always come with the (supposed to be) relevant SOA record, that probably triggers the exact problem you described.

I expect that you can reproduce the problem with dig just fine if you make it send the same query that failed. In other words, send a query for AAAA and you should get the same failure that nslookup got for one of its two queries.

As for the behavior of the other nameserver, it's not really a case of "editing the SOA". It looks more like some kind of logic bug in the nameserver software. It should not actually be possible to find a cluster.local record when looking up mail2.dom001.my-domain.com, since that is in a whole different branch of the tree.
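
To make the "different branch of the tree" point concrete, here is a minimal Python sketch of the kind of subdomain comparison that the log message ("Name cluster.local (SOA) not subdomain of zone dom001.my-domain.com") describes. This is only an illustration, not BIND's actual code; the function names `labels` and `is_subdomain` are made up for this example.

```python
def labels(name):
    """Split a DNS name into lowercase labels, ignoring the trailing root dot."""
    return [label for label in name.lower().rstrip(".").split(".") if label]


def is_subdomain(name, zone):
    """Return True if `name` equals `zone` or lies below it in the DNS tree.

    Hypothetical helper for illustration only; not taken from BIND.
    """
    name_labels = labels(name)
    zone_labels = labels(zone)
    if not zone_labels:
        # The root zone contains every name.
        return True
    if len(zone_labels) > len(name_labels):
        return False
    # A subdomain must end with every label of the zone, in the same order.
    return name_labels[-len(zone_labels):] == zone_labels


if __name__ == "__main__":
    zone = "dom001.my-domain.com"
    # SOA owner name one would expect in a negative answer for this zone.
    print(is_subdomain("dom001.my-domain.com.", zone))
    # SOA owner name the forwarder actually returned.
    print(is_subdomain("cluster.local.", zone))
```

With the names from the question, the first call reports that the SOA owner is inside the forward zone and the second reports that it is not, which matches the condition the resolver complains about in the log.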