What does a DHCP-client consider to be the "best" answer?
We have training rooms where Windows XP is normally installed (via PXE). The "normal" DNS/DHCP infrastructure runs on Windows servers. The training room has its own VLAN (different from that of the Windows servers), so there is most probably an IP helper for DHCP requests active on the Cisco router to which all PCs in that room are connected.
Now we wanted to convert some of the PCs to Linux instead. The plan: put our own laptop with a DHCP server into the VLAN of the room and override the "normal" DHCP response. We expected this to work, since a DHCP server attached directly to that VLAN should respond faster than the "normal" DHCP server located some hops away.
It turned out that this did not work. We had to manually release the lease on the original DHCP server to get it working.
On the laptop we saw the client requesting the IP, and "our" DHCP server was sending NACKs in reply to the request for the Windows-assigned IP; before that, it had offered its own response.
Old Question: Why did this not work out as expected? What is making the PC regain its old lease?
Update 2012-08-08:
The regain issue is explained in the DHCP RFC, which shows why the PC regains its old lease.
We now release the IP on the Windows-DHCP-server before giving it another try.
Again - the Windows-DHCP-server wins.
I suspect that there is some algorithm for the dhcp-client which determines the "best" dhcp-answer for the client. The new question is:
How does the client choose the "best" answer?
Solution 1:
How a client reacts to multiple DHCP answers is vendor-specific, and even firmware-specific.
Variants I have seen over the years are:
1) Accept the first answer, regardless of whether it is an ACK or a NACK.
2) Take the first ACK and ignore NACKs completely.
3) Take the last ACK received within a set time interval (usually 5-10 seconds).
Example: Some years ago we had issues with Ricoh MFPs.
We had two DHCP servers. One supplied the addresses, the other only additional DHCP options. The second server always answered first.
The Ricohs used variant 1), even when the first answer contained only DHCP options. Ricoh changed this to variant 2) with a firmware update after we explained the problem to them.
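The three variants can be compared side by side with a small simulation. This is only a sketch of the selection logic as described above, not the code of any real DHCP client; the `Reply` type, the server labels, and the arrival times are all made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Reply:
    server: str      # hypothetical label for the answering server
    kind: str        # "ACK" or "NACK"
    arrival_ms: int  # made-up arrival time, in ms after the request


def by_arrival(replies: List[Reply]) -> List[Reply]:
    """Return the replies ordered by arrival time."""
    return sorted(replies, key=lambda r: r.arrival_ms)


def accept_first(replies: List[Reply]) -> Optional[Reply]:
    """Variant 1: the first reply wins, ACK or NACK alike."""
    ordered = by_arrival(replies)
    return ordered[0] if ordered else None


def first_ack(replies: List[Reply]) -> Optional[Reply]:
    """Variant 2: the first ACK wins, NACKs are ignored."""
    return next((r for r in by_arrival(replies) if r.kind == "ACK"), None)


def last_ack_in_window(replies: List[Reply],
                       window_ms: int = 5000) -> Optional[Reply]:
    """Variant 3: the last ACK that arrives within the window wins."""
    acks = [r for r in by_arrival(replies)
            if r.kind == "ACK" and r.arrival_ms <= window_ms]
    return acks[-1] if acks else None


# Hypothetical exchange: a local server answers first with a NACK,
# two other servers answer later with ACKs.
replies = [
    Reply("local-laptop", "NACK", 2),
    Reply("remote-windows", "ACK", 40),
    Reply("third-server", "ACK", 900),
]

for policy in (accept_first, first_ack, last_ack_in_window):
    chosen = policy(replies)
    print(policy.__name__, "->", chosen.server if chosen else None)
```

With the same set of replies, each variant settles on a different server, which is why being the fastest responder alone does not guarantee that a given server wins.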
Solution 2:
Assuming the router is still acting as a DHCP relay and forwarding the request to your original server, the client kept its address simply because the Windows DHCP server told it to go ahead and use the IP. In this instance the DHCPNAK from the new server is irrelevant, as a DHCP client will consider all responses, and since it got an offer from the Windows DHCP box, it's perfectly happy to use it.
PC: Oh hi world, can I use 192.168.1.123?
New-DHCP: I say no.
Old-DHCP: I say yes.
PC: Someone said yes! Sweet, I'll use it!
Solution 3:
If nothing else helps - RTFM (read the fine manual). In this case the first one was the hit.
RFC 2131 outlines DHCP-operations.
Section 1.6 states that DHCP must:
Retain DHCP client configuration across server reboots, and, whenever possible, a DHCP client should be assigned the same configuration parameters despite restarts of the DHCP mechanism,
Now the interesting question is how that design goal is being achieved on a client that has no knowledge of its past. Section 3.2 outlines:
3.2 Client-server interaction - reusing a previously allocated network address
If a client remembers and wishes to reuse a previously allocated network address, a client may choose to omit some of the steps described in the previous section. The timeline diagram in figure 4 shows the timing relationships in a typical client-server interaction for a client reusing a previously allocated network address.
The client broadcasts a DHCPREQUEST message on its local subnet. The message includes the client's network address in the 'requested IP address' option. As the client has not received its network address, it MUST NOT fill in the 'ciaddr' field. BOOTP relay agents pass the message on to DHCP servers not on the same subnet. If the client used a 'client identifier' to obtain its address, the client MUST use the same 'client identifier' in the DHCPREQUEST message.
Servers with knowledge of the client's configuration parameters respond with a DHCPACK message to the client. Servers SHOULD NOT check that the client's network address is already in use; the client may respond to ICMP Echo Request messages at this point.
So a DHCP-server holding an active lease gets precedence by using a shortcut in the protocol.
- Client: DHCPREQUEST (MAC address, broadcast; it is transmitted within the local broadcast domain, here the local VLAN, and via the IP helper to the Windows-DHCP-server)
- Laptop-DHCP-Server: DHCPOFFER
- Windows-DHCP-Server: Hey - I already know you - DHCPACK
- Client: Oh - I got two responses. One that already knows me. Cool I will take that
From then on the Laptop-DHCP-Server is being ignored by the Client.
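To make the shortcut concrete, here is a minimal sketch of the DHCPREQUEST a rebooting client would broadcast according to the RFC text quoted above: 'ciaddr' stays zero, the remembered address goes into the 'requested IP address' option (50), and the client identifier (61) is repeated. The function name, the MAC address, and the transaction ID are made up; the sketch only builds the bytes and does not send anything.

```python
import socket
import struct

DHCP_MAGIC_COOKIE = b"\x63\x82\x53\x63"
ZERO_IP = b"\x00" * 4


def build_reboot_request(mac: str, requested_ip: str, xid: int) -> bytes:
    """Build a DHCPREQUEST for a client reusing a remembered address."""
    chaddr = bytes.fromhex(mac.replace(":", ""))
    # Fixed BOOTP header (236 bytes): op, htype, hlen, hops, xid, secs,
    # flags, ciaddr, yiaddr, siaddr, giaddr, chaddr, sname, file.
    header = struct.pack(
        "!BBBBIHH4s4s4s4s16s64s128s",
        1,        # op: BOOTREQUEST
        1,        # htype: Ethernet
        6,        # hlen: MAC address length
        0,        # hops
        xid,      # transaction ID (made up by the caller)
        0,        # secs
        0x8000,   # flags: broadcast bit set (a choice made for this sketch)
        ZERO_IP,  # ciaddr: MUST NOT be filled in at this stage
        ZERO_IP,  # yiaddr
        ZERO_IP,  # siaddr
        ZERO_IP,  # giaddr: a relay agent would fill this in
        chaddr,   # padded to 16 bytes by struct
        b"",      # sname
        b"",      # file
    )
    options = (
        bytes([53, 1, 3])                                # message type: DHCPREQUEST
        + bytes([50, 4]) + socket.inet_aton(requested_ip)  # requested IP address
        + bytes([61, 7, 1]) + chaddr                     # client identifier: type 1 + MAC
        + bytes([255])                                   # end of options
    )
    return header + DHCP_MAGIC_COOKIE + options


# Hypothetical MAC and transaction ID; the IP is the one from the dialogue above.
packet = build_reboot_request("00:11:22:33:44:55", "192.168.1.123", 0x12345678)
print(len(packet), packet[240:].hex())
```

Note that no server identifier option is included, so every server that sees the broadcast, directly or via the relay, is free to answer. The server that still holds the lease can reply with a DHCPACK right away.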
So the solution in our case will probably be (I will update this when we actually test it):
- Make sure Client is off
- Turn off DHCP-Server on Laptop, fake Client-MAC on Laptop, DHCP-Request
- Release IP
- Regain original IP and MAC, turn on DHCP-Server
- Turn on client and do a PXE-boot...
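The "Release IP" step could in principle also be done by hand-crafting the DHCPRELEASE message instead of going through a full client run. The following is an untested sketch that only builds the bytes: the function name, MAC address, server IP, and transaction ID are made up, and it assumes that the server matches the lease by the client identifier (type 1 + MAC), which may not hold for every setup.

```python
import socket
import struct

DHCP_MAGIC_COOKIE = b"\x63\x82\x53\x63"
ZERO_IP = b"\x00" * 4


def build_release(mac: str, leased_ip: str, server_ip: str, xid: int) -> bytes:
    """Build a DHCPRELEASE on behalf of the client that owns `mac`."""
    chaddr = bytes.fromhex(mac.replace(":", ""))
    header = struct.pack(
        "!BBBBIHH4s4s4s4s16s64s128s",
        1,                            # op: BOOTREQUEST
        1,                            # htype: Ethernet
        6,                            # hlen: MAC address length
        0,                            # hops
        xid,                          # transaction ID (made up by the caller)
        0,                            # secs
        0,                            # flags: a release is sent unicast
        socket.inet_aton(leased_ip),  # ciaddr: the address being released
        ZERO_IP,                      # yiaddr
        ZERO_IP,                      # siaddr
        ZERO_IP,                      # giaddr
        chaddr,                       # the faked client MAC, padded to 16 bytes
        b"",                          # sname
        b"",                          # file
    )
    options = (
        bytes([53, 1, 7])                               # message type: DHCPRELEASE
        + bytes([54, 4]) + socket.inet_aton(server_ip)  # server identifier
        + bytes([61, 7, 1]) + chaddr                    # client identifier (assumption)
        + bytes([255])                                  # end of options
    )
    return header + DHCP_MAGIC_COOKIE + options


# Hypothetical values: client MAC, leased address, Windows DHCP server address.
release = build_release("00:11:22:33:44:55", "192.168.1.123",
                        "192.168.0.10", 0x0BADCAFE)
print(len(release), release[240:].hex())
```

Such a message would be sent as a unicast UDP datagram to port 67 of the server that holds the lease. Whether the Windows server honours it when it arrives from the laptop is something to verify in the actual test.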