Why do partially failed/failing switches fail to pass DHCP?

I've noticed this several times: a switch starts to behave oddly. Usually if the switch doesn't fail outright, what tends to get noticed is that DHCP doesn't work.

We had a Linksys SRW-224P fail today. Systems which were still connected worked properly, until it came time to renew their DHCP lease. Once the lease expired, they stopped working, but up until then we couldn't detect a failure. This includes PoE VoIP phones -- they work fine until their lease is up, at which point they're done.

I've noticed this on the above-mentioned Linksys, three varieties of 3Com, and possibly half a dozen dumb switches.

What is it about DHCP that makes it sensitive to failing switches?


Perhaps you are looking at this from the wrong direction. Instead of asking why DHCP doesn't work, maybe you should be asking why TCP-based communication keeps working on an unreliable network where there is packet loss or corruption.

TCP-based communication is meant to be reliable, and the protocol is designed to retry; other protocols, like UDP, are not reliable. DHCP just happens to use UDP. On a typical network, most of what you see these days is TCP-based. Its resilience may be what is allowing you to keep communicating over failing hardware.
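To put rough numbers on that, here is a minimal sketch of the difference retries make. It is a simplified model, not a measurement: it assumes each frame is dropped independently, the 30% loss rate is made up, and the function names are just for illustration. The four-message count is modelled on DHCP's discover/offer/request/ack handshake.

```python
def delivery_probability(loss_rate: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent sends gets through."""
    return 1.0 - loss_rate ** attempts


def exchange_probability(loss_rate: float, messages: int, attempts: int) -> float:
    """Chance that every message in a multi-message exchange is delivered."""
    return delivery_probability(loss_rate, attempts) ** messages


if __name__ == "__main__":
    loss = 0.30  # assumed: 30% of frames dropped by the failing switch

    # One attempt per message, i.e. nothing retries on loss.
    no_retry = exchange_probability(loss, messages=4, attempts=1)

    # Up to six attempts per message, roughly what a retrying transport buys you.
    with_retry = exchange_probability(loss, messages=4, attempts=6)

    print(f"no retries:            {no_retry:.3f}")
    print(f"six tries per message: {with_retry:.3f}")
```

Under these assumptions the exchange with no retries fails most of the time, while the retrying one almost always completes, which would match TCP traffic appearing healthy on the same switch.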


I would say that DHCP is a little more complex than, say, HTTP requests, but I'd hesitate to claim that checking whether DHCP requests succeed is a way to find out if your switch is breaking. I'm pretty sure that if you looked at your actual VoIP traffic, you'd notice packet loss. VoIP (UDP) packet loss might be noticeable in a call, depending on the percentage being dropped. If a few packets get dropped it's no big deal, but with DHCP requests every packet matters, and since UDP doesn't retransmit them, a single loss can break the request.
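A quick simulation shows why the same loss rate can be barely audible on a call yet still break DHCP. This is only a sketch under assumed conditions: independent random loss, a hypothetical `simulate` helper, and a DHCP exchange treated as four messages that must all arrive on the first try.

```python
import random


def simulate(loss_rate: float, voice_packets: int, dhcp_exchanges: int, seed: int = 1):
    """Return (fraction of voice packets delivered, fraction of DHCP exchanges completed)."""
    rng = random.Random(seed)

    def delivered() -> bool:
        # Each packet is dropped independently with probability loss_rate.
        return rng.random() >= loss_rate

    # A voice stream degrades gracefully: just count the packets that arrive.
    voice_ok = sum(delivered() for _ in range(voice_packets))

    # A DHCP exchange is all-or-nothing here: all four messages must arrive.
    dhcp_ok = sum(
        all(delivered() for _ in range(4)) for _ in range(dhcp_exchanges)
    )
    return voice_ok / voice_packets, dhcp_ok / dhcp_exchanges


if __name__ == "__main__":
    voice, dhcp = simulate(loss_rate=0.05, voice_packets=10_000, dhcp_exchanges=10_000)
    print(f"voice packets delivered:  {voice:.1%}")
    print(f"DHCP exchanges completed: {dhcp:.1%}")
```

The voice stream loses about the same share of packets as the link drops, while the share of failed exchanges is several times larger, because one lost message sinks the whole exchange.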

Might come in handy to understand all of the things DHCP does for you (via Cisco).