How does a NIC send a hardware interrupt? [closed]
Some context.
A few weeks ago, a NIC on one of our boxes was replaced without much troubleshooting and without a clear diagnosis of the underlying problem. A senior administrator got into a small tiff with an entry-level admin about hardware interrupts and Ethernet cards, specifically how they work. The entry-level admin gave a vague answer, insisted he was correct, and the matter was closed without a real conclusion.
I know, in theory, how a hardware interrupt works, but how does it specifically function when a NIC receives packets of information? What is happening on a hardware level? How would one properly diagnose whether or not physical damage has occurred in order to avoid what essentially amounted to throwing parts at a problem?
Solution 1:
I know, in theory, how a hardware interrupt works, but how does it specifically function when a NIC receives packets of information? What is happening on a hardware level?
When the NIC receives information, it checks whether the conditions for triggering a hardware interrupt are met. This is typically done in firmware or hardware logic on the NIC controller. If, for example, a receive interrupt has already been sent but not yet acknowledged, there's no reason to send another.
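The "don't send another interrupt while one is still unacknowledged" rule can be sketched as a toy model. This is only an illustration of the decision described above, not how any particular NIC implements it; `ToyNic`, `receive`, and `acknowledge` are made-up names.

```python
class ToyNic:
    """Toy model of a NIC's receive-interrupt decision (hypothetical, not real firmware)."""

    def __init__(self):
        self.rx_queue = []        # packets waiting for the driver to collect them
        self.irq_pending = False  # True while an interrupt is sent but not acknowledged
        self.irqs_sent = 0        # how many interrupts were actually raised

    def receive(self, packet):
        """Device side: store the packet, then decide whether to interrupt."""
        self.rx_queue.append(packet)
        if not self.irq_pending:
            # No interrupt outstanding, so raise one.
            self.irq_pending = True
            self.irqs_sent += 1
        # Otherwise the host already knows there is work to do.

    def acknowledge(self):
        """Driver side: drain the queue and acknowledge the interrupt."""
        drained, self.rx_queue = self.rx_queue, []
        self.irq_pending = False
        return drained


nic = ToyNic()
for packet in (b"a", b"b", b"c"):
    nic.receive(packet)
print(nic.irqs_sent)           # 1: three packets, one interrupt
print(len(nic.acknowledge()))  # 3: the driver picks up all of them at once
nic.receive(b"d")
print(nic.irqs_sent)           # 2: a new interrupt after the acknowledgement
```

Three packets arriving back to back cost the host a single interrupt, which is the point of the check.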
If the NIC decides to send an interrupt, the actual mechanism depends on the NIC interface and how it's configured. The old way was to actually change the voltage on a dedicated interrupt line. This would go to the interrupt controller which would typically assert some other line that combined multiple interrupts. The OS would then ask the interrupt controller which interrupts had triggered.
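As a rough sketch of that line-based flow: devices raise their own lines, the controller combines them into one signal to the CPU, and the OS then asks the controller which lines fired. The class and method names here are invented for illustration and do not correspond to any real controller's register interface.

```python
class ToyInterruptController:
    """Toy model of a line-based interrupt controller (hypothetical)."""

    def __init__(self):
        self.pending = 0  # bit n set means interrupt line n has been raised

    def raise_line(self, irq):
        """Device side: a device asserts its dedicated interrupt line."""
        self.pending |= 1 << irq

    @property
    def cpu_line_asserted(self):
        """The single combined line to the CPU is high if anything is pending."""
        return self.pending != 0

    def read_pending(self):
        """OS side: ask the controller which interrupt lines have triggered."""
        return [n for n in range(self.pending.bit_length())
                if (self.pending >> n) & 1]

    def acknowledge(self, irq):
        """OS side: mark one interrupt line as handled."""
        self.pending &= ~(1 << irq)


pic = ToyInterruptController()
pic.raise_line(11)            # say, the NIC
pic.raise_line(4)             # say, some other device
print(pic.cpu_line_asserted)  # True
print(pic.read_pending())     # [4, 11]
for irq in pic.read_pending():
    pic.acknowledge(irq)
print(pic.cpu_line_asserted)  # False
```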
The newer way is "message signaled interrupts" (MSI), where the NIC writes a particular data word to a particular address that the OS has configured. That write causes an interrupt to be generated by some other piece of hardware, usually the bus controller. This allows a device to have more distinct interrupts and also to target interrupts at specific processors.
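The "write a word to an address" idea can be sketched in a few lines. The layout below is loosely modeled on the x86 convention (a window at `0xFEE00000`, with the target CPU encoded in the address and the vector in the data) but heavily simplified; `ToyPlatform`, `msi_address`, and `memory_write` are hypothetical names, and real hardware has more fields than this.

```python
MSI_BASE = 0xFEE00000  # assumed MSI address window, simplified from x86


def msi_address(dest_cpu):
    """Build the address a device would be configured to write to (simplified)."""
    return MSI_BASE | ((dest_cpu & 0xFF) << 12)


class ToyPlatform:
    """Toy model of the hardware that turns MSI writes into interrupts."""

    def __init__(self):
        self.delivered = []  # (cpu, vector) pairs, in delivery order

    def memory_write(self, address, data):
        """Return True if the write was decoded as an interrupt message."""
        if (address & 0xFFF00000) == MSI_BASE:
            cpu = (address >> 12) & 0xFF  # target processor comes from the address
            vector = data & 0xFF          # interrupt vector comes from the data word
            self.delivered.append((cpu, vector))
            return True
        return False  # an ordinary memory write, not an interrupt


platform = ToyPlatform()
# The OS would program the NIC with this address/data pair ahead of time.
platform.memory_write(msi_address(dest_cpu=2), 0x41)
print(platform.delivered)  # [(2, 65)]
```

Because the target CPU is part of the address and the vector is part of the data, the OS can hand a device several address/data pairs, which is how a device gets multiple interrupts steered to different processors.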
How would one properly diagnose whether or not physical damage has occurred in order to avoid what essentially amounted to throwing parts at a problem?
It's usually difficult to do this and it's unlikely most people would have enough experience to diagnose the problem. Hardware can fail in different ways and it's often difficult to tell where in the chain the failure occurred. It's generally more efficient to just replace the most likely failed part, see if the problem goes away, and then repeat. If there's evidence of a NIC hardware problem, I'd try replacing the NIC first.
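One cheap way to collect the kind of evidence mentioned above, assuming a Linux host, is to read the per-interface error counters the kernel exposes under `/sys/class/net/<iface>/statistics`. The sketch below is only that; the function name, the choice of counters, and the `sysfs_root` parameter are my own, and nonzero counters do not by themselves prove the NIC is at fault, since cables and switch ports produce the same symptoms.

```python
from pathlib import Path

# A few counters that tend to be relevant for physical-layer problems.
COUNTERS = ("rx_errors", "rx_crc_errors", "rx_dropped", "tx_errors")


def read_error_counters(iface, sysfs_root="/sys/class/net"):
    """Return {counter_name: value} for the counters that exist for iface."""
    stats_dir = Path(sysfs_root) / iface / "statistics"
    counters = {}
    for name in COUNTERS:
        path = stats_dir / name
        if path.is_file():  # skip counters this system does not provide
            counters[name] = int(path.read_text().strip())
    return counters
```

Calling `read_error_counters("eth0")` twice a few minutes apart and comparing the results shows whether errors are still accumulating. If they keep climbing after the cable and switch port have been swapped, the NIC becomes the more likely suspect.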