If you turn on spanning tree, how do you know there is an issue with your network?
I have been learning about the Spanning Tree Protocol (STP/RSTP/MSTP) and was wondering: once I turn it on and it is protecting against network loops, for example, how do I know that a loop has occurred?
I suppose in most cases it would be obvious, because the room the loop is in would be down, but what if there is no complaint?
It seems like I would still want a way to know that there is a network issue like this. Maybe the device sends some kind of alert, or maybe someone has to check a log or something else occasionally?
Solution 1:
You watch your switch logs for spanning tree events, or configure your switches to send SNMP traps when STP blocks or disables a port.
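If your switches forward syslog to a central server, a small script can pick STP events out of the stream and hand them to whatever alerting you already use. This is a minimal sketch under stated assumptions: the `%SPANTREE-...` patterns are modelled on Cisco IOS-style messages, while `scan_syslog`, `StpEvent` and the sample lines are hypothetical. Verify the message formats against logs from your own devices.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns modelled on Cisco IOS-style syslog mnemonics.
# Message formats differ by vendor and OS version: check real logs from
# your own switches and adjust these before relying on them.
STP_PATTERNS = {
    "bpdu_guard": re.compile(r"%SPANTREE-\d-BLOCK_BPDUGUARD"),
    "root_guard": re.compile(r"%SPANTREE-\d-ROOTGUARD_BLOCK"),
    "other_stp": re.compile(r"%SPANTREE-"),  # catch-all; must stay last
}


@dataclass
class StpEvent:
    kind: str
    line: str


def scan_syslog(lines):
    """Return an StpEvent for each log line that matches an STP pattern."""
    events = []
    for line in lines:
        for kind, pattern in STP_PATTERNS.items():
            if pattern.search(line):
                events.append(StpEvent(kind, line.strip()))
                break  # first (most specific) match wins
    return events


# Hypothetical sample log lines.
sample = [
    "Mar  1 10:00:01: %LINK-3-UPDOWN: Interface Gi0/5, changed state to up",
    "Mar  1 10:00:03: %SPANTREE-2-BLOCK_BPDUGUARD: Received BPDU on port Gi0/5 "
    "with BPDU Guard enabled. Disabling port.",
]
for event in scan_syslog(sample):
    print(event.kind, "->", event.line)
```

In practice you would feed it lines from your syslog server and raise an email or chat alert for each event; SNMP traps give similar information without any text parsing.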
Solution 2:
Testing. If you want to know that something is working, you test it.
Once you've enabled STP, schedule a maintenance window and deliberately connect a cable so that it creates a loop. If the network keeps working, STP detected the loop and blocked one of the ports. If your network goes down, STP isn't working.
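To make "the loop was detected" concrete, here is a simplified model of how classic 802.1D STP elects a root bridge, assigns root and designated ports, and blocks whichever port would close a loop. It is a sketch, not a protocol implementation: it ignores BPDUs, timers, RSTP alternate/backup roles and a cable looped back into the same switch, and the topology, names and costs are hypothetical.

```python
import heapq

# Hypothetical lab topology: bridge name -> (priority, MAC). Lower tuple wins.
BRIDGES = {
    "core1": (4096, "00:00:00:00:00:01"),
    "access1": (32768, "00:00:00:00:00:0a"),
    "access2": (32768, "00:00:00:00:00:0b"),
}
# Links as (bridge_a, bridge_b, path_cost), each between two different bridges.
# The last entry is the "test cable" that closes a loop.
LINKS = [
    ("core1", "access1", 4),
    ("core1", "access2", 4),
    ("access1", "access2", 4),
]


def root_path_costs(bridges, links, root):
    """Dijkstra from the root bridge over positive link costs."""
    adj = {b: [] for b in bridges}
    for a, b, cost in links:
        adj[a].append((b, cost))
        adj[b].append((a, cost))
    dist = {root: 0}
    heap = [(0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue
        for nbr, cost in adj[node]:
            if d + cost < dist.get(nbr, float("inf")):
                dist[nbr] = d + cost
                heapq.heappush(heap, (d + cost, nbr))
    return dist


def compute_port_states(bridges, links):
    root = min(bridges, key=lambda b: bridges[b])  # lowest bridge ID wins
    dist = root_path_costs(bridges, links, root)
    # Root port: cheapest path to the root; ties broken by neighbour bridge ID,
    # then by link index (a stand-in for port ID).
    root_port = {}
    for b in bridges:
        if b == root:
            continue
        candidates = []
        for i, (x, y, cost) in enumerate(links):
            if b in (x, y):
                nbr = y if x == b else x
                candidates.append((dist[nbr] + cost, bridges[nbr], i))
        root_port[b] = min(candidates)[2]
    states = {}
    for i, (x, y, _cost) in enumerate(links):
        # Designated end: the bridge closer to the root (ties: lower bridge ID).
        designated = min((x, y), key=lambda b: (dist[b], bridges[b]))
        for b in (x, y):
            if b == designated:
                states[(b, i)] = "designated/forwarding"
            elif root_port.get(b) == i:
                states[(b, i)] = "root/forwarding"
            else:
                states[(b, i)] = "blocking"
    return root, states


root, states = compute_port_states(BRIDGES, LINKS)
print("root:", root)
for (bridge, link), state in sorted(states.items()):
    print(bridge, "link", link, state)
```

Removing the last entry from `LINKS` (the test cable) leaves a tree with no blocking ports, which is the state you should be back in after the test.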
Solution 3:
Spanning tree does not treat a loop as an "error". Loops are expected by the protocol: it finds the ports that would cause loops and disables forwarding on them. I think you're trying to use a protocol to find out whether a certain condition exists, but that's not really its primary purpose. A "well-designed" network may very well have loops in its physical topology (for redundancy).
In addition to turning on logging event spanning-tree status (or the equivalent on your platform), think outside the box. A loop in your network that spanning tree has not blocked will cause very high traffic levels in a broadcast storm. So graph those levels in your monitoring platform; if you see a sharp rise in traffic, you've probably got a loop.
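A minimal sketch of the "graph it and watch for a sharp rise" idea, assuming you already poll an interface counter (for example IF-MIB's `ifHCInBroadcastPkts`) at regular intervals. The function names, `window`, `factor` and `floor` values are hypothetical starting points to tune, and counter wraps and device reboots are not handled.

```python
from statistics import median


def rates(samples):
    """Convert (timestamp_s, counter) samples into (timestamp_s, per-second rate)."""
    return [
        (t1, (c1 - c0) / (t1 - t0))
        for (t0, c0), (t1, c1) in zip(samples, samples[1:])
    ]


def find_spikes(samples, window=5, factor=10.0, floor=100.0):
    """Flag rates above `factor` times the median of the previous `window` rates.

    `floor` avoids alerting on links that are normally almost idle.
    """
    series = rates(samples)
    spikes = []
    for i in range(window, len(series)):
        baseline = median(r for _, r in series[i - window:i])
        t, r = series[i]
        if r > max(baseline * factor, floor):
            spikes.append((t, r))
    return spikes


# Hypothetical polls of a broadcast-packet counter every 60 s; the last poll
# shows the kind of jump an unblocked loop would produce.
samples = [(0, 0), (60, 3000), (120, 6100), (180, 9000), (240, 12000),
           (300, 15100), (360, 18000), (420, 3_018_000)]
print(find_spikes(samples))
```

Comparing against a rolling median rather than a fixed threshold keeps the check usable on both quiet and busy links.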
Solution 4:
Here are some extra things to consider in your STP/RSTP/MSTP implementation along with your testing:
- Set your switch priorities to ensure that a predetermined switch is elected as the root and a secondary is designated to take over as root if the primary fails. This is the most common mistake I see in spanning tree implementations.
- Any port where you have a permanently connected device (e.g. a server, printer, NAS) should be configured with PortFast (Cisco terminology; HP ProCurve calls it an edge port) so that the device doesn't have a long wait for STP convergence when it boots up.
- Any port where you connect to an edge device (including PCs, printers, servers, etc.) should have root guard enabled. This stops a misconfigured or unauthorised switch connected to that port from becoming the root bridge and triggering an unexpected reconvergence.
- Any port that is not a switch-to-switch link under your control (including ports for PCs, printers, and service provider routers) should have BPDU guard enabled, preferably set to disable the port when an STP BPDU is received. This way you find out immediately when people start doing the wrong things on your edge ports.
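The checklist above can be turned into a periodic audit. This sketch assumes you can export each switch's bridge priority and per-port settings into a simple inventory; the `Port`/`Switch` classes, the `audit` function and the sample data are hypothetical. It also simplifies by treating every port that is not a switch-to-switch link as an edge port that should have PortFast, BPDU guard and root guard.

```python
from dataclasses import dataclass, field


@dataclass
class Port:
    name: str
    switch_link: bool        # switch-to-switch link under your control?
    portfast: bool = False   # "edge port" in HP ProCurve terms
    bpdu_guard: bool = False
    root_guard: bool = False


@dataclass
class Switch:
    name: str
    priority: int = 32768    # common default bridge priority
    ports: list = field(default_factory=list)


def audit(switches, primary, secondary):
    """Return human-readable problems found in a (hypothetical) STP inventory."""
    problems = []
    prio = {s.name: s.priority for s in switches}
    for s in switches:
        if s.priority % 4096:
            problems.append(f"{s.name}: priority {s.priority} is not a multiple of 4096")
    others = [p for name, p in prio.items() if name not in (primary, secondary)]
    if prio[primary] >= prio[secondary]:
        problems.append(f"{primary} should have a lower priority than {secondary}")
    if others and prio[secondary] >= min(others):
        problems.append(f"{secondary} should have a lower priority than every non-root switch")
    # Simplification: every non-switch-link port is treated as an edge port.
    for s in switches:
        for port in s.ports:
            if port.switch_link:
                continue
            for attr in ("portfast", "bpdu_guard", "root_guard"):
                if not getattr(port, attr):
                    problems.append(f"{s.name} {port.name}: {attr} not enabled")
    return problems


# Hypothetical inventory.
switches = [
    Switch("core1", 4096, [Port("Gi1/0/1", switch_link=True)]),
    Switch("core2", 8192, [Port("Gi1/0/1", switch_link=True)]),
    Switch("access1", ports=[
        Port("Gi0/1", switch_link=True),
        Port("Gi0/5", switch_link=False, portfast=True, bpdu_guard=True),
    ]),
]
for problem in audit(switches, primary="core1", secondary="core2"):
    print(problem)
```

Requiring strictly ordered priorities avoids relying on the MAC-address tie-break to decide which switch becomes root.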