A couple of days ago, a discussion erupted on Twitter regarding the explanation and use cases for two of Cisco’s layer 2 edge protection technologies: BPDUGuard and BPDUFilter. There were some interesting explanations and scenarios offered up, and I thought I’d give my take on it here as it will take a few more than 140 characters to lay out.
For those of you not familiar with BPDUs and why we need to guard and filter them, here’s the dime store tour of bridging 101. The bridge is the most basic layer 2 device you can imagine. It is designed to connect one network to another network. The original bridge was designed by Radia Perlman while working at Digital Equipment Corp. It was originally put in place to connect one of their customer’s LANs to another. The story of the first bridge is outlined here: http://youtu.be/N-25NoCOnP4 and is highly recommended viewing for those not familiar with the origins of switching technology.
Radia was tasked with designing a method for bridges to detect loops in the network and avoid them at all costs, as a bridging loop would be catastrophic for data transmission. She succeeded by creating a protocol that essentially forms the network into a tree of nodes and prunes the redundant links that would loop back toward the root node. In order to form this tree, each bridge sends out a special data packet called a Bridge Protocol Data Unit (BPDU). This packet contains the information necessary for each bridge to build a path back to the root node/bridge and form a loop-free path through the network. If you’re interested in the exhausting detail of how BPDUs and spanning tree protocol (STP) work at a fundamental layer, check out the Wikipedia article on spanning tree protocol.
You might say, “That’s great, but what does that have to do with my switches?” Well, if a bridge connects two networks together and segments their collision domains, think of a switch as the ultimate extension of that paradigm. A switch is a device that bridges networks on every port, segmenting each port into its own collision domain. Only, in today’s networks we don’t use hubs behind the bridge to connect end user devices; we use the switch itself as the user/system connection device.
So, now we know why switches send out BPDUs. And now we hopefully know that BPDUs are critical to the proper operation of a network. So why on earth would we want to block or filter them? Well, firstly, any time a BPDU from a new switch is seen on the network, the switch that sees it must send BPDUs toward the current root with the Topology Change Notification (TCN) flag enabled. When the root bridge receives these BPDUs, it sets the TCN flag on its own BPDUs and forces the remaining nodes in the spanning tree to age out their topology tables. The switches must then recalculate the spanning tree topology to ensure that the new switch has a path to the current root bridge or that the new switch IS the new root bridge. This recalculation can stop traffic on your network until it completes, which takes 50 seconds by default.
So, when might the new switch become the new root bridge and cause chaos and despair in your network? By default, bridges use a value known as the Bridge Priority to determine which one is the root. The bridge with the lowest priority value is elected as the root bridge for that particular spanning tree instance. Out of the box, the Bridge Priority value for most switches running regular 802.1d spanning tree is 32768. So, assuming that all the bridge priorities in the network are the same, how do we break the tie? The tie breaker is the MAC address of the bridge: the device with the lowest MAC address is elected the root bridge. And, in almost every case, the device with the lowest MAC address is the oldest bridge in your network. So, if you pull an old switch out of the storage closet and plug it into the network, you’re going to cause a spanning tree election, and if you haven’t modified the Bridge Priority on your switches, that old switch just might be elected the root bridge. Which would cause your network to stop forwarding traffic for 50 critical seconds. Those 50 seconds feel like an eternity to your users.
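To make the election rule concrete, here is a minimal Python sketch of the comparison described above: lowest Bridge Priority first, lowest MAC address as the tie breaker. It is only an illustration of the rule, not how any switch implements it. The switch names and MAC addresses are made up for the example, and the value 4096 is just an arbitrary priority lower than the default.

```python
from typing import List, NamedTuple, Tuple

class Bridge(NamedTuple):
    name: str       # hypothetical label, for illustration only
    priority: int   # Bridge Priority; 32768 is the usual default
    mac: str        # bridge MAC address written as colon-separated hex

def bridge_id(bridge: Bridge) -> Tuple[int, int]:
    """Return a sortable (priority, MAC-as-integer) pair."""
    return (bridge.priority, int(bridge.mac.replace(":", ""), 16))

def elect_root(bridges: List[Bridge]) -> Bridge:
    """Lowest priority wins; the lowest MAC address breaks ties."""
    return min(bridges, key=bridge_id)

# Every switch is still on the default priority, so the MAC decides.
bridges = [
    Bridge("core", 32768, "00:1b:54:aa:10:00"),
    Bridge("access", 32768, "00:1d:a1:22:33:44"),
    Bridge("closet-relic", 32768, "00:01:42:0c:5e:80"),
]
print(elect_root(bridges).name)  # closet-relic

# Lowering the priority on the intended root settles it before MACs matter.
tuned = [b._replace(priority=4096) if b.name == "core" else b for b in bridges]
print(elect_root(tuned).name)    # core
```

The second call shows why setting the priority by hand matters: once one switch has a lower priority than everyone else, an old switch with a low MAC address can no longer win the tie breaker.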
A word to the wise: ALWAYS set the bridge priority on the switch you want to be the root bridge. Trust me, it’ll save you hours of pain in the future.
Users dislike a non-responsive network. Immensely. And under the default circumstances, when a user plugs a device into the switch, the switch does its job to determine if this device is sending BPDUs. Which means the port has to go through the 50-second spanning tree process. In most cases, this is not only unnecessary but annoying for the end user. They don’t really care why what the switch is doing is so critical. They just want to check their email. How do we resolve this without breaking spanning tree? Cisco decided to fix this with Portfast. Portfast is a spanning tree enhancement that allows a network admin to say “This port is only ever going to have a PC plugged into it. Please ignore the normal spanning tree process.” What happens is that the port is immediately placed into the forwarding state, bypassing the listening and learning phases. Spanning tree is not disabled on the port, but we also don’t take the time to listen for BPDUs or learn the information they contain. This works great for end user nodes. They get to check e-mail right away and you don’t get calls about the “slow network”. And this works 90% of the time. The other 10% is the stuff nightmares are made of.
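For anyone wondering where the 50 seconds comes from, it is the sum of the default 802.1D timers: a max age of 20 seconds plus two forward delay intervals of 15 seconds each, one for listening and one for learning. The short Python sketch below just does that arithmetic. The function name and its flag are my own, for illustration. Note that a port that has only just come up typically does not sit through max age first, so under the same defaults it waits closer to 30 seconds; either way, Portfast skips the wait entirely.

```python
# Default 802.1D timers, in seconds.
MAX_AGE = 20        # how long stored BPDU information is kept before it expires
FORWARD_DELAY = 15  # time spent in listening, and again in learning

def seconds_until_forwarding(wait_for_max_age: bool, portfast: bool = False) -> int:
    """Rough time before a port forwards traffic under the default timers."""
    if portfast:
        return 0  # Portfast goes straight to forwarding
    blocking = MAX_AGE if wait_for_max_age else 0
    listening = FORWARD_DELAY
    learning = FORWARD_DELAY
    return blocking + listening + learning

print(seconds_until_forwarding(wait_for_max_age=True))    # 50
print(seconds_until_forwarding(wait_for_max_age=False))   # 30
print(seconds_until_forwarding(False, portfast=True))     # 0
```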
Gertrude has one network port in her office. She has a computer. She bought a network printer and a new laptop. She wants all three of these devices plugged into the network at the same time. She buys a switch from a big box store so she can plug all these things in at once, not wanting to bother the IT department since they’ll either say ‘no’ or take a month to run a new cable to her office. In her haste to get everything plugged in, she accidentally plugs one end of a network cable into the switch, and the other end into another port on the same switch. Then, she plugs her switch into the port on the wall. And, if this port is Portfast-enabled, you’ve got yourself a Category 5 broadcast storm. If you’ve never lived through one of these, count yourself fortunate. Watching a spanning tree loop propagate through a network is like watching a firestorm. Switch CPUs spike to 100% load trying to process all the BPDUs flooding the network. Users find themselves unable to get to the network, or in VoIP networks find themselves unable to use their phones. Servers start going haywire and see themselves fighting for static IP addresses with…themselves. And the only way for the IT department to fix the problem in most cases is to start unplugging switches until the culprit is found. And heaven help Gertrude when they find her switch…
How could something like this happen? Because Portfast assumes that the designated port is never going to have a switch connected to it, so it never bothers to listen for the BPDUs that would be a tell-tale sign of a loop. It never blocks the port initially while waiting for more information. The Portfast switch gleefully starts forwarding packets and counting down toward meltdown. Portfast assumes that nothing bad could come from that port. Anyone that works in IT knows that assumption is the mother of all frack-ups. So, Cisco gave us two features to combat frack-ups: BPDUGuard and BPDUFilter.
BPDUGuard is a Portfast enhancement that functions as a fail-closed gatekeeper for the port. As soon as a BPDU is detected on the port, BPDUGuard slams it shut and places the port into ‘err-disable’ mode. Unless a recovery mode is configured (it isn’t by default), that port stays shut down until the admins recover it. In the above example, Gertrude plugs her switch in, and the uplink switch detects a BPDU on a BPDUGuard-enabled port. The port gets disabled, and Gertrude can’t get on the network. She calls into the IT helpdesk with her problem. The IT staff notice the port is err-disabled and investigate. They go out to her office and find the switch before they re-enable the port. After a stern talking-to, the network is saved and Gertrude gets her additional cable sometime next month. BPDUGuard is the most-configured protection mechanism for this kind of issue. Most IT admins want the port to shut off before the damage is done. The problem with BPDUGuard is that if you aren’t the network admin, or if you aren’t in a position to turn the port back on quickly, the user will experience an outage until the port is recovered. If you’re a network admin that uses Portfast, you should turn on BPDUGuard. Don’t ask, just turn it on and save yourself even more hours of pain in the future.
BPDUFilter is a Portfast enhancement that functions as the fail-open moderator for the port. Firstly, it prevents a switch from transmitting BPDUs on a Portfast-enabled port (without BPDUFilter, the switch still transmits BPDUs on Portfast ports). If a BPDU is detected on the Portfast-enabled port, the Portfast state is removed and the port is forced to transition through the normal states of blocking, listening, and learning before it can begin forwarding. In the above example, when Gertrude plugs her switch in, the uplink switch will detect the BPDU and force the port to transition through the regular spanning tree process. It should also detect the loop and disable the highest-numbered port on the switch to break the loop. Gertrude will have to wait an additional minute before her port is up completely, but it will start forwarding. The IT admins may never know what happened unless they notice Gertrude’s port is no longer in Portfast mode, or that a new switch is transmitting BPDUs from her switch port. So why in the world would you use BPDUFilter? In my experience, it is used when you are not the network admin for a given network and have no easy way to re-enable those ports that would be disabled by BPDUGuard. Or, if the network policy for the particular network states that ports should begin forwarding immediately but that users should be able to connect devices without the port becoming disabled. For the record, if you ever find a network policy that looks like this, send it to me. I’d really like to know who came up with it. BPDUFilter is rarely used in my experience as a Portfast protection mechanism.
So, as these things usually happen, the question was asked during our discussion: “What happens if you enable both BPDUGuard and BPDUFilter at the same time?” Well, I found a great blog post on the subject here: http://bit.ly/cKpBTd. Essentially, if you enable BPDUFilter globally and enable BPDUGuard on a particular interface, the interface-specific configuration takes precedence and shuts the port down before BPDUFilter can transition the port back to normal. However, if you enable both BPDUFilter and BPDUGuard using the interface-specific commands, BPDUFilter will catch the BPDU first, so BPDUGuard never gets the chance to shut the port down. So, each will perform its function while locking out the other. The question becomes where each is configured (globally vs. interface-specific). For those of you who might be in the unfortunate position to still be running CatOS, the only way to enable BPDUFilter is globally. In this specific case, BPDUGuard will always win and the ports will be disabled. You would only use BPDUFilter in this case to prevent ports from transmitting BPDUs.
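If it helps to see the precedence laid out, here is a small Python sketch of the rules as I’ve described them in this post. It is a model of the description above, not of the actual switch code, and the function and argument names are made up for the example. One combination (interface-level BPDUFilter with globally enabled BPDUGuard) isn’t covered above; treating it like the other interface-level filter case is my assumption and is flagged in the comments.

```python
from typing import Optional

VALID_SCOPES = (None, "global", "interface")

def winner(bpdufilter: Optional[str], bpduguard: Optional[str]) -> Optional[str]:
    """Name the feature that reacts when a BPDU arrives on a Portfast port.

    Each argument is None (not configured), "global" or "interface".
    """
    for scope in (bpdufilter, bpduguard):
        if scope not in VALID_SCOPES:
            raise ValueError(f"unknown scope: {scope!r}")
    if bpdufilter is None and bpduguard is None:
        return None              # plain Portfast, nothing reacts
    if bpduguard is None:
        return "bpdufilter"
    if bpdufilter is None:
        return "bpduguard"
    # Both are configured. An interface-level filter catches the BPDU first.
    # ASSUMPTION: filter="interface" with guard="global" is not covered in
    # the post; it is treated the same way here.
    if bpdufilter == "interface":
        return "bpdufilter"
    # A globally enabled filter loses to BPDUGuard (also the CatOS case).
    return "bpduguard"

print(winner("global", "interface"))     # bpduguard
print(winner("interface", "interface"))  # bpdufilter
print(winner("global", "global"))        # bpduguard
```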
Since best practice guidelines tell us that switch-to-switch connections should be trunk links, you should enable Portfast on all your user-facing ports to cut down on delay and trouble tickets. But, if you have Portfast enabled, you better make sure to have BPDUGuard enabled at a minimum. It will save your bacon one day. The case for BPDUFilter is less compelling to me. If you are in one of the few scenarios where BPDUFilter makes more sense than BPDUGuard, by all means use it. It’s better than a poke in the eye with a sharp stick. Personally, I’ve used BPDUFilter once or twice with mixed results. My network started behaving quite strangely and some poorly-configured switches hanging off unidentified ports stopped responding until I removed the BPDUFilter configuration. So I mainly stick to BPDUGuard now. Better to have to re-enable a port after a user plugged in something they weren’t supposed to than to have to frantically unplug connections in the core in a vain effort to stem the raging broadcast storm.
Be sure to check out my additional testing and findings over here.