A couple of days ago, a discussion erupted on Twitter regarding the explanation and use cases for two of Cisco’s layer 2 edge protection technologies: BPDUGuard and BPDUFilter. There were some interesting explanations and scenarios offered up, and I thought I’d give my take on it here as it will take a few more than 140 characters to lay out.
For those of you not familiar with BPDUs and why we need to guard and filter them, here’s the dime store tour of bridging 101. The bridge is the most basic layer 2 device you can imagine. It is designed to connect one network to another network. The original bridge was developed at Digital Equipment Corp., where it was first put in place to connect one of their customer’s LANs to another, and Radia Perlman, working at DEC, designed the loop-avoidance algorithm it relied on. The story of the first bridge is outlined here: http://youtu.be/N-25NoCOnP4 and is highly recommended viewing for those not familiar with the origins of switching technology. Radia was tasked with designing a method for bridges to detect loops in the network and avoid them at all costs, as a bridging loop would be catastrophic for data transmission. She succeeded by creating a protocol that essentially forms the network into a tree rooted at a single root bridge, pruning redundant links so that each bridge has exactly one active path back to the root. In order to form this tree, each bridge sends out a special frame called a Bridge Protocol Data Unit (BPDU). This frame contains the information each bridge needs to find its path back to the root bridge and build a loop-free topology through the network. If you’re interested in the exhausting detail of how BPDUs and spanning tree protocol (STP) work at a fundamental level, check out the Wikipedia link.
You might say, “That’s great, but what does that have to do with my switches?” Well, if a bridge connects two networks together and segments their collision domains, think of a switch as the ultimate extension of that paradigm. A switch is a device that bridges networks on every port, segmenting each port into its own collision domain. Only, in today’s networks we don’t use hubs behind the bridge to connect end user devices; we use the switch as the user/system connection device.
So, now we know why switches send out BPDUs. And now we hopefully know that BPDUs are very critical to the proper operation of a network. So why on earth would we want to block or filter them? Well, firstly, any time a switch detects a topology change, such as a new switch bringing a port up, it sends a Topology Change Notification (TCN) BPDU toward the current root. When the root bridge receives it, it sets the Topology Change flag on its own BPDUs, which forces the remaining nodes in the spanning tree to age out their MAC address tables quickly. The switches may also need to recalculate the spanning tree topology to ensure that the new switch has a path to the current root bridge or, worse, that the new switch IS the new root bridge. That recalculation can stop traffic on your network for up to 50 seconds with the default 802.1D timers (a 20-second max age plus two 15-second forward delay intervals).
So, when might the new switch become the new root bridge and cause chaos and despair in your network? By default, bridges use a value known as the Bridge Priority to determine which one is the root. A bridge with a lower priority is elected as the root bridge for that particular spanning tree instance. Out of the box, the Bridge Priority value for most switches running regular 802.1d spanning tree is 32768. So, assuming that all the bridge priorities in the network are the same, how do we break the tie? The tie breaker is the MAC address of the bridge: the device with the lowest MAC address is elected the root bridge. And, very often, the device with the lowest MAC address is the oldest bridge in your network. So, if you pull an old switch out of the storage closet and plug it into the network, you’re going to cause a spanning tree election, and if you haven’t modified the Bridge Priority on your switches, that old switch just might be elected the root bridge. Which would cause your network to stop forwarding traffic for up to 50 critical seconds. Those 50 seconds feel like an eternity to your users.
A word to the wise: ALWAYS set the bridge priority on the switch you want to be the root bridge. Trust me, it’ll save you hours of pain in the future.
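The election rule above fits in a few lines. This Python sketch is a simplified model, not how any real switch implements it: it treats the bridge ID as the pair (priority, MAC) compared in that order, lowest wins, and it ignores the per-VLAN extended system ID that newer Cisco switches fold into the priority. The switch names and MAC addresses are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bridge:
    name: str
    priority: int  # bridge priority; 32768 is the usual out-of-the-box value
    mac: str       # base MAC address, e.g. "00:1e:bd:aa:00:01"

    def bridge_id(self) -> tuple:
        # Priority is compared first; the MAC address only breaks ties.
        return (self.priority, int(self.mac.replace(":", ""), 16))


def elect_root(bridges):
    """Return the bridge with the lowest (best) bridge ID."""
    return min(bridges, key=lambda b: b.bridge_id())


# Hypothetical switches; the closet relic has the oldest (lowest) MAC address.
core = Bridge("core", 32768, "00:1e:bd:aa:00:01")
access = Bridge("access", 32768, "00:22:55:bb:00:02")
relic = Bridge("closet-relic", 32768, "00:00:0c:01:02:03")

print(elect_root([core, access, relic]).name)  # closet-relic

# Setting a lower priority on the switch you want as root settles the election.
core_tuned = Bridge("core", 4096, core.mac)
print(elect_root([core_tuned, access, relic]).name)  # core
```

With equal priorities the relic wins on MAC address alone; once the core’s priority is lowered, the MAC address never gets consulted.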
Users dislike a non-responsive network. Immensely. And under the default circumstances, when a user plugs a device into the switch, the switch does its job to determine if this device is sending BPDUs. Which means the port has to go through the spanning tree listening and learning states before it forwards a single frame: roughly 30 seconds with the default timers (15 seconds for each state). In most cases, this is not only unnecessary but annoying for the end user. They don’t really care why the switch’s work is so critical. They just want to check their email. How do we resolve this without breaking spanning tree? Cisco decided to fix this with Portfast. Portfast is a spanning tree enhancement that allows a network admin to say “This port is only going to ever have a PC plugged into it. Please ignore the normal spanning tree process.” What happens is that the port is immediately placed into the forwarding state, bypassing the listening and learning phases. Spanning tree is not disabled on the port, but we also don’t wait to listen for BPDUs or learn the information they contain before forwarding. This works great for end user nodes. They get to check e-mail right away and you don’t get calls about the “slow network”. And this works 90% of the time. The other 10% is the stuff nightmares are made of.
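The delay Portfast removes is simple arithmetic on the default 802.1D timers (15-second forward delay, 20-second max age). This sketch is an approximation that ignores link negotiation and anything the attached device itself needs; the function and constant names are made up for illustration.

```python
# Default 802.1D timers, in seconds.
FORWARD_DELAY = 15
MAX_AGE = 20


def seconds_until_forwarding(portfast: bool) -> int:
    """Approximate delay from link-up to forwarding on an access port."""
    if portfast:
        return 0  # Portfast puts the port straight into forwarding
    return 2 * FORWARD_DELAY  # listening (15 s) + learning (15 s)


# Worst case after an indirect failure: stored BPDU info must age out first.
WORST_CASE_RECONVERGENCE = MAX_AGE + 2 * FORWARD_DELAY

print(seconds_until_forwarding(portfast=False))  # 30
print(seconds_until_forwarding(portfast=True))   # 0
print(WORST_CASE_RECONVERGENCE)                  # 50
```

The 50-second figure is the worst case for reconvergence; a freshly connected access port normally waits only the 30 seconds of listening and learning.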
Gertrude has one network port in her office. She has a computer. She bought a network printer and a new laptop. She wants all three of these devices plugged into the network at the same time. So she buys a switch from a big box store, not wanting to bother the IT department since they’ll either say ‘no’ or take a month to run a new cable to her office. In her haste to get everything plugged in, she accidentally plugs one end of a network cable into the switch and the other end into another port on the same switch. Then she plugs her switch into the port on the wall. And, if this port is Portfast-enabled, you’ve got yourself a Category 5 broadcast storm. If you’ve never lived through one of these, count yourself fortunate. Watching a spanning tree loop propagate through a network is like watching a firestorm. Switch CPUs spike to 100% load trying to process all the broadcast frames flooding the network. Users find themselves unable to get to the network, or in VoIP networks find themselves unable to use their phones. Servers start going haywire and seeing themselves fighting for static IP addresses with…themselves. And the only way for the IT department to fix the problem in most cases is to start unplugging switches until the culprit is found. And heaven help Gertrude when they find her switch…
How could something like this happen? Because Portfast assumes that the designated port is never going to have a switch connected to it, so it starts forwarding immediately instead of first listening for the BPDUs that would be a tell-tale sign of a loop. It never blocks the port while waiting for more information. The Portfast switch gleefully starts forwarding packets and counting down to meltdown. Portfast assumes that nothing bad could come from that port. Anyone who works in IT knows that assumption is the mother of all frack-ups. So, Cisco gave us two features to combat frack-ups: BPDUGuard and BPDUFilter.
BPDUGuard is a Portfast enhancement that functions as a fail-closed gatekeeper for the port. As soon as a BPDU is detected on the port, BPDUGuard slams it shut and places the port into ‘err-disable’ mode. Unless a recovery mode is configured (it isn’t by default), that port stays shut down until the admins recover it. In the above example, Gertrude plugs her switch in, and the switch detects a BPDU on a BPDUGuard-enabled port. The port gets disabled, and Gertrude can’t get on the network. She calls the IT helpdesk with her problem. The IT staff notice the port is err-disabled and investigate. They go out to her office and find the switch before they re-enable the port. After a stern talking-to, the network is saved and Gertrude gets her additional cable sometime next month. BPDUGuard is the most commonly configured protection mechanism for this kind of issue. Most IT admins want the port to shut off before the damage is done. The problem with BPDUGuard is that if you aren’t the network admin, or if you aren’t in a position to turn the port back on quickly, the user will experience an outage until the port is recovered. If you’re a network admin who uses Portfast, you should turn on BPDUGuard. Don’t ask, just turn it on and save yourself even more hours of pain in the future.
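BPDUGuard’s fail-closed behavior boils down to a tiny state machine. Here is a toy Python model of it, assuming a Portfast port that starts out forwarding; the class, its methods, and the `recovery_interval` parameter are hypothetical. `recovery_interval` stands in for automatic err-disable recovery, which is off unless you configure it, and the 300 seconds used below is just an example value.

```python
from typing import Optional


class GuardedPort:
    """Toy model of a Portfast access port with BPDUGuard enabled."""

    def __init__(self, recovery_interval: Optional[int] = None):
        # None models the default: no automatic err-disable recovery.
        self.recovery_interval = recovery_interval
        self.state = "forwarding"  # Portfast: forwarding right after link-up
        self.disabled_at: Optional[int] = None

    def receive_bpdu(self, now: int) -> None:
        # Fail closed: any BPDU shuts the port.
        if self.state == "forwarding":
            self.state = "err-disabled"
            self.disabled_at = now

    def tick(self, now: int) -> None:
        # The port re-enables itself only if a recovery interval is configured.
        if (self.state == "err-disabled"
                and self.recovery_interval is not None
                and now - self.disabled_at >= self.recovery_interval):
            self.state = "forwarding"
            self.disabled_at = None

    def admin_reenable(self) -> None:
        # Stands in for an admin bringing the port back by hand.
        self.state = "forwarding"
        self.disabled_at = None


gertrude = GuardedPort()       # default: no recovery configured
gertrude.receive_bpdu(now=10)
gertrude.tick(now=86_400)      # a day later...
print(gertrude.state)          # err-disabled
gertrude.admin_reenable()      # ...after the stern talking-to
print(gertrude.state)          # forwarding

auto = GuardedPort(recovery_interval=300)
auto.receive_bpdu(now=10)
auto.tick(now=200)
print(auto.state)              # err-disabled
auto.tick(now=310)
print(auto.state)              # forwarding
```

Note that the model would err-disable the port again on the next BPDU after recovery; automatic recovery only helps if the offending device has been removed.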
BPDUFilter is a Portfast enhancement that functions as the fail-open moderator for the port. Firstly, it prevents a switch from transmitting BPDUs on a Portfast-enabled port (without BPDUFilter, the switch still transmits BPDUs on Portfast ports). If a BPDU is detected on the Portfast-enabled port, the Portfast state is removed and the port is forced to transition through the normal states of blocking, listening, and learning before it can begin forwarding. In the above example, when Gertrude plugs her switch in, the uplink switch will detect the BPDU and force the port to go through the regular spanning tree process. Spanning tree should also detect the loop and block one of the two looped ports (the one with the higher port ID, which by default is the higher-numbered port) to break it. Gertrude will have to wait an extra 30 seconds or so before her port is up completely, but it will start forwarding. The IT admins may never know what happened unless they notice Gertrude’s port is no longer in Portfast mode, or that a new switch is transmitting BPDUs from her switch port. So why in the world would you use BPDUFilter? In my experience, it is used when you are not the network admin for a given network and have no easy way to re-enable ports that BPDUGuard would disable. Or, if the network policy for the particular network states that ports should begin forwarding immediately but that users should be able to connect devices without the port becoming disabled. For the record, if you ever find a network policy that looks like this, send it to me. I’d really like to know who came up with it. In my experience, BPDUFilter is rarely used as a Portfast protection mechanism.
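To see what “fail open” costs Gertrude, here is a rough timeline sketch of a Portfast port covered by a globally enabled BPDUFilter. It assumes default timers and that the port ends up as a designated (forwarding) port once spanning tree settles, so any time spent in blocking is collapsed; the function name is hypothetical.

```python
from typing import List, Optional, Tuple

FORWARD_DELAY = 15  # default 802.1D forward delay, seconds


def filtered_port_timeline(bpdu_at: Optional[int]) -> List[Tuple[int, str]]:
    """(time, state) changes for a Portfast port under global BPDUFilter.

    bpdu_at is the second a BPDU first arrives, or None if none ever does.
    Assumes the port becomes a designated (forwarding) port afterwards.
    """
    timeline = [(0, "forwarding (Portfast)")]
    if bpdu_at is None:
        return timeline  # an ordinary PC: nothing ever changes
    # Fail open: drop Portfast and walk through the normal STP states.
    timeline.append((bpdu_at, "listening (Portfast removed)"))
    timeline.append((bpdu_at + FORWARD_DELAY, "learning"))
    timeline.append((bpdu_at + 2 * FORWARD_DELAY, "forwarding"))
    return timeline


for t, state in filtered_port_timeline(bpdu_at=5):
    print(t, state)
# 0 forwarding (Portfast)
# 5 listening (Portfast removed)
# 20 learning
# 35 forwarding
```

Unlike BPDUGuard, nothing here requires an admin: the port simply stops forwarding for two forward-delay intervals and then comes back as a regular spanning tree port.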
So, as these things usually happen, the question was asked during our discussion: “What happens if you enable both BPDUGuard and BPDUFilter at the same time?” Well, I found a great blog post on the subject (http://bit.ly/cKpBTd). Essentially, if you enable BPDUFilter globally and enable BPDUGuard on a particular interface, the interface-specific configuration takes precedence and shuts the port down before BPDUFilter can transition the port back to normal. However, if you enable BPDUFilter using the interface-specific command and BPDUGuard using the interface-specific command, BPDUFilter will catch the BPDU first and discard it before BPDUGuard can shut the port down. So, they each will perform their function while locking out the other. The question becomes where each is configured (globally vs. interface-specific). For those of you who might be in the unfortunate position of still running CatOS, the only way to enable BPDUFilter is globally. In this specific case, BPDUGuard will always win and the ports will be disabled. You would only use BPDUFilter in this case to prevent ports from transmitting BPDUs.
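The precedence rules above can be written as a small decision function. Treat it as a sketch of the behavior described in this post and the linked one for classic Cisco IOS, not a definitive reference: other platforms (NX-OS, for one) may behave differently, and the function name and string values are made up. It also encodes two details worth knowing: the global forms of both features act only on Portfast ports, and an interface-level BPDUFilter discards BPDUs outright rather than handing them to spanning tree.

```python
def bpdu_outcome(portfast: bool, guard: str, bpdu_filter: str) -> str:
    """What happens when a BPDU arrives on an access port.

    guard and bpdu_filter are each "none", "global", or "interface".
    """
    # An interface-level filter drops BPDUs before anything else sees them,
    # whether or not the port is Portfast.
    if bpdu_filter == "interface":
        return "dropped"
    # Global BPDUGuard covers only Portfast ports; interface-level always applies.
    if guard == "interface" or (guard == "global" and portfast):
        return "err-disabled"
    # Global BPDUFilter on a Portfast port fails open.
    if bpdu_filter == "global" and portfast:
        return "Portfast removed, normal STP"
    return "processed by normal STP"


print(bpdu_outcome(True, guard="interface", bpdu_filter="global"))  # err-disabled
print(bpdu_outcome(True, guard="global", bpdu_filter="interface"))  # dropped
print(bpdu_outcome(False, guard="global", bpdu_filter="none"))      # processed by normal STP
```

The last case is worth remembering: a globally enabled BPDUGuard does nothing on a port that isn’t running Portfast.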
My Take
Since best practice guidelines tell us that switch-to-switch connections should be trunk links, you should enable Portfast on all your user-facing ports to cut down on delay and trouble tickets. But, if you have Portfast enabled, you better make sure to have BPDUGuard enabled at a minimum. It will save your bacon one day. The case for BPDUFilter is less compelling to me. If you are in one of the few scenarios where BPDUFilter makes more sense than BPDUGuard, by all means use it. It’s better than a poke in the eye with a sharp stick. Personally, I’ve used BPDUFilter once or twice with mixed results. My network started behaving quite strangely and some poorly-configured switches hanging off unidentified ports stopped responding until I removed the BPDUFilter configuration. So I mainly stick to BPDUGuard now. Better to have to re-enable a port after a user plugged in something they weren’t supposed to than to have to frantically unplug connections in the core in a vain effort to stem the raging broadcast storm.
EDIT
Be sure to check out my additional testing and findings over here.
BPDUFilter behaves differently depending on where it is applied. Globally, it will affect any PortFast port. When the port starts up, it will transmit 10 BPDUs and listen for BPDUs. If it receives any during this time, it disables both PortFast and BPDUFilter and transitions to a full STP port. After that it stops transmitting BPDUs, but still listens, and if it receives any, it again disables both PortFast and BPDUFilter and transitions to a full STP port.
Enabled on an interface, regardless of PortFast status, it completely disables transmitting and receiving BPDUs, effectively disabling STP. It will not transition to a full STP port if BPDUs are received.
Another good link that details what each port state does can be found here: http://anetworkerblog.com/2007/08/26/bpdu-guard-and-filter/
I was slow in getting around to reading this, but…
The case for BPDUFilter instead of BPDUGuard becomes clear when you are a provider rather than an enterprise network, and I believe the service provider segment was the driving force for Cisco to implement it. You have customers connecting stuff directly to your edge switches, and you never know quite what they are going to hook up to that SFP or Cat5 port, which is often your demarcation point for the service you provide (you don’t always provide CPE equipment to all customers…).
The customers are going to give your support staff a firestorm if it takes 50 seconds to get the line up every time they flap it, so portfast is a must – and at the same time you REALLY don’t want to learn your customers’ BPDUs – at all. BPDUFilter is your friend, and is in use quite extensively in most service provider networks.
great post, thanks for sharing
Even though BPDU guard is enabled (BPDU filter is disabled), the port goes into the forwarding state instead of the err-disable state when it receives a BPDU. What would be the reason?
Probably because you enabled BPDUGuard in global configuration mode (not on the port itself) and the port isn’t Portfast-enabled. The global setting only applies to Portfast ports.
Really useful!!
I’d just like to add: beware when you use BPDUFilter in a large-scale enterprise network.
I conducted some experiments in a test environment on BPDUGuard and BPDUFilter individually.
In each case I tested how the network would respond if there was a loop.
BPDUGuard simply shut down the user-facing port (err-disabled). No harm was done to the network as a whole, and the root cause was easy to investigate.
But with BPDUFilter and a loop, I got nothing but MAC flaps, broadcast storms, and high CPU utilization.
In addition, it will take ages to find the root cause in a large network. BPDUFilter simply disables STP.
My suggestion is to use BPDUFilter only in small networks where the administrator can physically locate a loop if there is one.
Caution: don’t use BPDUFilter or BPDUGuard on trunk ports.
thanks.
Hi,
I enjoyed reading your post. I would just like to clarify on which platforms you saw this behaviour (filter overriding guard). In Cisco Nexus terms, edge is the closest thing to portfast, and I want to highlight the fact that when a port is type ‘edge’ and filter + guard are enabled together, either both on an interface or both globally, guard still blocks upon receiving a BPDU.
Cheers!
Sandy
Hi.
“In her haste to get everything plugged in, she accidentally plugs one end of the network cable into the switch, and the other end into another port on the switch.”
If Gertrude didn’t change the default settings on her switch, isn’t STP enabled on it as well? If it is, wouldn’t the STP on her switch eventually disable one of the ports, thereby disrupting the loop?
If your switches are loop-free thanks to STP, and a user connects a single switch to one switchport on one of your switches, then why would that cause a loop? The user’s switch would forward broadcast frames into 1 port, and that broadcast frame would be forwarded into a loop-free environment. Can you please help me understand how that could possibly result in a broadcast storm? Thanks.