Spanning Tree Isn’t Evil


In a recent article I wrote for Network Computing, I talked about how licensing costs for advanced layer 2 features were going to delay the adoption of TRILL and its vendor-specific derivatives. Along the way I talked about how TRILL was a much better solution for data centers than 802.1D spanning tree and its successor protocols. A couple of people seemed to think that I had the same distaste for spanning tree that I do for NAT:

Allow me to claify. I don’t dislike spanning tree. It has a very important job to do in a network. I just think that some networks have eclipsed the advantages of spanning tree.

In a campus network, spanning tree is a requirement. There are a large number of ports facing users that you have no control of beyond the switch level. Think about a college dorm network, for instance. Hundreds if not thousands of ports that students could be plugging in desktops, laptops, gaming consoles, or all other manner of devices. Considering that most student today have a combination of all of the above it stands to reason that many of them are going to try to circumvent polices in place allowing one device per port in each room. Once a tech-savvy student goes out and purchases a switch or SOHO router network admins need to make sure that the core network is as protected as it can be from accidental exposure.

Running 802.1w rapid spanning tree functions like Portfast and BPDUGuard on all user facing ports is not only best practice but should be the rule at all times. Radia Perlman gave an excellent talk about the history of spanning tree a few years ago about 10 minutes in (watch the whole thing if you haven’t already; it’s that good):

She talks about the development of spanning tree as something to mollify her bosses at DEC in the off chance that someone did something they weren’t supposed to with these fancy new Ethernet bridges. I mean, who would be careless enough to plug a bridge back into itself and flood the network with unknown unicast frames? As luck would have it, that’s *exactly* what happened the first time it was plugged in. You can never be sure that users aren’t going to shoot themselves in the foot. That’s what spanning tree really provides: peace of mind from human error.

A modern data center is a totally different animal from a campus network. Admins control access to every switch port. We know exactly where things are plugged in. It takes forms and change requests to touch anything in the server farm or the core. What advantage is spanning tree providing here? Sure, there is the off chance that I might make a mistake when recabling something. Odds are better I’m going to run into blocked links or disabled multipath connections to servers because spanning tree is doing the job it was designed to do decades ago. Data centers don’t need single paths back to a root bridge to do their jobs. They need high speed connections that allow for multiple paths to carry the maximum amount of data or provide for failover in the event of a problem.

In a perfect world, everything down to the switch would be a layer 3 connection. No spanning tree, no bridging loops. Unfortunately, this isn’t a perfect world. The data center has to be flat, sometimes flat across a large geographic area. This is because the networking inside hypervisors isn’t intelligent enough right now to understand the world beyond a MAC address lookup. We’re working on making the network smarter, but it’s going to take time. In the interim, we have to be aware that we’re reducing the throughput of a data center running spanning tree to a single link back to a root bridge. Or, we’re running without spanning tree and taking the risk that something catastrophic is going to blow up in our faces when disaster strikes.

TRILL is a better solution for the data center by far because of the multipath capabilities and failover computations. The fact that this is all accomplished by running IS-IS at layer 2 isn’t lost on me at all. Solving layer 2 issues with layer 3 designs has been done for years. But to accuse spanning tree of being evil because of all this is the wrong line of thinking. You can’t say that incandescent light bulbs are evil just because new technology like compact florescent (CFL) exists. They both serve the same purpose – to illuminate things. Sure, CFLs are more efficient for a given wattage. They also don’t produce nearly the same amount of heat. But, they are more expensive. For certain applications, like 3-way lamps and lights with dimmer switches, incandescent bulbs are still a much better and cheaper alternative. Is the solution to do away with all the old technology and force people to use new tech in an inefficient way? Or should we design around the old tech for the time being and a way to make the new tech work the way it should when we remodel?


Tom’s Take

As long as Ethernet exists, spanning tree will exist. That’s a fact of life. The risks of a meltdown due to bridging loops are getting worse with new technology. How fast do you think a 40GigE link will be able to saturate a network with unknown unicast frames in a bridging loop? Do you think even a multicore CPU would be able to stand up to that kind of abuse? The answer is instead to find new technology like TRILL and design our future around applying it in the best way possible. Spanning tree won’t go away overnight. Just like DOS, just like IPX. We can’t stop it. But we can contain it to where it belongs.

1 thought on “Spanning Tree Isn’t Evil

  1. Good post Tom.

    I already knew that you had a good view of STP. Sometimes I like to discuss things at Twitter etc just to get a discussion going. Then people following us hopefully can learn something from it. Not everyone runs state of the art networks and I still think it makes sense to learn STP as I believe that learning something is never a waste. Not that I would go about learning DLSW or something but you get the point.

    The Radia Perlman video is great, I’ve watched it before as have I read Interconnections: Bridges, Routers, Switches, and Internetworking Protocols. I do enjoy reading books/documents from the people that were there in the beginning. The WHY often makes a lot more sense if you understand how limited resources they had back then in form of computing power and so on.

    ISIS is another protocol that’s been around for ages and it’s used in TRILL and Fabricpath etc so the learning you put into the older protocols often benefit us when learning the new ones.

    Always happy discussing with you Tom!

Leave a comment