The Death of TRILL


Networking has come a long way in the last few years. We’ve realized that hardware and ASICs aren’t the constant that we could rely on to make decisions in the next three to five years. We’ve thrown in with software and the quick development cycles that allow us to iterate and roll out new features weekly or even daily. But the hardware versus software battle has played out a little differently than we all expected. And the primary casualty of that battle was TRILL.

Symbiotic Relationship

Transparent Interconnection of Lots of Links (TRILL) was proposed as a solution to the complexity of spanning tree. Radia Perlman realized that her bridging loop solution wouldn’t scale in modern networks. So she worked with the IETF to solve the problem with TRILL. The IEEE also gave us Shortest Path Bridging (SPB) along the way as an alternative solution to the layer 2 issues with spanning tree. The motive was sound, but the industry has rejected the premise entirely.

Large layer 2 networks have all kinds of issues. ARP traffic, broadcast amplification, and numerous other problems plague layer 2 when it tries to scale to many hundreds or a few thousand nodes. The general rule of thumb is that layer 2 broadcast domains should never grow larger than 250-500 nodes lest problems start occurring. And in theory that works rather well. But in practice we have issues at the software level.
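As a very rough illustration of why that rule of thumb exists: every broadcast is flooded to, and processed by, every node in the domain, so the background load on each host grows linearly with the size of the segment. The sketch below is back-of-the-envelope math; the per-host ARP rate is an assumption I picked for the example, not a measured number.

```python
# Rough sketch: per-host broadcast load as a flat layer 2 domain grows.
# The per-host ARP/broadcast rate is an assumed value for illustration only.
arp_broadcasts_per_host_per_sec = 0.5   # assumption, not a measurement

for hosts in (250, 500, 2000, 10000):
    # Every broadcast is flooded to -- and interrupts -- every other host.
    per_host_load = (hosts - 1) * arp_broadcasts_per_host_per_sec
    print(f"{hosts:>6} hosts: ~{per_host_load:,.0f} broadcasts/sec hitting every NIC")
```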

Applications are inherently complicated. Software written in the pre-Netflix era of public cloud adoption doesn’t like it when the underlay changes. So things like IP addresses and ARP entries were assumed to be static. If those data points change, you have chaos in the software. That’s why we have vMotion.

At the core, vMotion is a way for software to mitigate hardware instability. As I outlined previously, we’ve been fixing hardware with software for a while now. vMotion could ensure that applications behaved properly when they needed to be moved to a different server or even a different data center. But it also required the network to be flat to overcome limitations in things like ARP or IP. And so we went on a merry journey of making data centers as flat as possible.

The problem came when we realized that data centers could only be so flat before they collapsed in on themselves. ARP and spanning tree limited the amount of traffic in layer 2 and those limits were impossible to overcome. Loops had to be prevented, yet the simplest solution, blocking redundant links, disabled bandwidth needed to make things run smoothly. That’s what pushed the IEEE and IETF to come up with their layer 2 solutions that used IS-IS over CLNS to solve loops. And it was a great idea in theory.

The Joining

In reality, hardware can’t be spun that fast. TRILL was used as a reference platform for proprietary protocols like FabricPath and VCS. All the important things were there but they were locked into hardware that couldn’t be easily integrated into other solutions. We found ourselves solving problem after problem in hardware.

Users became fed up. They started exploring other options. They finally decided that hardware wasn’t the answer. And so they looked to software. And that’s where we started seeing the emergence of overlay networking. Protocols like VXLAN and NV-GRE emerged to tunnel layer 2 packets over layer 3 networks. As Ivan Pepelnjak is fond of saying, layer 3 transport solves all of the issues with scaling. And even the most unruly application behaves when it thinks everything is running on layer 2.
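To make the encapsulation concrete, here is a minimal sketch using the Scapy library showing how a VXLAN packet wraps an inner layer 2 frame inside an outer UDP/IP header, which is the whole layer-2-over-layer-3 trick these overlays rely on. The MAC addresses, IP addresses, and VNI value are placeholders I made up for the example.

```python
# Minimal sketch of VXLAN encapsulation using Scapy (pip install scapy).
# All addresses and the VNI below are made-up placeholders.
from scapy.all import Ether, IP, UDP
from scapy.layers.vxlan import VXLAN

# The original layer 2 frame the virtual machine thinks it is sending.
inner = Ether(src="00:11:22:33:44:55", dst="66:77:88:99:aa:bb") / \
        IP(src="10.0.0.1", dst="10.0.0.2")

# The outer layer 3 transport between tunnel endpoints (VTEPs): plain
# IP/UDP with a VXLAN header carrying a 24-bit VNI to identify the segment.
outer = Ether() / \
        IP(src="192.168.1.10", dst="192.168.1.20") / \
        UDP(sport=49152, dport=4789) / \
        VXLAN(vni=5000)

packet = outer / inner
packet.show()  # dump the layered structure: Ether/IP/UDP/VXLAN/Ether/IP
```

The routed underlay only ever sees the outer IP/UDP headers, which is why the overlay scales like a layer 3 network while the application still sees a flat layer 2 segment.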

Protocols like VXLAN solved an immediate need. They removed limitations in hardware. Tunnels and fabrics used novel software approaches to solve insurmountable hardware problems. An elegant solution for a thorny problem. Now, instead of waiting for a new hardware spin to fix scaling issues, customers could deploy solutions to fix the issues inherent in hardware on their own schedule.

This is the moment where software defined networking (SDN) took hold of the market. Not when words like automation and orchestration started being thrown about. No, SDN became a real thing when it enabled customers to solve problems without buying more physical devices.


Tom’s Take

Looking back, we realize now that building large layer 2 networks wasn’t the best idea. We know that layer 3 scales much better. Given the number of providers and end users running BGP to top-of-rack (ToR) switches, the industry seems to have reached the same conclusion. It took us too long to figure out that the best solution to a problem sometimes takes a bit of thought to implement.

Virtualization is always going to be limited by the infrastructure it’s running on. Applications are only as smart as the programmer. But we’ve reached the point where developers aren’t counting on having access to layer 2 protocols that solve stupid decision making. Instead, we have to understand that the most resilient way to fix problems is in the software. Whether that’s VXLAN, NV-GRE, or a real dev team not relying on the network to solve bad design decisions.


Spanning Tree Isn’t Evil

In a recent article I wrote for Network Computing, I talked about how licensing costs for advanced layer 2 features were going to delay the adoption of TRILL and its vendor-specific derivatives. Along the way I talked about how TRILL was a much better solution for data centers than 802.1D spanning tree and its successor protocols. A couple of people seemed to think that I had the same distaste for spanning tree that I do for NAT.

Allow me to clarify. I don’t dislike spanning tree. It has a very important job to do in a network. I just think that some networks have outgrown the advantages of spanning tree.

In a campus network, spanning tree is a requirement. There are a large number of user-facing ports that you have no control over beyond the switch level. Think about a college dorm network, for instance. Hundreds if not thousands of ports where students could be plugging in desktops, laptops, gaming consoles, or all other manner of devices. Considering that most students today have a combination of all of the above, it stands to reason that many of them are going to try to circumvent the policies allowing only one device per port in each room. Once a tech-savvy student goes out and purchases a switch or SOHO router, network admins need to make sure that the core network is as protected as it can be from accidental exposure.

Running 802.1w rapid spanning tree functions like PortFast and BPDU Guard on all user-facing ports is not only a best practice but should be the rule at all times. Radia Perlman gave an excellent talk about the history of spanning tree a few years ago; the relevant discussion starts about 10 minutes in (watch the whole thing if you haven’t already; it’s that good).

She talks about the development of spanning tree as something to mollify her bosses at DEC on the off chance that someone did something they weren’t supposed to with these fancy new Ethernet bridges. I mean, who would be careless enough to plug a bridge back into itself and flood the network with unknown unicast frames? As luck would have it, that’s *exactly* what happened the first time it was plugged in. You can never be sure that users aren’t going to shoot themselves in the foot. That’s what spanning tree really provides: peace of mind from human error.
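To bring it back to the best practice above, here’s a rough sketch of what locking down those user-facing edge ports might look like if you pushed the configuration from Python with the Netmiko library. The hostname, credentials, and interface range are hypothetical placeholders, and the commands assume a Cisco IOS access switch.

```python
# Hypothetical sketch: enable PortFast and BPDU Guard on user-facing access
# ports of a Cisco IOS switch using Netmiko (pip install netmiko).
# Hostname, credentials, and the interface range are placeholders.
from netmiko import ConnectHandler

switch = {
    "device_type": "cisco_ios",
    "host": "dorm-access-01.example.edu",   # placeholder hostname
    "username": "admin",                    # placeholder credentials
    "password": "changeme",
}

edge_port_config = [
    "interface range GigabitEthernet1/0/1 - 48",  # user-facing ports
    "spanning-tree portfast",                     # skip listening/learning on edge ports
    "spanning-tree bpduguard enable",             # err-disable the port if a BPDU shows up
]

with ConnectHandler(**switch) as conn:
    print(conn.send_config_set(edge_port_config))
```

If that dorm-room switch ever does send a BPDU toward the access port, BPDU Guard shuts the port down instead of letting it join the spanning tree and destabilize the campus core.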

A modern data center is a totally different animal from a campus network. Admins control access to every switch port. We know exactly where things are plugged in. It takes forms and change requests to touch anything in the server farm or the core. What advantage is spanning tree providing here? Sure, there is the off chance that I might make a mistake when recabling something. Odds are better I’m going to run into blocked links or disabled multipath connections to servers because spanning tree is doing the job it was designed to do decades ago. Data centers don’t need single paths back to a root bridge to do their jobs. They need high speed connections that allow for multiple paths to carry the maximum amount of data or provide for failover in the event of a problem.

In a perfect world, everything down to the switch would be a layer 3 connection. No spanning tree, no bridging loops. Unfortunately, this isn’t a perfect world. The data center has to be flat, sometimes flat across a large geographic area. This is because the networking inside hypervisors isn’t intelligent enough right now to understand the world beyond a MAC address lookup. We’re working on making the network smarter, but it’s going to take time. In the interim, we have to be aware that we’re reducing the throughput of a data center running spanning tree to a single link back to a root bridge. Or, we’re running without spanning tree and taking the risk that something catastrophic is going to blow up in our faces when disaster strikes.

TRILL is a better solution for the data center by far because of the multipath capabilities and failover computations. The fact that this is all accomplished by running IS-IS at layer 2 isn’t lost on me at all. Solving layer 2 issues with layer 3 designs has been done for years. But to accuse spanning tree of being evil because of all this is the wrong line of thinking. You can’t say that incandescent light bulbs are evil just because new technology like compact fluorescent (CFL) exists. They both serve the same purpose – to illuminate things. Sure, CFLs are more efficient for a given wattage. They also don’t produce nearly the same amount of heat. But they are more expensive. For certain applications, like 3-way lamps and lights with dimmer switches, incandescent bulbs are still a much better and cheaper alternative. Is the solution to do away with all the old technology and force people to use new tech in an inefficient way? Or should we design around the old tech for the time being and find a way to make the new tech work the way it should when we remodel?


Tom’s Take

As long as Ethernet exists, spanning tree will exist. That’s a fact of life. The risks of a meltdown due to bridging loops are getting worse with new technology. How fast do you think a 40GigE link will be able to saturate a network with unknown unicast frames in a bridging loop? Do you think even a multicore CPU would be able to stand up to that kind of abuse? The answer is instead to find new technology like TRILL and design our future around applying it in the best way possible. Spanning tree won’t go away overnight. Just like DOS, just like IPX. We can’t stop it. But we can contain it to where it belongs.
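For a rough sense of just how fast that meltdown happens, here’s a back-of-the-envelope calculation of the minimum-size frame rate on a single 40GigE link; the numbers are simple line-rate math, not a benchmark.

```python
# Back-of-the-envelope: minimum-size frame rate on a 40GigE link.
# Assumes worst-case 64-byte frames plus preamble/SFD and inter-frame gap.
link_bps = 40e9                    # 40 Gbit/s line rate
frame_bytes = 64 + 8 + 12          # min frame + preamble/SFD + inter-frame gap
frames_per_sec = link_bps / (frame_bytes * 8)
print(f"~{frames_per_sec / 1e6:.1f} million frames per second")
# Roughly 59.5 million frames per second -- and a bridging loop floods
# that stream out every port in the loop, over and over.
```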