I Can’t Drive 25G

Ethernet

The race to make things just a little bit faster in the networking world has heated up in recent weeks thanks to the formation of the 25Gig Ethernet Consortium.  Arista Networks, along with Mellanox, Google, Microsoft, and Broadcom, has decided that 40Gig Ethernet is too expensive for most data center applications.  Instead, they’re offering up an alternative in the 25Gig range.

This podcast with Greg Ferro (@EtherealMind) and Andrew Conry-Murray (@Interop_Andrew) does a great job of breaking down the technical details on the reasoning behind 25Gig Ethernet.  In short, the current 10Gig connection is made of four multiplexed 2.5Gig connections.  To get to 25Gig, all you need to do is over clock those connections a little.  That’s not unprecedented, as 40Gig Ethernet accomplishes this by over clocking them to 10Gig, albeit with different optics.  Aside from a technical merit badge, one has to ask themselves “Why?”

High Hopes

As always, money is the factor here.  The 25Gig Consortium is betting that you don’t like paying a lot of money for your 40Gig optics.  They want to offer an alternative that is faster than 10Gig but cheaper than the next standard step up.  By giving you a cheaper option for things like uplinks, you gain money to spend on things.  Probably on more switches, but that’s beside the point right now.

The other thing to keep in mind, as mentioned on the Coffee Break podcast, is that the cable runs for these 25Gig connectors will likely be much shorter.  Short term that won’t mean much.  There aren’t as many long-haul connections inside of a data center as one might thing.  A short hop to the top-of-rack (ToR) switch, then another different hop to the end-of-row (EoR) or core switch.  That’s really about it.  One of the arguments against 40/100Gig is that it was designed for carriers for long-haul purposes.  25G can give you 60% of the speed of that link at a much lower cost.  You aren’t paying for functionality you likely won’t use.

Heavy Metal

Is this a good move?  That depends.  There aren’t any 25Gig cards for servers right now, so the obvious use for these connectors will be uplinks.  Uplinks that can only be used by switches that share 25Gig (and later 50Gig) connections.  As of today, that means you’re using Arista, Dell, or Brocade.  And that’s when the optics and switches actually start shipping.  I assume that existing switching lines will be able to retrofit with firmware upgrades to support the links, but that’s anyone’s guess right now.

If Mellanox and Broadcom do eventually start shipping cards to upgrade existing server hardware to 25Gig then you’ll have to ask yourself if you want to pursue the upgrade costs to drive that little extra bit of speed out of the servers.  Are you pushing the 10Gig links in your servers today?  Are they the limiting factor in your data center?  And will upgrading your servers to support twice the bandwidth per network connection help alleviate your bottlenecks? Or will they just move to the uplinks on the switches?  It’s a quandary that you have to investigate.  And that takes time and effort.


 

Tom’s Take

The very first thing I ever tweeted (4 years ago):

We’ve come a long way from ratified standards to deployment of 40Gig and 100Gig.  Uplinks in crowded data centers are going to 40Gig.  I’ve seen a 100Gig optic in the wild running a research network.  It’s interesting to see that there is now a push to get to a marginally faster connection method with 25Gig.  It reminds me of all the competing 100Mbit standards back in the day.  Every standard was close but not quite the same.  I feel that 25Gig will get some adoption in the market.  So now we’ll have to choose from 10Gig, 40Gig, or something in between to connect servers and uplinks.  It will either get sent to the standards body for ratification or die on the vine with no adoption at all.  Time will tell.

 

CCIE Version 5: Out With The Old

Cisco announced this week that they are upgrading the venerable CCIE certification to version five.  It’s been about three years since Cisco last refreshed the exam and several thousand people have gotten their digits.  However, technology marches on.  Cisco talked to several subject matter experts (SMEs) and decided that some changes were in order.  Here are a few of the ones that I found the most interesting.

CCIEv5 Lab Schedule

Time Is On My Side

The v5 lab exam has two pacing changes that reflect reality a bit better.  The first is the ability to take some extra time on the troubleshooting section.  One of my biggest peeves about the TS section was the hard 2-hour time limit.  One of my failing attempts had me right on the verge of solving an issue when the time limit slammed shut on me.  If I only had five more minutes, I could have solved that problem.  Now, I can take those five minutes.

The TS section has an available 30 minute overflow window that can be used to extend your time.  Be aware that time has to come from somewhere, since the overall exam is still eight hours.  You’re borrowing time from the configuration section.  Be sure you aren’t doing yourself a disservice at the beginning.  In many cases, the candidates know the lab config cold.  It’s the troubleshooting the need a little more time with.  This is a welcome change in my eyes.

Diagnostics

The biggest addition is the new 30-minute Diagnostic section.  Rather than focusing on problem solving, this section is more about problem determination.  There’s no CLI.  Only a set of artifacts from a system with a problem: emails, log files, etc.  The idea is that the CCIE candidate should be an expert at figuring out what is wrong, not just how to fix it.  This is more in line with the troubleshooting sections in the Voice and Security labs.  Parsing log files for errors is a much larger part of my time than implementing routing.  Teaching candidates what to look for will prevent problems in the future with newly minted CCIEs that can diagnose issues in front of customers.

Some are wondering if the Diagnostic section is going to be the new “weed out” addition, like the Open Ended Questions (OEQs) from v3 and early v4.  I see the Diagnostic section as an attempt to temper the CCIE with more real world needs.  While the exam has never been a test of ideal design, knowing how to fix a non-ideal design when problems occur is important.  Knowing how to find out what’s screwed up is the first step.  It’s high time people learned how to do that.

Be Careful What You Wish For

The CCIE v5 is seeing a lot of technology changes.  The written exam is getting a new section, Network Principles.  This serves to refocus candidates away from Cisco specific solutions and more toward making sure they are experts in networking.  There’s a lot of opportunity to reinforce networking here and not idle trivia about config minimums and maximums.  Let’s hope this pays off.

The content of the written is also being updated.  Cisco is going to make sure candidates know the difference between IOS and IOS XE.  Cisco Express Forwarding is going to get a focus, as is ISIS (again).  Given that ISIS is important in TRILL this could be an indication of where FabricPath development is headed.  The written is also getting more IPv6 topics.  I’ll cover IPv6 in just a bit.

The biggest change in content is the complete removal of frame relay.  It’s been banished to the same pile as ATM and ISDN.  No written, no lab.  In it’s place, we get Dynamic Multipoint VPN (DMVPN).  I’ve talked about why Frame Relay is on the lab before.  People still complained about it.  Now, you get your wish.  DMVPN with OSPF serves the same purpose as Frame Relay with OSPF.  It’s all about Stupid Router Tricks.  Using OSPF with DMVPN requires use of mGRE, which is a Non-Broadcast Multi-Access (NBMA) network.  Just like Frame Relay.  The fact that almost every guide today recommends you use EIGRP with DMVPN should tell you how hard it is to do.  And now you’re forced to use OSPF to simulate NBMA instead of Frame Relay.  Hope all you candidates are happy now.

vCCIE

The lab is also 100% virtual now.  No physical equipment in either the TS or lab config sections.  This is a big change.  Cisco wants to reduce the amount of equipment that needs to be physically present to build a lab.  They also want to be able to offer the lab in more places than San Jose and RTP.  Now, with everything being software, they could offer the lab at any secured PearsonVUE testing center.  They’ve tried in the past, but the access requirements caused some disaster.  Now, it’s all delivered in a browser window.  This will make remote labs possible.  I can see a huge expansion of the testing sites around the time of the launch.

This also means that hardware-specific questions are out.  Like layer 2 QoS on switches.  The last reason to have a physical switch (WRR and SRR queueing) is gone.  Now, all you are going to get quizzed on is software functionality.  Which probably means the loss of a few easy points.  With the removal of Frame Relay and L2 QoS, I bet that services section of the lab is going to be really fun now.

IPv6 Is Real

Now, for my favorite part.  The JNCIE has had a robust IPv6 section for years.  All routing protocols need to be configured for IPv4 and IPv6.  The CCIE has always had a separate IPv6 section.  Not any more.  Going forward in version 5, all routing tasks will be configured for v4 and v6.  Given that RIPng has been retired to the written exam only (finally), it’s a safe bet that you’re going to love working with OSPFv3 and EIGRP for IPv6.

I think it’s great that Cisco has finally caught up to the reality of the world.  If CCIEs are well versed in IPv6, we should start seeing adoption numbers rise significantly.  Ensuring that engineers know to configure v4 and v6 simultaneously means dual stack is going to be the preferred transition method.  The only IPv6-related thing that worries me is the inclusion of an item on the written exam: IPv6 Network Address Translation.  You all know I’m a huge fan of NAT.  Especially NAT66, which is what I’ve been told will be the tested knowledge.

Um, why?!? 

You’ve removed RIPng to the trivia section.  You collapsed multicast into the main routing portions.  You’re moving forward with IPv6 and making it a critical topic on the test.  And now you’re dredging up NAT?!? We don’t NAT IPv6.  Especially to another IPv6 address.  Unique Local Addresses (ULA) is about the only thing I could see using NAT66.  Ed Horley (@EHorley) thinks it’s a bad idea.  Ivan Pepelnjak (@IOSHints) doesn’t think fondly of it either, but admits it may have a use in SMBs.  And you want CCIEs and enterprise network engineers to understand it?  Why not use LISP instead?  Or maybe a better network design for enterprises that doesn’t need NAT66?  Next time you need an IPv6 SME to tell you how bad this idea is, call me.  I’ve got a list of people.


Tom’s Take

I’m glad to see the CCIE update.  Getting rid of Frame Relay and adding more IPv6 is a great thing.  I’m curious to see how the Diagnostic section will play out.  The flexible time for the TS section is way overdue.  The CCIE v5 looks to be pretty solid on paper.  People are going to start complaining about DMVPN.  Or the lack of SDN-related content.  Or the fact that EIGRP is still tested.  But overall, this update should carry the CCIE far enough into the future that we’ll see CCIE 60,000 before it’s refreshed again.

More CCIE v5 Coverage:

Bob McCouch (@BobMcCouch) – Some Thoughts on CCIE R&S v5

Anthony Burke (@Pandom_) – Cisco CCIE v5

Daniel Dib (@DanielDibSWE) – RS v5 – My Thoughts

INE – CCIE R&S Version 5 Updates Now Official

IPExpert – The CCIE Routing and Switching (R&S) 5.0 Lab Is FINALLY Here!

Will Dell Buy Aerohive?

DELL-Aerohive-Logo

One rumor I keep hearing about in the industry involves a certain buzzing wireless vendor and the world’s largest startup.  Acquisitions happen all the time.  Rumors of them are even more frequent.  But the more I thought about it, the more I realized this may be good for everyone.

Dell wants to own the stack from top to bottom.  In the past, they have had to partner with printer companies (Lexmark) and networking companies (Brocade and Juniper) to deliver parts of the infrastructure they couldn’t provide themselves.  In the case of printers, Dell found a way to build them on their own.  That reduced their reliance on Lexmark.  In the networking world, Dell shocked everyone by going outside their OEM relationship and buying Force10.  I’ve talked before about why the Force10 pickup was a better deal in the long run than Brocade.

Dell’s Desires

Dell needs specific pieces of the puzzle.  They don’t want to be encumbered with ancillary products that will need to be jettisoned later.  Buying Brocade would have required unwinding a huge fibre channel business.  In much the same way, I don’t think Dell will end up buying their current wireless OEM, Aruba Networks.  Aruba has decided to branch out past the doing simple wireless and moved into wired network switches and security and identity management programs like ClearPass.  Dell doesn’t want any of that.  They already have an issue integrating the Force10 networking expertise into the PowerConnect line.  I’ve been told in the past the FTOS will eventually come to PowerConnect, but that has yet to happen.  Integrating purchased companies isn’t easier.  That becomes exponentially harder the more product lines you have to integrate.

Aruba is too expensive for Dell to buy outright.  Michael Dell spent a huge chunk of his cash to get his company back from the shareholders.  He’s going to put it on a diet pretty soon.  I would expect to see a few product lines slimmed down or outright dropped.  That makes it tough to justify buying so much from another company.  Dell needs a scalpel, not a sledgehammer.

Aerohive’s Aspirations

Aerohive is the best target for Dell.  They are clearly fighting for third place in the wireless market behind Cisco and Aruba.  Aerohive has never been shy about punching above their weight.  They have the mentality of a scrappy terrier that won’t go down without a fight.  But, they are getting pressure to expand quickly across their product lines.  They took their time releasing an 802.11ac access point.  Their switching offering hasn’t caught on in the same way that of Aruba or Meraki (now a division of Cisco).

Aerohive is on the verge of going public.  I’m sure the infusion of cash would allow them to pay off some early investors as well as fund more development for 802.11ac Phase 2 gear and maybe a firewall offering.  The risk comes when you look at what happened to Ruckus Wireless shortly after their IPO.  While they did recover, it didn’t look very good for a company that supposedly did have a unique claim, their antenna design.  Aerohive is a cloud management platform like many others in the market.  You have to wonder how investors would view them.  Scrappy doesn’t sell stock.

Aerohive is now fighting in the new Gartner “Wired and Wireless Access” magic quadrant, which is an absolute disaster for everyone.  An analyst firm thinks that wireless is just like wired, so naturally it makes sense for AP vendors to start making switches, right?  Except the people who are really brilliant when it comes to wireless, like Matthew Gast and Victor Shtrom couldn’t care less about bits on copper.  They’ve spent the better part of their careers solving the RF problems in the world.  And now someone tells them that interference problems aren’t that much different than spanning tree?  I would have long since planted my head permanently onto my desk if I’d been told that in their position.

Aerohive gains a huge backer in the fight if Dell acquires them.  They get the name to go up against Cisco/Meraki.  The gain R&D from Dell with expertise around cloud management.  They can start developing integration with HiveManager and Dell’s SMB extensive product line.  Switch supply becomes a thing of the past.  Their entire software offering fits well with what Dell is trying to accomplish from a device independence perspective with regards to customers.

Tom’s Take

I don’t put much stock in random rumors.  But I’ve heard this one come up enough to make me ask some tough questions.  There are people in both camps that think it will happen sometime in 2014.  Dell has to get the books sorted out and figure out who’s in charge of buying things.  Aerohive has to see if there’s enough juice left in the market to IPO and not look foolish.  Maybe Dell needs to run the numbers and find out what it would take to cash out Aerohive’s investors and add the company to the growing Empire of Round Rock.  A little buzz for the World’s Largest Startup couldn’t hurt.

Avaya and the Magic of SPB

Avaya_logo-wpcf_200x57

I was very interested to hear from Avaya at Interop New York.  They were the company I knew the least about.  I knew the most about them from the VoIP side of the house, but they’ve been coming on strong with networking as well.  They are one of the biggest champions of 802.1aq, more commonly known as Shortest Path Bridging (SPB).  You may remember that I wrote a bit about SPB in the past and referred to it as the Betamax of networking fabric technologies.  After this presentation, I may be forced to eat my words to a degree.

Paul Unbehagen really did a great job with this presentation.  There were no slides, but he kept the attention of the crowd.  The whiteboard supported his message.  While informal, there was a lot of learning.  Paul knows SPB.  It’s always great to learn from someone that knows the protocol.

Multicast Magic

One of the things I keyed on during the presentation was the way that SPB deals with multicast.  Multicast is a huge factor in Ethernet today.  So much so that even the cheapest SOHO Ethernet switch has a ton of multicast optimization.  But multicast as implemented in enterprises is painful.  If you want to make an engineer’s blood run cold, walk up and whisper “PIM“.  If you want to watch a nervous breakdown happen in real time, follow that up with “RPF“.

RPF checks in multicast PIM routing are nightmarish.  It would be wonderful to get rid of RPF checks to eliminate any loops in the multicast routing table.  SPB accomplishes that by using a Dijkstra algorithm.  The same algorithm that OSPF and IS-IS use to compute paths.  Considering the heavily roots of IS-IS in SPB, that’s not surprising.  The use of Dijkstra means that additional receivers on a multicast tree don’t negatively effect the performance of path calculation.

I’ve Got My IS-IS On You

In fact, one of the optimized networks that Paul talked about involved surveillance equipment.  Video surveillance units that send via multicast have numerous endpoints and only a couple of receivers on the network.  In other words, the exact opposite problem multicast was designed to solve.  Yet, with SPB you can create multicast distribution networks that allow additional end nodes to attach to a common point rather than talking back to a rendezvous point (RP) and getting the correct tree structure from there.  That means fast convergence and simple node addition.

SPB has other benefits as well.  It supports 16.7 million ISIDs, which are much like VLANs or MPLS tags.  This means that networks can grow past the 4,096 VLAN limitation.  It looks a lot like VxLAN to me.  Except for the reliance on multicast and lack of a working implementation.  SPB allows you to use a locally significant VLAN for a service and then defined an ISID that will transport across the network to be decapsulated on the other side in a totally different VLAN that is attached to the ISID.  That kind of flexibility is key for deployments in existing, non-green field environments.

If you’d like to learn more about Avaya and their SPB technology, you can check them out at http://www.avaya.com.  You can also follow them on Twitter as @Avaya.


Tom’s Take

Paul said that 95% of all SPB implementations are in the enterprise.  That shocked me a bit, as I always thought of SPB as a service provider protocol.  I think the key comes down to something Paul said in the video.  When we are faced with applications or additional complexity today, we tend to just throw more headers at the problem.  We figured that wrapping the whole mess in a new tag or a new tunnel will take care of everything.  At least until it all collapses into a puddle.  Avaya’s approach with SPB was to go back down to the lower layers and change the architecture of things to optimize everything and make it work the right way on all kinds of existing hardware.  To quote Paul, “In the IEEE, we don’t build things for the fun it.”  That means SPB has their feet grounded in the right place.  Considering how difficult things can be in data center networking, that’s magical indeed.

Tech Field Day Disclaimer

Avaya was a presenter at the Tech Field Day Interop Roundtable.  They did not ask for any consideration in the writing of this review nor were they promised any.  The conclusions and analysis contained in this post are mine and mine alone.

HP Networking and the Software Defined Store

HP

HP has had a pretty good track record with SDN.  Even if it’s not very well-known.  HP has embraced OpenFlow on a good number of its Procurve switches.  Given the age of these devices, there’s a good chance you can find them laying around in labs or in retired network closets to test with.  But where is that going to lead in the long run?

HP Networking was kind enough to come to Interop New York and participate in a Tech Field Day roundtable.  It had been a while since I talked to their team.  I wanted to see how they were handling the battle being waged between OpenFlow proponents like NEC and Brocade, Cisco and their hardware focus, and VMware with NSX.  Jacob Rapp and Chris Young (@NetManChris) stepped up to the plate to talk about SDN and the vision on HP.

They cover a lot of ground in here.  Probably the most important piece to me is the SDN app store.

The press picked up on this quickly.  HP has an interesting idea here.  I should know.  I mentioned it in passing in an article I wrote a month ago.  The more I think about the app store model, the more I realize that many vendors are going to go down the road.  Just not in the way HP is thinking.

HP wants to curate content for enterprises.  They want to ensure that software works with their controller to be sure that there aren’t any hiccups in implementation.  Given their apparent distaste for open source efforts, it’s safe to say that their efforts will only benefit HP customers.  That’s not to say that those same programs won’t work on other controllers.  So long as they operate according to the guidelines laid down by the Open Networking Foundation, all should be good.

Show Me The Money

Where’s the value then?  That’s in positioning the apps in the store.  Yes, you’re going to have some developers come to HP and want to simple apps to put in the store.  Odds are better that you’re going to see more recognizable vendors coming to the HP SDN store.  People are more likely to buy software from a name they recognize, like TippingPoint or F5.  That means that those companies are going to want to have a prime spot in the store.  HP is going to make something from hosting those folks.

The real revenue doesn’t come from an SMB buying a load balancer once.  It comes from a company offering it as a service with a recurring fee.  The vendor gets a revenue stream. HP would be wise to work out a recurring fee as well.  It won’t be the juicy 30% cut that Apple enjoys from their walled garden, but anything would be great for the bottom line.  Vendors win from additional sales.  Customers win from having curated apps that work every time that are easy to purchase, install, and configure.  HP wins because everyone comes to them.

Fragmentation As A Service

Now that HP has jumped on the idea of an enterprise-focused SDN app store, I wonder which company will be the next to offer one?  I also worry that having multiple app stores won’t end up being cumbersome in the long run.  Small developers won’t like submitting their app to four or five different vendor-affiliated stores.  More likely they’ll resort to releasing code on their own rather than jump through hoops.  That will eventually lead to support fragmentation.  Fragmentation helps no one.


Tom’s Take

HP Networking did a great job showcasing what they’ve been doing in SDN.  It was also nice to hear about their announcements the day before they broke wide to the press.  I think HP is going to do well with OpenFlow on their devices.  Integrating OpenFlow visibility into their management tools is also going to do wonders for people worried about keeping up with all the confusing things that SDN can do to a traditional network.  The app store is a very intriguing concept that bears watching.  We can only hope that it ends up being a well-respect entry in a long line of easing customers into the greater SDN world.

Tech Field Day Disclaimer

HP was a presenter at the Tech Field Day Interop Roundtable.  In addition, they also provided the delegates a 1TB USB3 hard disk drive.  They did not ask for any consideration in the writing of this review nor were they promised any.  The conclusions and analysis contained in this post are mine and mine alone.

SDN and NFV – The Ups and Downs

TopSDNBottomNFV

I was pondering the dichotomy between Software Defined Networking (SDN) and Network Function Virtualization (NFV) the other day.  I’ve heard a lot of vendors and bloggers talking about how one inevitably leads to the other.  I’ve also seen a lot of folks saying that the two couldn’t be further apart on the scale of software networking.  The more I thought about these topics, the more I realized they are two sides of the coin.  The problem, at least in my mind, is the perspective.

SDN – Planning The Paradigm

Software Defined Networking telegraphs everything about what it is trying to accomplish right there in the name.  Specifically, the “Definition” part of the phrase.  I’ve made jokes in the past about the lack of definition in SDN as vendors try to adapt their solutions to fit the buzzword mold.  What I finally came to realize is that the SDN folks are all about definition. SDN is the Top Down approach to planning.  SDN seeks to decompose the network into subsystems that can be replaced or reprogrammed to suit the needs of those things which utilize the network.

As an example, SDN breaks the idea of switch down into things like “forwarding plane” and “control plane” and seeks to replace the control plane with alternative software, whether it be a controller-based architecture like OpenFlow or an overlay network similar to that of VMware/Nicira.  We can replace the OS of a switch with a concept like OpenFlow easily.  It’s just a mechanism for determining which entries are populated in the Content Addressable Memory (CAM) tables of the forwarding plane.  In top down design, it’s easy to create a stub entry or “black box” to hold information that flows into it.  We don’t particularly care how the black box works from the top of the design, just that it does its job when called upon.

Top Down designs tend to run into issues when those black boxes lack detail or are missing some critical functionality.  What happens when OpenFlow isn’t capable of processing flows fast enough to keep the CAM table of a campus switch populated with entries?  Is the switch going to fall back to process switching the packets?  That could be an big issue.  Top Down designs are usually very academic and elegant.  They also have a tendency to lack concrete examples and real world experience.  When you think about it, that does speak a lot about the early days of SDN – lots of definition of terminology and technology, but a severe lack of actual packet forwarding.

NFV – Working From The Ground Up

Network Function Virtualization takes a very different approach to the idea of turning hardware networks into software networks.  The driving principle behind NFV is replication of existing technology in a software state.  This is classic Bottom Up design.  Rather than spending a large amount of time planning and assembling the perfect system, Bottom Up designers tend to build as they go.  They concentrate on making something work first, then making their things work together second.

NFV is great for hands-on folks because it gives concrete, real results almost immediately. Once you’ve converted an load balancer or a router to a purely software-based construct you can see right away how it works and what the limitations might be.  Does it consume too many resources on the hypervisor?  Does it excel at forwarding small packets?  Does switching a large packet locally cause a fault?  These are problems that can be corrected in the individual system rapidly rather than waiting to modify the overall plan to account for difficulties in the virtualization process.

Bottom Up design does suffer from some issues as well.  The focus in Bottom Up is on getting things done on a case-by-case basis.  What do you do when you’ve converted all your hardware to software?  Do your NFV systems need to talk to one another?  That’s usually where Bottom Up design starts breaking down.  Without a grand plan at a higher level to ensure that systems can talk to each other this design methodology falls back to a series of “hacks” to get them connected.  Units developed in isolation aren’t required to play nice with everyone else until they are forced to do so.  That leads to increasing complex and fragile interconnection systems that could fail spectacularly should the wrong thread be yanked with sufficient force.


Tom’s Take

Which method is better?  Should we spend all our time planning the system and hope that our Powerpoint Designs work the right way when someone codes them in a few months?  Or should we say “damn the torpedoes” and start building things left and right and hope that someone will figure out a way to tie all these individual pieces together at some point?

Surprisingly, the most successful design requires elements of both.  People need to have a basic plan at the least when starting out on a plan to change the networking world.  Once the ideas are sketched out, you need a team of folks willing to burn the midnight oil and get the ideas implemented in real life to ensure that the plan works the right way.  The guidance from the top is essential to making sure everything works together in the end.

Whether you are leading from the top or the bottom, remember that everything has to meet in the middle sooner or later.

Spanning Tree Isn’t Evil

In a recent article I wrote for Network Computing, I talked about how licensing costs for advanced layer 2 features were going to delay the adoption of TRILL and its vendor-specific derivatives. Along the way I talked about how TRILL was a much better solution for data centers than 802.1D spanning tree and its successor protocols. A couple of people seemed to think that I had the same distaste for spanning tree that I do for NAT:

Allow me to claify. I don’t dislike spanning tree. It has a very important job to do in a network. I just think that some networks have eclipsed the advantages of spanning tree.

In a campus network, spanning tree is a requirement. There are a large number of ports facing users that you have no control of beyond the switch level. Think about a college dorm network, for instance. Hundreds if not thousands of ports that students could be plugging in desktops, laptops, gaming consoles, or all other manner of devices. Considering that most student today have a combination of all of the above it stands to reason that many of them are going to try to circumvent polices in place allowing one device per port in each room. Once a tech-savvy student goes out and purchases a switch or SOHO router network admins need to make sure that the core network is as protected as it can be from accidental exposure.

Running 802.1w rapid spanning tree functions like Portfast and BPDUGuard on all user facing ports is not only best practice but should be the rule at all times. Radia Perlman gave an excellent talk about the history of spanning tree a few years ago about 10 minutes in (watch the whole thing if you haven’t already; it’s that good):

She talks about the development of spanning tree as something to mollify her bosses at DEC in the off chance that someone did something they weren’t supposed to with these fancy new Ethernet bridges. I mean, who would be careless enough to plug a bridge back into itself and flood the network with unknown unicast frames? As luck would have it, that’s *exactly* what happened the first time it was plugged in. You can never be sure that users aren’t going to shoot themselves in the foot. That’s what spanning tree really provides: peace of mind from human error.

A modern data center is a totally different animal from a campus network. Admins control access to every switch port. We know exactly where things are plugged in. It takes forms and change requests to touch anything in the server farm or the core. What advantage is spanning tree providing here? Sure, there is the off chance that I might make a mistake when recabling something. Odds are better I’m going to run into blocked links or disabled multipath connections to servers because spanning tree is doing the job it was designed to do decades ago. Data centers don’t need single paths back to a root bridge to do their jobs. They need high speed connections that allow for multiple paths to carry the maximum amount of data or provide for failover in the event of a problem.

In a perfect world, everything down to the switch would be a layer 3 connection. No spanning tree, no bridging loops. Unfortunately, this isn’t a perfect world. The data center has to be flat, sometimes flat across a large geographic area. This is because the networking inside hypervisors isn’t intelligent enough right now to understand the world beyond a MAC address lookup. We’re working on making the network smarter, but it’s going to take time. In the interim, we have to be aware that we’re reducing the throughput of a data center running spanning tree to a single link back to a root bridge. Or, we’re running without spanning tree and taking the risk that something catastrophic is going to blow up in our faces when disaster strikes.

TRILL is a better solution for the data center by far because of the multipath capabilities and failover computations. The fact that this is all accomplished by running IS-IS at layer 2 isn’t lost on me at all. Solving layer 2 issues with layer 3 designs has been done for years. But to accuse spanning tree of being evil because of all this is the wrong line of thinking. You can’t say that incandescent light bulbs are evil just because new technology like compact florescent (CFL) exists. They both serve the same purpose – to illuminate things. Sure, CFLs are more efficient for a given wattage. They also don’t produce nearly the same amount of heat. But, they are more expensive. For certain applications, like 3-way lamps and lights with dimmer switches, incandescent bulbs are still a much better and cheaper alternative. Is the solution to do away with all the old technology and force people to use new tech in an inefficient way? Or should we design around the old tech for the time being and a way to make the new tech work the way it should when we remodel?


Tom’s Take

As long as Ethernet exists, spanning tree will exist. That’s a fact of life. The risks of a meltdown due to bridging loops are getting worse with new technology. How fast do you think a 40GigE link will be able to saturate a network with unknown unicast frames in a bridging loop? Do you think even a multicore CPU would be able to stand up to that kind of abuse? The answer is instead to find new technology like TRILL and design our future around applying it in the best way possible. Spanning tree won’t go away overnight. Just like DOS, just like IPX. We can’t stop it. But we can contain it to where it belongs.