A Modest Proposal for Cisco Interface Naming

If you’re going to be configuring an interface in a switch, which one are you going to use? The interface has a name and a number based on where it is on the device. The numbering part is fairly easy to figure out. The module number comes first, followed by the slot, and finally the port. In the world of Cisco, which is the one I’m the most familiar with, that means a fixed configuration switch usually has interfaces labeled 0/24, with no module and the slot almost always being zero. With a modular switch the interface would be labeled 2/0/28 to indicate the 28th port on the second line card.

The issue arises when you factor in the first part of the interface naming convention. The nomenclature used in the Cisco world since the beginning of time has been the interface speed. If your interface is a 100Mbit Ethernet interface then the interface name is FastEthernet0/48. If you’re using a 1Gbit interface it’s GigabitEthernet0/48. If it’s a 10Gbit interface it becomes TenGigabitEthernet0/48. It’s a progression of interface speeds. Even if the port is capable of using 10/100/1000 the port is referred to at the highest speed. The 10Gbit ports don’t negotiate to a lower speed so they are always TenGigabit.

Now, with the new Catalyst 9000X Series switches you have the option of a port that can do 10/100/1000/2.5G/5G/10G. It’s a multi-gigabit capable port as outlined in this video from a recent Field Day event:

I don’t even want to think about the nightmare of trying to remember the name of the interface in a situation like that. It’s hard enough still to remember what the switches are capable of, whether it’s a gigabit port or a ten gigabit port. It’s hard enough for me but it’s impossible for anyone looking to make a script or automation that needs to be repeatable over a wide variety of switches.

Naming Convention Unity

The obvious solution here is actually present in another Cisco platform already. The Nexus series of switches running Cisco NX-OS already knows how to handle interface naming issues. It’s a problem that occurs when you could be running Gigabit Ethernet or Fibre Channel over Ethernet interfaces.

How does NX-OS fix this? The interface type is always Ethernet. It doesn’t matter what speed the interface is running at because it’s always Ethernet under the hood. So the every interface is labeled Ethernet(module/slot/port). It’s wonderful when you’re working on the device because you never have to make a typo based on the interface speed. It’s also consistent across every platform and module that could be installed.

The solution to the problem actually came from my friend Bruno Wollmann (@WollmanBruno) during the event. It’s well past time to adopt the NX-OS convention in every other version of Cisco IOS. You can alias the old interface names to the new nomenclature to ease the transition if you’d like. Otherwise it’s time to start making people use the basic convention of calling everything Ethernet.

This would ease new users into IOS by making all interfaces the same and focusing on the module/slot/port complexity as the primary place to learn about naming. It also helps with the development of automation and orchestration scripts. Rather than needing to detect device types or write different scripts for NX-OS or IOS-XE you can just use the same logic for both.


Tom’s Take

It’s time we move into the future of interface names. Bruno made an excellent point when he said that it’s time to adopt a standard across the whole line of switches. Cisco may not be ready to do a single OS but having unity between the features will go a long way to making it easier to work on the various platforms. You can do a lot with aliasing in the system to ease the transition but maybe by the time we get to IOS 20 we can use something that’s more manageable for the future.

The Network Does Too Much

I’m at Networking Field Day this week and it’s good to be back in person around other brilliant engineers and companies. One of the other fun things that happens at Networking Field Day is that I get to chat with folks that help me think about things in new ways and come up with awesome ideas for networking blog posts.

One of the ones that was discussed quickly this week really got me thinking again about fragility and complexity. Thanks to Carl Fugate for reminding me about it. Essentially, networks are inherently unstable because they are doing far too much heavy lifting.

Swiss Army Design

Have you heard about the AxeSaw Reddit? It’s a page dedicated to finding silly tools that attempt to combine too many things together into one package that make the overall tool less useful. Like making a combination shovel and axe that isn’t easy to operate because you have to hold on to the shovel scoop as the handle for the axe and so on. It’s a goofy take on a trend of trying to make things too compact at the sake of usability.

Networking has this issue as well. I’ve talked about it before here but nothing has really changed in the intervening time since that post five years ago. The developers that build applications and systems still rely on the network to help solve challenges that shouldn’t be solved in the network. Things like first hop reachability protocols (FHRP) are perfect examples. Back in the dark ages, systems didn’t know how to handle what happened when a gateway disappeared. They knew they needed to find a new one eventually when the MAC address age timed out. However, for applications that couldn’t wait there needed to be a way to pick up the packets and keep them from timing out.

Great idea in theory, right? But what if the application knew how to handle that? Things like Cisco CallManager have been able to designate backup servers for years. Applications built for the cloud know how to fail over properly and work correctly when a peer disappears or a route to a resource fails. What happened? Why did we suddenly move from a model where you have to find a way to plan for failure with stupid switching tricks to a world where software just works as long as Amazon is online?

The answer is that we removed the ability for those stupid tricks to work without paying a high cost. You want to use Layer 2 tricks to fake NHRP? If it’s even available in the cloud you’re going to be paying a fortune for it. AWS wants you to use their tools that they optimize for. If you want to do things the way you’ve always done you can but you need to pay for that privileges.

With cost being the primary driver for all things, increased costs for stupid switching tricks have now given way to better software development. Instead of paying thousands of dollars a month for a layer 2 connection to run something like HSRP you can instead just make the application start searching for a new server when the old one goes away. You can write it to use DNS instead of IP so you can load balance or load share. You can do many, many exciting and wonderful things to provide a better experience that you wouldn’t have even considered before because you just relied on the networking team to keep you from shooting yourself in the foot.

Network Complexity Crunch

If the cloud forces people to use their solutions for reliability and such, that means the network is going to go away, right? It’s the apocalypse for engineer jobs. We’re all going to get replaced by a DevOps script. And the hyperbole continues on.

Reality is that networking engineers will still be as needed as highway engineers are even though cars are a hundred times safer now than in the middle of the 20th century. Just because you’ve accommodated for something that you used to be forced to do doesn’t mean you don’t need to build the pathway. You still need roads and networks to connect things that want to communicate.

What it means for the engineering team is an increased focus on providing optimal reliable communications. If we remove the need to deal with crazy ARP tricks and things like that we can focus on optimizing routing to provide multiple paths and ensure systems have better communications. We could even do crazy things like remove our reliability of legacy IP because the applications will survive a transition when they aren’t beholden to IP address or ARP to prevent failure.

Networking will become a utility. Like electricity or natural gas it won’t be visible unless it’s missing. Likewise, you don’t worry about the utility company solving issues about delivery to your home or business. You don’t ask them to provide backup pipelines or creative hacks to make it work. You are handed a pipeline that has bandwidth and the service is delivered. You don’t feel like the utility companies are outdated or useless because you’re getting what you pay for. And you don’t have to call them every time the heater doesn’t work or you flip a breaker. Because that infrastructure is on your side instead of theirs.


Tom’s Take

I’m ready for a brave new world where the network is featureless and boring. It’s an open highway with no airbags along the shoulder to prevent you from flying off the road. No drones designed to automatically pick you up and put you on the correct path to your destination if you missed your exit. The network is the pathway, not the system that provides all the connection. You need to rely on your systems and your applications to do the heavy lifting. Because we’ve finally found a solution that doesn’t allow the networking team to save the day we can absolutely build a world where the clients are responsible for their own behavior. The network needs to do what it was designed to do and no more. That’s how you solve complexity and fragility. Less features means less to break.

Chip Shortages Aren’t Sweet for Networking

Have you tried to order networking gear recently? You’re probably cursing because the lead times on most everything are getting long. It’s not uncommon to see lead times on wireless access points or switch gear reaching 180 days or more. Reports from the Internet say that some people are still waiting to get things they ordered this spring. The prospect of rapid delivery of equipment is fading like the summer sun.

Why are we here? What happened? And can we do anything about it?

Fewer Chips, More Air

The pandemic has obviously had the biggest impact for a number of reasons. When a fabrication facility shuts down it doesn’t just ramp back up. Even when all the workers are healthy and the city where it is located is open for business it takes weeks to bring everything back online to full capacity. Just like any manufacturing facility you can’t just snap your fingers and get back to churning out the widgets.

The pandemic has also strained supply chains around the world. Even if the fabs had stayed open this entire time you’d be looking at a shortage of materials to make the equipment. Global supply chains were running extremely lean in 2019 and exposing one aspect of them has created a cascade effect that has caused stress everywhere. The lack of toilet paper or lunchmeat in your grocery store shows that. Even when the supply is available the ability to deliver it is impacted.

The supply chain problem also belies the issue on the other side of the shipping container. Even if the fabs had enough chips to sell to anyone that wanted them it’s hard to get those parts delivered to the companies that make things. If this were simply an issue of a company not getting the materials it needed to make a widget in a reasonable time there wouldn’t be as much issue. But because these companies make things that other companies use to make things the hiccups in the chain are exacerbated. If TSMC is delayed by a month getting a run of chips out, that month-long delay only increases for those down the line.

We’ve got issues getting facilities back online. We’ve got supply chains causing problems all over the place. Simple economics says we should just build more facilities, right? The opportunity costs of not having enough production around means we have ample space to make more of the things we need and profit. You’re right. Companies like Intel are bringing new fabs online as fast as they can. Sadly, that is a process that is measured in months or even years. The capacity we need to offset the disruption to the chip market should have been built two years ago if we wanted it ready now.

All of these factors are mixed into one simple truth. Without the materials, manufacturing, or supply chain to deliver the equipment we’re going to be left out in the cold if we want something delivered today. Just in Time inventory is about to become Somewhere in Time inventory. We’re powerless to change the supply chain. Does that means we’re powerless to prevent disruption to our planning process?

Proactive Processes

We may not be able to assemble networking gear ourselves to speed up the process but we are far from helpless. The process and the planning around gear acquisition and deployment has to change to reflect the current state of the world. We can have an impact provided we’re ready to lead by example.

  • Procure NOW: Purchasing departments are notorious for waiting until the last minute to buy things. Part of that reasoning is that expenditures are worth less in the future than right now because those assets are more valuable today gaining interest or something. You need to go to the purchasing department and educate them about how things are working right now. Instead of them sitting on the project for another few months you need to tell them that the parts have to be ordered right now in order for them to be delivered in six or seven months. They’re going to fight you and tell you that they can just wait. However, we all know this isn’t going to clear up any time soon. If they persist in trying to tell you that you need to wait just have them try to go car shopping to illustrate the issue. If you want stuff by the end of Q1 2022, you need to get that order in NOW.
  • Preconfigure Things However You Can: If you’re stuck waiting six months to get switches and access points, are you going to be stuck waiting another month after they come in to configure them? I hope that answer is a resounding “NO”. There are resources available to make sure you can get things configured now so you’re not waiting when the equipment is sitting on a loading dock somewhere. You need to reach out to your VAR or your vendor and get some time on lab gear in the interim. If you ordered a wireless controller or a data center switch you can probably get some rack time on a very similar device or even the exact same one in a lab somewhere. That means you can work on a basic configuration or even provision things like VLANs or SSIDs so you’re not recreating the wheel when things come in. Even if all you have is a skeleton config you’re hours ahead of where you would be otherwise. And if the VAR or vendor gives you a hard time about lab gear you can always remind them that there are other options available for the next product refresh.
  • Minimum Viable Functionality: All this advice is great for a new pod or an addition to an existing network that isn’t critical. What if the gear you ordered is needed right now? What if this project can’t wait? How can you make things work today with nothing in hand? This is a bit trickier because it will require duplicate work. If you need to get things operational today you need to work with what you have today. That means you may have to salvage an old lab switch or pull something out of production and reduce available ports until the gear can arrive. It also means you’re going to have to backup the old configs, erase them completely (don’t forget about the VLAN database and VTP server configurations), and then put on the new info. When the new equipment comes in you’re going to have to do it all over again in reverse. It’s more work but it leads to things being operational today instead of constantly telling someone that it’s going to be a while. If you’re a VAR that’s doing this for a customer, you’d better make it very clear this is temporary and just a loan. Otherwise you might find your equipment being a permanent addition even after everything comes in.

Tom’s Take

The chip shortage is one of those things that’s going to linger under the best of circumstances. We’re going to be pressed to get gear in well into 2022. That means delayed projects and lots of arguing about what’s critical and what’s not. We can’t fix the semiconductor sector of the market but we can work to make sure that the impact felt there is the only one that impacts us right now. The more we do ahead of time to make things smooth the better it will be when it’s finally time to make things happen. Don’t let the lack of planning on the part of the supply chain sour your outlook on doing your role in networking.

802.11ax Is NOT A Wireless Switch

802.11ax is fast approaching. Though not 100% ratified by the IEEE, the spec is at the point where most manufacturers and vendors are going to support what’s current as the “final” version for now. While the spec for what marketing people like to call Wi-Fi 6 is not likely to change, that doesn’t mean that the ramp up to get people to buy it is showing any signs of starting off slow. One of the biggest problems I see right now is the decision by some major AP manufacturers to call 802.11ax a “wireless switch”.

Complex Duplex

In case you had any doubts, 802.11ax is NOT a switch.1 But the answer to why that is takes some explanation. It all starts with the network. More specifically, with Ethernet.

Ethernet is a broadcast medium. Packets are launched into the network and it is hoped that the packet finds the destination. All nodes on the network listen and, if the packet isn’t destined for them they discard it. This is the nature of the broadcast. If multiple stations try to talk at once, the packets collide and no one hears anything. That’s why Ethernet developed a collision detection  system called CSMA/CD.

Switches solved this problem by segmenting the collision domain to a single port. Now, the only communications between the stations would be in the event that the switch couldn’t find the proper port to forward a packet. In every other case, the switch finds where the packet is meant to be sent and forwards it to that location. It prevents collisions by ensuring that no two stations can transmit at any one time except to the switch in the middle. This also allows communications to be full-duplex, meaning the stations can send and receive at the same time.

Wireless is a different medium. The AP still speaks Ethernet, and there is a bridge between the Ethernet interface and the radios on the other side. But the radio interfaces work differently than Ethernet. Firstly, they are half-duplex only. That means that they have to send traffic or listen to receive traffic but they can’t do both at the same time. Wireless also uses a different version of collision detection called CSMA/CA, where the last A stands for “avoidance”. Because of the half-duplex nature of wireless, clients have a complex process to make sure the frequency is clear before transmitting. They have to check whether or not other wireless clients are talking and if the ambient RF is within the proper thresholds. After all those checks are confirmed, then the client transmits.

Because of the half-duplex wireless connection and the need for stations to have permission to send, some people have said that wireless is a lot like an Ethernet hub, which is pretty accurate. All stations and APs exist in the contention (collision) domain. Aside from the contention algorithm, there’s nothing to stop the stations from talking all at once. And for the entire life of 802.11 so far, it’s worked. 802.11ac started to introduce more features designed to let APs send frames to multiple stations at the same time. That’s what’s called Multi-user Multi-In, Multi-out (MU-MIMO).In theory, it could allow for full-duplex transmissions by allowing a client to send on one antenna and receive on another, but utilizing client radios in this way has impacts on other things.

Switching It Up

Enter 802.11ax. The Wi-Fi 6 feature that has most people excited is Orthogonal Frequency-Division Multiple Access (OFDMA). Simply put, OFDMA allows the clients and APs to not use the entire transmission channel for sending data. It can be sliced up into sub-channels that can be used for low-bandwidth applications to reserve time to talk to the AP. Combined with enhanced MU-MIMO support in 802.11ax, the idea is that clients can talk directly to the AP and allocate a specific sub-channel resource unit all to themselves.

To the marketing people in the room, this sounds just like a switch. Reserved channels, single station access, right? Except it is still not a switch. The AP is still a bridge between two media types for one thing, but more importantly the transmission medium still hasn’t magically become full-duplex. Stations may get around this with some kind of trickery, but they still need to wait for the all-clear to send data. Remember that all stations and APs still hear all the transmissions. It’s still a broadcast medium at the most basic. No amount of software configuration is going to fix that. And for the networking people in the room that might be saying “so what?”, remember when Cisco tried to sell us on the idea that StackWise was capable of 40Gbps of throughput because it could send in both directions on the StackWise ring at once? Remember when you started screaming “THAT’S NOT HOW BANDWIDTH WORKS!!!” That’s what this is, basically. Smoke and mirrors and ignoring the underlying physical layer constraints.

In fact, if you read the above resources, you’re going to find a lot of caveats at the end about support for protocols coming up and not being in the first version of the spec. That’s exactly what happened with 802.11ac. The promise of “gigabit Wi-Fi” took a couple of years and the MU-MIMO enhancements everyone was trumpeting never fully materialized. Just like all technology, the really good stuff was deferred to the next release.

To make sure that both sides are heard, it is rightly pointed out by wireless professionals like Sam Clements (@Samuel_Clements) that 802.11ax is the most “switch-like” so far, with multiple dynamic collision domains. However, in the immortal words of Tyler Durden, “Sticking feathers up your butt doesn’t make you a chicken.” The switch moniker is still a marketing construct and doesn’t hold any water in reality aside from a comparison to a somewhat similar technology. The operation of wireless APs may be hub-like or switch-like, but these devices are not either of those types of devices.

CPU Bound and Determined

The other issue that I see that prevents this from becoming a switch is the CPU on the AP becoming a point of contention. In a traditional Ethernet switch, the forwarding hardware is a specialized ASIC that is optimized to forward packets super fast. It does this with some trickery, including cut through for packets and trusting the incoming CRC. When packets bounce up to the CPU to be process-switched, it bogs the entire system down terribly. That’s why most networking texts will tell you to avoid process switching at all costs.

Now apply those lessons to wireless. All this protocol enhancement is now causing the CPU to have to do extra duty to work on time-slicing and sub-channel optimization. And remember that those CPUs are operating on 18-28 watts of power right now. Maybe the newer APs will get over 30 watts with new PoE options, but that means those CPUs are still going to be eating a lot of power to process all this extra software work. Even adding dedicated processing power to the AP isn’t going to fix things in the long run. That might be one of the reasons why Cisco has been pushing enhanced PoE in the run-up to their big 802.11ax launch at the end of April.


Tom’s Take

Let me say it again for the cheap seats: 802.11ax is NOT a wireless switch! The physical layer technology that 802.11 is built on won’t be switchable any time soon. 802.11ax has given us a lot of enhancements in the protocol and there is a lot to be excited about, like OFDMA, BSS coloring, and TWT. But, like the decision to over-simplify the marketing name, the idea of calling it a wireless switch just to give people a frame of reference so they buy more of them is just silly. It’s disingenuous and sounds more like a snake oil salesman than honest technology marketing. Rather than trying to trick the users with cute sounding terms, how about we keep the discussion honest and discuss the pros and cons of the technology?

Special thanks to my friends in the wireless space for proofreading this post and correcting my errors in technology:


  1. The title was kind of a spoiler ↩︎

Network Visibility with Barefoot Deep Insight

As you may have heard this week, Barefoot Networks is back in the news with the release of their newest product, Barefoot Deep Insight. Choosing to go down the road of naming a thing after what it actually does, Barefoot has created a solution to finding out why network packets are behaving the way they are.

Observer Problem

It’s no secret that modern network monitoring is coming out of the Dark Ages. ping, traceroute, and SNMP aren’t exactly the best tools to be giving any kind of real information about things. They were designed for a different time with much less packet flow. Even Netflow can’t keep up with modern networks running at multi-gigabit speeds. And even if it could, it’s still missing in-flight data about network paths and packet delays.

Imagine standing outside of the Holland Tunnel. You know that a car entered at a specific time. And you see the car exit. But you don’t know what happened to the car in between. If the car takes 5 minutes to traverse the tunnel you have no way of knowing if that’s normal or not. Likewise, if a car is delayed and takes 7-8 minutes to exit you can’t tell what caused the delay. Without being able to see the car at various points along the journey you are essentially guessing about the state of the transit network at any given time.

Trying to solve this problem in a network can be difficult. That’s because the OS running on the devices doesn’t generally lend itself to easy monitoring. The old days of SNMP proved that time and time again. Today’s networks are getting a bit better with regard to APIs and the like. You could even go all the way up the food chain and buy something like Cisco Tetration if you absolutely needed that much visibility.

Embedding Reporting

Barefoot solves this problem by using their P4 language in concert with the Tofino chipset to provide a way for there to be visibility into the packets as they traverse the network. P4 gives Tofino the flexibility to build on to the data plane processing of a packet. Rather than bolting the monitoring on after the fact you can now put it right along side the packet flow and collect information as it happens.

The other key is that the real work is done by the Deep Insight Analytics Software running outside of the switch. The Analytics platform takes the data collected from the Tofino switches and starts processing it. It creates baselines of traffic patterns and starts looking for anomalies in the data. This is why Deep Insight claims to be able to detect microbursts. Because the monitoring platform can analyze the data being fed to it and provide the operator with insights.

It’s important to note that this is info only. The insights gathered from Deep Insight are for informational purposes. This is where the skill of network professional comes into play. By gaining perspective into what could be causing issues like microbursts from the software you gain the ability to take your skills and fix those issues. Perhaps it’s a misconfigured ECMP pair. Maybe it’s a dead or dying cable in a link. Armed with the data from the platform, you can work your networking magic to make it right.

Barefoot says that Deep Insight builds on itself via machine learning. While machine learning is seems to be one of the buzzwords du jour it could be hoped that a platform that can analyze the states of packets can start to build an idea of what’s causing them to behave in certain ways. While not mentioned in the press release, it could also be inferred that there are ways to upload the data from your system to a larger set of servers. Then you can have more analytics applied to the datasets and more insights extracted.


Tom’s Take

The Deep Insight platform is what I was hoping to see from Barefoot after I saw them earlier this year at Networking Field Day 14. They are taking the flexibility of the Tofino chip and the extensibility of P4 and combining them to build new and exciting things that run right alongside the data plane on the switches. This means that they can provide the kinds of tools that companies are willing to pay quite a bit for and do it in a way that is 100% capable of being audited and extended by brilliant programmers. I hope that Deep Insight takes off and sees wide adoption for Barefoot customers. That will be the biggest endorsement of what they’re doing and give them a long runway to building more in the future.

Network Longevity – Think Car, Not iPhone

One of the many takeaways I got from Future:Net last week was the desire for networks to do more. The presenters were talking about their hypothesized networks being able to make intelligent decisions based on intent and other factors. I say “hypothesized” because almost everyone admitted that we aren’t quite there. Yet. But the more I thought about it, the more I realized that perhaps the timeline for these mythical networks is a bit skewed in favor of refresh cycles that are shorter than we expect.

Software Eats The World

SDN has changed the way we look at things. Yes, it’s a lot of hype. Yes, it’s an overloaded term. But it’s also the promise of getting devices to do much more than we had ever dreamed. It’s about automation and programmability and, now, deriving intent from plain language. It’s everything we could ever want a simple box of ASICs to do for us and more.

But why are we asking so much? Why do we now believe that the network is capable of so much more than it was just five years ago? Is it because we’ve developed a revolutionary new method for making chips that are ten times smarter and a hundred times faster? No. In fact, the opposite is true. Chips are a little faster and a little cheaper, but they aren’t custom construction. Most companies are embracing merchant silicon from companies like Broadcom and Mellanox. So, where’s all this functionality coming from?

Its’ the “S” in SDN. We’re seeing software driving this development. Yes, it’s the same thing we’ve been saying for years now. But it’s something more now. Admins and engineers aren’t just excited that network devices are more software now. They’re expecting it. They’re looking to the software to drive the development. Gone are the days of CLI-only access. Now, the interfaces are practically built on API-driven capabilities. The Aruba 8400 seems like it was built for the API first, with the GUI being a superset of API functions. People getting into the networking world are focusing on things like Python first and CLI syntax second. Software is winning again.

It’s like the iPhone revolution. The hardware is completely divorced from the software. I can upgrade to the latest version of Apple’s iOS and get most of the functionality I want in the OS. Aside from some hardware specific things the majority of the functions work no matter what. Every year, I get new toys to play with. Every 12 months I can enable new functions and have new avenues available to me. If you think back to the first iterations of the iPhone ten years ago, you can see how far software development has driven this device.

Hardware For The Long Haul

However, for all the grandeur and amazement that iPhone (and Android for that matter) show, it’s also created a side effect that is causing even more problems in the mobile device world that is bleeding over into other electronics. That is the rapid replacement cycle. I mentioned above how awesome it is that you can get new functions every year in your mobile device. But I didn’t mention how people are already looking forward to the newest hardware even though they may have purchased a new phone just 10 months ago.

The desire to get faster by leaps and bounds every year is almost comical at this point. When the new device is delivered an it’s just a bit faster than the last, people are disappointed. Even I was guilty of this when I moved from a 2013 MacBook to a 2016 model. It was only 20% faster. Even with everything else I got in the new package, I found myself chasing performance over every other feature.

The desire to get better performance isn’t just limited to phones and tablets and laptops. When network designers look to increase network performance, they want to do so in a way that makes their users happy. Gone are the days when a simple gigabit network connection would keep someone pleased. We now have to develop gigabit wireless connections to the backbone network, where data is passed to servers connected by 25Gig connections to spines and super spines running at 100Gig (for now). In some cases, that data doesn’t even move any more, and instead short-lived containers are spun up next to it to make things run faster.

Networks aren’t designed like iPhones. They aren’t built to be ripped out every year and replaced with something a little faster and a little shinier. They’re designed more like cars in that regard. They are large purchases that should last for years upon years, with their performance profile dictated by the design at the time. We’ve gone through the phases of running 10Mbit hubs. We upgraded to FastEthernet. We’re now living in a cycle where Gigabit and 10/40 Gigabit switches are ruling the roost. Some of the more forward-thinking folks are adopting 25/50Gigabit and even looking to faster technologies like 400Gig connections for spines.

However, the longevity of hardware remains. Capital expenditure isn’t the easiest thing in the world to accomplish. You need to budget for new devices in the networking world. You need to justify cost savings or new applications. You can’t just buy a fast new switch or two every year and claim that it makes everything better without some kind of justification. Even if you move to the cloud with your strategy, you’re not changing the purchasing model. You’re just trading capital expenditure to operational expenditure. You’re leasing a car instead of buying it. Because if you leave the cloud, you’re back to your old switching model with nothing to show for it.


Tom’s Take

I was pretty hard on the Future:Net presenters because I felt that their talk of ditching money grubbing vendors for the purity of whitebox switching running open source software was a bit heavy handed. Most organizations are in the middle of a refresh cycle and can’t afford to rip everything out and replace it today. Moreover, the value in whitebox switching isn’t realized when you install new hardware. It’s realized when you turn your switch into an iPhone. Where software development rules the day and your hardware fades away in the background. We’re not there yet. We’re still buying hardware for the sake of hardware as the software is catching up. Networking equipment is still built and bought like a car and will be for a few years to come. Maybe we can revisit this topic in 3 years and see how far software will have driven us by then.

Networking Grows To Invisibility

Cat5

Networking is done. The way you have done things before is finished. The writing has been on the wall for quite a while now. But it’s going to be a good thing.

The Old Standard

Networking purchase models look much different today than they have in the past. Enterprises no longer buy a switch or a router. Instead, they buy solution packages. The minimum purchase unit is a networking pod or rack. Perhaps your proof-of-concept minimum is a leaf-spine of no less than 3 switches. Firewalls are purchased in pairs. Nowhere in networking is something simple any longer.

With the advent of software, even the deployment of these devices is different. Automation and orchestration systems provide provisioning as the devices are brought online. Network Monitoring Systems ensure the devices are operating correctly via API call instead of relying on SNMP. Analytics and telemetry systems can pull statistics on the fly and create datasets that give you insight into all manner of network traffic. The intelligence built into the platform supporting the hardware is more apparent than ever before.

Networking is no longer about fast connectivity speed. Instead, networking is about stability. Providing a transport network that stays healthy instead of growing by leaps and bounds every few years. Organizations looking to model their IT departments after service providers and cloud providers care more about having a reliable system than the most cutting edge technology.

This is nothing new in IT. Both storage and virtualization have moved in this direction for a while. Hardware wizardry has been replaced by software intelligence. Custom hardware is now merchant-based and easy to replace and build. The expertise in deployment and operations has more to do with integration and architecture than in simple day-to-day setup.

The New Normal

Where does that leave networkers? Are we a dying breed, soon to join the Unix admins of the word and telco experts on a beach in retirement? The reality is that things aren’t as dire for us as one might believe.

It is true that we have shifted our thinking away from operations and more toward system building. Rather than worry if the switch ports have been provisioned, we instead look at creating resilient constructs that can survive outages and traffic spikes. Networks are becoming the utility service we’ve always hoped they would be.

This is not the end. It’s the beginning. As networks join storage and compute as utilities in the data center, the responsibilities for our sphere of wizardry are significantly reduced. Rather than spending our time solving crazy user or developer problems, we can instead focus on the key points of stability and availability.

This is going to be a huge shift for the consumers of IT as well. As cloud models have already shown us, people really want to get their IT on their schedules. They want to “buy” storage and networking when it’s needed without interruption. Creating a utility resource is the best way to accomplish that. No longer will the blame for delays be laid at the feet of IT.

But at the same time, the safety net of IT will be gone as well. Unlike Chief Engineer Scott, IT can’t save the day when a developer needs to solve a problem outside of their development environment. Things like First Hop Reachability Protocols (FHRP), multipathing, and even vMotion contribute to bad developer behavior. Without these being available in a utility IT setup, application writers are going to have to solve their own problems with their own tools. While the network team will end up being leaner and smarter, it’s going to make everything run much more smoothly.


Tom’s Take

I live for the day when networking is no different than the electrical grid. I would rather have a “dumb” network that provides connectivity rather than hoping against hope that my “smart” network has all the tricks it needs to solve everyone’s problem. When the simplicity of the network is the feature and we don’t solve problems outside the application stack, stability and reliability will rule the day.

Two Takes On ASIC Design

Making ASICs is a tough task. We learned this last year at Cisco Live Berlin from this conversation with Dave Zacks:

Cisco spent 6 years building the UADP ASIC that powers their next generation switches. They solved a lot of the issues with ASIC design and re-spins by creating some programmability in the development process.

Now, watch this video from Nick McKeown at Barefoot Networks:

Nick says many of the same things that Dave said in his video. But Nick and Barefoot took a totally different approach from Cisco. Instead of creating programmable elements in the ASIC design, then abstracted the entire language of function definition from the ASIC. By using P4 as the high level language and making the system compile the instruction sets down to run in the ASIC, they reduced the complexity, increased the speed, and managed to make the system flexible and capable of implementing new technologies even after the ASIC design is set in stone.

Oh, and they managed to do it in 3 years.

Sometimes, you have to think outside the box in order to come up with some new ideas. Even if that means you have to pull everything out of the box. By abstracting the language from the ASIC, Barefoot not only managed to find a way to increase performance but also to add feature sets to the switch quickly without huge engineering costs.

Some food for thought.

HPE Networking: Past, Present, and Future

hpe_pri_grn_pos_rgb

I had the chance to attend HPE Discover last week by invitation from their influencer team. I wanted to see how HPE Networking had been getting along since the acquisition of Aruba Networks last year. There have been some moves and changes, including a new partnership with Arista Networks announced in September. What follows is my analysis of HPE’s Networking portfolio after HPE Discover London and where they are headed in the future.

Campus and Data Center Divisions

Recently, HPE reorganized their networking division along two different lines. The first is the Aruba brand that contains all the wireless assets along with the campus networking portfolio. This is where the campus belongs. The edge of the network is an ever-changing area where connectivity is king. Reallocating the campus assets to the capable Aruba team means that they will do the most good there.

The rest of the data center networking assets were loaded into the Data Center Infrastructure Group (DCIG). This group is headed up by Dominick Wilde and contains things like FlexFabric and Altoline. The partnership with Arista rounds out the rest of the switch portfolio. This helps HPE position their offerings across a wide range of potential clients, from existing data center infrastructure to newer cloud-ready shops focusing on DevOps and rapid application development.

After hearing Dom Wilde speak to us about the networking portfolio goals, I think I can see where HPE is headed going forward.

The Past: HPE FlexFabric

As Dom Wilde said during our session, “I have a market for FlexFabric and can sell it for the next ten years.” FlexFabric represents the traditional data center networking. There is a huge market for existing infrastructure for customers that have made a huge investment in HPE in the past. Dom is absolutely right when he says the market for FlexFabric isn’t going to shrink the foreseeable future. Even though the migration to the cloud is underway, there are a significant number of existing applications that will never be cloud ready.

FlexFabric represents the market segment that will persist on existing solutions until a rewrite of critical applications can be undertaken to get them moved to the cloud. Think of FlexFabric as the vaunted buggy whip manufacturer. They may be the last one left, but for the people that need their products they are the only option in town. DCIG may have eyes on the future, but that plan will be financed by FlexFabric.

The Present: HPE Altoline

Altoline is where HPE was pouring their research for the past year. Altoline is a product line that benefits from the latest in software defined and webscale technologies. It is technology that utilizes OpenSwitch as the operating system. HPE initially developed OpenSwitch as an open, vendor-neutral platform before turning it over to the Linux Foundation this summer to run with development from a variety of different partners.

Dom brought up a couple of great use cases for Altoline during our discussion that struck me as brilliant. One of them was using it as an out-of-band monitoring solution. These switches don’t need to be big or redundant. They need to have ports and a management interface. They don’t need complexity. They need simplicity. That’s where Altoline comes into play. It’s never going to be as complex as FlexFabric or as programmable as Arista. But it doesn’t have to be. In a workshop full of table saw and drill presses, Altoline is a basic screwdriver. It’s a tool you can count on to get the easy jobs done in a pinch.

The Future: Arista

The Arista partnership, according to Dom Wilde, is all about getting ready for the cloud. For those customers that are looking at moving workloads to the cloud or creating a hybrid environment, Arista is the perfect choice. All of Arista’s recent solution sets have been focused on providing high-speed, programmable networking that can integrate a number of development tools. EOS is the most extensible operating system on the market and is a favorite for developers. Positioning Arista at the top of the food chain is a great play for customers that don’t have a huge investment in cloud-ready networking right now.

The question that I keep coming back to is…when does this Arista partnership become an acquisition? There is a significant integration between the two companies. Arista has essentially displaced the top of the line for HPE. How long will it take for Arista to make the partnership more permanent? I can easily foresee HPE making a play for the potential revenues produced by Arista and the help they provide moving things to the cloud.


Tom’s Take

I was the only networking person at HPE Discover this year because the HPE networking story has been simplified quite a bit. On the one hand, you have the campus tied up with Aruba. They have their own story to tell in a different area early next year. On the other hand, you have the simplification of the portfolio with DCIG and the inclusion of the Arista partnership. I think that Altoline is going to find a niche for specific use cases but will never really take off as a separate platform. FlexFabric is in maintenance mode as far as development is concerned. It may get faster, but it isn’t likely to get smarter. Not that it really needs to. FlexFabric will support legacy architecture. The real path forward is Arista and all the flexibility it represents. The question is whether HPE will try to make Arista a business unit before Arista takes off and becomes too expensive to buy.

Disclaimer

I was an invited guest of HPE for HPE Discover London. They paid for my travel and lodging costs as well as covering event transportation and meals. They did not ask for nor were they promised any kind of consideration in the coverage provided here. The opinions and analysis contained in this article represent my thoughts alone.

OpenFlow Is Dead. Long Live OpenFlow.

The King Is Dead - Long Live The King

Remember OpenFlow? The hammer that was set to solve all of our vaguely nail-like problems? Remember how everything was going to be based on OpenFlow going forward and the world was going to be a better place? Or how heretics like Ivan Pepelnjak (@IOSHints) that dared to ask questions about scalability or value of application were derided and laughed at? Yeah, good times. Today, I stand here to eulogize OpenFlow, but not to bury it. And perhaps find out that OpenFlow has a much happier life after death.

OpenFlow Is The Viagra Of Networking

OpenFlow is not that much different than Sildenafil, the active ingredient in Vigara. Both were initially developed to do something that they didn’t end up actually solving. In the case of Sildenafil, it was high blood pressure. The “side effect” of raising blood pressure in a specific body part wasn’t even realized until after the trials of the drug. The side effect because the primary focus of the medication that was eventually developed into a billion dollar industry.

In the same way, OpenFlow failed at its stated mission of replacing the forwarding plane programming method of switches. As pointed out by folks like Ivan, it had huge scalability issues. It was a bit clunky when it came to handling flow programming. The race from 1.0 to 1.3 spec finalization left vendors in the dust, but the freeze on 1.3 for the past few years has really hurt innovation. Objectively, the fact that almost no major shipping product uses OpenFlow as a forwarding paradigm should be evidence of it’s failure.

The side effect of OpenFlow is that it proved that networking could be done in software just as easily as it could be done in hardware. Things that we thought we historically needed ASICs and FPGAs to do could be done by a software construct. OpenFlow proved the viability of Software Defined Networking in a way that no one else could. Yet, as people abandoned it for other faster protocols or rewrote their stacks to take advantage of other methods, OpenFlow did still have a great number of uses.

OpenFlow Is a Garlic Press, Not A Hammer

OpenFlow isn’t really designed to solve every problem. It’s not a generic tool that can be used in a variety of situations. It has some very specific use cases that it does excel at doing, though. Think more like a garlic press. It’s a use case tool that is very specific for what it does and does that thing very well.

This video from Networking Field Day 13 is a great example of OpenFlow being used for a specific task. NEC’s flavor of OpenFlow, ProgrammableFlow, is used on conjunction with higher layer services like firewalls and security appliances to mitigate the spread of infections. That’s a huge win for networking professionals. Think about how hard it would be to track down these systems in a network of thousands of devices. Even worse, with the level of virulence of modern malware, it doesn’t take long before the infected system has infected others. It’s not enough to shut down the payload. The infection behavior must be removed as well.

What NEC is showing is the ultimate way to stop this from happening. By interrogating the flows against a security policy, the flow entries can be removed from switches across the network or have deny entries written to prevent communications. Imagine being able to block a specific workstation from talking to anything on the network until it can be cleaned. And have that happen automatically without human interaction. What if a security service could get new malware or virus definitions and install those flow entries on the fly? Malware could be stopped before it became a problem.

This is where OpenFlow will be headed in the future. It’s no longer about adapting the problems to fit the protocol. We can’t keep trying to frame the problem around how much it resembles a nail just so we can use the hammer in our toolbox. Instead, OpenFlow will live on as a point protocol in a larger toolbox that can do a few things really well. That means we’ll use it when we need to and use a different tool when needed that better suits the problem we’re actually trying to solve. That will ensure that the best tool is used for the right job in every case.


Tom’s Take

OpenFlow is still useful. Look at what Coho Data is using it for. Or NEC. Or any one of a number of companies that are still developing on it. But the fact that these companies have put significant investment and time into the development of the protocol should tell you what the larger industry thinks. They believe that OpenFlow is a dead end that can’t magically solve the problems they have with their systems. So they’ve moved to a different hammer to bang away with. I think that OpenFlow is going to live a very happy life now that people are leaving it to solve the problems it’s good at solving. Maybe one day we’ll look back on the first life of OpenFlow not as a failure, but instead as the end of the beginning of it become what it was always meant to be.