The Network Does Too Much


I’m at Networking Field Day this week and it’s good to be back in person around other brilliant engineers and companies. One of the other fun things that happens at Networking Field Day is that I get to chat with folks that help me think about things in new ways and come up with awesome ideas for networking blog posts.

One of the topics we discussed briefly this week really got me thinking again about fragility and complexity. Thanks to Carl Fugate for reminding me about it. Essentially, networks are inherently unstable because they are doing far too much heavy lifting.

Swiss Army Design

Have you heard about the AxeSaw Reddit? It’s a page dedicated to finding silly tools that attempt to combine too many things into one package, which makes the overall tool less useful. Like a combination shovel and axe that isn’t easy to operate because you have to hold on to the shovel scoop as the handle for the axe and so on. It’s a goofy take on a trend of trying to make things too compact at the expense of usability.

Networking has this issue as well. I’ve talked about it before here but nothing has really changed in the five years since that post. The developers that build applications and systems still rely on the network to help solve challenges that shouldn’t be solved in the network. Things like first hop redundancy protocols (FHRP) are perfect examples. Back in the dark ages, systems didn’t know how to handle what happened when a gateway disappeared. They knew they needed to find a new one eventually when the cached MAC address aged out. However, for applications that couldn’t wait there needed to be a way to pick up the packets and keep them from timing out.

Great idea in theory, right? But what if the application knew how to handle that? Things like Cisco CallManager have been able to designate backup servers for years. Applications built for the cloud know how to fail over properly and work correctly when a peer disappears or a route to a resource fails. What happened? Why did we suddenly move from a model where you have to find a way to plan for failure with stupid switching tricks to a world where software just works as long as Amazon is online?

The answer is that we removed the ability for those stupid tricks to work without paying a high cost. You want to use Layer 2 tricks to fake an FHRP? If it’s even available in the cloud you’re going to be paying a fortune for it. AWS wants you to use the tools they optimize for. If you want to do things the way you’ve always done them you can, but you need to pay for that privilege.

With cost being the primary driver for all things, increased costs for stupid switching tricks have now given way to better software development. Instead of paying thousands of dollars a month for a Layer 2 connection to run something like HSRP you can instead just make the application start searching for a new server when the old one goes away. You can write it to use DNS names instead of hard-coded IP addresses so you can load balance or load share. You can do many, many exciting and wonderful things to provide a better experience that you wouldn’t have even considered before because you just relied on the networking team to keep you from shooting yourself in the foot.
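As a rough illustration of what "searching for a new server" can look like on the application side, here is a minimal Python sketch. It assumes the service publishes several addresses behind one DNS name and that trying them in order is acceptable. The function names `resolve` and `connect_with_failover` and the host `app.example.com` are made up for this example and don’t come from any particular product.

```python
import socket
from typing import Any, Callable, List, Optional, Tuple

Address = Tuple[str, int]


def resolve(hostname: str, port: int) -> List[Address]:
    """Return each distinct (ip, port) pair DNS currently publishes for hostname."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    addresses: List[Address] = []
    for _family, _type, _proto, _canon, sockaddr in infos:
        addr = (sockaddr[0], sockaddr[1])
        if addr not in addresses:
            addresses.append(addr)
    return addresses


def connect_with_failover(
    hostname: str,
    port: int,
    timeout: float = 2.0,
    resolver: Callable[[str, int], List[Address]] = resolve,
    connector: Callable[[Address, float], Any] = socket.create_connection,
) -> Any:
    """Try every published address in turn and return the first live connection.

    The resolver and connector are parameters (an assumption of this sketch)
    so the failover logic can be exercised without a real network.
    """
    last_error: Optional[OSError] = None
    for addr in resolver(hostname, port):
        try:
            return connector(addr, timeout)
        except OSError as exc:
            # This server is gone or unreachable; move on to the next one.
            last_error = exc
    raise ConnectionError(f"no reachable server for {hostname}:{port}") from last_error


if __name__ == "__main__":
    # Hypothetical service name; replace with a real one to try it out.
    try:
        conn = connect_with_failover("app.example.com", 443)
        print("connected to", conn.getpeername())
        conn.close()
    except OSError as exc:
        print("all servers failed:", exc)
```

The point of the sketch is that the client, not a shared virtual gateway address, decides where to go next when a peer disappears. A production client would likely add retries with backoff and re-resolve the name periodically, which is left out here.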

Network Complexity Crunch

If the cloud forces people to use their solutions for reliability and such, that means the network is going to go away, right? It’s the apocalypse for engineer jobs. We’re all going to get replaced by a DevOps script. And the hyperbole continues on.

Reality is that networking engineers will still be as needed as highway engineers are even though cars are a hundred times safer now than in the middle of the 20th century. Just because something else now handles a job you used to be forced to do doesn’t mean you don’t need to build the pathway. You still need roads and networks to connect things that want to communicate.

What it means for the engineering team is an increased focus on providing optimal, reliable communications. If we remove the need to deal with crazy ARP tricks and things like that we can focus on optimizing routing to provide multiple paths and ensure systems have better communications. We could even do crazy things like remove our reliance on legacy IP because the applications will survive a transition when they aren’t beholden to an IP address or ARP to prevent failure.

Networking will become a utility. Like electricity or natural gas it won’t be visible unless it’s missing. Likewise, you don’t worry about the utility company solving issues about delivery to your home or business. You don’t ask them to provide backup pipelines or creative hacks to make it work. You are handed a pipeline that has bandwidth and the service is delivered. You don’t feel like the utility companies are outdated or useless because you’re getting what you pay for. And you don’t have to call them every time the heater doesn’t work or you flip a breaker. Because that infrastructure is on your side instead of theirs.


Tom’s Take

I’m ready for a brave new world where the network is featureless and boring. It’s an open highway with no airbags along the shoulder to prevent you from flying off the road. No drones designed to automatically pick you up and put you on the correct path to your destination if you missed your exit. The network is the pathway, not the system that provides all the connection. You need to rely on your systems and your applications to do the heavy lifting. Because we’ve finally found a solution that doesn’t allow the networking team to save the day, we can absolutely build a world where the clients are responsible for their own behavior. The network needs to do what it was designed to do and no more. That’s how you solve complexity and fragility. Fewer features means less to break.

4 thoughts on “The Network Does Too Much”

  1. This topic is systemic across our industry, not just limited to networking. Applications and services are often written with no consideration for availability and rely upon the underlying infrastructure to pick up the pieces. Other examples include shared disk clustering, VMware Fault Tolerance, and live migration of VMs (the list could go on).

    In addition to the above, many of us in the past could be accused of engineering the complexity at the infrastructure level because either a) we wanted to build bigger and better rather than simplifying and pushing back, making it someone else’s problem (e.g. the application’s), or b) we did not know any better at the time (or a combination of both).

