Complexity is the enemy of understanding. Think about how much time you spend in your day trying to simplify things. Complexity is the reason why things like Reddit’s Explain Like I’m Five exist. We strive in our daily lives to find ways to simplify the way things are done. Well, except in networking.
Building On Shifting Sands
Networking hasn’t always been a super complex thing. Back when bridges tied together two sections of Ethernet, networking was fairly simple. We’ve spent years trying to make the network do bigger and better things faster with less input. Routing protocols have become more complicated. Network topologies grow and become harder to understand. Protocols do magical things with very little documentation beyond “Pure Freaking Magic”.
Part of this comes from applications. I’ve made my feelings on application development clear. Ivan Pepelnjak had some great comments on this post as well from Steve Chalmers and Derick Winkworth (@CloudToad). I especially like this one:
Derick is right. Application developers have forced us to make networking do more and more, faster, with fewer humans doing the work, all to meet crazy continuous improvement and continuous development goalposts. Networking, when built properly, is a static object like the electrical grid or a plumbing system. Application developers want it to move and change and breathe with their needs when they need to spin up 10,000 containers for three minutes to run a test or increase bandwidth 100x to support a rollout of a video streaming app or a sticker-based IM program designed to run during a sports championship.
We’ve risen to meet this challenge with what we’ve had to work with. In part, it’s because we don’t like being the scapegoat for every problem in the data center. We tire of sitting next to the storage admins and complaining about the breakneck pace of IT changes. We have embraced software enhancements and tried to find ways to automate, orchestrate, and accelerate. Which is great in theory. But in reality, we’re just covering over the problem.
The solution to our software networking issues seems simple on the surface. Want to automate? Add a layer to abstract away the complexity. Want to build an orchestration system on top of that? Easy to do with another layer of abstraction to tie automation systems together. Want to make it all go faster? Abstract away!
“All problems in computer science can be solved with another layer of indirection.”
This quote is usually credited to David Wheeler and was popularized by Butler Lampson. It’s absolutely true. Developers, engineers, and systems builders keep adding layers of abstraction and indirection on top of complex systems and proclaiming that everything is now easier because it looks simple. But what happens when the abstraction breaks down?
Automobiles are a perfect example of this. Not too many years ago, automobiles were relatively simple things. Sure, internal combustion engines aren’t toys. But most mechanics could disassemble the engine and fix most issues with a wrench and some knowledge. Today’s cars have computers and diagnostics systems, and they require lots and lots of dedicated tools to even diagnose the problem, let alone fix it. We’ve traded simplicity and ease of repairability for the appearance of “simple”, which conceals a huge amount of complexity under the surface.
To refer back to the Lampson/Wheeler quote, the completion of it is, “Except, of course, for the problem of too many indirections.” Even forty years ago it was understood that too many layers of abstraction would eventually lead to problems. We are quickly reaching this point in networking today. Complex tools now provide an overwhelming amount of data about every point of the network, and we find ourselves forced to use dashboards and data lakes just to keep up. The rapid pace of change is dictated to the network by systems integrations that are driven by developer desires and not sound network systems thinking.
Networking professionals can’t keep up. Just as other systems now must be maintained by algorithms to keep pace, so too does the network find itself being run by software instead of augmented by it. Even if people wanted to make a change by hand, they could not do so safely, because manual validation would miss issues or interactions that could create havoc later on.
So how do we fix the issues? Can we just scrap it all and start over? Sadly, the answer here is a resounding “no”. We have to keep moving the network forward to match pace with the rest of IT. But we can do our part to cut down on the amount of complexity and abstraction being created in the process. Documentation is as critical as ever. Engineers and architects need to make sure to write down all the changes they make as well as their proposed designs for adding services and creating new features. Developers writing for the network need to document their APIs and their programs liberally so that troubleshooting and extension are easily accomplished instead of just guessing about what something is or isn’t supposed to be doing.
When the time comes to build something new, instead of trying to plaster over it with an abstraction, we need to break things down into their basic components and understand what we’re trying to accomplish. We need to augment existing systems instead of building new ones on top of the old to make things look easy. When we can extend existing ideas or augment them in such a way as to coexist, then we can worry less about hiding problems and more about solving them.
Abstraction has a place, just like NAT. It’s when things spiral out of control and hide the very problems we’re trying to fix that it becomes an abomination. Rather than piling things on the top of the issue and trying to hide it away until the inevitable day when everything comes crashing down, we should instead do the opposite. Don’t hide it, expose it instead. Understand the complexity and solve the problem with simplicity. Yes, the solution itself may require some hard thinking and some pretty elegant programming. But in the end that means that you will really understand things and solve the complexity conundrum.
Tom — you should look at Navigating Network Complexity… 🙂
Very interesting post with a few things to discuss, get clarified, perhaps to dispute. I’ll say at the outset that I hope I’m not just putting up straw man arguments.
“It’s when things spiral out of control and hide the very problems we’re trying to fix that it becomes an abomination.”
Does this refer to the point about orchestration on top of automation? If so, how can the networking part of the CD/CI build be delivered at a cadence matching the rest of the pipeline without automation and orchestration on top? I’m not clear what the alternative might be, especially as we try to dig out from the mountain of technical debt accumulated over the last 10-15 years so we have enough flexibility to meet the pace of business requirements.
Networking, like information security, is traditionally viewed as a gating factor in application delivery. I’d argue that even more abstraction is needed so that the networking product can be delivered more rapidly using the emerging tools depending on APIs, automation, and orchestration. Many enterprises are struggling with infrastructure as code. When the concept gets to networking it generally comes to a screeching halt.
“But we can do our part to cut down on the amount of complexity and abstraction being created in the process.”
“Don’t hide it, expose it instead. Understand the complexity and solve the problem with simplicity. Yes, the solution itself may require some hard thinking and some pretty elegant programming.”
Really unclear what this means. Expose what, the underpinnings that would otherwise be abstracted away? It’s fine to understand the details of an implementation to be able to solve a problem, but the point of APIs is to allow the programming that I think you’re referring to without worrying about the details.
Last, I found the comment interesting about a business need requiring application developers to “spin up 10,000 containers for three minutes to run a test or increase bandwidth 100x to support a rollout of a video streaming app….”
How does infrastructure, including server and storage handlers along with networking, meet this need without the abstractions provided by APIs, automation, and orchestration? Of course, most shops with real physical infrastructure on site wouldn’t be able to comply with such a request anyway, which is a whole different subject.
Great points here Rich. Yes, some clarification of thought process is in order…
The Spiraling Abomination – It is true that some level of abstraction is going to be necessary to make new networking work. I’m all for creating a layer above our technical debt to accelerate what needs to happen. But what needs to not happen is having three, four, or five layers on top of that to keep making things easy for developers to just write code that says “do stuff with things”. At the nth level of abstraction, we’ve lost our ability to see what’s going on underneath. To illustrate that point, go try to figure out what a fuel injector does inside of that little box. That’s what happens when we abstract a concept past the point of being able to troubleshoot it.
Exposure – APIs work because we dispatch commands to do things in the system without understanding the details. But aside from that, the details need to be understood by someone along the line. If we get to the point where no one remembers why we call a particular function ahead of a packet move then we’ve hit a wall that needs to be torn down. If the reason for calling that function is basic functionality, like ensuring a socket is open or that a system is reachable, then that needs to be documented somewhere. Exposure can be as simple as saying “this is what I mean when I make calls against this API”.
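To make that concrete, here is a minimal Python sketch of what "exposing" a pre-send check could look like. Everything in it is hypothetical and invented for illustration: the function names `is_reachable` and `send_payload`, the one-second timeout, and the choice of a plain TCP connect as the reachability test. The point is only that the reason for the call ahead of the send is written down where the next person will find it.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened.

    Why this exists: callers run it before sending so that a failure can be
    reported as "peer unreachable" instead of surfacing as a confusing
    error deeper in the send path. (Hypothetical helper for illustration.)
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


def send_payload(host: str, port: int, payload: bytes) -> bool:
    """Send payload to (host, port); return False if the peer is unreachable.

    The reachability check is basic functionality, not magic: it is the
    documented reason we call is_reachable() ahead of the send.
    """
    if not is_reachable(host, port):
        return False
    with socket.create_connection((host, port), timeout=1.0) as conn:
        conn.sendall(payload)
    return True
```

The check costs an extra connection per send, which a real system might not accept. The trade here is deliberate: the sketch favors a visible, documented step over a fast but unexplained one.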
The path of traditional IT is to try and meet the needs of users with the features of cloud. That’s why OpenStack and Docker are taking the data center by storm. Yes, existing IT infrastructure today is going to have a hard time meeting those challenges. But what if you are a development shop that cares less about the email server being up and more about running unit tests? You’ll find that for some enterprises, the migration to cloud has as much to do with not worrying about basic services like email and IM/chat as it does with gaining flexibility. Newer systems are rolling out with higher bandwidth connections and more dense server cores to meet the needs of specialized shops doing crazy development work in this manner.