Throughout my career as a network engineer, I’ve heard lots of comparisons to emergency responders thrown around to describe what the networking team does. Sometimes we’re the network police that bust offenders of bandwidth polices. Other times there is the Network SWAT Team that fixes things that get broken when no one else can get the job done. But over and over again I hear network admins and engineers called “fire fighters”. I think it’s time to change how we look at the job of fires on the network.
Fight The Network
The president of my old company used to try to motivate us to think beyond our current job roles by saying, “We need to stop being firefighters.” It was absolutely true. However, the sentiment lacked some of the important details of what exactly a modern network professional actually does.
Think about your job. You spend most of your time implementing change requests and trying to fix things that don’t go according to plan. Or figuring out why a change six months ago suddenly decided today to create a routing loop. And every problem you encounter is a huge one that requires an “all hands on deck” mentality to fix it immediately.
People look at networks as a closed system that should never break or require maintenance of any kind until it breaks. There’s a reason why a popular job description (and Twitter handle) involves describing your networking job as a plumber or janitor. We’re the response personnel that get called out on holidays to fix a problem that creates a mess for other people that don’t know how to fix it.
And so we come to the fire fighter analogy. The idea of a group of highly trained individuals sitting around the NOC (or firehouse) waiting for the alarm to go off. We scramble to get the problem fixed and prevent disaster. Then go back to our NOC and clean up to do it all over again. People think about networking professionals this way because they only really talk to us when there’s a crisis that we need to deal with.
The catch with networking is that we’re very rarely sitting around playing ping pong or cleaning our gear. In the networking world we find ourselves being pushed from one crisis to the next. A change here creates a problem there. A new application needs changes to run. An old device created a catastrophe when a new one was brought online. Someone did something without approval that prevented the CEO from watching Youtube. The list is long and distinguished. But it all requires us to fix things at the end of the day.
The problem isn’t with the fire fighting mentality. Our society wouldn’t exist without firefighters. We still have to have protection against disaster. So how is it that we can have an ordered society with relatively few firefighters versus the kinds of chaos we see in networking?
Marshal, Marshal, Marshal
The key missing piece is the fire marshal. Firefighters put out fires. Fire marshals prevent them from happening in the first place. They do this by enforcing the local fire codes and investigating what caused a fire in the first place.
The most visible reminder of the job of a fire marshal is found in any bar or meeting room. There is almost always a sign by the door stating the maximum occupancy according to the local fire code. Occupancy limits prevent larger disasters should a fire occur. Yes, it’s a bit of bummer when you are told that no one else can enter the bar until some people leave. But the fire marshal would rather deal with unhappy patrons than with injuries or deaths in case of fire.
Other duties of the fire marshal include investigating the root cause of a fire and trying to find out how to prevent it in the future. Was is deliberate? Did it involve a faulty piece of equipment? These are all important questions to ask in order to find out how to keep things from happening all over again.
Let’s apply these ideas to the image of network professionals as firefighters. Instead of spending the majority of time dealing with the fallout from bad decisions there should be someone designated to enforce network policy and explain why those policies exist. Rather than caving to the whims of application developers that believe their program needs elevated QoS treatment, the network fire marshal can explain why QoS is tiered the way that it is and why including this new app in the wrong tier will create chaos down the road.
How about designating a network fire marshal to figure out why there was a routing table meltdown? Too often we find ourselves fixing the problem and not caring about the root cause so long as we can reassure ourselves that the problem won’t come back. A network fire marshal can do a post mortem investigation and find out which change caused the issue and how to prevent it in the future with proper documentation or policy changes.
It sounds simple and straightforward to many folks. Yet these are the things that tend to be neglected when the fires flare up and time becomes a premium resource. By having someone to fill the role of investigator and educator, we can only hope that network fires can be prevented before they start.
Networking can’t evolve if we spend the majority of our time dealing with the disasters we could have prevented with even a few minutes of education or caution. SDN seeks to prevent us from shooting off our own foot. But we can also help out by looking to the fire marshal as an example of how to prevent these network fires before they start. With some education and a bit of foresight we can reduce our workload of disaster response with disaster prevention. Imagine how much more time we would have to sit around the NOC and play ping pong?