Network Firefighters or Fire Marshals?


Throughout my career as a network engineer, I’ve heard lots of comparisons to emergency responders thrown around to describe what the networking team does. Sometimes we’re the network police that bust offenders of bandwidth policies. Other times there is the Network SWAT Team that fixes things that get broken when no one else can get the job done. But over and over again I hear network admins and engineers called “fire fighters”. I think it’s time to change how we look at the job of fighting fires on the network.

Fight The Network

The president of my old company used to try to motivate us to think beyond our current job roles by saying, “We need to stop being firefighters.” It was absolutely true. However, the sentiment lacked some of the important details of what exactly a modern network professional actually does.

Think about your job. You spend most of your time implementing change requests and trying to fix things that don’t go according to plan. Or figuring out why a change six months ago suddenly decided today to create a routing loop. And every problem you encounter is a huge one that requires an “all hands on deck” mentality to fix it immediately.

People look at networks as a closed system that should never break or require maintenance of any kind until it breaks. There’s a reason why a popular job description (and Twitter handle) involves describing your networking job as a plumber or janitor. We’re the response personnel that get called out on holidays to fix a problem that creates a mess for other people that don’t know how to fix it.

And so we come to the fire fighter analogy. The idea of a group of highly trained individuals sitting around the NOC (or firehouse) waiting for the alarm to go off. We scramble to get the problem fixed and prevent disaster. Then go back to our NOC and clean up to do it all over again. People think about networking professionals this way because they only really talk to us when there’s a crisis that we need to deal with.

The catch with networking is that we’re very rarely sitting around playing ping pong or cleaning our gear. In the networking world we find ourselves being pushed from one crisis to the next. A change here creates a problem there. A new application needs changes to run. An old device created a catastrophe when a new one was brought online. Someone did something without approval that prevented the CEO from watching YouTube. The list is long and distinguished. But it all requires us to fix things at the end of the day.

The problem isn’t with the fire fighting mentality. Our society wouldn’t exist without firefighters. We still have to have protection against disaster. So how is it that we can have an ordered society with relatively few firefighters versus the kinds of chaos we see in networking?

Marshal, Marshal, Marshal

The key missing piece is the fire marshal. Firefighters put out fires. Fire marshals prevent them from happening in the first place. They do this by enforcing the local fire codes and investigating what caused a fire in the first place.

The most visible reminder of the job of a fire marshal is found in any bar or meeting room. There is almost always a sign by the door stating the maximum occupancy according to the local fire code. Occupancy limits prevent larger disasters should a fire occur. Yes, it’s a bit of a bummer when you are told that no one else can enter the bar until some people leave. But the fire marshal would rather deal with unhappy patrons than with injuries or deaths in case of fire.

Other duties of the fire marshal include investigating the root cause of a fire and trying to find out how to prevent it in the future. Was it deliberate? Did it involve a faulty piece of equipment? These are all important questions to ask in order to find out how to keep things from happening all over again.

Let’s apply these ideas to the image of network professionals as firefighters. Instead of spending the majority of time dealing with the fallout from bad decisions there should be someone designated to enforce network policy and explain why those policies exist. Rather than caving to the whims of application developers that believe their program needs elevated QoS treatment, the network fire marshal can explain why QoS is tiered the way that it is and why including this new app in the wrong tier will create chaos down the road.

How about designating a network fire marshal to figure out why there was a routing table meltdown? Too often we find ourselves fixing the problem and not caring about the root cause so long as we can reassure ourselves that the problem won’t come back. A network fire marshal can do a post mortem investigation and find out which change caused the issue and how to prevent it in the future with proper documentation or policy changes.

It sounds simple and straightforward to many folks. Yet these are the things that tend to be neglected when the fires flare up and time becomes a premium resource. By having someone to fill the role of investigator and educator, we can only hope that network fires can be prevented before they start.

Tom’s Take

Networking can’t evolve if we spend the majority of our time dealing with the disasters we could have prevented with even a few minutes of education or caution. SDN seeks to prevent us from shooting off our own foot. But we can also help out by looking to the fire marshal as an example of how to prevent these network fires before they start. With some education and a bit of foresight we can reduce our workload of disaster response with disaster prevention. Imagine how much more time we would have to sit around the NOC and play ping pong?


The Splinter In Your Mind


You don’t know what it is, but it’s there, like a splinter in your mind, driving you mad. – Morpheus

We’ve all had a moment when we’re troubleshooting an issue and something just doesn’t feel right.  Or we’ve put together a solution and it works but there’s a little voice in the back of your mind telling you that something is missing.  You can’t quite put your finger on it but you know you can’t rest until you’ve figured it out.

The quote above comes from the first Matrix movie, when Morpheus is trying to explain to Thomas Anderson exactly why he feels so out of place in the world.  The term actually predates that movie, having been the title of an excellent Star Wars novel.  Splinter of the Mind’s Eye was released in 1978, so it’s almost as old as I am!  The term describes that feeling you get when something is nagging away at you and won’t let go.  I get it quite often.  Sometimes I recognize a person but can’t remember who they are.  Other times I can’t remember a critical step to a project.  But it comes very often when I’m troubleshooting a problem and the solution is just agonizingly out of reach.

How do you combat the splinter?  What can you do to overcome that feeling that will drive you crazy in short order?  Here are a few things I do.  Sometimes they help, sometimes they don’t.  But the idea is to try and dislodge the splinter and get your thought process rolling again.

Think It Through

This is probably my favorite solution.  When I’m faced with a tough problem and an elusive solution, my first step is to walk through the problem step by step.  If it’s a routing loop, I talk my way through the installation of the route into the routing table.  If the problem is a layer 2 issue, I think through the packet as it goes through the network.  The key is that you envision every step along the way.  Often our minds get distracted by an unimportant step and leave it out.  By going back through and thinking of every piece you often force an overlooked concern to the surface.  This can cause the splinter to move and create a new line of thinking.  Perhaps that routing loop is being caused by a redistribution?  Maybe you didn’t know the network used to run RIP and now is running OSPF.  By imagining the packet moving through the network, you can understand where the problem can occur.
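The packet walk can also be done mechanically. Here is a minimal Python sketch, assuming a made-up set of routers (R1 through R4) that each hold a single next hop for the destination. The router names, the topology, and the trace_path helper are all hypothetical and only illustrate the idea of following every step until something repeats.

```python
def trace_path(next_hop, start, max_hops=30):
    """Follow next hops from `start` and report where the packet ends up.

    `next_hop` maps a router name to the router it forwards to. A value of
    None (or a missing entry) means the packet stops at that router.
    Returns (path, looped).
    """
    path = [start]
    seen = {start}
    current = start
    for _ in range(max_hops):
        nxt = next_hop.get(current)
        if nxt is None:
            return path, False      # packet stopped here, no loop
        path.append(nxt)
        if nxt in seen:
            return path, True       # we have visited this router before
        seen.add(nxt)
        current = nxt
    return path, True               # hop limit exhausted: treat as a loop


# Hypothetical topology: R3 learned a redistributed route pointing back at R2.
looping = {"R1": "R2", "R2": "R3", "R3": "R2"}
healthy = {"R1": "R2", "R2": "R3", "R3": "R4", "R4": None}

if __name__ == "__main__":
    print(trace_path(looping, "R1"))
    print(trace_path(healthy, "R1"))
```

Writing out the path hop by hop is the same exercise as talking it through. The value is in being forced to name every step, including the one you would have skipped.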

You can also speak out loud when thinking things through.  I find it very useful to actually speak the words as I’m thinking.  That’s because my brain runs much faster than my mouth.  By forcing myself to put the thoughts into words, I can usually slow things down long enough to figure out the missing steps.

Draw It Out

If the problem is a little more nebulous, you might need a piece of paper or a whiteboard to draw things.  I’m not the best artist in the world, but I know that a crude diagram of what I’m thinking about will help me visualize things in a new way.  Maybe I forgot to add a piece to the drawing that fixes the issue.  Other times it just helps me think about the problem.  By filtering the splinter in your mind through another creative process like drawing, you can force it out in a different way.  I was very fond of this at my old job when I had a wall-sized whiteboard.  There’s no reason you can’t do it with a regular piece of paper though.  Colored pencils or markers can also help peel apart the layers of the issue.

Forget About It

Yes, it is strange advice to just forget about a problem.  But, ask yourself how many times you’ve stumbled onto the solution when you’re taking a shower or just about to fall asleep?  The brain is a miraculous computer, but sometimes it has a focus problem.  If you think about something for too long, you can get fatigued and lose your ability to apply critical reasoning.  It’s a “forest for the trees” kind of issue.

I always made it a point when I was troubleshooting a really hard problem to walk away for a few minutes.  Whether it was stepping out to get something to eat or just walking into a conference room for five minutes, I always tried to find some time to clear my mind and refocus on the situation.  By thinking about a shopping list or an order form or even the batting order of the 1962 Yankees you can jar the splinter loose and create new connections.  I always joked with my coworkers that the most efficient way to solve high severity issues was to install a shower in my office.  They didn’t find it nearly as funny as I did.

Tom’s Take

I can’t promise that these solutions are going to fix that nagging feeling in the back of your mind.  Some problems are just that tough.  But when you’ve applied every bit of critical reasoning you can to an issue and you’ve reached the point where you’re stuck but just can’t let go, sometimes it helps to apply one of the above methods.

If you let the splinter fester in the back of your mind, you’ll constantly be asking yourself what you can do or what you need to look at to fix things.  It will eventually consume you if you let it.  Instead, you should look at a way to move the splinter.  If you can do that you’ll sleep better at night.

Troubleshooting and Triage

When troubleshooting any major issue, people tend to feel a bit lost at first.  There is the crowd that wants to fix the immediate problem.  Then there is the group that wants to look at everything going on and address the root problem no matter how long it takes.  The key to troubleshooting is to realize how each of these approaches has its place and how they are both right and wrong at the same time.

The first approach is triage.  Think of it like a medical emergency room.  Their purpose is to fix the immediate symptoms and stabilize the patient.  Especially critical is the stabilization part.  You can’t fix a network that has bouncing routes or intermittent bridging loops.  Often the true root cause of the problem is buried beneath a pile of other symptoms.  Only when the immediate issues are resolved does the real problem surface.  Learning how to triage problems is a very important troubleshooting skill.  It gives a quick response while allowing the worst of the issue to be dealt with.

It’s important to remember that triage is just a quick fix.  Emergency rooms would never triage a patient without following up with a more in-depth consult or return visit.  Triage fails when engineers leave the patch in place and consider it the final solution.  Most times that I’ve seen this approach have been due to time constraints.  Rather than spending the time to research and test to find the true problem people are content to make the majority of the symptoms go away no matter how briefly.  It happens all the time.

“Just make it work for now.  We’ll fix it later.”

“If we configure it like this, will it stay up until the end of the quarter?”

“We don’t have time to debate this.  The CEO wants things up NOW!”

True in-depth troubleshooting is what happens when we have time and a clear way to solve the deeper root issues.  Deep troubleshooting figures out that the cause of a route flap is actually a bad Ethernet cable.  That’s not something you can easily determine from a quick analysis.  It takes time and effort to figure out.  When I worked on an inbound desktop help desk, we tested for CD-ROM failures by flipping the IDE cables back and forth on the IDE ports on the motherboard.  In part, this was to test to ensure the drive failure followed the switch of cables and ports.  In addition, it also tested the cable and port to make sure the dead drive wasn’t masking a bigger failure.  It took more time to do it properly but we never ran into an issue where a good CD-ROM drive was returned and the problem persisted.

In-depth troubleshooting can fail when there are so many problems masking the real issue that you start trying to fix the wrong problem.  Tunnel vision is easy to get when working on a problem.  If you tunnel in on an ancillary symptom and fail to fix the root cause you aren’t really doing much better than simple triage.  Just like a doctor, you need to ensure that you are treating the real problem under all the symptoms.  Remember not to be sidetracked by each small issue you uncover.  Fix them and keep digging for the real issue underneath it all.

Tom’s Take

I’ve had a lot of people comment that I was able to figure out problems quickly.  They also liked how I was able to “fix” things quickly.  That’s because I was very good at triage.  In my job as a VAR engineer, I didn’t really have time to dig deeper into the issue to uncover root cause.  Thankfully, a couple of the guys that I worked with were the exact opposite of me.  They loved digging into problems and pulling everything apart until they found the real issue.  They were labeled “slow” or “methodical” by some.  I loved working with them because they complemented my style perfectly.  I fix the big issues and make people happy.  They fix the underlying cause and keep them that way.  Just like ER doctors and specialists.  We both have our place.  It’s important to realize which is more important at a given time.

IT Jugglers

Juggle Balls

I once interviewed for a job where the interviewer asked how I decided to work on tasks. He said, “There are two kinds of workers. The first concentrates on a task and does nothing else until it is completed. They can only do one thing at a time. Then, there are the jugglers. Which one are you?” When I responded that I tended toward the latter, the interviewer smiled.  That was obviously the answer he was looking for.

IT is very much defined by focus. Being able to work on a project until it is totally finished is a very admirable quality. In my experience, especially in the VAR world, it is equally important to be able to shift your focus quickly to other tasks that require attention. As indicated above, it’s not unlike juggling. Being able to focus on a project for a few hours or days and then move to a different project for a few hours can be a very critical skill for high level engineers.

Technology has been doing this for years. Think about a preemptive multitasking CPU. It appears to be doing many things at once. It’s really executing instructions for a given process for a period of time (a timeslice). Because you can process enough instructions in that time to accomplish a function it all appears to work like magic. The key is to tune the processor to use the right timeslices. If the timeslice is too long the processor will sit idle waiting for the program to generate new instructions. If the timeslice is too short the program won’t be able to execute enough instructions during the window and the program will appear unresponsive. Just like a juggler, it’s all about the timing.
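To make the timing trade-off concrete, here is a rough Python sketch of a round-robin scheduler with a fixed timeslice. It is a toy model, not how any real operating system scheduler is built. The task names and work units are invented for the example.

```python
from collections import deque


def round_robin(tasks, timeslice):
    """Simulate round-robin scheduling with a fixed timeslice.

    `tasks` maps a task name to the units of work it needs.
    Returns (finish_times, turns): the clock tick at which each task
    completed, and how many turns the CPU handed out in total.
    """
    queue = deque(tasks.items())
    clock = 0
    turns = 0
    finish_times = {}
    while queue:
        name, remaining = queue.popleft()
        ran = min(timeslice, remaining)   # run for one slice or until done
        clock += ran
        turns += 1
        remaining -= ran
        if remaining > 0:
            queue.append((name, remaining))  # back of the line
        else:
            finish_times[name] = clock
    return finish_times, turns


# Hypothetical workload: one long job and one tiny interactive one.
workload = {"backup": 10, "keystroke": 1}

if __name__ == "__main__":
    print(round_robin(workload, timeslice=10))
    print(round_robin(workload, timeslice=2))
```

With a timeslice of 10 the tiny task waits behind the whole backup and finishes at tick 11 after only two turns. With a timeslice of 2 it finishes at tick 3, but the CPU has to hand out six turns to get everything done. Every one of those extra turns is a hand-off, which is the juggler’s problem in miniature.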

Choosing what to juggle in IT is almost as important as knowing how to do it. When you are just starting out with juggling, you use safe, soft objects to contain the damage. You don’t start off with chainsaws and molotov cocktails. When juggling IT projects, be sure to juggle those that don’t have hard deadlines or require critical path updates on a regular basis. If you’re required to provide a weekly update on an installation, be sure you’ve allocated enough time during the week to do something. Otherwise, that weekly installation report is going to look pretty thin.

When learning to juggle, most people spend entirely too much time worrying about the ball in their hand.  They tend to lose focus of all the other objects floating in the air.  That’s why they tend to start dropping them.  In the same way, you can’t be so dialed in on one project that you completely neglect all the other things going on.  Finding a good point to stop one task and start working on another is a very fine art.

This isn’t for everyone.  If you’re a person that can’t shift focus fast enough to keep all the balls (or projects) in the air without dropping something, you should avoid working on many things at once.  There’s no shame in having laser focus on something.  It works well for a lot of folks.  It gets hard things done right.  It’s just another way to get the job done.

Tom’s Take

I’m a juggler.  I try to keep everything going at once while I wrap up what I can.  I do my best to avoid dropping things, but something slips through from time to time.  I also taught myself to juggle in real life.  I can keep three tennis balls going with no issues.  I realize my limitations, though.  I know that more than that is too many.  In the project space, I know that having more than I can handle is bad for everything, so I try to keep my focus on a manageable amount of juggled things.  It’s better to juggle a few things well than juggle an impressive number of things poorly.  I’ll let you know when I work my way up to the chainsaws.

Solarwinds – The Right Tool For A New Job


The first presentation of Networking Field Day 5 day 2 was from our old friends at Solarwinds.  We heard from them before at NFD3, but the nice thing about Solarwinds is that they’ve always got new tools coming out.  I’ve also served as a Thwack Ambassador on their forums and been featured as an IT Spotlight Blogger.  I wanted to see what Solarwinds would bring to the table at NFD5.

The geeks from Solarwinds started out with a quick overview of the tool portfolio.  One thing to take note of: most of the tools that you use as standalone products are actually integrated into the larger Orion platform.  Solarwinds makes some of them available as free downloads for trials or point solutions.  You can get all of them together in one big toolbox, provided you have the horsepower to run it all.  I tend to lean more toward the “right tool, right job” mentality rather than getting the whole box.  For every IP SLA monitor crescent wrench I use regularly, there are a multitude of metric socket sets and emergency brake tools that I may never even touch.  That’s why it’s great when Solarwinds makes their software available to all for only the investment of a registration.

You’ll also notice in the video around 20 minutes in, I mention something about Solarwinds and SDN.  Colin McNamara (@colinmacnamara) chided me a bit about “SDN washing” of their technology.  Colin does have a point about overuse of SDN to describe everything under the sun.  Sanjay Castelino even made a post to the effect that what Solarwinds is doing isn’t SDN.  In a sense, he’s right.  These tools aren’t network programmability or overlay networking or even automation.  To me though, a part of what Solarwinds is doing falls under the SDN spectrum in that they can program different devices from a single interface.  Sure, it’s not the sexy sports car idea of network slicing and service instantiation that others are looking at.  Even the ability to quickly configure devices and pull pertinent info from them is better than some of what we’ve got going on right now.  This software allows you to define parameters and configuration in your network.  That’s SDN of some flavor to me.  Maybe not mocha SDN with sprinkles but something a bit different.

This led to a bit of a derailment of the conversation.  The delegates seized on the Solarwinds development model of “giving the customers what they want.”  I’d heard this many times before, so it wasn’t necessarily new to me.  What’s key to me in that message is that you’re going to have a lot of content customers.  Not necessarily happy, but content.  The key difference to me comes from the model.  If you give the customers what they want, they will be pacified.  All their desires are met and they can do their jobs.  However, if you can break outside of the demand-based model and show them something they never knew they needed, you have a real chance to make them deliriously happy.  Think about something like the iPad.  Did we know we needed it before it was released?  Not likely.  Now think about how many people have jumped at the chance to own a tablet device.  If those companies had simply been giving their customers what they asked for think about the market that would have been missed.  I’m not saying that Solarwinds is doing a bad job by any means.  I just think they need to get a geek in the house working on crazy stuff that will make people say “holy cow!!!”

Solarwinds talked to us about their newest network monitoring pieces.  They’ve got some very interesting tools, including Network Performance Monitor.  There was also some discussion around their IP Address Management (IPAM) tool, which is what I wrote about during my Thwack Ambassadorship.  Thankfully, we had Terry Slattery in the room.  Terry loves the network monitoring discussions, having founded Netcordia and released NetMRI for that purpose before it was purchased by Infoblox.  Terry has seen a lot, and he’s not afraid to tell you what he thinks.  When we discussed the features of User Device Tracker (UDT), he asked if it can do a time-based report on unused switch ports.  When the answer wasn’t clear, he told the geeks, “If you can’t do that, you need to write that down.” We all had a couple of good jokes at their expense, but the fact is that when Terry tells you something is important, especially when it comes to network monitoring, the chances are it’s really important.

Solarwinds is also getting into the API game with SWIS – Solarwinds Information Service.  This SOAP interface (soon to be REST) gives you the ability to write programs to pull data from the network and insert/update the same in many devices.  See what I’m talking about with SDN and the ability to pull info from the network and push it back again?  I think Solarwinds really needs to focus their efforts in this area and drive some more programmability from their tools rather than the old methods of just hiding CLI command pushes and things of that nature.  By allowing users to code to an API, you’ve just abstracted all of the icky parts of the backend away and focused the conversation where it needs to be – on getting problems solved.

If you’d like to learn more about Solarwinds, be sure to check them out at  You can also follow them on Twitter as @solarwinds.  Be sure to check out their discussion forums at

Tom’s Take

Solarwinds has awesome tools.  They’re going to have awesome tools in the future.  But they’ve hit on some pieces of the puzzle that are going to do much more than that.  Beyond giving us a toolbox with fancy handles and shiny stickers, they’ve started to do what a lot of other people have done and give us designs for what we should build with the tools they’ve given us.  By expanding into that area of allowing us to program to APIs and put the pieces into a bigger context, they have the ability to transcend being a point product vendor releasing neat toys.  When you can be a meaningful discussion point in any monitoring and management meeting without being dismissed as just a niche player, that’s handy indeed.

Tech Field Day Disclaimer

Solarwinds was a sponsor of Network Field Day 5.  As such, they were responsible for covering a portion of my travel and lodging expenses while attending Network Field Day 5.  In addition, Solarwinds provided me with breakfast at the hotel.  They also gave the delegates a t-shirt and a messenger bag, along with all the stickers and buttons we could fit into our carry ons.  At no time did they ask for, nor were they promised any kind of consideration in the writing of this review.  The opinions and analysis provided within are my own and any errors or omissions are mine and mine alone.

Why Is My SFP Not Working?


It’s 3 am. You’ve just finished installing your new Catalyst switches into the rack and you’re ready to turn them up and complete your cutover. You’ve been fighting for months to get the funding to get these switches so your servers can run at full gigabit speed. You had to cut some corners here and there. You couldn’t buy everything new, so you’re reusing as much of your old infrastructure as possible. Thankfully, the last network guy had the foresight to connect the fiber backbone at gigabit speeds. You turn on your switches and wait for the interminably long ASIC and port tests to complete. As you watch the console spam scroll up on your screen, you catch sight of something that makes your blood run cold:

%GBIC_SECURITY_CRYPT-4-VN_DATA_CRC_ERROR: GBIC in port 65586 has bad crc
%PM-4-ERR_DISABLE: gbic-invalid error detected on Gi1/0/50, putting Gi1/0/50 in err-disable state

Huh?!? Why aren’t my fiber connections coming up? Am I going to have to roll the install back? What is going on here?!?

You will see this error message if you have a third party SFP inserted into the Catalyst switch. While Cisco (and many others) OEM their SFP transceivers from different companies, they all have a burned-in chip that contains info such as serial number, vendor ID, and security info like a Cyclic Redundancy Check (CRC). If any of this info doesn’t match the database on the switch, the OS will mark the SFP as not supported and disable the port. The fiber connection won’t come up and you’ll find yourself screaming at a terminal window at 3:30 in the morning.
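If you have a lot of console output to wade through, a few lines of Python can pick out the affected ports for you. This is only a sketch. It assumes the message format matches the two lines quoted above, which may differ on other platforms or software versions, and find_err_disabled is a helper name I made up for the example.

```python
import re

# Pattern based only on the ERR_DISABLE line quoted in this post.
ERR_DISABLE = re.compile(
    r"%PM-4-ERR_DISABLE: (?P<cause>\S+) error detected on (?P<port>[^\s,]+)"
)


def find_err_disabled(log_lines, cause="gbic-invalid"):
    """Return the ports that were err-disabled for the given cause."""
    ports = []
    for line in log_lines:
        match = ERR_DISABLE.search(line)
        if match and match.group("cause") == cause:
            ports.append(match.group("port"))
    return ports


console = [
    "%GBIC_SECURITY_CRYPT-4-VN_DATA_CRC_ERROR: GBIC in port 65586 has bad crc",
    "%PM-4-ERR_DISABLE: gbic-invalid error detected on Gi1/0/50, "
    "putting Gi1/0/50 in err-disable state",
]

if __name__ == "__main__":
    print(find_err_disabled(console))
```

Knowing exactly which ports tripped on gbic-invalid tells you how many transceivers you are dealing with before you decide what to do next.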

Why do vendors do this? Some claim it’s vendor lock-in. You are stuck ordering your modules from the vendor at an inflated cost instead of buying them from a different source. Others claim it’s to help TAC troubleshoot the switch better in case of a failure. Still others say that it’s because the manufacturing tolerances on the vendor SFPs are much better than the third party offerings, even from the same OEM. I don’t have the answer, but I can tell you that Cisco, HP, Dell, and many others do this all the time.

HP is the most curious case that I’ve run into. Their old series A SFP modules (HP calls them mini-GBICs) didn’t even have an HP logo. They bore the information from Finisar, an electronics OEM. The above scenario happened to me when I traded out a couple of HP 2848 switches for some newer 2610s. The fiber ports locked up solid and would not come alive for anything. I ended up putting the old switches back in place as glorified fiber media converters until I figured out that new SFPs were needed. While not horribly expensive, it did add a non-trivial cost to my project, not to mention all the extra hours of troubleshooting and banging my head against a wall.

Cisco has an undocumented and totally unsupported solution to this problem. Once you start getting the console spam from above, just enter these commands:

service unsupported-transceiver
no errdisable detect cause gbic-invalid

These commands are both hidden, so you can’t ? them. When you enter the first command, you get the Ominous Warning Message of Doom:

Warning: When Cisco determines that a fault or defect can be traced to the use of third-party transceivers installed by a customer or reseller, then, at Cisco’s discretion, Cisco may withhold support under warranty or a Cisco support program. In the course of providing support for a Cisco networking product Cisco may require that the end user install Cisco transceivers if Cisco determines that removing third-party parts will assist Cisco in diagnosing the cause of a support issue.

It goes without saying that calling TAC with a non-Cisco SFP in the slot is going to get you an immediate punt or request to remove said offending SFP. You’ll likely argue that you know the issue isn’t with the SFP that was working just fine an hour ago. They will counter with not being able to support non-Cisco gear. You’ll complain that removing the SFP will create additional connectivity issues and eventually you’ll hang up in frustration. So, don’t call TAC if you use this command. In fact, I would counsel that you should only use this command as a short term band-aid to get you out of the data center at 3 am so you can order genuine SFPs the next morning. Sadly, I also know how budgets work and how likely you are to get several hundred dollars of extra equipment you “forgot” to order. So caveat implementor.

Data Never Lies


If you’ve been watching the media in the last couple of weeks, you’ve probably seen the spat that has developed between John Broder of the New York Times and Elon Musk of Tesla Motors.  Broder took a Tesla Model S sedan on a test drive from New Jersey to Connecticut to test out the theory that the new supercharger stations that have been installed along the way would help electric cars to take long road trips without fear of running out of electricity.  Along the way, he ran into some difficulty and ultimately needed to have the car towed to a charging station.  After the story came out, Elon Musk immediately defended his product with a promise of data to support that assertion.  A couple of days later, he put up a long post on the Tesla blog with lots of charts, claiming that the Model S had lots of data to support longer driving distances, failure to fully charge at supercharger stations, and even that Broder was driving in circles in a parking lot.  After this post, Broder responded with another post of his own clarifying the rebuttal made by Musk and reaffirming how the test was carried out.  It’s certainly made for some interesting press releases and blog posts.  There has also been a greater discussion about how we present facts and data in a case to support our argument or prove the other party is wrong.

Data Doesn’t Lie

If nothing else, Elon Musk did the right thing by attaching all manner of charts and graphs to his blog post.  He provided data (albeit collated and indexed) from the vehicle that gave a more precise picture of what went on than the recollection of a reporter that admittedly didn’t remember what he did or didn’t do during portions of the test drive.  Data never lies.  It’s a collection of facts and information that tells a single story.  If a value equals 7, there’s nothing else it could be.  However, the failing in data usually doesn’t come from the data itself.  It comes from interpretation.

Data Doesn’t Lie.  People Do.

The problem with the Elon Musk post is that he used the data to support his assertion that Broder did things like taking a long detour through Manhattan and driving in circles for half a mile in a parking lot in an attempt to force the car to completely discharge its battery.  This is the part where the narrative starts to break down and where most critics are starting their analysis.  Musk was right to include the data.  However, the analysis he offers is a bit wild.  Does rapid acceleration and deceleration over a short span of distance mean Broder was driving in circles attempting to drain the car?  Or was he lost in the dark, trying to find the charging station in the middle of the night like he claims in his rebuttal?  The data can only tell us what the car did.  It can’t explain the intentions of someone that wasn’t being monitored by sensors.

Let The Data Do The Talking

How does this situation apply to us in the networking/virtualization/IT world?  We find ourselves adrift in a sea of data.  We have protocols providing us status information and feeding us statistics around the clock.  We have systems that will correlate that data and provide a big picture.  We have systems to aggregate the correlated data and sort it into action items and critical alert levels.  With all this data, it’s very easy for us to make assumptions about what we see.  The human brain wants to make patterns out of what we see in front of us.  The problem comes when the conclusion we reach is incorrect.  We may have a preconceived notion of what we want the data to say.  Sometimes it’s confirmation bias.  Other times it’s reporting bias.  We come to incorrect conclusions because we keep trying to make the data tell our story instead of listening to what the data tells us.  Elon Musk wanted the data to tell him (and us) that his car worked just fine and that the driver must have had some ulterior motive.  John Broder used the same data to support that while his recollection of some finer details wasn’t accurate in the original article, he harbored no malice during his test.  The data didn’t lie in either case.  We just have to decide whose story is more accurate.

Tom’s Take

The smartest thing that you can do when providing network data or server statistics is leave your opinion out of it.  I make it a habit to give all the data I can to the person requesting it before I ever open my mouth.  Sure, people pay me to look at all that information and make sense of it.  Yes, I’ve been biased in my conclusions before.  I realize that I’m nowhere near neutral in many of my interpretations, whether it be defending the actions of myself or my team or using the data to support the correctness of a customer’s assumptions.  The key to preventing a back-and-forth argument is to simply let the data do all the talking for you.  If the data never lies, it can’t possibly lose the argument.  Let the data help you.  Don’t make the data do your dirty work for you.