I’ve always said that when I finally get tired of doing networking stuff, I’m going to write a book about teaching people structured troubleshooting. This would be useful for not only computer work, but a large variety of things like figuring out when your sink is backing up or how to change the serpentine belt on your car. The key to structured troubleshooting is thinking logically and taking it step by step. For the record, I’m going to call my book “Did You Plug the Damn Thing In?”
So, I figured I’d share some of my troubleshooting steps just to give people an idea about how I go about figuring out what’s wrong with something, starting with how to know you’ve got a problem.
Step 1: Find out (specifically) what’s wrong.
Yes, that sounds simplistic and a little condescending. But you’d be surprised at the number of people who can’t tell you what is wrong with something. Want to piss off your mechanic? Tell him your car is “making a funny noise”. Want to raise the ire of your network admin? Tell her the problem must be because “the network is slow”. Problems are specific. It’s not making a funny noise, it sounds like dragging a drawer full of silverware across a washboard. The network isn’t slow, it takes forever for me to open up http://www.google.com. Giving a problem a specific diagnosis means you can narrow your focus to where to start investigating. As the troubleshooter, you have to remember to ask probing questions. Don’t settle for ‘yes’ or ‘no’. Make the other person describe, in excruciating detail if necessary, what exactly is wrong. And take lots of notes. As the blog title suggests, the Apollo 13 astronauts didn’t tell Houston that their problem was a funny noise. It was a very specific diagnosis.
Step 2: Repeat the problem, if possible
Most problems are repeatable. There’s nothing worse than hearing that you’ve got a problem, only to be told “Well, it’s not happening right now. It’s pretty random.” No problem should be totally random, especially in the computer/technology realm. You should always know what steps it takes to make the problem happen. Granted, lots of catastrophic failures are sudden, such as power outages, blown fuses/power supplies, or outright parts failures. But, for the most part, things like routing loops or broadcast storms can be traced back to a specific event. Remember the old joke about the patient who tells the doctor, “Every time I move my arm like this it hurts”? All joking aside, that’s a great problem diagnosis, because you can make it happen over and over again.
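If you want to put a number on “it’s pretty random”, something like the following rough sketch can help. It is only an illustration in Python: repeat_check and failure_rate are names I made up for this example, and the check you pass in is a stand-in for whatever action triggers your problem (opening a page, pinging a host, and so on).

```python
import time
from typing import Callable, List, Tuple

# Hypothetical helper names, invented for this illustration.
Result = Tuple[int, bool, float]  # (attempt number, passed?, seconds taken)


def repeat_check(check: Callable[[], bool],
                 attempts: int = 10,
                 delay: float = 0.0) -> List[Result]:
    """Run the same check several times and record what happened each time."""
    results: List[Result] = []
    for attempt in range(1, attempts + 1):
        start = time.perf_counter()
        try:
            passed = bool(check())
        except Exception:
            # Treat any exception raised by the check as a failed attempt.
            passed = False
        elapsed = time.perf_counter() - start
        results.append((attempt, passed, elapsed))
        if delay:
            time.sleep(delay)  # optional pause between attempts
    return results


def failure_rate(results: List[Result]) -> float:
    """Fraction of attempts that failed (0.0 when there are no results)."""
    if not results:
        return 0.0
    failures = sum(1 for _, passed, _ in results if not passed)
    return failures / len(results)
```

A result like “fails 3 times out of 9, always on the third try” is a lot easier to chase than “it’s pretty random”.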
Step 3: Take LOTS of notes
I hate writing things down. My desk is filled with half-scribbled sticky notes with information that will be useless the next time I look at it. But when you start troubleshooting something, be sure to write down every detail. Why? Because you may often find yourself referring back to your notes and finding out that some minor detail at the start is the cause of, or solution to, your problem. Such as, “We were cleaning the other day in the server room.” Not really important at the beginning, but after you determine there is a network outage to the e-mail server and find out someone unplugged the cord to make it look pretty under the desk, that little nugget of information could be very handy.
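For those of us who lose sticky notes, even a tiny timestamped log beats nothing. Here is a minimal sketch in Python, not a real tool: TroubleshootingLog, Note, add, and search are all names invented for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Note:
    when: datetime  # when the detail was recorded
    text: str       # the detail itself, however minor it seems


@dataclass
class TroubleshootingLog:
    """Hypothetical in-memory notebook for one troubleshooting session."""
    notes: List[Note] = field(default_factory=list)

    def add(self, text: str, when: Optional[datetime] = None) -> Note:
        """Record a detail, stamped with the current time unless one is given."""
        note = Note(when or datetime.now(), text)
        self.notes.append(note)
        return note

    def search(self, keyword: str) -> List[Note]:
        """Return every note containing the keyword, ignoring case."""
        needle = keyword.lower()
        return [n for n in self.notes if needle in n.text.lower()]
```

Once the e-mail server drops off the network, a quick search for “server room” pulls that throwaway comment about cleaning right back up.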
At this point, you should know exactly what you are facing and how to repeat it as necessary. You should also have a head start on documenting everything so you know when you’ve fixed the problem. Next time, we’ll explore how to get info about how to fix what ails you.