Solve the Simple Problems

One thing I’ve found out over the past decade of writing is that some problems are easy enough to solve that we sometimes forget about them. Maybe it’s something you encounter once in a great while. Perhaps it’s something that needed a little extra thought or a novel reconfiguration of an existing solution. Something so minor that you didn’t even think to write it down. Until you run into the problem again.

The truth behind most of these simple problems is that the solutions aren’t always apparent. Sure, you might be a genius when it comes to fixing the network or the storage array. Maybe you figured out how to install some new software to do a thing in a way that wasn’t intended. But did you write any of it down for later use? Did you make sure to record what you’ve done so someone else can use it for reference?

Part of the reason why I started blogging was to have those written solutions to problems I couldn’t find a quick answer to. What it became was way more than I had originally intended. But the posts that I write that still get the most attention aren’t my long think pieces on the state of the networking industry or multiplied engineers. They’re the simple solutions to questions or problems that keep driving traffic here day after day.

Look Around

A lot of my great posts come from me asking simple questions. How does BPDUGuard work on a switch? Why does the Apple Watch not unlock my MacBook? What is up with this SFP not working? When you ask the questions you have to figure out the answers. And that’s the hard and rewarding part of the puzzle.

I challenge you to go search out a simple problem. Say it’s an issue with data not being shared between two devices. The search results will almost always turn up a few pages that have a litany of solutions that are basic troubleshooting steps. Things like:

  • Ensure the devices are connected
  • Reset the network settings
  • Unpair and re-pair the devices
  • Restart everything
  • Call Tech Support

You’ve probably stumbled across these before. And the sad truth is that running down that laundry list of solutions will often fix issues, which is why they keep getting boosted back into the search results. But you know what’s missing? The why of the problem. It’s not enough to just toss things at a problem in the hope that it starts working again. You have to also figure out what went wrong and why it happened.

Networking people always want to know why something went wrong because we want to make sure it doesn’t happen again. Security people are even more stringent about figuring out the why behind a problem. They want to stop a potential breach or plug a hole that needs to be dealt with. So to them a solution is just a temporary fix until you can confirm that something won’t happen again.

This is why the work that writers do is so important. We explain the why behind problems. We figure out what caused something to go off the rails and then how to fix it so it doesn’t happen again. Those are the kinds of posts that get the most attention. Because they’re specific about the fix, enlightening about the education behind the problem, and most importantly aren’t just a laundry list of fixes to throw at something until it works.


Tom’s Take

If you’re someone out there that’s looking to start writing down your solutions to problems, you need to start with the questions behind what’s going on. It’s not enough to just regurgitate the fixes and hope that one of them has some kind of magic that works. You need to investigate, understand, and explain what’s going on. Once you can do that, you will have created something that gets lots of attention and will encourage you to keep up the questions for years to come.

 

Troubleshooting and Triage

When troubleshooting any major issue, people tend to feel a bit lost at first.  There is the crowd that wants to fix the immediate problem.  Then there is the group that wants to look at everything going on and address the root problem no matter how long it takes.  The key to troubleshooting is to realize how each of these approaches has its place and how they are both right and wrong at the same time.

The first approach is triage.  Think of it like a medical emergency room.  Their purpose is to fix the immediate symptoms and stabilize the patient.  Especially critical is the stabilization part.  You can’t fix a network that has bouncing routes or intermittent bridging loops.  Often the true root cause of the problem is buried beneath a pile of other symptoms.  Only when the immediate issues are resolved does the real problem surface.  Learning how to triage problems is a very important troubleshooting skill.  It gives a quick response while allowing the worst of the issue to be dealt with.

It’s important to remember that triage is just a quick fix.  Emergency rooms would never triage a patient without following up with a more in-depth consult or return visit.  Triage fails when engineers leave the patch in place and consider it the final solution.  Most of the times I’ve seen this happen, it has been due to time constraints.  Rather than spending the time to research and test to find the true problem, people are content to make the majority of the symptoms go away no matter how briefly.  It happens all the time.

“Just make it work for now.  We’ll fix it later.”

“If we configure it like this, will it stay up until the end of the quarter?”

“We don’t have time to debate this.  The CEO wants things up NOW!”

True in-depth troubleshooting is what happens when we have time and a clear way to solve the deeper root issues.  Deep troubleshooting figures out that the cause of a route flap is actually a bad Ethernet cable.  That’s not something you can easily determine from a quick analysis.  It takes time and effort to figure out.  When I worked on an inbound desktop help desk, we tested for CD-ROM failures by flipping the IDE cables back and forth on the IDE ports on the motherboard.  In part, this was to ensure the drive failure followed the switch of cables and ports.  It also tested the cable and port to make sure the dead drive wasn’t masking a bigger failure.  It took more time to do it properly but we never ran into an issue where a good CD-ROM drive was returned and the problem persisted.
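
The cable swap described above boils down to a small decision table.  Here is a minimal sketch of that logic in Python.  The function name, its parameters, and the result labels are all hypothetical, made up for illustration rather than taken from any real diagnostic tool.

```python
def diagnose_swap(fails_on_original: bool, fails_after_swap: bool) -> str:
    """Interpret a swap test for a suspect drive.

    fails_on_original: the drive fails on its original cable and port.
    fails_after_swap:  the drive still fails after being moved to a
                       different cable and port.
    The labels returned here are illustrative assumptions.
    """
    if not fails_on_original:
        # Nothing to isolate if the fault cannot be reproduced.
        return "fault not reproduced"
    if fails_after_swap:
        # The failure followed the drive to the new cable and port.
        return "drive suspected"
    # The failure stayed behind, so the original cable or port is the suspect.
    return "cable or port suspected"


if __name__ == "__main__":
    print(diagnose_swap(True, True))
    print(diagnose_swap(True, False))
```

The point of the second branch is the one that matters most: a drive that works after the swap is a drive you should not be returning.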

In-depth troubleshooting can fail when there are so many problems masking the real issue that you start trying to fix the wrong problem.  Tunnel vision is easy to get when working on a problem.  If you tunnel in on an ancillary symptom and fail to fix the root cause you aren’t really doing much better than simple triage.  Just like a doctor, you need to ensure that you are treating the real problem under all the symptoms.  Remember not to be sidetracked by each small issue you uncover.  Fix them and keep digging for the real issue underneath it all.


Tom’s Take

I’ve had a lot of people comment that I was able to figure out problems quickly.  They also liked how I was able to “fix” things quickly.  That’s because I was very good at triage.  In my job as a VAR engineer, I didn’t really have time to dig deeper into the issue to uncover root cause.  Thankfully, a couple of the guys that I worked with were the exact opposite of me.  They loved digging into problems and pulling everything apart until they found the real issue.  They were labeled “slow” or “methodical” by some.  I loved working with them because they complemented my style perfectly.  I fix the big issues and make people happy.  They fix the underlying cause and keep them that way.  Just like ER doctors and specialists.  We both have our place.  It’s important to realize which is more important at a given time.