IT And The Exception Mentality

If you work in IT, you probably have a lot in common with other IT people. You work long hours. You have the attitude that every problem can be fixed. You understand technology well enough to know how processes and systems work. It’s fairly common in our line of work because the best IT people tend to think logically and want to solve issues. But there’s something else that I see a lot in IT people. We tend to focus on the exceptions to the rules.

Odd Thing Out

A perfectly good example of this is automation. We’ve slowly been building toward a future when software and scripting does the menial work of network administration and engineering. We’ve invested dollars and hours into making interfaces into systems that allow us to repeat tasks over and over again without intervention. We see it in other areas, like paperwork processing and auto manufacturing. There are those in IT, especially in networking, that resist that change.

If you pin them down on it, sometimes the answers are cut and dried. Loss of job, immaturity of software, and even unfamiliarity with programming are common replies. However, I’ve also heard a more common response growing from people: What happens when the automation screws up? What happens when a system accidentally upgrades things when it shouldn’t? Or a system disappears because it was left out of the scripts?

The exception position is a very interesting one to me. In part, it stems from the fact that IT people are problem solvers. This is especially true if you’ve ever worked in support or troubleshooting. You only see things that don’t work correctly. Whether it’s software misconfiguration, hardware failure, or cosmic rays, there is always something that is acting screwy and it’s your job to fix it. So the systems that you see on a regular basis aren’t working right. They aren’t following the perfect order that they should be. Those issues are the exceptions.

And when you take someone that sees the exceptions all day long and tries to fix them, you start looking for the exceptions everywhere in everything. Instead of wondering how a queue moves properly at the grocery store you instead start looking at why people aren’t putting things on the conveyor belt properly. You start asking whey someone brought 17 items into the 15-items-or-less line. You see all the problems. And you start trying to solve the issues instead of letting the process work.

Let’s step back to our automation example. What happens when you put the wrong upgrade image on a device? Was there a control in place to prevent you from doing that? Did you go around it because you didn’t think it was the right way to do something? I’ve heard stories of the upgrade process not validating images because they couldn’t handle the errors if the image didn’t match the platform. So the process relies on a person to validate the image is correct in the first place. And that much human interaction in a system will still cause issues. Because people make mistakes.

Outlook on Failure

But whether or not a script is going to make an error doesn’t affect the way we look at them and worry. For IT people that are too focused on the details, the errors are the important part. They tend to forget the process. They don’t see the 99% of devices that were properly upgraded by the program. Instead, they only focus on the 1% that weren’t. They’re only interested in being proved right by their suspicions. Because they only see failed systems they need the validation that something didn’t go right.

This isn’t to say that focusing on the details is a bad thing. It’s almost necessary for a troubleshooter. You can’t figure out where a process breaks if you don’t understand every piece of it. But IT people need to understand the big picture too. We’re not automating a process because we want to create exceptions. We’re doing it because we want to reduce stress and prevent common errors. That doesn’t mean that all errors are going to go away overnight. But it does mean that we can prevent common ones from occurring.

That also means that the need for expert IT people to handle the exceptions is even more important. Because if we can handle the easy problems, like VLAN typos or putting in the wrong subnet range, you can better believe that the real problems are going to take a really smart person to figure out! That means the real value of an expert troubleshooter is using the details to figure out this exception. So instead of worrying about why something might go wrong, you can instead turn your attention to figuring out this new puzzle.


Tom’s Take

I know how hard it is to avoid focusing on exceptions because I’m one of the people that is constantly looking for them. What happens if this redistribution crashes everything? What happens if this switch isn’t ready for the upgrade? What happens if this one iPad can’t connect to the wireless. But I realize that the Brave New World of IT automation needs people that can focus on the exceptions. However, we need to focus on them after they happen instead of worrying about them before they occur.

Advertisements