Do you have a door that sticks in your house? If it’s made out of wood the odds are good that you do. The kind that doesn’t shut properly or sticks out just a touch too far and doesn’t glide open like it used to. I’ve dealt with these kinds of things for years and Youtube is full of useful tricks to fix them. But all those videos start with the same tip: you have to find the place where the door is rubbing before you can fix it.
Enterprise IT is no different. We have to find the source of friction before we can hope to repair it. Whether it’s friction between people and hardware, users and software, or teams going at each other we have to know what’s causing the commotion before we can repair it. Just like with the sticking door, adding more force without understand the friction points isn’t a long-term solution.
Sticky Wickets
Friction comes from a variety of sources. People don’t understand how to use a device or a program. Perhaps it’s a struggle to understand who is supposed to be in charge of a change control or a provisioning process. It could even be as simple as someone applying outside issues to their regular day and causing problems because their interactions with previously stable systems is stilted.
Whatever the reasons for the friction, we need to understand what’s going on before we can start fixing. If we don’t know what we’re trying to solve we’re just going to throw solutions at the wall until something sticks. Then we hope that was the fix and we move on. Shotgun troubleshooting is never a permanent solution to anything. Instead, we need to take repeatable, documented steps to resolve the points of friction.
Ironically enough, the easiest issue to solve is the interpersonal kind. When teams argue about roles or permissions or even who should have the desks at the front of the data center it’s almost always a problem of a person against another person. You can’t patch people. You can’t upgrade people. You can’t even remove people and replace them with a new model (unless it’s a much bigger issue and HR needs to solve it). Instead, it’s a chance to get people talking. Be productive and make sure that everyone knows the outcome is the resolution of the problem. It’s not name calling or posturing. Lay out what the friction point is and make people talk about that. If you can keep them focused on the problem at hand and not at each others’ throats you should be able to get everyone to recognize what needs to change.
Mankind Versus Machines
People fighting people is a people problem. But people against the enterprise system isn’t always a cut-and-dried situation. That’s because machines are predictable in so many different kinds of ways. You can be sure that what you’re going to get is the right answer every time but you may not be able to ask the right questions to find that answer the way you want to.
Think back to how many times you’ve diagnosed a problem only to hit a wall. Maybe it’s that the CPU utilization on a device is higher than it should be. What next? Is it some software application? A bug in the system? Maybe a piece of malware running rampant? If it’s a networking device is it because of a packet flow causing issues? Or a failed table lookup causing a process switch to happen? There are a multitude of things that need to be investigated before you can decide on a course of action.
This is why shotgun troubleshooting is so detrimental to reducing friction. How do we know that the solution we tried isn’t making the problem worse? I wrote over ten years ago about removing fixes that don’t address the problem and it’s still very true today. If you don’t back out the things that didn’t address the issue you’re not only leaving bad fixes in place but you are potentially causing problems down the line when those changes impact other things.
Finding the sources of friction in your systems takes in-depth troubleshooting. Your users just want the problem to go away. They don’t want to answer forty questions about when it started or what’s been installed. They don’t want the perfect solution that ensures the problem never comes back. They just want to be able to work right now and then keep moving forward. That means you need a two-step process of triage and investigation. Get the problem under control and then investigate the friction points. Figure out how it happened and fix it after you get the users back up and running.
Lastly, document the issue and what resolved it. Write it all down somewhere, even if it’s just in your own notes. But if you do that, make sure you have a way of indexing everything so you can refer back to it at some point in the future. Part of the reason why I started this blog was to write down solutions to problems I discovered or changed along the way to ensure that I could always look them up. If you can’t publish them on the Internet or don’t feel comfortable writing it all up, at least use a personal database system like Notion to keep it all in one searchable place. That way you don’t forget how clever you are and go back to reinventing the wheel over and over again every time the problem comes up.
Tom’s Take
Friction exists everywhere. Sometimes it’s necessary to keep us from sliding all over the road or keeping a rug in place in the living room. In enterprise IT friction can create issues with system and teams. Reducing it as much as possible keeps the teams working together and being as productive as possible. You can’t eliminate it completely and you shouldn’t remove it just for the sake of getting rid of something. Instead, analyze what you need to improve or fix and document how you did it. Like any door repair, the results should be immediate and satisfying.