I had the opportunity to chat with my friend Chris Marget (@ChrisMarget) this week for the first time in a long while. It was good to catch up with all the things that have been going on and reminisce about the good old days. One of the topics that came up during our conversation was around working inside big organizations and the way that change processes are built.
I worked at IBM as an intern 20 years ago and the process to change things even back then was arduous. My experience with it was the deployment procedures to set up a new laptop. When I arrived the task took an hour and required something like five reboots. By the time I left we had changed that process and gotten it down to half an hour and only two reboots. However, before we could get the new directions approved as the procedure I had to test it and make sure that it was faster and produced the same result. I was frustrated but ultimately learned a lot about the glacial pace of improvements in big organizations.
Slow and Steady Finishes the Race
Change processes work to slow down the chaos that comes from having so many things conspiring to cause disaster. Probably the most famous change management framework is the Information Technology Infrastructure Library (ITIL). That little four-letter word has caused a massive amount of headaches in the IT space. Stage 3 of ITIL is the one that deals with changes in the infrastructure. There’s more to ITIL overall, including asset management and continual improvement, but usually anyone that takes ITIL’s name in vain is talking about the framework for change management.
This isn’t going to be a post about ITIL specifically but about process in general. What is your current change management process? If you’re in a medium to large sized shop you probably have a system that requires you to submit changes, get them evaluated and approved, and then implement them on a schedule during a downtime window. If you’re in a small shop you probably just make changes on the fly and hope for the best. If you work in DevOps you probably call them “deployments” and they happen whenever someone pushes code. Whatever the actual name for the process is, you have one whether you realize it or not.
The true purpose of change management is to make sure what you’re doing to the infrastructure isn’t going to break anything. As frustrating as it is to go through the process every time, the process is the reason things don’t break. You justify your changes and evaluate them for impact before scheduling them, as opposed to following a “change and find out” kind of methodology.
Process is ugly and painful and keeps you from making simple mistakes. If every part of a change form needs to be filled out you’re going to complete it to make sure you have all the information that is needed. If the change requires you to validate things in a lab before implementation then it’s forcing you to confirm that it’s not going to break anything along the way. There’s even a process exception for emergency changes and such that are more focused on getting the system running as opposed to other concerns. But whatever the process is it is designed to save you.
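To make that concrete, here is a minimal sketch of what such a gate could look like in Python. The field names, the `ChangeRequest` class, and the `blocking_issues` function are all made up for illustration, and the rule that an emergency change skips lab validation but still needs a complete form is my assumption, not something ITIL or any particular tool mandates.

```python
from dataclasses import dataclass

# Hypothetical required fields; a real change form defines its own.
REQUIRED_FIELDS = ("summary", "justification", "impact", "rollback_plan")


@dataclass
class ChangeRequest:
    summary: str = ""
    justification: str = ""
    impact: str = ""
    rollback_plan: str = ""
    lab_validated: bool = False
    emergency: bool = False


def blocking_issues(change: ChangeRequest) -> list[str]:
    """Return the reasons this change may not proceed (empty list means go)."""
    issues = [
        f"missing field: {name}"
        for name in REQUIRED_FIELDS
        if not getattr(change, name).strip()
    ]
    # Assumption: emergency changes skip lab validation in this sketch,
    # but they still need every field of the form filled out.
    if not change.lab_validated and not change.emergency:
        issues.append("not validated in the lab")
    return issues
```

A change with only a summary filled in comes back with a list of everything that is missing, which is exactly the annoying part that keeps you from making the simple mistake.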
ITIL isn’t a pain in the ass by accident. It’s purposely built to force you to justify and document at every step of the process. It’s built to keep you from creating disaster by helping you create the paper trail that will save you when everything goes wrong.
Saving Your Time But Not Your Sanity
I used to work with a great engineer named John Pross. John wrote up all the documentation for our migrations between versions of software, including Novell NetWare and Novell GroupWise. When it came time to upgrade our office GroupWise server there was some hesitation on the part of the executive suite because they were worried we were going to run into an error and lock them out of their email. The COO asked John if he had a process he followed for the migration. John’s response was perfect in my mind:
“Yes, and I treat every migration like the first one.”
What John meant is that he wasn’t going to skip steps or take shortcuts to make things go faster. Every part of the procedure was going to be followed to the letter. And if something came up that didn’t match what he thought the output should have been it was going to stop until he solved that issue. John was methodical like that.
People like to take shortcuts. It’s in our nature to save time and energy however we can. But shortcuts are where the change process starts falling apart. If you do something different this time compared to the last ten times you’ve done it because you’re in a hurry or you think this might be more efficient without testing it you’re opening yourself up for a world of trouble. Maybe not this time but certainly down the road when you try to build on your shortcut even more. Because that’s the nature of what we do.
As soon as you start cutting corners and ignoring process you’re going to rapidly increase the risk of creating massive issues. Think about something as simple as the Windows Server 2003 shutdown dialog box. People used to reboot a server on a whim. Windows Server 2003 added a step that required you to type in a reason why you were manually shutting the server down from the console. Most people that rebooted the server fell into two camps: those that followed their process and typed in the reason for the reboot, and those that just typed “;Lea;lksjfa;ldkjfadfk” as the reason and then were confused six months later when doing the post-mortem on an issue, cursing their snarky attitude toward reboot documentation.
Saving the Day
Change process saves you in two ways. The first is really apparent: it keeps you from making mistakes. By forcing you to figure out what needs to happen along the way and document the whole process from start to finish you have all the info you need to make things successful. If there’s an opportunity to catch mistakes along the way you’re going to have every opportunity to do that.
The second way change process saves you is when it fails. Yes, no process is perfect and there are more than a few times when the best intentions coupled with a flaw in the process created a beautiful disaster that gave everyone lots of opportunity to learn. The question always comes back to what was learned in that process.
Bad change failures usually lead to a sewer pipe of blame being pointed in your direction. People use process failures as a chance to deflect blame and avoid repercussions for doing something they shouldn’t have, or to try to increase their stock in the company. The truly honest failure analysis doesn’t blame anyone but the failed process and tries to find a way to fix it.
Chris told me in our conversation that he loved ITIL at one of his former jobs because every time it failed it led to a meaningful change in the process to avoid failure in the future. These are the reasons why blameless post-mortem discussions are so important. If the people followed the process and the process failed, the people aren’t at fault. The process is incorrect or flawed and needs to be adjusted.
It’s like a recipe. If the instructions tell you to cook something for a specific amount of time and it’s not right, who is to blame? Is it you because you did what you were told? Is it the recipe? Is it the instructions? If you start with the idea that you did the process right and start trying to figure out where the process is wrong you can fix the process for next time. Maybe you used a different kind of ingredient that needs more time. Or you made it thinner than normal and that meant the listed time cooked it too long. Whatever the result, you end up documenting the process and changing things for the future to prevent that mistake from happening again.
Of course, just like all good frameworks, change processes shouldn’t be changed without analysis. Because changing something just to save time or take a shortcut defeats the whole purpose! You need to justify why changes are necessary and prove they provide the same benefit with no additional exposure or potential loss. Otherwise you’re back to making changes and hoping you don’t get burned this time.
Tom’s Take
ITIL didn’t really become a popular thing until after I left IBM but I’m sure if I were still there I’d be up to my eyeballs in it right now. Because ITIL was designed to keep keyboard cowboys like me from doing things we really shouldn’t be doing. Change management processes are designed to save us at every step of the way and make us catch our errors before they become outages. The process doesn’t exist to make our lives problematic. That’s like saying a seat belt in a car only exists to get in my way. It may be a pain when you’re dealing with it regularly but when you need it you’re going to wish you’d been using it the whole time. Trust in the process and you will be saved.