Everyone’s favorite time of the year is almost here! Is it because it’s the holiday season? Perhaps it’s the magic that happens at the end of the year? Or maybe, it’s because there’s an even better reason to get excited!
Change Freeze Season!
That’s right. Some of you reading this started jumping up and down like Buddy the Elf at the thought of having a change freeze. There’s something truly magical about laying down the law about not touching anything in the system until after the end-of-year reports are run and certified. For some, this means a total freeze of non-critical changes from the first of December all the way through the New Year until maybe even February. That’s a long time to have a frozen network? But why?
The Cold Shoulder
Change freezes are an easy thing to explain to the new admins. You simply don’t touch anything in the network during the freeze unless it’s broken. No tweaking. No experimenting. No improvements. Just critical break/fix changes only. There had better be a ticket. There should be someone yelling that something’s not right. Otherwise you’re in for it.
There are a ton of reasons for this. The first is something I remember from my VAR days as Boredom Repellent. When you find yourself at the end of year with nothing to do, you tend to get bored. After you’ve watched Die Hard for the fifteenth time this year you decide it’s time to clear out your project backlog. Or maybe you’ve been doing some learning modules instead. You find a great blog post from one of your favorite writers about a Great Awesome Amazing Feature That Will Save You Days Of Work If You Just Enable This One Simple Command!
In either case, the Boredom Repellent becomes like pheromones for problems. Those backlogged projects take more time than you expected. That simple feature you just need to enable isn’t so simple. It might even involve an entire code upgrade train to enable it. Pretty soon you find yourself buried in a CLI mess with people screaming about very real downtime. Now, instead of being bored you’re working until the wee hours of the night because of something you did.
The second reason for change freezes at the end of the year is management. You know, the people that call and scream at you as soon as their email appears to be running slow. The people that run reports once a month at 6:00pm and then call you because they get a funny warning message on their screen. Those folks. Guess what? End-of-year is their time to shine in all their glory.
This is usually the time they are under the most stress. Those reports have to be reprinted. All the financials from the year need to be consolidated and verified. The taxes will need to be paid. And all that paperwork and pressure adds up to stress. The kind of stress that makes any imperfections in the network seem ten times more important than before. Report screen not show success within 10 ms? Problem. Printer run out of yellow toner? Network problem. Laptop go to sleep while someone went to lunch and now the entire report is gone? Must be your problem. And guess who gets to work around the clock to solve it with someone bearing down on them from on high?
Don’t Let It Go
The fact is that we can’t have people doing things in the network without tracking those changes back to reasons. That applies for adventurous architects wanting to squeeze out the last ounce of performance from that amazing new switch. And it goes double for the CFO demanding you put his traffic into AF41 so it gets to the server faster so his reports don’t take six hours to print.
It all comes back to the simple fact that we have no way to track changes in our network and we have no way of knowing what will happen when we make one live. It feels an awful lot like this GIF:
Crazy, right? Yet every time we hit the Enter key, we are amazed at the results. Even for “modern” OSes with sanity checking, like Junos or IOS-XR, you have no way of knowing if a change you make on one device somewhere in the branch is going to crash OSPF or BGP for the entire organization. And even if there was a big loud warning popup that said, “ALERT: YOU ARE GOING TO BREAK EVERYTHING!!!”, odds are good we would just click past it.
Network automation and orchestration systems can prevent this. They can take the control of change management out of the hands of bored engineers and wrap it in process and policy. And if the policy says Change Freeze then that’s what you get. No changes. Likewise, if there is a critical need, like patching out a backdoor or something, that policy can be overridden and noted so that if there is a bug eight months from now in that code train that causes issues you can have documentation of the reason for the change when someone comes to chew you out.
Likewise, there are other solutions out there that try to prototype the entire network to figure out what will happen when you make a change. Companies like Forward Networks and Veriflow can prototype your network in a model that can assess the impact of a change before you commit to it. It’s the dream of a bored engineer because you can run simulations to your heart’s content to find out if two hours of code upgrades will really get you that 2% performance increase promised in that blog post. And for the CFO/CEO/CIO screaming at you to prioritize their traffic, these solutions can remind them that most of their traffic is Youtube and Spotify and having that at AF41 will cause massive issues for them.
What’s important is that you and the rest of the team realize that change freezes aren’t a solution to the problem of an unstable network. Instead, they are treating the symptoms that crop up from the underlying disease of the network not being a deterministic system. Unlike some other machines, networks run just fine at sub-optimal performance levels. You can make massive mistakes that will live in a network for years and never show their ugly face. That is, until you make a small change that upsets equilibrium and causes the whole system to fail, cascade style, and leave you holding the keyboard as it were.
I both love and hate Change Freeze season. I know it’s for the best because any changes that get made during this time will ultimately result in long hours at work undoing those changes. I also know that the temptation to experiment with things is very, very strong this time of year. But I feel like Change Freeze season will soon go the way of the aluminum Christmas tree when we get change management and deterministic network modeling systems in place to verify changes on a system-wide basis and not just sanity checking configs at a device level. Tracking, prototyping, and verification will solve our change freeze problems eventually. And that will make it the most wonderful time of the year all year long.
Pingback: Link Propagation 148: See You Next Year