On Old Configs and Automation


I used to work with a guy who would configure servers for us and always include an extra SCSI card in the order. When I asked him about it one day, he told me, “I left it out once and it delayed the project. So now I just put them on every order.” Even after I explained, over and over, that we didn’t need it, he assured me that one day we might.

Later, when I started configuring networking gear, I would always set a telnet password on every VTY line going into the switch. One day, a junior network admin asked me why I configured every VTY line instead of just the first five like they teach in the Cisco guides. I shrugged my shoulders and just said, “That’s how I’ve always done it.”

The Old Ways

There’s no more dangerous phrase than “That’s the way it’s always been.”

Time and time again we find ourselves falling back on an old rule of thumb or an old configuration that we’ve made work for us before. It’s comfortable for the human mind to work from a point of reference toward new things, and we do it all the time, whether we’re basing a new configuration on something we’ve used before or trying to understand a new technology by comparing it to something we’ve worked on in the past.

But how many times have those old configurations caused us grief? How many times have we been troubleshooting a problem only to find that something was configured in a way it never should have been? Maybe it was an old method of doing hunt groups. Or perhaps it was a legacy QoS configuration that isn’t supported anymore.

Part of our issue as networking professionals is that we want to concentrate on the important things. We want to implement solutions and ideas that work for our needs, not get lost in the minutiae of configuration. Sure, taking a switch from bare metal to a working config is awesome the first time you do it. But doing the fifteenth one in a row is a lot less awesome. That’s why copy-and-paste configurations are so popular with people who just want to get the job done.

New Hotness

This idea of using old configurations for new things takes on even more importance when you start replacing the boring old configuration methods with new automation and API-driven configuration models. Yes, APIs make it a lot easier to configure a switch programmatically. And automation tools like Puppet and Ansible make it much faster to bring a switch from nothing to running.
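As a rough illustration, pushing a handful of VLANs through a REST-style switch API might look something like the Python sketch below. The endpoint URL, payload shape, and token are assumptions made up for the example; any real switch will have its own API schema.

import requests  # assumes the switch exposes a simple REST configuration API

API_BASE = "https://switch01.example.com/api/v1"   # hypothetical endpoint
VLANS = [{"id": 10, "name": "users"}, {"id": 20, "name": "voice"}]

def push_vlans(session: requests.Session) -> None:
    """Create each VLAN with one API call instead of typing them in by hand."""
    for vlan in VLANS:
        resp = session.post(f"{API_BASE}/vlans", json=vlan, timeout=10)
        resp.raise_for_status()  # a rejected request fails loudly, not silently

if __name__ == "__main__":
    with requests.Session() as s:
        s.headers["Authorization"] = "Bearer <token>"  # placeholder credential
        push_vlans(s)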

However, even with these faster configuration methods, are we still using old, outdated configurations to solve problems? Sure, I don’t have to worry about configuring VLANs on the switch one at a time. But if my configuration still references VLANs that are no longer in the system, it becomes very difficult to keep the newer switches running optimally. And that’s assuming the configuration is merely old and outdated. What if we’re still using deprecated commands?

APIs are great because they simply won’t accept unsupported things. But if we don’t scrub the configuration now and then to remove these old lines and protocols, we’ll quickly find ourselves in a world of trouble, because those outdated and broken pieces will bring the API to a halt. Yes, the valid commands will still be entered correctly, but if those valid commands rely on something invalid to work properly, you’re going to find things breaking very fast.
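A scrub pass doesn’t have to be fancy. Here’s a minimal Python sketch, with the caveat that the deprecated-command list is a made-up placeholder (a real one would come from the vendor’s release notes), that flags stale VLAN references and known-dead commands before a config ever reaches the API:

import re

# Illustrative placeholders only; populate from the vendor's deprecation notices.
DEPRECATED = ("old-hunt-group-command", "legacy-qos-command")

def stale_vlan_refs(config_text: str, active_vlans: set) -> list:
    """VLAN IDs referenced on access ports that no longer exist on the switch."""
    referenced = {int(v) for v in re.findall(r"switchport access vlan (\d+)", config_text)}
    return sorted(referenced - active_vlans)

def deprecated_lines(config_text: str) -> list:
    """Config lines that still use a command from the deprecated list."""
    return [line for line in config_text.splitlines()
            if any(cmd in line for cmd in DEPRECATED)]

if __name__ == "__main__":
    config = open("switch01.cfg").read()          # exported running config
    print(stale_vlan_refs(config, {10, 20, 30}))  # VLANs actually defined today
    print(deprecated_lines(config))

Run against an exported config, the two functions give you a short punch list of things to clean up before the automation tooling ever sees them.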

What makes this whole thing even more irritating is that most configurations need to be validated and approved before being implemented, which makes the process harder to manage. Part of the reason old configs live for so long is that they need weeks or months of validation before they can be deployed. When new platforms or configuration methods crop up, that can delay new equipment installation. It sometimes leads to people installing new gear with “approved” configs that might not be the best fit, just to get that new equipment into service.


Tom’s Take

So, how do we fix all this? What’s the trick? Well, it’s really a combination of things. We need to audit configs regularly to keep the old stuff from living on past its expiration date. We also need to continually resubmit new configurations to the approvals process. Just like disaster recovery documentation, configurations are living, breathing documents that should always be current and refreshed. The more current your configurations, the less likely you are to have old cruft keeping your systems running at subpar performance, and the less likely you are to break new things like APIs and automation systems in the future.

2 thoughts on “On Old Configs and Automation”

  1. Hi Tom, thanks for writing about this. I always like another perspective on this topic as our international community slowly graduates to managing old networks in a mature, modern way.
    Starting with rolling out configurations fast and consistently thanks to automation is a smart move. A possible next step is to view the config of each separate box as just a description of state, and to describe that state in a more abstract way, for example as a set of variables in a YAML file. The state describes things like interfaces, VLANs, protocol configuration, etc. You can combine your variables with a template for each type of box you manage, then render that combination into a full configuration for each box. Once you’ve got that, you can poll boxes, compare the running config to what it’s supposed to be in your repository, and remediate if necessary. The repository and automation system become the only place you touch device configurations, by changing the variables. An alternative to just comparing configurations is actively probing the box for individual settings. (A minimal sketch of this workflow appears after the comments.)

  2. I love the story with the SCSI card.
    There’s also a story about org flexibility in here – I know these orgs, I see them all the time.
    At one point SCSI was a standard technology. Ideally, that alone would be enough to justify keeping a spare card.
    They were running servers with SCSI – so if one server uses it, why not have some spares around?
    If a project got delayed ONCE because ONCE there wasn’t a card, why not suggest procuring TWO spare cards in case it happens again – and if both really end up getting used, then think about changing the standard… Nope. We put a damn unused card in every computer now: one more component to fail, need drivers, and cost money.
    How long does it take to put a SCSI card in a smaller server?
    Thirty minutes would be a lot. On a larger enterprise box it would likely take an hour or more, and that box is supposed to run for five years straight – there you’d probably just add a standard SCSI card anyway, since the downtime costs a lot more than the card. (They had PCI hotplug, so the actual downtime would be exactly zero seconds, but let’s ignore that.)

    Yet the cost- and hardware-efficient approach still requires having the flexibility and capability to put a card into a computer. More and more I see that orgs have painted themselves into a corner where their actual technical capability is lower than that of the simplest PC store.

    They let their systems fall out of maintenance and suffer for years, but they aren’t even able to replace a mainboard without MASSIVE overhead (months, and shivers), so if they grab that $200 spare part for the $400 server and it doesn’t work, they collapse.
    It should take them one day, maybe two, if that – but instead they’re better off buying a new $10,000 box (yes, they needed to replace the old thing anyway, but if they run a seven-year “till it dies” cycle with only three or five years of maintenance, they’re optimizing for problems, not for good operation).

    It is honestly sick, and it usually goes back to something like your SCSI card story.
    Basically, the ball had already been dropped when that project got delayed out of all proportion and no one intervened.
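To make the first commenter’s idea a little more concrete, here is a minimal Python sketch of rendering intended state from a YAML variable file and diffing it against what a box is actually running. The variable names, the deliberately tiny template, and the PyYAML dependency are all assumptions for the sake of the sketch; the point is only the shape of the workflow.

import difflib
from string import Template  # stdlib templating keeps the sketch dependency-light

import yaml                  # assumes PyYAML is installed

# Intended state for one box, as it might live in a repository.
STATE = yaml.safe_load("""
hostname: sw-access-01
vlans: [10, 20, 30]
""")

# A deliberately tiny per-platform template.
CONFIG_TEMPLATE = Template("hostname $hostname\n$vlan_lines\n")

def render_intended(state: dict) -> str:
    """Combine the variables with the template to get the full intended config."""
    vlan_lines = "\n".join(f"vlan {v}" for v in state["vlans"])
    return CONFIG_TEMPLATE.substitute(hostname=state["hostname"], vlan_lines=vlan_lines)

def drift(running_config: str) -> str:
    """Unified diff between what the box runs and what the repository says it should run."""
    return "\n".join(difflib.unified_diff(
        running_config.splitlines(), render_intended(STATE).splitlines(),
        fromfile="running", tofile="intended", lineterm=""))

if __name__ == "__main__":
    print(drift("hostname sw-access-01\nvlan 10\nvlan 20\nvlan 99\n"))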
