I didn’t get a chance to attend Networking Field Day Exclusive at Juniper NXTWORK 2019 this year but I did get to catch some of the great live videos that were recorded and posted here. Mist, now a Juniper Company, did a great job of talking about how they’re going to be extending their AI-driven networking into the realm of wired networking. They’ve been using their AI virtual assistant, named “Marvis”, for quite a while now to solve basic wireless issues for admins and engineers. With the technology moving toward the copper side of the house, I wanted to talk a bit about why this is important for the sanity of people everywhere.
Finding the Answer
Network and wireless engineers are walking storehouses of useless trivia knowledge. I know this because I am one. I remember the hello and dead timers for OSPF on NBMA networks. I remember how long it takes BGP to converge or what the default spanning tree bridge priority is for a switch. Where some of my friends can remember the batting average for all first basemen in the league in 1971, I can instead tell you all about LSA types and the magical EIGRP equation.
Why do we memorize this stuff? We live in a world with instant search at our fingertips. We can find anything we might need thanks to the omnipotent Google Search Box. As long as we can avoid sponsored results and ads we can find the answer to our question relatively quickly. So why do we require people to memorize esoteric trivia? Is it so we can win free drinks at the bar after we’re done troubleshooting?
The problem isn’t that we have to know the answer. It’s that we need to know the answer in order to ask the right question. More often than not we find ourselves stuck in the initial phase of figuring out the problem. The results are almost always the same – things aren’t working. Finding the cause isn’t always easy though. We have to find some nugget of information to latch onto in order to start the process.
One of my old favorites was trying to figure out why a network I was working with had a segmented spanning tree. One side of the network was working just fine but there were three switches daisy chained together that didn’t. Investigations turned up very little. Google searches were failing me. It wasn’t until I keyed in on a couple of differences that I found out that I had improperly used a BPDU filtering command because of a scoping issue. Sure, it only took me two hours of searching to find it after I discovered the problem. But if I hadn’t memorized the BPDU filtering and guard commands and their behavior I wouldn’t have even known to ask about them. So it’s super important to know the minutiae of how every protocol works, right?
Presenting the Right Questions
Not exactly. We, as human computers, memorize the answers to more efficiently search through our database to find the right answers. If the problem takes 5 minutes to present we can eliminate a bunch of causes. If it’s happening in layer 3 and not layer 2 we can toss out a bunch of other stuff. Our knowledge is allowing us to discard useless possibilities and focus on the right result.
And it’s horribly inefficient. I can attest to that given my various attempts to learn OSPF hello and dead timers through osmosis of falling asleep in my big CCNP Routing book. The answers don’t crawl off the page and into your brain no matter how loudly you snore into it. So I spent hours learning something that I might use two or three times in my career. There has to be a better way.
Not coincidentally, that’s where the AI-driven systems from Mist, and now Juniper, come into play. Marvis is wonderful at looking at symptoms and finding potential causes. It’s what we do as humans. Except Marvis has no inherent biases. It also doesn’t misremember the values for a given protocol or get confused about whether OSPF point-to-point networks are broadcast or not. Marvis just knows what it was programmed with. But it does learn.
Learning is the key to how these AI and machine learning (ML) driven systems have to operate. People tend to discount a possible cause because they think there’s no way it could be that cause this time. For example, a haiku:
It’s not DNS.
Could it be DNS?
It was DNS.
DNS is often the cause of our problems even if we usually discount it out of hand in the first five minutes of troubleshooting. Even if it was only DNS 50% of the time we would still toss DNS as the root cause within the first five minutes because we’ve “trained” our brains to know what a DNS problem looks like without realizing how many things DNS can really affect.
But AI and ML don’t make these false correlations. Instead, they learn every time what the cause was. They can look at the network and see the failure state, present options based on the symptoms, and even if you don’t check in your changes they can analyze the network and figure out what change caused everything to start working again. Now, the next time the problem crops up, a system like Marvis can present you with a list of potential solutions with confidence levels. If DNS is at the top of the list, you might want to look into DNS first.
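To make the "list of potential solutions with confidence levels" idea concrete, here is a toy sketch in Python. It is not how Marvis works under the hood; the `CauseRanker` class, its method names, and the symptom strings are all hypothetical, and the "confidence" is nothing more than the share of past fixes that pointed at each cause.

```python
from collections import Counter, defaultdict


class CauseRanker:
    """Toy frequency-based ranker (hypothetical, for illustration only).

    It remembers which confirmed root cause resolved each symptom and
    ranks candidate causes for a new incident by how often they were
    the real answer in the past.
    """

    def __init__(self):
        # symptom -> Counter mapping confirmed cause -> times it was the fix
        self._history = defaultdict(Counter)

    def record(self, symptoms, cause):
        """Store the confirmed root cause for every observed symptom."""
        for symptom in symptoms:
            self._history[symptom][cause] += 1

    def rank(self, symptoms):
        """Return (cause, confidence) pairs, most likely cause first."""
        votes = Counter()
        for symptom in symptoms:
            # .get() avoids creating empty entries for unseen symptoms
            votes.update(self._history.get(symptom, {}))
        total = sum(votes.values())
        if total == 0:
            return []  # nothing learned yet for these symptoms
        return [(cause, count / total) for cause, count in votes.most_common()]


ranker = CauseRanker()
ranker.record({"slow page loads", "login timeouts"}, "dns")
ranker.record({"login timeouts"}, "dns")
ranker.record({"slow page loads"}, "duplex mismatch")

ranking = ranker.rank({"slow page loads", "login timeouts"})
for cause, confidence in ranking:
    print(f"{cause}: {confidence:.0%}")
```

The point of the sketch is that the ranker never gets to say "there's no way it's DNS." Every confirmed fix is counted, so a cause we habitually wave away still floats to the top if it keeps being the answer.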
AI is going to make us all better troubleshooters because it’s going to make us all less reliant on poor memory. Instead of misremembering how a protocol should be configured, AI and ML will tell us how it should look. If something is causing routing loops or if layer 2 issues are happening because of duplex mismatches we’ll be able to see that quickly and have confidence it’s the right answer instead of just guessing and throwing things at the wall until they stick. Just like Google has supplanted the Cliff Clavin people at the bar that are storehouses of useless knowledge, so too will AI and ML reduce our dependence on know-it-alls that may not have all the answers.
I’m ready to be forgetful. I’m tired of playing “stump the chump” in troubleshooting with the network playing the part of the stumper and me playing the chump. I’ve memorized more useless knowledge than I ever care to recall in my life. But I don’t want to have to do the work any longer. Instead, I want to apply my gifts to training algorithms with more processing power than me to do all the heavy lifting. I’d much rather look up DNS and EIGRP timers than try to remember if MTU and reliability are part of the K-values for this network.