“It’s Always the Network” is a refrain that causes operations teams to shudder. No matter what your flavor of networking might be, it’s always your fault, even if the actual problem is DNS, a global BGP outage, or some issue with the SaaS provider. Why do we always get blamed? And how can you prevent this from happening to you?
Users don’t know about the world outside of their devices. As soon as they click on something in a browser window they expect it to work. It’s a lot like ordering a package and having it delivered. It’s expected that the package arrives. You don’t concern yourself with the details of how it needs to be shipped, what routes it will take, and how factors that exist half a world away could cause disruptions to your schedule at home.
The network is the same to users. If something doesn’t work with a website or a remote application, it must be the “network” that is at fault, because your users believe that everything not inside their computer is the network. Networking is the way that stuff happens everywhere else. As professionals we know the differences between the LAN, the WAN, the cloud, and all points in between. Your users do not.
Have you heard about MTTI? It’s a humorous metric we like to quote these days that stands for “mean time to innocence”. Essentially we’re saying that we’re racing to prove why it’s not our problem. We check the connections, look at the services in our network, and then confidently explain to the user that someone else is at fault for what happened and we can’t fix that. Sounds great in theory, right?
Would you proudly proclaim that it’s someone else’s fault to the CEO? The Board of Directors? How about to the President of the US? Depending on how important the person you are speaking to is, you might change the way you present the information. Regular users might get “sorry, not my fault” but the CEO could get “here’s the issue and we need to open a ticket to fix it”. Why the varying service levels? Is it because regular users aren’t as important as the head of the company? Or is it because we don’t want to put in the extra effort for a knowledge worker that we would gladly put in for the person who is ultimately going to approve a raise for us if we do extra work? Do you see the fallacy here?
Keeping Your Eyes Peeled
The answer, of course, is that every user deserves at least an explanation of where the problem is. You might be bent on proving that it’s not in your network so you don’t have to do any more work, but you have already done enough work to place the blame outside of your area of control. Why not go one step further? You could check a cloud provider’s service dashboard to see if anything is degraded. Or do a quick scan of news sites or social media to see if it’s a more widespread issue. Even checking a service to see if the site is down for more than just you is more information than just “not my problem”.
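To make that "one step further" concrete, here is a minimal sketch of turning a few quick checks into an explanation a user can actually use. Everything in it is an assumption for illustration: the `Findings` fields, the `explain` function, and the wording of the messages are hypothetical, and how you gather each finding is up to your own tooling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Findings:
    """Results of quick checks (hypothetical structure for illustration)."""
    local_network_ok: bool             # our own connections and services check out
    provider_degraded: Optional[bool]  # from the provider's dashboard; None if unknown
    down_for_others: Optional[bool]    # from a "down for everyone?" check; None if unknown


def explain(service: str, f: Findings) -> str:
    """Return a short explanation of where the problem appears to be."""
    if not f.local_network_ok:
        return f"{service}: the fault is on our side and we are working on it."
    if f.provider_degraded:
        return (f"{service}: the provider reports degraded service. "
                "We are watching their status page.")
    if f.down_for_others:
        return (f"{service}: people outside our network can't reach it either, "
                "so this looks like a wider outage.")
    # Nothing conclusive yet: say so instead of "not my problem".
    return (f"{service}: nothing obvious yet on our side or the provider's, "
            "so let's dig into this together.")
```

The point of the ordering is that the answer is never just "not us". Even the fall-through case tells the user what was ruled out and what happens next.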
The habit of monitoring other services does two things for you. First, it lowers the all-important MTTI metric. If you can quickly scan the likely culprits for outages you can then say with certainty that it’s out of your control. However, the biggest benefit of monitoring the outside services you rely on is that you’ll have forewarning of issues that are about to land in your own organization. If your users come to you and say that they can’t get to Microsoft 365 and you already know that’s because Microsoft messed up a WAN router configuration, you’ll head off needless blame. You’ll also know where to look next time to understand challenges.
If you’re already monitoring the infrastructure in your organization it only makes sense to monitor the infrastructure that your users need outside of it. Maybe you can’t get AWS to send you the SNMP community strings you need to watch all their internal switches, but you can easily pull information from their publicly available sources to help determine where the issues might lie. That helps you in the same way that monitoring switch ports in your organization helps you today. You can respond to issues before they get reported, so you have a clear picture of what the impact will be.
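As a rough sketch of pulling from a public source, the snippet below reads a status summary in the JSON shape used by many hosted status pages, with a `status.indicator` of `none`, `minor`, `major`, or `critical` and a human-readable `status.description`. Whether your provider publishes that shape is an assumption you need to verify, and the URL here is a placeholder, not a real endpoint.

```python
import json
from typing import Tuple
from urllib.request import urlopen

# Placeholder URL (assumption): substitute the status endpoint your provider documents.
STATUS_URL = "https://status.example.com/api/v2/status.json"

# Severity ranking for the indicator values assumed above.
SEVERITY = {"none": 0, "minor": 1, "major": 2, "critical": 3}


def fetch_status(url: str = STATUS_URL, timeout: float = 5.0) -> dict:
    """Download and decode the provider's status summary."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def parse_status(payload: dict) -> Tuple[str, str]:
    """Extract (indicator, description), tolerating missing fields."""
    status = payload.get("status", {})
    indicator = status.get("indicator", "unknown")
    description = status.get("description", "no description published")
    return indicator, description


def is_degraded(indicator: str) -> bool:
    """Treat anything other than a clean 'none' as worth a look."""
    # Unrecognized indicators default to severity 1 so they are not silently ignored.
    return SEVERITY.get(indicator, 1) > 0
```

Run on a schedule, something like `indicator, description = parse_status(fetch_status())` followed by an alert when `is_degraded(indicator)` is true gives you the outside-world equivalent of a switch port going down in your own monitoring.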
When I worked for a VAR one of my biggest pet peeves was “not my problem” thinking. Yes, it may not be YOUR problem but it is still A problem for the user. Rather than trying to shift the blame let’s put our heads together to investigate where the problem lies and create a fix or a workaround for it. Because the user doesn’t care who is to blame. They just want the issue resolved. And resolution doesn’t come from passing the buck.