Avoiding A MacGyvered Network

Posted on June 1, 2018 by networkingnerd

Ivan Pepelnjak has an interesting post up today about MacGyver-ing in the network. He and Simon Milhomme are right that most small-to-medium sized networks are pretty much non-reference architectures and really, really difficult to manage and maintain properly on the best of days. On the worst of days, they’re a nightmare that make you want to run screaming into the night. But why?

One Size Never Fits All

Part of the issue is that reference architectures and cookie-cutter designs aren’t made for SMEs. Sure, the large enterprise and cloud providers have their own special snowflakes. But so too do small IT shops that have been handed a pile of parts and told to make it work.

People like Greg Ferro and Peyton Maynard-Koran believe this is due to vendors and VARs pushing hardware and sales cycles like crazy. I have attributed it to the lack of real training and knowledge about networking. But, it also has a lot to do with the way that people see IT as a cost center. We don’t provide value like marketing. We don’t collect checks like accounting. At best, we’re no different than the utility companies. We’re here because we have to be.

Likewise, when IT is forced into making decisions based on some kind of rebate or sale we tend to get stuck with the blame when things don’t work correctly. People that wouldn’t bat an eye at buying a new Rolex or BMW get downright frugal when they’re deciding which switches to buy to run their operation for the next 10 years. So, they compromise. Surely you all can make this switch work? It only has half the throughput as the one you originally specced but it was on sale!

So, stuck with bad decisions and no real documentation or reference points, we IT wizards do what we can with what we have. Just like Angus MacGyver. Sure, those networks are a pain to manage. They’re a huge issue when it’s time to upgrade to the next discount switch. And given the number of fires that we have to fight on a daily basis we never get a chance to go back and fix when we nailed up in the first place. Nothing is more permanent than a temporary fix. And no install lasts longer than one that was prefaced with “let me just get this working for the next week”.

Bespoke Off The Rack

So, how do we fix the MacGyver-ing? It’s not going to be easy. It’s going to be painful and require us to step out of our comfort zones.

Shut Up And Do As I Say. This one is hard. Instead of having management breathing down your neck to get something installed or working on their schedule, you need to push back. You need to tell people that rushing is only going to complicate things. You need to fire back when someone hands you equipment that doesn’t meet your spec. You need to paraphrase the famous Shigeru Miyamoto – “A delayed network is eventually good, but a rushed installation is bad forever.”
Document Like It Will Be Read Aloud In Court. Documentation isn’t just a suggestion. It’s a necessity. We’ve heard about the “hit by a bus” test. It’s more than just that, though. You need to not only be able to replace yourself but also to be able to have references for why you made the decisions you did. MacGyver-ing happens when we can’t remember why we made the decisions we did. It happens when we need to come up with a last minute solution to an impossible problem created by other people (NSFW language). Better to have that solution documented in full so you know what’s going on three years from now.
Plan Like It’s Happening Tomorrow. Want to be ready for a refresh? Plan it today. Come back to your plan. Have a replacement strategy for every piece of equipment in your organization. Refresh the plan every few months. When someone comes to you with a new idea or a new thing, write up a plan. Yes, it’s going to be a lot of spinning your wheels. but it’s also going to be a better solution when someone comes to you and says that it’s time to make a project happen. Then, all you need to do is pull out your plan and make it happen. Imagine it like the opposite of a DR plan – Instead of the worst case scenario this is your chance to make it the best case.

Tom’s Take

There’s a reason the ringtone on my phone has been the theme from MacGyver for the last 15 years. I’m good at building something out of nothing. Making things work when they shouldn’t. And, as bad as it sounds, sometimes that’s what’s needed. Especially in the VAR world. But it shouldn’t be standard operating procedure. Instead, make your plan, execute it when the time comes, and make sure no one pushes you into a situation where MacGyver-ing is the last option you have.

Time To Get Back To Basics?

Posted on May 11, 2018 by networkingnerd

I’ve had some fascinating networking discussions over the past couple of weeks at Dell Technologies World, Interop, and the spring ONUG meeting. But two of them have hit on some things that I think need to be addressed in the industry. Both Russ White and Ignas Bagdonas of the IETF have come to me and talked about how they feel networking professionals have lost sight of the basics.

How Stuff Works

If you walk up to any network engineer and ask them to explain how TCP works, you will probably get a variety of answers. Some will try to explain it to you in basic terms to avoid getting too in depth. Others will swamp you with a technical discussion that would make the protocol inventors proud. But still others will just shrug their shoulders and admit they don’t really understand the protocol.

It’s a common problem when a technology gets to the point of being mature and ubiquitous. One of my favorite examples is the fuel system on an internal combustion engine. On older cars or small engines, the carburetor is responsible for creating the correct fuel and air mixture that is used to power the cylinders. Getting that mix right is half science and half black magic. For the people that know how to do it properly it’s an easy thing that they can do to drive the maxim performance from an engine. For those that have tried it and failed, it’s something best left alone to run at defaults.

The modern engine uses fuel injection. It’s a black box. It can be reprogrammed or tinkered with but it’s something that is tuned in different ways from the traditional carburetor. It’s not something that’s designed to be played around with unless you really know what you’re doing. The technology has reached the point where it’s ubiquitous and easy to use, but very difficult to repair without a specialized skill set.

Most regular car drivers will look under the hood and quickly realize they know nothing about what’s going on. Some technical folks will be able to figure out what certain parts do by observing their behavior. But if you ask those same people how a fuel injection system or carburetor works they’ll probably give you a blank stare.

That’s the issue we find in modern networking. We’ve been creating VLANs and BGP route maps for years. Some people have spent their entire careers tuning multicast or optimizing layer 2 interconnects. But if you corner them and ask them how the protocol works or how best to design an architecture that takes advantage of their life’s work they can’t tell you aside from referencing some old blog post or some vendor’s validated design on their hardware.

Russ and Ignas each touch on something important. In the good old days before there were a hundred certification guides and a thousand SRNDs people had to do real work to find the best solution for a problem. They had to put pencil to paper and sort out the mess before they could make something work. That’s where the engineering side of the network comes from.

Today, it’s more “plug and play”. You drop in pieces of a solution and they should work together. In practice, that usually means all the pieces have to be from the same vendor or from approved partner sources. And anything that goes awry will need a team of experts and many, many consulting hours to figure out.

Imagine if we could only install networks without understanding how they work. Could you see a world where everything we install from a networking perspective is a black box like a fuel injector? That’s the case to a certain degree with cloud networking today. We don’t see what’s going on under the surface. We can only see what the interface exposes to us. That’s fine as long as the applications we are using support the things we’re trying to do with them. But when it comes to being able to fix the network at the level we’re used to seeing it could be difficult if not downright impossible.

Learning The Ropes

But, moreover, are the networking professionals that are configuring these networks even capable of making those changes? Does anyone other than Narbik really understand how EIGRP works? Facebook seems to think that lightweight messaging packets for routing protocols are outdated. So they used ZeroMQ without understanding why that’s a bad idea for slow speed links. They may understand how a routing protocol works in theory, but they don’t completely understand how it’s supposed to work in extreme cases.

Can we teach people the basics and understanding of protocols and design that they need in order to make proper designs outside of vendor reference documents? It’s a tall order for sure. Most blog posts are designed to talk about features or solve problems. Most literature from creators is designed to make their new widget work correctly. Very little documentation exists about integration or design. And a good portion of what does exist is a bit outmoded and needs to be spruced up.

We, as the stewards of networking, need to help this process along. We need to spend more time talking about design and theory. We need to dissect protocols and help people understand how to use the tools they have rather than hoping someone will build the best mousetrap ever to solve each piece of a complicated puzzle. We need to teach people to be thinkers and problem solvers. And, yes, that does mean a bit less complaining about things like vendor code quality and VAR behavior.

Why? Because those people are empowered by a lack of knowledge. Customers aren’t idiots. They have business reasons for the things they do. Our technology needs to support their business needs. Yes, that means we need to think critically about what we’re doing. And yes, they may mean eating our words now and then to avoid a showdown about something that’s ultimately unimportant in the long run.

If we increase the amount of knowledge about the important topics like design and protocols it should make the overall level of understanding go up for everyone. That means better designs. More integrated technology. Less reliance on being force-fed the bare minimum information necessary to make things work. And that means things will run faster and much more smoothly. Which is a win for everyone.

Tom’s Take

I’ll be the first to admit that I don’t know the dirty mechanics of Frame Relay switching or how to tune OSPF Hello timers for non-standard link types. It’s a skill I don’t use every day. But I know where to find them if I need them. And I know that it can help in certain situations where you see odd protocol behavior. Likewise, I know that if I need to go design a network for someone I need to get a lot of information about business needs and build the best network I can with the constraints that I have. Things like budget and time are universal. But one of those constraints shouldn’t be lack of knowledge about protocols and good design. Those are two things that should be ingrained into anyone that wants to have a title of “senior” anything.

Transitioning Away From Legacy IT

Posted on May 4, 2018 by networkingnerd

One of the more exciting things I saw at Dell Technologies World this week was the announcement by VMware that they are supporting Microsoft Azure now in additional to AWS. It’s interesting because VMware is trying to provide a proven, stable migration path for companies that are wanting to move to the cloud but still retain their investments in VMware and legacy virtualization. But is offing legacy transition a good idea?

Hold On For One More Day

If I were to mention VLAN 1002-1005 to networking people, they would likely jump up and tell me that I was crazy. Because those VLANs are not valid on any Cisco switches save for the Nexus line. But why? What makes these forbidden? Unless you’re studying for your CCIE you probably just know these are bad and move on.

Turns out, they are a legacy transition mechanism from the IOS-SX days. 1002 and 1004 were designed to bridge FDDI-to-Ethernet, and 1003 and 1005 did the same for Token Ring. As Greg Ferro points out here, this code was tightly bound into IOS-SX and likely couldn’t be removed for fear of breaking the OS. The reservation continued forward in all IOS branches except NX-OS, which pulled them due to lack of support for those protocols.

So, we’ve got a legacy transition mechanism causing problems for users well past the “use by” date. Token Ring was on the way out at IBM in 2001. And yet, for some reason seventeen years later I still have to worry about bridging it? Or how about the rumors that Windows skipped from version 8 to version 10 because legacy code bases assumed Windows 9 meant Windows 95? Something 23 years old forced a major version change?

We keep putting legacy bridges in place all the time to help migrate things. Virtualization isn’t the only culprit here. We’ve found all manner of things that we can do to “trick” systems into working with modern hardware. We even made one idea into a button. But we never really solve the underlying issues. We just keep creating workarounds until we’re forced to move.

The Dream Is Still Alive

As it turns out, it’s expensive to refactor code bases and update legacy software to support new hardware. We’ve hit this problem time and time again with all manner of products. I can remember when Cisco CallManager wouldn’t install on a spare server I had with the same model number as a support machine just because the CPU was exactly 100MHz too fast. It’s frustrating to say the least.

But, we also have to realize that legacy transition mechanisms are not permanent fixes. It’s right there in the name. Transition. We put them in place because it’s cheaper in the short term while we investigate long term methods to make everything work correctly. But it’s still important to find those long term solutions. Maybe it’s a new application. Or a patch to make it work with new hardware. Sometimes, as Apple has done, it’s a warning that old software will stop working soon.

As developers, it’s important to realize that your app may last long past the date you want to stop supporting it. If you could still install Office 2000 on a desktop, I’m almost positive that someone would try it. We still have ways to install and use DOS software! If you want to ensure that your software is being used correctly and that you aren’t issuing patches for it after you’ve retired to a comfortable island with no Internet connection, make sure you find a way to ease transitions and offer new connection options to users.

For those of you that are still stuck in the morass of supporting legacy software or hardware, take a look at what you’re using it for and try to make hard choices where appropriate. If your organization is moving to the cloud, maybe now is the time to cut off your support for an application that’s too old to survive the migration. Or maybe it’s time to retire the Domain Controller That Time Forgot. But you have to do it before you’re forced to virtualize it and support it in perpetuity in AWS or Azure.

Tom’s Take

I’ll be the first to admit that legacy hardware and software are really popular. I worked with a company one time that still had an AS/400 admin on staff because of one application. It just happened to the one that paid people. At Interop ITX this year, the CIO for Detroit mentioned that they had to bring a developer out of retirement to make sure people kept getting paid. But legacy can’t be a part of the future. You either need to find a way to move what you have while you look for something better or you need to cut it off at the knees and find a way to make those functions work somewhere else. Because you don’t want to be the last company running AS/400 payroll over token ring bridged to a Cisco switch on VLAN 1003.

On Old Configs and Automation

Posted on April 13, 2018 by networkingnerd

I used to work with a guy that would configure servers for us and always include an extra SCSI card in the order. When I asked him about it one day, he told me, “I left it out once and it delayed the project. So now I just put them on every order.” Even after I explained that we didn’t need it over and over again, he assured me one day we might.

Later, when I started configuring networking gear I would always set a telnet password for every VTY line going into the switch. One day, a junior network admin asked me why I configured all 15 instead of just the first 5 like they learn in the Cisco guides. I shrugged my shoulders and just said, “That’s how I’ve always done it.”

The Old Ways

There’s no more dangerous phrase than “That’s the way it’s always been.”

Time and time again we find ourselves falling back on the old rule of thumb or an old working configuration that we’ve made work for us. It’s comfortable for the human mind to work from a point of reference toward new things. We find ourselves doing it all the time. Whether it’s basing a new configuration on something we’ve used before or trying to understand a new technology by comparing it to something we’ve worked on in the past.

But how many times have those old configurations caused us grief? How many times have we been troubleshooting a problem only to find that we configured something that shouldn’t have been configured in the way that it was. Maybe it was an old method of doing hunt groups. Or perhaps it was a legacy QoS configuration that isn’t supported any more.

Part of our issue as networking professionals is that we want to concentrate on the important things. We want to implement solutions and ideas that work for our needs and not concentrate on the minutia of configuration. Sure, the idea of configuration a switch from bare metal to working config is awesome the first time you do it. But the fifteenth time you have to configure one in a row is less awesome. That’s why copy-and-paste configurations are so popular with people that just want to get the job done.

New Hotness

This idea of using old configurations for new things takes even more importance when you start replacing the boring old configuration methods with new automation and API-driven configuration models. Yes, APIs make it a lot easier to configure a switch programmatically. And automation tools like Puppet and Ansible make it much faster to bring a switch online from nothing to running in the minimum amount of time.

However, even with this faster configuration method, are we still using old, outdated configurations to solve problems? Sure, I don’t have to worry about configuring VLANs on the switch one at a time. But if my configuration is still referencing VLANs that are no longer in the system that makes it very difficult to keep the newer switches running optimally. And that’s just assuming the configuration is old and outdated. What if we’re still using deprecated commands?

APIs are great because they won’t support unsupported things. But if we don’t scrub the configuration now and then to remove these old lines and protocols we’ll quickly find ourselves in a world of trouble. Because those outdated and broken things will bring the API to a halt. Yes, the valid commands will still be entered correctly, but if those valid commands rely on something invalid to work properly you’re going find things broken very fast.

What makes this whole thing even more irritating is that most configurations need to be validated and approved before being implemented. Which makes the whole process even more difficult to manage. Part of the reason why old configs live for so long is that they need weeks or months of validation to be implemented effectively. When new platforms or configuration methods crop up it could delay new equipment installation. This sometimes leads to people installing new gear with “approved” configs that might not be the best fit in order to get that new equipment into service.

Tom’s Take

So, how do we fix all this? What’s the trick? Well, it’s really a combination of things. We need to make sure we audit configs regularly to keep the old stuff from continuing on past the expiration dates. We also need to continually resubmit new configurations to the approvals process. Just like disaster recovery documentation, configurations are living, breathing documents that should always be current and refreshed. The more current your configurations, the less likely you are to have old cruft keeping your systems running at subpar performance. And the less likely you are to have to worry about breaking new things like APIs and automation systems in the future.

Reclaiming 1.1.1.1 For The Internet

Posted on April 5, 2018 by networkingnerd

Hopefully by now you’ve seen the announcement that CloudFlare has opened a new DNS service at the address of 1.1.1.1. We covered a bit of it on this week’s episode of the Gestalt IT Rundown. Next to Gmail, it’s probably the best April Fool’s announcement I’ve seen. However, it would seem that the Internet isn’t quite ready for a DNS resolver service that’s easy to remember. And that’s thanks in part to the accumulation of bad address hygiene.

Not So Random Numbers

The address range of 1/8 is owned by APNIC. They’ve had it for many years now but have never announced it publicly. Nor have they ever made any assignments of addresses in that space to clients or customers. In a world where IPv4 space is at a premium, why would a RIR choose to lose 16 million addresses?

Edit: As pointed out by Dale Carder of ES.net in a comment below, APNIC has been assigning address space out of 1 /8 since 2010. However, the most commonly leaked prefixes in that subnet that are difficult to assign because of bogus announcements come from 1.0.0.0/14.

As it turns out, 1/8 is a pretty bad address space for two reasons. 1.1.1.1 and 1.2.3.4. These two addresses are responsible for most of the inadvertent announcements in the entire 1/8 space. 1.2.3.4 is easy to figure out. It’s the most common example IP address given when talking about something. Don’t believe me? Google is your friend. Instead of using 192.0.2.0/24 like we should be using, we instead use the most common PIN, password, and luggage combination in the world. But, at least 1.2.3.4 makes sense.

Why is 1.1.1.1 so popular? Well, the first reason is thanks to Airespace wireless controllers. Airespace uses 1.1.1.1 as the default virtual interface address for just about everything. Here’s a good explanation from Andrew von Nagy. When Airespace was sold to Cisco, this became a very popular address for Cisco wireless networks. Except now that it’s in use as a DNS resolver there are issues with using it. The wireless experts I’ve talked to recommend changing that address to 192.0.2.1, since that address has been marked off for examples only and will never be globally routable.

The other place where 1.1.1.1 seems to be used quite frequently is in Cisco ASA failover interfaces. Cisco documentation recommended using 1.1.1.1 for the primary ASA failover and 1.1.1.2 as the secondary interface. The heartbeats between those two interfaces were active as long as the units were paired together. But, if they were active and reachable then any traffic destined for those globally routable addresses would be black holed. Now, ASAs should probably be using 192.0.2.1 and 192.0.2.2 instead. But beware that this will likely require downtime to reconfigure.

The 1.1.1.1 address confusion doesn’t stop there. Some systems like Nomadix use them as the default logout address. Vodafone used to use it as an image caching server. ISPs are blocking it upstream in some ACLs. Some security organizations even recommend dropping traffic to 1/8 as a bogon prevention measure. There’s every chance that 1.1.1.1 is going to be dropped by something in your network or along the transit path.

Planning Not To Fail

So, how are you going to proceed if you really, really want to use CloudFlare DNS? Well, the first step is to make sure that you don’t have 1.1.1.1 configured anywhere in your network. That means checking WLAN controllers, firewalls, and example configurations. Odds are good you’re running RFC1918 space. But you should try to ping 1.1.1.1 anyway. If you can ping it, then you should traceroute the address. If the traceroute leaves your local network, you probably have a good path.

Once you’ve determined that you’re capable of reaching 1.1.1.1, you need to test it first. Configure it on a test machine or VM and make sure it’s actually resolving addresses for you. Better safe than sorry. Once you know it’s really working like you want it to work, configure it on your internal DNS servers as a forwarder. You still want internal control of DNS thanks to things like Active Directory. But configuring it as a forwarder means you can take advantage of all the features CloudFlare is building into the system while still retaining anything you’ve done locally.

Tom’s Take

I’m glad CloudFlare and APNIC are reclaiming 1.1.1.1 for some useful purpose. CloudFlare can take the traffic load of all the horribly misconfigured systems in the world. APNIC can use this setup to do some analytics work to find out exactly how screwed up things are. I wouldn’t be shocked to see something similar happen to 1.2.3.4 in the future if this bet pays off.

I’ve been using 1.1.1.1 since April 2nd and it works. It’s fast and hasn’t broken yet, which is the best that you can hope for from a DNS server. I’m sure I’ll play around with some of the advanced features as they come online but for now I’m just happy that one of the most recognizable IP addresses in the world is working for me.

Is Patching And Tech Support Bad? A Response

Posted on March 23, 2018 by networkingnerd

Hopefully, you’ve had a chance to watch this 7 minute video from Greg Ferro about why better patching systems can lead to insecure software. If you haven’t, you should:

Greg is right that moral hazard is introduced because, by definition, the party providing the software is “insured” against the risks of the party using the software. But, I also have a couple of issues with some of the things he said about tech support.

Are You Ready For The Enterprise

I’ve been working with some Ubiquiti access points recently. So far, I really enjoy them and I’m interested to see where their product is going. After doing some research, the most common issue with them seems to be their tech support offerings. A couple of Reddit users even posted in a thread that the lack of true enterprise tech support is the key that is keeping Ubiquiti from reaching real enterprise status.

Think about all the products that you’ve used over the last couple of years that offered some other kind of support aside from phone or rapid response. Maybe it was a chat window on the site. Maybe it was an asynchronous email system. Hell, if you’ve ever installed Linux in the past on your own you know the joys of heading to Google to research an issue and start digging into post after post on a long-abandoned discussion forum. DenverCoder9 would agree.

By Greg’s definition, tech support creates negative incentive to ship good code because the support personnel are there to pick up the pieces for all your mistakes when writing code. It even incentivizes people to set unrealistic goals and push code out the door to hit milestones. On the flip side, try telling people your software is so good that you don’t need tech support. It will run great on anything every time and never mess up once. If you don’t get laughed out of the building you should be in sales.

IT professionals are conditioned to need tech support. Even if they never call the number once they’re going to want the option. In the VAR world, this is the “throat to choke” mentality. When the CEO is coming down to find out why the network is down or his applications aren’t working, they are usually not interested in the solution. They want to yell at someone to feel like they’ve managed the problem. Likewise, tech support exists to find the cause of the problem in the long run. But it’s also a safety blanket that exists for IT professionals to have someone to yell at after the CEO gets to them. Tech Support is necessary. Not because of buggy code, but because of peace of mind.

How Complex Are You?

Greg’s other negative aspect to tech support is that it encourages companies to create solutions more complex than they need to be because someone will always be there to help you through them. Having done nationwide tech support for a long time, I can tell you that most of the time that complexity is there for a reason. Either you hide it behind a pretty veneer or you risk having people tell you that your product is “too simple”.

A good example case here is Meraki. Meraki is a well-known SMB/SME networking solution. Many businesses run on Meraki and do it well. Consultants sell Meraki by the boatload. You know who doesn’t like Meraki? Tinkerers. People that like to understand systems. The people that stay logged into configuration dashboards all day long.

Meraki made the decision when they started building their solution that they weren’t going to have a way to expose advanced functionality in the system to the tinkerers. They either buried it and set it to do the job it was designed to do or they left it out completely. That works for the majority of their customer base. But for those that feel they need to be in more control of the solution, it doesn’t work at all. Try proposing a Meraki solution to a group of die-hard traditionalist IT professionals in a large enterprise scenario. Odds are good you’re going to get slammed and then shown out of the building.

I too have my issues with Meraki and their ability to be a large enterprise solution. It comes down to choosing to leave out features for the sake of simplicity. When I was testing my Ubiquiti solution, I quickly found the Advanced Configuration setting and enabled it. I knew I would never need to touch any of those things, yet it made me feel more comfortable knowing I could if I needed to.

Making a solution overly complex isn’t a design decision. Sometimes the technology dictates that the solution needs to be complex. Yet, we knock solutions that require more than 5 minutes to deploy. We laud companies that hide complexity and claim their solutions are simple and easy. And when they all break we get angry because, as tinkerers, we can’t go in and do our best to fix them.

Tom’s Take

In a world where software in king, quality matters. But, so does speed and reaction. You can either have a robust patching system that gives you the ability to find and repair unforeseen bugs quickly or you can have what I call the “Blizzard Model”. Blizzard is a gaming company that has been infamous for shipping things “when they’re done” and not a moment before. If that means a project is 2-3 years late then so be it. But, when you’re a game company and this is your line of work you want it to be perfect out of the box and you’re willing to forego revenue to get it right. When you’re a software company that is expected to introduce new features on a regular cadence and keep the customers happy it’s a bit more difficult.

Perhaps creating a patching system incentivizes people to do bad things. Or, perhaps it incentivizes them to try new things and figure out new ideas without the fear that one misstep will doom the product. I guess it comes down to whether or not you believe that people are inherently going to game the system or not.

When Redundancy Strikes

Posted on March 16, 2018 by networkingnerd

Networking and systems professionals preach the value of redundancy. When we tell people to buy something, we really mean “buy two”. And when we say to buy two, we really mean buy four of them. We try to create backup routes, redundant failover paths, and we keep things from being used in a way that creates a single point of disaster. But, what happens when something we’ve worked hard to set up causes us grief?

Built To Survive

The first problem I ran into was one I knew how to solve. I was installing a new Ubiquiti Security Gateway. I knew that as soon as I pulled my old edge router out that I was going to need to reset my cable modem in order to clear the ARP cache. That’s always a thing that needs to happen when you’re installing new equipment. Having done this many times, I knew the shortcut method was to unplug my cable modem for a minute and plug it back in.

What I didn’t know this time was that the little redundant gremlin living in my cable modem was going to give me fits. After fifteen minutes of not getting the system to come back up the way that I wanted, I decided to unplug my modem from the wall instead of the back of the unit. That meant the lights on the front were visible to me. And that’s when I saw that the lights never went out when the modem was unplugged.

Turns out that my modem has a battery pack installed since it’s a VoIP router for my home phone system as well. That battery pack was designed to run the phones in the house for a few minutes in a failover scenario. But it also meant that the modem wasn’t letting go of the cached ARP entries either. So, all my efforts to make my modem take the new firewall were being stymied by the battery designed to keep my phone system redundant in case of a power outage.

The second issue came when I went to turn up a new Ubiquiti access point. I disconnected the old Meraki AP in my office and started mounting the bracket for the new AP. I had already warned my daughter that the Internet was going to go down. I also thought I might have to reprogram her device to use the new SSID I was creating. Imagine my surprise when both my laptop and her iPad were working just fine while I was hooking the new AP up.

Turns out, both devices did exactly what they were supposed to do. They connected to the other Meraki AP in the house and used it while the old one was offline. Once the new Ubiquiti AP came up, I had to go upstairs and unplug the Meraki to fail everything back to the new AP. It took some more programming to get everything running the way that I wanted, but my wireless card had done the job it was supposed to do. It failed to the SSID it could see and kept on running until that SSID failed as well.

Finding Failure Fast

When you’re trying to troubleshoot around a problem, you need to make sure that you’re taking redundancy into account as well. I’ve faced a few problems in my life when trying to induce failure or remove a configuration issue was met with difficulty because of some other part of the network or system “replacing” my hard work with a backup copy. Or, I was trying to figure out why packets were flowing around a trouble spot or not being inspected by a security device only to find out that the path they were taking was through a redundant device somewhere else in the network.

Redundancy is a good thing. Until it causes issues. Or until it makes your network behave in such a way as to be unpredictable. Most of the time, this can all be mitigated by good documentation practices. Being able to figure out quickly where the redundant paths in a network are going is critical to diagnosing intermittent failures.

It’s not always as easy as pulling up a routing table either. If the entire core is down you could be seeing traffic routing happening at the edge with no way of knowing the redundant supervisors in the chassis are doing their job. You need to write everything down and know what hardware you’re dealing with. You need to document redundant power supplies, redundant management modules, and redundant switches so you can isolate problems and fix them without pulling your hair out.

Tom’s Take

I rarely got to work with redundant equipment when I was installing it through E-Rate. The government doesn’t believe in buying two things to do the job of one. So, when I did get the opportunity to work with redundant configurations I usually found myself trying to figure out why things were failing in a way I could predict. After a while, I realized that I needed to start making my own notes and doing some investigation before I actually started troubleshooting. And even then, like my cable modem’s battery, I ran into issues. Redundancy keeps you from shooting yourself in the foot. But it can also make you stab yourself in the eye in frustration.

Making Alexa Tech Demos Useful

Posted on February 16, 2018 by networkingnerd

Technology always marches on. People want to see the latest gadgets doing amazing things, whether it be flying electric cars or telepathic eyeglasses. Our society is obsessed with the Jetsons and the look of the future. That’s why we’re developing so many devices to help us get there. But it’s time for IT to reconsider how they are using one of them for a purpose far from the original idea.

Speaking For The People

By all accounts, the Amazon Echo is a masterful device. It’s a smart speaker that connects to an Amazon service that offers you a wider variety of software programs, called skills, to enhance what you can do with it. I have several of these devices that were either given out as conference attendance gifts or obtained from other giveaways.

I find the Echo speaker a fascinating thing. It’s a good speaker. It can play music through my phone or other Bluetooth-connected devices. But, I don’t really use it for that purpose. Instead, I use the skills to do all kinds of other things. I play Jeopardy! frequently. I listen to news briefings and NPR on a regular basis. I get weather forecasts. My son uses the Echo to check simple fraction math when he’s doing homework. My daughter uses it to time her math facts practice.

It would appear that the power behind an Echo speaker lies not in the hardware, but in the software stack built on it. It’s so powerful that most people don’t even refer to the speaker as an “Echo”, but instead as “Alexa”, the default name used to activate the listening service. People ask Alexa all kinds of things. And Alexa provides answers or ways to get the answers. It’s so popular that modern IT organizations have started to get in on the action.

Alexa, Tell Me A Story

Enterprise IT vendors are starting to show off their programming skills by creating Alexa skills to integrate with their software. Ostensibly, this would be to showcase how the platform has a rich API that allows for a large amount of information to be queried all at once. Users could ask Alexa to give them a readout of what’s going on without having to log into the system at any given time. I’ve personally seen demos that ask Alexa to find out who is using all the network bandwidth, what is the status of the wireless network, and even details on protocols.

However, there is a huge downside to using Alexa for this purpose. Without specifically crafted questions, you get a readout that is like trying to drink from a monotone firehose. Alexa is just like any computer system in that it will dutifully read you whatever input is given to it. That’s fine if you want the kind of detail that you get in your average computer monitoring system. But, if you’re using a smart speaker to cut down on the amount of information you are processing, you probably don’t want the entire text of the system read out to you.

I always fall back on the idea of people trying to make small talk. When you ask someone how their day is going, you typically aren’t looking for a recitation of their entire schedule from start to finish with all the details they can pack in. You’re looking for simple answer – good, okay, or not good. That’s the basic level of information that anyone wants about anything. More specific queries can drill down into other areas, but the initial conversation needs to be easy to parse in one or two sentences.

Another issue with using Alexa for technical demos is how the system parses IP addresses and DNS names. Alexa will dutifully read an IP address to you one digit at a time, including periods between octets. That can be annoying for addresses in the old Class C range with lots of 3-digit numbers. Also, you’d have to write them down to get any kind of coherence about which system was being discussed, which does kind of eliminate the usefulness of getting information from a speaker. With DNS names, Alexa will try to read the name of the system as if it were a real word. That can produce results that range from hilarious to downright unintelligible. It makes trying to understand these briefings much harder.

So, how can this be fixed? The answer is actually quiet easy. Instead of making your Alexa skill read off every possible piece of information with a simple query, have it give a basic readout. Possible answers like:

Things look good now
There are a couple of trouble spots to look at. Would you like to know more?
There are quite a few problems. I suggest logging in to learn more.

Each of these answers gives the user a chance to understand things. A “good” response means everything is good and you don’t need to know more. An “okay” or middle response says there are only a couple of issues that could be summarized here. A “bad” response tells the user that there is too much information to be easily digested in an audio briefing an that they should log into the system to see more. That gives the user the option of getting more compact information in a format that makes sense to them rather than listening to the speaker drone on for 5 minutes about all the errors in the system.

Tom’s Take

Technology is a wonderful thing. Technology used for the proper purpose is even better. The Amazon Echo is a great tool that helps advance our understanding of what people listen to and how they use machine learning and AI to ask questions and get answers. But, ultimately the Echo is a consumer device built around consumer questions. It’s up to enterprise tech vendors to write skills that give us the chance to interact with the speaker, not just get an information dump first thing in the morning. Enterprise tech vendors need to understand that they are what makes Alexa’s briefing useful. Select the information they will receiving and package it in such as way as to make it digestible.

The Winds of Change From January

Posted on February 9, 2018 by networkingnerd

Some quick thoughts on networking from my last couple of weeks at Networking Field Day 17 and Tech Field Day Extra at Cisco Live Europe:

Cisco is in the middle of turning a big ship away from hardware. All their innovation is coming in the software side of the house. Big announcements around network assurance. It’s not enough any more to do the things. Now you need to prove they were done and show your work. Context and Intent only work if you can quantitatively show that they were applied.
Containers are still a thing. Cisco has a new container platform. I also had the chance to chat with a startup called AppOrbit that’s doing some interesting things around containers but including storage and networking. They should be primed for some announcements soon, so stayed tuned for that!
Automation is cool again. Well, maybe it never stopped being cool. But thanks to Extreme Networks and Juniper people are really hopping on the train to talk more about removing the limitations of the CLI and doing it with tools like Slack. Check out Lindsay Hill and Matt Oswalt showing this off to people in some finely crafted demos.
2018 is the year that the CLI dies. Sure, we’ll go with that. Between Slack and Github and even Cisco’s push to drive ACI through literally everything we’re going to see more and more people configuring networks with a mouse instead of a keyboard. Which is a bit crazy when you think about it, but it’s not so far fetched as you might think compared to the way people are configuring AWS right now. I dare you to find the CLI for AWS’s switches in your control panel.
Lastly, change is inevitable. People reading through the above items may say to themselves that their job is going to away. They may worry that they’re going to be an old fuddy duddy before they know it. If you never want to change, that’s fine. As Truman Boyes said this week: https://twitter.com/trumanboyes/status/961785937993846789 But if you want to really succeed and move along, you can’t be afraid to change. You need to pick up new skills and learn new things. Oceans and rivers don’t erode mountains because they are there. They wear them down because they are incapable of moving and changing. Change is thrust upon them.

Tom’s Take

Go out and make a change this week. Do something different. Use a different treadmill for your workout. Visit a store you’ve never seen before. Place yourself in a different situation and see how you respond to it. Then come back to your desk and look at your work. Look at containers and automation with new eyes. I bet it will look a lot less scary and lot more fun to you. Don’t be afraid of change. Embrace it and grow.

Is ACI Coming For The CLI?

Posted on February 2, 2018 by networkingnerd

I’m soon to depart from Cisco Live Barcelona. It’s been a long week of fun presentations. While I’m going to avoid using the words intent and context in this post, there is one thing I saw repeatedly that grabbed my attention. ACI is eating Cisco’s world. And it’s coming for something else very soon.

Devourer Of Interfaces

Application-Centric Infrastructure has been out for a while and it’s meeting with relative success in the data center. It’s going up against VMware NSX and winning in a fair number of deals. For every person that I talk to that can’t stand it I hear from someone gushing about it. ACI is making headway as the tip of the spear when it comes to Cisco’s software-based networking architecture.

Don’t believe me? Check out some of the sessions from Cisco Live this year. Especially the Software-Defined Access and DNA Assurance ones. You’re going to hear context and intent a lot, as those are the key words for this new strategy. You know what else you’re going to hear a lot?

Contract. Endpoint Group (EPG). Policy.

If you’re familiar with ACI, you know what those words mean. You see the parallels between the data center and the push in the campus to embrace SD-Access. If you know how to create a contract for an EPG in ACI, then doing it in DNA Center is just as easy.

If you’ve never learned ACI before, you can dive right in with new DNA Center training and get started. And when you finally figured out what you’re doing, you can not only use those skills to program your campus LAN. You can extend them into the data center network as well thanks to consistent terminology.

It’s almost like Cisco is trying to introduce a standard set of terms that can be used to describe consistent behaviors across groups of devices for the purpose of cross training engineers. Now, where have we seen that before?

Bye Bye, CLI

Oh yeah. And, while you’re at it, don’t forget that Arista “lost” a copyright case against Cisco for the CLI and didn’t get fined. Even without the legal ramifications, the Cisco-based CLI has been living on borrowed time for quite a while.

APIs and Python make programming networks easy. Provided you know Python, that is. That’s great for DevOps folks looking to pick up another couple of libraries and get those VLANs tamed. But it doesn’t help people that are looking to expand their skillset without leaning an entirely new language. People scared by semicolons and strict syntax structure.

That’s the real reason Cisco is pushing the ACI terminology down into DNA Center and beyond. This is their strategy for finally getting rid of the CLI across their devices. Now, instead of dealing with question marks and telnet/SSH sessions, you’re going to orchestrate policies and EPGs from your central database. Everything falls into place after that.

Maybe DNA Center does some fancy Python stuff on the back end to handle older devices. Maybe there’s even some crazy command interpreters literally force-feeding syntax to an ancient router. But the end goal is to get people into the tools used to orchestrate. And that day means that Cisco will have a central location from which to build. No more archaic terminal windows. No more console cables. Just the clean purity of the user interface built by Insieme and heavily influenced by Cisco UCS Director.

Tom’s Take

Nothing goes away because it’s too old. I still have a VCR in my house. I don’t even use it any longer. It sits in a closet for the day that my wife decides she wants to watch our wedding video. And then I spend an hour hooking it up. But, one of these days I’m going to take that tape and transfer it to our Plex server. The intent is still the same – my wife gets to watch videos. But I didn’t tell her not to use the VCR. Instead, I will give her a better way to accomplish her task. And on that day, I can retire that old VCR to the same pile as the CLI. Because I think the ACI-based terminology that Cisco is focusing on is the beginning of the end of the CLI as we know it.

One Size Never Fits All

Bespoke Off The Rack

Tom’s Take

Share this:

How Stuff Works

Learning The Ropes

Tom’s Take

Share this:

Hold On For One More Day

The Dream Is Still Alive

Tom’s Take

Share this:

The Old Ways

New Hotness

Tom’s Take

Share this:

Not So Random Numbers

Planning Not To Fail

Tom’s Take

Share this:

Are You Ready For The Enterprise

How Complex Are You?

Tom’s Take

Share this:

Built To Survive

Finding Failure Fast

Tom’s Take

Share this:

Speaking For The People

Alexa, Tell Me A Story

Tom’s Take

Share this:

Tom’s Take

Share this:

Devourer Of Interfaces

Bye Bye, CLI

Tom’s Take

Share this: