Time To Get Back To Basics?

I’ve had some fascinating networking discussions over the past couple of weeks at Dell Technologies World, Interop, and the spring ONUG meeting. But two of them have hit on some things that I think need to be addressed in the industry. Both Russ White and Ignas Bagdonas of the IETF have come to me and talked about how they feel networking professionals have lost sight of the basics.

How Stuff Works

If you walk up to any network engineer and ask them to explain how TCP works, you will probably get a variety of answers. Some will try to explain it to you in basic terms to avoid getting too in depth. Others will swamp you with a technical discussion that would make the protocol inventors proud. But still others will just shrug their shoulders and admit they don’t really understand the protocol.

It’s a common problem when a technology gets to the point of being mature and ubiquitous. One of my favorite examples is the fuel system on an internal combustion engine. On older cars or small engines, the carburetor is responsible for creating the correct fuel and air mixture that is used to power the cylinders. Getting that mix right is half science and half black magic. For the people that know how to do it properly it’s an easy thing that they can do to drive the maxim performance from an engine. For those that have tried it and failed, it’s something best left alone to run at defaults.

The modern engine uses fuel injection. It’s a black box. It can be reprogrammed or tinkered with but it’s something that is tuned in different ways from the traditional carburetor. It’s not something that’s designed to be played around with unless you really know what you’re doing. The technology has reached the point where it’s ubiquitous and easy to use, but very difficult to repair without a specialized skill set.

Most regular car drivers will look under the hood and quickly realize they know nothing about what’s going on. Some technical folks will be able to figure out what certain parts do by observing their behavior. But if you ask those same people how a fuel injection system or carburetor works they’ll probably give you a blank stare.

That’s the issue we find in modern networking. We’ve been creating VLANs and BGP route maps for years. Some people have spent their entire careers tuning multicast or optimizing layer 2 interconnects. But if you corner them and ask them how the protocol works or how best to design an architecture that takes advantage of their life’s work they can’t tell you aside from referencing some old blog post or some vendor’s validated design on their hardware.

Russ and Ignas each touch on something important. In the good old days before there were a hundred certification guides and a thousand SRNDs people had to do real work to find the best solution for a problem. They had to put pencil to paper and sort out the mess before they could make something work. That’s where the engineering side of the network comes from.

Today, it’s more “plug and play”. You drop in pieces of a solution and they should work together. In practice, that usually means all the pieces have to be from the same vendor or from approved partner sources. And anything that goes awry will need a team of experts and many, many consulting hours to figure out.

Imagine if we could only install networks without understanding how they work. Could you see a world where everything we install from a networking perspective is a black box like a fuel injector? That’s the case to a certain degree with cloud networking today. We don’t see what’s going on under the surface. We can only see what the interface exposes to us. That’s fine as long as the applications we are using support the things we’re trying to do with them. But when it comes to being able to fix the network at the level we’re used to seeing it could be difficult if not downright impossible.

Learning The Ropes

But, moreover, are the networking professionals that are configuring these networks even capable of making those changes? Does anyone other than Narbik really understand how EIGRP works? Facebook seems to think that lightweight messaging packets for routing protocols are outdated. So they used ZeroMQ without understanding why that’s a bad idea for slow speed links. They may understand how a routing protocol works in theory, but they don’t completely understand how it’s supposed to work in extreme cases.

Can we teach people the basics and understanding of protocols and design that they need in order to make proper designs outside of vendor reference documents? It’s a tall order for sure. Most blog posts are designed to talk about features or solve problems. Most literature from creators is designed to make their new widget work correctly. Very little documentation exists about integration or design. And a good portion of what does exist is a bit outmoded and needs to be spruced up.

We, as the stewards of networking, need to help this process along. We need to spend more time talking about design and theory. We need to dissect protocols and help people understand how to use the tools they have rather than hoping someone will build the best mousetrap ever to solve each piece of a complicated puzzle. We need to teach people to be thinkers and problem solvers. And, yes, that does mean a bit less complaining about things like vendor code quality and VAR behavior.

Why? Because those people are empowered by a lack of knowledge. Customers aren’t idiots. They have business reasons for the things they do. Our technology needs to support their business needs. Yes, that means we need to think critically about what we’re doing. And yes, they may mean eating our words now and then to avoid a showdown about something that’s ultimately unimportant in the long run.

If we increase the amount of knowledge about the important topics like design and protocols it should make the overall level of understanding go up for everyone. That means better designs. More integrated technology. Less reliance on being force-fed the bare minimum information necessary to make things work. And that means things will run faster and much more smoothly. Which is a win for everyone.


Tom’s Take

I’ll be the first to admit that I don’t know the dirty mechanics of Frame Relay switching or how to tune OSPF Hello timers for non-standard link types. It’s a skill I don’t use every day. But I know where to find them if I need them. And I know that it can help in certain situations where you see odd protocol behavior. Likewise, I know that if I need to go design a network for someone I need to get a lot of information about business needs and build the best network I can with the constraints that I have. Things like budget and time are universal. But one of those constraints shouldn’t be lack of knowledge about protocols and good design. Those are two things that should be ingrained into anyone that wants to have a title of “senior” anything.

Transitioning Away From Legacy IT

One of the more exciting things I saw at Dell Technologies World this week was the announcement by VMware that they are supporting Microsoft Azure now in additional to AWS. It’s interesting because VMware is trying to provide a proven, stable migration path for companies that are wanting to move to the cloud but still retain their investments in VMware and legacy virtualization. But is offing legacy transition a good idea?

Hold On For One More Day

If I were to mention VLAN 1002-1005 to networking people, they would likely jump up and tell me that I was crazy. Because those VLANs are not valid on any Cisco switches save for the Nexus line. But why? What makes these forbidden? Unless you’re studying for your CCIE you probably just know these are bad and move on.

Turns out, they are a legacy transition mechanism from the IOS-SX days. 1002 and 1004 were designed to bridge FDDI-to-Ethernet, and 1003 and 1005 did the same for Token Ring. As Greg Ferro points out here, this code was tightly bound into IOS-SX and likely couldn’t be removed for fear of breaking the OS. The reservation continued forward in all IOS branches except NX-OS, which pulled them due to lack of support for those protocols.

So, we’ve got a legacy transition mechanism causing problems for users well past the “use by” date. Token Ring was on the way out at IBM in 2001. And yet, for some reason seventeen years later I still have to worry about bridging it? Or how about the rumors that Windows skipped from version 8 to version 10 because legacy code bases assumed Windows 9 meant Windows 95? Something 23 years old forced a major version change?

We keep putting legacy bridges in place all the time to help migrate things. Virtualization isn’t the only culprit here. We’ve found all manner of things that we can do to “trick” systems into working with modern hardware. We even made one idea into a button. But we never really solve the underlying issues. We just keep creating workarounds until we’re forced to move.

The Dream Is Still Alive

As it turns out, it’s expensive to refactor code bases and update legacy software to support new hardware. We’ve hit this problem time and time again with all manner of products. I can remember when Cisco CallManager wouldn’t install on a spare server I had with the same model number as a support machine just because the CPU was exactly 100MHz too fast. It’s frustrating to say the least.

But, we also have to realize that legacy transition mechanisms are not permanent fixes. It’s right there in the name. Transition. We put them in place because it’s cheaper in the short term while we investigate long term methods to make everything work correctly. But it’s still important to find those long term solutions. Maybe it’s a new application. Or a patch to make it work with new hardware. Sometimes, as Apple has done, it’s a warning that old software will stop working soon.

As developers, it’s important to realize that your app may last long past the date you want to stop supporting it. If you could still install Office 2000 on a desktop, I’m almost positive that someone would try it. We still have ways to install and use DOS software! If you want to ensure that your software is being used correctly and that you aren’t issuing patches for it after you’ve retired to a comfortable island with no Internet connection, make sure you find a way to ease transitions and offer new connection options to users.

For those of you that are still stuck in the morass of supporting legacy software or hardware, take a look at what you’re using it for and try to make hard choices where appropriate. If your organization is moving to the cloud, maybe now is the time to cut off your support for an application that’s too old to survive the migration. Or maybe it’s time to retire the Domain Controller That Time Forgot. But you have to do it before you’re forced to virtualize it and support it in perpetuity in AWS or Azure.


Tom’s Take

I’ll be the first to admit that legacy hardware and software are really popular. I worked with a company one time that still had an AS/400 admin on staff because of one application. It just happened to the one that paid people. At Interop ITX this year, the CIO for Detroit mentioned that they had to bring a developer out of retirement to make sure people kept getting paid. But legacy can’t be a part of the future. You either need to find a way to move what you have while you look for something better or you need to cut it off at the knees and find a way to make those functions work somewhere else. Because you don’t want to be the last company running AS/400 payroll over token ring bridged to a Cisco switch on VLAN 1003.

A Review of RSA Conference

So, I recently went to my first RSA Conference. It’s something I’ve had on my radar for a while but never had the opportunity to do. However, with Security Field Day coming up later this year I thought it was high time I went to see what everything was about. Here are some ideas that I came up with during my pilgrimage to the big security conference.

  • It’s Huge. Like, really big. I’ve never seen a bigger conference before. I haven’t gone to Oracle OpenWorld or Dreamforce, but the size of the RSA show floor alone dwarfs anything I’ve seen. Three whole areas, including one dedicated to emerging vendors. That’s big. Almost too big in fact.
  • I Still Hate Moscone. It’s official. No conference should ever use this place again. It’s been 4 years since I railed against it and every word still applies. Doubly so this year, as RSA was being held during construction! Seriously. At this point, Moscone must be paying people to hold a convention there. RSA is too big. I don’t care if it’s cheap to ferry people up from Silicon Valley. Stop doing this to yourself and tarnishing your brand. Just go to Vegas if you want to stay close.
  • Everyone Has A Specialization. Maybe you’re developing a cloud plugin for lateral evasion detection. Or an endpoint hardening solution for mobile. Maybe you’re into detection, intrusion, pen testing, or perhaps just training. Everyone had representation of some kind or another. Everyone solved a problem in a unique or different way. But everyone had their niche. More than any other place I’ve been I saw companies doing all manner of things that are so specialized that I worry they’re either going to go out of business before they can develop a customer base or they’re going to get bought to add a feature onto someone’s existing platform. May fortune favor you all.
  • You Can’t Be Secure Enough. Funny enough, when the RSA Mobile app gets hacked you know someone’s having a bad day. The truth is that so many people out there are going to be taking shots at you when you’re visibility is high that you need to be prepared to take them. You can’t just assume you’re obscure enough to avoid detection. If Shodan has taught us anything, it’s that obscurity isn’t good enough any longer. You need to be looking at your posture and your weak areas. You need to be prepared to pare everything down to the point where no one can get in easily. Defense in depth needs to be very deep today.

Tom’s Take

RSA is similar to Cisco Live and VMworld, but it’s also much different. It’s bigger. People are more reserved. There are a TON of expo passes everywhere. I think I counted maybe 50 full conference passes the whole week. It’s too big for Moscone for sure. When the press room is located a block and half away from the convention center proper, you might need to rethink your strategy for a conference. But, RSA is still full of exciting companies and new ideas. And that’s going to keep it rolling along for a while no matter how many times the mobile app gets “hacked”.

It’s Time For Security Apprenticeships

Breaking into an industry isn’t easy. When you look at the amount of material that is necessary to learn IT skills it can be daunting and overwhelming. Don’t let the for-profit trade school ads fool you. You can’t go from ditch digger to computer engineer in just a few months. It takes time and knowledge to get there.

However, there is one concept in non-technical job roles that feels very appropriate to how we do IT training, specifically for security. And that’s the apprenticeship.

Building For The Future

Apprenticeship is a standard for electricians and carpenters. It’s the way that we train new people to do the work of the existing workforce. It requires time and effort and a lot of training. But, it also fixes several problems with the current trend of IT certification:

  1. You Can’t Get a Job Without Experience – Far too often we see people getting rejected for jobs at the entry level because they have no experience. But how are they supposed to get the experience without doing the job? IT roles paradoxically require you to be cheap enough to hire for nothing but expect you to do the job on day one. Apprenticeships fix this by showing that you’re willing to do the job and get trained at the same time. That way you can be properly trained on the job.
  2. You Need To Be Certified – Coupled tightly with the first problem is the certification trend. Yes, electricians and plumbers are certified to do the work. After 5 years of training. You don’t get to take a test after 3 months and earn Journeyman status. You have to earn it through training and work. It also ensures that every journeyman out there has the same basic level of work experience and training. Compare that with people getting a CCNA right out of college with no experience on devices or a desktop administrator position having an MCSE-level requirement to discourage people from applying.
  3. You Can’t Find a Mentor – This is a huge issue for entry-level positions. The people you work with are often so overtasked with work that they don’t have time to speak to you, let alone show you the ropes. By defining the role of the apprentice and the journeyman trainer, you can ensure that the mentoring relationship exists and is reinforced constantly through the training period. You don’t get through an apprenticeship in a vacuum.
  4. You Can’t Get Paid To Learn – This is another huge issue with IT positions. Employers don’t want to pay to train you. They want you to use the vast powers of the Internet to look up hacks on Stack Overflow and paste them into production equipment rather than buying a book for you or letting you take a class. Some employers are better than others about highly skilled workers, but those employers are usually leveraging those skills to make more money, like in a VAR or MSP. Apprenticeships fix this problem by including a classroom component in the program. You spend time in a classroom every other week while you’re doing the work.

Securing the Future

The apprenticeship benefits above go double for security-related professions. Whereas there is a lot of material to train networking and storage admins, security is more of a game of cat-and-mouse. As new protections are developed, we must always figure out of the old protections are still useful. We need to train people how to learn. We need to bring them up to speed on what works and what doesn’t and why googling for an old article about packet filtering firewalls isn’t as relevant in 2018 as it was a decade ago.

More than that, we need to get the security practitioners to a place where they can teach effectively and mentor people. We need to get away from the idea of just rattling off a long list of recommended protections to people as the way to train them about security. We need to make sure we’re teaching fundamentals and proper procedures. And they only way that can happen is with someone that has the time and motivation to learn. Like an apprentice.

This would also have the added luxury of ensure the next generation of security practitioners learns the pitfalls before they make mistakes that compromise the integrity of the data they’re protecting. Making a mistake in security usually means someone finding out something they aren’t supposed to know. Or data getting stolen and used for illicit means. By using apprenticeships as a way for people to grow, you can quickly bring people up to speed in a safe environment without worrying that their decisions will lead to problems with no one to check their work.


Tom’s Take

I got lucky in my IT career. I worked with two mentors that spent the time to train me like an apprentice. They showed me the ropes and made sure I learned things the right way. And I became successful because of their leadership. Today, I don’t see that mentality. I see people covering their own jobs or worried that they’re training a replacement instead of a valued team member. I worry that we’re turning out a generation of security, networking, and storage professionals that won’t have the benefit of knowing what to do beyond Google searches and won’t know how to teach the coming generation how to deal with the problems they face. It’s not too late to start by taking an apprentice under your wing today.

On Old Configs and Automation

I used to work with a guy that would configure servers for us and always include an extra SCSI card in the order. When I asked him about it one day, he told me, “I left it out once and it delayed the project. So now I just put them on every order.” Even after I explained that we didn’t need it over and over again, he assured me one day we might.

Later, when I started configuring networking gear I would always set a telnet password for every VTY line going into the switch. One day, a junior network admin asked me why I configured all 15 instead of just the first 5 like they learn in the Cisco guides. I shrugged my shoulders and just said, “That’s how I’ve always done it.”

The Old Ways

There’s no more dangerous phrase than “That’s the way it’s always been.”

Time and time again we find ourselves falling back on the old rule of thumb or an old working configuration that we’ve made work for us. It’s comfortable for the human mind to work from a point of reference toward new things. We find ourselves doing it all the time. Whether it’s basing a new configuration on something we’ve used before or trying to understand a new technology by comparing it to something we’ve worked on in the past.

But how many times have those old configurations caused us grief? How many times have we been troubleshooting a problem only to find that we configured something that shouldn’t have been configured in the way that it was. Maybe it was an old method of doing hunt groups. Or perhaps it was a legacy QoS configuration that isn’t supported any more.

Part of our issue as networking professionals is that we want to concentrate on the important things. We want to implement solutions and ideas that work for our needs and not concentrate on the minutia of configuration. Sure, the idea of configuration a switch from bare metal to working config is awesome the first time you do it. But the fifteenth time you have to configure one in a row is less awesome. That’s why copy-and-paste configurations are so popular with people that just want to get the job done.

New Hotness

This idea of using old configurations for new things takes even more importance when you start replacing the boring old configuration methods with new automation and API-driven configuration models. Yes, APIs make it a lot easier to configure a switch programmatically. And automation tools like Puppet and Ansible make it much faster to bring a switch online from nothing to running in the minimum amount of time.

However, even with this faster configuration method, are we still using old, outdated configurations to solve problems? Sure, I don’t have to worry about configuring VLANs on the switch one at a time. But if my configuration is still referencing VLANs that are no longer in the system that makes it very difficult to keep the newer switches running optimally. And that’s just assuming the configuration is old and outdated. What if we’re still using deprecated commands?

APIs are great because they won’t support unsupported things. But if we don’t scrub the configuration now and then to remove these old lines and protocols we’ll quickly find ourselves in a world of trouble. Because those outdated and broken things will bring the API to a halt. Yes, the valid commands will still be entered correctly, but if those valid commands rely on something invalid to work properly you’re going find things broken very fast.

What makes this whole thing even more irritating is that most configurations need to be validated and approved before being implemented. Which makes the whole process even more difficult to manage. Part of the reason why old configs live for so long is that they need weeks or months of validation to be implemented effectively. When new platforms or configuration methods crop up it could delay new equipment installation. This sometimes leads to people installing new gear with “approved” configs that might not be the best fit in order to get that new equipment into service.


Tom’s Take

So, how do we fix all this? What’s the trick? Well, it’s really a combination of things. We need to make sure we audit configs regularly to keep the old stuff from continuing on past the expiration dates. We also need to continually resubmit new configurations to the approvals process. Just like disaster recovery documentation, configurations are living, breathing documents that should always be current and refreshed. The more current your configurations, the less likely you are to have old cruft keeping your systems running at subpar performance. And the less likely you are to have to worry about breaking new things like APIs and automation systems in the future.

Reclaiming 1.1.1.1 For The Internet

Hopefully by now you’ve seen the announcement that CloudFlare has opened a new DNS service at the address of 1.1.1.1. We covered a bit of it on this week’s episode of the Gestalt IT Rundown. Next to Gmail, it’s probably the best April Fool’s announcement I’ve seen. However, it would seem that the Internet isn’t quite ready for a DNS resolver service that’s easy to remember. And that’s thanks in part to the accumulation of bad address hygiene.

Not So Random Numbers

The address range of 1/8 is owned by APNIC. They’ve had it for many years now but have never announced it publicly. Nor have they ever made any assignments of addresses in that space to clients or customers. In a world where IPv4 space is at a premium, why would a RIR choose to lose 16 million addresses?

Edit: As pointed out by Dale Carder of ES.net in a comment below, APNIC has been assigning address space out of 1 /8 since 2010. However, the most commonly leaked prefixes in that subnet that are difficult to assign because of bogus announcements come from 1.0.0.0/14.

As it turns out, 1/8 is a pretty bad address space for two reasons. 1.1.1.1 and 1.2.3.4. These two addresses are responsible for most of the inadvertent announcements in the entire 1/8 space. 1.2.3.4 is easy to figure out. It’s the most common example IP address given when talking about something. Don’t believe me? Google is your friend. Instead of using 192.0.2.0/24 like we should be using, we instead use the most common PIN, password, and luggage combination in the world. But, at least 1.2.3.4 makes sense.

Why is 1.1.1.1 so popular? Well, the first reason is thanks to Airespace wireless controllers. Airespace uses 1.1.1.1 as the default virtual interface address for just about everything. Here’s a good explanation from Andrew von Nagy. When Airespace was sold to Cisco, this became a very popular address for Cisco wireless networks. Except now that it’s in use as a DNS resolver there are issues with using it. The wireless experts I’ve talked to recommend changing that address to 192.0.2.1, since that address has been marked off for examples only and will never be globally routable.

The other place where 1.1.1.1 seems to be used quite frequently is in Cisco ASA failover interfaces. Cisco documentation recommended using 1.1.1.1 for the primary ASA failover and 1.1.1.2 as the secondary interface. The heartbeats between those two interfaces were active as long as the units were paired together. But, if they were active and reachable then any traffic destined for those globally routable addresses would be black holed. Now, ASAs should probably be using 192.0.2.1 and 192.0.2.2 instead. But beware that this will likely require downtime to reconfigure.

The 1.1.1.1 address confusion doesn’t stop there. Some systems like Nomadix use them as the default logout address. Vodafone used to use it as an image caching server. ISPs are blocking it upstream in some ACLs. Some security organizations even recommend dropping traffic to 1/8 as a bogon prevention measure. There’s every chance that 1.1.1.1 is going to be dropped by something in your network or along the transit path.

Planning Not To Fail

So, how are you going to proceed if you really, really want to use CloudFlare DNS? Well, the first step is to make sure that you don’t have 1.1.1.1 configured anywhere in your network. That means checking WLAN controllers, firewalls, and example configurations. Odds are good you’re running RFC1918 space. But you should try to ping 1.1.1.1 anyway. If you can ping it, then you should traceroute the address. If the traceroute leaves your local network, you probably have a good path.

Once you’ve determined that you’re capable of reaching 1.1.1.1, you need to test it first. Configure it on a test machine or VM and make sure it’s actually resolving addresses for you. Better safe than sorry. Once you know it’s really working like you want it to work, configure it on your internal DNS servers as a forwarder. You still want internal control of DNS thanks to things like Active Directory. But configuring it as a forwarder means you can take advantage of all the features CloudFlare is building into the system while still retaining anything you’ve done locally.


Tom’s Take

I’m glad CloudFlare and APNIC are reclaiming 1.1.1.1 for some useful purpose. CloudFlare can take the traffic load of all the horribly misconfigured systems in the world. APNIC can use this setup to do some analytics work to find out exactly how screwed up things are. I wouldn’t be shocked to see something similar happen to 1.2.3.4 in the future if this bet pays off.

I’ve been using 1.1.1.1 since April 2nd and it works. It’s fast and hasn’t broken yet, which is the best that you can hope for from a DNS server. I’m sure I’ll play around with some of the advanced features as they come online but for now I’m just happy that one of the most recognizable IP addresses in the world is working for me.

Wireless Thoughts From Aruba Atmosphere

I just got back from Aruba Atmosphere this week and I thought it would be a good chance to go over some of the cool stuff that I saw there.

  • Rasa is now Aruba NetInsights. That platform is going to be a big one for Aruba in the future. There’s a lot of information that is being gleaned from installations and it’s fueling some hard looks at best practices and such. Also funny that it’s being installed primarily in university campuses to profile coverage and client capabilities. Those are usually pretty hostile environments for users and administrators alike.
  • The security pieces that were shown off were also very interesting. The idea of port profiles has always made me a bit skeptical, but the way that Aruba is doing actual traffic profiling makes me think they have it this time. It’s also really cool that it can be done with non-managed devices in the middle. I think the key is that Aruba is doing actual traffic profiling instead of just looking at the basics behind the packets, like ports or VLANs. Real, automatic port security could be a huge win for places that need on-the-fly access to rapidly changing conditions. Like, say, university campuses.
  • Live demos rock when your technology is solid enough that you don’t have to worry about it blowing up on stage. We’re past the point of the wireless network blowing up the iPhone 4 for Steve Jobs. In fact, the best part was that the demo was slower because the demo guy had an extra espresso shot and the camera couldn’t hold still!

Look for some interesting discussion from Tech Field Day and stay tuned to hear more about Atmosphere from the delegates!

Is Patching And Tech Support Bad? A Response

Hopefully, you’ve had a chance to watch this 7 minute video from Greg Ferro about why better patching systems can lead to insecure software. If you haven’t, you should:

Greg is right that moral hazard is introduced because, by definition, the party providing the software is “insured” against the risks of the party using the software. But, I also have a couple of issues with some of the things he said about tech support.

Are You Ready For The Enterprise

I’ve been working with some Ubiquiti access points recently. So far, I really enjoy them and I’m interested to see where their product is going. After doing some research, the most common issue with them seems to be their tech support offerings. A couple of Reddit users even posted in a thread that the lack of true enterprise tech support is the key that is keeping Ubiquiti from reaching real enterprise status.

Think about all the products that you’ve used over the last couple of years that offered some other kind of support aside from phone or rapid response. Maybe it was a chat window on the site. Maybe it was an asynchronous email system. Hell, if you’ve ever installed Linux in the past on your own you know the joys of heading to Google to research an issue and start digging into post after post on a long-abandoned discussion forum. DenverCoder9 would agree.

By Greg’s definition, tech support creates negative incentive to ship good code because the support personnel are there to pick up the pieces for all your mistakes when writing code. It even incentivizes people to set unrealistic goals and push code out the door to hit milestones. On the flip side, try telling people your software is so good that you don’t need tech support. It will run great on anything every time and never mess up once. If you don’t get laughed out of the building you should be in sales.

IT professionals are conditioned to need tech support. Even if they never call the number once they’re going to want the option. In the VAR world, this is the “throat to choke” mentality. When the CEO is coming down to find out why the network is down or his applications aren’t working, they are usually not interested in the solution. They want to yell at someone to feel like they’ve managed the problem. Likewise, tech support exists to find the cause of the problem in the long run. But it’s also a safety blanket that exists for IT professionals to have someone to yell at after the CEO gets to them. Tech Support is necessary. Not because of buggy code, but because of peace of mind.

How Complex Are You?

Greg’s other negative aspect to tech support is that it encourages companies to create solutions more complex than they need to be because someone will always be there to help you through them. Having done nationwide tech support for a long time, I can tell you that most of the time that complexity is there for a reason. Either you hide it behind a pretty veneer or you risk having people tell you that your product is “too simple”.

A good example case here is Meraki. Meraki is a well-known SMB/SME networking solution. Many businesses run on Meraki and do it well. Consultants sell Meraki by the boatload. You know who doesn’t like Meraki? Tinkerers. People that like to understand systems. The people that stay logged into configuration dashboards all day long.

Meraki made the decision when they started building their solution that they weren’t going to have a way to expose advanced functionality in the system to the tinkerers. They either buried it and set it to do the job it was designed to do or they left it out completely. That works for the majority of their customer base. But for those that feel they need to be in more control of the solution, it doesn’t work at all. Try proposing a Meraki solution to a group of die-hard traditionalist IT professionals in a large enterprise scenario. Odds are good you’re going to get slammed and then shown out of the building.

I too have my issues with Meraki and their ability to be a large enterprise solution. It comes down to choosing to leave out features for the sake of simplicity. When I was testing my Ubiquiti solution, I quickly found the Advanced Configuration setting and enabled it. I knew I would never need to touch any of those things, yet it made me feel more comfortable knowing I could if I needed to.

Making a solution overly complex isn’t a design decision. Sometimes the technology dictates that the solution needs to be complex. Yet, we knock solutions that require more than 5 minutes to deploy. We laud companies that hide complexity and claim their solutions are simple and easy. And when they all break we get angry because, as tinkerers, we can’t go in and do our best to fix them.


Tom’s Take

In a world where software in king, quality matters. But, so does speed and reaction. You can either have a robust patching system that gives you the ability to find and repair unforeseen bugs quickly or you can have what I call the “Blizzard Model”. Blizzard is a gaming company that has been infamous for shipping things “when they’re done” and not a moment before. If that means a project is 2-3 years late then so be it. But, when you’re a game company and this is your line of work you want it to be perfect out of the box and you’re willing to forego revenue to get it right. When you’re a software company that is expected to introduce new features on a regular cadence and keep the customers happy it’s a bit more difficult.

Perhaps creating a patching system incentivizes people to do bad things. Or, perhaps it incentivizes them to try new things and figure out new ideas without the fear that one misstep will doom the product. I guess it comes down to whether or not you believe that people are inherently going to game the system or not.

When Redundancy Strikes

Networking and systems professionals preach the value of redundancy. When we tell people to buy something, we really mean “buy two”. And when we say to buy two, we really mean buy four of them. We try to create backup routes, redundant failover paths, and we keep things from being used in a way that creates a single point of disaster. But, what happens when something we’ve worked hard to set up causes us grief?

Built To Survive

The first problem I ran into was one I knew how to solve. I was installing a new Ubiquiti Security Gateway. I knew that as soon as I pulled my old edge router out that I was going to need to reset my cable modem in order to clear the ARP cache. That’s always a thing that needs to happen when you’re installing new equipment. Having done this many times, I knew the shortcut method was to unplug my cable modem for a minute and plug it back in.

What I didn’t know this time was that the little redundant gremlin living in my cable modem was going to give me fits. After fifteen minutes of not getting the system to come back up the way that I wanted, I decided to unplug my modem from the wall instead of the back of the unit. That meant the lights on the front were visible to me. And that’s when I saw that the lights never went out when the modem was unplugged.

Turns out that my modem has a battery pack installed since it’s a VoIP router for my home phone system as well. That battery pack was designed to run the phones in the house for a few minutes in a failover scenario. But it also meant that the modem wasn’t letting go of the cached ARP entries either. So, all my efforts to make my modem take the new firewall were being stymied by the battery designed to keep my phone system redundant in case of a power outage.

The second issue came when I went to turn up a new Ubiquiti access point. I disconnected the old Meraki AP in my office and started mounting the bracket for the new AP. I had already warned my daughter that the Internet was going to go down. I also thought I might have to reprogram her device to use the new SSID I was creating. Imagine my surprise when both my laptop and her iPad were working just fine while I was hooking the new AP up.

Turns out, both devices did exactly what they were supposed to do. They connected to the other Meraki AP in the house and used it while the old one was offline. Once the new Ubiquiti AP came up, I had to go upstairs and unplug the Meraki to fail everything back to the new AP. It took some more programming to get everything running the way that I wanted, but my wireless card had done the job it was supposed to do. It failed to the SSID it could see and kept on running until that SSID failed as well.

Finding Failure Fast

When you’re trying to troubleshoot around a problem, you need to make sure that you’re taking redundancy into account as well. I’ve faced a few problems in my life when trying to induce failure or remove a configuration issue was met with difficulty because of some other part of the network or system “replacing” my hard work with a backup copy. Or, I was trying to figure out why packets were flowing around a trouble spot or not being inspected by a security device only to find out that the path they were taking was through a redundant device somewhere else in the network.

Redundancy is a good thing. Until it causes issues. Or until it makes your network behave in such a way as to be unpredictable. Most of the time, this can all be mitigated by good documentation practices. Being able to figure out quickly where the redundant paths in a network are going is critical to diagnosing intermittent failures.

It’s not always as easy as pulling up a routing table either. If the entire core is down you could be seeing traffic routing happening at the edge with no way of knowing the redundant supervisors in the chassis are doing their job. You need to write everything down and know what hardware you’re dealing with. You need to document redundant power supplies, redundant management modules, and redundant switches so you can isolate problems and fix them without pulling your hair out.


Tom’s Take

I rarely got to work with redundant equipment when I was installing it through E-Rate. The government doesn’t believe in buying two things to do the job of one. So, when I did get the opportunity to work with redundant configurations I usually found myself trying to figure out why things were failing in a way I could predict. After a while, I realized that I needed to start making my own notes and doing some investigation before I actually started troubleshooting. And even then, like my cable modem’s battery, I ran into issues. Redundancy keeps you from shooting yourself in the foot. But it can also make you stab yourself in the eye in frustration.

Cisco Live CAE and Guest Keynote Announcements

As you may have heard by now, there have been a few exciting announcements from Cisco Live 2018 regarding the venue for the customer appreciation event and the closing keynote speakers.

Across The Universe

The first big announcement is the venue for the CAE. When you’re in Orlando, there are really only two options for the CAE. You either go to the House of the Mouse or you go to Universal Studios. The last two times that Cisco Live has gone to Orlando it has been to Universal. 2018 marks the third time!

Cisco is going big this year. They’ve rented the ENTIRE Universal Studios park. Not just the backlot. Not just the side parks. They WHOLE thing. You can get your fix on the Transformers ride, visit Harry Potter, or even partake of some of the other attractions as well. It’s a huge park with a lot of room for people to spread out and enjoy the scenery.

That’s not all. The wristband that gets you into the CAE also gets you access to Islands of Adventure before the full park opens! You can pregame the party by hanging out at Hogwarts, going to Jurassic Park, or joining your favorite superheroes for a picture or two for the kids. Access to Islands of Adventure isn’t exclusive, so you’ll be there with all the other tourists from around the world but it’s a great place to hang out before the party gets going!

Note that this year you will need the new Imagine pass or the Party Pass Add-on in order to access the CAE. There is no standalone social pass option or social add-on for conference passes.

Welcome To The Future

The closing keynote speakers have also been announced. Dr. Michio Kaku and Amy Webb will be on stage talking about the future of technology and how it will be impacting our society. Given the keynote that Rowan Trollope delivered during Cisco Live Barcelona, this comes as no surprise to me.

Cisco is very much trying to show that they are getting back on the leading edge of technology and driving innovation in the market. The problem with being the “800lb Gorilla” is that you’re also big and difficult to move. IBM faced the same problem before they shed their legacy and became leaner, more future-focused company. Others that tried to follow in their footsteps were less successful and either split apart or got scooped up in mergers.

Cisco is going through a transition period after the departure of John Chambers. Chuck Robbins is turning the ship as quickly as possible, but there need to be more outwards signs that things are being done to look toward a future where hardware isn’t as important as the innovation happening in software. By bringing in two of the most well known futurists in science and technology, Cisco is sending a signal to their audience of users and investors that the focus is going to be on emerging technology. This is a bit of a gamble for Cisco but it’s hoped that things pay off for them.

Note that there are also going to be other speakers in the Big Ideas Theater on the World of Solutions floor during the event. Access to the World of Solutions is restricted behind the new Imagine pass or full conference pass. There is no Social Pass option, and the party pass add-on does not grant access to the World of Solutions floor.


Tom’s Take

The Cisco Live CAE in Orlando is pretty much a known thing. It’s nice to see all of Universal this year with access to the new attractions at Islands of Adventure. People should be able to enjoy being outside in the Florida humidity instead of the blistering Las Vegas inferno. As well, the rides are going to be fun for a large number of the attendees.

It’s also good to see future-looking keynote speakers that are going to give their viewpoints on things that will impact our lives. With two speakers, I’m expecting another “interview” style closing keynote, which isn’t quite my favorite. But this is a step in the right direction. Here’s hoping that these additions to the event make Cisco Live a great show for those that will be attending.