Automating Your Job Away Isn’t Easy

programming

One of the most common complaints about SDN that comes from entry-level networking folks is that SDN is going to take their job away. People fear what SDN represents because it has the ability to replace their everyday tasks and put them out of a job. While this is nowhere close to reality, it’s a common enough argument that I hear it very often during Q&A sessions. How is it that SDN has the ability to ruin so many jobs? And how is it that we just now have found a way to do this?

Measure Twice

One of the biggest reasons that the automation portion of SDN has become so effective in today’s IT environment is that we can finally measure what it is that networks are supposed to be doing and how best to configure them. Think about the work that was done in the past to configure and troubleshoot networks. It’s often a very difficult task that involves a lot of intuition and guesswork. If you tried to explain to someone the best way to do things, you’d likely find yourself at a loss for words.

However, we’ve had boring, predictable standards for many years. Instead of cobbling together half-built networks and integrating them in the most obscene ways possible, we’ve instead worked toward planning and architecting things properly so they are built correctly from the ground up. No more guess work. No more last minute decisions that come back to haunt us years down the road. Those kinds of things are the basic building blocks for automation.

When something is built along the lines of predictable rules with proper adherence to standards, it’s something that can be understood by a non-human. Going all the way back to Basic Computing 101, the inputs of a system determine the outputs. More simply, Garbage In, Garbage Out. If your network configuration looks like a messy pile of barely operational commands it will only really work when a human can understand what’s going on. Machines don’t guess. They do exactly what they are told to do. Which means that they tend to break when the decisions aren’t clear.

Cut Once

When a system, script, or program can read inputs and make procedural decisions on those inputs, you can make some very powerful things happen. Provided, that is, that your chosen language is powerful enough to do those things. I’m reminded of a problem I worked on fifteen years ago during my internship at IBM. I needed to change the MTU size for a network adapter in the Windows 2000 registry. My programming language of choice wasn’t powerful enough for me to say something like, “Read these values into an array and change the last 2 or 3 to the following MTU”. So instead, I built a nested if statement that was about 15 levels deep to ensure I caught every possible permutation of the adapter binding order. It was messy. It was ugly. And it worked. But there was no way it would scale.

The most important thing to realize about SDN and automation is that we’ve moved past simply understanding basic values. We’ve finally graduated to a place where programs can make complex decisions based on a number of inputs. We’ve graduated from simple if-then-else constructs and up to a point where programs can take a number of inputs and make decisions based on them. Sure, in many cases the inputs are simple little things like tags or labels. But what we’re gaining is the ability to process more and more of those labels. We can create provisioning scripts that ensure that prod never talks to dev. We can automate turn-up of a new switch with multiple VLANs on different ports through the use of labels and object classes. We can even extrapolate this to a policy-based network language that we can use to build a task once and execute it over and over again on different hardware because we’re doing higher level processing instead of being hamstrung by specific device syntax.

Automation is going to cost some people their jobs. That’s a given. Just like every other manufacturing position, the menial tasks of assembling simple pieces or performing repetitive tasks can easily be accomplished by a machine or software construct. But writing those programs and working on those machines is a new kind of job in and of itself. A humorous anecdote from the auto industry says that the introduction of robots onto assembly lines caused many workers to complain and threaten to walk off the job. However, one worker picked up the manual for the robot and realized that he could easily start working on the it instead of the assembly line.


Tom’s Take

Automation isn’t a magic bullet to fix all your problems. It only works if things are ordered and structured in such a way that you can predictably repeat tasks over and over. And it’s not going to stop with one script or process. You need to continue to build, change, and extend your environment. Which means that your job of programming switches should now be looked at in light of building the programs that program switches. Does it mean that you need to forget the basics of networking? No, but it does mean that they way in which you think about them will change.

Is It Really Always The Network?

Keep Calm and Blame the Network

Image from Thomas LaRock

I had a great time over the last month writing a series of posts with my friend John Herbert (@MrTugs) over on the SolarWinds Geek Speak Blog. You can find the first post here. John and I explored the idea that people are always blaming the network for a variety of issues that are often completely unrelated to the actual operation of the network. It was fun writing some narrative prose for once, and the feedback we got was actually pretty awesome. But I wanted to take some time to explain the rationale behind my madness. Why is it that we are always blaming the network?!?

Visibility Is Vital

Think about all the times you’ve been working on an application and things start slowing down. What’s the first thing you think of? If it’s a standalone app, it’s probably some kind of processing lag or memory issues. But if that app connects to any other thing, whether it be a local network or a remote network via the Internet, the first culprit is the connection between systems.

It’s not a large logical leap to make. We have to start by assuming the the people that made the application knew what they were doing. If hundreds of other people aren’t having this problem, it must not be with the application, right? We’ve already started eliminating the application as the source of the issues even before we start figuring out what went wrong.

People will blame the most visible part of the system for issues. If that’s a standalone system sealed off from the rest of the world, it obviously must be the application. However, we don’t usually build these kinds of walled-off systems any longer. Almost every application in existence today requires a network connection of some kind. Whether it’s to get updates or to interact with other data or people, the application needs to talk to someone or maybe even everyone.

I’ve talked before about the need to make the network more of a “utility”. Part of the reason for this is that it lowers the visibility of the network to the rest of the IT organization. Lower visibility means fewer issues being incorrectly blamed on the network. It also means that the network is going to be doing more to move packets and less to fix broken application issues.

Blame What You Know

If your network isn’t stable to begin with, it will soon become the source of all issues in IT even if the network has nothing to do with the app. That’s because people tend to identify problem sources based on their own experience. If you are unsure of that, work on a consumer system helpdesk sometime and try and keep track of the number of calls that you get that were caused by “viruses” even if there’s no indication that this is a virus related issue. It’s staggering.

The same thing happens in networking and other enterprise IT. People only start troubleshooting problems from areas of known expertise. This usually breaks down by people shouting out solutions like, “I saw this once so it must be that! I mean, the symptoms aren’t similar and the output is totally different, but it must be that because I know how to fix it!”

People get uncomfortable when they are faced with troubleshooting something unknown to them. That’s why they fall back on familiar things. And if they constantly hear how the network is the source of all issues, guess what the first thing to get blamed is going to be?

Network admins and engineers have to fight a constant battle to disprove the network as the source of issues. And for every win they get it can come crashing down when the network is actually the problem. Validating the fears of the users is the fastest way to be seen as the issue every time.

Mean Time To Innocence

As John and I wrote the pieces for SolarWinds, what we wanted to show is that a variety of issues can look like network things up front but have vastly different causes behind the scenes. What I felt was very important for the piece was the distinguish the the main character, Amanda, go beyond the infamous Mean Time To Innocence (MTTI) metric. In networking, we all too often find ourselves going so far as to prove that it’s not the network and then leave it there. As soon as we’re innocent, it’s done.

Cross-function teams and other DevOps organizations don’t believe in that kind of boundary. Lines between silos blur or are totally absent. That means that instead of giving up once you prove it’s not your problem, you need to work toward fixing what’s at issue. Fix the problem, not the blame. If you concentrate on fixing the problems, it will soon become noticeable that networking team may not always be the problem. Even if the network is at fault, the network team will work to fix it and any other issues that you see.


Tom’s Take

I love the feedback that John and I have gotten so far on the series we wrote. Some said it feels like a situation they’ve been in before. Others have said that they applaud the way things were handled. I know that the narrative allows us to bypass some of the unsavory things that often happen, like argument and political posturing inside an organization to make other department heads look bad when a problem strikes. But what we really wanted to show is that the network is usually the first to get blamed and the last to keep its innocence in a situation like this.

We wanted to show that it’s not always the network. And the best way for you to prove that in your own organization is to make sure the network isn’t just innocent, but helpful in solving as many problems as possible.

Is The Rise Of SD-WAN Thanks To Ethernet?

Ethernet

SD-WAN has exploded in the market. Everywhere I turn, I see companies touting their new strategy for reducing WAN complexity, encrypting data in flight, and even doing analytics on traffic to help build QoS policies and traffic shaping for critical links. The first demo I ever watched for SDN was a WAN routing demo that chose best paths based on cost and time-of-day. It was simple then, but that kind of thinking has exploded in the last 5 years. And it’s all thanks to our lovable old friend, Ethernet.

Those Old Serials

When I started in networking, my knowledge was pretty limited to switches and other layer 2 devices. I plugged in the cables, and the things all worked. As I expanded up the OSI model, I started understanding how routers worked. I knew about moving packets between different layer 3 areas and how they controlled broadcast storms. This was also around the time when layer 3 switching was becoming a big thing in the campus. How was I supposed to figure out the difference between when I should be using a big router with 2-3 interfaces versus a switch that had lots of interfaces and could route just as well?

The key for me was media types. Layer 3 switching worked very well as long as you were only connecting Ethernet cables to the device. Switches were purpose built for UTP cable connectivity. That works really well for campus networks with Cat 5/5e/6 cabling. Switched Virtual Interfaces (SVIs) can handle a large amount of the routing traffic.

For WAN connectivity, routers were a must. Because only routers were modular in a way that accepted cards for different media types. When I started my journey on WAN connectivity, I was setting up T1 lines. Sometimes they had an old-fashioned serial connector like this:

s-l300

Those connected to external CSU/DSU modules. Those were a pain to configure and had multiple points of failure. Eventually, we moved up in the world to integrated CSU/DSU modules that looked like this:

ehwic-2-ports-t-1-e-1

Those are really awesome because all the configuration is done on the interface. They also take regular UTP cables instead of those crazy V.35 monsters.

cisco_v35_old_large

But those UTP cables weren’t Ethernet. Those were still designed to be used as serial connections.

It wasn’t until the rise of MPLS circuits and Transparent LAN services that Ethernet became the dominant force in WAN connectivity. I can still remember turning up my first managed circuit and thinking, “You mean I can use both FastEthernet interfaces? No cards? Wow!”.

Today, Ethernet dominates the landscape of connectivity. Serial WAN interfaces are relegated to backwater areas where you can’t get “real WAN connectivity”. And in most of those cases, the desire to use an old, slow serial circuit can be superseded by a 4G/LTE USB modem that can be purchased from almost any carrier. It would appear that serial has joined the same Heap of History as token ring, ARCnet, and other venerable connectivity options.

Rise, Ethernet

The ubiquity of Ethernet is a huge boon to SD-WAN vendors. They no longer have to create custom connectivity options for their appliances. They can provide 3-4 Ethernet interfaces and 2-3 USB slots and cover a wide range of options. This also allows them to simplify their board designs. No more modular chassis. No crazy requirements for WIC slots, NM slots, or any other crazy terminology that Cisco WAN engineers are all too familiar with.

Ethernet makes sense for SD-WAN vendors because they aren’t concerned with media types. All their intelligence resides in the software running on the box. They’d rather focus on creating automatic certificate-based IPsec VPNs than figuring out the clock rate on a T1 line. Hardware is not their end goal. It is much easier to order a reference board from Intel and plug it into a box than trying to configure a serial connector and make a custom integration.

Even SD-WAN vendors that are chasing after the service provider market are benefitting from Ethernet ubiquity. Service providers may still run serial connections in their networks, but management of those interfaces at the customer side is a huge pain. They require specialized technical abilities. It’s expensive to manage and difficult to troubleshoot remotely. Putting Ethernet handoffs at the CPE side makes life much easier. In addition, making those handoffs Ethernet makes it much easier to offer in-line service appliances, like those of SD-WAN vendors. It’s a good choice all around.

Serial connectivity isn’t going away any time soon. It fills an important purpose for high-speed connectivity where fiber isn’t an option. It’s also still a huge part of the install base for circuits, especially in rural areas or places where new WAN circuits aren’t easily run. Traditional routers with modular interfaces are still going to service a large number of customers. But Ethernet connectivity is quickly growing to levels where it will eclipse these legacy serial circuits soon. And the advantage for SD-WAN vendors can only grow with it.


Tom’s Take

Ethernet isn’t the only reason SD-WAN has succeeded. Ease of use, huge feature set, and flexibility are the real reasons when SD-WAN has moved past the concept stage and into deployment. WAN optimization now has SD-WAN components. Service providers are looking to offer it as a value added service. SD-WAN has won out on the merits of the technology. But the underlying hardware and connectivity was radically simplified in the last 5-7 years to allow SD-WAN architects and designers to focus on the software side of things instead of the difficulties of building complicated serial interfaces. SD-WAN may not owe it’s entire existence to Ethernet, but it got a huge push in the right direction for sure.

HPE Networking: Past, Present, and Future

hpe_pri_grn_pos_rgb

I had the chance to attend HPE Discover last week by invitation from their influencer team. I wanted to see how HPE Networking had been getting along since the acquisition of Aruba Networks last year. There have been some moves and changes, including a new partnership with Arista Networks announced in September. What follows is my analysis of HPE’s Networking portfolio after HPE Discover London and where they are headed in the future.

Campus and Data Center Divisions

Recently, HPE reorganized their networking division along two different lines. The first is the Aruba brand that contains all the wireless assets along with the campus networking portfolio. This is where the campus belongs. The edge of the network is an ever-changing area where connectivity is king. Reallocating the campus assets to the capable Aruba team means that they will do the most good there.

The rest of the data center networking assets were loaded into the Data Center Infrastructure Group (DCIG). This group is headed up by Dominick Wilde and contains things like FlexFabric and Altoline. The partnership with Arista rounds out the rest of the switch portfolio. This helps HPE position their offerings across a wide range of potential clients, from existing data center infrastructure to newer cloud-ready shops focusing on DevOps and rapid application development.

After hearing Dom Wilde speak to us about the networking portfolio goals, I think I can see where HPE is headed going forward.

The Past: HPE FlexFabric

As Dom Wilde said during our session, “I have a market for FlexFabric and can sell it for the next ten years.” FlexFabric represents the traditional data center networking. There is a huge market for existing infrastructure for customers that have made a huge investment in HPE in the past. Dom is absolutely right when he says the market for FlexFabric isn’t going to shrink the foreseeable future. Even though the migration to the cloud is underway, there are a significant number of existing applications that will never be cloud ready.

FlexFabric represents the market segment that will persist on existing solutions until a rewrite of critical applications can be undertaken to get them moved to the cloud. Think of FlexFabric as the vaunted buggy whip manufacturer. They may be the last one left, but for the people that need their products they are the only option in town. DCIG may have eyes on the future, but that plan will be financed by FlexFabric.

The Present: HPE Altoline

Altoline is where HPE was pouring their research for the past year. Altoline is a product line that benefits from the latest in software defined and webscale technologies. It is technology that utilizes OpenSwitch as the operating system. HPE initially developed OpenSwitch as an open, vendor-neutral platform before turning it over to the Linux Foundation this summer to run with development from a variety of different partners.

Dom brought up a couple of great use cases for Altoline during our discussion that struck me as brilliant. One of them was using it as an out-of-band monitoring solution. These switches don’t need to be big or redundant. They need to have ports and a management interface. They don’t need complexity. They need simplicity. That’s where Altoline comes into play. It’s never going to be as complex as FlexFabric or as programmable as Arista. But it doesn’t have to be. In a workshop full of table saw and drill presses, Altoline is a basic screwdriver. It’s a tool you can count on to get the easy jobs done in a pinch.

The Future: Arista

The Arista partnership, according to Dom Wilde, is all about getting ready for the cloud. For those customers that are looking at moving workloads to the cloud or creating a hybrid environment, Arista is the perfect choice. All of Arista’s recent solution sets have been focused on providing high-speed, programmable networking that can integrate a number of development tools. EOS is the most extensible operating system on the market and is a favorite for developers. Positioning Arista at the top of the food chain is a great play for customers that don’t have a huge investment in cloud-ready networking right now.

The question that I keep coming back to is…when does this Arista partnership become an acquisition? There is a significant integration between the two companies. Arista has essentially displaced the top of the line for HPE. How long will it take for Arista to make the partnership more permanent? I can easily foresee HPE making a play for the potential revenues produced by Arista and the help they provide moving things to the cloud.


Tom’s Take

I was the only networking person at HPE Discover this year because the HPE networking story has been simplified quite a bit. On the one hand, you have the campus tied up with Aruba. They have their own story to tell in a different area early next year. On the other hand, you have the simplification of the portfolio with DCIG and the inclusion of the Arista partnership. I think that Altoline is going to find a niche for specific use cases but will never really take off as a separate platform. FlexFabric is in maintenance mode as far as development is concerned. It may get faster, but it isn’t likely to get smarter. Not that it really needs to. FlexFabric will support legacy architecture. The real path forward is Arista and all the flexibility it represents. The question is whether HPE will try to make Arista a business unit before Arista takes off and becomes too expensive to buy.

Disclaimer

I was an invited guest of HPE for HPE Discover London. They paid for my travel and lodging costs as well as covering event transportation and meals. They did not ask for nor were they promised any kind of consideration in the coverage provided here. The opinions and analysis contained in this article represent my thoughts alone.