SD-WAN and Technical Debt

Back during Networking Field Day 22, I was having a fun conversation with Phil Gervasi (@Network_Phil) and Carl Fugate (@CarlFugate) about SD-WAN and innovation. I mentioned that it was fascinating to see how SD-WAN companies kept innovating but that bigger, more established companies that had bought into SD-WAN seemed to be having issues catching up. As our conversation continued I realized that technical debt plays a huge role in startup culture in all factors, not just with SD-WAN. However, SD-WAN is a great example of technical debt to talk about here.

Any Color You Want In Black

Big companies have investments in supply chains. They have products that are designed in a certain way because it’s the least expensive way to develop the project or it involves using technology developed by the company that gives them a competitive advantage. Think about something like the Cisco Nexus 9000-series switches that launched with Cisco ACI. Every one of them came with the Insieme ASIC that was built to accelerate the policy component of ACI. Whether or not you wanted to use ACI or Insieme in your deployment, you were getting the ASIC in the switch.

Policies like this lead to unintentional constraints in development. Think back five years to Cisco’s IWAN solution. It was very much the precursor to SD-WAN. It was a collection of technologies like Performance Routing (PfR), Application Visibility Control (AVC), Policy Based Routing (PBR), and Network Based Application Recognition (NBAR). If that alphabet soup of acronyms makes you break in hives, you’re not alone. Cisco IWAN was a platform very much market by potential and complexity.

Let’s step back and ask ourselves an important question: “Why?” Why was IWAN so complicated? Why was IWAN hard to deploy? Why did IWAN fail to capture a lot of market share and ride the wave that eventually became SD-WAN? Looking back, a lot of the choices that were made that eventually doomed IWAN can come down to existing technical debt. Cisco is a company that makes design decisions based on what they’ve been doing for a while.

I’m sure that the design criteria for IWAN came down to two points:

  1. It needs to run on IOS.
  2. It needs to be an ISR router.

That doesn’t sound like much. But imagine the constraints you run into with just those two limitations. You have a hardware platform that may not be suited for the kind of work you want to do. Maybe you want to take advantage of x86 chipset acceleration. Too bad. You have to run what’s in the ISR. Which means it could be underpowered. Or incapable of doing things like crypto acceleration for VPNs, which is important for building a mesh of encrypted tunnels. Or maybe you need some flexibility to build a better detection platform for applications. Except you have to use IOS. Which uses NBAR. And anything you write to extend NBAR has to run on their platforms going forward. Which means you need to account for every possible permutation of hardware that IOS runs on. Which is problematic at best.

See how technical debt can creep in from the most simplistic of sources? All we wanted to do was build a platform to connect WANs together easily. Now we’re mired in a years-old hardware choice and an aging software platform that can’t help us do what needs to be done. Is it any wonder why IWAN didn’t succeed in the original form? Or why so many people involved with the first generation of SD-WAN startups were involved with IWAN, even if just tangentially?

Debt-Free Development

Now, let’s look at a startup like CloudGenix, who was a presenter at Networking Field Day 22 and was recently acquired by Palo Alto Networks. They started off on a different path when they founded the startup. They knew what they wanted to accomplish. They had a vision for what would later be called SD-WAN. But instead of shoehorning it into an existing platform, they had the freedom to build what they wanted.

No need to keep the ISR platform? Great. That means you can build on x86 hardware to make your software more universally deployable on a variety of boxes. Speaking of boxes, using commercial off-the-shelf (COTS) equipment means you can buy some very small devices to run the software. You don’t need a system designed to use ATM modules or T1 connections. If all you little system is ever going to use is Ethernet there’s no reason to include expansion at all. Maybe USB for something like a 4G/LTE modem. But those USB ports are baked into the board already.

A little side note here that came from Olivier Huynh Van of Gluware. You know the USB capabilities on a Cisco ISR? Yeah, the ISR chipset didn’t support USB natively. And it’s almost impossible to find USB that isn’t baked into an x86 board today. So Cisco had to add it to the ISR in a way that wasn’t 100% spec-supported. It’s essentially emulated in the OS. Which is why not every USB drive works in an ISR. Take that for what’s it’s worth.

Back to CloudGenix. Okay, so you have a platform you can build on. And you can build software that can run on any x86 device with Ethernet ports and USB devices. That means your software doesn’t need to do complicated things. It also means there are a lot of methods already out there for programming network operating systems for x86 hardware, such as Intel’s Data Plane Development Kit (DPDK). However CloudGenix chose to build their OS, they didn’t need to build everything completely from scratch. Even if they chose to do it there are still a ton of resources out there to help them get started. Which means you don’t have to restart your development every time you need to add a feature.

Also, the focus on building the functions you want into an OS you can bend to your needs means you don’t need to rely on other teams to build pieces of it. You can build your own GUI. You can make it look however you want. You can also make it operate in a manner that is easiest for your customer base. You don’t need to include every knob or button or bell and whistle. You can expose or hide functions as you wish. Don’t want customers to have tons of control over VPN creation or certificate authentication? You don’t need to worry about the GUI team exposing it without your permission. Simple and easy.

One other benefit of developing on platforms without technical debt? It’s easy to port your software from physical to virtual. CloudGenix was already successful in porting their software to run from physical hardware to the cloud thanks to CloudBlades. Could you imagine trying to get the original Cisco IWAN running in a cloud package for AWS or Azure? If those hives aren’t going crazy right now I’m sure you must have nerves or steel.


Tom’s Take

Technical debt is no joke. Every decision you make has consequences. And they may not be apparent for this generation of products. People you may never meet may have to live with your decisions as they try to build their vision. Sometimes you can work with those constraints. But more often than not brilliant people are going to jump ship and do it on their own. Not everyone is going to succeed. But for those that have the vision and drive and turn out something that works the rewards are legion. And that’s more than enough to pay off any debts, technical or not.

Magical Mechanics

If you’re a fan of this blog, you’ve probably read my last post about the new SD-WAN magic quadrant that’s been making the rounds and generating discussion. Some people are smiling that this report places Cisco in an area other than leadership in the SD-WAN space. Others are decrying the report as being unfair and contradictory. I wanted to take another look at it given some new information and some additional thoughts on the results.

Fair and Square

The first thing I wanted to do is make sure that I was completely transparent with the way the Gartner Magic Quadrant (MQ) works. I have a very good idea thanks to a conversation with Andrew Lerner (@Fast_Lerner), who is the Research VP of Networking at Gartner. Andrew was nice enough to clarify my understanding of the MQ and accompanying documentation. I’ll quote him here to make sure I don’t get anything wrong:

In an MQ, we assess the overall vendors’ behavior and offering in the market. Product, service/support sales, marketing, innovation, etc. if a vendor has multiple products in a market and sells them regularly to the enterprise, they are part of the MQ assessment. Viable products are not “excluded”.

As you can see from Andrew’s explanation, the MQ takes into account all the aspects of a company. It’s not just a product. It’s the sales, marketing, and other aspects of the company that give the overall score for a company. So how does Gartner figure out how products and services? That’s where their Critical Capabilities documents come into play. They are focused exclusively on products and services. They don’t take marketing or sales or anything else into account.

According to Andrew, when Gartner did their Critical Capabilities document on Cisco, they looked at Meraki MX and IOS-XE only. Viptela vEdge was not examined. So, the CC documents give us the Gartner overview of technology behind the MQ analysis. While the CC documents are focused solely on the Meraki MX and IOS-XD SD-WAN technology, they are components of the overall analysis in the MQ that was published.

What does that all mean in the long run?

Axis and Allies

In order to break this down a bit further, let’s ignore the actual quadrants in the MQ for the moment. Instead, let’s think about this picture as a graph. One axis of the graph is “Ability to Execute,” or in other words can the company do what they say they’re going to do? The other axis is “Completeness of Vision.” This is a question of how the company has rounded out their understanding of the market forces and direction. I want to use this sample MQ with some labels thrown in to help my readers understand how each axis can affect the ranking a company gets:

So, let’s look at where Cisco was situated on those graphs. They are not in the upper right part of the graph, which is the “good” part according to what most people will tell you when they glance at it. Cisco was ranked almost directly in the middle of the graph. Why would that be?

Let’s look at the Execution axis. Why would Cisco have some issues with execution on SD-WAN? Well, the biggest one is probably the shift to IOS-XE and the issues with code quality. Almost everyone involved deploying IOS-XE has told me that Cisco had significant issues in the early releases. Daniel Dib (@DanielDibSwe) had a great conversation with me about the breakdown between the router code and the controller code a week ago. Here’s the first tweet in the chain:

So, there were issues that have been addressed. But is the code completely stable right now? I can’t say, since I haven’t deployed it. But ask around and see what the common wisdom is. I would be genuinely interested to hear how much better things have gotten in the last six months. But, those code quality issues from months ago are going to be a concern in the report. And you can’t guarantee that every box is going to be running on the latest code. Having issues with stable code is going to impact your ability to execute on your vision.

Now, let’s look at the Completeness of Vision axis. If you reference the above picture you’ll see that the Challengers square represents companies that don’t yet understand the market direction. Given that this was the location that Cisco was placed in (barely), let’s examine why that might be. Let’s start by asking “which Cisco product is best for my SD-WAN solution?” Do you know for sure? Which one are you going to be offered?

In my last post, I said that Cisco had been deemphasizing vEdge deployments in favor of IOS-XE. But I completely forgot about Meraki as an additional offering for SMBs and smaller deployments. Where does Meraki make the most sense? And how big does your network need to be before it outgrows a Meraki deployment? Are you chasing a feature? Or maybe you need a service that isn’t offered natively on the Meraki platform? All of these questions need to be answered when you look at what you’re going to do with SD-WAN.

The other companies that Cisco will tell you are their biggest competitors are probably VMware and Silver Peak. How many SD-WAN platforms do they sell? How about companies ranked closer to Cisco in the MQ like Citrix, CloudGenix, or Versa Networks? How many SD-WAN solutions do they offer? How about HPE, Juniper and Aryaka?

In almost every case, the answer is “one”. Each of these companies have settled on a single solution for SD-WAN or SD-Branch. They don’t split those deployments across different product lines. They may have different sized boxes for things, but they all run the same common software. Can you integrate an IOS-XE SD-WAN appliance into a Meraki deployment? Can you take a Meraki MX and make it work with vEdge?

You may be starting to see that the completeness of Cisco’s vision isn’t lacking in SD-WAN but instead how they’re going to accomplish it. Rather than having one solution that can be scaled to fit all needs, Cisco is choosing to offer two different solutions for SMBs and enterprises. And if you count vEdge as a separate product from IOS-XE, as Cisco has suggested in some of their internal reports, then you have three products! I’m not saying that Cisco doesn’t have a vision. But it really looks like that vision is hazier than it should be.

If Cisco had a unified vision with stable code that was integrated up and down the stack I have no doubts they would have been rated higher. When Cisco was deploying Viptela vEdge as the sole SD-WAN offering they had it was much easier to figure out how everything was going to integrate together. But, just like all transitions, the devil is in the details here as Cisco tries to move to IOS-XE. Code quality is going to be a potential source of problems no matter what. But if you are staking your reputation on moving everyone to a single code base from a different more stable one you had better get it right quickly. Otherwise you’re going to get it counted against you.


Tom’s Take

I really appreciate Andrew Lerner for reaching out regarding my analysis of the MQ. I will admit I wasn’t exactly spot on with the differences between the MQ and the Critical Capabilities documents. But the results are close. The CC analyzes Cisco’s big platforms and tells people how they work. The MQ takes everything as a whole and gives Cisco a ranking of where they stand in the market. Sales numbers aside, do you think Cisco should be a leader in the SD-WAN space? Do you feel that where they are today with IOS-XE and Meraki MX is a complete vision? If you do then you likely won’t care about the MQ either way. But for those that have questions about execution and vision, maybe it’s time to figure out what might have caused Cisco to be right in the middle this time around.

SD-WAN Squares and Perplexing Planes

The latest arcane polygon is out in the SD-WAN space. Normally, my fortune telling skills don’t involve geometry. I like to talk to real people about their concerns and their successes. Yes, I know that the gardening people do that too. It’s just that no one really bothers to read their reports and instead makes all their decisions based on boring wall art.

Speaking of which, I’m going to summarize that particular piece of art here. Note this isn’t the same for copyright reasons but close enough for you to get the point:

4D8DB810-3618-44EA-8AA2-99EB7EAA3E45

So, if you can’t tell by the colors here, the big news is that Cisco has slipped out of the top Good part of the polygon and is now in the bottom Bad part (denoted by the red) and is in danger of going out of business and being the laughing stock of the networking community. Well, no, not so much that last part. But their implementation has slipped into the lower part of the quadrant where first-stage startups and cash-strapped companies live and wish they could build something.

Cisco released a report rebutting those claims and it talks about how Viptela is a huge part of their deployments and that they have mindshare and marketshare. But after talking with some colleagues in the networking industry and looking a bit deeper into the issue, I think Cisco has the same problem that Boeing had this year. Their assurances are inconsistent with reality.

A WANderful World

When you deploy Cisco SD-WAN, what exactly are you deploying? In the past, the answer was IWAN. IWAN was very much an initial attempt to create SD-WAN. It wasn’t perfect and it was soon surpassed by other technologies, many of which were built by IWAN team members that left to found their own startups. But Cisco quickly realized they needed to jump in the fray and get some help. That’s when the bought Viptela.

Here’s where things start getting murky. The integration of Viptela SD-WAN has been problematic. The initial sales model after the acquisition was to continue to sell Viptela vEdge to the customer. The plan was then to integrate the software on a Cisco ISR, then to integrate the Viptela software into IOS-XE. They’ve been selling vEdge boxes for a long while and are supporting them still. But the transition plan to IOS-XE is in full effect. The handwriting on the wall is that soon Cisco will only offer IOS-XE SD-WAN for sale, not vEdge.

Flash forward to 2019. A report is released about the impact and forward outlook for SD-WAN. It has a big analyst name attached to it. In that report, they leave out the references to vEdge and instead grade Cisco on their IOS-XE offering which isn’t as mature as vEdge. Or as deployed as vEdge. Or, as stated by even Cisco, as stable as vEdge, at least at first. That means that Cisco is getting graded on their newest offering. Which, for Cisco, means they can’t talk about how widely deployed and stable vEdge is. Why would this particular analyst firm do that?

Same is Same

The common wisdom is that Gartner graded Cisco on the curve that the sales message is that IOS-XE is the way and the future of SD-WAN at Cisco. Why talk about what came before when the new hot thing is going to be IOS-XE? Don’t buy this old vEdge when this new ISR is the best thing ever.

Don’t believe me? Go call your Cisco account team and try to buy a vEdge. I bet you get “upsold” to an ISR. The path forward is to let vEdge fade away. So it only makes sense that you should be grading Cisco on what their plans are, not on what they’ve already sold. I’ll admit that I don’t put together analyst reports with graphics very often, so maybe my thinking is flawed. But if I call your company and they won’t sell me the product that you want me to grade you on, I’m going to guess I shouldn’t grade you on it.

So Cisco is mad that Gartner isn’t grading them on their old solution and is instead looking at their new solution in a vacuum. And since they can’t draw on the sales numbers or the stability of their existing solution that you aren’t going to be able to buy without a fight, they slipped down in the square to a place that doesn’t show them as the 800-lb gorilla. Where have a heard something like this before?

Plane to See

The situation that immediately sprung to mind was the issue with Boeing’s 737-MAX airliner. In a nutshell, Boeing introduced a new airliner with a different engine and configuration that changed its flight profile. Rather than try to get the airframe recertified, which could take months or years, they claimed it was the same as the old one and just updated training manuals. They also didn’t disclose there was a new software program that tried to override the new flight characteristics and that particular “flaw” caused the tragic crashes of two flights full of people.

I’m not trying to compare analyst reports to tragedies in any way. But I do find it curious that companies want their new stuff to be treated just like their old stuff with regards to approvals and regulation and analysis. They know that the new product is going to have issues and concerns but they don’t want you to count those because remember how awesome our old thing was?

Likewise, they don’t want you to count the old problems against them. Like the Pinto or the Edsel from Ford, they don’t want you to think their old issues should matter. Just look at how far we’ve come! But that’s not how these things work. We can’t take the old product into account when trying to figure out how the new one works. We can’t assume the new solution is the same as the old one without testing, no matter how much people would like us to do that. It’s like claiming the Space Shuttle was as good as the Saturn V rocket because it went into space and came from NASA.

If your platform has bugs in the first version, those count against you too. You have to work it all out and realize people are going to remember that instability when they grade your product going forward. Remember that common wisdom says not to install an operating system update until the first service patch. You have to consider that reputation every time you release a patch or an update.


Tom’s Take

The SD-WAN MQ tells me that Cisco isn’t getting any more favors from the analysts. The marketing message not lining up with the install base is the heart of a transition away from hardware and software that Cisco doesn’t own. The hope that they could just smile and shrug their shoulders and hope that no one caught on has been dashed. Instead, Cisco now has to realize their going to have to earn that spot back through good code releases and consistent support and licensing. No one is going to give them any slack with SD-WAN like they have with switches and unified communications. If Cisco thinks that they’re just going to be able to bluff their way through this product transition, that idea just won’t fly.

The End of SD-WAN’s Party In China

As I was listening to Network Break Episode 257 from my friends at Packet Pushers, I heard Greg and Drew talking about a new development in China that could be the end of SD-WAN’s big influence there.

China has a new policy in place, according to Axios, that enforces a stricter cybersecurity stance for companies. Companies doing business in China or with offices in China must now allow Chinese officials to get into their networks to check for security issues as well as verifying the supply chain for network security.

In essence, this is saying that Chinese officials can have access to your networks at any time to check for security threats. But the subtext is a little less clear. Do they get to control the CPE as well? What about security constructs like VPNs? This article seems to indicate that as of January 1, 2020, there will be no intra-company VPNs authorized by any companies in China, whether Chinese or foreign businesses in China.

Tunnel Collapse

I talked with a company doing some SD-WAN rollouts globally in China all the way back in 2018. One of the things that was brought up in that interview was that China was an unknown for American companies because of the likelihood of changing that model in the future. MPLS is the current go-to connectivity for branch offices. However, because you can put an SD-WAN head-end unit there and build an encrypted tunnel back to your overseas HQ it wasn’t a huge deal.

SD-WAN is a wonderful way to ensure your branches are secure by default. Since CPE devices “phone home” and automatically build encrypted tunnels back to a central location, such as an HQ, you can be sure that as soon as the device powers on and establishes global connectivity that all traffic will be secure over your VPN until you change that policy.

Now, what happens with China’s new policy? All traffic must transit outside of a VPN. Things like web traffic aren’t as bad but what about email? Or traffic destined for places like AWS or Azure? It was an unmentioned fact that using SD-WAN VPNs to transit through the content filters in place in China was a way around issues that might arise from accessing resources inside of a very well secured country-wide network.

With the policy change and enforcement guidelines set forth to be enacted in 2020, this could be a very big deal for companies hoping to use SD-WAN in China. First and foremost, you can’t use your intra-company VPN functions any longer. That effectively means that your branch office can’t connect to the HQ or the rest of your corporate network. Given some of the questions around intellectual property issues in China that might not be a bad thing. However, it is going to cause issues for your users trying to access the mail and other support services. Especially if they are hosted somewhere that is going to create additional scrutiny.

The other potential issue is whether or not Chinese officials are even going to allow you to use CPE of your own choosing in the future. If the mandate is that officials should have access to your network for security concerns, who is to say they can’t just dictate what CPE you should use in order to facilitate that access. Larger companies can probably negotiate for come kind of on-site server that does network scanning. But smaller branches are likely going to need to have an all-in-one device at the head end doing all the work. The additional benefit for the Chinese is that control of the head end CPE ensures that you can’t build a site-to-site VPN anywhere.

Peering Into The Future

Greg and Drew pontificate a bit on the future on what this means for organizations from foreign countries doing business in China in the future. I tend to agree with them on a few points. I think you’re going to see a push for Chinese offices of major companies treating them like zero-trust endpoints. All communications will be trading minimal information. Networks won’t be directly connected, either by VPN substitute or otherwise.

Looking further down the road makes the plans even more murky. Is there a way that you can certify yourself to have a standard for cybersecurity? We have something similar with regulations here in the US were we can submit compliance reports for various agencies and submit to audits or have audits performed by third parties. But if the government won’t take that as an answer how do you even go about providing the level of detail they want? If the answer is “you can’t”, then the larger discussion becomes whether or not you can comply with their regulations and reduce your business exposure while still making money in this market. And that’s a conversation no technology can solve.


Tom’s Take

SD-WAN gives us a wonderful set of features included in the package. Things like application inspection are wonderful to look at on a dashboard but I’ve always been a bigger fan of the automatic VPN service. I like knowing that as soon as I turn up my devices they become secure endpoints for all my traffic. Alas, all the technology in the world can be defeated by business or government regulation. If the rules say you can’t have a feature, you either have to play by the rules or quit playing the game. It’s up to businesses to decide how they’ll react going forward. But SD-WAN’s greatest feature may now have to be an unchecked box on that dashboard.

The Confluence of SD-WAN and Microsegmentation

If you had to pick two really hot topics in the networking space right now, you’d be hard-pressed to find two more discussed than SD-WAN and microsegmentation. SD-WAN is the former “king of the hill” in the network engineering. I can remember having more conversations about SD-WAN in the last couple of years than anything else. But as the SD-WAN market has started to consolidate and iterate, a new challenger has arrived. Microsegmentation is the word of the day.

However, I think that SD-WAN and microsegmentation are quickly heading toward a merger of ideas and solutions. There are a lot of commonalities between the two technologies that make a lot of sense running together.

SD-WAN isn’t just about packet switching and routing any longer. That’s because networking people have quickly learned that packet-by-packet processing of traffic is inefficient. All of our older network analysis devices could only see things one IP packet at a time. But the new wave of devices think in terms of flows. They can analyze a stream of packets to figure out what’s going on. And what generates those flows?

Applications.

The key to the new wave of SD-WAN technology isn’t some kind of magic method of nailing up VPNs between branch offices. It’s not about adding new connectivity types. Instead, it’s about application identification. App identification is how SD-WAN does QoS now. The move to using app markers means a more holistic method of treating application traffic properly.

SD-WAN has significant value in application handling. I recently chatted with Kumar Ramachandran of CloudGenix and he echoed that part of the reason why they’ve been seeing growth and recently received a Series C funding round was because of what they’re doing with applications. The battle of MPLS versus broadband has already been fought. The value isn’t going to come from edge boxes unless there is software that can help differentiate the solutions.

Segmenting Your Traffic

So, what does this have to do with microsegmentation? If you’ve been following that market, you already know that the answer is the application. Microsegmentation doesn’t work on a packet-by-packet basis either. It needs to see all the traffic flows from an application to figure out what is needed and what isn’t. Platforms that do this kind of work are big on figuring out which protocols should be talking to which hosts and shutting everything else down to secure that communication.

Microsegmentation is growing in the cloud world for sure. I’ve seen and talked to people from companies like Guardicore, Illumio, ShieldX, and Edgewise in recent months. Each of them has a slightly different approach to doing microsegmentation. But they all look at the same basic approach form the start. The application is the basic building block of their technology.

With the growth of microsegmentation in the cloud market to help ensure traffic flows between hosts and sites is secured, it’s a no-brainer that the next big SD-WAN platform needs to add this functionality to their solution. I say this because it’s not that big of a leap to take the existing SD-WAN application analytics software that optimizes traffic flows over links and change it to restrict traffic flow with policy support.

For SD-WAN vendors, it’s another hedge against the inexorable march of traffic into the cloud. There are only so many Direct Connect analogs that you can build before Amazon decides to put you out of business. But, if you can integrate the security aspect of application analytics into your platform you can make your solution very sticky. Because that functionality is critical to meeting audit goals and ensuring compliance. And you’re going to wish you had it when the auditors come calling.


Tom’s Take

I don’t think the current generation of SD-WAN providers are quite ready to implement microsegmentation in their platforms. But I really wouldn’t be surprised to see it in the next revision of solutions. I also wonder if that means that some of the companies that have already purchased SD-WAN companies are going to look at that functionality. Perhaps it will be VMware building NSX microsegmentaiton on top of VeloCloud. Or maybe Cisco will include some of their microsegmentation from ACI in Viptela. They’re going to need to look at that strongly because once companies that are still on their own figure it out they’re going to be the go-to solution for companies looking to provide a good, secure migration path to the cloud. And all those roads lead to an SD-WAN device with microsegmentation capabilities.

QoS Is Dead. Long Live QoS!

Ah, good old Quality of Service. How often have we spent our time as networking professionals trying to discern the archaic texts of Szigeti to learn how to make you work? QoS is something that seemed so necessary to our networks years ago that we would spend hours upon hours trying to learn the best way to implement it for voice or bulk data traffic or some other reason. That was, until a funny thing happened. Until QoS was useless to us.

Rest In Peace and Queues

QoS didn’t die overnight. It didn’t wake up one morning without a home to go to. Instead, we slowly devalued and destroyed it over a period of years. We did it be focusing on the things that QoS was made for and then marginalizing them. Remember voice traffic?

We spent years installing voice over IP (VoIP) systems in our networks. And each of those systems needed QoS to function. We took our expertise in the arcane arts of queuing and applied it to the most finicky protocols we could find. And it worked. Our mystic knowledge made voice better! Our calls wouldn’t drop. Our packets arrived when they should. And the world was a happy place.

That is, until voice became pointless. When people started using mobile devices more and more instead of their desk phones, QoS wasn’t as important. When the steady generation of delay-sensitive packets instead moved back to LTE instead of IP it wasn’t as critical to ensure that FTP and other protocols in the LAN interfered with it. Even when people started using QoS on their mobile devices the marking was totally inconsistent. George Stefanick (@WirelesssGuru) found that Wi-Fi calling was doing some weird packet marking anyway:

So, without a huge packet generation issue, QoS was relegated to some weird traffic shaping roles. Maybe it was video prioritization in places where people cared about video? Or perhaps it was creating a scavenger class for traffic in order to get rid of unwanted applications like BitTorrent. But overall QoS languished as an oddity as more and more enterprises saw their collaboration traffic moving to be dominated by mobile devices that didn’t need the old dark magic of QoS.

QoupS de Gras

The real end of QoS came about thanks to the cloud. While we spent all of our time trying to find ways to optimize applications running on our local enterprise networks, developers were busy optimizing applications to run somewhere else. The ideas were sound enough in principle. By moving applications to the cloud we could continually improve them and push features faster. By having all the bit off the local network we could scale massively. We could even collaborate together in real time from anywhere in the world!

But applications that live in the cloud live outside our control. QoS was always bounded by the borders of our own networks. Once a packet was launched into the great beyond of the Internet we couldn’t control what happened to it. ISPs weren’t bound to honor our packet markings without an SLA. In fact, in most cases the ISP would remark all our packets anyway just to ensure they didn’t mess with the ISP’s ideas of traffic shaping. And even those were rudimentary at best given how well QoS plays with MPLS in the real world.

But cloud-based applications don’t worry about quality of service. They scale as large as you want. And nothing short of a massive cloud outage will make them unavailable. Sure, there may be some slowness here and there but that’s nothing less than you’d expect to receive running a heavy application over your local LAN. The real genius of the cloud shift is that it forced developers to slim down applications and make them more responsive in places where they could be made to be more interactive. Now, applications felt snappier when they ran in remote locations. And if you’ve every tried to use old versions of Outlook across slow links you now how critical that responsiveness can be.

The End is The Beginning

So, with cloud-based applications here to stay and collaboration all about mobile apps now, we can finally carve the tombstone for QoS right? Well, not quite.

As it turns out, we are still using lots and lots of QoS today in SD-WAN networks. We’re just not calling it that. Instead, we’ve upgraded the term to something more snappy, like “Application Visibility”. Under the hood, it’s not much different than the QoS that we’ve done for years. We’re still picking out the applications and figuring out how to optimize their traffic patterns to make them more responsive.

The key with the new wave of SD-WAN is that we’re marrying QoS to conditional routing. Now, instead of being at the mercy of the ISP link to the Internet we can do something else. We can push bulk traffic across slow cheap links and ensure that our critical business applications have all the space they want on the fast expensive ones instead. We can push our out-of-band traffic out of an attached 4G/LTE modem. We can even push our traffic across the Internet to a gateway closer to the SaaS provider with better performance. That last bit is an especially delicious piece of irony, since it basically serves the same purpose as Tail-end Hop Off did back in the voice days.

And how does all this magical new QoS work on the Internet outside our control? That’s the real magic. It’s all tunnels! Yes, in order to make sure that we get our traffic where it needs to be in SD-WAN we simply prioritize it going out of the router and wrap it all in a tunnel to the next device. Everything moves along the Internet and the hop-by-hop treatment really doesn’t care in the long run. We’re instead optimizing transit through our network based on other factors besides DSCP markings. Sure, when the traffic arrives on the other side it can be optimized based on those values. However, in the real world the only thing that most users really care about is how fast they can get their application to perform on their local machine. And if SD-WAN can point them to the fastest SaaS gateway, they’ll be happy people.


Tom’s Take

QoS suffered the same fate as Ska music and NCIS. It never really went away even when people stopped caring about it as much as they did when it was the hot new thing on the block. Instead, the need for QoS disappeared when our traffic usage moved away from the usage it was designed to augment. Sure, SD-WAN has brought it back in a new form, QoS 2.0 if you will, but the need for what we used to spend hours of time doing with ancient tomes on knowledge is long gone. We should have a quiet service for QoS and acknowledge all that it has done for us. And then get ready to invite it back to the party in the form that it will take in the cloud future of tomorrow.

Are We Seeing SD-WAN Washing?

You may have seen a tweet from me last week referencing a news story that Fortinet was now in the SD-WAN market:

It came as a shock to me because Fortinet wasn’t even on my radar as an SD-WAN vendor. I knew they were doing brisk business in the firewall and security space, but SD-WAN? What does it really mean?

SD Boxes

Fortinet’s claim to be a player in the SD-WAN space brings the number of vendors doing SD-WAN to well over 50. That’s a lot of players. But how did the come out of left field to land a deal rumored to be over a million dollars for a space that they weren’t even really playing in six months ago?

Fortinet makes edge firewalls. They make decent edge firewalls. When I used to work for a VAR we used them quite a bit. We even used their smaller units as remote appliances to allow us to connect to remote networks and do managed maintenance services. At no time during that whole engagement did I ever consider them to be anything other than a firewall.

Fast forward to 2018. Fortinet is still selling firewalls. Their website still focuses on security as the primary driver for their lines of business. They do talk about SD-WAN and have a section for it with links to whitepapers going all the way back to May. They even have a contributed article for SDxCentral back and February. However, going back that far the article reads more like a security company that is saying their secure endpoints could be considered SD-WAN.

This reminds me of stories of Oracle counting database licenses as cloud licenses so they could claim to be the fourth largest cloud provider. Or if a company suddenly decided that every box they sold counted as an IPS because it had a function that could be enabled for a fee. The numbers look great when you start counting them creatively but they’re almost always a bit of a fib.

Part Time Job

Imagine if Cisco suddenly decided to start counting ASA firewalls as container engines because of a software update that allowed you to run Kubernetes on the box. People would lose their minds. Because no one buys an ASA to run containers. So for a company like Cisco to count them as part of a container deployment would be absurd.

The same can be said for any company that has a line of business that is focused on one specific area and then suddenly decides that the same line of business can be double-counted for a new emerging market. It may very well be the case that Fortinet has a huge deployment of SD-WAN devices that customers are very happy with. But if those edge devices were originally sold as firewalls or UTM devices that just so happened to be able to run SD-WAN software, it shouldn’t really count should it? If a customer thought they were buying a firewall they wouldn’t really believe it was actually an SD-WAN router.

The problem with this math is that everything gets inflated. Maybe those SD-WAN edge devices are dedicated. But, if they run Fortinet’s security suite are also being counting in the UTM numbers? Is Cisco going to start counting every ISR sold in the last five years as a Viptela deployment after the news this week that Viptela software can run on all of them? Where exactly are we going to draw the line? Is it fair to say that every x86 chip sold in the last 10 years should count for a VMware license because you could conceivably run a hypervisor on them? It sounds ridiculous when you put it like that, but only because of the timelines involved. Some crazier ideas have been put forward in the past.

The only way that this whole thing really works is if the devices are dedicated to their function and are only counted for the purpose they were installed and configured for. You shouldn’t get to add a UTM firewall to both the security side and the SD-WAN side. Cisco routers should only count as traditional layer 3 or SD-WAN, not both. If you try to push the envelope to put up big numbers designed to wow potential customers and get a seat at the big table, you need to be ready to defend your reporting of those numbers when people ask tough questions about the math behind those numbers.


Tom’s Take

If you had told me last year that Fortinet would sell a million dollars worth of SD-WAN in one deal, I’d ask you who they bought to get that expertise. Today, it appears they are content with saying their UTM boxes with a central controller count as SD-WAN. I’d love to put them up against Viptela or VeloCloud or even CloudGenix and see what kind of advanced feature sets they produce. If it’s merely a WAN aggregation box with some central control and a security suite I don’t think it’s fair to call it true SD-WAN. Just a rinse and repeat of some washed up marketing ideas.