Sorting Through SD-WAN


SD-WAN has finally arrived. We’re not longer talking about it in terms of whether or not it is a thing that’s going to happen, but a thing that will happen provided the budgets are right. But while the concept of SD-WAN is certain, one must start to wonder about what’s going to happen to the providers of SD-WAN services.

Any Which Way You Can

I’ve written a lot about SDN and SD-WAN. SD-WAN is the best example of how SDN should be marketed to people. Instead of talking about features like APIs, orchestration, and programmability, you need to focus on the right hook. Do you see a food processor by talking about how many attachments it has? Or do you sell a Swiss Army knife by talking about all the crazy screwdrivers it holds? Or do you simply boil it down to “This thing makes your life easier”?

The most successful companies have made the “easier” pitch the way forward. Throwing a kitchen sink at people doesn’t make them buy a whole kitchen. But showing them how easy and automated you can make installation and management will sell boxes by the truckload. You have to appeal the opposite nature that SD-WAN was created to solve. WANs are hard, SD-WANs make them easy.

But that only works if your SD-WAN solution is easy in the first place. The biggest, most obvious target is Cisco IWAN. I will be the first to argue that the reason that Cisco hasn’t captured the SD-WAN market is because IWAN isn’t SD-WAN. It’s a series of existing technologies that were brought together to try and make and SD-WAN competitor. IWAN has all the technical credibility of a laboratory full of parts of amazing machines. What it lacks is any kind of ability to tie all that together easily.

IWAN is a moving target. Which platform should I use? Do I need this software to make it run correctly? How do I do zero-touch deployments? Or traffic control? How do I plug a 4G/LTE modem into the router? The answers to each of these questions involves typing commands or buying additional software features. That’s not the way to attack the complexity of WANs. In fact, it feeds into that complexity even more.

Cisco needs to look at a true SD-WAN technology. That likely means acquisition. Sure, it’s going to be a huge pain to integrate an acquisition with other components like APIC-EM, but given the lead that other competitors have right now, it’s time for Cisco to come up with a solution that knocks the socks off their longtime customers. Or face the very real possibility of not having longtime customers any longer.

Every Which Way But Loose

The first-generation providers of SD-WAN bounced onto the scene to pick up the pieces from IWAN. Names like Viptela, VeloCloud, CloudGenix, Versa Networks, and more. But, aside from all managing to build roughly the same platform with very similar features, they’ve hit a might big wall. They need to start making money in order for these gambles to pay off. Some have customers. Others are managing the migration into other services, like catering their offerings toward service providers. Still others are ripe acquisition targets for companies that lack an SD-WAN strategy, like HPE or Dell. I expect to see some fallout from the first generation providers consolidating this year.

The second generation providers, like Riverbed and Silver Peak, all have something in common. They are building on a business they’ve already proven. It’s no coincidence that both Riverbed and Silver Peak are the most well-known names in WAN optimization. How well known? Even major Cisco partners will argue that they sell these two “best of breed” offerings over Cisco’s own WAAS solution. Riverbed and Silver Peak have a definite advantage because they have a lot of existing customers that rely on WAN optimization. That market alone is going to net them a significant number of customers over the next few years. They can easily sell SD-WAN as the perfect addition to make WAN optimization even easier.

The third category of SD-WAN providers is the late comers. I still can’t believe it, but I’ve been reading about providers that aren’t traditional companies trying to get into the space. Talk about being the ninth horse in an eight horse race. Honestly, at this point you’re better off plowing your investment money into something else, like Internet of Things or Virtual Reality. There’s precious little room among the existing first generation providers and the second generation stalwarts. At best, all you can hope for is a quick exit. At worst, your “novel” technology will be snapped up for pennies after you’re bankrupt and liquidating everything but the standing desks.

Tom’s Take

Why am I excited about the arrival of SD-WAN? Because now I can finally stop talking about it! In all seriousness, when the boardroom starts talking about things that means it’s past the point of being a hobby project and now has become a real debate. SD-WAN is going to change one of the most irritating aspects of networking technology for us. I can remember trying to study for my CCNP and cramming all the DSL and T1 knowledge a person could fit into a brain in my head. Now, it’s all point-and-click and done. IPSec VPNs, traffic analytics, and application identification are so easy it’s scary. That’s the power of SD-WAN to me. Easy to use and easy to extend. I think that the landscape of providers of SD-WAN technologies is going to look vastly different by the end of 2017. But SD-WAN is going to be here for the long haul.

Is The Rise Of SD-WAN Thanks To Ethernet?


SD-WAN has exploded in the market. Everywhere I turn, I see companies touting their new strategy for reducing WAN complexity, encrypting data in flight, and even doing analytics on traffic to help build QoS policies and traffic shaping for critical links. The first demo I ever watched for SDN was a WAN routing demo that chose best paths based on cost and time-of-day. It was simple then, but that kind of thinking has exploded in the last 5 years. And it’s all thanks to our lovable old friend, Ethernet.

Those Old Serials

When I started in networking, my knowledge was pretty limited to switches and other layer 2 devices. I plugged in the cables, and the things all worked. As I expanded up the OSI model, I started understanding how routers worked. I knew about moving packets between different layer 3 areas and how they controlled broadcast storms. This was also around the time when layer 3 switching was becoming a big thing in the campus. How was I supposed to figure out the difference between when I should be using a big router with 2-3 interfaces versus a switch that had lots of interfaces and could route just as well?

The key for me was media types. Layer 3 switching worked very well as long as you were only connecting Ethernet cables to the device. Switches were purpose built for UTP cable connectivity. That works really well for campus networks with Cat 5/5e/6 cabling. Switched Virtual Interfaces (SVIs) can handle a large amount of the routing traffic.

For WAN connectivity, routers were a must. Because only routers were modular in a way that accepted cards for different media types. When I started my journey on WAN connectivity, I was setting up T1 lines. Sometimes they had an old-fashioned serial connector like this:


Those connected to external CSU/DSU modules. Those were a pain to configure and had multiple points of failure. Eventually, we moved up in the world to integrated CSU/DSU modules that looked like this:


Those are really awesome because all the configuration is done on the interface. They also take regular UTP cables instead of those crazy V.35 monsters.


But those UTP cables weren’t Ethernet. Those were still designed to be used as serial connections.

It wasn’t until the rise of MPLS circuits and Transparent LAN services that Ethernet became the dominant force in WAN connectivity. I can still remember turning up my first managed circuit and thinking, “You mean I can use both FastEthernet interfaces? No cards? Wow!”.

Today, Ethernet dominates the landscape of connectivity. Serial WAN interfaces are relegated to backwater areas where you can’t get “real WAN connectivity”. And in most of those cases, the desire to use an old, slow serial circuit can be superseded by a 4G/LTE USB modem that can be purchased from almost any carrier. It would appear that serial has joined the same Heap of History as token ring, ARCnet, and other venerable connectivity options.

Rise, Ethernet

The ubiquity of Ethernet is a huge boon to SD-WAN vendors. They no longer have to create custom connectivity options for their appliances. They can provide 3-4 Ethernet interfaces and 2-3 USB slots and cover a wide range of options. This also allows them to simplify their board designs. No more modular chassis. No crazy requirements for WIC slots, NM slots, or any other crazy terminology that Cisco WAN engineers are all too familiar with.

Ethernet makes sense for SD-WAN vendors because they aren’t concerned with media types. All their intelligence resides in the software running on the box. They’d rather focus on creating automatic certificate-based IPsec VPNs than figuring out the clock rate on a T1 line. Hardware is not their end goal. It is much easier to order a reference board from Intel and plug it into a box than trying to configure a serial connector and make a custom integration.

Even SD-WAN vendors that are chasing after the service provider market are benefitting from Ethernet ubiquity. Service providers may still run serial connections in their networks, but management of those interfaces at the customer side is a huge pain. They require specialized technical abilities. It’s expensive to manage and difficult to troubleshoot remotely. Putting Ethernet handoffs at the CPE side makes life much easier. In addition, making those handoffs Ethernet makes it much easier to offer in-line service appliances, like those of SD-WAN vendors. It’s a good choice all around.

Serial connectivity isn’t going away any time soon. It fills an important purpose for high-speed connectivity where fiber isn’t an option. It’s also still a huge part of the install base for circuits, especially in rural areas or places where new WAN circuits aren’t easily run. Traditional routers with modular interfaces are still going to service a large number of customers. But Ethernet connectivity is quickly growing to levels where it will eclipse these legacy serial circuits soon. And the advantage for SD-WAN vendors can only grow with it.

Tom’s Take

Ethernet isn’t the only reason SD-WAN has succeeded. Ease of use, huge feature set, and flexibility are the real reasons when SD-WAN has moved past the concept stage and into deployment. WAN optimization now has SD-WAN components. Service providers are looking to offer it as a value added service. SD-WAN has won out on the merits of the technology. But the underlying hardware and connectivity was radically simplified in the last 5-7 years to allow SD-WAN architects and designers to focus on the software side of things instead of the difficulties of building complicated serial interfaces. SD-WAN may not owe it’s entire existence to Ethernet, but it got a huge push in the right direction for sure.

Cloud Apps And Pathways


Applications are king. Forget all the things you do to ensure proper routing in your data center. Forget the tweaks for OSPF sub-second failover or BGP optimal path selection. None of it matters to your users. If their login to Seibel or Salesforce or Netflix is slow today, you’ve failed. They are very vocal when it comes to telling you how much the network sucks today. How do we fix this?

Pathways Aren’t Perfect

The first problem is the cloud focus of applications. Once our packets leave our border routers it’s a giant game of chance as to how things are going to work next. The routing protocol games that govern the Internet are tried and true and straight out of RFC 1771(Yes, RFC 4271 supersedes it). BGP is a great tool with general purpose abilities. It’s becoming the choice for web scale applications like LinkedIn and Facebook. But it’s problematic for Internet routing. It scales well but doesn’t have the ability to make rapid decisions.

The stability of BGP is also the reason why it doesn’t react well to changes. In the old days, links could go up and down quickly. BGP was designed to avoid issues with link flaps. But today’s links are less likely to flap and more likely to need traffic moved around because of congestion or other factors. The pace that applications need to move traffic flows means that they tend to fight BGP instead of being relieved that it’s not slinging their traffic across different links.

BGP can be a good suggestion of path variables. That’s how Facebook uses it for global routing. But decisions need to be made on top of BGP much faster. That’s why cloud providers don’t rely on it beyond basic connectivity. Things like load balancers and other devices make up for this as best they can, but they are also points of failure in the network and have scalability limitations. So what can we do? How can we build something that can figure out how to make applications run better without the need to replace the entire routing infrastructure of the Internet?

GPS For Routing

One of the things that has some potential for fixing inefficiency with BGP and other basic routing protocols was highlighted during Networking Field Day 12 during the presentation from Teridion. They have a method for creating more efficiency between endpoints thanks to their agents. Founder Elad Rave explains more here:

I like the idea of getting “traffic conditions” from endpoints to avoid congestion. For users of cloud applications, those conditions are largely unknown. Even multipath routing confuses tried-and-true troubleshooting like traceroute. What needs to happen is a way to collect the data for congestion and other inputs and make faster decisions that aren’t beholden to the underlying routing structure.

Overlay networking has tried to do this for a while now. Build something that can take more than basic input and make decisions on that data. But overlays have issues with scaling, especially past the boundary of the enterprise network. Teridion has potential to help influence routing decisions in networks outside your control. Sadly, even the fastest enterprise network in the world is only as fast as an overloaded link between two level 3 interconnects on the way to a cloud application.

Teridion has the right idea here. Alternate pathways need to be identified and utilized. But that data needs to be evaluated and updated regularly. Much like the issues with Waze dumping traffic into residential neighborhoods when major arteries get congested, traffic monitors could cause overloads on alternate links if shifts happen unexpectedly.

The other reason why I like Teridion is because they are doing things without hardware boxes or the need to install software anywhere but the end host. Anyone working with cloud-based applications knows that the provider is very unlikely to provide anything outside of their standard offerings for you. And even if they manage, there is going to be a huge price tag. More often than not, that feature request will become a selling point for a new service in time that may be of marginal benefit until everyone starts using it. Then application performance goes down again. Since Teridion is optimizing communications between hosts it’s a win for everyone.

Tom’s Take

I think Teridion is on to something here. Crowdsourcing is the best way to gather information about traffic. Giving packets a better destination with shorter travel times means better application performance. Better performance means happier users. Happier users means more time spent solving other problems that have symptoms that aren’t “It’s slow” or “Your network sucks”. And that makes everyone happier. Even grumpy old network engineers.


Teridion was a presenter during Networking Field Day 12 in San Francisco, CA. As a participant in Networking Field Day 12, my travel and lodging expenses were covered by Tech Field Day for the duration of the event. Teridion did not ask for nor where they promised any kind of consideration in the writing of this post. My conclusions here represent my thoughts and opinions about them and are mine and mine alone.


BGP: The Application Networking Dream


There was an interesting article last week from Fastly talking about using BGP to scale their network. This was but the latest in a long line of discussions around using BGP as a transport protocol between areas of the data center, even down to the Top-of-Rack (ToR) switch level. LinkedIn made a huge splash with it a few months ago with their Project Altair solution. Now it seems company after company is racing to implement BGP as the solution to their transport woes. And all because developers have finally pulled their heads out of the sand.

BGP Under Every Rock And Tree

BGP is a very scalable protocol. It’s used the world over to exchange routes and keep the Internet running smoothly. But it has other power as well. It can be extended to operate in other ways beyond the original specification. Unlike rigid protocols like RIP or OSPF, BGP was designed in part to be extended and expanded as needs changes. IS-IS is a very similar protocol in that respect. It can be upgraded and adjusted to work with both old and new systems at the same time. Both can be extended without the need to change protocol versions midstream or introduce segmented systems that would run like ships in the night.

This isn’t the first time that someone has talked about running BGP to the ToR switch either. Facebook mentioned in this video almost three years ago. Back then they were solving some interesting issues in their own data center. Now, those changes from the hyperscale world are filtering into the real world. Networking teams are seeking to solve scaling issues without resorting to overlay networks or other types of workarounds. The desire to fix everything wrong with layer 2 has led to a revelation of sorts. The real reason why BGP is able to work so well as a replacement for layer 2 isn’t because we’ve solved some mystical networking conundrum. It’s because we finally figured out how to build applications that don’t break because of the network.

Apps As Far As The Eye Can See

The whole reason when layer 2 networks are the primary unit of data center measurement has absolutely nothing to do with VMware. VMware vMotion behaves the way that it does because legacy applications hate having their addresses changed during communications. Most networking professionals know that MAC addresses have a tenuous association to IP addresses, which is what allows the gratuitous ARP after a vMotion to work so well. But when you try to move an application across a layer 3 boundary, it never ends well.

When web scale companies started building their application stacks, they quickly realized that being pinned to a particular IP address was a recipe for disaster. Even typical DNS-based load balancing only seeks to distribute requests to a series of IP addresses behind some kind of application delivery controller. With legacy apps, you can’t load balance once a particular host has resolved a DNS name to an IP address. Once the gateway of the data center resolves that IP address to a MAC address, you’re pinned to that device until something upsets the balance.

Web scale apps like those built by Netflix or Facebook don’t operate by these rules. They have been built to be resilient from inception. Web scale apps don’t wait for next hop resolution protocols (NHRP) or kludgy load balancing mechanisms to fix their problems. They are built to do that themselves. When problems occur, the applications look around and find a way to reroute traffic. No crazy ARP tricks. No sly DNS. Just software taking care of itself.

The implications for network protocols are legion. If a web scale application can survive a layer 3 communications issue then we are no longer required to keep the entire data center as a layer 2 construct. If things like anycast can be used to pin geolocations closer to content that means we don’t need to worry about large failover domains. Just like Ivan Pepelnjak (@IOSHints) says in this post, you can build layer 3 failure domains that just work better.

BGP can work as your ToR strategy for route learning and path selection because you aren’t limited to forcing applications to communicate at layer 2. And other protocols that were created to fix limitations in layer 2, like TRILL or VXLAN, become an afterthought. Now, applications can talk to each other and fail back and forth as they need to without the need to worry about layer 2 doing anything other than what it was designed to do: link endpoints to devices designed to get traffic off the local network and into the wider world.

Tom’s Take

One of the things that SDN has promised us is a better way to network. I believe that the promise of making things better and easier is a noble goal. But the part that has bothered me since the beginning was that we’re still trying to solve everyone’s problems with the network. We don’t rearrange the power grid every time someone builds a better electrical device. We don’t replumb the house overtime we install a new sink. We find a way to make the new thing work with our old system.

That’s why the promise of using BGP as a ToR protocol is so exciting. It has very little to do with networking as we know it. Instead of trying to work miracles in the underlay, we build the best network we know how to build. And we let the developers and programmers do the rest.

Automating Change With Help From Fibonacci


A few recent conversations that I’ve seen and had with professionals about automation have been very enlightening. It all started with a post on StackExchange about an unsuspecting user that tried to automate a cleanup process with Ansible and accidentally erased the entire server farm at a service provider. The post was later determined to be a viral marketing hoax but was quite believable to the community because of the power of automation to make bad ideas spread very quickly.

Better The Devil You Know

Everyone in networking has been in a place where they’ve typed in something they shouldn’t have. Whether you removed the management network you were using to access the switch or created an access list that denied packets that locked you out of something. Or perhaps you typed an errant debug command that forced you to drive an hour to reboot a switch that was no longer responding. All of these things seem to happen to people as part of the learning process.

But how many times have we typed something in to create a change and found that it broke more than we expected? Like changing a native VLAN on a trunk and bringing down a link we didn’t intend to affect? These unforeseen accidents are the kinds of problems that can easily be magnified by scripts or automation.

I wrote a post about people blaming tools for SolarWinds a couple of months ago. In it, there was a story about a person that uploaded the wrong switch firmware to a server and used an automated tool to kick off an upgrade of his entire network. Only after the first switch failed to return to normal did he realize that he had downloaded an incorrect firmware to the server. And the command he used to kick off the upgrade was not the safe version of the command that checks for compatibility. Instead, it was the quick version of the command that copied the firmware directly into flash and rebooted the switch without confirmation.

While people are quick to blame tools for making mistakes race through the network quickly it should also be realized that those issues would be mistakes no matter what. Just because a system is capable of being automated doesn’t mean that your commands are exempt from being checked and rechecked. Too often a typo or an added word somewhere in the mix causes unintended chaos because we didn’t take the time to make sure there were no problems ahead of time.

Fibbonaci Style

I’ve always tried to do testing and regression in a controlled manner. For some places that have simulators and test networks to try out changes this method might still work. But for those of us that tend to fly by the seat of our pants on production devices, it’s best to artificially limit the damage before it becomes too great. Note that this method works with automation systems too provided you are controlling the logic behind it.

When you go out to test a network-wide change or perform an upgrade, pick one device as your guinea pig. It shouldn’t be something pushing massive production traffic like a core switch. Something isolated in the corner of the network usually works just fine. Test your change there outside production hours and with a fully documented blackout plan. Once you’ve implemented it, don’t touch anything else for at least a day. This gives the routing tables time to settle down completely and all of the aging timers a chance to expire and tables to recompile. Only then can you be sure that you’re not dealing with cached information.

Now that you’ve done it once, you’re ready to make it live to 10,000 devices, right? Abosolutely not. Now that you’ve proven that the change doesn’t cause your system to implode and take the network with it, you pick another single device and do it again. This time, you either pick a neighbor device to the first one or something on the other side of the network. The other side of the network ensures that changes don’t ripple across between devices over the 24-hour watch period. On the other hand, if the change involves direct connectivity changes between two devices you should test them to be sure that the links stay up. Much easier to recover one failed device than 40 or 400.

Once you follow the same procedure with the second upgrade and get clean results, it’s time to move to doing two devices at once. If you have a fancy automation system like Ansible or Puppet this is where you will be determining that the system is capable of handling two devices at once. If you’re still using scripts, this is where you make sure you’re pasting the right info into each window. Some networks don’t like two devices changing information at the same time. Your routing table shouldn’t be so unstable that a change like this would cause problems but you never know. You will know when you’re done with this.

Now that you’ve proven that you can make changes without cratering a switch, a link, or the entire network all at once, you can continue. Move on to three devices, then five, then eight. You’ll notice that these rollout plans are following the Fibonacci Sequence. This is no accident. Just like the appearance of these numbers in nature and math, having a Fibonacci rollout plan helps you evaluate changes and rollback problems before they grow out of hand. Just because you have the power to change the entire network at once doesn’t mean that you should.

Tom’s Take

Automation isn’t the bad guy here. We are. We are fallible creatures that make mistakes. Before, those mistakes were limited to how fast we could log into a switch. Today, computers allow us to make those mistakes much faster on a larger scale. The long-term solution to the problem is to test every change ahead of time completely and hope that your test catches all the problems. If it doesn’t then you had better hope your blackout plan works. But by introducing a rolling system similar to the Fibonacci sequence above. I think you’ll find that mistakes will be caught sooner and problems will be rectified before they are amplified. And if nothing else, you’ll have lots of time to explain to your junior admin all about the wonders of Fibonacci in nature.

This WAN Is Your WAN, This WAN Is My WAN

Straw Bales on Hill Landscape, Tuscany, Italy

Straw Bales on Hill Landscape, Tuscany, Italy

Ideas coalesce all the time in every vertical. You don’t really notice it until you wake up one day and suddenly everything around you looks identical. Wireless becoming the new access layer. Flash storage taking hold of the high end performance crown. And in networking we have the dominance of all things software defined. One recent development has coming along much faster than anyone could have predicted: Software Defined Wide Area Networking (SD-WAN).

Automatic For The People

SD-WAN is a force in modern networking because people want simplicity. While Ivan does a great job of decoupling marketing from reality, people still believe that SD-WAN is the silver bullet that will fix all of their WAN woes. Even during the original discussions of SD-WAN technology at conferences like ONUG, the overriding idea wasn’t around tying sites together or driving down costs to the point of feasibility. It was all about making life easier.

How does SD-WAN manage to accomplish this? It’s all black box networking. Just like the fuel injector in your car. There’s no crying about interoperability or standards-based protocols. You just plug things in and it all works, even if you can’t exactly plug one vendor solution into a competitor. Lock in wins again.

The ideas behind SD-WAN aren’t exactly new. Cisco talked about SD-WAN quite a bit at Networking Field Day 10. Here’s Jeff Reed on it:

The rest of the two hour session details how Cisco is using their Intelligent WAN (IWAN) product to drive SD-WAN. The names of the components all sound very familiar to networkers: DMVPN, NBAR, PfR, and so on. That’s because SD-WAN uses a lot of tried-and-true techniques to tie the concept together. There’s nothing earth-shattering about SD-WAN under the hood. In fact, a fair number of people that work at the “pioneering” SD-WAN startups all seem to have their roots in one or more traditional networking companies.

Fables of Reconstruction

Look at the other presenters at Networking Field Day 10. Two of them announced SD-WAN solutions even though they aren’t really known for expertise in SD-WAN. One of them wasn’t even known as a branch office acceleration solution. So why the SD-WAN land rush all of the sudden? What’s behind the need to have a solution?

You probably wouldn’t be surprised to learn that a lot of investors are backing expansion into SD-WAN technologies. It’s a hot property. But why? As above, customers aren’t interested in the technical wizardry that goes into SD-WAN. They aren’t clamoring for it to supplant their current WAN solution and offer a Rosetta Stone of inter-vendor WAN cooperation. What’s behind the push?

It probably goes something like this:

  1. Technologist needs to implement WAN architecture. Is dismayed that things are so difficult.
  2. Technologist starts searching for solutions about WAN. They probably start asking friends about it.
  3. Analyst firm hears that technologists are asking about WAN solutions. Releases a questionnaire asking which technologies you’d like to learn more about.
  4. Responses to questionnaires are loaded into a graph or report that people buy because they don’t know who to talk to.
  5. Companies realize customers want WAN solutions. They break their necks to offer those solutions to keep up with demand.
  6. Investors see companies beginning to offer WAN solutions and think there’s a huge untapped market. They start funding anyone that mentions WAN in a meeting.

By the way, you can replace “WAN” with any technology above and it still works.

Thanks to customers needing a solution for something they can’t configure easily they are going to be inundated with SD-WAN options by the time they turn around. And the biggest concern no long becomes “Who has the easiest solution?” but instead, “Who is still going to be here in six months?”

Collapse Into Now

The reckoning is coming in the SD-WAN market. If a company doesn’t already have an SD-WAN solution in development or if their solution won’t see daylight for another nine months, they are going to exercise the second “B” of innovation and buy it. And they have a lot of prime targets to choose from.

Investors get cagey without an exit strategy. How are they going to win at this game? They either have to get paid with an IPO, with a later round of funding, or by having someone buy out the investment. If an investor thinks they can get their money back (plus a bit of interest) by having this little startup bought by a traditional networking vendor you can better believe they will be advising the startup to sell.

The customers are the real losers in the case of a buyout, or worse a bankruptcy. Those highly proprietary solutions become dead weight if there isn’t any support for them any longer. Black box networking falls apart when the little magical creatures inside the box go away. Which means customers will be skittish of supporting a solution that is likely to go away any time soon.

Who will you support? An established vendor slow to roll out a solution? Or an up-and-coming company with new ideas but at risk of being snapped up by a big bank account?

Tom’s Take

I loved seeing all the SD-WAN discussion at Networking Field Day 10. SD-WAN is no longer magic sauce that aggregates DSL and MPLS circuits with encryption. Nuage Networks showed off deploying Docker apps to remote sites. Riverbed talked about using their WAN optimization experience to deploy SaaS solutions through SD-WAN.

We’ve heard from SD-WAN companies in the past at Networking Field Day. It’s interesting to hear the comparisons between the upstarts and the old geezers. It’s clear there is a ton of money that is being invested in SD-WAN. The trick is to find out your needs and pick the best solution for you. Otherwise you may find yourself losing your SD-WAN religion.


The Packet Flow Duality


Quantum physics is a funny thing. It seeks to solve all the problems in the physical world by breaking everything down into the most basic unit possible. That works for a lot of the observable universe. But when it comes to light, quantum physics has issues. Thanks to experiments and observations, most scientists understand that light isn’t just a wave and it’s not just a collection of particles either. It’s both. This concept is fundamental to understanding how light behaves. But can it also explain how data behaves?

Moving Things Around

We tend to think about data as a series of discrete data units being pushed along a path. While these units might be frames, packets, or datagrams depending on the layer of the OSI model that you are operating at, the result is still the same. A single unit is evaluated for transmission. A brilliant post from Greg Ferro (@EtherealMind) sums up the forwarding thusly:

  • Frames being forwarded by MAC address lookup occur at layer 2 (switching)
  • Packets being forwarded by IP address lookup occur at layer 3 (routing)
  • Data being forwarded at higher levels is a stream of packets (flow forwarding)

It’s simple when you think about it. But what makes it a much deeper idea is that lookup at layer 2 and 3 requires a lot more processing. Each of the packets must be evaluated to be properly forwarded. The forwarding device doesn’t assume that the destination is the same for a group of similar packets. Each one must be evaluated to ensure it arrives at the proper location. By focusing on the discrete nature of the data, we are forced to expend a significant amount of energy to make sense of it. As anyone that studied basic packet switching can tell you, several tricks were invented to speed up this process. Anyone remember store-and-forward versus cut-through switching?

Flows behave differently. They contain state. They have information that helps devices make intelligent forwarding decisions. Those decisions don’t have to be limited by destination MAC or IP addresses. They can be labels or VLANs or other pieces of identifying information. They can be anything an application uses to talk to another device, like a DNS entry. It allows us to make a single forwarding decision per flow and implement it quickly and efficiently. Think about a stateful firewall. It works because the information for a given packet stream (or flow) can be programmed into the device. The firewall is no longer examining every individual packet, but instead evaluates the entire group of packets when making decisions.

Consequently, stageful firewalls also give us a peek at how flows are processed. Rather than having a CAM table or an ARP table, we have a group of rules and policies. Those policies can say “given a group of packets in a flow matching these characteristics, execute the following actions”. That’s a far cry from trying to figure out where each one goes.

It’s All About Scale

A single drop of water is discrete. Just like a single data packet, it represents an atomic unit of water. Taken in this measurement, a single drop of water does little good. It’s only when those drops start to form together that their usefulness becomes apparent. Think of a river or a firehose. Those groups of droplets have a vector. They can be directed somewhere to accomplish something, like putting out a fire or cutting a channel across the land.

Flows should be the atomic unit that we base our networking decisions upon. Flows don’t require complex processing on a per-unit basis. Flows carry additional information above and beyond a 48-bit hex address or a binary address representing an IP entry. Flows can be manipulated and programmed. They can have policies applied. Flows can scale to great heights. Packets and frames are forever hampered by the behaviors necessary to deliver them to the proper locations.

Data is simultaneously a packet and a flow. We can’t separate the two. What we can do is change our frame of reference for operations. Just like experiments with light, we must choose one aspect of the duality to act until such time as the other aspect is needed. Light can be treated like a wave the majority of the time. It’s only when things like the photoelectric effect happen that our reference must change. In the same way, data should be treated like a flow for the majority of cases. Only when the very basic needs of packet/frame/datagram forwarding are needed should we abandon our flow focus and treat it as a group of discrete packets.

Tom’s Take

The idea of data flows isn’t new. And neither is treating flows as the primary form of forwarding. That’s what OpenFlow has been doing for quite a while now. What makes this exciting is when people with new networking ideas start using the flow as an atomic unit for decisions. When you remove the need to do packet-by-packet forwarding and instead focus on the flow, you gain a huge insight into the world around the packet. It’s not much a stretch to think that the future of networking isn’t as concerned with the switching of frames or routing of packets. Instead, it’s the forwarding of a flow of packets that will be exciting to watch. As long as you remember that data can be both packet and flow you will have taken your first step into a larger world of understanding.