Sorting Through SD-WAN


SD-WAN has finally arrived. We’re no longer talking about it in terms of whether or not it is a thing that’s going to happen, but a thing that will happen provided the budgets are right. But while the concept of SD-WAN is certain, one must start to wonder about what’s going to happen to the providers of SD-WAN services.

Any Which Way You Can

I’ve written a lot about SDN and SD-WAN. SD-WAN is the best example of how SDN should be marketed to people. Instead of talking about features like APIs, orchestration, and programmability, you need to focus on the right hook. Do you sell a food processor by talking about how many attachments it has? Or do you sell a Swiss Army knife by talking about all the crazy screwdrivers it holds? Or do you simply boil it down to “This thing makes your life easier”?

The most successful companies have made the “easier” pitch the way forward. Throwing a kitchen sink at people doesn’t make them buy a whole kitchen. But showing them how easy and automated you can make installation and management will sell boxes by the truckload. You have to appeal to the opposite of the problem that SD-WAN was created to solve. WANs are hard, SD-WANs make them easy.

But that only works if your SD-WAN solution is easy in the first place. The biggest, most obvious target is Cisco IWAN. I will be the first to argue that the reason Cisco hasn’t captured the SD-WAN market is that IWAN isn’t SD-WAN. It’s a series of existing technologies that were brought together to try and make an SD-WAN competitor. IWAN has all the technical credibility of a laboratory full of parts of amazing machines. What it lacks is any kind of ability to tie all that together easily.

IWAN is a moving target. Which platform should I use? Do I need this software to make it run correctly? How do I do zero-touch deployments? Or traffic control? How do I plug a 4G/LTE modem into the router? The answer to each of these questions involves typing commands or buying additional software features. That’s not the way to attack the complexity of WANs. In fact, it feeds into that complexity even more.

Cisco needs to look at a true SD-WAN technology. That likely means acquisition. Sure, it’s going to be a huge pain to integrate an acquisition with other components like APIC-EM, but given the lead that other competitors have right now, it’s time for Cisco to come up with a solution that knocks the socks off their longtime customers. Or face the very real possibility of not having longtime customers any longer.

Every Which Way But Loose

The first-generation providers of SD-WAN bounced onto the scene to pick up the pieces from IWAN. Names like Viptela, VeloCloud, CloudGenix, Versa Networks, and more. But, aside from all managing to build roughly the same platform with very similar features, they’ve hit a mighty big wall. They need to start making money in order for these gambles to pay off. Some have customers. Others are managing the migration into other services, like catering their offerings toward service providers. Still others are ripe acquisition targets for companies that lack an SD-WAN strategy, like HPE or Dell. I expect to see some fallout from the first-generation providers consolidating this year.

The second generation providers, like Riverbed and Silver Peak, all have something in common. They are building on a business they’ve already proven. It’s no coincidence that both Riverbed and Silver Peak are the most well-known names in WAN optimization. How well known? Even major Cisco partners will argue that they sell these two “best of breed” offerings over Cisco’s own WAAS solution. Riverbed and Silver Peak have a definite advantage because they have a lot of existing customers that rely on WAN optimization. That market alone is going to net them a significant number of customers over the next few years. They can easily sell SD-WAN as the perfect addition to make WAN optimization even easier.

The third category of SD-WAN providers is the latecomers. I still can’t believe it, but I’ve been reading about providers that aren’t traditional companies trying to get into the space. Talk about being the ninth horse in an eight-horse race. Honestly, at this point you’re better off plowing your investment money into something else, like Internet of Things or Virtual Reality. There’s precious little room among the existing first-generation providers and the second-generation stalwarts. At best, all you can hope for is a quick exit. At worst, your “novel” technology will be snapped up for pennies after you’re bankrupt and liquidating everything but the standing desks.

Tom’s Take

Why am I excited about the arrival of SD-WAN? Because now I can finally stop talking about it! In all seriousness, when the boardroom starts talking about things, that means it’s past the point of being a hobby project and has become a real debate. SD-WAN is going to change one of the most irritating aspects of networking technology for us. I can remember trying to study for my CCNP and cramming all the DSL and T1 knowledge a person could fit into my head. Now, it’s all point-and-click and done. IPsec VPNs, traffic analytics, and application identification are so easy it’s scary. That’s the power of SD-WAN to me. Easy to use and easy to extend. I think that the landscape of providers of SD-WAN technologies is going to look vastly different by the end of 2017. But SD-WAN is going to be here for the long haul.

Two Takes On ASIC Design

Making ASICs is a tough task. We learned this last year at Cisco Live Berlin from this conversation with Dave Zacks:

Cisco spent 6 years building the UADP ASIC that powers their next generation switches. They solved a lot of the issues with ASIC design and re-spins by creating some programmability in the development process.

Now, watch this video from Nick McKeown at Barefoot Networks:

Nick says many of the same things that Dave said in his video. But Nick and Barefoot took a totally different approach from Cisco. Instead of creating programmable elements in the ASIC design, they abstracted the entire language of function definition from the ASIC. By using P4 as the high-level language and making the system compile the instruction sets down to run in the ASIC, they reduced the complexity, increased the speed, and managed to make the system flexible and capable of implementing new technologies even after the ASIC design is set in stone.
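To make that abstraction a little more concrete, here is a minimal Python sketch of the match-action idea that P4-style pipelines are built around. To be clear, this is not P4 and it is not how Barefoot's compiler works; `Table`, `run_pipeline`, `set_port`, and the packet field names are all hypothetical names made up for illustration. The point is only that the forwarding engine stays fixed while the behavior lives in table definitions layered on top of it.

```python
def set_port(port):
    """Build an action that forwards the packet out of a given port."""
    def action(pkt):
        pkt["egress_port"] = port
    return action


def drop(pkt):
    """Action that marks the packet as dropped."""
    pkt["drop"] = True


class Table:
    """A match-action table: exact-match on one packet field, then run an action."""

    def __init__(self, key, default=drop):
        self.key = key          # packet field to match on (hypothetical names)
        self.entries = {}       # field value -> action
        self.default = default  # action when nothing matches

    def add(self, value, action):
        self.entries[value] = action

    def apply(self, pkt):
        action = self.entries.get(pkt.get(self.key), self.default)
        action(pkt)


def run_pipeline(tables, pkt):
    """The fixed 'engine': walk the tables in order until the packet is dropped."""
    for table in tables:
        if pkt.get("drop"):
            break
        table.apply(pkt)
    return pkt


# "Program" the pipeline: behavior is defined by tables, not by the engine.
l2 = Table("dst_mac")
l2.add("aa:bb:cc:00:00:01", set_port(1))
pipeline = [l2]

# A feature added later is just another table; run_pipeline is untouched.
acl = Table("src_ip", default=lambda pkt: None)
acl.add("203.0.113.9", drop)
pipeline.insert(0, acl)
```

In this toy version, adding the ACL after the fact didn't require touching `run_pipeline` at all, which is the flavor of flexibility Nick is describing, minus the hard part of compiling it down to silicon.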

Oh, and they managed to do it in 3 years.

Sometimes, you have to think outside the box in order to come up with some new ideas. Even if that means you have to pull everything out of the box. By abstracting the language from the ASIC, Barefoot not only managed to find a way to increase performance but also to add feature sets to the switch quickly without huge engineering costs.

Some food for thought.

Visibility In Networking – Quick Thoughts from Networking Field Day


I’m at Networking Field Day 13 this week. You can imagine how much fun I’m having with my friends! I wanted to drop some quick thoughts on visibility on you all, based on what we’re hearing this week, and raise some interesting questions.

I Can See Clearly Now

Visibility is a huge issue for companies. Seeing what’s going on is hard for people. Companies like Ixia talk about the need to avoid dropping any packets to make sure we have complete knowledge of the network. But that requires a huge amount of hardware and design. You’re always going to need traditional monitoring even when everything is using telemetry and other data models. Make sure you size things right.

Forward Networks told us that there is an increasing call for finding a way to monitor both the underlay network and the overlay network. Most overlay companies give you a way to tie into their system via API or other telemetry. However, there is no visibility into the underlay because of the event horizon. Likewise, companies like Forward Networks are focusing on the underlay with mapping technologies and modeling software but they can’t pass back through the event horizon to see into the overlay. Whoever ends up finding a way to marry both of these together is going to make a lot of money.

Apstra is taking the tack of not caring what the underlay looks like. They’re going to give you the tools to manage it all without hard setup. You can rip and replace switches as needed with multivendor support. That’s a huge win if you run a heterogeneous network or you’re looking to start replacing traditional hardware with white or bright box options. Likewise, their ability to pull configs can help you visualize your device setup more effectively no matter what’s under there.

Tom’s Take

I’ve got some more Networking Field Day thoughts coming soon, but I wanted to get some thoughts out there for you to think about this weekend. Stay tuned for some new ideas coming out of the event!

Cloud Apps And Pathways


Applications are king. Forget all the things you do to ensure proper routing in your data center. Forget the tweaks for OSPF sub-second failover or BGP optimal path selection. None of it matters to your users. If their login to Siebel or Salesforce or Netflix is slow today, you’ve failed. They are very vocal when it comes to telling you how much the network sucks today. How do we fix this?

Pathways Aren’t Perfect

The first problem is the cloud focus of applications. Once our packets leave our border routers it’s a giant game of chance as to how things are going to work next. The routing protocol games that govern the Internet are tried and true and straight out of RFC 1771 (yes, RFC 4271 supersedes it). BGP is a great tool with general purpose abilities. It’s becoming the choice for web scale applications like LinkedIn and Facebook. But it’s problematic for Internet routing. It scales well but doesn’t have the ability to make rapid decisions.

The stability of BGP is also the reason why it doesn’t react well to changes. In the old days, links could go up and down quickly. BGP was designed to avoid issues with link flaps. But today’s links are less likely to flap and more likely to need traffic moved around because of congestion or other factors. The pace that applications need to move traffic flows means that they tend to fight BGP instead of being relieved that it’s not slinging their traffic across different links.
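One mechanism often used alongside BGP to ride out flapping links is route flap damping (RFC 2439): each flap adds a penalty to a route, the penalty decays exponentially over time, and the route is suppressed while the penalty stays high. Here is a minimal sketch of that idea in Python. The class name `FlapDamper` and the specific penalty, threshold, and half-life values are assumptions picked for illustration, not a claim about what any particular router ships with.

```python
class FlapDamper:
    """Toy model of flap damping for one route (all values are illustrative)."""

    def __init__(self, penalty_per_flap=1000.0, suppress_limit=2000.0,
                 reuse_limit=750.0, half_life_s=900.0):
        self.penalty_per_flap = penalty_per_flap
        self.suppress_limit = suppress_limit  # suppress at or above this penalty
        self.reuse_limit = reuse_limit        # un-suppress once decayed below this
        self.half_life_s = half_life_s        # penalty halves every half_life_s
        self.penalty = 0.0
        self.last_update = 0.0
        self.suppressed = False

    def _decay(self, now):
        """Apply exponential decay for the time elapsed since the last update."""
        elapsed = now - self.last_update
        self.penalty *= 0.5 ** (elapsed / self.half_life_s)
        self.last_update = now

    def flap(self, now):
        """Record a flap at time `now` (in seconds)."""
        self._decay(now)
        self.penalty += self.penalty_per_flap
        if self.penalty >= self.suppress_limit:
            self.suppressed = True

    def is_usable(self, now):
        """Return True if the route may be used at time `now`."""
        self._decay(now)
        if self.suppressed and self.penalty < self.reuse_limit:
            self.suppressed = False
        return not self.suppressed


route = FlapDamper()
for t in (0, 10, 20):   # three flaps in twenty seconds
    route.flap(t)
usable_soon = route.is_usable(30)     # still suppressed shortly afterwards
usable_later = route.is_usable(4000)  # penalty has decayed by then
```

Notice the time scales involved. With a half-life measured in minutes, a route that misbehaves sits in the penalty box long after the link has recovered. That's great for stability and exactly the wrong shape for an application that wants its traffic moved in seconds.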

BGP can be a good suggestion of path variables. That’s how Facebook uses it for global routing. But decisions need to be made on top of BGP much faster. That’s why cloud providers don’t rely on it beyond basic connectivity. Things like load balancers and other devices make up for this as best they can, but they are also points of failure in the network and have scalability limitations. So what can we do? How can we build something that can figure out how to make applications run better without the need to replace the entire routing infrastructure of the Internet?

GPS For Routing

One of the things that has some potential for fixing inefficiency with BGP and other basic routing protocols was highlighted at Networking Field Day 12 during the presentation from Teridion. They have a method for creating more efficiency between endpoints thanks to their agents. Founder Elad Rave explains more here:

I like the idea of getting “traffic conditions” from endpoints to avoid congestion. For users of cloud applications, those conditions are largely unknown. Even multipath routing confuses tried-and-true troubleshooting like traceroute. What needs to happen is a way to collect the data for congestion and other inputs and make faster decisions that aren’t beholden to the underlying routing structure.

Overlay networking has tried to do this for a while now. Build something that can take more than basic input and make decisions on that data. But overlays have issues with scaling, especially past the boundary of the enterprise network. Teridion has potential to help influence routing decisions in networks outside your control. Sadly, even the fastest enterprise network in the world is only as fast as an overloaded link between two level 3 interconnects on the way to a cloud application.

Teridion has the right idea here. Alternate pathways need to be identified and utilized. But that data needs to be evaluated and updated regularly. Much like the issues with Waze dumping traffic into residential neighborhoods when major arteries get congested, traffic monitors could cause overloads on alternate links if shifts happen unexpectedly.
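Here is a rough sketch of what "evaluate regularly, but don't stampede" could look like in Python. This is not Teridion's algorithm, which I haven't seen the insides of; the function `choose_path`, the path names, and the 20% switching margin are all assumptions for illustration. The idea is simple hysteresis: only move traffic when an alternate path is better by a meaningful margin, so small fluctuations in measurements don't cause flows to slosh back and forth.

```python
def choose_path(current, latencies_ms, switch_margin=0.2):
    """Pick a path from measured latencies, with hysteresis.

    current       -- name of the path in use now (hypothetical labels)
    latencies_ms  -- dict of path name -> latest measured latency in ms
    switch_margin -- fraction by which an alternative must beat the
                     current path before we move traffic (assumed value)
    """
    best = min(latencies_ms, key=latencies_ms.get)

    # If the current path has disappeared from the measurements, take the best.
    if current not in latencies_ms:
        return best

    # Only switch when the improvement clears the margin.
    if latencies_ms[best] < latencies_ms[current] * (1 - switch_margin):
        return best
    return current


# A 10% improvement isn't worth moving for; a 40% improvement is.
stay = choose_path("default", {"default": 100, "alternate": 90})
move = choose_path("default", {"default": 100, "alternate": 60})
```

A real system would also need to account for how much capacity the alternate path has, since that is the Waze problem in a nutshell, but the margin alone already keeps traffic from shifting on every small blip.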

The other reason why I like Teridion is that they are doing things without hardware boxes or the need to install software anywhere but the end host. Anyone working with cloud-based applications knows that the provider is very unlikely to provide anything outside of their standard offerings for you. And even if they manage, there is going to be a huge price tag. More often than not, that feature request will become a selling point for a new service in time that may be of marginal benefit until everyone starts using it. Then application performance goes down again. Since Teridion is optimizing communications between hosts, it’s a win for everyone.

Tom’s Take

I think Teridion is on to something here. Crowdsourcing is the best way to gather information about traffic. Giving packets a better destination with shorter travel times means better application performance. Better performance means happier users. Happier users means more time spent solving other problems that have symptoms that aren’t “It’s slow” or “Your network sucks”. And that makes everyone happier. Even grumpy old network engineers.


Teridion was a presenter during Networking Field Day 12 in San Francisco, CA. As a participant in Networking Field Day 12, my travel and lodging expenses were covered by Tech Field Day for the duration of the event. Teridion did not ask for nor were they promised any kind of consideration in the writing of this post. My conclusions here represent my thoughts and opinions about them and are mine and mine alone.


Cisco Data Center Duel

Network Field Day 5 started off with a full day at Cisco. The Data Center group opened and closed the day, with the Borderless team sandwiched in between. Omar Sultan (@omarsultan) greeted us as we settled in for a continental breakfast before getting started.

The opening was a discussion of onePK, a popular topic as of late from Cisco. While the topic du jour in the networking world is software-defined networking (SDN), Cisco steers the conversation toward onePK. This, at its core, is API access to all the flavors of the Internetwork Operating System (IOS). While other vendors discuss how to implement protocols like OpenFlow or how to expose pieces of their underlying systems to developers, Cisco has built a platform to allow access into pieces and parts of the OS. You can write applications in Java or Python to pull data from the system or push configurations to it. The process is slowly being rolled out to the major Cisco platforms. The support for the majority of the Nexus switching line should give the reader a good idea of where Cisco thinks this technology will be of best use.

One of the specific applications that Cisco showed off to us using onePK is the use of Puppet to provision switches from bare metal to functioning with a minimum of human effort. Puppet integration was a big underlying topic at both Cisco and Juniper (more on that in the Juniper NFD5 post). Puppet is gaining steam in the networking industry as a way to get hardware up and running quickly with the least amount of fuss. Server admins have enjoyed the flexibility of Puppet for some time. It’s good to see well-tested and approved software like this being repurposed for similar functionality in the world of routing and switching.

Next up was a discussion about the Cisco ONE network controller. Controllers are a very hot topic in the network world today. OpenFlow allows a central management and policy server to push information and flow data into switches. This allows network admins to get a “big picture” of the network and how the packets are flowing across it. Having the ability to view the network in its entirety also allows admins to start partitioning it in a process called “slicing.” This was one of the first applications that the Stanford whiz kids used OpenFlow to accomplish. It makes sense when you think about how universities wanted to partition off their test networks to prevent this radical OpenFlow idea from crashing the production hardware. Now, we’re looking at using slicing for things like multi-tenancy and security. The building blocks are there to make some pretty interesting leaps. The real key is that the central controller has the ability to keep up with the flows being pushed through the network. Cisco’s ONE controller not only speaks OpenFlow, but onePK as well. This means that while the ONE controller can talk to disparate networking devices running OpenFlow, it will be able to speak much more clearly to any Cisco devices you have lying around. That’s a pretty calculated play from Cisco, given that the initial target for their controller will be networks populated primarily by Cisco equipment. The use case that was given to us for the Cisco ONE controller was replacing large network taps with SDN options. Fans of NFD may remember our trip to Gigamon. Cisco hadn’t forgotten, as the network tap they used as an example in their slide looked just like the orange Gigamon switch we saw at a previous NFD.
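For anyone who hasn't run into slicing before, here is a small Python sketch of the concept. It is a toy, not the OpenFlow protocol and not the API of Cisco's ONE controller or any real slicing tool; the class `SlicedController`, its methods, and the choice of VLAN IDs as the slicing key are all assumptions for illustration. The central controller knows which part of the network each slice owns and refuses to install a flow rule that reaches outside of it, which is how a research slice is kept from stepping on production.

```python
class SlicedController:
    """Toy central controller that partitions the network by VLAN ID."""

    def __init__(self):
        self.slice_vlans = {}  # slice name -> set of VLAN IDs it owns
        self.flow_rules = []   # installed rules: (slice, vlan, match, action)

    def define_slice(self, name, vlans):
        """Create or update a slice, rejecting VLANs another slice already owns."""
        vlans = set(vlans)
        for other, owned in self.slice_vlans.items():
            overlap = owned & vlans
            if other != name and overlap:
                raise ValueError(
                    f"VLANs {sorted(overlap)} already belong to slice {other!r}"
                )
        self.slice_vlans[name] = vlans

    def install_rule(self, slice_name, vlan, match, action):
        """Install a flow rule only if the slice owns the VLAN it touches."""
        if vlan not in self.slice_vlans.get(slice_name, set()):
            raise PermissionError(
                f"slice {slice_name!r} does not own VLAN {vlan}"
            )
        self.flow_rules.append((slice_name, vlan, match, action))


controller = SlicedController()
controller.define_slice("production", [10, 20])
controller.define_slice("research", [100])

# The research slice can program its own VLAN...
controller.install_rule("research", 100, {"dst_ip": "10.0.0.5"}, "output:3")

# ...but an attempt to touch a production VLAN is refused.
try:
    controller.install_rule("research", 10, {"dst_ip": "10.0.0.5"}, "output:3")
    crossed_slices = True
except PermissionError:
    crossed_slices = False
```

Swap "research" and "production" for tenant names and you can see why the same building block is interesting for multi-tenancy and security.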

After the presentations from the Borderless team, we ended the day with an open discussion around a few topics. This is where the real fun started. Here’s the video:

The first hour or so is a discussion around hybrid switching. I had some points in here about the standoff between hardware and software people not really wanting to get along right now. I termed it a Mexican Standoff because no one really wants to flinch and go down the wrong path. The software people just want to write overlays and things like that and make them run on everything. The entrenched hardware vendors, like Cisco, want to make sure their hardware is providing better performance than anyone else (because that’s where their edge is). Until someone decides to take a chance and push things in different directions, we’re not going to see much movement. Also, around 1:09:00 is where we talked a bit about Cisco jumping into the game with a pure OpenFlow switch without much more on top of it. This concept seemed a bit foreign to some of the Cisco folks, as they couldn’t understand why people wouldn’t want IOS and onePK. That’s where I chimed in with my “If I want a pickup truck, I don’t take a chainsaw to a school bus.” You shouldn’t have to shed all the extra stuff to get the performance you want. Start with a smaller platform and work your way up instead of starting with the kitchen sink and stripping things away.

Shortly after this is where the fireworks started. One of Cisco’s people started arguing that OpenFlow isn’t the answer. He said that the customer he was talking to didn’t want OpenFlow. He even went so far as to say that “OpenFlow is a fantasy because it promises everything and there’s nothing in production.” (about 1:17:00) Folks, this was one of the most amazing conversations I’ve ever seen at a Network Field Day event. The tension in the room was palpable. Brent and Greg were on this guy the entire time about how OpenFlow was solving real problems for customers today, and in Brent’s case he’s running it in production. I really wonder how the results of this are going to play out. If Cisco hears that their customers don’t care that much about OpenFlow and just want their gear to do SDN like in onePK, then that’s what they are going to deliver. The question then becomes whether or not network engineers who believe that OpenFlow has a big place in the networks of tomorrow can convince Cisco to change their ways.

If you’d like to learn more about Cisco, you can find them at  You can follow their data center team on Twitter as @CiscoDC.

Tom’s Take

Cisco’s Data Center group has a lot of interesting things to say about programmability in the network. From discussions about APIs to controllers to knock-down, drag-out arguments about what role OpenFlow is going to play, Cisco has the gamut covered. I think that their position at the top of the network heap gives them a lot of insight into what’s going on. I’m just worried that they are going to use that to push a specific agenda and not embrace useful technologies down the road that solve customer problems. You’re going to hear a lot more from Cisco on software defined networking in the near future as they begin to roll out more and more features to their hardware in the coming months.

Tech Field Day Disclaimer

Cisco was a sponsor of Network Field Day 5.  As such, they were responsible for covering a portion of my travel and lodging expenses while attending Network Field Day 5.  In addition, Cisco provided me with a breakfast and lunch at their offices.  They also provided a Moleskine notebook, a t-shirt, and a flashlight toy.  At no time did they ask for, nor were they promised any kind of consideration in the writing of this review.  The opinions and analysis provided within are my own and any errors or omissions are mine and mine alone.

Additional NFD5 Blog Posts

NFD5: Cisco onePK – Terry Slattery

NFD5: SDN and Unicorn Blood – Omar Sultan