Overlay Transport and Delivery


The difference between overlay networks and underlay networks still causes issues with engineers everywhere.  I keep trying to find a visualization that boils everything down to the basics that everyone can understand.  Thanks to the magic of online ordering, I think I’ve finally found it.

Candygram for Mongo

Everyone on the planet has ordered something from Amazon (I hope).  It’s a very easy experience.  You click a few buttons, type in a credit card number, and a few days later a box of awesome shows up on your doorstep.  No fuss, no muss.  Things you want show up with no effort on your part.

Amazon is the world’s most well-known overlay network.  When you place an order, a point-to-point connection is created between you and Amazon.  Your item is tagged for delivery to your location.  It’s addressed properly and finds its way to you almost by magic.  You don’t have to worry about your location.  You can have things shipped to a home, a business, or a hotel lobby halfway across the country.  The magic of an overlay is that the packets are going to get delivered to the right spot no matter what.  You don’t need to worry about the addressing.

That’s not to say there isn’t some issue with the delivery.  With Amazon, you can pay for expedited delivery.  Amazon Prime members can get two-day shipping for a flat fee.  In overlays, your packets can take random paths depending on how the point-to-point connection is built.  You can pay to have a direct path provided the underlay cooperates with your wishes.  But unless a full mesh exists, your packet delivery is going to be at the mercy of the most optimized path.

Mongo Only Pawn In Game Of Life

Amazon only works because of the network of transports that live below it.  When you place an order, your package could be delivered any number of ways.  UPS, FedEx, DHL, and even the US Postal Service can be the final carrier for your package.  It’s all a matter of who can get your package there the fastest and the cheapest.  In many ways, the transport network is the underlay of physical shipping.

Routes are optimized for best forwarding.  So are UPS trucks.  Network conditions matter a lot to both packets and packages.  FedEx trucks stuck in traffic jams at rush hour don’t do much good.  Packets that traverse slow data center interconnects during heavy traffic volumes risk slow packet delivery.  And if the road conditions or cables are substandard?  The whole thing can fall apart in an instant.

Underlays are the foundation that higher order services are built on.  Amazon doesn’t care about roads.  But if their shipping times get drastically increased due to deteriorating roadways you can bet their going to get to the bottom of it.  Likewise, overlay networks don’t directly interact with the underlay but if packet delivery is impacted people are going to take a long hard look at what’s going on down below.

Tom’s Take

I love Amazon.  It beats shopping in big box stores and overpaying for things I use frequently.  But I realize that the infrastructure in place to support the behemoth that is Amazon is impressive.  Amazon only works because the transport system in place is optimized to the fullest.  UPS has a computer system that eliminates left turns from driver routes.  This saves fuel even if it means the routes are a bit longer.

Network overlays work the same way.  They have to rely on an optimized underlay or the whole system crashes in on itself.  Instead of worrying about the complexity of introducing an overlay on top of things, we need to work on optimizing the underlay to perform as quickly as possible.  When the underlay is optimized, the whole thing works better.

The Visibility Event Horizon


I’ve always been a science nerd.  Especially when it comes to astronomy.  I’ve always been fascinated by stellar objects.  The most fascinating has to be the black hole.  A region of space with intense gravity formed from the collapse of a stellar body.  I’ve read all about the peculiar properties of classical black holes.  Of particular interest to the networking field is the idea of the event horizon.

An event horizon is a boundary beyond which events no longer affect observers.  In layman’s terms, it is the point of no return for things falling into a black hole.  Anything that falls below the event horizon disappears from the perspective of the observer.  From the point of view of someone falling into the black hole, they never reach the event horizon yet are unable to contact the outside world.  The event horizon marks the point at which information disappears in a system.

How does this apply to the networking world?  Well, every system has a visibility boundary.  We tend to summarize information heading in both directions.  To a network engineer, everything above the transport layer of the OSI model doesn’t really matter.  Those are application decisions that are made by programmers that don’t affect the system other than to be a drain on resources.  To the programmers and admins, anything below the session layer of the OSI model is of little importance.  As long as a call can be made to utilize network resources, who cares what’s going on down there?

Looking Down

Software Defined Networking (SDN) vendors are enforcing these event horizons.  VMware NSX and the Microsoft Hyper-V virtual networking solution both function in a similar manner.  They both create overlay networks that virtualize resources below the level of the host.  Tunnels are created between devices or systems that ride on top of the physical network beneath.  This means that the overlay can function no matter the state of the underlay, provided a path between the two hosts exists.  However, it also means that the overlay obscures the details of the physical network.

There are many that would call this abstraction.  Why should the hosts care about the network state?  All they really want is a path to reach a destination.  It’s better to give them what they want and leave the gory details up to routing protocols and layer 2 loop avoidance mechanisms.  But, that abstraction becomes an event horizon when the overlay is unwilling (or unable) to process information from the underlay network.

Applications and hosts should be aware enough to listen to network conditions.  Overlays should not rely on OSPF or BGP to make tunnel endpoint rerouting decisions.  Putting undue strain on network processing is part of what has led to the situation we have now, where network operating systems need to be complex and intensive to calculate solutions to problems that could be better solved at a higher level.

If the network reports a traffic condition, like a failed link or a congested WAN circuit, that information should be able to flow back up to the overlay and act as a data point to trigger an alternate solution or path.  Breaking the event horizon for information flowing back up toward the overlay is crucial to allow the complex network constructs we’ve created, such as fabrics, to utilize the best possible solutions for application traffic.

Gazing Up

That’s not to say the event horizon doesn’t exist in the other direction as well.  The network has historically been ignorant of the needs of applications at a higher layer.  Network engineers have spent thousands of hours of time creating things like Quality of Service in an attempt to meet the unique needs of higher level programs.  Sometimes this works in a vacuum with no problems provided we’ve guess accurately enough to predict traffic patterns.  Other times, it fails spectacularly when the information changes too quickly.

The underlay network needs to destroy the event horizon that prevents information at higher layers from flowing down into the network.  Companies that have historically concentrated on networking alone have started to see how important this intelligence can be.  By allowing the network to respond to the needs of applications quickly developers can provide enough information to ensure that they programs are treated fairly by changing network conditions even without needing to listen to them.  In this way, the application people can no longer claim the network is a “black hole”.

Tom’s Take

Even as I was writing this, a number of news stories came out from a paper by Professor Stephen Hawking that stated that the classical event horizon doesn’t exist.  The short, short version is that the conditions close to a quantum singularity preclude a well-defined boundary that prevents the escape of all information, including light.  Pretty heady stuff.

In networking, we do have the luxury of a well-defined boundary between underlay networks and overlay networks.  We’ve seen the damage caused by the apparent event horizon for years.  Critical information wasn’t flowing back and forth as needed to help each side provide the best experience for users and engineers.  We need to ensure that this barrier is removed going forward.  The networking people can’t exist in a vacuum pretending that applications don’t have needs.  The overlay admins need to understand the the underlay is a storehouse of critical information and shouldn’t be ignored simply because tunnels are awesome.  Knowing about the event horizon is the first step to finding a way to blast through it.