BGP: The Application Networking Dream


There was an interesting article last week from Fastly talking about using BGP to scale their network. This was but the latest in a long line of discussions around using BGP as a transport protocol between areas of the data center, even down to the Top-of-Rack (ToR) switch level. LinkedIn made a huge splash with it a few months ago with their Project Altair solution. Now it seems company after company is racing to implement BGP as the solution to their transport woes. And all because developers have finally pulled their heads out of the sand.

BGP Under Every Rock And Tree

BGP is a very scalable protocol. It’s used the world over to exchange routes and keep the Internet running smoothly. But it has other powers as well. It can be extended to operate in ways beyond the original specification. Unlike rigid protocols like RIP or OSPF, BGP was designed in part to be extended and expanded as needs change. IS-IS is a very similar protocol in that respect. It can be upgraded and adjusted to work with both old and new systems at the same time. Both can be extended without the need to change protocol versions midstream or introduce segmented systems that would run like ships in the night.
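
To make that extensibility concrete, here’s a rough Python sketch of how a BGP speaker advertises optional capabilities (RFC 5492) during session setup. New address families are just extra AFI/SAFI pairs inside a Multiprotocol Extensions capability, so a peer that doesn’t understand one simply ignores it. The code points are real IANA values, but the framing is simplified and the surrounding OPEN message is omitted.

```python
import struct

# Capability codes from the IANA BGP Capability Codes registry
CAP_MULTIPROTOCOL = 1    # RFC 4760: Multiprotocol Extensions (AFI/SAFI pairs)
CAP_FOUR_OCTET_AS = 65   # RFC 6793: 4-octet AS numbers

def mp_capability(afi: int, safi: int) -> bytes:
    """Encode one Multiprotocol Extensions capability TLV.

    The value is AFI (2 bytes), a reserved byte, then SAFI (1 byte)."""
    value = struct.pack("!HBB", afi, 0, safi)
    return struct.pack("!BB", CAP_MULTIPROTOCOL, len(value)) + value

def four_octet_as_capability(asn: int) -> bytes:
    """Encode the 4-octet AS capability TLV (value is the 32-bit ASN)."""
    value = struct.pack("!I", asn)
    return struct.pack("!BB", CAP_FOUR_OCTET_AS, len(value)) + value

# A speaker that wants IPv4 unicast, IPv6 unicast, and EVPN routes simply
# lists more AFI/SAFI pairs; an older peer ignores capabilities it
# doesn't recognize instead of tearing the session down.
capabilities = (
    mp_capability(afi=1, safi=1)       # IPv4 unicast
    + mp_capability(afi=2, safi=1)     # IPv6 unicast
    + mp_capability(afi=25, safi=70)   # L2VPN / EVPN
    + four_octet_as_capability(4200000001)
)
print(capabilities.hex())
```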

This isn’t the first time that someone has talked about running BGP to the ToR switch either. Facebook mentioned it in this video almost three years ago. Back then they were solving some interesting issues in their own data center. Now, those changes from the hyperscale world are filtering into the real world. Networking teams are seeking to solve scaling issues without resorting to overlay networks or other types of workarounds. The desire to fix everything wrong with layer 2 has led to a revelation of sorts. The real reason why BGP works so well as a replacement for layer 2 isn’t that we’ve solved some mystical networking conundrum. It’s that we finally figured out how to build applications that don’t break because of the network.

Apps As Far As The Eye Can See

The whole reason why layer 2 networks are the primary unit of data center measurement has absolutely nothing to do with VMware. VMware vMotion behaves the way that it does because legacy applications hate having their addresses changed during communications. Most networking professionals know that MAC addresses have a tenuous association to IP addresses, which is what allows the gratuitous ARP after a vMotion to work so well. But when you try to move an application across a layer 3 boundary, it never ends well.

When web scale companies started building their application stacks, they quickly realized that being pinned to a particular IP address was a recipe for disaster. Even typical DNS-based load balancing only seeks to distribute requests to a series of IP addresses behind some kind of application delivery controller. With legacy apps, you can’t load balance once a particular host has resolved a DNS name to an IP address. Once the gateway of the data center resolves that IP address to a MAC address, you’re pinned to that device until something upsets the balance.
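
As a toy illustration of that pinning, here’s a hedged Python sketch of what a legacy client effectively does: resolve a name once, grab the first address, and stay glued to it for the life of the connection, no matter how many other records the load balancer published. The hostname is a placeholder.

```python
import socket

def legacy_connect(hostname: str, port: int = 443) -> socket.socket:
    """Resolve once, take the first answer, and pin to it.

    DNS may return many A/AAAA records for load balancing, but a naive
    client grabs the first entry and never revisits that choice."""
    addrinfo = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    family, socktype, proto, _, sockaddr = addrinfo[0]   # pinned right here
    sock = socket.socket(family, socktype, proto)
    sock.connect(sockaddr)
    return sock

# "app.example.com" is a placeholder; any name with multiple A records works.
# conn = legacy_connect("app.example.com")
```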

Web scale apps like those built by Netflix or Facebook don’t operate by these rules. They have been built to be resilient from inception. Web scale apps don’t wait for the Next Hop Resolution Protocol (NHRP) or kludgy load balancing mechanisms to fix their problems. They are built to do that themselves. When problems occur, the applications look around and find a way to reroute traffic. No crazy ARP tricks. No sly DNS. Just software taking care of itself.
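
A resilient client flips that around: every address (or replica) is a candidate, and the application reroutes itself when one stops answering. A minimal sketch of the idea, not anyone’s production code:

```python
import socket

def resilient_connect(hostname: str, port: int = 443,
                      timeout: float = 2.0) -> socket.socket:
    """Walk every address DNS returns and fail over in the application.

    No gratuitous ARP, no NHRP, no middlebox tricks: if one replica is
    down, the client just moves on to the next one."""
    errors = []
    for family, socktype, proto, _, sockaddr in socket.getaddrinfo(
            hostname, port, proto=socket.IPPROTO_TCP):
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)
            return sock
        except OSError as exc:   # refused, timed out, unreachable...
            errors.append((sockaddr, exc))
            sock.close()
    raise ConnectionError(f"all replicas of {hostname} failed: {errors}")
```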

The implications for network protocols are legion. If a web scale application can survive a layer 3 communications issue, then we are no longer required to build the entire data center as a layer 2 construct. If things like anycast can be used to steer users toward the closest copy of their content, we don’t need to worry about large failover domains. Just like Ivan Pepelnjak (@IOSHints) says in this post, you can build layer 3 failure domains that just work better.
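
As a toy model of why that works, here’s a hedged Python sketch of anycast-style selection: the same service address is advertised from several sites, the network picks the lowest-cost advertisement, and when a site withdraws its route, traffic simply converges on the next-closest one. The site names, costs, and prefix are invented for illustration.

```python
# Toy anycast model: one service prefix, several origin sites, and the
# "network" always picks the cheapest path that is still advertised.
SERVICE_PREFIX = "203.0.113.10/32"   # documentation range, purely illustrative

advertisements = {
    "pop-dallas":    {"cost": 10, "up": True},
    "pop-ashburn":   {"cost": 25, "up": True},
    "pop-frankfurt": {"cost": 40, "up": True},
}

def best_site() -> str:
    """Return the lowest-cost site still advertising the prefix."""
    live = {name: a for name, a in advertisements.items() if a["up"]}
    if not live:
        raise RuntimeError(f"no site is advertising {SERVICE_PREFIX}")
    return min(live, key=lambda name: live[name]["cost"])

print(best_site())                           # -> pop-dallas

advertisements["pop-dallas"]["up"] = False   # site fails, route withdrawn
print(best_site())                           # -> pop-ashburn; the failure
                                             #    stayed local to one site
```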

BGP can work as your ToR strategy for route learning and path selection because you aren’t limited to forcing applications to communicate at layer 2. And other protocols that were created to fix limitations in layer 2, like TRILL or VXLAN, become an afterthought. Now, applications can talk to each other and fail back and forth as they need to without the need to worry about layer 2 doing anything other than what it was designed to do: link endpoints to devices designed to get traffic off the local network and into the wider world.


Tom’s Take

One of the things that SDN has promised us is a better way to network. I believe that the promise of making things better and easier is a noble goal. But the part that has bothered me since the beginning is that we’re still trying to solve everyone’s problems with the network. We don’t rearrange the power grid every time someone builds a better electrical device. We don’t replumb the house every time we install a new sink. We find a way to make the new thing work with our old system.

That’s why the promise of using BGP as a ToR protocol is so exciting. It has very little to do with networking as we know it. Instead of trying to work miracles in the underlay, we build the best network we know how to build. And we let the developers and programmers do the rest.

Who Wants To Save Forever?


At the recent SpectraLogic summit in Boulder, much of the discussion centered around the idea of storing data and media in perpetuity. Technology has arrived at the point where it is actually cheaper to keep something tucked away than to figure out whether or not it should be kept. This is leading to a huge influx of media resources being available everywhere. The question now shifts away from storage and toward retrieval. Can you really save something forever?

Another One Bites The Dust

Look around your desk. See if you can put your hands on each of the following:

* A USB Flash drive
* A DVD-RW
* A CD-ROM
* A Floppy Disk (bonus points for 5.25")

Odds are good that you can find at least three of those four items. Each of those items represents a common way of saving files in a removable format. I’m not even trying to cover all of the formats that have been used (I’m looking at you, ZIP drives). Each of these formats has been tucked away in a backpack or given to a colleague at some point to pass files back and forth.

Yet each of these formats has been superseded sooner or later by something better. Floppies were ultraportable but held very little. CD-ROMs held far more, but couldn’t be re-written without effort. DVD media never really got the chance to take off before bandwidth eclipsed the capacity of a single disc. And USB drives, while the removable media du jour, are mainly used when you can’t connect wirelessly.

Now, with cloud connectivity the idea of having removable media to share files seems antiquated. Instead of copying files to a device and passing it around between machines, you simply copy those files to a central location and have your systems look there. And capacity is very rarely an issue. So long as you can bring new systems online to augment existing storage space, you can effectively store unlimited amounts of data forever.

But how do we extract data from old devices to keep in this new magical cloud? Saving media isn’t that hard. But getting it off the source is proving to be harder than one might think.

Take video, for instance. How can you extract data from an old 8mm video camera? It’s not a standard size to convert to VHS (unless you can find an old converter at a junk store). There are myriad ways to extract the data once you get the camera hooked up to an input device. But what happens if the source device doesn’t work any longer? If your 8mm camera is broken, you probably can’t extract your media. Maybe there is a service that can do it, but you’re going to pay for that privilege.

I Want To Break Free

Assuming you can even extract the source media files for storage, we start running into another issue. Once I’ve saved those files, how can I be sure that I can read them fifty years from now? Can I even be sure I can read them five years from now?

Data storage formats are a constantly-evolving discussion. All you have to do is look at Microsoft Office. Office is the most popular workgroup suite in the entire world. All of those files have to be stored in a format that allows them to be read. One might be forgiven for assuming that Microsoft Word document formats are all the same or at least similar enough to be backwards compatible across all versions.

Each new version of the format includes a few new pieces that break backwards compatibility. Instead of leveraging new features like smaller file sizes or increased readability, we are forced to continue using old formats like Word 97-2002 in order to ensure that the file can be read by whomever we send it to for review.

Even the most portable of formats suffers from this malady. Portable Document Format (PDF) was designed by Adobe to be an application-independent way to display files using a printing descriptor language. This means that saving a file as a PDF on one system makes it readable on a wide variety of systems. PDF has become the de facto way to share files back and forth.

Yet it can suffer from format issues as well. PDF creation software like Adobe Acrobat isn’t immune from causing formatting problems. Files saved with certain attributes can only be read by updated versions of reader software that can understand them. The idea of a portable format only works when you restrict the descriptors available to the lowest common denominator so that all readers can display the format.

Part of this issue comes from the idea that companies feel the need to constantly “improve” things and force users to keep upgrading software to be able to read the new formats. While Adobe has offered the PDF format to ISO for standardization, adding new features to the process takes time and effort. Adobe would rather have you keep buying Acrobat to make PDFs and keep downloading new versions of Reader to decode those new files. It’s a win for them and not much of one for the consumers of the format.


Tom’s Take

I find it ironic that we have spent years of time and millions of dollars trying to find ways to convert data away from paper and into electronic formats. The irony is that the papers we converted years ago are more readable than the data we stored in the cloud. The only limitation of paper is how long the actual paper can last before being obliterated.

Think of the Rosetta Stone or the Code of Hammurabi. We know about these things because they were etched into stone. Literally. Yet, in the case of the Rosetta Stone we ran into file format issues. It wasn’t until we were able to save the Egyptian hieroglyphs as Greek that we were able to read them. If you want your data to stand the test of time, you need to think about more than the cloud. You need to make sure that you can retrieve and read it as well.