Routing Through the Forest of Trees

Some friends shared a Reddit post the other day that made me both shake my head and ponder the state of the networking industry. Here is the locked post for your viewing pleasure. It was locked because the comments were going to devolve into a mess eventually. The person who wrote it seems to be honest and sincere in their approach to “layer 3 going away”. The post generated a lot of amusement from the networking side of IT about how this person doesn’t understand the basics, but I think there’s a deeper issue going on.

Trails To Nowhere

Our visibility into the state of the network below the application interface is very general in today’s world. That’s because things “just work,” to borrow an overused phrase. Aside from the occasional troubleshooting exercise to find out why packets destined for Azure or AWS are failing along the way, when was the last time you had to get really creative in finding a routing issue in someone else’s equipment? We spend more time now trying to figure out how to make our own networks operate efficiently and less time worrying about what happens to the packets when they leave our organization. Provided, of course, that the users don’t start complaining about latency or service outages.

That means that visibility of the network functions below the interface of the application doesn’t really exist. As pointed out in the post, applications have security infrastructure that communicates with other applications and everything is nicely taken care of. Kind of like ordering packages from your favorite online store. The app places the order with a storefront and things arrive at your house. You don’t have to worry about picking the best shipping method or trying to find a storefront with availability or any of the older ways that we had to deal with weirdness.

That doesn’t mean that the processes that enable that kind of service are going away, though. Optimizing transport networks is a highly specialized skill, and it isn’t a solved problem. You’ve probably heard by now that UPS trucks avoid left turns whenever possible to optimize safety and efficiency. The kind of route planning needed to eliminate as many left turns as possible from a route is massive. It’s on the order of a very highly specialized routing protocol. What OSPF and BGP are doing is akin to removing the “left turns” from the network. They find the best path for packets and keep up to date as the information changes. That doesn’t mean the network is going away. It means we’re finding the most efficient route through it for a given set of circumstances. If a shipping company decided tomorrow that it could no longer guarantee overnight delivery or even two-day shipping, that would drastically change the nature of the applications and services that offer that kind of delivery. The network still matters.
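To make the analogy a little more concrete, here is a minimal sketch of the shortest-path-first calculation (Dijkstra’s algorithm) that link-state protocols like OSPF run over their topology database. BGP picks paths differently, by comparing route attributes rather than summing link costs, so this only illustrates the OSPF half of the comparison. The router names and link costs are hypothetical.

```python
import heapq


def shortest_paths(graph: dict[str, dict[str, int]], source: str) -> dict[str, tuple[int, list[str]]]:
    """Dijkstra's algorithm: lowest total cost and the path to every reachable node."""
    dist = {source: 0}
    prev: dict[str, str] = {}
    heap = [(0, source)]
    visited = set()
    while heap:
        cost, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry; a cheaper path was already settled
        visited.add(node)
        for neighbor, link_cost in graph.get(node, {}).items():
            new_cost = cost + link_cost
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                prev[neighbor] = node
                heapq.heappush(heap, (new_cost, neighbor))
    # Walk the predecessor chain back to the source to rebuild each path.
    result = {}
    for node, total in dist.items():
        path = [node]
        while path[-1] != source:
            path.append(prev[path[-1]])
        result[node] = (total, path[::-1])
    return result


# Hypothetical four-router topology; costs stand in for OSPF interface costs.
topology = {
    "A": {"B": 10, "C": 1},
    "B": {"A": 10, "D": 1},
    "C": {"A": 1, "D": 5},
    "D": {"B": 1, "C": 5},
}
print(shortest_paths(topology, "A")["B"])
```

The direct A to B link with cost 10 loses to the three-hop path through C and D with a total cost of 7: the routing equivalent of driving a little farther to skip the left turns.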

OSI Has to Die

The other thing that jumped out at me about the post was the title, which refers to Layer 3 of the OSI model as a routing function. The timing was fortuitous because I had just finished reading Robert Graham’s excellent treatise on getting rid of the OSI model and I couldn’t agree more with him. Confining routing and addressing functions to a single layer of an obsolete model gives people the wrong ideas. At the very least it encourages them to form bad opinions about those ideas.

Let’s look at the post as an example. Taking a stance like “we don’t need layer three because applications will connect to each other” is bad. So is “We don’t need layer two because all devices can just broadcast for the destination”. It’s wrong to say those things but if you don’t know why it’s wrong then it doesn’t sound so bad. Why spend time standing up routing protocols if applications can just find their endpoints? Why bother putting higher order addresses on devices when the nature of Ethernet means things can just be found easily with a broadcast or neighbor discovery transmission? Except you know that’s wrong if you understand how remote networks operate and why having a broadcast domain of millions of devices would be chaos.
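Some rough arithmetic shows why a flat broadcast domain falls apart at that scale. The sketch below assumes, hypothetically, that every host sends one broadcast (an ARP request, say) per second and that each one is a minimum-size 64-byte Ethernet frame. Real rates vary widely, and the figures ignore preamble and inter-frame gap.

```python
def broadcast_load(hosts: int, broadcasts_per_host_per_sec: float,
                   frame_bytes: int = 64) -> tuple[float, float]:
    """Return (frames/sec, bits/sec) that EVERY host must receive and process.

    In a single broadcast domain each broadcast is flooded to all other hosts,
    so per-host receive load grows linearly with the size of the domain.
    """
    frames_per_sec = (hosts - 1) * broadcasts_per_host_per_sec
    bits_per_sec = frames_per_sec * frame_bytes * 8
    return frames_per_sec, bits_per_sec


for n in (250, 1_000_000):
    frames, bits = broadcast_load(n, 1.0)
    print(f"{n:>9} hosts: {frames:,.0f} frames/s, {bits / 1e6:,.1f} Mbit/s per host")
```

At a million hosts, each one would spend roughly half a gigabit per second just listening to everyone else’s broadcasts before doing any useful work.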

Graham has some very compelling points about relegating the OSI model to history and teaching how networks really operate. It helps people understand that there are multiple networks that exist at one time to get traffic to where it belongs. While we may see the Internet and Ethernet LAN as a single network, they have different purposes. One is for local traffic delivery and the other is for remote traffic delivery. The closest analog for certain generations is the phone system. There was a time when you had local calls and long distance calls that required different dialing instructions. You still have them today but it’s less noticeable thanks to mobile devices not requiring long distance dialing instructions.
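The local/remote split shows up in the very first decision a host makes when it sends a packet: is the destination on my own subnet, or do I hand it to the gateway? A minimal sketch using Python’s standard `ipaddress` module; the function name and the addresses (from private and documentation ranges) are hypothetical.

```python
import ipaddress


def next_hop(src_interface: str, gateway: str, destination: str) -> str:
    """Decide who a host should send the frame to: the destination itself or its gateway."""
    iface = ipaddress.ip_interface(src_interface)
    dst = ipaddress.ip_address(destination)
    if dst in iface.network:
        # On-link: resolve the destination's MAC (ARP/ND) and deliver locally.
        return str(dst)
    # Off-link: hand the packet to the default gateway for remote delivery.
    return gateway


print(next_hop("192.168.1.20/24", "192.168.1.1", "192.168.1.50"))
print(next_hop("192.168.1.20/24", "192.168.1.1", "203.0.113.7"))
```

The first call delivers directly on the LAN; the second goes to the gateway, which is where routing takes over.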

It might be more appropriate to think of the local/remote dichotomy like a private branch exchange (PBX) phone network. Phones inside the PBX have locally significant extensions that have no meaning outside of the system. Likewise, remote traffic can only enter the system through entry points created by administrators, like a main dial-in number that terminates on an extension or direct inward dial (DID) numbers that have significance outside the system. Extensions only matter for the local users and have no way to communicate outside without addressing rules. Outside addresses have no way of communicating into the local system without creating rules that allow it to happen. It’s a much better metaphor than the OSI model.


Tom’s Take

I don’t blame our intrepid poster for misunderstanding the way network addresses operate. I blame IT for obfuscating it because it doesn’t matter anymore to application developers. Sure, we’ve finally hit the point where the network has merged into a single entity with almost no distinction between remote WAN and local LAN. But we’ve also created a system where people forget the dependencies of the system at lower levels. You can’t encode signals without a destination and you can’t determine the right destination without knowing where it’s supposed to be. That’s true whether you’re running a simple app in RFC 1918 private address space or in the public cloud. Forgetting that little detail means you could end up lost in a forest, unable to route yourself out of it again.

The Puzzle of Peering with Kentik

If you’ve worked at an ISP or even just closely with one, you’ve probably heard the term peering quite a bit. Peering is essentially a reciprocal agreement to provide access to networks between two providers. Provider A agrees to allow Provider B to send traffic over and through their network in exchange for the same access in the other direction. Sounds easy, right? On a technical level it is pretty easy. You simply set up a BGP session with the partner provider, make sure all the settings match, and you’ve got things rolling.
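Part of “make sure all the settings match” happens automatically when the session comes up. Each side sends a BGP OPEN message, and RFC 4271 has the receiver check, among other things, that the peer’s AS number is the one it expected and that the proposed hold time is usable (zero or at least three seconds), then use the smaller of the two proposed hold times. A minimal sketch of just those two checks, with a simplified, hypothetical message class and AS numbers from the documentation range:

```python
from dataclasses import dataclass


@dataclass
class OpenMessage:
    """Simplified subset of the fields in a BGP OPEN message (hypothetical class)."""
    my_as: int
    hold_time: int  # seconds proposed by the sender
    bgp_identifier: str


class SessionRefused(Exception):
    """Stands in for sending a NOTIFICATION and tearing the session down."""


def accept_open(expected_peer_as: int, local_hold_time: int, peer: OpenMessage) -> int:
    """Validate a received OPEN and return the negotiated hold time."""
    if peer.my_as != expected_peer_as:
        raise SessionRefused(f"Bad Peer AS: expected {expected_peer_as}, got {peer.my_as}")
    if peer.hold_time in (1, 2):
        # RFC 4271: hold time must be zero or at least three seconds.
        raise SessionRefused(f"Unacceptable Hold Time: {peer.hold_time}")
    # Both speakers use the smaller of the two proposed values.
    return min(local_hold_time, peer.hold_time)


peer_open = OpenMessage(my_as=64501, hold_time=180, bgp_identifier="198.51.100.1")
print(accept_open(expected_peer_as=64501, local_hold_time=90, peer=peer_open))
```

A real router does all of this inside its BGP implementation; the sketch only shows why a mismatched neighbor AS setting on either side keeps the session from ever establishing.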

The technical part isn’t usually where peering gets complicated. Instead it’s almost always related to the business side of things. The policy and negotiations that have to happen for a good peering agreement take way more time than hammering out some BGP configuration stanzas. The amount of traffic to be sent, the latency requirements, and even the cost of the agreement are all things that have to be figured out before the first hello packet can be exchanged. This agreement is always up for negotiation too, since the traffic patterns can change before you realize it and put you at a disadvantage.

Peerless Data Collection

If you want to get the most out of your peering arrangement you need to know what’s going on. You need to have statistics about the key points of your agreement. You need to know if you’re holding up your end of the bargain as well as the company you’re working with. If you walk into a peering negotiation without the right data you’re going to be working from a disadvantage right away.

For example, did your partner company take all of the traffic they agreed to accept in the peering contract? Or did they have issues that forced you to send the traffic along a different route? Were you forced to send that traffic across a different route that had a higher cost? Did your users complain about network speed because of congestion outside of your control? If you can’t put your fingers on the answers to these questions quickly you’re going to find yourself with lots of angry users and customers not to mention peering partners that want answers from you as well.

Recently I had a chance to listen to a great presentation from Kentik during Networking Field Day: Service Provider. Nina Bargisten laid out some of the challenges that Kentik customers face with peering arrangements and how Kentik is helping to solve them:

One of the points that Nina discusses is that capacity planning is a huge undertaking for ISPs. With the supply chain issues that we’re currently facing in 2022 it’s not easy to order equipment to alleviate congestion problems. Even under somewhat normal circumstances it’s not likely that an ISP is going to go out and order a lot of new hardware just to deal with congestion. They might change some policies to route traffic in different directions, but ultimately a decision has to be made about how to get customer packets through and out of their network to the ultimate destination.

Peering agreements can help with congestion. Adding more exit points to your network means some flows can exit through a different provider and either get to their destination faster or keep a larger connection from being overwhelmed and congested. It’s not unlike having multiple options to arrive at a destination when driving. Some streets are better for smaller amounts of traffic compared to larger highways and interstates that provide high-speed travel.

As mentioned above, it’s critical that you have data on your traffic and its performance. Are you sending everything through one route? Are you peering with providers that are getting less than half of the traffic load they agreed to take? These are all questions you have to ask to create a capacity plan. If you’re hearing complaints about congestion but you see that only two of your outbound connections are running at full capacity while the rest are sitting idle, then you don’t have a congestion issue as much as you have a configuration problem to solve.
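The questions above can be turned into a simple check over per-link numbers. A minimal sketch, assuming you already export average utilization and each peer’s committed volume from your own telemetry; the class, field names, thresholds, and figures are all hypothetical, and this is not how Kentik’s product works internally.

```python
from dataclasses import dataclass


@dataclass
class ExitLink:
    """One outbound connection. All figures are average Gbit/s (hypothetical)."""
    name: str
    capacity: float
    committed: float  # volume the peer agreed to accept; 0 for plain transit
    observed: float   # measured volume, e.g. from flow telemetry


def review_exits(links: list[ExitLink], hot: float = 0.9, under: float = 0.5):
    """Return (saturated links, underused links, verdict)."""
    saturated = [link.name for link in links if link.observed >= hot * link.capacity]
    # Compare peers to their commitment; compare transit links to raw capacity.
    underused = [
        link.name for link in links
        if link.observed < under * (link.committed or link.capacity)
    ]
    if saturated and underused:
        verdict = "configuration problem: steer traffic toward underused exits"
    elif saturated:
        verdict = "capacity problem: no idle exits to shift traffic to"
    else:
        verdict = "no exit congestion"
    return saturated, underused, verdict


links = [
    ExitLink("transit-1", capacity=100, committed=0, observed=97),
    ExitLink("transit-2", capacity=100, committed=0, observed=95),
    ExitLink("peer-a", capacity=100, committed=40, observed=8),
    ExitLink("peer-b", capacity=10, committed=5, observed=4),
]
print(review_exits(links))
```

Here the two transit links are saturated while peer-a carries a fifth of what it agreed to take, which points at routing policy rather than a need for more hardware.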

Kentik’s solution allows you to see what’s going on and helps you make better decisions about the routes that traffic should be taking. As demonstrated above, their dashboard collects data from your network as well as many others and can tell you when you need to be configuring policies to send traffic to low-volume peers instead of relying on congested links. It will also help you see trends for when links become congested and allow you to set thresholds to divide your traffic appropriately before it becomes an issue.
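Threshold alerts get more useful when they look ahead. One simple approach, assumed here rather than taken from Kentik’s product, is to fit a least-squares line to recent daily peak utilization and estimate how many days remain before a link crosses a chosen threshold. The function name and sample values are hypothetical.

```python
from typing import Optional


def days_until_threshold(daily_peaks: list[float], threshold: float) -> Optional[float]:
    """Fit y = slope * day + intercept and project when it reaches `threshold`.

    Returns days from the most recent sample, or None if utilization is flat
    or falling. Needs at least two samples.
    """
    n = len(daily_peaks)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(daily_peaks) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(daily_peaks))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing_day = (threshold - intercept) / slope
    return max(0.0, crossing_day - (n - 1))


# Peak utilization (%) on one peering link over five days.
print(days_until_threshold([50, 52, 54, 56, 58], threshold=80))
```

A straight line is a crude model of traffic that has daily and weekly cycles, but even a crude projection gives you time to shift policy before the complaints start.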

There’s a lot more info in the video above to help you with your capacity planning and peering negotiations. It all comes down to a simple maxim: Information is key. You can’t solve these puzzles without knowing what you have and what you need. If you’re just going to keep throwing peering agreements at a problem until it goes away you’re going to fail. You won’t solve your real issues by just adding another connection that never gets used. Instead you can use Kentik’s platform to provide the kinds of insights that will help you create value for your customers and save money at the same time.


Tom’s Take

Service providers think about traffic differently than enterprise admins. They have to worry about it coming into the network and leaving again. Instead of worrying about a couple of links to the wider Internet they have to worry about dozens. If you think it’s hard keeping track of all that data for the enterprise you can just imagine how hard it is when you scale it up to the service provider level. Thankfully companies like Kentik are applying their expertise to provide actionable information to help you make the right choices and maybe even negotiate some better deals.

If you’d like to learn more about this presentation, make sure you check out the full presentation on the Tech Field Day site or go to http://Kentik.com

The Value of Old Ideas

I had a fun exchange on Twitter this week that bears some additional thinking. Emirage (@Emirage6) tweeted a fun meme about learning BGP:

I retweeted it and a few people jumped in on the fun, including a couple that said it was better to configure BGP for reasons. This led to a blog post about routing protocols with even more great memes and a good dose of reality for anyone that isn’t a multi-CCIE.

Explain It Like I’m Five

I want you to call your mom and explain BGP to her. Go on and do that now because I’m curious to see how you’d open that conversation. Unless your mom is in networking already I’m willing to bet you’re going to have to start really, really basic. In fact, given the number of news organizations that don’t even know what the letters in the acronym stand for I’d guess you are going to have a hard time talking about the path selection process or leak maps or how sessions are established.

Now, try that same conversation with RIP. I bet it goes a lot smoother. Why? Because RIP is a simple protocol. We don’t worry about prefixes or AS Path prepending or other things when we talk about RIP. RIPv1 is about as basic as a routing protocol gets. It learns about routes and sends the information on. It’s so simple that you can usually get all of the info about it out of the way in a couple of hours of instruction in a networking class.

So why do we still talk about RIP? It’s so old. Even the second version of the protocol is better. There’s no intelligence. No link state. You can’t even pick a successor route! We should never talk about RIPv1 again in any form, right? So what would you suggest we start with? OSPF? BGP?

The value of RIP is not in the protocol. Instead, the value is in the simplicity of the discussion. If you’ve never heard of a routing protocol before you don’t want to hear about all the complexity of something that runs half the Internet. At least you don’t want to hear about it at first. What you need to hear is how a routing protocol works. That’s why RIP is a great introduction. It does the very minimal basic functions that a routing protocol should do. It learns routes and tells other routers about them.
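That learn-and-tell loop fits in a few lines. A minimal, hypothetical sketch of a distance-vector update in the spirit of RIP: a router adds one hop to whatever its neighbor advertises and keeps the best route. Real RIP adds timers, triggered updates, and split horizon, and implementations differ in how they count connected routes; this simplification treats them as metric 0.

```python
INFINITY = 16  # RIP treats a hop count of 16 as unreachable


def advertise(table: dict[str, tuple[int, str]]) -> dict[str, int]:
    """What a router tells its neighbors: each prefix and its hop count."""
    return {prefix: metric for prefix, (metric, _next_hop) in table.items()}


def learn(table: dict[str, tuple[int, str]], neighbor: str, update: dict[str, int]) -> None:
    """Merge a neighbor's advertisement: add one hop, keep the best route."""
    for prefix, metric in update.items():
        candidate = (min(metric + 1, INFINITY), neighbor)
        current = table.get(prefix)
        # Take the route if it is new, shorter, or comes from the neighbor we
        # already use for this prefix (so its news, good or bad, is believed).
        if current is None or candidate[0] < current[0] or current[1] == neighbor:
            table[prefix] = candidate


# Three routers in a line: R1 -- R2 -- R3, each with one connected network.
r1 = {"10.1.0.0/24": (0, "connected")}
r2 = {"10.2.0.0/24": (0, "connected")}
r3 = {"10.3.0.0/24": (0, "connected")}
learn(r2, "R1", advertise(r1))
learn(r3, "R2", advertise(r2))
print(r3)
```

R3 never talks to R1, yet it ends up with a two-hop route to R1’s network purely because R2 told it so.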

RIP is also a great way to explain why other routing protocols were developed. We use OSPF and EIGRP and other IGPs now because RIP doesn’t perform well outside of small networks. There are limitations around IP subnets and network diameter and “routing by rumor” that only make sense when you know how routing protocols are supposed to operate and how they can fall down. If you start with OSPF and learn about link-state first, then you won’t understand how a router could ever have knowledge of a route that it doesn’t know about directly or learn about from a trusted source. In essence, RIP is a great first lesson because it is both bad and good.
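“Routing by rumor” is easiest to see when a route disappears. In this simplified two-router sketch (hypothetical, with synchronized update rounds), router A loses its directly connected network N while router B still believes N is one hop away through A. Without split horizon, the two routers keep re-teaching each other the dead route, one hop worse each time, until the metric reaches RIP’s infinity of 16. Split horizon, which stops B from advertising a route back to the router it learned it from, cuts the loop short.

```python
INFINITY = 16  # RIP's "unreachable" metric


def count_to_infinity(split_horizon: bool) -> list[tuple[int, int]]:
    """Return the (A, B) metrics for network N after each update round."""
    a_metric, a_next = INFINITY, None   # N just failed on A's own interface
    b_metric, b_next = 1, "A"           # B still thinks N is 1 hop away via A
    history = [(a_metric, b_metric)]
    while True:
        # B advertises N to A, unless split horizon suppresses routes learned from A.
        if not (split_horizon and b_next == "A"):
            offered = min(b_metric + 1, INFINITY)
            if offered < a_metric or a_next == "B":
                a_metric, a_next = offered, "B"
        # A advertises N to B, unless split horizon suppresses routes learned from B.
        if not (split_horizon and a_next == "B"):
            offered = min(a_metric + 1, INFINITY)
            if offered < b_metric or b_next == "A":
                b_metric, b_next = offered, "A"
        history.append((a_metric, b_metric))
        # Both metrics only grow once N is gone, so this always terminates.
        if a_metric == INFINITY and b_metric == INFINITY:
            return history


print(count_to_infinity(split_horizon=False))
print(count_to_infinity(split_horizon=True))
```

Without split horizon it takes eight rounds of rumors for both routers to agree that N is gone; with it, the bad news lands in one.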

The Expert’s Conundrum

The other issue at hand is that experts tend to feel like complicated subjects they understand are easy to explain. If that’s the case, then the Explain Like I’m Five subreddit shouldn’t exist. It turns out that trying to explain a complex topic to people without expert knowledge is super hard because they lack the frame of reference you need to help them understand. If you are a network engineer that doesn’t believe me, then go ask a friend in the medical profession to explain how the endocrine system works. Don’t make them do a simple explanation. Make them tell you like they’d teach a medical student.

We lose sight of the fact that complex topics can’t be mastered quickly and certain amounts of introductory knowledge need to be introduced to bring people along on the journey. We teach about older models and retired protocols because the knowledge they contain can help us understand why we moved away from them. It also helps us to have a baseline level of knowledge about things that permeate the way we do our jobs.

If we removed things that we never use from the teaching curriculum we’d never have the OSI model, since it is never implemented in its pure form in any protocol. We’d teach the TCP/IP model and just tell people that the other things don’t really matter. In fact, they do matter because the pure OSI model takes other things into account that aren’t just focused on the network protocol stack. It may seem silly to us as experts to teach one thing when reality looks much different, but that’s how learning works.

We still teach older methods of network topologies like bus and ring even though we don’t use them much any more. We do this because entry-level people need to know why we arrived at the method we use now. Even with the move toward new setups like 3-tier network design and leaf/spine architecture you need to know where Ethernet started to understand why we are where we are today.


Tom’s Take

It’s always important to get a reminder that people are just starting out in the learning process. While it’s easy to be an expert and sit back and claim that iBGP is the best way to approach a routing protocol problem you also have to remember that learners sometimes just need a quick-and-dirty lab setup to test another function. If they’re going to spend hours configuring BGP neighbor relationships instead of just enabling RIP on all router interfaces and moving on to the next part then they’re not learning the right things. Knowledge is important, even if it’s outdated. We still teach RIP and Frame Relay and Token Ring in networking because people need to understand how they operate and why we’ve moved on. They may not ever configure them in practice but they may also never configure BGP either. The value of information doesn’t decrease because it’s old.

What Happens When The Internet Breaks?

It’s a crazy idea to think that a network built to be completely decentralized and resilient can be so easily knocked offline in a matter of minutes. But that basically happened twice in the past couple of weeks. CloudFlare is a service provider that offers to sit in front of your website and provide all kinds of important services. They can prevent smaller sites from being knocked offline by an influx of traffic. They can provide security and DNS services for you. They’re quickly becoming an indispensable part of the way the Internet functions. And what happens when we all start to rely on one service too much?

Bad BGP Behavior

The first outage on June 24, 2019 wasn’t the fault of CloudFlare. A small service provider in Pennsylvania decided to use a BGP Optimizer from Noction to do some route optimization inside their autonomous system (AS). That in and of itself shouldn’t have caused a problem. At least, not until someone leaked those routes to the greater Internet.

It was a comedy of errors. The provider in question announced their more specific routes to an upstream customer, who in turn announced them to Verizon. After that all bets are off. Because those routes were more specific than the aggregates they became the preferred routes. And when the whole world beats a path to your door to get to the rest of the world, you see issues.
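The reason the leak was so effective comes down to longest-prefix match. A /24 and a /25 are different prefixes, so BGP doesn’t weigh them against each other; both land in the routing table, and the forwarding lookup simply prefers the most specific route that covers the destination. A minimal sketch with hypothetical documentation prefixes (not the actual prefixes involved in the incident):

```python
import ipaddress


def best_route(routes: dict[str, str], destination: str) -> tuple[str, str]:
    """Return (prefix, path) of the most specific route covering destination."""
    dst = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), path)
        for prefix, path in routes.items()
        if dst in ipaddress.ip_network(prefix)
    ]
    if not matches:
        raise LookupError(f"no route to {destination}")
    network, path = max(matches, key=lambda match: match[0].prefixlen)
    return str(network), path


routes = {
    "198.51.100.0/24": "legitimate aggregate",
    "198.51.100.0/25": "leaked more-specific",
    "198.51.100.128/25": "leaked more-specific",
}
print(best_route(routes, "198.51.100.10"))
```

Every network that accepted the leaked more-specifics sent traffic for those addresses toward the leak, no matter how good its path to the aggregate was.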

Those issues caused CloudFlare to go offline. And when CloudFlare goes offline everyone starts having issues. The sites they are the front end for go offline, even if the path to those sites is still valid. That’s because CloudFlare is acting as the front end for your site when you use their service. It’s great because it means that when someone knocks your system offline or hits you with a ton of traffic you’re safe because CloudFlare can support a lot more bandwidth than you can, especially if you’re self hosted. But if CloudFlare is out, you’re out of luck.

There was a pretty important lesson to be learned in all this and CloudFlare did an okay job of explaining some of those lessons. But the tone of their article was a bit standoffish and seemed to imply that the people whose responsibility it was to keep the Internet running should do a better job of keeping their house in order. For those of you playing along at home, you’ll realize that the irony overlords were immediately summoned to mete out justice to CloudFlare.

Irregular Expression

On July 2nd, CloudFlare went down again. This time, instead of seeing issues with routing packets or delays, users of the service were greeted with 502 Bad Gateway errors. Again, when CloudFlare is down your site is down even if you’re not offline. And then the speculation started. Was this another BGP hijack? Was CloudFlare being attacked? No one knew and most of the places you could go look were offline, including one of the biggest offline site detectors, which was a user of CloudFlare services.

CloudFlare eventually posted a blog owning up to the fact that it wasn’t an attack or a BGP issue, but instead was the result of a bad web application firewall (WAF) rule being deployed globally in one go. A single regular expression (regex) was responsible for spiking the CPU utilization of the entirety of the CloudFlare network. And when all your CPUs are cranking along at 100% utilization across the board, you are effectively offline.
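A backtracking regex engine can do far more work than the size of its input suggests when a pattern stacks greedy wildcards. The sketch below is a simplified, hypothetical model (not CloudFlare’s actual rule or any real engine’s internals) of how a naive backtracking matcher handles a pattern shaped like `.*.*=.*` in an unanchored search. It counts how many positions it checks for the literal `=` before matching or giving up.

```python
def backtracking_attempts(text: str) -> tuple[bool, int]:
    """Simulate a naive backtracking search for the pattern `.*.*=.*`.

    Both `.*` terms start greedy and give back one character at a time.
    Returns (matched, number of positions tried for the literal '=').
    """
    n = len(text)
    attempts = 0
    for start in range(n + 1):                               # unanchored: every start offset
        for first_end in range(n, start - 1, -1):            # first .* backs off
            for second_end in range(n, first_end - 1, -1):   # second .* backs off
                attempts += 1
                if second_end < n and text[second_end] == "=":
                    return True, attempts  # trailing .* always matches the rest
    return False, attempts


for length in (10, 100, 200):
    matched, tried = backtracking_attempts("x" * length)
    print(f"{length:>5} chars, no '=': matched={matched}, attempts={tried:,}")
```

In this model the attempt count on a failing input grows with the cube of its length, so doubling the input from 100 to 200 characters multiplies the work by nearly eight. Real engines apply optimizations that change the exact numbers, but that shape is how a single rule can pin every CPU it runs on.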

In the post-mortem CloudFlare had to eat a little crow and admit that their testing procedures for catching this particular issue were inadequate. To see the stance they took with Verizon and Noction just a week or so before and then to see how they had to admit that this one was all on them was a bit humbling for sure. But, more importantly, it shows that you have to be vigilant in every part of your organization to ensure that some issue that you deploy isn’t going to cause havoc on the other side. Especially if you’re the responsible party of a large percentage of traffic on the web.


Tom’s Take

I think CloudFlare is doing good work with their services. But I also think that too many people are relying on them to provide services that should be planned out and documented. It’s important to realize that no one service is going to provide all the things you need to stay resilient. You need to know how you’re keeping your site online and what your backup plan is when things go down.

And, if you’re running one of those services, you’d better be careful about running your mouth on the Internet.

Can Routing Be Oversimplified?

I don’t know if you’ve had a chance to see this Reddit thread yet, but it’s a funny one:

We eliminated routing protocols from our network!

Short non-clickbait summary: We deployed SD-WAN and turned off OSPF. We now have a /16 route for the internal network and a default route to the Internet where a lot of our workloads were moved into the cloud.

Bravo for this networking team for simplifying their network to this point. All other considerations aside, does this kind of future really bode well for SD-WAN?

Now You See Me

As pointed out in the thread above, the network team didn’t really get rid of their dynamic routing protocols. The SD-WAN boxes that they put in place are still running BGP or some other kind of setup under the hood. It’s just invisible to the user. That’s nothing new. Six years ago, Ivan Pepelnjak found out Juniper QFabric was running BGP behind the scenes too.

Hiding the networking infrastructure from the end user is nothing new. It’s a trick that has been used for years to allow infrastructures to be tuned and configured in such a way as to deliver maximum performance without letting anyone tinker with the secret sauce under the hood. You’ve been using it for years whether you realize it or not. Have MPLS? Core BGP routing is “hidden” from you. SD-WAN? Routing protocols are running between those boxes. Moved a bunch of workloads to AWS/Azure/GCE? You’d better believe there is some routing protocol running under that stack.

Making things complex for the sake of making them hard to work on is foolish. We’ve spent decades and millions of dollars trying to make things easy. If you don’t believe me, look at the Apple iPhone. That device is a marvel at hiding all the complexity underneath. But, it also makes it really hard to troubleshoot when things go wrong.

Building On Shoulders

SD-WAN is doing great things for networking. I can remember years ago the thought of turning up a multi-site IPSec VPN configuration was enough to give me hives, let alone trying to actually do it. Today, companies like Viptela, VeloCloud, and Silver Peak make it easy to do. They’re innovating on top of the stack instead of inside it.

So much discussion in the community happens around building pieces of the stack. We spend time and effort making a better message protocol for routing information exchange. Or we build a piece of the HTTP stack that should be used in a bigger platform. We geek out about technical pieces because that’s where our energy feels the most useful.

When someone collects those stack pieces and tries to make them “easy”, we shout that company down and say that they’re hiding complexity and making the administrators and engineers “forget” how to do the real work. We spend more time focusing on what’s hidden and not enough on what’s being accomplished with the pieces. If you are the person that developed the fuel injection system in a car, are you going to sit there and tell Ford and Chevrolet that bundling it into an automotive platform is wrong?

So, while the end goal of any project like the one undertaken above is simplification, or fewer problems thanks to less complex troubleshooting, it is not a silver bullet. Hiding complexity doesn’t make it magically go away. Removing all your routing protocols in favor of a /16 doesn’t mean your network routes any better. It means that you’re going to have to spend more time trying to figure out what went wrong when something does break.

Ask yourself this question: Would you rather spend more time building out the network and understanding every nook and cranny of it, or would you rather learn it on the fly when you’re trying to figure out why something isn’t working the way that it should? The odds are very good that you’re going to put the same amount of time into the network either way. Do you want to front load that time? Or back load it?


Tom’s Take

The Reddit thread is funny. Because half the people are dumping on the poster for his decision and the rest are trying to understand the benefits. It surely was created in such a way as to get views. And that worked admirably. But I also think there’s an important lesson to learn there. Simplicity for the sake of being simple isn’t enough. You have to replace that simplicity with due diligence. Because the alternative is a lot more time spent doing things you don’t want to do when you really don’t want to be doing them.

Programming Unbound

I’m doing some research on Facebook’s Open/R routing platform for a future blog post. I’m starting to understand the nuances a bit compared to OSPF or IS-IS, but during my reading I got stopped cold by one particular passage:

Many traditional routing protocols were designed in the past, with a strong focus on optimizing for hardware-limited embedded systems such as CPUs and RAM. In addition, protocols were designed as purpose-built solutions to solve the particular problem of routing for connectivity, rather than as a flexible software platform to build new applications in the network.

Uh oh. I’ve seen language like this before related to other software projects. And quite frankly, it worries me to death. Because it means that people aren’t learning their lessons.

New and Improved

Any time I see an article about how a project was rewritten from the ground up to “take advantage of new changes in protocols and resources”, it usually signals to me that some grad student decided to rewrite the whole thing in Java because they didn’t understand C. It sounds a bit cynical, but it’s not often wrong.

Want proof? Check out Linus Torvalds and his opinion about rewriting the Linux kernel in C++. Spoiler alert – “C++ is a horrible language.” And it gets more colorful from there. Linus has some very valid points about C++ that have been debated by lots of communities for the past ten years. But the fact remains that he has decided that completely rewriting the entire kernel in C++ is an exercise in futility.

In today’s world, we’re faced with a multitude of programming languages fighting for our attention. We’ve evolved past FORTRAN, COBOL, C, and C++. We now live in a world of Python, C#, J#, R, Ruby, and dozens more. And that list doesn’t even include the languages that are more for scripting than low-level work. Every one of these languages was designed to solve a particular problem. And every one of them is begging to be used.

But it’s not enough that we have ten ways to write a function today. What’s more troublesome is that we’ve forgotten why certain languages were preferred over others in the past. We’ve forgotten that things used to be done the way they were done because we had no other alternatives. I can remember studying for my Novell CNE and taking 50-649, wherein Novell kept referring to OSPF as an “expensive” protocol to use on a NetWare server. At the time that test was created, OSPF was expensive from a CPU cycle standpoint. If the server was doing other things besides running a routing protocol you might see a potential impact if the CPU was slowed. And having OSPF calculations interrupted because someone was doing an FTP transfer could be expensive indeed.

No Wasted Space

More to the point, when people are faced with a limitation they have to be creative and concise. And nowhere is that more apparent than in E.T. the Extraterrestrial for the Atari 2600. Infamously, Howard Scott Warshaw had just one month to write a video game that would go on to be blasted critically, considered one of the worst of all time, and be blamed for the Video Game Crash of 1983. Yet, as one fan discovered years later when he set out to “fix” the game’s code, as bad as it may have been, it was very well coded. From the article:

…it’s unlikely that Howard Scott Warshaw (the developer) included some useless code for us to replace…

So, a programmer for an outdated video game system had a month to code a complex game and managed to do it in such a way as to leave very little empty space to insert code patches? And yet my Facebook app on my iPhone requires how much space?!?

All joking aside, the issue with E.T. wasn’t code quality. The problems with the game have been documented over the years, but almost no one blames Warshaw’s coding capabilities. That’s because Warshaw was working within his limitations. He couldn’t magically make the Atari 2600 cartridge bigger. He couldn’t increase the CPU size on the system. He worked within his limitations and made the best game that he could make.

Now, let’s look at an article about Open/R and see some of Facebook’s criticisms of OSPF and IS-IS:

We didn’t want to get bogged down in discussions over the lower-level protocol details, such as frame formatting and handshakes…

While it might sound heavyweight compared with OSPF and ISIS, which use their own “lightweight” transports, we haven’t found this to be an issue in modern networking hardware…

Whether or not they were intended to be taken as such, these are some pretty interesting knocks against OSPF and IS-IS. What Facebook is essentially saying is that they didn’t want to worry about building the low level parts of the messaging system, so they picked something off the shelf. They also built it to be more resource intensive than necessary because they didn’t need to compromise when running it on Six Pack and Wedge.

So long as your routers have ample CPU cycles and memory, Open/R will run just fine. But how many people out there are running a data center server board in their edge router? How many routers out there have to take a reduced BGP table because they don’t have enough memory to fit the entire global IPv4 routing table, let alone IPv6? If resources are infinite and time is irrelevant, then building your protocols the way you want is of no consequence. But as soon as you add constraints to the equation, like support for older hardware or limited memory, you have to start making compromises to make things work.


Tom’s Take

I’m not saying that Open/R is a bad routing protocol. I’m going to save that analysis for a later time. But I do take a bit of umbrage with Facebook’s idea that OSPF and IS-IS are a bit outdated simply because they were programmed for a different era. If they were really that inept, they would have been replaced or expanded by now. The fact that twenty-somethings got the bug to rewrite a routing protocol because they could, and threw all caution to the wind with regard to resource usage, should be a cautionary tale to any programmer out there. Never assume that you have more space than you need. Train yourself to do more with less. And be ready to compromise in case the worst case scenario becomes reality.

Cloud Apps And Pathways


Applications are king. Forget all the things you do to ensure proper routing in your data center. Forget the tweaks for OSPF sub-second failover or BGP optimal path selection. None of it matters to your users. If their login to Siebel or Salesforce or Netflix is slow today, you’ve failed. They are very vocal when it comes to telling you how much the network sucks today. How do we fix this?

Pathways Aren’t Perfect

The first problem is the cloud focus of applications. Once our packets leave our border routers it’s a giant game of chance as to how things are going to work next. The routing protocol games that govern the Internet are tried and true and straight out of RFC 1771 (Yes, RFC 4271 supersedes it). BGP is a great tool with general purpose abilities. It’s becoming the choice for web scale applications like LinkedIn and Facebook. But it’s problematic for Internet routing. It scales well but doesn’t have the ability to make rapid decisions.

The stability of BGP is also the reason why it doesn’t react well to changes. In the old days, links could go up and down quickly. BGP was designed to avoid issues with link flaps. But today’s links are less likely to flap and more likely to need traffic moved around because of congestion or other factors. The pace at which applications need to move traffic flows means that they tend to fight BGP instead of being relieved that it’s not slinging their traffic back and forth across different links.
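For anyone who hasn’t run into it, one classic BGP defense against flapping is route flap damping (RFC 2439): every flap adds a penalty that decays exponentially over time, and a route whose penalty crosses a suppress threshold stays out of service until it decays below a reuse threshold. Here’s a minimal Python sketch of that idea. The half-life and thresholds are illustrative assumptions, not any vendor’s defaults, and the RFC’s maximum-suppress-time cap is left out for brevity.

```python
from dataclasses import dataclass

# Illustrative values only (assumed); real implementations make these configurable.
HALF_LIFE_S = 900.0      # penalty halves every 15 minutes
FLAP_PENALTY = 1000.0    # charge added for each flap
SUPPRESS_LIMIT = 2000.0  # above this, stop using/advertising the route
REUSE_LIMIT = 750.0      # below this, a suppressed route becomes usable again


@dataclass
class DampenedRoute:
    prefix: str
    penalty: float = 0.0
    last_update: float = 0.0
    suppressed: bool = False

    def _decay(self, now: float) -> None:
        """Exponentially decay the penalty for the time elapsed since the last update."""
        elapsed = now - self.last_update
        self.penalty *= 0.5 ** (elapsed / HALF_LIFE_S)
        self.last_update = now
        if self.suppressed and self.penalty < REUSE_LIMIT:
            self.suppressed = False

    def flap(self, now: float) -> None:
        """Record a flap: decay the existing penalty, then add a fixed charge."""
        self._decay(now)
        self.penalty += FLAP_PENALTY
        if self.penalty > SUPPRESS_LIMIT:
            self.suppressed = True

    def usable(self, now: float) -> bool:
        """Check (lazily, at query time) whether the route may be used."""
        self._decay(now)
        return not self.suppressed
```

With these numbers, three flaps within twenty seconds push the penalty past the suppress limit, and the route stays suppressed for roughly half an hour while the penalty decays below the reuse limit. That caution is exactly what makes BGP stable, and exactly why it’s slow to follow congestion.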

BGP can be a good suggestion of path variables. That’s how Facebook uses it for global routing. But decisions need to be made on top of BGP much faster. That’s why cloud providers don’t rely on it beyond basic connectivity. Things like load balancers and other devices make up for this as best they can, but they are also points of failure in the network and have scalability limitations. So what can we do? How can we build something that can figure out how to make applications run better without the need to replace the entire routing infrastructure of the Internet?

GPS For Routing

One of the things that has some potential for fixing inefficiency with BGP and other basic routing protocols was highlighted during Networking Field Day 12 during the presentation from Teridion. They have a method for creating more efficiency between endpoints thanks to their agents. Founder Elad Rave explains more here:

I like the idea of getting “traffic conditions” from endpoints to avoid congestion. For users of cloud applications, those conditions are largely unknown. Even multipath routing confuses tried-and-true troubleshooting like traceroute. What needs to happen is a way to collect the data for congestion and other inputs and make faster decisions that aren’t beholden to the underlying routing structure.

Overlay networking has tried to do this for a while now. Build something that can take more than basic input and make decisions on that data. But overlays have issues with scaling, especially past the boundary of the enterprise network. Teridion has potential to help influence routing decisions in networks outside your control. Sadly, even the fastest enterprise network in the world is only as fast as an overloaded link between two level 3 interconnects on the way to a cloud application.

Teridion has the right idea here. Alternate pathways need to be identified and utilized. But that data needs to be evaluated and updated regularly. Much like the issues with Waze dumping traffic into residential neighborhoods when major arteries get congested, traffic monitors could cause overloads on alternate links if shifts happen unexpectedly.
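One way to keep crowdsourced measurements from stampeding everyone onto the same alternate path is to smooth the reports and add hysteresis: only move traffic when an alternate is clearly and consistently better, not just better for a moment. The sketch below is hypothetical. It is not Teridion’s algorithm, and the path names, smoothing factor, and 20% switch margin are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PathSelector:
    """Choose a path from crowdsourced latency reports, with hysteresis.

    Hypothetical sketch for illustration; not any vendor's actual algorithm.
    """
    alpha: float = 0.3          # EWMA weight given to each new sample (assumed)
    switch_margin: float = 0.2  # alternate must be 20% faster to win (assumed)
    smoothed: Dict[str, float] = field(default_factory=dict)
    current: Optional[str] = None

    def report(self, path: str, latency_ms: float) -> None:
        """Fold one latency sample from an endpoint into the path's running average."""
        prev = self.smoothed.get(path)
        if prev is None:
            self.smoothed[path] = latency_ms
        else:
            self.smoothed[path] = self.alpha * latency_ms + (1 - self.alpha) * prev

    def choose(self) -> str:
        """Return the path to use, switching only for a clear improvement."""
        if not self.smoothed:
            raise ValueError("no latency reports yet")
        best = min(self.smoothed, key=self.smoothed.__getitem__)
        if self.current not in self.smoothed:
            # First decision (or the current path is no longer measured): take the best.
            self.current = best
        elif self.smoothed[best] < self.smoothed[self.current] * (1 - self.switch_margin):
            # The alternate beats the current path by more than the margin: move.
            self.current = best
        return self.current
```

A single slightly faster report on the alternate isn’t enough to move traffic; it takes a sustained improvement that beats the current path by the margin. A real system would also have to coordinate across many senders so they don’t all jump at once, which this toy ignores.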

The other reason why I like Teridion is that they are doing things without hardware boxes or the need to install software anywhere but the end host. Anyone working with cloud-based applications knows that the provider is very unlikely to provide anything outside of their standard offerings for you. And even if they manage it, there is going to be a huge price tag. More often than not, that feature request will, in time, become a selling point for a new service that may be of marginal benefit until everyone starts using it. Then application performance goes down again. Since Teridion is optimizing communications between hosts, it’s a win for everyone.


Tom’s Take

I think Teridion is on to something here. Crowdsourcing is the best way to gather information about traffic. Giving packets a better destination with shorter travel times means better application performance. Better performance means happier users. Happier users mean more time spent solving other problems whose symptoms aren’t “It’s slow” or “Your network sucks”. And that makes everyone happier. Even grumpy old network engineers.

Disclaimer

Teridion was a presenter during Networking Field Day 12 in San Francisco, CA. As a participant in Networking Field Day 12, my travel and lodging expenses were covered by Tech Field Day for the duration of the event. Teridion did not ask for, nor were they promised, any kind of consideration in the writing of this post. My conclusions here represent my thoughts and opinions about them and are mine and mine alone.

 

BGP: The Application Networking Dream


There was an interesting article last week from Fastly talking about using BGP to scale their network. This was but the latest in a long line of discussions around using BGP as a transport protocol between areas of the data center, even down to the Top-of-Rack (ToR) switch level. LinkedIn made a huge splash with it a few months ago with their Project Altair solution. Now it seems company after company is racing to implement BGP as the solution to their transport woes. And all because developers have finally pulled their heads out of the sand.

BGP Under Every Rock And Tree

BGP is a very scalable protocol. It’s used the world over to exchange routes and keep the Internet running smoothly. But it has other power as well. It can be extended to operate in other ways beyond the original specification. Unlike rigid protocols like RIP or OSPF, BGP was designed in part to be extended and expanded as needs change. IS-IS is a very similar protocol in that respect. It can be upgraded and adjusted to work with both old and new systems at the same time. Both can be extended without the need to change protocol versions midstream or introduce segmented systems that would run like ships in the night.

This isn’t the first time that someone has talked about running BGP to the ToR switch either. Facebook mentioned it in this video almost three years ago. Back then they were solving some interesting issues in their own data center. Now, those changes from the hyperscale world are filtering into the real world. Networking teams are seeking to solve scaling issues without resorting to overlay networks or other types of workarounds. The desire to fix everything wrong with layer 2 has led to a revelation of sorts. The real reason why BGP is able to work so well as a replacement for layer 2 isn’t because we’ve solved some mystical networking conundrum. It’s because we finally figured out how to build applications that don’t break because of the network.

Apps As Far As The Eye Can See

The whole reason why layer 2 networks are the primary unit of data center measurement has absolutely nothing to do with VMware. VMware vMotion behaves the way that it does because legacy applications hate having their addresses changed during communications. Most networking professionals know that MAC addresses have a tenuous association to IP addresses, which is what allows the gratuitous ARP after a vMotion to work so well. But when you try to move an application across a layer 3 boundary, it never ends well.

When web scale companies started building their application stacks, they quickly realized that being pinned to a particular IP address was a recipe for disaster. Even typical DNS-based load balancing only seeks to distribute requests to a series of IP addresses behind some kind of application delivery controller. With legacy apps, you can’t load balance once a particular host has resolved a DNS name to an IP address. Once the gateway of the data center resolves that IP address to a MAC address, you’re pinned to that device until something upsets the balance.

Web scale apps like those built by Netflix or Facebook don’t operate by these rules. They have been built to be resilient from inception. Web scale apps don’t wait for next hop resolution protocols (NHRP) or kludgy load balancing mechanisms to fix their problems. They are built to do that themselves. When problems occur, the applications look around and find a way to reroute traffic. No crazy ARP tricks. No sly DNS. Just software taking care of itself.
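In practice, “looking around and finding a way to reroute” often starts with something as simple as a client that knows about several endpoints and moves on when one fails, instead of pinning itself to whatever address DNS handed it once. Here’s a minimal Python sketch. The `attempt` callable and the addresses are hypothetical stand-ins for a real connection or RPC. (Python’s own `socket.create_connection` already tries each address returned by name resolution in turn, which is the same idea at the socket level.)

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def call_with_failover(endpoints: Iterable[str], attempt: Callable[[str], T]) -> T:
    """Try each endpoint in order and return the first successful result.

    `attempt` is a hypothetical stand-in for "connect and make the request";
    it is expected to raise OSError (or a subclass) when an endpoint is unreachable.
    """
    failures: List[Tuple[str, OSError]] = []
    for endpoint in endpoints:
        try:
            return attempt(endpoint)
        except OSError as exc:
            # Remember why this endpoint failed, then move on to the next one.
            failures.append((endpoint, exc))
    raise ConnectionError(f"all endpoints failed: {failures}")
```

The network doesn’t have to hide the failure from the application here. The application simply expects some endpoints to disappear and routes around them itself.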

The implications for network protocols are legion. If a web scale application can survive a layer 3 communications issue then we are no longer required to keep the entire data center as a layer 2 construct. If things like anycast can be used to pin geolocations closer to content that means we don’t need to worry about large failover domains. Just like Ivan Pepelnjak (@IOSHints) says in this post, you can build layer 3 failure domains that just work better.

BGP can work as your ToR strategy for route learning and path selection because you aren’t limited to forcing applications to communicate at layer 2. And other protocols that were created to fix limitations in layer 2, like TRILL or VXLAN, become an afterthought. Now, applications can talk to each other and fail back and forth as they need to without the need to worry about layer 2 doing anything other than what it was designed to do: link endpoints to devices designed to get traffic off the local network and into the wider world.
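Path selection in a BGP-to-the-ToR design, in the spirit of RFC 7938, usually means each switch gets its own private ASN and the ToR installs every equally good route toward a prefix for ECMP. The following Python sketch shows a heavily simplified version of that decision: it compares only LOCAL_PREF, AS_PATH length, and MED, and treats every route tied on those as a multipath candidate. Real BGP implementations evaluate more attributes (origin, eBGP versus iBGP, IGP cost, router ID, and so on), normally compare MED only between routes from the same neighboring AS, and have vendor-specific multipath rules. The ASNs and next hops in any example are made up.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Route:
    next_hop: str
    local_pref: int
    as_path: Tuple[int, ...]
    med: int = 0


def rank(route: Route) -> Tuple[int, int, int]:
    """Sort key (simplified): higher LOCAL_PREF, then shorter AS_PATH, then lower MED."""
    return (-route.local_pref, len(route.as_path), route.med)


def select_paths(candidates: List[Route]) -> List[Route]:
    """Return every route tied for best: the set a multipath/ECMP setup could install."""
    if not candidates:
        return []
    best = min(rank(r) for r in candidates)
    return [r for r in candidates if rank(r) == best]
```

Two spines advertising the same prefix with equally short AS paths both end up in the forwarding table, and a route with a longer path is ignored until the shorter ones go away. No layer 2 tricks required.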


Tom’s Take

One of the things that SDN has promised us is a better way to network. I believe that the promise of making things better and easier is a noble goal. But the part that has bothered me since the beginning was that we’re still trying to solve everyone’s problems with the network. We don’t rearrange the power grid every time someone builds a better electrical device. We don’t replumb the house every time we install a new sink. We find a way to make the new thing work with our old system.

That’s why the promise of using BGP as a ToR protocol is so exciting. It has very little to do with networking as we know it. Instead of trying to work miracles in the underlay, we build the best network we know how to build. And we let the developers and programmers do the rest.