The Puzzle of Peering with Kentik

If you’ve worked at an ISP, or even just closely with one, you’ve probably heard the term peering quite a bit. Peering is essentially a reciprocal agreement between two providers to give each other access to their networks. Provider A agrees to allow Provider B to send traffic over and through its network in exchange for the same access in the other direction. Sounds easy, right? On a technical level it is pretty easy. You simply set up a BGP session with the partner provider, make sure all the settings match, and you’ve got things rolling.
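To put that in perspective, the router side of a peering session really is just a handful of lines. Here’s a rough IOS-style sketch of what Provider A’s half might look like, where every AS number, address, and prefix is a made-up placeholder:

```
! Provider A's edge router (all values here are hypothetical)
router bgp 64500
 neighbor 203.0.113.2 remote-as 64511
 neighbor 203.0.113.2 description Peering session with Provider B
 !
 address-family ipv4
  neighbor 203.0.113.2 activate
  ! Announce only the prefixes the agreement covers
  neighbor 203.0.113.2 prefix-list PEER-OUT out
 exit-address-family
!
ip prefix-list PEER-OUT seq 5 permit 198.51.100.0/24
```

Provider B mirrors the same configuration on their side and the session comes up. The hard part is everything that happens before anyone types those lines.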

The technical part isn’t usually where peering gets complicated. Instead it’s almost always related to the business side of things. The policy and negotiations that have to happen for a good peering agreement take far more time than hammering out some BGP configuration stanzas. The amount of traffic to be sent, the latency requirements, and even the cost of the agreement are all things that have to be figured out before the first hello packet can be exchanged. The agreement is always up for negotiation too, since traffic patterns can change before you realize it and put you at a disadvantage.

Peerless Data Collection

If you want to get the most out of your peering arrangement you need to know what’s going on. You need statistics about the key points of your agreement. You need to know whether you’re holding up your end of the bargain, and whether the company you’re working with is holding up theirs. If you walk into a peering negotiation without the right data you’re going to be at a disadvantage right away.

For example, did your partner company take all of the traffic they agreed to accept in the peering contract? Or did they have issues that forced you to send the traffic along a different route? Were you forced to send that traffic across a different route that had a higher cost? Did your users complain about network speed because of congestion outside of your control? If you can’t put your finger on the answers to these questions quickly you’re going to find yourself with lots of angry users and customers, not to mention peering partners that want answers from you as well.

Recently I had a chance to listen to a great presentation from Kentik during Networking Field Day: Service Provider. Nina Bargisten laid out some of the challenges that Kentik customers face with peering arrangements and how Kentik is helping to solve them:

One of the points that Nina discusses is that capacity planning is a huge undertaking for ISPs. With the supply chain issues that we’re currently facing in 2022 it’s not easy to order equipment to alleviate congestion problems. Even under somewhat normal circumstances it’s not likely that an ISP is going to go out and order a lot of new hardware just to deal with congestion. They might change some policies to route traffic in different directions, but ultimately a decision has to be made about how to get customer packets through and out of their network to the ultimate destination.

Peering agreements can help with congestion. Adding more exit points to your network means some flows can exit through a different provider and either get to their destination faster or keep a larger connection from becoming overwhelmed and congested. It’s not unlike having multiple options for arriving at a destination when driving. Some streets are better for smaller amounts of traffic compared to larger highways and interstates that provide high-speed travel.

As mentioned above, it’s critical that you have data on your traffic and its performance. Are you sending everything through one route? Are you peering with providers that are getting less than half of the traffic load they agreed to take? These are all questions you have to ask to create a capacity plan. If you’re hearing complaints about congestion but you see that only two of your outbound connections are running at full capacity while the rest are sitting idle, then you don’t have a congestion issue so much as a configuration problem to solve.

Kentik’s solution lets you see what’s going on and helps you make better decisions about the routes that traffic should be taking. As demonstrated above, their dashboard collects data from your network as well as many others and can tell you when you need to be configuring policies to send traffic to low-volume peers instead of relying on congested links. It will also help you see trends for when links become congested and allow you to set thresholds to divide your traffic appropriately before it becomes an issue.

There’s a lot more info in the video above to help you with your capacity planning and peering negotiations. It all comes down to a simple maxim: Information is key. You can’t solve these puzzles without knowing what you have and what you need. If you’re just going to keep throwing peering agreements at a problem until it goes away you’re going to fail. You won’t solve your real issues by just adding another connection that never gets used. Instead you can use Kentik’s platform to provide the kinds of insights that will help you create value for your customers and save money at the same time.


Tom’s Take

Service providers think about traffic differently than enterprise admins. They have to worry about it coming into the network and leaving again. Instead of worrying about a couple of links to the wider Internet they have to worry about dozens. If you think it’s hard keeping track of all that data for the enterprise you can just imagine how hard it is when you scale it up to the service provider level. Thankfully companies like Kentik are applying their expertise to provide actionable information to help you make the right choices and maybe even negotiate some better deals.

If you’d like to learn more, make sure you check out the full presentation on the Tech Field Day site or go to http://Kentik.com

The Value of Old Ideas

I had a fun exchange on Twitter this week that bears some additional thinking. Emirage (@Emirage6) tweeted a fun meme about learning BGP:

I retweeted it and a few people jumped in on the fun, including a couple that said it was better to configure BGP for reasons. This led to a blog post about routing protocols with even more great memes and a good dose of reality for anyone that isn’t a multi-CCIE.

Explain It Like I’m Five

I want you to call your mom and explain BGP to her. Go on and do that now because I’m curious to see how you’d open that conversation. Unless your mom is in networking already I’m willing to bet you’re going to have to start really, really basic. In fact, given the number of news organizations that don’t even know what the letters in the acronym stand for I’d guess you are going to have a hard time talking about the path selection process or leak maps or how sessions are established.

Now, try that same conversation with RIP. I bet it goes a lot smoother. Why? Because RIP is a simple protocol. We don’t worry about prefixes or AS Path prepending or other things when we talk about RIP. Because RIPv1 is the most basic routing protocol you can get. It learns about routes and sends the information on. It’s so simple that you can usually get all of the info about it out of the way in a couple of hours of instruction in a networking class.
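The configuration tells the same story. A minimal, lab-style IOS sketch of RIP, with hypothetical networks, is about as short as a routing protocol config can get, which is exactly why it works as a first lesson:

```
! The whole lesson in three lines: turn on RIP and list the classful networks
router rip
 network 10.0.0.0
 network 192.168.1.0
```

That’s it. Every interface that falls inside those networks starts sending and learning routes, and the student can move on to watching what the protocol actually does.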

So why do we still talk about RIP? It’s so old. Even the second version of the protocol is better. There’s no intelligence. No link state. You can’t even pick a successor route! We should never talk about RIPv1 again in any form, right? So what would you suggest we start with? OSPF? BGP?

The value of RIP is not in the protocol. Instead, the value is in the simplicity of the discussion. If you’ve never heard of a routing protocol before you don’t want to hear about all the complexity of something that runs half the Internet. At least you don’t want to hear about it at first. What you need to hear is how a routing protocol works. That’s why RIP is a great introduction. It does the very minimal basic functions that a routing protocol should do. It learns routes and tells other routers about them.

RIP is also a great way to explain why other routing protocols were developed. We use OSPF and EIGRP and other IGPs now because RIP doesn’t perform well outside of small networks. There are limitations around IP subnets and network diameter and “routing by rumor” that only make sense when you know how routing protocols are supposed to operate and how they can fall down. If you start with OSPF and learn about link state first, you never see how a router could end up believing in a route it doesn’t know about directly or learn from a trusted source. In essence, RIP is a great first lesson because it is both bad and good.

The Expert’s Conundrum

The other issue at hand is that experts tend to feel like the complicated subjects they understand are easy to explain. If that were the case, the Explain Like I’m Five subreddit wouldn’t need to exist. It turns out that trying to explain a complex topic to people without expert knowledge is super hard because they lack the frame of reference you need to help them understand. If you’re a network engineer that doesn’t believe me, go ask a friend in the medical profession to explain how the endocrine system works. Don’t let them give you the simple explanation. Make them tell you like they’d teach a medical student.

We lose sight of the fact that complex topics can’t be mastered quickly and that a certain amount of introductory knowledge is needed to bring people along on the journey. We teach older models and retired protocols because the knowledge they contain can help us understand why we moved away from them. It also helps us build a baseline level of knowledge about things that permeate the way we do our jobs.

If we removed everything we never use from the teaching curriculum we’d never have the OSI model, since it is never implemented in its pure form in any protocol. We’d teach the TCP/IP model and just tell people that the other layers don’t really matter. In fact, they do matter, because the pure OSI model takes other things into account that aren’t just focused on the network protocol stack. It may seem silly to us as experts to teach one thing when reality looks much different, but that’s how learning works.

We still teach older methods of network topologies like bus and ring even though we don’t use them much any more. We do this because entry-level people need to know why we arrived at the method we use now. Even with the move toward new setups like 3-tier network design and leaf/spine architecture you need to know where Ethernet started to understand why we are where we are today.


Tom’s Take

It’s always important to get a reminder that people are just starting out in the learning process. While it’s easy to be an expert and sit back and claim that iBGP is the best way to approach a routing protocol problem you also have to remember that learners sometimes just need a quick-and-dirty lab setup to test another function. If they’re going to spend hours configuring BGP neighbor relationships instead of just enabling RIP on all router interfaces and moving on to the next part then they’re not learning the right things. Knowledge is important, even if it’s outdated. We still teach RIP and Frame Relay and Token Ring in networking because people need to understand how they operate and why we’ve moved on. They may not ever configure them in practice but they may also never configure BGP either. The value of information doesn’t decrease because it’s old.

Fast Friday- Labor Day Eve Edition

It’s a long weekend in the US thanks to Labor Day. Which is basically signaling the end of the summer months. Or maybe the end of March depending on how you look at it. The rest of the year is packed full of more virtual Zoom calls, conferences, Tech Field Day events, and all the fun you can have looking at virtual leaves turning colors.

It’s been an interesting news week for some things. And if you take out all the speculation about who is going to end up watching TikTok you are left with not much else. So I’ve been wondering out loud about a few things that I thought I would share.

  • You need a backup video conferencing platform. If Zoom isn’t crashing on you then someone is deleting WebEx VMs. Or maybe your callers can’t get the hang of the interface. Treat it like a failing demo during a presentation: if it doesn’t work in five minutes, go to plan B. Don’t leave your people waiting for something that may not happen.
  • If you work in the backbone service provider market, you need to do two things next week. First, make sure all your hardware is patched. There’s some nasty stuff floating around that has the potential to cause memory exhaustion. That’s downtime in a nutshell. Second, go read up on BGP again and make sure you know how it propagates and what to do when it does something you’re not expecting. You’d better believe the people at CenturyLink have bought lots of BGP reference books in the past few days for absolutely no reason at all.
  • Remember how most ISPs told us back at the start of this whole pandemic that they were graciously removing data caps and such to get us through our Netflix binges? Guess what? Those caps are starting to be put back in place now. I guess the start of school means the start of keeping us in check with our data usage. And the fall seasons of popular shows are coming out on a regular cadence now. Check to make sure you’re not going over your limits. And find out what it costs if you do. Don’t be afraid to block YouTube and Twitch if your kids are eating your bandwidth pool alive.
  • Lastly, stand up. Take a deep breath. Let it out. Relax your shoulders. Unclench your teeth. I’m sure you needed that. Now buckle down and get that thing done.

Tom’s Take

Make sure to stay tuned to the rest of March Part 6, otherwise known as September. More cool stuff headed your way before the equinox plunges us into eternal not-spring.

BGP Hell Is Other People

If you configure a newsreader to alert you every time someone hijacks a BGP autonomous system (AS), it will probably go off at least once a week. The most recent incident was on the first of April, courtesy of Rostelecom. But they’re not the only one. They’re just the latest. Incidents of people redirecting traffic with BGP, either by accident or by design, are becoming more and more frequent. And as we rely more and more on things like cloud computing and online applications to do our daily work and live our lives, the impact of these hijacks is becoming more and more critical.

Professional-Grade Protocol

BGP isn’t the oldest thing on the Internet. RFC 1105 is the initial specification of the Border Gateway Protocol. The version that we use today, BGP4, is documented in RFC 4271. It’s a protocol that has enjoyed a long history of revisions and a reviled history of making network engineers’ lives difficult. But why is that? How can a routing protocol be so critical and yet so obtuse?

My friend Marko Milivojevic famously stated in his CCIE training career that, “BGP isn’t a routing protocol. It’s a policy engine.” When you look at the decisions BGP makes in this light it makes a lot more sense. BGP isn’t necessarily concerned with the minutiae of figuring out exactly how to get somewhere. Sure, it has a table of prefixes that it uses to make decisions about where to forward packets. Almost every protocol does this. But BGP is different because it’s so customizable.

Have you ever tried to influence a route in RIP or OSPF? It’s not exactly easy. RIP is almost impossible to manipulate outside of things like route poisoning or just turning off interfaces. Sometimes the simplest things are the most hardened. OSPF gives us a lot more knobs to play with, like interface cost and the reference bandwidth used to calculate it. We can tweak and tune those values to our heart’s content to make packets flow in a certain direction. But we don’t have a lot of influence outside of a specific area for those values. If you’ve ever had to memorize the minutiae of OSPF not-so-stubby areas, ASBRs, and the difference between Type 5 and Type 7 LSAs you know that these topics were all but created for certification exams.
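For illustration, here’s roughly what those OSPF knobs look like in an IOS-style config, with made-up values. The levers are the reference bandwidth used to derive cost and the per-interface cost override:

```
router ospf 1
 ! Cost is derived from the reference bandwidth divided by interface bandwidth
 auto-cost reference-bandwidth 100000
!
interface GigabitEthernet0/1
 ! Or skip the math and pin the cost on a single link
 ip ospf cost 50
```

Useful, but notice how blunt it is compared to what comes next.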

But what about BGP? How can you influence routes in BGP? Oh, man! How much time do you have??? We can manipulate things all sorts of ways!

  • Weight the routes to prefer one over another
  • Set LOCAL_PREFERENCE to pick which route to use in a multiple exit system
  • Configure multi-exit discriminator (MED) values
  • Prepend the AS path to reduce the likelihood of a path being chosen
  • Manipulate the underlying routing protocol to make certain routes look more preferred
  • Just change the router ID to something crazy low to break all the other ties in the system

That’s a lot of knobs! Why on earth would you do that to someone? Because professionals need options.
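To make a couple of those options concrete, here’s a hedged IOS-style sketch of two of the most common knobs from the list above: raising LOCAL_PREFERENCE to prefer one exit and prepending the AS path to make an inbound path less attractive. Every AS number and neighbor address is a placeholder:

```
! Prefer everything learned from ISP A by raising LOCAL_PREF above the default of 100
route-map PREFER-ISP-A permit 10
 set local-preference 200
!
! Pad our announcements toward ISP B so inbound traffic favors the other path
route-map PREPEND-TO-ISP-B permit 10
 set as-path prepend 64500 64500 64500
!
router bgp 64500
 neighbor 203.0.113.1 remote-as 64510
 neighbor 203.0.113.1 route-map PREFER-ISP-A in
 neighbor 198.51.100.1 remote-as 64511
 neighbor 198.51.100.1 route-map PREPEND-TO-ISP-B out
```

Nothing stops you from setting the local preference to a million or prepending your AS fifty times. That’s the point.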

Optional Awfulness

BGP is one of those curious things that seems to be built without guardrails because it’s never used by accident. You’ve probably seen something similar in the real world whenever a person removes a safety feature or modifies a device to increase performance and remove an annoyance designed to slow them down. It could be anything from wrapping a bandana around a safety switch lockout to keep something running to pulling the trigger guard off a nail gun so you don’t keep hitting it with your fingers. Some professionals believe that safety features aren’t keeping them safe as much as they are slowing them down. But something as simple as removing the safety from a pellet gun can have dire consequences in the name of efficiency.

So, how does this apply to our new favorite policy engine that happens to route packets? Well, it applies quite a bit. There is no system of guardrails that keeps you from making dumb choices. Accidentally paste your own AS into the AS path? That’s going into the routing decision. Make a typo for an AS that doesn’t exist in the routing table? That’s going into the formula, too. Announce to the entire world that you have the best path to an AS on the other side of the globe? BGP is happy to send traffic your way.

BGP assumes that professionals are programming it, which means it’s not going to try to stop you from shooting off your own foot. And because the knobs exposed by the engine are so numerous and varied, you can spend a lot of time trying to troubleshoot just how half of a cloud provider’s traffic came barreling through your network for the last hour. CCIEs spend a lot of time memorizing BGP path selection because every step matters when trying to figure out why BGP is acting up. Likewise, knowing where the knobs are best utilized means knowing how to influence path selection. AS path prepending is probably the best example of this. If you want to put your AS number in there a hundred times to influence path selection, go for it. Why? Because it’s something you can do. Or, more explicitly, something you aren’t prohibited from doing.

Which leads to the problem of route hijacking. BGP is going to do what you tell it to do because it assumes you’re not trying to do anything nefarious or stupid when you program it. Like an automation script, BGP is going to carry out whatever the policy engine instructs as quickly as possible. Setting aside normal propagation delays, BGP will sync things up within a matter of minutes, maybe a few hours. Which means it’s not hard to watch a mistake cascade through the Internet. Or, in the case of people doing less-than-legal things, to watch the fruits of your labor pay off.

BGP isn’t inherently bad any more than a catwalk without a handrail is inherently evil. Yes, the situation you find yourself in is less than ideal. Sure, something bad can happen if you screw up or do something you’re not supposed to. But blaming the protocol or the object or the situation is not going to fix the issue. We really only have two options at this point:

  • Better educate our users and engineers about how to use BGP and ensure that only qualified people are able to work on it
  • Create controls in BGP that limit the ability to use certain knobs and options in order to provide more security and reliability options.

Tom’s Take

I’m a proponent of both of those options. We need to ensure that people have the right training. However, we also need to ensure that nefarious actors are locked out and that we are protected from making dumb mistakes or that our errors aren’t propagated at light speed through the dark corners of the Internet. We can’t fix everything wrong with BGP but it’s the best option we have right now. Hellish though it may be, we have to find a way to make a better combination of the protocol and the people that use it.

Can Routing Be Oversimplified?

I don’t know if you’ve had a chance to see this Reddit thread yet, but it’s a funny one:

We eliminated routing protocols from our network!

Short non-clickbait summary: We deployed SD-WAN and turned off OSPF. We now have a /16 route for the internal network and a default route to the Internet where a lot of our workloads were moved into the cloud.

Bravo for this networking team for simplifying their network to this point. All other considerations aside, does this kind of future really bode well for SD-WAN?

Now You See Me

As pointed out in the thread above, the network team didn’t really get rid of their dynamic routing protocols. The SD-WAN boxes that they put in place are still running BGP or some other kind of setup under the hood. It’s just invisible to the user. That’s nothing new. Six years ago, Ivan Pepelnjak found out Juniper QFabric was running BGP behind the scenes too.

Hiding the networking infrastructure from the end user is nothing new. It’s a trick that has been used for years to allow infrastructures to be tuned and configured in such a way as to deliver maximum performance without letting anyone tinker with the secret sauce under the hood. You’ve been using it for years whether you realize it or not. Have MPLS? Core BGP routing is “hidden” from you. SD-WAN? Routing protocols are running between those boxes. Moved a bunch of workloads to AWS/Azure/GCE? You’d better believe there is some routing protocol running under that stack.

Making things complex for the sake of making them hard to work on is foolish. We’ve spent decades and millions of dollars trying to make things easy. If you don’t believe me, look at the Apple iPhone. That device is a marvel at hiding all the complexity underneath. But, it also makes it really hard to troubleshoot when things go wrong.

Building On Shoulders

SD-WAN is doing great things for networking. I can remember years ago the thought of turning up a multi-site IPSec VPN configuration was enough to give me hives, let alone trying to actually do it. Today, companies like Viptela, VeloCloud, and Silver Peak make it easy to do. They’re innovating on top of the stack instead of inside it.

So much discussion in the community happens around building pieces of the stack. We spend time and effort making a better message protocol for routing information exchange. Or we build a piece of the HTTP stack that should be used in a bigger platform. We geek out about technical pieces because that’s where our energy feels the most useful.

When someone collects those stack pieces and tries to make them “easy”, we shout that company down and say that they’re hiding complexity and making the administrators and engineers “forget” how to do the real work. We spend more time focusing on what’s hidden and not enough on what’s being accomplished with the pieces. If you are the person that developed the fuel injection system in a car, are you going to sit there and tell Ford and Chevrolet that bundling it into an automotive platform is wrong?

So, while the end goal of any project like the one undertaken above is simplification, or fewer problems thanks to less complex troubleshooting, it is not a silver bullet. Hiding complexity doesn’t make it magically go away. Removing all your routing protocols in favor of a /16 doesn’t mean your network runs any better. It means you’re going to have to spend more time trying to figure out what went wrong when something does break.

Ask yourself this question: Would you rather spend more time building out the network and understand every nook and cranny of it or would you rather learn it on the fly when you’re trying to figure out why something isn’t working the way that it should? The odds are very good that you’re going to put the same amount of time into the network either way. Do you want to front load that time? Or back load it?


Tom’s Take

The Reddit thread is funny. Because half the people are dumping on the poster for his decision and the rest are trying to understand the benefits. It surely was created in such a way as to get views. And that worked admirably. But I also think there’s an important lesson to learn there. Simplicity for the sake of being simple isn’t enough. You have to replace that simplicity with due diligence. Because the alternative is a lot more time spent doing things you don’t want to do when you really don’t want to be doing them.

Should We Build A Better BGP?

One story that seems to have flown under the radar this week, with the Net Neutrality discussion being so dominant, was the little hiccup with BGP on Wednesday. According to reports, sources inside AS39523 were able to redirect traffic from some major sites like Facebook, Google, and Microsoft through their network. Since the ISP in question is located inside Russia, there’s been quite a lot of conversation about the purpose of this misconfiguration. Is it simply an accident? Or is it a nefarious plot? Regardless of the intent, the fact that we live in 2017 and massive portions of Internet traffic can still be rerouted this way has many people worried.

Routing by Suggestion

BGP is the foundation of the modern Internet. It’s how routes are exchanged between every autonomous system (AS) and how traffic destined for your favorite cloud service or cat picture hosting provider gets to where it’s supposed to be going. BGP is the glue that makes the Internet work.

But BGP, for all of the greatness that it provides, is still very fallible. It’s prone to misconfiguration. Look no further than the Level 3 outage last month. Or the outage that Google caused in Japan in August. And those are just the top searches from Google. There have been a myriad of problems over the course of the past couple of decades. Some are benign. Some are more malicious. And in almost every case they were preventable.

BGP runs on the idea that the people configuring it know what they’re doing. Much like RIP, the suggestion of a better route is enough to make BGP change the way that traffic flows between systems. You don’t have to be an evil mad genius to see this in action. Anyone that’s ever made a typo in their BGP border router configuration will tell you that if you make your system look like an attractive candidate for being a transit network, BGP is more than happy to pump a tidal wave of traffic through your network without regard for the consequences.

But why does it do that? Why does BGP act so stupid sometimes in comparison to OSPF and EIGRP? Well, take a look at the BGP path selection mechanism. CCIEs can probably recite it by heart. Things like Local Preference, Weight, and AS_PATH govern how BGP will install routes and change transit paths. Notice that these are all set by the user. There are no automatic considerations outside of the route’s origin. Unlike OSPF and EIGRP, there is no consideration for bandwidth or link delay. Why?

Well, the old Internet wasn’t incredibly reliable from the WAN side. You couldn’t guarantee that the path to the next AS was the “best” path. It may be an old serial link. It could have a lot of delay in the transit path. It could also be the only method of getting your traffic to the Internet. Rather than letting the routing protocol make arbitrary decisions about link quality the designers of BGP left it up to the person making the configuration. You can configure BGP to do whatever you want. And it will do what you tell it to do. And if you’ve ever taken the CCIE lab you know that you can make BGP do some very interesting things when you’re faced with a challenge.

BGP assumes a minimum level of competency to use correctly. The protocol doesn’t have any built in checks to avoid doing stupid things outside of the basics of not installing incorrect routes in the routing table. If you suddenly start announcing someone else’s AS with better metrics then the global BGP network is going to think you’re the better version of that AS and swing traffic your way. That may not be what you want. Given that most BGP outages or configurations of this type only last a couple of hours until the mistake is discovered, it’s safe to say that fat fingers cause big BGP problems.
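The closest thing to a guardrail today is the one you build yourself: an explicit outbound filter so a fat-fingered redistribution can’t leak the whole table to your neighbors. A minimal IOS-style sketch, with placeholder prefixes and AS numbers:

```
! Only the prefixes we actually originate ever get announced,
! no matter what ends up in the local BGP table
ip prefix-list MY-PREFIXES seq 5 permit 198.51.100.0/24
ip prefix-list MY-PREFIXES seq 10 permit 203.0.113.0/24
!
router bgp 64500
 neighbor 192.0.2.1 remote-as 64510
 neighbor 192.0.2.1 prefix-list MY-PREFIXES out
```

It’s simple, it’s effective, and it only works if someone remembers to configure it on every session.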

Buttoning Down BGP

How do we fix this? Well, aside from making sure that anyone touching BGP knows exactly what they’re doing? Not much. Some Regional Internet Registries (RIRs) require you to preconfigure new prefixes with them before they can be brought online. As mentioned in this Reddit thread, RIPE is pretty good about that. But some ISPs, especially ones in the US that work with ARIN, are less strict about it. And in some cases, they don’t even bring the pre-loaded prefixes online at the correct time. That can cause headaches when you’re trying to figure out why your networks aren’t being announced even though your config is right.

Another person pointed out the Mutually Agreed Norms for Routing Security (MANRS). These look like some very good common sense things that we need to be doing to ensure that routing protocols are secure from hijacks and other issues. But, MANRS is still a manual setup that relies on the people implementing it to know what they’re doing.

Lastly, another option would be the Resource Public Key Infrastructure (RPKI) service offered by ARIN. This service allows people that own IP address space to specify which autonomous systems can originate their prefixes. In theory, this is an awesome idea that goes a long way toward ensuring that only specific ASes are allowed to announce given prefixes. In practice, it requires the use of PKI cryptographic infrastructure on your edge routers. And anyone that’s ever configured PKI on even simple devices knows how big of a pain that can be. Mixing PKI and BGP may be enough to drive people back to sniffing glue.
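For what it’s worth, the router side of RPKI origin validation is the easy part once a validator cache is running somewhere. A hedged IOS-XE-style sketch, where the validator address, port, and timer are placeholders and the exact syntax varies by platform and software version:

```
router bgp 64500
 ! Point the router at an RPKI validator cache speaking the RTR protocol
 bgp rpki server tcp 192.0.2.10 port 8282 refresh 600
```

The pain lives everywhere else: creating and maintaining the ROAs, keeping the validator healthy, and deciding what your policy does with routes that come back invalid.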


Tom’s Take

BGP works. It’s relatively simple and gets the job done. But it is far too trusting. It assumes that the people running the Internet are nerdy pioneers embarking on a journey of discovery and knowledge sharing. It doesn’t believe for one minute that bad people could be trying to hijack traffic. Or, better still, that some operator fresh from getting his CCNP isn’t going to reroute Facebook traffic through a Cisco 2524 router in Iowa. BGP needs to get better. Or we need to make some changes to ensure that even if BGP still believes the Internet is a utopia, someone is behind it making sure those rose-colored glasses don’t cause it to walk into a bus.

Cloud Apps And Pathways


Applications are king. Forget all the things you do to ensure proper routing in your data center. Forget the tweaks for OSPF sub-second failover or BGP optimal path selection. None of it matters to your users. If their login to Siebel or Salesforce or Netflix is slow today, you’ve failed. They are very vocal when it comes to telling you how much the network sucks today. How do we fix this?

Pathways Aren’t Perfect

The first problem is the cloud focus of applications. Once our packets leave our border routers it’s a giant game of chance as to how things are going to work next. The routing protocol games that govern the Internet are tried and true and straight out of RFC 1771 (yes, RFC 4271 supersedes it). BGP is a great tool with general-purpose abilities. It’s becoming the choice for web scale applications like LinkedIn and Facebook. But it’s problematic for Internet routing. It scales well but doesn’t have the ability to make rapid decisions.

The stability of BGP is also the reason why it doesn’t react well to changes. In the old days, links could go up and down quickly, and BGP was designed to avoid issues with link flaps. But today’s links are less likely to flap and more likely to need traffic moved around because of congestion or other factors. The pace at which applications need to move traffic flows means they tend to fight BGP instead of being relieved that it isn’t slinging their traffic across different links.

BGP can be a good source of path suggestions. That’s how Facebook uses it for global routing. But decisions need to be made on top of BGP much faster. That’s why cloud providers don’t rely on it beyond basic connectivity. Things like load balancers and other devices make up for this as best they can, but they are also points of failure in the network and have scalability limitations. So what can we do? How can we build something that can figure out how to make applications run better without the need to replace the entire routing infrastructure of the Internet?

GPS For Routing

One of the things that has some potential for fixing inefficiency in BGP and other basic routing protocols was highlighted at Networking Field Day 12 during the presentation from Teridion. They have a method for creating more efficiency between endpoints thanks to their agents. Founder Elad Rave explains more here:

I like the idea of getting “traffic conditions” from endpoints to avoid congestion. For users of cloud applications, those conditions are largely unknown. Even multipath routing confuses tried-and-true troubleshooting like traceroute. What needs to happen is a way to collect the data for congestion and other inputs and make faster decisions that aren’t beholden to the underlying routing structure.

Overlay networking has tried to do this for a while now. Build something that can take more than basic input and make decisions on that data. But overlays have issues with scaling, especially past the boundary of the enterprise network. Teridion has potential to help influence routing decisions in networks outside your control. Sadly, even the fastest enterprise network in the world is only as fast as an overloaded link between two level 3 interconnects on the way to a cloud application.

Teridion has the right idea here. Alternate pathways need to be identified and utilized. But that data needs to be evaluated and updated regularly. Much like the issues with Waze dumping traffic into residential neighborhoods when major arteries get congested, traffic monitors could cause overloads on alternate links if shifts happen unexpectedly.

The other reason why I like Teridion is because they are doing things without hardware boxes or the need to install software anywhere but the end host. Anyone working with cloud-based applications knows that the provider is very unlikely to provide anything outside of their standard offerings for you. And even if they do, there is going to be a huge price tag. More often than not, that feature request will eventually become a selling point for a new service that provides marginal benefit until everyone starts using it. Then application performance goes down again. Since Teridion is optimizing communications between hosts it’s a win for everyone.


Tom’s Take

I think Teridion is on to something here. Crowdsourcing is the best way to gather information about traffic. Giving packets a better destination with shorter travel times means better application performance. Better performance means happier users. Happier users means more time spent solving other problems that have symptoms that aren’t “It’s slow” or “Your network sucks”. And that makes everyone happier. Even grumpy old network engineers.

Disclaimer

Teridion was a presenter during Networking Field Day 12 in San Francisco, CA. As a participant in Networking Field Day 12, my travel and lodging expenses were covered by Tech Field Day for the duration of the event. Teridion did not ask for, nor were they promised, any kind of consideration in the writing of this post. My conclusions here represent my thoughts and opinions about them and are mine and mine alone.

 

BGP: The Application Networking Dream


There was an interesting article last week from Fastly talking about using BGP to scale their network. This was but the latest in a long line of discussions around using BGP as a transport protocol between areas of the data center, even down to the Top-of-Rack (ToR) switch level. LinkedIn made a huge splash with it a few months ago with their Project Altair solution. Now it seems company after company is racing to implement BGP as the solution to their transport woes. And all because developers have finally pulled their heads out of the sand.

BGP Under Every Rock And Tree

BGP is a very scalable protocol. It’s used the world over to exchange routes and keep the Internet running smoothly. But it has other powers as well. It can be extended to operate in ways beyond the original specification. Unlike rigid protocols like RIP or OSPF, BGP was designed in part to be extended and expanded as needs change. IS-IS is a very similar protocol in that respect. It can be upgraded and adjusted to work with both old and new systems at the same time. Both can be extended without the need to change protocol versions midstream or introduce segmented systems that would run like ships in the night.

This isn’t the first time that someone has talked about running BGP to the ToR switch either. Facebook mentioned it in this video almost three years ago. Back then they were solving some interesting issues in their own data center. Now those changes from the hyperscale world are filtering into the real world. Networking teams are seeking to solve scaling issues without resorting to overlay networks or other workarounds. The desire to fix everything wrong with layer 2 has led to a revelation of sorts. The real reason why BGP is able to work so well as a replacement for layer 2 isn’t because we’ve solved some mystical networking conundrum. It’s because we finally figured out how to build applications that don’t break because of the network.

Apps As Far As The Eye Can See

The whole reason why layer 2 networks are the primary unit of data center measurement has absolutely nothing to do with VMware. VMware vMotion behaves the way that it does because legacy applications hate having their addresses changed during communications. Most networking professionals know that MAC addresses have a tenuous association with IP addresses, which is what allows the gratuitous ARP after a vMotion to work so well. But when you try to move an application across a layer 3 boundary, it never ends well.

When web scale companies started building their application stacks, they quickly realized that being pinned to a particular IP address was a recipe for disaster. Even typical DNS-based load balancing only seeks to distribute requests to a series of IP addresses behind some kind of application delivery controller. With legacy apps, you can’t load balance once a particular host has resolved a DNS name to an IP address. Once the gateway of the data center resolves that IP address to a MAC address, you’re pinned to that device until something upsets the balance.

Web scale apps like those built by Netflix or Facebook don’t operate by these rules. They have been built to be resilient from inception. Web scale apps don’t wait for next hop resolution protocols (NHRP) or kludgy load balancing mechanisms to fix their problems. They are built to do that themselves. When problems occur, the applications look around and find a way to reroute traffic. No crazy ARP tricks. No sly DNS. Just software taking care of itself.

The implications for network protocols are legion. If a web scale application can survive a layer 3 communications issue then we are no longer required to keep the entire data center as a layer 2 construct. If things like anycast can be used to pin users to the content closest to their location, we don’t need to worry about large failover domains. Just like Ivan Pepelnjak (@IOSHints) says in this post, you can build layer 3 failure domains that just work better.

BGP can work as your ToR strategy for route learning and path selection because you aren’t limited to forcing applications to communicate at layer 2. And other protocols that were created to fix limitations in layer 2, like TRILL or VXLAN, become an afterthought. Now, applications can talk to each other and fail back and forth as they need to without the need to worry about layer 2 doing anything other than what it was designed to do: link endpoints to devices designed to get traffic off the local network and into the wider world.
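In practice the ToR design is refreshingly plain: give each rack its own private AS, run eBGP up to the spines, and let the applications ride layer 3 the rest of the way. A hedged IOS-style sketch with made-up ASNs and addresses:

```
! Top-of-Rack switch: one private AS per rack, plain eBGP up to each spine
router bgp 65101
 neighbor 10.0.1.0 remote-as 65001
 neighbor 10.0.2.0 remote-as 65002
 !
 address-family ipv4
  ! Advertise only this rack's server subnet; the spines do the rest
  network 10.101.1.0 mask 255.255.255.0
  neighbor 10.0.1.0 activate
  neighbor 10.0.2.0 activate
 exit-address-family
```

No spanning tree, no stretched VLANs, no overlay. Just routes.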


Tom’s Take

One of the things that SDN has promised us is a better way to network. I believe that the promise of making things better and easier is a noble goal. But the part that has bothered me since the beginning is that we’re still trying to solve everyone’s problems with the network. We don’t rearrange the power grid every time someone builds a better electrical device. We don’t replumb the house every time we install a new sink. We find a way to make the new thing work with our old system.

That’s why the promise of using BGP as a ToR protocol is so exciting. It has very little to do with networking as we know it. Instead of trying to work miracles in the underlay, we build the best network we know how to build. And we let the developers and programmers do the rest.

Is LISP The Answer to Multihoming?


One of the biggest use cases for Locator/Identifier Separation Protocol (LISP) that will benefit small and medium enterprises is the ability to multihome to different service providers without needing to run Border Gateway Protocol (BGP). It’s the answer to a difficult and costly problem. But is it really the best solution?

Current SMB users may find themselves in a situation where they can’t run BGP. Perhaps their upstream ISP blocks the ability to establish a session. In many cases, business-class service is required, with additional fees necessary to multihome. In order to take full advantage of independent links to different ISPs, two (or more) NAT configurations are required to send and receive packets correctly across the balanced connections. While technically feasible, it’s a mess to troubleshoot. It also doesn’t scale when multiple egress connections are configured. And more often than not, the configuration to make everything work correctly lives on a single router in the network, eliminating the advantages of multihoming.
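For reference, the “technically feasible but messy” version usually looks something like this on that one edge router: two NAT translations selected by route-map and a tracked default route to fail between providers. This is an IOS-style sketch with placeholder addresses, and it is exactly the single point of failure described above:

```
! Track reachability through ISP A so the primary default route can be withdrawn
ip sla 1
 icmp-echo 203.0.113.1 source-interface GigabitEthernet0/0
ip sla schedule 1 life forever start-time now
track 1 ip sla 1 reachability
!
! Primary default via ISP A, floating static via ISP B
ip route 0.0.0.0 0.0.0.0 203.0.113.1 track 1
ip route 0.0.0.0 0.0.0.0 198.51.100.1 10
!
! Separate overload NAT for each provider, chosen by the egress interface
ip nat inside source route-map VIA-ISP-A interface GigabitEthernet0/0 overload
ip nat inside source route-map VIA-ISP-B interface GigabitEthernet0/1 overload
!
route-map VIA-ISP-A permit 10
 match interface GigabitEthernet0/0
route-map VIA-ISP-B permit 10
 match interface GigabitEthernet0/1
```

It works right up until the day you have to troubleshoot it.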

LISP seeks to solve this by using a mapping database to send packets to the correct Egress Tunnel Router (ETR) without the need for BGP. The diagram of a LISP packet looks a lot like an overlay. That’s because it is one in many ways. The LISP packets are tunneled from an Ingress Tunnel Router (ITR) to a LISP-speaking decapsulation point. Depending on the deployment policies of LISP for a given ISP, that could be the next-hop router on a connection. It could also be a router several hops upstream. LISP is capable of operating over non-LISP-speaking connections, but the traffic does eventually need decapsulation.

Where’s the Achilles’ heel in this design? LISP may solve the issue without BGP, but it does introduce the need for the LISP session to terminate on a single device (or perhaps a group of devices). This creates issues in the event the link goes down and the backup link needs to be brought online. That tunnel state won’t be preserved across the failover. It’s also a gamble to assume your ISP will support LISP. Many large ISPs should give you options to terminate LISP connections. But what about the smaller ISP that services many SMB companies? Does the local telephone company have the technical ability to configure a LISP connection? Let alone make it redundant and highly available?

Right Tool For The Job

I think back to a lesson my father taught me about tools. He told me, “Son, you can use a screwdriver as a chisel if you try hard enough. But you’re better off spending the money to buy a chisel.” The argument against using BGP to multihome ISP connections has always come down to cost. I’ve gotten into heated discussions with people that always come back to the expense of upgrading to a business-class connection to run BGP or ensure availability. NAT may allow you to multihome across two residential cable modems, but why do you need 99.999% uptime across those two if you’re not willing to pay for it?

LISP solves one issue only to introduce more. I see LISP being misused the same way NAT has been. LISP was proposed by David Meyer to solve the exploding IPv4 routing table and the specter of an out-of-control IPv6 routing table. While multihoming is certainly another function that it can serve, I don’t think that was Meyer’s original idea. BGP might not be perfect, but it’s what we’ve got. We’ve been using it for a while and it seems to get the job done. LISP isn’t going to replace BGP by a long shot. All you have to do is look at the LISP Alternative Logical Topology (LISP+ALT), which was the first iteration of the mapping database before the current LISP-TREE. Guess what LISP+ALT used for mapping? That’s right, BGP.


Tom’s Take

LISP multihoming for IPv4 or IPv6 in SMEs isn’t going to fix the problem we have today with trying to create redundancy from consumer-grade connections.  It is another overlay that will create some complexity and eventually not be adopted because there are still enough people out there that are willing to forgo an interesting idea simply because it came from Cisco.  IPv6 multihoming can be fixed at the protocol level.  Tuning router advertisements or configuring routes at the edge with BGP will get the job done, even if it isn’t as elegant as LISP.  Using the right tool for the right job is the way to make multihoming happen.