The End of SD-WAN’s Party In China

As I was listening to Network Break Episode 257 from my friends at Packet Pushers, I heard Greg and Drew talking about a new development in China that could be the end of SD-WAN’s big influence there.

China has a new policy in place, according to Axios, that enforces a stricter cybersecurity stance for companies. Companies doing business in China or with offices in China must now allow Chinese officials to get into their networks to check for security issues as well as to verify the supply chain for network security.

In essence, this is saying that Chinese officials can have access to your networks at any time to check for security threats. But the subtext is a little less clear. Do they get to control the CPE as well? What about security constructs like VPNs? This article seems to indicate that as of January 1, 2020, there will be no intra-company VPNs authorized for any company in China, whether Chinese or foreign.

Tunnel Collapse

I talked with a company doing global SD-WAN rollouts, including in China, all the way back in 2018. One of the things that was brought up in that interview was that China was an unknown for American companies because of the likelihood that the model would change in the future. MPLS is the current go-to connectivity for branch offices there, but because you can put an SD-WAN head-end unit on site and build an encrypted tunnel back to your overseas HQ, that wasn’t a huge deal.

SD-WAN is a wonderful way to ensure your branches are secure by default. Since CPE devices “phone home” and automatically build encrypted tunnels back to a central location, such as an HQ, you can be sure that as soon as the device powers on and establishes global connectivity, all traffic will be secure over your VPN until you change that policy.
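To make the “phone home” idea a little more concrete, here’s a minimal sketch of what that zero-touch loop could look like, assuming a hypothetical orchestrator endpoint, payload, and device identity. None of this reflects any particular vendor’s API; it just shows the shape of the workflow: boot, register, pull tunnel parameters, build the tunnel.

```python
import time

import requests  # third-party HTTP library, assumed installed

# Hypothetical controller endpoint and device identity -- placeholders only.
ORCHESTRATOR = "https://orchestrator.example.com/api/register"
DEVICE_SERIAL = "CPE-0001"

def phone_home():
    """Register this CPE with the central controller and fetch its tunnel parameters."""
    resp = requests.post(ORCHESTRATOR, json={"serial": DEVICE_SERIAL}, timeout=10)
    resp.raise_for_status()
    return resp.json()  # e.g. {"hub": "198.51.100.1", "subnets": ["10.10.0.0/16"]}

def build_tunnel(params):
    """Stand-in for handing the returned parameters to the device's IPsec stack."""
    print(f"Bringing up encrypted tunnel to {params['hub']} for {params['subnets']}")

if __name__ == "__main__":
    while True:  # keep trying until the device can reach the controller
        try:
            build_tunnel(phone_home())
            break
        except requests.RequestException:
            time.sleep(30)
```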

Now, what happens with China’s new policy? All traffic must transit outside of a VPN. Things like web traffic aren’t as bad, but what about email? Or traffic destined for places like AWS or Azure? It was an unspoken fact that using SD-WAN VPNs to transit the content filters in place in China was a way around issues that might arise from accessing resources inside of a very well secured country-wide network.

With the policy change and enforcement guidelines set forth to be enacted in 2020, this could be a very big deal for companies hoping to use SD-WAN in China. First and foremost, you can’t use your intra-company VPN functions any longer. That effectively means that your branch office can’t connect to the HQ or the rest of your corporate network. Given some of the questions around intellectual property issues in China, that might not be a bad thing. However, it is going to cause issues for your users trying to access email and other support services. Especially if they are hosted somewhere that is going to create additional scrutiny.

The other potential issue is whether or not Chinese officials are even going to allow you to use CPE of your own choosing in the future. If the mandate is that officials should have access to your network for security concerns, who is to say they can’t just dictate what CPE you should use in order to facilitate that access? Larger companies can probably negotiate for some kind of on-site server that does network scanning. But smaller branches are likely going to need to have an all-in-one device at the head end doing all the work. The additional benefit for the Chinese is that control of the head-end CPE ensures that you can’t build a site-to-site VPN anywhere.

Peering Into The Future

Greg and Drew pontificate a bit on what this means for foreign organizations doing business in China in the future. I tend to agree with them on a few points. I think you’re going to see a push to treat the Chinese offices of major companies like zero-trust endpoints. All communications will trade only minimal information. Networks won’t be directly connected, either by VPN substitute or otherwise.

Looking further down the road makes the plans even more murky. Is there a way that you can certify yourself to have a standard for cybersecurity? We have something similar with regulations here in the US, where we can submit compliance reports for various agencies and submit to audits or have audits performed by third parties. But if the government won’t take that as an answer, how do you even go about providing the level of detail they want? If the answer is “you can’t”, then the larger discussion becomes whether or not you can comply with their regulations and reduce your business exposure while still making money in this market. And that’s a conversation no technology can solve.


Tom’s Take

SD-WAN gives us a wonderful set of features included in the package. Things like application inspection are wonderful to look at on a dashboard but I’ve always been a bigger fan of the automatic VPN service. I like knowing that as soon as I turn up my devices they become secure endpoints for all my traffic. Alas, all the technology in the world can be defeated by business or government regulation. If the rules say you can’t have a feature, you either have to play by the rules or quit playing the game. It’s up to businesses to decide how they’ll react going forward. But SD-WAN’s greatest feature may now have to be an unchecked box on that dashboard.

The Confluence of SD-WAN and Microsegmentation

If you had to pick two really hot topics in the networking space right now, you’d be hard-pressed to find two more discussed than SD-WAN and microsegmentation. SD-WAN is the former “king of the hill” in network engineering. I can remember having more conversations about SD-WAN in the last couple of years than anything else. But as the SD-WAN market has started to consolidate and iterate, a new challenger has arrived. Microsegmentation is the word of the day.

However, I think that SD-WAN and microsegmentation are quickly heading toward a merger of ideas and solutions. There are a lot of commonalities between the two technologies that make a lot of sense running together.

SD-WAN isn’t just about packet switching and routing any longer. That’s because networking people have quickly learned that packet-by-packet processing of traffic is inefficient. All of our older network analysis devices could only see things one IP packet at a time. But the new wave of devices think in terms of flows. They can analyze a stream of packets to figure out what’s going on. And what generates those flows?

Applications.

The key to the new wave of SD-WAN technology isn’t some kind of magic method of nailing up VPNs between branch offices. It’s not about adding new connectivity types. Instead, it’s about application identification. App identification is how SD-WAN does QoS now. The move to using app markers means a more holistic method of treating application traffic properly.
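As a rough sketch of the difference between packet-by-packet and flow-based thinking, here’s a tiny Python example that groups packets by their 5-tuple into flows, tags each flow with an application, and maps the application to a QoS class. The port-to-app table and DSCP values are purely illustrative; real SD-WAN appliances use far richer signatures and DPI than a destination-port lookup.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class Flow:
    """All packets sharing one 5-tuple, plus whatever application label we infer for them."""
    packets: list = field(default_factory=list)
    app: str = "unknown"

# Illustrative signatures only -- real appliances use far richer DPI and heuristics.
APP_BY_PORT = {443: "saas-web", 3478: "voice-video", 25: "email"}
DSCP_BY_APP = {"voice-video": 46, "saas-web": 26, "email": 10, "unknown": 0}

def five_tuple(pkt):
    """Key a packet by the classic 5-tuple so its flow can be tracked."""
    return (pkt["src"], pkt["dst"], pkt["sport"], pkt["dport"], pkt["proto"])

def classify(packets):
    """Group packets into flows and tag each flow with an application class."""
    flows = defaultdict(Flow)
    for pkt in packets:
        flow = flows[five_tuple(pkt)]
        flow.packets.append(pkt)
        flow.app = APP_BY_PORT.get(pkt["dport"], flow.app)
    return flows

if __name__ == "__main__":
    pkts = [
        {"src": "10.1.1.5", "dst": "52.1.2.3", "sport": 50123, "dport": 443, "proto": "tcp"},
        {"src": "10.1.1.9", "dst": "10.9.9.9", "sport": 50999, "dport": 3478, "proto": "udp"},
    ]
    for key, flow in classify(pkts).items():
        print(key, "->", flow.app, "DSCP", DSCP_BY_APP[flow.app])
```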

SD-WAN has significant value in application handling. I recently chatted with Kumar Ramachandran of CloudGenix and he echoed that part of the reason they’ve been seeing growth and recently received a Series C funding round is what they’re doing with applications. The battle of MPLS versus broadband has already been fought. The value isn’t going to come from edge boxes unless there is software that can help differentiate the solutions.

Segmenting Your Traffic

So, what does this have to do with microsegmentation? If you’ve been following that market, you already know that the answer is the application. Microsegmentation doesn’t work on a packet-by-packet basis either. It needs to see all the traffic flows from an application to figure out what is needed and what isn’t. Platforms that do this kind of work are big on figuring out which protocols should be talking to which hosts and shutting everything else down to secure that communication.
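A minimal sketch of that default-deny idea might look like the following, where a flow is only permitted if its exact (source tier, destination tier, protocol/port) combination is on the allow list. The tiers, addresses, and rules here are invented for illustration; real microsegmentation platforms derive them from labels and observed application flows rather than a hard-coded table.

```python
# A minimal default-deny policy check. Everything not explicitly allowed is dropped.
ALLOWED = {
    ("web", "app", "tcp/8443"),
    ("app", "db", "tcp/5432"),
}

# Hypothetical mapping of hosts to application tiers.
TIER_BY_HOST = {"10.0.1.10": "web", "10.0.2.20": "app", "10.0.3.30": "db"}

def permit(src, dst, proto_port):
    """Allow a flow only if this exact tier-to-tier conversation is on the list."""
    rule = (TIER_BY_HOST.get(src, "unknown"), TIER_BY_HOST.get(dst, "unknown"), proto_port)
    return rule in ALLOWED

if __name__ == "__main__":
    print(permit("10.0.1.10", "10.0.2.20", "tcp/8443"))  # True: web tier may reach the app tier
    print(permit("10.0.1.10", "10.0.3.30", "tcp/5432"))  # False: web may not talk to the database
```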

Microsegmentation is growing in the cloud world for sure. I’ve seen and talked to people from companies like Guardicore, Illumio, ShieldX, and Edgewise in recent months. Each of them has a slightly different approach to doing microsegmentation. But they all look at the same basic approach from the start. The application is the basic building block of their technology.

With the growth of microsegmentation in the cloud market to help ensure traffic flows between hosts and sites is secured, it’s a no-brainer that the next big SD-WAN platform needs to add this functionality to their solution. I say this because it’s not that big of a leap to take the existing SD-WAN application analytics software that optimizes traffic flows over links and change it to restrict traffic flow with policy support.

For SD-WAN vendors, it’s another hedge against the inexorable march of traffic into the cloud. There are only so many Direct Connect analogs that you can build before Amazon decides to put you out of business. But, if you can integrate the security aspect of application analytics into your platform you can make your solution very sticky. Because that functionality is critical to meeting audit goals and ensuring compliance. And you’re going to wish you had it when the auditors come calling.


Tom’s Take

I don’t think the current generation of SD-WAN providers are quite ready to implement microsegmentation in their platforms. But I really wouldn’t be surprised to see it in the next revision of solutions. I also wonder if that means that some of the companies that have already purchased SD-WAN companies are going to look at that functionality. Perhaps it will be VMware building NSX microsegmentation on top of VeloCloud. Or maybe Cisco will include some of their microsegmentation from ACI in Viptela. They’re going to need to look at that strongly, because once the companies that are still on their own figure it out, they’re going to be the go-to solution for organizations looking for a good, secure migration path to the cloud. And all those roads lead to an SD-WAN device with microsegmentation capabilities.

QoS Is Dead. Long Live QoS!

Ah, good old Quality of Service. How often have we spent our time as networking professionals trying to discern the archaic texts of Szigeti to learn how to make you work? QoS is something that seemed so necessary to our networks years ago that we would spend hours upon hours trying to learn the best way to implement it for voice or bulk data traffic or some other reason. That was, until a funny thing happened. Until QoS became useless to us.

Rest In Peace and Queues

QoS didn’t die overnight. It didn’t wake up one morning without a home to go to. Instead, we slowly devalued and destroyed it over a period of years. We did it by focusing on the things that QoS was made for and then marginalizing them. Remember voice traffic?

We spent years installing voice over IP (VoIP) systems in our networks. And each of those systems needed QoS to function. We took our expertise in the arcane arts of queuing and applied it to the most finicky protocols we could find. And it worked. Our mystic knowledge made voice better! Our calls wouldn’t drop. Our packets arrived when they should. And the world was a happy place.

That is, until voice became pointless. When people started using mobile devices more and more instead of their desk phones, QoS wasn’t as important. When the steady generation of delay-sensitive packets moved to LTE instead of IP, it wasn’t as critical to ensure that FTP and other protocols on the LAN didn’t interfere with it. Even when people started using QoS on their mobile devices the marking was totally inconsistent. George Stefanick (@WirelesssGuru) found that Wi-Fi calling was doing some weird packet marking anyway.

So, without a huge packet generation issue, QoS was relegated to some weird traffic shaping roles. Maybe it was video prioritization in places where people cared about video? Or perhaps it was creating a scavenger class for traffic in order to get rid of unwanted applications like BitTorrent. But overall QoS languished as an oddity as more and more enterprises saw their collaboration traffic moving to be dominated by mobile devices that didn’t need the old dark magic of QoS.

QoupS de Gras

The real end of QoS came about thanks to the cloud. While we spent all of our time trying to find ways to optimize applications running on our local enterprise networks, developers were busy optimizing applications to run somewhere else. The ideas were sound enough in principle. By moving applications to the cloud we could continually improve them and push features faster. By having all the bits off the local network we could scale massively. We could even collaborate together in real time from anywhere in the world!

But applications that live in the cloud live outside our control. QoS was always bounded by the borders of our own networks. Once a packet was launched into the great beyond of the Internet we couldn’t control what happened to it. ISPs weren’t bound to honor our packet markings without an SLA. In fact, in most cases the ISP would remark all our packets anyway just to ensure they didn’t mess with the ISP’s ideas of traffic shaping. And even those were rudimentary at best given how well QoS plays with MPLS in the real world.

But cloud-based applications don’t worry about quality of service. They scale as large as you want. And nothing short of a massive cloud outage will make them unavailable. Sure, there may be some slowness here and there, but that’s nothing less than you’d expect running a heavy application over your local LAN. The real genius of the cloud shift is that it forced developers to slim down applications and make them more responsive in places where they could be made more interactive. Now, applications felt snappier when they ran in remote locations. And if you’ve ever tried to use old versions of Outlook across slow links, you know how critical that responsiveness can be.

The End is The Beginning

So, with cloud-based applications here to stay and collaboration all about mobile apps now, we can finally carve the tombstone for QoS right? Well, not quite.

As it turns out, we are still using lots and lots of QoS today in SD-WAN networks. We’re just not calling it that. Instead, we’ve upgraded the term to something more snappy, like “Application Visibility”. Under the hood, it’s not much different than the QoS that we’ve done for years. We’re still picking out the applications and figuring out how to optimize their traffic patterns to make them more responsive.

The key with the new wave of SD-WAN is that we’re marrying QoS to conditional routing. Now, instead of being at the mercy of the ISP link to the Internet we can do something else. We can push bulk traffic across slow cheap links and ensure that our critical business applications have all the space they want on the fast expensive ones instead. We can push our out-of-band traffic out of an attached 4G/LTE modem. We can even push our traffic across the Internet to a gateway closer to the SaaS provider with better performance. That last bit is an especially delicious piece of irony, since it basically serves the same purpose as Tail-end Hop Off did back in the voice days.
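Here’s a hedged sketch of what that marriage of application classes and conditional routing looks like in principle: a policy table steers each class to a preferred WAN link, with a fallback if that link isn’t available. The link names, classes, and policy are all made up for illustration; production platforms make this decision from live SLA probes of loss, latency, and jitter rather than a static table.

```python
# Illustrative only: the link names, application classes, and policy table are invented.
LINKS = {
    "mpls":      {"cost": "high",    "latency_ms": 20},
    "broadband": {"cost": "low",     "latency_ms": 45},
    "lte":       {"cost": "metered", "latency_ms": 60},
}

POLICY = {
    "voice-video": "mpls",       # critical, latency-sensitive traffic gets the expensive link
    "saas-web":    "broadband",  # or a gateway closer to the SaaS provider in a real deployment
    "bulk-backup": "broadband",  # bulk traffic rides the cheap pipe
    "management":  "lte",        # out-of-band traffic goes out the 4G/LTE modem
}

def pick_link(app_class, links=LINKS):
    """Return the preferred link for an app class, falling back to broadband if it's gone."""
    preferred = POLICY.get(app_class, "broadband")
    return preferred if preferred in links else "broadband"

if __name__ == "__main__":
    for app in ("voice-video", "bulk-backup", "unknown-app"):
        print(f"{app:12s} -> {pick_link(app)}")
```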

And how does all this magical new QoS work on the Internet outside our control? That’s the real magic. It’s all tunnels! Yes, in order to make sure that we get our traffic where it needs to be in SD-WAN, we simply prioritize it going out of the router and wrap it all in a tunnel to the next device. Everything moves along the Internet, and the hop-by-hop treatment really doesn’t matter in the long run. We’re instead optimizing transit through our network based on other factors besides DSCP markings. Sure, when the traffic arrives on the other side it can be optimized based on those values. However, in the real world the only thing that most users really care about is how fast they can get their application to perform on their local machine. And if SD-WAN can point them to the fastest SaaS gateway, they’ll be happy people.
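To illustrate why the transit hops stop mattering, here’s a small Scapy sketch (assuming the scapy package is installed) that wraps an inner packet in a UDP overlay header and copies the inner ToS/DSCP byte to the outer header. The addresses and overlay port are placeholders; the point is just that the Internet only ever sees the outer header, while the original marking is still intact for the device on the far side.

```python
from scapy.all import IP, UDP  # from the scapy package, assumed installed

def encapsulate(inner, tunnel_dst, tunnel_port=4789):
    """Wrap an inner IP packet in a UDP overlay header, copying its ToS/DSCP byte outward."""
    outer = IP(dst=tunnel_dst, tos=inner.tos) / UDP(sport=tunnel_port, dport=tunnel_port)
    return outer / inner

if __name__ == "__main__":
    # An EF-marked (DSCP 46) voice packet headed for the hub; all addresses are documentation space.
    voice = IP(src="192.0.2.10", dst="198.51.100.20", tos=46 << 2)
    pkt = encapsulate(voice, tunnel_dst="203.0.113.1")
    pkt.show()  # transit routers only ever see the outer header
```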


Tom’s Take

QoS suffered the same fate as Ska music and NCIS. It never really went away even when people stopped caring about it as much as they did when it was the hot new thing on the block. Instead, the need for QoS disappeared when our traffic usage moved away from the usage it was designed to augment. Sure, SD-WAN has brought it back in a new form, QoS 2.0 if you will, but the need for what we used to spend hours of time doing with ancient tomes of knowledge is long gone. We should have a quiet service for QoS and acknowledge all that it has done for us. And then get ready to invite it back to the party in the form that it will take in the cloud future of tomorrow.

The Voice of SD-WAN

SD-WAN is about migrating your legacy hardware away from silos like MPLS and policy-based routing and instead integrating everything under one dashboard and one central location to make changes and see the impacts that those changes have. But there’s one thing that SD-WAN can’t really do yet. And that’s prepare us for the end of TDM voice.

Can You Hear Me Now?

Voice is a way of life for some people. Cisco spent years upon years selling CallManager into every office they could. From small two-line shops to global organizations with multiple PRIs and TEHO configured everywhere. It was a Cisco staple for years. Which also had Avaya following along quickly to get into the act too.

Today’s voice world is a little less clear. Millennials hate talking on the phone. Video is an oddity when it comes to communications. Asynchronous chat programs like WhatsApp or Slack rule the day. People would rather communicate via text than voice. We all have mobile devices and the phone may be one of the least used apps on them.

Where does that leave traditional voice services? Not in a good place for sure. We still need phone lines for service-focused businesses or when we need to call a hotline for support. But the office phone system isn’t getting any new features anytime soon. The phone system is like the fax machine in the corner. It’s a feature complete system that is used when it has to be used by people that are forced to use it unhappily.

Voice systems are going to stay where they are by virtue of their ubiquity. They exist because TDM technology hasn’t really advanced in the past 20 years. We still have twisted pair connections to deliver FXO lines. We still have the most basic system in place to offer services to our potential customers and users. I know this personally because when I finally traded out my home phone setup for a “VoIP” offering from my cable provider, it was really just an FXS port on the back of a residential cable modem. That’s as high-tech as it gets. TDM is a solved problem.

Call If You WANt To

So, how does SD-WAN play into this? Well, as it turns out, SD-WAN is replacing the edge router very quickly. Devices that used to be Cisco ISRs are now becoming SD-WAN edge devices. They aggregate WAN connections and balance between them. They take MPLS and broadband and LTE instead of serial and other long-forgotten connection methods.

But you know what SD-WAN appliances can’t aggregate? TDM lines. They don’t have cards that can accept FXO, FXS, or even PRI lines. They don’t have a way to provide for DSP add-in cards or even come with onboard transcoding resources. There is no way for an SD-WAN edge appliance to function as anything other than a very advanced packet router.

This is a good thing for SD-WAN companies. It means that they have a focused, purpose built device that has more software features than hardware muscle. SD-WAN should be all about data packets. It’s not a multitool box. Even the SD-WAN vendors that ship their appliances with LTE cards aren’t trying to turn them into voice routers. They’re just easing the transition for people that want LTE backup for data paths.

Voice devices were moved out of the TDM station and shelf and into data routers as Cisco and other companies tried to champion voice over IP. We’re seeing the fallout from those decisions today. As the data routing devices become more specialized and focused on the software aspects of the technology, the hardware pieces that the ISR platform specialized in are becoming a yoke holding the platform back and keeping it from evolving.

I can remember when I was first thinking about studying for my CCIE Voice lab back in 2007-2008. At the time, the voice lab still had a Catalyst 6500 switch running in it that needed to be configured. It had a single T1 interface on a line card that you had to get up and running in CallManager. The catch? That line card would only work with a certain Supervisor engine that only ran CatOS. So, you had to be intimately familiar with CatOS in order to run that lab. I decided that it wasn’t for me right then and there.

Hardware can hold the software back. ISRs can’t operate voice interfaces in SD-WAN mode. You can’t get all the advanced features of the software until you pare the hardware down to the bare minimum needed to route data packets. If you need to have the router function as a TDM aggregator or an SBC/IPIPGW, you realize that the router really should be dedicated to that purpose. Because it’s functioning more as a TDM platform than a packet router at that point.


Tom’s Take

The world of voice that I lived in five or six years ago is gone. It’s been replaced with texting and Slack/Spark/WebEx Teams. Voice is dying. Cell phones connect us more than we’ve ever been before, and yet we don’t want to talk to each other. That means that the rows and rows of desk phones we used to use are falling by the wayside. And so too are the routers that used to power them. Now, we’re replacing those routers with SD-WAN devices. And when the time finally comes for us to replace those TDM devices, what will we use? That future is very murky indeed.

Reclaiming 1.1.1.1 For The Internet

Hopefully by now you’ve seen the announcement that CloudFlare has opened a new DNS service at the address of 1.1.1.1. We covered a bit of it on this week’s episode of the Gestalt IT Rundown. Next to Gmail, it’s probably the best April Fool’s announcement I’ve seen. However, it would seem that the Internet isn’t quite ready for a DNS resolver service that’s easy to remember. And that’s thanks in part to the accumulation of bad address hygiene.

Not So Random Numbers

The address range of 1/8 is owned by APNIC. They’ve had it for many years now but have never announced it publicly. Nor have they ever made any assignments of addresses in that space to clients or customers. In a world where IPv4 space is at a premium, why would an RIR choose to lose 16 million addresses?

Edit: As pointed out by Dale Carder of ES.net in a comment below, APNIC has been assigning address space out of 1/8 since 2010. However, the most commonly leaked prefixes in that block, the ones that are difficult to assign because of bogus announcements, come from 1.0.0.0/14.

As it turns out, 1/8 is a pretty bad address space for two reasons: 1.1.1.1 and 1.2.3.4. These two addresses are responsible for most of the inadvertent announcements in the entire 1/8 space. 1.2.3.4 is easy to figure out. It’s the most common example IP address given when talking about something. Don’t believe me? Google is your friend. Instead of using 192.0.2.0/24 like we should be, we use the most common PIN, password, and luggage combination in the world. But, at least 1.2.3.4 makes sense.

Why is 1.1.1.1 so popular? Well, the first reason is thanks to Airespace wireless controllers. Airespace uses 1.1.1.1 as the default virtual interface address for just about everything. Here’s a good explanation from Andrew von Nagy. When Airespace was sold to Cisco, this became a very popular address for Cisco wireless networks. Except now that it’s in use as a DNS resolver there are issues with using it. The wireless experts I’ve talked to recommend changing that address to 192.0.2.1, since that address has been marked off for examples only and will never be globally routable.

The other place where 1.1.1.1 seems to be used quite frequently is in Cisco ASA failover interfaces. Cisco documentation recommended using 1.1.1.1 for the primary ASA failover interface and 1.1.1.2 for the secondary. The heartbeats between those two interfaces were active as long as the units were paired together. But, if they were active and reachable, then any traffic destined for those globally routable addresses would be black holed. Now, ASAs should probably be using 192.0.2.1 and 192.0.2.2 instead. But beware that this will likely require downtime to reconfigure.

The 1.1.1.1 address confusion doesn’t stop there. Some systems like Nomadix use it as the default logout address. Vodafone used to use it as an image caching server. ISPs are blocking it upstream in some ACLs. Some security organizations even recommend dropping traffic to 1/8 as a bogon prevention measure. There’s every chance that 1.1.1.1 is going to be dropped by something in your network or along the transit path.

Planning Not To Fail

So, how are you going to proceed if you really, really want to use CloudFlare DNS? Well, the first step is to make sure that you don’t have 1.1.1.1 configured anywhere in your network. That means checking WLAN controllers, firewalls, and example configurations. Odds are good you’re running RFC1918 space. But you should try to ping 1.1.1.1 anyway. If you can ping it, then you should traceroute the address. If the traceroute leaves your local network, you probably have a good path.

Once you’ve determined that you’re capable of reaching 1.1.1.1, you need to test it first. Configure it on a test machine or VM and make sure it’s actually resolving addresses for you. Better safe than sorry. Once you know it’s really working like you want it to work, configure it on your internal DNS servers as a forwarder. You still want internal control of DNS thanks to things like Active Directory. But configuring it as a forwarder means you can take advantage of all the features CloudFlare is building into the system while still retaining anything you’ve done locally.
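For that test step, a quick sketch like this one (using the dnspython package, assumed installed) can confirm that queries pointed straight at 1.1.1.1 actually come back with answers before you touch your forwarders:

```python
import dns.exception
import dns.resolver  # both from the dnspython package, assumed installed

def test_cloudflare_dns(name="example.com"):
    """Ask 1.1.1.1 directly for an A record to confirm nothing upstream is eating the traffic."""
    resolver = dns.resolver.Resolver(configure=False)  # ignore the local resolv.conf settings
    resolver.nameservers = ["1.1.1.1"]
    resolver.lifetime = 3.0
    try:
        answer = resolver.resolve(name, "A")  # dnspython >= 2.0; older releases use .query()
        return [rr.address for rr in answer]
    except dns.exception.DNSException as err:
        return f"1.1.1.1 not usable from here: {err}"

if __name__ == "__main__":
    print(test_cloudflare_dns())
```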


Tom’s Take

I’m glad CloudFlare and APNIC are reclaiming 1.1.1.1 for some useful purpose. CloudFlare can take the traffic load of all the horribly misconfigured systems in the world. APNIC can use this setup to do some analytics work to find out exactly how screwed up things are. I wouldn’t be shocked to see something similar happen to 1.2.3.4 in the future if this bet pays off.

I’ve been using 1.1.1.1 since April 2nd and it works. It’s fast and hasn’t broken yet, which is the best that you can hope for from a DNS server. I’m sure I’ll play around with some of the advanced features as they come online but for now I’m just happy that one of the most recognizable IP addresses in the world is working for me.

Can Routing Be Oversimplified?

I don’t know if you’ve had a chance to see this Reddit thread yet, but it’s a funny one:

We eliminated routing protocols from our network!

Short non-clickbait summary: We deployed SD-WAN and turned off OSPF. We now have a /16 route for the internal network and a default route to the Internet where a lot of our workloads were moved into the cloud.

Bravo for this networking team for simplifying their network to this point. All other considerations aside, does this kind of future really bode well for SD-WAN?

Now You See Me

As pointed out in the thread above, the network team didn’t really get rid of their dynamic routing protocols. The SD-WAN boxes that they put in place are still running BGP or some other kind of setup under the hood. It’s just invisible to the user. That’s nothing new. Six years ago, Ivan Pepelnjak found out Juniper QFabric was running BGP behind the scenes too.

Hiding the networking infrastructure from the end user is nothing new. It’s a trick that has been used for years to allow infrastructures to be tuned and configured in such a way as to deliver maximum performance without letting anyone tinker with the secret sauce under the hood. You’ve been using it for years whether you realize it or not. Have MPLS? Core BGP routing is “hidden” from you. SD-WAN? Routing protocols are running between those boxes. Moved a bunch of workloads to AWS/Azure/GCE? You can better believe there is some routing protocol running under that stack.

Making things complex for the sake of making them hard to work on is foolish. We’ve spent decades and millions of dollars trying to make things easy. If you don’t believe me, look at the Apple iPhone. That device is a marvel at hiding all the complexity underneath. But, it also makes it really hard to troubleshoot when things go wrong.

Building On Shoulders

SD-WAN is doing great things for networking. I can remember years ago the thought of turning up a multi-site IPSec VPN configuration was enough to give me hives, let alone trying to actually do it. Today, companies like Viptela, VeloCloud, and Silver Peak make it easy to do. They’re innovating on top of the stack instead of inside it.

So much discussion in the community happens around building pieces of the stack. We spend time and effort making a better message protocol for routing information exchange. Or we build a piece of the HTTP stack that should be used in a bigger platform. We geek out about technical pieces because that’s where our energy feels the most useful.

When someone collects those stack pieces and tries to make them “easy”, we shout that company down and say that they’re hiding complexity and making the administrators and engineers “forget” how to do the real work. We spend more time focusing on what’s hidden and not enough on what’s being accomplished with the pieces. If you are the person that developed the fuel injection system in a car, are you going to sit there and tell Ford and Chevrolet that bundling it into an automotive platform is wrong?

So, while the end goal of any project like the one undertaken above is simplification, or reducing problems through less complex troubleshooting, it is not a silver bullet. Hiding complexity doesn’t make it magically go away. Removing all your routing protocols in favor of a /16 doesn’t mean your network runs any better. It means that you’re going to have to spend more time trying to figure out what went wrong when something does break.

Ask yourself this question: Would you rather spend more time building out the network and understand every nook and cranny of it or would you rather learn it on the fly when you’re trying to figure out why something isn’t working the way that it should? The odds are very good that you’re going to put the same amount of time into the network either way. Do you want to front load that time? Or back load it?


Tom’s Take

The Reddit thread is funny. Because half the people are dumping on the poster for his decision and the rest are trying to understand the benefits. It surely was created in such a way as to get views. And that worked admirably. But I also think there’s an important lesson to learn there. Simplicity for the sake of being simple isn’t enough. You have to replace that simplicity with due diligence. Because the alternative is a lot more time spent doing things you don’t want to do when you really don’t want to be doing them.

Programming Unbound

I’m doing some research on Facebook’s Open/R routing platform for a future blog post. I’m starting to understand the nuances a bit compared to OSPF or IS-IS, but during my reading I got stopped cold by one particular passage:

Many traditional routing protocols were designed in the past, with a strong focus on optimizing for hardware-limited embedded systems such as CPUs and RAM. In addition, protocols were designed as purpose-built solutions to solve the particular problem of routing for connectivity, rather than as a flexible software platform to build new applications in the network.

Uh oh. I’ve seen language like this before related to other software projects. And quite frankly, it worries me to death. Because it means that people aren’t learning their lessons.

New and Improved

Any time I see an article about how a project was rewritten from the ground up to “take advantage of new changes in protocols and resources”, it usually signals to me that some grad student decided to rewrite the whole thing in Java because they didn’t understand C. It sounds a bit cynical, but it’s not often wrong.

Want proof? Check out Linus Torvalds and his opinion about rewriting the Linux kernel in C++. Spoiler alert – “C++ is a horrible language.” And it gets more colorful from there. Linus has some very valid points about C++ that have been debated by lots of communities for the past ten years. But the fact remains that he has decided that completely rewriting the entire kernel in C++ is an exercise in futility.

In today’s world, we’re faced with a multitude of programming languages fighting for our attention. We’ve evolved past FORTRAN, COBOL, C, and C++. We now live in a world of Python, C#, J#, R, Ruby, and dozens more. And those don’t even include the languages that aren’t low-level and are more scripting-oriented. Every one of these languages was designed to solve a particular problem. And every one of them is begging to be used.

But it’s not enough that we have ten ways to write a function today. What’s more troublesome is that we’ve forgotten why certain languages were preferred over others in the past. We’ve forgotten that things used to be done the way they were done because we had no other alternatives. I can remember studying for my Novell CNE and taking 50-649, wherein Novell kept referring to OSPF as an “expensive” protocol to use on a NetWare server. At the time that test was created, OSPF was expensive from a CPU cycle standpoint. If the server was doing other things besides running a routing protocol you might see a potential impact if the CPU was slowed. And having OSPF calculations interrupted because someone was doing an FTP transfer could be expensive indeed.

No Wasted Space

More to the point, when people are faced with a limitation they have to be creative and concise. And nowhere is that more apparent than in E.T. the Extra-Terrestrial for the Atari 2600. Infamously, Howard Scott Warshaw had just one month to write a video game that would go on to be blasted critically, considered one of the worst of all time, and be blamed for the Video Game Crash of 1983. Yet, as one fan discovered years later when he set out to “fix” the game’s code, as bad as it may have been, it was very well coded. From the article:

…it’s unlikely that Howard Scott Warshaw (the developer) included some useless code for us to replace…

So, a programmer for an outdated video game system had a month to code a complex game and managed to do it in such a way as to leave very little empty space to insert code patches? And yet my Facebook app on my iPhone requires how much space?!?

All joking aside, the issue with E.T. wasn’t code quality. The problems with the game have been documented over the years, but almost no one blames Warshaw’s coding capabilities. That’s because Warshaw was working within his limitations. He couldn’t magically make the Atari 2600 cartridge bigger. He couldn’t increase the CPU size on the system. He worked within his limitations and made the best game that he could make.

Now, let’s look at an article about Open/R and see some of Facebook’s criticisms of OSPF and IS-IS:

We didn’t want to get bogged down in discussions over the lower-level protocol details, such as frame formatting and handshakes…

While it might sound heavyweight compared with OSPF and ISIS, which use their own “lightweight” transports, we haven’t found this to be an issue in modern networking hardware…

Whether or not they were intended to be taken as such, these are some pretty interesting knocks against OSPF and IS-IS. What Facebook is essentially saying is that they didn’t want to worry about building the low level parts of the messaging system, so they picked something off the shelf. They also built it to be more resource intensive than necessary because they didn’t need to compromise when running it on Six Pack and Wedge.

So long as your routers have ample CPU cycles and memory, Open/R will run just fine. But how many people out there are running a data center server board in their edge router? How many routers out there have to take a reduced BGP table because they don’t have enough memory to fit the entire global IPv4 routing table in memory, let alone IPv6? If resources are infinite and time is irrelevant, then building your protocols the way you want is of no consequence. But as soon as you add constraints to the equation, like support for older hardware or limited memory, you have to start making compromises to make things work.


Tom’s Take

I’m not saying that Open/R is a bad routing protocol. I’m going to save that analysis for a later time. But I do take a bit of umbrage with Facebook’s idea that OSPF and IS-IS are a bit outdated simply because they were programmed for a different era. If they were really that inept they would have been replaced or expanded by now. The fact that twenty-somethings got a bug to rewrite a routing protocol because they could and threw all caution to the wind with regard to resource usage should be a cautionary tale to any programmer out there. Never assume that you have more space than you need. Train yourself to do more with less. And be ready to compromise in case the worst case scenario becomes reality.