Hedgehog – The Network OS Distro?

You’ve probably seen by now that there’s a new entrant into the market for network operating systems. Hedgehog came out of stealth mode this week to fanfare from the networking community. If you read through the website you might question why I labeled them as a network operating system. While they aren’t technically the OS, I think it’s more important to look at them as an OS distribution.

Cacophony of Choice

Hedgehog starts from a very simple premise: cloud networking is where we’re all headed. Whether you’re running entirely on-premises, fully in the public cloud, or in some kind of super-multi-hybrid cloud offering, you’re all chasing the same thing. You want a stable system that acts as a force multiplier for your operations teams and cuts the time it takes users to get their builds deployed. It’s been said before, but the idea of cloud is to get IT out of the way of the business.

Streamlining processes means automating a lot of the things that were formerly done by people. That means building repeatable and consistent tools to make that happen. If you’ve ever worked in AWS or Google Cloud you have access to lots of that tooling. Perhaps it’s not as full-featured as rolling your own toolset, but that’s the tradeoff for running in someone else’s cloud. Notice that I left Microsoft Azure off that list.

Azure’s networking stack has been built on SONiC, a Linux-based NOS designed to scale in the cloud and solve the challenges that Microsoft has faced in their hyperscale data centers. They’ve poured resources into making SONiC exactly what they needed. However, one of the challenges with that approach is that some of those things don’t scale down to the enterprise. I’m not saying you shouldn’t use SONiC. I’m saying that it’s not easy to adapt SONiC to what you want to do if you’re not Microsoft.

Speedy Adoption

In a way, it’s the same problem that Linux faced 25 years ago. If you really wanted to run it you could download the source code and compile the kernel on your system. However, a kernel doesn’t do much without software running on top of it. How could I write blog posts or check the time or get on the Internet without other applications? That need for additional resources to make using Linux easier and more complete is where we got the rise of the Linux distribution, often shortened to distro.

Distros made adopting Linux easier. You didn’t have to go out and find sources for programs to run on your system after compiling the kernel. You could just install everything like one big package and get going. It also meant that you could swap out programs and tools much more easily than on other operating systems. Ever tried to get rid of Notepad on Windows? It’s practically a system tool. On the other hand, I think most Linux users can tell me their five favorite text editors off the top of their heads. The system is very extensible.

Hedgehog acts like the distro of yore for SONiC. It makes the process of obtaining the OS much easier than it would be otherwise and includes a toolset that amplifies your networking experience. The power of cloud networking comes from optimization and orchestration. Hedgehog gives you that capability. It allows you to run the same kinds of tooling that you would use in the cloud on your enterprise data center networking devices.

If you’re starting to standardize on Kubernetes to make your applications more portable to the cloud, then Hedgehog and SONiC can help you. If you’re looking to build more edge computing functionality into the stack, Hedgehog has you covered. The Hedgehog team is building the orchestration capabilities that the enterprise needs right now to help you leverage SONiC. Because that tooling doesn’t exist outside of Microsoft right now, you can believe that the Hedgehog team is addressing the needs of enterprise operations teams. They are the drivers for SONiC adoption. Making sure they can take care of daily tasks is paramount.
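To make that orchestration idea a little more concrete, here’s a rough sketch of the declarative, Kubernetes-style pattern involved: you describe the fabric you want and a controller keeps the switches converged on it. To be clear, the resource shape, field names, and device calls below are my own illustrative assumptions, not Hedgehog’s actual API.

```python
# Hypothetical sketch of a declarative, Kubernetes-style reconciliation loop
# for a SONiC fabric. The resource shape and device calls are illustrative
# assumptions, not Hedgehog's actual interface.

DESIRED_FABRIC = {
    "leaf01": {"vlans": [10, 20], "bgp_asn": 65001},
    "leaf02": {"vlans": [10, 30], "bgp_asn": 65002},
}


def read_running_state(switch: str) -> dict:
    """Stand-in for querying a SONiC device's current configuration."""
    # A real controller would call the device's management API here.
    return {"vlans": [10], "bgp_asn": 65001}


def apply_changes(switch: str, desired: dict, running: dict) -> None:
    """Push only the delta needed to converge the device on the desired state."""
    missing_vlans = set(desired["vlans"]) - set(running["vlans"])
    if missing_vlans:
        print(f"{switch}: adding VLANs {sorted(missing_vlans)}")
    if desired["bgp_asn"] != running["bgp_asn"]:
        print(f"{switch}: updating BGP ASN to {desired['bgp_asn']}")


def reconcile(desired_fabric: dict) -> None:
    """One pass of the loop: compare desired vs. running state, fix the drift."""
    for switch, desired in desired_fabric.items():
        running = read_running_state(switch)
        apply_changes(switch, desired, running)


if __name__ == "__main__":
    reconcile(DESIRED_FABRIC)
```

The specific fields don’t matter. The point is that the operator declares intent once and the tooling handles the drift, which is exactly the pattern Kubernetes applies to workloads.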

The distro launched Linux into the enterprise. Clunky DIY tooling can only scale so far. If you want to be serious about adopting a cloud-first mentality in your organization you need to make sure you’re using proven tools that scale and don’t fall apart every time you try to make a minor change. Your data center isn’t Facebook or Google or Azure. However, the lessons learned there and the way that we apply them at the enterprise level will go a long way to providing the advantages of the cloud for everyday use. Thanks to companies like Hedgehog that are concentrating on the way to bring that to market, we have a chance to see it sooner than we’d hoped.

To learn more about Hedgehog and how they make SONiC easier for the enterprise, make sure to check out their website at https://githedgehog.com/

QoS Is Dead. Long Live QoS!

Ah, good old Quality of Service. How often have we spent our time as networking professionals trying to discern the archaic texts of Szigeti to learn how to make it work? QoS is something that seemed so necessary to our networks years ago that we would spend hours upon hours trying to learn the best way to implement it for voice or bulk data traffic or some other reason. That was, until a funny thing happened. Until QoS became useless to us.

Rest In Peace and Queues

QoS didn’t die overnight. It didn’t wake up one morning without a home to go to. Instead, we slowly devalued and destroyed it over a period of years. We did it by focusing on the things that QoS was made for and then marginalizing them. Remember voice traffic?

We spent years installing voice over IP (VoIP) systems in our networks. And each of those systems needed QoS to function. We took our expertise in the arcane arts of queuing and applied it to the most finicky protocols we could find. And it worked. Our mystic knowledge made voice better! Our calls wouldn’t drop. Our packets arrived when they should. And the world was a happy place.
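As a refresher on what all that queuing expertise actually bought us, here’s a minimal sketch of strict-priority servicing: voice packets marked DSCP EF always get dequeued ahead of bulk data, which is what kept calls from breaking up when FTP filled the link. The packet model below is my own illustration, not any vendor’s implementation.

```python
from collections import deque

# Minimal sketch of strict-priority queuing: DSCP EF (46) traffic is always
# serviced before anything else. Illustrative only, not a vendor implementation.

DSCP_EF = 46  # Expedited Forwarding, the classic marking for voice

voice_queue: deque = deque()
bulk_queue: deque = deque()


def enqueue(packet: dict) -> None:
    """Classify on DSCP and place the packet in the right queue."""
    if packet["dscp"] == DSCP_EF:
        voice_queue.append(packet)
    else:
        bulk_queue.append(packet)


def dequeue():
    """Strict priority: drain voice before touching bulk data."""
    if voice_queue:
        return voice_queue.popleft()
    if bulk_queue:
        return bulk_queue.popleft()
    return None


# A voice packet arriving behind a pile of FTP traffic still goes out first.
for pkt in [{"app": "ftp", "dscp": 0}, {"app": "ftp", "dscp": 0},
            {"app": "rtp", "dscp": DSCP_EF}]:
    enqueue(pkt)

print(dequeue()["app"])  # -> "rtp"
```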

That is, until voice became pointless. When people started using mobile devices more and more instead of their desk phones, QoS wasn’t as important. When the steady stream of delay-sensitive packets moved to LTE instead of IP, it wasn’t as critical to ensure that FTP and other protocols in the LAN didn’t interfere with it. Even when people started using QoS on their mobile devices the marking was totally inconsistent. George Stefanick (@WirelesssGuru) found that Wi-Fi calling was doing some weird packet marking anyway.

So, without voice generating a steady flood of delay-sensitive packets, QoS was relegated to some weird traffic shaping roles. Maybe it was video prioritization in places where people cared about video? Or perhaps it was creating a scavenger class for traffic in order to get rid of unwanted applications like BitTorrent. But overall QoS languished as an oddity as more and more enterprises saw their collaboration traffic become dominated by mobile devices that didn’t need the old dark magic of QoS.
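For the scavenger-class case, the mechanism underneath is usually something like a token bucket: out-of-profile traffic gets dropped or re-marked so BitTorrent can’t crowd out everything else. The sketch below is a generic illustration of that idea with made-up rates, not any particular platform’s policer.

```python
import time

# Generic token-bucket sketch for policing a scavenger class (e.g. BitTorrent).
# Rates and burst sizes are made up for illustration.

class TokenBucket:
    def __init__(self, rate_bps: float, burst_bytes: float):
        self.rate = rate_bps / 8      # refill rate in bytes per second
        self.capacity = burst_bytes   # maximum burst size
        self.tokens = burst_bytes
        self.last = time.monotonic()

    def allow(self, packet_bytes: int) -> bool:
        """Return True if the packet conforms to the scavenger rate limit."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= packet_bytes:
            self.tokens -= packet_bytes
            return True
        return False  # out of profile: drop or re-mark the packet


scavenger = TokenBucket(rate_bps=1_000_000, burst_bytes=15_000)  # ~1 Mbps cap
print(scavenger.allow(1500))  # the first full-size packet conforms
```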

QoupS de Gras

The real end of QoS came about thanks to the cloud. While we spent all of our time trying to find ways to optimize applications running on our local enterprise networks, developers were busy optimizing applications to run somewhere else. The ideas were sound enough in principle. By moving applications to the cloud we could continually improve them and push features faster. By moving all those bits off the local network we could scale massively. We could even collaborate together in real time from anywhere in the world!

But applications that live in the cloud live outside our control. QoS was always bounded by the borders of our own networks. Once a packet was launched into the great beyond of the Internet we couldn’t control what happened to it. ISPs weren’t bound to honor our packet markings without an SLA. In fact, in most cases the ISP would remark all our packets anyway just to ensure they didn’t mess with the ISP’s ideas of traffic shaping. And even those were rudimentary at best given how well QoS plays with MPLS in the real world.
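To make the marking discussion concrete, this is all a DSCP mark is at the endpoint: six bits in the IP header that an application or edge router sets, and that any ISP in the path is free to rewrite. The snippet below sets EF on an ordinary UDP socket on Linux or macOS; it’s a minimal illustration, and whether the mark survives past your own edge is exactly the problem described above.

```python
import socket

# Minimal sketch: mark a UDP socket's traffic as DSCP EF (46).
# DSCP occupies the top six bits of the old IP TOS byte, so EF (46)
# becomes 46 << 2 = 0xB8 on the wire. Works on Linux/macOS.
DSCP_EF = 46
TOS_EF = DSCP_EF << 2

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, TOS_EF)

# The packet leaves the host marked EF, but nothing obliges the ISP to
# honor it; without an SLA the mark is routinely rewritten in transit.
sock.sendto(b"hello", ("192.0.2.10", 5004))  # documentation-range address
```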

But cloud-based applications don’t worry about quality of service. They scale as large as you want. And nothing short of a massive cloud outage will make them unavailable. Sure, there may be some slowness here and there, but that’s nothing more than you’d expect running a heavy application over your local LAN. The real genius of the cloud shift is that it forced developers to slim down applications and make them more responsive wherever they could be made more interactive. Now, applications felt snappier when they ran in remote locations. And if you’ve ever tried to use old versions of Outlook across slow links you know how critical that responsiveness can be.

The End is The Beginning

So, with cloud-based applications here to stay and collaboration now all about mobile apps, we can finally carve the tombstone for QoS, right? Well, not quite.

As it turns out, we are still using lots and lots of QoS today in SD-WAN networks. We’re just not calling it that. Instead, we’ve upgraded the term to something more snappy, like “Application Visibility”. Under the hood, it’s not much different than the QoS that we’ve done for years. We’re still picking out the applications and figuring out how to optimize their traffic patterns to make them more responsive.

The key with the new wave of SD-WAN is that we’re marrying QoS to conditional routing. Now, instead of being at the mercy of the ISP link to the Internet, we can do something else. We can push bulk traffic across slow, cheap links and ensure that our critical business applications have all the space they want on the fast, expensive ones instead. We can push our out-of-band traffic out of an attached 4G/LTE modem. We can even push our traffic across the Internet to a gateway closer to the SaaS provider with better performance. That last bit is an especially delicious piece of irony, since it basically serves the same purpose as Tail-end Hop Off did back in the voice days.
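Here’s a hedged sketch of what that conditional routing looks like underneath: classify the application, then pick the WAN path whose measured characteristics fit it, instead of trusting a DSCP mark to survive the Internet. The application classes, link names, and thresholds are all illustrative assumptions, not any vendor’s policy language.

```python
# Illustrative sketch of SD-WAN application-aware path selection.
# Link names, metrics, and application classes are made-up examples.

LINKS = {
    "mpls":      {"latency_ms": 20, "loss_pct": 0.1, "cost": "expensive"},
    "broadband": {"latency_ms": 45, "loss_pct": 0.5, "cost": "cheap"},
    "lte":       {"latency_ms": 80, "loss_pct": 1.5, "cost": "metered"},
}

APP_POLICY = {
    "voice":    {"max_latency_ms": 150,  "max_loss_pct": 1.0, "prefer": "mpls"},
    "saas":     {"max_latency_ms": 250,  "max_loss_pct": 2.0, "prefer": "broadband"},
    "bulk":     {"max_latency_ms": 1000, "max_loss_pct": 5.0, "prefer": "broadband"},
    "oob-mgmt": {"max_latency_ms": 1000, "max_loss_pct": 5.0, "prefer": "lte"},
}


def pick_path(app: str) -> str:
    """Choose the preferred link if it meets the app's SLA, else fail over."""
    policy = APP_POLICY[app]
    candidates = [policy["prefer"]] + [l for l in LINKS if l != policy["prefer"]]
    for link in candidates:
        metrics = LINKS[link]
        if (metrics["latency_ms"] <= policy["max_latency_ms"]
                and metrics["loss_pct"] <= policy["max_loss_pct"]):
            return link
    return policy["prefer"]  # nothing meets the SLA; fall back to preference


print(pick_path("voice"))  # -> "mpls" while it meets the voice SLA
print(pick_path("bulk"))   # -> "broadband", keeping the expensive link clear
```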

And how does all this magical new QoS work on the Internet outside our control? That’s the real magic. It’s all tunnels! Yes, in order to make sure that we get our traffic where it needs to be in SD-WAN, we simply prioritize it going out of the router and wrap it all in a tunnel to the next device. Everything moves along the Internet and the hop-by-hop treatment really doesn’t matter in the long run. We’re instead optimizing transit through our network based on factors other than DSCP markings. Sure, when the traffic arrives on the other side it can be optimized based on those values. However, in the real world the only thing that most users really care about is how fast they can get their application to perform on their local machine. And if SD-WAN can point them to the fastest SaaS gateway, they’ll be happy people.


Tom’s Take

QoS suffered the same fate as ska music and NCIS. It never really went away even when people stopped caring about it as much as they did when it was the hot new thing on the block. Instead, the need for QoS disappeared when our traffic moved away from the usage it was designed to augment. Sure, SD-WAN has brought it back in a new form, QoS 2.0 if you will, but the need for what we used to spend hours doing with ancient tomes of knowledge is long gone. We should have a quiet service for QoS and acknowledge all that it has done for us. And then get ready to invite it back to the party in the form that it will take in the cloud future of tomorrow.