OpenFlow Is Dead. Long Live OpenFlow.

Posted on November 29, 2016 by networkingnerd

The King Is Dead - Long Live The King

Remember OpenFlow? The hammer that was set to solve all of our vaguely nail-like problems? Remember how everything was going to be based on OpenFlow going forward and the world was going to be a better place? Or how heretics like Ivan Pepelnjak (@IOSHints) that dared to ask questions about scalability or value of application were derided and laughed at? Yeah, good times. Today, I stand here to eulogize OpenFlow, but not to bury it. And perhaps find out that OpenFlow has a much happier life after death.

OpenFlow Is The Viagra Of Networking

OpenFlow is not that much different than Sildenafil, the active ingredient in Vigara. Both were initially developed to do something that they didn’t end up actually solving. In the case of Sildenafil, it was high blood pressure. The “side effect” of raising blood pressure in a specific body part wasn’t even realized until after the trials of the drug. The side effect because the primary focus of the medication that was eventually developed into a billion dollar industry.

In the same way, OpenFlow failed at its stated mission of replacing the forwarding plane programming method of switches. As pointed out by folks like Ivan, it had huge scalability issues. It was a bit clunky when it came to handling flow programming. The race from 1.0 to 1.3 spec finalization left vendors in the dust, but the freeze on 1.3 for the past few years has really hurt innovation. Objectively, the fact that almost no major shipping product uses OpenFlow as a forwarding paradigm should be evidence of it’s failure.

The side effect of OpenFlow is that it proved that networking could be done in software just as easily as it could be done in hardware. Things that we thought we historically needed ASICs and FPGAs to do could be done by a software construct. OpenFlow proved the viability of Software Defined Networking in a way that no one else could. Yet, as people abandoned it for other faster protocols or rewrote their stacks to take advantage of other methods, OpenFlow did still have a great number of uses.

OpenFlow Is a Garlic Press, Not A Hammer

OpenFlow isn’t really designed to solve every problem. It’s not a generic tool that can be used in a variety of situations. It has some very specific use cases that it does excel at doing, though. Think more like a garlic press. It’s a use case tool that is very specific for what it does and does that thing very well.

This video from Networking Field Day 13 is a great example of OpenFlow being used for a specific task. NEC’s flavor of OpenFlow, ProgrammableFlow, is used on conjunction with higher layer services like firewalls and security appliances to mitigate the spread of infections. That’s a huge win for networking professionals. Think about how hard it would be to track down these systems in a network of thousands of devices. Even worse, with the level of virulence of modern malware, it doesn’t take long before the infected system has infected others. It’s not enough to shut down the payload. The infection behavior must be removed as well.

What NEC is showing is the ultimate way to stop this from happening. By interrogating the flows against a security policy, the flow entries can be removed from switches across the network or have deny entries written to prevent communications. Imagine being able to block a specific workstation from talking to anything on the network until it can be cleaned. And have that happen automatically without human interaction. What if a security service could get new malware or virus definitions and install those flow entries on the fly? Malware could be stopped before it became a problem.

This is where OpenFlow will be headed in the future. It’s no longer about adapting the problems to fit the protocol. We can’t keep trying to frame the problem around how much it resembles a nail just so we can use the hammer in our toolbox. Instead, OpenFlow will live on as a point protocol in a larger toolbox that can do a few things really well. That means we’ll use it when we need to and use a different tool when needed that better suits the problem we’re actually trying to solve. That will ensure that the best tool is used for the right job in every case.

Tom’s Take

OpenFlow is still useful. Look at what Coho Data is using it for. Or NEC. Or any one of a number of companies that are still developing on it. But the fact that these companies have put significant investment and time into the development of the protocol should tell you what the larger industry thinks. They believe that OpenFlow is a dead end that can’t magically solve the problems they have with their systems. So they’ve moved to a different hammer to bang away with. I think that OpenFlow is going to live a very happy life now that people are leaving it to solve the problems it’s good at solving. Maybe one day we’ll look back on the first life of OpenFlow not as a failure, but instead as the end of the beginning of it become what it was always meant to be.

Nutanix and Plexxi – An Affinity to Converge

Posted on November 23, 2016 by networkingnerd

Nutanix has been lighting the hyperconverged world on fire as of late. Strong sales led to a big IPO for their stock. They are in a lot of conversations about using their solution in place of large traditional virtualization offerings that include things like blade servers or big boxes. And even coming off the recent Nutanix .NEXT conference there were some big announcements in the networking arena to help them complete their total solution. However, I think Nutanix is missing a big opportunity that’s right in front of them.

I think it’s time for Nutanix to buy Plexxi.

Software Says

If you look at the Nutanix announcements around networking from .NEXT, they look very familiar to anyone in the server space. The highlights include service chaining, microsegmentation, and monitoring all accessible through an API. If this sounds an awful lot like VMware NSX, Cisco ACI, or any one of a number of new networking companies then you are in the right mode of thinking as far as Nutanix is concerned.

SDN in the server space is all about overlay networking. Segmentation of flows and service chaining are the reason why security is so hard to do in the networking space today. Trying to get traffic to behave in a certain way drives networking professionals nuts. Monitoring all of that to ensure that you’re actually doing what you say you’re doing just adds complexity. And the API is the way to do all of that without having to walk down to the data center to console into a switch and learn a new non-Linux CLI command set.

SDN vendors like VMware and Cisco ACI would naturally have jumped onto these complaints and difficulties in the networking world and both have offered solutions for them with their products. For Nutanix to have bundled solutions like this into their networking offering is no accident. They are looking to battle VMware head-to-head and need to offer the kind of feature parity that it’s going to take a make medium to large shops shift their focus away from the VMware ecosystem and take a long look at what Nutanix is offering.

In a way, Nutanix and VMware are starting to reinforce the idea that the network isn’t a magical realm of protocols and tricks that make applications work. Instead, it’s a simple transport layer between locations. For instance, Amazon doesn’t rely on the magic of the interstate system to get your packages from the distribution center to your home. Instead, the interstate system is just a transport layer for their shipping overlays – UPS, FedEX, and so on. The overlay is where the real magic is happening.

Nutanix doesn’t care what your network looks like. They can do almost everything on top of it with their overlay protocols. That would seem to suggest that the focus going forward should be to marginalize or outright ignore the lower layers of the network in favor of something that Nutanix has visibility into and can offer control and monitoring of. That’s where the Plexxi play comes into focus.

Plexxi Logo

Affinity for Awesome

Plexxi has long been a company in search of a way to sell what they do best. When I first saw them years ago, they were touting their Affinities idea as a way to build fast pathways between endpoints to provide better performance for applications that naturally talked to each other. This was a great idea back then. But it quickly got overshadowed by the other SDN solutions out there. It even caused Plexxi to go down a slightly different path for a while looking at other options to compete in a market that they didn’t really have a perfect fit product.

But the Affinities idea is perfect for hyperconverged solutions. Companies like Nutanix are marking their solutions as the way to create application-focused compute nodes on-site without the need to mess with the cloud. It’s a scalable solution that will eventually lead to having multiple nodes in the future as your needs expand. Hyperconverged was designed to be consumable per compute unit as opposed to massively scaling out in leaps and bounds.

Plexxi Affinities is just the tip of the iceberg. Plexxi’s networking connectivity also gives Nutanix the ability to build out a high-speed interconnect network with one advantage – noninterference. I’m speaking about what happens when a customer needs to add more networking ports to support this architecture. They need to make a call to their Networking Vendor of Choice. In the case of Cisco, HPE, or others, that call will often involve a conversation about what they’re doing with the new network followed by a sales pitch for their hyperconverged solution or a partner solution that benefits both companies. Nutanix has a reputation for being the disruptor in traditional IT. The more they can keep their traditional competitors out of the conversation, the more likely they are to keep the business into the future.

Tom’s Take

Plexxi is very much a company with an interesting solution in need of a friend. They aren’t big enough to really partner with hyperconverged solutions, and most of the hyperconverged market at this point is either cozy with someone else or not looking to make big purchases. Nutanix has the rebel mentality. They move fast and strike quickly to get their deals done. They don’t take prisoners. They look to make a splash and get people talking. The best way to keep that up is to bundle a real non-software networking component alongside a solution that will make the application owners happy and keep the conversation focused on a single source. That’s how Cisco did it back and the day and how VMware has climbed to the top of the virtualization market.

If Nutanix were to spend some of that nice IPO money on a Plexxi Christmas present, I think 2017 would be the year that Nutanix stops being discussed in hushed whispers and becomes a real force to be reckoned with up and down the stack.

Facebook Wedge 100 – The Future of the Data Center?

Posted on October 19, 2016 by networkingnerd

Facebook is back in the news again. This time, it’s because of the release of their new Wedge 100 switch into the Open Compute Project (OCP). Wedge was already making headlines when Facebook announced it two years ago. A fast, open sourced 40Gig Top-of-Rack (ToR) switch was huge. Now, Facebook is letting everyone in on the fun of a faster Wedge that has been deployed into production at Facebook data centers as well as being offered for sale through Edgecore Networks, which is itself a division of Accton. Accton has been leading the way in the whitebox switching market and Wedge 100 may be one of the ways it climbs to the top.

Holy Hardware!

Wedge 100 is pretty impressive from the spec sheet. They paid special attention to making sure the modules were expandable, especially for faster CPUs and special purpose devices down the road. That’s possible because Wedge is a highly specialized micro server already. Rather than rearchitecting the guts of the whole thing, Facebook kept the CPU and the monitoring stack and just put newer, faster modules on it to ramp to 32x100Gig connectivity.

As suspected in the above image, Facebook is using Broadcom Tomahawk as the base connectivity in their switch, which isn’t surprising. Tomahawk is the roadmap for all vendors to get to 100Gig. It also means that the downlink connectivity for these switches could conceivably work in 25/50Gig increments. However, given the enormous amount of east/west traffic that Facebook must generate, Facebook has created a server platform they call Yosemite that has 100Gig links as well. Given the probably backplane there, you can imagine the data that’s getting thrown around the data centers.

That’s not all. Omar Baldonado has said that they are looking at going to 400Gig connectivity soon. That’s the kind of mind blowing speed that you see in places like Google and Facebook. Remember that this hardware is built for a specific purpose. They don’t just have elephant flows. They have flows the size of an elephant herd. That’s why they fret about the operating temperature of optics or the rack design they want to use (standard versus Open Racks). Because every little change matters a thousand fold at that scale.

Software For The People

The other exciting announcement from Facebook was on the software front. Of course, FBOSS has been updated to work with Wedge 100. I found it very interesting in the press release that much of the programming in FBOSS went into interoperability with Wedge 40 and with fixing the hardware side of things. This makes some sense when you realize that Facebook didn’t need to spend a lot of time making Wedge 40 interoperate with anything, since it was a wholesale replacement. But Wedge 100 would need to coexist with Wedge 40 as the rollout happens, so making everything play nice is a huge point on the checklist.

The other software announcement that got the community talking was support for third-party operating systems running on Wedge 100. The first one up was Open Network Linux from Big Switch Networks. ONL ran on the original Wedge 40 and now runs on the Wedge 100. This means that if you’re familiar with running BSN OSes on your devices, you can drop in a Wedge 100 in your spine or fabric and be ready to go.

The second exciting announcement about software comes from a new company, Apstra. Apstra announced their entry into OCP and their intent to get their Apstra Operating System (AOS) running on Wedge 100 by next year. That has a big potential impact for Apstra customers that want to deploy these switches down the road. I hope to hear more about this from Apstra during their presentation at Networking Field Day 13 next month.

Tom’s Take

Facebook is blazing a trail for fast ToR switches. They’ve got the technical chops to build what they need and release the designs to the rest of the world to be used for a variety of ideas. Granted, your data center looks nothing like Facebook. But the ideas they are pioneering are having an impact down the line. If Open Rack catches on you may see different ideas in data center standardization. If the Six Pack catches on as a new chassis concept, it’s going to change spines as well.

If you want to get your hands dirty with Wedge, build a new 100Gig pod and buy one from Edgecore. The downlinks can break out into 10Gig and 25Gig links for servers and knowing it can run ONL or Apstra AOS (eventually) gives you some familiar ground to start from. If it runs as fast as they say it does, it may be a better investment right now than waiting for Tomahawk II to come to your favorite vendor.

Tomahawk II – Performance Over Programmability

Posted on October 12, 2016 by networkingnerd

tomahawk2

Broadcom announced a new addition to their growing family of merchant silicon today. The new Broadcom Tomahawk II is a monster. It doubles the speed of it’s first-generation predecessor. It has 6.4 Tbps of aggregate throughout, divided up into 256 25Gbps ports that can be combined into 128 50Gbps or even 64 100Gbps ports. That’s fast no matter how you slice it.

Broadcom is aiming to push these switches into niches like High-Performance Computing (HPC) and massive data centers doing big data/analytics or video processing to start. The use cases for 25/50Gbps haven’t really changed. What Broadcom is delivering now is port density. I fully expect to see top-of-rack (ToR) switches running 25Gbps down to the servers with new add-in cards connected to 50Gbps uplinks that deliver them to the massive new Tomahawk II switches running in the spine or end-of-row (EoR) configuration for east-west traffic disbursement.

Another curious fact of the Tomahawk II is the complete lack of 40Gbps support. Granted, the support was only paid lip service in the Tomahawk I. The real focus was on shifting to 25/50Gbps instead of the weird 10/40/100Gbps split we had in Trident II. I talked about this a couple of years ago and wasn’t very high on it back then, but I didn’t know the level of apathy people had for 40Gbps uplinks. The push to 25/50Gbps has only been held up so far by the lack of availability of new NICs for servers to enable faster speeds. Now that those are starting to be produced in volume expect the 40Gbps uplinks to be a relic of the past.

A Foot In The Door

Not everyone is entirely happy about the new Broadcom Tomahawk II. I received an email today with a quote from Martin Izzard of Barefoot Networks, discussing their new Tofino platform. He said in part:

Barefoot led the way in June with the introduction of Tofino, the world’s first fully programmable switches, which also happen to be the fastest switches ever built.

It’s true that Tofino is very fast. It was the first 6.4 Tbps switch on the market. I talked a bit about it a few months ago. But I think that Barefoot is a bit off on its assessment here and has a bit of an axe to grind.

Barefoot is pushing something special with Tofino. They are looking to create a super fast platform with programmability. P4 is not quite an FPGA and it’s not an ASIC. It’s a switch stripped to its core and rebuilt with a language all its own. That’s great if you’re a dev shop or a niche market that has to squeeze every ounce of performance out of a switch. In the world of cars, the best analogy would be looking at Tofino like a specialized sports car like a Koenigsegg Agera. It’s very fast and very stylish, but it’s purpose built to do one thing – drive really fast on pavement and carry two passengers.

Broadcom doesn’t really care about development shops. They don’t worry about niche markets. Because those users are not their customers. Their customers are Arista, Cisco, Brocade, Juniper and others. Broadcom really is the Intel of the switching world. Their platforms power vendor offerings. Buying a basic Tomahawk II isn’t something you’re going to be able to do. Broadcom will only sell these in huge lots to companies that are building something with them. To keep the car analogy, Tomahawk II is more like the old F-body cars produced by Chevrolet that later went on to become Camaros, Firebirds, and Trans Ams. Each of these cars was distinctive and had their fans, but the chassis was the same underneath the skin.

Broadcom wants everyone to buy their silicon and use it to power the next generation of switches. Barefoot wants a specialist kit that is faster than anything else on the market, provided you’re willing to put the time into learning P4 and stripping out all the bits they feel are unnecessary. Your use case determines your hardware. That hasn’t changed, nor is it likely to change any time soon.

Tom’s Take

The data center will be 25/50/100Gbps top to bottom when the next switch refresh happens. It could even be there sooner if you want to move to a pod-based architecture instead of more traditional designs. The odds are very good that you’re going to be running Tomahawk or Tomahawk II depending on which vendor you buy from. You’re probably only going to be running something special like Tofino or maybe even Cavium if you’ve got a specific workload or architecture that you need performance or programmability.

Don’t wait for the next round of hardware to come out before you have an upgrade plan. Write it now. Think about where you want to be in 4 years. Now double your requirements. Start investigating. Ask your vendor of choice what their plans are. If their plans stink, as their competitor. Get quotes. Get ideas. Be ready for the meeting when it’s scheduled. Make sure you’re ready to work with your management to bury the hatchet, not get a hatchet jobbed network.

Cisco vs. Arista: Shades of Gray

Posted on August 24, 2016 by networkingnerd

CiscoVArista

Yesterday was D-Day for Arista in their fight with Cisco over the SysDB patent. I’ve covered this a bit for Network Computing in the past, but I wanted to cover some new things here and put a bit more opinion into my thoughts.

Cisco Designates The Competition

As the great Stephen Foskett (@SFoskett) says, you always have to punch above your weight. When you are a large company, any attempt to pick on the “little guy” looks bad. When you’re at the top of the market it’s even tougher. If you attempt to fight back against anyone you’re going to legitimize them in the eye of everyone else wanting to take a shot at you.

Cisco has effectively designated Arista as their number one competitor by way of this lawsuit. Arista represents a larger threat that HPE, Brocade, or Juniper. Yes, I agree that it is easy to argue that the infringement constituted a material problem to their business. But at the same time, Cisco very publicly just said that Arista is causing a problem for Cisco. Enough of a problem that Cisco is going to take them to court. Not make Arista license the patent. That’s telling.

Also, Cisco’s route of going through the ITC looks curious. Why not try to get damages in court instead of making the ITC ban them from importing devices? I thought about this for a while and realized that even if there was a court case pending it wouldn’t work to Cisco’s advantage in the short term. Cisco doesn’t just want to prove that Arista is copying patents. They want to hurt Arista. That’s why they don’t want to file an injunction to get the switches banned. That could take years and involve lots of appeals. Instead, the ITC can just simply say “You cannot bring these devices into the country”, which effectively bans them.

Cisco has gotten what it wants short term: Arista is going to have to make changes to SysDB to get around the patent. They are going to have to ramp up domestic production of devices to get around the import ban. Their train of development is disrupted. And Cisco’s general counsel gets to write lots of posts about how they won.

Yet, even if Arista did blatantly copy the SysDB stuff and run with it, now Cisco looks like the 800-pound gorilla stomping on the little guy through the courts. Not by making better products. Not by innovating and making something that eclipses the need for software like this. No, Cisco won by playing the playground game of “You stole my idea and I’m going to tell!!!”

Arista’s Three-Edged Sword

Arista isn’t exactly coming out of this on the moral high ground either. Arista has gotten a black eye from a lot of the quotes being presented as evidence in this case. Ken Duda said that Arista “slavishly copied” Cisco’s CLI. There have been other comments about “secret sauce” and the way that SysDB is used. A few have mentioned to me privately that the copying done by Arista was pretty blatant.

Understanding is a three-edged sword: Your side, their side, and the truth.

Arista’s side is that they didn’t copy anything important. Cisco’s side is that EOS has enough things that have been copied that it should be shut down and burned to the ground. The truth lies somewhere in the middle of it all.

Arista didn’t copy everything from IOS. They hired people who worked on IOS and likely saw things they’d like to implement. Those people took ideas and ran with them to come up with a better solution. Those ideas may or may not have come from things that were worked on at Cisco. But if you hire a bunch of employees from a competitor, how do you ensure that their ideas aren’t coming from something they were supposed to have “forgotten”?

Arista most likely did what any other company in that situation would do: they gambled. Maybe SysDB was more copied that created. But so long as Arista made money and didn’t really become a blip on Cisco radar. That’s telling. Listen to this video, which starts at 4:40 and goes to about 6:40:

Doug Gourlay said something that has stuck with me for the last four years: “Everyone that ever set out to compete against Cisco and said, ‘We’re going to do it and be everything to everyone’ has failed. Utterly.”

Arista knew exactly which market they wanted to attack: 10Gig and 40Gig data center switches. They made the best switch they could with the best software they could and attacked that market with all the force they could muster. But, the gamble would eventually have to either pay off or come due. Arista had to know at some point that a strategy shift would bring them under the crosshairs of Cisco. And Cisco doesn’t forgive if you take what’s theirs. Even if, and I’m quoting from both a Cisco 10-K from 1996 and a 2014 Annual Report:

[It is] not economically practical or even possible to determine in advance whether a product or any of its components infringes or will infringe on the patent rights of others.

So Arista built the best switch they could with the knowledge that some of their software may not have been 100% clean. Maybe they had plans to clean it up later. Or iterate it out of existence. Who knows? Now, Arista has to face up to that choice and make some new ones to keep selling their products. Whether or not they intended to fight the 800-pound gorilla of networking at the start, they certainly stumbled into a fight here.

Tom’s Take

I’m not a lawyer. I don’t even pretend to be one. I do know that the fate of a technology company now rests in the hands of non-technical people that are very good and wringing nuance out of words. Tech people would look at this and shake their heads. Did Arista copy something? Probably? Was is something Cisco wanted copied? Probably not? Should Cisco have unloaded the legal equivalent of a thermonuclear warhead on them? Doubtful.

Cisco is punishing Arista to ensure no one every copies their ideas again. As I said before, the outcome of this case will doom the Command Line Interface. No one is going to want to tangle with Cisco again. Which also means that no one is going to want to develop things along the Cisco way again. Which means Cisco is going to be less relevant in the minds of networking engineers as REST APIs and other programming architectures become more important that remembering to type conf t every time.

Arista will survive. They will make changes that mean their switches will live on for customers. Cisco will survive. They get to blare their trumpets and tell the whole world they vanquished an unworthy foe. But the battle isn’t over yet. And instead of it being fought over patents, it’s going to be fought as the industry moves away from CLI and toward a model that doesn’t favor those who don’t innovate.

Will Dell Networking Wither Away?

Posted on June 28, 2016 by networkingnerd

chopping-block-Dell-EMC

The behemoth merger of Dell and EMC is nearing conclusion. The first week of August is the target date for the final wrap up of all the financial and legal parts of the acquisition. After that is done, the long task of analyzing product lines and finding a way to reduce complexity and product sprawl begins. We’ve already seen the spin out of Quest and Sonicwall into a separate entity to raise cash for the final stretch of the acquisition. No doubt other storage and compute products are going to face a go/no go decision in the future. But one product line which is in real danger of disappearing is networking.

Whither Whitebox?

The first indicator of the problems with Dell and networking comes from whitebox switching. Dell released OS 10 earlier this year as a way to capitalize on the growing market of free operating systems running on commodity hardware. Right now, OS 10 can run on Dell equipment. In the future, they are hoping to spread it out to whitebox devices. That assumes that soon you’ll see Dell branded OSes running on switches purchased from non-Dell sources booting with ONIE.

Once OS 10 pushes forward, what does that mean for Dell’s hardware business? Dell would naturally want to keep selling devices to customers. Whitebox switches would undercut their ability to offer cheap ports to customers in data center deployments. Rather than give up that opportunity, Dell is positioning themselves to run some form of Dell software on top of that hardware for management purposes, which has always been a strong point for Dell. Losing the hardware means little to Dell if they have to lose profit margin to keep it there in the first place.

The second indicator of networking issues comes from comments from Michael Dell at EMCworld this year. Check out this short video featuring him with outgoing EMC CEO Joe Tucci:

Some of the telling comments in here involve Michael Dell’s praise for the NSX business model and how it is being adopted by a large number of other vendors in the industry. Also telling is their reaffirmation that Cisco is an important partnership in VCE and won’t be going away any time soon. While these two things don’t seem to be related on the surface, they both point to a truth Dell is trying hard to accept.

In the future, with overlay network virtualization models gaining traction in the data center, the underlying hardware will matter little. In almost every case, the hardware choice will come down to one of two options:

Which switch is the cheapest?
Which switch is on the Approved List?

That’s it. That’s the whole decision tree. No one will care what sticker is on the box. They will only care that it didn’t cost a fortune and that they won’t get fired for buying it. That’s bad for companies that aren’t making white boxes or named Cisco. Other network vendors are going to try and add value in some way, but the overlay sitting on top of those bells and whistles will make it next to impossible to differentiate in anything but software. Whether that’s superior management capabilities, open plug-in model, or some other thing we haven’t thought of will make no difference in the end. Software will still be king and the hardware will be an inexpensive pawn or a costly piece that has been pre-approved.

Whither Wireless?

The other big inflection point that makes me worry about the Dell networking story is the lack of movement in the wireless space. Dell has historically been a company to partner first and acquire second. But with HPE’s acquisition of Aruba Networks last year, the dominos in the wireless space are still waiting to fall. Brocade raced out to buy Ruckus. Meru offered itself on a platter to anyone that would buy them. Now Aerohive stands as the last independent wireless vendor without a dance partner. Yes, they’ve announced that they are partnering with Dell, but have you been to the Dell Wireless Networking page? Can you guess what the Dell W-series is? Here’s a hint: it rhymes with “Peruba”.

Every time Dell leads with a W-series deployment, they are effectively paying their biggest competitor. They are opening the door to allowing HPE/Aruba to come in and not only start talking about wireless but servers, storage, and other networking as well. Dell would do well at this point to start deemphasizing the W-series and start highlighting the “new generation” of Aerohive APs and how they are going to the be the focus moving forward.

The real solution would be for Dell to buy a wireless company and take all the wireless expertise they are selling in-house. That would show they are serious about both the campus network of the future and the data center network needed to support their other server and storage infrastructure. Sadly, with Dell being leveraged due to the privatization of his company just two years ago and mounting debt for this mega merger, Dell is looking to make cash with spin offs instead of spending it on yet another company to ingest and subsume. Which means a real non-partner wireless solution is still many years away.

Tom’s Take

Dell’s networking strategy is in maintenance mode. Make switches to support faster speeds for now, probably with Tomahawk support soon, and hope that this whole networking thing goes software sooner rather than later. Otherwise, the need to shore up the campus wireless areas along with the coming decision about showing support fully behind NSX and partnerships is going to be a bitter pill to swallow. Perhaps Dell Networking will exist as an option for companies wanting a 100% Dell solution? Or maybe they are waiting for a new offering from Dell/EMC in the data center to drive profits to research and development to keep pace with Cisco and Arista? One can only hope that their networking flower doesn’t wither on the vine.

Running Barefoot – Thoughts on Tofino and P4

Posted on June 15, 2016 by networkingnerd

barefootgrass

The big announcement this week is that Barefoot Networks leaped out of stealth mode and announced that they’re working on a very, very fast datacenter switch. The Barefoot Tofino can do up to 6.5 Tbps of throughput. That’s a pretty significant number. But what sets the Tofino apart is that it also uses the open source P4 programming language to configure the device for everything, from forwarding packets to making routing decisions. Here’s why that may be bigger than another fast switch.

Feature Presentation

Barefoot admits in their announcement post that one of the ways they were able to drive the performance of the Tofino platform higher was to remove a lot of the accumulated cruft that has been added to switch software for the past twenty years. For Barefoot, this is mostly about pushing P4 as the software component of their switch platform and driving adoption of it in a wider market.

Let’s take a look at what this really means for you. Modern network operating systems typically fall into one of two categories. The first is the “kitchen sink” system. This OS has every possible feature you could ever want built in at runtime. Sure, you get all the packet forwarding and routing features you need. But you also carry the legacy of frame relay, private VLANs, Spanning Tree, and a host of other things that were good ideas at one time and now mean little to nothing to you.

Worse yet, kitchen sink OSes require you to upgrade in big leaps to get singular features that you need but carry a whole bunch of others you don’t want. Need routing between SVIs? That’s an Advanced Services license. Sure, you get BGP with that license too, but will you ever use that in a wiring closet? Probably not. Too bad though, because it’s built into the system image and can’t be removed. Even newer operating systems like NX-OS have the same kitchen sink inclusion mentality. The feature may not be present at boot time, but a simple command turns it on. The code is still baked into the kernel, it’s just loaded as a module instead.

On the opposite end of the scale, you have newer operating systems like OpenSwitch. The idea behind OpenSwitch is to have a purpose built system that does a few things really, really well. OpenSwitch can build a datacenter fabric very quickly and make it perform well. But if you’re looking for additional features outside of that narrow set, you’re going to be out of luck. Sure, that means you don’t need a whole bunch of useless features. But what about things like OSPF or Spanning Tree? If you decide later that you’d like to have them, you either need to put in a request to have it built into the system or hope that someone else did and that the software will soon be released to you.

We Can Rebuild It

Barefoot is taking a different track with P4. Instead of delivering the entire OS for you in one binary image, they are allowing you to build the minimum number of pieces that you need to make it work for your applications. Unlike OpenSwitch, you don’t have to wait for other developers to build in a function that you need in order to deploy things. You drop to an IDE and write the code you need to forward packets in a specific way.

There are probably some people reading this post that are nodding their heads in agreement right now about this development process. That’s good for Barefoot. That means that their target audience wants functionality like this. But Barefoot isn’t for everyone. The small and medium enterprise isn’t going to jump at the chance to spend even more time programming forwarding engines into their switches. Sure, the performance profile is off the chart. But it’s also a bit like buying a pricy supercar to drive back and forth to the post office. Overkill for 98% of your needs.

Barefoot is going to do well in financial markets where speed is very important. They’re also going to sell into big development shops where the network team needs pared-down performance in software and a forwarding chip that can blow the doors off the rest of the network for East <-> West traffic flow. Give that we haven’t seen a price tag on Tofino just yet, I would imagine that it’s priced well into those markets and beyond the reach of a shop that just needs two leaf nodes and a spine to connect them. But that’s exactly what needs to happen.

Tom’s Take

Barefoot isn’t going to appeal to shops that plug in a power cable and run a command to provision a switch. Barefoot will shine where people can write code that will push a switch to peak performance and do amazing things. Perhaps Barefoot will start offering code later on that gives you the ability to program basic packet forwarding into a switch or routing functions when needed without the requirement of taking hours of classes on P4. But for the initial release, keeping Tofino in the hands of dev shops is a great idea. If for no other reason than to cut down on support costs.

The 25GbE Datacenter Pipeline

Posted on June 8, 2016 by networkingnerd

pipeline

SDN may have made networking more exciting thanks to making hardware less important than it has been in the past, but that’s not to say that hardware isn’t important at all. The certainty with which new hardware will come out and make things a little bit faster than before is right there with death and taxes. One of the big announcements yesterday from Hewlett Packard Enterprise (HPE) during HPE Discover was support for a new 25GbE / 100GbE switch architecture built around the FlexFabric 5950 and 12900 products. This may be the tipping point for things.

The Speeds of the Many

I haven’t always been high on 25GbE. Almost two years ago I couldn’t see the point. Things haven’t gotten much different in the last 24 months from a speed perspective. So why the change now? What make this 25GbE offering any different than things from the nascent ideas presented by Arista?

First and foremost, the 25GbE released by HPE this week is based on the Broadcom Tomahawk chipset. When 25GbE was first presented, it was a collection of vendors trying to convince you to upgrade to their slightly faster Ethernet. But in the past two years, most of the merchant offerings on the market have coalesced around using Broadcom as the primary chipset. That means that odds are good your favorite switching platform is running Trident 2 or Trident 2+ under the hood.

With Broadcom backing the silicon, that means wider adoption of the specification. Why would anyone buy 25GbE from Brocade or Dell or HPE if the only vendor supporting it was that vendor of choice? If you can’t ever be certain that you’ll have support for the hardware in three or five years time, making an investment today seems silly. Broadcom’s backing means that eventually everyone will be adopting 25GbE.

Likewise, one of my other impediments to adoption was the lack of server NICs to ramp hosts to 25GbE. Having fast access ports means nothing if the severs can’t take advantage of them. HPE addressed this with the release of FlexFabric networking adapters that can run 25GbE Ethernet. More importantly, those adapters (and switches) can run at 10GbE as well. This means that adoption of higher bandwidth is no longer an all-or-nothing proposition. You don’t have to abandon your existing investment to get to 25GbE right away. You don’t have to build a lab pod to test things and then sneak it into production. You can just buy a 5950 today and clock the ports down to 10GbE while you await the availability and purchasing cycle to buy 25GbE NICs. Then you can flip some switches in the next maintenance window and be running at 25GbE speeds. And you can leave some ports enabled at 10GbE to ensure that there is maximum backwards compatibility.

The Needs of the Few

Odds are good that 25GbE isn’t going to be right for you today. HPE is even telling people that 25GbE only really makes sense in a few deployment scenarios, among which are large container-based hosts running thousands of virtual apps, flash storage arrays that use Ethernet as a backplane, or specialized high-performance computing (HPC) tricks with RDMA and such. That means the odds are good that you won’t need 25GbE first thing tomorrow morning.

However, the need for 25GbE is going to be there. As applications grow more bandwidth hungry and data centers keep shrinking in footprint, the network hardware you do have left needs to work harder and faster to accomplish more with less. If the network really is destined to become a faceless underlay that serves as a utility for applications, it needs to run flat out fast to ensure that developers can’t start blaming their utility company for problems. Multi-core server architectures and flash storage have solved two of the three legs of this problem. 25GbE host connectivity and the 100GbE backbone connectivity tied to it, solve the other side of the equation so everything balances properly.

Don’t look at 25GbE as an immediate panacea for your problems. Instead, put it on a timeline with your other server needs and see what the adoption rate looks like going forward. If server NICs are bought in large quantities, that will drive manufactures to push the technology onto the server boards. If there is enough need for connectivity at these speeds the switch vendors will start larger adoption of Tomahawk chipsets. That cycle will push things forward much faster than the 10GbE / 40GbE marathon that’s been going on for the past six years.

Tom’s Take

I think HPE is taking a big leap with 25GbE. Until the Dell/EMC merger is completed they won’t find themselves in a position to adopt Tomahawk quickly in the Force10 line. That means the need to grab 25GbE server NICs won’t materialize if there’s nothing to connect them. Cisco won’t care either way so long as switches are purchased and all other networking vendors don’t sell servers. So that leaves HPE to either push this forward to fall off the edge of the cliff. Time will tell how this will all work out, but it would be nice to see HPE get a win here and make the network the least of application developer problems.

Disclaimer

I was a guest of Hewlett Packard Enterprise for HPE Discover 2016. They paid for my travel, hotel, and meals during the event. While I was briefed on the solution discussed here and many others, there was no expectation of coverage of the topics discussed. HPE did not ask for, nor were they guaranteed any consideration in the writing of this article. The conclusions and analysis contained herein are mine and mine alone.

Flash Needs a Highway

Posted on June 1, 2016 by networkingnerd

CarLights

Last week at Storage Field Day 10, I got a chance to see Pure Storage and their new FlashBlade product. Storage is an interesting creature, especially now that we’ve got flash memory technology changing the way we think about high performance. Flash transformed the industry from slow spinning gyroscopes of rust into a flat out drag race to see who could provide enough input/output operations per second (IOPS) to get to the moon and back.

Take a look at this video about the hardware architecture behind FlashBlade:

It’s pretty impressive. Very fast flash storage on blades that can outrun just about anything on the market. But this post isn’t really about storage. It’s about transport.

Life Is A Network Highway

Look at the backplane of the FlashBlade chassis. It’s not something custom or even typical for a unit like that. The key is when the presenter says that the architecture of the unit is more like a blade server chassis than a traditional SAN. In essence, Pure has taken the concept of a midplane and used it very effectively here. But their choice of midplane is interesting in this case.

Pure is using the Broadcom Trident II switch as their networking midplane for FlashBlade. That’s pretty telling from a hardware perspective. Trident II runs a large majority of switches in the market today that are merchant silicon based. They are essentially becoming the Intel of the switch market. They are supplying arms to everyone that wants to build something quickly at low cost without doing any kind of custom research and development of their own silicon manufacturing.

Using a Trident II in the backplane of the FlashBlade means that Pure evaluated all the alternatives and found that putting something merchant-based in the midplane is cost effective and provides the performance profile necessary to meet the needs of flash storage. Saturating backplanes with IOPS can be accomplished. But as we learned from Coho Data, it takes a lot of CPU horsepower to create a flash storage system that can saturate 10Gig Ethernet links.

I Am Speed

Using Trident II as a midplane or backplane for devices like this has huge implications. First and foremost, networking technology has a proven track record. If Trident II wasn’t a stable and reliable platform, no one would have used it in their products. And given that almost everyone in the networking space has a Trident platform for sale, it speaks volumes about reliability.

Second, Trident II is available. Broadcom is throwing these units off the assembly line as fast as they can. That means that there’s no worry about silicon shortages or plant shutdowns or any one of a number of things that can affect custom manufacturing. Even if a company wants to look at a custom fabrication, it could take months or even years to bring things online. By going with a reference design like Trident II, you can have your software engineers doing the hard work of building a system to support your hardware. That speeds time to market.

Third, Trident is a modular platform. That part can’t be understated even though I think it wasn’t called out very much in the presentation from Pure. By having a midplane that is built as a removable module, it’s very easy to replace it should problems arise. That’s the point of field replaceable units (FRUs). But in today’s market, it’s just as easy to create a system that can run multiple different platforms as well. The blade chassis idea extends equally to both blades and mid or backplanes.

Imagine being able to order a Tomahawk-based controller unit for FlashBlade that only requires you to swap the units at the back of the system. Now, that investment in 10Gig blade connectivity with 40Gig uplinks just became 25Gig blade connectivity with 100Gig uplinks to the network. All for the cost of two network controller blades. There may be some software that needs to be written to make the transition smooth for the consumers in the system, but the hardware is more than capable of supporting a change like that.

Tom’s Take

I was thrilled to see Pure Storage building a storage platform that tightly integrates with networking the way that FlashBlade does. This is how the networking stack is going to be completely integrated with storage and compute. We should still look at things through the lens of APIs and programmability, but making networking and consistent transport layer for all things in the datacenter is a good start.

The funny thing about making something a consistent transport layer is that by design it has to be extensible. That means more and more folks are going to be driving those pieces into the network. Software can be created on top of this common transport to differentiate, much like we’re seeing with network operating systems right now. Even Pure was able to create a different kind of transport protocol to do the heavy lifting at low latency.

It’s funny that it took a presentation from a storage company to make me see the value of the network as something agnostic. Perhaps I just needed some perspective from the other side of the fence.

The Death of TRILL

Posted on May 11, 2016 by networkingnerd

wasteland_large

Networking has come a long way in the last few years. We’ve realized that hardware and ASICs aren’t the constant that we could rely on to make decisions in the next three to five years. We’ve thrown in with software and the quick development cycles that allow us to iterate and roll out new features weekly or even daily. But the hardware versus software battle has played out a little differently than we all expected. And the primary casualty of that battle was TRILL.

Symbiotic Relationship

Transparent Interconnection of Lots of Links (TRILL) was proposed as a solution to the complexity of spanning tree. Radia Perlman realized that her bridging loop solution wouldn’t scale in modern networks. So she worked with the IEEE to solve the problem with TRILL. We also received Shortest Path Bridging (SPB) along the way as an alternative solution to the layer 2 issues with spanning tree. The motive was sound, but the industry has rejected the premise entirely.

Large layer 2 networks have all kinds of issues. ARP traffic, broadcast amplification, and many other numerous issues plague layer 2 when it tries to scale to multiple hundreds or a few thousand nodes. The general rule of thumb is that layer 2 broadcast networks should never get larger than 250-500 nodes lest problems start occurring. And in theory that works rather well. But in practice we have issues at the software level.

Applications are inherently complicated. Software written in the pre-Netflix era of public cloud adoption doesn’t like it when the underlay changes. So things like IP addresses and ARP entries were assumed to be static. If those data points change you have chaos in the software. That’s why we have vMotion.

At the core, vMotion is a way for software to mitigate hardware instability. As I outlined previously, we’ve been fixing hardware with software for a while now. vMotion could ensure that applications behaved properly when they needed to be moved to a different server or even a different data center. But they also required the network to be flat to overcome limitations in things like ARP or IP. And so we went on a merry journey of making data centers as flat as possible.

The problem came when we realized that data centers could only be so flat before they collapsed in on themselves. ARP and spanning tree limited the amount of traffic in layer 2 and those limits were impossible to overcome. Loops had to be prevented, yet the simplest solution disabled bandwidth needed to make things run smoothly. That caused IEEE and IETF to come up with their layer 2 solutions that used CLNS to solve loops. And it was a great idea in theory.

The Joining

In reality, hardware can’t be spun that fast. TRILL was used as a reference platform for proprietary protocols like FabricPath and VCS. All the important things were there but they were locked into hardware that couldn’t be easily integrated into other solutions. We found ourselves solving problem after problem in hardware.

Users became fed up. They started exploring other options. They finally decided that hardware wasn’t the answer. And so they looked to software. And that’s where we started seeing the emergence of overlay networking. Protocols like VXLAN and NV-GRE emerged to tunnel layer 2 packets over layer 3 networks. As Ivan Pepelnjak is fond of saying layer 3 transport solves all of the issues with scaling. And even the most unruly application behaves when it thinks everything is running on layer 2.

Protocols like VXLAN solved an immediate need. They removed limitations in hardware. Tunnels and fabrics used novel software approaches to solve insurmountable hardware problems. An elegant solution for a thorny problem. Now, instead of waiting for a new hardware spin to fix scaling issues, customers could deploy solutions to fix the issues inherent in hardware on their own schedule.

This is the moment where software defined networking (SDN) took hold of the market. Not when words like automation and orchestration started being thrown about. No, SDN became a real thing when it enabled customers to solve problems without buying more physical devices.

Tom’s Take

Looking back, we realize now that building large layer 2 networks wasn’t the best idea. We know that layer 3 scales much better. Given the number of providers and end users running BGP to top-of-rack (ToR) switches, it would seem that layer 3 scales much better. It took us too long to figure out that the best solution to a problem sometimes takes a bit of thought to implement.

Virtualization is always going to be limited by the infrastructure it’s running on. Applications are only as smart as the programmer. But we’ve reached the point where developers aren’t counting on having access to layer 2 protocols that solve stupid decision making. Instead, we have to understand that the most resilient way to fix problems is in the software. Whether that’s VXLAN, NV-GRE, or a real dev team not relying on the network to solve bad design decisions.

OpenFlow Is The Viagra Of Networking

OpenFlow Is a Garlic Press, Not A Hammer

Tom’s Take

Share this:

Software Says

Affinity for Awesome

Tom’s Take

Share this:

Holy Hardware!

Software For The People

Tom’s Take

Share this:

A Foot In The Door

Tom’s Take

Share this:

Cisco Designates The Competition

Arista’s Three-Edged Sword

Tom’s Take

Share this:

Whither Whitebox?

Whither Wireless?

Tom’s Take

Share this:

Feature Presentation

We Can Rebuild It

Tom’s Take

Share this:

The Speeds of the Many

The Needs of the Few

Tom’s Take

Share this:

Life Is A Network Highway

I Am Speed

Tom’s Take

Share this:

Symbiotic Relationship

The Joining

Tom’s Take

Share this: