The Atomic Weight of Policy

Helium-atom

The OpenDaylight project put out a new element this week with their Helium release.  The second release is usually the most important, as it shows that you have a real project on your hands and not just a bunch of people coding in the back room to no avail.  Not that something like that was going to happen to ODL.  The group of people involved in the project have the force of will to change the networking world.

Helium is already having an effect on the market.  Brocade announced their Vyatta Controller last week, which is based on Helium code.  Here’s a handy video as well.  The other thing that Helium has brought forth is the ongoing debate about network policy.  And I think that little gem is going to have more weight in the long run than anything else.

The Best Policy

Helium contains group-based policies for making groups of network objects talk to each other.  It’s a crucial step to bring ODL from an engineering hobby project to a full-fledged product that can be installed by someone that isn’t a code wizard.  That’s because most of the rest of the world, including IT people, don’t speak in specific terms with devices.  They have an idea of what needs to be done and they rely on the devices to implement that idea.

Think about a firewall.  When is the last time you wrote a firewall rule by hand? Unless you are a CLI masochist, you use the GUI to craft a policy that says something like “prevent DNS queries from any device that isn’t a DNS server (referenced by this DNS Server object list)”.  We create those kinds of policies because we can’t account for every new server appearing on the network that wants to make a DNS query.  We block the ones that don’t need to be doing it, and we modify the DNS Server object list to add new servers when needed.

Yet, in networking we’re still playing with access lists and VLAN databases.  We must manually propagate this information throughout the network when updates occur.  Because no one relies on VTP to add and prune information in the network.  The tools we’ve been using to do this work are archaic and failure-prone at best.

Staying Neutral

The inclusion of policy components of Helium will go a long way to paving the road for more of the same in future releases.  Of course, there is already talk about Cisco’s OpFlex and how it didn’t make the cut for Helium despite being one of the most dense pieces of code proposed for ODL.  It’s good that Cisco and ODL both realized that putting out code that wasn’t ready was only going to hurt in the long run.  When you’ve had seven or eight major releases, you can lay an egg with a poorly implemented feature and it won’t be the end of the world.  But if you put out a stinker in the second release, you may not make it to the third.

But this isn’t about OpFlex.  Or Congress. Or any other policy language that might be proposed for ODL in the future.  It’s about making sure that ODL is a policy-driven controller infrastructure.  Markus Nispel of Extreme talked about SDN at Wireless Field Day 7.  In that presentation, he said that he thinks the industry will standardize on ODL as the northbound interface in the network.  For those not familiar, the northbound interface is the one that is user-facing.  We interact with the northbound controller interface while the southbound controller interface programs the devices.

ODL is making sure the southbound interface is OpenFlow.  What they need to do now is ensure the northbound interface can speak policy to the users configuring it.  We’ve all heard the rhetoric of “network engineers need to learn coding” or “non-programmers will be out of a job soon”.  But the harsher reality is that while network programmers are going to be very busy people on the backend, the day-to-day operations of the network will be handled by different teams.  Those teams don’t speak IOS, Junos, or OpenFlow.  They think in policy-based thoughts.

Ops teams don’t want to know how something is going to work when implemented.  They don’t want to spend hours troubleshooting why a VLAN didn’t populate only to find they typoed the number.  They want to plug information into a policy and let the controller do the rest.  That’s what Helium has started and what ODL represents.  An interface into the network for mortal Ops teams.  A way to make that work for everyone, whether it be an OpFlex interface into Cisco APIC programming ACI or a Congress interface to an NSX layer.  If you are a the standard controller, you need to talk to everyone no matter their language.

Tom’s Take

ODL is going to be the controller in SDN.  There is too much behind it to let it fail.  Vendors are going to adopt it as their base standard for SDN.  They may add bells and whistles but it will still be ODL underneath.  That means that ODL needs to set the standard for network interaction.  And that means policy.  Network engineers complain about fragility in networking and how it’s hard to do and in the very same breath say they don’t want the CLI to go away.  What they need to say is that policy gives everyone the flexibility to create robust, fault-tolerant configurations with a minimum of effort while still allowing for other interface options, like CLI and API.  If you are the standard, you can’t play favorites.  Policy is going to be the future of networking interfaces.  If you don’t believe that, you’ll find yourself quickly crushed under the weight of reality.

I Can’t Drive 25G

Ethernet

The race to make things just a little bit faster in the networking world has heated up in recent weeks thanks to the formation of the 25Gig Ethernet Consortium.  Arista Networks, along with Mellanox, Google, Microsoft, and Broadcom, has decided that 40Gig Ethernet is too expensive for most data center applications.  Instead, they’re offering up an alternative in the 25Gig range.

This podcast with Greg Ferro (@EtherealMind) and Andrew Conry-Murray (@Interop_Andrew) does a great job of breaking down the technical details on the reasoning behind 25Gig Ethernet.  In short, the current 10Gig connection is made of four multiplexed 2.5Gig connections.  To get to 25Gig, all you need to do is over clock those connections a little.  That’s not unprecedented, as 40Gig Ethernet accomplishes this by over clocking them to 10Gig, albeit with different optics.  Aside from a technical merit badge, one has to ask themselves “Why?”

High Hopes

As always, money is the factor here.  The 25Gig Consortium is betting that you don’t like paying a lot of money for your 40Gig optics.  They want to offer an alternative that is faster than 10Gig but cheaper than the next standard step up.  By giving you a cheaper option for things like uplinks, you gain money to spend on things.  Probably on more switches, but that’s beside the point right now.

The other thing to keep in mind, as mentioned on the Coffee Break podcast, is that the cable runs for these 25Gig connectors will likely be much shorter.  Short term that won’t mean much.  There aren’t as many long-haul connections inside of a data center as one might thing.  A short hop to the top-of-rack (ToR) switch, then another different hop to the end-of-row (EoR) or core switch.  That’s really about it.  One of the arguments against 40/100Gig is that it was designed for carriers for long-haul purposes.  25G can give you 60% of the speed of that link at a much lower cost.  You aren’t paying for functionality you likely won’t use.

Heavy Metal

Is this a good move?  That depends.  There aren’t any 25Gig cards for servers right now, so the obvious use for these connectors will be uplinks.  Uplinks that can only be used by switches that share 25Gig (and later 50Gig) connections.  As of today, that means you’re using Arista, Dell, or Brocade.  And that’s when the optics and switches actually start shipping.  I assume that existing switching lines will be able to retrofit with firmware upgrades to support the links, but that’s anyone’s guess right now.

If Mellanox and Broadcom do eventually start shipping cards to upgrade existing server hardware to 25Gig then you’ll have to ask yourself if you want to pursue the upgrade costs to drive that little extra bit of speed out of the servers.  Are you pushing the 10Gig links in your servers today?  Are they the limiting factor in your data center?  And will upgrading your servers to support twice the bandwidth per network connection help alleviate your bottlenecks? Or will they just move to the uplinks on the switches?  It’s a quandary that you have to investigate.  And that takes time and effort.


 

Tom’s Take

The very first thing I ever tweeted (4 years ago):

We’ve come a long way from ratified standards to deployment of 40Gig and 100Gig.  Uplinks in crowded data centers are going to 40Gig.  I’ve seen a 100Gig optic in the wild running a research network.  It’s interesting to see that there is now a push to get to a marginally faster connection method with 25Gig.  It reminds me of all the competing 100Mbit standards back in the day.  Every standard was close but not quite the same.  I feel that 25Gig will get some adoption in the market.  So now we’ll have to choose from 10Gig, 40Gig, or something in between to connect servers and uplinks.  It will either get sent to the standards body for ratification or die on the vine with no adoption at all.  Time will tell.

 

CCIE Version 5: Out With The Old

Cisco announced this week that they are upgrading the venerable CCIE certification to version five.  It’s been about three years since Cisco last refreshed the exam and several thousand people have gotten their digits.  However, technology marches on.  Cisco talked to several subject matter experts (SMEs) and decided that some changes were in order.  Here are a few of the ones that I found the most interesting.

CCIEv5 Lab Schedule

Time Is On My Side

The v5 lab exam has two pacing changes that reflect reality a bit better.  The first is the ability to take some extra time on the troubleshooting section.  One of my biggest peeves about the TS section was the hard 2-hour time limit.  One of my failing attempts had me right on the verge of solving an issue when the time limit slammed shut on me.  If I only had five more minutes, I could have solved that problem.  Now, I can take those five minutes.

The TS section has an available 30 minute overflow window that can be used to extend your time.  Be aware that time has to come from somewhere, since the overall exam is still eight hours.  You’re borrowing time from the configuration section.  Be sure you aren’t doing yourself a disservice at the beginning.  In many cases, the candidates know the lab config cold.  It’s the troubleshooting the need a little more time with.  This is a welcome change in my eyes.

Diagnostics

The biggest addition is the new 30-minute Diagnostic section.  Rather than focusing on problem solving, this section is more about problem determination.  There’s no CLI.  Only a set of artifacts from a system with a problem: emails, log files, etc.  The idea is that the CCIE candidate should be an expert at figuring out what is wrong, not just how to fix it.  This is more in line with the troubleshooting sections in the Voice and Security labs.  Parsing log files for errors is a much larger part of my time than implementing routing.  Teaching candidates what to look for will prevent problems in the future with newly minted CCIEs that can diagnose issues in front of customers.

Some are wondering if the Diagnostic section is going to be the new “weed out” addition, like the Open Ended Questions (OEQs) from v3 and early v4.  I see the Diagnostic section as an attempt to temper the CCIE with more real world needs.  While the exam has never been a test of ideal design, knowing how to fix a non-ideal design when problems occur is important.  Knowing how to find out what’s screwed up is the first step.  It’s high time people learned how to do that.

Be Careful What You Wish For

The CCIE v5 is seeing a lot of technology changes.  The written exam is getting a new section, Network Principles.  This serves to refocus candidates away from Cisco specific solutions and more toward making sure they are experts in networking.  There’s a lot of opportunity to reinforce networking here and not idle trivia about config minimums and maximums.  Let’s hope this pays off.

The content of the written is also being updated.  Cisco is going to make sure candidates know the difference between IOS and IOS XE.  Cisco Express Forwarding is going to get a focus, as is ISIS (again).  Given that ISIS is important in TRILL this could be an indication of where FabricPath development is headed.  The written is also getting more IPv6 topics.  I’ll cover IPv6 in just a bit.

The biggest change in content is the complete removal of frame relay.  It’s been banished to the same pile as ATM and ISDN.  No written, no lab.  In it’s place, we get Dynamic Multipoint VPN (DMVPN).  I’ve talked about why Frame Relay is on the lab before.  People still complained about it.  Now, you get your wish.  DMVPN with OSPF serves the same purpose as Frame Relay with OSPF.  It’s all about Stupid Router Tricks.  Using OSPF with DMVPN requires use of mGRE, which is a Non-Broadcast Multi-Access (NBMA) network.  Just like Frame Relay.  The fact that almost every guide today recommends you use EIGRP with DMVPN should tell you how hard it is to do.  And now you’re forced to use OSPF to simulate NBMA instead of Frame Relay.  Hope all you candidates are happy now.

vCCIE

The lab is also 100% virtual now.  No physical equipment in either the TS or lab config sections.  This is a big change.  Cisco wants to reduce the amount of equipment that needs to be physically present to build a lab.  They also want to be able to offer the lab in more places than San Jose and RTP.  Now, with everything being software, they could offer the lab at any secured PearsonVUE testing center.  They’ve tried in the past, but the access requirements caused some disaster.  Now, it’s all delivered in a browser window.  This will make remote labs possible.  I can see a huge expansion of the testing sites around the time of the launch.

This also means that hardware-specific questions are out.  Like layer 2 QoS on switches.  The last reason to have a physical switch (WRR and SRR queueing) is gone.  Now, all you are going to get quizzed on is software functionality.  Which probably means the loss of a few easy points.  With the removal of Frame Relay and L2 QoS, I bet that services section of the lab is going to be really fun now.

IPv6 Is Real

Now, for my favorite part.  The JNCIE has had a robust IPv6 section for years.  All routing protocols need to be configured for IPv4 and IPv6.  The CCIE has always had a separate IPv6 section.  Not any more.  Going forward in version 5, all routing tasks will be configured for v4 and v6.  Given that RIPng has been retired to the written exam only (finally), it’s a safe bet that you’re going to love working with OSPFv3 and EIGRP for IPv6.

I think it’s great that Cisco has finally caught up to the reality of the world.  If CCIEs are well versed in IPv6, we should start seeing adoption numbers rise significantly.  Ensuring that engineers know to configure v4 and v6 simultaneously means dual stack is going to be the preferred transition method.  The only IPv6-related thing that worries me is the inclusion of an item on the written exam: IPv6 Network Address Translation.  You all know I’m a huge fan of NAT.  Especially NAT66, which is what I’ve been told will be the tested knowledge.

Um, why?!? 

You’ve removed RIPng to the trivia section.  You collapsed multicast into the main routing portions.  You’re moving forward with IPv6 and making it a critical topic on the test.  And now you’re dredging up NAT?!? We don’t NAT IPv6.  Especially to another IPv6 address.  Unique Local Addresses (ULA) is about the only thing I could see using NAT66.  Ed Horley (@EHorley) thinks it’s a bad idea.  Ivan Pepelnjak (@IOSHints) doesn’t think fondly of it either, but admits it may have a use in SMBs.  And you want CCIEs and enterprise network engineers to understand it?  Why not use LISP instead?  Or maybe a better network design for enterprises that doesn’t need NAT66?  Next time you need an IPv6 SME to tell you how bad this idea is, call me.  I’ve got a list of people.


Tom’s Take

I’m glad to see the CCIE update.  Getting rid of Frame Relay and adding more IPv6 is a great thing.  I’m curious to see how the Diagnostic section will play out.  The flexible time for the TS section is way overdue.  The CCIE v5 looks to be pretty solid on paper.  People are going to start complaining about DMVPN.  Or the lack of SDN-related content.  Or the fact that EIGRP is still tested.  But overall, this update should carry the CCIE far enough into the future that we’ll see CCIE 60,000 before it’s refreshed again.

More CCIE v5 Coverage:

Bob McCouch (@BobMcCouch) – Some Thoughts on CCIE R&S v5

Anthony Burke (@Pandom_) – Cisco CCIE v5

Daniel Dib (@DanielDibSWE) – RS v5 – My Thoughts

INE – CCIE R&S Version 5 Updates Now Official

IPExpert – The CCIE Routing and Switching (R&S) 5.0 Lab Is FINALLY Here!

Will Dell Buy Aerohive?

DELL-Aerohive-Logo

One rumor I keep hearing about in the industry involves a certain buzzing wireless vendor and the world’s largest startup.  Acquisitions happen all the time.  Rumors of them are even more frequent.  But the more I thought about it, the more I realized this may be good for everyone.

Dell wants to own the stack from top to bottom.  In the past, they have had to partner with printer companies (Lexmark) and networking companies (Brocade and Juniper) to deliver parts of the infrastructure they couldn’t provide themselves.  In the case of printers, Dell found a way to build them on their own.  That reduced their reliance on Lexmark.  In the networking world, Dell shocked everyone by going outside their OEM relationship and buying Force10.  I’ve talked before about why the Force10 pickup was a better deal in the long run than Brocade.

Dell’s Desires

Dell needs specific pieces of the puzzle.  They don’t want to be encumbered with ancillary products that will need to be jettisoned later.  Buying Brocade would have required unwinding a huge fibre channel business.  In much the same way, I don’t think Dell will end up buying their current wireless OEM, Aruba Networks.  Aruba has decided to branch out past the doing simple wireless and moved into wired network switches and security and identity management programs like ClearPass.  Dell doesn’t want any of that.  They already have an issue integrating the Force10 networking expertise into the PowerConnect line.  I’ve been told in the past the FTOS will eventually come to PowerConnect, but that has yet to happen.  Integrating purchased companies isn’t easier.  That becomes exponentially harder the more product lines you have to integrate.

Aruba is too expensive for Dell to buy outright.  Michael Dell spent a huge chunk of his cash to get his company back from the shareholders.  He’s going to put it on a diet pretty soon.  I would expect to see a few product lines slimmed down or outright dropped.  That makes it tough to justify buying so much from another company.  Dell needs a scalpel, not a sledgehammer.

Aerohive’s Aspirations

Aerohive is the best target for Dell.  They are clearly fighting for third place in the wireless market behind Cisco and Aruba.  Aerohive has never been shy about punching above their weight.  They have the mentality of a scrappy terrier that won’t go down without a fight.  But, they are getting pressure to expand quickly across their product lines.  They took their time releasing an 802.11ac access point.  Their switching offering hasn’t caught on in the same way that of Aruba or Meraki (now a division of Cisco).

Aerohive is on the verge of going public.  I’m sure the infusion of cash would allow them to pay off some early investors as well as fund more development for 802.11ac Phase 2 gear and maybe a firewall offering.  The risk comes when you look at what happened to Ruckus Wireless shortly after their IPO.  While they did recover, it didn’t look very good for a company that supposedly did have a unique claim, their antenna design.  Aerohive is a cloud management platform like many others in the market.  You have to wonder how investors would view them.  Scrappy doesn’t sell stock.

Aerohive is now fighting in the new Gartner “Wired and Wireless Access” magic quadrant, which is an absolute disaster for everyone.  An analyst firm thinks that wireless is just like wired, so naturally it makes sense for AP vendors to start making switches, right?  Except the people who are really brilliant when it comes to wireless, like Matthew Gast and Victor Shtrom couldn’t care less about bits on copper.  They’ve spent the better part of their careers solving the RF problems in the world.  And now someone tells them that interference problems aren’t that much different than spanning tree?  I would have long since planted my head permanently onto my desk if I’d been told that in their position.

Aerohive gains a huge backer in the fight if Dell acquires them.  They get the name to go up against Cisco/Meraki.  The gain R&D from Dell with expertise around cloud management.  They can start developing integration with HiveManager and Dell’s SMB extensive product line.  Switch supply becomes a thing of the past.  Their entire software offering fits well with what Dell is trying to accomplish from a device independence perspective with regards to customers.

Tom’s Take

I don’t put much stock in random rumors.  But I’ve heard this one come up enough to make me ask some tough questions.  There are people in both camps that think it will happen sometime in 2014.  Dell has to get the books sorted out and figure out who’s in charge of buying things.  Aerohive has to see if there’s enough juice left in the market to IPO and not look foolish.  Maybe Dell needs to run the numbers and find out what it would take to cash out Aerohive’s investors and add the company to the growing Empire of Round Rock.  A little buzz for the World’s Largest Startup couldn’t hurt.

Avaya and the Magic of SPB

Avaya_logo-wpcf_200x57

I was very interested to hear from Avaya at Interop New York.  They were the company I knew the least about.  I knew the most about them from the VoIP side of the house, but they’ve been coming on strong with networking as well.  They are one of the biggest champions of 802.1aq, more commonly known as Shortest Path Bridging (SPB).  You may remember that I wrote a bit about SPB in the past and referred to it as the Betamax of networking fabric technologies.  After this presentation, I may be forced to eat my words to a degree.

Paul Unbehagen really did a great job with this presentation.  There were no slides, but he kept the attention of the crowd.  The whiteboard supported his message.  While informal, there was a lot of learning.  Paul knows SPB.  It’s always great to learn from someone that knows the protocol.

Multicast Magic

One of the things I keyed on during the presentation was the way that SPB deals with multicast.  Multicast is a huge factor in Ethernet today.  So much so that even the cheapest SOHO Ethernet switch has a ton of multicast optimization.  But multicast as implemented in enterprises is painful.  If you want to make an engineer’s blood run cold, walk up and whisper “PIM“.  If you want to watch a nervous breakdown happen in real time, follow that up with “RPF“.

RPF checks in multicast PIM routing are nightmarish.  It would be wonderful to get rid of RPF checks to eliminate any loops in the multicast routing table.  SPB accomplishes that by using a Dijkstra algorithm.  The same algorithm that OSPF and IS-IS use to compute paths.  Considering the heavily roots of IS-IS in SPB, that’s not surprising.  The use of Dijkstra means that additional receivers on a multicast tree don’t negatively effect the performance of path calculation.

I’ve Got My IS-IS On You

In fact, one of the optimized networks that Paul talked about involved surveillance equipment.  Video surveillance units that send via multicast have numerous endpoints and only a couple of receivers on the network.  In other words, the exact opposite problem multicast was designed to solve.  Yet, with SPB you can create multicast distribution networks that allow additional end nodes to attach to a common point rather than talking back to a rendezvous point (RP) and getting the correct tree structure from there.  That means fast convergence and simple node addition.

SPB has other benefits as well.  It supports 16.7 million ISIDs, which are much like VLANs or MPLS tags.  This means that networks can grow past the 4,096 VLAN limitation.  It looks a lot like VxLAN to me.  Except for the reliance on multicast and lack of a working implementation.  SPB allows you to use a locally significant VLAN for a service and then defined an ISID that will transport across the network to be decapsulated on the other side in a totally different VLAN that is attached to the ISID.  That kind of flexibility is key for deployments in existing, non-green field environments.

If you’d like to learn more about Avaya and their SPB technology, you can check them out at http://www.avaya.com.  You can also follow them on Twitter as @Avaya.


Tom’s Take

Paul said that 95% of all SPB implementations are in the enterprise.  That shocked me a bit, as I always thought of SPB as a service provider protocol.  I think the key comes down to something Paul said in the video.  When we are faced with applications or additional complexity today, we tend to just throw more headers at the problem.  We figured that wrapping the whole mess in a new tag or a new tunnel will take care of everything.  At least until it all collapses into a puddle.  Avaya’s approach with SPB was to go back down to the lower layers and change the architecture of things to optimize everything and make it work the right way on all kinds of existing hardware.  To quote Paul, “In the IEEE, we don’t build things for the fun it.”  That means SPB has their feet grounded in the right place.  Considering how difficult things can be in data center networking, that’s magical indeed.

Tech Field Day Disclaimer

Avaya was a presenter at the Tech Field Day Interop Roundtable.  They did not ask for any consideration in the writing of this review nor were they promised any.  The conclusions and analysis contained in this post are mine and mine alone.

HP Networking and the Software Defined Store

HP

HP has had a pretty good track record with SDN.  Even if it’s not very well-known.  HP has embraced OpenFlow on a good number of its Procurve switches.  Given the age of these devices, there’s a good chance you can find them laying around in labs or in retired network closets to test with.  But where is that going to lead in the long run?

HP Networking was kind enough to come to Interop New York and participate in a Tech Field Day roundtable.  It had been a while since I talked to their team.  I wanted to see how they were handling the battle being waged between OpenFlow proponents like NEC and Brocade, Cisco and their hardware focus, and VMware with NSX.  Jacob Rapp and Chris Young (@NetManChris) stepped up to the plate to talk about SDN and the vision on HP.

They cover a lot of ground in here.  Probably the most important piece to me is the SDN app store.

The press picked up on this quickly.  HP has an interesting idea here.  I should know.  I mentioned it in passing in an article I wrote a month ago.  The more I think about the app store model, the more I realize that many vendors are going to go down the road.  Just not in the way HP is thinking.

HP wants to curate content for enterprises.  They want to ensure that software works with their controller to be sure that there aren’t any hiccups in implementation.  Given their apparent distaste for open source efforts, it’s safe to say that their efforts will only benefit HP customers.  That’s not to say that those same programs won’t work on other controllers.  So long as they operate according to the guidelines laid down by the Open Networking Foundation, all should be good.

Show Me The Money

Where’s the value then?  That’s in positioning the apps in the store.  Yes, you’re going to have some developers come to HP and want to simple apps to put in the store.  Odds are better that you’re going to see more recognizable vendors coming to the HP SDN store.  People are more likely to buy software from a name they recognize, like TippingPoint or F5.  That means that those companies are going to want to have a prime spot in the store.  HP is going to make something from hosting those folks.

The real revenue doesn’t come from an SMB buying a load balancer once.  It comes from a company offering it as a service with a recurring fee.  The vendor gets a revenue stream. HP would be wise to work out a recurring fee as well.  It won’t be the juicy 30% cut that Apple enjoys from their walled garden, but anything would be great for the bottom line.  Vendors win from additional sales.  Customers win from having curated apps that work every time that are easy to purchase, install, and configure.  HP wins because everyone comes to them.

Fragmentation As A Service

Now that HP has jumped on the idea of an enterprise-focused SDN app store, I wonder which company will be the next to offer one?  I also worry that having multiple app stores won’t end up being cumbersome in the long run.  Small developers won’t like submitting their app to four or five different vendor-affiliated stores.  More likely they’ll resort to releasing code on their own rather than jump through hoops.  That will eventually lead to support fragmentation.  Fragmentation helps no one.


Tom’s Take

HP Networking did a great job showcasing what they’ve been doing in SDN.  It was also nice to hear about their announcements the day before they broke wide to the press.  I think HP is going to do well with OpenFlow on their devices.  Integrating OpenFlow visibility into their management tools is also going to do wonders for people worried about keeping up with all the confusing things that SDN can do to a traditional network.  The app store is a very intriguing concept that bears watching.  We can only hope that it ends up being a well-respect entry in a long line of easing customers into the greater SDN world.

Tech Field Day Disclaimer

HP was a presenter at the Tech Field Day Interop Roundtable.  In addition, they also provided the delegates a 1TB USB3 hard disk drive.  They did not ask for any consideration in the writing of this review nor were they promised any.  The conclusions and analysis contained in this post are mine and mine alone.

SDN and NFV – The Ups and Downs

TopSDNBottomNFV

I was pondering the dichotomy between Software Defined Networking (SDN) and Network Function Virtualization (NFV) the other day.  I’ve heard a lot of vendors and bloggers talking about how one inevitably leads to the other.  I’ve also seen a lot of folks saying that the two couldn’t be further apart on the scale of software networking.  The more I thought about these topics, the more I realized they are two sides of the coin.  The problem, at least in my mind, is the perspective.

SDN – Planning The Paradigm

Software Defined Networking telegraphs everything about what it is trying to accomplish right there in the name.  Specifically, the “Definition” part of the phrase.  I’ve made jokes in the past about the lack of definition in SDN as vendors try to adapt their solutions to fit the buzzword mold.  What I finally came to realize is that the SDN folks are all about definition. SDN is the Top Down approach to planning.  SDN seeks to decompose the network into subsystems that can be replaced or reprogrammed to suit the needs of those things which utilize the network.

As an example, SDN breaks the idea of switch down into things like “forwarding plane” and “control plane” and seeks to replace the control plane with alternative software, whether it be a controller-based architecture like OpenFlow or an overlay network similar to that of VMware/Nicira.  We can replace the OS of a switch with a concept like OpenFlow easily.  It’s just a mechanism for determining which entries are populated in the Content Addressable Memory (CAM) tables of the forwarding plane.  In top down design, it’s easy to create a stub entry or “black box” to hold information that flows into it.  We don’t particularly care how the black box works from the top of the design, just that it does its job when called upon.

Top Down designs tend to run into issues when those black boxes lack detail or are missing some critical functionality.  What happens when OpenFlow isn’t capable of processing flows fast enough to keep the CAM table of a campus switch populated with entries?  Is the switch going to fall back to process switching the packets?  That could be an big issue.  Top Down designs are usually very academic and elegant.  They also have a tendency to lack concrete examples and real world experience.  When you think about it, that does speak a lot about the early days of SDN – lots of definition of terminology and technology, but a severe lack of actual packet forwarding.

NFV – Working From The Ground Up

Network Function Virtualization takes a very different approach to the idea of turning hardware networks into software networks.  The driving principle behind NFV is replication of existing technology in a software state.  This is classic Bottom Up design.  Rather than spending a large amount of time planning and assembling the perfect system, Bottom Up designers tend to build as they go.  They concentrate on making something work first, then making their things work together second.

NFV is great for hands-on folks because it gives concrete, real results almost immediately. Once you’ve converted an load balancer or a router to a purely software-based construct you can see right away how it works and what the limitations might be.  Does it consume too many resources on the hypervisor?  Does it excel at forwarding small packets?  Does switching a large packet locally cause a fault?  These are problems that can be corrected in the individual system rapidly rather than waiting to modify the overall plan to account for difficulties in the virtualization process.

Bottom Up design does suffer from some issues as well.  The focus in Bottom Up is on getting things done on a case-by-case basis.  What do you do when you’ve converted all your hardware to software?  Do your NFV systems need to talk to one another?  That’s usually where Bottom Up design starts breaking down.  Without a grand plan at a higher level to ensure that systems can talk to each other this design methodology falls back to a series of “hacks” to get them connected.  Units developed in isolation aren’t required to play nice with everyone else until they are forced to do so.  That leads to increasing complex and fragile interconnection systems that could fail spectacularly should the wrong thread be yanked with sufficient force.


Tom’s Take

Which method is better?  Should we spend all our time planning the system and hope that our Powerpoint Designs work the right way when someone codes them in a few months?  Or should we say “damn the torpedoes” and start building things left and right and hope that someone will figure out a way to tie all these individual pieces together at some point?

Surprisingly, the most successful design requires elements of both.  People need to have a basic plan at the least when starting out on a plan to change the networking world.  Once the ideas are sketched out, you need a team of folks willing to burn the midnight oil and get the ideas implemented in real life to ensure that the plan works the right way.  The guidance from the top is essential to making sure everything works together in the end.

Whether you are leading from the top or the bottom, remember that everything has to meet in the middle sooner or later.