Nutanix and Plexxi – An Affinity to Converge

Nutanix has been lighting the hyperconverged world on fire as of late. Strong sales led to a big IPO. They are in a lot of conversations about using their solution in place of large traditional virtualization offerings built on blade servers or big standalone boxes. And coming off the recent Nutanix .NEXT conference, there were some big announcements in the networking arena to help them complete their total solution. However, I think Nutanix is missing a big opportunity that’s right in front of them.

I think it’s time for Nutanix to buy Plexxi.

Software Says

If you look at the Nutanix announcements around networking from .NEXT, they look very familiar to anyone in the server space. The highlights include service chaining, microsegmentation, and monitoring all accessible through an API. If this sounds an awful lot like VMware NSX, Cisco ACI, or any one of a number of new networking companies then you are in the right mode of thinking as far as Nutanix is concerned.

SDN in the server space is all about overlay networking. Segmentation of flows and service chaining are the reasons why security is so hard to do in the networking space today. Trying to get traffic to behave in a certain way drives networking professionals nuts. Monitoring all of that to ensure that you’re actually doing what you say you’re doing just adds complexity. And the API is the way to do all of that without having to walk down to the data center, console into a switch, and learn a new non-Linux CLI command set.
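If you want a feel for what “all accessible through an API” means in practice, it looks something like the call below. The endpoint, token, and payload schema are hypothetical stand-ins rather than Nutanix’s actual API; only the general shape of a microsegmentation policy push is the point.

```python
# Hypothetical sketch of an API-driven microsegmentation policy push.
# The controller URL, auth header, and payload fields are all assumptions.
import requests

API = "https://mgmt.example.com/api/networking/v1"   # hypothetical controller
HEADERS = {"Authorization": "Bearer <token>", "Content-Type": "application/json"}

# Allow the web tier to reach the app tier on TCP/443 and drop everything else.
policy = {
    "name": "web-to-app",
    "source_group": "web-tier",
    "destination_group": "app-tier",
    "services": [{"protocol": "TCP", "port": 443}],
    "action": "ALLOW",
    "default_action": "DROP",
}

resp = requests.post(f"{API}/segmentation-policies", json=policy,
                     headers=HEADERS, timeout=10)
resp.raise_for_status()
print("policy created:", resp.json().get("uuid", "<unknown>"))
```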

SDN vendors like VMware and Cisco naturally jumped on these complaints and difficulties in the networking world, and both have offered solutions for them with NSX and ACI. For Nutanix to have bundled similar capabilities into their networking offering is no accident. They are looking to battle VMware head-to-head, and they need to offer the kind of feature parity it’s going to take to make medium and large shops shift their focus away from the VMware ecosystem and take a long look at what Nutanix is offering.

In a way, Nutanix and VMware are starting to reinforce the idea that the network isn’t a magical realm of protocols and tricks that make applications work. Instead, it’s a simple transport layer between locations. For instance, Amazon doesn’t rely on the magic of the interstate system to get your packages from the distribution center to your home. Instead, the interstate system is just a transport layer for their shipping overlays – UPS, FedEx, and so on. The overlay is where the real magic is happening.

Nutanix doesn’t care what your network looks like. They can do almost everything on top of it with their overlay protocols. That would seem to suggest that the focus going forward should be to marginalize or outright ignore the lower layers of the network in favor of something that Nutanix has visibility into and can offer control and monitoring of. That’s where the Plexxi play comes into focus.

Affinity for Awesome

Plexxi has long been a company in search of a way to sell what they do best. When I first saw them years ago, they were touting their Affinities idea as a way to build fast pathways between endpoints to provide better performance for applications that naturally talked to each other. This was a great idea back then. But it quickly got overshadowed by the other SDN solutions out there. It even caused Plexxi to go down a slightly different path for a while, looking at other options to compete in a market where they didn’t really have a perfect-fit product.

But the Affinities idea is perfect for hyperconverged solutions. Companies like Nutanix are marketing their solutions as the way to create application-focused compute nodes on-site without the need to mess with the cloud. It’s a scalable approach that leads to adding more nodes as your needs expand. Hyperconverged was designed to be consumable per compute unit as opposed to massively scaling out in leaps and bounds.

Plexxi Affinities is just the tip of the iceberg. Plexxi’s networking connectivity also gives Nutanix the ability to build out a high-speed interconnect network with one advantage – noninterference. I’m speaking about what happens when a customer needs to add more networking ports to support this architecture. They need to make a call to their Networking Vendor of Choice. In the case of Cisco, HPE, or others, that call will often involve a conversation about what they’re doing with the new network followed by a sales pitch for their hyperconverged solution or a partner solution that benefits both companies. Nutanix has a reputation for being the disruptor in traditional IT. The more they can keep their traditional competitors out of the conversation, the more likely they are to keep the business into the future.


Tom’s Take

Plexxi is very much a company with an interesting solution in need of a friend. They aren’t big enough to really partner with hyperconverged solutions, and most of the hyperconverged market at this point is either cozy with someone else or not looking to make big purchases. Nutanix has the rebel mentality. They move fast and strike quickly to get their deals done. They don’t take prisoners. They look to make a splash and get people talking. The best way to keep that up is to bundle a real non-software networking component alongside a solution that will make the application owners happy and keep the conversation focused on a single source. That’s how Cisco did it back in the day and how VMware has climbed to the top of the virtualization market.

If Nutanix were to spend some of that nice IPO money on a Plexxi Christmas present, I think 2017 would be the year that Nutanix stops being discussed in hushed whispers and becomes a real force to be reckoned with up and down the stack.

Meraki Will Never Be A Large Enterprise Solution

Thanks to a couple of recent conversations, I thought it was time to stir the wireless pot a little. First was my retweet of an excellent DNS workaround post from Justin Cohen (@CanTechIt). One of the responses I got came from wireless luminary Andrew von Nagy (@RevolutionWifi):

This echoed some of the comments that I heard from Sam Clements (@Samuel_Clements) and Blake Krone (@BlakeKrone) during this video from Cisco Live Milan in January:

During that video, you can hear Sam and Blake asking for a few features that aren’t really supported on Meraki just yet. And it all comes down to a simple issue.

Should It Just Work?

Meraki has had a very simple guiding philosophy since the very beginning. Things should be easy to configure and work without hassle for their customers. It’s something we see over and over again in technology. From Apple to Microsoft, the focus has shifted away from complexity and toward simplicity. Gone are the fields of radio buttons and obscure text fields. In their place we find simple binary choices: “Do You Want To Do This Thing? YES/NO”.

Meraki believes that the more complicated configuration items confuse users and lead to support issues down the road. And in many ways they are absolutely right. If you’ve ever seen someone freeze up in front of a Coke Freestyle machine, you know how easy it is to be overwhelmed by the power of choice.

In a small business or small enterprise environment, you just need things to work. A business without a dedicated IT department doesn’t need to spend hours figuring out how to disable 802.11b data rates to increase performance. That SMB/SME market has historically been the one that Meraki sells into better than anyone else. The times are changing though.

Exceptions Are Rules?

Meraki’s acquisition by Cisco has raised their profile and provided a huge new sales force to bring their hardware and software to the masses. The software in particular is a tipping point for a lot of medium and large enterprises. Meraki makes it easy to configure and manage large access point deployments. And nine times out of ten, their user interface provides everything a person could need for configuration.

Notice that was “nine times out of ten”. In an SME, that one time out of ten that something more was needed could happen once or twice in the lifetime of a deployment. In a large enterprise, that one time out of ten could happen once a month or even once a week. With a huge number of clients accessing the system for long periods of time, the statistical probability that an advanced feature will need to be configured does approach certainty quickly.
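To put some rough numbers behind “approaches certainty quickly,” assume the one-in-ten figure holds independently for each month a large deployment is in service. That independence is my assumption purely for the arithmetic; the post only claims one time out of ten.

```python
# Back-of-the-envelope math for "approaches certainty quickly".
p_per_month = 0.10   # the "one time out of ten" above, assumed per month

def p_at_least_once(months: int, p: float = p_per_month) -> float:
    """Probability the advanced feature is needed at least once in `months`."""
    return 1 - (1 - p) ** months

for months in (1, 6, 12, 24):
    print(f"{months:2d} months -> {p_at_least_once(months):.1%} chance")
# 1 -> 10.0%, 6 -> 46.9%, 12 -> 71.8%, 24 -> 92.0%
```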

Meraki doesn’t have a way to handle these exceptions currently. They have an excellent feature request system in their “Make A Wish” feedback system, but the tipping point required for a feature to be implemented in a new release doesn’t have a way to be weighted for impact. If two hundred people ask for a feature and the average number of access points in their networks is less than five, it reflects differently than if ten people ask for a feature with an average of one thousand access points per network. It is important to realize that enterprises can scale up rapidly and they should carry a heavier weight when feature requests come in.
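Here is the weighting argument in miniature, using the exact numbers above. The scoring itself is my own invention; the only point is that scale has to show up somewhere in the math.

```python
# Count the same wishes two ways: once raw, once weighted by the APs affected.
small_shops = [5] * 200       # two hundred requests from ~5-AP networks
large_shops = [1000] * 10     # ten requests from ~1,000-AP networks

def raw_votes(reqs):
    """Every wish counts the same, regardless of who made it."""
    return len(reqs)

def ap_weighted_votes(reqs):
    """Every wish carries the number of APs it would actually affect."""
    return sum(reqs)

print("raw votes:  ", raw_votes(small_shops), "vs", raw_votes(large_shops))                  # 200 vs 10
print("AP-weighted:", ap_weighted_votes(small_shops), "vs", ap_weighted_votes(large_shops))  # 1000 vs 10000
```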

That’s not to say that Meraki should go the same route as Cisco Unified Communications Manager (CUCM). Several years ago, I wrote about CSCsb42763 which is a bug ID that enables a feature by typing that code into an obscure text field. It does enable the feature, but you have no idea what or how or why. In fact, if it weren’t for Google or a random call to TAC, you’d never even know about the feature. This is most definitely not the way to enable advanced features.

Making It Work For Me

Okay, the criticism part is over. Now for the constructive part. Because complaining without offering a solution is just whining.

Meraki can fix their issues with large enterprises by offering a “super config mode” to users that have been trained. It’s actually not that far away from how they validate licenses today. If you are listed as an admin on the system and you have a Meraki Master ID under your profile then you get access to the extra config mode. This would benefit both enterprise admins as well as partners that have admin accounts on customer systems.
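Conceptually, the gate is about this simple. The admin model and the “Meraki Master” certification check below are stand-ins I made up to illustrate the idea, not how the Dashboard actually represents accounts.

```python
# Hypothetical sketch of a "super config mode" gate tied to a certification ID.
from dataclasses import dataclass, field

@dataclass
class DashboardAdmin:
    email: str
    is_org_admin: bool = False
    certification_ids: set = field(default_factory=set)

def can_use_super_config(admin: DashboardAdmin) -> bool:
    """Advanced knobs require org admin rights plus a Master cert on the profile."""
    return admin.is_org_admin and any(
        cert.startswith("MERAKI-MASTER-") for cert in admin.certification_ids
    )

partner_engineer = DashboardAdmin(
    email="eng@partner.example",
    is_org_admin=True,
    certification_ids={"MERAKI-MASTER-0042"},
)
print(can_use_super_config(partner_engineer))   # True: show the extra config mode
```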

This would also be a boon for the Meraki training program. Sure, having another piece of paper is nice. But what if all that hard work actually paid off with better configuration access to the system? What if it meant less need to call support, instead of just slightly better access to engineers? If you can give people what they need to fix their problems without calling for support, they will line up outside your door to get it.

If Meraki isn’t willing to take that giant leap just yet, another solution would be to weight the “Make A Wish” suggestions based on the number of APs covered by the user. They might even do this now. But it would be nice to know, as a large enterprise end user, that my feature requests are being taken under more critical advisement than those from a few people with less than a dozen APs. Scale matters.


Tom’s Take

Yes, the headline is a bit of clickbait. I don’t think it would have had quite the same impact if I’d titled it “How Meraki Can Fix Their Enterprise Problems”. You, the gentle reader, would have looked at the article either way. But the people that need to see this wouldn’t have cared unless it looked like the sky was falling. So I beg your forgiveness for an indulgence to get things fixed for everyone.

I use Meraki gear at home. It works. I haven’t configured even 10% of what it’s capable of doing. But there are times when I go looking for a feature that I’ve seen on other enterprise wireless systems that’s just not there. And I know that it’s not there on purpose. Meraki does a very good job reaching the customer base that they have targeted for years. But as Cisco starts pushing their solutions further up the stack and selling Meraki into bigger and more complex environments, Meraki needs to understand how important it is to give those large enterprise users more control over their systems. Or “It Just Works” will quickly become “It Doesn’t Work For Me”.

Cisco Just Killed The CLI

Gallons of virtual ink have been committed to virtual paper in the last few days with regards to Cisco’s lawsuit against Arista Networks.  Some of it is speculating on the posturing by both companies.  Other writers talk about the old market vs. the new market.  Still others look at SDN as a driver.

I didn’t just want to talk about the lawsuit.  Given that Arista has marketed EOS as a “better IOS than IOS” for a while now, I figured Cisco finally decided to bite back.  They are fiercely protective of IOS, and they have to be because of the way trademark laws in the US work.  If you don’t go after people that infringe, you lose your standing to do so and invite others to do it as well.  Is Cisco’s timing suspect? One does have to wonder.  Is this about knocking out a competitor? It’s tough to say.  But one thing is sure to me.  Cisco has effectively killed the command line interface (CLI).

“Industry Standards”

EOS is certainly IOS-like.  While it does introduce some unique features (see the NFD3 video here), the command syntax is very much IOS.  That is purposeful.  There are two broad categories of CLIs in the market:

  • IOS-like – EOS, HP Procurve, Brocade, FTOS, etc
  • Not IOS-like – Junos, FortiOS, D-Link OS, etc

What’s funny is that the IOS-like interfaces have always been marketed as such.  Sure, there’s the famous “industry standard” CLI comment, followed by a wink and a nudge.  Everyone knows what OS is being discussed.  It is a plus point for both sides.

The non-Cisco vendors can sell to networking teams by saying that their CLI won’t change.  Everything will be just as easy to configure with just a few minor syntax changes.  Almost like speaking a different dialect of a language.  Cisco gains because more and more engineers become familiar with the IOS syntax.  Down the line, those engineers may choose to buy Cisco based on familiarity with the product.

If you don’t believe that being IOS-like is a strong selling point, take a look at PIX and Airespace.  The old PIX OS was transformed into something that looked a lot more like traditional IOS.  In ASA 8.3 they even changed the NAT code to look like IOS.  With Airespace it took a little longer to transform the alien CLI into something IOS-like.  They even lost functionality in doing so, simply to give networking teams an interface that is more friendly to them.  Cisco wants all their devices to run a CLI that is IOS-like.  Junos fans are probably snickering right now.

In calling out Arista for infringing on the “generic command line interface” in patent #7,047,526, Cisco has effectively said that they will start going after companies that copy the IOS interface too well.  This leaves companies in a bit of a conundrum.  How can you continue to produce an OS with an “industry standard” CLI and hope that you don’t become popular enough to get noticed by Cisco?  Granted, it seems that all network switching vendors are #2 in the market somehow.  But at what point does being a big enough #2 get the legal hammer brought to bear?  Do you have to be snarky in marketing messages? Attack the 800-pound gorilla enough that you anger them?  Or do you just have to have a wildly successful quarter?

Laid To REST

Instead, what will happen is a tough choice.  Either continue to produce the same CLI year after year and hope that you don’t get noticed, or overhaul the whole system.  Those that choose not to play Russian Roulette with the legal system have a further choice to make.  Should we create a new, non-infringing CLI from the ground up? Or scrap the whole idea of a CLI moving forward?  Both of those second choices are going to involve a lot of pain and effort.  One of them has a future.

Rewriting the CLI is a dead-end road.  By the time you’ve finished your Herculean task you’ll find the market has moved on to bigger and better things.  The SDN revolution is about making complex networks easier to program and manage.  Is that going to be accomplished via yet another syntax?  Or will it happen because of REST APIs and programming interfaces?  Given an equal amount of time and effort on both sides, the smart networking company will focus their efforts on scrapping the CLI and building programmability into their devices.  Sure, the 1.0 release is going to sting a little.  It’s going to require a controller and some rough interface conventions.  But building the seeds of a programmable system now means it will be growing while other CLIs are withering on the vine.
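To make the contrast concrete, here is what “scrap the CLI” looks like from the operator’s chair: the same change expressed as a REST call instead of a syntax. The controller URL, credentials, and resource schema below are hypothetical stand-ins, not any vendor’s actual API.

```python
# Hypothetical sketch: push a VLAN as an API resource instead of typing CLI.
import requests

CONTROLLER = "https://controller.example.com/api/v1"   # hypothetical controller
AUTH = ("admin", "admin")                               # lab-style credentials

vlan = {"id": 200, "name": "engineering", "ports": ["eth1/1", "eth1/2"]}

resp = requests.put(f"{CONTROLLER}/vlans/200", json=vlan, auth=AUTH, timeout=10)
resp.raise_for_status()
print("VLAN 200 programmed without a terminal session")
```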

It won’t be easy.  It won’t be fun.  And it’s a risk to alienate your existing customer base.  But if your options are to get sued or spend all your effort on a project that will eventually go the way of the dodo your options don’t look all that appealing anyway.  If you’re going to have to go through the upheaval of rewriting something from the ground up, why not choose to do it with an eye to the future?


Tom’s Take

Cisco and Arista won’t be finished for a while.  There will probably be a settlement or a licensing agreement or some kind of capitulation on both sides in a few years’ time.  But by that point, the fallout from the legal action will have finally finished off the CLI for good.  There’s no sense in gambling that you won’t be the next target of a process server.  The solution will involve innovative thinking, blood, sweat, and tears on the part of your entire development team.  But in the end you’ll have a modern system that works with the new wave of the network.  If nothing else, you can stop relying on the “industry standard” ploy when selling your interface and start telling your customers that you are setting the new standard.

CCIE Version 5: Out With The Old

Cisco announced this week that they are upgrading the venerable CCIE certification to version five.  It’s been about three years since Cisco last refreshed the exam and several thousand people have gotten their digits.  However, technology marches on.  Cisco talked to several subject matter experts (SMEs) and decided that some changes were in order.  Here are a few of the ones that I found the most interesting.

Time Is On My Side

The v5 lab exam has two pacing changes that reflect reality a bit better.  The first is the ability to take some extra time on the troubleshooting section.  One of my biggest peeves about the TS section was the hard 2-hour time limit.  One of my failing attempts had me right on the verge of solving an issue when the time limit slammed shut on me.  If I only had five more minutes, I could have solved that problem.  Now, I can take those five minutes.

The TS section has an available 30-minute overflow window that can be used to extend your time.  Be aware that the time has to come from somewhere, since the overall exam is still eight hours.  You’re borrowing time from the configuration section.  Be sure you aren’t doing yourself a disservice at the beginning.  In many cases, the candidates know the lab config cold.  It’s the troubleshooting they need a little more time with.  This is a welcome change in my eyes.
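The arithmetic is worth spelling out, because the overflow is not free time. A quick sketch, assuming the structure described in this post: eight hours total, a fixed 30-minute Diagnostic section (covered below), and a 2-hour TS section with an optional 30-minute overflow.

```python
# Where the borrowed 30 minutes come from (all values in hours).
TOTAL = 8.0
DIAGNOSTIC = 0.5                     # fixed, can't be shortened
TS_BASE, TS_OVERFLOW = 2.0, 0.5

config_normal = TOTAL - DIAGNOSTIC - TS_BASE          # 5.5 hours
config_after_overflow = config_normal - TS_OVERFLOW   # 5.0 hours

print(f"Config time if TS stays inside 2 hours: {config_normal} hours")
print(f"Config time if you take the overflow:   {config_after_overflow} hours")
```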

Diagnostics

The biggest addition is the new 30-minute Diagnostic section.  Rather than focusing on problem solving, this section is more about problem determination.  There’s no CLI.  Only a set of artifacts from a system with a problem: emails, log files, etc.  The idea is that the CCIE candidate should be an expert at figuring out what is wrong, not just how to fix it.  This is more in line with the troubleshooting sections in the Voice and Security labs.  Parsing log files for errors is a much larger part of my time than implementing routing.  Teaching candidates what to look for will prevent future problems with newly minted CCIEs that can’t diagnose issues in front of customers.

Some are wondering if the Diagnostic section is going to be the new “weed out” addition, like the Open Ended Questions (OEQs) from v3 and early v4.  I see the Diagnostic section as an attempt to temper the CCIE with more real world needs.  While the exam has never been a test of ideal design, knowing how to fix a non-ideal design when problems occur is important.  Knowing how to find out what’s screwed up is the first step.  It’s high time people learned how to do that.

Be Careful What You Wish For

The CCIE v5 is seeing a lot of technology changes.  The written exam is getting a new section, Network Principles.  This serves to refocus candidates away from Cisco-specific solutions and more toward making sure they are experts in networking.  There’s a lot of opportunity here to reinforce networking fundamentals rather than idle trivia about config minimums and maximums.  Let’s hope this pays off.

The content of the written is also being updated.  Cisco is going to make sure candidates know the difference between IOS and IOS XE.  Cisco Express Forwarding is going to get a focus, as is ISIS (again).  Given that ISIS is important in TRILL this could be an indication of where FabricPath development is headed.  The written is also getting more IPv6 topics.  I’ll cover IPv6 in just a bit.

The biggest change in content is the complete removal of Frame Relay.  It’s been banished to the same pile as ATM and ISDN.  No written, no lab.  In its place, we get Dynamic Multipoint VPN (DMVPN).  I’ve talked about why Frame Relay is on the lab before.  People still complained about it.  Now, you get your wish.  DMVPN with OSPF serves the same purpose as Frame Relay with OSPF.  It’s all about Stupid Router Tricks.  DMVPN uses mGRE, which presents OSPF with a Non-Broadcast Multi-Access (NBMA) network.  Just like Frame Relay.  The fact that almost every guide today recommends you use EIGRP with DMVPN should tell you how hard it is to do.  And now you’re forced to run OSPF over that NBMA cloud instead of Frame Relay.  Hope all you candidates are happy now.
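For anyone who has not labbed this up yet, here is roughly what the pairing looks like on the hub side. I am rendering it as a simple Python-printed template to keep the snippet self-contained; the addressing, NHRP values, and the choice of the broadcast network type with the hub forced to be DR are my assumptions about one common way to build it, not anything pulled from the blueprint.

```python
# A hub-side sketch (my assumptions, not lab material) of DMVPN with OSPF:
# an mGRE tunnel, NHRP handling the NBMA mappings, and OSPF treating the
# cloud as a broadcast segment with the hub winning DR election.
HUB_TUNNEL_TEMPLATE = """
interface Tunnel0
 ip address 172.16.0.1 255.255.255.0
 ip nhrp map multicast dynamic
 ip nhrp network-id 100
 ip ospf network broadcast
 ip ospf priority 255
 tunnel source GigabitEthernet0/0
 tunnel mode gre multipoint
 tunnel key 100
!
router ospf 1
 network 172.16.0.0 0.0.0.255 area 0
""".strip()

# Spokes would mirror this with static NHRP maps pointing at the hub and
# 'ip ospf priority 0' so they can never become the DR.
print(HUB_TUNNEL_TEMPLATE)
```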

vCCIE

The lab is also 100% virtual now.  No physical equipment in either the TS or lab config sections.  This is a big change.  Cisco wants to reduce the amount of equipment that needs to be physically present to build a lab.  They also want to be able to offer the lab in more places than San Jose and RTP.  Now, with everything being software, they could offer the lab at any secured Pearson VUE testing center.  They’ve tried this in the past, but the access requirements caused some disasters.  Now it’s all delivered in a browser window.  This will make remote labs possible.  I can see a huge expansion of the testing sites around the time of the launch.

This also means that hardware-specific questions are out.  Like layer 2 QoS on switches.  The last reason to have a physical switch (WRR and SRR queueing) is gone.  Now, all you are going to get quizzed on is software functionality.  Which probably means the loss of a few easy points.  With the removal of Frame Relay and L2 QoS, I bet the services section of the lab is going to be really fun now.

IPv6 Is Real

Now, for my favorite part.  The JNCIE has had a robust IPv6 section for years.  All routing protocols need to be configured for IPv4 and IPv6.  The CCIE has always had a separate IPv6 section.  Not any more.  Going forward in version 5, all routing tasks will be configured for v4 and v6.  Given that RIPng has been retired to the written exam only (finally), it’s a safe bet that you’re going to love working with OSPFv3 and EIGRP for IPv6.

I think it’s great that Cisco has finally caught up to the reality of the world.  If CCIEs are well versed in IPv6, we should start seeing adoption numbers rise significantly.  Ensuring that engineers know to configure v4 and v6 simultaneously means dual stack is going to be the preferred transition method.  The only IPv6-related thing that worries me is the inclusion of an item on the written exam: IPv6 Network Address Translation.  You all know I’m a huge fan of NAT.  Especially NAT66, which is what I’ve been told will be the tested knowledge.

Um, why?!? 

You’ve removed RIPng to the trivia section.  You collapsed multicast into the main routing portions.  You’re moving forward with IPv6 and making it a critical topic on the test.  And now you’re dredging up NAT?!? We don’t NAT IPv6.  Especially to another IPv6 address.  Unique Local Addresses (ULA) is about the only thing I could see using NAT66.  Ed Horley (@EHorley) thinks it’s a bad idea.  Ivan Pepelnjak (@IOSHints) doesn’t think fondly of it either, but admits it may have a use in SMBs.  And you want CCIEs and enterprise network engineers to understand it?  Why not use LISP instead?  Or maybe a better network design for enterprises that doesn’t need NAT66?  Next time you need an IPv6 SME to tell you how bad this idea is, call me.  I’ve got a list of people.


Tom’s Take

I’m glad to see the CCIE update.  Getting rid of Frame Relay and adding more IPv6 is a great thing.  I’m curious to see how the Diagnostic section will play out.  The flexible time for the TS section is way overdue.  The CCIE v5 looks to be pretty solid on paper.  People are going to start complaining about DMVPN.  Or the lack of SDN-related content.  Or the fact that EIGRP is still tested.  But overall, this update should carry the CCIE far enough into the future that we’ll see CCIE 60,000 before it’s refreshed again.

More CCIE v5 Coverage:

Bob McCouch (@BobMcCouch) – Some Thoughts on CCIE R&S v5

Anthony Burke (@Pandom_) – Cisco CCIE v5

Daniel Dib (@DanielDibSWE) – RS v5 – My Thoughts

INE – CCIE R&S Version 5 Updates Now Official

IPExpert – The CCIE Routing and Switching (R&S) 5.0 Lab Is FINALLY Here!

Cisco CMX – Marketing Magic? Or Big Brother?

The first roundtable presenter at Interop New York was Cisco. Their Enterprise group always brings interesting technology to the table. This time, the one that caught my eye was the Connected Mobile Experience (CMX). CMX is a wireless mobility technology that allows a company to do some advanced marketing wizardry.

CMX uses your Cisco wireless network to monitor devices coming into the air space. They don’t necessarily have to connect to your wireless network for CMX to work. They just have to be probing for a network, which all devices do. CMX can then push a message to the device. This message can be a simple “thank you” for coming or something more advanced like a coupon or a notification to download a store-specific app. CMX can then store information about that device, such as whether or not it joined the network, where it went, and how long it was there. This gives the company the ability to pull some interesting statistics about their customer base. Even if those customers never hop on the wireless network.

I have to be honest here. This kind of technology gives me a bit of the creeps. I understand that user tracking is the hot new thing in retail. Stores want to know where you went, how long you stayed there, and whether or not you saw an advertisement or a featured item. They want to know your habits so as to better sell to you. The accumulation of that data over time allows patterns to emerge that can drive a retail operation’s decision-making process.

A Thought Exercise

Think about an average person. We’ll call him Mike. Mike walks four blocks from his office to the subway station every day after work. He stops at the corner about halfway between to cross a street. On that street just happens to be a coffee shop using something like CMX. Mike has a brand new phone with WiFi and Bluetooth, and he keeps them on all the time. CMX can detect when the device comes into range. It knows that Mike stays there for about 2 minutes but never joins the network. The device then moves out of the WLAN area. The data cruncher for the store wants to drive new customers to the store. They analyze the data and find that lots of people stay in the area for a couple of minutes. They equate this to people stopping to decide if they want to have a cup of coffee from the shop. They decide to create a CMX coupon push notification that pops up after one minute on devices that have been seen in the database over the last month. Mike will see a coupon for $1 off a cup of coffee the next time he waits for the light in front of the coffee shop.
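Strip away the radios and the dashboard and the logic in that scenario is a handful of lines. The thresholds below follow the thought exercise (one minute of dwell, seen at some point in the last month); none of this is Cisco’s actual CMX API.

```python
# Hypothetical sketch of the dwell-and-push logic from the thought exercise.
from datetime import datetime, timedelta

class PresenceTracker:
    """Decides when a lingering, previously seen device should get the coupon push."""

    def __init__(self):
        self.visit_history = {}      # MAC address -> when the device was last seen
        self.first_seen_today = {}   # MAC address -> first probe of the current visit

    def probe_heard(self, mac: str, now: datetime) -> bool:
        """Return True once a device seen in the last month has dwelled a full minute."""
        last_seen = self.visit_history.get(mac)
        returning = last_seen is not None and now - last_seen <= timedelta(days=30)
        first = self.first_seen_today.setdefault(mac, now)
        # A real system would also record this sighting back into the history.
        return returning and (now - first) >= timedelta(minutes=1)

tracker = PresenceTracker()
t0 = datetime(2013, 10, 7, 17, 30)
tracker.visit_history["aa:bb:cc:dd:ee:ff"] = t0 - timedelta(days=3)          # Mike walked by earlier this week
print(tracker.probe_heard("aa:bb:cc:dd:ee:ff", t0))                          # False: he just got to the corner
print(tracker.probe_heard("aa:bb:cc:dd:ee:ff", t0 + timedelta(minutes=2)))   # True: push the $1 coupon
```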

That kind of reach is crazy. I keep thinking back to the scenes in Minority Report where the eye scanners would detect you looking at an advertisement and then target a specific ad based on your retina scan. You may say that’s science fiction. But with products like CMX, I can build a pretty complete profile of your behavior even if I don’t have a retina scan. Correlating information provides a clear picture of who you are without any real identity information. Knowing that someone likes to spend their time in the supermarket in the snack aisles and frozen food aisles and less time in the infants section says a lot. Knowing the route a given device takes through the store can help designers place high volume items in the back and force shoppers to take longer routes past featured items.


Tom’s Take

I’m not saying that CMX is a bad product. It’s providing functionality that can be of great use to retail companies. But, just like VHS recorders and BitTorrent, good ideas can often be used to facilitate things that aren’t as noble. I suggested to the CMX developers that they could implement some kind of “opt out” message that popped up if I hadn’t joined the wireless network in a certain period of time. I look at that as a way of saying to shoppers, “We know you aren’t going to join. Press the button and we’ll wipe out your device info.” It puts people at ease to know they aren’t being tracked. Even just showing them what you’re collecting is a good start. With the future of advertising and marketing focusing on instant delivery and data gathering for better targeting, I think products like CMX will be powerful additions. But great power requires even greater responsibility.

Tech Field Day Disclaimer

Cisco was a presenter at the Tech Field Day Interop Roundtable.  They did not ask for any consideration in the writing of this review nor were they promised any.  The conclusions and analysis contained in this post are mine and mine alone.

Disruption in the New World of Networking

This is one of the most exciting times to be working in networking. New technologies and fresh takes on existing problems are keeping everyone on their toes when it comes to learning new protocols and integration systems. VMworld 2013 served both as an announcement of VMware’s formal entry into the larger networking world and as a way of putting existing network vendors on notice. What follows is my take on some of these announcements. I’m sure that some aren’t going to like what I say. I’m even more sure a few will debate my points vehemently. All I ask is that you consider my position as we go forward.

Captain Over, Captain Under

VMware, through their Nicira acquisition and development, is now *the* vendor to go to when you want to build an overlay network. Their technology augments existing deployments to provide software features such as load balancing and policy deployment. In order to do this and ensure that these features are utilized, VMware uses VxLAN tunnels between the devices. VMware calls these constructs “virtual wires”. I’m going to call them vWires, since they’ll likely be called that soon anyway. vWires are deployed between hosts to provide a pathway for communications. Think of it like a GRE tunnel or a VPN tunnel between the hosts. This means the traffic rides on the existing physical network but that network has no real visibility into the payload of the transit packets.
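If you want to see why the physical network loses visibility, look at what the encapsulation does to the frame. This sketch packs a made-up Ethernet frame into the standard 8-byte VXLAN header (flags, a 24-bit VNI, reserved bits); the frame contents and VNI are invented, and the takeaway is that the underlay ends up carrying an opaque UDP payload.

```python
# Sketch of VXLAN encapsulation: an 8-byte header prepended to the original
# frame, with the result carried as an ordinary UDP payload on the underlay.
import struct

def vxlan_encap(inner_frame: bytes, vni: int) -> bytes:
    """Prepend a VXLAN header; the result is what the underlay sees inside UDP."""
    flags = 0x08 << 24                        # "I" bit set: the VNI field is valid
    header = struct.pack("!II", flags, vni << 8)
    return header + inner_frame

# A made-up broadcast ARP frame from a VM, the kind of thing a vWire carries.
inner = bytes.fromhex("ffffffffffff" "005056000001" "0806") + b"\x00" * 28
segment = vxlan_encap(inner, vni=5001)

print(f"inner frame: {len(inner)} bytes, encapsulated payload: {len(segment)} bytes")
print("what the underlay sees:", segment[:12].hex(), "... (just UDP payload bytes)")
```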

Nicira’s brainchild, NSX, has the ability to function as a layer 2 switch and a layer 3 router as well as a load balancer and a firewall. VMware is integrating many existing technologies with NSX to provide consistency when provisioning and deploying a new software-based network. For those devices that can’t be virtualized, VMware is working with HP, Brocade, and Arista to provide NSX agents that can decapsulate the traffic and send it to a physical endpoint that can’t participate in NSX (yet). As of the launch during the keynote, most major networking vendors are participating with NSX. There’s one major exception, but I’ll get to that in a minute.

NSX is a good product. VMware wouldn’t have released it otherwise. It is the vSwitch we’ve needed for a very long time. It also extends the ability of the virtualization/server admin to provision resources quickly. That’s where I’m having my issue with the messaging around NSX. During the second day keynote, the CTOs on stage said that the biggest impediment to application deployment is waiting on the network to be configured. Note that is my paraphrasing of what I took their intent to be. In order to work around the lag in network provisioning, VMware has decided to build a VxLAN/GRE/STT tunnel between the endpoints and eliminate the network admin as a source of delay. NSX turns your network into a fabric for the endpoints connected to it.

Under the Bridge

I also have some issues with NSX and the way it’s supposed to work on existing networks. Network engineers have spent countless hours optimizing paths and reducing delay and jitter to provide applications and servers with the best possible network. Now, that all doesn’t matter. vAdmins just have to click a couple of times and build their vWire to the other server and all that work on the network is for naught. The underlay network exists to provide VxLAN transport. NSX assumes that everything working beneath is running optimally. No loops, no blocked links. NSX doesn’t even participate in spanning tree. Why should it? After all, that vWire ensures that all the traffic ends up in the right location, right? People would never bridge the networking cards on a host server. Like building a VPN server, for instance. All of the things that network admins and engineers think about in regards to keeping the network from blowing up due to excess traffic are handwaved away in the presentations I’ve seen.

The reference architecture for NSX looks pretty. Prettier than any real network I’ve ever seen. I’m afraid that suboptimal networks are going to impact application and server performance now more than ever. And instead of the network using mechanisms like QoS to battle issues, those packets are now invisible bulk traffic. When network folks have no visibility into the content of the network, they can’t help when performance suffers. Who do you think is going to get blamed when that goes on? Right now, it’s the network’s fault when things don’t run right. Do you think that moving the onus for server network provisioning to NSX and vCenter is going to absolve the network people when things go south? Or are the underlay engineers going to take the brunt of the yelling because they are the only ones that still understand the black magic outside the GUI drag-and-drop used to create vWires?

NSX is for service enablement. It allows people to build network components without knowing the CLI. It also means that network admins are going to have to work twice as hard to build resilient networks that work at high speed. I’m hoping that means that TRILL-based fabrics are going to take off. Why use spanning tree now? Your application and service network sure isn’t. No sense adding any more bells and whistles to your switches. It’s better to just tie them into spine-and-leaf Clos fabrics and be done with it. It now becomes much more important to concentrate on the user experience. Or maybe the wireless network. As long as at least one link exists between your ESX box and the edge switch, let the new software networking guys worry about it.

The Recumbent Incumbent?

Cisco is the only major networking manufacturer not publicly on board with NSX right now. Their CTO Padmasree Warrior has released a response to NSX that talks about lock-in and vertical integration. Still others have released responses to that response. There’s a lot of talk right now about the war brewing between Cisco and VMware and what that means for VCE. One thing is for sure – the landscape has changed. I’m not sure how this is going to fall out on both sides. Cisco isn’t likely to stop selling switches any time soon. NSX still works just fine with Cisco as an underlay. VCE is still going to make a whole bunch of money selling vBlocks in the next few months. Where this becomes a friction point is in the future.

Cisco has been building APIs into their software for the last year. They want to be able to use those APIs to directly program the network through devices like the forthcoming OpenDaylight controller. Will they allow NSX to program them as well? I’m sure they would – if VMware wrote those instructions into NSX. Will VMware demand that Cisco use the NSX-approved APIs and agents to expose network functionality to their software network? They could. Will Cisco scrap OnePK to implement NSX? I doubt that very much. We’re left with a standoff. Cisco wants VMware to use their tools to program Cisco networks. VMware wants Cisco to use the same tools as everyone else and make the network a commodity compared to the way it is now.

Let’s think about that last part for a moment. Aside from some speed differences, networks are largely going to look identical to NSX. It won’t care if you’re running HP, Brocade, or Cisco. Transport is transport. Someone down the road may build some proprietary features into their hardware to make NSX run better, but that day is far off. What if a manufacturer builds a switch that is twice as fast as the nearest competition? Three times? Ten times? At what point does the underlay become so important that the overlay starts preferring it exclusively?


Tom’s Take

I said a lot during the Tuesday keynote at VMworld. Some of it was rather snarky. I asked about full BGP tables and vMotioning the machines onto the new NSX network. I asked because I tend to obsess over details. Forgotten details have broken more of my networks than grand design disasters. We tend to fuss over the big things. We make more out of someone that can drive a golf ball hundreds of yards than we do about the one that can consistently sink a ten foot putt. I know that a lot of folks were pre-briefed on NSX. I wasn’t, so I’m playing catch up right now. I need to see it work in production to understand what value it brings to me. One thing is for sure – VMware needs to change the messaging around NSX to be less antagonistic towards network folks. Bring us into your solution. Let us use our years of experience to help rather than making us seem like pariahs responsible for all your application woes. Let us help you help everyone.

Poaching CCIEs

During the CCIE NetVet Reception at Cisco Live 2013, a curious question came up during our Q&A session with CEO John Chambers. Paul Borghese asked if it was time for the partner restriction on CCIE tenure to be lifted in order to increase the value of a CCIE in the larger market. For those not familiar, when a CCIE is hired by a Cisco partner, they need to attach their number to the company in order for the company to receive the benefits of having hired a CCIE. Right now, that means counting toward the CCIE threshold for Silver and Gold status. When a CCIE leaves the first company and moves to another partner, their number stays associated with the original company for one year and cannot be counted with the new company until the expiration of that year.

There are a multitude of reasons why that might be the case. It encourages companies to pay for CCIE training and certification, since the company knows the newly-minted CCIE’s number will keep counting for them for at least a year, even after a departure. It also provides a lifeline to a Cisco partner in the event a CCIE decides to move on. By keeping the number attached to the company for a specific time period, the original company has the time necessary to hire or train new resources to take over the departed CCIE’s job role. If the original partner is up for any contracts or RFPs that require a CCIE on staff, that grace period could be the difference between picking up or losing that contract.

As indicated above, Paul asked if maybe that policy needed to change. In his mind, the restriction of the CCIE number was causing CCIEs to stay at their current companies because their inability to move their number to the new company in a timely manner made them less valuable. I know now that the question came on behalf of Eman Conde, the CCIE Agent, who is very active in making sure the rights and privileges of CCIEs everywhere are well represented. I remember meeting Eman for the first time back at Cisco Live 2008 at an IPExpert party, long before I was a CCIE. In that time, Eman has worked very hard to make sure that CCIEs are well represented in the job market.  It is also in Eman’s best interests to ensure that CCIEs can move freely between companies without restriction.

My biggest fear is that removing the one-year association restriction for Cisco partners will cause partners to stop funding CCIE development.  I was very fortunate to have my employer pay the entire cost of my CCIE from beginning to end.  In return, I agreed in principle to stay with them for a period of time and not seek employment from anyone else.  There was no agreement in place.  There was no contract.  Just a handshake.  Even after I left to go work with Gestalt IT, my number is locked to them for the next year.  This doesn’t really bother me.  It does make them feel better about the possibility of me moving to a competitor.  What would happen if I could move my number freely to the next business without penalty?

Could you imagine a world where CCIEs were being paid top dollar to work at a company not for their knowledge but because it was cheaper to buy CCIEs than it was to build them?  Think of a sports team that doesn’t have a good minor league system but instead buys their talent for absurd amounts of money.  If you had pictures of the New York Yankees in your head, you probably aren’t far removed from my line of thinking.  When the only value of a CCIE is associating the number to your company then you’ve missed the whole point of the program.

CCIEs are more valuable than their number.  With the exception of the Gold/Silver partner status, their number is virtually useless.  What is more important is the partner specializations they can bring in.  My CCIE was pointless to my old employer since I was the only one.  What was a greater boon was all the partner certifications that I brought for unified communications, UCS implementation, and even project management.  Those certifications aren’t bound to a company.  In fact, I would probably be more marketable going to a small partner with one CCIE or to a silver partner with 3 CCIEs and telling them that I can bring in new lines of partner business while they are waiting for my number to clear escrow.  The smart partners will realize the advantage, hire me on, and wait.  Only an impatient partner that wants to build a gold-level practice today would want to avoid number lock-in.

I don’t think we need to worry about removing the CCIE association restriction right now.  It serves to entice partners to fund CCIEs without worrying about them moving on as soon as they get certified.  Termination results in the number being freed up upon mutual agreement.  Most CCIEs that I’ve heard of that left their jobs soon after certification did it because their company told them they can’t afford to pay a CCIE.  Forcing small employers to let CCIEs walk away to bigger competitors with no penalty will prevent them from funding any more CCIE training.  They’ll say, “If the big partners want CCIEs so badly that they’ll pay bounties then let the big partners do all the training too.”  I don’t even think an employer non-compete would fix the issue as those aren’t enforceable in many states.  I think the program exists the way it does for a reason.  With all due deference to Eman and Paul, I don’t think we’ve reached the point where CCIE free agency is ready for prime time.