Cisco Data Center – Network Field Day 2

Our first presenters up on the block for day 1 of Network Field Day 2 were from the Cisco Data Center team. We arrived at Building Seven at the mothership and walked into the Cisco Cloud Innovation Center (CCIC). We were greeted by Omar Sultan and found our seats in the briefing center. Omar gave us a brief introduction, followed by Ron Fuller, Cisco’s TME for Nexus switching. In fact, he may have written a book about it.  Ron gave us an overview of the Nexus line, as well as some recent additions in the form of line cards and OS updates. His slides seemed somewhat familiar by now, but having Ron explain them was great for us as we could ask questions about reference designs and capabilities. There were even hints about things like PONG, a layer 2 traceroute across the fabric. I am very interested to hear a little more about that particular enhancement.

Next up were representatives from two of Cisco’s recent acquisitions related to cloud-based services, Linesider and NewScale (Cisco Cloud Portal). They started out their presentations similarly, reminding us of “The Problem”:

The Problem - courtesy of Tony Bourke

You might want to bookmark this picture; I’m going to be referring to it a lot. The cloud guys were the first example of a familiar theme from day 1 – Defining the Problem. It seems that all cloud service providers feel the need to spend the beginning of the presentation telling us what’s wrong. As I tweeted the next day, I’m pretty sick of seeing that same old drawing over and over again. By now, I think we have managed to identify the problem down to the DNA level. There is very little we don’t know about the desire to automate provisioning of services in the cloud so customers in a multi-tenancy environment can seamlessly click their way into building application infrastructure. That being said, with the brain trust represented by the Network Field Day 2 delegates, it should be assumed that we already know the problem. What the presenters need to do is show us how they plan to address it. In fact, over the course of Network Field Day, we found that the vendors have identified the problem to death, but their methods of approaching the solution are all quite different. I don’t mean to pick on Cisco here, but they were first up and did get “The Problem” ball rolling, so I wanted to set the stage for later discussion.

Once we got to the Cisco IT Elastic Infrastructure Services (CITEIS) demo, however, the ability to quickly create a backend application infrastructure was rather impressive. I’m sure for providers in a multi-tenancy environment that will be a huge help going forward, reducing the need to have staff sitting around just to spin up new resource pools. If this whole process can truly be automated as outlined by Cisco, the factor by which we can scale infrastructure will greatly increase.

After the cloud discussion, Cisco brought Prashant Gandhi from the Nexus 1000V virtual switching line to talk to us. Ivan Pepelnjak and Greg Ferro perked up and started asking some really good questions during the discussion of the capabilities of the 1000V and what kinds of things were going to be considered in the future. We ended up running a bit long on the presentation, which set the whole day back a bit, but the ability to ask questions of key people involved in virtual switching infrastructure is a rare treat that should be taken advantage of whenever possible.

Now with video goodness! Part 1: Nexus

Part 2: Cloud Orchestration and Automation

Part 3: Nexus 1000V


Tom’s Take

I must say that Cisco didn’t really bring us much more than what we’ve already seen. Maybe that’s a big credit to Cisco for putting so much information out there for everyone to digest when it comes to data center networking. Much like their presentation at Wireless Field Day, Cisco spends the beginning and end of the presentation reviewing things we’re already familiar with, allowing for Q&A time with the key engineers. The middle is reserved for new technology discussion that may not be immediately relevant but represents product direction for what Cisco sees as important in the coming months. It’s a good formula, but when it comes to Tech Field Day, I would rather Cisco take a chance and let us poke around on new gear, or let us ask the really good questions about what Cisco sees coming down the road.


Tech Field Day Disclaimer

Cisco was a sponsor of Network Field Day 2, and as such was responsible for paying a portion of my travel and lodging fees. At no time did Cisco ask for, nor were they promised, any kind of consideration in the drafting of this review. The analysis and opinions herein are mine and mine alone.

Why I Hate The Term “Cloud”

Now that it has been determined that I Am The Cloud, I find it somewhat ironic that I don’t care much for that word. Sadly, while there really isn’t a better term out there to describe the mix of services being offered to abstract data storage and automation, it still irritates me.

I draw a lot of network diagrams. I drag and drop switches and routers hither and yon. However, when I come to something that is outside my zone of control, whether it be a representation of the Internet, the Public Switched Telephone Network (PSTN), or a private ISP circuit, how do I represent that in my drawing? That’s right, with a puffy little cloud. I use clouds when I have to show something that I don’t control. When there is a large number of devices beyond my control, I wrap them in the wispy borders of the cloud.

So when I’m talking about creating a cloud inside my network, I feel uneasy calling it that.  Why? Because these things aren’t unknown to me. I created the server farms and the switch clusters to connect my users to their data. I built the pathways between the storage arrays and the campus LAN. There is nothing unknown to me. I’m proud of what I’ve built. Why would I hide it in the clouds? I’d rather show you what the infrastructure looks like.

To the users, though, it really is a cloud. It’s a mystical thing sitting out there that takes my data and gives me back output. Users don’t care about TRILL connections or Fibre Channel over Ethernet (FCoE). If I told them their email got ferried around on the backs of unicorns, they’d probably believe me. They don’t really want to know what’s inside the cloud, so long as they can still get to what they want. In fact, when users want to create something cloud-like today without automation and provisioning servers in place, they’re likely to come ask me to do it. Hence the reason I’m The Cloud.

I Am The Cloud

I am the cloud.

When users see into the dark place they dare not look, they will find me staring back at them. I am the infrastructure. I am the platform. I am the service. I provision. I hypervise. The unknown is known to me. Commodity IT is my currency. I am public. I am private. When users want a new resource, I am the one click they make. When the magic happens, it is because of me. I scale infinitely. My throughput is legendary.

I am the cloud.

Info about OpenFlow

I will be attending the Packet Pushers OpenFlow Symposium at Network Field Day 2 next week in San Jose, CA.  OpenFlow is a disruptive technology that looks to change the way many of us think about network traffic and how flows are distributed.  It’s still very early in the development phase, but thanks to Ethan Banks and Greg Ferro I’m going to get the chance to listen to companies like Google and Yahoo talk about how they are using OpenFlow, as well as hearing from network vendors currently supporting OpenFlow initiatives, like NEC, Juniper, and Big Switch Networks.

If you would like to brush up on some OpenFlow topics ahead of the symposium on Wednesday, October 26th, here are some great links to information about the ins and outs of OpenFlow:

Packet Pushers Show 68: Practical Introduction and Application of OpenFlow Networking – Watch this one first.  Greg really breaks down what OpenFlow is and what it’s capable of.

Big Switch Networks, OpenFlow, and Virtual Networking – Derick Winkworth has done a great job at the Packet Pushers blog site going into depth about OpenFlow.  He’s an evangelist and has a lot of hope for what OpenFlow can do.  All of his articles are great, but this one in particular shows how one vendor is using OpenFlow.

IOS Hints OpenFlow Posts – I’m just going to go ahead and link to the entire list of Ivan Pepelnjak’s OpenFlow posts.  He plays more of the realist and does a great job of digging deep into the current state of OpenFlow.  He’s also quick to keep us grounded in the fact that OpenFlow is still very young and has lots of potential if it ever takes off.  Worth a read after you’ve caught up on what OpenFlow is from the above sources.

If you have any questions about OpenFlow that you would like asked at the symposium, feel free to leave them in the comments and I’ll try to bring them up to the panel.  I look forward to attending this great event and learning more about the future of networking.

The Wild Wild Campus

The venerable Catalyst 6500 is a switching platform that has been around for several presidential administrations.  It’s a workhorse that has helped provide connectivity for what we now refer to as the campus as well as providing a solution for the data center.  However, like a star athlete in its waning years, it’s getting old.  Ethan has decided that the 6500 is past its prime in the campus core.  Others beg to differ.  Who’s right?

I think the issue here isn’t one of the switch being past its prime compared to the new products on the horizon.  Sure, the next up-and-coming sheriff is going to eclipse the veteran sitting on his horse.  The question comes down to being properly suited for the role.  I think what we need to consider is that the campus userland is a totally different area than the pristine beauty of a data center.

In a data center, I have total control over what happens in my space.  I know every server and connection.  I have everything documented and categorized.  Nothing happens without my notice or approval.  It’s a very regimented structure that keeps the critical infrastructure running while minimizing surprises.  When I have complete control over my environment, I can contemplate ideas like turning off Spanning-Tree Protocol (STP) to increase performance or disabling Port Security to prevent MAC issues with multihomed servers.  Because I can say with reliability that I know where everything is connected, I can start looking at ways to make it all run as fast as possible.  This would be like a new lawman coming into town and instituting his brand of justice learned from the Army.  Very tight, very regimented.  But based totally on rules that are very unlike the real world.

In the campus LAN, however, things begin to look more like the wild west.  At the access layer closest to the users, it’s not uncommon to see a whole host of protection mechanisms designed to prevent catastrophic network disaster from propagating to the core of your campus and then on to the data center.  Turn off STP in the access layer?  That’s a resume-generating event.  Disable port security?  Okay, but you better be ready for the onslaught of garbage.  Campus LANs aren’t the structured beauty of your data center.  At best, they are the Shootout at the OK Corral.  We employ host protection and QoS mechanisms to be sure that those gunslinger users can’t affect more than their own little domain.  No bulky FTP transfers killing the phone system.  No renegade switches being placed under desks and affecting STP paths.
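For the sake of illustration, here’s a rough sketch of what those gunslinger-taming protections tend to look like on a Catalyst access port.  The interface, VLAN, and thresholds are placeholders I made up, not recommendations for your environment:

!
! Hypothetical campus access port hardening - values are examples only
interface FastEthernet0/1
 switchport mode access
 switchport access vlan 10
 ! Port security limits learned MAC addresses to keep rogue hubs and switches in check
 switchport port-security
 switchport port-security maximum 2
 switchport port-security violation restrict
 ! PortFast plus BPDU Guard err-disables any port that hears spanning-tree BPDUs
 spanning-tree portfast
 spanning-tree bpduguard enable
 ! Storm control keeps one chatty host from flooding the uplink
 storm-control broadcast level 1.00
!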

To me, the distinction comes from the fact that the Nexus line of switches that we put in the data center is focused on that structured environment.  Since NX-OS is a fork of Cisco’s SAN OS, it is focused on providing connectivity among servers and storage arrays in that carefully cultivated environment.  Put one of these out in the campus core and you might find that connectivity is blazingly fast…right up to the point where someone creates a broadcast storm.  I’m sure there are mechanisms in place to prevent these kinds of things.  I just don’t know if they are as tested as the ones in the granddaddy 6500.  The 6500 also comes with a variety of service module options to help alleviate issues, such as the Firewall Service Module (FWSM) and Network Analysis Module (NAM), not to mention the wireless connectivity options afforded by a WiSM.

Ethan (and others) point out that the 6500 is reaching the end of its ability to keep up with faster and faster connectivity options.  The new Sup-2T now has the ability to introduce more 10 Gigabit ports on a linecard to aggregate links.  The Nexus line has a laundry list of 10 Gigabit connectivity options, not to mention options for 40 Gigabit and 100 Gigabit Ethernet.  Rumor has it a 40 Gigabit option will be available for the 6500 at some point, but it will likely be limited for a while due to backplane considerations (as well as requiring a new Supervisor engine).  So where does that leave the 6500?

I think what will end up happening soon is that the 6500 will become less of a campus core switch and move down into the distribution layer, perhaps even all the way to the access layer.  The 6500 still has the tools that an old sheriff needs to keep the peace in Userland.  With 10Gig and 40Gig connectivity, it can provide a fast backhaul to the distribution layer if used as an access layer device.  If it lies in the distribution layer, the ability to aggregate 10Gig links coming from the access layer is very crucial to users as the majority of traffic begins to move into the data center for things like Virtual Desktop Infrastructure (VDI) and other heavy traffic loads.  Add in the ability of the 6500 to make intelligent decisions via service modules and you have a great device to offload complicated decision making from a core switch and allow the core to switch/route packets at high speed wherever they need to go.  This could allow you to use the new Nexus 7009 or Nexus 5500 series in the campus core and extend FabricPath/TRILL connections into the campus LAN.  That will allow the 6500 to live on providing things the Nexus can’t right now, like PoE+ and Low Latency Queuing (LLQ), which are critical to voice guys like me.
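Since I keep bringing up LLQ, here’s the sort of policy I mean, sketched in plain IOS MQC.  The class name, bandwidth number, and interface are made up purely for illustration:

!
! Hypothetical LLQ policy for voice - names and numbers are examples only
class-map match-all VOICE
 match dscp ef
!
policy-map CAMPUS-EDGE
 class VOICE
  ! The priority queue gets serviced first, capped at 256 kbps in this example
  priority 256
 class class-default
  fair-queue
!
interface GigabitEthernet0/1
 service-policy output CAMPUS-EDGE
!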

Before you say that putting a 6500 at the access layer is a mighty expensive proposition, just think about what ripping out your existing access layer to provide 10Gig uplink connectivity and Gigabit-to-the-desktop will run.  Now, add in redundancy for power and uplinks.  Odds are good the numbers are starting to get up there.  Now, think about the investment of reusing a good platform like the 6500.  You’ve already invested in supervisors and power redundancy.  If you can pick up some 10/100/1000 PoE linecards to fill it up, you have a great way to aggregate wiring closets and begin to deploy important network services closer to the edge to prevent those outlaws from rustling your precious data center network.


Tom’s Take

Any time the idea of the 6500 is brought up, you’re going to see polarization.  Some will stand by their old sheriff, confident in the fact that he’s seen it all and done it all.  Sure, he isn’t the fastest gun in the West anymore.  He doesn’t need to be.  He’s still got the smarts to outfox any outlaw.  The other camp decries the fact that the 6500 has been around since before the turn of the millennium.  What they want is to put the fancy city slicker Nexus everywhere and show everyone his fancy brand of the law.  I think in this case the Catalyst still has a lot of life left to provide connectivity and services to the end users while the back end of the campus transitions to the newer high-speed platforms.  I doubt 10Gig-to-the-desktop is coming any time soon, based on Cat 6E/7 cable costs and the inability to run fiber to a laptop.  That eliminates the major points in favor of a total campus Nexus deployment.  Despite what others may say, I think the 6500 is a perfect option here, especially with the Sup-2T and newer line cards.  Just because it’s a little long in the tooth doesn’t mean it doesn’t know how the Wild Wild Campus was won.

My Way or the Highway

If Twitter is good for nothing else, it can generate a lot of great entertainment.  Last Friday, when everyone was unwrapping a new Fruit Company Mobile Telephony Device, a back-and-forth erupted on Twitter between Christofer Hoff (@Beaker) and Brad Hedlund (@BradHedlund).  The discussion started over whether VXLAN and NVGRE were “standards” or just experimental drafts.  Tony Bourke (@TBourke) also chimed in toward the end in regards to proprietary implementations in new data center switching fabrics.  As I’ve written a little about the fabric conundrum before, I spent some time thinking about proprietary-ness in general where it applies in the data center.  Besides, I had to wait to set up my mobile telephony device, so I had the free time.

To directly answer Tony’s question here, I’m going to have to side with Beaker on this one.  Proprietary is very cut and dried.  You are either entirely open and run the same implementation with everyone, like OSPF or IS-IS, or you are closed and only play well with your own kind, like EIGRP or IRF.  I’ve always been of the mind that proprietary implementations of things aren’t necessarily a bad thing.  When you are starting out creating something from scratch or attacking a difficult problem you don’t often have time to think about whether or not the solution will be open and documented for future generations.  This is typified by the idea, “Just make it work.  I’ll figure out the details later.”  All well and good for someone that is creating parts for a hand-made bookcase or designing a specialized tool.  Not so great for those that come after you and have to make your new proprietary widget behave with the rest of society.

Standardization is what you get when someone comes in behind you and tries to figure out what you’ve done.  Reverse engineering or disassembly help people figure out what you were thinking.  Then we take what we’ve learned and try to make that repeatable.  We write rules and guidelines that govern the behavior of the standard.  This many inches, this voltage input, and so on.  The rules help make sure everyone plays nice with each other.

Standards are important for the sake of predictability.  Electrical sockets, shoe sizes, and vehicle tire sizes are great examples of why standardization is important.  Proprietary tire sizes at best mean that you don’t have much choice in where you buy your product.  At worst, you end up with the problems faced in the American Civil War, where the Confederacy was littered with railroad tracks of varying gauge sizes, which led to an inability to effectively move troops and supplies around the war zone.  Not that a routing protocol has anything to do with a bloody civil war, but it does illustrate the point.

An Example

In today’s data center, vendors are starting to produce solutions that are more and more proprietary over a larger area.  Tony was right in that scale matters.  Only it wasn’t in regards to how proprietary a solution is.  Instead, think of it in terms of the effect it has on your network.  Today, proprietary interfaces between line cards and backplanes mean you can’t put a Brocade card in an HP switch.  Previously, our largest scale example was running a routing protocol like EIGRP in the network.  If you wanted to put a non-Cisco router somewhere, you were forced to interoperate by adding static routes pointing to your device at the edge of EIGRP or by running a routing protocol like OSPF and doing redistribution.  I think the routing protocol example is a great way to illustrate the growing data center fabric trend toward vendor happiness.  I’m going to pick on EIGRP here since it is the most obvious example of a proprietary protocol.

If you’ve only ever bought Cisco equipment in your network and need to run a routing protocol, EIGRP is an easy choice.  It’s simple to configure and runs with little intervention.  Kind of “set it and forget it” operation.  That doesn’t change so long as everything is Cisco.  The first Brocade switch you buy, however, is going to cause some issues.  If you want to simply send traffic to that network, you can plug in static routes.  Not an elegant or scalable solution, but it does work.  If you want to address scalability, you can look at reconfiguring your network to use the standard OSPF routing protocol.  That could take days or weeks to accomplish.   You would also have to learn a new protocol.  The investment of time to be standardized would probably be insurmountable.  The last option is to interoperate via redistribution.  By running OSPF on your Brocade switch and redistributing it into EIGRP you can achieve a stable network topology.  You lose EIGRP-specific benefits when you move into the OSPF area (and vice versa) but everything works.  Depending on your level of unease with this solution, you might even be tempted to avoid buying the Brocade switch and just stick with Cisco.  That way, all you have to do is turn up EIGRP and keep on running.
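To put that last option in concrete terms, mutual redistribution on the Cisco router facing the Brocade switch might look something like the sketch below.  The process numbers, networks, and seed metrics are placeholders, and I’m leaving out the route filtering you’d want in real life:

!
! Hypothetical mutual redistribution between EIGRP and OSPF - values are examples only
router eigrp 100
 network 10.0.0.0
 ! Seed metric for redistributed routes: bandwidth, delay, reliability, load, MTU
 redistribute ospf 1 metric 100000 100 255 1 1500
!
router ospf 1
 network 10.1.1.0 0.0.0.255 area 0
 redistribute eigrp 100 subnets metric 20
!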

In the same way in a data center, once a large fabric is implemented, you have specific benefits that a particular vendor’s solution provides.  When you want to make your fabric talk to another fabric, you have to strip out the extra “goodies” and send that packet over to the other side.  Much like creating an ASBR in OSPF, this isn’t necessarily a fun job.  You’re going to lose functionality when that transition is made.  Maybe the packets are switched a fraction of a second slower.  Maybe the routing lookups take half a second more.  The idea is that you will look at the complexity of interoperating and decide it’s not worth the hassle.  Instead, you’ll just keep buying the vendor’s fabric solutions no matter how proprietary they may be.  Because you want things to just work.  There’s nothing malicious or vindictive on the vendor’s part.  They are selling the best product they can.  They are offering you the option of interoperating.

Remember that interoperating isn’t like running a standard protocol like OSPF across your network.  It’s consciously deciding to create a point of demarcation for the purposes of making nice with someone else’s proprietary ideas.  It’s messy to set up and a pain to manage.  But it’s not vendor lock-in, right?  So we learn to live with large-scale proprietary-ness with small connections of interoperability for the sake of simplicity.


Tom’s Take

Having something be proprietary isn’t bad.  Think of a finely tailored suit.  Totally custom made and well worth it.  The key is to understand what being proprietary means to you and your network.  You must realize that you are going to commit extra resources one way or another.  You commit them in the form of capital by being beholden to a specific vendor for equipment that works best with its own kind.  The other method is to save capital resources and instead expend time and effort making all the solutions work together for the good of the data center.  Either way, there is an opportunity cost associated.  Some Network Rock Stars are perfectly happy buying everything from Vendor X.  Others chafe at the idea of being indentured to any one vendor and would rather spend their time tweaking knobs and switches to make everything run as efficiently as possible despite being heterogeneous.  One isn’t necessarily better than the other.  The key is to recognize which is better for you and your network and act accordingly.

Bogon Poetry

I was thinking the other day that I’ve used the term bogon in several Packet Pushers podcasts and never really bothered to define it for my readers.  Sure, you could go out and search on the Internet.  But you’ve got me for that!

Bogon is a term used in networking to describe a “bogus address”.  According to Wikipedia, Fount of All Knowledge, the term originated from a hacker reference to a single unit of bogosity, which is the property of being bogus.  I personally like to think of it as standing for BOGus Network (forgive my spelling).  Note that this refers to undesirable packets, not to be confused with a Vogon, which is a class of undesirable bureaucrats that run the galaxy, or a bogan, which is an undesirable socioeconomic class in Australia (if you’re American, think “redneck” or “white trash”).

Bogons are addresses that should never be seen as the source of packets that are entering your network.  The most stable class of bogon isn’t actually a bogon.  It’s a martian, so called because they look like they are coming from Mars, which is a place packets clearly cannot be sourced from…yet.  Martians include any address space that is listed as reserved by RFC1918 or RFC5735.  It’s a pretty comprehensive list, especially in RFC5735, so take a few moments to familiarize yourself with it.  You’ll see the majority of private networks along with APIPA addressing and a few lesser-known examples of bogus networks as well.

The other component of a bogon is an address that shouldn’t exist on the public Internet.  Beyond the aforementioned martians, the only other bogons should be IP blocks that haven’t yet been allocated by IANA to the RIRs.  However, that list should be almost empty right now, as IANA has exhausted all its available address space and given it over to the 5 RIRs.  The folks over at Team Cymru (that’s pronounced kum-ree for those not fortunate enough to be fluent in Welsh) have put together a list of what they call “fullbogons”, which lists the prefixes assigned to RIRs but not yet handed out to ISPs for consumption by customers.  Traffic being sourced from this range should be treated as dubious until the range is allocated by the RIR.  The fullbogon list is updated very frequently as the hungry, hungry Internet gobbles up more and more prefixes, so if you are going to use it please stay on top of it.

How Do I Use Bogons?

My preferred method of using a bogon list is in an edge-router access list (ACL) designed to filter traffic before it ever lands on my network.  By putting the ACL on the very edge of the network, the traffic never gets the chance to hop to my firewall for evaluation.  I’d prefer to save every spare CPU cycle I could on that puppy.  My access list looks something like this (taken from Team Cymru’s bogon list today):

!
access-list 1 deny 0.0.0.0 0.255.255.255
access-list 1 deny 10.0.0.0 0.255.255.255
access-list 1 deny 127.0.0.0 0.255.255.255
access-list 1 deny 169.254.0.0 0.0.255.255
access-list 1 deny 172.16.0.0 0.15.255.255
access-list 1 deny 192.0.0.0 0.0.0.255
access-list 1 deny 192.0.2.0 0.0.0.255
access-list 1 deny 192.168.0.0 0.0.255.255
access-list 1 deny 198.18.0.0 0.1.255.255
access-list 1 deny 198.51.100.0 0.0.0.255
access-list 1 deny 203.0.113.0 0.0.0.255
access-list 1 deny 224.0.0.0 15.255.255.255
access-list 1 deny 240.0.0.0 15.255.255.255
access-list 1 permit any
!
!
interface FastEthernet 0/0
description Internet_Facing
ip access-group 1 in
!

That should wipe out all the evil bogons and martians trying to invade your network.  If you want to use the fullbogon list, obviously your ACL would be considerably longer and need to be updated more frequently.  The above list is just the basic bogon/martian detection and should serve you well.
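If keeping a giant fullbogon ACL up to date by hand sounds like misery, Team Cymru also offers the fullbogon list as a BGP feed you can peer with and dump into a null route.  Here’s a rough sketch of the idea; the peer address, AS numbers, and discard next-hop are placeholders, so use whatever details Team Cymru gives you when you sign up:

!
! Hypothetical fullbogon BGP feed - peer address and AS numbers are placeholders
router bgp 64512
 neighbor 203.0.113.254 remote-as 65000
 neighbor 203.0.113.254 description Bogon-Route-Server
 neighbor 203.0.113.254 ebgp-multihop 255
 neighbor 203.0.113.254 route-map BOGON-IN in
!
! Anything learned from the feed gets a next-hop that points at the bit bucket
route-map BOGON-IN permit 10
 set ip next-hop 192.0.2.1
!
ip route 192.0.2.1 255.255.255.255 Null0
!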

Tom’s Take

Blocking these spoofed networks before they can make it to you is a huge help in preventing attacks and spurious traffic from overwhelming you as a Network Rock Star.  Every little bit helps today with all of the reliance on the Internet, especially as we start moving toward…The Cloud.  If you sit down and block just the regular bogon list I’ve outlined above, you can block up to 60% (Warning: Powerpoint) of the obviously bad stuff trying to get to your network.  That should be a big relief to you and let you have a few minutes of free time to take up a new hobby, like poetry.

Thanks to Team Cymru for all the information and stats.  Head over to http://www.team-cymru.org to learn more about all those nasty bogons and how to stop them.

Crunch Time

Everyone in IT has been there before.  The core switches are melting down.  The servers are formatting themselves.  Packets are being shuffled off to their doom.  Human sacrifice, dogs and cats living together, mass hysteria.  You know, the usual.  What happens next?

Strangely enough, how IT people react to stressful situations such as these has become a rather interesting study of mine.  I know how I react when these kinds of things start happening.  I go into my own “panic mode”.  It’s interesting to think about what changes happen when the stress levels get turned up and problems start mounting.  I start becoming short with people.  Not yelling or screaming, per se.  I start using short declarative sentences at an elevated tone of voice to get my point across.  I begin looking for solutions to problems, however inelegant they may be.  Quick fixes rule over complicated designs.  I’ve trained myself to eliminate the source of stress or the cause of the problem.  I tend to tune out any other distractions until the issues at hand are sorted out.  Should I find myself in a situation where I can’t effect a solution to the problem, or if I’m waiting on someone or something to happen outside my direct control, that is when the stress really starts mounting.  To those that share my “can do” attitude, this makes me look efficient and helpful in times of crisis.  To others, I look like a complete jerk.

I’ve also found that there are others in IT (and elsewhere) that have an entirely different method of dealing with stress: they shut down.  My observations have shown that these people become overwhelmed with the pressure of the situation almost immediately and begin finding ways to cope through indirect action.  Some begin blaming the problem on someone or something else.  Rather than search out the source of the trouble, they try to pin it on someone other than them, maybe in the hopes they won’t have to deal with it.  These people begin to withdraw into their own world.  They sit down and stare off into space.  They become quiet.  Some of them even break down and start to cry (yes, I’ve seen that happen before).  Until the initial shock of the situation has passed, they find themselves incapable of rendering any kind of assistance.

How do we as IT professionals deal with these two disparate types of panic modes?  You need to work that out now so that you don’t have to come up with a plan on the fly when the core switches are dropping packets and the CxOs are screaming for heads.  (It’s funny how that second category of blamers and inaction always seems to be well represented in management.)

For people like me, the “doers”, we need to be doing something that can impact the problem.  No busy work, no research.  We need to be attacking things head-on.  Any second we aren’t in attack mode compounds the stress we’re under.  Even if we try a hundred things and ninety-nine of them fail, we have to try to keep from going crazy.  Think of these “doers” like a wind-up toy: get us working on something and let us go.  You might not want to be around us while we’re working, unless you want some curt answers followed by looks of distaste when we have to stop and explain what we’re doing.  We’ll share…when we’re done.

For the other type of people, those that have a stress-induced Blue Screen of Death (BSoD), I’ve found that you have to do something to get them out of their initial funk.  Sometimes, this involves busy work.  Have them research the problem.  Have them go get coffee.  In most cases, have them do something other than be around you while you’re troubleshooting.  Once you can get them past the blame/sulk/cry state, they can become a useful resource for whatever needs to happen to get the problem solved.  Usually, they come back to me later and thank me for letting them help.  Of course, they also usually tell me I was a bit of an ass and should really be nicer when I’m in panic mode.  Oh well…

Tom’s Take

I don’t count on anyone in a stressful situation that isn’t me.  Most often, I don’t have the luxury of time to figure out how a person is going to react.  If you can help me I’ll get you doing something useful.  If not, I’m going to ignore or marginalize you until the problem is fixed.  Over the last couple of years, though, I’ve found that I really need to start working with every different group to ensure that communications are kept alive during stressful situations and no one’s feelings get hurt (even though I don’t normally care).  By consciously realizing that people generally fall into the “doer” or “BSoD” category, I can better plan for ways to utilize them when the time comes and make sure that the only thing going CRUNCH at crunch time is the problem.  And not someone’s head.

Forest for the Trees

If you work in data center networking today, you are probably being bombarded from all sides by vendors pitching their new fabric solutions.  Every major vendor from Cisco to Juniper to Brocade has some sort of new solution that allows you to flatten your data center network and push a collapsed control plane all the way to the edges of your network.  However, the more I look at it, the more it appears to me that we’re looking at a new spin on an old issue.

Chassis switches are a common sight for high-density network deployments.  They contain multiple interfaces bundled into line cards that are all interconnected via a hardware backplane (or fabric).  There is usually one or more intelligent pieces running a control plane and making higher level decisions (usually called a director or a supervisor).  This is the basic idea behind the switch architecture that has been driving networking for a long time now.  A while back, Denton Gentry wrote a very interesting post about the reasoning behind vendors supporting chassis-based networking the way they do.  By having a point of presence in your networking racks that provides an interface that can only be populated by hardware purchased from the vendor you bought the enclosure from, they can continue to count on you as a customer until you grow tired enough to rip the whole thing out and start all over again with Vendor B.  Innovation does come, and it allows you to upgrade your existing infrastructure over and over again with new line cards and director hardware.  However, you can’t just hop over to Vendor C’s website, buy a new module, and plug it into your Vendor A chassis.  That’s what we call “lock-in”.  Not surprisingly, this idea soon found its way into the halls of IBM, HP, and Sun to live on as the blade server enclosure.  Same principle, only revolving around the hardware that plugs into your network rather than being the network itself.  Chassis-based networking and server hardware makes a fortune for vendors every year due to repeat business.  Hold that thought, we’ll be back to it in just a minute.

Now, every vendor is telling you that data center networking is growing bigger and faster every day.  Your old fashioned equipment is dragging you down and if you want to support new protocols like TRILL and 40Gig/100Gig Ethernet, you’re going to have to upgrade.  Rest assured though, because we will interoperate with the other vendors out there to keep you from spending tons of money to rip out your old network and replace it with ours.  We aren’t proprietary.  Once you get our solution up and running, everything will be wine and roses.  Promise.  I may be overselling the rosy side here, but the general message is that interoperability is king in the new fabric solutions.  No matter what you’ve got in your network right now, we’ll work with it.

Now, if you’re a customer looking at this, I’ve got a couple of questions for you to ask.  First, which port do I plug my Catalyst 4507 into on the QFabric Interconnect?  What is the command to bring up an IRF instance on my QFX3500?  Where should I put my HP 12500 in my FabricPath deployment?  Odds are good you’re going to be met with looks of shock and incredulity.  Turns out, interoperability in a fabric deployment doesn’t work quite like that.

I’m going to single out Juniper here and their QFabric solution not because I dislike them.  I’m going to do it because their solution most resembles something we already are familiar with – the chassis switch.  The QFX3500 QFabric end node switch is most like a line card where your devices plug in.  These are connected to QFX3008 QFabric Interconnect Switches that provide a backplane (or fabric) to ensure packets are forwarded at high speeds to their destinations.  There is also a supervisor on the deployment providing control plane and higher-level functions, in this case referred to as the QF/Director.  Sound familiar?  It should.  QFabric (and FabricPath and others) look just like exploded chassis switches.  Rather than being constrained to a single enclosure, the wizards at these vendors have pulled all the pieces out and spread them over the data center into multiple racks.

Juniper must get asked about QFabric and whether or not it’s proprietary a lot, because Abner Germanow wrote an article entitled “Is QFabric Proprietary?” where he says this:

Fact: A QFabric switch is no more or less proprietary than any Ethernet chassis switch on the market today.

He’s right, of course.  QFabric looks just like a really big chassis switch and behaves like one.  And, just like Denton’s blog post above, it’s going to be sold like one.

Now, instead of having a chassis welded to one rack in your data center, I can conceivably have one welded to every rack in your data center.  By putting a QFX3500/Nexus 5000 switch in the top of every rack and connecting it to QFabric/FabricPath, I provide high speed connectivity over a stretched out backplane that can run to every rack you have.  Think of it like an interstate highway system in the US, high speed roads that allow you to traverse between major destinations quickly.  So long as you are going somewhere that is connected via interstate, it’s a quick and easy trip.

What about interoperability?  It’s still there.  You just have to make a concession or two.  QFabric end nodes connect to the QF/Interconnects via 40Gbps connections.  They aren’t Ethernet, but they push packets all the same.  Since they aren’t standard Ethernet, you can only plug in devices that speak QFabric (right now, the QFX3500).  If you want to interconnect to a Nexus FabricPath deployment or a Brocade VCS cluster, you’re going to have to step down and use slower standardized connectivity, such as 10Gbps Ethernet.  Even if you bundle them into port channels, you’re going to take a performance hit for switching traffic off of your fabric.  That’s like exiting the interstate system and taking a two-lane highway.  You’re still going to get to your destination, it’s just going to take a little longer.  And if there’s a lot of traffic on that two-lane road, be prepared to wait.

Interoperability only exists insofar as to provide a bridge to your existing equipment.  In effect, you are creating islands of vendor solutions in the Ocean of Interoperability.  Once you install VCS/FabricPath/QFabric and see how effectively you can move traffic between two points, you’re going to start wanting to put more of it in.  When you go to turn up that new rack or deployment, you’ll buy the fabric solution before looking at other alternatives since you already have all the pieces in place.  Pretty soon, you’re going to start removing the old vendor’s equipment and putting in the new fabric hotness.  Working well with others only comes up when you mention that you’ve already got something in place.  If this was a greenfield data center deployment, vendors would be falling all over you to put their solution in place tomorrow.


Tom’s Take

Again, I’m not specifically picking on Juniper in this post.  Every vendor is guilty of the “interoperability” game (yes, the quotes are important).  Abner’s post just got my wheels spinning about the whole thing.  He’s right though.  QFabric is no more proprietary than a Catalyst 6500 or other chassis switches.  It all greatly depends on your point of view.  Being proprietary isn’t a bad thing.  Using your own technology allows you to make things work the way you want without worrying about other extraneous pieces or parts.  The key is making sure everyone knows which pieces only work with your stuff and which pieces work with other people’s stuff.

Until a technology like OpenFlow comes fully into its own and provides a standardized method for creating these large fabrics that can interconnect everything from a single rack to a whole building, we’re going to be using this generation of QFabric/FabricPath/VCS.  The key is making sure to do the research and keep an eye out for the trees so you know when you’ve wandered into the forest.

I’d like to thank Denton Gentry and Abner Germanow for giving me the ideas for this post, as well as Ivan Pepelnjak for his great QFabric dissection that helped me sort out some technical details.

Mobile TFTP – Review

If you work with networking devices, you know a little something about Trivial File Transfer Protocol (TFTP).  TFTP allows network rock stars to transfer files back and forth from switches and routers to central locations, such as a laptop or configuration archive.  TFTP servers are a necessary thing to have for any serious network professional.  I’ve talked about a couple that I use before in this post but I’ve started finding myself using my iDevices more and more for simple configuration tasks.  Needless to say, having my favorite server on my iPad didn’t look like a realistic possibility.

Enter Mobile TFTP.  This is the only app I could find in the App Store for TFTP file transfers.  It’s a fairly simple affair:

You toggle the server on and join your iDevice to a local wireless network.  I didn’t test whether the app would launch on a 3G connection, but suffice it to say that wouldn’t be a workable solution for most people.  The IP address of your device is shown so you can start copying files over to it.  The most popular suggested use for this app is to archive configurations to your iDevice.  This is a good idea for those that spend time walking from rack to rack with a console cable trying to capture device configs.  It’s also a great way to have control over your configuration archives, since Mobile TFTP allows you to turn the service on and off as needed rather than keeping a TFTP server running on your network at all times.  As a consultant, this app is wonderful when I need to capture a config without booting my laptop.  Combined with tools like GetConsole or another SSH client, you can access a device and send its config to your mobile TFTP server without ever cracking the laptop open.
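From the router or switch side, the workflow is nothing exotic; once the app shows your iDevice’s address, it’s a plain old copy to or from TFTP.  The address and filenames below are made up for the example: the first command archives the running config to the iPad, and the second pulls a preloaded skeleton config back onto the device.

Router# copy running-config tftp://192.168.1.50/branch-rtr-confg
Router# copy tftp://192.168.1.50/skeleton-confg running-config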

I did attempt to copy some larger files up to the device, but those results weren’t as spectacular.  Mobile TFTP Server will support files up to 32MB, so larger IOS files and WLAN controller files are out.  The transfer rates from an iPhone or iPad aren’t as spectacular as a hardwired connection, but I think that’s more a limitation of the platform than of the software.  The only real complaint that I have is that the files you copy to the device are stuck inside the app.  Sure, you can hook your iDevice up to your laptop at the end of the day and copy the files out of the app inside iTunes (which is also a great way to preload skeleton configs up front), but in today’s world integration is the name of the game.  Giving me the option of linking to a storage service like Dropbox would be amazing.  I tend to keep a lot of things in Dropbox, and being able to throw a troublesome router config in there so it would automagically appear on my laptop would be too sweet.  Still, you can’t argue with the efficiency of this little app.  It does exactly what it says and does it well enough that I don’t find myself cursing at it.

Mobile TFTP Server is $3.99 in the App Store, but as it’s the only dedicated TFTP app I could find, I think it’s worth that to someone who spends a lot of time copying files back and forth and loves the portability of their iDevice.

Disclosure

The creator of Mobile TFTP Server provided me with a promo code for the purposes of reviewing this app.  He did not ask for any consideration in the writing of this review, and none was promised.  The opinions and conclusions reached here are mine and mine alone.