Cisco Data Center – Network Field Day 2

Our first presenters up on the block for day 1 of Network Field Day 2 were from the Cisco Data Center team. We arrived at Building Seven at the mothership and walked into the Cisco Cloud Innovation Center (CCIC). We were greeted by Omar Sultan and found our seats in the briefing center. Omar gave us a brief introduction, followed by Ron Fuller, Cisco’s TME for Nexus switching. In fact, he may have written a book about it.  Ron gave us an overview of the Nexus line, as well as some recent additions in the form of line cards and OS updates. His slides seemed somewhat familiar by now, but having Ron explain them was great for us as we could ask questions about reference designs and capabilities. There were even hints about things like PONG, a layer 2 traceroute across the fabric. I am very interested to hear a little more about this particular little enhancement.

Next up were representatives from two of Cisco’s recent acquisitions related to cloud-based services, Linesider and NewScale (Cisco Cloud Portal). They started out their presentations similarly, reminding us of “The Problem”:

The Problem

The Problem - courtesy of Tony Bourke

You might want to bookmark this picture; I’m going to be referring to it a lot. The cloud guys were the first example of a familiar theme from day 1 – Defining the Problem. It seems that all cloud service providers feel the need to spend the beginning of their presentation telling us what’s wrong. As I tweeted the next day, I’m pretty sick of seeing that same old drawing over and over again. By now, I think we have managed to identify the problem down to the DNA level. There is very little we don’t know about the desire to automate provisioning of services in the cloud so customers in a multi-tenancy environment can seamlessly click their way into building application infrastructure. That being said, with the brain trust represented by the Network Field Day 2 delegates, it should be assumed that we already know the problem. What needs to happen is for the presenters to show us how they plan to address it. In fact, over the course of Network Field Day, we found that the vendors have identified the problem to death, but their methods of approaching the solution are all quite different. I don’t mean to pick on Cisco here, but they were first up and did get “The Problem” ball rolling, so I wanted to set the stage for later discussion. Once we got to the Cisco IT Elastic Infrastructure Services (CITEIS) demo, however, the ability to quickly create a backend application infrastructure was rather impressive. I’m sure that for providers in a multi-tenancy environment it will be a huge help going forward, reducing the need to have staff sitting around just to spin up new resource pools. If this whole process can truly be automated as outlined by Cisco, the factor by which we can scale infrastructure will greatly increase.

After the cloud discussion, Cisco brought in Prashant Gandhi from the Nexus 1000V virtual switching team to talk to us. Ivan Pepelnjak and Greg Ferro perked up and started asking some really good questions during the discussion of the capabilities of the 1000V and what kinds of things were going to be considered in the future. We ended up running a bit long on the presentation, which set the whole day back a bit, but the ability to ask questions of key people involved in virtual switching infrastructure is a rare treat that should be taken advantage of whenever possible.

EDIT: Now with video goodness!

Part 1: Nexus

Part 2: Cloud Orchestration and Automation

Part 3: Nexus 1000V


Tom’s Take

I must say that Cisco didn’t really bring us much more than what we’ve already seen. Maybe that’s a big credit to Cisco for putting so much information out there for everyone to digest when it comes to data center networking. Much like their presentation at Wireless Field Day, Cisco spends the beginning and end of the presentation reviewing things we’re already familiar with, allowing for Q&A time with the key engineers. The middle is reserved for new technology discussion that may not be immediately relevant but represents product direction for what Cisco sees as important in the coming months. It’s a good formula, but when it comes to Tech Field Day, I would rather Cisco take the chance to let us poke around on new gear or let us ask the really good questions about what kinds of things Cisco sees coming down the road.


Tech Field Day Disclaimer

Cisco was a sponsor of Network Field Day 2, and as such was responsible for paying a portion of my travel and lodging fees. At no time did Cisco ask for, nor were they promised, any kind of consideration in the drafting of this review. The analysis and opinions herein are mine and mine alone.

Why I Hate The Term “Cloud”

Now that it has been determined that I Am The Cloud, I find it somewhat ironic that I don’t care much for that word. Sadly, while there really isn’t a better term out there to describe the mix of services being offered to abstract data storage and automation, it still irritates me.

I draw a lot of network diagrams. I drag and drop switches and routers hither and yon. However, when I come to something that is outside my zone of control, whether it be a representation of the Internet, the Public Switched Telephone Network (PSTN), or a private ISP circuit, how do I represent that in my drawing? That’s right, with a puffy little cloud. I use clouds when I have to show something that I don’t control. When there are a large number of devices beyond my control, I wrap them in the wispy borders of the cloud.

So when I’m talking about creating a cloud inside my network, I feel uneasy calling it that. Why? Because these things aren’t unknown to me. I created the server farms and the switch clusters to connect my users to their data. I built the pathways between the storage arrays and the campus LAN. There is nothing unknown to me. I’m proud of what I’ve built. Why would I hide it in the clouds? I’d rather show you what the infrastructure looks like.

To the users, though, it really is a cloud. It’s a mystical thing sitting out there that takes my data and gives me back output. Users don’t care about TRILL connections or Fibre Channel over Ethernet (FCoE). If I told them their email got ferried around on the backs of unicorns, they’d probably believe me. They don’t really want to know what’s inside the cloud, so long as they can still get to what they want. In fact, when users want to create something cloud-like today without automation and provisioning servers in place, they’re likely to come ask me to do it. Hence the reason why I’m The Cloud.

I Am The Cloud

I am the cloud.

When users see into the dark place they dare not look, they will find me staring back at them. I am the infrastructure. I am the platform. I am the service. I provision. I hypervise. The unknown is known to me. Commodity IT is my currency. I am public. I am private. When users want a new resource, I am the one click they make. When the magic happens, it is because of me. I scale infinitely. My throughput is legendary.

I am the cloud.

Info about OpenFlow

I will be attending the Packet Pushers OpenFlow Symposium at Network Field Day 2 next week in San Jose, CA.  OpenFlow is a disruptive technology that looks to change the way many of us think about network traffic and how flows are distributed.  It’s still very early in the development phase, but thanks to Ethan Banks and Greg Ferro I’m going to get the chance to listen to companies like Google and Yahoo talk about how they are using OpenFlow, as well as hearing from network vendors currently supporting OpenFlow initiatives, like NEC, Juniper, and Big Switch Networks.

If you would like to brush up on some OpenFlow topics ahead of the symposium on Wednesday, October 26th, here are some great links to information about the ins and outs of OpenFlow:

Packet Pushers Show 68: Practical Introduction and Application of OpenFlow Networking – Watch this one first.  Greg really breaks down what OpenFlow is and what it’s capable of.

Big Switch Networks, OpenFlow, and Virtual Networking – Derick Winkworth has done a great job at the Packet Pushers blog site going into depth about OpenFlow.  He’s an evangelist and has a lot of hope for what OpenFlow can do.  All of his articles are great, but this one in particular shows how one vendor is using OpenFlow.

IOS Hints OpenFlow Posts – I’m just going to go ahead and link to the entire list of Ivan Pepelnjak’s OpenFlow posts.  He plays more of the realist and does a great job of digging deep into the current state of OpenFlow.  He’s also quick to keep us grounded in the fact that OpenFlow is still very young and has lots of potential if it ever takes off.  Worth a read after you’ve caught up on what OpenFlow is from the above sources.

If you have any questions about OpenFlow that you would like asked at the symposium, feel free to leave them in the comments and I’ll try to bring them up to the panel.  I look forward to attending this great event and learning more about the future of networking.

The Wild Wild Campus

The venerable Catalyst 6500 is a switching platform that has been around for several presidential administrations.  It’s a workhorse that has helped provide connectivity for what we now refer to as the campus, as well as providing a solution for the data center.  However, like a star athlete in his waning years, it’s getting old.  Ethan has decided that the 6500 is past its prime in the campus core.  Others beg to differ.  Who’s right?

I think the issue here isn’t one of the switch being past its prime compared to the new products on the horizon.  Sure, the next up-and-coming sheriff is going to eclipse the veteran sitting on his horse.  The question comes down to being properly suited for the role.  I think what we need to consider is that the campus userland is a totally different area than the pristine beauty of a data center.

In a data center, I have total control over what happens in my space.  I know every server and connection.  I have everything documented and categorized.  Nothing happens without my notice or approval.  It’s a very regimented structure that keeps the critical infrastructure running while minimizing surprises.  When I have complete control over my environment, I can contemplate ideas like turning off Spanning-Tree Protocol (STP) to increase performance or disabling Port Security to prevent MAC issues with multihomed servers.  Because I can say with reliability that I know where everything is connected, I can start looking at ways to make it all run as fast as possible.  This would be like a new lawman coming into town and instituting his brand of justice learned from the Army.  Very tight, very regimented.  But based totally on rules that are very unlike the real world.

In the campus LAN, however, things begin to look more like the wild west.  At the access layer closest to the users, it’s not uncommon to see a whole host of protection mechanisms designed to prevent catastrophic network disaster from propagating to the core of your campus and then on to the data center.  Turn off STP in the access layer?  That’s a resume-generating event.  Disable port security?  Okay, but you better be ready for the onslaught of garbage.  Campus LANs aren’t the structured beauty of your data center.  At best, they are the Shootout at the OK Corral.  We employ host protection and QoS mechanisms to be sure that those gunslinger users can’t affect more than their own little domain.  No bulky FTP transfers killing the phone system.  No renegade switches being placed under desks and affecting STP paths.
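
For illustration, here’s a rough sketch of the kind of gunslinger-proofing I’m talking about on a user-facing access port.  The interface, VLAN, and thresholds are made up for the example, so treat this as a starting point rather than a recipe:

!
interface GigabitEthernet1/0/10
 description User_Access_Port
 switchport mode access
 switchport access vlan 10
 ! Keep renegade switches under desks from influencing the STP topology
 spanning-tree portfast
 spanning-tree bpduguard enable
 ! Limit how many MAC addresses a single port can present
 switchport port-security
 switchport port-security maximum 2
 switchport port-security violation restrict
 ! Keep one chatty host from flooding the uplinks
 storm-control broadcast level 5.00
!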

To me, the distinction comes from the fact that the Nexus line of switches that we put in the data center is focused on that structured environment.  Since NX-OS is a fork of Cisco’s SAN OS, it is focused on providing connectivity among servers and storage arrays in that carefully cultivated environment.  Put one of these out in the campus core and you might find that connectivity is blazingly fast…right up to the point where someone creates a broadcast storm.  I’m sure there are mechanisms in place to prevent these kinds of things.  I just don’t know if they are as tested as the ones in the granddaddy 6500.  The 6500 also comes with a variety of service module options to help alleviate issues, such as the Firewall Service Module (FWSM) and Network Analysis Module (NAM), not to mention the wireless connectivity options afforded by a WiSM.

Ethan (and others) point out that the 6500 is reaching the end of its ability to keep up with faster and faster connectivity options.  The new Sup-2T now has the ability to introduce more 10 Gigabit ports on a linecard to aggregate links.  The Nexus line has a laundry list of 10 Gigabit connectivity options, not to mention options for 40 Gigabit and 100 Gigabit Ethernet.  Rumor has it a 40 Gigabit option will be available for the 6500 at some point, but it will likely be limited for a while due to backplane considerations (as well as requiring a new Supervisor engine).  So where does that leave the 6500?

I think what will end up happening soon is that the 6500 will become less of a campus core switch and move down into the distribution layer, perhaps even all the way to the access layer.  The 6500 still has the tools that an old sheriff needs to keep the peace in Userland.  With 10Gig and 40Gig connectivity, it can provide a fast backhaul to the distribution layer if used as an access layer device.  If it lies in the distribution layer, the ability to aggregate 10Gig links coming from the access layer becomes crucial as the majority of traffic begins to move into the data center for things like Virtual Desktop Infrastructure (VDI) and other heavy traffic loads.  Add in the ability of the 6500 to make intelligent decisions via service modules and you have a great device to offload complicated decision making from a core switch and allow the core to switch/route packets at high speed wherever they need to go.  This could allow you to use the new Nexus 7009 or Nexus 5500 series in the campus core and extend FabricPath/TRILL connections into the campus LAN.  That will allow the 6500 to live on providing things the Nexus can’t right now, like PoE+ and Low Latency Queuing (LLQ), which are critical to voice guys like me.
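
Since I brought up LLQ, here’s a minimal MQC-style sketch of what I mean by protecting voice with a priority queue.  The class names, bandwidth figure, and interface are placeholders, and hardware platforms like the 6500 implement queueing a bit differently under the hood, so this is only meant to show the shape of the policy:

!
class-map match-any VOICE
 match dscp ef
!
policy-map USER-EDGE
 class VOICE
  ! Strict priority queue for voice, policed to 512 kbps
  priority 512
 class class-default
  fair-queue
!
interface GigabitEthernet0/1
 service-policy output USER-EDGE
!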

Before you say that putting a 6500 at the access layer is a mighty expensive proposition, just think about what ripping out your existing access layer to provide 10Gig uplink connectivity and Gigabit-to-the-desktop will run.  Now, add in redundancy for power and uplinks.  Odds are good the numbers are starting to get up there.  Now, think about the investment of reusing a good platform like the 6500.  You’ve already invested in supervisors and power redundancy.  If you can pick up some 10/100/1000 PoE linecards to fill it up, you have a great way to aggregate wiring closets and begin to deploy important network services closer to the edge to prevent those outlaws from rustling your precious data center network.


Tom’s Take

Any time the idea of the 6500 is brought up, you’re going to see polarization.  Some will stand by their old sheriff, confident in the fact that he’s seen it all and done it all.  Sure, he isn’t the fastest gun in the West anymore.  He doesn’t need to be.  He’s still got the smarts to outfox any outlaw.  The other camp decries the fact that the 6500 has been around since before the turn of the millennium.  What they want is to put the fancy city slicker Nexus everywhere and show everyone his new brand of the law.  I think in this case the Catalyst still has a lot of life left to provide connectivity and services to the end users while the back end of the campus transitions to the newer high-speed platforms.  I doubt 10Gig-to-the-desktop is coming any time soon, based on Cat 6E/7 cable costs and the inability to run fiber to a laptop.  That eliminates the major points in favor of a total campus Nexus deployment.  Despite what others may say, I think the 6500 is a perfect option here, especially with the Sup-2T and newer line cards.  Just because it’s a little long in the tooth doesn’t mean it doesn’t know how the Wild Wild Campus was won.

My Way or the Highway

If Twitter is good for nothing else, it can generate a lot of great entertainment.  Last Friday, when everyone was unwrapping a new Fruit Company Mobile Telephony Device, a back-and-forth erupted on Twitter between Christofer Hoff (@Beaker) and Brad Hedlund (@BradHedlund).  The discussion started over whether VXLAN and NVGRE were “standards” or just experimental drafts.  Tony Bourke (@TBourke) also chimed in toward the end regarding proprietary implementations in new data center switching fabrics.  As I’ve written a little about the fabric conundrum before, I spent some time thinking about proprietary-ness in general and where it applies in the data center.  Besides, I had to wait to set up my mobile telephony device, so I had the free time.

To directly answer Tony’s question here, I’m going to have to side with Beaker on this one.  Proprietary is very cut and dried.  You are either entirely open and run the same implementation with everyone, like OSPF or IS-IS, or you are closed and only play well with your own kind, like EIGRP or IRF.  I’ve always been of the mind that proprietary implementations of things aren’t necessarily a bad thing.  When you are starting out creating something from scratch or attacking a difficult problem you don’t often have time to think about whether or not the solution will be open and documented for future generations.  This is typified by the idea, “Just make it work.  I’ll figure out the details later.”  All well and good for someone that is creating parts for a hand-made bookcase or designing a specialized tool.  Not so great for those that come after you and have to make your new proprietary widget behave with the rest of society.

Standardization is what you get when someone comes in behind you and tries to figure out what you’ve done.  Reverse engineering or disassembly helps people figure out what you were thinking.  Then we take what we’ve learned and try to make that repeatable.  We write rules and guidelines that govern the behavior of the standard.  This many inches, this voltage input, and so on.  The rules help make sure everyone plays nice with each other.

Standards are important for the sake of predictability.  Electrical sockets, shoe sizes, and vehicle tire sizes are great examples of why standardization is important.  Proprietary tire sizes at best mean that you don’t have much choice in where you buy your product.  At worst, you end up with the problems faced in the American Civil War, where the Confederacy was littered with railroad tracks of varying gauge, which led to an inability to effectively move troops and supplies around the war zone.  Not that a routing protocol has anything to do with a bloody civil war, but it does illustrate the point.

An Example

In today’s data center, vendors are starting to produce solutions that are more and more proprietary over a larger area.  Tony was right in that scale matters.  Only not in regard to how proprietary a solution is.  Instead, think of it in terms of the effect it has on your network.  Today, proprietary interfaces between line cards and backplanes mean you can’t put a Brocade card in an HP switch.  Previously, our largest-scale example was running a routing protocol like EIGRP in the network.  If you wanted to put a non-Cisco router somewhere, you were forced to interoperate by adding static routes pointing to your device at the edge of EIGRP or by running a routing protocol like OSPF and doing redistribution.  I think the routing protocol example is a great way to illustrate the growing data center fabric trend toward vendor happiness.  I’m going to pick on EIGRP here since it is the most obvious example of a proprietary protocol.

If you’ve only ever bought Cisco equipment in your network and need to run a routing protocol, EIGRP is an easy choice.  It’s simple to configure and runs with little intervention.  Kind of a “set it and forget it” operation.  That doesn’t change so long as everything is Cisco.  The first Brocade switch you buy, however, is going to cause some issues.  If you want to simply send traffic to that network, you can plug in static routes.  Not an elegant or scalable solution, but it does work.  If you want to address scalability, you can look at reconfiguring your network to use the standard OSPF routing protocol.  That could take days or weeks to accomplish.  You would also have to learn a new protocol.  The investment of time to be standardized would probably be insurmountable.  The last option is to interoperate via redistribution.  By running OSPF on your Brocade switch and redistributing it into EIGRP, you can achieve a stable network topology.  You lose EIGRP-specific benefits when you move into the OSPF area (and vice versa), but everything works.  Depending on your level of unease with this solution, you might even be tempted to avoid buying the Brocade switch and just stick with Cisco.  That way, all you have to do is turn up EIGRP and keep on running.
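
To make that last option concrete, here’s a minimal sketch of what mutual redistribution might look like on the Cisco router sitting between the two worlds.  The process numbers, networks, and seed metric are placeholders for the example:

!
router eigrp 100
 network 10.0.0.0
 ! Redistributed routes need a seed metric in EIGRP
 ! (bandwidth, delay, reliability, load, MTU)
 redistribute ospf 1 metric 100000 10 255 1 1500
!
router ospf 1
 network 10.1.0.0 0.0.255.255 area 0
 ! "subnets" is required so the non-classful routes make it into OSPF
 redistribute eigrp 100 subnets
!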

In the same way in a data center, once a large fabric is implemented, you have specific benefits that a particular vendor’s solution provides.  When you want to make your fabric talk to another fabric, you have to strip out the extra “goodies” and send that packet over to the other side.  Much like creating an ASBR in OSPF, this isn’t necessarily a fun job.  You’re going to lose functionality when that transition is made.  Maybe the packets are switched a fraction of a second slower.  Maybe the routing lookups take half a second more.  The idea is that you will look at the complexity of interoperating and decide it’s not worth the hassle.  Instead, you’ll just keep buying the vendor’s fabric solutions no matter how proprietary they may be.  Because you want things to just work.  There’s nothing malicious or vindictive on the vendor’s part.  They are selling the best product they can.  They are offering you the option of interoperating.

Remember that interoperating isn’t like running a standard protocol across your network like OSPF.  It’s consciously deciding to create a point of demarcation for the purpose of making nice with someone else’s proprietary ideas.  It’s messy to set up and a pain to manage.  But it’s not vendor lock-in, right?  So we learn to live with large-scale proprietary-ness with small connections of interoperability for the sake of simplicity.


Tom’s Take

Having something be proprietary isn’t bad.  Think of a finely tailored suit.  Totally custom made and well worth it.  The key is to understand what being proprietary means to you and your network.  You must realize that you are going to commit extra resources one way or another.  You commit them in the form of capital by being beholden to a specific vendor for equipment that works best with its own kind.  The other method is to save capital resources and instead expend time and effort making all the solutions work together for the good of the data center.  Either way, there is an opportunity cost associated.  Some Network Rock Stars are perfectly happy buying everything from Vendor X.  Others chafe at the idea of being indentured to any one vendor and would rather spend their time tweaking knobs and switches to make everything run as efficiently as possible despite being heterogeneous.  One isn’t necessarily better than the other.  The key is to recognize which is better for you and your network and act accordingly.

Bogon Poetry

I was thinking the other day that I’ve used the term bogon in several Packet Pushers podcasts and never really bothered to define it for my readers.  Sure, you could go out and search on the Internet.  But you’ve got me for that!

Bogon is a term used in networking to describe a “bogus address”.  According to Wikipedia, Fount of All Knowledge, the term originated from a hacker reference to a single unit of bogosity, which is the property of being bogus.  I personally like to think of it as standing for BOGus Network (forgive my spelling).  Note that this refers to undesirable packets, not to be confused with a Vogon, an undesirable class of bureaucrats that run the galaxy, or a bogan, an undesirable socioeconomic class in Australia (if you’re American, think “redneck” or “white trash”).

Bogons are addresses that should never be seen as the source of packets entering your network.  The most stable class of bogon isn’t actually a bogon.  It’s a martian, so called because such packets look like they are coming from Mars, which is a place packets clearly cannot be sourced from…yet.  Martians include any address space that is listed as reserved by RFC1918 or RFC5735.  It’s a pretty comprehensive list, especially in RFC5735, so take a few moments to familiarize yourself with it.  You’ll see the majority of private networks along with APIPA addressing and a few lesser-known examples of bogus networks as well.

The other component of a bogon is an address that shouldn’t exist on the public Internet.  Beyond the aforementioned martians, the only other bogons should be IP blocks that haven’t yet been allocated by IANA to the RIRs.  However, that list should be almost empty right now, as IANA has exhausted all its available address space and given it over to the 5 RIRs.  The folks over at Team Cymru (that’s pronounced kum-ree for those not fortunate enough to be fluent in Welsh) have put together a list of what they call “fullbogons”, which lists the prefixes assigned to RIRs but not yet handed out to ISPs for consumption by customers.  Traffic being sourced from these ranges should be treated as dubious until the range is allocated by the RIR.  The fullbogon list is updated very frequently as the hungry, hungry Internet gobbles up more and more prefixes, so if you are going to use it please stay on top of it.

How Do I Use Bogons?

My preferred method of using a bogon list is an access control list (ACL) on my edge router, designed to filter traffic before it ever lands on my network.  By putting the ACL on the very edge of the network, the traffic never gets the chance to hop to my firewall for evaluation.  I’d prefer to save every spare CPU cycle I can on that puppy.  My access list looks something like this (taken from Team Cymru’s bogon list today):

!
! Drop packets sourced from bogon and martian space (see RFC 1918 and RFC 5735)
! "This" network
access-list 1 deny 0.0.0.0 0.255.255.255
! RFC 1918 private space
access-list 1 deny 10.0.0.0 0.255.255.255
! Loopback
access-list 1 deny 127.0.0.0 0.255.255.255
! Link-local (APIPA)
access-list 1 deny 169.254.0.0 0.0.255.255
! RFC 1918 private space
access-list 1 deny 172.16.0.0 0.15.255.255
! IETF protocol assignments
access-list 1 deny 192.0.0.0 0.0.0.255
! TEST-NET-1 documentation range
access-list 1 deny 192.0.2.0 0.0.0.255
! RFC 1918 private space
access-list 1 deny 192.168.0.0 0.0.255.255
! Benchmark testing range
access-list 1 deny 198.18.0.0 0.1.255.255
! TEST-NET-2 documentation range
access-list 1 deny 198.51.100.0 0.0.0.255
! TEST-NET-3 documentation range
access-list 1 deny 203.0.113.0 0.0.0.255
! Multicast (Class D)
access-list 1 deny 224.0.0.0 15.255.255.255
! Reserved (Class E)
access-list 1 deny 240.0.0.0 15.255.255.255
! Everything else gets through to the firewall for further inspection
access-list 1 permit any
!
!
interface FastEthernet 0/0
 description Internet_Facing
 ip access-group 1 in
!

That should wipe out all the evil bogons and martians trying to invade your network.  If you want to use the fullbogon list, obviously your ACL would be considerably longer and need to be updated more frequently.  The above list is just the basic bogon/martian detection and should serve you well.
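
If maintaining that longer list by hand sounds painful, Team Cymru also distributes the bogon lists as a BGP feed you can peer with.  Here’s a very rough sketch of the idea: accept their advertisements and send them to a null route, with loose uRPF on the edge so packets sourced from those ranges get dropped too.  The AS numbers, peer address, and interface below are placeholders, so check Team Cymru’s documentation for the real values before trying this:

!
! Anything whose next hop points here gets dropped
ip route 192.0.2.1 255.255.255.255 Null0
!
router bgp 64512
 ! Placeholder peer address and AS numbers
 neighbor 203.0.113.10 remote-as 65333
 neighbor 203.0.113.10 ebgp-multihop 255
 neighbor 203.0.113.10 description Bogon_Route_Server
 neighbor 203.0.113.10 route-map BOGON-IN in
!
route-map BOGON-IN permit 10
 ! Point every received bogon prefix at the null route
 set ip next-hop 192.0.2.1
!
interface FastEthernet 0/0
 ! Loose uRPF: drop packets whose source has no route (or only a null route)
 ip verify unicast source reachable-via any
!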

Tom’s Take

Blocking these spoofed networks before they can make it to you is a huge help in preventing attacks and spurious traffic from overwhelming you as a Network Rock Star.  Every little bit helps today with all of the reliance on the Internet, especially as we start moving toward…The Cloud.  If you sit down and block just the regular bogon list I’ve outlined above, you can block up to 60% (Warning: Powerpoint) of the obviously bad stuff trying to get to your network.  That should be a big relief to you and let you have a few minutes of free time to take up a new hobby, like poetry.

Thanks to Team Cymru for all the information and stats.  Head over to http://www.team-cymru.org to learn more about all those nasty bogons and how to stop them.