Building A Lego Data Center Juniper Style

I’ve been intrigued by building with Lego bricks for as long as I can remember.  I had a plastic case full of them that I would use to build spaceships and castles day in and day out.  I think much of that building experience paid off when I walked into the real world and started building data centers.  Racks and rails are the network engineering version of the venerable Lego brick.  Little did I know what would happen later.

Ashton Bothman (@ABothman) is a social media rock star for Juniper Networks.  She emailed me and asked if I would like to participate in a contest to build a data center from Lego bricks.  You can imagine my response:

YES!!!!!!!!!!!!!

I like the fact that Ashton sent me a bunch of good old-fashioned Lego bricks.  One of the things that has bugged me a bit since the new licensed sets came out has been the reliance on specialized pieces.  Real Lego means using the same bricks for everything, not custom-molded pieces.  Ashton did right by me.

Here are a few of my favorite shots of my Juniper Lego data center:

My rack setup.  I even labeled some of the devices!

Ladder racks for my Lego cables.  I like things clean.

Can’t have a data center without a generator.  Complete with flashing lights.

The Big Red Button.  EPO is a siren call for troublemakers.

The Token Unix Guy.  Complete with beard and old workstation.

Storage lockers and a fire extinguisher.  I didn't have enough bricks for a halon system.

The Obligatory Logo Shot.  Just for Ashton.


Tom’s Take

This was fun.  It’s also for a great cause in the end.  My son has already been eyeing this set and he helped a bit in the placement of the pirate DC admin and the lights on the server racks.  He wanted to put some ninjas in the data center when I asked him what else was needed.  Maybe he’s got a future in IT after all.

Here are some more Lego data centers from other contest participants:

Ivan Pepelnjak’s Lego Data Center

Stephen Foskett’s Datacenter History: Through The Ages in Lego

Amy Arnold’s You Built a Data Center?  Out Of A DeLorean?

SpectraLogic: Who Wants To Save Forever?

Data retention is a huge deal for many companies.  When you say “tape backup”, the first thing that leaps to people’s minds is backup operations.  Servers with Digital Audio Tape (DAT) drives or newer Linear Tape-Open (LTO) units.  Judiciously saving those bits for the future when you might just need to dig up one or two in order to recover emails or databases.  After visiting with SpectraLogic at their 2013 Spectra Summit, I’m starting to see that tape isn’t just for saving the day.  It’s for saving everything.

Let’s Go To The Tape

Tape is cheap.  As outlined in this Computerworld article, for small applications of fewer than six tape drives, tape is 1/6th the cost of disk backup.  It also lasts virtually forever.  I’ve still got VHS tapes from the 80s that I can watch if I so desire.  And that’s consumer-grade magnetic media.  Imagine how well enterprise-grade media holds up.  It’s also portable.  You can eject a tape and take it home on the weekends as a form of disaster recovery.  If you have at least one tape offsite in a grandfather-father-son rotation, you can be assured of getting at least some of your data back in the event of a disaster.
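For anyone who hasn’t run a grandfather-father-son rotation by hand, here is a minimal sketch of one common convention: daily “son” tapes Monday through Thursday, a weekly “father” tape each Friday, and a monthly “grandfather” tape on the last Friday of the month, which is the one that goes offsite. The schedule, the tape names, and the no-weekend-backups rule are all assumptions for illustration; real shops and backup packages vary.

```python
import datetime as dt
from typing import Optional

SON_NAMES = ("Mon", "Tue", "Wed", "Thu")  # daily tapes, reused every week


def gfs_tier(day: dt.date) -> Optional[str]:
    """Classify a backup date under one common GFS convention (an assumption)."""
    weekday = day.weekday()  # Monday == 0 ... Sunday == 6
    if weekday >= 5:
        return None  # this sketch takes no weekend backups
    if weekday == 4:  # Friday
        # The last Friday of the month has no Friday after it in the same month.
        if (day + dt.timedelta(days=7)).month != day.month:
            return "grandfather"
        return "father"
    return "son"


def tape_label(day: dt.date) -> Optional[str]:
    """Name the tape to load; reusing names is what makes the tapes rotate."""
    tier = gfs_tier(day)
    if tier == "son":
        return f"son-{SON_NAMES[day.weekday()]}"
    if tier == "father":
        return f"father-week{(day.day - 1) // 7 + 1}"  # overwritten next month
    if tier == "grandfather":
        return f"grandfather-{day.year}-{day.month:02d}"  # the tape to send offsite
    return None
```

For example, `tape_label(dt.date(2013, 10, 25))` gives `"grandfather-2013-10"`, since October 25, 2013 was the last Friday of that month.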

Tape has drawbacks.  It’s slow.  Really slow.  The sequential access of tape drives makes them inefficient for random access.  You can batch writes to a cluster of drives, but good luck if you ever want to get that data back in a reasonable time frame.  I once heard someone refer to tape as “Write Once, Read Never”.  It also has trouble scaling.  In the end, you need to cluster several tape units together in order to achieve the kind of scale required to capture data from today’s virtual firehose.

Go Deeper

T-Finity.  Photo by Stephen Foskett

SpectraLogic launched a product called DeepStorage.  It is in no way affiliated with Howard Marks (@DeepStorageNet).  DeepStorage is the idea that you can save files forever.  It uses a product called BlackPearl to eliminate one of the biggest issues with tape: speed.  BlackPearl comes with SSDs to use as a write cache for data being sent to the tape archive.  BlackPearl uses a SpectraLogic protocol called DS3, which stands for DeepS3, to hold the data until it can be written to the tape archive in the most efficient manner.  DS3 looks a lot like Amazon S3.  That’s on purpose.  With the industry as a whole moving toward RESTful APIs and web interfaces, building a RESTful API for tape storage seems like a great fit for SpectraLogic.
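The reason a fast cache helps a slow sequential medium is batching. The toy model below stages incoming objects in memory (standing in for the SSDs) and, once enough bytes pile up, writes them to “tape” in one sorted sequential pass, grouped by bucket. SpectraLogic didn’t detail BlackPearl’s actual flush logic, so the class, the byte threshold, and the list standing in for a tape are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class WriteBehindCache:
    """Toy staging cache in front of an append-only tape target (hypothetical)."""
    flush_threshold: int                         # bytes to accumulate before a flush
    staged: list = field(default_factory=list)   # (bucket, key, data) awaiting tape
    tape: list = field(default_factory=list)     # stands in for the tape: one entry per pass
    staged_bytes: int = 0

    def put(self, bucket: str, key: str, data: bytes) -> None:
        # In this model the write is acknowledged once staged; tape isn't touched yet.
        self.staged.append((bucket, key, data))
        self.staged_bytes += len(data)
        if self.staged_bytes >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        if not self.staged:
            return
        # Sort so each bucket's objects land contiguously, then write the whole
        # batch in a single sequential pass instead of many small ones.
        batch = sorted(self.staged, key=lambda item: (item[0], item[1]))
        self.tape.append(batch)
        self.staged.clear()
        self.staged_bytes = 0
```

Writing bucket contents contiguously is also what would make it practical to eject just the tapes that hold one bucket, as described below.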

It goes a little deeper than that, though (pardon the pun).  One other thing that made me pause was LTFS – the Linear Tape File System.  LTFS allows for a more open environment to write data.  In the past, any data that you backed up to tape left you at the mercy of the software you used to write that data.  CommVault couldn’t read Veritas volumes.  ARCServe didn’t play nicely with Symantec.  With LTFS, you can not only read data written by different backup vendors, but you can also stop treating tape drives like Write Once, Read Never devices.  LTFS allows a cluster of tape units to look and act just like a storage array.  A slow array to be sure, but still an array.

SpectraLogic took the ideas behind LTFS and coupled them with DeepStorage to create a new abstraction: “buckets”.  Buckets function just like the buckets you find in Amazon S3.  These are user-defined constructs that hold data.  The BlackPearl caches these buckets and optimizes the writes to your tape array.  Where the bucket metaphor works well is the portability of the bucket.  Let’s say you wanted to transfer long-term data like phone records or legal documents between law firms that are both using DeepStorage.  All you need to do is identify the bucket in question, eject the tape (or tapes) needed to recreate that bucket, and then send the tapes to the destination.  Once there, the storage admin just needs to import the bucket from the tapes in question and all the data in that bucket can be read.  No software version mismatches.  No late night panicked calls because nothing will mount.  Data exchange without hassles.

The Tape Library of Congress

The ideas here boggle the mind.  While at the Spectra Summit, we heard from companies like NASCAR and Yahoo.  They are using BlackPearl and DS3 as a way to store large media files virtually forever.  There’s no reason you can’t do something similar.  I had to babysit a legal server migration one night because it had 480,000 WordPerfect documents that represented their entire case log for the last twenty years.  Why couldn’t that be moved to long-term storage?  For law offices that still have paper records of everything and don’t want to scan it all in for fear of an OCR mistake, why not just make an image of every file and store it on an LTFS volume fronted by DS3?

The flexibility of a RESTful API means that you can create a customized interface virtually on the fly.  Afraid the auditors aren’t going to be able to find data from five years ago?  Make a simple searching interface that is customized to their needs.  Want to do batch processing across multiple units with parallel writes for fault tolerance?  You can program that as well.  With REST calls, anything is possible.
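To show how thin an auditor-friendly search layer could be, this sketch builds an S3-style “GET bucket” listing URL with a `prefix` filter and then narrows a returned key list to one year. It assumes DS3 honors S3’s `prefix` query parameter, which the S3 resemblance suggests but which isn’t confirmed here; the host name and the `YYYY/...` key layout are hypothetical, and nothing is actually sent over the network.

```python
from typing import List
from urllib.parse import quote, urlencode

# Hypothetical BlackPearl endpoint; DS3's real paths and parameters may differ.
DS3_ENDPOINT = "https://blackpearl.example.internal"


def list_url(bucket: str, prefix: str = "") -> str:
    """Build an S3-style 'GET bucket' URL, assuming DS3 honors S3's `prefix` parameter."""
    url = f"{DS3_ENDPOINT}/{quote(bucket)}"
    if prefix:
        url += "?" + urlencode({"prefix": prefix})
    return url


def keys_for_year(keys: List[str], year: int) -> List[str]:
    """Filter object keys laid out as 'YYYY/...' (a hypothetical naming scheme)."""
    return sorted(k for k in keys if k.startswith(f"{year}/"))
```

`list_url("audit-records", "2008/")` yields `https://blackpearl.example.internal/audit-records?prefix=2008%2F`; a real client would also need whatever authentication DS3 requires.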

DS3 is going to enable you to keep data forever.  No more worrying about throwing things out.  No need to rent storage lockers for cardboard boxes full of files.  No need to worry about the weather or insects.  Just keeping the data center online is enough to keep your data in a readable format from now until forever.

For more information on SpectraLogic and their solutions, you can find them at http://www.spectralogic.com.  You can also follow them on Twitter as @SpectraLogic.


Disclaimer

I was a guest of SpectraLogic for their 2013 Spectra Summit.  They paid for my flight and lodging during the event.  They also provided a t-shirt, a jacket, and a 2 GB USB drive containing marketing collateral.  They did not ask for any consideration in the writing of this review, nor were they promised any.  The conclusions reached herein are mine and mine alone.  In addition, any errors or omissions are mine as well.

Avaya and the Magic of SPB

I was very interested to hear from Avaya at Interop New York.  They were the company I knew the least about.  I knew the most about them from the VoIP side of the house, but they’ve been coming on strong with networking as well.  They are one of the biggest champions of 802.1aq, more commonly known as Shortest Path Bridging (SPB).  You may remember that I wrote a bit about SPB in the past and referred to it as the Betamax of networking fabric technologies.  After this presentation, I may be forced to eat my words to a degree.

Paul Unbehagen really did a great job with this presentation.  There were no slides, but he kept the attention of the crowd.  The whiteboard supported his message.  While informal, there was a lot of learning.  Paul knows SPB.  It’s always great to learn from someone that knows the protocol.

Multicast Magic

One of the things I keyed on during the presentation was the way that SPB deals with multicast.  Multicast is a huge factor in Ethernet today.  So much so that even the cheapest SOHO Ethernet switch has a ton of multicast optimization.  But multicast as implemented in enterprises is painful.  If you want to make an engineer’s blood run cold, walk up and whisper “PIM”.  If you want to watch a nervous breakdown happen in real time, follow that up with “RPF”.

RPF checks in multicast PIM routing are nightmarish.  It would be wonderful to get rid of RPF checks and still eliminate loops in the multicast routing table.  SPB accomplishes that by using Dijkstra’s algorithm, the same shortest path first algorithm that OSPF and IS-IS use to compute paths.  Considering SPB’s heavy roots in IS-IS, that’s not surprising.  The use of Dijkstra means that additional receivers on a multicast tree don’t negatively affect the performance of path calculation.
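For readers who haven’t looked at SPF in a while, here is a minimal Dijkstra sketch that computes a shortest-path tree from one bridge over a link-state database. Because every bridge runs the same deterministic computation over the same database, they all agree on the tree, and adding a receiver doesn’t change the computation itself. The topology and metrics are made up, and the lower-name tie-break is a simplified stand-in for SPB’s real equal-cost tie-breaking rules.

```python
import heapq
from typing import Dict, Optional, Tuple

Graph = Dict[str, Dict[str, int]]  # bridge -> {neighbor: link metric}


def shortest_path_tree(graph: Graph, root: str) -> Tuple[Dict[str, int], Dict[str, Optional[str]]]:
    """Dijkstra's algorithm: return (distance, parent) maps for the tree rooted at `root`."""
    dist: Dict[str, int] = {root: 0}
    parent: Dict[str, Optional[str]] = {root: None}
    heap = [(0, root)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue  # stale entry; this node's shortest distance is already settled
        done.add(node)
        for neighbor, metric in graph[node].items():
            if neighbor in done:
                continue
            candidate = d + metric
            best = dist.get(neighbor)
            # On equal cost, prefer the lower-named parent so every run picks the
            # same tree (a stand-in for SPB's real tie-breaking rules).
            if best is None or candidate < best or (candidate == best and node < parent[neighbor]):
                dist[neighbor] = candidate
                parent[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    return dist, parent
```

In a square of four bridges with equal metrics, `D` is reachable through `B` or `C` at the same cost; the tie-break makes every run choose `B`, which is the property that keeps independently computed trees consistent.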

I’ve Got My IS-IS On You

In fact, one of the optimized networks that Paul talked about involved surveillance equipment.  Video surveillance units that send via multicast have numerous endpoints and only a couple of receivers on the network.  In other words, the exact opposite problem multicast was designed to solve.  Yet, with SPB you can create multicast distribution networks that allow additional end nodes to attach to a common point rather than talking back to a rendezvous point (RP) and getting the correct tree structure from there.  That means fast convergence and simple node addition.

SPB has other benefits as well.  It supports 16.7 million ISIDs, which are much like VLANs or MPLS labels.  This means that networks can grow past the 4,096 VLAN limitation.  It looks a lot like VxLAN to me, minus VxLAN’s reliance on multicast and its lack of a working implementation.  SPB allows you to use a locally significant VLAN for a service and then define an ISID that will transport it across the network to be decapsulated on the other side in a totally different VLAN that is attached to the ISID.  That kind of flexibility is key for deployments in existing, non-greenfield environments.
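Both numbers come straight from field widths: an 802.1Q VLAN ID is 12 bits (a couple of those values are reserved) and an ISID is 24 bits. The second half of this sketch models the per-edge VLAN-to-ISID binding described above, where two edges use different local VLANs for the same service. The switch names, VLAN numbers, and ISID value are hypothetical, and a real edge bridge does this in hardware, not in a dictionary.

```python
VLAN_ID_BITS = 12  # 802.1Q VLAN ID field
ISID_BITS = 24     # ISID field (802.1ah), used by SPB's MAC-in-MAC mode

print(2 ** VLAN_ID_BITS)  # 4096
print(2 ** ISID_BITS)     # 16777216

# Hypothetical edge configuration: each edge binds its own local VLAN to a shared ISID.
EDGE_SERVICE_MAP = {
    "edge-east": {110: 20001},  # local VLAN 110 carries service 20001
    "edge-west": {310: 20001},  # a different local VLAN attaches to the same service
}


def to_isid(edge: str, vlan: int) -> int:
    """Ingress: find the ISID that carries this edge's VLAN across the core."""
    return EDGE_SERVICE_MAP[edge][vlan]


def to_local_vlan(edge: str, isid: int) -> int:
    """Egress: find the local VLAN this edge attaches to the ISID."""
    for vlan, mapped in EDGE_SERVICE_MAP[edge].items():
        if mapped == isid:
            return vlan
    raise KeyError(f"{edge} has no VLAN bound to ISID {isid}")
```

A frame entering `edge-east` on VLAN 110 crosses the core as ISID 20001 and leaves `edge-west` on VLAN 310: `to_local_vlan("edge-west", to_isid("edge-east", 110))` returns `310`.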

If you’d like to learn more about Avaya and their SPB technology, you can check them out at http://www.avaya.com.  You can also follow them on Twitter as @Avaya.


Tom’s Take

Paul said that 95% of all SPB implementations are in the enterprise.  That shocked me a bit, as I always thought of SPB as a service provider protocol.  I think the key comes down to something Paul said in the video.  When we are faced with applications or additional complexity today, we tend to just throw more headers at the problem.  We figure that wrapping the whole mess in a new tag or a new tunnel will take care of everything.  At least until it all collapses into a puddle.  Avaya’s approach with SPB was to go back down to the lower layers and change the architecture of things to optimize everything and make it work the right way on all kinds of existing hardware.  To quote Paul, “In the IEEE, we don’t build things for the fun of it.”  That means SPB has its feet grounded in the right place.  Considering how difficult things can be in data center networking, that’s magical indeed.

Tech Field Day Disclaimer

Avaya was a presenter at the Tech Field Day Interop Roundtable.  They did not ask for any consideration in the writing of this review nor were they promised any.  The conclusions and analysis contained in this post are mine and mine alone.

HP Networking and the Software Defined Store

HP has had a pretty good track record with SDN, even if it isn’t very well known.  HP has embraced OpenFlow on a good number of its ProCurve switches.  Given the age of these devices, there’s a good chance you can find them lying around in labs or in retired network closets to test with.  But where is that going to lead in the long run?

HP Networking was kind enough to come to Interop New York and participate in a Tech Field Day roundtable.  It had been a while since I talked to their team.  I wanted to see how they were handling the battle being waged between OpenFlow proponents like NEC and Brocade, Cisco and their hardware focus, and VMware with NSX.  Jacob Rapp and Chris Young (@NetManChris) stepped up to the plate to talk about SDN and HP’s vision for it.

They cover a lot of ground in here.  Probably the most important piece to me is the SDN app store.

The press picked up on this quickly.  HP has an interesting idea here.  I should know.  I mentioned it in passing in an article I wrote a month ago.  The more I think about the app store model, the more I realize that many vendors are going to go down that road.  Just not in the way HP is thinking.

HP wants to curate content for enterprises.  They want to ensure that software works with their controller to be sure that there aren’t any hiccups in implementation.  Given their apparent distaste for open source efforts, it’s safe to say that their efforts will only benefit HP customers.  That’s not to say that those same programs won’t work on other controllers.  So long as they operate according to the guidelines laid down by the Open Networking Foundation, all should be good.

Show Me The Money

Where’s the value, then?  It’s in positioning the apps in the store.  Yes, you’re going to have some developers come to HP wanting to submit simple apps to the store.  Odds are better that you’re going to see more recognizable vendors coming to the HP SDN store.  People are more likely to buy software from a name they recognize, like TippingPoint or F5.  That means that those companies are going to want to have a prime spot in the store.  HP is going to make something from hosting those folks.

The real revenue doesn’t come from an SMB buying a load balancer once.  It comes from a company offering it as a service with a recurring fee.  The vendor gets a revenue stream.  HP would be wise to work out a recurring fee as well.  It won’t be the juicy 30% cut that Apple enjoys from their walled garden, but anything would be great for the bottom line.  Vendors win from additional sales.  Customers win from having curated apps that work every time and are easy to purchase, install, and configure.  HP wins because everyone comes to them.

Fragmentation As A Service

Now that HP has jumped on the idea of an enterprise-focused SDN app store, I wonder which company will be the next to offer one.  I also worry that having multiple app stores will end up being cumbersome in the long run.  Small developers won’t like submitting their app to four or five different vendor-affiliated stores.  More likely they’ll resort to releasing code on their own rather than jump through hoops.  That will eventually lead to support fragmentation.  Fragmentation helps no one.


Tom’s Take

HP Networking did a great job showcasing what they’ve been doing in SDN.  It was also nice to hear about their announcements the day before they broke wide to the press.  I think HP is going to do well with OpenFlow on their devices.  Integrating OpenFlow visibility into their management tools is also going to do wonders for people worried about keeping up with all the confusing things that SDN can do to a traditional network.  The app store is a very intriguing concept that bears watching.  We can only hope that it ends up being a well-respected entry in a long line of efforts to ease customers into the greater SDN world.

Tech Field Day Disclaimer

HP was a presenter at the Tech Field Day Interop Roundtable.  In addition, they also provided the delegates a 1TB USB3 hard disk drive.  They did not ask for any consideration in the writing of this review nor were they promised any.  The conclusions and analysis contained in this post are mine and mine alone.

Your Data Center Isn’t Facebook And That’s Just Fine

While at the Software Defined Data Center Symposium, I had the good fortune to moderate a panel on application-focused networking in the data center. There were some really smart engineers on that panel. One of the most impressive people was Najam Ahmad from Facebook. He is their Director of Technical Operations. He told me some things about Facebook that made me look at what they are doing in a new light.

When I asked Najam about stakeholder perceptions, he said that he felt a little out of sorts on stage because Ivan Pepelnjak (@IOSHints) and David Cheperdak (@DavidCheperdak) had spent the last fifteen minutes talking about virtual networking. Najam said that he didn’t really know what a hypervisor or a vSwitch was because they don’t run them at Facebook. All of their operating systems and servers run directly on bare metal. That shocked me a bit. Najam said that inserting anything between the server and its function added unnecessary overhead. That’s a pretty unique take on things when you look at how many data centers are driving toward full virtualization.

Old Tools, New Uses

Facebook also runs BGP to the top-of-rack (ToR) switches in their environment. That means that they are doing layer 3 all the way to their access layer. What’s funny is that while BGP in the ToR switches provides for scalability and resiliency, they don’t use BGP as their primary protocol when exchanging routes with providers.  For Facebook, BGP at the edge doesn’t provide enough control over network egress. They take the information that BGP is providing and they crunch it a bit further before adding that all into a controller-based solution that applies business logic and policies to determine the best solution for a given network scenario.

Najam also said that they had used NetFlow for a while to collect data from their servers in order to build a picture of what was going on inside the network. What they found is that the collectors were becoming overwhelmed by the amount of data that they were being hammered with. So instead of installing bigger, faster collectors, the Facebook engineers broke the problem apart by putting a small shim program on every server to collect the data and then forward it to a system designed to collect data inputs, not just NetFlow inputs. Najam lovingly called their system “FBFlow”.
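The shim idea is essentially summarizing at the edge: each server collapses its own raw flow samples into compact per-flow counters and forwards only the summary, so the central system ingests far less data. FBFlow’s internals weren’t described, so the five-tuple key, the record layout, and the `summarize` function below are hypothetical, a sketch of the general pattern rather than Facebook’s design.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

FlowKey = Tuple[str, str, int, int, str]  # src IP, dst IP, src port, dst port, protocol


def summarize(host: str, samples: Iterable[Tuple[FlowKey, int]]) -> List[dict]:
    """Collapse raw (flow, packet size) samples on one server into per-flow records.

    The record layout is hypothetical; only these summaries would be forwarded
    to the central collector instead of every raw sample.
    """
    byte_count: Dict[FlowKey, int] = defaultdict(int)
    packet_count: Dict[FlowKey, int] = defaultdict(int)
    for key, size in samples:
        byte_count[key] += size
        packet_count[key] += 1
    return [
        {"host": host, "flow": key, "bytes": byte_count[key], "packets": packet_count[key]}
        for key in sorted(byte_count)
    ]
```

Three raw samples for two flows become two records; at data center scale, that ratio is what keeps collectors from drowning.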

I thought about this for a while before having a conversation with Colin McNamara (@ColinMcNamara). He told me that this design was a lot more common than I previously thought and that he had implemented it a few times already. At service providers. That’s when things really hit home for me.

Providing Services

Facebook is doing the same things that you do in your data center today. They’re just doing it at a scale that’s one or two orders of magnitude bigger. The basics are all still there: Facebook pushes packets around a network to feed servers and provide applications for consumption by users. What is so different is that the scale at which Facebook does this begins to look less and less like a traditional data center and more and more like a service provider. After all, they *are* providing a service to their users.

I’ve talked before about how Facebook’s Open Compute Project (OCP) switch wasn’t going to be the death knell for traditional networking. Now you see some of that validated in my opinion. Facebook is building hardware to meet their needs because they are a strange hybrid of data center and service provider. Things that we would do successfully in a 500 VM system don’t scale at all for them. Crazy ideas like running exterior gateway routing protocols on ToR switches work just fine for them because of the scale at which they are operating.

Which brings me to the title of the post. People are always holding Facebook and Google in such high regard for what they are doing in their massive data centers. Those same people want to try to emulate that in their own data centers and often find that it just plain doesn’t work.  It’s the same set of protocols.  Why won’t this work for me?

Facebook is solving problems just like a service provider would.  They are building not for continuous uptime, but instead for detectable failures that are quickly recoverable.  If I told you that your data center was going to be down for ten minutes next month you’d probably be worried.  If I told you that those outages were all going to be one minute long and occur ten times, you’d probably be much less worried.  Service providers try to move around failure instead of pouring money into preventing it in the first place.  That’s the whole reasoning behind Facebook’s “Fail Harder” mentality.

Failing Harder means making big mistakes and catching them before they become real problems.  Little issues tend to get glossed over and forgotten about.  Think about something like Weighted Random Early Detection (WRED).  WRED works because you can drop a few packets from a TCP session and it will keep chugging and request the missing bits.  If you kill the entire connection or blow up a default gateway then you’ve got a real issue.  WRED fixes a problem, global TCP synchronization, by failing quietly once in a while.  And it works.
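The “failing quietly” part is easy to see in the drop curve itself. This sketch implements the basic RED drop-probability ramp: no drops below a minimum average queue depth, a linear ramp up to a maximum probability, and tail drop at the maximum threshold. WRED’s “weighted” part is simply a different profile per traffic class. The threshold values are illustrative, not any vendor’s defaults, and the sketch omits some refinements of the original algorithm.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class RedProfile:
    min_threshold: float  # average queue depth (packets) where early drops begin
    max_threshold: float  # average depth at and above which every arrival is dropped
    max_drop_prob: float  # drop probability just below max_threshold


def drop_probability(profile: RedProfile, avg_depth: float) -> float:
    """Basic RED curve: zero, then a linear ramp, then tail drop."""
    if avg_depth < profile.min_threshold:
        return 0.0
    if avg_depth >= profile.max_threshold:
        return 1.0
    span = profile.max_threshold - profile.min_threshold
    return profile.max_drop_prob * (avg_depth - profile.min_threshold) / span


def update_average(avg: float, current_depth: float, weight: float = 0.002) -> float:
    """EWMA of queue depth, so short bursts don't trigger drops on their own."""
    return (1 - weight) * avg + weight * current_depth


# The "weighted" part: each traffic class gets its own profile (illustrative values).
PROFILES = {
    "bulk": RedProfile(min_threshold=20, max_threshold=40, max_drop_prob=0.10),
    "priority": RedProfile(min_threshold=32, max_threshold=40, max_drop_prob=0.05),
}


def should_drop(traffic_class: str, avg_depth: float, rng: random.Random) -> bool:
    """Randomly drop an arriving packet according to its class's curve."""
    return rng.random() < drop_probability(PROFILES[traffic_class], avg_depth)
```

At an average depth of 30 packets, bulk traffic sees a 5% drop chance while priority traffic sees none. Because the drops are random, different TCP sessions back off at different moments instead of all at once, which is what breaks global synchronization.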


Tom’s Take

Instead of comparing your data center to Facebook or Google, you should be taking a hard look at what you are actually trying to do.  If you are doing Hadoop, your data center is going to look radically different than a web services company’s.  There are lessons you can learn from what the big boys are doing.  Failing harder and using old tools in novel new ways are a good start for your own data center analysis and planning.  Just remember that those big data centers aren’t alien environments.  They just have different needs to meet.

Here’s the entire SDDC Symposium Panel with Najam if you’d like to watch it.  He’s got a lot of interesting insights into things besides what I wrote about above.

The Vision Of A ThousandEyes

Scott Adams wrote a blog post once about career advice and whether it was better to be excellent at one thing or good at several things. Basically, being the best at something is fairly hard. There’s always going to be someone smarter or faster than you doing it just a bit better. Many times it’s just as good to be very good at what you do. The magic comes when you take two or three things that you are very good at and combine them in a way that no one has seen before to make something amazing. The kind of thing that makes people gaze in wonder then immediately start figuring out how to use your thing to be great.

During Networking Field Day 6, ThousandEyes showed the delegates something very similar to what Scott Adams was talking about. ThousandEyes uses tools like Traceroute, Ping, and BGP data aggregation to collect data. These tools aren’t overly special in and of themselves. Ping and Traceroute are built into almost every networking stack. BGP looking glass servers and data analysis have been available publicly for a while and can be leveraged in a tool like BGPMon. All very good tools. What ThousandEyes did was combine them in a way to make them better.

ThousandEyes can show data all along the path of a packet. I can see response times and hop-by-hop trajectory. I can see my data leave one autonomous system (AS) and land in another. Want to know what upstream providers your ISP is using? ThousandEyes can tell you that. All that data can be collected in a cloud dashboard. You can keep tabs on it to know if your service level agreements (SLAs) are being met. Or, you could think outside the box and do something that I found very impressive.

Let’s say you are a popular website that angered someone. Maybe you published an unflattering article. Maybe you cut off a user doing something they shouldn’t have. Maybe someone out there just has a grudge. With the nuclear options available to most “hackers” today, the distributed denial of service (DDoS) attack seems to be a popular choice. So popular that DDoS mitigation services have sprung up to shoulder the load. The basic idea is that when you determine that you’re being slammed with gigabits of traffic, you just swing the DNS for your website to a service that starts scrubbing away attack traffic and steering legitimate traffic to your site. In theory it should prevent the attackers from taking you offline. But how can you prove it’s working?

ThousandEyes can do just that. In the above video, they show what happened when Bank of America (BoA) was recently knocked offline by a huge DDoS attack. The information showed two of the three DDoS mitigation services were engaged. The third changeover didn’t happen. All that traffic was still being dumped on BoA’s servers. Those BoA boxes couldn’t keep up with what they were seeing, so even the legitimate traffic that was being forwarded on by the mitigation scrubbers got lost in the noise. Now, if ThousandEyes can tell you which mitigation provider failed to engage then that’s a powerful tool to have on your side when you go back to them and tell them to get their act together. And that’s just one example.
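One slice of that kind of check can be sketched in a few lines: after the DNS swing, collect the address your site resolves to from several vantage points and confirm each answer falls inside a scrubbing provider’s address space. Any vantage point still resolving to the origin marks a changeover that didn’t happen. ThousandEyes’ actual method wasn’t described, so this is only an illustration; the provider names and vantage points are hypothetical, and the prefixes are RFC 5737 documentation ranges.

```python
import ipaddress
from typing import Dict, List, Optional

# Hypothetical scrubbing-provider ranges, using RFC 5737 documentation prefixes.
SCRUBBER_PREFIXES = {
    "scrubber-a": [ipaddress.ip_network("192.0.2.0/24")],
    "scrubber-b": [ipaddress.ip_network("198.51.100.0/24")],
}


def provider_for(address: str) -> Optional[str]:
    """Return which scrubber (if any) owns the address a vantage point resolved."""
    ip = ipaddress.ip_address(address)
    for provider, networks in SCRUBBER_PREFIXES.items():
        if any(ip in net for net in networks):
            return provider
    return None


def unprotected(resolutions: Dict[str, str]) -> List[str]:
    """List vantage points whose DNS answer still points at the origin, not a scrubber."""
    return sorted(vp for vp, addr in resolutions.items() if provider_for(addr) is None)
```

Given answers of `192.0.2.10` from one city, `198.51.100.7` from another, and `203.0.113.5` from a third, only the third shows up in `unprotected()`, which is exactly the kind of evidence to bring back to a mitigation provider.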

I hate calling ISPs to fix circuits because it never seems to be their fault. No matter what I do or who I talk to it never seems to be anything inside the provider network. Instead, it’s up to me to fiddle with knobs and buttons to find the right combination of settings to make my problem go away, especially if it’s packet loss. Now, imagine if you had something like ThousandEyes on your side. Not only could you see the path that your packets are taking through your ISP, you can check latency and see routing loops and suboptimal paths. And, you can take a screenshot of it to forward to the escalation tech during those uncomfortable phone arguments about where the problem lies. No fuss, no muss. Just the information you need to make your case and get the problem fixed.
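Here is a sketch of turning raw path data into the two facts worth putting in front of an escalation tech: any router address that repeats (a likely forwarding loop) and the hop where latency jumps the most. The `(address, RTT)` hop format is an assumption; real traceroute output needs parsing, and non-responding `*` hops are ignored here. One caveat worth keeping: a latency spike at a single hop can simply mean that router deprioritizes replies to probes, so treat the result as a starting point for the conversation, not proof.

```python
from typing import List, Optional, Tuple

Hop = Tuple[str, float]  # (responding router address, round-trip time in ms)


def find_loop(hops: List[Hop]) -> Optional[str]:
    """Return the first address that shows up twice, a hint of a forwarding loop."""
    seen = set()
    for address, _rtt in hops:
        if address in seen:
            return address
        seen.add(address)
    return None


def biggest_jump(hops: List[Hop]) -> Tuple[int, float]:
    """Return (zero-based hop index, added ms) for the largest RTT increase between hops."""
    best_index, best_delta = 0, 0.0
    for i in range(1, len(hops)):
        delta = hops[i][1] - hops[i - 1][1]
        if delta > best_delta:
            best_index, best_delta = i, delta
    return best_index, best_delta
```

If latency jumps by 45 ms at the first hop inside the provider’s address space and stays high afterward, that is a much stronger case than “it’s slow.”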

If you’d like to learn more about ThousandEyes and their monitoring solutions, check out their website at http://www.thousandeyes.com. You can also follow them on Twitter as @ThousandEyes.


Tom’s Take

Vision is a funny thing. Some have it. Some don’t. Having vision can mean many things. It can mean assembling tools in a novel way to solve a problem. It can be the ability to collect data and “see” what’s going on in a network path. It can also mean being able to take that approach and use it in a non-obvious way to provide a critical service to application providers that they’ve never had before. Or, as we later found out at Networking Field Day 6 during a presentation with SolarWinds, it can mean having the sense to realize when someone is doing something right. As Joel Dolisy said when asked about ThousandEyes, “Oh, we’ve got our eye on them.” That’s a lot of vision. A ThousandEyes worth.

Special thanks to Ivan Pepelnjak (@IOSHints) for giving me some ideas on this review.

Networking Field Day Disclaimer

While I was not an official delegate at Networking Field Day 6, I did participate in the presentations and discussions. ThousandEyes was a sponsor of Networking Field Day 6. In addition to hosting a presentation in their offices, they provided snacks and drink for the delegates. They also provided a gift bag with a vacuum water bottle, luggage tag, T-shirt, and stickers (which I somehow managed to misplace). At no time did they ask for any consideration in the writing of this review, nor were they offered any. Independence means no restrictions.  The analysis and conclusions contained in this post are mine and mine alone.

Disruption in the New World of Networking

This is one of the most exciting times to be working in networking. New technologies and fresh takes on existing problems are keeping everyone on their toes when it comes to learning new protocols and integration systems. VMworld 2013 served both as an announcement of VMware’s formal entry into the larger networking world and as notice to existing network vendors. What follows is my take on some of these announcements. I’m sure that some aren’t going to like what I say. I’m even more sure a few will debate my points vehemently. All I ask is that you consider my position as we go forward.

Captain Over, Captain Under

VMware, through their Nicira acquisition and development, is now *the* vendor to go to when you want to build an overlay network. Their technology augments existing deployments to provide software features such as load balancing and policy deployment. In order to do this and ensure that these features are utilized, VMware uses VxLAN tunnels between the devices. VMware calls these constructs “virtual wires”. I’m going to call them vWires, since they’ll likely be called that soon anyway. vWires are deployed between hosts to provide a pathway for communications. Think of it like a GRE tunnel or a VPN tunnel between the hosts. This means the traffic rides on the existing physical network but that network has no real visibility into the payload of the transit packets.
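To make “no real visibility into the payload” concrete: VXLAN (RFC 7348) wraps the VM’s entire Ethernet frame behind an 8-byte header inside UDP (IANA port 4789), so a physical switch in the underlay sees only the outer IP and UDP headers between hosts. This sketch packs and parses just that 8-byte header; the outer Ethernet/IP/UDP headers are omitted, and the VNI value in the example is arbitrary.

```python
import struct

VXLAN_UDP_PORT = 4789        # IANA-assigned destination port for VXLAN
VXLAN_FLAG_VNI_VALID = 0x08  # the "I" bit: the VNI field is valid


def vxlan_header(vni: int) -> bytes:
    """Pack the 8-byte VXLAN header for a 24-bit VXLAN Network Identifier."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # flags (1 byte), reserved (3 bytes), then VNI (3 bytes) + reserved (1 byte)
    return struct.pack("!B3xI", VXLAN_FLAG_VNI_VALID, vni << 8)


def vxlan_encapsulate(inner_frame: bytes, vni: int) -> bytes:
    """Build the outer UDP payload: VXLAN header followed by the untouched inner frame."""
    return vxlan_header(vni) + inner_frame


def parse_vni(udp_payload: bytes) -> int:
    """Recover the VNI from the first 8 bytes of a VXLAN UDP payload."""
    flags, vni_and_reserved = struct.unpack("!B3xI", udp_payload[:8])
    if not flags & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag not set")
    return vni_and_reserved >> 8
```

Everything after those 8 bytes, including the inner VLAN tag and any QoS markings on the inner IP header, is just opaque payload to the underlay unless the outer headers are marked too.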

Nicira’s brainchild, NSX, has the ability to function as a layer 2 switch and a layer 3 router as well as a load balancer and a firewall. VMware is integrating many existing technologies with NSX to provide consistency when provisioning and deploying a new software-based network. For those devices that can’t be virtualized, VMware is working with HP, Brocade, and Arista to provide NSX agents that can decapsulate the traffic and send it to a physical endpoint that can’t participate in NSX (yet). As of the launch during the keynote, most major networking vendors are participating with NSX. There’s one major exception, but I’ll get to that in a minute.

NSX is a good product. VMware wouldn’t have released it otherwise. It is the vSwitch we’ve needed for a very long time. It also extends the ability of the virtualization/server admin to provision resources quickly. That’s where I’m having my issue with the messaging around NSX. During the second day keynote, the CTOs on stage said that the biggest impediment to application deployment is waiting on the network to be configured. Note that this is my paraphrasing of what I took their intent to be. In order to work around the lag in network provisioning, VMware has decided to build a VxLAN/GRE/STT tunnel between the endpoints and eliminate the network admin as a source of delay. NSX turns your network into a fabric for the endpoints connected to it.

Under the Bridge

I also have some issues with NSX and the way it’s supposed to work on existing networks. Network engineers have spent countless hours optimizing paths and reducing delay and jitter to provide applications and servers with the best possible network. Now none of that matters. vAdmins just have to click a couple of times to build their vWire to the other server, and all that work on the network is for naught. The underlay network exists only to provide VxLAN transport. NSX assumes that everything working beneath it is running optimally. No loops, no blocked links. NSX doesn’t even participate in spanning tree. Why should it? After all, that vWire ensures that all the traffic ends up in the right location, right? People would never bridge the network cards on a host server, like when building a VPN server, for instance. All of the things that network admins and engineers think about to keep the network from blowing up under excess traffic are handwaved away in the presentations I’ve seen.
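Here’s a toy way to see why a bridged host matters. This Python sketch treats switches and hosts as nodes in a single Layer 2 domain and uses a union-find structure to flag any link that joins two nodes that are already connected, which is exactly the kind of link spanning tree would have to block. The device names are hypothetical, and this is a simple graph check, not an implementation of STP (which elects a root bridge and blocks ports by exchanging BPDUs).

```python
class DisjointSet:
    """Union-find over arbitrary hashable node names."""

    def __init__(self):
        self.parent = {}

    def find(self, node):
        self.parent.setdefault(node, node)
        while self.parent[node] != node:
            self.parent[node] = self.parent[self.parent[node]]  # path halving
            node = self.parent[node]
        return node

    def union(self, a, b):
        """Merge the sets containing a and b; return False if already joined."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a == root_b:
            return False
        self.parent[root_a] = root_b
        return True


def find_loop_links(links):
    """Return the links that close a Layer 2 loop (the ones STP would block)."""
    ds = DisjointSet()
    return [link for link in links if not ds.union(*link)]


# Hypothetical topology: two switches trunked together, plus an ESX host
# whose two bridged NICs connect to each switch.
links = [
    ("switch-a", "switch-b"),
    ("switch-a", "esx-host"),  # NIC 1
    ("esx-host", "switch-b"),  # NIC 2, bridged to NIC 1 on the host
]
print(find_loop_links(links))  # [('esx-host', 'switch-b')]
```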

The reference architecture for NSX looks pretty. Prettier than any real network I’ve ever seen. I’m afraid that suboptimal networks are going to impact application and server performance now more than ever. And instead of the network using mechanisms like QoS to battle issues, those packets are now invisible bulk traffic. When network folks have no visibility into the traffic crossing the network, they can’t help when performance suffers. Who do you think is going to get blamed when that happens? Right now, it’s the network’s fault when things don’t run right. Do you think that moving the onus for server network provisioning to NSX and vCenter is going to get the network people off the hook when things go south? Or are the underlay engineers going to take the brunt of the yelling because they are the only ones who still understand the black magic outside the GUI drag-and-drop used to create vWires?

NSX is for service enablement. It allows people to build network components without knowing the CLI. It also means that network admins are going to have to work twice as hard to build resilient networks that work at high speed. I’m hoping that means TRILL-based fabrics are going to take off. Why use spanning tree now? Your application and service network sure isn’t. There’s no sense adding any more bells and whistles to your switches. It’s better to just tie them into spine-and-leaf Clos fabrics and be done with it. It now becomes much more important to concentrate on the user experience. Or maybe the wireless network. As long as at least one link exists between your ESX box and the edge switch, let the new software networking guys worry about it.
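The appeal of spine-and-leaf is easy to see with some back-of-the-envelope math. In an idealized two-tier fabric where every leaf has one uplink to every spine, any two leaves are always two hops apart, and there are as many equal-cost paths between them as there are spines, all of them forwarding instead of sitting blocked. This Python sketch assumes that idealized wiring; the spine count, leaf count, and uplink speed in the example are hypothetical.

```python
def leaf_spine_summary(spines: int, leaves: int, uplink_gbps: int) -> dict:
    """Back-of-the-envelope numbers for an idealized two-tier leaf-spine fabric.

    Assumes every leaf has exactly one uplink to every spine and that
    spines never connect to each other or to servers.
    """
    return {
        "fabric_links": spines * leaves,        # one link per leaf/spine pair
        "leaf_to_leaf_hops": 2,                 # always leaf -> spine -> leaf
        "equal_cost_paths": spines,             # one path through each spine
        "uplink_gbps_per_leaf": spines * uplink_gbps,
    }


# Hypothetical sizing: 4 spines, 16 leaves, 40 Gbps uplinks.
print(leaf_spine_summary(spines=4, leaves=16, uplink_gbps=40))
# {'fabric_links': 64, 'leaf_to_leaf_hops': 2, 'equal_cost_paths': 4, 'uplink_gbps_per_leaf': 160}
```

Adding a spine raises the path count and per-leaf uplink capacity for every leaf at once, which is why the design scales sideways instead of up.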

The Recumbent Incumbent?

Cisco is the only major networking manufacturer not publicly on board with NSX right now. Their CTO Padma Warrior has released a response to NSX that talks about lock-in and vertical integration. Still others have released responses to that response. There’s a lot of talk right now about the war brewing between Cisco and VMware and what that means for VCE. One thing is for sure – the landscape has changed. I’m not sure how this is going to fall out on both sides. Cisco isn’t likely to stop selling switches any time soon. NSX still works just fine with Cisco as an underlay. VCE is still going to make a whole bunch of money selling vBlocks in the next few months. Where this becomes a friction point is in the future.

Cisco has been building APIs into their software for the last year. They want to be able to use those APIs to directly program the network through devices like the forthcoming OpenDaylight controller. Will they allow NSX to program them as well? I’m sure they would – if VMware wrote those instructions into NSX. Will VMware demand that Cisco use the NSX-approved APIs and agents to expose network functionality to their software network? They could. Will Cisco scrap onePK, its own programmability toolkit, to implement NSX? I doubt that very much. We’re left with a standoff. Cisco wants VMware to use their tools to program Cisco networks. VMware wants Cisco to use the same tools as everyone else and make the network a commodity compared to the way it is now.

Let’s think about that last part for a moment. Aside from some speed differences, networks are largely going to look identical to NSX. It won’t care if you’re running HP, Brocade, or Cisco. Transport is transport. Someone down the road may build some proprietary features into their hardware to make NSX run better, but that day is far off. What if a manufacturer builds a switch that is twice as fast as the nearest competition? Three times? Ten times? At what point does the underlay become so important that the overlay starts preferring it exclusively?


Tom’s Take

I said a lot during the Tuesday keynote at VMworld. Some of it was rather snarky. I asked about full BGP tables and vMotioning the machines onto the new NSX network. I asked because I tend to obsess over details. Forgotten details have broken more of my networks than grand design disasters. We tend to fuss over the big things. We make more out of someone who can drive a golf ball hundreds of yards than we do out of someone who can consistently sink a ten-foot putt. I know that a lot of folks were pre-briefed on NSX. I wasn’t, so I’m playing catch up right now. I need to see it work in production to understand what value it brings to me. One thing is for sure – VMware needs to change the messaging around NSX to be less antagonistic towards network folks. Bring us into your solution. Let us use our years of experience to help rather than making us seem like pariahs responsible for all your application woes. Let us help you help everyone.

Big Data? Or Big Analysis?


Unless you’ve been living under a rock for the past few years, you’ve no doubt heard all about the problem that we have with big data.  When you start crunching the numbers on data sets in the terabyte range, the amount of compute power and storage space that you have to dedicate to the endeavor is staggering.  Even at Dell Enterprise Forum, some of the talk in the keynote addresses focused on the need to split the processing of big data into more manageable parallel sets via new products such as the VRTX.  That’s all well and good.  That is, it’s good if you actually believe the problem is with the data in the first place.

Data Vs. Information

Data is just description.  It’s a raw material.  It’s no more useful to the average person than cotton plants or iron ore.  Data is just a singular point on a graph with no axes.  Nothing can be inferred from that data point unless you process it somehow.  That’s where we start talking about information.

Information is the processed form of data.  It’s digestible and coherent.  It’s a collection of data points that tell a story or support a hypothesis.  Information is actionable data.  When I have information on something, I can make a decision or present my findings to someone to make a decision.  The key is that it is a second-order product.  Information can’t exist without data upon which to perform some kind of analysis.  And therein lies the problem in our growing “big data” conundrum.

Big Analysis

Data is very sedentary.  It doesn’t really do much after it’s collected.  It may sit around in a database for a few days until someone needs to generate information from it.  That’s where analysis comes into play.  A table is just a table.  It has a height and a width.  It has a composition.  That’s data.  But when we analyze that table, we start generating all kinds of additional information about it.  Is it comfortable to sit at the table?  What color lamp goes best with it?  Is it hard to move across the floor?  Would it break if I stood on it?  All of that analysis is generated from the data at hand.  The data didn’t go anywhere or do anything.  I created all that additional information solely from the data.

Look at the above Wikipedia entry for big data.  The image on the screen is one of the better examples of information spiraling out of control from analysis of a data set.  The picture is a visual example of Wikipedia edits.  Note that it doesn’t have anything to do with the data contained in a particular entry.  They’re just tracking what people did to describe that data or how they analyzed it.  We’ve generated terabytes of information just doing change tracking on a data set.  All that data needs to be stored somewhere.  That’s what has people in IT sales salivating.

Guilt By Association

If you want to send a DBA screaming into the night, just mention the words associative entity (or junction table).  In another lifetime, I was in college to become a DBA.  I went through Intro to Databases and learned about all the constructs that we use to contain data sets.  I might have even learned a little SQL by accident.  What I remember most was entities.  Standard entities are regular data.  They have a primary key that identifies a row of data, such as a person or a vehicle.  That data is pretty static and doesn’t change often.  Case in point – how accurate is the height and weight entry on your driver’s license?

Associative entities, on the other hand, represent borderline chaos.  These are analysis nodes.  They carry the primary keys of at least two other tables as references back to those tables.  They are created when you are trying to perform some kind of analysis on those tables.  They can be ephemeral and are usually generated on demand by things like SQL queries.  This is the heart of my big data / big analysis issue.  We don’t really care about the standard data entities.  We only want the analysis and information that we get from the associative entities.  The more information and analysis we desire, the more of these associative entities we create.  Containing these descriptive sets is causing the explosion in storage and compute costs.  The data hasn’t really grown.  It’s our take on the data that has.
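For the non-DBAs, here’s a minimal sketch using Python’s built-in sqlite3 module and an in-memory database. The person and vehicle tables are standard entities; person_vehicle is the associative entity, keyed on the primary keys of the other two. All table names and rows are made up for illustration. The cross join stands in for an analysis that wants a row for every pairing, and it shows the point of this section: six rows of base data turn into nine rows of relationships, and the junction table can grow as large as the product of the table sizes while the underlying data stays put.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Standard entities: each row is identified by a single primary key.
    CREATE TABLE person  (person_id  INTEGER PRIMARY KEY, name  TEXT);
    CREATE TABLE vehicle (vehicle_id INTEGER PRIMARY KEY, plate TEXT);

    -- Associative entity (junction table): its composite key is made of
    -- foreign keys pointing at the primary keys of the two tables above.
    CREATE TABLE person_vehicle (
        person_id  INTEGER REFERENCES person(person_id),
        vehicle_id INTEGER REFERENCES vehicle(vehicle_id),
        PRIMARY KEY (person_id, vehicle_id)
    );
""")
conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [(1, "Alice"), (2, "Bob"), (3, "Carol")])
conn.executemany("INSERT INTO vehicle VALUES (?, ?)",
                 [(10, "ABC-123"), (20, "DEF-456"), (30, "GHI-789")])

# Relate every person to every vehicle, as an analysis that asks a question
# about each pairing might. The junction table grows as n * m.
conn.execute("""
    INSERT INTO person_vehicle (person_id, vehicle_id)
    SELECT p.person_id, v.vehicle_id FROM person AS p CROSS JOIN vehicle AS v
""")


def count(table: str) -> int:
    # Table names here are fixed literals, so f-string interpolation is safe.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


base_rows = count("person") + count("vehicle")
junction_rows = count("person_vehicle")
print(base_rows, junction_rows)  # 6 9
```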

Crunch Time

What can we do?  Sadly, not much.  Our brains are hard-wired to try and make patterns out of seemingly unconnected things.  It is only natural that we try to bring order to chaos.  Given all of the data in the world, the first thing we are going to want to do with it is try to make sense of it.  Sure, we’ve found some very interesting underlying patterns through analysis, such as the well-worn story from last year of Target determining a girl was pregnant before her family knew.  The purpose of all that analysis was pretty simple – Target wanted to know how to better pitch products to specific groups of people.  They spent years of processing time and terabytes of storage all for the lofty goal of trying to figure out what 18-24 year old males are more likely to buy between 6 p.m. and 10 p.m. on weekday evenings.  It’s a key to the business models of the future.  Rather than guessing what people want, we have magical reports that tell us exactly what they want.  Why do you think Facebook is so attached to the idea of “liking” things?  That’s an advertiser’s dream.  Getting your hands on a second-order analysis of Facebook’s Like database would be the equivalent of the advertising Holy Grail.
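The “18-24 year old males on weekday evenings” question boils down to a filter plus a count. This Python sketch runs that question over a tiny, entirely hypothetical purchase log; every record, product name, and field layout is invented. Note that the answer is new information that has to be stored somewhere, even though the log itself didn’t change.

```python
from collections import Counter
from datetime import datetime

# Hypothetical purchase log: (customer_age, customer_gender, timestamp, product).
purchases = [
    (21, "M", datetime(2013, 6, 3, 19, 15), "energy drink"),  # Monday
    (23, "M", datetime(2013, 6, 4, 20, 40), "energy drink"),  # Tuesday
    (19, "M", datetime(2013, 6, 5, 21, 5), "video game"),     # Wednesday
    (22, "M", datetime(2013, 6, 8, 20, 0), "video game"),     # Saturday: excluded
    (35, "F", datetime(2013, 6, 4, 19, 30), "coffee"),        # outside the segment
]


def in_segment(age: int, gender: str, when: datetime) -> bool:
    """18-24 year old males, weekdays (Mon-Fri), between 6 p.m. and 10 p.m."""
    return (18 <= age <= 24 and gender == "M"
            and when.weekday() < 5 and 18 <= when.hour < 22)


top = Counter(product for age, gender, when, product in purchases
              if in_segment(age, gender, when))
print(top.most_common(1))  # [('energy drink', 2)]
```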


Tom’s Take

We are never going to stop creating analysis of data.  Sure, we may run out of things to invent or see or do, but we will never run out of ways to ask questions about them.  As long as pivot tables exist in Excel or inner joins happen in an Oracle database people are going to be generating analysis of data for the sake of information.  We may reach a day where all that information finally buries us under a mountain of ones and zeroes.  We brought it on ourselves because we couldn’t stop asking questions about buying patterns or traffic behaviors.  Maybe that’s the secret to Zen philosophy after all.  Instead of concentrating on the analysis of everything, just let the data be.  Sometimes just existing is enough.

Dell Enterprise Forum and the VRTX of Change

I was invited by Dell to be a part of their first ever Enterprise Forum.  You may remember this event from the past when it was known as Dell Storage Forum, but now that Dell has a bevy of enterprise-focused products in their portfolio a name change was in order.  The Enterprise Forum still had a fair amount of storage announcements.  There was also discussion about networking and even virtualization.  One thing seemed to be on the tip of everyone’s tongue from the moment it was unveiled on Tuesday morning.

VRTX

Say hello to Dell’s newest server platform – VRTX (pronounced “vertex”).  The VRTX is a shift away from the centralized server clusters that you may be used to seeing from companies like Cisco, HP, or IBM.  Dell has taken their popular m1000 blade units and pulled them into an enclosure that bears more than a passing resemblance to the servers I deployed five or six years ago.  The VRTX is capable of holding up to 4 blade servers in the chassis alongside either 12 3.5″ hard drives or 25 2.5″ drives, for a grand total of up to 48 TB of storage space.  What sets VRTX apart from other similar designs, like the IBM S-class BladeCenter of yore, is the ability for expansion.
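The 48 TB figure is easy to sanity-check. Assuming it refers to filling all twelve 3.5″ bays (my reading, not a Dell spec sheet), that works out to 4 TB per drive. Reaching the same total with the 25-bay 2.5″ option would take roughly 1.92 TB per drive. A couple of lines of Python do the division.

```python
def per_drive_tb(total_tb: float, bays: int) -> float:
    """Average drive size needed to reach total_tb across the given number of bays."""
    return total_tb / bays


# Figures from the VRTX description: 48 TB maximum, 12 LFF or 25 SFF bays.
print(per_drive_tb(48, 12))  # 4.0  (TB per 3.5" drive)
print(per_drive_tb(48, 25))  # 1.92 (TB per 2.5" drive)
```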

Rather than just sliding a quad-port NIC into the mezzanine slot and calling it a day, Dell developed VRTX to expand to meet future needs of customers.  That’s why you’ll find 8 PCIe slots in VRTX (3 full height, 5 half height).  That’s the real magic in this system.  For example, the VRTX ships today with 8 1GbE ports for network connectivity.  While 10GbE is slated for a future release you could slide in a 10GbE PCIe card and attach it to a blade if needed to gain connectivity.  You could also put in a Serial Attached SCSI (SAS) Host Bus Adapter (HBA) and gain more expansion for your on-board storage.  In the future, you could even push that to 40GbE or maybe one of those super fast PCIe SSD cards from a company like Fusion-IO.  The key is that the PCIe slots give you a ton of expandability in such a small form factor instead of limiting you to whatever mezzanine card or expansion adapter has been blessed by the skunkworks labs for your supplying server vendor.

VRTX doesn’t come without a bit of controversy.  Dell has positioned this system as a remote office/branch office (ROBO) solution that combines everything you would need to turn up a new site into one shippable unit.  That follows along with comments made at a keynote talk on the third day about Dell believing that compute power has reached a point where it will no longer grow at the same rate.  Dell’s solution to the issue is to push more compute power to the edge instead of centralizing it in the data center.  What you lose in manageability, you gain in power.

The funny thing for me was looking at VRTX and seeing the solution to a small scale data center problem I had for many years.  The schools I used to serve didn’t need an 8 or 10-slot blade chassis.  They didn’t need two Compellent SANs with data tiering and failover.  They needed a solution to virtualize their aging workloads onto a small box built for their existing power and cooling infrastructure.  VRTX fits the bill just fine.  It uses 110V power.  The maximum of four blades fits perfectly with VMware’s Essentials bundle for cheap virtualization, with the capability to expand if needed later on.  Everything is the same as the enterprise-grade hardware that’s being used in other solutions, just in a more SMB-friendly box.  Plus, the entry level price target of $10,000 in a half-loaded configuration fits the budget conscious needs of a school or small office.

If there is one weakness in the first iteration of VRTX it comes from the software side of things.  VRTX doesn’t have any software beyond what you load on it.  It will run VMware, Citrix, Hyper-V, or any manner of server software you want to install.  There’s no software to manage the platform, though.  Without that, VRTX is a standalone system.  If you truly wanted to use it as a “pay as you grow” data center solution, you need to find a way to expand the capabilities of the system linearly as you expand the node count.  As a counterpoint to this, take a look at Nutanix.  Many storage people at Enterprise Forum were calling the VRTX the “Dell Nutanix” solution.  You can watch an overview of what Nutanix is doing from a session at Storage Field Day 2 last November:

The key difference is that Nutanix has a software management program that allows their nodes to scale out when a new node is added.  That is what Dell needs to work on developing to harness the power that VRTX represents.  Dell developed this as a ROBO solution yet no one I talked to saw it that way.  They saw this as a building block for a company starting their data center build out.  What’s needed is the glue to stitch two or more VRTX systems together.  Harnessing the power of multiple discrete compute units is a very important part of breaking through all the barriers discussed at the end of Enterprise Forum.


Tom’s Take

Bigger is better.  Except when it’s not.  Sometimes good things really do come in small packages.  Considering that Dell’s VRTX spent the last four years as a science project being built as a proof-of-concept, I’d say that Dell has finally achieved one thing they’ve been wanting to do for a while.  It’s hard to compete against HP and IBM due to their longevity and entrenchment in the blade server market.  Now, Dell has a smaller blade server that customers are clamoring to buy to fill needs that aren’t satisfied by bigger boxes.  The missing ingredient right now is a way to tie them all together.  If Dell can multiplex their resources together, they stand an excellent chance of unseating the long-standing titans of blade compute.  And that’s a change worth fighting for.

Disclaimer

I was invited to attend Dell Enterprise Forum at the behest of Dell.  They paid for my travel and lodging expenses while on site in San Jose.  They also provided a Social Media Influencer pass to the event.  At no time did they place any requirements on my attendance or participation in this event.  They did not request that any posts be made about the event.  They did not ask for, nor were they granted, any kind of consideration in the writing of this or any other Dell Enterprise Forum post.

Tech Field Day 9


It’s hard to believe that the last Tech Field Day event was held almost two years ago.  Since then, the Field Day series has branched out to cover topics like Networking, Storage, and Wireless.  The industry never stands still for long, however.  The stars aligned and the sponsors asked to bring back the granddaddy of them all.  That’s why I’m happy to announce that I’ll be attending Tech Field Day 9 from June 19-21 in Austin, TX.

There’s an all-star lineup of previous Field Day attendees with a couple of new folks sprinkled in to keep things lively:

Alastair Cooke @DemitasseNZ
Trainer, Writer, Consultant, Geek. From New Zealand.
Bob Plankers @Plankers
A hardcore IT generalist, virtualization expert, blogger, and vocal end user of technology.
Carlo Costanzo @CCostan
Carlo is a NYC based Virtualization Consultant. He writes about whatever interests him at the time @ vCloudInfo.com
Chris Wahl @ChrisWahl
The guy who is in your data center virtualizing things
Howard Marks @DeepStorageNet
Storage Analyst Extraordinary and Plenipotentiary
John Obeto @JohnObeto
I like SMBs and Windows
Justin Warren @JPWarren
The Anablogger: Old-school, long-form analysis with an irreverent twist.
Matthew Norwood @MatthewNorwood
Robert Novak @Gallifreyan
Writer, Photographer, System Administrator, Team Builder, Cat Herder, Comedian, Part-Time Shopkeeper
Ryan Adzima @RAdzima
Ryan is an enterprise technology generalist with a tendency to always end up back in networking.
Scott D. Lowe @OtherScottLowe
Tony Mattke @Tonhe
network engineer / geek

The delegates are some of the best and brightest across the networking, server, and storage industries.  That’s quite fitting when you consider the sponsors that are coming your way and how they represent the new trend in converged data centers:

CommVault, Dell, Infinio, Neverfail, Nutanix, SolarWinds, and Veeam

In particular, Infinio is an exciting addition to the Tech Field Day series.  They will be launching during their presentation slot, so I’m sure they’re going to have a very interesting take on their topic.

Tech Field Day 9 is also a transition point for me personally.  For the first time, I’ll be attending the event as both a delegate AND a staff member.  Now that I’m a full-time employee of Foskett Services and Gestalt IT I’m going to split my time between listening to the presenters and making sure that everything is running smoothly in the background.  It’s going to be a challenge to try and keep up with everything, but I feel that I’m more than capable of making every aspect of this event outstanding.

What’s Field Day Like?

Tech Field Day is not a vacation.  This event will involve starting a day early, first thing Wednesday morning, and running full steam for two and a half days.  We get up early and retire late.  Wall-to-wall meetings and transportation to and from vendors fill the days.  When you consider that most of the time we’re discussing vendors and presentations on the car ride to the next building, there’s very little downtime.  We’ve been known to have late night discussions about converged storage networking and automation until well after midnight.  If that’s your idea of a “vacation” then Tech Field Day is a paradise.  I usually crawl onto a plane late Friday night mentally and physically exhausted with a head full of blog posts and ideas.  It’s not unlike the feeling you get after running a marathon.  You don’t know if you could do it again tomorrow, but you can’t wait until the next one.

Tech Field Day – Join In Now!

Everyone at home is as much a participant in Tech Field Day as the delegates on site.  At the last event we premiered the ability to watch the streaming video from the presentations on mobile devices.  This means that you can tune in from just about anywhere now.  There’s no need to stay glued to your computer screen.  If you want to tune in to our last presentations of the day from the comfort of your couch with your favorite tablet device, then by all means feel free.  We’ll also have the videos from the sessions posted quickly afterwards on YouTube and Vimeo.  If you have to run to the store for ice cream or catch that playoff game, you can always catch up with what’s going on when you get back.  Don’t forget that you can also use Twitter to ask questions and make comments about what you’re seeing and hearing.  Some of the best questions I’ve seen came from the home audience.  Use the hashtag #TFD9 during the event.  Note that I’ll be tagging the majority of my tweets that week with #TFD9, so if the chatter is getting overwhelming you can always mute or filter that tag.

Standard Tech Field Day Sponsor Disclaimer

Tech Field Day is a massive undertaking that involves the coordination of many moving parts.  It’s not unlike trying to herd cats with an aircraft carrier.  One of the most important pieces is the sponsors.  Each of the presenting companies is responsible for paying a portion of the travel and lodging costs for the delegates.  This means they have some skin in the game.  What this does NOT mean is that they get to have a say in what we do.  No Tech Field Day delegate is ever forced to write about the event due to sponsor demands.  If a delegate chooses to write about anything they see at Tech Field Day, there are no restrictions on what can be said.  Sometimes this does lead to negative discussion.  That is entirely up to the delegate.  Independence means no restrictions.  At times, some Tech Field Day sponsors have provided no-cost evaluation equipment to the delegates.  This is provided solely at the discretion of the sponsor and is never a requirement.  This evaluation equipment is also not contingent on writing a review, be it positive or negative.  The delegates are in this for the truth, the whole truth, and nothing but the truth.

If you’d like to learn more about what makes Tech Field Day so special, please check out the website at http://techfieldday.com.  If you want to be a part of Tech Field Day, don’t hesitate to fill out the nomination form to become a delegate.  We’re always on the lookout for great people to become a part of the event and we’d love to have you along for the ride.