About networkingnerd

Tom Hollingsworth, CCIE #29213, is a former network engineer and current organizer for Tech Field Day. Tom has been in the IT industry since 2002, and has been a nerd since he first drew breath.

Meeting Your Deadlines Is Never Easy

2018 has been a busy year. There’s been a lot going on in the networking world and the pace of things keeps accelerating. I’ve been inundated with things this last month, including endless requests for my 2019 predictions and where I think the market is going. Since I’m not a prediction kind of person, I wanted to take just a couple of moments to talk more about something that I did find interesting from 2018 – deadlines.

Getting It Out The Door

Long-time readers of this blog may remember that I’ve always had a goal set for myself of trying to get one post published every week. It’s a deadline I set for myself to make sure that I didn’t let my blog start decaying into something that is barely updated. I try to hold fast to my word and get something new out every week. Sometimes it’s simple, like reflections on one of the various Tech Field Day events that I’m working on that week. But there’s always something.

That is, until Cisco Live this year. I somehow got so wrapped up in things that I missed a post for the first time in eight years! Granted, this was the collection of several things going on at once:

  1. I was running Tech Field Day Extra during Cisco Live. So I was working my tail off the entire time.
  2. I was at Cisco Live, which is always a hugely busy time for me. Even when I’m not doing something specific to the event it’s social hour every hour.
  3. This year I normally write posts on Thursday afternoon to publish on Friday. Guess what happened on Thursday at Cisco Live after we all said goodbye? I went on vacation with my family to Disney World. So I didn’t realize that I hadn’t published anything until Sunday afternoon.

The perfect confluence of factors led to me missing a deadline. Now, I’ve missed it once more this year and totally forgot to write something until the Monday following my deadline. And it’s even more frustrating when it’s something I totally could have controlled but didn’t.

Why the fuss? I mean, it’s not like all my readers are going to magically run away if I don’t put something out today or tomorrow. While that is very true, it’s more for me that I don’t want to forget to put content out. More than any other thing, scheduling your content is the key to keeping your readers around.

Think about network television. For years, they advertised their timeslots as much as they advertised their shows. Must-See Thursday. TGIF. Each of these may conjure images of friendly shows or of full houses. But you remember the day as much as you remember the shows, right? That’s because the schedule became important. If you don’t think that matters, imagine the shows that are up against big events or keep getting bumped because of sporting events. There’s a reason why Sunday evening isn’t a good time for a television show. Or why no one tries to put something up against the Super Bowl.

Likewise, schedules are important for blogging. I used to just hit publish on my posts whenever I finished them. That meant sending them out at 9pm on a Tuesday sometimes. Not the best time for people to want to dive into a technical post. Instead, I started publishing them in the mornings after I wrote them. That means more eyeballs and more time to have people reflect on them. I’ve always played around with the daily schedule of when to publish, but in 2018 it got pushed to Friday out of necessity. I kept running out of time. Instead of focusing on the writing, I would often wake up Friday morning with writer’s block and just churn something out to hit my deadline.

Writing because you have to is not fun. Wracking your brain to come up with some topic of conversation is stressful. Lee Badman has been posting questions every weekday morning to the wireless community for a long while and he’s decided that it’s run its course. I applaud Lee for stepping away from something like that before it became a chore. It’s not easy to leave something behind that has meant a lot to you.

Write Like The Wind

For me, blogging is still fun. I still very much enjoy sitting down in front of a computer keyboard and getting some great thoughts out there. I find my time at Tech Field Day events has energized my writing to a large degree because there is so much good content out there that needs to be discussed and indexed. I still enjoy pouring my thoughts out onto a piece of digital paper for everyone to read.

Could I cut back to simple reaction posts? Sure. But that’s not my style. I started blogging because I like the long-form of text. I’ve written some quick sub-500 word pieces because I needed to get something out. But those are the exceptions to the rule. I’d rather keep things thoughtful and encourage people to spend more time focusing on words.

I think the biggest thing that I need to change is the posting date. I need to move back from Friday to give myself some headroom to post. I also need to use Friday as my last-ditch day to get things published. That may mean putting more thought into my posts earlier in the week for sure. It may also mean having two posts on weeks that big news breaks. But that’s the life of a writer, isn’t it?

Home Away From Home

The third biggest challenge for deadlines is all the other writing that I’m doing. I spend a lot of time taking briefings and such for Gestalt IT, which I affectionately refer to as my “Bruce Wayne” job. I get to hear a lot of fun stories and see a lot of great companies just starting out in the world. I write a lot over there because it’s how I keep up with the industry. Remember that year that I went crazy and wrote two posts every week for an entire year? Yeah, good times. Guess what? It’s going to be like that again!

Gestalt IT is going to be my writing source for most of my briefings and coverage of companies. It’s going to have a much different tone than this blog does. Here is where I’m going to spend more time pontificating and looking at big trends in technology. Or perhaps it will be stirring the pot. But I still plan on getting out one post a week about some topic. And I won’t be posting it on Friday unless I absolutely have to.


Tom’s Take

It’s no stretch to say that writing is something I do better than anything else. It’s also something I love to do. I want to do my best to keep bringing good content to everyone out there that likes to read my blog. I’m going to spend some time exploring new workflows and trying to keep the hits coming along as 2019 rolls around. I’ll have more to say on that in my usual January 1 post to kick off the new year!

Facebook’s Mattress Problem with Privacy

If you haven’t had a chance to watch the latest episode of the Gestalt IT Rundown that I do with my co-workers every Wednesday, make sure you check this one out. Because it’s the end of the year it’s customary to do all kinds of fun wrap up stories. This episode focused on what we all thought was the biggest story of the year. For me, it was the way that Facebook completely trashed our privacy. And worse yet, I don’t see a way for this to get resolved any time soon. Because of the difference between assets and liabilities.

Contact The Asset

It’s no secret that Facebook knows a ton about us. We tell it all kinds of things every day we’re logged into the platform. We fill out our user profiles with all kinds of interesting details. We click Like buttons everywhere, including the one for the Gestalt IT Rundown. Facebook then keeps all the data somewhere.

But Facebook is collecting more data than that. They track where our mouse cursors are in the desktop when we’re logged in. They track the amount of time we spend with the mobile app open. They track information in the background. And they collect all of this secret data and they store it somewhere as well.

This data allows them to build an amazingly accurate picture of who we are. And that kind of picture is extremely valuable to the right people. At first, I thought it might be the advertisers that crave this kind of data. Advertisers are the people that want to know exactly who is watching their programs. The more data they have about demographics the better they can tailor the message. We’ve already seen that with specific kinds of targeted posts on Facebook.

But the people that really salivate over this kind of data live in the shadows. They look at the data as a way to offer new kinds of services. Don’t just sell people things. Make them think differently. Change their opinions about products or ideas without them even realizing it. The really dark and twisted stuff. Like propaganda on a whole new scale. Enabled by the fact that we have all the data we could ever want on someone without even needing to steal it from them.

The problem with Facebook collecting all this data about us is that it’s an asset. It’s not too dissimilar from an older person keeping all their money under a mattress. We scoff at that person because a mattress is a terrible place to keep money. It’s not safe. And a bank will pay you to keep your money there, right?

On the flip side, depending on the age of that person, they may not believe that banks are safe. Before FDIC, there was no guarantee your money would be repaid in a pinch. And if the bank goes out of business you can’t get your investment back. For a person that lived through the Great Depression that had to endure bank holidays and the like, keeping your asset under a mattress is way safer than giving it to someone else.

As an aside here, remember that banks don’t like leaving your money lying around either. If you deposit money in a bank, they take that money and invest it in other places. They put the money to work for them making money. The interest that you get paid for savings accounts and the like is just a small bonus to encourage you to keep your money in the bank and not to pull it out. That’s why they even have big disclaimers saying that your money may not be available to withdraw at a moment’s notice. Because if you do decide to get all of your money out of the bank at once, they need to go find the money to give you.

Now, let’s examine our data. Or, at least the data that Facebook has been storing on us. How do you think Facebook looks at that data? Do you believe they want to keep it under the mattress where it’s safe from the outside world? Do you think that Facebook wants to keep all this information locked in a vault somewhere where no one can get to it?

Or perhaps Facebook looks at your data as an asset like a bank does. Instead of keeping it around and letting it sit fallow they’d rather put it to work. That’s the nature of a valuable asset. To the average person, their privacy is one of the most important parts of their lives. To Facebook, your privacy is simply an asset. It can either sit by itself and make them nothing. Or it can be put to use by Facebook or third-party companies to make more money from the things that they can do with good data sources. To believe that a company like Facebook has your best interests at heart when it comes to privacy is not a good bet to make.

Would I Lie-ability To You?

In fact, the only thing that can make Facebook really sit up and pay attention is if that asset they have farmed out and working for them were to suddenly become a liability for some reason. Liabilities are a problem for companies because they are the exact opposite of making money. They cost money. Just as the grandmother in the above example sees an insolvent bank as a liability, so too would someone see a bad asset as a possible exposure.

Liabilities are a problem. Anything that can be an exposure is an issue for a company, especially one with investors that like to get dividends. Any reduction in profit equals a loss. Liabilities on a balance sheet are giant red flags for anyone taking a close look at the operations of a business.

Turning Facebook’s data assets into a liability is the only way to make them sit up and realize that what they’re doing is wrong. Selling access to our data to anyone that wants it is a horrible idea. But it won’t stop until there is some way to make them pay through the nose for screwing up. Up until this year, that was a long shot at best. Most fines were in the thousands of dollars range, whereas most companies would pay millions for access to data. A carefully crafted statement admitting no fault after the exposure was uncovered means that Facebook and the offending company get away without a black mark and get to pocket all their gains.

The European GDPR law is a great step in the right direction. It clearly spells out what has to happen to keep a person’s data safe. That eliminates wiggle room in the laws. It also puts a stiff fine in place to ensure that any violations can be compounded quickly to drain a company and turn data into a liability instead of an asset. There are moves in the US to introduce legislation similar to GDPR, either at the federal level or in individual states like California, the location of Facebook’s headquarters.

That’s not to say that these laws are going to get it right every time. There are people out there that live to find ways to turn liabilities into assets. They want to find ways around the laws and make it so that they can continue to take their assets and make money from them even if the possibility of exposure is high. It’s one thing when that exposure is the money of people that invested in them. It’s another thing entirely when it’s personally identifiable information (PII) or protected information about people. We’re not imaginary money. We live and breathe and exist long past losses. And trying to get our life back on track after an exposure is not easy for sure.


Tom’s Take

If I sound grumpy, it’s because I am tired of this mess. When I was researching my discussion for the Gestalt IT Rundown I simply Googled “Facebook data breach 2018” looking for examples that weren’t Cambridge Analytica. The number was more than it should have been. We cry about Target and Equifax and many other exposures that have happened in the last five years, but we also punish those companies by not doing business with them or moving our information elsewhere. Facebook has everyone hooked. We share photos on Facebook. We RSVP to events on Facebook. And we talk to people on Facebook as much or more than we do on the phone. That kind of reach requires a company to be more careful with who has access to our data. And if the solution is building the world’s biggest mattress to keep it all safe put me down for a set of box springs.

 

Some Random Thoughts From Security Field Day

I’m spending the week in some great company at Security Field Day with awesome people. They’re really making me think about security in some different ways. Between our conversations going to the presentations and the discussions we’re having after hours, I’m starting to see some things that I didn’t notice before.

  • Security is a hard thing to get into because it’s so different everywhere. Where everyone just sees one big security community, it is in fact a large collection of small communities. Thinking that there is just one security community would be much more like thinking enterprise networking, wireless networking, and service provider networking are the same space. They may all deal with packets flying across the wires but they are very different under the hood. Security is a lot of various communities with the name in common.
  • Security isn’t about tools. It’s not about software or hardware or a product you can buy. It’s about thinking differently. It’s about looking at the world through a different lens. How to protect something. How to attack something. How to figure all of that out. That’s not something you learn from a book or a course. It’s a way of adjusting your thinking to look at problems in a different way. It’s not unlike being in an escape room. Don’t look at the objects like you normally would. Instead, think about them with unique combinations that get you somewhere different than where you thought you needed to be.
  • Security is one of the only IT disciplines where failure is an acceptable outcome. If we can’t install a router or a wireless access point, it’s a bad job. However, in security if you fail to access something that should have been secured it was a success. That can lead to some very interesting situations that you can find yourself in. It’s important to realize that you also have to properly document your “failure” so people know what you tried to do to get there. Otherwise your success may just be a lack of proper failure.

Tom’s Take

I’m going to have some more thoughts from Security Field Day coming up another time. There’s just too much to digest at one time. Stay tuned for some more great discussions and highlights of my first real foray in the security community!

It’s The Change Freeze Season

Everyone’s favorite time of the year is almost here! Is it because it’s the holiday season? Perhaps it’s the magic that happens at the end of the year? Or maybe, it’s because there’s an even better reason to get excited!

Change Freeze Season!

That’s right. Some of you reading this started jumping up and down like Buddy the Elf at the thought of having a change freeze. There’s something truly magical about laying down the law about not touching anything in the system until after the end-of-year reports are run and certified. For some, this means a total freeze of non-critical changes from the first of December all the way through the New Year until maybe even February. That’s a long time to have a frozen network. But why?

The Cold Shoulder

Change freezes are an easy thing to explain to the new admins. You simply don’t touch anything in the network during the freeze unless it’s broken. No tweaking. No experimenting. No improvements. Just critical break/fix changes only. There had better be a ticket. There should be someone yelling that something’s not right. Otherwise you’re in for it.

There are a ton of reasons for this. The first is something I remember from my VAR days as Boredom Repellent. When you find yourself at the end of year with nothing to do, you tend to get bored. After you’ve watched Die Hard for the fifteenth time this year you decide it’s time to clear out your project backlog. Or maybe you’ve been doing some learning modules instead. You find a great blog post from one of your favorite writers about a Great Awesome Amazing Feature That Will Save You Days Of Work If You Just Enable This One Simple Command!

In either case, the Boredom Repellent becomes like pheromones for problems. Those backlogged projects take more time than you expected. That simple feature you just need to enable isn’t so simple. It might even involve an entire code upgrade train to enable it. Pretty soon you find yourself buried in a CLI mess with people screaming about very real downtime. Now, instead of being bored you’re working until the wee hours of the night because of something you did.

The second reason for change freezes at the end of the year is management. You know, the people that call and scream at you as soon as their email appears to be running slow. The people that run reports once a month at 6:00pm and then call you because they get a funny warning message on their screen. Those folks. Guess what? End-of-year is their time to shine in all their glory.

This is usually the time they are under the most stress. Those reports have to be reprinted. All the financials from the year need to be consolidated and verified. The taxes will need to be paid. And all that paperwork and pressure adds up to stress. The kind of stress that makes any imperfections in the network seem ten times more important than before. Report screen not show success within 10 ms? Problem. Printer run out of yellow toner? Network problem. Laptop go to sleep while someone went to lunch and now the entire report is gone? Must be your problem. And guess who gets to work around the clock to solve it with someone bearing down on them from on high?

Don’t Let It Go

The fact is that we can’t have people doing things in the network without tracking those changes back to reasons. That applies for adventurous architects wanting to squeeze out the last ounce of performance from that amazing new switch. And it goes double for the CFO demanding you put his traffic into AF41 so it gets to the server faster so his reports don’t take six hours to print.

It all comes back to the simple fact that we have no way to track changes in our network and we have no way of knowing what will happen when we make one live. It feels an awful lot like this GIF:

Crazy, right? Yet every time we hit the Enter key, we are amazed at the results. Even for “modern” OSes with sanity checking, like Junos or IOS-XR, you have no way of knowing if a change you make on one device somewhere in the branch is going to crash OSPF or BGP for the entire organization. And even if there was a big loud warning popup that said, “ALERT: YOU ARE GOING TO BREAK EVERYTHING!!!”, odds are good we would just click past it.

Network automation and orchestration systems can prevent this. They can take the control of change management out of the hands of bored engineers and wrap it in process and policy. And if the policy says Change Freeze then that’s what you get. No changes. Likewise, if there is a critical need, like patching out a backdoor or something, that policy can be overridden and noted so that if there is a bug eight months from now in that code train that causes issues you can have documentation of the reason for the change when someone comes to chew you out.
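To make that idea a little more concrete, here is a minimal sketch of a change freeze expressed as policy instead of tribal knowledge. Everything in it is an assumption for illustration: the ChangeRequest and FreezePolicy names, the fields, the freeze dates, and the ticket number are all made up, and no real orchestration product is being modeled. The point is only that the gate denies changes during the freeze, lets a ticketed critical change through as an override, and writes down why in either case.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class ChangeRequest:
    """A proposed network change. All field names are hypothetical."""
    description: str
    requested_on: date
    ticket: Optional[str] = None  # break/fix ticket, if there is one
    critical: bool = False        # e.g. patching out a backdoor


@dataclass
class FreezePolicy:
    """A change freeze window plus a log of every decision made."""
    start: date
    end: date
    audit_log: List[str] = field(default_factory=list)

    def in_freeze(self, day: date) -> bool:
        return self.start <= day <= self.end

    def evaluate(self, change: ChangeRequest) -> bool:
        """Return True if the change may proceed, and record the reason."""
        if not self.in_freeze(change.requested_on):
            self.audit_log.append(f"ALLOWED (no freeze): {change.description}")
            return True
        if change.critical and change.ticket:
            # Override: permitted, but documented for whoever asks later.
            self.audit_log.append(
                f"OVERRIDE ({change.ticket}): {change.description}"
            )
            return True
        self.audit_log.append(f"DENIED (change freeze): {change.description}")
        return False


if __name__ == "__main__":
    # Assumed freeze window: first of December through the end of January.
    policy = FreezePolicy(start=date(2018, 12, 1), end=date(2019, 1, 31))
    policy.evaluate(ChangeRequest("Enable one simple command", date(2018, 12, 20)))
    policy.evaluate(
        ChangeRequest("Patch backdoor", date(2018, 12, 21),
                      ticket="INC-0001", critical=True)
    )
    for entry in policy.audit_log:
        print(entry)
```

The audit log is the part that matters eight months later. A real system would persist it somewhere more durable than a Python list.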

Likewise, there are other solutions out there that try to prototype the entire network to figure out what will happen when you make a change. Companies like Forward Networks and Veriflow can prototype your network in a model that can assess the impact of a change before you commit to it. It’s the dream of a bored engineer because you can run simulations to your heart’s content to find out if two hours of code upgrades will really get you that 2% performance increase promised in that blog post. And for the CFO/CEO/CIO screaming at you to prioritize their traffic, these solutions can remind them that most of their traffic is YouTube and Spotify and having that at AF41 will cause massive issues for them.
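As a toy illustration of checking a change against a model before touching the real thing, the sketch below treats the network as a simple graph and asks whether a proposed link removal breaks any reachability we care about. This is emphatically not how Forward Networks or Veriflow work under the hood; the topology, the node names, and the helper functions are all assumptions made up for the example.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Topology = Dict[str, Set[str]]  # node name -> directly connected neighbors


def reachable(topology: Topology, src: str, dst: str) -> bool:
    """Breadth-first search: can traffic get from src to dst in this model?"""
    seen = {src}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for neighbor in topology.get(node, set()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False


def without_link(topology: Topology, a: str, b: str) -> Topology:
    """Return a copy of the model with the a-b link removed."""
    model = {node: set(peers) for node, peers in topology.items()}
    model.get(a, set()).discard(b)
    model.get(b, set()).discard(a)
    return model


def broken_intents(topology: Topology,
                   intents: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """List every (src, dst) pair that must work but does not in the model."""
    return [(s, d) for s, d in intents if not reachable(topology, s, d)]


# Hypothetical four-node network: one branch, two cores, one data center.
network: Topology = {
    "branch": {"core1"},
    "core1": {"branch", "core2", "dc"},
    "core2": {"core1", "dc"},
    "dc": {"core1", "core2"},
}
intents = [("branch", "dc")]

if __name__ == "__main__":
    # Proposed change 1: remove the core1-dc link (a redundant path exists).
    print(broken_intents(without_link(network, "core1", "dc"), intents))
    # Proposed change 2: remove the only uplink from the branch.
    print(broken_intents(without_link(network, "branch", "core1"), intents))
```

Real verification tools reason about forwarding tables, ACLs, and protocol state, not just links. The workflow is the same, though: change the model first, check the intents, and only then hit Enter.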

What’s important is that you and the rest of the team realize that change freezes aren’t a solution to the problem of an unstable network. Instead, they are treating the symptoms that crop up from the underlying disease of the network not being a deterministic system. Unlike some other machines, networks run just fine at sub-optimal performance levels. You can make massive mistakes that will live in a network for years and never show their ugly face. That is, until you make a small change that upsets equilibrium and causes the whole system to fail, cascade style, and leave you holding the keyboard as it were.


Tom’s Take

I both love and hate Change Freeze season. I know it’s for the best because any changes that get made during this time will ultimately result in long hours at work undoing those changes. I also know that the temptation to experiment with things is very, very strong this time of year. But I feel like Change Freeze season will soon go the way of the aluminum Christmas tree when we get change management and deterministic network modeling systems in place to verify changes on a system-wide basis and not just sanity checking configs at a device level. Tracking, prototyping, and verification will solve our change freeze problems eventually. And that will make it the most wonderful time of the year all year long.

The Magic of the CCIE

I stumbled across a great Reddit thread this week: Is the CCIE as impossible as it seems? There are a lot of great replies on that thread about people passing and the “good old days” of Banyan VINES, AppleTalk, and more. It’s also a fascinating look into how the rest of the networking industry sees exams like the CCIE and JNCIE. Because those of us that have the numbers seem to be magicians to some.

Sleight of CLI Hand

Have you ever seen the cups and balls magic trick? Here’s an excellent example of it from the recently departed Ricky Jay:

Impressive, right? It’s amazing to behold a master craftsman at work. Every time I watch that video I’m amazed. I know he’s doing sleight of hand. But I can’t catch it. Now, watch this same video but with annotations turned on. SPOILER ALERT – The annotations will tell you EXACTLY where the tricks are done:

Is it more impressive now that you know how the tricks are done? Check out this demonstration from Penn and Teller that shows you exactly how they do the tricks as well:

Okay, so it’s a little less mystifying now that you’ve seen how all the sleight of hand happens. But it’s still impressive because, as a professional, you can appreciate how they execute their tradecraft. Knowing that it’s not magic doesn’t mean it’s not an impressive feat. It just means you appreciate something different about the performance.

Let’s apply that to the CCIE. When you’re just starting out in networking, every piece of knowledge is new. Everything you learn is something you didn’t know before. Subnet masks, routing tables, and even just addressing an interface are new skills that you acquire and try to understand. It’s like learning how to take a coin from someone’s ear. It’s simple but it provides the building blocks for future tricks.

When you reach the level of studying for the CCIE lab, it does look like a daunting task. If you’ve followed Cisco’s guidelines you probably have your CCNP or equivalent knowledge. However, there is still a lot you don’t know. If you don’t believe that, go pick up Jeff Doyle’s Routing TCP/IP Volume 1 book. That book taught me I still had a lot to learn about networking.

But, as I slogged through the CCIE, I realized that I was acquiring skills. Just like the magicians that practice the cups and balls every day to get it right, I was picking up the ability to address interfaces quickly and see potential routing loops before I made them like I did in my first lab attempt. Each thing I learned and practiced not only made me a better engineer but also made the CCIE seem less like a mountain and more like a hill that could be climbed.

And I truly realized this when I was thumbing through a copy of the CCIE Official Exam guide. Someone had given me a copy to take a look at and I was happy with the depth of knowledge that I found. I wanted to pass it along to another junior engineer because, as I said to myself, “If only I had this book when I started! I could have skipped over all those other books!”

Practice, Practice, Practice!

That’s where I went wrong. Because I jumped right to the end goal instead of realizing the process. Magicians don’t start out making the Statue of Liberty disappear. They start out pulling coins from your ear and finding your card in a deck. They build their basic skills and then move on to harder things. But the grandest tricks in the magician’s top hat all still use the basic skills: sleight of hand, misdirection, and preparation. To neglect those is to court folly on stage.

CCIEs are no different. Every person that asks me about the test asks “How hard is it to pass?” I usually respond with something like “Not hard if you study.” Some of the people I talk to pick up on the “not hard” part and get crushed by the lab their first time out. They even end up with a $1,500 soda for their efforts. The other people, the ones that focus on “study” in my answer, they are the people who pass on the first attempt or the ones that get it right pretty quickly thereafter.

The CCIE isn’t a test. It’s a course in studying. It’s the culmination of teaching yourself the minutiae of protocols and how they interact. The exam itself is almost perfunctory. It tests specific combinations of things you might see in the real world. And if you ask any CCIE, the real world is often ten times stranger than the lab. But the lab makes you think about the things you’ve already learned in new ways and apply that knowledge to find ways to solve problems. The lab isn’t hard because it’s easy. The lab becomes easier when you practice enough to not think the knowledge is hard any longer. I think Bruce Lee said it best:

I fear not the man who has practiced 10,000 kicks once, but I fear the man who has practiced one kick 10,000 times.

Most people would agree that Bruce Lee was one of the best martial artists of all time. And even he practiced until his fingers bled and his body was exhausted. Because he knew that being the best wasn’t about passing an exam for a belt or about showing off for people. It was about knowing what you needed to know and practicing it until it was second nature.


Tom’s Take

The CCIE has a certain magical aura for sure. But it’s not magical in and of itself. It’s a test designed to ensure that the people that pass know their skills at a deep level. It’s a test designed to make you look deeper at a problem and exhaust all your options before throwing in the towel. The CCIE isn’t impossible any more than sawing someone in half is impossible. It’s all about how you practice and prepare for the show that makes the trick seem impressive.

Murphy the Chaos Manager

I had the opportunity to sit in on a great briefing from Gremlin the other day about chaos engineering. Ken Nalbone (@KenNalbone) has a great review of their software and approach to things here. The more time I spent thinking about chaos engineering and IT, the more I realized that it has more in common with Murphy’s Law than we realize.

Anything That Can Go Wrong

If there’s more than one way to do a job and one of those ways will end in disaster, then somebody will do it that way. – Edward Murphy

 

Anything that can go wrong will go wrong. – Major John Paul Stapp

We live by the adage of Murphy’s Law in IT. Anything that can go wrong will go wrong. And usually it goes wrong at the worst possible time. Database query functions will go wrong when you need them the most. And usually at the height of something like Amazon Prime Day. Data center outages only seem to happen at 4 am on a Sunday during a holiday.

But why do things go wrong like this? Is it because the universe just has it out for IT people? Are we paying off karma from the fall of the Western Roman Empire? Or is it because we can’t anticipate some crazy things? Are we kidding ourselves that we can just manage Murphy and hope for the best?

As it turns out, this is why chaos engineering is so important. Because it doesn’t just make us realize that things are broken. It helps us understand how they will break in unique and different ways each time. A big reason why this is so important is because many large-scale failures aren’t the result of a single problem, but instead a collection of smaller things that build on each other.

One of my favorite stories about this collection of failures comes from a big Amazon Web Services (AWS) outage from last March. People were seeing problems in US-EAST-1 but they couldn’t nail down the issue. Worse yet, every time they logged into the Amazon dashboard they saw green lights for every service. As the minutes dragged on it was eventually discovered that the lights were lying to everyone because Amazon hosted that page on AWS US-EAST-1. They couldn’t log in to reset the lights to show an outage! Coincidentally, many other monitoring services were down as well because they were also hosted in the same region.

What does this teach us about chaos? Well, Murphy was in full effect for sure. Something went wrong and happened at a bad time. But it was also the worst possible time for Amazon to figure out that the status lights and dashboard systems were all hosted out of one region with no backup anywhere else. Perhaps they could have caught that with a system like Gremlin. Perhaps it would have gone under the radar until the worst possible moment like it did in real life. There’s no way to know for sure. Hopefully Amazon has fixed this little problem for now.
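Here is a tabletop version of that lesson as a small sketch. It is not Gremlin’s API, and Gremlin injects real failures into real systems instead of reading an inventory. The service names and the second region are made up for illustration. The idea is just to “fail” a region on paper and see which services have nowhere else to run, status dashboard included.

```python
from typing import Dict, List, Set

# Hypothetical inventory: service name -> regions it can run from.
deployments: Dict[str, Set[str]] = {
    "storefront": {"us-east-1", "us-west-2"},
    "status-dashboard": {"us-east-1"},
    "monitoring": {"us-east-1"},
}


def casualties(inventory: Dict[str, Set[str]], failed_region: str) -> List[str]:
    """Return the services left with no region to run in once one fails."""
    return sorted(
        name
        for name, regions in inventory.items()
        if not (regions - {failed_region})
    )


if __name__ == "__main__":
    for region in ("us-east-1", "us-west-2"):
        down = casualties(deployments, region)
        print(f"Failing {region} takes down: {down or 'nothing'}")
```

If the tool that tells you about the outage shows up in the casualty list, you have found the problem before Murphy did.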

People Will Do It Wrong

This also teaches us something about user behavior. One thing we hear frequently about patches or other glaring issues with software is “How was this not caught in testing?!?”

The flip side of that is that most of these corner case issues were never tested in the first place. Testing focuses on the main functionality of a system. QA testers focus on the big picture stuff first. Does the UI fall apart? Are all the buttons linked to a specific task? What happens when I click HELP on the login screen?

What does QA not test for? Well, lots of things that users actually do. Holding down random keystrokes while clicking buttons. Navigating to random pages and then bookmarking them without realizing that’s a bad idea. Typing the wrong information into a list box that passes validation and screws up the backend. The list of variations is endless.

How does this apply to chaos? Well, as it turns out, engineers and testers are pretty orderly people. We all look at problems and try to figure them out. We try combinations of things until we solve the issue. But everything is based on the idea that we’re trying things in specific combinations until we replicate the issue. We don’t realize that some of the random behavior we see comes from things users do that we can’t control.

Another story: I was editing a document the other day in a CMS and I saved the document revisions I’d made as a draft post. When I went to check the post, it had inadvertently published itself. I didn’t want it to publish at that time, so I was perplexed. I knew I had clicked the save function button but I also knew I didn’t click the publish button. I looked through documentation and couldn’t find any issues.

I put it out of my mind until it happened again a couple of weeks later. This time, I went back through every step I had just done. The only thing that was out of the ordinary compared to the last time was that I had saved the document with ⌘+S (CTRL+S for Windows) just like I’d taught myself to do for years. But, in this CMS, that shortcut saves and publishes the current document. Surprise!

Behavior that shouldn’t have triggered a problem did. Because no one ever tested for what might happen if someone used a familiar keystroke in a place where it wasn’t intended. This is what makes chaos engineering so difficult and rewarding. Because you can set up the system to test for those random things without needing to think about them. And when you figure out a new one, like whether or not ⌘+S can crash your system, you can add it to the list to be checked against everything!
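Here is a rough sketch of what "test for those random things" could look like, using my CMS surprise as the example. `ToyCMS` is a hypothetical stand-in I made up, not the real CMS, and the action names and the `find_surprise_publish` helper are assumptions too. The idea is to throw random sequences of user actions at the editor and check one rule: nothing gets published unless someone clicked publish.

```python
import random


class ToyCMS:
    """Hypothetical stand-in for a CMS editor; not any real product."""

    def __init__(self):
        self.saved = False
        self.published = False

    def handle(self, action):
        if action == "click_save":
            self.saved = True
        elif action == "click_publish":
            self.saved = True
            self.published = True
        elif action == "shortcut_save":
            # The surprise from the story: the shortcut saves AND publishes.
            self.saved = True
            self.published = True
        # Any other action (typing, clicking help) changes nothing here.


ACTIONS = ["type_text", "click_save", "click_publish", "shortcut_save", "click_help"]


def find_surprise_publish(trials=1000, steps=5, seed=0):
    """Try random action sequences. Return the first sequence that ends up
    published without a publish click, or None if no such sequence is found."""
    rng = random.Random(seed)
    for _ in range(trials):
        cms = ToyCMS()
        history = []
        for _ in range(steps):
            action = rng.choice(ACTIONS)
            history.append(action)
            cms.handle(action)
            if cms.published and "click_publish" not in history:
                return history
    return None


if __name__ == "__main__":
    print(find_surprise_publish())
```

Nobody had to think of ⌘+S ahead of time. The random sequences trip over it on their own, and once they do, the failing sequence can go on the list to be replayed against every future release.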


Tom’s Take

I love reading and learning about chaos engineering. The idea that we purposely break things to make people think about building them correctly appeals to me. I find myself trying to figure out how to make better things and always find out that I’m being stymied because I don’t think “outside the box”, which is a clever way of saying that I don’t think like a user. I need something that helps me understand how things will break in new and unique ways every time. Because while we can test for the big stuff, Murphy has a way of showing us what happens when we don’t sweat the small stuff.

Why Is The CCIE Lab Moving?

Cisco confirmed a big CCIE rumor this week: the RTP lab is going to be moved to Richardson, TX.

The language Cisco used is pretty neutral. San Jose and RTP are being shut down as full time lab locations and everyone is moving to Richardson. We knew about this thanks to the detective work of Jeff Fry, who managed to figure this out over a week ago. Now that we know what is happening, why is it coming to pass?

They Don’t Build Them Like They Used To

Real estate is expensive. Anyone that’s ever bought a house will tell you that. Now, imagine that on a commercial scale. Many companies will get the minimum amount of building that they need to get by. Sometimes they’re bursting at the seams before they upgrade to a new facility.

Other companies are big about having lots of area. These are the companies that have giant campuses. Companies like Cisco, Dell EMC, Intel, and NetApp have multiple buildings spread across a wide area. It makes sense to do this when you’re a large company that needs the room to spread out. In Cisco’s case, each business unit had their own real estate. Wireless was in one building. Firewalls in another. Each part of the company had their own area to play in.

Cisco was a real estate maven for a while. They built out in anticipation of business. There was a story years ago of a buried concrete slab foundation in Richardson that was just waiting for the next big Cisco product to be developed so they could clear away the dirt and start construction. But, why not just build the building and be done with it?

Remember how I said that real estate is expensive? That expense doesn’t come completely from purchases. It comes from operations. You need to have utilities for the building. You need to have services for the building. You need to pay taxes on the building. And those things happen all the time. Even if you never have anyone in the building the electricity is still running. That’s one of the reasons why Cisco shuts down their offices between Christmas and New Year’s every year. And the taxes are still due. Hence the reason why the foundation in Richardson was buried.

Real estate is also not an infinite resource. Anyone that’s been to Silicon Valley knows that. They’re running out of room in the South Bay. And building the new 49ers stadium on the corner of Tasman Drive and Great America Parkway didn’t help either. Sports teams are as hungry for real estate as tech companies. The support structures that cropped up for the stadium ended up buying the Letter Buildings from Cisco, which is why the lab was moved from Building C to Building L years ago.

Home Is Where The Work Is

The other shifting demographic is that more workers are remote in today’s environment. A combination of factors has led people to be just as productive from their home office as their open-plan cubicle. Better collaboration software coupled with changing job requirements means that people don’t have to go to their desk every day to be productive.

This is especially true now that companies like Cisco are putting more of a focus on software instead of hardware. In the good old days of hardware dominance you needed to go into the office to work on your chipset diagrams. You needed your desktop CAD program to draw the silicon traces on a switch. And you needed to visit the assembly lines and warehouses to see that everything was in order.

Today? It’s all code. Everything is written in an IDE and stored on a powerful laptop. You can work from anywhere. A green space outside your office window. A coffee shop. Your living room. The possibilities are endless. But that also means that you don’t need a permanent office desk. And if you don’t need a desk that means your company doesn’t need to pay for you to have one.

Now, instead of bustling buildings full of people working in their shared offices there are acres of empty open-plan cubicle farms lying fallow. People would rather work from Starbucks than go to the office. People would rather work in their pajamas than toil away in a cube. And so companies like Cisco are paying taxes and utilities for open spaces that don’t have anyone while the offices around the perimeter are filled with managers that are leading people that they don’t see.

CCIE Real Estate

But what does this all mean for the lab? Well, Cisco needs to downsize their big buildings in high-value real estate markets. They’re selling off buildings in San Jose as fast as the NFL will buy them. They are downsizing the workforce in RTP as well. The first hint of the CCIE move was David Blair trying to find a new job. As real estate becomes more and more costly to obtain, Cisco is going to need to expand in less expensive markets. The Dallas/Fort Worth (DFW) area is still one of the cheapest in the country.

DFW is also right in the middle of the country. It’s pretty much the same distance from everything. So people that don’t want to schedule a mobile lab can fly to Richardson and take the test there. RTP and San Jose are being transitioned to mobile lab facilities, which means people that live close to those areas can still take the test, just not on the schedule they may like. This allows Cisco to free up the space in those buildings for other purposes and consolidate their workforce down to areas that require less maintenance. They can also sell off unneeded buildings to other companies and take the profits for reinvestment in other places. Cutting costs and making money is what real estate is all about, even if you aren’t a real estate developer.


Tom’s Take

I’m sad to see the labs moving out of RTP and San Jose. Cisco has said they are going to frame the famous Wall of Pain in RTP as a tribute to the lab takers there. I have some fond memories of San Jose as well, but even those memories are from a building that Cisco doesn’t own any longer. The new reality of a software defined Cisco is that there isn’t as much of a need for real estate any more. People want to work remotely and not live in a cube farm. And when people don’t want an office, you don’t need to keep paying for them to have one. Cisco won’t be shutting everything down any time soon, but the CCIE labs are just the first part of a bigger strategy.

Editor’s Note: An earlier version of this post accidentally referred to David Mallory instead of David Blair. This error has been corrected.