About networkingnerd

Tom Hollingsworth, CCIE #29213, is a former network engineer and current organizer for Tech Field Day. Tom has been in the IT industry since 2002, and has been a nerd since he first drew breath.

Building Snowflakes On Purpose

We all know that building snowflake networks is bad, right? If it’s not a repeatable process it’s going to end up being a problem down the road. If we can’t refer back to documentation that shows why we did something we’re going to end up causing issues and reducing reliability. But what happens when a snowflake process is required to fix a bigger problem? It’s a fun story that highlights where process can break down sometimes.

Reloaded

I’ve mentioned before that I spent about six months doing telephone tech support for Gateway computers. This was back in 2003 so Windows XP was the hottest operating system out there. The nature of support means that you’re going to be spending more time working on older things. In my case this was Windows 95 and 98. Windows 98 was a pain but it was easy to work on.

One of the most common processes we had for Windows 98 was a system reload. It was the last line of defense to fix massive issues or remove viruses. It was something that was second nature to any of the technicians on the help desk:

  1. Boot from the Gateway tools CD and use GWSCAN to write zeros to the hard drive.
  2. Reboot from the CD and use FDISK to partition the hard disk.
  3. Format the drive.
  4. Insert the Windows 98 OS CD and copy the CAB installation files to a folder on the hard drive.
  5. Run setup from the hard drive and let it complete.
  6. When Windows comes back up, insert the driver CD and let it install all the drivers.

The whole process took two or three phone calls to complete. Any time you got to something that would take more than fifteen minutes to complete you logged the steps in the customer trouble ticket and had them call back when it was completed. The process was so standard that it had its own acronym in the documentation – FFR, which stood for “FDISK, Format, Reload”. If you told someone where you were in the process they could finish it no problem.
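The reason anyone could pick up an FFR mid-stream is that the steps were fixed and the ticket said where you were. Here is a minimal sketch of that idea in Python. The ReloadTicket class and its method names are made up for illustration and are not anything Gateway actually used; only the six steps come from the list above.

```python
from dataclasses import dataclass, field

# The six reload steps from the list above, in documented order.
FFR_STEPS = [
    "Zero the drive with GWSCAN",
    "Partition with FDISK",
    "Format the drive",
    "Copy the Windows 98 CAB files to the hard drive",
    "Run setup from the hard drive",
    "Install drivers from the driver CD",
]


@dataclass
class ReloadTicket:
    """Hypothetical trouble ticket that records completed reload steps."""
    customer: str
    completed: list = field(default_factory=list)

    def next_step(self):
        # Whoever takes the callback can see exactly where to resume.
        if len(self.completed) == len(FFR_STEPS):
            return None
        return FFR_STEPS[len(self.completed)]

    def log_step(self, step):
        # Only accept the next step in the documented order.
        expected = self.next_step()
        if step != expected:
            raise ValueError(f"expected {expected!r}, got {step!r}")
        self.completed.append(step)


ticket = ReloadTicket(customer="example caller")
ticket.log_step("Zero the drive with GWSCAN")
ticket.log_step("Partition with FDISK")
print(ticket.next_step())  # Format the drive
```

A tech who tries to log a step out of order gets an error instead of a half-finished reload, which is the whole point of a repeatable process.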

Me, Me, ME

The whole process was manual with lots of steps and could intimidate customers. At some point in the development process the Gateway folks came up with a solution for Windows ME that they thought worked better. Instead of the manual steps of copying files and drivers and such, Gateway loaded the OS CD with a copy of the image they wanted for their specific model type. The image was installed using ImageCast, an imaging program that just dropped the image down on the drive without the need to do all the other steps. In theory, it was simple and reduced call times for the help desk.

In practice, Windows ME was a disaster to work on. The ImageCast program worked about half the time. If you didn’t pick the right options in the reload process it would partition the hard drive and add a second clean copy of WinME without removing the first one. It would change the MBR to have two installations to choose from with the same identifiers so users would get confused as to which was which. And the image itself seemed to be missing programs and drivers. The fact that there was a Driver CD that shipped with the system made us all wonder what the real idea behind this “improved” process was.

Because Windows ME was such a nightmare to reload, our call center got creative. We had a process that worked for Windows 98. We had all the files we needed on the disks. Why not do it the “right” way? So we did. Informally, the reload process for Windows ME was the same as Windows 98. We would FFR Windows ME boxes and make sure they looked right when they came back. No crazy ImageCasting programs or broken software loads.

The only issue? It was an informal snowflake process. It worked much better but if someone called the main help desk number and got another call center they would be in the middle of an unsupported process. The other call center tech would simply start the regular process and screw up the work we’d done already. To counter that, we would tell the customer to call our special callback voicemail box and not do anything until we called them back. That meant the reload process took many more hours for an already unhappy customer. The end result was better but could lead to frustrations.

Let It Snow

Was our informal snowflake process good or bad? It’s tough to say. It led to happier customers. It meant the likelihood of future support calls was lower because the system was properly reloaded instead of relying on a broken image. However, it was stressful for the tech that worked on the ticket because they had to own the process the whole way through. It also meant that you had to ensure the customer wouldn’t call back and disrupt the process with someone else on the phone.

The process was broken and it needed to be fixed. However, the way to fix the broken process wasn’t easy to figure out. The national line had their process and they were going to stick to it. We came up with an alternative but it wasn’t going to be adopted. Yet, we still kept using our process as often as possible because we felt we were right.

In your enterprise, you need to understand process. Small companies need processes that are repeatable for the ease of their employees. Large companies need processes to keep things consistent. Many companies that face regulatory oversight need processes followed exactly to ensure compliance. The ability to go rogue and just do it the way you want isn’t always desired.

If you have users that are going around the process you need to find out why. We see it all the time. Security rules that get ignored. Documentation requirements that are given the bare minimum effort. Remember the shutdown dialog box in Windows Server 2003? Most people did a token entry before rebooting. And having eighteen entries of “;lkj;kljl;k” doesn’t help figure out what’s going on.

Get your users to tell you why they need a snowflake process. Especially if there’s more than one person relying on it. The odds are very good that they’ve found hiccups that you need to address. Maybe it’s data entry for old systems. Perhaps it’s a problem with the requirements or the order of the steps. Whatever it is you have to understand it and fix it. The snowflake may be a better way to do things. You have to investigate and figure it out. Otherwise your processes will be forever broken.


Tom’s Take

Inbound tech support is full of stories like these. We find a broken process and we go around it. It’s not always the best solution but it’s the one that works for us. However, building a toolbox full of snowflake processes and solutions that no one else knows about is a path to headaches. You need to model your process around what people need to accomplish and how they do it instead of just assuming they’re going to shoehorn their workflow into your checklist. If your process doesn’t work for your people, you’d better get a handle on the situation before you’re buried in a blizzard of snowflakes.

Tech Field Day Changed My Life

It’s amazing to me that it’s been ten years since I attended my first Tech Field Day event. I remember being excited to be invited to Tech Field Day 5 and then having to rush out of town a day early to beat a blizzard to be able to attend. Given that we just went through another blizzard here I thought the timing was appropriate.

How did attending an industry event change my life? How could something with only a dozen people over a couple of days change the way I looked at my career? I know I’ve mentioned parts of this to people in the past but I feel like it’s important to talk about how each piece of the puzzle built on the rest to get me to where I am today.

Voices Carry

The first thing Tech Field Day did to change my life was to show me that I mattered. I grew up in a very small town and spent most of my formative school years being bored. The Internet didn’t exist in a usable form for me. I devoured information wherever I could find it. And I languished as I realized that I needed more to keep learning at the pace I wanted. When I finally got through college and started working in my career the same thing kept happening. I would learn about a subject and keep devouring that knowledge until I exhausted it. Yet I still wanted more.

Tech Field Day reinforced that my decision to start a blog to share what I was learning was the right one. It wasn’t as much about the learning as it was the explanation. Early on I thought a blog was just about finding some esoteric configuration stanza and writing about it. It wasn’t until later on that I figured out that my analysis and understanding and explanation was more important overall. Even my latest posts about more “soft skill” kinds of ideas are less about the ideas and more about how I apply them.

Blogging and podcasting are just tools to share the ideas that we have. We all have our own perspectives and people enjoy listening to those. They may not always agree. They may have their own opinions that they want to share. However, the part that is super critical is that everyone is able to share in a place where they can be discussed and analyzed and understood. As long as we all learn and grow from what we share then the process works. It’s when we stop learning and sharing and try to protest that our way is right and the only way that we stop growing.

Tech Field Day gave me the platform to see that my voice mattered and that people listened. Not just read. Not just shared. That they listened and that they wanted to hear more. People started asking me to comment on things outside of my comfort zone. Maybe it was wireless networking. It could have been storage or virtualization or even AI. It encouraged me to learn more and more because who I was and what I said was interesting. The young kid that could never find someone to listen when I wanted to talk about Star Wars or BattleTech or Advanced Dungeons and Dragons was suddenly the adult that everyone wanted to ask questions to. It changed the way I looked at how I shared with people for the better.

Not Just a Member, But the President

The second way Tech Field Day changed my life was when I’d finally had enough of what I was doing. Because of all the things that I had seen in my events from 2011 to 2013, I realized that working as an engineer and operations person for a reseller had a ceiling I was quickly going to hit. The challenges were less fun and more frustrating. I could see technology on the horizon and I didn’t have a path to get to a place to implement it. It felt like watching something cool happening outside in the yard while I was stuck inside washing the dishes.

Thankfully, Stephen Foskett knew what I needed to hear. When I expressed frustration he encouraged me to look around for what I wanted. When I tried to find a different line of work that didn’t understand why I blogged, it crystallized in me that I needed something very different from what I was doing. Changing who I was working for wasn’t enough. I needed something different.

Stephen recognized that and told me he wanted me to come on board with him. No joking, my job offer was “Do you want to be the Dread Pirate Roberts? I think you’d make an excellent Dread Pirate.” He told me that it was hard work and unlike anything I’d ever done. No more CLI. No more router installations. In place of that would be event planning and video editing and taking briefings from companies all over the place about what they were building. I laughed and told him I was in.

And for the past eight years I’ve been a part of the thing that showed me that my voice mattered. As I learned the ropes to support the events and eventually started running them myself, I also grew as a person in a different way. I stopped being shy and reserved and came out of my shell. When you’re the face of the event you don’t have time to be hiding in the corner. I learned how to talk to people. I also learned how to listen and not just wait for my turn to talk. I figured out how to get people to talk about themselves when they didn’t want to.

Now the person I am is different from the nerdy kid that started a blog over ten years ago. It’s not just that I know more. Or that I’m willing to share it with people. It has now changed into getting info and sharing it. It’s about finding great people and building them up like I was built up. Every time I see someone come to the event for the first time I’m reminded of me all those years ago trying to figure out what I’d gotten myself into. Watching people learn the same things I’ve learned all over again warms my heart and shows me that we can change people for the better by showing them what they’re capable of and that they matter.


Tom’s Take

Tech Field Day isn’t an event of thousands. It’s personal and important to those that attend and participate. It’s not going to stop global warming or save the whales. Instead, it’s about the people that come. It’s about showing them they matter and that they have a voice and that people listen. It’s about helping people grow and become something they may not even realize they’re capable of. I know I sound biased because they pay the bills but even if I didn’t work there right now I would still be thankful for my time as a delegate and for the way that I was able to grow from those early days into a better member of the community. My life was changed when I got on that airplane ten years ago and I couldn’t be happier.

Solutions In Search of a Problem

During a few recent chats with my friends in the industry, I’ve heard a common refrain coming up about technologies or products being offered for sale. Typically these are advanced ideas given form that are then positioned as products for sale in the market. Overwhelmingly the feedback comes down to one phrase:

This is a solution in search of a problem.

We’ve probably said this a number of times about a protocol or a piece of hardware. Something that seems to be built to solve a problem we don’t have and couldn’t conceive of. But why does this seem to happen? And what can we do to fix this kind of mentality?

Forward Looking Failures

If I told you today that I was creating software that would revolutionize the way your autonomous car delivers music to the occupants on their VR headsets you’d probably think I was crazy, right? Every one of the technologies I mentioned in the statement is a future thing that we expect may be big down the road. We love the idea of autonomous vehicles and VR headsets and such.

Now, let’s change the statement. I’m working on a new algorithm for HD-DVD players to produce better color accuracy on plasma TVs that use PowerPC CPUs. Hopefully that statement had you giggling a little no matter what your tech level. What’s the difference? Well, that statement was loaded with technology that no one uses any more. HD-DVD lost a format war against Blu-Ray. Plasma TVs are now supplanted by LCD, LED, and even more advanced things. PowerPC, itself a RISC design, has been replaced with other architectures like ARM and more modern takes on efficient CPUs in mobile devices.

If you had bet on the second combination of things back in the heyday of those technologies you might have made yourself a bit of money. You’d ultimately find yourself without a product to sell now, though. Because technology always changes. Even the dominant form of tech eventually goes away. Blu-Ray may have beaten HD-DVD but it couldn’t stop streaming services. LCD replaced plasma but now we’re moving beyond that tech into OLED and even more advanced stuff. You can’t count on tech staying the same.

Which leads to the problem of trying to create solutions for problems that haven’t happened yet or are so far out on the horizon that you may not be able to create a proper solution for it. Maybe VR headsets will have great software that doesn’t need a new music match algorithm. Maybe the passengers in your autonomous vehicle won’t wear VR headsets. Perhaps music as we know it will change and not even be as relevant in the future. There’s no telling which butterfly effects will impact what you’re trying to accomplish.

Solve the Easy Things

Aside from the future problems you hope to be solving with your fancy new product you also have to take into account human behavior. Are people more likely to buy something to solve an issue they don’t currently have? Or are they more apt to buy something to solve a problem they have now? Startups that are looking five years into the future are going to stumble over the problems people have today on their way to the perfect answer to a question no one has asked yet.

I wanted a tablet because it was cool when they first came out. After using one for a few weeks I realized that it was a solution that didn’t address my pressing issues. I didn’t need what it offered at the time. Today a tablet solves many other issues that have come up since then, such as note taking or having quick access to information away from my desk. However, those problems needed to develop over time instead of hoping that my solution would work for something I couldn’t anticipate. I didn’t need a word processor for my tablet because I wouldn’t be typing much with an on-screen keyboard. Today I write a lot on my tablet because of the convenience factor. I also take notes because I have a pencil to write with instead of my fingers.

Solving problems people have right now is a surefire way to make your customers happy and give you the breathing room to look to the future. How many times have you seen a startup with a great idea that ends up building something mundane because they can’t build the first thing right or they realize the market isn’t quite there yet?

I can remember specifically talking to Guardicore when they were first out of stealth and discussing how their SDN-based offensive security systems worked. It was amazing stuff with very little market. When they looked around and realized they needed to switch it up they went full-on into zero trust security and microsegmentation. They took something that could be a great solution later on and pivoted to solving problems that people have right now. The result is a healthy company that makes things people want to buy instead of trying to sell them a solution for a problem they may never have.

If you are looking at the market and thinking to yourself, “I need to build X because it will revolutionize the way we do things” stop and ask yourself how we get there. What steps need to be taken? Who will buy it and when? Are there problems along the way? If the answer to the last question is anything other than “no” you need to focus on those problems first. You may find that you don’t need to build your fancy new vision of perfect future success because you solved all the other problems people needed fixed first. Your development efforts will be rewarded with customers and income instead of the perfect solution no one wants to buy.


Tom’s Take

Solutions without problems to solve are a lot like one-off kitchen gadgets. I may have a use for an avocado slicer twice a year. I also have a knife that does the exact same thing a little slower that I can use for many other problems around my house. I don’t need the perfect avocado slicing solution for the future when I’m making guacamole and avocado toast every day. I need a solution that gets my problems of slicing, chopping, dicing, and cutting done today. Technology is no different. Build what solves problems now and you’ll be a success. Build for the future if and only if you have the disposable time and income to get there.

Friction Finders

Do you have a door that sticks in your house? If it’s made out of wood the odds are good that you do. The kind that doesn’t shut properly or sticks out just a touch too far and doesn’t glide open like it used to. I’ve dealt with these kinds of things for years and YouTube is full of useful tricks to fix them. But all those videos start with the same tip: you have to find the place where the door is rubbing before you can fix it.

Enterprise IT is no different. We have to find the source of friction before we can hope to repair it. Whether it’s friction between people and hardware, users and software, or teams going at each other we have to know what’s causing the commotion before we can repair it. Just like with the sticking door, adding more force without understanding the friction points isn’t a long-term solution.

Sticky Wickets

Friction comes from a variety of sources. People don’t understand how to use a device or a program. Perhaps it’s a struggle to understand who is supposed to be in charge of a change control or a provisioning process. It could even be as simple as someone applying outside issues to their regular day and causing problems because their interactions with previously stable systems are stilted.

Whatever the reasons for the friction, we need to understand what’s going on before we can start fixing. If we don’t know what we’re trying to solve we’re just going to throw solutions at the wall until something sticks. Then we hope that was the fix and we move on. Shotgun troubleshooting is never a permanent solution to anything. Instead, we need to take repeatable, documented steps to resolve the points of friction.

Ironically enough, the easiest issue to solve is the interpersonal kind. When teams argue about roles or permissions or even who should have the desks at the front of the data center it’s almost always a problem of a person against another person. You can’t patch people. You can’t upgrade people. You can’t even remove people and replace them with a new model (unless it’s a much bigger issue and HR needs to solve it). Instead, it’s a chance to get people talking. Be productive and make sure that everyone knows the outcome is the resolution of the problem. It’s not name calling or posturing. Lay out what the friction point is and make people talk about that. If you can keep them focused on the problem at hand and not at each others’ throats you should be able to get everyone to recognize what needs to change.

Mankind Versus Machines

People fighting people is a people problem. But people against the enterprise system isn’t always a cut-and-dried situation. That’s because machines are predictable in so many different kinds of ways. You can be sure that what you’re going to get is the right answer every time but you may not be able to ask the right questions to find that answer the way you want to.

Think back to how many times you’ve diagnosed a problem only to hit a wall. Maybe it’s that the CPU utilization on a device is higher than it should be. What next? Is it some software application? A bug in the system? Maybe a piece of malware running rampant? If it’s a networking device is it because of a packet flow causing issues? Or a failed table lookup causing a process switch to happen? There are a multitude of things that need to be investigated before you can decide on a course of action.

This is why shotgun troubleshooting is so detrimental to reducing friction. How do we know that the solution we tried isn’t making the problem worse? I wrote over ten years ago about removing fixes that don’t address the problem and it’s still very true today. If you don’t back out the things that didn’t address the issue you’re not only leaving bad fixes in place but you are potentially causing problems down the line when those changes impact other things.
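If it helps to see the discipline written down, here is a small Python sketch of "try one fix, check, back it out if it didn’t help." Everything in it is hypothetical: the troubleshoot function, the shape of the fix tuples, and the toy config dictionary standing in for a device are all assumptions made for illustration.

```python
def troubleshoot(candidate_fixes, problem_resolved):
    """Try fixes one at a time, backing out any that do not help.

    candidate_fixes: list of (name, apply, revert) tuples (hypothetical shape).
    problem_resolved: callable returning True once the symptom is gone.
    Returns the name of the fix that worked, or None if nothing did.
    """
    for name, apply, revert in candidate_fixes:
        apply()
        if problem_resolved():
            return name
        revert()  # it did not help, so do not leave it in place
    return None


# Toy example: a dict stands in for a device configuration.
config = {"mtu": 1500, "duplex": "half"}

fixes = [
    ("raise MTU",
     lambda: config.update(mtu=9000),
     lambda: config.update(mtu=1500)),
    ("set full duplex",
     lambda: config.update(duplex="full"),
     lambda: config.update(duplex="half")),
]

worked = troubleshoot(fixes, lambda: config["duplex"] == "full")
print(worked, config)  # set full duplex {'mtu': 1500, 'duplex': 'full'}
```

Notice that the MTU change is gone by the time the real fix lands. Only the change that actually addressed the problem is left in the config.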

Finding the sources of friction in your systems takes in-depth troubleshooting. Your users just want the problem to go away. They don’t want to answer forty questions about when it started or what’s been installed. They don’t want the perfect solution that ensures the problem never comes back. They just want to be able to work right now and then keep moving forward. That means you need a two-step process of triage and investigation. Get the problem under control and then investigate the friction points. Figure out how it happened and fix it after you get the users back up and running.

Lastly, document the issue and what resolved it. Write it all down somewhere, even if it’s just in your own notes. But if you do that, make sure you have a way of indexing everything so you can refer back to it at some point in the future. Part of the reason why I started this blog was to write down solutions to problems I discovered or changed along the way to ensure that I could always look them up. If you can’t publish them on the Internet or don’t feel comfortable writing it all up, at least use a personal database system like Notion to keep it all in one searchable place. That way you don’t forget how clever you are and go back to reinventing the wheel over and over again every time the problem comes up.
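The indexing part doesn’t have to be fancy. As a rough sketch, assuming nothing more than plain-text notes, a keyword index in Python could look like the following. The add_note and search helpers and the two sample notes are invented for this example and have nothing to do with how Notion works internally.

```python
from collections import defaultdict

notes = {}                 # title -> note text
index = defaultdict(set)   # lowercase word -> titles containing that word


def add_note(title, text):
    """Store a note and index every word in its title and body."""
    notes[title] = text
    for word in (title + " " + text).lower().split():
        index[word.strip(".,:;!?")].add(title)


def search(*words):
    """Return the titles of notes that contain every search word."""
    hits = [index.get(w.lower(), set()) for w in words]
    return sorted(set.intersection(*hits)) if hits else []


# Made-up sample notes.
add_note("High CPU on edge router",
         "Cause: process switching after a failed table lookup.")
add_note("Phones drop calls",
         "Cause: duplex mismatch on the access switch port.")

print(search("cause", "duplex"))  # ['Phones drop calls']
```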


Tom’s Take

Friction exists everywhere. Sometimes it’s necessary to keep us from sliding all over the road or to keep a rug in place in the living room. In enterprise IT friction can create issues with systems and teams. Reducing it as much as possible keeps the teams working together and being as productive as possible. You can’t eliminate it completely and you shouldn’t remove it just for the sake of getting rid of something. Instead, analyze what you need to improve or fix and document how you did it. Like any door repair, the results should be immediate and satisfying.

The Double-Edged Grindstone

Are you doing okay out there? I hope that you’re well and not running yourself thin with all the craziness still going on. Sometimes it seems like we can’t catch a break and that work and everything keep us going all the time. In fact, that specific feeling and the resulting drive around it is what I wanted to talk about today.

People have drive. We want to be better. We want to learn and grow and change. Whether it’s getting a faster time running a 5K or learning new skills to help our career along. Humans can do amazing things given the right motivation and resource availability. I know because I taught myself a semester of macroeconomics in a Waffle House the night before the final exam. Sure, I was groggy and crashed for a 10-hour nap after the final but I did pass!

It’s that kind of ability to push ourselves past our limits that both defines us and threatens to destroy us. I’m a huge fan of reading and fiction. Growing up I latched on to the Battletech novels, especially those written by Michael A. Stackpole. In his book Lost Destiny there is a great discussion about honing your skills and what it may end up costing you:

Were we to proceed, it would be the battle of the knife against the grindstone. Yes, we would get sharper, we would win great victories, but in the end, we would be ground away to nothing.

Running in Place

That quote hit home for me during my CCIE lab attempt training. I spent my free time after work labbing everything I could get my hands on. I would log on about 8pm every night after my eldest went to sleep and lab until midnight. Every night was the same chore. No time for television or reading anything other than a Cisco Press book. Instead, I drilled until I could provision EtherChannels in my sleep and could redistribute routes without a second thought in four different ways. I felt like I was getting so much accomplished!

I also felt tired all the time. I had no outlet to relax. I spent every waking minute of my free time focused on sharpening my skills. As above, you could practically hear the edge of the knife on the grindstone. I was razor-focused on completing this task. As I learned later in life, the joys of hyper focus in ADHD had a lot to do with that.

It wasn’t until I got through my lab that the true measure of this honing process hit home. I spent the weekend after my attempt hanging out with some friends away from technology and the whole time I felt unsettled. I watched horrible movies on the Sci-Fi channel and couldn’t get comfortable. I was antsy and wound up, even though I’d just completed a huge milestone! It wasn’t until about a week later that I was finally able to put a name to this anxiety. I listened to the little voice in my head repeating over and over again in my downtime: “You really should be studying something right now.”

I was in search of my next grindstone. My knife was sharp, but I knew it could be sharper still with the next certification or piece of knowledge. It took me a while before I could quiet that voice and focus on restoring some semblance of my life as I had known it before my lab attempts. Even now there are times when I feel like I should be studying or writing or creating something instead of unwinding with a book or a camping trip to the woods.

The rush of dopamine that we get from learning new things or performing skills to perfection cannot be overstated. It makes us feel good. It makes us want to keep doing it to continue that stream of good feelings. We could focus on it to the detriment of our other hobbies or our social lives. And that was in the time before when we could go out whenever we wanted! In the current state of the pandemic it’s easy to get wrapped up in something without the ability to force yourself to walk away from it, as anyone with a half-filled room full of a new hobby can tell you.

Setting Goals with Limits

What’s the solution then? Do we just keep grinding away until there’s nothing left? Do we become the best left-handed underwater basket weaver that has ever existed? Do we keep forcing ourselves to run the same video game level over and over again until we’re perfect, even if that means not getting out of our house for weeks at a time? Can we do something repeatedly until we are destroyed by it?

The key is to set goals but to make them in such a way as to set limits on them. We do this all the time in the opposite direction. When we have a task we don’t like to do we force ourselves to do it for a set amount of time. Maybe it’s an hour of reconciling the checkbook or thirty minutes of exercise. But if it’s something you enjoy you have to set limits on it as well to avoid that burnout.

Find something you really like to do, such as reading a book. However, instead of devouring that book until it’s finished in one sitting and staying up until 4am to finish it, set an alarm to cut yourself off after an hour or two. Be honest with yourself. When the alarm goes off, stop reading and do something else. You could even pair it with a task you like less to use the little dopamine boost to help you through your other activity.

It’s not fun to stop doing something we like doing even when it’s something that is going to make us a better person or better employee. However, you’ll soon see that having that extra time to reinforce what you’ve learned or collect your thoughts does more for you than just bingeing something until you’re completely exhausted by it.

Remember when TV shows came out weekly? We had to wait until next Thursday for the new episode? Remember how much you looked forward to that day? Or maybe even the return of the new season in the fall? That kind of anticipation helps motivate you. Being able to consume all the content in one sitting for 8-10 hours leaves you feeling great at first. Later, when your dopamine goes back to normal you’re going to feel down. You’ll also realize you can’t get the same hit again because you have consumed your current resource of it. By limiting what you can do at one time you’re going to find that you can keep that great feeling going without burning yourself out.

Yes, this does absolutely apply to studying for things. Your brain needs time to lock in the knowledge. If you’ve ever tried to memorize something you know that you need to spend time thinking of something else and come back to that item before you really learn it. The only thing that transfers knowledge from short-term memory to long-term memory is time. There are no shortcuts. And the more you press the edge of the knife against the stone the more you lose in the long run because the available resources are just gone.


Tom’s Take

I’m not qualified to delve into the psychoanalysis part of all this stuff. I just know how it works for me because that’s who I am. I can shut out the world and plow through something for hours at a time if I want. I’ve done it many times in my career with both work and personal tasks. But it’s taken a long time for me to finally realize that sprinting like that for extended periods of time will eventually wear you away to nothing. Even the best runners in the world need to rest. Even the smartest people in the industry need to not think about things for a while. You do too. Take some time today or tomorrow or even next week to set goals and limits for yourself. You’ll find that you enjoy the things you do and learn more with those limits in place and you’ll wind up a happier, healthier person. You’ll be sharp and ready instead of a pile of dust under the grindstone.

Planning For The Worst Case You Can’t Think Of

Remember that Slack outage earlier this month? The one that happened when we all got back from vacation and tried to jump on to share cat memes and emojis? We all chalked it up to gremlins and went on going through our pile of email until it came back up. The post-mortem came out yesterday and there were two things that were interesting to me. Both of them have implications on reliability planning and how we handle the worst-case scenarios we come up with.

It’s Out of Our Hands

The first thing that came up in the report was that the specific cause for the outage came from an AWS Transit Gateway not being able to scale fast enough to handle the demand spike that came when we all went back to work on the morning of January 4th. What, the cloud can’t scale?

The cloud is practically limitless when it comes to resources. We can create instances with massive CPU resources or storage allocations or even networking pipelines. However, we can’t create them instantly. No matter how much we need, it takes time to do the basic provisioning to get it up and running. It’s the old story of eating an elephant. We do it one bite at a time. Normally we tell the story to talk about breaking a task down into smaller parts. In this case, it’s a reminder that even the biggest thing out there has to be dealt with in small pieces as well.

Slack learned this lesson the hard way. Why? Because they couldn’t foresee a time when their service was so popular that the amount of traffic rushing to their servers crushed the transit gateways. Other companies have had to learn this lesson the hard way too. Disney+ crawled on launch day because of demand. The release of a new game that requires downloading and patching on Day One also puts stress on servers. Even the lines outside of department stores on Black Friday (in less pandemic-driven years) are examples of what happens when capacity planning doesn’t meet demand.

When you plan for your worst case scenario, you have to think the unthinkable. Instead of only asking yourself what might happen if everyone logs on at the same time, you also need to ask what happens if they try and something goes wrong. I spent a lot of time in my former job thinking about simple little exercises like VDI boot storms, where office workers can push a storage system to the breaking point by simply turning all their machines on at the same time. It’s the equivalent of being on a shared network resource like a cable modem during the Super Bowl. There aren’t any resources available for you to use.

When we plan for capacity, we have to realize that even our most optimistic projections of usage are going to be conservative if we take off or go viral. Rather than guessing what that might be, take an hour every six months and readjust your projections. See how fast you’re growing. Plan for that crazy scenario where everyone decides to log on at the same time on a day where no one has had their coffee yet. And be ready for what happens when someone throws a wrench into the middle of the process.

What Happens When It Goes Wrong?

The second thing that came up in the Slack post-mortem that is just as worrisome was the behavior of the application when it realized there was a connection timeout. The app started waiting for the pathway to be open again. And guess what happened when AWS was able to scale the transit gateway? Slack clients started hammering the servers with connection requests. The result was something akin to a Distributed Denial-of-Service (DDoS) attack.

Why would the Slack client do this? I think it has something to do with the way the developers coded it and didn’t anticipate every client trying to reconnect all at once. It’s not entirely unsound thinking to be honest. How could we live in a world where every Slack user would be disconnected all at once? Yet, we do and it did and look what happened.

Ethernet figured this part out a long time ago. The CSMA/CD method for detecting collisions on a layer 2 connection has an ingenious solution for what happens when a collision is detected. Once it realizes that there was a problem on the wire it stops what is going on and calculates a random backoff timer based on the number of detected collisions. Once that timer has expired it attempts to transmit again. Because there has to be another station involved in a collision incident both stations do this. The random element of the timer calculation ensures that the likelihood of both stations choosing to transmit again at the same time is very, very low.
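The timer calculation described above is usually called truncated binary exponential backoff. Here is a minimal sketch of it in Python. The function and constant names are mine for illustration, and the slot time assumes classic 10 Mb/s Ethernet, so treat it as a teaching aid rather than a copy of what any NIC actually runs:

```python
import random

SLOT_TIME_US = 51.2   # one slot time on 10 Mb/s Ethernet (512 bit times)
MAX_EXPONENT = 10     # the backoff window stops doubling after 10 collisions
MAX_ATTEMPTS = 16     # the frame is abandoned once 16 attempts have collided


def backoff_slots(collisions, rng=random):
    """Return how many slot times to wait after the Nth collision on a frame."""
    if collisions < 1:
        raise ValueError("collisions must be at least 1")
    if collisions >= MAX_ATTEMPTS:
        raise RuntimeError("too many collisions, giving up on this frame")
    k = min(collisions, MAX_EXPONENT)
    # Pick uniformly from 0 .. 2**k - 1, so the window doubles with each collision.
    return rng.randrange(2 ** k)


def backoff_delay_us(collisions, rng=random):
    """Convert the slot count into a wait time in microseconds."""
    return backoff_slots(collisions, rng) * SLOT_TIME_US
```

After the first collision each station picks 0 or 1 slots, so there is still a coin-flip chance they collide again. After the second it is 0 through 3, then 0 through 7, and so on. The more crowded the wire looks, the wider the stations spread themselves out.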

If Ethernet behaved like the Slack client did we would never resolve collisions. If every station on a layer 2 network immediately tried to retransmit without a backoff timer the bus would be jammed constantly. The architects of the protocol figured out that every station needs a cool off period to clear the wire before trying again. And it needs to be different for every station so there is no overlap.

Slack really needs to take this idea into account. Rather than pouncing on a connection as soon as it’s available there needs to be a backoff timer that prevents the servers from being swamped. Even a few hundred milliseconds per client could have prevented this outage from getting as big as it did. Slack didn’t plan beyond the worst case scenario because they never conceived of their worst case scenario coming to pass. How could it get worse than something we couldn’t imagine happening?
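To make the idea concrete, here is a rough sketch of what a reconnect loop with a randomized backoff timer could look like. This is not Slack’s client code, which I haven’t seen. The names `reconnect_delays` and `reconnect`, the half-second starting window, and the sixty-second cap are all assumptions I picked for the example:

```python
import random
import time


def reconnect_delays(base=0.5, cap=60.0, rng=random):
    """Yield an endless series of randomized wait times, in seconds."""
    ceiling = base
    while True:
        # Each client picks a random point inside the current window, so
        # thousands of clients don't all come back at the same instant.
        yield rng.uniform(0, ceiling)
        ceiling = min(cap, ceiling * 2)  # widen the window, up to the cap


def reconnect(connect, max_attempts=8, sleep=time.sleep, **kwargs):
    """Call connect() until it succeeds, sleeping a random delay between tries."""
    delays = reconnect_delays(**kwargs)
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts, let the caller decide what to do
            sleep(next(delays))
```

The important part is the random pick inside a growing window. A fixed delay just moves the stampede a few seconds later. A random one spreads the reconnects out so the servers see a steady stream instead of a wall.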


Tom’s Take

If you design systems or call yourself a reliability engineer, you need to develop a hobby of coming up with disastrous scenarios. Think of the worst possible way for something to fail. Now, imagine it getting worse. Assume that nothing will work properly when there is a recovery attempt. Plan for things to be so bad that you’re in a room on fire trying to put everything out while you’re also on fire. It sounds very dramatic but that’s how bad it can get. If you’re ready for that then nothing will surprise you. You also need to make sure you’re going back and thinking of new things all the time. You never know which piece is going to fail and how it will impact what you’re working on. But thinking through it sometimes gives you an idea of where to go when it all goes wrong.

Managing Leaders, Or Why Pat Gelsinger Is Awesome

In case you missed it, Intel CEO Bob Swan is stepping down from his role effective February 15 and will be replaced by current VMware CEO Pat Gelsinger. Gelsinger was the former CTO at Intel for a number of years before leaving to run EMC and VMware. His return is a bright spot in an otherwise dismal past few months for the chip giant.

Why is Gelsinger’s return such a cause for celebration? The analysts that have been interviewed say that Intel has been in need of a technical leader for a while now. Swan came from the office of the CFO to run Intel on an interim basis after the resignation of Brian Krzanich. The past year has been a rough one for Intel, with delays in their new smaller chip manufacturing process and competition heating up not only from long-time rival AMD but also from new threats like ARM being potentially sold to NVIDIA. It’s a challenging course for any company captain to sail. However, I think one key thing makes it nigh impossible for Swan.

Management Mentality

Swan is a manager. That’s not meant as a slight so much as an accurate label. Managers are people that have things and look after them. Swan came from the financial side of the house where you have piles of resources and you do your best to account for them and justify their use. It’s Management 101. Managers make good CEOs for a variety of companies. They make sure that the moves are small and logical and will pay off in the future for the investors and eventually the workers as well. They are stewards first and foremost. When their background comes from something with inherent risk they are especially stewardly.

You know who else was a manager? John Sculley, the man brought in to run Apple as CEO back in 1983. Sculley was seen as a moderating force to Steve Jobs’ driving vision and sometimes reckless decision making skills. Sculley piloted the ship into calm waters at first but was ultimately sent packing because his decisions were starting to make less and less sense, such as exploring options to split Apple into separate companies and taking on IBM head-to-head on their turf.

Sculley was ousted in 1993 and Jobs returned to Apple in 1997. It wasn’t easy at first but eventually the style of Jobs started producing results. Things like the iPod, iMac, and eventually the iPhone came from his vision. He’s a leader in that regard. Leaders are the ones that jump out and take risks to make big results. Leaders are people like John Kennedy that give a vision of going to the moon in a decade without the faintest idea how that might happen. Leadership is what drives companies.

Leaders, however, are a liability without managers. Leaders say “let’s go to the moon!” Managers sit down and figure out how to make that happen without breaking the budgets or losing too many people along the way. Managers are the grounded voices that guide leaders. Without someone telling a leader of the challenges to overcome they won’t see the roadblocks until they drive right into them.

Leaders without brakes on their vision have no reality to shape it. Every iMac has an Apple Lisa. Every iPod has the iPod Hi-Fi. Even the iPhone wasn’t the iPhone until the App Store came around against the original vision of Apple’s driving force. To put it another way, George Lucas is a visionary leader in filmmaking. However, when he was turned loose without management of his process we ended up with the messy prequel trilogy. Why was Empire Strikes Back such a good film? Because it had people like Lawrence Kasdan involved managing the process of Lucas creating art. They helped focus the drive of a leader and make the result something great.

Tech Leadership

Let’s bring this discussion back to Intel and Pat Gelsinger. I know he is the best person to lead Intel right now. I know that because Gelsinger is very much a tech leader. He has visions for how things need to be and he can see how to get there. He knows that reducing costs and reaving product lines at Intel isn’t going to make them a better company down the road no matter what the activist investors have to say on the matter. They may have wanted regime change when they petitioned the board back in December, but they may find the new king a bit harder to deal with.

Gelsinger is also a manager. Going from CTO to being COO at EMC and eventually CEO at VMware has tempered his technical chops. You can’t hope to run a company on crazy ideas and risky bets. Steve Jobs had people like Tim Cook in the background keeping him as grounded in reality as possible. Gelsinger picked up these skills in helming VMware and I think that’s going to pay off for him at Intel. Rather than running out to buy another company to augment capabilities that will never see the light of day, someone like him can see the direction that Intel needs to go and make it happen in a collected manner. No more FPGA acquisitions that never bear fruit. No more embarrassing sales of the mobile chip division because no one could capitalize on it.

Pat Gelsinger is the best kind of technical manager. I saw it in the one conversation I was involved in with him during an event. He stepped in to a talk between myself and a couple of analysts. He listened to them and to me and when he was asked for his opinion, he stopped for a moment to think. He asked a question to clarify and then gave his answer. That’s a tempered leader approach to things. He listened. He thought. He clarified. And then he made a decision. That means there is steel behind the fire. That means the driving factors of the decision-making process aren’t just “cool stuff” or “save as much money as we can”. What will happen is the fusion of the two that the company needs to stay relevant in a world that seems bent on passing it by.


Tom’s Take

I’ve worked for managers and I’ve worked for leaders. I don’t have a preference for one or the other. I’ve seen leaders sell half their assets to save their company. I’ve also seen them buy ridiculous stuff in an effort to build something that no one would buy. I’ve seen managers keep things calm in the middle of a chaotic mess. I’ve also seen them so wracked with indecision that the opportunities they needed to capitalize on sailed off into the sunset. If you want to be the best person to run a company as the CEO, whether it’s a hundred people or a hundred thousand, you should look to someone like Pat Gelsinger. He’s the best combination of a manager and leader that I’ve seen in a long time. In five years we will be talking about how he was the one to bring Intel back to the top of the mountain, both through his leadership and his management skills.

Building Backdoors and Fixing Malfeasance

You might have seen the recent news this week that there is an exploitable backdoor in Zyxel hardware that has been discovered and is being exploited. The backdoor admin account with the clever name ‘zyfwp’ is not something that has been present in the devices forever. The account was put in during firmware version 4.60, which was released in Q4 2020.

Zyxel is rushing to patch the devices and remove the backdoor account. Users are being advised to disable remote administration until the accounts can be deactivated and proven to be removed. However, the bigger question in my mind relates to the addition of the user account in the first place. Why would you knowingly install a backdoor?

Hello, Joshua

Backdoors are nothing new in the computer world. I’d argue the most famous backdoor account in the history of computer hacking belongs to Joshua, the dormant login for the War Operations Programmed Response (WOPR) computer system in the 1983 movie Wargames. Joshua was an old login for the creator to access the system outside of the military chain of command. When the developer was removed from the project the account was forgotten about until a kid discovered it and kicked off the plot of the movie.

Joshua tells us a lot about developers and their desire to have access to the system. I’ll admit I’ve been in the same boat before. I’ve created my own logins to systems with elevated access to get tasks accomplished. I’ve also notified the users and administrators of those systems about my account and let them deal with it as needed. Most were okay with it being there. Some were hesitant and required it to be disabled after my work was done. Either way, I was up front about what was going on.

Joshua and zyfwp are examples of what happens when those systems are installed outside of the knowledge of the operators. What would have happened if the team in the Netherlands hadn’t found the account? What if Zyxel devices were getting hacked and networks breached without anyone knowing the vector? I’m sure the account showed up in all the admin dashboards, right?

Easter Egg Hunts

Do you remember the Windows 3.1 Bear? It was a hidden reference in the credits to the development team’s mascot. You had to jump through a hoop to find it by holding down a keystroke combination and clicking a specific square in the Windows logo. People loved finding those little nuggets in the software all the way up to Windows 98.

What changed? Turns out, as part of Microsoft’s Trustworthy Computing Initiative in 2002 they removed all undocumented features and code that could cause these kinds of things. It also might have had something to do with the antitrust investigations into Microsoft in the 1990s and how undocumented features in Windows and Office might have given the company a competitive advantage. Whatever the reason, Microsoft has committed to removing undocumented code.

Easter eggs are fun to find but represent the bright side of the dark issue above. What happens when the easter egg in question isn’t a credit roll but an undocumented account? What if the keystroke doesn’t bring up a teddy bear but instead gives the current user account full admin access? You scoff at the possibility but there’s nothing stopping a developer from making that happen.

These issues are part of the reason why all code and features need to be documented. We need to know what’s going on in the program and how it could impact us. This means no backdoors. If there is a way to access the system aside from the controls built in already it needs to be known and be able to be disabled if necessary. If it can’t be disabled then the users need to be aware of that fact and make the choice to not use the software because of security issues.

If you’re following along closely, you should have picked up on the fact that this same logic applies to backdoors that have been mandated by the government too. The current slate of US Senators seem to believe that we need to create keys that allow end-to-end encryption to be weakened and readable by law enforcement. However, as stated by companies like Apple for years, if you create a key for a lock that should only ever be opened under special circumstances you have still created a weakness that can be unlocked. We’ve seen the tools used by intelligence agencies stolen and used to create malware unlike anything we’ve ever seen before. What do you think might happen if they get the backdoor keys to go through encrypted messaging systems?


Tom’s Take

I don’t run Zyxel equipment in my home or anywhere I used to work. But if I did there would be a pile of it in the dumpster after this mess. Having a backdoor is one thing. Purposely making one is another. And having that backdoor discovered and exploited by the Internet is an entirely different conversation. The only way to be sure that you’ve fixed your backdoor problem is to not have one in the first place. Joshua and zyfwp are what we need to get away from, not what we need to work toward. Malfeasance only stops when you don’t do it in the first place.

Winning in 2021

I’d jump in here and say something about 2020 being a crazy year but we all know it’s nothing we haven’t heard before. I’d also say that we’re going to look back at my big plans for the year, but we also know that those got scrapped right after the end of February. I like looking back at a couple of things and then looking forward to what the next year will accomplish. Why? Because retrospectives are boring and putting your planning out there for the world to see is a much more interesting use of your time. The journey you’re taking changes greatly when you change your thinking about the destination.

2020 Good or Bad

2020 wasn’t all bad. I finally justified getting a new office chair! All kidding aside, 2020 was a year that challenged everyone greatly when it came to mental health, professional output, and even personal capability. My biggest focus for 2020 was to start putting blog posts out earlier in the week and focus on continuous improvement. I’d say the first was another miss due to the hectic workload, as a lot of my posts still came out on Fridays.

The second point was a bit more successful. I’ve been more diligent about getting stuff down and in a state where it can be improved. I’ve also added a lot of things to my repertoire over the year that I’m proud of. Here are some specifics:

  • Tomversations: I started a video series this year! I wanted to start coming up with monthly videos around topics that worked better as explorations instead of just simply spouting randomness. We put twelve episodes up last year starting around April. I was very happy with the way they turned out, especially toward the end when my process improved. Video is a great medium for some of the conversations I want to have.
  • The Rundown: Okay, this is a bit of a stretch since I’ve been co-hosting the Rundown since it started. But this year my friend Rich Stroffolino headed off to future endeavors and I took over the production part on the back end. It’s been interesting skimming the news and putting it together each week to try and keep the sparkly magic going. It also means I’m much closer to the details behind the tech now.
  • Cooking: This was my big pandemic skill level up. My cooking skill has always been just shy of adequate. This year I pushed myself to get better about learning technique and saving recipes so I have something to pull from when I make food. The tastes have gotten way better and I feel more confident. I’d say the family is happier too since we have something other than Kraft Mac and Cheese all the time.
  • Running: This was my other pandemic level up. I fell off the exercise wagon at the end of 2019 and it showed. I was heavier than I had ever been. I wasn’t thrilled at the idea of getting back in shape either. Once the pandemic set in and I knew I wasn’t going to be on the road for the foreseeable future I jumped back on the road to running. Since June 1, I have run or walked over 900 miles and lost almost 50 pounds. I feel better and I look forward to lacing up my shoes and running every morning.

2021 More Time

That’s where my energy went in 2020. Video and research and cooking meals to eat after I ran. What am I thinking about for 2021?

  • Bullet Journaling: This is an idea I got from my partner in crime Ben Gage. I need a better system for capturing info and logging tasks. I say this every year. And every year I find a way to fail at it somehow. This year I’m going with the less-structured approach. I’m keeping the journal digital in GoodNotes and using these templates from Robert Terekedis (@robterakedis) that I found in a search. I like the hint of organization with the freedom to do more when I want it done. Let’s hope this sticks!
  • More Video Content: Like I don’t spend enough time on camera? I’m going to explore the idea of doing more video content. I’m not going to do a daily log or anything but I’m going to try and figure out if creating more around some of my ideas but putting it on video will help me solidify it a little. I’ve found through Tomversations that my ability to riff on subjects and think through stuff when I’m staring at a camera lens feels much different than facing a computer keyboard. It’s not better or worse. It’s different and I’m curious about where that will lead.
  • Create Content that Resonates: My blog is ten years old now. There are posts from 2011 that don’t apply to anything any longer. Some of the posts that I’ve been putting out recently aren’t as technical and look more at work skills, soft skills, or even just life skills. Many of you have commented that my ideas around time management or organization are things you wanted or needed to hear. I’m going to explore those ideas a bit this year too. Don’t worry – The Networking with a Side of Snark isn’t going away any time soon. And I’m not going to turn into a productivity blogger overnight. Mostly because I don’t have enough productivity to make that happen! But I want people to enjoy reading my content for what it can help them with in the next twelve months of working with the challenges we will face.

Tom’s Take

2020 was a sucky year in general. Too much stress, too much uncertainty, and for those that tend to overanalyze everything it was a year of way too much introspection and questioning. I’m looking forward to the next 52 weeks to sort out what needs to be done and get it finished. I set good habits in 2020 that I want to carry forward. I’m going to keep improving just like last year and use the tools I can to make those changes a part of what I need to do to ensure that 2021 is filled with more winning than anything else. I may not be on a plane at all this year. However, I can win all I can from my house and help you all along the way too. Let’s enjoy the coming 525,600 minutes and do something that makes us feel like winners.

Making Time For Yourself

I was a recent pop-in guest on the Network Collective Holiday Show with my friends Jordan Martin and Tony Efantis. One of the questions they had been asking their guests was about the big lessons we’ve learned this year. As I thought back on the roller coaster ride that was 2020, I realized that one of the biggest lessons that I’ve learned is that I need to make time for the important things for myself.

Mark It Down

I know it sounds like a given, but we all need to make time for ourselves. I realized that when my usual schedule of running myself in overdrive and jumping from one event or travel opportunity to the next evaporated back in March. I found myself sitting at home and working toward some uncertain future. I never thought that there were going to be huge problems but I also didn’t know how things would end up turning out.

As the days grew into weeks and eventually into months, I quickly figured out that the normal I once knew was going to stay gone for quite a while. In place of that was a situation that I needed to adjust to. And that was going to take some time. I needed to catch my breath but I also needed to build a skill set that would allow me to continue forward.

Over April and May I got better at cooking. I retaught myself the basics of making all kinds of meals. I gained the confidence of trying new things. It helped me find a bit of stability. It happened because I started doing research and setting aside time every day to practice those skills. Maybe it was something small like making tacos. Or even putting something into a slow cooker. But it was time that I needed to take to do something that I needed to learn.

The second big lesson in taking time for myself came in June. With the move of Cisco Live to a digital event and the likelihood that everything else for the rest of the year was going to go the same way, I took the opportunity to get back into better, healthier shape. I had a hard time exercising on the road with the hotel gym being something that I didn’t appreciate. I started getting up earlier and going for walks and then for short runs. Then I upped my running and my walking distances. I made sure to lace up running shoes every day no matter what. No excuses whether it was raining or blisteringly hot outside.

Taking the time to get into better shape has had a huge impact on my self worth and my health. I’ve dropped 50 pounds since March and my running times keep coming down. My pants size went down significantly and the pictures of myself that I’m taking now barely resemble pre-COVID me. All because I took the time for myself.

Make It Happen

There’s no magic in what I did. There was no special system or secret code to get me to where I am right now. The only trick was making the time for myself. It’s like the financial books you can buy that give you tips to put into practice to get rich. One of the first is “pay yourself”. It’s trite but proves the point that you need to give yourself resources to work with or you’ll never get ahead.

Time is as valuable a resource as anything we have. We can’t save time and use it later. We can’t manufacture time. We can only use the time we have to the best of our abilities. Sometimes that means putting something we want to do on hold because of something we have to do. As someone that prides myself on writing lots of blog posts it meant that I had to put that particular part of my productivity behind more immediate things like getting my morning run in. It meant getting the Gestalt IT Rundown story script done before I could play a game or watch a TV show.

Time is what we make of it. I’ve started to realize that by blocking more and more of my time to do things. Maybe I put down on my calendar that Tuesday evening is a day to draw or practice a new cooking skill. Thursday morning could be my long run of the week and my day to research topics for my Tomversations videos. Whatever it is, I make it stick. I don’t need to schedule my exercise in the mornings because it’s become a habit for me. But I do need to schedule the other things to make sure they’re done. You don’t need to have a mark on every minute of your day to be productive, but you do need to make sure you make time for the pieces that are important.

That means making time for non-work things. It’s easy to fill up our calendar with things for work. It’s harder to fill up the calendar with non-work tasks and skills. Schedule a hike on a Saturday morning. Make Monday night your night to work on a craft that you want to do like learning leather working or knife making. Maybe you just want to say that Wednesday at lunchtime is the place where you’re going to schedule time to read a few more pages of your new favorite book. These are all things that are as valid as the next staff meeting or briefing that you have to do. Because they enrich you and help you become a better person.

More importantly, by scheduling these things in your calendar on a personal level you remind yourself not to let work get out of hand. When you live in the office, which is the same as working from home now, you will find yourself working at midnight some nights because you just couldn’t put the work down. By reminding yourself of what’s important you draw a bright line between work and personal and ensure that you have time to put in effort on both without getting overwhelmed.


Tom’s Take

There’s a bit of irony in me saying that you need to make time for the things that are important to you while I write this post after midnight on Christmas. The fact is that I made time earlier today for my family to open some gifts and help bake cookies. I went for a walk and watched some educational videos. Sure, I eventually found the time to write for myself but it came after I had taken care of other things. My journey through 2020 has taught me that time is the kind of resource you need to pay yourself with as often as you can. But you can’t just mark off your calendar and hope that something magical happens. You need to make the effort to use your time wisely and work on yourself. Every new skill you learn or pound you lose is making you a better, more well-rounded person. And that’s the kind of payoff that you can only get from investing time in yourself.