The Blogging Mirror

Writing isn’t always the easiest thing in the world to do. Coming up with topics is hard, but so too is making those topics into a blog post. I find myself getting briefings on a variety of subjects all the time, especially when it comes to networking. But translating those briefings into blog posts isn’t always straightforward. When I find myself stuck and ready to throw in the towel, I find it helps to think about things backwards.

A World Of Pure Imagination

When people plan blog posts, they often think about things in a top-down manner. They come up with a catchy title, then an amusing anecdote to open the post. Then they hit the main idea, find a couple of supporting arguments, and then finally they write a conclusion that ties it all together. Sound like a winning formula?

Except when it isn’t. How about when the title doesn’t reflect the content of the post? Or the anecdote or lead-in doesn’t quite fit the overall tone? How about when the post starts meandering away from the main idea halfway through with a totally separate argument? Or when the conclusion is actually the place where the lede is buried like the Ark of the Covenant?

All of these things are artifacts of the creative process. We often brainstorm great ideas halfway through the process and they derail our train of thought. That leads us down tangents we never intended to explore and produces posts that aren’t thematic or, in some cases, even readable.

It happens all the time. In fact, even in writing this post I thought of a catchy title for a subject heading and had to move it when I was done because the heading didn’t fit the content of the section that followed. It’s okay to give yourself the freedom to change things as soon as you see a problem, provided you have a plan for the rest of the post. And that plan is where the key comes into play.

Strike That, Reverse It

I find the easiest way to plan a blog post is to actually write it in reverse. Instead of thinking about things top-down, I start off by thinking about things bottom-up. Literally.

  • Start From The End – It’s easiest to write the conclusion of your post first. After all, you’re just restating what you’ve been arguing or demonstrating in the post, right? So start with that. Use it as the main idea of your writing. Always refer back to it. If what you’ve typed doesn’t fit the tone of the conclusion, you either need to support it or cut it.
  • Support Your Conclusion – Now that you know what you’re going to be talking about, figure out how to support it. That means figuring out how to break your argument into paragraphs and logical sections. Note that even though you’re trying to optimize for reading on screens today, you still need to follow basic structure. Paragraphs have multiple sentences that support the main idea. Once you have two or three of those arguments, you’ve got support for your conclusion.
  • State The Topic – After you build the support for your conclusion, you can write the topic. After all, you just spent a lot of time spelling it all out. This paragraph at the top is where you state the purpose or theme of the post. Don’t worry about getting into too much detail here. That’s what the support is for. Your readers will get the idea by the time they get to the conclusion, which serves to wrap it all together.
  • Build Your Anecdote – If you are the type of writer who likes to open with an anecdote, much like a cold open in a drama, this is where you write it. Now that you’ve basically outlined the whole post, you can tie your anecdote into the rest of the narrative. You don’t have to worry about building your discussion to support the really cool story. Because you’re adding the story at the end of the creative process, you can guarantee that it’s going to fit.
  • Title Card – Now that you’ve written the post, you can title it. This keeps you from making a title that doesn’t fit the narrative. It also allows the title to make a bit more sense in context, whether you called the post something cute and catchy or crafted the most search-optimized title in history to reap those sweet, sweet Google searches.

Tom’s Take

As you can see, posts are easier to write in reverse. When you think about things the opposite way from the restrictive methods of writing, you’re much more free to express your creativity while also keeping yourself on track to make sure everything makes sense. Some people thrive in the realm of structure and can easily crank out a post from the top down. But when you find yourself stuck because you can’t tie everything together the right way, try looking in a blogging mirror. The results will end up the same, but backwards might just be the way forward.


Silo 2: On-Premise with DevOps

I had a great time stirring up the hornet’s nest with the last post about DevOps, so I figured that I’d write another one with some updated ideas and clarifications. And maybe kick the nest a little harder this time.

Grounding the Rules

First, we need to start out with a couple of clarifications. I stated that the mantra of DevOps was “Move Fast, Break Things.” As has been rightly pointed out, this was actually a quote from Mark Zuckerberg about Facebook. However, as quite a few people have also pointed out, the proper description, “The use of basic principles to enable business requirements to get to production deployments with appropriate coordination among all business players, including line of business, developers, classic operations, security, networking, storage and other functional groups involved in service delivery”, is a bit more of a definition than a motto.

What exactly is DevOps then? Well, as I have been educated, it’s a principle. It’s an idea. A premise, if you will. An ideal to strive for. So, to say that someone is on a DevOps team is wrong. There is no such thing as a classic DevOps team. DevOps is instead something that many other teams do in addition to their other jobs.

That being said, go ask someone what their job is in an organization. I’m willing to bet that a lot of people will tell you they’re on the “DevOps Team”. I know this because someone did a report, which I wrote about here, and it includes responses from the “DevOps” team. Which, according to the classic definition, is wrong. Right?

Well, almost. See, this is where this tweet of mine comes into play:

“Pure” DevOps is hard to manage. It involves organizational shifts. It pisses people off because it’s hard to track metrics. You can’t easily track a person that does some traditional stuff and some of that new Dev-Op stuff. Where does that part of their job end up on a report? Putting someone in a team or a silo is almost as much for the purposes of managing that person as it is for them to do their job. If I put you in a silo, I know what you do. Or, at the very least, I can assign you tasks and responsibilities that you should be doing and grade you on those. If your “silo” is a principle and not a team, it’s much harder to grade how effectively you integrated with the developers to deliver services. It can be tracked, but not as easily as a checkbox.

Likewise, people fear change. So, instead of putting their people into roles that cross functional barriers and reorganizing the workflows, they just take the young people that are talking about the “new way” of doing things and put them in a team together. They slap a DevOps sign on the door and it’s done. We do DevOps now. Or, worse yet, they take the old infrastructure teams, move a few people off of them into a new team, and tell them to figure out what to do while they’re repainting the team name on the door. This has rightly been called “DevOps Washing” by a lot of people.

But what happens when that team starts Devving the Ops? Do they look at the enshrined principles of The Holy Book of DevOps and start trying to change organizational culture a little bit at a time to get the happy ending from The Phoenix Project? Do they eliminate the Brents of the world and give the security teams peace of mind?

Or, do they carve out their own little fiefdoms and start behaving like an integrated team with responsibilities and politics? Do they do things like deploy new projects to the cloud with little support from other teams, with the idea that they now “own” that workflow and can control how it’s used and how their team is viewed? If you read the article above with the report from Veriflow, you’ll find that a lot of organizations are seeing this second behavior.

Just as much as people fear proper change, they also get greedy in their new roles and want to be important to the business. And taking ownership of all the new initiatives, like cloud development, is a great way to be important. And, as much as The Phoenix Project preaches that security should be integrated into the DevOps workflow, you still have half of the 330 respondents to the above survey saying there is an increase in security threats to their new initiatives in public cloud.

Redefining DevOps

In a way, this “definition” of DevOps is like the title of this post. I’m sure more than a few of you bristled at the use of “on-premise”. Because, in today’s IT landscape, we’re fighting a losing battle against a premise. When you refer to something as happening in a location, you say “on-premises”. If you say “on-premise”, you should be referring to an idea or concept. And yet, so many people in Silicon Valley say “on-premise” when referring to “on site” or “on location”. It’s grammatically wrong. But it sounds hip. It’s not the classical definition of the word, and yet that word is slowly being redefined to mean what people are using it to mean. It literally happened with “literally”.

For those railing against the DevOps Washing that’s going on, ask yourself this question: Why? If the pure principles of DevOps are so much better and easier, why is everyone just slapping DevOps on existing teams or reforming other people into teams and running with the DevOps idea instead of following the rules as laid down by the sacred DevOps texts?

It could be that all organizations that are doing it this way are wrong. But are there more organizations doing it the proper way? Or is the lazy way more prevalent? I don’t know the answer, but given the number of products I see aimed at “the DevOps team” or the number of people that have given me feedback about how their organization’s DevOps teams display the same behaviors I talked about in my other blog post, I’d say there are more bad apples than purists out there.

So, what does this all mean for DevOps? Are we going to go on pointing and laughing at the DevOps-In-Name-Only crowd? Are we going to silently moan about how Real DevOps doesn’t happen and that we need to stay pure to the ideals? Or are we going to step back and realize that, just like every other technology or organizational shift that has ever occurred, nothing really gets implemented in its purest form? Instead of complaining that those not doing it the “proper” way are wrong, let’s examine why things get done the way they do and figure out how to fix it.

If businesses are implementing DevOps teams to execute the things they need done, find out why it has to be a dedicated team. Maybe they’re doing it wrong, or maybe they’ve stumbled across something that wasn’t included in the strictest definitions of DevOps. If people are giving work to those teams to accomplish and excluding other functional teams at the same time, don’t just wag your finger at them and tell them that’s not the “right way”. Find out what enabled that team to violate the ideas in the first place. Maybe the DevOps Team is responsible for all cloud deployments. Maybe they want some control over things instead of just a nebulous connection to an ideal.


Tom’s Take

DevOps in theory is a great thing. DevOps as presented in The Phoenix Project is a marvelous idea. But we all know that when theory meets reality, what we get is something different than we expected. It’s not unlike von Moltke’s famous quote, “No plan survives first contact with the enemy.” In theory, DevOps is pure and works like it should. But we’re seeing practice differ greatly from theory. The results are usually the same but the paths are radically different. And for the purists out there, if you don’t want DevOps to suffer the same fate as on-premise, you need to start asking yourself the same hard questions we are supposed to ask organizations as they start to deploy these ideas.

DevOps is a Silo

Silos are bad. We keep hearing how IT is too tribal and broken up into teams that only care about their swim lanes. The storage team doesn’t care about the network. The server teams don’t care about the storage team. The network team is a bunch of jerks that don’t like anyone. It’s a vicious cycle of mistrust and playground cliques.

Except for DevOps. The savior has finally arrived! DevOps is the silo-busting mentality that will allow us all to get with the program and get everything done right this time. The DevOps mentality doesn’t reinforce teams or silos. It focuses on the only pure thing left in the world – committing code. The way of the CI/CD warrior. But what if I told you that DevOps was just another silo?

Team Players

Before the pitchforks and torches come out, let’s examine why IT has been so tribal for so long. The silo mentality came about when we started getting more specialized with regard to infrastructure. Think about the original compute resources – mainframes. There weren’t any silos with mainframes because everyone pretty much had to know what they were doing with every part of the system. Everything was connected to the mainframe. The mainframe itself was the silo.

When we busted the mainframe apart and started down the road of client/server computing the hardware started becoming more specialized. Instead of one giant machine we had lots of little special machines everywhere. The more we deconstructed the mainframe, the more we needed to focus. The direct-attached storage became NAS and eventually SAN. The computer got bigger and bigger and eventually morphed into a virtualized hypervisor. The network exists to connect everything to the rest of the world, and as technology wore on the network became the transport for the infrastructure to talk to everything else.

Silos exist because you had to have specialized knowledge to operate your specialized infrastructure. Sure, there could be some cross training at lower levels of administration. But once you got into really complex topics like disk geometry optimization or route redistribution, the ability for a layperson to understand what was going on was shot. Each silo exists to reinforce its own infrastructure. Each silo has its own norms and schedules. The storage team will never lose data. The network always has to be available.

Even as these silos got crammed together and subsumed into new job roles, the ideas behind them stayed consistent. Some of the storage admin’s job roles combined with the virtualization team to be some kind of a hybrid. The networking team has been pushed to adopt more agile development methodologies like automation and orchestration. Through it all, the silos were coming down as people pushed the teams to embrace more software focused on the infrastructure. That is, until DevOps burst onto the scene.

OpSilo

The DevOps tribe has a mantra: “Move Fast. Break Things. Ship. Ship. SHIP!” Maybe not those exact words but something very similar. DevOps didn’t come from mainframes. It didn’t even come from the early days of client/server. DevOps grew out of a time when everything was blown apart and on the verge of being moved into the cloud. These new DevOperators didn’t think about infrastructure as a team or a tribe. Instead, it was an impediment to shipping code.

When you work in software, moving fast and breaking things works, because you’re pushing the limits of what you can do. You’re focused on features. You want new shiny things. Stability can wait as long as the next code commit is right around the corner. Who cares about what you’ve been doing?

In order to have the best experience with Software X, please turn on Automatic Updates so we can push the code as fast as our commits will allow.

Sound familiar? Who cares about disk geometry or route reflectors. Make my stuff work! Your infrastructure supports all my awesome code. I write the stuff that pays your salary. This place would be out of business if it wasn’t for me!

Granted that’s a little extreme, but the mentality is the same. Infrastructure exists to be consumed. IT is there to support the mission of Moving Fast, Breaking Things, and Shipping. It’s almost like a tribal behavior. Everyone has the same objective – ALL THE COMMITS!

Move fast and break things is the exact opposite of how the storage and networking teams work. You really don’t want to be screaming along at 800 mph when deploying a new SAN or trying to get iBGP stood up. You want careful. Calm. Collected. You’re working with a whole system that’s built on a house of cards. Unlike in DevOps, breaking a thing in a SAN or on the edge of a network could impact the entire system, not just one chat module.

That’s why networking and storage admins are so methodical. I harken back to some of my days in network engineering. When the network was running slow or the storage array was taxed, it took time to get data back. People were irritated but they got used to the idea of slowness. But if those systems ever went down, it was all-hands-on-deck panic! Contrast that with the mentality of the DevOps tribe. Who cares if it’s kind of broken right now? We need to ship the next feature or patch.

DevOps isn’t a silo buster. It’s just a different kind of tribal silo. The DevOps folks all have similar mentalities and view infrastructure in the same way. Cloud appeals to them because it minimizes infrastructure and gives them the tools they need to focus on developing. Cloud sprawl can easily happen when planning doesn’t occur. When specialized groups get together and talk about what they need, there is a reduction in consumed resources. Storage admins know how to get the most out of what they have. They don’t just spin up another bucket and keep deploying.


Tom’s Take

If you treat DevOps like a siloed tribe you’ll find their behavior is much easier to predict and work with. Don’t look at them as a cross-functional solution to all your problems. Even if you deploy all your assets to the cloud you’re going to need specialized teams to manage them once the infrastructure grows too big to manage by movement. Specialization isn’t the result of bad planning or tribalism. Instead, those specialized teams developed because of the need for deeper understanding. Just like DevOps developed out of a need to understand rapid deployment and fast-moving consumption of infrastructure. In time, the next “solution” to the DevOps problem will come along and we’ll find as well that it’s just another siloed team.

Managing Automation – Fighting Fear of Job Justification

Dear Employees,

We have decided to implement automation in our environment because robots and programs are way better than people. We will need you to justify your job in the next week or we will fire you and make you work in a really crappy job that doesn’t involve computers while we light cigars with dollar bills.

Sincerely,
Management

The above letter is how the professional staff of your organization hears it when you send out the following email:

We are going to implement some automation concepts next week. What are some things you wish you could automate in your job?

Interpretations differ as to the intent of automation. Management likes the idea of their engineering staff being fully tasked and working on valuable projects. They want to know their people are doing something productive. And the people that aren’t doing productive stuff should either be finding something to do or finding a new job.

Professional staff likes being fully tasked and productive too. They want to be involved in jobs and tasks that do something cool or justify their existence to management. If their job doesn’t do that they get worried they won’t have it any longer.

So, where is the disconnect?

You Do Exist (Sort of)

The problem with these interpretations comes down to the job itself. Humans can get very good at repetitive, easy jobs. Assembly line workers, quality testers, and even systems engineers are perfect examples of this. People love to do something over and over again that they are good at. They can be amazing when it comes to programming VLANs or typing out tweets for social media. And those are some pretty easy jobs!

Consistency is king when it comes to easy job tasks. When I can do it so well that I don’t have to think about it anymore, I’ve won. When it comes out the same way every time no matter how inattentive I am, it’s even better. And if it’s a task that I can do at the same time or place every day or every week, then I’m in heaven. Easy jobs that do themselves on a regular schedule are the key to being employed forever.

Automatic For The Programs

Where does that sound more familiar in today’s world? Why, automation of course! Automation is designed to make easy, repeatable jobs execute on a schedule or with a specific trigger. When that task can be done by a program that is always at work and never calls in sick or goes on vacation you can see the allure of it to management. You can also see the fear in the eyes of the professional that just found the perfect role full of easy jobs that can be scheduled on their calendar.

Hence the above interpretation of the automation email sample. People fear change. They fear automation taking away their jobs. Yet, the jobs that are perfect for automation are the kinds of things that shouldn’t be jobs in the first place. Professionals in a given discipline are much, much smarter than just doing something repetitively over and over again like VLAN modifications or IP addressing of interfaces. More importantly, automation provides consistency. Automation can be programmed to pull from the correct database or provide the correct configuration every time without worry of a transcription mistake or data entry in the wrong field.
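To make the consistency point concrete, here is a minimal sketch in Python of what “pulling from the correct database” can look like for the VLAN example. It assumes a hypothetical inventory in which the `VLANS` and `ACCESS_PORTS` lists stand in for a real IPAM or source-of-truth system. It renders Cisco IOS-style configuration text. `render_config` is an illustrative name, not part of any particular automation tool. The same input always produces the same output, and a port pointed at an undefined VLAN is rejected instead of being typed into the wrong field.

```python
# Hypothetical source of truth. In practice this might come from an IPAM or
# inventory database; here it is just a pair of in-memory lists.
VLANS = [
    {"id": 10, "name": "USERS"},
    {"id": 20, "name": "VOICE"},
]

ACCESS_PORTS = [
    {"interface": "GigabitEthernet1/0/1", "vlan": 10},
    {"interface": "GigabitEthernet1/0/2", "vlan": 20},
]


def render_config(vlans, ports):
    """Render IOS-style VLAN and access-port config from structured data.

    The output depends only on the input data, so every run produces the
    same text and nobody has to retype VLAN numbers by hand.
    """
    known_vlans = {v["id"] for v in vlans}
    lines = []
    # VLAN definitions, in a stable (sorted) order.
    for vlan in sorted(vlans, key=lambda v: v["id"]):
        lines.append(f"vlan {vlan['id']}")
        lines.append(f" name {vlan['name']}")
    for port in ports:
        # Refuse to assign a port to a VLAN that isn't defined: the kind of
        # data-entry mistake automation is meant to catch.
        if port["vlan"] not in known_vlans:
            raise ValueError(f"{port['interface']}: VLAN {port['vlan']} is not defined")
        lines.append(f"interface {port['interface']}")
        lines.append(" switchport mode access")
        lines.append(f" switchport access vlan {port['vlan']}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_config(VLANS, ACCESS_PORTS))
```

Pushing the rendered text to the switches is deliberately left out. That step depends on whatever tooling your environment already uses.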

People want these kinds of jobs because they afford two important things: justification of their existence and free time at work. The former ensures they get to have a paycheck. The latter gives them the chance to find some kind of joy in their job. Sure, you have some kind of repetitive task that you need to do every day like run a report or punch holes in a sheet of metal. But when you’re not doing that task you have the freedom to do fun stuff like learn about other parts of your job or just relax.

When you take away the job with automation, you take away the cover for the relaxation part. Now, you’ve upset the balance and forced people to find new things to do. And that means learning. Figuring out how to make tasks easy and repetitive again. And that’s not always possible. Hence the fear of automation and change.

Building A Better Path To Automation

How do we fix this mess? How can we motivate people to embrace automation? Well, it’s pretty simple:

  1. Help Your Team See The Need – If your teams think they’re going to lose their jobs because of automation, they’re not going to embrace it. You need to show them not only that they aren’t going to lose their jobs but also how automation will make their jobs easier and better. Remember to frame your arguments along the lines of removing mistakes and not needing to worry about justifying your existence in a role. That should encourage everyone to look for new challenges to overcome.
  2. Show the Value – This goes with the first part somewhat, but more than showing the need for automation with mistake reduction or schedule easing, you also need to show value. If a person has never made a mistake or has built their schedule around repetitive tasks, they are going to hate automation. Show them what they can do now that their roles don’t have to focus on the old stuff they did. Help them look at where they can provide additional value. Even if it starts off by monitoring the automation platform to make sure it’s executing correctly. Maybe the value they can provide is finding new things to automate!
  3. Embrace the Future – Automation allows people to learn how to do new things. They can focus on new skills or roles that help support the business in a better way. More automation means more complexity to understand but also a chance for people to shine in new roles. The right people will see a challenge as something to be overcome. Help them set new goals. Help them get where they want to be. You’ll be surprised how quickly they will get there with the right leadership.

Tom’s Take

Automation isn’t going to steal jobs. It will force people to examine their tasks and decide how important they really are. The people that were covering their basic roles and trying to skate by are going to leave no matter what. Even if your automation push fails, these marginal people are going to leave for greener pastures thanks to the examination of what they’re actually doing. Don’t let the pushback discourage you in the short term. Automation isn’t the goal. Automation is the tool to get you to the true goal of a smoother, more responsive team that accomplishes more and can reach higher goals.

Risking It All

When’s the last time you thought about risk? It’s something we have to deal with every day but hardly ever try to quantify unless we work in finance or a high-stakes job. When it comes to IT work, we take risks all the time. Some are little, like deleting files or emails thinking we won’t need them again. Or maybe they’re bigger risks, like deploying software to production or making a change that could take a site down. But risk is a part of our lives. Even when we can’t see it.

Mitigation Revelations

Mitigating risk is the most common thing we have to do when we analyze situations where risk is involved. Think about all the times you’ve had to create a backout plan for a change that you’re checking in. Even having a maintenance window is a form of risk mitigation. I was once involved in a cutover for a metro fiber deployment that had to happen between midnight and 2 am. When I asked why, the tech said, “Well, we don’t usually have any problems, but sometimes there’s a hiccup that takes the whole network down until we fix it. This way, there isn’t as much traffic on it.”

Risk is easy to manage when you compartmentalize it. That’s why we’re always trying to push risk aside or contain the potential damage from risk. In some cases, like a low-impact office that doesn’t rely on IT, risk is minimal at best. Who cares if we deploy a new wireless AP or delete some files? The impact is laughable if one computer goes offline. For other organizations, like financial trading or healthcare, the risks of downtime are far greater. Things that could induce downtime, such as patches or changes, must be submitted, analyzed, approved, and implemented in such a way as to ensure there is no downtime and no chance for failure.

Risk behaves this way no matter what we do. Sometimes our risks are hidden because we don’t know everything. Think about bugs in release code, for example. If we upgrade to a new code train to fix an existing bug or implement a new feature we are assuming the code passed some QA checks at some point. However, we’re still accepting a risk that the new code will contain a bug that is worse or force us to deal with new issues down the road. New code upgrades have even more stringent risk mitigation, such as bake-in periods or redundancy requirements before being brought online. Those protections are there to protect us.

Invisible Investments In Problems

But what about risks that we don’t know about? What if those risks were downplayed in a shady way before we ever had a chance to look for them?

For example, when I had LASIK surgery many years ago, I was handed a pamphlet that was legally required to be handed to me. I read through the procedure, which included a list of possible side effects or complications. Some of them were downright scary, even with a low percentage chance of occurring. I was told I had to know the risks before proceeding with the surgery. That way, I knew what I was getting into in case one of those complications happened.

Now, legal reasons aside, why would the doctor want to inform me of the risks? It makes it more likely that I’m not going to go through with the procedure if there are significant risks. So why say anything at all unless you’re forced to? Many of you might say that the doctor should say something out of the goodness or morality of their own heart, but the fact a law exists that requires disclosure should tell you about the goodness in people’s hearts.

Medical providers are required to reveal risk. So are financial planners and companies that provide forward looking statements. But when was the last time that a software company disclosed potential risk in their platform? After all, their equipment or programs could have significant issues if they go offline or are made to break somehow. What if there is a bug that allows someone to hack your system or crash your network? Who assumes the risk?

If your provider doesn’t tell you about the risks or tries to hide them in the sales or installation process, they’re essentially accepting the unknown risk on your behalf. If they know there is a bug in the code that could cause a hospital core network to melt down, or force you to reboot a server every 180 days like we had to do with an unpatched CallManager 6.0 server, then they’ve accepted that risk silently and passed it along to you. And if you think you’re going to be able to sue them or get compensation from them, you really need to read those EULAs that you agree to when you install things.

Risky Responsibility

The question now becomes one of ultimate responsibility. These are the “unknown unknowns”. We can’t ask about things we don’t know about. How could we? So it’s up to people with the knowledge of the risk to disclose it. In turn, that opens them up to some kinds of risk too. If my method of mitigating the risk in your code is to choose not to purchase your product, then you have to know that it was less risk for me to choose that route. Sure, it’s more risky for you to disclose it, but the alternative could lead to a big exposure.

Risk is a balance. In order to have the best balance, we need to have as much information as possible so we can put plans in place to mitigate it to our satisfaction. Some risks can’t be entirely mitigated. But failing to disclose risks to keep someone from walking away from a sale or a technology deployment is a huge issue. And if you find out that it happened to you, then you absolutely need to push back on it. Because letting someone else accept the risk on your behalf in secret will only lead to problems down the road.


Tom’s Take

Every change I checked into a network during production hours was a risk. Some of them were minor. Others were major. And still others were the kind that burned me. I accepted those risks for myself but I always made sure to let the customer I was working for know about them. There was no use in hiding information about a change that could take down the network or delete data. Sure, it may mean holding off on the change or making it so that we needed to find an alternative method. But the alternative was hurt feelings at best and legal troubles at worst. Risk should never be a surprise.

2019 Is The King of Content

2018 was a year full of excitement and fun. And for me, it was a year full of writing quite a bit. Not only did I keep up my writing here for my audience, but I also wrote quite a few posts for GestaltIT.com. You can find a list of all the stuff I wrote right here. I took a lot of briefings from up-and-coming companies as well as talking to some other great companies and writing a couple of series about SD-WAN.

It was also a big year for the Gestalt IT Rundown. My co-host with the most, Rich Stroffolino (@MrAnthropology), and I had a lot of fun looking at news from enterprise IT, along with some chipset and cryptocurrency stories. And I’ve probably burned my last few bridges with Larry Ellison and Mark Zuckerberg to boot. I look forward to recording these episodes every Wednesday, and I hope that some of you will join us on the Gestalt IT Facebook page at 12:30 EST as well.

Content Coming Your Way

So, what does that leave in store for 2019? Well, since I hate predictions on an industry scale, that means taking a look at what I plan on doing for the next year. For the coming 365 days, that means creating a lot of content for sure. You already know that I’m going to be busy with a variety of fun things like Networking Field Day, Mobility Field Day, and Security Field Day. That’s in addition to all the things that I’m going to be doing with Tech Field Day Extra at Cisco Live Europe and Cisco Live US in San Diego.

I’m also going to keep writing both here and at Gestalt IT. You probably saw my post last week about how hard it is to hit your deadlines. Well, there’s going to be a lot of writing coming out in both places, thanks to coverage of the briefings I’m taking with industry companies as well as a few think pieces about bigger trends in the industry.

I’m also going to experiment more with video. One of the inspirations that I’m looking at is none other than my good friend Ethan Banks (@ECBanks). He’s been cranking out some amazing videos on his daily walks, and he’s been collecting some of them in the Brain Spasms playlist. It’s a really good listen and he’s tackling some fun topics so far. I think I’m going to try my hand at some solo video content in the future at Gestalt IT. This blog is going to stay written for the time being.

Creating Content Quickly

One of the other things that I’m playing around with is the idea of being able to create content much more quickly and on the spot versus sitting down for long form discussions. You may recall from a post in 2015 that I’ve embraced using Markdown. I’ve been writing pretty consistently in Markdown for the past three years and it’s become second nature to me. That’s a good thing for sure. But for 2019, I’m going to branch out a bit.

The biggest change is that I’m going to try to do the majority of my writing on an iPad instead of my laptop. This means that I can just grab a tablet and type out some words quickly. It also means that I can take notes on my iPad and then immediately translate them into thoughts and words. I’m going to do this using iA Writer as my content creation tool. It’s going to help me with my Markdown as well as help me keep all the content I’m going to write organized. I’m going to force myself to use this new combination unless there’s no way I can pull it off, such as with my Cisco Live Twitter list. That whole process still relies quite a bit on code rather than Markdown.

As I mentioned in my deadline post, I’m also going to try to move my posting dates back from Friday to Wednesday or Thursday at the latest. That gives me some time to play around with ideas and still have a cushion before I’m late with a post. On the big days I may still have an extra post here or there to talk about some big news that’s breaking. I’m hoping this allows me to get some great content out there and keep the creative juices flowing.


Tom’s Take

2019 is going to be a full year. But it allows me to concentrate on the things that I love and am really good at doing: writing and leading Tech Field Day. Maybe branching out into video is going to give me a new avenue as well, but for now that’s going to stay pretty secondary to the writing aspect of things. I really hope that having a more mobile writing studio also helps me get my thoughts down quickly and create some more compelling posts in the coming year. Here’s hoping it all works out and I’ve got some great things to look back on in 365 days!

 

Murphy the Chaos Manager

I had the opportunity to sit in on a great briefing from Gremlin the other day about chaos engineering. Ken Nalbone (@KenNalbone) has a great review of their software and approach to things here. The more time I spent thinking about chaos engineering and IT, the more I realized that it has more in common with Murphy’s Law than we realize.

Anything That Can Go Wrong

If there’s more than one way to do a job and one of those ways will end in disaster, then somebody will do it that way. – Edward Murphy

 

Anything that can go wrong will go wrong. – Major John Paul Stapp

We live by the adage of Murphy’s Law in IT. Anything that can go wrong will go wrong. And usually it goes wrong at the worst possible time. Database query functions will go wrong when you need them the most. And usually at the height of something like Amazon Prime Day. Data center outages only seem to happen at 4 am on a Sunday during a holiday.

But why do things go wrong like this? Is it because the universe just has it out for IT people? Are we paying off karma from the fall of the Western Roman Empire? Or is it because we can’t anticipate some crazy things? Are we kidding ourselves that we can just manage Murphy and hope for the best?

As it turns out, this is why chaos engineering is so important. It doesn’t just make us realize that things are broken. It helps us understand how they will break in unique and different ways each time. A big reason this matters is that many large-scale failures aren’t the result of a single problem, but instead a collection of smaller things that build on each other.

One of my favorite stories about this collection of failures comes from a big Amazon Web Services (AWS) outage from last March. People were seeing problems in US-EAST-1 but they couldn’t nail down the issue. Worse yet, every time they logged into the Amazon dashboard they saw green lights for every service. As the minutes dragged on, it was eventually discovered that the lights were lying to everyone because Amazon hosted that page on AWS US-EAST-1. They couldn’t log in to reset the lights to show an outage! Not coincidentally, many other monitoring services were down as well because they were also hosted in the same region.

What does this teach us about chaos? Well, Murphy was in full effect for sure. Something went wrong, and at a bad time. But it was also the worst possible time for Amazon to figure out that the status lights and dashboard systems were all hosted out of one region with no backup anywhere else. Perhaps they could have caught that with a system like Gremlin. Perhaps it would have gone under the radar until the worst possible moment like it did in real life. There’s no way to know for sure. Hopefully Amazon has fixed this little problem for now.

People Will Do It Wrong

This also teaches us something about user behavior. One thing we hear frequently about patches or other glaring issues with software is “How was this not caught in testing?!?”

The flip side of that is that most of these corner-case issues were never tested in the first place. Testing focuses on the main functionality of a system. QA testers focus on the big-picture stuff first. Does the UI fall apart? Are all the buttons linked to a specific task? What happens when I click HELP on the login screen?

What does QA not test for? Well, lots of things that users actually do. Holding down random keystrokes while clicking buttons. Navigating to random pages and then bookmarking them without realizing that’s a bad idea. Typing the wrong information into a list box that passes validation and screws up the backend. The list of variations is endless.

How does this apply to chaos? Well, as it turns out, engineers and testers are pretty orderly people. We all look at problems and try to figure them out. We try combinations of things until we solve the issue. But everything is based on the idea that we’re trying things in specific combinations until we replicate the issue. We don’t realize that some of the random behavior we see comes from users doing things we can’t control.

Another story: I was editing a document the other day in a CMS and I saved the revisions I’d made as a draft post. When I went to check the post, it had inadvertently published itself. I didn’t want it to publish at that time, so I was perplexed. I knew I had clicked the save button, but I also knew I hadn’t clicked the publish button. I looked through the documentation and couldn’t find any issues.

I put it out of my mind until it happened again a couple of weeks later. This time, I went back through every step I had just taken. The one thing both incidents had in common was that I had saved the document with ⌘+S (CTRL+S on Windows), just like I’d taught myself to do for years. But in this CMS, that shortcut saves and publishes the current document. Surprise!

Behavior that shouldn’t have triggered a problem did. Because no one ever tested for what might happen if someone used a familiar keystroke in a place where it wasn’t intended. This is what makes chaos engineering so difficult and rewarding. Because you can set up the system to test for those random things without needing to think about them. And when you figure out a new one, like whether or not ⌘+S can crash your system, you can add it to the list to be checked against everything!


Tom’s Take

I love reading and learning about chaos engineering. The idea that we purposely break things to make people think about building them correctly appeals to me. I find myself trying to figure out how to make better things and always discover that I’m being stymied because I don’t think “outside the box”, which is a clever way of saying that I don’t think like a user. I need something that helps me understand how things will break in new and unique ways every time. Because while we can test for the big stuff, Murphy has a way of showing us what happens when we don’t sweat the small stuff.