Assume Disaster

One of the things that people have mentioned to me in the past regarding my event management skills is my reaction time. They say, “You are always on top of things when they go wrong. How do you do it?”

My response never fails to make them laugh. I offer, “I always assume something is going to go wrong. I may not know what it is but when it does happen I’m ready to fix it.”

That may sound like a cynical take on planning and operations but it’s served me well for many years. Why is it that things we spend so much time working on always seem to go off the rails?

Complexity Fails

Whether it’s an event or a network or even a carpentry project you have to assume that something is going to go wrong. Why? Because the more complex the project the more likely you are to hit a snag. Systems that build on themselves and require input to proceed are notorious for hitting blocks that cause the whole thing to snarl into a mess of missed timelines.

When I was in college studying project management I learned there’s even a term for time saving: crashing a project. Not literally crashing the project into something but instead looking for ways to trim the timeline and work through issues. Why is this a common term? I’d hazard a guess that very few projects actually stick to their timeline. It could be a parts delay. It could be a team taking longer to work through an issue. Mercury could be in retrograde during sunspots. Whatever the case may be, projects are designed to have floating timelines.

This imprecision built into project planning made me realize that the only way to be really sure that something would get done properly was to anticipate the errors and work through them. Part of the way to prevent these issues is to reduce complexity. You may not be able to work through every potential scenario where something is going to go sideways but you can almost always tell where the problems will arise. Any module of work that has lots of moving parts or lots of people with specific deadlines is going to be a trouble spot. The more components that depend on each other, the greater the chance that any one of them slipping will cause a delay that requires attention.
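To put a rough number on that intuition, here is a minimal sketch in Python. It assumes, purely for illustration, that every task in a chain has the same small chance of slipping and that the tasks slip independently of each other. Real projects are messier than that, and the function name and the 5% figure are mine, not from any project management textbook.

```python
def chance_of_slip(p_slip: float, n_tasks: int) -> float:
    """Chance that at least one of n_tasks slips.

    Assumes each task slips independently with probability p_slip,
    which is a simplification made for this illustration.
    """
    if not 0.0 <= p_slip <= 1.0:
        raise ValueError("p_slip must be between 0 and 1")
    if n_tasks < 0:
        raise ValueError("n_tasks cannot be negative")
    # The chain stays on schedule only if every single task stays on schedule.
    return 1 - (1 - p_slip) ** n_tasks


if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        print(f"{n:>2} chained tasks: {chance_of_slip(0.05, n):.0%} chance of a delay")
```

Under those assumptions, a 5% chance of trouble per task turns into roughly a 40% chance of a delay somewhere once ten tasks are chained together. Cutting the number of things that depend on each other is the cheapest way to bring that number back down.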

If you have a project or are planning something that has complicated steps for a specific goal, try to break those down into more simple things that don’t depend on each other. Have a team that needs to write a report based on the research from another team? Don’t bundle those together. Have the writing team working on things that aren’t dependent upon the research team just in case the data isn’t delivered. If you’re building a house and you are planning on having things done that require a roof being installed you should have a plan for what happens if the roofers are behind or the shingles don’t arrive on schedule. Finding these extra bits of complexity and eliminating them will go a long way toward solving recurring sources of frustration.
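One way to find those hidden dependencies is to write them down and ask what stops moving when a single piece is late. The sketch below does that in Python. The task names and the dependency map are hypothetical, loosely based on the report and roofing examples above, and the helper functions are illustrative rather than taken from any scheduling tool.

```python
from collections import deque

# Hypothetical project: each task maps to the tasks it has to wait for.
DEPENDS_ON = {
    "research": [],
    "outline report": [],
    "write findings": ["research", "outline report"],
    "roof": [],
    "interior drywall": ["roof"],
    "landscaping": [],
}


def blocked_by(late_task, depends_on):
    """Return every task that directly or indirectly waits on late_task."""
    # Invert the map so we can walk from a task to the tasks waiting on it.
    dependents = {}
    for task, prereqs in depends_on.items():
        for prereq in prereqs:
            dependents.setdefault(prereq, []).append(task)

    blocked = set()
    queue = deque([late_task])
    while queue:
        current = queue.popleft()
        for task in dependents.get(current, []):
            if task not in blocked:
                blocked.add(task)
                queue.append(task)
    return blocked


def still_workable(late_task, depends_on):
    """Return the tasks that can carry on while late_task is delayed."""
    stuck = blocked_by(late_task, depends_on) | {late_task}
    return sorted(task for task in depends_on if task not in stuck)


if __name__ == "__main__":
    print("Blocked if the roof is late:", blocked_by("roof", DEPENDS_ON))
    print("Still workable if research is late:", still_workable("research", DEPENDS_ON))
```

In this made-up project the writing team can still outline the report while the research is late, because that task was deliberately kept free of the research dependency. The shorter the blocked list for any one task, the less damage a single delay can do.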

Be Prepared for Problems

The motto of the Boy Scouts is “be prepared”. It’s something I remind the youth in the program of every week. Be prepared for what exactly? It doesn’t matter what if you’re properly prepared. You don’t have to be prepared for every possible scenario but you need to have the flexibility to address a wide variety of potential problems.

Take information security as a prime example. How will your enterprise be breached? There are almost too many ways to consider. New zero day? Backdoor password installed years ago? Phishing your key employees? Good old fashioned malfeasance? The list is endless! But the results are always the same. Attackers look for things of value and either steal them or disable them. Thieves steal and chaotic souls cause chaos. The entry is unknown but the results of entry can be quantified and considered.

You may not know how they’ll get in but you know how to stop them once they do. That’s why you should always assume you’re under attack or already breached. If you construct the system in such a way as to prevent lateral movement or even create policies to keep data safe at rest you’ll go a long way to preventing unauthorized users from accessing it, malicious or otherwise.

Is assuming that you’re always under attack kind of paranoid? Yes, it is. However, if you assume you’ve been breached and you are wrong all you’ve done is ensure that your data is safe and secure. If you assume you’re not and you end up being wrong you get to spend a lot of time cleaning up and sending emails to your boss and your resume to the next place where you get to make all new assumptions.


Tom’s Take

The optimist in me wants to believe that you can plan something so well that there isn’t a chance a problem can happen. The realist in me knows the optimist is crazy. That doesn’t mean I should just stop planning and hope for the best when I need to tap dance my way out of a problem. Instead, it means that I need to consider all the possibilities and try to have an answer for them, even if they’re remote. That way I’m never caught off guard by the wackiest of issues.

It’s About Time and Project Management

I stumbled across a Reddit thread today from /u/Magician_Hiker that posed a question I’ve always found fascinating. When we work on projects, it always seems like there is a disconnect between the project management team and the engineering team doing the work. The statement posted at the top of this thread is as follows:

Project Managers only plan for when things go right.

Engineers always plan for when things go wrong.

How did we get here? And can anything be done about it?

Projecting Management

I’ve had a turn or two at project management. I got my Project+ many years back, and even more years before that I had to learn all about project management in college. The science behind project management is storied and deep. The idea of having someone assigned to keep things running on task and making sure all the little details get taken care of is a huge boon as the size of projects grows.

As an engineer, can you imagine trying to juggle three different installations across five different sites that all need to be coordinated together? Can you think about the effort needed to make sure that everything works together and is done on time? The thought alone probably gives you hives.

Project managers are capable of juggling lots of things in their professional capacity. That means keeping all the dishes cooking at the same time and making sure that everything is done on time to eat dinner. It also means that people need to know about timelines and how those timelines intersect and can impact the execution of multiple phases of a project. Sure, it’s easy to figure out that we can’t start installing the equipment until it arrives on the dock. But how about coordinating the installers to be on-site on the right day knowing that the company is drop shipping the equipment to three different receiving docks? That’s a bit harder.

Project managers need to know timelines for things because they have to juggle everything together. If you’ve ever had the misfortune to need to use a Gantt chart you’ll know what I’m talking about. These little jewels have all the timeline needs of a project visualized for everyone to figure out how to make things happen. Stable time is key to a project. Estimates need to make sense. You can’t just spitball something and hope it works. If part of your project timeline is off in either direction, you’re going to get messed up further down the line.

Predictability

Project timelines need to be consistent. Most people try to err on the side of caution when trying to make them work. They fudge the numbers and pad things out a bit so that everything will work out in the end. Even if that means that there may be a few hours when someone is sitting around with nothing to do.

I worked with a project manager who jokingly told me that the way he figured out the timing for an installation project was to take the estimate from his engineers, double it, and move to the next time unit. So hours became days, and days became weeks. We chuckled about this at the time, but it also wasn’t surprising when his projects always seemed to take a LOT longer than most people budgeted for.
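For anyone who wants to see just how aggressive that rule of thumb is, here is a small Python sketch of it. The function name and the unit table are my own invention for illustration. The only conversions the rule actually mentioned were hours to days and days to weeks, so those are the only ones included.

```python
# Only the two unit bumps mentioned in the rule of thumb are modeled here.
NEXT_UNIT = {"hours": "days", "days": "weeks"}


def padded_estimate(amount, unit):
    """Double the engineer's number and move it to the next time unit."""
    if unit not in NEXT_UNIT:
        raise ValueError(f"no bigger unit defined for {unit!r}")
    return amount * 2, NEXT_UNIT[unit]


if __name__ == "__main__":
    for amount, unit in [(3, "hours"), (2, "days")]:
        padded_amount, padded_unit = padded_estimate(amount, unit)
        print(f"Engineer says {amount} {unit}, the plan says {padded_amount} {padded_unit}")
```

A three hour job turns into six days on the plan. That leaves plenty of room for surprises, and it also explains why nobody was shocked at how long those projects ran.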

The problem with inflated numbers is that no customer is going to want to pay for wasted time. If you think it’s hard to get a customer to buy off on an installation that might take 30 hours try getting them to pay when they are telling you your engineers were sitting around for 10 of those hours. Customers only want to pay for the hours worked, not the hours spent on the phone trying to locate shipments or trying to figure out what this weird error message is.

Likewise, trying to go the other direction and get things done more quickly than the estimate is a recipe for disaster too. There’s even a specific term for it: crashing (sounds great, eh?). Crashing a project means adding resources to a project or removing items from the critical execution path to make a deadline or complete something earlier. If you want a textbook example of why messing with a project timeline is a bad idea, go read or watch The Martian. The first resupply mission is a prime example of this practice in action and why it can go horribly wrong.

These are all great reasons why cloud is so appealing to people. Justin Warren (@JPWarren) did a great presentation a couple of years ago about what happens when projects run late and why cloud fixes that:

Watch that whole video and you’ll understand things from a project manager’s point of view. Cloud is predictable and stable and it always works the same way. The variance on things is tight. You don’t have to worry about projects slipping up or taking too much time. Cloud removes uncertainty and doubt about execution. That’s something that project managers love.


Tom’s Take

I used to get asked to quote my projected installation times to the sales managers for projects. Most of the time, I’d give them an estimate that I felt comfortable with and that would be the end of it. One day, I asked them why a 10-hour project was quoted as 14 on an order. The sales manager told me that they’d developed “Tom Time”, which was 1.4 times the amount of whatever I quoted. So, 10 hours became 14 and 20 hours became 28, and so on. When I asked why, I was told that engineers often run into problems and don’t think to account for them. So project managers need to build in the time somehow. Perhaps that’s one of the reasons why software defined and cloud are more attractive: there isn’t any Tom Time involved.
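For the curious, Tom Time is simple enough to write down. This is a minimal Python sketch of the multiplier as the sales manager described it. The function name is mine, and I have no idea how they actually calculated it on their end.

```python
def tom_time(quoted_hours):
    """Pad an engineer's quote by the 1.4x "Tom Time" factor."""
    if quoted_hours < 0:
        raise ValueError("quoted hours cannot be negative")
    # Multiply by 14 and divide by 10 instead of multiplying by 1.4,
    # which keeps whole-hour quotes free of floating point fuzz.
    return quoted_hours * 14 / 10


if __name__ == "__main__":
    for quote in (10, 20):
        print(f"Quoted {quote} hours, sold as {tom_time(quote):g} hours")
```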