The Cargo Cult of Google Tools

You should definitely watch this amazing video from Ben Sigelman of LightStep that was recorded at Cloud Field Day 4. The good stuff comes right up front.

In less than five minutes, he takes apart some crazy notions that we have in the world today. I like the observation that you can’t build a system to scale more than three or four orders of magnitude beyond its current needs. Yes, you really shouldn’t be using Hadoop for simple things. And machine learning is not a magic wand that fixes every problem.

However, my favorite thing was the quick mention of how emulating Google for the sake of using their tools for every solution is folly. Ben should know, because he is an ex-Googler. I think I can sum up this entire discussion in less than a minute of his talk here:

Google’s solutions were built for scale that basically doesn’t exist outside of maybe a handful of companies with a trillion dollar valuation. It’s foolish to assume that their solutions are better. They’re just more scalable. But they are actually very feature-poor. There’s a tradeoff there. We should not be imitating what Google did without thinking about why they did it. Sometimes the “whys” will apply to us, sometimes they won’t.

Gee, where have I heard something like this before? Oh yeah. How about this post. Or maybe this one on OCP. If I had a microphone I would have handed it to Ben so he could drop it.

Building a Laser Mousetrap

We’ve reached the point in networking and other IT disciplines where we have built cargo cults around Facebook and Google. We practically worship every tool they release into the wild and try to emulate that style in our own networks. And it’s not just the tools we use, either. We also keep trying to emulate the service provider style of Facebook and Google where they treated their primary users and consumers of services like your ISP treats you. That architectural style is being lauded by so many analysts and forward-thinking firms that you’re probably sick of hearing about it.

Guess what? You are not Google. Or Facebook. Or LinkedIn. You are not solving massive problems at the scale that they are solving them. Your 50-person office does not need Cassandra or Hadoop or TensorFlow. Why?

  • Google Has Massive Scale – Ben mentioned it in the video above. The published scale of Google is massive, and even that is on the low side. The real numbers could be an order of magnitude higher than we realize. When you have to start quoting throughput in “Library of Congress” units just to make sense to normal people, you’re in a class by yourself.
  • Google Builds Solutions For Their Problems – It’s all well and good that Google has built a ton of tools to solve their issues. It’s even nice of them to have shared those tools with the community through open source. But realistically speaking, when are you really going to need Cassandra for anything but the most complicated and complex database problems? It’s like a guy who goes out and buys a pneumatic impact wrench to fix the training wheels on his daughter’s bike. Sure, it will get the job done. But it’s going to be way overpowered and cause more problems than it solves.
  • Google’s Tools Don’t Solve Your Problems – This is the crux of Ben’s argument above. Google’s tools aren’t designed to solve a small flow issue in an SME network. They’re designed to keep the lights on in an organization that maps the world and provides video content to billions of people. Google tools are purpose built. And they aren’t flexible outside that purpose. They are built to be scalable, not flexible.

Down To Earth

Since Google’s scale numbers are hard to comprehend, let’s look at a better example from days gone by. I’m talking about the Cisco Aironet-to-LWAPP Upgrade Tool:

I used this a lot back in the day to upgrade autonomous APs to LWAPP controller-based APs. It was a very simple tool. It did exactly what it said in the title. And it didn’t do much more than that. You fed it an image and pointed it at an AP and it did the rest. There was some magic on the backend of removing and installing certificates and other necessary things to pave the way for the upgrade, but it was essentially a batch TFTP server.

It was simple. It didn’t check that you had the right image for the AP. It didn’t throw out good error codes when you blew something up. It only ran on a maximum of 5 APs at a time. And you had to close the tool every three or four uses because it had a memory leak! But it was still a better choice than trying to upgrade those APs by hand through the CLI.
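To make that batch-of-five pattern concrete, here’s a minimal Python sketch of the behavior described above. Everything in it is hypothetical for illustration: the push_image() helper stands in for the TFTP transfer and certificate shuffling, and the image filename is just a plausible-looking example, not the tool’s actual internals.

```python
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 5  # the tool's hard limit: five APs at a time

def push_image(ap_address: str, image_path: str) -> bool:
    """Placeholder for the TFTP transfer and certificate work."""
    print(f"Upgrading {ap_address} with {image_path}...")
    return True

def upgrade_aps(ap_addresses: list[str], image_path: str) -> None:
    # Work through the AP list in fixed-size batches of five.
    for i in range(0, len(ap_addresses), BATCH_SIZE):
        batch = ap_addresses[i:i + BATCH_SIZE]
        with ThreadPoolExecutor(max_workers=BATCH_SIZE) as pool:
            results = list(pool.map(lambda ap: push_image(ap, image_path), batch))
        print(f"Batch done: {sum(results)}/{len(batch)} APs upgraded")

upgrade_aps(["10.0.0.11", "10.0.0.12", "10.0.0.13"], "c1240-rcvk9w8-tar.default")
```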

This tool is over ten years old at this point and is still available for download on Cisco’s site. Why? Because you may still need it. It doesn’t scale to 1,000 APs. It doesn’t do anything other than upgrade five Aironet APs at a time to LWAPP (or CAPWAP) images. That’s it. That’s the purpose of the tool. And it’s still useful.

Tools like this aren’t built to be the ultimate solution to every problem. They don’t try to pack in every possible feature to be a “single pane of glass” problem solver. Instead, they focus on one problem and solve it better than anything else. Now, imagine that tool running at a scale your mind can’t comprehend, and you’ll understand why Google builds its tools the way it does.


Tom’s Take

I have a constant discussion on Twitter about the phrase “begs the question”. Begging the question is a logical fallacy. Almost every time, the speaker really means “raises the question”. Likewise, every time you think you need a Google tool to solve a problem, you’re almost always wrong. You’re not operating at the scale necessary to need that solution. Instead, the majority of people looking to implement Google solutions in their networks are like people who chrome everything on their car. They’re looking to show off instead of getting things done. It’s time to retire the Google cargo cult and instead ask ourselves what problems we’re really trying to solve, as Ben Sigelman suggests above. I think we’ll end up much happier in the long run and find our work lives much less complicated.

The Privacy Pickle

I recorded a fantastic episode of The Network Collective last night with some great friends from the industry. The topic was privacy. Originally I thought we were just going to discuss how NAT both was and wasn’t a form of privacy and how EUI-64 addressing wasn’t the end of days for people worried about being tracked. But as the show wore on, I realized a few things about privacy.

Booming In Peace

My mom is a Baby Boomer. We learn about Boomers as a generation based on some of their characteristics, most notably their rejection of the values of their parents. One of the things they hold most dear is their privacy. They grew up in a world where they could be private people. They weren’t living in a one- or two-room house with multiple siblings. They had a right to privacy. They could have a room all to themselves if they so chose.

Baby Boomers, like my mom, are intensely private adults. They marvel at the idea that targeted advertisements can work on them. When Amazon shows them an ad for something they just searched for, they feel like it’s a form of dark magic. They also aren’t trusting of “new” things. I can still remember how shocked my mother was that I would willingly get into someone else’s car instead of a taxi. When I explained that Uber and Lyft do a similar job of vetting their drivers, it still took some convincing to make her realize that it was safe.

Likewise, the Boomer generation’s sense of personal privacy doesn’t mesh well with today’s technology. While there are always exceptions to every rule, the number of people in their mid-50s and older who use Twitter and Snapchat is far, far smaller than each service’s target demographic. I used to wonder if it was because older people didn’t understand the technology. But over time I started to realize that it was more that older people just don’t like sharing that kind of information about themselves. They’re not open books. Instead, Baby Boomers take a lot of studying to understand.

Zee Newest

On the opposite side of the spectrum is my son’s generation, Generation Z. GenZ is the opposite of the Boomer generation when it comes to privacy. They have grown up in a world that has never known anything but the ever-present connectivity of the Internet. They don’t understand that people can live a life without being watched by cameras and having everything they do uploaded to YouTube. Their idea of celebrity isn’t just TV and movie stars; it extends to video game streamers on Twitch and Instagram models.

Likewise, this generation is more open with their private information. They understand that the world is built on data collection. They sign away their information. But they tend to be crafty about it. Rather than filling out every detail of a form the way previous generations would, this generation fills out only the necessary pieces. And they have been known to put in totally incorrect information for no other reason than to throw people off.

GenZ realizes that the privacy genie is out of the bottle. They have to deal with the world they were born into, just like the Baby Boomers and the other generations that came before them. But the way they choose to deal with it is not through legislation but through self-regulation. They choose what information they expose so as not to create a trail or a profile that big-data-consuming companies can use to fingerprint them. And in most cases, they don’t even realize they’re doing it! My son is twelve, and he instinctively knows that you don’t share everything about yourself everywhere. He knows how to navigate his virtual neighborhood just as surely as I knew how to ride my bike around my physical one back when I was his age.

Tom’s Take

Where does that leave me and my generation? Well, we’re a weird mashup of Generation X and Generation Y/Millennials. We aren’t as private as our parents and we aren’t as open as our children. We’re cynical. We’re rebelling against what we see as our parents’ generation and its complete privacy. Likewise, just like our parents, we are almost aghast at the idea that our children could be so open. We’re coming to live in a world where Big Data is learning everything about us. And our children are growing up in that world. Their children, the generation after GenZ, will only know a world where everyone knows everything already. Will it be like Minority Report, where advertising works off retinal patterns? Or will it be a generation where we know everything but really know nothing because no one tells the whole truth about who they are?

Data Is Not The New Oil, It’s Nuclear Power

Big Data. I believe that one phrase could get millions in venture capital funding. I don’t even have to put a product with it. Just say it. And make no mistake about it: the rest of the world thinks so too. Data is “the new oil”. At least, according to some pundits. It’s a great headline-making analogy that describes how data drives business and how controlling it can lead to an empire. But data isn’t really oil. It’s nuclear power.

Black Gold, Texas Tea

Crude oil is a popular resource. Prized for a variety of uses, it is traded and sold as a commodity and refined into plastics, gasoline, and other essential items of modern convenience. Oil creates empires and causes global commerce to hinge on every turn of the market. Living in a state that is a big oil producer, I see firsthand the impact that oil exploration and refining have.

However, when compared to Big Data, oil isn’t the right metaphor. Much like oil, data needs to be refined before use. But oil can be refined into many distinct things. Data can only be turned into information. Oil burns up when consumed. Aside from some smoke and a small amount of residue, oil all but disappears after it expends the energy trapped within. Data doesn’t disappear after being turned into information.

In fact, the biggest issue that I have with the entire “Data as Oil” argument is that oil doesn’t stick around. We don’t see massive pools of oil on the side of the road from spills. We don’t hear about massive issues with oil disposal or with securing spent oil to prevent theft. People that treat data like oil are only looking at the refined product as the final form. They tend to forget that the raw form of data sticks around after the transformation. Most people will tell you that’s a good thing, because you can run analytics and machine learning against static datasets and continue to derive value from them. But that doesn’t make me think of oil at all.

Welcome To The Nuclear Age

In fact, Big Data reminds me most of a nuclear power plant. Much like oil, the initial form of radioactive material isn’t very useful. It radiates and creates a small amount of heat, but not enough to run the steam generators in a power plant. Instead, you must bombard the uranium-235 pellets with neutrons to start a fission reaction. Once you have a sustained, controllable reaction, the amount of generated heat rises and creates the resource you need to power the rest of your machinery.

Much like data, nuclear fission reactions don’t do much without the proper infrastructure to harness them. Even after you transform your data into information, you need to parse, categorize, and analyze it. The byproduct of the transformation is the critical part of the whole process.

Much like nuclear fuel rods, data stays in place for years. It continues to produce the resource after being modified and transformed. It sits around in the hopes that it can be useful until the day it no longer serves its purpose. Data that is past its useful shelf life goes into a data warehouse, where it will eventually be forgotten. Spent nuclear fuel rods are likewise eventually removed and placed somewhere they can’t affect other things. Maybe they’re buried deep underground. Or shot into space. Or placed under enough concrete that they will never be found again.

The danger in data and in nuclear power is not what happens when everything goes right. Instead, it’s what happens when everything goes wrong. With nuclear power, wrong is a chain reaction meltdown. Or wrong could be improper disposal of waste. It could be a disaster at the plant or even a theft of fissile nuclear material from the plant. The fuel rods themselves are simultaneously our source of power and the source of our potential disaster.

Likewise, data that is just sitting around and stored improperly can lead to huge disasters. We’re only four years removed from Target’s huge data breach. And how many more are waiting out there to happen? It seems the time between data leaks is shrinking as more and more bad actors find ways to steal, manipulate, and appropriate data for their own ends. And, much like nuclear fuel rods, the methods of protecting data are few compared to those for other resources.

Data isn’t something that can easily be hidden or compacted. It needs to be readable to be useful. It needs to be fast to be useful. All the things that make it easy to use also make it easy to exploit. And once we’re done with it, the way it is stored in perpetuity only increases the likelihood of it being used improperly. Unless we’re willing to bury it under metaphorical concrete, we’re in for a bad future if we forget how to handle spent data.


Tom’s Take

Data as Oil is a stupid metaphor. It’s meant to impress upon finance CEOs and Wall Street wonks how important it is for data to be taken seriously. Data as Oil is something a data scientist would say to get a job. By drawing a bad comparison, you make data seem like a commodity to be traded and used as collateral for empire building. It’s not. It’s a ticking bomb of disastrous proportions when not handled correctly. Rather than coming up with a pithy metaphor for cable news consumption and page views, let’s treat data with the respect it deserves and make sure we plan for how we’re going to deal with something that won’t burn up into smoke whenever it’s convenient.

Devaluing Data Exposures

I had a great time this week recording the first episode of a new series with my co-worker Rich Stroffolino. The Gestalt IT Rundown is hopefully the start of some fun news stories with a hint of snark and humor thrown in.

One of the things I discussed in this episode was my belief that no data is truly secure anymore. Thanks to recent malware like WannaCry and Bad Rabbit and the rise of state-sponsored hacking, I’m totally behind the idea that soon everyone will know everything about me and there’s nothing that anyone can do about it.

Just Pick Up The Phone

Personal data is important. Some pieces of personal data are sacrificed for the greater good. Anyone who is in IT or works in an area where they deal with spam emails and robocalls has probably paused for a moment before putting contact information down on a form. I have an old Hotmail address I use to catch spam if I’m relatively certain that something looks shady. I give out my home phone number freely because I never answer it. These pieces of personal data have been sacrificed in order to provide me a modicum of privacy.

But what about the things we guard jealously? How about our mobile phone numbers? When I worked for a VAR, my mobile number was the single most closely guarded piece of information I owned. No one, aside from my coworkers, had it. In part, that’s because I wanted to make sure it got used properly. But it’s also because I knew that as soon as one person at a customer site had it, soon everyone would. I would be spending my time answering phone calls instead of working on tickets.

That’s the world we live in today. So many pieces of information about us are being stored. Our Social Security Number, which in truth has been misappropriated as an identification number. US driver’s licenses, which are also used as identification. Passport numbers, credit ratings, mother’s maiden name (which is very handy for opening accounts in your name). The list could be a blog post in and of itself. But why is all of this data being stored?

Data Is The New Oil

The first time I heard someone in a keynote use the phrase “big data is the new oil”, I almost puked. Not because it’s a platitude that underscores the value of data. I lost it because I know what people do with vital resources like oil, gold, and diamonds. They hoard them. They stockpile the resources until they can be refined. Until every ounce of value can be extracted. Then the shell is discarded and left until it becomes a hazard.

Don’t believe me? I live in a state that is legally required to run radio and television advertisements telling children not to play around old oilfield equipment that hasn’t been operational in decades. It’s cheaper for them to buy commercials than it is to clean up their mess. And that precious resource? It’s old news. Companies that extract resources just move on to the next easy source instead of cleaning up their leftovers.

Why does that matter to you? Think about all the pieces of data that are stored somewhere that could possibly leak out about you. Phone numbers, date of birth, names of children or spouses. And those are the easy ones. Imagine how many places your SSN is currently stored. Now, imagine half of those companies go out of business in the next three years. What happens to your data then? You can better believe that it’s not going to get destroyed or encrypted in such a way as to prevent exposure. It’s going to lie fallow on some forgotten server until someone finds it and plunders it. Your only real hope is that it was being stored on a cloud provider that destroys the storage buckets after the bill isn’t paid for six months.

Devaluing Data

How do we fix all this? Can this be fixed? Well, it might be possible, but it’s not going to be fun, cheap, or easy. It all starts by making discrete data less valuable. An SSN is worthless without a name attached to it, for instance. If all I have are nine random digits with no context, I can’t tell what they’re supposed to be. The value only comes when those nine digits can be matched to a name.

We’ve got to stop using the SSN as a unique identifier for a person. It was never designed for that purpose. In fact, storing SSNs at all is a really bad idea. Users should be assigned a new, random ID number when creating an account or filling out a form, as sketched below. SSNs shouldn’t be stored unless absolutely necessary. And when they are, they should be treated like nuclear launch codes. It should take special authority to query them, and the database that holds them shouldn’t be directly attached to anything else.
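Here’s a minimal Python sketch of that tokenization idea: the application stores only a random surrogate ID, and the token-to-SSN mapping lives in a separate, locked-down store. The TokenVault class and field names are hypothetical, not a reference design.

```python
import secrets

class TokenVault:
    """Stands in for a hardened, separately audited data store."""
    def __init__(self):
        self._token_to_ssn = {}

    def tokenize(self, ssn: str) -> str:
        # The application only ever sees this random token.
        token = secrets.token_hex(16)
        self._token_to_ssn[token] = ssn
        return token

vault = TokenVault()
user_record = {
    "name": "Jane Doe",
    "ssn_token": vault.tokenize("123-45-6789"),  # no raw SSN in the user table
}
# The nine digits live only inside the vault; the user table holds a
# context-free random string that is worthless on its own.
```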

Critical data should be stored in a vault that can only be accessed in certain ways and never exposed. A prime example is the Secure Enclave in an iPhone. The enclave, when used for Touch ID or Face ID, stores your fingerprints and your face map. Pretty important stuff, yes? However, even as biometric ID systems become more prevalent, there isn’t any way to extract that data from the enclave. It’s stored in such a way that it can only be queried in a specific manner, with a yes/no result returned from the query. If you stole my iPhone tomorrow, there’s no way for you to reconstruct my fingerprints from it. That’s the template we need to use going forward to protect our data.
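Here’s a rough Python sketch of that query-only pattern. It’s not how the Secure Enclave actually works internally (real enclaves compare biometric templates in dedicated hardware); hashing a byte string is just a stand-in to show the shape of an API that answers yes/no without ever exposing the secret.

```python
import hashlib
import hmac

class MatchOnlyVault:
    def __init__(self, secret: bytes):
        # Keep only a digest; the raw secret is discarded after enrollment.
        self._digest = hashlib.sha256(secret).digest()

    def matches(self, candidate: bytes) -> bool:
        # Constant-time comparison that returns yes/no and nothing else.
        return hmac.compare_digest(self._digest, hashlib.sha256(candidate).digest())

vault = MatchOnlyVault(b"fingerprint-template")
print(vault.matches(b"fingerprint-template"))  # True
print(vault.matches(b"someone-else"))          # False
# Note there is no API to read the template back out.
```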


Tom’s Take

I’m getting tired of being told that my data is being spread to the four winds thanks to it lying around waiting to be used for both legitimate and nefarious purposes. We can’t build fences high enough around critical data to keep it from being broken into. We can’t keep people out, so we need to start making the data less valuable. Instead of keeping it all together where it can be reconstructed into something of immense value, we need to make it hard to get all the pieces together at any one time. That means it’s going to be tough for us to build systems that put it all together too. But wouldn’t you rather spend your time solving a fun problem like that rather than making phone calls telling people your SSN got exposed on the open market?

Don’t Build Big Data With Bad Data

I was at Pure Accelerate 2017 this week and I saw some very interesting things around big data and the impact that high speed flash storage is going to have. Storage vendors serving that market are starting to include analytics capabilities on the box in an effort to provide extra value. But what happens when these advances cause issues in the training of algorithms?

Garbage In, Garbage Out

One story that came out of a conversation was about training a system to recognize people. In the process of training the system, the users imported a large number of faces in order to help the system start the process of differentiating individuals. The data set they started with? A collection of male headshots from the Screen Actors Guild. By the time the users caught the mistake, the algorithm had already proven that it had issues telling the difference between test subjects of particular ethnicities. After scrapping the data set and using some different diverse data sources, the system started performing much better.

This started me thinking about the quality of the data that we are importing into machine learning and artificial intelligence systems. The old computer adage of “garbage in, garbage out” has never been more apt. Before, bad inputs merely made the extracted data suspect. Now, feeding bad data into a system designed to make decisions can have even more far-reaching consequences.

Look at all the systems that we’re programming today to be more AI-like. We’ve got self-driving cars that need massive data inputs to help navigate roads at speed. We have network monitoring systems that take analytics data and use it to predict things like component failures. We even have these systems running in the background of popular apps that provide us news and other crucial information.

What if the inputs into the system cause it to become corrupted or somehow compromised? You’ve probably heard the story about how importing UrbanDictionary into Watson caused it to start cursing constantly. These kinds of stories highlight how important the quality of data being used for the basis of AI/ML systems can be.

Think of a future when self-driving cars are being programmed with failsafes to avoid living things in the roadway. Suppose that the car has been programmed to avoid humans and other large animals like horses and cows. But, during the import of the small animal data set, the table for dogs isn’t imported for some reason. Now, what would happen if the car encountered a dog in the road? Would it make the right decision to avoid the animal? Would the outline of the dog trigger a subroutine that helped it make the right decision? Or would the car not be able to tell what a dog was and do something horrible?

Do You See What I See?

After some chatting with my friend Ryan Adzima, he taught me a bit about how facial recognition systems work. I had always assumed that these systems could differentiate based on things like color. So a system could tell a blonde woman from a brunette, for instance. But Ryan told me that it’s actually very difficult for a system to tell fine colors apart.

Instead, systems try to create contrast in the colors of the picture so that certain features stand out. Those features have a grid overlaid on them, and then those grids are compared and contrasted. That’s the fastest way for a system to discern between individuals. It makes sense considering how CPU-bound these systems are today and the lack of high-definition cameras to capture information for them.
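As a toy illustration of that grid idea, here’s a short Python sketch (assuming NumPy): stretch the contrast, carve the image into a coarse grid, and compare the grids cell by cell instead of comparing fine color values. Real systems are far more sophisticated; this only shows the basic shape of the technique.

```python
import numpy as np

GRID = 8  # compare coarse 8x8 grids, not fine color values

def grid_signature(image: np.ndarray) -> np.ndarray:
    img = image.astype(float)
    # Stretch contrast so features stand out...
    img = (img - img.min()) / (img.max() - img.min() + 1e-9)
    # ...then average each grid cell down to a single value.
    h, w = img.shape
    bh, bw = h // GRID, w // GRID
    img = img[:bh * GRID, :bw * GRID]  # trim so the grid divides evenly
    return img.reshape(GRID, bh, GRID, bw).mean(axis=(1, 3))

def similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Compare the two grids cell by cell; 1.0 means identical.
    return 1.0 - float(np.abs(grid_signature(a) - grid_signature(b)).mean())

rng = np.random.default_rng(0)
face_a = rng.random((120, 96))  # stand-in for a grayscale headshot
print(similarity(face_a, face_a))                 # 1.0: same image
print(similarity(face_a, rng.random((120, 96))))  # lower: different image
```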

But we also must realize that we have to improve data collection for our AI/ML systems in order to ensure that the systems are receiving good data to make decisions. We need to build validation models into our systems, with checks to make sure the data looks and sounds sane at the point of input. These are the kinds of things that take time and careful consideration when planning, to ensure they don’t become a hindrance to the system. If the very safeguards we put in place to keep data correct end up causing problems, we’re going to create a system that falls apart before it can do what it was designed to do.
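A sketch of what that input validation might look like, with hypothetical field names. The point is rejecting incomplete or nonsensical records before they ever reach the training set, as in the missing-dog-table scenario above:

```python
REQUIRED_FIELDS = {"image_id", "label"}
KNOWN_LABELS = {"person", "dog", "horse", "cow"}

def validate(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record looks sane."""
    problems = []
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    if "label" in record and record["label"] not in KNOWN_LABELS:
        problems.append(f"unknown label: {record['label']!r}")
    return problems

records = [
    {"image_id": 1, "label": "dog"},
    {"image_id": 2, "label": "dragon"},  # unknown label: rejected
    {"image_id": 3},                     # missing label: rejected
]
clean = [r for r in records if not validate(r)]
print(f"{len(clean)} of {len(records)} records accepted for training")
```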


Tom’s Take

I thought the story about the AI training was a bit humorous, but it does point to a huge issue with computer systems going forward. We need to be absolutely sure of the veracity of our data as we begin using it to train systems to think for themselves. Sure, teaching a Jeopardy-winning system to curse is one thing. But if we teach a system to be racist or murderous because of the information we give it to make decisions, we will have programmed a new life form to exhibit the worst of us instead of the best.

AI, Machine Learning, and The Hitchhiker’s Guide

I had a great conversation with Ed Horley (@EHorley) and Patrick Hubbard (@FerventGeek) last night around new technologies. We were waxing intellectual about all things related to advances in analytics and intelligence. There’s been more than a few questions here at VMworld 2016 about the roles that machine learning and artificial intelligence will play in the future of IT. But during the conversation with Ed and Patrick, I finally hit on the perfect analogy for machine learning and artificial intelligence (AI). It’s pretty easy to follow along, so don’t panic.

The Answer

Machine learning is an amazing technology. It can extrapolate patterns in large data sets and provide insight from seemingly random things. It can also teach machines to think about problems and find solutions. Rather than go back to the tired Target big data example, I much prefer this example of a computer learning to play Super Mario World:

You can see how the algorithms learn how to play the game and find newer, better paths through the level. One of the things that’s always struck me about the computer’s decision-making is how early it learned that spin jumps provide more benefit than regular jumps for a given input. You can see the point in the video when the system figures this out, after which every jump becomes a spin jump for maximum effect.

Machine learning appears to be insightful and amazing. But the weakness of machine learning is exemplified by Deep Thought from The Hitchhiker’s Guide to the Galaxy. Deep Thought was created to find the answer to the ultimate question of life, the universe, and everything. It was programmed with an enormous dataset – literally every piece of knowledge in the known universe. After seven and a half million years, it finally produced The Answer (42, if you’re curious). Which leads to the plot of the book and other hijinks.

Machine learning is capable of great leaps of logic, but it operates on a fundamental truth: all inputs are a bounded data set. Whether you are performing a simple test on a small data set or combing all the information in the universe for answers, you are still operating on finite information. Machine learning can do things very fast and find pieces of data that are not immediately obvious. But it can’t operate outside the bounds of the data set without additional input. Even the largest analytics clusters won’t produce additional output without more data being ingested. Machine learning is capable of doing amazing things. But it won’t ever create new information outside of what it is operating on.

The Question

Artificial Intelligence (AI), on the other hand, is more like the question in The Hitchhiker’s Guide. Deep Thought admonishes the users of the system that rather than looking for the answer to Life, The Universe, and Everything, they should have been looking for The Question instead. That involves creating a completely different computer to find the Question that matches the Answer that Deep Thought has already provided.

AI can provide insight above and beyond a given data set input. It can provide context where none exists. It can make leaps of logic similar to those humans make. AI doesn’t simply stop when it faces an incomplete data set. Even though we are seeing AI in its infancy today, the most advanced systems are capable of “filling in the blanks” to cover missing information. As the algorithms learn more and more how to extrapolate, they’ll become better at making decisions from incomplete data.

The reason computers are so good at making quick decisions is that they don’t operate outside the bounds of the possible. If the entire universe for a decision is a data set, they won’t try to look beyond it. That ability to look beyond and try to create new data where none exists is the hallmark of intelligence. Using tools to create is a uniquely biological function. Computers can create subsets of data with tools, but they can’t do a complete transformation.

AI is pushing those boundaries. Given enough time and the proper input, AI can make the leaps outside of bounds to come up with new ideas. Today it’s all about context. Tomorrow may find AI providing true creativity. AI will eventually pass a Turing Test because it can look outside the script and provide the pseudorandom type of conversation that people are capable of.


Tom’s Take

Computers are smart. They think faster than we do. They can do math better than we can. They can produce results in a fraction of the time at a scale that boggles the mind. But machine learning is still working from a known set. No matter how much work we pour into that aspect of things, we are still going to hit a limit of the system’s ability to break out of its bounds.

True AI will behave like we do. It will look for answers when there are none. It will imagine when confronted with the impossible. It will learn from mistakes and create solutions to them. That’s the power of intelligence versus learning. Even the most powerful computer in the universe couldn’t break out of its programming. It needed something more to question the answers it created. Just like us, it needed to sit back and let its mind wander.

Networking Needs Information, Not Data

Networking Field Day 12 starts today. There are a lot of great presenters lined up. As I talk to more and more networking companies, it’s becoming obvious that simply moving packets is not the way to go now. The real sizzle is in telling you all about those packets instead. Not packet inspection, but analytics.

Tell Me More, Tell Me More

Ask any networking professional and they’ll tell you that the systems they manage have a wealth of information. SNMP can give you monitoring data for a set of points defined in MIB files. Other protocols like NetFlow or sFlow can give you more granular data about a particular group of packets or data flow in your network. Even more advanced projects like Intel’s Snap are building on the idea of using telemetry to collect disparate data sources and build collection methodologies to do something with them.

The concern that quickly becomes apparent is the overwhelming amount of data being received from all these sources. It reminds me a bit of this scene:

How can you drink from this firehose? Maybe the better question is whether you should be drinking from it at all.

Order From Chaos

Data is useless. We need to perform analysis on it to get information. That’s where a new wave of companies is coming into the networking market. They are building on the frameworks and systems that are aggregating data and presenting it in a way that makes it useful information. Instead of random data points about NetFlow, these solutions tell you that you’ve got a huge problem with outbound traffic of a specific type that is sent at a specific time with a specific payload. The difference is that instead of sorting through data to make sense of it, you’ve got a tool delivering the analysis instead of the raw data.

Sometimes it’s as simple as color-coding lines of Wireshark captures. Resets are bad, so they show up red. Properly torn-down connections are good, so they show up green. You can instantly see how well things are going just by looking at the colors. That’s analysis from raw data. The real trick in modern network monitoring is to find a way to analyze and provide context for massive amounts of data that may not have an immediate correlation.
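For instance, a few lines of Python can apply the same red/green logic to flow records. The record format here is invented for illustration; the point is that a trivial rule turns raw flags into an instant visual verdict:

```python
def classify(tcp_flags: set[str]) -> str:
    if "RST" in tcp_flags:
        return "red"    # connection reset: worth a look
    if {"FIN", "ACK"} <= tcp_flags:
        return "green"  # properly torn down
    return "gray"       # still open or indeterminate

flows = [
    {"src": "10.1.1.5", "dst": "192.0.2.9", "flags": {"SYN", "ACK", "FIN"}},
    {"src": "10.1.1.7", "dst": "192.0.2.9", "flags": {"SYN", "RST"}},
]
for f in flows:
    print(classify(f["flags"]), f["src"], "->", f["dst"])
```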

Networking professionals are smart people. They can intuit a lot of potential issues from a given data set. They can make the logical leap to a specific issue given time. What reduces that ability is the sheer number of things that can go wrong with a particular system and the speed at which those problems must be fixed, especially at scale. A hiccup at one end of the network can be catastrophic at the other if allowed to persist.

Analytics can give us the context we need. It can provide confidence levels for common problems. It can ensure that symptoms are indeed happening above a given baseline or threshold. It can help us narrow the symptoms and potential issues before we even look at the data. Analytics can exclude the impossible while highlighting the more probable causes and outcomes. Analytics can give us peace of mind.
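Even something as simple as a standard-deviation check captures the baseline idea. This Python sketch flags a metric only when it lands well outside its historical range; the three-sigma threshold is a common starting point, not a universal rule:

```python
from statistics import mean, stdev

def is_anomalous(history: list[float], current: float, sigmas: float = 3.0) -> bool:
    mu, sd = mean(history), stdev(history)
    return sd > 0 and abs(current - mu) > sigmas * sd

outbound_mbps = [42.0, 40.5, 44.1, 39.8, 43.2, 41.7]  # per-interval baseline
print(is_anomalous(outbound_mbps, 43.0))   # False: normal variation
print(is_anomalous(outbound_mbps, 160.0))  # True: investigate this interval
```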


Tom’s Take

Analytics isn’t doing our job for us. Instead, it’s giving us the ability to concentrate. Anyone who spends their time sifting through data trying to find patterns is losing the signal in the noise. Patterns are things that software can find easily. We need to leverage the work being put into network analytics systems to help us track down issues before they blow up into full problems. We need to apply what makes networking professionals best suited for the job to the best information we can gather about a situation. Our concentration on what matters is where our job will be in five years. Let’s take the knowledge we have and apply it.