Network Visibility with Barefoot Deep Insight

As you may have heard this week, Barefoot Networks is back in the news with the release of their newest product, Barefoot Deep Insight. Choosing to name a thing after what it actually does, Barefoot has created a solution for finding out why network packets are behaving the way they are.

Observer Problem

It’s no secret that modern network monitoring is coming out of the Dark Ages. Ping, traceroute, and SNMP aren’t exactly the best tools for extracting real information from a network. They were designed for a different time with far lower traffic volumes. Even NetFlow can’t keep up with modern networks running at multi-gigabit speeds. And even if it could, it’s still missing in-flight data about network paths and packet delays.
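
To put a number on that, here’s a quick back-of-the-envelope sketch (in Python, with invented figures) showing how a serious one-second microburst all but vanishes inside a typical five-minute counter poll:

# Why interval-based counters hide microbursts: one second of line-rate
# traffic inside a 5-minute poll barely moves the reported average.
LINK_BPS = 10_000_000_000          # 10 Gbps link
POLL_SECONDS = 300                 # typical 5-minute SNMP poll
BURST_SECONDS = 1                  # a 1-second microburst at line rate
BASELINE_UTILIZATION = 0.05        # 5% background load

baseline_bits = LINK_BPS * BASELINE_UTILIZATION * (POLL_SECONDS - BURST_SECONDS)
burst_bits = LINK_BPS * BURST_SECONDS
average_utilization = (baseline_bits + burst_bits) / (LINK_BPS * POLL_SECONDS)

print(f"average over the poll: {average_utilization:.1%}")  # ~5.3%

The poller sees a healthy link at roughly 5% utilization, even though for one full second every queue in the path was slammed.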

Imagine standing outside of the Holland Tunnel. You know that a car entered at a specific time. And you see the car exit. But you don’t know what happened to the car in between. If the car takes 5 minutes to traverse the tunnel, you have no way of knowing whether that’s normal. Likewise, if a car is delayed and takes 7-8 minutes to exit, you can’t tell what caused the delay. Without being able to see the car at various points along the journey, you are essentially guessing about the state of the transit network at any given time.

Trying to solve this problem in a network can be difficult. That’s because the OS running on the devices doesn’t generally lend itself to easy monitoring. The old days of SNMP proved that time and time again. Today’s networks are getting a bit better with regard to APIs and the like. You could even go all the way up the food chain and buy something like Cisco Tetration if you absolutely needed that much visibility.

Embedding Reporting

Barefoot solves this problem by using their P4 language in concert with the Tofino chipset to provide visibility into packets as they traverse the network. P4 gives Tofino the flexibility to build telemetry into the data plane processing of a packet. Rather than bolting monitoring on after the fact, you can now put it right alongside the packet flow and collect information as it happens.
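
To make that concrete, here’s a rough sketch of the kind of per-hop record an in-band telemetry approach can stamp onto a packet in flight. The field names are my own shorthand for illustration, not Barefoot’s actual format:

# Illustrative per-hop telemetry metadata that a P4 data plane could
# append to a packet in flight. Field names are my shorthand only.
from dataclasses import dataclass

@dataclass
class HopTelemetry:
    switch_id: int          # which device handled the packet
    ingress_ts_ns: int      # when the packet arrived
    egress_ts_ns: int       # when it left
    queue_depth: int        # egress queue occupancy at that moment

    @property
    def hop_latency_ns(self) -> int:
        return self.egress_ts_ns - self.ingress_ts_ns

# The collector receives the full list of hops and can pinpoint exactly
# where a packet spent its time, instead of guessing from end-to-end RTT.
path = [
    HopTelemetry(1, 1_000, 1_800, 3),
    HopTelemetry(2, 2_500, 9_900, 47),   # deep queue: this hop is the culprit
    HopTelemetry(3, 10_400, 11_000, 2),
]
slowest = max(path, key=lambda h: h.hop_latency_ns)
print(f"worst hop: switch {slowest.switch_id}, {slowest.hop_latency_ns} ns")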

The other key is that the real work is done by the Deep Insight Analytics Software running outside of the switch. The analytics platform takes the data collected from the Tofino switches and starts processing it. It creates baselines of traffic patterns and starts looking for anomalies in the data. This is why Deep Insight claims to be able to detect microbursts: the monitoring platform can analyze the data being fed to it and provide the operator with insights.
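
A toy version of that baseline-and-anomaly loop might look something like this. The window size and threshold are arbitrary choices for illustration, not anything Deep Insight documents:

# Keep a rolling view of per-interval latency samples and flag any
# sample far above the learned baseline.
from collections import deque
from statistics import mean, stdev

window = deque(maxlen=60)   # last 60 samples form the baseline

def check(sample_us: float) -> None:
    if len(window) >= 10:
        base, spread = mean(window), stdev(window)
        if sample_us > base + 4 * spread:
            print(f"anomaly: {sample_us} us vs baseline {base:.0f} us")
    window.append(sample_us)

for s in [110, 105, 112, 108, 109, 111, 107, 106, 110, 108, 950, 109]:
    check(s)   # the 950 us sample trips the alarm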

It’s important to note that this is information only. The insights gathered from Deep Insight are for informational purposes. This is where the skill of the network professional comes into play. With the software giving you perspective into what could be causing issues like microbursts, you gain the ability to apply your skills and fix those issues. Perhaps it’s a misconfigured ECMP pair. Maybe it’s a dead or dying cable in a link. Armed with the data from the platform, you can work your networking magic to make it right.

Barefoot says that Deep Insight builds on itself via machine learning. While machine learning seems to be one of the buzzwords du jour, one would hope that a platform that can analyze the states of packets can start to build an idea of what’s causing them to behave in certain ways. While not mentioned in the press release, it could also be inferred that there are ways to upload the data from your system to a larger set of servers, where more analytics can be applied to the datasets and more insights extracted.


Tom’s Take

The Deep Insight platform is what I was hoping to see from Barefoot after I saw them earlier this year at Networking Field Day 14. They are taking the flexibility of the Tofino chip and the extensibility of P4 and combining them to build new and exciting things that run right alongside the data plane on the switches. This means that they can provide the kinds of tools that companies are willing to pay quite a bit for and do it in a way that is 100% capable of being audited and extended by brilliant programmers. I hope that Deep Insight takes off and sees wide adoption for Barefoot customers. That will be the biggest endorsement of what they’re doing and give them a long runway to building more in the future.

Networking Needs Information, Not Data


Networking Field Day 12 starts today. There are a lot of great presenters lined up. As I talk to more and more networking companies, it’s becoming obvious that simply moving packets is no longer enough. Instead, the real sizzle is in telling you all about those packets. Not packet inspection, but analytics.

Tell Me More, Tell Me More

Ask any networking professional and they’ll tell you that the systems they manage have a wealth of information. SNMP can give you monitoring data for a set of points defined in MIB files. Other protocols like NetFlow or sFlow can give you more granular data about a particular group of packets or data flow in your network. Even more advanced projects like Intel’s Snap are building on the idea of using telemetry to collect disparate data sources and build collection methodologies to do something with them.
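
To give a sense of what “granular” means here, this is roughly the shape of a flow record, modeled loosely on the NetFlow v5 idea (abbreviated, with my own field names):

# Loose sketch of a flow record, modeled on the NetFlow v5 idea.
# Real exports carry more fields; these names are my shorthand.
from dataclasses import dataclass

@dataclass
class FlowRecord:
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int    # 6 = TCP, 17 = UDP
    packets: int
    bytes: int
    start_ms: int    # flow first seen
    end_ms: int      # flow last seen

# A busy edge router can export tens of thousands of these per minute,
# which is exactly the firehose problem described below.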

The concern that quickly becomes apparent is the overwhelming amount of data being received from all these sources. How can you drink from this firehose? Perhaps the better question is whether you should drink from it at all.

Order From Chaos

Data is useless. We need to perform analysis on it to get information. That’s where a new wave of companies is coming into the networking market. They are building on the frameworks and systems that aggregate data, and presenting that data in a way that makes it useful information. Instead of random data points about NetFlow, these solutions tell you that you’ve got a huge problem with outbound traffic of a specific type, sent at a specific time with a specific payload. The difference is that instead of sorting through data to make sense of it, you’ve got a tool delivering the analysis instead of the raw data.
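
Here’s a tiny sketch of that data-to-information step, collapsing raw flow records (plain dicts for brevity) into a single actionable statement. The threshold and local prefix are invented for illustration:

# Turn a pile of flow records into one sentence an operator can act on.
from collections import Counter

def outbound_summary(flows, local_prefix="10.", alert_bytes=10_000_000_000):
    by_port = Counter()
    for f in flows:
        if f["src_ip"].startswith(local_prefix):   # outbound flows only
            by_port[f["dst_port"]] += f["bytes"]
    if not by_port:
        return "no outbound flows seen"
    port, volume = by_port.most_common(1)[0]
    if volume > alert_bytes:
        return f"ALERT: {volume / 1e9:.1f} GB outbound to port {port}"
    return "outbound traffic within normal bounds"

flows = [
    {"src_ip": "10.1.1.5", "dst_port": 443, "bytes": 12_000_000_000},
    {"src_ip": "192.0.2.9", "dst_port": 80, "bytes": 4_000_000_000},
]
print(outbound_summary(flows))   # ALERT: 12.0 GB outbound to port 443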

Sometimes it’s as simple as color-coding lines in a Wireshark capture. Resets are bad, so they show up red. Properly torn-down connections are good, so they are green. You can instantly figure out how things are going by looking for the colors. That’s analysis from raw data. The real trick in modern network monitoring is to find a way to analyze and provide context for massive amounts of data that may not have an immediate correlation.
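
That coloring pass is simple enough to sketch in a few lines. The flag values come from the TCP header definition; the color choices mirror the spirit of Wireshark’s defaults, not its exact rules:

# Map raw TCP flags to a verdict a human can scan at a glance.
RST, FIN = 0x04, 0x01

def color_for(tcp_flags: int) -> str:
    if tcp_flags & RST:
        return "red"      # connection reset: something went wrong
    if tcp_flags & FIN:
        return "green"    # orderly teardown
    return "gray"         # ordinary mid-stream traffic

print(color_for(0x14))    # RST+ACK -> red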

Networking professionals are smart people. They can intuit a lot of potential issues from a given data set. They can make the logical leap to a specific issue given time. What reduces that ability is the sheer number of things that can go wrong with a particular system and the speed at which those problems must be fixed, especially at scale. A hiccup on one end of the network can be catastrophic at the other if allowed to persist.

Analytics can give us the context we need. It can provide confidence levels for common problems. It can ensure that symptoms are indeed happening above a given baseline or threshold. It can help us narrow the symptoms and potential issues before we even look at the data. Analytics can exclude the impossible while highlighting the more probable causes and outcomes. Analytics can give us peace of mind.


Tom’s Take

Analytics isn’t doing our job for us. Instead, it’s giving us the ability to concentrate. Anyone that spends their time sifting through data to try and find patterns is losing the signal in the noise. Patterns are things that software can find easily. We need to leverage the work being put into network analytics systems to help us track down issues before they blow up into full problems. We need to apply the judgment that makes network professionals best suited for this work to the best information we can gather about a situation. Our concentration on what matters is where our job will be in five years. Let’s take the knowledge we have and apply it.

More Bang For Your Budget With Whitebox


As whitebox switching starts coming to the forefront of the next buying cycle for enterprises, decision makers are naturally wondering about the advantages of buying cheaper hardware. Is a whitebox switch going to provide more value for me than buying something from an established vendor? Where are the real savings? Is whitebox really for me? One of the answers to this puzzle comes not from the savings in whitebox purchases, but from the capability inherent in rapid deployment.

Ten Thousand Spoons

When users look at the acquisition cost advantages of buying whitebox switches, they typically don’t see what they would like to see. Ridiculously cheap hardware isn’t the norm. Instead, you see a switch that can be bought at a decent discount. That comparison also has to take into account that most vendors will give substantial one-time discounts to customers to entice them into more lucrative options like advanced support or professional services.

The purchasing advantage of whitebox doesn’t just come from reduced costs. It comes from additional unit purchases. Purchasing budgets don’t typically spell out that you are allowed to buy ten switches and three firewalls. They more often state that you are allowed to spend a certain dollar amount on devices of a specific type. Savvy shoppers will find deals or discounts to get more for their dollar. The real world of purchasing budgets means that every dollar will be spent, lest the available dollars get reduced next year.

With whitebox, that purchasing power translates into additional units for the same budget amount. If I could buy three switches from Vendor X or five switches from Whitebox Vendor Y, ceteris paribus I would buy the whitebox switches. If the purpose of the purchase was to connect 144 ports, then that means I have two extra switches lying around. Which does seem a bit wasteful.
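
The arithmetic is trivial but worth spelling out. The prices here are invented; only the ratio matters:

# Units per fixed budget: same spend, different unit counts.
BUDGET = 30_000
vendor_x_price = 10_000     # hypothetical per-switch price, Vendor X
whitebox_price = 6_000      # hypothetical per-switch price, whitebox

print(BUDGET // vendor_x_price)   # 3 switches
print(BUDGET // whitebox_price)   # 5 switches: same budget, two spares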

However, the option of having spares on the shelf becomes very appealing. Networks are supposed to be built in a way to minimize or eliminate downtime because of failure. The network must continue to run if a switch dies. But what happens to the dead switch? In most current cases, the switch must be sent in for warranty replacement. Service contracts with large networking vendors give you the option for 4-hour, overnight, or next business day replacements. These vendors will even cross-ship you the part. But you are still down the dead switch. If the other part of the redundant pair goes down, you are going to be dead in the water.

With an extra whitebox switch on the shelf you can have a ready replacement. Just slip it into place and let your orchestration and provisioning software do the rest. While the replacement is shipping, you still have redundancy. It also saves you from needing to buy a hugely expensive (and wildly profitable) advanced support contract.

All You Need Is A Knife

Suppose for a moment that we do have these switches sitting around on a shelf doing nothing but waiting for the inevitable failure in the network. From a cost perspective, it’s neutral. I spent the same budget either way, so an unutilized switch is costing me nothing. However, what if I could do something with that switch?

The real advantage of whitebox in this scenario comes from the ability to use non-switching OSes on the hardware. Think for a moment about something like a network packet monitor. In the past, we’ve needed to download specialized software and slip a probing device into the network just for the purposes of packet collection. What if that could be done by a switch? What if the same hardware that is forwarding packets through the network could also be used to monitor them as well?

Imagine creating an operating system that runs on top of something like ONIE for the purpose of being a network tap. Now, instead of specialized hardware for that purpose, you just grab one of the switches lying around on the shelf and repurpose it as a sensor. And when it’s served that purpose, you put it back on the shelf and wait until there is a failure before pushing it into production as a replacement. With Chef or Puppet, you could even have the switch boot into a sensor identity for a few days and then provision it back to being a data forwarding switch afterwards. No need for messy complicated software images or clever hacks.
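
Here’s a sketch of what that role-swapping might look like from the orchestration side. The image URLs and device names are hypothetical, and a real deployment would drive this through ONIE and your provisioning tool of choice rather than a bare script:

# One inventory, many bootable roles: pick the image a spare should
# boot next and hand it to the installer.
ROLE_IMAGES = {
    "switch": "http://deploy.example.com/images/nos-switch.bin",
    "sensor": "http://deploy.example.com/images/packet-sensor.bin",
}

def assign_role(device: str, role: str) -> str:
    """Return the install URL this device should boot on next reload."""
    image = ROLE_IMAGES[role]
    print(f"{device}: reprovisioning as {role} from {image}")
    return image

assign_role("shelf-spare-01", "sensor")   # tap duty for a few days
assign_role("shelf-spare-01", "switch")   # then back to forwarding duty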

Now, extend those ideas beyond sensors. Think about generic hardware that could be repurposed for any function. A switch could boot up as an inline firewall. That firewall could be repurposed into a load balancer for the end of the quarter. It could then become a passive IDS during an attack. All without moving. The only limitation is the imagination of the people writing code for the device. It may never top the performance of a device built purely for a given function, but the flexibility of having a device that can serve multiple functions without massive reconfiguration would win out in the long run for many applications. Flexibility matters more than overwhelming performance.


Tom’s Take

Whitebox is still finding a purpose in the enterprise. It’s been embraced by webscale players, but the value to the enterprise is not found in massive capabilities like that. Instead, the purchasing power of getting additional units for the same dollar amount leads to reduced support contract costs and even new functionality from existing hardware lying around that can be made to do so many other things. Who could have imagined that a simple switch could be made to do the job of many other purpose-built devices in the data center? Isn’t it ironic, don’t you think?


The Vision Of A ThousandEyes


Scott Adams once wrote a blog post about career advice and whether it was better to be excellent at one thing or good at several things. Basically, being the best at something is fairly hard. There’s always going to be someone smarter or faster doing it just a bit better. Many times it’s just as good to be very good at what you do. The magic comes when you take two or three things you are very good at and combine them in a way that no one has seen before to make something amazing. The kind of thing that makes people gaze in wonder, then immediately start figuring out how to use your thing to be great.

During Networking Field Day 6, ThousandEyes showed the delegates something very similar to what Scott Adams was talking about. ThousandEyes uses tools like traceroute, ping, and BGP data aggregation to collect data. These tools aren’t overly special in and of themselves. Ping and traceroute are built into almost every networking stack. BGP looking glass servers and data analysis have been publicly available for a while and can be leveraged in a tool like BGPMon. All very good tools. What ThousandEyes did was combine them in a way that makes them better.
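
You can approximate the raw ingredients yourself with nothing but the stock tools. Here’s a minimal sketch (assuming a Unix-like host with ping and traceroute on the PATH) that keeps the outputs together the way a combined probe would:

# Run the ordinary tools and keep their output side by side so one
# probe tells a fuller story than either tool alone.
import subprocess

def probe(target: str) -> dict:
    ping = subprocess.run(
        ["ping", "-c", "4", target], capture_output=True, text=True
    )
    trace = subprocess.run(
        ["traceroute", target], capture_output=True, text=True
    )
    return {"target": target, "ping": ping.stdout, "path": trace.stdout}

result = probe("example.com")
print(result["path"])

The value ThousandEyes adds is everything this sketch lacks: running it continuously from many vantage points, correlating it with BGP data, and keeping the history.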

ThousandEyes can show data all along the path of a packet. I can see response times and hop-by-hop trajectory. I can see my data leave one autonomous system (AS) and land in another. Want to know which upstream providers your ISP is using? ThousandEyes can tell you that. All that data can be collected in a cloud dashboard. You can keep tabs on it to know if your service level agreements (SLAs) are being met. Or, you could think outside the box and do something that I found very impressive.

Let’s say you are a popular website that angered someone. Maybe you published an unflattering article. Maybe you cut off a user doing something they shouldn’t have. Maybe someone out there just has a grudge. With the nuclear options available to most “hackers” today, the distributed denial of service (DDoS) attack seems to be a popular choice. So popular that DDoS mitigation services have sprung up to shoulder the load. The basic idea is that when you determine you’re being slammed with gigabits of traffic, you swing the DNS for your website to a service that scrubs away attack traffic and steers legitimate traffic to your site. In theory it should prevent the attackers from taking you offline. But how can you prove it’s working?

ThousandEyes can do just that. In their demo video, they showed what happened when Bank of America (BoA) was knocked offline by a huge DDoS attack. The information showed that two of the three DDoS mitigation services were engaged. The third changeover didn’t happen. All that traffic was still being dumped on BoA’s servers. Those BoA boxes couldn’t keep up with what they were seeing, so even the legitimate traffic being forwarded on by the mitigation scrubbers got lost in the noise. Now, if ThousandEyes can tell you which mitigation provider failed to engage, that’s a powerful tool to have on your side when you go back and tell them to get their act together. And that’s just one example.
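
As a crude self-check for that DNS swing, you can resolve your own record and see where it lands. The hostname and provider prefix below are hypothetical:

# Does our public record now point into the scrubbing provider's
# address space, or are we still taking the attack on the origin?
import ipaddress, socket

SCRUBBER_NET = ipaddress.ip_network("203.0.113.0/24")  # provider's range

addr = ipaddress.ip_address(socket.gethostbyname("www.example.com"))
print("mitigation engaged" if addr in SCRUBBER_NET else "still on origin")

Of course, that only tells you what one resolver sees from one vantage point, which is exactly the gap a distributed platform fills.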

I hate calling ISPs to fix circuits because it never seems to be their fault. No matter what I do or who I talk to, it never seems to be anything inside the provider network. Instead, it’s up to me to fiddle with knobs and buttons to find the right combination of settings to make my problem go away, especially if it’s packet loss. Now, imagine if you had something like ThousandEyes on your side. Not only can you see the path your packets take through your ISP, you can check latency and spot routing loops and suboptimal paths. And you can take a screenshot of it to forward to the escalation tech during those uncomfortable phone arguments about where the problem lies. No fuss, no muss. Just the information you need to make your case and get the problem fixed.

If you’d like to learn more about ThousandEyes and their monitoring solutions, check out their website at http://www.thousandeyes.com. You can also follow them on Twitter as @ThousandEyes.


Tom’s Take

Vision is a funny thing. Some have it. Some don’t. Having vision can mean many things. It can be someone who assembles tools in a novel way to solve a problem. It can be the ability to collect data and “see” what’s going on in a network path. It can also mean being able to take that approach and use it in a non-obvious way to provide a critical service to application providers that they’ve never had before. Or, as we later found out at Networking Field Day 6 during a presentation with Solarwinds, it can mean having the sense to realize when someone is doing something right, as Joel Dolisy said when asked about ThousandEyes, “Oh, we’ve got our eye on them.” That’s a lot of vision. A ThousandEyes worth.

Special thanks to Ivan Pepelnjak (@IOSHints) for giving me some ideas on this review.

Networking Field Day Disclaimer

While I was not an official delegate at Networking Field Day 6, I did participate in the presentations and discussions. ThousandEyes was a sponsor of Networking Field Day 6. In addition to hosting a presentation in their offices, they provided snacks and drinks for the delegates. They also provided a gift bag with a vacuum water bottle, luggage tag, T-shirt, and stickers (which I somehow managed to misplace). At no time did they ask for any consideration in the writing of this review, nor were they offered any. Independence means no restrictions. The analysis and conclusions contained in this post are mine and mine alone.

Statseeker – Information Is Ammunition

The first presenter at Network Field Day 4 came to us from another time and place.  Stewart Reed came all the way from Brisbane, Australia to talk to us about network monitoring software from Statseeker.  I’ve seen Statseeker before at Cisco Live, and you likely have too if you’ve been.  They’re the group that always gives away a Statseeker-themed Mini on the show floor.  They’ve also recently done a podcast with the Packet Pushers.

We got into the room with Stewart and he gave us a great overview of who Statseeker is and what they do:

He’s a great presenter and really hits on the points that differentiate Statseeker.  I was amazed by the fact that they can keep historical data for a very long period of time.  I managed to crash a network monitoring system years ago by trying to monitor too many switch ports.  Keeping up with all that information was like drinking from a firehose.  Trying to keep that data for long periods of time was a fantasy.  Statseeker, on the other hand, has managed to find a way to not only keep up with all that information but keep it around for later use.  Stewart said one of my new favorite quotes during the presentation: “Whoever has the best notes wins.”  Not only do they have notes that go back a long time, but their notes don’t suffer from averaging abstraction.  When most systems say they keep data for long periods of time, what they really mean is that they keep the 15- or 30-minute averages for a while.  I’ve even seen some go to daily or weekly data points to reduce the amount of stored data.  Statseeker takes one-minute data polls and keeps those one-minute polls for the life of the data.  I can drill into the interface stats at 8:37 on June 10th, 2008 if I want.  Do you think anyone really wants to argue with someone who keeps notes like that?
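
Some quick math shows why that retention policy is impressive. The per-sample size is my own assumption for illustration, not Statseeker’s actual storage format:

# Back-of-the-envelope on what full-resolution 1-minute retention costs.
PORTS = 10_000
SAMPLES_PER_DAY = 24 * 60            # one poll per minute
BYTES_PER_SAMPLE = 16                # assumed: counters plus timestamp
YEARS = 5

total = PORTS * SAMPLES_PER_DAY * BYTES_PER_SAMPLE * 365 * YEARS
print(f"{total / 1e9:.0f} GB for {YEARS} years")   # ~420 GB

That is entirely manageable today, but keeping it queryable at speed, across every port, is the hard engineering part.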

Of course, what would Network Field Day be without questions:

One of the big things that comes right out in this discussion is that Statseeker doesn’t allow for custom SNMP monitoring.  Restricting the number of OIDs that can be monitored to a smaller subset is what allows for the large-scale port monitoring and long-term data storage that Statseeker provides.  I mean, when you get right down to it, how many times have you had to write your own custom SNMP query for an odd OID?  The majority of Statseeker’s customers are likely to have something like 90% overlap in what they want to look at.  Restricting the ability to get crazy with monitoring makes this product simple to install and easy to manage.  At the risk of overusing a cliche, this is more in line with the Apple model of restriction with a focus on ease of use.  Of course, if Statseeker wants to start referring to themselves as the Apple of Network Monitoring, by all means go right ahead.

The other piece from this second video that I liked was the mention that the minimum Statseeker license is 1000 units.  Stewart admits that below that point, the argument for Statseeker begins to break down somewhat.  This kind of admission is refreshing in the networking world.  You can’t be everything to everyone.  By focusing on long-term data storage and quick polling intervals, you obviously have to scale your system to hit a specific port count target.  If you really want to push that same product down into an environment that only monitors around 200 ports, you are going to have to make some concessions.  You also have to compete with smaller, cheaper tools like MRTG and Cacti.  I love that they know where they compete best and don’t worry about trying to sell to everyone.

Of course, a live demo never hurts:

If you’d like to learn more about Statseeker, you can head over to their website at http://www.statseeker.com/.  You can also follow them on Twitter as @statseeker.  Be sure to tell them to change their avatar and tweet more.  You can also hear about Statseeker’s presentation in Packet Pushers Priority Queue Show 14.


Tom’s Take

Statseeker has some amazing data gathering capabilities.  I personally have never needed to go back three years to win an argument about network performance, but knowing that I can is always nice.  Add in the fact that I can monitor every port on the network and you can see the appeal.  I don’t know if Statseeker really fits into the size of environment that I typically work in, but it’s nice to know that it’s there in case I need it.  I expect to see some great things from them in the future and I might even put my name in the hat for the car at Cisco Live next year.

Tech Field Day Disclaimer

Statseeker was a sponsor of Network Field Day 4.  As such, they were responsible for covering a portion of my travel and lodging expenses while attending Network Field Day 4.  They did not ask for, nor were they promised, any kind of consideration in the writing of this review.  The opinions and analysis provided within are my own and any errors or omissions are mine and mine alone.

Additional Network Field Day 4 Coverage:

Statseeker – The Lone Sysadmin

Statseeker – Keeping An Eye On The Little Things – Lamejournal