About Tom Hollingsworth

Tom Hollingsworth, CCIE #29213, is a network engineer who works with Cisco, HP, Microsoft, VMware, and various other technologies. Tom has been in the IT industry since 2002, and has been a nerd since he first drew breath.

Security’s Secret Shame


Heartbleed captured quite a bit of news these past few days.  A hole in the most secure of web services tends to make people a bit anxious.  Racing to release patches and assess the damage consumed people for days.  While I was a bit concerned about the possibilities of exposed private keys on any number of web servers, the overriding thought in my mind was instead about the speed at which we went from “totally secure” to “Swiss cheese security” almost overnight.

Jumping The Gun

As it turns out, the release of the information about Heartbleed was a bit sudden.  The rumor is that the people that discovered the bug were racing to release the information as soon as the OpenSSL patch was ready because they were afraid that the information had already leaked out to the wider exploiting community.  I was a bit surprised when I learned this little bit of information.

Exploit disclosure has gone through many phases in recent years.  In days past, the procedure was to report the exploit to the vendor responsible.  The vendor in question would create a patch for the issue and prepare their support organization to answer questions about the patch.  The vendor would then release the patch with a report on the impact.  Users would read the report and patch the systems.

Today, researchers that aren’t willing to wait for vendors to patch systems perform an end run around the process.  Rather than waiting for the vendors to release patches on their own cycle, they release the information about the exploit as soon as they can.  The nice ones give the vendors a chance to fix things first.  The less savory folks want the press that comes from discovering a new hole and broadcast the news far and wide at every opportunity.

Shame Is The Game

Part of the reason to release exploit information quickly is to shame the vendor into quickly releasing patch information.  Researchers taking this route are fed up with the slow quality assurance (QA) cycle inside vendor shops.  Instead, they short circuit the system by putting the issue out in the open and driving the vendor to release a fix immediately.

While this approach does have its place among vendors that move at a glacial patching pace, one must wonder how much good is really being accomplished.  Patching systems isn’t a quick fix.  If you’ve ever been forced to turn out work under a crucial deadline while people were shouting at you, you know the kind of work that gets put out.  Vendor patches must be tested against released code, and there can be no issues that would cause existing functionality to break.  Imagine the backlash if a fix to the SSL module caused the web service to fail on startup.  The fix would be worse than the problem.

Rather than rushing to release news of an exploit, researchers need to understand the greater issues at play.  Releasing an exploit for the sake of saving lives is understandable.  Releasing it for the fame of being first isn’t as critical.  Instead of trying to shame vendors into releasing a patch rapidly to plug some hole, why not work with them instead to identify the issue and push the patch through?  Shaming vendors will only put pressure on them to release questionable code.  It will also alienate the vendors from researchers doing things the right way.


Tom’s Take

Shaming is the rage now.  We shame vendors, users, and even pets.  Problems have taken a back seat to assigning blame.  We try to force people to change or fix things by making a mockery of what they’ve messed up.  It’s time to stop.  Rather than pointing and laughing at those making the mistake, you should pick up a keyboard and help them fix it. Shaming doesn’t do anything other than upset people.  Let’s put it to bed and make things better by working together instead of screaming “Ha ha!” when things go wrong.

End Of The CLI? Or Just The Start?


Windows 8.1 Update 1 launches today. The latest chapter in Microsoft’s newest OS includes a new feature people have been asking for since release: the Start Menu. The biggest single UI change in Windows 8 was the removal of the familiar Start button in favor of a combined dashboard / Start screen. While the new screen is much better for touch devices, the desktop population has been screaming for the return of the Start Menu. Windows 8.1 brought the button back, although it only linked to the Start screen. Update 1 promises to add functionality to the button once more. As I thought about it, I realized there are parallels here that we in the networking world can learn as well.

Some very smart people out there, like Colin McNamara (@ColinMcNamara) and Matt Oswalt (@Mierdin), have been talking about the end of the command line interface (CLI). With the advent of programmable networks and API-driven configuration, the CLI is archaic and unnecessary, or so the argument goes. Yet, there is a very strong contingent of the networking world that is clinging to the comfortable glow of a terminal screen and 80-column text entry.

Command The Line

API-driven interfaces provide flexibility that we can’t hope to match in a human interface. There is no doubt that a large portion of the configuration of future devices will be done via API call or some sort of centralized interface that programs the end device automatically. Yet, as I’ve said before, engineers don’t like losing visibility into a system. Getting rid of the CLI for the sake of streamlining a device is a bad idea.

I’ve worked with many devices that don’t have a CLI. Cisco Catalyst Express switches leap immediately to mind. Other devices, like the Cisco UC500 SMB phone system, have a CLI but use of it is discouraged. In fact, when you configure the UC500 using the CLI, you start getting warnings about not being able to use the GUI tool any longer. Yet there are functions that are only visible through the CLI.

Non-Starter

Will the programmable networking world make the same mistake Microsoft did with Windows 8? Even a token CLI is better than cutting it out entirely. Programmable networking will allow all kinds of neat tricks. For instance, we can present a Cisco-like CLI for one group of users and a Juniper-like CLI for a different group that both accomplish the same results. We don’t need to have these CLIs sitting around resident in memory. We should be able to generate them on the fly or call the appropriate interfaces from a centralized library. Extensibility, even in the archaic interface of last resort.

If all our talk revolves around removing the tool people have been using for decades to program devices, we will make enemies quickly. The talk needs to shift away from the death of the CLI and toward the advantages gained through adding API interfaces to your programming. Even if our interface into calling those APIs looks similar to a comfortable CLI, we’re going to win more converts up front if we give them something they recognize as a transition mechanism.
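
As a thought experiment, that transition mechanism could be nothing more than a thin shim that translates the commands an engineer already knows into API calls. Here is a minimal sketch in Python; the command strings, the controller URL, and the REST resources are all hypothetical, not any vendor’s actual interface.

    import json
    import urllib.request

    CONTROLLER = "https://controller.example.com/api/v1"  # hypothetical endpoint

    # Map familiar commands to API resources instead of keeping a full parser
    # resident in memory on the device.
    COMMAND_MAP = {
        "show interfaces": "/interfaces",
        "show ip route": "/routing/table",
        "show version": "/system/version",
    }

    def run_cli():
        while True:
            line = input("switch> ").strip()
            if line in ("exit", "quit"):
                break
            resource = COMMAND_MAP.get(line)
            if resource is None:
                print(f"% Unrecognized command: {line}")
                continue
            # The familiar command becomes a simple GET against the controller.
            with urllib.request.urlopen(CONTROLLER + resource) as resp:
                print(json.dumps(json.load(resp), indent=2))

    if __name__ == "__main__":
        run_cli()

Swap the dictionary and the same shim presents a Juniper-flavored personality instead of a Cisco-flavored one, which is exactly the kind of flexibility an API-first device makes cheap.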


Tom’s Take

Microsoft bit off more than they could chew when they exiled the Start Menu to the same pile as DOSShell and Microsoft Bob. People have spent almost 20 years humming the Rolling Stones’ "Start Me Up" as they click on that menu. Microsoft itself drove users to rely on it. To pull it out from under them all at once with no transition plan made for unhappy users. Networking advocates need to be just as cognizant of the fact that we’re headed down the same path. We need to provide transition options for the die-hard engineers out there so they can learn how to program devices via non-traditional interfaces. If we try to make them quit cold turkey, you can be sure the Start Menu discussion will pale in comparison.

The Sunset of Windows XP


The end is nigh for Microsoft’s most successful operating system of all time. Windows XP is finally set to reach the end of support next Tuesday. After twelve and a half years, and having its death sentence commuted at least twice, it’s time for the sunset of the “experienced” OS. Article after article has been posted as of late discussing the impact of the end of support for XP. The sky is falling for the faithful. But is it really?

Yes, as of April 8, 2014, Windows XP will no longer be supported from a patching perspective. You won’t be able to call in and get help on a cryptic error message. But will your computer spontaneously combust? Is your system going to refuse to boot entirely and force you at gunpoint to go out and buy a new Windows 8.1 system?

No. That’s silly. XP is going to continue to run just as it has for the last decade. XP will be no more or less secure on April 9th than it was on April 7th, and it will still function. Rather than writing about how evil Microsoft is for abandoning an operating system after one of the longest support cycles in their history, let’s instead look at why XP is still so popular and how we can fix that.

XP is still a popular OS with manufacturing systems and things like automated teller machines (ATMs). That might be because of the ease with which XP could be installed onto commodity hardware. It could also be due to the difficulty in writing drivers for Linux for a large portion of XP’s life. For better or worse, IT professionals have inherited a huge amount of embedded systems running an OS that got the last major service pack almost six years ago.

For a moment, I’m going to take the ATMs out of the equation. I’ll come back to them in a minute. For the other embedded systems that don’t dole out cash, why is support so necessary? If it’s a manufacturing system that’s been running for the last three or four years what is another year of support from Microsoft going to get you? Odds are good that any support that manufacturing system needs is going to be entirely unrelated to the operating system. If we treat these systems just like an embedded OS that can’t be changed or modified, we find that we can still develop patches for the applications running on top of the OS. And since XP is one of the most well documented systems out there, finding folks to write those patches shouldn’t be difficult.

In fact, I’m surprised there hasn’t been more talk of third party vendors writing patches for XP. I saw more than a few start popping up once Windows 2000 started entering the end of its life. It’s all a matter of the money. Banks have already started negotiating with Microsoft to get an extension of support for their ATM networks. It’s funny how a few million dollars will do that. SMBs are likely to be left out in the cold for specialized systems due to the prohibitive cost of an extended maintenance contract, either from Microsoft or another third party. After all, the money to pay those developers needs to come from somewhere.


Tom’s Take

Microsoft is not the bad guy here. They supported XP as long as they could. Technology changes a lot in 12 years. The users aren’t to blame either. The fast upgrade cycle is a myth for most small businesses and enterprises. Every month that your PC can keep running the accounting software is another month of profits. So whose fault is the end of the world?

Instead of looking at it as the end, we need to start learning how to cope with unsupported software. Rather than tilting at the windmills in Redmond and begging for just another month or two of token support, we should be investigating ways to transition XP systems that can’t be upgraded within 6-8 months to an embedded systems plan. We’ve reached the point where we can’t count on anyone else to fix our XP problems but ourselves. Once we accept the known, immutable fact of no more XP support, we can start planning for the inevitable – life after retirement.

The Alignment of Net Neutrality

Net neutrality has been getting a lot of press as of late, especially as AT&T and Netflix have been sparring back and forth in the press.  The FCC has already said they are going to take a look at net neutrality to make sure everyone is on a level playing field.  ISPs have already made their position clear.  Where is all of this posturing going to leave the users?

Chaotic Neutral

Broadband service usage has skyrocketed in the past few years.  Ideas that would never have been possible even 5 years ago are now commonplace.  Netflix and Hulu have made it possible to watch television without cable.  Internet voice over IP (VoIP) allows a house to have a phone without a phone line.  Amazon has replaced weekly trips to the local department store for all but the most crucial staple items.  All of this made possible by high speed network connectivity.

But broadband doesn’t just happen.  ISPs must build out their networks to support the growing hunger for faster Internet connectivity.  Web surfing and email aren’t the only game in town.  Now, we have streaming video, online multiplayer, and persistently connected devices all over the home.  The Internet of Things is going to consume a huge amount of bandwidth in an average home as more smart devices are brought online.  ISPs are trying to meet the needs of their subscribers.  But are they going far enough?

ISPs want to build networks their customers will use, and more importantly pay to use.  They want to ensure that complaints are kept to a minimum while providing the services that customers demand.  Those ISP networks cost a hefty sum.  Given the choice between paying to upgrade a network and trying to squeeze another month or two out of existing equipment, you can guarantee the ISPs are going to take the cheaper route.  Coincidentally, that’s one of the reasons why the largest backers of 802.1aq Shortest Path Bridging were ISP-oriented.  Unlike TRILL, SPB doesn’t require new hardware to forward frames.  ISPs can use existing equipment to deliver SPB with no out-of-pocket expenditure on hardware.  That little bit of trivia should give you an idea why ISPs are trying to do away with net neutrality.

True Neutral

ISPs want to keep using their existing equipment as long as possible.  Every dollar they make from this cycle’s capital expenditure means a dollar of profit in their pocket before they have to replace a switch.  If there was a way to charge even more money for existing services, you can better believe they would do it.  Which is why this infographic hits home for most:

[Infographic: net neutrality and tiered service pricing]

Charging for service tiers would suit ISPs just fine.  After all, as the argument goes, you are using more than the average user.  Shouldn’t you shoulder the financial burden of increased network utilization?  That’s fine for corner cases like developers or large consumers of downstream bandwidth.  But with Netflix usage increasing across the board, why should the ISP charge you more on top of a Netflix subscription?  Shouldn’t their network anticipate the growing popularity of streaming video?

The other piece of the tiered offering above that should give us pause is the common carrier rules for service providers.  Common carriers are absolved of liability for the things they transport because they agree to transport everything offered to them.  What do you think would happen if those carriers suddenly decided they wanted to discriminate about what they send?  If that discrimination revokes their common carrier status, what’s to stop them from acting like a private carrier and refusing to transport certain applications or content?  Maybe forcing a video service to negotiate a separate peering agreement with every ISP whose customers it wants to reach?  Who would do that?

Neutral Good

Net Neutrality has to exist to ensure that we are free to use the services we want to consume.  Sure, this means that things like Quality of Service (QoS) can’t be applied to packets, since they all have to be treated equally.  The alternative is guaranteed delivery for an additional fee.  And every service you add on top would incur more fees.  New multiplayer game launching next week? The ISP will charge you an extra $5 per month to ensure you have a low ping time to beat the other guy.  If you don’t buy the package, your multiplayer traffic gets dumped in with Netflix and the rest of the bulk traffic.

This is part of the reason why Google Fiber is such a threat to existing ISPs.  When the only options for local loop delivery are the cable company and the phone company, there is little to stop tiering in the absence of neutrality.  With viable third-party fiber buildouts like Google’s starting to spring up, users gain a bargaining chip to push for faster speeds and backbone upgrades to support heavy usage.  If you don’t believe that, look at what AT&T did immediately after Google announced Google Fiber in Austin, TX.


Tom’s Take

ISPs shouldn’t be able to play favorites with their customers.  End users are paying for a connection.  End users are also paying services to use their offerings.  Why should I have to pay for a service twice just because the ISP wants to charge more in a tiered setup?  That smells of a protection racket in many ways.  I can imagine the ISP techs sitting there in a slick suit saying, “That’s a nice connection you got there.  It would be a shame if something were to happen to it.”  Instead, it’s up to the users to demand ISPs offer free and unrestricted access to all content.  In some cases, that will mean backing alternatives and “voting with your dollar” to make the message heard loud and clear.  I won’t sign up for services that have data usage caps or metered speed limits past a certain ceiling.  I would drop any ISP that wants me to pay extra just because I decide to start using a video streaming service or a smart thermostat.  It’s time for ISPs to understand that hardware should be an investment in future customer happiness and not a tool that’s used to squeeze another dime out of their user base.

Throw More Storage At It

All of this has happened before.  All of this will happen again.

The more time I spend listening to storage engineers talk about the pressing issues they face in designing systems in this day and age, the more I’m convinced that we fight the same problems over and over again with different technologies.  Whether it be networking, storage, or even wireless, the same architecture problems crop up in new ways and require different engineers to solve them all over again.

Quality is Problem One

A chance conversation with Howard Marks (@DeepStorageNet) at Storage Field Day 4 led me to start thinking about these problems.  He was talking during a presentation about the difficulty that storage vendors have faced in implementing quality of service (QoS) in storage arrays.  As Howard described some of the issues with isolating neighboring workloads and ensuring they can’t cause performance issues for a specific application, I started thinking about the implementation of low latency queuing (LLQ) for voice in networking.

LLQ was created to solve a specific problem.  High-volume, low-bandwidth flows like voice wreak havoc in traditional queuing systems: give them strict priority and they can starve everything else, leave them in the general queues and they get starved themselves.  In much the same way, applications that demand high amounts of input/output operations per second (IOPS) while storing very little data can cause huge headaches for storage admins.  Storage has tried to solve this problem with hardware in the past by creating things like write-back caching or even super fast flash storage caching tiers.
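
For readers who haven’t lived through a voice deployment, here is a toy sketch of the LLQ idea in Python (the class and variable names are mine, not Cisco’s): latency-sensitive traffic gets a strict-priority queue, but that queue is policed so the default queues are never starved.

    from collections import deque

    class ToyLLQ:
        """Toy low latency queuing: strict priority for latency-sensitive
        packets, policed to a cap, with everything else in a default queue."""

        def __init__(self, priority_packets_per_tick):
            self.priority_q = deque()   # voice and other latency-sensitive traffic
            self.default_q = deque()    # bulk traffic
            self.priority_cap = priority_packets_per_tick  # the policer
            self.sent_this_tick = 0

        def enqueue(self, packet, priority=False):
            (self.priority_q if priority else self.default_q).append(packet)

        def start_tick(self):
            self.sent_this_tick = 0

        def dequeue(self):
            # Serve the priority queue first, but only up to the policed rate...
            if self.priority_q and self.sent_this_tick < self.priority_cap:
                self.sent_this_tick += 1
                return self.priority_q.popleft()
            # ...then fall back to the default queue so it is never starved.
            if self.default_q:
                return self.default_q.popleft()
            return None

The policer is the important part: without it, this is just priority queuing, and the bulk traffic waits forever whenever the priority queue stays full.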

Make It Go Away

In fact, a lot of the problems in storage mirror those from the networking world many years ago.  Performance issues used to have a very simple solution – throw more hardware at the problem until it goes away.  In networking, you throw bandwidth at the issue.  In storage, more IOPS are the answer.  As long as hardware hasn’t been pushed to its absolute limit, the answer will always be to keep driving it higher and higher.  But what happens when performance can’t fix the problem any more?

Think of a sports car like the Bugatti Veyron.  It is the fastest production car available today, with a top speed well over 250 miles per hour.  In fact, Bugatti refuses to talk about the absolute top speed, instead countering “Would you ask how fast a jet can fly?”  One of the limiting factors in attaining a true measure of the speed isn’t found in the car’s engine.  Instead, the subsystems around the engine begin to fail at such stressful speeds.  At 258 miles per hour, the tires on the car will completely disintegrate in 15 minutes.  The fuel tank will be emptied in 12 minutes.  Bugatti wisely installed a governor on the engine limiting it to a maximum of 253 miles per hour in an attempt to prevent people from pushing the car to its limits.  A software control designed to prevent performance issues by creating artificial limits.  Sound familiar?

Storage has hit the wall when it comes to performance.  PCIe flash storage devices are capable of hundreds of thousands of IOPS.  A single PCIe card can deliver throughput that once required an entire data center’s worth of disks.  Hungry applications can quickly saturate the weak link in a system.  In days gone by, that was the IOPS capacity of a storage device.  Today, it’s the link that connects the flash device to the rest of the system.  Not until flash moved to PCIe was the interface fast enough to let the device keep pace with workloads starved for performance.

Storage isn’t the only technology with hungry workloads stressing weak connection points.  Networking is quickly reaching this point with the shift to cloud computing.  Now, instead of the east-west traffic between data center racks being a point of contention, the connection point between the user and the public cloud will now be stressed to the breaking point.  WAN connection speeds have the same issues that flash storage devices do with non-PCIe interfaces.  They are quickly saturated by the amount of outbound traffic generated by hungry applications.  In this case, those applications are located in some far away data center instead of being next door on a storage array.  The net result is the same – resource contention.
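
Some rough, round numbers make the weak-link point concrete; the 2 GB/s figure for a single flash card is an assumption for illustration, while the link ceilings are the usual effective rates.

    # Back-of-the-envelope: how badly one fast flash card oversubscribes the
    # links around it.  FLASH_CARD_MBPS is an assumed figure for illustration.
    FLASH_CARD_MBPS = 2000  # ~2 GB/s sustained from one PCIe flash card

    LINK_CEILINGS_MBPS = {
        "SATA III (6 Gb/s, ~600 MB/s effective)": 600,
        "10 GbE uplink (~1,250 MB/s)": 1250,
        "1 Gb/s WAN circuit (~125 MB/s)": 125,
    }

    for link, ceiling in LINK_CEILINGS_MBPS.items():
        print(f"{link}: one card is {FLASH_CARD_MBPS / ceiling:.1f}x the link")

Whichever of those links sits between the hungry application and the fast device becomes the new bottleneck, which is exactly the resource contention described above.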


Tom’s Take

Performance is king.  The faster, stronger, better widgets win in the marketplace.  When architecting systems it is often much easier to specify a bigger, faster device to alleviate performance problems.  Sadly, that only works to a point.  Eventually, the performance problems will shift to components that can’t be upgraded.  IOPS give way to transfer speeds on SATA connectors.  Data center traffic will be slowed by WAN uplinks.  Even the CPU in a virtual server will eventually become an issue after throwing more RAM at a problem.  Rather than just throwing performance at problems until they disappear, we need to take a long hard look at how we build things and make good decisions up front to prevent problems before they happen.

Like the Veyron engineers, we have to make smart choices about limiting the performance of a piece of hardware to ensure that other bottlenecks do not happen.  It’s not fun to explain why a given workload doesn’t get priority treatment.  It’s not glamorous to tell someone their archival data doesn’t need to live in a flash storage tier.  But it’s the decision that must be made to ensure the system is operating at peak performance.

Replacing Nielsen With Big Data

I’m a fan of television. I like watching interesting programs. It’s been a few years since one has caught my attention long enough to keep me watching for multiple seasons. Part of the reason is my fear that an awesome program is going to fail to reach a “target market” and will end up canceled just when it’s getting good. It’s happened to several programs that I liked in the past.

Sampling the Goods

Part of the issue with tracking the popularity of television programs comes from the archaic method in which programs are measured. Almost everyone has heard of the Nielsen ratings system. This sampling method was created in the 1930s as a way to measure radio advertising reach. In the 50s, it was adapted for use in television.

Nielsen selects target audiences that represent the greater whole. They ask users to keep written diaries of their television watching habits. They also have the ability to install a device called a set meter which allows viewers to punch in a code to identify themselves via age groups and lock in a view for a program. The set meter can tell the instant a channel is changed or the TV is powered off.

In theory, the sampling methodology is sound. In practice, it’s a bit shaky. Viewing diaries are unreliable because people tend to overreport their viewing habits. If they feel guilty that they haven’t been writing anything down, they tend to look up something that was on TV and write it down. Diaries also can’t determine if a viewer watched the entire program or changed the channel in the middle. Set meters aren’t much better. The reliance on PIN codes to identify users can lead to misreported results. Adults in a hurry will sometimes punch in an easier code assigned to their children, leading to skewed age results.

Both the diary and the set meter fail to take into account the shift in viewing habits in modern households. Most of the TV viewing in my house takes place through time-shifted DVR recordings. My kids tend to monopolize the TV during the day, but even they are moving to using services like Netflix to watch all episodes of their favorite cartoons in one sitting. Neither of these viewing habits is easily tracked by Nielsen.

How can we find a happy medium? Sample sizes have been reduced significantly due to cord-cutting households moving to Internet distribution models. People tend to exaggerate or manipulate self-reported viewing results. Even “modern” Nielsen technology can’t keep up. What’s the answer?

Big Data

I know what you’re saying: “We’ve already got that with Nielsen, right?” Not quite. TV viewing habits have shifted in the past few years. So has TV technology. Thanks to the shift from analog broadcast signals to digital and the explosion of set top boxes for cable decryption and movie service usage, we now have a huge portal into the living room of every TV watcher in the world.

Think about it for a moment. The idea of a sample size works provided it’s a good representative sample. But tracking this data is problematic. If we have access to a way to crunch the actual data instead of extrapolating from incomplete sets, shouldn’t we use that instead? I’d rather believe the real numbers instead of trying to guess from unreliable sources.

This also fixes the issue of time-shifted viewing. Those same set top boxes are often responsible for recording the programs. They could provide information such as the number of shows recorded versus viewed and whether or not viewers skip through commercials. For those who view on mobile devices, that data could be compiled as well through integration with the set top box. User logins are already required for mobile apps today. It’s just a small step to integrating the whole package.

It would require a bit of technical upgrading on the client side. We would have to enable the set top boxes to report data back to a service. We could anonymize the data enough to be sure that people aren’t being unnecessarily exposed. It will also have to be configured as an opt-out setting to ensure that the majority is represented. Opt-in won’t work because those checkboxes never get checked.
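
As a sketch of what that client-side reporting might look like, here is a hypothetical set top box event report in Python; the collection URL is made up, the hashed device ID stands in for real anonymization, and the opt-out check comes first.

    import hashlib
    import json
    import time
    import urllib.request

    COLLECTOR_URL = "https://viewing-stats.example.com/events"  # hypothetical service

    def report_viewing_event(device_id, channel, program, opted_out, timeshifted=False):
        if opted_out:
            return  # honor the opt-out setting before anything leaves the box
        event = {
            # Hash the device ID so the service never sees the raw identifier.
            "viewer": hashlib.sha256(device_id.encode()).hexdigest(),
            "channel": channel,
            "program": program,
            "timeshifted": timeshifted,  # recorded on the DVR vs. watched live
            "timestamp": int(time.time()),
        }
        request = urllib.request.Request(
            COLLECTOR_URL,
            data=json.dumps(event).encode(),
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(request)

The same payload could flag skipped commercials or mobile viewing once the app and the box share a login.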

Advertisers are going to demand the most specific information about people that they can. The ratings service exists to work for the advertisers. If this plan is going to work, a new company will have to be created to collect and analyze this data. This way the analysis company can ensure that the data is specific enough to be of use to the advertisers while at the same time ensuring the protection of the viewers.


Tom’s Take

Every year, promising new TV shows are yanked off the airwaves because advertisers don’t see any revenue. Shows that have a great premise can’t get up to steam because of ratings. We need to fix this system. In the old days, the deluge of data would have drowned Nielsen. Today, we have the technology to collect, analyze, and store that data for eternity. We can finally get the real statistics on how many people watched Jericho or After MASH. Armed with real numbers, we can make intelligent decisions about what to keep on TV and what to jettison. And that’s a big data project I’d be willing to watch.

Google+ And The Quest For Omniscience


When you mention Google+ to people, you tend to get a very pointed reaction. Outside of a select few influencers, I have yet to hear anyone say good things about it. This opinion isn’t helped by the recent moves by Google to make Google+ the backend authentication mechanism for their services. What’s Google’s aim here?

Google+ draws immediate comparisons to Facebook. Most people will tell you that Google+ is a poor implementation of the world’s most popular social media site. I would tend to agree for a few reasons. I find it hard to filter things in Google+. The lack of a real API means that I can’t interact with it via my preferred clients. I don’t want to log into a separate web interface simply to ingest endless streams of animated GIFs with the occasional nugget of information that was likely posted somewhere else in the first place.

It’s the Apps

One thing the Google of old was very good at doing was creating applications that people needed. GMail and Google Apps are things I use daily. Youtube gets visits all the time. I still use Google Maps to look things up when I’m away from my phone. Each of these apps represents a separate development train and a unique way of looking at things. They were more integrated than some of the attempts I’ve seen to tie applications together at other vendors. They were missing one thing as far as Google was concerned: you.

Google+ isn’t a social network. It’s a database. It’s an identity store that Google uses to nail down exactly who you are. Every +1 tells them something about you. However, that’s not good enough. Google can only prosper if they can refine their algorithms.  Each discrete piece of information they gather needs to be augmented by more information.  In order to do that, they need to increase their database.  That means they need to drive adoption of their social network.  But they can’t force people to use Google+, right?

That’s where the plan to integrate Google+ as the backend authentication system makes nefarious sense. They’ve already gotten you hooked on their apps. You already comment on Youtube or use Maps to figure out where the nearest Starbucks is. Google wants to know that. They want to figure out how to structure AdWords to show you more ads for local coffee shops or categorize your comments on music videos to sell you Google Play music subscriptions. Above all else, they can package that information as a product for advertisers.

Build It Before They Come

It’s devilishly simple. It’s also going to be more effective than Facebook’s approach. Ask yourself this: when’s the last time you used Facebook Mail? Facebook started out with the lofty goal of gathering all the information that it could about people. Then they realized the same thing that Google did: You have to collect information on what people are using to get the whole picture. Facebook couldn’t introduce a new system, so they had to start making apps.

Except people generally look at those apps and push them to the side. Mail is a perfect example. Even when Facebook tried to force people to use it as their primary communication method, their users rebelled against the idea. Now, Facebook is being railroaded into using their data store as a backend authentication mechanism for third party sites. I know you’ve seen the "Log In With Facebook" buttons already. I’ve even written about it recently. You probably figured out this is going to be less successful for a singular reason: control.

Unlike Google+ and its integration with all Google apps, third parties that utilize Facebook logins can choose to restrict the information that is shared with Facebook. Given the climate of privacy in the world today, it stands to reason that people are going to start being very selective about the information that is shared with these kinds of data sinks. Thanks to the Facebook login API, a significant portion of the collected information never has to be shared back to Facebook. On the other hand, Google+ sits behind Google’s own apps as a simple backend authorization, with no third party in a position to restrict what gets collected. Given that they’ve turned on Google+ identities for Youtube commenting without a second thought, it does make you wonder what other data they’re collecting without really thinking about it.


Tom’s Take

I don’t use Google+. I post things there via API hacks. I do it because Google as a search engine is too valuable to ignore. However, I don’t actively choose to use Google+ or any of the apps that are now integrated into it. I won’t comment on Youtube. I doubt I’ll use the Google Maps functions that are integrated into Google+. I don’t like having a half-baked social media network forced on me. I like it even less when it’s a ham-handed attempt to gather even more data on me to sell to someone willing to pay to market to me. Rather than trying to be the anti-Facebook, Google should stand up for the rights of their product…uh, I mean customers.

The Visibility Event Horizon


I’ve always been a science nerd.  Especially when it comes to astronomy.  I’ve always been fascinated by stellar objects.  The most fascinating has to be the black hole.  A region of space with intense gravity formed from the collapse of a stellar body.  I’ve read all about the peculiar properties of classical black holes.  Of particular interest to the networking field is the idea of the event horizon.

An event horizon is a boundary beyond which events no longer affect observers.  In layman’s terms, it is the point of no return for things falling into a black hole.  Anything that falls below the event horizon disappears from the perspective of the observer.  From the point of view of someone falling into the black hole, they never reach the event horizon yet are unable to contact the outside world.  The event horizon marks the point at which information disappears in a system.

How does this apply to the networking world?  Well, every system has a visibility boundary.  We tend to summarize information heading in both directions.  To a network engineer, everything above the transport layer of the OSI model doesn’t really matter.  Those are application decisions made by programmers, and they don’t affect the system other than being a drain on resources.  To the programmers and admins, anything below the session layer of the OSI model is of little importance.  As long as a call can be made to utilize network resources, who cares what’s going on down there?

Looking Down

Software Defined Networking (SDN) vendors are enforcing these event horizons.  VMware NSX and the Microsoft Hyper-V virtual networking solution both function in a similar manner.  They both create overlay networks that virtualize resources below the level of the host.  Tunnels are created between devices or systems that ride on top of the physical network beneath.  This means that the overlay can function no matter the state of the underlay, provided a path between the two hosts exists.  However, it also means that the overlay obscures the details of the physical network.

There are many that would call this abstraction.  Why should the hosts care about the network state?  All they really want is a path to reach a destination.  It’s better to give them what they want and leave the gory details up to routing protocols and layer 2 loop avoidance mechanisms.  But, that abstraction becomes an event horizon when the overlay is unwilling (or unable) to process information from the underlay network.

Applications and hosts should be aware enough to listen to network conditions.  Overlays should not rely on OSPF or BGP to make tunnel endpoint rerouting decisions.  Putting undue strain on network processing is part of what has led to the situation we have now, where network operating systems need to be complex and intensive to calculate solutions to problems that could be better solved at a higher level.

If the network reports a traffic condition, like a failed link or a congested WAN circuit, that information should be able to flow back up to the overlay and act as a data point to trigger an alternate solution or path.  Breaking the event horizon for information flowing back up toward the overlay is crucial to allow the complex network constructs we’ve created, such as fabrics, to utilize the best possible solutions for application traffic.
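
Here is a rough sketch of the feedback loop being argued for: the underlay pushes a link event up, and the overlay controller uses it to steer tunnels onto a healthy path.  The event format, path names, and functions are all illustrative, not any particular vendor’s API.

    # Illustrative only: an overlay controller reacting to underlay telemetry.
    TUNNELS = {
        # tunnel id -> ordered list of candidate underlay paths
        "web-to-db": ["leaf1-spine1-leaf4", "leaf1-spine2-leaf4"],
    }
    ACTIVE_PATH = {"web-to-db": "leaf1-spine1-leaf4"}
    UNHEALTHY_LINKS = set()

    def on_underlay_event(event):
        """Handle a link-state notification pushed up from the underlay."""
        if event["type"] in ("link-down", "congested"):
            UNHEALTHY_LINKS.add(event["link"])
        elif event["type"] == "link-up":
            UNHEALTHY_LINKS.discard(event["link"])

        # Re-evaluate every tunnel whose active path crosses an unhealthy link.
        for tunnel, candidates in TUNNELS.items():
            if any(link in ACTIVE_PATH[tunnel] for link in UNHEALTHY_LINKS):
                for path in candidates:
                    if not any(link in path for link in UNHEALTHY_LINKS):
                        ACTIVE_PATH[tunnel] = path
                        print(f"{tunnel}: steering onto {path}")
                        break

    # Example: the underlay reports a congested spine link.
    on_underlay_event({"type": "congested", "link": "spine1"})

The point isn’t the toy path selection; it’s that the overlay only gets to make the choice at all if the underlay’s state is allowed to cross the boundary.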

Gazing Up

That’s not to say the event horizon doesn’t exist in the other direction as well.  The network has historically been ignorant of the needs of applications at a higher layer.  Network engineers have spent thousands of hours creating things like Quality of Service in an attempt to meet the unique needs of higher level programs.  Sometimes this works in a vacuum with no problems, provided we’ve guessed accurately enough to predict traffic patterns.  Other times, it fails spectacularly when the information changes too quickly.

The underlay network needs to destroy the event horizon that prevents information at higher layers from flowing down into the network.  Companies that have historically concentrated on networking alone have started to see how important this intelligence can be.  If the network can respond quickly to the needs of applications, developers can supply enough information to ensure their programs are treated fairly under changing network conditions without having to listen to the network themselves.  In this way, the application people can no longer claim the network is a “black hole”.


Tom’s Take

Even as I was writing this, a number of news stories came out from a paper by Professor Stephen Hawking that stated that the classical event horizon doesn’t exist.  The short, short version is that the conditions close to a quantum singularity preclude a well-defined boundary that prevents the escape of all information, including light.  Pretty heady stuff.

In networking, we do have the luxury of a well-defined boundary between underlay networks and overlay networks.  We’ve seen the damage caused by the apparent event horizon for years.  Critical information wasn’t flowing back and forth as needed to help each side provide the best experience for users and engineers.  We need to ensure that this barrier is removed going forward.  The networking people can’t exist in a vacuum pretending that applications don’t have needs.  The overlay admins need to understand that the underlay is a storehouse of critical information and shouldn’t be ignored simply because tunnels are awesome.  Knowing about the event horizon is the first step to finding a way to blast through it.

The Value of the Internet of Things

The recent sale of IBM’s x86 server business to Lenovo has people in the industry talking.  Some of the conversation has centered around the selling price.  Lenovo picked up IBM’s servers for $2.3 billion, which is more than 60% less than the initial asking price of $6 billion two years ago.  That price drew immediate comparisons to the Google acquisition of Nest, which was $3.2 billion.  Many people asked how a gadget maker with only two shipping products could be worth more than the entirety of IBM’s server business.

Are You Being Served?

It says a lot about the decline of hardware manufacturing, especially at the low end.  IT departments have been moving away from smaller, task-focused servers for many years now.  Instead of buying a new 1U, dual-socket machine to host an application, developers have used server virtualization as a way to spin up new services quickly with very little additional cost.  That means that older low end servers aren’t being replaced when they reach the end of their life.  Those workloads are being virtualized and moved away while the equipment is permanently retired.

It also means that the target for server manufacturers is no longer the low end.  IT departments that have seen the benefits of virtualization now want larger servers with more memory and CPU power to insert into virtual clusters.  Why license several small servers when I can save money by buying a really big server?  With advances in SAN technology and parts that can be replaced without powering down the system, the need to have multiple systems for failover is practically negated.

And those virtual workloads are easily migrated away from onsite hardware as well.  The shift to cloud computing is the coup de grâce for the low end server market.  It is just as easy to spin up an Amazon Web Services (AWS) instance to test software as it is to provision new hardware or a virtual cluster.  Companies looking to do hybrid cloud testing or public cloud deployments don’t want to spend money on hardware for the data center.  They would rather pour that money into AWS instances.

Those Internet Things

I think the disparity in the purchase price also speaks volumes about the value yet to be recognized in the Internet of Things (IoT).  Nest was worth so much to Google because it gave them an avenue not previously available.  Google wants to have as many devices in your home as it can afford to acquire.  Each of those devices can provide data to tune Google’s algorithms and provide quality data to advertisers that pay Google handsomely for those analytics.

IoT devices don’t need home servers.  They don’t ask for DNS entries.  They don’t have web interfaces.  The only setup needed out of the box is a connection to the wireless network in your home.  Once that happens, IoT devices usually connect back to a server in the cloud.  The customer accesses the device via an application downloaded from an app store.  No need for any additional hardware in the customer’s home.

IoT devices need infrastructure to work effectively.  However, they don’t need that infrastructure to exist on premises.  The shift to cloud computing means that these devices are happy to exist anywhere without dependence on hardware.  Users are more than willing to download apps to control them instead of debating how to configure the web UI.  Without the need for low end hardware to run these devices, the market for that hardware is effectively dead.
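
To make the “no home server” model concrete, here is a tiny sketch of a device checking in with its maker’s cloud after it joins the home Wi-Fi.  The URL, payload, and command format are all made up for illustration.

    import json
    import time
    import urllib.request

    CLOUD_API = "https://devices.example-vendor.com/api"  # hypothetical vendor cloud

    def check_in(device_id, firmware, temperature_c):
        """Phone home with current state and pick up any pending commands."""
        payload = json.dumps({
            "device": device_id,
            "firmware": firmware,
            "temperature_c": temperature_c,
            "ts": int(time.time()),
        }).encode()
        request = urllib.request.Request(
            CLOUD_API + "/checkin",
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request) as response:
            return json.load(response)  # e.g. {"set_temperature_c": 21}

    if __name__ == "__main__":
        # The mobile app never talks to the device directly; it talks to the same
        # cloud, and the device picks up any changes on its next check-in.
        while True:
            commands = check_in("thermostat-1234", "1.0.3", 20.5)
            time.sleep(60)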

Tom’s Take

I think IBM got exactly what they wanted when they offloaded their server business.  They can now concentrate on services and software.  The kinds of things that are going to be important in the Internet of Things.  Rather than lamenting the fire sale price of a dying product line, we should instead be looking at the value still locked inside IoT devices and how much higher it can go.

A Plan To Fix E-Rate

The federal E-Rate program is in the news again. This time, it is due to a mention in the president’s State of the Union speech. He asked a deceptively simple question: “Why can’t schools have the same kind of wifi we have in coffee shops?” After the speech, press releases went flying from the Federal Communications Commission (FCC) talking about a restructuring plan that will eliminate older parts of the Federal Universal Service Fund (USF) like pagers and dial-up connections.

This isn’t the first time that E-Rate has been skewered by the president. Back in June of 2013, he asked some tough questions about increasing the availability of broadband connections to schools. Back then, I thought a lot about what his aim was and how easily it would be accomplished. With the recent news, I feel that I have to say something that the government doesn’t want to hear but needs to be said.

Mr. President, E-Rate is fundamentally broken and needs to be completely overhauled before your goals are met.

E-Rate hasn’t really changed since its inception in 1997. All that’s happened is more rules being piled on the top to combat fraud and attempt to keep up with changing technologies. Yet no one has really looked at the landscape of education technology today to see how best to use E-Rate funding. Computer labs have given way to laptop carts or even tablet carts. T1 WAN connections are now metro fiber handoffs at speeds of 100Mbit or better. Servers do much more than run DNS or web servers.

E-Rate has to be completely overhauled. The program no longer meets the needs of its constituents. Today, it serves as a vehicle for service providers and resellers to make money while forcing as much technology into schools as their meager budgets can afford. When schools with a 90% discount percentage are still having a hard time meeting their funding commitments, you have to take a long hard look at the prices being charged and the value being offered to the schools.
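
To put that funding commitment in perspective, the school’s share is simple arithmetic: the discount covers a percentage of eligible costs and the school pays the rest.  The project figure in this sketch is hypothetical.

    # E-Rate discount math: the school pays whatever the discount doesn't cover.
    project_cost = 150_000   # hypothetical wireless refresh for a district
    discount_rate = 0.90     # a 90% discount school
    school_share = project_cost * (1 - discount_rate)
    print(f"School still owes ${school_share:,.0f}")  # $15,000

For a district scraping by on a shrinking budget, even that 10% share can be the difference between a funded project and an abandoned one.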

With that in mind, I’ve decided to take a stab at fixing E-Rate. It’s not a perfect solution, but I think it’s a great start. We need to build on the important pieces and discard the things that no longer make sense. To that end, I’m suggesting the Priority 1 / Priority 2 split be abolished. Cisco even agrees with me (PDF Link). In its place, we need to take a hard look at what our schools need to educate the youth of America.

Tier 1: WAN Connections

Schools need faster WAN connections. Online testing has replaced Scantrons. Streaming video is augmenting the classroom. The majority of traffic is outbound to the Internet, not internally. T1/T3 doesn’t cut it any more. Schools are going to need 100Mbit or better to meet student needs. Yet providers are reluctant to build out fiber networks that are unprofitable. Schools don’t want to pay for expensive circuits that are going to be clogged with junk.

Tier 1 in my proposal will be funding for fast WAN circuits and the routers that connect them. In the current system, that router is Priority 2, so even if you get the 10Gbit circuit you asked for, you may not be able to light it if P2 doesn’t come through. Under my plan, these circuits would be mandated to be fiber. That way, you can increase the amount of bandwidth to a site without the need to run a new line. That’s important, since most schools find themselves quickly consuming additional bandwidth before they realize it. Having a circuit with headroom to grow is key to the future.

Service providers would also be capped at the amount that they could charge on a monthly basis for the circuit. It does a school no good to order a 1Gbps fiber circuit if they can’t afford to pay for it every month. By capping the amount that SPs can charge, they will be forced to compete or find other means to fund build outs.

Tier 2: Wireless Infrastructure

Wireless is key to LAN connectivity in schools today. The days of wiring honeycombing the walls are through. Yet, Priority 2 still has a cabling component. It’s time to bring our schools into the 21st century. To that end, Tier 2 of my plan will be focused entirely on improving school wireless connectivity. No more cable runs unless they have a wireless AP on the end. Switches must be PoE/PoE+ capable to support the wireless infrastructure.

In addition, wireless site surveys must be performed before any installation plan is approved. VARs tend to skimp on the surveys now due to the inability to recover costs in a straightforward manner. Under my plan, they must do them. The costs of the site survey will be a line item for the site that is capped based on discount percentage. This will lead to an overall reduction in the amount of equipment ordered and installed, so the costs are easy to justify. The capped amount keeps VARs from price gouging with unnecessary additional work that isn’t critical to the infrastructure.

Tier 3: Server Infrastructure

Servers are still an important part of education IT. Even though the applications and services they provide are being increasingly outsourced to hosted facilities, there will still be a need to maintain existing equipment. However, current E-Rate rules only allow servers to serve Internet functions like DNS, DHCP, or web servers. This is ridiculous. DNS is an integral part of Active Directory, so almost every server that is a member of a domain is running it. DHCP is a minuscule part of a server’s function. Given the costs of engineering multiple DHCP servers in a network, having this as a valid E-Rate function is pointless. And when’s the last time a school had their own web server? Hosting services provide scale and ease of use that can’t be matched by a small box running in the back of the data center.

Tier 3 of my plan has servers available for schools. However, the hardware can run only one approved role: hypervisors. If you take a server under my E-Rate plan, it has to run ESX/Hyper-V/KVM on the bare metal. This means that ordering fewer, bigger servers will be key to running virtual workloads. The cost allocation nightmare is over. These servers will be performing hypervisor duties all the time. The end user will be responsible for licensing the OS running on the guest. That gets rid of the gray areas we have today.

If you take a virtual server under Tier 3, you must provide a migration plan for your existing non-virtualized workloads. That means that once you accept Tier 3 funding for a server, you have one calendar year to migrate your workloads to that system. After that year, you can no longer claim existing servers as eligible. Moving to the future shouldn’t be painful, but buying a big server and not taking advantage of it is silly. If you show the foresight to use virtualization you’re going to use it all the way.

Of course, for those schools that don’t want to take a server because their workloads already exist in public clouds like Amazon Web Services (AWS) or Rackspace, there will be funding for those services as well. We have to embrace the available options to ensure our students are learning at the fullest capacity.

Tom’s Take

E-Rate is a seventeen-year-old program in need of a remodel. The current system is underfunded, prone to gaming, and will eventually collapse in on itself. When USF is forced to rely on rollover funds from previous years to meet funding goals even at 90%, something has to change. Priority 1 is bleeding E-Rate dry. The above plan focuses on the technology needed for schools to continue efficiently educating students in the coming years. It meets the immediate needs of education without starving the fund, since an increase is unlikely to come, even though other parts of USF have a sketchy reputation at best, as a quick Google search about the USF-funded cell phone program will attest. As Damon Killian said in The Running Man, “Hard times call for hard choices.” We have to be willing to blow up E-Rate as we know it in order to refocus it to make it serve the ultimate goal: Educating our students.

Disclaimer

Because I know someone from the FCC or SLD is going to read this, here’s the following disclaimer: I don’t work for an E-Rate provider. While I have in the past, this post does not reflect the opinions of anyone at that organization or any other organization involved in the consulting or execution of E-Rate. Don’t assume that because I think the program is broken, the people still working with the program should be punished. They are doing good work while still conforming to the craziest red tape ever. Don’t punish them because I spoke out. Instead, fix the system so no one has to speak out again.