Tune In and Switch Off

As I sit here right now, the country of Egypt is a black hole on the Internet.  All 3,500 prefixes originated by Egypt’s four major ISPs have been withdrawn from the global BGP table.  There is no route into or out of the country, save the one ISP utilized by the Egyptian Stock Market, most likely in an effort to keep the country’s economy from collapsing.  This follows on the heels of other government interference in cybercommunications in Tunisia this past month and Iran last year.  Egypt, however, is the first country to completely darken the Internet in an effort to keep services such as Twitter and Facebook from coordinating resistance and allowing information to be disseminated to the world at large.  I learned a very long time ago that arguing about politics never leads anywhere.  What I would like to comment on, however, is the trend toward censoring information by disrupting network communication.

Egypt yanked all Internet access for its citizens in an effort to control information.  Tunisia has been accused of affecting Internet traffic for its citizens as well, blocking certain routes and causing outages on the Web.  Iran limited access to social media and even attempted to severely rate limit Internet traffic during the election protests last year.  This trend shows that governments are starting to realize the power that the Internet provides to disaffected groups of people.  No longer do “subversives” need to meet in underground basements or abandoned warehouses.  Those places have been replaced by chat rooms and e-mail.  Relying on one or two trustworthy individuals to get the word out by smuggling rolls of film to the mass media has been replaced with instant pictures being uploaded from a cell phone to Twitter or Flickr.  The speed with which protests can become revolutions has accelerated frighteningly.  So too has the speed with which the affected government can slam the door on these revolutionaries’ ability to use the very media they rely on to spread the word.  Egypt was able to successfully cut off access within a few hours of the first rumors of such a thing being contemplated.

For those of you that think that something like that could never happen here (here being the US), let me direct your attention to the Protecting Cyberspace as a National Asset Act.  This hotly debated bill would give the government more ability to combat large-scale cyber warfare and allow them to protect assets deemed vital to the national interest.  The biggest concern comes from a provision inserted that would give the president the ability to enact “emergency measures” to prevent a wide-reaching cyber attack.  This includes the power to shut down major networks for a period of up to 120 days.  After that time, Congress must either approve an extension, or the networks must be reactivated.  I won’t delve into some of the wilder conspiracy theories I’ve seen surrounding this bill, but the idea that our networks could be shut down without our consent to protect us is troubling.  According to my research, there is no provision that defines the situation that could cause a national shutdown.  The president, acting through the National Center for Cybersecurity and Communications (NCCC) Director, is supposed to inform the affected networks to enact their emergency measures and ensure the emergency actions represent the least disruptive means feasible to operations.  In other words, the NCCC director just has to tell you he shut you down and you should try to make things work as well as you can.

Using this as a possible scenario, assume some kind of external driver causes the president and the NCCC director to shut down a large portion of Internet traffic.  It doesn’t have to be a revolution or something so sinister.  It could be a Stuxnet-type attack on critical power infrastructure.  Or maybe even a coordinated cyber attack like something out of a Tom Clancy novel.  In an attempt to deter the attack or mitigate the damage, let’s say the unprecedented step of withdrawing a large number of BGP prefixes is taken, similar to what Egypt has done.  What kind of global chaos might this cause?  How many transit ASes exist in the US that pass traffic around the world?  I’ve seen stories of how the World Trade Center attacks in 2001 caused a global Internet slowdown due to the amount of traffic that passed through the networks located there.  That was two buildings.  Imagine cutting off even half the traffic that flows through the US and the networks located here.  What impact would that have?  The possibilities would be mind-boggling.  Even a carefully coordinated network shutdown would have far-reaching impact that no one could foresee.  Chaos is funny like that.
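
For a sense of the mechanics involved, withdrawing a prefix from the global table doesn’t require anything exotic.  On a Cisco edge router it can be as simple as removing the network statement under the BGP process, at which point the router sends withdrawal messages to its peers and the route disappears from their tables.  This is only a sketch – the AS number and prefix below are documentation values, not anyone’s real configuration:

ISP-Edge(config)# router bgp 64512
ISP-Edge(config-router)# no network 203.0.113.0 mask 255.255.255.0

Multiply that by a few thousand prefixes across a handful of providers and you have roughly what we watched happen to Egypt.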

The Internet, or cyberspace or whatever your term for it, is now something of a curiosity.  It exists on its own, independent of the laws of nations or man.  Those who seek to control information flow or restrict access find themselves quickly thwarted by the fact that packets and frames do not respect political boundaries.  For every attempt to shut down The Pirate Bay, a simple move to a different location allowed the site to stay active.  Even when pressure was applied to the people behind the site, it was quickly seen that their creation had taken on a life of its own and would persist no matter what.  What of the Wikileaks saga, where the attempt to behead the organization by targeting its leader has only fanned its flames and most likely ensured its survival no matter what may happen to Julian Assange?  Those of us who live our lives in this electronic realm see differences in the way culture is developing.  There are lawless places on the Internet where mob rule is the law of the cyberland.  Information is never truly forgotten, merely pigeonholed away until it is needed again.  Attempts to impose political will upon the citizens of the Internet are usually met with force, protest, and in some cases, retribution.  I keep wondering when organizations are going to figure out that attempting to erase information is tantamount to daring the Internet to publicize it.  In the same way, attempting to shut down access to the Internet and social media at large is a sure way to force people to circumvent these restrictions.  As we watched Egypt vanish from the cyber landscape last night, many of my friends remarked that it would only be a matter of time before someone challenged the blockade and won.  Someone could hack the edge routers and reestablish the BGP peering with the rest of the world, and the floodgates would be opened again.  Whether or not that happens in the next few days remains to be seen.

As the world becomes more reliant on the Internet to provide information to everyone, we as cyber citizens must also remain vigilant to keep the information flowing freely.  The Internet by design lends itself to surviving major disruptions without totally crashing.  It is our responsibility to show the world that information wants to be learned and shared and no amount of meddling will change that.

I Am A Network Ninja

I am a network ninja.

I appear meek and uninteresting.  My stealth and guile are my weapons.  I have succeeded in my task if you never knew I fixed the network.  My khakis and polo shirt allow me to blend into the crowd of salespeople and marketing drones, yet hide my knowledge of BGP and MPLS.  I care not for the ritual troubleshooting combat of TAC engineers.  I use whatever methods I can to achieve victory over that which might harm my network.

My tools appear deceptive at first.  A laptop. A console cable. Simple business cards.  Yet, when in my hands, they become the weapons of legend.  Repeating YouTube videos to distract the masses while I work my network ninja magic.  A console cable to garrote those who question my skill with OSPF.  Business cards to use as shurikens to force back the account managers who dare to be technical and disrupt my troubleshooting chi.

My SNMP spies report every movement of the enemy to me.  I know what the battlefield will look like before the battle is even commenced.  With a flash of light from my LED iPhone camera flash, I mysteriously appear at your side, asking you why you are downloading a torrent on my network.  Before you can speak your lies, my honed reflexes have rate-limited your switchport.  I give you the choice of no choice.  Continue downloading at your peril, for your fate is in your own hands.  Before you can put up a fight explaining why you need a copy of Harry Potter before it’s released in theaters, I disappear with a puff of smoke, off to sow havoc in the server room.

I train my students without training.  I give them difficult configuration tasks and force them to gather information about switches by hand.  I make them learn from experience and recognize danger before they can perceive it.  I create spanning tree loops and redistribution quagmires so my students will never fear the hell of a network gone wrong.  When they realize that my difficult tasks and exacting manner have in fact taught them the way of network ninjitsu,  I send them into the world, knowing they will carry on my legacy and teach other network ninjas as they have been taught.

My deception is my strength.  When the C-level shogun asks why the network is slow, I appear to give him many answers, yet I reveal none.  The questions put to me become questions in return.  I ask him for his opinion of what might be wrong.  I feign ignorance in his presence, knowing how to use his ego and strength as a weapon of my own.  As he explains how the network needs his experience, I wait for the opening.  When confusion reigns and all appears cloudy, I strike.  I turn off the shogun’s Internet radio and appear demure, claiming all should be well thanks to his enlightenment.  As I retreat to my network dojo, the shogun feels content, knowing he fixed the problems and showed his network serf a thing or two about the way things work.  My greatest work is fixing the network without fixing the network.

When others speak of the mysterious forces that weave magic and make the network bend in ways they never dreamed of, I shall laugh and tell them they must be dreaming.  There is no such thing as a network ninja.

Yet, I AM a network ninja…

These /8s Are Now Diamonds

The end times are upon us.  According to many reliable sources, the allocation of IPv4 addresses is quickly reaching its conclusion.  Of the final seven /8s available to be allocated by IANA, APNIC is about to exercise its option on two of them.  When that happens, the final 5 will be allocated as planned to each of the 5 Regional Internet Registries (RIRs) and completely deplete the source pool of allocatable IPv4 addresses.  Now, before Chicken Little starts screaming about what this means, let’s take a step back and examine things.

I liken the total allocation of addresses from IANA to the RIRs to the exhaustion of a natural resource.  For the sake of discussion, let’s pretend the IPv4 addresses are diamonds.  The announcement of total allocation would be like De Beers announcing that all of the diamonds in the earth’s crust have been mined and there are no more available.  That doesn’t necessarily mean that all the engagement rings and tennis bracelets for sale will disappear tomorrow.  What it means is the primary source for this resource is now depleted.  Just like we can’t manufacture any more IPv4 addresses, in this example we can’t mine any more diamonds.  So what happens next?

Usually, when a resource starts becoming scarce, the cost to acquire that resource will be driven up.  In this particular case, people like Greg Ferro have already suggested that there will be a “run” on addresses.  Again, this is a behavior that is typically seen when the announcement is made that a resource is becoming hard to come by.  I think that the RIRs will start putting policies in place to prevent ISPs and other parties from requesting more IPv4 addresses than they currently need.  That will prevent the pool that the RIRs currently possess from becoming depleted faster than necessary.  It will also hopefully stave off the resale of these addresses on the black market, which is a distant but real possibility.  Just like in our fictional diamond example, the price of the diamonds at the jewelry store will go up, and most stores will implement policies restricting the sale of large numbers of stones to single parties, so as to prevent hoarding and help keep prices high.  This will also stave off the onset of a diamond secondary market, where speculators will sell stockpiles of stones for exorbitant prices.

So, now that IANA has run out of addresses, what’s next?  Well, the next countdown becomes the date the first RIR runs out of its allocation.  Right now, that’s projected to be APNIC sometime in October 2011.  APNIC has been burning through its addressing at blinding speed, and so they find themselves at the head of the IPv4 exhaustion line.  Once APNIC runs out of addresses, the only thing they can offer their customers going forward is IPv6 address space.  For some customers, this will be rather unsavory.  These types of customers will feel that IPv6 hasn’t penetrated deep enough into the market.  They’ll pay any price for those precious v4 prefixes.  So, I imagine that APNIC and the other RIRs will hold a small portion of IPv4 address blocks in reserve for those customers that are willing to pay big bucks for them.  For the customers without the pocketbook or that don’t care about the address space they receive, they’ll get IPv6 and likely won’t think twice about it.  Just like in our diamond example, when the first jewelry supplier runs out of stones purchased from De Beers, the price will start going up.  Perhaps they’ll offer lab-created diamonds of similar quality.  But there will be customers that feel the lab-created stones are inferior, and those same customers will pay a significant amount of money to get the “real” stones.  In these cases, the smart jewelry supplier will hold back some of the best gems in order to get a much better price for them.

Once the first RIR runs out of addresses, things will accelerate from there.  Depending on the level of IPv6 preparedness in the market, you may start seeing customers hosting equipment in locations where RIRs still have address space to assign.  I would sincerely hope that by the end of 2011 most everyone has either begun their IPv6 prep in earnest or completed it with flying colors.  Otherwise, I predict there will be a migration of data centers to locations served by AfriNIC, which is the RIR with the largest block of unallocated /8s.  The final exhaustion of IPv4 addresses isn’t predicted to occur until July 2012.  That gives customers plenty of time to decide how to implement IPv6 rather than moving data centers around to mop up what little IPv4 address space remains.  In our fictional example, as the jewelry suppliers start running out of diamonds to give to their customers, their customers will begin shopping around to find suppliers that still have stock, even if they have to start importing it from suppliers located overseas.

Once the last RIR runs out of addresses to give to its customers, it will only be a matter of time before the last of the IPv4 addresses are allocated to the final end users by the ISPs and other middlemen.  Customers that deal directly with ARIN and RIPE and the other RIRs will be out of luck, instead needing to move to IPv6 to continue growing their Internet presence.  Hopefully by the time this occurs in late 2012, IPv6 will be firmly entrenched and the drive to allocate the final IPv4 address space will be greatly lessened.  With end users concentrating on their shiny new IPv6 address blocks, the last of the IPv4 addresses can be handed out to those truly in need.  After that, the Internet can go forward operating on a dual stack of IPv4 and IPv6 until the last of the IPv4-only hosts go dark, leaving us totally on IPv6.  I doubt that day will ever truly come, but it’s a possibility that’s out there.  And in our fictional diamond example, people will still show off their fancy jewelry, but the world at large will start to turn to the next precious stones, like rubies or sapphires.  While diamonds will never truly be gone, the demand for that which can no longer be obtained will be lessened greatly, only pursued by those with the resources to expend on something so expensive.

The final and total allocation of IPv4 isn’t something that’s going to happen overnight.  It will be a death by degrees.  Like a frog being boiled one degree at a time, we might not have known what was going on until it was too late.  Thankfully, enough people have been getting the word out to cause everyone to start making their plans early and get ready for what is coming.  This was never more apparent to me than when I contacted a local technology group putting on a conference to request a presentation slot to speak about IPv4 exhaustion and IPv6 planning.  The person on the other end of the phone was a rather technical person, yet I had to spend some time explaining just what IPv6 was and why getting the word out was so important.  So, to those of you with influence over communication channels, blogs, and Twitter, keep talking about what’s going on with IPv4 depletion as we approach the true end of the address space.  That way, we don’t find ourselves scrambling for diamonds at the last minute or wondering if we need to upgrade to Diamondv6.

For Want of a Nail

I have been weighed.  And measured.  And my configuring skills have been found…lacking.  Welcome, everyone, to the post-lab wrap up post.  Jan 20th looked like a good day for me.  I found all the faults in the troubleshooting section.  The config section looked very beatable to me.  I double checked all the configs with the hour and a half of extra time I had available.  I found some dumb mistakes that were immediately corrected.  And when I left the lab, I had a very good feeling about this one.  Alas, 58 minutes later, the score report I received indicated that I wasn’t as good as I had expected.

Without gory, rule-breaking details, I really thought I did better.  I had reachability in my lab without violating any of the constraints.  My paper was filled with two check marks next to all but one task – the one I was working on when time was called.  I re-read each question and physically put the point of my pencil on each word on the screen to make sure I didn’t read over anything and miss a subtle nuance that could sink me.  In the end, I think those nuances are what might have sunk me.  It’s not enough to have full reachability if you miss a little phrase that tells you to avoid a certain method or use a specific technique.  While I feel that I had accounted for all of those possibilities, the score report doesn’t think so.  And since the cold reality of the score report is more official than my high hopes, it looks like this trip to sunny San Jose was a bust as well.

After posting my results on Twitter, along with the condolences there were a lot of questions about whether or not I should ask for a lab re-grade.  For those not familiar, the lab is partially graded by a script.  In cases where the script returns some strange results, a proctor will look over your lab and double check certain tricky tasks.  Historically, I’ve gotten my scores after 3-5 hours.  The fact that I got this report while sitting at In-N-Out Burger confused me.  Perhaps the first pass of the grading script didn’t have any issues with my configs.  One or two people even suggested that as a good sign that I might have even passed from the script alone.  So, as I stared at the percentages in front of me, I contemplated something I had never tried before.  I wanted them to take another look at my exam.  I wanted another proctor to take the configurations saved from my equipment and load them up onto new routers and regrade my exam by hand.  This process is not without its downsides, though.  First, it costs $250 for them to even do it.  If you get re-graded to a passing score, there is no refund.  Secondly, there is as much possibility of losing points as there is of passing.  This is because the human eyes of the proctor may catch something incorrect that the script missed.  So, you may have only been 3 points from passing, only to now find that you are 8 points away due to some other mistakes.  However, if you are very close to the mark, as in ~2-3%, there is no reason not to take the chance on the regrade.  After all, it’s still cheaper than a new $1400 lab attempt, right?  Third, the regrade attempt takes up to three weeks.  So if you’re hoping for a fast turnaround before deciding to take another lab attempt, you’re going to have to cool your jets for the better part of a month.  On the bright side, Cisco is usually very understanding and will refund the $1400 if you’ve booked another lab attempt and your re-read is honored and changed to a passing score.

Turns out, Cisco won’t even look at tests that are not “statistically close to the historical passing rate.” English?  If you aren’t within about 5% of the passing score, you don’t even have the option to have it regraded.  And, at least in Cisco’s eyes, I was wide of the mark this time.  I’ve filed a case with support to at least be granted the option for the regrade.  We’ll see what happens.  In the meantime, I’m getting back on the horse.  I’ve already had conversations with my employer about when and if there will be another attempt on their dime.  I’m not packing away any lab equipment for the time being.  I’m getting right back into the thick of things to keep it all fresh.  In fact, my wife and I went to the final closing party for a local pub that we’ve been going to since college.  As I sat there surrounded by all the merrymakers and noise of a bar full of people, all I could think of was how much I wanted to get back home and start labbing up some multicast questions.  It seems that the lab has transformed me and sharpened my focus into laser-like precision.  I think about how I’m going to configure tasks and how I’m going to move efficiently from one task to the next.  I pace myself based on the amount of work I can get done in 6 hours with a 40-minute lunch break.  I review flash cards and GNS3 configs during my free time.

No matter what, I’m going to go back and try this again.  I feel like the top of Everest is in sight and I just need one last push to reach the summit.  I’ve come too far to fail now.  More studying, more refining of my task config skills, more “stick time” for this airplane of router and switch configuration.  For want of a nail, this lab was lost.  But the next time around, the only nail will be the lab being nailed by me.

Avoiding Proctor Proctology

Other than perhaps professional sports referees, I can think of no more maligned group of individuals than the proctors in the CCIE labs.  The myths about them are legend.  They screw up your rack at lunch.  They do everything they can to make you fail.  They act obtuse when you ask questions because they don’t like passing new CCIEs.  For the most part, all the mystery and nastiness that surrounds the proctors is just a bunch of hokum.  They are people just like you and me.  They do their jobs just like you and I do.  It just so happens that their job is closely identified with something that most people study for hundreds of hours to achieve, and when those people fail, they look to cast blame on others.

I’ve been to the lab once or twice.  I’ve met several proctors both there and at Cisco events.  People who only need to be known by first names.  People like Howard and Tom and Stefan.  And each time I encountered them outside the lab setting, they impressed me with their poise and charm and sense of humor.  Just like any position of authority, the CCIE lab proctor has a role to play once they step inside the walls of the lab.  They must project an aura of authority and calm.  They must provide guidance where confusion reigns.  And above all, they must protect the integrity of the program.  When most people deal with them in this setting, they can come off as cold and uncaring.  Just like a judge or a police officer, it’s important to remember that the position of proctor is a role that must be played.  Keeping some things in mind before you step through the badged door will go a long way to making your proctor experience a smooth one.

1.  Be nice. Yes, it sounds silly, but it’s a very important thing to keep in mind.  People are amped up when they walk into the lab.  They are stressed and wired and strung out.  Some people retreat into a shell when stressed.  Some people lash out and are irritable.  What’s important to keep in mind is that when dealing with the proctor, you should keep a calm and even demeanor.  People tend to respond in kind with the emotions they are presented.  If you walk over to ask a question and are short and snippy, expect a short answer in return.  However, if you are nice and pleasant, it can go a long way toward getting a favorable answer quickly.  When you go to lunch, don’t be afraid to engage the proctor in some light conversation.  Talk about the weather or the food or a sports team.  Anything but the lab.  Engaging them outside the walls will give you a feel for how they react to things and can help you judge if they will be helpful when it comes time to utilize them.  Besides, it never hurts to be friendly.

2.  Ask ‘yes’ or ‘no’ questions. The most popular complaint I hear about proctors is that they never answer the question that you ask them.  Before you stand up to go ask for clarification, carefully word your question so that it can be answered with a simple ‘yes’ or ‘no’.  For instance, rather than asking “How should I configure this frame relay connection?”, ask “There are two ways to configure this frame relay connection, point-to-point and point-to-multipoint.  The question isn’t clear about which I should use.  I’m leaning toward configuring it as a point-to-point.  Would I be correct in this assumption?”  The proctor may still choose not to answer your question, but you’re more likely to get a binary yes/no response out of them.  By showing you understand the technology and you aren’t just trolling for an answer, you’ll appeal to their interpretive skills.  Remember that the proctor won’t give you the answer outright, no matter what.  They have too much riding on their job, reputation, and the CCIE program to ‘bend’ the rules for one candidate.

3.  Don’t assume a problem is something you can’t fix. Want to piss off a proctor really fast?  Walk up and say, “I can’t get OSPF to come up.  My rack must have a cabling problem.”  I promise that after the eye roll and gruff answer, you’ll fail the lab.  Candidates who are weak in troubleshooting skills tend to blame the physical layer first because it’s the one layer they can’t touch in the lab.  In the old 2-day lab, cabling could be an issue to resolve.  But in the 1-day lab, with cabling removed from the candidate’s control, there is a 1-in-1,000 chance that the cabling is at fault.  Think about it like this: the lab equipment supports 3-5 candidates 5 days a week for 50 weeks a year.  Cabling issues will be caught and dealt with quickly.  Simple things like bad ports or bad cables will be caught when the lab racks are booted first thing in the morning.  Going to the proctor and claiming that OSPF is broken because your routers are wired backwards will only serve to irritate the proctor.  What will happen next is that the proctor will ask you to wait by his/her desk or make you wait in the RTP conference room.  They will go to your rack and start troubleshooting OSPF.  They’ll find out you misconfigured a network statement or forgot to enable authentication.  They’ll check to make absolutely sure it’s not a layer 1 problem, all while you are isolated away from your terminal.  Then, after 15-30 minutes, they’ll come get you and tell you, “It’s not a physical problem.”  That’s it.  No other information.  You won’t get to see what they did to find your problem.  You won’t get any hints about how to fix the issue.  And you won’t get those 15-30 minutes back.  Period.  On the other hand, if you go to the proctor with a long list of reasons why it has to be a layer 1 problem, like BERT or TDR output or a down/down physical interface that won’t come up at all, they’ll be more receptive due to your troubleshooting efforts (a few example commands for building that kind of list follow these tips).  And if they find a layer 1 fault during their troubleshooting, you’ll probably get the time back from their efforts and the time it takes to replace the faulty unit.

4.  Don’t cheat. Well, duh.  You’d think that should be a given.  But people still try to pull stupid things all the time.  And not just the braindumping.  Writing things down and then trying to sneak them out on the scratch paper, of which every scrap must be accounted for.  Trying to use a cell phone to discreetly look up answers.  Looking at other screens (for all the good it’ll do you).  Just don’t cheat, or even give the appearance you’re cheating.  That should keep the proctors from getting nasty with you.
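
Coming back to item 3 for a moment, here are the kinds of commands that make up that “long list of reasons” before you ever approach the desk.  These are standard IOS show commands; the interface numbers are made up, so substitute whatever your rack actually uses:

R1# show ip ospf neighbor
R1# show ip ospf interface serial 0/0
R1# show interfaces serial 0/0
R1# show controllers serial 0/0

If the interface shows up/up, the OSPF network types and timers match on both ends, and the controller output looks clean, the problem is almost certainly in your configuration and not the cabling.  If the interface stays down/down no matter what you do, now you have evidence worth carrying to the proctor.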

In the end, just remember that these guys and gals are doing their jobs as defined by the CCIE program.  They work hard and do everything within their guidelines to help you pass.  They aren’t out to screw you, and if you keep your head about you and use some common sense and manners, they’ll be the best resource you’ll find in the lab.

Hooray for Bruno!

Twenty-four hours after Cisco ‘tweaked’ the CCIE lab troubleshooting section a little, we finally get a little bit of feedback about what they have wrought on the weary CCIE candidates.  This thread over on the Cisco Learning Network serves as the official announcement.  Further down the list is a response from Bruno van de Werve, who appears to be part of the CCIE program inside Cisco, to some questions from concerned folks about what this could mean.  Bruno was also responsible for the excellent CCIE lab web interface video.  So, what did Bruno share with us?

1.  This is a virtual switch based on a router image, so the ports will be shut down and not configured as switchports up front.  Think about having a switch module in a router in Dynamips/GNS3 – I’m sure this is what L2IOU is like.

2.  The image appears to be based more on the 3550 than on the 3560.  This comes from comments about ports defaulting to dynamic desirable (the 3550 behavior) instead of dynamic auto, the 3560 default.

3.  All interfaces are Ethernet, not FastEthernet or GigabitEthernet.  They are also arranged in groups of 4, e.g. e 0/0, e 0/1 and so on.  Due to the software limitations, you can only do interface range commands across the 4 ports of the module, so for instance setting up more than 4 trunks at once would require multiple interface range commands (there’s a quick example of the syntax below, just before the ‘My Take’ section).  As an aside here, Bruno says that they are most likely not going to have more than 4 trunks per switch right now.

4.  These switches are based off of 12.2 mainline IOS with some L2 switching features added in.  They also lack any of the hardware ASICs necessary to do advanced features.  So, no Cat-QoS for instance (His words, not mine.  Cat-QoS, really???)

5.  There are only two switches at present in the TS section, and they are in the same IGP domain.  Bruno elaborates that they might add more switches later as features are added to the L2IOU image.

6.  There will still only be 10 questions in the TS section (his words).  So, at best you can probably expect 1-2 L2 questions for now.  He did say that there is a possibility that the questions could influence other tickets, so be on guard if you have an L2 question up front that it might affect an OSPF question later.

7.  There are no planned changes to the configuration section of the lab at this time.  No, they aren’t going to remove MPLS.  They aren’t screwing with your config section totally.  Also, in a reply to one of the first commenters, Bruno said,

“At the moment, there won’t be any changes to the configuration section of the lab, so the two L2 troubleshooting questions will still be there.”

Ahem.  As my old econ professor in college used to say during test reviews…”Hint, hint, hint, oh hint.  You might see this on the exam.”  Guess what folks? Bruno just gave you a big hint.  You are still going to see L2 troubleshooting on the config section.  And if the historical trend is true, it’s going to be the first part of your lab.  “There are XXX faults in your initial lab configuration.  Correct these faults for 1 point each.  Be aware that these faults could impact configuration in other sections of the lab, and if they are not resolved and impact another working solution, you will not receive points for the non-working solution.”  So put your thinking caps on.
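
Since item 3 above might read a little abstract, here’s roughly what that four-port interface range syntax looks like.  This is a sketch against guessed interface numbers, not output from the actual lab image:

Switch1(config)# interface range ethernet 0/0 - 3
Switch1(config-if-range)# switchport trunk encapsulation dot1q
Switch1(config-if-range)# switchport mode trunk
Switch1(config-if-range)# no shutdown

If you needed trunks on a second module, you’d have to repeat the process with a separate interface range ethernet 1/0 - 3 command rather than spanning both modules in a single range.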

My Take

L2IOU was introduced to the troubleshooting section to crack down on the braindumpers.  Cisco knew they needed to add it to keep the cheaters off-balance, but the feature set of the IOU emulator wasn’t ready when the TS section went live.  I’m sure a team has been working on it night and day trying to get enough features built into it to make it a usable tool for the TS section.  It would be virtually impossible to have a physical setup for troubleshooting.  30+ routers would be a large rack, and once you start adding physical switches on top of it, it becomes unmanageable.  IOU made the most sense for the L3 section, but L2 is just as important for testing a candidate’s knowledge.  However, there wasn’t an emulator capable of perfectly virtualizing a switch.   And there still isn’t.  The fact that L2IOU got pushed out the door in the state that Bruno describes it in means that something had to be done to stop the braindump hemorrhage.  That’s the only reason I can think of to introduce a totally foreign, software-emulated L2 device that by all accounts doesn’t resemble the feature set of the devices in the lab that you are already forced to troubleshoot a mere two hours after the beginning of the lab.  So why not put the additional troubleshooting questions in the config section where they can be done on real hardware?  Especially since there are only two L2IOU devices currently?

Because Cisco has never nailed down the exact nature of the TS section, it gives them more latitude in changing it quickly.  Adding ten new TS questions takes a lot less time than re-engineering a whole lab set.  Just like the OEQs before, the TS section is designed to weed out the weak candidates relying on someone else’s leaked materials rather than the tried-and-true studying methods that should be employed.

I still don’t like the change.  Not enough lead time.  A ham-handed attempt to fend off what appears to me to be a growing portion of lab candidates.  Seeing as how they appear to be phasing things in slowly and not increasing the impact of L2 until more features get baked in, I guess I can live with being a guinea pig for now.  But I sincerely hope that I pass my lab on this attempt so I don’t have to beta test new TS section features again.

Layer 2 – Electric Boogaloo

In alignment with CCIE level requirements, Cisco is adding L2 switching features to the CCIE R&S Troubleshooting exam through L2 IOS software on Unix (L2IOU) virtual environment. The new feature will be available starting January 17, 2011. The CCIE R&S exam consists of 2 sections – the troubleshooting (TS) section which runs two hours, and the Configuration (Config) section which is six hours. The Config lab utilizes actual physical devices in racks, whereas, the TS lab uses a virtual environment under IOU. IOU offers a very realistic simulation of router (L3+) features in the TS lab but until now had no L2 switch capability. With the addition of the new L2IOU, the TS lab will now include both L2 and L3 capabilities in the virtual environment.

Official Cisco Announcement (Thanks to Rob Routt for the text)

So, it appears that Cisco has finally added layer 2 (L2) troubleshooting to the first troubleshooting section of the lab.  Hmmm…

Ladies and gentlemen, a rant…

What?!?! Really?  Why did this have to happen four days before my exam!  Okay, it’s not fair to think that you’re singling anyone out with these changes, but when you added the open-ended questions (OEQ) in 2009, you did it the week before my second lab.  And that little change with the OEQs cost me one lab.  With luck like this, anyone going to Cisco Live Las Vegas 2011 should head down to the casino with me and bet the exact opposite of what I say. You’ll make millions.

I know the standard response.  Anything is fair game for troubleshooting on the TS section.  If I’m a prepared CCIE candidate, I should have no fear about this addition.  Everything will be fine, I’ll nail it.  All of these things are true, but it still doesn’t make me feel any better.  This is like going in to take a driver’s test only to find that the steering wheel is on the wrong side of the car!  Does it affect the test?  Not really, but it is a head scratcher.

If all I have to worry about is layer 3 faults, I don’t have to consider spanning tree or trunk ports or Q-in-Q tunneling or QoS mismatches or backup interfaces or Etherchannel misconfigurations or any one of a number of things.  No, adding L2 to the TS lab doesn’t make it necessarily harder, but it does broaden the range of things I have to think about when I start looking at a problem.  Irritating, this is.

Why the need to cram so much stuff into the beginning?  You know, you always did well with the first TS question of the config section – Hey, we broke three or four things in your lab that you need to fix before you get started.  You don’t get any points if they aren’t fixed, and since they probably are going to break other things, you won’t get points for other questions either.  Guess breaking things on five routers and four switches isn’t nearly as fun as doing it to 30 routers and who-knows-how-many switches.  I’m sure that the setup of the TS lab allows them to break lots and lots of interesting things in ways they can’t simulate in the config section.  But if that’s how things are going to be going forward, they need to remove any TS from the config section too, if it still exists.  After all, there are no sim questions on the CCIE written because they are covered in other sections.  Why should there be troubleshooting on the lab config section when you’ve so eloquently covered it in the TS section?  I figure that once the L2 TS stuff gets baked in, we might actually see this happen.

And another thing…one business day notice?  Really?  Come on now.  That’s just unfair.  Used to be, changes to the lab required six months of notification before going live to allow students the opportunity to study up on them and be prepared.  How do you think the folks taking the lab bright and early Monday morning are going to feel?  I guess that adding a whole new topic to the TS section doesn’t constitute a ‘major’ change anymore.  Or perhaps because you intentionally left the description of the TS section vague, it allows you the opportunity to ‘clarify’ it with little to no warning.

I’ve got a question, Cisco.  Are the braindumpers really hurting you that bad?  You can tell me all you want that the OEQs and the TS section were designed to better assess a candidate’s knowledge of every facet of networking.  With respect: bullshit.  The OEQs were a direct outgrowth of the interview process implemented at a specific testing site to catch people memorizing leaked lab questions and getting through with no difficulties.  The TS section replaced the OEQs once the process was refined to the point where they were no longer needed.  Have people started memorizing the TS section too?  Is it bad enough that we’re going to have a new exam every week in an effort to catch the unworthy candidates?  Perhaps we should just move to a totally interview-based exam with no access to CCO or documentation of any kind?  After all, the perfect CCIE candidate should be able to configure and troubleshoot a lab without access to a keyboard or a mouse or even a monitor!

Okay, I think that’s enough ranting for one morning.  I’m at T-minus 141 hours and counting.  That should be just enough time to brush up on my L2 TS skills in addition to all the other stuff I’m cramming for.  You know, I’m just about to the point where I’m not going to pass this lab because it’s a great career move or because it’s something I feel I need to do or even because it would show me as being at the pinnacle of my networking career.  I’m going to pass this lab just to spite Cisco and some of the dumb last-minute decisions their program managers make.  Because I don’t know if I can handle CCIE Lab 3: CCIE With A Vengeance.

Frame (Relay) of Reference

A while back I wrote a CCIE lab-related post about RIP and why it’s still on the lab.  Most of the questions that I see lately revolve around another old technology and the curiosity of why it’s still contained in the vaunted lab blueprint.  I speak of frame relay.  The much-maligned WAN technology whose reason for existing no one seems able to explain anymore, yet which weary lab candidates find themselves poring over at the last minute.

Frame relay has been around forever.  And for only slightly shorter than forever, it’s been on the CCIE lab blueprint.  I’ve been actively studying for my exam since July 2008.  And as far back as I can remember, I’ve been studying frame relay.  And ever since that time people have been asking why it’s even on the exam.  I’ve heard people say it’s too old.  It’s past its prime.  No one offers it anymore.  Recently, with the change to the version 4 lab exam, MPLS was introduced as a configuration topic.  We all thought “Surely frame relay will be gone with its apparent successor on the lab now…”  And yet, frame relay is still there.  In fact, they added frame relay switching back onto the blueprint.  That was a huge change for me, never having dealt with that side of things before.  So, here is frame relay configuration of all kinds taunting us with its outdatedness.  Forcing us to keep up with our grandfather’s WAN technology.  And, in my opinion, that’s just the way Cisco likes it.

Just like RIP, frame relay wasn’t necessarily meant to be on the exam as a test of your ability to map IPs to DLCIs.  Frame relay introduces a whole level of configuration possibilities to the lab to test your wits without being obvious.  The whole point of keeping it on the lab isn’t to make you expend effort in getting 2 or 3 points.  It’s laying a foundation that could cost you 10 points or more.

I remember reading the Building Scalable Cisco Internetworks (BSCI) routing exam guide a few years back studying for my CCNP.  It was my first real exposure to the depths that routing protocols could go to.  In one of the chapters on OSPF, there was a section that covered a ton of information on running OSPF over non-broadcast multi-access networks.  There were tables and charts and graphs galore.  All dedicated to one sub-topic.  Why was that?  Because the intricacies of running OSPF over frame relay are legion.  There are multiple network types that determine DR and BDR functionality and hello timers.  It’s like a mix-and-match sale at the department store.  So many fun combinations from so few parts.  That chart was just the tip of the iceberg, though.  Because once you’ve gotten a frame “cloud” in your lab, the real fun can start.

Suppose you have a really simple question in the lab like this:

Configure frame relay between R3 and R6.  Do not use static
mapping or inverse ARP. (2 points)
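
For illustration, here’s one way a task like that might be satisfied: point-to-point subinterfaces with the DLCI assigned directly, so no frame-relay map statements and no inverse ARP are needed.  The interface numbers, addressing, and DLCIs below are invented for the example:

R3(config)# interface Serial0/0
R3(config-if)# encapsulation frame-relay
R3(config-if)# no frame-relay inverse-arp
R3(config-if)# interface Serial0/0.36 point-to-point
R3(config-subif)# ip address 10.1.36.3 255.255.255.0
R3(config-subif)# frame-relay interface-dlci 306

R6 would mirror the same configuration on its end with its own local DLCI.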

Straightforward, eh?  You could probably nail up this frame configuration in about 5 minutes.  Well, think about the rest of this example lab and find all the things that rely on Frame Relay to award points.  Let me give you a few examples:

Configure OSPF on all routers indicated in the diagram.  
Between R3 and R6, use a network type that provides for the
fastest recovery times without a DR/BDR election. (3 points)

Configure the link between R3 and R6 so that routing protocol
packets are never dropped.  If the queue has more than 20 packets,
reduce the bandwidth to 32kbps. (3 points)

Configure the link between R3 and R6 to authenticate using CHAP.
(2 points)
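
Just to make the first of those concrete: the OSPF network type with the fastest timers and no DR/BDR election is point-to-point.  On the point-to-point subinterface from the earlier sketch it happens to be the default, but on a physical or multipoint frame relay interface you would have to set it explicitly.  Invented interface numbers again:

R3(config)# interface Serial0/0.36 point-to-point
R3(config-subif)# ip ospf network point-to-point

The same goes on R6.  Miss that little phrase about the DR/BDR election and the OSPF points evaporate, along with everything stacked on top of them.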

Now, I’m not saying that all of those questions could end up in your lab.  But think about it like this:  That single 2-point FR question just became a 10-point chunk of your lab.  Now, if you don’t configure your FR section correctly, you are halfway to failing the lab from one misconfiguration!  Remember, no partial credit means that even if your whole OSPF section is right, failing to get FR working correctly means the script used to grade your exam won’t see all the routes on R3 and R6, and so you won’t get any points for OSPF.  It also means that your QoS won’t work, so there goes three more points.  And so on.  Soon, you could fail the lab simply because that outdated technology you didn’t really care about sucker punched you.

I’ve always held the belief that the CCIE lab is designed in a way to test every facet of being a network superhero.  Not just the obvious things like OSPF configuration or BGP troubleshooting, though.  As I’ve said before, the lab manual is carefully written by a team of highly competent people.  There are no extraneous words, and there are no unimportant tasks.  It’s like a game of Jenga.  The things you do at 10:00 a.m. have a direct impact on the last task you do at 4:00 p.m.  If your lab isn’t built carefully and solidly like a Jenga tower, it will topple down on the desk in front of you and knock your cup of colored pencils and markers onto the floor (metaphorically, of course).  Like it or not, frame relay is one of those blocks that forms the foundation of your tower.  Not because Cisco is protecting their investment in legacy technology.  Not because they are trying to wear lab candidates out with archaic configuration tasks.  Frame relay still exists on the blueprint because it teaches candidates to get their configurations right in that critical first hour of the lab – the Jenga tower that is your lab exam depends on each and every part.  And since frame relay can touch so many parts of the test, Cisco likes to keep it around to make sure you’re paying attention.

So the next time you find yourself staring at the blueprint and asking yourself “Why does Cisco still put frame relay on the lab?” you might try asking instead, “What does Cisco want me to learn from configuring frame relay?”  I think you’ll find the answer to the latter question a lot more enlightening.  With all of the things that depend on a simple frame relay link, one miscue could be the difference between a detailed score report and a simple 4-letter one.  Because from Cisco’s frame of reference, frame relay isn’t just a simple test question.  It could be one of the most important topics on your exam.

Party like it’s 1993!

I was once told by a consultant that he could figure out whether a client needed his services after about 30 seconds at a router console.  When I asked how he could be so sure after such a short amount of time, he consoled into a router in his lab and typed in “show clock”.  The lab router returned the now-familiar string of Mar 1, 1993 (he’d just booted it).  As soon as he showed me that one command, it all made sense.  Keeping accurate time is very important in a computing environment.  Directory services depend on an accurate clock to authorize logins and track audit events.  Novell and Windows both utilize systems to ensure that the clocks of all the servers in the network are synchronized.  And if those clocks drift out of sync, heaven help the server admins.

What about network equipment?  In days past, the routers and switches were typically neglected when it came to clock setting.  In fact, most older Cisco routers didn’t even include an on-board battery to keep the clock accurate.  And when the router would reboot, the software clock didn’t have an accurate hardware clock to refer to, so it used 00:00 1 Mar 1993 as the reference point.  But as systems have increased in complexity over the past several years, the need to have all the time on your equipment accurate has become paramount.  When debugging a call hand-off on a voice gateway, an accurate router clock ensures that you can match the time the call was placed with the debug message output.  If there is a security incursion into your network devices, you need to track the time the device was accessed in order to be able to accurately report the event to the proper authorities.

So how do we get our network clocks to report the right time?  Well, the process is fairly easy, provided you’ve done a little homework about a couple of things.  First, you need to know what timezone you are in and what your GMT offset is.  In today’s world, it is far easier to keep the clock of your device synced to GMT, then apply an offset to show you what the local time is.  That way, if you have devices spread all over the world, you never have to worry about a significant time difference because one clock was synced to local time and the other was synced to GMT with an offset applied.  For the purposes of the examples in this post, I’m going to assume the router is located in the central United States and is in the Central Time Zone.  The fact that I myself am in the Central Time Zone and therefore would not need to do any additional thinking to write my examples is purely coincidental.

In the case of my example, the central United States is in the Central Standard Time Zone (CST).  CST is six hours behind the GMT clock (GMT -6).

When you first connect to the router, you will either see the default time of 00:00 1 Mar 1993 (for older routers), or if you’re on an ISR or newer router, it may be synced fairly close to the actual time.  Starting with the ISR, Cisco started keeping the time synced more closely when the router was shipped from the factory, and the battery keeping the hardware clock time when powered off seemed to last a lot longer.

At this point, we need to set the router’s timezone with this command:

R1(config)# clock timezone <name> <GMT offset>
R1(config)# clock timezone CST -6

Once you’ve done that, the system should update with a message telling you that the current time zone has been updated.  Now, the router should know what time zone it’s in and adjust the clock offset appropriately.  You should set the daylight saving time conditions as well.  For those not familiar, DST is a law that many countries have adopted that forces sleepy engineers to reset the clock on the microwave at 2 a.m. on two days during the year.  Just when we had it down, they went and changed it a couple of years ago.  Hence the reason that Cisco doesn’t hard-code the DST settings on their routers.  They are more than happy to let you do it yourself.  In this example, we’re using the U.S. standard of the second Sunday in March and the first Sunday in November:

R1(config)# clock summer-time <name> recurring <week number start> 
        <day> <month> <time to start> <week number end> <day> <month> <time to end>
R1(config)# clock summer-time CDT recurring 2 Sun Mar 2:00 1 Sun Nov 2:00

And just like that, your router will perform just like every other smart device in your house and reset the clock for DST when necessary.  Now, if I could just get my microwave to do that.

Now that your router is in the right time zone, it’s probably a good idea to sync the clock to some kind of external time source.  That’s why Network Time Protocol (NTP) was created.  It allows distributed systems to sync their clocks with a time source ultimately traceable to an atomic clock (the most precise kind).  NTP forms a hierarchy through the use of strata.  Stratum 0 devices are the reference clocks themselves – atomic clocks, GPS receivers, and the like – and they are not reachable over a network.  Stratum 1 NTP servers are directly connected to stratum 0 devices, usually through RS-232 or other dedicated cabling.  These are the systems that are usually referred to as time servers, and they are usually the devices that you will sync one of your clocks to.  Why just one?  Well, while having all your devices synced to external time sources is a great idea, there are two issues that can arise.  First, having that much NTP traffic exiting your network isn’t exactly optimal.  There is no reason for 20 routers to each poll an external NTP server when one router could do the same thing and the other 19 routers poll the synced router.  You can even designate your NTP-synced router as a stratum 2 or 3 device if you’d like, then have it start serving time to your other devices.  The second issue with using all external NTP servers deals with security.  Some people are uncomfortable with the idea that all their devices are being told how to manage their clocks by an external device over which they have no control.  By configuring your devices to sync to one clock in your network that you control, there is more consistency and less reliance on systems outside your control that may be overloaded or unreliable.  For more info on what happens when someone starts abusing an NTP server, check out what NETGEAR did to the University of Wisconsin’s NTP server here.
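
A minimal sketch of that design, with made-up addressing: one router syncs to an external source and everything else syncs to it.  The ntp master command sets the stratum the router will advertise if it ever has to fall back to its own clock:

CoreRtr(config)# ntp server 192.0.2.10
CoreRtr(config)# ntp master 4

Branch1(config)# ntp server 10.0.0.1

Here 192.0.2.10 stands in for an external time server and 10.0.0.1 for an address on CoreRtr; the branch routers never have to leave your network to find out what time it is.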

For the sake of your sanity, it’s best to set the clock to something close to the actual time before you set the NTP server.  The reason is NTP’s propensity to look at your clock and declare it insane if it has drifted too far from the actual time.  I usually try to get my local clock somewhere in the neighborhood of one hour from the actual time, but the closer you are, the faster NTP will sync.  You do this with the following command (note the context):

R1#clock set <HH:MM:SS> <1-31> <Name of Month> <Year>
R1#clock set 01:00:00 10 Jan 2011

You must manually set the clock in enable mode, not global config mode.  Once you’ve gotten the clock close to your time, you can set the NTP server.  If you want to sync your router to an external time source, using the NIST Time Server list is always a good start.  If your router is capable of resolving DNS, a better idea is to use the NTP pool service.  This service is a cluster of NTP servers around the world that is served by round-robin DNS query, so no one server can get overloaded.  If you don’t feel comfortable syncing your clock to a server in France or Japan, you can always narrow down the focus of your query by using narrower DNS entries.  Check out their site for ways to do this.  Once you’ve figured out  how you are going to connect to NTP, and whether you are going to use external or internal servers, the command is pretty easy:

R1(config)# ntp server <name or ip>
R1(config)# ntp server 10.1.1.1
or
R1(config)# ntp server 0.pool.ntp.org

Yep, that’s it.  Simple once you’ve gotten all the planning down.  You can check your NTP sync status with the command show ntp status.  You can see which time servers you are polling with the command show ntp associations.  The commands have a lot of good info, but can be a little cryptic your first couple of tries.

Once you’ve gotten your clock in sync, there’s just two more commands you need to use to ensure that all your logs and debug outputs are using the right clock.  By default, logs and debugs use the system uptime for their output, so you could get a log message that says “1w3d” instead of the real time.  And if you have to start doing math on your debugs to figure out when a call was dropped, you’re going to be a cranky rock star.  From global config mode, go ahead and type in these two commands:

R1(config)# service timestamps log datetime msec
R1(config)# service timestamps debug datetime msec

The “MSEC” keyword on the end tracks the message down to the millisecond level, which I’ve always found very handy when trying to figure out sub-second error messages.  It also helps better match error logs which are all part of the same event, but spread out over multiple messages.

Now, your routers are all in sync and your logs and debugs are all outputting the correct time.  If a consultant tries to access your devices, you will appear to be one cool customer that doesn’t need any help on your networking devices.  You’ll also be able to troubleshoot faster and get all the fame and wealth appropriate to your station as the Network Miracle Worker.

For what it’s worth, I did a lot of searching about why the default time on a Cisco router without a battery backup is March 1, 1993.  No one seemed to have a definitive answer.  According to the Internet, the only really exciting thing that happened that day was George Steinbrenner being reinstated as Yankees owner.  I doubt anyone in San Jose is a Yankees fan, so I didn’t think that was it.  It also didn’t correspond to any neat numbers in the Unix time epoch.  1993 was the first year that Cisco acquired a company, but that happened in September of that year.  The only thing that happened earlier in the year was the release of IOS 10.0.  I guess that Cisco decided that “10” was an important enough number that they wanted recent history to be based on this date.  The other possibility is that it was just an arbitrary date chosen by IOS engineers so that the router clock didn’t cycle all the way back to 1900.  MS-DOS had similar functionality, wherein it would see the system BIOS clock set to 1900 and assume that the BIOS had to be wrong.  It would then set the clock to the earliest date it could (January 1, 1980).  Maybe Cisco just decided that 1993 was so early that if they noticed a router clock stuck on that date, it would just be assumed that the clock was not set.  Either that, or they were really big fans of this song…

I’s and T’s and Crosses and Dots

My name is Tom, and I’m careless.

Yep, I admit it freely.  I’m the kind of person that rushes through things and gets the majority of the work done.  Often I leave a few things undone with the hope that I’ll go back later and fix them.  For me, the result is the key.  Sometimes it works out in my favor, sometimes it doesn’t.  More often than not, I find myself cursing out loud about this unfinished job or task months down the road and threatening to find the person responsible, only to later determine that I should be kicking my own butt for it.

One place where this particular habit of mine has caused me endless grief is inside the unforgiving walls of Cisco’s Building C lab in San Jose.  Yep, I can honestly say that at least one lab attempt was foiled due to my propensity to miss the little things.  I’ve previously written about some of the details of the lab, but I wanted to take some time in this post to talk about the details themselves.  As in, the details in the questions that will kill you if you give them the chance.

Let’s get it out there right now: there is NO partial credit in the CCIE lab.  None. Zilch.  If you fail to answer every portion of the question completely, you get zero points for that question.  Unlike the old days in elementary school, you don’t get points for trying.  This shouldn’t really come as a shock to anyone that’s taken a multiple choice test any time in their life.  On those tests, there is exactly one set of answer(s) for a particular question, and if you don’t select the proper response(s), you don’t get the points.  The same thing goes for the questions you find in the CCIE lab exam.  Just because the questions may or may not have multiple parts doesn’t excuse your need to answer them fully.  Old Mr. Hollingsworth used to tell me regularly, “Son, close only counts in horseshoes and hand grenades.”  Since I don’t play horseshoes and my hand grenade supplier mysteriously dried up, I guess close just won’t cut it any more.

You might end up getting a question in the lab that says something along the lines of “Configure OSPF on R1, R3, and R6 according to the diagram.  Do not change router IDs.  Rename R1 to ‘SnugglesR1’.”  You could build the most perfect OSPF lab in history.  You could spend an hour optimizing things.  If you forget to rename Snuggles the Router, you will receive no credit for the question.  All that hard work will get flushed down the toilet.  You’ll get your score report at the end of the day and wonder why you didn’t get any points for all that time you spent making OSPF sing like a soprano.

In order to prevent this from happening to you, start training yourself now to read carefully and consider every facet of the questions you’ll see.  Remember that the questions in the lab are carefully constructed by a team that spends a ton of time evaluating every part.  There are no unnecessary words.  Candidates have pestered proctors over the meaning of single words on a question.  The questions are written as they are to make sure you take into account a number of factors.  They are also designed to slip in changes to tasks and additional configuration with a word or two.  And if you are careless, you’ll miss those phrases that signal changes and negations.

Surely, everyone has taken a test that has a question that says “Which of the following was NOT a <something> <something>?”  Your job is to evaluate the choices and pick the one that is not something.  That single word changes the whole meaning of the question.  And for those that are careless or the kind that skim questions, the NOT might be missed and cause them to answer incorrectly.  Questions in the lab are the same way.  Skimming over them without reading critically can cause nuances to be missed and lead to incorrect solutions.  After 5 hours of staring at words on a monitor, things might start blurring a little, but attention must be paid to the last few questions, as those might be enough points to buoy you over the passing mark.

I’ll be the first to admit that the pressure to get everything done in the allotted time may cause the candidate to want to rush, but you must resist that pressure.  Many CCIE lab prep courses and instructors will tell you to carefully read the questions before you ever start configuring.  I agree, with some additions.  I always take my scratch paper and write the task numbers down the side.  After I’ve accounted for Task 1.1, 1.2, 2.1, and so on, I then go back to the questions and make marks next to my list for any questions that may have multiple parts or tricky solutions.  That way, if I find myself rushing through after lunch the marks I made early in the day force me to pay attention to the question and ensure that I don’t miss something that might cause me to tank three or four points.  Those points add up over the course of the day, and more than a few careless mistakes can cost you a nice expensive soda can.

If you are serious about the CCIE lab, it’s worth your time to start working on ensuring that you pay close attention to each question and don’t make any careless mistakes due to reading too fast or missing important configuration requirements.  Your day is going to be stressful enough without the added pressure of fixing mistakes later in the lab as a result of forgetting to enable OSPF authentication or a typo on a VLAN interface.  You want to remember to dot every “i” and cross every “t” for each and every question.  That way, you can walk out of the lab and use that freshly-dotted “i” when you spell your new title as a CCIE.