The Voice of SD-WAN

SD-WAN is about migrating your legacy hardware away from silos like MPLS and policy-based routing and instead integrating everything under one dashboard and one central location to make changes and see the impacts that those changes have. But there’s one thing that SD-WAN can’t really do yet. And that’s prepare us for the end of TDM voice.

Can You Hear Me Now?

Voice is a way of life for some people. Cisco spent years upon years selling CallManager into every office they could. From small two-line shops to global organizations with multiple PRIs and TEHO configured everywhere. It was a Cisco staple for years. Avaya followed along quickly to get into the act too.

Today’s voice world is a little less clear. Millennials hate talking on the phone. Video is an oddity when it comes to communications. Asynchronous chat programs like WhatsApp or Slack rule the day. People would rather communicate via text than voice. We all have mobile devices and the phone may be one of the least used apps on them.

Where does that leave traditional voice services? Not in a good place for sure. We still need phone lines for service-focused businesses or when we need to call a hotline for support. But the office phone system isn’t getting any new features anytime soon. The phone system is like the fax machine in the corner. It’s a feature complete system that is used when it has to be used by people that are forced to use it unhappily.

Voice systems are going to stay where they are by virtue of their ubiquity. They exist because TDM technology hasn’t really advanced in the past 20 years. We still have twisted pair connections to deliver FXO lines. We still have the most basic system in place to offer services to our potential customers and users. I know this personally because when I finally traded out my home phone setup for a “VoIP” offering from my cable provider, it was really just an FXS port on the back of a residential cable modem. That’s as high-tech as it gets. TDM is a solved problem.

Call If You WANt To

So, how does SD-WAN play into this? Well, as it turns out, SD-WAN is replacing the edge router very quickly. Devices that used to be Cisco ISRs are now becoming SD-WAN edge devices. They aggregate WAN connections and balance between them. They take MPLS and broadband and LTE instead of serial and other long-forgotten connection methods.

But you know what SD-WAN appliances can’t aggregate? TDM lines. They don’t have cards that can accept FXO, FXS, or even PRI lines. They don’t have a way to provide for DSP add-in cards or even come with onboard transcoding resources. There is no way for an SD-WAN edge appliance to function as anything other than a very advanced packet router.

This is a good thing for SD-WAN companies. It means that they have a focused, purpose-built device that has more software features than hardware muscle. SD-WAN should be all about data packets. It’s not a multitool box. Even the SD-WAN vendors that ship their appliances with LTE cards aren’t trying to turn them into voice routers. They’re just easing the transition for people that want LTE backup for data paths.

Voice devices were moved out of the TDM station and shelf and into data routers as Cisco and other companies tried to champion voice over IP. We’re seeing the fallout from those decisions today. As the data routing devices become more specialized and focused on the software aspects of the technology, the hardware pieces that the ISR platform specialized in are now becoming a yoke holding the platform back. The voice hardware is keeping those platforms from evolving.

I can remember when I was first thinking about studying for my CCIE Voice lab back in 2007-2008. At the time, the voice lab still had a Catalyst 6500 switch running in it that needed to be configured. It had a single T1 interface on a line card that you had to get up and running in CallManager. The catch? That line card would only work with a certain Supervisor engine that only ran CatOS. So, you had to be intimately familiar with CatOS in order to run that lab. I decided that it wasn’t for me right then and there.

Hardware can hold the software back. ISRs can’t operate voice interfaces in SD-WAN mode. You can’t get all the advanced features of the software until you pare the hardware down to the bare minimum needed to route data packets. If you need to have the router function as a TDM aggregator or an SBC/IPIPGW you realize that the router really should be dedicated to that purpose. Because it’s functioning more as a TDM platform than a packet router at that point.

Tom’s Take

The world of voice that I lived in five or six years ago is gone. It’s been replaced with texting and Slack/Spark/WebEx Teams. Voice is dying. Cell phones connect us more than we’ve ever been before, yet we don’t want to talk to each other. That means that the rows and rows of desk phones we used to use are falling by the wayside. And so too are the routers that used to power them. Now, we’re replacing those routers with SD-WAN devices. And when the time finally comes for us to replace those TDM devices, what will we use? That future is very murky indeed.

Invalid Information Element Contents Error Message

Problems with no apparent cause really drive me up the wall.  A customer called me with an issue that had no rhyme or reason for existing.  A group of phones at one site were not able to make outbound calls.  They were receiving calls from the PRI and were able to call other extensions with no problems.  Other phones that were using the same route patterns and gateways were able to call with no issues.  Troubleshooting the route pattern at the phone showed the digits landing on the gateway but a fast busy right after that.  It wasn’t until I drilled into things with my new favorite command debug isdn q931 that I found the real problem.  It looked something like this (numbers obscured):

Calling Party Number i = 0x0081, 'XXXX'
Plan:Unknown, Type:Unknown
Called Party Number i = 0x80, 'XXXXXXXXXXX'
Plan:Unknown, Type:Unknown
Sending Complete
ISDN Se0/0/0:23 Q931: RX <- RELEASE_COMP pd = 8 callref = 0x9F
Cause i = 0x82E404 - Invalid information element contents
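For anyone who wants to pick that cause value apart, here is a minimal Python sketch of how the octets in 0x82E404 break down under the standard Q.931 cause layout. The function name decode_cause and the trimmed-down lookup tables are purely illustrative and only cover a handful of values; none of this is a Cisco tool or output from the router.

```python
# Minimal sketch: decode the octets of a Q.931 Cause value such as 0x82E404.
# decode_cause and the partial tables below are illustrative, not a Cisco tool.

LOCATIONS = {
    0: "user",
    1: "private network serving the local user",
    2: "public network serving the local user",
    3: "transit network",
    4: "public network serving the remote user",
    5: "private network serving the remote user",
}

CAUSES = {
    16: "Normal call clearing",
    17: "User busy",
    28: "Invalid number format",
    100: "Invalid information element contents",
}


def decode_cause(hex_string):
    """Split a cause value like '0x82E404' into its fields."""
    digits = hex_string[2:] if hex_string.lower().startswith("0x") else hex_string
    octets = bytes.fromhex(digits)
    if len(octets) < 2:
        raise ValueError("need at least two octets")
    location = octets[0] & 0x0F      # low four bits of the first octet
    cause_value = octets[1] & 0x7F   # low seven bits of the second octet
    return {
        "location": LOCATIONS.get(location, "unknown"),
        "cause_value": cause_value,
        "cause_text": CAUSES.get(cause_value, "unknown"),
        "diagnostics": octets[2:].hex(),  # anything left over, as raw hex
    }


print(decode_cause("0x82E404"))
```

For the value in the debug above, the second octet 0xE4 masks down to cause 100, which matches the "Invalid information element contents" text the router printed, and the location nibble says the release came from the public network side.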

Hmmm.  Guess it’s off to Google.  Then I found this post from the Cisco VOIP Mailing list.   And after implementing a quick fix, everything turned out fine.  So what happened?

This particular site was the first time I used this excellent guide on rewriting outbound caller ID with Calling Party Transform Masks as opposed to doing it on the Route Lists or the Route Patterns.  In my haste to import all the phones, I missed a critical group of phones in my transform mask set.  As such, they weren’t sending a full 10-digit number to the PRI and the provider was rejecting the call.  I’ve never had this happen before, as I see customers that only send 4 digits to the PSTN sometimes.  I can see the allure of not allowing fewer than 10 digits on the PRI as a final check to ensure your station ID is correct for things like emergency services, and I like that idea a lot better than the provider just overwriting your station ID without warning.
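To make the failure concrete, here is a rough Python sketch of what a transform mask does to a short extension. It assumes the usual mask behavior where the mask is lined up from the right, an X keeps the original digit, and any other character overwrites it. The function names, the 405555XXXX mask, and the extension are all made up for the example, and the 10-digit check is only a stand-in for whatever the provider actually enforces.

```python
# Sketch only: apply_mask and provider_accepts are hypothetical helpers,
# and the numbers are fictional. This is not CUCM code.

def apply_mask(number, mask):
    """Apply a right-justified transform mask to a calling number."""
    result = []
    for offset in range(1, len(mask) + 1):
        mask_char = mask[-offset]
        if mask_char.upper() == "X":
            # Wildcard: keep the original digit if the number has one here.
            if offset <= len(number):
                result.append(number[-offset])
        else:
            # Literal digit in the mask overwrites this position.
            result.append(mask_char)
    return "".join(reversed(result))


def provider_accepts(calling_number):
    """Assumed provider rule: the calling number must be exactly 10 digits."""
    return calling_number.isdigit() and len(calling_number) == 10


masked = apply_mask("1234", "405555XXXX")
unmasked = "1234"  # what the phones I missed were sending

print(masked, provider_accepts(masked))
print(unmasked, provider_accepts(unmasked))
```

The phones with the mask send a full 10-digit number and pass the check. The phones without it send only their 4-digit extension, which is the kind of calling number this provider rejected.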

In the end, all is well and I now know where to track the issue down again.  Hopefully others might find this post enlightening.

Deciphering A PRI Turn-up Worksheet

One of the many wonderful things I get to do at $employer is work on voice systems and convince my customers to move from old clusters of analog trunks to new, shiny Primary Rate Interface (PRI) trunks to carry their calls.  PRIs are wonderful things, capable of taking up to 23 calls at a time, providing calling party and called party information, and dispensing with the need to have kludgy “rollover” analog trunks.  However, in my experience with turning these circuits on, the worksheet the telco provider sends out tends to look like Greek to most network enginee…rock stars.  It took a while for me to figure out what all the obscure acronyms meant, since the telco just assumed that I knew what they all stood for.  In an effort to provide help to my readers that may not be telco people, or might be getting forced into working on a PRI worksheet, I thought it might be helpful to provide some translations.

PIC/LPIC – Probably the most confusing acronym out of the bunch.  PIC stands for Primary Interexchange Carrier.  This is your long distance carrier.  This is a code that is kept in a database and when you need to make a long distance call, the telco consults this database to know whose network to send the call along.  A great explanation of long distance calls can be found HERE.  Conversely, the LPIC is the Local Primary Interexchange Carrier.  In other words, they are the company that handles your local toll (intraLATA) calls, the ones that aren’t long distance.  These two providers can be different, and in many cases they are.  In rural areas, the LPIC is the local telco, and the PIC is a larger carrier like AT&T or Verizon.  I’ve found that many companies will give you a deal if you specify them for both PIC and LPIC.  Most of the time, the PIC/LPIC choice will be whoever is installing the PRI for you, such as AT&T or Cox Communications.

DID – Another one that confuses people.  In this case, DID stands for Direct Inward Dial.  This is a huge change from the way an analog circuit works.  With an analog circuit (like my house), when you call my number it sends an electrical signal along the wire telling the device at the other end to ring.  When we hook this circuit up to a CUCM/CCME system, we usually have to configure Private Line Automatic Ringdown (PLAR) in order to be sure something gets triggered when the electrical signal arrives.  A PRI doesn’t use electric signals to trigger ringing.  Instead, it is configured with two different fields, the Calling Party and the Called Party.  In this example, the Calling Party is what is most often referred to as “Caller ID”.  The Called Party on a PRI is the DID.  This is a number that is delivered to the PRI and sent to the PBX equipment on the other end.  The name comes from the fact that these numbers are most often used to directly reach internal extensions without the need to reach a PBX operator or automated attendant.  The DID can be configured to ring a phone, a group of phones, or even a recording.  The numbers that used to belong to your analog circuits will usually be moved over to a group of DIDs and pointed at the PRI.

Outpulsed Digits – This one sounds straightforward.  Digits are being sent somewhere, right?  Remember that this worksheet is from the perspective of the service provider, so the outpulsed digits are what the provider is sending to your equipment.  You have tons of options, but most providers will usually limit your options to 4, 7, or 10 digits.  This is the number of digits that you get from the PRI to determine where your calls get sent.  Since I’m a big fan of using translation patterns on my systems to send the digits around, I tend to pick 7 or 10 digits.  In areas like Dallas, you may be forced to take 10 digits, as most metro areas are now mandatory 10-digit dialing. This also helps me avoid dial plan collisions when a phone number for a site is the same as a 4 digit extension internally.  If I get 7 digits coming from the PRI, I can be pretty sure that none of my extensions will have the same number.  If you don’t want to configure translation patterns and have a lot of DID numbers that correspond to phone extensions, you may want to consider a 4-digit outpulse setup from the telco.
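As a rough illustration of the collision problem, the Python sketch below models the telco handing over only the last N digits of the dialed number and then checks those digits against the internal dial plan. Everything in it is hypothetical: the function names, the fictional 555 number, the extension table, and the translation table are there only to show the idea, not how CUCM stores any of this.

```python
# Sketch only: a toy model of outpulsed digits. All names and numbers are made up.

# Hypothetical internal 4-digit extensions already in use.
INTERNAL_EXTENSIONS = {"1234": "warehouse phone"}

# Hypothetical translation table: received 7 digits -> extension to ring.
TRANSLATIONS = {"5551234": "2001"}


def outpulsed(dialed_number, digit_count):
    """Return the trailing digits the telco would send for this outpulse setting."""
    return dialed_number[-digit_count:]


def collides(received_digits):
    """True if the received digits match an unrelated internal extension."""
    return received_digits in INTERNAL_EXTENSIONS


dialed = "4055551234"  # fictional DID for the site's main line

four = outpulsed(dialed, 4)
seven = outpulsed(dialed, 7)

print(four, collides(four))      # lands on the warehouse phone by accident
print(seven, collides(seven))    # no 7-digit extensions, so no collision
print(TRANSLATIONS.get(seven))   # the translation decides where it really goes
```

With a 4-digit outpulse the main number would ring whatever happens to own extension 1234. With 7 digits nothing matches by accident, and the translation step picks the destination on purpose.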

NFAS – This one I don’t use very often, but it might come up.  NFAS stands for Non-Facility Associated Signaling.  This is used when you have more than one PRI configured in your environment.  With a 24-channel PRI, 23 of those channels are used to provide data/calls.  These are bearer channels or B-channels.  The 24th channel is used to send control and signalling data.  This is the Data Channel or the D-channel.  When you configure your environment with multiple PRIs, you have multiple D-channels to provide signalling.  However, you can pay a premium for each of those D-channels.  In an effort to save some money, the idea of NFAS allows one D-channel to provide the signalling for up to 20 PRI lines.  The catch is that if the D-channel goes down for any reason, so does the signalling for all the PRIs participating in the NFAS setup.  Usually, if you designate NFAS on your worksheet, the telco will make you choose whether or not to have a backup D-channel.  This is a good idea just in case, because you can never go wrong with a backup.
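The channel math is easy to sketch in Python. This assumes, as NFAS is normally described, that the timeslots freed up by dropping the extra D-channels get used as additional B-channels. The 20-PRI ceiling comes from the description above; the function name and constants are just for illustration.

```python
# Sketch only: count usable B-channels for a group of T1 PRIs.
# bearer_channels and the constants are illustrative names.

CHANNELS_PER_PRI = 24
MAX_NFAS_PRIS = 20  # limit quoted in the text above


def bearer_channels(pri_count, nfas=False, backup_d_channel=False):
    """Return how many B-channels are available for calls."""
    if pri_count < 1:
        raise ValueError("need at least one PRI")
    if not nfas:
        if backup_d_channel:
            raise ValueError("a backup D-channel only applies to NFAS")
        # Every PRI gives up one channel for its own D-channel.
        return pri_count * (CHANNELS_PER_PRI - 1)
    if pri_count > MAX_NFAS_PRIS:
        raise ValueError("too many PRIs for one NFAS group")
    if backup_d_channel and pri_count < 2:
        raise ValueError("a backup D-channel needs a second PRI")
    # One shared D-channel, plus one more if a backup is configured.
    d_channels = 2 if backup_d_channel else 1
    return pri_count * CHANNELS_PER_PRI - d_channels


print(bearer_channels(4))                                    # four standalone PRIs
print(bearer_channels(4, nfas=True))                         # one shared D-channel
print(bearer_channels(4, nfas=True, backup_d_channel=True))  # shared plus backup
```

For four PRIs that works out to 92, 95, and 94 B-channels respectively, so the backup D-channel costs you exactly one call path in exchange for not losing signalling on the whole group.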

Station Caller ID – I include this one because of more than one issue I’ve gotten into with a telco over it.  Like, a full-on yelling match.  If you are given the option of using the station ID as the outbound caller ID, use it.  You have much more control over how the caller ID is represented inside of CUCM than you do if the telco takes over for you.  If you don’t use the station ID as the caller ID, they will usually use the first DID number in your list, or set it to the billing number of the main telephone line.  As most PRIs I set up are usually for multi-site deployments, this creates issues.  People see the caller ID of the headquarters or the administration building instead of the individual unit number.  They call that number back expecting to get their child’s school (for instance), but instead get the board of education building.  Some telcos will go to war with you about the inherent danger in letting the user specify their station ID for use with emergency services like 911 or 999.  I usually tell the telco rep to get stuffed, since my route lists will get the Caller ID more correct than their ham-handed attempts to just slap a useless billing ID number on the PRI and call it good.  If they pick a DID number that doesn’t appear in the phone book or in the PS/ALI database for the local emergency service provider, then you can get into a liability issue.  Better to just check the “station ID” box and build your system right.

Tom’s Take

These were the most confusing parts of the PRI worksheets that I’ve filled out from multiple providers.  I hope that my explanations help if and when you need to fill out your own sheet.  If it saves time having to Google what LPIC and NFAS mean, then I’ll sleep happy knowing that you were able to conserve some of your Google-fu.