If you work in the voice or video world, you’ve undoubtedly heard about Mean Opinion Scores (MOS). MOS is a rough way of ranking the quality of the sound on a call. It’s widely used to determine the over experience for the user on the other end of the phone. MOS represents something important in the grand scheme of communications. However, MOS is quickly becoming a crutch that needs some explanation.
That’s Just Like Your Opinion
The first think to keep in mind when you look at MOS data is that the second word in the term is opinion. Originally, MOS was derived by having selected people listen to calls and rank them on a scale of 1 (I can’t hear you) to 5 (We’re sitting next to each other). The idea was to see if listeners could distinguish when certain aspects of the call were changed, such as pathing or exchange equipment. It was an all-or-nothing ranking. Good calls got a 4 or even rarely a 5. Most terrible calls got 2 or 3. You take the average of all your subjects and that gives your the overall MOS for your system.
When digital systems came along, MOS took on an entirely different meaning. Rather than being used to subjectively rank call quality, MOS became a yardstick for tweaking the codec used to digitally transform analog speech to digital packets. Since this has to happen in order for the data to be sent, all digital calls must have a codec somewhere. The first codecs were trying to approximate the quality of a long distance phone call, which was the gold standard for quality. After that target was reached, providers started messing around the codecs in question to reduce bandwidth usage.
G.711 is considered the baseline level of call quality from which all others are measure. It has a relative MOS of 4.1, which means very good voice quality. It also uses around 64 kbps of bandwidth. As developers started playing with encoding schemes and other factors, they started developing codecs which used significantly less bandwidth and had almost equivalent quality. G.729 uses only 8 kbps of bandwidth but has a MOS of 3.9. It’s almost as good as G.711 in most cases but uses an eighth of the resources.
MOS has always been subjective. That was until VoIP system providers found that certain network metrics have an impact on the quality of a call. Things like packet loss, delay, and jitter all have negative impacts on call quality. By measuring these values a system could give an approximation of MOS for an admin without needing to go through the hassle of making people actually listen to the calls. That data could then be provided through analytics dashboards as an input into the overall health of the system.
Like A Rolling Stone
The problem with MOS is that it has always been subjective. Two identical calls may have different MOS scores based on the listener. Two radically different codecs could have similar MOS scores because of simple factors like tonality or speech isolation. Using a subjective ranking matrix to display empirical data is unwieldy at best. The only reason to use MOS as a yardstick is because everyone understands what MOS is.
Enter R-values. R-values take inputs from the same monitoring systems that produce MOS and rank those inputs on a scale of 1 – 100. Those scores can then be ranked with more precision to determine call quality and VoIP network health. A call in the 90s is a great call. If things dip in the 70s or the 60s, there are major issues to identify. R-values solve the problem of trying to bolt empirical data onto a subjective system.
Now that communications is becoming more and more focused on things like video, the need for analytics around them is becoming more pronounced. People want to track the same kinds of metrics – codec quality, packet loss, delay, and jitter. But there isn’t a unified score that can be presented in green, yellow, and red to let people know when things are hitting the fan.
It has been suggested that MOS be adapted to reference video in addition to audio. While the idea behind using a traditional yardstick like MOS sounds good on the surface, the reality is that video is a much more complicated thing that can’t be encompassed by a 50-year-old ranking method like MOS.
Video calls can look horrible and sound great. They can have horrible sound and be crystal clear from a picture perspective. There are many, many subjective pieces that can go into ranking a video call. Trying to shoehorn that into a simple scale of 5 values is doing a real disservice to video codec manufacturers, not to mention the network operators that try and keep things running smoothly for video users.
R-value seems to be a much better way to classify analytics for video. It’s much more nuanced and capable of offering insight into different aspects of call and picture quality. It can still provide a ranked score for threshold measuring, but that rank is much more likely to mean something important for each number as opposed to the absolute values present in MOS.
MOS is an old fashioned idea that tries valiantly to tie the telecom of old to the digital age. People who understood subjective quality tried to pair it with objective analytics in an effort to keep the old world and the new world matched. But even communications is starting to eclipse these bounds. Phone calls have given way to email, texting, and video chats. Two of those are asynchronous and require no network reliability beyond up or down. Video, and all the other real-time digital communications, needs to have the right metrics and analytics to provide good feedback about how to improve the experience for users. And whatever we end up calling that composite metric or ranked algorithmic score, it shouldn’t be called MOS. Time to let that term grow some moss in the retirement bin.