Measuring Voice Quality

May 27, 2007

Historically, Mean Opinion Score (MOS) was measured subjectively by having people rate the quality of a set of standard sentences recorded with both male and female voices. The test subjects were asked to rate each sample according to the following scale:

  • 5. Excellent with no perceptible impairments
  • 4. Good with barely perceptible but not annoying impairments
  • 3. Fair with perceptible and slightly annoying impairments
  • 2. Poor with annoying but not objectionable impairments
  • 1. Bad with very annoying and objectionable impairments

The average or mean of these subjective opinions determined the rating or score for the sample; hence the designation as Mean Opinion Score. Because the MOS is an average, the maximum score possible is generally considered to be about 4.5. The PSTN regularly achieves an MOS rating of 4.3.
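The averaging step can be sketched in a few lines of Python. This is only an illustration: the ratings list and the function name mean_opinion_score are hypothetical, not data or terminology from any actual listening test.

```python
from statistics import mean

# Hypothetical ratings given by eight listeners to one recorded sample,
# each on the five-point opinion scale.
ratings = [4, 5, 4, 3, 4, 5, 4, 4]

def mean_opinion_score(scores):
    """Return the arithmetic mean of the individual opinion scores."""
    if not scores:
        raise ValueError("at least one rating is required")
    if any(s < 1 or s > 5 for s in scores):
        raise ValueError("ratings must lie between 1 and 5")
    return mean(scores)

mos = mean_opinion_score(ratings)
print(f"MOS = {mos:.3f}")
```

Because a panel of listeners rarely agrees unanimously on the top rating, the average for even a very good sample tends to land below the top of the scale.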

Other techniques have since been created to determine voice quality from objective measurements of end-to-end network performance data. For VoIP communications, the preferred technique is the ITU-T G.107 E-Model, which takes into account both internal and external impairments. Measuring quality is so important that many of the voice enhancement devices now being used to improve voice quality also have built-in support for continuous monitoring of actual calls based on the E-Model.
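The E-Model produces a transmission rating factor, R, on a scale of 0 to 100, which is then mapped to an estimated MOS. The sketch below implements the R-to-MOS conversion as it is commonly cited from ITU-T G.107; it does not compute R itself, which depends on many network and equipment parameters. The function name r_to_mos is an assumption made for this example.

```python
def r_to_mos(r):
    """Map an E-Model rating factor R to an estimated MOS.

    Uses the conversion commonly cited from ITU-T G.107:
    MOS = 1 for R <= 0, MOS = 4.5 for R >= 100, and a cubic in between.
    """
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6

# A few illustrative R values (chosen for the example, not measured).
for r in (50, 70, 80, 93.2):
    print(f"R = {r:5.1f} -> estimated MOS = {r_to_mos(r):.2f}")
```

Note that the mapping tops out at 4.5, which matches the practical MOS ceiling described above.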

At best, a packet-based network, even with extensive IP QoS provisions, can only simulate the physical circuits employed end-to-end in the PSTN. But PSTN carriers know full well that even these physical circuits do nothing to remove the many impairments to acceptable levels of call quality. So while IP QoS may be necessary, no amount of IP packet control is sufficient for achieving satisfactory voice quality. “Good” or “Excellent” voice quality is possible only in VoIP networks where each and every impairment to voice quality is successfully removed, just as it is in the PSTN.

Understanding the Impairments to Voice Quality

This section examines six specific impairments to voice communications, including both what causes them and what can be done, in an IP network, to minimize or eliminate their impact on voice quality. Individually, these impairments degrade voice quality, sometimes substantially. Collectively they jeopardize the investment businesses are making in VoIP solutions.

Note throughout the discussion how IP QoS provisions fail to address any of these impairments, and indeed, treat noise, echoes and garbled speech as if they were part of a normal VoIP call. In other words: IP QoS treats both the desirable speech elements and these undesirable impairments in VoIP traffic with the same high priority, ensuring that everything always comes through “loud and clear.” Garbage in, garbage out.

Acoustic Echo – Acoustic echo is caused by poor audio isolation between the microphone and speaker in user devices. Acoustic echo becomes more problematic with VoIP-induced packet delay, making the echo more noticeable. The problem is quite common in VoIP networks where an estimated 10-15 percent of all calls suffer from this impairment, potentially in quite annoying ways. A good acoustic echo control solution will suppress a wide range of bidirectional echo variations using sophisticated algorithms that consider both speaker volume and path loss levels to effectively eliminate acoustic echoes for virtually all handsets and speakerphones.

Hybrid Echo – Hybrid or line echo is an electrical signal reflection that occurs during a two-to-four-wire conversion in the analog tail circuit at the edge of the PSTN. Although hybrid echo is not generated in a pure VoIP network, the majority of VoIP calls continue to originate or terminate in the PSTN or a cellular network, or via the PSTN to another VoIP network. A robust hybrid echo cancellation solution will completely eliminate hybrid echo bidirectionally for VoIP calls that traverse a PSTN hybrid. The best solutions offer fast and stable convergence, and fully compensate for the very long network tail delays that can be encountered end-to-end.
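To make "convergence" concrete, here is a minimal sketch of an adaptive echo canceller using the normalized least-mean-squares (NLMS) algorithm, a common textbook approach. It is not claimed to be what any particular product uses. The echo path, filter length and step size are assumed values, and the demo contains no near-end speech, so it ignores the double-talk handling a real canceller needs.

```python
import random

def nlms_echo_canceller(far_end, near_end, taps=8, mu=0.5, eps=1e-6):
    """Subtract an adaptive estimate of the far-end echo from the near-end signal.

    Returns the echo-reduced signal (the per-sample error) and the final weights.
    """
    weights = [0.0] * taps
    history = [0.0] * taps          # most recent far-end samples, newest first
    cleaned = []
    for x, d in zip(far_end, near_end):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate   # what remains after removing the estimated echo
        power = eps + sum(h * h for h in history)
        weights = [w + mu * error * h / power for w, h in zip(weights, history)]
        cleaned.append(error)
    return cleaned, weights

# Synthetic demo: white-noise far-end signal and a made-up echo path (assumed values).
random.seed(0)
far_end = [random.uniform(-1.0, 1.0) for _ in range(2000)]
echo_path = [0.0, 0.0, 0.5, 0.3, 0.1]
near_end = [
    sum(echo_path[k] * far_end[n - k] for k in range(len(echo_path)) if n - k >= 0)
    for n in range(len(far_end))
]
cleaned, weights = nlms_echo_canceller(far_end, near_end)

# Compare echo energy with residual energy over the last 200 samples.
echo_energy = sum(s * s for s in near_end[-200:])
residual_energy = sum(s * s for s in cleaned[-200:])
print(f"echo energy {echo_energy:.3f}, residual energy {residual_energy:.3e}")
```

The filter must be at least as long as the echo path it models, which is why long tail delays demand more taps and make fast, stable convergence harder to achieve.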

Ambient or Background Noise – Background noise is present everywhere, and noise in the form of static, hum, crosstalk and “popping” sounds can also be introduced by various packet processing systems. When the noise level is sufficiently high, voice intelligibility and quality suffer substantially. In most VoIP networks, this undesirable noise is simply incorporated right into the encoded packets along with the voice signal. A good adaptive noise cancellation solution should utilize a high-precision noise reduction algorithm that removes the noise components of a call without reducing talker volume.

“Silence” – One might think that complete silence is the ultimate goal of noise cancellation. But eliminating all noise becomes a problem because it has the effect of making users perceive the line has gone “dead” and prompting the familiar question: “Are you still there?” Some codecs (the coder/decoders that convert analog voice to digital packets) consequently provide for comfort noise generation, and some voice enhancement devices incorporate an adjustable noise “floor” feature.

Audio Level Mismatch – The volume level of calls between two VoIP endpoints is often unbalanced with one side being higher than the other. This can be caused by using devices from different manufacturers or from improper transcoding among different types of codecs. Users may be able to compensate for this impairment somewhat by adjusting volume settings, but such hassles are never imposed by the PSTN. A good automatic level control solution will, therefore, dynamically detect level imbalances and automatically amplify and/or attenuate the signals to bring both sides of the call to the same, specified volume. Some solutions also prevent clipping and codec distortions, and help compensate for background noise by improving the signal-to-noise ratio.
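The core of level control can be sketched as: measure the level of a block of audio, compare it with a target, and apply a bounded gain. The example below is a simplified illustration only. The target of -20 dBFS, the 12 dB gain limit and all function names are assumptions, and a real device would also smooth gain changes over time and avoid amplifying noise during pauses.

```python
import math

def rms_dbfs(samples):
    """Level of a block of float samples (full scale = 1.0) in dBFS."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)

def level_gain_db(samples, target_dbfs=-20.0, max_gain_db=12.0):
    """Gain in dB that moves the block toward the target, limited to +/- max_gain_db."""
    level = rms_dbfs(samples)
    if level == float("-inf"):
        return 0.0  # do not amplify silence
    gain = target_dbfs - level
    return max(-max_gain_db, min(max_gain_db, gain))

def apply_gain(samples, gain_db):
    """Scale samples by gain_db and hard-limit them to full scale."""
    factor = 10.0 ** (gain_db / 20.0)
    return [max(-1.0, min(1.0, s * factor)) for s in samples]

# Two synthetic test tones standing in for a quiet and a loud talker.
quiet = [0.05 * math.sin(2 * math.pi * i / 40) for i in range(400)]
loud = [0.9 * math.sin(2 * math.pi * i / 40) for i in range(400)]

for name, block in (("quiet", quiet), ("loud", loud)):
    gain = level_gain_db(block)
    after = rms_dbfs(apply_gain(block, gain))
    print(f"{name}: {rms_dbfs(block):.1f} dBFS, gain {gain:+.1f} dB, after {after:.1f} dBFS")
```

The gain limit is a design choice: it keeps the controller from boosting a very quiet signal so far that background noise becomes the dominant sound.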

Garbled Speech – There are several problems that can cause speech to become “garbled” or, worse yet, unintelligible. The most common cause is the use of low bit-rate codecs that reduce the sharpness or clarity of speech. The codec standard used in the PSTN converts speech into a 64 kbps digital stream, which matches the size of a standard PSTN circuit. But speech quality deteriorates with low bit-rate codecs that generate streams as low as 8 kbps. There is a tradeoff involved here, of course, because lower bit-rate codecs enable the available network bandwidth to handle more calls.
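The bandwidth side of this tradeoff can be estimated with simple arithmetic. The sketch below assumes 20 ms of speech per packet, 40 bytes of IPv4, UDP and RTP headers per packet with no header compression, and a 1,536 kbps link. It ignores link-layer overhead and counts one direction only. All of these figures and the function names are assumptions chosen for illustration.

```python
HEADER_BYTES = 40        # IPv4 (20) + UDP (8) + RTP (12), no header compression
PACKET_INTERVAL_MS = 20  # assumed amount of speech carried in each packet

def per_call_kbps(codec_kbps, interval_ms=PACKET_INTERVAL_MS, header_bytes=HEADER_BYTES):
    """One-way IP-layer bandwidth of a call: codec payload plus per-packet headers."""
    packets_per_second = 1000 / interval_ms
    header_kbps = header_bytes * 8 * packets_per_second / 1000
    return codec_kbps + header_kbps

def calls_per_link(link_kbps, codec_kbps, **kwargs):
    """Whole number of simultaneous calls that fit in link_kbps."""
    return int(link_kbps // per_call_kbps(codec_kbps, **kwargs))

LINK_KBPS = 1536  # assumed link capacity for the example

for codec_kbps in (64, 8):
    print(
        f"{codec_kbps} kbps codec: {per_call_kbps(codec_kbps):.0f} kbps per call, "
        f"{calls_per_link(LINK_KBPS, codec_kbps)} calls on a {LINK_KBPS} kbps link"
    )
```

Under these assumptions the 8 kbps codec carries roughly three times as many calls as the 64 kbps codec rather than eight times as many, because the packet headers cost the same regardless of codec.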

Implied in the above discussion is that the enterprise has the freedom of choice to select an optimal codec. But therein lies another problem: How to deal with the more than 20 different codec types, many of which are certain to be used with calls external to the enterprise. To provide a means of interoperability the industry employs transcoding, a digital signal processing technique used for converting among the many different types of codecs. Ideally, the transcoding process would be integral to the voice enhancement device being used to eliminate all other impairments to voice quality.

Garbled speech is also caused or made worse by the inevitable packet loss, delay and jitter (the variations in delay) found in all IP networks, including those with IP QoS provisions. For this reason, most codec standards specify some means to compensate for packet loss, and these packet loss concealment (PLC) or packet loss recovery (PLR) algorithms do indeed help—up to a point. For example, the PLC algorithm in codecs used with the PSTN can recover reasonably well from a packet loss of up to 5 percent. But with very low bit-rate codecs, even with their built-in PLC or PLR algorithms, packet loss rates of less than 1 percent are normally perceptible, and often annoying. To compensate for jitter, most VoIP systems incorporate dejitter buffering to eliminate the variations. But these dejitter buffers work, in effect, by imposing the maximum packet delay, which exacerbates echo impairments.
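Jitter and the delay a dejitter buffer must add can both be estimated from packet send and arrival times. The sketch below uses the running interarrival jitter estimator defined for RTP in RFC 3550, expressed here in milliseconds rather than RTP timestamp units, along with a deliberately naive buffer-depth calculation. The packet timings and function names are made up for the example.

```python
def interarrival_jitter(packets):
    """Running jitter estimate in the style of RFC 3550.

    packets: iterable of (send_time_ms, arrival_time_ms) in arrival order.
    """
    jitter = 0.0
    prev_transit = None
    for sent, arrived in packets:
        transit = arrived - sent
        if prev_transit is not None:
            d = abs(transit - prev_transit)
            jitter += (d - jitter) / 16.0   # smoothing gain of 1/16, as in RFC 3550
        prev_transit = transit
    return jitter

def buffer_depth_ms(packets):
    """Naive dejitter depth: hold every packet until the slowest one could have arrived."""
    transits = [arrived - sent for sent, arrived in packets]
    return max(transits) - min(transits)

# Hypothetical trace: packets sent every 20 ms with varying one-way delay.
trace = [(0, 50), (20, 86), (40, 92), (60, 110), (80, 170)]
print(f"jitter estimate: {interarrival_jitter(trace):.2f} ms")
print(f"buffer depth to absorb all variation: {buffer_depth_ms(trace)} ms")
```

The second function shows why dejitter buffering adds delay: absorbing all of the variation means every packet waits as long as the slowest one, and that extra delay makes any echo more noticeable.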

There are two technologies specifically designed to address these and other issues that cause garbled speech. The first is intelligent packet restoration, which utilizes a predictive speech model to reconstruct a missing packet’s voice payload. This powerful capability can often enhance voice quality substantially, even in congested IP networks experiencing unusually high packet loss rates. The second technology is enhanced voice intelligibility that improves the quality of speech by rebalancing the spectral signature, compensating for ambient noise, and boosting the critical speech formants (the soft sounds) to provide increased clarity for all codec types, including those with low bit-rates.

End-to-End Voice Quality Beyond the Enterprise Network

Voice quality must be considered beyond the boundaries of the enterprise network for a very simple reason: many calls are placed to or from somewhere on the outside. Businesses regularly communicate with customers, suppliers and business partners, and often outsource entire functions to separate organizations. Even within the enterprise, many workers may prefer a mobile phone as their primary means of communication; and employees routinely call—or get calls from—coworkers while at home.

Given this reality, no enterprise-wide VoIP system can be considered an island. For communications with “outsiders,” the enterprise has no control over anything except, of course, its own efforts to eliminate external impairments to call quality. For communications among multiple offices within the enterprise, there are basically three options. One is to treat the intervening WAN as an “IP pipe,” in which case the enterprise takes full responsibility for voice quality. The second option is to outsource the entire IP telephony solution to a service provider, in which case the service provider is responsible for handling everything end to end, hopefully in a satisfactory manner. The third choice involves a myriad of possibilities in between in-house and outsourced implementations.

As VoIP becomes more prevalent, there will be increasing opportunities for the enterprise to leverage external services without sacrificing control. VoIP technology is gradually displacing its predecessor, time division multiplexing, in the PSTN. Mobile carriers will begin to do the same with the advent of third- and fourth-generation broadband wireless technologies. Even cable companies, or multiple system operators that have long offered VoIP services to their residential subscribers, are now targeting businesses. There are also the Managed Virtual Network Operators that add value by bundling a variety of services from multiple providers to create a complete, managed solution.

Regardless of the arrangement—now or in a distant future when VoIP is ubiquitous—special provisions will be needed to mitigate impairments to voice quality. People will continue to call from cars or street corners (background noise), use inexpensive handsets or speakerphones (acoustic echo and audio level mismatch), or use low bit-rate codecs to support more calls with limited bandwidth (garbled speech). The enterprise therefore will remain responsible for eliminating impairments to voice quality either directly or indirectly, via the choice of service providers, for the foreseeable future.

Poor voice quality jeopardizes the potential cost savings of IP telephony by undermining productivity and customer satisfaction. IP QoS is necessary but not sufficient to provide acceptable levels of voice quality. Separate provisions are, therefore, also required to mitigate the adverse impact of impairments to voice quality, whether internal or external to the enterprise VoIP infrastructure.

The techniques needed to minimize or eliminate these impairments were pioneered in the PSTN, and have since proven fully effective at improving voice quality. The same provisions are now available for VoIP communications, and can be implemented quite easily and cost-effectively by the enterprise in a purpose-built voice enhancement device (VED). The primary advantage of a VED is its integration of multiple voice quality enhancement features, including adaptive noise cancellation, acoustic and hybrid echo cancellation, and automatic level control—all in a single, scalable system. Some VEDs also include support for features that compensate for inevitable packet loss, and a few even offer transcoding among multiple codec types and built-in E-Model monitoring of call quality.

Many organizations are able to cost-justify the investment in a voice enhancement device based solely on its ability to deliver satisfactory voice quality for codecs with lower bit-rates. These codecs increase the number of calls that can simultaneously traverse the existing enterprise network, thereby postponing a costly upgrade in bandwidth. And reducing costs is, after all, one of the main goals of IP telephony.

About Karl Brown: Brown brings more than 15 years of experience in network communications to his position at Ditech Networks, spanning the wireline, wireless, voice, and data industries, and has worked for both equipment vendors and service providers. Prior to joining Ditech as Director of Product Marketing and Product Management, Brown held marketing positions at ANDA Networks, Jetstream Communications, and Nortel Networks.

As Vice President of Marketing, Brown is responsible for Ditech’s market strategy, product management, product marketing, and corporate marketing programs, and focuses on the development of new markets for the company’s core technologies.

Also by Karl Brown: VoIP: Quality is King.
