Reversing the voice quality gap

A guest post from Scott Kurtz, president, DSP Soundware

Human evolution is practically a constant when compared to the ever-increasing rate at which technology evolves. Our input methods (sight, hearing, smell, taste, and touch) are what they are. Until technology enables our brains to interface directly with our electronics, our interface to the world around us will be via our five senses.

In the television, radio, and digital media spaces, we have done plenty to improve voice and sound quality over the years. But in the area of telephony we have actually taken steps back, especially in the early days of Voice-over-IP and cellular. Beyond that, improving hands-free communication such as speakerphone and hands-free cellular has always been challenging when it comes to speech quality. “Environmental issues” such as background noise, acoustic echo, and reverberation reduce not only human intelligibility but also the accuracy of automatic speech recognition algorithms. When it comes right down to it, these degradations simply sour the human communication experience.

When we instruct our voice-controlled thermostats to change temperature, we want the temperature to be set correctly on the first try. Imagine this: You have just done a load of dark laundry and your spouse is fuming because their favorite clothes have been bleached. It turns out that your dryer’s screeching motor bearings caused your newfangled washing machine’s speech recognizer to interpret the screech as “bleach.” In a Seinfeld-esque way, you plead with your spouse, “It was the screech; it was the screech,” to no avail. Now you’re wishing that your washing machine had some noise reduction to help its speech recognizer.

But I digress. Back in 2005, I gave a presentation titled “Bridging the Voice Quality Gap” at the Texas Instruments DSP Champs conference. The premise of the presentation was that Voice-over-IP and cellular gave us worse voice quality than did the ubiquitous wireline telephony, hence the term “voice quality gap.” We had taken a step in the wrong direction. The point of the presentation was that, through digital signal processing, we could not only narrow the gap but also strive to reverse the gap in favor of the newer technologies.

In the early days of VoIP and digital cellular, much of the focus was on compressing speech into limited bandwidth channels. That compression was one of the reasons the voice quality soured in the first place. Since that time, bandwidth has become far less constrained and HD voice is finally being introduced as a result. But that still leaves us with the “environmental issues” of background noise, acoustic echo, and reverberation. These environmental issues can be quite challenging because the environment is different for each user. Each user’s environment comprises the hardware, speaker, microphone, enclosure, and surrounding acoustics.

So, nearly ten years after my 2005 presentation, I formed a new company – DSP Soundware – with the mission to develop voice and sound quality enhancement technology, algorithms, and software.

Cleaning up that signal!

Voice and sound degradations are shown pictorially in the diagram below. The diagram shows a number of people in a conference room. There is a speakerphone on the conference room table and a fan on the right wall representing a noise source. The person standing at the whiteboard is talking. The blue path shows his speech travelling directly to the speakerphone. The red path shows the talker’s speech reflecting off the left wall. (It reflects off all surfaces, but only one reflection is shown.) The green path shows the output of the speakerphone reflecting back into its microphone. And finally, the brown path shows the noise of the fan as it travels to the speakerphone. At the other end of the phone call (not shown) is somebody trying to understand what is being said in the presence of all the distortion.
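To make the four paths concrete, here is a minimal Python sketch of what the speakerphone microphone picks up. It is only an illustration: the function names, delays, and gains are hypothetical, and each path is reduced to a single delayed, scaled copy of its source, whereas a real room produces many overlapping reflections.

```python
def delayed(signal, delay, gain=1.0):
    """Return `signal` delayed by `delay` samples and scaled by `gain` (same length)."""
    n = len(signal)
    return [0.0] * min(delay, n) + [gain * s for s in signal[: max(n - delay, 0)]]


def microphone_signal(talker, far_end, fan_noise,
                      direct_delay=2, wall_delay=7, wall_gain=0.4,
                      echo_delay=1, echo_gain=0.6):
    """Sum the four paths from the diagram (all delays and gains are made-up values)."""
    direct = delayed(talker, direct_delay)               # blue path: talker to microphone
    reflection = delayed(talker, wall_delay, wall_gain)  # red path: one wall reflection
    echo = delayed(far_end, echo_delay, echo_gain)       # green path: loudspeaker to microphone
    # brown path: the fan noise is simply added at the microphone
    return [d + r + e + n
            for d, r, e, n in zip(direct, reflection, echo, fan_noise)]
```

Only the direct (blue) term is wanted at the far end of the call; the other three terms are the degradations.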

At DSP Soundware, we have developed voice and sound quality enhancement algorithms to combat these degradations. Our algorithms include acoustic echo cancellation, noise reduction, dereverberation, acoustic beamforming, and active noise cancellation. Referring back to the same diagram (above), we show pictorially how our algorithms remove the distortion in an attempt to recreate the undistorted signal. As the signal passes down through the algorithms, you can see the color-coded distortion circles get “filtered” out, leaving only the light blue circle, which represents the clean speech.
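For readers who want a feel for one of these stages, the sketch below shows acoustic echo cancellation using the textbook normalized LMS (NLMS) adaptive filter. This is not DSP Soundware's algorithm; it is a generic illustration, and the function name, tap count, and step size are assumptions chosen for the example.

```python
def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-6):
    """Subtract an adaptive estimate of the loudspeaker echo from the mic signal.

    far_end: samples sent to the loudspeaker (the echo reference)
    mic:     samples captured by the microphone
    Returns the residual signal after the estimated echo is removed.
    """
    w = [0.0] * taps   # adaptive estimate of the echo path
    x = [0.0] * taps   # most recent far-end samples, newest first
    out = []
    for f, m in zip(far_end, mic):
        x = [f] + x[:-1]
        echo_est = sum(wi * xi for wi, xi in zip(w, x))
        e = m - echo_est                        # near-end speech plus uncancelled echo
        norm = sum(xi * xi for xi in x) + eps   # input power; eps avoids division by zero
        step = mu * e / norm
        w = [wi + step * xi for wi, xi in zip(w, x)]
        out.append(e)
    return out
```

The filter learns the echo path from the far-end signal itself, so whatever remains in the output after convergence is mostly the near-end talker. A practical canceller also has to handle double-talk, long room responses, and changing acoustics, none of which this sketch attempts.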

The TI DSP advantage

Once we had the algorithms, we needed to turn them into usable embedded software products, and software needs to run on processors. I have been working with TI DSPs since 1984 when TI’s first DSP, the TMS32010, was introduced. TI has consistently had the best silicon combined with the best tools not only for an algorithm developer like me but also for end product developers. Today, TI offers the most diverse set of DSPs, microprocessors, and microcontrollers out there. And TI has a strong network of designers and developers that support its products.

For me, the decision to develop our software for use on TI products was not only a technical decision but also a business decision. TI silicon is a top-notch choice for engineers who develop voice and audio applications.

Today we use TI’s OMAP-L138/TMS320C6748 LCDK board to demonstrate our algorithms. The LCDK board has audio input and output, an obvious requirement for demonstrating voice quality enhancement. But it’s also so inexpensive (the LC stands for low-cost) that any customer can purchase one, load our demonstration, and listen for themselves. Our first such demonstration shows off our noise reduction algorithm. The user can feed noisy audio into the LCDK; our software processes the signal and outputs the noise-reduced version. The user can vary the noise reduction algorithm’s control parameters through a simple interface and listen to the result.

For the foreseeable future, voice communication is here to stay. Voice quality can be enhanced through the use of digital signal processing. By combining our algorithms and TI processors, we give equipment designers the building blocks to improve voice quality in their end products. To that end, I hope we can help you.