AER hands-free tuning issues

We are experiencing multiple issues tuning aer_c64Px_obj_17_0_0_0_Linux for use in our product in hands-free mode; however, we have had success tuning the algorithm for handset mode.

I’ve followed the tuning guide and tried setting everything as aggressively as possible, but even with no activity at the far end (and the noise guard active), loud shrieks are coming back.

The main issue I am having is getting the NLP to operate suitably, and the answers to the following questions may help me configure it properly. What I am trying to achieve is maximum NLP aggressiveness (forcing half-duplex operation if necessary) with no echo or artifact breakthrough, so that I can then start backing it off.

  • What exactly does “Hands Free mode processing” do, apart from allowing increased tail length? Does it interact with the NLP?

  • Enabling frequency-domain NLP causes the speech to become garbled and distorted, so I can’t use it. Any ideas why this might be happening? (I’ve tried changing the FDNLP delay setting and got some minor improvement, but it was still unusable.)

  • I’m unable to see much impact on operation when changing the combined loss target; how does this relate to NLP_linattn_Max_ERLE? (I’ve read the description in the developer guide but haven’t seen much change in performance when changing the Max_ERLE parameter.) What should I set NLP_linattn_Max_ERLE to for the linear attenuator to operate at maximum aggressiveness all the time? Are there any other requirements for making the linear attenuator work, other than setting these parameters? We have disabled the AER’s built-in AGC; does that have an impact?

  • I haven’t managed to work out exactly what Thresh_Rx_NLD does. I understand that it is the threshold above which non-linear distortion is expected, but what is supposed to happen when the threshold is exceeded? Does this only work when hands-free mode processing is enabled?

  • Changing the value of Thresh_Tx_Noise seems to massively affect the intelligibility of speech and the echo cancellation performance. The developer guide mentions it is used to make decisions regarding NLP break-in. How does it interact with the NLP?

Regards,

Ed

  • Hi,

    We will work on this and will get back to you with an update ASAP.

    Thanks & regards,
    Sivaraj K
  • I have started getting information from a local TI expert.

    I will update as soon as I have new information.

  • Hello Ed,


    Adjusting an acoustic echo canceller for handsfree operation is a complex process, but it is also fun once you get into it. So, let's start. :)

    It seems that you have read our tuning guide and documentation. For the adjustments to make sense, you need to have done some basic measurements of your handsfree enclosure/setup first. Could you provide us with a bit more information about your handsfree HW characteristics so we can see where the problem might be?

    Also, is this the first time you are integrating an acoustic echo canceller into a product, or do you have considerable experience doing this on other products?

    That would help us choose the right way of helping you out. The questions below may look trivial if you are an expert, or they can serve as a starting point for us to guide you through the tuning if you are doing it for the first time.

    Here is what we need to know (as much as you can tell us):

    • What is your product? A phone, some other communication device?
    • What is the distance between microphone and speaker?
    • How are the microphone and speaker oriented (in which directions)?
    • Do you have a solid enclosure for the microphone and speaker? Is it made of stiff material or one that can easily bend and vibrate?
    • Do you have any mechanical barrier inside the enclosure to isolate speaker from the microphone acoustically (e.g. back volume)?
    • Do you have a small power amplifier for your speaker (how many watts)? What is your speaker's power rating?
    • What type of mic do you have (assuming omni-directional)? Is it enclosed within any directional cavity, etc.?
    • Do you have rubber or felt legs at the bottom of your enclosure to reduce vibrations when speaker operates at high volume?

    This first set of questions helps us understand a bit better how easy or hard it might be to get good handsfree performance from your handsfree enclosure.

    The next set of questions provides us with a bit more information that can help us guide you through the tuning process:

    • Did you perform any acoustic measurements on your enclosure (e.g. TIA-810/920 or similar)? Please describe and provide relevant data (frequency characteristics of the transducers as measured within the enclosure in a "quiet room"). Which equipment did you use? Did you have access to anechoic chamber or a "quiet" room?
    • If you could not perform the industry standard measurements, did you perform the simple ones that we also described in our documentation? (using cheaper equipment) Please describe your findings.
      • Did you adjust nominal gains for the mic and speaker? Please provide relevant measured levels.
    • Did you perform nonlinear echo path analysis using the described procedure and our "nonlin" tool? This is the most important measurement for assessing the possible depth of convergence that you might be able to achieve. It also gives insight into areas where EQ may need to be additionally applied, how much DRC may need to be used in the Rx path, etc.
    • Did you integrate the DRC in the Rx path (mandatory) and Tx path (desirable)?
    • What was your configuration during the measurements (for AGC and AER)?
    • Did you configure EQ in Tx/Rx? How did you get the EQ parameters? Did you use our EQBQ design tool by providing measured and desired responses from the transducers? Please provide design parameters and responses used, if any.
    • The FDNLP is what you want to use. In general, a frequency-based approach will give you better handsfree performance when adjusted properly.
    • Did you get so far as to create speaker volume tables for NLP parameters?

    Finally, a few more questions:

    • What is the environment in which the product is supposed to operate in handsfree operation (e.g. office, conference room, intercom, hospital, traffic tunnel, etc.)?
    • What are your priorities in terms of the following (please order them by importance):
      • Loudness (very loud sound)
      • Duplex (ability to clearly hear the other side while talking at the same time). Keep in mind that half-duplex phones can have very clear performance as long as it is easy to "break in" and interrupt the other side; full duplex may exhibit some level of attenuation or distortion during double-talk, but that can be optimized properly.
      • Echo leakage (desire not to hear any echo coming back)
    • Are you trying to achieve perfect full-duplex operation without any artifacts, or are you willing to trade off among the three performance parameters above?

    Please provide answers to these questions and one of our AER experts will follow up providing further help.

    Best Regards,
    Bogdan

  • Hi,

    Thanks for the reply and apologies for the delay in getting back to you, I’ve been trying to gather as much information as possible to reply to your questions:

    The product is a loud, compact handsfree phone device.

    The distance between speaker and microphone is 12cm, both facing forwards.

    Mechanical coupling has been carefully minimised and there is an internal barrier between loudspeaker and microphone.

    The SPL at max power from the speaker peaks at 105dB at 2kHz, measured at 20cm, so it is loud. The 1-3kHz region forms the majority of the echo signal.

    The microphone is an omnidirectional MEMS device that is ported to the outside world.

    All testing so far has been performed in an anechoic chamber with the product suspended in free air.

    We have performed all the necessary measurements to prove it is possible to get good FD performance whilst maintaining high SPL output from the speaker.

    The mic gain is set by our own AGC to provide an RMS level of -26dBFS without clipping the ADC, although I have also tried disabling this and setting the gain lower. We are not using the built-in AGC function of the AER algorithm. The speaker gain is set so that the PA operates in its linear region.
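
    For reference, here is roughly how we sanity-check that level; a minimal sketch assuming 16-bit PCM frames (the function name and buffer handling are placeholders, not our production code):

        #include <math.h>
        #include <stdint.h>

        /* RMS level of a 16-bit PCM frame in dBFS, where 0 dBFS is a
         * full-scale square wave (+/-32768). A full-scale sine reads
         * about -3 dBFS under this convention. */
        double rms_dbfs(const int16_t *frame, int n)
        {
            double sum_sq = 0.0;
            for (int i = 0; i < n; i++) {
                double s = frame[i] / 32768.0;   /* normalise to [-1, 1) */
                sum_sq += s * s;
            }
            return 20.0 * log10(sqrt(sum_sq / n) + 1e-12); /* guard against silence */
        }

    We trim the gain until speech sits around -26dBFS on this measure while the peaks stay clear of 0dBFS.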

    We performed nonlinear echo path analysis using our own equipment and tools; everything is in order.

    I’m not sure if we are using the DRC in the Rx path; I’ll try to find out.

    The main settings I’m using are: Handsfree phone mode processing on, CC set to +10, combined loss target set to -80dB, Noise Guard active at -60dB. FDNLP is disabled because it is unusable.

    We are using our own EQ algorithm, not the one in the AER.

    We cannot use FDNLP – enabling it garbles the speech (it sounds robotic, almost as if the sample rate is wrong, with some data corruption).

    We have integrated NLP volume tables but currently have the same values set in each.

    For now I’m just trying to get it working in an anechoic chamber without any feedback or shrieking, so half-duplex (HD) performance at minimum volume in a free field is my starting point.

    Regards,


    Ed

  • Hi,

    We didn't integrate the DRC. Is this required for the NLP to operate correctly? Is it required for FDNLP?

    Regards,


    Ed

  • Hi Ed,

    Sorry for my late response. I was aware of this post but missed your response to Bogdan's comments. Some quick answers:

    1. No, you don't have to use DRC for NLP to operate correctly.

    2. When you tune the performance in hands-free mode, the first thing is to make sure that the adaptive filter converges. Please refer to chapter 5 of the AER_Quick_Tuning_Guide.pdf in the docs folder. If possible, can you send me a capture like the one in the tuning guide (Figure 5)?

    Regards,

    Jianzhong

  • Hi,

    Thanks for the reply. It’s good to know that we don’t need the DRC module for it to work correctly. It may take some time for me to provide you with the above information, though.

    Regards,

    Ed

  • It’s going to take us some time to provide you with what you requested. However, we optimized filter convergence in handset mode and were getting 28dB ERLE reported by the debug output.

    In handsfree mode I can hear the filter converging; although this has not been fully optimized, it definitely converges within about 3 seconds. However, residual echo is still present which the NLP is not removing, and when the filter diverges, e.g. during double-talk, the NLP does not seem to do anything to prevent the echo from getting through.

    As mentioned previously, we are unable to use FDNLP because enabling it breaks the audio, and I’m unable to achieve half-duplex operation through aggressive use of the NLP.

  • Ed,

    Can you elaborate a little more on what you mean by "hear the filter converging"? The filter shouldn't diverge during double talk, because the double talk detector prevents filter adaptation during double talk. If you see the filter diverging, it might be caused by something else. A common problem causing filter divergence is sample slip, which happens when the Rx out and Tx in clocks are not synchronized.
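
    If you want a quick sanity check for sample slip, one generic approach (just a sketch, not part of AER; it assumes you can capture Rx out and Tx in to buffers) is to estimate the echo delay by cross-correlation at two points in time and see whether it drifts:

        #include <stddef.h>

        /* Estimate the lag (in samples) at which tx best matches rx,
         * searching lags 0..max_lag-1. With sample slip, this lag
         * drifts over time instead of staying constant. */
        static int estimate_echo_lag(const short *rx, const short *tx,
                                     size_t n, int max_lag)
        {
            int best_lag = 0;
            double best_corr = 0.0;
            for (int lag = 0; lag < max_lag; lag++) {
                double corr = 0.0;
                for (size_t i = 0; i + (size_t)lag < n; i++)
                    corr += (double)rx[i] * (double)tx[i + lag];
                if (corr > best_corr) {
                    best_corr = corr;
                    best_lag = lag;
                }
            }
            return best_lag;
        }

    Run this on captures taken, say, a minute apart while playing noise or CSS from the far end; if the reported lag differs between the two captures, your Rx and Tx clocks are slipping relative to each other.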

    Regards,

    Jianzhong

  • Hi,

    So when talking into the far end and listening for echo, some echo comes back at the start of the call but gets progressively quieter over about 2 seconds while the filter converges. I can hear this in both handset mode and handsfree mode (at minimum volume with low mic gain in handsfree mode to avoid feedback).

    Whilst tuning handset mode we were able to log the ERLE reported by the AER algorithm via the serial port, and I could see it increasing to 28dB over about 5 seconds (taking only 1 second to reach 15dB). During double-talk, however, it reduces and occasionally reports 0dB, which coincides with echo breaking through until it recovers again. I can hear the same behaviour in handsfree mode, but we aren't currently able to log the reported ERLE in handsfree mode without making some software changes.

    Regards,

    Ed

  • Ed,

    If NLP is enabled, you shouldn't hear any echo during single talk. If you want to listen for echo to get a sense of convergence, you can disable the NLP. Injecting a test signal (white noise or CSS) from the far end can help measure the convergence better than speech.

    The ERLE measurement during double talk can drop to 0, but that doesn't mean the filter diverges. ERLE is a measure of the signal power reduction from before the filter to after it. When the near-end signal dominates, the power before and after the filter is nearly the same, and thus the measured ERLE is low.
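
    As an illustration (this is the textbook definition, not the exact code inside AER), ERLE can be computed from smoothed power estimates before and after the filter:

        #include <math.h>

        /* Illustrative ERLE estimate in dB: 'mic' is the signal going
         * into the filter (echo + near end), 'res' is the residual after
         * cancellation, alpha is a smoothing constant such as 0.99.
         * The power estimates persist across calls. */
        static double p_mic = 1e-9, p_res = 1e-9;

        double erle_db(const short *mic, const short *res, int n, double alpha)
        {
            for (int i = 0; i < n; i++) {
                p_mic = alpha * p_mic + (1.0 - alpha) * (double)mic[i] * mic[i];
                p_res = alpha * p_res + (1.0 - alpha) * (double)res[i] * res[i];
            }
            return 10.0 * log10(p_mic / p_res);
        }

    You can see why double talk drives this towards 0dB: near-end speech appears at similar power in both signals, so the ratio tends towards 1.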

    To help with debugging, I strongly suggest you get the following measurements and captures:
    1. Disable AER, inject CSS from the far end, and capture Rx out and Tx in. This tells us what the raw echo looks like and can reveal coupling problems between speaker and mic.
    2. Enable AER, disable NLP, inject CSS from the far end, and capture Rx out and Tx out. This tells us what the convergence looks like.

    Based on your information, I suspect that your system may have high ERL, meaning the echo is much louder than the original signal. This can be caused by poor isolation between the speaker and the mic. Again, this is only speculation until you can capture the signals mentioned above.
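
    Once you have the scenario 1 capture, ERL is simply the power ratio between the two signals; a minimal sketch, assuming time-aligned captures:

        #include <math.h>

        /* ERL in dB from time-aligned captures: positive means the echo
         * picked up at the mic is weaker than the loudspeaker reference,
         * negative means it is louder. Sketch only. */
        double erl_db(const short *rx_out, const short *tx_in, int n)
        {
            double p_rx = 1e-9, p_tx = 1e-9;
            for (int i = 0; i < n; i++) {
                p_rx += (double)rx_out[i] * rx_out[i];
                p_tx += (double)tx_in[i] * tx_in[i];
            }
            return 10.0 * log10(p_rx / p_tx);
        }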

    Regards,
    Jianzhong
  • In handset mode we have 15dB ERL. The Centre Clipper does have an effect, but the linear attenuator has no audible effect, which seems to be why we still hear echo at the start of the call; however, I managed to improve this to an acceptable level using the noise guard. My suspicion is that the linear attenuator is not working as it should, and this is what is causing all our problems in handsfree mode.

    In handsfree mode the ERL is -7.8dB, which the algorithm should be able to cope with. We have performed measurements on loudspeaker-microphone coupling and there is nothing abnormal or excessive to suggest mechanical coupling.
  • First, a correction to my previous post: "high ERL" should have read "high negative ERL".

    Linear attenuation in handset mode is limited to 3dB. When you heard echo in handset mode, was it during single talk or double talk? The center clipper can be made more aggressive to remove echo, instead of using the noise guard.
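
    For reference, a center clipper in its textbook form simply zeroes samples whose magnitude falls below a threshold, which is why raising the threshold removes more low-level residual echo at the cost of low-level speech; a minimal sketch (not the AER implementation):

        /* Textbook center clipper: samples below the threshold are
         * treated as residual echo and zeroed; larger samples pass
         * through unchanged. Raising 'thresh' makes it more aggressive. */
        static void center_clip(short *buf, int n, short thresh)
        {
            for (int i = 0; i < n; i++) {
                if (buf[i] > -thresh && buf[i] < thresh)
                    buf[i] = 0;
            }
        }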

    -7.8dB ERL for handsfree mode is not too bad, but it is not good enough to achieve good full-duplex performance. However, echo shouldn't be heard during single talk. Getting the convergence measurement and signal captures will help us narrow down the problem.

    Regards,
    Jianzhong
  • In handset mode, the echo is only heard during double-talk after the filter has converged fully. Setting the Centre Clipper rail above 0 had too high an impact on speech quality, but I found that changing Thresh_Tx_Noise also affected the operation of the center clipper.

    So back to my original questions:

    What exactly does "Hands Free mode processing" do, apart from allowing increased tail length? Does it interact with the NLP? From your last reply I assume the answer is yes: enabling Hands Free mode processing allows the linear attenuator to provide more than 3dB of attenuation.

    What should I set NLP_linattn_Max_ERLE to for the linear attenuator to operate at maximum aggressiveness all the time? I have been setting NLP_Linattn_Max_ERLE to 0dB (the developer guide implies this means there would be no reduction in the aggressiveness of the linear attenuator as ERLE increases). Is this correct?

    Am I right in assuming NLP_Rx/Tx_Linattn_Min should be set to 0dB and NLP_Rx/Tx_Linattn_Max to 70dB so that no restriction is imposed on the operation of the linear attenuator? The amount of attenuation would then be determined by the combined loss target only?
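
    To check my understanding, here is a sketch of how I picture the linear attenuator deriving its loss from these parameters (the structure is my own reading of the developer guide, not actual AER code):

        /* My mental model, not AER internals: the adaptive filter is
         * credited with min(measured ERLE, Max_ERLE), and the attenuator
         * supplies whatever loss remains to reach the combined loss
         * target, clamped between Linattn_Min and Linattn_Max.
         * All values are magnitudes of loss in dB. */
        double linattn_loss_db(double combined_loss_target_db,
                               double measured_erle_db,
                               double max_erle_db,  /* NLP_linattn_Max_ERLE */
                               double attn_min_db,  /* NLP_Rx/Tx_Linattn_Min */
                               double attn_max_db)  /* NLP_Rx/Tx_Linattn_Max */
        {
            double credited = measured_erle_db < max_erle_db
                            ? measured_erle_db : max_erle_db;
            double needed = combined_loss_target_db - credited;
            if (needed < attn_min_db) needed = attn_min_db;
            if (needed > attn_max_db) needed = attn_max_db;
            return needed;
        }

    With Max_ERLE at 0dB the filter would get no credit, so the attenuator would always try to supply the full combined loss target.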
  • Ed,

    Your understanding of the parameters is correct. The hands-free mode operation affects:

    1. filter update

    2. attenuation and clipping

    3. double talk detection

    Regards,

    Jianzhong

  • OK, thanks. I'm going to perform some more tests over the next week to see whether the linear attenuator is working correctly in our implementation.

    At the stage we are at, it will require considerable effort from our software team to provide me with a suitable test environment to obtain the information you requested.

    In the meantime, it would be great if you could think of any reason why the frequency-domain NLP isn't working for us.

    Regards,

    Ed
  • Ed,


    The garbling and distortion with FDNLP is most likely caused by poor convergence combined with aggressive NLP settings. To determine the convergence on your phone, you can also use a network analyzer such as Wireshark to capture the packets, if your phones are connected over a network.

    During the capture, disable AER and then enable it while keeping NLP disabled. You can play out a CSS and use a T-HAT to inject it into the far-end phone.

    If you have stable and decent convergence (e.g. giving you 30dB ERLE), you should be able to get reasonable duplex performance using either TDNLP or FDNLP.

    Regards,

    Jianzhong

  • Hi,

    Sorry for the delay in responding, I was diverted onto other things.

    I set the TDNLP as aggressively as possible and found I could clearly hear it operating. Unfortunately, when counting "1, 2, 3" into the near end, I could always hear "1" echoed back. I tried changing the NLP time constants to make it as fast as possible, but still found half of "1" echoed back. Any ideas why this might be happening?
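
    My working theory for why the first word leaks: if the NLP ramps its attenuation in through a smoothed gain, the first few hundred milliseconds pass at close to unity gain. A sketch of such a one-pole gain ramp (illustrative only, not AER code):

        #include <math.h>

        /* One-pole gain ramp: the applied gain decays from 1.0 towards
         * 'target' (the NLP attenuation, e.g. 0.01 for -40dB) with time
         * constant tau. After time t the gain is
         *   target + (1 - target) * exp(-t / tau),
         * so with tau around 100ms the start of the first word passes
         * before the attenuation has fully engaged. */
        double smoothed_gain(double target, double tau_s, double t_s)
        {
            return target + (1.0 - target) * exp(-t_s / tau_s);
        }

    If that is what is happening, even the fastest time-constant setting would still let some initial energy through.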

    I enabled FDNLP and set the combined loss target to 0dB, but the transmitted speech still sounded robotic, so the issue is not that the NLP is set too aggressively.

    It's starting to look like an integration issue and that we need a proper test platform to work out what is going on. From the information I've provided, can you suggest any possible issues with the integration that could cause these problems? Or do you need the data from the tests in the developer guide?

    Regards,

    Ed

  • Ed,

    If you disable NLP completely, do you hear an echo of all the numbers (1, 2, 3, etc.)?

    To debug this issue, you need captures of the send-out signal of the phone being tuned under 3 scenarios:

    1. when AER is disabled,

    2. when AER is enabled and NLP is disabled,

    3. when everything is enabled.

    If your current system doesn't have the capability to capture the send-out signal, can you describe how it sounds under the first two scenarios? You already described the third scenario in your previous post.

    Thanks,

    Jianzhong

  • Hi Jianzhong,

    I've just found out something that could be making the situation worse: we are generating sidetone in hardware rather than through the DSP, so it is not seen by the AEC. The build of software I'm using to test handsfree mode is simply the handset-mode code re-routed to the main loudspeaker and a different microphone, so it is possible that sidetone is enabled (which we obviously don't want in handsfree mode!). I'll need to get our software team to provide me with a build where sidetone is disabled. I'll then re-run my tests in handset and handsfree mode under the 3 scenarios you describe above and report back.

    Regards,

    Ed
  • Hi Jianzhong,

    I performed some testing as you recommended and found that under scenario 2 (AER enabled, NLP disabled) the AER would converge well within about 2 seconds, but after 10-20 seconds loud, shrieking, unintelligible noises would appear and continue even after the speech had stopped (note that this only happens in hands-free mode).

    I attempted to resolve the issue by adjusting the tail length and y2x delay, but was unable to achieve stable operation even at low volume settings with low mic gain. As a result we have ordered some EVMs (see https://e2e.ti.com/support/dsp/tms320c6000_high_performance_dsps/f/115/t/543802) so we can run the algorithm in isolation. If we can achieve good performance with the algorithm in isolation, that will prove we have an issue with the integration of the algorithm into our product.

    Regards,

    Ed

  • Ed,

    Thanks for the update. Is that behavior repeatable? 

    Regards,

    Jianzhong

    Yes, it's repeatable, but unfortunately I didn't have time to record it. If I get a chance I will record it and upload it!

    Regards,

    Ed