PCM2912A: rare problem in ADC ("IN") direction, data overlap 256 samples back, at 1 kHz

Frank Rysanek

Other Parts Discussed in Thread: PCM2912A, TUSB2046B

Dear gentlemen (any ladies around?)

I work in tech support at a specialty distributor of industrial gadgets. I'm trying to help an integrator customer who uses call-center headsets with a USB interface containing the PCM2912A. The headset is an off-the-shelf product by TIPRO. My contact at the customer company is a programmer with quite a background in small-signal electronics; I'm just scratching the surface of Linux guts and basic electronics.

Long story short: under some circumstances, with a varying period of occurrence (about 1-2 days on some sites, maybe a week on others), the PCM2912A seems to send garbage instead of some samples in the data stream. The ADC sampling rate is set to 48 kHz. Every 1 ms (= every 48 samples in the resulting audio recording), a couple of samples are garbled. In one particular example (see the URL below), it was 6 samples every 1 ms; on another occasion it was reportedly 2 samples every 1 ms. Interestingly, the garbage is not completely random: on closer inspection, the data in the "garbage" sequences is actually valid sample data, taken from an offset of 256 samples (512 B) in the past. So: 6 samples every 1 ms, copied from 256 samples earlier. As 256 is not divisible by 48 (256 = 5 x 48 + 16 = 6 x 48 - 32, i.e. off by -16/+32 samples from a whole number of 1 ms frames), the original data at the garbled positions is lost. On very close inspection, the first garbled sample possibly combines the MSB of the correct sample with the LSB of the shifted sample (or is simply spurious).
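A minimal sketch of how the symptom could be spotted in a recording, assuming the audio is available as a list of 16-bit integer samples and that the start of the recording lines up with the 48-sample packet boundaries. The function name `stale_counts` and its parameters are hypothetical, not part of any existing tool. For each 48-sample frame it counts samples that are identical to the sample 256 positions earlier; a healthy recording should stay near zero, while the garbled state should show a constant small count per frame (6 or 2 in the cases above).

```python
def stale_counts(samples, lag=256, frame=48):
    """Per `frame`-sample block, count samples equal to the one `lag` samples back."""
    counts = []
    for start in range(0, len(samples) - frame + 1, frame):
        hits = 0
        for i in range(max(start, lag), start + frame):
            # Digital silence (exact zeros) repeats trivially, so ignore it.
            if samples[i] != 0 and samples[i] == samples[i - lag]:
                hits += 1
        counts.append(hits)
    return counts


# Synthetic demo: a ramp in which, from frame 6 on, the last 6 samples of
# every 48-sample frame are replaced by the sample 256 positions earlier.
out = list(range(1000, 1000 + 48 * 20))
for k in range(6, 20):
    for m in range(42, 48):
        i = 48 * k + m
        out[i] = out[i - 256]
print(stale_counts(out))  # six zeros, then 6 for every corrupted frame
```

For a raw 16-bit little-endian mono file on a little-endian host, `array.array("h", data)` with `data` read in binary mode gives a suitable input. If the recording is not aligned to packet boundaries, one garbled run may be split across two counted frames; the tell-tale sign is still a steady, non-zero count frame after frame.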

I haven't found a way to attach files in this forum. Here is a link to a raw audio file and two commented audacity screenshots:

e-shop.fccps.cz/.../distorted.zip

The problem occurs "at random" (= we do not know the trigger, nor can we reproduce it in a lab yet) and doesn't go away by itself - the only way out is to restart the transfer (close and reopen the data stream).

We've tried to gather some information about USB asynchronous isochronous transfers. As far as we can tell, 1 ms is the nominal SOF (start-of-frame) rate for full-speed USB, which should effectively correspond to the rate of IN transactions in isochronous transfers. At 48 kSps, each IN payload packet (seen on the host side as an isochronous packet within an URB) contains 48 samples (theoretically give or take 1 sample, to absorb a marginal clock rate mismatch). The shift of 256 samples seems fairly characteristic, but doesn't fit the nominal USB clockwork - and, as it applies only to the payload (rather than to whole USB packets), it would seem to come from the 2912A. We're wondering if perhaps the host goes "out to lunch" past a certain margin and misses a couple of "bus turn-around periods" (N times 1 ms), i.e. fails to send an IN request for a few periods, and the 2912A goofs up: it has nowhere to send the data, and possibly fails to move its "read pointer" forward (in the absence of IN tokens for its endpoint, it doesn't act on the SOF packets alone, which the host controller hardware generates autonomously at precise 1 ms spacing). This explanation is just a theory and may be imprecise or just plain wrong. We haven't yet captured a "now it happened" moment, as the occurrence is relatively rare, on a production system, and difficult to reproduce.
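To illustrate the "give or take 1 sample" part, here is a small sketch of how many samples an ideal asynchronous source would have ready per 1 ms frame when its sample clock is off by a given number of ppm relative to the host's SOF clock. The function `packet_sizes` is hypothetical and models only the ideal sample accounting, not the PCM2912A's actual internals.

```python
import math
from fractions import Fraction


def packet_sizes(ppm, frames, nominal_rate=48_000, sof_rate=1_000):
    """Samples available per SOF frame for a source running `ppm` off nominal."""
    # Exact samples produced per 1 ms frame, e.g. 48.00336 at +70 ppm.
    per_frame = Fraction(nominal_rate * (1_000_000 + ppm), 1_000_000 * sof_rate)
    sizes, sent = [], 0
    for n in range(1, frames + 1):
        due = math.floor(n * per_frame)  # whole samples produced by end of frame n
        sizes.append(due - sent)
        sent = due
    return sizes


sizes = packet_sizes(70, 1000)
print(sum(sizes), sizes.count(49))  # 48003 samples in 1 s, three 49-sample packets
```

At +70 ppm only about one packet in 300 carries an extra sample, which is why the payload looks like a steady 48 samples in a trace. A completely missing IN request, by contrast, leaves a whole 48-sample batch with nowhere to go.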

We're considering some hacks to the Linux kernel (USB audio drivers) to artificially introduce a small "hiccup" in the stream of IN requests, to try and see what happens. If this turns out to be a way to reproduce the problem, we could focus on mitigating "out to lunch" latencies in the host PC (scheduling priorities, SMI suppression, RT patches, compile-time preemption settings and some such - people around RTAI know better).

The call-center application involves running the USB-attached microphones for long periods of time, 24/7. The USB transfer is never turned off, except for system maintenance. The host PC runs Linux 3.10.17 with ALSA 1.0.27.1. This host PC is effectively an "audio grabbing front end", sending the data further over a network. My customer's software talks straight to the raw ALSA device in user space - there's no PulseAudio or JACK framework at work. The point is that the problem can already be observed on this grabbing PC, by watching the URBs coming from the low-level UHCI/EHCI drivers. Unfortunately we don't have a proper USB hardware analyzer, so we have to watch the URBs in software (Wireshark with a patched libpcap with USB support). Wireshark traces are available from us upon request.

Any comments or ideas are welcome...

While doing my homework (gathering knowledge about the USB async isochronous clockwork), I found a beautiful report (memoir) by Hitoshi Kondoh of Burr-Brown (now TI) giving some context on how this USB audio product family got started. Very nice reading that explains some concepts... unfortunately not very relevant to the async transfer sub-mode in the "IN" direction, and no hint about the buffering details, but still pretty dense material.

www.thewelltemperedcomputer.com/.../Hitoshi Kondoh story.pdf

I have also noticed the "UAA guidelines" by Microsoft (from 2006), claiming that async mode should not work at all on Windows Server 2003 (w2k3) or older, which would include XP SP3. Yet on XP we did use the TIPRO headset - not exactly "just fine", but it did work, with a marginal quirk (which we attributed to broken XP drivers for the EHCI/UHCI).

download.microsoft.com/.../uaa_guidelines.doc

Frank Rysanek

over 9 years ago

Luis Fernando Rodriguez S. over 9 years ago

Hi, Frank,

Welcome to E2E and thank you for your interest in our products.

I will take a look at this and answer you as soon as possible.

Best regards,
Luis Fernando Rodríguez S.

Frank Rysanek over 9 years ago in reply to Luis Fernando Rodriguez S.

Dear Mr. Rodríguez,

thanks for your polite response...

I have a bit of an update: my integrator customer has managed to find a crude way to reproduce the "flawed payload buffering behavior" in his lab.
Using a solid-state switch (a PhotoMOS relay or some such), he shorts the two USB data lines (D+ and D-) together for 1 ms, thus wasting an IN transaction and likely also a SOF packet in the affected 1 ms bus frame. He keeps repeating this (with a period of 7110 ms = 7.11 s) until he notices in the digital audio stream that the problem has appeared, and stops disrupting the USB data at that point. (If he kept causing the disruptive shorts, the problem would eventually go away, as observed in the lab setup.) Once the problem occurs and the deliberate disruptions are stopped, i.e. just a clean USB channel remains, the "buffering problem" stays there forever.

The above "reproduction method" would be consistent with TI's general advice to "pay close attention to your grounding" (possibly extrapolated to EMI ingress into the USB data link in general). It sounds odd though: the environment at the "call center" is not exactly EMI-messy, and they've obviously checked for grounding/EMI problems before. The integrator is a traditional supplier of telco gear doing their own hardware design (not in this case - the USB mike is not their product), so they do have the know-how.

The reproduction method above doesn't rule out the possibility that the actual cause of the problem in production is more subtle and not EMI/GND related - such as the host PC failing to send an IN request now and then for some reason. A system management interrupt (SMI) comes to mind as the usual suspect, though we haven't been able to observe this in the lab yet. We're gazing at the source code, trying to find the right place to insert a snippet of disruptive code that would let us generate this kind of anomaly. In the lab, the Linux host PC runs the isochronous transfer rock solid, no matter how much user-space and LAN interrupt load we throw at it (multiple flood pings, the host reporting 92% in "soft IRQ"), and the transfer keeps clocking away just fine.
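One crude way to check whether a field PC "goes out to lunch" at all is a user-space stall probe, in the spirit of latency tools such as hwlatdetect from rt-tests: busy-loop on a high-resolution clock and log every gap between consecutive readings that exceeds a threshold. This is a hypothetical sketch (`find_stalls` is not an existing tool); a gap only shows that the CPU running the loop was taken away, and it cannot tell an SMI from an interrupt storm or plain scheduler preemption.

```python
import time


def find_stalls(duration_s=1.0, threshold_us=500):
    """Busy-loop for duration_s; return every clock gap above threshold_us (in us)."""
    threshold_ns = threshold_us * 1_000
    stalls = []
    prev = time.perf_counter_ns()
    end = prev + int(duration_s * 1e9)
    while prev < end:
        now = time.perf_counter_ns()
        if now - prev > threshold_ns:
            # This CPU was away for a while (IRQ, preemption, SMI, ...).
            stalls.append((now - prev) / 1_000)
        prev = now
    return stalls
```

Running it for hours, pinned to one core (`taskset`) with a real-time priority (`chrt`) to filter out ordinary scheduling, and then correlating any multi-millisecond gaps with the moments the audio went bad, would at least show whether the host is ever unavailable for the several milliseconds that the lab reproduction needs.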

Any further suggestions and notes are most welcome :-)

Frank Rysanek

Ivan Salazar over 9 years ago in reply to Frank Rysanek

Hi Frank,

Could you share your schematic and layout if possible? It would be helpful to verify that there is no design problem here.
Some distortion issues appear because of a deficient clock signal or an impedance mismatch on D+/D- that could lead to signal reflections.
The crystal oscillator circuitry must be close to the PCM2912A and must not use any vias in its traces.
The weird thing here is that the problem appears somewhat randomly.
The problem points more to the interface between the host and the PCM2912A.

Best regards,
-Ivan Salazar
Texas Instruments

Frank Rysanek over 9 years ago in reply to Ivan Salazar

Dear Mr. Salazar,
thanks for sharing those hints :-)

I have to say that I'm gradually learning more about the "culprit" field sites (one of which might be less than ideal) and also about the precise internals of the USB handset containing your chip. The host PCs used at two misbehaving field sites contain 6- and 7-series Intel chipsets - those no longer have companion UHCI controllers, just an EHCI with a rate-matching hub (not sure if you know about that development in the Intel chipsets; this arrangement should not really make a difference). At a particularly troublesome site, the overall cabling length from the handset to the PC is 10 m, with an AUX-powered hub halfway down the cabling route, i.e. two hops of 5 m each. The wall-wart AC/DC adapter powering the hub is likely nothing special. Next, on the part of the "USB handset": the 2912A is not wired directly to the outside USB-B female connector; there's an additional hub chip inside the handset, a TUSB2046B (there are other USB endpoints in the box, served by an AT90USB647 Atmel MCU). Apparently there are three 6 MHz crystals in the box, on two PCBs :-) The PCB traces to the crystals seem all right: very short and with no vias. I can also see proper series termination resistors on the internal USB lanes (between the neighboring chips on the PCB).

I got some photos of the two PCBs in the TIPRO handset from my integrator customer, but I don't want to present them to the broader public, as the PCB designs are the property of TIPRO. Is there a way to send some materials to you privately? I would probably ask TIPRO first, certainly for permission, and maybe for direct cooperation with you (in terms of schematics - I don't have those). The circuitry, the boards and the overall concept are their designs.

My integrator customer is well equipped to look at the clocks with an oscilloscope and measure the clock rates with a precise counter. So far, all the clock signals reportedly looked good and the clock rates were well within the +/- 500 ppm tolerance of the 2912A (more like 70 ppm actually: within 400 Hz "off the mark" at 6 MHz).
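For scale, the numbers above can be worked through as follows, assuming (as a simplification) that the ADC sample clock is derived from the same 6 MHz crystal, so that its relative error is the same. A clock error only shows up as an occasional 47- or 49-sample packet; even the 500 ppm worst case is nowhere near a whole lost 48-sample batch.

```latex
\begin{aligned}
\frac{400\ \text{Hz}}{6\ \text{MHz}} &\approx 66.7\times10^{-6} = 66.7\ \text{ppm} \\
48\,000\ \tfrac{\text{samples}}{\text{s}} \times 66.7\times10^{-6}
  &\approx 3.2\ \tfrac{\text{samples}}{\text{s}}
  \;\Rightarrow\; \text{one 47- or 49-sample packet every} \approx 0.31\ \text{s} \\
48\,000\ \tfrac{\text{samples}}{\text{s}} \times 500\times10^{-6}
  &= 24\ \tfrac{\text{samples}}{\text{s}}
  \;\Rightarrow\; \text{one every} \approx 42\ \text{ms (worst case)}
\end{aligned}
```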

Considering that the built-in TUSB2046B is a USB 1.1 hub, the external "USB repeater hub en route" probably handshakes the upstream link at high speed and the downstream link at full speed.

I've suggested that my integrator customer try a USB full-speed isolator in the long cabling link, and a local connection from the handset's USB-B shield ground to the local mains protective earth (maybe with a local power supply, for good measure), to check if the "garbled, shifted by 256 samples" problem gets alleviated. Not sure if they'll try that and how soon :-)

As a side note: reportedly, if they try to trigger the problem by "zapping" the USB transmission line, reproduction is actually pretty consistent: 5 lost "isochronous cycles" reproduce the problem, and another lost cycle makes the problem go away :-) This can be repeated over and over.
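The "5 lost cycles break it, a 6th fixes it" pattern can be reproduced with a toy model. This is only a sketch of our hypothesis, not of the real chip: it assumes a 256-sample circular FIFO (depth inferred from the 256-sample offset, not taken from a datasheet), a read index that simply stays put when an IN request is lost, and a steady-state distance of 58 samples between the write and read index at IN time, chosen because it reproduces the 6-sample figure. The names `simulate`, `RING` and `FRAME` are hypothetical. The ADC writes its absolute sample number as the sample value, so stale data is easy to recognize.

```python
RING = 256   # assumed FIFO depth in samples (hypothetical)
FRAME = 48   # samples per 1 ms IN packet at 48 kHz


def simulate(lost_frames, initial_lag=58, frames=40, drop_at=10):
    """Return how many samples of the last packet equal the sample 256 back."""
    ring = [0] * RING
    written = 0    # absolute number of samples written by the ADC
    read_idx = 0   # reader position inside the ring
    out = []       # what the host ends up recording

    def adc_write(n):
        nonlocal written
        for _ in range(n):
            ring[written % RING] = written  # sample value = its absolute index
            written += 1

    adc_write(initial_lag)  # the reader starts initial_lag samples behind the ADC
    for frame in range(frames):
        if not (drop_at <= frame < drop_at + lost_frames):
            # IN request arrives: ship FRAME samples starting at read_idx.
            out.extend(ring[(read_idx + k) % RING] for k in range(FRAME))
            read_idx = (read_idx + FRAME) % RING
        # With or without an IN request, the ADC produces another 1 ms of data.
        adc_write(FRAME)
    last = range(len(out) - FRAME, len(out))
    return sum(1 for i in last if out[i] == out[i - RING])


print([simulate(n) for n in range(8)])  # only 5 lost frames leave stale samples
```

In this model, 5 lost requests (240 samples) push the write index past the read index modulo 256, so the last 6 samples of every packet are read from the 256-sample-old slots; one more lost request moves the distance back out of the 48-sample read window. With an assumed distance of 62 instead of 58, the same model yields the 2-sample variant.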

My integrator customer has repeatedly voiced his opinion that while they may have an occasional EMI burst in the environment, which is difficult to shield away and may cause the loss of a SOF packet or an IN transaction, in theory these occasional dropped packets should not make the 2912A garble payload data! Isochronous USB transfer by definition does not guarantee delivery through retransmissions; lost packets are assumed to be lost for good. This implies that the components in the system should cope gracefully with an occasional lost packet, with a mere gap in the audio recording where the packet was lost (this implication is admittedly ours :-). If the observed behavior is indeed a sign of the 2912A getting its buffer head and tail pointers wrapped around as a result of lost USB packets, it sounds like a bug in the chip, i.e. the ASIC design not being robust enough against lost packets :-/ This is possibly not part of a USB conformance test...

Thanks for your help so far :-)

Frank Rysanek

Ivan Salazar over 9 years ago in reply to Frank Rysanek

Hi Frank,

You can send the files directly to my e-mail address: ivan.salazar@ti.com
I see this is a complex system and not simply the PCM2912A connected through USB.
So the distortion problem is not present in every system/environment, only under specific circumstances?
I think the confusion/problem is that the missing packets are not really missing but replaced with some other data, right?

Best regards,
-Ivan Salazar
Texas Instruments

Frank Rysanek over 9 years ago in reply to Ivan Salazar

Dear Mr. Salazar,

I will politely ask my support contact at TIPRO to send you some official material - hopefully board layouts and schematics for you to check (I am told that the box contains two PCBs).

I guess there are two factors in play, that together result in the problem occurring (distorted audio):

1) the 2912A does not tolerate lost USB packets (IN requests coming from the host) in an isochronous audio stream, or possibly a lost SOF+IN. The sampling rate (48 kHz) and the SOF packet rate (1 kHz) run asynchronously, not locked together; small deviations can be accommodated by inserting or skipping a single sample in the 48-sample payload now and then, but a completely lost IN request leaves "nowhere to send a whole batch of 48 samples", which the 2912A internally cannot cope with properly. It possibly has a buffer for 256 samples, which overflows after some 5 lost IN requests. It seems that lost IN requests are not detected from circumstantial clues and are not handled gracefully.

2) for some reason, in some field deployments (possibly not all), IN packets do get lost occasionally, and frankly it may be difficult to do something about it.

These two factors combined result in a behavior where the 2912A garbles a few samples in every USB IN payload, replacing them with older data, as mentioned before.
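Taking the buffer hypothesis in item 1 literally, the arithmetic can be written down explicitly. The assumptions are ours, not TI's: a hypothetical 256-sample circular buffer whose read index stays put while IN requests are lost; d_0 is the assumed distance (in samples) from the read index to the write index when an IN request arrives in normal operation, d_N the same distance after N cumulative lost requests, and s_N the number of stale samples per 48-sample packet.

```latex
\begin{aligned}
d_N &= (d_0 + 48N) \bmod 256, \qquad s_N = \max(0,\ 48 - d_N) \\
48 \cdot 5 = 240 \equiv -16 \pmod{256}
  &\;\Rightarrow\; d_5 = d_0 - 16, \quad s_5 = 64 - d_0 \quad (48 \le d_0 < 64) \\
48 \cdot 6 = 288 \equiv 32 \pmod{256}
  &\;\Rightarrow\; d_6 = d_0 + 32 \ge 48, \quad s_6 = 0
\end{aligned}
```

With d_0 = 58 this gives the 6 stale samples per packet seen in the example recording, d_0 = 62 gives the 2-sample variant, and one more lost request (N = 6) clears the symptom, consistent with the lab observation. The value of d_0 has not been measured, so this only shows that the hypothesis is self-consistent, not that it is correct.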

Even detecting that the problem has actually occurred "right now" is difficult, because the audio spectrum is not impaired all that much (the amplitude envelope of the garbage closely follows the useful signal), and in this application there may be no human listener permanently monitoring all the voice traffic, or the listener is geographically distant, not instructed to report when the problem occurs, etc.

At this stage it's difficult for me to tell whether the packet loss (item 2) is strictly due to some local EMI (the field site can be clean all day long, and then twice a day someone flips a switch or something), or whether it is possibly down to some design-level violation in TIPRO's PCBs, or some chip-to-chip antagonism among the chip models used (TI codec, TI hub, Atmel MCU acting as a HID), or what. The crystal clocks involved look all right.

I guess among the three parties involved (me, TIPRO and our integrator customer) we all welcome your polite suggestion to look at TIPRO's designs, to check for any obvious problems.
