
McASP: Unjustifiably high service frequency?

Anonymous
Other Parts Discussed in Thread: OMAPL138, OMAP-L138

Hi,

 

I would like to ask a question on McASP.

 

A problem I noticed with the McASP peripheral is its extremely small buffer size. On both the OMAP-L138 and the DM6437 there is only one 32-bit buffer per serializer, and a simple calculation shows the number of CPU/DMA services required for typical CD-quality audio. On the OMAP-L138 I noticed the addition of 256-byte write and read FIFOs, which significantly reduces the number of services per second. Despite this reduction, some 600+ services per second is still significant.

 

The biggest problem is with DMA. The relationship between DMA service time and data amount is linear, but with a constant positive offset on the ordinate, as shown in Question on AM1808 EDMA3. The 26 ms figure given in that post was later found to be inaccurate, and the actual constant time could be as low as 2 ms; however, even for the OMAP-L138, which has the 256-byte McASP FIFO, this still requires over 600 DMA transfers per second, each transferring merely 4 bytes of data. Several issues arise from this operation:

1.    If each DMA transfer takes at least 2 ms, can these 600+ transfers be completed in one second?

2.    600+ DMA transfers per second means requesting the SCR (switched central resource) bus 600+ times per second. Could the SCR respond promptly enough? If there are other users of the DMA, would interference delay the DMA's servicing of the McASP, and subsequently degrade the sound quality (most noticeably, a lowering of pitch as perceived by the ear)?

3.    This clearly goes against the general principle of DMA operation, which aims to transfer larger amounts of data between memory locations. The 4-byte (256-byte if the FIFO is used) amount per transfer seems inappropriate for DMA. Why is it designed like this?

4.    For the DM6437, whose McASP has no FIFO, the 44,100 services per second, whether from CPU or DMA, is hard to imagine. And if by DMA, I fundamentally doubt whether the DMA could respond fast enough to keep up with that frequency.

 

 

CD Audio

  Sampling frequency    44.1 kHz
  Bit depth             16
  Channels              2
  Data rate             44.1k × 16 × 2 bits = 44.1k × (32 bits = 1 word) = 44.1k words/sec
  XBUF size             32 bits = 1 word
  FIFO on OMAP-L138     256 bytes = 64 words → CPU/DMA transfers: 44,100 ÷ 64 = 689.0625/sec
  FIFO on DM6437        N/A → CPU/DMA transfers: 44,100/sec

 

As a comparison:

http://computer.howstuffworks.com/sound-card3.htm said:

As with a graphics card, a sound card can use its own memory to provide faster data processing.

A sound card on a PC comes at additional cost, which might not be justified for a single-chip embedded processor. However, even if the McASP cannot have its own internal memory, it might still be designed with the ability to "fetch" data from memory, as opposed to being "fed". A more ideal design, as I imagine it, would allow an amount of sound data, say 5 seconds' worth, to be placed linearly at a memory location; the McASP would fetch data equal to the XBUF size each time and increment an address pointer automatically. Upon finishing transmission of the last XBUF contents to the serializer, it would fetch another word of data at the incremented address pointer. In this mode everything could be done automatically, without the intervention of CPU/DMA.

 

Why is it not designed like this?

 

Could anyone advise me on the proper mode of McASP servicing? How much CPU time does it typically occupy? And if there are both video and audio streams, possibly plus other jobs at the same time, how should CPU time be allocated among the different streams?

 

 

 

Zheng

  • Zheng,

    First, I think the 26 ms and even the 2 ms numbers are grossly overestimated.

    If we assume 100 EDMA cycles of overhead per event, and the EDMA running at (say) 150 MHz (6.67 ns per cycle; refer to your datasheet and specific operating conditions for the actual frequency), then that's only about 667 ns of overhead.  Your estimate is off by several orders of magnitude.  If your data buffers reside in off-chip memory (like SDRAM), there can be additional "jitter" in the overhead due to contention with other requestors in the system and due to refreshes/page commands.  You should set the EDMA channel servicing the McASP to the highest priority to ensure the latency of this transfer is minimized. Alternatively, you can place your data buffers in on-chip memory (like DSP L2), which provides faster and more deterministic access than DDR.

    I can't really comment on why it wasn't designed a particular way.  But as you can imagine, the FIFO was added on the OMAP-L138 for reasons related to what you point out.  Depending on overall system load and McASP operating frequency, the single deep buffer on DM6437 can result in overflow/underflow.  However, the FIFO provided on OMAP-L138 provides much more immunity to such issues.

    Regarding your point:

    > this still requires over 600 DMA transfers per second, each transferring merely 4 bytes of data

    Note that when using the FIFO, the EDMA should be set up to service a "threshold" worth of data per event. Given a 256-B FIFO, it is reasonable to set the threshold at 128 bytes to allow ping/pong style operation, where the McASP is filling one half of the FIFO while the EDMA is draining the other half.  In this case, the constant overhead is paid per 128 B, not per 4 B.

    Regards

    Kyle



  • Anonymous, in reply to kcastille

    Kyle,

    Kyle said:

    First, I think the 26 ms and even the 2 ms numbers are grossly overestimated.

    If we assume 100 EDMA cycles of overhead per event, and the EDMA running at (say) 150 MHz (6.67 ns per cycle; refer to your datasheet and specific operating conditions for the actual frequency), then that's only about 667 ns of overhead.  Your estimate is off by several orders of magnitude.  If your data buffers reside in off-chip memory (like SDRAM), there can be additional "jitter" in the overhead due to contention with other requestors in the system and due to refreshes/page commands.  You should set the EDMA channel servicing the McASP to the highest priority to ensure the latency of this transfer is minimized. Alternatively, you can place your data buffers in on-chip memory (like DSP L2), which provides faster and more deterministic access than DDR.

    I am not really sure about this. Although simple math can derive a figure like that, it is not what I got from my experiments, although it is possible that there is still room for optimization that I didn't apply.

     

    And I will experiment with the ping/pong FIFO approach; thanks for the suggestion.

     

    Zheng

     

     

     

  • Zheng,

    It's possible that there is overhead in your measurement technique.  E.g., are you using interrupts as part of the measurement?  Are you using a chip-level timer, or the DSP's internal CPU timestamp counter?  In any case, I believe some significant non-DMA overhead must be getting measured somehow...

    Also, I hope/expect you are setting up the McASP event to directly trigger the EDMA data transfer?  (rather than interrupting the CPU and then submitting the EDMA transfer?)

    Regards

    Kyle