[C6713] McASP multi TDM + EDMA : will this work

Eliot Blennerhassett

Intro:

I have a system using McASP in TDM mode (8 slots), and 4 serializers each in and out. So a total of 32 channels in and 32 out. I'll use input as an example in the following.

Now I want to get each channel into its own buffer. My initial attempt only gets the data from each serializer into its own buffer, which then has to be reordered by the CPU. When multiple serializers are active, the EDMA must be programmed in frame-synchronized mode. Each frame transfers one sample from each serializer, i.e. 4 samples, so each single sample can be placed in its own buffer.

But there is only one counter + index left (the framecount). So there is no way to have a different increment for the 'within TDM frame' i.e. different channels and 'between TDM frames' i.e. the next samples of the channels. The framecount = TDM_slots * single_channel_buffersize

When only a single serializer is active, element sync transfer can be used. So the element count/idx can be used for TDM slots, while the frame count is the single channel buffer size -1.
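For reference, the CPU reordering step described above can be sketched as below. This is only an illustration, not code from the original post. It assumes each per-serializer buffer holds its samples slot-interleaved, and it numbers the channels as 4 * slot + serializer. The function name `deinterleave` and the buffer layout are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

enum { SLOTS = 8, SERIALIZERS = 4, CHANNELS = SLOTS * SERIALIZERS };

/* ser_buf[m] holds depth * SLOTS samples from serializer m, slot-interleaved
 * (assumed layout). chan_buf[c] receives depth samples for channel
 * c = SERIALIZERS * slot + m (assumed channel numbering). */
void deinterleave(const int32_t *const *ser_buf,
                  int32_t *const *chan_buf,
                  size_t depth)
{
    for (size_t frame = 0; frame < depth; ++frame) {
        for (size_t slot = 0; slot < SLOTS; ++slot) {
            for (size_t m = 0; m < SERIALIZERS; ++m) {
                chan_buf[SERIALIZERS * slot + m][frame] =
                    ser_buf[m][frame * SLOTS + slot];
            }
        }
    }
}
```

The inner loops touch 32 different destination buffers per TDM frame, which is the non-linear access pattern the EDMA scheme is meant to take off the CPU.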

Untested idea

New idea for multi-serializer TDM EDMA

Set up EDMA Params A to transfer a single TDM frame (32 samples), from McASP into the first element of 32 buffers. Link params to itself.
Set up another EDMA Params B to be chain triggered by A. Transfer B updates A's DST to be the second element of the buffers (e.g. by copying from an array of precomputed addresses), and so on for N = buffer length. I think of B as providing a third dimension to the indexing. B's transfer complete interrupt indicates buffer full.
I think ping pong buffering can be achieved by having 2 param sets for B, covering half the buffer each.
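A host-side model of this idea may make the intended address pattern easier to check. It is a sketch under assumptions and does not program any hardware: memory is a plain array indexed in words, the 32 channel buffers sit `span_words` apart, and the names `build_dst_table`, `transfer_a` and `run_block` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

enum { CHANNELS = 32 };

/* Table that "B" would copy from: entry n is the word offset of sample n
 * inside channel 0's buffer (assumed layout). */
void build_dst_table(size_t *table, size_t depth)
{
    for (size_t n = 0; n < depth; ++n)
        table[n] = n;
}

/* Model of "A": scatter one 32-word TDM frame, one word per channel buffer. */
void transfer_a(const int32_t *frame, int32_t *mem, size_t dst, size_t span_words)
{
    for (size_t ch = 0; ch < CHANNELS; ++ch)
        mem[dst + ch * span_words] = frame[ch];
}

/* Run one block of depth frames. Returns 1 when the buffers are full,
 * standing in for B's transfer-complete interrupt. */
int run_block(const int32_t *frames, int32_t *mem, const size_t *table,
              size_t depth, size_t span_words)
{
    size_t a_dst = table[0];
    for (size_t n = 0; n < depth; ++n) {
        transfer_a(frames + n * CHANNELS, mem, a_dst, span_words);
        if (n + 1 < depth)
            a_dst = table[n + 1];   /* "B" patches A's DST for the next frame */
    }
    return 1;
}
```

The model is strictly sequential, so it says nothing about the race between A reloading and B writing; it only confirms that one patched DST per frame is enough to deinterleave.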

Questions

Before I launch into coding this monstrosity I have a couple of questions. I realise the description is pretty dense - I'm ready to expand on it if there is anyone listening...

Is it workable to have the chained transfer update the parameter set that triggers it? Is there a race between A reloading itself and B updating A's DST pointer? (Maybe a race can be avoided by having 2 sets of cross-linked params for A.)
Is there any other clever way of deinterleaving the multi-serializer TDM using EDMA alone?
Does EDMA3 on newer processors work any better in this scenario?

thanks

Eliot

over 16 years ago

0 RandyP over 16 years ago

TI__Guru* 84110 points

What you want to do must have been done hundreds of times already, so someone out there has to have a working example they could just hand to you. If my comments do not work out for you, and if no one speaks up with an example, try posting again with a title like "EDMA2 Channel sorting on C6713 McASP". Or there might be examples in the board support libraries or CSL examples. I do not have the C6713 BSP installed on my computer, so I have not searched those folders.

EDMA3 on the newer processors is much more flexible than the EDMA2 that you have in the C6713. But it still needs an interim buffer, as far as I can see.

To your other questions about chaining and race conditions, yes there can be a race condition when one channel tries to update another channel while linking and chaining are going on. This is a problem if you try to chain to the channel being updated, since the update might not have completed when the chaining occurs. But in your case, you should be okay because the linking will start before the write from channel B.

Another idea would be to use an interim 32-word buffer to store a single TDM frame. Here is what I would propose:

Create a 32-word buffer, probably in L2 if that is convenient.
EDMA Channel A is triggered by the AREVTn event
EDMA Channel B is chained to when Channel A completes
Channel A will copy 4 samples from the McASP Data Port to 4 contiguous words in the interim buffer (write to 4n+m, n=TDM slot, m=0..3 serializer)
When Channel A has copied a full TDM frame, it will post a TCC to one of the chaining channels, Channel B, and Channel A will link to a simple duplicate reload PARAM set
Channel B will then copy a full frame to the 32 audio channel buffers, with ELECNT=32, ELEIDX=span between buffers
Channel B can have FRMCNT=depth of one audio channel buffer, and FRMIDX=((ELEIDX * (1 - ELECNT)) + 4) (I think the math is right, if not, please post the correction)
Channel B can then reload with a duplicate set or one-of-two for ping-pong, if needed
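The Channel B indexing proposed above can be checked on a host with a small address model. This is a sketch, not tested on hardware, and it rests on one loud assumption: that FRMIDX is added to the address of the last element of a frame, which is what the formula above implies. If the controller instead applies FRMIDX relative to the first element of the frame, FRMIDX would simply be one element (4 bytes), so the EDMA reference guide should be consulted for the sync mode in use. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

enum { ELECNT = 32, ESIZE_BYTES = 4 };

/* FRMIDX as proposed: step back over 31 buffers, then forward one sample. */
int32_t frmidx_from_last(int32_t eleidx)
{
    return eleidx * (1 - ELECNT) + ESIZE_BYTES;
}

/* Fill out[] with the destination byte offset of every element that
 * Channel B writes over `frames` frames. out needs frames * ELECNT entries. */
void channel_b_dst_offsets(int32_t eleidx, size_t frames, int32_t *out)
{
    int32_t addr = 0;
    const int32_t frmidx = frmidx_from_last(eleidx);

    for (size_t f = 0; f < frames; ++f) {
        for (int e = 0; e < ELECNT; ++e) {
            out[f * ELECNT + e] = addr;
            if (e < ELECNT - 1)
                addr += eleidx;     /* step to the next channel buffer */
        }
        addr += frmidx;             /* ASSUMED: applied to last element address */
    }
}
```

With ELEIDX = 1024 bytes, the second frame starts at offset 4, i.e. the second sample of the first channel buffer, which is the intended behaviour.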

Since I have not coded this up and tested it, no guarantees. You will be able to get CPU interrupts at every TDM frame (from Channel A) and/or at the filling of the buffer (from Channel B), depending on your need.

0 Eliot Blennerhassett over 16 years ago in reply to RandyP

Intellectual 445 points

Thanks Randy.

On the race condition, you say "you should be okay because the linking will start before the write from channel B." Is this guaranteed that the reload will happen before the chained channel is triggered and does its write?

If there is a problem with A reloading 'at the same time' as B is writing a new value to A's reload parameters, my plan is to have 2 sets of cross linked reload params for A: A1 and A2. When A is reloading from A1, B will update A2 and vice versa. And as you mention, B will have one-of-two reload sets for ping pong.

I haven't looked in detail at EDMA3. If it's not capable of servicing my TDM scenario, I'll be disappointed.

Your "another idea" is interesting. I may implement it if my idea doesn't work out.

I think for your channel B, call ELEITEMS the number of elements (e.g. ints) spanned by one buffer. Then ELEIDX = ELEITEMS*sizeof(element)

FRMIDX = sizeof(element) * (-((ELEITEMS-1) * ELEIDX) + 1) i.e. go back to where we started, then add one element.

0 RandyP over 16 years ago in reply to Eliot Blennerhassett

TI__Guru* 84110 points

Race condition between Channel A linking, Channel A chaining to Channel B, Channel B writing to Channel A's PARAM:

Channel A's linking operation will be performed within the EDMA2 Channel Controller when the last Transfer Request has been sent to the Transfer Controller. Once the TC has completed its part of the transfer (sent the data to the write bus), it will send a Transfer Complete Code to the chaining mechanism. Channel B will then send a TR to the TC to perform a write to Channel A's PARAM. Channel A's linking operation will have been completed even before Channel B receives the chaining trigger. So for this case, there is not a race condition.

If Channel B writes to Channel A's PARAM and then attempts to chain back to Channel A, there is a race between the Config Bus write to Channel A's PARAM and the TC sending a TCC to the chaining register. This does not apply to your design, but is just some extra information for you.

EDMA3 for future use:

We certainly do not want you to be disappointed when you move to the next processor with EDMA3. For handling McASP TDM channels as you want to do, EDMA3 will have to be programmed very similarly to EDMA2. EDMA3 has many added features, like independent source/destination indexes and count values for three dimensions. But for your application, those do not change the way you will need to design the transfer - either Channel B changing Channel A's active or link PARAM, or Channel B copying to/from an interim buffer.

ELEIDX:

Your equation for ELEIDX is usually going to be correct, but it will fail in the cases where the buffers are allocated with holes between them. This will always happen when the ELEITEMS*sizeof(element) is not a multiple of 8, at least on C64/C64x+/C674x devices, or a multiple of 4 for C62/C67 devices. The safest way is to subtract the start addresses of two adjacent buffers. Please keep in mind that sizeof(element) only works for element sizes of 1, 2, or 4 bytes, since those are all the choices available in the ESIZE field of the OPTions register. For complex/larger elements, you have to combine ESIZE with ELECNT/ELEIDX for the first dimension. Note that EDMA3 does not have that limitation.
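The "subtract the start addresses" suggestion can be written as a one-line helper. This is a minimal sketch; the name `buffer_span_bytes` is hypothetical, and going through `uintptr_t` assumes a flat address space, which holds on these DSPs but is implementation-defined in standard C.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte distance between the starts of two adjacent channel buffers.
 * Using the real addresses means alignment holes inserted by the
 * linker are included in the result automatically. */
ptrdiff_t buffer_span_bytes(const void *buf0, const void *buf1)
{
    return (ptrdiff_t)((uintptr_t)buf1 - (uintptr_t)buf0);
}
```

For example, with buffers declared as `int32_t bufs[32][DEPTH]`, `buffer_span_bytes(bufs[0], bufs[1])` gives the value to use for ELEIDX.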

FRMIDX:

Nobody ever likes (1-ELEIDX) rather than -(ELEIDX-1). From professors to code proof-readers, everyone prefers -(ELEIDX-1) because it is more intuitive to read and therefore better self-documented. Your FRMIDX equation will work regardless of alignment holes (if handled already in ELEIDX), and it is more robust for other element sizes. And again, sizeof(element) must be 1, 2, or 4 bytes, as mentioned above.

0 howy over 16 years ago

Intellectual 460 points

Programming the EDMA takes a lot of time and is hard to debug.

Instead of reordering the data with an elaborate multi-chained EDMA setup, since I had to convert that data to floating point and perform peak detection on all inputs and outputs, I made that routine perform the reordering at the same time. I did have to go through the trouble of making the EDMA work in an autonomous ping-pong fashion, but the resulting data was not stored in a nice logical order.

If you have to do 192 kHz, watch out for EDMA latency. I got it to work, but I struggled with that for a long time and never did figure out why the EDMA would be held off for long periods of time (1.5 sample periods at 192 kHz sample rate) when my main processing interrupt started. It seemed like the DMA would be held off while the registers were being pushed onto the stack to prepare for the interrupt.

Sorry, but I can't give you my source code.

-howy

0 howy over 16 years ago in reply to howy

Intellectual 460 points

Sorry about the confusion; the project I am referring to in this post is a 6713 Pro Audio product that I designed several years ago when the 6713 first came out.

My other posts are for a different product that I am porting from 6713 to 6747.

-howy

0 howy over 16 years ago in reply to howy

Intellectual 460 points

I should also mention that I was using DSP/BIOS running out of SDRAM in that 6713 product which may have accounted for the odd delays involved.

The 6747 fixes that issue by providing read and write fifos on the McASP ports. If the fifos were a little bigger we would not have to bother with audio DMA at all.

-howy

0 Eliot Blennerhassett over 16 years ago in reply to howy

Intellectual 445 points

howy said:

Programming the EDMA takes a lot of time and is hard to debug.

Instead of reordering the data with an elaborate multi-chained EDMA setup, since I had to convert that data to floating point and perform peak detection on all inputs and outputs, I made that routine perform the reordering at the same time. I did have to go through the trouble of making the EDMA work in an autonomous ping-pong fashion, but the resulting data was not stored in a nice logical order.

Thanks Howy. I agree with your comment about EDMA programming. I'd like to see TI provide a PC based simulator for the EDMA so that you could enter parameter sets, and then click a button to generate events, and watch the source and dest addresses. Perhaps this can be done with the device simulator?

Combining reordering with other stuff, yes I do have to convert to float as well. However this still involves a second buffer and the attendant cache issues resulting from stepping through either source or destination non-linearly.

0 RandyP over 16 years ago in reply to Eliot Blennerhassett

TI__Guru* 84110 points

The simulator does simulate the operation of the EDMA, and it does it as close to cycle accurate as possible. For example, if you start a QDMA or write to ESR to trigger a DMA and single-step over that write and a couple more instructions, the simulator will have only run for a few CPU clock cycles so not much of the EDMA will have progressed yet - you might need to put in a for-delay loop and run past it. The EVM will do the whole transfer very quickly since the EDMA continues to run between single-steps.

When I have tried to debug transfers, especially complex chained sequences, I turn off the chaining in the OPT registers and write in a bunch of writes to ESR (followed by a for i=0-1000 loop) and run to each ESR individually while watching the PARAM in a memory window. This lets you see exactly what transfer has happened based on your sync mode, etc. This might not be the same as the "click a button" technique you would like, but it does help to go more slowly through the process.
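The manual-trigger technique described above might look roughly like this. It is a sketch only: `esr` is a pointer supplied by the caller, the real register address has to come from the device data sheet, and the function name and spin count are hypothetical.

```c
#include <stdint.h>

/* Manually raise one EDMA event by writing its bit to the Event Set
 * Register, then spin for a while so a simulator has time to advance
 * the transfer before the next breakpoint is hit. */
void trigger_and_wait(volatile uint32_t *esr, unsigned channel, unsigned spin)
{
    *esr = (uint32_t)1 << channel;      /* set the event bit for this channel */

    for (volatile unsigned i = 0; i < spin; ++i) {
        /* empty delay loop; volatile keeps the compiler from removing it */
    }
}
```

With chaining disabled in OPT, a breakpoint on the write to `*esr` lets each transfer be run individually while the PARAM is watched in a memory window.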

Please do offer more suggestions on how the tools and simulator could be made more useful.

0 Eliot Blennerhassett over 16 years ago in reply to RandyP

Intellectual 445 points

RandyP said:

When I have tried to debug transfers, especially complex chained sequences, I turn off the chaining in the OPT registers and write in a bunch of writes to ESR (followed by a for i=0-1000 loop) and run to each ESR individually while watching the PARAM in a memory window. This lets you see exactly what transfer has happened based on your sync mode, etc. This might not be the same as the "click a button" technique you would like, but it does help to go more slowly through the process.

Thanks Randy, great idea. I have created a project to test my ideas. I will post it here if I get company permission. Within the loop, I set a breakpoint on the ESR write, so pressing 'F5' runs one cycle, or using Animate shows it running at readable speed.

I had to edit the init6713sim.gel to make the whole PaRam readable; for some reason only the first few hundred bytes were enabled.

One further question - why do you turn off chaining? For my tests it is a vital part of the setup.

0 RandyP over 16 years ago in reply to Eliot Blennerhassett

TI__Guru* 84110 points

If you are using the simulator, then there may need to be a for loop delay after the ESR where you place the breakpoint. Otherwise, the entire transfer might not have completed.

The chaining project I built had three DMA channels sort of chaining through each other continuously. Ch A copied SRC/CNT/DST from a table to Ch C's PARAM then Ch A chained to Ch B; Ch B wrote to ESR to trigger Ch C to avoid an EDMA race condition; Ch C chained to Ch A.

So when I hit ESR the first time, the three channels should have run a bunch of times before I would see things stop. By turning off the chaining and just manually triggering each one, I could single-step through the sequence to debug it. Chaining definitely had to be turned back on to get the final application to work. It was vital for my case, too, but made it harder to see what was happening during debug.

0 Eliot Blennerhassett over 14 years ago in reply to RandyP

Intellectual 445 points

RandyP said:

EDMA3 on the newer processors is much more flexible than the EDMA2 that you have in the C6713. But it still needs an interim buffer, as far as I can see.

I'm refreshing this thread to add some more info now I have some direct experience with c6747, EDMA3 and McASP

I'm still using multi-serializer + TDM on the McASP. Sadly the EDMA3 is no better than the EDMA2 for this purpose. The first of the 3 dimensions is "wasted" because it is used to specify that there are 4 bytes in a word (EDMA2 had opt.esize for this). This only leaves 2 dimensions (same as EDMA2) for the transfers, when 3 are required.

However, what does improve the situation is the McASP FIFO. This can be set up so it generates one event for an entire audio frame of (Tdm_slots * serializers) words. I.e. it collapses the 'dimensions' of the McASP data from 2 to 1.

The McASP Tx fifo introduces its own quirks, in that it will 'suck up' a complete fifo full of data, thus putting Rx and Tx EDMA events out of sync. This can be solved by prefilling the FIFO with zeros before starting, at the expense of extra latency.
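To illustrate why one event per audio frame is enough, here is a host-side model of a three-dimensional transfer in which each event moves a whole frame. It is a sketch, not the code used in the product. The struct fields are named after the EDMA3 PaRAM counts and destination indexes, but the struct itself, the function `run_ab_sync` and the assumption that the third-dimension index is applied from the start of each frame are illustrative and should be checked against the EDMA3 documentation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    size_t    acnt;     /* bytes per array: one 32-bit word */
    size_t    bcnt;     /* arrays per event: Tdm_slots * serializers */
    size_t    ccnt;     /* events per block: samples per channel buffer */
    ptrdiff_t dstbidx;  /* bytes between adjacent channel buffers */
    ptrdiff_t dstcidx;  /* bytes between successive samples in a buffer */
} TransferModel;

/* Each outer iteration stands for one McASP FIFO event delivering a
 * complete audio frame; the inner loop scatters it across the buffers. */
void run_ab_sync(const TransferModel *t, const uint8_t *fifo, uint8_t *dst)
{
    for (size_t c = 0; c < t->ccnt; ++c) {
        uint8_t *frame_dst = dst + (ptrdiff_t)c * t->dstcidx;
        for (size_t b = 0; b < t->bcnt; ++b) {
            memcpy(frame_dst + (ptrdiff_t)b * t->dstbidx, fifo, t->acnt);
            fifo += t->acnt;    /* model only: hardware re-reads a fixed port */
        }
    }
}
```

In this model the first dimension is still spent on the 4 bytes of a word, but the remaining two are free for "next channel" and "next sample" because the FIFO has already flattened slots and serializers into a single run of words.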
