TMS320C6655: EDMA CC missed events

Jim Paul

Part Number: TMS320C6655

Hi TI folks

I'm debugging a legacy design for the C6655, and have detected EDMA3 CC missed events (using ERRINT) steadily occurring over time on chained DMA channels. There were no losses from the Transfer Controller. I'm looking for ideas to get rid of these missed events completely. I am quite close, but need advice regarding EDMA event prioritisations.

------ Let me explain the DMA architecture

There is a DMA process (let's call it DMA0) that performs a total of 256K A-synchronised transfers in just over one second. The triggering event is from an external GPIO (0) every 4us. The array size is 16 bits (aCnt = 2); frame size = bCnt = 32. The 16 bits are transferred from a static location within an external FPGA buffer, let's call it FPGA[0], to an incrementing location in external DDR memory, let's call it DDR[0]. I don't see any missed events on this DMA channel. However, DMA0 is chained (normal completion) to another DMA process, DMA1 (transfers data from FPGA[1] to DDR[1]), which in turn is chained to DMA2 (transfers data from FPGA[2] to DDR[2]), then DMA3 (etc.), and DMA4 (etc.). So there can be 5 concurrent DMA processes.

I am detecting missed DMA events mainly on DMA1, with fewer on DMA2, 3 and 4. There are no losses on DMA0, which is externally triggered. So the losses only occur on the chained DMA channels, particularly the first one, DMA1.

There are also a couple of QDMA channels that are very active during this period (no missed events on those), and also some asynchronous HWI ISRs that occur infrequently.

------ Now, a couple of things that I don't understand

1) There is a trigger priority in the EDMA CC, which prioritises external events over chained events. But it's not clear why the QDMA channels would not lose events BEFORE the chained EDMA3 CC events.

2) I have tried breaking the chaining, and instead of triggering from a single GPIO external interrupt, triggering from three external interrupts instead. This works initially, but stops after a short while.

Any advice on either 1, or 2. Or ideas on what might cause the EDMA3 Missed Events?

Thanks

Jim

over 6 years ago

0 Mukul Bhatnagar over 6 years ago

TI__Guru* 84505 points

Jim

Might need you to tabulate your channels used, TCC, used, src dst for DMAs vs QDMA , number of bytes transferred by each channel per sync event (is DMA1 doing the same ACNT, BCNT as DMA0?) and DMAQNUM - essentially which channel is going to which queue/TC - to visualize this better.

Are you sure that the EMR bits and associated error interrupts are happening truly due to a missed "chained" event and not because the event is hitting a NULL param.? Do you configure the params as static?

Does the issue not happen if the QDMA transfers were not happening?

As per the definition of the EMR register

For a particular DMA channel, if a second event is received prior to the first event getting cleared/serviced, the bit corresponding to that channel is set/asserted in the event missed registers (EMR/EMRH). All trigger types are treated individually, that is, manual triggered (ESR/ESRH), chain triggered (CER/CERH), and event triggered (ER/ERH) are all treated separately. The EMR/EMRH bits for a channel are also set if an event on that channel encounters a NULL entry (or a NULL TR is serviced). If any EMR/EMRH bit is set (and all errors, including bits in other error registers (QEMR, CCERR) were previously cleared), the EDMA3CC generates an error interrupt. See Section 2.9.4 for details on EDMA3CC error interrupt generation.
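For reference, the EMR/EMRH pair described above can be decoded into channel numbers once the two register values have been read (for example over JTAG). This is a minimal sketch only: the function name `emr_missed_channels` is made up for illustration and is not part of the CSL, and it assumes EMR covers channels 0-31 and EMRH covers channels 32-63.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: list the DMA channels whose missed-event bit is set.
 * emr covers channels 0..31, emrh covers channels 32..63.
 * Writes up to max_out channel numbers into out[] (lowest channel first)
 * and returns the total number of bits that were set. */
size_t emr_missed_channels(uint32_t emr, uint32_t emrh,
                           uint8_t *out, size_t max_out)
{
    assert(out != NULL || max_out == 0);
    uint64_t bits = ((uint64_t)emrh << 32) | emr;
    size_t count = 0;
    for (unsigned ch = 0; ch < 64; ++ch) {
        if (bits & (1ULL << ch)) {
            if (count < max_out)
                out[count] = (uint8_t)ch;
            ++count;
        }
    }
    return count;
}
```

With the channel numbers used in this thread, bit 18 of EMRH would correspond to DMA channel 50 (CSL_EDMA3CC2_INTC1_OUT7).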

[MB] It will depend on what is actually happening on your DMA channels vs QDMA channels. The missed event bit will get set if the CC has not been able to move the TRP from CC to TC and the next chained event happens, which is unlikely given you have normal completion set (I think). It is more to do with what is keeping a particular DMA channel "busy" such that the next event came in while the previous chained event was not serviced.

Usually I do not see this happening with chained events, it typically would happen with the async events like GPIO or McBSP rx/tx, which will just go on at a fixed interval without notion of internal DMA transfers etc. Does that make sense?
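The rule in the EMR definition can be pictured with a toy model of a single channel: a trigger that arrives while an earlier one is still latched counts as missed. This only illustrates the behaviour described in the quoted text, it is not a description of the EDMA3CC internals, and the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one DMA channel's event latch (not the real hardware). */
typedef struct {
    bool pending;    /* an event is latched but not yet serviced      */
    unsigned missed; /* events that arrived while one was still latched */
} chan_model;

/* A trigger (external, chained or manual) arrives.
 * Returns true if it was latched, false if it counted as missed. */
bool chan_trigger(chan_model *c)
{
    assert(c != NULL);
    if (c->pending) {
        c->missed++;
        return false;
    }
    c->pending = true;
    return true;
}

/* The latched event is serviced (the transfer request is submitted). */
void chan_service(chan_model *c)
{
    assert(c != NULL);
    c->pending = false;
}
```

In this model a chained channel only misses an event if the channel that chains to it completes twice before the chained event has been serviced once.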

[MB] Clarify why you took this approach? and what does "initially" mean - did it stop after QDMA started, or you have some programming issue with your GPIO, interrupt setup?

Regards

Mukul

0 lding over 6 years ago in reply to Mukul Bhatnagar

TI__Guru* 95265 points

Jim,

>>> There is a DMA process (lets call it DMA0) that performs a total of 256K A-Synchronised transfers in just over one second >>> If the GPIO fires every 4 us with ACNT = 2 bytes and BCNT = 32, A-sync, that is 64 bytes in 4 us, which is 16MB/second. How do you only get 256KB in one second? If you change the GPIO to a larger interval, say 8 or 16 us, will you see fewer missed events?

Also can you explain

0 Jim Paul over 6 years ago in reply to lding

Prodigy 240 points

Hi lding ,

Let me explain the DMA architecture again, hopefully a little more clearly. I will respond to Mukul separately above as soon as I have the data he is requesting.

This is essentially a ping/pong data acquisition/processing from 5 sources. For each DMA[0|1|2|3|4] process, 256K (262,144) ADC[0|1|2|3|4] samples are transferred one at a time (this is A-synchronised) from a static (for each ADC) transmit location ADC_buffer[0|1|2|3|4] containing the most recent ADC[0|1|2|3|4] sample into external DDR sample memory (incrementing the destination location DDR[0|1|2|3|4] "ping" buffer by 16 bits each time), for further processing on all 5 sets. The sample rate on each ADC[0|1|2|3|4] is 250kHz, therefore new ADC[0|1|2|3|4] samples arrive every 4 microseconds. Every 256K samples, when DMA[0|1|2|3|4] has completed all its transfers, it is restarted (but this time into a "pong" buffer), and so on.
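As a quick check of the numbers above, the following sketch works out how long one ping or pong buffer takes to fill and the resulting payload rate. It assumes exactly one 16-bit sample is moved per channel per 4us event, as described in this post; the constant and function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

#define SAMPLES_PER_BUFFER (256u * 1024u) /* 262,144 samples per ping or pong buffer */
#define SAMPLE_PERIOD_NS   4000u          /* one sample every 4 us (250 kHz)        */
#define BYTES_PER_SAMPLE   2u             /* one 16-bit ADC sample                  */

/* Time to fill one ping or pong buffer, in microseconds. */
uint64_t buffer_fill_time_us(void)
{
    return (uint64_t)SAMPLES_PER_BUFFER * SAMPLE_PERIOD_NS / 1000u;
}

/* Payload rate for num_adcs channels, in bytes per second. */
uint64_t payload_bytes_per_second(unsigned num_adcs)
{
    assert(num_adcs > 0);
    uint64_t samples_per_second = 1000000000ull / SAMPLE_PERIOD_NS;
    return samples_per_second * BYTES_PER_SAMPLE * num_adcs;
}
```

Under these assumptions one buffer fills in 1,048,576 us (about 1.05 s), and each ADC contributes 500,000 bytes per second, so 2.5 MB/s for all five.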

There are GPIO interrupts triggered each time there is a new sample from each ADC[0|1|2|3|4], these are GPIO[0|1|2|3|6].

However, we are only using GPIO 0 to initiate the transfer from ADC_buffer[0] into DDR[0], and after normal completion of this transfer, we are chaining the rest of the transfers together. So, every time there is an external interrupt on GPIO[0] (every 4us) it triggers (by chaining) the other 4 transfers one after the other.

If I try to increase the GPIO interval to, say, 8 or 16us, this would require a fundamental change in the ADC sampling rate, which is not possible with this legacy design.

Jim

0 Jim Paul over 6 years ago in reply to Mukul Bhatnagar

Prodigy 240 points

Hi Mukul,

Thanks for your response. Please refer below. Thanks, Jim

Q1. Might need you to tabulate your channels used, TCC, used, src dst for DMAs vs QDMA , number of bytes transferred by each channel per sync event (is DMA1 doing the same ACNT, BCNT as DMA0?) and DMAQNUM - essentially which channel is going to which queue/TC - to visualize this better.

// GPIO 0 triggered every 4usecs, every new sample from ADC 0

// GPIO 1,2,3,6 each generate 4usec interrupts, every new sample from ADC 1,2,3,4; BUT these other GPIO’s are NOT used

static const U32 cEdmaChannelAdc0=CSL_EDMA3CC2_GPINT0;

static const U32 cEdmaChannelAdc1=CSL_EDMA3CC2_INTC1_OUT7;

static const U32 cEdmaChannelAdc2=CSL_EDMA3CC2_INTC1_OUT8;

static const U32 cEdmaChannelAdc3=CSL_EDMA3CC2_INTC1_OUT9;

static const U32 cEdmaChannelAdc4=CSL_EDMA3CC2_INTC1_OUT10;

// Note that there are no EDMA missed events on cEdmaChannelAdc0, triggered from the external GPIO0, but I see the most missed events from INTC1_OUT7, less from OUT8, OUT9, and OUT10

// 5x16 bit static DMA src locations, each updated every 4usec when a new 16 bit ADC sample arrives

U32 iSrcAddr; // ADC FPGA Addresses Fpga_adc[0|1|2|3|4]

// Static external memory location for each ADC sample set (incrementing DMA dst address), total size =16x2x256x1024 bits

U32 iDstAddr; // DDR ADC Buffers adcBuffer[0|1|2|3|4]

// Setup the PaRam Entry

iDmaConfig[0].option = CSL_EDMA3_OPT_MAKE(CSL_EDMA3_ITCCH_***,       // Refer to Table 1 below
                                          CSL_EDMA3_TCCH_***,        // Refer to Table 1 below
                                          CSL_EDMA3_ITCINT_DIS,
                                          CSL_EDMA3_TCINT_***,       // Refer to Table 1 below
                                          TCC,                       // Refer to Table 1 below
                                          CSL_EDMA3_TCC_NORMAL,
                                          CSL_EDMA3_FIFOWIDTH_NONE,
                                          CSL_EDMA3_STATIC_DIS,
                                          CSL_EDMA3_SYNC_A,
                                          CSL_EDMA3_ADDRMODE_INCR,   // CSL_EDMA3_ADDRMODE_CONST is NOT SUPPORTED by the C6655
                                          CSL_EDMA3_ADDRMODE_INCR);  // CSL_EDMA3_ADDRMODE_CONST is NOT SUPPORTED by the C6655

iDmaConfig[0].srcAddr = (Uint32)iSrcAddr;

iDmaConfig[0].aCntbCnt = CSL_EDMA3_CNT_MAKE(16,32);

iDmaConfig[0].dstAddr = (Uint32)iDstAddr;

iDmaConfig[0].srcDstBidx = CSL_EDMA3_BIDX_MAKE(0,16);

iDmaConfig[0].srcDstCidx = CSL_EDMA3_CIDX_MAKE(0,16);

iDmaConfig[0].cCnt = 256*1024;

// Table 1: ADC Data Acquisition PaRAM differences across all instances

#	iSrcAddr	iDstAddr	Open DMA Ch	TCC	ITCCH	TCCH
0	Fpga_adc[0]	adcBuffer[0]	cEdmaChannelAdc0	cEdmaChannelAdc1	Enabled	Enabled
1	Fpga_adc[1]	adcBuffer[1]	cEdmaChannelAdc1	cEdmaChannelAdc2	Enabled	Enabled
2	Fpga_adc[2]	adcBuffer[2]	cEdmaChannelAdc2	cEdmaChannelAdc3	Enabled	Enabled
3	Fpga_adc[3]	adcBuffer[3]	cEdmaChannelAdc3	cEdmaChannelAdc4	Enabled	Enabled
4	Fpga_adc[4]	adcBuffer[4]	cEdmaChannelAdc4	cEdmaChannelAdc4	Disabled	Disabled

Q2. Are you sure that the EMR bits and associated error interrupts are happening truly due to a missed "chained" event and not because the event is hitting a NULL param?

No. In what situations would a NULL TR be serviced?

Q3. Do you configure the params as static?

Are you referring to the PaRAM entries? Since double buffering is used, the PaRAM entry is reloaded to point to a slightly different dstAddr at the end of every transfer (which is approximately every 1.048s, 256x1024 samples x 4usec). This is shown in the table above.

Q4. Does the issue not happen if the QDMA transfers were not happening?

I haven’t tried this yet.

Q5. There is a trigger priority in the EDMA CC, which prioritises external events over chained events. But it's not clear why the QDMA channels would not lose events BEFORE the chained EDMA3 CC events.

[MB] Usually I do not see this happening with chained events, it typically would happen with the async events like GPIO or McBSP rx/tx, which will just go on at a fixed interval without notion of internal DMA transfers etc. Does that make sense?

Yes, this does make sense to me, and is what I would expect.

FYI - The QUE prioritisations are set up as follows:

CSL_edma3SetEventQueuePriority(hDma, CSL_EDMA3_QUE_0, CSL_EDMA3_QUE_PRI_0); // 0-7 highest-lowest

CSL_edma3SetEventQueuePriority(hDma, CSL_EDMA3_QUE_1, CSL_EDMA3_QUE_PRI_7); // 0-7 highest-lowest

CSL_edma3SetEventQueuePriority(hDma, CSL_EDMA3_QUE_2, CSL_EDMA3_QUE_PRI_7); // 0-7 highest-lowest

CSL_edma3SetEventQueuePriority(hDma, CSL_EDMA3_QUE_3, CSL_EDMA3_QUE_PRI_7); // 0-7 highest-lowest

QDMA occurs on CSL_EDMA3_QUE_1 at the same time as the Data Acquisition DMA on CSL_EDMA3_QUE_DEFAULT = 0 = CSL_EDMA3_QUE_0.

QDMA SrcAddr is DDR memory/L2 Cache, DstAddr is L2 Cache/DDR memory (data moved back and forth).

Q6. I have tried breaking the chaining, and instead of triggering from a single GPIO external interrupt, triggering from three external interrupts instead. This works initially but stops after a short while.

[MB] Clarify why you took this approach? and what does "initially" mean - did it stop after QDMA started, or you have some programming issue with your GPIO, interrupt setup?

I found by experimentation that if I used 3xGPIO external interrupts (instead of just GPIO 0 as described in Table 1 above) and therefore reduced the amount of chaining, then the number of EDMA missed events reduced to zero. However, after several minutes the DMA appeared to stop. I haven’t debugged this yet, so this might not be the cause. But while it worked, it did make a significant difference to the EDMA missed events counted. So, this is not a solution.

static const U32 cEdmaChannelAdc0=CSL_EDMA3CC2_GPINT0;

static const U32 cEdmaChannelAdc1=CSL_EDMA3CC2_GPINT1; // new

static const U32 cEdmaChannelAdc2=CSL_EDMA3CC2_INTC1_OUT8;

static const U32 cEdmaChannelAdc3=CSL_EDMA3CC2_GPINT3; // new

static const U32 cEdmaChannelAdc4=CSL_EDMA3CC2_INTC1_OUT10;

// Table 2: (Using ext. GPIO 0,1,3) ADC Data Acquisition PaRAM differences across all instances

#	iSrcAddr	iDstAddr	Open DMA Ch	TCC	ITCCH	TCCH
0	Fpga_adc[0]	adcBuffer[0]	cEdmaChannelAdc0	cEdmaChannelAdc0	Disabled	Disabled
1	Fpga_adc[1]	adcBuffer[1]	cEdmaChannelAdc1	cEdmaChannelAdc2	Enabled	Enabled
2	Fpga_adc[2]	adcBuffer[2]	cEdmaChannelAdc2	cEdmaChannelAdc2	Disabled	Disabled
3	Fpga_adc[3]	adcBuffer[3]	cEdmaChannelAdc3	cEdmaChannelAdc4	Enabled	Enabled
4	Fpga_adc[4]	adcBuffer[4]	cEdmaChannelAdc4	cEdmaChannelAdc4	Disabled	Disabled

Note that if I use 2xGPIOs, the number of EDMA missed events reduced by a factor of 4 compared to just using GPIO 0 only, but the missed events did not reduce to zero. So, the use of external GPIOs definitely has an impact.

I have also seen this code run with fewer than 5 DMA Data Acquisition processes, and I think (to be confirmed) that the number of EDMA missed events is also lower in those situations.

Further, I also see a dependency on the amount of other concurrent access to DDR (could this be contending with the EDMA accesses?).

Any further clues/ideas as to why this might be occurring? Please let me know if you require more details.

Thanks

Jim

0 lding over 6 years ago in reply to Jim Paul

TI__Guru* 95265 points

Jim,

Thanks for the explanation! I saw you used:
static const U32 cEdmaChannelAdc0=CSL_EDMA3CC2_GPINT0;

static const U32 cEdmaChannelAdc1=CSL_EDMA3CC2_INTC1_OUT7;

static const U32 cEdmaChannelAdc2=CSL_EDMA3CC2_INTC1_OUT8;

static const U32 cEdmaChannelAdc3=CSL_EDMA3CC2_INTC1_OUT9;

static const U32 cEdmaChannelAdc4=CSL_EDMA3CC2_INTC1_OUT10;

From searching CSL code:
#define CSL_EDMA3CC2_GPINT0 (0x00000006)
#define CSL_EDMA3CC2_INTC1_OUT7 (0x00000032)
#define CSL_EDMA3CC2_INTC1_OUT8 (0x00000033)
#define CSL_EDMA3CC2_INTC1_OUT9 (0x00000034)
#define CSL_EDMA3CC2_INTC1_OUT10 (0x00000035)

There are 4 TCs on the C6655. By default (if you didn't change the channel-to-queue mapping), these will be submitted to different queues:
-TC2/TC2/TC3/TC0/TC1.

And from QUEPRI, TC0 has the highest priority (0) and TC1/2/3 have the lowest (7). You wrote: "but I see the most missed events from INTC1_OUT7, less from OUT8, OUT9, and OUT10". If you raise the QUEPRI of TC2 to 0, will it help to alleviate the missed events on INTC1_OUT7? Also, if you change the destination buffer from DDR to a fast memory (L2 or MSMC), will this help?

Regards, Eric

0 Mukul Bhatnagar over 6 years ago in reply to lding

TI__Guru* 84505 points

Hi Jim

Thank you for taking the time to explain your setup with greater detail. Much appreciated.

Nothing jumps out immediately.

A few debug suggestions:

1) Please do follow up on what Eric is suggesting: (a) understanding the Queue-TC allocation for your ADC channels and QDMA channels (it would be good to experiment with separating the TC allocation, which is governed by the DMA-to-queue allocation, for the channels that are showing missed events), (b) queue priority, and (c) internal memory vs external memory.

2) NULL PaRAM: I think it is unlikely. If you read the section on NULL vs dummy transfers, typically a NULL transfer will also have SER set, which I do not think is the case here.

3) Investigate whether there is some rogue TCC that has the same value as your chained channel TCCs and is accidentally triggering the DMA channels set up for chained transfers.

4) Finally, given that you are getting your GPIO event at a fixed 4 usec interval, I think it would be good to confirm how you are making sure that all 4 chained transfers complete before the next GPIO sync event shows up. If you have a situation where the next GPIO event shows up before the previous set of transfers is done, maybe that can cause such an issue (if there is a system backup somewhere). I am not completely sure how that would happen with a normal completion setup, but it would be good to make sure.

Maybe it is possible to instrument this by chaining yet another channel that triggers a GPIO output, so you can make sure that this GPIO always triggers before your ADC GPIO event shows up?

Hope this helps.

Regards

Mukul.

0 Jim Paul over 6 years ago in reply to lding

Prodigy 240 points

Hi Eric, thanks for the suggestions. I will try the suggestions you mentioned and get back to you. In the meantime, can you explain to me where in the documentation there is the relationship between the queues and the TC's? i.e. why those 5 DMA channels are mapped to TC2/TC2/TC3/TC0/TC1 ?

0 Jim Paul over 6 years ago in reply to Mukul Bhatnagar

Prodigy 240 points

Hi Mukul, thanks for the suggestions. I will respond later to Eric on 1), but had questions on your other points. For 3), how would I determine if there is a "rogue" TCC? For 4), the GPIO events are definitely every 4usec (measured on a scope). GPIOs 0, 1, 2, 3 and 6 were all checked (although I am currently only using GPIO 0, except in the "experiment" with 0, 1 and 3 described earlier). The timing is as follows: GPIO[0|3] both within 15ns, then 50ns later GPIO[1|6] both within 15ns, then 50ns later GPIO 2. This pattern repeats every 4 microseconds.

Can you explain a little further your last suggestion, "chaining yet another channel that triggers a GPIO output"? I'm not clear on what you mean.

Jim

0 lding over 6 years ago in reply to Jim Paul

TI__Guru* 95265 points

Jim,

See EDMA user guide: www.ti.com/.../sprugs5b.pdf

4.2.1.5 DMA Channel Queue n Number Registers (DMAQNUMn)

The DMA channel queue number register (DMAQNUMn) allows programmability of
the DMA channels in the EDMA3CC to submit its associated synchronization event to
any event queue in the EDMA3CC. At reset, all channels point to event queue 0.
The DMAQNUMn is shown in Figure 4-5 and described in Table 4-6.
Table 4-7 shows the channels and their corresponding bits in DMAQNUMn.
Note—Because the event queues in EDMA3CC have a fixed association to the
transfer controllers, that is, Q0 TRs are submitted to TC0, Q1 TRs are
submitted to TC1, etc., by programming DMAQNUMn for a particular DMA
channel n also dictates which transfer controller is utilized for the data
movement (or which EDMA3TC receives the TR request).

You can use JTAG to check those registers. They are at offset (0x0240 + 4*n), n = 0..7, from the EDMA CC base. By default (after power-on reset) they are 0, which means all channels use Q0 and are submitted to TC0; that is NOT balanced.

I would suggest you program all eight registers to 0x32103210. The first register controls DMA channels 0 to 7, the next register controls DMA channels 8 to 15, etc.

When you use:
#define CSL_EDMA3CC2_GPINT0 (0x00000006), this is DMA channel 6, with 0x32103210 you will submit to Q2 that is TC2.
...
#define CSL_EDMA3CC2_INTC1_OUT8 (0x00000033), this is DMA channel 51, with 0x32103210 you will submit to Q3 that is TC3.
...

Please try to make 5 EDMA channels balanced in terms of TC used.
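To make the register arithmetic above concrete, here is a small sketch that finds which DMAQNUMn register a channel lives in and which queue a given register value selects for it. It assumes each channel owns a 4-bit slot with the queue number in the low 3 bits, which is my reading of the field layout in the user guide; the function names are illustrative and not part of the CSL.

```c
#include <assert.h>
#include <stdint.h>

#define DMAQNUM_BASE_OFFSET 0x0240u /* offset of DMAQNUM0 from the EDMA3CC base */

/* Offset of the DMAQNUMn register that holds the field for this channel. */
uint32_t dmaqnum_offset(unsigned channel)
{
    assert(channel < 64);
    return DMAQNUM_BASE_OFFSET + 4u * (channel / 8u);
}

/* Queue selected for a channel, given the value of its DMAQNUMn register.
 * Assumption: 4-bit slot per channel, queue number in the low 3 bits. */
unsigned dmaqnum_queue(uint32_t reg_value, unsigned channel)
{
    assert(channel < 64);
    return (reg_value >> (4u * (channel % 8u))) & 0x7u;
}
```

With 0x32103210 in every register, channels 6, 50, 51, 52 and 53 come out as Q2, Q2, Q3, Q0 and Q1, which matches the TC2/TC2/TC3/TC0/TC1 list given earlier in the thread.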

Regards, Eric

0 Mukul Bhatnagar over 6 years ago in reply to Jim Paul

TI__Guru* 84505 points

Hi Jim

Responses are a bit brief as I am traveling this week.

1) Rogue TCC: essentially any channel can take any TCC; it is just the OPT field programming, and sometimes this depends on the resource manager. Chaining essentially means that channel number n is programmed into the TCC field of channel m's options parameter (OPT).

So it would be good to audit that no other DMA or QDMA channel is using the same TCC bit. If you know the EDMA channels used and their OPT programming for TCC, hopefully this is a quick audit?
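One way to do such an audit offline is to dump the OPT word of every PaRAM set in use and scan for entries that could chain-trigger a given channel. The sketch below is only an illustration: it assumes the OPT layout I recall from the EDMA3 user guide (TCC in bits 17:12, TCCHEN in bit 22, ITCCHEN in bit 23), which should be checked against the device documentation, and the function name is hypothetical. It only flags entries with chaining enabled, so a TCC that is reused purely for a completion interrupt is not reported.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed OPT bit fields (verify against the EDMA3 user guide). */
#define OPT_TCC(opt)     (((opt) >> 12) & 0x3Fu) /* bits 17:12 */
#define OPT_TCCHEN(opt)  (((opt) >> 22) & 0x1u)  /* final chaining enable        */
#define OPT_ITCCHEN(opt) (((opt) >> 23) & 0x1u)  /* intermediate chaining enable */

/* Count the OPT words that can chain-trigger target_channel, ignoring the
 * one entry (expected_index) that is supposed to do so. */
size_t count_rogue_chainers(const uint32_t *opt_words, size_t n,
                            unsigned target_channel, size_t expected_index)
{
    assert(opt_words != NULL && target_channel < 64);
    size_t rogue = 0;
    for (size_t i = 0; i < n; ++i) {
        uint32_t opt = opt_words[i];
        int chains = OPT_TCCHEN(opt) || OPT_ITCCHEN(opt);
        if (i != expected_index && chains && OPT_TCC(opt) == target_channel)
            ++rogue;
    }
    return rogue;
}
```

A non-zero result for one of the chained ADC channels would point at a second PaRAM set that triggers it in addition to the intended one.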

Jim Paul said:
Can you explain a little further on your last suggestion "chaining yet another channel that triggers a GPIO output", I'm not clear on what you are meaning?

Sorry for not being clear the first time. It would be good to "instrument" your code so that you know your final chained channel has "completed" before the next 4 usec GPIO event shows up. One way to do that would be a GPIO toggle that you can view on a scope in relation to your incoming GPIO sync event from the ADC. The way to do this would be either to toggle a GPIO in your ping/pong completion ISR, or to chain to another DMA channel that writes to a GPIO register to toggle the GPIO. So the source could be any memory location, the data would be a 0 to 1 toggle, and the DST would be a GPIO register that you can observe on your board.
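If the scope or logic analyser can export edge timestamps, the comparison can also be automated. The following is a rough sketch under assumptions of my own: `sync_ns[i]` is the time of the i-th ADC sync edge, `done_ns[i]` is the time of the completion toggle that belongs to it, both in nanoseconds, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the index of the first sync period whose completion toggle did
 * not fall between its own sync edge and the next one, or -1 if every
 * checked period met the deadline. The last period is not checked because
 * there is no following sync edge to compare against. */
long first_late_completion(const uint64_t *sync_ns, const uint64_t *done_ns,
                           size_t n)
{
    assert(sync_ns != NULL && done_ns != NULL);
    for (size_t i = 0; i + 1 < n; ++i) {
        int before_start = done_ns[i] < sync_ns[i];
        int after_next   = done_ns[i] >= sync_ns[i + 1];
        if (before_start || after_next)
            return (long)i;
    }
    return -1;
}
```

A result other than -1 would indicate a period in which the chain had not finished when the next 4 usec event arrived.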

0 Jim Paul over 6 years ago in reply to lding

Prodigy 240 points

Hi Eric

I noticed that the DMA channels that I used for data acquisition from the ADCs (DMA channels 6, 50-53) were all mapped to Q:0, so I separated these out as shown below. Note that Q:1 is used for the two QDMA channels that I have (these are also at a lower queue priority). However, I did NOT notice a decrease in the missed events, particularly on DMA channel 50, and fewer on 51-53 (same as before). They are still occurring (at the rate of about 1 missed event per 8 seconds).

ADC DMA Ch:53 Mapped to Event Q:2 (was Q:0)

DMA Ch:52 Mapped to Event Q:0

DMA Ch:51 Mapped to Event Q:3 (was Q:0)

DMA Ch:50 Mapped to Event Q:2 (was Q:0)

DMA Ch:6 Mapped to Event Q:0

QDMA Ch:1 Mapped to Event Q:1

QDMA Ch:0 Mapped to Event Q:1

The Event Q to TC mapping is as follows (this has not changed)

Event Q:3 Mapped to TC:3

Event Q:2 Mapped to TC:2

Event Q:1 Mapped to TC:1

Event Q:0 Mapped to TC:0
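For anyone reproducing this mapping, the DMAQNUMn register values can be built from a channel-to-queue table as sketched below. This assumes a 4-bit slot per channel with a 3-bit queue field and covers DMA channels only (the QDMA channels have their own queue register, which is not handled here). The type and function names are illustrative, not CSL APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row of a channel-to-queue table (illustrative type). */
typedef struct {
    unsigned channel; /* DMA channel number, 0..63 */
    unsigned queue;   /* event queue number, 0..7  */
} chan_queue;

/* Update the eight DMAQNUMn images in regs[] so that every channel in
 * map[] points at its queue. Fields of other channels are preserved. */
void apply_queue_map(uint32_t regs[8], const chan_queue *map, size_t n)
{
    assert(regs != NULL && (map != NULL || n == 0));
    for (size_t i = 0; i < n; ++i) {
        assert(map[i].channel < 64 && map[i].queue < 8);
        unsigned shift = 4u * (map[i].channel % 8u);
        uint32_t *reg = &regs[map[i].channel / 8u];
        *reg = (*reg & ~((uint32_t)0x7u << shift))
             | ((uint32_t)map[i].queue << shift);
    }
}
```

Starting from the reset value of 0, the five ADC channels in the list above give DMAQNUM6 = 0x00203200, with DMAQNUM0 left at 0 for channel 6.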

I also tried combining this approach with the technique I mentioned earlier in this thread (Table 2) of breaking up the chaining, using GPIO 0, 1 and 3 (not just GPIO 0). This did not make a difference; I achieved the same results as before.

Jim

0 Jim Paul over 6 years ago in reply to Mukul Bhatnagar

Prodigy 240 points

Hi Mukul, Thanks for the suggestions. I will try these ideas, and post back to the forum.
Jim

0 lding over 6 years ago in reply to Jim Paul

TI__Guru* 95265 points

Jim,

Thanks for the test with separation across the 4 queues! When you get a chance, please let me know if changing the QUEPRI, or moving the destination into MSMC or L2, helps.

Regards, Eric

0 lding over 6 years ago in reply to lding

TI__Guru* 95265 points

Jim,

Any chance to test the suggestions?

Regards, Eric

0 Jim Paul over 6 years ago in reply to lding

Prodigy 240 points

Hi Eric, no, not yet. Please keep this posting open in the meantime, and I will get back to these suggestions. Thanks! Jim

0 lding over 6 years ago in reply to Jim Paul

TI__Guru* 95265 points

Jim,

Please let me know when you plan to revisit it. If it is not very soon, I would like to close this thread; please re-open it when you have a chance.

Regards, Eric

0 Jim Paul over 6 years ago in reply to lding

Prodigy 240 points

Hi Eric, likely in at least a couple of weeks (mid-end May). Is it straightforward to re-open threads?

Thanks, Jim

0 lding over 6 years ago in reply to Jim Paul

TI__Guru* 95265 points

Yes. Please re-open when you have new info.

Regards, Eric
