TMS320C6655: EDMA CC missed events

Part Number: TMS320C6655


Hi TI folks

I'm debugging a legacy design for the C6655, and have detected EDMA3 CC missed events (using ERRINT) steadily occurring over time on chained DMA channels. There were no losses from the Transfer Controller. I'm looking for ideas to get rid of these missed events completely. I am quite close, but need advice regarding EDMA event prioritisations.

------  Let me explain the DMA architecture

There is a DMA process (let's call it DMA0) that performs a total of 256K A-synchronised transfers in just over one second. The triggering event comes from an external GPIO (0) every 4us. The array size is 16 bits (aCnt = 2); frame size = bCnt = 32. The 16 bits are transferred from a static location within an external FPGA buffer, let's call it FPGA[0], to an incrementing location in external DDR memory, let's call it DDR[0]. I don't see any missed events on this DMA channel. However, DMA0 is chained (normal completion) to another DMA process, DMA1 (transfers data from FPGA[1] to DDR[1]), which in turn is chained to DMA2 (transfers data from FPGA[2] to DDR[2]), then DMA3 (etc), and DMA4 (etc). So there can be 5 concurrent DMA processes.

I am detecting missed DMA events mainly on DMA1, with fewer on DMA2, 3 and 4. There are no losses on DMA0, which is externally triggered. So, the losses only occur on the chained DMA channels, particularly the first one, DMA1.

There are also a couple of QDMA channels that are very active during this period (no missed events on those), and also some asynchronous HWI ISRs that occur infrequently.

------ Now, a couple of things that I don't understand

1) There is a trigger priority in the EDMA CC, which prioritises external events over chained events. But it's not clear why the QDMA channels would not lose events BEFORE the chained EDMA3 CC events.

2) I have tried breaking the chaining, and instead of triggering from a single GPIO external interrupt, triggering from three external interrupts instead. This works initially, but stops after a short while. 

Any advice on either 1 or 2? Or ideas on what might cause the EDMA3 missed events?

Thanks

Jim

  • Jim

    I might need you to tabulate the channels used, the TCCs used, src/dst for DMAs vs QDMA, the number of bytes transferred by each channel per sync event (is DMA1 doing the same ACNT, BCNT as DMA0?) and DMAQNUM (essentially which channel is going to which queue/TC) to visualize this better.

    Are you sure that the EMR bits and associated error interrupts are truly due to a missed "chained" event and not because the event is hitting a NULL PaRAM set? Do you configure the params as static?

    Does the issue not happen if the QDMA transfers were not happening?

    As per the definition of the EMR register

    For a particular DMA channel, if a second event is received prior to the first event getting cleared/serviced, the bit corresponding to that channel is set/asserted in the event missed registers (EMR/EMRH). All trigger types are treated individually, that is, manual triggered (ESR/ESRH), chain triggered (CER/CERH), and event triggered (ER/ERH) are all treated separately. The EMR/EMRH bits for a channel are also set if an event on that channel encounters a NULL entry (or a NULL TR is serviced). If any EMR/EMRH bit is set (and all errors, including bits in other error registers (QEMR, CCERR) were previously cleared), the EDMA3CC generates an error interrupt. See Section 2.9.4 for details on EDMA3CC error interrupt generation.
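
The EMR/EMRH definition above can be turned into a small decoding helper. This is a minimal sketch, assuming you already have 32-bit snapshots of EMR (channels 0 to 31) and EMRH (channels 32 to 63); the function name `edma_missed_channels` is hypothetical and is not part of the CSL, and how the snapshots are read from the EDMA3CC is left out.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper (not a CSL function).
 * Collects the DMA channel numbers whose missed-event bit is set.
 *   emr  : snapshot of EMR  (bit n   -> channel n,      0..31)
 *   emrh : snapshot of EMRH (bit n   -> channel 32 + n, 32..63)
 *   out  : caller-provided array with room for 64 entries
 * Returns the number of channel numbers written to out. */
size_t edma_missed_channels(uint32_t emr, uint32_t emrh, uint8_t out[64])
{
    size_t n = 0;

    for (unsigned ch = 0; ch < 32; ch++) {
        if (emr & (UINT32_C(1) << ch)) {
            out[n++] = (uint8_t)ch;
        }
    }
    for (unsigned ch = 0; ch < 32; ch++) {
        if (emrh & (UINT32_C(1) << ch)) {
            out[n++] = (uint8_t)(32 + ch);
        }
    }
    return n;
}
```

For example, a snapshot with EMR bit 6 and EMRH bit 18 set decodes to channels 6 and 50. Logging the decoded channels each time the error interrupt fires makes it easier to see which of the chained channels is actually being flagged.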

     

    --

     

    1) There is a trigger priority in the EDMA CC, which prioritises external events over chained events. But it's not clear why the QDMA channels would not lose events BEFORE the chained EDMA3 CC events.

    [MB] It will depend on what is actually happening on your DMA channels vs QDMA channels. The missed event bit will get set if the CC has not been able to move the TRP from CC to TC and the next chained event happens, which is unlikely given you have normal completion set (I think). It is more to do with what is keeping a particular DMA channel "busy" such that the next event came in while the previous chained event was not serviced.

    Usually I do not see this happening with chained events; it would typically happen with async events like GPIO or McBSP rx/tx, which just go on at a fixed interval without any notion of internal DMA transfers etc. Does that make sense?
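
The "next event came in while the previous one was not serviced" condition can be illustrated with a toy model. This is only a sketch under stated assumptions: it models one channel as a single pending flag, and it counts misses, whereas the real EMR bit is a sticky flag rather than a counter. All names (`chan_model_t`, `chan_event`, `chan_service`, `chan_replay`) are hypothetical.

```c
/* Toy model of one DMA channel's event latch (illustration only). */
typedef struct {
    int      pending;  /* 1 while an event is latched but not yet serviced */
    unsigned missed;   /* events that arrived while pending was already set */
} chan_model_t;

/* A new trigger arrives on the channel. */
void chan_event(chan_model_t *c)
{
    if (c->pending) {
        c->missed++;       /* previous event not yet cleared: this one is missed */
    } else {
        c->pending = 1;    /* latch the event */
    }
}

/* The latched event is serviced (its TR is submitted) and cleared. */
void chan_service(chan_model_t *c)
{
    c->pending = 0;
}

/* Replay a trace made of 'E' (event) and 'S' (service) characters
 * and return the number of missed events. Other characters are ignored. */
unsigned chan_replay(const char *trace)
{
    chan_model_t c = { 0, 0 };

    for (const char *p = trace; *p != '\0'; p++) {
        if (*p == 'E') {
            chan_event(&c);
        } else if (*p == 'S') {
            chan_service(&c);
        }
    }
    return c.missed;
}
```

In this model a strictly alternating trace such as "ESESES" never misses, while "EESE" records one miss because the second event arrives before the first is serviced.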

    2) I have tried breaking the chaining, and instead of triggering from a single GPIO external interrupt, triggering from three external interrupts instead. This works initially, but stops after a short while. 

    [MB] Can you clarify why you took this approach? And what does "initially" mean: did it stop after QDMA started, or do you have some programming issue with your GPIO or interrupt setup?

    Regards

    Mukul 

     

  • Jim,

    You wrote: "There is a DMA process (lets call it DMA0) that performs a total of 256K A-Synchronised transfers in just over one second". If the GPIO fires every 4 us with ACNT = 2 bytes and BCNT = 32, A-sync, that is 64 bytes in 4 us, i.e. 16MB/second. How do you only get 256KB in one second? If you change the GPIO to a larger interval, say 8 or 16 us, will you see fewer missed events?

    Also can you explain
  • Hi lding,

    Let me explain the DMA architecture again, hopefully a little more clearly. I will respond to Mukul separately above as soon as I have the data he is requesting.

    This is essentially a ping/pong data acquisition/processing from 5 sources. For each DMA[0|1|2|3|4] process, 256K (262,144) ADC[0|1|2|3|4] samples are transferred one at a time (this is A-synchronised) from a static (for each ADC) transmit location ADC_buffer[0|1|2|3|4] containing the most recent ADC[0|1|2|3|4] sample into external DDR sample memory (incrementing the destination location DDR[0|1|2|3|4] "ping" buffer by 16 bits each time), for further processing on all 5 sets. The sample rate on each ADC[0|1|2|3|4] is 250kHz, therefore new ADC[0|1|2|3|4] samples arrive every 4 microseconds. Every 256K samples, when DMA[0|1|2|3|4] has completed all of its transfers, it is restarted (but this time into a "pong" buffer), and so on.

    There are GPIO interrupts triggered each time there is a new sample from each ADC[0|1|2|3|4], these are GPIO[0|1|2|3|6].

    However, we are only using GPIO 0 to initiate the transfer from ADC_buffer[0] into DDR[0], and after normal completion of this transfer, we are chaining the rest of the transfers together. So, every time there is an external interrupt on GPIO[0] (every 4us) it triggers (by chaining) the other 4 transfers one after the other.
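
The numbers in this description can be checked with a few lines of arithmetic. This is a minimal sketch using only the figures quoted above (262,144 samples per buffer, 16-bit samples, one sample every 4 us, 5 chained transfers per GPIO event). The per-transfer budget assumes the 5 transfers run strictly one after another and share the 4 us period evenly, which is a simplification; the macro and function names are hypothetical.

```c
#include <stdint.h>

#define SAMPLES_PER_BUFFER  (256u * 1024u)  /* 262,144 samples per ping or pong buffer */
#define SAMPLE_PERIOD_NS    4000u           /* 250 kHz ADC rate -> one sample every 4 us */
#define BYTES_PER_SAMPLE    2u              /* 16-bit ADC samples */
#define NUM_CHAINED_XFERS   5u              /* DMA0..DMA4 per GPIO event */

/* Time to fill one ping (or pong) buffer, in nanoseconds. */
uint64_t buffer_fill_time_ns(void)
{
    return (uint64_t)SAMPLES_PER_BUFFER * SAMPLE_PERIOD_NS;
}

/* Size of one ping (or pong) buffer for a single ADC, in bytes. */
uint32_t buffer_size_bytes(void)
{
    return SAMPLES_PER_BUFFER * BYTES_PER_SAMPLE;
}

/* Average time available per chained transfer, assuming the five
 * transfers execute back to back within one sample period. */
uint32_t avg_budget_per_transfer_ns(void)
{
    return SAMPLE_PERIOD_NS / NUM_CHAINED_XFERS;
}
```

This gives 1,048,576,000 ns (about 1.048 s) per buffer, 524,288 bytes per buffer per ADC, and an average of 800 ns per chained transfer if the whole chain has to finish before the next GPIO event.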

    If I try to reduce the GPIO interval to say 8 or 16us, this would require a fundamental change in the ADC sampling rate, which is not possible with this legacy design.

    Jim

  • Hi Mukul,

     

    Thanks for your response. Please refer below. Thanks, Jim

     

    Q1. Might need you to tabulate your channels used, TCC, used, src dst for DMAs vs QDMA , number of bytes transferred by each channel per sync event (is DMA1 doing the same ACNT, BCNT as DMA0?) and DMAQNUM - essentially which channel is going to which queue/TC - to visualize this better.

     

    // GPIO 0 triggered every 4usecs, every new sample from ADC 0
    // GPIO 1,2,3,6 each generate 4usec interrupts, every new sample from ADC 1,2,3,4; BUT these other GPIOs are NOT used
    static const U32 cEdmaChannelAdc0=CSL_EDMA3CC2_GPINT0;
    static const U32 cEdmaChannelAdc1=CSL_EDMA3CC2_INTC1_OUT7;
    static const U32 cEdmaChannelAdc2=CSL_EDMA3CC2_INTC1_OUT8;
    static const U32 cEdmaChannelAdc3=CSL_EDMA3CC2_INTC1_OUT9;
    static const U32 cEdmaChannelAdc4=CSL_EDMA3CC2_INTC1_OUT10;

     

    // Note that there are no EDMA missed events on cEdmaChannelAdc0, triggered from the external GPIO0, but I see the most missed events from INTC1_OUT7, less from OUT8, OUT9, and OUT10

     

    // 5x16 bit static DMA src locations, each updated every 4usec when a new 16 bit ADC sample arrives

    U32 iSrcAddr; // ADC FPGA Addresses Fpga_adc[0|1|2|3|4]

     

    // Static external memory location for each ADC sample set (incrementing DMA dst address), total size =16x2x256x1024 bits

    U32 iDstAddr; // DDR ADC Buffers adcBuffer[0|1|2|3|4]

     

    // Setup the PaRam Entry
    iDmaConfig[0].option = CSL_EDMA3_OPT_MAKE(CSL_EDMA3_ITCCH_***,       // Refer to Table 1 below
                                              CSL_EDMA3_TCCH_***,        // Refer to Table 1 below
                                              CSL_EDMA3_ITCINT_DIS,
                                              CSL_EDMA3_TCINT_***,       // Refer to Table 1 below
                                              TCC,                       // Refer to Table 1 below
                                              CSL_EDMA3_TCC_NORMAL,
                                              CSL_EDMA3_FIFOWIDTH_NONE,
                                              CSL_EDMA3_STATIC_DIS,
                                              CSL_EDMA3_SYNC_A,
                                              CSL_EDMA3_ADDRMODE_INCR,   // CSL_EDMA3_ADDRMODE_CONST is NOT SUPPORTED by the C6655
                                              CSL_EDMA3_ADDRMODE_INCR);  // CSL_EDMA3_ADDRMODE_CONST is NOT SUPPORTED by the C6655
    iDmaConfig[0].srcAddr    = (Uint32)iSrcAddr;
    iDmaConfig[0].aCntbCnt   = CSL_EDMA3_CNT_MAKE(16,32);
    iDmaConfig[0].dstAddr    = (Uint32)iDstAddr;
    iDmaConfig[0].srcDstBidx = CSL_EDMA3_BIDX_MAKE(0,16);
    iDmaConfig[0].srcDstCidx = CSL_EDMA3_CIDX_MAKE(0,16);
    iDmaConfig[0].cCnt       = 256*1024;

     

    // Table 1: ADC Data Acquisition PaRAM differences across all instances

    #  iSrcAddr     iDstAddr      Open DMA Ch       TCC               ITCCH     TCCH
    0  Fpga_adc[0]  adcBuffer[0]  cEdmaChannelAdc0  cEdmaChannelAdc1  Enabled   Enabled
    1  Fpga_adc[1]  adcBuffer[1]  cEdmaChannelAdc1  cEdmaChannelAdc2  Enabled   Enabled
    2  Fpga_adc[2]  adcBuffer[2]  cEdmaChannelAdc2  cEdmaChannelAdc3  Enabled   Enabled
    3  Fpga_adc[3]  adcBuffer[3]  cEdmaChannelAdc3  cEdmaChannelAdc4  Enabled   Enabled
    4  Fpga_adc[4]  adcBuffer[4]  cEdmaChannelAdc4  cEdmaChannelAdc4  Disabled  Disabled

     

    Q2. Are you sure that the EMR bits and associated error interrupts are happening truly due to a missed "chained" event and not because the event is hitting a NULL param?

     

    No. In what situations would a NULL TR be serviced?

     

    Q3. Do you configure the params as static?

     

    Are you referring to the PaRAM entries? Since double buffering is used, the PaRAM entry is reloaded to point to a slightly different dstAddr at the end of every transfer (which is approximately every 1.048s, 256x1024 samples x 4usec). This is shown in the table above.

     

    Q4. Does the issue not happen if the QDMA transfers were not happening?

     

    I haven’t tried this yet.

     

    Q5. There is a trigger priority in the EDMA CC, which prioritises external events over chained events. But it's not clear why the QDMA channels would not lose events BEFORE the chained EDMA3 CC events.

    [MB] It will depend on what is actually happening on your DMA channels vs QDMA channels. The missed event bit will get set if the CC has not been able to move the TRP from CC to TC and the next chained event happens, which is unlikely given you have normal completion set (I think). It is more to do with what is keeping a particular DMA channel "busy" such that the next event came in while the previous chained event was not serviced.

    [MB] Usually I do not see this happening with chained events; it would typically happen with async events like GPIO or McBSP rx/tx, which just go on at a fixed interval without any notion of internal DMA transfers etc. Does that make sense?

     

    Yes, this does make sense to me, and is what I would expect.

     

    FYI - The QUE prioritisations are set up as follows:

     

    CSL_edma3SetEventQueuePriority(hDma, CSL_EDMA3_QUE_0, CSL_EDMA3_QUE_PRI_0); // 0-7 highest-lowest

    CSL_edma3SetEventQueuePriority(hDma, CSL_EDMA3_QUE_1, CSL_EDMA3_QUE_PRI_7); // 0-7 highest-lowest

    CSL_edma3SetEventQueuePriority(hDma, CSL_EDMA3_QUE_2, CSL_EDMA3_QUE_PRI_7); // 0-7 highest-lowest

    CSL_edma3SetEventQueuePriority(hDma, CSL_EDMA3_QUE_3, CSL_EDMA3_QUE_PRI_7); // 0-7 highest-lowest

     

    QDMA occurs on CSL_EDMA3_QUE_1 at the same time as the Data Acquisition DMA on CSL_EDMA3_QUE_DEFAULT = 0 = CSL_EDMA3_QUE_0.

     

    QDMA SrcAddr is DDR memory/L2 Cache, DstAddr is L2 Cache/DDR memory (data moved back and forth).

     

    Q6. I have tried breaking the chaining, and instead of triggering from a single GPIO external interrupt, triggering from three external interrupts instead. This works initially but stops after a short while. 

    [MB] Clarify why you took this approach? and what does "initially" mean - did it stop after QDMA started, or you have some programming issue with your GPIO, interrupt setup?

     

    I found by experimentation that if I used 3xGPIO external interrupts (instead of just GPIO 0 as described in Table 1 above) and therefore reduced the amount of chaining, then the number of EDMA missed events reduced to zero. However, after several minutes the DMA appeared to stop. I haven’t debugged this yet, so this might not be the cause. But while it worked, it did make a significant difference to the EDMA missed events counted. So, this is not a solution.

     

    static const U32 cEdmaChannelAdc0=CSL_EDMA3CC2_GPINT0;
    static const U32 cEdmaChannelAdc1=CSL_EDMA3CC2_GPINT1; // new
    static const U32 cEdmaChannelAdc2=CSL_EDMA3CC2_INTC1_OUT8;
    static const U32 cEdmaChannelAdc3=CSL_EDMA3CC2_GPINT3; // new
    static const U32 cEdmaChannelAdc4=CSL_EDMA3CC2_INTC1_OUT10;

     

    // Table 2: (Using ext. GPIO 0,1,3) ADC Data Acquisition PaRAM differences across all instances

    #  iSrcAddr     iDstAddr      Open DMA Ch       TCC               ITCCH     TCCH
    0  Fpga_adc[0]  adcBuffer[0]  cEdmaChannelAdc0  cEdmaChannelAdc0  Disabled  Disabled
    1  Fpga_adc[1]  adcBuffer[1]  cEdmaChannelAdc1  cEdmaChannelAdc2  Enabled   Enabled
    2  Fpga_adc[2]  adcBuffer[2]  cEdmaChannelAdc2  cEdmaChannelAdc2  Disabled  Disabled
    3  Fpga_adc[3]  adcBuffer[3]  cEdmaChannelAdc3  cEdmaChannelAdc4  Enabled   Enabled
    4  Fpga_adc[4]  adcBuffer[4]  cEdmaChannelAdc4  cEdmaChannelAdc4  Disabled  Disabled

     

    Note that if I use 2xGPIOs, the number of EDMA missed events reduced by a factor of 4 compared to just using GPIO 0 only, but the missed events did not reduce to zero. So, the use of external GPIOs definitely has an impact.

     

    I have also seen this code run with fewer than 5 DMA data acquisition processes, and I think (to be confirmed) that the number of EDMA missed events is also lower in those situations.

     

    Further, I also see a dependency on the amount of other concurrent access to DDR (could this be contending with the EDMA accesses?).

     

    Any further clues/ideas as to why this might be occurring? Let me know if you require more details please?

     

    Thanks

    Jim

  • Jim,

    Thanks for the explanation! I saw you used:
    static const U32 cEdmaChannelAdc0=CSL_EDMA3CC2_GPINT0;

    static const U32 cEdmaChannelAdc1=CSL_EDMA3CC2_INTC1_OUT7;

    static const U32 cEdmaChannelAdc2=CSL_EDMA3CC2_INTC1_OUT8;

    static const U32 cEdmaChannelAdc3=CSL_EDMA3CC2_INTC1_OUT9;

    static const U32 cEdmaChannelAdc4=CSL_EDMA3CC2_INTC1_OUT10;

    From searching CSL code:
    #define CSL_EDMA3CC2_GPINT0 (0x00000006)
    #define CSL_EDMA3CC2_INTC1_OUT7 (0x00000032)
    #define CSL_EDMA3CC2_INTC1_OUT8 (0x00000033)
    #define CSL_EDMA3CC2_INTC1_OUT9 (0x00000034)
    #define CSL_EDMA3CC2_INTC1_OUT10 (0x00000035)

    There are 4 TCs on the C6655. By default (if you didn't change the channel to queue mapping), these will be submitted to different queues:
    -TC2/TC2/TC3/TC0/TC1.

    And from QUEPRI, TC0 has the highest priority (0) and TC1/2/3 have the lowest (7). You wrote: "but I see the most missed events from INTC1_OUT7, less from OUT8, OUT9, and OUT10". If you raise the QUEPRI of TC2 to 0, will it help to alleviate the missed events on INTC1_OUT7? Also, if you change the destination buffer from DDR to a fast memory (L2 or MSMC), will this help?

    Regards, Eric
  • Hi Jim

    Thank you for taking the time to explain your setup with greater detail. Much appreciated.

    Nothing jumps out immediately. 

    A few debug suggestions:

    1) Please do follow up on what Eric is suggesting: (a) understand the Queue-TC allocation for your ADC channels and QDMA channels; it would be good to experiment with separating the TC allocation (which is governed by the DMA channel to queue mapping) for the channels that are showing missed events, (b) queue priority, and (c) internal memory vs external memory.

    2) NULL PaRAM: I think it is unlikely. If you read the section on NULL vs dummy transfers, a NULL transfer will typically also have SER set, which I do not think is the case here.

    3) Investigate whether there is some rogue TCC that has the same value as your chained channel TCCs and is accidentally triggering the DMA channels set up for chained transfers.

    4) Finally, given that you are getting your GPIO event at a fixed 4 usec interval, I think it would be good to confirm how you are making sure that all 4 chained transfers complete before the next GPIO sync event shows up. If the next GPIO event shows up before the previous set of transfers is done, maybe that can cause such an issue (if there is a system backup somewhere). I am not completely sure how that would happen with a normal completion setup, but it would be good to make sure.

    Maybe it is possible to instrument this by chaining yet another channel that triggers a GPIO output, so you can make sure that this GPIO always toggles before your ADC GPIO event shows up?

    Hope this helps. 

    Regards

    Mukul. 

  • Hi Eric, thanks for the suggestions. I will try the suggestions you mentioned and get back to you. In the meantime, can you explain to me where in the documentation there is the relationship between the queues and the TC's? i.e. why those 5 DMA channels are mapped to TC2/TC2/TC3/TC0/TC1 ?

  • Hi Mukul, thanks for the suggestions. I will respond later to Eric on 1) but had a question on the other points. For 3), how would I determine if there is a "rogue" TCC? And for 4), the GPIO events are definitely every 4usec (measured on a scope). GPIOs 0, 1, 2, 3 and 6 were all checked (although I am currently only using GPIO 0, except in the "experiment" I tried with 0, 1 and 3 described earlier). The timing is as follows: GPIO[0|3] both within 15ns, then 50ns later GPIO[1|6] both within 15ns, then 50ns later GPIO 2. This pattern repeats every 4 microseconds.

    Can you explain a little further on your last suggestion "chaining yet another channel that triggers a GPIO output", I'm not clear on what you are meaning?

    Jim

  • Jim,

    See EDMA user guide: www.ti.com/.../sprugs5b.pdf

    4.2.1.5 DMA Channel Queue n Number Registers (DMAQNUMn)

    The DMA channel queue number register (DMAQNUMn) allows programmability of
    the DMA channels in the EDMA3CC to submit its associated synchronization event to
    any event queue in the EDMA3CC. At reset, all channels point to event queue 0.
    The DMAQNUMn is shown in Figure 4-5 and described in Table 4-6.
    Table 4-7 shows the channels and their corresponding bits in DMAQNUMn.
    Note—Because the event queues in EDMA3CC have a fixed association to the
    transfer controllers, that is, Q0 TRs are submitted to TC0, Q1 TRs are
    submitted to TC1, etc., by programming DMAQNUMn for a particular DMA
    channel n also dictates which transfer controller is utilized for the data
    movement (or which EDMA3TC receives the TR request).

    You can use JTAG to check those registers; they are at offset (0x0240 + 4*n), n = 0 ... 7, from the EDMA CC base. By default (after power-on reset) they are 0, which means all channels use Q0 and are submitted to TC0; that is NOT balanced.

    I would suggest you program all eight registers to 0x32103210. The first register controls DMA channel 0 to 7, the next register controls DMA channel 8 ..15, etc....

    When you use:
    #define CSL_EDMA3CC2_GPINT0 (0x00000006), this is DMA channel 6, with 0x32103210 you will submit to Q2 that is TC2.
    ...
    #define CSL_EDMA3CC2_INTC1_OUT8 (0x00000033), this is DMA channel 51, with 0x32103210 you will submit to Q3 that is TC3.
    ...

    Please try to make 5 EDMA channels balanced in terms of TC used.
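
The channel to queue lookup described above can be sketched as follows. It assumes eight DMAQNUMn registers at offset 0x0240 + 4*n, eight channels per register, and one 4-bit slot per channel of which the low 3 bits hold the queue number; please check the field layout against the DMAQNUMn table in the user guide. The function names are hypothetical, and the register values are passed in as a plain array rather than read from hardware.

```c
#include <stdint.h>

#define DMAQNUM_BASE_OFFSET  0x0240u  /* offset of DMAQNUM0 from the EDMA3CC base */

/* Offset of the DMAQNUMn register that holds the field for DMA channel ch. */
uint32_t dmaqnum_offset(unsigned ch)
{
    return DMAQNUM_BASE_OFFSET + 4u * (ch / 8u);
}

/* Queue number selected for DMA channel ch (0..63), given snapshots of
 * DMAQNUM0..DMAQNUM7. Each register covers 8 channels, 4 bits per channel. */
unsigned dmaqnum_queue(const uint32_t regs[8], unsigned ch)
{
    uint32_t reg   = regs[ch / 8u];
    unsigned shift = (ch % 8u) * 4u;

    return (unsigned)((reg >> shift) & 0x7u);
}
```

With all eight registers set to 0x32103210, channels 6, 50, 51, 52 and 53 resolve to queues 2, 2, 3, 0 and 1 respectively, which matches the TC2/TC2/TC3/TC0/TC1 sequence mentioned earlier in the thread.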

    Regards, Eric
  • Hi Jim

    Responses are a bit brief as I am traveling this week.

    1) Rogue TCC: essentially any channel can take any TCC; it is just the OPT field programming, and sometimes this depends on the resource manager. To chain channel m to channel n, channel number n needs to be programmed into the TCC field of channel m's channel options parameter (OPT).

    So it would be good to audit that no other DMA or QDMA channel is using the same TCC value. If you know the EDMA channels used and their OPT programming for TCC, hopefully this is a quick audit?
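
One way to run this audit is to dump the OPT word of every PaRAM set in use and count how many of them chain to each target channel. The sketch below assumes the usual EDMA3 OPT layout (TCC in bits 17:12, TCCHEN in bit 22, ITCCHEN in bit 23); please verify these positions against the user guide. The helper names are hypothetical, and the OPT words are supplied as a plain array rather than read from PaRAM.

```c
#include <stdint.h>
#include <stddef.h>

/* Assumed OPT field positions (verify against the EDMA3 user guide). */
#define OPT_TCC_SHIFT  12u
#define OPT_TCC_MASK   0x3Fu
#define OPT_TCCHEN     (UINT32_C(1) << 22)  /* final transfer completion chaining */
#define OPT_ITCCHEN    (UINT32_C(1) << 23)  /* intermediate transfer completion chaining */

/* Transfer completion code programmed in an OPT word. */
unsigned opt_tcc(uint32_t opt)
{
    return (unsigned)((opt >> OPT_TCC_SHIFT) & OPT_TCC_MASK);
}

/* Non-zero if either form of chaining is enabled in an OPT word. */
int opt_chaining_enabled(uint32_t opt)
{
    return (opt & (OPT_TCCHEN | OPT_ITCCHEN)) != 0;
}

/* Count the PaRAM sets whose OPT word would chain-trigger target_ch. */
size_t count_chain_sources(const uint32_t *opts, size_t n, unsigned target_ch)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (opt_chaining_enabled(opts[i]) && opt_tcc(opts[i]) == target_ch) {
            count++;
        }
    }
    return count;
}
```

For the chain in this thread, each of channels 50 to 53 would be expected to have exactly one chain source; a count above one for any of them would point to a rogue TCC worth investigating.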

    Jim Paul said:
    Can you explain a little further on your last suggestion "chaining yet another channel that triggers a GPIO output", I'm not clear on what you are meaning?

    Sorry for not being clear the first time. It would be good to "instrument" your code so that you know your final chained channel has "completed" before the next 4 usec GPIO event shows up. One way to do that would be a GPIO toggle that you can view on a scope in relation to your incoming GPIO sync event from the ADC. You could either toggle a GPIO in your ping/pong completion ISR, or chain to another DMA channel that writes to a GPIO register to toggle the GPIO. The source could be any memory location, the data would be a 0 to 1 toggle, and the DST would be a GPIO register that you can observe on your board.


  • Hi Eric

    I noticed that the event queues used for data acquisition from the ADCs (DMA channels 6, 50-53) were all mapped to Q0, so I separated these out as shown below. Note that Q1 is used for the two QDMA channels that I have (these are also at a lower queue priority). However, I did NOT notice a decrease in the missed events, which are still mostly on DMA channel 50, with fewer on 51-53 (same as before). They are still occurring (at a rate of about 1 missed event per 8 seconds).

    ADC DMA Ch:53 Mapped to Event Q:2 (was Q:0)

    DMA Ch:52 Mapped to Event Q:0

    DMA Ch:51 Mapped to Event Q:3 (was Q:0)

    DMA Ch:50 Mapped to Event Q:2 (was Q:0)

    DMA Ch:6 Mapped to Event Q:0

    QDMA Ch:1 Mapped to Event Q:1

    QDMA Ch:0 Mapped to Event Q:1

    The Event Q to TC mapping is as follows (this has not changed)

    Event Q:3 Mapped to TC:3

    Event Q:2 Mapped to TC:2

    Event Q:1 Mapped to TC:1

    Event Q:0 Mapped to TC:0

    I also tried combining this approach with the technique I mentioned earlier in this thread (Table 2) of breaking up the chaining by using GPIO 0, 1 and 3 (not just GPIO 0). This did not make a difference; I got the same results as before.

    Jim

  • Hi Mukul, Thanks for the suggestions. I will try these ideas, and post back to the forum.
    Jim
  • Jim,

    Thanks for the test with separation across the 4 queues! When you get a chance, please let me know if changing the QUEPRI or moving the destination into MSMC or L2 helps.

    Regards, Eric
  • Jim,

    Any chance to test the suggestions?

    Regards, Eric
  • Hi Eric, no, not yet. Please keep this posting open in the meantime, and I will get back to these suggestions. Thanks! Jim

  • Jim,

    Please let me know when you plan to revisit it. If it is not very soon, I would like to close this thread; please re-open it when you have a chance.

    Regards, Eric
  • Hi Eric, likely in at least a couple of weeks (mid-end May). Is it straightforward to re-open threads?

    Thanks, Jim

  • Yes. Please re-open when you have new info.

    Regards, Eric