TMS320C6678: C6678 missed completion interrupts for concurrent EDMA transfers

Part Number: TMS320C6678
Other Parts Discussed in Thread: SYSBIOS

Hi,

I am using all cores of the C6678 for parallel data processing. The concurrent tasks issue EDMA requests and busy-wait for the respective completion interrupts. Is it possible for different DSP cores to issue transfer requests to the channel controller concurrently? Or is there a general limitation with several cores accessing the same EDMA CC?

I use the CSL for the EDMA calls. For this question let's assume that I use only the channel controller CSL_EDMA3_TPCC0.

The program works fine if I use only one core. If several cores are used, interrupts are missed and the system hangs in the busy-wait loop.

  • There is at most one EDMA transfer ongoing for each core.
  • If I ensure mutually exclusive access to the EDMA channel controller (covering everything from the EDMA setup until the completion interrupt), I can safely use up to all eight cores.

I suspect one of the following problems:

  • The EDMA CC queues are not initialized correctly.
  • There is a problem with the interrupt registration or the assignment of the shadow regions.

Do you see a problem with my EDMA initialization? Is there anything else that has to be done differently?

Thank you!

P.S.: Here are the most important parts of the code I am using:

bool edmaTransfer() { // is called concurrently
    CSL_Edma3Handle hModule;
    CSL_Edma3ChannelHandle hChannel;
    CSL_Edma3CmdIntr regionIntr;
    CSL_Edma3Context context;
    CSL_Edma3Obj edmaObj;
    CSL_Edma3CmdDrae regionAccess;
    CSL_Edma3ChannelAttr chAttr;
    CSL_Edma3ChannelObj chObj;
    CSL_Edma3ParamHandle hParamStart;
    CSL_Status statusCode;

    CSL_InstNum ccNum = 0;

    uint8_t channelNum = ccNum;

    /* Module Initialization */
    if (! initEdmaHandle(channelNum, ccNum,
            context, edmaObj, statusCode, regionAccess, chAttr, chObj, hModule, regionIntr, hChannel, hParamStart)) {
        return false;
    }

    // Set EDMA loop index space
    CSL_Edma3ParamSetup paramSetup;
    if (! setEdmaParams(paramSetup, channelNum)) {
        return false;
    }

    /* Write the transfer parameters to the PaRAM set mapped to this channel */
    if (CSL_edma3ParamSetup(hParamStart,&paramSetup) != CSL_SOK) {
        return false;
    }

    /* Enable channel */
    if (CSL_edma3HwChannelControl(hChannel,CSL_EDMA3_CMD_CHANNEL_ENABLE, NULL) != CSL_SOK) {
        return false;
    }

    if ((statusCode = CSL_edma3HwChannelControl(hChannel,CSL_EDMA3_CMD_CHANNEL_SET,NULL)) != CSL_SOK) {
        return false;
    }

    do {
        /* Poll on interrupt bit 0 */
        CSL_edma3GetHwStatus(hModule,CSL_EDMA3_QUERY_INTRPEND, &regionIntr);
    } while (!(regionIntr.intr & (0x1 << channelNum)));

    /* Clear interrupt bit 0 */
    if (CSL_edma3HwControl(hModule,CSL_EDMA3_CMD_INTRPEND_CLEAR, &regionIntr) != CSL_SOK) {
        return false;
    }

    /* Close channel */
    if (CSL_edma3ChannelClose(hChannel) != CSL_SOK) {
        return false;
    }

    /* Close EDMA module */
    if (CSL_edma3Close(hModule) != CSL_SOK) {
        return false;
    }
    return true;
}


bool initEdmaHandle(uint8_t channelNum, CSL_InstNum instNum,
        CSL_Edma3Context& context, CSL_Edma3Obj& edmaObj, CSL_Status& statusCode, CSL_Edma3CmdDrae& regionAccess,
        CSL_Edma3ChannelAttr& chAttr, CSL_Edma3ChannelObj& chObj, CSL_Edma3Handle& hModule,
        CSL_Edma3CmdIntr& regionIntr, CSL_Edma3ChannelHandle& hChannel, CSL_Edma3ParamHandle& hParamStart) {
    int32_t regionNum = DNUM; // core number
    /* Module Initialization */
    if (CSL_edma3Init(&context) != CSL_SOK) {
        return false;
    }
    /* Module level open */
    hModule = CSL_edma3Open(&edmaObj, instNum, NULL, &statusCode);
    if ((hModule == NULL) || (statusCode != CSL_SOK)) {
        return false;
    }
    // sprugs5b 2.9.1:
    // If the channel is used in the context of a shadow region and you intend for the shadow region interrupt to be asserted,
    // then ensure that the bit corresponding to the TCC code is enabled in IER/IERH
    // and in the corresponding shadow region's DMA region access registers (DRAE/DRAEH).
    regionAccess.region = regionNum;
    regionAccess.drae = (0x1 << channelNum);
    regionAccess.draeh = 0x0;
    if (CSL_edma3HwControl(hModule, CSL_EDMA3_CMD_DMAREGION_ENABLE, &regionAccess) != CSL_SOK) {
        return false;
    }

    /* Interrupt enable for this core's shadow region */
    regionIntr.region = regionNum;
    regionIntr.intr = (0x1 << channelNum);
    regionIntr.intrh = 0x0000;
    if (CSL_edma3HwControl(hModule, CSL_EDMA3_CMD_INTR_ENABLE, &regionIntr) != CSL_SOK) {
        return false;
    }
    /* Open the channel in context of the specified region number. */
    chAttr.regionNum = regionNum;
    chAttr.chaNum = channelNum;
    hChannel = CSL_edma3ChannelOpen(&chObj, instNum, &chAttr, &statusCode);
    if ((hChannel == NULL) || (statusCode != CSL_SOK)) {
        return false;
    }
    int PaRAM_Set = DNUM; // core number
    /* Map the DMA channel to its PaRAM set. Each core uses the set whose
     * number equals its core number. */
    if (CSL_edma3HwChannelSetupParam(hChannel, PaRAM_Set) != CSL_SOK) {
        return false;
    }
    /* Obtain a handle to that PaRAM set */
    hParamStart = CSL_edma3GetParamHandle(hChannel, PaRAM_Set, &statusCode);
    if (hParamStart == NULL) {
        return false;
    }
    return true;
}


// Calculate index space for EDMA transfer - values like aCnt, bCnt are calculated and stored elsewhere.
bool setEdmaParams(CSL_Edma3ParamSetup& paramSetup, uint8_t channelNum) const {
    // Set EDMA parameters
    paramSetup.option = CSL_EDMA3_OPT_MAKE( CSL_EDMA3_ITCCH_DIS, CSL_EDMA3_TCCH_DIS,\
                CSL_EDMA3_ITCINT_DIS, CSL_EDMA3_TCINT_EN, channelNum, CSL_EDMA3_TCC_NORMAL,\
                CSL_EDMA3_FIFOWIDTH_NONE, CSL_EDMA3_STATIC_DIS, CSL_EDMA3_SYNC_AB,\
                CSL_EDMA3_ADDRMODE_INCR, CSL_EDMA3_ADDRMODE_INCR );
    paramSetup.aCntbCnt   = CSL_EDMA3_CNT_MAKE(aCnt, bCnt);
    paramSetup.srcDstBidx = CSL_EDMA3_BIDX_MAKE(aCnt, aCnt);
    paramSetup.srcDstCidx = CSL_EDMA3_CIDX_MAKE(0, 0);
    paramSetup.cCnt = 1;
    paramSetup.linkBcntrld= CSL_EDMA3_LINKBCNTRLD_MAKE(CSL_EDMA3_LINK_NULL, 0); // no linked transfer
    paramSetup.srcAddr = reinterpret_cast<uint32_t>(makeAddressGlobal(src));
    paramSetup.dstAddr = reinterpret_cast<uint32_t>(makeAddressGlobal(dest));
    return true;
}

  • I want to add that the "KeyStone Architecture Enhanced Direct Memory Access (EDMA3) Controller User's Guide" (sprugs5b), section "1.1 Overview", mentions that "The EDMA3CC serves to prioritize incoming software requests or events from peripherals". Thus, I assume that it should be possible to have parallel requests to the channel controllers from the different DSP cores. Consequently, it should also be possible to wait for the completion of each of these parallel requests on each DSP core independently.

    Could you please point me to where the user guide documents how this can be achieved?

    Thanks in advance,
    Roman
  • Hi,

    Can you elaborate on which TI software you used to write the above code? Is it MCSDK or Processor SDK RTOS? What release? Which example (CSL?) is it based on?

    >>>>Is it possible for different DSP cores to issue transfer requests to the channel controller concurrently? Or is there a general limitation with several cores accessing the same EDMA CC? >>>> No, there is no such limitation. The EDMA supports multiple concurrent transfers, from the same or different cores. The CC has a global region and 8 shadow regions; you can use one shadow region per core, as you did.

    Also, the C6678 has 3 EDMA CCs, each CC has multiple TCs. See the TMS320C6678 datasheet,
    7.9.2 EDMA3 Channel Controller Configuration
    7.9.3 EDMA3 Transfer Controller Configuration

    Also, your top-level call edmaTransfer() always uses channelNum = 0; then in initEdmaHandle, regionAccess.drae = (0x1 << channelNum); =======> this will always use the first DMA channel for each shadow region. Is this the intent?

    There are DMAQNUMn registers to submit the transfer to a different TC (transfer controller). It looks to me like you always use CC 0 and channel 0 (which is submitted to TC0 by default). You may consider spreading the concurrent EDMA transfers across multiple CCs and TCs to balance the system.

    When multiple transfers are submitted to the same TC, they are queued. Even if you always use CC0 and TC0 for concurrent transfers (as opposed to spreading them across multiple CCs and TCs), they still work.
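
    As a rough illustration of the queue assignment mentioned above, the following sketch computes where a channel's queue number would sit in the DMAQNUMn registers. It assumes eight channels per register with one 4-bit slot (3 bits used) per channel; the names QueueField, dmaqnumField and withQueue are hypothetical helpers, not CSL functions, and the layout should be checked against sprugs5b.

```cpp
#include <cassert>
#include <cstdint>

// Location of one channel's queue-number field inside the DMAQNUMn registers.
struct QueueField {
    uint32_t regIndex; // n in DMAQNUMn
    uint32_t shift;    // bit position of the field within that register
};

// Assumption: eight channels per register, one 4-bit slot per channel.
inline QueueField dmaqnumField(uint32_t channel) {
    assert(channel < 64); // an EDMA3CC has at most 64 DMA channels
    return QueueField{channel / 8u, (channel % 8u) * 4u};
}

// Return a copy of the register value with the channel's queue number replaced.
inline uint32_t withQueue(uint32_t regValue, uint32_t channel, uint32_t queue) {
    assert(queue < 8); // the field is assumed to be 3 bits wide
    const QueueField f = dmaqnumField(channel);
    const uint32_t mask = 0x7u << f.shift;
    return (regValue & ~mask) | (queue << f.shift);
}
```

    The value returned by withQueue would then be written back to the register selected by regIndex.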

    Now back to the interrupt loss: I didn't see any code for an ISR. The only thing I saw is
    /* Poll on interrupt bit 0 */ and /* Clear interrupt bit 0 */.

    This is a polling method. If you use an interrupt, I assume you have some ISR code, and the interrupt bit is cleared inside the ISR. Is it possible that the ISR code is too long, so that you lose some interrupts when several occur? What if you just write a simple ISR (like incrementing a counter)?

    Regards, Eric
  • Eric,

    thank you for your response.

    I use the Processor SDK RTOS with pdk 2.0.10, which the CSL functions I use are part of.

    For this example I just set 'CSL_InstNum ccNum = 0;'. I have a function that assigns one of the three channel controllers (0, 1 and 2), using semaphores to block a fourth core from doing EDMA simultaneously, which works fine so far.
    The channelNum is then set to the channel controller number, which is okay as the maximum number of parallel calls to EDMA is three in my case.
    Yet, the reason why each channel controller is used by only one core/thread at a time is that otherwise I encounter lost interrupts, as noted before. Perhaps this observation points to the problem I am having.
    I have tried setting channelNum to the core number, as this is the maximum parallelism in my case, so far without success.
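
    The per-core assignment described here can be sketched as follows. This is only an illustration of the bookkeeping, assuming core n owns shadow region n, DMA channel n and PaRAM set n; CorePlan, planForCore and plansAreDisjoint are hypothetical names and no hardware is touched.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-core plan: core n owns region n, channel n and PaRAM set n.
struct CorePlan {
    uint32_t region;
    uint32_t channel;
    uint32_t paramSet;
    uint32_t draeMask; // bit to enable in DRAE and IER of that region
};

inline CorePlan planForCore(uint32_t core) {
    assert(core < 8); // the C6678 has eight cores
    return CorePlan{core, core, core, 1u << core};
}

// True if no two cores share a channel bit and all eight bits are covered.
inline bool plansAreDisjoint() {
    uint32_t seen = 0;
    for (uint32_t core = 0; core < 8; ++core) {
        const uint32_t m = planForCore(core).draeMask;
        if (seen & m) return false;
        seen |= m;
    }
    return seen == 0xFFu;
}
```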

    At present, there is no special ISR for this interrupt. The code is just polling for the interrupt bit to be set.
    Probably I should consider setting up an ISR for better control.

    I hadn't yet thought of the interrupt handling as a possible problem. Is the same interrupt used for all channels for EDMA completion?
    Looking into it, interrupts are disabled at some points in the code. If I understand the SYSBIOS docs below correctly, then if two interrupts occur while interrupts are disabled, the interrupt function (ISR?) is called only once. That may lead to a lost interrupt in my case. Could that be?

    bios_6_52_00_12 Hwi_disable()
    Servicing of interrupts that occur while interrupts are disabled is postponed until interrupts are reenabled. However, if the same type of interrupt occurs several times while interrupts are disabled, the interrupt's function is executed only once when interrupts are reenabled.
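
    The behaviour described in that quote can be pictured with a toy model of a single-bit pending flag. This is a plain C++ simulation, not SYS/BIOS code; PendingFlag and coalesce are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of a single-bit pending flag: it records that an event
// happened, not how many times it happened.
class PendingFlag {
public:
    void raise() { ++raised_; pending_ = true; }
    // Models re-enabling interrupts: the handler runs once if anything is pending.
    void service() {
        if (pending_) { ++handled_; pending_ = false; }
    }
    uint32_t raised() const { return raised_; }
    uint32_t handled() const { return handled_; }
private:
    bool pending_ = false;
    uint32_t raised_ = 0;
    uint32_t handled_ = 0;
};

// Raise `events` interrupts while "disabled", then service once.
inline PendingFlag coalesce(uint32_t events) {
    PendingFlag f;
    for (uint32_t i = 0; i < events; ++i) f.raise();
    f.service();
    assert(f.handled() <= f.raised()); // the handler never runs more often than events occur
    return f;
}
```

    In this model two events raised back to back result in a single handler call, which is the situation the question above is concerned with.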

  • Hi,

    >>>>The code is just polling for the interrupt bit to be set.>>>> There are 8 shadow regions; their offsets are 0x2000, 0x2200, 0x2400 ... 0x2E00. The IPR is at offset 0x68 within each region. When you use channel 0, IPR bit 0 should be set after the transfer completes. Without using code to clear it, can you use JTAG to confirm that the IPR bits are set for all 8 regions?

    The while loop that polls the IPR bit is blocking. What do you mean by losing the interrupt in the concurrent test?
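
    The offsets above can be turned into a small helper for inspecting the registers in a memory browser. This sketch only does the address arithmetic relative to the channel controller base; shadowIprOffset and completionPending are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kShadowBase   = 0x2000; // offset of shadow region 0
constexpr uint32_t kShadowStride = 0x200;  // distance between shadow regions
constexpr uint32_t kIprOffset    = 0x68;   // IPR offset within a region

// Offset of the IPR of a shadow region, relative to the CC base address.
inline uint32_t shadowIprOffset(uint32_t region) {
    assert(region < 8);
    return kShadowBase + region * kShadowStride + kIprOffset;
}

// True if the completion bit for the given TCC is set in an IPR value.
inline bool completionPending(uint32_t iprValue, uint32_t tcc) {
    assert(tcc < 32); // IPR covers TCC 0..31
    return ((iprValue >> tcc) & 1u) != 0;
}
```

    For region 2 the helper yields 0x2468.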

    Regards, Eric
  • Hi Eric,

    >>>> Without using code to clear it, can you use JTAG to confirm that the IPR bits are set for all 8 regions?
    Removing the code to clear the interrupt bits, I can confirm that after running some time, the following holds true for the SHADOW registers at 0x2000 + i*0x200:
    SHADOW[i].TPCC_IPR == 1 << i;

    EDMACC0_tpcc_ipr
    00000000000000000000000011111111
    (The code uses only EDMACC0 here)

    As I see it, all eight cores are calling EDMA and eventually get their interrupt bit set.
    There are a lot of calls to EDMA if the program is running - maybe this is a sporadic failure which happens only for certain race conditions I am not yet aware of.

    >>>> The while loop that polls the IPR bit is blocking. What do you mean by losing the interrupt in the concurrent test?
    To be precise, it is the interrupt bits that I am not seeing.

    As a next step I will register an ISR and see whether it works with real interrupts instead of just looking at the interrupt bits.
  • Hi,

    Thanks for the update; I will wait for your results with the ISR! Just a note: I will be out of office for the next 7 days, so I will check this thread in the middle of next week.

    Regards, Eric
  • Hi Eric,

    We debugged that yesterday with a small test program that artificially forces all cores to submit EDMA requests concurrently. For some reason that we don't understand, it was always core 2 (core counting starts from 0) that failed to see the transfer-finished bit while busy-waiting in the loop that tests the interrupt register (the loop in lines 43-46 of the code above). For all other cores, the EDMA transfer terminates successfully.


    Here's the essence of this test-code that skips core number 2 (and this works):

    void testParallelEDMA() {
    	coreBarrier();
    
    	uint64_t s = getWTime();
    
    	if( DNUM != 2) { // when this if-statement is removed and the transfer is executed by all cores unconditionally, then we don't get past the second core barrier below
    		qdma::EDMA_Transfer(par_conf[i].dest, par_conf[i].src, copySize);
    	}
    
    	uint64_t transferEnd = getWTime();
    
    	coreBarrier();
    
    	uint64_t e = getWTime();
    	print("%llu %llu %llu", s, transferEnd, e);
    }

    Any idea what might go wrong here?

    Thanks,

    Roman

  • Roman,

    Maybe it is something related to code optimization? Do you use -O0 for compilation? For core 2 (0-based), suppose you know that the IPR bit is already set (you can verify it via CCS JTAG), but this code doesn't work as expected:

    do {
        /* Poll on interrupt bit 0 */
        CSL_edma3GetHwStatus(hModule,CSL_EDMA3_QUERY_INTRPEND, &regionIntr);
    } while (!(regionIntr.intr & (0x1 << channelNum)));

    Are you able to step into disassembly to see why the code didn't work?

    Regards, Eric
  • Hi Eric,

    When we stepped through the disassembly, we didn't see anything unexpected. Furthermore, we did verify the IPR register: if I remember correctly, it is in fact NOT set, and that's why the loop does not terminate. We use -O0 for our debug build, which we used in that case, so I don't think this is related to code optimization. The fact that it works on all other cores also supports that claim, I think.

    Do you have any further suggestions what we could try?

    Roman

  • Hi,

    But you mentioned earlier that IPR is set:

    >>>> Without using code to clear it, can you use JTAG to confirm that the IPR bits are set for all 8 regions?
    Removing the code to clear the interrupt bits, I can confirm that after running some time, the following holds true for the SHADOW registers at 0x2000 + i*0x200:
    SHADOW[i].TPCC_IPR == 1 << i;

    EDMACC0_tpcc_ipr
    00000000000000000000000011111111
    (The code uses only EDMACC0 here)

    Can you confirm which is correct? Suppose the IPR bit was not set: do you see any missed event or error? See the EDMA user guide:

    4.2.2 Error Registers
    4.3 EDMA3 Transfer Controller (EDMA3TC) Registers, 0120h ERRSTAT Error Register

    Regards, Eric
  • Hi Eric,

    thanks for the hint regarding the error and errstat registers, we'll investigate that again.

    As to whether the IPR is set or not: I think the bit for core 2 gets set SOMETIMES (when no other DMA transfers happen at the same time), but not always (especially not when other EDMA transfers are started concurrently).

    Roman

  • Roman,

    Do you have any findings?

    Regards, Eric
  • Hi Eric,

    thanks for asking. Unfortunately I'm currently not working on that topic, and Wolfgang is out of office until end of April. He will come back to you then.

    Regards,
    Roman

  • Roman,

    Thanks for the update! I'll wait for more info from Wolfgang when he is back in the office ...

    Regards, Eric
  • Roman,

    Let me know if we can expect an update from your side soon. Thanks!

    Regards, Eric
  • Hi Eric,

    Thanks for your patience. I am back in the office and working on the issue now. I have the following results for today:

    I introduced a timeout to the while loop at line 46 in the code above and set a breakpoint to see the registers for core #2. It halts on the first missed transfer interrupt bit.
    I am looking at the struct CSL_TpccRegs of CSL_Edma3Obj of pdk_c667x_2_10_0\packages\ti\csl\csl_edma3.h, which is not exactly the same as the sprugs5b 4.3 EDMA3TC registers:
    - SHADOW[2].TPCC_IPR (0x2468) == 0
      as expected.
    - TPCC_EMR (0x300) == 4
      Ok. According to the explanation in sprugs5b 4.2.2.1, EDMA gets a missed event for core#2. This is worth some more investigation.

    Apparently, the problem arises only under certain circumstances on core #2.
    - Probably a non-empty queue is a prerequisite.
    - I read in sprugs5b 2.3.3 that an accidental Null PaRAM Set can be a reason for an issue like this. I will check this.

    Thanks for the pointer to the error registers. I had not noticed EMR being set for core #2 until now.
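
    The timeout added to the polling loop can be sketched roughly as below. This is not the code used in the project: the register read is passed in as a callable so the logic can be exercised without hardware, and PollResult and pollCompletion are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

enum class PollResult { Completed, TimedOut };

// Poll `readIpr` until bit `tcc` is set or `maxPolls` reads have been made.
inline PollResult pollCompletion(const std::function<uint32_t()>& readIpr,
                                 uint32_t tcc, uint32_t maxPolls) {
    assert(tcc < 32);
    for (uint32_t i = 0; i < maxPolls; ++i) {
        if (readIpr() & (1u << tcc)) return PollResult::Completed;
    }
    return PollResult::TimedOut; // caller can now inspect EMR and the other error registers
}
```

    On the target, readIpr would wrap the CSL_edma3GetHwStatus query from the original loop, and the TimedOut branch is a convenient place for a breakpoint.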

    I'll report again after the weekend. Hopefully I have some more results then.

    Regards,
    Wolfgang


  • Hi,

    I did some more investigation on the event missed for core #2:

    - I added some code to check for the TPCC_EMR. The EMR seems to be set to 0x4 whenever any core other than #0 does an EDMA transfer. I have no explanation for that yet. Does my code initialize the EDMA for, say, core #5 in a way that EMR is set to 0x4?

    - I can confirm that on all cores other than #0, all EDMA transfers are issued by a single SYSBIOS task. In particular, there are never two concurrent calls using the same DMA channel.
    - No Null transfers are issued. According to the docs, these could cause EMR to be set. In fact, I can confirm that core #2 had no EDMA activity when EMR was set to 0x4 for the first time.
    - EMR is only ever set to 0x4. According to sprugs5b 4.2.2.1, that corresponds to DMA channel #2. I am puzzled by that, as I think that most of the transfers showing this behavior use a different channel.
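
    The reading of EMR used in the last point (bit n corresponds to DMA channel n) can be written down as a small decoder. The helper names missedChannels and singleMissedChannel are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// List the DMA channels flagged in a 32-bit EMR value (bit n = channel n).
inline std::vector<uint32_t> missedChannels(uint32_t emr) {
    std::vector<uint32_t> channels;
    for (uint32_t ch = 0; ch < 32; ++ch) {
        if (emr & (1u << ch)) channels.push_back(ch);
    }
    return channels;
}

// Convenience wrapper for the case where exactly one bit is expected.
inline uint32_t singleMissedChannel(uint32_t emr) {
    const std::vector<uint32_t> channels = missedChannels(emr);
    assert(channels.size() == 1);
    return channels.front();
}
```

    An EMR value of 0x4 decodes to channel 2 only.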

    Regards,
    Wolfgang

  • Following up on my earlier post from today, I now think that the EMR is only set by core #2's EDMA transfers.

    The CSL_TpccRegs is a struct shared by all cores, isn't it? If core#2 produces a missed event, this can be seen by other cores as well.

    Furthermore, I am using debug output which is relayed over core #0, so debug lines may not be printed in sequential order. That is why I suspect I misinterpreted points 1 and 4 in the previous post, which I marked accordingly.

    Nevertheless, only core #2 produces missed events (I suspect that only the first transfer of core #2 succeeds). I still have no clue why, as it is treated just like the other cores #1, #3, #4, #5, #6 and #7.

  • Hi,

    " In fact, I can confirm that core #2 has had no EDMA activity when EMR being set to 0x4 at the first time. "======>If you see this, you can clear the EMR by writing to EMCR, then start your EDMA transfer.

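    The suggestion to clear EMR through EMCR before starting a transfer can be pictured with a toy register model, assuming the usual write-1-to-clear behaviour of EMCR. EventMissedModel and clearStaleMiss are hypothetical names; on the target the write would go to the real EMCR register.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of the EMR/EMCR pair: writing 1 bits to EMCR clears the same bits in EMR.
struct EventMissedModel {
    uint32_t emr = 0;
    void latchMiss(uint32_t channel) { assert(channel < 32); emr |= 1u << channel; }
    void writeEmcr(uint32_t value) { emr &= ~value; }
};

// Clear a stale missed-event bit for `channel` before starting a new transfer.
// Returns true if a stale bit was found and cleared.
inline bool clearStaleMiss(EventMissedModel& cc, uint32_t channel) {
    assert(channel < 32);
    const uint32_t bit = 1u << channel;
    const bool wasSet = (cc.emr & bit) != 0;
    if (wasSet) cc.writeEmcr(bit);
    return wasSet;
}
```
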
    Is there any processing task on core #2 that differs from cores 1, 3...7? Or are all the cores doing the same work? Is it possible that the core 2 transfer is slow for some reason and the next TR comes from your SYSBIOS task (to all 7 cores) before the core 2 transfer finishes?

    Do you know which PaRAM set you submitted for the core 2 transfer when this happened? Is it possible that the PaRAM set is corrupted? Usually the PaRAM set is cleared after submission. But you can set bit 3 (STATIC) of the OPT register so that the PaRAM set is kept after submission. Then you can check whether it is corrupted.
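
    Setting the STATIC bit for such a debugging run amounts to a one-bit change in the OPT word. The sketch below works on a plain 32-bit value and assumes STATIC is bit 3 as stated above; withStatic and isStatic are hypothetical helpers, and in the posted code the same effect would come from passing CSL_EDMA3_STATIC_EN instead of CSL_EDMA3_STATIC_DIS to CSL_EDMA3_OPT_MAKE (assuming that constant is available in the CSL version in use).

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kOptStaticBit = 1u << 3; // STATIC field of the PaRAM OPT word

inline bool isStatic(uint32_t opt) { return (opt & kOptStaticBit) != 0; }

// Return a copy of `opt` with the STATIC bit set or cleared.
inline uint32_t withStatic(uint32_t opt, bool enable) {
    const uint32_t result = enable ? (opt | kOptStaticBit) : (opt & ~kOptStaticBit);
    assert(isStatic(result) == enable);
    return result;
}
```

    Since a static PaRAM set is not updated after submission, this is only meant for inspecting the set, not as a permanent setting.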

    Regards, Eric
  • Hi Eric,

    I am very sorry for the late reply, but Wolfgang stopped working on the issue and it took me a while to get into it.

    "Is any different processing task on this core #2 other than core 1, 3...7? Or, all the cores doing the same work?" ===> Cores 1-7 are all doing the same work.

    "Do you know what is the PaRAM set you submitted to core 2 transfer when this happened? Is that possible the Param set is corrupted? Usually the Param set is cleared after submission. But you can set the OPT register BIT 3 (STATIC), so the PARAM set is kept after submission. Then you can look at if corrupted?" ===> I checked the PaRAM set and it does not seem corrupted.

    To isolate the problem from our big project, I created a new project which runs just the essential code (SYSBIOS, etc ...). Within this project I did not get any EDMA timeouts on any core. All cores seem to perform equally well.

    While further investigating the problem in our big project I noticed the following:

    - After a power cycle I do not get any EDMA timeouts (if I run one EDMA transfer from all cores)

    - Rerunning the EDMA transfer without power cycling in between yields EDMA timeouts on core 2

    - Loading the small project onto the DSP after running EDMA requests in the big project, without power cycling in between, yields very long transfer times on core 2 (which do not appear if I power cycle and run several EDMA transfers with the small project)

    For both projects I use the same EDMA setup (the one Wolfgang pointed out in his first post).

    Do you have any idea what could cause these EDMA timeouts?

    Regards

    Paul

  • Hi,

    Sorry, I can't think of what the issue could be. It must be at the system level; the central question is why only core 2 has this issue in the full system. You mentioned that cores 1-7 are doing the same work. The issue has been open for a while; can you remind me whether you use the same EDMA CC and TC on all 8 cores, or whether you spread the usage so the system is more balanced?

    Regards, Eric

  • Hi Eric,

    "The issue has been open for a while; can you remind me whether you use the same EDMA CC and TC on all 8 cores, or whether you spread the usage so the system is more balanced?" >>>> We use the same controller for all eight cores for this project.

    I then removed code from our project step by step. I discovered that without a timer we do not have any EDMA timeouts. This timer (CSL_GEM_TINT9L) wakes up a task via interrupts.

    I read in the TMS320C6678 datasheet that it is possible to trigger EDMA transfers with timer interrupts.


    Do you have any idea how this timer causes edma timeouts on core 2?

    Regards
    Paul

  • Paul,

    Glad to know you found something! Yes, it is possible to use a timer interrupt as the input to trigger an EDMA transfer. So can you clarify how the EDMA is triggered on all 8 cores? Any difference on core 2?

    - manual trigger (write ESR/ESRH bit)

    - event trigger (like by timer)

    - chain trigger (from another EDMA channel)
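
    For the manual trigger case, the bit that has to be written can be worked out as below, assuming channels 0-31 live in ESR and channels 32-63 in ESRH. EventSetWrite and manualTrigger are hypothetical names; in the posted code the write itself is done by CSL_edma3HwChannelControl with CSL_EDMA3_CMD_CHANNEL_SET.

```cpp
#include <cassert>
#include <cstdint>

// Which event-set register to write, and with which mask, to trigger a channel manually.
struct EventSetWrite {
    bool high;     // false: ESR, true: ESRH
    uint32_t mask; // single bit to write
};

inline EventSetWrite manualTrigger(uint32_t channel) {
    assert(channel < 64);
    return EventSetWrite{channel >= 32, 1u << (channel % 32u)};
}
```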

    "This timer (CSL_GEM_TINT9L) wakes up a task via interrupts." =========> It looks like you don't use timer 9 to trigger EDMA.

    I looked at the C66x interconnect: all timers are connected to the SCR_6P_B bridge, and all DSP cores are connected to the SCR_3_A bridge. I don't see any particular relationship between a timer and a core.

    Are you able to switch from timer 9 to some other timer (like timer 8, 10-15) to see if core 2 still misses EDMA completions? This may be simpler. Another suggestion: the EDMA has 3 CCs, and each CC has 2-4 TCs, so you may consider partitioning the TCs among the cores, but this will be more work.

    Regards, Eric

  • Eric,

    "So can you clarify how the EDMA is triggered in all 8 cores? Any difference on core 2?" =========> on all 8 cores we trigger the EDMA transfer manually.

    "Are you able to switch this timer 9 to some other timers (like timer 8, 10-15) to see if the core 2 still miss EDMA?" =========> I changed timers to:

    - CSL_GEM_TINT8L which caused edma timeouts on core 0

    - CSL_GEM_TINT9L which caused edma timeouts on core 2

    - CSL_GEM_TINT10L which caused edma timeouts on core 4

    - CSL_GEM_TINT11L which caused edma timeouts on core 6

    It looks like the timers are somehow connected to a processor core?

    If I use e.g. timer CSL_GEM_TINT12L I do not have any EDMA timeouts.
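
    The four observations above fit a simple pattern, core = 2 * (timer - 8) for timers 8 to 11, which the sketch below merely restates. One possible reading, offered only as an assumption, is that the low-half events of these timers arrive on DMA events 0, 2, 4 and 6 of this channel controller; since the DMA channel number equals the core number in this setup and the channel is enabled with CSL_EDMA3_CMD_CHANNEL_ENABLE, a timer event could then hit the channel of exactly that core. This should be checked against the EDMA3CC event table in the TMS320C6678 datasheet. affectedCore is a hypothetical helper.

```cpp
#include <cassert>

// Restates the observations from this thread:
// TINT8L -> core 0, TINT9L -> core 2, TINT10L -> core 4, TINT11L -> core 6,
// TINT12L -> no timeouts (returned as -1).
// Timers outside 8..11 other than 12 were not tested; returning -1 for them is an assumption.
inline int affectedCore(int timer) {
    if (timer < 8 || timer > 11) return -1;
    const int core = 2 * (timer - 8);
    assert(core >= 0 && core < 8);
    return core;
}
```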

    Regards,

    Paul

  • Paul,

    Glad you found some relationship between the timer and the EDMA timeouts. I looked at the interconnect section (Chapter 4) of the C6678 datasheet and I didn't see any specific relationship between a particular timer and an EDMA instance or CorePac. The timers are connected to TeraNet 6P_B, which runs at CPU/6 speed. This TeraNet 6P_B is connected to the EDMA and the CorePacs via bridges.

    Is it possible to use timer 12 (CSL_GEM_TINT12L) as a workaround?

    Regards, Eric