This thread has been locked.

If you have a related question, please click the "Ask a related question" button in the top right corner. The newly created question will be automatically linked to this question.

EDMA LLD completion handler spinning in a loop

Other Parts Discussed in Thread: SYSBIOS

Hi,

there is a problem with the EDMA3 completion handler which can start spinning in a loop for some time. This problem was originally discussed in another post, but I didn't see any progress:
http://e2e.ti.com/support/embedded/bios/f/355/p/227104/800437.aspx#800437

The affected function is edma3ComplHandler() within edma3resmgr.c of EDMA3 LLD version 2.11.05.02.

There are two situations which can cause this problem:

1.
The EDMA3 channel controller has less than 33 interrupt channels. For example, the EDMA3 CC0 only has 16 TCCs on a C6678 device. Within the handler, the variable indexh is initialized with 1 and never set to 0 and the following loop in line 5241 will always count up to EDMA3_RM_COMPL_HANDLER_RETRY_COUNT:

while ((Cnt < EDMA3_RM_COMPL_HANDLER_RETRY_COUNT)
                    && ((indexl != 0u) || (indexh != 0u)))
...

2.
A TCC channel is allocated, but is only used for polling, no ISR is registered. In this case, the interrupt flag in the IPR will not be cleared and the ISR will also count up to EDMA3_RM_COMPL_HANDLER_RETRY_COUNT. Depending on the TCC value, the inner loop in line 5253 (and 5288) will also spin. A solution would be to mask the IPR register with the IER register in line 5245:

pendingIrqs = shadowRegs->IPR & shadowRegs->IER;

I also don't understand the usefulness for the outer loop counting up to EDMA3_RM_COMPL_HANDLER_RETRY_COUNT. All TCC interrupts should already be processed within the first cycle in my opinion.

Thanks,
Ralf

  • I'm wondering if someone took a look at this issue.

    Ralf

  • Ralf,

    The EDMA3 user guide states the following regarding EDMA transfer completion interrupt,

    Transfer completion interrupts that are la
    tched to the interrupt pending registers
    (IPR/IPRH) are cleared by writing a 1 to th
    e corresponding bit in the interrupt pending
    clear register (ICR/ICRH). Fo
    r example, a write of 1 to ICR.E0 clears a pending
    interrupt in IPR.E0.
    If an incoming transfer completion code (TCC) gets latched to a bit in IPR/IPRH, then
    additional bits that get set due to a subsequ
    ent transfer completion will not result in
    asserting the EDMA3CC comple
    tion interrupt. In order for the completion interrupt
    to be pulsed, the required transition is from a state where no enabled interrupts are set
    to a state where at least one enabled interrupt is set.
     
    This is the fundamentally reason that edma3CompHandler has to iterate or spin several times to make sure to serve all the potential EDMA transfer completions that may happen after the 1st interrupt is latched.
     
    For some specific application including yours, the EDMA transfers and behaviors are more determinitic, you don't need to do this as long as the above mentioned scenario does not happen.  EDMA LLD is tailored to very generalized applications, it has to do this to make sure it works correctly in all the times.  I agree it's less efficient.  But it works functionaly.
     
    Xiaohui
  • Ralf,

    After discussing this with Xiaohui, we have determined that your first point must be reported as a program error in which indexh is initialized to the wrong value. It should be initialized to 0 so the entry loop test will work (due to indexl=1u) but the loop test will not be extended by indexh!=0 for the case where numTCC<=32.

    Your point #2 about using IER is a system-oriented issue. The handler is faster by reading from a RAM location instead of the ConfigBus to get IER. If the EDMA3 LLDs are used as expected, then there should not be any issues with the method used, from our determination. This means that when you have configured a callback function for an IPR/IPRH bit, you must remove that configuration whenever you also clear the IER bit.

    There are numerous inefficiencies in the EDMA3 LLD completion interrupt handler, but it does function correctly. Decreasing the total number of ConfigBus accesses, such as reading or writing to IPR, ICR, IER, and so on, would improve the performance of the handler. But the functionality is still good, which only means that we have to discuss it more with the development team.

    Thank you for your diligent work on this. Your efforts will help us to make this a better system of tools.

    Regards,
    RandyP

  • Hi RandyP,

    I agree about #1, that initializing indexh to 0 should solve this problem.

    But there is still a problem about #2. I'm trying to use a TCC for polling operation. I don't have any callback registered and IER is not set for this TCC. But the TCC is allocated and therefore, the bit in allocatedTCCs[][] is set. If any other TCC triggers the completion handler at the same time, the inner and outer while() loops start spinning.
    Maybe it would make sense to change the use of allocatedTCCs: Instead of storing registered TCCs, we could store registered call-backs. This way, we can avoid reading the IER register, but it would be consistent with IER.

    I'm still a little bit confused about the outer loop. This is how I understand it:
    Additional TCCs can be latched in IPR while we are in the completion handler, but no additional completion interrupt will be fired for them. A new completion interrupt will only be fired, when all enabled interrupts are not set in IPR before.
    This is true, unless we set the IEVAL register. This is done at the end of the handler. Therefore, I think the outer loop could be removed.
    The implemented completion handler looks like a mixed version of the interrupt servicing procedure 2-1 and 2-2 which are described in the EDMA3 user guide. Procedure 2-1 makes sure, that all enabled interrupts are cleared in IPR (which corresponds to the outer while() loop). Procedure 2-2 only processes one interrupt flag and then sets the IEVAL register if IPR is not 0 on exit.

    Thanks,
    Ralf

  • Ralf,

    #1 will be submitted to be fixed. This issue of initializing indexh is a clear and simple logic design mistake.

    #2 is more of a quality-of-design issue than a logic or functionality issue. I agree 100% that this handler can and should be re-written to improve the efficiency of this critical part of any EDMA3-based application. But it is difficult for me to suggest to the development team that they should prioritize improving this handler over adding new drivers for other new features that are in development for our DSP and ARM family of processors. This handler works functionally; it could be improved, but there may be lessons the developers learned along the way that explain why they made the choices they did.

    Two critical points to consider:

    a) If you remove the outer loop but a new IPR bit gets set before you exit the handler, then you will have to go through a complete Hwi context restore/save cycle to service that new IPR bit. Using the outer loop avoids that.

    b) There is a warning (I think it is in the EDMA3 User Guide) that IEVAL.EVAL should not be pulsed if IPR and IPRH are 0. I am not sure how bad this is, but to avoid it you have to read IPR and IPRH before pulsing IEVAL.EVAL, and if you are reading those in the handler anyway, then you could go ahead and service any bits that are set.

    The tradeoffs have to do with the probability of new bits being set while in the handler versus the extra time it takes to re-read these slow ConfigBus registers.

    Personally, I prefer having the handler choose to either do the outer loop until IPR & IER == 0 or just do one pass and set IEVAL.EVAL. If the cost of the IEVAL pulse when IPR == 0 is an errant interrupt occasionally, that may be an acceptable loss - the details need to be understood, and those may be too deep in the architecture for you and me to access.

    Also, I would do as you suggest, and use IER instead of the stored values because IER is what the hardware uses and there is a chance of mis-match. Using IER is more robust and it clearly helps with the case of mixing interrupts and polling.

    I would also clear all the ICR bits with a single write instead of doing it one-at-a-time for each ISR as it gets called. This gives more time for a repeat IPR bit to avoid being lost, and it requires fewer writes to the ICR register.

    I wish I could say that all of these ideas and yours will be included in a future release, but I cannot make that commitment and I cannot speculate on the chances. Wish us luck, though. If you decide to rewrite the function to include it separately as source in your project, please post the enhanced version and you might get comments.

    Thank you for your hard work on this.

    Regards,
    RandyP

  • Hello RandyP,

    I totally agree with your explanations.

    I will try to implement an optimized version of the completion handler and post it later.

    Thanks a lot!

    Ralf

  • Hi RandyP,

    this is the optimized version of the ISR which I'm using for some time now:

    static void edma3ComplHandler (const EDMA3_RM_Obj *rmObj)
        {
        volatile EDMA3_CCRL_Regs *ptrEdmaccRegs;
        volatile EDMA3_CCRL_ShadowRegs *shadowRegs;
        EDMA3_RM_TccCallbackParams *pEdma3IntrParams;
        uint32_t pendingIrqs;
        uint32_t pendingIrqL;
        uint32_t pendingIrqH = 0;
        uint32_t ier, ierh;

        uint32_t indexl;
        uint32_t indexh;
        uint32_t edma3Id;
        uint32_t useIrqH;

    #ifdef EDMA3_INSTRUMENTATION_ENABLED
        EDMA3_LOG_EVENT(&DVTEvent_Log,"EDMA3",
                    EDMA3_DVT_DESC(EDMA3_DVT_eINT_START,
                    EDMA3_DVT_dNONE,
                    EDMA3_DVT_dNONE,
                    EDMA3_DVT_dNONE));
    #endif /* EDMA3_INSTRUMENTATION_ENABLED */

        assert (NULL != rmObj);

        edma3Id = rmObj->phyCtrllerInstId;
        useIrqH = (rmObj->gblCfgParams.numTccs > 32);
        ptrEdmaccRegs = (volatile EDMA3_CCRL_Regs *)rmObj->gblCfgParams.globalRegs;
        shadowRegs = (volatile EDMA3_CCRL_ShadowRegs *)(&ptrEdmaccRegs->SHADOW[edma3RegionId]);

        ier = shadowRegs->IER;
        if(useIrqH)
            ierh = shadowRegs->IERH;

        pEdma3IntrParams = edma3IntrParams[edma3Id];
            edma3OsProtectEntry (edma3Id,
                                EDMA3_OS_PROTECT_INTERRUPT_XFER_COMPLETION,
                                NULL);

        do
            {
            pendingIrqL = ier & shadowRegs->IPR;
            if(useIrqH)
                pendingIrqH = ierh & shadowRegs->IPRH;

            // clear all pending and enabled interrupts with a single write
            shadowRegs->ICR = pendingIrqL;

            // Process all the pending interrupts
            pendingIrqs = pendingIrqL;    // use a copy here, the original value is used later
            indexl = 0u;
            while (pendingIrqs)
                {
    #ifdef _TMS320C6400
                indexl = 31-_lmbd(1, pendingIrqs);
                pendingIrqs ^= 1<<indexl;
    #else
                if((pendingIrqs & 1u) == TRUE)
                    {
    #endif
                    if(pEdma3IntrParams[indexl].tccCb != NULL)
                        {
                        pEdma3IntrParams[indexl].tccCb (indexl,
                                    EDMA3_RM_XFER_COMPLETE,
                                    pEdma3IntrParams[indexl].cbData);
                        }
    #ifndef _TMS320C6400
                    }
                ++indexl;
                pendingIrqs >>= 1u;
    #endif
                }

            if(useIrqH)
                {
                // clear all pending and enabled interrupts with a single write
                shadowRegs->ICRH = pendingIrqH;

                // Process all the pending interrupts
                pendingIrqs = pendingIrqH;    // use a copy here, the original value is used later
                indexh = 0u;
                while (pendingIrqs)
                    {
    #ifdef _TMS320C6400
                indexh = 31-_lmbd(1, pendingIrqs);
                pendingIrqs ^= 1<<indexh;
    #else
                    if((pendingIrqs & 1u)==TRUE)
                        {
    #endif
                        if(pEdma3IntrParams[32u+indexh].tccCb!=NULL)
                            {
                            pEdma3IntrParams[32u+indexh].tccCb(32u+indexh,
                                        EDMA3_RM_XFER_COMPLETE,
                                        pEdma3IntrParams[32u+indexh].cbData);
                            }
    #ifndef _TMS320C6400
                        }
                    ++indexh;
                    pendingIrqs >>= 1u;
    #endif
                    }
                }
        
            // Make sure all enabled interrupts are inactive before leaving the ISR!
            } while (pendingIrqL != 0 || pendingIrqH != 0);

            edma3OsProtectExit (rmObj->phyCtrllerInstId,
                                EDMA3_OS_PROTECT_INTERRUPT_XFER_COMPLETION,
                                NULL);

    #ifdef EDMA3_INSTRUMENTATION_ENABLED
        EDMA3_LOG_EVENT(&DVTEvent_Log,"EDMA3",
                    EDMA3_DVT_DESC(EDMA3_DVT_eINT_END,
                    EDMA3_DVT_dNONE,
                    EDMA3_DVT_dNONE,
                    EDMA3_DVT_dNONE));
    #endif /* EDMA3_INSTRUMENTATION_ENABLED */
        }

    These are the main changes:

    • Only enabled interrupts are processed (IER), the field allocatedTCCs[][] isn't used any more
    • I implemented procedure 2-1 of the EDMA3 User Guide: IPR is read and processed in a loop, until all enabled interrupt bits are cleared
    • IPR bits are cleared by a single write to ICR (for each iteration of the outer loop)
    • The inner loops for processing the IPR bits are using the _lmbd() intrinsic for C64x DSPs

    Ralf

  • Ralf,

    Thank you for posting this great code work. I hope others will find this to be very useful.

    Some quick comments:

    1. EDMA3 was introduced with the C64x+ generation, so any device using EDMA3 will be C6400 or later. This could simplify your #if for readability.
    2. To reduce the chance of missing an EDMA3 interrupt, it is best to read IPR/H as early in the process as possible and then to immediately write to both ICR and ICRH. This only matters if there is any chance that a second completion to the same TCC could occur too quickly.
    3. I like the use of lmbd. That required some careful logic planning on your part. I will copy that in the future. 

    Regards,
    RandyP 

  • Hi RandyP,

    Thanks for your comments.
    I stole the idea with _lmbd() from the EDMA interrupt dispatcher in the C6000 CSL :-)

    Ralf

  • Randy and Ralf,

    Do you suggest using the EDMA LLD over the EDMA CSL implementation regarding performance ?

    Is there any advantage of using the LLD over the CSL ?

    Randy you mention inefficiencies in the LLD, are they also present in the CSL ?

    Thank you,
    Regards,
    Clément

  • Hi Clément,

    I think the C6000 CSL is only used for older devices:
    http://www.ti.com/tool/sprc090

    As far as I know, the CSL for C64+ and newer devices don't provide an ISR for EDMA.

    Ralf

  • Hi Ralf,

    Thank you for your prompt answer.

    I was talking about the CSL included in the MCSDK (http://processors.wiki.ti.com/index.php/BIOS_MCSDK_2.0_User_Guide#Chip_Support_Library_.28CSL.29)

    which provides functions to use the EDMA3.

    I was thinking of using EDMA3 CSL with SYSBIOS CPINTC module to handle interrupts.

    Clément

  • Clement,

    Ralf found issues with the LLD's ISR, but with his version that problem is gone.

    Functional CSL is not available for all of our devices, but register-level CSL is generally available. The ISR that Ralf wrote does not require any of the functional CSL, from what I saw.

    The LLD is much more robust and complete than CSL. LLD includes resource management, which can be confusing to learn, but with patience it is quite functional and powerful. The LLD can be very inefficient when you use the convenience functions that do read-modify-writes to some of the PARAM registers. Those r-m-w instructions are very inefficient in terms of cycle counts, but that may not matter if you are doing the setup only once and then the DMA channels just run. The LDD can be much more efficient by using the function that writes all of the PARAM entries from a single struct.

    It may be best if you will start a new thread for your more specific questions on a slightly different topic. Once you do that, please include a link to that thread here so we will know about it.

    Regards,
    RandyP

  • RandyP,

    Sorry for hijacking the thread.

    Here's the new stread about the best choice between EDMA LLD or CSL.
    http://e2e.ti.com/support/embedded/bios/f/355/t/275449.aspx

    Regards,
    Clément

  • Hi,

    three years later, I just want to point out that issue #1 is still present in the current EDMA LLD. Very disappointing...

    Ralf