AM335x EDMA errors

Hi.

We are attempting to use the EDMA with UARTs on the AM335x.  This is mostly working, but we sometimes see a problem as the system starts up and data starts to flow.

The EDMA driver in the Linux kernel (arch/arm/common/edma.c) has an error interrupt handler, dma_ccerr_handler(). According to the TRM, this interrupt can be generated when there is a missed DMA event (latched in EMR or EMRH), a missed QDMA event (latched in QEMR), a threshold exceeded (latched in CCERR) or a TCC error (also latched in CCERR).

In the condition that we are experiencing, the interrupt handler checks those four error registers, but does not find any error event. All of the registers are 0. If the interrupt handler returns IRQ_NONE, the kernel masks interrupt 14 since no driver has claimed it. This causes problems with future DMA processing.
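
The four latched sources listed above can be summarised in a small decoder. This is only an illustrative sketch, not code from the kernel driver: the `edma_err_snapshot` struct, the `ERR_SRC_*` flags and `edma_err_sources()` are hypothetical names, and the CCERR bit positions used here (queue threshold bits in the low bits, TCCERR at bit 16) are assumptions that should be checked against the TRM.

```c
#include <stdint.h>

/* Hypothetical snapshot of the four error status registers. */
struct edma_err_snapshot {
    uint32_t emr;    /* missed DMA events, channels 0-31  */
    uint32_t emrh;   /* missed DMA events, channels 32-63 */
    uint32_t qemr;   /* missed QDMA events                */
    uint32_t ccerr;  /* queue threshold and TCC errors    */
};

/* Assumed CCERR layout; verify against the TRM. */
#define CCERR_QTHRXCD_MASK 0x00000007u  /* three event queues assumed */
#define CCERR_TCCERR       (1u << 16)

enum {
    ERR_SRC_DMA_MISSED  = 1 << 0,
    ERR_SRC_QDMA_MISSED = 1 << 1,
    ERR_SRC_THRESHOLD   = 1 << 2,
    ERR_SRC_TCC         = 1 << 3
};

/* Returns a bitmask of ERR_SRC_* flags; 0 means no source is latched. */
unsigned edma_err_sources(const struct edma_err_snapshot *s)
{
    unsigned src = 0;

    if (s->emr | s->emrh)
        src |= ERR_SRC_DMA_MISSED;
    if (s->qemr)
        src |= ERR_SRC_QDMA_MISSED;
    if (s->ccerr & CCERR_QTHRXCD_MASK)
        src |= ERR_SRC_THRESHOLD;
    if (s->ccerr & CCERR_TCCERR)
        src |= ERR_SRC_TCC;
    return src;
}
```

A return value of 0 corresponds to the case described here: the interrupt fired, but nothing is latched in any of the four registers.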

What can cause EDMAERRINT other than the sources listed in the TRM and indicated by the four registers?

Regards,
    Steve

  • Hi Steve,

    I will forward this to the factory team. Which Linux version are you using?

  • Hi Biser.

    We are using Linux version 3.2.0.

    Regards,
         Steve

  • Hi Steve Schefter,

    I have encountered the same problem, but I am using StarterWare 02.00.00.07.

    Have you fixed the problem yet?

  • Hi Wen Tang.

    It appears to be occurring as a result of a race condition.

    The error handler will get invoked as a result of a DMA request being activated before the DMA can service the previous request.  With the DMA reading from a FIFO, this does not cause a loss of data, but does cause the DMA interrupt.

    I believe that the following was happening.  The second DMA request occurred and caused the error interrupt to be pending.  Before the error ISR is run, the DMA is given a new buffer and started.  The error condition is gone, but the interrupt is still pending.

    There isn't a way around the race other than to redo the DMA buffer setup such that it never runs out of space.  Rather than do that, the solution was to accept this condition and acknowledge the interrupt properly.  In dma_ccerr_handler(), that means writing to EEVAL and indicating that the interrupt was handled:

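            /*
             * Entered with nothing latched in any of the four error
             * status registers: write EEVAL and claim the interrupt
             * so that the kernel does not mask it.
             */
            if ((edma_read_array(ctlr, EDMA_EMR, 0) == 0) &&
                (edma_read_array(ctlr, EDMA_EMR, 1) == 0) &&
                (edma_read(ctlr, EDMA_QEMR) == 0) &&
                (edma_read(ctlr, EDMA_CCERR) == 0))
            {
                    edma_write(ctlr, EDMA_EEVAL, 1);
                    return IRQ_HANDLED;
            }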
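
The sequence described above can be played through in a toy model. This is only one possible reading of the race, written as a sketch: `race_model` and the `model_*` functions are hypothetical, and the assumption that restarting the channel clears its latched error bit (while leaving the already pending interrupt untouched) is not confirmed anywhere in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model: one status register plus the pending flag that the
 * interrupt controller keeps once the error line has fired. */
struct race_model {
    uint32_t emr;       /* latched missed-event bits (channels 0-31) */
    bool irq_pending;   /* error IRQ waiting for its ISR to run */
};

/* A second event arrives before the first one was serviced. */
void model_missed_event(struct race_model *m, unsigned ch)
{
    m->emr |= 1u << ch;
    m->irq_pending = true;
}

/* Assumption: restarting the channel clears its latched error bit
 * but does nothing about the interrupt that is already pending. */
void model_restart_channel(struct race_model *m, unsigned ch)
{
    m->emr &= ~(1u << ch);
}

/* Returns true if the ISR found a latched source, false if it was
 * entered with every status bit already clear. */
bool model_error_isr(struct race_model *m)
{
    bool found = (m->emr != 0);

    m->irq_pending = false;  /* claim the interrupt in both cases */
    m->emr = 0;              /* acknowledge whatever was latched */
    return found;
}
```

Calling `model_missed_event()`, then `model_restart_channel()` for the same channel, then `model_error_isr()` reproduces the symptom: the ISR runs, but finds nothing latched.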

    Regards,

        Steve

  • Thank you, Steve Schefter.

    Once Edma3ccErrorIsr() has been entered, I cannot clear the pending status using EMCR/EMCRH, EEVAL, or CCERRCLR.

    I tried disabling the TPCC, TPTC0, TPTC1, TPTC2, and UART1 clocks and then using INTC_ISR_CLEAR0 to clear the pending status in the INTC module, but that doesn't work. The only thing I can do is power-cycle the board.

     

  • Hi, Biser

    My customer has the same problem.
    We want to know why dma_ccerr_handler() is called when all four registers (EMR/EMRH/QEMR/CCERR) read 0.
    Do you know the reason?

    Best Regards
    Hiroyasu
  • Hi, Biser

    Do you have any comments?

    Best Regards
    Hiroyasu
  • Hi, Biser

    Are Steve's comments correct?
    If so, is there a way to confirm this from the registers?
    I would like TI's comments.

    Best Regards
    Hiroyasu
  • Hi,

    You have attached your questions to a thread that is two years old, so it's no wonder that we missed them. In the future, please open a new thread for any questions you have if you want fast feedback. Now to your questions: please describe your problem, your use case, and the software that you are using.
  • Hi, Biser

    Thank you for your reply!
    Next time I will start a new thread instead of posting to an old one.
    I am now in the same situation as Steve.

    =================
    Steve said:
    The EDMA driver in the Linux kernel (arch/arm/common/edma.c) has an error interrupt handler, dma_ccerr_handler(). According to the TRM, this interrupt can be generated when there is a missed DMA event (latched in EMR or EMRH), a missed QDMA event (latched in QEMR), a threshold exceeded (latched in CCERR) or a TCC error (also latched in CCERR).

    In the condition that we are experiencing, the interrupt handler checks those four error registers, but does not find any error event. All of the registers are 0. If the interrupt handler returns IRQ_NONE, the kernel masks interrupt 14 since no driver has claimed it. This causes problems with future DMA processing.
    =================

    I have the same question.

    What can cause EDMAERRINT other than the sources listed in the TRM and indicated by the four registers?

    Best Regards
    hiroyasu
  • Hi Hiroyasu,

    Hiroyasu said:
    What can cause EDMAERRINT other than the sources listed in the TRM and indicated by the four registers?

    By EDMAERRINT, you actually mean EDMA3CC_ERRINT, right? I assume the error interrupt comes from the channel controller (EDMA3CC) and not from a transfer controller, because the transfer controllers can also report error conditions (BUSERR, MMRAERR, TRERR). I cannot find any EDMA3CC_ERRINT sources other than those listed in the AM335x TRM, section 11.3.9.4 Error Interrupts, so it might be the race condition that Steve describes.

    Do you first enter the interrupt handler with a valid source (a bit set in EMR/EMRH/QEMR/CCERR) and then enter it again without a valid source (all bits in EMR/EMRH/QEMR/CCERR are 0)? If so, make sure that while handling the valid interrupt, before you return, you clear the interrupt source in the EMCR/EMCRH/QEMCR/CCERRCLR registers and write 1 to the EVAL bit.

    Are you using AM335x TI EVM or custom board?

    Note that in the latest AM335x PSDK 3.00.00.04, the DMA driver is at:

    linux-4.4.12/drivers/dma/edma.c

    I see that dma_ccerr_handler() has been updated compared with arch/arm/common/edma.c, so I would recommend that you switch to PSDK 3.00.00.04 or at least align with the linux-4.4.12/drivers/dma/edma.c driver.

    Regards,
    Pavel

  • Hi, Pavel

    Thank you for your comments.

    My customer enters the interrupt handler without a valid source (all bits in EMR/EMRH/QEMR/CCERR are 0).
    They are using their own custom target board.

    I have another question.

    (1) About the error bits
    Steve said:

    ====================
    I believe that the following was happening.
    The second DMA request occurred and caused the error interrupt to be pending.
    Before the error ISR is run, the DMA is given a new buffer and started.
    The error condition is gone, but the interrupt is still pending.
    ====================

    I think the EMR/EMRH/QEMR/CCERR registers have a bit for each channel.
    If the DMA receives another request, does the error condition go away?

    Best Regards
    Hiroyasu
  • Hi, Pavel

    I have another question.
    If my situation is the same as Steve's,
    is there any problem with Steve's code?

    =================
    if ((edma_read_array(ctlr, EDMA_EMR, 0) == 0) &&
        (edma_read_array(ctlr, EDMA_EMR, 1) == 0) &&
        (edma_read(ctlr, EDMA_QEMR) == 0) &&
        (edma_read(ctlr, EDMA_CCERR) == 0))
    {
            edma_write(ctlr, EDMA_EEVAL, 1);
            return IRQ_HANDLED;
    }
    =================

    Best Regards
    Hiroyasu
  • Hiroyasu said:
    My customer is using their own custom target board.

    I would recommend testing this on the AM335x TI EVM to see whether the issue is specific to your custom board.

    Hiroyasu said:
    If the DMA receives another request, does the error condition go away?

    EDMA3CC has error detection logic that generates an error interrupt on various error conditions, such as a missed DMA event. You should clear the event missed error before re-triggering the DMA channel. For a particular DMA channel, if a second event is received before the first event has been cleared/serviced, the bit corresponding to that channel is set in the event missed registers (EMR/EMRH).
    If additional error events are latched before the original error bits are cleared, the EDMA3CC does not generate additional interrupt pulses.
    Writing 1 to the EVAL bit causes the error interrupt to be pulsed again if any errors are still pending in EMR/EMRH or CCERR.
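
The latch and pulse behaviour described above can be written down as a small software model. This is a sketch for reasoning about the sequence only, not a description of the actual hardware implementation: `cc_model` and the `cc_*` functions are hypothetical, and only EMR (channels 0-31) is modelled.

```c
#include <stdint.h>

struct cc_model {
    uint32_t emr;      /* latched missed-event bits */
    unsigned pulses;   /* error interrupt pulses generated so far */
};

/* Latch a missed event. A pulse is generated only when no error
 * bit was already set. */
void cc_latch_missed(struct cc_model *cc, unsigned ch)
{
    if (cc->emr == 0)
        cc->pulses++;
    cc->emr |= 1u << ch;
}

/* Software clears the bits it has handled (write 1 to clear). */
void cc_write_emcr(struct cc_model *cc, uint32_t mask)
{
    cc->emr &= ~mask;
}

/* Writing 1 to EVAL: pulse again if anything is still pending. */
void cc_write_eval(struct cc_model *cc)
{
    if (cc->emr != 0)
        cc->pulses++;
}
```

In this model, a handler that clears only the bits it has seen and then calls `cc_write_eval()` gets a fresh pulse for any error that was latched in the meantime, which is why both steps matter before returning.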

    Regards,
    Pavel

  • Hiroyasu said:
    I have another question.
    If my situation is the same as Steve's,
    is there any problem with Steve's code?

    As Steve's code has not been tested or validated by TI, I cannot say whether it is correct.

    I suggest that you work with the latest EDMA driver (from PSDK 3), which is tested and validated by TI.

    Regards,
    Pavel

  • Hi, Pavel

    Thank you for your comments. I understand.
    You told me that the user has to clear the EMR/EMRH/QEMR/CCERR registers when the first event is received.

    But my customer actually sees EMR/EMRH/QEMR/CCERR = 0 when dma_ccerr_handler() is called.

    Is it possible for the EMR/EMRH/QEMR/CCERR registers to be cleared without the user clearing them?

    Best Regards
    Hiroyasu

  • Hi, Pavel

    I will ask my customer to check their program.
    Thank you.

    Best Regards
    Hiroyasu
  • Hiroyasu said:
    Is it possible for the EMR/EMRH/QEMR/CCERR registers to be cleared without the user clearing them?

    I am not sure I understand this question. Can you provide more details on what exactly you need to know about the EMR/EMRH/QEMR/CCERR registers?

    Regards,
    Pavel

  • Hi Pavel

    My customer gets the interrupt when the EMR/EMRH/QEMR/CCERR registers are all 0.
    I want to know why this situation occurs.

    If the EDMA hardware logic cannot clear these registers, my customer has to look for the code that clears them.
    If the EDMA hardware logic can clear these registers, I want to know under what conditions.

    As far as I can tell from the TRM, the EDMA hardware logic cannot clear these registers.
    I want to make sure this understanding is correct.

    But if the EDMA hardware logic cannot clear these registers,
    I think Steve's explanation is not correct,
    because he did not clear the EMR/EMRH/QEMR/CCERR registers and assumed that the error condition was gone
    once the DMA was given a new buffer and started.

    Best Regards
    Hiroyasu
  • EMR/EMRH/QEMR/CCERR are read-only registers. Their bits can be set to 1 only by the EDMA hardware logic. They are cleared to 0 by user software (not by the EDMA hardware logic), which writes 1 to the corresponding bits in the EMCR/EMCRH/QEMCR/CCERRCLR registers.

    Once a missed event is posted (by the EDMA hardware logic) in the event missed registers (EMR/EMRH), the bit remains set until you clear it. This is done by CPU writes to the event missed clear registers (EMCR/EMCRH). Writing a 1 to any bit clears the corresponding missed event bit in EMR/EMRH; writing a 0 has no effect.
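
The write-1-to-clear behaviour described above can be illustrated with a small software model of the EMR/EMRH pair. This is only a sketch: `emr_pair`, `emcr_write()` and `clear_missed_event()` are hypothetical names, and the model assumes channels 0-31 map to EMR/EMCR and channels 32-63 to EMRH/EMCRH.

```c
#include <stdint.h>

/* Hypothetical software model of the missed-event register pair. */
struct emr_pair {
    uint32_t emr;   /* channels 0-31  */
    uint32_t emrh;  /* channels 32-63 */
};

/* Model of a write to EMCR (high == 0) or EMCRH (high != 0):
 * each 1 bit clears the matching status bit, 0 bits do nothing. */
void emcr_write(struct emr_pair *r, int high, uint32_t value)
{
    if (high)
        r->emrh &= ~value;
    else
        r->emr &= ~value;
}

/* Clear the missed-event bit for one DMA channel (0-63). */
void clear_missed_event(struct emr_pair *r, unsigned ch)
{
    emcr_write(r, ch >= 32, 1u << (ch % 32));
}
```

For example, with bits 1 and 4 set in EMR, `clear_missed_event(&r, 1)` leaves only bit 4 set, while `emcr_write(&r, 0, 0)` changes nothing.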

    Regards,
    Pavel
  • Hi, Pavel

    O.K. Thank you !

    Best Regards
    Hiroyasu