AM5726: ARM hangs and GPMC error

Part Number: AM5726

We have the problem described in the previous post. The problem has been reproduced on the BeagleBoard-X15. We are using an embedded RTOS, but have reduced its functionality to a minimum for test purposes. After running for some time, MPU0 gets stuck and becomes inaccessible even to the JTAG debugger. It is still possible to connect the debugger to the other cores (DSP, IPU), and sometimes it is possible to connect to MPU1. I can also access some peripheral registers, but never the MPU registers. As suggested in the previous post, we minimized the SW and switched from the custom board to the BeagleBoard-X15. The problem was also observed on the BeagleBone. Simulating unexpected code execution or a tight loop did not reproduce the issue.

A problem with similar symptoms was reproduced by switching off the GPMC clock and running memory read operations on physical addresses starting from zero. The GPMC was not initialized with any CS region at that time.

Another observation concerns the Ethernet subsystem working together with the GPMC or eMMC. When Ethernet works alone, no problem is observed. When the GPMC or eMMC works alone, everything also works fine. But as soon as Ethernet starts working together with the GPMC or eMMC, the issue starts to reproduce. We also observed that errors appear in the GPMC_ERR_TYPE register (0x211). This error does not always lead to the MPU getting stuck, and we have not noticed it affecting system functionality.
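
For reference, a value such as 0x211 can be split into its fields with a small helper. This is only a sketch: the bit positions used here (ERRORVALID at bit 0, ERRORTIMEOUT at bit 2, ERRORNOTSUPPMCMD at bit 3, ERRORNOTSUPPADD at bit 4, ILLEGALMCMD at bits 10:8) are an assumption based on the usual GPMC register layout and should be checked against the device TRM. The type and function names are hypothetical and not part of the PDK.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED GPMC_ERR_TYPE bit layout; verify against the TRM before relying on it. */
#define GPMC_ERR_VALID_BIT        (1u << 0)  /* ERRORVALID */
#define GPMC_ERR_TIMEOUT_BIT      (1u << 2)  /* ERRORTIMEOUT */
#define GPMC_ERR_NOTSUPPMCMD_BIT  (1u << 3)  /* ERRORNOTSUPPMCMD */
#define GPMC_ERR_NOTSUPPADD_BIT   (1u << 4)  /* ERRORNOTSUPPADD */
#define GPMC_ERR_ILLEGALMCMD_SHIFT 8u
#define GPMC_ERR_ILLEGALMCMD_MASK (0x7u << GPMC_ERR_ILLEGALMCMD_SHIFT)

/* Hypothetical container for the decoded fields. */
typedef struct {
    int      valid;          /* 1 if the error fields are valid */
    int      timeout;        /* 1 if a timeout error was flagged */
    int      notsupp_mcmd;   /* 1 if an unsupported command was flagged */
    int      notsupp_add;    /* 1 if an unsupported address was flagged */
    unsigned illegal_mcmd;   /* raw 3-bit ILLEGALMCMD field */
} gpmc_err_fields;

/* Split a raw GPMC_ERR_TYPE value into its (assumed) fields. */
static gpmc_err_fields gpmc_decode_err_type(uint32_t reg)
{
    gpmc_err_fields f;
    f.valid        = (reg & GPMC_ERR_VALID_BIT) != 0u;
    f.timeout      = (reg & GPMC_ERR_TIMEOUT_BIT) != 0u;
    f.notsupp_mcmd = (reg & GPMC_ERR_NOTSUPPMCMD_BIT) != 0u;
    f.notsupp_add  = (reg & GPMC_ERR_NOTSUPPADD_BIT) != 0u;
    f.illegal_mcmd = (unsigned)((reg & GPMC_ERR_ILLEGALMCMD_MASK)
                                >> GPMC_ERR_ILLEGALMCMD_SHIFT);
    return f;
}

/* Usage example: decode the value reported in this thread. */
static void gpmc_example_0x211(void)
{
    gpmc_err_fields f = gpmc_decode_err_type(0x211u);
    assert(f.valid == 1);
    assert(f.notsupp_add == 1);
    assert(f.illegal_mcmd == 2u);
}
```

Under that assumed layout, 0x211 would decode as a valid error record with the not-supported-address flag set and an ILLEGALMCMD field of 2, with no timeout flagged.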

One more note: our Ethernet drivers use caches. When we turn off the cache and work with uncached buffers, the GPMC error does not appear.

What could be the correlation between cached versus uncached memory, the GPMC error, and the CPU getting stuck?

 

  • The RTOS team have been notified. They will respond here.
  • Thanks. In my opinion this is not related to the RTOS; I suppose it is more related to the hardware level.
  • Romko,

    This sounds like a software issue.  It is possible with any of the processor cores to get it into loops where JTAG cannot connect.  This usually occurs after encountering a corrupt pointer or when the processor is trying to execute from memory where no program is loaded.

    You indicate that JTAG connectivity is robust.  Also, it is known to be robust on the X15.  Therefore, if the problem is reproduced on the X15, you need to debug your code.  I will pass this thread back to the RTOS team.

    Tom

  • I understand that this looks like a SW problem, but I can't understand how, for instance, a corrupted pointer can cause such a system lock. In my experiments it always ends in an exception state.
    I have tried to reproduce the problem by filling memory with bad data and by writing short loops, but with no success. The CPU enters the exception handler and stops in the loop, but I am still able to connect the debugger and access the MPU registers.

    As I described, I made a simple test: I disabled the clock for the GPMC and simply tried to read each word of memory starting from address 0. On some addresses the CPU gets stuck with the same symptoms as in the described problem. So, if I understand correctly, we can somehow provoke a lock on the internal CPU bus.
    Could this be the cause of the problem?
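
The word-by-word read test described above could be sketched roughly as follows. This is not the code used in the thread; the function and variable names are hypothetical. The idea is to record each address just before reading it, so that if the core stalls, the last recorded address can be inspected from another core or through the debugger. On the target, `base` would point at the physical region under test; the breadcrumb would need to live in memory that remains reachable after the stall, and a barrier between the breadcrumb write and the read would probably be needed (both are assumptions left out of this sketch).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical breadcrumb: address of the word about to be read. */
static volatile uintptr_t g_last_probe_addr;

/*
 * Read n_words 32-bit words starting at base, recording each address
 * before the access. Returns the XOR of all words so the reads cannot
 * be optimized away and the result can be checked.
 */
static uint32_t probe_words(const volatile uint32_t *base, size_t n_words)
{
    uint32_t acc = 0u;

    assert(base != NULL || n_words == 0u);

    for (size_t i = 0; i < n_words; i++) {
        /* Record the address first; a stall on the read leaves it behind. */
        g_last_probe_addr = (uintptr_t)&base[i];
        acc ^= base[i];
    }
    return acc;
}
```

If the core hangs partway through, `g_last_probe_addr` identifies the access that never completed, which helps tell a stall on one specific address range from a random failure.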
  • Note that this is a continuation of an earlier thread that is now locked: e2e.ti.com/.../637787

  • Romko,

    Turning off the clock to the GPMC block and then repeatedly reading from it is an invalid operating case.  It is not surprising that the processor stalls.  That does not mean that the cause of the system hang is the same.  You need to debug your code to isolate the cause of the problem.  That normally means that you segment your code and disable sections until the problem vanishes.  Then you need to examine / segment the part causing the problem until you can further isolate the cause of the fault.

    Tom

  • Hi Tom,

    For peripheral initialization we are using TI PDK based code, version 1.0.8. We found the following code in the evmAM572x_clock.c file:

        CSL_FINST(coreCmReg->CM_L3MAIN1_CLKSTCTRL_REG,
            CORE_CM_CORE_CM_L3MAIN1_CLKSTCTRL_REG_CLKTRCTRL, RESERVED_2);
    
        while(CSL_CORE_CM_CORE_CM_L3MAIN1_CLKSTCTRL_REG_CLKACTIVITY_L3MAIN1_L3_GICLK_ACT !=
           CSL_FEXT(coreCmReg->CM_L3MAIN1_CLKSTCTRL_REG,
            CORE_CM_CORE_CM_L3MAIN1_CLKSTCTRL_REG_CLKACTIVITY_L3MAIN1_L3_GICLK));


    The RESERVED_2 value is not mentioned in the TI documentation; I can find only NO_SLEEP and HW_AUTO as allowed values.

    After changing the value from RESERVED_2 to NO_SLEEP, the system appears to work stably. The issue has not been reproduced so far, but tests are still running.

    Is it a bug in the PDK or a missing description in the documentation?
    Could this setting lead to the described problem?
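
To make the change concrete, here is a minimal sketch of the field update, written as plain bit manipulation on a register value rather than with the CSL macros. It assumes CLKTRCTRL occupies bits [1:0] of CM_L3MAIN1_CLKSTCTRL and that NO_SLEEP is 0 and HW_AUTO is 3; only the value 2 (RESERVED_2) is stated in this thread, so the rest should be checked against the TRM. All names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED: CLKTRCTRL is a 2-bit field at bits [1:0]; verify in the TRM. */
#define CLKTRCTRL_SHIFT 0u
#define CLKTRCTRL_MASK  (0x3u << CLKTRCTRL_SHIFT)

/* Hypothetical names; encodings other than 2 are assumptions. */
enum clktrctrl_mode {
    CLKTRCTRL_NO_SLEEP   = 0, /* assumed encoding */
    CLKTRCTRL_RESERVED_1 = 1, /* assumed reserved */
    CLKTRCTRL_RESERVED_2 = 2, /* value written by the PDK code in question */
    CLKTRCTRL_HW_AUTO    = 3  /* assumed encoding */
};

/* Extract the CLKTRCTRL field from a raw register value. */
static unsigned clktrctrl_get(uint32_t reg)
{
    return (unsigned)((reg & CLKTRCTRL_MASK) >> CLKTRCTRL_SHIFT);
}

/* Return reg with CLKTRCTRL replaced by mode; other bits are preserved. */
static uint32_t clktrctrl_set(uint32_t reg, enum clktrctrl_mode mode)
{
    assert((unsigned)mode <= 3u);
    return (reg & ~CLKTRCTRL_MASK) | ((uint32_t)mode << CLKTRCTRL_SHIFT);
}

/* 1 if the mode is one of the two values the documentation lists. */
static int clktrctrl_is_documented(enum clktrctrl_mode mode)
{
    return mode == CLKTRCTRL_NO_SLEEP || mode == CLKTRCTRL_HW_AUTO;
}
```

On the target the result would be written back through the real register access path. A check like `clktrctrl_is_documented()` could flag settings such as RESERVED_2 during bring-up.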

  • Romko,

    The behavior definitely points to either a bug or a missing description in the documentation, which we may have overlooked because we don't have anything connected on the EVM. I checked the EVM GEL file, and it appears that CSL_CORE_CM_CORE_CM_L3MAIN1_CLKSTCTRL is set to SW_WKUP (2) in that initialization script as well, which may be where the board library clock setting originally came from.

    I have pinged the design team to get some clarification on this issue. Depending on their answer, I will post the response here and file SW and literature bugs for this issue.

    Regards,
    Rahul

  • Dear Rahul,

    Have you already clarified the issue with the design team? Is it a SW or a documentation problem?
  • Romko,

    I do not have an answer from the design team on this issue yet, but I have pinged them again for clarification. Are you still running into the issue with the NO_SLEEP setting? I will update the post as soon as I hear from the design folks.

    Regards,
    Rahul