AM5728: Compiler optimization issue

Part Number: AM5728

GNU compiler -O3 optimization around the BIOS Timer ISR stalls the processor after hours of running

Hi,

We've been chasing an elusive issue on a customer's system. After many iterations we isolated it to a simple project running on the IDK under JTAG, where changing the optimization level from -O3 to -O2 on a single file makes the problem go away.

The problem manifests itself upon attempting to pause execution, and the message in the CCS console is:
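For readers who want to narrow an optimization-level change further than one file, here is a minimal sketch, assuming GCC. It uses GCC's `#pragma GCC push_options`, `#pragma GCC optimize` and `#pragma GCC pop_options` to build only selected functions at a lower level. The names `g_ticks`, `timer_isr` and `ticks_now` are hypothetical and do not come from the project discussed here.

```c
#include <stdint.h>

/* Hypothetical tick counter, updated from a timer ISR. */
static volatile uint32_t g_ticks;

#pragma GCC push_options
#pragma GCC optimize ("O2")  /* functions up to pop_options are built at -O2 */

/* Hypothetical ISR body; stands in for the real timer ISR. */
void timer_isr(void)
{
    g_ticks++;
}

#pragma GCC pop_options      /* restore the command-line optimization level */

/* Read the current tick count (built at the command-line level). */
uint32_t ticks_now(void)
{
    return g_ticks;
}
```

The GCC documentation describes these per-function optimization overrides as a debugging aid, so this is a way to bisect which function is sensitive, not a recommended production setting.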

CortexA15_0: Trouble Halting Target CPU: (Error -1323 @ 0x80005738) Device failed to enter debug/halt mode because pipeline is stalled. Power-cycle the board. If error persists, confirm configuration and/or try more reliable JTAG settings (e.g. lower TCLK). (Emulation package 8.0.27.9)

It may take up to 36 hours for the problem to occur. I have the CCS project which can be used to reproduce the issue but would refrain from sharing it on the public forum. 

I'd appreciate your guidance here.

Thanks,

Michael

  • Hi Michael,

    I need to consult internally on this.

    In the meantime, would it be possible for you to share a simple (minimum frills, no extra code) example which demonstrates the problem on a TI hardware platform?

    Thanks,
    Frank

  • Hi Frank,

    I shared the project via BOX along with a detailed message; you should receive a BOX notification. Please let me know if you don't.

    thanks

    Michael

  • Hi Michael,

    Today I was told actively debugging this is outside the scope of our support model, so I won't debug the code you've shared. Instead I'll share with you what I learned from internal discussions. I hope this helps.

    Regards,
    Frank

    The CCS message shows the CPU is unable to enter halt mode. This happens when it can't retire an instruction. The CPU generally can't retire an instruction when some part of the memory path has frozen.

    The failure may be caused by the generated code not respecting some hardware sequencing, or by the code shifting the timing of an issue which causes some level of the memory hierarchy to fail. The memory failure could be in the L1/L2 caches if there is a CPU erratum, or it could be out in the L3 interconnect, where a slave interface the CPU was accessing has locked up.

    The best way to debug is to connect to the DAP port (not the CPU) once the issue has occurred and look around the memory for "sick" address ranges. It's best to first look at the FLAG MUXes to see whether an L3 error has been flagged. ETM can sometimes help, as the CPU will stop emitting trace when the issue happens. Next, the on-chip buffer can be read via the DAP.


  • Hi Frank,

    The debugging along the lines you suggest has already been done; we observed random L3 errors at random addresses from run to run (mostly in GPMC space).

    It is not a matter of debug effort to solve the customer's problem at this point. We have a workaround that makes the symptoms go away. The project I shared is the result of many months of debugging to zoom in on the offending sequence of instructions. It shows that if GPMC address space is accessed repeatedly in a busy-wait loop, something that looks like a cumulative timing shift eventually brings about a timing violation and a processor hang. In my eyes, this at the very least warrants an investigation.
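
    For readers without access to the shared project, the access pattern described above has roughly the following shape. This is a hypothetical sketch, not the project code: the name `poll_ready`, the 16-bit register width, the mask and the poll bound are all assumptions. On the target, `status` would point into GPMC-mapped address space.

    ```c
    #include <stdint.h>

    /*
     * Hypothetical busy-wait: re-read a memory-mapped status word until the
     * masked bits are set. The volatile qualifier forces a real load on
     * every iteration, so the loop cannot be collapsed at any -O level.
     * Returns the number of polls that came back not ready, or -1 if the
     * bound was reached.
     */
    int32_t poll_ready(volatile const uint16_t *status,
                       uint16_t ready_mask,
                       int32_t max_polls)
    {
        for (int32_t i = 0; i < max_polls; i++) {
            if ((*status & ready_mask) != 0) {
                return i;
            }
        }
        return -1;
    }
    ```

    The bound only guards against a device that never reports ready. It would not help if the load itself never completes, which is the kind of stall discussed in this thread.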

    Let me know your thoughts.

    Michael

  • Hi Michael,

    Purely from a technical perspective I agree that it warrants an investigation. However, I'm not currently in a position to undertake the investigation.

    Regards,
    Frank