TMS320F28388D: NMI by uncorrectable Error in CM

Part Number: TMS320F28388D


Hello Ibukun

Until now, this problem (e2e.ti.com/.../tms320f28388d-nmi-by-uncorrectable-error-in-cm) has not recurred with the NOP at the corresponding location. With the latest software release, however, it occurs again in two other places. The behavior is very similar: an NMI is triggered, and the register UNC_ERR_ADDR_HIGH or UNC_ERR_ADDR_LOW holds a value pointing into C1RAM (stack) (0x1FFFCE08, 0x1FFFCF28). If FRDCNTL.RWAIT is increased from 2 to 3, the problem no longer occurs. According to the specification, however, RWAIT = 2 should be sufficient at a CPUCLK of 120 MHz. Is this setting correct, or is there another timing condition that requires a larger RWAIT?
The problem currently cannot be observed with the debugger, although the code is also executed from flash. In that build the functions are located at other addresses.

Best regards 

Simon
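
For readers following along, the RWAIT change described above amounts to a read-modify-write of one bit field. The following is only a minimal host-side sketch: the field position (bits 11:8 of FRDCNTL) is an assumption that should be checked against the device reference manual, and `set_rwait` and `get_rwait` are hypothetical helper names, not part of any TI library.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: RWAIT is a 4-bit field at bits 11:8 of FRDCNTL.
 * Verify against the technical reference manual before use. */
#define FRDCNTL_RWAIT_SHIFT 8u
#define FRDCNTL_RWAIT_MASK  (0xFu << FRDCNTL_RWAIT_SHIFT)

/* Hypothetical helper: replace only the RWAIT field, keep all other bits. */
static void set_rwait(volatile uint32_t *frdcntl, uint32_t waitstates)
{
    uint32_t value = *frdcntl;                 /* read   */
    value &= ~FRDCNTL_RWAIT_MASK;              /* clear the old field */
    value |= (waitstates << FRDCNTL_RWAIT_SHIFT) & FRDCNTL_RWAIT_MASK;
    *frdcntl = value;                          /* write back */
}

/* Hypothetical helper: extract the current RWAIT value. */
static uint32_t get_rwait(const volatile uint32_t *frdcntl)
{
    return (*frdcntl & FRDCNTL_RWAIT_MASK) >> FRDCNTL_RWAIT_SHIFT;
}
```

Here the register is passed in as a pointer so the sketch can be exercised against an ordinary variable on a PC. On the target, the vendor's flash driver routines would normally be used instead of a hand-written access.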

  • Simon,

    Ibukun is out of the office, I'm going to look through the previous post and this one to see if I can help out in the meantime.  I will try and give an update tomorrow, the 25th.

    Best,

    Matthew

  • Simon,

    I read through the previous post.  I'm going to see if he looped in some others for debug if that can help get us all up to speed while he is out.

    One thing: in the new failing scenario, can you also show where the code returns to after the uncorrectable error/NMI?  Specifically, I am trying to see whether the assembly instructions that may have been executed prior to the event are similar to those in your original post.

    I also noticed that in all cases the stack address ends in the hex digit 8.  Would it be possible to change the starting address of the M4 stack by +2 words to see if the error still occurs?  (I am just trying to rule out the RAM address here.)

    What kind of time duration does it take for this to show up after power up and boot?

    There is no limitation on the flash wait states other than what you have quoted from the DS.  But judging from that experiment, something is affecting the time it takes for the flash (or perhaps RAM) to settle its data lines before it can be read.

    Best,

    Matthew
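
The alignment observation above can be checked with a little address arithmetic. This is a small sketch under one stated assumption: a "word" on the M4 is taken to be 4 bytes. `low_hex_digit` and `shift_by_words` are hypothetical helper names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: one word on the M4 is 4 bytes. */
#define WORD_BYTES 4u

/* Hypothetical helper: last hexadecimal digit of an address. */
static uint32_t low_hex_digit(uint32_t addr)
{
    return addr & 0xFu;
}

/* Hypothetical helper: address after moving up by a number of words. */
static uint32_t shift_by_words(uint32_t addr, uint32_t words)
{
    return addr + words * WORD_BYTES;
}
```

Both reported addresses, 0x1FFFCE08 and 0x1FFFCF28, end in the digit 8. Under the 4-byte-word assumption, a shift of +2 words adds 8 bytes, which would move such an address to one ending in 0.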

  • Hello Matthew

    Thank you very much for thinking with me.

    • I wanted to use the debugger to see where the code returns to after the NMI ISR. Unfortunately, the error did not appear again under the debugger, although the code runs from flash as in the release. However, I can read the return address from an object after the restart, and the .lst file then shows the assembler code at that location.
      Return address: 0x00234A2C
      From the .map file:

      From the .lst file:

      From the .hex File:

      And the corresponding C code:

      Can this be used to detect a correlation to the error?

    • After startup, it takes a few seconds to a few minutes for the error to occur.

    • Stack change:

      Stack before the change:

      Stack after the change:

      The error still occurs with UNC_ERR_ADDR_LOW = 0x1FFFCF38.

    Best regards

    Simon Schoch
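
Reading the return address "from an object after the restart" could look roughly like the sketch below. It relies on the standard Cortex-M exception stack frame, in which the stacked registers are r0, r1, r2, r3, r12, lr, pc, xPSR in that order. Everything else is an assumption for illustration: `nmi_record_t`, `nmi_capture`, and the magic value are hypothetical, and on the target the record would have to live in memory that survives the restart.

```c
#include <assert.h>
#include <stdint.h>

/* Order of the basic exception frame pushed by a Cortex-M core. */
enum {
    FRAME_R0, FRAME_R1, FRAME_R2, FRAME_R3,
    FRAME_R12, FRAME_LR, FRAME_PC, FRAME_XPSR,
    FRAME_WORDS
};

/* Hypothetical record kept across the restart. */
typedef struct {
    uint32_t magic;         /* marks the record as valid */
    uint32_t return_addr;   /* stacked PC at the time of the NMI */
    uint32_t err_addr_low;  /* copy of UNC_ERR_ADDR_LOW */
} nmi_record_t;

#define NMI_RECORD_MAGIC 0x4E4D4931u  /* arbitrary marker value */

/* Hypothetical helper, called from the NMI handler with a pointer
 * to the stacked frame and the value read from the error register. */
static void nmi_capture(nmi_record_t *rec, const uint32_t *frame,
                        uint32_t err_addr_low)
{
    rec->return_addr  = frame[FRAME_PC];
    rec->err_addr_low = err_addr_low;
    rec->magic        = NMI_RECORD_MAGIC;  /* written last */
}
```

The stacked PC is the address execution would resume at, so the instruction of interest is at or just before that address in the .lst file.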

  • Hello Simon,

    A few requests/questions.

    1. Is it possible to share a .out file so that we can try to simulate it and identify the exact source of this error? (Or alternatively if you have some test code that you are able to replicate the error with, that works also.) I already sent a friendship request previously so that you can send it privately.

    2. Also, in this new case, does inserting a NOP in between the call to gSlvGetMapLen() and the comparison to the message buffer data change the behavior or resolve this particular ECC error?

    3. One more thing we would like to check: Does the same error still happen if you allocate the stack to C0 RAM instead of C1 RAM?

    So far it still looks like some corner case in the hardware that is being triggered, but we would really like to get to a definitive root cause.

    Thanks,
    Ibukun

  • Hello Ibukun,

    1. It is difficult to create a project that shows this error on the evalboard. Even when I run this project as a debug version, the error no longer occurs. I can't implement this at the moment.
    2. With a NOP after gSlvGetMapLen() the error no longer occurs (test time > 1.5 h).

      From the .lst file:
    3. Stack in C0RAM:

      The error occurred again after a short test time (about 60s).

    So a NOP at the right place seems to fix the problem.
    Should I run any more tests?

    Best regards 

    Simon
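
The NOP workaround discussed here, a single no-op between the call and the following comparison, might be sketched as follows. The original C code is not shown in the thread, so this is purely illustrative: `stub_get_map_len` is a hypothetical stand-in for gSlvGetMapLen(), `map_len_matches` is a hypothetical caller, and the GCC/Clang-style `__asm__ volatile` syntax is an assumption (other toolchains spell the inline NOP differently).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for gSlvGetMapLen(); the real function
 * is not shown in the thread. */
static uint16_t stub_get_map_len(void)
{
    return 4u;
}

/* Hypothetical caller: fetch the length, execute one NOP, then compare. */
static int map_len_matches(uint16_t expected)
{
    uint16_t len = stub_get_map_len();

    /* Single no-op between the call and the comparison.
     * "volatile" keeps the compiler from removing it. */
    __asm__ volatile ("nop");

    return len == expected;
}
```

The NOP does not change the result of the comparison. It only changes the instruction sequence and timing at that one location, which is why it addresses a single call site rather than the whole program.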

  • Simon,

    OK, this is an interesting data point that it does not behave the same way in a debug version. Does the debug version have different compiler settings (e.g. optimization)? Just trying to see if there are any more clues we can glean. Beyond that, we can't do much more without having a way to reproduce the issue on our end.

    The NOP is still an acceptable solution since we believe this is a rare hardware timing bug that has been triggered by your application.

    Thanks,
    Ibukun

  • Hello Ibukun

    The optimizer is switched off in the release and debug versions. In the debug version the memory mapping is different, therefore the code parts are located at other addresses. Maybe this is the reason why the error does not occur in the debug variant.
    At the moment I prefer the solution with RWAIT = 3 instead of 2 over the solution with a NOP at the right place, because the right place can only be found once an error has occurred there. I hope that with RWAIT = 3 this error no longer occurs.

    Thanks,

    Simon

  • Understood. I believe RWAIT=3 would be a more robust solution since it forces slower timing overall at the expense of code throughput, versus the NOP, which deals with a specific instance. Please do let us know if you ever identify something that we can reproduce on our end. We appreciate your patience through this debug process!