TMS320F28388D: CPU1 gets stuck in ITRAP after power on

Part Number: TMS320F28388D
Other Parts Discussed in Thread: C2000WARE

Last year we received reports of CPU1 startup failures. The two threads from that time are:

 

TMS320F28388D: CPU1 gets stuck after power on

TMS320F28388D: RE: TMS320F28388D: CPU1 gets stuck after power on

 

We have recently seen the same problem in other products: on about 5% of chips, CPU1 jumps to ITRAP during power-on startup.

 

I tested the faulty devices and observed the following:

 

1. If only CPU1 runs, there is no problem. If CPU1 boots CPU2, CPU1 sometimes enters ITRAP, even when there is no application program in CPU2.

 

2. I repeated the above test with TI's example routine. The memory configuration and code came from C2000Ware, and CPU1 still entered ITRAP.

 

3. I measured the interval between CPU1 booting CPU2 and CPU1 entering ITRAP: it is about 37 us. I ran the test on three failing devices, and the interval was almost identical on all of them. I've drawn two diagrams to illustrate the normal and the failing execution of our application for further discussion.
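For scale, the 37 us interval can be converted to CPU cycles. The following is a minimal sketch, assuming SYSCLK is 200 MHz (the F28388D maximum; the clock actually in effect at that moment is not stated in this thread). The helper name and the clock constant are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 200 MHz SYSCLK. Adjust to the clock actually configured. */
#define ASSUMED_SYSCLK_HZ 200000000UL

/* Convert a duration in microseconds to SYSCLK cycles (hypothetical helper). */
uint32_t microsecondsToCycles(uint32_t us, uint32_t sysclkHz)
{
    /* This sketch only handles clocks that are a whole number of MHz. */
    assert(sysclkHz % 1000000UL == 0UL);
    return us * (sysclkHz / 1000000UL);
}
```

Under that assumption, `microsecondsToCycles(37, ASSUMED_SYSCLK_HZ)` gives 7400 cycles, which is the window to compare against the CPU2 boot sequence.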

 

We have the following judgments and questions:

 

1. Since the problem also occurs with TI's routines and is concentrated on a few specific chips, we tend to think this is a hardware or chip problem. However, in our earlier tests, adding a few NOP instructions to the program, which slightly changes the FLASH layout and the execution timing, made the problem disappear, and adding more NOPs made it return. We are confused as to why such software tweaks affect whether the problem recurs.

 

2. The interval between CPU1 starting CPU2 and the problem occurring is consistently about 37 us. Is there a way to determine what the whole system, especially CPU2, is doing at that moment? Can you give us some hints, or point us to the relevant information in the TRM?

 

3. My hypothesis is that at some point during CPU2 startup, CPU1's access to FLASH is disturbed, causing CPU1 to jump to ITRAP. To avoid this "problem moment", I had CPU1 jump to RAM and execute a loop of about 10 ms right after it boots CPU2. With this mechanism in place, CPU1 no longer got stuck in ITRAP.
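The workaround in item 3 might look roughly like the sketch below. It is an illustration only, not the poster's actual code. `Device_bootCPU2` and `BOOTMODE_BOOT_TO_FLASH_SECTOR0` are the names quoted later in this thread; `waitInRam`, `bootCpu2ThenWaitInRam`, and `WAIT_ITERATIONS` are hypothetical, the loop count is uncalibrated, and the host-side stub exists only so the sketch compiles off-target. RAM placement assumes the TI compiler's `CODE_SECTION` pragma, a linker command file that maps `.TI.ramfunc` to RAM, and startup code that has already copied that section into RAM.

```c
#include <stdint.h>

#ifdef __TI_COMPILER_VERSION__
#include "device.h"   /* assumption: C2000Ware device support header */
#else
/* Host-side stand-ins so the sketch compiles off-target (hypothetical). */
#include <assert.h>
#define BOOTMODE_BOOT_TO_FLASH_SECTOR0 0x03U   /* placeholder value only */
static uint32_t g_cpu2BootRequests = 0U;
static void Device_bootCPU2(uint32_t bootMode)
{
    assert(bootMode == BOOTMODE_BOOT_TO_FLASH_SECTOR0);
    g_cpu2BootRequests++;
}
#endif

/* Hypothetical, uncalibrated loop count; tune it until the wait is ~10 ms. */
#define WAIT_ITERATIONS 2000000UL

#ifdef __TI_COMPILER_VERSION__
#pragma CODE_SECTION(waitInRam, ".TI.ramfunc")
#endif
uint32_t waitInRam(uint32_t iterations)
{
    volatile uint32_t i;
    for (i = 0U; i < iterations; i++) {
        /* Busy-wait with no calls: once this function is in RAM,
           the loop needs no instruction fetch from FLASH. */
    }
    return i;
}

void bootCpu2ThenWaitInRam(void)
{
    /* Release CPU2, then keep CPU1 out of FLASH while CPU2 starts up. */
    Device_bootCPU2(BOOTMODE_BOOT_TO_FLASH_SECTOR0);
    (void)waitInRam(WAIT_ITERATIONS);
}
```

Note that the call to `bootCpu2ThenWaitInRam` itself and the return from `waitInRam` still execute from FLASH, so this only moves the FLASH-free window; it does not explain the root cause.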

  • Hi,

    Thank you for the detailed description of the issue and your observations. Are you able to reproduce this failure with the debugger (CCS) connected?

    In the figure you have a "CPU2 Boot" step. What is that? Is it CPU2 running the cinit code before jumping to main? Could you share that code with us?

    Also, is there a way to check the integrity of the CPU1 code in flash after the ITRAP happens?

    Regards,

    Vivek Singh

  • Hi, Vivek,

    Thanks for your reply. To answer your questions:

    Are you able to reproduce this failure with the debugger (CCS) connected?

    I used the following steps and did not reproduce the problem:

      1) Use CCS to connect CPU1 only;

      2) Load CPU1 program;

      3) Run CPU1. If no problem occurs, reset CPU1, restart the program, and run again.

    Are the above steps correct? Is there a way to restart CPU1 without "Load CPU1 program"?

     

    In the figure you have a "CPU2 Boot" step. What is that? Is it CPU2 running the cinit code before jumping to main? Could you share that code with us?

    Yes, "CPU2 Boot" represents the code CPU2 executes before it reaches main.

    Will this program be affected by my configuration or application? Since I can also reproduce the problem with routines, I can send you the routines I used if needed.

     

    Also, is there a way to check the integrity of the CPU1 code in flash after the ITRAP happens?

    After ITRAP occurred, I used the debugger to connect the device to check the FLASH area where the fetch anomaly occurred, and the content was intact. Are there other ways to check “code integrity”?

    Thanks and regards,

    Shawn

  • Hi,

        Here is the CPU2 boot flow that I found in the TRM:

        

        

        

        Previously I found a typical time interval of 37 microseconds between calling "Device_bootCPU2(BOOTMODE_BOOT_TO_FLASH_SECTOR0)" on CPU1 and CPU1 getting stuck. Can you help me analyze which step of the boot flow CPU2 might be in 37 microseconds after startup?

  • Hi,

    Is there a way to restart CPU1 without "Load CPU1 program"?

    I did not understand this query. Since the code is in flash, you don't have to load the program each time. You can just load the symbols and then run it.

    Will this program be affected by my configuration or application? Since I can also reproduce the problem with routines, I can send you the routines I used if needed.

     

    If you have routines that reproduce this issue, that is great. Are you able to run them on a TI-provided controlCARD or LaunchPad? If yes, I would like you to send me these routines.

    After ITRAP occurred, I used the debugger to connect the device to check the FLASH area where the fetch anomaly occurred, and the content was intact. Are there other ways to check “code integrity”?

    Ok.

    "Device_bootCPU2(BOOTMODE_BOOT_TO_FLASH_SECTOR0)" in CPU1 and CPU1 got stuck. Can you help me analyze which step of boot flow CPU2 might be in after 37 microseconds of startup?

    It's very difficult to predict where the CPU2 code would be, but in general CPU2 waits for the boot command from CPU1 and, as soon as it gets it, jumps to the application. My guess is that CPU2 is running the cinit code at this time. Could CPU2 be corrupting the CPU1 code/data space?

    Regards,

    Vivek Singh

  • 1. I repeatedly restarted CPU1 through CCS while leaving CPU2 disconnected, but found that CPU2 did not seem to restart with the restart of CPU1. Is there something wrong with my operation here?

    2. I will try to find the controlCARD and get back to you later.

    3. I want to locate the statement CPU2 is executing at the time CPU1 becomes abnormal. Do you have a recommended method? For example, could CPU2 be halted when CPU1 enters ITRAP, or could CPU2's program counter be read at that moment?

  • Hi,

    1. I repeatedly restarted CPU1 through CCS while leaving CPU2 disconnected, but found that CPU2 did not seem to restart with the restart of CPU1. Is there something wrong with my operation here?

    When you restart CPU1, do you reset CPU1 and then restart, or just restart? Just restarting may cause issues. You need to reset and then restart.

    3. I want to locate the statement CPU2 is executing at the time CPU1 is abnormal. Do you have a recommended method? For example, when CPU1 enters ITRAP, CPU2 is ordered to stop running, or read CPU2's Program Counter at that moment.

    I cannot think of a direct way to do this. If you are connected to both CPU1 and CPU2 with CCS when this condition happens, there may be a way to do it, but I need to check with our CCS team.

    Regards,

    Vivek Singh

  • As you suggested, I conducted the following tests:

    1. Load the program into the faulty DSP, then apply a power-on reset 20 times. CPU1 entered ITRAP in 11 of those 20 resets.

    2. Connect CPU1 and CPU2 using CCS, and load symbols

    3. Then proceed in the following order:

        a. Reset CPU1 & 2

        b. Restart CPU1

        c. Resume CPU2, then Resume CPU1

        d. Observe the behavior, then Suspend CPU1 & 2

        e. Repeat steps 3.a to 3.d

    4. I performed the operations in step 3 a total of 30 times, and no exception occurred.

    5. Then I swapped the order in 3.c (Resume CPU1 first, then Resume CPU2) and repeated step 3 another 30 times. Again, no exception occurred.
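
    To gauge whether 30 clean runs could be luck, here is a minimal sketch that computes the chance of seeing no failure at all, assuming each run is independent and fails at the rate measured in step 1 (11 of 20). Both the independence assumption and the function name are hypothetical.

    ```c
    #include <assert.h>

    /* Probability of zero failures in `runs` independent trials when
       each trial fails with probability `failRate` (hypothetical helper). */
    double probabilityOfNoFailures(double failRate, int runs)
    {
        assert(failRate >= 0.0 && failRate <= 1.0);
        assert(runs >= 0);

        double p = 1.0;
        for (int i = 0; i < runs; i++) {
            p *= (1.0 - failRate);   /* each run must pass */
        }
        return p;
    }
    ```

    With `failRate = 11.0 / 20.0` and `runs = 30`, the result is well below one in a billion, so under these assumptions the clean CCS runs are very unlikely to be chance and the CCS-driven start probably behaves differently from a power-on reset.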

     

    Is there any problem with the above operation?

    Is there any difference between the reset operation through CCS and the power-on reset?

  • Thanks. Your steps look ok to me.

    Is there any difference between the reset operation through CCS and the power-on reset?

    Yes, there is a difference. A CCS reset does not reset all of the logic in the device, whereas a power-on reset does. Also, in your original post you mentioned that inserting a few NOPs in the code can make it pass or fail, which means timing plays a part. Since you are starting CPU1 and CPU2 manually in the CCS case, it may not be hitting that timing condition. So in this case you need to keep CPU2 disconnected.

    Also, have you looked at VDD and VDDIO to make sure there is no dip on those pins when the failure occurs?

    Also, after the error happens, can you connect with CCS and check the NMISHDFLG register value?

    Regards,

    Vivek Singh

  • Hi, Vivek,

    Thanks for your reply.

    1. I've tried in the following manner: 

    1. Connect CPU1 only and leave CPU2 disconnected
    2. Reset CPU
    3. Restart CPU1
    4. Resume CPU1
    5. Delete the GEL files of CPU2 and connect CPU2 to check its status

    After that, I found that CPU2 had not restarted at all. Is there any problem with the steps above?

    2. I need colleagues from the hardware department to help check VDD and VDDIO. I will reply to you later.

    3. I checked the NMISHDFLG register of both CPUs when the device was in a boot failure state

    CPU1

    CPU2

    Regards,

    Shawn

  • Can you try the following on CPU1:

    • Reset CPU1
    • In the CCS memory browser, write 0xA500 at address 0xD01
    • Click on Run

    See if this makes any difference.

    Regards,

    Vivek Singh