RM57L843 CPU LBIST does not execute - asm("WFI") has no effect

I am trying to execute the CPU LBIST of a TI Hercules RM57L843 (integrated into the RM57L HDK Development Board).  I am following the directions in SPNU562 (the RM57 Family Users Guide) 10.9.1 to set up the STC1 registers and put the CPU into idle mode with asm("WFI").  However, I am noticing that execution blows right past the asm("WFI") and gets stuck in an infinite loop that I implemented to catch this very possibility.  The self-test does not execute and the CPU does not reset.

Symptoms are very similar to these two unresolved threads.

TMS570 LBIST Trouble

TMS570 WFI trouble in LBIST


Here is the code I am using to execute the self-test.  As I said, it is very similar to the algorithm in SPNU562 10.9.1; however, when that didn't work I also added some undocumented tricks from TI's HALCoGen-generated self-test file HL_sys_selftest.c.  That did not help.  I present my code below for your perusal.

void cpuSelfTest()
{
    // This code implements the steps from spnu562 10.9.1
    
    // Block here to avoid boot loop!
    volatile int i = 0;
    while(i == 0);

    // Step 1:
    //
    // Set the STC clock rate.
    //
    // "The maximum clock rate for the self-test is 110MHz" (spnu215a 5.5.5.2)
    // STC1 is always driven by GCLK                         (spnu562 10.8.12)
    // GCLK is configured to use PLL1.    (BSP_CORE @r97029 HL_system.c : 282)
    // Assuming PLL1 is 300 MHz, then /3 will yield 100 MHz.
    // To divide by 3, use a STCCLKDIV of 2.                 (spnu562 10.8.12)

    uint32 temp = stcREG1->STCCLKDIV;
    temp &= ~((0x7 << 24) | (0x7 << 16)); // Clear [26:24] and [18:16]
    temp |= ((0x2 << 24) | (0x02 << 16)); // STCCLKDIV[26:24] = STCCLK[18:16] = 0x02
    stcREG1->STCCLKDIV = temp;

    // Step 2:
    //
    // Clear the CPU RST status bit in the System Exception Status Register.
    // 
    // Writing 1 to SYSESR[5] clears the bit.               (spnu562 2.5.1.46)

    systemREG1->SYSESR = (1 << 5);     // Write 1 to SYSESR[5].

    // Step 3:
    //
    // Choose the STC test interval count.  Larger intervals have greater test
    // coverage (and take longer to run).                     (spnu562 10.1.2)
    // The largest interval supported on STC1 Segment 0 is 125, and yields
    // 92.17% coverage.                                         (spnu562 10.5)
    // 

    temp = stcREG1->STCGCR0;
    temp &= ~(0xFFFFU << 16);   // Clear [31:16] (unsigned literal avoids signed-shift overflow)
    temp |= (125U << 16);       // STCGCR0[31:16] = 125

    // TODO also setting RS_CNT to 1 to "Restart STC run from interval 0".
    // Shouldn't make a difference because STCCICR is already 0 on boot.
    temp |= 0x1;

    stcREG1->STCGCR0 = temp;

    // Step 4:
    //
    // Set the self-test timeout, measured in terms of VBUS clock cycles.
    //                                                        (spnu562 10.8.3)

    stcREG1->STCTPR = 0xFFFFFFFF;

    // TODO HL_sys_selftest.c puts a "wait for 16 VBUS clock cycles" here.  Why?
    asm("NOP"); // 50 NOPs.
    asm("NOP");
    asm("NOP");
    asm("NOP");
    asm("NOP");
    asm("NOP");
    asm("NOP"); 
    asm("NOP"); 
    asm("NOP"); 
    asm("NOP");
    asm("NOP");
    asm("NOP");
    asm("NOP");
    asm("NOP");
    asm("NOP");
    asm("NOP");
    asm("NOP");
    asm("NOP");
    asm("NOP");
    asm("NOP");
    asm("NOP");
    asm("NOP"); 
    asm("NOP"); 
    asm("NOP"); 
    asm("NOP");
    asm("NOP");
    asm("NOP");
    asm("NOP");
    asm("NOP");
    asm("NOP");
    asm("NOP");
    asm("NOP");
    asm("NOP");
    asm("NOP");
    asm("NOP");
    asm("NOP");
    asm("NOP");
    asm("NOP");
    asm("NOP");
    asm("NOP");
    asm("NOP");
    asm("NOP");
    asm("NOP");
    asm("NOP");
    asm("NOP");
    asm("NOP");
    asm("NOP");
    asm("NOP");
    asm("NOP");
    asm("NOP");

    // Step 5:
    //
    // Choose which core to test (STCGCR1[11:8]).
    // Other than 0x5 or 0xa == "Both cores in parallel".     (spnu562 10.8.2)
    // TODO this register may ignore writes:  test its value via reading it back
    // 

    // AND:

    // Step 6:
    //
    // Enable self-test (STCGCR1[3:0]).                     (spnu562 10.8.2)
    // 0xA = "Self-test run enabled"
    // This does not kick off the self-test (yet!).

    stcREG1->STCGCR1 = 0xA;

    // Step 7:
    //
    // Context save - 10.4.2
    // TODO


    // Step 8:
    //
    // Kick off the LBIST CPU self-test by putting the CPU into idle mode.
    // Does not return - instead resets the CPU.

    // Make sure the memory writes actually posted.  TODO might not be
    // necessary.
    asm("DSB");

    while(true)
    {
        asm("WFI");
        // TODO _gotoCPUIdle_ in HL_core_sys.asm puts 4 NOPs here.
        // WHY???
        asm("NOP");
        asm("NOP");
        asm("NOP");
        asm("NOP");
    }

    // That while(true) better not return. :-)
    while(1);
}
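
The divider arithmetic in Step 1 can be checked off-target. The following is a minimal sketch, assuming (as the comments above state for the divide-by-3 case) that a field value of n divides GCLK by n + 1 and that each field is 3 bits wide. The helper names stc_clkdiv_field and stc_clkdiv_apply are hypothetical and are not part of HALCoGen.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest 3-bit divider field n such that gclk_hz / (n + 1) <= max_stc_hz.
 * Assumption: a field value of n means divide-by-(n + 1).
 * Returns -1 if even divide-by-8 is still too fast. */
int stc_clkdiv_field(uint32_t gclk_hz, uint32_t max_stc_hz)
{
    assert(max_stc_hz > 0u);
    for (int n = 0; n <= 7; n++) {
        if ((uint64_t)gclk_hz <= (uint64_t)max_stc_hz * (uint64_t)(n + 1)) {
            return n;
        }
    }
    return -1;
}

/* Return reg with bits [26:24] and [18:16] replaced by field,
 * leaving every other bit unchanged (same masking as Step 1). */
uint32_t stc_clkdiv_apply(uint32_t reg, uint32_t field)
{
    const uint32_t mask = (UINT32_C(0x7) << 24) | (UINT32_C(0x7) << 16);
    assert(field <= 7u);
    return (reg & ~mask) | (field << 24) | (field << 16);
}
```

With a 300 MHz GCLK and the 110 MHz limit quoted in the comments, stc_clkdiv_field returns 2, which matches the value written in Step 1.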

Please note:

  • I am running this function immediately after setting up the C++ runtime and initializing my oscillators, PLL, flash, and so on via HL_system.c systemInit() version 4.04.00.
  • You will notice the infinite while(1) loops in my code.  I added those to prevent the dreaded STC boot loop problem.  When execution gets stuck on those lines, I enter a debugger and manually advance past the loop.

So here are my questions:

  • Do you see any problems with my code that would prevent the LBIST from running?
  • Can you present some "known good" code for executing the LBIST on the RM57?
  • Why is the algorithm used by HL_sys_selftest.c different from the algorithm recommended in the chip's Family User's Guide, and which should I use?
    • Why does HALCOGEN "wait for 16 VBUS Clock Cycles" between programming most STC1 registers and writing STCGCR1 to enable the self-test?
    • Why does HALCOGEN put NOPs after the WFI?  I could not find any reason for this in the ARM Architecture Reference Manual.
    • Do I have to put the WFI in a loop?  (HALCoGen doesn't, but certain threads recommend that I should...)

I have got to say that all of this conflicting and sketchy information about the RM57L's safety features is not making me confident that it's the right processor for my application...

  • Peter,

    I am attaching a LBIST working example for your reference. You will need to change the CPU type to RM57 and re-compile it. No other change is needed.

    Thanks and regards,

    Zhaohong

     


    Zhaohong - I think the issue is a little more complex than the example you posted so I removed it. 
    Plus this test case code is redundant w. the safety library code that does the same. 


    The issue that multiple people seem to be having is that the WFI is only a 'hint' instruction.  So the CPU is not guaranteed to execute it and enter standby.    In a simple test case you won't see this but when you try to integrate the code into a real project with interrupts then there will be a problem when only a single WFI is used.

    Interrupts need to be disabled in the VIM itself, because the CPSR.I and CPSR.F bits do not mask off the wakeup request from the interrupt controller; i.e., using the CPSR to disable interrupts isn't enough to make sure you enter standby when WFI is executed.

  • Peter,

    Sorry for the confusion.  I think we have underestimated how tricky LBIST entry can be in a real system (versus simple test code).

    ARM's processor manual is clear about the operation of the WFI instruction and about Standby Entry.
    See ARM DDI 0460D, section 10.2.2, Standby mode.  Honestly it 'feels' a little bit kludgy to me that the LBIST entry is tied to Standby (a low power mode), but it may make a lot of sense, as the CPU delays entering standby as long as possible. That is probably good for interrupt latency, but it complicates things.  Also, LBIST is different because the code isn't normally expected to exit by running through the WFI; it will start again after LBIST from the reset vector.  But we have to handle the case where WFI entry doesn't occur.

    Our golden reference for self test entry should really be Hercules SafeTI™ Diagnostic Library  whether or not HalCoGen includes similar functionality.   There is some overlap as not everyone using HalCoGen will use the SafeTI Diagnostic Library.   

    Based on the feedback you noted - and analyzing the way the WFI instruction is used in the diagnostic library - it should be the case that many times LBIST will not actually run and instead the functions will return  (rather than 'never returning' as the comments indicate).    

    I think this is actually a pretty tricky problem and it's going to take some time to get to a consensus on the best way to implement STC entry in a 'function' that is robust enough to be in the SafeTI Library and subject to a wide range of use cases.    

    Next step I think is on our end - to mix the SafeTI Library code with an example that has interrupts in the background, and work through the required prerequisites before calling the SafeTI Library code.    BTW we have some complications in that the device includes some NMI interrupts - it may be that these always trump STC entry but that needs to be analyzed.

  • Peter,

    I didn't directly answer your questions...  Here goes:

    Peter Fidelman said:
    Do you see any problems with my code that would prevent the LBIST from running?

    I would need to know the state of the VIM in the context of this code running.  Clearly if there is an interrupt request coming from VIM then your code would have trouble entering the self-test.

    Peter Fidelman said:
    Can you present some "known good" code for executing the LBIST on the RM57?

    This would be the code from the SafeTI Diagnostic Library - but in regard to LBIST / STC entry we need to beef up this code to be more robust to integration in a wider range of systems.   It currently assumes  the WFI will succeed and if it does not, the STC function just returns. (even though comments say it won't return / won't execute past the WFI).

    Peter Fidelman said:
    Why is the algorithm used by HL_sys_selftest.c different from the algorithm recommended in the chip's Family User's Guide, and which should I use?

       Need to research this one, but I think it's implementing a subset of the code assuming the WFI succeeds.   Might be ok for a standalone test case but breaks down when integrated with a larger application.

    Peter Fidelman said:
    Why does HALCOGEN "wait for 16 VBUS Clock Cycles" between programming most STC1 registers and writing STCGCR1 to enable the self-test?

      This one I also need to research.  I don't get the sense that it's related to WFI at all though - so probably an idiosyncrasy of the STC logic.

    Peter Fidelman said:
    Why does HALCOGEN put NOPs after the WFI?  I could not find any reason for this in the ARM Architecture Reference Manual.

      My sense is that this is the way that someone writing an assembly language test case for device verification would account for potential slippage between the WFI instruction execution and the actual standby mode entry.   When things like these NOPs get introduced, nobody wants to 'remove' them without a good understanding of the implications.  Probably not needed if the DSB is used but I don't see them hurting anything.   I'd suggest deferring this issue till later as the cost is low.

    Peter Fidelman said:
    Do I have to put the WFI in a loop?  (HALCoGen doesn't, but certain threads recommend that I should...)

      I think so but this needs additional analysis too.  

    I could probably make a counter-argument, which is that the self-test function in the SafeTI library may be better if it simply exits and reports that the STC did not occur when something like an interrupt blocks WFI from causing standby.

    It may be that it's better for the application to re-enable interrupts, handle any that are pending, and then call the STC function again until it eventually succeeds.  

    So 'yes' there needs to be a loop but I think there needs to be some debate as to at what level this loop is implemented.. and whether the loop is inside the safety library or it needs to be in the code calling the safety library function.

    If you have inputs on this point we'd like to hear them.
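
The caller-level option described above (service pending interrupts, then call the STC function again) could look roughly like the following. This is only a sketch under stated assumptions: lbist_retry, the two callback types, and the fake_* host stand-ins are hypothetical names and do not come from the SafeTI Diagnostic Library. On the target a successful attempt resets the CPU and never returns; the "true" return exists only so the loop can be exercised on a host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef bool (*lbist_attempt_fn)(void);  /* false: WFI fell through, no self-test ran */
typedef void (*service_irqs_fn)(void);   /* re-enable and handle pending interrupts */

/* Try to enter LBIST up to max_attempts times, servicing interrupts between
 * attempts. Returns the attempt number that succeeded (host simulation only),
 * or -1 if every attempt fell through. */
int lbist_retry(lbist_attempt_fn attempt, service_irqs_fn service, int max_attempts)
{
    assert(attempt != NULL && service != NULL);
    for (int n = 1; n <= max_attempts; n++) {
        if (attempt()) {
            return n;
        }
        service();
    }
    return -1;
}

/* Host-side fakes, for illustration only: each pending interrupt blocks
 * one attempt and is removed by one call to the service routine. */
int fake_pending_irqs = 0;
bool fake_attempt(void) { return fake_pending_irqs == 0; }
void fake_service(void) { if (fake_pending_irqs > 0) { fake_pending_irqs--; } }
```

Bounding the number of attempts keeps a stuck interrupt source from turning the retry into a silent hang; what to do on -1 is left to the application.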

  • Hi Anthony,

    Thank you for the detailed and prompt response.

    To address your points:

    INTERFERENCE FROM INTERRUPTS

    Yes, I also noticed that section of the processor manual.

    I run the LBIST relatively early in the startup sequence, before I even initialize the VIM and most other peripherals.  So interference from the VIM ought not to be a problem.  To confirm this, here's a dump of the VIM registers sampled at the time that execution hits asm("WFI") for the first time.  All Interrupt Enable Set registers are 0, for "disabled".  (The 0x03 caught my attention, but it is the RESERVED bits in REQENASET0 and SPNU562 says they always read as "high").

    0xfffffe30:  0x00000003 0x00000000
    0xfffffe38:  0x00000000 0x00000000

    So interference from the VIM does not seem to be my problem.

    That said, I am also potentially interested in running the LBIST after startup, at a time when interrupts are enabled.  So I am still interested in the outcome of that tricky problem you mentioned!

    SAFETI LIBRARY STUFF

    You pointed me at the SafeTI Diagnostic Library as a golden reference.  Thank you for the link.  I did not know this was available.  I will take a look at its implementation and see if I'm missing anything obvious that it includes (or vice versa).

    I still feel a little uneasy that I am essentially "cobbling together" code for a successful LBIST, rather than coding against simple, well-understood and well-documented hardware.  I am also concerned as to why the steps in the Family User's Guide do not work.  Are there any other references on how to get a successful LBIST and troubleshooting the process if it does not work?

    I remain interested in the answers to my other questions, which you are researching.  I'll stay tuned for updates.

    SAFETI FUNCTION BEHAVIOR PROPOSAL

    I agree, it makes sense that the function should return an error value if the LBIST does not execute.  With my limited understanding of the LBIST process, I would naively expect a SafeTI LBIST self-test function to act as follows:

    cpuSelfTest():

    - Set the STC registers appropriately.
    - Read-back the STC registers, if any have rejected the values we wrote, then return a failure (e.g. EACCES).
    - Kick off the LBIST test with asm("WFI")
    - If LBIST entry is successful, the CPU resets - do not return.
    - If LBIST entry is unsuccessful, then check for pending interrupts.
        - If pending interrupts are detected, then return a failure (e.g. EINTR).  The caller must decide whether to handle or clear the interrupt, and whether re-calling cpuSelfTest() is still appropriate.
        - If no pending interrupts are detected, then the LBIST did not start for an unknown reason and retrying may not help.  Return a different failure (e.g. EIO) and let the caller decide how to proceed.

    I deliberately did not include saving and restoring CPU context.  I would expect separate functions to save and restore CPU context, as a building block to include in my reset-LBIST handler.  I don't know if SafeTI already includes something suitable.
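
The control flow proposed above can be sketched as follows. This is a hedged illustration, not SafeTI code: the status names, the lbist_hooks_t structure, and the fake_* stand-ins are all hypothetical. Hardware access is injected through function pointers so the decision logic can be run on a host; on the target, enter_idle would issue the DSB and WFI and would return only if the CPU did not reset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status codes; the proposal suggests EACCES, EINTR and EIO. */
typedef enum {
    LBIST_ERR_READBACK = 1,  /* a register rejected the written value (EACCES) */
    LBIST_ERR_IRQ_PENDING,   /* WFI fell through with an interrupt pending (EINTR) */
    LBIST_ERR_NOT_STARTED    /* WFI fell through for an unknown reason (EIO) */
} lbist_status_t;

typedef struct {
    void (*program_stc)(void);  /* write the STC registers */
    bool (*readback_ok)(void);  /* true if every register holds what was written */
    void (*enter_idle)(void);   /* DSB + WFI on target; returns only if no reset */
    bool (*irq_pending)(void);  /* true if an interrupt request is pending */
} lbist_hooks_t;

lbist_status_t cpu_self_test(const lbist_hooks_t *hw)
{
    assert(hw != NULL);
    hw->program_stc();
    if (!hw->readback_ok()) {
        return LBIST_ERR_READBACK;
    }
    hw->enter_idle();
    /* Reaching this line means the CPU did not reset, so LBIST did not start. */
    return hw->irq_pending() ? LBIST_ERR_IRQ_PENDING : LBIST_ERR_NOT_STARTED;
}

/* Host-side fakes, for illustration only. */
bool fake_readback_result = true;
bool fake_irq_result = false;
void fake_program_stc(void) {}
bool fake_readback_ok(void) { return fake_readback_result; }
void fake_enter_idle(void) {}
bool fake_irq_pending(void) { return fake_irq_result; }
const lbist_hooks_t fake_hw = {
    fake_program_stc, fake_readback_ok, fake_enter_idle, fake_irq_pending
};
```

There is deliberately no success code: if LBIST entry works, the function never returns and the result is read after the CPU reset.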

    INTERFERENCE FROM DEBUGGER

    The ARM DDI0406C Architecture Reference Manual mentions that trying to execute the WFI instruction when the CPU is in Debug State might result in "unpredictable behavior".  (C5.3, p. C5-2098 - Executing Instructions in Debug State).

    As I mentioned, I included some while(1) loops in my example as a protection against boot loops.  I currently use a debugger to break out of these loops.  My debugger is a hardware debugger that may interact with the debug features of the ARM core.  I wouldn't be surprised if it is the culprit that is keeping my CPU out of standby.

    I will modify my repro to run standalone (without the debugger connected) and see if that fixes my problem.

    In the meantime, I'll stay tuned for your updates on the other questions I raised.


    Best,

    Peter

  • Hi Peter,

    First, thank you for sending your thoughts on SAFETI FUNCTION BEHAVIOR PROPOSAL .. We have a ticket open on this issue - SDOCM00120246 for future reference. And a meeting tomorrow to discuss potential solutions.

    What do you read from 0xFFFFFE20-0xFFFFFE2C?

    The first vim channel 0 - that is always enabled - comes from the ESM. It's possible that some error in the ESM is requesting an interrupt on the VIM channel 0 - and even if you don't see this interrupt because you've just come out of reset and the CPSR.F bit is disabled - I believe it would be enough to prevent the WFI from entering standby.

    Alternatively it could be the debug request issue you mention. My understanding from talking to colleagues here though is that this doesn't mean simply that a debugger is connected to the device but that there is a request to enter the debug (halted) state.

    So single stepping through the WFI probably is never going to 'work' (by work, I mean standby is entered). But running past the WFI to a breakpoint set on a subsequent instruction should give the WFI a chance to enter standby.

    BTW - this may actually be where the NOPs come from - an artifact that is useful for debugging purposes because they would give a place to set a HW breakpoint that is sufficiently past the WFI. But 4 NOPs may not be the correct recipe for the RM57L if it is a hold-over from the TCM devices, because the cache can feed the CPU with instructions faster (in fewer cycles on average) than the TCM flash can.

    I actually don't have a setup to test these ideas out but will work on that today.

    Best Regards,
    Anthony
  • Anthony,

    Although I noticed the reserved 0x3 in REQENASET0, I did not notice that this meant VIM ch.0 and ch.1 are permanently enabled.  Yowza.  Good catch.

    Here's the contents of my 0xFFFFFE20 - 0xFFFFFE2C (INTREQ0-3) when the debugger is paused on the instruction immediately before the WFI.

    0xfffffe20: 0x00000001 0x00000000
    0xfffffe28: 0x00000000 0x00000000

    INTREQ0[0] is set meaning yes there is an ESM error.  And because this VIM channel cannot be disabled, the VIM must be presenting an interrupt request to the CPU.  This gives one reason that the WFI instruction may not be putting the CPU into sleep mode and allowing the LBIST to start.
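
The reasoning above (a pending request on an enabled channel keeps WFI from entering standby) can be turned into a small pre-WFI check. This is a sketch under the assumption, taken from this thread rather than from the reference manual, that a request counts only when its INTREQ bit and its REQENASET bit are both set. The function name vim_request_pending is hypothetical; the caller passes in the four INTREQ and four REQENASET words it has read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VIM_BANKS 4u

/* True if any channel is both pending (INTREQx) and enabled (REQENASETx).
 * Assumption: this combination is what blocks standby entry. */
bool vim_request_pending(const uint32_t intreq[VIM_BANKS],
                         const uint32_t reqenaset[VIM_BANKS])
{
    assert(intreq != NULL && reqenaset != NULL);
    for (size_t i = 0; i < VIM_BANKS; i++) {
        if ((intreq[i] & reqenaset[i]) != 0u) {
            return true;
        }
    }
    return false;
}
```

Fed with the two dumps from this thread (INTREQ0 = 0x00000001, REQENASET0 = 0x00000003, all other words zero), the check reports a pending request on channel 0.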

    I am trying to solve the ESM issue now*.  I will respond in this thread once I have resolved that problem and know whether it is the sole cause of my LBIST failure.  

    I'm still interested in the results of your STC research and testing, especially the 16 VBUS cycles wait, the NOPs after WFI, and other details of programming the STC.

    Thank you,

    Peter

    --------

    * P.S.
    In case it matters, my ESM error is coming from the known issue where flashing, and connecting a debugger, both cause a CCM-R5 CPU Compare Error.  Relevant threads below.  If I cannot resolve this problem in a reasonable amount of time, I will open a separate issue.

    e2e.ti.com/.../1491776

    e2e.ti.com/.../1414658

  • Hi Peter,

    I believe the ESM Error bit should be read-clear. So if you read ESMSR2 in code before trying to enter LBIST, then this flag should be cleared and the interrupt request from ESM should be deasserted.

    Thanks for letting me know it's the error due to DEVICE #56 errata nERROR assertion on debugger connect. That may explain the recent spike in LBIST problems because it is going to be very common for anyone using the newer RM57L / TMS570LC devices, while it doesn't exist on the older RM48x / TMS570LS series devices.
  • Anthony,

    Success!

    I modified my code to clear ESMSR1, ESMSR2, and the error LED immediately before dropping into the WFI loop.  The "WFI" instruction now triggers a CPU reset (and presumably a LBIST - I don't know for sure yet because I haven't hooked up my "cause of reset" handler yet).

    Attached please find the modified section of code.

    This confirms that Errata #56 (the ESM Error bit) was the cause of my issue.

    I am still interested in whether I am following the right steps in programming the STC, and in the results of your STC research and testing, and any other documentation about the module that you can provide.  But I will mark your post as the "answer" because you answered my biggest question of "why doesn't this execute?"

    Thank you.

        // Step 8:
        //
        // Kick off the LBIST CPU self-test by putting the CPU into idle mode.
        // Does not return - instead resets the CPU.
    
        // Make sure the memory writes actually posted.  TODO might not be
        // necessary.
        asm("DSB");
    
        // Clear the pending ESM status flags and reset the error LED.
        // Writing 1 to a status bit clears it.
        *( (volatile uint32_t *) 0xfffff518 ) = 0xffffffff;   // ESMSR1
        *( (volatile uint32_t *) 0xfffff51c ) = 0xffffffff;   // ESMSR2
        esmREG->EKR = 0x05;                                   // Error LED
        // Clear the pending VIM interrupt requests (INTREQ0) triggered by ESM
        *( (volatile uint32_t *) 0xfffffe20 ) = 0xffffffff;
    
        while(true)
        {
            asm("WFI");
            // TODO _gotoCPUIdle_ in HL_core_sys.asm puts 4 NOPs here.
            // WHY???
            asm("NOP");
            asm("NOP");
            asm("NOP");
            asm("NOP");
        }

  • Forgot to mention that, of course, I do not plan to blithely reset the ESM interrupts in production, safety-critical code. That just treats the symptoms of the problem and has the side-effect of suppressing real errors. I look forward to the resolution of SDOCM00120246.

    However, it's certainly good enough to unblock my LBIST proof-of-concept code for the moment!
  • Good point Peter,

    It may be better to try to move the clearing of ESMSR2 into the debugger, perhaps as part of the GEL OnTargetConnect() hook function, rather than embedding this functionality into the code itself.  That way you only blindly clear the errata ESM channel when there is actually a debugger connected.

    Need to look into the best way to do this...


    -Anthony

  • Here is a modified GEL file:

    1727.rm57l8xx.zip

    I just added a few lines at the end of the file:

    menuitem "ERRATA_DEVICE_#56 ";

            hotmenu CLEAR_ESMSR2(){
               ClearESMSr2();
            }

    And a function after OnTargetConnect():

    ClearESMSr2() {
        *(int *) 0xfffff51C = 0xffffffff;
    }

    This writes all '1's to ESMSR2 which will clear any ESM2 error flags set.  You may want to go further and only write a 0x04 so that only the DEVICE #56 error is cleared.
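
The difference between writing all ones and writing only 0x04 can be modelled in a few lines. This is a sketch, assuming the write-1-to-clear behaviour described above. The macro name ESMSR2_ERRATA56_FLAG and both helper functions are hypothetical, and the 0x24 example value used in the note after the block is invented to show the effect on a second flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag value reported in this thread for the DEVICE #56 errata. */
#define ESMSR2_ERRATA56_FLAG UINT32_C(0x00000004)

/* Host model of a write-1-to-clear status register: returns the flags
 * that remain set after `written` is written to the register. */
uint32_t w1c_result(uint32_t flags, uint32_t written)
{
    return flags & ~written;
}

/* Target-side write. The register address is passed in by the caller
 * (for example the ESMSR2 address used in the GEL function above). */
void esm_clear_flags(volatile uint32_t *status_reg, uint32_t mask)
{
    assert(status_reg != NULL);
    *status_reg = mask;
}
```

With a hypothetical status of 0x24, writing 0xffffffff leaves nothing set, while writing only ESMSR2_ERRATA56_FLAG leaves 0x20 set, so an unrelated error would still be visible.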

    I added this as a separate menu item so you can test and see that it works.   If you remove the code you added to clear ESMSR2,  rebuild/reflash, disconnect CCS, and apply a power on reset,  then the next time you connect you should see that ESMSR2 reads 0x00000004 again due to the errata.

    Then you can test the GEL function to confirm it clears this bit.  To test, select the function from the menu.

    If you have the register view open showing ESM1, you'll see ESMSR2 and the interrupt vector registers get cleared.

    When you are comfortable with that you can move the call to "ClearESMSr2()" into the OnTargetConnect() GEL function right above it - and you should never see ESMSR2 = 0x00000004 again after connecting.

    If you are not familiar with GEL files, there is online help.

    The GEL file that is automatically loaded for a target processor is specified as part of the target configuration file, you can see it on the advanced tab once you select the CPU core.

    But the absolute path isn't shown, so an easier way to find this file is to launch the target and use the context menu to open the GEL Files tab.

    Hovering over the entry will then show you the absolute file path, and double-clicking will open the file for editing.


    You can see that the OnTargetConnect() function already exists in the default GEL, so simply adding the line:

        *(int *) 0xfffff51C = 0xffffffff;  

    Or if you prefer

        *(int *) 0xfffff51C = 0x00000004;

    To this function should do the trick and you'll be able to remove the ESMSR2 clear from your actual target code.

  • Anthony,

    I am not using the TI development toolchain on my board so cannot test the GEL file. However, I can probably script my IDE and debugger to do something similar. Thank you for the information.

    Peter