
RM46L430: Safety Manual states that RAM7A & RAM7B are covered by SL_SelfTest_PBIST, but SL_SelfTest_PBIST cannot be run on the TCRAM as running the test corrupts the 'C' runtime system

Part Number: RM46L430

Hello All,

The Safety Manual from the SafeTI Diagnostics Library v2.3.1 says that measures RAM7A & RAM7B are covered by SL_SelfTest_PBIST.

SL_SelfTest_PBIST crashes if you call it on the TCRAM.

There is a note in the Code:

/* Note: If executing on TCM RAM, Stack Contents are corrupted, so be careful with return data */

There is a note in help:

Note: PBIST Algorithm should not be used on SRAM when code/data resides in SRAM. The application needs to appropriately branch to a non-volatile location without the use of the data variables during this test.

The implementation is a ‘C’ function, and all I can do is call it.

When called, it stores the return location on the stack.

While the test is running, the polling loop in the code uses the stack.

While running, the test destroys the whole RAM, including the stack.

When finished it returns to the location on the stack.

It crashes when it tries to return, but it’s a miracle that it even gets to the end.

It is not possible that the RAM7A & RAM7B measures can be covered by this test!

Regards,

Mark.

  • Hi Mark,

    The PBIST is usually run at device start-up because it is a destructive test: all contents of the tested SRAM module are overwritten during the test. The SRAM is divided into several groups, so you can place your data and instructions in one SRAM group and test the other SRAM groups. Another way, you can copy the data to the RAM of unused peripheral (for example CAN1 RAM) and restore the important data back to SRAM after PBIST.
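The back-up-and-restore idea can be sketched on a host as follows. This is only a model under stated assumptions: `sram_under_test`, `spare_peripheral_ram`, `destructive_test_model` and `run_test_with_backup` are hypothetical names, plain arrays stand in for the SRAM group and the unused peripheral RAM, and the PBIST hardware is replaced by a function that overwrites every word.

```c
#include <stdint.h>
#include <string.h>

#define CRITICAL_WORDS 8u

/* Hypothetical stand-ins: on a real device these would be a region of the
 * SRAM group under test and the RAM of an unused peripheral. */
static uint32_t sram_under_test[CRITICAL_WORDS];
static uint32_t spare_peripheral_ram[CRITICAL_WORDS];

/* Models the destructive memory test: every word is overwritten. */
static void destructive_test_model(void)
{
    memset(sram_under_test, 0xA5, sizeof sram_under_test);
}

/* Back up the critical data, run the (modelled) test, then restore. */
void run_test_with_backup(void)
{
    memcpy(spare_peripheral_ram, sram_under_test, sizeof sram_under_test);
    destructive_test_model();
    memcpy(sram_under_test, spare_peripheral_ram, sizeof sram_under_test);
}
```

Note that this only helps for data the running code does not touch while the test is active. It does not address a stack that lives in the tested RAM and is used by the very function calls that start and poll the test.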
  • Mark,

    To add to QJ's comments, RAM7A and RAM7B indicate boot-time and periodic execution of PBIST. Running PBIST on SRAM, as you note, is destructive and should be done only at boot time, since it is unlikely that you will find another RAM in the device that you could use to back up your data/runtime information while PBIST executes periodically. For some devices, the SRAM is broken into different sections so that PBIST could theoretically be run on one while another is used for backup, but even this is difficult to manage due to dependencies of the SW on RAM accesses that are beyond user control. To combat this, I would recommend implementing the code as inline code during boot time so that the stack isn't needed/used.

    In addition, if the application is such that it has relatively short ON times (automotive applications, for example) ECC is sufficient to protect against single point faults and latent faults are unlikely during runtime. If, however, your application is an industrial or other application such that ON times are very long (Days, weeks, even years) without a power cycle, latent faults become an issue since the possibility of a fault on demand would increase. In these cases, there may be a need to include the concept of a maintenance cycle where the unit is power cycled or soft reset at a regular, specified interval in time (once/wk, once/year... depends on the application needs). This interval could also, arguably, be considered a periodic test. This same concept also applies towards LBIST execution.
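The maintenance-cycle idea could look roughly like the check below. It is a minimal sketch, not anything taken from the library: `maintenance_reset_due` is a hypothetical helper, the uptime is assumed to be available as a free-running 32-bit seconds counter, and what happens when the check fires (power cycle, soft reset) is left to the application.

```c
#include <stdint.h>
#include <stdbool.h>

/* Returns true once at least interval_s seconds have elapsed since the
 * last restart. Unsigned subtraction keeps the comparison correct even
 * if the 32-bit seconds counter wraps around. */
bool maintenance_reset_due(uint32_t now_s, uint32_t last_restart_s,
                           uint32_t interval_s)
{
    return (uint32_t)(now_s - last_restart_s) >= interval_s;
}
```

With a one-week interval, for example, `interval_s` would be 604800.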
  • Hi,

    There are multiple steps to testing the TCRAM via the SafeTI Diagnostics Library, and they are all written in 'C'.

    You call a 'C' function to start the test (SL_SelfTest_PBIST), which uses the stack and starts the hardware mechanism. An overview of the SL_SelfTest_PBIST functionality is:

    1. Save the return value (Link Register) on the stack.
    2. Initialise the PBIST mechanism.
    3. Start the PBIST.
    4. Use the Link Register from the stack to return to the caller.
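The hazard in those four steps can be illustrated with a small host-side model. Everything here is an assumption made for illustration: `ModelCpu`, `call_start_test` and the test pattern are hypothetical, the stack is a plain array, and the destructive test is modelled as overwriting the RAM immediately, whereas the real hardware test runs alongside the CPU.

```c
#include <stdint.h>
#include <stdbool.h>

#define STACK_WORDS  16u
#define TEST_PATTERN 0xFFFFFFFFu /* assumption: arbitrary pattern written by the test */

typedef struct {
    uint32_t ram[STACK_WORDS]; /* models the RAM region that holds the stack */
    uint32_t sp;               /* index of the next free slot */
} ModelCpu;

static void push(ModelCpu *cpu, uint32_t value) { cpu->ram[cpu->sp++] = value; }
static uint32_t pop(ModelCpu *cpu) { return cpu->ram[--cpu->sp]; }

/* Models the destructive test overwriting the whole RAM. */
static void destructive_ram_test(ModelCpu *cpu)
{
    for (uint32_t i = 0u; i < STACK_WORDS; ++i) {
        cpu->ram[i] = TEST_PATTERN;
    }
}

/* Follows the four steps and returns the address the model "returns" to. */
uint32_t call_start_test(ModelCpu *cpu, uint32_t link_register,
                         bool test_targets_stack_ram)
{
    push(cpu, link_register);          /* 1. save the return address on the stack */
    if (test_targets_stack_ram) {      /* 2./3. initialise and start the test */
        destructive_ram_test(cpu);
    }
    return pop(cpu);                   /* 4. return via the value on the stack */
}
```

When the tested RAM is not the one holding the stack, the model returns to the saved address; when it is, the popped value is whatever the test left behind.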

    The start of SL_SelfTest_PBIST; notice the STMDB saving R14 (Link Register) to the stack addressed by R13 (Stack Pointer):

    /*SAFETYMCUSW 61 D MR: 8.10,8.11 <APPROVED> Comment_15*/
    /*SAFETYMCUSW 7 C MR: 14.7 <APPROVED> Comment_3*/
    /*SAFETYMCUSW 62 D MR: 16.7 <APPROVED> Comment_17*/
    #if defined(_TMS570LC43x_) || defined(_RM57Lx_)
    boolean SL_SelfTest_PBIST (register SL_SelfTestType testType, register uint64 ramGroup, register uint32 algoInfo)
    #else
    boolean SL_SelfTest_PBIST (register SL_SelfTestType testType, register uint32 ramGroup, register uint32 algoInfo)
    #endif
    {
      0x0000ADA8:   E92D41FC     STMDB     R13!,{R2-R8,R14}
    > 0x0000ADB8:   E1A04000     MOV       R4,R0
    > 0x0000ADBC:   E1A07001     MOV       R7,R1
    > 0x0000ADC0:   E1A06002     MOV       R6,R2
    

    The end of SL_SelfTest_PBIST, notice the LDMIA, restoring the Link Register directly to R15 (Program Counter) from R13 (Stack pointer) to return to the caller:

                /* Start PBIST */
                /* Note: If executing on TCM RAM, Stack Contents are corrupted, so be careful with return data */
                /*SAFETYMCUSW 440 S MR: 11.3 <APPROVED> Comment_18*/
                sl_pbistREG->DLR = (PBIST_DLR_DLR4 | PBIST_DLR_DLR2);
      0x0000AED4:   E3A0C014     MOV       R12,#0x14
      0x0000AED8:   E585C000     STR       R12,[R5,#0x0]
      0x0000AEDC:   E8BD81FC     LDMIA     R13!,{R2-R8,R15}
      ...
                retVal = TRUE;
             break;
            default:
            	/* nothing here - comment to avoid misra-c warning */
                break;
        }
    #if(FUNC_RESULT_LOG_ENABLED == 1)
        SL_Log_Result(FUNC_ID_ST_PBIST, testType, (SL_SelfTest_Result)retVal , 0u);
    #endif
        return(retVal);
    }

    This clearly shows that the stack is being used after the hardware test has been started. Now, I'm sure that you will argue that there is only a small chance that the test will have corrupted the stack before it can return, which is most likely true (although very poor design), but there is another 'C' function that needs to be polled to get the result, SL_SelfTest_Status_PBIST, which also uses the stack.

    Again, the start of SL_SelfTest_Status_PBIST; notice the STMDB saving R14 (Link Register) to the stack addressed by R13 (Stack Pointer):

    /*SAFETYMCUSW 61 D MR: 8.10,8.11 <APPROVED> Comment_15*/
    /*SAFETYMCUSW 7 C MR: 14.7 <APPROVED> Comment_3*/
    boolean SL_SelfTest_Status_PBIST(SL_PBIST_FailInfo* param1)
    {
      0x0000A594:   E92D4038     STMDB     R13!,{R3-R5,R14}
      0x0000A598:   E1A04000     MOV       R4,R0
    	boolean retVal = FALSE;
    	boolean tmp;
    #ifdef FUNCTION_PARAM_CHECK_ENABLED
        /*LDRA_INSPECTWINDOW 50 */
        /*SAFETYMCUSW 439 S MR:11.3 <APPROVED> Comment_4*/
        /*SAFETYMCUSW 439 S MR:11.3 <APPROVED> Comment_4*/

    The end of SL_SelfTest_Status_PBIST; notice the LDMIA restoring the Link Register directly to R15 (Program Counter) to return to the caller:

    #if(FUNC_RESULT_LOG_ENABLED == 1)
            /*SAFETYMCUSW 440 S MR: 11.3 <APPROVED> Comment_18*/
            SL_Log_Result(FUNC_ID_ST_PBIST_STATUS, (SL_SelfTestType)0, param1->stResult, 0u);
    #endif
            }
            retVal = TRUE;
        return retVal;
      0x0000A660:   E3A00001     MOV       R0,#0x1
      0x0000A664:   E8BD8038     LDMIA     R13!,{R3-R5,R15}
      ...
    }

    So, once again, to clarify my concerns about this mechanism:

    Whilst the test is running, the test status is being polled by a 'C' function that is using the stack which is corrupted by the running hardware test.

    I do not understand your comment 'Another way, you can copy the data to the RAM of unused peripheral'. This is the stack that is being used by the 'C' runtime system. How can it be saved to a peripheral memory when the stack is used for each function call?

    The statement in the Safety Manual that RAM7A & RAM7B are covered by calling SL_SelfTest_PBIST is not true.

    The Demo application from the SafeTI Diagnostics Library does not even call SL_SelfTest_PBIST with any of the ESRAM enumeration literals, which I guess is because it just crashes!

    If I misunderstand the whole 'C' runtime mechanism and the implementation of SL_SelfTest_PBIST & SL_SelfTest_Status_PBIST, then please provide me with an example of how I can run the test on the TCRAM using the mechanism as described in the Safety Manual.

  • Hi Mark,

    The stack issue is a known issue. There is even a note in the code indicating that care has to be taken regarding the return value from the function. Even with this comment, though, it really isn't desirable to have this as a function call, since it will still use the stack and, as you pointed out above, the call to the logging function adds a further issue on top of the problematic implementation. A CQ ticket has been raised for this so it can be addressed in the next release.

    My previous comments were that the functions that are called could potentially be declared as inline, which would eliminate the stack use, and that there is some potential for other RAMs on the device to be used to back up data. I also noted that this would be difficult to impossible for any runtime-related information. For the case of calling this function at boot time, you could certainly back up the stack to one of the RAM locations before initiating the PBIST and then restore it before returning (after ensuring that the PBIST execution is no longer ongoing).

    Note that your second example is not an issue with the stack, since this function, SL_SelfTest_Status_PBIST, only accesses the registers and returns the status. PBIST would/should not be active during the call to this function; otherwise it wouldn't return the correct status update.

    In regard to the Safety Manual, I understand now that you are referring to the SafeTI Diag Library Safety Manual and not necessarily the Device Safety Manual. I believe the intent was that the function satisfy the boot or periodic execution of PBIST. For sure, boot-time execution is possible with some care. Periodic execution would prove to be too difficult unless you plan to initiate a warm reset or some other 'restart' where the boot time would become periodic. For the cases where there is a long-term need for RAM use, we rely on the ECC and its secondary diagnostics, as well as the RAM Bit Multiplexing implementation, to protect memory content; i.e., virtually all errors in RAM will be single-bit errors and ECC will correct these on the fly with no impact to the application.
  • Hi Chuck,

    This implementation is supposed to be using the diagnostics features of the RM46 to ensure safe operation of the system, but the implementation that is using the diagnostics features is not safe. Your idea of 'care' means that I have to ensure that there is no stack used whilst this test is running.

    The demo application example is bad; it's just a while loop:

    while (TRUE != SL_SelfTest_WaitCompletion_PBIST());

    No timeout; it just runs forever if there are hardware problems. There is no deterministic runtime for any kind of scheduling and runtime calculations.
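A bounded alternative to that loop might look like the sketch below. It is an illustration only: `wait_with_timeout`, `WaitResult` and the two test doubles are hypothetical, the completion check is passed in as a function pointer rather than calling the library directly, and the poll limit is a simple iteration count rather than a calibrated time. Being a 'C' function, it still uses the stack, so it is only a candidate for RAM groups that do not hold the stack.

```c
#include <stdint.h>
#include <stdbool.h>

typedef enum { WAIT_DONE, WAIT_TIMEOUT } WaitResult;

/* Completion check supplied by the caller; returns true when finished. */
typedef bool (*CompletionFn)(void);

/* Polls at most max_polls times, so the worst-case runtime is bounded. */
WaitResult wait_with_timeout(CompletionFn is_complete, uint32_t max_polls)
{
    for (uint32_t i = 0u; i < max_polls; ++i) {
        if (is_complete()) {
            return WAIT_DONE;
        }
    }
    return WAIT_TIMEOUT; /* caller decides how to reach a safe state */
}

/* Test doubles standing in for the hardware status check. */
static uint32_t polls_until_done;

static bool completes_after_n_polls(void)
{
    if (polls_until_done == 0u) {
        return true;
    }
    polls_until_done--;
    return false;
}

static bool never_completes(void)
{
    return false;
}
```

The caller gets an explicit timeout result instead of hanging, which also gives a worst-case poll count to use in scheduling calculations.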

    Here is an example based on the Demo application usage, still uses the stack:

            retVal = SL_SelfTest_PBIST(PBIST_EXECUTE, PBIST_RAMGROUP_10_VIM, config_p->algo);
      0x00008904:   9804         LDR       R0,[SP,#0x10]
      0x00008906:   6802         LDR       R2,[R0,#0x0]
      0x00008908:   7100F44F     MOV.W     R1,#0x200
      0x0000890C:   20BD         MOV       R0,#0xBD
      0x0000890E:   EA60F002     BLX       #0xADD2
      0x00008912:   0018F88D     STRB.W    R0,[SP,#+0x18]
    
            if(retVal == SC_STD_SL_TRUE)
      0x00008916:   0018F89D     LDRB.W    R0,[SP,#+0x18]
      0x0000891A:   2801         CMP       R0,#0x1
      0x0000891C:   D103         BNE       #0x8926
            {
                while (TRUE != SL_SelfTest_WaitCompletion_PBIST());
      0x0000891E:   E8DCF001     BLX       #0x9ADA
      0x00008922:   2801         CMP       R0,#0x1
      0x00008924:   D1FB         BNE       #0x891E
            }

    With a timeout before calling the 'SL_SelfTest_WaitCompletion_PBIST' function, there is even more stack usage:

            retVal = SL_SelfTest_PBIST(PBIST_EXECUTE, PBIST_RAMGROUP_10_VIM, config_p->algo);
      0x00008904:   9804         LDR       R0,[SP,#0x10]
      0x00008906:   6802         LDR       R2,[R0,#0x0]
      0x00008908:   7100F44F     MOV.W     R1,#0x200
      0x0000890C:   20BD         MOV       R0,#0xBD
      0x0000890E:   EA4CF002     BLX       #0xADAA
      0x00008912:   0018F88D     STRB.W    R0,[SP,#+0x18]
    
            /* Ensure that the test was started */
    
            if(retVal == SC_STD_SL_TRUE)
      0x00008916:   0018F89D     LDRB.W    R0,[SP,#+0x18]
      0x0000891A:   2801         CMP       R0,#0x1
      0x0000891C:   D13C         BNE       #0x8998
            {
                /* Wait until the test should be finished */
    
                sc_rm46_DelayTicks(config_p->timeoutTicks);
      0x0000891E:   9804         LDR       R0,[SP,#0x10]
      0x00008920:   6840         LDR       R0,[R0,#0x4]
      0x00008922:   F9C8F7FE     BL        #0x6CB6
    
                retVal = SL_SelfTest_WaitCompletion_PBIST();
      0x00008926:   E8C4F001     BLX       #0x9AB2
      0x0000892A:   0018F88D     STRB.W    R0,[SP,#+0x18]
    
                /* Ensure that the test has finished */
    
                if(retVal == SC_STD_SL_TRUE)
      0x0000892E:   0018F89D     LDRB.W    R0,[SP,#+0x18]
      0x00008932:   2801         CMP       R0,#0x1
      0x00008934:   D12A         BNE       #0x898C
                {

    What happens if there is a fault in the controller? The demo application code makes no attempt at defensive programming, something that is considered a good idea in functional safety circles. But I already know your answer: 'it's only a demo, so it doesn't matter how bad it is'. If only I had the luxury of writing 'bad' code, I could have finished months ago.

    Your 'note in the code' comment made me smile; this is supposed to be a library. Never before have I used a library that required me to read the entire source code before calling any of the functions.

    To inline the code would require me to modify and rebuild the SafeTI Diagnostics Library, which would then invalidate what little certification is actually provided with the library.

    Also, based on the following extract from the compiler user manual, I'm not even sure how useful your 'inline' suggestion even is:

    'The inline keyword is a suggestion from the programmer to the compiler. Even if your optimization level is high, inlining is still optional for the compiler.'

    I do not agree with your statement 'with care'; I think that the phrase 'with luck' is more appropriate.

    I give up. You will clearly not admit how terrible this is, there will be no adequate solution from your side and I have wasted too much of my time on this forum post.

    I have written the TCRAM PBIST test and Hardware Initialise in assembler. It is safe, with timeouts and error checking. It runs the hardware test and initialises the RAM afterwards to prepare the ECC. It puts the system in a safe state if the RAM test fails. It does not corrupt the stack, as it is called after the core register initialise and before the RAM is ever used.

    More code that I have to design, write, test and certify myself!

  • Hi Mark,

    Clearly you are frustrated, and I understand this, having also spent time looking through the code. Yes, the code is a problem and needs to be repaired. This is why I stated there is an open ticket on it to be repaired in the next release.

    Without any further explanation or excuse for the software, I believe your choice to implement the boot time PBIST as you have done is the optimum way to implement it and really fits best with the vision we had when we identified this as a safety mechanism.

    My apologies for the confusion and for any grief the poorly thought through PBIST implementation in the safety diagnostic library may have caused you.