TMDS570LS31HDK: CPU SelfTest Coverage, SafeTI SRAM dabort integration, Flash syndrome calculation issues.

Part Number: TMDS570LS31HDK
Other Parts Discussed in Thread: HALCOGEN

Hi there,

I had a few technical questions regarding the implementation of the SafeTI Library and other safety features with the LS31HDK. 

  1. What is the test coverage of the CPU self-test / STC test? I can find how to implement it but I'm not actually sure what tests it does on the CPU.
  2. How are the tests meant to be integrated with HALCoGen? In the SL_SelfTest_SRAM SRAM_ECC_ERROR_FORCING_2BIT test, I'm finding that the test is failing because the ESM status register (ESMSR3) is being cleared inside of the _dabort routine generated by default in HALCoGen. I cannot remove the statements that clear SR3 in the _dabort without causing the tests in sys_startup.c to fail. I know error handling is supposed to be outside the scope of the SafeTI library but the example exception handler provided there doesn't do any actual handling, it just tries to mask the abort.
  3. In the SL_SelfTest_FEE FEE_ECC_SYN_REPORT_MODE test, I'm finding that the test fails because the syndrome value calculated by the library (NOT the one calculated by the MCU itself) is incorrect. Following the reference manual, the syndrome value the MCU calculates is 0x45, which is expected for a multibit failure. However, the SafeTI calculated value (which is then used for a comparison between the two) is always calculated as 0. Is this a bug?
  • Hi,

    What is the test coverage of the CPU self-test / STC test?

    The CPU self-test and STC self-test coverage are listed in the device datasheet and TRM. If all 25 intervals are used, the coverage is 90.21%. The CPU self-test uses the LBIST controller as the test engine to test the ARM CPU core logic.

    SL_SelfTest_SRAM SRAM_ECC_ERROR_FORCING_2BIT test

    This is an option of the SDL's SRAM self-test API. If the SDL is used, the HALCoGen-generated self-test function should not be used. The self-test with fault injection generates an error intentionally, and that error should be cleared manually in the error handler.

    SL_SelfTest_FEE FEE_ECC_SYN_REPORT_MODE test,

    The calculated syndrome should be correct. I will check the test case.

  • Hi QJ,

    What is the test coverage of the CPU self-test / STC test?

    The CPU self-test and STC self-test coverage are listed in the device datasheet and TRM. If all 25 intervals are used, the coverage is 90.21%. The CPU self-test uses the LBIST controller as the test engine to test the ARM CPU core logic.

    I should've been clearer - I did review the datasheet and I saw the table covering this kind of coverage, though I am not sure what this actually covers? What tests are being run by the LBIST on the CPU? What parts of the ARM-CPU logics are actually being tested? How does increasing the interval lead to an increase in coverage?

    SL_SelfTest_SRAM SRAM_ECC_ERROR_FORCING_2BIT test

    This is an option of the SDL's SRAM self-test API. If the SDL is used, the HALCoGen-generated self-test function should not be used. The self-test with fault injection generates an error intentionally, and that error should be cleared manually in the error handler.

    In that case, what is the intended use of the SDL? The example project of the SDL integrates within HALCoGen anyways, and to my knowledge the `checkRAMECC()` test built into HALCoGen cannot be disabled via code generation (using `#if 0` is not acceptable for our certification justification). Even so, as stated in my original question modifying the error handler to clear the error interferes with other startup tests.

    How are end users expected to determine appropriate peripheral selftest coverage? I have already looked through the SDL user's guide and the TI Hercules Safety Manual, neither of which uses consistent terminology nor provides in my opinion enough information to actually determine the effects of these tests. The implementation of the SDL & HALCoGen tests are not identical and mutually exclusive, so how are users supposed to know which one is the "correct" implementation of the test? As many TI members have said on this forum, it is on the designer to understand what the code is doing, but the documentation is not sufficient to support this. For a certification application, we cannot simply say that the end result of the test justifies the means, we have to justify a particular implementation including the SDL code. I'd appreciate clarification beyond just picking one of the two tests.

    SL_SelfTest_FEE FEE_ECC_SYN_REPORT_MODE test,

    The calculated syndrome should be correct. I will check the test case.

    Were there any updates on this?

    Thanks for your response QJ.

  • The LBIST test operates on the digital logic of the CPU (including the MPU and FPU). It can be used to detect latent faults at the transistor level within the CPU. The LBIST intervals do not target any one element within the CPU; they are determined by the operating cycles. This time-sliced test feature enables the LBIST to be used effectively as a runtime diagnostic, with test time slices executed per safety-critical loop, as well as a comprehensive test for CPU logic faults during MCU initialization.

  • HALCoGen is the lowest software layer. It contains software modules with direct access to MCU and is responsible for system initialization. The SDL is a collection of functions for access to Safety Functions and response handlers for various safety mechanisms. The SDL supports 1-to-1 mapping to the safety mechanisms described in the part's safety manual and the FMEDA spreadsheet. 

    The FMEDA lists all the on-chip diagnostics and safety mechanisms and you can also see the effect of enabling / disabling any diagnostic or safety mechanism on the overall diagnostic coverage number. The spreadsheet also allows you to tailor the FMEDA per pin usage, module usage, or safety mechanism usage.

  • It can be used to detect latent faults at the transistor level within the CPU. The LBIST intervals do not target any one element within the CPU; they are determined by the operating cycles.

    I see. Is it a correct interpretation then that this test should cover the ability of the CPU to detect faults? I.e., if this test passes, does this guarantee that the CPU will always be able to detect, say, a data abort?

    Also, were there any updates on the FEE syndrome calculation? Thank you!

  • Is it a correct interpretation then that this test should cover the ability of the CPU to detect faults?

    Yes, the CPU LBIST + lockstep (cycle by cycle checking) provide very high test coverage.

    Also, were there any updates on the FEE syndrome calculation

    Can you show me where you get the calculated syndrome?

  • Can you show me where you get the calculated syndrome?

    I am calling the following code:

    /* Run Flash EEPROM ECC Syndrome Reporting test. */
    slRet = SL_SelfTest_FEE(FEE_ECC_SYN_REPORT_MODE, true, &failInfoFlash);
    if ( (slRet != true) || (failInfoFlash != ST_PASS) ) {
        return false;
    }

    The calculation happens in sl_selftest.c line 1598:

    syndrome = ecc ^ sl_flashWREG->FEMUECC;
    

    The calculated syndrome is 0x0 (Top right corner). 

    This doesn't match the calculated syndrome of 0x45 from the sl_flashWREG->FEMUECC (top right)


    Not really sure if I missed a setup step or similar to cause this discrepancy.

  • Please change the line #1589 to:

            ecc = (uint8)(sl_flashWREG->FEMUECC & 0xFF);

  • Hi QJ,

    Please change the line #1589 to:

            ecc = (uint8)(sl_flashWREG->FEMUECC & 0xFF);

    I don't think that had an effect. Stepping through the program, I noticed that the FEMUECC register value is 0x45 even before the test is called. This value does not change even when we load the FEMU regs on lines 1584-1587:

    /*load FEMU_XX regs in order to generate ecc and use it for next operations*/
    sl_flashWREG->FEMUADDR = (uint32)ADDR_DATA_MSW;
    fdiagCtrl |= F021F_FDIAGCTRL_DMODE_SYN_RPT;
    sl_flashWREG->FEMUDMSW = FEE_TEST_DATA;
    sl_flashWREG->FEMUDLSW = FEE_TEST_DATA;

    Because the FEMUECC register doesn't change value, ECC gets stored as 0x45. Thus, when the syndrome is calculated it becomes `0x45 ^ 0x45` which turns to 0.

    I'm not familiar enough with the FEE ECC yet to know how this register gets set. Have I missed a configuration or cleanup step before this function is called? When should this ECC register value change?


    Is it a correct interpretation then that this test should cover the ability of the CPU to detect faults?

    Yes, the CPU LBIST + lockstep (cycle by cycle checking) provide very high test coverage.

    Thanks for this. Can you elaborate what you mean by test coverage? What would be the difference between low coverage or high coverage? I know the LBIST is not software but is there some sort of documentation that covers what the LBIST actually does to test the CPU? 

  • I thought the statement assigns the upper byte of sl_flashWREG->FEMUECC, which is 0x0, to ecc.

    1. sl_flashWREG->FEMUADDR = (uint32)ADDR_DATA_MSW;
        sl_flashWREG->FEMUDMSW = FEE_TEST_DATA;
        sl_flashWREG->FEMUDLSW = FEE_TEST_DATA;

        address=0xF020_0000, data=0x0000_0000_0000_0000, the ECC=0x45

    2. sl_flashWREG->FEMUADDR = (uint32)ADDR_DATA_MSW;
        sl_flashWREG->FEMUDMSW = FEE_TEST_DATA;
        sl_flashWREG->FEMUDLSW = FEE_TEST_DATA ^ BIT(FEE_ERROR_POS);

        address=0xF020_0000, data=0x0000_0000_0000_0004, the ECC=0xA3

    the syndrome = 0x45 ^ 0xA3 = 0xE6
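
    The arithmetic above can be checked on a host PC with a minimal sketch. The helper names `ecc_syndrome` and `ecc_syndrome_selfcheck` are assumptions made for illustration and are not part of the SafeTI library; the ECC values 0x45 and 0xA3 are the ones quoted in this reply.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a SafeTI API): the syndrome is the bitwise
 * difference between the ECC of the clean data and the ECC of the data
 * with the injected bit error. */
static uint8_t ecc_syndrome(uint8_t ecc_clean, uint8_t ecc_faulty)
{
    return (uint8_t)(ecc_clean ^ ecc_faulty);
}

/* Self-check using the values quoted in this thread. */
static void ecc_syndrome_selfcheck(void)
{
    /* Expected case: the ECC changes after the bit flip. */
    assert(ecc_syndrome(0x45u, 0xA3u) == 0xE6u);
    /* Failing case: the ECC read back is the same both times. */
    assert(ecc_syndrome(0x45u, 0x45u) == 0x00u);
}
```

    The second assertion is the situation reported earlier in the thread: if FEMUECC reads 0x45 both before and after the bit flip, the XOR can only be 0.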

  • The diagnostic coverage is the ratio of dangerous detected failures to the total number of dangerous failures expressed as a percentage.

    Please refer to the FMEDA, and Detailed Safety Analysis Report.
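
    As a rough illustration of that ratio, here is a minimal sketch. The function name and the input numbers are assumptions chosen only to reproduce the 90.21% figure; they are not taken from the FMEDA. A higher percentage means a larger share of the dangerous failures is detected by the diagnostic, not that more failures occur.

```c
#include <assert.h>

/* Diagnostic coverage in percent: dangerous detected failures divided by
 * all dangerous failures (detected + undetected). Inputs are failure
 * rates in any consistent unit (for example FIT). */
static double diagnostic_coverage_pct(double dangerous_detected,
                                      double dangerous_undetected)
{
    double total = dangerous_detected + dangerous_undetected;

    assert(total > 0.0); /* the ratio is undefined without any failures */
    return 100.0 * dangerous_detected / total;
}
```

    For example, with the made-up inputs `diagnostic_coverage_pct(9021.0, 979.0)` the result is about 90.21, meaning roughly 9.79% of the dangerous failures in that hypothetical population would go undetected by this diagnostic alone.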

  • I thought the statement assigns the upper byte of sl_flashWREG->FEMUECC, which is 0x0, to ecc.

    If that was the goal of that statement, I think it should've been something like ecc = (uint8)((sl_flashWREG->FEMUECC >> 8) & 0xFF); since masking with 0xFF00 and casting to uint8 without a shift would always give 0. The FEMUECC is already 0x45 on startup so 0x45 & 0xFF = 0x45. If we're expecting FEMUECC to be 0 on function entry, that's the issue since it's not currently the case:
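
    As an aside on the masks being discussed, this is a small host-side sketch of what each expression yields. The function names are hypothetical, and `uint8_t`/`uint32_t` stand in for the library's `uint8`/`uint32` types.

```c
#include <assert.h>
#include <stdint.h>

/* Low byte (bits 7:0) of a 32-bit register value. */
static uint8_t low_byte(uint32_t reg)
{
    return (uint8_t)(reg & 0xFFu);
}

/* Second byte (bits 15:8): shift down first, then mask. */
static uint8_t second_byte(uint32_t reg)
{
    return (uint8_t)((reg >> 8) & 0xFFu);
}

/* Masking with 0xFF00 and casting to 8 bits without a shift throws away
 * exactly the bits that were kept, so the result is always 0. */
static uint8_t masked_without_shift(uint32_t reg)
{
    return (uint8_t)(reg & 0xFF00u);
}

/* Self-check with the register value reported in this thread. */
static void byte_mask_selfcheck(void)
{
    assert(low_byte(0x45u) == 0x45u);
    assert(second_byte(0x45u) == 0x00u);
    assert(masked_without_shift(0x45u) == 0x00u);
}
```

    With FEMUECC reading 0x45, `& 0xFF` keeps the whole value, which matches the 0x45 observed for ecc.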



    Is there a required clean up for the SL FEE functions? I'm currently calling a few in succession:


    As for stepping thru the function:
    1. sl_flashWREG->FEMUADDR = (uint32)ADDR_DATA_MSW;
        sl_flashWREG->FEMUDMSW = FEE_TEST_DATA;
        sl_flashWREG->FEMUDLSW = FEE_TEST_DATA;
        ecc = (uint8)(sl_flashWREG->FEMUECC & 0xFF);

    address=0x0020_0000, data=0x0000_0000_0000_0000, FEMUECC = ECC=0x45

    2. sl_flashWREG->FEMUADDR = (uint32)ADDR_DATA_MSW;
        sl_flashWREG->FEMUDMSW = FEE_TEST_DATA;
        sl_flashWREG->FEMUDLSW = FEE_TEST_DATA ^ BIT(FEE_ERROR_POS);
        syndrome = ecc ^ sl_flashWREG->FEMUECC;

    address=0x0020_0000, data=0x0000_0000_0000_0004, FEMUECC = 0x45, ECC=0x45, Syndrome = 0x45 ^ 0x45 = 0x0

    It seems to me like the FEMUECC isn't updating when we update the FEMUD* registers. Not sure why this would be however.
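
    One way to make this condition explicit rather than ending up with a silent zero syndrome is sketched below. Everything here is an assumption for illustration: `emu_regs_t` is a stand-in struct whose layout does not reflect the real flash wrapper register map, and `syndrome_for_bit_flip` is not a SafeTI function. The write order follows the sequence shown in the steps above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the emulation registers named in this thread;
 * on the target these would be accessed through sl_flashWREG. */
typedef struct {
    volatile uint32_t FEMUDMSW;
    volatile uint32_t FEMUDLSW;
    volatile uint32_t FEMUADDR;
    volatile uint32_t FEMUECC;
} emu_regs_t;

/* Computes the syndrome for a single flipped bit in the low data word.
 * Returns false when the ECC read back is identical for the clean and
 * the faulty data, i.e. the ECC register did not respond to the change. */
static bool syndrome_for_bit_flip(emu_regs_t *regs, uint32_t addr,
                                  uint32_t data, unsigned bit,
                                  uint8_t *syndrome)
{
    assert(regs != NULL && syndrome != NULL && bit < 32u);

    /* Step 1: clean data, capture its ECC. */
    regs->FEMUADDR = addr;
    regs->FEMUDMSW = data;
    regs->FEMUDLSW = data;
    uint8_t ecc_clean = (uint8_t)(regs->FEMUECC & 0xFFu);

    /* Step 2: same data with one bit flipped, capture its ECC. */
    regs->FEMUADDR = addr;
    regs->FEMUDMSW = data;
    regs->FEMUDLSW = data ^ (UINT32_C(1) << bit);
    uint8_t ecc_faulty = (uint8_t)(regs->FEMUECC & 0xFFu);

    *syndrome = (uint8_t)(ecc_clean ^ ecc_faulty);
    return ecc_clean != ecc_faulty;
}
```

    On a host, where nothing recomputes the stand-in FEMUECC field, the function returns false with a syndrome of 0, which mirrors the behavior described above. Whether the real register updates depends on the flash wrapper state at the time of the call.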

    The diagnostic coverage is the ratio of dangerous detected failures to the total number of dangerous failures expressed as a percentage.

    If it's the ratio of detected dangerous failures, wouldn't 90.21% coverage mean that 90% of the possible failures are occurring?

    Please refer to the FMEDA, and Detailed Safety Analysis Report

    Where are these located? I cannot see them on the MCU or HDK pages: https://www.ti.com/product/TMS570LS3137?keyMatch=TMS570LS3137#all https://www.ti.com/tool/TMDS570LS31HDK 

  • Where are these located?

    Those docs are under NDA control. Does your company have an NDA with TI for sensitive information?

  • It seems to me like the FEMUECC isn't updating when we update the FEMUD* registers. Not sure why this would be however.

    I ran the test a couple of times a few days ago, and the FEMUECC was updated every time.

  • Those docs are under NDA control. Does your company have an NDA with TI for sensitive information?

    We do not at this time. How would we acquire this?

  • Hi Robin,

    Please contact your local TI sales office for help.

  • For the syndrome, I was able to find the issue: the HALCoGen-generated checkFlashEEPROMECC() was being called during startup and wasn't cleaning up after itself properly. After removing that call, the syndrome calculation works.