TMDS570LS31HDK: CPU SelfTest Coverage, SafeTI SRAM dabort integration, Flash syndrome calculation issues.

Robin S

Part Number: TMDS570LS31HDK
Other Parts Discussed in Thread: HALCOGEN

Hi there,

I had a few technical questions regarding the implementation of the SafeTI Library and other safety features with the LS31HDK.

1. What is the test coverage of the CPU self-test / STC test? I can find how to implement it, but I'm not actually sure what tests it runs on the CPU.
2. How are the tests meant to be integrated with HALCoGen? In the SL_SelfTest_SRAM SRAM_ECC_ERROR_FORCING_2BIT test, I'm finding that the test fails because the ESM status register (ESMSR3) is cleared inside the _dabort routine that HALCoGen generates by default. I cannot remove the statements that clear SR3 in _dabort without causing the tests in sys_startup.c to fail. I know error handling is supposed to be outside the scope of the SafeTI library, but the example exception handler provided there doesn't do any actual handling; it just tries to mask the abort.
3. In the SL_SelfTest_FEE FEE_ECC_SYN_REPORT_MODE test, I'm finding that the test fails because the syndrome value calculated by the library (NOT the one calculated by the MCU itself) is incorrect. Following the reference manual, the syndrome value the MCU calculates is 0x45, which is expected for a multibit failure. However, the SafeTI calculated value (which is then compared against it) is always 0. Is this a bug?

over 1 year ago

0 QJ Wang over 1 year ago

TI__Guru**** 192316 points

Hi,

Robin S said:
What is the test coverage of the CPU self-test / STC test?

The CPU self-test and STC self-test coverage are listed in the device datasheet and TRM. If all 25 intervals are used, the coverage is 90.21%. The CPU self-test uses the LBIST controller as the test engine to test the ARM CPU core logic.

Robin S said:
SL_SelfTest_SRAM SRAM_ECC_ERROR_FORCING_2BIT test

This is an option of the SRAM self-test API in the SDL. If the SDL is used, the HALCoGen-generated self-test functions should not be used. The self-test with fault injection generates an error intentionally, and that error must be cleared manually in the error handler.

Robin S said:
SL_SelfTest_FEE FEE_ECC_SYN_REPORT_MODE test,

The calculated syndrome should be correct. I will check the test case.

0 Robin S over 1 year ago

Prodigy 145 points

Hi QJ,

QJ Wang said:
Robin S said:
What is the test coverage of the CPU self-test / STC test?

The CPU self-test and STC self-test coverage are listed in the device datasheet and TRM. If all 25 intervals are used, the coverage is 90.21%. The CPU self-test uses the LBIST controller as the test engine to test the ARM CPU core logic.

I should've been clearer - I did review the datasheet and I saw the table covering this kind of coverage, though I am not sure what this actually covers? What tests are being run by the LBIST on the CPU? What parts of the ARM-CPU logics are actually being tested? How does increasing the interval lead to an increase in coverage?

QJ Wang said:
Robin S said:
SL_SelfTest_SRAM SRAM_ECC_ERROR_FORCING_2BIT test

This is an option of the SRAM self-test API in the SDL. If the SDL is used, the HALCoGen-generated self-test functions should not be used. The self-test with fault injection generates an error intentionally, and that error must be cleared manually in the error handler.

In that case, what is the intended use of the SDL? The example project of the SDL integrates within HALCoGen anyways, and to my knowledge the `checkRAMECC()` test built into HALCoGen cannot be disabled via code generation (using `#if 0` is not acceptable for our certification justification). Even so, as stated in my original question modifying the error handler to clear the error interferes with other startup tests.

How are end users expected to determine appropriate peripheral self-test coverage? I have already looked through the SDL user's guide and the TI Hercules Safety Manual, neither of which uses consistent terminology or provides, in my opinion, enough information to actually determine the effects of these tests. The SDL and HALCoGen implementations of the tests are not identical, and they are mutually exclusive, so how are users supposed to know which one is the "correct" implementation of the test? As many TI members have said on this forum, it is on the designer to understand what the code is doing, but the documentation is not sufficient to support this. For a certification application, we cannot simply say that the end result of the test justifies the means; we have to justify a particular implementation, including the SDL code. I'd appreciate clarification beyond just picking one of the two tests.

QJ Wang said:
Robin S said:
SL_SelfTest_FEE FEE_ECC_SYN_REPORT_MODE test,

The calculated syndrome should be correct. I will check the test case.

Were there any updates on this?

Thanks for your response QJ.

0 QJ Wang over 1 year ago in reply to Robin S

TI__Guru**** 192316 points

The LBIST test operates on the digital logic of the CPU (including the MPU and FPU). It can be used to detect latent faults at a transistor level within the CPU. The LBIST intervals do not target any one element within the CPU; they are determined by the operating cycles. This time-sliced test feature enables the LBIST to be used effectively as a runtime diagnostic, executing test time slices per safety-critical loop, as well as a comprehensive test for CPU logic faults during MCU initialization.

0 QJ Wang over 1 year ago in reply to QJ Wang

TI__Guru**** 192316 points

HALCoGen is the lowest software layer. It contains software modules with direct access to MCU and is responsible for system initialization. The SDL is a collection of functions for access to Safety Functions and response handlers for various safety mechanisms. The SDL supports 1-to-1 mapping to the safety mechanisms described in the part's safety manual and the FMEDA spreadsheet.

The FMEDA lists all the on-chip diagnostics and safety mechanisms and you can also see the effect of enabling / disabling any diagnostic or safety mechanism on the overall diagnostic coverage number. The spreadsheet also allows you to tailor the FMEDA per pin usage, module usage, or safety mechanism usage.

0 Robin S over 1 year ago in reply to QJ Wang

Prodigy 145 points

QJ Wang said:
It can be used to detect latent faults at a transistor level within the CPU. The LBIST intervals do not target any one element within the CPU; they are determined by the operating cycles.

I see. Is it a correct interpretation, then, that this test should cover the ability of the CPU to detect faults? I.e., if this test passes, does this guarantee that the CPU will always be able to detect, say, a data abort?

Also, were there any updates on the FEE syndrome calculation? Thank you!

0 QJ Wang over 1 year ago in reply to Robin S

TI__Guru**** 192316 points

Robin S said:
Is it a correct interpretation then that this test should cover the ability of the CPU to detect faults?

Yes, the CPU LBIST + lockstep (cycle by cycle checking) provide very high test coverage.

Robin S said:
Also, were there any updates on the FEE syndrome calculation

Can you show me where you get the calculated syndrome?

0 Robin S over 1 year ago in reply to QJ Wang

Prodigy 145 points

QJ Wang said:
Can you show me where you get the calculated syndrome?

I am calling the following code:

/* Run Flash EEPROM ECC Syndrome Reporting test. */
slRet = SL_SelfTest_FEE(FEE_ECC_SYN_REPORT_MODE, true, &failInfoFlash);
if ( (slRet != true) || (failInfoFlash != ST_PASS) ) {
    return false;
}

The calculation happens in sl_selftest.c line 1598:

syndrome = ecc ^ sl_flashWREG->FEMUECC;

The calculated syndrome is 0x0.

This doesn't match the calculated syndrome of 0x45 from sl_flashWREG->FEMUECC.

Not really sure if I missed a setup step or similar to cause this discrepancy.

0 QJ Wang over 1 year ago in reply to Robin S

TI__Guru**** 192316 points

Please change the line #1589 to:

ecc = (uint8)(sl_flashWREG->FEMUECC & 0xFF);

0 Robin S over 1 year ago in reply to QJ Wang

Prodigy 145 points

Hi QJ,

QJ Wang said:
Please change the line #1589 to:

ecc = (uint8)(sl_flashWREG->FEMUECC & 0xFF);

I don't think that had an effect. Stepping through the program, I noticed that the FEMUECC register value is 0x45 even before the test is called. This value does not change even when we load the FEMU regs on lines 1584-1587:

/*load FEMU_XX regs in order to generate ecc and use it for next operations*/
sl_flashWREG->FEMUADDR = (uint32)ADDR_DATA_MSW;
fdiagCtrl |= F021F_FDIAGCTRL_DMODE_SYN_RPT;
sl_flashWREG->FEMUDMSW = FEE_TEST_DATA;
sl_flashWREG->FEMUDLSW = FEE_TEST_DATA;

Because the FEMUECC register doesn't change value, ecc gets stored as 0x45. Thus, when the syndrome is calculated, it becomes `0x45 ^ 0x45`, which evaluates to 0.

I'm not familiar enough with the FEE ECC yet to know how this register gets set. Have I missed a configuration or cleanup step before this function is called? When should this ECC register value change?

QJ Wang said:
Robin S said:
Is it a correct interpretation then that this test should cover the ability of the CPU to detect faults?

Yes, the CPU LBIST + lockstep (cycle by cycle checking) provide very high test coverage.

Thanks for this. Can you elaborate what you mean by test coverage? What would be the difference between low coverage or high coverage? I know the LBIST is not software but is there some sort of documentation that covers what the LBIST actually does to test the CPU?

0 QJ Wang over 1 year ago in reply to Robin S

TI__Guru**** 192316 points

I thought the statement assigns the upper byte of sl_flashWREG->FEMUECC, which is 0x0, to ecc.
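For reference, a minimal sketch of how byte masks behave on a 32-bit register value. The helper names are hypothetical and are not part of the SafeTI library; the point is only that `& 0xFF` keeps the lowest byte, while selecting the next byte needs a shift before the narrowing cast.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only. */

/* Keeps bits 7..0, i.e. the lowest byte of the register value. */
static uint8_t low_byte(uint32_t reg)
{
    return (uint8_t)(reg & 0xFFu);
}

/* Keeps bits 15..8 by shifting them down before masking. */
static uint8_t second_byte(uint32_t reg)
{
    return (uint8_t)((reg >> 8) & 0xFFu);
}

/* Common pitfall: masking bits 15..8 without a shift and then
 * casting to 8 bits discards them, so the result is always 0. */
static uint8_t masked_without_shift(uint32_t reg)
{
    return (uint8_t)(reg & 0xFF00u);
}
```

With a register value of 0x45, `low_byte` gives 0x45 and `second_byte` gives 0x00.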

1. sl_flashWREG->FEMUADDR = (uint32)ADDR_DATA_MSW;
sl_flashWREG->FEMUDMSW = FEE_TEST_DATA;
sl_flashWREG->FEMUDLSW = FEE_TEST_DATA;

address=0xF020_0000, data=0x0000_0000_0000_0000, the ECC=0x45

2. sl_flashWREG->FEMUADDR = (uint32)ADDR_DATA_MSW;
sl_flashWREG->FEMUDMSW = FEE_TEST_DATA;
sl_flashWREG->FEMUDLSW = FEE_TEST_DATA ^ BIT(FEE_ERROR_POS);

address=0xF020_0000, data=0x0000_0000_0000_0004, the ECC=0xA3

the syndrome = 0x45 ^ 0xA3 = 0xE6
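The arithmetic above can be checked on a host with a small sketch. The function name is hypothetical; the two ECC values are the ones quoted in this post, and the function only mirrors the XOR from the thread, not the full library test.

```c
#include <stdint.h>

/* Hypothetical helper: the syndrome is the XOR of the ECC generated
 * for the reference data and the ECC generated for the data with one
 * bit flipped. */
static uint8_t ecc_syndrome(uint8_t ecc_reference, uint8_t ecc_flipped)
{
    return (uint8_t)(ecc_reference ^ ecc_flipped);
}
```

Calling `ecc_syndrome(0x45, 0xA3)` returns 0xE6, matching the value above.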

0 QJ Wang over 1 year ago in reply to QJ Wang

TI__Guru**** 192316 points

The diagnostic coverage is the ratio of detected dangerous failures to the total number of dangerous failures, expressed as a percentage.

Please refer to the FMEDA and the Detailed Safety Analysis Report.
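As a rough illustration of that definition, the sketch below computes the ratio from a detected and an undetected dangerous failure rate. The function name and the numbers in the usage note are made up for illustration; real failure rates come from the FMEDA, not from this example.

```c
/* Hypothetical helper: diagnostic coverage in percent.
 * lambda_dd: rate of dangerous failures the diagnostic detects.
 * lambda_du: rate of dangerous failures it does not detect.
 * Both use the same unit (for example FIT). */
static double diagnostic_coverage_percent(double lambda_dd, double lambda_du)
{
    double total = lambda_dd + lambda_du;

    if (total <= 0.0) {
        return 0.0; /* no dangerous failures modeled; avoid dividing by zero */
    }
    return 100.0 * lambda_dd / total;
}
```

With made-up rates of 9021 detected and 979 undetected, the result is 90.21%. The figure describes the share of possible dangerous failures that the diagnostic would catch, so a higher number is better; it says nothing about how often failures occur.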

0 Robin S over 1 year ago in reply to QJ Wang

Prodigy 145 points

QJ Wang said:
I thought the statement assigns the upper byte of sl_flashWREG->FEMUECC, which is 0x0, to ecc.

If that was the goal of that statement, I think it should've been ecc = (uint8)((sl_flashWREG->FEMUECC >> 8) & 0xFF); or similar. FEMUECC is already 0x45 on startup, so 0x45 & 0xFF = 0x45. If we're expecting FEMUECC to be 0 on function entry, that's the issue, since that's not currently the case.

Is there a required cleanup for the SL FEE functions? I'm currently calling a few in succession.

As for stepping through the function:
1. sl_flashWREG->FEMUADDR = (uint32)ADDR_DATA_MSW;
sl_flashWREG->FEMUDMSW = FEE_TEST_DATA;
sl_flashWREG->FEMUDLSW = FEE_TEST_DATA;
ecc = (uint8)(sl_flashWREG->FEMUECC & 0xFF);

address=0x0020_0000, data=0x0000_0000_0000_0000, FEMUECC = ECC=0x45

2. sl_flashWREG->FEMUADDR = (uint32)ADDR_DATA_MSW;
sl_flashWREG->FEMUDMSW = FEE_TEST_DATA;
sl_flashWREG->FEMUDLSW = FEE_TEST_DATA ^ BIT(FEE_ERROR_POS);
syndrome = ecc ^ sl_flashWREG->FEMUECC;

address=0x0020_0000, data=0x0000_0000_0000_0004, FEMUECC = 0x45, ECC=0x45, Syndrome = 0x45 ^ 0x45 = 0x0

It seems to me like the FEMUECC isn't updating when we update the FEMUD* registers. Not sure why this would be however.
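One way to tell a stale readback apart from a real result is sketched below. The type and function names are hypothetical and are not SafeTI code. It relies on the fact that a single-bit change in the data always changes the check bits of a single-error-correcting code, so two identical ECC readbacks after flipping one data bit suggest the register was not recomputed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type for illustration only. */
typedef struct {
    uint8_t syndrome;  /* ecc_reference XOR ecc_after_flip */
    bool    ecc_stale; /* true when the ECC readback did not change */
} syndrome_result_t;

/* Compares the ECC read back before and after one data bit is flipped. */
static syndrome_result_t evaluate_syndrome(uint8_t ecc_reference,
                                           uint8_t ecc_after_flip)
{
    syndrome_result_t result;

    result.syndrome  = (uint8_t)(ecc_reference ^ ecc_after_flip);
    /* A zero syndrome after a deliberate bit flip means both readbacks
     * were identical, which points at the emulation register not
     * updating rather than at the syndrome arithmetic. */
    result.ecc_stale = (result.syndrome == 0u);
    return result;
}
```

For the values I'm seeing (0x45 both times), this flags the readback as stale.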

QJ Wang said:
The diagnostic coverage is the ratio of dangerous detected failures to the total number of dangerous failures expressed as a percentage.

If it's the ratio of detected dangerous failures, wouldn't 90.21% coverage mean that 90% of possible failures are occurring?

QJ Wang said:
Please refer to the FMEDA, and Detailed Safety Analysis Report

Where are these located? I cannot see them on the MCU or HDK pages: https://www.ti.com/product/TMS570LS3137?keyMatch=TMS570LS3137#all https://www.ti.com/tool/TMDS570LS31HDK

0 QJ Wang over 1 year ago in reply to Robin S

TI__Guru**** 192316 points

Robin S said:
Where are these located?

Those docs are under NDA control. Does your company have an NDA with TI covering the sensitive information?

0 QJ Wang over 1 year ago in reply to Robin S

TI__Guru**** 192316 points

Robin S said:
It seems to me like the FEMUECC isn't updating when we update the FEMUD* registers. Not sure why this would be however.

I ran the test a couple of times a few days ago, and FEMUECC was updated every time.

0 Robin S over 1 year ago in reply to QJ Wang

Prodigy 145 points

QJ Wang said:
Those docs are under NDA control. Does your company have an NDA with TI covering the sensitive information?

We do not at this time. How would we acquire this?

0 QJ Wang over 1 year ago in reply to Robin S

TI__Guru**** 192316 points

Hi Robin,

Please contact your local TI sales office for help.

+1 Robin S over 1 year ago in reply to Robin S

Prodigy 145 points

For the syndrome, I was able to find the issue: the HALCoGen-generated checkFlashEEPROMECC() was being called during startup and wasn't properly cleaning up after itself. After removing it, the syndrome calculation works.
