SafeTI: extremely poor documentation

Jarkko Silvasti

Other Parts Discussed in Thread: HALCOGEN

Hello,

First, I must say that the SafeTI documentation is extremely poor. The example projects are almost as poor as the documentation, because they do not use SafeTI functionality in boot, do not run all the tests, and the IAR and other variants have slight differences. The Windows-style "help" is like a joke: it looks to be generated from source code without giving any helpful information.

You also mention everywhere that the SafeTI example is just an example, and that it is the implementer's responsibility to select the correct tests and perform them correctly. I would say that it is just a _bad_ example :)

- This job would be A LOT easier if the example were made properly: do all possible tests in boot etc., so that people could basically rip out code and leave only the necessary tests. Now you have to do practically everything from scratch. A proper example would also save implementers thousands of hours of combined work, since at the moment everyone must figure things out individually.

I list here just a couple of the issues I have encountered.
========================
Issue 1) Which test type maps to which unique identifier in the safety manual?

Let's take flash/ECC as an example. SafeTI offers these test types:

- FEE_ECC_DATA_CORR_MODE

- FEE_ECC_TEST_MODE_1BIT

- FEE_ECC_TEST_MODE_2BIT

- FEE_ECC_SYN_REPORT_MODE

- FEE_ECC_MALFUNCTION_MODE1

- FEE_ECC_MALFUNCTION_MODE2

- FLASH_ECC_TEST_MODE_1BIT

- FLASH_ECC_TEST_MODE_2BIT
- FLASH_ECC_ADDR_TAG_REG_MODE // in example this is inside #if 0 in the boot sequence

- FEE_ECC_ADDR_TAG_REG_MODE // in example no one calls this in the boot sequence

And the safety manual lists the following unique identifiers for the FLASH function:

FEE1 FEE Data ECC SL_SelfTest_Flash // notice, flash function tests FEE

FLA1 Flash Data ECC SL_SelfTest_Flash

FLA10 Flash wrapper diag mode 5 test SL_SelfTest_Flash

FLA11 Flash wrapper diag mode 7 SL_SelfTest_Flash

FLA12 Software test of parity logic SL_SelfTest_Flash

- So we have 5 unique identifiers and 3 test types??? That would mean that some test types cover more than one unique identifier, but which ones?

And these for the FEE function:

FEE8 Flash wrapper diag mode 1 test SL_SelfTest_FEE

FEE9 Flash wrapper diag mode 2 test SL_SelfTest_FEE

FEE10 Flash wrapper diag mode 3 test SL_SelfTest_FEE

FEE11 Flash wrapper diag mode 4 test SL_SelfTest_FEE
- 4 unique identifiers and 7 test types (obviously more than one test type is required to fulfill one unique identifier, but which ones?)

===================================
Issue 2) Is it allowed to modify the SafeTI source code? Normally it isn't, and typically safety software provides a CRC or some other means to verify that the source code is intact.

- This is not said anywhere...

The SafeTI code as is cannot be used with (an OS and) legacy interrupts, because sl_esm. has the __irq and __fiq keywords on its interrupt handling functions (without any documentation, of course). This results in stack/PC corruption when trying to return to the upper-level IRQ handler after calling these functions...
===================================

Issue 3) For fault-insertion test types the returned results are meaningless, i.e. the status comes back with the value that was given to it. Take for example PSCON_SELF_TEST_ERROR_FORCING_FAULT_INJECT. Nowhere is it mentioned that the result should not be checked in this case.

tStatus.stResult = ST_FAIL;

bRet = SL_SelfTest_PSCON( PSCON_SELF_TEST_ERROR_FORCING_FAULT_INJECT, TRUE, &tStatus );

DBG_PRINT( "Ret: %u, Stat: %s\r\n", bRet, tStatus.stResult == ST_PASS ? "PASS" : "FAIL" );
-->

Ret: 1, Stat: FAIL<CR><LF>

and

tStatus.stResult = ST_PASS;

bRet = SL_SelfTest_PSCON( PSCON_SELF_TEST_ERROR_FORCING_FAULT_INJECT, TRUE, &tStatus );

DBG_PRINT( "Ret: %u, Stat: %s\r\n", bRet, tStatus.stResult == ST_PASS ? "PASS" : "FAIL" );
-->

Ret: 1, Stat: PASS<CR><LF>
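
One way to avoid misreading results like these is to route every call through a thin wrapper that knows which test types are fault injections. The following is only a host-side sketch: `mock_selftest`, `run_checked`, `test_kind_t` and `outcome_t` are hypothetical names, not SafeTI API, and the stand-in's behaviour (return TRUE, leave the status untouched for a fault injection) is modelled purely on the output shown above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the library's result values (names taken from the post). */
typedef enum { ST_PASS, ST_FAIL } result_t;

/* Hypothetical classification of a test type. */
typedef enum { TEST_NORMAL, TEST_FAULT_INJECT } test_kind_t;

/* Hypothetical three-way outcome used by the application. */
typedef enum { OUTCOME_PASS, OUTCOME_FAIL, OUTCOME_NOT_APPLICABLE } outcome_t;

/* Mock of a self-test call: a fault injection returns true but never
 * writes the status, which mirrors the behaviour observed above. */
static bool mock_selftest(test_kind_t kind, result_t *status)
{
    assert(status != NULL);
    if (kind == TEST_NORMAL) {
        *status = ST_PASS;
    }
    return true;
}

/* Wrapper: start from a pessimistic status and only trust it for tests
 * that are known to report a result. */
static outcome_t run_checked(test_kind_t kind)
{
    result_t status = ST_FAIL;

    if (!mock_selftest(kind, &status)) {
        return OUTCOME_FAIL;            /* the call itself was rejected */
    }
    if (kind == TEST_FAULT_INJECT) {
        return OUTCOME_NOT_APPLICABLE;  /* status carries no information */
    }
    return (status == ST_PASS) ? OUTCOME_PASS : OUTCOME_FAIL;
}
```

With this shape a fault injection can never be logged as PASS or FAIL by accident; the application has to confirm the injected fault through whatever error path it triggers.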

======================
Issue 4) The Windows help and the source code refer to the obsolete function SL_SelfTest_PBIST_ExecStatus().

======================
Issue 5) SafeTI safety manual chapter 6.1 says that the init procedure described in _may_ be followed... Great, the examples throw something in there, but nothing is mapped to anything... Nowhere is it mentioned what, for example, this phase contains when it is done using SafeTI functions:

29. Run the self-test on the CPU's SECDED logic for accesses to main data RAM (B0TCM and B1TCM) (Section 2.24 ).

Is it all of these, some of these, or is it the implementer's decision????

- SRAM_PAR_ADDR_CTRL_SELF_TEST

- SRAM_ECC_ERROR_FORCING_1BIT

- SRAM_ECC_ERROR_PROFILING

- SRAM_ECC_ERROR_FORCING_2BIT

- SRAM_RADECODE_DIAGNOSTICS // this is run in SafeTI example after phase ~42 ESMInit()

- SRAM_LIVECLOCK_DIAGNOSTICS // not run at all in the example boot sequence

======================
Issue 6) Certain boot-time tests in the SafeTI example cannot be run, at least after a debugger reset, if the FIQ interrupt has once been enabled (I don't want to reveal which ones :)). It is always nice to see the CPU crash into an undefined instruction... After debugger reset the FIQ in CPU register is 1 but

=====================

Issue 7) The stack definitions in sl_config.h are stupidly made, and they are not even used in sl_asm_api_IAR.asm!!!! That file defines them again and does not even include sl_config.h :). So if you do not want to use the default settings, you must write your own ASM function to init the stacks, because the only other option is to modify the SafeTI source.

Also, the linker script for IAR that sets up the stacks is horrible. With "minor tuning" you can define the stacks so that the length is given in only one place (the linker file) and the linker also checks that the space reserved for the stacks is not exceeded. The current SafeTI solution is far from this. I had to write my own stack initializer function, which uses linker-provided symbols like SVC_STACK$$Limit.
=====================

Issue 8) The init documents do not mention anything about _errata_CORTEXR4_66_(), _errata_CORTEXR4_57_() and errata_PBIST_4(). What is actually the correct place to execute these? HALCoGen code has had so many bugs that there is no guarantee that HALCoGen's placement of these is correct...

And how about errata_PBIST_4()? You have clearly recommended everywhere in the forums that the selftest.c routines SHALL NOT BE USED and that the SafeTI routines shall be used instead. There is no SafeTI routine for this, so selftest.c must be used against the recommendations...

======================

Issue 9) As some of the selftest.c functionality is still used, something needs to be implemented in selftestFailNotification(), as this function may be called from somewhere... The SafeTI documentation does not instruct the user to put some sensible code here.
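
As an illustration of what "sensible code" could mean here, the sketch below latches the first failure and raises a request that the boot code can act on. It is a host-side sketch under assumptions: the `uint32_t flag` parameter is my assumption about the HALCoGen signature, and `selftest_failure_t` and `selftest_failure_snapshot` are hypothetical names. On a real target the function would more likely end in a project-specific safe state (halt, reset, nERROR) rather than return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of self-test failures. */
typedef struct {
    uint32_t first_flag;           /* flag of the first reported failure */
    uint32_t count;                /* number of failures, saturating     */
    bool     safe_state_requested; /* boot code must not continue        */
} selftest_failure_t;

static volatile selftest_failure_t g_failure = { 0U, 0U, false };

/* Assumed signature; called by the remaining selftest.c routines. */
void selftestFailNotification(uint32_t flag)
{
    if (g_failure.count == 0U) {
        g_failure.first_flag = flag;   /* keep the root cause, not the last symptom */
    }
    if (g_failure.count < UINT32_MAX) {
        g_failure.count++;
    }
    g_failure.safe_state_requested = true;
}

/* Copy the latched record so the boot sequence can inspect it. */
void selftest_failure_snapshot(selftest_failure_t *out)
{
    assert(out != NULL);
    out->first_flag           = g_failure.first_flag;
    out->count                = g_failure.count;
    out->safe_state_requested = g_failure.safe_state_requested;
}
```

Keeping the first flag rather than the last one is a deliberate choice in this sketch: later failures are often consequences of the first.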

======================

Issue 10) The SafeTI documentation does not provide any checklist, for example one that confirms the issue mentioned in 9 has been handled...

======================

Issue 11) FIQ and an OS (operating system): the reality is that the FIQ interrupt cannot be masked, except by manually changing the CPU to FIQ mode. Nothing is mentioned about this in the documentation, and it must need some special handling if you try to use OS services from the FIQ handler... I haven't really dug into this yet, but at least uc/OS tries to set the IRQ and FIQ disable bits in CPSR (FIQ cannot be disabled once it is enabled), and after that the operating system trusts that a context switch cannot occur while it does its own magic. From a quick look it seems that OS routines cannot be called from FIQ, so if for example an ESM high-level interrupt comes, some other means are needed to handle it... Maybe some other OS or port of uc/OS has tackled this issue somehow.

=====================

There would be "millions" of these issues, but I stop here. This just shows that SafeTI's maturity is far from "just integrating" safety into the project. I have spent days (read: actually weeks) just making linker scripts, my own asm stack initializers and a proper boot sequence (and this isn't even finished yet) using "only" SafeTI functionality, with the capability to store test results over RAM PBIST and init... And I am 100% sure that everyone stumbles on the same things and first has to figure out why an undefined instruction follows a certain boot test when a debugger reset is given, why the stacks overflow even when everything is set correctly in sl_config.h and in the linker, and so on (these settings won't help because the asm overrides them :) (not funny))...

I am also 100% sure that the current SafeTI example could be made nearly perfect in a couple of days by a person who is an expert in coding and Hercules safety functionality. I just wonder why TI hasn't done this; the current situation is far from the marketing material... The current example just compiles, that's it. And once the example is fixed, a bit more is needed to fix the documentation: currently the unique identifier list and its mapping to SafeTI functions (without test types, of course...) is the only valuable information it provides...

over 8 years ago

0 Chuck Davenport over 8 years ago

TI__Guru 59540 points

Hello Jarkko,

I'm sorry that you are having so much difficulty in your integration of the SafeTI elements into your application. For sure there is some learning curve in using the products. I am forwarding your concerns to our management team as I think some observations certainly point to areas where we have significant room for improvement. I will let you know when I hear back from them regarding any specific course of action related to the issues you have outlined. Rather than going through your list item by item, I would prefer we focus on individual issues and try to address these one at a time. To start, if it would be helpful to set up a call with our SW team to address some of your concerns over the integration of the SafeTI Diagnostic Library, please let me know and I can set that up.

0 Jarkko Silvasti over 8 years ago in reply to Chuck Davenport

Expert 1395 points

Hello,

A call can be arranged if you feel that it will help. I would still prefer email over phone, because I am a non-native English speaker, and when things get really technical... well, you know...

I tried to put the most important ones on the list (and threw some minor ones into the mix). Personally I think that I have tackled all the possible issues in the init routine and currently have a "perfect and robust" (well, can code ever be perfect and robust :)) init routine which follows your suggested CPU init practically phase by phase. Only the PBIST4 errata is taken from selftest.c, a couple of "tests" are made by myself (practically mimicking selftest.c functionality), and SafeTI is used for the rest. (PSCON tests are not mentioned in the init routine; I decided to do those before phase 38...)

But of course I still cannot know whether this is correct or not. At least practically "all the tests" offered by the SafeTI interface are executed at boot-up :), and the code works with and without a debugger and does the STC and CCM tests if the debugger is not connected.

I agree that there is of course always a learning curve for anything you ever do, but currently it is way too steep for SafeTI. I literally had to learn ASM, linker scripts and all the details of the VIM and ESM, and of course I have had to read the SafeTI sources 10 times and the TRM and datasheets continuously, and also investigate ARM's documentation of the CPU. I am not saying that it was a bad thing for me to be forced to dig out all the details (at least I feel that I now have quite a good overall understanding of SafeTI and the related functionality). But people should not be forced to do this, and as the majority of the "fatal problems" come from "very simple things", it would be really easy to tackle those in the example code (and in the documentation).

Of course there will always be questions like "should this be executed before that" or "why does this test look like it is not working", but I would rather answer those kinds of questions than "why does my code jump to an undefined instruction".

================
I just noticed that issue 6 is missing some text (it is hard to write text in a rather small window without a preview option).

It should be something like this:
Issue 6) Certain boot-time tests in the SafeTI example cannot be run, at least after a debugger reset, if the FIQ interrupt has once been enabled (I don't want to reveal which ones :)). It is always nice to see the CPU crash into an undefined instruction... After a debugger reset the FIQ bit (F bit) in the CPU program status register is 1, but when executing SL_Init_R4Registers() FIQ is automatically re-enabled by the MRS R1, CPSR command: the F bit goes to 0, meaning that FIQ is enabled. (If I understood the ARM documentation correctly, this is a read command, so how can it modify CPSR???)
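
For reference, this is how a CPSR value read with MRS can be decoded. MRS only copies CPSR into a register, so comparing the decoded value before and after the call is one way to see where the F bit actually changes. The helper names below are hypothetical; the bit positions (I = bit 7, F = bit 6, mode = bits 4:0) are the standard ARM CPSR layout, and a set mask bit means the interrupt is disabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CPSR_I_BIT     (1UL << 7)   /* 1 = IRQ masked */
#define CPSR_F_BIT     (1UL << 6)   /* 1 = FIQ masked */
#define CPSR_MODE_MASK 0x1FUL       /* processor mode field */

/* Compile-time check that the masks match the documented positions. */
static_assert(CPSR_F_BIT == 0x40UL, "F must be CPSR bit 6");
static_assert(CPSR_I_BIT == 0x80UL, "I must be CPSR bit 7");

/* FIQ is enabled when the F mask bit is clear. */
static bool fiq_enabled(uint32_t cpsr)
{
    return (cpsr & CPSR_F_BIT) == 0UL;
}

/* IRQ is enabled when the I mask bit is clear. */
static bool irq_enabled(uint32_t cpsr)
{
    return (cpsr & CPSR_I_BIT) == 0UL;
}

/* Extract the processor mode, e.g. 0x13 for Supervisor. */
static uint32_t cpsr_mode(uint32_t cpsr)
{
    return (uint32_t)(cpsr & CPSR_MODE_MASK);
}
```

For example, 0xD3 decodes as Supervisor mode with both IRQ and FIQ masked, while 0x93 is Supervisor mode with FIQ enabled and IRQ still masked.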
---------------

Here is some analysis of issue 6 (which I think is one of the most critical issues, together with stack initialization, because the code behaves quite unexpectedly):
The problematic tests are of course the ESM group 2 tests, because those cannot be masked out, which is what SafeTI does for the group 1 tests. SafeTI does acknowledge the ESM source for group 2 after the test, but it does not acknowledge the request in the VIM in case FIQ is disabled (it is still disabled in a regular boot at that phase). That acknowledge uses the syntax:
vimREG->INTREQ0 = 1U;

So you have to do that acknowledge in the application after the test (and I think that the SafeTI documentation should mention this).

Without that manual acknowledge, in a regular "first-time boot" there is a VIM request waiting for FIQ to be enabled in CPSR, and FIQVECREQ does not contain a proper vector if the group 2 related test was made before vimInit() (and those tests are done before it if the init sequence is followed). Because the VIM RAM is not initialized yet, the vector fetched to FIQVECREQ is 0x3c3c3c3c or something like that. For some reason, writing the VIM RAM base replaces the content of IRQVECREQ and FIQVECREQ with the VIM RAM base content (which by default is the phantom interrupt). So if vimInit() is called before FIQ is enabled, the VIM replaces the previously fetched 0x3c3c3c3c vector in FIQVECREQ with the phantom interrupt. This is hidden functionality: the TRM does not mention that writing the VIM RAM base does some magic... And now, if FIQ is enabled without that manual VIM ch1 acknowledge, the code jumps to the phantom interrupt, and you get a phantom call on every regular power-up boot.

And if you perform such tests after a debugger reset (FIQ is enabled if it was enabled before the reset), the CPU jumps to FIQ immediately. For example, the init routine instructs you to run PBIST on the VIM in phase 28 and then to test SRAM in phase 29. The SRAM test asserts the FIQ, and because PBIST corrupted the vector table and FIQ was enabled, the code jumps to the corrupted address 0x3c3c3c3c. The same applies to the phase 30 flash testing.

So, for example, I tackled this issue (after I had spent significant time figuring out the root cause) by always acknowledging VIM CH1 manually after certain tests (the group 2 related tests), which prevents phantom calls, and by adding a guard so that the test is not run at all if FIQ is enabled, which prevents prefetch aborts. This way you can do whatever you want with the debugger (reset the CPU whenever you want; well, CPU init phase 7 traps if you reset with the debugger when the group 3 tests are not yet acked from the ESM, and that is the only problem left that I encountered) and you will never see prefetch aborts or phantom interrupt calls in the boot sequence...
- If the SafeTI documentation had said that the pending VIM request is not handled inside SafeTI for these tests, and that you have to handle it manually in a certain situation (FIQ not enabled), and if the example code had done it, I would have saved tons of time. I am pretty sure that everybody sees that phantom interrupt if their code even slightly follows either the example code or the suggested init routine... And if you throw a debugger reset into the mix, it explodes :).
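
The guard and the manual acknowledge described above can be sketched roughly as follows. This is a host-side model, not target code: `sim_vim_t` is a simulated write-1-to-clear pending register standing in for `vimREG->INTREQ0`, and `run_group2_test_guarded`, `fake_group2_test` and the result enum are hypothetical names. On the target, the FIQ state would come from CPSR and the acknowledge would be the register write shown earlier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated pending-request register with write-1-to-clear semantics. */
typedef struct {
    uint32_t pending;
} sim_vim_t;

/* Acknowledge: every bit set in 'mask' clears the matching pending bit. */
static void sim_vim_ack(sim_vim_t *vim, uint32_t mask)
{
    assert(vim != NULL);
    vim->pending &= ~mask;
}

typedef enum { GUARD_SKIPPED_FIQ_ENABLED, GUARD_RAN_AND_ACKED } guard_result_t;
typedef void (*group2_test_fn)(sim_vim_t *vim);

/* Run a group 2 related test only while FIQ is still masked, then clear
 * the request it leaves pending so that enabling FIQ later does not
 * vector to a phantom or corrupted address. */
static guard_result_t run_group2_test_guarded(sim_vim_t *vim,
                                              bool fiq_is_enabled,
                                              group2_test_fn test,
                                              uint32_t ack_mask)
{
    assert(vim != NULL);
    assert(test != NULL);

    if (fiq_is_enabled) {
        return GUARD_SKIPPED_FIQ_ENABLED;  /* e.g. after a debugger reset */
    }
    test(vim);
    sim_vim_ack(vim, ack_mask);
    return GUARD_RAN_AND_ACKED;
}

/* Stand-in for a test that leaves a request pending, as described above. */
static void fake_group2_test(sim_vim_t *vim)
{
    vim->pending |= 1U;
}
```

Skipping the test is only one possible policy; a project could equally decide to treat "FIQ already enabled during boot" as an error and force a full reset.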

Of course one could argue about whether it is feasible to run the whole init sequence again after a debugger reset, but that's another story (I am still pretty sure that you will need debugger resets quite a lot when implementing the init routines, so some "warning" in the documentation would be more than welcome). Stacks are not stored, so the possible CPU-reset pattern cannot be followed; maybe some flag could be set to skip something from the init pattern, or one could do as I did and skip certain tests based on the CPU FIQ status. But anyway, I think that something should have been said about it in the SafeTI manual (or better, implemented in the example code)...

0 Chuck Davenport over 8 years ago in reply to Jarkko Silvasti

TI__Guru 59540 points

Hello Jarkko,

I really appreciate your insights and comments about our tools and documentation. I fully acknowledge that there are still some gaps to be filled in the collection of documents, SW, and tools to make things much easier for all of our customers. For sure, some of these gaps exist in a sort of intentional way, since we try to remain flexible in how the examples and libraries are used, and others exist because of constraints such as time, schedule, priorities, and general resources. These are all certainly still in development and, in a practical sense, will be for some time to come as we learn more about how they are used, allowing better optimization of the examples and reference code.

For now, I will send you my direct email information so that we can have more detailed discussions if you so choose. When I do, if you can respond in kind, I will give some additional details to each of your points and, perhaps, shed some light on where we are currently and where we want to be in the not so distant future.
