RM46L430: Cannot Integrate SafeTI Library into my Application

Mark Kingston

Part Number: RM46L430

I am trying to integrate the SafeTI Diagnostics Library into my application, but am having some serious problems.

The Self Tests as a whole are running, but I get unmasked interrupts whilst the SRAM tests are running.

The demo application has an example for the VIC and data abort interrupts, but I cannot understand how this is supposed to work in a Functional Safety related application.

The 'exception_handlers.c' file from the demo application contains a manual prototype of a function that actually exists in a private SafeTI Diagnostics Library module:

boolean SL_FLAG_GET(sint32 flag_id); /* avoid compiler warning */

This appears to be a hack to avoid physically including a SafeTI Diagnostics Library private header file. Do I interpret this correctly?

Are the exception handlers part of the application or the SafeTI Diagnostics Library?

The demo application just masks the exceptions. How does the Self Test ensure that the exceptions were actually raised?

There seems to be an implementation of fault injection test callback logging, but it does nothing with the logged results, it is part of the application, and it includes the private API of the SafeTI Diagnostics Library.

The demo application exception handlers themselves contain a masking mechanism to stop the propagation of exceptions that are generated during Self Test. It is this mechanism that is using the private interface from the SafeTI Diagnostics Library.

/*
 * DAbort due to access to illegal transaction to L2 Memory?
 * 0x00000008 indicates that it is an external abort caused by read and is AXI decode error
 * 0xFFF80000 is the protected location accessed to create the L2 interconnect error trap AXI decode error
 */
if((TRUE == SL_FLAG_GET(L2INTERCONNECT_UNPRIVELEGED_ACCESS)) &&
    ((0x00000008u == (0x0000008u & _SL_Get_DataFault_Status())) &&
        (0xFFF7A400U == _SL_Get_DataFault_Address())))
{
    maskDAbort = TRUE;
}

The above code snippet is from the demo application. My application is for a Functional Safety related product and I need to be able to justify the above code. The implementation of the whole masking mechanism requires a detailed knowledge of items such as the memory address that was used to generate errors and number of exceptions raised, but I cannot find this anywhere in the SafeTI Diagnostics Library API or the Safety Manual.

Also, the above code snippet uses '_SL_Get_DataFault_Address', which is described in the SafeTI Diagnostics Library API as 'NOTE: for future enhancements. Do not use these APIs'.

How do I implement the masking mechanism without accessing the private API from the SafeTI Diagnostics Library?

How do I implement the masking mechanism without detailed information concerning the diagnostic tests in the SafeTI Diagnostics Library?

Can someone please help me? I don't know how to continue with my integration, as I don't understand what is trying to be achieved here. I can't find the associated information in the SafeTI Diagnostics Library documentation, and it seems that a large back-end part of the Self Test library needs to be implemented in the application.

over 8 years ago

0 Chuck Davenport over 8 years ago

TI__Guru 59540 points

Hi Mark,

See my comments within the context of your original post below.

Mark Kingston said:
Part Number: RM46L430

I am trying to integrate the SafeTI Diagnostics Library into my application, but am having some serious problems.

The Self Tests as a whole are running, but I get unmasked interrupts whilst the SRAM tests are running.

CWD--> I would need more details about which interrupts are happening and during which SRAM test? i.e., during PBIST execution? Are the interrupts the NMI interrupts associated with the ESM?

The demo application has an example for the VIC and data abort interrupts, but I cannot understand how this is supposed to work in a Functional Safety related application.

CWD--> I would have to dig deeper into the code to get a complete understanding of what they are attempting to do, but the most likely explanation is that they are ignoring the exceptions since the exceptions are created by the test itself. In a full-on FS application, you would have to identify that you were in this diagnostic mode and implement an exception handler that could process the abort in the event of the test condition vs. a different path in the active application mode, i.e., process the exception based on the context in which it occurs.

The 'exception_handlers.c' file from the demo application contains a manual prototype of a function that actually exists in a private SafeTI Diagnostics Library module:
boolean SL_FLAG_GET(sint32 flag_id); /* avoid compiler warning */
This appears to be a hack to avoid physically including a SafeTI Diagnostics Library private header file. Do I interpret this correctly?

Are the exception handlers part of the application or the SafeTI Diagnostics Library?

The demo application just masks the exceptions. How does the Self Test ensure that the exceptions were actually raised?

CWD--> As I stated above, since this is not a true implementation and only an example, there may be some shortcomings that you as the integrator need to fill in. For certain, the exception handling needs to be context aware as I mentioned above, since some tests will result in an exception by design and the application will have to determine whether the exception is a result of the diagnostic or the result of a real event. Based on the use in the code you show below, I would hypothesize that the prototype is merely getting the result of the specific test or fault type, checking against the content of the Fault Status (most likely the ESM channel or perhaps the FSR in the CPU), and also checking that the error address corresponds to that used in the test code. If all of these are true, the exception is ignored as a byproduct of the diagnostic and the error notification path is validated. There could then also be an else condition for the case where the abort occurs and this information doesn't match, meaning the abort was generated by another source. If left as is, the abort would be left intact, not ignored/masked, and could be handled as a real exception by the mainline code.
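As a rough illustration of the context-aware check described above (a sketch only, not code from the SafeTI Diagnostics Library): the application records what the running diagnostic is expected to provoke, and the abort handler masks the exception only when a test is active and both the fault status and the fault address match. All type and function names here are hypothetical, and the expected status and address values would have to be confirmed from the test source or from TI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical application-side types; none of these names exist in the library. */
typedef enum { ABORT_REAL = 0, ABORT_EXPECTED_BY_TEST = 1 } abort_class_t;

typedef struct {
    bool     test_active;     /* set by the application just before fault injection */
    uint32_t expected_status; /* fault status bits the test should produce */
    uint32_t status_mask;     /* which fault status bits to compare */
    uint32_t expected_addr;   /* address the test deliberately accesses */
} abort_expectation_t;

/* Decide whether an abort is a byproduct of the diagnostic or a real event. */
abort_class_t classify_data_abort(const abort_expectation_t *exp,
                                  uint32_t fault_status,
                                  uint32_t fault_addr)
{
    if ((exp != NULL) && exp->test_active &&
        ((fault_status & exp->status_mask) == exp->expected_status) &&
        (fault_addr == exp->expected_addr)) {
        return ABORT_EXPECTED_BY_TEST;
    }
    return ABORT_REAL; /* the "else" path: treat as a genuine fault */
}

/* Host-side usage example with the values quoted from the demo application. */
void classify_example(void)
{
    const abort_expectation_t exp = { true, 0x8u, 0x8u, 0xFFF7A400u };
    assert(classify_data_abort(&exp, 0x00000008u, 0xFFF7A400u) == ABORT_EXPECTED_BY_TEST);
    assert(classify_data_abort(&exp, 0x00000008u, 0x08001000u) == ABORT_REAL);
}
```

Keeping the decision in a pure function like this means it can be unit tested off target, which may help when the masking logic has to be justified.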

There seems to be an implementation of fault injection test callback logging, but it does nothing with the logged results, it is part of the application, and it includes the private API of the SafeTI Diagnostics Library.

CWD--> My belief is that the callback logging is intended more for debug than for use in an application, i.e., you can monitor the logging as part of your validation to ensure all diags are called under the appropriate conditions, as a validation of the code against your overall requirements.

The demo application exception handlers themselves contain a masking mechanism to stop the propagation of exceptions that are generated during Self Test. It is this mechanism that is using the private interface from the SafeTI Diagnostics Library.
/*
 * DAbort due to access to illegal transaction to L2 Memory?
 * 0x00000008 indicates that it is an external abort caused by read and is AXI decode error
 * 0xFFF80000 is the protected location accessed to create the L2 interconnect error trap AXI decode error
 */
if((TRUE == SL_FLAG_GET(L2INTERCONNECT_UNPRIVELEGED_ACCESS)) &&
    ((0x00000008u == (0x0000008u & _SL_Get_DataFault_Status())) &&
        (0xFFF7A400U == _SL_Get_DataFault_Address())))
{
    maskDAbort = TRUE;
}
The above code snippet is from the demo application. My application is for a Functional Safety related product and I need to be able to justify the above code. The implementation of the whole masking mechanism requires a detailed knowledge of items such as the memory address that was used to generate errors and number of exceptions raised, but I cannot find this anywhere in the SafeTI Diagnostics Library API or the Safety Manual.

CWD--> the address that is compared to would be the left side of the == statement. i.e., 0xFFF7A400U, but it seems this doesn't match the comment above the code so this is a bit confusing. This could certainly be verified by stepping through the code.

Also, the above code snippet uses '_SL_Get_DataFault_Address', which is described in the SafeTI Diagnostics Library API as 'NOTE: for future enhancements. Do not use these APIs'.

CWD--> I don't know the source of the note or the comment either. Certainly there is the possibility to read the FSR directly, if that is what the function call is doing. This would eliminate one level of abstraction that costs performance, but it would increase the footprint if it is used in many places (which it shouldn't be).
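A sketch of what checking the fault status directly might look like. On the target, the Data Fault Status Register and Data Fault Address Register are read through CP15 (`MRC p15, 0, <Rt>, c5, c0, 0` and `MRC p15, 0, <Rt>, c6, c0, 0` respectively); the decode below takes the raw value as a parameter so that it can be exercised on a host. The bit positions are assumed from the Cortex-R4 DFSR layout and should be verified against the Cortex-R4 TRM, and the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed Cortex-R4 DFSR fields; verify against the TRM before relying on them. */
#define DFSR_STATUS_LOW_MASK 0x0000000Fu /* Status[3:0] */
#define DFSR_S_BIT           (1u << 10)  /* Status[4] */
#define DFSR_RW_BIT          (1u << 11)  /* 0 = read access, 1 = write access */
#define DFSR_SD_BIT          (1u << 12)  /* 0 = AXI decode error, 1 = AXI slave error */
#define DFSR_SYNC_EXTERNAL   0x00000008u /* Status = 0b01000, synchronous external abort */

/* True only for a synchronous external abort on a read with an AXI decode error. */
bool dfsr_is_read_decode_error(uint32_t dfsr)
{
    const uint32_t mask = DFSR_STATUS_LOW_MASK | DFSR_S_BIT | DFSR_RW_BIT | DFSR_SD_BIT;
    return (dfsr & mask) == DFSR_SYNC_EXTERNAL;
}

/* Host-side usage example. */
void dfsr_example(void)
{
    assert(dfsr_is_read_decode_error(0x00000008u));  /* read, decode error */
    assert(!dfsr_is_read_decode_error(0x00000808u)); /* same status, but a write */
    assert(!dfsr_is_read_decode_error(0x0000000Du)); /* different status with bit 3 set */
}
```

Under these assumptions, the demo's test `0x00000008u == (0x0000008u & status)` only looks at bit 3, so it would also accept other status codes that happen to have that bit set; comparing the full field is stricter.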

How do I implement the masking mechanism without accessing the private API from the SafeTI Diagnostics Library?

CWD--> I don't know the intent of the private API. It could be something that was simply created for the demo and is not part of the overall Diag Library (i.e., the evidence in the CSP doesn't apply to it). If it isn't officially part of the Diag Library, then the same functionality would need to be implemented, tested and validated by the integrator. Note that although the function to retrieve the flag might be part of the private API, the mask definition isn't, so this would lend itself to being used by your application to do the notification of test mode or the test status.
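If the integrator has to own this mechanism, one possible shape for it is an application-side "expected abort" record that is armed before the diagnostic is called, consulted by the abort handler, and checked afterwards. The sketch below is an assumption about how that could be structured, not a description of the library; every name in it is hypothetical. It also gives the application a way to confirm that the expected exception was actually raised, rather than only masked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical application-owned state, shared between mainline code and the handler. */
static volatile bool     g_abort_expected = false;
static volatile uint32_t g_abort_hits     = 0u;

/* Call immediately before invoking a diagnostic that provokes an abort by design. */
void app_expect_abort(void)
{
    g_abort_hits     = 0u;
    g_abort_expected = true;
}

/* Call from the abort handler. Returns true if this abort should be masked.
 * Only the first abort in an armed window is masked; any further one is counted
 * but reported as real. */
bool app_on_abort(void)
{
    if (g_abort_expected) {
        g_abort_hits = g_abort_hits + 1u;
        return (g_abort_hits == 1u);
    }
    return false;
}

/* Call after the diagnostic returns. Passes only if exactly one abort was seen. */
bool app_finish_expectation(void)
{
    g_abort_expected = false;
    return (g_abort_hits == 1u);
}

/* Host-side usage example: arm, simulate the handler firing once, then verify. */
void expectation_example(void)
{
    app_expect_abort();
    assert(app_on_abort());           /* the abort provoked by the test is masked */
    assert(app_finish_expectation()); /* and the test confirms it really occurred */
}
```

In a real handler this would be combined with a status and address comparison, and access to the shared state would need whatever protection the target's interrupt model requires.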

How do I implement the masking mechanism without detailed information concerning the diagnostic tests in the SafeTI Diagnostics Library?

CWD--> Given the diagnostic code is provided as source, couldn't this information be derived from there? I understand that it would be better to have more detailed info in the documentation, but given we don't, is the source as a resource a possibility?

Can someone please help me? I don't know how to continue with my integration, as I don't understand what is trying to be achieved here. I can't find the associated information in the SafeTI Diagnostics Library documentation, and it seems that a large back-end part of the Self Test library needs to be implemented in the application.

CWD--> Certainly there is still a resource need by the integrator to implement the diagnostic library. Essentially, we can't predict the use of the library or the requirements for a given application so there is some disconnect that has to be defined by the system integrator. Although the diagnostics provided are a big step in defining the diagnostic measures, there are still some decisions left to the developer to be made on how to handle the results of the diagnostics and the impact to the larger system implementation.

0 Mark Kingston over 8 years ago in reply to Chuck Davenport

Intellectual 320 points

Hi Chuck,

CWD--> I would need more details about which interrupts are happening and during which SRAM test? i.e., during PBIST execution? Are the interrupts the NMI interrupts associated with the ESM?

I have responded in a separate thread with more details about the SRAM issues.

CWD--> I would have to dig deeper into the code to get a complete understanding of what they are attempting to do, but the most likely explanation is that they are ignoring the exceptions since the exceptions are created by the test itself. In a full-on FS application, you would have to identify that you were in this diagnostic mode and implement an exception handler that could process the abort in the event of the test condition vs. a different path in the active application mode, i.e., process the exception based on the context in which it occurs.

There is nowhere near enough information in the Safety Manual or the API help file to implement a ‘full on’ Functional Safety application.

CWD--> As I stated above, since this is not a true implementation and only an example, there may be some shortcomings that you as the integrator need to fill in. For certain, the exception handling needs to be context aware as I mentioned above, since some tests will result in an exception by design and the application will have to determine whether the exception is a result of the diagnostic or the result of a real event. Based on the use in the code you show below, I would hypothesize that the prototype is merely getting the result of the specific test or fault type, checking against the content of the Fault Status (most likely the ESM channel or perhaps the FSR in the CPU), and also checking that the error address corresponds to that used in the test code. If all of these are true, the exception is ignored as a byproduct of the diagnostic and the error notification path is validated. There could then also be an else condition for the case where the abort occurs and this information doesn't match, meaning the abort was generated by another source. If left as is, the abort would be left intact, not ignored/masked, and could be handled as a real exception by the mainline code.

How can an application determine if it’s a test exception when there is no information in the Safety Manual about the test sequence and process?

CWD--> the address that is compared to would be the left side of the == statement. i.e., 0xFFF7A400U, but it seems this doesn't match the comment above the code so this is a bit confusing. This could certainly be verified by stepping through the code.

The address is a magic number obtained by looking in the code. Where is the specification for the test?

CWD--> I don't know the source of the note or the comment either. Certainly there is the possibility to read the FSR directly, if that is what the function call is doing. This would eliminate one level of abstraction that costs performance, but it would increase the footprint if it is used in many places (which it shouldn't be).

Even if I read the FSR by myself, where is the specification of the address that I should be expecting?

CWD--> I don't know the intent of the private API. It could be something that was simply created for the demo and is not part of the overall Diag Library (i.e., the evidence in the CSP doesn't apply to it). If it isn't officially part of the Diag Library, then the same functionality would need to be implemented, tested and validated by the integrator. Note that although the function to retrieve the flag might be part of the private API, the mask definition isn't, so this would lend itself to being used by your application to do the notification of test mode or the test status.

This is a private API of the SafeTI Diagnostics Library. If I need to implement this whole mechanism by myself, then where is the specification?

CWD--> Given the diagnostic code is provided as source, couldn't this information be derived from there? I understand that it would be better to have more detailed info in the documentation, but given we don't, is the source as a resource a possibility?

Are you serious?

This is a Functional Safety project.

I’m not sure that you understand the implications of your statement.

Are you really suggesting that I just ‘reverse engineer’ a large part of my safety code, from the ‘ifdef’ spaghetti code, where most of the comments concern why the code doesn’t conform to MISRA and is so heavily optimised that ‘C’ debugging is mostly impossible and it has to be debugged in assembler mode?

Which argument should I use to the Certification Authority?

I’m quite sure that ‘The TI Hercules Safety Microcontrollers Forum reply said to do it this way’ would not be accepted.

CWD--> Certainly there is still a resource need by the integrator to implement the diagnostic library. Essentially, we can't predict the use of the library or the requirements for a given application so there is some disconnect that has to be defined by the system integrator. Although the diagnostics provided are a big step in defining the diagnostic measures, there are still some decisions left to the developer to be made on how to handle the results of the diagnostics and the impact to the larger system implementation.

I completely understand your argument, but where is the documentation? There is no mention of any of this in the Safety Manual. Based on the information in the Safety Manual and the expectations of my manager, all I have to do is call the functions in the API and it will cover the measures as indicated in the Safety Manual. This is clearly not the case.

I am being put under an ever increasing amount of pressure to finish up this SafeTI Diagnostics Library integration. The Project Management was sold this solution as the SafeTI Diagnostics Library is pre-certified, there is a demo application and an evaluation board that shows it all working.

The demo application is a joke; it’s more of a hack to get a few tests to pass. Most of the RM46L430 tests are not even run, as the application only runs the tests that work on all of the targeted processors. It seems to be a HALCoGen-generated project that has been ifdef'd into submission. At one point, it even hacks the stack to shift the LR return value to skip over injected fault instructions, as the SafeTI Diagnostics Library has no mechanism to undo its own fault injection. How can this be considered a good ‘example’?

Based on the SafeTI Diagnostics Library API and the 'demonstrated' sequence of calls, it's not even possible to follow the guidelines in spna106d (Initialization of Hercules ARM Cortex-R4F). This is the only real specification of how to get the system up and running.

There are 37 '#if 0' blocks of removed functionality in the demo application sys_startup.c file, there are even a couple in the Flash SPI and the SafeTI Diagnostic source code.

There are 1088 MISRA violations in the SafeTI Diagnostics Library and TPS Driver, of which my favorite justification comment is:

/*Comment_1:
 * "Reason -  Needed"*/

Back to the stack hacking, here is the 'SafeTI Diagnostics Library stack hacking demonstration':

#if FUNCTION_PROFILING_ENABLED
{
if(SL_Profile_Struct[SL_Active_Profile_Testtype-TESTTYPE_MIN].aborthandler_entrytick == 0 )
{
    SL_Profile_Struct[SL_Active_Profile_Testtype - TESTTYPE_MIN].aborthandler_entrytick = entrytick;
    SL_Profile_Struct[SL_Active_Profile_Testtype-TESTTYPE_MIN].aborthandler_exittick = _pmuGetCycleCount_();
}
/* Update the return address, on stack, so that we return to the next instruction */
#if defined(_TMS570LS31x_) || defined(_TMS570LS12x_) || defined(_RM48x_) || defined(_RM46x_) || defined(_TMS570LC43x_) || defined(_RM57Lx_)
#ifdef __TI_COMPILER_VERSION__
	__asm(" LDR R0, [SP, #108]");
	__asm(" ADD R0, R0, #8");
	__asm(" STR R0, [SP, #108]");
#endif
#endif
#if defined(_RM42x_) || defined(_TMS570LS04x_)
#ifdef __TI_COMPILER_VERSION__
    	__asm(" add SP, SP, #4 ");
	__asm(" ldmfd	r13!, {r0 - r6, r12, lr} ");
	__asm(" subs	pc, lr, #4 " );
#endif
#endif
}
#else
{
/* Update the return address, on stack, so that we return to the next instruction */

#if defined(_TMS570LS31x_) || defined(_TMS570LS12x_) || defined(_RM48x_) || defined(_RM46x_) || defined(_TMS570LC43x_) || defined(_RM57Lx_)
#ifdef __TI_COMPILER_VERSION__

#if	OPTIMISATION_ENABLED

#if defined(_TMS570LC43x_) || defined(_RM57Lx_)
    	__asm(" LDR R0, [SP, #92]");
    	__asm(" ADD R0, R0, #8");
    	__asm(" STR R0, [SP, #92]");
#endif
#if defined(_TMS570LS31x_) || defined(_TMS570LS12x_) || defined(_RM48x_) || defined(_RM46x_)
    	__asm(" LDR R0, [SP, #100]");
    	__asm(" ADD R0, R0, #8");
    	__asm(" STR R0, [SP, #100]");
#endif

#else
    	__asm(" LDR R0, [SP, #100]");
    	__asm(" ADD R0, R0, #8");
    	__asm(" STR R0, [SP, #100]");
#endif

#endif
#endif

#if defined(_RM42x_) || defined(_TMS570LS04x_)

#ifdef __TI_COMPILER_VERSION__

    	__asm(" ldmfd	r13!, {r0 - r5, r12, lr} ");
    	__asm(" subs	pc, lr, #4 " );

#endif

#endif
}
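For reference, the return-address arithmetic behind the comment "so that we return to the next instruction" can be sketched as below. On a data abort, the ARM architecture sets the abort-mode link register to the address of the aborted instruction plus 8. The helper names are hypothetical, and the sketch does not attempt to model the fixed stack offsets (#92, #100, #108) used by the demo, which depend on the compiler-generated handler prologue.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural offset for data aborts: LR_abt = aborted instruction address + 8. */
#define DABT_LR_OFFSET 8u

/* Address to return to if the aborted instruction should be re-executed. */
uint32_t dabt_retry_address(uint32_t lr_abt)
{
    return lr_abt - DABT_LR_OFFSET;
}

/* Address to return to if the aborted instruction should be skipped.
 * insn_len is the size in bytes of the aborted instruction (4 in ARM state). */
uint32_t dabt_skip_address(uint32_t lr_abt, uint32_t insn_len)
{
    return lr_abt - DABT_LR_OFFSET + insn_len;
}

/* Host-side usage example: aborted ARM-state instruction at 0x00001000. */
void dabt_example(void)
{
    const uint32_t lr_abt = 0x00001008u;
    assert(dabt_retry_address(lr_abt) == 0x00001000u);
    assert(dabt_skip_address(lr_abt, 4u) == lr_abt - 4u); /* same as "subs pc, lr, #4" */
}
```

For a 4-byte instruction the skip address equals LR minus 4, which is consistent with the `subs pc, lr, #4` in the RM42x/TMS570LS04x branches above.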

0 Chuck Davenport over 8 years ago in reply to Mark Kingston

TI__Guru 59540 points

Hi Mark,

First, I understand your frustrations with the code. However, to be clear and to correct one very important misconception about the code, it is not pre-certified. The code is provided for free, as is, under a BSD license. It was developed in a certified process (which does not necessarily translate to certified SW) and there is a compliance support package to provide evidence of compliance to the functional safety standards. This evidence includes the static and dynamic test results and can be obtained together with the limited LDRA license to regression test the code in your application and generate evidence specific to your implementation. These are items necessary for system level assessment.

Also, it is important to highlight that the device level certification is not dependent on this specific SW set/library. The device certificate is strictly a HW certification. We have had and continue to have customers that implement the safety mechanisms on their own without the use of the SafeTI Diag Library. Also, please note that although we have included all safety measures in our certification assumptions, they are not all required by every system implementation, and, through application level justification, some/many can be eliminated through system level fault analysis. This is the purpose of including the FMEDA tool with our Detailed Safety Analysis Report in order to customize the safety metrics to the application.

At a high level, the intent of the SafeTI Diag Library was to provide some of the SW diagnostics to make it easier for our customers to implement many of the diagnostics specified in the safety manual. Clearly you do not believe that this software achieved that goal and for that I apologize. As I said before, I do believe there is a disconnect between the implementation and the documented mechanisms in the safety manual.

In regard to your comment " How can an application determine if it’s a test exception when there is no information in the Safety Manual about the test sequence and process?" This is beyond the scope of the Safety Manual. The Safety Manual is a high level document outlining the overall safety methodology (Safe Island/1oo1D) and architecture along with identification and high level description of the primary and secondary diagnostics. Implementation of the diagnostics is left to the integrator (as described in the Safety Analysis Reports) which may, in the long run, be more effective and easier to support than trying to use or support the SafeTI Diag library. The SafeTI Diagnostic Lib is an attempt to help the integrator at this task (even if poorly executed as you have previously noted).

In regard to documentation of each of the tests and the expected outcomes, etc. I will have to admit my lack of knowledge on this part of the documentation as I have not reviewed it in full detail. Certainly the content, magic numbers and all, within the code could probably have been documented better, at least with comments. I believe in many cases the values are either arbitrary or are tied to the specific tested feature and registers identified in the TRM.
