How to save context before periodic STC?

The TRM says that the following context must be saved:

1. CPU core registers (all modes R0-R15, PC, CPSR)
2. CP15 System Control Coprocessor registers - MPU control and configuration registers, Auxiliary Control Register used to enable ECC, Fault Status Register, etc.
3. CP13 Coprocessor Registers - FPU configuration registers, General Purpose Registers
4. Hardware breakpoint and watchpoint registers such as BVR, BCR, WVR, WCR, etc.

But how, specifically, should this context be saved?

  • The best way to do this is to set up an area in memory, such as a global/static array, and then write the content of the registers into the memory allocated for the save function. Upon detection of the CPU reset after an interval execution, the content of that memory can be restored by writing it back to the CPU registers.
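The save-area approach described in this reply can be sketched roughly as follows. This is a minimal, host-runnable illustration and not TI's reference code: the names `stc_context_t`, `fake_cpu_t`, `stc_context_save`, `stc_context_restore`, and the marker value are all hypothetical, and the "CPU" is a plain struct so the logic can be exercised off-target. On the real device the register reads and writes would be done in assembly (for example, `MRC p15, 0, <Rt>, c1, c0, 0` reads the CP15 System Control Register), and the save area is assumed to sit in RAM that the startup code does not clear after the CPU reset.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CTX_VALID_MARKER 0x53544358u   /* arbitrary tag, hypothetical */

/* Hypothetical save area; a real one would hold every register the TRM lists. */
typedef struct {
    uint32_t marker;         /* written last, so a partial save is never restored */
    uint32_t core_regs[16];  /* R0-R15 of the interrupted mode */
    uint32_t cpsr;
    uint32_t sctlr;          /* CP15 System Control Register */
    uint32_t actlr;          /* CP15 Auxiliary Control Register */
} stc_context_t;

/* Stand-in for the real CPU state so the sketch runs on a host machine. */
typedef struct {
    uint32_t core_regs[16];
    uint32_t cpsr;
    uint32_t sctlr;
    uint32_t actlr;
} fake_cpu_t;

/* Assumption: placed in memory that survives the CPU reset and is not re-initialized. */
static stc_context_t g_stc_context;

/* Copy the register state into the save area before starting an STC interval. */
void stc_context_save(const fake_cpu_t *cpu)
{
    assert(cpu != NULL);
    g_stc_context.marker = 0u;  /* invalidate while the save is in progress */
    memcpy(g_stc_context.core_regs, cpu->core_regs, sizeof g_stc_context.core_regs);
    g_stc_context.cpsr  = cpu->cpsr;
    g_stc_context.sctlr = cpu->sctlr;
    g_stc_context.actlr = cpu->actlr;
    g_stc_context.marker = CTX_VALID_MARKER;
}

/* After the CPU reset: returns 1 if a valid context was restored, 0 otherwise. */
int stc_context_restore(fake_cpu_t *cpu)
{
    assert(cpu != NULL);
    if (g_stc_context.marker != CTX_VALID_MARKER) {
        return 0;  /* nothing saved: treat as a normal startup */
    }
    memcpy(cpu->core_regs, g_stc_context.core_regs, sizeof cpu->core_regs);
    cpu->cpsr  = g_stc_context.cpsr;
    cpu->sctlr = g_stc_context.sctlr;
    cpu->actlr = g_stc_context.actlr;
    g_stc_context.marker = 0u;  /* consume the context so it is not reused */
    return 1;
}
```

The validity marker is a design choice of this sketch: it lets the reset handler distinguish "reset after a self-test interval" from any other reset before it trusts the saved values.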
  • I would also like clarification on why periodic execution is needed. In all of our work with the certifying bodies, we have not seen a requirement for periodic execution of the STC, and there may be another method for you to achieve your requirement. Please give us more details so we can provide alternatives to consider.
  • I told our customers that, as shown in the RM48 FMEDA, not using periodic STC is acceptable, but their safety consultant says all safety-related hardware resources must be diagnosed every 24 hours.
    So we have to find a way to execute STC and PBIST periodically.
  • This is a classic paradigm that comes from treating the Hercules lockstep architecture in the same way as a traditional CPU architecture. In most cases there is no need for additional execution of the CPU self-test or PBIST, because both the CPU and the RAM are monitored continuously by their respective diagnostics.

    In the case of lockstep cores, the second core is constantly executing the same instructions/code as the primary core, and the outputs of each core are continuously compared through the core compare module. If there is a malfunction in one of the two cores, it will be caught by this diagnostic (the compare). In addition, the code execution is delayed by 2.5 cycles to provide temporal diversity, and the cores are flipped and rotated relative to each other to provide physical diversity. This concept is very well thought out, has been reviewed by many certification entities, and has been widely accepted.

    In regard to PBIST, a one-time execution at startup proves integrity over the entirety of RAM. During application run time, SECDED takes over as the active diagnostic and prevents or reports any issues with RAM. Note that PBIST is not designed to be run periodically, because it is a destructive test. It may be possible to preserve some data by selectively moving it between RAM sections during the PBIST execution, but this could prove tricky and costly in terms of resources and performance.

    With these statements in mind, TI recognizes that there could certainly be applications where use of the periodic STC is warranted. For this reason, the module was designed with the capability to be run on a periodic basis. This is why the test ROM is divided into "intervals" such that one or more intervals can be executed each control loop. The number of intervals run per loop can then be equated to a time over which the CPU would be tested on a periodic basis. For example, if there are 24 intervals and 2 intervals are executed each control loop, where each loop is 1 ms, the STC will test the CPU to its full coverage capability every 12 ms of operation (24 intervals at 2 intervals/ms).
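The interval arithmetic in the example above can be written out as a small helper. This is only an illustrative sketch: the function name `stc_full_coverage_us` and its parameters are hypothetical and do not come from any TI library. It rounds up when the interval count does not divide evenly, since a partial final pass still needs a whole control loop.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Time needed for the STC to run all of its intervals once, in microseconds,
 * when a fixed number of intervals is executed in every control loop.
 */
uint32_t stc_full_coverage_us(uint32_t total_intervals,
                              uint32_t intervals_per_loop,
                              uint32_t loop_period_us)
{
    assert(intervals_per_loop > 0u);
    /* Ceiling division: a partial final batch still costs one full loop. */
    uint32_t loops = (total_intervals + intervals_per_loop - 1u) / intervals_per_loop;
    return loops * loop_period_us;
}
```

With the numbers from the reply, `stc_full_coverage_us(24, 2, 1000)` gives 12000 us, i.e. the 12 ms quoted above.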

    In contrast, I believe your intention is to execute the STC fully once in each 24-hour period of operation. Again, this is an alternative and can certainly be implemented, provided you have an acceptable measure of time in place. As another alternative, one might consider implementing a reset scheme where the MCU is reset every 24 hours, given that the core will be offline during the periodic self-test anyway. This also allows for the periodic execution of PBIST without the worry of it being destructive.

    In the end, all of these decisions are dependent on your application needs/requirements. My goal here is simply to provide options that could make the implementation simpler, since complexity breeds errors/bugs. Hopefully, these explanations help with your decisions.

  • Thank you for patiently answering my question. I know that executing STC and PBIST at runtime is optional, but our customer said that they discussed this with TI engineers (I don't know whether they were from TI China or TI global). The outcome of that discussion was to run STC and PBIST periodically, so we have to try.

    Back to the original question: I am not familiar with ARM. Could you show me some example code for saving the registers listed above? Even if there is no example, I am very grateful for your help. Thank you again.
  • Hello again,

    Upon further thought, consider the original requirement driving your question, i.e., "safety consultant says all of safety related hardware resources must be diagnosed in every 24 hours." The key point is that the lockstep cores are actually performing the diagnostic continuously, so the normal use case of the device meets this requirement. In addition, the ECC logic for RAM is also a continuous diagnostic; so, again, this requirement is met without executing the PBIST algorithms every 24 hours.

    Also, I would be surprised if the information from TI came from our "factory" team, as we generally do not recommend running STC on a periodic basis since it adds complexity to the application. Certainly, we would not advise running PBIST periodically without specific application information, since it is destructive. For this reason, we do not have specific examples of performing the context save/backup of CPU data readily available. Given it is currently the holidays here in the US, there are limited resources in the office. I will follow up after the holidays with my associates to see if any of them might have an example to provide.

    This doesn't mean that it isn't possible, simply that we don't usually advise customers to do so without a deep understanding of the application and the application requirements. We have worked with many different safety standards including IEC61508, ISO26262, ISO/EN13849, in addition to medical standards, aerospace standards, etc. and none of these have resulted in the strict requirements that you mentioned so I am concerned that the interpretations are too conservative. If you are aware of the specific industry standard you are designing to, this might also help us to analyze the requirements and make life easier for you with our experiences and background.
  • Hello,

    Please have a look at this post to see an example of backing up the CPU registers.

    http://e2e.ti.com/support/microcontrollers/hercules/f/312/t/122665
  • Hi User,

    Both of these tests are pretty intrusive. I think it almost has to be treated as if the CPU goes offline and comes online again, and then the question is what this means to the application. If you were going to save off application state to disk so that you could restart faster or more seamlessly, what state would you save and then restore?

    You should be able to simply use memcpy() to copy the RAM from one array to another for the PBIST tests. The biggest problem is that between saving and restoring the RAM contents, all the code probably needs to be written carefully so that it doesn't require any stack. Which means you might be inlining the memcpy() code ...
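As a rough illustration of the memcpy() idea above, the sketch below backs up a region, runs a destructive test on it, and then restores it. Everything here is hypothetical and host-runnable only: `fake_pbist` merely overwrites the buffer to stand in for the real hardware test, and the names `ram_under_test`, `backup`, and `run_test_preserving_ram` are invented for the example. On target, the backup buffer would have to live in a RAM group that is not being tested, and, as noted above, the code between save and restore would need to avoid using a stack located in the RAM under test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REGION_WORDS 64u

/* Stand-ins for the RAM region being tested and a backup area elsewhere. */
static uint32_t ram_under_test[REGION_WORDS];
static uint32_t backup[REGION_WORDS];

typedef void (*destructive_test_fn)(uint32_t *region, size_t words);

/* Save the region, run the destructive test, then put the contents back. */
void run_test_preserving_ram(destructive_test_fn test)
{
    assert(test != NULL);
    memcpy(backup, ram_under_test, sizeof backup);
    test(ram_under_test, REGION_WORDS);
    memcpy(ram_under_test, backup, sizeof ram_under_test);
}

/* Hypothetical placeholder for PBIST: it simply destroys the region contents. */
void fake_pbist(uint32_t *region, size_t words)
{
    for (size_t i = 0; i < words; ++i) {
        region[i] = 0xA5A5A5A5u;
    }
}
```

Calling `run_test_preserving_ram(fake_pbist)` leaves `ram_under_test` with the same contents it had before the call, even though the test overwrote every word in between.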

    For the LBIST test of the CPU - you're coming out of this through a reset. So I don't think the right analogy is a simple 'context save' like an RTOS would perform between tasks. Because quite a lot of initialization will need to be repeated after exiting this reset.
    The question probably should be whether it makes sense to restart your application after this test through an 'alternate' path compared to what you do after a power-on reset. Would you do this to recover faster or more seamlessly? If so, what state variables would your application code need to save and restore for this alternate initialization path? What I'm trying to convey here is that I think this problem is something you handle at the level of your 'C' application code, rather than in low-level assembly that tries to save/restore every nook of the processor state and hide the restart from the application, as you would perhaps hide an interrupt or task switch. It's going to be more intrusive than either of these analogies when you take the CPU offline to test it.
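One possible shape for the 'alternate' initialization path described in this reply is sketched below. It is an assumption-laden illustration, not a recommended implementation: the state fields (`setpoint`, `mode`), the `reset_cause_t` enum, the magic value, and the simple XOR check word are all hypothetical. On the real device the reset cause would come from the device's reset status registers, and the saved state is assumed to be kept in memory that the startup code leaves untouched after the self-test reset.

```c
#include <assert.h>
#include <stdint.h>

#define APP_STATE_MAGIC 0x41505053u   /* arbitrary tag, hypothetical */

typedef enum { RESET_POWER_ON, RESET_CPU_SELFTEST } reset_cause_t;

/* Hypothetical application state worth carrying across the self-test reset. */
typedef struct {
    uint32_t magic;
    uint32_t setpoint;
    uint32_t mode;
    uint32_t check;   /* simple integrity word over the other fields */
} app_state_t;

/* Assumption: located in RAM that is not cleared after the self-test reset. */
static app_state_t g_saved_state;

static uint32_t app_state_check(const app_state_t *s)
{
    return ~(s->magic ^ s->setpoint ^ s->mode);
}

/* Called by the application just before it starts the CPU self-test. */
void app_state_save(uint32_t setpoint, uint32_t mode)
{
    g_saved_state.magic = APP_STATE_MAGIC;
    g_saved_state.setpoint = setpoint;
    g_saved_state.mode = mode;
    g_saved_state.check = app_state_check(&g_saved_state);
}

/* Returns 1 if the warm (alternate) path was taken, 0 for a cold start. */
int app_init(reset_cause_t cause, app_state_t *live)
{
    assert(live != 0);
    if (cause == RESET_CPU_SELFTEST &&
        g_saved_state.magic == APP_STATE_MAGIC &&
        g_saved_state.check == app_state_check(&g_saved_state)) {
        *live = g_saved_state;      /* resume from the saved state */
        g_saved_state.magic = 0u;   /* consume it so it is not reused */
        return 1;
    }
    /* Cold start: fall back to defaults, as after a power-on reset. */
    live->magic = APP_STATE_MAGIC;
    live->setpoint = 0u;
    live->mode = 0u;
    live->check = app_state_check(live);
    return 0;
}
```

The point of the sketch is that only a handful of application-level variables are carried over, while everything else goes through normal initialization again.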
  • We have seen cases where a specific assessor requests a periodic BIST execution per time interval. The idea is to detect accumulated latent faults, especially in systems that run for long periods of time without a startup/shutdown sequence and the associated tests. Lockstep and online ECC tests are great, but they only check the resources that are currently being used. If you have code that rarely uses a memory space or a set of CPU functions (e.g., the FPU), then there is some possibility of accumulating faults that would not be detected until you used the resource.

    While the idea is sound, the probability calculations typically do not support that such testing is needed; this should be calculated for each system based on its specific operating conditions. With the detailed SAR document provided by TI you can use the spreadsheet to evaluate the expected failure rate over time and perform a "what if" analysis of failure rates with and without the periodic testing over a given interval.

    Regards,
    Karl