smxRTOS with separation between boot code and app code along with ECC errors and ESM

Juan Martinez

Other Parts Discussed in Thread: TMS570LS20216

Hello,

I am using smx RTOS version 4.0.0 with IAR Embedded Workbench 6.0 (and a C++ compiler) on a TMS570LS20216 device. I am trying to create boot code that resides in Bank 0 of the device and acts as a downloader: it receives a binary file and programs it into Bank 1, along with ECC values that I calculate and program to flash using the F035 API. Once I finish programming the new app code, I want to branch to it so that it runs, and once there I want to enable ECC (i.e. EDACEN = 0xA according to the Technical Reference Manual).

What I'm not sure about is what happens next if correctable or uncorrectable errors occur.

1) Does hardware (CPU and Flash wrapper) fix correctable errors automatically and continue executing the code without stopping?

2) Do I need to enable the ECC checking inside CPU (as shown in p.274 of reference manual) so that correctable errors are automatically fixed? Or would the flash wrapper enable (EDACEN = 0xA) be enough for hardware to automatically fix correctable errors?

For uncorrectable errors, I've seen in document spns141f that there is a "Flash (ATCM) - uncorrectable error", which causes a non-maskable interrupt because it is connected to Group2 channel 4, but the document also shows a "Flash (ATCM) - ECC uncorrectable error", which causes an Abort because it is connected to Group3 channel 7. The first case is much easier for me to handle because I can update the VIM ISR vector table in the app code so that the correct service routine is executed. The second case is really bad: the Abort will cause the app code to jump to the abort vector in Bank 0, even though Bank 0 is not part of the app code's memory space, and in theory the boot code in Bank 0 has no knowledge of how to handle uncorrectable errors that occurred in the app code. So...

3) What happens next? Is the uncorrectable error that is asserted the one that causes the interrupt or the abort?

4) If the Abort exception is the one that will run, which Abort exception will occur? Data or Prefetch Abort?

5) What is the difference between those two types of uncorrectable errors that force them to cause two different types of exceptions?

As you can see, this is aimed at being able to implement ISRs for either correctable or uncorrectable errors once ECC is enabled, so I guess the general question of this very long post is:

6) How do I implement service routines for ECC checking if errors occur in the device?

I would appreciate any help you can give me,

Juan Martinez

over 14 years ago

0 Juan Martinez over 14 years ago

Prodigy 170 points

Hello? Anybody there?

-Juan

0 KGreb over 14 years ago

TI__Mastermind 23000 points

Hi Juan,

You may wish to reference the Hercules safety manual for the TMS570LS31x/21x products to aid in understanding the ECC logic. There are some differences in the devices but with respect to the ECC the products are the same.

1. The ECC logic in the CPU will perform an inline correction of detected errors in flash using the Cortex R4F's hard error cache. As the flash memory is non-volatile, it is not possible for the CPU to write back a correction to the flash and refetch, as it can with the SRAM.

2. ECC checking must be enabled both in the flash wrapper and in the CPU. Depending on how you configure the CPU, you can select whether you wish to detect only or to perform detection and correction.
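
As a rough illustration of the flash wrapper half of that enable, here is a minimal sketch in C. It assumes EDACEN occupies the low four bits of the wrapper's ECC control register and uses the 0xA key quoted from the TRM earlier in this thread; the helper names are hypothetical, and the register address is deliberately left to the caller because it is not given here. The CPU-side enable goes through CP15 and is not shown, since it cannot run off-target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: EDACEN is the low nibble of the flash wrapper ECC control
 * register. 0xA is the enable key quoted from the TRM in this thread. */
#define EDACEN_MASK   0xFu
#define EDACEN_ENABLE 0xAu

/* Hypothetical helper: read-modify-write so the other control bits survive.
 * On target, 'ctrl' must point at the real register (address from the TRM). */
static void flash_ecc_enable(volatile uint32_t *ctrl)
{
    assert(ctrl != NULL);
    uint32_t v = *ctrl;
    v &= ~(uint32_t)EDACEN_MASK;   /* clear the old EDACEN key */
    v |= EDACEN_ENABLE;            /* write the enable key */
    *ctrl = v;
}

/* Hypothetical helper: returns nonzero when the enable key is present. */
static int flash_ecc_is_enabled(const volatile uint32_t *ctrl)
{
    assert(ctrl != NULL);
    return (*ctrl & EDACEN_MASK) == EDACEN_ENABLE;
}
```

Taking the register as a pointer keeps the sketch testable on a host by pointing it at an ordinary variable.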

The group 2 Flash (ATCM) uncorrectable error is generated by a number of possible causes in the flash wrapper. These are not well documented in the current TRM flash wrapper section so I have requested an enhancement. An example of such an error is bad address parity received from the CPU on a flash access. For such a case, the failure is unrecoverable so we generate an NMI and an error pin response.

The group 3 Flash ATCM uncorrectable error entry to ESM is an ECC error managed directly by the CPU as a program or data abort depending on the type of access performed. As the CPU should directly execute the abort handler, it is not necessary to also generate an interrupt response; only the error pin response is needed.
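
The two responses described above can be summarized as a small lookup. This is only a sketch of the behavior stated in this reply: the type and function names are hypothetical, and Group 1 is marked as not covered because it is not discussed here.

```c
#include <assert.h>
#include <stdbool.h>

/* Response of the device to an ESM error, as described in this thread. */
typedef struct {
    bool known;      /* false for groups this thread does not describe */
    bool nmi;        /* ESM raises a non-maskable interrupt */
    bool error_pin;  /* external error pin is driven */
    bool cpu_abort;  /* the CPU itself takes a data or prefetch abort */
} esm_response_t;

/* Hypothetical helper mapping an ESM group number to its response. */
static esm_response_t esm_response_for_group(int group)
{
    esm_response_t r = { false, false, false, false };
    assert(group >= 1 && group <= 3);  /* the ESM has three groups */
    switch (group) {
    case 2:  /* e.g. flash wrapper diagnostics such as bad address parity */
        r.known = true; r.nmi = true; r.error_pin = true;
        break;
    case 3:  /* ECC error handled directly by the CPU as an abort */
        r.known = true; r.error_pin = true; r.cpu_abort = true;
        break;
    default: /* Group 1 is not described in this thread */
        break;
    }
    return r;
}
```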

When you are performing your write and swap operation, have you considered using the memory swap feature to swap the base addresses of the flash and the SRAM? By using this feature you could provide secondary handlers for the aborts in the SRAM while you are performing your flash management. This is done in our flash programming tools and is good practice in any case, as you should always be able to handle CPU aborts during operation. This is independent of ECC operation - how would you manage if other aborts were generated during this process?

3. Group 2 ESM errors will generate an NMI and an external error pin response. Group 3 ESM errors will only generate the error pin response as they should already have a CPU abort issued.

4. Data abort will occur if the fault is on a data access. Prefetch abort will occur if the fault is on an instruction access.

5. The difference is that one is related to ECC embedded in the CPU, while the other is related to additional diagnostics which are located outside the CPU.

6. ECC errors should be managed by the CPU abort handlers you implement, as the ECC diagnostic is located in the core and generates CPU aborts upon faults.
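
To make points 4 and 6 concrete, here is a minimal sketch of the decision logic an abort handler might use. It assumes the ARMv7-R fault status layout (status = bit 10 followed by bits 3..0 of DFSR or IFSR) and the encodings 0x19 and 0x18 for synchronous and asynchronous parity/ECC errors; please verify both against the Cortex-R4 TRM. Reading the fault status register needs a CP15 access in the vector stub, which is not shown, and all type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { ABORT_DATA, ABORT_PREFETCH } abort_kind_t;
typedef enum { CAUSE_ECC_SYNC, CAUSE_ECC_ASYNC, CAUSE_OTHER } abort_cause_t;
typedef enum { ACTION_REPORT_ECC, ACTION_HALT } abort_action_t;

/* Assumed layout: 5-bit status = FSR bit 10 (MSB) plus FSR bits 3..0. */
static uint32_t fault_status(uint32_t fsr)
{
    return (((fsr >> 10) & 1u) << 4) | (fsr & 0xFu);
}

/* Assumed encodings: 0x19 synchronous, 0x18 asynchronous parity/ECC error. */
static abort_cause_t classify_fault(uint32_t fsr)
{
    switch (fault_status(fsr)) {
    case 0x19u: return CAUSE_ECC_SYNC;
    case 0x18u: return CAUSE_ECC_ASYNC;
    default:    return CAUSE_OTHER;
    }
}

/* Hypothetical policy hook. 'kind' says which vector was taken: data abort
 * for a faulting data access, prefetch abort for an instruction fetch.
 * 'fsr' is the DFSR or IFSR value read by the vector stub. */
static abort_action_t on_abort(abort_kind_t kind, uint32_t fsr)
{
    assert(kind == ABORT_DATA || kind == ABORT_PREFETCH);
    return (classify_fault(fsr) == CAUSE_OTHER) ? ACTION_HALT
                                                : ACTION_REPORT_ECC;
}
```

Keeping the classification separate from the vector stubs lets the same logic serve both abort vectors, and it can be exercised on a host with hand-built register values.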

Regards,

Karl

0 Juan Martinez over 14 years ago in reply to KGreb

Prodigy 170 points

Hi Karl,

What is the memory swap feature that you mention? How do I use it? I have not come across that feature in my use of the TMS570LS20216 or in the reference manual.

We have not considered a way to handle aborts, since an abort for us means a catastrophic failure, and so the abort handlers execute an infinite loop.

Thank you for your answer,

Juan

0 KGreb over 14 years ago in reply to Juan Martinez

TI__Mastermind 23000 points

Hi Juan,

The feature is detailed in section 1.3.1, "Boot Memory Selection", P. 18 of the TMS570LS20x/10x TRM. Basically if you write a key to the BMMCR1 register and then initiate a CPU reset via the MMUGCR register, the CPU will reboot with the SRAM mapped to 0x0 and the flash mapped to 0x00800000. We use this approach in our own programming tools to execute from SRAM when we are programming flash.
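
The two-step sequence can be sketched roughly as follows. This is not taken from our tools: the struct and function names are hypothetical, and the key value and the reset request value are passed in as parameters because they are not given in this thread and must come from the TRM. The registers are reached through pointers so the sketch can be tried on a host with ordinary variables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical handle for the two registers named above. On target these
 * point at the real BMMCR1 and MMUGCR addresses taken from the TRM. */
typedef struct {
    volatile uint32_t *bmmcr1;  /* boot memory swap key register */
    volatile uint32_t *mmugcr;  /* register used to request the CPU reset */
} swap_regs_t;

/* Step 1: write the swap key. Step 2: request a CPU reset.
 * 'swap_key' and 'reset_request' are assumptions supplied by the caller. */
static void request_memory_swap(const swap_regs_t *regs,
                                uint32_t swap_key,
                                uint32_t reset_request)
{
    assert(regs != NULL && regs->bmmcr1 != NULL && regs->mmugcr != NULL);
    *regs->bmmcr1 = swap_key;       /* select SRAM at 0x0 after reset */
    *regs->mmugcr = reset_request;  /* on target, the CPU restarts here */
}
```

On real hardware the second write does not return in any useful sense, so nothing that matters should follow the call.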

Regards,

Karl

0 Juan Martinez over 14 years ago in reply to KGreb

Prodigy 170 points

That is a very interesting solution, but how do you make sure that after the swap (and reset) the RAM contains the code to execute? Wouldn't the RAM have been cleared by the reset?

-Juan

0 KGreb over 14 years ago in reply to Juan Martinez

TI__Mastermind 23000 points

Hi Juan,

You must of course program valid code into the SRAM from your bootloader before you swap the memory and reset the CPU. So long as power is not removed from the device, the SRAM will maintain its contents.
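
A minimal sketch of that staging step, assuming the code to run after the swap is available to the bootloader as a byte image. The function name is hypothetical. Note that the image would need to be built to execute from address 0x0, since that is where the SRAM appears after the swap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy a code image into SRAM and read it back.
 * Returns 1 when the image fits and the read-back matches, otherwise 0.
 * Only request the swap and reset when this returns 1. */
static int stage_image(uint8_t *sram_dst, size_t sram_size,
                       const uint8_t *image, size_t image_size)
{
    assert(sram_dst != NULL && image != NULL);
    if (image_size == 0 || image_size > sram_size) {
        return 0;                                   /* nothing to copy, or too big */
    }
    memcpy(sram_dst, image, image_size);            /* stage the code */
    return memcmp(sram_dst, image, image_size) == 0; /* verify the copy */
}
```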

Note that it is also possible to take these steps using a debugger such as Code Composer Studio if you want to check out the functionality. The bits in question are simply memory mapped - the device does not care whether they are written by the CPU or by the debugger.

Regards,

Karl
