Understanding Flash ECC Correction disabled

guilherme.muller


I'm using HALCoGen 04.05.01 generated code on an RM48HDK with an RM48L952 (Die Rev. C, I believe) to validate the hardware initialization registers for the software team.

While implementing the SRAM ECC tests, I modified the BTCMECC bit of the Cortex-R4 cp15 Secondary Auxiliary Control Register to disable correction. With that change, I could reliably generate a data abort as soon as there was a single-bit error in SRAM, which was treated as an uncorrectable error. For us this is a must-have: our certification company doesn't really like correction, and both single- and double-bit errors are handled in the same handler.

But the same thing does not happen when injecting an error into the Flash ECC. Even with the ATCMECC bit set, only a double-bit error generates a data abort. I understand that accesses to the ECC, OTP and mirrored spaces are handled by the Flash Wrapper, while the main program flash (starting at address 0x00000000) is handled by the ATCM.

When injecting the error with the checkFlashECC function from the HALCoGen code, all I got was an ESM Group 1 error, because the function checks the mirrored flash memory space (handled by the Flash Wrapper). Changing the checked addresses to the main program flash does not generate any error at all, since the injection happens in the Flash Wrapper. Double-bit faults do generate data aborts.

My questions:

1) Before changing the checkFlashECC function, I tried to generate an ESM Group 3 error by setting EDACMODE to 0x05 while injecting the error, but without success. Only an ESM Group 1 error was signaled. Is that correct?

2) Will disabling correction on the ATCM port with the ATCMECC bit generate a data abort when a single-bit error is detected in the main program flash?

3) If the answer to question 2 is yes, how can I test it? I've tried to manually change the ECC for address 0x00000000 by changing address 0xF0000000 (the same way to test the SRAM ECC), but of course, that generates a data abort.

My major concern is that if main program flash correction is disabled and no abort or ESM flag is set, then it is possible that the main program keeps running with corrupted data.

over 9 years ago

0 Zhaohong Zhang over 9 years ago

TI__Mastermind 22715 points

The issue here is that you cannot inject a real error in Flash for normal Flash accesses. The Flash ECC is validated as follows.

(1) Create a working Flash image.

(2) Create an ECC image for the Flash image.

(3) Introduce an error in the original Flash image.

(4) Program the Flash image from (3) and the ECC image from (2).

(5) Run the test. You will see the expected behavior: a single-bit error causes an abort when correction is disabled.
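
Step (3) can be sketched on the host side as follows. This is only a minimal illustration, assuming the Flash image is available as a byte buffer and that the ECC image from step (2) has already been generated from the unmodified data. The helper names `inject_bit_error` and `flipped_bits` are hypothetical and not part of any TI tool.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: XOR `mask` into one byte of an in-memory flash image.
 * The ECC image must be generated BEFORE this is called, so that the stored
 * ECC no longer matches the corrupted data. Returns 0 on success, -1 if the
 * offset is outside the image. */
static int inject_bit_error(uint8_t *image, size_t image_len,
                            size_t offset, uint8_t mask)
{
    if (image == NULL || offset >= image_len) {
        return -1;
    }
    image[offset] ^= mask;
    return 0;
}

/* Count how many bits a mask flips: 1 gives a single-bit error,
 * 2 (within the same 64-bit word) gives a double-bit error. */
static unsigned flipped_bits(uint8_t mask)
{
    unsigned n = 0;
    while (mask != 0u) {
        n += mask & 1u;
        mask >>= 1;
    }
    return n;
}
```

A mask of 0x01 flips one bit and a mask of 0x03 flips two, which are the two cases of interest here.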

Thanks and regards,

Zhaohong

0 Charles Tsai over 9 years ago

TI__Guru**** 190986 points

Hello Guilherme,

I want to add to Zhaohong's reply. When the CPU accesses the flash through the mirrored address, the transaction takes place via the CPU's AXI-Slave interface. A single-bit ECC error is corrected on the fly, so you do not see an abort as you do when you access the SRAM directly via the BTCM interface. If you follow Zhaohong's suggestion (and I think you have already tried it), you will get an abort on the ATCM interface if you disable correction for the ATCM like you did for the BTCM. You can also check the FEDACSTATUS register in the flash wrapper to see whether any ECC errors were flagged.

0 guilherme.muller over 9 years ago

Intellectual 270 points

Zhaohong, Charles,

Thanks for the prompt reply.

I was looking for alternative ways of testing, and I found two.

1) Inject an ECC error through the linker, as described at processors.wiki.ti.com/.../Linker_Generated_ECC
The only problem with this option is that I get errors when linking with the syntax "--ecc:data_error=0x0030,0x01" (error #10416: error specified at 0x30 does not lie within an ECC input or output range). I've tried many addresses, all with the same error. I think this would be the easiest way to test the Flash ECC.

2) Read both of the addresses below:
- 64-bit data at address 0x701F8 will have a single-bit error.
- 64-bit data at address 0x701FC will have a double-bit error.

This is documented in this post: e2e.ti.com/.../1687572. It is valid for the F28377x, but it seems to work fine on the RM48 (using 32-bit reads, since the addresses are 4 bytes apart). I tested the first address, and it generated a data abort when correction was disabled. With correction enabled, an ESM Group 1 error was generated. The second address does not seem to work. This is not documented in any TI document.

I'll meet with the software team next week to try the procedure Zhaohong suggested. I confess that, being a hardware guy, it is sometimes hard for me to know exactly what to do in this procedure.

0 Charles Tsai over 9 years ago in reply to guilherme.muller

TI__Guru**** 190986 points

Hello Guilherme,

I think what might have happened is that a single-bit error is detected the first time you read 0x701F8. There is a hardware error cache built into the CPU, and the address 0x701F8 gets cached in it. The purpose of this error cache is to avoid retrying the same address over and over if there is a hard fault in the memory. When you then read 0x701FC, you are actually reading the upper 32-bit word of the 64-bit word at 0x701F8, so the cache tag address comparison matches and the CPU simply reads from the cache without accessing the flash memory. What you can try is to read 0x701FC first and see whether you get a double-bit ECC error.

0 guilherme.muller over 9 years ago

Intellectual 270 points

I've managed to create a single-bit and a double-bit error inside the flash memory using the linker. Both generate a data abort. The single-bit error generates a Group 1 channel 6 error and the double-bit error generates a Group 3 channel 7 error, as expected. It wasn't easy, though.

I had to modify the linker command file as follows.

/*----------------------------------------------------------------------------*/
/* Linker Settings                                                            */

/* USER CODE BEGIN (1) */
--retain="*(.intvecs)"
--unused_section_elimination=off
/* USER CODE END */

/*----------------------------------------------------------------------------*/
/* Memory Map                                                                 */

MEMORY
{

    VECTORS (X)  : origin=0x00000000 length=0x00000020
    FLASH0  (RX) : origin=0x00000020 length=0x0017FF00 vfill=0xFFFFFFFF
    ERR_INJ  (RX) : origin=0x0017FF20 length=0x000000C0 vfill=0xFFFFFFFF
    FLASH1  (RX) : origin=0x00180000 length=0x00180000 vfill=0xFFFFFFFF
    STACKS  (RW) : origin=0x08000000 length=0x00001500
    RAM     (RW) : origin=0x08001500 length=0x0003EB00
    ECC_VEC  : origin=0xf0400000 length=0x000004 ECC={ input_range=VECTORS }
    ECC_FLA0 : origin=0xf0400004 length=0x02FFE0 ECC={ input_range=FLASH0  }
    ECC_ERR  (RX) : origin=0xf0400004+0x02FFE0 length=0x00000018 ECC={ input_range=ERR_INJ  }
    ECC_FLA1 : origin=0xf0430000 length=0x030000 ECC={ input_range=FLASH1  }

/* USER CODE BEGIN (2) */
/* USER CODE END */
}

// 196604

/* USER CODE BEGIN (3) */
ECC {
   algoR4F021 : address_mask = 0x003ffff8 /* Address Bits 21:3 */
                 hamming_mask = R4         /* Use R4 build in Mask */
                 parity_mask  = 0x0c       /* Set which ECC bits are Even and Odd parity */
                 mirroring    = F021       /* RM4x and TMS570LSx are build in F021 */
}
/* USER CODE END */


/*----------------------------------------------------------------------------*/
/* Section Configuration                                                      */

SECTIONS
{
    .intvecs : {} > VECTORS
    .text    : {} > FLASH0 | FLASH1
    .const   : {} > FLASH0 | FLASH1
    .cinit   : {} > FLASH0 | FLASH1
    .pinit   : {} > FLASH0 | FLASH1
    .ErrInj  : {} > ERR_INJ
    .bss     : {} > RAM
    .data    : {} > RAM
    .sysmem  : {} > RAM


/* USER CODE BEGIN (4) */

/* USER CODE END */
}



/* USER CODE BEGIN (5) */
--ecc:data_error=0x0017FF20,0x01
--ecc:data_error=0x0017FF28,0x03
/* USER CODE END */
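
The ECC origins and lengths in the MEMORY map above follow a regular pattern: one ECC byte per 8 bytes (64 bits) of flash, offset from 0xf0400000. The sketch below is inferred only from the numbers in this linker file, not from a TI reference, and the helper names `ecc_origin` and `ecc_length` are hypothetical. It may help when carving out a new region such as ERR_INJ.

```c
#include <stdint.h>

/* Base of the flash ECC space, taken from the ECC_VEC origin in the
 * MEMORY map above (assumption: flash address 0 maps to this address). */
#define ECC_BASE 0xF0400000u

/* One ECC byte covers 8 data bytes, so flash addresses are divided by 8. */
static uint32_t ecc_origin(uint32_t flash_origin)
{
    return ECC_BASE + (flash_origin >> 3);
}

/* The ECC region is one eighth the size of the flash region it covers. */
static uint32_t ecc_length(uint32_t flash_length)
{
    return flash_length >> 3;
}
```

For the ERR_INJ region (origin 0x0017FF20, length 0xC0) this gives an ECC origin of 0xF042FFE4, which equals 0xf0400004+0x02FFE0, and a length of 0x18, matching the ECC_ERR entry.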

The --unused_section_elimination=off option saved my life; without it, the linker kept giving an "uninitialized data" error. Then I created two variables, one with a single-bit and one with a double-bit error, placed in section .ErrInj using attributes.

const unsigned long long __attribute__((section(".ErrInj"))) test  = 3;
const unsigned long long __attribute__((section(".ErrInj"))) test2  = 83;

When test is read, a single-bit ECC error is generated. When test2 is read, a double-bit ECC error is generated.
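
A minimal sketch of how the reads might be forced is shown below. It assumes the two constants land at 0x0017FF20 and 0x0017FF28 in that order, which depends on how the linker places them. The names `err_single`, `err_double` and `read_flash_word` are hypothetical stand-ins for the variables above. The volatile access is there so the compiler cannot fold the constant and skip the actual flash read.

```c
#include <stdint.h>

/* Place the variables in .ErrInj only when building with the TI compiler;
 * on any other compiler the attribute is dropped so the sketch still builds. */
#if defined(__TI_COMPILER_VERSION__)
#define ERR_INJ_SECTION __attribute__((section(".ErrInj")))
#else
#define ERR_INJ_SECTION
#endif

/* Hypothetical names; the values mirror the two constants in the post. */
const uint64_t ERR_INJ_SECTION err_single = 3;
const uint64_t ERR_INJ_SECTION err_double = 83;

/* Read through a volatile pointer so the load really reaches the flash.
 * On the target, this load is what triggers the ECC check. */
static uint64_t read_flash_word(const volatile uint64_t *addr)
{
    return *addr;
}
```

Calling `read_flash_word(&err_single)` and then `read_flash_word(&err_double)` from the test code would exercise the single-bit and double-bit cases respectively.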
