How does ECC work on the TMS570LC43x? RAM, flash and other locations

Other Parts Discussed in Thread: TMS570LC4357, HALCOGEN

Hi TI team,

I need some clarification about the ECC mechanisms in the TMS570LC4357.


a) Is my understanding of SECDED ECC on RAM data correct?

  •  For 32-bit, 16-bit, or 8-bit read/write accesses to RAM, ECC is performed by the L2RAMW module
    •   on a single-bit error during a write, the ESM interrupt Group 1 Channel 26 (L2RAMW - correctable error) will trigger
    •   on a multiple-bit error during a write or read, the nERROR pin will be activated (ESM 3.3)
  •  For 64-bit read/write accesses, ECC is performed by the Cortex-R5F
    •   on a single-bit error during a write or read, the ESM interrupt 1.4 will be triggered
    •   on a multiple-bit error during a write or read, the ESM interrupt 2.21 will be triggered

 So if I want to monitor single-bit errors on reads, I have to access my RAM with 64-bit accesses to get the ESM interrupt 1.4.

b) What happens once I read data in RAM that has a single-bit error corrected by the ECC?

  •  Is the data corrected inside the RAM itself?

 or

  •  Is the data left uncorrected in RAM and corrected again on each access? (I know the EPC stores the addresses of correctable errors to avoid repeated ECC errors on the same address.)




c) I also want to use the EPC to monitor my correctable errors. Will this bit of code work? (I'm using a HALCoGen-generated project.)

 uint64_t read_value;
 uint32_t i;
 
 epcCAMInit();                                // reset all CAM entries to 'available'
 
 read_value = *(volatile uint64_t *) 0x0800BEEF; // read data
 
 for(i=0;i<32;i++)                            // for each CAM index
 {
  if(epcCheckCAMEntry(i) == false)            // if this CAM index is occupied
  {
   if(epcREG1->CAM_CONTENT[i] == 0x0800BEEF)  // verify that the error address is the one we just read
   {
    printf("data at address 0x0800BEEF was corrected by SECDED\n");
   }
  }
 }

 
d) Will the same mechanism work for flash ECC error monitoring? If I access flash with 64-bit accesses, will the Cortex-R5F perform the SECDED ECC instead of the L2FMC?


e) Same question as d), for the EMIF.

f) I heard about a hard error cache, but I can't find it in the TRM.
When a correctable error is detected in flash, is the corrected data stored in some cache? If so, how do I clear this cache to see the error again?

g) Is there example code for ECC error injection in RAM, flash and EMIF?

Thanks in advance,

Benjamin

  • Hi Benjamin,

    Please find my answers below.

    a) Is my understanding of SECDED ECC on RAM data correct?

    •  For 32-bit, 16-bit, or 8-bit read/write accesses to RAM, ECC is performed by the L2RAMW module
      •   on a single-bit error during a write, the ESM interrupt Group 1 Channel 26 (L2RAMW - correctable error) will trigger
      •   on a multiple-bit error during a write or read, the nERROR pin will be activated (ESM 3.3)
    •  For 64-bit read/write accesses, ECC is performed by the Cortex-R5F
      •   on a single-bit error during a write or read, the ESM interrupt 1.4 will be triggered
      •   on a multiple-bit error during a write or read, the ESM interrupt 2.21 will be triggered

     So if I want to monitor single-bit errors on reads, I have to access my RAM with 64-bit accesses to get the ESM interrupt 1.4.

    Charles>> First of all, ECC checking is done inside the CPU regardless of the access size (8, 16, 32 or 64 bits). Each 64-bit word has a corresponding 8-bit ECC. Even if the CPU performs only an 8-bit read, the L2RAMW returns the entire 64-bit word and its 8-bit ECC to the CPU. The CPU performs the ECC check first and then loads the requested byte into its register. The same applies to the other sizes.

    When the CPU performs a byte write, it still generates the 8-bit ECC along with 64 bits of data, even though only 8 bits are meant to be written to the destination. Once the 64-bit data and the 8-bit ECC reach the L2RAMW, the L2RAMW evaluates the ECC first. So yes, there is also ECC checking inside the L2RAMW for write accesses. After that check, the byte to be written is combined with the other 7 bytes from the memory in a so-called 'read-modify-write' operation. The ECC is then recomputed over the final 8 bytes before they are written to the memory. So the L2RAMW performs two ECC operations for a write access.
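
The byte-write sequence described above can be sketched as a small host-side model. This is an illustration only, not TI code: `check_byte` is a simple XOR fold standing in for the device's real SECDED code, and `ram_word`, `bus_write`, `cpu_byte_write` and `ramw_byte_write` are hypothetical names. The sketch shows only the ordering of the steps: the CPU generates a check byte over the full 64-bit beat, the RAM wrapper verifies it, merges the target byte with the seven bytes already in memory, and regenerates the check byte before storing.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in check byte: XOR of the eight data bytes. The real device uses a
   SECDED code; this placeholder only keeps the sketch short. */
static uint8_t check_byte(uint64_t data)
{
    data ^= data >> 32;
    data ^= data >> 16;
    data ^= data >> 8;
    return (uint8_t)data;
}

/* One 64-bit memory word plus its check byte, as stored in RAM. */
typedef struct { uint64_t data; uint8_t ecc; } ram_word;

/* What the CPU drives for a byte write: a full 64-bit beat, its check byte,
   and the byte lane (0..7) that is actually meant to be written. */
typedef struct { uint64_t data; uint8_t ecc; unsigned lane; } bus_write;

/* CPU side: place the byte in its lane and generate the check byte. */
static bus_write cpu_byte_write(unsigned lane, uint8_t value)
{
    assert(lane < 8u);
    bus_write w = { (uint64_t)value << (8u * lane), 0u, lane };
    w.ecc = check_byte(w.data);
    return w;
}

/* RAM wrapper side. Returns 0 on success, -1 if the incoming beat fails
   its check (memory is left untouched in that case). */
static int ramw_byte_write(ram_word *mem, bus_write w)
{
    /* First ECC operation: verify the beat received from the CPU. */
    if (check_byte(w.data) != w.ecc) {
        return -1;
    }
    /* Read-modify-write: merge the target byte with the other seven. */
    uint64_t mask = 0xFFull << (8u * w.lane);
    uint64_t merged = (mem->data & ~mask) | (w.data & mask);
    /* Second ECC operation: regenerate the check byte over the merged word. */
    mem->data = merged;
    mem->ecc = check_byte(merged);
    return 0;
}
```

Writing `0xAB` to lane 2 of a word holding `0x1122334455667788` leaves `0x1122334455AB7788` in the model, with a freshly computed check byte. Checking the stored word during the read half of the read-modify-write is not modeled here.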

    b) What happens once I read data in RAM that has a single-bit error corrected by the ECC?

    •  Is the data corrected inside the RAM itself?

     or

    •  Is the data left uncorrected in RAM and corrected again on each access? (I know the EPC stores the addresses of correctable errors to avoid repeated ECC errors on the same address.)

     

    Charles>> Please refer to the Memory Scrubbing feature and the MSE bit in the RAMCTRL register in the TRM. If you enable the MSE bit, a single-bit error is first detected by the L2RAMW, and the corrected data is written back to the memory. The corrected data later returned to the CPU undergoes another ECC check inside the CPU. You might ask why ECC is checked in both places. The final ECC check at the CPU covers the complete path between the CPU and the L2RAMW. Other modules, such as the interconnect, sit between the CPU and the L2RAMW. Even if the data is corrected by the L2RAMW, there is still no guarantee that there are no faults at the interconnect level.
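
To make the effect of scrubbing concrete, here is a small host-side model. It assumes a textbook extended Hamming (72,64) SECDED code; the code and bit ordering actually used on the device are not given in this thread and will differ. All names (`ram_read`, `scrub_enabled`, `ecc_correct`, and so on) are hypothetical, and only the L2RAMW-side check is modeled, not the second check inside the CPU.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { ECC_OK, ECC_CORRECTED, ECC_UNCORRECTABLE } ecc_status;
typedef struct { uint64_t data; uint8_t ecc; } ram_word;

/* Codeword position (3..71) of data bit i: the i-th non-power-of-two. */
static unsigned bit_position(unsigned i)
{
    assert(i < 64u);
    unsigned pos = 2u;
    for (unsigned n = 0u; n <= i; n++) {
        do { pos++; } while ((pos & (pos - 1u)) == 0u);
    }
    return pos;
}

/* XOR of the positions of all set data bits (7-bit Hamming syndrome). */
static uint8_t hamming(uint64_t d)
{
    uint8_t syn = 0u;
    for (unsigned i = 0u; i < 64u; i++) {
        if ((d >> i) & 1u) syn ^= (uint8_t)bit_position(i);
    }
    return syn;
}

static unsigned parity(uint64_t v)
{
    v ^= v >> 32; v ^= v >> 16; v ^= v >> 8;
    v ^= v >> 4;  v ^= v >> 2;  v ^= v >> 1;
    return (unsigned)(v & 1u);
}

/* Bits 0..6: Hamming check bits. Bit 7: overall parity of the codeword. */
static uint8_t ecc_encode(uint64_t d)
{
    uint8_t chk = hamming(d);
    return (uint8_t)(chk | ((parity(d) ^ parity(chk)) << 7));
}

/* Check a word and repair a single flipped bit in place. */
static ecc_status ecc_correct(ram_word *w)
{
    uint8_t syn = (uint8_t)(hamming(w->data) ^ (w->ecc & 0x7Fu));
    unsigned odd = parity(w->data) ^ parity(w->ecc);
    if (syn == 0u && !odd) return ECC_OK;
    if (!odd) return ECC_UNCORRECTABLE;          /* even flip count: double-bit */
    if (syn == 0u) { w->ecc ^= 0x80u; return ECC_CORRECTED; }  /* parity bit */
    if ((syn & (syn - 1u)) == 0u) { w->ecc ^= syn; return ECC_CORRECTED; } /* check bit */
    for (unsigned i = 0u; i < 64u; i++) {
        if (bit_position(i) == syn) { w->data ^= 1ull << i; return ECC_CORRECTED; }
    }
    return ECC_UNCORRECTABLE;                    /* syndrome outside 3..71 */
}

/* Read through the RAM wrapper model. With scrubbing enabled, a corrected
   word is written back; otherwise memory keeps the flipped bit. */
static ecc_status ram_read(ram_word *mem, int scrub_enabled, uint64_t *out)
{
    ram_word copy = *mem;
    ecc_status st = ecc_correct(&copy);
    if (st == ECC_CORRECTED && scrub_enabled) *mem = copy;
    *out = copy.data;
    return st;
}
```

In this model, with `scrub_enabled` set to 0, every read of a word holding a flipped bit reports `ECC_CORRECTED` again. With it set to 1, the first read repairs the stored word and the next read reports `ECC_OK`.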


    c) I also want to use the EPC to monitor my correctable errors. Will this bit of code work? (I'm using a HALCoGen-generated project.)

    uint64_t read_value;
    uint32_t i;

    epcCAMInit();                                // reset all CAM entries to 'available'
    read_value = *(volatile uint64_t *) 0x0800BEEF; // read data

    for(i=0;i<32;i++)                            // for each CAM index
    {
     if(epcCheckCAMEntry(i) == false)            // if this CAM index is occupied
     {
      if(epcREG1->CAM_CONTENT[i] == 0x0800BEEF)  // verify that the error address is the one we just read
      {
       printf("data at address 0x0800BEEF was corrected by SECDED\n");
      }
     }
    }

     

    Charles>> What is 0x0800BEEF? Why are you doing a 64-bit read from a byte address like this? It is an unaligned access. Otherwise I think it will work.
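
The alignment point can be checked with a few lines of plain C. This is a minimal sketch with hypothetical helper names (`is_aligned64`, `align_down64`, `load64`), not part of HALCoGen. A 64-bit access needs an address that is a multiple of 8, and each ECC byte covers one such 8-byte word.

```c
#include <assert.h>
#include <stdint.h>

/* Non-zero if addr is a multiple of 8, as a 64-bit access requires. */
static int is_aligned64(uintptr_t addr)
{
    return (addr & 7u) == 0u;
}

/* Address of the 64-bit word (8 data bytes sharing one ECC byte)
   that contains addr. */
static uintptr_t align_down64(uintptr_t addr)
{
    return addr & ~(uintptr_t)7u;
}

/* 64-bit load that rejects unaligned pointers before touching memory. */
static uint64_t load64(const void *p)
{
    assert(is_aligned64((uintptr_t)p));
    return *(const volatile uint64_t *)p;
}
```

For the address in the question, `is_aligned64(0x0800BEEF)` is 0 and `align_down64(0x0800BEEF)` gives `0x0800BEE8`, which is the word a 64-bit read covering that byte would have to target.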


    d) Will the same mechanism work for flash ECC error monitoring? If I access flash with 64-bit accesses, will the Cortex-R5F perform the SECDED ECC instead of the L2FMC?

    Charles>> Unlike the L2RAMW, the L2FMC does not do any ECC checking. Whatever the access size, ECC is done only by the CPU.


    e) Same question as d), for the EMIF.

    Charles>> There is no ECC on the EMIF. The LC4357 EMIF interface does not support external memories with ECC.

    f) I heard about a hard error cache, but I can't find it in the TRM.
    When a correctable error is detected in flash, is the corrected data stored in some cache? If so, how do I clear this cache to see the error again?

    Charles>> Please refer to this post  for additional information about hard error cache. 

    g) Is there example code for ECC error injection in RAM, flash and EMIF?

     

    Charles>> Please refer to the  . The SafeTI library implemented various diagnostics to check flash and RAM.

  • Charles,

    First of all, thank you very much. You clarified a whole lot of points for me.

    Charles Tsai said:
    b) What happens once I read data in RAM that has a single-bit error corrected by the ECC?
    •  Is the data corrected inside the RAM itself?

     or

    •  Is the data left uncorrected in RAM and corrected again on each access? (I know the EPC stores the addresses of correctable errors to avoid repeated ECC errors on the same address.)

    Charles>> Please refer to the Memory Scrubbing feature and the MSE bit in the RAMCTRL register in the TRM. If you enable the MSE bit, a single-bit error is first detected by the L2RAMW, and the corrected data is written back to the memory. The corrected data later returned to the CPU undergoes another ECC check inside the CPU. You might ask why ECC is checked in both places. The final ECC check at the CPU covers the complete path between the CPU and the L2RAMW. Other modules, such as the interconnect, sit between the CPU and the L2RAMW. Even if the data is corrected by the L2RAMW, there is still no guarantee that there are no faults at the interconnect level.

    So, EPC masking behavior aside, if Memory Scrubbing is disabled, every read access to the same corrupted data will generate an ECC error, right?

    Charles Tsai said:
    Charles>> What is 0x0800BEEF? Why are you doing a 64-bit read from a byte address like this? It is an unaligned access. Otherwise I think it will work.

    Yeah, sure... don't mind the address, it was just a quick example.

    Thanks again.


    Benjamin

  • Hi Benjamin,

    Benjamin GREFFE said:
    So, EPC masking behavior aside, if Memory Scrubbing is disabled, every read access to the same corrupted data will generate an ECC error, right?

    Yes. The EPC records the single-bit error address once the error is detected. A repeating error on the same address will not generate an error to the ESM, since the address is already recorded. This prevents a repeating NMI interrupt to the CPU if you are in a loop. Please see the excerpt from the TRM below.

    The 64-bit aligned address of the correctable fault from each IP FIFO is sent to the CAM to check if the
    correctable fault is unique or repetitive. If it is a repetitive address for the correctable fault, then the
    correctable fault and its address are discarded and no further indication to the CPU. If it is a unique
    address, then the address will be remembered in the CAM content and CAM index will be set to occupied.
    It is software configurable to raise an error event to ESM if SERRENA bits in EPCCNTRL are enable.
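
The lookup described in this excerpt can be modeled on a host as follows. This is a sketch under stated assumptions, not the EPC implementation: `cam_model`, `cam_result` and `cam_report` are hypothetical names, the 32 entries are taken from the loop in the question, and the IP FIFOs, the SERRENA enable and the hardware behavior when the CAM is full are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CAM_ENTRIES 32u

typedef struct {
    uint32_t addr[CAM_ENTRIES];     /* recorded 64-bit aligned addresses */
    bool     occupied[CAM_ENTRIES]; /* false means the entry is available */
} cam_model;

typedef enum {
    CAM_REPEAT, /* address already recorded: fault is discarded */
    CAM_NEW,    /* unique address: recorded, entry set to occupied */
    CAM_FULL    /* no free entry left (placeholder, not hardware behavior) */
} cam_result;

/* Report the address of a correctable fault to the CAM model. */
static cam_result cam_report(cam_model *cam, uint32_t fault_addr)
{
    assert(cam != NULL);
    uint32_t aligned = fault_addr & ~(uint32_t)7u;  /* 64-bit aligned address */
    int free_slot = -1;

    for (unsigned i = 0u; i < CAM_ENTRIES; i++) {
        if (cam->occupied[i] && cam->addr[i] == aligned) {
            return CAM_REPEAT;
        }
        if (!cam->occupied[i] && free_slot < 0) {
            free_slot = (int)i;
        }
    }
    if (free_slot < 0) {
        return CAM_FULL;
    }
    cam->addr[free_slot] = aligned;
    cam->occupied[free_slot] = true;
    return CAM_NEW;
}
```

In this model, two faults at `0x08001004` and `0x08001000` fall in the same 64-bit word, so the second one is reported as `CAM_REPEAT`. This is also why a comparison against a CAM entry would be made with the 64-bit aligned address rather than the byte address.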