TMS570 - Error detection/correction mechanisms

Henry Abril1

Could you please help me to clarify/collect all the required information in order to handle the following correctable errors:

Group1 - 6 FMC - correctable error: bus1 and bus2 interfaces (does not include accesses to EEPROM bank)

Group1 - 26 RAM even bank (B0TCM) - correctable error

Group1 - 28 RAM odd bank (B1TCM) - correctable error

Group1 - 35 FMC - correctable error (EEPROM bank access)

For me it is important to know how the mechanisms work and which user actions/configurations are required in each case.

Where is this info available? Is there a document that explains the operation of the different mechanisms? Are there implementation examples?

Some points to clarify are e.g.:

---------------------------------------------------------------------------------------------------------------------------------

In case of RAM single error correction:

1. Is the ECC/data really restored/fixed by the CPU in RAM so that in the next read access to the same memory location the error is not present/detected? or, is the value just buffered internally in the correction mechanism? How does it work? If the data is just buffered, could this affect future double error detection? How?

If the idea is to correct all the detected single errors and continue with safe operation, which is the approach to follow?

e.g.:

- Every time there is a single error correction, shall the user read the correctable error address register and clear the error flags to allow the correction mechanism to keep operation?

- The user can just disregard all the corrected errors without any actions like reading the correctable error address and clearing the error flags?

---------------------------------------------------------------------------------------------------------------------------------

In general:

Is there any TI/ARM/... DETAILED document where the user can understand how all the detection/correction mechanisms work?

Thank you

over 12 years ago

0 Charles Tsai over 12 years ago

TI__Guru**** 191886 points

Hello Henry,

CT>> In general, the ECC checking for memories on the ATCM and BTCM is performed by the CPU itself. For ECC checking on the EEPROM, the operation is done by the FMC module. For ECC-related errors, the CPU will log the error status and error address in its CP15 registers, i.e. the data fault status register and the data fault address register. You may also find similar information, such as status and address, in the FMC or TCRAM module. Just make sure that you export the error events detected by the CPU onto the CPU's event bus by enabling the X bit in the Performance Monitor Control Register in the CPU.

I will try to answer your questions below:
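A minimal sketch of setting the X bit, assuming GCC inline-assembly syntax and privileged mode. The function names are illustrative and not taken from any TI library. The bit position (bit 4) and the CP15 encoding (c9, c12, 0) follow the Cortex-R4 TRM description of the Performance Monitor Control Register and should be checked against it. The register access itself is compiled only for an ARMv7-R target.

```c
#include <assert.h>
#include <stdint.h>

/* X (export enable) bit of the Performance Monitor Control Register.
 * Bit position per the Cortex-R4 TRM; verify for your core revision. */
#define PMCR_X_BIT (1u << 4)

/* Pure helper: returns the PMCR value with event export enabled,
 * leaving all other bits unchanged. */
uint32_t pmcr_with_export(uint32_t pmcr)
{
    return pmcr | PMCR_X_BIT;
}

#if defined(__ARM_ARCH_7R__)
/* Target-only part: read-modify-write of PMCR through CP15.
 * Must run in a privileged mode. */
void enable_event_export(void)
{
    uint32_t pmcr;
    __asm__ volatile("mrc p15, 0, %0, c9, c12, 0" : "=r"(pmcr));
    pmcr = pmcr_with_export(pmcr);
    __asm__ volatile("mcr p15, 0, %0, c9, c12, 0" : : "r"(pmcr));
    __asm__ volatile("isb");
}
#endif
```

Keeping the bit manipulation in a pure helper makes it possible to check the value on a host machine before touching the coprocessor register on the target.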

CT>> Yes, the CPU employs a correct-and-retry approach when detecting a correctable error. It will write the corrected data back to the RAM and retry the read. The CPU also has an internal hard error cache. When the processor writes corrected data back to the TCM, the data is also allocated to the hard error cache. When the access is retried, there is a hit in the error cache and the data is read from there in preference to the TCM. Please also note that the correct-and-retry approach applies only to TCM-side instruction reads or data reads. If the access comes from the AXI-Slave, then the processor will correct the data in-line without a retry operation.

If the idea is to correct all the detected single errors and continue with safe operation, which is the approach to follow?

e.g.:

- Every time there is a single error correction, shall the user read the correctable error address register and clear the error flags to allow the correction mechanism to keep operation?

CT>> Yes, you should read the error address and clear the flag so that subsequent errors can be recorded in the FMC or TCRAMW module. The error address registers are frozen and will not be updated if the error flags are not cleared.
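The freeze behaviour can be pictured with a small host-side model. This is only a sketch: the struct, its fields (including the `missed` counter) and the function names are hypothetical and do not correspond to the real FMC or TCRAMW register layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of an error-capture block. */
typedef struct {
    bool     flag;      /* correctable-error flag */
    uint32_t err_addr;  /* captured error address */
    uint32_t missed;    /* errors whose address could not be captured */
} err_capture_t;

/* "Hardware" side: a correctable error was seen at 'addr'. */
void capture_error(err_capture_t *c, uint32_t addr)
{
    if (c->flag) {
        c->missed++;    /* frozen: the address register is not updated */
        return;
    }
    c->err_addr = addr;
    c->flag = true;
}

/* "Software" side: read the address, then clear the flag so that the
 * next error can be recorded. */
uint32_t service_error(err_capture_t *c)
{
    assert(c->flag);
    uint32_t addr = c->err_addr;
    c->flag = false;
    return addr;
}
```

In this model a second error that arrives before `service_error()` is called leaves `err_addr` unchanged, which is the situation the handler is meant to avoid.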

- The user can just disregard all the corrected errors without any actions like reading the correctable error address and clearing the error flags?

0 Henry Abril1 over 12 years ago in reply to Charles Tsai

Intellectual 485 points

Hi Charles,

Thanks for your feedback.

Could you please let me know if there is ANY documentation which describes the detection/correction mechanisms in a DETAILED way? (From my point of view the TRM does not provide enough information for the handling of the referenced errors.)

It would be easier for me to first study detailed documentation and then ask about specific doubts, rather than trying to understand these complex mechanisms through 100 questions on the E2E forum.

Thank you and BR,

0 KGreb over 12 years ago in reply to Henry Abril1

TI__Mastermind 23000 points

Hello Henry,

The TRM is the detailed document for this information. If you feel it is not adequate, I'd recommend that you click on the document feedback link at the bottom of the relevant TRM page and file a ticket specifying which details you believe are necessary. This ensures the point is discussed for the next document update. We'll also be glad to answer specific questions on the E2E forum as best possible.

Please keep in mind that different readers may require different levels of detail. Some have sent feedback that the current docs are too detailed on the same points where you are asking for more detail. We try to shoot for a balance to best serve the majority of customers.

Best Regards,

Karl

0 Charles Tsai over 12 years ago in reply to Henry Abril1

TI__Guru**** 191886 points

Hello Henry,

For details about error correction and detection performed by the CPU, ARM's Cortex-R4 TRM has a detailed description of how TCM internal error detection and correction are handled. Please check section "8.4.3 TCM internal error detection and correction".

ARM's TRM briefly mentions the hard error cache. This is described in "8.2.4 Hard errors". I agree that the description is a bit lacking. I will try to supplement more details here. This hard error cache contains only one 64-bit entry shared between the three TCM ports. When the processor writes corrected data back to the TCM, it is also allocated to the TCM-HEC (this is the only situation in which the TCM-HEC is allocated, and the allocation is write-through). When the access is retried and there is a hit in the TCM-HEC, the data is read from there in preference to the TCM. Subsequent writes to the address allocated in the TCM-HEC will replace the data. If a subsequent TCM read detects a 1-bit error, then the new corrected data replaces the current entry.

If you are using the spnu499b TRM for TMS570LS31x/21x devices, then you will find how the flash memory correctable and uncorrectable ECC error addresses and status are captured and how they are cleared in the sections starting at 5.7.5. It does mention that the address value captured in the error address registers is frozen from being updated until it is read by the CPU. Additional errors are blocked until the error address registers are read. Please let us know what is unclear so we can try to add more information.
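The read path described above can be sketched as a host-side model. This is a simplification under stated assumptions: all type and function names are hypothetical, ECC decoding is reduced to a per-word "single-bit error present" flag, a `hard` flag stands in for a permanent fault that survives the write-back, and writes are not modelled at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WORDS 4u

typedef struct {
    uint64_t value;  /* value the ECC logic would recover */
    bool     sbe;    /* a single-bit error is currently present */
    bool     hard;   /* permanent fault: write-back cannot clear it */
} tcm_word_t;

typedef struct {
    tcm_word_t tcm[WORDS];
    bool       hec_valid;  /* the single hard error cache entry */
    unsigned   hec_index;
    uint64_t   hec_data;
    unsigned   events;     /* correctable-error events signalled */
} model_t;

uint64_t model_read(model_t *m, unsigned i)
{
    assert(i < WORDS);
    if (m->hec_valid && m->hec_index == i)
        return m->hec_data;          /* HEC hit: the TCM is not consulted */

    tcm_word_t *w = &m->tcm[i];
    if (w->sbe) {
        m->events++;                 /* 1-bit error detected */
        if (!w->hard)
            w->sbe = false;          /* write-back repairs a soft error */
        m->hec_valid = true;         /* allocate (or replace) the entry */
        m->hec_index = i;
        m->hec_data  = w->value;
        return m->hec_data;          /* the retried access hits the HEC */
    }
    return w->value;                 /* clean read from the TCM */
}
```

With one soft error in word 0 and one hard error in word 1, reading word 0 twice raises a single event, and reading word 1 afterwards raises a second event and replaces the HEC entry.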

Similarly you can find how the correctable/uncorrectable errors are handled for the RAM in section 6.3 of the TCRAMW Module. Additional information on the error address registers and status registers and how they are captured and cleared will be found in the control and status registers section starting at section 6.7.5-6.7.7. Again, please let us know what is not clear so we can update the TRM for future revision.

0 David Sabol over 12 years ago in reply to Charles Tsai

Genius 4850 points

Here is a link to the document I used to learn more.

http://www.ti.com/lit/an/spna126/spna126.pdf

Good luck!

0 Henry Abril1 over 12 years ago in reply to Charles Tsai

Intellectual 485 points

Hi Charles,

I have found the information you provided very useful, especially the info in the second paragraph, which IMHO cannot be found in such detail anywhere in the documentation.

We had some situations where the interrupts were triggered only once (the first time) when periodically introducing a "single bit error" in the ECC of a specific RAM memory location.

To solve it, we have tried:

- Alternating the introduction of periodical errors in different RAM banks e.g.:B0TCM --> B1TCM --> B0TCM...

- Re-writing the RAM memory location under test in every cycle

In both cases the interrupt was triggered in every cycle as expected but without a clear explanation WHY.

1. It seems to be clear now that it was because the access was performed directly from the TCM-HEC and not from the TCM. Is that correct?

Please take a look at the following scenario and help me to clarify some open items:

Assume that we have configured the module correctly: single error correction, interrupt, threshold = 1, etc.

Step1. Write to "dummy_var"

Step2. At this point for some reason the ECC is not correct for "dummy_var" (single bit error)

Step3. Read "dummy_var"

Step4. ESM interrupt generated, error automatically and successfully corrected by the CPU (Correct-and-retry), user actions to clear the peripheral error flags and detect new errors. Return from interrupt.

Step5. Read "dummy_var"

2. Is at this point the DATA retrieved from the TCM-HEC?

3. If 2 is TRUE: If no new write to "dummy_var" is performed and no new ECC errors are detected (the TCM-HEC entry is not replaced), will it read always the value of "dummy_var" from the TCM-HEC?

4. What would happen if the CPU "Correct-and-retry" operation fails in Step4 (e.g. because the memory location is really damaged)? What will happen in Step5 in this situation? Will the correct data value be available in the TCM-HEC as in the original scenario? If in this case we read the correct value from the TCM-HEC, I assume no error will be generated even though the TCM memory location is really damaged. Is this correct?

Thank you and BR,

0 Henry Abril1 over 12 years ago in reply to David Sabol

Intellectual 485 points

Hi David,

Thanks for your feedback too. It is a nice document to have a good overview of the TMS570 ECC.

BR,

0 Charles Tsai over 12 years ago in reply to Henry Abril1

TI__Guru**** 191886 points

Hi Henry,

I have the answers in-line.

1. It seems to be clear now that it was because the access was performed directly from the TCM-HEC and not from the TCM. Is that correct?

CT>> Yes, the access will come from the TCM-HEC because it is a cache hit.

Please take a look at the following scenario and help me to clarify some open items:

Assume that we have configured the module correctly: single error correction, interrupt, threshold = 1, etc.

Step1. Write to "dummy_var"

Step2. At this point for some reason the ECC is not correct for "dummy_var" (single bit error)

Step3. Read "dummy_var"

Step5. Read "dummy_var"

2. Is at this point the DATA retrieved from the TCM-HEC?

CT>> Yes, the DATA is retrieved from the TCM-HEC since this is already the second read of the same address, which is already cached. The first read happens in Step3.

3. If 2 is TRUE: If no new write to "dummy_var" is performed and no new ECC errors are detected (the TCM-HEC entry is not replaced), will it read always the value of "dummy_var" from the TCM-HEC?

CT>> If you keep reading dummy_var, then it will be read from the TCM-HEC since it is always a hit.

CT>> This is the whole purpose of the HEC: to address a hard memory error (i.e. a permanent fault in the memory), as you just described. Your understanding is correct that for such a memory location the CPU will keep reading from the TCM-HEC as long as it is a cache hit. What is stored in the HEC is the corrected data.

0 Charles Tsai over 12 years ago in reply to Charles Tsai

TI__Guru**** 191886 points

Hi Henry,

One more note: the hard error cache is enabled after reset. You can enable or disable this feature via bit 22 (DCHE) of the secondary auxiliary control register. It is something for you to experiment with, but we do not recommend disabling this feature in your final application.

0 Henry Abril1 over 12 years ago in reply to Charles Tsai

Intellectual 485 points

Hi Charles,

As you mentioned before, the CPU is handling the ECC for the ATCM(Flash) and BTCM(RAM) and the TCM-HEC applies for both of them.

I assume that the ECC detection and correction for the TCM Flash is working in a similar way as for the discussed TCM RAM items, I mean, the CPU is using also the TCM HEC for allocating the corrected value of a Flash memory location with a single bit error and when the access is retried and if there is a hit in the TCM-HEC then the data is read from here in preference to the TCM.

1. Is this assumption correct?

AFAIK the OTP and FEE memory (bank7) are protected by SECDED logic in the flash wrapper.

2. Is there a mechanism in the flash wrapper, similar to the TCM-HEC, for allocating the corrected value? How does the correction mechanism work in the flash wrapper? Could you please provide me some details?

Thank you and BR,

0 Henry Abril1 over 12 years ago in reply to Henry Abril1

Intellectual 485 points

Hi Charles,

Apart from the items described in the previous message, could you please help me to clarify the behavior of the correctable error counters when the TCM-HEC is involved into the error correction:

FCOR_ERR_CNT Flash Correctable Error Count Register Section 5.7.4

RAMOCCUR TCRAM Module Single-Bit Error Occurrences Control Register Section 6.7.3

I would expect that the counters above are incremented in the access when the error is detected/corrected and the corrected value is allocated into the TCM-HEC, but I would not expect the values to be incremented in subsequent access to the same memory location when the value is read from the TCM-HEC instead of the real TCM location.

Please consider the following scenarios and let me know if my assumptions/understanding is correct:

Assume that we have configured the module correctly: single error correction, interrupt, RAMTHRESHOLD = 2

a.) Single error detected/corrected in 1 memory location of B0TCM

a.1) Read "dummy_var_0" --> Single error detected/corrected and corrected value allocated in TCM-HEC. RAMOCCUR incremented by ONE, RAMOCCUR = 1. No interrupt generated.

a.2) Read "dummy_var_0" again. The value is read from the TCM-HEC, and I assume that the RAMOCCUR value IS NOT incremented, and then no interrupt is generated. Is this correct?

a.3) Subsequent reads of "dummy_var_0" getting the value from the TCM-HEC will NOT increment the RAMOCCUR value. Is this correct?

b.) Multiple single error detected/corrected in different (64bit) memory locations

b.1) Read "dummy_var_0" --> Single error detected/corrected and corrected value allocated in TCM-HEC. RAMOCCUR incremented by ONE, RAMOCCUR = 1. No interrupt generated.

b.2) Read "dummy_var_1" --> Single error detected/corrected and NEW corrected value allocated in TCM-HEC. RAMOCCUR incremented by ONE, RAMOCCUR = 2.

b.3) ESM interrupt generated, user actions to clear the peripheral error flags and detect new errors. Return from interrupt.

b.4) Subsequent reads of "dummy_var_1" getting the value from the TCM-HEC will NOT increment the RAMOCCUR value. Is this correct?

Thank you and BR,

0 Charles Tsai over 12 years ago in reply to Henry Abril1

TI__Guru**** 191886 points

Hello Henry,

Please find my answers below.

1. Is this assumption correct?

CT>> Yes, it works similarly for the ATCM interface. The one single entry HEC is shared between ATCM, B0TCM and B1TCM.

AFAIK the OTP and FEE memory (bank7) are protected by SECDED logic in the flash wrapper.

CT>> Yes, the OTP (from bank 0-6) and FEE memory (bank7) are protected by the SECDED that is built inside the flash wrapper. Access to OTP and FEE is via the Bus2 interface of the flash wrapper which is accessed from the CPU's AXI bus. There is no ECC checking on CPU's AXI bus interface and hence the flash wrapper will do the ECC checking locally.

CT>> No. There is no hard error cache on the CPU AXI interface as there is no ECC error detection mechanism for this interface. If there is a hard error on the FEE or the OTP memory then the flash wrapper will correct for 1-bit error or detect for 2-bit error and signal to the ESM in either situation.

0 Charles Tsai over 12 years ago in reply to Henry Abril1

TI__Guru**** 191886 points

Hi Henry,

My answers are inline.

CT>> Your understanding is correct. Once the corrected value is allocated to the TCM-HEC, reads of the same location are served from the HEC instead of the TCM memory. Therefore, no subsequent error event will be generated by the CPU on its event bus.

Please consider the following scenarios and let me know if my assumptions/understanding is correct:

Assume that we have configured the module correctly: single error correction, interrupt, RAMTHRESHOLD = 2

a.) Single error detected/corrected in 1 memory location of B0TCM

a.1) Read "dummy_var_0" --> Single error detected/corrected and corrected value allocated in TCM-HEC. RAMOCCUR incremented by ONE, RAMOCCUR = 1. No interrupt generated.

a.2) Read "dummy_var_0" again. The value is read from the TCM-HEC, and I assume that the RAMOCCUR value IS NOT incremented, and then no interrupt is generated. Is this correct?

CT>> This is correct. The error counter is not incremented because no error event is generated on CPU's event bus.

a.3) Subsequent reads of "dummy_var_0" getting the value from the TCM-HEC will NOT increment the RAMOCCUR value. Is this correct?

CT>> This is correct. This is the third read which is no different than the second read in step a.2.

b.) Multiple single error detected/corrected in different (64bit) memory locations

b.1) Read "dummy_var_0" --> Single error detected/corrected and corrected value allocated in TCM-HEC. RAMOCCUR incremented by ONE, RAMOCCUR = 1. No interrupt generated.

b.2) Read "dummy_var_1" --> Single error detected/corrected and NEW corrected value allocated in TCM-HEC. RAMOCCUR incremented by ONE, RAMOCCUR = 2.

CT>> This is correct. dummy_var_1 is a different address. It will cause the CPU to perform correct and retry and also allocate the HEC for this new address.

b.3) ESM interrupt generated, user actions to clear the peripheral error flags and detect new errors. Return from interrupt.

CT>> Yes. Make sure the single bit error enable bit is set.

b.4) Subsequent reads of "dummy_var_1" getting the value from the TCM-HEC will NOT increment the RAMOCCUR value. Is this correct?

CT>> Correct. Since the HEC contains the corrected data for dummy_var_1, RAMOCCUR will not increment.
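Scenarios a.) and b.) can be replayed with a small host-side model. This is a sketch only: the struct and function names are hypothetical, every location passed to `read_faulty()` is assumed to hold a single-bit error that is still present when it is read from the TCM, and what happens to the counter after the threshold is reached is deliberately not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t occur;       /* stands in for RAMOCCUR */
    uint32_t threshold;   /* stands in for RAMTHRESHOLD */
    bool     hec_valid;   /* single-entry hard error cache */
    uint32_t hec_addr;    /* 64-bit aligned address held in the HEC */
    uint32_t interrupts;  /* ESM interrupts raised */
} ram_model_t;

/* Read of a location that holds a single-bit error. */
void read_faulty(ram_model_t *m, uint32_t addr)
{
    assert(m->threshold > 0u);
    uint32_t line = addr & ~7u;          /* HEC entry covers 64 bits */

    if (m->hec_valid && m->hec_addr == line)
        return;                          /* served from the HEC: no event */

    m->hec_valid = true;                 /* correct, retry, allocate HEC */
    m->hec_addr  = line;
    m->occur++;
    if (m->occur == m->threshold)
        m->interrupts++;                 /* threshold reached */
}
```

With a threshold of 2, the first faulty read only increments the counter, a repeated read of the same location changes nothing, and the first read of a second faulty location raises the interrupt.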

0 Henry Abril1 over 12 years ago in reply to Charles Tsai

Intellectual 485 points

Hi Charles,

For the previous examples most of the time I have referred to the TCM RAM detection/correction cases.

Could you please clarify if the same applies for TCM Flash memory locations which contain instructions to be executed?

e.g.: A single error is present in a TCM Flash memory location which contains an instruction that is going to be executed periodically.

This is my assumption (please let me know if it is correct):

1. The first time that part of code will be executed the error is detected/corrected and the corrected value is loaded to the TCM-HEC. The instruction is successfully executed due to the corrected value present in the TCM-HEC and the FCOR_ERR_CNT is incremented by 1.

2. If no new single errors are detected, then the next time the same memory location is fetched for execution, the value is retrieved from the TCM-HEC, no new error is detected and FCOR_ERR_CNT is not incremented. The instruction is successfully executed due to the corrected value present in the TCM-HEC.

Thank you and BR,

0 Charles Tsai over 12 years ago in reply to Henry Abril1

TI__Guru**** 191886 points

Hi Henry,

Your understanding is correct about the effect of HEC on the ATCM for flash memory. Please note that flash is not writeable directly by the CPU. This means that when the CPU performs the correct and retry, the corrected data will not be written into the flash. Since the HEC already contains the corrected data, the next time the CPU reads from the same address it will retrieve the data from the HEC.

0 Henry Abril1 over 12 years ago in reply to Charles Tsai

Intellectual 485 points

Hi Charles,

One of the possibilities we are considering is to configure the TMS to detect and correct all single bit errors without generating interrupts. This means we will not clear/read/check the following registers when a single bit error is detected/corrected:

TCM RAM:

RAMERRSTATUS
RAMSERRADDR
RAMOCCUR
...

TCM Flash:

FCOR_ERR_CNT
FCOR_ERR_ADD
FCOR_ERR_POS
FEDACSTATUS
...

1.) Could you please confirm that even when those registers are not read/checked/cleared, the TCM SECDED mechanism will keep working normally:

1.a.) All single bit errors will be detected and corrected (according to our configuration).

1.b.) All the uncorrectable errors will be detected and the corresponding reactions(interrupts and aborts) will take place:

FMC - uncorrectable error (address parity on bus1 accesses) ESM Group2 4
RAM even bank (B0TCM) - uncorrectable error ESM Group2 6
RAM odd bank (B1TCM) - uncorrectable error ESM Group2 8
RAM even bank (B0TCM) - address bus parity error ESM Group2 10
TCM - ECC live lock detect ESM Group2 16
RAM even bank (B0TCM) - ECC uncorrectable error ESM Group3 3
RAM odd bank (B1TCM) - ECC uncorrectable error ESM Group3 5
FMC - uncorrectable error: bus1 and bus2 interfaces ESM Group3 7
...

1.c.) Could you please help me to clarify the following note in the description of the FCOR_ERR_ADD register SPNU499B–November 2012–Revised August 2013 section 5.7.5 Flash Correctable Error Address Register (FCOR_ERR_ADD) :

"Correctable Error Address
COR_ERR_ADD records the CPU logical address of which a correctable error is detected
by the ECC logic. This error address is frozen from begin updated until it is read by the
CPU. Additional error are blocked until this register is read."

I understand that additional error reactions (such as interrupts) are blocked until the register is read. If there are new single bit errors before the register is read, they will still be detected and corrected, but there will be no new reactions and no update of COR_ERR_ADD. Is my understanding correct? I mean, no error detection/correction will be blocked even if the COR_ERR_ADD register is not read.

2.) We would like to clarify if the same approach applies for the SECDED mechanism which is not related to the TCMs. Configuring the TMS to detect and correct all single bit errors without generating interrupts, it means we will not clear/read/check the following registers when a single bit error is detected/corrected:

EE_COR_ERR_CNT
EE_COR_ERR_ADD
EE_COR_ERR_POS
EE_STATUS
...

1.) Could you please confirm that even when those registers are not read/checked/cleared, the flash wrapper SECDED mechanism will keep working normally:

1.a.) All single bit errors will be detected and corrected (according to our configuration).

1.b.) All the uncorrectable errors will be detected and the corresponding reactions(interrupt, if configured) will take place:

FMC - uncorrectable error (EEPROM bank access) Group1 36

Thank you and BR,

0 Charles Tsai over 12 years ago in reply to Henry Abril1

TI__Guru**** 191886 points

Hi Henry,

My answers are inline.

TCM RAM:

RAMERRSTATUS
RAMSERRADDR
RAMOCCUR
...

TCM Flash:

FCOR_ERR_CNT
FCOR_ERR_ADD
FCOR_ERR_POS
FEDACSTATUS
...

1.) Could you please confirm that even when those registers are not read/checked/cleared, the TCM SECDED mechanism will keep working normally:

CT>> Yes, it will continue to work. The SECDED is inside the CPU. It does not depend on reading these registers in the flash wrapper to work.

1.a.) All single bit errors will be detected and corrected (according to our configuration).

CT>> Yes.

1.b.) All the uncorrectable errors will be detected and the corresponding reactions(interrupts and aborts) will take place:

FMC - uncorrectable error (address parity on bus1 accesses) ESM Group2 4

RAM even bank (B0TCM) - uncorrectable error ESM Group2 6
RAM odd bank (B1TCM) - uncorrectable error ESM Group2 8
RAM even bank (B0TCM) - address bus parity error ESM Group2 10
TCM - ECC live lock detect ESM Group2 16
RAM even bank (B0TCM) - ECC uncorrectable error ESM Group3 3
RAM odd bank (B1TCM) - ECC uncorrectable error ESM Group3 5
FMC - uncorrectable error: bus1 and bus2 interfaces ESM Group3 7
...

CT>> All of the above are related to uncorrectable errors. They are independent of the TCM ECC error correction/detection operations. They will continue to work as long as you read the corresponding uncorrectable error address registers in the respective modules so they are not frozen from capturing new errors.

CT>> The CPU will continue to correct and detect any ECC error regardless of whether you read COR_ERR_ADD. COR_ERR_ADD is a register inside the flash wrapper. It helps the user to profile or to determine the location of the errors. Since this register is memory mapped, it is easier to access from a programmer's model.

EE_COR_ERR_CNT
EE_COR_ERR_ADD
EE_COR_ERR_POS
EE_STATUS
...

1.) Could you please confirm that even when those registers are not read/checked/cleared, the flash wrapper SECDED mechanism will keep working normally:

1.a.) All single bit errors will be detected and corrected (according to our configuration).

CT>> Make sure the EE_EOFEN and EE_EZFEN bits are disabled so that single bit errors will continue to be corrected and detected. EE_EOFEN and EE_EZFEN enable the single bit error interrupts.

1.b.) All the uncorrectable errors will be detected and the corresponding reactions(interrupt, if configured) will take place:

FMC - uncorrectable error (EEPROM bank access) Group1 36

CT>> Yes, the uncorrectable error detection will continue to work as long as you read the EE_UNC_ERR_ADD register in your error handler so that it does not become frozen from capturing the new errors.

0 Henry Abril1 over 12 years ago in reply to Charles Tsai

Intellectual 485 points

Hi Charles,

All references to TRM are pointing to --> TRM SPNU499B–November 2012–Revised August 2013

-----------------------------------------------------------------------------------------------------------------------------------------------------

From your previous feedback the CPU is checking the ECC for memories on the ATCM and BTCM and the Flash Wrapper is checking the ECC for the EEPROM region.

I also have found the following info in the TRM:

5.3 SECDED
The Flash memory can be protected by Single Error Correction Double Error Detection (SECDED). The
main program memory is protected by the SECDED circuit inside of the Cortex R4 CPU. All OTP and the
FEE memory (bank 7) is protected by SECDED logic in the flash wrapper.
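To illustrate what "single error correction, double error detection" means, here is a toy extended Hamming code over 4 data bits. It is only a sketch of the principle: the device applies its code to 64-bit words, and the bit layout, names and code size below are illustrative choices, not the scheme used by the CPU or the flash wrapper.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { ECC_OK, ECC_CORRECTED, ECC_UNCORRECTABLE } ecc_status_t;

/* XOR of all bits of an 8-bit value (1 if the number of set bits is odd). */
unsigned parity8(uint8_t x)
{
    unsigned v = x;
    v ^= v >> 4;
    v ^= v >> 2;
    v ^= v >> 1;
    return v & 1u;
}

/* XOR of the positions (1..7) of all set bits; 0 for a valid codeword. */
unsigned syndrome(uint8_t cw)
{
    unsigned s = 0;
    for (unsigned pos = 1; pos <= 7; pos++)
        if ((cw >> pos) & 1u)
            s ^= pos;
    return s;
}

/* Codeword layout: bit 0 = overall parity, bits 1, 2, 4 = Hamming check
 * bits, bits 3, 5, 6, 7 = data bits 0..3. */
uint8_t secded_encode(uint8_t data)
{
    assert(data < 16u);
    unsigned cw = 0;
    cw |= ((data >> 0) & 1u) << 3;
    cw |= ((data >> 1) & 1u) << 5;
    cw |= ((data >> 2) & 1u) << 6;
    cw |= ((data >> 3) & 1u) << 7;
    unsigned s = syndrome((uint8_t)cw);   /* check bits needed to zero it */
    if (s & 1u) cw |= 1u << 1;
    if (s & 2u) cw |= 1u << 2;
    if (s & 4u) cw |= 1u << 4;
    cw |= parity8((uint8_t)cw);           /* make the overall parity even */
    return (uint8_t)cw;
}

ecc_status_t secded_decode(uint8_t cw, uint8_t *data)
{
    unsigned s   = syndrome(cw);
    unsigned odd = parity8(cw);
    ecc_status_t st = ECC_OK;

    if (odd) {                            /* one bit flipped, at position s */
        cw ^= (uint8_t)(1u << s);         /* s == 0 means the parity bit */
        st = ECC_CORRECTED;
    } else if (s != 0u) {                 /* even parity, bad syndrome */
        return ECC_UNCORRECTABLE;         /* two bits flipped */
    }
    *data = (uint8_t)(((cw >> 3) & 1u) | (((cw >> 5) & 1u) << 1) |
                      (((cw >> 6) & 1u) << 2) | (((cw >> 7) & 1u) << 3));
    return st;
}
```

Flipping any one bit of an encoded value still decodes to the original data with `ECC_CORRECTED`, while flipping two bits is reported as `ECC_UNCORRECTABLE` and the output is left untouched.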

So, my intention now is to clarify which part (CPU or Flash Wrapper) is performing the ECC checking for each memory region described in the datasheet document:

SPNS162B –APRIL 2012–REVISED JULY 2013, Section 4.9.2 Memory Map Table:

Memory region                       Start address  End address
TCM Flash                           0x0000_0000    0x00FF_FFFF
TCM RAM + RAM ECC                   0x0800_0000    0x0BFF_FFFF
Mirrored Flash                      0x2000_0000    0x20FF_FFFF
Customer OTP, TCM Flash Bank 0      0xF000_0000    0xF000_1FFF
Customer OTP, TCM Flash Bank 1      0xF000_2000    0xF000_3FFF
Customer OTP, EEPROM Bank 7         0xF000_E000    0xF000_FFFF
Customer OTP–ECC, TCM Flash Bank 0  0xF004_0000    0xF004_03FF
Customer OTP–ECC, TCM Flash Bank 1  0xF004_0400    0xF004_07FF
Customer OTP–ECC, EEPROM Bank 7     0xF004_1C00    0xF004_1FFF
TI OTP, TCM Flash Bank 0            0xF008_0000    0xF008_1FFF
TI OTP, TCM Flash Bank 1            0xF008_2000    0xF008_3FFF
TI OTP, EEPROM Bank 7               0xF008_E000    0xF008_FFFF
TI OTP–ECC, TCM Flash Bank 0        0xF00C_0000    0xF00C_03FF
TI OTP–ECC, TCM Flash Bank 1        0xF00C_0400    0xF00C_07FF
TI OTP–ECC, EEPROM Bank 7           0xF00C_1C00    0xF00C_1FFF
EEPROM Bank–ECC                     0xF010_0000    0xF013_FFFF
EEPROM Bank                         0xF020_0000    0xF03F_FFFF
Flash Data Space ECC                0xF040_0000    0xF04F_FFFF
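For experiments it can help to have the table above as a lookup that maps an address to its region. A minimal sketch, with the address ranges copied from the table and all type and function names hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *name;
    uint32_t    start;
    uint32_t    end;    /* inclusive */
} region_t;

static const region_t regions[] = {
    { "TCM Flash",                          0x00000000u, 0x00FFFFFFu },
    { "TCM RAM + RAM ECC",                  0x08000000u, 0x0BFFFFFFu },
    { "Mirrored Flash",                     0x20000000u, 0x20FFFFFFu },
    { "Customer OTP, TCM Flash Bank 0",     0xF0000000u, 0xF0001FFFu },
    { "Customer OTP, TCM Flash Bank 1",     0xF0002000u, 0xF0003FFFu },
    { "Customer OTP, EEPROM Bank 7",        0xF000E000u, 0xF000FFFFu },
    { "Customer OTP-ECC, TCM Flash Bank 0", 0xF0040000u, 0xF00403FFu },
    { "Customer OTP-ECC, TCM Flash Bank 1", 0xF0040400u, 0xF00407FFu },
    { "Customer OTP-ECC, EEPROM Bank 7",    0xF0041C00u, 0xF0041FFFu },
    { "TI OTP, TCM Flash Bank 0",           0xF0080000u, 0xF0081FFFu },
    { "TI OTP, TCM Flash Bank 1",           0xF0082000u, 0xF0083FFFu },
    { "TI OTP, EEPROM Bank 7",              0xF008E000u, 0xF008FFFFu },
    { "TI OTP-ECC, TCM Flash Bank 0",       0xF00C0000u, 0xF00C03FFu },
    { "TI OTP-ECC, TCM Flash Bank 1",       0xF00C0400u, 0xF00C07FFu },
    { "TI OTP-ECC, EEPROM Bank 7",          0xF00C1C00u, 0xF00C1FFFu },
    { "EEPROM Bank-ECC",                    0xF0100000u, 0xF013FFFFu },
    { "EEPROM Bank",                        0xF0200000u, 0xF03FFFFFu },
    { "Flash Data Space ECC",               0xF0400000u, 0xF04FFFFFu },
};

/* Returns the table entry containing 'addr', or NULL if the address is
 * not covered by any listed region. */
const region_t *find_region(uint32_t addr)
{
    for (size_t i = 0; i < sizeof regions / sizeof regions[0]; i++)
        if (addr >= regions[i].start && addr <= regions[i].end)
            return &regions[i];
    return NULL;
}
```

A further column, for example which unit checks the ECC of each region, could be added to `region_t` once that has been confirmed.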

My assumption, according to my understanding of the information is:

a.) Memory regions whose ECC is checked by the CPU (TCM-HEC related):

TCM Flash 0x0000_0000 0x00FF_FFFF
TCM RAM (RAM ONLY, not RAM ECC!!!!!) 0x0800_0000 0x0803_FFFF

b.) Memory regions whose ECC is checked by the Flash Wrapper:

Mirrored Flash 0x2000_0000 0x20FF_FFFF
Customer OTP, TCM Flash Bank 0 0xF000_0000 0xF000_1FFF
Customer OTP, TCM Flash Bank 1 0xF000_2000 0xF000_3FFF
Customer OTP, EEPROM Bank 7 0xF000_E000 0xF000_FFFF
TI OTP, TCM Flash Bank 0 0xF008_0000 0xF008_1FFF
TI OTP, TCM Flash Bank 1 0xF008_2000 0xF008_3FFF
TI OTP, EEPROM Bank 7 0xF008_E000 0xF008_FFFF
EEPROM Bank 0xF020_0000 0xF03F_FFFF

1.) Are my assumptions a.) and b.) correct? Are all those memory areas checked for ECC?

2.) According to TRM section 6.2 (RAM Memory Map), no ECC errors are detected and no error reactions occur when accessing the TCM RAM ECC memory area. So I would not expect any changes to RAM registers such as RAMERRSTATUS and RAMOCCUR, to ESM flags, or any nERROR pin reaction, etc. under any circumstances when accessing the TCM RAM ECC memory area directly. Is this correct?

3.) Is the ECC also evaluated/checked when reading directly from the rest of ECC areas (ECC Flash, ECC OTP, etc)? I mean, can read accesses to the ECC memory locations described below generate ECC errors and reactions?

Customer OTP–ECC, TCM Flash Bank 0 0xF004_0000 0xF004_03FF --> Yes? No?
Customer OTP–ECC, TCM Flash Bank 1 0xF004_0400 0xF004_07FF --> Yes? No?
Customer OTP–ECC, EEPROM Bank 7 0xF004_1C00 0xF004_1FFF -->Yes? No?
TI OTP–ECC, TCM Flash Bank 0 0xF00C_0000 0xF00C_03FF --> Yes? No?
TI OTP–ECC, TCM Flash Bank 1 0xF00C_0400 0xF00C_07FF --> Yes? No?
TI OTP–ECC, EEPROM Bank 7 0xF00C_1C00 0xF00C_1FFF --> Yes? No?
EEPROM Bank–ECC 0xF010_0000 0xF013_FFFF --> Yes? No?
Flash Data Space ECC 0xF040_0000 0xF04F_FFFF --> Yes? No?

4.) If 3.) is "yes", could you please help me to clarify which part (CPU or Flash Wrapper) is performing the ECC checking for each ECC memory region?

Customer OTP–ECC, TCM Flash Bank 0 0xF004_0000 0xF004_03FF --> ?
Customer OTP–ECC, TCM Flash Bank 1 0xF004_0400 0xF004_07FF --> ?
Customer OTP–ECC, EEPROM Bank 7 0xF004_1C00 0xF004_1FFF -->?
TI OTP–ECC, TCM Flash Bank 0 0xF00C_0000 0xF00C_03FF --> ?
TI OTP–ECC, TCM Flash Bank 1 0xF00C_0400 0xF00C_07FF --> ?
TI OTP–ECC, EEPROM Bank 7 0xF00C_1C00 0xF00C_1FFF --> ?
EEPROM Bank–ECC 0xF010_0000 0xF013_FFFF --> ?
Flash Data Space ECC 0xF040_0000 0xF04F_FFFF --> ?

5.) If 3.) is "yes", do correctable errors that occur when reading directly from the different ECC areas also affect the registers FCOR_ERR_CNT or EE_COR_ERR_CNT? How? Which operation affects which register?

Customer OTP–ECC, TCM Flash Bank 0 0xF004_0000 0xF004_03FF --> ?
Customer OTP–ECC, TCM Flash Bank 1 0xF004_0400 0xF004_07FF --> ?
Customer OTP–ECC, EEPROM Bank 7 0xF004_1C00 0xF004_1FFF -->?
TI OTP–ECC, TCM Flash Bank 0 0xF00C_0000 0xF00C_03FF --> ?
TI OTP–ECC, TCM Flash Bank 1 0xF00C_0400 0xF00C_07FF --> ?
TI OTP–ECC, EEPROM Bank 7 0xF00C_1C00 0xF00C_1FFF --> ?
EEPROM Bank–ECC 0xF010_0000 0xF013_FFFF --> ?
Flash Data Space ECC 0xF040_0000 0xF04F_FFFF --> ?

-----------------------------------------------------------------------------------------------------------------------------------------------------

From TRM:

5.7.4 Flash Correctable Error Count Register (FCOR_ERR_CNT).

This register applies to the main flash banks.

6.) Could you please clarify which are the main flash banks for this case? As far as I understand Bank0 and Bank1, but is this including the TCM Flash only (0x0000_0000 0x00FF_FFFF)? or it includes also the OTP regions related to Bank0 and Bank1? What about the mirrored flash? Could accesses over the mirrored flash influence the value of the register too?

Please help me to correct/fill in the table below:

TCM Flash 0x0000_0000 0x00FF_FFFF Influence, Yes
Mirrored Flash 0x2000_0000 0x20FF_FFFF Influence?
Customer OTP, TCM Flash Bank 0 0xF000_0000 0xF000_1FFF Influence, Yes
Customer OTP, TCM Flash Bank 1 0xF000_2000 0xF000_3FFF Influence, Yes
Customer OTP, EEPROM Bank 7 0xF000_E000 0xF000_FFFF Influence, No
TI OTP, TCM Flash Bank 0 0xF008_0000 0xF008_1FFF Influence, Yes
TI OTP, TCM Flash Bank 1 0xF008_2000 0xF008_3FFF Influence, Yes
TI OTP, EEPROM Bank 7 0xF008_E000 0xF008_FFFF Influence, No
EEPROM Bank 0xF020_0000 0xF03F_FFFF Influence, No

From TRM:

5.7.37 EEPROM Emulation Correctable Error Count Register (EE_COR_ERR_CNT)

7.) Same as item 6.) but for the EE_COR_ERR_CNT register:

Please help me to correct/ fill the table above:

TCM Flash 0x0000_0000 0x00FF_FFFF Influence, No
Mirrored Flash 0x2000_0000 0x20FF_FFFF Influence, No
Customer OTP, TCM Flash Bank 0 0xF000_0000 0xF000_1FFF Influence, No
Customer OTP, TCM Flash Bank 1 0xF000_2000 0xF000_3FFF Influence, No
Customer OTP, EEPROM Bank 7 0xF000_E000 0xF000_FFFF Influence, Yes
TI OTP, TCM Flash Bank 0 0xF008_0000 0xF008_1FFF Influence, No
TI OTP, TCM Flash Bank 1 0xF008_2000 0xF008_3FFF Influence, No
TI OTP, EEPROM Bank 7 0xF008_E000 0xF008_FFFF Influence, Yes
EEPROM Bank 0xF020_0000 0xF03F_FFFF Influence, Yes

Thank you and BR,

0 Charles Tsai over 12 years ago in reply to Henry Abril1

Hi Henry,

Please see my answers inline.

From your previous feedback the CPU is checking the ECC for memories on the ATCM and BTCM and the Flash Wrapper is checking the ECC for the EEPROM region.

I also have found the following info in the TRM:

So, my intention now is to clarify which part (CPU or Flash Wrapper) is performing the ECC checking for each memory region described in the datasheet document:

SPNS162B –APRIL 2012–REVISED JULY 2013, Section 4.9.2 Memory Map Table:

TCM Flash 0x0000_0000 0x00FF_FFFF
TCM RAM + RAM ECC 0x0800_0000 0x0BFF_FFFF
Mirrored Flash 0x2000_0000 0x20FF_FFFF
Customer OTP, TCM Flash Bank 0 0xF000_0000 0xF000_1FFF
Customer OTP, TCM Flash Bank 1 0xF000_2000 0xF000_3FFF
Customer OTP, EEPROM Bank 7 0xF000_E000 0xF000_FFFF
Customer OTP–ECC, TCM Flash Bank 0 0xF004_0000 0xF004_03FF
Customer OTP–ECC, TCM Flash Bank 1 0xF004_0400 0xF004_07FF
Customer OTP–ECC, EEPROM Bank 7 0xF004_1C00 0xF004_1FFF
TI OTP, TCM Flash Bank 0 0xF008_0000 0xF008_1FFF
TI OTP, TCM Flash Bank 1 0xF008_2000 0xF008_3FFF
TI OTP, EEPROM Bank 7 0xF008_E000 0xF008_FFFF
TI OTP–ECC, TCM Flash Bank 0 0xF00C_0000 0xF00C_03FF
TI OTP–ECC, TCM Flash Bank 1 0xF00C_0400 0xF00C_07FF
TI OTP–ECC, EEPROM Bank 7 0xF00C_1C00 0xF00C_1FFF
EEPROM Bank–ECC 0xF010_0000 0xF013_FFFF
EEPROM Bank 0xF020_0000 0xF03F_FFFF
Flash Data Space ECC 0xF040_0000 0xF04F_FFFF

My assumption, according to my understanding of the information is:

a.) Memory regions whose ECC is checked by the CPU (TCM-HEC related):

TCM Flash 0x0000_0000 0x00FF_FFFF
TCM RAM (RAM ONLY, not RAM ECC!!!!!) 0x0800_0000 0x0803_FFFF

CT>> Yes. One slight correction to your assumption: if the CPU has enabled ECC for the BTCM, then reading the RAM ECC space will also be subject to ECC checking by the CPU. It is just that the TCRAM wrapper will ignore the error event from the CPU and not assert an error to the ESM.

b.) Memory regions whose ECC is checked by the Flash Wrapper:

Mirrored Flash 0x2000_0000 0x20FF_FFFF

CT>> No, this is done by the CPU. This is a mirror region of the flash which happens through AXI-S to the ATCM interface.

Customer OTP, TCM Flash Bank 0 0xF000_0000 0xF000_1FFF
Customer OTP, TCM Flash Bank 1 0xF000_2000 0xF000_3FFF
Customer OTP, EEPROM Bank 7 0xF000_E000 0xF000_FFFF
TI OTP, TCM Flash Bank 0 0xF008_0000 0xF008_1FFF
TI OTP, TCM Flash Bank 1 0xF008_2000 0xF008_3FFF
TI OTP, EEPROM Bank 7 0xF008_E000 0xF008_FFFF
EEPROM Bank 0xF020_0000 0xF03F_FFFF

CT>> Yes, all of the above are checked by the flash wrapper.

1.) Are my assumptions a.) and b.) correct? Are all those memory areas checked for ECC?

CT>> See above answers.

CT>> I think you are referring to the below note. If you read from the RAM ECC space while the CPU has ECC checking enabled on the BxTCM, you will most likely get an uncorrectable error and an abort. The error event generated by the CPU on its event bus to the TCRAM wrapper will be ignored, so no error will be asserted to the ESM, but you still need to be careful because the CPU can take the abort. If you really want to access the RAM ECC space, I suggest that you first disable ECC checking on the BTCM.
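As a rough illustration of the "disable ECC checking on the BTCM first" suggestion, the sketch below only computes the Auxiliary Control Register (ACTLR) values involved. The bit positions (B0TCMPCEN = bit 26, B1TCMPCEN = bit 27, ATCMPCEN = bit 25) are my assumption from the Cortex-R4 documentation and should be verified there; the function names are hypothetical, and the actual CP15 read/write and barriers are left to target-specific code.

```c
#include <stdint.h>

/* Cortex-R4 ACTLR TCM check-enable bits.
 * ASSUMPTION: positions as recalled from the Cortex-R4 TRM; verify before use. */
#define ACTLR_ATCMPCEN   (UINT32_C(1) << 25)  /* ATCM (flash) check enable     */
#define ACTLR_B0TCMPCEN  (UINT32_C(1) << 26)  /* B0TCM (even RAM bank) enable  */
#define ACTLR_B1TCMPCEN  (UINT32_C(1) << 27)  /* B1TCM (odd RAM bank) enable   */
#define ACTLR_BTCM_CHECK (ACTLR_B0TCMPCEN | ACTLR_B1TCMPCEN)

/* Hypothetical helper: ACTLR value with ECC checking off on both BTCM ports.
 * All other bits, including the ATCM check enable, are left unchanged. */
uint32_t actlr_btcm_ecc_off(uint32_t actlr)
{
    return actlr & ~ACTLR_BTCM_CHECK;
}

/* Hypothetical helper: ACTLR value with ECC checking on for both BTCM ports. */
uint32_t actlr_btcm_ecc_on(uint32_t actlr)
{
    return actlr | ACTLR_BTCM_CHECK;
}

/* Intended sequence on the target (not shown, target-specific):
 *   1. read ACTLR   (MRC p15, 0, <Rt>, c1, c0, 1)
 *   2. write actlr_btcm_ecc_off(value), followed by the required barriers
 *   3. read the word(s) of interest from the RAM ECC space
 *   4. write the saved ACTLR value back to restore checking
 */
```

Keeping the bit manipulation separate from the CP15 access makes it possible to check the masks on a host before running anything on the device.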

Customer OTP–ECC, TCM Flash Bank 0 0xF004_0000 0xF004_03FF --> Yes? No?
Customer OTP–ECC, TCM Flash Bank 1 0xF004_0400 0xF004_07FF --> Yes? No?
Customer OTP–ECC, EEPROM Bank 7 0xF004_1C00 0xF004_1FFF -->Yes? No?
TI OTP–ECC, TCM Flash Bank 0 0xF00C_0000 0xF00C_03FF --> Yes? No?
TI OTP–ECC, TCM Flash Bank 1 0xF00C_0400 0xF00C_07FF --> Yes? No?
TI OTP–ECC, EEPROM Bank 7 0xF00C_1C00 0xF00C_1FFF --> Yes? No?
EEPROM Bank–ECC 0xF010_0000 0xF013_FFFF --> Yes? No?
Flash Data Space ECC 0xF040_0000 0xF04F_FFFF --> Yes? No?

CT>> All of the above will be ECC checked by the flash wrapper. The reason is that physically the ECC word and the data word reside in the same banks. When you read from the ECC space, the flash wrapper needs to physically read an entire long word from the flash bank, which contains both the data and the ECC. So the flash wrapper will evaluate the well-being of the memory even though you are accessing the ECC space and not the data space.

4.) If 3.) is "yes", could you please help me to clarify which part (CPU or Flash Wrapper) is performing the ECC checking for each ECC memory region?

Customer OTP–ECC, TCM Flash Bank 0 0xF004_0000 0xF004_03FF --> ?
Customer OTP–ECC, TCM Flash Bank 1 0xF004_0400 0xF004_07FF --> ?
Customer OTP–ECC, EEPROM Bank 7 0xF004_1C00 0xF004_1FFF -->?
TI OTP–ECC, TCM Flash Bank 0 0xF00C_0000 0xF00C_03FF --> ?
TI OTP–ECC, TCM Flash Bank 1 0xF00C_0400 0xF00C_07FF --> ?
TI OTP–ECC, EEPROM Bank 7 0xF00C_1C00 0xF00C_1FFF --> ?
EEPROM Bank–ECC 0xF010_0000 0xF013_FFFF --> ?
Flash Data Space ECC 0xF040_0000 0xF04F_FFFF --> ?

CT>> As stated, all of them by the flash wrapper.

Customer OTP–ECC, TCM Flash Bank 0 0xF004_0000 0xF004_03FF --> ?
Customer OTP–ECC, TCM Flash Bank 1 0xF004_0400 0xF004_07FF --> ?
Customer OTP–ECC, EEPROM Bank 7 0xF004_1C00 0xF004_1FFF -->?
TI OTP–ECC, TCM Flash Bank 0 0xF00C_0000 0xF00C_03FF --> ?
TI OTP–ECC, TCM Flash Bank 1 0xF00C_0400 0xF00C_07FF --> ?
TI OTP–ECC, EEPROM Bank 7 0xF00C_1C00 0xF00C_1FFF --> ?
EEPROM Bank–ECC 0xF010_0000 0xF013_FFFF --> ?

CT>> The registers will work the same as if you were reading from the corresponding non-ECC space. For example, reading from Customer OTP–ECC, TCM Flash Bank 0 with a single-bit error detected is like reading from Customer OTP, TCM Flash Bank 0 as far as the register updates are concerned.

Flash Data Space ECC 0xF040_0000 0xF04F_FFFF --> ?

CT>> This one is slightly different. As you know, reading from the flash data space itself happens through the ATCM, and the ECC check is done by the CPU. Reading the Flash Data ECC space happens on bus2, for which the flash wrapper handles the ECC checking. If there is a correctable error, the FCOR_ERR_CNT will be updated.

-----------------------------------------------------------------------------------------------------------------------------------------------------

From TRM:

5.7.4 Flash Correctable Error Count Register (FCOR_ERR_CNT).

This register applies to the main flash banks.

Please help me to correct/ fill the table above:

TCM Flash 0x0000_0000 0x00FF_FFFF Influence, Yes
Mirrored Flash 0x2000_0000 0x20FF_FFFF Influence?
Customer OTP, TCM Flash Bank 0 0xF000_0000 0xF000_1FFF Influence, Yes
Customer OTP, TCM Flash Bank 1 0xF000_2000 0xF000_3FFF Influence, Yes
Customer OTP, EEPROM Bank 7 0xF000_E000 0xF000_FFFF Influence, No
TI OTP, TCM Flash Bank 0 0xF008_0000 0xF008_1FFF Influence, Yes
TI OTP, TCM Flash Bank 1 0xF008_2000 0xF008_3FFF Influence, Yes
TI OTP, EEPROM Bank 7 0xF008_E000 0xF008_FFFF Influence, No

EEPROM Bank 0xF020_0000 0xF03F_FFFF Influence, No

CT>> I'm not sure what you are asking here. Are you asking which of the above are part of the main banks? If yes, then physically the Customer OTP, EEPROM Bank 7; the TI OTP, EEPROM Bank 7; and the EEPROM Bank are a separate bank. All others are part of the same physical bank0 or bank1. It is just that their ECC spaces or OTP sectors are connected to a different port (bus2) and a different memory map at 0xF0xxxxxx rather than 0x0xxxxxxx. The idea is to map these non-performance-critical regions to bus2 and make the ATCM as fast as possible.

Or are you asking which of the above will cause the FCOR_ERR_CNT to update when there is a single-bit error on an access to the named space?

TCM Flash 0x0000_0000 0x00FF_FFFF Influence, Yes
Mirrored Flash 0x2000_0000 0x20FF_FFFF Influence? Yes
Customer OTP, TCM Flash Bank 0 0xF000_0000 0xF000_1FFF Influence, Yes
Customer OTP, TCM Flash Bank 1 0xF000_2000 0xF000_3FFF Influence, Yes
Customer OTP, EEPROM Bank 7 0xF000_E000 0xF000_FFFF Influence, No
TI OTP, TCM Flash Bank 0 0xF008_0000 0xF008_1FFF Influence, Yes
TI OTP, TCM Flash Bank 1 0xF008_2000 0xF008_3FFF Influence, Yes
TI OTP, EEPROM Bank 7 0xF008_E000 0xF008_FFFF Influence, No

EEPROM Bank 0xF020_0000 0xF03F_FFFF Influence, No

From TRM:

5.7.37 EEPROM Emulation Correctable Error Count Register (EE_COR_ERR_CNT)

7.) Same as item 6.) but for the EE_COR_ERR_CNT register:

Please help me to correct/ fill the table above:

TCM Flash 0x0000_0000 0x00FF_FFFF Influence, No
Mirrored Flash 0x2000_0000 0x20FF_FFFF Influence, No
Customer OTP, TCM Flash Bank 0 0xF000_0000 0xF000_1FFF Influence, No
Customer OTP, TCM Flash Bank 1 0xF000_2000 0xF000_3FFF Influence, No
Customer OTP, EEPROM Bank 7 0xF000_E000 0xF000_FFFF Influence, Yes
TI OTP, TCM Flash Bank 0 0xF008_0000 0xF008_1FFF Influence, No
TI OTP, TCM Flash Bank 1 0xF008_2000 0xF008_3FFF Influence, No
TI OTP, EEPROM Bank 7 0xF008_E000 0xF008_FFFF Influence, Yes
EEPROM Bank 0xF020_0000 0xF03F_FFFF Influence, Yes

CT>> See above color coded answers.
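For reference, the answers above can be collected into one lookup table. This is only a sketch of what was said in this thread: the address ranges are those quoted from the datasheet memory map, the checker and counter columns restate the inline answers, and the type and function names are made up for the example. Marking TCM RAM as affecting neither flash counter is my own inference.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { CHK_CPU, CHK_FLASH_WRAPPER } ecc_checker_t;
typedef enum { CNT_NONE, CNT_FCOR_ERR_CNT, CNT_EE_COR_ERR_CNT } cor_counter_t;

typedef struct {
    uint32_t      start, end;   /* inclusive address range            */
    ecc_checker_t checker;      /* who evaluates the ECC on a read    */
    cor_counter_t counter;      /* counter updated on a single error  */
    const char   *name;
} ecc_region_t;

/* Hypothetical table, filled in from the answers in this thread. */
static const ecc_region_t ecc_regions[] = {
    {0x00000000u, 0x00FFFFFFu, CHK_CPU,           CNT_FCOR_ERR_CNT,   "TCM Flash"},
    {0x08000000u, 0x0803FFFFu, CHK_CPU,           CNT_NONE,           "TCM RAM"},
    {0x20000000u, 0x20FFFFFFu, CHK_CPU,           CNT_FCOR_ERR_CNT,   "Mirrored Flash"},
    {0xF0000000u, 0xF0001FFFu, CHK_FLASH_WRAPPER, CNT_FCOR_ERR_CNT,   "Customer OTP, Bank 0"},
    {0xF0002000u, 0xF0003FFFu, CHK_FLASH_WRAPPER, CNT_FCOR_ERR_CNT,   "Customer OTP, Bank 1"},
    {0xF000E000u, 0xF000FFFFu, CHK_FLASH_WRAPPER, CNT_EE_COR_ERR_CNT, "Customer OTP, EEPROM Bank 7"},
    {0xF0040000u, 0xF00403FFu, CHK_FLASH_WRAPPER, CNT_FCOR_ERR_CNT,   "Customer OTP-ECC, Bank 0"},
    {0xF0040400u, 0xF00407FFu, CHK_FLASH_WRAPPER, CNT_FCOR_ERR_CNT,   "Customer OTP-ECC, Bank 1"},
    {0xF0041C00u, 0xF0041FFFu, CHK_FLASH_WRAPPER, CNT_EE_COR_ERR_CNT, "Customer OTP-ECC, EEPROM Bank 7"},
    {0xF0080000u, 0xF0081FFFu, CHK_FLASH_WRAPPER, CNT_FCOR_ERR_CNT,   "TI OTP, Bank 0"},
    {0xF0082000u, 0xF0083FFFu, CHK_FLASH_WRAPPER, CNT_FCOR_ERR_CNT,   "TI OTP, Bank 1"},
    {0xF008E000u, 0xF008FFFFu, CHK_FLASH_WRAPPER, CNT_EE_COR_ERR_CNT, "TI OTP, EEPROM Bank 7"},
    {0xF00C0000u, 0xF00C03FFu, CHK_FLASH_WRAPPER, CNT_FCOR_ERR_CNT,   "TI OTP-ECC, Bank 0"},
    {0xF00C0400u, 0xF00C07FFu, CHK_FLASH_WRAPPER, CNT_FCOR_ERR_CNT,   "TI OTP-ECC, Bank 1"},
    {0xF00C1C00u, 0xF00C1FFFu, CHK_FLASH_WRAPPER, CNT_EE_COR_ERR_CNT, "TI OTP-ECC, EEPROM Bank 7"},
    {0xF0100000u, 0xF013FFFFu, CHK_FLASH_WRAPPER, CNT_EE_COR_ERR_CNT, "EEPROM Bank-ECC"},
    {0xF0200000u, 0xF03FFFFFu, CHK_FLASH_WRAPPER, CNT_EE_COR_ERR_CNT, "EEPROM Bank"},
    {0xF0400000u, 0xF04FFFFFu, CHK_FLASH_WRAPPER, CNT_FCOR_ERR_CNT,   "Flash Data Space ECC"},
};

/* Return the table entry covering addr, or NULL if addr is not listed. */
const ecc_region_t *ecc_region_lookup(uint32_t addr)
{
    for (size_t i = 0; i < sizeof ecc_regions / sizeof ecc_regions[0]; i++) {
        if (addr >= ecc_regions[i].start && addr <= ecc_regions[i].end) {
            return &ecc_regions[i];
        }
    }
    return NULL;
}
```

For example, `ecc_region_lookup(0xF0400000u)` returns the "Flash Data Space ECC" entry: checked by the flash wrapper over bus2, with FCOR_ERR_CNT updated on a correctable error.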

0 Henry Abril1 over 12 years ago in reply to Charles Tsai

Hi Charles,

I would like to clarify the following fields of the FEDACCTRL1 register:

EOFEN
EZFEN
EPEN

a.) Assume that I have configured the mechanism in the following way:

EDACEN = 0xA: Error Detection and Correction events are captured and sent to the ESM
EPEN = Error profiling is enabled.
EOFEN = Event on Ones Fail Disabled
EZFEN = Event on Zeros Fail Disabled
FEDACCTRL2.SEC_THRESHOLD = 8

1.) With this configuration I would expect to have the ESM group 1 channel 6 flag and interrupt reaction (if enabled) only when the SEC_THRESHOLD value is reached. Is this correct?

b.) Assume that I have configured the mechanism in the following way:

EDACEN = 0xA: Error Detection and Correction events are captured and sent to the ESM
EPEN = Error profiling is enabled.
EOFEN = Event on Ones Fail ENABLED
EZFEN = Event on Zeros Fail Disabled
FEDACCTRL2.SEC_THRESHOLD = 8

2.) Should I expect to have the ESM group 1 channel 6 flag and interrupt reaction (if enabled) at any time when a correctable error is detected (single bit error where a one reads as a zero when reading from the OTP or ECC memory locations) even when the SEC_THRESHOLD value is NOT reached in the FCOR_ERR_CNT?

Please confirm that the same applies in case of activating the EZFEN flag:

An ESM error event is generated on a single bit error where a zero reads as a one when reading from the OTP or ECC memory locations even when the SEC_THRESHOLD value is NOT reached in the FCOR_ERR_CNT.

3.) Can we consider a similar behavior for the EEPROM configuration?

EE_EOFEN
EE_EZFEN
EE_EPEN
EE_SEC_THRESHOLD
EE_ERRCNT
ESM group 1 channel 35

Thank you and BR

0 Charles Tsai over 12 years ago in reply to Henry Abril1

Hi Henry,

My answers are inline.

a.) Assume that I have configured the mechanism in the following way:

EDACEN = 0xA: Error Detection and Correction events are captured and sent to the ESM
EPEN = Error profiling is enabled.
EOFEN = Event on Ones Fail Disabled
EZFEN = Event on Zeros Fail Disabled
FEDACCTRL2.SEC_THRESHOLD = 8

1.) With this configuration I would expect to have the ESM group 1 channel 6 flag and interrupt reaction (if enabled) only when the SEC_THRESHOLD value is reached. Is this correct?

CT>> The error profiling mode works independently of the EOFEN and EZFEN settings. As long as EPEN is enabled, it will generate an error event to the ESM when the threshold is reached.

b.) Assume that I have configured the mechanism in the following way:

EDACEN = 0xA: Error Detection and Correction events are captured and sent to the ESM
EPEN = Error profiling is enabled.
EOFEN = Event on Ones Fail ENABLED
EZFEN = Event on Zeros Fail Disabled
FEDACCTRL2.SEC_THRESHOLD = 8

CT>> As I said, the profiling mode is independent of the ones-fail and zeros-fail events. If you enable EOFEN, an error event is sent to the ESM whenever a one is read as a zero, regardless of whether the threshold has been reached. The same holds for EZFEN when a zero is read as a one.

Please confirm that the same applies in case of activating the EZFEN flag:

3.) Can we consider a similar behavior for the EEPROM configuration?

EE_EOFEN
EE_EZFEN
EE_EPEN
EE_SEC_THRESHOLD
EE_ERRCNT
ESM group 1 channel 35

CT>> Works the same way as just explained.
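The behavior described in these answers can be written down as a small host-side model. It is a sketch only, not the hardware logic: the struct and function names are hypothetical, it assumes the correctable-error counter advances only while profiling (EPEN) is enabled, and it treats "threshold reached" as the count becoming equal to SEC_THRESHOLD. It also does not model the restriction of the ones-fail and zeros-fail events to OTP or ECC locations mentioned in the question. The same model would apply to the EE_* fields and ESM group 1 channel 35.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { ONE_READ_AS_ZERO, ZERO_READ_AS_ONE } sec_fail_t;

typedef struct {
    bool     epen;           /* error profiling enable                  */
    bool     eofen;          /* event on ones fail enable               */
    bool     ezfen;          /* event on zeros fail enable              */
    uint32_t sec_threshold;  /* FEDACCTRL2.SEC_THRESHOLD                */
} sec_cfg_t;

typedef struct {
    uint32_t cor_err_cnt;    /* stands in for FCOR_ERR_CNT              */
} sec_state_t;

/* Model one corrected single-bit error. Returns true if an ESM event
 * would be generated for it under the assumptions stated above. */
bool sec_error_raises_esm(const sec_cfg_t *cfg, sec_state_t *st, sec_fail_t kind)
{
    bool event = false;

    /* Profiling path: count the error, signal when the threshold is hit. */
    if (cfg->epen) {
        st->cor_err_cnt++;
        if (st->cor_err_cnt == cfg->sec_threshold) {
            event = true;
        }
    }

    /* Ones-fail / zeros-fail paths: independent of the threshold. */
    if (cfg->eofen && kind == ONE_READ_AS_ZERO) {
        event = true;
    }
    if (cfg->ezfen && kind == ZERO_READ_AS_ONE) {
        event = true;
    }
    return event;
}
```

With EPEN set, EOFEN and EZFEN clear, and a threshold of 8, the model reports an event only on the eighth corrected error. With EOFEN also set, the very first one-read-as-zero error reports an event, while a zero-read-as-one error still waits for the threshold.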
