TMS570LS3137: Information required for ESRAM ECC generation

sreenivasan m

Part Number: TMS570LS3137
Other Parts Discussed in Thread: HALCOGEN

Hi Team,

We are performing a single-bit ESRAM ECC test using CCS 8.0 on the Hercules TMS570LS3137 controller.
We see that esmHighInterrupt (or its notification) is called only once, after the single-bit ECC exception has been generated three times.

Can you please tell us whether there is already a workaround for this issue? If not, please share sample code that generates the single-bit ECC SRAM exception the first time the error occurs.

Regards,
M.Sreenivasan.

over 5 years ago

0 QJ Wang over 5 years ago

TI__Guru**** 186256 points

Hello Sreenivasan,

Please program the threshold count in the RAMTHRESHOLD register to 0x1. This is the number of Single-bit Error Correction (SEC) occurrences before the single-bit error interrupt is generated. If the threshold is set to 1, all single-bit error addresses are captured. To enable error occurrence detection, the threshold must be set to a non-zero value.

tcram1REG->RAMTHRESHOLD = 0x1U;
tcram2REG->RAMTHRESHOLD = 0x1U;

0 sreenivasan m over 5 years ago in reply to QJ Wang

Genius 3705 points

Hi Wang,

We tested with the threshold value set to 0x1, but the issue is still seen.
Please let me know if you need any further information. Are you able to get a notification for every single-bit ECC event on SRAM?
Note: We are testing on an eval board.

Regards,
M.Sreenivasan.

0 QJ Wang over 5 years ago in reply to sreenivasan m

TI__Guru**** 186256 points

Hello,

I tested this before and got a notification every time I read from the location with the 1-bit ECC error.
The ECC error is generated by flipping 1 bit of the ECC data.

How did you test? Did you try the functions in sys_selftest.c?

0 sreenivasan m over 5 years ago in reply to QJ Wang

Genius 3705 points

Hi Wang,

I tried the checkRAMECC function available in sys_selftest.c. In this function, both the even and odd memory banks are corrupted (a single bit in the ECC area), and we get the notification twice. But if we corrupt only a single bank (let's say the even bank), we get the notification only once, and from then on no further notifications are seen.

Can you run the same test case with the change described above and let us know the reason for this behaviour?

Regards,
M.Sreenivasan.

0 QJ Wang over 5 years ago in reply to sreenivasan m

TI__Guru**** 186256 points

Hello Sreenivasan,

I will run the test and come back to you.

0 sreenivasan m over 5 years ago in reply to QJ Wang

Genius 3705 points

Hi Wang,

Please share any updates you have.

Regards,
M.Sreenivasan.

0 sreenivasan m over 5 years ago in reply to sreenivasan m

Genius 3705 points

Hi Wang,

Can you share your updates on this?

Regards,
M.Sreenivasan.

0 sreenivasan m over 5 years ago in reply to sreenivasan m

Genius 3705 points

Hi Wang,

We have been waiting for your input on this. Can you please prioritize it and respond?

Regards,
M.Sreenivasan.

0 QJ Wang over 5 years ago in reply to sreenivasan m

TI__Guru**** 186256 points

Hello,

To get the error status set every time, you need to flip the ECC bit at a different address, then read the corresponding RAM data. If you flip the ECC bit at the same address as last time, the CPU will read the corrected data from a cache (a one-entry cache).

0 sreenivasan m over 5 years ago in reply to QJ Wang

Genius 3705 points

Hi Wang,

Can you confirm why the error was always reproducible when executing the test case in checkRAMECC at the same address?

Regards,
M.Sreenivasan.

0 sreenivasan m over 5 years ago in reply to sreenivasan m

Genius 3705 points

Hi Wang,

Also, for the Cortex-R4F, we don't have a cache; instead we have tightly coupled RAM. So, can you please add this to the errata and mention the workaround?

Regards,
M.Sreenivasan.

0 sreenivasan m over 5 years ago in reply to sreenivasan m

Genius 3705 points

Hi Wang,

Any updates on this?

Regards,
M.Sreenivasan.

0 QJ Wang over 5 years ago in reply to sreenivasan m

TI__Guru**** 186256 points

This one-entry cache is only for RAM ECC. It is not the 32-KB L1 cache mentioned in the datasheet. If you want to get the error, just change the address every time.

0 sreenivasan m over 5 years ago in reply to QJ Wang

Genius 3705 points

Thanks Wang. Yes, with a different address we are getting exceptions. But HALCoGen doesn't provide an option to enable the cache configuration, and it is disabled by default. How can the data be cached when we are accessing only TCRAM areas?

Regards,
M.Sreenivasan.

0 QJ Wang over 5 years ago in reply to sreenivasan m

TI__Guru**** 186256 points

Hello Sreenivasan,

Let me explain the hard error cache mentioned in our conference call.

The Cortex-R4 processor attempts to correct 1-bit errors in the SRAM by writing the corrected data back to the SRAM and retrying the access. If a 1-bit error is due to a hard fault, then doing this will not change the data read from the SRAM, and when the access is retried, the same error will be detected again and the processor will livelock, forever detecting the error and retrying and not making any progress.

The purpose of the hard error cache is to prevent the CPU from re-reading SRAM that has a permanent single-bit error. Let's say there is a defect in one of the memory cells. If you read from it, the CPU will detect a single-bit ECC error. The CPU will then save the corrected data to the hard error cache, write the corrected data back to the SRAM, and retry the access. The next time the CPU reads from the same error address, it simply reads from the cache instead of the SRAM, since the address matches.

You can use the RAMTHRESHOLD register to set the threshold for single-bit error occurrences. If this threshold is set to 3, the single-bit error interrupt is generated when the number of single-bit errors corrected by the CPU reaches 3.
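
The counting behaviour described here can be sketched as a small host-side model. This is only an illustration, not HALCoGen or device code: the type sec_counter_t and the function sec_counter_on_corrected_error are hypothetical names, and the model assumes that the interrupt is raised when the occurrence count reaches the threshold and that the counter then restarts from zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Host-side model only; these names are hypothetical, not HALCoGen APIs. */
typedef struct {
    uint32_t threshold;   /* stands in for RAMTHRESHOLD; 0 disables detection */
    uint32_t occurrences; /* stands in for the SEC occurrence counter */
} sec_counter_t;

/* Call once per corrected single-bit error.
 * Returns true when this occurrence would raise the single-bit error interrupt. */
bool sec_counter_on_corrected_error(sec_counter_t *c)
{
    if (c->threshold == 0U) {
        return false;            /* a zero threshold disables detection */
    }
    c->occurrences++;
    if (c->occurrences >= c->threshold) {
        c->occurrences = 0U;     /* assumption: counter restarts after the interrupt */
        return true;
    }
    return false;
}
```

With a threshold of 3, the model reports only on every third corrected error, which matches the "notification once after 3 single-bit errors" symptom from the first post. With a threshold of 1, every corrected error is reported.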

0 sreenivasan m over 5 years ago in reply to QJ Wang

Genius 3705 points

Hi Wang,

Yes, you are correct, and that is why we perform a memory write in the exception handler to correct the data in SRAM after a single-bit ECC error. We would like to know how to get an exception if we encounter a single-bit ECC error at the same address again.
Also, the above explanation may not apply in all cases, as one thread might update memory that was read by another thread.

Regards,
M.Sreenivasan.

0 sreenivasan m over 5 years ago in reply to sreenivasan m

Genius 3705 points

Hi Wang,

I tested the scenario below; here are the observations:
We executed checkB0RAMECC and then read back the memory location whose ECC value was corrupted by checkB0RAMECC. With this sequence, we always end up in the single-bit ECC exception handler.
Please note that checkB0RAMECC itself reads the corrupted memory address at its end, and we then read the same address again in order to generate the exception. Can you please check this behaviour and explain why two accesses to the corrupted memory are required to generate the single-bit exception?

For reference, please find the code used in checkB0RAMECC:

void checkB0RAMECC(void)
{
    volatile uint64 ramread = 0U;
    volatile uint32 regread = 0U;
    uint32 tcram1ErrStat, tcram2ErrStat = 0U;

    uint64 tcramA1_bk = tcramA1bit;
    uint64 tcramA2_bk = tcramA2bit;
    volatile uint32 i;
    /* USER CODE BEGIN (36) */
    /* USER CODE END */

    /* enable writes to ECC RAM, enable ECC error response */
    tcram1REG->RAMCTRL = 0x0005010AU;
    tcram2REG->RAMCTRL = 0x0005010AU;

    /* the first 1-bit error will cause an error response */
    tcram1REG->RAMTHRESHOLD = 0x1U;
    tcram2REG->RAMTHRESHOLD = 0x1U;

    /* allow SERR to be reported to ESM */
    tcram1REG->RAMINTCTRL = 0x1U;
    tcram2REG->RAMINTCTRL = 0x1U;

    /* cause a 1-bit ECC error */
    _coreDisableRamEcc_();
    tcramA1bitError ^= 0x1U;
    _coreEnableRamEcc_();

    /* disable writes to ECC RAM */
    tcram1REG->RAMCTRL = 0x0005000AU;
    tcram2REG->RAMCTRL = 0x0005000AU;

    /* read from location with 1-bit ECC error */
    ramread = tcramA1bit;
}

The main function is as below:

while(1)
{
    ....
    checkB0RAMECC();
    ramread = tcramA1bit;
}

Regards,
M.Sreenivasan.

0 sreenivasan m over 5 years ago in reply to sreenivasan m

Genius 3705 points

Hi Wang,

Can you give an update on this case?

Regards,
M.Sreenivasan.

0 QJ Wang over 5 years ago in reply to sreenivasan m

TI__Guru**** 186256 points

Hello Sreenivasan,

I ran the test on the LS3137 HDK; the error is generated at the first read.

0 sreenivasan m over 5 years ago in reply to QJ Wang

Genius 3705 points

Hi Wang,

The exception is generated when we read the corrupted address for the first time. But to generate the exception again for the same address, we need to read it twice.

Regards,
M.Sreenivasan.

0 QJ Wang over 5 years ago in reply to sreenivasan m

TI__Guru**** 186256 points

Hello Sreenivasan,

As I said before, the CPU will read from the cache instead of from the SRAM if the address is the same.

The Cortex-R4 processor attempts to correct 1-bit errors in the SRAM by writing the corrected data back to the SRAM and retrying the access. If a 1-bit error is due to a hard fault, then doing this will not change the data read from the SRAM, and when the access is retried, the same error will be detected again and the processor will livelock, forever detecting the error and retrying and not making any progress.

The purpose of the hard error cache is to prevent the CPU from re-reading SRAM that has a permanent single-bit error. Let's say there is a defect in one of the memory cells. If you read from it, the CPU will detect a single-bit ECC error. The CPU will then save the corrected data to the hard error cache, write the corrected data back to the SRAM, and retry the access. The next time the CPU reads from the same error address, it simply reads from the cache instead of the SRAM, since the address matches.

You can do it like this:
1. address 1: corrupt its ECC, read the data to get ESM error
2. address 2: corrupt its ECC, read the data to get ESM error
3. address 1 again: corrupt its ECC, read the data to get ESM error
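
To see why alternating addresses helps, the sequence above can be tried against a simplified host-side model of a one-entry hard error cache. This is a sketch under stated assumptions, not the actual Cortex-R4 logic: ram_model_t, model_init, model_read and flip_and_read are hypothetical names, the RAM is four words, and an injected ECC flip is tracked as a flag rather than as real ECC bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_WORDS 4U /* hypothetical size of the modelled RAM */

typedef struct {
    uint64_t data[MODEL_WORDS];        /* stored data words */
    bool     ecc_flipped[MODEL_WORDS]; /* true while a 1-bit ECC flip is pending */
    bool     cache_valid;              /* one-entry hard error cache */
    size_t   cache_index;
    uint64_t cache_data;
} ram_model_t;

void model_init(ram_model_t *m)
{
    for (size_t i = 0U; i < MODEL_WORDS; i++) {
        m->data[i] = (uint64_t)i;
        m->ecc_flipped[i] = false;
    }
    m->cache_valid = false;
    m->cache_index = 0U;
    m->cache_data = 0U;
}

/* Read one word; *reported is set when a 1-bit error is detected and reported. */
uint64_t model_read(ram_model_t *m, size_t index, bool *reported)
{
    assert(index < MODEL_WORDS);
    *reported = false;
    if (m->cache_valid && (m->cache_index == index)) {
        return m->cache_data;          /* address match: RAM is not read at all */
    }
    if (m->ecc_flipped[index]) {
        m->cache_valid = true;         /* keep corrected data in the single entry */
        m->cache_index = index;
        m->cache_data = m->data[index];
        m->ecc_flipped[index] = false; /* write-back repairs the soft error */
        *reported = true;
    }
    return m->data[index];
}

/* Inject a 1-bit ECC flip at index, read it back, and return whether it was reported. */
bool flip_and_read(ram_model_t *m, size_t index)
{
    bool reported;
    m->ecc_flipped[index] = true;
    (void)model_read(m, index, &reported);
    return reported;
}
```

In this model, calling flip_and_read for address 0 twice in a row reports only the first time, because the second read is served from the cache entry. Calling it for address 0, then address 1, then address 0 again reports all three times, because the read of address 1 replaces the single cache entry.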
