TMS570LS3137: Information required for ESRAM ECC generation

Part Number: TMS570LS3137
Other Parts Discussed in Thread: HALCOGEN

Hi Team,

We are running a single-bit ECC test on the internal SRAM (ESRAM) of a Hercules TMS570LS3137 controller, using CCS 8.0.
We see that esmHighInterrupt (or the notification) is called only once after a single-bit ECC exception has been generated three times.

Could you please tell us whether a workaround already exists for this issue? If not, please share sample code that generates a single-bit ECC SRAM exception the first time the error occurs.

Regards,
M.Sreenivasan.

  • Hello Sreenivasan,

    Please program the threshold count in the RAMTHRESHOLD register to 0x1. This is the threshold value for the Single-bit Error Correction (SEC) occurrences before the single-bit error interrupt is generated. If this threshold is set to 1, all single-bit error addresses are captured. To enable error occurrence detection, the threshold must be set to a non-zero value.

    tcram1REG->RAMTHRESHOLD = 0x1U;
    tcram2REG->RAMTHRESHOLD = 0x1U;
  • Hi Wang,

    We tested with the threshold value set to 0x1, but the issue still occurs.
    Please let me know if you need any further information. Are you able to get a notification for every single-bit ECC event on the SRAM?
    Note: We are testing on an evaluation board.

    Regards,
    M.Sreenivasan.
  • Hello,

    I tested this before and got a notification every time I read from the location with the 1-bit ECC error.
    The ECC error is generated by flipping 1 bit of the ECC data.

    How did you test? Did you try the functions in sys_selftest.c?
  • Hi Wang,

    I tried the checkRAMECC function in the selftest.c file. In this function, both the even and the odd memory banks are corrupted (a single bit in the ECC area), and we get the notification twice. But if we corrupt only a single bank (say, the even bank), we get the notification only once, and no further notifications after that.

    Can you run the same test case with the change described above and let us know the reason for this behaviour?

    Regards,
    M.Sreenivasan.
  • Hello Sreenivasan,

    I will run the test and get back to you.
  • Hi Wang,

    Please share any updates you have.

    Regards,
    M.Sreenivasan.
  • Hi Wang,

    Can you share your updates on this?

    Regards,
    M.Sreenivasan.
  • Hi Wang,

    We have been waiting for your input on this. Could you please prioritize it?

    Regards,
    M.Sreenivasan.
  • Hello,

    To get the error status set every time, you need to flip the ECC bit at a different address, then read the corresponding RAM data. If you flip the ECC bit at the same address as last time, the CPU will read the corrected data from a cache (a one-entry cache) instead of from the SRAM.
  • Hi Wang,

    Can you confirm why the error was always reproducible when the test case in checkRAMECC was executed at the same address?

    Regards,
    M.Sreenivasan.
  • Hi Wang,

    Also, on the Cortex-R4F we don't have a cache; we have tightly coupled RAM instead. So can you please add this to the errata and describe the workaround?

    Regards,
    M.Sreenivasan.
  • Hi Wang,

    Any updates on this?

    Regards,
    M.Sreenivasan.
  • This one-entry cache is only for RAM ECC. It is not the 32-KB L1 cache mentioned in the datasheet. If you want to get the error, just change the address every time.
  • Thanks Wang. Yes, with a different address we do get the exceptions. But HALCoGen doesn't provide an option to enable the cache configuration, and it is disabled by default. How can the data be cached when we are accessing only TCRAM areas?

    Regards,
    M.Sreenivasan.
  • Hello Sreenivasan,

    Let me explain the hard error cache mentioned in our conference call.

    The Cortex-R4 processor attempts to correct 1-bit errors in the SRAM by writing the corrected data back to the SRAM and retrying the access. If a 1-bit error is due to a hard fault, then doing this will not change the data read from the SRAM, and when the access is retried, the same error will be detected again and the processor will livelock, forever detecting the error and retrying and not making any progress.

    The purpose of the hard error cache is to prevent the CPU from reading SRAM that has a permanent single-bit error. Let's say there is a defect in one of the memory cells. If you read from it, the CPU will detect a single-bit ECC error. The CPU will then save the corrected data to the hard error cache, write the corrected data back to the SRAM, and retry the access. The next time the CPU reads from the same error address, it simply reads from the cache instead of the SRAM, since the address matches.

    You can use the RAMTHRESHOLD register to set the threshold for single-bit error occurrences. If this threshold is set to 3, the single-bit error interrupt is generated when the number of single-bit errors corrected by the CPU reaches 3.
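
    As a rough illustration of the threshold behaviour, here is a minimal host-side sketch in plain C. It is not device code: sec_counter_model, sec_model_init and sec_model_record_error are hypothetical names, and the model assumes that the interrupt fires when the count of corrected errors reaches the threshold (which is what the threshold-of-1 case implies) and that the count then restarts from zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-side model of the single-bit error counter.
 * This is plain C state, not the memory-mapped TCRAM registers. */
typedef struct {
    uint16_t threshold;   /* models RAMTHRESHOLD; 0 disables detection */
    uint16_t occurrences; /* corrected errors counted since the last interrupt */
    uint32_t interrupts;  /* single-bit error interrupts raised so far */
} sec_counter_model;

static void sec_model_init(sec_counter_model *m, uint16_t threshold)
{
    assert(m != NULL);
    m->threshold = threshold;
    m->occurrences = 0U;
    m->interrupts = 0U;
}

/* Record one corrected single-bit error.
 * Returns true when this error raises the interrupt. */
static bool sec_model_record_error(sec_counter_model *m)
{
    assert(m != NULL);
    if (m->threshold == 0U) {
        return false;        /* detection disabled */
    }
    m->occurrences++;
    if (m->occurrences >= m->threshold) {
        m->occurrences = 0U; /* assumption: the count restarts after firing */
        m->interrupts++;
        return true;
    }
    return false;
}
```

    With a threshold of 3 the model reports an interrupt on every third corrected error; with a threshold of 1 it reports one on every corrected error.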
  • Hi Wang,

    Yes, you are correct, and that is why we perform a memory write in the exception handler to correct the data in SRAM after a single-bit ECC error. We would like to know how to get an exception if we encounter a single-bit ECC error at the same address again.
    Also, the above explanation may not apply in all cases, since one thread might update memory that was read by another thread.

    Regards,
    M.Sreenivasan.
  • Hi Wang,

    I tested the scenario below; here are the observations:
    We executed checkB0RAMECC and then read back the memory location whose ECC value was corrupted by checkB0RAMECC. We always end up executing the exception handler (single-bit ECC).
    Please note that checkB0RAMECC itself reads the corrupted memory address at its end, and we then read the same address again in order to generate the exception. Can you please check this behaviour and explain why the corrupted memory must be accessed twice to generate the single-bit exception?

    For reference, here is the code used in checkB0RAMECC:

    void checkB0RAMECC(void)
    {
        volatile uint64 ramread = 0U;
        volatile uint32 regread = 0U;
        uint32 tcram1ErrStat, tcram2ErrStat = 0U;

        uint64 tcramA1_bk = tcramA1bit;
        uint64 tcramA2_bk = tcramA2bit;
        volatile uint32 i;
        /* USER CODE BEGIN (36) */
        /* USER CODE END */

        /* enable writes to ECC RAM, enable ECC error response */
        tcram1REG->RAMCTRL = 0x0005010AU;
        tcram2REG->RAMCTRL = 0x0005010AU;

        /* the first 1-bit error will cause an error response */
        tcram1REG->RAMTHRESHOLD = 0x1U;
        tcram2REG->RAMTHRESHOLD = 0x1U;

        /* allow SERR to be reported to ESM */
        tcram1REG->RAMINTCTRL = 0x1U;
        tcram2REG->RAMINTCTRL = 0x1U;

        /* cause a 1-bit ECC error */
        _coreDisableRamEcc_();
        tcramA1bitError ^= 0x1U;
        _coreEnableRamEcc_();

        /* disable writes to ECC RAM */
        tcram1REG->RAMCTRL = 0x0005000AU;
        tcram2REG->RAMCTRL = 0x0005000AU;

        /* read from location with 1-bit ECC error */
        ramread = tcramA1bit;
    }

    The main function is as follows:

    while(1)
    {
        ....
        checkB0RAMECC();
        /* second read of the same address, after the read inside checkB0RAMECC */
        ramread = tcramA1bit;
    }

    Regards,
    M.Sreenivasan.
  • Hi Wang,

    Can you update on this case?

    Regards,
    M.Sreenivasan.
  • Hello Sreenivasan,

    I ran the test on the LS3137 HDK; the error is generated on the first read.
  • Hi Wang,

    It is generated when we read the corrupted address for the first time. But to generate the exception again for the same address, we need to read it twice.

    Regards,
    M.Sreenivasan.
  • Hello Sreenivasan,

    As I said before, the CPU will read from the cache instead of the SRAM if the address is the same.

    The Cortex-R4 processor attempts to correct 1-bit errors in the SRAM by writing the corrected data back to the SRAM and retrying the access. If a 1-bit error is due to a hard fault, then doing this will not change the data read from the SRAM, and when the access is retried, the same error will be detected again and the processor will livelock, forever detecting the error and retrying and not making any progress.

    The purpose of the hard error cache is to prevent the CPU from reading SRAM that has a permanent single-bit error. Let's say there is a defect in one of the memory cells. If you read from it, the CPU will detect a single-bit ECC error. The CPU will then save the corrected data to the hard error cache, write the corrected data back to the SRAM, and retry the access. The next time the CPU reads from the same error address, it simply reads from the cache instead of the SRAM, since the address matches.

    You can do it like this:
    1. address 1: corrupt its ECC, read the data to get ESM error
    2. address 2: corrupt its ECC, read the data to get ESM error
    3. address 1 again: corrupt its ECC, read the data to get ESM error
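
    The sequence above can be tried out on a host PC with a simplified model of the one-entry hard error cache. This is only a sketch of the behaviour described in this thread, not device code and not an exact description of the Cortex-R4: ram_model, model_init, model_corrupt_ecc and model_read are hypothetical names, and the model assumes that corrupting the ECC does not invalidate the cache entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_WORDS 4U

/* Hypothetical host-side model; none of these names exist in HALCoGen. */
typedef struct {
    uint32_t data[MODEL_WORDS];        /* correct data for each word */
    bool     ecc_corrupt[MODEL_WORDS]; /* true = stored ECC has a 1-bit flip */
    bool     cache_valid;              /* one-entry hard error cache */
    size_t   cache_addr;
    uint32_t cache_data;
    uint32_t error_events;             /* single-bit errors reported so far */
} ram_model;

static void model_init(ram_model *m)
{
    assert(m != NULL);
    for (size_t i = 0; i < MODEL_WORDS; i++) {
        m->data[i] = (uint32_t)i;
        m->ecc_corrupt[i] = false;
    }
    m->cache_valid = false;
    m->cache_addr = 0U;
    m->cache_data = 0U;
    m->error_events = 0U;
}

/* Models the "corrupt its ECC" step for one word. */
static void model_corrupt_ecc(ram_model *m, size_t addr)
{
    assert(m != NULL && addr < MODEL_WORDS);
    m->ecc_corrupt[addr] = true;
}

/* Models a CPU read. Returns true if a single-bit error was reported. */
static bool model_read(ram_model *m, size_t addr, uint32_t *out)
{
    assert(m != NULL && out != NULL && addr < MODEL_WORDS);
    if (m->cache_valid && m->cache_addr == addr) {
        *out = m->cache_data; /* address match: SRAM is not read at all */
        return false;
    }
    *out = m->data[addr];
    if (!m->ecc_corrupt[addr]) {
        return false;
    }
    /* Error detected: keep the corrected value in the single cache entry
     * (replacing the previous entry) and write the correction back. */
    m->cache_valid = true;
    m->cache_addr = addr;
    m->cache_data = m->data[addr];
    m->ecc_corrupt[addr] = false;
    m->error_events++;
    return true;
}
```

    In this model, corrupting and reading address 1 twice in a row reports only one error, while the order address 1, address 2, address 1 reports an error on every read.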