TMS570LS3137: Problem with multiple TCRAM ECC error detection for same memory location

Part Number: TMS570LS3137

Hi,

On the TMS570LS3137, when using the TCRAM ECC feature, it seems that once a single-bit error has been detected and corrected at a memory location, a subsequent single-bit error at the same location does not set the status flags.
Even a double-bit error at that same location goes undetected (no interrupt is generated and the status flag is not set).
For detection (single-bit or double-bit) to occur again at the first location, an error must first be detected at a different memory location.

Is this behavior normal, or did I misconfigure the device?


    // First configure and then enable the ECC Checks
    // RAMOCCUR register must be cleared before setting the threshold
    regTcRam1.ramOccur.bit.sErrOccur = 0u;
    regTcRam2.ramOccur.bit.sErrOccur = 0u;
    regTcRam1.ramThreshold.bit.threshold = 1u;
    regTcRam2.ramThreshold.bit.threshold = 1u;

    // Disable single bit error interrupt.
    regTcRam1.ramIntCtrl.bit.sErrEn = 0u;
    regTcRam2.ramIntCtrl.bit.sErrEn = 0u;

    // Enables ECC detection
    regTcRam1.ramCtrl.bit.eccDetEn = 0x1;
    regTcRam2.ramCtrl.bit.eccDetEn = 0x1;


Regards,

  • Hello Charles,

    Your description of the behavior is correct. The reason is that we don't want the CPU to get bogged down with repeated aborts and ECC error reports. The corrected data is stored in a buffer within the CPU, and repeated accesses to the same address are served from that buffer, so no new error is generated.

    What I need to discuss with my colleagues, so that my answer is accurate, is what happens to the data once it has been corrected. I know that at some point the corrected value is written back to the associated RAM location to eliminate the single-bit error whenever possible (i.e., when it is the result of a transient rather than a permanent fault). This keeps single-bit errors from accumulating and increasing the probability of an eventual uncorrectable multi-bit error.

    In addition to the details of the write-back operation, I also need to double-check how multi-bit errors are handled with respect to the buffer and the write-back operation.
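To make the described behavior concrete, here is a minimal host-side sketch in C. It is a toy model, not the actual Cortex-R4 logic: the single-entry buffer, the `EccModel` type, and `model_read` are all assumptions made for illustration. It only shows why a second error at the same address would go unreported while the corrected word is still held in the buffer, and why an error at a different address makes the first location detectable again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model (assumption): a one-entry buffer holding the last corrected word. */
typedef struct {
    bool     valid;    /* buffer currently holds a corrected word      */
    uint32_t addr;     /* address the corrected word belongs to        */
    uint32_t data;     /* corrected data                               */
    unsigned reports;  /* number of ECC events reported so far         */
} EccModel;

/*
 * Model a CPU read of `addr`.
 * `ram_has_error` tells whether the RAM cell currently holds a bit error,
 * `good` is the value the read should return after correction.
 */
uint32_t model_read(EccModel *m, uint32_t addr, bool ram_has_error, uint32_t good)
{
    assert(m != NULL);

    if (m->valid && m->addr == addr) {
        /* Served from the buffer: RAM is not re-checked, nothing is reported. */
        return m->data;
    }
    if (ram_has_error) {
        /* New error at another address: report it and replace the buffer entry. */
        m->valid = true;
        m->addr  = addr;
        m->data  = good;
        m->reports++;
    }
    return good;
}
```

In this model, two faulty reads of the same address produce one report, and a faulty read of a second address followed by a faulty read of the first produces two more, which matches the sequence described in the original question.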
  • Hello Charles,

    Sorry for the delay in getting back to you, but this required some digging to find the exact description of the behavior. Below is an excerpt from the ARM TRM that describes the error handling pretty well.

      

    The part of your question that this doesn't clear up is the observed behavior for the double-bit fault. The paragraph above clearly states that the ECC logic in the CPU will generate an abort for any double-bit or multi-bit error.

    So that we can better understand what might be happening, is there some code demonstrating this behavior that you can post here? A CCS project would be preferable, but if that isn't possible, that is OK. One thing I am curious about is what your code does when the initial uncorrectable error occurs. Do you clear the error flags and the uncorrectable error address register before accessing the location again? Or do you immediately go back and try to load the value again?
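The "clear first, then access again" option could be sketched as below. This is a host-side mock only: `MockTcramRegs`, `MOCK_UERR_FLAG`, and both helper functions are hypothetical names, the one-bit status layout is assumed, and on real hardware the flag and address register may need a different clearing sequence (for example write-1-to-clear, or a read of the address register). Check the device TRM before reusing any of it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the TCRAM wrapper error registers. */
typedef struct {
    uint32_t errStatus;  /* bit 0: uncorrectable error latched (assumed layout) */
    uint32_t uErrAddr;   /* address captured for the uncorrectable error        */
} MockTcramRegs;

#define MOCK_UERR_FLAG 0x1u

/*
 * Clear the latched flag and the captured address before the next test read.
 * A plain store is enough for this mock; real hardware may differ.
 */
void mock_clear_uerr(MockTcramRegs *r)
{
    assert(r != NULL);
    r->errStatus &= ~MOCK_UERR_FLAG;
    r->uErrAddr = 0u;
}

/* True when a newly reported error could be told apart from a stale one. */
bool mock_ready_for_retest(const MockTcramRegs *r)
{
    assert(r != NULL);
    return ((r->errStatus & MOCK_UERR_FLAG) == 0u) && (r->uErrAddr == 0u);
}
```

A test that re-reads the faulty location without calling something like `mock_clear_uerr` first cannot distinguish a fresh report from the state left behind by the first error, which is why the order of operations in the test code matters here.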