Tracking total single bit ECC corrections

Other Parts Discussed in Thread: HALCOGEN

Hello,

I am trying to write some test code that simply tracks the total # of single bit ECC corrections and double bit uncorrectable errors. My code fills an array of RAM with known values and then just sits there reading it back checking for errors against the known value.

I've gone through HalCoGen to initialize everything, and the checkRAMECC function appears to work. I've replaced the instruction that branches to itself forever with:

ramErrorReal
		bl		dataAbort2bit		; branch to custom abort handler in dAbortInterrupt.c
		ldmfd	r13!, {r0 - r12, lr}
		subs	pc, lr, #0x4

with my custom function simply incrementing a global variable to track total 2-bit errors. (When my test finds a value that's wrong, it rewrites it to the known value, so I don't expect one double-bit error to increase my counter indefinitely.)

void dataAbort2bit(void)
{
	gioSetBit(hetPORT1, 17, 1);
	ECC2_errors++;
}

However, I can't figure out how to do something similar for the single-bit errors, since the error handling is built into the CPU for speed purposes. I've thought about setting the RAMTHRESHOLD register really high and reading the RAMOCCUR register, but then my single-bit errors won't actually be fixed by the CPU. Is there a way I can let the built-in process fix the single-bit error, but still have an interrupt called that allows me to keep track of the single-bit event?

Thanks,

Andrew
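For readers following along, the fill-and-read-back test described in this post could look roughly like the sketch below. It is only an illustration: the buffer size, the pattern, and the names `test_buf`, `test_fill`, `test_scan` and `data_mismatches` are hypothetical and do not come from the original code. Note that with ECC enabled, a single-bit error is corrected by the CPU before the comparison ever sees it, which is why the comparison alone cannot count single-bit events.

```c
#include <stddef.h>
#include <stdint.h>

#define TEST_WORDS   256u          /* hypothetical buffer size (words) */
#define TEST_PATTERN 0xA5A5A5A5u   /* hypothetical known value         */

/* volatile so that every pass really re-reads the RAM */
volatile uint32_t test_buf[TEST_WORDS];

/* running total of words that did not match the known value */
volatile uint32_t data_mismatches = 0u;

/* Fill the whole test buffer with the known value. */
void test_fill(void)
{
    size_t i;
    for (i = 0u; i < TEST_WORDS; i++)
    {
        test_buf[i] = TEST_PATTERN;
    }
}

/* Read the buffer back once. Any word that differs from the known
 * value is rewritten, so a single bad word is reported only once.
 * Returns the number of mismatches found in this pass. */
uint32_t test_scan(void)
{
    uint32_t found = 0u;
    size_t i;
    for (i = 0u; i < TEST_WORDS; i++)
    {
        if (test_buf[i] != TEST_PATTERN)
        {
            test_buf[i] = TEST_PATTERN;   /* restore the known value */
            found++;
        }
    }
    data_mismatches += found;
    return found;
}
```

In the test application, `test_fill()` would be called once after initialization and `test_scan()` repeatedly from the main loop.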

  • Hello Andrew,

      Please see below the description of 1-bit ECC error handling by the CPU, taken from the Cortex-R4 TRM. The memory is corrected and the read is then retried. You can set RAMTHRESHOLD to 1 so that each single-bit error event is signaled to the ESM module. The ESM module will then generate an interrupt to the VIM. In the ESM interrupt ISR you can increment your global counter if the ESM interrupt is due to a TCRAM correctable error.

    "When a correctable ECC error is detected on a TCM read made by the instruction-side or data-side, the processor normally generates the correct data and writes it back to the TCM. In the meantime, the processor retries the read to fetch the correct instruction or data."

    regards,

    Charles

  • Thanks for the response.

    In the datasheet, it does look like Group 1 channels 26 and 28 are tied to correctable errors. Would something like this work?

    void esmGroup1Notification(int bit)
    {
    	if (bit == 26 || bit == 28)
    	{
    		ECC1_errors++;
    	}
    	return;
    }

    If that's the case, should I do the same thing for the uncorrectable errors rather than the custom function I wrote?

    Or do I need to tie those to a VIM channel interrupt in HalCoGen rather than handle it within the esmGroup1Notification function?

    Thanks,

    Andrew

  • Andrew,

      It should work. Make sure you enable the corresponding channels in the ESM for generating interrupts. You also need to enable the corresponding channel in the VIM for the ESM if you map the Group 1 errors to the low-level interrupt.

      You cannot do the same for uncorrectable ECC errors using an interrupt ISR. If ECC checking is enabled in the CPU, an uncorrectable ECC error always generates an abort, because it is a severe error. It does not generate an interrupt. You will need to count these uncorrectable errors in the abort handler.

    regards,

    Charles
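To summarize the split described above, here is a minimal sketch of the two counters side by side: correctable errors counted in the ESM Group 1 notification, uncorrectable errors counted from the abort handler. It assumes the channel numbers 26 and 28 quoted from the datasheet earlier in the thread. The `uint32_t channel` parameter type is an assumption (the earlier snippet used `int bit`), the two channel macros are hypothetical names, and the `gioSetBit` call from the original handler is left out so the sketch has no hardware dependency.

```c
#include <stdint.h>

/* ESM Group 1 channels tied to correctable RAM errors, as quoted
 * from the datasheet earlier in this thread (macro names are
 * hypothetical). */
#define ESM_G1_RAM_CORRECTABLE_A 26u
#define ESM_G1_RAM_CORRECTABLE_B 28u

/* volatile because both counters are written from exception context
 * and read from the main test loop */
volatile uint32_t ECC1_errors = 0u;   /* single-bit, corrected     */
volatile uint32_t ECC2_errors = 0u;   /* double-bit, uncorrectable */

/* Called from the ESM interrupt for Group 1 events. Only the two
 * correctable-error channels bump the single-bit counter. */
void esmGroup1Notification(uint32_t channel)
{
    if ((channel == ESM_G1_RAM_CORRECTABLE_A) ||
        (channel == ESM_G1_RAM_CORRECTABLE_B))
    {
        ECC1_errors++;
    }
}

/* Called from the data abort handler (the "bl dataAbort2bit" shown
 * in the first post). Uncorrectable errors never reach the ESM
 * notification, so they are counted here instead. */
void dataAbort2bit(void)
{
    ECC2_errors++;
}
```

Keeping each counter in the path that actually sees the event means neither handler has to guess about the other error type.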

  • Andrew,

    Just a side note: the R4 PMU hardware is able to count various events, including the number of correctable single-bit errors over time.