How to enable RAM ECC

I am trying to figure out the steps to enable ECC on the RAM of my TMS570 development kit. The TCRAM section of the manual says: "The ECC checking logic inside the Cortex-R4F CPU is disabled after reset, and needs to be enabled by programming the auxiliary control register via a CP15 instruction". I can't find details on this register or the CP15 instruction.

Can someone detail the steps to do this?

 

Also, I tried clearing the lower 4 bits in the RAMCTRL register to enable ECC, but they don't change. How do I enter privileged mode to be able to do this?

 

Thanks

DH

  • Here is an example:

    _Enable_ECC

        ; Read-modify-write PMNC (c9, c12, 0): set bit 4 to export events
        MRC p15,#0,r1,c9,c12,#0
        ORR r1, r1, #0x00000010
        MCR p15,#0,r1,c9,c12,#0


        MRC p15, #0, r1, c1, c0, #1  ; Auxiliary Control Register: enable B0TCM parity/ECC check (bit 26)
        ORR r1, r1, #0x1 <<26
        DMB
        MCR p15, #0, r1, c1, c0, #1
        ISB

        MRC p15, #0, r1, c1, c0, #1  ; Auxiliary Control Register: enable B1TCM parity/ECC check (bit 27)
        ORR r1, r1, #0x1 <<27
        DMB
        MCR p15, #0, r1, c1, c0, #1
        ISB

        MOV PC, lr

    _Disable_ECC

        ; Clear the B0TCM (bit 26, PARECCENRAM[1]) and B1TCM (bit 27, PARECCENRAM[2]) check enables
        MRC p15, #0, r1, c1, c0, #1
        MVN R0,#0x1 <<26
        AND R1 ,R1, R0
        MVN R0,#0x1 <<27
        AND R1 ,R1, R0

        DMB
        MCR p15, #0, r1, c1, c0, #1
        ISB

        ; Read back and rewrite unchanged; both bits were already cleared above
        MRC p15, #0, r1, c1, c0, #1
        DMB
        MCR p15, #0, r1, c1, c0, #1
        ISB

        ; PMNC bit 4 gates the export of error events; clear it
        MRC p15,#0,r1,c9,c12,#0
        MVN R0,#0x00000010  ; (clear bit 4 of the PMNC register)
        AND R1 ,R1, R0
        DMB
        MCR p15,#0,r1,c9,c12,#0
        ISB

        MOV PC, lr

    Regards,

    Haixiao
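
For readers who prefer C, the bit manipulation in the assembly above can be sketched as plain mask functions. This is only a host-side illustration of which bits change: the macro and function names are made up for this note, and on the target the values would still have to be read and written with the MRC/MCR instructions shown in the post.

```c
#include <stdint.h>

/* Bit positions taken from the assembly in the post above.
 * The names are hypothetical and chosen for illustration only. */
#define ACTLR_B0TCM_CHECK_EN  (1u << 26)  /* ORR r1, r1, #0x1 <<26 */
#define ACTLR_B1TCM_CHECK_EN  (1u << 27)  /* ORR r1, r1, #0x1 <<27 */
#define PMNC_BIT4             (1u << 4)   /* ORR r1, r1, #0x00000010 */

/* Value to write back to c1, c0, 1 so that both BTCM checks are enabled. */
uint32_t actlr_ecc_enable(uint32_t actlr)
{
    return actlr | ACTLR_B0TCM_CHECK_EN | ACTLR_B1TCM_CHECK_EN;
}

/* Value to write back to c1, c0, 1 so that both BTCM checks are disabled. */
uint32_t actlr_ecc_disable(uint32_t actlr)
{
    return actlr & ~(ACTLR_B0TCM_CHECK_EN | ACTLR_B1TCM_CHECK_EN);
}

/* Value to write back to c9, c12, 0 with bit 4 set. */
uint32_t pmnc_bit4_set(uint32_t pmnc)
{
    return pmnc | PMNC_BIT4;
}

/* Value to write back to c9, c12, 0 with bit 4 cleared. */
uint32_t pmnc_bit4_clear(uint32_t pmnc)
{
    return pmnc & ~PMNC_BIT4;
}
```

All other bits of each register pass through unchanged, which is the point of the read-modify-write pattern in the assembly.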

  • Thanks.

     

    I am a little confused about where exactly the TCRAM registers are located. Section 4.7 of the Tech manual shows their offsets and says they are accessed "through the system module registers’ space in the Cortex-R4F CPU’s memory map". The memory map shows this section starts at address 0xFFF80000. Table 2 shows this address as the start of the DMA RAM register space. I don't see a table entry for the TCRAM module in this table. Can you point me to the base address for these registers?

     

     

    Also, is there a way for me to corrupt the ECC bits to force a failure so I can be sure that the ECC is working correctly?

    Thanks

    DH

  • The TCRAM registers' base addresses are stated in the datasheet:

     

    I will update the TRM to make it more clear.

    There are also 'disable/enable' bits in the TCRAM registers. However, these bits only block or allow the ECC errors (checked and generated by the R4 CPU) to go to the ESM module. Therefore, you have to enable/disable ECC through the CPU registers (a little bit confusing?).

    To test ECC:

    1. Enable ECC through the R4 CPU registers.

    2. Initialize the memory (this fills in the correct ECC).

    3. Make sure the TCRAM wrapper is not blocking the ESM error.

    4. Disable ECC through the R4 CPU registers.

    5. Modify the memory to insert a single-bit error.

    6. Read the memory location with the injected error.

    7. You should now see an ECC error, and an interrupt if the interrupt is enabled.

    Regards,

    Haixiao 

  • So for step 2 above I want to turn on the hardware initialization by writing to the MSIENA and MINITGCR registers. This can only be done in privileged mode which I am still unsure how to do. Above you pasted in some code to enable ECC in privilege mode but I'm not sure which steps transition you from user to supervisor mode to do this.

  • Most of these steps have to be run in privileged mode. If you run this during startup, the CPU is in privileged mode by default. If you want to run it on the fly, you need to disable ECC -> modify RAM -> enable ECC -> read RAM. To enter privileged mode from user mode, you can generate a software interrupt and then change the operating mode in the ISR. You should choose where you want to switch the operating mode and where to switch back.

    On the bench, I run everything in privileged mode.

    Regards,

    Haixiao
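
When debugging this, it can help to confirm which mode the code is actually running in. The sketch below decodes the mode field (bits 4:0) of a CPSR value. It assumes the value has already been read on the target, for example with `MRS r0, CPSR`; the function and macro names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPSR_MODE_MASK  0x1Fu   /* mode field, CPSR bits 4:0 */
#define CPSR_MODE_USR   0x10u   /* User mode, the only unprivileged mode */
#define CPSR_MODE_SVC   0x13u   /* Supervisor mode, entered by a software interrupt */
#define CPSR_MODE_SYS   0x1Fu   /* System mode */

/* Returns true if the given CPSR value describes a privileged mode. */
bool cpsr_is_privileged(uint32_t cpsr)
{
    return (cpsr & CPSR_MODE_MASK) != CPSR_MODE_USR;
}

/* Returns true if the given CPSR value describes Supervisor mode,
 * which is the mode a software-interrupt handler runs in. */
bool cpsr_is_supervisor(uint32_t cpsr)
{
    return (cpsr & CPSR_MODE_MASK) == CPSR_MODE_SVC;
}
```

If `cpsr_is_privileged()` reports false at the point where RAMCTRL, MSIENA, or MINITGCR are written, that matches the symptom from the first post of register bits that do not change.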

  • How do you initialize the memory (fill in correct ECC)?

    Regards,
    Dave

  • I am just back from vacation. Sorry for the delay.

    After enabling ECC, perform RAM initialization. Please see chapter 1.5, Memory Module Hardware Initialization, in the TRM (SPNU489b) for details.

    The ECC area will be automatically filled during hardware initialization if ECC is enabled.

    You can also fill in the memory manually by software.

    Regards,

    Haixiao

  • Sorry for my delay in replying, I have been on holiday and also I was required to work on something else in the meantime.

    I now have the hardware initialisation working - the RAM gets set to 0s and the ECC area gets set to 0x0C throughout.

    I thought that I would be able to force a single bit error by disabling the ECC and then changing one bit in RAM, enabling the ECC and then reading the RAM where the change has been made. This does not give me a single bit error however.

    I can only get an error if I disable the ECC and then change an address in ECC memory, then enable the ECC and then read the RAM. This gives me a multi-bit error DERR (bit 5 in RAMERRSTATUS). Is this the way it is supposed to work, or am I doing something wrong?

    Regards,

    Dave

  • Dave,

    There are so many posts about RAM ECC. I think the right way is to provide example code that you can check against.

    Let me see if we can get this done by the end of this week.

    Regards,

    Haixiao

  • Dave Saggs said:

    Sorry for my delay in replying, I have been on holiday and also I was required to work on something else in the meantime.

    I now have the hardware initialisation working - the RAM gets set to 0s and the ECC area gets set to 0x0C throughout.

    I thought that I would be able to force a single bit error by disabling the ECC and then changing one bit in RAM, enabling the ECC and then reading the RAM where the change has been made. This does not give me a single bit error however.

    HW: You are right. Whenever the CPU/R4 writes to the RAM, it writes both the RAM and the ECC bits, even if ECC is disabled. ECC is only 'disabled' for reporting errors; the CPU/R4 still generates and checks ECC. For this reason, you can only corrupt the ECC bits and you cannot corrupt the RAM data body. This is different from the M3 and ARM7 CPUs.

    I can only get an error if I disable the ECC and then change an address in ECC memory, then enable the ECC and then read the RAM. This gives me a multi-bit error DERR (bit 5 in RAMERRSTATUS). Is this the way it is supposed to work, or am I doing something wrong?

    HW: It should work. I attached an example here. I built it in CCS3.3 and verified the function on the USB stick. 2311.TMS570_RTI.zip. The code is only an example of how to call the function and is not intended for production use. The code is provided as is. We are not responsible for any problem caused by this example.

    Regards, Haixiao

    Dave

  • Thank you very much for your help.

    I now have ECC working in RAM. I am able to force single bit errors, by corrupting the ECC.

    The RAM does not appear to get corrected though. If the ECC module sees that the RAM is not consistent with the ECC code, shouldn't the RAM be corrected?

    The RAM stays as it was before the ECC was corrupted.

    Also the ESM pin is set. I have tried to drive the ESM pin back high (to clear the error), but this does not appear to work. Is this the right way of doing this?

    void esmClearErrPin(void)
    {
        /** - clear only when the error PIN is set          */
        /* Register address 0xFFFF F524 */
        esmErrorPinSet = esmREG->ESMEPSR;
        if(!esmREG->ESMEPSR)                   

        /** - clear the error using Error Key Register  */
        /* Register address 0xFFFF F538 */
        esmREG->ESMEKR      = 0x5;           

        /** - Wait till the error pin is reset             */
         while(!esmREG->ESMEPSR);           
    }
    I just get stuck in the loop waiting for the error pin to reset.

    Regards,
    Dave

    p.s. Sorry again for the delay, but my computer stopped working and I have had to reinstall my software

  • Dave Saggs said:

    Thank you very much for your help.

    I now have ECC working in RAM. I am able to force single bit errors, by corrupting the ECC.

    The RAM does not appear to get corrected though. If the ECC module sees that the RAM is not consistent with the ECC code, shouldn't the RAM be corrected?

    The RAM stays as it was before the ECC was corrupted.

    HW: It is a single-bit error in the ECC. The CPU decodes it, finds the failing bit in the ECC, and corrects that ECC bit, so the RAM data itself does not need to change. Any single-bit error in the RAM or ECC bits can be decoded and corrected.

    Also the ESM pin is set. I have tried to drive the ESM pin back high (to clear the error), but this does not appear to work. Is this the right way of doing this?

    void esmClearErrPin(void)
    {
        /** - clear only when the error PIN is set          */
        /* Register address 0xFFFF F524 */
        esmErrorPinSet = esmREG->ESMEPSR;
        if(!esmREG->ESMEPSR)                   

        /** - clear the error using Error Key Register  */
        /* Register address 0xFFFF F538 */
        esmREG->ESMEKR      = 0x5;           

        /** - Wait till the error pin is reset             */
         while(!esmREG->ESMEPSR);           
    }
    I just get stuck in the loop waiting for the error pin to reset.

    HW: It looks correct. In my example, it keeps generating single-bit errors. How about your test case? You can connect CCS, go to register 0xFFFFF538, and write 5 to the register. After that, you should see the nError LED turn off.

    Regards,
    Dave

    p.s. Sorry again for the delay, but my computer stopped working and I have had to reinstall my software
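
The point about a corrupted ECC bit leaving the data untouched can be illustrated with a small software model. The sketch below is a generic extended Hamming (SECDED) code over 64 data bits and 8 check bits. It is not the code the Cortex-R4 actually uses (the thread reports 0x0C as the ECC of all-zero data on the device, while this model produces 0x00), and all names are hypothetical.

```c
#include <stdint.h>

typedef enum { ECC_OK, ECC_CORRECTED, ECC_UNCORRECTABLE } ecc_result;

/* Parity (XOR of all bits) of a 64-bit value. */
static unsigned parity64(uint64_t v)
{
    v ^= v >> 32; v ^= v >> 16; v ^= v >> 8;
    v ^= v >> 4;  v ^= v >> 2;  v ^= v >> 1;
    return (unsigned)(v & 1u);
}

/* Codeword position (3..71) of data bit i; powers of two are check bits. */
static unsigned data_pos(unsigned i)
{
    unsigned pos = 0;
    for (unsigned n = 0; n <= i; n++) {
        do { pos++; } while ((pos & (pos - 1u)) == 0u);
    }
    return pos;
}

/* Seven Hamming check bits: XOR of the positions of all set data bits. */
static unsigned check7(uint64_t data)
{
    unsigned c = 0;
    for (unsigned i = 0; i < 64u; i++) {
        if ((data >> i) & 1u) c ^= data_pos(i);
    }
    return c;
}

/* Bits 6:0 are the Hamming check bits, bit 7 is the overall parity. */
uint8_t secded_encode(uint64_t data)
{
    unsigned c = check7(data);
    unsigned p = parity64(data) ^ parity64(c);
    return (uint8_t)(c | (p << 7));
}

/* Checks data against ecc and repairs a single flipped bit in either one. */
ecc_result secded_check(uint64_t *data, uint8_t *ecc)
{
    unsigned s = check7(*data) ^ (*ecc & 0x7Fu);   /* syndrome */
    unsigned p = parity64(*data) ^ parity64(*ecc); /* 0 for a valid codeword */

    if (s == 0u && p == 0u) return ECC_OK;
    if (p == 0u) return ECC_UNCORRECTABLE;         /* even number of flips */
    if (s == 0u) { *ecc ^= 0x80u; return ECC_CORRECTED; }   /* parity bit */
    if ((s & (s - 1u)) == 0u) {                    /* one Hamming check bit */
        *ecc ^= (uint8_t)s;
        return ECC_CORRECTED;
    }
    for (unsigned i = 0; i < 64u; i++) {           /* one data bit */
        if (data_pos(i) == s) {
            *data ^= (uint64_t)1 << i;
            return ECC_CORRECTED;
        }
    }
    return ECC_UNCORRECTABLE;
}
```

In this model, flipping one bit of the stored check byte makes `secded_check()` report a corrected error and repair the check byte while the data word is left exactly as it was, which mirrors the behaviour described above. Flipping two bits is reported as uncorrectable.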

  • Thank you for the explanation.

    My code is generating single bit errors OK.

    I have added some code to generate double-bit errors, which cause an exception vector to be hit. As suggested in the Technical Reference Manual, this behaviour is expected. We will force a reset or power-down reset in this case.

    I have to write 0x0a, followed by 0x05 then 0x00, in order to reliably clear the ESM pin. I am not sure why I have to do this, but it seems to work OK this way.

    Regards,

    Dave
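
Since the posted routine can spin forever if the pin never returns high, one option is to bound the wait. The sketch below is a variant of the clear routine from this thread with a poll limit. It takes the two registers as pointers so it can be tried on a host, and it mirrors the polarity used in the posted code (a non-zero ESMEPSR read means the pin is no longer signalling an error). The function name and the `max_polls` parameter are hypothetical, and it only writes the 0x5 key, not the 0x0A, 0x05, 0x00 sequence reported above.

```c
#include <stdbool.h>
#include <stdint.h>

#define ESM_EKR_CLEAR_KEY  0x5u   /* key value written in the posted routine */

/* Requests release of the error pin and waits a bounded number of polls.
 * ekr  : pointer to the Error Key Register (0xFFFFF538 in the posts)
 * epsr : pointer to the Error Pin Status Register (0xFFFFF524 in the posts)
 * Returns true if the pin reads back as released, false on timeout. */
bool esm_clear_error_pin(volatile uint32_t *ekr,
                         const volatile uint32_t *epsr,
                         uint32_t max_polls)
{
    if (*epsr != 0u) {
        return true;              /* pin is not signalling an error */
    }

    *ekr = ESM_EKR_CLEAR_KEY;     /* ask the ESM to release the pin */

    while (max_polls-- > 0u) {
        if (*epsr != 0u) {
            return true;          /* pin released */
        }
    }
    return false;                 /* still asserted, let the caller decide */
}
```

A false return lets the caller log the condition or retry instead of hanging, which is useful when the error source is still active and the pin is asserted again immediately.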

  • Hi Haixiao,

     

    I read the data and then observed a DATA ABORT. I observed the same behavior for the single-bit error too.

    By default the ECC data is 0x0C0C0C0C. For a single flip I modified it to 0x0D0D0D0D, and for a double flip I modified it to 0x0F0F0F0F.

    I am introducing these errors at different ECC locations.

     

    I had another issue, described below.

    I am observing a DATA ABORT when I read the ECC data after flipping a single bit in a particular RAM location.

    Procedure followed:

    1. Modify the RAM data using walking 1's.

    2. Read the ECC and compare it with a predefined table.

    Observations:

    The table was prepared by applying walking 1's to the RAM data. I expect the same behavior at run time: if I apply walking 1's to a specific RAM location, the corresponding ECC should match the table.

    I modified the RAM data from 0 to 1, and the corresponding ECC value changed to 0xAD0BAD0B. When I then tried to read the ECC, I observed the DATA ABORT, and when I reached the DATA ABORT the corresponding ECC value had updated to 0x6E6E6E6E, which is the expected value.

    Can you please let me know how to avoid this DATA ABORT?

     

    Thanks & Regards

    Krishna B

  • Hello Haixiao,

    In your attached example file TCM_ecc_test_r4.c in this thread, I see that you set the RAMCTRL ECC_DETECT_EN field to 0x05 before writing to the ECC address space.

    But in the following link:

    http://e2e.ti.com/support/microcontrollers/hercules/f/312/t/155421.aspx

    Posted on Jan 16, 2012 1:25 PM

    there is an example from Sunil Oak, which shows the above mentioned field as 0x0A before modifying ECC RAM Space.

    Can you please tell me which code is correct?

    Any information will help me.

    Thank you.

    Regards

    Pashan

     

     

  • I did that because:

    I want to disable the ECC detection of the TCRAMW when I inject an error into the TCRAMW.

    Regards,

    Haixiao

  • Hello Haixiao,

    I thought Sunil's example was also injecting an error; see the following line:

    tcramA1bitError ^= 0x1; 

    Does it mean that, whether the ECC_DETECT_EN field value is 0x05 or 0x0A, it doesn't matter for the single-bit error test being performed in both test cases (Sunil's code and your code)?

    That is the confusion I am trying to sort out.

    There is also ECC check logic inside the Cortex-R4, and so on.

    Hence all this confusion.

    Any signal flow diagram between the ESM, the Cortex-R4, and the TCRAMW, and the relation between the different coprocessor bits for ECC and the TCRAMW RAMCTRL register, would help me a lot to understand.

    Thank you.

    Regards

    Pashan

     

  • Hello Pashan:

    The confusion, I believe, comes from the two different types of error notifications and their sources. The ECC logic within the core will cause an abort when an uncorrectable error occurs. In addition, signals are generated by the core to the wrapper that communicate that the error occurred. When the ECC is disabled within the wrapper and not within the core, then the errors to the wrapper are ignored and will not be captured by the ESM; however, you will get an abort generated by the core. If ECC is disabled within the core and not within the wrapper, no error will be generated (abort or ESM) since the core will not notify the wrapper that an error occurred.
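
The combinations described in this reply can be written down as a small truth-table model. This is only a restatement of the paragraph above for an uncorrectable error, not a description of the hardware signals; the type and function names are hypothetical.

```c
#include <stdbool.h>

typedef struct {
    bool abort_raised;   /* the core takes an abort */
    bool esm_notified;   /* the wrapper passes the error on to the ESM */
} ecc_outcome;

/* Outcome of an uncorrectable error, per the description in this thread:
 * - the core raises the abort only if its own ECC logic is enabled
 * - the ESM sees the error only if the core reports it AND the
 *   wrapper's detection is enabled */
ecc_outcome uncorrectable_error_outcome(bool core_ecc_on, bool wrapper_detect_on)
{
    ecc_outcome out;
    out.abort_raised = core_ecc_on;
    out.esm_notified = core_ecc_on && wrapper_detect_on;
    return out;
}
```

With the core enabled and the wrapper disabled the model gives an abort but no ESM error, and with the core disabled it gives neither, regardless of the wrapper setting.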