TMS570LS3137 RAM ECC double bit error test. Data abort handling.

Other Parts Discussed in Thread: TMS570LS3137, HALCOGEN

Hi TI team,

I am running the RAM ECC double-bit error test on the TMS570LS3137. So far I can insert a double-bit error, read RAM address XYZ, and trigger the double-bit error. I can also see the data abort. After the data abort handler returns, execution continues from the read access at address XYZ, and a new data abort is triggered because the data/ECC pair is still wrong.

I want to continue with the other tests, so I would like to trigger only one data abort. After the data abort handling, the code should not trigger a second data abort at the same address.

In the data abort handler, I tried to write 0x0 to RAM address XYZ. I expected this to restore a correct data/ECC pair, but the write access actually triggered a new data abort.

It seems to me that I can no longer access (read from or write to) an address once a double-bit error has been detected at that address. Could you please confirm this? Any idea how to avoid the second data abort?

Thanks for the support!

Libo

  • I am seeing similar behavior. I am using the HALCoGen checkRAMECC() function. I can generate the two double-bit errors as expected, but I get another data abort when trying to clear the uncorrectable error address register.

    /* Read the corrupted data to generate double bit error */
    ramread = TCRAM_A2_BIT;
    ramread = TCRAM_B2_BIT;
    
    /* Read to clear the Uncorrectable Error Address Register */
    ramread = tcram1REG->RAMUERRADDR;  // Causes another data abort
    ramread = tcram2REG->RAMUERRADDR;

  • Jeremy,

    You need to do a 64 bit write. Any non-64 bit write is actually a read-modify-write operation.

    Thanks and regards,

    Zhaohong
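The 64-bit write suggested above can be sketched as a small helper. This is a minimal illustration, not HALCoGen code: `rewrite_ecc_row` is a hypothetical name, and the sketch assumes the compiler turns the `uint64_t` assignment into a single 64-bit store (for example STRD or STM). That is not guaranteed by C, so it is worth checking the generated assembly on the target.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: overwrite one 64-bit ECC row so that data and ECC
 * are regenerated together. Returns false without writing if the address
 * is not 8-byte aligned, because such a write would straddle two rows. */
static bool rewrite_ecc_row(volatile void *addr, uint64_t value)
{
    if (((uintptr_t)addr & 0x7u) != 0u) {
        return false;
    }
    /* Assumption: this compiles to one 64-bit store, not two 32-bit stores
     * (which would each be a read-modify-write on the corrupted row). */
    *(volatile uint64_t *)addr = value;
    return true;
}
```

In a data abort handler for a deliberately injected error, calling such a helper with the faulting address and a known value (for example 0) is one way to leave the row in a consistent state before returning.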

  • Jeremy,

    Typically, the return instruction from a data abort is required to again execute the instruction that caused the data abort in the first place. This assumes that you have handled and eliminated the cause of the data abort in the handler the first time.

    In this case, however, since you are deliberately generating the data abort, you need to return back to the instruction after the one that generates the abort. This is managed in the dabort.asm file generated by HALCoGen.

    Regards, Sunil
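The two return options described above differ only in the return address. On a data abort the core sets LR_abt to the address of the faulting instruction plus 8. The sketch below computes both targets; it is not the logic of HALCoGen's dabort.asm, the names are hypothetical, and it assumes the faulting code runs in ARM state with 4-byte instructions (in Thumb state the faulting instruction may be 2 or 4 bytes long).

```c
#include <stdint.h>

/* What the handler wants to do after the abort has been handled. */
typedef enum { DABORT_RETRY, DABORT_SKIP } dabort_action_t;

/* Hypothetical helper: compute where to return after a data abort.
 * lr_abt is the link register value in abort mode (faulting address + 8).
 * Assumption: ARM state, so the next instruction is 4 bytes further on. */
static uint32_t dabort_return_address(uint32_t lr_abt, dabort_action_t action)
{
    uint32_t faulting = lr_abt - 8u;
    return (action == DABORT_RETRY) ? faulting : (faulting + 4u);
}
```

In assembly these two cases correspond to `SUBS PC, LR, #8` (retry the access) and `SUBS PC, LR, #4` (continue with the next instruction).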

  • Hi Sunil,

    I am using the checkRAMECC function and the dabort.asm from HALCoGen, but I am still seeing the same problem. The first two deliberate data aborts work as expected: the data abort is triggered, dabort.asm determines that it is deliberate, and execution returns to the instruction after the one that caused the abort. The problem occurs when I try to reset the TCRAM1 RAMUERRADDR register by reading it.

  • Hi Jeremy,

    Please zip up your CCS project and attach it. I use the default self-check code generated by HALCoGen and do not see any issues getting to the main routine after completing all the self-checks.

    Regards, Sunil

  • Thanks Sunil,

    I figured it out: it was a simple mistake in the definition of the tcramBase struct. It seems to be working now.

  • Libo,

    Please confirm that the issue you were observing is also addressed in this thread.

    Regards, Sunil

  • Hi Sunil,

    Zhaohong suggested:

    "You need to do a 64 bit write. Any non-64 bit write is actually a read-modify-write operation."

    This note helps, but I saw some effects which I can't explain.

    Basically my test looks like this (all write and read accesses use a pointer to a 64-bit word):

    0. Enable ECC in CPU and ECC detection in RAM.

    1. Write data 0xOPUQ to address XYZ where XYZ is 64 bit aligned.

    2. Read ECC from address "XYZ+4MB".

    3. Disable ECC. Enable ECC write access.

    4. Write ECC xor 0x1 or 0x03 to address "XYZ+4MB".

    5. Enable ECC again.

    6. Read data from address XYZ.

    7. Depending on what I set in Step 4, I can trigger single bit error or double bit error as expected.

    8. If double bit error is triggered, I write 64bit value 0x0 to address XYZ in data abort handling. After the error handling, the read access in step 6 works without triggering the 2nd data abort. (At this step Zhaohong's suggestion is helpful to avoid the 2nd data abort).
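Steps 2 and 4 above reduce to two small calculations, sketched here in host-runnable C. The 4 MB offset and the XOR masks 0x1 and 0x3 come from the steps above; the function names, the macro name, and the example address in the usage note are assumptions for illustration only.

```c
#include <stdint.h>

/* ECC space sits 4 MB above the data it protects (step 2). */
#define TCRAM_ECC_BYTE_OFFSET 0x00400000u

/* Hypothetical helper: byte address of the ECC word for a data address. */
static uint32_t ecc_address_for(uint32_t data_addr)
{
    return data_addr + TCRAM_ECC_BYTE_OFFSET;
}

/* Hypothetical helper: flip ECC bits as in step 4.
 * bits_to_flip == 2 uses mask 0x3 (double-bit error);
 * any other value uses mask 0x1 (single-bit error). */
static uint64_t corrupt_ecc(uint64_t ecc, unsigned bits_to_flip)
{
    uint64_t mask = (bits_to_flip == 2u) ? 0x3u : 0x1u;
    return ecc ^ mask;
}
```

For example, for a data word at the hypothetical address 0x08001000, `ecc_address_for` gives 0x08401000. Applying `corrupt_ecc` twice with the same argument restores the original ECC value, which can be handy for undoing the injection.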

    I have tested this flow with different values for the data 0xOPUQ above. The test values range from 0x0 to 0xF. The flow works only for 0x0, 0x1, 0x2, 0x4, 0x8, 0xD, and 0xE. For the other values, I get a data abort at step 2. No bits are set in the ESM error status registers or in RAMERRSTATUS.

    I also tested with 0x0040AAAA, which works for the flow above, but the flow failed at step 2 with the value 0xABCDEF23.

    As I am new to ARM processors, I can't explain why those values trigger a data abort.

    For my test, I can live with testing only the value 0x0, but I would like to find out the reason for those data aborts. Maybe you can give me some hints.

    Thank you.

    Libo

  • Libo,

    You need to be very careful when dynamically disabling and re-enabling ECC checking. If you use assembly code to disable or re-enable ECC checking in the CPU, you need to insert a data synchronization barrier ("DSB") before you do anything else. If you enable or disable ECC in the RAM wrapper, you need to read the value back to make sure the register has been written before you do anything else.

    Thanks and regards,

    Zhaohong
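The two precautions above might look like the following sketch. It is an illustration under stated assumptions: `write_and_read_back` and `ECC_DSB` are hypothetical names, and the inline assembly uses GCC/Clang syntax (the TI compiler uses `asm("...")` statements instead). On a non-ARM host build the barrier falls back to a compiler-only barrier so the sketch still compiles.

```c
#include <stdint.h>

/* Barrier to place after changing CPU ECC settings through CP15.
 * Assumption: GCC/Clang-style inline assembly, ARMv7 or later. */
#if defined(__arm__) && defined(__ARM_ARCH) && (__ARM_ARCH >= 7)
#define ECC_DSB() __asm__ volatile("dsb" ::: "memory")
#else
#define ECC_DSB() __asm__ volatile("" ::: "memory") /* host: compiler barrier only */
#endif

/* Hypothetical helper for RAM wrapper registers: write the value, then
 * read it back so the write has reached the register before continuing.
 * Returns the value read back so the caller can verify it. */
static uint32_t write_and_read_back(volatile uint32_t *reg, uint32_t value)
{
    *reg = value;
    return *reg;
}
```

A caller would use `write_and_read_back` for each wrapper control register and place `ECC_DSB()` directly after each CP15 write that changes ECC checking in the CPU.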

  • Hi Zhaohong,

    Thanks for the feedback. Below are my functions to enable and disable ECC.

    void EnableRamEcc(void)
    {
        RAM_REGISTERS_TYPE *ramWrapper;

        /* RAM ECC error report logic must be enabled before the ECC logic in CPU is enabled. */
        /* Enable B0TCM ECC: */
        ramWrapper = &R4_Ram_Even_Registers;
        ramWrapper->Global_Control = 0x0005000A;
        /* ensure the write access is complete */
        ramWrapper->Global_Control;

        /* Enable B1TCM ECC: */
        ramWrapper = &R4_Ram_Odd_Registers;
        ramWrapper->Global_Control = 0x0005000A;
        asm("\tDMB");
        /* ensure the write access is complete */
        ramWrapper->Global_Control;

        /* Enable RAM ECC logic in CPU: */
        asm("\tMRC p15, #0, r1, c9, c12, #0  ; Read Performance Monitor Control Register (Gladiator: PMNC. Conqueror: PMCR)");
        asm("\tORR r1, r1, #0x00000010       ; Enable X flag (set bit 4, enable event bus export)");
        asm("\tDMB");
        asm("\tMCR p15,#0,r1,c9,c12,#0       ; Write back to PMNC");
        asm("\tISB                           ; To ensure the write before proceeding");

        asm("\tMRC p15, #0, r1, c1, c0, #1   ; Read Auxiliary Control Register");
        asm("\tORR r1, r1, #0x0C000000       ; Enable Cortex-R4 ECC error detection for BTCMs (set bit 26/27)");
        asm("\tDMB");
        asm("\tMCR p15, #0, r1, c1, c0, #1   ; Write back to Auxiliary Control Register");
        asm("\tISB                           ; To ensure the write before proceeding");

        asm("\tMRC p15, #0, r1, c15, c0, #0  ; Read Secondary Auxiliary Control Register");
        asm("\tORR r1, r1, #0x00000002       ; Enable RMW (set bit 1 BTCMRMW)");
        asm("\tBIC r1, r1, #0x00000008       ; Enable Cortex-R4 ECC correction (clear bit 3 BTCMECC)");
        asm("\tDMB");
        asm("\tMCR p15, #0, r1, c15, c0, #0  ; Write back to Secondary Auxiliary Control Register");
        asm("\tISB                           ; To ensure the write before proceeding");
    }

    void DisableRamEcc(void)
    {
        RAM_REGISTERS_TYPE *ramWrapper;

        asm("\tMRC p15, #0, r1, c9, c12, #0  ; Read Performance Monitor Control Register (PMNC)");
        asm("\tBIC r1, r1, #0x00000010       ; Clear X flag (clear bit 4, disable event bus export)");
        asm("\tDMB");
        asm("\tMCR p15,#0,r1,c9,c12,#0       ; Write back to PMNC");
        asm("\tISB                           ; To ensure the write before proceeding");

        asm("\tMRC p15, #0, r1, c1, c0, #1   ; Read Auxiliary Control Register");
        asm("\tBIC r1, r1, #0x0C000000       ; B0TCM and B1TCM ECC Check Disable (clear bit 26/27)");
        asm("\tDMB");
        asm("\tMCR p15, #0, r1, c1, c0, #1   ; Write back to Auxiliary Control Register");
        asm("\tISB                           ; To ensure the write before proceeding");

        asm("\tMRC p15, #0, r1, c15, c0, #0  ; Read Secondary Auxiliary Control Register");
        asm("\tBIC r1, r1, #0x00000002       ; Disable RMW (clear bit 1 BTCMRMW)");
        asm("\tORR r1, r1, #0x00000008       ; Disable ECC correction (set bit 3 BTCMECC)");
        asm("\tDMB");
        asm("\tMCR p15, #0, r1, c15, c0, #0  ; Write back to Secondary Auxiliary Control Register");
        asm("\tISB                           ; To ensure the write before proceeding");

        /* After CPU ECC logic is disabled, RAM ECC error report logic can be disabled. */
        /* Disable B0TCM ECC error report logic*/
        ramWrapper = &R4_Ram_Even_Registers;
        ramWrapper->Global_Control = 0x00050005;
        /* ensure the write access is complete */
        ramWrapper->Global_Control;

        /* Disable B1TCM ECC error report logic*/
        ramWrapper = &R4_Ram_Odd_Registers;
        ramWrapper->Global_Control = 0x00050005;
        /* ensure the write access is complete */
        ramWrapper->Global_Control;
    }

    Below is the source code that corresponds to steps 0 to 2 described above.

    ......

            EnableRamEcc();
            tempData_pu64 = (U64 *)(eerlTestData_pu32); //eerlTestData_pu32 is a pointer to 32 bit word, it is 64 bit aligned, either in even bank or in odd bank.
            // *tempData_pu64 = (U64)(modeIndex_u32); // doesn't work when modeIndex_u32 is 3.
            // *tempData_pu64 = 0x0UL; // works
            // *tempData_pu64 = 0x1UL; // works
            // *tempData_pu64 = 0x3UL; // doesn't work
            *tempData_pu64 = (U64)(ramWrapper->Ram_Addr_Dec_Vector); // In debugger I use the RAM register RAMADDRDECVECT to test different write values (0x0---0xF)
            // *tempData_pu64 = 0x0400AAAAUL; // works
            // *tempData_pu64 = 0xABCDEF23UL; // doesn't work
            // *tempData_pu64 = 0x0BCDEF23UL; // doesn't work
            asm("\tDMB");
            /* ECC addr. offset from the corresponding data is always 4MB */
            testEcc_pu32 = eerlTestData_pu32 + RAM_ECC_WORD_OFFSET_ADDR; //RAM_ECC_WORD_OFFSET_ADDR is 0x00100000
            /* store the original ECC value: */
            testEcc_pu64 = (U64 *)testEcc_pu32;
            tempEcc_u64 = *testEcc_pu64;

    I want to read the original ECC into tempEcc_u64 and change the ECC later for my test. The "doesn't work" comments above mean that the read access on the last line (tempEcc_u64 = *testEcc_pu64;) triggers a data abort for the corresponding value. However, no bits are set in the ESM error status registers (group 1 to group 3) or in RAMERRSTATUS.

    In the ARM TRM (DDI0363E_cortexr4_r1p3_trm.pdf), I see the following description:

    "Memory barriers

    When a store instruction, or series of instructions has been executed to normal-type or device-type memory, it is sometimes necessary to determine whether any errors occurred because of these instructions. Because most of these errors are reported imprecisely, they might not generate an abort exception until some time after the instructions are executed. To ensure that all possible errors have been reported, you must execute a DSB instruction. Abort exceptions are only taken because of these errors if they are not masked, that is, the CPSR A-bit is clear. If the A-bit is set, the aborts are held pending."

    According to the following post, I think I should look at the Data Fault Address Register.

    http://e2e.ti.com/support/microcontrollers/hercules/f/312/p/170682/685968.aspx#685968

    It seems that this is related more to the ARM processor, and I think the data abort has nothing to do with the single/double-bit error.

    I would appreciate it very much if you could provide some insight here.

    Thank you!

    Libo
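Following up on the Data Fault Address Register idea in the post above: the Data Fault Status Register (DFSR) is usually read alongside it, because it encodes why the abort was taken. The sketch below decodes a DFSR value that has already been read (on the target, `MRC p15, #0, r0, c5, c0, #0` reads DFSR and `MRC p15, #0, r0, c6, c0, #0` reads DFAR). The function names are hypothetical, and the status encodings are the ARMv7-R ones as I understand them, so please verify them against the Cortex-R4 TRM before relying on them.

```c
#include <stdint.h>

/* The fault status is DFSR bit 10 (as the top bit) combined with bits [3:0]. */
static unsigned dfsr_fault_status(uint32_t dfsr)
{
    return (unsigned)(((dfsr >> 10) & 0x1u) << 4) | (unsigned)(dfsr & 0xFu);
}

/* Hypothetical helper: map the 5-bit status to a short description.
 * Assumption: ARMv7-R (PMSA) encodings; check the Cortex-R4 TRM. */
static const char *dfsr_fault_name(uint32_t dfsr)
{
    switch (dfsr_fault_status(dfsr)) {
    case 0x00u: return "background fault";
    case 0x01u: return "alignment fault";
    case 0x02u: return "debug event";
    case 0x08u: return "precise external abort";
    case 0x0Du: return "permission fault";
    case 0x16u: return "imprecise external abort";
    case 0x18u: return "imprecise parity/ECC error";
    case 0x19u: return "precise parity/ECC error";
    default:    return "unknown";
    }
}
```

Logging the decoded status together with the DFAR value in the abort handler would show whether the abort at step 2 is reported as an ECC error or as some other fault type.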