TMS570LC4357: L2RAMW "bus error" is not raised if MPU region for RAM is not configured as Strongly-ordered

Part Number: TMS570LC4357

Hi,

I am currently checking the CPU behaviour in various conditions of SEU and MEU errors in RAM.

1) First, I want to refer to another post which I think contains a mistake: https://e2e.ti.com/support/microcontrollers/hercules/f/312/t/630389

For the third point of that post's original question, the answer states that ECC generation is not disabled when the ECC_DETECT_EN field of the RAMCTRL register is set to 0x5.

Based on recent testing, I believe this is wrong. Here is the test procedure:

  • set ECC_DETECT_EN to 0xA (enable ECC)
  • auto-initialize internal RAM to 0 (ECC codes are updated)
  • write 32 bits at address 0x0800_0000 with value 0x4
  • read corresponding 8-bit ECC value at address 0x0840_0000: value is 0xEA
  • set ECC_DETECT_EN to 0x5 (disable ECC)
  • write 32 bits at address 0x0800_0000 with value 0x5
  • read corresponding 8-bit ECC value at address 0x0840_0000: value is 0xEA

>> The ECC values read back are identical: with ECC_DETECT_EN set to 0x5, the second write did not update the ECC code.

The same test with ECC left enabled in RAM:

  • set ECC_DETECT_EN to 0xA (enable ECC)
  • auto-initialize internal RAM to 0 (ECC codes are updated)
  • write 32 bits at address 0x0800_0000 with value 0x4
  • read corresponding 8-bit ECC value at address 0x0840_0000: value is 0xEA
  • write 32 bits at address 0x0800_0000 with value 0x5
  • read corresponding 8-bit ECC value at address 0x0840_0000: value is 0x03

>> The ECC values read back differ: with ECC enabled, the ECC code is regenerated for the new value written.
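
The two procedures above can be sketched in C as follows. This is a minimal sketch, not TI driver code: it assumes ECC_DETECT_EN is the 4-bit field at RAMCTRL[3:0], the helper and type names are hypothetical, and the RAM auto-initialization step is left out because it is device specific. The three locations are passed in as pointers so the sequence can be exercised off-target; on the TMS570LC4357 they would point at the L2RAMW RAMCTRL register, 0x0800_0000 and 0x0840_0000.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: ECC_DETECT_EN is the 4-bit key field at RAMCTRL[3:0]. */
#define ECC_DETECT_EN_MASK  0xFu
#define ECC_DETECT_ENABLE   0xAu   /* value used in the test to enable ECC  */
#define ECC_DETECT_DISABLE  0x5u   /* value used in the test to disable ECC */

/* Hypothetical bundle of the three locations the test touches. */
typedef struct {
    volatile uint32_t *ramctrl;    /* L2RAMW RAMCTRL register            */
    volatile uint32_t *data;       /* data word, e.g. 0x08000000         */
    volatile const uint8_t *ecc;   /* matching ECC byte, e.g. 0x08400000 */
} ecc_test_target_t;

/* Read-modify-write of the ECC_DETECT_EN field, other RAMCTRL bits untouched. */
static void set_ecc_detect_en(volatile uint32_t *ramctrl, uint32_t key)
{
    *ramctrl = (*ramctrl & ~ECC_DETECT_EN_MASK) | (key & ECC_DETECT_EN_MASK);
}

/* One variant of the test: write 0x4, sample the ECC byte, optionally
 * disable ECC, write 0x5, sample again. If 0x5 really stops ECC generation,
 * both samples are expected to be equal. */
static void run_ecc_generation_test(const ecc_test_target_t *t,
                                    bool disable_before_second_write,
                                    uint8_t *ecc_after_first,
                                    uint8_t *ecc_after_second)
{
    set_ecc_detect_en(t->ramctrl, ECC_DETECT_ENABLE);
    /* RAM auto-initialization (ECC codes updated) would go here. */
    *t->data = 0x4u;
    *ecc_after_first = *t->ecc;

    if (disable_before_second_write) {
        set_ecc_detect_en(t->ramctrl, ECC_DETECT_DISABLE);
    }
    *t->data = 0x5u;
    *ecc_after_second = *t->ecc;
}
```

Calling it once with `disable_before_second_write = true` and once with `false` reproduces the two procedures; only the comparison of the two ECC samples differs.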

To avoid confusion, I recommend adding this point to the original post or marking the given answer as incorrect.

2) According to Table 8-1 of the TMS570LC4357 TRM, the L2RAMW is supposed to generate a bus error when a double-bit error is detected during the Read-Modify-Write (RMW) operation of a sub-64-bit write by the Cortex-R5F.

With point 1) clarified, I was able to inject a double-bit fault in RAM and perform a 16-bit write from the core to check this behaviour. I noticed the following:

  • if the core MPU is disabled (using the default memory map documented in Cortex-R5 TRM Table 7-1) >> no abort is generated
  • if the core MPU is enabled with one region for the RAM configured as DEVICE or NORMAL memory >> no abort is generated
  • if the core MPU is enabled with one region for the RAM configured as STRONGLY-ORDERED memory >> a data abort is generated and the data fault is logged in the Cortex-R5 DFSR register.
  • independently of the MPU configuration, ESM group 3 channel 3 is triggered as expected.
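
One way to perform the injection described above, building on point 1), is sketched below: write a known value with ECC enabled so the stored ECC matches it, disable ECC generation, overwrite the word with two bits flipped, then re-enable ECC. A following sub-64-bit store then forces the L2RAMW to read-modify-write a 64-bit word whose stored ECC no longer matches its data. The flipped bit positions, the helper names and the RAMCTRL[3:0] field position are assumptions for illustration only.

```c
#include <stdint.h>

/* Assumption: ECC_DETECT_EN is the 4-bit key field at RAMCTRL[3:0]. */
#define ECC_DETECT_EN_MASK  0xFu
#define ECC_DETECT_ENABLE   0xAu
#define ECC_DETECT_DISABLE  0x5u
#define DOUBLE_BIT_MASK     0x00000003u   /* arbitrary choice: flip bits 0 and 1 */

static void set_ecc_detect_en(volatile uint32_t *ramctrl, uint32_t key)
{
    *ramctrl = (*ramctrl & ~ECC_DETECT_EN_MASK) | (key & ECC_DETECT_EN_MASK);
}

/* Hypothetical helper: leaves `word` holding value ^ DOUBLE_BIT_MASK while its
 * ECC still encodes `value`, i.e. a double-bit error in that 64-bit word. */
static void inject_double_bit_error(volatile uint32_t *ramctrl,
                                    volatile uint32_t *word, uint32_t value)
{
    set_ecc_detect_en(ramctrl, ECC_DETECT_ENABLE);
    *word = value;                         /* ECC generated for `value`     */
    set_ecc_detect_en(ramctrl, ECC_DETECT_DISABLE);
    *word = value ^ DOUBLE_BIT_MASK;       /* data changed, ECC not updated */
    set_ecc_detect_en(ramctrl, ECC_DETECT_ENABLE);
}

/* Hypothetical helper: a 16-bit store into the corrupted 64-bit word, which
 * the L2RAMW must service with a read-modify-write. */
static void trigger_rmw(volatile uint16_t *halfword, uint16_t value)
{
    *halfword = value;
}
```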

As far as I understand, the "bus error" documented in the TMS570LC4357 TRM is related to the Cortex-R5 TRM "External faults" documented in chapter "8.3.1 Faults > External Faults". Am I correct on this point?

From the sentence "Non-exclusive stores to normal-type or device-type memory generate asynchronous aborts", I understand that when the RAM region is configured in the MPU as NORMAL or DEVICE memory, a non-exclusive store should generate an asynchronous abort. However, even with a while loop after the 16-bit write to RAM, no data abort is generated.
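
When an abort is taken, the DFSR value read in the handler shows whether it was reported as a synchronous or an asynchronous external abort. In the ARMv7-R fault status encoding, the 5-bit FS field is split between DFSR[10] (FS[4]) and DFSR[3:0] (FS[3:0]); 0b01000 is a synchronous external abort and 0b10110 an asynchronous external abort. The sketch below takes the DFSR value as a parameter (on target it would come from `mrc p15, 0, <Rt>, c5, c0, 0`); the helper names are hypothetical and the full encoding table should be checked against the Cortex-R5 TRM.

```c
#include <stdbool.h>
#include <stdint.h>

/* DFSR fault status values for external aborts (ARMv7-R). */
#define FS_SYNC_EXTERNAL_ABORT   0x08u  /* 0b01000 */
#define FS_ASYNC_EXTERNAL_ABORT  0x16u  /* 0b10110 */

/* Reassemble FS[4:0]: DFSR[10] becomes FS[4], DFSR[3:0] stays FS[3:0]. */
static uint32_t dfsr_fault_status(uint32_t dfsr)
{
    return ((dfsr >> 6) & 0x10u) | (dfsr & 0xFu);
}

static bool dfsr_is_async_external_abort(uint32_t dfsr)
{
    return dfsr_fault_status(dfsr) == FS_ASYNC_EXTERNAL_ABORT;
}
```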

This behaviour does not seem consistent with the documentation. Could you clarify the normal, expected behaviour?

If this behaviour is incorrect, is it related to erratum DEVICE#40 documented in the silicon revision B errata document?

Best regards,

Gael

  • Hello Gael,

    First, for the easy part of your questions, I do not believe this is related to the known issue Device#40, since that issue concerns accesses to unimplemented addresses/memory locations within those peripheral frames.

    For the rest of your comments and questions, I need to consult with one of our former design leads and a device expert on this device. I will get back to you soon with additional comments and, possibly, some follow-up questions.

  • Hi,  

    Regarding your first question about the answer Chuck provided in the other post, I think he was referring to the fact that ECC checking at the CPU cannot be disabled. You can disable ECC code generation at the RAM wrapper level, as you have demonstrated.

    Gael Le Moing said:
    • if the core MPU is disabled (using the default memory map documented in Cortex-R5 TRM Table 7-1) >> no abort is generated
    • if the core MPU is enabled with one region for the RAM configured as DEVICE or NORMAL memory >> no abort is generated
    • if the core MPU is enabled with one region for the RAM configured as STRONGLY-ORDERED memory >> a data abort is generated and the data fault is logged in the Cortex-R5 DFSR register.
    • independently of the MPU configuration, ESM group 3 channel 3 is triggered as expected

    Your observation is correct. When the RAM wrapper fails the ECC check during a Read-Modify-Write operation, it generates a bus error signal back to the CPU. Note that this error is associated with the sub-word write operation that you perform. How did you write to the RAM when it is configured as NORMAL or DEVICE? Note that the Cortex-R5F can perform write merging. It may have merged multiple sub-word writes into a 64-bit write, in which case the entire 64 bits, along with the 8-bit ECC, are written to the RAM, overwriting what was previously there. A complete 64-bit write does not trigger a Read-Modify-Write operation.
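
To illustrate the distinction above, the sketch below contrasts a single 16-bit store, which covers only part of a 64-bit word and so requires a Read-Modify-Write in the L2RAMW, with four consecutive 16-bit stores covering one aligned 64-bit word, which the architecture permits the core to merge into one full-width write when the region is Normal memory. Whether merging actually happens depends on the core's store buffer, so this only shows the two access patterns; the function names are hypothetical.

```c
#include <stdint.h>

/* One sub-word store: the L2RAMW must read the 64-bit word, check its ECC,
 * merge the new 16 bits and write the word back with a new ECC byte. */
static void single_halfword_store(volatile uint16_t *dword_base, uint16_t value)
{
    dword_base[0] = value;
}

/* Four sub-word stores covering an aligned 64-bit word. In Normal memory the
 * core may merge them into one 64-bit write, which needs no Read-Modify-Write
 * and simply overwrites the old data and ECC. */
static void full_dword_as_halfwords(volatile uint16_t *dword_base,
                                    const uint16_t values[4])
{
    for (int i = 0; i < 4; i++) {
        dword_base[i] = values[i];
    }
}
```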

    Gael Le Moing said:

    As far as I understand, the "bus error" documented in the TMS570LC4357 TRM is related to the Cortex-R5 TRM "External faults" documented in chapter "8.3.1 Faults > External Faults". Am I correct on this point?

    Your understanding is correct.

    Gael Le Moing said:
    From the sentence "Non-exclusive stores to normal-type or device-type memory generate asynchronous aborts", I understand that when the RAM region is configured in the MPU as NORMAL or DEVICE memory, a non-exclusive store should generate an asynchronous abort. However, even with a while loop after the 16-bit write to RAM, no data abort is generated.

    See my comment above. Try a single 16-bit write instead of multiple writes, which can be merged.

  • Hi Charles,

    Ok for first point.

    For the second point:

    Charles Tsai said:
    Gael Le Moing
    • if the core MPU is disabled (using the default memory map documented in Cortex-R5 TRM Table 7-1) >> no abort is generated
    • if the core MPU is enabled with one region for the RAM configured as DEVICE or NORMAL memory >> no abort is generated
    • if the core MPU is enabled with one region for the RAM configured as STRONGLY-ORDERED memory >> a data abort is generated and the data fault is logged in the Cortex-R5 DFSR register.
    • independently of the MPU configuration, ESM group 3 channel 3 is triggered as expected

    Your observation is correct. When the RAM wrapper fails the ECC check during a Read-Modify-Write operation, it generates a bus error signal back to the CPU. Note that this error is associated with the sub-word write operation that you perform. How did you write to the RAM when it is configured as NORMAL or DEVICE? Note that the Cortex-R5F can perform write merging. It may have merged multiple sub-word writes into a 64-bit write, in which case the entire 64 bits, along with the 8-bit ECC, are written to the RAM, overwriting what was previously there. A complete 64-bit write does not trigger a Read-Modify-Write operation.

    I performed only one 16-bit write to the MEU-impacted memory: "*((unsigned int *)0x08000000) = 3;". So no write merging by the Cortex-R5 can occur. Moreover, the error is flagged in the ESM in group 3 channel 3.
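
A note on the access width of the statement quoted above: under the ARM EABI (AAPCS), `unsigned int` is 32 bits wide, so `*((unsigned int *)0x08000000) = 3;` compiles to a 32-bit store rather than a 16-bit one. Both are narrower than the 64-bit ECC word, so either should require a Read-Modify-Write. A sketch that makes the 16-bit width explicit and adds a DSB, so the store has completed before execution continues, might look like this; the helper name is hypothetical and the GCC-style inline assembly is only compiled for ARM targets.

```c
#include <stdint.h>

/* Hypothetical helper: store exactly 16 bits, then wait for completion. */
static void store_u16_and_sync(volatile uint16_t *addr, uint16_t value)
{
    *addr = value;                        /* 16-bit store (STRH) */
#if defined(__arm__)
    __asm volatile ("dsb" ::: "memory");  /* complete the store before continuing */
#endif
}

/* On target (illustrative): store_u16_and_sync((volatile uint16_t *)0x08000000u, 3u); */
```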

    Can you explain why the bus error (data abort) is not generated when the RAM region is configured in the MPU as NORMAL or DEVICE memory?

    For the third point:

    Charles Tsai said:
    Gael Le Moing
    From the sentence "Non-exclusive stores to normal-type or device-type memory generate asynchronous aborts", I understand that when the RAM region is configured in the MPU as NORMAL or DEVICE memory, a non-exclusive store should generate an asynchronous abort. However, even with a while loop after the 16-bit write to RAM, no data abort is generated.

    See my comment above. Try a single 16-bit write instead of multiple writes, which can be merged.

    I did perform a single 16-bit write followed by an infinite loop, and the asynchronous abort never occurs (although the group 3 channel 3 error is raised in the ESM). Why is this abort not generated?

    Thanks,

    Gael

  • Did you enable the cache? How did you configure the cache scheme, write-through or write-back? Is it possible that you have a write-back cache, so that the 16-bit write only goes to the cache and not to the L2RAM? Try disabling the cache or changing to a write-through scheme and see whether that makes a difference.
  • The cache is disabled: the SCTLR of the Cortex-R5 has bits I and C set to 0.
    The MPU is enabled, with the RAM region configured as Outer and Inner Non-cacheable, Non-shared Normal memory (refer to Table 4-36 of the Cortex-R5 TRM).
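
The configuration just described can be checked with a few bit-field helpers. SCTLR.C is bit 2 and SCTLR.I is bit 12; in the Cortex-R5 MPU region access control register, B is bit 0, C bit 1, S bit 2, TEX bits 5:3, AP bits 10:8 and XN bit 12. Under the ARMv7-R encoding, TEX=000/C=0/B=0 selects Strongly-ordered, TEX=000/C=0/B=1 shareable Device, and TEX=001/C=0/B=0 Normal, Outer and Inner Non-cacheable. This is a sketch for building and comparing values only; it does not program the MPU, and the macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SCTLR_C  (1u << 2)    /* data cache enable        */
#define SCTLR_I  (1u << 12)   /* instruction cache enable */

/* Region access control register fields (ARMv7-R PMSA layout). */
#define DRACR_B        (1u << 0)
#define DRACR_C        (1u << 1)
#define DRACR_S        (1u << 2)
#define DRACR_TEX(x)   (((uint32_t)(x) & 0x7u) << 3)
#define DRACR_AP(x)    (((uint32_t)(x) & 0x7u) << 8)
#define DRACR_XN       (1u << 12)
#define DRACR_AP_FULL  DRACR_AP(3u)   /* privileged and user read/write */

/* The three memory types compared in this thread. */
#define DRACR_STRONGLY_ORDERED  (DRACR_TEX(0u))
#define DRACR_DEVICE_SHARED     (DRACR_TEX(0u) | DRACR_B)
#define DRACR_NORMAL_NONCACHED  (DRACR_TEX(1u))   /* Outer/Inner Non-cacheable */

static bool caches_disabled(uint32_t sctlr)
{
    return (sctlr & (SCTLR_C | SCTLR_I)) == 0u;
}

/* Keeps only the TEX, C and B bits (in place) so region types can be compared. */
static uint32_t dracr_memory_type(uint32_t dracr)
{
    return dracr & (DRACR_TEX(7u) | DRACR_C | DRACR_B);
}
```

For example, the RAM region above would correspond to `DRACR_NORMAL_NONCACHED | DRACR_AP_FULL` with S left clear for Non-shared.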
  • Can you check the A bit in the CPSR? Does clearing it make a difference?

    Imprecise abort masking

    The nature of imprecise aborts means that they can occur while the processor is handling a different abort. If an imprecise abort generates a new exception in such a situation, the r14_abt and SPSR_abt values are overwritten. If this occurs before the data is pushed to the stack in memory, the state information about the first abort is lost. To prevent this from happening, the CPSR contains a mask bit to indicate that an imprecise abort cannot be accepted, the A-bit. When the A-bit is set, any imprecise abort that occurs is held pending by the processor until the A-bit is cleared, when the exception is actually taken. The A-bit is automatically set when abort, IRQ or FIQ exceptions are taken, and on reset. You must only clear the A-bit in an abort handler after the state information has either been stacked to memory, or is no longer required.
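
The A bit is CPSR bit 8. A small sketch for checking and clearing it follows: `mrs` reads the CPSR and `cpsie a` clears the asynchronous abort mask, after which any pending imprecise abort is taken. The helper names are hypothetical, the inline assembly is GCC-style and only compiled for ARM targets, and, as the quoted text says, the mask should only be cleared once any abort state that matters has been saved.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPSR_A  (1u << 8)   /* imprecise (asynchronous) abort mask */

static bool async_aborts_masked(uint32_t cpsr)
{
    return (cpsr & CPSR_A) != 0u;
}

#if defined(__arm__)
static inline uint32_t read_cpsr(void)
{
    uint32_t cpsr;
    __asm volatile ("mrs %0, cpsr" : "=r" (cpsr));
    return cpsr;
}

/* Clear the A bit; a pending imprecise abort is taken once it is cleared. */
static inline void enable_async_aborts(void)
{
    __asm volatile ("cpsie a" ::: "memory");
}
#endif
```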
  • Charles,

    For the first point, I would then like to request a documentation clarification. The RAMCTRL[ECC_DETECT_EN] field has a misleading name, and its description only says that ECC detection will be disabled. Moreover, nothing in the L2RAMW module chapter explains how ECC generation can be disabled. Adding this information to the field description seems necessary.

    Thanks.

  • Hi Charles,
    It works: when the A bit is cleared before performing the 16-bit write, the asynchronous abort is taken after a few cycles (I have not measured the delay yet but will do so soon).
    When the A bit is instead cleared during the while loop that follows the 16-bit write (a few seconds later), the behaviour is the same: the abort is taken at that point, consistent with the documentation, which says the abort is held pending until the A bit is cleared.

    So everything behaves as expected. Thank you for your help.
    Best regards,
    Gael