TMS570LC4357: DMA ECC Self-Test failing randomly due to incorrect address stored in DMAECCSBE

Part Number: TMS570LC4357

I'm experiencing exactly the same issue with the DMA ECC 1-Bit Self-Test that was reported in this (locked) thread: https://e2e.ti.com/support/microcontrollers/arm-based-microcontrollers-group/arm-based-microcontrollers/f/arm-based-microcontrollers-forum/1097304/tms570lc4357-dma-ecc-test-fails-randomly/4064481?tisearch=e2e-sitesearch&keymatch=TMS570LC4357%20DMAECCSBE#4064481

I tried using dmaBadECC=0xFFF80014 instead of 0xFFF80010 as suggested in that thread, but DMAECCSBE still intermittently reported the wrong address.

Can you please help?

Thanks,

Cameron

  • Hi Cameron,

    I will do a test, then come back to you soon.

  • Hi Cameron,

    The DMAECCSBE is valid only when the SBERR flag is set. When you get the wrong error address, is the error flag set?
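
The validity rule above (read DMAECCSBE only when SBERR is set) can be sketched as a small host-testable helper. This is a minimal sketch, not SDL code: `check_sbe()` and `sbe_result_t` are hypothetical names, the function works on register snapshots passed in by the caller, and the SBERR bit position used here is an assumption that should be confirmed against the device's Technical Reference Manual.

```c
#include <stdint.h>

/* ASSUMPTION: position of the SBERR flag inside DMASECCCTRL.
 * Confirm this bit against the TRM before relying on it. */
#define DMASECCCTRL_SBERR_MASK (1u << 16)

/* Hypothetical result type for one self-test trial. */
typedef enum {
    SBE_NO_ERROR_FLAG,    /* SBERR clear: DMAECCSBE content is not valid */
    SBE_ADDRESS_MATCH,    /* SBERR set and address equals the expected one */
    SBE_ADDRESS_MISMATCH  /* SBERR set but a different address was captured */
} sbe_result_t;

/* Hypothetical helper: classifies snapshots of DMASECCCTRL and DMAECCSBE
 * taken right after the single-bit error was injected. */
sbe_result_t check_sbe(uint32_t dmaseccctrl, uint32_t dmaeccsbe,
                       uint32_t expected_addr)
{
    if ((dmaseccctrl & DMASECCCTRL_SBERR_MASK) == 0u) {
        return SBE_NO_ERROR_FLAG;
    }
    return (dmaeccsbe == expected_addr) ? SBE_ADDRESS_MATCH
                                        : SBE_ADDRESS_MISMATCH;
}
```

Separating "flag not set" from "flag set, wrong address" makes it easier to tell which of the two symptoms a failing trial actually shows.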

  • Hi QJ,

    The SBERR flag is set every time. Only the error address is wrong. I ran 50 trials corrupting the address 0xFFF80010. SBERR was always 1, but DMAECCSBE was wrong 7 times:

    Trials 11, 13, 39, and 50: DMAECCSBE=0xFFF80014

    Trials 15 and 24: DMAECCSBE=0xFFF8001C

    Trial 26: DMAECCSBE=0xFFF80018

    Thanks for any guidance you can provide,

    Cameron

  • Hi Cameron,

    I did the same test on my LaunchPad and haven't reproduced the issue.

    testCount0 is the total number of tests, and testCount1 is the number of tests whose error address is not correct (0xFFF80014).

    #define dmaBadECC        0xFFF80014u

  • I've tried this on two TMS570LC4357 MCUs, and on both of them incorrect addresses are reported in DMAECCSBE, regardless of whether I define dmaBadECC as 0xFFF80010u or 0xFFF80014u. SBERR is always set correctly in DMASECCCTRL.

    Since it is working for you, I wonder if there's some setup you are doing that I've missed. The only configuration I'm doing prior to running SL_SelfTest_DMA() (as defined in the SafeTI Diagnostic Library 2.4.0) is to set DMAPCR equal to 0xA. Is there anything else I should set up beforehand that could be causing the errors I'm seeing?

    Thanks very much for your help!

    Cameron

  • Hi Cameron,

    I use the same API (SL_SelfTest_DMA()) of SDL 2.4. 

  • Hi QJ,

    I'm still trying to figure out a solution to this problem, and I've found that the DMA 2-bit Self-Test also usually passes, but occasionally fails because neither the ECC Error Detection Flag (bit 24) of the DMAPAR register nor the ESM ECC Uncorrectable error bit in Status Register 1 is set. My impression is that the 2-bit Self-Test is failing about as often as the 1-bit Self-Test. Does this suggest anything to you that might help me find what is going wrong?

    Thanks,

    Cameron
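
The two indications mentioned for the 2-bit test can be combined into one check, sketched below. This is only an illustration under stated assumptions: `dma_2bit_test_passed()` is a hypothetical helper, it assumes the test should pass only when both indications are present, and the ESM bit mask is left as a parameter because the thread does not say which bit of Status Register 1 is involved. Bit 24 of DMAPAR is taken from the post above.

```c
#include <stdbool.h>
#include <stdint.h>

/* ECC Error Detection Flag: bit 24 of DMAPAR, as cited in the post. */
#define DMAPAR_EDFLAG_MASK (1u << 24)

/* Hypothetical helper working on register snapshots.
 * esm_uncorr_mask is supplied by the caller (ASSUMPTION: a single-bit
 * mask selecting the DMA uncorrectable-ECC bit in ESM Status Register 1). */
bool dma_2bit_test_passed(uint32_t dmapar, uint32_t esm_sr1,
                          uint32_t esm_uncorr_mask)
{
    bool edflag_set = (dmapar & DMAPAR_EDFLAG_MASK) != 0u;
    bool esm_set    = (esm_sr1 & esm_uncorr_mask) != 0u;

    /* Both indications must be present for the injected error to
     * count as detected. */
    return edflag_set && esm_set;
}
```

Logging the two booleans separately on a failing run would show whether the error was missed by the DMA itself or only failed to reach the ESM.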

  • Hi Cameron,

    This is my test code for DMA ECC error injection:

    TMS570LC4357_DMA_SelfTest.zip

  • Thanks very much for sending your project. I noticed some of the configuration steps clearing ESM status registers that I was not doing, but even after adding them I still occasionally see the incorrect address stored in DMAECCSBE. 

    One thing I noticed, however, is that you are using the ARM compiler v16.9, which I am not able to install in my environment. I am using ARM Compiler Tool v5.1.6 with CCS v6.0.1.00040. Could that have an impact on the DMA ECC Self-Tests?

    Thanks,

    Cameron

  • "I am using ARM Compiler Tool v5.1.6 with CCS v6.0.1.00040. Is it possible that has an impact on the DMA ECC Self-Tests?"

    I am using CCS12.1 and compiler 16.9 or 20.2.7.

    Did you try my *.out file on your board?

  • I was able to try your test code out and it works perfectly using Debug mode. In Debug mode, my project also consistently stores the correct address in DMAECCSBE. However, the issue only occurs with a Release build (not in Debug). I check the contents of the register by writing it out over a serial port using SCI.

  • Hi Cameron,

    Debug and Release are build configurations.  A project in CCS can have one or more build configurations. A build configuration specifies the version of the compiler to use, the settings and options for the compiler, which source files to include/exclude... 

    Often people will set up the debug build configuration to have full symbolic debug information enabled to improve the ability to debug the application. It is also common to remove debug information from the release configuration and to use a higher level of optimization. Note that the build options used by each configuration are completely under your control.

  • What I think I should have replied is that your code works correctly every time when I run it with the CDT debugger in CCS, which appeared to be what you used to check that the DMAECCSBE address is correct (as in the image you've posted above).

    My project also works correctly with the debugger, but regardless of the build options I've tried (including no optimization) it still does not consistently report the correct address in DMAECCSBE when the build is run apart from the debugger.

  • Hi Cameron,

    "when the build is run apart from the debugger."

    Does it run after power-on reset?

  • Yes, it runs and experiences the issue after a power-on reset.

  • I am using the TMS570LC43x LaunchPad for this test. The LaunchPad is powered through the USB cable, which is also used for the XDS110 debugger and SCI.

    I ran the test with the debugger connected and with it disconnected (power-on reset), and got the same results in both cases.

    This is my test result:

    The first two rows: test with debugger connected

    The remaining (row #3 ~ row #8): test after power-on reset.

  • Hi again QJ,

    Is it possible that CPU Clock speed could be responsible for the difference in results we're seeing? Mine is configured at maximum speed. If yours is lower, could that account for the difference?

    Thanks for your help,

    Cameron

  • Hi Cameron,

    300MHz CPU clock is used in my example project. The maximum CPU clock for TMS570LC43x is 300MHz.

  • Hi QJ,

    Continuing with this theme, I compared the clock configurations in your TestSafeTI.hcg file with that in our .hcg file. In the GCM page in the TMS570LC4357ZWT tab, I noticed the following differences (our values in parentheses). Would any of these different values be relevant to the errors we're seeing? Thanks.

    HCLK divider = 2 (1)
    HCLK = 100.000 (150.000)
    VCLKA Src = VCLK (PLL2)
    VCLK1 = 50.000 (75.000)
    VCLK2 = 50.000 (75.000)
    VCLK3 = 12.500 (75.000)
    VCLKA = 50.000 (80.000)
    RTI1CLK = 50.000 (75.000)
    VCLKA4_DIV = 50.000 (75.000)
    VCLKA4_S = 50.000 (75.000)
    Oscillator = 16.0 (18.0)
    PLL2 = 300.0 (80.0)

  • Hi Cameron,

    I re-configured the HCLK to 150MHz, and all the tests passed.

    testCount0 and testCount1: number of passed tests

    Total number of tests: 200

  • That looks good, QJ, but I'm still having trouble with the test.

    I notice that you're running the test in a loop, but I am doing it after a power-on reset each time. Could you also try running the test after a power-on reset a few times to see if it still works consistently for you? 

    Thanks,

    Cameron

  • After my last question, I tried to perform an Auto-Initialization of the DMA RAM (per section 2.2.4.2 of the Technical Reference Manual) just after enabling ECC and just before running the test. When I do this, I'm seeing the correct address every time I run the DMA ECC 1-bit test and the 2-bit test also passes every time.

    This would resolve my issue, but I'd like to confirm with you that it is actually a required step to perform DMA RAM auto-init before the ECC functionality is reliable. Is that correct? (Maybe I missed it, but I didn't see anyplace it was done in the example project that was linked above.)

    Thanks,

    Cameron

  • Hi Cameron,

    You are correct. The CPU RAM and peripheral RAMs should be hardware initialized to avoid ECC errors right after power on.

    Neither a warm reset nor a power-on reset clears the DMA RAM. For any flipped bit in DMA RAM, we have to flip the bit back manually, or hardware initialize the RAM. Thank you for pointing this out.
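
For readers landing on this thread, the hardware auto-initialization step discussed above can be sketched roughly as follows. This is a minimal, host-testable sketch and not the SDL or HALCoGen implementation: `mem_init_regs_t` and `dma_ram_auto_init()` are hypothetical names, the struct is only a stand-in for the system-module registers MINITGCR, MSINENA and MSTCGSTAT (on the device they live at their own addresses), and the key values, the MINIDONE bit, and especially the DMA RAM select bit in MSINENA are assumptions to be checked against section 2.2.4.2 of the Technical Reference Manual.

```c
#include <stdint.h>

/* Hypothetical stand-in for the three system-module registers involved.
 * On the target, point equivalent accesses at the real register addresses. */
typedef struct {
    volatile uint32_t MINITGCR;   /* global enable key for hardware init   */
    volatile uint32_t MSINENA;    /* one select bit per RAM to initialize  */
    volatile uint32_t MSTCGSTAT;  /* status register holding MINIDONE      */
} mem_init_regs_t;

#define MINITGCR_ENABLE_KEY   0xAu        /* ASSUMPTION: enable key          */
#define MINITGCR_DISABLE_KEY  0x5u        /* ASSUMPTION: disable key         */
#define MSTCGSTAT_MINIDONE    (1u << 8)   /* ASSUMPTION: init-done flag      */
#define MSINENA_DMA_RAM       (1u << 1)   /* ASSUMPTION: DMA RAM select bit  */

/* Runs hardware auto-initialization on the DMA RAM only.
 * Per the thread, call this after enabling DMA ECC and before
 * running the ECC self-test. */
void dma_ram_auto_init(mem_init_regs_t *regs)
{
    regs->MINITGCR = MINITGCR_ENABLE_KEY;   /* unlock hardware init     */
    regs->MSINENA  = MSINENA_DMA_RAM;       /* select DMA RAM and start */

    /* Busy-wait until the hardware reports completion. */
    while ((regs->MSTCGSTAT & MSTCGSTAT_MINIDONE) == 0u) {
        /* intentionally empty */
    }

    regs->MINITGCR = MINITGCR_DISABLE_KEY;  /* lock hardware init again */
}
```

Initializing only the DMA RAM here, rather than every RAM, keeps the sketch from wiping memories that the application may already be using by the time the self-test runs.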