TDA4VM: ECC fault injection test

Part Number: TDA4VM

Hi, TI

We ran a fault injection test based on TI SDL_RLS_01.00.00 and found many failing cases for some RAM IDs. Are there any special steps that must be done before the test?

Here is the summary of the failed RAM IDs:

1. CABSS, RAMID: 0, 1, 10 - didn't get the ESM events after fault injection.

2. R5F0, RAMID: 0~26 - can't set the ECC control register, and setting some of the RAM IDs causes the SW to get stuck.

3. R5F0, RAMID: 27 - didn't get the ESM events after fault injection.

4. R5F1, all RAMIDs - can't access the registers. (We are using lockstep mode; is that why R5F1 can't be accessed?)

5. NAVSS, RAMID: 60, 62 - SW gets stuck when accessing these two.

6. NAVSS, RAMID: 2-25, 29, 30-54, 63-65 - didn't get the ESM events after fault injection.

Looking forward to your response.

Thank you!

Zhihua

  • Hi, TI

    Any suggestions? Thanks!

    Zhihua

  • Hi,

    Could you provide more information regarding which specific modules are being tested and on which SDK version?

    Thanks,

    Josiitaa

  • Hi Josiitaa,

    The SDK version is 8.2.

    I am not clear about the "modules" you mentioned. Do you mean the ECC modules, as listed in my original question? There, NAVSS is short for SDL_ECC_MEMTYPE_MCU_NAVSS0.

    Thanks!

    Zhihua

  • Hi Zhihua,

    Could you share your test logs?

    Thanks,

    Josiitaa

  • Hi, Josiitaa

    I am not sure if this helps. You can see in the code that a message is printed whenever no ESM event is received.

    Thanks!

    Zhihua

  • Hi Zhihua,

    I have a few more questions. What kind of error(s) are being injected? Single or double bit? When injecting the error, what are the arguments given for each ram-id? Are SDL APIs being used?

    Thanks,

    Josiitaa

  • Hi Josiitaa

    We inject both single-bit and double-bit faults; the failures above happen during single-bit fault injection. We do not use the SDL APIs directly because we are using SPL, which, after discussion with our local TI support, is not supported by the SDL.

    So we ported the SDL to our project and follow the test steps from the SDL.

    The steps are as below:

    1. Set ECC_CTRL.ENABLE_RMW = 1, ECC_CTRL.ECC_CHECK = 1, ECC_CTRL.ECC_ENABLE = 1.

    2. Enable the correctable interrupt.

    3. Set ECC_ERR_CTRL2.ECC_BIT1 = 1, ECC_CTRL.ERROR_ONCE = 1, ECC_CTRL.FORCE_N_ROW = 1, ECC_CTRL.FORCE_DED = 0, ECC_CTRL.FORCE_SEC = 1.

    4. Wait for the ESM event; if it occurs, clear the status register and the EOI register.

    5. Repeat for the other RAM IDs.

    Here is an update on the test results after the last few days:

    1. R5F0, RAMID0~26, SW stuck after setting ECC register

    2. R5F1, all RAMIDs, SW stuck after setting ECC register

    3. NAVSS, RAMID 60, 62, SW stuck after setting ECC register

    Thank you!

    Zhihua
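The five injection steps listed in the post above can be sketched in C roughly as follows. This is only an illustration under stated assumptions: the registers are a simulated in-memory struct, the bit positions are made up for the sketch (they are not taken from the TDA4VM TRM or the SDL headers), and `sim_ecc_regs_t`, `inject_single_bit`, and `sim_esm_poll` are hypothetical names. The one design point it tries to show is a bounded wait in step 4, so a missing ESM event is reported as a timeout instead of hanging the test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL bit positions, chosen only for this sketch.
 * They are NOT taken from the TDA4VM TRM or the SDL headers. */
#define ECC_CTRL_ECC_ENABLE   (1u << 0)
#define ECC_CTRL_ECC_CHECK    (1u << 1)
#define ECC_CTRL_ENABLE_RMW   (1u << 2)
#define ECC_CTRL_FORCE_SEC    (1u << 3)
#define ECC_CTRL_FORCE_DED    (1u << 4)
#define ECC_CTRL_FORCE_N_ROW  (1u << 5)
#define ECC_CTRL_ERROR_ONCE   (1u << 6)

/* Simulated per-RAM-ID register view (plain memory, not real MMIO). */
typedef struct {
    uint32_t ecc_ctrl;
    uint32_t ecc_err_ctrl2;   /* modeled as holding only the ECC_BIT1 field */
    uint32_t sec_int_enable;  /* correctable (single-bit) interrupt enable */
    uint32_t sec_status;
    uint32_t eoi;
} sim_ecc_regs_t;

typedef enum { INJECT_EVENT_SEEN, INJECT_TIMEOUT } inject_result_t;

/* Returns true once the (simulated) ESM event has been observed. */
typedef bool (*esm_poll_fn)(void *ctx);

/* Stand-in for an ESM check: reports the event after *ctx polls. */
bool sim_esm_poll(void *ctx)
{
    uint32_t *remaining = (uint32_t *)ctx;
    if (*remaining == 0u) {
        return true;
    }
    (*remaining)--;
    return false;
}

inject_result_t inject_single_bit(sim_ecc_regs_t *regs, esm_poll_fn poll,
                                  void *ctx, uint32_t max_polls)
{
    assert(regs != NULL && poll != NULL);

    /* Step 1: enable read-modify-write, ECC checking and ECC itself. */
    regs->ecc_ctrl |= ECC_CTRL_ENABLE_RMW | ECC_CTRL_ECC_CHECK |
                      ECC_CTRL_ECC_ENABLE;

    /* Step 2: enable the correctable interrupt. */
    regs->sec_int_enable = 1u;

    /* Step 3: ECC_BIT1 = 1, then force a one-shot single-bit error. */
    regs->ecc_err_ctrl2 = 1u;
    uint32_t ctrl = regs->ecc_ctrl;
    ctrl &= ~ECC_CTRL_FORCE_DED;
    ctrl |= ECC_CTRL_ERROR_ONCE | ECC_CTRL_FORCE_N_ROW | ECC_CTRL_FORCE_SEC;
    regs->ecc_ctrl = ctrl;

    /* Step 4: bounded wait for the event instead of spinning forever. */
    for (uint32_t i = 0u; i < max_polls; i++) {
        if (poll(ctx)) {
            /* In this simulation, clearing is a plain store; real
             * hardware may use different clear semantics. */
            regs->sec_status = 0u;
            regs->eoi = 1u;
            return INJECT_EVENT_SEEN;
        }
    }
    return INJECT_TIMEOUT;
}
```

Step 5 would then be a loop that calls `inject_single_bit` once per RAM ID and logs every `INJECT_TIMEOUT`, which corresponds to the "didn't get the ESM events" cases in the original question.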

  • Hi Zhihua,

    Are the tests being run on MCU R5F or Main Domain R5F? What does SW stuck mean exactly? Are the registers inaccessible or is it something else?

    If the tests are run from the Main Domain, the Main Domain has to be powered on properly. For the MCU R5F, the TCM memories have been tested.

    Thanks,

    Josiitaa

  • Hi Josiitaa

    Sorry for the late reply, we are on Chinese New Year holiday.

    All tests are running on the MCU R5F. "SW stuck" means the registers are inaccessible.

    I am not clear on how to test the TCM memories. Are they part of the ECC test? I didn't find that in the SDL. Could you give me some demo?

    BTW, during the test we found that the NAVSS_UDMA memory, ramID[80], always reports an error that can't be cleared. I guess the memory has a permanent fault. How can we confirm that?

    Thank you!

    Zhihua

  • Hi Zhihua,

    "R5F1, all RAMIDs, SW stuck after setting ECC register"

    This looks like the ECC register is not accessible. Since the R5F0 core runs the tests, it is enabled, but the R5F1 core must also be powered on so that its aggregator is accessible.

    "R5F0, RAMID0~26, SW stuck after setting ECC register"

    RAM IDs 0-26 are inject-only endpoints. The R5F0 itself takes care of error detection, correction, and reporting, so the error can only be injected. We might not get an ESM report or be notified through the ESM. In this case, an exception handler is plugged in to monitor the error.

    "Could you give me some demo?"

    The ATCM and BTCM for the MCU R5F are tested in the sdl_ecc_test_app.

    Thanks,

    Josiitaa
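The inject-only behavior described in the reply above suggests that a ported test loop should not wait on the ESM for every RAM ID. A minimal sketch of that decision follows, assuming a hypothetical helper `r5f0_monitor_path`. The 0-26 boundary comes from the reply; treating every higher R5F0 RAM ID as ESM-reported is only an assumption made for illustration and is not confirmed in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* From the reply above: R5F0 RAM IDs 0-26 are inject-only endpoints. */
#define R5F0_LAST_INJECT_ONLY_RAM_ID 26u

typedef enum {
    MONITOR_CPU_EXCEPTION_HANDLER, /* error is handled by the R5F itself */
    MONITOR_ESM_EVENT              /* ASSUMPTION for IDs above 26 */
} monitor_path_t;

/* Hypothetical helper: which mechanism the test should watch. */
monitor_path_t r5f0_monitor_path(uint32_t ram_id)
{
    return (ram_id <= R5F0_LAST_INJECT_ONLY_RAM_ID)
               ? MONITOR_CPU_EXCEPTION_HANDLER
               : MONITOR_ESM_EVENT;
}

/* Counts how many IDs in [first, last] would be watched via the ESM. */
uint32_t count_esm_monitored(uint32_t first, uint32_t last)
{
    assert(first <= last);
    uint32_t count = 0u;
    for (uint32_t id = first; id <= last; id++) {
        if (r5f0_monitor_path(id) == MONITOR_ESM_EVENT) {
            count++;
        }
    }
    return count;
}
```

With a helper like this, the test can skip the ESM wait for inject-only RAM IDs and check a flag set by its exception handler instead.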

  • Hi Josiitaa

    Reading the SDL code again, I still don't quite understand the test steps (we can't use the SDL directly).

    For ATCM and BTCM, I find they are checked through a PMU function. Is that correct? We didn't implement the PMU in our project. Is there any other approach?

    As for the exception handler, do you mean "SDL_EXCEPTION_registerECCHandler(&SDL_ECC_callBackFunction);"? I also find that it collects information from two ISRs, SDL_EXCEPTION_dataAbortExptnHandler and SDL_EXCEPTION_prefetchAbortExptnHandler. Is that correct?

    Thanks!

    Zhihua

  • Hi Zhihua,

    I am referring to the sdl_ecc_test_app code in /ti-processor-sdk-rtos-j721e-evm-08_02_00_05/sdl/test/ecc/ecc_sdl/ecc_test_func.c.

    "I find they are checked through a PMU function. Is that correct?"

    Could you point me to where that function is?

    Thanks,

    Josiitaa

  • Hi Josiitaa,

    I mean the function "SDL_ECC_R5EnablePmuForEccEvent". I found that it has to be called when the R5F is tested.

    Thanks,

    Zhihua

  • Hi Zhihua,

    On TDA4VM, the PMU is needed for single-bit error detection. Double-bit errors are reported via an exception.

    Thanks,

    Josiitaa
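The split described in the last reply (single-bit errors seen through the PMU, double-bit errors seen through an exception) can be modeled as two entry points that funnel into one registered callback. The sketch below is a stand-alone illustration, not the SDL implementation: `register_ecc_callback`, `on_pmu_ecc_event`, `on_abort_exception`, and `record_ecc_event` are hypothetical names, and no real PMU or abort vector is involved.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { ECC_ERR_SINGLE_BIT, ECC_ERR_DOUBLE_BIT } ecc_err_t;
typedef enum { SRC_NONE, SRC_PMU_EVENT, SRC_ABORT_EXCEPTION } ecc_src_t;

typedef void (*ecc_callback_t)(ecc_err_t err, ecc_src_t src);

static ecc_callback_t g_ecc_callback = NULL;

/* Hypothetical registration function (not the SDL API). */
void register_ecc_callback(ecc_callback_t cb)
{
    assert(cb != NULL);
    g_ecc_callback = cb;
}

/* Would be called from a PMU event path: single-bit error. */
void on_pmu_ecc_event(void)
{
    if (g_ecc_callback != NULL) {
        g_ecc_callback(ECC_ERR_SINGLE_BIT, SRC_PMU_EVENT);
    }
}

/* Would be called from a data or prefetch abort handler: double-bit error. */
void on_abort_exception(void)
{
    if (g_ecc_callback != NULL) {
        g_ecc_callback(ECC_ERR_DOUBLE_BIT, SRC_ABORT_EXCEPTION);
    }
}

/* Example callback that only records what it was told. */
ecc_src_t g_last_src = SRC_NONE;
unsigned g_single_count = 0u;
unsigned g_double_count = 0u;

void record_ecc_event(ecc_err_t err, ecc_src_t src)
{
    g_last_src = src;
    if (err == ECC_ERR_SINGLE_BIT) {
        g_single_count++;
    } else {
        g_double_count++;
    }
}
```

A test would register `record_ecc_event` once, inject the fault, and then check the counters instead of waiting for an ESM event.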