TDA4VM: ECC fault injection test

Zhihua Ding

Part Number: TDA4VM

Hi, TI

We ran a fault injection test based on TI SDL_RLS_01.00.00 and found many failing cases for some RAMIDs. Are there any special steps that must be done before the test?

Here is a summary of the failed RAMIDs:

1. CABSS, RAMID 0, 1, 10: no ESM event received after fault injection.

2. R5F0, RAMID 0~26: cannot set the ECC control register, and setting some of these RAMIDs causes the SW to get stuck.

3. R5F0, RAMID 27: no ESM event received after fault injection.

4. R5F1, all RAMIDs: cannot access the registers. (We are using lockstep mode; is that why R5F1 cannot be accessed?)

5. NAVSS, RAMID 60, 62: SW gets stuck when accessing these two.

6. NAVSS, RAMID 2-25, 29, 30-54, 63-65: no ESM event received after fault injection.

Looking forward to your response.

Thank you!

Zhihua

over 3 years ago

0 Zhihua Ding over 3 years ago

Intellectual 460 points

Hi, TI

Any suggestion? Thanks!

Zhihua

0 Josiitaa RL over 3 years ago

TI__Guru 53761 points

Hi,

Could you provide more information regarding which specific modules are being tested and on which SDK version?

Thanks,

Josiitaa

0 Zhihua Ding over 3 years ago in reply to Josiitaa RL

Intellectual 460 points

Hi Josiitaa,

SDK version is 8.2

I am not sure what you mean by "modules". Do you mean the ECC modules? In my original question, NAVSS is short for SDL_ECC_MEMTYPE_MCU_NAVSS0.

Thanks!

Zhihua

0 Josiitaa RL over 3 years ago in reply to Zhihua Ding

TI__Guru 53761 points

Hi Zhihua,

Could you share your test logs?

Thanks,

Josiitaa

0 Zhihua Ding over 3 years ago in reply to Josiitaa RL

Intellectual 460 points

Hi, Josiitaa

I am not sure if this helps. You can see in the code that if no ESM event is received, it prints a message.

Thanks!

Zhihua

0 Josiitaa RL over 3 years ago in reply to Zhihua Ding

TI__Guru 53761 points

Hi Zhihua,

I have a few more questions. What kind of error(s) are being injected? Single or double bit? When injecting the error, what are the arguments given for each ram-id? Are SDL APIs being used?

Thanks,

Josiitaa

0 Zhihua Ding over 3 years ago in reply to Josiitaa RL

Intellectual 460 points

Hi Josiitaa

We inject both single-bit and double-bit faults; the failures above happen during single-bit fault injection. We are not using the SDL API directly because we are using SPL, which, according to our local TI support contact, is not supported by the SDL.

So we ported the SDL to our project and follow the test steps from the SDL.

The steps are as below:

1. Set ECC_CTRL.ENABLE_RMW = 1, ECC_CTRL.ECC_CHECK = 1, ECC_CTRL.ECC_ENABLE = 1.

2. Enable the correctable interrupt.

3. Set ECC_ERR_CTRL2.ECC_BIT1 = 1, ECC_CTRL.ERROR_ONCE = 1, ECC_CTRL.FORCE_N_ROW = 1, ECC_CTRL.FORCE_DED = 0, ECC_CTRL.FORCE_SEC = 1.

4. Wait for the ESM event, and clear the status register and EOI register if the event happens.

5. Repeat for the other RAM IDs.
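
For readers following along, the register sequence in the steps above can be sketched roughly as below. This is only an illustration, not the SDL implementation: the `ecc_ram_regs_t` struct is a hypothetical in-memory stand-in for one RAM wrapper's registers, the bit positions of the ECC_CTRL fields and the placement of ECC_BIT1 are assumptions that should be checked against the TDA4VM TRM, and RAM ID selection through the aggregator as well as interrupt enabling are left out. The bounded poll in `wait_for_flag` is one way to avoid hanging forever when no event arrives.

```c
#include <stdint.h>

/* ASSUMED bit positions for the ECC_CTRL fields; verify against the TRM. */
#define ECC_CTRL_ECC_ENABLE   (1u << 0)
#define ECC_CTRL_ECC_CHECK    (1u << 1)
#define ECC_CTRL_ENABLE_RMW   (1u << 2)
#define ECC_CTRL_FORCE_SEC    (1u << 3)
#define ECC_CTRL_FORCE_DED    (1u << 4)
#define ECC_CTRL_FORCE_N_ROW  (1u << 5)
#define ECC_CTRL_ERROR_ONCE   (1u << 6)

/* Hypothetical stand-in for the per-RAM ECC wrapper registers. */
typedef struct {
    uint32_t ecc_ctrl;
    uint32_t ecc_err_ctrl2;
} ecc_ram_regs_t;

/* Step 1: enable ECC generation, checking, and read-modify-write. */
static void ecc_enable(ecc_ram_regs_t *r)
{
    r->ecc_ctrl |= ECC_CTRL_ENABLE_RMW | ECC_CTRL_ECC_CHECK | ECC_CTRL_ECC_ENABLE;
}

/* Step 3: arm a one-shot single-bit (SEC) injection on the next row. */
static void ecc_arm_single_bit(ecc_ram_regs_t *r, uint32_t ecc_bit1)
{
    uint32_t ctrl = r->ecc_ctrl;

    r->ecc_err_ctrl2 = ecc_bit1;   /* ECC_BIT1 assumed to start at bit 0 */
    ctrl &= ~ECC_CTRL_FORCE_DED;   /* FORCE_DED = 0 */
    ctrl |= ECC_CTRL_ERROR_ONCE | ECC_CTRL_FORCE_N_ROW | ECC_CTRL_FORCE_SEC;
    r->ecc_ctrl = ctrl;
}

/* Step 4: poll an event flag a bounded number of times.
 * Returns 1 if the flag was seen set, 0 if the poll budget ran out. */
static int wait_for_flag(const volatile uint32_t *flag, uint32_t max_polls)
{
    uint32_t i;

    for (i = 0u; i < max_polls; ++i) {
        if (*flag != 0u) {
            return 1;
        }
    }
    return 0;
}
```

On real hardware the struct would be replaced by the memory-mapped registers of the selected RAM ID, and the flag would be set from the ESM interrupt handler.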

Here is an update on the test results after the past few days:

1. R5F0, RAMID0~26, SW stuck after setting ECC register

2. R5F1, all RAMIDs, SW stuck after setting ECC register

3. NAVSS, RAMID 60, 62, SW stuck after setting ECC register

Thank you!

Zhihua

0 Josiitaa RL over 3 years ago in reply to Zhihua Ding

TI__Guru 53761 points

Hi Zhihua,

Are the tests being run on the MCU R5F or the Main Domain R5F? What does "SW stuck" mean exactly? Are the registers inaccessible, or is it something else?

If the tests are run from the Main Domain, the Main Domain has to be powered on properly. For the MCU R5F, the TCM memories have been tested.

Thanks,

Josiitaa

0 Zhihua Ding over 3 years ago in reply to Josiitaa RL

Intellectual 460 points

Hi Josiitaa

Sorry for the late reply, we are on Chinese New Year holiday.

All tests are running on the MCU R5F. "SW stuck" means the registers are inaccessible.

I am not clear on how to test the TCM memories. Are they part of the ECC test? I didn't find that in the SDL. Could you give me a demo?

BTW, during the test we found that the NAVSS_UDMA memory, ramID[80], always reports an error that can't be cleared. I guess the memory has a permanent fault. How can I confirm that?

Thank you!

Zhihua

0 Josiitaa RL over 3 years ago in reply to Zhihua Ding

TI__Guru 53761 points

Hi Zhihua,

Zhihua Ding said:
R5F1, all RAMIDs, SW stuck after setting ECC register

This looks like the ECC register is not accessible. Since the R5F0 core runs the tests, it is enabled, but the R5F1 core must be powered on so that the aggregator is accessible.

Zhihua Ding said:
R5F0, RAMID0~26, SW stuck after setting ECC register

RAM IDs 0-26 are inject-only endpoints. The R5F0 itself takes care of error detection, correction, and reporting, so for these RAM IDs an error can only be injected. You might not get an ESM report or be notified through the ESM. In this case, an exception handler is plugged in to monitor the error.

Zhihua Ding said:
Could you give me some demo?

The ATCM and BTCM for the MCU R5F have been tested in the sdl_ecc_test_app.

Thanks,

Josiitaa

0 Zhihua Ding over 3 years ago in reply to Josiitaa RL

Intellectual 460 points

Hi Josiitaa

After reading the SDL code again, I still don't quite understand the test steps (we can't use the SDL directly).

For ATCM and BTCM, I find that they are checked through the PMU function. Is that correct? We didn't implement the PMU in our project. Is there any other approach?

As for the exception handler, do you mean "SDL_EXCEPTION_registerECCHandler(&SDL_ECC_callBackFunction);"? I also find that it collects information from two ISRs, SDL_EXCEPTION_dataAbortExptnHandler and SDL_EXCEPTION_prefetchAbortExptnHandler. Is that correct?

Thanks!

Zhihua

0 Josiitaa RL over 3 years ago in reply to Zhihua Ding

TI__Guru 53761 points

Hi Zhihua,

I am referring to the sdl_ecc_test_app code in /ti-processor-sdk-rtos-j721e-evm-08_02_00_05/sdl/test/ecc/ecc_sdl/ecc_test_func.c.

Zhihua Ding said:
I find they are check through PMU function, is that correct?

Could you point me to where that function is?

Thanks,

Josiitaa

0 Zhihua Ding over 3 years ago in reply to Josiitaa RL

Intellectual 460 points

Hi Josiitaa,

I mean the function "SDL_ECC_R5EnablePmuForEccEvent". I found that it has to be called when the R5F is tested.

Thanks,

Zhihua

0 Josiitaa RL over 3 years ago in reply to Zhihua Ding

TI__Guru 53761 points

Hi Zhihua,

On TDA4VM, the PMU is needed for single-bit error detection. Double-bit errors are reported via an exception.

Thanks,

Josiitaa
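
To summarize the reporting paths discussed in this thread, a test harness could choose what to wait for with a small helper like the one below. This is only a sketch under the assumptions stated in the replies above: R5F RAM IDs 0-26 are treated as inject-only endpoints whose single-bit errors are observed through the PMU and whose double-bit errors arrive as an exception. All type and function names here are hypothetical and not part of the SDL, and the fallback to ESM for every other RAM ID is an assumption the thread does not confirm.

```c
#include <stdint.h>

/* Hypothetical names for illustration only; not SDL types. */
typedef enum { FAULT_SINGLE_BIT, FAULT_DOUBLE_BIT } fault_kind_t;
typedef enum { REPORT_VIA_ESM, REPORT_VIA_PMU, REPORT_VIA_EXCEPTION } report_path_t;

/* Per the thread, R5F RAM IDs 0-26 are inject-only endpoints. */
#define R5F_INJECT_ONLY_LAST_RAM_ID 26u

static int r5f_ram_is_inject_only(uint32_t ram_id)
{
    return ram_id <= R5F_INJECT_ONLY_LAST_RAM_ID;
}

/* Pick the mechanism a test should monitor after injecting a fault. */
static report_path_t expected_report_path(int is_r5f_aggregator,
                                          uint32_t ram_id,
                                          fault_kind_t kind)
{
    if (is_r5f_aggregator && r5f_ram_is_inject_only(ram_id)) {
        /* The R5F core detects these itself: PMU event for single-bit,
         * abort exception for double-bit. */
        return (kind == FAULT_SINGLE_BIT) ? REPORT_VIA_PMU
                                          : REPORT_VIA_EXCEPTION;
    }
    /* ASSUMPTION: every other RAM ID is expected to report through the ESM. */
    return REPORT_VIA_ESM;
}
```

With a helper like this, the single-bit test for R5F0 RAM ID 5 would monitor the PMU instead of waiting on an ESM event that may never arrive.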
