Hello again,
Any idea what the problem is? The code jumps to a data abort that the handler can't mask.
When the SRAM ECC test is performed after the L2L3 tests, this happens both at run time and at boot time. For some reason the run-time SRAM test works after startup if the L2L3 tests are run only during startup.
I made a simple test case for startup that reproduces the problem (before this code I have already performed the SRAM tests successfully in the recommended part of the startup routine).
{
// Only these tests can be run in privileged mode
const SL_SelfTestType aeIconTests[] =
{
L3INTERCONNECT_RESERVED_ACCESS,
L2INTERCONNECT_RESERVED_ACCESS
};
for( u32I = 0U; u32I < ELEMENTS( aeIconTests ); u32I++ )
{
retVal = SL_SelfTestL2L3Interconnect( aeIconTests[ u32I ], NULL, NULL, 0U); // remaining arguments are only used by the unprivileged test
INCREMENT_PASS_FAIL_COUNTER( ST_PASS, retVal, FAILURE_PROCEED );
}
}
{
SL_SelfTest_Result failInfoTCMRAM; /* TCM RAM Failure information */
retVal = SL_SelfTest_SRAM(SRAM_ECC_ERROR_FORCING_2BIT, TRUE, &failInfoTCMRAM);
INCREMENT_PASS_FAIL_COUNTER( failInfoTCMRAM, retVal, FAILURE_PROCEED );
sl_vimREG->INTREQ0 = 1U; // clear CH: esmHIGH (always FIQ), test enables it
}
The result is the same at run time and at startup:
===DATA_ABORT===
DFSR: 0x409
DFAR: 0x8008170
Status: 0x19
Read: TRUE
AxiDec: TRUE
====================
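For reference, the Status, Read and AxiDec values in this dump can be recomputed from the raw DFSR. The sketch below assumes the Cortex-R4 DFSR layout (fault status in bit 10 plus bits 3:0, RW in bit 11, SD in bit 12); the type and function names are made up for illustration and are not part of the SafeTI library. If I read the TRM correctly, status 0x19 is the synchronous parity/ECC error encoding, and the SD bit is only meaningful for external aborts.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of a DFSR value. The field positions are an assumption
 * based on the Cortex-R4 register layout, not taken from SafeTI code. */
typedef struct {
    uint32_t status;     /* 5-bit fault status: DFSR[10] : DFSR[3:0]  */
    bool     is_read;    /* DFSR[11] (RW) == 0 means a read access    */
    bool     axi_decode; /* DFSR[12] (SD) == 0 means AXI decode error */
} dfsr_fields_t;

/* Hypothetical helper: split a raw DFSR value into the fields above. */
static dfsr_fields_t decode_dfsr(uint32_t dfsr)
{
    dfsr_fields_t f;
    /* Move bit 10 down to bit 4 and combine it with bits 3:0. */
    f.status     = ((dfsr >> 6) & 0x10u) | (dfsr & 0x0Fu);
    f.is_read    = ((dfsr >> 11) & 1u) == 0u;
    f.axi_decode = ((dfsr >> 12) & 1u) == 0u;
    return f;
}
```

Calling decode_dfsr(0x409u) gives status 0x19 with both flags set, which agrees with the dump.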
where 0x8008170 lies inside the SafeTI test buffer (0x08008160 + 0x10), according to the map file:
sramEccTestBuff 0x08008160 0x20 Data Gb sl_selftest.o [1]
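The address arithmetic can be sketched as follows. The helper name is hypothetical, and the 8-byte element size is an assumption inferred from the 64-bit reads (ramread64) in the test code; the base and size come from the map line above. The same range check could also be used in a data abort handler before deciding to mask the abort.

```c
#include <stdint.h>

/* Hypothetical helper: returns the index of the element that contains
 * 'addr', or -1 when 'addr' is outside [base, base + size_bytes). */
static int buffer_index_of(uint32_t addr, uint32_t base,
                           uint32_t size_bytes, uint32_t elem_size)
{
    if ((addr < base) || ((addr - base) >= size_bytes)) {
        return -1; /* fault address is not inside the buffer */
    }
    return (int)((addr - base) / elem_size);
}
```

With addr = 0x08008170, base = 0x08008160, size_bytes = 0x20 and elem_size = 8 this returns 2, so under these assumptions the faulting address is sramEccTestBuff[2].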
In the data abort handler I copied the SRAM test error handling from the example project and added a counter (u32Ecc2bit) that is incremented every time the test is recognized:
if( SL_FLAG_GET((int32)SRAM_ECC_ERROR_FORCING_2BIT) )
{
u32Ecc2bit++;
uint32 u32EccWrEnMask = (uint32)TCRAM_RAMCTRL_ECCWREN; /*lint !e9033 !e9053 */ // TODO: why lint
uint32 u32Eec1WrEn = sl_tcram1REG->RAMCTRL & u32EccWrEnMask;
uint32 u32Eec2WrEn = sl_tcram2REG->RAMCTRL & u32EccWrEnMask;
if( (u32Eec1WrEn == u32EccWrEnMask)|| (u32Eec2WrEn == u32EccWrEnMask) )
{ /* So looks like writes to ECC region is enabled, check if the error address is in the test buffer range */
maskDAbort = TRUE; // TODO: should the address be checked and not just masking error out based on SL_FLAG_GET...
}
}
The value of u32Ecc2bit is 3, and the abort is raised twice per test, which means the second test handled its first data abort but then something went wrong in the middle of the test.
According to the debugger, the data abort is raised when executing the _SL_Barrier_Data_Access() line:
/* Set the self test flag for a self test to indicate the esm handler that this is done as a part of selftest */
/* read from location with 2-bit ECC error this will cause a data abort to be generated */
/*SAFETYMCUSW 446 S MR:10.1 <APPROVED> Comment_11*/
ramread64 = sramEccTestBuff[2];
_SL_Barrier_Data_Access(); // unexpected data abort from here
/* Restore ctrl registers */
sl_tcram1REG->RAMCTRL &= ~TCRAM_RAMCTRL_ECCWREN;
/*SAFETYMCUSW 134 S MR: 12.2 <APPROVED> Comment_5*/
ramRead = sl_tcram2REG->RAMCTRL;
/* Set the self test flag for a self test to indicate the esm handler that this is done as a part of selftest */
/*SAFETYMCUSW 446 S MR:10.1 <APPROVED> Comment_11*/
ramread64 = sramEccTestBuff[3];
_SL_Barrier_Data_Access();
/* Restore ctrl registers */
sl_tcram2REG->RAMCTRL &= ~TCRAM_RAMCTRL_ECCWREN;
Since SRAM_ECC_ERROR_FORCING_2BIT works on its own at run time, the test itself appears to be OK. Since it also works once at run time after startup, but not during startup straight after the L2L3 tests, something must change in the transition from startup to run time that allows it to pass once.
If I understood correctly, the next expected data abort should come from ramread64 = sramEccTestBuff[3];, so something is wrong. Let's check a bit further.
I added this debug trap and set a breakpoint on the if:
u32Ecc2bit++;
if( u32Ecc2bit == 3 )
{
temptemp++;
}
Everything appears to work nicely, but in the same pass the code also enters this if (where I have logic that sets maskDAbort = FALSE;):
/* DAbort due to an SRAM ECC 2Bit self test? */
if( SL_FLAG_GET((int32)SRAM_ECC_2BIT_FAULT_INJECT) )
{
}
So obviously the problem is that the test activity array gets corrupted somehow.
After startup, after running L3INTERCONNECT_RESERVED_ACCESS, sl_priv_flag_set[0] has the value 4 (meaning that SRAM_ECC_ERROR_FORCING_1BIT is active, although SL_FLAG_SET only uses the values 0 and 1).
After performing L2INTERCONNECT_RESERVED_ACCESS, sl_priv_flag_set[3] is 136 (and [0] is now zero), meaning that SRAM_ECC_2BIT_FAULT_INJECT is active.
This perfectly explains why the data abort handler hangs; I just wonder what else the L2 and L3 tests corrupt...
Why does this test array get corrupted, and what do the L2 and L3 tests do? Are they the same kind of destructive test as PBIST, and when should they be performed? I execute them near the end of startup, which no longer sounds safe if more RAM is corrupted than just those test activity slots...
This also explains why the SRAM test works once after startup: before jumping to main() the variables are re-initialized, and the corrupted SRAM_ECC_2BIT_FAULT_INJECT flag is cleared...
All ideas are welcome. Am I doing something wrong in the data abort handler?
In the L3 test, sl_priv_flag_set[0] changes to 4 at the STR instruction just before the if starts, i.e. after the data abort handler has returned:
0xa6a6: 0x0020 MOVS R0, R4
0xa6a8: 0xf001 0xfb24 BL SL_FLAG_SET ; 0xbcf4
_SL_Barrier_Data_Access();
0xa6ac: 0xf001 0xe826 BLX _SL_Barrier_Data_Access ; 0xb6fc
g_L2L3_read_reserved_word = *((uint32*)PCR_RESERVED_LOCATION);
0xa6b0: 0xf05f 0x407d MOVS.W R0, #-50331648 ; 0xfd000000
0xa6b4: 0x6800 LDR R0, [R0]
0xa6b6: 0xf8df 0x1d38 LDR.W R1, [PC, #0xd38] ; [0xb3f0] g_L2L3_read_reserved_word // jumps to data abort
0xa6ba: 0x6008 STR R0, [R1] // looks to corrupt here
if ((0x00000008u == (uint32)(0x00000008u & _SL_Get_DataFault_Status())) &&
((uint32)PCR_RESERVED_LOCATION == _SL_Get_DataFault_Address())) {
0xa6bc: 0xf000 0xef60 BLX _SL_Get_DataFault_Status ; 0xb580
0xa6c0: 0x0700 LSLS R0, R0, #28
0xa6c2: 0xd505 BPL.N 0xa6d0
0xa6c4: 0xf000 0xef60 BLX _SL_Get_DataFault_Address ; 0xb588
The L2 test corrupts data at exactly the same point:
_SL_Barrier_Data_Access();
0xa6e6: 0xf001 0xe80a BLX _SL_Barrier_Data_Access ; 0xb6fc
g_L2L3_read_reserved_word = *((uint32*)SCR_RESERVED_LOCATION);
0xa6ea: 0xf05f 0x4008 MOVS.W R0, #-2013265920 ; 0x88000000
0xa6ee: 0x6800 LDR R0, [R0]
0xa6f0: 0xf8df 0x1cfc LDR.W R1, [PC, #0xcfc] ; [0xb3f0] g_L2L3_read_reserved_word // jumps to data abort
0xa6f4: 0x6008 STR R0, [R1] // looks to corrupt here
_SL_Barrier_Data_Access(); /*added to avoid linker alignment issue*/
0xa6f6: 0xf001 0xe802 BLX _SL_Barrier_Data_Access ; 0xb6fc
if ((0x00000008u == (uint32)(0x00000008u & _SL_Get_DataFault_Status())) &&
((uint32)SCR_RESERVED_LOCATION == _SL_Get_DataFault_Address())) {
0xa6fa: 0xf000 0xef42 BLX _SL_Get_DataFault_Status ; 0xb580
0xa6fe: 0x0700 LSLS R0, R0, #28
0xa700: 0xd505 BPL.N 0xa70e
0xa702: 0xf000 0xef42 BLX _SL_Get_DataFault_Address ; 0xb588
In the L2 test, R0 = 0x88000000 and R1 = 0x0800A840 when the STR executes.
The address of sl_priv_flag_set is 0x0800A840:
sl_priv_flag_set 0x0800a840 0x40 Data Gb sl_priv.o [1]
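So most likely this is not corrupting anything else...
The address of sl_priv_flag_set[3] is 0x0800A843, so this STR instruction makes a 32-bit write to the beginning of the test array and corrupts it. Assuming little-endian byte order, storing 0x88000000 at 0x0800A840 zeroes bytes [0] to [2] and sets [3] to 0x88 = 136, which matches the values I saw.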
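To double-check that byte pattern, here is a small host-side sketch that emulates a 32-bit little-endian store into a byte array. The helper name is hypothetical, and little-endian byte order is an assumption that happens to be consistent with the observed values; the array size 0x40 comes from the map line above.

```c
#include <stdint.h>

/* Hypothetical helper: emulate a 32-bit little-endian STR into a byte
 * array, writing the least significant byte at the lowest address. */
static void store32_le(uint8_t *bytes, uint32_t offset, uint32_t value)
{
    for (uint32_t i = 0u; i < 4u; i++) {
        bytes[offset + i] = (uint8_t)(value >> (8u * i));
    }
}
```

Starting from a 0x40-byte array with element [0] set to 4, calling store32_le(flags, 0u, 0x88000000u) leaves [0], [1] and [2] at zero and sets [3] to 136, the same state seen in sl_priv_flag_set after the L2 test.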
As you can guess, it took a bit more than 3 minutes to figure out what was happening...