This thread has been locked.


RM48L952: FBPARERR address is sometimes called multiple times after VIM_SRAM_PARITY_TEST even though PARFLG is 0

Part Number: RM48L952
Other Parts Discussed in Thread: HALCOGEN, SEGGER

Hello,

Any ideas what might occasionally cause multiple FBPARERR calls after the VIM_SRAM_PARITY_TEST is performed? When entering the FBPARERR function, PARFLG is 0, so there should not be any parity errors.


Pseudocode:

Lock scheduler
Disable IRQ
...
        else if( eTest == DIAG_TEST_CPU_VIM )
        {
            if( ptTest->eTest == VIM_SRAM_PARITY_TEST )
            {
                DBG_PRINT( "VIM PARTEST start: %u us\r\n", HAL_u32TimeGet() );
            }
            bRetVal = SL_SelfTest_VIM( ptTest->eTest );
            if( ptTest->eTest == VIM_SRAM_PARITY_TEST )
            {
                DBG_PRINT( "VIM PARTEST end: %u us, FLG: 0x%x\r\n", HAL_u32TimeGet(), sl_vimParREG->PARFLG );
            }
            failInfo = ST_PASS; // generic variable, always PASS
        }
...
Enable IRQ
Unlock scheduler

## end of pseudocode

That code generates the following debug prints:

DIAG: tests started, 53247080 ms<CR><LF>
CCM tests skipped<CR><LF>
VIM PARTEST start: 1716472897 us<CR><LF>
VIM PARTEST end: 1716472932 us, FLG: 0x0<CR><LF>
<CR><LF>
===VIM_PARFLG: 0x0 ===<CR><LF>
<CR><LF>
===VIM RAM PARITY ERROR - channel: 2 @ 1716472985 us ===<CR><LF>
<CR><LF>
===VIM_PARFLG: 0x0 ===<CR><LF>
<CR><LF>
===VIM RAM PARITY ERROR - channel: 2 @ 1716473042 us ===<CR><LF>
<CR><LF>
===VIM_PARFLG: 0x0 ===<CR><LF>
<CR><LF>
===VIM RAM PARITY ERROR - channel: 2 @ 1716473098 us ===<CR><LF>
<CR><LF>
===VIM_PARFLG: 0x0 ===<CR><LF>
<CR><LF>
===VIM RAM PARITY ERROR - channel: 2 @ 1716473154 us ===<CR><LF>
<CR><LF>
===VIM_PARFLG: 0x0 ===<CR><LF>
<CR><LF>
===VIM RAM PARITY ERROR - channel: 2 @ 1716473210 us ===<CR><LF>
<CR><LF>
===VIM_PARFLG: 0x0 ===<CR><LF>
<CR><LF>
===VIM RAM PARITY ERROR - channel: 2 @ 1716473266 us ===<CR><LF>
<CR><LF>
===VIM_PARFLG: 0x0 ===<CR><LF>
<CR><LF>
===VIM RAM PARITY ERROR - channel: 2 @ 1716473322 us ===<CR><LF>
<CR><LF>
===VIM_PARFLG: 0x0 ===<CR><LF>
<CR><LF>
===VIM RAM PARITY ERROR - channel: 2 @ 1716473378 us ===<CR><LF>
<CR><LF>
===VIM_PARFLG: 0x0 ===<CR><LF>
<CR><LF>
===VIM RAM PARITY ERROR - channel: 2 @ 1716473434 us ===<CR><LF>
<CR><LF>
===VIM_PARFLG: 0x0 ===<CR><LF>
<CR><LF>
===VIM RAM PARITY ERROR - channel: 2 @ 1716473490 us ===<CR><LF>
<CR><LF>
===VIM_PARFLG: 0x0 ===<CR><LF>
<CR><LF>
===VIM RAM PARITY ERROR - channel: 2 @ 1716473546 us ===<CR><LF>
<CR><LF>
===VIM_PARFLG: 0x0 ===<CR><LF>
<CR><LF>
===VIM RAM PARITY ERROR - channel: 2 @ 1716473602 us ===<CR><LF>
DIAG: tests passed, 53259080 ms<CR><LF>


And my FBPARERR function looks like this (basically the same as the HALCoGen version, but it does not try to restore the vector content, since SL_ESM_Init() & vimChannelMap() modify the vectors, so the HALCoGen code would restore the wrong vector):

__irq
static void vVimParityErrorHandler( void )
{
    typedef volatile struct vimRam
    {
        t_isrFuncPTR ISR[VIM_CHANNELS];
    } vimRAM_t;

    #define vimRAM ((vimRAM_t *)0xFFF82000U)

    /* Identify the corrupted address */
    uint32 u32ErrorAddr = VIM_ADDERR;
    /* Identify the channel number */
    uint32 u32ErrorChannel = (uint32)((u32ErrorAddr & 0x1FFU) >> 2U);
    /* Clear Parity Error Flag */
    if( VIM_PARFLG == 0U )
    {
        DBG_PRINT( "\r\n===VIM_PARFLG: 0x%x ===\r\n", VIM_PARFLG );
    }

    VIM_PARFLG = 1U;

    if( u32ErrorChannel < VIM_CHANNELS )
    {
        (void)vimRAM->ISR[ u32ErrorChannel ]; /*lint !e9078 !e923 */ /* r11.4 & r11.6 PD */ // re-read same vector

        if( VIM_PARFLG != 0U )  // test if parity error is still present
        {
            DBG_PRINT_PANIC( "\r\n\r\n===VIM RAM PARITY ERROR - permanent error: %u ===\r\n\r\n", u32ErrorChannel );
            WOS_vInfiniteLoop();
        }
        else
        {
            /* temporary error, acknowledge of the parflg was enough */
            DBG_PRINT( "\r\n===VIM RAM PARITY ERROR - channel: %u @ %u us ===\r\n", u32ErrorChannel, HAL_u32TimeGet() );
        }
    }
    else
    {
        DBG_PRINT_PANIC( "\r\n\r\n===VIM RAM PARITY ERROR - Channel error: %u ===\r\n\r\n", u32ErrorChannel );
        WOS_vInfiniteLoop();
    }
}

And here is my FBPARERR assignment:
        vimInit();
        /* Override HALCoGen function since it restores vectors from a ROM table, but vectors are modified by the SafeTI &
           vimChannelMap() functions; restoring from ROM is not reliable. An own backup RAM table would be needed,
           but it would also need manual updating every time an operation is made which modifies VIM RAM */
        VIM_FBPARERR = (uint32)vVimParityErrorHandler; /*lint !e9074 !e923 */ /* r11.1 & r11.6 PD */


Overall this printing happens quite rarely, perhaps once in 2 hours or even more seldom, and it always happens right after the VIM_SRAM_PARITY_TEST, which is run every 12 s. So roughly every 1000th test causes that FBPARERR call, and when it happens, FBPARERR is called multiple times in a row. As you can see from the prints & code, PARFLG == 0, so nothing should even call that FBPARERR function.

Also notice that PARFLG is already 0 when exiting the VIM_SRAM_PARITY_TEST in this case too, just as in the case where the spurious FBPARERR call does not happen. Here is one example of an OK/normal case:
DIAG: tests started, 62843080<CR><LF>
CCM tests skipped<CR><LF>
VIM PARTEST start: 2722538305 us<CR><LF>
VIM PARTEST end: 2722538340 us, FLG: 0x0<CR><LF>
DIAG: tests passed, 62855080<CR><LF>


Here is also one print without the PAR_FLG prints; as can be seen, the "us" time between spurious calls is now smaller, ~38 us vs. ~56 us when the PAR_FLG print is included:

VIM PARTEST start: 49076996 us<CR><LF>
VIM PARTEST end: 49077029 us, FLG: 0x0<CR><LF>
<CR><LF>
===VIM RAM PARITY ERROR - channel: 2 @ 49077063 us ===<CR><LF>
<CR><LF>
===VIM RAM PARITY ERROR - channel: 2 @ 49077101 us ===<CR><LF>
<CR><LF>
===VIM RAM PARITY ERROR - channel: 2 @ 49077139 us ===<CR><LF>
<CR><LF>
===VIM RAM PARITY ERROR - channel: 2 @ 49077176 us ===<CR><LF>
....


Question1: Why does the FBPARERR address occasionally get called even though PARFLG is 0?
Question2: Why is the FBPARERR address spuriously called multiple times in a row once it gets called at all? The parity error handler checks that the parity error is no longer active; if it were still active, the handler would halt the CPU in IRQ/FIQ mode by staying in an infinite loop.
Question3: Why do those FBPARERR address calls appear only right after the VIM_SRAM_PARITY_TEST?

Question4: Any suggestions on what to do, or how to avoid those unexpected FBPARERR address calls? Obviously PARFLG should be checked first, and if it is 0, nothing shall be done. The HALCoGen function looks to have the same problem: it does not check PARFLG when entering the parity handler function...

I have also set a "guard print" in the phantom interrupt handler to verify that the "reserved" VIM vector which SafeTI uses to test the VIM is never called. This phantom print never appears, so nothing actually calls that vector; the VIM should therefore never try to load it, should never detect that parity error, and the FBPARERR address should never get called even once...
void phantomInterrupt(void)
{
/* USER CODE BEGIN (2) */
    #include "macros.h"
    DBG_PRINT( "IRQ: Phantom\r\n" );
/* USER CODE END */
}

I have also enabled the IRQ notifier from the ESM in case a parity error comes (channel 15), and that ESM IRQ is not signalled before/after those spurious calls either, indicating that there is no real parity error in the VIM:
esmREG->IESR1 = (uint32)((uint32)0U << 31U)
...
                  | (uint32)((uint32)0U << 16U)
                  | (uint32)((uint32)1U << 15U)
                  | (uint32)((uint32)0U << 14U)


  • Hi,

    I am starting to suspect that the 2 CPUs I have are somehow broken, since my colleagues loaded exactly the same binaries in exactly the same way and they do not receive any spurious VIM parity error calls/prints...


    I also originally encountered a VIM parity test failure, most likely because at some point the VIM test wasn't able to flip the parity bit or open the VIM RAM for writing... My actual code had tripped to a VIM parity test failure after 6.5 days, during my vacation, and the log also had those spurious FBPARERR-address calls (I couldn't get a more detailed reason at that time for why the SafeTI test failed, but most likely the reason is the same as described below).

    Before that "faulty CPU determination" I "accelerated" things a bit by running only the VIM parity test in a while-style loop with a minor OS delay in between, and with that, the problems started to arise much faster than before.

    I also made a simple "recorder" to see how many of those spurious calls really come once they are triggered (in vVimParityErrorHandler(), if PARFLG == 0). It looks like the parity error stays active for quite a long time... I collected all consecutive calls appearing less than 100 us apart, and the error handler function gets called ~1200 times each time it gets triggered at all :).
    time: 0, calls: 1, totcalls: 1 @ 1755 ms<CR><LF> // this line can be ignored since it is printed when the spurious errors start
    time: 967us, calls: 1199, totcalls: 1200 @ 2584 ms<CR><LF>
    time: 968us, calls: 1190, totcalls: 2390 @ 19019 ms<CR><LF>
    time: 968us, calls: 1188, totcalls: 3578 @ 19848 ms<CR><LF>
    time: 968us, calls: 1188, totcalls: 4766 @ 20677 ms<CR><LF>
    time: 968us, calls: 1189, totcalls: 5955 @ 28496 ms<CR><LF>
    time: 769us, calls: 933, totcalls: 6888 @ 31232 ms<CR><LF>
    time: 966us, calls: 1188, totcalls: 8076 @ 32061 ms<CR><LF>
    time: 968us, calls: 1189, totcalls: 9265 @ 32890 ms<CR><LF>
    time: 967us, calls: 1188, totcalls: 10453 @ 33719 ms<CR><LF>
    time: 968us, calls: 1186, totcalls: 11639 @ 47214 ms<CR><LF>
    time: 967us, calls: 1188, totcalls: 12827 @ 48043 ms<CR><LF>

    Here is one part where the total time between lines was 16 s...
    time: 967us, calls: 1183, totcalls: 431397 @ 2128063 ms<CR><LF>
    time: 967us, calls: 1184, totcalls: 432581 @ 2144498 ms<CR><LF>

    Below is the "accelerated test loop". As you can see, the test is made every ~5 ms, and the spurious-call tracer shows that the spurious calls last only ~1000 us, so there is time for them to end before a new test is launched (from the post above one can see that the spurious calls start right after the test). Also, based on the ms timestamps, the spurious calls happen much more rarely than the tests are made, from ~1 s up to even 16 s apart.

    while( 1 )
    {
        WOS_vDelayMs( 5U );

        #include "sl_api.h"
        static boolean bOnce = FALSE;
        static uint8 u8Expected = 0U;

        WOS_vCsEnter(); // disables IRQ

        if( !bOnce )
        {
            u8Expected = *vimRAMParLoc;
            bOnce = TRUE;
        }
        if( *vimRAMParLoc != u8Expected )
        {
            DBG_PRINT( "Pre-test parity bit: 0x%x\r\n", *vimRAMParLoc );
        }
        if( !SL_SelfTest_VIM( VIM_SRAM_PARITY_TEST ) )
        {
            DBG_PRINT( "Parity test fail\r\n" );
        }
        if( *vimRAMParLoc != u8Expected )
        {
            DBG_PRINT( "Post-test parity bit: 0x%x\r\n", *vimRAMParLoc );
        }
        WOS_vCsExit(); // enables IRQ
    }


    What comes to that "parity RAM access error" mentioned in the beginning of the post: this kind of thing happened after this "accelerated code" had been running for at least 224 s (last timestamp print). Because the "PARFLG or ESM REG" print comes first, it indicates that the test was started with the parity RAM in an OK state, i.e. the post-test value of the previous test was also OK. But after the failure print, the post-test check complains as well. This means that before the error injection inside SafeTI, either the RAM access did not happen (opening PARCTL failed) or the parity bit was not flipped; then, when at the end of the test the RAM is opened again and the bit is flipped again, the parity RAM is put into the wrong state, the post-test check fails, and all the tests after that start to fail too...
    time: 967, calls: 1185, totcalls: 452471 @ 2242578 ms<CR><LF>
    time: 967, calls: 1182, totcalls: 453653 @ 2243407 ms<CR><LF>
    time: 967, calls: 1185, totcalls: 454838 @ 2244236 ms<CR><LF>
    time: 967, calls: 1184, totcalls: 456022 @ 2245065 ms<CR><LF>
    PARFLG or ESM REG: 0x0 | 0x0<CR><LF>
    Parity test fail<CR><LF>
    Post-test parity bit: 0x1<CR><LF>
    Pre-test parity bit: 0x1<CR><LF>
    PARFLG or ESM REG: 0x0 | 0x0<CR><LF>
    Parity test fail<CR><LF>
    Post-test parity bit: 0x1<CR><LF>
    Pre-test parity bit: 0x1<CR><LF>
    PARFLG or ESM REG: 0x0 | 0x0<CR><LF>
    Parity test fail<CR><LF>
    Post-test parity bit: 0x1<CR><LF>
    Pre-test parity bit: 0x1<CR><LF>
    PARFLG or ESM REG: 0x0 | 0x0<CR><LF>
    Parity test fail<CR><LF>


    And here is the source for that error print inside SafeTI...
    if (((sl_vimParREG->PARFLG & VIM_PAR_ERR_FLG) == 0U) ||
        ((sl_esmREG->SR1[0U] & GET_ESM_BIT_NUM(ESM_G1ERR_VIM_PARITY_CORRERR)) == 0U))
    {
        /* VIM RAM parity error was not flagged to ESM. */
        DBG_PRINT( "PARFLG or ESM REG: 0x%x | 0x%x\r\n", sl_vimParREG->PARFLG, sl_esmREG->SR1[0U] );
        retVal = FALSE;
    }


    I will also add a check inside SafeTI to see whether it is the PARCTL write or the parity bit flip which fails, since with the current setup that actual test failure also arrives sooner or later (no need to wait for hours)...

    Question: is the broken-CPUs scenario a likely reason, based on the opening post and this post? I strongly think so, since the same binary file works on my colleagues' CPUs with no extra printing (those lines starting with "time ..."), just as it should be, since the parity error handler should not get called at all...

    Question: if that is the likely reason, how can we detect individual faulty CPUs? With a 12 s test period it took 6.5 days until a real SafeTI-test-based diagnostics error appeared; 6.5 days at one test per 12 s is ~47,000 tests, and the official test period would be 1 h, so if the probability stays the same, the VIM RAM parity test would fail only after ~47,000 h of running, i.e. 5.34 years, causing a spurious trip in the customer's process... Both times are such that a production tester cannot find this kind of failure...
  • It took a while this time, ~75 min, for that RAM access error to appear (~900,000 actual tests):

    time: 967, calls: 1182, totcalls: 636400 @ 4551078 ms<CR><LF>
    time: 967, calls: 1183, totcalls: 637583 @ 4551907 ms<CR><LF>
    After flip fail: 0x1 vs 0x1<CR><LF> // HERE IS THE ROOT CAUSE for SafeTI test fail
    PARFLG or ESM REG: 0x0 | 0x0<CR><LF>
    Parity test fail<CR><LF>
    Post-test parity bit: 0x0<CR><LF>
    Pre-test parity bit: 0x0<CR><LF>
    PARFLG or ESM REG: 0x0 | 0x0<CR><LF>
    Parity test fail<CR><LF>
    Post-test parity bit: 0x0<CR><LF>

    The real SafeTI test failure looks to relate to a problem when writing to the VIM RAM, since the "After flip fail" print came in this error case and we didn't receive the pre-test print, so the PARCTL register did change its state when written.

    /* Disable esm error influence */
    sl_esmREG->DEPAPR1 = GET_ESM_BIT_NUM(ESM_G1ERR_VIM_PARITY_CORRERR);

    uint8 u8Value = *vimRAMParLoc;
    /* Enable parity test mode */
    /*SAFETYMCUSW 9 S MR: 12.2 <APPROVED> Comment_10*/
    /*SAFETYMCUSW 134 S MR: 12.2 <APPROVED> Comment_5*/
    BIT_SET(sl_vimParREG->PARCTL, VIM_TEST_MODE);

    if( sl_vimParREG->PARCTL == regBackupPCR )
    {
        DBG_PRINT( "PARCTL fail: 0x%x vs 0x%x\r\n", sl_vimParREG->PARCTL, regBackupPCR );
    }

    /* Flip a bit in VIM RAM parity location */
    /*SAFETYMCUSW 9 S MR: 12.2 <APPROVED> Comment_10*/
    BIT_FLIP((*vimRAMParLoc), 0x1U);

    if( *vimRAMParLoc == u8Value )
    {
        DBG_PRINT( "After flip fail: 0x%x vs 0x%x\r\n", *vimRAMParLoc, u8Value );
    }

    So there are VIM & VIM RAM related problems in both of my CPUs (assuming case 2 is a VIM problem (since the VIM should never load that vector while it is running) and case 1 a VIM RAM problem):
    1) The address inserted into the FBPARERR register 'sometimes' gets called (637,583 times total in 75 min in this case) even though PARFLG == 0 when entering the error function called by the VIM, and once it is called it will be called multiple times in a row (900-1200 times in a row, lasting ~770-1000 us per case; actually only 2 values were seen which differ by more than a couple of us, ~770 us & ~970 us)
    2) The VIM RAM bit flip fails very seldom, causing the actual SafeTI test to fail
  • Just one more update: things look to be going even weirder than before, since now one of the failing CPUs has started to work...

    Today I did other code development with one of the 2 failing CPUs, which basically meant that the whole CPU flash was re-flashed, since that other development contains a lot of changes. Now, when restoring the "accelerated" binary (which also printed how many of those parity error calls had happened), I cannot see anything anymore, so the situation is pretty much the same as when my colleagues downloaded the binary onto their CPUs.

    Here is the failing CPU output from the unit which has not been re-flashed today:
    FW CRC: matches: 0x27ffc66b<CR><LF>
    time: 0, calls: 1, totcalls: 1 @ 6387 ms<CR><LF>
    time: 967, calls: 1199, totcalls: 1200 @ 8122 ms<CR><LF>
    time: 967, calls: 1188, totcalls: 2388 @ 9857 ms<CR><LF>
    time: 967, calls: 1189, totcalls: 3577 @ 11592 ms<CR><LF>

    And here is the output from the no-longer-failing CPU (no extra prints, meaning those parity error calls from the VIM do not come):
    FW CRC: matches: 0x27ffc66b<CR><LF>
    And as you can see, the CRC of the binary is the same in both (we calculate the CRC with IAR tools, inject that value into the firmware, and check it inside the firmware before starting to run the application), so the firmware is the same; except that now, after playing with other code in between, the error does not appear anymore on that one unit.

    We are using a Segger J-Link & Segger drivers to flash, with the option to flash only changed content, but that shouldn't cause this kind of problem.

    I still have 1 "failing" CPU unit, which I even re-programmed this morning (before starting that other development with the other CPU) to be sure that it had my latest debug code and that it still kept failing (though as explained, the Segger programming most likely didn't change anything, or made only a minor change). I have now archived this failing unit, since if it is re-flashed it may start working too :). The failure occurs with and without the debugger, so debugging does not cause the problems.

    Still, re-flashing shouldn't be able to fix the VIM RAM access nor the VIM parity FBPARERR calling logic, but it still looked to do exactly that for 1 of my CPUs; or else some other magic happened during this day which fixed (or hid) the problems...
  • Hi,

    Let me describe when a VIM parity error occurs and how HALCoGen handles it.

    A VIM parity error occurs when an interrupt occurs and the CPU, trying to fetch the corresponding ISR address, finds a parity error in the VIM RAM location. The ISR address register (IRQVECREG) in the VIM is automatically updated with the fallback address and the PC branches to this address.

    To exit from this fallback routine in a clean manner, you need to do the following:

    • clear the parity error in the VIM RAM.
      • HALCoGen re-writes the ISR address in the VIM RAM so that the parity error gets cleared
    • acknowledge the interrupt that caused the parity error.
      • For acknowledging the interrupt, the actual ISR needs to be invoked. Once you come out of the parity handler, IRQVECREG will not get updated with the correct ISR address (even though you rewrote the address in the VIM RAM). The register will still hold the fallback address. Since the interrupt is not acknowledged, it will again branch to the fallback routine. This may be the reason you are finding multiple calls to the parity handler.
      • What HALCoGen does is disable and enable the interrupt line, so that IRQVECREG gets updated with the correct ISR address. So, immediately after returning from the parity handler, the PC will branch to the correct ISR. Since it is not possible to disable the ESM high interrupt, we have provided the ESM handler as part of the parity handler.

    To answer your question: the reason for the multiple calls to the parity handler routine may be that the actual interrupt that caused the parity error was not cleared properly. You can either clear the interrupt in the parity handler (you would lose an interrupt) or disable and enable the interrupt so that it is triggered again.

    Thanks and Regards,

    Veena

  • Hi,

    Thanks for the input. Based on your information I managed to create a situation where FBPARERR keeps being called if I only acknowledge PAR_FLG, so obviously acknowledging it is not enough. But that still does not solve my original problems: the parity error handler gets called when it should not be called, and the VIM RAM bit flips fail...

    I am guessing that the VIM also somehow "repairs itself" even if PAR_FLG is just reset, but it requires something specific to happen which is not documented anywhere, since based on my code those FBPARERR-address calls stop at some point if the parity error is in an IRQ vector. I also tested that if you inject an error into the ESM HIGH vector and only clear PAR_FLG in the error handler, the VIM keeps calling the FBPARERR address forever, so the code never returns to the application...

    Also please note that the only VIM vector whose parity has actually been broken is at address 0xFFF82008 (data sheet table 6-31 says it is reserved, so the VIM should never try to load anything from it), and that manipulation is done by the SafeTI library to a vector which the VIM should never use. Also, the VIM RAM is PBISTed in the beginning, so everything should be fine...

    So based on your input, this TRM sentence is at least inaccurate, since clearing PARFLG basically does not have any effect on IRQVECREQ nor FIQVECREQ, while the sentence claims that after clearing PAR_FLG the FBPARERR value is no longer provided to the VIC:
    "The value provided to the VIC port will also reflect FBPARERR until the PARFLG register has been cleared"

    And here it is also claimed that clearing PAR_FLG would be enough, which does not look to be the case based on your input and my re-testing. Initially this approach looked to work, since I didn't print anything in the parity error handler and the calls look to stop after a while.
    e2e.ti.com/.../1131734

    Question: Suppose I am doing SafeTI VIM RAM parity testing (interrupts disabled during the test), and while the parity is broken and the vector is read manually to generate the desired parity error, some real interrupt (say RTI, for example) arrives at the VIM at the same time and/or after PAR_FLG rises. Could that cause the original situation, i.e. that the VIM stops loading the vectors into IRQVECREQ and keeps FBPARERR in IRQVECREQ even though the SafeTI code clears PAR_FLG manually before exiting the function? That still would not explain why exactly the same binary works differently on different CPUs, with one generating no FBPARERR address calls and the other calling them many times...


    I also addressed the problem in the HALCoGen function in the posts above. It repairs the vectors by writing the value from a const array (s_vim_init):
    /* Correct the corrupted location */
    vimRAM->ISR[error_channel] = s_vim_init[error_channel];

    If you use the SafeTI library, your SL_ESM_Init() function changes the VIM RAM content (both ESM vectors), and if you use the vimChannelMap() function the same happens. In both cases the HALCoGen function restores the wrong vector to RAM; it repairs the parity but breaks the vectors - not good...
    - That being said, at least I cannot use the HALCoGen function; it does not even have USER CODE sections for modifying its behavior.

    The next problem comes with SafeTI and possible ESM HIGH vector corruption: I cannot use the HALCoGen function, since it calls ESM functions inside HALCoGen which are not used when using SafeTI. You cannot do this either, since it looks like the code cannot return from pfEsmHigh correctly; pfEsmHigh keeps getting called infinitely, since PC==0x24794, which is the instruction that calls that pfEsmHigh() address (BLX R0); when entering the function LR is 0x24798, and when exiting the function PC is set to LR-4 == 0x24794, which is the same line that calls that function pointer...
                        if( u32Vec == 0U )
                        {
                            // NOTE: vector 0 is in slot 1 in the VIM RAM
                            //t_isrFuncPTR pfEsmHigh = vimRAM->ISR[ 1 ];

                            vimREG->INTREQ0 = BIT_n( u32Vec ); // manually ack it
                            pfEsmHigh();    // NOTE: cannot call since this function is declared as __fiq, does not return properly
                       }

    Basically what could be done is this, but that requires changing the declarations of the SafeTI functions from static to "extern" in order to reach them, so it is not a tempting option either:
                        // if ESM high, it cannot be disabled/enabled, must call from here
                        if( u32Vec == 0U )
                        {
                            DBG_PRINT( "ESM HIGH\r\n" );
                            // NOTE: vector 0 is in slot 1 in the VIM RAM
                            //t_isrFuncPTR pfEsmHigh = vimRAM->ISR[ 1 ]; /*lint !e9078 !e923 */ /* r11.4 & r11.6 PD */

                            vimREG->INTREQ0 = BIT_n( u32Vec ); // manually ack it
                            //pfEsmHigh();    // NOTE: cannot call since this function is "FIQ", does not return properly

                            u32Vec = esmREG->IOFFHR;

                            if( u32Vec != 0U ) // Real ESM error
                            {
                                u32Vec--; // normalize

                                extern void esmGroup1Handler(uint8 esmChannel);
                                extern void esmGroup2Handler(uint8 esmChannel);
                                if( u32Vec < 32U )
                                {
                                    esmGroup1Handler(u32Vec);
                                }
                                else if( u32Vec < 64U )
                                {
                                    esmGroup2Handler(u32Vec-32U);
                                }
                                else if( u32Vec < 96U )
                                {
                                    esmGroup1Handler(u32Vec-32U);
                                }
                                else
                                {
                                    esmGroup2Handler(u32Vec-64U);
                                }
                            }
                            else
                            {
                                DBG_PRINT( "ESM vector was 0\r\n" );
                            }
                        }

    I also tested that FIQVECREQ changes from the FBPARERR address to the ESM HIGH vector address when the IOFFHR register is read, so basically if that ESM HIGH exception is not handled here in the parity handler, it will be lost forever (if it is a group 2 error, since reading IOFFHR resets the SR[1] register)...

    So I decided to modify the SafeTI FIQ handler so that the actual content of the function is wrapped; now that function can also be called from the FBPARERR function. The other option would be halting the CPU in case of an ESM HIGH error, which would avoid any need to touch the SafeTI code.
    static void sl_esm_high_intr_handler(void)
    {
        extern void sl_esm_high_handler(void);
        sl_esm_high_handler();  // JSI 30.10.2017: use wrapper to able to use that also from VIM RAM parity error handler
    }


    So the parity error handler function now looks like this (still not sure whether it is the most optimal/correct):
    [code]
    __irq
    static void vVimParityErrorHandler( void )
    {
        typedef volatile struct vimRam
        {
            t_isrFuncPTR ISR[VIM_CHANNELS];
        } vimRAM_t;

        #define vimRAM ((vimRAM_t *)0xFFF82000U)

        // If parity error is not handled properly, VIM will keep re-triggering this function since vectors in IRQVECREQ and FIQVECREQ are not updated
        if( VIM_PARFLG == 0U )
        {
            DBG_PRINT( "===VIM_PARFLG: 0x%x @ %u us ===\r\n", VIM_PARFLG, HAL_u32TimeGet() );
        }
        else
        {
            /* Identify the corrupted address */
            uint32 u32ErrorAddr = VIM_ADDERR;
            /* Identify the channel number */
            uint32 u32ErrorChannel = (uint32)((u32ErrorAddr & 0x1FFU) >> 2U);
            /* Clear Parity Error Flag */
            VIM_PARFLG = 1U;

            if( u32ErrorChannel < VIM_CHANNELS )
            {
                (void)vimRAM->ISR[ u32ErrorChannel ]; /*lint !e9078 !e923 */ /* r11.4 & r11.6 PD */ // re-read same vector

                if( VIM_PARFLG != 0U )  // test if parity error is still present
                {
                    DBG_PRINT_PANIC( "\r\n\r\n===VIM RAM PARITY ERROR - permanent error: %u ===\r\n\r\n", u32ErrorChannel );
                    WOS_vInfiniteLoop();
                }
                else
                {
                    /* temporary error, acknowledge of the parflg was enough */
                    DBG_PRINT( "\r\n===VIM RAM PARITY ERROR - channel: %u @ %u us ===\r\n", u32ErrorChannel, HAL_u32TimeGet() );
                    // Mimic what HALCoGen does - it is needed in order to get VIM download new vectors IRQVECREQ & FIQVECREQ
                    uint32 u32Vec = 0U;
                    /* Disable and enable the highest priority pending channel to re-trigger the actual interrupt */
                    if (vimREG->FIQINDEX != 0U)
                    {
                        u32Vec = vimREG->FIQINDEX;
                    }
                    else
                    {
                        u32Vec = vimREG->IRQINDEX;
                    }

                    // Check that it is real vector (not just generated error)
                    if( u32Vec != 0U )
                    {
                        u32Vec--; // normalize to start from 0
                        // if ESM high, it cannot be disabled/enabled, must call from here
                        if( u32Vec == 0U )
                        {
                            vimREG->INTREQ0 = BIT_n( u32Vec ); // NOTE: not needed
                            extern void sl_esm_high_handler(void);
                            sl_esm_high_handler();
                            // NOTE: vector 0 is in slot 1 in the VIM RAM
                            //t_isrFuncPTR pfEsmHigh = vimRAM->ISR[ 1 ]; /*lint !e9078 !e923 */ /* r11.4 & r11.6 PD */
                            //pfEsmHigh();    // NOTE: cannot call since this function is "FIQ", does not return properly
                        }
                        else if( u32Vec < 32U )
                        {
                            vimREG->REQMASKCLR0 = BIT_n( u32Vec );
                            vimREG->REQMASKSET0 = BIT_n( u32Vec );
                        }
                        else if( u32Vec < 64U )
                        {
                            vimREG->REQMASKCLR1 = BIT_n( u32Vec );
                            vimREG->REQMASKSET1 = BIT_n( u32Vec );
                        }
                        else
                        {
                            vimREG->REQMASKCLR2 = BIT_n( u32Vec );
                            vimREG->REQMASKSET2 = BIT_n( u32Vec );
                        }
                    }
                }
            }
            else
            {
                DBG_PRINT_PANIC( "\r\n\r\n===VIM RAM PARITY ERROR - Channel error: %u ===\r\n\r\n", u32ErrorChannel );
                WOS_vInfiniteLoop();
            }
        }
    }
    [/code]

    But neither any of these fixes nor the original HALCoGen parity error handler helps if I run SL_SelfTest_VIM( VIM_SRAM_PARITY_TEST ) in a continuous while(1) loop (interrupts disabled while making the call and then enabled again): the system keeps calling the VIM parity error handler, and after a while (very fast) the VIM RAM parity bit flipping also starts to fail. This indicates that there is some failure, most likely in the VIM...

    If I use a 1 ms (1 tick) OS delay (in practice a 1-2 ms delay depending on when the tick occurs) between the SafeTI function calls, then "this time with this binary" I cannot see any errors: not a single parity error handler call and no parity bit flipping problem (the latter may need some time to appear). That adding a delay between parity error injections appears to "fix" the faulty behavior is another indication that there really is some failure in the VIM.

    Question: Should FIQINDEX / IRQINDEX be read (with interrupts disabled/enabled) if one is active even when PARFLG == 0? With the HALCoGen handler the parity error handling function is not called as often when no delays are used between SafeTI calls. I still think that in this case the function should not be called even once, so it should not matter whether those FIQ/IRQ INDEX registers are read, but one method probably copes better than the other if the VIM goes goofy... So there is some problem that becomes visible if the parity fault is injected and tested continuously in the code, even though the VIM address under test is one the VIM itself should never use...

  • Managed to get those calls also when the test is run from main() without the OS started

    _SL_Restore_IRQ( 0x00U ); // force-enable IRQs
    while( TRUE )
    {
        vVimParityTest( TRUE );
    }

    When printing the status after every 10000 tests it looks like this:
    Tests: 22260000, parityerrors: 13<CR><LF>
    Tests: 22270000, parityerrors: 13<CR><LF>
    Tests: 22280000, parityerrors: 14<CR><LF>
    Tests: 22290000, parityerrors: 14<CR><LF>

    And when printing after every 1000th test (more printing means more interrupts; we use DMA for printing, so only one interrupt per print):
    Tests: 788000, parityerrors: 96<CR><LF>
    Tests: 789000, parityerrors: 96<CR><LF>
    Tests: 790000, parityerrors: 96<CR><LF>
    Tests: 791000, parityerrors: 97<CR><LF>
    Tests: 792000, parityerrors: 98<CR><LF>
    Tests: 793000, parityerrors: 98<CR><LF>
    Tests: 794000, parityerrors: 98<CR><LF>
    Tests: 795000, parityerrors: 99<CR><LF>
    Tests: 796000, parityerrors: 99<CR><LF>
    Tests: 797000, parityerrors: 99<CR><LF>
    Tests: 798000, parityerrors: 100<CR><LF>
    Tests: 799000, parityerrors: 100<CR><LF>

    And the test function itself:
    // NOTE: sl_api.h should be included at file scope, not inside the function body
    #include "sl_api.h"

    void vVimParityTest( boolean bNoOS )
    {
        static boolean bOnce = FALSE;
        static uint8 u8Expected = 0U;
        volatile uint32 irqStatus = 0U;

        u32Tests++;

        if( bNoOS )
        {
            irqStatus = _SL_Disable_IRQ();
        }
        else
        {
            WOS_vCsEnter();
        }

        if( !bOnce )
        {
            u8Expected = *vimRAMParLoc;
            bOnce = TRUE;
        }
        if( *vimRAMParLoc != u8Expected )
        {
            static boolean bOnce1 = FALSE;
            if( !bOnce1 )
            {
                bOnce1 = TRUE;
                DBG_PRINT( "Pre-test parity bit: 0x%x\r\n", *vimRAMParLoc );
            }
        }
        if( !SL_SelfTest_VIM( VIM_SRAM_PARITY_TEST ) )
        {
            static boolean bOnce2 = FALSE;
            if( !bOnce2 )
            {
                bOnce2 = TRUE;
                DBG_PRINT( "Parity test fail\r\n" );
            }
        }
        if( *vimRAMParLoc != u8Expected )
        {
            static boolean bOnce3 = FALSE;
            if( !bOnce3 )
            {
                bOnce3 = TRUE;
                DBG_PRINT( "Post-test parity bit: 0x%x\r\n", *vimRAMParLoc );
            }
        }

        if( bNoOS )
        {
            _SL_Restore_IRQ( irqStatus );
        }
        else
        {
            WOS_vCsExit();
        }

        if( (u32Tests % 1000U) == 0U )
        {
            DBG_PRINT( "Tests: %u, parityerrors: %u\r\n", u32Tests, u32ParityHanderCalls );
        }
    }

    Added this counter to the HALCoGen parity error handler to log every PAR_FLG == 0 entry:
    /* Identify the channel number */
    uint32 error_channel = ((error_addr & 0x1FFU) >> 2U);

    extern uint32 u32ParityHanderCalls;

    if( VIM_PARFLG == 0 )
    {
        u32ParityHanderCalls++;
    }

    if(error_channel >= VIM_CHANNELS)


    So as can be seen, other CPU activity has an impact on the parity error handler calls, meaning that other IRQs arriving simultaneously with the SafeTI VIM parity testing have side effects...
  • Hi Jarkko Silvasti,

    Are you able to fix the issue of frequent parity handler calls with PARFLG = 0? If you follow the disabling and enabling of interrupt lines, the handler takes care of clearing the pending interrupts in a clean way.
    As you mentioned, even though you are running your VIM parity test with interrupts disabled, there is a possibility of some other interrupt occurring meanwhile. The VIM vector address register will still point to the parity handler. But since the handler clears the interrupt in a clean manner, it should not branch to the parity handler again and again. It will be called once (with PARFLG = 1) every time a parity error is detected. Let me know if this is not the case.
    Unfortunately, the statement mentioned in the TRM is not true. Clearing the PARFLG does not reset the vector address register with the actual ISR address. I will report this to the person concerned.
    The parity handler provided as part of HALCoGen is reference code. It is defined as a weak function, so you can redefine it. It assumes that the VIM is initialized with vimInit and is not disturbed afterwards.
    Also, what do you mean by "VIM RAM parity bit flipping fails"? Do you mean the parity bit flip done by the SL_SelfTest_VIM function is not working as expected?

    Thanks and Regards,
    Veena
  • Hi,

    I think I was now able to fix the "frequent"/"repetitive" calls for good, so the parity handler is only called once, and only sometimes, when the SafeTI test is run. Your sentence below does not match my observations 100%, but I think I now understand what happens and why. I also managed to get my latest handler above stuck on a parity error when running the code from main() without the OS...

    Veena Kamath said:
    It will be called once (with PARFLG = 1) every time a parity error is detected.

    - The function indeed may get called when PARFLG=0.

    First the PARFLG=0 call case:

    As mentioned in the previous post, I used the HALCoGen default handler with the following logger added to it:
        if( VIM_PARFLG == 0 )
        {
            u32ParityHanderCalls++;
        }

    And when run overnight, the result this morning was that roughly 1 in 816 tests caused the parity handler to activate (any of the values may also have overflowed):
    Tests: 2749988000, parityerrors: 3367989<CR><LF>

    So obviously the parity handler gets called while PARFLG = 0, but I have now come to the conclusion that this must happen when the VIM registers other interrupts while SafeTI is doing its testing.
    Veena Kamath said:
    The VIM vector address register will still point to the parity handler.

    Since the IRQVECREG or FIQVECREG vector address is left pointing to the parity handler in this case, the PARFLG clearing at the end of the SafeTI test is not enough, just as it is not enough in the parity handler (it is still sufficient when no interrupts arrive) -> that causes one call to the parity handler with PARFLG=0 whenever some interrupt arrives while PARFLG=1 due to SafeTI testing.

    ====> Can you confirm this conclusion?


    Then the case of my own handler getting stuck when used from main() without the OS:
    Since I was not doing anything in the PARFLG=0 case, the code got stuck in a repetitive parity handler call loop. That must be because in the main() use case the only interrupt that arrives is the debug-print related DMA BTC after a whole line has been printed. When the same handler is used in the real application it does not get stuck in the parity handler call loop; it just causes "some" repetitive calls to the handler.

    So the VIM must load IRQVECREG with the proper vector when a new interrupt arrives while PARFLG is 0, even if IRQVECREG contains the parity error handler address. Meaning that the parity error handler must react just like the HALCoGen one also in the PARFLG=0 case as far as the interrupt disable/enable part goes (read whether any interrupt is active and toggle it). What perhaps should not be done in this PARFLG=0 case is reading the VIM ADDERR register and trying to restore that vector (though it probably does no harm?), since there is no parity failure; according to the TRM the content of the register is valid (guessing that the content is still always either 0 or the previously failed address, so it will always be in the "valid range", meaning no array access overflow can happen if one just writes to that vector)...

    So basically the problematic scenario starts like this:
    1. Interrupts disabled (CPU I-bit)
    2. SafeTI starts the test and proceeds far enough that PARFLG=1 (not yet cleared)
    3. Interrupt (IRQ) arrives at the VIM -> VIM sets IRQVECREG to the parity handler address
    4. SafeTI finishes testing, PARFLG = 0
    5. Interrupts enabled (CPU I-bit)
    6. VIM launches IRQVECREG -> calls the parity handler

    Now if the parity handler does not re-enable the active IRQ and no more IRQs arrive, IRQVECREG does not get updated; it does look to be updated if some IRQ arrives, which unlocks the jam (this is also something that cannot be read anywhere).

    So the conclusion is that the interrupt re-enable looks to be always needed in the parity handler, regardless of whether PARFLG is 0 or 1; if it is 0, there is no need to try to repair any VIM vectors...
    ====> Can you confirm this conclusion?

    end of the main() without OS case...

    Veena Kamath said:
    Also, what do you mean by "VIM RAM parity bit flipping fails"? Do you mean the parity bit flip done by the SL_SelfTest_VIM function is not working as expected?

    Yes, it looked like that, and I am practically 100% sure the prints do not lie. At some point the test was not able to flip the bit in the parity RAM, and based on the prints the problem was the actual flipping, not opening the RAM for writing.

    But since it did not happen anymore in the overnight test where at least 2749988000 tests were made, the problem must (hopefully) have been related to the spurious/consecutive parity handler calls breaking something somewhere (even though they shouldn't affect the bit flipping). However, I am still considering disabling this VIM RAM parity test in the runtime tests, since it does not give any FITs, just to be sure that this bit flipping problem, and thus a test failure, can never occur.


    After you confirm those 2 parts, I think this case is handled and I'll mark your answer as verified (and then hope that this parity flip problem does not re-appear).


    Veena Kamath said:
    The parity handler provided as part of HALCoGen is reference code.

    Yes, I understand that, but since there are no comments anywhere on why something is done, it is quite impossible to know which parts are extremely critical to do exactly as in the reference (or at least to achieve similar end results) and which are not. Then throw into the mix a TRM that says something which leads you into a pitfall if you do as it reads, and it is not trivial to rewrite the reference code to suit your own needs :). I consider myself lucky to have gotten this problem visible in my handler (testing the handler with parity bit errors that were fixed inside the handler (for testing purposes) was not enough, since other IRQs make the VIM refresh the vectors once PARFLG is 0 after the fix; the tests should have been made with only one interrupt active :)). If my generic SafeTI test period had been targeted at 1 h instead of 12 s, it might not have become visible until the code had run for a really long time...

    As can be seen, this interrupt re-enable part is extremely critical: without it there will be at least a performance impact, and in the worst case a total gridlock if no new IRQs arrive at the VIM. The vector-fixing part of the code, on the other hand, is something that will most likely corrupt your vectors if a real parity error happens after you have modified vectors in the VIM RAM. So "half" of the code is mandatory, and the other "half" is something that at least I cannot suggest using as-is.


    Here is the function, in case someone is interested. It looks to work as well as the HALCoGen one but does not try to restore the vectors from ROM (it just checks and halts if the failure still exists after the acknowledge), and it should also work in the SafeTI ESM high corruption case, since it uses a wrapper function added under sl_esm.c to work around the __fiq function call return problem.
    [code]
    __irq
    static void vVimParityErrorHandler( void )
    {
        // Mimic what HALCoGen does in this function
        typedef volatile struct vimRam
        {
            t_isrFuncPTR ISR[VIM_CHANNELS];
        } vimRAM_t;

        #define vimRAM ((vimRAM_t *)0xFFF82000U)

        // Address in VIM_ADDERR is valid only if PARFLG = 1, there is also nothing in vectors to repair/check if PARFLG is 0
        if( VIM_PARFLG != 0U )
        {
            /* Identify the corrupted address */
            uint32 u32ErrorAddr = VIM_ADDERR;
            /* Identify the channel number */
            uint32 u32ErrorChannel = (uint32)((u32ErrorAddr & 0x1FFU) >> 2U);
            /* Clear Parity Error Flag */
            VIM_PARFLG = 1U;

            if( u32ErrorChannel < VIM_CHANNELS )
            {
                // NOTE: here could be added vector restore but then SL_ESM_Init() & vimChannelMap() changes to vectors needs to be taken into account
                (void)vimRAM->ISR[ u32ErrorChannel ]; /*lint !e9078 !e923 */ /* r11.4 & r11.6 PD */ // re-read same vector

                if( VIM_PARFLG != 0U )  // test if there is a parity error again
                {
                    DBG_PRINT_PANIC( "\r\n\r\n===VIM RAM PARITY ERROR - permanent error: %u, %u ===\r\n\r\n", u32ErrorChannel, VIM_PARFLG );
                    WOS_vInfiniteLoop();
                }
                else
                {
                    /* temporary error, acknowledge of the parflg was enough */
                    //DBG_PRINT( "\r\n===VIM RAM PARITY ERROR - channel: %u @ %u us ===\r\n", u32ErrorChannel, HAL_u32TimeGet() );
                }
            }
            else
            {
                DBG_PRINT_PANIC( "\r\n\r\n===VIM RAM PARITY ERROR - Channel error: %u ===\r\n\r\n", u32ErrorChannel );
                WOS_vInfiniteLoop();
            }
        }

        // This is ALWAYS needed in order to get the VIM to load new vectors into IRQVECREG & FIQVECREG. Also in the PARFLG=0 case
        // the handler needs to re-enable interrupts, since IRQVECREG is not released until another IRQ arrives while PARFLG is 0.
        uint32 u32Vec = 0U;
        /* Disable and enable the highest priority pending channel to re-trigger the actual interrupt */
        if (vimREG->FIQINDEX != 0U)
        {
            u32Vec = vimREG->FIQINDEX;
        }
        else
        {
            u32Vec = vimREG->IRQINDEX;
        }

        // Check that it is real vector (not just generated error)
        if( u32Vec != 0U )
        {
            u32Vec--; // normalize to start from 0
            // if ESM high, it cannot be disabled/enabled, must call from here
            if( u32Vec == 0U )
            {
                vimREG->INTREQ0 = BIT_n( u32Vec ); // looks like it may not be needed...
                extern void sl_esm_high_handler(void);
                sl_esm_high_handler();
                // NOTE: vector 0 is in slot 1 in the VIM RAM
                /*t_isrFuncPTR pfEsmHigh = vimRAM->ISR[ 1 ];*/ /*lint !e9078 !e923 */ /* r11.4 & r11.6 PD */
                //pfEsmHigh();    // NOTE: cannot call since this function is "FIQ", does not return properly
            }
            else if( u32Vec < 32U )
            {
                vimREG->REQMASKCLR0 = BIT_n( u32Vec );
                vimREG->REQMASKSET0 = BIT_n( u32Vec );
            }
            else if( u32Vec < 64U )
            {
                vimREG->REQMASKCLR1 = BIT_n( u32Vec-32U );
                vimREG->REQMASKSET1 = BIT_n( u32Vec-32U );
            }
            else
            {
                vimREG->REQMASKCLR2 = BIT_n( u32Vec-64U );
                vimREG->REQMASKSET2 = BIT_n( u32Vec-64U );
            }
        }
    }
    [/code]

  • Hi,

    Yes, I believe your observation is right. For some reason an interrupt was generated while the VIM parity test was running. Since interrupts were disabled, neither the ISR nor the parity handler was called. But the VIM parity checker API clears the PARFLG. Once interrupts are enabled, vimParityHandler is immediately called, since IRQVECREG has not been updated with the correct address.
    So I think you should remove the check for PARFLG and always disable and enable the highest-priority pending interrupt in the parity handler.
    Restoring of the correct ISR address may be done only in case PARFLG is set.
    The HALCoGen-provided handler restores the correct address from Flash. I am not sure how that can be handled when the VIM RAM is modified after vimInit. Probably, after all the configuration is done, you could back up the VIM RAM contents to some memory location and restore from there.
    I hope the parity bit flipping fail issue is resolved.
    I understand that the TRM is a bit confusing. It never said whether IRQVECREG will be modified after the PARFLG clear, nor did it say it will retain the previous value. The register is definitely not getting modified when the flag is cleared. I will report this to the spec owner. Thanks for pointing that out.

    Thanks and Regards,
    Veena