RM48L952: Testing ESM Error Handler

Part Number: RM48L952
Other Parts Discussed in Thread: HALCOGEN

I am looking to run some final tests on the ESM configuration we have selected for our application. The tests are intended to inject each fault and then indicate, via our ESM handler, which diagnostic tripped. I have the following diagnostics configured in HALCoGen, each selected based on its use within our application or as a requirement of the governing safety standards.

1: MIBADC2

6 : Flash ECC Single

7: NHET Parity

11: Clock Monitor Int

15: VIM Parity

18: MIBSPI3 Parity

19: MIBADC1 Parity

26: RAM ECC Even

27: CPU Selftest

28: RAM ECC Odd

31: CCM Selftest

35: EEPROM Single

36: EEPROM Double

37: IOMM Mux Config

All of these ESM channels are configured to drive nError low and to trigger a low-level interrupt.

Our application and board I/O architecture ANDs all safety outputs with the nError pin, so an ESM error de-energizes all safety outputs.

My verification of the ESM configuration uses the self-test routines in test builds to inject each ESM fault after the startup checks complete. The expectation is that my ESM handler runs and the failure is handled, showing that the faults we have identified can be captured and handled appropriately by the application.

The interrupt handlers are shown below:

#pragma WEAK(esmGroup1Notification)
void esmGroup1Notification(uint32 channel)
{
/*  enter user code between the USER CODE BEGIN and USER CODE END. */
/* USER CODE BEGIN (1) */
    /* Save ESM Channel c:000.SYS.A.115.v7 */
    uni_cfg_set_esm(0U,channel);
/* USER CODE END */
}

/* USER CODE BEGIN (2) */
/* USER CODE END */
#pragma WEAK(esmGroup2Notification)
void esmGroup2Notification(uint32 channel)
{
/*  enter user code between the USER CODE BEGIN and USER CODE END. */
/* USER CODE BEGIN (3) */
    /* Save ESM Channel c:000.SYS.A.115.v7 */
    uni_cfg_set_esm(1U,channel);
/* USER CODE END */
}
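A minimal sketch of one way to keep these callbacks short, assuming the FEE write can be deferred: the callback only latches the first (group, channel) pair into RAM and the main loop commits it to FEE afterwards. `esm_latch_t`, `esm_latch_event`, `esm_latch_peek` and `esm_latch_ack` are hypothetical names, not HALCoGen or FEE APIs; on target, `esm_latch_event(0U, channel)` would take the place of the `uni_cfg_set_esm` call.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical RAM latch for the first ESM event seen this power cycle. */
typedef struct {
    volatile bool     pending;      /* set by the ESM callback, cleared by the main loop */
    volatile uint8_t  group;        /* 0U = group 1, 1U = group 2 (as passed to uni_cfg_set_esm) */
    volatile uint8_t  channel;      /* ESM channel number from the notification */
    volatile uint16_t extra_events; /* further events while one was pending (saturating) */
} esm_latch_t;

static esm_latch_t g_esm_latch;

/* Called from esmGroup1Notification()/esmGroup2Notification():
 * constant time, no flash access from interrupt context. */
void esm_latch_event(uint8_t group, uint8_t channel)
{
    if (!g_esm_latch.pending) {
        g_esm_latch.group   = group;
        g_esm_latch.channel = channel;
        g_esm_latch.pending = true;    /* publish last, after the data */
    } else if (g_esm_latch.extra_events < UINT16_MAX) {
        g_esm_latch.extra_events++;
    }
}

/* Main loop: copy out the pending event, if any. */
bool esm_latch_peek(uint8_t *group, uint8_t *channel)
{
    if (!g_esm_latch.pending) {
        return false;
    }
    *group   = g_esm_latch.group;
    *channel = g_esm_latch.channel;
    return true;
}

/* Main loop: call only after the FEE write of the peeked event succeeded. */
void esm_latch_ack(void)
{
    g_esm_latch.pending = false;
}
```

While an event is pending, later events only increment a counter, so the root-cause channel is not overwritten by faults that cascade from it.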

The uni_cfg_set_esm routine saves the channel number to FEE. On subsequent power-ups this value can be read back before it is reset, which gives us an opportunity to determine which type of ESM fault occurred. I have tested this successfully for the following channels:

1,7,11,15,18,19,26,10,6,35

I have the following questions before I can consider this verification exercise complete and would appreciate input from TI:

1. Please confirm which clock the MCU reverts to when the clock monitor diagnostic fails: is it the LPO?

2. Please confirm that both checkRAMUERRTest() and checkFlashEEPROMECC() are expected to trigger ESM channel 6 faults.

3. Is there a method to inject RAM ECC odd parity faults (channel 27)?

4. Is there a method to inject EEPROM double bit faults (channel 36)?

5. When I run cpuSelfTest() from my application loop the processor effectively hangs, and a number of posts I have read indicate this is expected. Is it true that on a genuine run-time CPU error I cannot expect my application to get the opportunity to execute, and therefore cannot attempt to record the channel ID to FEE?

6. Similarly to 5, is it true that the CCM self-test routines will not permit my application to capture and record the channel ID to FEE?

7. nError will be driven low in every case of an ESM error and remain low until the EKR register is cleared. I have captured this behavior successfully for channels 1,7,11,15,18,19,26,10,6,35, but for the other channels I asked about above I have not been able to show that nError is held low.

I appreciate that these tests already run in the startup code, but for my own verification I would like to see each one, at least once on the bench, trigger the appropriate response on nError and be recorded to FEE. If there is a better approach to recording which ESM error triggered the fault for future debug, I would also be interested to know.
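One possible recording approach, sketched under assumptions: store a small record per FEE block with a marker byte and a check byte, so a power-up read that returns garbage is rejected instead of being interpreted as a channel number. The record layout, the `ESM_REC_MAGIC` value and the helper names are hypothetical. The FEE block size configured in HALCoGen would need to match `sizeof(esm_record_t)` (5 bytes here).

```c
#include <stdbool.h>
#include <stdint.h>

#define ESM_REC_MAGIC 0xA5u /* hypothetical "written by us" marker */

/* Hypothetical 5-byte record stored in one FEE block. */
typedef struct {
    uint8_t magic;   /* ESM_REC_MAGIC once the record has been sealed */
    uint8_t group;   /* 0 = ESM group 1, 1 = ESM group 2 */
    uint8_t channel; /* channel of the FIRST fault recorded */
    uint8_t count;   /* number of faults recorded, saturating at 255 */
    uint8_t check;   /* one's complement of the byte sum of the fields above */
} esm_record_t;

static uint8_t esm_record_sum(const esm_record_t *r)
{
    return (uint8_t)(r->magic + r->group + r->channel + r->count);
}

/* Recompute the marker and check byte before writing to FEE. */
void esm_record_seal(esm_record_t *r)
{
    r->magic = ESM_REC_MAGIC;
    r->check = (uint8_t)~esm_record_sum(r);
}

/* Reject records that were never sealed or that read back as garbage. */
bool esm_record_valid(const esm_record_t *r)
{
    return r->magic == ESM_REC_MAGIC &&
           r->check == (uint8_t)~esm_record_sum(r);
}

/* Keep the first fault's identity; later faults only bump the count. */
void esm_record_add(esm_record_t *r, uint8_t group, uint8_t channel)
{
    if (!esm_record_valid(r) || r->count == 0u) {
        r->group   = group;
        r->channel = channel;
        r->count   = 0u;
    }
    if (r->count < UINT8_MAX) {
        r->count++;
    }
    esm_record_seal(r);
}
```

A one-byte sum is a weak check; it only aims to catch obviously invalid reads, not to provide a safety-rated integrity measure.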

Thanks

Jamie


  • 1. Please confirm the clock that the MCU will revert to when the clock monitor diagnostic fails - is it the LPO?

    QJW> Yes, this is the error reported by CLKDET (LPO clock detect).

    2. Confirm that both checkRAMUERRTest() & checkFlashEEPROMECC() would be expected to trigger ESM channel 6 faults

    QJW> checkRAMUERRTest() tests the redundant address decode and compare logic for the SRAM. It uses channel 6 or channel 8 of ESM group 2.
    checkFlashEEPROMECC() checks the Flash EEPROM ECC error detection logic. It uses channel 35 or 36 of ESM group 1.

    3. Is there a method to inject RAM ECC odd parity faults (channel 27)?
    QJW> Do you mean channel 26 or channel 28 of ESM group 1?
    Yes, you can inject single-bit or double-bit errors in both RAM banks. Please refer to the sys_selftest.c generated by HALCoGen, or to SL_SelfTest_SRAM() in the diagnostic library.

    4. Is there a method to inject EEPROM double bit faults (channel 36)?
    QJW> Yes, please refer to checkFlashEEPROMECC() or SL_SelfTest_Flash().
  • 7. nError will be driven low in every case of an ESM error and remain low until the EKR register is cleared. I have captured this behavior successfully for channels 1,7,11,15,18,19,26,10,6,35 but for those other channels that I have questioned above I am not able to show that nError is held low.

    QJW> Any error from the channels of ESM group 2 and ESM group 3 will pull the ERROR pin LOW. The Group1 errors have a configurable ERROR pin behavior.

    BTW, what is FEE in #5 and #6?

  • By FEE I mean the Flash Emulated EEPROM. I store the channel IDs passed to the interrupt handler in block 7 of the emulated EEPROM so I can record which ESM fault occurred.
  • In the screenshot below you can see my typical HALCoGen configuration of the required group 1 channels:

    Taking MIBADC2 as an example, the screenshot below shows the activity I see when running a build with adc2ParityCheck called before my application loop. As expected, my debug output shows the correct ESM channel registered as the fault, and the nError pin remains low after detection.

    Now, if I run a similar build but call checkFlashECC instead, the correct channel is logged as generating the ESM error, but for some reason the nError pin does not remain low. All logic in my application is the same other than calling checkFlashECC in place of adc2ParityCheck. So why does nError go high? It is important for me to understand this, as all production-intent builds assume that any ESM error I have configured will drive nError low and hold it low until the application decides it is OK to release it by clearing the EKR register.

  • One possible cause I considered is a pre-load of the ESM EKR register, but I can't see anything in the self-test routine that does that.

  • Can anyone shed some light on the different nError activity when running different diagnostics?
  • Hello, can anyone tell me why nError does not remain low when I run these self-test routines? Here is an example running the checkRAMEcc() function. My ESM handler correctly catches the channel 26 fault, but nError does not remain low:

    I really need data that verifies, at least once, that each of the diagnostics I expect to be enabled is running, with the appropriate nError activity. HALCoGen is always configured the same way, so I cannot see why some would allow nError to return to the high state. I would appreciate some input on this so I can close out the verification.

  • Hello, I'm running out of time on this and would really like some clarification on the expected nError activity for the self-test routines. As you can see above, in some cases nError clears when I did not expect it to.
  • Hi Jamie,

    Sorry for the delay in responding.

    Group1 channel 26 is for a single-bit error on a RAM access. This error does not drive the nERROR pin low by default. Have you programmed the ESM to drive the nERROR pin low even on a single-bit ECC error on a RAM read?

    Regards,
    Sunil
  • Hello Sunil,

    Yes, I have nError set to be driven low for every group 1 channel listed above. My expectation was for every fault to produce the same nError activity, but some of them are not holding nError low. I can be more specific tomorrow when I'm back in the office, but earlier in this thread I listed all the group 1 faults that I need at least one test for.

    Thanks

    Jamie

  • Hello Sunil,

    My post on Sep 25, 2018 12:04 AM showed how nError behaves differently depending on which self-test routine I execute, even though all my ESM group 1 channels are configured the same way, with a low-level interrupt and nError driven low. I suspect some of the routines are simply pre-loading the EKR register, but I have not been able to confirm this. In some cases I may be using the wrong self-test routine to inject the fault; feedback on what I should be doing would be appreciated.

    To recap, I have selected the following group 1 errors as relevant to my application, and I need to verify each of them at least once on the bench to show that I can catch them and act on them as intended. The main problems are:

    1. nError does not remain low on channels 6 and 26. What is clearing it?

    2. I have not identified the best way to test channels 27, 31 and 37. Please advise what I can use, or direct me to a test method.

    3. Once EEPROM single- and double-bit errors are injected, my application no longer runs as before: all outputs are de-energized due to the nError pin state, and after subsequent power-ups I cannot read the input button that would normally clear the fault status. Is this because I have no recovery on the EEPROM, so the MCU remains stuck in a fault state?

    Diagnostic | Channel | Test routine | Comment
    MIBADC2 | 1 | adc2ParityCheck | nError OK, channel ID captured correctly
    Flash ECC Single | 6 | checkFlashECC | nError goes high, channel ID captured correctly
    NHET Parity | 7 | het1ParityCheck | nError OK, channel ID captured correctly
    VIM RAM | 15 | vimParityCheck | nError OK, channel ID captured correctly
    MIBSPI3 Parity | 18 | mibspi3ParityCheck | nError OK, channel ID captured correctly
    MIBADC1 | 19 | adc1ParityCheck | nError OK, channel ID captured correctly
    RAM ECC Even | 26 | checkRAMEcc | nError goes high, channel ID captured correctly
    CPU SelfTest | 27 | ? | no test identified
    RAM ECC Odd | 28 | checkRamAddrParity | nError OK, channel ID captured correctly
    CCM SelfTest | 31 | ? | no test identified
    EEPROM Single | 35 | modified checkFlashEEPROMEcc | nError OK, but cannot clear errors?
    EEPROM Double | 36 | checkFlashEEPROMEcc | nError OK, but cannot clear errors?
    IOMM | 37 | ? | no test identified
    Clock Monitor | 11 | manual, stop clock | nError OK, channel ID captured correctly
    Dual Clock Compare | 30 | manual, signal generator to alter clock | nError OK, channel ID captured correctly
  • Hi Jamie,

    If you run the flash ECC diagnostic test in c_int00(), the test functions called after checkFlashECC() may write 0x5 to the EKR register.

    The following test functions clear the nERROR pin after the test:
    ccmSelfCheck();
    efcStuckZeroTest();
    fmcECCcheck();
    checkRAMAddrParity();
    checkRAMUERRTest();
    fmcBus1ParityCheck();
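To reason about which test order leaves nERROR low, here is a small host-side model. It is deliberately not a register-accurate model of the ESM: it captures only what is stated in this thread, namely that a configured error drives nERROR low, that writing the 0x5 key to ESMEKR releases it, and that the release happens only after the low-time counter expires. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define EKR_KEY_ERROR_PIN_RESET 0x5u /* key value the listed tests write to ESMEKR */

/* Hypothetical host-side model of nERROR; NOT register-accurate. */
typedef struct {
    bool nerror_low;      /* nERROR pin currently driven low */
    bool ltc_running;     /* ESM low-time counter still counting */
    bool release_pending; /* 0x5 written while the counter was still running */
} nerror_model_t;

/* A configured ESM channel fires: nERROR goes low, low-time counter starts. */
void model_esm_error(nerror_model_t *m)
{
    m->nerror_low      = true;
    m->ltc_running     = true;
    m->release_pending = false;
}

/* Software writes ESMEKR. Only the 0x5 key is modelled here. */
void model_ekr_write(nerror_model_t *m, uint32_t key)
{
    if ((key & 0xFu) != EKR_KEY_ERROR_PIN_RESET || !m->nerror_low) {
        return;
    }
    if (m->ltc_running) {
        m->release_pending = true;   /* released when the counter ends */
    } else {
        m->nerror_low = false;
    }
}

/* The low-time counter completes. */
void model_ltc_expired(nerror_model_t *m)
{
    m->ltc_running = false;
    if (m->release_pending) {
        m->nerror_low      = false;
        m->release_pending = false;
    }
}
```

From this view, an injection routine can only demonstrate "nERROR held low" if none of the routines listed above runs after it in the same boot. It would also explain why nERROR appears to go high only once the low-time counter expires.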

    The sys_selfTest.c has functions for CPU self test and CCMR4F self test:
    cpuSelfTest()
    ccmSelfCheck()

    For IOMM:
    Analog loopback testing from peripherals results in signals traversing the I/O pin mux logic to the I/O pad and can provide diagnostic coverage on the I/O pin mux. Analog loopback tests the signal path from the module to the I/O cell with the output driver enabled.
  • Thank you for this information, I am working through it.

    My approach to IOMM was to make an illegal write to the IOMM registers while in USER mode. This successfully delivered the expected ESM response and nError activity.

    But it has thrown up something else: the behavior of nError seems to depend on whether or not I am in USER mode. I will try to explain, but this is really confusing behavior!

    My application sequence is as follows:

    1. initialize ESM record from FEE

    2. if no ESM faults recorded last power cycle then inject the fault (e.g. ADC2Parity)

    2a. ESM handler will run from interrupt

    2b. ESM handler writes the channel ID to ESM record in FEE (as read in step 1.), nError is driven low and no action is taken by the application to clear the EKR register so it should remain low

    5. If ESM record shows error adopt FAULT state

    5a. permit clearing of ESM record in FEE if nError is "HIGH" i.e. no ESM error this power cycle, otherwise we remain in the FAULT state
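The decision logic of steps 1, 2, 5 and 5a can be sketched as a pure function, which makes it easy to check on a host before wiring it to the real FEE record, nERROR readback and clear button. The type and function names are hypothetical, and the FEE read/write and fault injection themselves are left out.

```c
#include <stdbool.h>

typedef enum { APP_RUN, APP_FAULT } app_state_t;

/* Inputs sampled once per main-loop pass (names are hypothetical). */
typedef struct {
    bool esm_recorded;  /* step 1/2b: FEE record holds an ESM channel */
    bool nerror_high;   /* nERROR reads high: no ESM error this power cycle */
    bool clear_request; /* operator input asking to clear the fault */
} app_inputs_t;

/* Step 2 (test builds only): inject a fault only if none was recorded last cycle. */
bool app_should_inject(const app_inputs_t *in)
{
    return !in->esm_recorded;
}

/* Steps 5/5a: FAULT while a record exists; the record may only be cleared
 * when nERROR is high and the operator asks for it. */
app_state_t app_step(const app_inputs_t *in, bool *clear_record)
{
    *clear_record = false;
    if (!in->esm_recorded) {
        return APP_RUN;
    }
    if (in->clear_request && in->nerror_high) {
        *clear_record = true;   /* caller wipes the FEE record (privileged mode) */
        return APP_RUN;
    }
    return APP_FAULT;
}
```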

    Step 2 is a local function call. Within that call I switch the core to USER mode after injecting the error and before entering my main loop. I do this on the understanding that my application should run in USER mode unless I have good reason not to. When I make the transition to USER mode inside this local function, the nError activity is as I expected: it remains low when we inject the fault.

    If I move the instruction that enters USER mode to just after the local function call, but before entering my main for loop and state machine, the nError activity is not correct: it goes high again after the low-time counter expires. It is as if the location of the instruction that enters USER mode somehow determines whether nError is cleared or remains low. I cannot find any reference to USER mode affecting the EKR register in a way that would drive nError high again, and I certainly don't understand why moving the instruction from inside the local function to the statement immediately after it changes the behavior so significantly. Could it be something to do with interrupt sequencing, or could something else about entering USER mode affect the EKR register status and thus release the nError pin?

    For reference, I use the simple asm( " CPS 30x10"); instruction to get in to USER mode.

  • Hello,

    The system mode (0x1F) is the default mode after boot-up. If an IRQ interrupt occurs, the code jumps to the ISR and the mode changes to IRQ mode (0x12); after returning from the ISR, the mode changes back to system mode (or whatever the original mode was). Changing mode dynamically inside an IRQ ISR might result in unwanted behavior.

    Do you switch to User mode in ISR in your Step 2?

    BTW, asm(" CPS 30x10") should be asm(" CPS #0x10")
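For reference, the mode numbers mentioned above live in bits [4:0] of the CPSR/SPSR. The helpers below (hypothetical names) show the clear-then-set of that field that `CPS #0x10` performs on the CPSR. Note that CPS has no effect when executed in User mode, so returning to a privileged mode requires an exception such as SVC.

```c
#include <stdbool.h>
#include <stdint.h>

/* ARMv7-R processor modes, CPSR/SPSR bits [4:0]. */
#define PSR_MODE_MASK 0x1Fu
#define PSR_MODE_USR  0x10u
#define PSR_MODE_FIQ  0x11u
#define PSR_MODE_IRQ  0x12u
#define PSR_MODE_SVC  0x13u
#define PSR_MODE_ABT  0x17u
#define PSR_MODE_UND  0x1Bu
#define PSR_MODE_SYS  0x1Fu

static inline uint32_t psr_mode(uint32_t psr)
{
    return psr & PSR_MODE_MASK;
}

/* Clear the mode field, then set the new mode (what CPS #mode does to CPSR). */
static inline uint32_t psr_with_mode(uint32_t psr, uint32_t mode)
{
    return (psr & ~PSR_MODE_MASK) | (mode & PSR_MODE_MASK);
}

/* Every mode except USR is privileged (e.g. may write ESMEKR). */
static inline bool psr_is_privileged(uint32_t psr)
{
    return psr_mode(psr) != PSR_MODE_USR;
}
```

On target the switch itself is still the single instruction `asm(" CPS #0x10");`. These helpers are only useful for decoding a captured SPSR value (for example SPSR_UND) while debugging.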
  • The ESMEKR register can not be written in user mode. But switching the mode from privileged mode to user mode should not impact the value in EKR register.
  • Hello,

    I assume you have solved the problem. Switching mode doesn't impact the value of ESMEKR, but switching mode in an ISR may result in unwanted behavior.
  • I still have some unexplained behavior that I think is because I am trying to write to EEPROM while in USER mode. I do this to reset the stored ESM channel numbers. Should I be doing something before any calls to the F021 Flash API driver if I am in USER mode?
  • The flash module (program flash and data flash EEPROM) control registers can only be written by the CPU while in privileged mode. The FEE driver and F021 flash API must be run in a privileged mode (a mode other than user) to allow access to the flash memory control registers.

    You can keep the default mode (system mode) until you finish writing the ESM status to EEPROM, then switch to user mode.
  • I thought there was something strange about how I interface with the FEE driver. I now see that the FEE writes that reset my ESM record succeed when the CPU is not in USER mode, but are incomplete/unsuccessful when I run them in USER mode. Here is one such function that writes successfully when not in USER mode but is incomplete/unsuccessful once the application has moved to USER mode.

    void uni_cfg_esm_upd(void)
    {
        if(uni_cfg_esm_write && (IDLE == TI_Fee_GetStatus(0)))
        {
            uni_cfg_packdata(uni_cfg_blk2_params, UNI_CFG_NUMPARAMS_BLK2);
            TI_Fee_WriteSync(2,fee_databuffer);
            uni_cfg_esm_write = FALSE;
        }
    }/* end uni_cfg_esm_upd */

    At 1-second intervals I read back the block. In USER mode the reads do not indicate success, i.e. the value I tried to write comes back wrong; when not in USER mode it is always successful. So something interrupts or prevents a successful read/write in USER mode.

    Should I be doing something specific when interfacing with the Ti_FEE_Read/Write library functions to first move to a privileged mode?

    Definitely closing in on a root cause here.

  • So, moving to USER mode earlier, when I initialize all global configuration data from EEPROM, it is clear that the FEE reads fail. If I enter USER mode before the initialization routine I basically get garbage out of EEPROM; if I move to USER mode after the initialization from EEPROM, my configuration data is accurate. Could it be that RTI interrupts, in conjunction with being in USER mode, are corrupting my reads? Each block is initialized with a construct similar to this:

    TI_FeeModuleStatusType Status_l;
    
        TI_Fee_Read(5,0,(uint8*)fee_databuffer,UNI_CFG_SIZE_BLK5);
        do
        {
            TI_Fee_MainFunction();
            delay();
            Status_l=TI_Fee_GetStatus(0);
        }
        while(Status_l!=IDLE);
        uni_cfg_unpackdata(uni_cfg_blk5_params, UNI_CFG_NUMPARAMS_BLK5);
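The do/while above never exits if the driver does not return to IDLE, and the unpack then runs on whatever is in the buffer. Below is a minimal bounded alternative, assuming a caller-chosen poll limit. `fee_wait_idle`, `fee_poll_fn` and the `sim_*` stub are hypothetical names; the stub only exists to exercise the helper off-target.

```c
#include <stdbool.h>
#include <stdint.h>

/* One driver pass; returns true once the driver reports IDLE. On target this
 * might be:
 *   static bool fee_poll_once(void)
 *   { TI_Fee_MainFunction(); delay(); return TI_Fee_GetStatus(0) == IDLE; }
 */
typedef bool (*fee_poll_fn)(void);

/* Bounded replacement for the polling loop: false means "timed out",
 * so the caller can keep defaults instead of unpacking stale data. */
bool fee_wait_idle(fee_poll_fn poll_once, uint32_t max_polls)
{
    for (uint32_t i = 0u; i < max_polls; i++) {
        if (poll_once()) {
            return true;
        }
    }
    return false;
}

/* Host-side stub for trying the helper off-target; not a TI API. */
static uint32_t sim_busy_polls;
static bool sim_poll_once(void)
{
    if (sim_busy_polls > 0u) {
        sim_busy_polls--;
        return false;
    }
    return true;
}
```

With this helper, the unpack call would only run when `fee_wait_idle(fee_poll_once, N)` returns true. A suitable N depends on block size and the delay used and is not given in this thread.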

    The unpack function is as follows:

    void uni_cfg_unpackdata(struct fee_data_block* dbp, uint8 numparams)
    {
        char *p;
    
        p = &fee_databuffer[0];
        /* For each parameter in the data block */
        for(uint8 i=0;i< numparams; i++)
        {
            /* Copy each byte and advance the data buffer pointer */
            for(uint8 j=0;j < dbp[i].size;j++)
            {
                *(dbp[i].address + j) = *p;
                p++;
            }
        }
    }/* end uni_cfg_unpackdata */
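The unpack routine trusts that the parameter table fits inside fee_databuffer. A bounds check turns a table/buffer size mismatch into a reported failure instead of an out-of-bounds read. This does not address the USER-mode read problem itself. `struct fee_param` is a hypothetical mirror of `struct fee_data_block`, assuming `address` is a byte pointer and `size` a byte count.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of struct fee_data_block: destination and byte count. */
struct fee_param {
    uint8_t *address;
    uint8_t  size;
};

/* Same copy as uni_cfg_unpackdata(), but refuses to read past buf_len. */
bool unpack_checked(const struct fee_param *params, uint8_t numparams,
                    const uint8_t *buf, size_t buf_len)
{
    size_t total = 0u;
    for (uint8_t i = 0u; i < numparams; i++) {
        total += params[i].size;
    }
    if (total > buf_len) {
        return false;            /* block layout larger than the buffer */
    }
    const uint8_t *p = buf;
    for (uint8_t i = 0u; i < numparams; i++) {
        memcpy(params[i].address, p, params[i].size);
        p += params[i].size;     /* advance to the next parameter's bytes */
    }
    return true;
}
```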

    I think the unpack routine is OK, as it works fine when I run the reads in privileged mode. So I am reaching the conclusion that Ti_FEE_Read, and probably Ti_FEE_Write etc.:

    1. should only be run in privileged mode

    2. may be interrupted, or return bad data, if RTI or similar interrupts occur during FEE activity

    3. are not inherently safe with regard to switching between USER mode and privileged modes, or against interrupts

    Any direction or examples on the safe / correct usage of the Ti_FEE library while in USER mode would be appreciated.

  • I note that section 4.11 of the Ti_FEE_User_Guide version 1.10 (Feb 13, 2015) lists the following functions as requiring privileged mode:
    Ti_Fee_Init
    Ti_FeeInternal_WriteDataF021

    The Init routine is called before I enter USER mode. The WriteDataF021 function is called from the polled function Ti_Fee_MainFunction(), so I expect this is the problem. I will try to integrate routines that allow my application to switch between USER mode and a privileged mode.

    But while I attempt that, is there no variable in the FEE library that would explain why it proceeds to read out total garbage when Ti_FEE_Read is called from USER mode? It seems to me that the driver should be more robust.
  • Hello Jamie,

    1. Your unpack routine is ok since it doesn't call the flash APIs or FEE driver which may read/write flash control registers.
    2. As I mentioned in my previous post that "The FEE driver and F021 flash API must be run in a privileged mode "
    3. Having interrupt while using FEE driver is fine, will not cause any problem.
    4. The library is safe as long as its APIs are not called in user mode.
  • Hello QJ,

    So this morning I took the examples from SPNA218 and integrated a mode switcher from USER to SYSTEM mode via the SVC handler examples provided. As I am only using SVC to move between the modes when working with FEE I have the following assembler function:

    ;-------------------------------------------------------------------------------
    ; SWI wrapper
            .global     _svc
            .text
            .arm
            .armfunc    _svc
    
       .align 4
    _svc:
            .asmfunc
            ; Preserve A1 and A2: these may hold parameters to the function which are needed in the C-level handler
            ; Note: This handler doesn't preserve the caller-saved (save-on-call) registers A1 to A4 and V9.
            ;       In other words the calling function has to preserve them, which is ensured when the SVC is used from C (#pragma SWI_ALIAS() or __svc())
            ;       Take care when inlining SVC / SWI in assembly.
    
            MRS     A4, SPSR            ; Get spsr
    
            TST     A4, #0x20           ; Called in Thumb state?
    
            ; Note: When called from Thumb code only 256 unique SVC handlers can be distinguished, as the Thumb SVC instruction has only an 8-bit field.
            ;       When called from ARM code 2^24 unique SVC handlers can be distinguished, as the ARM SVC instruction has a 24-bit field.
    
            LDRNEH  A3, [lr,#-2]        ; Yes: Load halfword and...
            BICNE   A3, A3, #0xFF00     ; ...extract comment field
            LDREQ   A3, [lr,#-4]        ; No: Load word and...
            BICEQ   A3, A3, #0xFF000000 ; ...extract comment field
                                        ; r2/A3 now contains SVC number
                                        ; r3/A4 now contains SPSR (Saved Program Status Register)
    
            CMP     A3, #3              ; Only SVC numbers 0..3 have table entries
            BHI     _default            ; Branch if higher

            LDRLS   pc, [pc, A3, LSL #2]; Load address from table (pc reads as _table here)

            .word   0x00
    
    _table: .word   (_case0)  ; unimplementedSVC
            .word   (_case1)  ; switchCpuMode
            .word   (_case2)  ; switchToSystemMode
            .word   (_case3)  ; switchToUserMode
    
            .word   0x00
    
    _case0: ; unimplementedSVC (used to test fault handler)
            B       _default
    
    _case1: ; switchCpuMode
            AND     A2, A1, #0x0000001F ; Ensure that only mode bits are in A1
            AND     A1, A4, #0x0000001F ; Store mode on entry in R0/A1 to return it to callee
            BIC     A4, A4, #0x0000001F ; Clear Mode bits
            ORR     A4, A4, A2          ; Set Mode bits as in A2 (former A1)
            MSR     SPSR_cxsf, A4       ; Restore spsr
            B       _exit_svc           ; Branch to exit handler
    
    _case2: ; switchToSystemMode
            ;BIC     A4, A4, #0x0000001F
            ORR     A4, A4, #0x0000001F ; Set bits for System Mode (M0-M4 are set)
            MSR     SPSR_cxsf, A4       ; Restore spsr
            B       _exit_svc
    
    _case3: ; switchToUserMode
            BIC     A4, A4, #0x0000001F ; Clear Mode bits
            ORR     A4, A4, #0x00000010 ; Set Mode Bits for User Mode
            MSR     SPSR_cxsf, A4       ; Restore spsr
            B       _exit_svc
    
    _default:
    
    _exit_svc:
            MOVS PC, LR              ; Return from Exception
    
            .endasmfunc
    
            .end
    

    With the following pragmas and prototypes added to a relevant include file:

    #pragma SWI_ALIAS(unimplementedSVC,     0);
    #pragma SWI_ALIAS(switchCpuMode,        1);
    #pragma SWI_ALIAS(switchToSystemMode,   2);
    #pragma SWI_ALIAS(switchToUserMode,     3);
    
    void     unimplementedSVC(void); /* Used to test fault handler */
    uint32_t switchCpuMode(uint32_t u32ModeNum);
    void     switchToSystemMode(void);
    void     switchToUserMode(void);
    

    I verified that the mode-switch functions work by reproducing the failures I had been seeing when not switching to SYSTEM mode before calling the FEE driver routines, so I believe these work. Typical logic is as follows:

        TI_FeeModuleStatusType Status_l;
    
        switchToSystemMode();
        if(uni_cfg_tim_write && (IDLE == TI_Fee_GetStatus(0)))
        {
            uni_cfg_packdata(uni_cfg_blk3_params, UNI_CFG_NUMPARAMS_BLK3);
            TI_Fee_WriteSync(3,(uint8*)fee_databuffer);
            uni_cfg_tim_write = FALSE;
        }
        uni_cfg_unpackdata(uni_cfg_blk2_params, UNI_CFG_NUMPARAMS_BLK2);
        switchToUserMode();
    

    The problem I am now seeing is that my RTI interrupts are being killed when I execute the writes. It looks like a spurious exception is sending me to the undefEntry vector. Can you see an obvious problem with my assembler routines? I'll keep digging to try to understand the source of the undefEntry.

  • It appears that the application is hitting an undefined instruction, as it jumps to undefEntry. From reading around the subject, one cause of this is an ARM instruction being executed in Thumb state or vice versa (see SPNA218 sec 2.2). The screenshot below shows the SPSR_UND register; what are the best steps for working back to the root cause? From what I can see the MCU is in ARM state and also in USER mode. Could this indicate that an interrupt has moved us back from SYSTEM to USER mode? Would the best way to avoid this be to disable interrupts before running the FEE routines?

  • So there was a conflict with some FEE activity triggered periodically from the RTI interrupt (every 6 s during debug, 60 s in production). What I think happened: the RTI interrupt put the CPU in IRQ mode, the FEE routines I called from it then switched modes through the SVC handler and, on completion, switched to USER mode. The return from the interrupt therefore ran in the wrong mode (USER instead of IRQ), and the clean-up in the interrupt handler caused an exception, perhaps an ARM vs Thumb instruction issue. Re-architecting so that the RTI interrupt only signals pending FEE writes to the application fixed the issue. The main loop is now the only place that interacts with FEE and the only place that switches between SYSTEM and USER modes via the _svc handler, which preserves IRQ mode whenever the RTI interrupt has to run. I'm now happy with all testing of our ESM handlers, FEE etc. Thanks QJ for pointing me in the right direction.
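The final arrangement can be sketched as follows: the RTI ISR only sets a flag, and the main loop performs the FEE write bracketed by the mode switches. `enter_privileged`, `leave_privileged` and `do_fee_write` are hypothetical host-side stand-ins that count calls. On target they would be `switchToSystemMode()`, `switchToUserMode()` and the application's FEE write (for example `uni_cfg_esm_upd()`).

```c
#include <stdbool.h>

/* Hypothetical host-side stand-ins so the sketch builds off-target. */
static int  sim_mode_switches;
static int  sim_fee_writes;
static void enter_privileged(void) { sim_mode_switches++; } /* switchToSystemMode() */
static void leave_privileged(void) { sim_mode_switches++; } /* switchToUserMode() */
static void do_fee_write(void)     { sim_fee_writes++; }

static volatile bool g_fee_write_requested;

/* RTI ISR body: already privileged (IRQ mode); only flag the work. */
void rti_request_fee_write(void)
{
    g_fee_write_requested = true;
}

/* Main loop, running in USER mode: the only place that touches FEE or
 * switches mode, so the IRQ return path never sees a changed mode. */
void main_loop_service_fee(void)
{
    if (!g_fee_write_requested) {
        return;
    }
    g_fee_write_requested = false;  /* a tick during the write re-arms the flag */
    enter_privileged();
    do_fee_write();
    leave_privileged();
}
```

Clearing the flag before the write means a tick that arrives during the write is not lost; it simply triggers another write on the next pass.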