RM44L520: SCI RX interrupt via FIQ causes prefetch abort

Part Number: RM44L520
Other Parts Discussed in Thread: HALCOGEN, SEGGER

Hi All!

Our application needs to reduce the interrupt latency of the SCI RX interrupt. I tried routing it via FIQ, which partly works in debug builds but does not work in release builds. As far as I can see, in debug builds the processor always stops at 0x0000000C, which to my knowledge indicates an instruction prefetch abort.

I did the following to find the root cause:

  • The project was designed using HALCoGen 4.7.01 and uses FreeRTOS with MPU
  • The project is built using CCS 10.2.0.9 and TI compiler v20.2.1.LTS
  • The code works well if all SCI interrupts are handled via IRQ
  • I changed the RX interrupt from Low to High Level (on the SCI page of HALCoGen), enabled the new channel at the VIM and assigned it to FIQ
  • HALCoGen now generates an interrupt handler in sci.c with the "#pragma INTERRUPT(linHighLevelInterrupt, FIQ)" decoration
  • In the RAM settings I assigned 0x200 for the IRQ and FIQ Stack Length
  • If I do nothing in the FIQ handler (in the switch (vec) statement, disable all but the "default" clause), the code still crashes
  • Using some debug pins, I have the impression that the processor never reaches the FIQ handler function in sci.c
  • If I let the code start with SCI RX at IRQ level and switch it to FIQ in one of the other SCI interrupts, the code crashes after switching.

Can it be that there is more to do than just adding this "#pragma INTERRUPT (.., FIQ)" to get a working FIQ handler? Am I missing something else here? Any hints are very much appreciated.

Jan

  • FIQ which works partly in debug builds

    In the debug build, does the code jump to the FIQ handler when an interrupt occurs?

    Is the FIQ enabled by clearing the F bit of the CPSR register?
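
    To make the F-bit check concrete, here is a minimal host-side sketch that decodes a CPSR value copied from the debugger's register view. The bit positions (F = bit 6, I = bit 7, mode in bits 4:0) are architectural; the helper names are hypothetical and not part of HALCoGen.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPSR_MODE_MASK 0x1Fu        /* CPSR[4:0]: processor mode */
#define CPSR_F_BIT     (1u << 6)    /* 1 = FIQ masked */
#define CPSR_I_BIT     (1u << 7)    /* 1 = IRQ masked */

#define CPSR_MODE_FIQ  0x11u
#define CPSR_MODE_IRQ  0x12u
#define CPSR_MODE_ABT  0x17u

/* FIQ is taken by the core only while the F bit is clear. */
bool cpsr_fiq_enabled(uint32_t cpsr)
{
    return (cpsr & CPSR_F_BIT) == 0u;
}

/* IRQ is taken by the core only while the I bit is clear. */
bool cpsr_irq_enabled(uint32_t cpsr)
{
    return (cpsr & CPSR_I_BIT) == 0u;
}

/* Extract the mode field, e.g. to confirm the core stopped in Abort mode. */
uint32_t cpsr_mode(uint32_t cpsr)
{
    return cpsr & CPSR_MODE_MASK;
}
```

    For example, a CPSR of 0x000000D7 decodes as Abort mode with both IRQ and FIQ masked, while 0x0000001F is System mode with both enabled.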

  • Hi QJ Wang!

    Many thanks for your reply!

    In debug builds, the code works almost as expected: the FIQ handler is called, and so is the SCI notification. But after some time, under conditions I have not yet identified, the code stops at 0x0000000C. This seems to be the same signature as in release builds.

    When the code stops (actually a deadlock, because HALCoGen generates a default "prefetchEntry" that branches to itself), the registers read as follows: Abort_Registers -> R14_ABT is 0x10, Cp15 -> CP15_DATA_FAULT_STATUS is 0xD, CP15_INSTRUCTION_FAULT_STATUS is 0xD, CP15_AUX_DATA_FAULT_STATUS is 0x0, CP15_AUX_INSTRUCTION_FAULT_STATUS is 0x400000, CP15_DATA_FAULT_ADDRESS is 0x0 and CP15_INSTRUCTION_FAULT_ADDRESS is 0xC. R13_FIQ still points to 0x08000C00, as configured in HALCoGen.
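
    As a worked reading of these values, the sketch below recovers the address of the aborted instruction from R14_ABT. In ARM state the abort link register holds the faulting instruction address plus 4 for a prefetch abort and plus 8 for a data abort; the function and type names are hypothetical.

```c
#include <stdint.h>

typedef enum { ABORT_PREFETCH, ABORT_DATA } abort_kind_t;

/* ARM state: LR_abt = faulting instruction + 4 (prefetch) or + 8 (data). */
uint32_t abort_faulting_pc(abort_kind_t kind, uint32_t lr_abt)
{
    return lr_abt - ((kind == ABORT_PREFETCH) ? 4u : 8u);
}

/* Name of the exception vector at a given offset from the vector base. */
const char *vector_name(uint32_t offset)
{
    switch (offset) {
    case 0x00u: return "Reset";
    case 0x04u: return "Undefined instruction";
    case 0x08u: return "Supervisor call";
    case 0x0Cu: return "Prefetch abort";
    case 0x10u: return "Data abort";
    case 0x18u: return "IRQ";
    case 0x1Cu: return "FIQ";
    default:    return "Not a vector";
    }
}
```

    With R14_ABT = 0x10, a prefetch abort gives a faulting address of 0xC, which matches CP15_INSTRUCTION_FAULT_ADDRESS. One possible interpretation (not confirmed at this point in the thread) is that the fetch of the prefetch-abort vector itself is being rejected, which would make the abort re-enter at 0xC without ever reaching a handler.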

    In release builds I cannot see a single call to the SCI FIQ handler (I've put a debug IO in the handler and it never changes its level).

    Comparing the debug IO (in the FIQ handler) with the external signal that triggers the FIQ, it looks to me as if the code dies at the moment it tries to execute the FIQ.

    Just to make sure it's not something weird in my setup: I'm using modified bootloader code from SPNA190 (USART Bootloader for Hercules RM48x MCU). The reset vector table is unchanged: reset branches into the bootloader; Undefined, Software and the two Abort interrupts branch into the main application. IRQ and FIQ both use the "Register vectored interrupts" scheme. The main application is based on HALCoGen's RM44L520APE_FREERTOS.

    Jan

  • Hi Jan,

    The value (0xD) of DFSR indicates that the issue is caused by a memory access permission violation. The MPU determines the access permissions for all accesses to memory.
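
    For reference, a minimal sketch of how the fault status field can be decoded on the host from a register value copied out of the debugger. It assumes the Cortex-R4 layout in which the status is formed from bit 10 and bits 3:0 of DFSR/IFSR; the function names are hypothetical.

```c
#include <stdint.h>

/* Combine FSR bit 10 and bits 3:0 into the 5-bit status code. */
uint32_t fsr_status(uint32_t fsr)
{
    return (((fsr >> 10) & 1u) << 4) | (fsr & 0xFu);
}

/* Human-readable meaning of the status code (Cortex-R4 encodings). */
const char *fsr_describe(uint32_t fsr)
{
    switch (fsr_status(fsr)) {
    case 0x00u: return "Background (no MPU region matched)";
    case 0x01u: return "Alignment";
    case 0x02u: return "Debug event";
    case 0x08u: return "Precise external abort";
    case 0x0Du: return "Permission (MPU region denies the access)";
    case 0x16u: return "Imprecise external abort";
    case 0x18u: return "Imprecise parity/ECC error";
    case 0x19u: return "Precise parity/ECC error";
    default:    return "Reserved/unknown";
    }
}
```

    With the values reported above, both DFSR and IFSR (0xD) decode to a permission fault, so the MPU region settings in effect at the time of the abort are the first thing to check.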

    I don't understand why the code works in debug mode but fails in release mode.

    BTW, when you switch from the debug build to the release build, the project properties need to be redefined. Is code optimization enabled in your release build?

  • Hi QJ Wang!

    Access permission, that's good news. Now I "only" need to know where and why...

    My debug builds are without optimizations to simplify debugging. The release build is with level 4 (whole program optimization).

    It seems that my problem is even stranger: today I tried to add a prefetch handler to check where, with respect to the external signal, the crash happens. In the debug build the prefetch handler is not executed. The MCU stops at 0xc and does not continue until I single-step one assembler instruction. I also noticed that half of the time the vector section at 0x0 just shows 0xd0d0d0d0 for all vectors in the memory window. In this state, I can't even single-step. (I'm using a Segger J-Link as JTAG probe and am not sure whether it is at fault; I updated the driver, with no change.) BMMCR is always 0xA. My flash setup should also be OK: FRDCNTL = 0x311 for 180 MHz HCLK.

    Just to repeat: my code dynamically switches the SCI RX interrupt from IRQ to FIQ level. If the RX interrupt is handled entirely at IRQ level, everything works very well. If it is at FIQ level, the problems mentioned above occur at random. Besides switching the interrupt level, no other part of the code is changed.

    The receiver itself basically waits for a break. The break interrupt executes at IRQ level, writes the previous data into a FreeRTOS queue and requests memory from a second queue. (Certain FreeRTOS functions are safe to call from IRQ level. I assume they can never be called safely from FIQ level, as FIQ cannot be blocked.) The pointer to this memory is stored in a static variable in the IRQ handler. The same handler is then called by the RX interrupt at FIQ level and writes the new data to the memory. I assume (and have verified by debug IO) that the break interrupt has been handled well before the first RX interrupt, so the memory pointer should be valid when the RX interrupt is executed at FIQ level. I added a basic check that the pointer is not NULL. I also assume that, since the RM44 is a 32-bit machine, storing my memory pointer in the static variable is atomic.

    Jan

  • Just to repeat: my code dynamically switches the SCI RX interrupt from IRQ to FIQ level.

    How did you switch interrupt from IRQ to FIQ dynamically?

    #pragma INTERRUPT(lin1HighLevelInterrupt, FIQ)  ------ for FIQ

    #pragma INTERRUPT(lin1HighLevelInterrupt, IRQ)  ------ for IRQ

  • The code optimization may impact the code execution. Would you try without code optimization?

  • Not exactly like this. I'm using the two interrupt lines from SCI to VIM as

    #pragma INTERRUPT(lin1HighLevelInterrupt, FIQ)  ------ for FIQ

    and

    #pragma INTERRUPT(lin1LowLevelInterrupt, IRQ)  ------ for IRQ

    To change the RX interrupt priority I use the SCI's interrupt level register. In both functions I call the same handler function (which does not call any FreeRTOS function in the SCI_RX_INT path).

  • The code optimization may impact the code execution. Would you try without code optimization?

    Yes, that's easy.

    A debug build with "Whole program optimization" (as in release builds) behaves the same as one without optimization: it dies after a random time in the middle of a data frame with some bytes already received.

    A release build without optimization behaves the same as the debug build: it dies after a random time and, just as in the debug build, my prefetch handler is not executed (even without a debugger attached).

  • Hi Jan,

    I am not familiar with FreeRTOS. Is FIQ disabled by FreeRTOS when entering a critical section?

  • I am not familiar with FreeRTOS. Is FIQ disabled by FreeRTOS when entering a critical section?

    According to TRM (Note in section 15.2.2) FIQ is a non-maskable interrupt in Cortex-R4F, so it can not be disabled. I'm very careful not to call any FreeRTOS functions, which would definitely corrupt the kernel.

    I'm only writing received data to previously allocated memory. I think, that should be "allowed". Is there any instruction in the Cortex-R4 that is not atomic or might get corrupted if interrupted by the FIQ?

  • Today I made another test: I removed almost everything from the FIQ handler except reading the Receive Data Buffer to acknowledge the interrupt. This is the disassembly:

     953          scilinREG->RD;
              linHighLevelInterrupt():
    00042460:   E59F802C            ldr        r8, [pc, #0x2c]
    00042464:   E5988000            ldr        r8, [r8]
    1033      }
    00042468:   E25EF004            subs       pc, r14, #4
    [...]
              $C$CON52:
    00042494:   FFF7E434           .word       0xfff7e434
    

    As you can see, it loads the address of the register and reads it, without any prolog or epilog, relying on R8 being banked in FIQ mode. There is also no stack activity involved, so the FIQ stack setup should not matter.

    But this code also dies after some time with the same signature: the CPU stops at 0xc and does not execute the prefetch handler that is connected there.

    Am I overlooking something here?

    Smells more and more like a silicon bug to me. Who would be the one to address this to?

    Jan 

  • You are correct. The FIQ in Cortex-R4F is Non-Maskable. 

    The prefetch aborts you are seeing might be caused by missing ECC for your code or parts of your code. Would you try generating the ECC using the linker cmd file? That way you can generate the ECC for the whole flash, including the holes.

    Here is an example linker cmd file for the RM44Lx device. It will generate ECC for the whole flash.

    /*----------------------------------------------------------------------------*/
    /* USER CODE BEGIN (0) */
    /* USER CODE END */


    /*----------------------------------------------------------------------------*/
    /* Linker Settings */

    --retain="*(.intvecs)"

    /* USER CODE BEGIN (1) */
    /* USER CODE END */

    /*----------------------------------------------------------------------------*/
    /* Memory Map */

    MEMORY
    {
        VECTORS (X)  : origin=0x00000000 length=0x00000020 vfill = 0xffffffff
        FLASH0  (RX) : origin=0x00000020 length=0x000BFFE0 vfill = 0xffffffff
        SRAM    (RW) : origin=0x08002000 length=0x0001EB00
        STACK   (RW) : origin=0x08000000 length=0x00001500
    /* USER CODE BEGIN (2) */
    #if 1
        ECC_VEC  (R) : origin=(0xf0400000 + (start(VECTORS) >> 3))
                       length=(size(VECTORS) >> 3)
                       ECC={algorithm=algoL2R4F021, input_range=VECTORS}

        ECC_FLA0 (R) : origin=(0xf0400000 + (start(FLASH0) >> 3))
                       length=(size(FLASH0) >> 3)
                       ECC={algorithm=algoL2R4F021, input_range=FLASH0 }
    #endif
    /* USER CODE END */
    }

    /* USER CODE BEGIN (3) */
    ECC
    {
        algoL2R4F021 : address_mask = 0xfffffff8 /* Address Bits 31:3 */
                       hamming_mask = R4 /* Use R4/R5 build in Mask */
                       parity_mask = 0x0c /* Set which ECC bits are Even and Odd parity */
                       mirroring = F021 /* RM57Lx and TMS570LCx are build in F021 */
    }
    /* USER CODE END */

    /*----------------------------------------------------------------------------*/
    /* Section Configuration */


    SECTIONS
    {
        .intvecs : {} > VECTORS

        .text    : {} > FLASH0
        .const   : {} > FLASH0
        .cinit   : {} > FLASH0
        .pinit   : {} > FLASH0
        .data    : {} > SRAM
        .bss     : {} > SRAM
        .sysmem  : {} > SRAM

    /* USER CODE BEGIN (4) */
    /* USER CODE END */
    }

    /* USER CODE BEGIN (5) */
    /* USER CODE END */
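
    The ECC ranges in this file follow from the expressions origin = 0xf0400000 + (start >> 3) and length = size >> 3, i.e. one ECC byte per 8 bytes of flash. The small host-side sketch below works through that arithmetic for the two memory ranges; the type and function names are hypothetical and only mirror the linker expressions.

```c
#include <stdint.h>

#define ECC_BASE 0xF0400000u  /* ECC origin constant used in the cmd file above */

typedef struct {
    uint32_t origin;  /* start address of the ECC range */
    uint32_t length;  /* size of the ECC range in bytes */
} ecc_range_t;

/* Mirror of the linker expressions: base + (start >> 3), size >> 3. */
ecc_range_t ecc_range_for(uint32_t start, uint32_t size)
{
    ecc_range_t r;
    r.origin = ECC_BASE + (start >> 3);
    r.length = size >> 3;
    return r;
}
```

    For VECTORS (start 0x00000000, length 0x20) this gives ECC origin 0xF0400000 and length 0x4; for FLASH0 (start 0x00000020, length 0xBFFE0) it gives origin 0xF0400004 and length 0x17FFC, so the two ECC ranges are contiguous.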

  • If you run your code (SCI RX with FIQ) without using freeRTOS, does the code generate same exception? Would you take a quick test by commenting out the "vTaskStartScheduler()"? 

  • Would you try generate ECC using linker cmd file?

    Sorry, that does not change the situation. The MCU still stops at 0xc after some time.

  • If you run your code (SCI RX with FIQ) without using freeRTOS, does the code generate same exception?

    No, I haven't seen the exception within the last half hour. Usually it happens after a few seconds.

    I tried to disable and re-enable all tasks one by one, and it seems that the exception is triggered by a context switch: if I enable a task that just reads some IOs every 20 ms, I can see the exception again, regardless of whether the task actually reads the IOs. This is in line with previous observations that calling OS functions (xQueueSend(), which might and usually does trigger a context switch) makes the exception more likely.

    I'm able to trigger the exception easily if I set up a task that just delays for 1 ms. My external signal generates about 5k FIQs/sec. I'm using the MPU-enabled FreeRTOS support.

    It seems that there is no official FreeRTOS port for the R4F with MPU support available. At least I did not find one on GitHub. Do you have any hints on where to continue?

  • I did some more investigation and compared the port-specific code for the Cortex-R4F with the code officially available for the Cortex-M4. It seems that the HALCoGen-provided code for portRESTORE_CONTEXT in os_portasm.asm fails to disable the MPU before reconfiguring it. I copied the code from prvMpuDisable and prvMpuEnable found in the same file and have not seen the exception since. Even with all my code re-enabled and the ECC generation in the linker script reverted, I no longer see the exception.
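
    The ordering of that workaround can be illustrated with a small host-side model. This is only a sketch of the sequence (disable, rewrite regions, re-enable), not the code in os_portasm.asm, which is assembly operating on CP15. SCTLR bit 0 as the MPU enable bit is architectural; the structure, the region count of 12 and all function names are assumptions made for illustration.

```c
#include <stdint.h>

#define SCTLR_M     (1u << 0)  /* SCTLR bit 0: MPU enable */
#define MPU_REGIONS 12u        /* assumed region count, for illustration only */

typedef struct {
    uint32_t base;         /* region base address */
    uint32_t size_enable;  /* region size and enable field */
    uint32_t access;       /* access control field */
} mpu_region_t;

/* Software model of the MPU state; counts unsafe region writes. */
typedef struct {
    uint32_t     sctlr;
    mpu_region_t region[MPU_REGIONS];
    unsigned     writes_while_enabled;
} mpu_model_t;

static void model_write_region(mpu_model_t *m, unsigned idx,
                               const mpu_region_t *r)
{
    if ((m->sctlr & SCTLR_M) != 0u) {
        m->writes_while_enabled++;  /* region changed while MPU was active */
    }
    m->region[idx] = *r;
}

/* Restore per-task regions with the MPU off for the whole update.
   Returns 0 on success, -1 if the requested range does not fit. */
int restore_task_regions(mpu_model_t *m, const mpu_region_t *task,
                         unsigned first, unsigned count)
{
    if (first > MPU_REGIONS || count > MPU_REGIONS - first) {
        return -1;
    }
    m->sctlr &= ~SCTLR_M;  /* hardware equivalent: the prvMpuDisable step */
    for (unsigned i = 0u; i < count; i++) {
        model_write_region(m, first + i, &task[i]);
    }
    m->sctlr |= SCTLR_M;   /* hardware equivalent: the prvMpuEnable step */
    return 0;
}
```

    In this model, writes_while_enabled stays at zero because no region is ever rewritten while the MPU is active. On the real device, an FIQ cannot be masked during the context switch, so a half-updated but enabled region set is a plausible source of the sporadic permission faults described earlier.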

    How can I file a bug report on this issue?