RTOS/CC2640: ROV Unknown Exception

Part Number: CC2640
Other Parts Discussed in Thread: SYSBIOS

Tool/software: TI-RTOS

Hello,

I've been chasing a very strange bug for a while now and do not think I'm going to be able to track it down on my own. The product I'm working on uses a CC2640, and the application was initially based on the simple_peripheral example. We are running BLE SDK version 2.2 and TI-RTOS version 2.18 with SYS/BIOS version 6.45.

First, a little background on what led to the issue. I recently implemented a custom OTA firmware upgrade scheme for the CC2640. It borrows heavily from TI's version of OAD, and uses the same external flash / SPI drivers. The main difference is that I've implemented custom characteristics within our own BLE service to handle the data transfer and control, rather than using TI's OAD service. I've thoroughly tested my OTA implementation and it is stable. It's probably worth noting that we're using SPI1, rather than SPI0, for communicating with the external flash. After completing the OTA work I enabled the watchdog on the CC2640. I am configuring the watchdog directly through the inline functions in #include <driverlib/watchdog.h>. The watchdog functionality seems to work fine: I've intentionally inserted code to trip the watchdog and everything works as expected. However, when I have the watchdog feature enabled and attempt to do an over-the-air update, I get an exception. I've stretched out the watchdog timer to ensure that there aren't issues with the watchdog not being serviced during the OTA. My gut feeling is that it's some kind of stack or heap overflow problem. I've tried adjusting the HWI stack, task stack, ICall heap, and BIOS heap sizes to no avail. I've done a lot of inspection of the raw memory before and after the exception trips and haven't been able to locate any overflows. I've also been trying to use the ROV tool to narrow down the issue, but haven't had any luck and am getting some strange/unexpected data out of the tool.

When I attempt to kick off an OTA session, my application attempts to open the external flash and read out image information, just like TI's OAD implementation. An exception gets tripped in ExtFlash_open(). I've traced the exception to the first time this function tries to write to SPI, specifically where SPICC26XXDMA_transfer() calls Semaphore_pend() on line 866:

if (!Semaphore_pend(Semaphore_handle(&(object->transferComplete)), object->transferTimeout))

This function trips an exception. I've tried loading the RTOS ROM symbols to trace into Semaphore_pend, but it's hard to follow and hasn't led me to the issue. Here are some screenshots and comments from the ROV:

Task Stacks at time of exception:

Exception details: (I also checked the CFSR both at entry to my exception handler and at entry to the exception handler in ROM, all 0's both times...I don't understand what's raising the exception)

Tried to use the "Scan for errors" feature, but it generated an error:

I'm guessing whatever is causing the above is also related to why I can't monitor the system stack in ROV:

I'll be trying to reproduce the issue with the watchdog disabled so that I can rule that out, but I am running out of ideas for ways to try to trace this issue, any help is much appreciated.

Best Regards,

Josh M

  • I instrumented the ICall heap manager by defining HEAPMGR_METRICS, everything looks good, same values for all the variables immediately before and after the exception trips:

    I'm starting to think this is something specific to the watchdog and not a stack/heap type of issue. That said, I have no idea what it could be. For most of the debugging I've done, I've had breakpoints set in both the exception handler and the watchdog interrupt. The exception handler breakpoint always hits first, and I've checked the watchdog counter inside the exception handler and it always has plenty of time left before it expires.

  • Any advice on methods to further debug/trace the root cause of this issue?

    Some more things I've tried...I'm grasping at straws at this point:

    1). I swapped SSI1 and SSI0 so that the OTA is using SSI0, still get the same exception behavior.

    2). I took a look at the SSI1 (configured for ext flash) and UDMA registers right before the call that trips the exception; they have the exact same values in code with the watchdog enabled and in code with it disabled.

    3). I set a breakpoint at 0x1001bbc8, which I think is the ROM code exception handler, that eventually calls my app exception handler. I checked the CFSR at this breakpoint and the CPU isn't reporting any exceptions, it's all 0's. Is the BIOS calling the exception handler for a "soft" fault? A few things to note at this breakpoint:

    a). The exception number in the xPSR register is 0x18, which corresponds to the SSI1_COMB interrupt.

    b). The LR is 0xFFFFFFFD, is this what caused the exception? Trying to return to an invalid address?
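For readers decoding the same two register values, here is a minimal host-side sketch of the Cortex-M3 bit fields involved. The helper names are hypothetical and are not part of driverlib or SYS/BIOS. It assumes the ARMv7-M layout in which the low 9 bits of xPSR hold the exception number and LR holds an EXC_RETURN code while a handler is running. Under that layout, 0xFFFFFFFD is a normal EXC_RETURN value (return to Thread mode, frame on the process stack) rather than an invalid return address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The low 9 bits of xPSR (the IPSR field) hold the active exception number. */
static uint32_t xpsr_exception_number(uint32_t xpsr)
{
    return xpsr & 0x1FFu;
}

/* Exception numbers 16 and up are external interrupts: IRQ n = number - 16.
 * Returns -1 for the system exceptions (numbers below 16). */
static int32_t exception_to_irq(uint32_t exc_num)
{
    return (exc_num >= 16u) ? (int32_t)(exc_num - 16u) : -1;
}

/* On exception entry LR is loaded with an EXC_RETURN code, not a return
 * address. On Cortex-M3 the upper 28 bits of that code are all ones. */
static bool is_exc_return(uint32_t lr)
{
    return (lr & 0xFFFFFFF0u) == 0xFFFFFFF0u;
}

/* Bit 3 set: the exception preempted Thread mode (clear: Handler mode). */
static bool exc_return_from_thread_mode(uint32_t lr)
{
    return (lr & 0x8u) != 0u;
}

/* Bit 2 set: the stacked frame is on the process stack (PSP), else MSP. */
static bool exc_return_frame_on_psp(uint32_t lr)
{
    return (lr & 0x4u) != 0u;
}

/* Applies the helpers to the two values quoted in the post above. */
static void decode_posted_values(void)
{
    assert(xpsr_exception_number(0x18u) == 24u);   /* exception 24 */
    assert(exception_to_irq(24u) == 8);            /* external IRQ 8 */
    assert(is_exc_return(0xFFFFFFFDu));
    assert(exc_return_from_thread_mode(0xFFFFFFFDu));
    assert(exc_return_frame_on_psp(0xFFFFFFFDu));
}
```

Read this way, the register contents say that the vector for exception 24 was taken while ordinary thread-mode code was running, which points at what that vector leads to rather than at a bad return.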

    I'm going to try one more time to trace through Semaphore_pend() with the watchdog enabled and with it disabled; hopefully I'll be able to spot where they diverge and narrow down the issue. In the meantime any advice is much appreciated.

    Thanks,

    Josh M

  • Hello Josh,

    Good job on the above average debugging!

    Have you checked the System Stack utilization prior to the exception? It wasn't clear if the inability to check this was prior to or after the abort.

    Also, are you using TI ARM Compiler v5.2.6?

    Best wishes
  • Hey JXS,

    The empty/red ROV->Hwi->Module boxes for hwiStackPeak and hwiStackSize are like that both before and after the exception. This seems to be a general issue with my project in CCS; it is also present in a build with the watchdog disabled.

    I am using v5.2.6 of the TI ARM compiler, that one slipped through the cracks when rattling off versions in my short novel of a first post!

    I've continued trying to narrow down just what goes wrong after the Semaphore_pend() call in the SPI driver. I haven't pinpointed it yet, but I believe I am very close. Will post as soon as I know more.

    Best Regards,
    Josh M

  • Hi Joshua,

    Joshua Meyer said:
    The empty/red ROV->Hwi->Module boxes for hwiStackPeak and hwiStackSize are like that both before and after the exception

    When did you check before? Can you check in main just to make sure it is working there? What about before you call the first SPI_transfer (not at the Semaphore_pend)? I'm thinking you might have blown the system stack with the SPI interrupt. What is the size of the system stack?

    Todd

  • Whoops! You sent this to the wrong guy!
  • By "before" the exception I meant when my application was running normally, before I take the action that causes the exception. I set a breakpoint in a periodic task and checked the ROV, same red/blank boxes. Then I continued execution and kicked off an OTA session, with a breakpoint on the exception handler, which ends up tripping.

    Same issue with ROV at BIOS_start in main():

    My system stack is set to 1024 bytes, from the map file:

    200034c8 heapStart
    20003f68 __stack
    20003f68 heapEnd
    20004368 __STACK_END
    20004368 __STACK_TOP

    I've manually checked the system stack by looking at the memory, I don't think I'm overflowing it. 
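The 1024-byte figure follows directly from the map symbols above, and the manual memory inspection can be expressed as a scan for untouched fill words. The following is a small host-side sketch with hypothetical helper names. It assumes the stack region was pre-filled with a known pattern before use; the pattern is passed in as a parameter because the actual fill value depends on the kernel configuration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Addresses copied from the map file excerpt above. */
#define MAP_STACK_BASE 0x20003f68u   /* __stack      (lowest address) */
#define MAP_STACK_END  0x20004368u   /* __STACK_END  (one past the top) */

/* Size of the system stack region in bytes. */
static uint32_t stack_size_bytes(uint32_t base, uint32_t end)
{
    return end - base;
}

/* The stack grows downward, so words that were never used sit at the low
 * end of the region and still hold the fill pattern. Counts how many
 * consecutive fill words remain, starting from the lowest address. */
static size_t stack_unused_words(const uint32_t *low, size_t words,
                                 uint32_t fill)
{
    size_t n = 0;
    while (n < words && low[n] == fill) {
        n++;
    }
    return n;
}

/* Peak usage in bytes, derived from the count of untouched words. */
static size_t stack_peak_bytes(const uint32_t *low, size_t words,
                               uint32_t fill)
{
    return (words - stack_unused_words(low, words, fill)) * sizeof(uint32_t);
}

/* Confirms the size implied by the map file. */
static void check_map_stack_size(void)
{
    assert(stack_size_bytes(MAP_STACK_BASE, MAP_STACK_END) == 1024u);
}
```

A peak that stays below the region size is consistent with the conclusion that the system stack is not overflowing, although it cannot rule out a stray write that happens to land on a word already in use.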

    Right at the beginning of the offending SPICC26XXDMA_transfer() call:

    At the call to Semaphore_pend(), you can see the stack has changed quite a bit, but it's below the peak depth it's ever been to:

    And finally at the exception hook, nothing has changed on the system stack, which I find somewhat surprising, but I'm not super well versed on what's going on under the hood while that semaphore is pending:

  • Is this before or after the OTA? For example, do you see the above red error in ROV->Hwi->Module when you load the image via JTAG in CCS? Or is this the new image that was loaded via OTA? If it is the latter, ROV is getting confused since it is parsing the image that was loaded with CCS. It needs to be told to look at the OTA image.

    Todd
  • All of the snapshots and info I've posted have been from code loaded by the debugger. The OTA doesn't ever get started because the exception happens when trying to compare the image data received over BLE with the image data in external flash.

    As soon as my app receives an OTA request it calls the same ExtFlash_open() function as TI's OAD service. The only real difference between my application's OTA and TI's is that I've switched from SSI0 to SSI1 for the external flash (we're using SSI0 for something else). Besides that, everything from the ExtFlash.c driver and down is exactly the same.
  • Another (not so) brief update before I pack it in for the weekend.

    I've been somewhat successful tracing through what the SYS/BIOS ROM code does once the SPI transfer pends, but I'm still not sure how to fix the issue.

    I've traced it pretty deep and see some things happen that make sense: the "RemoteControl_taskFxn" gets suspended, but the "blockedOn" field in ROV is unknown. After that task is suspended, the scheduler runs and activates ti_sysbios_knl_idle_loop_E. Eventually, after a lot of assembly stepping, I get into the ti_sysbios_family_arm_m3_Hwi_dispatch__I() function. Towards the bottom of this function the PENDSVSET bit is set to 1 by the assembly code, and the ISRPREEMPT bit goes active. I step a couple more times and I end up in the exception handler after the processor tries to execute the instruction at 0x1001C954. I'm definitely in a little over my head here, but I don't believe execution is supposed to make it this far. The value at that memory location is actually the address of the NVIC_ICSR, and is just there to be referenced by the instruction at 0x1001C940.

    Here's some more snapshots from stepping through this section of ROM code:

    I'm not sure if this is actually meaningful, because I get the same behavior when stepping through code with the watchdog disabled, which doesn't trip any exceptions during OTA when I let it run. It seems like an interrupt should happen after the PendSV bit is set; maybe the debugger is interfering? I did take a look at some other registers and the DHCSR.C_MASKINTS bit is cleared/0, so I would think that interrupts should still be serviced even when stepping in the debugger.

    I've looked at far too much assembly today...I'm going to drink some tasty beverages, thanks for the input, hopefully we can resolve this one soon. 

    -Josh M

  • Josh and I looked at this off-line. The problem is that his code also included IntRegister along with using TI-RTOS. Here's why that is a problem.

    The TI-RTOS kernel (SYS/BIOS) manages a vector table. As part of the startup, it copies the vector table into RAM (so it can be modified) and sets Hwi_nvic.VTOR (0xE000ED08 Vector Table Offset Register) to this RAM location.

    Note the placement of the RAM vector table is determined by the below setting (the default for your device is 0x20000000). This is the ti_sysbios_family_arm_m3_Hwi_ramVectors symbol you see in the map file.

    So life is good, and then along comes IntRegister and it does the following check (the first three comment lines were added).

        //NVIC_VTABLE = 0xE000ED08 which value is now 0x20000000
        // g_pfnRAMVectors is determined by placement of .vtable_ram. Note: in a TI-RTOS based
        // application we don’t have this section
        if(HWREG(NVIC_VTABLE) != (uint32_t)g_pfnRAMVectors)
        {
            //
            // Copy the vector table from the beginning of FLASH to the RAM vector
            // table.
            //
            ui32Value = HWREG(NVIC_VTABLE);
            for(ui32Idx = 0; ui32Idx < NUM_INTERRUPTS; ui32Idx++)
            {
                g_pfnRAMVectors[ui32Idx] = (void (*)(void))HWREG((ui32Idx * 4) +
                                           ui32Value);
            }

            //
            // Point NVIC at the RAM vector table.
            //
            HWREG(NVIC_VTABLE) = (uint32_t)g_pfnRAMVectors;
        }

    So the first time IntRegister is called, it copies the vector table from flash, changes the vector table offset register, and then plugs in the new interrupt. This totally messes up the SYS/BIOS Hwi module, since you essentially stole the vector table from it, added a vector whose signature and requirements (e.g. which registers to save) don't match the Hwi module and, most importantly, wiped out any of the vectors the Hwi module plugged during runtime.

    Todd
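The takeover Todd describes can be reproduced on a desktop machine with a small simulation. This is only a sketch under stated assumptions. The arrays and the `vtor` pointer are hypothetical stand-ins for the flash table, the kernel's RAM table, `g_pfnRAMVectors` and the VTOR register. The table is shortened to 8 entries. `sim_int_register` mirrors the structure of the quoted check rather than the real driverlib source. It also assumes the kernel keeps writing later vectors into its own RAM table.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_VECTORS 8u   /* shortened table, for illustration only */

typedef void (*vector_fn)(void);

/* Distinct bodies so each handler has its own address. */
static volatile int hits[3];
static void default_handler(void) { hits[0]++; }  /* unplugged vector */
static void kernel_dispatch(void) { hits[1]++; }  /* kernel-managed ISR */
static void app_raw_isr(void)     { hits[2]++; }  /* handler given to IntRegister */

static vector_fn flash_vectors[NUM_VECTORS];         /* reset-time table */
static vector_fn kernel_ram_vectors[NUM_VECTORS];    /* table owned by the kernel */
static vector_fn driverlib_ram_vectors[NUM_VECTORS]; /* stand-in for g_pfnRAMVectors */
static vector_fn *vtor;                              /* stand-in for VTOR */

static void copy_table(vector_fn *dst, const vector_fn *src)
{
    for (uint32_t i = 0; i < NUM_VECTORS; i++) {
        dst[i] = src[i];
    }
}

/* Reset: every vector points at the default handler, VTOR points at flash. */
static void sim_reset(void)
{
    for (uint32_t i = 0; i < NUM_VECTORS; i++) {
        flash_vectors[i] = default_handler;
        kernel_ram_vectors[i] = 0;
        driverlib_ram_vectors[i] = 0;
    }
    vtor = flash_vectors;
}

/* Kernel startup: copy the table into RAM and point VTOR at the copy. */
static void sim_kernel_startup(void)
{
    copy_table(kernel_ram_vectors, flash_vectors);
    vtor = kernel_ram_vectors;
}

/* The kernel plugs a vector into the table it believes is active. */
static void sim_kernel_plug(uint32_t n, vector_fn fn)
{
    kernel_ram_vectors[n] = fn;
}

/* Same shape as the quoted check: if VTOR is not the driverlib table,
 * copy whatever VTOR points at and then repoint VTOR. */
static void sim_int_register(uint32_t n, vector_fn fn)
{
    if (vtor != driverlib_ram_vectors) {
        copy_table(driverlib_ram_vectors, vtor);
        vtor = driverlib_ram_vectors;
    }
    driverlib_ram_vectors[n] = fn;
}

/* What the CPU would fetch for exception n. */
static vector_fn sim_fetch(uint32_t n)
{
    return vtor[n];
}
```

Calling `sim_reset()`, `sim_kernel_startup()`, `sim_int_register(3, app_raw_isr)` and then `sim_kernel_plug(5, kernel_dispatch)` leaves `sim_fetch(5)` returning `default_handler`: the kernel's later write lands in a table the CPU no longer reads. In this model, that is the same shape as a driver opened after the IntRegister call whose interrupt ends up in the exception handler. Registering the interrupt through the kernel's Hwi module instead keeps a single owner for the table.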