LP-AM243: FreeRTOS task crashes without errors. Stack issue? Or something else?

Tron

Part Number: LP-AM243

I have a relatively comprehensive multi-core project I've been working on for a while. Each core has multiple tasks. I have a process that automatically coordinates and initialises IPC Notify channels between the cores, and a common IPC Notify interrupt function that uses Task Notify to pass each message on to the task waiting for it. The same code is shared/linked on every core and used by every task.

The system has been working flawlessly for months, and still does for all of my tasks except, now, one of them.

One task (vPrintBedTask), which was working fine, now seems to crash quietly: sometimes when an IPC Notify message is passed to it, sometimes afterwards while xTaskNotify is executing. I can send the same message to tasks on other cores using the same code, and they work fine.

If I set a HW breakpoint on the message being passed with IPC Notify or Task Notify and step through, there's no crash - it works. It feels a bit like quantum physics: as soon as I try to observe what's happening down there I impact the nature of it.

When I don't use a breakpoint, I can see in ROV that after this point in the execution the task is simply gone. I suspect that this may have something to do with it, but I've tried increasing the stack size and that had no impact. I also don't know how to reconcile this chart with the ROV stack information, which tells me (immediately before the task disappears) that the task stack is still mostly free.
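For context on how free-stack figures are usually derived: when stack high-water-mark checking is enabled, FreeRTOS fills each task stack with the byte 0xA5 at creation and later counts how many bytes still hold that pattern (this is what uxTaskGetStackHighWaterMark reports, in words rather than bytes). Whether ROV uses exactly this method is an assumption. The host-side C sketch below models the idea on a plain byte array; the function names are hypothetical. It also shows the limitation: the figure only records the deepest point the stack has reached, so a stray write can make it drop sharply, while corruption that lands outside the stack (for example in the task's control block) does not show up in it at all.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fill pattern; FreeRTOS uses 0xA5 (tskSTACK_FILL_BYTE) for the same purpose. */
#define STACK_FILL_BYTE 0xA5u

/* Paint a simulated stack before the "task" starts using it. */
static void stack_paint(uint8_t *stack, size_t len)
{
    memset(stack, STACK_FILL_BYTE, len);
}

/* The stack grows downward from stack + len, so bytes that were never
 * touched sit at the low end. Count them: this is the unused margin. */
static size_t stack_unused_bytes(const uint8_t *stack, size_t len)
{
    size_t n = 0;
    while (n < len && stack[n] == STACK_FILL_BYTE) {
        n++;
    }
    return n;
}
```

For example, painting a 256-byte array and then zeroing its top 64 bytes (simulated use) leaves 192 unused bytes, while a single stray write at index 10 drops the reported margin to 10 even though the stack never actually grew that deep.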

The only other clue I have is this, which is what I see if I pause execution after the message has been sent and the task has disappeared: execution seems to oscillate between vApplicationIdleHook and a timer interrupt that counts every 2.5ns. That's exactly what I'd expect if no task is left running.

I've tried to use DebugP_assert(xTaskNotifyWait(0, 0, &message, portMAX_DELAY) == pdTRUE); but that doesn't tell me anything more or have any impact.
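Two things may explain why this assert adds nothing. First, with portMAX_DELAY (and INCLUDE_vTaskSuspend set to 1), xTaskNotifyWait blocks until a notification arrives and then returns pdTRUE, so the condition should normally never be false. Second, putting a call with side effects inside an assert macro is fragile: if the macro is configured to discard its argument, as standard assert does under NDEBUG, the wait itself disappears. Whether DebugP_assert behaves that way depends on the SDK configuration. The host-side sketch below uses hypothetical CHECK_* macros and a stand-in wait function to show the difference, and the safer pattern of making the call first and asserting on the stored result.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a blocking wait such as xTaskNotifyWait();
 * it counts how many times it is actually called. */
static int g_wait_calls = 0;

static bool wait_for_message(void)
{
    g_wait_calls++;
    return true;
}

/* Two hypothetical assert flavours: one evaluates its argument, the other
 * discards it at compile time, like standard assert() under NDEBUG. */
#define CHECK_ENABLED(expr)  do { if (!(expr)) { /* report failure */ } } while (0)
#define CHECK_DISABLED(expr) do { } while (0)

/* Fragile: the wait lives inside the check. */
static int calls_with_enabled_check(void)
{
    g_wait_calls = 0;
    CHECK_ENABLED(wait_for_message());
    return g_wait_calls;
}

static int calls_with_disabled_check(void)
{
    g_wait_calls = 0;
    CHECK_DISABLED(wait_for_message());  /* the call is compiled out */
    return g_wait_calls;
}

/* Robust: always make the call, then check the stored result. */
static int calls_with_safe_pattern(void)
{
    g_wait_calls = 0;
    bool ok = wait_for_message();
    CHECK_DISABLED(ok);
    (void)ok;
    return g_wait_calls;
}
```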

I've checked if there's accidentally a 'return' in the task function somewhere, and there's not.

I've tried using a different Launchpad in case it's a hardware issue (which I had wrt the eQEP counter register this week) but that hasn't made a change.

Does anyone have any suggestions on what else I can try/check? I'm stumped by this one.

over 1 year ago

Ming Wei over 1 year ago

TI__Mastermind 47195 points

Hi Tron,

One thing that caught my eye is "a timer interrupt that counts every 2.5ns". The AM243x runs at 800 MHz, which is 1.25 ns per instruction (not considering memory delays). A timer interrupt every 2.5 ns would mean its ISR has to complete in two instructions, which is impossible. Why do you need such a high-frequency timer interrupt?

Best regards,

Ming

Tron over 1 year ago in reply to Ming Wei

Prodigy 170 points

Hi Ming. Thanks for the quick reply and the good observation: I meant to write 2.5us. I'm out of PWM modules and I need to step a motor at up to 400 kHz.
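The arithmetic behind Ming's point and the correction: the number of CPU cycles available per timer period is the clock frequency times the period. At 800 MHz that is 2 cycles for 2.5 ns, but 2000 cycles for 2.5 us (a 400 kHz interrupt rate), which leaves room for a short ISR. A small C sketch, using picoseconds so both periods are integers; the function names are illustrative only.

```c
#include <stdint.h>

#define PS_PER_SECOND 1000000000000ULL  /* 1e12 picoseconds per second */

/* Period in picoseconds of an interrupt firing at rate_hz. */
static uint64_t period_ps_for_rate(uint64_t rate_hz)
{
    return PS_PER_SECOND / rate_hz;
}

/* CPU cycles that elapse during one period: f_cpu * T. */
static uint64_t cycles_per_period(uint64_t cpu_hz, uint64_t period_ps)
{
    return cpu_hz * period_ps / PS_PER_SECOND;
}
```

Every cycle spent in the ISR, including interrupt entry and exit, is taken from the tasks, so a 400 kHz interrupt is feasible but not free.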

Ming Wei over 1 year ago in reply to Tron

TI__Mastermind 47195 points

Hi Tron,

For a task to disappear, the most likely cause is that the task somehow exits its infinite loop (wait for an event/resource, then execute the task body). One thing I am not clear on is how your task loop is designed. What other events does it wait for besides the IPC Notification? How do you inform the task that the IPC Notification has arrived?

Best regards,

Ming

Tron over 1 year ago in reply to Ming Wei

Prodigy 170 points

Here's the template of the task, along with my relevant IPC code. The task is pretty straightforward. A lot happens in each part, but apart from function calls, this is everything that involves the task scheduler/interrupts/semaphores*/etc.

(*apart from DebugP_log)

So the only event it waits for is a task notification, which is itself sent (with xTaskNotifyFromISR) from an IPC Notify handler function that is common to all cores and tasks in my project.

void fastTimerISR() {
    gFastTickCount++;  // shared with task code, so gFastTickCount should be declared volatile
}

void vTaskFunction(void *pvParameters) {

    DebugP_log("\r\n***** Waiting for Task Init Flag! *****\r\n\r\n");
    while (!initFlag) {
        vTaskDelay(100 / portTICK_PERIOD_MS);
    }

    DebugP_log("\r\n***** Initialising Task! *****\r\n\r\n");

    // Start the timer for fast ticks
    TimerP_start(gTimerBaseAddr[CONFIG_TIMER0]);

    DebugP_log("\r\n***** Task init complete! *****\r\n\r\n");

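    while(1U) {

        /* Wait for the notification from the ISR */
        DebugP_log("[TASK] Waiting for command.\r\n");
        uint32_t message = 0;
        /* Make the call outside the assert so it still runs if asserts are compiled out.
         * With portMAX_DELAY it should only return once a notification has arrived. */
        BaseType_t notified = xTaskNotifyWait(0, 0, &message, portMAX_DELAY);
        DebugP_assert(notified == pdTRUE);

        // Do stuff based on message

    }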

    /* A FreeRTOS task must never return: if the loop ever exits, call vTaskDelete instead */
    DebugP_log("[TASK] Task closing.\r\n");
    vTaskDelete(NULL);
}

/* Interrupt function that receives an IPC Notify message and uses xTaskNotifyFromISR to relay it to the necessary task */
void ipcCommandHandler(uint32_t remoteCoreId, uint16_t localClientId, uint32_t msgValue, void *args)
{
    uint32_t i = (uint32_t)(uintptr_t)args;  // Cast the 'args' parameter back to the systemcfg index

    // Send the message value to the target task using Task Notify.
    // eSetValueWithOverwrite replaces any value the task has not read yet.
    BaseType_t xHigherPriorityTaskWoken = pdFALSE;
    xTaskNotifyFromISR(systemcfg[i].task.handle, msgValue, eSetValueWithOverwrite, &xHigherPriorityTaskWoken);

    /* Echo the message back as an ack. Server is waiting.
     * This ack only confirms that the ISR ran, not that the task has processed the message. */
    IpcNotify_sendMsg(remoteCoreId, localClientId, msgValue, 1);

    /* Request any context switch last, once the ISR's own work is done */
    portYIELD_FROM_ISR(xHigherPriorityTaskWoken);
}

/* Function that makes the IPC Notify transfer to the designated core */
bool postIPCNotify(uint8_t sysIdx, uint32_t message) {

    /* Send the message to the participating core; 0 = don't block waiting for space in the HW FIFO */
    uint32_t waitForFifoNotFull = 0;
    int32_t status;

    /* clientID is passed below as IpcNotify_sendMsg()'s remoteCoreId, channelID as its remoteClientId */
    uint32_t channelID = systemcfg[sysIdx].ipc.channelID;
    uint32_t clientID = systemcfg[sysIdx].ipc.clientID;
    bool ack = false;   /* true once the remote side has echoed the message back */
    bool done = false;  /* stop retrying: acked, or an unrecoverable error */

    /* No error checks are done inside IpcNotify_sendMsg(), so check the constraints here */
    if (message > IPC_NOTIFY_MSG_VALUE_MAX) {
        DebugP_log("[IPC] ERROR: Message value 0x%08x is greater than IPC_NOTIFY_MSG_VALUE_MAX 0x%08x!!!\r\n", message, IPC_NOTIFY_MSG_VALUE_MAX);
        return false;
    }
    if (channelID > IPC_NOTIFY_CLIENT_ID_MAX) {
        DebugP_log("[IPC] ERROR: Client ID %d is greater than IPC_NOTIFY_CLIENT_ID_MAX %d!!!\r\n", channelID, IPC_NOTIFY_CLIENT_ID_MAX);
        return false;
    }

    for (uint8_t messageAttempts = 0; (messageAttempts < policy.ipc.messageAttemptLimit) && !done; messageAttempts++) {
        DebugP_log("[IPC] Sending %s command with message 0x%08x to core %d on channel %d... ", (char*)systemcfg[sysIdx].name, message, clientID, channelID);
        status = IpcNotify_sendMsg(clientID, channelID, message, waitForFifoNotFull);
        if (status == SystemP_SUCCESS) {
            DebugP_log("Waiting for acknowledgement response... ");
            /* Check units: timeToWaitInTicks is converted with pdMS_TO_TICKS, i.e. treated as milliseconds */
            status = SemaphoreP_pend((SemaphoreP_Object *)&systemcfg[sysIdx].ipc.doneSem, pdMS_TO_TICKS(policy.ipc.timeToWaitInTicks));
            if (status == SystemP_TIMEOUT) {
                DebugP_log("Timed out. %d attempts left.\r\n", policy.ipc.messageAttemptLimit - messageAttempts - 1);
            } else if (status == SystemP_SUCCESS) {
                ack = true;
                done = true;
                DebugP_log("Done!!!\r\n");
            } else {
                done = true;
                DebugP_log("\r\n[Error] Status returned: %d. Dropping message.\r\n", status);
            }
        } else {
            DebugP_log("\r\n[IPC] Message could not be sent since HW or SW FIFO for holding the message is full... ");
            DebugP_log("%d attempts left.\r\n", policy.ipc.messageAttemptLimit - messageAttempts - 1);
        }
    }
    return ack;  /* true only if the remote side acknowledged the message */
}

The thing that has me most confused is that it works perfectly when I step through it. It's only when running normally that it doesn't.

Sometimes the messaging core reports that no IPC acknowledgement was received (see the code for how I implement an ack over IPC). Other times it says the send worked and it even received a response, but there's no sign of life from that core that anything happened. So while it got as far as completing the IPC Notify ack, it never got to the point of completing the xTaskNotifyFromISR command. These functions are also used by about 10 other tasks across different cores, so they do work. It's only this one task that suddenly has a problem.
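Two properties of the handler code are worth keeping in mind here. The ack is sent from the ISR right after xTaskNotifyFromISR, so a received ack only shows that the ISR ran, not that the task has woken and processed the message. And with eSetValueWithOverwrite, FreeRTOS replaces the notification value even if the task has not read the previous one yet, so a second message arriving before the task runs silently replaces the first. The host-side model below (plain C, not FreeRTOS code; all names are hypothetical) shows both effects.

```c
#include <stdbool.h>
#include <stdint.h>

/* Host-side model of one task-notification slot (not the FreeRTOS implementation). */
typedef struct {
    uint32_t value;        /* last value written by the "ISR" */
    bool     pending;      /* set until the "task" reads it */
    uint32_t overwritten;  /* unread values lost to a newer one */
} NotifySlot;

static uint32_t g_acks_sent = 0;  /* acks echoed back to the sending core */

/* Models eSetValueWithOverwrite: always succeeds, replacing any unread value. */
static void notify_overwrite(NotifySlot *s, uint32_t msg)
{
    if (s->pending) {
        s->overwritten++;
    }
    s->value = msg;
    s->pending = true;
}

/* Models the receive ISR: notify the task, then immediately ack. */
static void ipc_handler_model(NotifySlot *s, uint32_t msg)
{
    notify_overwrite(s, msg);
    g_acks_sent++;  /* sent whether or not the task has run yet */
}

/* Models the task waking up: consume the pending value, if any. */
static bool task_take(NotifySlot *s, uint32_t *msg)
{
    if (!s->pending) {
        return false;
    }
    *msg = s->value;
    s->pending = false;
    return true;
}
```

If two messages arrive before the task runs, the sender sees two acks but the task only ever sees the second value. If losing a command is unacceptable, eSetValueWithoutOverwrite (which fails while a value is still pending) or a queue are possible alternatives.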

Ming Wei over 1 year ago in reply to Tron

TI__Mastermind 47195 points

Hi Tron,

I did not see anything obviously wrong in the code you sent. Since the code behaves differently from run to run, I would focus on memory-related issues such as the stack, caching, and memory being overwritten. Since you have already ruled out stack overflow, I would suggest you look at the linker.cmd and the example.syscfg (MPU settings) for the R5F core's program. Keep in mind that all IPC-related memory regions have to be non-cached. The other thing to look for is memory being overwritten: check pointer-based memory writes for possible out-of-bounds writes.
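One low-cost way to catch out-of-bounds writes is to bracket suspect buffers with guard bytes and check them periodically, for example from the task loop or the idle hook. The sketch below is host-side C; the GuardedBuf type, the 0xCD pattern, and the sizes are arbitrary choices for illustration. On the target, the R5F MPU can also be configured to make a region read-only or inaccessible, which turns a stray write into an immediate fault instead of silent corruption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GUARD_BYTE 0xCDu  /* arbitrary pattern */
#define GUARD_LEN  8u
#define DATA_LEN   32u

/* Payload bracketed by guard regions inside one array, so the sketch
 * never writes outside the object it owns. */
typedef struct {
    uint8_t raw[GUARD_LEN + DATA_LEN + GUARD_LEN];
} GuardedBuf;

/* Fill both guards with the pattern, zero the payload, return the payload. */
static uint8_t *guarded_init(GuardedBuf *g)
{
    memset(g->raw, GUARD_BYTE, sizeof g->raw);
    memset(g->raw + GUARD_LEN, 0, DATA_LEN);
    return g->raw + GUARD_LEN;
}

/* True if neither guard region has been touched. */
static bool guarded_intact(const GuardedBuf *g)
{
    for (size_t i = 0; i < GUARD_LEN; i++) {
        if (g->raw[i] != GUARD_BYTE ||
            g->raw[GUARD_LEN + DATA_LEN + i] != GUARD_BYTE) {
            return false;
        }
    }
    return true;
}
```

Writing the last payload byte leaves the guards intact; writing one byte past the end lands in the upper guard and is detected on the next check.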

You can also add counters in the while loop of vTaskFunction, in ipcCommandHandler, and on the other core that sends the IPC Notify to this core. By comparing the counter values when the error happens, you should be able to tell where the problem starts.
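A minimal sketch of how the three counters could be compared, assuming they live somewhere both cores and the debugger can read consistently (for example a non-cached shared region). The struct, enum, and function names are hypothetical. Compare the values with the target halted, and allow for one message still in flight. Note that with eSetValueWithOverwrite, a gap between the handler and task counts can also come from notifications that were overwritten before the task read them.

```c
#include <stdint.h>

/* Hypothetical per-stage counters along the message path. */
typedef struct {
    uint32_t sent;      /* sending core: after IpcNotify_sendMsg() succeeds */
    uint32_t received;  /* receiving core: on entry to ipcCommandHandler() */
    uint32_t processed; /* receiving core: after xTaskNotifyWait() returns */
} MsgCounters;

typedef enum {
    PATH_OK,              /* every stage saw every message */
    LOST_BEFORE_HANDLER,  /* sent but never reached the receive ISR */
    LOST_BEFORE_TASK      /* ISR ran but the task never consumed it */
} PathStatus;

/* Report the first stage whose count falls behind the previous stage. */
static PathStatus locate_break(const MsgCounters *c)
{
    if (c->received < c->sent) {
        return LOST_BEFORE_HANDLER;
    }
    if (c->processed < c->received) {
        return LOST_BEFORE_TASK;
    }
    return PATH_OK;
}
```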

Best regards,

Ming
