AM2632: Crash at MCAN Busoff

Part Number: AM2632


In the event of an MCAN busoff, the following function is called in the transmit interrupt to recover from the busoff.

MCAN_setOpMode(baseAddr, MCAN_OPERATION_MODE_NORMAL);

Two CANs are used. With MCAN0 (baseAddr = 0x52600000), recovery from bus-off works, but with MCAN3 (baseAddr = 0x52630000) and heavy data traffic on the bus, the firmware crashes.

What could be the reason for this?
(SDK 10.00.00 is used)

  • Hi Simon,

    Do you have any information on what data transfers are occurring on the MCAN3 bus when the firmware crashes?

    Can you provide details on the CAN operating parameters?

    I assume that MCAN0 is not connected to the same bus as MCAN3. Can you switch the buses and retry the test?

    If MCAN3 is able to properly resume normal operation on the alternate MCAN0 bus, then we can assume that something is happening on the MCAN3 bus that is causing the issue.

    Best Regards,

    Zackary Fleenor

  • Hi Zackary

    • When the error occurs, many SDO messages are sent via MCAN3 approximately every 5 ms and answered by another bus participant.
    • Communication runs at 500 kbaud. If an additional bus participant with the wrong baud rate (250 kbaud) is connected, the software crashes (stack overflow).
    • MCAN0 and MCAN3 are connected. Significantly less communication takes place on MCAN0. No problem seems to occur there. The buses cannot simply be swapped.

    In the meantime, I have been able to solve part of the problem. Bus recovery is no longer performed in the CAN interrupt itself. Only a flag is set in the interrupt, and then bus recovery is called in the cyclic task. Bus recovery has also been extended. This allows startup to work if the external bus participant with the incorrect baud rate was not reset before startup. However, if it was reset, there is another crash. In this case, the bus looks like this:

    ...

    Best Regards
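
The deferral described in this post (the interrupt only sets a flag, and the cyclic task performs the recovery) can be sketched as follows. This is a minimal illustration, not SDK code: `can_isr_note_fault`, `can_cyclic_task`, `recover_channel`, `gRecoveryCount`, and `CAN_CHANNEL_COUNT` are hypothetical names, and `recover_channel` is only a placeholder for the real recovery sequence.

```c
#include <stdbool.h>
#include <stdint.h>

#define CAN_CHANNEL_COUNT 2U /* assumption: two MCAN instances in use */

/* Written by the interrupt, read and cleared by the cyclic task. */
static volatile bool gCanFault[CAN_CHANNEL_COUNT];

/* Counts recoveries so the sketch can be checked without hardware. */
static uint32_t gRecoveryCount[CAN_CHANNEL_COUNT];

/* Hypothetical placeholder for the real recovery sequence. */
static void recover_channel(uint32_t channel)
{
    gRecoveryCount[channel]++;
}

/* Called from the CAN interrupt: record the fault and return at once. */
void can_isr_note_fault(uint32_t channel, bool busOff)
{
    if ((channel < CAN_CHANNEL_COUNT) && busOff) {
        gCanFault[channel] = true;
    }
}

/* Called from the cyclic task: recover every channel that was flagged. */
void can_cyclic_task(void)
{
    for (uint32_t ch = 0U; ch < CAN_CHANNEL_COUNT; ch++) {
        if (gCanFault[ch]) {
            /* Clear first, so a fault raised during recovery is kept. */
            gCanFault[ch] = false;
            recover_channel(ch);
        }
    }
}
```

Clearing the flag before calling the recovery routine means a new fault that arrives while recovery is running is handled on the next task cycle instead of being overwritten.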

  • Hey Simon,

    Thank you for the additional information and test results.

    I am having some trouble understanding the end of your last response.

    Can you rehash the setup and flow that results in this latest crash case?

    Best Regards,

    Zackary Fleenor

  • Hi Zackary

    In the meantime, I have made further progress. The program no longer crashes, but the CAN (MCAN3) does not recover after the software was started with an incorrect baud rate.
    The bus error is detected in the CAN interrupt and passed on via a flag. Bus recovery is then performed in the cyclic task when the flag is set.
    Is this bus recovery sequence correct? Are there any examples of this?
    Here is the code:

    In CAN Interrupt:

        MCAN_getProtocolStatus(baseAddr, &statusMCan);
        MCAN_getErrCounters(baseAddr, &errCnt);
        if (statusMCan.busOffStatus == 1 ||
            statusMCan.errPassive == 1 ||
            errCnt.transErrLogCnt > 127)
        {
            gCanFault[channel] = true;
        }
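
For context on the condition above: under the CAN fault-confinement rules, a node is error passive once its transmit or receive error counter exceeds 127, and bus-off once the transmit error counter exceeds 255. An error-passive node still takes part in communication, so treating error passive as a fault that triggers recovery is a design choice rather than a protocol requirement. The classification can be sketched as below. `CanErrorState` and `can_classify` are hypothetical names, and the counters are passed as plain integers instead of the SDK structures. On real hardware, bus-off is normally read from a status flag such as `busOffStatus`, not derived from the counter.

```c
#include <stdint.h>

typedef enum {
    CAN_STATE_ERROR_ACTIVE,
    CAN_STATE_ERROR_PASSIVE,
    CAN_STATE_BUS_OFF
} CanErrorState;

/* tec and rec are the protocol-level transmit and receive error counters. */
CanErrorState can_classify(uint32_t tec, uint32_t rec)
{
    if (tec > 255U) {
        return CAN_STATE_BUS_OFF;       /* node no longer drives the bus */
    }
    if ((tec > 127U) || (rec > 127U)) {
        return CAN_STATE_ERROR_PASSIVE; /* node still communicates */
    }
    return CAN_STATE_ERROR_ACTIVE;
}
```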
    


    In the cyclic task, bus recovery (if gCanFault[channel] == true):

        HwiP_disable();
    
        /* Disable interrupts */
        MCAN_enableIntr(baseAddr, MCAN_INTR_MASK_ALL, 0U);
        MCAN_enableIntrLine(baseAddr, MCAN_INTR_LINE_NUM_1, 0U);
        MCAN_txBufTransIntrEnable(baseAddr, 1U, 0U);
    
        /* Cancel any pending TX */
        HW_WR_REG32(baseAddr + 0x104U, 0xFFFFFFFFU);
        while (HW_RD_REG32(baseAddr + 0x108U) != 0U) { }
    
        /* Clear pending interrupts */
        uint32_t pending = MCAN_getIntrStatus(baseAddr);
        MCAN_clearIntrStatus(baseAddr, pending);
    
        /* Enter SW_INIT */
        MCAN_setOpMode(baseAddr, MCAN_OPERATION_MODE_SW_INIT);
        while (MCAN_getOpMode(baseAddr) != MCAN_OPERATION_MODE_SW_INIT) { }
    
        /* Wait for bus idle (ACT=0) */
        uint32_t psr;
        uint32_t to = 50000U;
    
        do {
            psr = HW_RD_REG32(baseAddr + MCAN_PSR);
            if (--to == 0) break;
        } while ((psr & MCAN_PSR_ACT_MASK) != 0);
    
        /* Reset */
        MCAN_reset(baseAddr);
    
        /* Full reconfiguration */
        MCAN_init(baseAddr, &initParams);
        MCAN_config(baseAddr, &configParams);
        MCAN_extTSEnableIntr(baseAddr, 0);
        MCAN_extTSCounterConfig(baseAddr, MCAN_TS_PRESCALAR);
        MCAN_setBitTime(baseAddr, &bitTimes);
    
        /* Message RAM must be fully restored */
        MCAN_msgRAMConfig(baseAddr, &msgRAMConfigParams);
    
        /* Correct cache handling */
        CacheP_wbInv((void*)baseAddr, 0x8000, CacheP_TYPE_ALLD);
    
        /* Re-install filters */
        MCAN_addStdMsgIDFilter(baseAddr, 0U, &stdFiltElem0);
    
        /* Start normal operation */
        MCAN_setOpMode(baseAddr, MCAN_OPERATION_MODE_NORMAL);
        while (MCAN_getOpMode(baseAddr) != MCAN_OPERATION_MODE_NORMAL) { }
    
        /* Re-enable interrupts */
        MCAN_enableIntr(baseAddr, MCAN_INTR_MASK_ALL, 1U);
        MCAN_enableIntrLine(baseAddr, MCAN_INTR_LINE_NUM_1, 1U);
        MCAN_txBufTransIntrEnable(baseAddr, 1U, 1U);
    
        HwiP_enable();
    

    Thanks,

    Simon
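
One thing worth noting about the recovery routine above: three of its wait loops have no timeout and run while interrupts are disabled, so if the awaited condition never becomes true the firmware would stop responding. Whether that is the cause of the crash in this thread is not established. A bounded polling helper is one way to avoid such a hang. The sketch below assumes a hypothetical `poll_until` helper and a `RegReadFn` callback type, and `fake_pending_read` merely simulates a register that stays busy for a few reads. None of these are SDK functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Callback that returns the current value of some register. */
typedef uint32_t (*RegReadFn)(void *ctx);

/*
 * Poll until (value & mask) == expected, giving up after maxAttempts reads.
 * Returns true on success and false on timeout.
 */
bool poll_until(RegReadFn read, void *ctx, uint32_t mask,
                uint32_t expected, uint32_t maxAttempts)
{
    for (uint32_t i = 0U; i < maxAttempts; i++) {
        if ((read(ctx) & mask) == expected) {
            return true;
        }
    }
    return false;
}

/*
 * Simulated register for illustration: reads as 1 while the counter
 * behind ctx is non-zero (decrementing it), then reads as 0.
 */
uint32_t fake_pending_read(void *ctx)
{
    uint32_t *remaining = (uint32_t *)ctx;
    if (*remaining > 0U) {
        (*remaining)--;
        return 1U;
    }
    return 0U;
}
```

In a real routine, the callback would wrap the register read, and a `false` return would let the task re-enable interrupts, leave the fault flag set, and retry on the next cycle instead of spinning.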

  • Hi Simon,

    Thank you for providing these additional details. I have been swamped this week with various deadlines. I will review this latest information over the weekend and hope to provide a response early next week (11/25/25).

    Best Regards,

    Zackary Fleenor

  • Hi Zackary

    This issue is urgent because we need to build a customer release.
    I also came across the following statement. Can this error also be circumvented with SDK 10.00.00?

    Warning: Known bug in MCU+ SDK 10.00.00
    In SDK 10.00.00, TI has confirmed a bug in MCAN reset/mode change:
    • If the bus-off interrupt is active and an OpMode change or reset occurs at the same time,
    the CPU may run into an exception or the peripherals may remain stuck in an invalid state.
    • Cause: The MCAN needs time to go through the transition state. If it switches back to INIT mode at this moment, synchronization will fail.
    According to TI, this bug has been fixed in SDK 10.01.x and newer.

    TI describes internally (bug ID: MCAN-506):
    “During bus-off → normal mode transition, MCAN may respond with external abort if interrupts remain enabled.”

    Best regards 
    Simon

  • Hi Zackary

    I now have a solution with a minimal recovery function in the 5ms task, using only MCAN_setOpMode(baseAddr, MCAN_OPERATION_MODE_NORMAL). Now the error no longer occurs.

    Best regards 

    Simon

  • Hi Zackary

    The problem has occurred again. If the PCAN dongle is reset with an incorrect baud rate and the device is then switched on, the firmware no longer responds and crashes.

    Is there an example of how bus recovery should be done correctly?

    Best regards

    Simon

  • Hello Simon,

    Let me review this internally and get back to you.

    Best Regards,

    Zackary Fleenor

  • Hey Simon,

    I am attaching the official Bosch MCAN description on CAN Bus Recovery below:

    Busoff Recovery Handling

    You can also refer to this previous thread regarding the same:

    [FAQ] TMS320F28P559SJ-Q1: The procedure to recover from CAN Bus-Off (C2000 microcontrollers forum, TI E2E support forums)

    I would recommend doing this testing with the latest SDK, available from the MCU-PLUS-SDK-AM263X Software development kit (SDK) page on TI.com.

    We are happy to continue to assist you through this issue.

    Best Regards,

    Zackary Fleenor
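
As background to the bus-off recovery procedure referenced above: as described in the Bosch M_CAN documentation, once software clears the initialization request after bus-off, the controller waits for 129 occurrences of bus idle (each 11 consecutive recessive bits) before it resumes normal operation. That sets a lower bound on recovery time which depends on the bit rate. The sketch below computes that bound. `busoff_recovery_min_us` is a hypothetical helper name, and the result assumes an otherwise quiet bus.

```c
#include <stdint.h>

#define BUS_IDLE_OCCURRENCES    129U /* idle sequences required */
#define RECESSIVE_BITS_PER_IDLE 11U  /* recessive bits per sequence */

/*
 * Minimum bus-off recovery time in microseconds for a given nominal
 * bit rate, rounded up. Returns 0 for an invalid bit rate of 0.
 */
uint32_t busoff_recovery_min_us(uint32_t bitrate_bps)
{
    if (bitrate_bps == 0U) {
        return 0U;
    }
    uint64_t bits = (uint64_t)BUS_IDLE_OCCURRENCES * RECESSIVE_BITS_PER_IDLE;
    return (uint32_t)((bits * 1000000ULL + bitrate_bps - 1U) / bitrate_bps);
}
```

At the 500 kbaud used in this thread, the bound is 1419 bit times, or about 2.84 ms, so a recovery requested from a 5 ms task cannot be expected to complete immediately. If another participant keeps disturbing the bus, for example one configured for 250 kbaud, the idle sequences may be interrupted and the node may go bus-off again.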