TMS320F28335: CAN - Getting stuck on a bus off interrupt.

Part Number: TMS320F28335

Hi TI Team,

I am having an issue with my CAN bus: when a Bus Off interrupt fires, the node gets stuck. CAN communication stops at that point, and the only way I can recover is a processor reset.

Our Setup:

I am using a TI 28335. I followed the configuration sequence in section 3.2 of the SPRU 074F document:

-        Set GPIOs for CAN B

-        Software reset of the device

-        Set hardware for ECAN

-        Enable Auto Bus On (ABO) mode

-        Configure eCAN RX and TX pins

-        Configure the bit clock for 500 kbit/s

-        Configure transmit and receive mailboxes

-        Clear global interrupt flags

-        Set up interrupts

  • RX mailboxes are set to create interrupts on message arrival
  • TX mailboxes create interrupts on timeout

-        Then away we go
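For the 500 kbit/s clock step, the following host-side sketch computes the nominal eCAN bit rate from the CANBTC fields. It assumes a 150 MHz SYSCLKOUT, an eCAN module clock of SYSCLKOUT/2, and the relation bit rate = module clock / ((BRPREG + 1) x (1 + (TSEG1REG + 1) + (TSEG2REG + 1))), which is how I read the eCAN reference guide. The function name and the field-width checks are assumptions, so verify them against your device documentation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: nominal eCAN bit rate from the CANBTC fields.
 * Assumptions: module clock = SYSCLKOUT / 2, BRP = BRPREG + 1,
 * TSEG1 = TSEG1REG + 1, TSEG2 = TSEG2REG + 1, plus one sync quantum. */
uint32_t ecan_bit_rate(uint32_t sysclkout_hz, uint32_t brpreg,
                       uint32_t tseg1reg, uint32_t tseg2reg)
{
    uint32_t can_clk_hz, brp, tq_per_bit;

    /* Assumed field widths: BRPREG 8 bits, TSEG1REG 4 bits, TSEG2REG 3 bits */
    assert(brpreg <= 255u && tseg1reg <= 15u && tseg2reg <= 7u);

    can_clk_hz = sysclkout_hz / 2u;                      /* eCAN module clock */
    brp        = brpreg + 1u;                            /* module clocks per time quantum */
    tq_per_bit = 1u + (tseg1reg + 1u) + (tseg2reg + 1u); /* sync + TSEG1 + TSEG2 */

    return can_clk_hz / (brp * tq_per_bit);
}
```

As an example, BRPREG = 9, TSEG1REG = 10 and TSEG2REG = 2 at 150 MHz give 75 MHz / (10 x 15) = 500 kbit/s. With BRPREG = 4, the same segments give 1 Mbit/s. These values are illustrative and are not necessarily the ones used in this project.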

Communication seems to work fine most of the time. However, when we subject our equipment to EMI noise, so many errors occur on the bus that the node goes into bus-off mode. Once this interrupt fires, we stop receiving RX interrupts, even though another device is still sending messages (and getting no responses).

This is my ISR code. I am not sure what I am missing.

void ECAN0INTB(void)
{
    struct ECAN_REGS ECanbShadow;
    
#ifdef DEBUG_GPIO
    GPIOManager_setDebugGpioState((DebugGpioIdType)DebugGpio_CAN_TX_ISR, On);
#endif
    
    canOutgoingIsrCounter++;

    //-----------------------------------------------------------
    // Clearing CANGIF registers - SPRUEU1 document section 3.4.3.3 Step 1a and 1b-i to 1b-iv
    // Some interrupt signals, such as transmit timeout or bus off, require special processing.
    // The flags are then cleared by writing 1s to the corresponding bits. This is done by copying the
    // register content to the shadow here at the top. After the signals of interest have been processed,
    // we write the content back to the register, which in turn clears every bit that was set.
    //-----------------------------------------------------------
    // Copy CAN control register to a shadow variable in a single 32-bit wide memory access
    ECanbShadow.CANGIF0.all = ECanbRegs.CANGIF0.all;
    ECanbShadow.CANMC.all   = ECanbRegs.CANMC.all;

    // In the event the message time stamp counter overflowed clear the indication bit.
    // This should never happen as the time stamp counter is reset to 0 before every message
    // transmit operation used to detect transmit timeouts.
    if (ECanbShadow.CANGIF0.bit.TCOF0 == 1)
    {
        canTimestampOverflowCount++;

        // This indicates that the time stamp counter MSB got set
        ECanbShadow.CANGIF0.bit.TCOF0 = 1;
        ECanbShadow.CANMC.bit.TCC = 1;
    }


    // Handle Bus Off Mode - if the Tx error counter ever exceeds 255, the device goes into bus-off mode.
    // Here we handle it directly and return to bus-on, although the device should recover on its own
    // because we set it up for auto bus-on (ABO) mode.
    if (ECanbShadow.CANGIF0.bit.BOIF0 == 1)
    {
        canBusOffModeCount++;

        // Clear CAN device bus off interrupt
        ECanbShadow.CANGIF0.bit.BOIF0 = 1;

        // Toggle CCR bit to return device to normal mode.
        CCR_Enable();
        CCR_Disable();
    }

    // Handle Transmit message timeout
    if (ECanbShadow.CANGIF0.bit.MTOF0 == 1)
    {
        canTxMailboxTimeoutCount++;

        // Set timeout flag to allow application to abort any outgoing response in progress.
        transmitTimedOut = true;

        // Clear Timeout bits
        ECanbShadow.CANTOS.all = ECanbRegs.CANTOS.all;
        ECanbRegs.CANTOS.all   = ECanbShadow.CANTOS.all; // Write the same value we read above back to the register.
                                                         // Every bit that was set (which is what got us into this ISR)
                                                         // is written back as 1, which clears it.
        // Force out of Bus Off mode
        CCR_Enable();
        CCR_Disable();

        EALLOW;
        ECanbRegs.CANMC.all = ECanbShadow.CANMC.all;
        ECanbRegs.CANTSC = 0;   // clear the running counter to make sure we do not get stuck in the ISR
        EDIS;
    }

    // Clear CANGIF register - write out the original content to clear the bits.
    ECanbRegs.CANGIF0.all = ECanbShadow.CANGIF0.all;
    EALLOW;
    ECanbRegs.CANMC.all = ECanbShadow.CANMC.all;
    EDIS;

    // Clear & Re-Enable CPU Interrupts - SPRUEU1 document section 3.4.3.3 Step 2 to 4
    PieCtrlRegs.PIEACK.bit.ACK9 = 1; // Enable PIE to drive a pulse into the CPU
    IER |= M_INT9;                   // Re-enable the CAN interrupt INT9
    EINT;                            // Re-enable CPU Interrupts
    
#ifdef DEBUG_GPIO
    GPIOManager_setDebugGpioState((DebugGpioIdType)DebugGpio_CAN_TX_ISR, Off);
#endif
}
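The shadow/write-back scheme described in the ISR comments depends on write-1-to-clear (W1C) semantics. The host-side model below is plain C with no TI headers, and its struct, masks and function names are hypothetical. It illustrates two consequences of W1C. First, setting a bit in the shadow copy does nothing to the hardware flag until the shadow is written back. Second, writing the snapshot back clears only the flags that were set when it was read, so an event latched in the meantime stays pending.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-side model of a write-1-to-clear (W1C) flag register such as
 * CANGIF0. Bit positions below are invented for the model only. */
#define MODEL_BOIF (1u << 0)   /* stands in for BOIF0 */
#define MODEL_MTOF (1u << 1)   /* stands in for MTOF0 */

typedef struct {
    uint32_t flags;            /* bits currently latched by the "hardware" */
} w1c_reg_t;

/* Writing 1 to a bit clears it; writing 0 leaves it unchanged. */
void w1c_write(w1c_reg_t *reg, uint32_t value)
{
    assert(reg != NULL);
    reg->flags &= ~value;
}

/* Mirrors the ISR pattern: snapshot the register into a shadow, "handle"
 * it, then optionally write the snapshot back. 'raised_mid_isr' models an
 * event latched after the snapshot was taken. Returns the flags still set. */
uint32_t isr_model(w1c_reg_t *reg, uint32_t raised_mid_isr, int write_back)
{
    uint32_t shadow;

    assert(reg != NULL);
    shadow = reg->flags;               /* single read into the shadow */

    if (shadow & MODEL_BOIF) {
        shadow |= MODEL_BOIF;          /* touches the local copy only */
    }

    reg->flags |= raised_mid_isr;      /* new event arrives during the ISR */

    if (write_back) {
        w1c_write(reg, shadow);        /* clears only bits seen at read time */
    }
    return reg->flags;
}
```

In this model, `isr_model(&reg, 0u, 0)` leaves `MODEL_BOIF` set. That is the state the hardware is in at any point where only the shadow has been modified.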

  • I presume you make this assessment after the EMI disturbance has stopped and after allowing the time required to recover from bus-off (128 occurrences of 11 consecutive recessive bits).

    1. How do you ascertain that the node has come out of bus-off?
    2. Is the behavior any different if you make ABO = 0 and you recover from BO manually?
    3. Excessive noise can cause a soft error (say, by causing a bit to flip) that can be recovered by reset. Does any other peripheral on the device exhibit any anomaly or is it only the CAN module that is misbehaving? 

    This doesn’t change the direction of this discussion, but the document you should be referring to is http://www.ti.com/lit/sprUI07
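The standard CAN fault-confinement rules (ISO 11898-1) behind the questions above fit into a few lines of C. This is a host-side sketch, not eCAN driver code, and the function and enum names are hypothetical. It classifies a node from its transmit and receive error counts (TEC/REC) and computes the minimum bus-off recovery time. At 500 kbit/s, 128 x 11 = 1408 bit times of 2 µs each come to about 2.8 ms. On the eCAN, the counters appear to be exposed in CANTEC/CANREC and the bus-off state in CANES.BO. Please confirm those register names in the reference guide.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { CAN_ERROR_ACTIVE, CAN_ERROR_PASSIVE, CAN_BUS_OFF } can_fault_state_t;

/* Fault-confinement state from the error counters (ISO 11898-1 rules).
 * Only the transmit error counter can drive a node bus-off. */
can_fault_state_t can_fault_state(uint32_t tec, uint32_t rec)
{
    if (tec > 255u) {
        return CAN_BUS_OFF;
    }
    if (tec > 127u || rec > 127u) {
        return CAN_ERROR_PASSIVE;
    }
    return CAN_ERROR_ACTIVE;
}

/* Minimum bus-off recovery time in microseconds: 128 occurrences of
 * 11 consecutive recessive bits, i.e. at least 1408 bit times. */
uint32_t bus_off_recovery_us(uint32_t bit_rate_bps)
{
    const uint64_t bit_times = 128u * 11u;

    assert(bit_rate_bps > 0u);
    /* Round up so the result is never shorter than the real minimum. */
    return (uint32_t)((bit_times * 1000000u + bit_rate_bps - 1u) / bit_rate_bps);
}
```

On a busy bus, recovery takes longer than this minimum, because the 11-bit recessive sequences have to be observed between other nodes' frames.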

  • On a slightly different note, we have published a new application report, http://www.ti.com/lit/spracq3, which provides a tool to introduce an error at a chosen bit location. Please check whether this tool would help you generate a bus-off condition in a controlled fashion (as opposed to an EMI disturbance, whose impact on the CAN module is not precisely controlled).

     

  • This is great news. I was looking for something like this, and I will try it when time allows. Based on a quick scan of the document, it appears to apply to newer devices with the DCAN. We are using the 28338 with the eCAN. Will this work for the 28338?

    We use a CAN bus analyzer that only lets us monitor bus traffic and inject bit-stuffing errors. It has no way to create other errors to validate the error-handling code. The analyzer manufacturer has a more sophisticated tool for this, but it is outrageously expensive (many thousands of dollars). My boss, in his infinite wisdom, said: these are just 2 wires, how difficult can it be to create noise? Just put a paper clip between the pins and you are done, and now I can recreate mailbox timeouts and bus-off mode. I will call the analyzer company and sell them a few paper clips for a few thousand. LOL.

  • Once I was able to force the application through the ISR that handles the error condition, I inserted a few counters here and there to figure out what was happening.

    Error 1 in our setup:

    • I set the Auto Bus On bit at the beginning of the initialization
    • I did not realize that a later function call was overwriting it, which disabled ABO mode. DUH!

    Error 2 in our handling of the Bus Off Mode:

    • To clear this, we must write 1 to bit BOIF0
    • Then toggle the CCR bit. However, I only set the bit in the shadow buffer before calling the CCR toggle. The shadow was not written back to CANGIF0 until the end of the ISR, after the toggle:
        // Handle Bus Off Mode - if the Tx error counter ever exceeds 255, the device goes into bus-off mode.
        // Here we handle it directly and return to bus-on, although the device should recover on its own
        // because we set it up for auto bus-on (ABO) mode.
        if (ECanbShadow.CANGIF0.bit.BOIF0 == 1)
        {
            canBusOffModeCount++;
    
            // Clear CAN device bus off interrupt
            ECanbShadow.CANGIF0.bit.BOIF0 = 1;
            ECanbRegs.CANGIF0.all = ECanbShadow.CANGIF0.all;   // THIS WAS MISSING ---->>>> BUG!
    
            // Toggle CCR bit to return device to normal mode.
            CCR_Enable();
            CCR_Disable();
        }
    

    The combination of these 2 mistakes (Auto Bus On never enabled, plus our incorrect handling of the Bus Off interrupt) caused our device to go bus-off and never recover.
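One common way the first mistake happens is a later init step that assigns a complete new value to CANMC instead of doing a read-modify-write. That silently clears the ABO bit that was set earlier. The host-side sketch below contrasts the two styles on a plain integer standing in for the register. The function names are hypothetical, and the bit positions (ABO = bit 7, DBO = bit 10, SCB = bit 13) are my reading of the CANMC layout in the DSP2833x headers, so check them against the header you build with.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions assumed from the CANMC layout; verify against your header. */
#define CANMC_ABO_MASK (1u << 7)   /* auto bus-on */
#define CANMC_DBO_MASK (1u << 10)  /* data byte order */
#define CANMC_SCB_MASK (1u << 13)  /* eCAN (32-mailbox) mode */

/* Hypothetical init step that assigns a complete value: any bit not in
 * 'value', including ABO, ends up cleared. */
uint32_t configure_blind(uint32_t canmc, uint32_t value)
{
    (void)canmc;               /* previous contents are ignored */
    return value;
}

/* Read-modify-write: only the requested bits change, so ABO is preserved. */
uint32_t configure_rmw(uint32_t canmc, uint32_t set_bits, uint32_t clear_bits)
{
    assert((set_bits & clear_bits) == 0u);   /* reject contradictory requests */
    return (canmc & ~clear_bits) | set_bits;
}
```

On the target, the same read-modify-write would start by loading the shadow from the live register (`ECanbShadow.CANMC.all = ECanbRegs.CANMC.all;`). It then sets or clears the wanted bits and writes the shadow back inside `EALLOW`/`EDIS`, as the ISR above already does for CANMC.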

  • Based on a quick scan of the document it appears that it is applicable to newer devices with the DCAN. We are using the 28338 with the eCAN. Will this work for the 28338?

    This tool itself indeed runs only on devices with DCAN. FYI, you can buy the F280049 Launchpad for $30 https://www.ti.com/tool/LAUNCHXL-F280049C. If you have LabVIEW, we have included that version as well. Once you have our tool running (either on F280049 or LabVIEW), you can use it to analyze the error behavior of any MCU with CAN (including the 28335 you are using). Please do try our tool and provide your feedback.