This thread has been locked.

If you have a related question, please click the "Ask a related question" button in the top right corner. The newly created question will be automatically linked to this question.

CAN failing to bus-off

Other Parts Discussed in Thread: HALCOGEN

Hi,

Using a TMS470M via HALCoGen generated code, I have some basic routines for sending and receiving CAN packets. However, during certain error conditions things aren't going as expected...

If I connect the TMS470M to a 'dead' CAN Bus, then transmit a message, I get a Bus Warning interrupt, but the peripheral continues to retry the transmission continuously without ever entering Bus-Off state. This is confirmed by putting a scope on the line: whereby continual retransmissions are shown without a break in between.

I have enabled Auto Bus On via HALcoGen, but set tAbo to 1 second (1000000000).

Is there any other registers which could override the Bus-off functionality, or is there any registers which can be set to specify a number of retries before giving up and dumping the message?

Although it probably sounds weird transmitting into a dead bus, it's actually a testcase to see how well the device handles (and recovers from) error conditions! In my case the dead network is actually a 2-node network with the second node's transceiver set to passive.

thanks

Mat

  • Hi Mat,

    1) What is the behavior when you disable Auto-Bus-On ?
    2) Inside the BusOff warning interrupt, what is the state of ES register(is BusOff bit set)?
    To stop after a specific number of retries, you can have a counter in the Bus-Off interrupt and set the Init bit once the number is reached. I'm not aware of a register which can be used for this. I will try to take a closer look at this behavior.


    Thanks and Regards,

    Vineeth

  • Hi Vineeth,

    Thanks for the response. In answer to your questions:

    1). Disabling Auto Bus-on has no effect - the message continues to transmit indefinitely
    2). I only get a single Interrupt with ES bit 64 set (bus warning). No further interrupts are produced.

    Mat
  • As an additional testcase, I disconnected my receive CAN device, transmitted a message, then waited 10 minutes before re-connecting the receiver. It immediately received the message.

    My concern is there is nothing (other than a CAN Bus Warning) to indicate an error event or delay has occurred at the transmitter side. It's also a big worry because the transmitter has the capability to shut down an entire CAN bus in this manner.

    thanks

    Mat
  • Hi Mat,

     

    This looks strange. What is the value of the Error count register(DCAN ERRC) after the 10-minute-test you did(or after a normal bus-off)?

    I will try to recreate the problem here with our set-up.

     

    Thanks and Regards,

    Vineeth

     

    PS: Please expect delays in forum responses due to the holiday season.

  • Happy New Year, apologies for the delay in responding!

    Will check the ERRC value tomorrow when I'm back on the hardware. Would be grateful if you could try the problem at your end.

    Thanks,

    Mat

  • Hello Vineeth,

    The moment the EWarn interrupt fires, the ERRC is 96.

    Checking ERRC after this, I see it rapidly rise to 128, but then never get any higher - just sticks at 128.

    This would certainly explain why the Bus-Off event isn't happening, now we need to figure out why the counter sticks at 128!

    Mat
  • Hi Mat,

     

    If you check the CAN spec (pg.63), it says

    Start-up / Wake-up: If during start-up only 1 node is online, and if this node transmits some message, it will get no acknowledgment, detect an error and repeat the message. It can become ’error passive’ but not ’bus off’ due to this reason.”

     

    I think what we are observing is the expected behavior. In your case, the CAN node gets stuck at “Error Passive” state( ERRC=128 ) since the node is unable to detect any other node in the bus. The node is supposed to try and resend the messages until it receives an ACK(i.e. when a new node is added to the bus). Still, I haven’t seen any documentation which clarifies this point.

     

    Do you agree with this? Do you have any other findings?

     

    Thanks and Regards,

    Vineeth

  • Hi Vineeth,

    This is a good find, and makes sense. However, can I ask where you found the Error Passive State to be ERRC=128?

    I'm happy to accept this as a valid error state, but need to clarify how to detect it in software, i.e. which registers can be read to detect the problem. I can't see anywhere in the RM where it says ERRC=128 is anything other than just an incrementing count.

    thanks

    Mat

  • Hi Mat,

     

    The ERRC value of 128 for ‘error Passive” state is mentioned in the CAN spec itself (pg 62: Point no. 9).

    It says : “A node is ‘error passive’ when the TRANSMIT ERROR COUNT equals or exceeds 128, or when the RECEIVE ERROR COUNT equals or exceeds 128.”

     

    I’m not sure if there is a way to detect the “error passive” state other than to read the ERRC value and check if it is >= 128. I’ll check if there is a better way to do this.

     

    Thanks and Regards,

    Vineeth

  • Hi Mat,

     

    The ‘error passive’ state can be detected by checking the Error and Status Register(DCAN ES). The EPass bit (bit 5) will be set when the node moves to this state. Please check if this works. You can refer the TRM for more info on this(DCAN > Control Registers > ES register).

     

    Thanks and Regards,

    Vineeth

  • Hi Vineeth,

    I can confirm that the EPass bit does get set in the ES register upon this fault occurring. Just for reference, however, it's worth noting that the setting of this bit doesn't trigger an error interrupt, hence to correctly read it, you need to:

    1). Get a Bus Warning interrupt

    2). Wait a while (long enough for ERRC to go from 64 to 128)

    3). Read the ES register

    So, in this respect, you have answered my question: thanks!

    I'm afraid I have one more question, if I may?...

    Once this error condition is correctly detected, my next task is to reset the DCAN peripheral to recover both the bus and the device for continued usage. However, this is proving to be not trivial. 

    My initial attempts were to hit the  SWR reset bit In DCAN CTL. This was done by: CTL->Init bit = 1; CTL->SWR bit = 1; CTL->Init bit = 0. Follow on from that, I'm also calling the canInit() HALCoGen routine, which looks to initialise everything including the message boxes. However either solution is clearing this error state. After reset ERRC is still set to 128 and the ES still reports EWarn and EPass.

    Any ideas how to reset the peripheral?

    Mat

     

     

  • Hi Mat,

     

    When you call canInit(), it resets the CAN controller completely. But I don’t think it resets the message boxes. This means the TxRqst bit of the transmit message should still be set. So as soon as the CAN is out of INIT state, it will again start trying to resend the message, and again get stuck at the EPass state(since there is no other node in the bus). Can you check the value of Transmission Request X Register(DCAN TXRQ X)?(Alternatively, you can call canIsTxMessagePending() for this). After a “full” reset, this register should be 0. If this register is not 0, then you will have to manually clear the TxRqst bit of the configured message box(there is no API for this). This is my guess about what is happening. Please check if this was the problem.

     

    I have some questions about how you are waiting for the Epass bit. Are you waiting inside the EWarn interrupt? Have you taken into account the possibility that the node can move out of EWarn state to normal state without hitting EPass(or will it wait indefinitely for EPass bit)?

     

    Thanks and Regards,

    Vineeth

     

  • Thank Vineeth,

    The TXRQ X register goes from 1 before reset to 0 after reset, so I'm afraid its not this.

    With regards your other question: my ISR just raises a software flag which is later processed in the main loop: this is actually why I discovered the wait period, because the time between EWarn ISR and me actually reading the registers is a little variable, hence I've had to re-interrupt using my console to effectively read the ES EPass error.

    Mat
  • Hi Mat,

     

    This is interesting. But I’m not convinced that the TxRqst bit of the messages gets reset in the message boxes. Maybe there is a coherency problem between the register and the message after a reset. I’m not sure. I noticed the following note in the TRM in the description for MsgVal bit.

    Note: The CPU should reset the MsgVal bit of all unused Messages Objects during the initialization before it resets bit Init in the CAN Control Register. MsgVal must also be reset if the messages object is no longer used in operation.

    Can you try this. Clear the MsgVal bit(or the TxRqst bit) of the active messages before you do a SW reset of the DCAN. This will get set again the next time you try to send a message. This should solve the resending problem.

    I sorry; it will take some time for me to get the whole setup. So i haven't been able to try it myself.

     

    Thanks and Regards,

    Vineeth

  • I've read the IF1STAT BUSY bit before and after reset - it's set to 0 on both occasions, so the message box looks like it's cleared and the actual peripheral has taken over. The IF1CMD bits are set to 0xB7 before reset, then 0x87 after reset. Hence they are showing continued activity: with TxRqst/NewDat flag being set, as is Data A/B.

    However, it's one thing being able to see this, but it's another to actually reset it: most of these registers cannot be over-written. I've tried reading them to clear (I've even tried forcing a 0 write), to no avail. I've done various sets/resets of registers on both sides of the reset, but nothing is clearing the ERRC/ES:


    if (DCAN1->ES & 0x20) { //read the values for debugging a = DCAN1->EERC; cliWriteStr((int8_t*)"CAN1: Error Passive, resetting..."); DCAN1->IF1CMD = DCAN1->IF1CMD; DCAN1->IF1MCTL = 0; DCAN1->TXRQX = DCAN1->TXRQX; DCAN1->NWDATX = DCAN1->NWDATX; DCAN1->MSGVALX = DCAN1->MSGVALX; //peripheral reset DCAN1->CTL &= 0x01; //put the device into initialisation mode DCAN1->CTL &= 0x8000; //perform a software reset DCAN1->CTL &= ~(0x01); //clear the initialisation mode //full re-init canInit();//; //read the values for debugging a = DCAN1->EERC; //still shows 128 b = DCAN1->ES; //still shows EWarn and EPass b = DCAN1->TXRQX; //shows 0 }

     

  • Hi Mat,

    The TXRQX register, NWDATX register and MSGVALX register are read-only registers. They read and show the value of the actual message objects in the message RAM.  Only the Interface registers (IF1 and IF2) can write back to the message RAM. In order to clear the TxRqst bit in a message object, you will have to write to the message object using the IF register as if you were writing to the message RAM for a new transmission(but with TxRqst bit value as 0). You can refer to the HALCoGen canTrasmit() API. The code to clear the TxRqst bit should look(i haven't tested it myself) like the following:

    while((node->IF1STAT & 0x80U) == 0x80U); //wait till IF1 is  not busy

    node->IF1MCTL &= 0xFFFFFEFFU; //clearing TxRqst bit

    node->IF1CMD = (0x90U); //setting the IF register to write to the message-box

    node->IF1NO = theMessageBoxNumber ; //writing to the required message-box in the message RAM

    Please see if this works.

    Thanks and regards,

    Vineeth

    PS: I may not be available during the weekend.

  • Hi Mat,

     

    I tried the experiment here with my setup. I observed the following.

    • ·        When the CAN node tries to transmit and there is no other node in the bus, it will move to the EPass state. In this state, it continues to retransmit the message until an acknowledgement is received (i.e. another node enters the bus)
    • ·        In the EPass state, if canInit() is called, it re-initialises the message-boxes (only the ones that are activated in the GUI) . No further transmission happens until canTransmit() is called again.
    • ·        In the EPass state, if SW reset is performed (1. Set init bit. 2. Set SW bit. 3. Clear init bit), all the TxRqst flags are reset (not just in the registers) . No further transmission happens until canInit() and canTransmit() is called again.

     

    I don’t know why SW reset didn’t work. I’ll check more on why this didn’t work for you.

    For canInit(), I think you may have not activated that particular message-box in the GUI. Can you check this?

     

    Thanks and Regards,

    Vineeth

  • Hi Mat,

     

    I just noticed that you weren’t doing SW reset correctly. It should be “|” and not “&” as shown below.

    //peripheral reset    

    DCAN1->CTL |= 0x01;      //put the device into initialisation mode    

    DCAN1->CTL |= 0x8000;        //perform a software reset    

    DCAN1->CTL &= ~(0x01);   //clear the initialisation mode

     

    Thanks and Regards,

    Vineeth

  • ha ha ha, it's always the simple ones which get you in the end!

    Many Thanks Vineeth: now have CAN error detection and correction functioning as expected!

    Mat

  • That's great. Glad I could be of help. :)


    I'd still like to know why canInit() wasn't clearing the message-boxes for you. Were you able to get it working with just canInit() or did you have to use SW reset?
    Regards,
    Vineeth
  • Needed both peripheral reset and canInit() to get things up and running.
  • For me, canInit() was enough to re-initialize all configured mail-boxes. I still don't know why it isn't working for you. Anyway, this was a good thread from the beginning. Learned a lot many things.

    Thanks and Regards,

    Vineeth