TMS320F28379D: f2837xD - CANbus error recovery

Part Number: TMS320F28379D

Hello,

I'm looking at how I might best recover from a CANbus error situation.

I currently have two f2837xD devices communicating over CANbus. Under certain failure modes, the two devices end up trying to send and receive CAN messages with the same identifiers. This inevitably causes CAN bus errors, and one of the devices locks up when the error counter exceeds the threshold. I can detect this via an interrupt once the errors exceed the threshold, but by then, from my reading of the manual, it seems I need to perform a CPU reset to recover and clear the flags, which is a bit drastic. Below is how we trap errors currently:

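if (statusB == CAN_INT_INT0ID_STATUS) // a status interrupt is pending
{
    // Read the controller status (error and status register)
    status_inner = CANStatusGet(CANB_BASE, CAN_STS_CONTROL);

    // Ignore the RxOK bit; treat anything other than 7 or 0 as an error
    if (((status_inner & ~(CAN_ES_RXOK)) != 7) &&
        ((status_inner & ~(CAN_ES_RXOK)) != 0))
    {
        errorFlag = 1;
    }
}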

i.e. by the time errorFlag goes to 1, it is too late to do anything about it and the CAN port is locked up.

I think this code snippet was based on code a colleague of mine was given or shown by the TI team some time ago.

I can't see an easy way to detect bus errors as they arise and, say, clear the error register while taking action to stop the channel from transmitting (it seems that counter also needs a CPU reset).

Are there any examples or suggestions for handling CANbus errors and being able to halt the bus before retrying and/or resetting the CAN port without having to reset the whole CPU?

regards

Steve

  • the two devices end up trying to send and receive CAN messages with the same idents,

    First of all, this is a big no-no. You cannot have two nodes on the bus transmitting the same ID under any circumstances. 

    it seems I need to perform a CPU reset to recover the situation and clear the flags, which is a bit drastic. 

    There is no need to reset the CPU to recover from a bus-off. You could either recover automatically or manually by way of setting/clearing the Init bit. In your case, you cannot auto-recover because the nodes will try to retransmit identical IDs again and the same cycle would repeat again. Please point me to the section in the manual that mandates a device reset to come out of bus-off.

    Bus-off is a very severe error condition. In a properly designed/configured network, communication errors should be rare. It could happen due to external noise, but the bus should recover on its own once the disturbances vanish. That is how the protocol is designed. Common reasons for errors during communication are mismatched bit-rates between nodes and electrical noise.
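For anyone wanting to react before bus-off is reached, here is a minimal sketch of classifying the status value read in the ISR. It assumes the usual D_CAN layout of the error and status register (LEC in bits 2:0, TxOK bit 3, RxOK bit 4, EPass bit 5, EWarn bit 6, BOff bit 7); please check these positions against the TRM for your device. The names `can_health_t`, `can_classify_status` and `can_last_error_code` are made up for illustration and are not part of driverlib.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit layout of the CAN error and status register (verify in the TRM). */
#define ES_LEC_MASK 0x07u /* last error code, bits 2:0 */
#define ES_EPASS    0x20u /* error passive */
#define ES_EWARN    0x40u /* an error counter reached the warning limit */
#define ES_BOFF     0x80u /* bus-off */

/* Hypothetical severity levels, ordered from healthy to worst. */
typedef enum {
    CAN_HEALTH_OK,
    CAN_HEALTH_WARNING,
    CAN_HEALTH_PASSIVE,
    CAN_HEALTH_BUS_OFF
} can_health_t;

/* Return the most severe condition flagged in the status value. */
can_health_t can_classify_status(uint32_t es)
{
    if (es & ES_BOFF) {
        return CAN_HEALTH_BUS_OFF;
    }
    if (es & ES_EPASS) {
        return CAN_HEALTH_PASSIVE;
    }
    if (es & ES_EWARN) {
        return CAN_HEALTH_WARNING;
    }
    return CAN_HEALTH_OK;
}

/* Extract the 3-bit last error code (0 = no error, 7 = no bus event since last read). */
uint32_t can_last_error_code(uint32_t es)
{
    uint32_t lec = es & ES_LEC_MASK;
    assert(lec <= 7u); /* always true after masking; documents the range */
    return lec;
}
```

With something like this, the ISR could stop queuing transmissions as soon as it sees CAN_HEALTH_WARNING or CAN_HEALTH_PASSIVE, rather than waiting until the node has already gone bus-off.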

  • Hi Hareesh, thanks for the prompt response. Yes, I know having two nodes transmit the same CAN IDs is a no-no. It shouldn't happen, but we have identified a failure mode by which it might, and we will endeavour to ensure it is fixed. Accepting that a failure is possible, the issue was how to react after the event. Obviously we would sort out the message IDs before restarting, but it wasn't obvious to me how to clear the errors. We don't have issues other than this self-generated silliness with the bus IDs; it's very reliable normally and I have stress tested it for several hours without problems.


    The bit I read in the manual relating to the CPU reset is below. I interpret the SysRn as being a CPU reset?:

    So you say if I just reinitialize the CAN peripheral it will sort itself out? 

    regards

    Steve

  • I interpret the SysRn as being a CPU reset?:

    True, but what it really means is that that bit will be reset to its reset value by a CPU reset. For example, LEC == 7 after a device reset.

    So you say if I just reinitialize the CAN peripheral it will sort itself out? 

    Correct. You can come out of bus-off (BO) by clearing the Init bit (after waiting for 129 x 11 recessive bit times).
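As a rough worked example of the 129 x 11 figure above: that is 1419 bit times, so the minimum recovery time depends only on the bit rate. The sketch below computes it and shows clearing Init on a register value passed in by pointer. It assumes Init is bit 0 of the CAN control register, and the function names are hypothetical rather than driverlib calls. Whether the hardware counts the recessive bits itself once Init is cleared, or the software must wait first, should be checked against the TRM.

```c
#include <assert.h>
#include <stdint.h>

#define BUS_OFF_IDLE_SEQUENCES 129u /* occurrences of bus idle required */
#define BITS_PER_IDLE_SEQUENCE 11u  /* consecutive recessive bits per occurrence */
#define CAN_CTL_INIT_BIT       0x1u /* assumed position of Init in CAN_CTL */

/* Minimum bus-off recovery time in microseconds, rounded up. */
uint32_t can_bus_off_recovery_us(uint32_t bit_rate_hz)
{
    assert(bit_rate_hz > 0u);
    uint64_t bit_times = (uint64_t)BUS_OFF_IDLE_SEQUENCES * BITS_PER_IDLE_SEQUENCE;
    return (uint32_t)((bit_times * 1000000u + bit_rate_hz - 1u) / bit_rate_hz);
}

/* Clear Init in the given control register, leaving the other bits untouched. */
void can_clear_init(volatile uint32_t *can_ctl)
{
    *can_ctl &= ~CAN_CTL_INIT_BIT;
}
```

At 500 kbit/s this works out to 1419 x 2 us = 2838 us, so recovery takes a few milliseconds at typical bit rates. In the situation described in this thread, the conflicting message IDs would need to be fixed before Init is cleared, otherwise the same errors would simply recur.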