TMS320F28379D: f2837xD - CANbus error recovery

Steve McDonald

Part Number: TMS320F28379D

Hello,

I'm looking at how I might best recover from a CANbus error situation.

I currently have two f2837xD devices communicating over CANbus. Under certain failure modes, the two devices end up trying to send and receive CAN messages with the same idents, which inevitably causes CAN bus errors, and one of the devices locks up when the error counter exceeds the threshold. I can detect this once the errors exceed the threshold and create an interrupt, but by then, from looking at the manual, it seems I need to perform a CPU reset to recover the situation and clear the flags, which is a bit drastic. Below is how we trap errors currently:

if (statusB == CAN_INT_INT0ID_STATUS) // a status interrupt is pending
{
    status_inner = CANStatusGet(CANB_BASE, CAN_STS_CONTROL);

    // Mask off RXOK; any remaining value other than 0 or 7
    // is treated as an error.
    if (((status_inner & ~(CAN_ES_RXOK)) != 7) &&
        ((status_inner & ~(CAN_ES_RXOK)) != 0))
    {
        errorFlag = 1;
    }
}

I.e., by the time errorFlag goes to 1, it is too late to do anything about it and the CAN port is locked up.

I think this code snippet was based on code a colleague of mine was given or shown by the TI team some time ago.

I can't see an easy way to detect bus errors as they arise and, say, clear the error register while taking action to stop the channel from transmitting (that counter also seems to need a CPU reset).

Are there any examples or suggestions for handling CANbus errors and being able to halt the bus before retrying and/or resetting the CAN port without having to reset the whole CPU?

regards

Steve

over 1 year ago

0 Hareesh Janakiraman over 1 year ago

TI__Guru* 95315 points

Steve McDonald said:
the two devices end up trying to send and receive CAN messages with the same idents,

First of all, this is a big no-no. You cannot have two nodes on the bus transmitting the same ID under any circumstances.

Steve McDonald said:
it seems I need to perform a CPU reset to recover the situation and clear the flags, which is a bit drastic.

There is no need to reset the CPU to recover from a bus-off. You could either recover automatically or manually by way of setting/clearing the Init bit. In your case, you cannot auto-recover because the nodes will try to retransmit identical IDs again and the same cycle would repeat again. Please point me to the section in the manual that mandates a device reset to come out of bus-off.

Bus-off is a very severe error condition. In a properly designed/configured network, communication errors should be rare. It could happen due to external noise, but the bus should recover on its own once the disturbances vanish. That is how the protocol is designed. Common reasons for errors during communication are mismatched bit-rates between nodes and electrical noise.
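Before a node reaches bus-off it passes through the error-warning and error-passive states, so the application can react earlier than the bus-off interrupt. A minimal sketch of decoding the error and status value is shown here. The bit masks are written out locally and are assumed to follow the usual CAN_ES layout (LEC in bits 2:0, then TxOK, RxOK, EPass, EWarn, BOff); check them against the device header files. `CanErrorState`, `canLastErrorCode` and `canClassifyStatus` are hypothetical names, not part of any TI library.

```c
#include <stdint.h>

/* Assumed CAN_ES bit layout; verify against the device header files. */
#define ES_LEC_MASK 0x07u /* last error code */
#define ES_EPASS    0x20u /* node is error passive */
#define ES_EWARN    0x40u /* an error counter reached the warning limit */
#define ES_BOFF     0x80u /* node is bus-off */

/* Hypothetical classification of the node's error state. */
typedef enum {
    CAN_STATE_OK,
    CAN_STATE_WARNING,
    CAN_STATE_PASSIVE,
    CAN_STATE_BUS_OFF
} CanErrorState;

/* Extract the last error code field from a status value. */
uint32_t canLastErrorCode(uint32_t es)
{
    return es & ES_LEC_MASK;
}

/* Report the most severe error state flagged in a status value. */
CanErrorState canClassifyStatus(uint32_t es)
{
    if (es & ES_BOFF) {
        return CAN_STATE_BUS_OFF;
    }
    if (es & ES_EPASS) {
        return CAN_STATE_PASSIVE;
    }
    if (es & ES_EWARN) {
        return CAN_STATE_WARNING;
    }
    return CAN_STATE_OK;
}
```

For example, a status value of 0x60 (EWarn and EPass set) classifies as CAN_STATE_PASSIVE, which gives the application a chance to stop transmitting and sort out the message IDs before bus-off is reached.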

0 Steve McDonald over 1 year ago in reply to Hareesh Janakiraman

Intellectual 475 points

Hi Hareesh, thanks for the prompt response. Yes, I know having duplicate CAN IDs is a no-no. It shouldn't happen, but we have identified a failure mode by which it might, and we will endeavour to ensure it is fixed. Accepting that a failure is possible, the issue was how to react after the event. Obviously we would sort out the message IDs before restarting, but it wasn't obvious to me how to clear the errors. We don't have issues other than this self-generated silliness with the bus IDs; it's very reliable normally, and I have stress tested it for several hours without problems.

The bit I read in the manual relating to the CPU reset is below. I interpret the SysRn as being a CPU reset?:

So you say if I just reinitialize the CAN peripheral it will sort itself out?

regards

Steve

0 Hareesh Janakiraman over 1 year ago in reply to Steve McDonald

TI__Guru* 95315 points

Steve McDonald said:
I interpret the SysRn as being a CPU reset?:

True, but what it really means is that that bit will be reset to its reset value by a CPU reset. For example, LEC == 7 after a device reset.

Steve McDonald said:
So you say if I just reinitialize the CAN peripheral it will sort itself out?

Correct. You can come out of bus-off (BO) by clearing the Init bit (after waiting for 129 × 11 recessive bit times).
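The 129 × 11 recessive bit times amount to 1419 bit times, so the length of the recovery sequence depends only on the bit rate. The sketch below works that out and shows the Init bit being cleared. It assumes Init is bit 0 of CAN_CTL, and it takes the register by pointer so it can be tried on a plain variable. `canRecoveryTimeUs` and `canLeaveInit` are hypothetical helpers; on the target, the device headers or driverlib would be the usual route.

```c
#include <stdint.h>

/* Assumed position of the Init bit in CAN_CTL; verify against the headers. */
#define CTL_INIT ((uint32_t)0x1u)

/* 129 occurrences of 11 recessive bits. */
#define BUS_OFF_RECOVERY_BITS (129u * 11u)

/* Duration of the recovery sequence in microseconds, rounded up.
 * Returns 0 for an invalid bit rate of 0. */
uint32_t canRecoveryTimeUs(uint32_t bitRateHz)
{
    uint64_t numerator;

    if (bitRateHz == 0u) {
        return 0u;
    }
    numerator = (uint64_t)BUS_OFF_RECOVERY_BITS * 1000000u;
    return (uint32_t)((numerator + bitRateHz - 1u) / bitRateHz);
}

/* Clear the Init bit, leaving all other control bits unchanged. */
void canLeaveInit(volatile uint32_t *ctl)
{
    *ctl &= ~CTL_INIT;
}
```

At 500 kbit/s one bit time is 2 µs, so canRecoveryTimeUs(500000) gives 2838 µs. Any conflicting message IDs would need to be corrected before the Init bit is cleared; otherwise the same errors would simply recur.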
