CAN bus: Arbitration / Collision detection & handling

Felix Hofer

Hello everyone

I've been trying for weeks to get a CAN bus working with Tiva C LaunchPads and TI SN65HVD232DR CAN transceivers.

My problem is that collision detection and handling does not work (correctly).
With two devices on the bus I was able to get it halfway working by sending an acknowledge message to the sender for every CAN message received, so the sender knows when the message has been received and can then send the next message.

I did it that way because when I send the next CAN message right after the Tx complete interrupt, it is too fast and messages get lost.
But unfortunately this doesn't work properly anymore, no idea why. And when I add a 3rd bus participant everything is completely messed up...

Any ideas what my problem is or what I can do? I'm using the TivaWare release from 2-3 months ago.

Thank you very much!

over 11 years ago

0 Wade Whitehorn over 11 years ago

Prodigy 120 points

You should be able to send a message and immediately after setup and send another message without waiting for an acknowledge. The hardware will acknowledge receipt of the messages for you.

While I was reading your post I had the idea that your problem is actually in the bus termination on each end of the CAN bus. You do not have to use the standard termination value; you will get away with, say, 470R on both sides.

Regards.

0 Felix Hofer over 11 years ago in reply to Wade Whitehorn

Prodigy 60 points

Hello

Thank you for your reply!

"You should be able to send a message and immediately after setup and send another message without waiting for an acknowledge."

Yes, that's what I thought, but that's exactly what doesn't work. As soon as there is a collision, messages are lost.
With manual acknowledging it works, but only as long as there is only one bus participant and therefore no collisions occur.
No matter which message ID is sent. Transmit retry is enabled; I checked that just now.

"You do not have to use the standard value termination, you will get away with say 470R on both sides."

That I don't understand. At the moment I use 120 ohm as in the standard and as TI also describes.
But should it matter at all, since the cable is less than 30cm and distance between the devices therefore ~15cm...

What I need to mention: I have a message object (e.g. CmdTx or DataTx) for which I only change the data pointer/size and then set it with TivaWare (CANMessageSet()).
In the uC's data sheet I read something which I don't really understand. The most important part seems to me to be the last sentence, but I don't really know what it tells me to do.

Maybe I'll try to just set all structure items to their initial values before sending the message with new data.
> Edit: does not work...

Or do I always have to clear the message and call CANMessageSet() anew?

"The transmission of message objects is under the control of the software that is managing the CAN
hardware. Message objects can be used for one-time data transfers or can be permanent message
objects used to respond in a more periodic manner. Permanent message objects have all arbitration
and control set up, and only the data bytes are updated. At the start of transmission, the appropriate
TXRQST bit in the CAN Transmission Request n (CANTXRQn) register and the NEWDAT bit in the
CAN New Data n (CANNWDAn) register are set. If several transmit messages are assigned to the
same message object (when the number of message objects is not sufficient), the whole message
object has to be configured before the transmission of this message is requested."

A bit further on something is written about updating a message object, but it only talks about registers which I don't use directly with TivaWare...

P.S. I don't think that I have to use "CANBitTimingSet()"?

Thank you

0 Robert Adsett over 11 years ago in reply to Felix Hofer

Guru 27665 points

Felix Hofer said:
since the cable is less than 30cm and distance between the devices therefore ~15cm...

How do you get 15cm between devices with 30cm of cable? Typo?

CAN is remarkably robust but termination issues can lead to odd results.

One other question. I presume you don't have two nodes transmitting the same ID? That will work only under limited circumstances. One node transmitting the same ID with different data should work but it is quite possible (and designed to behave so) to drop data.

Robert

0 Felix Hofer over 11 years ago in reply to Robert Adsett

Prodigy 60 points

Hello. No, no typo. I meant the cable is about 30cm and the distance between the connectors is about half of that, with three connectors.
Have a look: https://onedrive.live.com/redir?resid=31AB7986C421C2AE!8585&authkey=!ANEvKEDxyXZHYAc&v=3&ithint=photo%2c.jpg

"I presume you don't have two nodes transmitting the same ID?"

Yes and no....

Example 1:
I had one node that just wanted to transmit data for a bandwidth test (Data Msg ID, Data Tx Object ID).
After the Tx complete interrupt I wanted to send the next data message. This resulted in heavy message loss on the receiver node.

Example 2:
At the moment I have CmdMsgId, DataMsgId and DataAckMsgId (for acknowledging data and command message reception, because of the above).
Further I've got CmdTxObjId/CmdRxObjId, DataTxObjID/DataRxObjId and DataAckTxObjID/DataAckRxObjID, where Rx and Tx message objects have the same message ID.

OK, here the host (PC) sends a command over USB to the master, e.g. "Tell me who's there". The master then puts this command on the CAN bus (CmdMsg).
The slaves then acknowledge reception with the ack message (same message/object ID, same data; if one ack gets lost here it wouldn't matter in any way). The master just needs an ack to know it can send further data.
After this ack is put on the bus by the slaves, I have a delay because the slaves then respond with their device info.
This again does not work (without the delay) because there would be a collision between the ack message and the response to the command (which is a command message, so a different msg/obj ID than the ack message).
And the further problem, which is still unsolved and works with one slave but not with two: both slaves have a delay after they send the ack so as not to have a collision between it and the following command response.
But both slaves then respond "at the same time" to the command from the PC. They in fact use the same msg/obj ID but the device identification is different.
I'm not quite sure how it really is, but as you say, if the data is different this shouldn't be a problem either. But it is! Most of the time I get only one response at master level, sometimes even none...

Example 3: Just to clarify, this is already included in 2.
Putting different messages with different data on the bus right after each other (cmd rx ack and then command response) results in message loss too.

thx & regards

0 Robert Adsett over 11 years ago in reply to Felix Hofer

Guru 27665 points

Felix Hofer said:

Hello. No, no typo. I meant the cable is about 30cm and the distance between the connectors is about half of that, with three connectors.
Have a look: https://onedrive.live.com/redir?resid=31AB7986C421C2AE!8585&authkey=!ANEvKEDxyXZHYAc&v=3&ithint=photo%2c.jpg

OK, three devices not two that clarifies things. I presume you have terminators on the left and right units and not the centre unit? I can't tell for sure but it does not look like there is much of a stub on the centre unit. The use of ribbon cable is a bit worrisome but will probably work on the bench. You probably should 'scope the signal to make sure it is clean.

Felix Hofer said:
I had one node that just wanted to transmit data for a bandwidth test (Data Msg ID, Data Tx Object ID).
After the Tx complete interrupt I wanted to send the next data message. This resulted in heavy message loss on the receiver node.

OK, not necessarily surprising depending on a host of unmentioned factors.

Felix Hofer said:

Example 2:
At the moment I have CmdMsgId, DataMsgId and DataAckMsgId (for acknowledging data and command message reception, because of the above).
Further I've got CmdTxObjId/CmdRxObjId, DataTxObjID/DataRxObjId and DataAckTxObjID/DataAckRxObjID, where Rx and Tx message objects have the same message ID.

OK, here the host (PC) sends a command over USB to the master, e.g. "Tell me who's there". The master then puts this command on the CAN bus (CmdMsg).
The slaves then acknowledge reception with the ack message (same message/object ID, same data; if one ack gets lost here it wouldn't matter in any way). The master just needs an ack to know it can send further data.
After this ack is put on the bus by the slaves, I have a delay because the slaves then respond with their device info.

That method of acknowledgement is very risky on CAN. You should never send a CAN message containing data from two different transceivers with the same ID.

Felix Hofer said:
I'm not quite sure how it really is, but as you say, if the data is different this shouldn't be a problem either. But it is! Most of the time I get only one response at master level, sometimes even none...

Quite the contrary. Different data with the same message ID is a very big problem.
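
A minimal sketch of why this matters, assuming the usual wired-AND model of the bus (a dominant 0 from any node overrides a recessive 1). The function names `bus_level`, `arbitrate` and `first_data_conflict` are made up for illustration and are not TivaWare calls. Two nodes that start together with different 11-bit IDs are sorted out bit by bit, and the loser backs off and can retry afterwards. Two nodes with the same ID both think they won, and the first differing data bit then appears as a bit error to one of them.

```c
#include <assert.h>
#include <stdint.h>

/* Wired-AND bus: a dominant bit (0) from any node overrides recessive (1). */
int bus_level(int bit_a, int bit_b)
{
    return bit_a & bit_b;
}

/*
 * Bitwise arbitration over an 11-bit identifier, MSB first.
 * Returns 0 if node A wins, 1 if node B wins, and -1 if neither node
 * backs off (identical IDs), so both go on to send their data field.
 */
int arbitrate(uint16_t id_a, uint16_t id_b)
{
    assert(id_a < 0x800 && id_b < 0x800);
    for (int bit = 10; bit >= 0; bit--) {
        int a = (id_a >> bit) & 1;
        int b = (id_b >> bit) & 1;
        int bus = bus_level(a, b);
        if (a != bus) return 1;  /* A sent recessive, read dominant: A backs off */
        if (b != bus) return 0;  /* B sent recessive, read dominant: B backs off */
    }
    return -1;
}

/*
 * After an arbitration tie both nodes drive the data field at once.
 * Returns the position (0 = first bit sent) of the first data bit where
 * the two nodes disagree, i.e. where one of them reads back something it
 * did not send, or -1 if the data bytes are identical.
 */
int first_data_conflict(uint8_t data_a, uint8_t data_b)
{
    for (int bit = 7; bit >= 0; bit--) {
        if (((data_a >> bit) & 1) != ((data_b >> bit) & 1)) {
            return 7 - bit;
        }
    }
    return -1;
}
```

In this model `arbitrate(0x100, 0x200)` resolves cleanly in favour of the lower ID, while `arbitrate(0x123, 0x123)` ties and leaves the outcome to whatever happens in the data field.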

My approach would be to:

First, go back to your example 1 and determine how closely together you can send messages without losing any. If it is slow, that problem must be solved first. At high speeds you may just be running into issues with receive efficiency (maybe you need a FIFO for receive). I'd also 'scope your bus lines; a signal quality issue would not be surprising.
Second, ditch the acknowledgement scheme. If you need an acknowledgement you need to come up with a scheme with unique message IDs from each board that acknowledges.

Robert

0 Felix Hofer over 11 years ago in reply to Robert Adsett

Prodigy 60 points

Robert Adsett said:
OK, three devices not two that clarifies things.

The aim is multiple devices but I test with 2 since it doesn't work with 3. As described...
I've got two 120 ohm terminators: with 3 devices they are on the outer ones, with 2 devices on both.
But I guess reflections because of missing terminations wouldn't matter anyway with such a small network, such short distances and only 0.75 Mbit/s (reduced from 1 Mbit/s to test for differences).

Robert Adsett said:
I can't tell for sure but it does not look like there is much of a stub on the centre unit.

I don't understand exactly. On the left unit the resistor is permanently connected at the black lines. On the centre device there are open pins, and on the right device you can see the jumper there.

Robert Adsett said:
You probably should 'scope the signal to make sure it is clean.

I will if there is any time remaining. On Friday the 13th I have to hand in my work. It is a project for school and I still have to write a report and do some other stuff like cleaning up timeouts and error case handling.

Robert Adsett said:
OK, not necessarily surprising depending on a host of unmentioned factors.

???

Robert Adsett said:
That method of acknowledgement is very risky on CAN. You should never send a CAN message containing data from two different transceivers with the same ID.

This may be. But this isn't the problem's source. I only did this because otherwise it was impossible to send messages right after each other without using a long enough delay, which would be very ugly.

Robert Adsett said:
Quite the contrary. Different data with the same message ID is a very big problem.

Hm, I didn't know that when I made the concept, and I thought my supervisor told me to use one message ID for data and one message ID for commands.

Anyway... luckily, this isn't the problem's source either and can be fixed by delaying the device info response differently on the various slaves.
And normally only one device is sending commands/responses on the same bus anyway, or otherwise loss doesn't matter as long as at least one acknowledge is received.

Robert Adsett said:
First, go back to your example 1 and determine how closely together you can send messages without losing any. If it is slow, that problem must be solved first. At high speeds you may just be running into issues with receive efficiency (maybe you need a FIFO for receive). I'd also 'scope your bus lines; a signal quality issue would not be surprising.

In the bandwidth test where I had to introduce this acknowledge message I managed 150 KB in 5.74s at 1 Mbit/s that way, from the host via the master over CAN to the slave.
Also there are no transfer errors (or at least they are detected and corrected). The data consistency on the receive side is verified.

Robert Adsett said:
Second, ditch the acknowledgement scheme. If you need an acknowledgement you need to come up with a scheme with unique message IDs from each board that acknowledges.

In fact I didn't do this from the beginning, as I hope is clear from my posting. I introduced it when I saw no other way to prevent message loss.
I'd be glad if I could drop this sh*t and the collision detection and correction would work as I expected for once...

So you are telling me that basically I'm doing it right with the TivaWare and this should work? Then if you don't know and others don't know, how should I know?
I'm still a student; I'm not experienced with CAN or with these Tiva devices and libraries. So after that many days spent I have to drop this and explain in the report why it possibly didn't work out.

But I hoped for a quick solution because I did some small thing wrong with the high-level API or missed something...

What about CANBitTimingSet()? I guess I don't have to call this explicitly if I don't want to change it, or do I?

regards

0 Robert Adsett over 11 years ago in reply to Felix Hofer

Guru 27665 points

Ah, homework. Well that won't make me less Socratic.

That speed is reaching high speed operation for CAN. With problems at this speed you really do need to get a 'scope on it and verify you have a good signal.

Wiring should be

|===============================================|
                       ||
                       ||

Where the vertical bars at the end are the resistive terminators. The short vertical section marked by the || is the wiring from the main bus to the middle board (called a stub). This needs to be short and unterminated.

Felix Hofer said:
But this isn't the problem's source

It describes what you are seeing in example 2 pretty well. How did you determine that it isn't a source of problems?

Felix Hofer said:
In the bandwidth test where I had to introduce this acknowledge message I managed 150 KB in 5.74s at 1 Mbit/s that way, from the host via the master over CAN to the slave.

You need to do this without the acknowledgement to find out where your problems are. As it is you are just making it more difficult to find them. Figure out how much time a message takes to transmit and how much time it takes you to receive and process a message. That will tell you something about your limitations. Think about what happens if a new message starts arriving while you are still reading the previous one out of the message buffer.

Also bandwidth is a poor way to measure CAN. It is not designed for bulk transfer of data but for control messages and data. The important measures are the messages per second you can receive and thus the time between messages.
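
As a rough worked example of "how much time a message takes to transmit", here is a sketch using the commonly quoted worst-case length of a standard (11-bit ID) data frame: 47 bits of overhead including the 3-bit interframe space, plus 8 bits per data byte, plus worst-case stuff bits. The function names are illustrative only and nothing here is a TivaWare API.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Worst-case length in bits of a standard (11-bit ID) CAN data frame
 * carrying data_bytes bytes, including bit stuffing and the 3-bit
 * interframe space.
 */
uint32_t can_frame_bits_worst_case(uint32_t data_bytes)
{
    assert(data_bytes <= 8);
    uint32_t stuffable  = 34 + 8 * data_bytes;  /* SOF through CRC sequence */
    uint32_t stuff_bits = (stuffable - 1) / 4;  /* worst-case stuffing */
    uint32_t overhead   = 47;                   /* all non-data bits incl. interframe space */
    return overhead + 8 * data_bytes + stuff_bits;
}

/* Time on the wire for one worst-case frame, in microseconds (rounded up). */
uint32_t can_frame_time_us(uint32_t data_bytes, uint32_t bit_rate)
{
    assert(bit_rate > 0);
    uint64_t bits = can_frame_bits_worst_case(data_bytes);
    return (uint32_t)((bits * 1000000u + bit_rate - 1) / bit_rate);
}

/* Conservative estimate of back-to-back frames per second (worst-case stuffing). */
uint32_t can_max_frames_per_second(uint32_t data_bytes, uint32_t bit_rate)
{
    return bit_rate / can_frame_bits_worst_case(data_bytes);
}
```

For 8 data bytes this gives 135 bits per frame, so about 135 us per frame at 1 Mbit/s and 180 us at 750 kbit/s. When frames arrive back to back, the receive path has to be ready again within roughly that time. For comparison, 150 KB in 5.74 s works out to something like 3,300 frames per second if every frame carried a full 8 bytes, which is an assumption.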

Given your problem description, I think the issue in your second case is more that you are not using CAN properly than anything in TivaWare. In the first I suspect an issue with your receive handling (or your expectations).

If you are using the baud rate setting API you don't need to get into the details of the various ways it can be configured, especially if you are using the same transceivers and micros for each board. You only need to get into timing configurations when/if you are optimizing for reliability or having to work with a particular delay configuration. As long as you are using standard transceivers and have not inserted isolators with long delays you should not have an issue. Long delays in isolators behave like long cables from the protocol's point of view.

Robert

0 cb1 over 11 years ago in reply to Robert Adsett

Guru 47900 points

@ Robert,

Simply terrific care/effort/detail - on your part. Most excellent - much appreciated - great thanks. (poster likely too stressed to properly acknowledge your efforts)

Value of, "real world" use/experience surely registers, "Loud/strong" this thread. And of course the logical design & assembly of proper, sequential test methods - to confirm (or create) real understanding.

Only thing I can add is to, "Never perform such tests across just one or two boards/devices." Single board anomalies are my group's bane.

I can't recall if poster here is using "official/proper" pcb - or breadboard. That would surely impact - and seems under-represented. (or I missed in the back-forth "noise.")

We've also seen cases where "inadequate power" caused/contributed. Pure, substantial power should always be supplied - each/every node - under test/examination. Too often - one board's enfeebled 3V3 regulator must power everything - and may "dip" - at the worst possible time! (i.e. under the demands of "full-bore" data testing...)

Repeatedly we see/read of the "school" improperly anticipating, "degree of difficulty" - too often at the student's peril...

0 Robert Adsett over 11 years ago in reply to cb1

Guru 27665 points

@cb1,

Thank you for the kind words. I agree the OP is probably awash in stress hormones. A good night's sleep is probably called for and unlikely to be achieved.

Good thought on the power supply. I've not had frequent single board anomalies but they are not rare enough.

The OP did provide a link to a photo of the setup. It appears to have a custom PCB on top of a launchpad. The PCB appears to be a simple double sided PCB with no solder mask. The visible surface also has few/no flywires and it appears to be fairly low density from what I recall.

Your last sentence is very much to the point I think. Not a new problem.

Robert

0 Wade Whitehorn over 11 years ago in reply to cb1

Prodigy 120 points

CAN is pretty robust. You have to look at the basics first before you start pointing fingers at the hardware or the ARM.

Do you have the bus terminated at the two ends only? You do not have to aim for 120 ohms on both ends, but do use the same value for both terminations. Do twist your wires together. This helps more than you think to reject common mode noise. Shorter will always be better. 300mm is definitely short enough. Measure across the two CAN wires and tell us what you measure.

Check with a scope to see what is happening on the bus if you can. Otherwise a simple DC measurement with a volt meter will tell you if there is some problem.
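
If the measurement meant here is the resistance between CANH and CANL with every node powered off (an assumption), the expected reading is simply the two terminators in parallel. A tiny sketch, ignoring the much larger input resistance of the transceivers; `parallel_ohms` is just an illustrative helper:

```c
#include <assert.h>

/* Resistance in ohms of two terminators seen in parallel across CANH/CANL. */
double parallel_ohms(double r1, double r2)
{
    assert(r1 > 0.0 && r2 > 0.0);
    return (r1 * r2) / (r1 + r2);
}
```

Two 120 ohm terminators give 60 ohms. A reading near 120 ohms would suggest one terminator is missing or not connected, and a reading near 40 ohms (`parallel_ohms(parallel_ohms(120, 120), 120)`) would suggest a third one is fitted somewhere.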

On the subject of 3V3 regulators as mentioned above, is there adequate decoupling at the CAN transceiver?

0 Felix Hofer over 11 years ago in reply to Wade Whitehorn

Prodigy 60 points

Hello everyone

Thank you very much for your help!
As said earlier I have time problems and therefore cannot test and try everything again, since I still have to write a report about everything and do a lot of other stuff.
But your help is much appreciated and if I can't come back to this, I hope it may help somebody else...
As I wrote earlier, I hoped for a quick fix because I thought that I had missed something or done something wrong in the software. But the problem seems to go much deeper, and for that there is no time anymore, unfortunately.

I expected that I can just put a Tx message on the bus, and when the Tx complete interrupt occurs I can send the next message without considering anything else. The controller, transceiver and API should handle this...
Am I right or am I not??

As described above and as you can see in the posted picture of the hardware, the nodes are on a single ribbon cable and terminated left/right with 120 ohm, but not in the middle.
The LaunchPads are the same, and the PCBs are the same with the same hardware. But two are not fully populated with LEDs, switches and so on. CAN transceiver: TI SN65HVD232DR.
On one board the wires are there because of a PCB design mistake between the RS232 transceiver and the serial connector. On the other board there are defects from soldering. But none of that has anything to do with the CAN.

The CAN is set up as follows:

CANInit(CAN0_BASE);
CANBitRateSet(CAN0_BASE, SysCtlClockGet(), 750000); // normally 1000000 (1 Mbit/s), reduced to 750 kbit/s for testing
IntEnable(INT_CAN0);
CANIntEnable(CAN0_BASE, CAN_INT_MASTER | CAN_INT_ERROR | CAN_INT_STATUS);
CANEnable(CAN0_BASE);

The boards have one single layer and are connected the following way over the ribbon cable (I'll give the twisted pair a try, it shouldn't make anything worse ;) ):
CANL-CANH-GND-GND-3.3V-5V
Where 3.3V and 5V are the connectors of the LaunchPad.
The 3 boards are normally powered through one or two USB connectors. But as examined earlier, one should be more than enough I think.

In the Rx handler, basically the message is read and the data put into a ring buffer, nothing magic.

But how can I measure, or rather see from the measurement, whether the signal is good and as it should be?
And what confuses me then: the arbitration/collision handling does not work but I never get corrupt data, just message loss...?!?
So the signals can't be that bad...?!?

@Wade Whitehorn: What do you mean by "adequate decoupling at the CAN transceiver"? As I said, the 3.3V and 5V of all boards are connected together and powered by 1-2 USB connectors, and the parts are powered by the 3.3V.

thx & regards

p.s. What I can say is that a delay of SysCtlDelay(500) at 50 MHz and 1 Mbit/s CAN should be enough between two Tx.

0 Robert Adsett over 11 years ago in reply to Felix Hofer

Guru 27665 points

Felix Hofer said:

I expected that I can just put a Tx message on the bus, and when the Tx complete interrupt occurs I can send the next message without considering anything else. The controller, transceiver and API should handle this...
Am I right or am I not??

As far as the transmission side goes, I believe that is correct. As I remember errors will either result in retries or eventually a failed bus condition.

You do have to consider the receiving side as well.

Usually the receiving side in any protocol determines how quickly packets/bytes/bits can follow one another without issues.

Felix Hofer said:

p.s. What I can say is that a delay of SysCtlDelay(500) at 50 MHz and 1 Mbit/s CAN should be enough between two Tx.

So with that delay you can receive all the data? So, how long of a delay is that?

Do you check for overwrites (the message lost flag)?

If you remove the delay what percentage of packets do you lose?

How long does it take you to process a packet into your ring buffer?

How long does it take to detect an arrived packet and transfer it out of the CAN memory?
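
To make the overwrite question concrete, here is a small model of a single receive message object, assuming the usual behaviour where a frame that arrives before the previous one has been read replaces it and sets a "message lost" indication. All names (`rx_mailbox_t`, `mailbox_store`, `mailbox_read`) are hypothetical and this is not TivaWare code. In TivaWare the corresponding indication is believed to be the MSG_OBJ_DATA_LOST bit in the message object's flags after CANMessageGet(), but check the driver documentation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one receive message object (not a TivaWare type). */
typedef struct {
    uint8_t  data[8];
    uint8_t  len;
    bool     new_data;    /* set when a frame has been stored and not yet read */
    bool     msg_lost;    /* set when a frame overwrote an unread one */
    uint32_t lost_count;  /* running total of overwritten frames */
} rx_mailbox_t;

/* What the controller does when a matching frame arrives. */
void mailbox_store(rx_mailbox_t *mb, const uint8_t *data, uint8_t len)
{
    assert(len <= 8);
    if (mb->new_data) {   /* previous frame still unread: it is overwritten */
        mb->msg_lost = true;
        mb->lost_count++;
    }
    for (uint8_t i = 0; i < len; i++) {
        mb->data[i] = data[i];
    }
    mb->len = len;
    mb->new_data = true;
}

/* What the receive handler does: copy out, report loss, clear the flags. */
bool mailbox_read(rx_mailbox_t *mb, uint8_t *out, uint8_t *len, bool *lost)
{
    if (!mb->new_data) {
        return false;     /* nothing to read */
    }
    for (uint8_t i = 0; i < mb->len; i++) {
        out[i] = mb->data[i];
    }
    *len  = mb->len;
    *lost = mb->msg_lost;
    mb->new_data = false;
    mb->msg_lost = false;
    return true;
}
```

Calling `mailbox_store` twice before `mailbox_read` returns only the second frame, with `lost` set. That looks exactly like "message loss without corrupt data" from the application's point of view.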

Felix Hofer said:
But how can I measure, or rather see from the measurement, whether the signal is good and as it should be?
And what confuses me then: the arbitration/collision handling does not work but I never get corrupt data, just message loss...?!?
So the signals can't be that bad...?!?

What is your evidence that arbitration is not working? Note that with CAN unless you are doing something wrong there is no collision. CAN does not handle actual collisions well, that is why you should never have two nodes transmitting the same ID.

Felix Hofer said:
So the signals can't be that bad...?!?

With no dropouts they are probably within acceptable range, but without checking you cannot know if they are marginal or not. This is why it is vital to check signals, especially if there are any unexplained issues. They may not be the cause but it is best to eliminate this as a likely problem area early. If you do not, you can spend a very long time pursuing will-o'-the-wisps.

Felix Hofer said:
@Wade Whitehorn: What do you mean by "adequate decoupling at the CAN transceiver"? As I said, the 3.3V and 5V of all boards are connected together and powered by 1-2 USB connectors, and the parts are powered by the 3.3V.

All ICs need local decoupling: a capacitor (or several) near the IC. It acts as a charge reservoir for the IC, "decoupling" the IC from the trace inductance and resistance to the main power supply. The faster the IC, the larger the impulse draws to drive its load, and/or the more sensitive it is to voltage variation, the more critical the decoupling. Note that this is in addition to bulk charge storage near the power supply and more distributed decoupling also used to share out the load.

Robert

Felix Hofer said:
no time anymore, unfortunately

I suspect you are right. I think you could finish with more time, you do not appear to be completely off in the wrong direction.

0 Felix Hofer over 11 years ago in reply to Robert Adsett

Prodigy 60 points

SysCtlDelay(500) at 50 MHz should be about 10us.

I don't think that I ever checked for any message flags (except the Status flags of the Status Interrupt with the error handler from an example).
I also can't tell you any more how much I lose...

"How long does it take you to process a packet into your ring buffer?"

No idea... shouldn't be very long since this is only a few if/else, variable assignments and the buffer copy.

"How long does it take to detect an arrived packet and transfer it out of the CAN memory?"

How should I know?? Oo

"What is your evidence that arbitration is not working? Note that with CAN unless you are doing something wrong there is no collision. CAN does not handle actual collisions well, that is why you should never have two nodes transmitting the same ID."

Well if just two are sending a message at about the same time (different ID's), at least one gets lost.

About IC decoupling: I didn't know that expression in English in this context. But in the picture you can see a 0.1 uF capacitor near the IC between supply and IC.

So thank you anyway for your explanation!
Less than a week left to write the whole report, fix some silly bugs in the bootloader (hope I'll manage this in time) and do some other random stuff ;)

P.S. What I forgot: could it be a problem that the green (RGB) LED is on the CAN Tx? The LED is lit all the time...

regards...

0 Robert Adsett over 11 years ago in reply to Felix Hofer

Guru 27665 points

Felix Hofer said:

SysCtlDelay(500) at 50 MHz should be about 10us.

The documentation I have states 3 cycles per loop (without an indication of what cycles are or how they relate to the oscillator speed). If that is correct you are suggesting 150 cycles per us or 150 Mcycles per second.
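
The arithmetic can be sketched as follows, assuming "cycles" means CPU clock cycles and taking the 3 cycles per loop from the documentation at face value (flash wait states or other effects could make the real delay longer). The helper names are illustrative, not TivaWare functions.

```c
#include <assert.h>
#include <stdint.h>

#define CYCLES_PER_DELAY_LOOP 3u  /* figure quoted from the documentation */

/* Approximate delay in nanoseconds of a SysCtlDelay(loops) call at cpu_hz. */
uint64_t delay_loop_ns(uint32_t loops, uint32_t cpu_hz)
{
    assert(cpu_hz > 0);
    uint64_t cycles = (uint64_t)loops * CYCLES_PER_DELAY_LOOP;
    return cycles * 1000000000ull / cpu_hz;
}

/* Inverse: loop count needed for at least ns nanoseconds (rounded up). */
uint32_t delay_loops_for_ns(uint64_t ns, uint32_t cpu_hz)
{
    assert(cpu_hz > 0);
    uint64_t cycles = (ns * cpu_hz + 999999999ull) / 1000000000ull;
    return (uint32_t)((cycles + CYCLES_PER_DELAY_LOOP - 1) / CYCLES_PER_DELAY_LOOP);
}
```

Under those assumptions 500 loops at 50 MHz is 1500 cycles, or about 30 us rather than 10 us. A 10 us delay would need about 167 loops.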

Felix Hofer said:
I also can't tell you any more how much I lose...

That's an important large gap in your knowledge.

Felix Hofer said:

How should I know?? Oo

Part of it you can measure or count instructions. You have the setup to do it. You just need to use an oscilloscope or logic analyser to get some good estimates of elements of the timing (in particular you can measure from when one board has finished transmission to when the second board has transferred from the receive registers to the register you actually read the CAN object from). Other portions are in the user manual. However, my point there was to get you thinking about where you can run into problems rather than pointing to specific measurements to take. It is, however, very important not to assume a specific behaviour but to attempt to confirm it. You have been assuming an arbitration/collision problem is causing packet loss even though your tests in an environment that cannot produce collisions or any arbitration opportunities without the CAN peripheral having a pretty severe bug still show packet loss.

Felix Hofer said:

No idea... shouldn't be very long since this is only a few if/else, variable assignments and the buffer copy.

Do you see why that is an important piece of information? It does, BTW, include more than just the copy: it includes all the time from when the packet is taken out of the CAN receive registers into the interface registers to the time you are prepared to do so again.

Also think of what happens when you fill the FIFO.
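
Whether the FIFO is a chain of hardware message objects or a software ring buffer, the question is the same: what happens to the next frame when it is full. A minimal software sketch that at least counts what it drops, so the loss becomes measurable. The type and function names and the size are arbitrary choices for illustration, and the sketch is not interrupt-safe as written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8u  /* must be a power of two; arbitrary for this sketch */

typedef struct {
    uint8_t  frames[RING_SIZE][8];
    uint32_t head;     /* total frames written */
    uint32_t tail;     /* total frames read */
    uint32_t dropped;  /* frames rejected because the ring was full */
} frame_ring_t;

/* Producer side (e.g. the receive handler). Returns false if the frame was dropped. */
bool ring_put(frame_ring_t *r, const uint8_t frame[8])
{
    if (r->head - r->tail == RING_SIZE) {  /* full: count it, keep old data */
        r->dropped++;
        return false;
    }
    uint8_t *slot = r->frames[r->head & (RING_SIZE - 1u)];
    for (int i = 0; i < 8; i++) {
        slot[i] = frame[i];
    }
    r->head++;
    return true;
}

/* Consumer side (e.g. the main loop). Returns false if the ring is empty. */
bool ring_get(frame_ring_t *r, uint8_t frame[8])
{
    if (r->head == r->tail) {
        return false;
    }
    const uint8_t *slot = r->frames[r->tail & (RING_SIZE - 1u)];
    for (int i = 0; i < 8; i++) {
        frame[i] = slot[i];
    }
    r->tail++;
    return true;
}
```

Reading `dropped` after a test run answers "how much do I lose" on the software side, separately from any overwrites inside the CAN controller.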

Felix Hofer said:

Well if just two are sending a message at about the same time (different ID's), at least one gets lost.

Which is not an indication of collision or arbitration issues, only of loss.

Felix Hofer said:

About IC decoupling: I didn't know that expression in English in this context. But in the picture you can see a 0.1 uF capacitor near the IC between supply and IC.

That would likely be the decoupling cap, also referred to as a despiking cap.

Felix Hofer said:
P.S. What I forgot: could it be a problem that the green (RGB) LED is on the CAN Tx?

No idea. LEDs aren't part of CAN.

Felix Hofer said:

So thank you anyway for your explanation!

You're welcome. Good luck.

Robert

0 Petrei over 11 years ago in reply to Robert Adsett

Guru 26105 points

Hi,

Two small details related to the settings of your CAN:

a) Check the returned value of the function CANBitRateSet(): it should be the rate you programmed in case of success, or 0 in case of failure. This is because your setting of 0.75 MHz doesn't give an integer factor (16MHz/0.75MHz), so the correct bit timings could not be resolved. In any case it is recommended to check it, since it returns a value.

b) Just in case it is 0, read this document and verify your settings.

8802.CAN_bit_timing.pdf
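
The integer-factor point can be checked offline with a sketch like the one below. It is not how CANBitRateSet() works internally; it only tests whether a clock and bit rate can be matched exactly by some prescaler and number of time quanta per bit. The limits (4 to 25 time quanta per bit, prescaler up to 1024) are assumptions for this sketch and should be checked against the device data sheet, and the function name is made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed limits for this sketch; check the device data sheet for real ones. */
#define TQ_PER_BIT_MIN 4u
#define TQ_PER_BIT_MAX 25u
#define PRESCALER_MAX  1024u

/*
 * Looks for a prescaler and a number of time quanta per bit such that
 * clock_hz / (prescaler * tq_per_bit) equals bit_rate exactly.
 * Prefers the largest tq_per_bit (finest sample point resolution).
 */
bool can_exact_bit_timing(uint32_t clock_hz, uint32_t bit_rate,
                          uint32_t *prescaler, uint32_t *tq_per_bit)
{
    assert(bit_rate > 0);
    if (clock_hz % bit_rate != 0) {
        return false;                       /* ratio is not an integer */
    }
    uint32_t ratio = clock_hz / bit_rate;
    for (uint32_t tq = TQ_PER_BIT_MAX; tq >= TQ_PER_BIT_MIN; tq--) {
        uint32_t pre = ratio / tq;
        if (ratio % tq == 0 && pre >= 1 && pre <= PRESCALER_MAX) {
            *prescaler = pre;
            *tq_per_bit = tq;
            return true;
        }
    }
    return false;
}
```

With these assumptions, 1 Mbit/s has an exact solution from either a 16 MHz or a 50 MHz clock, while 750 kbit/s has none from either, so the driver can at best pick a nearby rate. That is one more reason to compare the value returned by CANBitRateSet() with the rate requested.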

Petrei
