Other Parts Discussed in Thread: TCAN4550, TCAN4550EVM
Tool/software:
The problem reappears regularly, and I read all registers when the problem occurs:

Hello Hangziang,
The register readout doesn't show any set interrupt, status, or error counter bits that tell me why there are communication errors. The log and display readout do show that the device is entering a bus off state and is being restarted.
A bus off condition occurs when too many corrupted or bad CAN messages are transmitted or received on a bus. Common causes are poor bus signal integrity, which leads to bits being sampled incorrectly, and disruptions to the high-speed clock or voltage levels. When issues appear only after the system has been running for a period of time, this can mean that heat from the system is causing a temperature-related shift that changes performance.
Are you using the TCAN4550-Q1 on a board produced by TI, such as the TCAN4550EVM, or are you using it on a board of your own design? If it is not a TI board, then can you provide the portion of the schematics that pertain to the TCAN4550 for review?
Regards,
Jonathan
Hello!
1、 The "CAN0" mentioned above corresponds to the "CAN1" interface in the schematic diagram, and the termination resistor for "CAN1" is located at another, distant transceiver, as shown below:
2、When our device fails to send, it can still receive data normally, so the fault should have nothing to do with the crystal oscillator or power supply of the TCAN4550 chip; otherwise it could not receive data normally either.
Hello wulu,
Thanks for the schematic information. I will point out that the series 22 ohm resistor is generally located on the OSC1 pin instead of the OSC2 pin because the OSC1 pin is the output of the transconductance amplifier sourcing current to the crystal. However, I don't think this is causing your error.
Please see the TCAN455x Clock Optimization and Design Guidelines Application Report (Link) for more information.
When you have a large number of transmitted errors, this is often associated with incompatible bit timing configurations between the different CAN nodes trying to communicate on the bus. Can you verify the exact bit timing configuration for all of the other bus nodes and make sure they exactly match the bit configuration you are using for the TCAN4550-Q1?
Also, can you capture the CAN messages on a scope so that we can see if there are any issues with the physical CANH and CANL signals that explain why the messages are generating errors?
I noticed that you have a 0.1uF (100nF) capacitor between the two 60.4 ohm termination resistors. This may be too large; a common value is 4.7nF (0.0047uF). It's possible the larger capacitance is filtering out some of the signal and preventing the bits from reaching the full recessive level, resulting in bit sampling errors.
Is there only the one set of termination resistors on your CAN bus, or are there two sets of termination resistors?
Regards,
Jonathan
Hello Jonathan Nerger,
1、We have now measured the CAN bus waveform during normal communication, but for the time being we cannot capture it during abnormal communication.
There are a total of nine devices on the bus, and only my device CAN0 has the following waveform when sending data:
When the other eight loads are also powered on and some of those devices start to communicate, the waveform is as follows:
2、have a 0.1uF (100nF) capacitor between the two 60.4 ohm termination resistors
When the problem first appeared, we were using a single 120Ω resistor, as shown below, without the 60.4Ω resistors and the 100nF capacitor. The configuration you see in the figure above was introduced later. With either termination, the abnormal state in which the equipment can only receive and cannot send occurs after about 2 hours of use.
3、Is there only the one set of termination resistors on your CAN bus, or are there two sets of termination resistors?
Yes, there is a termination resistor on the other end of my bus, and my two TCAN4550 devices are used as follows:
Hello wulu,
Why are they only using half of CAN bus termination called for in the CAN standard? There should be two sets of 120 ohm termination which create a total resistive load of 60 ohms (typical). These termination resistors also function to absorb and dampen reflected energy on the bus and improve signal integrity.
The transmitted errors they are receiving may be due to the improperly loaded and terminated CAN bus.
Can they add the second set of termination resistors to the bus and see if this resolves the transmission errors?
Regards,
Jonathan
I'm sorry, I misunderstood the termination locations. I stand corrected.
The scope plots still show that the dominant and recessive voltage levels should be sufficient to sample the bits correctly. With limited information in the register log file, and no scope plots of the transmitted frames that generated the errors, I'm not sure what is causing the errors.
Looking back at the previous information, I see the sample point for the bits is set to 87.5%. I believe the Linux driver calculates and assigns the number of time quanta (or high speed clock cycles) to be used to create a bit period that matches the desired rate. It also calculates the sample point by assigning the number of time quanta to be used before and after the sample point. Sampling the bit near the end of the bit may not work for all applications and you may get better results by adjusting the sample location a bit earlier. You might try to move the sample point and see if there are any improvements. This can be done by adjusting the ratio of the time quanta allocated in the NTSEG1 and NTSEG2 fields of the Nominal Bit Timing and Prescaler Register (0x101C).
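As a rough illustration of the relationship described above, the sketch below computes the sample point and bit rate from the NTSEG1 and NTSEG2 fields. It assumes the standard M_CAN layout for the Nominal Bit Timing and Prescaler Register (NSJW in bits 31:25, NBRP in bits 24:16, NTSEG1 in bits 15:8, NTSEG2 in bits 6:0) and that each field holds one less than the value the hardware actually uses. The 40 MHz clock, the 500 kbit/s rate, and the function names are illustrative assumptions, not values taken from this thread.

```python
def decode_nbtp(value):
    """Split a 32-bit NBTP value into its fields (assumed M_CAN layout)."""
    return {
        "NSJW": (value >> 25) & 0x7F,
        "NBRP": (value >> 16) & 0x1FF,
        "NTSEG1": (value >> 8) & 0xFF,
        "NTSEG2": value & 0x7F,
    }


def sample_point(ntseg1, ntseg2):
    """Sample point as a fraction of the bit time.

    Each segment uses (field + 1) time quanta, plus one fixed quantum
    for the synchronization segment at the start of the bit.
    """
    before = 1 + (ntseg1 + 1)
    total = before + (ntseg2 + 1)
    return before / total


def bit_rate(clock_hz, nbrp, ntseg1, ntseg2):
    """Nominal bit rate in bit/s for a given CAN clock and field values."""
    tq_per_bit = 1 + (ntseg1 + 1) + (ntseg2 + 1)
    return clock_hz / ((nbrp + 1) * tq_per_bit)


# Assumed example: 40 MHz clock, prescaler field 0, 80 time quanta per bit.
print(sample_point(68, 9), bit_rate(40_000_000, 0, 68, 9))
print(sample_point(62, 15), bit_rate(40_000_000, 0, 62, 15))
```

With these assumed values, NTSEG1 = 68 and NTSEG2 = 9 place 70 of 80 time quanta before the sample point (87.5%), while NTSEG1 = 62 and NTSEG2 = 15 place 64 of 80 before it (80%). Both settings keep the same bit rate.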
Regards,
Jonathan
We tried to change the sampling point of all devices on the bus to 0.80, but the problem still occurred.
I would suggest monitoring the Error Counter Register 0x1040 and the Protocol Status Register 0x1044 and keeping a log of those register values to try to capture the reason for the errors on the RX and TX CAN messages.
Regards,
Jonathan
Do you mean that we need to record these two registers from the moment the CAN device starts working? How long is the recording interval?
When the problem occurs, cansend cannot successfully send data (candump does not see data packets, ip -d -s link show can0 does not see an increase in sent data packets)
Hello Hangziang,
Do you mean that we need to record these two registers from the moment the CAN device starts working? How long is the recording interval?
What I mean is that in order to determine what is causing the errors in the CAN messages, we need to try to capture some of the real-time error counter and status register values that report what the specific type of error was. Register 0x1040 is the Error Counter Register and contains the Transmit and Receive Error Counter values (TEC and REC). If either counter reaches 128, the device becomes error passive. If the TEC exceeds 255, the device will enter a Bus Off (BO) condition and disconnect itself from CAN bus communication.
Register 0x1044 is the Protocol Status Register and it will contain error codes for the type of CAN bus error that was detected on the last message. Therefore, if we can capture the value of this register for the messages that had errors, we can determine what type of errors are occurring on your bus and then hopefully try to figure out how to prevent them.
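To make the decoding of these two registers concrete, here is a minimal Python sketch. It assumes the standard M_CAN field layout (Error Counter Register: CEL in bits 23:16, RP in bit 15, REC in bits 14:8, TEC in bits 7:0; Protocol Status Register: DLEC in bits 10:8, BO, EW, and EP in bits 7, 6, and 5, ACT in bits 4:3, LEC in bits 2:0). Please check this layout against the TCAN4550 datasheet before relying on it. The function names are hypothetical.

```python
LEC_NAMES = {
    0: "No Error", 1: "Stuff Error", 2: "Form Error", 3: "Ack Error",
    4: "Bit1 Error", 5: "Bit0 Error", 6: "CRC Error", 7: "No Change",
}
ACT_NAMES = {0: "Synchronizing", 1: "Idle", 2: "Receiver", 3: "Transmitter"}


def decode_ecr(value):
    """Decode the Error Counter Register (0x1040), assumed M_CAN layout."""
    return {
        "CEL": (value >> 16) & 0xFF,  # CAN error logging counter
        "RP": (value >> 15) & 0x1,    # receive error passive flag
        "REC": (value >> 8) & 0x7F,   # receive error counter (7 bits)
        "TEC": value & 0xFF,          # transmit error counter
    }


def decode_psr(value):
    """Decode the Protocol Status Register (0x1044), assumed M_CAN layout."""
    return {
        "DLEC": LEC_NAMES[(value >> 8) & 0x7],
        "BO": (value >> 7) & 0x1,     # bus off
        "EW": (value >> 6) & 0x1,     # error warning
        "EP": (value >> 5) & 0x1,     # error passive
        "ACT": ACT_NAMES[(value >> 3) & 0x3],
        "LEC": LEC_NAMES[value & 0x7],
    }


# Illustrative raw values only.
print(decode_ecr(0x112C00))
print(decode_psr(0x76A))
```

Under this assumed layout REC is a 7-bit field, so a raw value of 0xFF00 decodes as RP = 1 with REC = 127 rather than as a single 8-bit count of 255. Because LEC can be overwritten by later frames, reading the register soon after each error gives the most useful result.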
Here are the two register values I grabbed after the problem occurred
You read back 0x104E instead of 0x1044 which won't return the correct value. Also, 0x1040 = 0x00000000 shows both the TEC and REC error counters are at 0, so this doesn't provide any helpful information.
When the problem occurs, cansend cannot successfully send data (candump does not see data packets, ip -d -s link show can0 does not see an increase in sent data packets)
My expertise is with the TCAN4550 at the device level and I am not a Linux expert. If this is an issue with the TCAN4550 and its configuration, we will need to monitor and record the information the device is providing us through the registers before, during, and after the issue occurs. If possible, please try to record the following register values in some form of log file so that we can try to determine the root cause of the errors.
0x000C
0x0800
0x0820
0x0824
0x1018
0x1040
0x1044
0x1050
0x10A4 (If using RX FIFO 0)
0x10B4 (If using RX FIFO 1)
0x10C4
0x10CC
0x10D8
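One way to collect such a log without flooding it is to poll the registers and write a line only when a value changes. The sketch below is a minimal, hedged example: `read_reg` is a hypothetical callable that you would have to supply yourself (for example a wrapper around your SPI access), and the 100 ms interval, the poll count, and the log format are assumptions. The timestamp has one-second resolution only.

```python
import time

REGISTERS = [
    0x000C, 0x0800, 0x0820, 0x0824, 0x1018, 0x1040, 0x1044, 0x1050,
    0x10A4,  # only if RX FIFO 0 is used
    0x10B4,  # only if RX FIFO 1 is used
    0x10C4, 0x10CC, 0x10D8,
]


def log_changes(read_reg, registers=REGISTERS, polls=10, interval_s=0.1,
                out=print, sleep=time.sleep):
    """Poll registers and report each value only when it changes.

    read_reg: hypothetical callable taking an address, returning a 32-bit int.
    Returns the last value seen for every register.
    """
    last = {}
    for _ in range(polls):
        for addr in registers:
            value = read_reg(addr)
            if last.get(addr) != value:
                stamp = time.strftime("%H%M%S")
                out(f"[{stamp}] 0x{addr:04X} = 0x{value:08X}")
                last[addr] = value
        sleep(interval_s)
    return last
```

The first poll always logs every register, which gives a baseline to compare later changes against.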
If the device stops sending due to accumulating too many errors and entering a Bus Off condition, then the processor will need to re-initialize the CAN communication by clearing the INIT bit in the Control register (0x1018[0] = 0), or issuing a hardware or software reset and fully re-configuring all of the device registers just like an initial power up configuration.
Regards,
Jonathan
I read the 0x1040 and 0x1044 registers every 100 ms and log when a change occurs, and get this
The format is [hours, minutes, seconds, milliseconds] (0x1040, 0x1044 register values after the change).
The problem occurred at about 14:51:43.
Scenario: Device A sends a heartbeat packet to device B on the CAN bus via the TCAN4550. This continues until device B discovers that it has lost the heartbeat packet from device A. At this point, we find that device A can still receive data packets from the bus normally via the TCAN4550, but it cannot successfully send any data packet.
Monitoring more registers at too high a frequency would affect the normal operation of the TCAN4550, so other registers are not monitored for the time being.
You can find the bit descriptions for these two registers in the datasheet. The CAN Error Logging counter (CEL), Receive Error Counter (REC), Transmit Error Counter (TEC), Data Phase Last Error Code (DLEC), and Last Error Code (LEC) are defined and updated according to the ISO 11898 CAN standard protocol.
[133702209] CEL = 0, REC = 255, TEC = 0, BO = 0, EW = 1, EP = 1, DLEC = No Change from previous read, LEC = Form Error
[133753088] CEL = 0, REC = 107, TEC = 0, BO = 0, EW = 1, EP = 0, DLEC = No Change from previous read, LEC = No Change from previous read
[133753121] CEL = 0, REC = 59, TEC = 0, BO = 0, EW = 0, EP = 0, DLEC = No Change from previous read, LEC = No Change from previous read
[133753251] CEL = 0, REC = 8, TEC = 0, BO = 0, EW = 0, EP = 0, DLEC = No Change from previous read, LEC = No Change from previous read
[133753440] CEL = 0, REC = 0, TEC = 0, BO = 0, EW = 0, EP = 0, DLEC = No Change from previous read, LEC = No Change from previous read
[133805142] CEL = 1, REC = 0, TEC = 0, BO = 0, EW = 0, EP = 0, DLEC = No Change from previous read, LEC = No Error
[133805274] CEL = 0, REC = 0, TEC = 0, BO = 0, EW = 0, EP = 0, DLEC = No Change from previous read, LEC = No Change from previous read
[133900798] CEL = 1, REC = 0, TEC = 0, BO = 0, EW = 0, EP = 0, DLEC = No Change from previous read, LEC = No Change from previous read
[133900933] CEL = 0, REC = 0, TEC = 0, BO = 0, EW = 0, EP = 0, DLEC = No Change from previous read, LEC = No Change from previous read
[145445684] CEL = 2, REC = 0, TEC = 0, BO = 0, EW = 0, EP = 0, DLEC = No Change from previous read, LEC = No Change from previous read
[145445823] CEL = 0, REC = 0, TEC = 0, BO = 0, EW = 0, EP = 0, DLEC = No Change from previous read, LEC = No Change from previous read
What I can see from this log is that there are initially a lot of receive errors, based on the REC value. Occasionally the CEL is incremented. But nothing in this log shows transmit errors or a reason why you are not able to transmit a message.
The previous register value file shows there is a single TX Buffer Element allocated for the TX FIFO. You may want to monitor the TX FIFO Status, Request Pending, and Transmission Occurred registers to confirm whether messages are loaded into the FIFO and transmitted properly. If the message fails to send for some reason, then your driver may not be able to load a new message into the only TX buffer you have allocated, resulting in your observation that you are not able to transmit messages.
Try to monitor registers 0x10C4, 0x10CC, 0x10D8.
You could also try increasing the size of the TX FIFO so that it contains more than one TX Buffer element.
Regards,
Jonathan
Hello, I have re-recorded the registers, and now the format is [hour, minute, second, millisecond] (the changed register values are "0x1040" "0x1044" "0x10c4" "0x10cc" "0x10d8"), and the problem occurred around 14:53 in the afternoon.
Hello,
I've reviewed the register log and created a breakdown table for the values I see.
0x1040 Error Counter Register
0x0 = CEL=0, REC=0, TEC=0, REC Below error passive level of 128
0x1100 = CEL=0, REC=17, TEC=0, REC Below error passive level of 128
0x112C00 = CEL=17, REC=44, TEC=0, REC Below error passive level of 128
0x113000 = CEL=17, REC=48, TEC=0, REC Below error passive level of 128
0x214300 = CEL=33, REC=67, TEC=0, REC Below error passive level of 128
0x2900 = CEL=0, REC=41, TEC=0, REC Below error passive level of 128
0x2C00 = CEL=0, REC=44, TEC=0, REC Below error passive level of 128
0x5200 = CEL=0, REC=82, TEC=0, REC Below error passive level of 128
0x7A00 = CEL=0, REC=122, TEC=0, REC Below error passive level of 128
0x7B00 = CEL=0, REC=123, TEC=0, REC Below error passive level of 128
0x7D00 = CEL=0, REC=125, TEC=0, REC Below error passive level of 128
0x80000 = CEL=8, REC=0, TEC=0, REC Below error passive level of 128
0xF00 = CEL=0, REC=15, TEC=0, REC Below error passive level of 128
0xFF00 = CEL=0, REC=255, TEC=0, REC Above error passive level of 128
0x1044 Protocol Status Register
0x710 = DLEC=No Change since last message, LEC=No Error, Operating as Receiver
0x708 = DLEC=No Change since last message, LEC=No Error, Idle
0x70F = DLEC=No Change since last message, LEC=No Change since last message, Idle
0x717 = DLEC=No Change since last message, LEC=No Change since last message, Operating as Receiver
0x718 = DLEC=No Change since last message, LEC=No Error, Operating as Transmitter
0x71F = DLEC=No Change since last message, LEC=No Change since last message, Operating as Transmitter
0x74F = DLEC=No Change since last message, LEC=No Change since last message, Idle, Error Warning Status
0x750 = DLEC=No Change since last message, LEC=No Error, Operating as Receiver, Error Warning Status
0x769 = DLEC=No Change since last message, LEC=Stuff Error, Idle, Error passive and Error Warning Status
0x76A = DLEC=No Change since last message, LEC=Form Error, Idle, Error passive and Error Warning Status
0x76F = DLEC=No Change since last message, LEC=No Change since last message, Idle, Error passive and Error Warning Status
0x772 = DLEC=No Change since last message, LEC=Form Error, Operating as Receiver, Error passive and Error Warning Status
0x10C4 TX FIFO/Queue Status
0x1 = TX FIFO Free Level=1
0x200000 = TX FIFO Free Level=0, TX FIFO is Full
0x10CC TX Buffer Request pending
0x0 = No TX Buffers have a message pending transmission
0x1 = TX Buffer 0 has a message pending transmission
0x10D8 TX Buffer Transmission Occurred
0x0 = No TX Buffers had a transmission occur
0x1 = TX Buffer 0 had a message transmission occur
For all log values with timestamps in the 1453xxxxx range, I only saw one entry that showed unusual values, at timestamp [145359846]. In this entry the CAN Error Logging (CEL) value was set to 8, meaning that there were 8 messages that contained a CAN protocol error. However, the REC and TEC counters had already decremented back to 0 and the device had returned to normal activity, so I don't know whether the messages that contained a protocol error were transmitted or received.
I also noticed that the TX FIFO was Full on several log entries in register 0x10C4, which may be due to only having a single TX Buffer allocated while the message transmission is pending arbitration. This is not necessarily an issue, but this message must successfully transmit before a new message can be loaded into the buffer. I believe that when using the Linux driver, a memory access error may be returned if there is an attempt to access a buffer element that has not been allocated because all of the existing buffer elements are full at that moment.
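The TX FIFO values discussed above can be decoded with a short sketch like the following. It assumes the standard M_CAN layout of the TX FIFO/Queue Status register (TFQF in bit 21, TFQPI in bits 20:16, TFGI in bits 12:8, TFFL in bits 5:0) and that each bit of register 0x10CC corresponds to one TX buffer. The function names are hypothetical.

```python
def decode_txfqs(value):
    """Decode TX FIFO/Queue Status (0x10C4), assumed M_CAN layout."""
    return {
        "TFQF": bool((value >> 21) & 0x1),  # FIFO/queue full flag
        "TFQPI": (value >> 16) & 0x1F,      # put index
        "TFGI": (value >> 8) & 0x1F,        # get index
        "TFFL": value & 0x3F,               # free level
    }


def pending_buffers(txbrp):
    """List TX buffer numbers whose request-pending bit is set (0x10CC)."""
    return [i for i in range(32) if (txbrp >> i) & 0x1]


print(decode_txfqs(0x200000), pending_buffers(0x1))
print(decode_txfqs(0x1), pending_buffers(0x0))
```

With a single TX buffer element, the register alternates between a free level of 1 with nothing pending and the full flag set with buffer 0 pending.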
All I can say based on this log is that it appears that you have some occasional bursts of errors that may indicate some external noise source impacting the bus. All the errors in the log appear to be in the Receiver, and there are some initial errors at the start of the log, and then between timestamps [141835439] and [142018009] and a single line at timestamp [142100073].
Regards,
Jonathan
Hello,
Now there is a new phenomenon. After CAN0 has been in use for about five hours, when the device finishes its work and enters standby mode, CAN0 stops sending and receiving data. When the device is put back into use, CAN0 again can only receive and cannot send. Could you please help check what the problem is?
Hello wulu,
Please provide a list of the device registers for review. Without knowing the status and configuration register values, I don't know how to help you determine what the cause of the issue is.
Please read and create a log or list of the following configuration, status and interrupt register values:
0x000C
0x0800
0x0820
0x0824
0x1018
0x1040
0x1044
0x1050
0x10A4 (If using RX FIFO 0)
0x10B4 (If using RX FIFO 1)
0x10C4
0x10CC
0x10D8
Regards,
Jonathan
tcan_reg.log.20250507_101320063.txt
Hello, this time the problem reappeared at about 10:44:25. This log stores the register list ("0x000c" "0x0800" "0x0820" "0x0824" "0x1018" "0x1040" "0x1044" "0x1050" "0x10a4" "0x10c4" "0x10cc")
Hello, the data log didn't show a reason why the TCAN4550 would not be able to transmit. The error counter registers don't show errors, and the config and control registers show it to be in normal mode with the MCAN Initialized and able to communicate on the CAN bus.
I noticed that the TX FIFO/Queue Status register 0x10C4 would either show the TX FIFO as having 1 free TX buffer element, or it was Full. The last time it showed it full in the log was timestamp [104214317]. Shortly after this the TX Event FIFO shows it had a new event element at timestamp [104215224]. After this, there are no more TX Event FIFO interrupt bits set in the MCAN Interrupt register, and the TX FIFO Free Level always shows 1, meaning the TX FIFO doesn't have a message to transmit.
It appears that the TCAN4550 has stopped transmitting CAN messages because the processor is not loading any messages into the TX FIFO: the TX FIFO is always empty, and the TX Buffer Request Pending register 0x10CC always shows that no messages are pending transmission.
Please verify the processor is correctly loading messages into the TX FIFO Buffer Elements and setting the TX Buffer Add Request Buffer Number in the TXBAR register 0x10D0.
Regards,
Jonathan
After repeated verification, we confirmed the problem: there is a race condition in the transmit logic of the kernel driver.
Since there is only one TX FIFO element, the transmit thread first writes the M_CAN_TXBAR register to request transmission and then checks whether the TX FIFO is full. If the FIFO is full, it stops the CAN software queue. An interrupt may preempt the thread after it has judged the FIFO to be full, which invalidates that judgment.
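The ordering problem described above can be modeled in a few lines of Python. This is a simplified, hypothetical model, not the kernel driver code: `TxPath`, `send_buggy`, and `send_fixed` are invented names, and the "interrupt" is injected as a callback at the point where the race is assumed to occur. The fixed variant shows one possible ordering (stop the queue before requesting transmission, then re-check the FIFO). It is an assumption about how the race could be avoided, not the fix that was actually applied.

```python
class TxPath:
    """Toy model of a transmit path with a single TX FIFO element."""

    def __init__(self):
        self.fifo_free = 1          # one TX buffer element allocated
        self.queue_stopped = False  # state of the software send queue

    def tx_complete_irq(self):
        """Interrupt: the frame was sent, so free the FIFO and wake the queue."""
        self.fifo_free = 1
        self.queue_stopped = False

    def send_buggy(self, irq=None):
        self.fifo_free -= 1              # request transmission (TXBAR write)
        full = self.fifo_free == 0       # FIFO judged to be full
        if irq:
            irq()                        # interrupt lands after the check
        if full:
            self.queue_stopped = True    # stale decision: nothing wakes it

    def send_fixed(self, irq=None):
        self.queue_stopped = True        # stop the queue first
        self.fifo_free -= 1              # request transmission (TXBAR write)
        if irq:
            irq()                        # interrupt may wake the queue here
        if self.fifo_free > 0:
            self.queue_stopped = False   # re-check after the request


buggy = TxPath()
buggy.send_buggy(irq=buggy.tx_complete_irq)
print("buggy: queue stopped =", buggy.queue_stopped, "fifo free =", buggy.fifo_free)

fixed = TxPath()
fixed.send_fixed(irq=fixed.tx_complete_irq)
print("fixed: queue stopped =", fixed.queue_stopped, "fifo free =", fixed.fifo_free)
```

In the buggy ordering the queue ends up stopped while the FIFO is free again, and since no further transmit-complete interrupt will arrive, the device keeps receiving but never sends. That matches the symptom reported in this thread. In the fixed ordering the queue is awake once the interrupt has run.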