TCAN4550: TCAN4550

Part Number: TCAN4550

We are following the methodology below:
1) Transmit a 64-byte CAN FD payload from device A to device C. Device C receives the data from device A.
2) Transmit a 64-byte CAN FD payload from device B to device C. Device C receives the data from device B, but reception from device A stops, and the following error messages are generated:
tcan4x5x spi0.0 can2:__can_get_echo_skb:BUG! Trying to access can_priv::echo_skb out of bounds(255/max 7)
tcan4x5x spi0.0 can2:can_put_echo_skb:BUG! echo_skb is occupied!

We want to transmit CAN data simultaneously from both devices to a single device. Can you suggest how to resolve this issue?

  • Akshay,

    An engineer has been notified of this post and will respond by end of business 11/1/2022 CST.

    Regards,

    Eric Hackett

  • Hi Akshay,

    Are all the nodes using separate processors, or are you trying to use multiple TCAN4550 devices with the same processor? 

    Generally each processor node would be connected to the CAN bus through a single transceiver (TCAN4550).  However, it is not clear to me that the TCAN4550 devices are connected to separate processors; instead it sounds like they are on the same processor (at least A and B) and there is some conflict over which device is controlling the SPI bus.

    If they are on the same processor, it appears there is an error with the processor device configuration and separate SPI buses are not being used for the different devices.

    If they are on separate processors can you clarify which processor node (A, B, or C) is generating the error? 

    Regards,

    Jonathan

  • Devices (also referred to as nodes) A, B and C are connected to the same CAN FD bus. Each device has its own host processor and TCAN4550 transceiver. The SPI bus between the host processor and the TCAN4550 is not shared with any other peripherals.

    Transceiver A starts transmitting a 64-byte CAN FD payload to node C, and node C receives the data. Then, in parallel, transceiver B also starts transmitting a 64-byte CAN FD payload to node C. This generates the following error messages on both transceivers A and B:

    tcan4x5x spi0.0 can2:__can_get_echo_skb:BUG! Trying to access can_priv::echo_skb out of bounds(255/max 7)
    tcan4x5x spi0.0 can2:can_put_echo_skb:BUG! echo_skb is occupied!

    Also, transceiver A stops transmitting data to node C while transceiver B continues, so node C receives data only from transceiver B. Sometimes it is the other way around: transceiver A keeps transmitting to node C, but transceiver B stops.

  • Hi Akshay,

    I support the TCAN4550 at a device level and this is not a Linux support forum and I am not an expert in the Linux driver.  However, I can help identify what the root cause of the error is.

    Are Nodes A and B transmitting messages with the same message ID's?  This should be avoided as it could cause problems during arbitration and generate message errors.  Can you verify the CAN message ID's are unique for each node?

    Also, do I understand correctly that if you only have 1 of the nodes (either A or B) connected to node C that you can transmit data indefinitely without receiving this type of error, and it is only when a second node is added to the bus when the error appears?

    Is the error immediate after the second node starts transmitting, or is data transmitted from both nodes for a period of time before the error appears and if so how long is this?

    Can you read the status and interrupt registers on the TCAN4550 devices so that I can see what the fault bits, RX/TX error counters, RX/TX FIFO levels, etc. are?

    • h000C
    • h0800
    • h0820
    • h0824
    • h1018
    • h1040
    • h1044
    • h1050
    • h10A0
    • h10A4
    • h10AC
    • h10B0
    • h10B4
    • h10C4
    • h10CC
    • h10D0

    Regards,

    Jonathan

  • Hi Jonathan,

    These are the register values which you had asked:

    • h1018=300
    • h1040=0
    • h1044=300f
    • h1050=1
    • h10A0=9920001c
    • h10A4=a1515
    • h10AC=0
    • h10B0=990a091c
    • h10B4=0
    • h10C4=20207
    • h10CC=0
    • h10D0=0

    Regards,

    Akshay Naik

  • Hi Akshay,

    The registers you have provided in the h1000 to h10FF are for the MCAN configuration and status registers. I don't see any device related issues from the MCAN registers you have provided and the device does not appear to have received a high TX/RX error count, entered an error warning, error passive, or bus off condition, and the TX/RX FIFOs are not full. 

    Which node are these registers from?

    Can you please provide the other registers as well?  These are the Interrupt and Diagnostic registers that can indicate if there are issues with the device not related to the MCAN controller such as SPI errors.

    • h000C
    • h0800
    • h0820
    • h0824

    Can you please also comment on the other questions I have asked?

    Another question I have is whether Nodes A and B are also receiving messages while also transmitting messages, or are nodes A and B only transmitting messages, and node C is only receiving messages?

    Thanks,
    Jonathan

  • Hi Jonathan,

    Will get back to you on the register-related query. In the meantime, we have made another observation:

    We are transmitting data from transceiver (TCAN4550) A to transceiver (TCAN4550) C. At the same time, transceiver (TCAN4550) B starts transmitting to transceiver (TCAN4550) C. All the transceivers are connected to the same bus.

    Since all the transceivers are on the same bus, transceiver A gets an RX interrupt at the same time as its TX interrupt. The RX interrupt occurs on A because transceiver B is transmitting on the same CAN bus. At this point transceiver A stops transmitting data, while transceiver B continues. We are using a 0 (zero) ms inter-frame delay for transmission on both transceiver A and transceiver B. Can you give your opinion on this?

    Is there any issue with receiving an RX interrupt while transmitting data on a transceiver? Is it possible to receive and transmit at the same time?

    Regards,

    Akshay

  • Hi Akshay,

    I will refer you to the following documents for additional discussion on how the TCAN4550 transmits and receives messages.  The TCAN4550 uses the MCAN CAN FD Controller IP developed by Bosch and more detailed information specific to this controller can be found in the MCAN User's Manual.

    A simpler overview with some examples can be found in the TCAN45xx Software User's Guide.

    It is certainly possible to transmit and receive messages simultaneously with a TCAN4550 according to the CAN FD Standard protocol and the rules of arbitration. 

    When a message or messages are loaded into a TX buffer on the transmitting node and the corresponding bits in the TX Buffer Add Request (TXBAR) register are set, the TCAN4550 will try to transmit the messages on the CAN bus.  If the node wins arbitration (i.e. has the lowest message ID), then the message will be transmitted on the CAN bus.  If however the node loses arbitration, the TCAN4550 will continue trying until it is successful.

    While a node is not actively transmitting a message, it is still receiving the message being transmitted on the bus.  The TCAN4550 will compare the incoming message ID against the Filter elements that have been saved and if it passes through one of the filters, it will be stored in the MRAM memory location directed by that filter.  Different filters can be set up to direct specific messages to the different RX locations such as RX FIFO 0, RX FIFO1, and dedicated RX Buffers.  If the incoming message ID does not pass through a particular filter, it will be either accepted or rejected according to the General Filter Configuration (GFC) register configuration (0x1080).

    The TX and RX buffer space should be allocated in separate non-overlapping memory so there is no conflict between transmitting and receiving messages at the same time.  Obviously because CAN is not a full-duplex type of bus architecture, only one message can be transmitted on the bus at a time, but by the "same time" I mean that the TCAN4550 can have TX messages queued up to transmit while also being able to receive and store messages transmitted by other nodes.

    I previously asked about whether both nodes A and B are using separate message ID's for their transmissions.  If not this could cause problems with arbitration and both nodes could try to transmit a message with the same message ID at the same time.  If however the message ID's are different for the two nodes, but one node has a lower message ID than the other node then it is possible that one node could block the other node from successfully transmitting their messages based on arbitration if the nodes are transmitting repeatedly with little or no inter frame delay.
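
    The arbitration behaviour described above can be sketched in a few lines. This is only a toy model of bitwise arbitration on a wired-AND bus (dominant 0 overrides recessive 1), not a model of the TCAN4550 itself; the node names and the IDs 0x111 and 0x0F0 are hypothetical values chosen for illustration.

```python
def arbitrate(frames, id_bits=11):
    """Return the nodes still transmitting after the ID field.

    frames maps a node name to the CAN ID it is trying to send.
    IDs are sent most significant bit first; a node that sends a
    recessive 1 while the bus shows a dominant 0 backs off.
    """
    active = dict(frames)
    for bit in range(id_bits - 1, -1, -1):
        bus_level = min((can_id >> bit) & 1 for can_id in active.values())
        active = {node: can_id for node, can_id in active.items()
                  if (can_id >> bit) & 1 == bus_level}
    return sorted(active)


# Different IDs: the lower ID wins and the other node must retry later.
different_ids = arbitrate({"A": 0x111, "B": 0x0F0})
# Same ID: neither node backs off, so both keep driving the bus.
same_ids = arbitrate({"A": 0x111, "B": 0x111})
print(different_ids, same_ids)
```

    With different IDs only one node survives arbitration; with identical IDs both nodes remain active after the ID field, which is why unique IDs per node matter.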

    I have also asked and am still unclear about whether you want nodes A and B to only transmit data and only node C to receive, or whether you want all nodes to both transmit and receive messages.  If you want nodes A and B to only transmit and not receive data from the other node, then you should set up a filter or the general filter settings to reject the messages you are not interested in.  This will prevent the nodes from wasting time on receiving and processing messages they are not interested in.  For example, you could set up node A to reject the message IDs transmitted by node B, and likewise node B could reject the message IDs transmitted by node A.

    Furthermore, you could create filters in node C to receive only the message IDs transmitted by nodes A and B, or even to store the messages from node A into RX FIFO 0, and messages from node B into RX FIFO 1 as an example.  But it doesn't sound like you have configured any filter elements and I would suggest you try to get some of those configured between the nodes to handle the reception and interrupts generated in the various nodes.
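
    As a rough illustration of the node C filters suggested above, the sketch below packs two standard ID (SID) filter element words. The bit layout (SFT in bits 31:30, SFEC in bits 29:27, SFID1 in bits 26:16, SFID2 in bits 10:0) is my reading of the MCAN User's Manual and should be checked there; the IDs 0x111 and 0x222 are hypothetical.

```python
SFT_CLASSIC = 0b10    # assumed encoding: SFID1 is the ID, SFID2 is the mask
SFEC_FIFO0 = 0b001    # assumed encoding: store matching frames in RX FIFO 0
SFEC_FIFO1 = 0b010    # assumed encoding: store matching frames in RX FIFO 1


def sid_filter_word(sft, sfec, sfid1, sfid2):
    """Pack one 32-bit standard ID filter element."""
    if not (0 <= sfid1 <= 0x7FF and 0 <= sfid2 <= 0x7FF):
        raise ValueError("standard IDs and masks are 11 bits wide")
    return (sft << 30) | (sfec << 27) | (sfid1 << 16) | sfid2


# Hypothetical IDs: node A sends 0x111, node B sends 0x222.
from_node_a = sid_filter_word(SFT_CLASSIC, SFEC_FIFO0, 0x111, 0x7FF)
from_node_b = sid_filter_word(SFT_CLASSIC, SFEC_FIFO1, 0x222, 0x7FF)
print(hex(from_node_a), hex(from_node_b))
```

    The two words would then be written to the SID filter section of MRAM on node C.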

    If nodes A and B are not using unique message ID's then this could potentially generate errors and cause the other problems you are seeing.

    If node B is transmitting with lower message ID's and at a rate that does not allow node A to win arbitration and transmit messages, this could explain why node A messages are not received by node C after node B starts to transmit.  You could try introducing a small transmit pause through the TXP bit of the Control register (0x1018:14) which when set introduces a 2-bit delay between starting the next transmission to allow other nodes an opportunity to start transmitting messages that may have a higher message ID.  You can also try to slow the transmission rate to ensure there is some idle time on the bus that will allow all nodes to eventually transmit their messages.
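
    A minimal sketch of the TXP change, using the Control register value 0x300 reported earlier in the thread. It only computes the new register value; how that value is written over SPI depends on the host software. I believe the MCAN only accepts changes to this bit while the controller is in configuration mode, but that is an assumption to verify in the MCAN User's Manual.

```python
CCCR_ADDR = 0x1018   # Control register address, as given in the thread
TXP_BIT = 14         # transmit pause bit, as given in the thread


def with_transmit_pause(cccr_value):
    """Return the Control register value with TXP set and all other bits unchanged."""
    return cccr_value | (1 << TXP_BIT)


new_cccr = with_transmit_pause(0x300)
print(hex(CCCR_ADDR), hex(new_cccr))
```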

    Regards,

    Jonathan

  • Hi Jonathan,

    We are very thankful for the detailed explanation on our query. Thank you Jonathan.

    As you mentioned regarding message IDs, we are transmitting with different message IDs from transceiver (TCAN4550) A and transceiver (TCAN4550) B, and we are transmitting 64-byte payloads without any inter-frame delay. Nodes A, B and C need to transmit as well as receive.

    We observed node B blocking node A from successfully transmitting messages while node B continues to transmit successfully. In this situation, we stopped node B's transmission. Even after node B stops transmitting, node A is unable to send data. Could you suggest what the reason for this could be?

    Once a node gets blocked, is it necessary to reset the TCAN4550 before it can transmit successfully again?

    As you suggested, we will introduce a small transmit pause through the TXP bit of the Control register (0x1018:14), check the result, and update you.

    Regards,

    Akshay

  • Hi Akshay,

    I am glad to help.  The TCAN4550 is a fairly complicated device to understand and get configured, so I'm trying to be as informative as possible to help reduce the overall time needed to resolve all the issues, and to also help spur additional thoughts if we are overlooking some detail that hasn't been shared in the thread that could be causing the troubles.

    Once the TCAN4550 has a message loaded into a TX Buffer and the TXBAR bit set to initiate the transmit process, the device should continue to try to transmit this message until it is successful unless:

    1. The Disable Automatic Retransmission (DAR) bit is set in the Control register (0x1018[6]).  When this bit is set to 1, the TCAN4550 will try to transmit the message once, and only once regardless of success.  From the register values you previously provided me, this bit is set to "0" which is the default value and should cause the device to automatically retry failed transmission of messages.
    2. The Initialization (INIT) bit is set in the control register (0x1018[0]).  When this bit is set, the device is prevented from transmitting data on the CAN bus. 
      1. The INIT bit is set automatically when the device changes to Standby Mode
      2. The device receives too many CAN errors and enters a Bus Off (BO) condition (0x1044[7]). When the TX or RX error counters are incrementing, the device will first enter the Error Warning (EW) state (0x1044[6]) and then the Error Passive (EP) state (0x1044[5]) before reaching the Bus Off state that sets the INIT bit back to '1'.  You can try to monitor this register and the Error Counter register (0x1040) to see if the device is receiving errors as a result of the other node's activity on the bus.
      3. A transmit cancellation request is received (0x10D4)
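
    The status bits listed above can be checked with a small decoder like the one below. The EP, EW and BO bit positions come from this thread; the TEC and REC field positions in the Error Counter register are assumptions based on my reading of the MCAN User's Manual. The sample values 0x300f and 0x0 are the ones reported earlier in the thread for 0x1044 and 0x1040.

```python
def decode_error_state(psr, ecr):
    """Summarise the error state from the 0x1044 and 0x1040 register values."""
    return {
        "error_passive": bool(psr & (1 << 5)),   # EP, 0x1044[5]
        "error_warning": bool(psr & (1 << 6)),   # EW, 0x1044[6]
        "bus_off": bool(psr & (1 << 7)),         # BO, 0x1044[7]
        "tx_errors": ecr & 0xFF,                 # assumed TEC field, bits 7:0
        "rx_errors": (ecr >> 8) & 0x7F,          # assumed REC field, bits 14:8
    }


state = decode_error_state(psr=0x300F, ecr=0x0)
print(state)
```

    For the reported values every flag is clear and both counters are zero, which agrees with the earlier assessment that the node had not entered an error warning, error passive, or bus off condition.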

    I can't think of anything else at the moment that would cause the device to stop transmitting during normal operation. 

    But the TCAN4550 does not verify the MRAM configuration in any way, so if there are overlapping sections of MRAM memory (i.e. RX and TX buffers overlap), the MRAM data may get corrupted.  As a simple example, if a RX buffer overlaps a TX buffer containing a message to transmit, the TX message may be corrupted by a new RX message before it can be transmitted. 

    Therefore it is a good idea to calculate the start address and number of bytes needed for each type of MRAM element being used (TX buffers, RX buffers, TX Event FIFOs, SID and XID filter elements, etc.) and verify there are no overlapping sections.  Also note that the device will wrap around to the beginning of the MRAM once it reaches the end of the MRAM, so you should also verify that the MRAM configuration does not exceed the amount of MRAM space available.

    There is not enough MRAM available to support the maximum size of each type of MRAM element.  You must decide how much of each type of MRAM element you need for your application and how to allocate that in the MRAM space.

    For example, it is a common mistake to forget to add the bytes needed for the CAN message Header to the TX/RX buffer space and it is not just the size of the data field.  Little mistakes like this can cause harder to detect problems, and so if the device's INIT bit is not getting set from a mode change or CAN errors, then there may be a problem with the MRAM configuration.
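
    One way to avoid both overlap and overflow is to place the sections back to back and check the total against the MRAM size. The sketch below assumes 2048 bytes of MRAM and an 8-byte header per TX/RX buffer element (my reading of the MCAN User's Manual); the section counts are hypothetical.

```python
MRAM_BYTES = 2048    # assumed MRAM size in bytes
HEADER_BYTES = 8     # assumed header size per TX/RX buffer element


def plan_mram(sections):
    """Place (name, count, element_bytes) sections back to back from offset 0.

    Returns the layout as (name, start, end) tuples and the total bytes used.
    Back-to-back placement cannot overlap, so only the total needs checking.
    """
    offset = 0
    layout = []
    for name, count, element_bytes in sections:
        end = offset + count * element_bytes
        layout.append((name, offset, end))
        offset = end
    return layout, offset


# Hypothetical allocation using 64-byte data fields.
sections = [
    ("SID filters", 2, 4),
    ("RX FIFO 0", 8, HEADER_BYTES + 64),
    ("TX buffers", 4, HEADER_BYTES + 64),
]
layout, used = plan_mram(sections)
fits = used <= MRAM_BYTES
print(layout, used, fits)
```

    Note that each 64-byte buffer element costs 72 bytes here, not 64, because of the header.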

    Regards,

    Jonathan

  • Hi Jonathan,

    We have a setup of two nodes connected to a bus. Both nodes successfully transmit and receive data on the same bus simultaneously with an inter-frame delay of 19 milliseconds. If we reduce the inter-frame delay below 19 milliseconds, one of the nodes stops transmitting.

    Is there any chance of reducing this?

    If yes, could you please let us know how we can reduce it?

    Regards,

    Akshay Naik

  • Hi Akshay,

    This is starting to sound to me like this could be a limitation on your SPI interface and the inability to get all of the TX/RX data between the processor and the TCAN4550 in the time you require.  What is your SPI data rate, and have you calculated the total time you need to transmit a complete CAN message over SPI into or out of a TX/RX buffer?  This will vary with the data payload size of your messages, SPI bit rates, idle time between SPI bytes and transactions, etc.

    There is nothing inside the TCAN4550's MCAN controller that would prevent it from transmitting or receiving messages with a small inter frame delay and it is compliant with the CAN FD protocol standards.  Therefore I believe the bottleneck is on the SPI side of the device.

    Regards,

    Jonathan

  • Hi Akshay,

    I'm sorry for neglecting to include improvement suggestions in my previous post.

    The TCAN4550 is capable of multi-word SPI transactions allowing multiple consecutive registers to be read or written to in a single SPI transaction by setting the Length field in the SPI header word accordingly.  This reduces the need for a full 32-bit word containing the address to be transmitted with each register read/write and cuts down the SPI time by >50% if you include the inter-word delays between multiple register read/writes.

    The same is true for the MRAM memory space and an entire CAN message can be read or written to a MRAM buffer in a single SPI transaction as compared to multiple SPI transactions for each 32-bit word of data and a new memory address.
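
    A back-of-the-envelope estimate of the difference, assuming one 32-bit header word per SPI transaction, a 64-byte payload plus an 8-byte message header in the buffer element, and no idle time between words. The 18 MHz clock is only an example value; real transfers will be slower because of chip-select and inter-word gaps.

```python
HEADER_BITS = 32   # one 32-bit SPI header word per transaction
WORD_BITS = 32


def spi_bits(data_words, burst):
    """Clock cycles needed to move data_words 32-bit words over SPI."""
    if burst:
        return HEADER_BITS + data_words * WORD_BITS    # one header, then all data
    return data_words * (HEADER_BITS + WORD_BITS)      # a header for every word


words = (8 + 64) // 4            # assumed 8-byte header + 64-byte payload
single_bits = spi_bits(words, burst=False)
burst_bits = spi_bits(words, burst=True)

spi_hz = 18_000_000              # example SPI clock
single_us = single_bits / spi_hz * 1e6
burst_us = burst_bits / spi_hz * 1e6
print(single_bits, burst_bits, round(single_us, 1), round(burst_us, 1))
```

    Even before counting inter-word delays, the burst transfer needs a little over half the clock cycles of the single-word approach.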

    Unfortunately I believe the current Linux driver uses single-word SPI transactions and is not very efficient in this regard.  But there is a lot of room for improvement in this area.  There is currently a more efficient revision to this Linux driver in development, but that work is not yet complete and estimates are for the work to complete by the end of this year (2022).  So far the improvements are reported to increase the efficiency by 20-30%.  Once this is upstreamed into the kernel, this should provide some level of improvements.  But perhaps there are other areas you can find on your own platform that can also improve the efficiency.

    I support the TCAN4550 from a device level and not necessarily from a Linux level so I'm not as familiar with exactly how all of the Linux driver is working with the TCAN4550.  But other areas of improvement may come from how and when the processor retrieves RX messages.  Instead of setting an interrupt on every RX message, it may be more efficient to pull several RX messages that have been placed in the RX FIFO in consecutive MRAM memory space to further reduce the overall SPI time needed to retrieve these messages.  There is a RX FIFO watermark level that can be set and an interrupt bit that can be used for this purpose (RF0W and RF1W in register 0x1050 which is also duplicated and reported in register 0x0824).
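
    To see which RX FIFO events are flagged, the low byte of the interrupt register can be decoded as below. The bit positions are my reading of the MCAN User's Manual and should be verified there; the value 0x1 is the 0x1050 reading reported earlier in the thread, and 0x22 is a hypothetical value.

```python
# Assumed bit positions in the MCAN interrupt register (0x1050).
RX_FIFO_FLAGS = (
    (0, "RF0N"),  # RX FIFO 0 new message
    (1, "RF0W"),  # RX FIFO 0 watermark reached
    (2, "RF0F"),  # RX FIFO 0 full
    (3, "RF0L"),  # RX FIFO 0 message lost
    (4, "RF1N"),  # RX FIFO 1 new message
    (5, "RF1W"),  # RX FIFO 1 watermark reached
    (6, "RF1F"),  # RX FIFO 1 full
    (7, "RF1L"),  # RX FIFO 1 message lost
)


def rx_fifo_flags(ir_value):
    """List the RX FIFO flag names that are set in an interrupt register value."""
    return [name for bit, name in RX_FIFO_FLAGS if (ir_value >> bit) & 1]


reported = rx_fifo_flags(0x1)
hypothetical = rx_fifo_flags(0x22)
print(reported, hypothetical)
```

    A host that reads messages only when RF0W or RF1W is set can fetch several FIFO elements in one burst instead of servicing every RF0N event.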

    Increasing the SPI data rate to the fastest rate that can be supported, and looking for additional efficiencies in how the processor works with the data may also provide improvements to the performance.

    Regards,

    Jonathan

  • Hi Jonathan,

    We are transmitting and receiving data simultaneously between 2 nodes, with an inter-frame delay of 20 ms between frames on both nodes. After some time, transmission stops on one node. That device still receives data but is not able to transmit.

    In addition, we are receiving data with IDs that we did not transmit. For example, we transmit data with ID 111 but receive a frame with CAN ID 444. Eventually the transmission stops.

    Could you please let us know what is causing this issue?

    Regards,

    Akshay

  • Hi Akshay,

    It is difficult to say what exactly is causing the issue without seeing the device's status and configuration bits.  Can you please read the device registers on the node that stops transmitting so that I can see what is happening in the device while this issue is occurring?  The registers I would like to see are already listed previously in this thread.

    If there are transmit errors occurring in one node, the device may remove itself from participating in bus communication.  However if it can still receive frames, but is not capable of transmitting frames, this is likely not a Bus Off condition.  But the Error Counters and Status bits will indicate this.

    It is also possible the device has entered a Restricted Operation Mode where it is able to receive data and remote frames and to give acknowledge to valid frames, but it does not send data, remote, active error, or overload frames on the bus.  The device can enter this mode when the ASM bit in the Control register is set to '1' by the processor, or automatically when the TX Handler was unable to read data from the Message RAM in time. 

    The device also has a Bus Monitoring Mode in which it can receive valid data frames but cannot start a transmission.  But this mode cannot be entered automatically by the device, and must be configured by the processor.

    Additional information on these two modes can be found in the MCAN User's Manual that was also previously linked in this thread.
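
    Whether either mode is active can be read from the Control register (0x1018). The sketch below decodes the value 0x300 reported earlier in the thread. INIT, DAR and TXP positions are taken from this thread; the remaining bit names and positions (including MON as the monitoring mode bit) are my reading of the MCAN User's Manual and should be verified there.

```python
# Control register bit names; positions other than INIT, DAR and TXP are assumptions.
CCCR_BITS = (
    (0, "INIT"), (1, "CCE"), (2, "ASM"), (5, "MON"),
    (6, "DAR"), (8, "FDOE"), (9, "BRSE"), (14, "TXP"),
)


def cccr_flags(value):
    """List the names of the known Control register bits that are set."""
    return [name for bit, name in CCCR_BITS if (value >> bit) & 1]


flags = cccr_flags(0x300)
restricted = "ASM" in flags
monitoring = "MON" in flags
print(flags, restricted, monitoring)
```

    For 0x300 only the FD operation and bit rate switching bits are set, so under these assumptions neither mode was active when that value was read.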

    Your second observation about receiving CAN ID's that are different than the ones you transmitted is interesting.  The difference between a CAN ID of 0x111 and 0x444 is a bit shift of 2 bits.  I'm not sure how a bit shift of 2 bits could occur in a message that would not generate an error frame and meets all the other requirements for a valid frame to not generate an error flag and be discarded.  However, this does open the door to a possible clocking issue.

    Because the two nodes are operating from two independent clock sources, they must be close enough to the same absolute frequency that transmit and receive errors do not occur and must meet the clocking requirements defined in the CAN standard.  If the clock frequencies differ by too much, the bit width of the transmitted bit may not match the bit width expected by the receiving node operating on a slightly different clock frequency which could result in the receiving node sampling the bit in the wrong location.

    Clocks generated by PLLs are also prone to errors because the PLL frequency can wander slightly as it tries to remain locked to the reference clock.  Some PLLs have a narrow window that would ensure the frequency is always inside the valid CAN frequency tolerance window, and other PLLs may wander outside of the valid CAN frequency tolerance window.

    How are your clocks provided to the TCAN4550 devices?  Are they using crystals, or are they generated by a PLL inside the processor or clock generator IC?

    Outside of a clock issue, I don't know what could cause an ID issue like you have reported.  But since you state that the transmission stops, a clock tolerance issue between the two nodes could result in enough RX and TX errors to cause the devices to enter a Bus Off condition.

    Regards,

    Jonathan

  • Hi,

    Please see below the device register read

    • h1018=300
    • h1040=0
    • h1044=300f
    • h1050=3
    • h10A0=9920001c
    • h10A4=30a19
    • h10AC=0
    • h10B0=990a091c
    • h10B4=0
    • h10C4=40407
    • h10CC=0
    • h10D0=0
    • h000C=8
    • h0800=c80004a8
    • h0820=40081
    • h0824=0

    We have not enabled the ASM bit in the control register. We are providing external crystal frequency of 40 MHz and SPI frequency of 20MHz.

    Regards,

    Akshay

  • Hi Akshay,

    Thanks for providing the register values. 

    I noticed that your Interrupt register shows the Watchdog Time Out (WDTO) 0x0820:18, which also sets the Global Error (GLOBALERR) 0x0820:7 and Global Voltage, Temp, or WDTO (VTWD) 0x0820:0 bits.  Your Watchdog Action (WD_ACTION) 0x0800:17:16 configuration only instructs the device to set the interrupt flag and if a pin is configured to reflect WD, pull this pin low. 
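
    The reading above can be reproduced by listing the set bits of the reported 0x0820 value (0x40081). Only the three bit names mentioned in this thread are mapped; other bits would be printed by number.

```python
# Bit names for register 0x0820 as given in this thread.
KNOWN_BITS = {0: "VTWD", 7: "GLOBALERR", 18: "WDTO"}


def set_bits(value):
    """Return the positions of all set bits in a 32-bit register value."""
    return [bit for bit in range(32) if (value >> bit) & 1]


positions = set_bits(0x40081)
names = [KNOWN_BITS.get(bit, f"bit {bit}") for bit in positions]
print(positions, names)
```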

    I don't know if a watchdog time out would be a cause for your processor to stop transmitting, or if you were intending on using the watchdog timer at all.  But if you are using the watchdog timer, then the reason for this timer not getting triggered may also be a clue as to why the processor stopped transmitting.

    If you were not intending on using the watchdog, then it could potentially cause the processor to miss other interrupt events by holding the interrupt pin low as an example.

    You may want to try your testing again with the Watchdog Timer disabled to see if there is any change in the overall results.  This is done by setting the Watchdog Enable (WD_EN) 0x0800:3 bit to "0".  I'll note that it is set to "1" by default, but the timer should not start until the first Watchdog Trigger event has been observed when it is enabled.  So in order for your timer to have timed out, you must have provided it at least one trigger event.

    If you don't want to disable the watchdog timer completely, you may want to try increasing the timer window.  It is still currently set to the shortest and default value of 60ms with the Watchdog Timer (WD_TIMER) 0x0800:29-28 bits.  It can be increased to 600ms, 3s, or 6s.
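
    A small sketch of how the watchdog fields in the reported 0x0800 value (0xc80004a8) decode, and what the value would become with WD_EN cleared. The field positions are the ones named in this post; the mapping of WD_TIMER codes 0 to 3 onto 60 ms, 600 ms, 3 s and 6 s is an assumption based on the order the options are listed.

```python
WD_TIMER_WINDOWS = ("60 ms", "600 ms", "3 s", "6 s")   # assumed code order


def decode_watchdog(mode_reg):
    """Extract the watchdog fields from the 0x0800 register value."""
    return {
        "WD_EN": bool((mode_reg >> 3) & 0x1),                       # 0x0800:3
        "WD_ACTION": (mode_reg >> 16) & 0x3,                        # 0x0800:17:16
        "WD_TIMER": WD_TIMER_WINDOWS[(mode_reg >> 28) & 0x3],       # 0x0800:29-28
    }


def disable_watchdog(mode_reg):
    """Return the register value with WD_EN cleared and other bits unchanged."""
    return mode_reg & ~(1 << 3) & 0xFFFFFFFF


watchdog = decode_watchdog(0xC80004A8)
disabled_value = disable_watchdog(0xC80004A8)
print(watchdog, hex(disabled_value))
```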

    I didn't see anything else in the registers that pointed to a possible cause for the errors such as a high TX or RX error counter.

    I will also note that the maximum SPI frequency supported across all operating conditions is 18MHz.  Operating at 20MHz may work but it is not guaranteed.  It is possible that there is some SPI related issue such as a setup and hold sample error that could be corrupting the data or failing to set registers or load or read CAN messages to or from the memory.  I don't see any SPI errors reported, but perhaps it could somehow account for the incorrect message ID's you are observing.

    Can you also reduce the SPI frequency to be less than or equal to 18MHz for your testing to see if there is any change?

    Regards,

    Jonathan

  • Hi Jonathan,

    We enabled a watermark level of 25 in the RX FIFO. If the RX FIFO does not fill to 25 messages within a particular time, the watchdog times out and the host reads whatever data is available in the RX FIFO. We are using the watchdog for this purpose. As you suggested, we disabled the watchdog and ran transmission and reception simultaneously. The result was the same: transmission stops at random times. We then tested with the watchdog timer set to 600 ms, 3 s, and 6 s, with the same result. We also tested various SPI frequencies (<= 18 MHz), again with the same result.

    Regards,

    Akshay 

  • Hi Akshay,

    OK, thanks for the clarification on how you are using the watchdog.  This makes sense.

    The TX Buffer Add Request (TXBAR) register h10D0=0, meaning that there are no TX message buffers that have been set to transmit messages.  The TCAN4550 will only transmit message buffers that have had the associated bit in the TXBAR set to '1'. 

    The processor will need to set some of the bits in the TXBAR register h10D0 to '1' in order for the TCAN4550 to resume transmission.  Because I don't see any TX/RX errors getting reported, it appears the processor may not be providing new messages to be transmitted on a continuous basis, or has some bug preventing the TXBAR bits from getting set.

    However, the TCAN4550 appears to be working properly and I would not expect it to transmit a message as long as register h10D0 equals 0x0.
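
    For reference, the relationship between TX buffer numbers and TXBAR bits can be sketched as below, assuming one request bit per buffer index (bit n for buffer n) across up to 32 buffers. The buffer indices in the example are hypothetical.

```python
def txbar_mask(buffer_indices):
    """Build the value to write to TXBAR to request the given TX buffers."""
    mask = 0
    for index in buffer_indices:
        if not 0 <= index < 32:
            raise ValueError("TX buffer index must be in the range 0..31")
        mask |= 1 << index
    return mask


def requested_buffers(txbar_value):
    """List the TX buffer indices whose request bit is set."""
    return [index for index in range(32) if (txbar_value >> index) & 1]


request = txbar_mask([0, 2])          # hypothetical: request buffers 0 and 2
reported = requested_buffers(0x0)     # the h10D0 value reported in this thread
print(hex(request), reported)
```

    An empty list for the reported value matches the observation that no buffers were queued for transmission at the time of the read.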

    How is the processor configured to continually provide messages to the TCAN4550 TX Buffers and setting the associated TXBAR bits?  Can you verify this is working as expected?

    Regards,

    Jonathan

  • Hi Jonathan,

    I read the register value 01C0 and received the value 0x7000D34. This indicates TX FIFO queue size is 7. The TX configuration register indicates that transmit FIFO/Queue size can go up to 32, but whenever I am setting it more than 7 I'm getting a kernel panic error.

    Any idea what is causing this issue?

    Regards,

    Akshay 

  • Hi Akshay,

    While I'm not exactly sure how the Linux driver does error checking, I suspect that your MRAM settings have exceeded the memory space available in the device.

    The TCAN4550 only has 2 KB of MRAM memory that can be allocated and partitioned per your requirements.  Some applications may require more RX FIFO space, while other applications may require more TX FIFO space.  Or some may require a lot of RX Message Filter elements to allow the processor to only receive and respond to specific messages.

    It is true that the TCAN4550 can support up to 32 TX buffer elements and a TX Event FIFO that supports up to 32 elements to keep track of successful message transmissions. There are also two RX FIFOs that can each support up to 64 buffer elements in addition to another 64 dedicated RX Buffer elements.  The device can also have up to 128 Standard ID (SID) and 64 Extended ID (XID) Filter Elements.

    But the device is not capable of supporting the maximum number of all of these elements at the same time, and you as a user must partition the MRAM space and configure the device such that the allocation does not overlap or exceed the available space.  There is no error checking in the TCAN4550 to verify your configuration, so you must ensure the configuration is correct.  The TCAN4550 would need to have >17 KB of memory to support the maximum configuration for each of these types of elements.  Since it only has 2 KB, you will need to keep this in mind when doing your configuration.
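
    The size of the maximum configuration can be estimated as follows. The element counts are the maximums listed in this post; the per-element sizes (4 bytes per SID filter, 8 bytes per XID filter and TX event element, and an 8-byte header plus 64 data bytes per RX/TX buffer element) are assumptions based on my reading of the MCAN User's Manual.

```python
# Assumed element sizes in bytes, with 64-byte data fields for RX/TX buffers.
ELEMENT_BYTES = {
    "sid_filters": 4, "xid_filters": 8,
    "rx_fifo0": 72, "rx_fifo1": 72, "rx_buffers": 72,
    "tx_event_fifo": 8, "tx_buffers": 72,
}

# Maximum element counts as listed in this post.
MAX_COUNTS = {
    "sid_filters": 128, "xid_filters": 64,
    "rx_fifo0": 64, "rx_fifo1": 64, "rx_buffers": 64,
    "tx_event_fifo": 32, "tx_buffers": 32,
}


def mram_bytes(counts):
    """Total MRAM bytes needed for the given element counts."""
    return sum(counts[name] * ELEMENT_BYTES[name] for name in counts)


max_total = mram_bytes(MAX_COUNTS)
print(max_total, max_total / 1024)
```

    Under these assumptions the full configuration comes to 17408 bytes, roughly eight times the 2 KB that is available.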

    Each of these types of MRAM elements has a Start Address and a number of elements or buffers, and the TX and RX buffer elements have a data field size value as well so that you don't waste 64 bytes of memory if you are only using an 8-byte data payload, as an example.
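
    As a cross-check, the configuration register values already posted in this thread can be decoded into start offset, element count and watermark. The field positions are my reading of the MCAN User's Manual, and the sketch assumes the start address fields are byte offsets from the beginning of a 2048-byte MRAM; both assumptions should be verified.

```python
MRAM_BYTES = 2048   # assumed MRAM size in bytes


def decode_rx_fifo_cfg(value):
    """Decode an RX FIFO configuration word (assumed field layout)."""
    return {
        "start": value & 0xFFFC,              # start offset, word aligned
        "elements": (value >> 16) & 0x7F,     # number of FIFO elements
        "watermark": (value >> 24) & 0x7F,    # watermark level
    }


def decode_tx_buf_cfg(value):
    """Decode a TX buffer configuration word (assumed field layout)."""
    return {
        "start": value & 0xFFFC,              # start offset, word aligned
        "dedicated": (value >> 16) & 0x3F,    # dedicated TX buffers
        "fifo_queue": (value >> 24) & 0x3F,   # TX FIFO/queue size
    }


rx_fifo0 = decode_rx_fifo_cfg(0x9920001C)    # reported h10A0
rx_fifo1 = decode_rx_fifo_cfg(0x990A091C)    # reported h10B0
tx_bufs = decode_tx_buf_cfg(0x7000D34)       # reported TX configuration value
past_end = [name for name, cfg in
            (("rx_fifo0", rx_fifo0), ("rx_fifo1", rx_fifo1), ("tx_bufs", tx_bufs))
            if cfg["start"] >= MRAM_BYTES]
print(rx_fifo0, rx_fifo1, tx_bufs, past_end)
```

    If the assumed layout is right, the decoded watermark of 25 and TX queue size of 7 match what was described, while the RX FIFO 1 and TX buffer sections start at offsets 2332 and 3380. Those offsets are beyond 2048 bytes, which would be consistent with the wrap-around concern.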

    I believe the Linux driver will abstract this and handle the memory allocation for you based on the number of elements you want to use.  But you should check to make sure it has done this properly.  If the memory sections overlap, or if the memory exceeds the highest address available, the device will wrap around to the beginning of the MRAM memory and it can overwrite data allocated to other types of elements.

    You should verify and optimize your MRAM allocation and see if the error goes away.  Some things to look for are:

    • RX FIFO buffer size should be large enough to keep up with your incoming message rate, but not excessive. 
    • RX and TX Buffer data field size should be set to support the maximum size contained in a message it will receive, which may not be 64 bytes.  Using 64 bytes would ensure they can support any CAN FD message, but this allocates 64 bytes of memory for each buffer element in addition to the bytes needed for the message header in each element.
    • If you are only using one of the RX FIFOs, check that both of them are not configured.
    • If you are not using the TX Event FIFO, is it allocated memory that can be freed up?
    • etc.

    Additional details on each of these types of MRAM elements can be found in the MCAN User's Manual.  This is the same link I shared with you earlier in this thread when discussing different topics.  But section 2.4.1 of this manual discusses the Message RAM (MRAM) allocation and is a good resource.

    Regards,

    Jonathan

  • Hi Jonathan,

    Your reply was very helpful for us.

    Initially we were using bosch,mram-cfg = <0x0 3 2 32 10 1 32 7>. Now we are using bosch,mram-cfg = <0x0 3 2 32 10 0 26 12>. With this we were able to transmit and receive simultaneously for a long time with a smaller inter-frame delay between packets, but transmission still stops after a long time. We also noticed data loss.

    By our calculation, the new configuration uses only 0.54 Kb of the 2 Kb of MRAM. When I increase the number of TX buffers beyond 12, transmission eventually stops. The same happens when I increase RX FIFO 1. Can you please tell us what is blocking it? We plan to allocate almost 90% of the MRAM for better performance.

    Regards,

    Akshay 

  • Akshay,

    Thank you for the continued patience while engineers are out for US holiday.

    Regards,

    Eric Hackett 

  • Hi Akshay,

    When the communication stops, are you still getting the kernel panic error as before?  Have you also checked the TXBAR register 0x10D0 to see if the processor has set any TX Buffers to transmit?  I ask because previously the TXBAR register value returned was all 0's indicating the TCAN4550 had no TX buffers enabled to transmit. As long as the TCAN4550 has enabled TX buffers in the TXBAR register, and the device is not in a Bus Off state, the TCAN4550 will try to transmit the messages following the rules of CAN arbitration.

    I am suspecting that there might be something in the application code that is stopping the processor from enabling the TX Buffer bits in the TXBAR register after it has received an error.  Can you confirm the processor will continue to try to send new messages after it has received any errors and also share the TXBAR 0x10D0 register value to make sure that it is a non-zero value?

    If the TXBAR register does not have any set bits after the communication has stopped, then this looks like an application software issue.  Checking the TXBAR register should confirm whether this is a device or application software issue.

    Regards,

    Jonathan

  • Hi Jonathan,

    The TXBAR register value is 0.

    Apart from this we have few queries which are listed below:
    * What is the feasible delay during the simultaneous transmission between 2 nodes?
    * While transmitting between 2 nodes simultaneously if the transmission stops unexpectedly, is there any method of error indication which might help us to restart the transmission again?

    Regards,
    Akshay Naik

  • Hi Akshay,

    If the TXBAR register value is 0, the TCAN4550 has no TX buffer messages enabled for transmission.  For some reason, the processor has stopped trying to send new messages.  This is not a TCAN4550 device issue, but an issue in the processor code, and I can't be of much help in determining why it is occurring.

    * What is the feasible delay during the simultaneous transmission between 2 nodes?

    I am not exactly sure how to interpret this question.

    The CAN protocol defines the time between messages on a CAN bus as the "Interframe Space" (IFS) which is equal to 3 Nominal Bit Times.  When two nodes are trying to transmit simultaneously, one node will win arbitration and get to transmit its message, and the other node will have to wait for 3 bit times after the completion of the first node's message before it can transmit its message.
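
    To put a number on that, the short sketch below converts the 3-bit Interframe Space into microseconds.  The bit rates used are hypothetical examples, since the nominal bit rate of this bus has not been stated in the thread.

```python
# Sketch: minimum gap between frames implied by the 3-bit Interframe Space.
# The bit rates below are hypothetical examples, not values from this thread.

IFS_BITS = 3  # Interframe Space in nominal bit times


def interframe_space_us(nominal_bitrate_bps):
    """Duration of the Interframe Space in microseconds."""
    return IFS_BITS * 1_000_000 / nominal_bitrate_bps


for bitrate in (500_000, 1_000_000):
    print(bitrate, "bit/s ->", interframe_space_us(bitrate), "us")
```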

    However, I think you might be trying to ask me how much delay does a processor need between sending repeated messages to avoid a transmit or receive overflow event generating some form of error or missed messages.  If so, this will vary based on several factors such as:

    • The length of the message (length of data field could be 0-64bytes)
    • The SPI data rate, the intra-word spacing or idle time in the SPI communication such as if the SPI driver breaks the SPI words into smaller 8, 16, or 32 bit sequences with small delays in between these sequences
    • The time between the chip select signal going low and the start of the SPI data, and the time following the end of the SPI data and the chip select going high again.
    • The number of words in a SPI read or write.  Some SPI drivers use single word SPI writes and reads, each of which requires a 32 bit word containing the address of the register or MRAM memory cell to access.  However, the TCAN4550 supports burst writes and reads of consecutive registers or MRAM memory locations in a single SPI sequence set by the "Length" field in the SPI header and the address of the first register or MRAM memory location to access.  This burst method reduces the overall SPI communication by eliminating all address words except for the first one and can reduce the SPI time by almost 50%.  Many SPI drivers have been written to support burst mode, but the Linux driver you are currently using may only support the single word method.
    • The processor's overhead time needed to generate a message, receive a message, check and respond to interrupts
    • The Processor's execution and clock speed
    • etc.
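
    The "almost 50%" figure for burst mode in the list above can be reproduced with a simple bit count.  This is only a sketch: it assumes one 32-bit header word per SPI access and 32-bit data words, and it ignores chip-select setup time and any gaps the SPI driver inserts between words.  The 18-word example corresponds to an element with two header words and a 64-byte data field.

```python
# Sketch: clocked SPI bits for single-word versus burst access.
# Assumes a 32-bit header word per access; chip-select timing and
# inter-word gaps are ignored.

HEADER_BITS = 32  # SPI header word (opcode, address, length)
WORD_BITS = 32    # one register or MRAM word


def spi_bits_single_word(n_words):
    """Every data word is preceded by its own header word."""
    return n_words * (HEADER_BITS + WORD_BITS)


def spi_bits_burst(n_words):
    """One header word followed by all data words."""
    return HEADER_BITS + n_words * WORD_BITS


def burst_saving(n_words):
    """Fraction of clocked bits saved by using a burst access."""
    return 1 - spi_bits_burst(n_words) / spi_bits_single_word(n_words)


n = 18  # 2 header words + 16 data words for a 64-byte payload
print(spi_bits_single_word(n), spi_bits_burst(n), round(burst_saving(n), 3))
```

    For 18 words this gives 1152 bits versus 608 bits, a saving of roughly 47%.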

    Determining the speed for your system is easier done by measuring it with a scope or logic analyzer on the SPI signals.  Send a single message and capture all of the SPI data transmitted for that message.  This will give you a base time for how long it takes to transmit a single message.  You can then try to send two consecutive messages from the same node to measure any delay from the processor or overhead time, that will need to be accounted for. Likewise you should repeat the process for receiving a message as well.

    This should give more accurate timing for your system that you can use to calculate the maximum message rate you can achieve in your system and how much of a delay between messages you will need.  This will also help point to some areas of improvement that can be made if the times appear longer than expected.

    While transmitting between 2 nodes simultaneously if the transmission stops unexpectedly, is there any method of error indication which might help us to restart the transmission again?

    I'm not aware of an error indication that can be used for this purpose, but there may be some things that could be done.

    The first idea is to monitor some of the other TX Buffer related registers.  When a new message is enabled for transmission by setting the TXBAR bit, the device will set the corresponding TX Buffer Request Pending bit in the TXBRP (0x10CC) register.  Once the transmission is successfully completed, this bit will be cleared and the device will set the corresponding bit in the TX Buffer Transmission Occurred TXBTO (0x10D8) register.  Periodically reading these register values will allow you to know the state of each message in the transmission process as well as whether there are NO messages pending transmission.  If there are no pending messages, this may be an indication of your stopped transmission error condition.
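
    A minimal sketch of how the two register values could be interpreted is below.  It takes the raw 32-bit values as arguments, because the SPI read routine depends on your platform; the function names are hypothetical and are not part of any driver.

```python
# Sketch: classify each TX buffer from raw TXBRP and TXBTO values.
# Reading the registers over SPI is platform specific and not shown;
# the function names here are hypothetical.

TXBRP_ADDR = 0x10CC  # TX Buffer Request Pending
TXBTO_ADDR = 0x10D8  # TX Buffer Transmission Occurred


def tx_buffer_states(txbrp, txbto, n_buffers):
    """Map buffer index to 'pending', 'transmitted' or 'idle'."""
    states = {}
    for i in range(n_buffers):
        mask = 1 << i
        if txbrp & mask:
            states[i] = "pending"      # request set, not yet sent
        elif txbto & mask:
            states[i] = "transmitted"  # last request completed
        else:
            states[i] = "idle"
    return states


def no_messages_pending(txbrp):
    """True when no TX buffer has a transmission request pending."""
    return txbrp == 0


print(tx_buffer_states(0b0101, 0b0010, 4))
print(no_messages_pending(0))
```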

    There is a TX Buffer Transmission Interrupt Enable TXBTIE register (0x10E0) and a TX Buffer Cancellation Finished Interrupt Enable TXBCIE register (0x10E4) that can be used to inform the processor when particular messages have transmitted.  Absence of new interrupts could be an indication the transmission has stopped, or each message could be tracked on a one-for-one basis.

    The TX Event FIFO can also be configured to track transmission events, and its entries require an acknowledgement to clear.  Absence of new TX Event FIFO events could be an indication the transmission has stopped, or each message could be tracked on a one-for-one basis.

    You previously mentioned that you were using the watchdog timer for the RX FIFO monitoring, but instead you could use the watchdog timer to monitor the transmission.  If the watchdog was cleared every time the processor tried to send a message, you could get a watchdog timeout if the processor stops trying to transmit messages and then you could use that as an indication to start transmitting again.  This might be a better option for the watchdog.
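
    The same idea can be modeled in software to see how it would behave.  The sketch below is not the TCAN4550 hardware watchdog and not driver code; the class name is hypothetical and the 60 ms timeout is only an example value.  Timestamps are passed in explicitly so the logic can be tested without real delays.

```python
# Sketch: software model of "kick on every transmit attempt".
# Hypothetical class; this is not the TCAN4550 hardware watchdog.


class TxStallMonitor:
    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_kick = None  # time of the most recent transmit attempt

    def kick(self, now):
        """Call whenever the application attempts to send a message."""
        self.last_kick = now

    def expired(self, now):
        """True if no transmit attempt was seen within the timeout."""
        if self.last_kick is None:
            return False  # monitoring starts with the first attempt
        return now - self.last_kick > self.timeout_s


monitor = TxStallMonitor(timeout_s=0.060)  # example timeout
monitor.kick(now=0.0)
print(monitor.expired(now=0.05))  # still within the timeout
print(monitor.expired(now=0.10))  # transmit attempts have stopped
```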

    Regards,

    Jonathan

  • Hi Jonathan,

    Thanks for the detailed explanation. We will check on that.

    Meanwhile, is there any update on the updated driver. Could you please let me know by when it will be ready?

    Regards,

    Akshay Naik

  • Hi Akshay,

    I know there are still tests being run on the current version to verify that the updates improve the maximum bus load that can be achieved and that it is stable, but I don't know whether it is available for distribution yet. 

    I will check on the latest status for you.

    Regards,

    Jonathan

  • Hi Jonathan,

    Once the testing of the updated driver is completed, you can share us the final version of the driver. Meanwhile, is it possible to share the updated driver while it is being tested, so that we can test it? 

    Regards,

    Akshay

  • Hi Akshay,

    Our next update with the contract developers is on Monday.  I will check with them as to what is possible to share. 

    Regards,

    Jonathan

  • Hi Jonathan,

    We were going through few of the links regarding the TCAN and MRAM configurations and we need clarifications on few queries which are listed below:

    1) MRAM Configurations:

        * The MRAM configurations has a width of 32bits and can be configured up to 4352 words as shown in below image:

        * But according to the TCAN4550 datasheet it is mentioned that 2K bytes of MRAM is fully configurable for TX/RX buffer/FIFO as shown in below image. But if we observe the previous image the size can be more than that for TX/RX Buffer/FIFO. Could you please explain on that?

    The link for the M_CAN User Manual is provided for your reference:
    M_CAN UM : https://www.bosch-semiconductors.com/media/ip_modules/pdf_2/m_can/mcan_users_manual_v330.pdf

    2) TX Event FiFO:

         * Also could you please provide us some information on TX event FIFO which is mentioned in the M_CAN UM(Image 1) but has not been informed in the TCAN datasheet as shown in the below image:

         * We were also going through another TI link mentioned below which had mentioned about TX Event FiFO. So could you please let us know which document to refer and also provide us some more information on TX Event FIFO.

    TI Link : www.ti.com/.../sllu270.pdf

    3) Documents

          * Please share us the link/document which might give us better information on Watermark level, Watchdog timers and regarding the MRAM configurations.

    Regards,

    Akshay

  • Hi Akshay,

    First off, our driver status update meeting was pushed until Wednesday of this week, so I don't have an update yet on the driver status.

    1) MRAM Configurations:

    Regarding the MRAM configuration questions, the simple answer is the MCAN IP is capable of handling a larger memory block than what is embedded in the TCAN4550.

    The MCAN CAN FD Controller IP was developed by Bosch and licensed for use by TI.  It is commonly embedded into microprocessors, FPGAs, etc. with much larger memory blocks to work with.  The TCAN4550 is a much smaller device and could only support 2k of memory.  Therefore the MCAN IP is capable of addressing more memory elements of each type than can be configured in the TCAN4550's memory block.

    There is also no error checking in the TCAN4550 for overlapping blocks or for the total size.  You will need to verify the configuration is correct, otherwise some memory cells could be overwritten unexpectedly and result in errors that are difficult to debug.  Also note that if the MRAM configuration exceeds the 2k block, the addresses automatically wrap around to the beginning of the block.

    There are no limitations or requirements on how you configure the 2k block for the various TX/RX/Event buffer/FIFO and message ID filter elements, etc.  But it must fit within the 2k block.
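
    Because the device does not perform this check itself, it can be done offline before loading a configuration.  The sketch below flags sections that run past the 2k block (and would therefore wrap) and neighbouring sections that overlap.  The section names, start addresses, and sizes in the example are hypothetical; start addresses are taken relative to the beginning of the MRAM.

```python
# Sketch: offline check of an MRAM layout for overlap and wrap-around.
# Section names and example values are hypothetical.

MRAM_SIZE = 2048  # bytes available in the TCAN4550 MRAM


def check_layout(sections):
    """sections maps name -> (start_offset, size_bytes).

    Returns a list of problem descriptions; empty means none found.
    """
    problems = []
    # Unused (zero-size) sections cannot collide with anything.
    used = sorted(
        ((name, start, size) for name, (start, size) in sections.items() if size > 0),
        key=lambda item: item[1],
    )
    for name, start, size in used:
        excess = start + size - MRAM_SIZE
        if excess > 0:
            problems.append(f"{name} exceeds MRAM by {excess} bytes and would wrap")
    # Compare each section with the next one in address order.
    for (a, a_start, a_size), (b, b_start, _) in zip(used, used[1:]):
        if a_start + a_size > b_start:
            problems.append(f"{a} overlaps {b}")
    return problems


good = {"rxf0": (0, 1152), "txe": (1152, 8), "txb": (1160, 72)}
bad = {"rxf0": (0, 1152), "txb": (1100, 1000)}
print(check_layout(good))
print(check_layout(bad))
```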

    2) TX Event FiFO:

    The TCAN4550 uses the MCAN IP developed by Bosch and therefore the MCAN User's Manual should be treated as supplemental information to the information found in the datasheet and documentation published by TI.  TI references this MCAN information and duplicates some of the more important key pieces in its own publications, but it was not practical or possible to duplicate all of the information into the TCAN4550 datasheet.

    TI did not make any modifications to the MCAN registers and therefore the TCAN4550 datasheet and MCAN User's Manual complement each other.  The only difference is that in the TCAN4550, the MCAN register addresses were given an offset of 0x1000.  For example, the CCCR (Control) register has an address of 0x18 in the MCAN User's Manual, and an address of 0x1018 in the TCAN4550 device.
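
    When working from the MCAN User's Manual, this translation can be captured in a small lookup so the offset is applied in one place.  The sketch below covers only the registers mentioned in this thread; the helper name is hypothetical.

```python
# Sketch: translate MCAN User's Manual offsets to TCAN4550 addresses.
# Only registers mentioned in this thread are listed.

TCAN4550_MCAN_BASE = 0x1000

MCAN_OFFSETS = {
    "CCCR": 0x18,    # CC Control
    "TXBRP": 0xCC,   # TX Buffer Request Pending
    "TXBAR": 0xD0,   # TX Buffer Add Request
    "TXBTO": 0xD8,   # TX Buffer Transmission Occurred
    "TXBTIE": 0xE0,  # TX Buffer Transmission Interrupt Enable
    "TXBCIE": 0xE4,  # TX Buffer Cancellation Finished Interrupt Enable
}


def tcan4550_address(register_name):
    """Return the TCAN4550 SPI address of an MCAN register."""
    return TCAN4550_MCAN_BASE + MCAN_OFFSETS[register_name]


for name in MCAN_OFFSETS:
    print(name, hex(tcan4550_address(name)))
```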

    The MCAN information in the TI documentation is more focused on getting basic communication and settings configured properly.  However Event handling is a higher level aspect of the application that may or may not be used in an application.  Many applications will not try to track the status and log a timestamp for each message successfully transmitted on the bus, but some applications may want to do that. 

    I would refer you to section 3.5.8 of Bosch's MCAN User's Manual for specific details on the TX Event Handling.  I think this document is pretty clear and if the TX Event FIFO is used, the device will create a TX Event FIFO Element with the details of every successfully transmitted message which includes the message ID, message marker, and a timestamp.  These TX Event FIFO Elements are outlined in section 2.4.4.

    Some reasons to use this would be to track whether lower priority messages are being transmitted on a busy bus where they are likely to lose arbitration to higher priority messages. 

    3) Documents

    I think you have seen the three most important and relevant documents. But we also have application notes on the watchdog and clock optimization that are good resources.

    The TCAN4550-Q1 datasheet: https://www.ti.com/lit/gpn/tcan4550-q1

    The TCAN4x5x Software User's Guide: https://www.ti.com/lit/pdf/sllu270

    M_CAN User's Manual: https://www.bosch-semiconductors.com/media/ip_modules/pdf_2/m_can/mcan_users_manual_v330.pdf

    TCAN4550 Watchdog Configuration Guide: https://www.ti.com/lit/pdf/slla455

    TCAN455x Clock Optimization and Design Guidelines: https://www.ti.com/lit/pdf/slla549

    Regards,

    Jonathan

  • Hi Akshay,

    I have an update on the Linux driver.  The updates for the TCAN4550 device are complete and we have some patch files to the existing driver that I can share with you if you want to apply the patches yourself. 

    There are a few additional updates being made to the driver to make it compatible with some new devices we still have in development, and these updates are expected to run through the end of February. 

    Once these are complete, all the updates will be upstreamed and available to the Linux community. 

    If you are interested in the patch files, let me know and I can send them to you.

    Regards,

    Jonathan

  • Hi Jonathan,

    Thanks for the update Jonathan.

    You can share the updated driver to us. We will test it out and let you know on the outcomes.

    Regards,

    Akshay Naik

  • Hi Akshay,

    I just sent you an email with the patch files.  Hopefully they will work for you.

    Regards,

    Jonathan

  • Hi Jonathan,

    Thank you for the patch files. We are updating it in our driver and we will let you know once we have the test results.

    Had few queries regarding the interrupt timer which are listed below:

    • We believe that watchdog available timeouts are 60ms, 600ms, 3sec or 6sec. Is there any possibility of setting it below than that?
    • And in the provided patch you have mentioned about 
      • rx-usecs-irq : Number of microseconds to wait before triggering the interrupt handler manually if the number of received frames is less than rx-frames-irq. A small number will cause more interrupt handler runs. A large number will increase latency of single frames being received.
      • tx-usecs-irq : Same as rx-usecs-irq just in transmit direction.
    • Is the rx-usecs-irq, tx-usecs-irq and watchdog timers same? Because according to the explanation it looks that the rx-usecs-irq or tx-usecs-irq has the same functionality as of watchdog timer. If it is not similar could you please brief us on these timers?
    • But we also observe that in the command which you had mentioned "ethtool -C can0 tx-usecs-irq 40000 rx-usecs-irq 40000 rx-frames-irq 6 tx-frames-irq 10" it seems that the rx-usecs-irq is set to 40000 that is 40ms which is lesser than that of watchdog timer. 

    Kindly let us know on if there is something which we are missing out.

    Regards,

    Akshay 

  • Hi Akshay,

    We believe that watchdog available timeouts are 60ms, 600ms, 3sec or 6sec. Is there any possibility of setting it below than that?

    No, 60ms is the shortest watchdog timeout setting available in the configuration register.

    • And in the provided patch you have mentioned about 
      • rx-usecs-irq : Number of microseconds to wait before triggering the interrupt handler manually if the number of received frames is less than rx-frames-irq. A small number will cause more interrupt handler runs. A large number will increase latency of single frames being received.
      • tx-usecs-irq : Same as rx-usecs-irq just in transmit direction.
    • Is the rx-usecs-irq, tx-usecs-irq and watchdog timers same? Because according to the explanation it looks that the rx-usecs-irq or tx-usecs-irq has the same functionality as of watchdog timer. If it is not similar could you please brief us on these timers?
    • But we also observe that in the command which you had mentioned "ethtool -C can0 tx-usecs-irq 40000 rx-usecs-irq 40000 rx-frames-irq 6 tx-frames-irq 10" it seems that the rx-usecs-irq is set to 40000 that is 40ms which is lesser than that of watchdog timer. 

    First note that these notes you are referencing came from the contractor developing the firmware.  The rx-usecs-irq and tx-usecs-irq are not TCAN4550 device level timers like the watchdog timer is and are instead timers implemented in the driver firmware itself.

    In an effort to optimize the amount of SPI traffic and improve overall efficiency, I believe these timers were added to address the following situation.

    Using the FIFO watermark thresholds in the TCAN4550 as the trigger for the processor's IRQ, instead of generating an interrupt for every single transmitted and received message, can reduce the overall amount of SPI traffic, because per-message interrupts can be numerous and keep the processor busy.  However, if the number of FIFO elements does not reach the watermark level, the processor could have to wait a long time for additional messages to be transmitted or received before the IRQ triggers it to check the FIFOs.  This is not desirable.

    To prevent this, the IRQ handler can be triggered manually at periodic intervals to force a read of the TX Event and RX FIFOs and check for messages in FIFOs that are below the watermark limit.  However, how frequently this occurs also has an impact on the overall SPI efficiency because it requires SPI traffic to check for new elements.

    The specific notes from the developer on the rx-usecs-irq and tx-usecs-irq simply indicate how long the processor should wait before manually checking the FIFOs.  A "short" time will result in a lot of additional SPI traffic due to manually checking them more frequently, and a "long" time will result in a longer latency between the time the message was received in the FIFO and the time it is finally read back by the processor after manually checking the FIFOs.

    A balance needs to be struck between the watermark levels and these manual timers in order to reduce the overall SPI traffic generated by interrupts while also not waiting too long for messages to be read from the FIFO when the watermark level has not yet been reached.
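
    The trade-off can be pictured with a small model of the decision the driver has to make.  This is only an illustration of the behaviour described above and is not taken from the driver source; the function name is hypothetical, and the assumption that the timer is measured from the arrival of the oldest unread frame is mine.  The threshold and timeout values come from the example ethtool command quoted above.

```python
# Sketch: "frame count OR timer" decision, as a simplified model.
# Not driver code; the timer reference point is an assumption.


def should_service_fifo(frames_waiting, frames_threshold, usecs_waited, usecs_timeout):
    """Decide whether the FIFO should be read over SPI now."""
    if frames_waiting == 0:
        return False  # nothing to read, so no SPI traffic is needed
    if frames_waiting >= frames_threshold:
        return True   # watermark reached
    return usecs_waited >= usecs_timeout  # timer expired below the watermark


RX_FRAMES_IRQ = 6      # rx-frames-irq from the example command
RX_USECS_IRQ = 40_000  # rx-usecs-irq from the example command

print(should_service_fifo(6, RX_FRAMES_IRQ, 0, RX_USECS_IRQ))       # watermark hit
print(should_service_fifo(2, RX_FRAMES_IRQ, 10_000, RX_USECS_IRQ))  # keep waiting
print(should_service_fifo(2, RX_FRAMES_IRQ, 40_000, RX_USECS_IRQ))  # timer expired
```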

    Regards,

    Jonathan