AM3358: DCAN Rx frame loss

Part Number: AM3358

Hi,

We're using the Linux DCAN (c_can) driver from kernel version 4.18 and are seeing receive frame losses. Here is the statistics output:

# ip -details -statistics link show can0
3: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 1000
    link/can  promiscuity 435268
    can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
      bitrate 250000 sample-point 0.875
      tq 250 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1
      c_can: tseg1 2..16 tseg2 1..8 sjw 1..4 brp 1..1024 brp-inc 1
      clock 24000000
      re-started bus-errors arbit-lost error-warn error-pass bus-off
      0          0          0          3          3          0         numtxqueues 435300 gso_max_size 435364 gso_max_segs 435400
    RX: bytes  packets  errors  dropped overrun mcast   
    3086416    385802   1246    0       1246    0       
    TX: bytes  packets  errors  dropped carrier collsns
    20         5        0       0       0       0  
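
As a sanity check on the bit timing reported above, the following minimal sketch recomputes the prescaler, bit rate, and sample point from the `clock`, `tq`, and segment values in the output. The function name `bit_timing` is illustrative only and is not part of any driver or tool; the one-quantum sync segment is the standard classic CAN value.

```python
# Values copied from the `ip -details` output above.
CLOCK_HZ = 24_000_000
TQ_NS = 250
PROP_SEG, PHASE_SEG1, PHASE_SEG2 = 6, 7, 2
SYNC_SEG = 1  # the sync segment is always one time quantum in classic CAN


def bit_timing(clock_hz, tq_ns, prop_seg, phase_seg1, phase_seg2):
    """Return (brp, bitrate in bit/s, sample point) for one CAN bit."""
    # Prescaler: number of CAN clock cycles per time quantum.
    brp = clock_hz * tq_ns // 1_000_000_000
    # One bit is the sync segment plus the three programmable segments.
    tq_per_bit = SYNC_SEG + prop_seg + phase_seg1 + phase_seg2
    bitrate = 1_000_000_000 // (tq_per_bit * tq_ns)
    # The bit is sampled at the end of phase segment 1.
    sample_point = (SYNC_SEG + prop_seg + phase_seg1) / tq_per_bit
    return brp, bitrate, sample_point


brp, bitrate, sample_point = bit_timing(
    CLOCK_HZ, TQ_NS, PROP_SEG, PHASE_SEG1, PHASE_SEG2
)
print(brp, bitrate, sample_point)  # prints: 6 250000 0.875
```

The result agrees with the reported `bitrate 250000 sample-point 0.875`, so the frame loss does not appear to come from a misconfigured bit timing.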

The latest Linux CAN driver supports only 32 message objects (16 for RX), which is fewer than the processor provides, so I cherry-picked a couple of outstanding commits (not mainlined yet) that add support for 64 message objects. This reduced the number of frame losses, but overruns still occur:

AFTER (64 CAN msg objects; 32 RX/32 TX):
=======================================

# ip -details -statistics link show can0
3: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 1000
    link/can  promiscuity 435268
    can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
          bitrate 250000 sample-point 0.875
          tq 250 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1
          c_can: tseg1 2..16 tseg2 1..8 sjw 1..4 brp 1..1024 brp-inc 1
          clock 24000000
          re-started bus-errors arbit-lost error-warn error-pass bus-off
          0          0          0          2          2          0         numtxqueues 435300 gso_max_size 435364 gso_max_segs 435400
    RX: bytes  packets  errors  dropped overrun mcast   
    3123480    390435   41      0       41      0       
    TX: bytes  packets  errors  dropped carrier collsns
    20         5        0       0       0       0
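
To quantify the improvement, the overrun counts in the two outputs can be compared as a fraction of received packets. This is only a rough sketch: the helper name `overrun_ratio` is made up for illustration, and it assumes the two runs saw comparable traffic. A single counted overrun may also hide more than one lost frame, so the ratios are best read as lower bounds.

```python
def overrun_ratio(rx_packets, rx_overruns):
    """Overruns per successfully received packet."""
    return rx_overruns / rx_packets


# RX counters copied from the two statistics outputs above.
before = overrun_ratio(rx_packets=385802, rx_overruns=1246)  # 16 RX objects
after = overrun_ratio(rx_packets=390435, rx_overruns=41)     # 32 RX objects

print(f"before: {before:.4%}  after: {after:.4%}")
print(f"improvement factor: {before / after:.1f}x")
```

Both byte counters also work out to exactly 8 bytes per packet (3086416 / 385802 and 3123480 / 390435), so the traffic consists of full-length 8-byte frames.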

Increasing the number of RX message objects further reduced the overruns but noticeably slowed down the system (e.g., console is almost unresponsive, Bluetooth throughput decreased). 
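
For perspective on what the RX object count buys, here is a rough estimate of how long software can be delayed before the hardware buffer overflows. It assumes classic CAN frames with 11-bit identifiers and 8 data bytes, a fully loaded bus, and no stuff bits. Stuff bits and 29-bit identifiers would only lengthen each frame, so this is the tightest case. The function names are illustrative.

```python
# Fixed bits in a classic CAN data frame with an 11-bit identifier:
# SOF 1 + ID 11 + RTR 1 + IDE 1 + r0 1 + DLC 4
# + CRC 15 + CRC delimiter 1 + ACK 1 + ACK delimiter 1 + EOF 7 = 44
FRAME_OVERHEAD_BITS = 44
INTERFRAME_BITS = 3  # mandatory gap between consecutive frames


def frame_time_us(data_bytes, bitrate):
    """Time on the wire for one frame plus interframe space, in microseconds."""
    bits = FRAME_OVERHEAD_BITS + 8 * data_bytes + INTERFRAME_BITS
    return bits * 1_000_000 / bitrate


def buffer_window_ms(rx_objects, data_bytes, bitrate):
    """Time for back-to-back frames to fill all RX objects, in milliseconds."""
    return rx_objects * frame_time_us(data_bytes, bitrate) / 1000


for bitrate in (250_000, 500_000):
    for rx_objects in (16, 32):
        window = buffer_window_ms(rx_objects, 8, bitrate)
        print(f"{bitrate} bit/s, {rx_objects} RX objects: {window:.2f} ms")
```

Under these assumptions, 16 RX objects tolerate roughly 7 ms of receive latency at 250 kbit/s and 32 objects roughly 14 ms; at 500 kbit/s both figures halve.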

I'm wondering if someone can help us resolve this.

Thanks in advance,

Elenita

  • Hi,

    It sounds like you are up against a tyranny-of-numbers problem, since increasing the number of message objects produced a corresponding reduction in the drop count. The DCAN driver in the TI SDK was more or less written by the DCAN community. TI added a few contributions, but it is mostly a community driver.

    I will ask a few questions, but it looks like the CAN bus is simply overloaded.

    - Which TI SDK did you start with?

    - What file system are you using? Are you using the TI example from the SDK?

    - What is the clock speed of the system? Along the same line, are you using the ondemand governor for cpufreq?

    - Does reducing the CAN bit rate cause issues with the system design?

    Best Regards,

    Schuyler

  • Hi Schuyler,


    Thanks for the reply. 


    >> - Which TI SDK did you start with?

    None. We jumped directly to using the buildroot build environment and Linux kernel/drivers.


    >> - What file system are you using? Are you using the TI example from the SDK?

    We're using EXT4 for the file system. No, we're not using any TI example from the SDK.


    >> - What is the clock speed of the system? Along the same line, are you using the ondemand governor for cpufreq?

    We are using the ondemand governor and CPU scales between 300MHz and 1GHz.


    >> - Does reducing the CAN bit rate cause issues with the system design?

    We have to use the 250K bit rate, and some equipment can run at up to 500K. The product won't work at lower bit rates, so yes, reducing the bit rate would cause issues with the system design.


    --elenita

  • Hi Elenita,

    I apologize for the delay in getting back to you. Thank you for the answers to the questions.

    While I have seen customers use Buildroot, I have not used it myself, so I cannot compare and contrast it with the TI SDK. I will say that the TI SDK user space is Ubuntu-like, in that it resembles a desktop that allows for demos and perhaps somewhat faster development. With a desktop come a lot of background daemons that will impact network communications. So one suggestion I have is to check whether Buildroot has added background daemons you don't need.

    The reason I asked about ondemand is that sometimes I have seen issues as the processor ramps/responds to the new load. As a test I would set the governor to performance and see if that helps.

    Those two items are all I can suggest or think of to help with this problem.

    Best Regards,

    Schuyler

  • Thanks, Schuyler. I will try this and get back to you.

    --elenita

  • Schuyler,  setting the governor to 'performance' AND picking up the '64 message objects support' patches (not merged to Linux mainline yet as of this writing) resolved the issue.
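
For readers who want to try the governor change described above, here is a minimal sketch that writes the governor name to the standard cpufreq sysfs files. The function name `set_governor` and its `sysfs_root` parameter are illustrative (the parameter exists only so the function can be exercised against a scratch directory). On a real target the write needs root privileges and a kernel built with the chosen governor.

```python
import glob
import os


def set_governor(governor, sysfs_root="/sys/devices/system/cpu"):
    """Write `governor` to each CPU's scaling_governor file.

    Returns the list of files that were written.
    """
    # Match cpu0, cpu1, ... but not sibling directories such as cpufreq/.
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "cpufreq", "scaling_governor")
    paths = sorted(glob.glob(pattern))
    for path in paths:
        with open(path, "w") as f:
            f.write(governor)
    return paths
```

Calling `set_governor("performance")` as root switches every CPU found to the performance governor until the next reboot. To make the setting persistent, one option is to select the performance governor as the kernel default (`CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE`) or to run the equivalent write from an init script.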