AM3352: Ethernet performance

Part Number: AM3352
Other Parts Discussed in Thread: AM3352

Hi, 

An AM3352-based CPU system is used in our custom design. It is very similar to the AM335x reference design provided by TI. I recently ran into a problem during an iperf test in which a PC acts as the iperf server and the AM3352-based system acts as the client. I am running Linux based on SDK 08.00.00.00. After the test has run for some time, the CPU load reaches 100% and the iperf throughput drops sharply.

 

I do not think this is caused by the AM3352 being too slow. Maybe I am missing something. Any advice would be appreciated. I am using the default system control (sysctl) settings in the kernel.

Best Regards

Yang
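
To put a number on the CPU load described above while iperf is running, one option is to sample the aggregate "cpu" line of /proc/stat twice and compare the two readings. The sketch below is only an illustration and is not part of the SDK; the function names are hypothetical, and it assumes the standard Linux field order (user, nice, system, idle, iowait, irq, softirq, steal).

```python
import time


def parse_cpu_line(line):
    """Return (total_jiffies, idle_jiffies) from the aggregate 'cpu' line."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    # Use only user..steal; guest time is already included in user time.
    values = [int(v) for v in fields[1:9]]
    idle = values[3] + (values[4] if len(values) > 4 else 0)  # idle + iowait
    return sum(values), idle


def busy_fraction(before, after):
    """Fraction of time the CPU was busy between two 'cpu' line samples."""
    total_b, idle_b = parse_cpu_line(before)
    total_a, idle_a = parse_cpu_line(after)
    elapsed = total_a - total_b
    if elapsed <= 0:
        return 0.0
    return 1.0 - (idle_a - idle_b) / elapsed


def sample_proc_stat(interval=1.0, path="/proc/stat"):
    """Sample /proc/stat twice, `interval` seconds apart, on the target."""
    def first_line():
        with open(path) as fh:
            return fh.readline()

    before = first_line()
    time.sleep(interval)
    return busy_fraction(before, first_line())
```

Calling sample_proc_stat() in a loop on the target during the iperf run would show whether the load climbs toward 1.0 at the same moment the throughput drops.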

  • Hi,

    Please follow this checklist: processors.wiki.ti.com/.../5x_CPSW and post the results here.
    Note that SDK 8.0 is very old and it's strongly recommended that you move to the latest SDK, which can be downloaded here: software-dl.ti.com/.../index_FDS.html
  • Hi, Biser

    That is great. I reran the iperf test with the updated kernel, and it works!

    Thanks
    Yang
  • Hi, Yang

    Nice to see that your problem has been solved. We also use the AM3352 Linux SDK v8.0 and have hit the same problem: CPU utilization is 100% when I use vsftpd and iperf to test the network. I used perf to analyze the CPU utilization of the vsftpd process; the perf report is below. It seems that cpdma_chan_submit() in davinci_cpdma.c takes up a lot of CPU time.

    1. Can you tell me which SDK and kernel version fixes this problem?
    2. My QQ: 398382433. Can you add me? I will be happy to communicate with you.

    # To display the perf.data header info, please use --header/--header-only options.
    #
    # Samples: 511K of event 'cycles'
    # Event count (approx.): 2986068112
    #
    # Overhead Command Shared Object Symbol
    # ........ ....... ................. .......................................
    #
    14.11% vsftpd [kernel.kallsyms] [k] cpdma_chan_submit
    5.96% vsftpd [kernel.kallsyms] [k] __irq_put_desc_unlock
    5.54% vsftpd [kernel.kallsyms] [k] csum_partial_copy_from_user
    2.84% vsftpd [kernel.kallsyms] [k] __do_softirq
    2.39% vsftpd [kernel.kallsyms] [k] tcp_transmit_skb
    2.18% vsftpd [kernel.kallsyms] [k] mod_timer
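
When comparing perf reports from different kernels, it can help to turn the text output into data. The following is a minimal sketch, assuming the default "Overhead Command Shared Object Symbol" column layout shown in the report above; the function names are hypothetical, and the sample text is copied from this thread.

```python
import re

# One data row: "14.11% vsftpd [kernel.kallsyms] [k] cpdma_chan_submit"
_ROW = re.compile(r"^\s*(\d+(?:\.\d+)?)%\s+(\S+)\s+(\S+)\s+\[(\S)\]\s+(.+?)\s*$")

SAMPLE = """\
# Samples: 511K of event 'cycles'
# Overhead Command Shared Object Symbol
14.11% vsftpd [kernel.kallsyms] [k] cpdma_chan_submit
5.96% vsftpd [kernel.kallsyms] [k] __irq_put_desc_unlock
5.54% vsftpd [kernel.kallsyms] [k] csum_partial_copy_from_user
2.84% vsftpd [kernel.kallsyms] [k] __do_softirq
2.39% vsftpd [kernel.kallsyms] [k] tcp_transmit_skb
2.18% vsftpd [kernel.kallsyms] [k] mod_timer
"""


def parse_perf_report(text):
    """Parse 'perf report' text rows into dicts, highest overhead first."""
    rows = []
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # header and comment lines
        match = _ROW.match(line)
        if match:
            rows.append({
                "overhead": float(match.group(1)),
                "command": match.group(2),
                "object": match.group(3),
                "symbol": match.group(5),
            })
    return sorted(rows, key=lambda row: row["overhead"], reverse=True)


def total_overhead(rows):
    """Sum of the overhead percentages of the given rows."""
    return sum(row["overhead"] for row in rows)
```

For the excerpt above, the six listed symbols account for roughly a third of the sampled cycles, with cpdma_chan_submit alone at 14.11%.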
  • Actually, all versions after SDK 08.00.00.00 are OK. I am using SDK 01.00.00.03. I hope this fixes your problem.
  • Thanks very much. I will use SDK 01.00.00.03 to verify this problem.
    Are you Chinese? Can you add me on QQ? I work in Wuhan.
  • Hi, Yang:

    The problem is fixed when I use SDK 01.00.00.03. I have also found that the patch below in AMSDK v8.0 causes the problem.
    But I do not understand why this patch to the mailbox-wkup_m3 module in AMSDK v8.0 can affect CPSW Ethernet performance and cause very high CPU utilization, or which patches in SDK v01.00.00.03 fix this problem.

    =============================================
    commit 2e560903e75ef790fd0428dbb53f65a2ee1ad4c6
    Author: Dave Gerlach <d-gerlach@ti.com>
    Date: Wed Dec 10 04:18:16 2014 +0000

    mailbox/omap: Add ti,mbox-send-noirq quirk to fix AM33xx CPU Idle

    The mailbox framework controls the transmission queue and requires
    either its controller implementations or clients to run the state
    machine for the Tx queue. The OMAP mailbox controller uses a Tx-ready
    interrupt as the equivalent of a Tx-done interrupt to run this Tx
    queue state-machine.

    The WkupM3 processor on AM33xx and AM43xx SoCs is used to offload
    certain PM tasks, like doing the necessary operations for Device
    PM suspend/resume or for entering lower c-states during cpuidle.

    The CPUIdle on AM33xx requires the messages to be sent without
    having to trigger the Tx-ready interrupts, as the interrupt
    would immediately terminate the CPUIdle operation. Support for
    this has been added by introducing a DT quirk, "ti,mbox-send-noirq"
    and using it to modify the normal OMAP mailbox controller behavior
    on the sub-mailboxes used to communicate with the WkupM3 remote
    processor. This also requires the wkup_m3_ipc driver to adjust
    its mailbox usage logic to run the Tx state machine.

    NOTE:
    - AM43xx does not communicate with WkupM3 for CPU Idle, so is
    not affected by this behavior. But, it uses the same IPC driver
    for PM suspend/resume functionality, so requires the quirk as
    well, because of changes to the common wkup_m3_ipc driver.

    Signed-off-by: Dave Gerlach <d-gerlach@ti.com>
    [s-anna@ti.com: revise logic and update comments/patch description]
    Signed-off-by: Suman Anna <s-anna@ti.com>

    Documentation/devicetree/bindings/mailbox/omap-mailbox.txt | 8 ++++++++
    drivers/mailbox/omap-mailbox.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++++----
    2 files changed, 58 insertions(+), 4 deletions(-)

    =============================================
  • Hi,

    I am glad to see that your problem is fixed. As for CPSW Ethernet performance, sorry, I have not looked deeply into this.

    BR

    Yang

  • Hi, Yang:

    I have found that the following commit, "338a6906", in Processor-SDK-v01.00.00.03 fixes this problem. Merging this patch into SDK v8.0 fixed the problem. The commit message explains the reason in detail.

    Thanks very much for your topic and reply!

    ================================================
    commit 338a690623badfc3bc3eb6ddc7302e03ab33e6ba
    Author: Dave Gerlach <d-gerlach@ti.com>
    Date: Thu Mar 26 13:41:47 2015 -0500

    ARM: OMAP2+: cpuidle33xx: Change wkup_m3 idle state to avoid MPU PLL bypass

    To get to C1-state in cpuidle am33xx currently uses the wkup_m3 to gate
    the MPU clock domain and bypass the MPU PLL. Because we are not shutting
    off the MPU power domain in this C-state it is possible for both to be
    awake and executing at the same time, and we do not have the ability
    to synchronize execution between the two in the cpuidle path due to
    no interrupt context.

    This leads to racy behavior between the two and in certain instances
    where c1-state is entered often during periods of high cpu activity it
    is possible for the MPU to interfere with the wake path on the wkup_m3,
    which prevents the MPU PLL from being re-locked and causing extreme
    system slowdown.

    Because of this, we must now use a different idle state available on the
    current wkup_m3 firmware (0x190) which does nothing more than put the
    MPU clockdomain to sleep, giving us the same power savings seen
    previously on the MPU power rail but slightly higher power on the
    MPU_PLL voltage rail.

    Signed-off-by: Dave Gerlach <d-gerlach@ti.com>

    arch/arm/mach-omap2/pm33xx.h | 2 +-
    1 file changed, 1 insertion(+), 1 deletion(-)

    ================================================
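
The commit message above ties the slowdown to the C1 state being entered often while the CPU is busy. One way to observe how often each idle state is entered is to read the per-state counters that the Linux cpuidle framework exposes in sysfs. This is a rough sketch with hypothetical helper names; it assumes cpuidle is enabled and that the state directories live under /sys/devices/system/cpu/cpu0/cpuidle, and it is not taken from the SDK.

```python
import os


def _read(path):
    """Return the stripped contents of a small sysfs text file."""
    with open(path) as fh:
        return fh.read().strip()


def read_cpuidle_states(base="/sys/devices/system/cpu/cpu0/cpuidle"):
    """Return name, entry count and residency (microseconds) per idle state."""
    states = []
    for entry in sorted(os.listdir(base)):
        if not entry.startswith("state"):
            continue
        state_dir = os.path.join(base, entry)
        states.append({
            "name": _read(os.path.join(state_dir, "name")),
            "usage": int(_read(os.path.join(state_dir, "usage"))),
            "time_us": int(_read(os.path.join(state_dir, "time"))),
        })
    return states


def usage_delta(before, after):
    """Number of entries into each state between two snapshots."""
    return {a["name"]: a["usage"] - b["usage"] for b, a in zip(before, after)}
```

Taking one snapshot before and one after an iperf run and passing both to usage_delta() would show whether the deeper idle state is being entered frequently during the test.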
  • Hi,

    Good job. It is awesome. Thanks for sharing.

    BR

    Yang