Linux/AM5726: USB driver performance issue

Part Number: AM5726

Tool/software: Linux

Champs,

In the customer's design, a device connects to the AM572x host USB port over a USB 2.0 high-speed connection and uses a bulk IN endpoint. The device has ~1 KB of data to send to the AM572x approximately every 30 us.

In high-speed mode the host issues a Start of Frame (SOF) packet every 125 us. Using a USB protocol analyzer, the customer observed that once 1 KB has been transferred and the next 4 IN packets issued by the AM572x are answered with NAKs, the AM572x stops polling until the next SOF. This leaves about 72% of the available USB bandwidth unused.
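
As a rough sanity check of the figures above, the arithmetic can be sketched as follows. The function names are hypothetical, and "1 KB" is assumed to mean 1024 bytes; this is only back-of-the-envelope math, not a measurement.

```c
#include <assert.h>

/*
 * Sustained data rate the device needs, in bytes per microsecond,
 * which is numerically the same as MB/s (10^6 bytes per second).
 */
double required_rate_mb_s(double bytes, double period_us)
{
	assert(period_us > 0.0);
	return bytes / period_us;
}

/*
 * Idle time per microframe implied by a given wasted-bandwidth
 * fraction (0.0 .. 1.0) and the microframe length in microseconds.
 */
double idle_us_per_microframe(double wasted_fraction, double microframe_us)
{
	assert(wasted_fraction >= 0.0 && wasted_fraction <= 1.0);
	return wasted_fraction * microframe_us;
}
```

With the numbers from this post, required_rate_mb_s(1024.0, 30.0) is roughly 34 MB/s, and idle_us_per_microframe(0.72, 125.0) is about 90 us of idle bus time in every 125 us microframe.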

The customer also connected the same device to an Intel Baytrail-based SBC and found that it keeps polling after NAKs, about every 20 us, which results in much better data throughput.

Is there a way to optimize TI's driver to achieve better performance in such a scenario?

thanks

Michael

  • Please post the SDK version used.
  • Michael,

    All xHCI controllers use the same xHCI driver in the Linux kernel, which might behave slightly differently between kernel versions as the driver changes. TI doesn't modify the Linux xHCI driver.

    In addition to the SDK version which Biser requested above, please provide the following information.
    - The USB device descriptor; you can get it with the command 'lsusb -v -d <vid:pid>'.
    - Does Linux have native driver support for this USB device? If so, which driver is it? If not, where does the third-party driver come from?
    - Please provide the USB protocol analyzer trace you captured.
    - Do you see the same behavior when reading data from a USB thumb drive?
  • Bin,

    I have sent you all these details in a separate email.

    thanks

    Michael

  • Michael,

    Thanks for the response in the email. I learned that the USB device uses a custom protocol and that its host driver runs in userspace using the openusb library. The customer also tried to test the USB device on the AM57x IDK with Processor SDK v4.2.0.9, but its host port doesn't recognize any USB device.

    As it happens, the IDK USB port is not functional in SDK v4.2.0.9 due to a software bug in the device tree configuration. The customer could try SDK v4.1, but I don't think testing on the IDK adds any value to the investigation.

    I checked the bus protocol trace you sent in the email. Most intervals between SOFs contain 3 IN-NAK pairs, and a few contain 4 or 10 pairs, but the idle gap between the last NAK and the next SOF is 85~100 us. I understand this idle time causes the USB device to lose data in its FIFO.

    Since the customer doesn't see a similar idle gap with USB thumb drives, I suspect the problem sits in the openusb library. Unfortunately I cannot tell exactly where the problem is, since I don't use this library. If the customer is familiar with openusb, they can check its source code to see whether the bulk transfer API has any timeout setup or parameter.

    Here are recommendations on how to debug this issue.

    - Move to the community v4.14 kernel.
    Kernel v4.14 added xHCI driver runtime tracepoints, which give much more runtime information about every USB transaction.

    - Compare apples to apples.
    This issue is likely caused by software, so when comparing with another platform, please ensure all the software (the kernel and the openusb library) is exactly the same version.

    - Once the customer confirms that, with exactly the same software versions, the issue exists on AM57x but not on Intel Baytrail, I will explain how to enable the kernel tracepoint debug feature and get the xHCI runtime log.
  • Bin,

    I circled back with the customer. They are compiling libopenusb from source for both the AM57x IDK and Baytrail, so the library is identical on both. Two questions:

    - Could you please provide some guidance as to what the device tree bug is in the 4.2.0.9 SDK and how to fix it?

    - Could you please provide some guidance as to how to build the 4.14 kernel and bring it up on the IDK, as well as how to enable the debug/tracing features?

    thanks!

    Michael

  • the linux kernel used on Baytrail is Linux minnowturbot 4.4.0-87-generic #110-Ubuntu SMP Tue Jul 18 12:55:35 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
  • Michael,

    The bug is caused by commit e16c3535b0e9 ("arm: dts: add cpts 1pps latch input pins"), which changes the USB1_DRVVBUS pinmux (address 0x3684) from mode0 to mode7/15 and leaves USB non-functional. The fix is to revert this patch or remove the 0x3684 pinmux entry from the DTS.

    I have my own setup to build kernels. I will translate it into a procedure the customer can follow to build v4.14 and post it here later.

    Please also ask the customer to build the same v4.14 kernel for the Baytrail board. The xHCI driver behaves slightly differently in each kernel version.
  • Michael,

    Here is how to build community kernel v4.14 for AM57x.

    $ git checkout v4.14.23
    $ export ARCH=arm
    $ export CROSS_COMPILE=<your toolchain>
    $ make omap2plus_defconfig
    $ make -s -j8 zImage modules dtbs

    The generated kernel .config will have "CONFIG_FTRACE=y", which enables the tracepoints for the xHCI runtime log. Here is how I get a log.

    (UART is too slow for dumping runtime logs, so use the Ethernet port.)
    $ telnet <ipaddr of am57x board>
    # cd /sys/kernel/debug/tracing/
    # echo 1 > events/xhci-hcd/enable
    # cat trace_pipe
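
    If the tracepoints need to be switched on from a test program rather than from the shell, the echo step above could be sketched in C as below. The function name set_trace_event is hypothetical, and the sketch assumes debugfs is mounted and the program runs as root; the tracing directory and event name are the ones used in the commands above.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write "1" or "0" to <tracing_dir>/events/<event>/enable,
 * e.g. tracing_dir = "/sys/kernel/debug/tracing", event = "xhci-hcd".
 * Returns 0 on success, -1 on any failure.
 */
int set_trace_event(const char *tracing_dir, const char *event, int on)
{
	char path[512];
	FILE *f;
	int n;

	assert(tracing_dir != NULL && event != NULL);

	n = snprintf(path, sizeof(path), "%s/events/%s/enable",
		     tracing_dir, event);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;	/* path did not fit in the buffer */

	f = fopen(path, "w");
	if (f == NULL)
		return -1;	/* no debugfs, no such event, or no permission */

	if (fputs(on ? "1" : "0", f) == EOF) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}
```

    Calling set_trace_event("/sys/kernel/debug/tracing", "xhci-hcd", 1) before the transfer and again with 0 afterwards keeps the log limited to the transfers of interest; the log itself is still read from trace_pipe as shown above.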
  • Michael,

    I reviewed the information you sent in the email, but didn't find anything obvious.

    I then wrote a test program (attached below) that reads 16 KB from a g_zero gadget connected to AM57x, and I didn't observe the issue either (the bus trace is attached below too). AM57x doesn't stop sending IN tokens even when the g_zero gadget NAKs them.

    Can you please ask the customer to explain how their project uses the libusb bulk transfer API?

    /* bulk read from g_zero */
    /* apt-get install libusb-1.0-0-dev */
    /* gcc -o bulkperf_x86 bulkperf.c `pkg-config libusb-1.0 --libs --cflags` -lpthread -lrt */
    
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <signal.h>
    #include <getopt.h>
    #include <libusb.h>
    #include <sys/time.h>
    #include <sys/resource.h>
    
    /* set from the signal handler, so it must be volatile sig_atomic_t */
    static volatile sig_atomic_t do_exit = 0;
    
    static void sighandler(int signum)
    {
    	do_exit = 1;
    }
    
    struct param {
    	int		vid;
    	int		pid;
    	int		dir_w;
    	unsigned long	n_loops;
    	int		pktsz;
    	int		timeout;  /* in ms */
    	libusb_device_handle *h;
    	unsigned char  *buf;      /* libusb_bulk_transfer() takes unsigned char * */
    } DEFAULT_PARAM = {
    	.vid = 0x0525,
    	.pid = 0xa4a0,
    	.dir_w = 0,
    	.n_loops = 3,
    	.pktsz = 16384,
    	.timeout = 3000,
    	.h = NULL,
    	.buf = NULL,
    };
    
    int main(int argc, char *argv[])
    {
    	struct sigaction sigact;
    	libusb_context *ctx = NULL;
    	libusb_device_handle *dev_handle;
    	unsigned long loop = 0;
    	int r;
    	struct param _p = DEFAULT_PARAM;
    	int actual;
    	int ep;
    
    	if ((_p.buf = malloc(_p.pktsz)) == NULL) {
    		printf("mem alloc failed\n");
    		return 3;
    	}
    	memset(_p.buf, '\0', _p.pktsz);
    
    	r = libusb_init(&ctx);
    	if (r < 0) {
    		printf("Error: libusb_init() failed\n");
    		free(_p.buf);
    		return 1;
    	}
    
    	libusb_set_debug(ctx, 3);
    
    	dev_handle = libusb_open_device_with_vid_pid(ctx, _p.vid, _p.pid);
    	if (dev_handle == NULL) {
    		printf("Cannot open device\n");
    		goto out1;
    	}
    	_p.h = dev_handle;
    
    	printf("Device Opened\n");
    
    	if (libusb_kernel_driver_active(dev_handle, 0) == 1) {
    		printf("Kernel Driver Active\n");
    		if (libusb_detach_kernel_driver(dev_handle, 0))
    			goto out2;
    		printf("Kernel Driver Detached!\n");
    	}
    
    	r = libusb_claim_interface(dev_handle, 0);
    	if (r < 0) {
    		printf("Cannot Claim Interface\n");
    		goto out2;
    	}
    
    	sigact.sa_handler = sighandler;
    	sigemptyset(&sigact.sa_mask);
    	sigact.sa_flags = 0;
    	sigaction(SIGINT, &sigact, NULL);
    	sigaction(SIGTERM, &sigact, NULL);
    	sigaction(SIGQUIT, &sigact, NULL);
    
    	printf("Claimed Interface; Start transfer...\n");
    
    	ep = _p.dir_w ? (1 | LIBUSB_ENDPOINT_OUT) : (1 | LIBUSB_ENDPOINT_IN);
    	while (!do_exit && loop < _p.n_loops) {
    		r = libusb_bulk_transfer(dev_handle, ep, _p.buf,
    				_p.pktsz, &actual, _p.timeout);
    
    		if (r) {
    			printf("loop %lu: %s failed [%d]\n", loop,
    					_p.dir_w ? "Write" : "Read", r);
    			do_exit = 1;
    			continue;
    		}
    
    		loop++;
    	}
    
    	r = libusb_release_interface(dev_handle, 0);
    	if (r)
    		printf("Cannot Release Interface\n");
    out2:
    	libusb_close(dev_handle);
    out1:
    	libusb_exit(ctx);
    	free(_p.buf);
    	return 0;
    }
    

    am57x-read-gzero.zip

  • Bin,

    I have sent you these details in a separate email.

    Thanks,

    Nick

  • Nick,

    When calling libusb_bulk_transfer(), what values are passed in for the length and timeoutVal parameters?

    Does the application print out any abnormal debug message during usb transfers?
  • The application calls libusb_bulk_transfer() with a timeout value of 0 (wait forever), and with a 2 MB block length.
    The application does not print out any unusual messages when transferring data over the bulk endpoints.
  • Is the usb device directly connected to AM57x USB host port or through a usb hub?
  • The USB Hub USB2512B-AEZG-TR is used. I will email you the customer schematic.
  • Nick,

    Thanks for the quick response. I don't need the schematic at this moment.
    I just reproduced the issue with a hub (I didn't see it before without a hub), which is why I asked.
    I have started to look into why a hub causes the scheduling difference.
  • Bin,

    Some models connect the AM57xx USB host port directly to the external USB connector, and other models use the 2512 hub in between.
    We get the same slow scheduling performance in either case.

  • Nick,

    Now I understand the issue is not introduced by adding a USB hub, but by the interrupt endpoint that comes with the hub. I created a custom USB device which has both bulk and interrupt endpoints, and now I see the issue without a hub. The customer's USB device has an interrupt endpoint, so they also face the issue without a hub.

    We will continue to investigate the root cause and will keep you posted.
  • Bin, do you have any update on this? The customer is asking.
    thanks
    Michael
  • Michael,

    We are still working on it. It might take some time but we don't have an estimate yet.
  • Michael,

    We are still checking whether anything can be done to improve the USB bus utilization in this use case. But please note that the root cause is actually the USB printer.

    We know the symptom is that the USB host stops sending IN tokens too early in bulk transfers, which utilizes the bus bandwidth poorly. However, this behavior doesn't violate the USB specification, as the specification doesn't define flow control for bulk IN transfers. If a USB device (the printer in this case) requires the host to read USB data at a certain pace, it shouldn't use bulk transfers, which don't guarantee bus bandwidth. Isochronous transfers are probably a better choice for the device.

    So please ask the customer to improve the printer firmware to solve the issue.