This thread has been locked.

If you have a related question, please click the "Ask a related question" button in the top right corner. The newly created question will be automatically linked to this question.

Another CPPI USB DMA issue

Hello,

I am using this post to reopen a previously opened post, which I mistakenly thought solved the issue. we still see the issue of DMA hang after applying the 5 patchs from TI (http://processors.wiki.ti.com/index.php/Sitara_Linux_MUSB_Issues#AMSDK_06.00.00.00

I am working a AM3356 based product which is connected to a USB modem through USB. The usb modem exports a composite usb interface composed from 2 CDC-ECM and 1 CDC-ACM. the Sitara is the USB host and the modem is the USB device.

we are using .SDK-06.00.00.00 with psp04.06.00.11 . the kernel is linux-3.2.0-psp04.06.00.11

during data transfers (around 2KB/S) we suddenly see the USB on the Sitara side stops and there is no way to recover except reboot. configuring the USB to use PIO instead of DMA seems to solve the issue, but we would like to use DMA since we expect much higher throughput (we hope to get ~60 Mbit/S)

while normal operation the regdeump shows:

USB (M)HDRC Register Du
FAddr : 00
Power : f0
Frame : 06df
Index : 0f
Testmode : 00
TxMaxPp : 0000
TxCSRp : 0000
RxMaxPp : 0000
RxCSR : 0000
RxCount : 0000
ConfigData : 33
DevCtl : 5d
MISC : 44
TxFIFOsz : 07
RxFIFOsz : 07
TxFIFOadd : 0780
RxFIFOadd : 0780
VControl : 00000000
HWVers : 0800
EPInfo : ff
RAMInfo : 0d
LinkInfo : 5c
VPLen : 3c
HS_EOF1 : 80
FS_EOF1 : 77
LS_EOF1 : 72
SOFT_RST : 00
DMA_CNTLch0 : 0000
DMA_ADDRch0 : 00000000
DMA_COUNTch0: 00000000
DMA_CNTLch1 : 0000
DMA_ADDRch1 : 00000000
DMA_COUNTch1: 00000000
DMA_CNTLch2 : 0000
DMA_ADDRch2 : 00000000
DMA_COUNTch2: 00000000
DMA_CNTLch3 : 0000
DMA_ADDRch3 : 00000000
DMA_COUNTch3: 00000000
DMA_CNTLch4 : 0000
DMA_ADDRch4 : 00000000
DMA_COUNTch4: 00000000
DMA_CNTLch5 : 0000
DMA_ADDRch5 : 00000000
DMA_COUNTch5: 00000000
DMA_CNTLch6 : 0000
DMA_ADDRch6 : 00000000
DMA_COUNTch6: 00000000
DMA_CNTLch7 : 0000
DMA_ADDRch7 : 00000000
DMA_COUNTch7: 00000000
#

and while the issues appears we see:

# cat regdump 
MUSB (M)HDRC Register Dump 
FAddr : 00 
Power : e0 
Frame : 051d 
Index : 0f 
Testmode : 00 
TxMaxPp : 0000 
TxCSRp : 0000 
RxMaxPp : 0000 
RxCSR : 0000 
RxCount : 0000 
ConfigData : 33 
DevCtl : 19 
MISC : 44 
TxFIFOsz : 07 
RxFIFOsz : 07 
TxFIFOadd : 0780 
RxFIFOadd : 0780 
VControl : 00000000 
HWVers : 0800 
EPInfo : ff 
RAMInfo : 0d 
LinkInfo : 5c 
VPLen : 3c 
HS_EOF1 : 80 
FS_EOF1 : 77 
LS_EOF1 : 72 
SOFT_RST : 00 
DMA_CNTLch0 : 0000 
DMA_ADDRch0 : 00000000 
DMA_COUNTch0: 00000000 
DMA_CNTLch1 : 0000 
DMA_ADDRch1 : 00000000 
DMA_COUNTch1: 00000000 
DMA_CNTLch2 : 0000 
DMA_ADDRch2 : 00000000 
DMA_COUNTch2: 00000000 
DMA_CNTLch3 : 0000 
DMA_ADDRch3 : 00000000 
DMA_COUNTch3: 00000000 
DMA_CNTLch4 : 0000 
DMA_ADDRch4 : 00000000 
DMA_COUNTch4: 00000000 
DMA_CNTLch5 : 0000 
DMA_ADDRch5 : 00000000 
DMA_COUNTch5: 00000000 
DMA_CNTLch6 : 0000 
DMA_ADDRch6 : 00000000 
DMA_COUNTch6: 00000000 
DMA_CNTLch7 : 0000 
DMA_ADDRch7 : 00000000 
DMA_COUNTch7: 00000000

could you please advise

  • Hi Bin,
    regarding the prints I added, I cannot recall exactly were I placed them since I already deleted this version, I can recall that I added them in the tx_complete function in u_ether.c

    another information I would like to share is that I was able to reproduce the issue using iperf only, I think this can help you to reproduce the issue on your side.
    The scenario is bidirectional UDP traffic running between our modem and the QCOM AP, the Sitara board sits between them and is using the custom composite driver I shared with you.
    On both AP and modem I run "iperf -u -s -i5 &"
    on the modem I run "iperf -u -c 192.168.16.151 -l1k -t3600" and on the QCOM AP I run "iperf -u -c 192.168.16.1 -l1k -t3600"

    The issue happens in around 60 seconds, when I disable DMA on the Sitara board, the test run to completion (1 hour).


    Thanks
  • Eilon,

    Eilon Eyal74533 said:
    regarding the prints I added, I cannot recall exactly were I placed them since I already deleted this version, I can recall that I added them in the tx_complete function in u_ether.c

    Thanks, this is all I need.

    Eilon Eyal74533 said:
    On both AP and modem I run "iperf -u -s -i5 &"

    Does this use only one ECM interface on AM335x and the modem? If so, do you think the following setup using two AM335x boards is equivalent?

    | PC host | <----> | usb0 * AM335x[1] * usb1 | <----> | usb0 * AM335x[2] |

    - AM335x[1] usb1 port is in host mode;

    - both AM335x[1] & [2] boards usb0 port is in gadget mode, loaded with vanilla g_cdc driver which has one ECM interface.

  • Eilon,

    Can you please do the following to capture a kernel debug log?

    - Please revert the couple patches I gave recently, only keep the patches listed on http://processors.wiki.ti.com/index.php/Sitara_Linux_MUSB_Issues#AMSDK_06.00.00.00;

    - Please apply the debug patch attached below, it helps to generate debug logs;

    - rebuild the kernel uImage with Ftrace enabled:

    In menuconfig: Kernel hacking  ---> [*] Tracers  ---> [*]   Kernel Function Tracer

    - boot the board;

    - Following the kernel doc 'Documentation/trace/ftrace.txt' to ensure kernel debugfs is mounted;

    - run 'echo 32000 > /sys/kernel/debug/tracing/buffer_size_kb' to set the ftrace buffer size to 32MB;

    - run the test until it fails;

    - run 'cat /sys/kernel/debug/tracing/trace' to dump the log to a file.

    Please let me know if the log is too big to attach in here, I will create a ftp entry for you to upload it.

    motorola-reg-tracing-2015-11-18.diff
    diff --git a/drivers/usb/musb/cppi41.h b/drivers/usb/musb/cppi41.h
    index b4fce04..065b55f 100644
    --- a/drivers/usb/musb/cppi41.h
    +++ b/drivers/usb/musb/cppi41.h
    @@ -536,9 +536,16 @@ struct cppi41_queue_obj {
     };
     
     static inline u32 cppi_readl(const void __iomem *addr)
    -	{ return readl(addr); }
    +{
    +	u32 val = readl(addr);
    +	trace_printk("cppi-readl %p: %08x\n", addr, val);
    +	return val;
    +}
     static inline void cppi_writel(u32 data, void __iomem *addr)
    -	{ writel(data, addr); }
    +{
    +	trace_printk("cppi-Write %p: %08x\n", addr, data);
    +	writel(data, addr);
    +}
     /**
      * cppi41_queue_mgr_init - CPPI 4.1 queue manager initialization.
      * @q_mgr:	the queue manager to initialize
    diff --git a/drivers/usb/musb/cppi41_dma.c b/drivers/usb/musb/cppi41_dma.c
    index 1db49ab..5e184fa 100644
    --- a/drivers/usb/musb/cppi41_dma.c
    +++ b/drivers/usb/musb/cppi41_dma.c
    @@ -520,6 +520,8 @@ static struct dma_channel *cppi41_channel_alloc(struct dma_controller
     
     	dev_dbg(musb->controller, "Allocated DMA %cx channel %d for EP%d\n", is_tx ? 'T' : 'R',
     	    ch_num, ep_num);
    +	if (is_tx)
    +		trace_printk("Allocated DMA Tx channel %d for EP%d\n", ch_num, ep_num);
     
     	return &cppi_ch->channel;
     }
    @@ -632,6 +634,9 @@ static unsigned cppi41_next_tx_segment(struct cppi41_channel *tx_ch)
     	dev_dbg(musb->controller, "TX DMA%u, %s, maxpkt %u, %u PDs, addr %#x, len %u\n",
     	    tx_ch->ch_num, tx_ch->dma_mode ? "accelerated" : "transparent",
     	    pkt_size, num_pds, tx_ch->start_addr + tx_ch->curr_offset, length);
    +	trace_printk("TX DMA%u, %s, maxpkt %u, %u PDs, addr %#x, len %u\n",
    +	    tx_ch->ch_num, tx_ch->dma_mode ? "accelerated" : "transparent",
    +	    pkt_size, num_pds, tx_ch->start_addr + tx_ch->curr_offset, length);
     
     	/* Enable txfifo empty interrupt logic for supported platform to make
     	 * sure last byte is transferred out of txfifo, this logic
    diff --git a/drivers/usb/musb/musb_core.c b/drivers/usb/musb/musb_core.c
    index 075aa5f..2147457 100644
    --- a/drivers/usb/musb/musb_core.c
    +++ b/drivers/usb/musb/musb_core.c
    @@ -167,10 +167,18 @@ static inline void __tusb_musb_writeb(void __iomem *addr, unsigned offset,
     }
     
     static inline u8 __musb_readb(const void __iomem *addr, unsigned offset)
    -	{ return readb(addr + offset); }
    +{
    +	u8 val = readb(addr + offset);
    +
    +	trace_printk("readb %p + %04x: %02x\n", addr, offset, val);
    +	return val;
    +}
     
     static inline void __musb_writeb(void __iomem *addr, unsigned offset, u8 data)
    -	{ writeb(data, addr + offset); }
    +{
    +	trace_printk("Writeb %p + %04x: %02x\n", addr, offset, data);
    +	writeb(data, addr + offset);
    +}
     
     static int musb_ulpi_read(struct otg_transceiver *otg, u32 offset)
     {
    diff --git a/drivers/usb/musb/musb_io.h b/drivers/usb/musb/musb_io.h
    index af554fe..9f2a78d 100644
    --- a/drivers/usb/musb/musb_io.h
    +++ b/drivers/usb/musb/musb_io.h
    @@ -61,17 +61,31 @@ static inline void writesb(const void __iomem *addr, const void *buf, int len)
     /* NOTE:  these offsets are all in bytes */
     
     static inline u16 musb_readw(const void __iomem *addr, unsigned offset)
    -	{ return readw(addr + offset); }
    +{
    +	u16 val = readw(addr + offset);
    +	trace_printk("readw %p + %04x: %04x\n", addr, offset, val);
    +	return val;
    +}
     
     static inline u32 musb_readl(const void __iomem *addr, unsigned offset)
    -	{ return readl(addr + offset); }
    +{
    +	u32 val = readl(addr + offset);
    +	trace_printk("readl %p + %04x: %08x\n", addr, offset, val);
    +	return val;
    +}
     
     
     static inline void musb_writew(void __iomem *addr, unsigned offset, u16 data)
    -	{ writew(data, addr + offset); }
    +{
    +	trace_printk("Writew %p + %04x: %04x\n", addr, offset, data);
    +	writew(data, addr + offset);
    +}
     
     static inline void musb_writel(void __iomem *addr, unsigned offset, u32 data)
    -	{ writel(data, addr + offset); }
    +{
    +	trace_printk("Writel %p + %04x: %08x\n", addr, offset, data);
    +	writel(data, addr + offset);
    +}
     
     #else
     
    diff --git a/drivers/usb/musb/ti81xx.c b/drivers/usb/musb/ti81xx.c
    index 9b23570..acc7f3a 100644
    --- a/drivers/usb/musb/ti81xx.c
    +++ b/drivers/usb/musb/ti81xx.c
    @@ -736,7 +736,6 @@ static void otg_timer(unsigned long _musb)
     			    MUSB_INTR_VBUSERROR << USB_INTR_USB_SHIFT);
     		break;
     	case OTG_STATE_B_IDLE:
    -		if (!is_peripheral_enabled(musb))
     			break;
     
     		/*
    @@ -859,6 +858,7 @@ static irqreturn_t cppi41dma_Interrupt(int irq, void *hci)
     	if (gmusb[0] && (usb0_tx_intr || usb0_rx_intr)) {
     		dev_dbg(gmusb[0]->controller, "CPPI 4.1 IRQ: Tx %x, Rx %x\n",
     			usb0_tx_intr, usb0_rx_intr);
    +		trace_printk("CPPI 4.1 IRQ: Tx %x, Rx %x\n", usb0_tx_intr, usb0_rx_intr);
     		spin_lock_irqsave(&gmusb[0]->lock, flags);
     		cppi41_completion(gmusb[0], usb0_rx_intr,
     					usb0_tx_intr);
    @@ -1079,6 +1079,8 @@ static irqreturn_t ti81xx_interrupt(int irq, void *hci)
     	musb->int_usb =	(usbintr & USB_INTR_USB_MASK) >> USB_INTR_USB_SHIFT;
     
     	dev_dbg(musb->controller, "usbintr (%x) epintr(%x)\n", usbintr, epintr);
    +	if (!musb->id)
    +		trace_printk("usbintr (%x) epintr(%x)\n", usbintr, epintr);
     
     	if (musb->int_usb & MUSB_INTR_SOF) {
     		musb->sof_cnt++;
    

  • Eilon Eyal74533 said:
    on the modem I run "iperf -u -c 192.168.16.151 -l1k -t3600" and on the QCOM AP I run "iperf -u -c 192.168.16.1 -l1k -t3600"

    Can you please explain about the topology? it crosses both usb0/ECM and usb1/ECM ports on am335x, in which am335x works as a network router? I don't understand how the ECM interface on both QCOM AP and modem are on the same subnet - 192.168.16.0?

  • Hi Bin,
    1. As for the setup I am not sure it is the same as we use the modified CDC which create 3 ECM interfaces, although we only one of them during the test. could the issue be related either to the fact we consume a large number of endpoints .

    2. when using ftrace, which tracer would you like me to use ? the function tracer? or the function graph?

    3. in our setup we use ip bridge (brctl) to do a transparent bridge inside the Sitara board . 1 end of the network is on the QCOM board and the other on the modem, the Sitara simply forward the packets.

    I will collect the ftrace logs tomorrow.

    thanks
  • Eilon,

    Thanks for the quick response.
    For ftrace, you don't have to use any tracer, I added trace_printk() in the debug patch, and I'd like to see this log.
    Is it possible for you to send instructions for how to do transparent bridge? I understand your project uses multiple ECM interfaces, but this iperf test only uses one. In theory, other interfaces should not affecting usb so long as no traffic on them, so I want to use use two EVMs to run the iperf test, if I cannot see the issue, I will add your patch to bring up 3 ECM interfaces.
  • Hi Bin,
    first you will need to make sure your kernel is configured with "networking -> 802.1d Ethernet Bridging"

    then assuming you want to bridge usb0 and usb1 network interfaces, you will need to:
    ifconfig usb0 0.0.0.0 up
    ifconfig usb1 0.0.0.0 up
    brctl addbr br1
    brctl addif br1 usb0
    brctl addif br1 usb1
    ifconfig br1 0.0.0.0 up
  • Hi Bin.

    I applied the patch you sent (after reverting not needed patches), but unfortunaly the issue could not be reproduced with the patch, it did not even reproduced when I disabled ftrace (echo 0 > /sys/kernel/debug/tracing/tracing_on).

    I the next steps I tried to see which parts of the patch prevents the issue from happening by applying only parts of the patch you sent. 

    it appears that the 2 parts adding trace_printk into  musb_core.c and musb_io.h prevent the issue from happening.

    please let me know how would you like me to continue.

    Thanks,

  • Eilon,

    Eilon Eyal74533 said:
    it did not even reproduced when I disabled ftrace (echo 0 > /sys/kernel/debug/tracing/tracing_on).

    tracing_on flag does not control trace_printk(), so this command does not prevent my patch to generate ftrace log.

    Eilon Eyal74533 said:

    I the next steps I tried to see which parts of the patch prevents the issue from happening by applying only parts of the patch you sent. 

    it appears that the 2 parts adding trace_printk into  musb_core.c and musb_io.h preve

    You probably will not find anything. The patch in musb_io.h and partially in musb_core.c adds trace_printk() in *_read/write(), which generates debug log in every register access, it appears adding delay which cause the issue disappeared.

    Please let me think about it for the next debug step.

  • Eilon,

    Please revert the previous ftrace debug patch, and apply the debug patch below. If it helps, I am expecting the lockup issue goes away and your uart serial console has log messages. Please send me the last 100 lines of the log, I'd like to get an idea of how long it takes to clear the register bits on your system.

    diff --git a/drivers/usb/musb/musb_gadget.c b/drivers/usb/musb/musb_gadget.c
    index bcb0a8f..7454bdc 100644
    --- a/drivers/usb/musb/musb_gadget.c
    +++ b/drivers/usb/musb/musb_gadget.c
    @@ -543,6 +543,19 @@ void musb_g_tx(struct musb *musb, u8 epnum)
                    return;
            }
    
    +       if (dma) {
    +               int i = 0;
    +               int threshold = 500000;
    +
    +               while (csr & (MUSB_TXCSR_FIFONOTEMPTY | MUSB_TXCSR_TXPKTRDY)) {
    +                       if (i >= threshold)
    +                               break;
    +                       csr = musb_readw(epio, MUSB_TXCSR);
    +                       i++;
    +               }
    +               dev_err(musb->controller, "ep%d TXCSR_FIFONOTEMPTY, loop %d\n", epnum, i);
    +       }
    +
            if (request) {
                    u8      is_dma = 0;
    
    
    
  • If the patch above makes the issue go away, please remove the 'dev_err()' line from the patch and test it again to check if the patch still solve the issue. This can confirm if the uncleared register bits is the root cause.
  • Hi Bin,

    The patch shows very good signs it fix the issue. :-)

    After removing the dev_err, mydma.log setup which used to malfunction in around 10 seconds run over 2 hours without any issue (until the device battery died)

    we will continue to verify the patch on other setups here in Motorola, and will let you know how it works.

    The register access timing is attached (max value is 369), also could you briefly describe what was the root cause and how this patch fix it?

    Thanks,

  • Eilon,

    Thanks for the confirmation!

    The AM335x MUSB has a hardware issue that the TX CPPI DMA generates TX completion interrupt when the CPPI DMA channel moved the TX packet from memory (DDR) to MUSB TX FIFO, but did not wait untill empty the TX FIFO, so there is a chance that the TX DMA channel is programmed again for next TX transfer while MUSB TX FIFO still has data which leads to USB lockup, which is what happening in your case.

    To workaround this issue, the TX completion ISR in CPPI driver checks to ensure the MUSB TX FIFO is empty before notifying the upper layer. Unfortunately the driver only implemented the workaround properly for host mode, but not in gadget mode, which is the root cause of your issue.

    The following patch (attached too) is the fix for this issue. Please revert the previous debug patch and use this one as the properly fix. Please let me know if you run into any issue with this patch.

    diff --git a/drivers/usb/musb/cppi41_dma.c b/drivers/usb/musb/cppi41_dma.c
    index e88cb73..438e55c 100644
    --- a/drivers/usb/musb/cppi41_dma.c
    +++ b/drivers/usb/musb/cppi41_dma.c
    @@ -1781,12 +1781,6 @@ static void usb_process_tx_queue(struct cppi41 *cppi, unsigned index)
                            } else if (tx_ch->txfifo_intr_first) {
                                    tx_ch->txfifo_intr_first = 0;
                                    musb_completion = 1;
    -                       } else {
    -                               if (is_peripheral_active(musb) &&
    -                                       csr & MUSB_TXCSR_TXPKTRDY
    -                                       && !tx_ch->zlp_queued) {
    -                                       musb_completion = 1;
    -                               }
                            }
                            if (musb_completion) {
                                    sched_work = 0;
    
    

    0001-usb-musb-cppi41-let-tx_completion_work-to-check-FIFO.patch.txt
    From 9686d3e0f98e48910b380b7931f1020156890bfc Mon Sep 17 00:00:00 2001
    From: Bin Liu <b-liu@ti.com>
    Date: Fri, 20 Nov 2015 11:42:24 -0600
    Subject: [PATCH] usb: musb: cppi41: let tx_completion_work to check
     FIFONOTEMPTY for gadget mode
    
    Due to the hardware issue of tx-dma early completin, the register bit
    TXCSR[FIFONOTEMPTY] should be checked for 0; otherwise the tx CPPI channel
    could be reprogrammed for new transfer while the musb tx fifo still has data
    for current transfer, which leads to usb lockup.
    
    Signed-off-by: Bin Liu <b-liu@ti.com>
    ---
     drivers/usb/musb/cppi41_dma.c | 6 ------
     1 file changed, 6 deletions(-)
    
    diff --git a/drivers/usb/musb/cppi41_dma.c b/drivers/usb/musb/cppi41_dma.c
    index e88cb73..438e55c 100644
    --- a/drivers/usb/musb/cppi41_dma.c
    +++ b/drivers/usb/musb/cppi41_dma.c
    @@ -1781,12 +1781,6 @@ static void usb_process_tx_queue(struct cppi41 *cppi, unsigned index)
     			} else if (tx_ch->txfifo_intr_first) {
     				tx_ch->txfifo_intr_first = 0;
     				musb_completion = 1;
    -			} else {
    -				if (is_peripheral_active(musb) &&
    -					csr & MUSB_TXCSR_TXPKTRDY
    -					&& !tx_ch->zlp_queued) {
    -					musb_completion = 1;
    -				}
     			}
     			if (musb_completion) {
     				sched_work = 0;
    -- 
    1.8.4
    
    

  • Hi Bin,
    After testing this patch it seems it adds significant delays compared to the previous patch.
    can you help?
  • Eilon,

    Can you please explain what the delay caused by this patch is? Please provide any data if you have.

    Both patches are functionally the same - waiting for TX FIFONOTEMPTY bit got cleared. But this patch does the checking in kernel worker instead of inline loop as in the previous patch.

    If you think the previous patch behaves better, you can use it instead in your project. You might want to replace the dev_err() line with 'cpu_relax();' in the patch to improve the responsiveness of the whole system.

  • Hi Bin,

    using the second patch I enabled the prints in txdma comp work, and I this is what I see (there are endless prints of this kind so I only took a few): I think there might me some kind of endless loop due to uninitialized channel , what do you think?

    [ 4284.759680] musb-hdrc musb-hdrc.0: txdma comp work: TX1 csr 3403
    [ 4284.779682] musb-hdrc musb-hdrc.0: txdma comp work: TX7 csr 3403
    [ 4284.799683] musb-hdrc musb-hdrc.0: txdma comp work: TX0 csr 3403
    [ 4284.819691] musb-hdrc musb-hdrc.0: txdma comp work: TX1 csr 3403
    [ 4284.839684] musb-hdrc musb-hdrc.0: txdma comp work: TX7 csr 3403
    [ 4284.859683] musb-hdrc musb-hdrc.0: txdma comp work: TX0 csr 3403
    [ 4284.879729] musb-hdrc musb-hdrc.0: txdma comp work: TX1 csr 3403
    [ 4284.899683] musb-hdrc musb-hdrc.0: txdma comp work: TX7 csr 3403
    [ 4284.919681] musb-hdrc musb-hdrc.0: txdma comp work: TX0 csr 3403
    [ 4284.939682] musb-hdrc musb-hdrc.0: txdma comp work: TX1 csr 3403
    [ 4284.959682] musb-hdrc musb-hdrc.0: txdma comp work: TX7 csr 3403
    [ 4284.979681] musb-hdrc musb-hdrc.0: txdma comp work: TX0 csr 3403
    [ 4284.999703] musb-hdrc musb-hdrc.0: txdma comp work: TX1 csr 3403
    [ 4285.019685] musb-hdrc musb-hdrc.0: txdma comp work: TX7 csr 3403
    [ 4285.039683] musb-hdrc musb-hdrc.0: txdma comp work: TX0 csr 3403
    [ 4285.059683] musb-hdrc musb-hdrc.0: txdma comp work: TX1 csr 3403
    [ 4285.079681] musb-hdrc musb-hdrc.0: txdma comp work: TX7 csr 3403
    [ 4285.099683] musb-hdrc musb-hdrc.0: txdma comp work: TX0 csr 3403
    [ 4285.119682] musb-hdrc musb-hdrc.0: txdma comp work: TX1 csr 3403
    [ 4285.139727] musb-hdrc musb-hdrc.0: txdma comp work: TX7 csr 3403
    [ 4285.159683] musb-hdrc musb-hdrc.0: txdma comp work: TX0 csr 3403
    [ 4285.179693] musb-hdrc musb-hdrc.0: txdma comp work: TX1 csr 3403
    [ 4285.199685] musb-hdrc musb-hdrc.0: txdma comp work: TX7 csr 3403
    [ 4285.219690] musb-hdrc musb-hdrc.0: txdma comp work: TX0 csr 3403
    [ 4285.239682] musb-hdrc musb-hdrc.0: txdma comp work: TX1 csr 3403
    [ 4285.259684] musb-hdrc musb-hdrc.0: txdma comp work: TX7 csr 3403
    [ 4285.279681] musb-hdrc musb-hdrc.0: txdma comp work: TX0 csr 3403
    [ 4285.299695] musb-hdrc musb-hdrc.0: txdma comp work: TX1 csr 3403
    [ 4285.319690] musb-hdrc musb-hdrc.0: txdma comp work: TX7 csr 3403
    [ 4285.339681] musb-hdrc musb-hdrc.0: txdma comp work: TX0 csr 3403
    [ 4285.359696] musb-hdrc musb-hdrc.0: txdma comp work: TX1 csr 3403
    [ 4285.379685] musb-hdrc musb-hdrc.0: txdma comp work: TX7 csr 3403
    [ 4285.399682] musb-hdrc musb-hdrc.0: txdma comp work: TX0 csr 3403

  • to provide more information added prints inside txdma comp work, to print when an actual TX notification occur

    tx_ch->sched_cnt = 0;

    /*****************************/
    dev_dbg(musb->controller,
    "NOW SENDING: TX%d csr %04x",
    tx_ch->ch_num, csr);
    /***********************/
    if (len > 0) {
    tx_ch->tx_complete = 0;
    cppi41_next_tx_segment(tx_ch);
    continue;
    }

    tx_ch->channel.status =
    MUSB_DMA_STATUS_FREE;
    tx_ch->tx_complete = 0;


    here are the new prints, I only placed a few of them to show when there is the notification, but please bare in mind that the "NOW SENDING" print is not showing most of the time. 99% of the time there are on the : txdma comp work prints.
    Also only TX1 and TX2 appear in the SENDING print.
    I hope this helps


    [ 206.129709] musb-hdrc musb-hdrc.0: txdma comp work: TX7 csr 3403
    [ 206.149701] musb-hdrc musb-hdrc.0: txdma comp work: TX0 csr 3403
    [ 206.169714] musb-hdrc musb-hdrc.0: txdma comp work: TX1 csr 3403
    [ 206.189703] musb-hdrc musb-hdrc.0: txdma comp work: TX7 csr 3403
    [ 206.209718] musb-hdrc musb-hdrc.0: txdma comp work: TX0 csr 3403
    [ 206.229705] musb-hdrc musb-hdrc.0: txdma comp work: TX1 csr 3403
    [ 206.249701] musb-hdrc musb-hdrc.0: NOW SENDING: TX2 csr 3404
    [ 206.249755] musb-hdrc musb-hdrc.0: txdma comp work: TX7 csr 3403
    [ 206.269705] musb-hdrc musb-hdrc.0: txdma comp work: TX0 csr 3403
    [ 206.289714] musb-hdrc musb-hdrc.0: txdma comp work: TX1 csr 3403
    [ 206.309708] musb-hdrc musb-hdrc.0: txdma comp work: TX7 csr 3403
    [ 206.329704] musb-hdrc musb-hdrc.0: txdma comp work: TX0 csr 3403
    [ 206.349701] musb-hdrc musb-hdrc.0: txdma comp work: TX1 csr 3403
    [ 206.369704] musb-hdrc musb-hdrc.0: txdma comp work: TX7 csr 3403
    [ 206.389716] musb-hdrc musb-hdrc.0: txdma comp work: TX0 csr 3403
    [ 206.409703] musb-hdrc musb-hdrc.0: txdma comp work: TX1 csr 3403
    [ 206.429704] musb-hdrc musb-hdrc.0: NOW SENDING: TX2 csr 3404
    [ 206.429757] musb-hdrc musb-hdrc.0: txdma comp work: TX7 csr 3403
    [ 206.449701] musb-hdrc musb-hdrc.0: txdma comp work: TX0 csr 3403
    [ 206.469703] musb-hdrc musb-hdrc.0: txdma comp work: TX1 csr 3403
    [ 206.489705] musb-hdrc musb-hdrc.0: txdma comp work: TX7 csr 3403
    [ 206.509698] musb-hdrc musb-hdrc.0: txdma comp work: TX0 csr 3403
    [ 206.529702] musb-hdrc musb-hdrc.0: txdma comp work: TX1 csr 3403
    [ 206.549704] musb-hdrc musb-hdrc.0: txdma comp work: TX7 csr 3403
    [ 206.570367] musb-hdrc musb-hdrc.0: txdma comp work: TX0 csr 3403
    [ 206.589730] musb-hdrc musb-hdrc.0: txdma comp work: TX1 csr 3403
    [ 206.609709] musb-hdrc musb-hdrc.0: txdma comp work: TX7 csr 3403
    [ 206.629708] musb-hdrc musb-hdrc.0: txdma comp work: TX0 csr 3403
    [ 206.649704] musb-hdrc musb-hdrc.0: txdma comp work: TX1 csr 3403
    [ 206.669704] musb-hdrc musb-hdrc.0: txdma comp work: TX7 csr 3403
    [ 206.689703] musb-hdrc musb-hdrc.0: txdma comp work: TX0 csr 3403
    [ 206.709700] musb-hdrc musb-hdrc.0: txdma comp work: TX1 csr 3403
    [ 206.729702] musb-hdrc musb-hdrc.0: txdma comp work: TX7 csr 3403
    [ 206.749716] musb-hdrc musb-hdrc.0: txdma comp work: TX0 csr 3403
    [ 206.769699] musb-hdrc musb-hdrc.0: txdma comp work: TX1 csr 3403
    [ 206.789703] musb-hdrc musb-hdrc.0: txdma comp work: TX7 csr 3403
    [ 206.809704] musb-hdrc musb-hdrc.0: txdma comp work: TX0 csr 3403
    [ 206.829701] musb-hdrc musb-hdrc.0: txdma comp work: TX1 csr 3403
    [ 206.849706] musb-hdrc musb-hdrc.0: txdma comp work: TX7 csr 3403
    [ 206.869699] musb-hdrc musb-hdrc.0: txdma comp work: TX0 csr 3403
    [ 206.889701] musb-hdrc musb-hdrc.0: txdma comp work: TX1 csr 3403
  • Hi Bin,
    as you remember we have 3 CDC-ECM + 1 ACM on the controller going to the QCOM cpu.
    normally at the beginning only 2 ECM are opened on the QCOM side and then we wait for the LTE modem to connect and only then open the last ECM + ACM.
    I leave the device without connection (ECM in down state ACM closed) I see the endless loop and the long delays. if I connect or manually do ifconfig <3rd ECM> up + cat /dev/ttyACM0 > /dev/null ,,,, then the delay is gone and also the loop is gone.

    do you think the delay we see has something to do with the device waiting for setup/open from the host?

    Thanks,
  • Eilon,

    Eilon Eyal74533 said:
    do you think the delay we see has something to do with the device waiting for setup/open from the host?

    Maybe, let me try to to replicate it. But to move your project forward, can you use the first patch (for musb_gadget.c) as the solution? You can change the loop threshold from 500000 to 5000.

  • Thanks Bin,

    Yes I already incorporated the first patch into our official release going to box test team, their test cycle is a round a week and they started yesterday, until now (almost the end of the day here) the results look very good.

    as for the second patch , I can see that those EP which were not opened, for example a ECM interface which was enumerated but the host did not do "ifconfig up", there is a flag tx_complete in the dma channel structure , this is always 1 on those EP, also the FIFONOTEMPTY register shows there is a URB in their queue. so the "big" IF statement in txdma_completion_work is always true and the completion routine waits for the FIFO to be empty (it reschedule it self in case of failure) also it waits 1 millisecond for each failure.

    once the interface is opened  the packet is transmitted.

    Thanks

  • Hi Bin,

    I am very sorry to say that our box team found a reproducible scenario which cause the DMA issue to happen even with the first patch.

    adding a few prints into the patch

     if (i >= threshold) {
         dev_err(musb->controller, " ep%d TXCSR_FIFONOTEMPTY, loop %d csr %x \n", epnum, i, csr);
         break;
     }

    I can see that some time the counter exceeds the threshold (=500000) , I can see that multiple time especially at the system boot,

    ep2 TXCSR_FIFONOTEMPTY, loop 1000000 csr 3403
    ep4 TXCSR_FIFONOTEMPTY, loop 1000000 csr 3403

    after that I can see the same happen for ep1 and then the transmission on ep1 stops.

    examining patch 2 it seems much better to handle such cases since it will never indicate a TX indication to upper layer until the FIFO is empty.

    do you think it is possible to somehow fix patch #2 so it will no cause while waiting for the interface to open?

    Thanks

  • Eilon,

    Finally I am able to replicate the issue based on your description. Thanks.

    The problem is that the application on the USB gadget side sends data to MUSB TX FIFO before the USB host sends IN token, so the data stuck in the TXFIFO which causes txdma_completion_work() to be a busy loop.

    Please try the patch attached below, Let's call it patch #3. Please apply it with patch #2 I sent before. Please fully test with these patches to ensure nothing else is broken.

    0002-usb-musb-cppi41-fix-busy-loop-in-txdma_completion_wo.patch.txt
    From 5fa70ec6e9e2afc13fdb3351b8d595f3aa885fae Mon Sep 17 00:00:00 2001
    From: Bin Liu <b-liu@ti.com>
    Date: Thu, 3 Dec 2015 10:14:44 -0600
    Subject: [PATCH 2/2] usb: musb: cppi41: fix busy loop in txdma_completion_work
    
    Due to the hardware issue of tx-dma early completin, a gadget driver
    could pre-load a packet into TX FIFO, which sets txcsr TXPKTRDY and
    FIFONOTEMPTY bits, which would be cleared until the host sends an IN
    token. During this time txdma_completion_work() becomes a busy loop
    checking those two txcsr bits.
    
    One way to replicate the issue is with g_serial driver sending a
    character on the the gadget side, but the host does not open the ACM
    port to receive the char.
    
    The fix is to put the work thread to sleep for 50us after looped for
    4000 times. Since the sleep is interruptible, the work thread scheduling
    for new tx channels will not be affected.
    
    Signed-off-by: Bin Liu <b-liu@ti.com>
    ---
     drivers/usb/musb/cppi41_dma.c | 24 ++++++++----------------
     1 file changed, 8 insertions(+), 16 deletions(-)
    
    diff --git a/drivers/usb/musb/cppi41_dma.c b/drivers/usb/musb/cppi41_dma.c
    index 32380ac..3b3f4c8 100644
    --- a/drivers/usb/musb/cppi41_dma.c
    +++ b/drivers/usb/musb/cppi41_dma.c
    @@ -114,7 +114,6 @@ struct cppi41_channel {
     	u8  txfifo_intr_first;
     	u8  xfer_state;
     	struct usb_pkt_desc *curr_pd;
    -	int sched_cnt;
     };
     
     /**
    @@ -315,7 +314,6 @@ static int __devinit cppi41_controller_start(struct dma_controller *controller)
     				     CPPI41_SRC_TAG_CH_NUM_SHIFT) |
     				    (tx_info->sub_ch_num <<
     				     CPPI41_SRC_TAG_SUB_CH_NUM_SHIFT);
    -		cppi_ch->sched_cnt = 0;
     	}
     
     	/* Configure the Rx channels */
    @@ -324,7 +322,6 @@ static int __devinit cppi41_controller_start(struct dma_controller *controller)
     		memset(cppi_ch, 0, sizeof(struct cppi41_channel));
     		cppi_ch->ch_num = i;
     		cppi_ch->channel.private_data = cppi;
    -		cppi_ch->sched_cnt = 0;
     	}
     
     	/* Construct/store Tx PD packet info field for later use */
    @@ -1417,6 +1414,7 @@ void txdma_completion_work(struct work_struct *data)
     	unsigned index;
     	u8 resched = 0;
     	unsigned long flags;
    +	int slowchk=0;
     
     	while (1) {
     		for (index = 0; index < USB_CPPI41_NUM_CH; index++) {
    @@ -1444,27 +1442,19 @@ void txdma_completion_work(struct work_struct *data)
     					(csr & (MUSB_TXCSR_TXPKTRDY |
     					MUSB_TXCSR_FIFONOTEMPTY))) {
     					resched = 1;
    -					tx_ch->sched_cnt++;
    -					if (tx_ch->sched_cnt < 4000)
    -						continue;
     					dev_dbg(musb->controller,
     						"txdma comp work: TX%d csr %04x",
     						tx_ch->ch_num, csr);
    -					msleep(1);
     				} else {
     					len = tx_ch->length -
     						tx_ch->curr_offset;
     					if (csr & (MUSB_TXCSR_TXPKTRDY |
     						MUSB_TXCSR_FIFONOTEMPTY)) {
     						resched = 1;
    -						tx_ch->sched_cnt++;
     						tx_ch->count = 1;
    -						if (tx_ch->sched_cnt < 4000)
    -							continue;
     						dev_dbg(musb->controller,
     							"txdma comp work (2): TX%d csr %04x",
     							tx_ch->ch_num, csr);
    -						msleep(1);
     						continue;
     					} else  if (tx_ch->zlp_queued && !len) {
     						/* send ZLP using PIO mode */
    @@ -1485,8 +1475,6 @@ void txdma_completion_work(struct work_struct *data)
     						continue;
     					}
     
    -					tx_ch->sched_cnt = 0;
    -
     					if (len > 0) {
     						tx_ch->tx_complete = 0;
     						cppi41_next_tx_segment(tx_ch);
    @@ -1507,11 +1495,15 @@ void txdma_completion_work(struct work_struct *data)
     				cond_resched();
     		}
     
    -		if (resched) {
    -			resched = 0;
    +		if (!resched)
    +			return;
    +
    +		resched = 0;
    +		if (slowchk++ < 4000) {
     			cond_resched();
     		} else {
    -			return ;
    +			set_current_state(TASK_INTERRUPTIBLE);
    +			schedule_timeout(usecs_to_jiffies(50));
     		}
     	}
     
    -- 
    1.9.1
    
    

  • Eilon,

    In case you don't receive an email notification, I updated the patch #3 in my previous post above. It is similar to the first version, just with a threshold adjustment and some cleanup.

    I will put both patch #2 & #3 on the TI wiki page which lists all the MUSB patches.
  • Hi Bin,

    I am really sorry, but I have to reopen the case.

    it seems that the DMA issue was resolved but the throughput issue when the one of the interfaces is not open still exist.

    our test team reported a degradation from 27Mbit to 300K while this happens. 

  • Eilon,

    Thanks for the report. I thought about this usecase when working on the patch, and simulated it, but did not see throughput problem.

    But I guess I understand what causes the throughput issue. Let me try to work on it again.

  • Eilon,


    Please apply the patch attached below on top of your kernel, and let me know how it works.

    This patch adds a separate kworker thread to pull the inactive endpoints in lower frequency, so it wouldn't add context switch overhead to active endpoints which caused the throughput regression.

    3187.0001-usb-musb-cppi41-fix-throughput-regression.patch.txt
    From f5d498a03b68b325febff885daad867b2ba9983b Mon Sep 17 00:00:00 2001
    From: Bin Liu <b-liu@ti.com>
    Date: Thu, 17 Dec 2015 15:25:32 -0600
    Subject: [PATCH] usb: musb: cppi41: fix throughput regression
    
    ---
     drivers/usb/musb/cppi41_dma.c | 63 ++++++++++++++++++++++++++++++++++++-------
     1 file changed, 54 insertions(+), 9 deletions(-)
    
    diff --git a/drivers/usb/musb/cppi41_dma.c b/drivers/usb/musb/cppi41_dma.c
    index 3b3f4c8..9fe0021 100644
    --- a/drivers/usb/musb/cppi41_dma.c
    +++ b/drivers/usb/musb/cppi41_dma.c
    @@ -114,6 +114,7 @@ struct cppi41_channel {
     	u8  txfifo_intr_first;
     	u8  xfer_state;
     	struct usb_pkt_desc *curr_pd;
    +	int sched_cnt;
     };
     
     /**
    @@ -129,6 +130,7 @@ struct cppi41 {
     	struct cppi41_channel tx_cppi_ch[USB_CPPI41_NUM_CH];
     	struct cppi41_channel rx_cppi_ch[USB_CPPI41_NUM_CH];
     	struct work_struct      txdma_work;
    +	struct work_struct      txdma_slow_work;
     	struct work_struct      rxdma_work;
     
     	struct usb_pkt_desc *pd_pool_head; /* Free PD pool head */
    @@ -314,6 +316,7 @@ static int __devinit cppi41_controller_start(struct dma_controller *controller)
     				     CPPI41_SRC_TAG_CH_NUM_SHIFT) |
     				    (tx_info->sub_ch_num <<
     				     CPPI41_SRC_TAG_SUB_CH_NUM_SHIFT);
    +		cppi_ch->sched_cnt = 0;
     	}
     
     	/* Configure the Rx channels */
    @@ -322,6 +325,7 @@ static int __devinit cppi41_controller_start(struct dma_controller *controller)
     		memset(cppi_ch, 0, sizeof(struct cppi41_channel));
     		cppi_ch->ch_num = i;
     		cppi_ch->channel.private_data = cppi;
    +		cppi_ch->sched_cnt = 0;
     	}
     
     	/* Construct/store Tx PD packet info field for later use */
    @@ -1441,7 +1445,13 @@ void txdma_completion_work(struct work_struct *data)
     				if (!tx_ch->txfifo_intr_enable &&
     					(csr & (MUSB_TXCSR_TXPKTRDY |
     					MUSB_TXCSR_FIFONOTEMPTY))) {
    -					resched = 1;
    +					if (tx_ch->sched_cnt > 4000) {
    +						tx_ch->tx_complete = 2;
    +						slowchk = 1;
    +					} else {
    +						tx_ch->sched_cnt++;
    +						resched = 1;
    +					}
     					dev_dbg(musb->controller,
     						"txdma comp work: TX%d csr %04x",
     						tx_ch->ch_num, csr);
    @@ -1450,7 +1460,13 @@ void txdma_completion_work(struct work_struct *data)
     						tx_ch->curr_offset;
     					if (csr & (MUSB_TXCSR_TXPKTRDY |
     						MUSB_TXCSR_FIFONOTEMPTY)) {
    -						resched = 1;
    +						if (tx_ch->sched_cnt++ > 4000) {
    +							tx_ch->tx_complete = 2;
    +							slowchk = 1;
    +						} else {
    +							tx_ch->sched_cnt++;
    +							resched = 1;
    +						}
     						tx_ch->count = 1;
     						dev_dbg(musb->controller,
     							"txdma comp work (2): TX%d csr %04x",
    @@ -1475,6 +1491,8 @@ void txdma_completion_work(struct work_struct *data)
     						continue;
     					}
     
    +					tx_ch->sched_cnt = 0;
    +
     					if (len > 0) {
     						tx_ch->tx_complete = 0;
     						cppi41_next_tx_segment(tx_ch);
    @@ -1495,18 +1513,44 @@ void txdma_completion_work(struct work_struct *data)
     				cond_resched();
     		}
     
    -		if (!resched)
    -			return;
    -
    -		resched = 0;
    -		if (slowchk++ < 4000) {
    +		if (resched) {
    +			resched = 0;
     			cond_resched();
     		} else {
    -			set_current_state(TASK_INTERRUPTIBLE);
    -			schedule_timeout(usecs_to_jiffies(50));
    +			if (slowchk) {
    +				slowchk = 0;
    +				schedule_work(&cppi->txdma_slow_work);
    +			}
    +			return ;
     		}
     	}
    +}
    +
    +void txdma_completion_slow_work(struct work_struct *data)
    +{
    +	struct cppi41 *cppi = container_of(data, struct cppi41, txdma_slow_work);
    +	struct cppi41_channel *tx_ch;
    +	struct musb *musb = cppi->musb;
    +	unsigned index;
    +	int check = 0;
    +
    +	set_current_state(TASK_UNINTERRUPTIBLE);
    +	schedule_timeout(msecs_to_jiffies(300));
    +
    +	for (index = 0; index < USB_CPPI41_NUM_CH; index++) {
    +		tx_ch = &cppi->tx_cppi_ch[index];
    +		if (tx_ch->tx_complete != 2)
    +			continue;
    +
    +		tx_ch->tx_complete = 1;
    +		check = 1;
    +		dev_dbg(musb->controller,
    +			"txdma slow comp work: TX%d",
    +			tx_ch->ch_num);
    +	}
     
    +	if (check)
    +		schedule_work(&cppi->txdma_work);
     }
     
     void cppi41_isoc_schedular(struct musb *musb)
    @@ -1634,6 +1678,7 @@ cppi41_dma_controller_create(struct musb  *musb, void __iomem *mregs)
     	cppi->txfifo_intr_enable = musb->txfifo_intr_enable;
     	cppi->tx_isoc_sched_enable = musb->tx_isoc_sched_enable;
     	INIT_WORK(&cppi->txdma_work, txdma_completion_work);
    +	INIT_WORK(&cppi->txdma_slow_work, txdma_completion_slow_work);
     	INIT_WORK(&cppi->rxdma_work, rxdma_completion_work);
     
     	/*
    -- 
    1.9.1
    
    

  • Eilon,

    I updated the patch in my previous post, in case you have already downloaded the original. This version reduces the system load in some cases.
  • Hi Bin.

    Sorry for late response it was the weekend here.

    I will compile the latest patch you sent and will post it to our box team for verification (this can take a week or so)

    Thanks