USB RX DMA hangs during stress test

Stefan Schulze

Hi,

for a customer project, we have ported our USB stack to an OMAP-L138 platform for the RNDIS use case.

For this, we have written a new HAL for our stack that supports the TRANSPARENT and GENERIC RNDIS DMA modes.

During a stress test (creating high CPU load and then performing HTTP requests over RNDIS again and again), we see instability in the RX DMA.

When the issue occurs, the rxcsr register shows a full FIFO with DMAEN set, and our logs show that a new DMA transfer was prepared correctly. But we never get an interrupt.
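One way to catch this symptom in the field rather than in the logs is a small monitor that is called periodically. The following is only a sketch under assumptions: the structure and function names are hypothetical, the bit positions are the usual MUSB RXCSR ones (FIFOFULL = bit 1, DMAEN = bit 13), and the caller is assumed to supply the current rxcsr value and a running count of RX completion interrupts.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed MUSB peripheral RXCSR bit positions. */
#define RXCSR_FIFOFULL 0x0002u
#define RXCSR_DMAEN    0x2000u

/* Hypothetical monitor state, not part of any vendor API. */
struct rx_stall_monitor {
    uint32_t last_irq_count; /* completion interrupts seen at last sample */
    unsigned stuck_samples;  /* consecutive samples showing the symptom   */
    unsigned threshold;      /* samples required before reporting a stall */
};

void rx_stall_monitor_init(struct rx_stall_monitor *m, unsigned threshold)
{
    m->last_irq_count = 0;
    m->stuck_samples = 0;
    m->threshold = threshold;
}

/* Returns true once the FIFO has been full with DMAEN set, and no new
 * completion interrupt has arrived, for `threshold` consecutive samples. */
bool rx_stall_monitor_sample(struct rx_stall_monitor *m,
                             uint16_t rxcsr, uint32_t irq_count)
{
    const uint16_t symptom = RXCSR_DMAEN | RXCSR_FIFOFULL;
    bool progressed = (irq_count != m->last_irq_count);

    m->last_irq_count = irq_count;
    if (!progressed && (rxcsr & symptom) == symptom)
        m->stuck_samples++;
    else
        m->stuck_samples = 0;

    return m->stuck_samples >= m->threshold;
}
```

The monitor only detects the condition; it does not touch the hardware, so it can be left enabled during the stress test to timestamp the moment the hang begins.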

Detailed analysis suggests that the DMA hangs in the middle of a transaction, because we see the following (example sequence):

(good case)

-->SW prepares DMA and enables DMA in rxcsr register

-->PC sends 512 bytes

-->PC sends 19 bytes

-->SW gets interrupt - DMA completed correctly

(fail case)

-->SW prepares DMA and enables DMA in rxcsr register

-->PC sends 512 bytes

-->PC sends 19 bytes

-->no interrupt, but we see that the 512 bytes are already transferred via DMA! But the short packet of 19 bytes is not transferred.

We have verified again and again that the sequence for preparing the EP for the DMA is correct. Any suggestions or debug hints for finding the root cause of this issue are welcome.

Thanks in advance,

Stefan

over 14 years ago

0 Ravi B over 14 years ago

TI__Expert 7885 points

Stefan

As per your explanation, the issue you observe is that the RX DMA does not generate the interrupt in the above scenario. Which RX DMA mode has been configured: TRANSPARENT or GENERIC RNDIS? If it is GENERIC RNDIS mode, what is the request packet size, and what value is configured in the EP-SIZE register of that particular endpoint?

Regards

Ravi B

0 Stefan Schulze over 14 years ago in reply to Ravi B

Prodigy 70 points

Ravi,

thanks a lot for your feedback.

I see the issue in both TRANSPARENT and GENERIC RNDIS mode. Our goal is to use the GENERIC RNDIS mode to minimize the CPU load, so I'm focusing on solving the issue there. I suppose you want to know the configuration of the "Generic RNDIS EPx Size Register" (GENRNDISSZx). We initialize these registers with 0x800 and do not re-program them, because according to one of the latest errata sheets re-programming may lead to a problem. The max packet size register (RXMAXP) is set to 512 bytes.

The USB MaxPacketSize is 512 bytes (high-speed USB). Until now, I have only seen this issue after transferring a sequence of 1 packet with 512 bytes + 1 packet < 512 bytes (short packet). It happens "rarely" (after 15 to 30 minutes of stress testing).
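To make these numbers concrete: as far as I understand the controller documentation, a Generic RNDIS receive transfer is closed either by a short packet (shorter than RXMAXP) or when the accumulated byte count reaches the GENRNDISSZx value. The sketch below is a host-side model of that rule only, not driver code, and the function name is hypothetical. With RXMAXP = 512 and GENRNDISSZx = 0x800, the 512 + 19 byte sequence should be closed by the second packet.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the index of the USB packet that closes a
 * Generic RNDIS transfer, or -1 if the transfer is still open after all
 * packets in the sequence. Assumed rule: the transfer ends on a short
 * packet or when the byte count reaches the size register value. */
int rndis_closing_packet(const uint32_t *pkt_len, size_t count,
                         uint32_t max_pkt, uint32_t rndis_size)
{
    uint32_t total = 0;

    for (size_t i = 0; i < count; i++) {
        total += pkt_len[i];
        if (pkt_len[i] < max_pkt || total >= rndis_size)
            return (int)i;
    }
    return -1;
}
```

For the sequence {512, 19} the helper returns 1, so in the fail case the completion interrupt is expected after the 19-byte packet, which is exactly the packet that never gets transferred.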

Best Regards,

Stefan

0 Ravi B over 14 years ago in reply to Stefan Schulze

TI__Expert 7885 points

Stefan

As you said, this issue occurs rarely, during the stress test. I suspect this could be due to a race condition in the RX DMA. Can you try this patch?

From 49e1dff979774339ef30544e5cdaa47e5dd5f524 Mon Sep 17 00:00:00 2001
From: Ravi B <ravibabu@ti.com>
Date: Thu, 19 May 2011 11:21:19 +0530
Subject: [PATCH] musb: cppi41: Fix for dma race condition during i/o completion

This patch fixes the cppi41 dma race condition, where software
reads buffer descriptor before being updated by dma as rx/tx
buffer descriptor(BD) writes by dma still pending in
interconnect bridge when traffic on interconnect is high.

Signed-off-by: Ravi B <ravibabu@ti.com>
---
drivers/usb/musb/cppi41_dma.c |   29 +++++++++++++++++++++++++++--
1 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/musb/cppi41_dma.c b/drivers/usb/musb/cppi41_dma.c
index d709c6f..e320a39 100644
--- a/drivers/usb/musb/cppi41_dma.c
+++ b/drivers/usb/musb/cppi41_dma.c
@@ -796,9 +796,16 @@ static unsigned cppi41_next_rx_segment(struct cppi41_channel *rx_ch)
                (length > max_rx_transfer_size) ? max_rx_transfer_size : length;

                hw_desc = &curr_pd->hw_desc;
+               hw_desc->desc_info = (CPPI41_DESC_TYPE_HOST <<
+                                     CPPI41_DESC_TYPE_SHIFT);
                hw_desc->orig_buf_ptr = rx_ch->start_addr + rx_ch->curr_offset;
                hw_desc->orig_buf_len = pkt_len;

+               /* buf_len field of buffer descriptor updated by dma
+                * after reception of data is completed
+                */
+               hw_desc->buf_len = 0;
+
                curr_pd->ch_num = rx_ch->ch_num;
                curr_pd->ep_num = rx_ch->end_pt->epnum;

@@ -1357,7 +1364,7 @@ static void usb_process_rx_queue(struct cppi41 *cppi, unsigned index)
                struct usb_pkt_desc *curr_pd;
                struct cppi41_channel *rx_ch;
                u8 ch_num, ep_num;
-               u32 length, orig_buf_len;
+               u32 length, orig_buf_len, timeout = 50;

                curr_pd = usb_get_pd_ptr(cppi, pd_addr);
                if (curr_pd == NULL) {
@@ -1365,10 +1372,28 @@ static void usb_process_rx_queue(struct cppi41 *cppi, unsigned index)
                        continue;
                }

+               /* This delay is required to overcome the dma race condition
+                * where software reads buffer descriptor before being updated
+                * by dma as buffer descriptor's writes by dma still pending in
+                * interconnect bridge.
+                */
+               while (timeout--) {
+                       length = curr_pd->hw_desc.desc_info &
+                                       CPPI41_PKT_LEN_MASK;
+                       if (length != 0)
+                               break;
+                       udelay(1);
+               }
+
+               if (length == 0)
+                       ERR("!Race condtion: rxBD read before updated by dma");
+
                /* Extract the data from received packet descriptor */
                ch_num = curr_pd->ch_num;
                ep_num = curr_pd->ep_num;
-               length = curr_pd->hw_desc.buf_len;
+
+               DBG(4, "Rx complete: dma channel(%d) ep%d len %d timeout %d\n",
+                       ch_num, ep_num, length, (50-timeout));

                rx_ch = &cppi->rx_cppi_ch[ch_num];
                rx_ch->channel.actual_len += length;

0 Stefan Schulze over 14 years ago in reply to Ravi B

Prodigy 70 points

Ravi,

because we do not use the Linux reference, I cannot apply the original patch directly. I have to port it "by hand".

Just to ensure I have understood the intention of the patch:

We might get an interrupt for a descriptor that has arrived in the completion queue, while the descriptor may still be updated after that?

So the work-around is:

1. Always set the buf_len (HPD Word3?) of the BD to zero. (Why?)

2. In the interrupt handler, or in the task that handles the RX completion, add a loop that reads the packet length (HPD Word0?) again and again until it is non-zero.

A non-zero value indicates that the BD is definitely completed?
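For a non-Linux stack, the two steps could be sketched roughly as follows. This is not the patch itself but a hand port under assumptions: the structure models only the two descriptor words discussed here, the type code, shift and 22-bit length mask are assumed to match the CPPI 4.1 host packet descriptor layout, and all names are hypothetical. Note that in the patch, writing only the descriptor type into desc_info also leaves the packet length field at zero, which is what makes a later non-zero length meaningful.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values, modelled on the CPPI 4.1 host packet descriptor. */
#define DESC_TYPE_HOST   16u
#define DESC_TYPE_SHIFT  27
#define PKT_LEN_MASK     0x003FFFFFu

/* Only the two words discussed in the thread are modelled. */
struct rx_desc {
    volatile uint32_t desc_info; /* HPD word 0: type + packet length   */
    volatile uint32_t buf_len;   /* HPD word 3: written back by the DMA */
};

/* Step 1: arm the descriptor so that both length fields start at zero. */
void rx_desc_arm(struct rx_desc *d)
{
    d->desc_info = DESC_TYPE_HOST << DESC_TYPE_SHIFT;
    d->buf_len = 0;
}

/* Step 2: poll word 0 until the DMA write-back is visible or the poll
 * budget is used up. delay_1us may be NULL (no delay between polls). */
bool rx_desc_wait_length(const struct rx_desc *d, unsigned max_polls,
                         void (*delay_1us)(void), uint32_t *length_out)
{
    for (unsigned i = 0; i < max_polls; i++) {
        uint32_t len = d->desc_info & PKT_LEN_MASK;
        if (len != 0) {
            *length_out = len;
            return true;
        }
        if (delay_1us != NULL)
            delay_1us();
    }
    *length_out = 0;
    return false; /* descriptor still looks untouched: report, do not guess */
}
```

The poll budget plays the role of the `timeout = 50` in the patch; a `false` return corresponds to the case the patch logs as a race condition.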

Best Regards,

Stefan

0 Ravi B over 14 years ago in reply to Stefan Schulze

TI__Expert 7885 points

Stefan

Your understanding is correct. Let me know if it works.

Regards

Ravi B

0 Stefan Schulze over 14 years ago in reply to Ravi B

Prodigy 70 points

Ravi,

it took some time, because the customer had to provide me with the whole setup to reproduce the issue.

Unfortunately, it does not solve the problem. :-(

The customer made another interesting finding:

When the RX hangs, the PERI_RXCSR register has the value 0x2003, which means that DMAEN, RXPKTRDY and FIFOFULL are set (but we do not get any further interrupts).
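For reference, the decoding of 0x2003 can be written down as a small check. This is just a sketch: the macro and function names are hypothetical, and the bit positions (RXPKTRDY = bit 0, FIFOFULL = bit 1, DMAEN = bit 13) are the usual MUSB peripheral-mode ones, which match the value reported above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed PERI_RXCSR bit positions (peripheral mode). */
#define RXCSR_RXPKTRDY 0x0001u
#define RXCSR_FIFOFULL 0x0002u
#define RXCSR_DMAEN    0x2000u

/* True if all three bits seen in the hang (0x2003) are set. */
bool rxcsr_is_hang_signature(uint16_t rxcsr)
{
    const uint16_t sig = RXCSR_DMAEN | RXCSR_FIFOFULL | RXCSR_RXPKTRDY;
    return (rxcsr & sig) == sig;
}

/* Returns the register value with RXPKTRDY cleared, for example
 * 0x2003 becomes 0x2002. It only computes the value; it does not
 * write any hardware register. */
uint16_t rxcsr_without_rxpktrdy(uint16_t rxcsr)
{
    return (uint16_t)(rxcsr & (uint16_t)~RXCSR_RXPKTRDY);
}
```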

Stopping the system with the debugger, writing 0x2002 to this register, and then continuing solves the problem. Interrupts are generated again and the system keeps running until the same issue occurs again after some time.

Do you know whether it can lead to problems if the preparation of RX/TX transfers is done from different task contexts?

Best Regards,

Stefan
