TUSB9261: Delay for TUSB9261 to send CSW after data transfer is finished. More prevalent when connected to laptops than to desktops.

Part Number: TUSB9261

Dear TI support,

I am currently running a performance test on a portable storage device that uses the TUSB9261. I have the USB communications running through a USB 3.0 protocol analyzer.

I am seeing great variability in the transfer speeds. The variability in transfer speeds appears to be due to a delay between the final data being transferred and the TUSB9261 sending the CSW.  What determines when the CSW is sent by the TUSB9261? Is it explicitly requested by the host PC? Is it sent automatically by the TUSB9261 when it is ready?

I realize that there are a great many variables that can contribute to the speed at which the USB storage drive runs, but the CSW delay appears to be the cause of the variability in performance.

Some more information which may help:

  • Device runs reliably on USB 3.0 with no evidence of CRC errors or link faults
  • The variability in performance is much more significant when using a laptop than when using a desktop PC (I have tested many different PCs)

Typical logs out of the TUSB9261 when connected to a desktop, and when connected to a laptop are attached.

I would appreciate any information you have on this.

Kind Regards,

Peter Bouvy

TUSB9261DebugLogs.zip

  • Hi Peter,

    What version of the FW are you using? CSWs are sent at the packet boundary, so if the laptop you are testing is transferring data to your SATA device in larger chunks, this could be one source of the difference in performance. I am currently reviewing your debug logs to see what may be the root cause of this performance difference.

  • Hi Malik,

    I'm a colleague of Peter's and have been working on the performance testing as well. We have used FW versions 1.05 and 1.06 with no apparent difference in performance behavior. I can also note that both read and write performance appear to be affected on the laptops compared with the desktops. We are mainly using a benchmarking tool called PassMark Performance Test, with which we see 10-20% performance variability in the sequential read and write tests on the laptops versus practically 0% variability on the desktops (which also reach higher peak performance). During the sequential tests with this tool, the transfer size is 32KB, which I wouldn't have thought was all that large. Hope this adds a little bit of info for you.

    Regards,

    Terry

  • Hi Peter and Terry,

    The variability in performance may be driven by the battery level of the laptop (not sure whether your testing accounts for this). From the logs there is not much difference in operation. I did notice that the laptop has some additional AHCI errors, but the TUSB9261 responds to them accordingly.

  • Hi Malik,

    Testing was done with the laptops on charge and at 100%, so that battery level would not be a factor. I don't believe the errors are a factor at all either; they are just unsupported ATA commands, with the laptop showing a couple more because its test run was longer.

    Regards,

    Terry

  • Hi Terry,

    I see, that makes sense. I would suspect that there may be slightly worse signal integrity when using the laptop, which may be causing packet errors during transactions. USB link performance can vary based on the USB host implementation. In either case, it seems that the TUSB9261 is responding correctly.

  • Hi Malik,

    I would expect performance to differ between host implementations, yes, but on the same host I would expect similar performance, at least when the same test is run repeatedly. Yet I can see as much as 10-20% (in some cases worse) variance on the same USB host with all else left equal. Also, the main time difference I can see in our USB analyser traces is between the final data packet transfer and the CSW being sent by the TUSB9261. This seems to be purely a delay, not caused by errors. What drives the CSW to be transferred, the host or the device (TUSB9261)?

    Regards,

    Terry

  • Hi Terry,

    The CSW is constructed in response to a CBW at the packet boundary. The only thing between the CBW and CSW from a TUSB9261 perspective would be the processing and execution of the SCSI command sent by the USB host. Could you share the USB traces? Have you looked at the SATA side as well?
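    For reference, a minimal Python sketch of the two Bulk-Only Transport wrappers discussed here, using the field layouts from the USB Mass Storage Class Bulk-Only Transport spec (Rev 1.0): a 31-byte CBW sent by the host and a 13-byte CSW returned by the device. The helper names `build_cbw` and `parse_csw` are illustrative only, not part of any TI or USB-IF API. The main point it shows is that the CSW echoes the CBW's tag, which is how a trace decoder can pair each 13-byte status packet with the command it completes.

```python
import struct

# Field layouts from the USB Mass Storage Class Bulk-Only Transport spec, Rev 1.0
# (sections 5.1 and 5.2). All multi-byte fields are little-endian.
CBW_FORMAT = "<IIIBBB16s"   # 31 bytes
CSW_FORMAT = "<IIIB"        # 13 bytes
CBW_SIGNATURE = 0x43425355  # bytes "USBC" on the wire
CSW_SIGNATURE = 0x53425355  # bytes "USBS" on the wire
CSW_STATUS = {0x00: "passed", 0x01: "failed", 0x02: "phase error"}


def build_cbw(tag: int, transfer_length: int, direction_in: bool, lun: int, cdb: bytes) -> bytes:
    """Pack a CBW; the device must echo `tag` in the matching CSW."""
    if not 1 <= len(cdb) <= 16:
        raise ValueError("CDB must be 1..16 bytes")
    flags = 0x80 if direction_in else 0x00  # bit 7 set = data-in (device to host)
    # struct's "16s" pads a shorter CDB with null bytes.
    return struct.pack(CBW_FORMAT, CBW_SIGNATURE, tag, transfer_length,
                       flags, lun & 0x0F, len(cdb), cdb)


def parse_csw(raw: bytes, expected_tag: int) -> dict:
    """Unpack a 13-byte CSW and check it belongs to the CBW with `expected_tag`."""
    if len(raw) != struct.calcsize(CSW_FORMAT):
        raise ValueError("a CSW is exactly 13 bytes")
    signature, tag, residue, status = struct.unpack(CSW_FORMAT, raw)
    if signature != CSW_SIGNATURE:
        raise ValueError("bad CSW signature")
    if tag != expected_tag:
        raise ValueError("CSW tag does not match the CBW tag")
    return {"residue": residue, "status": CSW_STATUS.get(status, "reserved")}
```

    As a usage example, a 32KB sequential read would be a SCSI READ(10) CDB (opcode 0x28) for 64 blocks of 512 bytes wrapped in a CBW with `transfer_length=32768` and `direction_in=True`. The matching CSW should come back with the same tag, a residue of 0 and status "passed".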

  • Hi Malik,

    Yes, I have attached two USB trace snippets (exported spreadsheets from our analyser software). One snippet for a laptop and one snippet for a desktop showing the difference in timing before the CSW. Each snippet contains a few read commands from the start, middle and end of a much larger USB trace. In the desktop case, the time before the 13 byte CSW is less than 50us for all read commands. In the laptop case, the timing at the same point is much more variable and in some cases is more than 300us. This is what we don't currently have an explanation for.

    On the SATA side, yes we have analysed this too. In that respect, I can see the command is completed successfully as expected and the time delay is before the next command (i.e. the SATA side is just idle waiting), while the USB processing is being completed. So the performance difference seems purely on the USB side.

    Regards,

    Terry

    USBTraceSnippets.zip
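    To put pre-CSW gaps of under 50 us (desktop) and over 300 us (worst laptop cases) in perspective for 32KB transfers, a rough per-command throughput estimate helps. This is a sketch under loud assumptions: the 400 MB/s data-phase rate is a made-up illustrative figure, not a measured TUSB9261 number, and the model ignores the CBW and any other per-command overhead. It only shows how strongly a fixed idle gap weighs on a short transfer, not what the real throughput is.

```python
# Assumed data-phase rate, for illustration only (not a measured TUSB9261 figure).
ASSUMED_LINK_RATE_MB_S = 400.0
TRANSFER_BYTES = 32 * 1024  # 32KB sequential transfers, as in the PassMark test


def command_time_us(transfer_bytes: int, link_rate_mb_s: float, pre_csw_gap_us: float) -> float:
    """Data phase at the assumed rate plus the idle gap before the CSW (CBW time ignored)."""
    data_phase_us = transfer_bytes / link_rate_mb_s  # 1 MB/s (10^6 B/s) == 1 byte/us
    return data_phase_us + pre_csw_gap_us


def throughput_mb_s(transfer_bytes: int, link_rate_mb_s: float, gaps_us: list) -> float:
    """Aggregate throughput over a run of commands with the given pre-CSW gaps."""
    total_us = sum(command_time_us(transfer_bytes, link_rate_mb_s, g) for g in gaps_us)
    return transfer_bytes * len(gaps_us) / total_us


if __name__ == "__main__":
    for label, gaps in [("all 50 us", [50.0]),
                        ("all 300 us", [300.0]),
                        ("9 x 50 us + 1 x 300 us", [50.0] * 9 + [300.0])]:
        rate = throughput_mb_s(TRANSFER_BYTES, ASSUMED_LINK_RATE_MB_S, gaps)
        print(f"{label}: {rate:.1f} MB/s")
```

    The mixed case is a hypothetical mix chosen because the traces show only some laptop commands with the long gap; swapping in gaps measured from the real traces would make the estimate meaningful.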

  • Hi Malik,

    Just wanted to check you were able to open and make sense of the USB trace snippets?

    Regards,

    Terry

  • Hi Terry,

    Sorry for the delay. I do see an increased number of LUP/LDN pairs in the laptop case (no packets to be sent); however, nothing wrong is reported in the CSW, and there is no indication of the TUSB9261 encountering a phase error that would cause the USB link to be stalled or halted intermittently. In the log, the host never performs any recovery steps either. I can only assume that the CSW processing is causing the variation in performance when an unsupported command block is sent.

  • Hi Malik,

    More LUP/LDN pairs just means there is nothing happening on the USB link for a longer time doesn't it?

    I don't see any sign of anything else going on either, just a delay where nothing seems to be happening, for which there is no logical reason. The handful of unsupported commands seen previously in the TUSB9261 debug logs occur before and after the performance testing, not during it. Also, there are only a few unsupported commands compared to the thousands (or more) of read/write commands issued during the performance testing. Therefore, I believe these unsupported commands have nothing to do with the issue on the laptops. The performance test issues reads (or writes) in sequential order and of a consistent size, which is really the best-case scenario for a USB storage device, so this variable performance is completely unexpected and still unexplained.

    Regards,

    Terry

  • Hi Terry,

    LUP/LDN pairs indicate that there are no packets or other link commands to be transmitted: a logical idle on the bus has been detected for more than 10 us.

    If the unsupported commands are few and far between there is no reason for the USB link to be idle for so long sporadically. TUSB9261 seems to be functioning correctly. You mentioned no issue on the SATA side (from a protocol perspective), was this tested with only one SSD?

    To the best of my knowledge this kind of behavior has not been reported.
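    Taking the description of LUP/LDN as roughly one keep-alive per 10 us of logical idle, a run of them in a trace gives an approximate idle duration. A minimal sketch, assuming the analyser export for the window between the last data packet and the CSW can be reduced to a list of link-command names; that list format and the function names are hypothetical, and the 10 us period is treated as nominal. Where the analyser provides timestamps, subtracting them directly is more precise than counting.

```python
from typing import Sequence, Tuple

# Nominal keep-alive period, taken from the description of LUP/LDN (idle > 10 us).
KEEPALIVE_INTERVAL_US = 10.0


def count_keepalive_intervals(link_commands: Sequence[str]) -> int:
    """Count keep-alive intervals in a window of link-command names (hypothetical format).

    Each side of the link sends its own keep-alive (LUP from the device, LDN from
    the host), so the larger of the two counts is used; this still works if the
    trace only shows one side.
    """
    lup = sum(1 for name in link_commands if name == "LUP")
    ldn = sum(1 for name in link_commands if name == "LDN")
    return max(lup, ldn)


def estimate_idle_us(intervals: int) -> Tuple[float, float]:
    """Rough (low, high) idle-time bounds implied by N consecutive keep-alive intervals."""
    return (intervals * KEEPALIVE_INTERVAL_US,
            (intervals + 1) * KEEPALIVE_INTERVAL_US)


if __name__ == "__main__":
    window = ["LUP", "LDN"] * 30  # e.g. 30 LUP/LDN pairs before the CSW
    print(estimate_idle_us(count_keepalive_intervals(window)))
```

    With 30 pairs the sketch reports an idle of roughly 300 to 310 us, which is the order of the longer laptop gaps described earlier in the thread.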

  • Hi Malik,

    We agree that there appears to be no reason for the USB link to be idle for so long and so sporadically, but we don't know how to determine the cause. Yes, the SATA side looks as expected, except for the longer delay waiting for the next command, which corresponds to the idle USB link. We have tested on a few of our hardware prototypes, with different capacities of the same brand/model of SSD, and each shows virtually the same behavior.

    What else can we do to try and determine why this is happening?

    Regards,

    Terry

  • Hi Terry,

    Not sure if we discussed this in detail but we can look at the signal integrity of the system using eye diagrams to help rule out any issues from that perspective.  

  • Hi Malik,

    As the data always appears to be transferred successfully and we have not noted any CRC errors in our USB analyser, we do not believe signal integrity is an issue. There does not seem to be anything questionable in the USB traces apart from the variable pre-CSW time delay.

    What we are still not clear on is the sequence of events that drives the return of the CSW, as the only documentation we have found on this is ambiguous at best. Are you able to provide, or point us in the direction of, a specific description of how the CSW is defined and handled? For example, is the return of the CSW driven by the device or the host?

    Regards,

    Terry

  • Hi Terry,

    Are you familiar with the USB mass storage spec? This spec details how CBW and CSW should be handled and section 5.3 Data Transfer Conditions details how the CSW should be sent with respect to the CBW. TUSB9261 FW is designed to follow this flow (v1.0 version specifically). Spec can be found on USB-IF website at the link below.

     

  • Hi Malik,

    Yes, we already had v1.0 of the USB Mass Storage Class - Bulk-Only Transport document. As this document was so old (1999), we weren't sure if it was the latest and most relevant. But I've just double checked against what is listed on the USB-IF page you linked to and it's the same version they list still. So yes we've seen section 5.3 Data Transfer Conditions that has the Status Transport Flow described with flowchart. However, this description doesn't really indicate whether the CSW return is driven by the host or the device, as far as we're concerned. It seems quite an ambiguous description to us. Is there no other document that further describes this behaviour?

    The flowchart indicates: 'attempt to read CSW from bulk-in endpoint'. Does this mean it is the host that drives reading the CSW, or does it just mean the host should check whether the CSW is already available, in which case it is driven by the device? What we really want to know is whether the long, variable delay before the CSW is read (in our USB traces) is caused by the host or by the TUSB9261 device.

    Regards,

    Terry

  • Hi Terry,

    The CSW is sent by the TUSB9261 to the host at the packet boundary, in other words, after the CBW and bulk data have been processed. However, the CSW is sent in response to an IN transaction as scheduled by the USB host. It is not shown in the provided USB logs, but are there any NRDY or STALL handshakes during the test? The protocol analyzer software may mask or stack these packets so they are not visible.
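    The NRDY/STALL question can be checked mechanically once the analyser view is expanded. In SuperSpeed bulk IN, the host requests data with an ACK transaction packet; a device that is not ready answers NRDY and later sends ERDY, while a halted endpoint answers STALL. A minimal triage sketch, assuming the expanded export can be turned into (time, sender, packet-type) tuples; the `Event` type and the classification strings are hypothetical.

```python
from typing import List, NamedTuple


class Event(NamedTuple):
    time_us: float
    sender: str  # "host" or "device"
    kind: str    # e.g. "ACK", "DP", "NRDY", "ERDY", "STALL", "LUP", "LDN"


def classify_status_stage(window: List[Event]) -> str:
    """Classify the window between the last data DP and the CSW DP.

    - "stalled": the device returned STALL, so the host must run recovery.
    - "device flow control": the device answered NRDY, i.e. the CSW was not ready
      when the host asked for it, and the device had to send ERDY later.
    - "host scheduled": no device flow control; the CSW was fetched whenever the
      host chose to issue the IN request.
    """
    device_kinds = {e.kind for e in window if e.sender == "device"}
    if "STALL" in device_kinds:
        return "stalled"
    if "NRDY" in device_kinds:
        return "device flow control"
    return "host scheduled"
```

    A "host scheduled" result for the long-gap windows would support the reading that the idle time is on the host side; an NRDY inside the window would point back at the device.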

  • Hi Malik,

    So it is really the host that drives the reading of the CSW when it's ready?

    I have attached two very small traces, both from the laptop. Each trace shows just a portion of one read command: the final section (1KB) of data being read and the subsequent CSW. Unlike the previous traces, both of these IN transactions were expanded in the USB analyser before being exported, so they show more detail. One trace shows the typical ('normal') CSW delay and the second shows a much longer delay. In both cases NRDY is shown during the data read, but the data is transferred and completed before the LUP/LDN pairs appear, and the LUP/LDN pairs are all that is shown until the IN transaction for the CSW starts, which doesn't show any NRDY or STALL to me. The only significant difference between the shorter and longer delay cases is the number of LUP/LDN pairs. So to me, the link simply seems to be idle prior to the CSW IN transaction, and the variations in the idle length must be host-based. Would you agree? Note that the desktop traces look the same, just with less idle time.

    Regards,

    Terry

    USBTraces2.zip

  • Hi Terry,

    The delay of the CSW does appear related to the host.  The host sends the ACK transaction that the 9261 responds to with the CSW Data transaction.  The delay occurs before the sending of the ACK from the host, not between the ACK and the Data response from the device.

    Regards,

    JMMN
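    JMMN's observation can be turned into a per-command measurement: split each pre-CSW gap at the host's last ACK transaction packet before the CSW data packet. Time before that ACK is host scheduling; time after it is the device's response. A minimal sketch, assuming the analyser export can be reduced to hypothetical (time, sender, kind) tuples for each window; the `Event` type and `split_pre_csw_gap` are illustrative names, not analyser or TI APIs.

```python
from typing import Dict, List, NamedTuple


class Event(NamedTuple):
    time_us: float
    sender: str  # "host" or "device"
    kind: str    # e.g. "ACK", "DP", "LUP", "LDN"


def split_pre_csw_gap(window: List[Event]) -> Dict[str, float]:
    """Split one pre-CSW gap into host-side and device-side parts.

    `window` is time-ordered: the first event is the last data DP and the last
    event is the CSW DP. The host ACK TP closest to the CSW is taken as the
    request that fetched it (earlier ACKs acknowledge the data packets).
    """
    data_end, csw = window[0], window[-1]
    requests = [e for e in window[1:-1] if e.sender == "host" and e.kind == "ACK"]
    if not requests:
        raise ValueError("no host ACK TP between the last data packet and the CSW")
    request = requests[-1]
    return {
        "host_side_us": request.time_us - data_end.time_us,
        "device_side_us": csw.time_us - request.time_us,
    }
```

    Applied to every read command in a long laptop trace, a consistently small device-side value alongside a variable host-side value would quantify the conclusion stated above.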

  • Hi JMMN,

    Can you think of, or have you seen, any reason the host may delay like this at this point? And why only on laptop PCs, not desktop PCs (all current business models)?

    Essentially our product is a SATA storage device behind the TI USB to SATA chip. For development we have used a TI development board in front of our SATA product. Interestingly, when we've used other USB to SATA adapters instead, this delay seems far less pronounced or even virtually non-existent in some cases, which doesn't make much sense if this is host related only?

    Regards,

    Terry

  • Hi JMMN/Malik,

    Just wanted to check if there was anything in my previous message that you guys were able to comment on?

    Regards,

    Terry

    Hi Terry,

    I don't see anything obvious in the traces that explains why there would be a longer delay for the CSW in some cases vs. others. One thought is that in the delayed case I see attempted entries to the intermediate power state (U1). Could the laptop be prioritizing an attempt to enter a lower power state over requesting the CSW? Do you have the ability to disable U1/U2 on the host to see if it impacts anything? I know U1/U2 can be disabled on the TUSB9261, but that would not stop the host from attempting entry to those states; the TUSB9261 would just refuse the attempts.

    Regards,

    JMMN
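    Before changing host settings, the U1 hypothesis can be checked against existing traces by comparing gap lengths for windows that do and do not contain a U1 entry request. In USB 3.0, LGO_U1 is the link command requesting U1 entry, and the link partner answers LAU (accept) or LXU (reject). A minimal sketch, assuming each pre-CSW window has been reduced to its gap length and a list of link-command names; that input format is hypothetical, and a difference in means would only suggest, not prove, that U1 attempts cause the delay.

```python
from statistics import mean
from typing import List, Optional, Sequence, Tuple

Window = Tuple[float, Sequence[str]]  # (gap_us, link-command names seen in the gap)


def mean_gap_by_u1(windows: List[Window]) -> Tuple[Optional[float], Optional[float]]:
    """Return (mean gap with an LGO_U1 attempt, mean gap without one).

    Either value is None when no window falls into that group.
    """
    with_u1 = [gap for gap, cmds in windows if "LGO_U1" in cmds]
    without_u1 = [gap for gap, cmds in windows if "LGO_U1" not in cmds]
    return (mean(with_u1) if with_u1 else None,
            mean(without_u1) if without_u1 else None)
```

    If the traces show U1 attempts almost exclusively in the long-gap windows, that would strengthen the case for the host-side U1/U2 experiment suggested above.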

  • Hi Terry,

    I didn't mean to mark this as "TI thinks resolved", but I can't seem to uncheck it. We consider it open still.

    Thanks,

    JMMN