smsc95xx: download ok, upload hangs

Hi,

I am experiencing an Ethernet issue with a PandaBoard A3 (OMAP4430 rev 2.2) featuring the SMSC LAN9514-JZX usbnet chipset.
My Panda runs android-4.0.4 with Linux kernel 3.0.8 (android-omap-panda-3.0 branch on https://android.googlesource.com/kernel/omap.git).

Receiving Ethernet frames works fine, but transmitting them does not: the driver/chip seems stuck.
Moving the USB mouse (or pressing a key on the USB keyboard) unblocks it, and transmission resumes for a second or two. Then Ethernet transmission gets stuck again.

Recently, I cherry-picked a dozen usbnet and smsc95xx patches and managed to get the watchdog barking (see test 21 below).

Unfortunately I have no JTAG probe, so I am limited to driver tweaks and trial and error...
Here are the tests I have performed so far, along with a to-do list:

FAILED means that the issue came up.
PASSED means that the issue has not come up.
DONE, NOT DONE and ONGOING refer to to-do items rather than test results.

1. check with constant cpu load (stress -c 2) - FAILED
2. check if problem occurs on older releases (non ICS) - NOT DONE
3. try CONFIG_PL310_ERRATA_769419 patch in cpuidle - FAILED
4. check without USB hub connected - FAILED
5. check with usbcore.autosuspend=600 added to cmdline - FAILED
6. patch ehci-omap.c to verify clock frequency - NOT DONE
7. check with CPU1 offlined - FAILED
8. check ethtool on android - FAILED
Cannot get register dump: Operation not supported on transport endpoint
9. check without USB_EHCI_TT_NEWSCHED - FAILED

10. try to unbind and rebind smsc95xx - FAILED
11. disable turbo_mode and reset the chip - FAILED
12. test with "CONFIG_NO_HZ is not set" - FAILED
13. test with another external USB ethernet dongle - NOT DONE
14. test linaro-12.05 ICS release and see ethernet behavior - PASSED
Ethernet runs fine on these releases:
. 12.05 tracking - PASSED
. 11.10 tracking - PASSED
. 11.09 release - PASSED
15. try with "netcfg eth0 dhcp" - FAILED
16. check datasheet - ONGOING
the register description is missing from 9514.pdf; only the EEPROM is described
17. adapt driver to ethtool - DONE
18. dump registers and check against linaro 11.09 - ONGOING
19. ethtool returns lots of "0"s: the pattern I added to the array is entirely replaced by "0"...
Actually the EEPROM is blank. I found this out because each time I unbind/bind the device to the smsc95xx driver, I get a random MAC address...

20. test with 11.09 linaro kernel - NOT DONE
zygote does not start
21. upload a 24MB file to the web (http://dl.free.fr) - FAILED
This occurred only with the following patches added to my kernel:
 From 8a78335 [PATCH] usbnet: consider device busy at each recieved packet
 From 5d5440a [PATCH] usbnet: don't clear urb->dev in tx_complete
 From 4231d47 [PATCH] net/usbnet: avoid recursive locking in usbnet_stop()
 From 1aa9bc5 [PATCH] usbnet: use netif_tx_wake_queue instead of netif_start_queue
 From 7bdd402 [PATCH] net/usbnet: reserve headroom on rx skbs
 From 0956a8c [PATCH] usbnet: increase URB reference count before usb_unlink_urb
 From 9bbf566 [PATCH] net: usb: smsc95xx: fix mtu
 From 720f3d7 [PATCH] usbnet: fix leak of transfer buffer of dev->interrupt
 From a472384 [PATCH] usbnet: fix failure handling in usbnet_probe
 From 5b6e9bc [PATCH] usbnet: fix skb traversing races during unlink(v2)
 From 07d69d4 [PATCH] smsc95xx: mark link down on startup and let PHY interrupt
a timeout occurred:
http://pastebin.com/KpaTJY3N
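About item 19: as far as I can tell from the mainline smsc95xx driver of that era, smsc95xx_init_mac_address() tries the EEPROM first and, if that does not yield a valid address, falls back to random_ether_addr(), which would explain a new MAC on every bind. Below is a minimal user-space sketch of that fallback and of a check for the resulting address type; make_random_mac() and mac_is_local_unicast() are hypothetical helpers, not driver functions, and rand() only stands in for the kernel's get_random_bytes().

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define MAC_LEN 6

/* User-space mirror of the bit manipulation in the kernel's random_ether_addr():
 * random bytes, then clear the multicast (I/G) bit and set the locally
 * administered (U/L) bit. rand() stands in for get_random_bytes(). */
static void make_random_mac(uint8_t mac[MAC_LEN])
{
    for (int i = 0; i < MAC_LEN; i++)
        mac[i] = (uint8_t)(rand() & 0xff);
    mac[0] &= 0xfe; /* unicast */
    mac[0] |= 0x02; /* locally administered */
}

/* True for a unicast, locally administered address: the kind produced by the
 * fallback above, as opposed to a vendor-assigned address, which normally has
 * the U/L bit clear. */
static bool mac_is_local_unicast(const uint8_t mac[MAC_LEN])
{
    return (mac[0] & 0x03) == 0x02;
}
```

If every address seen after a rebind passes mac_is_local_unicast(), that is consistent with the driver never reading a valid address from the EEPROM.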

My current kernel is based on:
commit 52f476403350050beb0dff135a55c06c9e7a82a9
Author: Jean-Baptiste Queru <jbq@google.com>
Subject: Revert "gpu: pvr: Revert to 1.8@550175"

Thanks to ethtool, I managed to get a register and PHY dump while the upload is stuck:

000:     01 00 00 ec 00 00 00 00 00 00 00 00 00 00 00 00
010:     04 00 00 00 00 14 00 00 00 00 00 00 00 20 00 00
020:     81 00 00 00 00 00 11 01 1f 00 00 1f a0 30 f8 00
030:     00 00 00 00 00 00 00 00 00 00 00 00 03 00 00 00
040:     00 00 00 80 00 00 00 00 00 00 00 00 00 00 00 00
050:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
060:     00 00 00 00 00 00 00 00 00 80 00 00 00 20 00 00
070:     00 00 00 00 83 0f 83 0f 00 00 00 00 0f 06 0f 06
080:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
090:     00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00
0a0:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0b0:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0c0:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0d0:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0e0:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0f0:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
100:     0c 20 10 00 f7 6f 00 00 a2 7d 13 dd 00 00 00 40
110:     20 00 00 80 40 09 00 00 e1 c1 00 00 00 00 00 00
120:     00 81 00 00 ff ff 00 00 00 00 00 00 00 00 00 00
130:     00 31 00 00 2d 78 00 00 07 00 00 00 c3 c0 00 00
140:     e1 0d 00 00 e1 c1 00 00 0b 00 00 00 ff ff 00 00
150:     ff ff 00 00 ff ff 00 00 ff ff 00 00 ff ff 00 00
160:     ff ff 00 00 ff ff 00 00 ff ff 00 00 00 00 00 00
170:     40 00 00 00 02 00 00 00 e1 00 00 00 ff ff 00 00
180:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
190:     ff ff 00 00 ff ff 00 00 00 00 00 00 0a 00 00 00
1a0:     00 00 00 00 c8 00 00 00 50 00 00 00 58 10 00 00

But the 9514.pdf datasheet I have lacks the register descriptions, so decoding all this is quite troublesome.
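Decoding can at least be partly automated. The sketch below splits a dump in the layout shown above into bytes and rebuilds 32-bit words. It assumes (1) that dump offsets 0x000-0x12C equal the register addresses, and (2) that the adapted ethtool code copied each u32 register value into the buffer on the little-endian ARM CPU, so each word appears LSB first. The register names and bits mentioned here come from the mainline drivers/net/usb/smsc95xx.h (ID_REV at 0x00, MAC_CR at 0x100, MAC_CR_TXEN_ = 0x8, MAC_CR_RXEN_ = 0x4) and should be checked against the header in the tree actually in use; parse_regdump() and le32_at() are hypothetical helpers.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Parse lines such as "100:     0c 20 10 00 ..." into buf[0x100..].
 * Returns the highest filled offset + 1, or -1 on overflow/bad input. */
static int parse_regdump(const char *text, uint8_t *buf, size_t buflen)
{
    size_t used = 0;

    while (*text) {
        const char *eol = strchr(text, '\n');
        size_t len = eol ? (size_t)(eol - text) : strlen(text);
        char line[128];

        if (len >= sizeof line)
            return -1;
        memcpy(line, text, len);
        line[len] = '\0';

        unsigned off;
        int pos = -1;
        /* pos stays -1 unless the offset is followed by ':' */
        if (sscanf(line, "%x:%n", &off, &pos) == 1 && pos >= 0) {
            const char *p = line + pos;
            unsigned byte;
            int step;
            while (sscanf(p, "%x%n", &byte, &step) == 1) {
                if (off >= buflen || byte > 0xff)
                    return -1;
                buf[off++] = (uint8_t)byte;
                p += step;
            }
            if (off > used)
                used = off;
        }
        text = eol ? eol + 1 : text + len;
    }
    return (int)used;
}

/* Rebuild a 32-bit word stored least significant byte first. */
static uint32_t le32_at(const uint8_t *buf, size_t off)
{
    return (uint32_t)buf[off] | (uint32_t)buf[off + 1] << 8 |
           (uint32_t)buf[off + 2] << 16 | (uint32_t)buf[off + 3] << 24;
}
```

With the dump above, the word at 0x000 reads 0xEC000001 (upper 16 bits, the ID_REV chip ID field, 0xEC00) and the word at 0x100 reads 0x0010200C, which under these assumptions means both TXEN and RXEN are set in MAC_CR while the upload is stuck.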

I saw that an Ubuntu release had trouble with this chipset and ACPI, but as far as I know there is no ACPI on Android.
Has anyone else experienced this issue?
Does anyone have an idea where it could come from?


Thanks a lot for your kind support,
Emeric

  • These are some very quick observations based on the Panda user guide and the described behavior:

    1. The issue may involve a timeout: activity on the USB hub seems to enable the chip for 1 or 2 seconds, which could correspond to a timeout in the chip.

    2. The SMSC chip is enabled through the TPS73633DBVT and TXB0104ZXUR, which are in turn enabled from the OMAP pin KPD_COL2/KPD_COL5/CAM2_D10/GPIO_1/SAFE_MODE (used as GPIO_1), as explained in the Panda user guide. That leaves two possibilities:
    a) for TX the chip is not being activated at all, and activity on the USB hub enables it for 1 or 2 seconds;
    b) another device, such as the keypad, is taking the shared resources and not enabling the SMSC chip, or is conflicting with it.

    According to the schematic (pages 9-11), the TPS73633DBVT delivers the enable signal on pin #4 (HUB_LDO_NR), and on the LAN9514 you can check directly whether it is enabled on pins 19, 27, 33, 39 and 46.

    1) Can you check whether pins 19, 27, 33, 39 and 46 of the LAN9514 carry a voltage (i.e. are enabled) during TX?
    2) Are you using GPIO_1 for the keypad or another device?

  • Hi Manuel,

    1. What do you mean by "using the USB HUB getting 2 seconds enabled could be a timeout for the chip"? If the chipset times out, how can I verify that in practice?
    2. The user guide does not describe the TXB0104ZXUR chip. Is it an LDO as well? Which chips does it supply? And what does HUB_LDO_EN trigger on the TXB0104ZXUR?
    1) I have very limited hardware tools here; my company is mainly software :-(. But I do have a scope: do you know whether these pins can be probed on the expansion connectors (J3, J6) instead of directly on the (too tiny) SMSC LAN9514?
    2) I don't know. I have to check my kernel.
  • Hi Emeric,

    One correction: it is not the HUB_LDO_NR line but VDD_HUB_FLT, so the paragraph should read:

    According to the schematic (pages 9-11), the TPS73633DBVT delivers the enable signal on pin #4 (VDD_HUB_FLT), and on the LAN9514 you can check directly whether it is enabled on pins 19, 27, 33, 39 and 46.

    VDD_HUB_FLT is not routed to a header, so it cannot be probed easily.

    Page 11 of the schematic shows the LAN9514 and related circuitry (http://pandaboard.org/content/resources/references / http://pandaboard.org/sites/default/files/board_reference/pandaboard-a/panda-a-schematic.pdf).

    Another reference point could be HUB_NPD, which is connected to GPIO_1 on page 6, but it is not routed to a header either.

    Answers:

    1. I need to check the driver to see whether this can be verified in software just by polling a register. From what you describe, what RX and USB hub activity have in common is that the chip gets selected, and TX works until the timeout is reached.

    2. Page 11 of the schematic shows:

    TXS0104EZXUR - http://www.ti.com/product/txs0104e - 4-Bit Bidirectional Voltage-Level Translator for Open-Drain Applications

    TPS73633DBVT - http://www.ti.com/product/tps73633 - Single Output LDO, 400-mA, Fixed (3.3 V), Cap-Free, Low-Noise, Reverse Current Protection

    The user guide refers to the LAN9514 and OMAP4430 GPIO_1, so I went to the schematic and checked what sits in between; those chips are the TPS73633DBVT and TXS0104EZXUR:

    2.7.2 USB/Ethernet Power Circuitry
    There is a fixed 3.3V LDO (U11) that provides power for the LAN9514 Ethernet/USB Hub device. This
    device is a Texas Instruments TPS73633DBVR device which can provide up to 400mA of output current.
    This device may be controlled via S/W by writing OMAP4430 GPIO_1. Writing this GPIO high will
    enable this LDO, while writing it low will disable it (see Table 9 on page 41). This device is shown on
    sheet 11 of the schematic.

    1) You could probe the capacitors surrounding the LAN9514 and check whether the signal is present.
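    Per the user guide excerpt above, the hub LDO follows OMAP4430 GPIO_1. Besides probing the board, the level the kernel drives on that line can be read from debugfs, provided some driver has requested the GPIO and debugfs is mounted (mount -t debugfs none /sys/kernel/debug). The minimal parser below assumes the generic gpiolib debugfs line format (" gpio-N (label) out hi"), which is worth confirming on the actual 3.0 kernel; gpio_level_from_debugfs() is a hypothetical helper.

```c
#include <stdio.h>
#include <string.h>

/* Look for " gpio-N (label) dir level" in the text of /sys/kernel/debug/gpio.
 * Returns 1 for "hi", 0 for "lo", -1 if the GPIO is not listed (i.e. not
 * requested by any driver) or its level is unknown. */
static int gpio_level_from_debugfs(const char *text, int gpio)
{
    while (text && *text) {
        const char *eol = strchr(text, '\n');
        size_t len = eol ? (size_t)(eol - text) : strlen(text);
        char line[160];
        int n;

        if (len < sizeof line) {
            memcpy(line, text, len);
            line[len] = '\0';
            if (sscanf(line, " gpio-%d", &n) == 1 && n == gpio) {
                /* Skip the parenthesised label so it cannot fake a match. */
                const char *paren = strrchr(line, ')');
                const char *tail = paren ? paren + 1 : line;
                if (strstr(tail, " hi"))
                    return 1;
                if (strstr(tail, " lo"))
                    return 0;
                return -1;
            }
        }
        text = eol ? eol + 1 : NULL;
    }
    return -1;
}
```

    Comparing the value for GPIO 1 while the upload is stuck and while it runs would show whether the enable line itself changes. Note that this reports the level seen by the GPIO controller, not the LDO output voltage.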







  • Hi Emeric,

    To clarify my last comment: you can use the schematic to find a capacitor pin where this value can be checked since, as you mentioned, the chip pins are very small and there is no header pin for it.

    You can use an oscilloscope to check whether this pin carries a voltage.

    But do you mean that it works only on Linaro releases and fails when you move to Google's release, and that you need it working on Google's release? If I may ask, what is the final purpose?

  • Hi Manuel,

    I have the schematics and user guide for the Panda A revision, and I read the same information as you. OK, I will check the schematics to identify a capacitor on the net you mentioned, something big enough for my (too big) scope probe.

    Yes, everything is fine on the Linaro ICS release, but I need it working on Google's release, not Linaro's. I am running a "git bisect" on my kernel to find the faulty commit, but the bug might be caused by another Android package rather than the kernel...

    Thanks,

  • OK, I checked VDD_HUB_FLT on C67, a large capacitor located between the 16A4NZM TI chip and the Ethernet/dual-USB connector. I read 5V while browsing the web, and 5V while the issue is present, i.e. while the upload (TX) is stuck. However, on my PandaBoard rev A3, C67 is actually labelled C76, C66 is labelled C75, and so on: nearly all the capacitors are numbered differently on my board. I am reading panda-a-schematics.pdf and panda-a-allegro.pdf. Are these differing labels expected, or am I reading the wrong schematic and Allegro files?

    I checked what I assume is pin 33 of the LAN9514. I say "assume" because the chip has two orientation marks, a big one and a small one, and I took the big one as the reference. According to 9514.pdf, pin 33 is then diagonally opposite this mark. In practice, I probed the pin closest to the C67 (C76 on my board) capacitor. The signal looks like this:


    When TX is stalled, the IO activity lasts 40us every 1ms, as depicted. When I move my USB mouse, TX resumes and the activity lasts roughly 80us every 1ms.

    I made a mistake in my picture: it should read 3V, not 5V.

  • I am not sure, but I was expecting a flat line or something similar for TX.

    Have you tried TI's 4AI1.4P1 - 16. full_panda-eng? Does it work?

  • I just tried with "Linux version 3.0.8-g52f4764 (jbq@jqueru.mtv.corp.google.com) (gcc version 4.6.x-google 20120106 (prerelease) (GCC) ) #1 SMP PREEMPT Sun Apr 29 09:27:12 PDT 2012", and I faced the same issue.

    Do you have a link for the one you mentioned?

  • Hi Manuel,

    I tried this kernel this morning. The panda_defconfig did not compile out of the box:

    arch/arm/mach-omap2/temp_sensor_device.c:38: undefined reference to `omap_temp_sensor_idle'

    I had to disable CONFIG_OMAP_TEMP_SENSOR and CONFIG_OMAP4_DIE_TEMP_SENSOR; it seems they are not supported on OMAP4430 anyway... But the kernel panics shortly after init with my Android images, and I am not really keen on investigating this panic.

  • Hi,

    I did more probing yesterday. VDD33IO looks the same as I depicted previously. I also checked VDD33A: it is a flat 3.3V and never goes low.
    According to my pandaboard-a-schematic.pdf, both signals should have the same shape since VDD33IO and VDD33A are (supposed to be) connected to VDD_HUB_FLT.
    But I see different signals...

    1. Am I looking at the wrong schematics (I have a panda rev A3)?
    2. Aren't these two signals connected to VDD_HUB_FLT?
    Thanks,
  • I am trying to get it running in order to reproduce the issue, but it could take some time: I usually have access only to a Blaze board, not a Panda board. Panda board issues are normally redirected to http://pandaboard.org/, but that would not help in this case because you are not using a Linaro release for your final product.

    I think we can stop looking at the power signal: if it is high, the chip is powered and that part is fine. My idea was that the driver failed to enable the device for TX, and that TX worked whenever the chip got enabled for USB access. I took the easy answer from the hardware side and assumed, from the issue description, that only the enable signal was missing.

    But since it works on the Linaro baselines, something must be missing in the other releases.

    After checking the schematics: these are the right signals to measure, and since VDD33IO and VDD33A are connected to the same point, they should carry the same voltage. To rule out a hardware issue you could try a different Panda board with your current non-Linaro release; but you mentioned that this board works with Linaro releases, which means it is not a hardware issue, and the different signals on these pins may not be significant.

    I don't have any experience with the Panda board, so I don't have a direct answer to solve the issue.

    I have just borrowed a Panda board for a couple of days, and today this link was published:

    http://omappedia.org/wiki/4AI.1.4_OMAP4_Icecream_Sandwich_Panda_Notes

    in the post http://e2e.ti.com/support/omap/f/849/p/195046/699499.aspx#699499

    I am going to give it a try and check the actual issue.

  • Emeric;

    I have to tell you that I cannot test this issue on the Panda: compiling the system took most of my time, leaving little for testing, and I had to return the board to its owner at his request.

  • Hello,

    is there any outcome from that research? The issue is still there. The netperf TCP_STREAM test dies or yields just 0.02 Mbit/s, while TCP_MAERTS works fine (96 Mbit/s). Indeed, if the USB mouse is moved while netperf is running, transmit throughput increases up to 5 Mbit/s, so it all looks like the same issue (my Panda is an ES Rev B1). Assuming the Linaro kernel/bootcode works fine, it is most likely a software problem. Was it resolved, or was a decision made to move away from the AOSP kernel? Thanks

  • Hi Konstantin,

    This issue is still there on my side too, with the Google kernel (android-omap-panda-3.0 branch). Ming Lei from Canonical had me run some more tests, and it appears that the EHCI host driver is affected as well:

    https://bugs.launchpad.net/linux-linaro/+bug/709245

    I am in contact with some TI people, but there has been no progress so far. I will post updates here as soon as I have interesting news.

    Emeric

  • I cannot get a Panda board to debug the issue, and I ran into some compilation problems, so I have no updates to provide on this issue.

  • Emeric,

    it seems the schematic has a misprint. According to the SMSC document, VDD33IO and VDD33A should be separated by an inductor. In the schematic this maps to the signals HUB_3V3 (which should go to VDD33IO) and VDD_HUB_FLT (which should go to VDD33A), separated by L11. The oscillation you observe on VDD33IO is very strange: I would expect VDD33IO (roughly equal to V3.3) to stay high, while VDD33A (inside the LC chain) may oscillate. I have no real idea why these two are separated by an inductor, but if you confirm the behavior is exactly as you described, it all looks like a hardware problem (the signals are swapped, and the wiring of VDD33IO and VDD33A should be the other way round). That is just a hypothesis; I still have no explanation of why the Linaro kernel works on the same hardware.

  • Hi,

    We are facing the same issue as Emeric on our PandaBoards (all 20 of them). Ethernet TX does not work at all; RX works fine.

    If the CPU is kept busy, e.g. by moving the mouse, Ethernet TX works for that period.

    This does not seem to be a hardware problem: I have tried Linaro kernels and Ubuntu on the same boards, and Ethernet works fine on them.

    We are using the AOSP 3.0.8 kernel. I suspect a power-management problem in the USB host driver.
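    Test 5 in the first post already tried usbcore.autosuspend=600. A more targeted way to probe the power-management hypothesis (not something reported in this thread) is to pin individual USB devices to full power through the standard sysfs attribute power/control, writing "on" instead of "auto". The minimal sketch below assumes USB runtime PM (CONFIG_USB_SUSPEND in kernels of this era) is enabled so that the attribute exists; it only affects USB device autosuspend, not the power management of the OMAP EHCI host controller itself. sysfs_write() and usb_force_power_on() are hypothetical helpers.

```c
#include <stdio.h>

/* Write a short string to a sysfs attribute; returns 0 on success. */
static int sysfs_write(const char *path, const char *val)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fputs(val, f) >= 0;
    /* stdio buffers the write, so sysfs errors show up when flushing */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Disable runtime autosuspend for one USB device by writing "on" to
 * <devdir>/power/control (writing "auto" re-enables it). */
static int usb_force_power_on(const char *devdir)
{
    char path[256];
    int n = snprintf(path, sizeof path, "%s/power/control", devdir);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;
    return sysfs_write(path, "on");
}
```

    For example, usb_force_power_on("/sys/bus/usb/devices/1-1") for the LAN9514 hub, and the same for its Ethernet device; these device names are only examples, so list /sys/bus/usb/devices on the board to find the right ones.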

    Has there been any solution to the problem in this thread?

    Thanks,

    Sagar

  • Hi Konstantin,

    It is not a hardware problem. I verified that Ethernet runs fine on some recent Linaro android-4.0.4 builds, so the problem is in software, somewhere in the Linux kernel from Google's official repo. A colleague of mine recently copied the drivers/usb directory from the "working" Linaro kernel into the "faulty" Google kernel, but with no luck. He noticed that Ethernet latency gets worse as the ping packet size increases:

    # ping -s 10 192.168.1.1

    1.2ms average

    # ping -s 96 192.168.1.1

    1.2ms average

    # ping -s 97 192.168.1.1

    40.0ms average

    # ping -s 1024 192.168.1.1

    stuck...
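    The frame sizes involved in these pings can be worked out, which may help correlate the 96/97-byte threshold with something in the TX path. The sketch below assumes IPv4 without options, no VLAN tag, and the 8-byte TX command header (TX_CMD_A and TX_CMD_B) that smsc95xx prepends when TX checksum offload is not used (12 bytes when it is); the constant and function names are hypothetical.

```c
#include <stddef.h>

/* Header sizes under the stated assumptions. */
#define ICMP_HDR_LEN     8  /* ICMP echo header */
#define IPV4_HDR_LEN    20  /* IPv4 header without options */
#define ETH_HDR_LEN     14  /* Ethernet II header, no VLAN tag, FCS not counted */
#define SMSC_TX_CMD_LEN  8  /* TX_CMD_A + TX_CMD_B, assuming no TX checksum offload */

/* Ethernet frame length (without FCS) for `ping -s payload`. */
static size_t eth_frame_len(size_t payload)
{
    return payload + ICMP_HDR_LEN + IPV4_HDR_LEN + ETH_HDR_LEN;
}

/* Bytes handed to the bulk-out URB once smsc95xx has prepended its TX command words. */
static size_t usb_tx_len(size_t payload)
{
    return eth_frame_len(payload) + SMSC_TX_CMD_LEN;
}
```

    Under these assumptions, -s 96 gives a 138-byte Ethernet frame (146 bytes on the bulk-out pipe) and -s 97 gives 139 (147). Both are far below the 512-byte maximum packet size of a high-speed bulk endpoint, so USB packet splitting alone does not obviously explain the 96/97 threshold.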

    USB host storage (file transfer from the PandaBoard to a USB stick) is impacted as well: https://bugs.launchpad.net/linux-linaro/+bug/709245/comments/59

    Regards,

    Emeric

  • Hi Emeric,

    I agree that it is likely a software bug, and it is more probable that you did observe oscillation on the "analogue" line (VDD33A). Frankly, I didn't investigate the issue after that and switched to a workaround (an external USB-to-Ethernet device). It may very well be a power-management issue; I did compare drivers/usb between the AOSP and Linaro kernels and didn't find anything interesting (a direct comparison of arch/arm/mach-omap2 was problematic, and I decided not to spend too much time on it since the workaround was quite acceptable).

  • OK, then I have to check the VDD33A line with the "working" Linaro build.

  • Emeric;

    I asked an expert on the matter about this issue, and he pointed me to the next thing to check:

    "The resume sequence that the customer is using could not be complete, or not in the sequence required for full wake up. On resume, the power to the PHY would need to be restored, the PHY clock re-enabled, and the power to the USB hub enabled before the LAN9514 could resume operations."
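    The ordering in that advice can be written down as a checklist that refuses to run a step before the earlier ones have completed. This is only a hypothetical sketch of the ordering constraint with made-up step names; it does not call any real OMAP or smsc95xx API, and where these steps actually happen in the 3.0 OMAP USB host code is not shown here.

```c
#include <stdbool.h>

/* Hypothetical step names, in the order given by the quoted advice. */
enum resume_step {
    STEP_PHY_POWER,      /* restore power to the PHY */
    STEP_PHY_CLOCK,      /* re-enable the PHY clock */
    STEP_HUB_POWER,      /* enable power to the USB hub (LAN9514) */
    STEP_LAN9514_RESUME, /* only now let the LAN9514 resume operations */
    STEP_COUNT
};

struct resume_state {
    bool done[STEP_COUNT];
};

/* Record a step as done, but only if every earlier step is already done.
 * Returns false, leaving the state untouched, when called out of order. */
static bool resume_do_step(struct resume_state *s, enum resume_step step)
{
    for (int i = 0; i < (int)step; i++)
        if (!s->done[i])
            return false;
    s->done[step] = true;
    return true;
}
```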

    I hope it helps,

  • Hey, can you specify at what base address you are configuring the Ethernet controller?