MDIO not able to poll PHY

Other Parts Discussed in Thread: AM3352

Hi folks,

Booting Linux on a custom board here. We're using the Micrel KSZ8081RNAIA PHY. What I see is that the MDIO and PHY kernel modules load and are initialized, as you can see here in my kernel output:

[ 0.122924] BPF: phy_init: called
[ 0.122924] BPF: mdio_bus_init: mdio init

Also, after about 0.9 seconds, we drive a GPIO high, which pulls the PHY out of reset. Until this point, the PHY has been held in reset.

[    0.913635] BPF: gpio1_init: Init of GPIO, RESET of RMII should go high here

Then after about 50ms more, we see the MDIO begin to poll the registers on all 32 PHY addresses looking for the right PHY IDs. Sadly, it seems to find all Fs instead:

[ 0.952301] BPF: get_phy_id: identifying phy...
[ 0.957061] BPF: mdiobus_read: Address being queried is 0, register 2
[ 0.963745] BPF: mdiobus_read: Address being queried is 0, register 3
[ 0.970428] phy ID: ffffffff
[ 0.973449] BPF: get_phy_device: Found PHY ID ffffffff
[ 0.978759] BPF: get_phy_device: Device was all Fs, nothing there, bail out
[ 0.985992] BPF: get_phy_id: identifying phy...
[ 0.990692] BPF: mdiobus_read: Address being queried is 1, register 2
[ 0.997375] BPF: mdiobus_read: Address being queried is 1, register 3
[ 1.004089] phy ID: ffffffff
[ 1.007049] BPF: get_phy_device: Found PHY ID ffffffff
[ 1.012390] BPF: get_phy_device: Device was all Fs, nothing there, bail out

(this continues through PHY address 31)
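
A minimal sketch of what the log above reflects: the two 16-bit reads from registers 2 and 3 are joined into one 32-bit PHY ID, and an address where nothing drives the pulled-up MDIO line reads back as all ones. This is standalone C, not the kernel source. The helper names are hypothetical, the 0x1fffffff mask is an assumption modeled on the kernel's "all Fs" check, and the Micrel-style register values in the usage note are only illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Join MII register 2 (upper half) and register 3 (lower half) into one ID. */
static uint32_t phy_id_from_regs(uint16_t physid1, uint16_t physid2)
{
    return ((uint32_t)physid1 << 16) | physid2;
}

/*
 * With no PHY at the queried address the MDIO line stays pulled high,
 * so both reads return 0xffff. Assumed mask: the low 29 bits all set.
 */
static bool phy_id_is_absent(uint32_t phy_id)
{
    return (phy_id & 0x1fffffffu) == 0x1fffffffu;
}
```

For example, `phy_id_from_regs(0xffff, 0xffff)` gives 0xffffffff, which `phy_id_is_absent()` rejects, while illustrative reads of 0x0022 and 0x1560 would combine to 0x00221560 and be passed on to driver matching.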

Then of course our Micrel PHY driver code fails, because none of the PHYs had the expected PHY IDs:

[ 2.032348] Fixed MDIO Bus: probed
[ 2.035888] BPF: ksphy_init: Invoking micrel driver code
[ 2.043182] BPF: ksphy_init: Did not find any devices matching our phy IDs

So our question is: What could make the MDIO unable to poll the PHY IDs from the PHY? We've verified that the reset lines are really high (where low means reset), so the chip should be out of reset. We've verified the pin-muxing for the MDIO/MDC, and are seeing traffic on it after Linux boots. We've also done some verification on the clocking and verified that the PHY clocks are working. We also are able to use the PHYs from U-boot, and the PHY ID is detected, and our pin muxing is the same there.

  • Since I forgot to mention it except in the tag, we are also using an AM335x, specifically the AM3352.

  • Hi Benjamin, Do you have an external pullup resistor on the MDIO_DATA line?
  • Also, do you have RX enabled in the pinmux of MDIO_CLK? This is necessary as the MDIO module uses the clock at pin level for input retiming.
  • Hi Biser,

    We have a 1k external pull-up on MDIO_DATA. Also, in the pin muxing we have a pull-up on it - our hardware engineer has indicated that having them both should not be an issue though.

    We did not have input enabled on MDIO_CLK, although we also do not in u-boot where it appears to work anyway. I enabled RX on it and built a new image, but it appears to have the same result - all Fs. 

    Here is the pin muxing after my change:

    static struct pinmux_config rmii1_pin_mux[] = {
    {"mii1_crs.rmii1_crs_dv" , OMAP_MUX_MODE1 | AM33XX_INPUT_EN },
    {"mii1_rxerr.rmii1_rxerr" , OMAP_MUX_MODE1 | AM33XX_INPUT_EN }, 
    {"mii1_txen.rmii1_txen" , OMAP_MUX_MODE1 | AM33XX_PIN_OUTPUT }, 
    {"mii1_txd1.rmii1_txd1" , OMAP_MUX_MODE1 | AM33XX_PIN_OUTPUT }, 
    {"mii1_txd2.rmii1_txd0" , OMAP_MUX_MODE1 | AM33XX_PIN_OUTPUT }, 
    {"mii1_rxd1.rmii1_rxd1" , OMAP_MUX_MODE1 | AM33XX_INPUT_EN }, 
    {"mii1_rxd0.rmii1_rxd0" , OMAP_MUX_MODE1 | AM33XX_INPUT_EN }, 
    {"rmii1_refclk.rmii1_refclk", OMAP_MUX_MODE0 | AM33XX_INPUT_EN },
    {"mdio_data.mdio_data", OMAP_MUX_MODE0 | AM33XX_PIN_INPUT_PULLUP}, /* BPF: These are included in Linux */
    {"mdio_clk.mdio_clk", OMAP_MUX_MODE0 | AM33XX_PIN_INPUT_PULLUP},
    {NULL, 0},
    };

    By comparing with other board files, we had already discovered one difference from u-boot: in Linux the MDIO_DATA/MDIO_CLK lines have to be included in each interface's pin muxing. That's where my comment about them being included in Linux comes from.

    rmii2 is also set up, using the same MDIO/MDC lines.

    Thanks,

    Ben

  • Sounds very strange to me. Can you probe the MDIO to check clock frequency and whether the PHY is actually outputting all F's? I can see no reason why detection should function in u-boot and not in kernel. Can you check if your MDIO pinmux does not get overwritten during kernel startup?
  • Hi Biser,

    On the scope, we are seeing a clock frequency somewhere around 460kHz. We are also seeing the polling loop described in the following thread, which makes it difficult to tell whether we are actually accessing the PHY or not: http://e2e.ti.com/support/arm/sitara_arm/f/791/t/304929.aspx

    What I can tell you, based on some debugging mechanisms including toggling a GPIO while we are doing the access, is that we are currently only seeing the repeating pattern of accesses from the constant polling loop. We DO NOT see any additional accesses on the MDIO bus, so it seems likely that the PHY isn't being queried at all to return this data, and that the Fs are happening because no query is actually done.

    When looking at the pin muxing to see if it's getting overwritten, I am seeing that the values for MDIO and MDC are both set to 0x30. According to the TRM, this means pad pullup/pulldown enabled, pullup selected, input enabled. That matches the AM33XX_PIN_INPUT_PULLUP define that I'm using on both these pins, so my pin-muxing seems to be working correctly.
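
A small sketch of how a pad-control value such as 0x30 breaks down, assuming the bit layout described in this thread (mux mode in bits 2:0, pull disable in bit 3, pull type in bit 4, receiver enable in bit 5, slew rate in bit 6). The struct and function names are hypothetical helpers, not part of any kernel or TI header, and the layout should be checked against the TRM before relying on it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of one pad-control register (hypothetical helper type). */
struct pad_conf {
    unsigned mux_mode;  /* bits 2:0, pin function select */
    bool pull_enabled;  /* bit 3 is a pull *disable* bit, so 0 means enabled */
    bool pull_up;       /* bit 4: 1 = pull-up, 0 = pull-down */
    bool rx_active;     /* bit 5: input buffer enabled */
    bool slow_slew;     /* bit 6: 1 = slow slew rate */
};

static struct pad_conf decode_pad_conf(uint32_t reg)
{
    struct pad_conf p;

    p.mux_mode     = reg & 0x7u;
    p.pull_enabled = !((reg >> 3) & 1u);
    p.pull_up      = (reg >> 4) & 1u;
    p.rx_active    = (reg >> 5) & 1u;
    p.slow_slew    = (reg >> 6) & 1u;
    return p;
}
```

Under that assumed layout, 0x30 decodes to mode 0 with the pull-up enabled and the receiver on, whereas 0x10 is the same pad with the receiver off (output only).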

    Any other things we can try? Why wouldn't we be seeing our requests go out over MDIO if our pin muxing is correct?

    By the way, we do see the PHYs respond to the automatic polling loop that is happening - it looks like over the range of all addresses, 2 of them (probably addresses 0 and 3, where our PHYs are, but it's hard to tell) are responding to the query. We can tell because these two have a different bit pattern from the rest of the polling loop, and there is data in the reply instead of the line staying high. So it really seems like the PHYs are out there and working.

  • This scope shot shows the GPIO pin (green) going low during our call to mdiobus_read, but you can see in the un-zoomed portion at the top the rapid polling loop is not interrupted by any outside accesses.

    You can also see that right after the GPIO goes low, we do get a response from a PHY, but this appears to be part of the normal loop as it is not the right address (0) that we put the GPIO toggle on. Later on in the un-zoomed portion of the image, toward the right side, you can see that the polling loop again hits addresses 0 and 3, and gets responses for each.

  • You have the correct MDIO address for the PHY?

  • Hi Benjamin,
     
    Can you post what each channel captures on your scope?
  • Hi Biser,

    Our scope-shot is labeled, but it may not be immediately obvious. The labels are visible in the top-left of the screenshot. Yellow is MDC, blue is MDIO, pink is the Ethernet Reset line which holds our PHY in reset when it is low, and green is the GPIO line which indicates when we are entering the section of code that is supposedly polling PHY address 0.

    Also in things that may not be immediately obvious, the top fifth or so of the screenshot is the zoomed-out view where you can see the entire capture, and the lower 4/5 or so is the zoomed-in portion where you can see the majority of the action play out.

    The zoomed-out view is probably required for understanding the view of the constant polling loop.

    Thanks,

    Ben

  • Frank,

    I'm not certain what you mean here. If you're talking about the address that the PHY is responding to, not only do we think we know that (the PHYs are located at addresses 0 and 3), but Linux also automatically polls all 32 possible addresses, so even if we were wrong about the location Linux should find out where it actually is.

    If you mean the base address of the MDIO, I don't see a setting for this or a way to set this within Linux. Please elaborate?

    Thanks,

    Ben

  • Hi Ben,
     
    I saw the labels now. Well, it seems to me that you are actually getting a response on addresses 0 and 3 (at least there are two longer frames, separated by two short ones, and that pattern repeats every 32 frames). So I think everything is fine on the hardware side. It has to be a software issue.
  • Ben,
     
    This caught my attention reading the TRM:
     
    "Prior to initiating any other transaction, the station management entity shall send a preamble sequence of 32 contiguous logic one bits on the MDIO_DATA line with 32 corresponding cycles on MDIO_CLK  to provide the PHY with a pattern that it can use to establish synchronization. A PHY shall observe a sequence of 32 contiguous logic one bits on MDIO_DATA with 32 corresponding MDIO_CLK cycles before it responds to any other transaction."

    Can you capture the situation around phy reset release to see if this is observed?
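
To help match a scope capture against the preamble rule quoted above, here is a rough sketch of the bits a management entity drives for a standard (IEEE 802.3 Clause 22) access: 32 ones, then a 2-bit start code, a 2-bit opcode, a 5-bit PHY address and a 5-bit register address. The macro and function names are hypothetical and only illustrate the framing; they are not taken from the AM335x driver.

```c
#include <stdint.h>

#define MDIO_PREAMBLE    0xffffffffu /* 32 contiguous logic-one bits */
#define MDIO_ST_CLAUSE22 0x1u        /* start of frame, bits "01" */
#define MDIO_OP_WRITE    0x1u        /* opcode bits "01" */
#define MDIO_OP_READ     0x2u        /* opcode bits "10" */

/*
 * The 14 bits driven right after the preamble, MSB first:
 * ST(2) | OP(2) | PHYAD(5) | REGAD(5).
 * The turnaround and the 16 data bits follow on the wire.
 */
static uint16_t mdio_c22_header(unsigned op, unsigned phy_addr, unsigned reg_addr)
{
    return (uint16_t)((MDIO_ST_CLAUSE22 << 12) |
                      ((op & 0x3u) << 10) |
                      ((phy_addr & 0x1fu) << 5) |
                      (reg_addr & 0x1fu));
}
```

A read of register 2 at PHY address 0 would therefore appear as the preamble followed by the bit pattern 01 10 00000 00010. If no PHY answers, the data bits stay high and the read returns 0xffff.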

  • Hi Biser,

    We're thinking so as well. The automatic polling loop does seem to get a response from addresses 0 and 3, but one thing we're not seeing is any extra commands going out to query the other addresses. It's like the MAC is operating completely on autopilot and isn't actually connected to the commands we called to go out and query the data.

    This does point to a software problem, as you say, but Linux thinks it has a connection to the MDIO and I don't get any sort of error codes when I tell it to do queries, I just don't get a reply. Which makes it tricky for me to figure out where the disconnect is happening.

    Thanks,

    Ben

  • Hi Biser,

    I'll get the scope set up and see if we can capture this today. One point though, this indicates that the PHY should wait to see this sequence before it responds, and as far as we can tell the PHY is already responding to the polling loop, so either this should have happened or the PHY seems to be ignoring the requirement.

    Thanks,

    Ben

  • Biser,

    Here is our capture of the preamble happening. Screenshot is labeled like the previous one.

    Thanks,

    Ben

  • Definitely NOT a hardware issue. I'm running out of ideas...
  • Assuming Ethernet works fine in u-boot, it should work in Linux as well. A couple of things to suggest:

    - Check the MDIO PHY alive register (section 14.5.10.3 of the TRM). This register indicates whether a PHY is showing up in the polling process. It should be independent of software and should tell you whether the part is seeing the PHY. Using the devmem2 command, can you see any bits set for the selected address?

    devmem2 0x4a101000

    - Check the gmii_sel register in the control module (TRM section 9.3.1.31, address 0x44e10650). It is not directly PHY related, but it controls the port selection type. The setup in Linux is in arch/arm/mach-omap2/devices.c, in the function am33xx_cpsw_init(). There is a switch statement there that picks the port mode based on the TI EVM type. On a custom board this code will need to be modified to select the correct mode for the PHY on the board.

    - does ethtool eth0 (or eth1, whichever port you are running)

  • Hi Schuyler,

    First of all, it looks like your comment is incomplete. I'll answer everything I can tell here.

    In the MDIO PHY_ALIVE register, I read 0x0. This is register 0x4a101008 according to the TRM, I think, since the offset is 8h. It looks like it's not seeing a response, even though I see one on the scope.

    In case it helps, the MDIO CONTROL register at 0x4a101004 has a value of 0x410000FF, which seems to indicate that the MDIO is enabled and the highest user channel is 1, and the clock is FF. None of that seems bad to me.
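
For reference, a sketch of the field breakdown behind that reading of 0x410000FF. It assumes the MDIO CONTROL layout as understood here: idle flag in bit 31, enable in bit 30, highest user channel in bits 28:24, clock divider in bits 15:0. The struct and function are hypothetical helpers for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of the MDIO CONTROL register (hypothetical helper type). */
struct mdio_control {
    bool idle;                     /* bit 31: state machine is idle */
    bool enable;                   /* bit 30: MDIO module enabled */
    unsigned highest_user_channel; /* bits 28:24 */
    unsigned clkdiv;               /* bits 15:0: MDC clock divider */
};

static struct mdio_control decode_mdio_control(uint32_t reg)
{
    struct mdio_control c;

    c.idle                 = (reg >> 31) & 1u;
    c.enable               = (reg >> 30) & 1u;
    c.highest_user_channel = (reg >> 24) & 0x1fu;
    c.clkdiv               = reg & 0xffffu;
    return c;
}
```

With 0x410000FF this yields enable set, idle clear, highest user channel 1 and a divider of 0xFF, which agrees with the interpretation above.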

    The GMII_SEL register is 0xF5, which seems to indicate that both PHYs are used, both PHYs are RMII (which is correct in our board), that the clock is sourced from the chip (also correct) and that there's no internal delay (I'm not sure about this one). This seems ok to me, unless the internal delay is supposed to be on.
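
The same kind of breakdown for GMII_SEL, as a sketch only. The bit positions are an assumption based on the reading of 0xF5 given here (port mode in bits 1:0 and 3:2 with 1 meaning RMII, the RGMII internal-delay mode bits in 4 and 5, the RMII reference clock select bits in 6 and 7), and should be confirmed against TRM section 9.3.1.31. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

enum gmii_port_mode { PORT_GMII_MII = 0, PORT_RMII = 1, PORT_RGMII = 2 };

/* Decoded view of GMII_SEL (hypothetical helper type, assumed layout). */
struct gmii_sel {
    unsigned port1_mode;            /* bits 1:0 */
    unsigned port2_mode;            /* bits 3:2 */
    bool rgmii1_no_internal_delay;  /* bit 4 */
    bool rgmii2_no_internal_delay;  /* bit 5 */
    bool rmii1_refclk_from_pin;     /* bit 6 */
    bool rmii2_refclk_from_pin;     /* bit 7 */
};

static struct gmii_sel decode_gmii_sel(uint32_t reg)
{
    struct gmii_sel g;

    g.port1_mode               = reg & 0x3u;
    g.port2_mode               = (reg >> 2) & 0x3u;
    g.rgmii1_no_internal_delay = (reg >> 4) & 1u;
    g.rgmii2_no_internal_delay = (reg >> 5) & 1u;
    g.rmii1_refclk_from_pin    = (reg >> 6) & 1u;
    g.rmii2_refclk_from_pin    = (reg >> 7) & 1u;
    return g;
}
```

Under these assumptions 0xF5 decodes to RMII on both ports with all four upper bits set. The internal-delay bits only matter for RGMII, so they would not be expected to affect an RMII design.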

    Ethtool doesn't tell me too much, mostly because /dev/eth0 and /dev/eth1 don't exist. As far as I can tell this is because nobody from the MAC on up thinks they're actually there despite the HW traffic, and they read 0xFF from the MDIO, so the kernel drivers don't get instantiated for them.

    Please let us know what other info you need, and thanks so much for responding!

    -Ben

  • Is the PHY alive register showing a value when running u-boot?  This can be done with:

    md.l 0x4a101008

     

  • Hi Schuyler,

    Yes, the PHY alive register in u-boot is showing 0x9, which seems to indicate the PHYs at both address 0 and address 3 are responding. That's the addresses we have our PHYs at, so that seems correct.
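
The reading of 0x9 follows from treating the alive register as a 32-bit mask with one bit per PHY address. A minimal sketch, with hypothetical helper names:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit N of the alive register is set when PHY address N answered a poll. */
static bool phy_is_alive(uint32_t alive_reg, unsigned addr)
{
    return addr < 32 && ((alive_reg >> addr) & 1u);
}

/* Count how many of the 32 possible addresses responded. */
static unsigned count_alive_phys(uint32_t alive_reg)
{
    unsigned n = 0;

    for (unsigned addr = 0; addr < 32; addr++)
        n += phy_is_alive(alive_reg, addr);
    return n;
}
```

0x9 is binary 1001, so addresses 0 and 3 are reported alive and the count is 2. The 0x0 read under Linux means no address was seen responding.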

    Thanks,
    Ben

  • Hi Ben,

    This is a good indication that it works in u-boot, that means with a fairly high probability that the PHYs will work in Linux. Let's look at the pin mux registers for MDIO and MDC in both u-boot and Linux. It almost seems like in Linux the pins are defaulting to output mode based on the scope capture, that's a rough guess though.

    In u-boot we want to see the following registers: 0x44e10948 (MDIO) and 0x44e1094c (MDC). Please use this command to display both locations:

    md.l 0x44e10948      (mine reads 0x00000030 and 0x00000010)

    In Linux those locations are

    devmem2 0x44e10948 (mine reads 0x00000070, which is different from u-boot, setting a slower slew rate)

    devmem2 0x44e1094c (mine reads 0x00000010, same as u-boot)

    Let us know the results, Regards,

    Schuyler

  • Hi Schuyler,

    The Linux result for 0x44e10948 is 0x30, and the result for 0x44e1094c is 0x30.

    By the TRM, that seems to indicate that the MDIO is I/O, and the MDC is also I/O. This looks correct, based on Biser's advice above to change the MDC to I/O for retiming support.

    The U-boot result for 0x44e10948 is 0x30 and the result for 0x44e1094c is 0x10.

    This matches my expectations: in u-boot we had the MDIO set up as output only, and this worked fine. As you can see from the conversation above, this was also the case in Linux as well, prior to debugging the system with Biser. He asked us to switch MDC to an input also, which we did, but this had no effect on our situation and things still didn't work.

    Hope this helps,

    Ben

  • Correction: In u-boot we had MDC, not MDIO, set to output only. This seems to be your setup in Linux as well, but based on Biser's advice above we changed that in our Linux pin-mux.

  • OK, the pin mux looks good. The next thing to look at is the MDIO clock. The scope capture submitted  shows a clock of around 488KHz for the MDC on your board. I don't have an easy way at the moment to look at the clocks on the evm I have. What value do you have for the clock divider in 0x4a101004 for both U-Boot and Linux?  I have 0xff and 0x7c respectively between U-Boot and Linux.

    Is it possible for you to submit a similar scope capture of the MDIO and MDC while running in u-boot? I just wanted to confirm that the scope capture in this thread is for a Linux context.

  • Hi Schuyler,

    My value in Linux is 0x410000FF, and the u-boot value is 0x410000FF, so these are identical. Is your 0x7C from your lower slew rate?

    Sadly the hardware engineer I'm working with has left for the holiday break already, and I believe he's taken the scope captures with him, so I can't post a scope shot from u-boot. However, I can tell you that I believe he already verified the HW clocks were the same.

    The scope capture in this thread is under Linux - the green GPIO line there is toggled when we enter a certain kernel routine, so it has to be from when we were booting the kernel, after we'd already done the pin-muxing. After the holidays I'll see if he can post a shot of the u-boot MDC for comparison.

    Speaking of which, I will also be finishing up for the holiday break at ~5pm EST today, and won't be around until the 30th after that, just so you know. I'll have to put this topic on hold until then.

    Thanks,

    Ben

  • Hi Ben,

    The difference may be that the clocking setup in Linux differs from U-Boot. It looks like U-Boot leaves the MDIO clkdiv at the default value of 0xFF. I will have to check the code. BTW, which SDK are you using? I will look through the U-Boot and Linux code to see where this clkdiv value gets set up; I want to make sure I am looking in the right source tree.

    Fellow team members have said that on our boards the MDC is typically run around 1MHz. Based on that assuming the same clocking setup we use on the EVM is what you are using, a 0xFF would explain the 488KHz MDC. Though we are not sure this explains the problem.
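
The arithmetic behind those figures can be sketched as follows, assuming the MDC rate is the MDIO module input clock divided by (clkdiv + 1). The 125 MHz input clock used in the usage note is an assumption chosen because it is consistent with the numbers in this thread. It is not a value stated here, and the function names are hypothetical.

```c
#include <stdint.h>

#define MDIO_CLKDIV_MAX 0xffffu /* clkdiv is a 16-bit field */

/* MDC frequency that results from a given divider value. */
static uint32_t mdc_hz_from_clkdiv(uint32_t mdio_in_hz, uint32_t clkdiv)
{
    return mdio_in_hz / ((clkdiv & MDIO_CLKDIV_MAX) + 1u);
}

/* Divider needed to get at most the requested bus frequency. */
static uint32_t clkdiv_for_bus_freq(uint32_t mdio_in_hz, uint32_t bus_hz)
{
    uint32_t div;

    if (bus_hz == 0)
        return MDIO_CLKDIV_MAX; /* no target given: pick the slowest clock */
    if (mdio_in_hz <= bus_hz)
        return 0;               /* input already at or below the target */

    div = mdio_in_hz / bus_hz - 1u;
    return div > MDIO_CLKDIV_MAX ? MDIO_CLKDIV_MAX : div;
}
```

With an assumed 125 MHz input, a divider of 0xFF gives 125000000 / 256, or roughly 488 kHz, and a 1 MHz target gives a divider of 124 (0x7C). That lines up with both the scope measurement and the two register values compared above.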

    I too will be out for the holiday break until the 6th. Happy Holidays.

  • Hi Schuyler,

    We're required to use an internally-developed Linux distribution which is rolled in-house. According to that development team, there are patches in place that make the build essentially equivalent to the 5.5 SDK but upgraded to a 3.2.2 kernel, but in practice we've seen some other things break too. 

    We've become pretty versed in finding/altering any code we need to, so if there's anything you want me to check on the clocking setup, or some particular way you want me to set it up, I can probably do that. I'm not sure it'd fix the problem either, but I'll try pretty much anything at this point.

    Thanks so much for your help,

    Ben

  • Hi Ben,

    Here is something to look at in the Linux code that sets up the MDIO clkdiv. I looked in the 5.05.01 source tree: the file arch/arm/mach-omap2/devices.c contains the function am33xx_cpsw_init(), and in this function is a call to omap_device_build("davinci_mdio"...

    The structure pointer passed to that call contains only the bus frequency, which here is 1MHz. Based on what we see in this thread, it looks like this function is not getting called. If it returned an error, it looks like the default value would be left in the clkdiv. Since your code base is different, perhaps this function is not getting called, or perhaps it returns an error.

    Let us know if this helps.

  • Hi Schuyler,

    I confirm that this function wasn't being called. After calling it, I still get all F's when polling all addresses on the MDIO, and my clock rate still appears to be 0xFF.

    On the other hand, I now have an eth0 showing up under ifconfig, even though there's no /dev/eth0 which seems like it would need to be there. Having eth0 show up seems like a nice Christmas miracle anyway!

    This definitely gives me a path to follow though, and I'll post updates here as we discover new things.

    In the meantime, happy holidays and I hope you enjoy your time off! 

    Thanks,

    Ben

  • For network devices there is no /dev device node. All you have is an interface when you do ifconfig. Since you have an eth0 from ifconfig, things are starting to look good.

    Steve K.

  • Hi Schuyler, Steve,

    Happy new year! I wanted to give you an update on our situation. I was able to call the am33xx_cpsw_init function and see eth0, but I was getting a strange notification from sysfs that it wasn't able to create the MDIO, because it already had one.

    I did some hunting and figured out that our kernel config had MDIO_FIXED_PHY defined, which creates a virtual MDIO device which all of our queries were going to - none of them were going down to the hardware level, because they had been redirected to this virtual interface. I'm not sure how that got enabled, but after disabling it, I confirmed that I do detect the proper PHYs at the proper addresses. 

    We were also experiencing a problem where the MDIO initialization happened after the PHY init, so the PHY driver claimed it had no PHYs present. I reordered this by changing the PHY to late_initcall rather than module_init, and that seems to have worked - the PHY is now detected.

    I still can't ping anything, but I suspect that there's something wrong with the PHY driver that's preventing that - I'm seeing the kernel stuff come up and look ok. 

    Thanks for all your help folks!

    -Ben