TMDX654IDKEVM: Linux-RT SDK 8.1: Linux ETHTOOL ioctl returns "Operation not supported" when used immediately after administrative link down.

Part Number: TMDX654IDKEVM
Whenever an administrative link down is induced on a PRU-ICSSG Ethernet interface (e.g. by `ifconfig eth1 down`), the `ioctl()` call that gets the MAU type settings through the ETHTOOL interface (command ETHTOOL_GLINKSETTINGS) returns "Operation not supported".
I could only observe this with an application that is hooked to the Linux rtnetlink interface to get updates about the interface status and reacts to any interface status change by immediately asking for the MAU type through this ETHTOOL `ioctl()`.  Under those conditions, however, the issue happens in 100% of cases.
This behaviour is observed on the Linux-RT SDK 8.1 image, but not on the older version 8.0.
My guess would be a new data race somewhere.
To make this a question:  When is this planned to be fixed?
  • Hello, it has been 3 weeks. Could you please share an update on the issue?

  • Hi,

    Sorry for the delay.

    Unfortunately, I am not able to reproduce the issue. Can you elaborate on your setup and the steps to reproduce the issue?

    Regards,
    Tanmay

  • We are working on an easier way of reproducing this and will get back to you.

  • Hi Tanmay,

    Here are the exact steps to reproduce the issue:

    1. Init netlink socket
    2. Ethtool handshake
    3. Set administrative link down
    4. Get link settings -> EOPNOTSUPP

    A simple example, "example.c", is attached. The example performs the steps mentioned above for the eth1 interface.

    Compile and run the example on TMDX654IDKEVM to reproduce the bug.

    Regards,
    Lukas

    4186.example.c
    #include <errno.h>
    #include <error.h>
    #include <linux/ethtool.h>
    #include <linux/if_packet.h>
    #include <linux/netlink.h>
    #include <linux/rtnetlink.h>
    #include <linux/sockios.h>
    #include <net/if.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <sys/socket.h>
    #include <sys/types.h>
    #include <unistd.h>
    
    int nwords;
    uint8_t commData[sizeof(struct ethtool_link_settings) + 3 * 4 * 4]; // struct + 3 * 32/8 * nwords -> allocate for nwords <= 4
    struct ethtool_link_settings *data = (struct ethtool_link_settings *)commData;
    
    /* Get interface state with link mode, store them to data */
    static void getLinkSettings(int fd, const char* devname)
    {
        int ret;
        struct ifreq ifr;
    
        memset(&ifr, 0, sizeof(ifr));
    
        strncpy(ifr.ifr_name, devname, IFNAMSIZ - 1); // ifr is zeroed, so the name stays NUL-terminated
        ifr.ifr_data = (char *)data;
        data->link_mode_masks_nwords = nwords;
        data->cmd = ETHTOOL_GLINKSETTINGS;
    
        ret = ioctl(fd, SIOCETHTOOL, &ifr);
        if (ret < 0)
        {
            error(-53, errno, "Cannot get link settings");
        }
    }
    
    /* Do ethtool api handshake and obtain current nwords */
    void getNwords(int fd, const char* devname, int* nwords)
    {
        int ret;
        struct ifreq ifr;
        struct ethtool_link_settings handshake;
    
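        memset(&ifr, 0, sizeof(ifr));
        // zero the request so that link_mode_masks_nwords is 0, as the handshake requires
        memset(&handshake, 0, sizeof(handshake));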
    
        strncpy(ifr.ifr_name, devname, IFNAMSIZ - 1); // ifr is zeroed, so the name stays NUL-terminated
        ifr.ifr_data = (char *)(&handshake);
        handshake.cmd = ETHTOOL_GLINKSETTINGS;
    
        ret = ioctl(fd, SIOCETHTOOL, &ifr);
        if (ret < 0)
        {
            error(-53, errno, "Failed ethtool handshake");
        }
    
        // Validate nwords, expected is negative value
        if (handshake.link_mode_masks_nwords >= 0 || handshake.cmd != ETHTOOL_GLINKSETTINGS)
        {
            error(-53, 0, "Ethtool handshake wrong value"); // errno is not set on this path
        }
    
        *nwords = abs(handshake.link_mode_masks_nwords);
    }
    
    /* Init netlink socket for ethtool */
    int initNetlinkSocket(const char* devname)
    {
        int fd;
        int ret;
        struct sockaddr_nl sa_nl;
    
        // open socket for netlink
        fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
        if (fd < 0)
        {
            error(-53, errno, "Cannot get netlink socket");
        }
    
        memset(&sa_nl, 0, sizeof(sa_nl));
        sa_nl.nl_family = AF_NETLINK;
        sa_nl.nl_groups = RTMGRP_LINK;
    
        // bind netlink socket
        ret = bind(fd, (struct sockaddr *)&sa_nl, sizeof(sa_nl));
        if (ret == -1)
        {
            error(-53, errno, "Netlink socket bind failed");
        }
    
        getNwords(fd, devname, &nwords);
    
        return fd;
    }
    
    /* Set administrative link state */
    void setAdminLinkState(int fd, const char* devname, bool up)
    {
        int ret;
        struct ifreq ifr;
    
        memset(&ifr, 0, sizeof(ifr));
    
        strncpy(ifr.ifr_name, devname, IFNAMSIZ - 1); // ifr is zeroed, so the name stays NUL-terminated
    
        // get current state of flags
        ret = ioctl(fd, SIOCGIFFLAGS, &ifr);
        if (ret < 0)
        {
            error(-53, errno, "Cannot get administrative link state");
        }
    
        // set requested link state
        if (up)
        {
            ifr.ifr_flags |= IFF_UP;
        }
        else
        {
            ifr.ifr_flags &= ~IFF_UP;
        }
    
        // write to device
        ret = ioctl(fd, SIOCSIFFLAGS, &ifr);
        if (ret < 0)
        {
            error(-53, errno, "Cannot set administrative link state");
        }
    }
    
    int main()
    {
        char* iface = "eth1";
    
        int nlFd = initNetlinkSocket(iface);
    
        setAdminLinkState(nlFd, iface, false);
    
        getLinkSettings(nlFd, iface);
    
    }

  • Hi Lukas, 

    I discussed this with my colleagues and found out that this is not a bug. When a link down is issued, we disconnect the PHY itself. Hence, the query to get the link settings fails after a link down. I am not sure why it worked for you in SDK 8.0 in the first place.

    This is not planned to be changed in the near future. So, I would like to know the need for and involvement of this feature in your project so that we can look for a workaround/solution for this. 

    Regards,
    Tanmay

  • Hi Tanmay,

    When you set the link down, you can neither get nor set the supported MAU types; this behavior is not good.
    Also, when you set the MAU types and then do a link down and a link up, the interface is back in its "default" state and does not have the previous configuration.

    Our usage is to set the link settings while the link is down and then set the link up. Only this procedure ensures that the link is established with the requested configuration. It worked correctly on SDK 8.0 and older versions. Now the interface behaves differently from all other systems.

    Regards,
    Lukas

  • Hello Tanmay,

    for starters, I have to point out that the PHY disconnection you mentioned is merely a cause of the bug, and by no means a reason why this behavior would not be a bug.

    To wit:

    1. This behavior started on SDK 8.1.  It means that the semantics of a standard API shifted away from the usual de facto standard.  (It worked properly from at least SDK 6.3 through SDK 8.0.)
    2. This also means that your particular implementation leaks implementation details through a standardized API.

    Those are two reasons why this issue is definitely a bug.

    Why is this bug a problem?

    Because we now cannot start the link in a defined state.  We simply have no way to set the MAU type and bring the link up atomically, and this bug took away the possibility to set the MAU type before link up.

    Since our project is extremely sensitive to the detailed behavior of our link partner, starting the link and only then changing the MAU type is an option that may not even be acceptable for us.

    What is the impact of this bug on our project?

    Currently, this bug prevents us from updating our software base from SDK 8.0 upwards, thus preventing us from using any fixes of TI bugs that already appeared in newer releases.  Some of these fixes are necessary for features in our current project timeline, so this bug may actually stall the whole project.

    An attempt to work around this bug would cost us several man-weeks and discussions with multiple external parties at best.
    At worst, it would prove impossible.

    What could be a solution for this bug?

    I, personally, see at least two of them:

    First:  Just keep the PHY connected after link down.  This looks like the simplest, yet completely correct, solution from my perspective (I know no details of your implementation).

    Second:  Before disconnecting the PHY, load its relevant state into the driver and turn the PHY down to a state where it cannot interact with a link partner.  Then, as long as the PHY is disconnected, manage the PHY state in the driver.  Finally, when the PHY is connected again, write the state kept in the driver back into it and then forward the set calls as usual.

    Neither of those solutions seems difficult or complex to invent or implement, but they need to be done on the driver side inside the kernel, which rules out any solution in the application, which simply does not have any fine control over the PHY.

    Finally

    I hope this post made some things clear and helped us to align our viewpoints.
    Hopefully, it will also help you to plan the fix for this bug in the near future.

    Best Regards,
    Jan
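
The second proposal in this reply (cache the PHY state in the driver while the PHY is disconnected, then replay it on reconnect) can be illustrated with a small user-space model. This is purely a toy sketch with hypothetical names (`model_port`, `model_get`, and so on); it is not kernel code and does not describe how the TI driver is or was implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Link configuration as seen through the get/set API. */
struct link_cfg {
    uint32_t speed;
    uint8_t duplex;
    uint8_t autoneg;
};

/* Toy model of one port: the (simulated) PHY state plus a driver-side copy. */
struct model_port {
    bool phy_attached;
    struct link_cfg phy;    /* state held by the simulated PHY */
    struct link_cfg cached; /* driver-side copy used while detached */
};

/* Link down: remember the PHY state, then detach the PHY. */
static void model_link_down(struct model_port *p)
{
    p->cached = p->phy;
    p->phy_attached = false;
}

/* Link up: attach the PHY and replay the cached state into it. */
static void model_link_up(struct model_port *p)
{
    p->phy = p->cached;
    p->phy_attached = true;
}

/* Get never fails: it is served from the cache while detached. */
static struct link_cfg model_get(const struct model_port *p)
{
    return p->phy_attached ? p->phy : p->cached;
}

/* Set goes to the PHY when attached, otherwise to the cache. */
static void model_set(struct model_port *p, struct link_cfg cfg)
{
    if (p->phy_attached)
        p->phy = cfg;
    else
        p->cached = cfg;
}
```

In this model a get after link down returns the last known configuration instead of an error, and a set made while the link is down takes effect on the next link up, which is the behavior the application in this thread relies on.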

  • I have reported one more issue which might be related:

    TMDX654IDKEVM: Linux-RT SDK 8.2: 'ethtool -s eth1 speed 1000 duplex full autoneg on' does not work for eth1-eth5

    e2e.ti.com/.../tmdx654idkevm-linux-rt-sdk-8-2-ethtool--s-eth1-speed-1000-duplex-full-autoneg-on-does-not-work-for-eth1-eth5

  • Hi Jan, Lukas,

    Thanks for the explanation. It was very clear and helpful. The issue is indeed something which has to be fixed. Hence, I have filed an internal ticket for it (https://jira.itg.ti.com/browse/LCPD-27922).

    I will update you on the progress as soon as we have a fix.

    Regards,
    Tanmay

  • The fix for this is available in our official ti-linux-5.10.y branch, so it would be part of any upcoming AM65x Processor SDK release (v8.3 or later):

    https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/commit/?h=ti-linux-5.10.y&id=91ea70eb3334d9d5b2c9ea13b274bbeb5bc678f9&context=3&ignorews=0&dt=0

    Regards, Andreas