AM3359: AM3359 PRUETH init/deinit system lockup

Part Number: AM3359
Other Parts Discussed in Thread: TMDSICE3359

Hey everyone,

I'm currently upgrading our Yocto-based distribution from version 3.0/Zeus to 3.1/Dunfell.
During this process I've found a possible bug in the PRUETH firmware, the kernel driver, or other related software.

When initializing and deinitializing the two interfaces of a PRUETH-based system, the system will
lock up depending on the order of the ip or ifconfig commands.

I've not yet tested this behaviour on the current arago distribution, as our system is plain yocto.

How to provoke the lockup

This is a cross-check to determine whether the fault is in our system or in the default meta-ti configuration.
The following was tested on a default Yocto/Poky build with the meta-ti layer, both on the dunfell tag.
No further configuration was done.

When running

ifconfig eth1 up
ifconfig eth0 up

the system locks up:

root@am335x-evm:~# ifconfig
lo        Link encap:Local Loopback 
          inet addr:127.0.0.1  Mask:255.0.0.0 
          inet6 addr: ::1/128 Scope:Host 
          UP LOOPBACK RUNNING  MTU:65536  Metric:1 
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0 
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B) 
 
root@am335x-evm:~# ifconfig eth1 up 
[   70.616033] remoteproc remoteproc2: powering up 4a338000.pru 
[   70.629768] remoteproc remoteproc2: Booting fw image ti-pruss/am335x-pru1-prueth-fw.elf, size 7344 
[   70.639210] pru-rproc 4a338000.pru: configured system_events[63-0] = 00600000,08a00000 
[   70.647791] pru-rproc 4a338000.pru: configured intr_channels = 0x0000032a host_intr = 0x000002aa 
[   70.656907] remoteproc remoteproc2: remote processor 4a338000.pru is now up 
[   70.664452] net eth1: started 
root@am335x-evm:~# ifconfig eth0 up 
[   75.325870] remoteproc remoteproc1: powering up 4a334000.pru 
[   75.334979] remoteproc remoteproc1: Booting fw image ti-pruss/am335x-pru0-prueth-fw.elf, size 7432 
[   75.344725] pru-rproc 4a334000.pru: configured system_events[63-0] = 00000600,04500080 
[   75.352688] pru-rproc 4a334000.pru: configured intr_channels = 0x000000d5 host_intr = 0x00000155 

When running the commands the other way around, the system will not lock up.
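One way to see where a captured console log stops is to check which PRU remoteproc instances logged "powering up" without a later "is now up". The following is a minimal sketch, assuming the log was saved as plain text from the serial console; the function name unfinished_rprocs is made up for illustration. In the output above it would report remoteproc1, which only shows where the console output ended, not what caused the freeze.

```shell
#!/bin/sh
# Reads kernel log text on stdin and prints every remoteproc instance that
# logged "powering up" but never logged "is now up" afterwards.
# unfinished_rprocs is a hypothetical helper name, not part of any TI tool.
unfinished_rprocs() {
    awk '
        # Pick out the instance name, e.g. "remoteproc2" from "remoteproc2:".
        match($0, /remoteproc[0-9]+:/) {
            name = substr($0, RSTART, RLENGTH - 1)
            if ($0 ~ /powering up/)
                pending[name] = 1
            else if ($0 ~ /is now up/)
                delete pending[name]
        }
        END { for (n in pending) print n }
    ' | sort
}
```

Usage would be something like `unfinished_rprocs < console.log`, where console.log is whatever file the serial output was captured to.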

The error was found on our system, where the interfaces are initialized by our software, and, in order to reconfigure them,
are deinitialized and reinitialized by the software.

This leads to the following effect:

eth0 up ⇒ eth1 up ⇒ eth0 down ⇒ eth1 down ⇒ eth0 up → LOCK
eth0 up ⇒ eth1 up ⇒ eth1 down ⇒ eth0 down ⇒ eth0 up → OK
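For anyone trying to reproduce this, the two orderings above can be written as a small dry-run script. This is only a sketch, assuming the interfaces are named eth0 and eth1 as on the am335x-evm image; the function names and the RUN variable are made up for illustration. By default it only prints the commands instead of executing them.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set RUN to the empty string (as root, on the target) to execute them.
RUN="${RUN-echo}"

# Bring one interface up or down: set_link <dev> <up|down>
set_link() {
    $RUN ip link set dev "$1" "$2"
}

# Ordering reported to lock up: eth0 is taken down first,
# then eth0 is brought up again.
seq_lock() {
    set_link eth0 up
    set_link eth1 up
    set_link eth0 down
    set_link eth1 down
    set_link eth0 up
}

# Ordering reported to work: the interfaces are taken down
# in the reverse order of bring-up.
seq_ok() {
    set_link eth0 up
    set_link eth1 up
    set_link eth1 down
    set_link eth0 down
    set_link eth0 up
}
```

Calling seq_lock or seq_ok after sourcing the file prints the five commands of the respective sequence, which makes it easy to review them before running them on real hardware.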

Is this a known behaviour in the firmware or its drivers? Are there any reference materials that state the
correct way to do this initializing and deinitializing, or am I doing something wrong here?

Tested software configuration

The problem was initially found on a system with the kernel from meta-ti/5.4.91.
I've tested it with 5.4.93 and various earlier versions, always by checking out the whole repo commit, so that
surrounding software like the PRU firmware is in a compatible state.

On an older commit of the meta-ti layer (zeus, kernel 5.4.20) the error is not reproducible.

Thanks in advance

Dave

  • Hello Dave,

    I am not sure if your process of initializing / deinitializing PRUETH ports was specifically tested in previous releases.

    1) By "lock up", do you mean the terminal becomes nonresponsive without reporting any errors? Or are you seeing something else?

    2) What steps does your software take to initialize, deinitialize, and reinitialize the PRUETH ports?

    3) I am having trouble mapping your "ifconfig eth1 up, ifconfig eth0 up" terminal output to the "eth0 up ⇒ eth1 up ⇒ eth0 down ⇒ eth1 down ⇒ eth0 up → LOCK" statement. Maybe this will be clearer after you tell us more about 2).

    Regards,

    Nick

  • Hello Nick,

    1) The whole system freezes, not just this terminal session. For example, I ran iperf on an interface that I did not change; the iperf measurement froze at the same moment as the other terminal sessions (both the ssh sessions and the tty session).

    2,3) When initializing after system boot up, the following commands are executed:

    ip link add name br0 type bridge
    ip link set br0 up
    ip link set eth0 master br0
    ip link set eth1 master br0
    ip link set dev eth0 up
    ip link set dev eth1 up
    

    In this case, there is also a bridge set up, but the error exists with and without the bridge.

    Our system allows the Ethernet configuration to be changed, e.g. disabling the bridge or setting up static or dynamic IP addresses.

    To change the configuration of an Ethernet device, the interface is shut down, reconfigured, and finally brought back up:

    ip addr flush dev br0
    ip addr flush dev eth0
    ip addr flush dev eth1
    ip link set dev eth0 down
    ip link set dev eth1 down
    ip link set dev br0 down
    
    -- reconfiguration (ip calls)
    
    ip link set br0 up
    ip link set eth0 master br0
    ip link set eth1 master br0
    ip link set dev eth0 up
    ip link set dev eth1 up

    This sequence will lock/freeze the system on line 13 (ip link set dev eth0 up).

    When I swap the commands in lines 4 and 5 (so that eth1 is taken down before eth0), the system works fine.
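The ordering that works here amounts to taking the ports down in the reverse order of bring-up. A minimal sketch of a helper that enforces this, assuming the same eth0/eth1 names; links_up, links_down_reverse, and RUN are hypothetical names, and this only encodes the ordering observed to avoid the freeze on this setup, not a confirmed fix.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set RUN to the empty string (as root, on the target) to execute them.
RUN="${RUN-echo}"

# Bring the given interfaces up in the order they are listed.
links_up() {
    for dev in "$@"; do
        $RUN ip link set dev "$dev" up
    done
}

# Take the given interfaces down in the reverse of the listed order,
# so the list can be the same one that was passed to links_up.
links_down_reverse() {
    order=""
    for dev in "$@"; do
        order="$dev $order"
    done
    # Interface names contain no whitespace, so word splitting is safe here.
    for dev in $order; do
        $RUN ip link set dev "$dev" down
    done
}
```

With this, a reconfiguration would call `links_down_reverse eth0 eth1`, apply the ip changes, and then call `links_up eth0 eth1`.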

    - Edited -
    It seems that when locking/freezing the pru firmware fires interrupts before the interrupt handler has been initialized - not knowing the PRU firmware this is just a guess, but it may be the problem.

    Also, there is another error or at least warning on the am335x-icev2 with prueth (as well as on our proprietary board) regarding the PRU firmware:

    [    5.180469] davinci_mdio 4a332400.mdio: phy[1]: device 4a332400.mdio:01, driver TI TLK10X 10/100 Mbps PHY
    [    5.213104] davinci_mdio 4a332400.mdio: phy[3]: device 4a332400.mdio:03, driver TI TLK10X 10/100 Mbps PHY
    [    5.261991] remoteproc remoteproc1: 4a334000.pru is available
    [    5.294346] pru-rproc 4a334000.pru: PRU rproc node /ocp/interconnect@4a000000/segment@0/target-module@300000/pruss@0/pru@34000 probed successfully
    [    5.349308] remoteproc remoteproc2: 4a338000.pru is available
    [    5.375469] pru-rproc 4a338000.pru: PRU rproc node /ocp/interconnect@4a000000/segment@0/target-module@300000/pruss@0/pru@38000 probed successfully
    [    5.485774] prueth pruss_eth: could not get ptp tx irq. Skipping PTP support
    [    5.492878] prueth pruss_eth: could not get hsr ptp tx irq. Skipping PTP support
    [    5.664179] prueth pruss_eth: could not get ptp tx irq. Skipping PTP support
    [    5.671285] prueth pruss_eth: could not get hsr ptp tx irq. Skipping PTP support
    [    5.815995] prueth pruss_eth: TI PRU ethernet driver initialized: dual EMAC mode

    It seems that some interrupt elements are missing.

    Regards,

    Dave

  • Hello Dave,

    Thanks for the additional debug. I'll take a look at this by the end of the week - please ping me if I haven't responded by the start of next week.

    Regards,

    Nick

  • Hello Nick,

    Are there any updates on this issue?

    Regards,

    Dave

  • Hello Dave,

    Thanks for the ping. To confirm, which version of PRU Ethernet firmware are you using?

    Are you using HSR or PTP functionality?

    Regards,

    Nick

  • Hello Nick,

    The version used is the PRUETH-fw_5.5.13 (from http://git.yoctoproject.org/cgit/cgit.cgi/meta-ti/tree/recipes-bsp/prueth-fw/prueth-fw_5.5.13.bb?h=dunfell)

    I'm using neither HSR nor PTP.

    Regards,

    Dave

  • Hello Dave,

    The notice about skipping PTP and HSR support should be fine. Let me check to see what testing has been done on AM335x PRUETH with Linux 5.4.

    Regards,

    Nick

  • You mentioned that you did not see the lockup on an earlier version of Linux 5.4. Did you observe this same process working (without lockups) on any earlier versions of Linux? If so, which versions?

  • Hello Nick,

    Thanks for the followup on the skipping notices.

    We did not observe those notices or any problems with the PRUETH functionality. If it helps, I can send over both dmesg streams from the older and newer versions, but they are not that different from the default Yocto messages on the icev2 board.

    The last used version of Linux was 5.4.20, but I'm not able to find out which version of the PRUETH firmware was used, as there is no version or commit in the recipe: git.yoctoproject.org/.../prueth-fw_git.bb

    As I've said, the whole system is built with Yocto, so we are using the meta-ti layer in a version compatible with the other Yocto layers.

    Between the Zeus version of each layer that we used and the dunfell version, there are major changes to the device tree, the kernel, and of course many other programs from the meta-ti layer.

    Rolling back only the PRUETH firmware, the kernel, the device tree, and other components in various combinations did not help (as expected; mixing different versions is never a good idea). Because we need some newer versions of software that comes from Yocto itself, rolling back the whole system is not an option for us.

    Regards,

    Dave

  • Hello Dave,

    The development team said this could be a valid bug, so I will submit a requirement for them to look further into it. However, they did not promise me a date for when they would take a closer look to see what was going on. I have not spent a ton of time with this particular driver, but I can do a diff of the prueth driver / remoteproc driver (used to load & initialize the PRUs) and see if anything interesting pops up.

    On your comment here: "It seems that when locking/freezing the pru firmware fires interrupts before the interrupt handler has been initialized - not knowing the PRU firmware this is just a guess, but it may be the problem." - what were you seeing that told you about the interrupts firing too early?

    Regards,

    Nick

  • Hello Nick,

    As a workaround, we redesigned our software to reboot the whole system instead of just disabling and restarting the Ethernet devices. As this takes considerably more time, we would be thankful if this bug could be fixed, but for now we do have an interim solution.

    As I've said, the error should be reproducible for you when using a TI-developed evaluation board like the icev2 (TMDSICE3359).

    On the comment about the interrupts: I've written some firmware elements with remoteproc where I experienced the same kind of system locking/freezing. In that case the error was an interrupt firing too early, meaning the interrupt handler was not yet initialized, which led to an undefined system state in which the system locked up. Not knowing anything about how the PRUETH firmware is written or how the drivers are implemented, I can only guess that this *might* be a problem.
    So I did not see anything beyond the line in my initial post stating that the interrupts had been configured, after which the system froze.

    Regards,

    Dave

  • Hello Nick,

    Are there any updates on this topic?

    Regards,

    Dave

  • Hello Dave,

    Thanks for the ping. I am on vacation the rest of this week, but I'll try to replicate your results on my side next week. Feel free to ping me again next week just to make sure this doesn't get lost.

    Regards,

    Nick

  • Hello Nick,

    Are there any updates on this topic?

    Regards,

    Dave