AM3358: Problems with ethernet connectivity - redundancy protocol PRP non-offload

Part Number: AM3358


Tool/software:

Hi everyone, 

I am new to microprocessors, Linux and networking so I need your help for my project.

I have a device with an AM3358 processor that I would like to use as a DANP device. It has two Ethernet ports, and someone before me (around 2020) had already installed a kernel with HSR/PRP redundancy support.

I am trying to set up the PRP configuration with this script:


MAC=`ifconfig eth1 | grep HWaddr | awk '{print $5}'`
IP=`ifconfig eth1 2>/dev/null|awk '/inet addr:/ {print $2}'|sed 's/addr://'`
echo "Configuring new PRP network interface with address $IP MAC $MAC"

ifconfig eth0 0.0.0.0 down
ifconfig eth1 0.0.0.0 down
ip link set dev eth0 address $MAC
ip link set dev eth1 address $MAC
ifconfig eth0 up
ifconfig eth1 up
ip link add name prp0 type prp slave1 eth0 slave2 eth1
ip addr add $IP/24 dev prp0
ifconfig prp0 up

It works for some time: when I connect the device's eth0 to my PC I can ping the device, and the same is true when I connect to eth1.

However, the connection randomly stops working and I can't access my device from my PC anymore.

Using tcpdump, I have observed that the device receives ICMP requests and sends replies. However, the replies don't reach my PC.
After some time, the communication resumes as if nothing happened, but the issue recurs intermittently. I've checked my hardware integrity and replaced the Ethernet cables.
The problem persists regardless of these changes.

Has anyone experienced a similar issue or have suggestions on how to resolve it?
Where should I look or debug further to understand the root cause of this issue?
Could this be caused by some network or kernel configuration?
Thank you for your guidance!

  • Setting up two interfaces with identical MAC and IP addresses does not make sense. With PRP you do not want to change the MAC addresses underneath; instead you create a virtual interface, typically called prp0, that uses eth0 and eth1 underneath. Please follow the steps in, for example, https://software-dl.ti.com/processor-sdk-linux-rt/esd/AM64X/latest/exports/docs/linux/Foundational_Components/Kernel/Kernel_Drivers/Network/HSR_PRP_Non_Offload.html#hsr-prp-non-offload .

      Pekka

  • Thank you for your reply.

    However, I checked the example you gave me and noticed that it handles the MAC addresses the same way:

    ip link set dev $ifa address $mac
    ip link set dev $ifb address $mac

    Isn't this the same as what I did in my script? Could you help me understand?
  • Sorry, you are correct: the MACs should be identical (so pick the MAC of one of the interfaces and use it for both). The key requirement is that the two LANs are truly separate, which was the source of my mistake. The basis of PRP is two distinct LANs, see https://wiki.wireshark.org/PRP . 

    Did you try the steps? Also, what is your network topology, and what do the two LANs look like? I would take Wireshark captures of the traffic in both LANs and see if the traffic is what you expected.

    Another mismatch that jumps out to me is this line in your commands:

    ip link add name prp0 type prp slave1 eth0 slave2 eth1

    Does that work? I don't think there is a type prp. The ip-link man page https://man7.org/linux/man-pages/man8/ip-link.8.html has (see the PRP part):

           High-availability Seamless Redundancy (HSR) Support
                  For a link of type HSR the following additional arguments
                  are supported:
    
                  ip link add link DEVICE name NAME type hsr slave1
                  SLAVE1-IF slave2 SLAVE2-IF [ supervision ADDR-BYTE ] [
                  version { 0 | 1 } [ proto { 0 | 1 } ]
    
                          type hsr - specifies the link type to use, here
                          HSR.
    
                          slave1 SLAVE1-IF - Specifies the physical device
                          used for the first of the two ring ports.
    
                          slave2 SLAVE2-IF - Specifies the physical device
                          used for the second of the two ring ports.
    
                          supervision ADDR-BYTE - The last byte of the
                          multicast address used for HSR supervision frames.
                          Default option is "0", possible values 0-255.
    
                          version { 0 | 1 } - Selects the protocol version
                          of the interface. Default option is "0", which
                          corresponds to the 2010 version of the HSR
                          standard. Option "1" activates the 2012 version.
    
                          proto { 0 | 1 } - Selects the protocol at the
                          interface. Default option is "0", which
                          corresponds to the HSR standard. Option "1"
                          activates the Parallel Redundancy Protocol (PRP).

    which would imply something like the following to set up PRP (the key is proto 1; the type is hsr even though the protocol is PRP):

    ip link add name prp0 type hsr slave1 eth0 slave2 eth1 supervision 45 proto 1

      Pekka

  • Okay, so I have tried many different setups and had the same issue in all of them:

    In set-up 1 and set-up 2 I had a RedBox available, so I used it to make my PC redundant and able to communicate with my device. I recorded the traffic on my PC, on the device, and also on the managed switches through the RedBox.

    What I observed in the recordings is the problem described in my initial question (the ICMP reply does not reach my PC). I can attach these recordings. rec_28_oct_2024.zip

    Now I no longer have the RedBox available, so I am trying to run the same tests to understand the issue. I used a software package (Elipse) to make my PC PRP redundant. Also, as you can see in set-up 5, I connected two devices together and used my PC with a serial console. The same problem still occurs.

    Regarding the command to activate the PRP interface, I followed this example https://software-dl.ti.com/processor-sdk-linux/esd/docs/06_03_00_106/linux/Industrial_Protocols_HSR_PRP.html because my device does not have the same ip-link version as the one you linked. Indeed, if I write:

     ip link add name $if type hsr slave1 $ifa slave2 $ifb supervision 45 proto 1

    my device does not recognize the proto parameter. If I use the help option, I get this result:

    UN70HS07M01000305 admin@HMI-011c:~$ ip link add name prp0 type prp help

    Usage: ip link add name NAME type prp slave1 SLAVE1-IF slave2 SLAVE2-IF
    [ supervision ADDR-BYTE ] [ sv_vid SV-VID ] [ sv_pcp SV-PCP ]
    [ sv_cfi SV-CFI ]

    NAME
    name of new prp device (e.g. prp0)
    SLAVE1-IF, SLAVE2-IF
    the two slave devices bound to the PRP device
    ADDR-BYTE
    0-255; the last byte of the multicast address used for PRP supervision
    frames (default = 0)
    SV-VID
    0-4094; VLAN ID to be used in the VLAN tag of SV frames (default 0)
    SV-PCP
    0-7; PCP value to be used in the VLAN tag of SV frames (default 0)
    SV-CFI
    0-1; CFI value to be used in the VLAN tag of SV frames (default 0)
    Use VLAN tag if one of sv_vid, sv_pcp or sv_cfi is specified. Default value
    used for unspecified ones

    UN70HS07M01000305 admin@HMI-011c:~$ ip link add name hsr0 type hsr help
    Usage: ip link add name NAME type hsr slave1 SLAVE1-IF slave2 SLAVE2-IF
    [ supervision ADDR-BYTE ] [version VERSION] [ sv_vid SV-VID ]
    [ sv_pcp SV-PCP ] [ sv_cfi SV-CFI ]

    NAME
    name of new hsr device (e.g. hsr0)
    SLAVE1-IF, SLAVE2-IF
    the two slave devices bound to the HSR device
    ADDR-BYTE
    0-255; the last byte of the multicast address used for HSR supervision
    frames (default = 0)
    VERSION
    0,1; the protocol version to be used. (default = 0)
    SV-VID
    0-4094; VLAN ID to be used in the VLAN tag of SV frames (default 0)
    SV-PCP
    0-7; PCP value to be used in the VLAN tag of SV frames (default 0)
    SV-CFI
    0-1; CFI value to be used in the VLAN tag of SV frames (default 0)
    Use VLAN tag if one of sv_vid, sv_pcp or sv_cfi is specified. Default value
    used for unspecified ones

    This is a really complex problem for me. Any idea why it happens?

  • There are many moving parts here, including a Windows PC running the Elipse PRP software. I would focus on one contained setup and start one piece at a time.

    What hardware are you using? An EVM, a community board, or your own design? First ensure PRP works in the embedded-only environment (2 or preferably 4 AM335x boards), i.e. your setup #5, before mixing in a Windows PC.

    Regarding the command to activate the PRP interface, I followed this example https://software-dl.ti.com/processor-sdk-linux/esd/docs/06_03_00_106/linux/Industrial_Protocols_HSR_PRP.html because my device does not have the same ip-link version as the one you linked. Indeed, if I write:

     ip link add name $if type hsr slave1 $ifa slave2 $ifb supervision 45 proto 1

    6.3 is a very old SDK (April 2020). iproute2 package (where the command ip is) in there must be an old version as well. So update to something more modern like 9.3 available at https://www.ti.com/tool/PROCESSOR-SDK-AM335X .

    My second comment is about which interface you are using. AM335x has 4 possible Ethernet ports: 2x from CPSW3G (standard Ethernet, 1G RGMII) and 2x from ICSS (100M MII). The link you provided, https://software-dl.ti.com/processor-sdk-linux/esd/docs/06_03_00_106/linux/Industrial_Protocols_HSR_PRP.html , is for the ICSS with special PRP duplicate-drop firmware and non-upstream patches to get that working in Linux. What physical interface are you trying to use? The MIIs from ICSS or the RGMII from CPSW?
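
    One way to answer the ICSS-versus-CPSW question from the running system is to look at the kernel driver bound to each port. The following is a minimal sketch, not a definitive check: the helper names are hypothetical, and the driver names it matches ("prueth" for ICSS PRU Ethernet, "cpsw" for CPSW) are assumptions about how the TI kernel reports them, so verify them against the actual `ethtool -i` output on the board.

```shell
#!/bin/sh
# Sketch: classify an Ethernet port by the driver that `ethtool -i` reports.
# ASSUMPTION: ICSS ports show a driver name containing "prueth" and CPSW
# ports one containing "cpsw"; confirm on the target before relying on it.

# Extract the "driver:" field from `ethtool -i IFACE` output read on stdin.
driver_from_ethtool() {
    awk -F': *' '$1 == "driver" { print $2; exit }'
}

# Map a driver name to the Ethernet subsystem it most likely belongs to.
classify_driver() {
    case $1 in
        *prueth*) echo "ICSS" ;;
        *cpsw*)   echo "CPSW" ;;
        *)        echo "unknown" ;;
    esac
}

# Usage on the target (not executed here):
#   classify_driver "$(ethtool -i eth0 | driver_from_ethtool)"
```

    If the result is "unknown", the raw driver string printed by `ethtool -i` is still the useful piece of information to report back.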

    Regarding the ip command: from what I can tell there is no type prp upstream; it is type hsr ... proto 1 that is used to create PRP without duplicate-drop offload. The old 6.3 SDK included some patches that never got upstreamed, and type prp is possibly one of them.
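
    To tell which of the two syntaxes a given iproute2 build accepts, its help text can be inspected. This is a rough sketch under stated assumptions: `prp_syntax` and `has_proto_option` are hypothetical helpers, the labels "upstream" and "ti-legacy" are made up for illustration, and the check assumes that a build supporting `proto` mentions it as a separate word in the `type hsr` help, while the TI-patched build prints a `type prp` usage line like the one quoted earlier in this thread.

```shell
#!/bin/sh
# Sketch: decide which PRP creation syntax an iproute2 build understands,
# given the captured help text of "type hsr" ($1) and "type prp" ($2).

# True if the text contains "proto" as a standalone word. The plain word
# "protocol" (present in the TI help text) must not count as a match.
has_proto_option() {
    printf '%s\n' "$1" | grep -Eq '(^|[^[:alnum:]_])proto([^[:alnum:]_]|$)'
}

prp_syntax() {
    if has_proto_option "$1"; then
        echo "upstream"      # ip link add ... type hsr ... proto 1
        return 0
    fi
    case $2 in
        *"type prp slave1"*)
            echo "ti-legacy" # ip link add ... type prp ...
            return 0 ;;
    esac
    echo "unknown"
    return 1
}

# Usage on the target (help is typically printed on stderr, hence 2>&1):
#   prp_syntax "$(ip link add name hsr0 type hsr help 2>&1)" \
#              "$(ip link add name prp0 type prp help 2>&1)"
```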

      Pekka

  • Sorry for the mess. I have been working on this since last September, so I have run a lot of different tests. I will try to clear everything up and restart step by step as you suggest. 

    About the hardware: I am not using an EVM. It is a custom design, and I have only 2 AM335x boards available. Each of them has 2 Ethernet ports. I think they are from ICSS (100M MII), but correct me if I am wrong:

    UN70HS07M01000305 admin@HMI-011c:~$ ethtool eth0
    Settings for eth0:
            Supported ports: [ TP AUI BNC MII FIBRE ]
            Supported link modes:   10baseT/Half 10baseT/Full
                                    100baseT/Half 100baseT/Full
            Supported pause frame use: Symmetric Receive-only
            Supports auto-negotiation: Yes
            Supported FEC modes: Not reported
            Advertised link modes:  10baseT/Half 10baseT/Full
                                    100baseT/Half 100baseT/Full
            Advertised pause frame use: No
            Advertised auto-negotiation: Yes
            Advertised FEC modes: Not reported
            Speed: 10Mb/s
            Duplex: Half
            Port: MII
            PHYAD: 1
            Transceiver: internal
            Auto-negotiation: on
    Cannot get wake-on-lan settings: Operation not permitted
            Current message level: 0x00000000 (0)
    
            Link detected: no
    UN70HS07M01000305 admin@HMI-011c:~$ ethtool eth1
    Settings for eth1:
            Supported ports: [ TP AUI BNC MII FIBRE ]
            Supported link modes:   10baseT/Half 10baseT/Full
                                    100baseT/Half 100baseT/Full
            Supported pause frame use: Symmetric Receive-only
            Supports auto-negotiation: Yes
            Supported FEC modes: Not reported
            Advertised link modes:  10baseT/Half 10baseT/Full
                                    100baseT/Half 100baseT/Full
            Advertised pause frame use: No
            Advertised auto-negotiation: Yes
            Advertised FEC modes: Not reported
            Link partner advertised link modes:  10baseT/Half 10baseT/Full
                                                 100baseT/Half 100baseT/Full
            Link partner advertised pause frame use: Symmetric Receive-only
            Link partner advertised auto-negotiation: Yes
            Link partner advertised FEC modes: Not reported
            Speed: 100Mb/s
            Duplex: Full
            Port: MII
            PHYAD: 5
            Transceiver: internal
            Auto-negotiation: on

    About this:

    6.3 is a very old SDK (April 2020). iproute2 package (where the command ip is) in there must be an old version as well. So update to something more modern like 9.3 available at https://www.ti.com/tool/PROCESSOR-SDK-AM335X .

    I don't know how to update the SDK and the iproute2 package; consider me a newbie in the Linux environment. Is there an easy tutorial you would suggest?

  • OK, so it looks like you are using ICSS-based Ethernet, and because it is the old 6.3 SDK it has a TI-specific iproute2 (which was rejected upstream). So for this platform and SDK, the only route is to follow the steps in https://software-dl.ti.com/processor-sdk-linux/esd/docs/06_03_00_106/linux/Industrial_Protocols_HSR_PRP.html (search for "no offload") and to ignore the newer SDKs (9.3) and any upstream standard Linux instructions. With a newer SDK, the steps with the upstream ip command will work for non-offload. An AM335x SDK with an upstream solution for PRP offload is planned but not yet available.

    I don't know how to update the SDK and the iproute2 package; consider me a newbie in the Linux environment. Is there an easy tutorial you would suggest?

    Are you debugging an issue in a deployed system using PRP, or investigating PRP usage for the first time? If it is an existing system, you should contact whoever manages the build/make of your product. 
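
    For reference, the non-offload setup used in this thread with the TI-specific `type prp` syntax can be written with the ip command only. The sketch below prints the command sequence instead of running it, so it can be reviewed first. `prp_setup_cmds` is a hypothetical helper, the MAC address in the usage line is a placeholder, and the sequence simply mirrors the script from the first post (with `ip addr flush` and `ip link set ... up/down` standing in for the ifconfig calls); it is not taken from the SDK documentation.

```shell
#!/bin/sh
# Sketch: print (do not execute) the commands that create a PRP interface
# on top of two slave ports, using the legacy "type prp" syntax.
# Arguments: slave A, slave B, PRP interface name, shared MAC, IP in CIDR form.
prp_setup_cmds() {
    ifa=$1 ifb=$2 prp=$3 mac=$4 cidr=$5
    cat <<EOF
ip link set dev $ifa down
ip link set dev $ifb down
ip addr flush dev $ifa
ip addr flush dev $ifb
ip link set dev $ifa address $mac
ip link set dev $ifb address $mac
ip link set dev $ifa up
ip link set dev $ifb up
ip link add name $prp type prp slave1 $ifa slave2 $ifb
ip addr add $cidr dev $prp
ip link set dev $prp up
EOF
}

# Usage (placeholder MAC; review the output, then pipe it to sh as root):
#   prp_setup_cmds eth0 eth1 prp0 02:00:00:00:00:01 192.168.127.119/24
#   prp_setup_cmds eth0 eth1 prp0 02:00:00:00:00:01 192.168.127.119/24 | sh -e
```

    Printing first and executing second keeps the sequence auditable, which helps when comparing what was actually configured on each board.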

      Pekka

  • Okay I understood.

    I tried again to configure PRP redundancy following the "no offload" instructions here, using a setup like this:

    The two devices ping each other and, as I expected, the problem still occurs: they stop seeing each other's ICMP replies.

    So I really do not know if this is due to a kernel error (but the PRP statistics are okay and the packets are not malformed) or if the problem is in the Ethernet port configuration.

    (The hardware is okay, because it works just fine when I use it for other purposes.)

    Are you debugging an issue in a deployed system using PRP, or investigating PRP usage for the first time? If an existing system you should contact whoever manages the build/make of your product. 

    Yes, I am debugging a deployed system using PRP. Unfortunately I couldn't reach who built the product so I am trying to do it myself.

    If any other suggestions come into your mind to where I should investigate more let me know.

    -------------------------------------- added later ----------------------------------------------

    During one of the ping sessions between the devices I got this output on my serial terminal:

    UN70HS07M01000305 admin@HMI-011c:/proc/prp0$ ping 192.168.127.119
    PING 192.168.127.119 (192.168.127.119) 56(84) bytes of data.
    [  984.068708] ------------[ cut here ]------------
    [  984.077427] WARNING: CPU: 0 PID: 3115 at net/hsr-prp/hsr_prp_framereg.c:303 hsr_addr_subst_dest+0x10c/0x134
    [  984.089454] hsr_addr_subst_dest: Unknown node
    [  984.094322] Modules linked in: bluetooth ecdh_generic cfg80211 omap_aes_driver crypto_engine omap_crypto omap_sham omap_rng rng_core quota_v2 quota_tree
    [  984.109012] CPU: 0 PID: 3115 Comm: ping Tainted: G        W       4.14.94-rt50-ge055122 #1
    [  984.117880] Hardware name: Generic AM33XX (Flattened Device Tree)
    [  984.124469] Backtrace:
    [  984.127061] [<c0110244>] (dump_backtrace) from [<c01104f0>] (show_stack+0x18/0x1c)
    [  984.135299]  r6:00000000 r5:c0837dbc r4:db6ffa48 r3:00400100
    [  984.141388] [<c01104d8>] (show_stack) from [<c060ec60>] (dump_stack+0x20/0x28)
    [  984.148963] [<c060ec40>] (dump_stack) from [<c012bd14>] (__warn+0xd8/0x104)
    [  984.156474] [<c012bc3c>] (__warn) from [<c012bd80>] (warn_slowpath_fmt+0x40/0x48)
    [  984.164495]  r9:db17a4e0 r8:0000f788 r7:00002f89 r6:ddd8e464 r5:db187fc0 r4:ddeda240
    [  984.172770] [<c012bd44>] (warn_slowpath_fmt) from [<c060612c>] (hsr_addr_subst_dest+0x10c/0x134)
    [  984.182098]  r3:c076f2bb r2:c0837e06
    [  984.185857] [<c0606020>] (hsr_addr_subst_dest) from [<c060a0f0>] (hsr_prp_forward_skb+0xfb4/0x11a0)
    [  984.195547]  r5:db187fc0 r4:ddeda240
    [  984.199302] [<c060913c>] (hsr_prp_forward_skb) from [<c0606910>] (hsr_prp_dev_xmit+0x38/0x70)
    [  984.208442]  r10:ddedc9c0 r9:db17a000 r8:c0a063d4 r7:c0a05fcc r6:ddeda300 r5:db17a000
    [  984.216830]  r4:db187680
    [  984.219497] [<c06068d8>] (hsr_prp_dev_xmit) from [<c055bbd8>] (dev_hard_start_xmit+0x94/0x108)
    [  984.228709]  r6:00000000 r5:00000000 r4:ddeda300 r3:c06068d8
    [  984.234836] [<c055bb44>] (dev_hard_start_xmit) from [<c055c1e0>] (__dev_queue_xmit+0x4ec/0x5dc)
    [  984.244096]  r10:db1f8100 r9:db17a158 r8:00000002 r7:ddedc9c0 r6:db17a000 r5:ddeda300
    [  984.252456]  r4:00000000
    [  984.255119] [<c055bcf4>] (__dev_queue_xmit) from [<c055c2e4>] (dev_queue_xmit+0x14/0x18)
    [  984.263799]  r10:c0a335d8 r9:db17a158 r8:00000002 r7:db05ba78 r6:db17a000 r5:ddeda300
    [  984.272186]  r4:db05ba00
    [  984.274853] [<c055c2d0>] (dev_queue_xmit) from [<c056a160>] (neigh_resolve_output+0x16c/0x184)
    [  984.284087] [<c0569ff4>] (neigh_resolve_output) from [<c058e8b0>] (ip_finish_output2+0x308/0x37c)
    [  984.293568]  r8:db17a15c r7:00000010 r6:db17a000 r5:db05ba00 r4:ddeda300
    [  984.301339] [<c058e5a8>] (ip_finish_output2) from [<c058ff58>] (ip_finish_output+0x1f8/0x208)
    [  984.310322]  r10:00000000 r9:777fa8c0 r8:c0a335d8 r7:d8bda000 r6:000005d6 r5:00000001
    [  984.318785]  r4:ddeda300
    [  984.321890] [<c058fd60>] (ip_finish_output) from [<c05908f0>] (ip_output+0x90/0x104)
    [  984.330039]  r10:00000000 r9:777fa8c0 r8:db17a000 r7:d8bda000 r6:c0a335d8 r5:00000001
    [  984.338521]  r4:ddeda300
    [  984.341401] [<c0590860>] (ip_output) from [<c05900f0>] (ip_local_out+0x48/0x4c)
    [  984.349063]  r8:00000000 r7:00000000 r6:c0a335d8 r5:d8bda000 r4:ddeda300
    [  984.356385] [<c05900a8>] (ip_local_out) from [<c0591180>] (ip_send_skb+0x1c/0x5c)
    [  984.364369]  r6:00000040 r5:c0a335d8 r4:d8bda000 r3:00000000
    [  984.370310] [<c0591164>] (ip_send_skb) from [<c05911e8>] (ip_push_pending_frames+0x28/0x38)
    [  984.379268]  r5:db6fff48 r4:d8bda000
    [  984.383197] [<c05911c0>] (ip_push_pending_frames) from [<c05b5648>] (raw_sendmsg+0x5bc/0x830)
    [  984.392302] [<c05b508c>] (raw_sendmsg) from [<c05c22a4>] (inet_sendmsg+0x3c/0x68)
    [  984.400170]  r10:00000000 r9:db6ffe28 r8:00000000 r7:dd158c40 r6:00000040 r5:db6fff48
    [  984.408616]  r4:d8bda000
    [  984.411501] [<c05c2268>] (inet_sendmsg) from [<c0540dcc>] (sock_sendmsg+0x1c/0x2c)
    [  984.419437]  r6:00000000 r5:00000000 r4:db6fff48 r3:c05c2268
    [  984.425637] [<c0540db0>] (sock_sendmsg) from [<c054176c>] (___sys_sendmsg+0x188/0x218)
    [  984.434115] [<c05415e4>] (___sys_sendmsg) from [<c05424a0>] (__sys_sendmsg+0x48/0x6c)
    [  984.442460]  r10:00000128 r9:db6fe000 r8:c010c4c8 r7:00000128 r6:0001025c r5:00000000
    [  984.450830]  r4:dd158c40
    [  984.453508] [<c0542458>] (__sys_sendmsg) from [<c05424d4>] (SyS_sendmsg+0x10/0x14)
    [  984.461704]  r6:00011404 r5:00000040 r4:000113f4
    [  984.466562] [<c05424c4>] (SyS_sendmsg) from [<c010c280>] (ret_fast_syscall+0x0/0x5c)
    [  984.474900] ---[ end trace 0000000000000003 ]---

  • 1. Use a network tap like https://www.profitap.com/profishark-1g/ to collect packet captures of the working setup, and when it stops working. Make a summary of this from the network traffic perspective. Try to correlate any logs in Linux with these events.

    Unfortunately I couldn't reach who built the product

    2. The SW build and deploy infrastructure. If you are able to isolate the issue and need to update something in the boot image or filesystem, how will you get the fix in? Access to the SW build system needs to be solved.

    3. AM335x Linux SDK 6.3 is not something we support. First suggestion from TI is to move to the latest SDK and see if the problem still persists.

    4. AM335x Linux SDK 6.3 has a TI specific set of patches for PRP which we do not carry in later SDKs. So you'd need to move to newer version.

    Once you can isolate the issue (see #1), it might turn out not to be directly a SW problem. I think this is the starting point.
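
    As a small aid for step 1, the moments when replies stop arriving can be pulled out of a saved ping log and then lined up with packet captures and kernel messages. The following is a minimal sketch, assuming the ping output contains a `seq=N` or `icmp_seq=N` field on each reply line; `find_ping_gaps` is a hypothetical helper and it does not handle sequence-number wraparound.

```shell
#!/bin/sh
# Sketch: report ranges of missing sequence numbers in ping output (stdin).
# Works on lines containing "icmp_seq=N" or "seq=N"; other lines are ignored.
find_ping_gaps() {
    awk '
        match($0, /seq=[0-9]+/) {
            # Skip the 4 characters of "seq=" to get the number itself.
            seq = substr($0, RSTART + 4, RLENGTH - 4) + 0
            if (have_prev && seq > prev + 1)
                printf "lost icmp_seq %d-%d\n", prev + 1, seq - 1
            prev = seq
            have_prev = 1
        }
    '
}

# Usage on the target (log first, analyse afterwards):
#   ping 192.168.127.119 > /tmp/ping.log     # stop with Ctrl-C after a failure
#   find_ping_gaps < /tmp/ping.log
```

    If the ping on the device supports printing timestamps, logging them as well makes it easier to match each gap to entries in dmesg and to the capture files.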

    For updating the SW I'd suggest contacting https://www.couthit.com/ , they have done backporting features and version updates for AM335x and ICSS Ethernet Linux features like PRP.

      Pekka