66AK2L06: System hangup after Ethernet cable plug-in

Part Number: 66AK2L06

Hi All!

We have a custom board with:

  1. 66AK2L06 SoC working as PCIe root complex
  2. SoC is connected to FPGA (Xilinx Artix 7) endpoint through PCIe port #0 (5Gb link)
  3. Marvell Ethernet PHY (88E1512) connected to SGMII port #0 (1Gb full duplex link)

The PCIe and SGMII links are active and everything works as expected until the Ethernet cable is unplugged and plugged back in.
The SoC hangs 2-3 seconds after the Ethernet cable is plugged in. It also happens when link autonegotiation is re-triggered on both sides with ethtool.
It hangs so thoroughly that I'm not even able to connect to it with the XDSv100 emulator.

Some observations:

  1. This happens only when the PCIe link is loaded with transactions.
  2. When the Ethernet link speed is 100/10Mb, the hang occurs later than at 1Gb (sometimes after tens of seconds).

This hang is very similar to the case where the PCIe data region is accessed without an active link. In that case the SoC hangs the same way.
My guess is that when the Ethernet cable is plugged in, the PCIe link somehow drops during a transaction and the core hangs.
On the FPGA side we see no link drops after the hang, but a drop could be so short that we simply miss it.

What can be the cause of such a hang, and how can I debug this situation?

Regards,
Yurii

  • Hi All!

    The problem was in our custom FPGA driver. This driver constantly reads data from the FPGA over DMA and pushes it into a circular buffer.

    A user-space application reads this buffer in chunks and sends the data over Ethernet. When the cable was plugged in, the sendto function stalled
    for a short period of time, but long enough for the circular DMA buffer to overflow. Unfortunately, the overflow event handling
    contained a bug which caused kernel memory corruption.
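For readers who hit a similar overflow, here is a minimal user-space sketch of a circular buffer that refuses to write when full and counts the overrun instead of overwriting neighbouring memory. It is not the driver from this thread; the class name `RingBuffer` and its methods are hypothetical, and the sketch is single-threaded, so a real DMA producer would additionally need locking or a lock-free scheme.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical fixed-capacity ring buffer (illustrative names only).
// When the buffer is full, push() drops the sample and records an overrun.
class RingBuffer {
public:
    explicit RingBuffer(std::size_t capacity) : buf_(capacity) {}

    // Producer side (the DMA completion path in the scenario above).
    // Returns false and counts an overrun rather than writing out of bounds.
    bool push(unsigned value) {
        if (count_ == buf_.size()) {
            ++overruns_;
            return false;
        }
        buf_[(head_ + count_) % buf_.size()] = value;
        ++count_;
        return true;
    }

    // Consumer side (the reader that forwards data to the network).
    // Returns false when there is nothing to read.
    bool pop(unsigned& out) {
        if (count_ == 0) {
            return false;
        }
        out = buf_[head_];
        head_ = (head_ + 1) % buf_.size();
        --count_;
        return true;
    }

    std::size_t size() const { return count_; }
    std::size_t overruns() const { return overruns_; }

private:
    std::vector<unsigned> buf_;   // storage, fixed at construction
    std::size_t head_ = 0;        // index of the oldest element
    std::size_t count_ = 0;       // number of stored elements
    std::size_t overruns_ = 0;    // samples dropped because the buffer was full
};
```

Dropping the newest sample is only one possible policy; overwriting the oldest sample is equally valid, as long as the indices stay inside the allocated storage.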

    Regards,
    Yurii

  • Thanks for sharing your findings.


    Best Regards,
    Yordan

  • Hi, Yordan!

    Fixing the bug in our driver did not resolve the initial problem. The kernel memory corruption is fixed, but the kernel still hangs.
    Sorry for the premature resolution confirmation.

    I have removed all out-of-tree drivers and software from the filesystem image and written a simple UDP test program:

    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <unistd.h>
    
    int main()
    {
    	// Zero-filled payload (2048 bytes where unsigned is 4 bytes)
    	unsigned	data[512] = {};
    
    	// Plain UDP socket; give up if it cannot be created
    	int sock = socket(AF_INET, SOCK_DGRAM, 0);
    	if (sock < 0)
    		return 1;
    
    	// Destination: remote PC, UDP port 5502
    	sockaddr_in srv_addr{};
    	srv_addr.sin_family		= AF_INET;
    	srv_addr.sin_port		= htons(5502);
    	srv_addr.sin_addr.s_addr	= inet_addr("192.168.0.101");
    
    	// Send one datagram every 32 ms, forever
    	while (true)
    	{
    		sendto(sock, data, sizeof(data), 0, (sockaddr*)&srv_addr, sizeof(srv_addr));
    		usleep(32000u);
    	}
    
    	return 0;
    }

    No PCIe or DMA transactions are running in the background, but this simple code hangs our system shortly after the Ethernet cable is plugged in. Nothing is output to the UART console (loglevel=9) except the Ethernet link down/up messages:

    [   20.567935] netcp-1.0 2620110.netcp eth0: Link is Down
    [   25.129224] netcp-1.0 2620110.netcp eth0: Link is Up - 1Gbps/Full - flow control rx/tx.

    Some time after the cable is plugged in, Wireshark receives garbled Ethernet frames with a bad type, bad src/dst MAC, etc. The data is the same in all frames and resembles ELF binaries (I've found ELF headers in the frame data).
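Spotting ELF content in a captured payload can be automated. The sketch below assumes the payload bytes have already been exported from the capture into a byte vector; the function name `findElfMagic` is hypothetical. It only looks for the four-byte ELF magic (0x7F followed by the ASCII letters E, L, F) and does not validate the rest of the header.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the offset of the first ELF magic (0x7F 'E' 'L' 'F') found in
// the payload, or -1 when the payload does not contain it.
std::ptrdiff_t findElfMagic(const std::vector<std::uint8_t>& payload) {
    static const std::array<std::uint8_t, 4> magic = {0x7F, 'E', 'L', 'F'};
    auto it = std::search(payload.begin(), payload.end(),
                          magic.begin(), magic.end());
    if (it == payload.end()) {
        return -1;
    }
    return it - payload.begin();
}
```

A hit at a non-zero offset suggests the frame was assembled from memory that belongs to something else (for example page cache holding an executable), which points toward a stale or reused buffer rather than a wire-level error.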

    It turns out that this issue is somehow related to the network stack. How can I debug this?

    Regards,
    Yurii

    PS. I'm using the kernel from Linux-RT SDK v05.01.00.11 built with tisdk_k2l-evm-rt_defconfig.

  • Hi,

    Thanks for sharing the kernel version.

    It turns out that this issue is somehow related to the network stack. How can I debug this?

    Nothing specific comes to mind at the moment. Could you share your boot log and DTS files?

    Best Regards,
    Yordan

  • Hi Yordan,

    This is my boot log:


    [    0.000000] Booting Linux on physical CPU 0x0
    [    0.000000] Linux version 4.14.67-rt40 (build@localhost) (gcc version 7.2.1 20171011 (Linaro GCC 7.2-2017.11)) #1 SMP PREEMPT RT Tue Oct 22 11:53:23 MSK 2019
    [    0.000000] CPU: ARMv7 Processor [412fc0f4] revision 4 (ARMv7), cr=30c5387d
    [    0.000000] CPU: div instructions available: patching division code
    [    0.000000] CPU: PIPT / VIPT nonaliasing data cache, PIPT instruction cache
    [    0.000000] OF: fdt: Machine model: k10sv processing module
    [    0.000000] Memory policy: Data cache writealloc
    [    0.000000] Switching physical address space to 0x800000000
    [    0.000000] efi: Getting EFI parameters from FDT:
    [    0.000000] efi: UEFI not found.
    [    0.000000] cma: Reserved 24 MiB at 0x000000087e400000
    [    0.000000] On node 0 totalpages: 1048576
    [    0.000000] free_area_init_node: node 0, pgdat c1049880, node_mem_map edbf9000
    [    0.000000]   DMA zone: 1728 pages used for memmap
    [    0.000000]   DMA zone: 0 pages reserved
    [    0.000000]   DMA zone: 196608 pages, LIFO batch:31
    [    0.000000]   HighMem zone: 851968 pages, LIFO batch:31
    [    0.000000] psci: probing for conduit method from DT.
    [    0.000000] psci: Using PSCI v0.1 Function IDs from DT
    [    0.000000] percpu: Embedded 15 pages/cpu @edbc5000 s31904 r8192 d21344 u61440
    [    0.000000] pcpu-alloc: s31904 r8192 d21344 u61440 alloc=15*4096
    [    0.000000] pcpu-alloc: [0] 0 [0] 1 
    [    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 1046848
    [    0.000000] Kernel command line: quiet rootwait=1 earlyprintk root=/dev/ram0 rw
    [    0.000000] PID hash table entries: 4096 (order: 2, 16384 bytes)
    [    0.000000] Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
    [    0.000000] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
    [    0.000000] Memory: 4114136K/4194304K available (8192K kernel code, 301K rwdata, 2452K rodata, 2048K init, 344K bss, 55592K reserved, 24576K cma-reserved, 3383296K highmem)
    [    0.000000] Virtual kernel memory layout:
    [    0.000000]     vector  : 0xffff0000 - 0xffff1000   (   4 kB)
    [    0.000000]     fixmap  : 0xffc00000 - 0xfff00000   (3072 kB)
    [    0.000000]     vmalloc : 0xf0800000 - 0xff800000   ( 240 MB)
    [    0.000000]     lowmem  : 0xc0000000 - 0xf0000000   ( 768 MB)
    [    0.000000]     pkmap   : 0xbfe00000 - 0xc0000000   (   2 MB)
    [    0.000000]     modules : 0xbf000000 - 0xbfe00000   (  14 MB)
    [    0.000000]       .text : 0xc0008000 - 0xc0a00000   (10208 kB)
    [    0.000000]       .init : 0xc0e00000 - 0xc1000000   (2048 kB)
    [    0.000000]       .data : 0xc1000000 - 0xc104b520   ( 302 kB)
    [    0.000000]        .bss : 0xc104d000 - 0xc10a30c4   ( 345 kB)
    [    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
    [    0.000000] Preemptible hierarchical RCU implementation.
    [    0.000000] 	RCU restricting CPUs from NR_CPUS=8 to nr_cpu_ids=2.
    [    0.000000] 	RCU priority boosting: priority 1 delay 500 ms.
    [    0.000000] 	No expedited grace period (rcu_normal_after_boot).
    [    0.000000] 	Tasks RCU enabled.
    [    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2
    [    0.000000] NR_IRQS: 16, nr_irqs: 16, preallocated irqs: 16
    [    0.000000] GIC: Using split EOI/Deactivate mode
    [    0.000000] arch_timer: cp15 timer(s) running at 200.00MHz (phys).
    [    0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x2e2049d3e8, max_idle_ns: 440795210634 ns
    [    0.000005] sched_clock: 56 bits at 200MHz, resolution 5ns, wraps every 4398046511102ns
    [    0.000009] Switching to timer-based delay loop, resolution 5ns
    [    0.000170] keystone timer clock @200000000 Hz
    [    0.000310] Console: colour dummy device 80x30
    [    0.000323] console [tty0] enabled
    [    0.000340] Calibrating delay loop (skipped), value calculated using timer frequency.. 400.00 BogoMIPS (lpj=2000000)
    [    0.000346] pid_max: default: 32768 minimum: 301
    [    0.000433] Mount-cache hash table entries: 2048 (order: 1, 8192 bytes)
    [    0.000439] Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes)
    [    0.000969] CPU: Testing write buffer coherency: ok
    [    0.001192] /cpus/cpu@0 missing clock-frequency property
    [    0.001227] /cpus/cpu@1 missing clock-frequency property
    [    0.001245] CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
    [    0.040058] Setting up static identity map for 0x80200000 - 0x80200138
    [    0.080048] Hierarchical SRCU implementation.
    [    0.140479] EFI services will not be available.
    [    0.160158] smp: Bringing up secondary CPUs ...
    [    0.261647] CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
    [    0.261785] smp: Brought up 1 node, 2 CPUs
    [    0.261795] SMP: Total of 2 processors activated (800.00 BogoMIPS).
    [    0.261799] CPU: All CPU(s) started in HYP mode.
    [    0.261804] CPU: Virtualization extensions available.
    [    0.262593] devtmpfs: initialized
    [    0.270435] random: get_random_u32 called from bucket_table_alloc+0x148/0x284 with crng_init=0
    [    0.280418] VFP support v0.3: implementor 41 architecture 4 part 30 variant f rev 0
    [    0.280642] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
    [    0.280655] futex hash table entries: 512 (order: 3, 32768 bytes)
    [    0.281218] pinctrl core: initialized pinctrl subsystem
    [    0.281841] DMI not present or invalid.
    [    0.282233] NET: Registered protocol family 16
    [    0.284518] DMA: preallocated 256 KiB pool for atomic coherent allocations
    [    0.285817] hw-breakpoint: found 5 (+1 reserved) breakpoint and 4 watchpoint registers.
    [    0.285823] hw-breakpoint: maximum watchpoint size is 8 bytes.
    [    0.290859] gpio gpiochip0: (davinci_gpio.0): added GPIO chardev (254:0)
    [    0.290962] gpiochip_setup_dev: registered GPIOs 0 to 31 on device: gpiochip0 (davinci_gpio.0)
    [    0.294718] gpio gpiochip1: (davinci_gpio.1): added GPIO chardev (254:1)
    [    0.294813] gpiochip_setup_dev: registered GPIOs 32 to 63 on device: gpiochip1 (davinci_gpio.1)
    [    0.309102] media: Linux media interface: v0.10
    [    0.309145] Linux video capture interface: v2.00
    [    0.309245] pps_core: LinuxPPS API ver. 1 registered
    [    0.309251] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
    [    0.309268] PTP clock support registered
    [    0.309298] EDAC MC: Ver: 3.0.0
    [    0.309655] dmi: Firmware registration failed.
    [    0.309874] Advanced Linux Sound Architecture Driver Initialized.
    [    0.310918] clocksource: Switched to clocksource arch_sys_counter
    [    0.321328] NET: Registered protocol family 2
    [    0.321993] TCP established hash table entries: 8192 (order: 3, 32768 bytes)
    [    0.322062] TCP bind hash table entries: 8192 (order: 5, 229376 bytes)
    [    0.322273] TCP: Hash tables configured (established 8192 bind 8192)
    [    0.322384] UDP hash table entries: 512 (order: 3, 32768 bytes)
    [    0.322426] UDP-Lite hash table entries: 512 (order: 3, 32768 bytes)
    [    0.322679] NET: Registered protocol family 1
    [    0.323204] RPC: Registered named UNIX socket transport module.
    [    0.323212] RPC: Registered udp transport module.
    [    0.323218] RPC: Registered tcp transport module.
    [    0.323224] RPC: Registered tcp NFSv4.1 backchannel transport module.
    [    0.323235] PCI: CLS 0 bytes, default 64
    [    0.323504] Trying to unpack rootfs image as initramfs...
    [    0.446303] Freeing initrd memory: 2156K
    [    0.446638] hw perfevents: no interrupt-affinity property for /pmu, guessing.
    [    0.446981] hw perfevents: enabled with armv7_cortex_a15 PMU driver, 7 counters available
    [    0.447646] platform alarmtimer: set dma_pfn_offset00780000
    [    0.448700] workingset: timestamp_bits=14 max_order=20 bucket_order=6
    [    0.456189] squashfs: version 4.0 (2009/01/31) Phillip Lougher
    [    0.456999] NFS: Registering the id_resolver key type
    [    0.457025] Key type id_resolver registered
    [    0.457031] Key type id_legacy registered
    [    0.457094] ntfs: driver 2.1.32 [Flags: R/O].
    [    0.458867] bounce: pool size: 64 pages
    [    0.458921] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 245)
    [    0.458930] io scheduler noop registered
    [    0.458937] io scheduler deadline registered
    [    0.459076] io scheduler cfq registered (default)
    [    0.459084] io scheduler mq-deadline registered
    [    0.459091] io scheduler kyber registered
    [    0.459591] keystone_irq soc:keystone_irq@26202a0: irqchip registered, nr_irqs 28
    [    0.460169] ti,keystone-serdes 2320000.phy: init fw ks2_pcie_serdes.bin: version 3.3.0.2c
    [    0.461106] ti,keystone-serdes 232a000.phy: init fw ks2_gbe_serdes.bin: version 3.3.0.2c
    [    0.462333] gpio-syscon soc:keystone_dsp_gpio@02620240: can't read the dir register offset!
    [    0.462350] gpiochip_find_base: found new base at 484
    [    0.462528] gpio gpiochip2: (soc:keystone_dsp_gpio@02620240): added GPIO chardev (254:2)
    [    0.462630] gpiochip_setup_dev: registered GPIOs 484 to 511 on device: gpiochip2 (soc:keystone_dsp_gpio@02620240)
    [    0.462718] gpio-syscon soc:keystone_dsp_gpio@2620244: can't read the dir register offset!
    [    0.462733] gpiochip_find_base: found new base at 456
    [    0.462890] gpio gpiochip3: (soc:keystone_dsp_gpio@2620244): added GPIO chardev (254:3)
    [    0.462992] gpiochip_setup_dev: registered GPIOs 456 to 483 on device: gpiochip3 (soc:keystone_dsp_gpio@2620244)
    [    0.463080] gpio-syscon soc:keystone_dsp_gpio@2620248: can't read the dir register offset!
    [    0.463095] gpiochip_find_base: found new base at 428
    [    0.463257] gpio gpiochip4: (soc:keystone_dsp_gpio@2620248): added GPIO chardev (254:4)
    [    0.463355] gpiochip_setup_dev: registered GPIOs 428 to 455 on device: gpiochip4 (soc:keystone_dsp_gpio@2620248)
    [    0.463443] gpio-syscon soc:keystone_dsp_gpio@262024c: can't read the dir register offset!
    [    0.463463] gpiochip_find_base: found new base at 400
    [    0.463624] gpio gpiochip5: (soc:keystone_dsp_gpio@262024c): added GPIO chardev (254:5)
    [    0.463717] gpiochip_setup_dev: registered GPIOs 400 to 427 on device: gpiochip5 (soc:keystone_dsp_gpio@262024c)
    [    0.464806] keystone-pcie 21800000.pcie: GPIO lookup for consumer reset
    [    0.464813] keystone-pcie 21800000.pcie: using device tree for GPIO lookup
    [    0.464823] of_get_named_gpiod_flags: can't parse 'reset-gpios' property of node '/soc/pcie@21800000[0]'
    [    0.464831] of_get_named_gpiod_flags: can't parse 'reset-gpio' property of node '/soc/pcie@21800000[0]'
    [    0.464837] keystone-pcie 21800000.pcie: using lookup tables for GPIO lookup
    [    0.464843] keystone-pcie 21800000.pcie: lookup for GPIO reset failed
    [    0.465129] OF: PCI: host bridge /soc/pcie@21800000 ranges:
    [    0.465146] OF: PCI:   MEM 0x50000000..0x5fffffff -> 0x50000000
    [    0.565994] keystone-pcie 21800000.pcie: link up
    [    0.566168] keystone-pcie 21800000.pcie: PCI host bridge to bus 0000:00
    [    0.566181] pci_bus 0000:00: root bus resource [bus 00-ff]
    [    0.566188] pci_bus 0000:00: root bus resource [mem 0x50000000-0x5fffffff]
    [    0.566217] pci 0000:00:00.0: [104c:b00a] type 01 class 0x060400
    [    0.566501] PCI: bus0: Fast back to back transfers disabled
    [    0.566653] pci 0000:01:00.0: [10ee:7021] type 00 class 0x050000
    [    0.566695] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x0fffffff]
    [    0.566863] pci 0000:01:00.0: PME# supported from D0 D1 D2 D3hot
    [    0.591009] PCI: bus1: Fast back to back transfers disabled
    [    0.591044] pci 0000:00:00.0: BAR 8: assigned [mem 0x50000000-0x5fffffff]
    [    0.591058] pci 0000:01:00.0: BAR 0: assigned [mem 0x50000000-0x5fffffff]
    [    0.591071] pci 0000:00:00.0: PCI bridge to [bus 01-ff]
    [    0.591080] pci 0000:00:00.0:   bridge window [mem 0x50000000-0x5fffffff]
    [    0.591295] pcieport 0000:00:00.0: limiting MRRS to 256
    [    0.591692] pcieport 0000:00:00.0: Signaling PME with IRQ 108
    [    0.591953] pcieport 0000:00:00.0: AER enabled with IRQ 108
    [    0.593467] keystone-navigator-qmss soc:qmss@2a40000: qmgr start queue 0, number of queues 8192
    [    0.593668] keystone-navigator-qmss soc:qmss@2a40000: added qmgr start queue 0, num of queues 8192, reg_peek f0b40000, reg_status f0b25000, reg_config f0b27000, reg_region f0b29000, reg_push f0b80000, reg_pop f0bc0000
    [    0.593938] keystone-navigator-qmss soc:qmss@2a40000: firmware file ks2_qmss_pdsp_acc48.bin downloaded for PDSP
    [    0.597314] keystone-navigator-dma soc:knav_dmas@0: DMA dma_gbe registered 149 logical channels, flows 128, tx chans: 21, rx chans: 91
    [    0.662741] Serial: 8250/16550 driver, 10 ports, IRQ sharing disabled
    [    0.662840] platform serial8250: set dma_pfn_offset00780000
    [    0.666691] 2530c00.serial: ttyS0 at MMIO 0x2530c00 (irq = 26, base_baud = 12500000) is a TI DA8xx/66AK2x
    [    0.694075] brd: module loaded
    [    0.704904] loop: module loaded
    [    0.707166] platform Fixed MDIO bus.0: set dma_pfn_offset00780000
    [    0.707456] mdio_bus fixed-0: GPIO lookup for consumer reset
    [    0.707463] mdio_bus fixed-0: using lookup tables for GPIO lookup
    [    0.707470] mdio_bus fixed-0: lookup for GPIO reset failed
    [    0.707493] libphy: Fixed MDIO Bus: probed
    [    0.709246] mdio_bus 26200f00.mdio: GPIO lookup for consumer reset
    [    0.709253] mdio_bus 26200f00.mdio: using device tree for GPIO lookup
    [    0.709263] of_get_named_gpiod_flags: can't parse 'reset-gpios' property of node '/soc/mdio@26200f00[0]'
    [    0.709270] of_get_named_gpiod_flags: can't parse 'reset-gpio' property of node '/soc/mdio@26200f00[0]'
    [    0.709276] mdio_bus 26200f00.mdio: using lookup tables for GPIO lookup
    [    0.709282] mdio_bus 26200f00.mdio: lookup for GPIO reset failed
    [    0.760973] davinci_mdio 26200f00.mdio: davinci mdio revision 1.7, bus freq 2500000
    [    0.760980] libphy: 26200f00.mdio: probed
    [    0.765317] davinci_mdio 26200f00.mdio: phy[0]: device 26200f00.mdio:00, driver Marvell 88E1510
    [    0.765950] netcp-1.0 2620110.netcp: initialized cpsw ale version 1.4
    [    0.765957] netcp-1.0 2620110.netcp: ALE Table size 1024
    [    0.765990] netcp-1.0 2620110.netcp: cpts: overflow check period 350 (jiffies)
    [    0.766000] netcp-1.0 2620110.netcp: CPTS: ref_clk_freq:600000000 calc_mult:3579139413 calc_shift:31 error:-1 nsec/sec
    [    0.766776] netcp-1.0 2620110.netcp: module(netcp-xgbe) not used for device
    [    0.766989] i2c /dev entries driver
    [    0.767580] IR NEC protocol handler initialized
    [    0.767585] IR RC5(x/sz) protocol handler initialized
    [    0.767589] IR RC6 protocol handler initialized
    [    0.767594] IR JVC protocol handler initialized
    [    0.767598] IR Sony protocol handler initialized
    [    0.767602] IR SANYO protocol handler initialized
    [    0.767607] IR Sharp protocol handler initialized
    [    0.767611] IR MCE Keyboard/mouse protocol handler initialized
    [    0.767615] IR XMP protocol handler initialized
    [    0.769150] sdhci: Secure Digital Host Controller Interface driver
    [    0.769154] sdhci: Copyright(c) Pierre Ossman
    [    0.769320] sdhci-pltfm: SDHCI platform and OF driver helper
    [    0.771823] platform snd-soc-dummy: set dma_pfn_offset00780000
    [    0.773292] NET: Registered protocol family 10
    [    0.784506] Segment Routing with IPv6
    [    0.784588] sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
    [    0.785247] NET: Registered protocol family 17
    [    0.785521] Key type dns_resolver registered
    [    0.785662] Registering SWP/SWPB emulation handler
    [    0.794672] hctosys: unable to open rtc device (rtc0)
    [    0.805117] ALSA device list:
    [    0.805122]   No soundcards found.
    [    0.807861] Freeing unused kernel memory: 2048K
    [    1.454503] random: dd: uninitialized urandom read (512 bytes read)
    [    1.623527] netcp-1.0 2620110.netcp eth0: Link is Down
    [    1.626918] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
    [    1.690480] random: dropbear: uninitialized urandom read (32 bytes read)
    [    4.872333] netcp-1.0 2620110.netcp eth0: Link is Up - 1Gbps/Full - flow control rx/tx
    [    4.872361] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
    

    The DTS files are scattered, because I had to copy and modify these files from the original Linux-RT SDK:

    • keystone-k2l-evm.dts
    • keystone-k2l-netcp.dtsi

    This is the modified version of the keystone-k2l-netcp.dtsi file:


    /*
     * Device Tree Source for Keystone 2 Lamarr Netcp driver
     *
     * Copyright 2015 Texas Instruments, Inc.
     *
     * This program is free software; you can redistribute it and/or modify
     * it under the terms of the GNU General Public License version 2 as
     * published by the Free Software Foundation.
     */
    
    qmss: qmss@2a40000 {
        compatible = "ti,keystone-navigator-qmss";
        dma-coherent;
        #address-cells = <1>;
        #size-cells = <1>;
        clocks = <&chipclk13>;
        ranges;
        queue-range    = <0 0x2000>;
        linkram0    = <0x100000 0x4000>;
        linkram1    = <0x70000000 0x10000>; /* 1MB OSR mem */
    
        qmgrs {
            #address-cells = <1>;
            #size-cells = <1>;
            ranges;
            qmgr0 {
                managed-queues = <0 0x2000>;
                reg = <0x2a40000 0x20000>,
                      <0x2a06000 0x400>,
                      <0x2a02000 0x1000>,
                      <0x2a03000 0x1000>,
                      <0x23a80000 0x20000>,
                      <0x2a80000 0x20000>;
                reg-names = "peek", "status", "config",
                        "region", "push", "pop";
            };
        };
        queue-pools {
            qpend {
                qpend-0 {
                    qrange = <658 8>;
                    interrupts =<0 40 0xf04 0 41 0xf04 0 42 0xf04
                             0 43 0xf04 0 44 0xf04 0 45 0xf04
                             0 46 0xf04 0 47 0xf04>;
                };
                qpend-1 {
                    qrange = <528 16>;
                    interrupts = <0 48 0xf04 0 49 0xf04 0 50 0xf04
                              0 51 0xf04 0 52 0xf04 0 53 0xf04
                              0 54 0xf04 0 55 0xf04 0 56 0xf04
                              0 57 0xf04 0 58 0xf04 0 59 0xf04
                              0 60 0xf04 0 61 0xf04 0 62 0xf04
                              0 63 0xf04>;
                    qalloc-by-id;
                };
                qpend-2 {
                    qrange = <544 16>;
                    interrupts = <0 64 0xf04 0 65 0xf04 0 66 0xf04
                              0 59 0xf04 0 68 0xf04 0 69 0xf04
                              0 70 0xf04 0 71 0xf04 0 72 0xf04
                              0 73 0xf04 0 74 0xf04 0 75 0xf04
                              0 76 0xf04 0 77 0xf04 0 78 0xf04
                              0 79 0xf04>;
                };
            };
            general-purpose {
                gp-0 {
                    qrange = <4000 64>;
                };
                netcp-tx {
                    qrange = <896 128>;
                    qalloc-by-id;
                };
            };
            accumulator {
                acc-low-0 {
                    qrange = <480 32>;
                    accumulator = <0 47 16 2 50>;
                    interrupts = <0 226 0xf01>;
                    multi-queue;
                };
            };
        };
    
        descriptor-regions {
            #address-cells = <1>;
            #size-cells = <1>;
            ranges;
            region-12 {
                id = <12>;
                region-spec = <8192 128>;    /* num_desc desc_size */
                link-index = <0x4000>;
            };
        };
    
        pdsps {
            #address-cells = <1>;
            #size-cells = <1>;
            ranges;
            pdsp0@0x2a10000 {
                reg = <0x2a10000 0x1000    /*iram */
                       0x2a0f000 0x100     /*reg*/
                       0x2a0c000 0x3c8       /*intd */
                       0x2a20000 0x4000>;  /*cmd*/
                id = <0>;
            };
        };
    
    }; /* qmss */
    
    knav_dmas: knav_dmas@0 {
        compatible = "ti,keystone-navigator-dma";
        clocks = <&papllclk>;
        #address-cells = <1>;
        #size-cells = <1>;
        ranges;
        ti,navigator-cloud-address = <0x23a80000 0x23a90000>;
    
        dma_gbe: dma_gbe@0 {
            reg = <0x26186000 0x100>,
                  <0x26187000 0x2a0>,
                  <0x26188000 0xb60>,
                  <0x26186100 0x80>,
                  <0x26189000 0x1000>;
            reg-names = "global", "txchan", "rxchan",
                    "txsched", "rxflow";
            ti,enable-all;
        };
    };
    
    gbe_subsys: subsys@26200000 {
        compatible = "syscon";
        reg = <0x26200000 0x100>;
    };
    
    gbe_serdes0: phy@232a000 {
        compatible        = "ti,keystone-serdes-gbe";
        reg            = <0x0232a000 0x2000>;
        status            = "disabled";
        link-rate-kbps        = <1250000>;
        num-lanes        = <1>;
        #address-cells    = <1>;
        #size-cells    = <0>;
    
        serdes0_lane0: lane@0 {
            #phy-cells    = <0>;
            reg        = <0>;
            status        = "okay";
            control-rate    = <2>;
            rx-start    = <7 5>;
            rx-force    = <1 1>;
            tx-coeff    = <0 0 0 12 4>;
        };
    };
    
    netcp: netcp@26000000 {
        reg = <0x2620110 0x8>;
        reg-names = "efuse";
        compatible = "ti,netcp-1.0";
        #address-cells = <1>;
        #size-cells = <1>;
    
        /* NetCP address range */
        ranges = <0 0x26000000 0x1000000>;
    
        clocks = <&clkpa>, <&clkcpgmac>, <&chipclk12>;
        clock-names = "pa_clk", "ethss_clk", "cpts";
        dma-coherent;
    
        ti,navigator-dmas = <&dma_gbe 0>, <&dma_gbe 0>;
        ti,navigator-dma-names = "netrx0", "nettx";
    
        netcp-devices {
            #address-cells = <1>;
            #size-cells = <1>;
            ranges;
            gbe@200000 { /* ETHSS */
                label = "netcp-gbe";
                compatible = "ti,netcp-gbe-5";
                syscon-subsys = <&gbe_subsys>;
                reg = <0x200100 0x400>, <0x220000 0x20000>;
                tx-queue = <896>;
                tx-channel = "nettx";
    
                interfaces {
                    gbe0: interface-0 {
                        phys        = <&serdes0_lane0>;
                        slave-port    = <0>;
                        link-interface    = <1>;
                        phy-handle    = <&ethphy0>;
                    };
                };
            };
        };
    
        netcp-interfaces {
            interface-0 {
                rx-channel = "netrx0";
                rx-pool = <1024 12>;
                tx-pool = <1024 12>;
                rx-queue-depth = <128 128 0 0>;
                rx-buffer-size = <1518 4096 0 0>;
                rx-queue = <528>;
                tx-completion-queue = <530>;
                efuse-mac = <1>;
                netcp-gbe = <&gbe0>;
    
            };
        };
    };
    
    sa_subsys: subsys@26080000 {
        #address-cells = <1>;
        #size-cells = <1>;
        compatible = "simple-bus";
        ranges = <0 0x26080000 0x40000>;
        dma-coherent;
        dma-ranges;
    
        sa_config: subsys@0 {
            compatible = "syscon";
            reg = <0x0 0x100>;
        };
    
        rng@24000 {
            compatible = "ti,keystone-rng";
            reg = <0x24000 0x1000>;
            ti,syscon-sa-cfg = <&sa_config>;
            clocks = <&clksa>;
            clock-names = "fck";
        };
    };

    You can see that I've removed the gbe_serdes1 node. This node hides the pcie0_phy node located in the keystone.dtsi file, so the PCIe link can't be established because of the wrong SerDes setup. I've also removed the second NetCP interface and the corresponding Navigator DMA. Maybe I've touched something vital here.

    This is the modified version of the keystone-k2l-evm.dts file:


    /*
     * K10SV device tree
     */
    /dts-v1/;
    
    #include "keystone.dtsi"
    #include "keystone-k2l-k10sv.dtsi"
    
    / {
        compatible = "k10sv", "ti,k2l", "ti,keystone";
        model = "k10sv processing module";
    
        soc {
            clocks {
                refclksys: refclksys {
                    #clock-cells = <0>;
                    compatible = "fixed-clock";
                    clock-frequency = <100000000>;
                    clock-output-names = "refclk-sys";
                };
            };
        };
    };
    
    &usb_phy {
        status = "disabled";
    };
    
    &keystone_usb0 {
        status = "disabled";
    };
    
    &usb0 {
        status = "disabled";
    };
    
    &i2c0 {
        status = "disabled";
    };
    
    &i2c1 {
        status = "disabled";
    };
    
    &i2c2 {
        status = "disabled";
    };
    
    &uart1 {
        status = "disabled";
    };
    
    &uart2 {
        status = "disabled";
    };
    
    &uart3 {
        status = "disabled";
    };
    
    &spi0 {
        status = "disabled";
    };
    
    &spi1 {
        status = "disabled";
    };
    
    &mdio {
        status = "okay";
        ethphy0: ethernet-phy@0 {
            compatible = "marvell,88E1514", "marvell,88E1510", "ethernet-phy-ieee802.3-c22";
            reg = <0>;
        };
    };
    
    &pcie0_phy {
        clocks = <&clkpcie>;
        clock-names = "fck";
        status = "okay";
        num-lanes = <1>;
    };
    
    &pcie0 {
        ti,syscon-pcie-id = <&pcie_devid>;
        ti,syscon-pcie-mode = <&pcie_mode>;
        dma-coherent;
        num-ob-windows = <32>;
        num-lanes = <1>;
        status = "okay";
    };
    
    &gbe_serdes0 {
        status = "okay";
    };

    Here I've disabled unused peripherals and added parameters to the pcie0 and pcie0_phy nodes.

    Best Regards,
    Yurii

  • I've tried the latest published Processor SDK and the issue is still there: the kernel crashes after the cable is connected.

    My previous observations were incorrect: packet data is partially valid for a short time after the cable is connected. Partially, because only the first four bytes of the Ethernet header (destination address) are correct. After that comes the UDP payload.

    It seems that the kernel runs out of network buffers. Maybe they are not returned to the TX free queue when the cable is disconnected?

    Best Regards,
    Yurii

  • Hi Yurii,

    It seems that the kernel runs out of network buffers. Maybe they are not returned to the TX free queue when the cable is disconnected?

    It seems possible. Let me consult the design team.

    Best Regards,
    Yordan

  • Hi Yordan,

    Another observation: this behavior affects only UDP/ICMP (i.e. connectionless) transmission from the SoC to the network. Flood pinging a remote PC (ping -i 0.001 192.168.0.101) also causes a kernel freeze some time after the cable is connected. At the same time, when I transfer large files over SSH, everything works seamlessly.

    I've tried filtering out UDP/ICMP packets in the netcp_ndo_start_xmit function (simply dropping the skbs), and the kernel keeps running after a link down/up sequence. Now I'll try to step into the NetCP internals.
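The classification behind such a filter can be illustrated in user space. The sketch below is not the kernel change described above and does not use skb structures; it is a stand-in that inspects a raw, untagged Ethernet II frame and reports whether it carries IPv4 with protocol ICMP (1) or UDP (17). The function name `isConnectionlessIpv4` is hypothetical, and VLAN tags and IPv6 are deliberately ignored.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// True when an untagged Ethernet II frame carries an IPv4 packet whose
// protocol field is ICMP (1) or UDP (17). Anything else returns false.
bool isConnectionlessIpv4(const std::vector<std::uint8_t>& frame) {
    constexpr std::size_t kEthHeaderLen = 14;   // dst MAC + src MAC + EtherType
    constexpr std::size_t kMinIpv4Header = 20;  // IPv4 header without options
    constexpr std::size_t kIpProtoOffset = 9;   // protocol byte inside IPv4 header

    if (frame.size() < kEthHeaderLen + kMinIpv4Header) {
        return false;  // too short to hold an IPv4 header
    }
    const unsigned etherType = (frame[12] << 8) | frame[13];
    if (etherType != 0x0800) {
        return false;  // not IPv4
    }
    if ((frame[kEthHeaderLen] >> 4) != 4) {
        return false;  // IP version nibble is not 4
    }
    const std::uint8_t proto = frame[kEthHeaderLen + kIpProtoOffset];
    return proto == 1 || proto == 17;
}
```

TCP (protocol 6) falls through as false, which matches the observation that SSH transfers were unaffected while UDP and ICMP traffic triggered the freeze.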

    Best Regards,
    Yurii

  • I've also figured out that this problem affects kernels v4.14 (PSDK 05.01.00.11) and v4.19 (PSDK 06.01.00.08), but not kernel v4.9 from the latest v4 PSDK (04.03.00.05). So the problem was introduced between v4.9 and v4.14. I'll try to bisect this in a few days.

    Best Regards,
    Yurii

  • Hi All!

    It turns out that ti-rt-linux-4.9.y and ti-rt-linux-4.14.y have a common ancestor in (surprise) the mainline Linux v4.9 commit, and the ti-rt-linux-4.9.y branch contains the commit

    commit b5376be4f653d003d173a7e9e9db4b0528b04e3f
    Author: WingMan Kwok <w-kwok2@ti.com>
    Date:   Fri Jan 20 15:34:37 2017 -0500

        net: netcp: fix K2L/E system lockup caused by gbe cable unplug

    which is missing in ti-rt-linux-4.14.y.

    I've merged the NetCP changes and the packet accelerator framework from ti-rt-linux-4.9 into ti-rt-linux-4.14 and now everything works flawlessly.

    Best Regards,
    Yurii