Hello E2E Team,
we are facing an issue with the am6442 SR2 USB controller. The controller sometimes causes a system hang if we configure the USB device via our setup script. Only a hard power cycle helps to reset the system.
am6442 System:
Test setup:
Custom am6442 board connected to a Win10 PC via USB 2.0. The PC tries to establish an SSH connection via the RNDIS Ethernet device. After successfully establishing the connection, the Windows system sends a reboot command to the device. After the power cycle, the Windows system tries to connect again via USB RNDIS. The USB cable is not removed during the test; the Windows 10 PC and the device stay connected for the whole test. The system hang is not reliably reproducible: sometimes it takes 7 reboots, sometimes 500+.
The following script configures the USB controller at Linux init time as a device, with CDC-ECM and RNDIS. The controller is configured as "otg" in the device-tree of the Linux-kernel.
Script excerpt:
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
# config 1 is for CDC
mkdir -p configs/c.1
echo "${NET_USB_ATTR}" > configs/c.1/bmAttributes
echo "${NET_USB_PWR}" > configs/c.1/MaxPower
mkdir -p "configs/c.1/strings/${CONFIG_USB_LANGID}"
echo "CDC" > "configs/c.1/strings/${CONFIG_USB_LANGID}/configuration"
mkdir -p functions/ecm.usb0
mkdir -p configs/c.2
echo "${NET_USB_ATTR}" > configs/c.2/bmAttributes
echo "${NET_USB_PWR}" > configs/c.2/MaxPower
mkdir -p "configs/c.2/strings/${CONFIG_USB_LANGID}"
echo "RNDIS" > "configs/c.2/strings/${CONFIG_USB_LANGID}/configuration"
mkdir -p functions/rndis.usb0
# On Windows 7 and later, the RNDIS 5.1 driver would be used by default,
# but it does not work very well. The RNDIS 6.0 driver works better. In
# order to get this driver to load automatically, we have to use a
# Microsoft-specific extension of USB.
echo "1" > os_desc/use
echo "${MS_VENDOR_CODE}" > os_desc/b_vendor_code
echo "${MS_QW_SIGN}" > os_desc/qw_sign
init_mac_leases 3
local host_mac; host_mac=$(get_mac "${NET_USB_INTERFACE}" "${NET_USB_CONFIGFILE}" "HOST_")
log_info "Initializing usb net for interface ${NET_USB_INTERFACE} with HOST_MAC ${host_mac} and serial number ${serial_number}"
echo "${host_mac}" > functions/rndis.usb0/host_addr
local mac; mac=$(get_mac "${NET_USB_INTERFACE}" "${NET_USB_CONFIGFILE}")
log_info "Setting up interface ${NET_USB_INTERFACE} with IP ${NET_USB_IP} and MAC ${mac}"
echo "${mac}" > functions/ecm.usb0/dev_addr
echo "${mac}" > functions/ecm.usb0/host_addr
echo "${mac}" > functions/rndis.usb0/dev_addr
echo "${mac}" > functions/rndis.usb0/host_addr
echo "${MS_COMPAT_ID}" > functions/rndis.usb0/os_desc/interface.rndis/compatible_id
echo "${MS_SUBCOMPAT_ID}" > functions/rndis.usb0/os_desc/interface.rndis/sub_compatible_id
ln -s functions/ecm.usb0 configs/c.1
ln -s functions/rndis.usb0 configs/c.2
ln -s configs/c.2 os_desc
# add appropriate usb device to the gadget (AM64x specific)
echo "f400000.usb" > UDC
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
After the "echo "f400000.usb" > UDC" command, the system hangs.
When the system hangs, Linux stops working; no communication via UART, Ethernet, or USB is possible.
We checked the silicon errata for the am6442 and added the fix for "i2409 — USB: USB2 PHY locks up due to short suspend"; sadly, that did not resolve the problem.
After reading the Cadence errata notes in the driver source (https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/tree/drivers/usb/cdns3/cdns3-gadget.c?h=ti-rt-linux-6.1.y-cicd),
we removed the power management code from the cdns3-plat.c driver:
Sadly, this did not help with the hang at init time. Removing the PM code lets the system boot, but there is no communication on the USB device, and reconfiguring the device via the script causes the system hang.
We found this E2E post, which may relate to our problem:
Best regards,
Stefan
Hi Stefan,
I have assigned your query to our expert. He is out of office today, so expect a response tomorrow when he is in office or early next week.
Apologies for the delay.
Best Regards,
Suren
Hi Stefan,
Urgency on this issue noted. Our responses may be slower than usual, as the key expert has limited access next week.
A few follow-up questions:
1) Is there any way to see whether a different host gives different results, such as Windows 11 or a Linux host?
2) Do you have more data on the failure: how many boards were tried, and how many are failing? Do you think it is some sort of marginality in software or silicon, or is every board currently susceptible to it with prolonged reboot tests?
3) Is power cycling done properly, with no noise or glitches, and no glitches in the software start-up sequence on the host side? Several customers do USB cable connect/disconnect via MCCI-like products for robustness testing. Is your robustness testing mostly around powering the host on and off?
4) Is it possible for you to replicate the issue on the TI EVM?
Hi Mukul,
Regarding your follow-up questions:
1: We currently evaluate only on Win10 systems, but we can set up a Linux (Debian) test as well.
2: We see the failure on different custom boards using the same kernel and rootfs. We have two different hardware (product) variants that use the am64 device, and the problem is seen on both. In total we have seen the problem on three custom boards. We have not run any other long-term USB tests. Sadly, the board stops working after the crash; even the Linux UART console buffer is not written out, so we do not have any snapshot of the error itself.
We configure the USB core for two different descriptors. Since we found the Cadence comment in the Linux driver regarding the buffer init, we now focus on the USB core hardware init. We also see the error if we configure only one descriptor.
3: The am64 board uses a PMIC for the core voltages and separate converters for the DDR4 voltages. The host PC is not power cycled, so the host keeps the same power status during the whole test. We do not disconnect the cable during the reboot test.
4: We have not tested the am64-evm board yet; we plan to build the test with the am64-evm.
BR
Stefan
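Since the console buffer is lost when the hang occurs, one way to narrow down the failing step is to write a marker to persistent storage and flush it before each gadget setup step. The following is only a minimal sketch: the helper name mark_step and the log path are assumptions, not part of the setup script in this thread.

```shell
#!/bin/sh
# Hypothetical helper: record a breadcrumb and flush it to storage
# before running a step that might hang the system.
STEP_LOG="${STEP_LOG:-/var/log/usb-gadget-steps.log}"  # assumed path

mark_step() {
    # $1: free-form description of the step about to run
    printf '%s %s\n' "$(date +%s)" "$1" >> "$STEP_LOG"
    sync  # flush so the marker survives a hard power cycle
}
```

It could be called as `mark_step "bind UDC"` right before the `echo ... > UDC` line and again afterwards; after a power cycle, the last marker in the log shows the step that never completed.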
Hi Stefan,
I am in a full-day training this week and next week, and don't have bandwidth for any EVM hands-on work. But please let me know once you are able to replicate the issue on the AM64x GPEVM; then I will look into it.
I do remember we had a customer who had an AM64x USB device mode issue in Linux (but I cannot recall at the moment whether it was a dead lockup like the one you observed). But the issue only happened when USB0 dr_mode in the kernel device tree was configured as "peripheral"; it didn't happen with dr_mode = "otg". ("otg" is the default setting for USB0 in the SDK kernel.)
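To confirm which dr_mode the running kernel actually uses, the property can be read back from the live device tree. This is a minimal sketch: the function name show_dr_mode is made up for illustration, and where the property sits in the tree depends on the board's device tree, so the function simply searches for it.

```shell
#!/bin/sh
# List every dr_mode property found in the live device tree.
# Device-tree string properties are NUL-terminated, so strip the NUL.
show_dr_mode() {
    # $1: device-tree root (defaults to the usual procfs location)
    dt_root="${1:-/proc/device-tree}"
    # The trailing slash makes find descend even if the root is a symlink.
    find "${dt_root%/}/" -name dr_mode 2>/dev/null | while read -r prop; do
        printf '%s: %s\n' "$prop" "$(tr -d '\0' < "$prop")"
    done
}
```

Running `show_dr_mode` on the target prints one line per USB node that defines the property.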
Hi Bin,
we have currently set up two am64-evm boards connected to different PC systems; we came back with the test results. We run the USB controller in dr_mode = "otg".
BR
Stefan
Hi Stefan,
we came back with the test results.
Do you mean you are able to reproduce the issue on two AM64x-EVM?
Hi Bin,
Hi Mukul,
yes, we are able to reproduce the error on different am64-evm boards with different PC systems. Here are our setups:
Setup 1: (Win 10 with VMWare Debian)
The reboot script is running in the VMWare
am64-evm
Custom Hardware
Setup 2: (Win 10)
The reboot script is running on Windows
2 different am64-evm boards
BR
Stefan
Hi Stefan,
Thanks for the details. I am still in training this week, and will be back at work next week to try to reproduce the issue on my EVM and look into it. Due to the work accumulated during these two weeks of training, my progress will be slower next week. I will keep you posted.
Hi Stefan,
Setup 1: (Win 10 with VMWare Debian)
The reboot script is running in the VMWare
It appears that in this setup the AM64x USB RNDIS gadget is enumerated by, and communicates with, the Debian system running in VMWare on Win10, not Win10 itself, right?
I am asking because I don't have a Windows PC to test with; I only have access to a Linux PC to connect my AM64x GPEVM to.
Hi Stefan,
Thanks for confirming. I will try to replicate the issue with AM64x GPEVM connecting with Linux USB host early next week.
Hi Stefan,
Sorry for the late response.
I am unable to reproduce the issue on my AM64x GPEVM. I don't see which Processor SDK version uses kernel v6.1.20, so I used SDK v9.0.0.3 which has kernel v6.1.33.
I see your USB gadget config uses ECM and RNDIS gadget functions, so I used the following USB gadget config script to create the composite gadget with both ECM and RNDIS functions. It generates two USB ethernet interfaces on both the EVM and Linux PC. I then assigned different subnet IP addresses to both (usb0: 192.168.3.x, usb1: 192.168.4.x), and I can ping and ssh to the EVM.
After running "reboot" on the EVM in the ssh session and repeating the same setup, I can ssh from the Linux PC to the EVM again using either usb0 or usb1.
Do you see anything I did differently from what you tested?
#!/bin/bash
# $1: -d - tear down

#FUNCS=("mass_storage.usb0")
#FUNCS=("hid.usb0")
#FUNCS=("uvc.usb0")
#FUNCS=("uac1.usb0")
#FUNCS=("uac2.usb0")
#FUNCS=("SourceSink.usb0")
#FUNCS=("uvc.usb0" "hid.usb0")
#FUNCS=("acm.usb0" "ncm.usb0" "acm.usb1")
#FUNCS=("uac1.usb0" "hid.usb0")
#FUNCS=("uac2.usb0" "hid.usb0")
#FUNCS=("acm.usb0" "acm.usb1")
FUNCS=("ecm.usb0" "rndis.usb0")

CFS=/sys/kernel/config/usb_gadget
VID="0x1d6d"
PID="0x0104"
LANG="0x409"
GNAME=g1
RNDIS_DEV_ADDR="12:22:33:44:55:66"
RNDIS_HOST_ADDR="12:22:33:44:55:65"
ECM_DEV_ADDR="12:22:33:44:55:68"
ECM_HOST_ADDR="12:22:33:44:55:67"

hid_report="\\x05\\x01\\x09\\x06\\xa1\\x01\\x05\\x07\\x19\\xe0\\x29\\xe7\\x15\\x00\\x25\\x01\\x75\\x01\\x95\\x08\\x81\\x02\\x95\\x01\\x75\\x08\\x81\\x03\\x95\\x05\\x75\\x01\\x05\\x08\\x19\\x01\\x29\\x05\\x91\\x02\\x95\\x01\\x75\\x03\\x91\\x03\\x95\\x06\\x75\\x08\\x15\\x00\\x25\\x65\\x05\\x07\\x19\\x00\\x29\\x65\\x81\\x00\\xc0"

init() {
    zcat /proc/config.gz | grep 'CONFIGFS_FS=' > /dev/null || exit 1
    lsmod | grep libcomposite > /dev/null || modprobe libcomposite || exit 2
    mount | grep configfs > /dev/null || mount -t configfs none $(dirname $CFS) || exit 3
}

create_gadget() {
    [ ! -d ${CFS}/${GNAME} ] || exit 5
    mkdir ${CFS}/${GNAME} && cd ${CFS}/${GNAME} || exit 6
    echo "$VID" > idVendor
    echo "$PID" > idProduct
    mkdir strings/$LANG
    echo "0123456789" > strings/$LANG/serialnumber
    echo "Foo Inc" > strings/$LANG/manufacturer
    echo "Bar gadget" > strings/$LANG/product
}

# configuration naming: configs/<name>.<number>
create_config() {
    [ -d ${CFS}/${GNAME} ] && cd ${CFS}/${GNAME} || exit 5
    mkdir configs/c.1
    mkdir configs/c.1/strings/$LANG
    echo "conf1" > configs/c.1/strings/$LANG/configuration
}

# $1 - function name
# function naming: functions/<name>.<instance_name>
create_func_single() {
    local _func=$1

    mkdir functions/${_func} || return

    case $_func in
    "mass_storage."*)
        local _msc=/dev/shm/gmsc-${_func#*.}.file
        [ -f $_msc ] || dd if=/dev/zero of=$_msc bs=1M count=32
        echo $_msc > functions/${_func}/lun.0/file
        ;;
    "hid."*)
        echo 1 > functions/${_func}/protocol
        echo 1 > functions/${_func}/subclass
        echo 8 > functions/${_func}/report_length
        echo -ne $hid_report > functions/${_func}/report_desc
        ;;
    "uvc."*)
        mkdir functions/${_func}/control/header/h
        cd functions/${_func}/control
        ln -s header/h class/fs
        ln -s header/h class/ss
        cd ${CFS}/${GNAME}

        _w=640
        _h=360
        _fps=30
        mkdir -p functions/${_func}/streaming/uncompressed/u/${_h}p
        cd functions/${_func}/streaming/uncompressed/u/${_h}p
        echo $_w > wWidth
        echo $_h > wHeight
        echo $((_h * _w * 2 * _fps)) > dwMaxBitRate
        echo $((_h * _w * 2 * _fps)) > dwMinBitRate
        echo $((_h * _w * 2)) > dwMaxVideoFrameBufferSize
        echo 333333 > dwFrameInterval
        echo 333333 > dwDefaultFrameInterval
        #666666
        #1000000
        #5000000
        cd ${CFS}/${GNAME}
        mkdir functions/${_func}/streaming/header/h
        cd functions/${_func}/streaming/header/h
        ln -s ../../uncompressed/u
        cd ../../class/fs
        ln -s ../../header/h
        cd ../../class/hs
        ln -s ../../header/h
        cd ../../class/ss
        ln -s ../../header/h
        cd ${CFS}/g1
        echo 1024 > functions/${_func}/streaming_maxpacket
        ;;
    "rndis."*)
        # echo $RNDIS_HOST_ADDR > functions/${_func}/host_addr
        # echo $RNDIS_DEV_ADDR > functions/${_func}/dev_addr
        # match MS built-in RNDIS driver
        echo EF > functions/${_func}/class
        echo 04 > functions/${_func}/subclass
        echo 01 > functions/${_func}/protocol
        ;;
    "ecm."*)
        # echo $ECM_HOST_ADDR > functions/${_func}/host_addr
        # echo $ECM_DEV_ADDR > functions/${_func}/dev_addr
        ;;
    esac

    ln -s functions/${_func} configs/c.1
}

activate() {
    local _udc

    _udc=$(ls /sys/class/udc/)
    # TODO check $_udc
    # case $port in
    #     "1") _udc=48890000.usb ;;
    #     "2") _udc=488d0000.usb;;
    # esac
    echo _ $_udc _
    echo "$_udc" > UDC
}

teardown() {
    # TODO: test hid & uvc
    local _ent

    [ -d ${CFS}/${GNAME} ] && cd ${CFS}/${GNAME} || exit 5
    echo "" > UDC
    for _ent in $(ls configs/c.1/); do
        [[ "$_ent" != "MaxPower" ]] || continue
        [[ "$_ent" != "bmAttributes" ]] || continue
        [[ "$_ent" != "strings" ]] || continue
        rm -f configs/c.1/$_ent
    done
    for _ent in $(ls functions/); do
        case $_ent in
        "uvc."*)
            rm -rf functions/$_ent/streaming 2>/dev/null
            ;;
        esac
        rm -rf functions/$_ent 2>/dev/null
    done
    rmdir configs/c.1/strings/$LANG
    rmdir configs/c.1
    rmdir strings/$LANG
    cd .. && rmdir $GNAME
    echo "toredown"
}

### MAIN ###
case "$1" in
    "-d") teardown; exit 0;;
    #"1") port=1;;
    #"2") port=2;;
    #*) echo "invalid param $1."; exit 1;;
esac

init
create_gadget
create_config
for func in ${FUNCS[*]}; do
    echo "creating $func ..."
    create_func_single $func
done
activate
echo created
Hi Bin,
Thank you for your support! Our setup differs slightly from the default image as we need to use the latest sysfw to enable pru_eth in MII mode. We encounter a hang issue after numerous reboots, sometimes up to 800. Initially, the reboots proceed without any problems or hangs.
Hi Stefan,
Thanks. Now I understand that the hang issue only happens after hundreds of reboots, but
Our setup differs slightly from the default image as we need to use the latest sysfw
Have you tested if the issue happens with the same sysfw from the SDK? I am trying to narrow the components for reproducing the symptom.
Hi Bin,
Have you tested if the issue happens with the same sysfw from the SDK? I am trying to narrow the components for reproducing the symptom.
Not yet, because of the MII topic. I can set up a test, but this needs some time (~1 week).
Hi Stefan,
I can set up a test, but this needs some time (~1 week).
Looking forward to the result.
Meanwhile, if there is anything I can do on my setup, please provide exact instructions based on any of the SDK versions. Please note that I have my own configfs script (attached above). The one you provided in your first post is missing the definitions of some variables.
Hello Bin,
I’ve attached the script from our Windows PC for your reference (Powershell script). We reboot the board by establishing an SSH connection via RNDIS. The purpose of this test is to verify if the USB interface is operational and if the board is up and ready.
Best Regards,
Stefan
# Path to Plink executable
$plinkPath = "plink"

# Remote host details
$remoteHost = "192.168.200.1"
$username = "root"
$password = "root"

# Counter for the number of reboots
$rebootCount = 0

# Function to establish SSH connection and reboot the remote system
function Reboot-RemoteSystem {
    param (
        [string]$rhost,
        [string]$user,
        [string]$pass
    )
    $command = "echo y | $plinkPath -ssh $user@$rhost -pw $pass reboot"
    $output = Invoke-Expression -Command $command
    if ($output -like "*reboot*") {
        return $true
    } else {
        return $false
    }
}

# Infinite loop to attempt connection, reboot, and count
while ($true) {
    $success = Reboot-RemoteSystem -rhost $remoteHost -user $username -pass $password
    if ($success) {
        Write-Output "The system did not reboot"
    } else {
        $rebootCount++
        Write-Output "Reboot count: $rebootCount"
    }
    # Wait 45 seconds before attempting to reconnect
    Start-Sleep -Seconds 45
}
Hi Stefan,
I don't have access to a Windows PC. My computers are all Linux based.
I modified my script attached above (usbconfigfs.sh.txt):
FUNCS=("ecm.usb0" "rndis.usb0")
to
FUNCS=("rndis.usb0")
so that the USB gadget has only the rndis function; my Linux PC can still enumerate it using the "rndis_host" driver.
I understand you have to use USB gadget configfs, but can you please try to reproduce the issue with a Linux host? Then I can replicate it on my side and debug it.
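On a Linux host, the driver bound to the gadget's network interface can be read from sysfs, which is a quick way to confirm that "rndis_host" (rather than, say, a CDC driver) picked up the device. This is a minimal sketch: the function name iface_driver is made up for illustration, and the interface name on the host has to be looked up first (for example with `ip link`).

```shell
#!/bin/sh
# Print the kernel driver bound to a network interface, e.g. "rndis_host".
iface_driver() {
    # $1: interface name; $2: sysfs net class dir (default /sys/class/net)
    net_dir="${2:-/sys/class/net}"
    link="$net_dir/$1/device/driver"
    [ -e "$link" ] || return 1
    # The "driver" entry is a symlink to the driver's directory in sysfs.
    basename "$(readlink -f "$link")"
}
```

For example, `iface_driver usb0` prints the driver name for usb0, or returns a non-zero status if the interface has no bound driver.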
# Path to Plink executable
$plinkPath = "plink"

# Remote host details
$remoteHost = "192.168.200.1"
$username = "root"
$password = "root"
#$password = ""

# Counter for the number of reboots
$rebootCount = 0

# Function to establish SSH connection and reboot the remote system
function Reboot-RemoteSystem {
    param (
        [string]$rhost,
        [string]$user,
        [string]$pass
    )
    $command = "echo y | $plinkPath -ssh $user@$rhost -pw $pass /sbin/reboot"
    #$command = "$plinkPath -ssh $user@$remoteHost -pw $pass reboot"
    #$command = "echo y | $plinkPath -ssh -l $user @$remoteHost reboot"
    $output = Invoke-Expression -Command $command
    if ($output -like "*reboot*") {
        return $true
    } else {
        return $false
    }
}

# Infinite loop to attempt connection, reboot, and count
while ($true) {
    $success = Reboot-RemoteSystem -rhost $remoteHost -user $username -pass $password
    if ($success) {
        Write-Output "The system did not reboot"
    } else {
        $rebootCount++
        Write-Output "Reboot count: $rebootCount"
    }
    # Wait 45 seconds before attempting to reconnect
    Start-Sleep -Seconds 45
}
#!/bin/sh
# RNDIS config for Win 7, Win 10, Linux and Mac OS X
set -e

# command line parameters
command="$1" # "start" or "stop"
udc_device="f400000.usb" # a udc device name, such as "musb-hdrc.1.auto"
config_home="/sys/kernel/config/"
g="/sys/kernel/config/usb_gadget/AM64xRef"

usb_up() {
    usb_ver="0x0200" # USB 2.0
    dev_class="2"
    vid="0x1919"
    pid="0x0815"
    device="0x3001"
    mfg="SICK AG"
    prod="AM64xRef"
    serial="12345682"
    attr="0xC0" # Self powered
    pwr="1" # 2mA
    cfg1="CDC"
    cfg2="RNDIS"
    # add colons for MAC address format
    mac="00:06:77:12:AB:50"
    dev_mac1="${mac}"
    host_mac1="${mac}"
    dev_mac2="${mac}"
    host_mac2="${mac}"
    ms_vendor_code="0xcd" # Microsoft
    ms_qw_sign="MSFT100" # also Microsoft (if you couldn't tell)
    ms_compat_id="RNDIS" # matches Windows RNDIS Drivers -> done
    ms_subcompat_id="5162001" # matches Windows RNDIS 6.0 Driver -> done

    modprobe libcomposite

    if [ ! -d ${config_home}/usb_gadget/ ]; then
        echo "Mount usb config dir.."
        mount none ${config_home} -t configfs
    else
        echo "Usb config dir mounted"
    fi

    if [ -d ${g} ]; then
        if [ "$(cat ${g}/UDC)" != "" ]; then
            echo "Gadget is already up."
            exit 1
        fi
        echo "Cleaning up old directory..."
        usb_down
    fi
    echo "Setting up gadget..."

    # Create a new gadget
    mkdir -p ${g}
    echo "${usb_ver}" > ${g}/bcdUSB
    echo "${dev_class}" > ${g}/bDeviceClass
    echo "${vid}" > ${g}/idVendor
    echo "${pid}" > ${g}/idProduct
    echo "${device}" > ${g}/bcdDevice
    mkdir -p ${g}/strings/0x409
    echo "${mfg}" > ${g}/strings/0x409/manufacturer
    echo "${prod}" > ${g}/strings/0x409/product
    echo "${serial}" > ${g}/strings/0x409/serialnumber

    # Create 2 configurations. The first will be CDC. The second will be RNDIS.
    # Thanks to os_desc, Windows should use the second configuration.

    # config 1 is for CDC
    mkdir -p ${g}/configs/c.1
    echo "${attr}" > ${g}/configs/c.1/bmAttributes
    echo "${pwr}" > ${g}/configs/c.1/MaxPower
    mkdir -p ${g}/configs/c.1/strings/0x409
    echo "${cfg1}" > ${g}/configs/c.1/strings/0x409/configuration

    # Create the CDC function
    mkdir -p ${g}/functions/ecm.usb0
    echo "${dev_mac1}" > ${g}/functions/ecm.usb0/dev_addr
    echo "${host_mac1}" > ${g}/functions/ecm.usb0/host_addr

    # config 2 is for RNDIS
    mkdir -p ${g}/configs/c.2
    echo "${attr}" > ${g}/configs/c.2/bmAttributes
    echo "${pwr}" > ${g}/configs/c.2/MaxPower
    mkdir -p ${g}/configs/c.2/strings/0x409
    echo "${cfg2}" > ${g}/configs/c.2/strings/0x409/configuration

    # On Windows 7 and later, the RNDIS 5.1 driver would be used by default,
    # but it does not work very well. The RNDIS 6.0 driver works better. In
    # order to get this driver to load automatically, we have to use a
    # Microsoft-specific extension of USB.
    echo "1" > ${g}/os_desc/use
    echo "${ms_vendor_code}" > ${g}/os_desc/b_vendor_code
    echo "${ms_qw_sign}" > ${g}/os_desc/qw_sign

    # Create the RNDIS function, including the Microsoft-specific bits
    mkdir -p ${g}/functions/rndis.usb0
    echo "${dev_mac2}" > ${g}/functions/rndis.usb0/dev_addr
    echo "${host_mac2}" > ${g}/functions/rndis.usb0/host_addr
    echo "${ms_compat_id}" > ${g}/functions/rndis.usb0/os_desc/interface.rndis/compatible_id
    echo "${ms_subcompat_id}" > ${g}/functions/rndis.usb0/os_desc/interface.rndis/sub_compatible_id

    # Link everything up and bind the USB device
    ln -s ${g}/functions/ecm.usb0 ${g}/configs/c.1
    ln -s ${g}/functions/rndis.usb0 ${g}/configs/c.2
    ln -s ${g}/configs/c.2 ${g}/os_desc
    echo "${udc_device}" > ${g}/UDC
    echo "Done."

    sleep 10
    ifconfig usb1 192.168.200.1 up
    sleep 10
    ifconfig usb1 192.168.200.1 up
}

usb_down() {
    if [ ! -d ${g} ]; then
        echo "Gadget is already down."
        exit 1
    fi
    echo "Taking down gadget..."

    # Have to unlink and remove directories in reverse order.
    # Checks allow to finish takedown after error.
    if [ "$(cat ${g}/UDC)" != "" ]; then
        echo "" > ${g}/UDC
    fi
    rm -f ${g}/os_desc/c.2
    rm -f ${g}/configs/c.2/rndis.usb0
    rm -f ${g}/configs/c.1/ecm.usb0
    [ -d ${g}/functions/ecm.usb0 ] && rmdir ${g}/functions/ecm.usb0
    [ -d ${g}/functions/rndis.usb0 ] && rmdir ${g}/functions/rndis.usb0
    [ -d ${g}/configs/c.2/strings/0x409 ] && rmdir ${g}/configs/c.2/strings/0x409
    [ -d ${g}/configs/c.2 ] && rmdir ${g}/configs/c.2
    [ -d ${g}/configs/c.1/strings/0x409 ] && rmdir ${g}/configs/c.1/strings/0x409
    [ -d ${g}/configs/c.1 ] && rmdir ${g}/configs/c.1
    [ -d ${g}/strings/0x409 ] && rmdir ${g}/strings/0x409
    rmdir ${g}
    echo "Done."
}

case ${command} in
start)
    usb_up
    ;;
stop)
    usb_down
    ;;
*)
    echo "Usage: start|stop"
    exit 1
    ;;
esac
[Unit]
Description=Setup RNDIS Devices at Boot
After=network.target

[Service]
Type=oneshot
ExecStart=/lib/systemd/setup-rndis.sh start
ExecStop=/lib/systemd/setup-rndis.sh stop
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
Hi Stefan,
I guess I didn't state it clearly.
I should be able to run any Linux setup on an AM64x EVM with instructions from you, but my challenge is on the USB host side, as I don't have access to a Windows PC.
So please keep the AM64x side software setup the same, and only change the test to use a Linux PC. If the issue also happens with the Linux PC, I can replicate it and debug.
Hi Bin,
I just finished the test with a Linux PC (Ubuntu 2022.02). After a few reboots (~50), the system hangs:
Log from ti am64-evm:
[ OK ] Finished File System Check on /dev/mmcblk1p1.
Mounting /run/media/boot-mmcblk1p1...
[ OK ] Mounted /run/media/boot-mmcblk1p1.
[ OK ] Started Network Configuration.
Starting Wait for Network to be Configured...
Starting Network Name Resolution...
[ 9.581008] remoteproc remoteproc15: powering up 300b4000.pru
[ 9.583266] remoteproc remoteproc15: Booting fw image ti-pruss/am65x-sr2-pru0-prueth-fw.elf, size 40816
[ 9.583310] remoteproc remoteproc15: unsupported resource 5
[ 9.583339] remoteproc remoteproc15: remote processor 300b4000.pru is now up
[ 9.583383] remoteproc remoteproc16: powering up 30084000.rtu
[ 9.584635] remoteproc remoteproc16: Booting fw image ti-pruss/am65x-sr2-rtu0-prueth-fw.elf, size 30888
[ 9.584692] remoteproc remoteproc16: remote processor 30084000.rtu is now up
[ 9.584731] remoteproc remoteproc7: powering up 3008a000.txpru
[ 9.586137] remoteproc remoteproc7: Booting fw image ti-pruss/am65x-sr2-txpru0-prueth-fw.elf, size 36672
[ 9.586198] remoteproc remoteproc7: remote processor 3008a000.txpru is now up
[ 9.588990] pps pps1: new PPS source ptp2
[ 9.658029] am65-cpsw-nuss 8000000.ethernet eth1: PHY [mdio_mux-0.1:03] driver [TI DP83869] (irq=POLL)
[ 9.658063] am65-cpsw-nuss 8000000.ethernet eth1: configuring for phy/rgmii-rxid link mode
[ 9.705028] am65-cpsw-nuss 8000000.ethernet eth0: PHY [8000f00.mdio:00] driver [TI DP83867] (irq=POLL)
[ 9.705562] am65-cpsw-nuss 8000000.ethernet eth0: configuring for phy/rgmii-rxid link mode
[ OK ] Started Network Name Resolution.
[ OK ] Reached target Network.
[ OK ] Reached target Host and Network Name Lookups.
Starting Avahi mDNS/DNS-SD Stack...
Starting Enable and configure wl18xx bluetooth stack...
Starting containerd container runtime...
[ OK ] Started Netperf Benchmark Server.
[ OK ] Started NFS status monitor for NFSv2/3 locking..
Starting Setup RNDIS Devices at Boot...
Starting Simple Network Ma…ent Protocol (SNMP) Daemon....
Starting Permit User Sessions...
[ OK ] Finished Enable and configure wl18xx bluetooth stack.
[ OK ] Started Avahi mDNS/DNS-SD Stack.
[ OK ] Finished Permit User Sessions.
[ OK ] Started Getty on tty1.
[ 10.361531] using random self ethernet address
[ 10.361546] using random host ethernet address
[ OK ] Started Serial Getty on ttyS2.
[ OK ] Reached target Login Prompts.
Starting Synchronize System and HW clocks...
[ 10.435187] using random self ethernet address
[ 10.435203] using random host ethernet address
[FAILED] Failed to start Synchronize System and HW clocks.
See 'systemctl status sync-clocks.service' for details.
[ 10.492666] usb0: HOST MAC 00:06:77:12:ab:50
[ 10.492687] usb0: MAC 00:06:77:12:ab:50
#!/bin/bash

# Remote host credentials
REMOTE_HOST="192.168.199.1"
REMOTE_USER="root"
REMOTE_PASSWORD="root"

# Initialize the reboot counter
REBOOT_COUNT=0

# Function to connect and reboot the remote host
reboot_remote_host() {
    sshpass -p $REMOTE_PASSWORD ssh -o StrictHostKeyChecking=no $REMOTE_USER@$REMOTE_HOST '/sbin/reboot'
    if [ $? -eq 0 ]; then
        REBOOT_COUNT=$((REBOOT_COUNT + 1))
        echo "Reboot successful. Current reboot count: $REBOOT_COUNT"
    else
        echo "Failed to connect or reboot the remote host."
        return 1
    fi
}

# Function to wait for the remote host to come back online
wait_for_host() {
    echo "Waiting for the remote host to come back online..."
    while true; do
        sshpass -p $REMOTE_PASSWORD ssh -o ConnectTimeout=5 -o StrictHostKeyChecking=no $REMOTE_USER@$REMOTE_HOST 'exit'
        if [ $? -eq 0 ]; then
            echo "Remote host is back online."
            return 0
        fi
        sleep 10
    done
}

# Main logic
while true; do
    ifconfig enx00067712ab50 192.168.199.10 up
    sleep 5
    echo "Attempting to connect and reboot the remote host..."
    reboot_remote_host
    sleep 15
    ifconfig enx00067712ab50 192.168.199.10 up
    if [ $? -eq 0 ]; then
        wait_for_host
        echo "Ready to reboot again."
    else
        echo "Retrying connection in 10 seconds..."
        sleep 10
    fi
done
#!/bin/sh
# RNDIS config for Win 7, Win 10, Linux and Mac OS X
set -e

# command line parameters
command="$1" # "start" or "stop"
udc_device="f400000.usb" # a udc device name, such as "musb-hdrc.1.auto"
config_home="/sys/kernel/config/"
g="/sys/kernel/config/usb_gadget/AM64xRef"

usb_up() {
    usb_ver="0x0200" # USB 2.0
    dev_class="2"
    vid="0x1919"
    pid="0x0815"
    device="0x3001"
    mfg="SICK AG"
    prod="AM64xRef"
    serial="12345682"
    attr="0xC0" # Self powered
    pwr="1" # 2mA
    cfg1="CDC"
    cfg2="RNDIS"
    # add colons for MAC address format
    mac="00:06:77:12:AB:50"
    dev_mac1="${mac}"
    host_mac1="${mac}"
    dev_mac2="${mac}"
    host_mac2="${mac}"
    ms_vendor_code="0xcd" # Microsoft
    ms_qw_sign="MSFT100" # also Microsoft (if you couldn't tell)
    ms_compat_id="RNDIS" # matches Windows RNDIS Drivers -> done
    ms_subcompat_id="5162001" # matches Windows RNDIS 6.0 Driver -> done

    modprobe libcomposite

    if [ ! -d ${config_home}/usb_gadget/ ]; then
        echo "Mount usb config dir.."
        mount none ${config_home} -t configfs
    else
        echo "Usb config dir mounted"
    fi

    if [ -d ${g} ]; then
        if [ "$(cat ${g}/UDC)" != "" ]; then
            echo "Gadget is already up."
            exit 1
        fi
        echo "Cleaning up old directory..."
        usb_down
    fi
    echo "Setting up gadget..."

    # Create a new gadget
    mkdir -p ${g}
    echo "${usb_ver}" > ${g}/bcdUSB
    echo "${dev_class}" > ${g}/bDeviceClass
    echo "${vid}" > ${g}/idVendor
    echo "${pid}" > ${g}/idProduct
    echo "${device}" > ${g}/bcdDevice
    mkdir -p ${g}/strings/0x409
    echo "${mfg}" > ${g}/strings/0x409/manufacturer
    echo "${prod}" > ${g}/strings/0x409/product
    echo "${serial}" > ${g}/strings/0x409/serialnumber

    # Create 2 configurations. The first will be CDC. The second will be RNDIS.
    # Thanks to os_desc, Windows should use the second configuration.

    # config 1 is for CDC
    mkdir -p ${g}/configs/c.1
    echo "${attr}" > ${g}/configs/c.1/bmAttributes
    echo "${pwr}" > ${g}/configs/c.1/MaxPower
    mkdir -p ${g}/configs/c.1/strings/0x409
    echo "${cfg1}" > ${g}/configs/c.1/strings/0x409/configuration

    # Create the CDC function
    mkdir -p ${g}/functions/ecm.usb0
    echo "${dev_mac1}" > ${g}/functions/ecm.usb0/dev_addr
    echo "${host_mac1}" > ${g}/functions/ecm.usb0/host_addr

    # config 2 is for RNDIS
    mkdir -p ${g}/configs/c.2
    echo "${attr}" > ${g}/configs/c.2/bmAttributes
    echo "${pwr}" > ${g}/configs/c.2/MaxPower
    mkdir -p ${g}/configs/c.2/strings/0x409
    echo "${cfg2}" > ${g}/configs/c.2/strings/0x409/configuration

    # On Windows 7 and later, the RNDIS 5.1 driver would be used by default,
    # but it does not work very well. The RNDIS 6.0 driver works better. In
    # order to get this driver to load automatically, we have to use a
    # Microsoft-specific extension of USB.
    echo "1" > ${g}/os_desc/use
    echo "${ms_vendor_code}" > ${g}/os_desc/b_vendor_code
    echo "${ms_qw_sign}" > ${g}/os_desc/qw_sign

    # Create the RNDIS function, including the Microsoft-specific bits
    mkdir -p ${g}/functions/rndis.usb0
    echo "${dev_mac2}" > ${g}/functions/rndis.usb0/dev_addr
    echo "${host_mac2}" > ${g}/functions/rndis.usb0/host_addr
    echo "${ms_compat_id}" > ${g}/functions/rndis.usb0/os_desc/interface.rndis/compatible_id
    echo "${ms_subcompat_id}" > ${g}/functions/rndis.usb0/os_desc/interface.rndis/sub_compatible_id

    # Link everything up and bind the USB device
    ln -s ${g}/functions/ecm.usb0 ${g}/configs/c.1
    ln -s ${g}/functions/rndis.usb0 ${g}/configs/c.2
    ln -s ${g}/configs/c.2 ${g}/os_desc
    echo "${udc_device}" > ${g}/UDC
    echo "Done."

    sleep 10
    ifconfig usb0 192.168.199.1 up
    ifconfig usb1 192.168.200.1 up
    sleep 10
    ifconfig usb0 192.168.199.1 up
    ifconfig usb1 192.168.200.1 up
}

usb_down() {
    if [ ! -d ${g} ]; then
        echo "Gadget is already down."
        exit 1
    fi
    echo "Taking down gadget..."

    # Have to unlink and remove directories in reverse order.
    # Checks allow to finish takedown after error.
    if [ "$(cat ${g}/UDC)" != "" ]; then
        echo "" > ${g}/UDC
    fi
    rm -f ${g}/os_desc/c.2
    rm -f ${g}/configs/c.2/rndis.usb0
    rm -f ${g}/configs/c.1/ecm.usb0
    [ -d ${g}/functions/ecm.usb0 ] && rmdir ${g}/functions/ecm.usb0
    [ -d ${g}/functions/rndis.usb0 ] && rmdir ${g}/functions/rndis.usb0
    [ -d ${g}/configs/c.2/strings/0x409 ] && rmdir ${g}/configs/c.2/strings/0x409
    [ -d ${g}/configs/c.2 ] && rmdir ${g}/configs/c.2
    [ -d ${g}/configs/c.1/strings/0x409 ] && rmdir ${g}/configs/c.1/strings/0x409
    [ -d ${g}/configs/c.1 ] && rmdir ${g}/configs/c.1
    [ -d ${g}/strings/0x409 ] && rmdir ${g}/strings/0x409
    rmdir ${g}
    echo "Done."
}

case ${command} in
start)
    usb_up
    ;;
stop)
    usb_down
    ;;
*)
    echo "Usage: start|stop"
    exit 1
    ;;
esac
Hi Stefan,
Bin is out of office until the end of next week. Please expect a delayed response.
Hi Stefan,
Thanks for the details. I am able to follow them and run the same test on my EVM. It has been running for 58 reboots now and hasn't triggered the issue yet. I will leave it running overnight.
Meanwhile, can you please apply the following kernel patch and test it on your setup to see if it resolves the issue? This patch fixes a kernel driver bug that generates a busy loop with the g_printer gadget driver, which leads to a Linux system hang on AM64x. I reviewed the kernel ECM and RNDIS gadget drivers, and the bug appears to be applicable here too.
Hi Stefan
I am guessing it is hard to tell whether the patch improved things at all if the variation is high? Are you doing these tests on your board(s), the EVM, or both?
Hi Stefan,
The controller sometimes causes a system hang if we configure the USB device via our setup script. Only a hard power cycle helps to reset the system.
Do you mean that a warm reset, such as grounding the RESET_REQz pin, doesn't reset the system when the lockup happens?
Hi Stefan,
Somehow the USB network between my AM64x EVM and the Linux PC is not reliable, and the test often got stuck in the sshpass calls. So I changed the PC side script to make the test more reliable (mainly by reducing the sleep times and running a ping command before sshpass).
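The idea of pinging before sshpass can be sketched roughly as follows. This is not the modified script itself: the function name wait_for_ping, the attempt count, and the one-second timeout are assumptions, and the `-c`/`-W` options are those of the Linux iputils ping.

```shell
#!/bin/sh
# Wait until the target answers a ping, then return 0; give up after
# a number of attempts and return 1.
wait_for_ping() {
    # $1: target IP; $2: max attempts (default 30)
    target="$1"
    attempts="${2:-30}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # One echo request, one-second reply timeout
        if ping -c 1 -W 1 "$target" > /dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}
```

The ssh call is then only made once `wait_for_ping 192.168.199.1` succeeds, so a missing link shows up as a bounded timeout rather than a stuck sshpass process.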
I am not a network expert, but I am concerned about the following kernel message on the Linux PC side:
IPv6: usb0: IPv6 duplicate address fe80::206:77ff:fe12:ab50 used by 00:06:77:12:ab:50 detected!
So I changed the EVM script setup-rndis.sh to use different MAC addresses on the EVM and the PC, as follows, to remove this kernel message on the host. Please let me know if this change is OK.
mac1="00:06:77:12:AB:50"
mac2="00:06:77:12:AB:51"
dev_mac1="${mac1}"
host_mac1="${mac2}"
dev_mac2="${mac1}"
host_mac2="${mac2}"
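The duplicate-address message follows from how an EUI-64 based IPv6 link-local address is derived from the MAC: the universal/local bit of the first byte is flipped and ff:fe is inserted in the middle, so identical device and host MACs yield identical link-local addresses on both ends of the link. A minimal sketch of that derivation, assuming EUI-64 address generation is in use on the interface (the function name is made up for illustration):

```shell
#!/bin/sh
# Derive the EUI-64 based IPv6 link-local address from a MAC address.
mac_to_link_local() {
    # $1: MAC address such as 00:06:77:12:AB:50
    # Lower-case the hex digits and split the six bytes into $1..$6.
    set -- $(printf '%s' "$1" | tr 'A-F:' 'a-f ')
    # Flip the universal/local bit (0x02) of the first byte.
    b1=$(printf '%02x' $(( 0x$1 ^ 0x02 )))
    # Insert ff:fe in the middle; print the four 16-bit groups
    # without leading zeros.
    printf 'fe80::%x:%x:%x:%x\n' \
        "0x${b1}$2" "0x${3}ff" "0xfe$4" "0x$5$6"
}

mac_to_link_local 00:06:77:12:AB:50   # prints fe80::206:77ff:fe12:ab50
```

The printed address matches the one in the kernel message above; with the host side changed to 00:06:77:12:AB:51, the two ends no longer collide.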
Hi Stefan, here is an update from my side:
I am able to reproduce the reboot lockup multiple times. Most times (20+) the lockup happened during the Linux shutdown phase, but twice I saw it happen during Linux bootup (at setup-rndis.sh start).
I then further simplified the test setup: I do not use the host ssh connection, but just repeatedly run "setup-rndis.sh start" and "setup-rndis.sh stop" without rebooting Linux, while keeping the USB cable connected to the USB host. This makes the lockup happen very quickly (within a few minutes).
I also removed the RNDIS gadget function from setup-rndis.sh so that only ECM is used. The lockup still happens.
The patch 0001-usb-cdns3-fix-linked-list-corruption.patch I provided here last Thursday doesn't seem to help; the lockup still happens with this patch in my test.
I am now able to reliably and quickly reproduce the lockup, and will start to debug it from Monday.
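The simplified start/stop test could look roughly like the following sketch. The exact loop used in the test above was not posted, so the function name, iteration count, and pause between steps are assumptions.

```shell
#!/bin/sh
# Repeatedly bind and unbind the gadget without rebooting Linux.
stress_gadget() {
    # $1: path to the setup script; $2: iterations; $3: pause in seconds
    script="$1"
    count="${2:-1000}"
    pause="${3:-2}"
    i=1
    while [ "$i" -le "$count" ]; do
        echo "iteration $i: start"
        "$script" start || return 1
        sleep "$pause"
        echo "iteration $i: stop"
        "$script" stop || return 1
        sleep "$pause"
        i=$((i + 1))
    done
}
```

For example, `stress_gadget /lib/systemd/setup-rndis.sh 500 1` runs 500 start/stop cycles with the USB cable left connected; the last iteration number printed on the console indicates how far the test got before a lockup.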
Hi Stefan,
The lockup appears to happen at random places in either cdns3_gadget_udc_start() or cdns3_gadget_udc_stop(), when the USB configfs script does echo "${udc_device}" > ${g}/UDC or echo "" > ${g}/UDC respectively. I will continue debugging...
Hi Bin,
we see the system hang at the same point:
After the "echo "f400000.usb" > UDC" command, the system hangs.
as mentioned in the first post.
Hi Stefan,
Yes, about half of the lockup cases I see happen on "echo f400000.usb > UDC", which triggers the kernel function cdns3_gadget_udc_start(). I also see that JTAG is unable to connect to DDR or any other AM64x peripherals when the lockup happens. It seems the SoC is already in a bad state at that point. I am continuing to debug the issue, but I am not sure how soon the root cause will be discovered and the issue resolved. Would you consider using a watchdog timer in Linux to reset the system when the lockup happens? This could be plan B in your project until the issue is resolved.
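A user-space watchdog feeder can be sketched as below, using the standard Linux watchdog character device: opening it arms the timer, each write resets it, and writing "V" before closing disarms it on drivers that support magic close (unless nowayout is set). The function name, default device path, and feed interval are assumptions; the interval has to be shorter than the configured watchdog timeout, and only one process can hold the device open, so this does not combine with a systemd-managed watchdog on the same device.

```shell
#!/bin/sh
# Keep a watchdog device alive from user space. If this loop stops
# running (for example because the system locks up), the watchdog
# expires and resets the SoC.
feed_watchdog() {
    # $1: watchdog device; $2: feed interval in seconds;
    # $3: number of feeds (0 = run forever)
    dev="${1:-/dev/watchdog}"
    interval="${2:-5}"
    count="${3:-0}"
    [ -w "$dev" ] || return 1
    exec 3> "$dev"               # opening the device arms the watchdog
    n=0
    while [ "$count" -eq 0 ] || [ "$n" -lt "$count" ]; do
        printf '.' >&3           # any write resets the timer
        n=$((n + 1))
        sleep "$interval"
    done
    printf 'V' >&3               # magic close: disarm on clean exit
    exec 3>&-
}
```

Started in the background before the gadget setup (for example `feed_watchdog /dev/watchdog 5 &`), the feeder stops being scheduled when the system locks up, and the watchdog then resets the board.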
Hi Bin,
we ran tests on the am64-evm, and the watchdog workaround can work. We are starting to evaluate it on our custom hardware.
BR
Stefan
Hi Stefan,
Thanks for the update. We continue debugging it and will keep you posted.