Tool/software: Linux
Hi,
I'm working on a system that has a 66AK2H12 device connected over 10GbE.
My system is based on TI's kernel and buildRoot for creating the filesystem.
I'm seeing an issue when using NFS from the Hawking (mounting an NFS shared folder that resides on a remote computer).
Mounting the NFS share seems to work fine, but at some point (after sending a few megabytes of data) we lose all TCP/IP connectivity with the ARM (even ping doesn’t work).
Looking at the process list, we see that the writing process is stuck in the ‘D’ state.
Once that has happened, the Send-Q of the NFS connection holds a large amount of unsent data.
In addition, while NFS is stuck, shutting down the NIC produces Packet DMA warnings in the kernel log.
All of the info can be seen below.
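For anyone who wants to watch the Send-Q grow while the write is running, here is a minimal sketch that reads it straight from /proc/net/tcp instead of netstat. The function name nfs_sendq is made up for illustration, and it assumes the NFS server listens on the standard port 2049 (0x0801 in the hex notation of /proc/net/tcp) over IPv4.

```shell
#!/bin/sh
# Hypothetical helper (not part of any TI package).
# Reads /proc/net/tcp-formatted text on stdin and prints the transmit queue
# size, in bytes, of every TCP socket whose remote port is 2049 (NFS).
nfs_sendq() {
    # Field 3 is the remote address:port in hex, field 5 is tx_queue:rx_queue.
    awk 'NR > 1 && $3 ~ /:0801$/ { split($5, q, ":"); print q[1] }' |
    while read -r hex; do
        # Convert the hex tx_queue value to decimal with shell arithmetic.
        echo $((0x$hex))
    done
}
```

It can be polled in a second shell while the reproduction command runs, for example: while true; do nfs_sendq < /proc/net/tcp; sleep 1; done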
I tried the following, and none of them helped:
- Set the MTU on the ARM to 1500 (it is usually 9014)
- Change the NFS configuration to rsize=wsize=1024. This seems to improve the situation a bit: the test doesn’t always fail on the first attempt, but it still fails quickly.
- Use NFSv3, NFSv2
- Use NFS nolock configuration
- Put the NFS server on another similar device instead of a Windows computer
To reproduce the problem, I’m running the following command:
head -c 10m /dev/urandom > /mnt/NFSShare/file
The only thing that seems to help is writing in small chunks, for example:
for i in `seq 10000`; do head -c 1k /dev/urandom > /mnt/NFSShare/file; done
In this way I was able to write more than 100M
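To narrow down how much data gets through before the stall, a variation of the loop above can log a running total. This is only a sketch: write_chunks is a hypothetical helper name, it uses dd rather than head, and unlike the loop above it appends to the file instead of overwriting it each time.

```shell
#!/bin/sh
# Hypothetical helper: append COUNT chunks of CHUNK_KB KiB of random data
# to FILE, printing the cumulative size after each chunk. If the write
# hangs, the last line printed shows roughly where it stalled.
write_chunks() {
    file=$1
    chunk_kb=$2
    count=$3
    : > "$file" || return 1          # start from an empty file
    i=0
    total=0
    while [ "$i" -lt "$count" ]; do
        dd if=/dev/urandom bs=1024 count="$chunk_kb" 2>/dev/null >> "$file" || return 1
        i=$((i + 1))
        total=$((total + chunk_kb))
        echo "written ${total} KiB"
    done
}
```

For example, write_chunks /mnt/NFSShare/file 64 160 would attempt 10 MiB in 64 KiB chunks; the chunk size and count are arbitrary values to experiment with.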
========================================== INFO ===============================================
Netstat – after getting stuck
======================
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 26064 (null):931 (null):nfsd ESTABLISHED
mount information
===============
15.40.1.192:/NfsShared on /home/root/SharedForArm type nfs (rw,relatime,vers=3,rsize=32768,wsize=32768,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=15.40.1.192,mountvers=3,mountproto=tcp,local_lock=none,addr=15.40.1.192)
Console output when doing “ifconfig eth0 down” after getting stuck
=====================================================
[420002.029780] ------------[ cut here ]------------
[420002.034508] WARNING: at drivers/dma/keystone-pktdma.c:1094 chan_destroy+0x2c4/0x2d0()
[420002.042461] chan_stop: pool 0 deficit 2047 != depth 2048
[420002.047881] Modules linked in: uio_module_drv(O) mcd(O) cmemk(O)
[420002.054025] CPU: 3 PID: 1298 Comm: ifconfig Tainted: G O 3.10.72 #1
[420002.061289] [<c0015014>] (unwind_backtrace+0x0/0xec) from [<c00117ac>] (show_stack+0x10/0x14)
[420002.069954] [<c00117ac>] (show_stack+0x10/0x14) from [<c0021200>] (warn_slowpath_common+0x54/0x6c)
[420002.079034] [<c0021200>] (warn_slowpath_common+0x54/0x6c) from [<c0021248>] (warn_slowpath_fmt+0x30/0x40)
[420002.088733] [<c0021248>] (warn_slowpath_fmt+0x30/0x40) from [<c029df58>] (chan_destroy+0x2c4/0x2d0)
[420002.098075] [<c029df58>] (chan_destroy+0x2c4/0x2d0) from [<c029b354>] (dma_release_channel+0x24/0x94)
[420002.107425] [<c029b354>] (dma_release_channel+0x24/0x94) from [<c031eb2c>] (netcp_ndo_stop+0x128/0x1e0)
[420002.116959] [<c031eb2c>] (netcp_ndo_stop+0x128/0x1e0) from [<c0394d50>] (__dev_close_many+0x80/0xc4)
[420002.126223] [<c0394d50>] (__dev_close_many+0x80/0xc4) from [<c0394db8>] (__dev_close+0x24/0x38)
[420002.135055] [<c0394db8>] (__dev_close+0x24/0x38) from [<c039a51c>] (__dev_change_flags+0x94/0x128)
[420002.144144] [<c039a51c>] (__dev_change_flags+0x94/0x128) from [<c039a61c>] (dev_change_flags+0x10/0x48)
[420002.153675] [<c039a61c>] (dev_change_flags+0x10/0x48) from [<c03ebd10>] (devinet_ioctl+0x65c/0x734)
[420002.162854] [<c03ebd10>] (devinet_ioctl+0x65c/0x734) from [<c03856c0>] (sock_ioctl+0x1c0/0x294)
[420002.171684] [<c03856c0>] (sock_ioctl+0x1c0/0x294) from [<c00e0058>] (do_vfs_ioctl+0x3fc/0x5bc)
[420002.180427] [<c00e0058>] (do_vfs_ioctl+0x3fc/0x5bc) from [<c00e0250>] (SyS_ioctl+0x38/0x60)
[420002.188900] [<c00e0250>] (SyS_ioctl+0x38/0x60) from [<c000d920>] (ret_fast_syscall+0x0/0x30)
[420002.197459] ---[ end trace d5dca14823c2b4f6 ]---
[420002.202962] dma dma3chan2: xgerx0 leaked descriptor 701
[420003.489991] dma dma3chan0: xgetx0 leaked descriptor 473
[420003.495320] dma dma3chan0: xgetx0 leaked descriptor 483