Our custom AM3356 board with 1 GB of DDR3 memory crashes frequently.
The problem appears to be related to Linux highmem.
To reproduce the problem, only two scripts are needed:
1. A wget/curl script that repeats a simple HTTP GET request with a short delay between requests:
```sh
#!/bin/sh
url=http://192.168.0.1            # this is a web server
while true
do
    # curl -s "$url" > /dev/null  # alternative to wget
    wget -O /dev/null "$url" > /dev/null 2>&1
    usleep 10000                  # 10 ms delay (BusyBox usleep)
done
```
2. A continuous SFTP transfer running on the host computer, for example uploading a large file to the board.
Below is the error output from the kernel log (dmesg):
```
May 15 23:35:25 T-BOX-33 user.alert kernel: [ 2225.712590] Unhandled fault: external abort on non-linefetch (0x1018) at 0xb6df0776
May 15 23:35:25 T-BOX-33 user.alert kernel: [ 2225.712607] pgd = 455429a2
May 15 23:35:25 T-BOX-33 user.alert kernel: [ 2225.712611] [b6df0776] *pgd=be860831
May 15 23:36:35 T-BOX-33 user.alert kernel: [ 2295.388678] Unhandled fault: external abort on non-linefetch (0x1018) at 0xb6ca5c59
May 15 23:36:35 T-BOX-33 user.alert kernel: [ 2295.388695] pgd = b6258563
May 15 23:36:35 T-BOX-33 user.alert kernel: [ 2295.388700] [b6ca5c59] *pgd=be860831
May 15 23:36:35 T-BOX-33 user.alert kernel: [ 2295.774605] Unhandled fault: external abort on non-linefetch (0x1018) at 0xb6d3fc59
May 15 23:36:35 T-BOX-33 user.alert kernel: [ 2295.774622] pgd = 2de8c821
May 15 23:36:35 T-BOX-33 user.alert kernel: [ 2295.774626] [b6d3fc59] *pgd=be88e831
May 15 23:36:36 T-BOX-33 user.alert kernel: [ 2296.543777] Unhandled fault: external abort on non-linefetch (0x1018) at 0xb6df0776
May 15 23:36:36 T-BOX-33 user.alert kernel: [ 2296.543794] pgd = f61b441f
May 15 23:36:36 T-BOX-33 user.alert kernel: [ 2296.543798] [b6df0776] *pgd=be8b4831
```
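You will find illegal memory accesses, usually at 0xb6xx_xxxx. Strictly speaking these are user-space virtual addresses (below PAGE_OFFSET = 0xC000_0000 with CONFIG_VMSPLIT_3G), not highmem addresses. The highmem connection shows up in the *pgd values: each first-level entry points to a second-level page table at a physical address around 0xBE8x_xxxx, near the top of the 1 GB of DDR3 (which starts at 0x8000_0000 on AM335x). That is well above the end of lowmem, which with the default 3G/1G split covers only about the first 760 MB of RAM, so these page tables were allocated from highmem.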
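The fault/pgd pairs above can be decoded mechanically. The sketch below pairs each "Unhandled fault ... at ADDR" line with the following "*pgd=" line, takes the second-level table base from bits [31:10] of the first-level descriptor (ARM short-descriptor format), and compares it against a lowmem boundary. The values PAGE_OFFSET=0xC0000000 and LOWMEM_END_PHYS=0xAF800000 are assumptions for a default 3G/1G split with 1 GB of RAM at 0x8000_0000; check your own kernel's lowmem limit and adjust. The helper names classify and scan_log are hypothetical, not part of the SDK.

```sh
#!/bin/sh
# Sketch: decode ARM "Unhandled fault" reports into user/kernel address and
# the zone (lowmem/highmem) of the second-level page table.
# ASSUMPTIONS (adjust for your kernel): CONFIG_VMSPLIT_3G, so user space ends
# at PAGE_OFFSET = 0xC0000000; 1 GB of DDR3 at physical 0x80000000 whose
# lowmem part ends at physical 0xAF800000 (about 760 MB of lowmem).
PAGE_OFFSET=$((0xC0000000))
LOWMEM_END_PHYS=$((0xAF800000))

# classify FAULT_ADDR PGD_VALUE  (both hex, e.g. 0xb6df0776 0xbe860831)
classify() {
    addr=$(($1))
    pgd=$(($2))
    if [ "$addr" -lt "$PAGE_OFFSET" ]; then where="user space"; else where="kernel space"; fi
    # Short-descriptor format: bits[1:0] == 01 means the first-level entry
    # points to a second-level (PTE) table whose base is bits[31:10].
    if [ $((pgd & 3)) -ne 1 ]; then
        printf '0x%08x %s, pgd=0x%08x is not a page-table pointer\n' "$addr" "$where" "$pgd"
        return
    fi
    table=$((pgd & ~0x3FF))
    if [ "$table" -ge "$LOWMEM_END_PHYS" ]; then zone=highmem; else zone=lowmem; fi
    printf '0x%08x %s, PTE table at phys 0x%08x (%s)\n' "$addr" "$where" "$table" "$zone"
}

# scan_log: read kernel log lines on stdin and classify each fault/pgd pair.
scan_log() {
    fault=
    while IFS= read -r line; do
        case $line in
            *"Unhandled fault:"*)
                fault=${line##* at } ;;        # text after the last " at "
            *"*pgd="*)
                v=${line##*pgd=}; v=${v%%,*}    # hex digits after "*pgd="
                [ -n "$fault" ] && classify "$fault" "0x$v"
                fault= ;;
        esac
    done
}

# Two arguments: decode one fault. Otherwise: decode a log piped on stdin.
if [ $# -eq 2 ]; then classify "$1" "$2"; else scan_log; fi
```

Saved as a hypothetical decode-faults.sh, it could be run on the board as "dmesg | sh decode-faults.sh", or as "sh decode-faults.sh 0xb6df0776 0xbe860831" for a single entry. For the first log entry this should report a user-space address whose PTE table sits at phys 0xbe860800, i.e. in highmem under the stated boundary.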
==================
With the AM335x SDK's default kernel configuration the issue is easy to reproduce. The key config options are:
CONFIG_VMSPLIT_3G=y
CONFIG_HIGHMEM=y
CONFIG_HIGHPTE=y
When we change the kernel configuration in either of these ways:
CONFIG_VMSPLIT_3G changed to CONFIG_VMSPLIT_3G_OPT (3G/1G split for full 1 GB lowmem: the whole 1 GB of RAM is directly mapped, so no highmem is used)
or
CONFIG_HIGHPTE=n (second-level page tables are no longer allocated from highmem)
the issue seems to be solved.
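When testing either change, it helps to confirm which options the running kernel was actually built with. A minimal sketch, assuming you pass a .config path or that the kernel was built with CONFIG_IKCONFIG_PROC=y (which provides /proc/config.gz); read_cfg, has_opt and check_cfg are hypothetical helper names.

```sh
#!/bin/sh
# Sketch: report the highmem-related options of a kernel config and whether
# one of the two workarounds described above is in effect.
# Reads FILE if given, else /proc/config.gz (needs CONFIG_IKCONFIG_PROC=y).

read_cfg() {
    if [ -n "$1" ]; then cat "$1"; else zcat /proc/config.gz; fi
}

# has_opt NAME: true if the exact line "CONFIG_NAME=y" appears in $cfg
has_opt() {
    printf '%s\n' "$cfg" | grep -qx "CONFIG_$1=y"
}

check_cfg() {
    cfg=$(read_cfg "$1") || return 1
    for opt in VMSPLIT_3G VMSPLIT_3G_OPT HIGHMEM HIGHPTE; do
        if has_opt "$opt"; then echo "CONFIG_$opt=y"; else echo "CONFIG_$opt is not set"; fi
    done
    if has_opt VMSPLIT_3G_OPT; then
        echo "workaround 1: VMSPLIT_3G_OPT (intended to keep all 1 GB in lowmem)"
    elif ! has_opt HIGHPTE; then
        echo "workaround 2: HIGHPTE off (PTE tables stay in lowmem)"
    else
        echo "default: PTE tables may be allocated from highmem"
    fi
}

if [ $# -gt 0 ] || [ -r /proc/config.gz ]; then
    check_cfg "$1"
else
    echo "usage: $0 [path/to/.config]" >&2
fi
```

Running it against the SDK's default .config should end with the "default" line; after either change, it should report which workaround is active.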
=================
My question is: what is the official, permanent fix for this issue?