AM3356: Linux Kernel with 1GB memory crashes

Part Number: AM3356

Our AM3356 custom board with 1GB DDR3 memory often crashes.

The problem appears to be related to Linux highmem.

To reproduce the problem, we only need two scripts:

1. A wget/curl script that repeats a simple HTTP GET request with a short delay between requests:

#!/bin/sh

url=http://192.168.0.1  # this is a web server

while true
do
    # Alternative using curl instead of wget:
    #curl -s "$url" > /dev/null
    wget -O /dev/null "$url" > /dev/null 2>&1
    usleep 10000    # 10 ms delay between requests
done

2. A continuous SFTP transfer, run from the host computer, for example uploading a large file to the board.
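
The host-side transfer in step 2 could be scripted roughly as follows. This is only a sketch under assumptions: the function names, the board address, and the remote path are hypothetical, key-based SSH login is assumed to be set up already (sftp batch mode cannot prompt for a password), and `dd bs=1M` assumes GNU dd on a Linux host.

```shell
#!/bin/sh
# Hypothetical helpers for the host side of the reproduction.

# Create a test file of the given size. $1 = path, $2 = size in MiB.
make_test_file() {
    dd if=/dev/zero of="$1" bs=1M count="$2" 2>/dev/null
}

# Upload the same file again and again until a transfer fails.
# $1 = local file, $2 = user@board, $3 = remote path.
upload_forever() {
    while true
    do
        # "-b -" makes sftp read its batch commands from stdin.
        printf 'put %s %s\n' "$1" "$3" | sftp -b - "$2" > /dev/null || return 1
    done
}
```

Example use (address and paths are placeholders): `make_test_file /tmp/big.bin 256 && upload_forever /tmp/big.bin root@192.168.0.100 /tmp/upload.bin`.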

Below is the error info:

dmesg log:

May 15 23:35:25 T-BOX-33 user.alert kernel: [ 2225.712590] Unhandled fault: external abort on non-linefetch (0x1018) at 0xb6df0776
May 15 23:35:25 T-BOX-33 user.alert kernel: [ 2225.712607] pgd = 455429a2
May 15 23:35:25 T-BOX-33 user.alert kernel: [ 2225.712611] [b6df0776] *pgd=be860831
May 15 23:36:35 T-BOX-33 user.alert kernel: [ 2295.388678] Unhandled fault: external abort on non-linefetch (0x1018) at 0xb6ca5c59
May 15 23:36:35 T-BOX-33 user.alert kernel: [ 2295.388695] pgd = b6258563
May 15 23:36:35 T-BOX-33 user.alert kernel: [ 2295.388700] [b6ca5c59] *pgd=be860831
May 15 23:36:35 T-BOX-33 user.alert kernel: [ 2295.774605] Unhandled fault: external abort on non-linefetch (0x1018) at 0xb6d3fc59
May 15 23:36:35 T-BOX-33 user.alert kernel: [ 2295.774622] pgd = 2de8c821
May 15 23:36:35 T-BOX-33 user.alert kernel: [ 2295.774626] [b6d3fc59] *pgd=be88e831
May 15 23:36:36 T-BOX-33 user.alert kernel: [ 2296.543777] Unhandled fault: external abort on non-linefetch (0x1018) at 0xb6df0776
May 15 23:36:36 T-BOX-33 user.alert kernel: [ 2296.543794] pgd = f61b441f
May 15 23:36:36 T-BOX-33 user.alert kernel: [ 2296.543798] [b6df0776] *pgd=be8b4831

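The log shows illegal memory accesses, usually at 0xb6xx_xxxx. With the 3G/1G split these are user-space virtual addresses (below 0xc0000000), and the memory behind them appears to be in highmem.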
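
To see which side of the user/kernel split the faulting addresses fall on, the log can be filtered with a small script. This is a sketch, not a diagnostic tool: `classify_faults` is a hypothetical helper name, and PAGE_OFFSET=0xC0000000 is assumed because that is the value used with CONFIG_VMSPLIT_3G=y.

```shell
#!/bin/sh
# Assumed kernel/user boundary for CONFIG_VMSPLIT_3G=y.
PAGE_OFFSET=$((0xC0000000))

# Read kernel log lines on stdin and print each "Unhandled fault"
# address followed by "user" or "kernel".
classify_faults() {
    sed -n 's/.*Unhandled fault: .* at \(0x[0-9a-fA-F]*\).*/\1/p' |
    while read -r addr
    do
        if [ $(($addr)) -lt "$PAGE_OFFSET" ]; then
            echo "$addr user"
        else
            echo "$addr kernel"
        fi
    done
}
```

Example use on the board: `dmesg | classify_faults`.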

==================

With the AM335x SDK's default kernel config, the issue is easy to reproduce. The key config options are:

CONFIG_VMSPLIT_3G=y
CONFIG_HIGHMEM=y
CONFIG_HIGHPTE=y
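
To confirm which of these three options a given kernel was built with, a small check along these lines may help. It is a sketch: `check_highmem_opts` is a hypothetical helper name, and it expects a plain-text kernel config file as its argument.

```shell
#!/bin/sh
# Report the state of the three options listed above.
# $1 = path to a plain-text kernel config.
check_highmem_opts() {
    cfg="$1"
    for opt in CONFIG_VMSPLIT_3G CONFIG_HIGHMEM CONFIG_HIGHPTE
    do
        # Match "=y" exactly so CONFIG_VMSPLIT_3G_OPT is not miscounted.
        if grep -q "^${opt}=y\$" "$cfg"; then
            echo "${opt}=y"
        else
            echo "${opt} is not set"
        fi
    done
}
```

On a running board, assuming the kernel exposes /proc/config.gz (CONFIG_IKCONFIG_PROC=y), this could be used as `zcat /proc/config.gz > /tmp/config && check_highmem_opts /tmp/config`.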

When we change the kernel config in either of the following ways:

CONFIG_VMSPLIT_3G => CONFIG_VMSPLIT_3G_OPT (avoids highmem by mapping the entire 1GB directly as low memory)

or

CONFIG_HIGHPTE=n (second-level page tables are no longer allocated from highmem)

the issue appears to be solved.
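
For reference, the second workaround (turning off CONFIG_HIGHPTE) can be applied to an existing .config with a small helper like the one below. This is a sketch under assumptions: `disable_highpte` is a hypothetical name, and the file is assumed to be a kernel .config that currently contains CONFIG_HIGHPTE=y.

```shell
#!/bin/sh
# Rewrite "CONFIG_HIGHPTE=y" to the kernel's "is not set" form.
# $1 = path to the .config to modify in place.
disable_highpte() {
    cfg="$1"
    sed 's/^CONFIG_HIGHPTE=y$/# CONFIG_HIGHPTE is not set/' "$cfg" > "$cfg.tmp" &&
        mv "$cfg.tmp" "$cfg"
}
```

In a kernel source tree this would typically be followed by `make ARCH=arm olddefconfig` before rebuilding; the kernel's own `scripts/config --disable HIGHPTE` should achieve the same edit.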

=================

My question is: what's the final official solution?