AM335x Linux kernel paging faults

I've got a custom AM335x board based closely on the BeagleBone. I'm using the Linux PSP kernel available from the latest distribution of Angstrom. I have a burn-in test script that copies files into /tmp until it's full. When /tmp gets to about 70% (which is 128 MB), and hence consumes the virtual memory managed by tmpfs, I consistently get paging faults like this one, which bring down the system:

[  111.700408] Unable to handle kernel paging request at virtual address 656c43a5
[  111.707977] pgd = cf0b8000
[  111.710815] [656c43a5] *pgd=00000000
[  111.714538] Internal error: Oops: 5 [#1]
[  111.718658] Modules linked in: spidev ip_tables x_tables ipv6
[  111.724700] CPU: 0    Not tainted  (3.2.23 #224)
[  111.729553] PC is at mem_cgroup_get_reclaim_stat_from_page+0x12/0x3c
[  111.736175] LR is at mem_cgroup_get_reclaim_stat_from_page+0x2b/0x3c
[  111.742828] pc : [<c008e206>]    lr : [<c008e21f>]    psr: 000000b3
[  111.742858] sp : cf04fcf0  ip : 000055ed  fp : 00000000
[  111.754852] r10: 000002d8  r9 : c054b490  r8 : c0520d84
[  111.760314] r7 : 00000000  r6 : 00000000  r5 : 00000000  r4 : c06d3120
[  111.767120] r3 : 656c4335  r2 : 00000068  r1 : c05c2000  r0 : 0008003c
[  111.773956] Flags: nzcv  IRQs off  FIQs on  Mode SVC_32  ISA Thumb  Segment user
[  111.781707] Control: 50c5387d  Table: 8f0b8019  DAC: 00000015
[  111.787689] Process cp (pid: 349, stack limit = 0xcf04e2f0)
[  111.793518] Stack: (0xcf04fcf0 to 0xcf050000)
[  111.798095] fce0:                                     c054b490 c0074d07 c06d3120 00000000
[  111.806640] fd00: c054b490 c0074dad c0520d78 a0000013 c054b490 00000003 c0520d84 c0075285
[  111.815185] fd20: c0520d78 c0074d41 00000000 c06d3260 cf000470 cf04e000 cf000450 00000000
[  111.823760] fd40: 00000000 cfbe38c0 00000221 c007a8dd 00000000 c000d2bf 00000044 cf000530
[  111.832305] fd60: 00221000 00000000 00000002 00000003 00000000 cf04fddc c06d3240 c02f1600
[  111.840881] fd80: 00001000 cf04e000 c007aad1 cf000530 00000000 cf04e000 00000000 c006eb2f
[  111.849426] fda0: 000200da 00000000 cf04fddc cf04fdd8 00221000 00000000 00000000 cf1955c0
[  111.858001] fdc0: 00001000 00000000 cf04ff48 00000001 00000000 00001000 cf1955c0 c009d929
[  111.866546] fde0: 386d9121 cf1955c0 00000000 00001000 000007ff 00221000 00000000 cf04ff48
[  111.875091] fe00: cf04ff00 c006fc7b 00221000 00000000 cf04ff00 00001000 00000000 c0081799
[  111.883666] fe20: 000000b7 cf04fec8 cf04ff48 cf000530 cf04ff50 00000000 00001000 00000001
[  111.892211] fe40: 00221000 00000000 00222000 00000000 cf04fe50 00001000 cf7187c8 00221000
[  111.900787] fe60: 00000000 cf1955c0 cf0004d4 cf04fec8 00000001 cf04ff48 00000003 c006fcfd
[  111.909332] fe80: 00001000 00001000 91827364 cf04fe8c cf04fe8c cf04fe94 cf04fe94 00000000
[  111.917877] fea0: cf1955c0 cf04ff80 fffffdee c0090e7f cf04ff80 00000000 00000000 c0090edb
[  111.926452] fec0: 00221000 00000000 cf72d398 00000000 00000000 00000001 ffffffff cf1955c0
[  111.934997] fee0: 00000000 00000000 00000000 00000000 cfbd9540 00020000 00000000 00000000
[  111.943572] ff00: 00221000 00000000 00000000 00000000 00001000 c01419e5 00001000 00000002
[  111.952117] ff20: 00222000 00000000 cf1955c0 c00911bf 00000001 cf1955c0 cf04ff80 00001000
[  111.960662] ff40: 00001000 cf1955c0 bef82a48 00001000 00001000 cf1955c0 bef82a48 c00913d3
[  111.969238] ff60: cf1955c0 bef82a48 cf1955c0 bef82a48 00001000 00000004 00221000 c00915f7
[  111.977783] ff80: 00221000 00000000 00001000 00000000 0009092c 00001000 bef82a48 c000cbe4
[  111.986358] ffa0: cf04e000 c000ca41 0009092c 00001000 00000004 bef82a48 00001000 00000000
[  111.994903] ffc0: 0009092c 00001000 bef82a48 00000004 00000004 00000001 00000000 00000003
[  112.003448] ffe0: 00000004 bef82a10 00010af0 401c414c 60000010 00000004 00000000 00000000
[  112.012054] [<c008e206>] (mem_cgroup_get_reclaim_stat_from_page+0x12/0x3c) from [<c0074d07>] (update_page_reclaim_stat+0xf/0x48)
[  112.024169] [<c0074d07>] (update_page_reclaim_stat+0xf/0x48) from [<c0074dad>] (____pagevec_lru_add_fn+0x6d/0xd8)
[  112.034912] [<c0074dad>] (____pagevec_lru_add_fn+0x6d/0xd8) from [<c0075285>] (pagevec_lru_move_fn+0x41/0x6c)
[  112.045288] [<c0075285>] (pagevec_lru_move_fn+0x41/0x6c) from [<c007a8dd>] (shmem_getpage_gfp+0x205/0x3c2)
[  112.055389] [<c007a8dd>] (shmem_getpage_gfp+0x205/0x3c2) from [<c006eb2f>] (generic_file_buffered_write+0x9f/0x18e)
[  112.066314] [<c006eb2f>] (generic_file_buffered_write+0x9f/0x18e) from [<c006fc7b>] (__generic_file_aio_write+0x281/0x2be)
[  112.077880] [<c006fc7b>] (__generic_file_aio_write+0x281/0x2be) from [<c006fcfd>] (generic_file_aio_write+0x45/0x88)
[  112.088897] [<c006fcfd>] (generic_file_aio_write+0x45/0x88) from [<c0090edb>] (do_sync_write+0x5d/0x86)
[  112.098724] [<c0090edb>] (do_sync_write+0x5d/0x86) from [<c00913d3>] (vfs_write+0x5f/0x10c)
[  112.107482] [<c00913d3>] (vfs_write+0x5f/0x10c) from [<c00915f7>] (sys_write+0x27/0x48)
[  112.115875] [<c00915f7>] (sys_write+0x27/0x48) from [<c000ca41>] (ret_fast_syscall+0x1/0x44)

Now, if I switch this SD card to a BeagleBone, the problem does not occur.  

Thinking I might have made so many changes to my kernel that I messed something up, I downloaded the official TI release 5.05 of the SDK and built the kernel. I ran the same experiment on my custom board, but this time I got a different kernel fault:


[  199.090118] ------------[ cut here ]------------
[  199.094970] kernel BUG at mm/slab.c:512!
[  199.099060] Internal error: Oops - undefined instruction: 0 [#1]
[  199.105346] Modules linked in:
[  199.108551] CPU: 0    Not tainted  (3.2.0 #4)
[  199.113128] PC is at free_block+0x14c/0x164
[  199.117523] LR is at drain_array+0x98/0xc8
[  199.121795] pc : [<c00a9d60>]    lr : [<c00a9f60>]    psr: 40000093
[  199.121826] sp : cf8cded0  ip : cee14af8  fp : cf8cdefc
[  199.133819] r10: cfa0b210  r9 : 00000001  r8 : 00000000
[  199.139282] r7 : cf947140  r6 : 00000000  r5 : cfa0b200  r4 : cfa0b210
[  199.146118] r3 : 00000000  r2 : 8ee14af8  r1 : 00000000  r0 : c08b8280
[  199.152954] Flags: nZcv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
[  199.160675] Control: 10c5387d  Table: 8ec14019  DAC: 00000015
[  199.166687] Process kworker/0:1 (pid: 317, stack limit = 0xcf8cc2f0)
[  199.173339] Stack: (0xcf8cded0 to 0xcf8ce000)
[  199.177886] dec0:                                     00000001 cfa0b210 cfa0b200 00000001
[  199.186462] dee0: c06c2e20 c0668440 c0674f0c c0668440 cf8cdf1c cf8cdf00 c00a9f60 c00a9c20
[  199.195007] df00: cf8cdf34 cfa03ec0 cf947140 00000000 cf8cdf54 cf8cdf20 c00aa0dc c00a9ed4
[  199.203582] df20: 00000000 cf8cc000 cf8cdf4c c0674f0c cf8ac4c0 cf8cc000 00000000 cf8cc000
[  199.212127] df40: c00aa060 cf814805 cf8cdf8c cf8cdf58 c00531b0 c00aa06c 00000000 cf814800
[  199.220703] df60: c06689ac cf8ac4c0 c06689ac 00000009 c06689ac cf8cc000 cf8ac4d0 cf8ac4d0
[  199.229248] df80: cf8cdfbc cf8cdf90 c0053e9c c0053084 00000013 cf833eec cf8ac4c0 c0053d30
[  199.237823] dfa0: 00000013 00000000 00000000 00000000 cf8cdff4 cf8cdfc0 c0059200 c0053d3c
[  199.246368] dfc0: cf833eec 00000000 cf8ac4c0 00000000 cf8cdfd0 cf8cdfd0 00000000 cf833eec
[  199.254943] dfe0: c0059178 c0042b8c 00000000 cf8cdff8 c0042b8c c0059184 8a814513 02438021
[  199.263488] Backtrace:
[  199.266052] [<c00a9c14>] (free_block+0x0/0x164) from [<c00a9f60>] (drain_array+0x98/0xc8)
[  199.274627] [<c00a9ec8>] (drain_array+0x0/0xc8) from [<c00aa0dc>] (cache_reap+0x7c/0x134)
[  199.283172]  r6:00000000 r5:cf947140 r4:cfa03ec0
[  199.288055] [<c00aa060>] (cache_reap+0x0/0x134) from [<c00531b0>] (process_one_work+0x138/0x3e8)
[  199.297241] [<c0053078>] (process_one_work+0x0/0x3e8) from [<c0053e9c>] (worker_thread+0x16c/0x450)
[  199.306701] [<c0053d30>] (worker_thread+0x0/0x450) from [<c0059200>] (kthread+0x88/0x90)
[  199.315185] [<c0059178>] (kthread+0x0/0x90) from [<c0042b8c>] (do_exit+0x0/0x6b8)
[  199.323028]  r6:c0042b8c r5:c0059178 r4:cf833eec
[  199.327880] Code: e1a00007 e5853018 ebffff93 eaffffc0 (e7f001f2)
[  199.334350] ---[ end trace 323c566b5e7ca225 ]---

I tried slowing the processor down to 250 MHz via the Linux commands (http://processors.wiki.ti.com/index.php/Sitara_Linux_Training:_Power_Management), but the problem still occurred.

The two different traces suggest that something is wrong with memory: either the virtual memory system or my physical memory configuration.

Here's my second test: I wrote a program to allocate memory forever. On a working BeagleBone, the Linux kernel eventually kills the process. On my board, however, the whole operating system appears to crash. This is the test program:

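#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char **argv)
{
    unsigned long bytes = 0;   /* total allocated so far */
    char *buf = 0;
    while (1)
    {
        /* Leak one page-sized block per iteration, on purpose. */
        buf = (char *) malloc(4096);
        if (buf)
        {
            bytes += 4096;
            /* Touch the block so the kernel has to back it with real memory. */
            memset(buf, 0, 1024);
            memset(buf + 2048, 0, 2048);
        }
        else
        {
            printf("Failed at %lu!\n", bytes);
            return 1;
        }
        printf("Allocated %lu bytes\n", bytes);
    }
}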

Does anyone have any insights on what I should check?
  • OK, it looks like this board has 128 MB of RAM instead of 256 MB, which is why all these paging requests failed.

    Why did the board think it had 256 MB?  Apparently U-Boot hard-codes a single SDRAM bank of 256 MB for the BeagleBone:

     /* Physical Memory Map */
    #define CONFIG_NR_DRAM_BANKS            1               /* 1 bank of DRAM */
    #define PHYS_DRAM_1                     0x80000000      /* DRAM Bank #1 */
    #define PHYS_DRAM_1_SIZE                0x10000000      /* 256 MiB */



    Why doesn't U-Boot properly autodetect the RAM size and configuration?   In the AM335X U-Boot code, it appears that the total RAM size is hard-coded based on PHYS_DRAM_1_SIZE.  Interestingly, all the memory controller (EMIF) registers, which deal with timings of the SDRAM, are hard-coded.  

    Why, then, doesn't Linux do its own autodetection? I believe the Linux AM335x port leaves all the EMIF configuration to U-Boot, since I see no evidence that it attempts to reconfigure any of the EMIF registers.

    How does Linux know how much memory is available?  I'm still trying to answer that question.  The obvious answer would be that it's specified on the kernel command line via the 'mem=256M' argument, but I don't think the default boot arguments have that set.
  • AFAIK the PHYS_DRAM_1_SIZE parameter is used at runtime in U-Boot. The AM335x U-Boot code is designed to support different AM33xx-based boards.

  • Is there a way to do SDRAM autodetection on the AM335x?  Right now it appears U-Boot hard-codes the size to 256 MB and passes it to Linux in the ATAG_MEM parameter.  I have several boards with different memory sizes, and it would be nice to use the same U-Boot binary.

    I see that the U-Boot OMAP4 EMIF code appears to have a mechanism for this, but it's not obvious how to do it with an AM335x.

  • Looks like the Mode Register in the JEDEC standard is the key:

    http://www.jedec.org/standards-documents/focus/mobile-memory-lpddr2-memory-mcp/lpddr2

    It looks like the EMIF code in the omap4 directory implements SDRAM autodetection.  Since the AM335x board uses the ti81xx directory, the omap4 code could serve as a model for what an AM335x implementation would do.

  • Hi,

    I have the same problem. I want to detect the RAM size of my different boards with the AM335x processor. I'm using u-boot-2011.09. How can I detect the RAM size automatically?

    Thanks