  • Resolved

Linux/AM5728: Linux Memory Policies

Part Number: AM5728

Tool/software: Linux

Hello, I'm working on getting my AM5728-based custom board to boot, and I'm running into a strange hang early in the kernel boot procedure.  I believe it may have to do with my RAM configuration.  During boot, the kernel prints the following messages and then hangs indefinitely:

Starting kernel ...

[    0.000000] Booting Linux on physical CPU 0x0

[    0.000000] Linux version 4.9.28-geed43d1050 (tom@tom-ThinkPad-P50s) (gcc version 6.2.1 20161016 (Linaro GCC 6.2-2016.11) ) #6 SMP PREEMPT Mon Oct 2 10:59:18 EDT 2017

[    0.000000] CPU: ARMv7 Processor [412fc0f2] revision 2 (ARMv7), cr=30c5387d

[    0.000000] CPU: div instructions available: patching division code

[    0.000000] CPU: PIPT / VIPT nonaliasing data cache, PIPT instruction cache

[    0.000000] OF: fdt:Machine model: tom's custom board

[    0.000000] bootconsole [earlycon0] enabled

[    0.000000] efi: Getting EFI parameters from FDT:

[    0.000000] efi: UEFI not found.

[    0.000000] Reserved memory: created CMA memory pool at 0x0000000095800000, size 56 MiB

[    0.000000] OF: reserved mem: initialized node ipu2_cma@95800000, compatible id shared-dma-pool

[    0.000000] Reserved memory: created CMA memory pool at 0x0000000099000000, size 64 MiB

[    0.000000] OF: reserved mem: initialized node dsp1_cma@99000000, compatible id shared-dma-pool

[    0.000000] Reserved memory: created CMA memory pool at 0x000000009d000000, size 32 MiB

[    0.000000] OF: reserved mem: initialized node ipu1_cma@9d000000, compatible id shared-dma-pool

[    0.000000] Reserved memory: created CMA memory pool at 0x000000009f000000, size 8 MiB

[    0.000000] OF: reserved mem: initialized node dsp2_cma@9f000000, compatible id shared-dma-pool

[    0.000000] cma: Reserved 24 MiB at 0x000000008e400000

[    0.000000] Memory policy: Data cache writealloc

Could a hang right after "Memory policy: Data cache writealloc" indicate a problem with the main system RAM?  The major difference between my board and the AM5728 EVM is that I only have 256MB of DDR3 RAM with different timings.  I've updated the u-boot configuration for this, and u-boot seems to have no issues running and relocating itself in the RAM.  u-boot RAM read and write tests also pass consistently.
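
For reference, a rough sketch of what such a test can look like from the U-Boot prompt (mtest needs CONFIG_CMD_MEMTEST; the range below is only an example that assumes the single 256MB bank at 0x80000000 and stays below the area U-Boot relocates itself into):

# sketch only: destructive RAM test from the U-Boot prompt (requires CONFIG_CMD_MEMTEST)
# range assumes 256MB of DDR3 at 0x80000000, stopping short of U-Boot's relocation area
mtest 0x80000000 0x8f000000 0xa5a5a5a5 10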


On the AM5728 EVM, the next line in the kernel boot log is:
[    0.000000] OMAP4: Map 0x00000000ffd00000 to fe600000 for dram barrier

which is why I feel like the problem could be memory related.  Is this assumption valid?  If it is, what could cause u-boot to work correctly with my RAM while Linux does not?

Another possibility is that the CMA pools reserved for the IPUs/DSPs in the debug lines above are being placed in memory that does not exist on my board.  Since we only have 256MB of RAM, mapped from 0x80000000 to 0x90000000, and the reserved addresses (0x95800000, 0x99000000, 0x9d000000, 0x9f000000, etc.) are all outside that range, could the kernel be reserving memory that is simply unavailable?
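
For what it's worth, checking where those pools land relative to the memory node can be done with something like this (the dtb filename is only an example):

# sketch: decompile the exact dtb the kernel boots with and compare the
# memory@ node against the reserved-memory cma@ nodes
dtc -I dtb -O dts am5728-custom.dtb | grep -A2 -E 'memory@|cma@'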

Here is the full boot log, in case anyone is interested.  The full decompiled device tree is attached as well.

3581.putty.log

3618.customboard.txt

  • In reply to Tom W:

    Tom,

    Tom W
    I disabled the LPAE, and proceeded further in the boot than before (log is attached)

    I do not see any attached log. Can you check again?

    Tom W

    I've also noticed that the line 

    [    0.000000] OMAP4: Map 0x8fe00000 to fe600000 for dram barrier

     

    is still being printed.

    I suspect this 0xFE600000 is the virtual address of the 0x8FE00000 physical address, so this should not be an issue. I will check that further.

    Tom W
    From what I've read about the LPAE, it seems rather necessary for the A-15 architecture.  I'm not planning on going above the 32-bit addressing limit, so I can see why disabling it could make sense, but I'm unsure if it's wise.

    Reading the AM572x data manual (DM), it seems that LPAE is only needed when you have more than 2GB of DRAM:

    3.1 Device Comparison Table

    DDR3 Memory Controller (2)

    (2) In the Unified L3 memory map, there is maximum of 2GB of SDRAM space which is available to all L3 initiators including MPU (MPU, GPU, DSP, IVA, DMA, etc). Typically this space is interleaved across both EMIFs to optimize memory performance. If a system populates > 2GB of physical memory, that additional addressable space can be accessed only by the MPU via the ARM V7 Large Physical Address Extensions (LPAE).
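
    If you do want to keep LPAE off for this bring-up, a minimal sketch from the kernel source tree (assuming your current config enables CONFIG_ARM_LPAE):

    # sketch: check and disable CONFIG_ARM_LPAE, then rebuild
    # (reasonable here only because the board has 256MB, far below the 2GB limit)
    grep ARM_LPAE .config
    ./scripts/config --disable ARM_LPAE
    make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- olddefconfig zImage dtbs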


    Note that LPAE is enabled by default from the 4.4 kernel onwards. This feature is not enabled in AM572x PSDK 2.02 (kernel 4.1). See the e2e threads below:

     

    Regards,
    Pavel



  • In reply to Tom W:

    Tom W

    I've also noticed that the line 

    [    0.000000] OMAP4: Map 0x8fe00000 to fe600000 for dram barrier

     

    is still being printed.

    I confirm 0xfe600000 is a virtual address, so it should NOT be an issue. See linux-kernel/arch/arm/mach-omap2/omap4-common.c:

    /* Used to implement memory barrier on DRAM path */
    #define OMAP4_DRAM_BARRIER_VA            0xfe600000

    static void __iomem *dram_sync, *sram_sync;
    static phys_addr_t dram_sync_paddr;
    static u32 dram_sync_size;

    /* Steal one page physical memory for barrier implementation */
    void __init omap_barrier_reserve_memblock(void)
    {
        dram_sync_size = ALIGN(PAGE_SIZE, SZ_1M);
        dram_sync_paddr = arm_memblock_steal(dram_sync_size, SZ_1M);
    }


    void __init omap_barriers_init(void)
    {
        struct map_desc dram_io_desc[1];

        dram_io_desc[0].virtual = OMAP4_DRAM_BARRIER_VA;
        dram_io_desc[0].pfn = __phys_to_pfn(dram_sync_paddr);
        dram_io_desc[0].length = dram_sync_size;
        dram_io_desc[0].type = MT_MEMORY_RW_SO;
        iotable_init(dram_io_desc, ARRAY_SIZE(dram_io_desc));
        dram_sync = (void __iomem *) dram_io_desc[0].virtual;

        pr_info("OMAP4: Map %pa to %p for dram barrier\n", &dram_sync_paddr, dram_sync);

        soc_mb = omap4_mb;
    }



  • In reply to Pavel Botev:

    Apologies for the late reply; I've been away from the office.  Here is the log you requested:

    g3_linux_no_lpae.log

    Also, thank you for confirming the LPAE behavior and the virtual address from the source.

    I've come to the conclusion that, after months of rigorous use with minimal safe ejection, I can blame my EXT4 filesystem errors on the SD card hardware.  The errors are reduced significantly with another SD card.

    From the log and my subsequent tests, I've found that loading the kernel modules doesn't succeed, which I find odd, as I compiled and installed them as per the PSDK4 wiki.  Beyond that, my system appears to access an unreachable chunk of memory, spin, and panic.  The memory address it attempts to access is never the same, and I have yet to identify any pattern between tests.

    [    6.859521] Unable to handle kernel paging request at virtual address 00004d64



    There are several similar faults where it attempts to access an address below the page size, which the kernel reports as a NULL pointer dereference.  The kernel dies after that.  This still appears to be a memory/filesystem problem, but more likely a misconfiguration than a hardware fault.
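
    For reference, the faulting PC from the oops can be mapped back to a source line with something along these lines (the address and the vmlinux path are placeholders):

    # sketch: resolve the PC/LR printed in the oops using the vmlinux from the
    # same kernel build (the address below is only an example)
    arm-linux-gnueabihf-addr2line -f -e vmlinux c0123456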

    7522.putty.log

    putty1.log

  • In reply to Tom W:

    Doing a bit more digging, it's apparent that the system refuses to run because EXT4 accesses result in a kernel oops, which in turn causes the kernel to kill the process, which hangs the whole system.  This is also seen when an attempt is made on the life of systemd's init process.  I noticed that the CMEM module (which, according to the wiki page, at least partially handles virtual-to-physical address translation) does not get loaded:

    [ 3.392048] EXT4-fs (mmcblk0p2): re-mounted. Opts: (null)
    Starting Load Kernel Modules...
    [ 3.432379] cmemk: disagrees about version of symbol module_layout
    [ 3.443113] cryptodev: disagrees about version of symbol module_layout
    [ OK ] Created slice User and Session Slice.
    [ 3.452841] gdbserverproxy: disagrees about version of symbol module_layout
    [ 3.473874] uio_module_drv: disagrees about version of symbol module_layout
    [ OK ] Reached target Slices.
    [ OK ] Listening on Journal Socket (/dev/log).
    Mounting POSIX Message Queue File System...
    [ OK ] Listening on udev Control Socket.
    Starting Create list of required st... nodes for the current kernel...
    Starting Journal Service...
    Mounting Temporary Directory...
    [ OK ] Listening on /dev/initctl Compatibility Named Pipe.
    [ OK ] Started Dispatch Password Requests to Console Directory Watch.
    [ OK ] Reached target Paths.
    [ OK ] Mounted Debug File System.
    [ OK ] Mounted POSIX Message Queue File System.
    [ OK ] Mounted Temporary Directory.
    [ OK ] Started Journal Service.
    [ OK ] Started Setup Virtual Console.
    [ OK ] Started Remount Root and Kernel File Systems.
    [FAILED] Failed to start Load Kernel Modules.

    I've noticed that these modules are all external (out-of-tree) modules.  They came from the compressed filesystem that PSDK4 ships with (in filesystem/tisdk-rootfs-image-am57xx-evm.tar.xz).  Also, when the in-tree modules are compiled with the following command:

    make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- modules

    these external modules are not built (log attached): compile.log

    Am I correct in assuming that these modules were built against a different kernel and simply copied into the rootfs?  If there is a way to recompile them against my kernel so they carry the proper version, that would be excellent.
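
    For what it's worth, the mismatch is visible by comparing the modules' vermagic against the running kernel, along these lines (the module location is a guess at where the SDK rootfs installs cmemk.ko):

    # sketch: compare an external module's vermagic with the running kernel version
    uname -r
    find /lib/modules -name cmemk.ko
    modinfo $(find /lib/modules -name cmemk.ko | head -n1) | grep vermagic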

  • In reply to Tom W:

    Tom W
    Here is the log you requested:

    g3_linux_no_lpae.log

    From this log, it seems to me that the boot flow gets stuck in the rootfs, not in the Linux kernel. Can you try switching from tisdk-rootfs to the bare minimal arago-base filesystem and check how it goes there?

    {PSDK}/filesystem/arago-base-tisdk-image-am57xx-evm.tar.xz

    Tom W
    I've come to the assumption that over months of rigorous use with minimal safe ejection, I can blame my EXT4 filesystem errors on my SD card hardware.  They're reduced significantly with another SD card.

    Can you try with EXT3? Note that the default SD card script creates the rootfs with ext3:

    {PSDK}/bin/create-sdcard.sh
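
    If you would rather reformat the existing card by hand instead of re-running the script, something like this (the device name is only a placeholder; verify it with lsblk first):

    # sketch: reformat the second (rootfs) partition as ext3
    # /dev/sdX2 is a placeholder; confirm the device with lsblk before running mkfs
    sudo mkfs.ext3 -L rootfs /dev/sdX2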

    Regards,
    Pavel



  • In reply to Tom W:

    Tom W
    This is also seen when systemd's init process has an attempt on its life as well.  I noticed that the CMEM module (which according to the wiki page, at least partially handles the virtual-to-physical address translation system) does not get loaded: 

    Tom W
    I've noticed that these modules are all external modules.  They came from the compressed filesystem that PSDK4 comes with (in filesystem/tisdk-rootfs-image-am57xx-evm.tar.xz).  Also, when the normal modules are compiled by the following command:

    The source code is located at:

    {PSDK}/board-support/extra-drivers/

    Refer to the top-level Makefile ({PSDK}/Makefile) for how to build and install these modules (e.g. cmem_mod):

    processors.wiki.ti.com/.../Processor_Linux_SDK_Top-Level_Makefile
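
    As a rough sketch (the exact target names can differ between SDK versions; "make help" in {PSDK} lists them):

    # sketch: rebuild the out-of-tree modules against your own kernel via the
    # top-level Makefile; confirm the target names with "make help"
    cd {PSDK}
    make linux                # rebuild the kernel so Module.symvers matches
    make cmem_mod             # then rebuild CMEM against that kernel
    make cmem_mod_install     # installs into the DESTDIR configured in Rules.make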

    Regards,
    Pavel



  • In reply to Pavel Botev:

    AHA!  That appears to have done it!  Once I removed the offending external modules (and flushed out some file permission issues on my build host), I got past the emergency terminal and into a proper terminal.  Thank you very much for your help, Pavel!
