
Linux/AM5728: Linux Memory Policies

Part Number: AM5728

Tool/software: Linux

Hello, I'm working on getting my AM5728-based custom board to boot, and I'm running into a strange hang early in the kernel boot procedure.  I believe it may have to do with my RAM configuration.  During boot, my kernel prints the following messages and then hangs indefinitely:

Starting kernel ...

[    0.000000] Booting Linux on physical CPU 0x0

[    0.000000] Linux version 4.9.28-geed43d1050 (tom@tom-ThinkPad-P50s) (gcc version 6.2.1 20161016 (Linaro GCC 6.2-2016.11) ) #6 SMP PREEMPT Mon Oct 2 10:59:18 EDT 2017

[    0.000000] CPU: ARMv7 Processor [412fc0f2] revision 2 (ARMv7), cr=30c5387d

[    0.000000] CPU: div instructions available: patching division code

[    0.000000] CPU: PIPT / VIPT nonaliasing data cache, PIPT instruction cache

[    0.000000] OF: fdt:Machine model: tom's custom board

[    0.000000] bootconsole [earlycon0] enabled

[    0.000000] efi: Getting EFI parameters from FDT:

[    0.000000] efi: UEFI not found.

[    0.000000] Reserved memory: created CMA memory pool at 0x0000000095800000, size 56 MiB

[    0.000000] OF: reserved mem: initialized node ipu2_cma@95800000, compatible id shared-dma-pool

[    0.000000] Reserved memory: created CMA memory pool at 0x0000000099000000, size 64 MiB

[    0.000000] OF: reserved mem: initialized node dsp1_cma@99000000, compatible id shared-dma-pool

[    0.000000] Reserved memory: created CMA memory pool at 0x000000009d000000, size 32 MiB

[    0.000000] OF: reserved mem: initialized node ipu1_cma@9d000000, compatible id shared-dma-pool

[    0.000000] Reserved memory: created CMA memory pool at 0x000000009f000000, size 8 MiB

[    0.000000] OF: reserved mem: initialized node dsp2_cma@9f000000, compatible id shared-dma-pool

[    0.000000] cma: Reserved 24 MiB at 0x000000008e400000

[    0.000000] Memory policy: Data cache writealloc

Could hanging after "Memory policy: Data cache writealloc" indicate a problem with the main system RAM?  The major difference between my board and the AM5728 EVM is that I have only 256MB of DDR3 RAM, with different timings.  I've updated the u-boot configuration for this, and u-boot seems to have no issues running and relocating itself in the RAM.  u-boot RAM read and write tests also pass consistently.


Going by the EVM's boot log, the next line in the kernel boot should be:
[    0.000000] OMAP4: Map 0x00000000ffd00000 to fe600000 for dram barrier

which is why I feel the problem could be memory related.  Is this assumption valid?  If it is, what could cause u-boot to work correctly with my RAM while Linux does not?

Another possibility is that the addresses given in the IPU/DSP debug lines refer to memory spaces that are unavailable to them.  We have only 256MB of RAM, mapped from 0x80000000 to 0x90000000, and the reserved addresses for the DSP/IPU (0x95800000, 0x99000000, 0x9d000000, 0x9f000000, etc.) are outside that range.  Could they be trying to access memory that simply isn't there?
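To sanity-check that arithmetic, here is a quick sketch (the base addresses and pool sizes are copied from the boot log above; the 256MB window is my board's DRAM):

```python
# Sketch: check which reserved regions from the boot log fall inside
# a 256MB DRAM window starting at 0x80000000 (values taken from the log).
MiB = 1024 * 1024
dram_start = 0x80000000
dram_end = dram_start + 256 * MiB  # 0x90000000

regions = {
    "cma":      (0x8e400000, 24 * MiB),
    "ipu2_cma": (0x95800000, 56 * MiB),
    "dsp1_cma": (0x99000000, 64 * MiB),
    "ipu1_cma": (0x9d000000, 32 * MiB),
    "dsp2_cma": (0x9f000000,  8 * MiB),
}

for name, (base, size) in regions.items():
    inside = dram_start <= base and base + size <= dram_end
    print(f"{name}: {base:#x}..{base + size:#x} {'OK' if inside else 'OUTSIDE DRAM'}")
```

Only the general cma pool at 0x8e400000 fits inside my 256MB window; all four IPU/DSP pools start past its 0x90000000 end.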

Here is the full boot log, in case anyone is interested, along with the full decompiled device tree.

0525.putty.log

2605.customboard.txt

  • Hi Tom,

    From the info you shared, it seems to me that the Linux kernel is trying to access memory that is not available. The default PSDK 4 (kernel 4.9) configuration assumes 2GB DRAM, while you have only 256MB DRAM.

    You can also test your DRAM memory at u-boot stage with mw/md commands.
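    For example (the addresses here are just an illustration — pick a range that stays clear of where U-Boot has relocated itself):

```
=> mw.l 0x81000000 0xA5A5A5A5 0x1000   # fill 0x1000 words (16KB) with a pattern
=> md.l 0x81000000 0x1000              # read the same words back and compare
=> mtest 0x81000000 0x8F000000         # built-in pattern test (needs CONFIG_CMD_MEMTEST)
```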

    Regards,
    Pavel
  • Hi Tom,

    Your assumption is correct: accessing memory addresses for the DSP/IPU outside the physically present memory can crash the system.

    BR
    Tsvetolin Shulev
  • Thanks for the clarification.  I've followed your recommendation with the mw/md commands (as well as re-enabling mtest) and found some potentially tricky spots in my DRAM.  These appear to be sections of RAM where U-Boot has placed commands or other important data (I found the FIT/FDT command string in RAM).  But regarding the default PSDK4 coming with 2GB (to match the AM57x EVM, no doubt): is there somewhere in the kernel setup where I have to specify the RAM size?  I ask because the statement that is supposed to print next, at the point where my system hangs, is

    [    0.000000] OMAP4: Map 0x00000000ffd00000 to fe600000 for dram barrier

    or something equivalent with regard to my memory boundaries.  So my theory is that my kernel is somehow looking for a 2GB boundary, searching past my 256MB limit, and consequently hanging on unavailable address space.

    As an intermediary band-aid solution I've removed the DSP/IPU/offending device tree nodes from my device tree for now.

  • Tom,

    Tom W said:
    Thanks for the clarification.  I've followed your recommendation with the mw/md commands (as well as re-enabling mtest) and found some potentially tricky spots in my DRAM.  These appear to be sections of RAM where U-Boot has placed commands or other important data (I found the FIT/FDT command string in RAM). 

    I would suggest you fix your DRAM first. See if the pointers below help with testing it:

    I will have a look at your other question (regarding the memory map) and come back to you.

    Regards,
    Pavel

  • Tom W said:
    But regarding the default PSDK4 coming with 2GB (to match the AM57x EVM, no doubt): is there somewhere in the kernel setup where I have to specify the RAM size? 

    In u-boot, make sure you have updated your dmm_lisa_map_x registers and the DTS file, which are by default configured for 2GB DRAM:

    u-boot-2017.01/arch/arm/dts/am57xx-beagle-x15-common.dtsi

    memory@0 {
            device_type = "memory";
            reg = <0x0 0x80000000 0x0 0x80000000>;  //2GB DRAM
        };

    In the kernel, you can adjust the DTS file from 2GB to 256MB:

    linux-4.9.28/arch/arm/boot/dts/am57xx-beagle-x15-common.dtsi

    memory@0 {
            device_type = "memory";
            reg = <0x0 0x80000000 0x0 0x80000000>;
        };
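    The size cell is simply the region length in bytes, written in hex. A quick sanity check (plain arithmetic, nothing TI-specific):

```python
# DTS reg = <hi(base) lo(base) hi(size) lo(size)>; the size cell is bytes in hex
GiB = 1024 ** 3
MiB = 1024 ** 2
print(hex(2 * GiB))    # 0x80000000 -> the 2GB default in the EVM dtsi
print(hex(256 * MiB))  # 0x10000000 -> the value a 256MB board needs
```

    So for 256MB starting at 0x80000000, the node becomes reg = <0x0 0x80000000 0x0 0x10000000>.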

    See also the pointers below for a better understanding of DRAM configuration:

    Regards,
    Pavel

  • Hi Pavel, thanks for the reply.  I've modified both of these memory@0 nodes to fit my config (256MB starting at 0x80000000):

    memory@0 {
            device_type = "memory";
            reg = <0x0 0x80000000 0x0 0x10000000>;
        };

    I've also modified the reserved-memory node as such:

    reserved-memory {
            /delete-node/ ipu2_cma@95800000;
            /delete-node/ dsp1_cma@99000000;
            /delete-node/ ipu1_cma@9d000000;
            /delete-node/ dsp2_cma@9f000000;

            cmem_block_mem_0: cmem_block_mem@a0000000 {
                    reg = <0x0 0x88000000 0x0 0x01000000>;
            };
    };

    so that the DSP/IPU nodes are not used (they are also disabled later in the DT) and the cmem block is shrunk and moved to a region of RAM that actually exists on my system.

    I've unfortunately had no luck.  This is the output I receive:

    Starting kernel ...

    [ 0.000000] Booting Linux on physical CPU 0x0
    [ 0.000000] Linux version 4.9.28-geed43d1050 (tom@tom-ThinkPad-P50s) (gcc version 6.2.1 20161016 (Linaro GCC 6.2-2016.11) ) #16 SMP PREEMPT Wed Oct 4 15:08:57 EDT 2017
    [ 0.000000] CPU: ARMv7 Processor [412fc0f2] revision 2 (ARMv7), cr=30c5387d
    [ 0.000000] CPU: div instructions available: patching division code
    [ 0.000000] CPU: PIPT / VIPT nonaliasing data cache, PIPT instruction cache
    [ 0.000000] OF: fdt:Machine model: tom's customboard
    [ 0.000000] bootconsole [earlycon0] enabled
    [ 0.000000] efi: Getting EFI parameters from FDT:
    [ 0.000000] efi: UEFI not found.
    [ 0.000000] cma: Reserved 24 MiB at 0x000000008e400000
    [ 0.000000] Memory policy: Data cache writealloc

  • Tom W said:
    cmem_block_mem_0: cmem_block_mem@a0000000 {
    reg = <0x0 0x88000000 0x0 0x01000000>;
    };

    You should remove this node, or at least make its unit address match the reg property:

    cmem_block_mem_0: cmem_block_mem@88000000 {
    reg = <0x0 0x88000000 0x0 0x01000000>;
    };

    Tom W said:

    Starting kernel ...

    [ 0.000000] Booting Linux on physical CPU 0x0
    [ 0.000000] Linux version 4.9.28-geed43d1050 (tom@tom-ThinkPad-P50s) (gcc version 6.2.1 20161016 (Linaro GCC 6.2-2016.11) ) #16 SMP PREEMPT Wed Oct 4 15:08:57 EDT 2017
    [ 0.000000] CPU: ARMv7 Processor [412fc0f2] revision 2 (ARMv7), cr=30c5387d
    [ 0.000000] CPU: div instructions available: patching division code
    [ 0.000000] CPU: PIPT / VIPT nonaliasing data cache, PIPT instruction cache
    [ 0.000000] OF: fdt:Machine model: tom's customboard
    [ 0.000000] bootconsole [earlycon0] enabled
    [ 0.000000] efi: Getting EFI parameters from FDT:
    [ 0.000000] efi: UEFI not found.
    [ 0.000000] cma: Reserved 24 MiB at 0x000000008e400000
    [ 0.000000] Memory policy: Data cache writealloc

    Can you check whether the ARM A15 LPAE feature is enabled? If it is, disable it and try again.

    Regards,
    Pavel

  • You can also add some debug prints to identify exactly where your boot flow gets stuck:

    linux-kernel/arch/arm/mm/mmu.c

    /*
     * paging_init() sets up the page tables, initialises the zone memory
     * maps, and sets up the zero page, bad page and bad page tables.
     */
    void __init paging_init(const struct machine_desc *mdesc)
    {
        void *zero_page;

        build_mem_type_table();   // --> Memory policy: Data cache writealloc
        prepare_page_table();
        map_lowmem();
        memblock_set_current_limit(arm_lowmem_limit);
        dma_contiguous_remap();
        early_fixmap_shutdown();
        devicemaps_init(mdesc); // --> OMAP4: Map 0x00000000ffd00000 to fe600000 for dram barrier
        kmap_init();
        tcm_init();

        top_pmd = pmd_off_k(0xffff0000);

        /* allocate the zero page. */
        zero_page = early_alloc(PAGE_SIZE);

        bootmem_init();

        empty_zero_page = virt_to_page(zero_page);
        __flush_dcache_page(NULL, empty_zero_page);
    }


    First check whether the flow makes it out of the build_mem_type_table() function, then check whether it gets into the devicemaps_init() function.

  • Note also that the AM335x SK (StarterKit) comes with 256MB of DDR3 memory; you can check its device tree file and Linux kernel config to get the proper settings for a 256MB memory map.

    www.ti.com/.../tmdssk3358
  • Thanks again Pavel.  I've fixed the node name, replacing cmem_block_mem@a0000000 with cmem_block_mem@88000000.

    I disabled LPAE and got further in the boot than before (log is attached); however, I'm now seeing quite a lot of EXT4 filesystem errors.  Unfortunately all my other SD cards are in use, so I can't verify for certain whether the fault lies with this SD card.  I've also noticed that the line 

    [    0.000000] OMAP4: Map 0x8fe00000 to fe600000 for dram barrier


    is still being printed.

    From what I've read about LPAE, it seems rather integral to the A15 architecture.  I'm not planning to go above the 32-bit addressing limit, so I can see why disabling it could make sense, but I'm unsure whether it's wise.

  • Tom,

    Tom W said:
    I disabled LPAE and got further in the boot than before (log is attached)

    I do not see any attached log. Can you check again?

    Tom W said:

    I've also noticed that the line 

    [    0.000000] OMAP4: Map 0x8fe00000 to fe600000 for dram barrier


    is still being printed.

    I suspect this 0xFE600000 is the virtual address that the 0x8FE00000 physical address is mapped to, so this should not be an issue. I will check further.

    Tom W said:
    From what I've read about LPAE, it seems rather integral to the A15 architecture.  I'm not planning to go above the 32-bit addressing limit, so I can see why disabling it could make sense, but I'm unsure whether it's wise.

    Reading the AM572x DM, it seems that LPAE is only needed when you have more than 2GB of DRAM:

    3.1 Device Comparison Table

    DDR3 Memory Controller (2)

    (2) In the Unified L3 memory map, there is maximum of 2GB of SDRAM space which is available to all L3 initiators including MPU (MPU, GPU, DSP, IVA, DMA, etc). Typically this space is interleaved across both EMIFs to optimize memory performance. If a system populates > 2GB of physical memory, that additional addressable space can be accessed only by the MPU via the ARM V7 Large Physical Address Extensions (LPAE).
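    In other words (a back-of-the-envelope check, not a quote from the DM): with DRAM starting at 0x80000000, a full 2GB window already reaches the top of the 32-bit physical address space, so only memory beyond it needs LPAE's wider physical addresses:

```python
# AM572x DRAM window starts at 0x80000000; the L3 map allows up to 2GB of SDRAM
dram_base = 0x80000000
window = 2 * 1024 ** 3
top = dram_base + window
print(hex(top))          # 0x100000000 -> first address past the 2GB window
print(top > 0xFFFFFFFF)  # True: past 32 bits, reachable only via LPAE
```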


    Note that LPAE is enabled by default from the 4.4 kernel onwards. This feature is not enabled in AM572x PSDK 2.02 (kernel 4.1). See the e2e threads below:


    Regards,
    Pavel

  • Tom W said:

    I've also noticed that the line 

    [    0.000000] OMAP4: Map 0x8fe00000 to fe600000 for dram barrier


    is still being printed.

    I confirm 0xfe600000 is a virtual address, so it should NOT be an issue. See linux-kernel/arch/arm/mach-omap2/omap4-common.c:

    /* Used to implement memory barrier on DRAM path */
    #define OMAP4_DRAM_BARRIER_VA            0xfe600000

    static void __iomem *dram_sync, *sram_sync;
    static phys_addr_t dram_sync_paddr;
    static u32 dram_sync_size;

    /* Steal one page physical memory for barrier implementation */
    void __init omap_barrier_reserve_memblock(void)
    {
        dram_sync_size = ALIGN(PAGE_SIZE, SZ_1M);
        dram_sync_paddr = arm_memblock_steal(dram_sync_size, SZ_1M);
    }


    void __init omap_barriers_init(void)
    {
        struct map_desc dram_io_desc[1];

        dram_io_desc[0].virtual = OMAP4_DRAM_BARRIER_VA;
        dram_io_desc[0].pfn = __phys_to_pfn(dram_sync_paddr);
        dram_io_desc[0].length = dram_sync_size;
        dram_io_desc[0].type = MT_MEMORY_RW_SO;
        iotable_init(dram_io_desc, ARRAY_SIZE(dram_io_desc));
        dram_sync = (void __iomem *) dram_io_desc[0].virtual;

        pr_info("OMAP4: Map %pa to %p for dram barrier\n",&dram_sync_paddr, dram_sync);

        soc_mb = omap4_mb;
    }

  • Apologies for the late reply, I've been away from the office.  Here is the log you requested:

    6557.g3_linux_no_lpae.log

    Also thank you for confirming the LPAE and virtual address from the source.  

    After months of rigorous use with minimal safe ejection, I've concluded that my EXT4 filesystem errors can be blamed on the SD card hardware itself; they're significantly reduced with another SD card.

    From the log and my subsequent tests, I've found that loading the kernel modules doesn't succeed, which I find odd, as I compiled and installed them per the PSDK4 wiki.  Beyond that, my system appears to read an unreachable chunk of memory, spinlock, and panic.  The memory address it attempts to jump to is never the same, and I have yet to identify any pattern between tests.

    [    6.859521] Unable to handle kernel paging request at virtual address 00004d64



    There are several similar faults where it attempts to jump to an address smaller than the page size, and accordingly prints a NULL pointer dereference error.  The kernel dies after that.  This still appears to be a memory/filesystem problem, but more a misconfiguration than a hardware fault.

    5355.putty.log

    2538.putty1.log

  • Doing a bit more digging, it's apparent that the system doesn't want to run because the EXT4 accesses result in a kernel OOPS, which in turn causes the kernel to kill the process, which hangs the whole system.  The same thing happens when systemd's init process is hit.  I noticed that the CMEM module (which, according to the wiki page, at least partially handles the virtual-to-physical address translation) does not get loaded: 

    [ 3.392048] EXT4-fs (mmcblk0p2): re-mounted. Opts: (null)
    Starting Load Kernel Modules...
    [ 3.432379] cmemk: disagrees about version of symbol module_layout
    [ 3.443113] cryptodev: disagrees about version of symbol module_layout
    [ OK ] Created slice User and Session Slice.
    [ 3.452841] gdbserverproxy: disagrees about version of symbol module_layout
    [ 3.473874] uio_module_drv: disagrees about version of symbol module_layout
    [ OK ] Reached target Slices.
    [ OK ] Listening on Journal Socket (/dev/log).
    Mounting POSIX Message Queue File System...
    [ OK ] Listening on udev Control Socket.
    Starting Create list of required st... nodes for the current kernel...
    Starting Journal Service...
    Mounting Temporary Directory...
    [ OK ] Listening on /dev/initctl Compatibility Named Pipe.
    [ OK ] Started Dispatch Password Requests to Console Directory Watch.
    [ OK ] Reached target Paths.
    [ OK ] Mounted Debug File System.
    [ OK ] Mounted POSIX Message Queue File System.
    [ OK ] Mounted Temporary Directory.
    [ OK ] Started Journal Service.
    [ OK ] Started Setup Virtual Console.
    [ OK ] Started Remount Root and Kernel File Systems.
    [FAILED] Failed to start Load Kernel Modules.

    I've noticed that these modules are all external modules.  They came from the compressed filesystem that ships with PSDK4 (filesystem/tisdk-rootfs-image-am57xx-evm.tar.xz).  Also, when the in-tree modules are compiled with the following command:

    make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- modules

    These modules are not compiled (log attached): 2642.compile.log

    Am I correct in assuming that these modules were somehow built against a different kernel?  If there is a way to recompile them against my kernel so that the symbol versions match, that would be excellent.
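    In case it helps others: "disagrees about version of symbol module_layout" usually means the .ko was built against a different kernel than the one currently running.  A quick way to compare the two (the module path here is just an example of where mine ended up):

```
$ modinfo -F vermagic /lib/modules/$(uname -r)/extra/cmemk.ko
$ uname -r
```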

  • Tom W said:
    Here is the log you requested:

    6557.g3_linux_no_lpae.log

    From this log, it seems to me that the boot flow is stuck in the rootfs, not in the Linux kernel. Can you try switching from tisdk-rootfs to the bare minimal arago-base filesystem and check how it goes?

    {PSDK}/filesystem/arago-base-tisdk-image-am57xx-evm.tar.xz

    Tom W said:
    After months of rigorous use with minimal safe ejection, I've concluded that my EXT4 filesystem errors can be blamed on the SD card hardware itself; they're significantly reduced with another SD card.

    Can you try with EXT3? Note that the default SD card script creates the rootfs with ext3:

    {PSDK}/bin/create-sdcard.sh

    Regards,
    Pavel

  • Tom W said:
    The same thing happens when systemd's init process is hit.  I noticed that the CMEM module (which, according to the wiki page, at least partially handles the virtual-to-physical address translation) does not get loaded: 

    Tom W said:
    I've noticed that these modules are all external modules.  They came from the compressed filesystem that ships with PSDK4 (filesystem/tisdk-rootfs-image-am57xx-evm.tar.xz).  Also, when the in-tree modules are compiled with the following command:

    The source code is located at:

    {PSDK}/board-support/extra-drivers/

    Refer to the top-level Makefile ({PSDK}/Makefile) for how to build and install these modules (e.g. cmem_mod).

    processors.wiki.ti.com/.../Processor_Linux_SDK_Top-Level_Makefile

    Regards,
    Pavel

  • AHA!  That appears to have done it!  Once I removed the offending external modules (and flushed out some file permission issues on my build host), I managed to get to the emergency terminal and from there to a proper terminal.  Thank you very much for your help, Pavel!