RTOS/AM5728: ARM LPAE and/or 4GB mapped memory support

Part Number: AM5728
Other Parts Discussed in Thread: SYSBIOS, 66AK2H14

Tool/software: TI-RTOS

Hello,

Is it possible to access the full range of 4GB of DDR connected to an AM5728 processor in TI-RTOS programs? Specifically, is it possible to have virtual addresses 0x80000000 through 0xFFFFFFFF mapped to the lower half of 4GB of DDR with virtual addresses 0x100000000 through 0x180000000 mapped to the upper half of 4GB of DDR, and have all of those addresses accessible via pointers in a program? That specific range isn't required, but I want to be able to access all 4GB at one time in a program.

Thanks,

Matt McKee

  • Matthew,

    Please check a similar discussion on this topic and the guidance provided here:
    e2e.ti.com/.../2740190

    Regards.
    Rahul
  • Hi Rahul,

    I did see and read that discussion. Unfortunately, the DDR diagnostic program that you mentioned in one of your replies crashes when trying to access memory addresses above 0xFFFFFFFF even if those addresses are mapped to 0x200000000 and beyond and are accessible via CCS' memory browser after hardware configuration. This assumes the DDR diagnostic program you mentioned is the one in the board support library, diag/mem.

    Do you have any sample code that demonstrates LPAE support?

    Thanks,

    Matt McKee

  • Matt,

    See footnote 2 under Table 3-1 in the AM572x Data Manual (SPRS953E):
    (2) In the Unified L3 memory map, there is maximum of 2GB of SDRAM space which is available to all L3 initiators including MPU (MPU,
    GPU, DSP, IVA, DMA, etc). Typically this space is interleaved across both EMIFs to optimize memory performance. If a system
    populates > 2GB of physical memory, that additional addressable space can be accessed only by the MPU via the Arm V7 Large
    Physical Address Extensions (LPAE).

    This aligns with Table 3-1 in the AM572x Data Manual and the footnotes on page 398 of the AM572x Sitara Processors Technical Reference Manual (SPRUHZ6J). Note that the processing cores all natively support 32-bit (4GB) addressing. All addressing beyond this limit must be handled through an MMU or another extended addressing method. Therefore, only 2GB of DDR3 is directly accessible, as indicated in the DM. The other 2GB, when the maximum allowed is installed, must be accessed using LPAE.
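
    To make the 32-bit limit concrete: in the ARMv7-A long-descriptor (LPAE) format, a 32-bit virtual address is split into a 2-bit first-level index (four 1GB slots), a 9-bit second-level index (2MB blocks) and a 9-bit third-level index (4KB pages). The sketch below only demonstrates that split on a host machine. The type and function names are made up for illustration and are not part of any TI library.

```c
#include <assert.h>
#include <stdint.h>

/* Indices used by an ARMv7-A LPAE table walk for a 32-bit virtual address.
 * Names are hypothetical; only the bit positions come from the architecture. */
typedef struct {
    uint32_t l1;          /* VA[31:30]: one of 4 first-level slots, 1 GB each   */
    uint32_t l2;          /* VA[29:21]: one of 512 second-level blocks, 2 MB    */
    uint32_t l3;          /* VA[20:12]: one of 512 third-level pages, 4 KB      */
    uint32_t page_offset; /* VA[11:0]:  byte offset inside the 4 KB page        */
} lpae_indices_t;

/* Split a virtual address into the indices the MMU would use. */
static lpae_indices_t lpae_split_va(uint32_t va)
{
    lpae_indices_t idx;
    idx.l1 = (va >> 30) & 0x3u;
    idx.l2 = (va >> 21) & 0x1FFu;
    idx.l3 = (va >> 12) & 0x1FFu;
    idx.page_offset = va & 0xFFFu;
    return idx;
}

/* First virtual address covered by a given first-level slot. */
static uint32_t lpae_l1_slot_base(uint32_t l1)
{
    assert(l1 < 4u); /* a 32-bit VA only has four 1 GB slots */
    return l1 << 30;
}
```

    For example, 0x80000000 falls in first-level slot 2 and 0xC0000000 in slot 3, and there is no slot for a virtual address such as 0x100000000. LPAE only widens the physical address that each slot, block or page points to.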

    When you have the processor configured for only 1GB of SDRAM on each EMIF, do you see robust memory accesses?  We need to verify this before moving to more complex configurations.

    Tom

  • Hi Tom,

    Thank you for your response. When the processor is configured for 1GB of SDRAM on each EMIF, for a total of 2GB of SDRAM, we do see robust memory access. The provided TI-RTOS diagnostic example 'mem', in the board support portion of the PDK, can write to and read from the entirety of those 2GB, 0x80000000 to 0xFFFFFFFF.

    Thanks,

    Matt McKee

  • Matt,

    We appreciate your patience.  All readily available sample code aligns with the existing EVMs, which only have 1GB on each EMIF.  We are working to identify sample code that properly configures LPAE for access to the extended memory space.

    Tom

  • Hi Tom,

    Thank you for your continued updates. I am looking forward to reviewing the sample code that you find.

    Cheers,

    Matt McKee
  • Matt,

    We are still working this thread.

    Tom

  • Hi Matt,

    LPAE is handled through the MMU configuration. Here are the reference details for TI-RTOS. A similar mapping could be done for bare metal in the case of diagnostics.
    software-dl.ti.com/.../Mmu.html

    It would be good to understand whether you will eventually need a TI-RTOS reference example beyond diagnostics. Most of the TI Processor SDK RTOS examples have a .cfg file in which the MMU is used to set up the DDR mapping.

    Regards, Eric
  • Hi Eric,

    If you have an example of an MMU configuration where the full 4GB is addressable, that would be appreciated. Additionally, information on how to apply that MMU configuration to baremetal diagnostics would also be extremely useful. As far as I can tell, the latest SDK mem diagnostics program doesn't test the full 4GB of DDR4 on the AM65x EVM so this information would be useful for that as well.

    Thanks,
    Matt McKee
  • Hi Matt,

    We have a bare-metal MMU setup for AM654x. In all of the board diagnostic examples, the application is linked with the ti.csl.aa53fg library, which archives csl_a53.c from pdk_am65xx_1_0_2\packages\ti\csl\arch\a53\src. The function CSL_mmuStartup() sets up the MMU.

    There are two API calls:
    CSL_mmuInitLevel1Desc() and CSL_mmuInitLevel2Desc(). The former uses a 1GB block size; the latter uses a 2MB block size. We mapped the first 2GB of DDR this way:
    desc = CSL_mmuInitLevel1Desc(0x80000000, &attrs);
    gCSLa53Mmulevel1Table[2] = desc;
    desc = CSL_mmuInitLevel1Desc(0xC0000000, &attrs);
    gCSLa53Mmulevel1Table[3] = desc;

    The address is 64-bit, so you can do the same for the next 2GB of DDR.
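
    The snippet above is for the A53 on AM65x. For the AM572x side of the question, the following is a minimal sketch of what a 1GB first-level block descriptor looks like in the ARMv7-A long-descriptor format, where the table index selects the virtual 1GB slot and the descriptor carries a physical address of up to 40 bits. It does not use the CSL API. The macro and function names are hypothetical, and which MAIR attribute index describes Normal cacheable memory is an assumption that depends on how the MAIR registers are programmed.

```c
#include <assert.h>
#include <stdint.h>

/* ARMv7-A long-descriptor block attributes (bit positions from the
 * architecture; macro names are made up for this sketch). */
#define LPAE_TYPE_BLOCK    0x1ull                        /* bits [1:0] = 0b01          */
#define LPAE_ATTRINDX(n)   (((uint64_t)(n) & 0x7u) << 2) /* index into MAIR0/MAIR1     */
#define LPAE_AP_RW_PL1     (0x0ull << 6)                 /* AP[2:1] = 00: RW, PL1 only */
#define LPAE_SH_INNER      (0x3ull << 8)                 /* inner shareable            */
#define LPAE_AF            (0x1ull << 10)                /* access flag set            */
#define LPAE_XN            (0x1ull << 54)                /* execute never              */
#define LPAE_L1_ADDR_MASK  0x000000FFC0000000ull         /* output address bits [39:30] */

/* Build a first-level 1 GB block descriptor for data memory.
 * attr_index is an assumption: it must match the MAIR setup in use. */
static uint64_t lpae_l1_block_desc(uint64_t phys_addr, unsigned attr_index)
{
    /* Must be 1 GB aligned and fit in 40 physical address bits. */
    assert((phys_addr & ~LPAE_L1_ADDR_MASK) == 0u);

    return (phys_addr & LPAE_L1_ADDR_MASK)
         | LPAE_XN
         | LPAE_AF
         | LPAE_SH_INNER
         | LPAE_AP_RW_PL1
         | LPAE_ATTRINDX(attr_index)
         | LPAE_TYPE_BLOCK;
}
```

    With these choices, a physical base of 0x200000000 and attribute index 0 give the descriptor value 0x0040000200000701. Writing that value into first-level entry 1, for instance, would make virtual addresses 0x40000000 to 0x7FFFFFFF refer to that physical gigabyte, at the cost of whatever was mapped there before.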

    Regards, Eric
  • Hi Eric,

    Can you confirm whether this is actually possible on the AM572x SOCs? As far as I can tell, the A15 version of the CSL code that you mention takes 32-bit void pointers for virtual and physical address arguments and also casts them as 32-bit integers within the functions. I would think that this would prevent the mapping of physical addresses beyond 0xFFFF FFFF which is necessary for using more than 2GB of DDR since the physical addresses for anything beyond 2GB starts at 0x02 0000 0000.

    Thanks,
    Matt McKee
  • Hi,

    From what I saw in the SYSBIOS A15 MMU documentation: software-dl.ti.com/.../Mmu.html

    Void
    Mmu_setFirstLevelDesc(Ptr virtualAddr, UInt64 phyAddr, Mmu_DescriptorAttrs *attrs);
    Void
    Mmu_setSecondLevelDesc(Ptr virtualAddr, UInt64 phyAddr, Mmu_DescriptorAttrs *attrs);

    The physical address arguments are 64-bit.
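
    Given those signatures (a 32-bit Ptr for the virtual address and a UInt64 for the physical address), mapping a window amounts to walking it in 2MB steps and pairing each virtual block with its physical block. The sketch below shows only that loop and runs on a host. The callback type, the recorder and the example window are hypothetical; under SYS/BIOS the callback could presumably wrap Mmu_setSecondLevelDesc(), but that has not been tested here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_2MB 0x200000u

/* Hypothetical hook that installs one 2 MB mapping. */
typedef void (*map_block_fn)(uint32_t va, uint64_t pa, void *ctx);

/* Pair every 2 MB block of a virtual window with a physical block.
 * Returns the number of blocks handed to the callback. */
static size_t map_window(uint32_t va_base, uint64_t pa_base, uint64_t size,
                         map_block_fn map_block, void *ctx)
{
    size_t count = 0;

    assert((va_base % BLOCK_2MB) == 0u);
    assert((pa_base % BLOCK_2MB) == 0u);
    assert((size % BLOCK_2MB) == 0u);
    /* The window must stay inside the 32-bit virtual address space. */
    assert((uint64_t)va_base + size <= 0x100000000ull);

    for (uint64_t off = 0; off < size; off += BLOCK_2MB) {
        map_block((uint32_t)(va_base + off), pa_base + off, ctx);
        count++;
    }
    return count;
}

/* Host-side stand-in for the MMU call: remembers the last pair it saw. */
typedef struct {
    uint32_t last_va;
    uint64_t last_pa;
    size_t   calls;
} map_log_t;

static void record_block(uint32_t va, uint64_t pa, void *ctx)
{
    map_log_t *log = (map_log_t *)ctx;
    log->last_va = va;
    log->last_pa = pa;
    log->calls++;
}
```

    Calling map_window(0x40000000u, 0x200000000ull, 0x40000000ull, record_block, &log) visits 512 blocks, the last one pairing virtual 0x7FE00000 with physical 0x23FE00000. Whether that virtual window is actually free on a given AM572x system is a separate question, since peripherals also live below 0x80000000.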

    Regards, Eric
  • Hi Eric,

    Happy New Year. Would this work for baremetal code? If not, can you check whether the CSL code actually supports mapping 64-bit addresses?

    Thanks,
    Matt McKee
  • Hi Matt,

    This is SYSBIOS code and will not work in a bare-metal environment. For AM572x, I looked at the bare-metal CSL code in ti\csl\arch\a15\src\csl_a15_startup.c for
    CSL_a15InitMmuLongDesc()
    and
    CSL_a15SetMmuSecondLevelLongDesc()

    The physical address used is 32-bit. I checked with our development team: we never had to use 64-bit addresses for Processor SDK RTOS projects, in bare metal or TI-RTOS. What is your DDR size, and what address range is used for it?

    Regards, Eric
  • Hi Eric,

    The DDR size is 4GB and I believe the address range would be 0x80000000 to 0xFFFFFFFF and 0x200000000 to 0x280000000 for a complete 4GB DDR test. Since the virtual address range is only 32-bit I was hoping to use VA 0x0 to 0xFFFFFFFF after hardware configuration (if possible).

    Thanks,
    Matt McKee
  • Matt,

    If you use the whole virtual address range (0x0 to 0xFFFFFFFF) for the 4GB of DDR, how do you access the normal addresses originally between 0x0 and 0x7fff_ffff? And if you want to map a 36-bit physical address into a 32-bit VA via the API, the CSL API doesn't use 36-bit pointers. I was told the A15 GCC compiler only supports 32-bit pointers, so using 36-bit pointers will not work.

    Regards, Eric
  • Hi Eric,

    I assumed there might be some sort of MMU table trickery I could employ to access the full range of DDR since the implication thus far has been that it is possible to access 4GB in certain situations. Does the TI ARM compiler support 36-bit pointers? Even if it does, would I only be able to use >2GB DDR as stack or heap space due to the 32-bit VA range limitation?

    Thanks,
    Matt
  • Matt,

    The A15 compiler we used is gcc-arm-none-eabi-6-2017-q1-update/bin/arm-none-eabi-gcc. We ran into problems when using 64-bit pointers, so the CSL MMU code only uses 32-bit pointers.

    Regards, Eric
  • Eric,

    If the A15 compiler included with the RTOS does not support 64-bit pointers, can you advise me on how to utilize the full range of any DDR population >2GB in a SYSBIOS project? On a 4GB DDR configuration, can the DDR >2GB be utilized in any way, specifically at the same time as the first 2GB?

    Thanks,
    Matt McKee
  • Matt,
    Sorry for the delay on this, we will post our reply soon.
  • Matt,

    Sorry for the delay! We only have AM57x boards with 2GB of DDR installed. For the CSL software, we need to study the feasibility of configuring LPAE with the current CSL library, based on the details in the ARM spec. This will take some time.

    We are also checking whether another group has 4GB of DDR on similar AM57x chips and whether that was tested.

    I will update you on the status in a few days.

    Regards, Eric
  • Hi,

    Thanks for your patience! Another group has an EVM with 4GB of DDR that was tested using Linux. The info is here: processors.wiki.ti.com/.../Category:Processor_SDK_Linux_Automotive
    The DRA7x processor is the same as AM57x. You can follow the software download link and look into the code to see how LPAE is set up.

    For the AM57x, Linux is similar to DRA7x. The info is here: www.ti.com/.../PROCESSOR-SDK-AM57X
    LPAE configuration is done in the Linux kernel, where LPAE support is enabled by the Kconfig symbol CONFIG_ARM_LPAE.
    Look for the function early_mm_init(), which is called in the file "arch/arm/kernel/setup.c". This should give more details on how LPAE is set up.

    We were not able to test LPAE on AM57x EVMs because their DDR is only 2GB in size, and there is no RTOS test for LPAE. I hope you can find the relevant code in Linux and adapt it to your card for the DDR test.

    Regards, Eric
  • Let me rephrase: The DRA7x and AM57x processors come from the same basic platform. They have different IPs and features, but for the purpose of this post, Eric's comments are accurate.
  • Eric, Rogerio,

    Thank you for the feedback. PHYTEC supports 4GB DDR3 in Linux for the phyCORE-AM572x, so we have some familiarity with LPAE and how it is set up (at least in Linux).

    In the TI-RTOS roadmap are there plans to add support for 4GB DDR3 for the DRA7x processor? That would be very helpful for us as we have customers running into limitations with 2GB DDR3 with TI-RTOS.

    Thanks,
    Serah
  • Serah,

    Sorry, we don't have any plan to support LPAE in TI-RTOS.

    Regards, Eric
  • Matt,

    How are you doing? May I suggest the following:

    As you know, the Catalog AM57x EVM itself doesn't support 4GB of memory, but the EVM for its sister chip from Automotive, the DRA7xx, does. So in theory you can look at how that is done and then port the configuration portion of the code to AM572x.

    EVM:
    www.ti.com/.../J6EVM5777

    Software:
    www.ti.com/.../PROCESSOR-SDK-DRA7X

    best regards,
    David Zhou
  • Matthew McKee said:
    Is it possible to access the full range of 4GB of DDR connected to an AM5728 processor in TI-RTOS programs? Specifically, is it possible to have virtual addresses 0x80000000 through 0xFFFFFFFF mapped to the lower half of 4GB of DDR with virtual addresses 0x100000000 through 0x180000000 mapped to the upper half of 4GB of DDR, and have all of those addresses accessible via pointers in a program?

    The Cortex-A15 has a 32-bit virtual address, which can be mapped by the MMU to a 40-bit physical address using the ti.sysbios.family.arm.a15.Mmu module.

    I don't have an AM5728 board with 4GB of DDR, but I did perform a test on a Cortex-A15 in a 66AK2H14 on an EVMK2H with 8GB fitted to the DDR3A interface. For the test, the SYS/BIOS Cortex-A15 virtual address map was set so that:

    a. The program runs in Multicore shared memory (MSM), where virtual addresses 0x0C000000 -> 0x0C5FFFFF are mapped to MSMC SRAM physical addresses 0x000C000000 -> 0x000C5FFFFF

    b. Virtual addresses 0x10000000 -> 0xFFFFFFFF are used to form a 3840MB heap, mapped to DDR3A physical addresses 0x0800000000 -> 0x08EFFFFFFF.

    c. Some of the other virtual addresses < 0x10000000 are mapped to peripherals required by SYS/BIOS such as timers.
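
    The address arithmetic behind mappings (a) and (b) can be checked with a small host-side model. This is only a sketch of the translation described in the list, not the attached test program. The table and function names are hypothetical, and the peripheral mappings from (c) are left out because the post does not list them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One linear virtual-to-physical range (va_last is inclusive). */
typedef struct {
    uint32_t va_first;
    uint32_t va_last;
    uint64_t pa_first;
} va_range_t;

/* Ranges (a) and (b) from the list above. */
static const va_range_t k2h_test_map[] = {
    { 0x0C000000u, 0x0C5FFFFFu, 0x000C000000ull }, /* (a) MSM, program  */
    { 0x10000000u, 0xFFFFFFFFu, 0x0800000000ull }, /* (b) DDR3A, heap   */
};

/* Size of a range in bytes. */
static uint64_t range_size(const va_range_t *r)
{
    assert(r->va_last >= r->va_first);
    return (uint64_t)r->va_last - r->va_first + 1u;
}

/* Translate a virtual address; returns false if it is not in the model. */
static bool va_to_pa(uint32_t va, uint64_t *pa)
{
    for (size_t i = 0; i < sizeof k2h_test_map / sizeof k2h_test_map[0]; i++) {
        const va_range_t *r = &k2h_test_map[i];
        if (va >= r->va_first && va <= r->va_last) {
            *pa = r->pa_first + (va - r->va_first);
            return true;
        }
    }
    return false;
}
```

    In this model the heap range is 0xF0000000 bytes (3840MB), and the last heap byte at virtual 0xFFFFFFFF translates to physical 0x08EFFFFFFF. Every pointer the program uses stays 32-bit; only the translated address needs more than 32 bits.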

    The test program performed a number of allocations from the heap, and filled the allocations with a test pattern and then verified the test pattern to check that all of the heap could be used:

    [arm_A15_0] enter main()
    Test iteration 0 using 1 allocations
      Requested alloc_size=0xf0000000  actual alloc_size=0xf0000000  buffer=@10000000
    Test iteration 1 using 2 allocations
      Requested alloc_size=0xc0000000  actual alloc_size=0xc0000000  buffer=@10000000
      Requested alloc_size=0x30000000  actual alloc_size=0x30000000  buffer=@d0000000
    Test iteration 2 using 3 allocations
      Requested alloc_size=0x80000000  actual alloc_size=0x80000000  buffer=@10000000
      Requested alloc_size=0x60000000  actual alloc_size=0x60000000  buffer=@90000000
      Requested alloc_size=0x10000000  actual alloc_size=0x10000000  buffer=@f0000000
    Test iteration 3 using 4 allocations
      Requested alloc_size=0x80000000  actual alloc_size=0x80000000  buffer=@10000000
      Requested alloc_size=0x30000000  actual alloc_size=0x30000000  buffer=@90000000
      Requested alloc_size=0x30000000  actual alloc_size=0x30000000  buffer=@c0000000
      Requested alloc_size=0x10000000  actual alloc_size=0x10000000  buffer=@f0000000
    Test complete
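
    For readers without the attachment, a fill-then-verify check of the kind described can be sketched as follows. This is not the attached program. The function names and the pattern are assumptions chosen for illustration; the pattern depends on both a per-buffer seed and the word position so that stale or misplaced data is unlikely to match by accident.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Expected value of word 'index' in a buffer filled with 'seed'. */
static uint32_t pattern_at(uint32_t seed, size_t index)
{
    return seed ^ ((uint32_t)index * 2654435761u);
}

/* Write the pattern over the whole buffer. */
static void fill_pattern(volatile uint32_t *buf, size_t words, uint32_t seed)
{
    assert(buf != NULL);
    for (size_t i = 0; i < words; i++) {
        buf[i] = pattern_at(seed, i);
    }
}

/* Read the buffer back; returns the number of words that do not match. */
static size_t verify_pattern(const volatile uint32_t *buf, size_t words,
                             uint32_t seed)
{
    size_t mismatches = 0;

    assert(buf != NULL);
    for (size_t i = 0; i < words; i++) {
        if (buf[i] != pattern_at(seed, i)) {
            mismatches++;
        }
    }
    return mismatches;
}
```

    When several allocations are tested in one iteration, it makes sense to fill all of them (each with its own seed) before verifying any of them, so that one buffer overwriting another is caught.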

    A similar approach may be possible on an AM5728: configure some of the unused virtual addresses below 0x80000000 to map to DDR memory with physical addresses >= 0x100000000. One issue is that you need to leave virtual address space for any peripherals required by the program. While the MMU can be used to give a peripheral a virtual address different from its physical address, I am not sure whether the drivers will work with a non-default virtual address.

    My test program is attached, developed using:
    - CCS 8.3
    - GNU ARM compiler v7.2.1
    - SYS/BIOS 6.75.1.05
    - XDCtools 3.51.1.18_core

    66AK2H14_A15_3840M_heap.zip

    Note: When the program was first run, the EVMK2H boot mode was set to "DSP no boot", with xtcievmk2x_arm.gel initialising the DDR interfaces. However, the GEL script only configured DDR3A as 2GB, which caused the test to fail: part of the heap was aliased to the same physical addresses, so the expected test pattern was not read back. Changing the EVMK2H boot mode to "ARM SPI" and letting U-Boot initialise DDR3 as 8GB then allowed the test to run successfully.
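
    The aliasing failure described in the note can be reproduced in a scaled-down toy model. The sketch below assumes that accesses beyond the configured size simply wrap around onto the same storage. That wrap-around behaviour is an assumption made for illustration, not a statement about how the DDR3A controller aliases, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy memory: the addressable window may be larger than the storage
 * behind it, in which case indices wrap (alias) onto the same words. */
typedef struct {
    uint32_t *store;
    size_t    backing_words;
} aliased_mem_t;

static void mem_write(aliased_mem_t *m, size_t index, uint32_t value)
{
    m->store[index % m->backing_words] = value;
}

static uint32_t mem_read(const aliased_mem_t *m, size_t index)
{
    return m->store[index % m->backing_words];
}

/* Write each word's own index across the whole window, then read it all
 * back. Returns how many words no longer hold what was written to them. */
static size_t count_alias_failures(aliased_mem_t *m, size_t window_words)
{
    size_t failures = 0;

    assert(m->backing_words > 0u);
    for (size_t i = 0; i < window_words; i++) {
        mem_write(m, i, (uint32_t)i);
    }
    for (size_t i = 0; i < window_words; i++) {
        if (mem_read(m, i) != (uint32_t)i) {
            failures++;
        }
    }
    return failures;
}
```

    With 256 words of storage behind a 1024-word window, only the last 256 words written survive, so 768 words fail verification. When the window matches the storage size, nothing fails. Writing the whole region before reading any of it back is what exposes the problem; a write-then-immediately-read test of each word would pass even with aliasing.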

  • Hi Chester,

    Thanks for that response and sample code. I haven't had time to work with it yet, but that does give me something to try out when I revisit the topic. Much appreciated.

    -Matt McKee