DIMM size

Hello,

I have a KeyStone II EVM. It has a DIMM with 2 GB of memory.

But when I ran the free command at the Linux prompt, it showed only 512 MB.

root@129:~/sFFT-1.0-2.0-seq-c-keystone-2# free
             total       used       free     shared    buffers     cached
Mem:        512124      39548     472576          0          0       4072
-/+ buffers/cache:      35476     476648
Swap:            0          0          0

But I did configure the memory size as 1536M in U-Boot:

uboot# setenv mem_reserve 1536M
So where is the problem?




  • Hello Cheng,

    free shows the memory available to Linux on the ARM.

    mem_reserve reserves memory for the C66x (DSP).

    So the total is ARM + DSP = 0.5 GB + 1.5 GB = 2 GB.
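
    The arithmetic above can be written as a minimal C sketch. The helper name linux_mem_mib is hypothetical, and the model is plain subtraction: whatever mem_reserve takes from the DIMM is not available to ARM Linux. Note that the total reported by free (512124 kB here) is slightly below a full 512 MiB (524288 kB), because the kernel excludes memory it keeps for itself from that figure.

```c
#include <stdint.h>

/* Hypothetical helper: all sizes are in MiB.
 * Returns the memory left for ARM Linux after mem_reserve is taken
 * from the DIMM, or 0 if the reserve covers the whole DIMM. */
static uint64_t linux_mem_mib(uint64_t dimm_mib, uint64_t mem_reserve_mib)
{
    if (mem_reserve_mib >= dimm_mib)
        return 0;
    return dimm_mib - mem_reserve_mib;
}
```

    With the values from this thread, linux_mem_mib(2048, 1536) gives 512, which matches what free reports.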

    regards,

    David

  • Thank you for your reply, David!

    I am a bit confused, because I understood that the ARM and DSP share the DIMM on KeyStone II, right?

  • Cheng,

    U-Boot has the mem_reserve env variable to reserve DDR3 memory at the end of the 32-bit address space. This is useful for reserving memory for the DSP. The address range of this region changes with the amount of memory available on the board, so any user of this feature needs to make sure the addresses they use match what is reserved through this mechanism. Otherwise the user application can step into kernel memory space and cause a kernel crash during system operation. By default, 512M of memory is reserved at the end of the address space. To change the default size, update this env variable and save the configuration using the saveenv command.
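
    As a rough sketch of the address check described above: the code below assumes the reserved region ends exactly at 0xFFFFFFFF in the 32-bit view of DDR3, which is one reading of "at the end of the 32 bit address space". The names reserve_start and in_reserved are hypothetical, and the real range should always be confirmed on the board.

```c
#include <stdint.h>

/* One past the last 32-bit address (0xFFFFFFFF). Assumption: the
 * reserved region ends exactly here. */
#define ADDR_SPACE_32_END 0x100000000ULL

/* First address of a region of size_bytes placed at the very end of
 * the 32-bit address space. */
static uint64_t reserve_start(uint64_t size_bytes)
{
    return ADDR_SPACE_32_END - size_bytes;
}

/* Returns 1 if the buffer [addr, addr + len) lies entirely inside the
 * reserved region, 0 if any part of it falls outside. */
static int in_reserved(uint64_t addr, uint64_t len, uint64_t reserve_bytes)
{
    uint64_t start = reserve_start(reserve_bytes);

    return addr >= start &&
           addr <= ADDR_SPACE_32_END &&
           len <= ADDR_SPACE_32_END - addr;
}
```

    Under that assumption, the default 512M reserve would start at 0xE0000000 and a 1536M reserve at 0xA0000000.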

    regards,

    David

  • Thanks, David,

    By changing the value passed to

    setenv mem_reserve

    I could finally configure a larger memory size for the ARM.

    But I am still unclear: the ARM and DSP share the physical DIMM, don't they? Why do I have to split the memory into an ARM side and a DSP side?

    For instance, on an EVM with 8 GB of memory, could I only create up to 4 GB of data on the ARM and move it to the DSPs?

    Thanks

  • Hi Cheng,

    ARM and DSP share physical memory, not virtual memory. In this case, when you reserve 512 MB for ARM, you are reserving that memory for ARM Linux. If you cat /proc/iomem, this becomes quite clear:

    800000000-81fffffff : System RAM
      800008000-8006b178f : Kernel code
      800704000-800775f33 : Kernel data

    TI provides the cmem module that is used by both OpenCL and OpenMPAcc to allocate contiguous chunks of memory that are available for use to both the ARM and DSP. This cmem region is outside of ARM Linux memory range. When your cmem kernel module is loaded, you will be able to see this again from /proc/iomem:

    0c040000-0c4fffff : CMEM - This is in MSMC SRAM

    823000000-87fffffff : CMEM - This is in DDR 

    Of course, the cmem range may differ based on the parameters that were used while loading the kernel module.  
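
    To see how large each of these regions is, the start and end addresses can be subtracted. The following is a minimal sketch; the function name iomem_range_size is hypothetical, and it only handles the "start-end : name" shape shown in the lines above.

```c
#include <stdint.h>
#include <stdio.h>

/* Parses one /proc/iomem-style line such as
 * "823000000-87fffffff : CMEM" and stores the size of the range in
 * bytes. Returns 1 on success, 0 if the line does not match. */
static int iomem_range_size(const char *line, uint64_t *size_bytes)
{
    unsigned long long start, end;

    if (sscanf(line, " %llx-%llx", &start, &end) != 2 || end < start)
        return 0;

    /* The end address in /proc/iomem is inclusive, hence the + 1. */
    *size_bytes = (uint64_t)(end - start) + 1;
    return 1;
}
```

    Applied to the ranges quoted above, this gives 512 MiB for System RAM, 4864 KiB for the CMEM block in MSMC SRAM, and 1488 MiB for the CMEM block in DDR.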

    So if you wanted to create a 4 GB shared buffer (using OpenCL or OpenMPAcc) between the ARM and DSP, it would be created in cmem and no data movement would be required.

    Gaurav

  • Thanks Gaurav,

    In my application, I want to create a data set of, say, 1.5 GB on the ARM side and transfer it to the DSP for computation.

    If I understand correctly, CMEM is outside the ARM Linux memory space, so the memory is effectively split into ARM Linux memory and CMEM, and CMEM is shared by the ARM and DSP?

    So back to my question above: if I have 1.5 GB of data on the ARM and also want the DSP to use that data, how can I do that?

    Thanks

    Cheng

  • Hi Cheng,

    In OpenCL: Use clEnqueueMapBuffer(). This will give you an ARM host pointer to a buffer in CMEM. You can operate on ARM using the host pointer as you like, call clEnqueueUnmapMemObject(), then set the associated buffer as your kernel argument and enqueue the kernel. 

    In OpenMPAcc: Use __TI_omp_device_alloc(). This will also give you an ARM host pointer to a buffer in CMEM. Operate on it as you like on ARM, then pass it into a target region. When you are done, call __TI_omp_device_free(). 

    Both these methods allow you to have a shared buffer in CMEM with the ARM and DSP able to operate on it. They also do not require you to do any data movement between ARM Linux and CMEM memory regions. 

    Gaurav

  • Hi Gaurav,

    Thanks for your reply. It did answer my question.

    Just a follow up question: 

    If I allocate memory on the host, e.g.,

    /* size_t stands in for the element count here */
    double *data = (double *) malloc(size_t * sizeof(double));

    #pragma omp target map(to: data[0:size_t])
    #pragma omp parallel for
    for (int i = 0; i < size_t; i++)
         data[i] ...

    So what happens with the memory allocation?

    I suppose in this case the array "data" is allocated on the host, i.e. in the memory region of Linux on the ARM, right?

  • Hi Cheng,

    Yes. In this case:

    1. *data is allocated on the heap, within ARM Linux host memory
    2. #pragma omp target map(to:data[0:size_t]) performs a memcpy from *data in ARM Linux host memory to a DSP device buffer in CMEM
    3. The #pragma omp parallel for region operates on this DSP device buffer
    4. If you had also specified map(from:data[0:size_t]), or tofrom instead of to, the DSP device buffer would be memcpy'd back out to *data in ARM Linux host memory
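
    The four steps above can be modelled in plain C. This is only a sketch of the copy semantics, not the TI runtime: the device buffer is an ordinary calloc'd array standing in for the buffer in CMEM, the region body (doubling each element) is invented for illustration, and the names map_kind and run_target_region are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

typedef enum { MAPKIND_TO, MAPKIND_FROM, MAPKIND_TOFROM } map_kind;

/* Models one target region over host[0:n].
 * host: the malloc'd array in ARM Linux host memory.
 * dev:  stand-in for the DSP device buffer (in CMEM on the real system).
 * Returns the number of memcpy operations performed, or -1 if the
 * stand-in device buffer cannot be allocated. */
static int run_target_region(double *host, size_t n, map_kind kind)
{
    int copies = 0;
    double *dev;

    if (n == 0)
        return 0;

    dev = calloc(n, sizeof *dev);
    if (dev == NULL)
        return -1;

    /* map(to:) and map(tofrom:) copy host data in before the region. */
    if (kind == MAPKIND_TO || kind == MAPKIND_TOFROM) {
        memcpy(dev, host, n * sizeof *dev);
        copies++;
    }

    /* The region itself only ever touches the device buffer. */
    for (size_t i = 0; i < n; i++)
        dev[i] *= 2.0;

    /* map(from:) and map(tofrom:) copy results back out afterwards. */
    if (kind == MAPKIND_FROM || kind == MAPKIND_TOFROM) {
        memcpy(host, dev, n * sizeof *dev);
        copies++;
    }

    free(dev);
    return copies;
}
```

    In this model, map(to:) costs one copy and leaves the host array unchanged, while map(tofrom:) costs two copies and makes the results visible on the host.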

    Gaurav

  • Thanks Gaurav,

    So using __TI_omp_device_alloc() avoids the data copies from host to device and vice versa, and should therefore yield better performance, right?

    Thanks

    Cheng

  • That is correct Cheng.

    Gaurav

  • Thanks for your patience, Gaurav,

    I appreciate it!!

    Cheng