memcpy to Pool memory best practices

Other Parts Discussed in Thread: OMAP3530

BACKGROUND:

running Linux 2.6.29 on OMAP3530

Using V4L2 driver to capture images from USB camera

Allocating memory for images in DSPLink Pool memory (DSPLink 1.61.03)

PROBLEM:

memcpy from buffer allocated by V4L2 to buffer allocated by Pool is very slow (~30ms).

memcpy from buffer allocated by V4L2 to memory allocated in the application is pretty fast (~5ms).

 

<code>

char localbuffer[307200];                  // ordinary application memory
char *poolBuffer = GetPoolMemory(307200);  // buffer allocated from the DSPLink POOL

// Dequeue a filled buffer from V4L2
buf.type   = V4L2_BUF_TYPE_VIDEO_CAPTURE;
buf.memory = V4L2_MEMORY_MMAP;
ret = ioctl(fd, VIDIOC_DQBUF, &buf);

// Pointer to the mmap'ed V4L2 buffer that was just dequeued
u8 *psrc = (u8 *)vd->mem[vd->buf.index];

memcpy(poolBuffer, psrc, 307200);   // slow, ~30ms
memcpy(localbuffer, psrc, 307200);  // fast, ~5ms

...

</code>
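One way to reproduce the two timings above is to wrap each copy in a monotonic-clock measurement. This is only a minimal sketch, not the original poster's code: `time_memcpy_ms` is a hypothetical helper, and it assumes a POSIX system that provides `clock_gettime`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: copy n bytes and return the elapsed time in ms. */
double time_memcpy_ms(void *dst, const void *src, size_t n)
{
    struct timespec t0, t1;

    assert(dst != NULL && src != NULL);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    memcpy(dst, src, n);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    /* Convert the seconds and nanoseconds difference to milliseconds. */
    return (t1.tv_sec - t0.tv_sec) * 1e3 + (t1.tv_nsec - t0.tv_nsec) / 1e6;
}
```

Calling it once with the POOL buffer as `dst` and once with a local buffer gives the two numbers being compared. Averaging over several frames reduces noise.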

QUESTION:

Q1: Why would the memcpy be so slow to the memory allocated by Pool (DSPLink)?

Q2: Is there a "better" [i.e. faster] way to get an image from the V4L2 interface into memory that can then be processed by the DSP core?

 

  • Steve15 said:
    Q1: Why would the memcpy be so slow to the memory allocated by Pool (DSPLink)?

    I am not sure this is the cause, but one possibility is that the pool memory shared with the DSP carries extra cache-coherency overhead that a local buffer used only by the ARM does not. Even so, the difference seems excessive.

    Steve15 said:
    Q2: Is there a "better" [i.e. faster] way to get an image from the V4L2 interface into memory that can then be processed by the DSP core?

    One possibility would be to leverage a DMA channel.

  • Bernie Thompson said:

    Q2: Is there a "better" [i.e. faster] way to get an image from the V4L2 interface into memory that can then be processed by the DSP core?

    One possibility would be to leverage a DMA channel.

    What is the best way to leverage the DMA channel?

    Is there some sample code that I can reference?

  • Where do I get started with DMA on the GPP side?  I really just need to do a simple memcpy replacement called from C code.  Are there 2 lines of code that can do this or an example somewhere?

    Thanks.

  • bump.

    Bernie?

    RESTATE PROBLEM:

    I have memory allocated using POOL interfaces from DSPLink to share memory between GPP and DSP.  I would like to use buffers allocated by POOL and pass this as a user pointer to the V4L2 interface to capture into directly. 

    Is there any example for the OMAP3530 that:

    1. Shows how to capture images using the V4L2 interfaces (GPP side)

    2. GPP tells the DSP that data is ready by sending a message

    3. DSP "processes the image" [sums the pixels]

    4. DSP sends the sum to the GPP using a message
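Steps 2 to 4 of this request could be sketched as below. This is not a TI sample: `FrameReadyMsg`, `FrameSumMsg`, and `sum_pixels` are hypothetical names, the DSPLink message header and the actual send/receive calls are omitted, and an 8-bit-per-pixel frame is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical payload the GPP sends to the DSP when a frame is ready (step 2). */
typedef struct {
    uint32_t frame_addr;   /* DSP-visible address of the shared frame buffer */
    uint32_t frame_size;   /* size of the frame in bytes */
} FrameReadyMsg;

/* Hypothetical payload the DSP sends back to the GPP (step 4). */
typedef struct {
    uint32_t pixel_sum;
} FrameSumMsg;

/* Step 3: the "processing" done on the DSP, summing every 8-bit pixel. */
uint32_t sum_pixels(const uint8_t *frame, size_t size)
{
    uint32_t sum = 0;
    size_t i;

    assert(frame != NULL);
    for (i = 0; i < size; i++)
        sum += frame[i];
    return sum;
}
```

If the frame buffer is cached on either core, the writer has to write the cache back and the reader has to invalidate it before the data is used.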

     

    Thanks.

     

  • Hi,

    Copies to POOL buffers are probably slow because POOL buffers are mapped as non-cached memory on the Linux side.

    Have you looked at using CMEM?

    http://tiexpressdsp.com/index.php/CMEM_Overview

    Recent versions of CMEM can allocate cacheable memory, so a memcpy into CMEM-allocated memory may be fast.
    http://tiexpressdsp.com/index.php/Linux_Utils_Roadmap

    Regards,
    Mugdha

  • @Mugdha

    Using the CMEM module was a promising lead!  I allocate a buffer with CMEM and copy the data into it, and the copy is very fast (~3ms).

    example:

    ptr = (u8 *)CMEM_alloc2(0, 614400, &params);

    NEW PROBLEM:

    How do I access that buffer on the DSP?  Is there something I need to add to the TCI file (dsplink-omap3530-base.tci)?

    Currently the DSP has no knowledge of the buffers allocated by CMEM.  [Whereas it DOES know about the buffers allocated by DSPLink.]

    My current CMEM is loaded:

    insmod /home/cmemk.ko phys_start=0x85000000 phys_end=0x86000000 pools=3x614400
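As a sanity check on these module parameters, the requested pools have to fit inside the `phys_start` to `phys_end` window. A small sketch of that arithmetic follows. `pools_fit` is a hypothetical helper, and it ignores any per-buffer rounding the CMEM driver may apply.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical check: do `count` buffers of `size` bytes fit in [start, end)? */
int pools_fit(uint32_t start, uint32_t end, uint32_t count, uint32_t size)
{
    assert(end > start);
    /* Use 64-bit arithmetic so count * size cannot wrap around. */
    return (uint64_t)count * size <= (uint64_t)(end - start);
}
```

With the values above, `pools_fit(0x85000000, 0x86000000, 3, 614400)` is true: three 614400-byte buffers need 1843200 bytes of a 16 MB window.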

    Thanks!  I feel like we are very close.

     

  • Steve,

    You basically need to do two things:

    1. Ensure that the DSP doesn't trample over this newly carved out CMEM space.

    2. Ensure that the DSP has access to this space (i.e. its MMU is configured for this space).

    For (1), you just need to make sure that your TCI file defines the DSP memory map such that the CMEM space is not part of the memory region that is used for DSP code/data (DDR2 in case of OMAP3530).
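The check in (1) amounts to confirming that two physical address ranges do not overlap. A minimal sketch, where `ranges_overlap` is a hypothetical helper and the DSP region addresses in the note after the code are made-up values, not taken from any TCI file:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if [a_start, a_end) and [b_start, b_end) overlap. */
int ranges_overlap(uint32_t a_start, uint32_t a_end,
                   uint32_t b_start, uint32_t b_end)
{
    assert(a_start < a_end && b_start < b_end);
    return a_start < b_end && b_start < a_end;
}
```

For example, with CMEM at 0x85000000 to 0x86000000 (the `insmod` line earlier in the thread), a DSP code/data region starting at 0x87000000 would be safe, while one starting at 0x85800000 would collide with the CMEM space.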

    For (2), you need to update the /dsplink/config/all/CFG_OMAP3530_SHMEM.c file to add the memory region that is used for CMEM into the LINKCFG_memTable_00, with its SHARED field as TRUE. For more details on how to do this, refer to:

    http://tiexpressdsp.com/index.php/Changing_DSPLink_Memory_Map

    See this also (it's important for OMAP3530): http://tiexpressdsp.com/index.php/OMAP3_DSP_MMU_Configuration

    Regards,

    Mugdha

  • Hi MugdhaK:
    Thanks for your post!
    But I am using MCSDK. How should I configure CMEM on the DSP side?
    Thanks.