memcpy to Pool memory best practices

Steve15

BACKGROUND:

Running Linux 2.6.29 on OMAP3530

Using V4L2 driver to capture images from USB camera

Allocating memory for images in DSPLink Pool memory (DSPLink 1.61.03)

PROBLEM:

memcpy from buffer allocated by V4L2 to buffer allocated by Pool is very slow (~30ms).

memcpy from buffer allocated by V4L2 to memory allocated in the application is pretty fast (~5ms).

<code>

char localbuffer[307200];

char * poolBuffer = GetPoolMemory( 307200 );

// Dequeue the filled buffer from V4L2

buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
buf.memory = V4L2_MEMORY_MMAP;
ret = ioctl( fd, VIDIOC_DQBUF, &buf);

// Source is the mmap'ed V4L2 buffer (pointer type now matches the cast)

u8 * psrc = (u8*)vd->mem[vd->buf.index];

memcpy( poolBuffer, psrc, 307200 ); // slow ~ 30ms

memcpy( localbuffer, psrc, 307200 ); // fast ~ 5ms

...

</code>
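For anyone wanting to reproduce the two timings above, here is a minimal sketch of a timing harness. The helper name `time_memcpy_ms` is hypothetical (it is not from V4L2 or DSPLink), and averaging over several iterations is an assumption about how one would measure, not a description of how the numbers above were obtained.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime on strict compilers */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: average time in milliseconds for one memcpy of
 * `len` bytes, measured over `iterations` copies with a monotonic clock. */
static double time_memcpy_ms(void *dst, const void *src, size_t len, int iterations)
{
    struct timespec start, end;

    assert(iterations > 0);

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < iterations; i++) {
        memcpy(dst, src, len);
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed_ms = (double)(end.tv_sec - start.tv_sec) * 1000.0
                      + (double)(end.tv_nsec - start.tv_nsec) / 1.0e6;
    return elapsed_ms / iterations;
}
```

Calling it once with `poolBuffer` and once with `localbuffer` as the destination, with the same `psrc` and length 307200, should show whether the destination memory alone accounts for the gap.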

QUESTION:

Q1: Why would the memcpy be so slow to the memory allocated by Pool (DSPLink)?

Q2: Is there a "better" [i.e. faster] way to get an image from the V4L2 interface into memory that can then be processed by the DSP core?

over 14 years ago

0 Bernie Thompson TI over 14 years ago

Steve15 said:
Q1: Why would the memcpy be so slow to the memory allocated by Pool (DSPLink)?

I am not sure this is the case, but one possibility is that the pool memory shared with the DSP carries extra cache-coherency overhead that a local buffer used only by the ARM does not. Even so, a difference of this size seems excessive.

Steve15 said:
Q2: Is there a "better" [i.e. faster] way to get an image from the V4L2 interface into memory that can then be processed by the DSP core?

One possibility would be to leverage a DMA channel.

0 Steve15 over 14 years ago in reply to Bernie Thompson TI

Bernie Thompson said:

Q2: Is there a "better" [i.e. faster] way to get an image from the V4L2 interface into memory that can then be processed by the DSP core?

One possibility would be to leverage a DMA channel.

What is the best way to leverage the DMA channel?

Is there some sample code that I can reference?

0 Steve15 over 14 years ago in reply to Steve15

Where do I get started with DMA on the GPP side? I really just need to do a simple memcpy replacement called from C code. Are there 2 lines of code that can do this or an example somewhere?

Thanks.

0 Steve15 over 14 years ago in reply to Steve15

bump.

Bernie?

RESTATE PROBLEM:

I have memory allocated using the POOL interfaces from DSPLink to share memory between the GPP and the DSP. I would like to pass buffers allocated by POOL to the V4L2 interface as user pointers, so that frames are captured directly into them.

Is there any example for the OMAP3530 that:

1. Shows how to capture images using the V4L2 interfaces (GPP side)

2. GPP tells the DSP that data is ready by sending a message

3. DSP "processes the image" [sums the pixels]

4. DSP sends the sum to the GPP using a message
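To be clear about how little processing is meant in steps 3 and 4, a sketch of that part is below. The `FrameMsg` layout and both function names are hypothetical; a real DSPLink message would carry the MSGQ header in front of a payload like this, and the frame address would have to be one the DSP can actually reach.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message payload exchanged between GPP and DSP.
 * The field layout is an assumption made for illustration only. */
typedef struct {
    uint32_t frame_addr;  /* frame address as seen by the DSP */
    uint32_t frame_size;  /* frame size in bytes */
    uint32_t pixel_sum;   /* result, filled in by the DSP */
} FrameMsg;

/* Step 3: sum all 8-bit pixel values in the frame. A 32-bit sum cannot
 * overflow for the frame sizes in this thread (614400 * 255 < 2^32). */
static uint32_t sum_pixels(const uint8_t *pixels, size_t count)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < count; i++) {
        sum += pixels[i];
    }
    return sum;
}

/* Step 4: store the result in the message that goes back to the GPP. */
static void fill_reply(FrameMsg *msg, const uint8_t *frame)
{
    assert(msg != NULL && frame != NULL);
    msg->pixel_sum = sum_pixels(frame, msg->frame_size);
}
```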

Thanks.

0 MUGDHA ARORA over 14 years ago in reply to Steve15

Hi,

The reason copies to POOL buffers might be slow is that POOL buffers are in non-cached memory on the Linux side.

Have you looked at using CMEM?

http://tiexpressdsp.com/index.php/CMEM_Overview

Recent versions of CMEM allow allocating cacheable memory, so you might get a fast memcpy by copying into CMEM-allocated buffers instead.
http://tiexpressdsp.com/index.php/Linux_Utils_Roadmap

Regards,
Mugdha

0 Steve15 over 14 years ago in reply to MUGDHA ARORA

@Mugdha

Using the CMEM module was a promising lead! I allocate a buffer with CMEM_alloc2 and copy the data into it, and it is very fast (~3ms).

example:

ptr = (u8 *)CMEM_alloc2(0, 614400, &params);

NEW PROBLEM:

How do I access that buffer on the DSP? Is there something I need to add to the TCI file (dsplink-omap3530-base.tci)?

Currently the DSP has no knowledge of the buffers allocated by CMEM, whereas it DOES know about the buffers allocated by DSPLink.

My current CMEM is loaded:

insmod /home/cmemk.ko phys_start=0x85000000 phys_end=0x86000000 pools=3x614400

Thanks! I feel like we are very close.

0 MUGDHA ARORA over 14 years ago in reply to Steve15

Steve,

You basically need to do two things:

1. Ensure that the DSP doesn't trample over this newly carved out CMEM space.

2. Ensure that the DSP has access to this space (i.e. its MMU is configured for this space).

For (1), you just need to make sure that your TCI file defines the DSP memory map such that the CMEM space is not part of the memory region that is used for DSP code/data (DDR2 in case of OMAP3530).

For (2), you need to update the /dsplink/config/all/CFG_OMAP3530_SHMEM.c file to add the memory region that is used for CMEM into the LINKCFG_memTable_00, with its SHARED field as TRUE. For more details on how to do this, refer to:

http://tiexpressdsp.com/index.php/Changing_DSPLink_Memory_Map

See this also (it's important for OMAP3530): http://tiexpressdsp.com/index.php/OMAP3_DSP_MMU_Configuration
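Point (1) boils down to an address-range overlap check, which can be sanity-checked on paper or with a few lines of C. The sketch below uses the CMEM range from your insmod line (0x85000000 to 0x86000000); the `MemRegion` type and function names are hypothetical, and the DSP region you compare against has to come from your own TCI file. The pool check ignores any rounding CMEM may apply to buffer sizes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of a physical address range [start, end). */
typedef struct {
    uint32_t start;
    uint32_t end;
} MemRegion;

/* True if the two half-open ranges share at least one address. */
static bool regions_overlap(MemRegion a, MemRegion b)
{
    return a.start < b.end && b.start < a.end;
}

/* True if `count` pool buffers of `size` bytes fit inside the region. */
static bool pools_fit(MemRegion region, uint32_t count, uint32_t size)
{
    uint64_t needed = (uint64_t)count * size;

    return needed <= (uint64_t)(region.end - region.start);
}
```

For example, with `pools=3x614400` the buffers need 1843200 bytes, which fits easily in the 16 MB CMEM window; what remains is confirming that no DSP code/data section in the TCI file falls inside that window.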

Regards,

Mugdha

0 Changsheng Li over 9 years ago in reply to MUGDHA ARORA

Hi MugdhaK:
Thanks for your post!
But I am using MCSDK. How should I configure CMEM on the DSP side?
Thanks.
