Zero copy shared memory with DSP and IPU

Other Parts Discussed in Thread: SYSBIOS

Dear all,


In order to run a computer vision algorithm in parallel, I would like to use the DSP and IPU cores. So I thought it would be nice to use zero-copy shared memory to store the image, where all the CPUs can access it. RPMSG and remoteproc are working well on my PandaBoard ES, and I have already investigated some libraries that could do this: DCE, ION, TI TILER, etc. So the question is: which one should be used now with RPMSG? How is the shared memory pointer sent to the RTOS?

I am using a TI kernel under Ubuntu.

Thank you in advance.

Best regards

Kevin

  • Hello Kevin,

    To use the IPU subsystem based on the Cortex-M3 cores, you must use the DCE library. DCE is a library for remotely invoking the hardware-accelerated codecs on IVA-HD via a syslink/rcm shim layer. This provides access to the Codec Engine codec interface on the coprocessor (Ducati/M3) from the host.

    DCE is used by the video player for the OMAP framebuffer (http://git.mansr.com/?p=omapfbplay) and by the GStreamer plugins for OMAP4 that use libdce (https://gitorious.org/gstreamer-omap/gst-ducati/source/742ce4785e6eafbe8844a44bb0e84770f2d8cb48).

    You can obtain SysLink here: http://omappedia.org/wiki/SysLink_Git_Trees

    I suggest referring to http://omappedia.org/wiki/DistributedCodecEngine for more information about the Distributed Codec Engine (DCE).


    Best regards,

    Yanko

  • Yanko,


    I'm sorry, maybe I explained my problem badly. In fact, I built my own custom firmware for the DSP and the Cortex-M3 using SYS/BIOS and FreeRTOS. It works well with the RPMSG and RPMSG_OMX kernel modules, so I am able to send messages from the host processor to the remote processors. The aim is to perform custom processing on those cores (both the M3 and the DSP). What I was wondering is: is there a way to allocate a shared memory region and send a pointer to this region to the remote processors? Do I have to write a new kernel driver, or can I use DCE for this purpose?

    I've seen an example of shared memory using FreeRTOS: it defines a shared region in resource.h and resource.c. According to Omappedia, this region is allocated for the M3 by the remoteproc kernel driver, and we can get the virtual address in kernel space from the rproc_da_to_va() call. That could be a solution to my problem, but if something already exists I'd rather use it!
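    The shared-region approach described above can be sketched as a "carveout" entry in the remoteproc resource table. The struct layout below is modeled on the upstream struct fw_rsc_carveout from include/linux/remoteproc.h; the device address, length, and region name are illustrative assumptions, not values from a real firmware image, and the da-to-offset helper is only a toy analogue of what rproc_da_to_va() does in the kernel.

```c
#include <stdint.h>

/* Sketch of a remoteproc resource-table "carveout" entry, modeled on the
 * upstream struct fw_rsc_carveout. The kernel allocates this region at
 * firmware load time; rproc_da_to_va() then translates the device address
 * (da) into a kernel virtual address. */
struct fw_rsc_carveout {
    uint32_t da;        /* device address as seen by the remote core */
    uint32_t pa;        /* physical address, filled in by the kernel  */
    uint32_t len;       /* length of the region in bytes              */
    uint32_t flags;
    uint32_t reserved;
    uint8_t  name[32];
};

/* Hypothetical shared image buffer: 1 MiB at device address 0x90000000. */
static const struct fw_rsc_carveout shared_region = {
    .da    = 0x90000000u,
    .pa    = 0,             /* 0 = let the kernel pick the physical pages */
    .len   = 1u << 20,
    .flags = 0,
    .name  = "shared_img_buf",
};

/* Toy analogue of rproc_da_to_va(): translate a device address inside the
 * carveout into a byte offset from the start of the region, or -1 if the
 * address falls outside the carveout. */
static inline long carveout_da_to_offset(const struct fw_rsc_carveout *c,
                                         uint32_t da)
{
    if (da < c->da || da >= c->da + c->len)
        return -1;                  /* outside the carveout */
    return (long)(da - c->da);
}
```

    The host would hand the remote core the da (or pa) of this region over rpmsg, and both sides would then index into the same physical pages.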

    Regards,

    Kevin

  • Sorry for double-posting, but I found the answer to my question myself; I hope it will help others:


    The allocation of shared memory is done through the omapdrm driver (have a look at DCE on TI's OMAP4 Launchpad). This driver uses the TILER memory of the OMAP4460. I then wrote a new rpmsg client driver, inspired by rpmsg-omx and omapdce, to send the physical address to the remote processor.
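    A minimal userspace sketch of this allocation path, assuming the libdrm omap API (omap_device_new() / omap_bo_new() / omap_bo_map() from omap_drmif.h): the hardware-dependent part is guarded behind HAVE_OMAP_DRM since it needs an OMAP board with omapdrm loaded, and the page-rounding helper is my own illustration, not part of the library.

```c
#include <stddef.h>

#ifdef HAVE_OMAP_DRM
#include <fcntl.h>
#include <omap_drm.h>
#include <omap_drmif.h>

/* Allocate a CPU-mappable buffer from the omapdrm driver. Error handling
 * and cleanup are omitted for brevity; the resulting buffer's physical
 * address is what would be sent to the remote processor over the custom
 * rpmsg client driver described above. */
static void *alloc_shared_image(size_t size)
{
    int fd = open("/dev/dri/card0", O_RDWR);        /* DRM device node   */
    struct omap_device *dev = omap_device_new(fd);  /* libdrm omap handle */
    /* OMAP_BO_WC: write-combined mapping, coherent but uncached for CPU */
    struct omap_bo *bo = omap_bo_new(dev, size, OMAP_BO_WC);
    return omap_bo_map(bo);                         /* CPU-visible mapping */
}
#endif

/* The driver hands out whole MMU pages, so round the requested image size
 * up to the 4 KiB page size before allocating. */
static inline size_t round_up_to_page(size_t size)
{
    const size_t page = 4096;   /* OMAP4 MMU page size */
    return (size + page - 1) & ~(page - 1);
}
```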

    I still have a few questions:

    - Why is the access time of this TILER memory slower than the access time of a locally declared array?

    - After calling the omap_bo_del() and omap_device_del() functions, the shared region seems to still be allocated. Did I miss something?


    Kevin

  • Hello Kevin,

    Q1: Why is the access time of this TILER memory slower than the access time of a locally declared array?

    - The TILER can be accessed virtually in four different modes: 8-bit, 16-bit, 32-bit, and page mode.

    Put differently, the function of the TILER is to map a 2D virtually addressed interconnect request into one or more physically addressed interconnect requests by:

    • transforming the virtual address, data, and byte enables to match the requested 0-, 90-, 180-, or 270-degree orientation in a tiled 2D addressing space;

    • optionally, translating the oriented tiled address by a page-specific vector to manage memory fragmentation and physical object aliasing.

    Because of this extra address translation, it is normal for access to TILER memory to be slower.

    See the TILER allocation map, which shows the number, size, and location of buffers allocated by the encoders and decoders in the system. /sys/module/tiler_omap/parameters/alloc_debug
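    The extra translation step can be illustrated with a toy model. The 32x32 tile geometry below is arbitrary and is NOT the real OMAP TILER layout; the point is only that a tiled address needs an extra decomposition (tile index plus in-tile offset) compared with a plain linear row-major address.

```c
#include <stdint.h>

#define TILE_W 32   /* arbitrary tile width, purely illustrative  */
#define TILE_H 32   /* arbitrary tile height, purely illustrative */

/* Linear row-major addressing: one multiply-add per pixel. */
static inline uint32_t linear_offset(uint32_t x, uint32_t y, uint32_t stride)
{
    return y * stride + x;
}

/* Tiled addressing: first find which tile (x, y) lands in, then the
 * offset inside that tile - an extra layer of translation on every
 * access, which is why tiled memory costs more than a flat array. */
static inline uint32_t tiled_offset(uint32_t x, uint32_t y, uint32_t width)
{
    uint32_t tiles_per_row = width / TILE_W;
    uint32_t tile_idx = (y / TILE_H) * tiles_per_row + (x / TILE_W);
    uint32_t in_tile  = (y % TILE_H) * TILE_W + (x % TILE_W);
    return tile_idx * (TILE_W * TILE_H) + in_tile;
}
```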

    Q2: After calling the omap_bo_del() and omap_device_del() functions, the shared region seems to still be allocated. Did I miss something?

    - Could you provide more information about your use case?

    See the following presentation about GStreamer and DMA buffers in OMAP4 - http://gstreamer.freedesktop.org/data/events/gstreamer-conference/2012/omap-dmabuf-gstcon2012.pdf

    I also suggest having a look at http://sourcecodebrowser.com/libdrm/2.4.33/omap__drm_8c.html#ae8e2742718da01f32eae83de7415854c

    Best regards,

    Yanko

  • Yanko,


    thank you very much for all this information. It makes sense now!


    About Q2: my bad. I was calling omap_gem_get_paddr(obj, &paddr, true); before each transfer to send the physical pointer to the remote processor. But according to these lines, http://lxr.free-electrons.com/source/drivers/gpu/drm/omapdrm/omap_gem.c#L744, that last argument leads to a reserve operation, so it was obviously not the correct way to do it. Instead, I added an IOCTL call in my driver to ask for the paddr only once, right after the userspace allocation through omapdrm (as omapdce does?).


    Best regards,

    Kevin

  • Hello Kevin,

    In the current SW releases for OMAP4, the path to the omap_gem.c file is drivers/staging/omapdrm.

    Refer to the release notes: http://www.omappedia.com/wiki/4AJ.2.5P2_OMAP4_Jelly_Bean_Release_Notes

    If the buffer is allocated physically contiguous, the OMAP_BO_DMA flag is set and the paddr is valid. Also, if the buffer is remapped in the TILER and paddr_cnt > 0, then the paddr is valid. However, if you are using the physical address and OMAP_BO_DMA is not set, then you should go through omap_gem_{get,put}_paddr() to ensure the mapping is not removed.
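    That pinning rule can be sketched as a toy refcount model. This mimics the omap_gem_{get,put}_paddr() pairing and the paddr_cnt counter from omap_gem.c; it is an illustration of the semantics, not the driver code itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model: the physical address of a non-contiguous buffer is only
 * guaranteed while the pin count (paddr_cnt in omap_gem.c) is non-zero. */
struct toy_bo {
    bool     contiguous;   /* stands in for the OMAP_BO_DMA flag     */
    uint32_t paddr;        /* physical address, valid when pinned    */
    uint32_t paddr_cnt;    /* pin count, as in omap_gem.c            */
};

/* "get": pin the buffer so its TILER mapping cannot be removed. */
static uint32_t toy_get_paddr(struct toy_bo *bo)
{
    if (!bo->contiguous)
        bo->paddr_cnt++;        /* remapped in TILER: track the pin */
    return bo->paddr;
}

/* "put": drop one pin; at zero the mapping may be reclaimed. */
static void toy_put_paddr(struct toy_bo *bo)
{
    if (!bo->contiguous && bo->paddr_cnt > 0)
        bo->paddr_cnt--;
}

/* The paddr may safely be handed to a remote core only in these cases. */
static bool toy_paddr_valid(const struct toy_bo *bo)
{
    return bo->contiguous || bo->paddr_cnt > 0;
}
```

    In other words, pin once for as long as the remote processor uses the buffer, and unpin only when it is done, rather than re-fetching the address on every transfer.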

    If you don't use the reserve function, you can also use the TILER pin/unpin functions in drivers/staging/omapdrm/omap_dmm_tiler.c.

    You can also use the DRM IOCTLs as you suggested, but then you have to modify the driver.

    Best regards,

    Yanko 

  • Yanko,

    There is no OMAP_BO_DMA flag in the userspace /usr/include/omap/omap_drm.h header. But maybe OMAP_BO_SCANOUT does the job of allocating a physically contiguous buffer?

    About the access time: I investigated more deeply, and the reason the TILER memory was slow is that I was using the write-combined caching mode (OMAP_BO_WC). The access time was the same as for local memory when I used OMAP_BO_CACHED, but that results in an IOMMU fault when the DSP tries to access the buffer. Since my shared data do not change very often, it would be nice to use the cache and flush it manually. Do you know a way to use caching and avoid the IOMMU fault?


    Thank you again for your helpful answers.


    Best regards,

    Kevin