
Physically contiguous memory is important in embedded systems.  Often, DMA engines, DSPs, and other hardware peripherals don't go through MMUs, and therefore must operate on memory that's physically contiguous.

We introduced CMEM to help manage this memory (see the CMEM API Reference Guide).  CMEM has a few key features:

  1. Memory management.  That is, CMEM supports allocate/free APIs and provides the bookkeeping necessary to do so.  CMEM can manage multiple blocks, and can provide "heap" and/or "pool" based allocation from each block.
  2. Address translation.  The CMEM APIs are offered in user mode and return process-specific virtual memory addresses.  These virtual addresses cannot be used when providing the memory to peripherals and other cores, so CMEM provides services to manage the virtual/physical address translation.
  3. Cache management.  CMEM's allocator can provide either cached or non-cached memory.  When cached memory is granted, the user can call CMEM's cache APIs as needed.
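
To make the "pool" idea in feature 1 a little more concrete, here is a minimal sketch of a fixed-size buffer pool carved out of one contiguous block.  This is not CMEM's implementation, just an illustration of the bookkeeping involved; the names `pool_t`, `pool_init`, `pool_alloc`, `pool_free` and the 8-buffer limit are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_MAX_BUFS 8   /* hypothetical limit for this sketch */

/* One pool: a contiguous block split into equally sized buffers. */
typedef struct {
    uint8_t *base;                   /* start of the contiguous block */
    size_t   buf_size;               /* size of every buffer in the pool */
    size_t   num_bufs;               /* how many buffers the block holds */
    int      in_use[POOL_MAX_BUFS];  /* bookkeeping: 1 = allocated */
} pool_t;

/* Split `block_size` bytes at `base` into buffers of `buf_size` bytes. */
void pool_init(pool_t *p, void *base, size_t block_size, size_t buf_size)
{
    assert(p != NULL && base != NULL && buf_size > 0);
    p->base = base;
    p->buf_size = buf_size;
    p->num_bufs = block_size / buf_size;
    if (p->num_bufs > POOL_MAX_BUFS)
        p->num_bufs = POOL_MAX_BUFS;
    for (size_t i = 0; i < POOL_MAX_BUFS; i++)
        p->in_use[i] = 0;
}

/* Return the first free buffer, or NULL if the pool is exhausted. */
void *pool_alloc(pool_t *p)
{
    for (size_t i = 0; i < p->num_bufs; i++) {
        if (!p->in_use[i]) {
            p->in_use[i] = 1;
            return p->base + i * p->buf_size;
        }
    }
    return NULL;
}

/* Mark a buffer free again.  Returns 0 on success, -1 for a pointer
 * that is not the start of an allocated buffer in this pool. */
int pool_free(pool_t *p, void *ptr)
{
    uintptr_t addr = (uintptr_t)ptr;
    uintptr_t base = (uintptr_t)p->base;
    if (addr < base)
        return -1;
    size_t off = (size_t)(addr - base);
    size_t idx = off / p->buf_size;
    if (off % p->buf_size != 0 || idx >= p->num_bufs || !p->in_use[idx])
        return -1;
    p->in_use[idx] = 0;
    return 0;
}
```

A pool like this never fragments, because every buffer has the same size; a "heap" trades that guarantee for arbitrary allocation sizes.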

Do I have to use CMEM?

It depends, but the answer is "probably", at least for some portion of it.  Given the list of features, you can see CMEM does more than just allocation.  So, if you're using Codec Engine, or some other library/framework that relies on those features, you'll need CMEM.  This section describes some of the use cases where CMEM is required.

For Codec Engine in particular, CMEM's allocator (feature #1 above) is used when granting memory - during algorithm creation - to 'local' Linux/WinCE-side algorithms (as part of the XDAIS memory request/grant process).  The dependencies for the 'local' Linux/WinCE-side algorithm are shown in this dependency diagram:

Memory Dependencies for Local Algorithms


On an ARM-only DM365, for example, CE unconditionally requires CMEM's memory allocator to provide memory to those 'local' Linux/WinCE-side algs during creation.  (Not to be confused with data buffers allocated and managed by the app, these IALG-requested "memTab[]" buffers are used internally by the algorithm.)  Note also that the alg runs in user mode.  As a result, if these buffers need to be used by DMAs, IMCOP, etc., the alg must appropriately manage cache and address translation.  The alg should use the OS-independent MEMUTILS services for these operations.  This further isolates the alg from OS-specific libraries like CMEM.

Additionally, when using 'remote' algs, CE (actually the VISA stubs) uses CMEM's address translation (feature #2 above) to convert the user-supplied virtual addresses to physical addresses suitable for the remote processor.
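
Conceptually, this translation is a lookup over the physically contiguous regions the process knows about.  The sketch below is not CE's or CMEM's code; `region_t`, `virt_to_phys` and the 32-bit physical address width are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One physically contiguous region mapped into this process. */
typedef struct {
    uintptr_t virt_base;  /* where the region starts in the process */
    uint32_t  phys_base;  /* where it starts in physical memory */
    size_t    size;       /* length in bytes */
} region_t;

/*
 * Translate a virtual address to a physical one by finding the region
 * that contains it.  Returns 0 on success, -1 if no region matches.
 */
int virt_to_phys(const region_t *regions, size_t count,
                 uintptr_t virt, uint32_t *phys_out)
{
    assert(phys_out != NULL);
    for (size_t i = 0; i < count; i++) {
        const region_t *r = &regions[i];
        if (virt >= r->virt_base && virt - r->virt_base < r->size) {
            /* Same offset from both bases: only valid because the
             * region is physically contiguous. */
            *phys_out = r->phys_base + (uint32_t)(virt - r->virt_base);
            return 0;
        }
    }
    return -1;
}
```

The single add-an-offset step is exactly why contiguity matters: with ordinary paged memory, each page of a buffer could live at an unrelated physical address.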

Can I use non-CMEM memory for data buffers?

Sure!  If you want to manage data buffers independently of CMEM, you have to take care of a few things (which the Memory_* APIs take care of implicitly):

  1. Map your memory into your process's address space.  On Linux, this is typically done with mmap() / munmap() calls.
  2. Register the buffer with CE (so it can appropriately perform address translation).  You can use CE's Memory_registerContigBuf() / Memory_unregisterContigBuf() APIs for this.
  3. [Potentially] add your memory buffer to the DSP-side MMU, and remove it when you're done (for devices with remote MMUs, like OMAP3).
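
As a rough sketch of step 1, the helper below maps a byte range of a file or device node into the process.  `mapping_t`, `map_region` and `unmap_region` are hypothetical names, and the path you pass in (a device node exposing your memory, for example) depends entirely on your system.  The one detail worth showing is that mmap() wants a page-aligned offset, so the helper maps from the enclosing page boundary and adjusts the pointer it hands back.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Bookkeeping needed to undo the mapping later. */
typedef struct {
    void  *map_base;  /* page-aligned address returned by mmap() */
    size_t map_len;   /* length actually mapped */
    void  *ptr;       /* address corresponding to the requested offset */
} mapping_t;

/*
 * Map `len` bytes starting at `offset` of the file or device at `path`.
 * Returns 0 on success, -1 on failure.
 */
int map_region(const char *path, off_t offset, size_t len, mapping_t *m)
{
    assert(path != NULL && m != NULL && len > 0);

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    /* Round the offset down to a page boundary; remember the slack. */
    off_t aligned = offset & ~((off_t)page - 1);
    size_t slack = (size_t)(offset - aligned);

    int fd = open(path, O_RDWR | O_SYNC);
    if (fd < 0)
        return -1;

    void *base = mmap(NULL, len + slack, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, aligned);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (base == MAP_FAILED)
        return -1;

    m->map_base = base;
    m->map_len = len + slack;
    m->ptr = (uint8_t *)base + slack;
    return 0;
}

/* Release a mapping created by map_region(). */
int unmap_region(mapping_t *m)
{
    return munmap(m->map_base, m->map_len);
}
```

The pointer in `m->ptr` is the virtual address you would then register with CE in step 2, along with the physical address you mapped.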

There may also be some subtle cache alignment issues to consider - don't give the DSP a cache-misaligned buffer right next to another buffer in use.  Cache management is performed on cache-line boundaries, and you don't want to inadvertently muck with neighboring buffers that share a cache line with yours.
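
One way to stay out of trouble is to make every buffer start on a cache-line boundary and pad its size up to a whole number of lines.  The sketch below shows the arithmetic; the 128-byte line size is an assumption (check your device), the helper names are hypothetical, and it uses ordinary heap memory via aligned_alloc() only to demonstrate the idea, not to obtain physically contiguous memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed cache line size; check the value for your device. */
#define CACHE_LINE 128u

/* Round `n` up to the next multiple of the cache line size. */
size_t cache_align_up(size_t n)
{
    return (n + CACHE_LINE - 1) & ~(size_t)(CACHE_LINE - 1);
}

/* True if the buffer starts and ends on cache line boundaries. */
int is_cache_aligned(const void *ptr, size_t size)
{
    return ((uintptr_t)ptr % CACHE_LINE == 0) && (size % CACHE_LINE == 0);
}

/*
 * Allocate a buffer that owns every cache line it touches, so a
 * writeback or invalidate on it cannot affect a neighbouring buffer.
 * Returns NULL on failure; release with free().
 */
void *alloc_cache_aligned(size_t size, size_t *padded_size)
{
    assert(size > 0 && padded_size != NULL);
    size_t padded = cache_align_up(size);
    void *ptr = aligned_alloc(CACHE_LINE, padded);
    if (ptr == NULL)
        return NULL;
    *padded_size = padded;
    return ptr;
}
```

The same two checks apply to memory you map yourself: align the start address, and treat the padded size as the buffer's real footprint when laying out neighbors.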

Note also that as of Linux Utils 2.24, CMEM (on Linux) supports insmod'ing it with zero memory to manage.  This allows you to, for example, utilize CMEM for only the address translation services but not its memory management APIs.

Got related questions?  Post them in the comments below, or on the Embedded Software Forums.

Chris

Anonymous
  • @pablo, from an XDAIS algorithm/codec POV, you're describing 2 types of memory.  One is the internal memory the XDAIS alg requests via its IALG_Fxns, and which the framework allocates and gives it.  The other type of memory is data buffers (like your video frames) which are _not_ allocated by the alg, but rather allocated by the application and provided to the alg via arguments (e.g. a process() call's inBufs/outBufs).

    For the 'remote' codec use case you're describing, the application allocates these data buffers (sometimes using CMEM, but it can be non-CMEM memory if you follow the rules above).  The application then passes pointers to these data buffers (using their virtual memory addresses, b/c that's the memory space applications use) in the CE process() calls.  Under the hood, CE will handle translating these virtual memory pointers to "pointers usable by the remote processor" - often that's just the physical address, especially for remote processors without an MMU.

    Finally, in the example you mentioned, I think that's a bug(!) - XDAIS algorithms shouldn't be calling CMEM APIs directly.  In fact, that codec should fail the QualiTI validation.  Keeping algs free of direct CMEM calls is what enables them to run in many different frameworks without binding directly to any one of them.

    Chris

  • Hi Chris,

    First of all, very good post. This has clarified some doubts that I had. However, I still have some doubts when applying these concepts in a real system, probably because of a lack of experience.

    Imagine that we are working on a DM6446 device. We have a Linux application running on the ARM that wants to use a codec (xDAIS module) that runs on the DSP, and let's assume that this codec needs access to a region shared between the ARM and the DSP (for example, for reading a video frame).

    The Linux application would need to allocate a contiguous memory region for storing this video frame, and then call the codec, telling it the physical address of the memory where the frame is stored. Is this right? In this case, what would be the difference between proceeding this way and declaring a memory requirement for the xDAIS module (_alloc) that asks for a region of external memory? Is it because this region could only be accessed by the xDAIS algorithm?

    I am having a look at the TI example located in \ti\sdo\fc\ires\examples\codecs\vicp2codec1. In vicp2codec1_ti.c, the function VICP2CODEC1_TI_useVICP calls CMEM_init() and CMEM_getPhys(). I do not understand the purpose of calling them in useVICP.

    Sorry if the questions are too basic. Thank you in advance Chris.

    Best Regards,

    Pablo Colodron