TMDSLCDK138: OMAPL138 IPC and shared memory

Part Number: TMDSLCDK138
Other Parts Discussed in Thread: OMAPL138, DA8XX

Hello TI!

We're currently evaluating the use of TI's IPC for ARM<->DSP communication. Here is the environment:

  • OMAP L138 LCDK
  • Processor SDK Linux OMAPL138 06.03.00.106
  • SYS/BIOS 6.76.03.01
  • IPC 3.50.04.08

The IPC product tree includes an example (ex02_messageq) that demonstrates the MessageQ + remoteproc + virtio chain for message exchange. In this example, a contiguous range of DDR memory is used for both the IPC buffers (0xc3000000-0xc30fffff) and the DSP executable (from 0xc3100000 upward). As far as we understand, the DSP side of the IPC chain does not perform cache coherence maintenance, so the cache is simply disabled for this range in ex02_messageq/dsp/Dsp.cfg:

/* Set 0xc0000000 -> 0xc3ffffff to be non-cached VirtQueue based IPC
 * Set 0xc4000000 -> 0xc4ffffff to be cached. Not currently not used
 */
Cache.MAR192_223 = 0x00000010;
 

Running the DSP code from DDR with the cache disabled makes little sense. Given the 16 MB resolution of the MAR registers, we could lay out the DSP memory range so that only the first megabyte (devoted to IPC) falls into a non-cached window. That is awkward, but acceptable. Our primary concern, however, is that the OMAP L138 processor features 128K of fast on-chip memory intended for ARM-DSP communication (0x80000000-0x8001ffff). In our application, we need only a small number (say, 8) of 4KB buffers for communication to and from the DSP. Together with the trace buffer, these would fit comfortably into the 128K of shared memory. We wonder whether TI's IPC stack can be combined with this hardware capability.
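
To make the MAR arithmetic concrete, here is a minimal C sketch of how an address maps to a MAR index and to a bit in the Cache.MAR192_223 value. The function names are ours (hypothetical, not part of SYS/BIOS), and the sketch assumes that bit 0 of MAR192_223 corresponds to MAR192, which is consistent with the comment in Dsp.cfg (0x00000010 makes 0xc4000000-0xc4ffffff cacheable).

```c
#include <assert.h>
#include <stdint.h>

/* Each MAR covers one 16 MB window, so the MAR index is the top 8 address
 * bits (0xc3xxxxxx -> MAR195, 0xc4xxxxxx -> MAR196). */
static unsigned mar_index(uint32_t addr)
{
    return (unsigned)(addr >> 24);
}

/* Bit inside the packed MAR192_223 value for the window containing addr.
 * Assumption: bit 0 = MAR192. Only valid for 0xc0000000..0xdfffffff. */
static uint32_t mar192_223_bit(uint32_t addr)
{
    unsigned idx = mar_index(addr);

    assert(idx >= 192 && idx <= 223);
    return (uint32_t)1 << (idx - 192);
}

/* Nonzero if addr is cacheable under the given MAR192_223 setting. */
static int is_cacheable(uint32_t mar192_223, uint32_t addr)
{
    return (mar192_223 & mar192_223_bit(addr)) != 0;
}
```

With the example's setting, is_cacheable(0x00000010, 0xc3100000) is false, i.e. the DSP image runs uncached. Setting bit 3 as well (0x00000018) would make the whole 0xc3000000 window cacheable, IPC buffers included, which is exactly the granularity problem described above.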

Could you please provide a high-level view of how this task might be solved with the existing remoteproc implementation? Clearly, this involves some hardcore Linux programming, but what, in your opinion, is the most efficient approach? As an example, we see that DTBs for other TI cores may specify more than one carveout (which is not the case for da850-lcdk.dts), so that the respective remoteproc variant has some flexibility in memory allocation. Could you point us to a reference solution?

Many thanks for your assistance!

Regards, Pavel

  • Hi, Pavel,

    For the ex02_messageq example, you can move the DSP code to the cached area at 0xc4000000. The move should also be reflected in config.bld and the resource table. The default resource table for OMAP-L138 in ipc/packages/ti/ipc/remoteproc shows

    #define DATA_DA 0xc3100000

    If you want IPC to use the shared memory, the setup is the same as for DDR in the example code, except that the shared memory address 0x80000000 is used instead of the DDR address 0xc0000000.

    If IPC is not used and you only want to share memory between Linux/ARM and RTOS/DSP, the shared memory area needs to be reserved in the reserved-memory node of the dts file. The CHIPSIG registers can be used to notify the other side.
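
    As a rough illustration of the non-IPC variant, the sketch below models a one-slot mailbox in shared memory plus a CHIPSIG-style notification. It is a host-runnable model only: the struct and function names are hypothetical, the register pair is simulated in ordinary memory, and the write-1-to-set / write-1-to-clear behaviour is an assumption to be checked against the TRM. The only detail taken from this thread's driver code is that the ARM side signals the DSP with CHIPSIG2.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CHIPSIG2 (1u << 2)        /* ARM -> DSP signal, as used by the driver's kick */

/* MODEL of the notification register pair. On hardware this would be two
 * volatile MMIO words; here it is a plain variable. */
struct chipsig_model {
    uint32_t pending;             /* currently asserted signals */
};

/* One-slot mailbox placed in the reserved shared memory area (hypothetical layout). */
struct mailbox {
    uint32_t len;
    uint8_t  data[64];
};

/* Assumed semantics of a CHIPSIG write: 1-bits assert signals. */
static void chipsig_raise(struct chipsig_model *s, uint32_t bits) { s->pending |= bits; }
/* Assumed semantics of a CHIPSIG_CLR write: 1-bits clear signals. */
static void chipsig_clear(struct chipsig_model *s, uint32_t bits) { s->pending &= ~bits; }

/* ARM side: copy the message, then notify. Returns -1 if the slot is busy. */
static int arm_send(struct mailbox *mb, struct chipsig_model *s,
                    const void *msg, uint32_t len)
{
    assert(mb != NULL && s != NULL);
    if (len > sizeof(mb->data) || (s->pending & CHIPSIG2))
        return -1;
    memcpy(mb->data, msg, len);
    mb->len = len;
    chipsig_raise(s, CHIPSIG2);
    return 0;
}

/* DSP side: out must hold at least sizeof(mb->data) bytes. Returns bytes read. */
static uint32_t dsp_receive(struct mailbox *mb, struct chipsig_model *s, void *out)
{
    if (!(s->pending & CHIPSIG2))
        return 0;
    memcpy(out, mb->data, mb->len);
    chipsig_clear(s, CHIPSIG2);   /* acknowledge, freeing the slot */
    return mb->len;
}
```

    On real hardware the mailbox must be accessed uncached (or with explicit cache maintenance on the DSP) and with appropriate memory barriers before the notification is raised. The model leaves both out.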

    Rex

  • Thank you, Rex!

    We wish to use the shared memory region (0x80000000) for IPC, and a part of the DDR memory to hold the DSP executable (starting from 0xc3000000). Here is a summary of our dts file related to the DSP (both the dts and dtsi are also available at 1drv.ms/.../s!AuWcVv1NeQohjTasH_UOUB4r8Ctl):

    /* OMITTED */
    / {
    	/* OMITTED */
    	memory@c0000000 {
    		/* 128 MB DDR2 SDRAM @ 0xc0000000 */
    		reg = <0xc0000000 0x08000000>;
    	};
     
        	reserved-memory {
    		#address-cells = <1>;
    		#size-cells = <1>;
    		ranges;
    
    		dsp_memory_region: dsp-memory@c3000000 {
    			compatible = "shared-dma-pool";
    			reg = <0xc3000000 0x02000000>;
    			reusable;
    			status = "okay";
    		};
    	}; 
    	/* OMITTED */
    }; 
    /* OMITTED */
    &dsp {
    	memory-region = <&dsp_memory_region>;
    	status = "okay";
    }; 

    So, there is a single dsp_memory_region defined, and it is utilized for both the IPC buffers and DSP code/data. Could you please clarify:

    1. How to specify the second memory region (starting at 0x80000000, size 0x20000 = 128K bytes)?
    2. How to instruct remoteproc to use the original dsp_memory_region solely for the DSP executable, and the new region in the shared memory for IPC communication?

    Regards, Pavel

  • Hi, Pavel,

    You should be able to define another entry in the reserved-memory node for the area you want to use, with its starting address and size. The reserved memory in the dts file tells the kernel not to map this area in the MMU, so that the DSP can access it. If the area is mapped in the MMU, the DSP will be blocked from accessing it. It doesn't matter whether the memory is located in DDR or in shared memory. As long as you reserve it for DSP access and configure it on the DSP side (config.bld and resource table), it should be accessible.

    The DSP executable location is configured in config.bld and the resource table. Remoteproc uses the information in the resource table to decide where to load the executable.

    Rex

  • Hi, Rex!

    I'll try to be more specific. First, we have to reduce the number of message buffers from 512 to 8, in both directions. We also need to increase the message buffer size from 512 to 4096 bytes, so that the whole bulk of buffers is 2*8*4096 = 64K. These buffers, together with VRINGS, should fit into 128K of the Shared Memory. I took the following steps to change the layout of message buffers:

    1. Modified ti-processor-sdk-linux-omapl138-lcdk-06.03.00.106/board-support/linux-4.19.94+gitAUTOINC+be5389fd85-gbe5389fd85/drivers/rpmsg/virtio_rpmsg_bus.c:
      --- virtio_rpmsg_bus.c.original	2020-04-19 09:43:17.000000000 +0300
      +++ virtio_rpmsg_bus.c	2020-05-06 16:45:28.460003417 +0300
      @@ -149,8 +149,8 @@
        * can change this without changing anything in the firmware of the remote
        * processor.
        */
      -#define MAX_RPMSG_NUM_BUFS	(512)
      -#define MAX_RPMSG_BUF_SIZE	(512)
      +#define MAX_RPMSG_NUM_BUFS	(16)
      +#define MAX_RPMSG_BUF_SIZE	(4096)
       
       /*
        * Local addresses are dynamically allocated on-demand. 
    2. Modified ti-processor-sdk-linux-omapl138-lcdk-06.03.00.106/board-support/linux-4.19.94+gitAUTOINC+be5389fd85-gbe5389fd85/net/rpmsg/rpmsg_proto.c:
      --- rpmsg_proto.c.original	2020-04-19 09:43:17.000000000 +0300
      +++ rpmsg_proto.c	2020-05-06 16:47:44.192008452 +0300
      @@ -32,7 +32,7 @@
       /* Maximum buffer size supported by virtio rpmsg transport.
        * Must match value as in drivers/rpmsg/virtio_rpmsg_bus.c
        */
      -#define RPMSG_BUF_SIZE               (512)
      +#define RPMSG_BUF_SIZE               (4096)
       
       struct rpmsg_socket {
       	struct sock sk; 
    3. Modified ipc_3_50_04_08/linux/src/transport/TransportRpmsg.c:
      --- TransportRpmsg.c.original	2020-04-20 23:13:45.000000000 +0300
      +++ TransportRpmsg.c	2020-05-06 16:41:27.699994486 +0300
      @@ -67,7 +67,7 @@
       
       /* More magic rpmsg port numbers: */
       #define MESSAGEQ_RPMSG_PORT       61
      -#define MESSAGEQ_RPMSG_MAXSIZE   512
      +#define MESSAGEQ_RPMSG_MAXSIZE  4096
       
       #define TransportRpmsg_GROWSIZE 32
       #define INVALIDSOCKET (-1) 
    4. Modified ipc_3_50_04_08/packages/ti/ipc/rpmsg/_VirtQueue.h:
      --- _VirtQueue.h.original	2020-04-20 23:13:45.000000000 +0300
      +++ _VirtQueue.h	2020-05-06 16:23:07.239953663 +0300
      @@ -53,7 +53,7 @@
       /*!
        *  @brief  Size of buffer being exchanged in the VirtQueue rings.
        */
      -#define RPMSG_BUF_SIZE     (512)
      +#define RPMSG_BUF_SIZE     (4096)
       
       
       #if defined (__cplusplus) 
    5.  Modified ipc_3_50_04_08/packages/ti/ipc/rpmsg/RPMessage.xdc:
      --- RPMessage.xdc.original	2020-04-20 23:13:45.000000000 +0300
      +++ RPMessage.xdc	2020-05-06 16:38:24.763987700 +0300
      @@ -47,11 +47,11 @@
            *  ======== numMessageBuffers ========
            *  The number of message buffers available in the pool
            */
      -    config UInt numMessageBuffers = 512;
      +    config UInt numMessageBuffers = 16;
       
           /*!
            *  ======== messageBufferSize ========
            *  The size (in bytes) of each message buffer
            */
      -    config UInt messageBufferSize = 512;
      +    config UInt messageBufferSize = 4096;
       } 
    6. Modified the config file of ex02_messageq DSP app, to change  the buffers layout and declare a custom resource table:
      --- Dsp.cfg.original	2020-04-20 23:13:45.000000000 +0300
      +++ Dsp.cfg	2020-05-06 15:35:51.795848480 +0300
      @@ -88,8 +88,8 @@
       var HeapBuf = xdc.useModule('ti.sysbios.heaps.HeapBuf');
       var params = new HeapBuf.Params;
       params.align = 8;
      -params.blockSize = 512;
      -params.numBlocks = 256;
      +params.blockSize = 4096;
      +params.numBlocks = 16;
       var msgHeap = HeapBuf.create(params);
       
       var MessageQ  = xdc.useModule('ti.sdo.ipc.MessageQ');
      @@ -133,6 +133,19 @@
       Timer.timerSettings[1].master = true;
       Timer.defaultHalf = Timer.Half_LOWER;
       Clock.timerId = 1;
      +
      +// Enable Memory Translation module that operates on the Resource Table
      +Resource.loadSegment = Program.platform.dataMemory;
      +// Use custom resource table
      +Resource.customTable = true;
      +
      +// Custom sizes of the virtqueues, see the custom resource table (rsc_table_omapl138.c, RPMSG_VQX_SIZE)
      +VirtQueue = xdc.useModule('ti.ipc.family.omapl138.VirtQueue');
      +VirtQueue.VQ0_SIZE = 8;
      +VirtQueue.VQ1_SIZE = 8;
      +VirtQueue.RP_MSG_NUM_BUFS = 8;
      +
      +
       /*
        *  ======== Instrumentation Configuration ========
        */ 
    7. Created a custom resource table in MainDsp.c:
      #include <ti/ipc/remoteproc/rsc_types.h>
      #include <xdc/runtime/SysMin.h>
      
      #define DATA_DA                 0xc3100000
      
      #ifndef DATA_SIZE
      #  define DATA_SIZE  (SZ_1M * 8) /*(SZ_1M * 15)*/
      #endif
      
      #define RPMSG_VRING0_DA         0xc3000000
      #define RPMSG_VRING1_DA         0xc3002000
      
      #define RPMSG_VQ0_SIZE          8
      #define RPMSG_VQ1_SIZE          8
      
      /* flip up bits whose indices represent features we support */
      #define RPMSG_DSP_FEATURES      1
      
      struct my_resource_table {
          struct resource_table base;
      
          UInt32 offset[3];
      
          /* rpmsg vdev entry */
          struct fw_rsc_vdev rpmsg_vdev;
          struct fw_rsc_vdev_vring rpmsg_vring0;
          struct fw_rsc_vdev_vring rpmsg_vring1;
      
          /* data carveout entry */
          struct fw_rsc_carveout data_cout;
      
          /* trace entry */
          struct fw_rsc_trace trace;
      };
      
      extern Void* ti_trace_SysMin_Module_State_0_outbuf__A[];
      
      #define TRACEBUFADDR (UInt32)&ti_trace_SysMin_Module_State_0_outbuf__A
      #define TRACEBUFSIZE 0x1000
      
      #pragma DATA_SECTION(ti_ipc_remoteproc_ResourceTable, ".resource_table")
      #pragma DATA_ALIGN(ti_ipc_remoteproc_ResourceTable, 4096)
      
      struct my_resource_table ti_ipc_remoteproc_ResourceTable = {
          1, /* we're the first version that implements this */
          3, /* number of entries in the table */
          0, 0, /* reserved, must be zero */
          /* offsets to entries */
          {
              offsetof(struct my_resource_table, rpmsg_vdev),
              offsetof(struct my_resource_table, data_cout),
              offsetof(struct my_resource_table, trace),
          },
      
          /* rpmsg vdev entry */
          {
              TYPE_VDEV, VIRTIO_ID_RPMSG, 0,
              RPMSG_DSP_FEATURES, 0, 0, 0, 2, { 0, 0 },
              /* no config data */
          },
          /* the two vrings */
          { RPMSG_VRING0_DA, 4096, RPMSG_VQ0_SIZE, 1, 0 },
          { RPMSG_VRING1_DA, 4096, RPMSG_VQ1_SIZE, 2, 0 },
      
          {
              TYPE_CARVEOUT, DATA_DA, DATA_DA, DATA_SIZE, 0, 0, "DSP_MEM_DATA",
          },
      
          {
              TYPE_TRACE, TRACEBUFADDR, TRACEBUFSIZE, 0, "trace:dsp",
          },
      };
    8. I extended the host's App.c so that it fills the payload area of each message with random data and verifies the integrity of the messages echoed back by the DSP:
      --> main:
      --> Main_main:
      --> App_create:
      App_create: Host is ready
      <-- App_create:
      --> App_exec:
      App_exec: sending message 1
      App_exec: sending message 2
      App_exec: sending message 3
      App_exec: message received (4032 payload bytes, checksum 0xd9aeca5a OK), sending message 4
      App_exec: message received (4032 payload bytes, checksum 0xc050934b OK), sending message 5
      App_exec: message received (4032 payload bytes, checksum 0x77439cc5 OK), sending message 6
      App_exec: message received (4032 payload bytes, checksum 0x8ddcbb43 OK), sending message 7
      App_exec: message received (4032 payload bytes, checksum 0xeb94d252 OK), sending message 8
      App_exec: message received (4032 payload bytes, checksum 0xf1938828 OK), sending message 9
      App_exec: message received (4032 payload bytes, checksum 0x5ab0736c OK), sending message 10
      App_exec: message received (4032 payload bytes, checksum 0x206ed068 OK), sending message 11
      App_exec: message received (4032 payload bytes, checksum 0xbdfae52e OK), sending message 12
      App_exec: message received (4032 payload bytes, checksum 0x264a031b OK), sending message 13
      App_exec: message received (4032 payload bytes, checksum 0x91590c95 OK), sending message 14
      App_exec: message received (4032 payload bytes, checksum 0x6bc62b13 OK), sending message 15
      App_exec: message received (4032 payload bytes, checksum 0x0d8e4202 OK)
      App_exec: message received (4032 payload bytes, checksum 0x17891878 OK)
      App_exec: message received (4032 payload bytes, checksum 0xbcaae33c OK)
      <-- App_exec: 0
      --> App_delete:
      <-- App_delete:
      <-- Main_main:
      <-- main: 
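
    The memory budget behind these changes can be checked with a few lines of C. This is a minimal sketch with hypothetical helper names: the vring size follows the split-virtqueue layout from the virtio specification (16-byte descriptors, 2-byte avail entries, 8-byte used entries), and rounding each vring up to a 4 KB page is our assumption about how the Linux remoteproc core allocates vrings.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SZ 4096u
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((size_t)(a) - 1))

/* Bytes occupied by one split virtqueue with num entries:
 * descriptor table + avail ring, padded to align, followed by the used ring. */
static size_t vring_bytes(unsigned num, size_t align)
{
    size_t desc_avail = (size_t)16 * num + (size_t)2 * (3 + num);
    size_t used       = (size_t)2 * 3 + (size_t)8 * num;

    assert(align != 0 && (align & (align - 1)) == 0);  /* power of two */
    return ALIGN_UP(desc_avail, align) + used;
}

/* Assumption: every vring allocation is rounded up to a whole page. */
static size_t vring_alloc_bytes(unsigned num, size_t align)
{
    return ALIGN_UP(vring_bytes(num, align), PAGE_SZ);
}

/* Two vrings plus the message buffers for both directions. */
static size_t ipc_footprint(unsigned bufs_per_dir, size_t buf_size,
                            unsigned vq_size, size_t vq_align)
{
    return 2 * vring_alloc_bytes(vq_size, vq_align)
         + 2 * (size_t)bufs_per_dir * buf_size;
}
```

    For 8 buffers of 4096 bytes per direction and 8-entry vrings aligned to 4096, each vring occupies 8192 bytes and the total comes to 80K, which fits into 128K. The stock configuration (256 buffers of 512 bytes per direction) needs 256K for the buffers alone and therefore cannot fit.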

    Everything works fine so far. Now, I want to relocate message buffers to the shared memory, still using the DDR for the DSP image. I added a new node to reserved-memory in da850-lcdk.dts, and tried to tell the remoteproc to use two memory ranges:

    --- da850-lcdk.dts.original	2020-04-19 09:43:17.000000000 +0300
    +++ da850-lcdk.dts	2020-05-06 11:30:07.272116260 +0300
    @@ -26,11 +26,23 @@
     		reg = <0xc0000000 0x08000000>;
     	};
     
    +	memory@80000000 {
    +		/* 128 KB Shared Memory */
    +		reg = <0x80000000 0x00020000>;
    +	};
    +
     	reserved-memory {
     		#address-cells = <1>;
     		#size-cells = <1>;
     		ranges;
     
    +		dsp_shared_memory: dsp-memory@80000000 {
    +			compatible = "shared-dma-pool";
    +			reg = <0x80000000 0x20000>;
    +			reusable;
    +			status = "okay";
    +		};
    +
     		dsp_memory_region: dsp-memory@c3000000 {
     			compatible = "shared-dma-pool";
     			reg = <0xc3000000 0x1000000>;
    @@ -385,6 +397,6 @@
     };
     
     &dsp {
    -	memory-region = <&dsp_memory_region>;
    +	memory-region = <&dsp_shared_memory>,<&dsp_memory_region>;
     	status = "okay";
     }; 

    Also changed the respective entries in my resource table:

    --- MainDsp.c.resourceTbl	2020-05-06 18:59:01.252300658 +0300
    +++ MainDsp.c	2020-05-06 16:28:09.711964884 +0300
    @@ -59,8 +59,8 @@
     #  define DATA_SIZE  (SZ_1M * 8) /*(SZ_1M * 15)*/
     #endif
     
    -#define RPMSG_VRING0_DA         0xc3000000
    -#define RPMSG_VRING1_DA         0xc3002000
    +#define RPMSG_VRING0_DA         0x80000000
    +#define RPMSG_VRING1_DA         0x80002000
     
     #define RPMSG_VQ0_SIZE          8
     #define RPMSG_VQ1_SIZE          8 

    This results in a failure during remoteproc startup. The kernel log reports the following:

    [   37.036286] davinci-rproc 11800000.dsp: device does not have specific CMA pool: -22
    [   37.180579] davinci-rproc: probe of 11800000.dsp failed with error -22

    Could you please explain how to make the da8xx flavor of remoteproc use two memory regions: the first one (residing in the shared memory at 0x80000000) for data exchange, and the other (DDR at 0xc0000000) for loading the DSP executable?

    Best Regards, Pavel

  • Hi, Pavel,

    The driver currently doesn’t support loading into either of the internal memories or on-chip SRAMs. The vring is done over a CMA pool.

    Rex

  • Hi, Pavel,

    By the way, I am checking internally to find out what is involved in moving the vrings to shared memory.

    Rex

  • Hi, Pavel,

    I got the response internally that "Where is the customer expecting the firmware sections to come from? If they are expecting this to be from the regular DDR region, then the current driver will need enhancements."

    Rex

  • Hi, Rex!

    Yes, your engineer is absolutely right in that assumption. Moreover, I stated the need for a driver enhancement in the very first message of this thread. I was only asking for advice on the most sensible approach to this enhancement, nothing more. In the meantime, we managed to solve the problem on our own. Here are the essentials of our solution:

    • The DTS file should define two memory regions, one in the DDR (for the DSP executable) and another in the SRAM (for IPC buffers). Both regions should have the no-map attribute.
    • In our approach, the custom resource table doesn't define a carveout region. It only specifies the vrings and the trace buffer.
    • We revised and extended the way da8xx_remoteproc.c allocates memory for the DSP executable and IPC data buffers. Basically, we borrowed some code from remoteproc variants for other TI devices.
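
    To illustrate what "no carveout entry" means from the host's point of view, here is a small, self-contained C sketch that walks a resource table image and counts entries of a given type. The function and struct names are hypothetical. The layout (version, entry count, two reserved words, offset array, each entry starting with a 32-bit type) and the type values 0 to 3 follow the remoteproc resource table format as we understand it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Entry types of the remoteproc resource table. */
enum { RSC_CARVEOUT = 0, RSC_DEVMEM = 1, RSC_TRACE = 2, RSC_VDEV = 3 };

struct rsc_table_hdr {
    uint32_t ver;
    uint32_t num;           /* number of entries */
    uint32_t reserved[2];
    /* uint32_t offset[num] follows, each relative to the table start */
};

/* Count entries of the given type in a table image of len bytes.
 * Returns -1 if the image is malformed. */
static int rsc_count_type(const uint8_t *tbl, size_t len, uint32_t type)
{
    struct rsc_table_hdr hdr;
    uint32_t i, off, t;
    int count = 0;

    assert(tbl != NULL);
    if (len < sizeof(hdr))
        return -1;
    memcpy(&hdr, tbl, sizeof(hdr));
    if (hdr.ver != 1 || hdr.num > (len - sizeof(hdr)) / sizeof(uint32_t))
        return -1;

    for (i = 0; i < hdr.num; i++) {
        memcpy(&off, tbl + sizeof(hdr) + i * sizeof(off), sizeof(off));
        if (off > len - sizeof(t))
            return -1;                  /* offset points outside the image */
        memcpy(&t, tbl + off, sizeof(t));
        if (t == type)
            count++;
    }
    return count;
}
```

    For a table like ours (one vdev entry and one trace entry), rsc_count_type(..., RSC_CARVEOUT) returns 0. The firmware sections must then be resolved by the driver itself, which is what the da_to_va extension in the driver patch is for.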

    Our approach is applicable to Linux and RTOS Processor SDKs for OMAP L138, version 06.03.00.106. If anyone needs to reproduce our enhancement, do the following:

    1. Apply the aforementioned patches to virtio_rpmsg_bus.c, rpmsg_proto.c, TransportRpmsg.c, _VirtQueue.h, and RPMessage.xdc. This is a must, since the original IPC buffers (512x512 bytes) would not fit into the 128K of on-chip shared SRAM. We use 16 buffers of 4096 bytes each.
    2. Use the following patch for da850-lcdk.dts:
      --- da850-lcdk.dts.orig	2020-04-19 09:43:17.000000000 +0300
      +++ da850-lcdk.dts	2020-05-11 16:59:02.310595084 +0300
      @@ -31,10 +31,17 @@
       		#size-cells = <1>;
       		ranges;
       
      -		dsp_memory_region: dsp-memory@c3000000 {
      +		dsp_memory_region_l3_cba_ram: dsp-memory0@80000000 {
      +			compatible = "shared-dma-pool";
      +			reg = <0x80000000 0x20000>;
      +			no-map;
      +			status = "okay";
      +		};
      +
      +		dsp_memory_region_ddr: dsp-memory1@c3000000 {
       			compatible = "shared-dma-pool";
       			reg = <0xc3000000 0x1000000>;
      -			reusable;
      +			no-map;
       			status = "okay";
       		};
       	};
      @@ -385,6 +392,7 @@
       };
       
       &dsp {
      -	memory-region = <&dsp_memory_region>;
      +        memory-region = <&dsp_memory_region_l3_cba_ram>,
      +			<&dsp_memory_region_ddr>;
       	status = "okay";
       }; 
    3. Patch drivers/remoteproc/da8xx_remoteproc.c:
      --- da8xx_remoteproc.c.orig	2020-05-11 12:08:12.001947700 +0300
      +++ da8xx_remoteproc.c	2020-05-11 15:59:53.410463435 +0300
      @@ -59,6 +59,8 @@
        * @rproc: rproc handle
        * @mem: internal memory regions data
        * @num_mems: number of internal memory regions
      + * @rmem: reserved memory regions data
      + * @num_rmems: number of reserved memory regions
        * @dsp_clk: placeholder for platform's DSP clk
        * @ack_fxn: chip-specific ack function for ack'ing irq
        * @irq_data: ack_fxn function parameter
      @@ -70,6 +72,8 @@
       	struct rproc *rproc;
       	struct da8xx_rproc_mem *mem;
       	int num_mems;
      +	struct da8xx_rproc_mem *rmem;
      +	int num_rmems;
       	struct clk *dsp_clk;
       	struct reset_control *dsp_reset;
       	void (*ack_fxn)(struct irq_data *data);
      @@ -192,10 +196,71 @@
       	writel(SYSCFG_CHIPSIG2, drproc->chipsig);
       }
       
      +/*
      +* Internal Memory translation helper
      +*
      +* Custom function implementing the rproc .da_to_va ops to provide address
      +* translation (device address to kernel virtual address) for internal RAMs.
      +* The translated addresses can be used either by the remoteproc core for 
      +* loading, or by any rpmsg bus drivers.
      +*/
      +static void *da8xx_rproc_da_to_va(struct rproc *rproc, u64 da, int len,
      +                                            u32 flags)
      +{
      +	struct da8xx_rproc *drproc = rproc->priv;
      +	void __iomem *va = NULL;
      +	phys_addr_t bus_addr;
      +	u32 dev_addr, offset;
      +	size_t size;
      +	int i;
      +
      +	dev_dbg(rproc->dev.parent, "da8xx_rproc_da_to_va da 0x%x len %d\n",
      +	        (u32)da, len);
      +
      +	if (len <= 0)
      +		return NULL;
      +
      +	for (i = 0; i < drproc->num_mems; i++) {
      +		bus_addr = drproc->mem[i].bus_addr;
      +		dev_addr = drproc->mem[i].dev_addr;
      +		size = drproc->mem[i].size;
      +
      +		if (da >= dev_addr && ((da + len) <= (dev_addr + size))) {
      +			offset = da - dev_addr;
      +			va = drproc->mem[i].cpu_addr + offset;
      +			return (__force void *)va;
      +		}
      +
      +		if (da >= bus_addr && ((da + len) <= (bus_addr + size))) {
      +			offset = da - bus_addr;
      +			va = drproc->mem[i].cpu_addr + offset;
      +			return (__force void *)va;
      +		}
      +	}
      +
      +	/* handle static DDR reserved memory regions */
      +	for (i = 0; i < drproc->num_rmems; i++) {
      +		dev_addr = drproc->rmem[i].dev_addr;
      +		size = drproc->rmem[i].size;
      +
      +		if (da >= dev_addr && ((da + len) <= (dev_addr + size))) {
      +			offset = da - dev_addr;
      +			va = drproc->rmem[i].cpu_addr + offset;
      +
      +			dev_dbg(rproc->dev.parent, "da8xx_rproc_da_to_va da 0x%x len %d va 0x%x\n",
      +			        (u32)da, len, (u32)va);
      +			return (__force void *)va;
      +		}
      +	}
      +
      +	return NULL;
      +}
      +
       static const struct rproc_ops da8xx_rproc_ops = {
       	.start = da8xx_rproc_start,
       	.stop = da8xx_rproc_stop,
       	.kick = da8xx_rproc_kick,
      +	.da_to_va = da8xx_rproc_da_to_va
       };
       
       static int da8xx_rproc_get_internal_memories(struct platform_device *pdev,
      @@ -236,6 +301,107 @@
       	return 0;
       }
       
      +static int da8xx_reserved_mem_init(struct platform_device *pdev,
      +                                       struct da8xx_rproc *drproc)
      +{
      +	struct device *dev = &pdev->dev;
      +	struct device_node *np = dev->of_node;
      +	struct device_node *rmem_np;
      +	struct reserved_mem *rmem;
      +	int num_rmems;
      +	int ret, i;
      +
      +	num_rmems = of_property_count_elems_of_size(np, "memory-region", 
      +	                                            sizeof(phandle));
      +
      +	dev_dbg(dev, "memory-region num = %d\n", 
      +	        num_rmems);
      +
      +	if (num_rmems <= 0) {
      +		dev_err(dev, "device does not have reserved memory regions, ret = %d\n",
      +		        num_rmems);
      +		return -EINVAL;
      +	}
      +	if (num_rmems < 2) {
      +		dev_err(dev, "device needs atleast two memory regions to be defined, num = %d\n",
      +		        num_rmems);
      +		return -EINVAL;
      +	}
      +
      +	/* use reserved memory region 0 for vring DMA allocations */
      +	ret = of_reserved_mem_device_init_by_idx(dev, np, 0);
      +	if (ret) {
      +		dev_err(dev, "device cannot initialize DMA pool, ret = %d\n",
      +		        ret);
      +		return ret;
      +	}
      +
      +	num_rmems--;
      +	drproc->rmem = kcalloc(num_rmems, sizeof(*drproc->rmem), GFP_KERNEL);
      +	if (!drproc->rmem) {
      +		ret = -ENOMEM;
      +		goto release_rmem;
      +	}
      +
      +	/* use remaining reserved memory regions for static carveouts */
      +	for (i = 0; i < num_rmems; i++) {
      +		rmem_np = of_parse_phandle(np, "memory-region", i + 1);
      +		if (!rmem_np) {
      +			ret = -EINVAL;
      +			goto unmap_rmem;
      +		}
      +
      +		rmem = of_reserved_mem_lookup(rmem_np);
      +		if (!rmem) {
      +			of_node_put(rmem_np);
      +			ret = -EINVAL;
      +			goto unmap_rmem;
      +		}
      +		of_node_put(rmem_np);
      +
      +		drproc->rmem[i].bus_addr = rmem->base;
      +		/* 64-bit address regions currently not supported */
      +		drproc->rmem[i].dev_addr = (u32)rmem->base;
      +		drproc->rmem[i].size = rmem->size;
      +		drproc->rmem[i].cpu_addr = ioremap_wc(rmem->base, rmem->size);
      +		if (!drproc->rmem[i].cpu_addr) {
      +			dev_err(dev, "failed to map reserved memory#%d at %pa of size %pa\n",
      +			        i + 1, &rmem->base, &rmem->size);
      +			ret = -ENOMEM;
      +			goto unmap_rmem;
      +		}
      +
      +		dev_dbg(dev, "reserved memory%d: bus addr %pa size 0x%zx va 0x%x da 0x%x\n",
      +		        i + 1, &drproc->rmem[i].bus_addr,
      +		        drproc->rmem[i].size, (u32)drproc->rmem[i].cpu_addr,
      +		        drproc->rmem[i].dev_addr);
      +	}
      +	drproc->num_rmems = num_rmems;
      +
      +	return 0;
      +
      +unmap_rmem:
      +	for (i--; i >= 0; i--) {
      +		if (drproc->rmem[i].cpu_addr)
      +			iounmap(drproc->rmem[i].cpu_addr);
      +	}
      +	kfree(drproc->rmem);
      +release_rmem:
      +	of_reserved_mem_device_release(dev);
      +	return ret;
      +}
      +
      +static void  da8xx_reserved_mem_exit(struct device *dev, struct da8xx_rproc *dproc)
      +{
      +	int i;
      +
      +	for (i = 0; i < dproc->num_rmems; i++)
      +		iounmap(dproc->rmem[i].cpu_addr);
      +	kfree(dproc->rmem);
      +
      +	of_reserved_mem_device_release(dev);
      +}
      +
       static int da8xx_rproc_probe(struct platform_device *pdev)
       {
       	struct device *dev = &pdev->dev;
      @@ -291,20 +457,11 @@
       		return PTR_ERR(dsp_reset);
       	}
       
      -	if (dev->of_node) {
      -		ret = of_reserved_mem_device_init(dev);
      -		if (ret) {
      -			dev_err(dev, "device does not have specific CMA pool: %d\n",
      -				ret);
      -			return ret;
      -		}
      -	}
      -
       	rproc = rproc_alloc(dev, "dsp", &da8xx_rproc_ops, da8xx_fw_name,
       		sizeof(*drproc));
       	if (!rproc) {
       		ret = -ENOMEM;
      -		goto free_mem;
      +		return ret;
       	}
       
       	/* error recovery is not supported at present */
      @@ -320,6 +477,13 @@
       	if (ret)
       		goto free_rproc;
       
      +	ret = da8xx_reserved_mem_init(pdev, drproc);
      +	if (ret) {
      +		dev_err(dev, "reserved memory init failed, ret = %d\n",
      +		        ret);
      +		goto free_rproc;
      +	}
      +
       	platform_set_drvdata(pdev, rproc);
       
       	/* everything the ISR needs is now setup, so hook it up */
      @@ -328,7 +492,7 @@
       					rproc);
       	if (ret) {
       		dev_err(dev, "devm_request_threaded_irq error: %d\n", ret);
      -		goto free_rproc;
      +		goto free_mem;
       	}
       
       	/*
      @@ -338,7 +502,7 @@
       	 */
       	ret = reset_control_assert(dsp_reset);
       	if (ret)
      -		goto free_rproc;
      +		goto free_mem;
       
       	drproc->chipsig = chipsig;
       	drproc->bootreg = bootreg;
      @@ -349,7 +513,7 @@
       	ret = rproc_add(rproc);
       	if (ret) {
       		dev_err(dev, "rproc_add failed: %d\n", ret);
      -		goto free_rproc;
      +		goto free_mem;
       	}
       
       	if (rproc_get_id(rproc) < 0)
      @@ -357,11 +521,11 @@
       
       	return 0;
       
      +free_mem:
      +	da8xx_reserved_mem_exit(dev, drproc);
       free_rproc:
       	rproc_free(rproc);
      -free_mem:
      -	if (dev->of_node)
      -		of_reserved_mem_device_release(dev);
      +
       	return ret;
       }
       
      @@ -379,9 +543,10 @@
       	disable_irq(drproc->irq);
       
       	rproc_del(rproc);
      +
      +	da8xx_reserved_mem_exit(dev, drproc);
      +
       	rproc_free(rproc);
      -	if (dev->of_node)
      -		of_reserved_mem_device_release(dev);
       
       	return 0;
       } 
    4. Rebuild the Linux kernel, kernel modules, DTB, IPC (both Linux and SYS/BIOS parts).
    5. Your custom resource table might look as follows:
      #include <ti/ipc/remoteproc/rsc_types.h>
      #include <xdc/runtime/SysMin.h>
      
      #ifndef DATA_SIZE
      #  define DATA_SIZE  (SZ_1M * 8)
      #endif
      
      #define RPMSG_VRING0_DA         0x80000000
      #define RPMSG_VRING1_DA         0x80002000
      
      #define RPMSG_VQ0_SIZE          8
      #define RPMSG_VQ1_SIZE          8
      
      /* flip up bits whose indices represent features we support */
      #define RPMSG_DSP_FEATURES      1
      
      struct my_resource_table {
          struct resource_table base;
      
          UInt32 offset[2];
      
          /* rpmsg vdev entry */
          struct fw_rsc_vdev rpmsg_vdev;
          struct fw_rsc_vdev_vring rpmsg_vring0;
          struct fw_rsc_vdev_vring rpmsg_vring1;
      
          /* trace entry */
          struct fw_rsc_trace trace;
      };
      
      extern Void* ti_trace_SysMin_Module_State_0_outbuf__A[];
      
      #define TRACEBUFADDR (UInt32)&ti_trace_SysMin_Module_State_0_outbuf__A
      #define TRACEBUFSIZE 0x1000
      
      #pragma DATA_SECTION(ti_ipc_remoteproc_ResourceTable, ".resource_table")
      #pragma DATA_ALIGN(ti_ipc_remoteproc_ResourceTable, 4096)
      
      struct my_resource_table ti_ipc_remoteproc_ResourceTable = {
          1, /* we're the first version that implements this */
          2, /* number of entries in the table */
          0, 0, /* reserved, must be zero */
          /* offsets to entries */
          {
              offsetof(struct my_resource_table, rpmsg_vdev),
              offsetof(struct my_resource_table, trace),
          },
      
          /* rpmsg vdev entry */
          {
              TYPE_VDEV, VIRTIO_ID_RPMSG, 0,
              RPMSG_DSP_FEATURES, 0, 0, 0, 2, { 0, 0 },
              /* no config data */
          },
          /* the two vrings */
          { RPMSG_VRING0_DA, 4096, RPMSG_VQ0_SIZE, 1, 0 },
          { RPMSG_VRING1_DA, 4096, RPMSG_VQ1_SIZE, 2, 0 },
      
          {
              TYPE_TRACE, TRACEBUFADDR, TRACEBUFSIZE, 0, "trace:dsp",
          },
      };
      
    6. IPC example ex02_messageq (extended with the custom resource table, see above) can be used to test the modified IPC.
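
    For readers who want to see the idea behind da8xx_rproc_da_to_va from step 3 without the kernel plumbing, here is a user-space model of the same range lookup. It is only a sketch: the struct and function names are hypothetical, and cpu_addr is an ordinary host pointer standing in for the address returned by ioremap_wc().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mem_region {
    uint32_t dev_addr;      /* address as seen by the DSP */
    size_t   size;          /* region size in bytes */
    uint8_t *cpu_addr;      /* host mapping of the region */
};

/* Translate a device address range [da, da + len) to a host pointer.
 * Returns NULL if the range is not fully contained in one region. */
static void *da_to_va(const struct mem_region *r, size_t n, uint64_t da, int len)
{
    size_t i;

    assert(r != NULL || n == 0);
    if (len <= 0)
        return NULL;

    for (i = 0; i < n; i++) {
        uint64_t start = r[i].dev_addr;
        uint64_t end   = start + r[i].size;   /* 64-bit math, no wrap-around */

        if (da >= start && da + (uint64_t)len <= end)
            return r[i].cpu_addr + (da - start);
    }
    return NULL;
}
```

    When the firmware loader asks for the location of a section at, say, 0xc3100000, the lookup returns the mapped pointer at offset 0x100000 inside the DDR region. A range that crosses the end of a region yields NULL, and loading fails instead of writing outside the reserved memory.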

    Hopefully, this will help someone. Even more, I hope that TI will allow a flexible choice of the IPC message buffer location in future releases of their SDKs for OMAP L138 and other multicore processors. And please add cache coherence maintenance to the DSP part of MessageQ!

    Best Regards, Pavel