HeapMem Assertion Failure

Ronny Jimenez

Hello,

I'm developing communication between the ARM and the DSPs of a KeyStone II device (TCI6638) using MessageQ and Shared Memory. The idea is to use only the MessageQ module to send a pointer from Shared Memory to another core.

For now, the ARM side just has the main which calls Ipc_start() and a dummy function which waits for a message from the DSP side.

The DSP side has a Task, and its main function calls Ipc_start() as well. When I execute it, the following error appears:

ti.sdo.ipc.heaps.HeapMem: line 948: assertion failure: A_noHeap: Region has no heap
xdc.runtime.Error.raise: terminating execution

Reading this forum, I found a post saying that Ipc_start() must be called in order to initialize the shared memory region objects. So I'm wondering why the error still appears when I call Ipc_start() on both sides (ARM and DSP). On the DSP I use another source file (helper.h, which I include in the main .c file) to initialize the heaps and their values. To summarize: main calls Ipc_start(), then I create the Task, and inside this Task I call two functions (from helper.h) that are in charge of heap initialization.

In my .cfg I also have this: BIOS.addUserStartupFunction("&IpcMgr_ipcStartup");
Without it, Ipc_start() on the ARM side always fails.

I'm using SYS/BIOS 6.37, IPC 3.00.04.29 and MCSDK 3.00.03.15.

Any help with this issue is welcome. Thank you in advance!

Best regards,

Ronny

over 10 years ago

Vincent W. over 10 years ago

TI__Genius 12865 points

Hi Ronny,

Are you running Linux on the ARM? Could you share your .cfg file(s)? I am wondering if you have configured SharedRegion correctly.

Thanks,

Vincent

Ronny Jimenez over 10 years ago in reply to Vincent W.

Intellectual 550 points

Vincent,

Thank you for your prompt reply.

Yes, I'm running Linux. The .cfg file for the DSPs is shown below, along with the section of the .c source file that tries to initialize the heap.

CFG:

/* ========================== XDC Runtime Configuration ======================= */
var Memory			= xdc.useModule("xdc.runtime.Memory");
var Diags			= xdc.useModule("xdc.runtime.Diags");
var System			= xdc.useModule("xdc.runtime.System");
var SysMin			= xdc.useModule("xdc.runtime.SysMin");
System.SupportProxy		= SysMin;


/* ============================ SYS/BIOS Configuration ========================= */
var BIOS			= xdc.useModule("ti.sysbios.BIOS");
var Semaphore			= xdc.useModule("ti.sysbios.knl.Semaphore");
var HeapMem			= xdc.useModule("ti.sysbios.heaps.HeapMem");
var HeapBuf			= xdc.useModule("ti.sysbios.heaps.HeapBuf");
var Cache 			= xdc.useModule("ti.sysbios.family.c66.Cache");
var Task			= xdc.useModule("ti.sysbios.knl.Task");
var Idle			= xdc.useModule("ti.sysbios.knl.Idle");
var Hwi 			= xdc.useModule("ti.sysbios.family.c64p.Hwi");
Hwi.enableException = true;


/* =============================== IPC Configuration ============================== */
var Ipc				= xdc.useModule("ti.sdo.ipc.Ipc");
var MessageQ  			= xdc.useModule("ti.sdo.ipc.MessageQ");
var HeapBufMP			= xdc.useModule("ti.sdo.ipc.heaps.HeapBufMP");
var SharedRegion 		= xdc.useModule("ti.sdo.ipc.SharedRegion");
var VirtioSetup 		= xdc.useModule("ti.ipc.transports.TransportRpmsgSetup");
MessageQ.SetupTransportProxy 	= xdc.module("ti.sdo.ipc.transports.TransportShmSetup");
var TransportRpmsg		= xdc.useModule("ti.ipc.transports.TransportRpmsg");
var VirtQueue			= xdc.useModule("ti.ipc.family.tci6638.VirtQueue");
var Interrupt			= xdc.useModule("ti.ipc.family.tci6638.Interrupt");
var NsRemote 			= xdc.useModule("ti.ipc.namesrv.NameServerRemoteRpmsg");
var Resource 			= xdc.useModule("ti.ipc.remoteproc.Resource");


/* =============================== UTILS Configuration ============================== */
var NameServer 			= xdc.useModule("ti.sdo.utils.NameServer");
var MultiProc 			= xdc.useModule("ti.sdo.utils.MultiProc");
NameServer.SetupProxy = NsRemote;


xdc.loadPackage("ti.ipc.ipcmgr");
BIOS.addUserStartupFunction("&IpcMgr_ipcStartup");

Idle.addFunc("&VirtQueue_cacheWb");

BIOS.heapSize = 0x40000;

Task.deleteTerminatedTasks = true;

Diags.setMaskMeta("ti.ipc.transports.TransportRpmsg",
Diags.INFO|Diags.USER1|Diags.STATUS, Diags.ALWAYS_ON);
Diags.setMaskMeta("ti.ipc.namesrv.NameServerRemoteRpmsg", Diags.INFO,
Diags.ALWAYS_ON);

VirtioSetup.common$.diags_INFO = Diags.ALWAYS_ON;

/* Note: MultiProc_self is set during VirtQueue_init based on DNUM. */

MultiProc.setConfig(null, ["HOST", "CORE0", "CORE1", "CORE2", "CORE3",
                       "CORE4", "CORE5", "CORE6", "CORE7"]);
		       
var params		= new HeapBuf.Params;
params.align		= 8;
params.blockSize	= 512;
params.numBlocks	= 512;
var msgHeap		= HeapBuf.create(params);

MessageQ.registerHeapMeta(msgHeap, 0);

Cache.setMarMeta(0xA0000000, 0x1FFFFFF, 0);

Program.global.sysMinBufSize = 0x8000;
SysMin.bufSize  =  Program.global.sysMinBufSize;

/* Enable Memory Translation module that operates on the Resource Table */
Resource.loadSegment = Program.platform.dataMemory;


/* Shared Memory base address and length */
var SHAREDMEM           = 0x0C000000;
var SHAREDMEMSIZE       = 0x00500000;

SharedRegion.setEntryMeta(0,
    { base: SHAREDMEM, 
      len:  SHAREDMEMSIZE,
      ownerProcId: 0,
      isValid: true,
      name: "SR0",
    });
        
		       
/* All sections are being placed in local memory. */
Program.sectMap["sharedL2"]		= "L2SRAM";
Program.sectMap["systemHeap"]		= "L2SRAM";
//Program.sectMap["systemHeap"]		= Program.platform.stackMemory;
Program.sectMap[".sysmem"]		= "L2SRAM";
Program.sectMap[".args"]		= "L2SRAM";
Program.sectMap[".cio"]			= "L2SRAM";
Program.sectMap[".far"]			= "L2SRAM";
Program.sectMap[".rodata"]		= "L2SRAM";
Program.sectMap[".neardata"]		= "L2SRAM";
Program.sectMap[".init_array"]		= "L2SRAM";
Program.sectMap[".bss"]			= "L2SRAM";
Program.sectMap[".code"]		= "L2SRAM";
Program.sectMap[".data"]		= "L2SRAM";
Program.sectMap[".fardata"]		= "L2SRAM";
Program.sectMap[".args"]		= "L2SRAM";
Program.sectMap[".cio"]			= "L2SRAM";
Program.sectMap[".plt"]			= "L2SRAM";
Program.sectMap[".vecs"]		= "L2SRAM";
Program.sectMap["platform_lib"]		= "L2SRAM";
Program.sectMap[".far:taskStackSection"]= "L2SRAM";
Program.sectMap[".stack"]		= "L2SRAM";
Program.sectMap[".text"]		= "L2SRAM";



.c:

 HeapBufMP_Params    heapBufParams;
 HeapBufMP_Handle    heapHandle;
 HeapBufMP_Params_init(&heapBufParams);
 heapBufParams.regionId       = 0;
 heapBufParams.name           = DDR_HEAP;
 heapBufParams.numBlocks      = 128;
 heapBufParams.blockSize      = SIZE_DDR_HEAP_SHARED;
 heapHandle = HeapBufMP_create(&heapBufParams);
 if (heapHandle == NULL) 
 {
   System_abort("HeapBufMP_create failed\n" );
 }
 else
 {
   System_printf("Heap in memory address %p created \n",heapHandle);
 }

Vincent W. over 10 years ago in reply to Ronny Jimenez

TI__Genius 12865 points

Hi Ronny,

Could you try to change the owner of SR0 to be 1:

SharedRegion.setEntryMeta(0,
    { base: SHAREDMEM, 
      len:  SHAREDMEMSIZE,
      ownerProcId: 1,
      isValid: true,
      name: "SR0",
    });

This is because HOST is 0 and it does not support SharedRegion. So one of the DSPs must own it instead for the module to be properly initialized.
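
To make the ID mapping concrete, here is a minimal C sketch of how the names in the MultiProc.setConfig() list line up with the numeric IDs used by ownerProcId. It assumes IDs follow list order, which matches the statement that HOST is 0. The helper proc_id_of() is hypothetical and written only for illustration; it is not part of the IPC API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same order as the MultiProc.setConfig() list in the .cfg above. */
static const char *const proc_names[] = {
    "HOST", "CORE0", "CORE1", "CORE2", "CORE3",
    "CORE4", "CORE5", "CORE6", "CORE7"
};

/* Hypothetical helper (not an IPC API): returns the position of `name`
 * in the list, i.e. the ID that ownerProcId refers to, or -1 if the
 * name is not in the list. */
static int proc_id_of(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof proc_names / sizeof proc_names[0]; i++) {
        if (strcmp(proc_names[i], name) == 0) {
            return (int)i;
        }
    }
    return -1;
}
```

With this list, proc_id_of("HOST") is 0 and proc_id_of("CORE0") is 1, so ownerProcId: 1 makes the first DSP the owner of the region.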

Best regards,
Vincent

Ronny Jimenez over 10 years ago in reply to Vincent W.

Intellectual 550 points

Vincent,

I changed that, but unfortunately it didn't work. Any other suggestions?

Thanks

Ronny

Ronny Jimenez over 10 years ago in reply to Vincent W.

Intellectual 550 points

For example, I create the heap with something like this:

   HeapMemMP_Params    heapMemMPParams;

    HeapMemMP_Params_init(&heapMemMPParams);
    heapMemMPParams.name            = SL2SRAM_HEAP;
    heapMemMPParams.regionId        = 1;
    heapMemMPParams.gate            = NULL;
    heapMemMPParams.sharedBufSize   = SIZE_SL2SRAM_HEAP_SHARED;
    sl2sramHeapHandle = HeapMemMP_create(&heapMemMPParams);
    if (sl2sramHeapHandle == NULL)
    {
        System_abort("HeapMemMP_create failed for SL2SRAM \n");
    }

Vincent W. over 10 years ago in reply to Ronny Jimenez

TI__Genius 12865 points

Hi Ronny,

Just to be clear, your code on the DSP (core 0) crashed when it hit the assertion during Ipc_start(), so you never reach this heap creation code at all. Is my understanding correct?

Do you intend to have communication between the DSPs, or do you simply want IPC between ARM and individual DSPs? In the latter case you do not need SharedRegions, and you do not need to call Ipc_start() on the DSPs. Only if you need communication between DSPs would you need SharedRegions. SharedRegion is not supported on the host CPU.

-Vincent

Ronny Jimenez over 10 years ago in reply to Vincent W.

Intellectual 550 points

Vincent,

I think that is not quite correct. I call Ipc_start() on both the ARM and the DSPs, and it succeeds without problems on all cores (host and the 8 DSPs). The problem appears whenever I try to create the heap, whether immediately after calling Ipc_start() or, for example, 50 lines of code later. Is the way I'm initializing the heaps OK? Or should I use another module, for example HeapBufMP?

I actually need communication between ARM-DSPs and DSPs-DSPs, so I really need those shared regions, the question is how to create the shared region correctly.

Let me know what you think please.

Thank you for your help!

Regards,

Ronny

Vincent W. over 10 years ago in reply to Ronny Jimenez

TI__Genius 12865 points

Hi Ronny,

OK now I understand better what you are seeing. I was confused by your original post which seemed to say you hit an assertion in the context of Ipc_start(). If the assertion is happening in the context of your own HeapMemMP_create() call, it is probably because you are setting

heapMemMPParams.regionId = 1;

Yet in the .cfg file I only see SharedRegion.setEntryMeta() being called to create SR 0. You need to create SR 1 before you can create a HeapMemMP instance from it.

Best regards,

Vincent

Ronny Jimenez over 10 years ago in reply to Vincent W.

Intellectual 550 points

Vincent,

First thanks for your answer.

Second, in my .cfg file I have 3 shared regions: SR0, SR1, and SR2. They were not included in the earlier .cfg I posted, but you can see them below. I already tested with ownerProcId set to 0 and to 1, without success, and also with SharedRegion.translate set to true and false. As you can see at the end, there are some memory sections that I created myself using the Platform Wizard in CCS. Nevertheless, the shared regions do not initialize with either the original or the customized platform (the customized one is exactly the same, except that I split the memories into different sections, i.e. DDR3_0, DDR3_1 and so forth), which is why the shared regions SHAREDMEMX have different memory alignments.

Could this be a problem with IPC 3.x itself? This same implementation worked perfectly with IPC 1.24, so I don't understand what is missing or wrong. Any further ideas?

Thank you so much Vincent

/* ========================== XDC Runtime Configuration ======================= */
var Memory			= xdc.useModule("xdc.runtime.Memory");
var Diags			= xdc.useModule("xdc.runtime.Diags");
var System			= xdc.useModule("xdc.runtime.System");
var SysMin			= xdc.useModule("xdc.runtime.SysMin");
System.SupportProxy		= SysMin;


/* ============================ SYS/BIOS Configuration ========================= */
var BIOS			= xdc.useModule("ti.sysbios.BIOS");
var Semaphore			= xdc.useModule("ti.sysbios.knl.Semaphore");
var HeapMem			= xdc.useModule("ti.sysbios.heaps.HeapMem");
var HeapBuf			= xdc.useModule("ti.sysbios.heaps.HeapBuf");
var Cache 			= xdc.useModule("ti.sysbios.family.c66.Cache");
var Task			= xdc.useModule("ti.sysbios.knl.Task");
var Idle			= xdc.useModule("ti.sysbios.knl.Idle");
var Hwi 			= xdc.useModule("ti.sysbios.family.c64p.Hwi");
Hwi.enableException = true;


/* =============================== IPC Configuration ============================== */
var Ipc				= xdc.useModule("ti.sdo.ipc.Ipc");
var MessageQ  			= xdc.useModule("ti.sdo.ipc.MessageQ");
var HeapMemMP			= xdc.useModule("ti.sdo.ipc.heaps.HeapMemMP");
var HeapBufMP			= xdc.useModule("ti.sdo.ipc.heaps.HeapBufMP");
var SharedRegion 		= xdc.useModule("ti.sdo.ipc.SharedRegion");
var GateMP			= xdc.useModule("ti.sdo.ipc.GateMP");
var VirtioSetup 		= xdc.useModule("ti.ipc.transports.TransportRpmsgSetup");
MessageQ.SetupTransportProxy 	= xdc.module("ti.sdo.ipc.transports.TransportShmSetup");
var TransportRpmsg		= xdc.useModule("ti.ipc.transports.TransportRpmsg");
var VirtQueue			= xdc.useModule("ti.ipc.family.tci6638.VirtQueue");
var Interrupt			= xdc.useModule("ti.ipc.family.tci6638.Interrupt");
var NsRemote 			= xdc.useModule("ti.ipc.namesrv.NameServerRemoteRpmsg");
var Resource 			= xdc.useModule("ti.ipc.remoteproc.Resource");


/* =============================== UTILS Configuration ============================== */
var NameServer 			= xdc.useModule("ti.sdo.utils.NameServer");
var MultiProc 			= xdc.useModule("ti.sdo.utils.MultiProc");
NameServer.SetupProxy = NsRemote;


xdc.loadPackage("ti.ipc.ipcmgr");
BIOS.addUserStartupFunction("&IpcMgr_ipcStartup");

Idle.addFunc("&VirtQueue_cacheWb");

BIOS.heapSize = 0x40000;

Task.deleteTerminatedTasks = true;


Diags.setMaskMeta("ti.ipc.transports.TransportRpmsg",
Diags.INFO|Diags.USER1|Diags.STATUS, Diags.ALWAYS_ON);
Diags.setMaskMeta("ti.ipc.namesrv.NameServerRemoteRpmsg", Diags.INFO,
Diags.ALWAYS_ON);

VirtioSetup.common$.diags_INFO = Diags.ALWAYS_ON;

/* Note: MultiProc_self is set during VirtQueue_init based on DNUM. */

MultiProc.setConfig(null, ["HOST", "CORE0", "CORE1", "CORE2", "CORE3",
                       "CORE4", "CORE5", "CORE6", "CORE7"]);
		       
var params		= new HeapBuf.Params;
params.align		= 8;
params.blockSize	= 512;
params.numBlocks	= 512;
var msgHeap		= HeapBuf.create(params);

MessageQ.registerHeapMeta(msgHeap, 0);

Cache.setMarMeta(0xA0000000, 0x1FFFFFF, 0);

Program.global.sysMinBufSize = 0x8000;
SysMin.bufSize  =  Program.global.sysMinBufSize;

/* Enable Memory Translation module that operates on the Resource Table */
Resource.loadSegment = Program.platform.dataMemory;


/* Shared Memory base address and length */
var SHAREDMEM0           = 0x0C400000;
var SHAREDMEMSIZE0       = 0x000F0000;
var SHAREDMEM1           = 0x0C4F0000;
var SHAREDMEMSIZE1       = 0x00080000;
var SHAREDMEM2           = 0xEB000000;
var SHAREDMEMSIZE2       = 0x06000000;

SharedRegion.numEntries = 3;
SharedRegion.translate = true;
SharedRegion.setEntryMeta(0,
    { base: SHAREDMEM0, 
      len:  SHAREDMEMSIZE0,
      ownerProcId:0,
      isValid: true,
      cacheLineSize: 128,
      name: "SR0",
    });   
SharedRegion.setEntryMeta(1,
    { base: SHAREDMEM1, 
      len:  SHAREDMEMSIZE1,
      ownerProcId: 0,
      isValid: true,
      cacheLineSize: 128,
      name: "SR1",
    });   
SharedRegion.setEntryMeta(2,
    { base: SHAREDMEM2, 
      len:  SHAREDMEMSIZE2,
      ownerProcId: 0,
      isValid: true,
      cacheEnable: true,
      createHeap: true,
      cacheLineSize: 128,
      name: "SR2",
    });    
       
Program.sectMap[".dataSL2SRAM"]			= "SL2SRAM_0";
Program.sectMap[".dataDDR"]			= "DDR3_0";
Program.sectMap[".text"]			= "DDR3_0";

Program.sectMap[".dataLL2SRAM"]			= "L2SRAM";
Program.sectMap[".dataDDRShared"]		= "DDR3_SHARED";
Program.sectMap[".dataSL2SRAMShared"]		= "SL2SRAM_SHARED";
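
As a cross-check of the three SharedRegion entries above, the following C sketch tests that the regions do not overlap and that each base is aligned to the 128-byte cacheLineSize. The Region struct and the helper names are hypothetical and only mirror the base/len values from the .cfg; they are not IPC types. It only rules out one class of misconfiguration and says nothing about whether the addresses are valid for the platform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the base/len fields of one
 * SharedRegion.setEntryMeta() entry. */
typedef struct {
    uint32_t base;
    uint32_t len;
} Region;

/* Values copied from the .cfg above (SR0, SR1, SR2). */
static const Region regions[] = {
    { 0x0C400000u, 0x000F0000u },
    { 0x0C4F0000u, 0x00080000u },
    { 0xEB000000u, 0x06000000u },
};
#define NUM_REGIONS (sizeof regions / sizeof regions[0])

/* One-past-the-end address, computed in 64 bits to avoid wraparound. */
static uint64_t region_end(const Region *r)
{
    return (uint64_t)r->base + r->len;
}

/* True if the half-open ranges [base, base + len) intersect. */
static bool regions_overlap(const Region *a, const Region *b)
{
    return a->base < region_end(b) && b->base < region_end(a);
}

/* True if the region base is a multiple of the cache line size. */
static bool region_is_aligned(const Region *r, uint32_t line)
{
    assert(line != 0);
    return r->base % line == 0;
}

/* True if any pair of the n regions overlaps. */
static bool any_regions_overlap(const Region *r, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (regions_overlap(&r[i], &r[j])) {
                return true;
            }
        }
    }
    return false;
}
```

For these values SR0 ends at 0x0C4F0000, exactly where SR1 begins, so the two are adjacent rather than overlapping, and SR2 ends at 0xF1000000.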

Vincent W. over 10 years ago in reply to Ronny Jimenez

TI__Genius 12865 points

Hi Ronny,

First of all, please use ownerProcId other than 0 for your SRs. In IPC 3.x, SharedRegion is not supported on the host CPU, so using ownerProcId=0 will invariably lead to problems even if it doesn't solve your issue at hand.

Meanwhile, have you tried the dual_transport IPC test example from IPC 3.x for Keystone II? I don't have a Keystone II board handy, but it'd be interesting to see if you can 1. run that test example and 2. modify it to call HeapMemMP_create like you are trying to do. Comparing a working example with your code may be a good approach to isolate what you may be doing different. To run the test, rebuild IPC (both linux and bios binaries) for Keystone II, and refer to the "Installing Tests" and "Running tests" sections in the IPC Install Guide: http://processors.wiki.ti.com/index.php/IPC_Install_Guide_Linux#Installing_Tests

http://processors.wiki.ti.com/index.php/IPC_Install_Guide_Linux#Running_Test_Applications

Run the application binary "MessageQApp" on the host after loading the slave cores with "dual_transports.xe66".

Hopefully you will see that the example works with your HeapMemMP_create call (be sure to allocate from SR0 since the test only creates region 0), and then you can compare it with your application. The source code for the slave cores is in packages\ti\ipc\tests\dual_transport.c. If things do not work out, let us know and we'll go from there.

Best regards,

Vincent

Ronny Jimenez over 10 years ago in reply to Vincent W.

Intellectual 550 points

Vincent,

I was able to run the dual_transports example; however, it uses HeapBufMP instead of HeapMemMP.

For now I used HeapBufMP in my project, and the three shared regions were initialized without problems. The problem I have now is that virtio_rpmsg_bus tries to initialize the channels twice. For example, when I first load the binaries into the DSPs, the following appears in the console:

load succeeded
[ 224.447040] remoteproc0: powering up 2620040.dsp0
[ 224.2493] virtio_rpmsg_bus virtio0: rpmsg host is online
[ 224.452544] virtio_rpmsg_bus virtio0: creating channel rpmsg-proto addr 0x3d
[ 224.452630] rpmsg_proto rpmsg0: inserting rpmsg src: 1024, dst: 61
[ 224.467973] remoteproc0: registered virtio0 (type 7)
run succeeded

This happens, as expected, on all 8 DSPs, but afterwards something tries to do it again, and this appears:

[ 225.830748] virtio_rpmsg_bus virtio0: creating channel rpmsg-proto addr 0x3d
[ 225.83681] virtio_rpmsg_bus virtio0: channel rpmsg-proto:ffffffff:3d already exist
[ 225.843636] virtio_rpmsg_bus virtio0: __rpmsg_create_channel failed
[ 225.848781] virtio_rpmsg_bus virtio0: creating channel rpmsg-proto addr 0x3d
[ 22553968] virtio_rpmsg_bus virtio0: __rpmsg_create_channel failed

And that happens again for all the processors. The behavior is undesired and intermittent: sometimes it happens and sometimes not. The DSP binaries run fine in the first round, and then the lines above appear.

To recap, I call Ipc_start() on the ARM Linux side and on the DSPs as well, and I also have the line

BIOS.addUserStartupFunction("&IpcMgr_ipcStartup");

in the .cfg file.

Any clue about this behavior?

Regards,

Ronny

Ramsey over 10 years ago in reply to Ronny Jimenez

TI__Genius 12025 points

Ronny,

Are you configuring and building one DSP image which is loaded on all the DSP processors or do you have different images for each DSP?

I'm looking to find an IPC+Keystone expert as I don't have any knowledge of Keystone. I'll let you know what I find.

~Ramsey

Ronny Jimenez over 10 years ago in reply to Ramsey

Intellectual 550 points

Dear Ramsey,

I'm using the same image for all the DSPs. They all execute correctly, but when each one finishes its execution, the virtio-rpmsg problem appears.

Thanks for your help!

Ronny

Ramsey over 10 years ago in reply to Ronny Jimenez

TI__Genius 12025 points

Ronny,

So each executable runs to completion and then you observe the virtio channels being initialized a second time. Have I understood this correctly?

What does each executable do when it completes? Do the tasks terminate or simply block forever? Is the idle task running? Does the host reset the DSP and turn off clocks?

Might there be an exception during the shutdown phase which causes the DSP to restart?

~Ramsey

Robert Tivy over 10 years ago in reply to Ronny Jimenez

TI__Mastermind 18260 points

Ronny Jimenez said:
[ 225.830748] virtio_rpmsg_bus virtio0: creating channel rpmsg-proto addr 0x3d
[ 225.83681] virtio_rpmsg_bus virtio0: channel rpmsg-proto:ffffffff:3d already exist
[ 225.843636] virtio_rpmsg_bus virtio0: __rpmsg_create_channel failed

Attachment: 0312.0001-virtio_rpmsg_bus-Fix-issues-related-to-invalidating-.txt

Ronny,

We recently discovered a similar issue during validation of a single-DSP Keystone II device (K2E). While the issue is not specific to the case where only one DSP core is loaded, it seemed to happen much more frequently when loading only one Keystone DSP.

We created a Linux kernel patch for this that appears to fix the issue (attached). Would you be able to apply this to your kernel and report your results back here?

You can apply it with either:
    - git apply <patchfile>
or
    - patch -p1 < <patchfile>
to make the change without committing it, or you can:
    - git am <patchfile>
to apply and commit the change to your local repo.

Thanks & Regards,

- Rob

Ronny Jimenez over 10 years ago in reply to Robert Tivy

Intellectual 550 points

Rob,

Sorry for the delay. I will apply this and report the results as soon as possible.

Just to be sure enough, the <patchfile> is the file with the name 0312.0001-virtio_rpmsg_bus-Fix-issues-related-to-invalidating- that you attached right? And the directory in which I have to apply the patch is the "kernel" directory inside the linux-keystone repo right?

Thank you for your help.

Ronny

Robert Tivy over 10 years ago in reply to Ronny Jimenez

TI__Mastermind 18260 points

Ronny Jimenez said:
Just to be sure enough, the <patchfile> is the file with the name 0312.0001-virtio_rpmsg_bus-Fix-issues-related-to-invalidating- that you attached right

Yup, it got kind of buried in my reply, but that's the one.

Ronny Jimenez said:
And the directory in which I have to apply the patch is the "kernel" directory inside the linux-keystone repo right?

The patch file should be applied from the top of the linux-keystone repo.

Regards,

- Rob

Marko Moberg over 9 years ago in reply to Robert Tivy

Intellectual 550 points

Hi,

I am having a similar issue with mcsdk-03.00.04.18 based Linux image and Keystone II device. In my case the issue seems to be related to CMEM_alloc2() called from ARM side app.

I have configured Linux kernel to allocate 256MB for CMA and I am using CMEM_alloc2() to reserve most of it for CMEM usage (example below where heapSize is e.g. 0xf00 0000)

params.type = CMEM_HEAP;
params.flags = CMEM_CACHED;
params.alignment = 128;
size_t heapSize = CMA_HEAP_SIZE;
d->pHeap = CMEM_alloc2(CMEM_CMABLOCKID, heapSize, &params);

After the CMEM_alloc2() call the kernel starts to spit out the error messages from each core:

[ 70.038929] virtio_rpmsg_bus virtio6: __rpmsg_create_channel failed

[ 70.055582] virtio_rpmsg_bus virtio6: creating channel rpmsg-proto addr 0x3d

[ 70.061338] virtio_rpmsg_bus virtio6: channel rpmsg-proto:ffffffff:3d already

The same DSP image is loaded and running on each core prior to the CMEM_alloc2() call. The strange thing is that if I decrease the heap size used in CMEM_alloc2(), the problem does not occur.

CMA start address in my case is in DDR3 at 0x9fb0 0000 and mem_reserve for DSP is 512MB at the end of DDR3 (from 0xe000 0000 onwards).

I wonder if your patch would also fix the issue in my case or is this perhaps something else?

regards,

Marko

P.S. I tried the changes in the patch but I am still getting the failure in rpmsg create channel.

Robert Tivy over 9 years ago in reply to Marko Moberg

TI__Mastermind 18260 points

Marko Moberg said:

I have configured Linux kernel to allocate 256MB for CMA and I am using CMEM_alloc2() to reserve most of it for CMEM usage (example below where heapSize is e.g. 0xf00 0000)

While it may appear to successfully reserve a 256MB block for CMA, I'm not sure that CMA can actually handle large allocations.

One of the issues might be the VM space available. CMA creates a kernel mapping for the allocated buffer, and there's only so much kernel VM space available.

Another alternative you could try is to use CMEM_allocPhys2() along with CMEM_map(). Those 2 together are equivalent to CMEM_alloc2(), but they separate the allocation from the mmap()ing. If the allocation works but the mmap fails then you can mmap() less than the full buffer (the intent of this API split is to relieve pressure on the user VM space by CMEM_map()ping smaller portions of the big buffer). If you don't need a user mapping then you don't have to call CMEM_map() at all.

Another thing that might shed some light is to use a "debug" cmemk.ko kernel module. This will print lots of information to the Linux console, but you can feel free to post it back here and I will take a look at it. You can make a debug version by issuing "make debug" in the cmem module directory.

Marko Moberg said:

CMA start address in my case is in DDR3 at 0x9fb0 0000 and mem_reserve for DSP is 512MB at the end of DDR3 (from 0xe000 0000 onwards).

Doesn't DDR start at 0x80000000? 512MB from the beginning of DDR would be 0xa0000000.
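
The address arithmetic in question can be checked with a short C sketch. It takes the DDR base of 0x80000000 from the question above and the other addresses from the quoted post; the macro MIB() and the helpers range_end() and gap_to() are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Size helper: n mebibytes as a 64-bit byte count. */
#define MIB(n) ((uint64_t)(n) << 20)

/* One-past-the-end address of the range [base, base + size). */
static uint64_t range_end(uint64_t base, uint64_t size)
{
    assert(size <= UINT64_MAX - base); /* guard against wraparound */
    return base + size;
}

/* Number of bytes between addr and a higher boundary address. */
static uint64_t gap_to(uint64_t addr, uint64_t boundary)
{
    assert(addr <= boundary);
    return boundary - addr;
}
```

With these helpers, range_end(0x80000000, MIB(512)) is 0xA0000000 and range_end(0xE0000000, MIB(512)) is 0x100000000, while gap_to(0x9FB00000, 0xA0000000) is MIB(5), so the reported CMA address sits 5 MiB below the 0xA0000000 mark. Whether that boundary matters depends on the actual memory map, which the thread does not give.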

How much memory does your Keystone device have?

Marko Moberg said:

I wonder if your patch would also fix the issue in my case or is this perhaps something else?

As you have found, this patch would not fix your problem. The patch addresses an issue that is exposed after transferring 256 rpmsg buffers from the remote core to the Linux host.

Regards,

- Rob

Marko Moberg over 9 years ago in reply to Robert Tivy

Intellectual 550 points

Hi Rob,

Didn't manage to spend too much time on this today but here are the latest findings. I split the CMEM_alloc2() into CMEM_allocPhys2() and CMEM_map() but CMEM_map seems to be acting crazy:

CMEM_AllocParams params;
params.type = CMEM_HEAP;
params.flags = CMEM_CACHED;
params.alignment = 128;
size_t heapSize = CMA_HEAP_SIZE;
off_t physaddr = CMEM_allocPhys2(CMEM_CMABLOCKID, heapSize, &params);

// physaddr is 0x9fb0 0000 at this point

d->pHeap = CMEM_map(physaddr,0x1234);

// CMEM_map() return null pointer (0x0) and the following error message:

// CMEM Error: map: Failed to mmap buffer at physical address 0x12349fb00000

For some reason CMEM_map() concatenates the size argument with the physical address?! Any idea what's going on? And another thing regarding my previous post; what is the relationship between the memory allocation (CMA/CMEM) and the rpmsg message failures I am seeing?

Marko

Robert Tivy over 9 years ago in reply to Marko Moberg

TI__Mastermind 18260 points

Marko Moberg said:

d->pHeap = CMEM_map(physaddr,0x1234);

// CMEM_map() return null pointer (0x0) and the following error message:

// CMEM Error: map: Failed to mmap buffer at physical address 0x12349fb00000

For some reason CMEM_map() concatenates the size argument with the physical address?! Any idea what's going on?

This feels like you're using a cmem library that was built using a 64-bit size for the off_t type (the type of the 1st parameter to CMEM_map()) yet linking that library with code built using a 32-bit size for off_t.

Can you add
-D_FILE_OFFSET_BITS=64
to your application C file build?

Take a look in <ludev>/src/cmem/api at the Makefile to see if it has that. Or perhaps I have it backwards and your app is using 64 yet your library isn't? They need to match the setting of _FILE_OFFSET_BITS.
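
The "concatenated" address in the error message is consistent with this kind of mismatch. The C sketch below models it under a stated assumption: a little-endian 32-bit ARM calling convention in which a 64-bit argument occupies two consecutive 32-bit argument words, so a callee expecting a 64-bit off_t picks up the caller's next argument (the size) as the upper half. The function names are hypothetical and this is a model of the effect, not CMEM code.

```c
#include <assert.h>
#include <stdint.h>

/* Models a callee that expects a 64-bit off_t but receives two
 * consecutive 32-bit argument words from a caller built with a
 * 32-bit off_t: the address lands in the low half and the next
 * argument in the high half. */
static uint64_t misread_offset(uint32_t physaddr32, uint32_t next_arg)
{
    return ((uint64_t)next_arg << 32) | physaddr32;
}

/* Reproduces the address from the error message quoted above, using
 * the physical address 0x9fb00000 and the size argument 0x1234. */
static void check_reported_address(void)
{
    assert(misread_offset(0x9fb00000u, 0x1234u) == 0x12349fb00000ull);
}
```

The result matches the 0x12349fb00000 seen in the "Failed to mmap buffer" message.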

Marko Moberg said:
And another thing regarding my previous post; what is the relationship between the memory allocation (CMA/CMEM) and the rpmsg message failures I am seeing?

Not much relationship there, except that both rpmsg and CMEM are using CMA memory. But rpmsg has its own dedicated CMA block, whereas the CMEM API that you're calling is using the global CMA area.

Regards,

- Rob

Marko Moberg over 9 years ago in reply to Robert Tivy

Intellectual 550 points

Thanks for the tip. After adding the flag into our application build environment the problem disappeared. However, for some reason it causes our malloc routines to fail but that is probably another story which I need to debug next. We didn't see this issue when we used CMEM_alloc2() without _FILE_OFFSET_BITS=64 setting.

Are there any examples of CMA + CMEM usage somewhere? What's the main difference or clear pros/cons between the following two allocation schemes:

1) Allocate CMA from kernel + use CMEM allocation (alloc2) with CMEM_CMABLOCKID in application

2) Define heap memory when loading driver (insmod cmemk.ko) + use CMEM allocation in application

For number 2 we can assume that Linux memory is defined in u-boot in such a way that it does not overlap with the cmemk.ko command line memory definitions.

Marko

Robert Tivy over 9 years ago in reply to Marko Moberg

TI__Mastermind 18260 points

Marko Moberg said:
We didn't see this issue when we used CMEM_alloc2() without _FILE_OFFSET_BITS=64 setting

With CMEM_alloc2(), there are internal variables that are self-consistent with the type sizes. In other words, the off_t type is 64 bits everywhere within the cmem.c code. When you call CMEM_map() without _FILE_OFFSET_BITS=64, your off_t type is 32 bits and is being passed to code that interprets it as 64 bits.
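
One way to catch this at build time rather than at run time is a compile-time check in the application source. This is a minimal sketch, assuming a C11 compiler and a library built with a 64-bit off_t as described above; app_off_t_bits() is a hypothetical helper added only so the result can be inspected.

```c
/* Must come before any system header so that off_t is 64 bits even on
 * 32-bit targets (same effect as passing -D_FILE_OFFSET_BITS=64). */
#ifndef _FILE_OFFSET_BITS
#define _FILE_OFFSET_BITS 64
#endif

#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Fails the build if this translation unit would pass a 32-bit off_t
 * to a library that was built with a 64-bit off_t. */
static_assert(sizeof(off_t) == 8,
              "off_t must be 64-bit to match the CMEM library build");

/* Hypothetical helper: width of off_t in bits for this build. */
static size_t app_off_t_bits(void)
{
    return sizeof(off_t) * 8;
}
```

If the define is removed on a 32-bit target, the static_assert stops the build instead of letting CMEM_map() receive a mis-sized argument.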

Marko Moberg said:

Are there any examples of CMA + CMEM usage somewhere? What's the main difference or clear pros/cons between the following two allocation schemes:

1) Allocate CMA from kernel + use CMEM allocation (alloc2) with CMEM_CMABLOCKID in application

2) Define heap memory when loading driver (insmod cmemk.ko) + use CMEM allocation in application

There's just the <ludev>/src/cmem/tests/apitest.c file that uses CMABLOCKID.

The benefit to using CMA is illustrated by what you *don't* have to do to not use it, i.e., your case 2). With CMA you don't have to carve out memory away from Linux (and we don't have to tell users how to do that).

Drawbacks to using CMA include:
    - don't know how much is available to CMEM, since any kernel code can allocate from the global CMA pool.
    - also due to other allocators, CMA memory can fragment.
    - you have to inform the remote core of the phys addr of the CMA memory, whereas with a carve out you can somewhat know beforehand what memory addresses to use (if you manage it closely).

Regards,

- Rob

Robert Tivy over 9 years ago in reply to Marko Moberg

TI__Mastermind 18260 points

Marko Moberg said:
However, for some reason it causes our malloc routines to fail but that is probably another story which I need to debug next

Forgot to answer this part...

Keep in mind that there is only so much virt addr space available to your user program. If you use a bunch of the virt addr space with a big CMEM_map() then there may not be much available for malloc.

Regards,

- Rob

Marko Moberg over 9 years ago in reply to Robert Tivy

Intellectual 550 points

I continued playing around with CMA allocations, but the behavior is quite mysterious. I also tried the combination of mem=512M@0x80000000 on the kernel command line at startup and cmemk.ko with physical address boundaries. That seems to work like a charm.

Here are some pieces of info about the CMA which I am still struggling with:

16MB CMA allocation (kernel defconfig)

CMEM alloc size: 0xd00000 (13MB) CMEM_allocPhys2 ok (heap start address (virt)= 0xb6f775ec)

CMEM alloc size: 0xe00000 onwards fail (CMEM Error: allocHeap: ioctl CMEM_IOCALLOCHEAPCACHED failed: -1)

256MB CMA allocation (kernel defconfig)

0x500000 (5MB) CMEM_allocPhys2 ok

0x600000 onwards make rpmsg messages go crazy:

[ 70.038929] virtio_rpmsg_bus virtio6: __rpmsg_create_channel failed

[ 70.055582] virtio_rpmsg_bus virtio6: creating channel rpmsg-proto addr 0x3d

[ 70.061338] virtio_rpmsg_bus virtio6: channel rpmsg-proto:ffffffff:3d already

So it seems that the mapping is not the issue but the allocation itself. I must be missing some essential piece of something from somewhere because the current behavior doesn’t make much sense; the more CMA I make available, the less memory can be allocated by CMEM.

By the way, which address space does CMA reservation (2e800000) at kernel boot refer to?

[ 0.000000] cma: CMA: reserved 16 MiB at 2e800000

[ 0.000000] Virtual kernel memory layout:

[ 0.000000] vector : 0xffff0000 - 0xffff1000 ( 4 kB)

[ 0.000000] fixmap : 0xfff00000 - 0xfffe0000 ( 896 kB)

[ 0.000000] vmalloc : 0xf0000000 - 0xff000000 ( 240 MB)

[ 0.000000] lowmem : 0xc0000000 - 0xef800000 ( 760 MB)

[ 0.000000] pkmap : 0xbfe00000 - 0xc0000000 ( 2 MB)

[ 0.000000] modules : 0xbf000000 - 0xbfe00000 ( 14 MB)

[ 0.000000] .text : 0xc0008000 - 0xc072da80 (7319 kB)

[ 0.000000] .init : 0xc072e000 - 0xc077ee40 ( 324 kB)

[ 0.000000] .data : 0xc0780000 - 0xc07cb888 ( 303 kB)

[ 0.000000] .bss : 0xc07cb888 - 0xc07fadf4 ( 190 kB)

regards

Marko

Robert Tivy over 9 years ago in reply to Marko Moberg

TI__Mastermind 18260 points

Marko Moberg said:

16MB CMA allocation (kernel defconfig)

CMEM alloc size: 0xd00000 (13MB) CMEM_allocPhys2 ok (heap start address (virt)= 0xb6f775ec)

CMEM alloc size: 0xe00000 onwards fail (CMEM Error: allocHeap: ioctl CMEM_IOCALLOCHEAPCACHED failed: -1)

Probably some other kernel driver is allocating from global CMA, apparently leaving less than 14 MB available to CMEM allocations. I suspect it is the DMA subsystem - when booting my Keystone board I see the following:
[ 0.109588] DMA: preallocated 256 KiB pool for atomic coherent allocations
which comes from CMA memory, and there could be other allocators in your system.

Marko Moberg said:

256MB CMA allocation (kernel defconfig)

0x500000 (5MB) CMEM_allocPhys2 ok

0x600000 onwards make rpmsg messages go crazy:

[   70.038929] virtio_rpmsg_bus virtio6: __rpmsg_create_channel failed

[   70.055582] virtio_rpmsg_bus virtio6: creating channel rpmsg-proto addr 0x3d

[   70.061338] virtio_rpmsg_bus virtio6: channel rpmsg-proto:ffffffff:3d already

So it seems that the mapping is not the issue but the allocation itself. I must be missing some essential piece of something from somewhere because the current behavior doesn’t make much sense; the more CMA I make available, the less memory can be allocated by CMEM.

Agreed, this doesn't make a whole lot of sense. I don't know what's going on with the rpmsg.

CMA does not currently handle large memories well. I can only suggest that you use the "keep away from Linux" approach and manually define the phys_start/phys_end on the cmemk.ko modprobe command line.
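The "keep away from Linux" approach comes down to making sure the cmemk.ko range does not overlap what the kernel owns. A rough sketch of that check in C, using the mem=512M@0x80000000 value mentioned earlier in the thread. The function names are hypothetical, and treating phys_end as exclusive is an assumption here:

```c
#include <stdint.h>

/* End (exclusive) of the memory Linux owns for "mem=<size>@<base>". */
static uint64_t linux_mem_end(uint64_t base, uint64_t size)
{
    return base + size;
}

/* Returns 1 if half-open ranges [a_start, a_end) and [b_start, b_end)
 * share at least one byte, 0 otherwise. */
static int ranges_overlap(uint64_t a_start, uint64_t a_end,
                          uint64_t b_start, uint64_t b_end)
{
    return a_start < b_end && b_start < a_end;
}
```

With mem=512M@0x80000000 the kernel's range ends at 0xa0000000, so an illustrative carve-out of phys_start=0xa0000000 phys_end=0xb0000000 would not overlap it, while one starting at 0x9f000000 would.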

Marko Moberg said:

By the way, which address space does CMA reservation (2e800000) at kernel boot refer to?

[ 0.000000] cma: CMA: reserved 16 MiB at 2e800000

Keystone can have the LPAE memory layout (Large Physical Address Extension), in which the actual memory address space is 36 bits, allowing for more than 4 GB of RAM. The CMA code that prints the physical address above does not conform to the larger address space and prints just 32 bits. But I don't think it's just chopping off the high order digit, but instead printing a 32-bit "alias" that exists for the 1st 4 GB of the RAM. However, I don't really know what that alias is - what I can tell you is that when I was last working on this stuff the 36-bit phys addr of 0x81f000000 was aliased to a 32-bit address of 0x9f000000, but that doesn't explain your 0x2e800000.
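To make the aliasing arithmetic concrete, here is a small sketch. It assumes DDR3A sits at the 36-bit base 0x800000000 and that its first 2 GB are also visible through a 32-bit window at 0x80000000. Those constants and the function names are assumptions for illustration, chosen to be consistent with the 0x81f000000 / 0x9f000000 pair above:

```c
#include <stdint.h>

/* Assumed layout (not confirmed in this thread). */
#define DDR3A_PHYS_BASE   0x800000000ULL  /* 36-bit base of DDR3A     */
#define DDR3A_ALIAS_BASE  0x80000000ULL   /* 32-bit alias window base */
#define DDR3A_ALIAS_SIZE  0x80000000ULL   /* 2 GB window              */

/* Returns the 32-bit alias of a 36-bit DDR3A address, or 0 if the
 * address falls outside the aliased window. */
static uint32_t ddr3a_alias(uint64_t phys)
{
    if (phys < DDR3A_PHYS_BASE || phys - DDR3A_PHYS_BASE >= DDR3A_ALIAS_SIZE)
        return 0;
    return (uint32_t)(phys - DDR3A_PHYS_BASE + DDR3A_ALIAS_BASE);
}

/* What a 32-bit print shows if the high-order bits are simply dropped. */
static uint32_t truncate_to_32(uint64_t phys)
{
    return (uint32_t)phys;
}
```

Under these assumptions ddr3a_alias(0x81f000000) gives 0x9f000000, while plain truncation of the same address gives 0x1f000000. The alias mapping cannot produce 0x2e800000, since that value lies below the window base. Truncating a hypothetical 36-bit address such as 0x82e800000 would produce it, so arithmetic alone does not rule truncation out, but that is a guess and not something established here.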

Regards,

- Rob

Marko Moberg over 9 years ago in reply to Robert Tivy

Intellectual 550 points

Hi Rob,

Thanks for the comments. I guess we will move forward with the manual definition of phys_start/phys_end. One additional question though:

I read from http://processors.wiki.ti.com/index.php/MCSDK_UG_Chapter_Exploring#LPAE that 2GB of DDR3A aliased address base (0x8000 0000 - 0xffff ffff) is not cache coherent. Based on that info:

1) Does cmemk.ko support address space which is larger than 32-bits i.e. can I reserve and use a memory region with e.g. phys_start=0x8c0000000? (at least it seemed to work after a quick try)

2) If I am using non-aliased DDR3A address space e.g. 0x8c0000000, does it mean that I don't have to worry about cache writeback & invalidate on ARM side? What about DSP accesses? Are they cache coherent in those "real" DDR3A addresses?

We are planning on using a section of a DDR3A bank as a shared heap between ARM and DSP so it would be nice if we didn't have to worry about the manual cache writeback and invalidate.

regards,

Marko

Robert Tivy over 9 years ago in reply to Marko Moberg

TI__Mastermind 18260 points

I don't know much about cache coherence on the ARM, and I have not heard that DDR has it, but I am far from authoritative on this aspect.

Marko Moberg said:

1) Does cmemk.ko support address space which is larger than 32-bits i.e. can I reserve and use a memory region with e.g. phys_start=0x8c0000000? (at least it seemed to work after a quick try)

Yes, when your Linux kernel (to which you point your cmemk.ko build) is configured with LPAE then the appropriate type sizes used within cmemk.c adjust to > 32 bit sizes.

Marko Moberg said:

2) If I am using non-aliased DDR3A address space e.g. 0x8c0000000, does it mean that I don't have to worry about cache writeback & invalidate on ARM side? What about DSP accesses? Are they cache coherent in those "real" DDR3A addresses?

The CMEM_alloc* APIs take a CMEM_AllocParams parameter with a flags element which you can set to CMEM_CACHED or CMEM_NONCACHED. I don't think there is any inherent cache coherence with DDR, but I could be wrong. If you allocate cached memory then you will need to perform cache maintenance when accessing it from the ARM. CMEM does provide cache APIs with which you can do the cache operations.

DSP side cacheability is controlled exclusively through the MAR registers. If a 16 MB MAR region is marked as cacheable then you will need to do an invalidate before reads and a writeback/invalidate after writes. The DSP has no cache coherence.
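As a rough illustration of the 16 MB granularity, the sketch below computes which MAR register covers a given 32-bit DSP address. The register base 0x01848000 and the one-register-per-16-MB layout are assumptions taken from memory of the C66x CorePac documentation and should be checked against the user guide for your device. The function names are hypothetical:

```c
#include <stdint.h>

/* Assumed C66x CorePac layout: one MAR register per 16 MB of the
 * 32-bit DSP address space, 4 bytes apart, starting at this base. */
#define MAR_BASE_ADDR   0x01848000u
#define MAR_REGION_BITS 24u          /* 16 MB = 2^24 bytes */

/* Index of the MAR register that covers a 32-bit DSP address. */
static uint32_t mar_index(uint32_t dsp_addr)
{
    return dsp_addr >> MAR_REGION_BITS;
}

/* Address of that MAR register under the assumed layout. */
static uint32_t mar_reg_addr(uint32_t dsp_addr)
{
    return MAR_BASE_ADDR + 4u * mar_index(dsp_addr);
}
```

A shared heap that spans more than one 16 MB region touches more than one MAR, so each of those regions needs the same cacheability setting for the invalidate/writeback rules above to apply uniformly.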

Regards,

- Rob
