Multiple codec instances for multichannel encoding/decoding on DM365



Hello,

I'm writing a simple video server, which, among other things, has to be able to capture, encode, and recode video at the same time. The application will run on a DM365. The system is based on DVSDK 4.02 and uses Codec Engine. Unfortunately, I'm struggling with codec creation.

I discovered that I'm able to create up to four H.264 decoders for 1600x1200; the fifth call to VIDDEC2_create() returns NULL. I can also create several encoders (I don't know the upper limit). However, there is a strange issue with the order of these initializations.

For example, I can do the following sequence:

decoder_init(&octx1);
encoder_init(&ictx1);
decoder_init(&octx2);
encoder_init(&ictx2);
decoder_init(&octx3);
encoder_init(&ictx3);
decoder_init(&octx4);
encoder_init(&ictx4);


BUT if I switch the order to

encoder_init(&ictx1);
decoder_init(&octx1);
encoder_init(&ictx2);
decoder_init(&octx2);



(init an encoder first), then the first decoder initialization fails immediately on calling VIDDEC2_create(), which returns NULL.

Each of my *_init() functions sets up IVIDDEC2_Params and IVIDDEC2_DynamicParams and calls VIDDEC2_create() and VIDDEC2_control() (or VIDENC2_create() and VIDENC2_control(), respectively).
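
In essence, each init function does something like this (a simplified sketch: my real code passes a context struct, error handling is trimmed, hEngine comes from Engine_open() on the engine, and "h264dec" matches the codec name in my engine config; the default params are listed below):

#include <xdc/std.h>
#include <ti/sdo/ce/Engine.h>
#include <ti/sdo/ce/video2/viddec2.h>

/* Simplified sketch of one init function: set params, create, then control */
static Int decoder_init(Engine_Handle hEngine, VIDDEC2_Handle *hDecode)
{
    VIDDEC2_Params        params    = Vdec2_Params_DEFAULT;
    VIDDEC2_DynamicParams dynParams = Vdec2_DynamicParams_DEFAULT;
    VIDDEC2_Status        status;

    /* This is the call that returns NULL in the failing cases */
    *hDecode = VIDDEC2_create(hEngine, "h264dec", &params);
    if (*hDecode == NULL) {
        return -1;
    }

    status.size = sizeof(VIDDEC2_Status);
    status.data.buf = NULL;
    if (VIDDEC2_control(*hDecode, XDM_SETPARAMS, &dynParams, &status)
            != VIDDEC2_EOK) {
        VIDDEC2_delete(*hDecode);
        return -1;
    }
    return 0;
}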

Is there any constraint on codec creation that I'm not aware of, or is there some other problem?

I'm using the following default params for the codecs:


static const VIDDEC2_Params Vdec2_Params_DEFAULT = {
    sizeof(VIDDEC2_Params),        /* size */
    576,                           /* maxHeight */
    720,                           /* maxWidth */
    30000,                         /* maxFrameRate */
    6000000,                       /* maxBitRate */
    XDM_BYTE,                      /* dataEndianness */
    XDM_YUV_420SP,                 /* forceChromaFormat */
};

static const VIDDEC2_DynamicParams Vdec2_DynamicParams_DEFAULT = {
    sizeof(VIDDEC2_DynamicParams), /* size */
    XDM_DECODE_AU,                 /* decodeHeader */
    0,                             /* displayWidth */
    IVIDEO_NO_SKIP,                /* frameSkipMode */
    IVIDDEC2_DISPLAY_ORDER,        /* frameOrder */
    0,                             /* newFrameFlag */
    0,                             /* mbDataFlag */
};

static const VIDENC1_Params Venc1_Params_DEFAULT = {
    sizeof(VIDENC1_Params),        /* size */
    XDM_DEFAULT,                   /* encodingPreset */
    IVIDEO_LOW_DELAY,              /* rateControlPreset */
    1200,                          /* maxHeight */
    1600,                          /* maxWidth */
    30000,                         /* maxFrameRate */
    6000000,                       /* maxBitRate */
    XDM_BYTE,                      /* dataEndianness */
    0,                             /* maxInterFrameInterval */
    XDM_YUV_420P,                  /* inputChromaFormat */
    IVIDEO_PROGRESSIVE,            /* inputContentType */
    XDM_CHROMA_NA                  /* reconChromaFormat */
};

static const VIDENC1_DynamicParams Venc1_DynamicParams_DEFAULT = {
    sizeof(VIDENC1_DynamicParams), /* size */
    1200,                          /* inputHeight */
    1600,                          /* inputWidth */
    30000,                         /* refFrameRate */
    30000,                         /* targetFrameRate */
    6000000,                       /* targetBitRate */
    30,                            /* intraFrameInterval */
    XDM_ENCODE_AU,                 /* generateHeader */
    0,                             /* captureWidth */
    IVIDEO_NA_FRAME,               /* forceFrame */
    1,                             /* interFrameInterval */
    0                              /* mbDataFlag */
};



best regards
Jan

  • Jan:

        I notice the encoder params are VIDENC1, but it is mentioned that the *_init() functions call VIDENC2_create(). Can you please confirm the VIDENC2_*() functions are being called with VIDENC2_Params type?

        Otherwise, the allocation of scratch memory by DSKT2 (part of Framework Components) is rather complex, and the order of instantiation can change the amount of internal memory actually acquired, depending on DSKT2 settings, if the algorithms are in the same scratch groupId.

        http://processors.wiki.ti.com/index.php/Codec_Engine_GroupIds

        http://processors.wiki.ti.com/index.php/Framework_Components_DSKT2_User%27s_Guide
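
        For illustration only (the right sizes must come from the codec datasheets), a DSP server .cfg can pin the guideline scratch size for a scratch group up front, so the creation order no longer matters:

        var DSKT2 = xdc.useModule('ti.sdo.fc.dskt2.DSKT2');
        /* Guideline size for the shared scratch area of scratch group 0 (illustrative) */
        DSKT2.SARAM_SCRATCH_SIZES[0] = 0x5000; /* 20K */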

      There is a good explanation of this effect in the server config file in the Codec Engine product, examples/ti/sdo/ce/examples/servers/all_codecs_new_config/all.cfg:

    "

     *  Next we set "algorithm scratch sizes", a scheme we use to minimize internal
     *  memory resources for algorithms' scratch memory allocation. Algorithms that
     *  belong to the same "scratch group ID" -- field "groupId" in the algorithm's
     *  Server.algs entry above, reflecting the priority of the task running the
     *  algorithm -- don't run at the same time and thus can share the same
     *  scratch area. When creating the first algorithm in a given "scratch group"
     *  (between 0 and 19), a shared scratch area for that groupId is created with
     *  a size equal to SARAM_SCRATCH_SIZES[<alg's groupId>] below -- unless the
     *  algorithm requests more than that number, in which case the size will be
     *  what the algorithm asks for. So SARAM_SCRATCH_SIZES[<alg's groupId>] size is
     *  more of a groupId size guideline -- if the algorithm needs more it will get
     *  it, but getting these size guidelines right is important for optimal use of
     *  internal memory. The reason for this is that if an algorithm comes along
     *  that needs more scratch memory than its groupId scratch area's size, it
     *  will get that memory allocated separately, without sharing.
     *
     *  This DSKT2.SARAM_SCRATCH_SIZES[<groupId>] does not mean it is a scratch size
     *  that will be automatically allocated for the group <groupId> at system
     *  startup, but only that is a preferred minimum scratch size to use for the
     *  first algorithm that gets created in the <groupId> group, if any.
     *
     *  (An example: if algorithms A and B with the same groupId = 0 require 10K and
     *  20K of scratch, and if SARAM_SCRATCH_SIZES[0] is 0, if A gets created first
     *  DSKT2 allocates a shared scratch area for group 0 of size 10K, as A needs.
     *  If then B gets to be created, the 20K scratch area it gets will not be
     *  shared with A's -- or anyone else's; the total internal memory use will be
     *  30K. By contrast, if B gets created first, a 20K shared scratch will be
     *  allocated, and when A comes along, it will get its 10K from the existing
     *  group 0's 20K area. To eliminate such surprises, we set
     *  SARAM_SCRATCH_SIZES[0] to 20K and always spend exactly 20K on A and B's
     *  shared needs -- independent of their creation order. Not only do we save 10K
     *  of precious internal memory, but we avoid the possibility that B can't be
     *  created because less than 20K was available in the DSKT2 internal heaps.)
     *
    "

        So, it is possible that the order of execution results in higher internal memory usage than is available. Or maybe it's some other resource. Enabling debug_trace for Framework Components (especially DSKT2) may reveal the exact point of failure:

       http://processors.wiki.ti.com/index.php/Trace_in_Framework_Components

    Regards,
    - Gil
  • Hi Gil, 

    GAnthony said:

        I notice the encoder params are VIDENC1, but it is mentioned that the *_init() functions call VIDENC2_create(). Can you please confirm the VIDENC2_*() functions are being called with VIDENC2_Params type?

    It was just a typo. In fact we use VIDENC1.

    Thank you for the clarification of groupIds; I think that setting SARAM_SCRATCH_SIZES could help me. But I have checked my XDC config file (which is based on the encode examples from the DM365 DVSDK 4.02), and it explicitly says not to use DSKT2:

     

    /* Load support for the Codec Engine OSAL */
    var osalGlobal = xdc.useModule('ti.sdo.ce.osal.Global');
    osalGlobal.runtimeEnv = osalGlobal.LINUX;

    /* Load support for the 'Davinci Multimedia Application Interface' module */

    environment['xdc.cfg.check.fatal'] = 'false';

    var RMAN = xdc.useModule('ti.sdo.fc.rman.RMAN');
    RMAN.useDSKT2 = false;
    RMAN.persistentAllocFxn = "__ALG_allocMemory";
    RMAN.persistentFreeFxn = "__ALG_freeMemory";
    RMAN.semCreateFxn = "Sem_create";
    RMAN.semDeleteFxn = "Sem_delete";
    RMAN.semPendFxn = "Sem_pend";
    RMAN.semPostFxn = "Sem_post";
    RMAN.tableSize = 10;

    var EDMA3 = xdc.useModule('ti.sdo.fc.edma3.Settings');
    var vicp = xdc.useModule('ti.sdo.linuxutils.vicp.VICP');
    var HDVICP = xdc.useModule('ti.sdo.fc.ires.hdvicp.HDVICP');
    var VICP2 = xdc.useModule('ti.sdo.fc.ires.vicp.VICP2');
    var VICPSYNC = xdc.useModule('ti.sdo.fc.vicpsync.VICPSYNC');
    var HDVICPSYNC = xdc.useModule('ti.sdo.fc.hdvicpsync.HDVICPSYNC');
    var MEMUTILS = xdc.useModule('ti.sdo.fc.memutils.MEMUTILS');
    var ADDRSPACE = xdc.useModule('ti.sdo.fc.ires.addrspace.ADDRSPACE');
    var EDMA3CHAN = xdc.useModule('ti.sdo.fc.ires.edma3chan.EDMA3CHAN');
    var EDMA = xdc.useModule('ti.sdo.linuxutils.edma.EDMA');
    var CMEM = xdc.useModule('ti.sdo.linuxutils.cmem.CMEM');

    var MEMTCM = xdc.useModule('ti.sdo.fc.ires.memtcm.MEMTCM');
    MEMTCM.cmemBlockId = 1; //Since we use _1 in our insmod command.

    xdc.loadPackage("ti.sdo.ce.video2");
    xdc.loadPackage("ti.sdo.fc.hdvicpsync");

    /*
    * ======== Engine Configuration ========
    */

    var H264ENC = xdc.useModule('ti.sdo.codecs.h264enc.ce.H264ENC');
    var H264DEC = xdc.useModule('ti.sdo.codecs.h264dec.ce.H264DEC');

    var h264codecGroupId = 0;

    var Engine = xdc.useModule('ti.sdo.ce.Engine');
    var myEngineE = Engine.create("encode", [ 
    {name: "h264enc", mod: H264ENC, local: true, groupId: h264codecGroupId},
    ]);

    var myEngineD = Engine.create("decode", [ 
    {name: "h264dec", mod: H264DEC, local: true, groupId: h264codecGroupId},
    ]); 

    Should I enable DSKT2 for my application, or is there some other way to do this on the DM365 if DSKT2 is not available?

    best regards
    Jan 

  • Jan:

       OK, I see your platform is ARM-only and has local codecs, so it does not actually use DSKT2 for algorithm memory allocations.  I understand this is a special case.

       So, Framework Components on ARM uses CMEM-configured pools or the CMEM heap to satisfy algorithm memory requests.

       GroupIds are still used, not for scratch memory, but for sharing the video hardware accelerator and any DMA resources.

       I see the .cfg file also configures the ARM TCM memory manager, for ARM internal memory:

           var MEMTCM = xdc.useModule('ti.sdo.fc.ires.memtcm.MEMTCM');
           MEMTCM.cmemBlockId = 1; //Since we use _1 in our insmod command.


       Though my hypothesis that the order dependency might be due to DSKT2 scratch allocations was off the mark, it could be a similar dependency issue with the other resources (CMEM heap, video HW, DMA).

        There is some mention of order dependency when algorithms allocate from a CMEM heap that gets fragmented, here:

       http://processors.wiki.ti.com/index.php/CMEM_Overview#General_Purpose_Heaps

       So, we now need to verify that the application is set up with enough CMEM memory, and is using this ARM TCM memory (which I understand the H264 codec needs).

       It appears there is 32K of this ARM TCM memory, starting at physical address 0x00000000.

       From other forum posts, it appears the CMEM insmod command should look something like this (your pools may be different):

        modprobe cmemk phys_start=0x83C00000 phys_end=0x88000000 pools=7x4845568 allowOverlap=1 \
                 phys_start_1=0x00001000 phys_end_1=0x00008000 pools_1=1x28672 useHeapIfPoolUnavailable=1

        It seems there are a few ways to satisfy algorithm memory requests:

        1) Carve up CMEM memory into pools, based on codec requirements (derived from the codec datasheets); or

        2) Try to allocate from CMEM pools, but if those are not sufficiently configured, grab from the CMEM heap;  or

        3) Just grab from CMEM heap always.

        Option #1 is best for production systems, as pools of fixed-size buffers avoid the memory fragmentation that can occur when allocating from a heap (Option #3).  But determining the number of pools and the pool sizes is challenging (see the illustrative command below).

        Option #2 is part way between #1 and #3.

        The optional useHeapIfPoolUnavailable=1 argument tells CMEM to allocate from the heap (anything remaining after the pools are allocated), if there is insufficient memory in the pools to satisfy the algorithm memory request. 
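
        To illustrate option #1 (sizes purely illustrative, to be derived from the codec datasheets), pools are given as a comma-separated list of <count>x<size> entries, one entry per buffer size the codecs need:

        modprobe cmemk phys_start=0x83C00000 phys_end=0x88000000 \
                 pools=4x4845568,8x1048576,16x4096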

        If you want the algorithms to use the heap only, you can set this in your .cfg:

         var algSettings = xdc.useModule('ti.sdo.ce.alg.Settings');
         algSettings.useHeap = true;

       So, let's see what CMEM options you are using, and what method of CMEM allocation you want to use.

      Here are some links on CMEM, and how to find out memory usage in CMEM pools:

       http://software-dl.ti.com/dsps/dsps_public_sw/sdo_sb/targetcontent/linuxutils/3_23_00_01/exports/linuxutils_3_23_00_01/docs/html/cmem_8h.html
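
       For a quick runtime check of pool usage (assuming the cmemk module from Linux Utils is loaded), you can also dump its proc entry, which lists each pool's buffer size and how many buffers are busy vs. free:

       cat /proc/cmem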

       You could also turn on the CMEM and FC debug_trace to see what CMEM allocations are happening during the codec init calls.
    - Gil
  • Hi Gil, 

    GAnthony said:

       So, we now need to verify that the application is set up with enough CMEM memory, and is using this ARM TCM memory (which I understand the H264 codec needs).

       It appears there is 32K of this ARM TCM memory, starting at physical address 0x00000000.

       From other forum posts, it appears the CMEM insmod command should look something like this (your pools may be different):

        modprobe cmemk phys_start=0x83C00000 phys_end=0x88000000 pools=7x4845568 allowOverlap=1 \
                 phys_start_1=0x00001000 phys_end_1=0x00008000 pools_1=1x28672 useHeapIfPoolUnavailable=1

    ... 

    My command for inserting the cmem module is almost the same as the one that you specified:

    modprobe cmemk phys_start=0x84200000 phys_end=0x88000000 phys_start_1=0x00001000 phys_end_1=0x00008000 pools_1=1x28672 allowOverlap=1 useHeapIfPoolUnavailable=1

     So I expect that the algorithms from group 1 should use ARM internal memory from pools_1.

    I have also enabled the debugging messages, but it seems that the problem is not caused by CMEM, because there is no sign of an unsuccessful allocation. In fact, the only message in the log that points to the problem is:

    @66,583,960us: [+7 T:0x400dd000 S:0xbee0271c] ti.sdo.ce.alg.Algorithm - Algorithm_create> Assignment of alg resources through RMAN FAILED (0x7)

    I attached the whole log (0624.cmemd.log) to this post. It was produced with the following settings in the XDC config:

    CMEM.debug = true;
    EDMA.debug = true;
    vicp.debug = true;
    xdc.loadPackage('ti.sdo.fc.ires.hdvicp').profile = "debug";
    xdc.loadPackage('ti.sdo.fc.ires.vicp').profile = "debug";
    xdc.loadPackage('ti.sdo.fc.rman').profile = "debug";
    xdc.loadPackage('ti.sdo.fc.edma3').profile = "debug";
    xdc.loadPackage('ti.sdo.fc.ires.edma3chan').profile = "debug";
    xdc.useModule('ti.sdo.fc.global.Settings').profile = "debug_trace";

    How can I find out which resource is causing the problems, please?

    best regards
    Jan 

  • Jan:

       I'm not seeing sufficient Framework Components RMAN trace in the log.

       From the trace log, it appears you have Framework Components version 2.26.

       For that version, enabling the debug trace is done differently than in previous versions (I see your .cfg uses a mix of older and newer methods, and I'm wondering if that is causing the RMAN trace to be shunted).

       Could you please try setting only the .profile=debug_trace as shown here, and regenerate the log?

       http://processors.wiki.ti.com/index.php/Trace_in_Framework_Components#Framework_Components_2.22_and_later_2.x_releases
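
       In other words, keep only the single global profile setting in the .cfg and drop the per-package .profile and *.debug lines, i.e. something like:

       /* FC 2.2x: this one setting enables trace in all Framework Components modules */
       xdc.useModule('ti.sdo.fc.global.Settings').profile = "debug_trace";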

       Hopefully then we will get some RMAN_* and other FC trace in the log.

    Thanks,
    - Gil
  • Hi,

    here is the complete log, 2664.ce.log. If I understand it correctly, the problem is with allocating an EDMA3 channel, on line 10759 of the log.

    Is there any way to make it independent of the order of initialization?

    best regards
    Jan 

  • Jan:

       See if the advice in this post will resolve the issue:

       http://e2e.ti.com/support/embedded/linux/f/354/t/51307.aspx

    Regards,

    - Gil

  • Hi Gil,

    thank you for the answer. The advice from the linked post solved my problem. I can now create four encoders and three decoders regardless of the order of creation. Creating a fourth decoder fails because of insufficient CMEM resources (see the attached log, 2502.ce3.log), but I don't think I will ever need that many decoders, so I'm happy for now.

    I still have another issue regarding DM365 codecs here: http://e2e.ti.com/support/embedded/multimedia_software_codecs/f/356/t/211132.aspx. Do you think you could look at it? Unfortunately there has been no response for some time.

     with best regards
    Jan 

  • Jan:

       Glad we found the solution.

       Regarding the other post about image artifacts: you could check the extended error code to see if there's an issue with the input stream that the DM365 decoder is reporting; otherwise, it is a deeper question, which will require an H.264 expert to answer.

    Regards,

    - Gil