Using two different HDVICPs

Hi,

I am working on an H.264 decoding application with the DM6467 EVM. I want to create two instances of the H.264 decoder and run them on different HDVICPs. I have come to know that this can be achieved using the groupId feature of Codec Engine, but I am not sure how to do this. Can anybody help me out? What changes are required when building the codec combo?

  • I'm interested in doing exactly the same thing. Any ideas?

     

    Do we have to buy the HDVICP API? Does it exist?

     

  • In the server config file (decode.cfg), we need to make sure that sufficient scratch memory is allocated to the scratch group that H264DEC is mapped to.

    For example, you might have something like -

    Server.algs = [
        {name: "h264dec", mod: H264DEC, groupId: 0,
            threadAttrs: {stackMemId: 0, priority: Server.MINPRI + 1}
        },
    ];

    Here the module H264DEC is mapped to scratch group 0 (groupId:0).

    Also make sure that scratch memory (internal memory) sufficient to run 2 instances is allocated to that scratch group id 0 (something like the following) -

        DSKT2.DARAM_SCRATCH_SIZES = [ 65536, 0, 0,0,0,0,0, /* ... */ 0 ];
        DSKT2.SARAM_SCRATCH_SIZES = [ 65536, 0, 0,0,0,0,0, /* ... */ 0 ];

    Also, a sufficient number of EDMA PaRAMs, channels & TCCs (twice that of a single instance) should be allocated to scratch group ID 0:

        EDMA3.maxPaRams[0] = aa;
        EDMA3.maxTccs[0] = bb;
        EDMA3.maxEdmaChannels[0] = cc;
        EDMA3.maxQdmaChannels[0] = 0;

    Also, in the server's main.c, 2 HDVICP resources need to be registered through RMAN_register (i.e., hdvicpConfig.numResources = 2).
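
    A sketch of what that registration could look like in the server's main.c is below. This is illustrative only: the exact type and symbol names (IRESMAN_HdVicpParams, IRESMAN_HDVICP) should be verified against the Framework Components release you are using.

        #include <ti/sdo/fc/rman/rman.h>
        #include <ti/sdo/fc/ires/hdvicp/iresman_hdvicp.h>

        /* Register the HDVICP resource manager with RMAN, exposing both
           HDVICP co-processors so that each decoder instance can be
           granted its own. Names here are assumptions; check your FC
           headers for the exact symbols. */
        IRESMAN_HdVicpParams hdvicpConfig;

        hdvicpConfig.baseConfig.size = sizeof(IRESMAN_HdVicpParams);
        hdvicpConfig.numResources    = 2;   /* HDVICP0 and HDVICP1 */

        if (RMAN_register(&IRESMAN_HDVICP,
                (IRESMAN_Params *)&hdvicpConfig) != IRES_OK) {
            /* registration failed -- report the error and bail out */
        }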

    With these changes, you should be able to use both HDVICPs concurrently.

    Regards,

    Anirban

     

  • Hi, thanks for your reply. I'll have a look into your suggestion and let you know how it goes.

    Ralph

  • Most of what you say already appears to be the case in the codec server shipped with the DVSDK.

    The only difference is your use of the variables "aa", "bb" and "cc". What do these mean? Are you just suggesting it is up to me to decide what they should be?

    Let me elaborate in case it is not clear: I basically want to be able to call Vdec2_process() on a particular HDVICP so that each HDVICP can be assigned a particular stream of video to decode. Is this possible with the current "closed" H264 codecs that are supplied?

     

    Thanks,
    Ralph

  • Ralph,

    What I mean by the variables 'aa', 'bb' & 'cc' is the following -

    Consider, that one instance of the decoder requires

        EDMA3.maxPaRams[0]           = 159;
        EDMA3.maxTccs[0]             = 21;
        EDMA3.maxEdmaChannels[0]     = 21;
        EDMA3.maxQdmaChannels[0] = 0;

    Then, for running 2 instances we should set these values to twice of 1 instance requirement -

        EDMA3.maxPaRams[0]           = 318;
        EDMA3.maxTccs[0]             = 42;
        EDMA3.maxEdmaChannels[0]     = 42;
        EDMA3.maxQdmaChannels[0] = 0;

    So that 2 instances can be allocated & opened from the same Scratch group (0).

    With these settings, you should be able to allocate, open & execute 2 decoder instances simultaneously.

    Regards,

    Anirban

     

  • I see. Thanks for that.

    I have in fact tried changing the values in my server.cfg in "cs2dm6467_1_00_00_10/packages/ti/sdo/server/cs/" in my DVSDK directory, but this seems to have no effect. I think this is because, if I set all the values of EDMA3.maxPaRams[0], EDMA3.maxTccs[0] and EDMA3.maxEdmaChannels[0] to just 1, the code still runs fine, which is strange given how much I have restricted the resources.

    Is "make clean" not enough in order to rebuild the codec server?
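
    For reference, the rebuild steps I'm using (the path is from my DVSDK install; the exact make targets may differ in your setup):

        cd cs2dm6467_1_00_00_10/packages/ti/sdo/server/cs
        make clean
        make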

     

    Thanks,

    Ralph

  • Ralph,

    That is strange. You can probably share the cfg file and we can have a look.

    Regards,

    Anirban

     

  • Hi, here's my server.cfg. It's the one that I obtained by following the "DM6467 getting started" -> software guide. I've changed the values that I'd set to 1 back to what they were.

    Thanks,
    Ralph

     

    /*
     *  ======== server.cfg ========
     *
     *  For details about the packages and configuration parameters used throughout
     *  this config script, see the Codec Engine Configuration Guide (link
     *  provided in the release notes).
     */

    /* scratch groups */
    var MAXGROUPS = 20;
    var GROUP_0 = 0;
    var GROUP_1 = 1;

    /*
     *  Configure CE's OSAL.  This codec server only builds for the BIOS-side of
     *  a heterogeneous system, so use the "DSPLINK_BIOS" configuration.
     */
    var osalGlobal = xdc.useModule('ti.sdo.ce.osal.Global');
    osalGlobal.runtimeEnv = osalGlobal.DSPLINK_BIOS;
    osalGlobal.traceBufferSize = 0x40000;

    /* Set the OS to BIOS */
    var os = xdc.useModule('ti.sdo.ce.osal.bios.Settings');

    /* configure default memory seg id to BIOS-defined "DDR2" */
    osalGlobal.defaultMemSegId = "DDR2";

    /* activate BIOS logging module */
    var LogServer = xdc.useModule('ti.sdo.ce.bioslog.LogServer');

    /*
     *  ======== Server Configuration ========
     */
    var Server = xdc.useModule('ti.sdo.ce.Server');
    /* The server's stackSize.  More than we need... but safe. */
    Server.threadAttrs.stackSize = 16384;

    /* The servers execution priority */
    Server.threadAttrs.priority = Server.MINPRI;

    utils.importFile("codec.cfg");

    /*
     *  ======== DSKT2 (XDAIS Alg. memory allocation) configuration ========
     *
     *  DSKT2 is the memory manager for all algorithms running in the system,
     *  granting them persistent and temporary ("scratch") internal and external
     *  memory. We configure it here to define its memory allocation policy.
     *
     *  DSKT2 settings are critical for algorithm performance.
     *
     *  First we assign various types of algorithm internal memory (DARAM0..2,
     *  SARAM0..2,IPROG, which are all the same on a C64+ DSP) to "L1DHEAP"
     *  defined in the .tcf file as an internal memory heap. (For instance, if
     *  an algorithm asks for 5K of DARAM1 memory, DSKT2 will allocate 5K from
     *  L1DHEAP, if available, and give it to the algorithm; if the 5K is not
     *  available in the L1DHEAP, that algorithm's creation will fail.)
     *
     *  The remaining segments we point to the "DDRALGHEAP" external memory segment
     *  (also defined in the .tcf) except for DSKT2_HEAP which stores DSKT2's
     *  internal dynamically allocated objects, which must be preserved even if
     *  no codec instances are running, so we place them in "DDR2" memory segment
     *  with the rest of system code and static data.
     */
    var DSKT2 = xdc.useModule('ti.sdo.fc.dskt2.DSKT2');
    DSKT2.DARAM0     = "IRAM";
    DSKT2.DARAM1     = "IRAM";
    DSKT2.DARAM2     = "IRAM";
    DSKT2.SARAM0     = "IRAM";
    DSKT2.SARAM1     = "IRAM";
    DSKT2.SARAM2     = "IRAM";
    DSKT2.ESDATA     = "DDRALGHEAP";
    DSKT2.IPROG      = "IRAM";
    DSKT2.EPROG      = "DDR2";
    DSKT2.DSKT2_HEAP = "DDR2";    /* to allocate internal DSKT2 object */
    DSKT2.trace = false;
    DSKT2.debug = false;

    /*
     *  Next we define how to fulfill algorithms' requests for fast ("scratch")
     *  internal memory allocation; "scratch" is an area an algorithm writes to
     *  while it processes a frame of data and is discarded afterwards.
     *
     *  First we turn off the switch that allows the DSKT2 algorithm memory manager
     *  to give to an algorithm external memory for scratch if the system has run
     *  out of internal memory. In that case, if an algorithm fails to get its
     *  requested scratch memory, it will fail at creation rather than proceed to
     *  run at poor performance. (If your algorithms fail to create, you may try
     *  changing this value to "true" just to get it running and optimize other
     *  scratch settings later.)
     *
     *  Setting "algorithm scratch sizes", is a scheme we use to minimize internal
     *  memory resources for algorithms' scratch memory allocation. Algorithms that
     *  belong to the same "scratch group ID" -- field "groupId" in the algorithm's
     *  Server.algs entry above, reflecting the priority of the task running the
     *  algorithm -- don't run at the same time and thus can share the same
     *  scratch area. When creating the first algorithm in a given "scratch group"
     *  (between 0 and 19), a shared scratch area for that groupId is created with
     *  a size equal to SARAM_SCRATCH_SIZES[<alg's groupId>] below -- unless the
     *  algorithm requests more than that number, in which case the size will be
     *  what the algorithm asks for. So SARAM_SCRATCH_SIZES[<alg's groupId>] size is
     *  more of a groupId size guideline -- if the algorithm needs more it will get
     *  it, but getting these size guidelines right is important for optimal use of
     *  internal memory. The reason for this is that if an algorithm comes along
     *  that needs more scratch memory than its groupId scratch area's size, it
     *  will get that memory allocated separately, without sharing.
     *
     *  This DSKT2.SARAM_SCRATCH_SIZES[<groupId>] does not mean it is a scratch size
     *  that will be automatically allocated for the group <groupId> at system
     *  startup, but only that is a preferred minimum scratch size to use for the
     *  first algorithm that gets created in the <groupId> group, if any.
     *
     *  (An example: if algorithms A and B with the same groupId = 0 require 10K and
     *  20K of scratch, and if SARAM_SCRATCH_SIZES[0] is 0, if A gets created first
     *  DSKT2 allocates a shared scratch area for group 0 of size 10K, as A needs.
     *  If then B gets to be created, the 20K scratch area it gets will not be
     *  shared with A's -- or anyone else's; the total internal memory use will be
     *  30K. By contrast, if B gets created first, a 20K shared scratch will be
     *  allocated, and when A comes along, it will get its 10K from the existing
     *  group 0's 20K area. To eliminate such surprises, we set
     *  SARAM_SCRATCH_SIZES[0] to 20K and always spend exactly 20K on A and B's
     *  shared needs -- independent of their creation order. Not only do we save 10K
     *  of precious internal memory, but we avoid the possibility that B can't be
     *  created because less than 20K was available in the DSKT2 internal heaps.)
     *
     *  Finally, note that if the codecs correctly implement the
     *  ti.sdo.ce.ICodec.getDaramScratchSize() and .getSaramScratchSize() methods,
     *  this scratch size configuration can be autogenerated by
     *  configuring Server.autoGenScratchSizeArrays = true.
     */
    DSKT2.ALLOW_EXTERNAL_SCRATCH = false;
    DSKT2.SARAM_SCRATCH_SIZES[GROUP_0] = 65536;
    DSKT2.SARAM_SCRATCH_SIZES[GROUP_1] = 0;
    DSKT2.DARAM_SCRATCH_SIZES[GROUP_0] = 65536;
    DSKT2.DARAM_SCRATCH_SIZES[GROUP_1] = 0;

    /*
     *  ======== RMAN (IRES Resource manager) configuration ========
     */
    var RMAN = xdc.useModule('ti.sdo.fc.rman.RMAN');
    RMAN.useDSKT2 = true;
    RMAN.tableSize = 10;
    RMAN.semCreateFxn = "Sem_create";
    RMAN.semDeleteFxn = "Sem_delete";
    RMAN.semPendFxn = "Sem_pend";
    RMAN.semPostFxn = "Sem_post";
    RMAN.debug = false;
    RMAN.trace = false;
    print("In server.cfg (codec server)");
    xdc.useModule('ti.sdo.ce.Settings').checked = false;
    var EDMA3 = xdc.useModule('ti.sdo.fc.edma3.Settings');
    EDMA3.globalInit = false;
    EDMA3.maxTccs[GROUP_0] = 49;
    EDMA3.maxTccs[GROUP_1] = 0;
    EDMA3.maxPaRams[GROUP_0] = 384;
    EDMA3.maxPaRams[GROUP_1] = 0;
    EDMA3.maxEdmaChannels[GROUP_0] = 49;
    EDMA3.maxEdmaChannels[GROUP_1] = 0;
    EDMA3.maxQdmaChannels[GROUP_0] = 4;
    EDMA3.maxQdmaChannels[GROUP_1] = 0;
    EDMA3.regionConfig             = "DM6467_Config";
    EDMA3.trace = false;
    EDMA3.debug = false;

    var EDMA3CHAN = xdc.useModule('ti.sdo.fc.ires.edma3chan.EDMA3CHAN');
    EDMA3CHAN.debug = false;
    EDMA3CHAN.trace = false;

    var HDVICP =  xdc.useModule('ti.sdo.fc.ires.hdvicp.HDVICP');
    HDVICP.debug = false;
    HDVICP.trace = false;

    var HDINTC = xdc.useModule('ti.sdo.fc.hdintc.HDINTC');
    HDINTC.interruptVectorId_0 = 10;
    HDINTC.interruptVectorId_1 = 11;
    HDINTC.hdvicpInterruptEventNo_0 = 29;
    HDINTC.hdvicpInterruptEventNo_1 = 39;

    HDINTC.biosInterruptVectorId_0 = 7;
    HDINTC.biosInterruptVectorId_1 = 8;   
    HDINTC.biosInterruptEventNo_0 = 30;   
    HDINTC.biosInterruptEventNo_1 = 31;

  • Ralph,

    The cfg file looks fine. Also, the scratch size & EDMA settings look good enough to run 2 instances of H264DEC. You can cross-check once with the codec's datasheet. What happens when you try running 2 instances of H264DEC (mapped to group 0) simultaneously with these settings?

    Regards,

    Anirban

  • Well, I can create up to 3 instances of the codec without any issues but I don't yet know if they are all mapped to group 0. I have some more investigation to do before I know this.

  • Can any of the TI H264 codec programmers tell me whether they are using the IRES_HDVICP_RequestType as "IRES_HDVICP_ID_ANY" in the h264dec codec?

    I can sign an NDA if necessary. It's just that I have reason to believe that the request type is not this, but is in fact "IRES_HDVICP_ID_0".

    Thanks,
    Ralph

  • Ralph,

    TI's H264 Decoder sets IRES_HDVICP_RequestType as "IRES_HDVICP_ID_ANY" as it can run on either of the HDVICPs (0 or 1)

    Regards,

    Anirban

     

  • Okay, thanks for the information. I am trying to decode two H264 streams simultaneously, so I create two instances of the algorithm, and according to what you've said and the H264 datasheet (which states that the algorithm only uses one HDVICP), each algorithm should be allocated its own exclusive HDVICP. In addition, each algorithm will be allocated 21 DMA channels out of the total of 64 on the DM6467, which leaves plenty of channels left over for other uses.

    What is confusing me is why this situation occurs:

    • 1 H264 decoder decoding 1080p video has a DSP utilisation of 65% (for the 10Mbps clip that I am using) and manages a rate of 40fps.
    • 2 H264 decoders decoding 1080p video have a DSP utilisation of 65% (for decoding 2 copies of the 10Mbps clip above) and manage a total decode rate for both clips of 40fps.

    To me, this looks like, in the second case, there is no resource sharing going on between the 2 instances of the same codec. If there were, I would expect the DSP utilisation to be higher than 65% (and the overall decode rate to be greater than 40fps) in the case where there are 2 codecs running, as 2 HDVICPs will clearly cause the DSP utilisation to be higher than if only 1 HDVICP were running.

    Additionally, the fact that it is possible to create 3 instances of an H264 decoder without problems suggests to me that there is only 1 HDVICP being shared between all codec instances. (Creating a 4th instance fails, presumably because you run out of EDMA channels.) I would have thought that the 3rd call to create a decoder instance would fail, as there weren't enough HDVICPs, assuming each instance gets allocated 1 HDVICP. I admit that I am speculating here...

    By the way, I have checked to confirm that the file access is not the limiting factor here; it is definitely the DSP. I have plenty of ARM CPU power left over too.

    Look forward to hearing from you,


    Ralph

  • Ralph,

    It seems like the 2 decoders are not running concurrently (based on the data you have provided). Can you generate FC logs to see the resources (HDVICP & DMA) allocated to the 2 instances of H264 Decoder?

    Regards,

    Anirban

     

  • I've followed the instructions at:

    http://processors.wiki.ti.com/index.php/Trace_in_Framework_Components

    but cannot get any trace information out of Framework Components even using CE_DEBUG=3. I assume trace is what you mean when you say "log"?

     

    Thanks,

    Ralph

  • Anirban/Ralph,

    If you use the same group ID for both instances of the codec, I do not think they will run concurrently. I am not sure, but this is my understanding. We can look into this further.

    Regards,

    Kapil

  • Yes, I'm pretty sure you're right. I'll have a go at using different group IDs. By the way, I no longer think that the cause of me not being able to create 3 instances of the decoder is running out of EDMA channels. Instead, I think it is running out of persistent memory, as each codec creation is allocated its own section of CMEM. I think the persistent memory is the only thing that is not shared between codecs with the same group ID.
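
    For reference, here's a sketch of the kind of configuration I'm aiming for. The second algorithm name and the per-group numbers are illustrative; the real per-instance requirements come from the codec's datasheet. Note that each group needs its own scratch and EDMA allocation at the single-instance size, rather than one group with doubled values:

        Server.algs = [
            {name: "h264dec", mod: H264DEC, groupId: 0,
                threadAttrs: {stackMemId: 0, priority: Server.MINPRI + 1}},
            {name: "h264de2", mod: H264DEC, groupId: 1,
                threadAttrs: {stackMemId: 0, priority: Server.MINPRI + 1}},
        ];

        /* Each group gets its own scratch area... */
        DSKT2.DARAM_SCRATCH_SIZES[GROUP_0] = 65536;
        DSKT2.DARAM_SCRATCH_SIZES[GROUP_1] = 65536;
        DSKT2.SARAM_SCRATCH_SIZES[GROUP_0] = 65536;
        DSKT2.SARAM_SCRATCH_SIZES[GROUP_1] = 65536;

        /* ...and its own EDMA resources (single-instance values) */
        EDMA3.maxPaRams[GROUP_0]       = 159;
        EDMA3.maxPaRams[GROUP_1]       = 159;
        EDMA3.maxTccs[GROUP_0]         = 21;
        EDMA3.maxTccs[GROUP_1]         = 21;
        EDMA3.maxEdmaChannels[GROUP_0] = 21;
        EDMA3.maxEdmaChannels[GROUP_1] = 21;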

    Thanks,
    Ralph

  • Okay, so I've now got 2 decoders running in different groups. How I got there is a story of awkwardness but it seems possible.

    The only downside seems to be that my overall frame decode rate drops by 25% compared to running 2 instances of the codec in the same group (quite a major downside, actually, as I was trying to increase the overall frame rate). I have to have faith that the DSP is allocating an HDVICP to each instance of the algorithm in a different group; however, I think something is creating a bottleneck behind the scenes (not the DSP's processing power), because I attain an 82% DSP utilisation whether I have the codecs in the same group or in different groups.


    Any ideas what the bottleneck could be? So far I am thinking that having one BufTab for both codecs to use is not good enough if the codecs are in different groups.

     

    Thanks,
    Ralph

  • Update: I've now got two BufTabs, one for each codec. I had misconfigured the codecs and now I am confident that they are each in their own group.

    Unfortunately I can only decode two base profile H264 videos at the same time with codecs in different groups (yes, I am using the h264dec codec and not the h2641080p60vdec one), and the DSP utilisation has not changed; the frame rate is the same as with the codecs in the same group!

    When I try to decode 2 high profile videos, after some initialisation the DSP appears to just hang while the ARM side keeps running and displays 0% DSP usage. Here is the end of the log with all logging turned on. Does anyone know what is failing? I see no error messages, yet the DSP appears to crash. (My h264 codecs are named h264dec and h264de2.)

    (As an aside, I can decode 1 base profile and 1 high profile stream using codecs in different groups)

    Ralph

     

     

    @3,819,379us: [+0 T:0x427b5490] CE - Engine_fwriteTrace> returning count [2684]
    @3,819,483us: [+0 T:0x427b5490] CV - VISA_call Completed: messageId=0x000204f0, command=0x0, return(status=0)
    @3,819,672us: [+5 T:0x427b5490] CV - VISA_freeMsg(0x6c0e8, 0x42fc6900): Freeing message with messageId=0x000204f0
    @3,819,777us: [+0 T:0x427b5490] ti.sdo.ce.video2.VIDDEC2 - VIDDEC2_process> Exit (handle=0x6c0e8, retVal=0x0)
    @3,819,878us: [+2 T:0x427b5490] ti.sdo.dmai - [Vdec2] VIDDEC2_process() ret 0 inId 0 inUse 0 consumed 157918
    @3,829,797us: [+0 T:0x427b5490] ti.sdo.ce.video2.VIDDEC2 - VIDDEC2_process> Enter (handle=0x6c2c0, inBufs=0x427b443c, outBufs=0x427b4430, inArgs=0x427b4ca0, outArgs=0x427b4500)
    @3,829,933us: [+4 T:0x427b5490] CV - VISA_getMaxMsgSize(0x6c2c0): returning 0x1000
    @3,830,100us: [+5 T:0x427b5490] CV - VISA_allocMsg> Allocating message for messageId=0x000303ba
    @3,830,233us: [+0 T:0x427b5490] CV - VISA_call(visa=0x6c2c0, msg=0x42fc7900): messageId=0x000303ba, command=0x0
    [DSP] @12,072,939tk: [+5 T:0x8d563c0c] CN - NODE> 0x8fd06988(h264de2#1) call(algHandle=0x8fd069d0, msg=0x8ff05900); messageId=0x000303ba
    [DSP] @12,073,024tk: [+0 T:0x8d563c0c] OM - Memory_cacheInv> Enter(addr=0x897b7482, sizeInBytes=977790)
    [DSP] @12,074,134tk: [+0 T:0x8d563c0c] OM - Memory_cacheInv> return
    [DSP] @12,074,168tk: [+0 T:0x8d563c0c] OM - Memory_cacheInv> Enter(addr=0x878c2000, sizeInBytes=2141184)
    [DSP] @12,076,536tk: [+0 T:0x8d563c0c] OM - Memory_cacheInv> return
    [DSP] @12,076,568tk: [+0 T:0x8d563c0c] OM - Memory_cacheInv> Enter(addr=0x87accc00, sizeInBytes=1070592)
    [DSP] @12,077,779tk: [+0 T:0x8d563c0c] OM - Memory_cacheInv> return
    [DSP] @12,077,813tk: [+0 T:0x8d563c0c] ti.sdo.ce.video2.VIDDEC2 - VIDDEC2_process> Enter (handle=0x8fd069d0, inBufs=0x8d56739c, outBufs=0x8d567460, inArgs=0x8ff05a78, outArgs=0x8ff05a84)
    [DSP] @12,077,910tk: [+5 T:0x8d563c0c] CV - VISA_enter(visa=0x8fd069d0): algHandle = 0x8fd06a08
    [DSP] @12,077,961tk: [+0 T:0x8d563c0c] ti.sdo.ce.alg.Algorithm - Algorithm_activate> Enter(alg=0x8fd06a08)
    [DSP] @12,078,013tk: [+0 T:0x8d563c0c] ti.sdo.fc.dskt2 - DSKT2_activateAlg> Enter (scratchId=1, alg=0x8c7b6000)
    [DSP] @12,078,071tk: [+0 T:0x8d563c0c] ti.sdo.fc.dskt2 - DSKT2_activateAlg> Exit
    [DSP] @12,078,111tk: [+0 T:0x8d563c0c] ti.sdo.fc.rman - RMAN_activateAllResources> Enter (alg=0x8c7b6000, resFxns=0x8fc1dc1c, scratchGroupId=1)
    [DSP] @12,078,179tk: [+0 T:0x8d563c0c] ti.sdo.fc.rman - RMAN_activateAllResources> Exit (status=0)
    [DSP] @12,078,224tk: [+0 T:0x8d563c0c] ti.sdo.ce.alg.Algorithm - Algorithm_activate> Exit
    [DSP] @12,176,731tk: [+5 T:0x8d563c0c] CV - VISA_exit(visa=0x8fd069d0): algHandle = 0x8fd06a08
    [DSP] @12,176,807tk: [+0 T:0x8d563c0c] ti.sdo.ce.alg.Algorithm - Algorithm_deactivate> Enter(alg=0x8fd06a08)
    [DSP] @12,176,864tk: [+0 T:0x8d563c0c] ti.sdo.fc.rman - RMAN_deactivateAllResources> Enter (alg=0x8c7b6000, resFxns=0x8fc1dc1c, scratchGroupId=1)
    [DSP] @12,176,935tk: [+0 T:0x8d563c0c] ti.sdo.fc.rman - RMAN_deactivateAllResources> Exit (status=0)
    [DSP] @12,176,984tk: [+0 T:0x8d563c0c] ti.sdo.fc.dskt2 - DSKT2_deactivateAlg> Enter (scratchId=1, algHandle=0x8c7b6000)
    [DSP] @12,177,041tk: [+4 T:0x8d563c0c] ti.sdo.fc.dskt2 - DSKT2_deactivateAlg> Lazy deactivate of algorithm 0x8c7b6000
    [DSP] @12,177,097tk: [+0 T:0x8d563c0c] ti.sdo.fc.dskt2 - DSKT2_deactivateAlg> Exit
    [DSP] @12,177,136tk: [+0 T:0x8d563c0c] ti.sdo.ce.alg.Algorithm - Algorithm_deactivate> Exit
    [DSP] @12,177,181tk: [+0 T:0x8d563c0c] ti.sdo.ce.video2.VIDDEC2 - VIDDEC2_process> Exit (handle=0x8fd069d0, retVal=0x0)
    [DSP] @12,177,248tk: [+5 T:0x8d563c0c] CN - NODE> returned from call(algHandle=0x8fd069d0, msg=0x8ff05900); messageId=0x000303ba
    @3,860,581us: [+0 T:0x427b5490] CE - Engine_fwriteTrace> returning count [2684]
    @3,860,688us: [+0 T:0x427b5490] CV - VISA_call Completed: messageId=0x000303ba, command=0x0, return(status=0)
    @3,860,857us: [+5 T:0x427b5490] CV - VISA_freeMsg(0x6c2c0, 0x42fc7900): Freeing message with messageId=0x000303ba
    @3,860,961us: [+0 T:0x427b5490] ti.sdo.ce.video2.VIDDEC2 - VIDDEC2_process> Exit (handle=0x6c2c0, retVal=0x0)
    @3,861,063us: [+2 T:0x427b5490] ti.sdo.dmai - [Vdec2] VIDDEC2_process() ret 0 inId 5 inUse 0 consumed 87586
    @3,861,175us: [+0 T:0x427b5490] ti.sdo.ce.osal.Sem - Entered Sem_pend> sem[0x6ce60] timeout[0xffffffff]
    @3,861,279us: [+0 T:0x427b5490] ti.sdo.ce.osal.Sem - Leaving Sem_pend> sem[0x6ce60] status[0]
    @3,861,381us: [+2 T:0x427b5490] ti.sdo.dmai - [Buffer] Set user pointer 0x45f76aa4 (physical 0x897ccaa4)
    @3,861,482us: [+0 T:0x427b5490] ti.sdo.ce.osal.Sem - Entered Sem_post> sem[0x6ce78]
    @3,861,595us: [+0 T:0x427b5490] ti.sdo.ce.osal.Sem - Leaving Sem_post> sem[0x6ce78]
    @3,861,696us: [+0 T:0x427b5490] ti.sdo.ce.osal.Sem - Entered Sem_post> sem[0x6ce60]
    @3,861,792us: [+0 T:0x427b5490] ti.sdo.ce.osal.Sem - Leaving Sem_post> sem[0x6ce60]
    @3,861,928us: [+0 T:0x46c75490] ti.sdo.ce.osal.Sem - Leaving Sem_pend> sem[0x6ce78] status[0]
    @3,862,039us: [+0 T:0x46c75490] ti.sdo.ce.osal.Sem - Entered Sem_pend> sem[0x6cba0] timeout[0xffffffff]
    @3,862,139us: [+0 T:0x46c75490] ti.sdo.ce.osal.Sem - Leaving Sem_pend> sem[0x6cba0] status[0]
    @3,862,232us: [+0 T:0x46c75490] ti.sdo.ce.osal.Sem - Entered Sem_post> sem[0x6cba0]
    @3,862,322us: [+0 T:0x46c75490] ti.sdo.ce.osal.Sem - Leaving Sem_post> sem[0x6cba0]
    @3,862,409us: [+0 T:0x46c75490] ti.sdo.ce.osal.Sem - Entered Sem_pend> sem[0x6cbb8] timeout[0xffffffff]
    @3,862,503us: [+0 T:0x46c75490] ti.sdo.ce.osal.Sem - Leaving Sem_pend> sem[0x6cbb8] status[0]
    @3,862,597us: [+0 T:0x46c75490] ti.sdo.ce.osal.Sem - Entered Sem_post> sem[0x6cbb8]
    @3,862,699us: [+0 T:0x46c75490] ti.sdo.ce.osal.Sem - Leaving Sem_post> sem[0x6cbb8]
    @3,862,790us: [+0 T:0x46c75490] ti.sdo.ce.osal.Sem - Entered Sem_pend> sem[0x6cbd0] timeout[0xffffffff]
    @4,200,233us: [+0 T:0x4001fcd0] CE - Engine_getCpuLoad(0x6cf20)
    Decode demo ARM Load: 8% DSP Load: 77% Display Type: 1080P 60Hz
    Video Codec: H.264 HP Video fps: 3 fps Video bit rate: 18588 kbps Video resolution: 1920x1080
    Sound codec: N/A Sound bit rate: 0 kbps Sampling freq: 0 Time: 00:00:01

     

  • Okay, I think I've gone some of the way towards solving the problem of playing back dual high profile H264 streams; the issue is something to do with not displaying all video buffers as soon as they are decoded.

    This still leaves the other issue of finding the bottleneck in the system that is causing the DSP usage to stay constant whether there is 1 codec or 2 codecs (each in a different group).

    Ralph

  • Ralph,

    How are you measuring the DSP loading? Note that the codec will interrupt the DSP for each MB-pair it processes. To obtain the actual DSP loading you will have to add up all the interrupt processing cycles. I am assuming that, other than the codecs, nothing else is running on the DSP. In that case the DSP will be idle for the remaining time when there are no interrupts.

    Regards,

    Kapil

  • Hi Kapil,

     

    thanks for your reply. I am measuring the DSP loading using the feature in the decode demo app of the DVSDK. This itself uses Engine_getCpuLoad(), which in turn uses callServer().

    callServer() calls Comm_put() and Comm_get(), which talk to DSP/BIOS using the message RMS_GETCPUSTAT. I cannot find any documentation on RMS_GETCPUSTAT.

    There is nothing else running on the DSP.

     

    Thanks,

    Ralph