H264 BP + MP Decoder plugin in SV04?

Hi,

I am trying to develop a video transcoding application on a quad-DSP C6678 board. I have been able to run all the demo applications (H264 encoder, J2K encoder, and J2K transcoder) on Ubuntu 12.04 LTS using the following tools:

  • ccs 5.1
  • desktop-linux-sdk_01_00_00_04_32bit_setuplinux
  • bios_mcsdk_02_00_05_17_setuplinux
  • mcsdk_video_02_01_00_03_setuplinux

My application is an H264 transcoder (decoder/encoder). At present, the SV04 build (Ubuntu version, MCSDK Video 2.1.0.3) has no video decoder plug-in available.

Has anybody integrated the H264 decoder into the SV04 build?

I have downloaded all the codec plug-ins from the following link and am trying to integrate the H264 decoder into the SV04 build:

http://software-dl.ti.com/dsps/dsps_public_sw/codecs/C6678_Video_Codecs/01_00_001/index_FDS.html

Could anybody please provide the H264 decoder integrated into the SV04 DSP build as a patch?

The transcoder on the C6678 DSP reads an H264 bitstream from the host (over the PCIe interface, using the Desktop Linux SDK), decodes it, encodes the YUV, and sends the encoded bitstream back to the host through PCIe.

Thanks,

Kali

  • Hi Kali,

    SV04 with video decoders (e.g., H264HPDEC, H264BPMPDEC, MPEG2DEC, etc.) integrated will be available in the next release of MCSDK Video 2.1, which is planned for the end of December. We will let you know once the release is posted on the web.

    Thanks,

    Hongmei


  • Thanks Hongmei,

    I will also continue my own work on this integration (H264 BP + MP decoder in SV04) and let you know how it goes.

    Regards,

    Kali


  • Hi Hongmei,

    I have been able to integrate the H264 BP/MP decoder on the DSPC8681E. The H264 BP decoder successfully decodes an H264 BP bitstream (PCIe file-based application).

    I am planning to run the H264 BP decoder and encoder together on the DSPC8681E.

    When I try to integrate the H264 BP decoder with the H264 BP encoder, creating the second instance in the SV04 framework always fails. The TI documentation explains that the H264 BP encoder is configured to run multi-chip, multi-core. Does this cause the problem when creating an instance of a second codec? If so, how does the JPEG 2000 transcoder work? If not, what is the procedure, and why does creating the second instance fail?

    Do you have any suggestions for integrating the H264 decoder with the H264 BP encoder on the DSPC8681E?

    Is it possible to create both H264 BP decoder and encoder instances, as in the JPEG 2000 transcoder demo application?

    Note:
    If we create the instances separately, both the H264 BP encoder and decoder run successfully. If we create both instances at once, it fails. The code from siuVctRun2.c that creates the instances for both the H264 BP encoder and decoder is included at the end of this post.


    Thanks and Regards,
    Kali


    Reference TI Documentation:

    http://processors.wiki.ti.com/index.php/MCSDK_VIDEO_2.1_PCIE_Demo_Guide
    http://processors.wiki.ti.com/index.php/MCSDK_VIDEO_2.1_PCIe_Demo_Development_Guide#Data_Flow_of_H264BP_Encoding
    http://processors.wiki.ti.com/index.php/MCSDK_VIDEO_2.1_Demo_Guide




      /* Create Codecs */
    ///////////////////////////////////////////////////////////////////////////////////////.....
    if (strcmp((const char *)siuVctRunParams->codecName,"H264ENC") == 0)
      {
        if (SIU_VCT_DATA_PCIE_IN()) {
          taskCxt->algInArgs.inputBufs  = (XDAS_Int8 **)&sharedInputBufPtrX86[0];
          taskCxt->algInArgs.outputBufs = (XDAS_Int8 **)&sharedOutputBufPtrX86[0];
        } else {
          taskCxt->algInArgs.inputBufs  = (XDAS_Int8 **)&sharedInputBufPtr[0];
          taskCxt->algInArgs.outputBufs = (XDAS_Int8 **)&sharedOutputBufPtr[0];
        }

        memset(&mediaTaskCxt[1],0,sizeof(mediaTaskContext_t));
        mediaTaskCxt[1].algInArgs.scratch[DNUM] = &codecScratch[DNUM][0];
        mediaTaskCxt[1].algInArgs.scratchSize[DNUM] = CODEC_SCRATCH_SIZE;

     
        taskCxt->coreId = -1;
        inst->mcInst.nCores = siuVctRunParams->nCores;
        for (i=0; i<inst->mcInst.nCores; i++)
        {
          CORE_TEAM_MAPPING[i] = siuVctRunParams->coreTeamMapping[i];
          inst->mcInst.coreTeamMapping[i] = siuVctRunParams->coreTeamMapping[i];
          if ( siuVctRunParams->coreTeamMapping[i] == SIU_VCT_GLOBAL_CORE_ID)
            taskCxt->coreId = i;
        }
        if ( taskCxt->coreId == -1)
        {
          for ( i = 0; i < inst->mcInst.nCores; i++)
          {
            CORE_TEAM_MAPPING[i] += siuContext.dspId*8;
            inst->mcInst.coreTeamMapping[i] += siuContext.dspId*8;
          }
     
          for ( i = 0; i < inst->mcInst.nCores; i++)
          {
            if ( inst->mcInst.coreTeamMapping[i] == SIU_VCT_GLOBAL_CORE_ID)
            {
              taskCxt->coreId = i;
            }
          }
        }
        taskCxt->numCores = inst->mcInst.nCores;
     
        if (SIU_VCT_DATA_PCIE_IN()) {
          siu_ipc_osal_multichip_open_codec_share(taskCxt->coreId);
          siu_ipc_osal_multichip_barrier_init(taskCxt->coreId);
        } else {
          /* Single chip for TFTP mode */
          if(first_time) {
              /* Attach between the master core and the slave cores */
              siu_ipc_osal_attach_owner(siuVctRunParams->nCores);
              if(taskCxt->coreId == 0) {
                  /* Master core create barrier, gate, and shared regions */
                  siu_ipc_osal_barrier_create(siuVctRunParams->nCores);
                  siu_ipc_osal_create_gate();
                  siu_ipc_osal_create_share(sizeof(SIUVID_MC_SHMEM_MemTab), (void**)&siuContext.ipcContext.codecShm);
     
              } else {
                  /* Slave core open barrier and shared regions */
                  siu_ipc_osal_barrier_open(CORE_TEAM_MAPPING[0]);
                  siu_ipc_osal_open_share((void**)&siuContext.ipcContext.codecShm);
              }
            first_time = 0;
          }
          /* Open gate */
          siu_ipc_osal_open_gate();
        }
     
        ret_val =  (tlong)xdm0p9_vid_enc_create(taskCxt,
                                                "H264ENC",
                                                codecStaticParams,
                                                codecDynamicParams,
                                                siuVctRunParams->nCores);
       while ( ret_val != 0 );         

    //ret_val =  xdm1p0_vid_dec_create(&mediaTaskCxt[1], "H264DEC", NULL, NULL);   // input params are internally coded.. no issues with static/dynamic params..
    //while ( ret_val != 0 );         
     
      } ///////////////////////////////////////////////////////
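
    The core-team mapping loop in the fragment above adds siuContext.dspId*8 to each local core number when the current core is not found in the raw mapping. A minimal sketch of that arithmetic, assuming 8 cores per device (which is what the dspId*8 term implies); the helper names are hypothetical and not part of SV04:

```c
#include <assert.h>

/* Assumption: 8 cores per C6678 device, matching the "dspId*8" term above. */
#define CORES_PER_DSP 8

/* Hypothetical helper: device-local core number -> board-wide core ID. */
static int global_core_id(int dsp_id, int local_core)
{
    assert(dsp_id >= 0);
    assert(local_core >= 0 && local_core < CORES_PER_DSP);
    return dsp_id * CORES_PER_DSP + local_core;
}

/* Hypothetical helper: index of my_global_id within the team mapping,
 * or -1 if this core is not a member of the team. */
static int find_team_index(const int *team, int n_cores, int my_global_id)
{
    int i;
    for (i = 0; i < n_cores; i++) {
        if (team[i] == my_global_id)
            return i;
    }
    return -1;
}
```

    For example, local core 3 on DSP 1 maps to global ID 11. A result of -1 from find_team_index corresponds to taskCxt->coreId staying at -1 in the fragment. The sketch returns the first match while the fragment keeps the last one, which is equivalent as long as the mapping has no duplicate entries.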

  • Hi Kali,

    The multi-chip configuration of the H264BP encoder should not cause creation of the second codec instance to fail. As in the JPEG2000 transcode demo, it is possible to create both H.264BP decoder and encoder instances. The failure you saw may be due to insufficient memory allocation. One way to debug this is to connect Code Composer Studio and step into "xdm1p0_vid_dec_create()" to find the root cause. To ease debugging in CCS, the optimization flags can be temporarily turned off by commenting out the following line in dsp\mkrel\c64x\makedefs.mk:

    C_OPT_FLAGS  = -mt -mw -os -o3 --optimize_with_debug --> #C_OPT_FLAGS  = -mt -mw -os -o3 --optimize_with_debug

    After this change, delete all the .oc files under the DSP directory and then rebuild sv04.
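
    On the insufficient-memory hypothesis, a quick sanity check before stepping through the create call is to total what the two instances request and compare it with the pool they share. The sketch below is only an illustration: MemRequest is a hypothetical record that loosely mirrors the size and alignment fields of an XDAIS IALG_MemRec, and the real SV04 allocator may place buffers differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record; loosely mirrors size/alignment of an IALG_MemRec. */
typedef struct {
    size_t size;       /* bytes requested */
    size_t alignment;  /* power of two; 0 or 1 means no alignment needed */
} MemRequest;

/* Round offset up to the next multiple of alignment. */
static size_t align_up(size_t offset, size_t alignment)
{
    if (alignment <= 1)
        return offset;
    assert((alignment & (alignment - 1)) == 0);  /* must be a power of two */
    return (offset + alignment - 1) & ~(alignment - 1);
}

/* Place the requests back to back starting at start; return the end offset. */
static size_t place_requests(const MemRequest *req, size_t count, size_t start)
{
    size_t offset = start;
    size_t i;
    for (i = 0; i < count; i++) {
        offset = align_up(offset, req[i].alignment);
        offset += req[i].size;
    }
    return offset;
}

/* Nonzero if encoder requests followed by decoder requests fit in the pool. */
static int both_instances_fit(const MemRequest *enc, size_t n_enc,
                              const MemRequest *dec, size_t n_dec,
                              size_t pool_size)
{
    size_t end = place_requests(dec, n_dec, place_requests(enc, n_enc, 0));
    return end <= pool_size;
}
```

    If each codec fits on its own but the pair does not, that matches the symptom described earlier in the thread (separate creation works, combined creation fails).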

    Thanks,

    Hongmei


  • Hi Hongmei,

    I have successfully created both H264 BP encoder and decoder instances on the DSPC8681E. Now I am planning to pass data from the encoder output to the decoder, or from the decoder output to the encoder. I have enabled both the encoder and decoder process calls, as shown in the source code below. Processing succeeds if we call only the encoder or only the decoder. If we call both, only the encoder succeeds and the decoder fails to decode.

    Both encoder and decoder instances are created successfully, as shown in my earlier post. Could you please suggest why the H264 BP decoder process call fails to decode when the encoder process call is also made?


    Thanks and Regards,
    Jagadeesh K

    //////////////H264 BP Encoder and Decoder processing/////////////////////////
      else if ( strcmp((const char *)siuVctRunParams->codecName,"H264ENC") == 0)
      {
        tlong ret_val = 0;

        /* Encode: inBufPtr[0] holds the source YUV, outBufPtr[0] receives the bitstream.
           ret_val is used below as the encoded size in bytes. */
        ret_val = xdm0p9_vid_enc_process(taskCxt, (XDAS_Int8 *)rxMsgProcess->inBufPtr[0], (XDAS_Int8 *)rxMsgProcess->outBufPtr[0]);
        siu_osal_wbinv_cache((tword *)rxMsgProcess->outBufPtr[0], ret_val, TRUE);
        siu_osal_inv_cache((tword *)rxMsgProcess->inBufPtr[0], 152064, TRUE);    // the input/output video format is CIF YUV420P (352x288)

        /* Decode the bitstream just produced back into inBufPtr[0] */
        ret_val = xdm1p0_vid_dec_process(&mediaTaskCxt[1], (XDAS_Int8 *)rxMsgProcess->outBufPtr[0], ret_val, (XDAS_Int8 *)rxMsgProcess->inBufPtr[0]);
        siu_osal_wbinv_cache((tword *)rxMsgProcess->inBufPtr[0], ret_val, TRUE);

        /* Copy the decoder output to the output buffer and return it to the host */
        siu_osal_inv_cache((void *)rxMsgProcess->inBufPtr[0], ret_val, TRUE);
        memcpy((tword *)rxMsgProcess->outBufPtr[0], (tword *)rxMsgProcess->inBufPtr[0], ret_val);
        siu_osal_wbinv_cache((void *)rxMsgProcess->outBufPtr[0], ret_val, TRUE);
        txMsgProcess->outBufSize[0] = ret_val;
        txMsgProcess->outBufPtr[0] = rxMsgProcess->outBufPtr[0];
        txMsgProcess->freeBufID[0] =
          txMsgProcess->outputId[0] = rxMsgProcess->inputId;
      }
    ///////////////////////////////////////////////////
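
    The hard-coded 152064 in the cache invalidate call above is the size in bytes of one CIF YUV420P frame. A small sketch of where that number comes from, so the same code can be reused for other resolutions; the function name is hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Bytes in one 8-bit planar YUV 4:2:0 frame: a full-resolution luma plane
 * plus two chroma planes subsampled by 2 in each direction. */
static size_t yuv420p_frame_bytes(size_t width, size_t height)
{
    assert(width % 2 == 0 && height % 2 == 0);
    return width * height + 2 * ((width / 2) * (height / 2));
}
```

    yuv420p_frame_bytes(352, 288) gives 152064, and the 720x576 SD case used later in the thread gives 622080, so a buffer or cache range sized for CIF would be too small for SD.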

  • Hi Hongmei,

    All issues related to H264 BP encoder and decoder processing on the DSPC8681E are resolved; both instances are running successfully.

    Could you please let us know when the next version of the MCSDK Video SDK will be released?

    Thanks,

    Kali

  • Hi Kali,

    Sorry for my late response. I was out of office the last two days.

    Glad to know that you have both codecs running successfully. What fix did you make to get the decoder to work? One thing I can think of is adding a decoder output buffer manager.

    The next version of MCSDK Video is planned for the end of January.

    Thanks,

    Hongmei

  • Hi Hongmei,

    To make the decoder work, I changed the output buffer management and the memory allocation used when creating the decoder instance.

    Now I am planning to create multiple H264 encoder/decoder instances for video processing on the DSPC8681E. I will let you know how it goes once I have started development.

    Thank you for all your support,

    Kali

  • Hi Hongmei,

    I have integrated the DSP version of the H264 transcoder (decoder/encoder) into my application (DSPC8681E, SV04 framework, data transfer using PCIe CMEM), and we are now testing it. We are seeing some performance issues in the number of frames processed per second.

    We tested the H264 BP encoder and decoder separately and found that the H264 BP decoder cannot process bitstreams fast enough for a real-time application. Our test observations follow.

    For our testing we are using CIF (352 x 288) and SD (720 x 576) resolutions.

    The following figures are for CIF resolution only. Each was measured separately (one codec running at a time).

    • When we run the H264 BP encoder, it encodes 15500 frames in less than 110 seconds (approx. 145 frames per second).

    • When we decode the same bitstream (the 15500 encoded frames above) with the H264 BP decoder, it takes more than 500 seconds, an average of 30 frames per second.

    If we go to SD (720x576) resolution, decoding plus encoding is very slow and we cannot achieve real-time bitstream processing (both encode and decode at 30 fps).
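
    The throughput figures above follow directly from frame count divided by wall-clock time. A minimal sketch of that arithmetic, using the frame counts and time bounds quoted in this post; the function names are hypothetical:

```c
#include <assert.h>

/* Average throughput over a run, in frames per second. */
static double frames_per_second(unsigned long frames, double seconds)
{
    assert(seconds > 0.0);
    return (double)frames / seconds;
}

/* Average wall-clock time spent per frame, in milliseconds. */
static double ms_per_frame(unsigned long frames, double seconds)
{
    assert(frames > 0);
    return 1000.0 * seconds / (double)frames;
}
```

    With 15500 frames, the 110-second bound for the encoder works out to roughly 141 fps (so "less than 110 seconds" means at least that), and the 500-second bound for the decoder works out to 31 fps, or about 32 ms per frame. Because these are host-side timings, they include PCIe transfer and file I/O as well as the codec itself.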

    I know the encoder is designed to work multi-core and multi-chip, but the decoder is not. Is there any way we can get better decoder performance (bitstream processing)?

    Is it possible to run the H264 BP decoder on multiple cores (2 or 4)?

    We are using the following versions of the H264 BP decoder/encoder libs for our application development. Were these built in release mode or debug mode?

    http://software-dl.ti.com/dsps/dsps_public_sw/codecs/C6678_Video_Codecs/01_00_001//exports/C66x_h264vdec_02_04_00_03_ELF.bin

    http://software-dl.ti.com/dsps/dsps_public_sw/codecs/C6678_Video_Codecs/01_00_001//exports/C66x_h264venc_01_24_00_01_ELF.bin

    Could you also please let us know when we can expect the next release of MCSDK Video and the video codecs?

    Thanks and Regards,

    Kali

  • Hi Kali,

    The latest C6678 video codecs can be found at http://software-dl.ti.com/dsps/dsps_public_sw/codecs/C6678/index.html. Performance data for individual codecs can be found in the corresponding datasheets. All the codec libs (including those from the previous version you are using) are compiled in release mode.

    The H264BP decoder is a single-core codec. Is it possible for you to use the H264HP decoder, which supports 2-core decoding for BP/MP/HP streams?

    As for performance, we expect the decoder to run faster than the encoder. With a CPU frequency of 1250 MHz, D1 transcoding (decoding + encoding) at 30 fps is expected to fit on a single core. What bit rate are you using during the performance evaluation? Where do you place your program memory? What are your cache settings? What build profile is used when compiling your .cfg file?
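
    To put the single-core expectation in numbers: at 1250 MHz and 30 fps, one core has about 41.6 million cycles per frame to spend on decode and encode combined. A minimal sketch of that budget check; the function names are hypothetical and the per-frame cycle counts would come from on-DSP measurement:

```c
#include <assert.h>
#include <stdint.h>

/* Cycles one core can spend on each frame while sustaining target_fps. */
static uint64_t cycles_per_frame_budget(uint64_t cpu_hz, uint32_t target_fps)
{
    assert(target_fps > 0);
    return cpu_hz / target_fps;
}

/* Nonzero if decode plus encode cycles for one frame fit in the budget. */
static int transcode_fits(uint64_t decode_cycles, uint64_t encode_cycles,
                          uint64_t cpu_hz, uint32_t target_fps)
{
    return decode_cycles + encode_cycles
           <= cycles_per_frame_budget(cpu_hz, target_fps);
}
```

    cycles_per_frame_budget(1250000000, 30) is 41666666, so a measured decode plus encode cost above that per frame means the transcode cannot hold 30 fps on one core regardless of host-side overhead.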

    We are in the final stage of releasing MCSDK Video. We will let you know once the GA release is available on Web.

    Thanks,

    Hongmei

  • Hi Hongmei,
    Thanks for sharing the new libraries.
    I will try to integrate the new H264 HP decoder and let you know. We always run the CPU at 1250 MHz for our testing.

    Please find my answers to your questions inline:

    What bit rate are you using during the performance evaluation?
    >> 4 Mbps for encoding the source YUV; for the transcoder, anywhere from 3 Mbps down to 250 kbps (one bit rate at a time). The BP encoder configuration file is attached, and the BP decoder configuration parameters are at the end of this post.

    Where do you place your program memory?
    >> The program memory for both the encoder and the decoder is placed in MSMC only.

    What are your cache settings?
    What build profile is used when compiling your .cfg file?
    >> We are using the default cache settings and build profile from TI's web release (MCSDK-VIDEO 02_01_00_03) to build the DSP (SV04) image.

    Please find the SV04 configuration files attached.


    // /ti/xdctools_3_22_04_46/xs xdc.tools.configuro -r release -o ../../mkrel/c64x/bioscfg -b ../../ggcfg/build/hdg/sv04/bios/config.bld -c /ti/TI_CGT_C6000_7.2.4 ../../ggcfg/build/hdg/sv04/bios/bios6.cfg


    H264 BP Decoder Dynamic parameters….
    /////////////////////////////////////////////////////////////////////////////
    /*================================ */
    /* TestApp_SetDynamicParams        */
    /*  setting of run time parameters */
    XDAS_Void TestApp_SetDynamicParams(IVIDDEC2_DynamicParams *dynamicParams) {

        IH264VDEC_DynamicParams *extParams =
                                    (IH264VDEC_DynamicParams *)dynamicParams;

       /* Set IVIDDEC Run time parameters */
        dynamicParams->size = sizeof(IH264VDEC_DynamicParams);

        dynamicParams->decodeHeader  = XDM_DECODE_AU; 

     /*-----------------------------------------------------
      * If Display width is set to 0, decoder takes value
      * of image width as Display width. Any non-zero value
      * will be honoured by the decoder.
      * Note: Displaywidth needs to be an even number.
      *       If display width is zero and cropping is
      *       specified in the stream, then cropped
      *      image width is taken as display width
        -----------------------------------------------------*/
      dynamicParams->displayWidth  = 0;  /*Supported: Set to default value. */


      dynamicParams->frameSkipMode = IVIDEO_NO_SKIP;
                                         /*Set to default value of no skipping */

      dynamicParams->frameOrder    = IVIDDEC2_DISPLAY_ORDER;

      dynamicParams->newFrameFlag  = XDAS_FALSE;

      dynamicParams->mbDataFlag    = XDAS_FALSE;

     // extParams->frameType  = IVIDEO_NA_FRAME;

      /*---------------------------------------------------------------------*/
        /* MB_ERROR_STAT                                                     */
      /* The global buffer size is passed to algorithm,                      */
      /* The error status for each MB of the frame is copied to the buffer,  */
      /* returned as output bufDesc[3]    .                                  */
      /* Init the flag with TRUE to get these datas                          */
      /*---------------------------------------------------------------------*/
        extParams->mbErrorBufFlag  = FALSE;    /* ; TRUE; */
        //extParams->mbErrorBufSize  = ((352 * 288)/256);

      /*---------------------------------------------------------------------*/
      /* Collecting the status of SEI and VUI structure and return the       */
      /* pointer through output bufDesc[4]                                   */
      /* Init the flag with TRUE to get these datas                          */
      /*---------------------------------------------------------------------*/
        extParams->Sei_Vui_parse_flag = FALSE;

      /* Should be set to zero for byte stream format */
        extParams->numNALunits = 0;

      /* For handling B-frames, maxDisplayDelay of 1 is mandatory.
         Due to frame reordering allowed in the H264 standard, some streams
         might require a max delay of 5 - 16 (depending on the image size).
         However, for most use cases a delay of 1 suffices.
         Note: if maxDisplayDelay is set to "N", then OUTPUT_BUFFER_CNT should
         be "N+1"
      */
        extParams->maxDisplayDelay = 1;

        return;
    }
    /* nothing @ this point */
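
    Two sizing rules appear in the comments of the listing above: the commented-out mbErrorBufSize of (352 * 288)/256, which is one entry per 16x16 macroblock, and the note that OUTPUT_BUFFER_CNT should be maxDisplayDelay + 1. A minimal sketch of both, with hypothetical function names; the rounding up to whole macroblocks is an assumption for frame sizes that are not multiples of 16:

```c
#include <assert.h>

/* Number of 16x16 macroblocks in a frame, rounding partial blocks up. */
static unsigned macroblocks_per_frame(unsigned width, unsigned height)
{
    assert(width > 0 && height > 0);
    return ((width + 15) / 16) * ((height + 15) / 16);
}

/* Output buffers needed for a given display delay, per the "N+1" note. */
static unsigned output_buffer_count(unsigned max_display_delay)
{
    return max_display_delay + 1;
}
```

    For CIF this gives 396 macroblocks, the same value as (352 * 288)/256, and 1620 for 720x576. With maxDisplayDelay set to 1 as in the listing, two output buffers are needed.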

    1602.codecParams.cfg

    1817.sv04.tar.gz

  • Hi Kali,

    Thanks for the files and all the details. They look fine to me.

    How do you measure the FPS performance: from the host or inside the DSP? Could you please use TSCL around the codec process call to measure the cycles taken by encoding or decoding itself? For example, (TSCL_End - TSCL_Begin) in the code below gives the number of cycles taken by the process call. We can then compare this number with your earlier results.

    TSCL_Begin = TSCL;

    /* Basic Algorithm process() call */
    retVal = ividEncfxns->process((IVIDENC2_Handle)cxt->algHandle,
    (IVIDEO2_BufDesc *)&inputBufDesc,
    (XDM2_BufDesc *)&outputBufDesc,
    (IVIDENC2_InArgs *)inArgs,
    (IVIDENC2_OutArgs *)outArgs);

    TSCL_End = TSCL;
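
    The raw cycle counts from the snippet above can be turned into an average frame rate as sketched below. TSCL is the low 32 bits of the time-stamp counter, so at 1250 MHz it wraps roughly every 3.4 seconds; unsigned 32-bit subtraction still gives the right difference across a single wrap. The struct and function names are hypothetical, and the counter reads are passed in as plain values so the sketch does not depend on the TI headers. As far as I recall, the counter also has to be started once by writing any value to TSCL, which is worth checking against the C66x CPU guide.

```c
#include <assert.h>
#include <stdint.h>

/* Cycles between two 32-bit counter reads; modulo-2^32 arithmetic keeps the
 * result correct even if the counter wrapped once between the reads. */
static uint32_t elapsed_cycles(uint32_t begin, uint32_t end)
{
    return (uint32_t)(end - begin);
}

/* Running totals over many process() calls. */
typedef struct {
    uint64_t total_cycles;
    uint32_t frames;
} CycleStats;

/* Record one process() call from its begin/end counter reads. */
static void cycle_stats_add(CycleStats *s, uint32_t begin, uint32_t end)
{
    s->total_cycles += elapsed_cycles(begin, end);
    s->frames++;
}

/* Average frames per second the codec alone could sustain at cpu_hz. */
static double average_fps(const CycleStats *s, double cpu_hz)
{
    assert(s->frames > 0 && s->total_cycles > 0);
    return cpu_hz * (double)s->frames / (double)s->total_cycles;
}
```

    In use, TSCL_Begin and TSCL_End from each frame would be passed to cycle_stats_add, and average_fps(&stats, 1.25e9) reported at the end of the run. Comparing that figure with the host-side FPS separates codec time from PCIe and host overhead.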

    Thanks,

    Hongmei

  • Hi Hongmei,
    
    I was on vacation last week, and I am now working only on the host-side application development.

    Thanks for the suggestion, and sorry for the delayed response.

    Regarding the H264 BP/MP decoder performance measurement: all measurements were taken on the host only. For low bit rates and resolutions we do not see any issues; we have observed that processing is slow at higher resolutions (VGA, D1) and higher bit rates (>1 Mbps). For a real-time transcoder application, we expect to need more than 40 fps of processing (decode + encode) to achieve real-time performance of 27 fps video (allowing for all jitter).

    We have integrated the new H264 BP/MP decoder already, but not the new H264 BP encoder. We will integrate the new encoder later and let you know.

    Thanks and Regards,
    Kali
    

  • Hi Kali,

    Please find MCSDK Video 2.1 GA release @ http://software-dl.ti.com/sdoemb/sdoemb_public_sw/mcsdk_video/latest/index_FDS.html

    Thanks,

    Hongmei

  • Thanks Hongmei,

    We will look into the new MCSDK Video and let you know if we face any issues.

    Could you please let us know whether TI has any plans to deliver an SVC encoder/decoder (Scalable Video Coding, H.264 Annex G) on the C6678?

    Once again thanks for sharing the new sdk.

    Thanks and Regards,

    Kali

  • Hi Kali,

    You are more than welcome.

    As for the SVC encoder/decoder, we currently have no plans for C6678.

    Thanks,

    Hongmei