HEVC encoder questions

Hi,

I have the following HEVC encoder questions, which I could not easily answer by analyzing the MCSDK video demo:

1) Please explain how IVIDMC3_t.user_id should be set for the HEVC encoder, including in the multi-DSP scenario.

According to http://e2e.ti.com/support/embedded/multimedia_software_codecs/f/356/p/362319/1287772.aspx#1287772, in the H.264 HP decoder this parameter should be set to DNUM, but if I do so for the HEVC encoder it hangs.

2) Can I run the multi-DSP HEVC encoder on 4 cores of DSP#0 and on 8 cores of DSP#1? What are the limitations?

3) Which parameters, and with what values, should be passed to the process calls on the slave and local master cores for the HEVC encoder?

For example for H.264 HP encoder same input buffer as on master and output args should be provided on slave:

result = encoderFunctions->process(encoderHandle, NULL, &encoderComponentSharedData->outBufferDescriptor, NULL, outEncoderArguments);

For H.264 HP decoder only output args should be provided on slave:

result = decoderFunctions->process(decoderHandle, NULL, NULL, NULL, outDecoderArguments);

4) Should the buffer with YUV data be present on the master DSP only, or should it be provided on the slave DSPs as well?

Regards,

Andrey Lisnevich

  • Hi Andrey,

    Please find the answers below, numbered to match your questions.

    1) IVIDMC3_t.user_id does not correspond to the core ID.

       For example, if the encoder runs on 2 chips, IVIDMC3_t.user_id takes the values {0,1,2,...,14,15}.

    2) On a single chip, the core IDs can start anywhere.

       On multiple chips, the core numbers must be contiguous and must start at 0.

       For example, on 2 chips the encoder can run on {0,1,...,10,11}, but not on {4,5,...,14,15}.

    3) A NULL parameter cannot be passed on either the master or the slave core.

    4) Yes, the input buffer needs to be provided on all chips (master and slave).
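The numbering rule in answers 1) and 2) can be illustrated with a small sketch. It assumes 8 cores per chip (the figure used elsewhere in this thread) and that the logical user ID is the chip index times 8 plus the core index, which matches the user_ids=0,8,16 seen in the logs later in the thread but is not confirmed by the codec documentation. Both function names are hypothetical helpers and not part of the codec API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CORES_PER_CHIP 8 /* assumption: 8-core DSPs, as discussed in this thread */

/* Hypothetical helper: logical user ID for a core on a logical chip. */
int logical_user_id(int chip_index, int core_index)
{
    assert(chip_index >= 0);
    assert(core_index >= 0 && core_index < CORES_PER_CHIP);
    return chip_index * CORES_PER_CHIP + core_index;
}

/* Returns true only if ids[] is exactly 0, 1, ..., count - 1,
 * i.e. contiguous and starting at 0, as required for multi-chip runs. */
bool user_ids_contiguous_from_zero(const int *ids, size_t count)
{
    if (ids == NULL || count == 0) {
        return false;
    }
    for (size_t i = 0; i < count; ++i) {
        if (ids[i] != (int)i) {
            return false;
        }
    }
    return true;
}
```

With these helpers, the set {0,1,...,10,11} passes the check and {4,5,...,14,15} does not, mirroring the two examples in answer 2).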

    Regards

    Kuladeepak

  • Thanks Kuladeepak,

    Can you please answer additional questions:


    1) So in the multi-chip case I must use all 8 cores of the first chip, starting from core#0, and only then can I use cores of the next chip, also starting from core#0. It is therefore possible to run the encoder on 10 cores: all 8 cores of the first chip and the first 2 cores of the second chip. Am I right?

    2) Can I configure DSP#2 as first chip and DSP#0 as second chip on DSPC-868x cards?


    3) Regarding the output buffer in the multi-chip scenario: how should it be constructed to work on multiple chips? Should it be allocated in shared memory available to all the chips?

    4) Is all of the above true for Full HD (i.e. 2 or 3 DSPs) and 4k (i.e. about 8 DSPs) real-time transcoding scenarios?

    Regards,

    Andrey Lisnevich

  • Hi Andrey,

    Please find the answers to the respective questions.

    1) Your understanding is correct. Yes, it is possible to encode on 10 cores.

    2) Yes, DSP#2 can be configured as the first chip and DSP#0 as the second chip on DSPC-868x cards.

    3) Yes, it should be allocated in shared memory available to all the chips.

    4) All of the above is true for 1080p. As we have a memory issue with 4K, it will be supported in an upcoming release.

    Regards

    Kuladeepak

  • Hi Kuladeepak,

    I have a question regarding the input buffer.

    It should be shared between all the instances of the multi-DSP encoder. What is the preferred way to share the input buffer: put it into memory shared between the DSPs, or copy it to the DDR3 memory of each DSP?

    Which way is used in the MCSDK Video Demo?

    Regards,

    Andrey  Lisnevich

  • Hi Andrey,

    Instead of keeping the YUV data in a shared buffer, it is better to replicate the input YUV on all DSPs. A shared buffer incurs higher access latency than direct DDR access.

    In MCSDK the input is replicated in the DDR of every DSP.
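As a rough illustration of the replication approach, the sketch below copies one frame into a separate buffer per DSP. It assumes 8-bit planar YUV 4:2:0 input, which the thread does not state, and uses memcpy as a stand-in for whatever PCIe or DMA transfer a real host application would use. The names yuv420_frame_bytes, replicate_frame, and dsp_ddr are hypothetical and are not taken from MCSDK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bytes in one 8-bit planar YUV 4:2:0 frame (assumed input format). */
size_t yuv420_frame_bytes(size_t width, size_t height)
{
    size_t luma = width * height;
    /* Each chroma plane is subsampled by 2 in both directions, rounded up. */
    size_t chroma = ((width + 1) / 2) * ((height + 1) / 2);
    return luma + 2 * chroma;
}

/*
 * Hypothetical helper: copy one input frame into a private buffer per DSP.
 * dsp_ddr[i] stands in for a host-visible mapping of the DDR3 of DSP i;
 * a real system would likely use a PCIe or DMA transfer instead of memcpy.
 */
void replicate_frame(const uint8_t *frame, size_t frame_bytes,
                     uint8_t *const dsp_ddr[], size_t num_dsps)
{
    assert(frame != NULL && dsp_ddr != NULL);
    for (size_t i = 0; i < num_dsps; ++i) {
        assert(dsp_ddr[i] != NULL);
        memcpy(dsp_ddr[i], frame, frame_bytes);
    }
}
```

Under the 4:2:0 assumption a 1920x1080 frame occupies 3,110,400 bytes, so replicating it on three DSPs moves roughly 9.3 MB per frame in exchange for local DDR access on every chip.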

    Regards

    Rama

  • Hi,

    I have a few more questions regarding the HEVC encoder running on multiple DSPs.

    1) How should I allocate memory on user=0 side and user=8 side?

    user0 log:
    shmem name=shared_mem_CABAC_Context01 user_id=0 num_users=2 user_ids=0,8 type=DDR_CACHED size=1600 alignment=128

    user 8 log:
    shmem name=shared_mem_CABAC_Context01 user_id=8 num_users=2 user_ids=0,8 type=DDR_CACHED size=1600 alignment=128

    This is memory shared between DSPs. But to what user this memory is local and to what user it is remote?

    2) What are RMT_* memory types?

    I understand only RMT_UNCACHED_NO_LOC:
    shmem name=shared_NAL_info_rmt user_id=0 num_users=3 user_ids=0,8,16 type=RMT_UNCACHED_NO_LOC size=3504 alignment=128

    It is memory shared between all 3 cores. Not cached. I allocate it on HOST and map to all 3 DSPs.

    Particularly I do not understand why this memory is RMT:
    shmem name=shared_mem_STM00 user_id=0 num_users=8 user_ids=0,1,2,3,4,5,6,7 type=RMT_UNCACHED_LOC_SL2 size=5668 alignment=128
    It is not remote and used only on core 0 actually.

    3) Please review log (encoder running on 3 DSPs):

    https://drive.google.com/file/d/0Byw88ezNrM71VHh0LUxxdzg2Ulk/view?usp=sharing

    The encoder loops indefinitely (hangs), acquiring a lock and doing nothing else, after sending mailbox messages on all cores. What is the encoder waiting for?

    In this demo I implemented all local memories and the RMT_UNCACHED_NO_LOC memory. The other cross-DSP memory types are not implemented. But as far as I can see, all of them are cacheable (DDR_CACHED), and the encoder does not write them back or invalidate them, so they are not used to communicate between cores before this hang.

    In this demo the mailbox works between DSPs without interrupts, just by using shared HOST memory. Is this a correct implementation? Is the encoder hanging because it is waiting for an interrupt before reading mailbox messages?

    Regards,
    Andrey Lisnevich
  • Hi Andrey,

    Please find the answers below:

    1.

    Andrey Lisnevich said:

     But to what user this memory is local and to what user it is remote?

    It can be local to chip 0 / user 0 and remote to chip 1 / user 8, and vice versa. (Both memories are PCIe/HyperLink-mapped inter-chip memory.)

    2. Uncached SL2 memory doesn't have a separate macro, so it is allocated under RMT_UNCACHED_LOC_SL2. (It is uncached local SL2 memory, not remote.)

    3. All inter-DSP memories are essential for the encoder to function. All memories requested as DDR_CACHED are allocated in inter-chip memory if their users are determined to span chips. If you have not implemented all the memories, please consider running tile-based encoding. If you do not want a multi-tiled output, inter-chip memories and interrupt generation on the arrival of new mail across chips are essential. Otherwise the codec will hang waiting for the mail.
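One way to decide whether a requested buffer needs inter-chip memory is to check whether its user IDs fall on more than one chip. The sketch below assumes 8 cores per chip, so that the chip index is the user ID divided by 8. That matches the user_ids values in the logs above but is an assumption, and users_span_chips is a hypothetical helper, not a codec or MCSDK function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CORES_PER_CHIP 8 /* assumption: 8-core DSPs, as discussed in this thread */

/* Returns true if the listed user IDs belong to more than one chip,
 * in which case the buffer would have to live in inter-chip memory. */
bool users_span_chips(const int *user_ids, size_t num_users)
{
    if (user_ids == NULL || num_users == 0) {
        return false;
    }
    assert(user_ids[0] >= 0);
    int first_chip = user_ids[0] / CORES_PER_CHIP;
    for (size_t i = 1; i < num_users; ++i) {
        assert(user_ids[i] >= 0);
        if (user_ids[i] / CORES_PER_CHIP != first_chip) {
            return true;
        }
    }
    return false;
}
```

Applied to the logs in this thread, user_ids=0,8 (shared_mem_CABAC_Context01) spans chips, while user_ids=0,1,2,3,4,5,6,7 (shared_mem_STM00) stays on one chip.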

    Thanks and Regards,
    Shashikantha