C66x H264 HP 4-core decoder questions

Other Parts Discussed in Thread: TMS320C6678

Hi,

Our customer is trying to use the shared engineering release and has some questions about the callback functions. I was wondering if the codec team could take a look at his questions and help me clarify them. Please see the questions below:

1)      KeyCreate callback: Where can we find documentation of the error flow in the keyCreate callback? The samples use "while(1);" for the error flow, but hanging the core is not an option for our application. Is there any other way to handle error situations?

2)      Lock acquire: The brief description says that it returns 1 if the lock is successfully acquired and 0 if the lock is already held, and that it is a non-blocking callback. However, in the sample application that comes with the decoder, the lock appears to be implemented differently: the callback is blocking, and it returns 0 if the lock is acquired and non-zero otherwise.

Version release codec: C66x_h264hpvdec_01_01_02_03_ELF

Thank you,

Paula

  • Hi Paula,

    The first issue is not critical. As for the second issue, as far as I can see the documentation is wrong: it is a spinlock, so it should be blocking. I implemented it in a similar way to the sample.
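The two lock conventions discussed above can be contrasted in a minimal sketch. It assumes C11 atomics (stdatomic.h) are available on the toolchain, and the names demo_lock_t, demo_lock_try_acquire, demo_lock_acquire_blocking and demo_lock_release are hypothetical; they are not the decoder's callback signatures. On a real multicore device the flag would also have to live in memory that all cores see coherently (or be replaced by a hardware semaphore), which this sketch does not address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdatomic.h>

/* Hypothetical lock object; NOT the decoder's real handle type. */
typedef struct {
    atomic_flag held;
} demo_lock_t;

#define DEMO_LOCK_INIT { ATOMIC_FLAG_INIT }

/* Convention described in the documentation: non-blocking,
 * returns 1 if the lock was acquired, 0 if it was already held. */
static int demo_lock_try_acquire(demo_lock_t *lock)
{
    assert(lock != NULL);
    /* test_and_set returns the previous value: false means we got the lock. */
    return atomic_flag_test_and_set_explicit(&lock->held, memory_order_acquire) ? 0 : 1;
}

/* Convention used by the sample application: blocking spinlock,
 * returns 0 once the lock has been acquired. */
static int demo_lock_acquire_blocking(demo_lock_t *lock)
{
    assert(lock != NULL);
    while (atomic_flag_test_and_set_explicit(&lock->held, memory_order_acquire)) {
        /* spin until the current owner releases the lock */
    }
    return 0;
}

/* Release the lock; returns 0 to match the "0 means success" style. */
static int demo_lock_release(demo_lock_t *lock)
{
    assert(lock != NULL);
    atomic_flag_clear_explicit(&lock->held, memory_order_release);
    return 0;
}
```

Note that the two acquire functions return opposite values on success, so mixing the documented convention with the sample's convention would invert the meaning of the result.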

    Now I have another issue: algInit fails and returns -1, and I don't know where the problem is. The basic configuration is the same one that works with the 01.01.01.04 decoder. Only the multicore configuration is different: the new IVIDMC3_t and the coreTeam array.

    Can you point out why initialization fails?

    Details:

    This is the log when I try to initialize in single-core mode on core#0:

    [2013-09-06 00:04:38] main() on DSP=1 CORE=0
    [2013-09-06 00:04:38] Creating decoder task for decoder #0
    [2013-09-06 00:04:38] Initializing master decoder 0
    [2013-09-06 00:04:38] width=704
    [2013-09-06 00:04:38] height=576
    [2013-09-06 00:04:38] user_id=0
    [2013-09-06 00:04:38] num_users=1
    [2013-09-06 00:04:38] task_ID=0
    [2013-09-06 00:04:38] core team:
    [2013-09-06 00:04:38] core #0
    [2013-09-06 00:04:38] keyCreate name='Lock buffer' userid=0 keyspace=2 numusers=1
    [2013-09-06 00:04:38] result=c043900
    [2013-09-06 00:04:38] lockAcquire userid=0 handle=c043900
    [2013-09-06 00:04:38] result=0
    [2013-09-06 00:04:38] keyCreate name='barrier 0 of CoreGroup0' userid=0 keyspace=1 numusers=1
    [2013-09-06 00:04:38] result=c043980
    [2013-09-06 00:04:38] keyCreate name='barrier 1 of CoreGroup0' userid=0 keyspace=1 numusers=1
    [2013-09-06 00:04:38] result=c043a00
    [2013-09-06 00:04:38] keyCreate name='barrier 2 of CoreGroup0' userid=0 keyspace=1 numusers=1
    [2013-09-06 00:04:38] result=c043a80
    [2013-09-06 00:04:38] keyCreate name='barrier 3 of CoreGroup0' userid=0 keyspace=1 numusers=1
    [2013-09-06 00:04:38] result=c043b00
    [2013-09-06 00:04:38] keyCreate name='barrier 4 of CoreGroup0' userid=0 keyspace=1 numusers=1
    [2013-09-06 00:04:38] result=c043b80
    [2013-09-06 00:04:38] keyCreate name='barrier 5 of CoreGroup0' userid=0 keyspace=1 numusers=1
    [2013-09-06 00:04:38] result=c043c00
    [2013-09-06 00:04:38] keyCreate name='barrier 6 of CoreGroup0' userid=0 keyspace=1 numusers=1
    [2013-09-06 00:04:38] result=c043c80
    [2013-09-06 00:04:38] keyCreate name='shmapSL2 handle0 of CG0' userid=0 keyspace=0 numusers=1
    [2013-09-06 00:04:38] result=c043d80
    [2013-09-06 00:04:38] keyCreate name='shmapSL2 handle1 of CG0' userid=0 keyspace=0 numusers=1
    [2013-09-06 00:04:38] result=c043e80
    [2013-09-06 00:04:38] keyCreate name='shmapDDR 2coreHandle of CG0' userid=0 keyspace=0 numusers=1
    [2013-09-06 00:04:38] result=c043f00
    [2013-09-06 00:04:38] keyCreate name='shmapDDR 4coreHandle' userid=0 keyspace=0 numusers=1
    [2013-09-06 00:04:38] result=c043f80
    [2013-09-06 00:04:38] keyCreate name='shmapDDR UncachedHandle' userid=0 keyspace=0 numusers=1
    [2013-09-06 00:04:38] result=c044000
    [2013-09-06 00:04:38] lockRelease userid=0 handle=c043900
    [2013-09-06 00:04:38] result=0
    [2013-09-06 00:04:38] barrWait userid=0 handle=c043b00
    [2013-09-06 00:04:38] barrWait userid=0 handle=c043b80
    [2013-09-06 00:04:38] barrWait userid=0 handle=c043c00
    [2013-09-06 00:04:38] barrWait userid=0 handle=c043c80
    [2013-09-06 00:04:38] algInit failed: -1
    [2013-09-06 00:04:38] Failed to create decoder

    As you can see, all callback calls are logged and none reports an error, but algInit fails for an unknown reason.

    The situation is interesting when I try to initialize the decoder on two cores: the core#0 master hangs waiting on a barrier, and the core#1 slave just fails.

    Moreover, the userid parameter on core#1 is wrong for all the callbacks: the decoder sometimes passes task_ID as userid and sometimes passes zero.
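The master hang described above is what a counting barrier does when one participant never arrives. Below is a minimal sketch, assuming C11 atomics; demo_barrier_t and its functions are hypothetical names and a one-shot design chosen for illustration, not the barrWait callback from the sample application.

```c
#include <assert.h>
#include <stddef.h>
#include <stdatomic.h>

/* Hypothetical one-shot barrier; not the decoder's barrier handle. */
typedef struct {
    atomic_int arrived;   /* participants that have reached the barrier */
    int        num_users; /* participants expected (numusers in the log) */
} demo_barrier_t;

static void demo_barrier_init(demo_barrier_t *b, int num_users)
{
    assert(b != NULL && num_users > 0);
    atomic_init(&b->arrived, 0);
    b->num_users = num_users;
}

/* Register this participant; returns how many have arrived so far. */
static int demo_barrier_arrive(demo_barrier_t *b)
{
    return atomic_fetch_add_explicit(&b->arrived, 1, memory_order_acq_rel) + 1;
}

/* Returns 1 once every expected participant has arrived. */
static int demo_barrier_complete(demo_barrier_t *b)
{
    return atomic_load_explicit(&b->arrived, memory_order_acquire) >= b->num_users;
}

/* Blocking wait: spins forever if any participant never arrives. */
static void demo_barrier_wait(demo_barrier_t *b)
{
    demo_barrier_arrive(b);
    while (!demo_barrier_complete(b)) {
        /* spin */
    }
}
```

With num_users=1 the wait returns immediately, which matches the single-core log. With num_users=2, a slave that fails before reaching the barrier leaves the master spinning.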

    Logs for both cores:

    core#0:

    [2013-09-06 00:09:50] main() on DSP=1 CORE=0
    [2013-09-06 00:09:50] Creating decoder task for decoder #0
    [2013-09-06 00:09:50] Initializing master decoder 0
    [2013-09-06 00:09:50] width=704
    [2013-09-06 00:09:50] height=576
    [2013-09-06 00:09:50] user_id=0
    [2013-09-06 00:09:50] num_users=2
    [2013-09-06 00:09:50] task_ID=0
    [2013-09-06 00:09:50] core team:
    [2013-09-06 00:09:50] core #0
    [2013-09-06 00:09:50] core #1
    [2013-09-06 00:09:50] keyCreate name='Lock buffer' userid=0 keyspace=2 numusers=2
    [2013-09-06 00:09:50] result=c043900
    [2013-09-06 00:09:50] lockAcquire userid=0 handle=c043900
    [2013-09-06 00:09:50] result=0
    [2013-09-06 00:09:50] keyCreate name='barrier 0 of CoreGroup0' userid=0 keyspace=1 numusers=2
    [2013-09-06 00:09:50] result=c043980
    [2013-09-06 00:09:50] keyCreate name='barrier 1 of CoreGroup0' userid=0 keyspace=1 numusers=2
    [2013-09-06 00:09:50] result=c043a00
    [2013-09-06 00:09:50] keyCreate name='barrier 2 of CoreGroup0' userid=0 keyspace=1 numusers=2
    [2013-09-06 00:09:50] result=c043a80
    [2013-09-06 00:09:50] keyCreate name='barrier 3 of CoreGroup0' userid=0 keyspace=1 numusers=2
    [2013-09-06 00:09:50] result=c043b00
    [2013-09-06 00:09:50] keyCreate name='barrier 4 of CoreGroup0' userid=0 keyspace=1 numusers=2
    [2013-09-06 00:09:50] result=c043b80
    [2013-09-06 00:09:50] keyCreate name='barrier 5 of CoreGroup0' userid=0 keyspace=1 numusers=2
    [2013-09-06 00:09:50] result=c043c00
    [2013-09-06 00:09:50] keyCreate name='barrier 6 of CoreGroup0' userid=0 keyspace=1 numusers=2
    [2013-09-06 00:09:50] result=c043c80
    [2013-09-06 00:09:50] keyCreate name='shmapSL2 handle0 of CG0' userid=0 keyspace=0 numusers=2
    [2013-09-06 00:09:50] result=c043d80
    [2013-09-06 00:09:50] keyCreate name='shmapSL2 handle1 of CG0' userid=0 keyspace=0 numusers=2
    [2013-09-06 00:09:50] result=c043e80
    [2013-09-06 00:09:50] keyCreate name='shmapDDR 2coreHandle of CG0' userid=0 keyspace=0 numusers=2
    [2013-09-06 00:09:50] result=c043f00
    [2013-09-06 00:09:50] keyCreate name='shmapDDR 4coreHandle' userid=0 keyspace=0 numusers=2
    [2013-09-06 00:09:50] result=c043f80
    [2013-09-06 00:09:50] keyCreate name='shmapDDR UncachedHandle' userid=0 keyspace=0 numusers=2
    [2013-09-06 00:09:50] result=c044000
    [2013-09-06 00:09:50] lockRelease userid=0 handle=c043900
    [2013-09-06 00:09:50] result=0
    [2013-09-06 00:09:50] shmMap userid=0 handle=c043d80
    [2013-09-06 00:09:50] result=c043d00
    [2013-09-06 00:09:50] shmMap userid=0 handle=c043e80
    [2013-09-06 00:09:50] result=c043e00
    [2013-09-06 00:09:50] shmMap userid=0 handle=c044000
    [2013-09-06 00:09:50] result=80fbaa00
    [2013-09-06 00:09:50] barrWait userid=0 handle=c043b00

    core#1:

    [2013-09-06 00:09:50] main() on DSP=1 CORE=1
    [2013-09-06 00:09:50] Creating decoder task for decoder #0
    [2013-09-06 00:09:50] Initializing slave decoder 0
    [2013-09-06 00:09:50] width=704
    [2013-09-06 00:09:50] height=576
    [2013-09-06 00:09:50] user_id=1
    [2013-09-06 00:09:50] num_users=2
    [2013-09-06 00:09:50] task_ID=2
    [2013-09-06 00:09:50] core team:
    [2013-09-06 00:09:50] core #0
    [2013-09-06 00:09:50] core #1
    [2013-09-06 00:09:50] keyCreate name='Lock buffer' userid=0 keyspace=2 numusers=2
    [2013-09-06 00:09:50] result=c043900
    [2013-09-06 00:09:50] lockAcquire userid=0 handle=c043900
    [2013-09-06 00:09:50] result=0
    [2013-09-06 00:09:50] keyCreate name='barrier 0 of CoreGroup0' userid=2 keyspace=1 numusers=2
    [2013-09-06 00:09:50] result=c043980
    [2013-09-06 00:09:50] keyCreate name='barrier 1 of CoreGroup0' userid=2 keyspace=1 numusers=2
    [2013-09-06 00:09:50] result=c043a00
    [2013-09-06 00:09:50] keyCreate name='barrier 2 of CoreGroup0' userid=2 keyspace=1 numusers=2
    [2013-09-06 00:09:50] result=c043a80
    [2013-09-06 00:09:50] keyCreate name='barrier 3 of CoreGroup0' userid=2 keyspace=1 numusers=2
    [2013-09-06 00:09:50] result=c043b00
    [2013-09-06 00:09:50] keyCreate name='barrier 4 of CoreGroup0' userid=2 keyspace=1 numusers=2
    [2013-09-06 00:09:50] result=c043b80
    [2013-09-06 00:09:50] keyCreate name='barrier 5 of CoreGroup0' userid=2 keyspace=1 numusers=2
    [2013-09-06 00:09:50] result=c043c00
    [2013-09-06 00:09:50] keyCreate name='barrier 6 of CoreGroup0' userid=2 keyspace=1 numusers=2
    [2013-09-06 00:09:50] result=c043c80
    [2013-09-06 00:09:50] keyCreate name='shmapSL2 handle0 of CG0' userid=2 keyspace=0 numusers=2
    [2013-09-06 00:09:50] result=c043d80
    [2013-09-06 00:09:50] keyCreate name='shmapSL2 handle1 of CG0' userid=2 keyspace=0 numusers=2
    [2013-09-06 00:09:50] result=c043e80
    [2013-09-06 00:09:50] keyCreate name='shmapDDR 2coreHandle of CG0' userid=2 keyspace=0 numusers=2
    [2013-09-06 00:09:50] result=c043f00
    [2013-09-06 00:09:50] keyCreate name='shmapDDR 4coreHandle' userid=2 keyspace=0 numusers=2
    [2013-09-06 00:09:50] result=c043f80
    [2013-09-06 00:09:50] keyCreate name='shmapDDR UncachedHandle' userid=2 keyspace=0 numusers=2
    [2013-09-06 00:09:50] result=c044000
    [2013-09-06 00:09:50] lockRelease userid=0 handle=c043900
    [2013-09-06 00:09:50] result=0
    [2013-09-06 00:09:50] algInit failed: -1
    [2013-09-06 00:09:50] Failed to create decoder

    Regards,

    Andrey Lisnevich

  • A demo that reproduces the failure is attached. It is almost the same as the one in http://e2e.ti.com/support/embedded/multimedia_software_codecs/f/356/t/277030.aspx with minimal changes to support the new multicore API. Use the detailed description and instructions from that post to run the project (it runs on the first 3 cores).

    The configuration is the same as in the parent project, and the callbacks do not return any errors (see the log above), but initialization fails (algInit returns -1).

    As you can see, almost the same code works with the old decoder without API errors.

    testh264dec.zip
  • Andrey, the memTab memory requirements changed from the 2-core to the 4-core decoder. The 4-core implementation now has 31 memTabs (compared with 23 on 2-core). I think the error is related to memTab allocation. Comparing alg_malloc.c (TestApplication) between 2-core and 4-core, I see that we now have different externalDataMemory sections per core. That might be the reason I am getting a "Resource conflict exception" in the Algorithm_create() function when running your project. If you have a chance, could you please take a look at the _ALG_allocMemory function in the TestApplication and let me know?

    Thank you,

    Paula

  • Hi Paula,

    I saw that the 4-core decoder requires 31 memTabs, and the demo already supports up to 32 memTabs. Also, I do not see problems with the memTab allocations:

    [2013-09-10 10:24:44] memTab[0] space=17 attrs=1 size=5192 alignment=128
    [2013-09-10 10:24:44] address=80539100
    [2013-09-10 10:24:44] memTab[1] space=0 attrs=0 size=9032 alignment=128
    [2013-09-10 10:24:44] address=800000
    [2013-09-10 10:24:44] memTab[2] space=0 attrs=0 size=62784 alignment=128
    [2013-09-10 10:24:44] address=802380
    [2013-09-10 10:24:44] memTab[3] space=0 attrs=0 size=13600 alignment=128
    [2013-09-10 10:24:44] address=811900
    [2013-09-10 10:24:44] memTab[4] space=17 attrs=1 size=8856 alignment=128
    [2013-09-10 10:24:44] address=8054d580
    [2013-09-10 10:24:44] memTab[5] space=17 attrs=1 size=176 alignment=128
    [2013-09-10 10:24:44] address=80562880
    [2013-09-10 10:24:44] memTab[6] space=17 attrs=1 size=24 alignment=256
    [2013-09-10 10:24:44] address=80562a00
    [2013-09-10 10:24:44] memTab[7] space=17 attrs=1 size=361152 alignment=256
    [2013-09-10 10:24:44] address=80562b00
    [2013-09-10 10:24:44] memTab[8] space=17 attrs=1 size=655360 alignment=256
    [2013-09-10 10:24:44] address=805bae00
    [2013-09-10 10:24:44] memTab[9] space=17 attrs=1 size=144 alignment=256
    [2013-09-10 10:24:44] address=8066de00
    [2013-09-10 10:24:44] memTab[10] space=17 attrs=1 size=144 alignment=256
    [2013-09-10 10:24:44] address=8066df00
    [2013-09-10 10:24:44] memTab[11] space=17 attrs=1 size=12672 alignment=256
    [2013-09-10 10:24:44] address=8066e000
    [2013-09-10 10:24:44] memTab[12] space=17 attrs=1 size=136 alignment=256
    [2013-09-10 10:24:44] address=80684200
    [2013-09-10 10:24:44] memTab[13] space=17 attrs=1 size=144 alignment=256
    [2013-09-10 10:24:44] address=80697300
    [2013-09-10 10:24:44] memTab[14] space=17 attrs=1 size=144 alignment=256
    [2013-09-10 10:24:44] address=80697400
    [2013-09-10 10:24:44] memTab[15] space=17 attrs=1 size=544 alignment=256
    [2013-09-10 10:24:44] address=80697500
    [2013-09-10 10:24:44] memTab[16] space=17 attrs=1 size=540 alignment=256
    [2013-09-10 10:24:44] address=80697800
    [2013-09-10 10:24:44] memTab[17] space=17 attrs=1 size=4420 alignment=256
    [2013-09-10 10:24:44] address=80697b00
    [2013-09-10 10:24:44] memTab[18] space=17 attrs=1 size=4420 alignment=256
    [2013-09-10 10:24:44] address=80698d00
    [2013-09-10 10:24:44] memTab[19] space=17 attrs=1 size=4420 alignment=256
    [2013-09-10 10:24:44] address=80699f00
    [2013-09-10 10:24:44] memTab[20] space=17 attrs=1 size=3763584 alignment=256
    [2013-09-10 10:24:44] address=8069b100
    [2013-09-10 10:24:44] memTab[21] space=17 attrs=1 size=8840 alignment=256
    [2013-09-10 10:24:44] address=80a31f00
    [2013-09-10 10:24:44] memTab[22] space=17 attrs=1 size=8840 alignment=256
    [2013-09-10 10:24:44] address=80a34200
    [2013-09-10 10:24:44] memTab[23] space=17 attrs=1 size=8840 alignment=256
    [2013-09-10 10:24:44] address=80a36500
    [2013-09-10 10:24:44] memTab[24] space=17 attrs=1 size=264 alignment=256
    [2013-09-10 10:24:44] address=80a38800
    [2013-09-10 10:24:44] memTab[25] space=17 attrs=1 size=74880 alignment=256
    [2013-09-10 10:24:44] address=80a38a00
    [2013-09-10 10:24:44] memTab[26] space=17 attrs=1 size=110592 alignment=256
    [2013-09-10 10:24:44] address=80a4af00
    [2013-09-10 10:24:44] memTab[27] space=17 attrs=1 size=1584 alignment=256
    [2013-09-10 10:24:44] address=80a65f00
    [2013-09-10 10:24:44] memTab[28] space=17 attrs=1 size=116 alignment=256
    [2013-09-10 10:24:44] address=80a66600
    [2013-09-10 10:24:44] memTab[29] space=17 attrs=1 size=2168 alignment=256
    [2013-09-10 10:24:44] address=80a66700
    [2013-09-10 10:24:44] memTab[30] space=17 attrs=1 size=412 alignment=256
    [2013-09-10 10:24:44] address=80a67000
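A log like the one above can also be checked mechanically rather than by eye. The following is a minimal sketch that verifies alignment and looks for overlapping buffers; demo_memtab_t and the demo_* functions are hypothetical and only mirror the fields printed in the log, not the actual memTab record layout used by the codec.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record mirroring the fields printed in the log. */
typedef struct {
    uint32_t base;       /* address returned by the allocator */
    uint32_t size;       /* requested size in bytes */
    uint32_t alignment;  /* requested alignment in bytes */
} demo_memtab_t;

/* Returns 1 if the base address honors the requested alignment. */
static int demo_is_aligned(const demo_memtab_t *m)
{
    assert(m != NULL && m->alignment != 0);
    return (m->base % m->alignment) == 0;
}

/* Returns 1 if the two buffers share at least one byte. */
static int demo_overlaps(const demo_memtab_t *a, const demo_memtab_t *b)
{
    uint64_t a_end = (uint64_t)a->base + a->size;
    uint64_t b_end = (uint64_t)b->base + b->size;
    return a->base < b_end && b->base < a_end;
}

/* Returns the number of problems: misaligned entries plus overlapping pairs. */
static int demo_check_memtabs(const demo_memtab_t *tabs, size_t count)
{
    int problems = 0;
    for (size_t i = 0; i < count; ++i) {
        if (!demo_is_aligned(&tabs[i]))
            ++problems;
        for (size_t j = i + 1; j < count; ++j)
            if (demo_overlaps(&tabs[i], &tabs[j]))
                ++problems;
    }
    return problems;
}
```

A clean result only shows that the buffers are aligned and disjoint. It says nothing about whether each buffer sits in the memory region or cache configuration the decoder expects.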
    

    Currently all DDR3 memory is cached. Can you give more details on why the resource conflict exception is raised?

  • Moreover, other algorithms (H.264 encoder, MPEG-2 decoder, 2x H.264 decoder) use the same memTab allocation routines without any issues.

    Regards,

    Andrey Lisnevich

  • Hi Andrey,

    The internal memory requirements are increased (from 0x18000 to 0x2D000) in the 4-core implementation. In this case algInit is failing because not enough internal memory is allocated. Please refer to alg_malloc.c for the required internal memory size.

    Also, please note the following:

    1. Make sure all function pointers in mcViddecParams are initialized properly.

    2. Set task_ID in mcViddecParams to IVIDMC3_TASK_MASTER for the master core and to IVIDMC3_TASK_LOCMASTER for the slave core. Please refer to TestAppDecoder.c.

    Thanks,

    Praveen.

     

     

  • Hi Praveen,

    As far as I know, all the allocations are done before algInit; algInit should not allocate memory directly at all. The memTab allocations are done without any errors after algAlloc and before algInit.

    Moreover, I don't see failures in the memTab or shmMap allocations. Please explain in more detail which memory allocations fail.

    Regards,

    Andrey Lisnevich

  • Hi Andrey,

    We were able to locate the issue: in the library, the internal memory is increased only for HD and full HD resolutions, whereas ideally it should be done for all resolutions. This engineering library has only been tested for resolutions of 720p and above. We will fix this issue in the next release.

    As a workaround, we recommend that you allocate memory for HD/full HD resolutions. Modify the lines below in the Configuration.c file in your setup.

        decoderConfiguration->inputWidth = 1280;//320
        decoderConfiguration->inputHeight = 720;//180

    Thanks,

    Praveen.

  • Hi Praveen,

    I did all the items you described:

    1) maxWidth is not less than 1280, maxHeight is not less than 720

    2) task_ID in mcViddecParams is set to IVIDMC3_TASK_MASTER for master core and to IVIDMC3_TASK_LOCMASTER for slave core

    3) All pointers of mcViddecParams are initialized properly (except the mailbox callbacks)

    4) DDR3 uncached heap added and used when required

    Now, somewhere in the master decoder's algInit, core #0 is reset (i.e. main() is executed again).

    I do not see an obvious cause of this reset in my code. Can you please check why it happens?

    The project with the modifications is attached.

    Regards,

    Andrey Lisnevich

    testh264dec.zip
  • Hi Andrey,

    We were able to replicate the issue when running freely. However, when we step through the code, the issue is not seen.

    We are debugging this issue and will update you as we make progress.

    Thanks,

    Praveen.

  • Hi Andrey,

    We identified and fixed the issue in the library and have attached the updated library. Can you please integrate this new library and verify?

    Thanks,

    Praveen

     

    H264_4Core_Lib.zip
  • Hi Praveen,

    With the new decoder, and with a memory bug (related to caching) fixed in the demo, I can initialize the decoder to run on one and on two cores. But when I try to run it on 4 cores, algInit for the slave decoders fails with -1 and the master hangs. I configured the decoder tasks as in the test project that comes with the decoder and do not see the reason for the failure.

    The test project is attached. You should run it on 5 cores (4 cores for the decoder and one for the encoder).

    Regards,

    Andrey Lisnevich

    testh264dec.zip
  • Hi Praveen,

    Andrey started testing the latest engineering release, C66x_h264hpvdec_01_01_02_04_ELF.zip, and he found some issues in buffer management that I would like to share with you for your expert opinion.

    Please consider the following log:

    [2013-11-15 13:28:07] Decoding

    [2013-11-15 13:28:07]

    [2013-11-15 13:28:07] new buffer 1

    [2013-11-15 13:28:07] decode inputID 1

    [2013-11-15 13:28:07] Decoder process error -1

    [2013-11-15 13:28:07] Extended error 0x1095 - non-fatal error

    [2013-11-15 13:28:07] outBufsInUseFlag 1

    [2013-11-15 13:28:07]

    [2013-11-15 13:28:07] decode inputID 1

    [2013-11-15 13:28:07] Decoder process error -1

    [2013-11-15 13:28:07] Extended error 0x1095 - non-fatal error

    [2013-11-15 13:28:07] outBufsInUseFlag 1

    [2013-11-15 13:28:07]

    [2013-11-15 13:28:07] decode inputID 1

    [2013-11-15 13:28:07] Decoder process error -1

    [2013-11-15 13:28:07] Extended error 0x1095 - non-fatal error

    [2013-11-15 13:28:07] outBufsInUseFlag 1

    [2013-11-15 13:28:07]

    [2013-11-15 13:28:07] decode inputID 1

    [2013-11-15 13:28:07] Decoder process error -1

    [2013-11-15 13:28:07] Extended error 0x1095 - non-fatal error

    [2013-11-15 13:28:07] outBufsInUseFlag 1

    [2013-11-15 13:28:07]

    [2013-11-15 13:28:07] decode inputID 1

    [2013-11-15 13:28:07] Decoder process error -1

    [2013-11-15 13:28:07] Extended error 0x1095 - non-fatal error

    [2013-11-15 13:28:07] outBufsInUseFlag 1

    [2013-11-15 13:28:08]

    [2013-11-15 13:28:08] decode inputID 1

    [2013-11-15 13:28:08] Decoder process error -1

    [2013-11-15 13:28:08] Extended error 0x1095 - non-fatal error

    [2013-11-15 13:28:08] outBufsInUseFlag 1

    [2013-11-15 13:28:08]

    [2013-11-15 13:28:08] decode inputID 1

    [2013-11-15 13:28:08] outBufsInUseFlag 1

    [2013-11-15 13:28:08] freeBufID 1

    [2013-11-15 13:28:08]

    [2013-11-15 13:28:08] decode inputID 1

    [2013-11-15 13:28:08] outBufsInUseFlag 0

    [2013-11-15 13:28:08] freeBufID 1

    [2013-11-15 13:28:08]

    [2013-11-15 13:28:08] new buffer 2

    [2013-11-15 13:28:08] decode inputID 2

    [2013-11-15 13:28:08] outBufsInUseFlag 1

    [2013-11-15 13:28:08]

    [2013-11-15 13:28:08] decode inputID 2

    [2013-11-15 13:28:08] outBufsInUseFlag 0

    [2013-11-15 13:28:08] freeBufID 2

    [2013-11-15 13:28:08]

    [2013-11-15 13:28:08] new buffer 3

    [2013-11-15 13:28:08] decode inputID 3

    [2013-11-15 13:28:08] outBufsInUseFlag 1

    [2013-11-15 13:28:08]

    [2013-11-15 13:28:08] decode inputID 3

    [2013-11-15 13:28:08] outBufsInUseFlag 0

    [2013-11-15 13:28:08] freeBufID 3

    [2013-11-15 13:28:08]

    [2013-11-15 13:28:08] new buffer 4

    [2013-11-15 13:28:08] decode inputID 4

    [2013-11-15 13:28:08] outBufsInUseFlag 1

    [2013-11-15 13:28:08]

    [2013-11-15 13:28:08] decode inputID 4

    [2013-11-15 13:28:08] outBufsInUseFlag 0

    [2013-11-15 13:28:08] freeBufID 4

    [2013-11-15 13:28:08]

    [2013-11-15 13:28:08] new buffer 5

    [2013-11-15 13:28:08] decode inputID 5

    [2013-11-15 13:28:08] outBufsInUseFlag 1

    [2013-11-15 13:28:08]

    [2013-11-15 13:28:08] decode inputID 5

    [2013-11-15 13:28:08] outBufsInUseFlag 0

    [2013-11-15 13:28:08] freeBufID 5

    [2013-11-15 13:28:08]

    [2013-11-15 13:28:08] new buffer 6

    [2013-11-15 13:28:08] decode inputID 6

    [2013-11-15 13:28:08] outBufsInUseFlag 1

    [2013-11-15 13:28:08]

    [2013-11-15 13:28:08] decode inputID 6

    [2013-11-15 13:28:08] outBufsInUseFlag 0

    [2013-11-15 13:28:08] outputID 3

    Processing stops here because of inconsistent buffer management.

    "Extended error 0x1095 - non-fatal error" means that the decoder is skipping NAL units until a valid sync point is found, so it is normal when you start decoding a live H.264 stream.

    However, these two iterations look inconsistent:

    [2013-11-15 13:28:08] decode inputID 1

    [2013-11-15 13:28:08] Decoder process error -1

    [2013-11-15 13:28:08] Extended error 0x1095 - non-fatal error

    [2013-11-15 13:28:08] outBufsInUseFlag 1

    [2013-11-15 13:28:08]

    [2013-11-15 13:28:08] decode inputID 1

    [2013-11-15 13:28:08] outBufsInUseFlag 1

    [2013-11-15 13:28:08] freeBufID 1

    In the first iteration I provide the buffer with ID 1, and after decoding I see that the buffer is still in use (outBufsInUseFlag 1). This means that the same buffer should be used again for the next iteration.

    In the next iteration I provide buffer ID 1 again, as requested by the decoder, but the result of the decode call is confusing: the decoder says that the buffer is still in use (outBufsInUseFlag 1) and at the same time releases the buffer as not used (freeBufID 1).

    Another confusing fact:

    [2013-11-15 13:28:08] decode inputID 3

    [2013-11-15 13:28:08] outBufsInUseFlag 0

    [2013-11-15 13:28:08] freeBufID 3

    ......

    [2013-11-15 13:28:08] decode inputID 6

    [2013-11-15 13:28:08] outBufsInUseFlag 0

    [2013-11-15 13:28:08] outputID 3

    The decoder provides outputID 3 as output, a buffer that was already released in a previous iteration.
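The inconsistency described above can be detected on the application side with simple ownership bookkeeping. This is a minimal sketch under stated assumptions: demo_buf_tracker_t and the demo_on_* functions are hypothetical helpers, buffer IDs start at 1 as in the log, and the outputID entries of one process call are handled before its freeBufID entries.

```c
#include <assert.h>
#include <string.h>

#define DEMO_MAX_BUFFERS 16

/* Hypothetical application-side bookkeeping, indexed by buffer ID. */
typedef struct {
    unsigned char owned_by_decoder[DEMO_MAX_BUFFERS + 1];
} demo_buf_tracker_t;

static void demo_tracker_init(demo_buf_tracker_t *t)
{
    memset(t, 0, sizeof *t);
}

/* Called when a buffer is handed to the decoder ("new buffer N"). */
static void demo_on_submit(demo_buf_tracker_t *t, int id)
{
    assert(id >= 1 && id <= DEMO_MAX_BUFFERS);
    t->owned_by_decoder[id] = 1;
}

/* Called for every freeBufID entry; returns 0 if consistent,
 * -1 if the decoder freed a buffer it did not own (e.g. a double free). */
static int demo_on_free(demo_buf_tracker_t *t, int id)
{
    assert(id >= 1 && id <= DEMO_MAX_BUFFERS);
    if (!t->owned_by_decoder[id])
        return -1;
    t->owned_by_decoder[id] = 0;
    return 0;
}

/* Called for every outputID entry; returns 0 if the decoder still owns
 * the buffer, -1 if it was already released. */
static int demo_on_output(const demo_buf_tracker_t *t, int id)
{
    assert(id >= 1 && id <= DEMO_MAX_BUFFERS);
    return t->owned_by_decoder[id] ? 0 : -1;
}
```

Replaying the log through such a tracker would flag both the repeated freeBufID 1 and the outputID 3 that arrives after freeBufID 3.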

     

    At the same time, the previous version of the H.264 decoder, 1.1.1.4, handles this stream correctly:

    [2013-11-15 14:02:38] Decoding

    [2013-11-15 14:02:38]

    [2013-11-15 14:02:38] new buffer 1

    [2013-11-15 14:02:38] decode inputID 1

    [2013-11-15 14:02:38] Decoder process error -1

    [2013-11-15 14:02:38] Extended error 0x1095 - non-fatal error

    [2013-11-15 14:02:38] outBufsInUseFlag 1

    [2013-11-15 14:02:38]

    [2013-11-15 14:02:38] decode inputID 1

    [2013-11-15 14:02:38] Decoder process error -1

    [2013-11-15 14:02:38] Extended error 0x1095 - non-fatal error

    [2013-11-15 14:02:38] outBufsInUseFlag 1

    [2013-11-15 14:02:38]

    [2013-11-15 14:02:38] decode inputID 1

    [2013-11-15 14:02:38] Decoder process error -1

    [2013-11-15 14:02:38] Extended error 0x1095 - non-fatal error

    [2013-11-15 14:02:38] outBufsInUseFlag 1

    [2013-11-15 14:02:38]

    [2013-11-15 14:02:38] decode inputID 1

    [2013-11-15 14:02:38] Decoder process error -1

    [2013-11-15 14:02:38] Extended error 0x1095 - non-fatal error

    [2013-11-15 14:02:38] outBufsInUseFlag 1

    [2013-11-15 14:02:38]

    [2013-11-15 14:02:38] decode inputID 1

    [2013-11-15 14:02:38] Decoder process error -1

    [2013-11-15 14:02:38] Extended error 0x1095 - non-fatal error

    [2013-11-15 14:02:38] outBufsInUseFlag 1

    [2013-11-15 14:02:38]

    [2013-11-15 14:02:38] decode inputID 1

    [2013-11-15 14:02:38] Decoder process error -1

    [2013-11-15 14:02:38] Extended error 0x1095 - non-fatal error

    [2013-11-15 14:02:38] outBufsInUseFlag 1

    [2013-11-15 14:02:38]

    [2013-11-15 14:02:38] decode inputID 1

    [2013-11-15 14:02:38] Decoder process error -1

    [2013-11-15 14:02:38] Extended error 0x1095 - non-fatal error

    [2013-11-15 14:02:38] outBufsInUseFlag 1

    [2013-11-15 14:02:39]

    [2013-11-15 14:02:39] decode inputID 1

    [2013-11-15 14:02:39] outBufsInUseFlag 1

    [2013-11-15 14:02:39]

    [2013-11-15 14:02:39] decode inputID 1

    [2013-11-15 14:02:39] outBufsInUseFlag 0

    [2013-11-15 14:02:39]

    [2013-11-15 14:02:39] new buffer 2

    [2013-11-15 14:02:39] decode inputID 2

    [2013-11-15 14:02:39] outBufsInUseFlag 1

    [2013-11-15 14:02:39]

    [2013-11-15 14:02:39] decode inputID 2

    [2013-11-15 14:02:39] outBufsInUseFlag 0

    [2013-11-15 14:02:39]

    [2013-11-15 14:02:39] new buffer 3

    [2013-11-15 14:02:39] decode inputID 3

    [2013-11-15 14:02:39] outBufsInUseFlag 1

    [2013-11-15 14:02:39]

    [2013-11-15 14:02:39] decode inputID 3

    [2013-11-15 14:02:39] outBufsInUseFlag 0

    [2013-11-15 14:02:39]

    [2013-11-15 14:02:39] new buffer 4

    [2013-11-15 14:02:39] decode inputID 4

    [2013-11-15 14:02:39] outBufsInUseFlag 1

    [2013-11-15 14:02:39]

    [2013-11-15 14:02:39] decode inputID 4

    [2013-11-15 14:02:39] outBufsInUseFlag 0

    [2013-11-15 14:02:39]

    [2013-11-15 14:02:39] new buffer 5

    [2013-11-15 14:02:39] decode inputID 5

    [2013-11-15 14:02:39] outBufsInUseFlag 1

    [2013-11-15 14:02:39]

    [2013-11-15 14:02:39] decode inputID 5

    [2013-11-15 14:02:39] outBufsInUseFlag 0

    [2013-11-15 14:02:39]

    [2013-11-15 14:02:39] new buffer 6

    [2013-11-15 14:02:39] decode inputID 6

    [2013-11-15 14:02:39] outBufsInUseFlag 1

    [2013-11-15 14:02:39]

    [2013-11-15 14:02:39] decode inputID 6

    [2013-11-15 14:02:39] outBufsInUseFlag 0

    [2013-11-15 14:02:39] outputID 3

    [2013-11-15 14:02:39] freeBufID 3

    [2013-11-15 14:02:39]

    [2013-11-15 14:02:39] new buffer 7

    [2013-11-15 14:02:39] decode inputID 7

    [2013-11-15 14:02:39] outBufsInUseFlag 1

    [2013-11-15 14:02:39]

    [2013-11-15 14:02:39] decode inputID 7

    [2013-11-15 14:02:39] outBufsInUseFlag 0

    [2013-11-15 14:02:39] outputID 2

    [2013-11-15 14:02:39] freeBufID 2

    [2013-11-15 14:02:39]

    [2013-11-15 14:02:39] new buffer 8

    [2013-11-15 14:02:39] decode inputID 8

    [2013-11-15 14:02:39] outBufsInUseFlag 1

    [2013-11-15 14:02:39]

    [2013-11-15 14:02:39] decode inputID 8

    [2013-11-15 14:02:39] outBufsInUseFlag 0

    [2013-11-15 14:02:39] outputID 4

    [2013-11-15 14:02:39] freeBufID 4

    [2013-11-15 14:02:39]

    [2013-11-15 14:02:39] new buffer 9

    [2013-11-15 14:02:39] decode inputID 9

    [2013-11-15 14:02:39] outBufsInUseFlag 1

    [2013-11-15 14:02:39]

    [2013-11-15 14:02:39] decode inputID 9

    [2013-11-15 14:02:39] outBufsInUseFlag 0

    [2013-11-15 14:02:39] outputID 1

    [2013-11-15 14:02:39] freeBufID 1

    [2013-11-15 14:02:39]

    [2013-11-15 14:02:39] new buffer 10

    [2013-11-15 14:02:39] decode inputID 10

    [2013-11-15 14:02:39] outBufsInUseFlag 1

    [2013-11-15 14:02:39]

    [2013-11-15 14:02:39] decode inputID 10

    [2013-11-15 14:02:39] outBufsInUseFlag 0

    [2013-11-15 14:02:39] outputID 7

    [2013-11-15 14:02:39] freeBufID 7

    [2013-11-15 14:02:39]

    [2013-11-15 14:02:39] new buffer 11

    [2013-11-15 14:02:39] decode inputID 11

    [2013-11-15 14:02:39] outBufsInUseFlag 1

    [2013-11-15 14:02:39]

    [2013-11-15 14:02:39] decode inputID 11

    [2013-11-15 14:02:39] outBufsInUseFlag 0

    [2013-11-15 14:02:39] outputID 6

    [2013-11-15 14:02:39] freeBufID 6

    [2013-11-15 14:02:39]

    [2013-11-15 14:02:39] new buffer 12

    [2013-11-15 14:02:39] decode inputID 12

    [2013-11-15 14:02:39] outBufsInUseFlag 1

    [2013-11-15 14:02:39]

    [2013-11-15 14:02:39] decode inputID 12

    [2013-11-15 14:02:39] outBufsInUseFlag 0

    [2013-11-15 14:02:39] outputID 8

    [2013-11-15 14:02:39] freeBufID 8

    .....

     Thanks for your help,

    Paula

  • Paula,

     

    Is he running the decoder with 4 cores or 2 cores? Is it possible for him to share the setup to replicate the issue?

     

    Thanks,

    Praveen.

     

  • Hi Praveen,

    It doesn't depend on the number of cores. You can run on a single core and get this result.

    Regards,

    Andrey Lisnevich

  • Hi Praveen,


    One more issue: I still can't get the latest developer release of the H.264 decoder to work in 4-core mode. It just hangs on decoding the first frame.

    With exactly the same code in 1- and 2-core mode I can decode and see correct YUV.

    To reproduce this issue, just use the same demo project from testh264dec.zip two posts above (configured to use 4 cores for decoding).

    Regards,

    Andrey Lisnevich

  • Hi Andrey, our codec team checked your project, and it seems there are some integration issues with using 4 cores. The new 4-core decoder splits the cores into two groups (Group0, Group1), and each group consists of a master and a slave core (similar to the previous 2-core decoder). Please see the comments and checkpoints below:

    - Please check TestAppDecoder.c lines 566-600. There you can see how the barrier handles are created in order to sync the master and slave core within each group of cores.

    - Please check TestAppDecoder.c lines 600-633. There you can see how to initialize the params for both groups of cores.

    - Additionally, Alg_Create happens first on core Group0 while core Group1 waits before its create. When Group0 finishes, it triggers core Group1 to create. Please see lines 752 and 975 (start/stop masters sync) in TestAppDecoder.c.

    - Inside the do-while loop, core Group0 and core Group1 do not synchronize.

    - Finally, as additional information, the interface with the application for all input and output buffers is handled by the Group0 master core, which communicates internally with core Group1.
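The create ordering in the list above (Group0 first, then Group1) can be sketched as a one-shot gate. This is only an illustration assuming C11 atomics; demo_gate_t and its functions are hypothetical, and the actual TestAppDecoder.c may implement the start/stop masters sync differently, for example with the barrier callbacks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdatomic.h>

/* Hypothetical one-shot gate; on real hardware it must live in memory
 * that both core groups can see coherently. */
typedef struct {
    atomic_int open;
} demo_gate_t;

static void demo_gate_init(demo_gate_t *g)
{
    assert(g != NULL);
    atomic_init(&g->open, 0);
}

/* The Group0 master calls this after its create step has finished. */
static void demo_gate_signal(demo_gate_t *g)
{
    atomic_store_explicit(&g->open, 1, memory_order_release);
}

/* Non-blocking poll: returns 1 once Group0 has signalled. */
static int demo_gate_is_open(demo_gate_t *g)
{
    return atomic_load_explicit(&g->open, memory_order_acquire);
}

/* Group1 cores call this before starting their own create step. */
static void demo_gate_wait(demo_gate_t *g)
{
    while (!demo_gate_is_open(g)) {
        /* spin; a real application might yield or sleep here */
    }
}
```

The release store paired with the acquire load ensures that whatever Group0 wrote during its create step is visible to Group1 once the gate is seen open, at least on a cache-coherent system.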

    Please see attached TestAppDecoder.c

    On the other hand, regarding the reported buffer management issue, we have seen this happen particularly when the first few frames of the clip cannot be properly decoded (error code) and are followed by a correct bitstream. This issue was fixed for 1 and 2 cores, and we are currently fixing it for 4 cores.

    Please let us know if something is not clear or if you have further questions on how to implement the 4-core decoder.

    thanks a lot,

    Paula

     6811.TestAppDecoder.c

  • Hi Paula,

    I followed all the instructions you gave and still can't do 4-core decoding. Initialization goes well, but the decoder hangs during the first process call. The logs from all 4 cores are attached. From the logs you can see the configuration:

    core#0 taskID=MASTER user_id=0 num_users=4 coreTeam={0,1,2,3}

    core#1 taskID=LOCMASTER user_id=1 num_users=4 coreTeam={0,1,2,3}

    core#2 taskID=MASTER user_id=2 num_users=4 coreTeam={0,1,2,3}

    core#3 taskID=LOCMASTER user_id=3 num_users=4 coreTeam={0,1,2,3}

    Decoders on core#2 and core#3 initialize with 3 seconds delay and it means that group0 (core#0 core#1) is initialized first. Then group1 is initialized (core#2 core#3). You can see this from timestamps.

    You can see logs from key create, lock and barrier functions.

    From logs you can see that cores #0 and #1 execute process call and exit from it but cores #2 and #3 hang in process call waiting for lock/barrier.

    Can you answer few questions to clarify the situation:

    1) Main question: Why core#0 tries to acquire "Lock buffer" lock many times without releasing it during process call?

    2) Can you explain in sample how lock should work on multiple users?

    3) Why on core#1 where decoder is user_id is 1 I get user_id=0 for some operations?

    Why on core#2 where decoder user_id is 2 I get user_id=0 or user_id=1 and never user_id=2?

    Why on core#3 where decoder user_id is 3 I always get user_id=1?

    4) Why on core#2 and core#3 in key create function user_ids array parameter is {0,1} and not {2,3}

    5) Why decoder on core#1 tries to acquire lock with lock_id=0 while initialization? It causes ERROR DBL LOCK because at the same time decoder on core#0 acquires the lock with the same user_id=0.

    6) Why lock "Lock buffer" and two shared buffers "shmapDDR 4coreHandle", "shmapDDR UncachedHandle" are created with num_users=2 but are shared and used by all 4 decoders?

    Regards,

    Andrey Lisnevich

    cores_logs.zip
  • Hi Andrey,

    Please see my answers inline

    >> From logs you can see that cores #0 and #1 execute process call and exit from it but cores #2 and #3 hang in process call waiting for lock/barrier.

    [Praveen] From the logs we can see that initialization is happening correctly and that the first process call returned successfully. However, we need two process calls on CoreGroup0 (core#0, core#1) to complete decoding. The first process call is a dummy call: it just takes the input and output buffers and stores them in a shared region for CoreGroup1 (core#2, core#3) to access. In the next process call, CoreGroup0 triggers CoreGroup1 for decoding. So if CoreGroup0 continues with the next process call, CoreGroup1 will also come out of its process call, and by that time two frames will have been decoded.


    >>Can you answer few questions to clarify the situation:

    1) Main question: Why core#0 tries to acquire "Lock buffer" lock many times without releasing it during process call?

    [Praveen] There was a lock acquire in a for loop for which the corresponding lock release was missing. This is fixed in the attached new library (H264_D_4Core_New_lib.zip).

    3618.H264_D_4Core_New_lib.zip

    2) Can you explain in sample how lock should work on multiple users?

    [Praveen] In a multicore scenario, if one core acquires the lock, the other cores wait for that lock to be released before they can acquire it.
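
    A minimal sketch of that blocking behaviour, written with C11 atomics. This is not the sample application's lock: the `SpinLock` type and the `spin_*` functions are hypothetical, and on a real multicore DSP the flag would have to live in shared, coherent memory or be backed by a hardware semaphore.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type, for illustration only. */
typedef struct { atomic_flag taken; } SpinLock;

#define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Non-blocking attempt: returns true if the caller now owns the lock. */
bool spin_try_acquire(SpinLock *lock)
{
    return !atomic_flag_test_and_set_explicit(&lock->taken, memory_order_acquire);
}

/* Blocking acquire: busy-waits until the current owner releases the lock. */
void spin_acquire(SpinLock *lock)
{
    while (!spin_try_acquire(lock)) {
        /* spin */
    }
}

/* Release: makes the lock available to the other waiting cores. */
void spin_release(SpinLock *lock)
{
    atomic_flag_clear_explicit(&lock->taken, memory_order_release);
}
```

    Every `spin_acquire()` must be paired with exactly one `spin_release()`; an acquire inside a loop without the matching release is the kind of bug described in answer 1.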

    3) Why on core#1 where decoder is user_id is 1 I get user_id=0 for some operations?
    Why on core#2 where decoder user_id is 2 I get user_id=0 or user_id=1 and never user_id=2?
    Why on core#3 where decoder user_id is 3 I always get user_id=1?

    [Praveen] In the four-core implementation, the four cores are split into two core groups: CoreGroup0 (core#0/Master, core#1/Slave) and CoreGroup1 (core#2/Master, core#3/Slave). While creating the barriers on CG0 and CG1, we used user_id=0 for masters and user_id=1 for slaves on both core groups for ease of implementation. So you will get user_ids 0 and 1 for barrier calls on both core groups.
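
    The relation between the application's user_id (0..3) and the user_id seen in barrier callbacks can be sketched as follows. This only restates the answer above; the function names are hypothetical, and the decoder's internal code is not shown in this thread.

```c
#include <assert.h>

/* Hypothetical helper: which core group a global user_id (0..3) belongs to. */
int barrier_group(int global_user_id)
{
    assert(global_user_id >= 0 && global_user_id < 4);
    return global_user_id / 2;
}

/* Hypothetical helper: the user_id reported in barrier callbacks.
 * Masters (global IDs 0 and 2) map to 0, slaves (1 and 3) map to 1,
 * so both groups see the same pair {0, 1}. */
int barrier_user_id(int global_user_id)
{
    assert(global_user_id >= 0 && global_user_id < 4);
    return global_user_id % 2;
}
```

    A callback that needs a unique per-core identity would therefore have to combine both values, or use the core number directly.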

    4) Why on core#2 and core#3 in key create function user_ids array parameter is {0,1} and not {2,3}

    [Praveen] Answered in 3

    5) Why decoder on core#1 tries to acquire lock with lock_id=0 while initialization? It causes ERROR DBL LOCK because at the same time decoder on core#0 acquires the lock with the same user_id=0.

    [Praveen] Fixed in the new library, please check.

    6) Why lock "Lock buffer" and two shared buffers "shmapDDR 4coreHandle", "shmapDDR UncachedHandle" are created with num_users=2 but are shared and used by all 4 decoders?

    [Praveen] Even though these handles were created with num_users=2, all four cores get the same handles because we used the same name to create them. We updated the code in the new library to use num_users=4.

    Please let me know for any questions

    Thanks,

    Praveen.

  • Hi Andrey,

    Were you able to decode using 4 cores?

    We are planning the GA release of the 4-core HP decoder. Please let us know if you have any issues with the 4-core decoder so that we can try to wrap everything into the GA release.

    Thanks,

    Praveen

  • Hi Praveen,

    I still have problems with the 4-core decoder. I made group0 and group1 independent as you described, but decoding still hangs.

    Do you have a release of the H.264 decoder with the buffer management issue fixed? Then I can rule out buffer management problems and report the issues I have to you.

    Regards,

    Andrey Lisnevich

  • Also, can you give me a short description of how to sync memory and parameters for all group0/1 master/slave decoders before and after the process call?

    For example in case of H.264 encoder I have to:

    Master core:
    1) Sync shared memories and mem tabs before process call

    2) wbInv input buffers, inv output buffer before process call

    3) wbInv output buffer after process call

    All slaves cores:

    1) Sync shared memories and mem tabs before process call

    2) Invalidate and pass same output buffer (as in master) to process call

    3) write back output buffer after process call
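
    The ordering of these steps can be sketched as a trace. This is only a model of the encoder sequence listed above: the `Op` values stand in for the real cache and sync calls, all names are hypothetical, and the thread does not confirm that the decoder needs the same sequence.

```c
#include <assert.h>

/* Stand-ins for the real operations; only the order is modelled here. */
typedef enum { OP_SYNC_SHARED, OP_WBINV, OP_INV, OP_WB, OP_PROCESS } Op;

#define MAX_OPS 16
typedef struct { Op ops[MAX_OPS]; int count; } OpLog;

void oplog_init(OpLog *log) { log->count = 0; }

void record(OpLog *log, Op op)
{
    if (log->count < MAX_OPS)
        log->ops[log->count++] = op;
}

/* Master core: steps 1-3 of the master list around the process call. */
void master_iteration(OpLog *log)
{
    record(log, OP_SYNC_SHARED);  /* 1) sync shared memories and memtabs */
    record(log, OP_WBINV);        /* 2) write back + invalidate input buffers */
    record(log, OP_INV);          /*    invalidate output buffer */
    record(log, OP_PROCESS);      /* process call */
    record(log, OP_WBINV);        /* 3) write back + invalidate output buffer */
}

/* Slave core: steps 1-3 of the slave list around the process call. */
void slave_iteration(OpLog *log)
{
    record(log, OP_SYNC_SHARED);  /* 1) sync shared memories and memtabs */
    record(log, OP_INV);          /* 2) invalidate the shared output buffer */
    record(log, OP_PROCESS);      /* process call */
    record(log, OP_WB);           /* 3) write back output buffer */
}
```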

  • Hi Andrey,

    The library attached to my previous post (on Dec 3rd) has the fix for the buffer management issue. Please note that we do not depend on "outBufsInUseFlag". Instead, for every process call the main master (Group0 master) takes a new buffer (using BUFFMGR_GetFreeBuffer()). If that buffer is not used inside the process call (for example, for the second field in the interlaced case, or for error streams), the main master releases the buffer after the process call using "viddecOutArgs.freeBufID", which is updated inside the library, so the released buffer can be used for the next process call.
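
    A small model of that take-then-release pattern, assuming buffer IDs start at 1 and that ID 0 means "no buffer". The pool and its functions are hypothetical stand-ins and are not the BUFFMGR API of the sample application.

```c
#include <assert.h>

#define POOL_SIZE 4
#define NO_BUFFER 0   /* assumption: 0 is never a valid buffer ID */

/* in_use[id] is non-zero while buffer `id` (1..POOL_SIZE) is held. */
typedef struct { int in_use[POOL_SIZE + 1]; } BufferPool;

void pool_init(BufferPool *pool)
{
    for (int id = 0; id <= POOL_SIZE; ++id)
        pool->in_use[id] = 0;
}

/* Returns the lowest free buffer ID and marks it as used, or NO_BUFFER. */
int pool_get_free(BufferPool *pool)
{
    for (int id = 1; id <= POOL_SIZE; ++id) {
        if (!pool->in_use[id]) {
            pool->in_use[id] = 1;
            return id;
        }
    }
    return NO_BUFFER;
}

void pool_release(BufferPool *pool, int id)
{
    if (id >= 1 && id <= POOL_SIZE)
        pool->in_use[id] = 0;
}

/* One loop iteration: always take a fresh buffer, then release it again if
 * the (simulated) decoder reports through free_buf_id that it was not used. */
int decode_step(BufferPool *pool, int decoder_used_buffer)
{
    int id = pool_get_free(pool);
    if (id == NO_BUFFER)
        return NO_BUFFER;
    /* the process call would run here with inputID = id */
    int free_buf_id = decoder_used_buffer ? NO_BUFFER : id;
    pool_release(pool, free_buf_id);
    return id;
}
```

    In the model, a buffer that the decoder did not keep is handed out again on the next call, which is the reuse described above.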

    Synchronization in the H264 4-core decoder:
    1) On Group0:
    Inside the decode loop (which starts at line 986 and ends at line 1531 in the TestAppDecoder.c attached to the Nov 22nd post), input and output buffer pointers are assigned on the main master (Group0 master). Meanwhile, the Group0 slave waits in a barrier (BarrWait at line 1086). Once the Group0 master reaches that barrier, both enter the process call. After the process call the Group0 master releases and displays buffers, etc., while the Group0 slave waits in a barrier (BarrWait at line 1521). Once the Group0 master reaches that barrier, both continue with the next process call.

    2) On Group1:
    Inside the decode loop (which starts at line 986) the Group1 master goes directly to the barrier (BarrWait at line 1086). Once its slave (Group1 slave) reaches that barrier, both enter the process call. Inside the process call the Group1 master communicates with the Group0 master. After the process call the Group1 master waits in a barrier (BarrWait at line 1521). Once the Group1 slave reaches that barrier, both continue with the next process call. So on Group1 only two syncs happen (master with its slave), before and after the process call. On Group1 we use the condition "if(outArgs->viddecOutArgs.bytesConsumed < 0)" to exit from the decode loop after the entire bitstream is decoded (see line 1501); inside the library we set this variable to -1 if there is nothing to decode on this group.

    Note that at any BarrWait, at any point in time, only two cores sync (either the Group0 master with its slave or the Group1 master with its slave); Group0 and Group1 never sync with each other in the decode loop. Also, every alternate process call on Group0 is a dummy call, so Group0 has twice as many process calls as Group1. For example, if a bitstream has 10 frames to decode, there are 10 process calls on Group0 (master and slave): 5 dummy calls and 5 decode calls that decode 5 frames. Group1 (master and slave) has 5 process calls, which decode the other 5 frames.
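
    The call counts in the 10-frame example can be written down as a small helper. This sketch assumes an even frame count and an error-free stream, which is all the example covers; `CallCounts` and `expected_calls` are hypothetical names.

```c
#include <assert.h>

typedef struct {
    int group0_calls;   /* all process calls on Group0 (dummy + decode) */
    int group0_dummy;   /* dummy calls on Group0 */
    int group0_decode;  /* decode calls on Group0, one frame each */
    int group1_calls;   /* process calls on Group1, one frame each */
} CallCounts;

/* Assumes `frames` is even and every frame decodes without error. */
CallCounts expected_calls(int frames)
{
    CallCounts c;
    assert(frames >= 0 && frames % 2 == 0);
    c.group0_calls = frames;
    c.group0_dummy = frames / 2;
    c.group0_decode = frames / 2;
    c.group1_calls = frames / 2;
    return c;
}
```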

    Please Let us know for any questions

    Thanks,
    Praveen

  • Hi Praveen,

    Can you please comment on the following 4 points:


    1) I still experience buffer management problems. Please see the attached log from the new H.264 HP decoder in single-core mode.


    [D] - decoder output

    [BM] - buffer manager actions (based on decoder output)


    Main problem is that decoding stops because the decoder provides as outputID buffer 21:


    [2013-12-16 14:13:57] --------------------

    [2013-12-16 14:13:57] [BM] new buffer 33

    [2013-12-16 14:13:57] [D] inDecoderArguments->inputID=33

    [2013-12-16 14:13:57] [D] decoder->process()

    [2013-12-16 14:13:57] [D] inDecoderArguments->inputID=33

    [2013-12-16 14:13:57] [D] outputID=21

    [2013-12-16 14:13:57] [BM] reference buffer 21

    [2013-12-16 14:13:57] Failed reference buffer descriptor

    [2013-12-16 14:13:57] WARNING: Deactivate resources failed (3)

    [2013-12-16 14:13:57] Fatal master decoder task error. Terminating task.



    But this buffer was released few process calls before:


    [2013-12-16 14:13:56] --------------------

    [2013-12-16 14:13:56] [BM] new buffer 21

    [2013-12-16 14:13:56] [D] inDecoderArguments->inputID=21

    [2013-12-16 14:13:56] [D] decoder->process()

    [2013-12-16 14:13:56] [D] inDecoderArguments->inputID=20

    [2013-12-16 14:13:56] [D] Decoder process error -1

    [2013-12-16 14:13:56] [D] Extended error 0x1095 - non-fatal error

    [2013-12-16 14:13:56] [D] freeBufID=21

    [2013-12-16 14:13:56] [BM] release decoder buffer 21

    [2013-12-16 14:13:56] --------------------


    2) Another issue (probably related) is that process call changes inDecoderArguments->inputID that AFAIK should be readonly for it. You can see it in logs.


    Regarding outBufsInUseFlag I have 2 notes:


    3) “please note that we are not depending on "outBufsInUseFlag", instead for every process call on Main master(Group0 Master) will take a new buffer(using BUFFMGR_GetFreeBuffer())”


    I am using the same code (decoding loop) that works with different decoders. Other decoders depend on “outBufsInUseFlag”: MPEG-2 decoder, old H.264 HP decoder, etc. And you break this dependency in new H.264 decoder. It means that I need to do “special” routines for new H.264 decoder.


    Why not just always set outBufsInUseFlag=0 in case of new H.264 decoder - it means exactly the behaviour you described: “for every process call on Main master(Group0 Master) will take a new buffer”. And it will not break dependency on outBufsInUseFlag.


    4) “if that buffer is not used in side process call(For example,For second field in interlaced case and for error streams) then Main master will release the buffer after the process call using "viddecOutArgs.freeBufID"


    It means that my code has to allocate buffers that are not actually needed and will be released after the process call. This means wasting CPU cycles on redundant allocations.


    Using the outBufsInUse flag properly should help to solve this problem and avoid redundant allocations.
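
    To illustrate the difference, here is a minimal sketch that counts allocations for a sequence of process calls. It assumes the usual meaning of the flag: a non-zero outBufsInUseFlag after a call means the same output buffers stay in use and no new ones are needed for the next call. The function name is hypothetical.

```c
#include <assert.h>

/* flags[i] is the outBufsInUseFlag value returned by process call i.
 * Returns how many output buffers the application had to allocate. */
int count_allocations(const int *flags, int num_calls)
{
    int allocations = 0;
    int need_new_buffer = 1;            /* the first call always needs one */
    for (int i = 0; i < num_calls; ++i) {
        if (need_new_buffer)
            ++allocations;
        need_new_buffer = (flags[i] == 0);
    }
    return allocations;
}
```

    For an interlaced stream where the flag alternates 1, 0, 1, 0, four calls need two allocations, while a decoder that always reports 0 needs four.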


    Regards,

    Andrey Lisnevich

  • Hi Andrey,

    We found that there is an issue with the way we update InArgs->inputID. Ideally it should not be updated; we are currently fixing this.

    We are also changing the way the "outBufsInUse" flag is updated inside the library. For the 1-core and 2-core decoder we will keep the same logic as the old HP decoder, but for 4-core we will always set outBufsInUseFlag=0, because in 4-core mode two output buffers are in use at a time.

    Can you please share the test vector you are using so that we can validate the library before we share it ?

    Thanks,
    Praveen 

  • Hi Praveen,

    I have no test vector. I test on a live stream. A sample dump is attached. It is MPEG-TS. You can extract the elementary H.264 stream from it using ffmpeg, vlc or any other converter.

    With this stream I always get buffer management errors I described above: decoder outputs buffer that was released few iterations before.


    Regards,

    Andrey Lisnevich

    stream.ts.zip
  • Elementary H.264 stream from .ts file above is attached.

    stream.264.zip
  • Hi Andrey,

    With the elementary stream you shared, we expected the decoder to return error codes for the first few frames, but the decoder is not returning any error codes and is decoding properly.

    The attached library has fixes for the reported buffer management issues. Can you please check with this new library and let us know if you see any issues?

    Note that with this new library the decoder will always return "outBufsInUseFlag" as 0 for 4-core, and for 1-core and 2-core it will return the same value as the old HP decoder.

    3125.H264_4Core_Library.zip

    Thanks,

    Praveen

       

  • Hi Praveen,

    With the new version of the decoder the buffer management issue is fixed and I can do 1-core and 2-core decoding. The only problem is that the error is not propagated from master to slave:

    on Master:

    [2013-12-19 14:27:17] [D] Decoder process error -1
    [2013-12-19 14:27:17] [D] Extended error 0x1095 - non-fatal error

    on Slave:

    [D] Decoder process error -1
    [2013-12-19 14:27:17] Extended error 0x0 - non-fatal error

    In 4 core mode decoder hangs:

    Group0 (core#0 and #1) does 4 iterations and then hangs, because the Group0 master waits for something in an infinite loop, acquiring and releasing a lock.

    Group1 (core#2 and #3) hangs during the first iteration, because the Group1 master exits the process call but the Group1 slave starts waiting for a barrier.

    You can see attached logs for details.

    Can you please help to resolve the problem?

    Please note:

    1) I am decoding live stream and for first few frames I usually get non-fatal decoder errors. The same stream decodes good in 1 and 2 core mode but hangs in 4 core mode.

    2) This is how I do process call on slave cores (i.e. #1 #2 and #3)

    result = decoderFunctions->process(decoderHandle, NULL, NULL, NULL, outDecoderArguments);

    I do it this way because I don't know how to provide a correct input buffer, output buffer and InDecoderArguments for the slaves.

    Regards,

    Andrey Lisnevich

    logs.zip
  • Hi Andrey,

    Thanks for confirming the buffer management fix.

    Regarding the problem in 2-core mode: I think even in the old HP decoder we do not update the "extendedError" flag on the slave core inside the library.

    The cause of the 4-core hang is that, as I see from the logs, the group1 master simply returns from the process call. This is because on a master core we check for valid input and output buffer pointers, but in 4-core mode this check should be done only on the group0 master. We have fixed this and attached the library.

    Note that for slave process calls you can pass NULL for the input buffer, output buffer and InDecoderArguments, as you said.

    Can you please check with this new library and let us know if you see any issues?

     2843.H264_4Core_20131219.zip

    Thanks,

    Praveen

  • Hi Praveen,

    Still decoding hangs. Now group0 master waits for something in infinite loop in 6th iteration and group1 master waits for something in infinite loop in 2nd iteration. While waiting they acquire and release the same lock.

    Logs attached.

    Regards,

    Andrey Lisnevich

    logs.zip
  • Hi Andrey,

    We have updated the library. Can you please check with the attached new library?

    8831.H264_4Core_Lib_20131223.zip

    Thanks,

    Praveen

  • Hi Praveen,


    Still not everything is Ok in 4-cores mode. On Group1 master I get no error after first call, but on Group1 slave I get an error (full logs attached):


    Group1 Master:

    ....

    [2013-12-23 21:20:05] lockRelease @0c043980 OK
    [2013-12-23 21:20:05] lockAcquire @0c043980
    [2013-12-23 21:20:05] lockAcquire @0c043980 OK
    [2013-12-23 21:20:05] lockRelease @0c043980
    [2013-12-23 21:20:05] lockRelease @0c043980 OK
    [2013-12-23 21:20:05] barrWait @0c043a5c begin
    [2013-12-23 21:20:05] barrWait @0c043a5c end
    [2013-12-23 21:20:05] process() end
    [2013-12-23 21:20:05] --------------------

    Group 1 Slave:

    [2013-12-23 21:20:05] --------------------
    [2013-12-23 21:20:05] process() begin
    [2013-12-23 21:20:05] barrWait @0c043a5c begin
    [2013-12-23 21:20:05] barrWait @0c043a5c end
    [2013-12-23 21:20:05] process() end
    [2013-12-23 21:20:05] Decoder process error -1
    [2013-12-23 21:20:05] Extended error 0x0 - non-fatal error
    [2013-12-23 21:20:05] --------------------


    My code expects at least synchronous error condition on master and slaves.

    Also, regarding error handling: on the slave core the extended error is always 0x0, so I don't know whether the error is fatal or not. In the old 2-core decoder, after synchronizing memtabs and shared memories, the XDM_GETSTATUS call on the slave always returns the same error code as on the master.


    Regards,

    Andrey Lisnevich

    logs.zip
  • Hi Andrey,

    >>Still not everything is Ok in 4-cores mode. On Group1 master I get no error after first call, but on Group1 slave I get an error (full logs attached):

    [Praveen] We are not able to replicate this. If you get an error on the slave, it should appear on the master too, because the error is raised first on the master and then communicated to the slave. Can you please check again and share the setup you are using to replicate the issue?

    >>My code expects at least synchronous error condition on master and slaves.

    [Praveen] We updated the library to return the same error code on master and slaves. Please use the attached library.

    8664.H264_4Core_Lib_20131224.zip

    Please let us know for any questions

    Thanks,

    Praveen

  • Hi Praveen,

    Looks like this 8664 release of decoder corrupts my structures in 4-cores mode:

    [2013-12-24 17:56:48] ---------- Iteration 0
    [2013-12-24 17:56:48] New bufferID=1
    [2013-12-24 17:56:48] @8847eb80 1056000
    [2013-12-24 17:56:48] @88580880 264000
    [2013-12-24 17:56:48] @885c1000 264000
    [2013-12-24 17:56:48] process(inputID=1) begin
    [2013-12-24 17:56:48] barrWait @0c043994 begin
    [2013-12-24 17:56:48] barrWait @0c043994 end
    [2013-12-24 17:56:48] process(inputID=1) end
    [2013-12-24 17:56:48] @8847eb80 1056000
    [2013-12-24 17:56:48] @88580880 264000
    [2013-12-24 17:56:48] @885c1000 264000
    [2013-12-24 17:56:48] ---------- Iteration 1
    [2013-12-24 17:56:48] New bufferID=2
    [2013-12-24 17:56:48] @88601780 1056000
    [2013-12-24 17:56:48] @88703480 264000
    [2013-12-24 17:56:48] @88743c00 264000
    [2013-12-24 17:56:48] process(inputID=2) begin
    [2013-12-24 17:56:48] barrWait @0c043994 begin
    [2013-12-24 17:56:48] barrWait @0c043994 end
    [2013-12-24 17:56:48] process(inputID=2) end
    [2013-12-24 17:56:48] @8847eb80 202896144
    [2013-12-24 17:56:48] @0c2c0efc 8649516
    [2013-12-24 17:56:48] @00840cfc 17

    I print outDecoderBufferDescriptor before and after process call on core#0 (Group0 Master) and see that it is corrupted.


    The same code in 2-core mode does not have this issue.

    Detailed log attached.


    Regards,

    Andrey Lisnevich

    logs.zip
  • Hi Andrey,

    Thanks for taking the time to validate the 4-core HP decoder library.

    Is it possible for you to share the test setup or project you are using? That will help us validate the library before we share it with you.

    Please use the attached library, which has the fix for the outDecoderBufferDescriptor corruption.

    6131.H264_4Core_Lib_20131226.zip

    Thanks,

    Praveen

     

  • Hi Praveen,

    With this new codec in 4 core mode decoding does not work:


    1) hangs in process call: infinite lock release/acquire on both group masters

    or

    2) infinitely generates errors by process call: Extended error 0x1095

    Logs of both problems attached.

    Regards,

    Andrey Lisnevich

    log.zip
  • Hi Andrey,

    We were not able to replicate the issue, but we made some changes based on the logs. Can you please check with the attached library?

    Also, please share a standalone setup to replicate the issues.

    5875.H264_4Core_20131227.zip

    Thanks,

    Praveen

  • Hi Praveen,

    I will try to prepare a demo that demonstrates the issue.

    Regards,

    Andrey Lisnevich

  • Hi Andrey,

    Did you have a chance to prepare the demo?

    Thanks,

    Praveen.

  • Hi Praveen,

    While creating the demo I found and fixed a problem with input buffer management in my code, and now I get output from the decoder in 4-core mode.

    But I ran into another issue: for some reason the decoder is much slower in 4-core mode than in 2-core mode. Can you please check the attached logs? You can use the timestamps to see the difference in decoding speed. The logs show the same input decoded by the 2-core and the 4-core decoder. In both modes all the code is the same, and the multicore API (locks, barriers, memory management) is also the same.

    Regards,

    Andrey Lisnevich

    logs.zip
  • Hi Andrey,

    With our test application using 4-cores, we are observing performance improvement of at least 40% compared to that of 2-Core implementation.

    We looked at the logs and are wondering why the DSP is spending so much time on processing. It could be because of some synchronization issue between the core groups. Can you please share the demo setup so that we can verify?

    Thanks,

    Praveen. 

  • Hi Praveen,

    You can find demo attached. It runs on 8 cores but actually uses first 4 cores for decoding.

    Demo reads H.264 input from 0x90000000 address and outputs decoded YUV to 0x98000000.

    Sample Full HD input that I use for tests: https://drive.google.com/file/d/0Byw88ezNrM71LUdjN1VPWUZqS1U/edit?usp=sharing


    In the h264dec_test/logs directory you can find logs for the 2-core and 4-core modes. Decoding in 4-core mode (decode ticks 2056287491) is much slower than in 2-core mode (decode ticks 65375317).


    To enable 2 cores mode change the following line in Configuration.c:

    #define NUM_DECODER_CORES (4)

    to

    #define NUM_DECODER_CORES (2)

    Feel free to ask any questions.

    Regards,

    Andrey Lisnevich

    h264dec_test.zip
  • Hi Andrey,

    We were able to replicate the issue at our end. However, the project files attached in the setup give compilation errors (This project was created for a device-variant that is not currently recognized: TMS320C66XX.TMS320C6678.), so we copied the project files from the setup that you shared previously on 7th Oct 2013.

    We observe that the decoder spends more time in the process call in 4-core mode than in 2-core mode. We are looking into this issue and will get back to you with an update shortly.

    Thanks,

    Praveen 

  • Hi Andrey,

    We were able to locate the problem. The function TranscodeComponent_shmSync() consumes a large number of cycles because "shmem_size" is on the order of megabytes in the 4-core implementation, so the ranged Cache_wbInv/Cache_inv functions take many cycles to do the cache maintenance. We therefore used "Cache_wbInvAll()" instead, which consumes fewer cycles.

    Please refer to the code between lines 764 and 784 in the h264vdec_ti_VidMc.c file (attached for your reference) to see how we implemented the "ShmSync" function.
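
    The idea can be sketched as a size-based choice between a ranged operation and a whole-cache operation. This is not the code from h264vdec_ti_VidMc.c: the threshold, the `CacheOps` callbacks and the counting stubs are hypothetical, and on the target the callbacks would wrap the platform cache calls.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { SYNC_RANGE, SYNC_WHOLE_CACHE } SyncKind;

/* Hypothetical threshold; a real value would have to be measured. */
#define WHOLE_CACHE_THRESHOLD (512u * 1024u)

/* Callbacks standing in for the platform cache operations. */
typedef struct {
    void (*wb_inv_range)(void *addr, size_t size);
    void (*wb_inv_all)(void);
} CacheOps;

/* Counting stubs so the sketch can be exercised off-target. */
static int range_calls = 0;
static int all_calls = 0;
void stub_wb_inv_range(void *addr, size_t size) { (void)addr; (void)size; ++range_calls; }
void stub_wb_inv_all(void) { ++all_calls; }

/* A ranged operation costs time proportional to the region size, while a
 * whole-cache operation is bounded by the cache size, so large regions use
 * the whole-cache variant. */
SyncKind shm_sync(const CacheOps *ops, void *addr, size_t size)
{
    if (size >= WHOLE_CACHE_THRESHOLD) {
        ops->wb_inv_all();
        return SYNC_WHOLE_CACHE;
    }
    ops->wb_inv_range(addr, size);
    return SYNC_RANGE;
}
```

    One trade-off to keep in mind: a whole-cache write-back and invalidate also evicts unrelated data, so other code on that core may see extra cache misses afterwards.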

    8015.h264vdec_ti_VidMc.c

    After making this change in your setup the performance of 4-Core implementation is as expected.

    Please let us know for any questions


    Thanks,

    Praveen

  • Hi,

    I have another problem. The H.264 decoder in 4-core mode fails to decode a stream that decodes well in 2-core mode. The input is exactly the same for both decoders.

    You can find 2x and 4x logs attached.

    It happens when I feed data to the decoders frame by frame. In both modes a series of 0x1095 errors comes first:

    0x95 H264D_ERR_SEM_SLCHDR_WAIT_SYNC_POINT Decoding is skipping NAL units till a valid sync point is found

    Then at some point the 2-core decoder starts decoding frames and producing output, but at the same point the 4-core decoder starts producing 0x20c1 errors and then hangs:

    0xC1 H264D_ERR_IMPL_PPSUNAVAIL The PPS referred to in the slice header is unavailable
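
    The two codes above can be split into a codec-specific low byte and the upper status bits. A minimal sketch, assuming the usual XDM convention that bit 15 of the extended error marks a fatal error; the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: bit 15 is the XDM fatal-error bit. */
#define EXT_FATAL_BIT 15

/* Codec-specific error code, e.g. 0x95 or 0xC1 in the logs above. */
uint32_t codec_code(uint32_t extended_error)
{
    return extended_error & 0xFFu;
}

/* Generic status bits (bits 8..15). */
uint32_t status_bits(uint32_t extended_error)
{
    return extended_error & 0xFF00u;
}

int is_fatal(uint32_t extended_error)
{
    return (int)((extended_error >> EXT_FATAL_BIT) & 1u);
}
```

    Under that assumption neither 0x1095 nor 0x20c1 has the fatal bit set, which matches the "non-fatal error" lines in the logs.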

    Can you help in solving this issue?

    Regards,

    Andrey Lisnevich

    logs.zip
  • Hi Andrey,


    In 4-core mode, every alternate process call is a dummy call and does not process the input frame. The next (active) process call actually processes two frames of data, so the active process call must be fed with two frames of data.

    Please let us know for any questions


    Thanks,

    Praveen

  • Hi Praveen,

    From your words I see that input buffer logic of decoder in 4-core mode differs from 1, 2 cores mode.

    I implemented  it as you said and still get the same error. See attached log file.

    To input 2 frames I use multiple buffers of XDM1_BufDesc:

    XDM1_BufDesc.numBufs = 2;

    XDM1_BufDesc.bufs[0].buf - data of first frame

    XDM1_BufDesc.bufs[0].bufSize = size of first frame

    XDM1_BufDesc.bufs[1].buf - data of second frame

    XDM1_BufDesc.bufs[1].bufSize = size of second frame

    IVIDDEC2_InArgs.numBytes = XDM1_BufDesc.bufs[0].bufSize + XDM1_BufDesc.bufs[1].bufSize;

    Can you please help to find what is wrong.

    Regards,

    Andrey Lisnevich

  • Hi Andrey,

    For the decoder in 4-core mode, the input buffer needs to hold at least two frames of data. Internally, the main master (Group0) does frame delineation and passes the second frame's input pointer to the other group. So there is no need to provide the input using multiple buffers; just provide a single buffer containing two frames of data.
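
    A minimal sketch of building such a single input buffer from two separately received frames. `pack_two_frames` is a hypothetical helper; the returned total would then be used both as the size of the one input buffer (numBufs = 1) and as numBytes in the input arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies frame 0 followed directly by frame 1 into dst.
 * Returns the total number of bytes written, or 0 if dst is too small. */
size_t pack_two_frames(unsigned char *dst, size_t dst_capacity,
                       const unsigned char *frame0, size_t size0,
                       const unsigned char *frame1, size_t size1)
{
    if (size0 > dst_capacity || size1 > dst_capacity - size0)
        return 0;
    memcpy(dst, frame0, size0);
    memcpy(dst + size0, frame1, size1);
    return size0 + size1;
}
```

    The frames are stored back to back with no padding, so the start codes inside the data remain the only frame boundaries the decoder sees.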

    Please let me know for any questions

    Thanks,

    Praveen

         

  • Hi Praveen,

    Thanks for your support. I finally managed to run 4-core decoding.

    But it is very unstable on real-time DVB streams that contain errors (which is actually the usual situation). It hangs somewhere inside the 'process' call. The issue appears only in 4-core mode.

    The 1-core and 2-core modes are much more robust to errors: they report errors but do not hang.

    You can find sample H.264 elementary stream attached. You can use it with my demo above.

    Let me know if the issue is reproducible.

    Regards,

    Andrey Lisnevich

    buggy.264.zip