HEVC encoder hangs randomly

Hi,

We are trying to use the latest HEVC encoder, C66x_h265venc_01_00_00_44_ELF, for live transcoding of an SD stream, running on 8 cores.

However, we noticed that it works for up to 30 minutes and then hangs at a random point. Sometimes it hangs within 1 minute.

We created a demo that reproduces the issue:

https://drive.google.com/file/d/0Byw88ezNrM71S3gyMFI4ZkRqNzg/view?usp=sharing

It should be run on 8 cores. The input to the encoder is generated and is always the same.

The demo may hang very quickly, but it may also hang only after processing 100000 frames (about 30 minutes running on the EVM).

Please review 1.log and 2.log.

Every 100 frames the demo prints the frame number along with the size and CRC32 of the last frame.

In 1.log the demo hung after 68500 calls. In 2.log it hung much sooner, after 15700.
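
For readers who want to produce a comparable log, here is a minimal sketch of a checkpoint printer. It assumes the standard reflected CRC-32 (polynomial 0xEDB88320, as used by zlib), which may not be the variant the demo uses, and it assumes that the "#N" line precedes the matching "bytesGenerated" line. The names crc32_bytes and log_checkpoint are hypothetical and are not taken from the demo.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Bitwise CRC-32, reflected polynomial 0xEDB88320 (assumed variant). */
uint32_t crc32_bytes(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Apply the polynomial only when the low bit is set. */
            uint32_t mask = (uint32_t)(0u - (crc & 1u));
            crc = (crc >> 1) ^ (0xEDB88320u & mask);
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Prints one checkpoint every 100 frames; returns 1 if a line was printed. */
int log_checkpoint(uint32_t frame_no, const uint8_t *bitstream, size_t bytes_generated)
{
    if (frame_no % 100u != 0u) {
        return 0;
    }
    printf("#%lu\n", (unsigned long)frame_no);
    printf("bytesGenerated=%lu crc32=%lu\n",
           (unsigned long)bytes_generated,
           (unsigned long)crc32_bytes(bitstream, bytes_generated));
    return 1;
}
```

Logging a checksum of each encoded frame, rather than only its size, is what makes a silent divergence between two runs visible long before the hang itself.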

If you compare the files, you can see that the output starts to differ before the hang. The first excerpt below is from 1.log and the second is from 2.log:

bytesGenerated=574 crc32=3093084694
#15100
bytesGenerated=25 crc32=3535748092
#15200
bytesGenerated=683 crc32=1842690605
#15300
bytesGenerated=236 crc32=3188576091
#15400
bytesGenerated=276 crc32=1085796158
#15500
bytesGenerated=34 crc32=391204912
#15600
bytesGenerated=303 crc32=4289287941
#15700
bytesGenerated=679 crc32=3494958107

-- CONTINUES WORKING...

bytesGenerated=574 crc32=3093084694
#15100
bytesGenerated=25 crc32=3535748092
#15200
bytesGenerated=581 crc32=481235607
#15300
bytesGenerated=918 crc32=1741148722
#15400
bytesGenerated=497 crc32=105519857
#15500
bytesGenerated=699 crc32=48649686
#15600
bytesGenerated=38 crc32=3215711229
#15700
bytesGenerated=586 crc32=1659338925

-- HUNG

So the encoder most likely hangs because of some nondeterministic factor (e.g. reading uninitialized memory).
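
The point where the two runs diverge can be located mechanically. The sketch below uses the CRC32 values quoted in the two excerpts; the array names and the function first_divergence are hypothetical helpers, not part of the demo.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC32 values copied from the two log excerpts, in order of appearance. */
const uint32_t crc_log1[8] = {
    3093084694u, 3535748092u, 1842690605u, 3188576091u,
    1085796158u, 391204912u, 4289287941u, 3494958107u
};
const uint32_t crc_log2[8] = {
    3093084694u, 3535748092u, 481235607u, 1741148722u,
    105519857u, 48649686u, 3215711229u, 1659338925u
};

/* Returns the index of the first checkpoint whose CRC differs, or n if all match. */
size_t first_divergence(const uint32_t *a, const uint32_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (a[i] != b[i]) {
            return i;
        }
    }
    return n;
}
```

For these excerpts the first mismatch is the third entry, the line printed after "#15200", so the runs had already diverged several hundred frames before 2.log stopped.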

We do not see any problems in our code.

Could you please help debug the issue and find out where the problem is?

Regards,

Andrey Lisnevich

  • I forgot to mention that for us the demo always hangs while looping somewhere in H265VENC_TI_ScheduleWavefronts or in H265VENC_TI_getTaskPB_1Chip, calling the multicore API lock callbacks.
  • Andrey, please confirm the encoder version you are using (you posted C66x_h265vdec_01_00_00_29). Thanks.

    Paula

  • Hi Paula,

    Yes, my mistake. I am using the encoder, and it is C66x_h265venc_01_00_00_44_ELF.

    Regards,
    Andrey Lisnevich
  • Hi Andrey,
    Just an insight: hangs usually happen in the mentioned functions when not all tasks of the current frame have completed and the lock could not be acquired. We are working with your test setup to reproduce the hang. In the meantime, can you send me the value of the "*pLCU_completed" array in the hung state? It gives the task completion state for each LCU.

    Regards
    Kuladeepak
  • Hi Kuladeepak,

    The problem may be not only in the HEVC encoder but also in our code (the callbacks, for example), or it may be some kind of memory corruption that leads to unexpected results.

    I don't know how to access the "*pLCU_completed" array. Can you give more details about that?

    Regards,
    Andrey Lisnevich

  • Hi Andrey,

    Can you send the memory dump of memory allocated under "IVIDMC3_SHMEM_ATTRS_RMT_UNCACHED_LOC_SL2" when it has hung?
    The pLCU_completed is pointing to a memory allocated with the above attribute in the uncached SL2 memory.

    Thanks and Regards,
    Shashikantha
  • Hi Shashikantha Srinivas,

    The memory dump: https://drive.google.com/folderview?id=0B7aam1m6VieebDNCMkNaSkhrbnc&usp=sharing

    (Allocated under "IVIDMC3_SHMEM_ATTRS_RMT_UNCACHED_LOC_SL2" when it has hung)

  • Hi
    We are also observing the hang with the release library, but we are not able to reproduce it with the debug library.
    The memory dump you provided looks fine. We are inspecting it further.

    Regards
    Kuladeepak
  • Hi,

    After disabling the cache threshold logic in the shmSync callback, I get different output each time I start the demo:

    int32_t TranscodeComponent_shmSync(int32_t user_id, void* shmemHandle, int32_t* shmem_base, int32_t shmem_size, uint32_t shmem_sync_attribs)
    {
        assert(shmemHandle != NULL);
        assert(((Ividmc3Key*) shmemHandle)->data.sharedMemory.base != NULL);
        assert(((Ividmc3Key*) shmemHandle)->data.sharedMemory.size > 0);
        assert(getHeapByType(((Ividmc3Key*) shmemHandle)->data.sharedMemory.heapType) != NULL);

        assert(shmem_base == ((Ividmc3Key*) shmemHandle)->data.sharedMemory.base);
        assert((shmem_size >= 0) && (shmem_size <= ((Ividmc3Key*) shmemHandle)->data.sharedMemory.size)); // TODO: change shmem_size >= 0 to shmem_size > 0 once fixed in HEVC encoder
        assert(shmem_sync_attribs != 0);

        if (shmem_sync_attribs == (IVIDMC3_SYNC_ATTRIBS_LOC_WRITEBACK | IVIDMC3_SYNC_ATTRIBS_LOC_INVALIDATE)) {
            //if (shmem_size >= SIUVIDMC3_CACHE_WBINVALL_THRESHOLD) {
            //    Cache_wbInvAll();
            //} else {
            Cache_wbInv(shmem_base, shmem_size, Cache_Type_ALL, TRUE);
            //}
        } else if (shmem_sync_attribs == IVIDMC3_SYNC_ATTRIBS_LOC_WRITEBACK) {
            //if (shmem_size >= SIUVIDMC3_CACHE_WBINVALL_THRESHOLD) {
            //    Cache_wbInvAll();
            //} else {
            Cache_wb(shmem_base, shmem_size, Cache_Type_ALL, TRUE);
            //}
        } else if (shmem_sync_attribs == IVIDMC3_SYNC_ATTRIBS_LOC_INVALIDATE) {
            //if (shmem_size >= SIUVIDMC3_CACHE_WBINVALL_THRESHOLD) {
            //    Cache_wbInvAll();
            //} else {
            Cache_inv(shmem_base, shmem_size, Cache_Type_ALL, TRUE);
            //}
        } else {
            assert(0); // not implemented sync method
        }

        return 0;
    }

    But I still do not see any problems in my code. Can you debug into it and see what causes this?

    Regards,
    Andrey Lisnevich
  • Hi Andrey,

    We were able to find the cause of the hang. It seems that two cores are simultaneously executing critical code that is inside the lock, i.e. two cores are acquiring a single lock.

    Can you please review your lock acquire and release code and fix the issue? Unless the critical section is executed exclusively by a single core, this hang cannot be fixed.

    Thanks and Regards,

    Shashikantha

  • Hi Shashikantha,


    I do not see any problems in the lock routines.


    This is the test:

    #pragma DATA_SECTION(lock, ".dd3uncached")
    #pragma DATA_ALIGN(lock, 8)
    static int lock = -1;

    #pragma DATA_SECTION(lockCounter1, ".dd3uncached")
    #pragma DATA_ALIGN(lockCounter1, 8)
    static int lockCounter1 = 0;

    #pragma DATA_SECTION(lockCounter2, ".dd3uncached")
    #pragma DATA_ALIGN(lockCounter2, 8)
    static int lockCounter2 = 0;

    static void lockCheckTask(UArg arg0, UArg arg1)
    {
        int core = DNUM;
        System_printf("LOCK CHECK TASK\n");
        System_flush();

        int i;
        for (i = 0; i < 10000; i++) {
            TranscodeComponent_lockAcquire(core, &lock);
            lockCounter1++;
            lockCounter2++;
            TranscodeComponent_lockRelease(core, &lock);
        }

        System_printf("core %d lockCounter1=%d lockCounter2=%d\n", core, lockCounter1, lockCounter2);
        System_flush();
    }

    And these are the results from all 8 cores:

    [2015-01-22 14:30:51] core 0 lockCounter1=44147 lockCounter2=44147
    [2015-01-22 14:30:51] core 1 lockCounter1=77499 lockCounter2=77499
    [2015-01-22 14:30:51] core 2 lockCounter1=77857 lockCounter2=77857
    [2015-01-22 14:30:51] core 3 lockCounter1=78400 lockCounter2=78400
    [2015-01-22 14:30:51] core 4 lockCounter1=79118 lockCounter2=79118
    [2015-01-22 14:30:51] core 5 lockCounter1=79587 lockCounter2=79587
    [2015-01-22 14:30:51] core 6 lockCounter1=79778 lockCounter2=79778
    [2015-01-22 14:30:51] core 7 lockCounter1=80000 lockCounter2=80000

    On the other hand, I see problems in the way the HEVC codec calls the lock callbacks. I added additional asserts to the lock release function:

    int32_t TranscodeComponent_lockRelease(int32_t user_id, void* lockHandle)
    {
        assert(lockHandle != NULL);

        int_fast32_t* lockUserId = (int_fast32_t*) lockHandle;

        _nassert((int) lockUserId % 8 == 0);

        GateMP_Handle gate = GlobalGates_getTranscodeComponentGate();
        IArg gateKey = GateMP_enter(gate);

        if (*lockUserId == user_id) {
            *lockUserId = -1;
            GateMP_leave(gate, gateKey);
            return IVIDMC3_LOCK_ERROR_NONE;
        } else if (*lockUserId == -1) {
            GateMP_leave(gate, gateKey);
            assert(0);
            return IVIDMC3_LOCK_ERROR_DBL_UNLOCK;
        } else {
            GateMP_leave(gate, gateKey);
            assert(0);
            return IVIDMC3_LOCK_ERROR_USER_ID;
        }
    }

    It fails on the assert next to IVIDMC3_LOCK_ERROR_USER_ID. This means that lock release was called on a lock acquired by a different user.

    Regards,

    Andrey Lisnevich
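
A host-side model of the owner-id lock may help when reading the three release outcomes above. This is only a single-threaded sketch of the bookkeeping, with no GateMP and no cache handling. The enum values are hypothetical stand-ins for the real IVIDMC3_LOCK_ERROR_* constants from the codec headers, and the model_* functions are invented for illustration.

```c
#include <stdint.h>

/* Hypothetical values; the real IVIDMC3_LOCK_ERROR_* constants live in the codec headers. */
enum {
    MODEL_LOCK_OK = 0,
    MODEL_LOCK_ERR_DBL_UNLOCK = 1,
    MODEL_LOCK_ERR_USER_ID = 2,
    MODEL_LOCK_ERR_BUSY = 3
};

#define MODEL_LOCK_FREE (-1)

/* Non-blocking acquire: succeeds only if nobody owns the lock word. */
int32_t model_try_acquire(int32_t user_id, int32_t *lock_word)
{
    if (*lock_word != MODEL_LOCK_FREE) {
        return MODEL_LOCK_ERR_BUSY;
    }
    *lock_word = user_id;
    return MODEL_LOCK_OK;
}

/* Release mirrors the three branches of the release callback shown above. */
int32_t model_release(int32_t user_id, int32_t *lock_word)
{
    if (*lock_word == user_id) {
        *lock_word = MODEL_LOCK_FREE;
        return MODEL_LOCK_OK;
    }
    if (*lock_word == MODEL_LOCK_FREE) {
        return MODEL_LOCK_ERR_DBL_UNLOCK;
    }
    return MODEL_LOCK_ERR_USER_ID;
}
```

In this model, a release by user 0 returns the USER_ID error whenever the lock word holds another id. One possible way to reach that state, besides a caller releasing a lock it never held, is a second core having been granted the lock and overwriting the owner id while the first core was still inside the critical section.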

  • Hi,

    I implemented the lock in a different way:

    Only 2 locks are needed for this HEVC configuration. For each lock I create its own GateMP, then call GateMP_enter in acquire and GateMP_leave in release: the simplest implementation. This lock also passes my test.

    But I experience the same problems with this lock, and when the cache threshold logic is disabled I get different output on each execution.

    Regards,
    Andrey Lisnevich
  • Hi Andrey,
    I understand your issue. We added a checkpoint to detect whether more than one core acquires the lock simultaneously.
    It reports that more than one core is acquiring the lock simultaneously. If this happens, the encoder will behave randomly and could produce different output on each execution.

    What do you mean by 2 locks being needed for HEVC? Can you please elaborate on the exercise you did?

    Regards
    Kuladeepak
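
The thread does not show how the codec team's checkpoint was implemented. As an illustration of the general idea only, the sketch below counts how many callers are inside the critical section and records a violation if a caller enters while the count is already nonzero. It uses C11 atomics on a host compiler; all names are hypothetical, and on the C66x the counter would additionally have to live in memory that is coherent across cores.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Number of callers currently inside the critical section. */
static atomic_int occupancy = 0;
/* Latched once two callers have been inside at the same time. */
static atomic_bool violation = false;

/* Call immediately after the lock has been acquired. */
void critical_enter_check(void)
{
    /* atomic_fetch_add returns the previous value: nonzero means someone is already inside. */
    if (atomic_fetch_add(&occupancy, 1) != 0) {
        atomic_store(&violation, true);
    }
}

/* Call immediately before the lock is released. */
void critical_leave_check(void)
{
    atomic_fetch_sub(&occupancy, 1);
}

bool exclusion_violated(void)
{
    return atomic_load(&violation);
}
```

Unlike a pair of counters that are compared only at the end of a test run, this kind of check flags the overlap at the moment it happens.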
  • Hi Kuladeepak,

    > Can you please review your lock acquire and release codes and fix the issue? Unless the critical section is executed exclusively by a single core, this hang issue cannot be fixed.

    We reviewed the code and implemented the locks in a different way. But we are still getting random output on each demo execution when the shared memory sync threshold is disabled (as we mentioned before).

    So it looks to us as though the problem is not in the locks.

    Regards,
    Andrey Lisnevich
  • Hi Kuladeepak,

    Do you have any news on the issue?

    Please let us know how we can help.

    Regards,
    Andrey Lisnevich
  • Hi Andrey,

    Is the hang issue resolved after your lock implementation change?

    For the mismatch issue, can you remove the threshold completely and always do cache writeback or invalidate only for the region passed to the API, and see whether the mismatch issue is resolved?

    Thanks and Regards,

    Shashikantha

  • Hi Andrey,
    I didn't see the code where you commented out the threshold logic.
    Is the hang issue resolved after changing the lock API?
    If it is resolved, please share the new API so that the cache writeback issue can be reproduced.

    Thanks and Regards,
    Shashikantha
  • Hi Shashikantha,

    It looks like the hang issue is gone with the correct lock implementation, but I am still testing this.

    However, when the threshold logic is disabled I still get different output on each execution while the input is the same. I believe this is not OK. Similar threshold logic is present in the MCSDK VIDEO demo. The threshold logic is disabled like this:

    int32_t TranscodeComponent_shmSync(int32_t user_id, void* shmemHandle, int32_t* shmem_base, int32_t shmem_size, uint32_t shmem_sync_attribs)
    {
        assert(shmemHandle != NULL);
        assert(((Ividmc3Key*) shmemHandle)->data.sharedMemory.base != NULL);
        assert(((Ividmc3Key*) shmemHandle)->data.sharedMemory.size > 0);
        assert(getHeapByType(((Ividmc3Key*) shmemHandle)->data.sharedMemory.heapType) != NULL);

        assert(shmem_base == ((Ividmc3Key*) shmemHandle)->data.sharedMemory.base);
        assert((shmem_size >= 0) && (shmem_size <= ((Ividmc3Key*) shmemHandle)->data.sharedMemory.size)); // TODO: change shmem_size >= 0 to shmem_size > 0 once fixed in HEVC encoder
        assert(shmem_sync_attribs != 0);

        if (shmem_sync_attribs == (IVIDMC3_SYNC_ATTRIBS_LOC_WRITEBACK | IVIDMC3_SYNC_ATTRIBS_LOC_INVALIDATE)) {
            //if (shmem_size >= SIUVIDMC3_CACHE_WBINVALL_THRESHOLD) {
            //    Cache_wbInvAll();
            //} else {
            Cache_wbInv(shmem_base, shmem_size, Cache_Type_ALL, TRUE);
            //}
        } else if (shmem_sync_attribs == IVIDMC3_SYNC_ATTRIBS_LOC_WRITEBACK) {
            //if (shmem_size >= SIUVIDMC3_CACHE_WBINVALL_THRESHOLD) {
            //    Cache_wbInvAll();
            //} else {
            Cache_wb(shmem_base, shmem_size, Cache_Type_ALL, TRUE);
            //}
        } else if (shmem_sync_attribs == IVIDMC3_SYNC_ATTRIBS_LOC_INVALIDATE) {
            //if (shmem_size >= SIUVIDMC3_CACHE_WBINVALL_THRESHOLD) {
            //    Cache_wbInvAll();
            //} else {
            Cache_inv(shmem_base, shmem_size, Cache_Type_ALL, TRUE);
            //}
        } else {
            assert(0); // not implemented sync method
        }

        return 0;
    }

    Regards,
    Andrey Lisnevich
  • Hi Andrey,

     

    Andrey Lisnevich said:
    I get different output each execution while input is the same. I believe it is not Ok

    Yes, the output should always bit-match across executions for the same input.

    To narrow the issue down to cache sync, can you please turn off the cache altogether and verify whether the output is the same across consecutive runs?

    Thanks and Regards,
    Shashikantha

  • Hi Shashikantha,

    I put all the allocated blocks into DDR3 uncached memory, but I am still getting different output each time when the threshold is disabled.

    To test this, please increase the DDR3 uncached heap size in app.cfg like this:

    Program.global.ddr3UncachedSharedRegionSize = 0x10000000;

    and change the heap selection function in TranscodeComponent.c:

    static IHeap_Handle getHeapByType(IVIDMC3_SHMEM_ATTRS heapType)
    {
        switch (heapType) {
        case IVIDMC3_SHMEM_ATTRS_DDR_CACHED:
        case IVIDMC3_SHMEM_ATTRS_RMT_CACHED_LOC_DDR:
            //return ddr3Heap;
            return ddr3UncachedHeap;

        case IVIDMC3_SHMEM_ATTRS_DDR_UNCACHED:
        case IVIDMC3_SHMEM_ATTRS_RMT_UNCACHED_LOC_DDR:
        case IVIDMC3_SHMEM_ATTRS_RMT_UNCACHED_NO_LOC:
        case IVIDMC3_SHMEM_ATTRS_RMT_UNCACHED_LOC_SL2:
            return ddr3UncachedHeap;

        case IVIDMC3_SHMEM_ATTRS_SL2:
        case IVIDMC3_SHMEM_ATTRS_RMT_CACHED_LOC_SL2:
            //return msmcHeap;
            return ddr3UncachedHeap;

        default:
            return NULL;
        }
    }

    P.S.
    Fixed lock implementation for the demo:

    int32_t TranscodeComponent_lockAcquire(int32_t user_id, void* lockHandle)
    {
        assert(lockHandle != NULL);

        volatile int_fast32_t* lockUserId = (int_fast32_t*) lockHandle;

        _nassert((int) lockUserId % 8 == 0);

        GateMP_Handle gate = GlobalGates_getTranscodeComponentGate();

        while (1) {
            IArg gateKey = GateMP_enter(gate);

            if (*lockUserId == -1) { // Lock is released
                *lockUserId = user_id;
                Cache_wb((void*) lockUserId, sizeof(*lockUserId), Cache_Type_L1D, TRUE);
                GateMP_leave(gate, gateKey);
                return IVIDMC3_LOCK_ERROR_NONE;
            } else {
                assert(*lockUserId != user_id);
            }

            GateMP_leave(gate, gateKey);

            // Wait until lock is released
            while (*lockUserId != -1);
        }
    }

    int32_t TranscodeComponent_lockRelease(int32_t user_id, void* lockHandle)
    {
        assert(lockHandle != NULL);

        int_fast32_t* lockUserId = (int_fast32_t*) lockHandle;

        _nassert((int) lockUserId % 8 == 0);

        GateMP_Handle gate = GlobalGates_getTranscodeComponentGate();
        IArg gateKey = GateMP_enter(gate);

        if (*lockUserId == user_id) {
            *lockUserId = -1;
            Cache_wb((void*) lockUserId, sizeof(*lockUserId), Cache_Type_L1D, TRUE);
            GateMP_leave(gate, gateKey);
            return IVIDMC3_LOCK_ERROR_NONE;
        } else if (*lockUserId == -1) {
            GateMP_leave(gate, gateKey);
            assert(0);
            return IVIDMC3_LOCK_ERROR_DBL_UNLOCK;
        } else {
            GateMP_leave(gate, gateKey);
            assert(0);
            return IVIDMC3_LOCK_ERROR_USER_ID;
        }
    }

    Regards,
    Andrey Lisnevich
  • Andrey,

    By disabling the cache, I meant disabling caching itself, i.e. in the function initializeCaching() set the sizes of L1P, L1D, and L2 to 0. Also use Cache_Mar_DISABLE when calling Cache_setMar(). Restart the board before running with these changes.

    This ensures that no data is cached. If the outputs match after these changes, then the problem is related to cache coherency; otherwise the problem lies somewhere else.

    Thanks and Regards,
    Shashikantha

  • Shashikantha,

    Thanks for the clarification. I did as you said:

    1) Disabled all caches in platform package
    2) Disabled all caches in SYS/BIOS by calling Cache_setSize
    3) Disabled caching for all MSMC and DDR3 memory regions by calling Cache_setMar with Cache_Mar_DISABLE on them
    4) Also for all allocations I use DDR3 memory even when MSMC memory is requested, just in case

    Now I get the same output each time I run the demo. This output also matches the output with the cache enabled and the cache sync threshold functionality turned on.

    As you said, it looks like the problem is related to cache coherency.

    Regards,
    Andrey Lisnevich
  • Hi Andrey,

    It looks like the input buffers are not invalidated properly.
    In the doMasterEncoding module we made the following change:

    Cache_wb(displayBufs[0], (bufferSizes[0]), Cache_Type_ALL, TRUE);
    //Cache_wb(_inputYBufs, (_frameNum * bufferSizes[0]), Cache_Type_ALL, TRUE);
    Cache_wb(displayBufs[1], (bufferSizes[1]), Cache_Type_ALL, TRUE);
    //Cache_wb(_inputUBufs, (_frameNum * bufferSizes[1]), Cache_Type_ALL, TRUE);
    Cache_wb(displayBufs[2], (bufferSizes[2]), Cache_Type_ALL, TRUE);
    //Cache_wb(_inputVBufs, (_frameNum * bufferSizes[2]), Cache_Type_ALL, TRUE);

    After this change, the output matches.
    In the current code, the cache writeback for the input buffers is done for all N-1 previous frames but not for the current frame N.
    If the input buffer of the current frame N is written back, as in the code above, the output is fine.
    Also, when thresholds were used for cache writeback, the whole cache was flushed whenever the size exceeded the threshold, so the input buffer was flushed from the cache as well.
    Please see if the fix solves the issue at your end.

    Thanks and Regards,
    Shashikantha
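
The effect described in this reply can be reproduced with a toy model of a write-back cache on a host machine. The sketch below is not the demo code: ToyCache and every function in it are hypothetical, the frame size is tiny, and it assumes that the current frame's buffer sits at offset frame_num * frame size inside the input pool, which the thread does not state explicitly.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { FRAME_BYTES = 4, NUM_FRAMES = 3, POOL_BYTES = FRAME_BYTES * NUM_FRAMES };

/* Toy write-back cache: writes land in `cached` and stay dirty until written back. */
typedef struct {
    uint8_t memory[POOL_BYTES]; /* what the other cores see */
    uint8_t cached[POOL_BYTES]; /* what the writing core sees */
    bool dirty[POOL_BYTES];
} ToyCache;

void toy_init(ToyCache *c) { memset(c, 0, sizeof *c); }

void toy_write(ToyCache *c, size_t off, const uint8_t *src, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        c->cached[off + i] = src[i];
        c->dirty[off + i] = true;
    }
}

/* Models a ranged write-back such as Cache_wb(base, size, ...). */
void toy_wb(ToyCache *c, size_t off, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (c->dirty[off + i]) {
            c->memory[off + i] = c->cached[off + i];
            c->dirty[off + i] = false;
        }
    }
}

/* True if shared memory already holds the frame that was just written. */
bool frame_visible(const ToyCache *c, size_t frame, const uint8_t *expected)
{
    return memcmp(&c->memory[frame * FRAME_BYTES], expected, FRAME_BYTES) == 0;
}

/* Old pattern: write back frame_num * size from the pool base (frames 0..frame_num-1). */
bool submit_frame_old(ToyCache *c, size_t frame_num, const uint8_t *pixels)
{
    toy_write(c, frame_num * FRAME_BYTES, pixels, FRAME_BYTES);
    toy_wb(c, 0, frame_num * FRAME_BYTES);
    return frame_visible(c, frame_num, pixels);
}

/* Fixed pattern: write back exactly the buffer of the current frame. */
bool submit_frame_fixed(ToyCache *c, size_t frame_num, const uint8_t *pixels)
{
    toy_write(c, frame_num * FRAME_BYTES, pixels, FRAME_BYTES);
    toy_wb(c, frame_num * FRAME_BYTES, FRAME_BYTES);
    return frame_visible(c, frame_num, pixels);
}
```

With the old pattern the current frame stays dirty in the writer's cache, so the other cores read stale input. A whole-pool write-back, which is roughly what the threshold path did by flushing the entire cache, makes the frame visible again, and that would explain why the mismatch appeared only once the threshold logic was commented out.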
  • Thanks for the support,

    It was a bug in the demo. All the issues are now fixed, and HEVC encoding runs smoothly.

    Regards,
    Andrey Lisnevich