HEVC encoder hang

Hi,

The HEVC encoder in multicore mode hangs in the algorithm process call after generating a few hundred frames.

I found the problem while transcoding a live stream, but I was able to reproduce it in a simple demo that is attached to this post. In my case it hangs after this log message (the full log is attached):

.....

HEVC encoder generated 2000 frames
HEVC encoder generated 2100 frames

The demo is similar to the one in the post http://e2e.ti.com/support/embedded/multimedia_software_codecs/f/356/t/347196.aspx but with multicore functionality added.

Please help to resolve the issue.

Regards,

Andrey Lisnevich

hevc_issue2.zip
  • Hi Andrey,

    We are able to reproduce the issue. At present we are a bit occupied with the Beta release.
    Please let me know if it is urgent.

    Regards

    Kuladeepak

  • Hi Kuladeepak,

    For us this issue is critical because it prevents us from doing long-term transcoding tests. Stability is one of the key features of production transcoders.

    Regards,

    Andrey Lisnevich

  • Hi Andrey,
    As I mentioned, I was able to reproduce the issue. It hangs at a random frame, but always at the same barrier: the one that follows the encoder call. Some cores hang in this barrier and others hang in the MFENCE instruction.
    I am continuing to debug.

    Regards
    Kuladeepak

  • Hi Andrey,

    As I said earlier, I was able to reproduce the hang. Debugging didn't help much, as one core was hung in MFENCE.
    So I tried to reproduce the issue in our test application with the same input (blank input) and the same config. The encoder ran for 10000-15000 frames uninterrupted.
    It looks like an issue with the application: after adding a printf, or memsetting the input buffer to zero before the master process call, the encoder didn't hang.

    Please let me know if you have any input from your side.

    Regards
    Kuladeepak

  • Hi Kuladeepak,

    What can cause a hang in MFENCE?

    Regards,

    Andrey Lisnevich

  • Hi Andrey,

    MFENCE is an instruction that blocks while CPU-triggered memory transactions (cache write-back, cache mode changes, etc.) are in progress. It looks like there is an issue with the barrier implementation, as cores are hanging in the CacheInvalidate API, which is called from the barrier. While debugging, we observe that a core comes out of MFENCE when it is run from the halted state along with the other cores.

    Could you please try using uncached memory, with all invalidate API calls removed from the barrier implementation?

    Regards

    Kuladeepak

  • Hi Kuladeepak,

    I have used the same barrier with other codecs for a long time without any issues. I do not see any problems in the barrier implementation.

    Regards,

    Andrey Lisnevich

  • Hi Andrey,

    Would it be possible for you to give Kuladeepak's suggestion a quick try (removing all invalidate APIs in the barrier implementation)?

    This could shed some light on what is wrong.

    On the other hand, we have a new MCSDK Video release with the latest HEVC encoder, H.264HP encoder and H.264HP decoder releases, in case you want to give them a quick test.

    http://software-dl.ti.com/sdoemb/sdoemb_public_sw/mcsdk_video/02_02_00_42/index_FDS.html

    Also, we are working on integrating the HEVC decoder with 4K support into MCSDK Video (this would be drop43). I will let you know when it is ready.

    Thanks a lot,

    Paula 

    PS: The H.264HP decoder now includes MP/BP features, but it is an alpha release; GA is planned for September.

     

  • Hi Andrey,

    It looks like TranscodeComponent_lockAcquire has an issue that can lead to a deadlock.
    Suppose we have only 2 cores and both are at the end of the frame.
    Time 1: Core0 has acquired the lock.
    Time 2: Core1 checks the lock status. Since Core0 holds the lock, the value of key->userId is 0.
    Time 3: Core0 has released the lock, and the value of key->userId is -1.
    Time 4: Core1 can hang, as it waits until key->userId becomes something other than -1, and the value of key->userId doesn't change because Core0 has no further task that acquires the lock.

    Since Cache_inv is then called continuously, the other cores hang in MFENCE.

    We can work around this by commenting out the following code in the TranscodeComponent_lockAcquire function (TranscodeComponent.c):
    do {
    Cache_inv(key, sizeof(*key), Cache_Type_ALL, TRUE);
    } while (key->userId == -1);

    With this change the hang is no longer observed; I have run it for more than 1 hour.
    Please let me know if you are still observing the hang.

    Regards
    Kuladeepak

  • Hi Kuladeepak,

    I implemented the barrier and locks using uncached memory and also found the problem in the lockAcquire logic.

    do {
    Cache_inv(key, sizeof(*key), Cache_Type_ALL, TRUE);
    } while (key->userId == -1);

    should be changed to

    do {
    Cache_inv(key, sizeof(*key), Cache_Type_ALL, TRUE);
    } while (key->userId != -1);

    Thanks for the support,

    Andrey Lisnevich