
H.264 encoder with barrier implementation on uncacheable memory

Hi,

I implemented a barrier that operates on uncached memory:

void uncachedOrderedBarrier(volatile int_fast8_t* barrierValue, int_fast8_t userIndex, int_fast8_t isLastUser)
{
    _nassert((int) barrierValue % 8 == 0);
    _nassert(userIndex >= 0);

    //Cache_inv((void*)0x9FFFFF00, 1, Cache_Type_L1D, FALSE);

    while (*barrierValue != userIndex); // Wait for user's turn to enter the barrier

    if (isLastUser) {
        *barrierValue = -userIndex; // Last user enables exit mode of the barrier
    } else {
        *barrierValue = userIndex + 1; // Give turn to next user to enter the barrier

        while (*barrierValue != -userIndex - 1);

        *barrierValue = -userIndex; // Give turn to next user to exit the barrier
    }
}

void MulticoreApi_swbarr(int32_t coreID, int32_t swbarr_id, uint32_t swbarr_cnt)
{
    uncachedOrderedBarrier((int_fast8_t*) &uncachedOrderedBarrierValues[swbarr_id], coreID, swbarr_cnt == coreID + 1);
}
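
For completeness, the barrier values live in a region whose MAR bit disables caching. A minimal sketch of the declaration is below; the section name, array size, and element type are assumptions specific to my project, not part of the codec.

#include <stdint.h>

#define NUM_SW_BARRIERS 4   /* illustrative count */

/* One 8-byte slot per software barrier, placed by the linker in a section
 * mapped to a cache-disabled region, so that the
 * _nassert((int) barrierValue % 8 == 0) in the barrier holds. */
#pragma DATA_SECTION(uncachedOrderedBarrierValues, ".uncachedShared")
#pragma DATA_ALIGN(uncachedOrderedBarrierValues, 8)
int64_t uncachedOrderedBarrierValues[NUM_SW_BARRIERS] = { 0 };

Each of the swbarr_cnt participating cores calls MulticoreApi_swbarr(coreID, swbarr_id, swbarr_cnt) with coreID running 0..swbarr_cnt-1, so the barrier value counts up through the user indices on entry and back down through their negatives on exit.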


The problem is that the H.264 HP encoder (01.00.02.00) produces artifacts (shaking video slices) with this barrier implementation.
I found that the artifacts go away only if I Cache_inv or Cache_wbInv some cache-enabled memory address inside the barrier (see the commented-out Cache_inv in the code above).

When I replace the Cache_inv with a dummy delay loop, with another cache operation such as Cache_wbInvAll, or with a Cache_inv on a cache-disabled memory address, the same artifacts come back; I tried several variants.
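
Roughly, the variants look like the sketch below; the loop count and the cache-disabled test address are illustrative, only the address in variant 1 is the one from my code.

#include <ti/sysbios/hal/Cache.h>

#define UNCACHED_TEST_ADDR 0xA0000000u  /* placeholder: any address whose MAR caching bit is cleared */

/* Variants tried at the top of uncachedOrderedBarrier(); only variant 1
 * removes the artifacts. */
static void barrierCacheExperiment(int variant)
{
    switch (variant) {
    case 1: /* works: invalidate one line at a cache-ENABLED address */
        Cache_inv((void*)0x9FFFFF00, 1, Cache_Type_L1D, FALSE);
        break;
    case 2: { /* does not help: dummy delay loop of comparable duration */
        volatile int i;
        for (i = 0; i < 1000; i++);
        break;
    }
    case 3: /* does not help: global writeback-invalidate */
        Cache_wbInvAll();
        break;
    case 4: /* does not help: invalidate at a cache-DISABLED address */
        Cache_inv((void*)UNCACHED_TEST_ADDR, 1, Cache_Type_L1D, FALSE);
        break;
    }
}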

I have spent a lot of time debugging this issue and do not see any problem in my barrier implementation. Can you explain this behaviour?

Regards,

Andrey Lisnevich

  • Hi Andrey,

    The codec calls the shmmap_sync APIs for cache coherence across cores. These APIs are implemented on the application side; please check whether anything there has changed in connection with this issue. If you have made any changes to the sync APIs, please share them.
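
    For reference, such an application-side sync for a shared cacheable buffer typically wraps the accesses with writeback/invalidate calls, roughly as in the sketch below. This is illustrative only, not the actual shmmap_sync code; the function names are placeholders.

    #include <ti/sysbios/hal/Cache.h>

    /* Writer core: push its cached copy of the shared region out to MSMC/DDR
     * after updating it, so the other cores can see the new contents. */
    static void sharedRegionWritten(void *addr, unsigned int size)
    {
        Cache_wb(addr, size, Cache_Type_ALLD, TRUE);
    }

    /* Reader core: discard its (possibly stale) cached copy of the shared
     * region before reading, so the next access fetches fresh data. */
    static void sharedRegionAboutToRead(void *addr, unsigned int size)
    {
        Cache_inv(addr, size, Cache_Type_ALLD, TRUE);
    }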

    Does it work with ANY cache-enabled memory address, or only with addresses related to or near an input/output buffer?

    Also, please share the encoded output that shows the artifact issue.

    Regards

    Sudheesh

  • Hi Sudheesh,

    It works with ANY cache-enabled address, near or far, MSMC or DDR3.
    I made no changes to the shared memory APIs.

    For me the issue is reproducible when running 4 cores at Full HD resolution.

    Regards,

    Andrey Lisnevich

  • Corrupted output sample attached.

    out0.ts.zip
  • Hi Andrey,

    Based on the output you shared, it looks like the input YUV data is corrupted.

    Only the slices from Cores 1, 2 and 3 are corrupted; the Core 0 slice looks fine.

    The master (Core 0) shares the input data pointers with the slave cores and enters the barrier. The slave cores pick up the updated pointers once they come out of the barrier, using the sync APIs. It looks like this sync failed.

    To confirm that this is the problem, you can disable the cache for all codec shared buffer requests.

    Can you make sure all input buffers and memtabs requested by the codec are aligned to 256 bytes? Please also share your sync API implementation. If the issue is reproducible with the standalone setup you prepared earlier, please share it with us.
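
    For example, a statically allocated buffer handed to the codec can be forced to 256-byte alignment as in the sketch below; the symbol and section names are placeholders, and for heap allocations passing align = 256 to Memory_alloc achieves the same.

    /* Sketch: keep codec-owned buffers 256-byte aligned so that they never
     * share a cache line with unrelated data. */
    #pragma DATA_ALIGN(inputYuvBuffer, 256)
    #pragma DATA_SECTION(inputYuvBuffer, ".ddrShared")
    unsigned char inputYuvBuffer[1920 * 1088 * 3 / 2];   /* one Full HD 4:2:0 frame */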

    Regards

    Rama 

  • Hi Rama,

    You can find a demo that reproduces the issue at the following URL:

    https://drive.google.com/file/d/0Byw88ezNrM71SmJPOWVwNDJBR0U/edit?usp=sharing


    Details about the demo:
    - It runs the H.264 decoder on cores 0-3.
    - The decoder reads the H.264 elementary input from address 0x90000000 (13 MiB) in an infinite loop. The demo input is attached (stream.264). Before starting the demo, load the file at the address above using the Load Memory tool.
    - The decoder sends frames to the H.264 encoder.
    - The H.264 encoder runs on cores 4-7.
    - The encoder writes its output at address 0x98000000.
    - After a reasonable number of frames has been encoded, you can pause the demo and save the generated H.264 elementary stream using the Save Memory tool.
    - The demo prints to the console how many 32-bit words of H.264 data have been generated:
         [C66xx_4] Writing 673 bytes total 824820 words
    - You can see artifacts in the generated output.
    - If you uncomment the following line in TranscodeComponent.c, the artifacts disappear:

    //Cache_inv((void*)0x0C000000, 1, Cache_Type_L1D, FALSE); // TODO: temporary fix - barrier should invalidate something

    I use the following components:
    - XDCtools 3.30.3.47
    - EDMALLD 2.11.11
    - FC 3.30.0.06
    - IPC 3.22.2.11
    - SYS/BIOS 6.40.2.27
    - H.264 decoder 01.01.04.00
    - H.264 encoder 01.00.02.00

    Feel free to ask any questions.

    Regards,

    Andrey Lisnevich

  • Hi Andrey,

    We were able to reproduce the issue with the test setup you provided.

    We will provide further updates as we examine the issue.

    Regards

    Sudheesh

  • Hi Andrey,

    We are still debugging the issue. There is no major update yet.

    Regards

    Sudheesh

  • Hi Sudheesh,

    Have you found the root cause of this strange issue? Is it in our integration code or in the TI codec?


    Regards,

    Andrey Lisnevich

  • Hi Andrey,

    We have not been able to debug this issue much further, as we have been tied up with other priorities.

    We will give you an update early next week.

    Thanks and regards

    Sudheesh

  • Hi Sudheesh, I was wondering if you have had a chance to check this issue.

    thanks for your help,

    Paula 

  • Hi Paula/Andrey,

    I am currently looking into this issue.

    In the first few process calls, the input buffer pointer is not getting updated on the slave cores, even though the master properly updates those pointers and shares them across the cores.

    It looks like the shared memory is also updated by the slave cores, thereby causing the cache issue described above.
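
    In other words, the suspected sequence is roughly the one in the sketch below. This is illustrative only; SharedCtx and the function names are placeholders, not the codec's real structures.

    #include <ti/sysbios/hal/Cache.h>

    typedef struct {
        unsigned char *inBufPtr;   /* input frame pointer shared with slave cores */
        /* ... other per-frame parameters ... */
    } SharedCtx;

    /* Master (Core 0): publish the new input pointer, then enter the barrier. */
    static void masterPublishFrame(SharedCtx *ctx, unsigned char *frame)
    {
        ctx->inBufPtr = frame;
        Cache_wb(ctx, sizeof(*ctx), Cache_Type_ALLD, TRUE);   /* push to MSMC/DDR */
    }

    /* Slave (Cores 1..3): after leaving the barrier, invalidate before reading.
     * If the slave has ALSO written into the same cache line, that line is
     * dirty in the slave's cache, and a later victim writeback can overwrite
     * the master's update in shared memory with stale data - which would match
     * the corruption seen only on the slave-core slices. */
    static unsigned char *slaveFetchFrame(SharedCtx *ctx)
    {
        Cache_inv(ctx, sizeof(*ctx), Cache_Type_ALLD, TRUE);
        return ctx->inBufPtr;
    }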

    Thanks and regards

    Sudheesh

  • Hi Andrey,


    The cache issue is solved with the attached library when I checked it with your application.

    Could you please try the attached library at your end (please change the file extension)?

    Regards

    Sudheesh

    8080.h264hpvenc_ti.txt

  • Thanks Sudheesh,

    It works now, and I can test which barrier implementation is best.

    Regards,

    Andrey Lisnevich