H.264 Baseline Profile on C6678 quality problem



Hi,
We are using the TI H.264 Baseline Profile encoder on the C6678 chip.
We are experiencing serious video quality problems with this encoder:
1) Most of the bit-rate control algorithms do not work properly.
The only two rate-control setups that work are:
-params.videncParams.rateControlPreset = IVIDEO_USER_DEFINED; params.rcAlgo = 0;
-params.videncParams.rateControlPreset = IVIDEO_LOW_DELAY;
2) When using the multi-core encoder, the bit-rate is not respected: after configuring the bit-rate to 4 Mbps, we measure 5 Mbps and higher.
3) At lower bit-rates, for example 720p, 30 fps, 3 Mbps, the image looks bad: quantization is very high, and it is noticeably higher near the borders between regions encoded on different cores. As a result, even in an almost static scene the quality does not improve after many P-frames, which are supposed to add detail and improve the frame.

My encoder params and dynamic params configuration is as follows:
p_obj->params.profileIdc = 66;
p_obj->params.searchRange = 64;
p_obj->params.videncParams.inputChromaFormat = XDM_YUV_420P;
p_obj->params.videncParams.encodingPreset = XDM_USER_DEFINED;
if(p_obj->encoder_config.is_variable_bitrate)
{
p_obj->params.videncParams.rateControlPreset = IVIDEO_USER_DEFINED;
p_obj->params.rcAlgo = 0;
}
else
{
p_obj->params.videncParams.rateControlPreset = IVIDEO_LOW_DELAY;
}
p_obj->params.videncParams.maxHeight = p_obj->encoder_config.input_height;
p_obj->params.videncParams.maxWidth = p_obj->encoder_config.input_width;
p_obj->params.videncParams.maxFrameRate = 30000;
p_obj->params.videncParams.maxBitRate = 10000000;
p_obj->params.videncParams.dataEndianness = XDM_BYTE;
p_obj->params.videncParams.maxInterFrameInterval = 0;
p_obj->params.videncParams.inputContentType = IVIDEO_PROGRESSIVE;
p_obj->dynamic_params.videncDynamicParams.inputWidth = p_obj->encoder_config.input_width;
p_obj->dynamic_params.videncDynamicParams.inputHeight = p_obj->encoder_config.input_height;
p_obj->dynamic_params.videncDynamicParams.intraFrameInterval = p_obj->encoder_config.intra_frame_frequency;
p_obj->dynamic_params.videncDynamicParams.generateHeader = XDM_ENCODE_AU;
p_obj->dynamic_params.videncDynamicParams.captureWidth = 0;
p_obj->dynamic_params.videncDynamicParams.forceIFrame = 0;
p_obj->dynamic_params.qpIntra = -1;
p_obj->dynamic_params.qpInter = -1;
p_obj->dynamic_params.quartPelDisable = 0;
p_obj->dynamic_params.qpMax = 51;
p_obj->dynamic_params.qpMin = 0;
p_obj->dynamic_params.airMbPeriod = 0;
p_obj->dynamic_params.maxMBsPerSlice = 0;
p_obj->dynamic_params.sliceRefreshRowStartNumber = 0;
p_obj->dynamic_params.sliceRefreshRowNumber = 0;
p_obj->dynamic_params.filterOffsetA = 0;
p_obj->dynamic_params.filterOffsetB = 0;
p_obj->dynamic_params.log2MaxFNumMinus4 = 0;
p_obj->dynamic_params.picOrderCountType = 0;
p_obj->dynamic_params.chromaQPIndexOffset = 0;
p_obj->dynamic_params.constrainedIntraPredEnable = 1;
p_obj->dynamic_params.maxMVperMB = 4;
p_obj->dynamic_params.intra4x4EnableIdc = INTRA4x4_IPSLICES;
p_obj->dynamic_params.hierCodingEnable = 1;
p_obj->dynamic_params.mvDataEnable = 0;
p_obj->dynamic_params.streamFormat = IH264_BYTE_STREAM;
p_obj->dynamic_params.intraRefreshMethod = IH264_INTRAREFRESH_NONE;
p_obj->dynamic_params.Intra_QP_modulation = 0;
p_obj->dynamic_params.Max_delay = 3;
p_obj->dynamic_params.lfDisableIdc = DISABLE_FILTER_SLICE_EDGES; /* when enabled, causes artifacts at the boundaries between cores */
p_obj->dynamic_params.sliceGroupChangeCycle = 0;
p_obj->dynamic_params.idrEnable = 0;
p_obj->dynamic_params.streamFormat = IH264_BYTE_STREAM;
p_obj->dynamic_params.pfNalUnitCallBack = NULL;

p_obj->dynamic_params.top_slice_line = top_slice_line;
p_obj->dynamic_params.bottom_slice_line = bottom_slice_line;

p_obj->dynamic_params.videncDynamicParams.targetBitRate = (XDAS_Int32)p_obj->encoder_config.target_bitrate;
p_obj->dynamic_params.qpMax = 51;
p_obj->dynamic_params.qpMin = 0;
p_obj->dynamic_params.videncDynamicParams.intraFrameInterval = p_obj->encoder_config.intra_frame_frequency;
p_obj->dynamic_params.maxBytesPerSlice = p_obj->encoder_config.max_video_packet_size;
p_obj->dynamic_params.videncDynamicParams.targetFrameRate = ((p_obj->encoder_config.frames_per_sec_2x) >> 1) * 1000;
p_obj->dynamic_params.videncDynamicParams.refFrameRate = p_obj->dynamic_params.videncDynamicParams.targetFrameRate;

Is there any other way to improve the video quality with this encoder?
Thank you,
Oleg Fomenko
Surf Communication Solutions

  • Hi Oleg,

    Is it possible for you to replace the H264BP encoder with the H264HP encoder? The H264HP encoder can also encode BP streams, and we are obsoleting the BP encoder.

    FYI, the HP encoder can be downloaded from http://software-dl.ti.com/dsps/dsps_public_sw/codecs/C6678/H264HP_E/latest/index_FDS.html.

    The HP encoder has also been integrated into MCSDK Video 2.1: http://software-dl.ti.com/sdoemb/sdoemb_public_sw/mcsdk_video/latest/index_FDS.html

    Thanks,

    Hongmei

  • Hi Hongmei,

    I'll try to integrate the High Profile encoder instead of the Baseline one, but I have a couple of questions:

    -Does it support multi-core encoding? It seems that it does, but there are no longer parameters named top_slice_line and bottom_slice_line, so how do I configure each core's work?

    -Is it based on baseline profile encoder codebase?

    -Can it solve any of our problems with video quality?

    -It has a much higher number of configuration parameters than the baseline encoder. Can you recommend a configuration that is optimal for video quality? (Setting encodingPreset to a value different from USER_DEFINED never worked in previous TI encoders; will it work now?)

    Thanks,

    Oleg

  • Hi Oleg, please see answers inline below

    I'll try to integrate the High Profile encoder instead of the Baseline one, but I have a couple of questions:

    -Does it support multi-core encoding? It seems that it does, but there are no longer parameters named top_slice_line and bottom_slice_line, so how do I configure each core's work?

    In the configuration file you can select number of cores and coreteam

    ncores = 2

    CoreTeamMap = 0,1

    -Is it based on baseline profile encoder codebase?

    Yes, it was built on top of the BP codebase, with some improvements.

    -Can it solve any of our problems with video quality?

    The HP codec (using BP tools) has better VQ than the previous BP encoder.

    -It has a much higher number of configuration parameters than the baseline encoder. Can you recommend a configuration that is optimal for video quality? (Setting encodingPreset to a value different from USER_DEFINED never worked in previous TI encoders; will it work now?)

    You can use USER_DEFINED; RC can be set to CBR or VBR, and you can select either constrained CBR (no skipped frames) or typical CBR. I am attaching two typical .cfg files using BP tools (CAVLC, IPPP, etc.) and CBR. For one of them I selected Profile/Level HP in order to enable the scaling matrix and the 8x8 transform block size. Please let us know if you face any issue (I adapted them from one I had on hand but did not test them) or if VQ is still low.

    4213.encoderConfig_3Mbps_25FPS_GOP25.cfg

    1033.encoderConfig_3Mbps_25FPS_GOP25_SM.cfg

    Thank you,

    Paula

  • Hi Oleg, it was pointed out to me that the HP encoder now assigns the top and bottom slice lines internally. The encoder divides the lines equally among the cores, and any remaining lines are encoded by the master core (a rough sketch of this split is shown below).

    Thank you,

    Paula
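
    A minimal sketch of that split, assuming the equal-division-with-remainder scheme described above; the row count, core count, and all names below are illustrative only and are not taken from the codec source:

    #include <stdio.h>

    int main(void)
    {
        int mb_rows       = 45;   /* e.g. 720p: 720 / 16 = 45 macroblock rows */
        int num_cores     = 2;    /* ncores from the .cfg file                */
        int rows_per_core = mb_rows / num_cores;
        int remainder     = mb_rows % num_cores;
        int core;

        for (core = 0; core < num_cores; core++)
        {
            /* The master core (core 0 of the team) also picks up the remainder. */
            int rows = rows_per_core + ((core == 0) ? remainder : 0);
            printf("core %d encodes %d macroblock rows\n", core, rows);
        }
        return 0;
    }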

  • Hi,

    Thanks for your answers,

    I'm now working very intensively on the new High Profile H.264 encoder integration, and it seems the encoder generates a valid bitstream.

    For now I have one urgent problem: I cannot limit the maximum size of a slice generated by the encoder.

    I want to use RFC 3984 in Single NAL mode; for that I need to divide the stream into NAL units smaller than the packet size, which means I must limit the slice size.

    In the API I can only see a limit on the number of macroblocks per slice, not on its size in bytes.

    Can anyone help me with that?

    Thanks a lot,

    Oleg

  • Hi Oleg,

    Codec will not support fixed slice size like H263. As Paula pointed out, the slice division mechanism across multiple cores is handled internally by the codec.

    Thanks

    Sudheesh 

  • Hi,

    I have 2 questions:

    [Codec will not support fixed slice size like H263.]

    -You mean as in the previous version of H.264, not H.263, right? And the slice size was not fixed in the previous version, but it could be limited by a maximum size in bytes.

    -If it is really not supported, how can I work in Single NAL RTP encapsulation mode? Many clients do not support the Fragmentation Units (FU) encapsulation mode, so Single NAL is the only mode they support.

    Thanks,

    Oleg

  • Hi Oleg,

    Please see replies inline:

    I have 2 questions:

    [Codec will not support fixed slice size like H263.]

    -You mean as in the previous version of H.264, not H.263, right? And the slice size was not fixed in the previous version, but it could be limited by a maximum size in bytes.

                    [Sudheesh]: It's not H.263, it's H.241. To clarify: the feature below is NOT supported in the H264HP encoder:

                                          "Multiple slices per picture based upon the number of bytes per slice for H.241-based MTU packetization."

    -If it is really not supported, how can I work in Single NAL RTP encapsulation mode? Many clients do not support the Fragmentation Units (FU) encapsulation mode, so Single NAL is the only mode they support.

                    [Sudheesh]: From the multi-core concurrency (cores start encoding different rows of the current picture in parallel) and performance point of view, it is NOT supported by the design of the codec.

    Thanks

    Sudheesh

  • Hi Sudheesh,
    Thanks for the quick answer. I understood that from now on I must support the FU encapsulation mode of RFC 3984, and I have added this support (a rough FU-A packetization sketch is appended at the end of this post).
    I've integrated this codec, and right now it works only for single-core encoding.
    I have a problem running it in multi-core parallel encoding mode:
    When the process function is called, the master core starts working (but gets stuck at the software barrier), while the process function on the slave core immediately returns failure with out_args.videnc2OutArgs.extendedError = 0x1.
    As I saw in the user's guide, 0x1 means a violation of level limitations.
    I don't understand why this happens, because I configure both params and dynamic params exactly the same for both cores (except the core_task_ID and coreID fields, of course).
    Input buffer descriptors and input args are also the same for both cores.
    How can I debug this problem?

    Thanks,
    Oleg
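
    A minimal sketch of the FU-A fragmentation mentioned above (per RFC 3984, section 5.8), assuming the NAL unit has already been extracted from the byte stream without its start code; the send_rtp_packet() transport hook is a hypothetical application callback, not part of the codec or of any RTP stack API:

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical transport hook: sends one RTP payload of 'len' bytes;
     * 'marker' sets the RTP marker bit. */
    extern void send_rtp_packet(const uint8_t *payload, size_t len, int marker);

    /* Fragment one NAL unit (start code removed) into FU-A packets whose
     * RTP payload does not exceed 'mtu' bytes. */
    void send_nal_as_fu_a(const uint8_t *nal, size_t nal_len, size_t mtu)
    {
        uint8_t nal_hdr  = nal[0];                 /* F | NRI | type                 */
        uint8_t fu_ind   = (nal_hdr & 0xE0) | 28;  /* keep F + NRI, type 28 = FU-A   */
        uint8_t fu_hdr   = nal_hdr & 0x1F;         /* original NAL unit type         */
        const uint8_t *p = nal + 1;                /* payload after the NAL header   */
        size_t remaining = nal_len - 1;
        size_t chunk_max = mtu - 2;                /* room for FU indicator + header */
        int first = 1;
        uint8_t pkt[1500];                         /* assumes mtu <= 1500            */

        while (remaining > 0)
        {
            size_t chunk = (remaining > chunk_max) ? chunk_max : remaining;
            int last = (chunk == remaining);

            pkt[0] = fu_ind;
            pkt[1] = (uint8_t)((first ? 0x80 : 0x00) | (last ? 0x40 : 0x00) | fu_hdr);
            memcpy(&pkt[2], p, chunk);
            send_rtp_packet(pkt, chunk + 2, last); /* marker bit on the last fragment */

            p         += chunk;
            remaining -= chunk;
            first      = 0;
        }
    }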

  • Oleg, a quick question: when you use our test application you can encode using multiple cores (as directed in the user guide), but your problem appears when you use the codec inside your own application; is my understanding correct? Could you please share the static and dynamic params you use?

    Thank you,

    Paula

  • Hi Oleg,

    Please check for cache coherence also.

    You can take the cache coherence handling in the test application (shipped with the formal codec library release) as a reference. There you can disable the cache.
    Replace the existing cache sizes with the following to disable the cache (function: TestApp_EnableCache(), file: h264hpvenc_ti_testapp.c):
    size.l1pSize = Cache_L1Size_0K; /* L1P cache size */
    size.l1dSize = Cache_L1Size_0K; /* L1D cache size */
    size.l2Size = Cache_L2Size_0K; /* L2 cache size */

    Please look at the corresponding code in the test application. Please refer to this thread regarding cache coherence issues.

    Please refer to the h264hpvenc.cfg file for details regarding section mapping.

    Thanks and regards
    Sudheesh

  • Hi,
    Currently I can successfully encode a video stream on a single core (H264HPVENC library), but in a multi-core setup it does not work:
    I tried disabling caching as you advised, and it improved the situation, but I still experience problems:
    With the cache disabled, both the master and slave cores run process and it returns success. But the output taken from the master core (I saw in the sample application that it is supposed to be taken only from the master core, and anyway the slave core always returns bytesGenerated = 0; correct me if I'm wrong) contains only the upper half of the frame. When I play the resulting video, the bottom half is green.
    I didn't try to run your sample application in a multi-core setup. Is it supposed to work?

    My cfg and map files are attached, as well as my C code that fills all the configuration params (lines 80 to 262 in the .c file).

    I also tried to get debug traces from the encoder as described in the user manual, but extMemoryDebugTraceSize is always returned as 0, as is extMemoryDebugTraceAddr, so it seems the release version of the encoder is compiled without the DEBUG_TRACE macro defined. I can't re-compile it since I don't have the source code.

    What should be my next steps in order to solve this issue?

    0511.app.cfg

    Thanks in advance,

    Oleg

  • Missing files for previous post:

    C code:

    2235.video_codec_TI_H264_BP_enc.c

    map file:

    8865.oleg_test_map.txt

    link command file:

    3678.lnk_cmd.txt

  • Hi Oleg,

    The output is taken from the master core, as the master core will "stitch" the outputs from the different cores towards the end of the process call.

    The sample application will run on multiple cores. You can try changing ncores and CoreTeamMap; for example, to run the codec on 4 cores, set ncores = 4 and CoreTeamMap = 0,1,2,3.

    Please check the test application for where barrier functions are added to keep all the cores in sync. For example, you can see barrier functions before and after the process call, after the SETPARAMS control call, etc. (a rough sketch of this placement is shown below).

    Thanks and regards

    Sudheesh
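
    A minimal sketch of that barrier placement, assuming the codec is driven through the Codec Engine VIDENC2 layer (if the IVIDENC2 interface is used directly, the calls go through the codec's function table instead) and assuming an application-provided software barrier app_swbarr(), which is a hypothetical name standing in for the ividmc swbarr callback; error handling is omitted:

    #include <ti/sdo/ce/video2/videnc2.h>

    /* Hypothetical application barrier: returns once every core of the team
     * has reached it (in the test application this maps to the ividmc swbarr). */
    extern void app_swbarr(void);

    /* Sketch of one encode iteration, executed on every core of the team. */
    XDAS_Int32 encode_one_frame(VIDENC2_Handle handle,
                                VIDENC2_DynamicParams *dynParams,
                                VIDENC2_Status *status,
                                IVIDEO2_BufDesc *inBufs,
                                XDM2_BufDesc *outBufs,
                                VIDENC2_InArgs *inArgs,
                                VIDENC2_OutArgs *outArgs)
    {
        XDAS_Int32 ret;

        /* Every core applies the same dynamic params, then all cores sync. */
        VIDENC2_control(handle, XDM_SETPARAMS, dynParams, status);
        app_swbarr();              /* barrier after the SETPARAMS control call */

        app_swbarr();              /* barrier before the process call          */
        ret = VIDENC2_process(handle, inBufs, outBufs, inArgs, outArgs);
        app_swbarr();              /* barrier after the process call           */

        return ret;
    }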

  • Hi,

    I checked all the software barriers and even added 3 new ones: before XDM_SETPARAMS, before the process function, and after the process function.

    Still the same result.

    Can you check the configuration I sent in the previous post?

    I think something might be wrong with the memory mapping/sections/link commands...

    BTW, I don't use IPC; can that cause any problem?

    Thanks,

    Oleg

    I checked your parameter settings (except the section mapping and memory, which I will also check); they are fine. In any case, you are getting output with a single core.

    In the test application with IPC enabled, the SW barrier function calls the ipcBarWait function before waiting at the barrier through Bar_wait; you can see that from the call stack in CCS. That means IPC is being used for multi-core sync-up.

    Since the slave's section of the output is completely blank from the very first frame and the cache is disabled, it might be related to a sync issue between master and slave, issues in the EDMA configuration (the sample test application has an EDMA configuration file, if it is required), etc.

    Please check, with the test application, the output bytes generated at the master core at the end of the process call for your dual-core scenario. This gives the sum of the output bytes generated by all cores. Then please check whether you get the same with your application.

    Please also give me the output file; 4-5 frames should be fine.

    Thanks

    Sudheesh

  • Hi Oleg,

    It looks like you are providing your own ividmc functions: swbarr, shmmap, shmmunmap, and shmmap_sync. Can you please compare them with the ones provided in MCSDK Video 2.1.0.8 to see if there are any gaps? Those functions can be found in mcsdk_video_2_1_0_8\dsp\siu\ividmc\siuVidMc.c: siuVidMc_Swbarr, siuVidMc_Map_Shmem, siuVidMc_Unmap_Shmem, siuVidMc_Shmem_Sync.

    Thanks,

    Hongmei

  • Hi,

    I compared my ividmc implementation to the implementation from the sample application and didn't find any gaps.

    My implementation is attached. Could you please check whether you see anything suspicious? I'm really stuck and don't know how to proceed...

    1781.video_codec_multicore.c

    Moreover, I tried to run the sample application that is part of the codec package, and I didn't succeed even on a single core with the default configuration. I get green output:

    2313.sample_app_output_264.txt

    The output of my application, with only the upper half of the YUV encoded, is also attached:

    2110.shannon_3gp.txt

    Thanks for your help,

    Oleg

  • Hi Oleg,

    The sample application should work as-is. Could you please confirm whether you modified it? If so, could you try the test application that comes with the codec?

    Also, are you using JTAG (fread/fwrite) or TFTP (http://processors.wiki.ti.com/index.php/MCSDK_VIDEO_2.1_CODEC_TEST_FW_User_Guide) for the input/output files? Are you using a Shannon EVM or a quad-Shannon card?

    Is there any error printed on your CCS console?

    Thank you,

    Paula

  • Hi,

    I have finally succeeded in running the H.264 HP encoder sample application encoding a stream on 2 cores, and it works fine.

    The problem in my project still exists...

    I solved the cache problem, but I still get an encoded stream with the bottom half green.

    I compared all my ividmc functions to those from the sample application and they look fine (the code is attached in my previous posts).

    I also compared the execution of all ividmc functions between the sample application and my project. All those calls occur in exactly the same order.

    What are my next steps to solve this problem? I find it hard to believe there is no more convenient way to debug it, i.e. traces, etc.

    I am really stuck with this problem and it is very urgent.

    Thanks,

    Oleg

  • Hi Oleg,

    Sudheesh suggests you compare the output bytes generated by the different cores between your application and our sample test application. Is the number of output bytes the same? We are trying to determine whether the slave core is actually encoding correctly and the issue occurs afterwards, or not. Also, is there any error reported on the CCS console?

    Thank you,

    Paula

  • Hi Paula,

    I'll perform this test on Sunday.

    There are no errors on the CCS console; the process function in my project returns OK on both the master and slave core.

    Since you don't work on Sunday, can you please advise in which direction I should proceed if I get the same or a different output length in my project versus the TI sample application?

    Thanks,

    Oleg

  • Hi Oleg, 

    Can you please also check that the shared memory APIs are implemented properly.

    The shared memory request is made by the codec during codec creation; for the same shared memory attributes, the app should give the same memory to both cores.

    Please also check that the shared memory sync APIs are implemented properly, with cache invalidation (a rough sketch is shown below).

    You can refer to the unit test application for implementation details.

    Regards 

    Rama 
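
    A minimal sketch of what such a shared-memory sync callback could look like on SYS/BIOS, assuming the ti.sysbios.hal.Cache module is available; the function name and argument list below are illustrative and are not the actual siuVidMc/ividmc prototype:

    #include <xdc/std.h>
    #include <ti/sysbios/hal/Cache.h>

    /* Illustrative shared-memory sync: write back and invalidate this core's
     * cached copy of a shared buffer so the other cores see consistent data.
     * The name and signature are hypothetical, not the siuVidMc prototype. */
    void app_shmem_sync(void *base, unsigned int num_bytes)
    {
        /* Write back dirty lines and invalidate them in L1D/L2, blocking
         * until the operation completes. */
        Cache_wbInv(base, num_bytes, Cache_Type_ALL, TRUE);
    }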

  • Hi,

    I checked my shared memory API implementation and it is valid. The same pointers are returned to the master and slave cores for the same key.

    Also, my memory sync function is implemented properly (at least I think so) with cache wbInv, the same as in the sample application.

    Regarding the previous question: with the same configuration I get a different output length in my application than in the TI sample application.

    What are my next steps?

    Thanks,

    Oleg

  • Oleg, is it possible for you to share your application with us? It could be in a private message if required.

    Thank you,

    Paula

  • Hi Paula,

    Do you need the whole application? The project is huge and that could be problematic; I would have to talk to management.

    I posted the parts relevant to this encoder earlier. Can that help, or do you need the whole source code in order to continue?

    Thanks,

    Oleg

  • Hi Oleg, I was thinking of reproducing the issue in my setup, but let me discuss with the codec team the further steps for you to debug your application.

    Thank you,

    Paula

  • Hi Oleg,

    For the H264HP encoder, the bit-stream stitching is done inside the codec using EDMA. We wonder how the EDMA channels are allocated to the cores in your application. Also, are any hardware events used in your application, such as those listed below (copied from the C6678 data manual; please find the complete list there)?

    Table 7-36 EDMA3CC1 Events for C6678 (Part 1 of 2)
    Event Number Event Event Description
    0 SPIINT0 SPI interrupt
    1 SPIINT1 SPI interrupt
    2 SPIXEVT Transmit event
    3 SPIREVT Receive event
    4 I2CREVT I2C receive event
    5 I2CXEVT I2C transmit event
    6 GPINT0 GPIO interrupt
    7 GPINT1 GPIO interrupt
    8 GPINT2 GPIO interrupt
    9 GPINT3 GPIO interrupt
    10 GPINT4 GPIO interrupt
    11 GPINT5 GPIO interrupt
    12 GPINT6 GPIO interrupt
    13 GPINT7 GPIO interrupt
    14 SEMINT0 Semaphore interrupt
    15 SEMINT1 Semaphore interrupt
    16 SEMINT2 Semaphore interrupt
    17 SEMINT3 Semaphore interrupt

    ...

    Also, from your map file, there are two sections we usually place in local L2, instead of DDR. Please place them in local L2 and retry.

    .args

    .cio

    Thanks,

    Hongmei

  • Hi,

    I finally found the problem: it was in the output buffer descriptors I passed to the process function. I passed two different output buffers to the master core and the slave core, whereas a pointer to the same output buffer is supposed to be passed to the process function on both cores (see the sketch at the end of this post).
    Now I see valid video output in a multi-core scenario!
    One more question: in order to get maximal quality, how should I configure the initialBufferLevel and HRDBufferSize parameters?
    In the user manual there is a formula along the lines of 2*bitrate and 1/2*bitrate for the VBR and CBR scenarios.

    - There are 2 pairs of these parameters, in params and in dynamic_params; should they be equal?
    - Are these parameters calculated using maxBitRate from the params structure or targetBitRate from dynamic_params?

    Thanks,
    Oleg
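
    A minimal sketch of the fix described above, assuming an application-provided shared-memory lookup (app_shmmap() is a hypothetical name standing in for the ividmc shmmap callback) so that every core fills its output descriptor with the same buffer; the key and size are illustrative, and only the essential XDM2_BufDesc fields are shown:

    #include <ti/xdais/dm/xdm.h>     /* XDM2_BufDesc, XDM_MEMTYPE_RAW */

    #define OUTBUF_KEY   0x1234              /* illustrative shared-memory key */
    #define OUTBUF_SIZE  (2 * 1024 * 1024)   /* illustrative buffer size       */

    /* Hypothetical app helper: returns the same pointer on every core for a
     * given key (this is what the shared-memory map callback guarantees). */
    extern void *app_shmmap(unsigned int key, unsigned int size);

    /* Every core (master and slave) must describe the SAME output buffer. */
    void setup_output_bufs(XDM2_BufDesc *outBufs)
    {
        outBufs->numBufs                = 1;
        outBufs->descs[0].buf           = (XDAS_Int8 *)app_shmmap(OUTBUF_KEY, OUTBUF_SIZE);
        outBufs->descs[0].memType       = XDM_MEMTYPE_RAW;
        outBufs->descs[0].bufSize.bytes = OUTBUF_SIZE;
    }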

  • Hi Oleg, glad to hear that you are now getting correct output with 2 cores. Please see the answers to your questions below.

    - There are 2 pairs of these parameters, in params and in dynamic_params; should they be equal?

    PC -- yes, we typically use the same values for initialBufferLevel and HRDBufferSize.

    - Are these parameters calculated using maxBitRate from the params structure or targetBitRate from dynamic_params?

    PC -- using targetBitRate.

    Thanks,
    Paula

  • Hi Paula,

    Thank you for the answer,

    I just didn't understand: should initialBufferLevel in dynamic_params and initialBufferLevel in params be the same?

    Should HRDBufferSize from dynamic_params and HRDBufferSize from params be equal?

    Oleg

  • Hi Oleg, for codec evaluation I use the same buffer values for params and dynamic params; I also use the same value for initialBufferLevel and HRDBufferSize.

    e.g. VBR, target bitrate 1 Mbps:

    static_param34 = 2000000 # rateControlParams.initialBufferLevel

    static_param35 = 2000000 # rateControlParams.HRDBufferSize

    dynamic_param35 = 2000000 # rateControlParams.initialBufferLevel

    dynamic_param36 = 2000000 # rateControlParams.HRDBufferSize

    Thanks,

    Paula

  • Hi Paula,

    I saw the parameters in the codec evaluation, but my scenario is slightly different:

    I want to configure maxBitRate to about 10 Mbps, while the actual target bitrate can be changed on the fly, let's say between 1 Mbps and 10 Mbps.

    I understand that in this case initialBufferLevel and HRDBufferSize in the static params should be 20000000 (20M). Is that correct?

    And what should the values of initialBufferLevel and HRDBufferSize be in the dynamic params? Should they be updated every time I change the target bitrate in the dynamic params?

    By the way, congratulations on such a dramatic improvement in the encoder's quality! This encoder is much better than the previous baseline profile encoder. It maintains the bitrate really well, and at 1 Mbps I get better quality than at 4 Mbps with the previous encoder. Great work!

    Thanks,

    Oleg


  • Hi Oleg, I think you are correct, but I will confirm and come back to you. About the VQ: in the next few days (approximately one week) we will publish a new release of H.264 with improved VQ. The improvements are mainly for HP tools, but some apply to MP and BP as well. I will keep you posted so you can try it =)

    thank you,

    Paula

  • Oleg, I confirmed that for better VQ you should update the dynamic HRD buffer values every time you change the target bitrate (a small sketch of such an update is shown below).

    thanks,

    Paula
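
    A minimal sketch of such an on-the-fly update, under the assumptions discussed above (VBR, HRD sized at roughly 2x the target bitrate, values applied through XDM_SETPARAMS). The struct member paths follow the names used in this thread (videnc2DynamicParams, rateControlParams.initialBufferLevel, rateControlParams.HRDBufferSize) and should be verified against the encoder's ih264enc.h; the handle, dynParams, and status objects are assumed to already exist in the application:

    /* Assumes <ti/sdo/ce/video2/videnc2.h> and the codec's ih264enc.h are     */
    /* included (header paths depend on your codec package and build setup).   */

    void set_target_bitrate(VIDENC2_Handle handle,
                            IH264ENC_DynamicParams *dynParams,
                            IH264ENC_Status *status,
                            XDAS_Int32 new_bitrate)
    {
        dynParams->videnc2DynamicParams.targetBitRate   = new_bitrate;
        dynParams->rateControlParams.HRDBufferSize      = 2 * new_bitrate; /* ~2x bitrate for VBR   */
        dynParams->rateControlParams.initialBufferLevel = 2 * new_bitrate; /* same as HRDBufferSize */

        /* Apply the new dynamic params; every core of the team must make this
         * call and then hit the software barrier, as discussed earlier. */
        VIDENC2_control(handle,
                        XDM_SETPARAMS,
                        (VIDENC2_DynamicParams *)dynParams,
                        (VIDENC2_Status *)status);
    }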