6678 H.264 encoder output differences in single-core vs. multi-core

Jeff Brower73

All-

Building on questions that were answered by Paula in this excellent thread:

http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/p/269123/942550

we are seeing slight differences in 2-core vs. single-core output for the 6678 H.264 encoder. For example, when we run a 640x480p .yuv file through comparison tests, we see:

1 core, 30 fps, 800 kbps
210 YUV frames, 635 H.264 frames, 711115 payload bytes

1 core, 30 fps, 1.5 Mbps
210 YUV frames, 1034 H.264 frames, 1298653 payload bytes

2 cores, 30 fps, 800 kbps
210 YUV frames, 723 H.264 frames, 714200 bytes

2 cores, 30 fps, 1.5 Mbps
210 YUV frames, 1145 H.264 frames, 1315303 bytes

All output .h264 files play fine in VLC -- we can't detect any difference in video quality or VLC stats.

Should output be bit-exact for single-core vs. multicore? If not, what differences can we expect -- a slight change made automatically in the encoder, such as bitrate, profile option, etc.?

Thanks.

-Jeff
Signalogic

over 11 years ago

0 Paula Carrillo over 11 years ago

TI__Mastermind 40580 points

Hi Jeff, the multicore H.264 HP encoder uses slices for data partitioning, while the single-core encoder uses the whole frame. The output bitstream is expected to be different.

thank you,

Paula

0 Jeff Brower73 over 11 years ago in reply to Paula Carrillo

Genius 3420 points

Paula-

Thanks very much for your reply.

Yes slice vs. whole frame makes sense. Can you point out some example TI source code for parsing 6678 encoder H.264 output frames into RTP packets? Our code seems to do this fine for single core output, but introduces artifacts when parsing multicore output. Possibly there are some slice identifiers or other formatting markers that we are not detecting correctly.

Thanks.

-Jeff

0 Jeff Brower73 over 11 years ago in reply to Paula Carrillo

Genius 3420 points

Paula-

More info. Our current RTP packetization code demarcates NALU frames by looking for 4-byte identifier values of 0x00000001, as discussed here:

http://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/100/p/7906/31866

For multicore output, what may be happening to us is that there are additional identifiers (same or different value?) per frame where data from the second core (slice) starts. Is that the case? If not, can you help us identify what the differences are? Thanks.
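For reference, the start-code demarcation described above can be sketched as follows. This is a minimal illustration, not TI code: the function name `split_annexb` is hypothetical, and it assumes the encoder output is an H.264 Annex B byte stream in which each NALU is preceded by a 3-byte (00 00 01) or 4-byte (00 00 00 01) start code.

```python
def split_annexb(stream: bytes) -> list[bytes]:
    """Split an Annex B byte stream into NAL units, start codes removed.

    Hypothetical helper for illustration. Searching for the 3-byte
    pattern 00 00 01 also finds 4-byte start codes, because those are
    just the 3-byte pattern with one extra leading zero byte.
    """
    nalus = []
    pos = stream.find(b"\x00\x00\x01")
    while pos != -1:
        start = pos + 3                      # first byte after the start code
        nxt = stream.find(b"\x00\x00\x01", start)
        end = len(stream) if nxt == -1 else nxt
        # A NAL unit never ends in 0x00, so trailing zeros here belong to
        # the next (4-byte) start code or to stream padding.
        nalu = stream[start:end].rstrip(b"\x00")
        if nalu:
            nalus.append(nalu)
        pos = nxt
    return nalus


if __name__ == "__main__":
    demo = bytes.fromhex("0000000167aa0000000168bb00000165ccdd")
    for n in split_annexb(demo):
        print(f"NALU header 0x{n[0]:02x}, {len(n)} bytes")
```

Because every slice is its own NALU with its own start code, a splitter like this returns one entry per slice, so a frame encoded as two slices yields two slice NALUs that each need to be packetized.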

-Jeff

0 Jeff Brower73 over 11 years ago in reply to Paula Carrillo

Genius 3420 points

Paula-

New info. For multicore encoding we see this in the 6678 H.264 encoder output bitstream:

00 00 00 01 67 [DATA] 00 00 00 01 68 [DATA] 00 00 00 01 65 [DATA] 00 00 00 01 65 [DATA] 00 00 00 01 41 [DATA]

The difference from single core is the two successive slice NALUs (0x65), which I assume are from core 0 and core 1. Do you have any code example showing RTP packetization of the TI encoder output stream for multiple NALUs per frame? In our RTP pcap, everything looks good -- FU-A fragmentation, marker bit, etc. -- but VLC and ffplay both show a similar type of artifact, as if the upper half of the frame is stretched (vertically) and the lower half is not there.
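To make the header bytes in that dump concrete: the low 5 bits of the first NALU byte are the nal_unit_type, and a slice header begins with first_mb_in_slice coded as unsigned Exp-Golomb. The sketch below uses hypothetical helper names and two simplifying assumptions: slices arrive in raster order (so first_mb_in_slice == 0 marks the first slice of a picture), and no emulation prevention byte falls inside the first few slice header bits.

```python
NAL_NAMES = {1: "non-IDR slice", 5: "IDR slice", 7: "SPS", 8: "PPS"}


def nal_type(nalu: bytes) -> int:
    """nal_unit_type is the low 5 bits of the NAL header byte."""
    return nalu[0] & 0x1F


def read_ue(data: bytes, bit_pos: int = 0) -> tuple[int, int]:
    """Decode one unsigned Exp-Golomb value; return (value, next_bit_pos)."""
    def bit(p: int) -> int:
        return (data[p >> 3] >> (7 - (p & 7))) & 1

    zeros = 0
    while bit(bit_pos) == 0:        # count leading zero bits
        zeros += 1
        bit_pos += 1
    bit_pos += 1                    # skip the terminating 1 bit
    suffix = 0
    for _ in range(zeros):          # read 'zeros' more bits
        suffix = (suffix << 1) | bit(bit_pos)
        bit_pos += 1
    return (1 << zeros) - 1 + suffix, bit_pos


def first_mb_in_slice(nalu: bytes) -> int:
    """First syntax element of a slice header (slice NALUs only)."""
    return read_ue(nalu[1:])[0]


def starts_new_picture(nalu: bytes) -> bool:
    """Assumption: raster-order slices, so MB address 0 starts a picture."""
    return nal_type(nalu) in (1, 5) and first_mb_in_slice(nalu) == 0


if __name__ == "__main__":
    for header in (0x67, 0x68, 0x65, 0x65, 0x41):
        t = header & 0x1F
        print(f"0x{header:02x} -> type {t} ({NAL_NAMES.get(t, 'other')})")
```

Under these assumptions, the two 0x65 NALUs are both IDR slices of the same picture: the first would carry first_mb_in_slice = 0 and the second a nonzero macroblock address, which is one way to tell where a frame ends when deciding on the RTP marker bit.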

What about slice size? Anything at all we need to change in TI 6678 H.264 encoder params?

Thanks.

-Jeff

0 Paula Carrillo over 11 years ago in reply to Jeff Brower73

TI__Mastermind 40580 points

Hi Jeff, I am not sure what the correct NAL unit identifier code should be, but if you want to take a look at one of the RTP demos in our MCSDK Video 2.1 framework, you can try the MCSDK Video 2.1.x.x Windows installer; instructions below. You would need a Shannon EVM.

About slice size:

For the HP encoder, slice size can be restricted to a number of MBs per slice by selecting sliceMode = 1 and setting sliceUnitSize to the number of MBs. (If sliceUnitSize is not given as a multiple of the MB row width, it will be aligned to one.)

For the BP encoder, in addition to MB-based control, slice size can be controlled in terms of bytes by selecting sliceMode = 1 and sliceUnitSize = N bytes, where N is the number of bytes in one slice, in the range [576, 1500].
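As a worked example of the MB-based setting, the arithmetic for a sliceUnitSize that is already a whole number of MB rows might look like the sketch below. The helper names are hypothetical, 16x16 macroblocks are assumed, and the thread does not say whether the encoder aligns up or down, so the sketch sidesteps that by always choosing whole rows.

```python
MB_SIZE = 16  # H.264 macroblocks are 16x16 luma samples


def mbs_per_row(width_px: int) -> int:
    """Macroblocks in one row, rounding partial macroblocks up."""
    return (width_px + MB_SIZE - 1) // MB_SIZE


def mb_rows(height_px: int) -> int:
    """Macroblock rows in the frame, rounding partial rows up."""
    return (height_px + MB_SIZE - 1) // MB_SIZE


def slice_unit_size(width_px: int, height_px: int, num_slices: int) -> int:
    """MBs per slice for roughly equal slices made of whole MB rows.

    Hypothetical helper: the result is a multiple of the MB row width,
    so no further alignment by the encoder should be needed.
    """
    rows_per_slice = -(-mb_rows(height_px) // num_slices)  # ceiling division
    return rows_per_slice * mbs_per_row(width_px)


if __name__ == "__main__":
    # 640x480: 40 MBs per row, 30 rows; two slices of 15 rows each.
    print(slice_unit_size(640, 480, 2))
```

For the 640x480 test clip discussed earlier in the thread, this gives 15 rows x 40 MBs = 600 MBs per slice for two slices.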

thank you,

Paula

0 Paula Carrillo over 11 years ago in reply to Paula Carrillo

TI__Mastermind 40580 points

Jeff, a 5-cents comment: for the "streamFormat" param we typically use 0, but in some cases (for RTP packetization with packetization_mode=0) customers use streamFormat = 1. The difference is that the encoder then won't generate the NALU start code "0x00000001".

From config file:
streamFormat = 0 # Type of bitstream to be encoded, 0 => Byte stream format, 1=> NALU format(without start code)

thank you,
Paula

0 Jeff Brower73 over 11 years ago in reply to Paula Carrillo

Genius 3420 points

Paula-

Thanks for this advice. Maybe without the NALU header codes there will be some other H.264 bitstream difference that helps. In that case what do we parse on to find start-of-slice for RTP packetization?

With the EVM demo, it sounds like we could capture a pcap on PC incoming side, and compare with ours. Also we could try receiving the EVM output stream with VLC and ffmpeg. Does one of the demo options include 2-core (or more) encoding, so multiple slices per frame will be in the output RTP stream?

That's key -- as I've mentioned, with one-core output (1 slice per frame), VLC can see our stream no problem, but with 2-core output (2 slices per frame) we have artifacts. If we post a short video of the artifacts, could you guys take a look and see if you have an idea of what it might be? More or less the screen appears "vertically stretched", sort of like the lower half of the screen might be ok but the upper half is not being transmitted or decoded properly.

Thanks.

-Jeff

0 Paula Carrillo over 11 years ago in reply to Jeff Brower73

TI__Mastermind 40580 points

Hi Jeff, yes, there is a demo which uses 6 cores to encode. Example path below:

C:\TI\mcsdk_video_2_1_0_8\demos\pkt_d1dec_resize_1080penc_demo

thank you,

Paula

0 Jeff Brower73 over 11 years ago in reply to Paula Carrillo

Genius 3420 points

Paula-

The problem is fixed now. It turns out we had a slight corruption in our RTP packetization at the end of a fragmented (FU-A) packet sequence. Actually the problem was always there, even for 1 slice (1 core) but was far less noticeable. With more slices (more cores), the problem occurred "in the middle" of the picture and affected the following slices.
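The thread does not show the corrected code or say exactly what was corrupted, so the following is only a generic sketch of FU-A fragmentation as defined in RFC 6184, with attention to the last fragment. The function names and the 1400-byte payload limit are assumptions for illustration.

```python
FU_A = 28  # RTP payload NAL type for FU-A fragmentation units (RFC 6184)


def fragment_fu_a(nalu: bytes, max_payload: int = 1400) -> list[bytes]:
    """Return RTP payloads for one NALU (start code already removed).

    Small NALUs go out unchanged as single NAL unit packets. Larger
    ones are split: each fragment gets a 2-byte FU indicator + FU
    header, and the original NAL header byte itself is not sent.
    """
    if len(nalu) <= max_payload:
        return [nalu]
    header = nalu[0]
    fu_indicator = (header & 0xE0) | FU_A    # keep F and NRI bits
    orig_type = header & 0x1F
    body = nalu[1:]
    chunk = max_payload - 2                  # room for the 2 FU bytes
    payloads = []
    for off in range(0, len(body), chunk):
        piece = body[off:off + chunk]        # last piece may be shorter
        start = 0x80 if off == 0 else 0      # S bit on first fragment only
        end = 0x40 if off + chunk >= len(body) else 0  # E bit on last only
        payloads.append(bytes([fu_indicator, start | end | orig_type]) + piece)
    return payloads


def reassemble_fu_a(payloads: list[bytes]) -> bytes:
    """Rebuild the NALU from FU-A payloads (receiver-side check)."""
    first = payloads[0]
    header = (first[0] & 0xE0) | (first[1] & 0x1F)
    return bytes([header]) + b"".join(p[2:] for p in payloads)
```

A round-trip check such as `reassemble_fu_a(fragment_fu_a(nalu)) == nalu` on every large NALU is a cheap way to catch an off-by-one or a stale byte at the end of a fragment sequence. Note also that, per RFC 6184, the RTP marker bit belongs on the last packet of the whole access unit, so with two slices per frame it is set only on the final packet of the second slice.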

Thanks again for your help.

-Jeff
