C6678 H264/H265 Performance

Other Parts Discussed in Thread: TMS320C6678

Hello,

I want to compare the H.264 encoding performance of the C6678 with that of the DM8168.

What is the H.264 encoding performance of the C6678?
The results of my tests on the DM8168 are:

720x576 MPEG2, 4-5 Mbit/s -> deinterlace -> 720x576 H.264 High Profile, 2 Mbit/s: 16 channels.

1920x1080 H.264, 10-15 Mbit/s -> deinterlace -> 1920x1080 H.264 High Profile, 10 Mbit/s: 3-4 channels.

I would like the same information for the H.265 codec.

  • Hi Vladimir,

    Please refer to
    downloads.ti.com/.../H264_Encoder_C6678_DataSheet.pdf
    which gives the performance summary.

    ---
  • Thank you, Shankari.

    What about H.265 statistics?

    Best regards.
  • Shankari,
    Can you explain what "Cycles Information" means?
    As I understand it, it is the number of DSP cycles needed to encode a given configuration.
    For example, take "Table 2 Cycles Information - Profiled on TMS320C6678 EVM with Code Generation Tools Version 7.4.6".
    H264HP_ENC_007 with "Fruits_i1920x1080_420p_8bit.yuv , YUV420, CABAC, VBR, IBBP @ 8Mbps @ 30 frames per second" is listed at 1170 MegaCycles per second. According to note (1), this was "Measured with C66x DSP 1250MHz clock".
    So encoding "1920x1080_420p_8bit.yuv , YUV420, CABAC, VBR, IBBP @ 8Mbps @ 30 FPS" uses 1170 MCycles out of 1250 MHz (93.6% of the DSP).
    Was this measured on 1 DSP core?
    If so, can I encode 8 such streams ("1920x1080_420p_8bit.yuv , YUV420, CABAC, VBR, IBBP @ 8Mbps @ 30 FPS") on 8 cores, each at 93.6% core load?
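    The 93.6% figure above is simply the quoted cycle count divided by the core clock. A minimal sketch of that arithmetic follows; the function name is illustrative, and whether the figure describes one core or each of several cores depends on the data sheet, not on this calculation.

```python
def core_load(mcycles_per_second: float, core_clock_mhz: float) -> float:
    """Return the fraction of one core's cycle budget a workload consumes."""
    if core_clock_mhz <= 0:
        raise ValueError("core clock must be positive")
    return mcycles_per_second / core_clock_mhz


# Figures quoted in the post: 1170 MCycles/s at a 1250 MHz C66x clock.
load = core_load(1170, 1250)
print(f"{load:.1%}")  # 93.6%
```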
  • Hi Vladimir,
    For H.265, please check the data sheet here: e2e.ti.com/.../450351
    In the H.264 and HEVC data sheets, Table 1 specifies how many cores are used for benchmarking each configuration ID. The cycle information is per core, so your understanding is correct.
    Thank you, Paula
  • Hi Paula,

    I discussed your answer with my colleagues.
    I think I may have phrased my question in a way that was misunderstood.
    What I wanted to ask is: can I transcode 8 video streams of "1920x1080_420p_8bit.yuv , YUV420, CABAC, VBR, IBBP @ 8Mbps @ 30 FPS" on 8 cores, or only 1 video stream on 8 cores at these settings?

    You say that the cycle information is per core. The TMS320C6678 has 8 cores, so the total cycle count for the whole DSP must be 8 times higher.
    My colleagues' argument is based on the description in Table 1:
    "H264 High Profile, VBR, IBBP, CABAC, Multi Core(8 Cores) - H264HP_ENC_007"

    Another aspect of the question is the performance/price ratio.
    If I can transcode 8 video streams, the performance is very good.
    But if K2 can transcode only 1 video stream on 8 cores, then the old DaVinci is much better. The DM8168 can transcode 3 H.264 video streams simultaneously at a price of about $120 (from TI.com). The price of the TMS320C6678 is up to $250 (from TI.com).
    So it's very important to understand how many Keystone devices I need to match the performance of one DaVinci. 1/2 or 3?
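    The price/performance comparison can be sketched as below, using only the prices and channel counts quoted in this post. The two C6678 channel counts (8 and 1) are the two hypotheses under discussion, not measured values, and the helper names are illustrative.

```python
import math


def cost_per_channel(price_usd: float, channels: int) -> float:
    """Device price divided by the number of simultaneous channels."""
    return price_usd / channels


def chips_to_match(target_channels: int, channels_per_chip: int) -> int:
    """Whole chips needed to reach a target channel count."""
    return math.ceil(target_channels / channels_per_chip)


DM8168_PRICE_USD, DM8168_CHANNELS = 120, 3  # figures quoted in the post
C6678_PRICE_USD = 250                       # "up to" price quoted in the post

print(f"DM8168: ${cost_per_channel(DM8168_PRICE_USD, DM8168_CHANNELS):.2f} per channel")
for hypothesis in (8, 1):  # hypothetical channels per C6678
    chips = chips_to_match(DM8168_CHANNELS, hypothesis)
    cost = cost_per_channel(C6678_PRICE_USD, hypothesis)
    print(f"C6678 at {hypothesis} ch/chip: ${cost:.2f} per channel, "
          f"{chips} chip(s) to match one DM8168")
```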

    TI positions Keystone 2 for HEVC encoding, so this question is very important.

    Best regards.
  • Hi Vladimir,

    With an H264HP_ENC_007 type of configuration (p1920x1080_420p_8bit.yuv, YUV420, CABAC, VBR, IBBP @ 8Mbps @ 30 frames per second) you can encode 1 channel using 1 C6678 DSP (8 cores).

    Please note that you can tweak the codec configuration parameters to reduce the number of cores required.

    Table 2 (Cycles Information) shows the average and peak figures per core. Depending on the configuration used, you can multiply the average figure by the number of cores used to get an approximate total MIPS per DSP.
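    As a rough illustration of that multiplication, the sketch below assumes the 1170 MCycles/s figure quoted earlier in the thread is the per-core average for H264HP_ENC_007 and that all 8 cores are used at 1250 MHz; check both against the data sheet before relying on the result.

```python
def total_mcycles(avg_mcycles_per_core: float, cores_used: int) -> float:
    """Approximate whole-DSP load: per-core average times cores used."""
    return avg_mcycles_per_core * cores_used


CORES = 8               # cores used by the configuration (Table 1)
CORE_CLOCK_MHZ = 1250   # clock quoted for the measurements

total = total_mcycles(1170, CORES)   # assumed per-core average
budget = CORES * CORE_CLOCK_MHZ      # cycles available on the whole DSP
print(f"{total} of {budget} MCycles/s ({total / budget:.1%})")
```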

    A small clarification: the C6678 is Keystone 1, not Keystone 2.

    TI DaVinci uses hardware accelerators (IVAHD), while Shannon uses software codecs. Our HEVC codecs run on the DSP; currently we don't have an IVAHD for HEVC.

    Thank you,
    Paula
  • Hello Paula,
    I can't open this document because the link leads back to this thread rather than to the document.

    Paula Carrillo said:

    For H265 please check data sheet from here: e2e.ti.com/.../450351

  • You are right that the C6678 is Keystone 1, not Keystone 2.
    I was comparing the C6678 and the DM8168 on the advice of the local TI office.
    They said that DSP performance on K1 and K2 at the same clock speed is the same, because the DSP cores are the same, and that codec performance tables are available for the C6678.

    My customer really wants to use K2 in an IPTV transcoder platform, but they need to know the performance of the solution.
    As I understand it, K2 is TI's only solution for H.265 encoding, but Keystone is not so good for H.264, where DaVinci is much better.
    Does TI plan to implement IVAHD for HEVC in any processor?
  • Hi Vladimir, I haven't seen an HEVC IVAHD on the roadmap. I will ask internally and let you know if that turns out to be wrong.
    Thank you, Paula
  • Hello Paula,

    What about H.265 encoding performance statistics?

    Best regards

    Vladimir Aparin said:

    Hello Paula,
    I can't open this document because the link leads back to this thread rather than to the document.

    Paula Carrillo said:

    For H265 please check data sheet from here: e2e.ti.com/.../450351

  • Hi Vladimir, my apologies, I copied the wrong link. Please check this one:

    FYI, landing page for all C66x video codecs is:

    Thank you,

    Paula

  • Hello Paula,
    I had checked this document before, but its figures are not comparable with the performance information for H.264 encoding.

    I made a table to compare the performance of DM8168, C6678 and K2 for H.264/H.265 at Full HD and HD Ready resolutions.
    Could you please fill it in?

    Simultaneous channels (or maybe another metric) per device:

    Video parameters on input | Video parameters on output | DM8168 | C6678 | K2
    1080i 50 Hz @ H.264 @ 10-15Mbit/s | 1080p 25 Hz @ H.264 High Profile @ 10Mbit/s | 3-4 | ? | ?
    576i 50 Hz @ MPEG2 @ 4-5Mbit/s | 576p 25 Hz @ H.264 High Profile @ 2Mbit/s | 16 | ? | ?
    TBD for H.265 | TBD for H.265 | ? | ? | ?
    TBD for H.265 | TBD for H.265 | ? | ? | ?

  • Hi Vladimir, do you have a quad-Shannon board and an Ubuntu desktop? If so, you could profile your streams and use cases very easily using our MCSDK video framework. Let me know and I can send you the wiki links.
    In the meantime, as a rough rule of thumb, the cores/chips required are:
    MPEG2 dec 576i50 @ 5Mbps: ~150 MCycles per second, so we could fit 8 channels per core
    H264 dec 1080i50 @ 10Mbps: ~2 cores, so we could fit 4 ch per chip
    H264 enc 1080p25 @ 10Mbps: ~8 cores, so we could fit 1 ch per chip
    H264 enc 576p25 @ 2Mbps: ~2 cores, so we could fit 4 ch per chip
    I will ask our codec developers to check these numbers and make corrections or comments if required. I am currently out of the office, but I will follow up next week when I am back.
    Thank you, Paula
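    The rule-of-thumb figures above follow from integer division of the available budget by the per-channel cost. A minimal sketch, assuming 8 cores per chip and the 1250 MHz core clock used for the data sheet measurements (the clock is an assumption here, since the post does not state it):

```python
CORES_PER_CHIP = 8
CORE_CLOCK_MHZ = 1250  # assumption: same clock as the data sheet measurements


def channels_per_core(mcycles_per_channel: float) -> int:
    """Whole channels that fit in one core's cycle budget."""
    return int(CORE_CLOCK_MHZ // mcycles_per_channel)


def channels_per_chip(cores_per_channel: int) -> int:
    """Whole channels that fit on one chip when each channel needs N cores."""
    return CORES_PER_CHIP // cores_per_channel


print(channels_per_core(150))  # MPEG2 dec 576i50: 8 channels per core
print(channels_per_chip(2))    # H264 enc 576p25: 4 channels per chip
print(channels_per_chip(8))    # H264 enc 1080p25: 1 channel per chip
```

    These are per-codec estimates only; a full transcode channel needs decoder and encoder cores together, so the combined figure may be lower.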

  • My customer has a Hawking dev board, so they can run tests on it, and your links will be helpful.

    I updated the table based on your answer. As we can see, the DM8168 is much better at H.264 thanks to IPVC.
    We still need H.265 performance information for all processor generations.

    Simultaneous channels per device:

    Video parameters on input | Video parameters on output | DM8168 | C6678 | K2
    1080i 50 Hz @ H.264 @ 10-15Mbit/s | 1080p 25 Hz @ H.264 High Profile @ 10Mbit/s | 3-4 | 1 | ?
    576i 50 Hz @ MPEG2 @ 4-5Mbit/s | 576p 25 Hz @ H.264 High Profile @ 2Mbit/s | 16 | 4 | ?
    TBD for H.265 | TBD for H.265 | ? | ? | ?
    TBD for H.265 | TBD for H.265 | ? | ? | ?
  • Hi Vladimir,

    The MCSDK video framework links are below:

     

    Please keep in mind that the current MCSDK video framework is for C6678 devices (Shannon, Quad-Shannon and Octal-Shannon boards). Some users have ported it to K2, but we don't have an official TI MCSDK Video release for K2.

    Also, I got a correction from our developers about the H264HP decoder numbers:

    H264 dec 1080i50 @ 10Mbps requires 4 cores for decoding, so we could fit 2 ch per chip.

    Finally, for the HEVC performance numbers in your table, what is the use case (resolution, fps, bitrate...)?

    thank you,

    Paula

  • Paula Carrillo said:

    Finally, for HEVC performance numbers in your table what is the use case? (resolution, fps, bitrate..)


    It would be very good if the H.265 parameters matched the H.264 ones in resolution and frame rate.
    That would make the comparison much easier. The bitrate is at your discretion (I think it should be slightly lower).

  • Hi Vladimir, I got the information below from our developers.

    HEVC Decoder:
        Achieved: 1080p60, 8 Mbps: 8 cores.
        Which can be scaled to 1080p30, 4 Mbps: 4-5 cores.

    HEVC Encoder (user-defined encoding preset):
        Achieved: 1080p60, 8 Mbps: 32 cores, or 4 chips.
        Which can be scaled to 1080p30, 4 Mbps: ~16 cores, or 2 chips.
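
    The scaled figures are close to a simple linear scaling with frame rate. The sketch below assumes exactly that (core count proportional to frames per second, ignoring the bitrate change and any fixed overhead), which is why it gives 4 cores for the decoder where the developers quote 4-5. The function name is illustrative.

```python
import math

CORES_PER_CHIP = 8


def scale_cores(ref_cores: float, ref_fps: float, target_fps: float) -> float:
    """Linear estimate: cores needed scale with the frame rate."""
    return ref_cores * target_fps / ref_fps


enc = scale_cores(32, 60, 30)  # HEVC encoder, 1080p60 -> 1080p30
dec = scale_cores(8, 60, 30)   # HEVC decoder, 1080p60 -> 1080p30
print(f"encoder: ~{enc:.0f} cores, {math.ceil(enc / CORES_PER_CHIP)} chips")
print(f"decoder: ~{dec:.0f} cores (lower bound)")
```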

    Thank you,

    Paula

  • Hello Paula,

    Can you comment further on the hardware implementation?
    I want to reproduce the same results on a multichip PCB. Which interfaces are required for HEVC encoding in a multichip configuration?
    As I understand it, Hyperlink should be enough, but I have some doubts about that.

    As far as I know, these encoding performance results were obtained on TMDSEVM6678L boards. Can you provide instructions for connecting and configuring the test setup?
  • Hi Vladimir, this is an old thread and I am not clear about your current question =). Could you please clarify which platform you are referring to (Hawking, 6678, DM8168)? Which codecs: HEVC or H264? As for multichip, which platforms are you considering, and with which type of connection?

    Thank you,

    Paula

  • We are now talking about Keystone I, the TMS320C6678,
    with the TMDSEVM6678L as the base development board. I want to reproduce the codec performance results you provided above, so I need to understand the dependencies and interaction between development boards when the video codecs are running.
  • Hello Paula,
    Can you answer this question?
  • Hi Vladimir, let me redirect this question to our Keystone1 HW experts.
    thank you,
    Paula
  • Hello Paula,


    Can you tell me when I can expect an answer to my last question?

  • Hi Vladimir, in MCSDK video I see that we reserve DDR CHIP2CHIP memory for PCIe, Hyperlink and Mailbox (mcsdk_video_2_2_0_46\dsp\ggcfg\build\hdg\sv04\ggvf0.becmd).
    Multichip HEVC mainly uses the mailbox for sharing chip-to-chip information. For Virtual Tiles, as far as I remember, the default inter-chip communication between neighboring tiles was via PCIe. Hyperlink (when available between 2 chips) was used instead of PCIe as an optimization for 4K. I will confirm this information with the codec developers and come back to you soon.

    You can also search inside MCSDK video for the keywords "HLNK", "PCI" and "mailbox" to inspect the relevant code.

    thank you,
    Paula
  • Hi Vladimir, I confirmed the above information with the developers, so we should be OK.
    thank you,
    Paula
  • HEVC Encoding:

    Configuration | Resolution | # of C6678 devices at 1.25 GHz
    Low Delay | 720p30 | 6 cores of 8
    Low Delay | 1080p30 | 1+ DSP (10 cores)
    Low Delay | 1080p60 | 2.5 DSP (20 cores)
    Low Delay | 4kp30 | 5 DSP (40 cores)
    Low Delay | 4kp60 | 10 DSP (80 cores)
    Standard | 720p30 | 1 DSP (8 cores)
    Standard | 1080p30 | 2 DSP (16 cores)
    Standard | 1080p60 | 4 DSP (32 cores)
    Standard | 4kp30 | 8 DSP (64 cores)
    Standard | 4kp60 | 16 DSP (128 cores)
    Broadcast | 720p30 | 2 DSP (16 cores)
    Broadcast | 720p60 | 4 DSP (32 cores)
    Broadcast | 1080p30 | 4 DSP (32 cores)
    Broadcast | 1080p60 | 6 DSP (48 cores)
    Broadcast | 4kp30 | 12 DSP (96 cores)
    Broadcast | 4kp60 | 24 DSP (192 cores)
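
    Since devices can only be populated whole, fractional entries such as "1+ DSP" or "2.5 DSP" round up on a real board. A small sketch of that conversion, using a few core counts copied from the table above and assuming 8 cores per C6678:

```python
import math

CORES_PER_DEVICE = 8  # C6678

# A few (configuration, resolution) -> cores entries taken from the table.
cores_needed = {
    ("Low Delay", "720p30"): 6,
    ("Low Delay", "1080p30"): 10,
    ("Low Delay", "1080p60"): 20,
    ("Standard", "1080p30"): 16,
    ("Broadcast", "4kp60"): 192,
}


def whole_devices(cores: int) -> int:
    """Number of whole C6678 devices needed to provide the given cores."""
    return math.ceil(cores / CORES_PER_DEVICE)


for (preset, resolution), cores in cores_needed.items():
    print(f"{preset} {resolution}: {cores} cores -> {whole_devices(cores)} device(s)")
```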

    Even such a powerhouse DSP device as the C6678 seems to be obsolete for 4K HEVC (because HEVC was designed especially for Hi-Def).

    The future is here.

    Alexey

  • Hello Alexey,

    I know that HEVC requires that many cores. My question is now about something else: which interfaces between C6678 devices are used by the HEVC codec provided by TI? For example, PCIe, Hyperlink or something else?