About IVA-HD architecture

Hi, everyone,

I see in SWPU249AB, on pages 1070 and 1076, that:

Four instructions per cycle, four execution units:
– Optimized instruction set for video and image processing
– Four 8 x 8 or 16 x 16 multiply accumulate (MAC) per cycle
– Four sum of absolute differences (SAD) per cycle
– Eight interpolations (a + b + 1) >> 1 per cycle
– Two (32-bit x 32-bit -> 64-bit) multiply operations per cycle

As far as I know about C64x+, two 32x32->64/cycle requires 2 .M units.

Eight (a+b+1)>>1/cycle also requires 2 .M units running AVGU4 at the same time.
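
To make the two per-lane operations in that list concrete, here is a minimal portable C sketch of what I understand them to compute on four packed unsigned bytes: the rounded average (a + b + 1) >> 1, which is what AVGU4 does per lane on C64x+, and a sum of absolute differences. The function names avg_u8x4 and sad_u8x4 are my own placeholders, not TI intrinsics, and this only models the arithmetic, not the IVA-HD hardware.

```c
#include <stdint.h>

/* Rounded average of four packed unsigned bytes.
   Each 8-bit lane is computed as (a + b + 1) >> 1.
   Hypothetical helper name, used here only for illustration. */
uint32_t avg_u8x4(uint32_t a, uint32_t b)
{
    uint32_t out = 0;
    for (int lane = 0; lane < 4; ++lane) {
        uint32_t x = (a >> (8 * lane)) & 0xFFu;
        uint32_t y = (b >> (8 * lane)) & 0xFFu;
        /* x + y + 1 is at most 511, so the shifted result fits in 8 bits. */
        out |= ((x + y + 1u) >> 1) << (8 * lane);
    }
    return out;
}

/* Sum of absolute differences over four packed unsigned bytes.
   Hypothetical helper name, used here only for illustration. */
uint32_t sad_u8x4(uint32_t a, uint32_t b)
{
    uint32_t sum = 0;
    for (int lane = 0; lane < 4; ++lane) {
        uint32_t x = (a >> (8 * lane)) & 0xFFu;
        uint32_t y = (b >> (8 * lane)) & 0xFFu;
        sum += (x > y) ? (x - y) : (y - x);
    }
    return sum;
}
```

For example, avg_u8x4(0x00FF1003, 0x01FF2004) gives 0x01FF1804, and sad_u8x4 on the same inputs gives 18. Two such 4-lane averages per cycle would account for the "eight interpolations" figure.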

Is this implying that the IVA-HD, or IVA3 core has at least 2 .M units? If this is the case, what are the other two units available?

Thank you.

Dehuan

  • Hello Dehuan,

    #Q: Is this implying that the IVA-HD, or IVA3 core has at least 2 .M units? If this is the case, what are the other two units available?

    - No, the two .M units apply only to the DSP subsystem. IVA-HD is a subsystem separate from the DSP. It uses its own hardware accelerators and has no direct connection to the DSP subsystem.

    Best regards,

    Yanko 

  • Hi, Yanko,


    I thought the DSP was part of the IVA subsystem. I spent a long time with OMAP3, where the IVA2.2 subsystem includes a full C64x+ core, a sequencer, and a few video accelerators.

    What I really want to ask is how many functional units there are in the mini DSP in the OMAP5 series processors, and what they are.

    Thanks.


    Dehuan

  • Hello Dehuan,

    At first, see the picture:

    DSP Subsystem Description
    The DSP subsystem contains the following submodules:
    • TMS320DM64™ 32-bit fixed DSP core for audio processing and general-purpose imaging and video
    processing. It is backward compatible with existing C64x™ video codecs.
    – 32-KiB L1 4-way set associative cache
    – 128-KiB L2 8-way set associative cache

    The internal architecture is an assembly of the following components:
    • High-performance TI DSP (TMS320DMC64x™) derivative (DSP_C0) integrated in a megamodule,
    including local level 1 (L1) and level 2 (L2) cache and memory controllers for audio processing and
    general-purpose imaging and video processing

    For detailed information about the DSP megamodule, refer to section 5.3.2.7, Other DSP Reference Documents, in the OMAP5 TRM.

    You can take a look at these documents:

    • TMS320C64x+ DSP Megamodule Reference Guide (TI Literature number SPRU871) describes the
    C64x+ megamodule peripherals.
    • TMS320C64x/C64x+ DSP CPU and Instruction Set Reference Guide (TI Literature number SPRU732)
    • TMS320C6000 Programmer's Guide (TI Literature number SPRU198) describes ways to optimize C
    and assembly code for the TMS320C6000 DSPs, and includes application program examples.

    The IVA-HD subsystem is composed of:
    • Improved motion estimation acceleration engine (IME3), which is used in encoding processing
    • Improved loop filter acceleration engine (ILF3), which performs deblocking filtering
    • Improved sequencer (ICONT1) based on the ARM968E-S™ microcontroller. It includes memory and
    INTC and is used as a primary sequencer.
    • Intraprediction estimation engine (IPE3). It is used in encoding processing.
    • Calculation engine (CALC3), which performs transform and quantization calculations
    • Motion compensation engine (MC3), which creates an interprediction macroblock with given motion
    vectors and modes from the reference data
    • Entropy coder/decoder (ECD3), which uses Huffman and arithmetic codes during the process of encoding and decoding the stream
    • Video DMA processor (ICONT2), which is also based on the ARM968E-S microcontroller and can be
    used as secondary sequencer
    • Video DMA engine (vDMA), which is a DMA engine for data transmission between external memories
    and shared L2 memory
    • Synchronization box (SYNCBOX) embedded in each hardware accelerator and in both ICONTs
    • Mailbox for communication between IVA-HD and the processors external to it (DSP, MPU, and IPU)
    • Shared L2 interface and memory
    • Video local interconnect for connection between the submodules of the IVA-HD
    • IVA-HD system control module (SYSCTRL), which controls the clocks in the subsystem and PRCM
    handshaking

    For detailed information, see chapter 6, IVA Subsystem, in the OMAP5 TRM.

    I do not understand what you mean by the mini DSP in OMAP5.


    Best regards,

    Yanko

  • Yanko,

    The reason why I'm curious about the mini-DSP is that we are trying to migrate from OMAP3 to OMAP5.

    Our application relies heavily on the full DSP of OMAP3 and not very much on the video accelerators, so the reduction in DSP processing power is a major concern. It basically means that most of the DSP code will have to be moved to NEON.

    I kind of wish TI could add a GPU to K2E, or a C66 to OMAP, and then our lives would be a lot easier.

    Dehuan