Compiler: C6000 VLIW pipelining vs IA64/x86

Tool/software: TI C/C++ Compiler

Hi,

While working through the C6000 optimization workshop, I came up with some questions:

  1. The C6xxx architecture, with its two register files (A and B sides) of 32 registers each, is similar to IA-64 VLIW, isn’t it?
  2. TI’s latest KeyStone processor clocks at 1.2 GHz, while some x86 CPUs run at 4 GHz and can reach 6 GHz under extreme cooling. Some say that a DSP has a shorter pipeline, but for the C6000 the pipeline phases are:

Fetch: PG PS PW PR

Decode: DP DC

Execute: E1 E2 E3 E4 E5 E6 E7 E8 E9 E10

These 16 phases are not really 3x shorter than an Intel x86 pipeline (Skylake 14, Bonnell 16). Given that TI has always had state-of-the-art semiconductor manufacturing capability, why is the clock 3x slower than Intel’s? Is it just for lower power? But KeyStone and similar devices are marketed as high performance.

3. The C6000 has 64 registers across 2 sides and 8 functional units, which makes software pipelining a lot easier because there are more resources to fill in each cycle. I read that IA64 was designed with high expectations but was not very successful because of poor backward compatibility. From the DSP perspective, x86 has very few general-purpose registers (about eight) and probably no cross-path access (at least none made as explicit/transparent to the programmer as on the C6000). So if one writes something like an intensive FIR loop, a general-purpose compiler (VC, GCC) can do only very limited optimization. My guess is that without using Intel ICC and giving it annotations analogous to TI’s MUST_ITERATE pragma and _nassert intrinsic, with which ICC could then use MMX/SSE instructions, a baseline compiler cannot produce code as efficient as what the C6000 DSP gets. Could you comment on this?

4. Intel has the MKL math library, which is hand-optimized like TI’s DSPLIB. If we put SSE-optimized code on Intel Skylake (14 pipeline phases, vs. 16 for the C6000) toe-to-toe with the C6000, which wins on efficiency per cycle? Again, why the 3x difference?

 

Dave

  • 1. Well, Intel bills IA64 as "EPIC" ("Explicitly Parallel Instruction Computing"), an umbrella term meant to include VLIW. However, IA64 is super-scalar, which means there is hardware at run-time to compute the instruction order. VLIW and super-scalar attack the same problem: exploiting instruction-level parallelism (ILP). The tradeoff is that super-scalar requires extra hardware at run-time, while VLIW requires more work in the compiler.
    2. Don't discount the power argument. For embedded systems, battery life is paramount, and don't forget about heat dissipation. Beyond that, you'll want to ask this question in the C6000 forum instead of the compiler forum.
    3. The Intel compiler (ICC) is supposed to be very good, or at least it is for x86. Without some level of source-code annotation, the C language doesn't lend itself very well to the sorts of optimizations you need for hardware features that are tailored to intense computations.
    4. Again, you'll want to ask this on the C6000 forum. The TI compiler team does not benchmark Intel devices.
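
To illustrate the kind of source-code annotation discussed in point 3, here is a minimal sketch of an FIR inner loop carrying the hints mentioned in the question: `restrict`, the MUST_ITERATE pragma, and the `_nassert` intrinsic. The function name `fir16` and the macros `TI_PRAGMA` and `ASSUME` are hypothetical names made up for this example, and the 8-byte alignment and "tap count is a nonzero multiple of 8" promises are assumptions chosen for illustration, not requirements of any TI library. On other compilers the hints compile away, so the same source builds with GCC.

```c
/* Hypothetical wrapper macros: the TI-specific hints are only emitted
 * when the TI compiler is detected, and vanish everywhere else. */
#if defined(__TI_COMPILER_VERSION__)
#define TI_PRAGMA(x)  _Pragma(#x)
#define ASSUME(cond)  _nassert(cond)
#else
#define TI_PRAGMA(x)
#define ASSUME(cond)  ((void)0)
#endif

/* y[i] = sum over k < ntaps of x[i + k] * h[k], for i < nout.
 * The caller must supply nout + ntaps - 1 input samples in x.
 * Assumed contract (illustrative): x and h are 8-byte aligned and
 * ntaps is a nonzero multiple of 8. */
void fir16(const short *restrict x, const short *restrict h,
           int *restrict y, int nout, int ntaps)
{
    int i, k;

    /* Alignment promises that let the TI compiler use wide loads. */
    ASSUME(((int)x & 7) == 0);
    ASSUME(((int)h & 7) == 0);

    for (i = 0; i < nout; i++) {
        int acc = 0;

        /* Trip count: at least 8, no stated maximum, multiple of 8. */
        TI_PRAGMA(MUST_ITERATE(8, , 8))
        for (k = 0; k < ntaps; k++)
            acc += x[i + k] * h[k];

        y[i] = acc;
    }
}
```

The `restrict` qualifiers tell the compiler that the three arrays do not overlap, and the trip-count and alignment hints remove the need for it to generate conservative fallback code. Without such promises, a C compiler generally cannot prove these properties from the source alone, whichever target it is compiling for.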