Tool/software: TI C/C++ Compiler
C6000 VLIW pipelining
Hi,
While going through the C6000 optimization workshop, I came up with some questions:
1. The C6xxx architecture, with two sides of 32 registers each, is similar to the IA-64 VLIW design, isn’t it?
2. TI’s latest Keystone processor clocks at 1.2 GHz, while some x86 CPUs run at 4 GHz and can turbo up to 6 GHz under extreme cooling. Some say that a DSP has a shorter pipeline, but for C6000 the pipeline phases are
Fetch: PG PS PW PR | Decode: DP DC | Execute: E1 E2 E3 E4 E5 E6 E7 E8 E9 E10
which is 16 phases in total, so the pipeline is not really 3x shorter than Intel x86 pipelines (Skylake 14, Bonnell 16). Given that TI has always had state-of-the-art semiconductor manufacturing capability, why is the clock 3x slower than Intel’s? Is it just for lower power? But Keystone and similar parts are branded as high performance.
3. The C6000 has 64 registers split across two sides and 8 functional units, which makes software pipelining a lot easier because there are more resources to fill in each cycle. I read that IA-64 was designed with high expectations but was not so successful because of poor backward compatibility. Viewed from the DSP perspective, x86 has very few general-purpose registers (eight or so), and probably no cross-path access (at least none made as explicit and visible to the programmer as on C6000). So if one writes something like an intensive FIR loop, a general-purpose compiler (VC, GCC) can do only very limited optimization. My guess is that unless one uses Intel ICC and gives it pragmas analogous to TI’s MUST_ITERATE/_nassert, so that ICC can then use MMX/SSE instructions, a baseline compiler cannot produce code as efficient as on the C6000 DSP. Could you comment on this?
4. Intel has the MKL math library, which is hand-optimized like TI’s DSPLIB. If we put SSE-optimized code on Intel Skylake (14 pipeline stages, vs. 16 phases on C6000) toe-to-toe with C6000, which wins on efficiency per cycle? Again, why the 3x difference?
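To make question 3 concrete, here is a rough sketch of the kind of FIR loop I have in mind, with the TI hints applied. The function name fir_q15, the 16-bit data type, and the specific promises (tap count a multiple of 4, 8-byte aligned inputs) are assumptions made up for illustration. The guards on __TI_COMPILER_VERSION__ are only there so the same file also builds with a non-TI compiler, where the hints simply disappear.

```c
#include <assert.h>
#include <stdint.h>

/* _nassert is a TI compiler intrinsic. On other compilers it is defined
   away here so the sketch still builds (assumption: no equivalent hint). */
#ifndef __TI_COMPILER_VERSION__
#define _nassert(cond) ((void)0)
#endif

/* Hypothetical FIR kernel: y[i] = sum over k of h[k] * x[i + k].
   Assumptions the caller must guarantee:
     - nh is at least 4 and a multiple of 4
     - x holds at least ny + nh - 1 samples
     - on the TI target, x and h are 8-byte aligned */
void fir_q15(const int16_t *restrict x, const int16_t *restrict h,
             int32_t *restrict y, int ny, int nh)
{
    int i, k;

    /* Portable run-time check of the trip-count promise. */
    assert(nh >= 4 && nh % 4 == 0);

    /* Alignment promises to the TI optimizer (no-ops elsewhere). */
    _nassert(((uintptr_t)x & 7) == 0);
    _nassert(((uintptr_t)h & 7) == 0);

    for (i = 0; i < ny; i++) {
        int32_t acc = 0;

        /* Tell the TI compiler the inner loop runs at least 4 times
           and always a multiple of 4 times. */
#ifdef __TI_COMPILER_VERSION__
#pragma MUST_ITERATE(4, , 4)
#endif
        for (k = 0; k < nh; k++) {
            acc += (int32_t)x[i + k] * h[k];
        }
        y[i] = acc;
    }
}
```

With a 4-tap filter of all ones and input 1, 2, 3, 4, 5, the first two outputs are 10 and 14. My question is how close a general-purpose x86 compiler gets on a loop like this without similar hints.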
Dave