Hi,
I'm trying to see the benefits of the instruction set of the C6678.
As an example, I want to multiply two complex numbers a and b.
My first version is the basic one:
c.re = a.re*b.re-a.im*b.im;
c.im = a.re*b.im+b.re*a.im;
This code takes 29 CPU cycles.
So I tried the intrinsics, thinking they would be quicker:
prod = _cmpysp(_ftof2(a.re,a.im),_ftof2(b.re,b.im));
c.re = _hif(_hid128(prod)) + _hif(_lod128(prod));
c.im = _lof(_hid128(prod)) + _lof(_lod128(prod));
And this code takes 43 CPU cycles.
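To make clear what I expect the intrinsic version to compute, here is a small host-side sketch in plain C (no TI intrinsics, so it runs anywhere). It models the four partial products as I understand _cmpysp to return them when (re, im) is packed as (high, low) by _ftof2. The lane order and the negated lane are my assumption and not something I have verified against the documentation. The names cplx_t, cmul_ref, cmpysp_model, cmul_lanes and check_example are made up for this illustration.

```c
#include <assert.h>

/* Hypothetical complex type; field names match the snippets above. */
typedef struct { float re; float im; } cplx_t;

/* Plain C reference: (a.re + j*a.im) * (b.re + j*b.im). */
static cplx_t cmul_ref(cplx_t a, cplx_t b)
{
    cplx_t c;
    c.re = a.re * b.re - a.im * b.im;
    c.im = a.re * b.im + b.re * a.im;
    return c;
}

/* Host-side model of the 128-bit result as four float lanes.
 * ASSUMPTION: lane layout and the sign of lane 1 are how I believe
 * _cmpysp behaves; p[3] is the most significant lane. */
static void cmpysp_model(cplx_t a, cplx_t b, float p[4])
{
    p[3] = a.re * b.re;      /* high float of the upper 64 bits */
    p[2] = a.re * b.im;      /* low float of the upper 64 bits  */
    p[1] = -(a.im * b.im);   /* high float of the lower 64 bits */
    p[0] = a.im * b.re;      /* low float of the lower 64 bits  */
}

/* Combine the lanes: add the upper pair to the lower pair. */
static cplx_t cmul_lanes(cplx_t a, cplx_t b)
{
    float p[4];
    cplx_t c;
    cmpysp_model(a, b, p);
    c.re = p[3] + p[1];
    c.im = p[2] + p[0];
    return c;
}

/* (1 + 2j) * (3 + 4j) = -5 + 10j, exactly representable in float. */
static void check_example(void)
{
    cplx_t a = { 1.0f, 2.0f };
    cplx_t b = { 3.0f, 4.0f };
    cplx_t r = cmul_ref(a, b);
    cplx_t l = cmul_lanes(a, b);
    assert(r.re == l.re && r.im == l.im);
}
```

If this model is right, the intrinsic version still needs two extra additions plus the pack and unpack steps around _cmpysp, which may be part of why it is not faster for a single multiply.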
So, am I using the intrinsics the wrong way? Or does it depend on what we want to do?
Thanks,
Alex