Hi,
this is my first post! My name is Fabian, and I am currently trying to speed up a multi-precision library I wrote in C by using intrinsics.
I have been looking at the instruction set of the C64x+ as described in the CPU and Instruction Set reference (spru732j), as well as the list of intrinsics in the programmer's guide (spru198k), but I couldn't find an intrinsic for a 32-bit x 32-bit multiply with add.
I have read several times that the C64x+ should be able to perform two 32-bit x 32-bit MACs per cycle, as stated for example in the document "Optimizing loops on the c66x DSP" (sprabg7).
My multi-precision numbers are stored in a struct with a uint32_t array holding the words of the number.
I have implemented a multi-precision multiplication algorithm called "Comba multiplication", which features a triple-precision step in the core loop of the form a[i] * b[i] + c, where c is a three-word accumulator.
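To make that step concrete, it could be sketched in plain C roughly as follows. This is only an illustrative sketch, not the actual library code: the names mac_32x32_96 and comba_mul, the little-endian word order and the bare-array interface are all assumptions, and the a[i] * b[k - i] indexing is the usual column-wise form of Comba multiplication. It uses no intrinsics; it only shows the operation that an intrinsic would have to replace.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: (c2:c1:c0) += a * b, i.e. a 32x32 -> 64-bit
 * product added into a 96-bit (three-word) accumulator. */
static void mac_32x32_96(uint32_t a, uint32_t b,
                         uint32_t *c0, uint32_t *c1, uint32_t *c2)
{
    uint64_t p  = (uint64_t)a * b;                    /* full 64-bit product */
    uint64_t lo = (uint64_t)*c0 + (uint32_t)p;        /* low word + carry    */
    uint64_t hi = (uint64_t)*c1 + (uint32_t)(p >> 32) + (lo >> 32);

    *c0  = (uint32_t)lo;
    *c1  = (uint32_t)hi;
    *c2 += (uint32_t)(hi >> 32);                      /* carry into top word */
}

/* Hypothetical Comba (column-wise) multiply: r[0..2n-1] = a[0..n-1] * b[0..n-1].
 * Words are assumed little-endian (index 0 is least significant), n >= 1,
 * and r must not overlap a or b. */
static void comba_mul(uint32_t *r, const uint32_t *a, const uint32_t *b, size_t n)
{
    uint32_t c0 = 0, c1 = 0, c2 = 0;

    for (size_t k = 0; k < 2 * n - 1; k++) {
        size_t i_min = (k < n) ? 0 : k - n + 1;
        size_t i_max = (k < n) ? k : n - 1;

        /* Sum every partial product that belongs to column k. */
        for (size_t i = i_min; i <= i_max; i++)
            mac_32x32_96(a[i], b[k - i], &c0, &c1, &c2);

        r[k] = c0;     /* emit the finished column          */
        c0 = c1;       /* shift the accumulator by one word */
        c1 = c2;
        c2 = 0;
    }
    r[2 * n - 1] = c0; /* the last carry word */
}
```

The inner call to mac_32x32_96 is the hot spot: one unsigned 32 x 32 multiply followed by a carry-propagating add into three words.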
How would you implement this step using intrinsics?
It would be great if someone could help me out.
Regards, and thanks in advance,
fabian