Hello,
I'm looking for a four-way float add intrinsic. I need to accumulate the results of _qmpysp(). A four-way float MAC intrinsic would be best.
Best regards,
Wilson.
Look at section 2.3 of http://www.ti.com/lit/ug/sprugh7/sprugh7.pdf. The internal buses of the multiplication unit are 128 bits wide, so four 32-bit values can be input. The L and S arithmetic functional units, however, are only 64 bits wide, so they can take only two single-precision floating-point values as an input.
To mitigate the imbalance, remember that each side of the core has two functional units that can add 64 bits of floating-point data (L and S) but only one M unit. The core was optimized for MAC (multiply-accumulate) operations.
In other words, the hardware does not support a four-way floating-point add on a single functional unit.
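A practical consequence is that the 128-bit product of one four-way multiply can be accumulated with two two-way adds, one for the low pair of lanes and one for the high pair, which the L and S units can in principle share. The sketch below illustrates that pattern in portable C so it can be compiled anywhere. It is not TI code: quad_f32, pair_f32, qmpysp_emul(), daddsp_emul(), lo_pair(), hi_pair() and mac4() are hypothetical stand-ins for __x128_t, __float2_t, _qmpysp(), _daddsp() and the compiler's 128-bit split helpers. Check the exact intrinsic names, types and lane ordering in the C6000 compiler documentation before porting.

```c
#include <stddef.h>

/* Hypothetical stand-in for a 128-bit register holding four floats (like __x128_t). */
typedef struct { float f[4]; } quad_f32;

/* Hypothetical stand-in for a 64-bit register holding two floats (like __float2_t). */
typedef struct { float f[2]; } pair_f32;

/* Emulates a four-way single-precision multiply, in the spirit of _qmpysp(). */
static quad_f32 qmpysp_emul(quad_f32 a, quad_f32 b)
{
    quad_f32 r;
    for (int i = 0; i < 4; ++i)
        r.f[i] = a.f[i] * b.f[i];
    return r;
}

/* Emulates a two-way single-precision add, in the spirit of _daddsp(). */
static pair_f32 daddsp_emul(pair_f32 a, pair_f32 b)
{
    pair_f32 r = { { a.f[0] + b.f[0], a.f[1] + b.f[1] } };
    return r;
}

/* Split a 128-bit value into 64-bit halves; lanes 0-1 as "low" is an assumption of this emulation. */
static pair_f32 lo_pair(quad_f32 q) { pair_f32 p = { { q.f[0], q.f[1] } }; return p; }
static pair_f32 hi_pair(quad_f32 q) { pair_f32 p = { { q.f[2], q.f[3] } }; return p; }

/* Four-lane MAC: for each step, one four-way multiply feeds two two-way adds. */
static void mac4(const quad_f32 *x, const quad_f32 *y, size_t n,
                 pair_f32 *acc_lo, pair_f32 *acc_hi)
{
    pair_f32 lo = { { 0.0f, 0.0f } };
    pair_f32 hi = { { 0.0f, 0.0f } };
    for (size_t k = 0; k < n; ++k) {
        quad_f32 p = qmpysp_emul(x[k], y[k]); /* M unit: 4 products   */
        lo = daddsp_emul(lo, lo_pair(p));     /* e.g. L unit: lanes 0-1 */
        hi = daddsp_emul(hi, hi_pair(p));     /* e.g. S unit: lanes 2-3 */
    }
    *acc_lo = lo;
    *acc_hi = hi;
}
```

Splitting the accumulator into two 64-bit halves keeps each add within the 64-bit width of a single L or S unit, which is the constraint described above. Whether the compiler actually schedules the two adds on different units in the same cycle depends on the code and build options.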
Does this answer your question? If so, please close the thread.
If you need further help with optimization, please start another E2E thread.
Ran