Four-way float add intrinsic

Wilson Choi

Hello,

I'm looking for a four-way float add intrinsic. I need to accumulate the results of _qmpysp(). A four-way float MAC intrinsic would be ideal.

Best regards,

Wilson.

over 9 years ago

Yordan Kovachev over 9 years ago

TI Guru 161600 points

Hi Wilson,

Could you share which device and which software release you are using?

Best Regards,
Yordan

Michael P over 9 years ago

Expert 1810 points

C66x does not have an instruction to do that, so I would be surprised if such an intrinsic exists. However, DADDSP can run on four units (.L1, .L2, .S1 and .S2), so paired DADDSPs issued to .Ln and .Sn can keep up with QMPYSP (which can only run on .M1 and .M2).

SIMD fused multiply-add instructions *would* be nice, especially with the common variants that negate one or the other argument to the addition.
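The pairing described above can be sketched in portable C. This is only an emulation under stated assumptions: `quad_sp`, `pair_sp`, `emu_qmpysp`, `emu_daddsp` and `mac4` are hypothetical names that model the lane-wise behavior of QMPYSP and DADDSP, not the TI compiler's actual types or intrinsics. On a real C66x build the multiply would presumably be `_qmpysp()` and each half-width add a `_daddsp()`, with the compiler free to schedule the two adds on an .L and an .S unit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the packed register types. */
typedef struct { float lane[4]; } quad_sp;  /* models four packed floats (128 bits) */
typedef struct { float lane[2]; } pair_sp;  /* models two packed floats (64 bits)   */

/* Models QMPYSP: four independent single-precision multiplies. */
quad_sp emu_qmpysp(quad_sp a, quad_sp b)
{
    quad_sp r;
    for (int i = 0; i < 4; ++i)
        r.lane[i] = a.lane[i] * b.lane[i];
    return r;
}

/* Models DADDSP: two independent single-precision adds. */
pair_sp emu_daddsp(pair_sp a, pair_sp b)
{
    pair_sp r = { { a.lane[0] + b.lane[0], a.lane[1] + b.lane[1] } };
    return r;
}

/* Four-way MAC: lane-wise acc += a * b over n floats (n must be a multiple of 4).
 * One emulated QMPYSP feeds two emulated DADDSPs per iteration. */
quad_sp mac4(const float *a, const float *b, size_t n)
{
    assert(n % 4 == 0);

    pair_sp acc_lo = { { 0.0f, 0.0f } };  /* accumulates lanes 0 and 1 */
    pair_sp acc_hi = { { 0.0f, 0.0f } };  /* accumulates lanes 2 and 3 */

    for (size_t i = 0; i < n; i += 4) {
        quad_sp qa = { { a[i], a[i + 1], a[i + 2], a[i + 3] } };
        quad_sp qb = { { b[i], b[i + 1], b[i + 2], b[i + 3] } };
        quad_sp p  = emu_qmpysp(qa, qb);

        /* Split the four products into two pairs, one per adder. */
        pair_sp p_lo = { { p.lane[0], p.lane[1] } };
        pair_sp p_hi = { { p.lane[2], p.lane[3] } };

        acc_lo = emu_daddsp(acc_lo, p_lo);  /* candidate for an .L unit */
        acc_hi = emu_daddsp(acc_hi, p_hi);  /* candidate for an .S unit */
    }

    quad_sp acc = { { acc_lo.lane[0], acc_lo.lane[1],
                      acc_hi.lane[0], acc_hi.lane[1] } };
    return acc;
}
```

Calling `mac4(a, b, n)` returns four partial sums, one per lane; a final horizontal add of the four lanes would give a full dot product if that is what the algorithm needs.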

Wilson Choi over 9 years ago in reply to Yordan Kovachev

TI Intellectual 1595 points

Yordan,
It's the C66x core in Jacinto 6. No special software release; I'm just testing an algorithm.
Best regards,
Wilson

Yordan Kovachev over 9 years ago in reply to Wilson Choi

TI Guru 161600 points

Hi,

I've forwarded this to the C66x design team. Feedback will be posted directly here.

Best Regards,
Yordan

ran35366 over 9 years ago in reply to Yordan Kovachev

TI Genius 12805 points

See section 2.3 of http://www.ti.com/lit/ug/sprugh7/sprugh7.pdf

The internal buses of the multiplier (.M) units are 128 bits wide, so four 32-bit values can be input at once, but the arithmetic functional units .L and .S are only 64 bits wide, so each input can hold only two floating-point values.

To mitigate the imbalance, remember that each side of the core has two functional units that can perform 64-bit-wide floating-point adds (.L and .S) but only one .M unit. The core was optimized for MAC (multiply-accumulate) operations.

What I mean is that the hardware does not support a four-way floating-point add on a single functional unit.
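The balance described above can be checked with a few lines of arithmetic. This is a rough sketch, assuming each unit can start one instruction per cycle; the constant and function names are hypothetical, and the unit counts and widths are taken from the description in this post rather than from a datasheet table.

```c
#include <assert.h>

/* Per-side unit counts and lane widths, as described in the post above. */
enum {
    M_UNITS_PER_SIDE   = 1,  /* .M unit: runs QMPYSP                   */
    ADD_UNITS_PER_SIDE = 2,  /* .L and .S units: run DADDSP            */
    QMPYSP_LANES       = 4,  /* 128-bit input holds four 32-bit floats */
    DADDSP_LANES       = 2   /* 64-bit input holds two 32-bit floats   */
};

/* Single-precision products one side of the core can start per cycle. */
int products_per_cycle(void)
{
    return M_UNITS_PER_SIDE * QMPYSP_LANES;
}

/* Single-precision additions one side of the core can start per cycle. */
int adds_per_cycle(void)
{
    return ADD_UNITS_PER_SIDE * DADDSP_LANES;
}

/* Aborts if the adders could not keep up with the multiplier. */
void check_balance(void)
{
    assert(adds_per_cycle() >= products_per_cycle());
}
```

Under these assumptions both rates come out to four values per cycle per side, which is why two DADDSPs can absorb the output of one QMPYSP.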

Does that answer your question? If so, please close the thread.

If you need further help with optimization, please open another E2E thread.

Ran

Wilson Choi over 9 years ago in reply to ran35366

TI Intellectual 1595 points

Understood. Thanks very much.
Best regards,
Wilson.
