Hi,
I am developing a correlator for a software GPS receiver on the C6678 EVM, and I have a question.
What kind of assembler code should I write to achieve 32 (16x16) multiplies per cycle?
I understand how to get 8 multiplies, as in the following example:
DMPY2 .M1 A1:A0, A3:A2, A7:A6:A5:A4
|| DMPY2 .M2 B1:B0, B3:B2, B7:B6:B5:B4
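For readers counting multiplies, here is a minimal sketch in portable C of the work one DMPY2 does on one .M unit: four independent signed 16x16 multiplies, each giving a 32-bit product. The function name `dmpy2_model` is hypothetical, not a TI intrinsic, and register packing is not modeled.

```c
#include <stdint.h>

/* Hypothetical reference model, not a TI intrinsic.
 * Assumption: one DMPY2 performs four lane-wise signed 16x16 multiplies,
 * each producing a full 32-bit product. Register packing is not modeled. */
void dmpy2_model(const int16_t a[4], const int16_t b[4], int32_t out[4])
{
    for (int i = 0; i < 4; ++i) {
        /* Widen before multiplying so the product is computed in 32 bits. */
        out[i] = (int32_t)a[i] * (int32_t)b[i];
    }
}
```

Calling this model twice, once for each .M unit, corresponds to the 8 multiplies per cycle of the instruction pair above.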
Do I need to learn a special coding method?
In the Instruction Set Reference:
1.1.1 4x Multiply
core can now execute up to 32 (16x16-bit) multiplies per cycle.
I suppose it can do that only with complex operations; that is, DCMPY executes 2 complex multiplies, which correspond to 8 16x16 multiplies.
In para. 1.1.4, table 1.1, "Vector Size" = "4x16 bits", while "Fixed point 16x16 MACs per cycle" = 32 refers to some other operation, such as complex and complex matrix multiplication.
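The count of 8 can be checked with a small portable C model: one complex multiply needs four real 16x16 multiplies, so two complex multiplies need eight. This is only a sketch. The names `cplx16`, `cplx_wide`, `cmul_model` and `dcmpy_model` are hypothetical, results are kept in 64 bits, and the packing and saturation behavior of the real instruction is not modeled.

```c
#include <stdint.h>

/* Hypothetical types for illustration only. */
typedef struct { int16_t re, im; } cplx16;
typedef struct { int64_t re, im; } cplx_wide;

/* One complex multiply: four real 16x16 multiplies.
 * *mults is incremented by the number of real multiplies used. */
cplx_wide cmul_model(cplx16 a, cplx16 b, int *mults)
{
    cplx_wide r;
    r.re = (int64_t)a.re * b.re - (int64_t)a.im * b.im;
    r.im = (int64_t)a.re * b.im + (int64_t)a.im * b.re;
    *mults += 4;
    return r;
}

/* Assumption: a DCMPY-style operation is two independent complex multiplies. */
void dcmpy_model(const cplx16 a[2], const cplx16 b[2],
                 cplx_wide out[2], int *mults)
{
    for (int i = 0; i < 2; ++i) {
        out[i] = cmul_model(a[i], b[i], mults);
    }
}
```

With the counter starting at zero, one call to `dcmpy_model` leaves it at 8.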
Hi,
There may be a thread you can use for reference:
http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/t/169810.aspx
Allen
Hi, Allen.
Thank you for your information.
I will read the thread.
Masayuki
Hi Alberto
Thank you for your reply.
Certainly, the DCMPY instruction can execute 8 (16x16) multiplies per cycle.
I will recheck the instruction set.
I solved this problem by using the CMATMPY instruction. It can execute 16 multiplies per cycle.
I appreciate the advice.
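For anyone checking the count of 16, here is a rough portable C sketch. It assumes a CMATMPY-style operation multiplies a 1x2 complex vector by a 2x2 complex matrix, which takes four complex multiplies, that is 16 real 16x16 multiplies. The names `cplx16`, `cplx_wide` and `cmatmpy_model` are hypothetical, results are kept in 64 bits, and the packing and saturation behavior of the real instruction is not modeled.

```c
#include <stdint.h>

/* Hypothetical types for illustration only. */
typedef struct { int16_t re, im; } cplx16;
typedef struct { int64_t re, im; } cplx_wide;

/* Assumed operation: out[j] = v[0]*m[0][j] + v[1]*m[1][j] for j = 0, 1.
 * *mults is incremented by the number of real 16x16 multiplies used. */
void cmatmpy_model(const cplx16 v[2], const cplx16 m[2][2],
                   cplx_wide out[2], int *mults)
{
    for (int j = 0; j < 2; ++j) {
        out[j].re = 0;
        out[j].im = 0;
        for (int k = 0; k < 2; ++k) {
            /* One complex multiply-accumulate: four real multiplies. */
            out[j].re += (int64_t)v[k].re * m[k][j].re
                       - (int64_t)v[k].im * m[k][j].im;
            out[j].im += (int64_t)v[k].re * m[k][j].im
                       + (int64_t)v[k].im * m[k][j].re;
            *mults += 4;
        }
    }
}
```

One call counts 16 real multiplies. If the same operation were issued on both .M units in the same cycle, that would account for the 32 multiplies per cycle quoted from the Instruction Set Reference.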