According to spru430d.pdf (TMS320C28x DSP CPU and Instruction Set Reference Guide), QMACL used with RPT takes N+2 cycles. In practice, repeating QMACL 512 times takes 10 µs on a C2808 CPU running at 100 MHz, with both 'loc32' and XAR7 pointing to zero-wait-state RAM. Is that expected?
Make sure the operands pointed to by 'loc32' and XAR7 are in different SARAM blocks. Each physical block of memory is single-access, so if both operands are in the same block there will be pipeline stalls because they cannot be read at the same time. If the operands are in different memory blocks, both can be read in the same cycle.
Lori
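For what it's worth, here is a rough sketch of the arithmetic behind the numbers in this thread. It is only a model under stated assumptions: N is taken to be the number of repeats (512), and the same-block case is assumed to cost one extra stall cycle per repeat. That stall figure and the function names are illustrative and do not come from the reference guide.

```c
#include <stdint.h>

/* CPU clock from the question: 100 MHz, i.e. 10 ns per cycle. */
#define CPU_CLOCK_HZ 100000000UL

/* Figure quoted from the reference guide: N + 2 cycles for RPT + QMACL.
   Assumption: N is the number of times QMACL is repeated. */
uint32_t qmacl_cycles_ideal(uint32_t n)
{
    return n + 2u;
}

/* Hypothetical model: one extra stall cycle per repeat when both operands
   live in the same single-access block and are read one after the other. */
uint32_t qmacl_cycles_same_block(uint32_t n)
{
    return 2u * n + 2u;
}

/* Convert a cycle count to nanoseconds at CPU_CLOCK_HZ. */
uint32_t cycles_to_ns(uint32_t cycles)
{
    return (uint32_t)(((uint64_t)cycles * 1000000000ULL) / CPU_CLOCK_HZ);
}
```

With 512 repeats this gives 514 cycles (5.14 µs) for operands in different blocks and 1026 cycles (10.26 µs) under the one-stall-per-repeat assumption, which is in line with the roughly 10 µs that was measured.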
Lori:
Thank you! The problem is solved with 'loc32' and XAR7 pointed to different SARAM blocks.
Just to add a comment to Lori's reply:
1) Per the QMACL instruction description, please also make sure that 'loc32' points to an operand in data memory and XAR7 points to one in code memory. The 28x has two independent bus systems (code and data), and QMACL accesses both spaces simultaneously, provided you use the correct pointer setup.
2) Please make sure that your operands are aligned to even addresses. QMACL reads 32 bits from code memory and 32 bits from data memory.
Regards
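A minimal C sketch of the two points above, assuming the TI C28x compiler and its DATA_SECTION pragma. The buffer names, the length, and the section names "qmacl_data" and "qmacl_coef" are made up for illustration; the linker command file (not shown) would still have to map the two sections to different physical SARAM blocks that are reachable through the data bus and the program bus respectively.

```c
#include <stdint.h>

#define TAPS 512  /* hypothetical length, matching the repeat count in the question */

/* TI compiler only: place each buffer in its own named section.
   The section names are assumptions, not names from any TI example. */
#ifdef __TMS320C28XX__
#pragma DATA_SECTION(samples, "qmacl_data")
#pragma DATA_SECTION(coeffs, "qmacl_coef")
#endif
int32_t samples[TAPS];  /* operand addressed through loc32 */
int32_t coeffs[TAPS];   /* operand addressed through XAR7 */

/* C28x addresses count 16-bit words, so a 32-bit operand is aligned
   when its word address is even. Returns 1 if aligned, 0 otherwise. */
int is_even_word_address(uint32_t word_addr)
{
    return (word_addr & 1u) == 0u;
}
```

The helper takes a word address as reported in the linker map file, so the placement of each buffer can be checked after the build, for example is_even_word_address(0x8000) for a buffer that the map file shows at 0x8000.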