According to spru430d.pdf (TMS320C28x DSP CPU and Instruction Set Reference Guide), QMACL used with RPT takes N+2 cycles. In practice, however, I measure an execution time of about 10 us when QMACL is repeated 512 times, with the C2808 CPU running at 100 MHz and both the loc32 operand and XAR7 pointing to RAM with zero wait states. Is the documented cycle count correct?
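
For reference, here is the arithmetic behind the question as a small Python sketch. It assumes N = 512 in the N+2 formula and a 100 MHz clock (10 ns per cycle); the function names are made up for illustration and are not from any TI tool. Under those assumptions the documented figure works out to 514 cycles, or roughly 5.14 us, while the measured 10 us corresponds to about 1000 cycles.

```python
CPU_HZ = 100_000_000  # C2808 clock from the post: 100 MHz


def documented_cycles(n):
    """Cycle count quoted from the reference guide for RPT + QMACL: N + 2."""
    return n + 2


def cycles_to_us(cycles, cpu_hz=CPU_HZ):
    """Convert a cycle count to microseconds at the given clock rate."""
    return cycles * 1e6 / cpu_hz


def us_to_cycles(us, cpu_hz=CPU_HZ):
    """Convert a measured time in microseconds back to CPU cycles."""
    return us * cpu_hz / 1e6


if __name__ == "__main__":
    n = 512              # assumption: N in the formula equals the 512 repeats
    measured_us = 10.0   # measured execution time from the post

    expected = documented_cycles(n)
    print(f"documented: {expected} cycles = {cycles_to_us(expected):.2f} us")
    print(f"measured:   {us_to_cycles(measured_us):.0f} cycles = {measured_us:.2f} us")
    print(f"ratio measured/documented: {us_to_cycles(measured_us) / expected:.2f}")
```

The measured time is close to two cycles per repeat rather than one, which is the gap I am asking about.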