Dear community,
How can I improve the performance of a loop like this?
for(i=0;i<XMIT_BUFF_SIZE;i++)
{
X[i] = A[i] + B[i] + C[i] + D[i];
}
Hi,
You could write it in assembly. The C5515 can perform a dual MAC operation in a single cycle. See http://www.ti.com/lit/ug/swpu068e/swpu068e.pdf
Regards,
Hyun
You could also look into calling the optimized vector addition routine from DSPLIB several times: http://focus.ti.com/docs/toolsw/folders/print/sprc100.html
ushort oflag = add (DATA *x, DATA *y, DATA *r, ushort nx, ushort scale)
This function adds two vectors, element by element.
Also, when you download DSPLIB, the fully optimized assembly source code is included. You could start with the vector add routine and modify it to suit.
Hope this helps,
Mark
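For reference, the idea of chaining a two-input vector add can be sketched in portable C. The helper `vec_add2` below is a hypothetical stand-in that only mirrors the pointer-and-length shape of the prototype quoted above; it is not the DSPLIB routine, it has no scaling or overflow flag, and it assumes every intermediate sum fits in 16 bits. The name `vec_add4_chained` is likewise made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a two-input vector add (NOT the DSPLIB add).
 * Assumes each sum fits in 16 bits; no saturation, no overflow flag. */
static void vec_add2(const int16_t *x, const int16_t *y, int16_t *r, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int32_t sum = (int32_t)x[i] + (int32_t)y[i];
        r[i] = (int16_t)sum;
    }
}

/* X = A + B + C + D built from three two-input passes.
 * The output buffer is reused in place, so no scratch buffer is needed. */
static void vec_add4_chained(const int16_t *a, const int16_t *b,
                             const int16_t *c, const int16_t *d,
                             int16_t *x, size_t n)
{
    vec_add2(a, b, x, n);   /* x = a + b         */
    vec_add2(x, c, x, n);   /* x = (a + b) + c   */
    vec_add2(x, d, x, n);   /* x = (a + b + c) + d */
}
```

Each pass walks the whole buffer again, so this trades extra memory traffic for reuse of an existing routine; whether that is faster than the original loop would need to be measured on the target.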
Hi Mark,
Thanks for your advice.
I modified add.asm from DSPLIB as follows:
; ---------------------------------------------------------------
; Start of outer loop
; for (i=0; i<nx; i++)
; R(i) = X(i) + Y(i) + A(i) + B(i);
; ---------------------------------------------------------------
RPTBLOCAL loop ;start the outer loop
ADD *AR0+, *AR1+, AC0 ;vector add of two inputs
MOV HI(AC0), AR5
ADD *AR3+, *AR4+, AC0 ;vector add of two inputs
MOV HI(AC0), AR6
ADD *AR5, *AR6, AC2 ;vector add of two inputs
; ---------------------------------------------------------------
; To implement scaling:
; if(scale = #1) then AC2=AC2/2;
; otherwise *r_ptr+ = AC2
; ---------------------------------------------------------------
XCC loop, T1!=#0 ;testing for scaling
||SFTA AC2, -1 ;if scale=1, AC2=AC2/2
loop: MOV HI(AC2), *AR2+ ;end of outer loop
Processing time has decreased by a factor of four.
I pointed auxiliary registers AR3 and AR4 at the two new data vectors and performed the additions pair by pair.
In this code I use auxiliary registers AR5 and AR6 to store the temporary results of the additions. According to swpu068e, there are only 8 such registers.
Could I store the temporary results somewhere else without reducing performance?
As the number of input vectors grows, I will need more auxiliary registers to point to them.
Thanks!
Max
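To make the intended computation concrete, here is a minimal C sketch of the loop described in the comment header above, R(i) = X(i) + Y(i) + A(i) + B(i) with optional halving. It keeps the running sum in a single accumulator variable instead of separate temporaries. This is a hypothetical reference model meant for checking results on a host PC, not DSPLIB code: the name `add4_ref`, the clamping to the 16-bit range, and the returned flag are all assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference model (NOT DSPLIB code).
 * r[i] = x[i] + y[i] + a[i] + b[i], halved when scale != 0.
 * Assumed behavior: results are clamped to the 16-bit range and the
 * function returns 1 if any element had to be clamped, else 0. */
static int add4_ref(const int16_t *x, const int16_t *y,
                    const int16_t *a, const int16_t *b,
                    int16_t *r, size_t nx, int scale)
{
    int overflow = 0;
    for (size_t i = 0; i < nx; i++) {
        /* One 32-bit accumulator holds every intermediate sum. */
        int32_t acc = (int32_t)x[i] + y[i];
        acc += a[i];
        acc += b[i];
        if (scale) {
            /* Halve, rounding toward negative infinity. */
            acc = (acc - (acc < 0)) / 2;
        }
        if (acc > INT16_MAX) {
            acc = INT16_MAX;
            overflow = 1;
        } else if (acc < INT16_MIN) {
            acc = INT16_MIN;
            overflow = 1;
        }
        r[i] = (int16_t)acc;
    }
    return overflow;
}
```

Comparing the assembly routine's output buffer against this model for a few test vectors is one way to confirm that a modified loop still computes the four-input sum.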