code performance optimization

Max Zhegulin

Intellectual 420 points

Dear community,

How can I improve the performance of a loop like this?

for (i = 0; i < XMIT_BUFF_SIZE; i++)
{
    X[i] = A[i] + B[i] + C[i] + D[i];
}

The IDE version is 4.1.0.02002.

Thanks!

Max

over 14 years ago

0 Hyun Kim over 14 years ago

TI__Genius 14965 points

Hi,

What device are you using?

Regards,

Hyun

0 Max Zhegulin over 14 years ago in reply to Hyun Kim

Intellectual 420 points

Hi Hyun,

C5515

0 Hyun Kim over 14 years ago in reply to Max Zhegulin

TI__Genius 14965 points

Hi,

You could write the loop in assembly. The C5515 can perform a dual MAC operation in one cycle. You can refer to http://www.ti.com/lit/ug/swpu068e/swpu068e.pdf

Regards,

Hyun

0 Mark M over 14 years ago in reply to Hyun Kim

TI__Mastermind 30470 points

You could also look into calling the optimized vector addition from DSPLIB several times: http://focus.ti.com/docs/toolsw/folders/print/sprc100.html

ushort oflag = add (DATA *x, DATA *y, DATA *r, ushort nx, ushort scale)

This function adds two vectors, element by element.

Also, when you download DSPLIB, the fully optimized assembly source code is provided. You could start with the vector add routine and modify it to suit your needs.
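As an illustration of the "multiple calls" idea, here is a minimal host-side sketch in C. The helper `vec_add2` is a hypothetical stand-in for a two-input vector add; it is not the DSPLIB `add` routine, returns no overflow flag, and assumes every sum fits in 16 bits. The names `vec_add4_by_pairs` and `tmp` are also assumptions made for the example.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-in for a two-input vector add (NOT the DSPLIB routine).
 * Assumes each sum fits in 16 bits; no saturation or overflow flag. */
static void vec_add2(const int16_t *x, const int16_t *y, int16_t *r, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        r[i] = (int16_t)(x[i] + y[i]);
    }
}

/* X = (A + B) + (C + D), built from three two-input passes.
 * tmp is caller-provided scratch space holding n elements. */
static void vec_add4_by_pairs(const int16_t *a, const int16_t *b,
                              const int16_t *c, const int16_t *d,
                              int16_t *x, int16_t *tmp, size_t n)
{
    vec_add2(a, b, x, n);     /* x   = a + b            */
    vec_add2(c, d, tmp, n);   /* tmp = c + d            */
    vec_add2(x, tmp, x, n);   /* x   = x + tmp, in place */
}
```

Three passes cost extra memory traffic compared with a single fused loop, so this is mainly a quick way to get a correct baseline before modifying the assembly source.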

Hope this helps,
Mark

0 Max Zhegulin over 14 years ago in reply to Mark M

Intellectual 420 points

Hi Mark,

Thanks for your advice.

I modified add.asm from DSPLIB as follows:

; ---------------------------------------------------------------
; Start of outer loop
; for (i=0; i<nx; i++)
;     R(i) = X(i) + Y(i) + A(i) + B(i);
; ---------------------------------------------------------------
        RPTBLOCAL loop              ;start the outer loop
        ADD *AR0+, *AR1+, AC0       ;vector add of two inputs
        MOV HI(AC0), AR5
        ADD *AR3+, *AR4+, AC0       ;vector add of two inputs
        MOV HI(AC0), AR6
        ADD *AR5, *AR6, AC2         ;vector add of two inputs
; ---------------------------------------------------------------
; To implement scaling:
; if(scale = #1) then AC2=AC2/2;
; otherwise *r_ptr+ = AC2
; ---------------------------------------------------------------
        XCC loop, T1!=#0            ;testing for scaling
        ||SFTA AC2, -1              ;if scale=1, AC2=AC2/2
loop:   MOV HI(AC2), *AR2+          ;end of outer loop

Processing time has decreased by a factor of four.

I pointed auxiliary registers AR3 and AR4 at the two new data vectors and performed the addition pair by pair.
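For checking the assembly output on a host machine, the pair-by-pair addition with optional scaling can be modelled in plain C. This is only a sketch of the intended arithmetic: the function name `add4_ref` is hypothetical, the model uses 32-bit intermediates, and it does not reproduce the DSP's saturation modes or any truncation of the intermediate sums.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical reference model: r[i] = (x[i] + y[i]) + (a[i] + b[i]),
 * halved (rounding toward minus infinity, like an arithmetic shift)
 * when scale is non-zero. Assumes the final result fits in 16 bits. */
static void add4_ref(const int16_t *x, const int16_t *y,
                     const int16_t *a, const int16_t *b,
                     int16_t *r, size_t n, int scale)
{
    for (size_t i = 0; i < n; i++) {
        int32_t xy  = (int32_t)x[i] + y[i];   /* first pair  */
        int32_t ab  = (int32_t)a[i] + b[i];   /* second pair */
        int32_t sum = xy + ab;                /* combine the pairs */

        if (scale) {
            /* Portable floor division by 2 for negative and positive sums. */
            sum = (sum - (sum < 0 ? 1 : 0)) / 2;
        }
        r[i] = (int16_t)sum;
    }
}
```

Running the same input vectors through this model and through the assembly routine gives a simple way to compare results element by element.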

In this code I use auxiliary registers AR5 and AR6 to store the temporary results of the additions. According to swpu068e, only 8 such registers are available.

Could I store the temporary results somewhere else without reducing performance?

As the number of input vectors grows, I will need more auxiliary registers to point to them.

Thanks!

Max
