Hi TIers,
I am trying to optimize an algorithm in CCS 6.1.0.00104.
The C function is:
static void demo(float * restrict a) {
    float _buf[128];
    float * restrict _fptr = _buf;
    unsigned char i, k, j;
    _nassert((int)(_fptr) % 8 == 0);
    _nassert((int)(a) % 8 == 0);
    //#pragma MUST_ITERATE(64, ,64)
    for (i = 0; i < 64; i++)
    {
        _fptr[i] = a[i];
        i++;
        _fptr[i] = a[i];
    }
    #pragma MUST_ITERATE(128, 128)
    for (j = 0; j < 128; j++)
    { //Line 154
        k = bitrv_LUT[j];
        a[j] = _fptr[k];
    } //Line 157
    return;
}
The function already contains MUST_ITERATE and _nassert. However, the compiler still advises that the loop would be better with MUST_ITERATE and _nassert added.
The corresponding asm file is:
;******************************************************************************
;* TMS320C6x C/C++ Codegen PC v7.4.12 *
;* Date/Time created: Wed Mar 11 13:32:43 2015 *
;******************************************************************************
.compiler_opts --abi=coffabi --c64p_l1d_workaround=off --endian=little --hll_source=on --long_precision_bits=40 --mem_model:code=near --mem_model:const=data --mem_model:data=far_aggregates --object_format=coff --silicon_version=6740 --symdebug:dwarf
;******************************************************************************
;* GLOBAL FILE PARAMETERS *
;* *
;* Architecture : TMS320C674x *
;* Optimization : Enabled at level 3 *
;* Optimizing for : Speed *
;* Based on options: -o3, no -ms *
;* Endian : Little *
;* Interrupt Thrshld : Disabled *
;* Data Access Model : Far Aggregate Data *
;* Pipelining : Enabled *
;* Speculate Loads : Enabled with threshold = 9 *
;* Memory Aliases : Presume are aliases (pessimistic) *
;* Debug Info : DWARF Debug *
;* *
;******************************************************************************
The following information is for the second loop (lines 154 to 157):
;*----------------------------------------------------------------------------*
;* SOFTWARE PIPELINE INFORMATION
;*
;* Loop found in file : ../hello.c
;* Loop source line : 153
;* Loop opening brace source line : 154
;* Loop closing brace source line : 157
;* Loop Unroll Multiple : 2x
;* Known Minimum Trip Count : 64
;* Known Maximum Trip Count : 64
;* Known Max Trip Count Factor : 64
;* Loop Carried Dependency Bound(^) : 0
;* Unpartitioned Resource Bound : 3
;* Partitioned Resource Bound(*) : 3
;* Resource Partition:
;* A-side B-side
;* .L units 0 0
;* .S units 0 0
;* .D units 3* 2
;* .M units 0 0
;* .X cross paths 1 0
;* .T address paths 3* 3*
;* Long read paths 0 0
;* Long write paths 0 0
;* Logical ops (.LS) 0 0 (.L or .S unit)
;* Addition ops (.LSD) 1 0 (.L or .S or .D unit)
;* Bound(.L .S .LS) 0 0
;* Bound(.L .S .D .LS .LSD) 2 1
;*
;* Searching for software pipeline schedule at ...
;* ii = 3 Schedule found with 7 iterations in parallel
;*
;* Register Usage Table:
;* +-----------------------------------------------------------------+
;* |AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA|BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB|
;* |00000000001111111111222222222233|00000000001111111111222222222233|
;* |01234567890123456789012345678901|01234567890123456789012345678901|
;* |--------------------------------+--------------------------------|
;* 0: | ** ** | ***** |
;* 1: | * ** | ***** |
;* 2: | ***** | **** |
;* +-----------------------------------------------------------------+
;*
;* Done
;*
;* Loop will be splooped
;* Collapsed epilog stages : 0
;* Collapsed prolog stages : 0
;* Minimum required memory pad : 0 bytes
;*
;* Minimum safe trip count : 1 (after unrolling)
;* Min. prof. trip count (est.) : 3 (after unrolling)
;*
;* Mem bank conflicts/iter(est.) : { min 0.000, est 0.250, max 2.000 }
;* Mem bank perf. penalty (est.) : 7.7%
;*
;* Effective ii : { min 3.00, est 3.25, max 5.00 }
;*
;*
;* Total cycles (est.) : 18 + min_trip_cnt * 3 = 210
;*----------------------------------------------------------------------------*
;* SETUP CODE
;*
;* MV A6,B7
;* ADD 1,A6,A6
;* MV A3,B6
;*
;* SINGLE SCHEDULED ITERATION
;*
;* $C$C36:
;* 0 LDBU .D2T2 *B7++(2),B8 ; |156|
;* 1 NOP 6
;* 7 LDW .D2T2 *+B6[B8],B4 ; |156|
;* 8 NOP 2
;* 10 LDBU .D1T1 *A6++(2),A4 ; |156|
;* 11 NOP 2
;* 13 MVD .M2 B4,B5 ; |156| Split a long life
;* 14 NOP 1
;* 15 LDW .D1T1 *+A3[A4],A5 ; |156|
;* 16 NOP 3
;* 19 MV .L1X B5,A4 ; |156| Define a twin register
;* 20 STNDW .D1T1 A5:A4,*A7++(8) ; |156|
;* || SPBR $C$C36
;* 21 ; BRANCHCC OCCURS {$C$C36} ; |153|
;*
;* If you know that this loop will always execute at a multiple of <128> and at least <128> times, try adding "#pragma MUST_ITERATE(128, ,128)" just before the loop.
;*
;* Consider adding assertions to indicate n-byte alignment of variables a if they are actually n-byte aligned: _nassert((int)(a) % == 0).
;*----------------------------------------------------------------------------*
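For comparison, here is a minimal sketch of the form the advice above seems to ask for. The pragma in my function has two arguments, which TI documents as (min, max), while the advice leaves max empty and passes 128 as the third argument (multiple). The sketch uses the full three-argument form. I have not confirmed that this removes the advice. Everything named with `_sketch`, plus `bitrev7`, is hypothetical: the real `bitrv_LUT` is not shown here, so a 7-bit bit-reversal table stands in for it, and the first loop is simplified to copy all 128 elements. The `_nassert` fallback and the `#ifdef` guards are only there so the sketch also builds on a host compiler.

```c
#include <stddef.h>

/* On TI toolchains _nassert is a compiler intrinsic. This no-op fallback
   is an assumption that lets the sketch build with other compilers. */
#ifndef __TI_COMPILER_VERSION__
#define _nassert(cond) ((void)0)
#endif

#define N_SKETCH 128

/* Hypothetical stand-in for bitrv_LUT: reverse the low 7 bits of x. */
static unsigned char bitrev7(unsigned char x)
{
    unsigned char r = 0;
    int b;
    for (b = 0; b < 7; b++) {
        r = (unsigned char)((r << 1) | ((x >> b) & 1u));
    }
    return r;
}

static unsigned char lut_sketch[N_SKETCH];

/* Fill the hypothetical table once before calling permute_sketch(). */
static void init_lut_sketch(void)
{
    int j;
    for (j = 0; j < N_SKETCH; j++) {
        lut_sketch[j] = bitrev7((unsigned char)j);
    }
}

/* Copy a[] into a scratch buffer, then write it back in table order. */
static void permute_sketch(float * restrict a)
{
    float buf[N_SKETCH];
    int j;

    /* Alignment hint for the TI compiler; a no-op elsewhere. */
    _nassert((int)(a) % 8 == 0);

#ifdef __TI_COMPILER_VERSION__
    #pragma MUST_ITERATE(128, 128, 128) /* min, max, multiple */
#endif
    for (j = 0; j < N_SKETCH; j++) {
        buf[j] = a[j];
    }

#ifdef __TI_COMPILER_VERSION__
    #pragma MUST_ITERATE(128, 128, 128) /* min, max, multiple */
#endif
    for (j = 0; j < N_SKETCH; j++) {
        a[j] = buf[lut_sketch[j]];
    }
}
```

Call `init_lut_sketch()` once, then `permute_sketch(a)` on a 128-element float array. Using `int` loop counters instead of `unsigned char` is a choice made for the sketch, not something the compiler asked for.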
Looking forward to any reply.
Best regards!