I have a number of questions about getting the best memory and CPU cycle efficiency from the C2000 family. I have an application where I would like to use integer math for FIR filtering of stream data (to save space) and floating point for more complex math or smaller blocks (to save CPU cycles).
Questions:
1. Can I get integer MAC loops to run as fast as floating point MAC loops on the C28346? See the detailed code example below.
2. Are there C2000 family processors that can do non-trivial integer math as fast as the FPU processors do floating point?
3. Are there any tips for getting the C/C++ compiler to produce the most efficient assembly code? I found that writing the same C loop in different ways can produce very different compiler output, with the most efficient result for an FIR algorithm being an RPT / MAC sequence.
4. Is there a CPU-cycle-efficient way to do 16 * 16 MACs without any loss of precision? For 256 such multiplies, an accumulator size of 40 bits would be required.
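To make the precision requirement in question 4 concrete: the largest 16 * 16 signed product is (-32768) * (-32768) = 2^30, and 256 of those sum to 2^38, which needs a 40-bit signed accumulator. The following is only a portable C sketch of that arithmetic, not a cycle-efficient C28x implementation. The function name `mac16_exact` is made up for illustration, and the 64-bit accumulator is simply the nearest standard C type wider than 40 bits.

```c
#include <stdint.h>
#include <stddef.h>

/* Exact dot product of two 16-bit arrays (hypothetical helper).
 * Each 16 x 16 product fits in 32 bits; the running sum is kept in
 * 64 bits so that 256 worst-case products (2^38 in total) cannot
 * overflow. */
int64_t mac16_exact(const int16_t *x, const int16_t *h, size_t n)
{
    int64_t acc = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        /* Widen one operand first so the multiply is done in 32 bits. */
        int32_t prod = (int32_t)x[i] * h[i];
        acc += prod;
    }
    return acc;
}
```

With 256 inputs that are all -32768, this returns 274877906944 (2^38), which does not fit in 32 bits.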
Benchmarking the simple FIR code below, which uses the preprocessor-defined value of "TYPE" for the calculation, I found the int implementation much less cycle efficient than the float implementation. More specifically, I get the following cycle counts from the simulator between the xx = 0 lines before and after the FIR loop:
float: 149 cycles.
int: 905 cycles (about 6 X the float cycles)
The float case used an "RPT" followed by a "MACF32" for the loop. However, the int case uses an actual loop.
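One variant worth trying for the integer case is sketched below. It has not been benchmarked here. On the C28x, int is 16 bits and long is 32 bits, and the int listing accumulates only the low 16 bits of each product (ADD AL, @PL). The sketch keeps 16-bit data but widens both operands and accumulates in 32 bits. Whether the compiler then emits an RPT / MAC sequence for it is an assumption that would have to be checked in the generated assembly. `FIRPassInt32` and `NUM_TAPS` are names introduced only for this illustration.

```c
#include <stdint.h>

#define NUM_TAPS 128  /* same loop count as the benchmark (hypothetical name) */

/* Integer FIR pass with 16-bit data and a 32-bit accumulator.
 * Both operands are widened before the multiply so the full
 * 32-bit product is added to the sum. */
int32_t FIRPassInt32(const int16_t *input, const int16_t *filter)
{
    int32_t acc = 0;
    int index;

    for (index = 0; index < NUM_TAPS; index++) {
        acc += (int32_t)input[index] * (int32_t)filter[index];
    }
    return acc;
}
```

With both arrays holding the pattern 0..31 repeated four times, the result is 41664, which already exceeds the range of a 16-bit signed accumulator.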
---------------------------------------------------------------------------
#define TYPE float
TYPE signal[256];
TYPE filter[32];
volatile int xx;
volatile TYPE result;
TYPE FIRPass(TYPE *input)
{
    static TYPE filter[128] = {0,1,2,3,4,5,6,7,8,9,10,11,12,
                               13,14,15,16,17,18,19,20,21,22,
                               23,24,25,26,27,28,29,30,31,
                               0,1,2,3,4,5,6,7,8,9,10,11,12,
                               13,14,15,16,17,18,19,20,21,22,
                               23,24,25,26,27,28,29,30,31,
                               0,1,2,3,4,5,6,7,8,9,10,11,12,
                               13,14,15,16,17,18,19,20,21,22,
                               23,24,25,26,27,28,29,30,31,
                               0,1,2,3,4,5,6,7,8,9,10,11,12,
                               13,14,15,16,17,18,19,20,21,22,
                               23,24,25,26,27,28,29,30,31};
    TYPE retVal = 0.0;
    int index;
    xx = 0;
    for(index = 0; index < sizeof(filter)/sizeof(filter[0]); index++)
    {
        retVal += *(input + index) * *(filter + index);
    }
    xx = 0;
    return retVal;
}
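int main(void)
{
    /* Sized to match the 128-iteration loop in FIRPass so it does not read
       past the end of the array; elements after the initializer are zero. */
    static TYPE input[128] = {0,1,2,3,4,5,6,7,8,9,10,11,12,
                              13,14,15,16,17,18,19,20,21,22,
                              23,24,25,26,27,28,29,30,31};
    result = FIRPass(input);
    return 0;
}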
------------------------------------------------
TYPE == float SIMULATOR CPU CYCLES = 149.
27 xx = 0;
0x00A04A: 8F40C0C0 MOVL XAR5, #0x00c0c0
0x00A04C: 8F00C040 MOVL XAR4, #0x00c040
0x00A04E: 761F0300 MOVW DP, #0x300
0x00A050: 2B00 MOV @0x0, #0
30 retVal += *(input + index) * *(filter + index);
0x00A051: E592 ZERO R2
0x00A052: E593 ZERO R3
0x00A053: E596 ZERO R6
0x00A054: E597 ZERO R7
0x00A055: C5A4 MOVL XAR7, @XAR4
0x00A056: F67F || RPT #127
0x00A057: E2501F85 MACF32 R7H, R3H, *XAR5++, *XAR7++
0x00A059: E71001BF ADDF32 R7H, R7H, R6H
0x00A05B: E710009B ADDF32 R3H, R3H, R2H
0x00A05D: 7700 NOP
0x00A05E: E71001D8 ADDF32 R0H, R3H, R7H
32 xx = 0;
------------------------------------------------------
TYPE == int SIMULATOR CPU CYCLES = 905
27 xx = 0;
main:
0x00A078: 8F00C080 MOVL XAR4, #0x00c080
0x00A07A: 8F40C040 MOVL XAR5, #0x00c040
0x00A07C: 761F0300 MOVW DP, #0x300
0x00A07E: 2B00 MOV @0x0, #0
25 TYPE retVal = 0.0;
0x00A07F: BE7F MOVB XAR6, #0x7f
0x00A080: 9A00 MOVB AL, #0x0
30 retVal += *(input + index) * *(filter + index);
0x00A081: 2D84 MOV T, *XAR4++
0x00A082: 3385 MPY P, T, *XAR5++
0x00A083: 94AB ADD AL, @PL
28 for(index = 0; index < sizeof(filter)/sizeof(filter[0]); index++)
0x00A084: 000EFFFD BANZ -3,AR6--
32 xx = 0;