Hi,
How can we write C code that compiles to assembly using MAC instructions? Does the compiler support generating MAC instructions?
The following code doesn't seem to use MAC in the disassembly view:
float checkarr[] = {0.3245023, 0.00423456, 0.500034, 0.02542};
float adcarr[] = {0.11123, 0.56784, 0.7832, 0.001237};
float *ch, *ad;
int i = 0;
ch = (float *)checkarr;
ad = (float *)adcarr;
float sum = 0; /* initialized so the accumulation starts from a defined value */
for (i = 0; i < 4; i++)
{
    sum += *ch++ * *ad++;
}
Ajay,
Make sure you have the unified memory model checked for the compiler (-mt). Also, you have to use the optimizer. I don't recall which level the MACF32 is generated at. As a starting point, I wrote a quick example case for you. I'm sure there are other constructs you can use; for example, I am using indexed arrays where you were using pointers, and you should be able to do it either way.
I am using c2000 cgtools v6.2.9, level -o3 optimization, -mt memory model. Actually, using CCSv6, I just created a new F28377D project. All the project settings are default except the optimization level, which I had to select manually. Note that the volatile declarations for x, y, and z are just to keep the optimizer from making them disappear in this simple example.
volatile float x[100], y[100];
volatile float z;
int main(void) {
    int i;
    float sum;
    sum = 0;
    for(i=0; i<100; i++)
    {
        sum = sum + x[i]*y[i];
    }
    z = sum;
    return 0;
}
generates this ASM code for the meat:
ZERO R7H ; [CPU_] |11|
RPT #99
|| MACF32 R7H,R3H,*XAR4++,*XAR7++ ; [CPU_] |11|
ADDF32 R3H,R3H,R2H ; [CPU_] |11|
ADDF32 R2H,R7H,R6H ; [CPU_] |11|
MOVW DP,#_z ; [CPU_U]
.dwpsn file "../main.c",line 14,column 2,is_stmt
ADDF32 R3H,R3H,R2H ; [CPU_] |14|
MOV32 R7H,*--SP ; [CPU_]
MOV32 @_z,R3H ; [CPU_] |14|
MOV32 R6H,*--SP ; [CPU_]
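For comparison with the pointer style in the original question, here is a minimal sketch of the same multiply-accumulate loop written with post-incremented pointers. The function name `dot_product` and its parameters are made up for illustration. I have not verified that this form produces the RPT || MACF32 sequence; that is likely to depend on the compiler version and options, so check the disassembly.

```c
/* Hypothetical helper: multiply-accumulate over two float arrays
 * using post-incremented pointers instead of array indexing. */
float dot_product(const float *a, const float *b, int n)
{
    float sum = 0.0f;   /* the accumulator must start from zero */
    int i;

    for (i = 0; i < n; i++) {
        /* multiply the current pair, then advance both pointers */
        sum += (*a++) * (*b++);
    }
    return sum;
}
```

Calling it as `dot_product(checkarr, adcarr, 4)` would correspond to the loop in the original post.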
Regards,
David
Thank you, David. I checked your code by creating a different project, and it is using MAC.
I was planning to use the F2837xD in one of our DSP-based applications, in which we need to implement 15 second-order IIR filters plus 10 extra multiplications and 10 extra additions (an accumulating sum). We need to complete these operations within one sampling period, which can range from 1 µs down to 625 ns.
I tried this on the controller but was only able to complete 4 IIR filters with 4 extra multiplications and 4 additions, and I couldn't see any MAC instructions in the disassembly view.
Can we meet this requirement if we go on to optimize the code? I tried your suggestions, but I'm not getting MAC instructions for my C code.
We selected this controller because we also need USB and SD interfaces. Can we meet our DSP requirement with this controller, or could you suggest an alternative?
Ajay,
Ajay N Namboothiri said:
I was planning to use the F2837xD in one of our DSP-based applications, in which we need to implement 15 second-order IIR filters plus 10 extra multiplications and 10 extra additions (an accumulating sum). We need to complete these operations within one sampling period, which can range from 1 µs down to 625 ns.
Well, things are a little tight. 625 ns is 125 cycles @ 200 MHz. Each IIR has perhaps 5 MACs in it plus data movement for the history. So, 15*5 = 75 MACs. You've also got 20 more operations (10 mpy, 10 add). In total, you're pushing 95 math operations, plus you've still got pointer setup and other overhead. You've only got 125 cycles to implement this. So, things are very tight.
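The rough budget above can be written out as a small calculation. This is only a sketch of the arithmetic; the function names are made up for illustration, and the figure of 5 MACs per second-order section is the estimate from this post, not a measured value.

```c
/* CPU cycles available in one sample period.
 * period_ns: sample period in nanoseconds; cpu_mhz: CPU clock in MHz. */
long cycles_per_period(long period_ns, long cpu_mhz)
{
    /* ns * MHz = cycles * 1000, so divide by 1000 */
    return (period_ns * cpu_mhz) / 1000L;
}

/* Estimated math operations per sample: MACs for the IIR sections
 * plus the extra multiplications and additions. */
long math_ops(long num_biquads, long macs_per_biquad,
              long extra_mults, long extra_adds)
{
    return num_biquads * macs_per_biquad + extra_mults + extra_adds;
}
```

With a 625 ns period at 200 MHz this gives 125 cycles against 95 math operations, leaving about 30 cycles for pointer setup, history moves, and other overhead. At a 1 µs period the budget is 200 cycles.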
Now this is just a rough estimate. I don't know if you can parallelize any of your operations. If so, you've got the CLA at your disposal as well. That would double your available MIPS here.
I think you'll need to go to hand assembly for this however. You need to squeeze that last bit out of the code. The compiler is not going to do that for you.
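As a C baseline to measure before going to assembly, here is a minimal sketch of one second-order IIR section, assuming a Direct Form I structure (I don't know which form you are actually using). It shows where the 5 multiply-accumulates and the history moves per section come from. The type and function names are made up for illustration.

```c
/* Coefficients and history for one second-order section (hypothetical type). */
typedef struct {
    float b0, b1, b2;   /* feed-forward coefficients */
    float a1, a2;       /* feedback coefficients, with a0 normalized to 1 */
    float x_1, x_2;     /* previous two inputs */
    float y_1, y_2;     /* previous two outputs */
} biquad_t;

/* Process one sample through one section: 5 multiplies, then history update. */
float biquad_step(biquad_t *s, float x)
{
    float y = s->b0 * x + s->b1 * s->x_1 + s->b2 * s->x_2
            - s->a1 * s->y_1 - s->a2 * s->y_2;

    /* shift the input and output history by one sample */
    s->x_2 = s->x_1;
    s->x_1 = x;
    s->y_2 = s->y_1;
    s->y_1 = y;
    return y;
}

/* Run one sample through n sections in series. */
float cascade_step(biquad_t *sections, int n, float x)
{
    int i;
    for (i = 0; i < n; i++) {
        x = biquad_step(&sections[i], x);
    }
    return x;
}
```

Whether your 15 filters run in series like `cascade_step` or independently affects how much of the work could be split off to the CLA.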
Regards,
David
Thanks, David, for your detailed explanation!
Maybe we can meet our requirement with this, but going to assembly would be somewhat time-consuming. Can you suggest an alternative solution that we could work on in parallel using C itself? I think it would be good to also evaluate a processor/controller with more bandwidth for our requirement. Please suggest.