I wrote a linear assembly function as follows:
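        .def    _test
_test:  .cproc  a_0             ; a_0: pointer argument (arrives in A4)
        .reg    val_a0          ; virtual register for the loaded word
        LDW     *a_0++, val_a0  ; load the 32 bits at *a_0 (no type conversion)
        .return val_a0          ; hand the raw 32 bits back to the caller
        .endproc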
Then, in another file in the project, I call this function:
#include <stdio.h>

void main()
{
    int a = 1;
    float b = 1.0;
    int temp1 = 0;
    int temp2 = 0.0;
    temp1 = test(&a);
    printf("%d\n", temp1);
    temp2 = test(&b);
    printf("%f\n", temp2);
}
I ran the code above on an EVM6678, and the following output appeared in the console window:
1
1065353216.00000
It seems that LDW doesn't work correctly with single-precision data. Can somebody tell me why?
You have declared temp2 as an int, so when the value is returned, your printf (or the assignment to temp2) is converting an int to a float.
Best Regards,
Chad
Chad,
Thanks, but I checked the code and found that I mistyped it here. The original declaration is:
float temp2 = 0.0;
Hi May,
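Did you declare a prototype for the function? For example:
float test(void *a);
If not, the return value of test(&b) is treated as an integer (under C89 rules, a function called without a declaration is implicitly assumed to return int), and it is then converted to float by INTSP.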
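To illustrate the difference, here is a minimal host-side sketch in C (not code from this thread). It assumes IEEE-754 single precision and a 32-bit int, and load_word(), result_without_prototype() and result_with_prototype() are hypothetical names: load_word() stands in for the linear assembly routine, since the real LDW only runs on the DSP. The same 32 bits come out as 1065353216.0 when they go through an int-to-float conversion, and as 1.0 when they are taken directly as a float.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical host-side stand-in for the linear assembly routine:
   like LDW, it copies the raw 32 bits at *a without interpreting them. */
uint32_t load_word(const void *a)
{
    uint32_t w;
    memcpy(&w, a, sizeof w);
    return w;
}

/* No prototype in scope: the caller assumes an int return value, and the
   assignment to a float then performs a numeric int-to-float conversion
   (the job INTSP does on the C66x). */
float result_without_prototype(const void *a)
{
    int as_int = (int)load_word(a);
    return (float)as_int;
}

/* With "float test(void *a);" in scope, the caller takes the same 32 bits
   as a single-precision float; memcpy models that reinterpretation. */
float result_with_prototype(const void *a)
{
    uint32_t w = load_word(a);
    float f;
    memcpy(&f, &w, sizeof f);
    return f;
}
```

For b = 1.0f, load_word() yields 0x3F800000, which is exactly the 1065353216 seen in the console output.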
Allen
I agree with Allen's comments.
That said, back to the basic question about the LDW assembly instruction. It simply returns the 32-bit value stored at the location being pointed to. It doesn't care whether that value is a float, an int, two packed 16-bit values, etc.; it returns the 32 bits exactly as they are stored in memory. It's your type casts and declarations in C that determine how this data is treated.
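A small C sketch of the same point, assuming IEEE-754 single precision. WordViews and view_word() are hypothetical names, and memcpy is used to reinterpret the bits without converting them. The word 0x3F800000 reads as 1065353216 as an int, 1.0 as a float, and 0x3F80 (high) / 0x0000 (low) as two packed 16-bit halves.

```c
#include <stdint.h>
#include <string.h>

/* The same 32-bit word viewed three ways. The bits never change;
   only the C type used to read them does. */
typedef struct {
    int32_t  as_int;
    float    as_float;
    uint16_t lo16;   /* low half, e.g. one of two packed 16-bit values */
    uint16_t hi16;   /* high half */
} WordViews;

WordViews view_word(uint32_t w)
{
    WordViews v;
    memcpy(&v.as_int, &w, sizeof v.as_int);     /* bits read as int   */
    memcpy(&v.as_float, &w, sizeof v.as_float); /* bits read as float */
    v.lo16 = (uint16_t)(w & 0xFFFFu);
    v.hi16 = (uint16_t)(w >> 16);
    return v;
}
```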
If you want to verify this, single-step into the assembly code:
1. Find the a_0 register. It will be A4, since A4 holds the first argument passed to a function.
2. Look at the memory location pointed to by A4, display it as an SP Float in a memory window, and see what you observe. Then display it again as a plain hex value.
3. Step through the code until the LDW has executed. The data lands in the destination register 4 single steps after the LDW. (I assume it would be the B4 register, but you'll have to look at the code in the disassembly to see.)
4. You'll see that this is exactly the same 32-bit hex data that was in memory, and this is what gets returned.
Thanks Allen and Chad,
With your help, I got the correct result. On the other hand, I'm disappointed with the performance. I studied linear assembly to improve the processing speed of my code, but in the end I found that I had failed.
The array in my test has 264 elements. With no optimization level selected, the C code takes 11,923 CPU cycles and the linear assembly code takes 8,350. But with the -o2 optimization level, the C code takes 440 cycles and the linear assembly code takes 962, more than twice as many as the C code! Does this mean it's really that hard to optimize code? Following SPRU187T, I tried the optimization methods in sections 3 and 4, but apart from the compiler's optimization levels, none of them helped. If I really need to optimize further, what can I do?
Best regards,
May
In this situation, I think you need to write hand-coded assembly. You assign the registers and schedule the pipeline yourself in order to use the computational resources as fully as possible. It is more complex and time-consuming than linear assembly, but also more effective.
May,
This thread shows a specific linear assembly test routine and a specific C benchmarking routine. Your original questions and the insightful answers all addressed those specific code examples.
may92122 wrote: "The length of the array in my test is 264, ... after o2 optimization level was chosen, the CPU cycle of c code is 440, and the CPU cycle of linear assembly code is 962, ..."
You are now talking about completely different program code, both the linear assembly and the main() function in C. The linear assembly example was a trivial one that you would never use in a real application.
And now you seem to have two versions of the same routine, one in C and one in linear assembly. Neither has been shown in any of your posts in this thread.
It is no longer clear what your question is, at least not to me. Chad and Allen may know exactly what you are doing, but I do not.
Regards,
RandyP
Randy,
I'd have to concur; it's difficult to tell specifically what's being referenced, since it's not the code that was originally discussed here.
You may want to start another thread about the optimization, but you'll want to do so in the C/C++ Compiler Forum, which also covers the assembler and linear assembly. That said, I'll note that linear assembly still requires you to 'unroll' the loop yourself to give the tools the flexibility to build optimal code. The compiler itself is designed to generate highly optimized code when given the freedom to do so, and it's recommended not to drop down to assembly or linear assembly unless necessary, to keep your code as portable as possible.
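As a sketch of what "giving the compiler freedom" and "unrolling" can look like, here is a hypothetical element-wise multiply. May's actual routine was not posted, so the kernel, the names vec_mul() and vec_mul_unrolled4(), and the multiple-of-4 length are assumptions (264 satisfies it). The first version keeps the loop simple and adds restrict plus the TI-only MUST_ITERATE pragma; check the exact pragma syntax against the compiler user's guide (SPRU187). The second unrolls by 4 by hand, the shape a linear assembly loop would typically need.

```c
#include <stddef.h>

/* Plain C version. restrict promises that out, a and b never overlap, so
   the compiler need not assume a store to out changes a or b.
   MUST_ITERATE (TI compiler only; other compilers skip it via the #ifdef)
   states that n is at least 4 and a multiple of 4, which helps the
   optimizer unroll and software-pipeline the loop. */
void vec_mul(float *restrict out, const float *restrict a,
             const float *restrict b, size_t n)
{
    size_t i;
#ifdef __TI_COMPILER_VERSION__
#pragma MUST_ITERATE(4, , 4)
#endif
    for (i = 0; i < n; i++) {
        out[i] = a[i] * b[i];
    }
}

/* Hand-unrolled by 4: four independent multiplies per iteration, the
   form a linear assembly version would need to expose parallelism.
   Assumes n is a multiple of 4. */
void vec_mul_unrolled4(float *restrict out, const float *restrict a,
                       const float *restrict b, size_t n)
{
    size_t i;
    for (i = 0; i < n; i += 4) {
        out[i]     = a[i]     * b[i];
        out[i + 1] = a[i + 1] * b[i + 1];
        out[i + 2] = a[i + 2] * b[i + 2];
        out[i + 3] = a[i + 3] * b[i + 3];
    }
}
```

Both functions compute identical results; the difference is only in how much scheduling information the tools get, which is why the plain C version with the right hints is usually the better starting point.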
Best Regards,
Chad
I see now, thanks everyone.