TI E2E Community
Digital Signal Processors (DSP)
C6000 Multicore DSP
Keystone Multicore Forum (C66, 66A, AM5)
How to load single-precision data in linear assembly code?
I wrote a linear assembly function as follows:
_test: .cproc a_0
Then, in another file in the project, I call this function:
int a = 1;
float b = 1.0;
int temp1 = 0;
int temp2 = 0.0;
temp1 = test(&a);
temp2 = test(&b);
I ran the above code on EVM6678, and the following result appeared in the console window:
It seems that LDW doesn't work correctly with single-precision data. Can somebody tell me why?
You have temp2 declared as an int, so when the value is returned, your printf or the assignment to temp2 is converting an int to a float.
Chad, thanks, but I checked the code: I mistyped it here. The original code is as follows:
float temp2 = 0.0;
Did you declare a prototype for the function, such as:
float test(void *a);
If not, the return value of test(&b) is treated as an integer and converted to float by the INTSP instruction.
I agree with Allen's comments.
That said, back to the basic question about the LDW assembly instruction. It simply returns the 32-bit value stored at the location being pointed to. It doesn't care whether that is a float, an int, two packed 16-bit values, etc.; it returns the 32 bits exactly as they're stored in memory. It's your type casts and declarations in C that determine how this data is treated.
If you want to, single-step into the assembly code. Look at the a_0 register (it will be A4, since A4 carries the first argument passed to a function) and at the memory location A4 points to. Display that location as SP float in a memory window and see what you observe, then display it again as plain hex. Step through the code until the LDW has executed; the data lands in the destination register 4 single steps after the LDW (I assume it will be the B4 register, but you'll have to check the disassembly). You'll see that this is exactly the same 32-bit hex data that was in memory, and this is what gets returned.
Thanks Allen and Chad,
With your help, I got the right result. On the other hand, I'm disappointed by the performance. I studied linear assembly in order to make my code faster, but in the end I found that I had failed.
The array length in my test is 264. With no optimization level selected, the C code takes 11,923 CPU cycles and the linear assembly code takes 8,350. But with optimization level -o2, the C code takes 440 cycles and the linear assembly code takes 962, more than twice as many as the C code! Does this mean it is very hard to optimize code by hand? Following spru187t, I tried the optimization methods in sections 3 and 4, but apart from the compiler's own optimization levels, none of them helped. If I really need to optimize further, what can I do?
In this situation, I think you need hand-coded assembly. You would assign the registers and schedule the pipeline yourself in order to use the computational resources as fully as possible. It is more complex and time-consuming than linear assembly, but also more effective.
This thread shows a specific linear assembly test routine and a specific C-code benchmarking routine. Your original questions and the insightful answers were all about those specific code examples.
may92122 wrote: "The length of the array in my test is 264, ... after o2 optimization level was chosen , the CPU cycle of c code is 440, and the CPU cycle of linear assembly code is 962, ..."
You are now talking about completely different program code, both the linear assembly and the main() function in C. The linear assembly example was a trivial one that you would never use in a real application.
And now you seem to have two versions of the same routine, one in C and one in linear assembly. Neither has been shown in any of your posts in this thread.
It is no longer clear what your question is, at least not to me. Chad and Allen may know exactly what you are doing, but I do not.
I'd have to concur; it's difficult to tell specifically what's being referenced, since it's not the code that was originally discussed here.
You may want to start another thread about the optimization, but you'll want to do so in the C/C++ Compiler Forum, which also covers the assembler and linear assembly. That said, note that linear assembly still requires you to unroll the loop to give the tools the flexibility to build optimal code, and the compiler itself is designed to generate highly optimized code if it is given the freedom to do so. It's recommended not to drop to assembly or linear assembly unless necessary, to keep your code as portable as possible.
I see now, thanks everyone.