Hi,
I am Brajesh. I am working on the C64x+ DSP. When I run my SPLOOP program, the epilog part of the SPLOOP does not run (the prolog and kernel do run).
The SPLOOP/SPKERNEL instructions are very powerful and very flexible. But the arguments are not very easy to learn to use. I can partly use the arguments to SPLOOP for simple apps, but have not gained a true comprehension of the arguments for SPKERNEL. I have read the user's guide and reference guide many times and understand every word but not every concept. And I have been working with TI DSP for 20 years and have written very tightly optimized C6000 and C64x+ assembly code. It is very hard to get the exact nuances of the SPLOOP/SPKERNEL working right when writing original assembly code.
The C6000 Optimizing C Compiler is very good at using these instructions, though. The only thing you should start from is the assembly output of the C Compiler. Write a C program with a for-loop and code that does what you think you want to do, and optimize it enough to get the compiler using the SPLOOP. Then use that SPLOOP/SPKERNEL from the compiler's assembly and modify it to work the way you want it to work.
This way, you can start from something that will function correctly, then with each change you make you can also check if it is running right or not.
And use the CPU Cycle Accurate simulator for this process. It will be able to tell you about mistakes you make that the silicon would not. The silicon can catch some errors in critical loops, but only the worst ones, such as double-allocation of a processing element (using .S1 twice in parallel). The simulator will report more errors and will do it more politely than the silicon.
Of course, you should be able to just write your program in C and use the C optimization techniques to get your program running without having to ever write C64x+ assembly.
So that raises the question of "why do you want to write SPLOOP code?"
Here is a simple program that produces 2 SPLOOP loops, test64xp.zip. I used BIOS 5.33.01 (not needed other than convenience for the linker command file) and CGT 6.1.11.
Hi,
The code you sent contains two SPLOOPs and takes 199 cycles (without the printf statement). I modified one part of your code, and it now takes 152 cycles.
This is your code:
#include <stdio.h>

int x[100], y[100], z = 0;

void main( void )
{
    int i;

    for ( i = 0; i < 100; i++ )
    {
        x[i] = i+1;
        y[i] = i*22;
    }
    for ( i = 0; i < 100; i++ )
    {
        z += x[i]*y[i];
    }
    printf( "z=%d\n", z );
}
I modified only the second for loop of the program and wrote it in assembly. The second for loop of your program takes 114 cycles, while the corresponding assembly code in my program takes 64 cycles.
My code is:
#include <stdio.h>

void main( void )
{
    int i;
    int n=50;
    int x[100],y[100];
    int z=0;

    for(i=0;i<100;i++)
    {
        x[i]=i+1;
        y[i]=i*22;
    }
    i=0;
    sop(x,y,n,z);
    printf( "\n z=%d :",z);
}
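* assembly code of sop(x,y,n,z)
        .data
        .global _sop
        .text
_sop:
        MVC     .S2     A6,ILC          ; trip count n (third argument) -> ILC
        MVK     .S2     0,B7            ; clear B-side accumulator
||      MVK     .S1     0,A7            ; clear A-side accumulator
        MVK     .S1     0,A11
        NOP     1
        SPLOOP  1                       ; ii = 1
        LDDW    .D2T2   *B4++,B17:B16   ; load two words of y
||      LDDW    .D1T1   *A4++,A17:A16   ; load two words of x
        NOP     1
        NOP     1
        NOP     1
        NOP     1
        MPY     .M2X    A17,B17,B5      ; 16x16 multiply of the low halves
||      MPY     .M1X    A16,B16,A5
        NOP     1
        SPKERNEL 7,0
||      ADD     .L2     B5,B7,B7        ; accumulate on the B side
||      ADD     .L1     A5,A7,A7        ; accumulate on the A side
        ADD     .L1X    A7,B7,A11       ; combine the two partial sums
        STW     .D2     B7,*B6          ; store B7 through the address in B6
        .end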
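When a hand-written loop like this is being debugged, it can help to have a plain C model of what the loop appears to compute, so the assembly result can be compared against a known value. The following is only a sketch and not the poster's code: `sop_model` is a hypothetical name, and it assumes that n counts loop iterations (two elements per iteration), that the MPY instructions see only the signed low 16 bits of each word, and that the fourth argument is meant to be an output pointer. On the last point, the C6000 calling convention passes the fourth argument in B6, so the `STW ... *B6` in the listing treats that argument as an address, while the C caller passes z by value.

```c
#include <stdint.h>

/* Hypothetical reference model of the hand-written loop; not the poster's code.
 * Assumes n counts loop iterations, each consuming two elements of x and y,
 * and that z is an output pointer. */
void sop_model(const int *x, const int *y, int n, int *z)
{
    int acc_a = 0;  /* plays the role of A7 */
    int acc_b = 0;  /* plays the role of B7 */
    int i;

    for (i = 0; i < n; i++) {
        /* MPY multiplies the signed low 16 bits of each operand. */
        acc_a += (int16_t)x[2 * i]     * (int16_t)y[2 * i];
        acc_b += (int16_t)x[2 * i + 1] * (int16_t)y[2 * i + 1];
    }
    *z = acc_a + acc_b;  /* combined sum, as the final ADD computes */
}
```

With the arrays filled as in the main() above and n = 50, calling `sop_model(x, y, 50, &z)` leaves 7332600 in z, which is the value the assembly version should be checked against.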
You are very welcome for the example code to generate the SPLOOP instruction sequence.
Hopefully, you will be able to implement your desired algorithms without having to go to the effort of programming in assembly. You can use guidance from the Optimizing C Compiler User's Guide to help you find ways to further optimize your algorithm in C. There are compiler switches and #pragmas that will help a lot, plus careful use of the "restrict" keyword.
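As a rough illustration of those hints, the sum-of-products loop from earlier in the thread might be written as below. This is a sketch under assumptions, not code from the thread: `dot_product` is a hypothetical name, and the MUST_ITERATE arguments assume the caller always passes an even trip count of at least 2. The pragma is guarded so that other compilers simply skip it.

```c
/* Sketch only: dot_product is a hypothetical name, not code from this thread. */
int dot_product(const int * restrict x, const int * restrict y, int n)
{
    int i;
    int sum = 0;

#ifdef __TI_COMPILER_VERSION__
    /* Assumption for this sketch: n is at least 2 and a multiple of 2. */
    #pragma MUST_ITERATE(2, , 2)
#endif
    for (i = 0; i < n; i++) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

With two read-only inputs, `restrict` changes little here; it tends to matter more once the loop also stores through a pointer, because it tells the compiler that the stores cannot alias the loads.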
It looks as though you are learning well the fine art of writing C64x+ assembly code. You have probably found that it is much harder to find code bugs in assembly than in C.
Hi,
Please tell me: if I write fully optimized C code, is its performance equivalent to assembly code or not? Can you give me some comparison of C and assembly programs and their performance in terms of the number of cycles required?
My algorithm is computation intensive, so I think bugs are not a big problem. Is the C code you sent fully optimized or not?
Comparisons between C (and any high-level language) and assembly have been going on as long as the language has been around. In the 1980's, I remember an Ada compiler vendor claiming to achieve 102% of assembly performance. Most of us found that to be unlikely. I have seen estimates for TI's C compiler in the 60-80% range, but have also experienced customers being able to achieve better than 90%. It is very common to move to assembly for the tightest inner loops, but only if you have to do that to make the program work within the constraints of your system.
TI used to offer a 4-day workshop on C6000 Optimization that did a good job of introducing the concepts needed to understand how to fully optimize a program. It might still be offered, but I am not sure. A good source for information on optimization is the TI Embedded Processors Wiki where you can look at the Code Generation Tools Category and look for topics on Optimization; there are several which you will find helpful.
The C code I sent was intended only to fulfil your request to see the C compiler generate an SPLOOP instruction sequence, not to demonstrate optimization techniques. Please refer to the Wiki pages referenced above for help with optimization techniques and example code/labs.
If this is a learning process you are going through, then the reference materials mentioned will be helpful and your continued analysis of the compiler code and compiler capabilities will be helpful.
If this is a project that has deadlines and specifications, then I would strongly recommend contacting one of the many Third Party consultant firms who have either the ability to work with you on the project or to complete the project to your specifications.
Hi,
I am writing a program using SPLOOP, and it is working successfully and efficiently. The program runs correctly when I single-step it (one step at a time), but it does not run when I try to run the complete program at once; it gives an error message like "SIM is not stable".
I have one more problem, regarding the MVD instruction: I am unable to use it.
Can you suggest some way to debug my program?
I missed the fact that you had replied to this thread.
I question whether your program is "working successfully and efficiently" since it is causing the simulator to go unstable. The simulator is designed to detect and report many different error scenarios, but it must have been unable to report whatever you have done in your program.
The C64x+ CPU & Instruction Set Reference Guide is the place to go to understand the assembly instructions, including MVD.
As I have suggested before, the compiler is the best place to start, then modify its code if you need to optimize further. It is very impressive that you have done so well already with your coding, but it is very difficult to figure out subtle problems if the assembler or simulator do not catch them for you.
Keep up the good work.
Dear Randy,
I have read your post about SPLOOP. I am writing assembly code on the DM6437 for image processing applications. SPLOOP is a very complex instruction. I have read several PDFs about C64x+ programming and code optimization, and I have also studied your short sample code. You said that even after 20 years of working with TI DSPs you find assembly programming difficult. I want to copy an image between memories and wrote the assembly code below. However, I could not work out the SPKERNEL parameters, so in the end I wrote a program that tries every combination of parameters between 0 and 12, as below:
SPLOOP_COEF <= 1,2...,11,12
...
...
SPKERNEL_COEF_1 <= 0,1,2,3,...,12 ;
SPKERNEL_COEF_2 <= 0,1,2,3,...,12 ;
In total, 12x12x12 = 1728 combinations.
However, I could not assemble my asm file successfully. Some of the assembler results:
1 ---------------------------------------------------------------------
"D:\argenc\dsp\code\video_preview_hedef\evmDM6437\max_gg2.asm", ERROR! at line 204:
[E1400]
Cycle 12 out of range for ii 2
SPKERNEL SPKERNEL_COEF_1,SPKERNEL_COEF_2
1 Assembly Error, No Assembly Warnings
Errors in Source - Assembler Aborted
>> Compilation failure
2---------------------------------------------------------------
"D:\argenc\dsp\code\video_preview_hedef\evmDM6437\max_gg2.asm", ERROR! at line 204:
[E0802]
Multi-cycle NOP instructions are illegal in the execute packet
preceding an execute packet containing an SPKERNEL instruction.
1 Assembly Error, No Assembly Warnings
SPKERNEL SPKERNEL_COEF_1,SPKERNEL_COEF_2
|| STDW .D2T1 A3:A2,*B5++
|| STDW .D1T2 B3:B2,*A5++
Errors in Source - Assembler Aborted
>> Compilation failure
....................................
        MVC     .S2     A1,ILC          ;Do 8 loops
        NOP     3                       ;4 cycle for ILC to load
        SPLOOP  SPLOOP_COEF
        LDDW    .D1T1   *A0++,A3:A2
||      LDDW    .D2T2   *B1++,B3:B2
        NOP     4
        SPKERNEL SPKERNEL_COEF_1,SPKERNEL_COEF_2
||      STDW    .D2T1   A3:A2,*B5++
||      STDW    .D1T2   B3:B2,*A5++
....................................
What is the problem? Can LDDW and STDW be used with SPLOOP for a 128-bit copy operation?
Thanks for your interest,
Best
Goksel Gunlu
Electronics Engineer
As you can tell from earlier in this thread, the best advice is to write your program in C with max optimization and stay away from assembly. Especially, stay away from customizing an SPLOOP/SPKERNEL loop without a full understanding of their arguments; I do not have a full understanding of their arguments myself.
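For the image copy in question, a C starting point along those lines might look like the sketch below, which moves two 64-bit words (128 bits) per iteration and leaves the software pipelining to the compiler. It is only an illustration: `copy_dwords` is a hypothetical name, and the code assumes both buffers are 8-byte aligned, do not overlap, and hold an even number of 64-bit words.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copies n_dwords 64-bit words, two per iteration.
 * Assumes 8-byte-aligned, non-overlapping buffers and an even n_dwords. */
void copy_dwords(uint64_t * restrict dst, const uint64_t * restrict src,
                 size_t n_dwords)
{
    size_t i;

    for (i = 0; i < n_dwords; i += 2) {
        dst[i]     = src[i];      /* first 64-bit word of the pair */
        dst[i + 1] = src[i + 1];  /* second 64-bit word of the pair */
    }
}
```

Whether the compiler turns this into an SPLOOP with LDDW/STDW pairs depends on the optimization level and on what it can prove about alignment, so the generated assembly is worth inspecting before any hand tuning.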
My comments on your post:
If you decide all you want is more information on SPKERNEL arguments, we can request this sub-thread to be moved to the Compiler forum.
Regards,
RandyP
If you need more help, please reply back. If this answers the question, please click Verify Answer , below.