Problem regarding SPLOOP on the C64x+ DSP processor

Hi,

I am Brajesh. I am working on the C64x+ DSP processor. When I run my SPLOOP program, the epilog part of the SPLOOP does not run (the prolog and kernel do run).

 

  • The SPLOOP/SPKERNEL instructions are very powerful and very flexible. But the arguments are not very easy to learn to use. I can partly use the arguments to SPLOOP for simple apps, but have not gained a true comprehension of the arguments for SPKERNEL. I have read the user's guide and reference guide many times and understand every word but not every concept. And I have been working with TI DSP for 20 years and have written very tightly optimized C6000 and C64x+ assembly code. It is very hard to get the exact nuances of the SPLOOP/SPKERNEL working right when writing original assembly code.

    The C6000 Optimizing C Compiler is very good at using these instructions, though. The only thing you should start from is the assembly output of the C Compiler. Write a C program with a for-loop and code that does what you think you want to do, and optimize it enough to get the compiler using the SPLOOP. Then use that SPLOOP/SPKERNEL from the compiler's assembly and modify it to work the way you want it to work.

    This way, you can start from something that will function correctly, then with each change you make you can also check if it is running right or not.

    And use the CPU Cycle Accurate simulator for this process. It will be able to tell you about mistakes you make that the silicon would not. The silicon can catch some errors in critical loops, but only the worst ones, such as double-allocation of a functional unit (using .S1 twice in parallel). The simulator will report more errors, and will do it more politely than the silicon.

    Of course, you should be able to just write your program in C and use the C optimization techniques to get your program running without having to ever write C64x+ assembly.

    So that raises the question of "why do you want to write SPLOOP code?"

  • Hi,

     I wrote a simple for-loop C program and built it with optimization level -o2, but the output doesn't contain any SPLOOP.

  • Here is a simple program that produces 2 SPLOOP loops, test64xp.zip. I used BIOS 5.33.01 (not needed other than convenience for the linker command file) and CGT 6.1.11.

  • Hi,

    The code you sent contains two SPLOOPs, and it takes 199 cycles (without the printf statement). I modified one part of your code, and now it takes 152 cycles.

    This is your code:


    #include <stdio.h>

    int x[100], y[100], z = 0;

    void main( void )
    {
     int i;

     for ( i = 0; i < 100; i++ )
     {
      x[i] = i+1;
      y[i] = i*22;
     }

     for ( i = 0; i < 100; i++ )
     {
      z += x[i]*y[i];
     }

     printf( "z=%d\n", z );
    }

     

     

    I have modified only the second for loop of the program and written it in assembly. The second for loop of your program takes 114 cycles, while the corresponding assembly code in my program takes 64 cycles.

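     My code is:

    #include <stdio.h>

    /* Implemented in assembly below. z is passed by address so that the
       routine can store the result; n counts LDDW iterations, each of
       which covers two elements of x and y. */
    extern void sop(const int *x, const int *y, int n, int *z);

    /* LDDW needs 8-byte-aligned addresses. */
    #pragma DATA_ALIGN(x, 8)
    #pragma DATA_ALIGN(y, 8)
    int x[100], y[100];

    void main( void )
    {
     int i;
     int n = 50;   /* 100 elements, 2 per iteration */
     int z = 0;

     for(i=0;i<100;i++)
     {
      x[i]=i+1;
      y[i]=i*22;
     }

     sop(x, y, n, &z);

     printf( "z=%d\n", z );
    }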

     

     

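    Assembly code of sop(x, y, n, z):

     .data
     .global _sop
     .text

    ; void sop(const int *x, const int *y, int n, int *z)
    ; Arguments (C6000 calling convention): x in A4, y in B4, n in A6,
    ; z in B6; return address in B3. A10-A15 and B10-B15 must be
    ; preserved by the callee, so the total is kept in A8 instead of A11.

    _sop
      MVC .S2 A6,ILC       ; ILC = n (number of LDDW pairs)
      MVK .S2 0,B7         ; B-side partial sum
    ||  MVK .S1 0,A7       ; A-side partial sum
      MVK .S1 0,A8
         NOP 1

      SPLOOP 1

           LDDW   .D2T2  *B4++,      B17:B16 ; two ints from y
    ||     LDDW   .D1T1  *A4++,      A17:A16 ; two ints from x

      NOP 1                ; LDDW has 4 delay slots
      NOP 1
      NOP 1
      NOP 1

        MPY .M2X A17,B17,B5  ; MPY uses only the low 16 bits: fine for
    ||  MPY .M1X A16,B16,A5  ; this data, full 32-bit ints need MPY32

      NOP 1                ; MPY has 1 delay slot

      SPKERNEL 7,0
    ||  ADD .L2 B5,B7,B7
    ||  ADD .L1 A5,A7,A7

      ADD .L1X A7,B7,A8    ; combine the two partial sums
      STW .D2T1 A8,*B6     ; store the total (not just B7) to *z
      B .S2 B3             ; return to the caller
      NOP 5                ; branch delay slots

      .end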

     

     

     

  • You are very welcome for the example code to generate the SPLOOP instruction sequence.

    Hopefully, you will be able to implement your desired algorithms without having to go to the effort of programming in assembly. You can use guidance from the Optimizing C Compiler User's Guide to help you find ways to further optimize your algorithm in C. There are compiler switches and #pragmas that will help a lot, plus careful use of the "restrict" keyword.

    It looks as though you are learning the fine art of writing C64x+ assembly code well. You have probably found that it is much harder to find bugs in assembly code than in C.

  • Hi,

     Please tell me: if I write fully optimized C code, will its performance be equivalent to assembly code? Can you give me some comparison of C and assembly programs and their performance in terms of the number of cycles required?

     My algorithm is computation-intensive, so I think bugs are not a big problem. Is the C code you sent fully optimized?

    Comparisons between C (or any high-level language) and assembly have been going on for as long as high-level languages have been around. In the 1980s, I remember an Ada compiler vendor claiming to achieve 102% of assembly performance; most of us found that unlikely. I have seen estimates for TI's C compiler in the 60-80% range, but have also seen customers achieve better than 90%. It is very common to move to assembly for the tightest inner loops, but only if you have to do that to make the program work within the constraints of your system.

    TI used to offer a 4-day workshop on C6000 Optimization that did a good job of introducing the concepts needed to understand how to fully optimize a program. It might still be offered, but I am not sure. A good source for information on optimization is the TI Embedded Processors Wiki where you can look at the Code Generation Tools Category and look for topics on Optimization; there are several which you will find helpful.

    The C code I sent was intended only to fulfil your request to see the C compiler generate an SPLOOP instruction sequence, not to demonstrate optimization techniques. Please refer to the Wiki pages referenced above for help with optimization techniques and example code/labs.

    If this is a learning process you are going through, then the reference materials mentioned will be helpful and your continued analysis of the compiler code and compiler capabilities will be helpful.

    If this is a project that has deadlines and specifications, then I would strongly recommend contacting one of the many Third Party consultant firms who have either the ability to work with you on the project or to complete the project to your specifications.

  • Hi,

     I am writing a program using SPLOOP, and it works successfully and efficiently. The program runs correctly when I run it step by step (one step at a time), but it does not run when I try to run the complete program at once; it gives an error message like " SIM is not stable ".

    I have one more problem regarding the MVD instruction: I am unable to use it.

    Can you suggest some way to debug my program?

  • I missed the fact that you had replied to this thread.

    I question whether your program is "working successfully and efficiently" since it is causing the simulator to go unstable. The simulator is designed to detect and report many different error scenarios, but it must have been unable to report whatever you have done in your program.

    The C64x+ CPU & Instruction Set Reference Guide is the place to go to understand the assembly instructions, including MVD.

    As I have suggested before, the compiler is the best place to start; then modify its code if you need to optimize further. It is very impressive that you have done so well already with your coding, but it is very difficult to figure out subtle problems if the assembler or simulator does not catch them for you.

    Keep up the good work.

  • Dear Randy,

    I have read your post about SPLOOP. I am writing assembly code on the DM6437 for image processing applications. SPLOOP is a very complex instruction. I have read several PDFs about C64x+ programming and code optimization, and I have also studied your short sample code. You said you have been working with TI DSPs for 20 years and still find this difficult. I want to copy an image between memories and wrote the assembly code below. However, I couldn't set the SPKERNEL parameters, so in the end I wrote a program that tried all parameter combinations between 0 and 12, as below:

    SPLOOP_COEF  <=  1,2...,11,12

    ...

    ...

    SPKERNEL_COEF_1 <= 0,1,2,3,...,12 ; 

    SPKERNEL_COEF_2 <= 0,1,2,3,...,12 ; 

     

    In total, 12 x 13 x 13 = 2028 combinations.

    However, I couldn't assemble my asm file successfully. Here are some of the assembler messages:

    1 ---------------------------------------------------------------------

    "D:\argenc\dsp\code\video_preview_hedef\evmDM6437\max_gg2.asm", ERROR!   at line 204: 

     [E1400] 

             Cycle 12 out of range for ii 2 

    SPKERNEL SPKERNEL_COEF_1,SPKERNEL_COEF_2 

     1 Assembly Error, No Assembly Warnings 

     Errors in Source - Assembler Aborted 

     >> Compilation failure 

    2---------------------------------------------------------------

    "D:\argenc\dsp\code\video_preview_hedef\evmDM6437\max_gg2.asm", ERROR!   at line 204: 

     [E0802] 

             Multi-cycle NOP instructions are illegal in the execute packet 

               preceding an execute packet containing an SPKERNEL instruction. 

    1 Assembly Error, No Assembly Warnings 

    SPKERNEL SPKERNEL_COEF_1,SPKERNEL_COEF_2 

    || STDW .D2T1 A3:A2,*B5++ 

    || STDW .D1T2 B3:B2,*A5++ 

     Errors in Source - Assembler Aborted 

     >> Compilation failure 

    ....................................

    MVC .S2 A1,ILC ;Do 8 loops

    NOP 3 ;4 cycle for ILC to load

     

    SPLOOP SPLOOP_COEF

    LDDW .D1T1 *A0++,A3:A2

    || LDDW .D2T2 *B1++,B3:B2

    NOP 4

     

    SPKERNEL SPKERNEL_COEF_1,SPKERNEL_COEF_2

    || STDW .D2T1 A3:A2,*B5++

    || STDW .D1T2 B3:B2,*A5++

    ....................................

     

    What is the problem? Can LDDW and STDW be used with SPLOOP for a 128-bit copy operation?

    Thanks for your interest,

    Best

    Goksel Gunlu

    Electronics Engineer

     

  • As you can tell from earlier in this thread, the best advice is to write your program in C with maximum optimization and stay away from assembly. In particular, stay away from customizing an SPLOOP/SPKERNEL loop without a full understanding of their arguments; I do not have a full understanding of their arguments.

    My comments on your post:

    1. Do not do a memory copy this way. Use IDMA1 or QDMA. Stay aware of potential cache coherency issues.
    2. Write this code in C. Use intrinsics like _amem8. Use pragmas like MUST_ITERATE(,,2) and/or UNROLL(2). Then look at the results.
    3. Consider the error messages and change your code to avoid them.
    4. Write out the instructions in your assembly loop in a spreadsheet-like format to see how you expect SPLOOP to schedule and execute the instructions.
    5. Your loop only executes 8 times, so write it out manually without the SPLOOP. See what you can learn about writing this code in sequence. Even if you will do it more times later, you will learn something about assembly scheduling. Step through the code in the simulator to see how the pipeline works and which registers get updated on which cycle.
    6. Take a look at the Software Pipelining module of the C6000 Optimization Workshop. The materials are available on the TI Wiki pages.
    7. Attend a C6000 Optimization Workshop. I am not sure how often they are held.

    If you decide all you want is more information on SPKERNEL arguments, we can request this sub-thread to be moved to the Compiler forum.

    Regards,
    RandyP

     
