Data Cross Paths on TMS320C6713 DSK



Hi everyone,

 

Can I use the .D1x cross path in parallel with .D1?

I mean something like the code below:

 

   STW .D1  A1,*++A4[1]
|| STW .D1x B2,*++A4

 

If not, what other strategy can I use to do the storing above in the order mentioned, and in PARALLEL?

(BTW, I didn't find the answer in the TI instruction set manuals.)

 

Thanks in advance.

  • Salehe Erfanian Ebadi said:
    what other strategy can I use

    Don't write it in assembly in the first place.  Write it in C.  Or even linear assembly.  Then let the tools figure it out for you.  There probably is a way to have those instructions in parallel.  It probably involves choosing different registers so that a cross path is not required.

    Thanks and regards,

    -George

  • I wrote it in both C and linear assembly first. In fact, I do need to keep the registers the same as above, and because my program has to do one LDW and two STWs, .D1 and .D2 are kept busy all the time. So it is really hard to get .D1x working in parallel with .D1 or .D2...

    Do you have any other idea on how to achieve this?

    The code is part of a loop, and it is like this:

     

    loop:        LDW   .D2    *B4++,B1
     ||          STW   .D1    A1,*++A4[1]
     ||          STW   .D1x   B1,*++A4
     ||          SUB   .L2    B0,1,B0
     || [B0]     B     .S2    loop

     

    Actually the two STWs have a conflict here.

    Does anybody think there's a way of doing this?

     

    Thank you.

  • There are two things wrong with that code:

    • You cannot use two D1-unit instructions in parallel.  If you have two stores in parallel, one must be on D1 and the other must be on D2.
    • On C6713, the D unit does not have access to the cross path.
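
The first point also sets a floor on how fast this loop can run. A rough sketch of that arithmetic, assuming only that the C6713 has two .D units and that each can issue at most one load or store per cycle (the function name is hypothetical, chosen for illustration):

```c
#include <assert.h>

/* Lower bound on cycles per loop iteration imposed by the .D units alone.
 *   mem_ops: loads plus stores issued per iteration
 *   d_units: number of .D units (2 on C6713)
 * Assumes each .D unit issues at most one load or store per cycle. */
int min_cycles_for_mem_ops(int mem_ops, int d_units)
{
    assert(mem_ops >= 0 && d_units > 0);
    return (mem_ops + d_units - 1) / d_units;   /* ceiling division */
}
```

With one LDW and two STWs per iteration, `min_cycles_for_mem_ops(3, 2)` gives 2, so a single-cycle loop body is not reachable with any register choice. If the loop were unrolled by two, the six memory operations would need at least 3 cycles, or 1.5 cycles per original iteration on average.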

    This loop is very simple; if the C code is properly annotated, the compiler should be able to do a very good job on this loop.  You should focus on optimizing the C code and let the compiler take care of the instructions.  Was the compiler able to software pipeline the loop?
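
As an illustration of what "properly annotated" C might look like here, the following is a minimal sketch, not the original poster's code. It assumes the loop writes a fixed value (the role A1 plays in the assembly) followed by each loaded word, that the source and destination buffers never overlap, and that the loop runs at least once. The function and parameter names are hypothetical. `restrict` and `MUST_ITERATE` are the usual TI compiler hints for software pipelining; the pragma is guarded so the code still builds with other C compilers.

```c
#include <assert.h>

/* Hypothetical C version of the loop: for each input word, write the fixed
 * value `a` followed by that input word.
 * `restrict` promises the compiler that src and dst never overlap, which
 * lets it schedule the load and the stores freely. */
void interleave_words(const int *restrict src, int *restrict dst, int a, int n)
{
    int i;

    assert(n >= 1);   /* assumption: the loop always executes at least once */

#ifdef __TI_COMPILER_VERSION__
#pragma MUST_ITERATE(1)   /* TI-specific hint: minimum trip count of 1 */
#endif
    for (i = 0; i < n; i++) {
        dst[2 * i]     = a;        /* corresponds to the STW of A1 */
        dst[2 * i + 1] = src[i];   /* corresponds to the STW of the loaded word */
    }
}
```

Calling `interleave_words(src, dst, 7, 3)` with `src = {10, 20, 30}` fills `dst` with `7, 10, 7, 20, 7, 30`. Whether the compiler actually software pipelines the loop can be checked in the generated assembly file, where the TI tools report pipelining information for each loop.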