C674x ADD and SUB instructions issue

The issue was not resolved in the following post: https://e2e.ti.com/support/dsp/omap_applications_processors/f/42/t/438821

I have reproduced the issue with a simple test case that uses the correct argument ordering. The SUB instruction does not work when it runs on the same functional unit as a preceding ADDSP or SUBSP instruction and there is a NOP 2 between the two instructions.

For example:

06B4F218            ADDSP.L1X     A7,B13,A13
00002000            NOP           2
060440D8            SUB.L1        2,A1,A12

06B4FE3A            SUBSP.S2X     B7,A13,B13
00002000            NOP           2
0604A5A2            SUB.S2        5,B1,B12

Please see the attached file for the full assembly listing:

C6746_SUB.txt
07BE9DC2            SUBAW.D2      SP,0x14,SP
07BC18F1            OR.D1X        0,SP,FP
07BC22F4 ||         STW.D2T1      FP,*+SP[1]
073C42F5            STW.D2T1      A14,*+SP[2]
073CE276 ||         STW.D1T2      DP,*+FP[7]
06BC62F5            STW.D2T1      A13,*+SP[3]
06BD0276 ||         STW.D1T2      B13,*+FP[8]
063C82F5            STW.D2T1      A12,*+SP[4]
063D2276 ||         STW.D1T2      B12,*+FP[9]
05BCA2F5            STW.D2T1      A11,*+SP[5]
05BD4276 ||         STW.D1T2      B11,*+FP[10]
053CC2F5            STW.D2T1      A10,*+SP[6]
053D6276 ||         STW.D1T2      B10,*+FP[11]
071808F1            OR.D1         0,A6,A14
051816A1 ||         OR.S1X        0,B6,A10
05183D42 ||         ADDAW.D2      B6,0x1,B10
029018F3            OR.D2X        0,A4,B5
000002AA ||         MVK.S2        0x0005,B0
04383764            LDDW.D1T1     *A14++[1],A9:A8
03280265            LDW.D1T1      *+A10[0],A6
032802E6 ||         LDW.D2T2      *+B10[0],B6
04383764            LDDW.D1T1     *A14++[1],A9:A8
03A84265            LDW.D1T1      *+A10[2],A7
03A842E6 ||         LDW.D2T2      *+B10[2],B7
00000000            NOP           
06A0BE03            MPYSP.M2X     B5,A8,B13
06911E00 ||         MPYSP.M1X     A8,B4,A13
          LOOP:
008000A9            MVK.S1        0x0001,A1
008000AA ||         MVK.S2        0x0001,B1
00008000            NOP           5
06A0BE02            MPYSP.M2X     B5,A8,B13
00008000            NOP           5
06B4F218            ADDSP.L1X     A7,B13,A13
00002000            NOP           2
060440D8            SUB.L1        2,A1,A12
00008000            NOP           5
0119B21A            ADDSP.L2X     B13,A6,B2
00002000            NOP           2
060440DA            SUB.L2        2,B1,B12
00008000            NOP           5
06B4FE18            ADDSP.S1X     A7,B13,A13
00002000            NOP           2
060465A0            SUB.S1        3,A1,A12
00008000            NOP           5
0119BE1A            ADDSP.S2X     B13,A6,B2
00002000            NOP           2
060465A2            SUB.S2        3,B1,B12
00008000            NOP           5
06911E00            MPYSP.M1X     A8,B4,A13
00008000            NOP           5
0134D2B8            SUBSP.L1X     B6,A13,A2
00002000            NOP           2
060480D8            SUB.L1        4,A1,A12
00008000            NOP           5
06B4F23A            SUBSP.L2X     B7,A13,B13
00002000            NOP           2
060480DA            SUB.L2        4,B1,B12
00008000            NOP           5
0119BEB8            SUBSP.S1X     B6,A13,A2
00002000            NOP           2
0604A5A0            SUB.S1        5,A1,A12
00008000            NOP           5
06B4FE3A            SUBSP.S2X     B7,A13,B13
00002000            NOP           2
0604A5A2            SUB.S2        5,B1,B12
00008000            NOP           5
000029C2            SUB.D2        B0,0x1,B0
2FFFED92     [ B0]  B.S2          LOOP (PC-148 = 0x118118cc)
00008000            NOP           5
07BC18F0            OR.D1X        0,SP,FP
053CC2E5            LDW.D2T1      *+SP[6],A10
053D6266 ||         LDW.D1T2      *+FP[11],B10
05BCA2E5            LDW.D2T1      *+SP[5],A11
05BD4266 ||         LDW.D1T2      *+FP[10],B11
063C82E5            LDW.D2T1      *+SP[4],A12
063D2266 ||         LDW.D1T2      *+FP[9],B12
06BC62E5            LDW.D2T1      *+SP[3],A13
06BD0266 ||         LDW.D1T2      *+FP[8],B13
073C42E5            LDW.D2T1      *+SP[2],A14
073CE267 ||         LDW.D1T2      *+FP[7],DP
000C0362 ||         B.S2          B3
07BC22E4            LDW.D2T1      *+SP[1],FP
178014FE            ADDAW.D2      B15,20,SP
00000000            NOP           
00000000            NOP           
00000000            NOP           
00000000            NOP           
00000000            NOP

If the two instructions are consecutive (no NOP), or if the NOP between them is anything other than 2 cycles, it works fine. It seems to fail only when the SUB is issued exactly 3 cycles after the ADDSP or SUBSP instruction.

In addition, the ADD instruction fails in the same way.
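
The observation above can be checked with simple cycle arithmetic. The following is a minimal sketch, not a description of the actual hardware: it assumes ADDSP/SUBSP behave as 4-cycle instructions (3 delay slots) and ADD/SUB as single-cycle instructions, and it treats "both results are written in the same cycle on the same unit" as the failing condition. The function names are hypothetical and exist only for this illustration.

```python
# Illustrative model only (assumption): ADDSP/SUBSP are treated as 4-cycle
# instructions (3 delay slots) and ADD/SUB as single-cycle instructions.
FOUR_CYCLE_LATENCY = 4
SINGLE_CYCLE_LATENCY = 1


def result_write_cycle(issue_cycle, latency):
    """Cycle in which an instruction issued at issue_cycle writes its result."""
    return issue_cycle + latency - 1


def collides(nop_count):
    """True if a single-cycle instruction placed after nop_count NOP cycles
    writes its result in the same cycle as the preceding 4-cycle instruction."""
    fp_issue = 0                            # ADDSP/SUBSP issues in cycle 0
    int_issue = fp_issue + 1 + nop_count    # ADD/SUB issues after the NOPs
    return (result_write_cycle(fp_issue, FOUR_CYCLE_LATENCY)
            == result_write_cycle(int_issue, SINGLE_CYCLE_LATENCY))


for n in range(6):
    print(f"NOP cycles between instructions: {n} ->",
          "collision" if collides(n) else "ok")
```

Under these assumptions only a gap of 2 NOP cycles produces a collision, which matches the behaviour seen in the test cases.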

For example:

0119B21A            ADDSP.L2X     B13,A6,B2
00002000            NOP           2
0604405A            ADD.L2        2,B1,B12

0119BEB8            SUBSP.S1X     B6,A13,A2
00002000            NOP           2
0604A1A0            ADD.S1        5,A1,A12

Please see the attached file for the full assembly listing:

C6746_ADD.txt
07BE9DC2            SUBAW.D2      SP,0x14,SP
07BC18F1            OR.D1X        0,SP,FP
07BC22F4 ||         STW.D2T1      FP,*+SP[1]
073C42F5            STW.D2T1      A14,*+SP[2]
073CE276 ||         STW.D1T2      DP,*+FP[7]
06BC62F5            STW.D2T1      A13,*+SP[3]
06BD0276 ||         STW.D1T2      B13,*+FP[8]
063C82F5            STW.D2T1      A12,*+SP[4]
063D2276 ||         STW.D1T2      B12,*+FP[9]
05BCA2F5            STW.D2T1      A11,*+SP[5]
05BD4276 ||         STW.D1T2      B11,*+FP[10]
053CC2F5            STW.D2T1      A10,*+SP[6]
053D6276 ||         STW.D1T2      B10,*+FP[11]
071808F1            OR.D1         0,A6,A14
051816A1 ||         OR.S1X        0,B6,A10
05183D42 ||         ADDAW.D2      B6,0x1,B10
029018F3            OR.D2X        0,A4,B5
000002AA ||         MVK.S2        0x0005,B0
04383764            LDDW.D1T1     *A14++[1],A9:A8
03280265            LDW.D1T1      *+A10[0],A6
032802E6 ||         LDW.D2T2      *+B10[0],B6
04383764            LDDW.D1T1     *A14++[1],A9:A8
03A84265            LDW.D1T1      *+A10[2],A7
03A842E6 ||         LDW.D2T2      *+B10[2],B7
00000000            NOP           
06A0BE03            MPYSP.M2X     B5,A8,B13
06911E00 ||         MPYSP.M1X     A8,B4,A13
          LOOP:
008000A9            MVK.S1        0x0001,A1
008000AA ||         MVK.S2        0x0001,B1
00008000            NOP           5
06A0BE02            MPYSP.M2X     B5,A8,B13
00008000            NOP           5
06B4F218            ADDSP.L1X     A7,B13,A13
00002000            NOP           2
06044058            ADD.L1        2,A1,A12
00008000            NOP           5
0119B21A            ADDSP.L2X     B13,A6,B2
00002000            NOP           2
0604405A            ADD.L2        2,B1,B12
00008000            NOP           5
06B4FE18            ADDSP.S1X     A7,B13,A13
00002000            NOP           2
060461A0            ADD.S1        3,A1,A12
00008000            NOP           5
0119BE1A            ADDSP.S2X     B13,A6,B2
00002000            NOP           2
060461A2            ADD.S2        3,B1,B12
00008000            NOP           5
06911E00            MPYSP.M1X     A8,B4,A13
00008000            NOP           5
0134D2B8            SUBSP.L1X     B6,A13,A2
00002000            NOP           2
06048058            ADD.L1        4,A1,A12
00008000            NOP           5
06B4F23A            SUBSP.L2X     B7,A13,B13
00002000            NOP           2
0604805A            ADD.L2        4,B1,B12
00008000            NOP           5
0119BEB8            SUBSP.S1X     B6,A13,A2
00002000            NOP           2
0604A1A0            ADD.S1        5,A1,A12
00008000            NOP           5
06B4FE3A            SUBSP.S2X     B7,A13,B13
00002000            NOP           2
0604A1A2            ADD.S2        5,B1,B12
00008000            NOP           5
000029C2            SUB.D2        B0,0x1,B0
2FFFED92     [ B0]  B.S2          LOOP (PC-148 = 0x118118cc)
00008000            NOP           5
07BC18F0            OR.D1X        0,SP,FP
053CC2E5            LDW.D2T1      *+SP[6],A10
053D6266 ||         LDW.D1T2      *+FP[11],B10
05BCA2E5            LDW.D2T1      *+SP[5],A11
05BD4266 ||         LDW.D1T2      *+FP[10],B11
063C82E5            LDW.D2T1      *+SP[4],A12
063D2266 ||         LDW.D1T2      *+FP[9],B12
06BC62E5            LDW.D2T1      *+SP[3],A13
06BD0266 ||         LDW.D1T2      *+FP[8],B13
073C42E5            LDW.D2T1      *+SP[2],A14
073CE267 ||         LDW.D1T2      *+FP[7],DP
000C0362 ||         B.S2          B3
07BC22E4            LDW.D2T1      *+SP[1],FP
178014FE            ADDAW.D2      B15,20,SP
00000000            NOP           
00000000            NOP           
00000000            NOP           
00000000            NOP           
00000000            NOP

Furthermore, I have reproduced the issue on C6713.

Please see the attached files for the full assembly listings:

C6713_SUB.txt
07BE9DC2            SUBAW.D2      SP,0x14,SP
07BC11A1            MV.S1X        SP,FP
07BC22F4 ||         STW.D2T1      FP,*+SP[1]
073C42F5            STW.D2T1      A14,*+SP[2]
073CE276 ||         STW.D1T2      DP,*+FP[7]
06BC62F5            STW.D2T1      A13,*+SP[3]
06BD0277 ||         STW.D1T2      B13,*+FP[8]
00000000 ||         NOP           
063C82F5            STW.D2T1      A12,*+SP[4]
063D2276 ||         STW.D1T2      B12,*+FP[9]
05BCA2F5            STW.D2T1      A11,*+SP[5]
05BD4276 ||         STW.D1T2      B11,*+FP[10]
053CC2F5            STW.D2T1      A10,*+SP[6]
053D6277 ||         STW.D1T2      B10,*+FP[11]
00000001 ||         NOP           
00000000 ||         NOP           
07180941            MV.D1         A6,A14
051811A1 ||         MV.S1X        B6,A10
05183D42 ||         ADDAW.D2      B6,0x1,B10
0290105B            MV.L2X        A4,B5
000002AA ||         MVK.S2        0x0005,B0
04383764            LDDW.D1T1     *A14++[1],A9:A8
03280265            LDW.D1T1      *+A10[0],A6
032802E6 ||         LDW.D2T2      *+B10[0],B6
04383764            LDDW.D1T1     *A14++[1],A9:A8
03A84265            LDW.D1T1      *+A10[2],A7
03A842E6 ||         LDW.D2T2      *+B10[2],B7
00000000            NOP           
06A0BE03            MPYSP.M2X     B5,A8,B13
06911E00 ||         MPYSP.M1X     A8,B4,A13
          LOOP:
008000A9            MVK.S1        0x0001,A1
008000AA ||         MVK.S2        0x0001,B1
00008000            NOP           5
06A0BE02            MPYSP.M2X     B5,A8,B13
00008000            NOP           5
06B4F218            ADDSP.L1X     A7,B13,A13
00002000            NOP           2
060440D8            SUB.L1        2,A1,A12
00008000            NOP           5
0119B21A            ADDSP.L2X     B13,A6,B2
00002000            NOP           2
060440DA            SUB.L2        2,B1,B12
00008000            NOP           5
06911E00            MPYSP.M1X     A8,B4,A13
00008000            NOP           5
0134D2B8            SUBSP.L1X     B6,A13,A2
00002000            NOP           2
060480D8            SUB.L1        4,A1,A12
00008000            NOP           5
06B4F23A            SUBSP.L2X     B7,A13,B13
00002000            NOP           2
060480DA            SUB.L2        4,B1,B12
00008000            NOP           5
000029C2            SUB.D2        B0,0x1,B0
2FFFF712     [ B0]  B.S2          LOOP (PC-72 = 0x00011f38)
00008000            NOP           5
07BC11A0            MV.S1X        SP,FP
053CC2E5            LDW.D2T1      *+SP[6],A10
053D6266 ||         LDW.D1T2      *+FP[11],B10
05BCA2E5            LDW.D2T1      *+SP[5],A11
05BD4266 ||         LDW.D1T2      *+FP[10],B11
063C82E5            LDW.D2T1      *+SP[4],A12
063D2267 ||         LDW.D1T2      *+FP[9],B12
00000000 ||         NOP           
06BC62E5            LDW.D2T1      *+SP[3],A13
06BD0266 ||         LDW.D1T2      *+FP[8],B13
073C42E5            LDW.D2T1      *+SP[2],A14
073CE267 ||         LDW.D1T2      *+FP[7],DP
000C0362 ||         B.S2          B3
07BC22E4            LDW.D2T1      *+SP[1],FP
07BE9D42            ADDAW.D2      SP,0x14,SP
00000000            NOP           
00000000            NOP           
00000000            NOP           
00000000            NOP           
00000000            NOP           
00000000            NOP           
00000000            NOP           
00000000            NOP           
00000000            NOP
 
C6713_ADD.txt
07BE9DC2            SUBAW.D2      SP,0x14,SP
07BC11A1            MV.S1X        SP,FP
07BC22F4 ||         STW.D2T1      FP,*+SP[1]
073C42F5            STW.D2T1      A14,*+SP[2]
073CE276 ||         STW.D1T2      DP,*+FP[7]
06BC62F5            STW.D2T1      A13,*+SP[3]
06BD0277 ||         STW.D1T2      B13,*+FP[8]
00000000 ||         NOP           
063C82F5            STW.D2T1      A12,*+SP[4]
063D2276 ||         STW.D1T2      B12,*+FP[9]
05BCA2F5            STW.D2T1      A11,*+SP[5]
05BD4276 ||         STW.D1T2      B11,*+FP[10]
053CC2F5            STW.D2T1      A10,*+SP[6]
053D6277 ||         STW.D1T2      B10,*+FP[11]
00000001 ||         NOP           
00000000 ||         NOP           
07180941            MV.D1         A6,A14
051811A1 ||         MV.S1X        B6,A10
05183D42 ||         ADDAW.D2      B6,0x1,B10
0290105B            MV.L2X        A4,B5
000002AA ||         MVK.S2        0x0005,B0
04383764            LDDW.D1T1     *A14++[1],A9:A8
03280265            LDW.D1T1      *+A10[0],A6
032802E6 ||         LDW.D2T2      *+B10[0],B6
04383764            LDDW.D1T1     *A14++[1],A9:A8
03A84265            LDW.D1T1      *+A10[2],A7
03A842E6 ||         LDW.D2T2      *+B10[2],B7
00000000            NOP           
06A0BE03            MPYSP.M2X     B5,A8,B13
06911E00 ||         MPYSP.M1X     A8,B4,A13
          LOOP:
008000A9            MVK.S1        0x0001,A1
008000AA ||         MVK.S2        0x0001,B1
00008000            NOP           5
06A0BE02            MPYSP.M2X     B5,A8,B13
00008000            NOP           5
06B4F218            ADDSP.L1X     A7,B13,A13
00002000            NOP           2
06044058            ADD.L1        2,A1,A12
00008000            NOP           5
0119B21A            ADDSP.L2X     B13,A6,B2
00002000            NOP           2
0604405A            ADD.L2        2,B1,B12
00008000            NOP           5
06911E00            MPYSP.M1X     A8,B4,A13
00008000            NOP           5
0134D2B8            SUBSP.L1X     B6,A13,A2
00002000            NOP           2
06048058            ADD.L1        4,A1,A12
00008000            NOP           5
06B4F23A            SUBSP.L2X     B7,A13,B13
00002000            NOP           2
0604805A            ADD.L2        4,B1,B12
00008000            NOP           5
000029C2            SUB.D2        B0,0x1,B0
2FFFF712     [ B0]  B.S2          LOOP (PC-72 = 0x00011f38)
00008000            NOP           5
07BC11A0            MV.S1X        SP,FP
053CC2E5            LDW.D2T1      *+SP[6],A10
053D6266 ||         LDW.D1T2      *+FP[11],B10
05BCA2E5            LDW.D2T1      *+SP[5],A11
05BD4266 ||         LDW.D1T2      *+FP[10],B11
063C82E5            LDW.D2T1      *+SP[4],A12
063D2267 ||         LDW.D1T2      *+FP[9],B12
00000000 ||         NOP           
06BC62E5            LDW.D2T1      *+SP[3],A13
06BD0266 ||         LDW.D1T2      *+FP[8],B13
073C42E5            LDW.D2T1      *+SP[2],A14
073CE267 ||         LDW.D1T2      *+FP[7],DP
000C0362 ||         B.S2          B3
07BC22E4            LDW.D2T1      *+SP[1],FP
07BE9D42            ADDAW.D2      SP,0x14,SP
00000000            NOP           
00000000            NOP           
00000000            NOP           
00000000            NOP           
00000000            NOP           
00000000            NOP           
00000000            NOP           
00000000            NOP           
00000000            NOP

Why do the ADD and SUB instructions not work?

Best regards,

Daisuke

 

  • Hi,

    Thanks for your post.

    Usually, the SUB instruction is positioned with NOP instructions so that its result is available 3 cycles before the SPKERNEL. If you review Example 7-12, which uses the SPLOOPW instruction, in the document below, you can see how the SUB instruction is positioned using NOP:

    http://www.ti.com/lit/ug/sprufe8b/sprufe8b.pdf

    Note: The ADDSP, SUBSP instructions executing in the .S functional unit use the rounding mode from and set the warning bits in FADCR. The warning bits in FADCR are the logical-OR of the warnings produced on the .L functional unit and the warnings produced by the ADDSP/SUBSP instructions on the .S functional unit (but not other instructions executing on the .S functional unit).

    Thanks & regards,

    Sivaraj K

  • Hi Sivaraj K,

    Thank you for your reply.

    The code in which the ADD and SUB instructions fail does not use any SPLOOP-related instructions. Since the issue also reproduces on the C6713, it should be unrelated to the SPLOOP-related instructions.

    When the ADDSP and SUBSP instructions use the rounding mode, can those instructions affect other instructions on the same unit?

    Best regards,

    Daisuke

     

  • Hi Sivaraj K,

    I will post this issue to the TI C/C++ Compiler forum, since assembly code generated by the C compiler does not have the issue.

    Best regards,

    Daisuke

     

  • Hi,

    I am moving your post to the right forum so that it can be better answered.

    Thanks & regards,
    Sivaraj K
  • I need some more context.  Where does the problem assembly code originate?  Is it hand-coded?  Is it generated by the compiler from C code?

    Daisuke Maeda said:
    The SUB instruction does not work when the same unit is used for the instructions and when there is NOP 2 between instructions.

    Please precisely describe the result you expect, the result you get, and the difference between them.

    Thanks and regards,

    -George

  • Hi Sivaraj K,

    Thank you for your cooperation.

    Hi George,

    Thank you for your reply.

    The problem assembly code was originally generated by the code generation tools from hand-coded source.

    The original code:

      ADDSP  A_hh_Re2,B_tmp0,A_tmp0
    ||  SUBSP  B_hh_Im2,A_tmp0,B_tmp0

      NOP

      ADDSP  B_tmp2,A_tmp3,A_tmp3
    ||  ADDSP  A_tmp2,B_tmp3,B_tmp3

      LDW   *+A_hh_ptr[4],A_hh_Re1
    ||  LDW   *+B_hh_ptr[4],B_hh_Im1
    ||  SUB   B_i,1,B_i

     [B_i] B   LOOP
    ||  ADDSP  B_tmp1,A_tmp0,A_tmp0
    ||  ADDSP  A_tmp1,B_tmp0,B_tmp0
    ||  LDW   *+A_hh_ptr[6],A_hh_Re2
    ||  LDW   *+B_hh_ptr[6],B_hh_Im2

    The disassembled code:

    06B4FE19            ADDSP.S1X     A7,B13,A13
    06B4FE3A ||         SUBSP.S2X     B7,A13,B13
    00000000            NOP
    05897E19            ADDSP.S1X     A11,B2,A11
    05897E1A ||         ADDSP.S2X     B11,A2,B11
    03288265            LDW.D1T1      *+A10[4],A6
    032882E7 ||         LDW.D2T2      *+B10[4],B6
    0003E1A2 ||         SUB.S2        B0,1,B0
    2FFFF693     [ B0]  B.S2          LOOP (PC-76 = 0x118118d4)
    06B1BE19 ||         ADDSP.S1X     A13,B12,A13
    06B1B21B ||         ADDSP.L2X     B13,A12,B13
    03A8C265 ||         LDW.D1T1      *+A10[6],A7
    03A8C2E6 ||         LDW.D2T2      *+B10[6],B7

    After the SUB instruction executes, the destination register (B0) is not updated. In the test cases, the ADD and SUB instructions show the same result.

    Best regards,

    Daisuke

     

  • I can confirm there is a pipeline conflict between the SUBSP in this instruction packet ...

    Daisuke Maeda said:
    06B4FE19            ADDSP.S1X     A7,B13,A13
    06B4FE3A ||         SUBSP.S2X     B7,A13,B13

    and the SUB in this instruction packet ...

    Daisuke Maeda said:
    03288265            LDW.D1T1      *+A10[4],A6
    032882E7 ||         LDW.D2T2      *+B10[4],B6
    0003E1A2 ||         SUB.S2        B0,1,B0

    This conflict is documented in Table 4-25 of the C674x CPU manual.  

    With regard to hand-coded assembly, it is the user's responsibility to avoid pipeline conflicts like this. The assembler does not detect them.

    I agree it is difficult to avoid pipeline conflicts like this one.  I recommend writing this code in linear assembly, and allowing the compiler tools to schedule the instructions for you.
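
Because the assembler does not flag this kind of conflict, a small script can help spot candidates in a disassembly listing. The following is a rough sketch under stated assumptions, not a TI tool: it only knows ADDSP/SUBSP as 4-cycle instructions and ADD/SUB as single-cycle instructions, it treats "same unit, issued exactly 3 cycles later" as the conflict condition, and it does not follow branches or loops. The names `parse` and `find_conflicts` are hypothetical.

```python
FOUR_CYCLE = {"ADDSP", "SUBSP"}   # assumed subset of 4-cycle instructions
SINGLE_CYCLE = {"ADD", "SUB"}     # assumed subset of single-cycle instructions

LISTING = """\
06B4FE19            ADDSP.S1X     A7,B13,A13
06B4FE3A ||         SUBSP.S2X     B7,A13,B13
00000000            NOP
05897E19            ADDSP.S1X     A11,B2,A11
05897E1A ||         ADDSP.S2X     B11,A2,B11
03288265            LDW.D1T1      *+A10[4],A6
032882E7 ||         LDW.D2T2      *+B10[4],B6
0003E1A2 ||         SUB.S2        B0,1,B0
"""


def parse(listing):
    """Return (cycle, mnemonic, unit) tuples for a straight-line listing.

    A line containing '||' joins the previous execute packet; any other line
    starts a new packet. 'NOP n' occupies n cycles.
    """
    ops = []
    cycle, next_cycle = -1, 0
    for line in listing.splitlines():
        fields = line.split()
        if not fields:
            continue
        parallel = "||" in fields
        # Drop the opcode word and the parallel marker.
        fields = [f for f in fields[1:] if f != "||"]
        if not parallel:
            cycle = next_cycle
            next_cycle = cycle + 1
        name, _, unit = fields[0].partition(".")
        if name == "NOP":
            count = int(fields[1]) if len(fields) > 1 else 1
            next_cycle = cycle + count
            continue
        ops.append((cycle, name, unit[:2]))   # 'S1X' -> 'S1', 'D1T1' -> 'D1'
    return ops


def find_conflicts(ops):
    """Pair each 4-cycle op with a single-cycle op on the same unit 3 cycles later."""
    return [(a, b) for a in ops for b in ops
            if a[1] in FOUR_CYCLE and b[1] in SINGLE_CYCLE
            and a[2] == b[2] and b[0] - a[0] == 3]


for first, second in find_conflicts(parse(LISTING)):
    print("possible conflict:", first, "->", second)
```

Run on the packets quoted above, the sketch reports the SUBSP on .S2 in cycle 0 and the SUB on .S2 in cycle 3. Letting the compiler tools schedule linear assembly remains the more reliable option; a script like this is only a quick sanity check.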

    Thanks and regards,

    -George

  • Hi George,

    Thank you for your reply.

    I also found the conflict on the .L unit documented in Table 4-37 of the C674x CPU manual and in Table 4-33 of the C67x CPU manual.

    Best regards,

    Daisuke