Wrong result with compiler optimization level above 2 (-O2)



Hi, I've found an inexplicable compiler behavior. The following code does NOT give the correct answer when the compiler optimization level is set above 2 (-O2).


#include <vector>
#include <complex>

using namespace std; /* Because of the library: "complex" */

typedef float     _REAL;
typedef complex<_REAL>  _COMPLEX;

inline _COMPLEX    Conj(const _COMPLEX& cI)
         {return conj(cI);}

inline _REAL     SqMag(const _COMPLEX& cI)
        {return cI.real() * cI.real() + cI.imag() * cI.imag();}

 
vector<_COMPLEX> HistoryBuf(17, _COMPLEX(0.0, 0.0));
vector<_COMPLEX> cGuardCorrBlock(16, _COMPLEX(0.0, 0.0));
vector<_REAL>  rGuardPowBlock(16, 0.0);

 
int main()
{

 int j;


 for(j = 0; j < 17; j++)
 {
  HistoryBuf[j] = (_REAL)j;

 }

 

 for(j = 0; j < 16; j++)
 {
   
  cGuardCorrBlock[j] += HistoryBuf[j] *
   Conj(HistoryBuf[j + 1]);

  
  rGuardPowBlock[j] += SqMag(HistoryBuf[j]) +
   SqMag(HistoryBuf[j + 1]);


  System_printf("\r\n rGuardPowBlock[%d]=%f; \n",j, rGuardPowBlock[j]);


 }

 return 0;

}

The error is in the printed result. Note: when I turn optimization off, the result is correct. Could the development team behind CodeGen look into this matter? Thanks in advance!

PS: My DSP platform is the C6748, with C/C++ compiler 7.4.6.
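
Because HistoryBuf[j] holds j + 0i, rGuardPowBlock[j] should end up as j^2 + (j+1)^2 after the single pass, so the test case can check itself instead of relying on reading the printout. The following is only a sketch of that idea: the names REAL, COMPLEX, ExpectedPow and CountMismatches are made up for illustration (the leading underscores are dropped because such identifiers are reserved in C++), the vectors are local rather than global, and it has not been run on the C6748 toolchain.

```cpp
#include <complex>
#include <vector>

typedef float               REAL;     // stands in for _REAL from the post
typedef std::complex<REAL>  COMPLEX;  // stands in for _COMPLEX from the post

// Same arithmetic as SqMag() in the original test case.
inline REAL SqMag(const COMPLEX& c)
{
    return c.real() * c.real() + c.imag() * c.imag();
}

// Hypothetical helper: the value rGuardPowBlock[j] should hold,
// given that HistoryBuf[j] = j + 0i.
inline REAL ExpectedPow(int j)
{
    return (REAL)(j * j + (j + 1) * (j + 1));
}

// Hypothetical helper: runs the power accumulation from the post and
// returns how many entries differ from the closed-form expectation.
int CountMismatches()
{
    std::vector<COMPLEX> hist(17);
    std::vector<REAL>    pow(16, 0.0f);

    for (int j = 0; j < 17; j++)
        hist[j] = (REAL)j;

    int bad = 0;
    for (int j = 0; j < 16; j++)
    {
        pow[j] += SqMag(hist[j]) + SqMag(hist[j + 1]);
        if (pow[j] != ExpectedPow(j))   // exact: all values are small integers
            ++bad;
    }
    return bad;
}
```

A correctly compiled build should give CountMismatches() == 0 at every optimization level; a nonzero count would flag the problem without depending on the printf formatting.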

 

  • Hi,

    I tried a few things with your code and found something interesting that I can't explain.
    If I create auxiliary variables a, b and c as in the code below, c should have the same value as rGuardPowBlock[j], but it doesn't.
    (I'm using C6678 functional simulator and C6000 compiler 7.4.2 with CCS 5.1.0.09)
    int main()
    {

     int j;
     _REAL a, b, c;

     for(j = 0; j < 17; j++)
     {
      HistoryBuf[j] = (_REAL)j;
     }

     for(j = 0; j < 16; j++)
     {

      cGuardCorrBlock[j] += HistoryBuf[j] *
       Conj(HistoryBuf[j + 1]);

      a = SqMag(HistoryBuf[j]);
      b = SqMag(HistoryBuf[j + 1]);
      c = a + b;
      printf("%d -> a = %f, b = %f, c = %f", j, a, b, c);

      rGuardPowBlock[j] += (SqMag(HistoryBuf[j]) + SqMag(HistoryBuf[j + 1]));

      printf("\r\n rGuardPowBlock[%d]=%f; \n",j, rGuardPowBlock[j]);
     }

     return 0;
    }

    Output (using C6678 simulator with optimization -o3):
    [TMS320C66x_0] 0 -> a = 0.000000, b = 1.000000, c = 1.000000
    [TMS320C66x_0]  rGuardPowBlock[0]=0.000000;
    [TMS320C66x_0] 1 -> a = 1.000000, b = 4.000000, c = 5.000000
    [TMS320C66x_0]  rGuardPowBlock[1]=2.000000;
    [TMS320C66x_0] 2 -> a = 4.000000, b = 9.000000, c = 13.000000
    [TMS320C66x_0]  rGuardPowBlock[2]=8.000000;
    [TMS320C66x_0] 3 -> a = 9.000000, b = 16.000000, c = 25.000000
    [TMS320C66x_0]  rGuardPowBlock[3]=18.000000;
    [TMS320C66x_0] 4 -> a = 16.000000, b = 25.000000, c = 41.000000
    [TMS320C66x_0]  rGuardPowBlock[4]=32.000000;
    [TMS320C66x_0] 5 -> a = 25.000000, b = 36.000000, c = 61.000000
    [TMS320C66x_0]  rGuardPowBlock[5]=50.000000;
    [TMS320C66x_0] 6 -> a = 36.000000, b = 49.000000, c = 85.000000
    [TMS320C66x_0]  rGuardPowBlock[6]=72.000000;
    [TMS320C66x_0] 7 -> a = 49.000000, b = 64.000000, c = 113.000000
    [TMS320C66x_0]  rGuardPowBlock[7]=98.000000;
    [TMS320C66x_0] 8 -> a = 64.000000, b = 81.000000, c = 145.000000
    [TMS320C66x_0]  rGuardPowBlock[8]=128.000000;
    [TMS320C66x_0] 9 -> a = 81.000000, b = 100.000000, c = 181.000000
    [TMS320C66x_0]  rGuardPowBlock[9]=162.000000;
    [TMS320C66x_0] 10 -> a = 100.000000, b = 121.000000, c = 221.000000
    [TMS320C66x_0]  rGuardPowBlock[10]=200.000000;
    [TMS320C66x_0] 11 -> a = 121.000000, b = 144.000000, c = 265.000000
    [TMS320C66x_0]  rGuardPowBlock[11]=242.000000;
    [TMS320C66x_0] 12 -> a = 144.000000, b = 169.000000, c = 313.000000
    [TMS320C66x_0]  rGuardPowBlock[12]=288.000000;
    [TMS320C66x_0] 13 -> a = 169.000000, b = 196.000000, c = 365.000000
    [TMS320C66x_0]  rGuardPowBlock[13]=338.000000;
    [TMS320C66x_0] 14 -> a = 196.000000, b = 225.000000, c = 421.000000
    [TMS320C66x_0]  rGuardPowBlock[14]=392.000000;
    [TMS320C66x_0] 15 -> a = 225.000000, b = 256.000000, c = 481.000000
    [TMS320C66x_0]  rGuardPowBlock[15]=450.000000;

    As you can see, c has the right value and rGuardPowBlock doesn't.
    Now if I change
    rGuardPowBlock[j] += (SqMag(HistoryBuf[j]) + SqMag(HistoryBuf[j + 1]));
    to
    rGuardPowBlock[j] += c;

    I get the correct output:
    [TMS320C66x_0] 0 -> a = 0.000000, b = 1.000000, c = 1.000000
    [TMS320C66x_0]  rGuardPowBlock[0]=1.000000;
    [TMS320C66x_0] 1 -> a = 1.000000, b = 4.000000, c = 5.000000
    [TMS320C66x_0]  rGuardPowBlock[1]=5.000000;
    [TMS320C66x_0] 2 -> a = 4.000000, b = 9.000000, c = 13.000000
    [TMS320C66x_0]  rGuardPowBlock[2]=13.000000;
    [TMS320C66x_0] 3 -> a = 9.000000, b = 16.000000, c = 25.000000
    [TMS320C66x_0]  rGuardPowBlock[3]=25.000000;
    [TMS320C66x_0] 4 -> a = 16.000000, b = 25.000000, c = 41.000000
    [TMS320C66x_0]  rGuardPowBlock[4]=41.000000;
    [TMS320C66x_0] 5 -> a = 25.000000, b = 36.000000, c = 61.000000
    [TMS320C66x_0]  rGuardPowBlock[5]=61.000000;
    [TMS320C66x_0] 6 -> a = 36.000000, b = 49.000000, c = 85.000000
    [TMS320C66x_0]  rGuardPowBlock[6]=85.000000;
    [TMS320C66x_0] 7 -> a = 49.000000, b = 64.000000, c = 113.000000
    [TMS320C66x_0]  rGuardPowBlock[7]=113.000000;
    [TMS320C66x_0] 8 -> a = 64.000000, b = 81.000000, c = 145.000000
    [TMS320C66x_0]  rGuardPowBlock[8]=145.000000;
    [TMS320C66x_0] 9 -> a = 81.000000, b = 100.000000, c = 181.000000
    [TMS320C66x_0]  rGuardPowBlock[9]=181.000000;
    [TMS320C66x_0] 10 -> a = 100.000000, b = 121.000000, c = 221.000000
    [TMS320C66x_0]  rGuardPowBlock[10]=221.000000;
    [TMS320C66x_0] 11 -> a = 121.000000, b = 144.000000, c = 265.000000
    [TMS320C66x_0]  rGuardPowBlock[11]=265.000000;
    [TMS320C66x_0] 12 -> a = 144.000000, b = 169.000000, c = 313.000000
    [TMS320C66x_0]  rGuardPowBlock[12]=313.000000;
    [TMS320C66x_0] 13 -> a = 169.000000, b = 196.000000, c = 365.000000
    [TMS320C66x_0]  rGuardPowBlock[13]=365.000000;
    [TMS320C66x_0] 14 -> a = 196.000000, b = 225.000000, c = 421.000000
    [TMS320C66x_0]  rGuardPowBlock[14]=421.000000;
    [TMS320C66x_0] 15 -> a = 225.000000, b = 256.000000, c = 481.000000
    [TMS320C66x_0]  rGuardPowBlock[15]=481.000000;

    But if I now comment out the line
    printf("%d -> a = %f, b = %f, c = %f", j, a, b, c);
    the 'for' loop becomes:
     for(j = 0; j < 16; j++)
     {

      cGuardCorrBlock[j] += HistoryBuf[j] *
       Conj(HistoryBuf[j + 1]);

      a = SqMag(HistoryBuf[j]);
      b = SqMag(HistoryBuf[j + 1]);
      c = a + b;

      rGuardPowBlock[j] += c;

      printf("\r\n rGuardPowBlock[%d]=%f; \n",j, rGuardPowBlock[j]);
     }
    And I get the wrong output again:
    [TMS320C66x_0]  rGuardPowBlock[0]=0.000000;
    [TMS320C66x_0]
    [TMS320C66x_0]  rGuardPowBlock[1]=2.000000;
    [TMS320C66x_0]
    [TMS320C66x_0]  rGuardPowBlock[2]=8.000000;
    [TMS320C66x_0]
    [TMS320C66x_0]  rGuardPowBlock[3]=18.000000;
    [TMS320C66x_0]
    [TMS320C66x_0]  rGuardPowBlock[4]=32.000000;
    [TMS320C66x_0]
    [TMS320C66x_0]  rGuardPowBlock[5]=50.000000;
    [TMS320C66x_0]
    [TMS320C66x_0]  rGuardPowBlock[6]=72.000000;
    [TMS320C66x_0]
    [TMS320C66x_0]  rGuardPowBlock[7]=98.000000;
    [TMS320C66x_0]
    [TMS320C66x_0]  rGuardPowBlock[8]=128.000000;
    [TMS320C66x_0]
    [TMS320C66x_0]  rGuardPowBlock[9]=162.000000;
    [TMS320C66x_0]
    [TMS320C66x_0]  rGuardPowBlock[10]=200.000000;
    [TMS320C66x_0]
    [TMS320C66x_0]  rGuardPowBlock[11]=242.000000;
    [TMS320C66x_0]
    [TMS320C66x_0]  rGuardPowBlock[12]=288.000000;
    [TMS320C66x_0]
    [TMS320C66x_0]  rGuardPowBlock[13]=338.000000;
    [TMS320C66x_0]
    [TMS320C66x_0]  rGuardPowBlock[14]=392.000000;
    [TMS320C66x_0]
    [TMS320C66x_0]  rGuardPowBlock[15]=450.000000;

    I don't know whether this is a compiler bug or whether we don't fully understand how the optimizer works, but the generated assembly code is below.
    Regards
    Johannes

    Assembly code for correct output (with printf line):
    34         for(j = 0; j < 16; j++)
    00009874:   0F7C805B            ADD.L2        4,B31,B30
    00009878:   0EBC22F7 ||         STW.D2T2      B29,*+B15[1]
    0000987c:   022FFF8B ||         SET.S2        B11,31,31,B4
    00009880:   0EA90079 ||         ADD.L1        A8,A10,A29
    00009884:   0F2921E1 ||         ADD.S1        A9,A10,A30
    00009888:   0FA54840 ||         ADD.D1        A9,A10,A31
    35         {
              $C$L4:
    0000988c:   00000000            NOP           
    00009890:   03909DF8            XOR.L1X       A4,B4,A7
    00009894:   033CC3C4            STDW.D2T1     A7:A6,*+B15[6]
    00009898:   02FC0324            LDNDW.D1T1    *+A31[0],A5:A4
    0000989c:   023CC3E6            LDDW.D2T2     *+B15[6],B5:B4
    000098a0:   00006000            NOP           4
    000098a4:   03909E02            MPYSP.M2X     B4,A4,B7
    000098a8:   0314BE03            MPYSP.M2X     B5,A5,B6
    000098ac:   0190BE00 ||         MPYSP.M1X     A5,B4,A3
    000098b0:   0210BE02            MPYSP.M2X     B5,A4,B4
    000098b4:   00004000            NOP           3
    000098b8:   038C979B            FADDSP.L2X      B4,A3,B7
    000098bc:   0318EEDA ||         FSUBSP.S2       B7,B6,B6
    000098c0:   00000000            NOP           
    000098c4:   02FC02E6            LDW.D2T2      *+B31[0],B5
    000098c8:   033CE3C6            STDW.D2T2     B7:B6,*+B15[7]
    000098cc:   033CE3E6            LDDW.D2T2     *+B15[7],B7:B6
    000098d0:   00006000            NOP           4
    000098d4:   0294C79A            FADDSP.L2       B6,B5,B5
    000098d8:   00002000            NOP           2
    000098dc:   02FC02F6            STW.D2T2      B5,*+B31[0]
    000098e0:   02F802E6            LDW.D2T2      *+B30[0],B5
    000098e4:   00006000            NOP           4
    000098e8:   0294E79A            FADDSP.L2       B7,B5,B5
    000098ec:   00002000            NOP           2
    000098f0:   02F802F6            STW.D2T2      B5,*+B30[0]
    000098f4:   03F80324            LDNDW.D1T1    *+A30[0],A7:A6
    000098f8:   00000000            NOP           
    000098fc:   02F40324            LDNDW.D1T1    *+A29[0],A5:A4
    00009900:   00002000            NOP           2
    00009904:   039CEE00            MPYSP.M1      A7,A7,A7
    00009908:   0198CE00            MPYSP.M1      A6,A6,A3
    0000990c:   0294AE00            MPYSP.M1      A5,A5,A5
    00009910:   02108E00            MPYSP.M1      A4,A4,A4
    00009914:   00004000            NOP           3
    00009918:   0210A799            FADDSP.L1       A5,A4,A4
    0000991c:   018CEE98 ||         FADDSP.S1       A7,A3,A3
    00009920:   00002000            NOP           2
    00009924:   0008A411            B.S1          printf (PC+17696 = 0x0000de40)
    00009928:   070C8798 ||         FADDSP.L1       A4,A3,A14
    0000992c:   021000A0            SPDP.S1       A4,A5:A4
    00009930:   020C10A2            SPDP.S2X      A3,B5:B4
    00009934:   023800A1            SPDP.S1       A14,A5:A4
    00009938:   023C63C4 ||         STDW.D2T1     A5:A4,*+B15[3]
    0000993c:   023C43C6            STDW.D2T2     B5:B4,*+B15[2]
    00009940:   01820163            ADDKPC.S2     $C$RL0 (PC+8 = 0x00009948),B3,0
    00009944:   023C83C4 ||         STDW.D2T1     A5:A4,*+B15[4]
              $C$RL0:
    00009948:   022D9C40            ADDAW.D1      A11,A12,A4
    0000994c:   01900264            LDW.D1T1      *+A4[0],A3
    00009950:   0FAD9C40            ADDAW.D1      A11,A12,A31
    00009954:   02031F2A            MVK.S2        0x063e,B4
    00009958:   020000EA            MVKH.S2       0x10000,B4
    0000995c:   063C42F4            STW.D2T1      A12,*+B15[2]
    00009960:   028DC798            FADDSP.L1       A14,A3,A5
    00009964:   BC45                STW.D2T2      B4,*B15[1]
    00009966:   0C6E                NOP           1
    00009968:   02900274            STW.D1T1      A5,*+A4[0]
    0000996c:   01FC0264            LDW.D1T1      *+A31[0],A3
    00009970:   020C00A0            SPDP.S1       A3,A5:A4
    00009974:   00000000            NOP           
    00009978:   10089C13            CALLP.S2      printf (PC+17632 = 0x0000de40),B3
    0000997c:   E0500000            .fphead       p, l, W, BU, nobr, nosat, 0000010b
    00009980:   023C43C4            STDW.D2T1     A5:A4,*+B15[2]
    00009984:   06302058            ADD.L1        1,A12,A12
    00009988:   04B06CA1            SHL.S1        A12,0x3,A9
    0000998c:   00358AF8 ||         CMPLT.L1      A12,A13,A0
    00009990:   04250059            ADD.L1        8,A9,A8
    00009994:   CE8310AB ||  [ A0]  MVK.S2        0x0621,B29
    00009998:   C63C42F5 ||  [ A0]  STW.D2T1      A12,*+B15[2]
    0000999c:   D1B40FDB ||  [!A0]  MV.L2         B13,B3
    000099a0:   D22C16A1 ||  [!A0]  MV.S1X        B11,A4
    000099a4:   CF254840 ||  [ A0]  ADD.D1        A9,A10,A30
    000099a8:   01A90079            ADD.L1        A8,A10,A3
    000099ac:   CFFFDD91 ||  [ A0]  B.S1          $C$L4 (PC-276 = 0x0000988c)
    000099b0:   CFA5507B ||  [ A0]  ADD.L2X       B10,A9,B31
    000099b4:   CE8000EB ||  [ A0]  MVKH.S2       0x10000,B29
    000099b8:   CEA14840 ||  [ A0]  ADD.D1        A8,A10,A29
    000099bc:   C20C2265     [ A0]  LDW.D1T1      *+A3[1],A4
    000099c0:   CEBC22F7 ||  [ A0]  STW.D2T2      B29,*+B15[1]
    000099c4:   CF7C805B ||  [ A0]  ADD.L2        4,B31,B30
    000099c8:   C22FFF8B ||  [ A0]  SET.S2        B11,31,31,B4
    000099cc:   CFA92078 ||  [ A0]  ADD.L1        A9,A10,A31
    000099d0:   C30C0264     [ A0]  LDW.D1T1      *+A3[0],A6
    000099d4:   D73E52E4     [!A0]  LDW.D2T1      *++B15[18],A14
    000099d8:   D53C33E4     [!A0]  LDDW.D2T1     *++B15[1],A11:A10
    51         }

    Assembly code for wrong output (without printf line):

    34         for(j = 0; j < 16; j++)
    0000986c:   0F7C805B            ADD.L2        4,B31,B30
    00009870:   0DAD9C41 ||         ADDAW.D1      A11,A12,A27
    00009874:   063C42F5 ||         STW.D2T1      A12,*+B15[2]
    00009878:   0FA86079 ||         ADD.L1        A3,A10,A31
    0000987c:   E0200000            .fphead       n, l, W, BU, nobr, nosat, 0000001b
    00009880:   0F2861E1            ADD.S1        A3,A10,A30
    00009884:   022FFF8A ||         SET.S2        B11,31,31,B4
    35         {
              $C$L4:
    00009888:   00000000            NOP           
    0000988c:   0390BDF8            XOR.L1X       A5,B4,A7
    00009890:   033C83C4            STDW.D2T1     A7:A6,*+B15[4]
    00009894:   02FC0324            LDNDW.D1T1    *+A31[0],A5:A4
    00009898:   043C83E6            LDDW.D2T2     *+B15[4],B9:B8
    0000989c:   00006000            NOP           4
    000098a0:   02953E02            MPYSP.M2X     B9,A5,B5
    000098a4:   02A0BE01            MPYSP.M1X     A5,B8,A5
    000098a8:   03111E02 ||         MPYSP.M2X     B8,A4,B6
    000098ac:   02113E02            MPYSP.M2X     B9,A4,B4
    000098b0:   00004000            NOP           3
    000098b4:   0394979B            FADDSP.L2X      B4,A5,B7
    000098b8:   0314CEDA ||         FSUBSP.S2       B6,B5,B6
    000098bc:   00000000            NOP           
    000098c0:   02FC02E6            LDW.D2T2      *+B31[0],B5
    000098c4:   033CA3C6            STDW.D2T2     B7:B6,*+B15[5]
    000098c8:   033CA3E6            LDDW.D2T2     *+B15[5],B7:B6
    000098cc:   00006000            NOP           4
    000098d0:   0294C79A            FADDSP.L2       B6,B5,B5
    000098d4:   00002000            NOP           2
    000098d8:   02FC02F6            STW.D2T2      B5,*+B31[0]
    000098dc:   02F802E6            LDW.D2T2      *+B30[0],B5
    000098e0:   00006000            NOP           4
    000098e4:   0294E79A            FADDSP.L2       B7,B5,B5
    000098e8:   00002000            NOP           2
    000098ec:   02F802F6            STW.D2T2      B5,*+B30[0]
    000098f0:   02F80324            LDNDW.D1T1    *+A30[0],A5:A4
    000098f4:   02108E00            MPYSP.M1      A4,A4,A4
    000098f8:   0294AE00            MPYSP.M1      A5,A5,A5
    000098fc:   E0100000            .fphead       p, l, W, BU, nobr, nosat, 0000000b
    00009900:   00004000            NOP           3
    00009904:   0290A798            FADDSP.L1       A5,A4,A5
    00009908:   0E740264            LDW.D1T1      *+A29[0],A28
    0000990c:   00000000            NOP           
    00009910:   0294A798            FADDSP.L1       A5,A5,A5
    00009914:   00002000            NOP           2
    00009918:   0270A798            FADDSP.L1       A5,A28,A4
    0000991c:   00002000            NOP           2
    00009920:   02740274            STW.D1T1      A4,*+A29[0]
    00009924:   01EC0264            LDW.D1T1      *+A27[0],A3
    00009928:   00000000            NOP           
    0000992c:   00089C10            B.S1          printf (PC+17632 = 0x0000de00)
    00009930:   00000000            NOP           
    00009934:   0204B2AA            MVK.S2        0x0965,B4
    00009938:   020C00A1            SPDP.S1       A3,A5:A4
    0000993c:   020000EA ||         MVKH.S2       0x10000,B4
    00009940:   023C22F6            STW.D2T2      B4,*+B15[1]
    00009944:   01830163            ADDKPC.S2     $C$RL0 (PC+12 = 0x0000994c),B3,0
    00009948:   023C43C4 ||         STDW.D2T1     A5:A4,*+B15[2]
              $C$RL0:
    0000994c:   06302058            ADD.L1        1,A12,A12
    00009950:   01B06CA1            SHL.S1        A12,0x3,A3
    00009954:   00358AF8 ||         CMPLT.L1      A12,A13,A0
    00009958:   020D0059            ADD.L1        8,A3,A4
    0000995c:   CDAD9C41 ||  [ A0]  ADDAW.D1      A11,A12,A27
    00009960:   CF2861E1 ||  [ A0]  ADD.S1        A3,A10,A30
    00009964:   C22FFF8B ||  [ A0]  SET.S2        B11,31,31,B4
    00009968:   C63C42F5 ||  [ A0]  STW.D2T1      A12,*+B15[2]
    0000996c:   D1B40FDA ||  [!A0]  MV.L2         B13,B3
    00009970:   02288079            ADD.L1        A4,A10,A4
    00009974:   CEAD9C41 ||  [ A0]  ADDAW.D1      A11,A12,A29
    00009978:   CFA861E1 ||  [ A0]  ADD.S1        A3,A10,A31
    0000997c:   CFFFE513 ||  [ A0]  B.S2          $C$L4 (PC-216 = 0x00009888)
    00009980:   CF8D507A ||  [ A0]  ADD.L2X       B10,A3,B31
    00009984:   C2902265     [ A0]  LDW.D1T1      *+A4[1],A5
    00009988:   CF7C805B ||  [ A0]  ADD.L2        4,B31,B30
    0000998c:   D22C1FD8 ||  [!A0]  MV.L1X        B11,A4
    00009990:   C3100265     [ A0]  LDW.D1T1      *+A4[0],A6
    00009994:   D7801852 ||  [!A0]  ADDK.S2       48,B15
    00009998:   D53C33E4     [!A0]  LDDW.D2T1     *++B15[1],A11:A10
    0000999c:   D63C33E4     [!A0]  LDDW.D2T1     *++B15[1],A13:A12
    47         }
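
One reading of the numbers and listings above, offered as an interpretation rather than a confirmed diagnosis: every wrong value equals 2*j^2 (0, 2, 8, 18, ..., 450), which is SqMag(HistoryBuf[j]) counted twice, whereas the intended value is j^2 + (j+1)^2. The wrong listing appears to match this: it contains a single LDNDW feeding two MPYSP instructions, followed by FADDSP A5,A5,A5, while the correct listing has two LDNDW loads and four MPYSP instructions. The sketch below models both computations in plain C++; the names Cplx, SqMagRef, MakeHistory, IntendedPow and DoubledPow are invented for illustration only.

```cpp
#include <complex>
#include <vector>

typedef std::complex<float> Cplx;

// Same arithmetic as SqMag() in the test case.
inline float SqMagRef(const Cplx& c)
{
    return c.real() * c.real() + c.imag() * c.imag();
}

// HistoryBuf as initialised in the test case: element j is j + 0i.
std::vector<Cplx> MakeHistory()
{
    std::vector<Cplx> h(17);
    for (int j = 0; j < 17; j++)
        h[j] = (float)j;
    return h;
}

// What the source code asks for: |h[j]|^2 + |h[j+1]|^2.
std::vector<float> IntendedPow()
{
    std::vector<Cplx>  h = MakeHistory();
    std::vector<float> out(16, 0.0f);
    for (int j = 0; j < 16; j++)
        out[j] += SqMagRef(h[j]) + SqMagRef(h[j + 1]);
    return out;
}

// Hypothetical model of the wrong build: |h[j]|^2 added to itself,
// as the single load plus FADDSP A5,A5,A5 seems to suggest.
std::vector<float> DoubledPow()
{
    std::vector<Cplx>  h = MakeHistory();
    std::vector<float> out(16, 0.0f);
    for (int j = 0; j < 16; j++)
    {
        float s = SqMagRef(h[j]);
        out[j] += s + s;
    }
    return out;
}
```

DoubledPow() reproduces the wrong printout (2.0 for j = 1, 450.0 for j = 15) and IntendedPow() reproduces the correct one (5.0 and 481.0), which is consistent with the optimizer treating the second SqMag call as a repeat of the first.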

  • Please show the exact compiler options you use when the problem occurs.

    Thanks and regards,

    -George

  • -O2; the other options are left at their default settings.

  • "C:/ti/bin/cl6x" -mv6600 --abi=eabi -O3

  • Regarding the very first post which begins ...

    zhongfan yang said:
    I've found an inexplicable compiler behavior. The following code does NOT give the correct answer when the compiler optimization level is set above 2 (-O2).

    Thank you for submitting a test case. I can reproduce your results. I have filed SDSCM00049539 in the SDOWP system to have this investigated. Feel free to follow it with the SDOWP link below in my signature.

    Thanks and regards,

    -George