This thread has been locked.

If you have a related question, please click the "Ask a related question" button in the top right corner. The newly created question will be automatically linked to this question.

Optimized code misses stack fill

Hi,

I have an optimizer problem with a piece of code that transfers multi-byte variables to a byte array, for later transmission over a byte based device such as a uart.

The C function code is the following:

void Var2buff(void *var,int *buf,int le)
{
    switch(le) {
        case 1:
            buf[0] = __byte((int *)var,0);
            break;

        case 2: {
            int tmp = *((int*)var);
            buf[0] = __byte((int *)&tmp,0);
            buf[1] = __byte((int *)&tmp,1);
            }
            break;
        default: {
            int64 tmp = *((int64*)var);
            int i;
            for (i = 0; i < le; i++)
                buf[i] = __byte((int *)&tmp,i);
            }
    }
    return;
}

The first 2 cases accomodate single and 2-byte variables, the for loop in the default case should handle any other case up to 8 bytes since the longest data type is 64bit.

The unoptimized assembly (annotated partly in dutch) is shown below. Note that this operates correct

Stack is 16 bit wide
32 bit variabelen occupy 2 positions, low word at low address, high word at high address
SP wijst naar volgende LEGE locatie

function entry:
&var (source data) in XAR4 (32 bit address)
&buf (uart buffer) in XAR5 (32 bit address)
le (length) in AL (16 bit value)

Stack
1,2: &var (=XAR4)
3,4: &buf (=XAR5)
5: length (=AL)
6: not used due to alignment requirements???
7,8: copy lower32 bits of tmp (default case)
9,10: copy upper 32 bits of tmp (default case)
11: for loop counter i (default case)
12: not used due to alignment requirements???


        Var2buff:
                                                        // Put values on stack
33add6:   FE0C        ADDB         SP, #12                // Reserve 12 locations
33add7:   9645        MOV          *-SP[5], AL            // 16 bit move, push AL (Accumulator Low) is int le (length of variable)
33add8:   A044        MOVL         *-SP[4], XAR5        // 32 bit move, push XAR5 is base address van buf, *+XAR5[0] = buf[0], *+XAR5[1] = buf[1] etc.
33add9:   A842        MOVL         *-SP[2], XAR4        // 32 bit move, push XAR4 is address van var
178         switch(le) {
33adda:   6F2C        SB           C$L10, UNC            // Unconditional short branch naar C$L10 (evaluatie van switch conditie staat op C$L10)
-------------------------------------------------------------------------------------------------------------------------
180                 buf[0] = __byte((int *)var,0);
        C$L6:                                                // One byte
33addb:   8A42        MOVL         XAR4, *-SP[2]            // Lees address van var terug in XAR4
33addc:   8344        MOVL         XAR5, *-SP[4]            // Lees base address van buf terug in XAR5
33addd:   C6C4        MOVB         AL.LSB, *+XAR4[0]        // laad lsb van AL met waarde uit var[0]. NB: wordt MSB gewist? Ja, pag 347 van spru430
33adde:   96C5        MOV          *+XAR5[0], AL            // Kopieer Accumulator Low naar buf[0]
181                 break;
33addf:   6F2C        SB           C$L11, UNC            // Unconditional short branch naar C$L11
-------------------------------------------------------------------------------------------------------------------------
184                 int tmp = *((int*)var);
        C$L7:                                                // Two bytes
33ade0:   C542        MOVL         XAR7, *-SP[2]            // Lees address van var in XAR7
33ade1:   92C7        MOV          AL, *+XAR7[0]            // Lees waarde in address var naar 16 bit Accumulator Low
33ade2:   9646        MOV          *-SP[6], AL                // Kopieer Accumulator Low naar stack => tmp staat op SP-6
185                 buf[0] = __byte((int *)&tmp,0);
33ade3:   8344        MOVL         XAR5, *-SP[4]            // Lees base address van buf terug in XAR5
33ade4:   5CAD        MOVZ         AR4, @SP                // mov 16 bit waarde van stack pointer?????, clear AR4-high
33ade5:   DC86        SUBB         XAR4, #6                // trek 6 af, XAR4 wijst nu naar SP-6, dus naar inhoud van tmp
33ade6:   C6C4        MOVB         AL.LSB, *+XAR4[0]        // Even argument: LSbyte (8 bit) wordt gekopieerd naar Accumulator Low,  MSB gewist (pag 347 van spru430)
33ade7:   96C5        MOV          *+XAR5[0], AL            // Kopieer Accumulator Low naar buf[0]
186                 buf[1] = __byte((int *)&tmp,1);
33ade8:   8344        MOVL         XAR5, *-SP[4]            // Lees base address van buf terug in XAR5
33ade9:   5CAD        MOVZ         AR4, @SP                // mov 16 bit waarde van stack pointer?????, clear AR4-high
33adea:   DC86        SUBB         XAR4, #6                // trek 6 af, XAR4 wijst nu naar SP-6, dus naar inhoud van tmp
33adeb:   C6CC        MOVB         AL.LSB, *+XAR4[1]        // Oneven argument: MSbyte (8 bit) wordt gekopieerd naar Accumulator Low,  MSB gewist (pag 347 van spru430)
33adec:   96CD        MOV          *+XAR5[1], AL            // Kopieer Accumulator Low naar buf[1]
188                 break;
33aded:   6F1E        SB           C$L11, UNC            // Unconditional short branch naar C$L11
-------------------------------------------------------------------------------------------------------------------------
190                 int64 tmp = *((int64*)var);
        C$L8:                                                // Four bytes
                                                            // Fill stack positions SP-7 (=upper), SP-8 (=lower), SP-9 (=upper) and SP-10 (=lower)
33adee:   8A42        MOVL         XAR4, *-SP[2]            // Copy address of 64-bit var back to XAR4
33adef:   C4D4        MOVL         XAR6, *+XAR4[2]            // Copy upper 32- of the 64 bits (16-bit indices 2 and 3) in address var to 32 bit XAR6
33adf0:   06C4        MOVL         ACC, *+XAR4[0]            // Copy lower 32- of the 64 bits (16-bit indices 0 en 1) in address var to 32 bit Accumulator
33adf1:   1E4A        MOVL         *-SP[10], ACC            // Store lower 32 bits of tmp in stack: fill stack positions SP-7 (=upper), SP-8 (=lower)
33adf2:   C248        MOVL         *-SP[8], XAR6            // Store upper 32 bits of tmp in stack: fill stack positions SP-9 (=upper) and SP-10 (=lower)
192                 for (i = 0; i < le; i++)
33adf3:   2B4B        MOV          *-SP[11], #0            // -SP[11] is for loop counter (=i)
33adf4:   9245        MOV          AL, *-SP[5]
33adf5:   544B        CMP          AL, *-SP[11]            // Compare 'le' with 'i'
33adf6:   6515        SB           C$L11, LEQ            // Exit if i is less or equal to le
193                     buf[i] = __byte((int *)&tmp,i);
        C$DW$L$_Var2buff$6$B, C$L9:
33adf7:   3B01        SETC         SXM
33adf8:   854B        MOV          ACC, *-SP[11]        // Copy 'i' to Accumulator
33adf9:   8344        MOVL         XAR5, *-SP[4]        // Copy SP-4 (=32 bit start address of destination 'buf[]')
33adfa:   584B        MOVZ         AR0, *-SP[11]        //
33adfb:   5CAD        MOVZ         AR4, @SP                // Copy SP (top of stack) to AR4
33adfc:   DC8A        SUBB         XAR4, #10            // Subtract 10, now XAR4 points to SP-10 = lsword of tmp
33adfd:   560100A5    ADDL         @XAR5, ACC            // Add index to XAR5 to get buf[i]
33adff:   C694        MOVB         AL.LSB, *+XAR4[AR0]    // get byte tmp[i] where i is a byte index!. Copy single byte to Accumulator Low,  MSB gewist (pag 347 van spru430)
33ae00:   96C5        MOV          *+XAR5[0], AL        // copy byte in Accumulator to address pointed to by XAR5 (=buf[i]
192                 for (i = 0; i < le; i++)
33ae01:   0A4B        INC          *-SP[11]            // increment for loop counter (=i)
33ae02:   9245        MOV          AL, *-SP[5]        // get length
33ae03:   544B        CMP          AL, *-SP[11]        // compare length with loop counter i
33ae04:   62F3        SB           C$L9, GT            // branch to repeat loop
194                 }
        C$DW$L$_Var2buff$6$E:
33ae05:   6F06        SB           C$L11, UNC            // exit loop, unconditional short branch naar C$L11
-------------------------------------------------------------------------------------------------------------------------
178         switch(le) {
        C$L10:
33ae06:   5201        CMPB         AL, #0x1
33ae07:   ECD4        SBF          C$L6, EQ        // Short branch fast
33ae08:   5202        CMPB         AL, #0x2
33ae09:   ECD7        SBF          C$L7, EQ
33ae0a:   6FE4        SB           C$L8, UNC
199     }
        C$L11:
33ae0b:   FE8C        SUBB         SP, #12
33ae0c:   0006        LRETR        
202     {

As mentioned, this assembly code is correct.

When I enable the compiler at level 2 I get the following (broken) assembly code

Optimized code

Stack is 16 bit wide
32 bit variables occupy 2 positions, low word at the lower address, high word at the higher address
SP points to next empty location of the stack

function entry:
&var in XAR4 (32 bit address)
&buf in XAR5 (32 bit address)
le (length) in AL (16 bit waarde)

Stack
1,2,3,4: 64 bit variable tmp when le == 4
5: tmp when le==2

// AL (Accumulator Low) is int le (length of variable)
// XAR5 is base adres van buf, *+XAR5[0] = buf[0], *+XAR5[1] = buf[1] etc.
// XAR7 is address of var (XAR4 unoptimized)


33ad41:   FE06        ADDB         SP, #6                // Reserve 6 locations
33ad42:   C5A4        MOVL         XAR7, @XAR4            //????????????????
178         switch(le) {
33ad43:   5201        CMPB         AL, #0x1                // Compare Accumulator Low ('le') with 0x01
33ad44:   EC1B        SBF          C$L3, EQ                // Jump for case 1:
33ad45:   5202        CMPB         AL, #0x2                // Compare Accumulator Low ('le') with 0x02
33ad46:   EC0E        SBF          C$L2, EQ                // Jump for case 2:
-------------------------------------------------------------------------------------------------------------------------
                                                    // Implementation of default:
192                 for (i = 0; i < le; i++)
33ad47:   5200        CMPB         AL, #0x0                // Compare Accumulator Low ('le') with 0x00
33ad48:   6519        SB           C$L4, LEQ            // Immediate exit (case 0:)

                                                        // Default (>2 bytes) implementation
33ad49:   9CFF        ADDB         AL, #-1                // decrement 'le'
33ad4a:   88A9        MOVZ         AR6, @AL
33ad4b:   D000        MOVB         XAR0, #0x0            // XAR0 is for loop counter (=i)
193                     buf[i] = __byte((int *)&tmp,i);
        C$DW$L$_Var2buff$5$B, C$L1:
33ad4c:   5CAD        MOVZ         AR4, @SP                // Copy SP (top of stack) to AR4
33ad4d:   DC84        SUBB         XAR4, #4                // Subtract 4, now XAR4 points to SP-4 = lsword of tmp
33ad4e:   C694        MOVB         AL.LSB, *+XAR4[AR0]    // read tmp byte from stack (NB values are not copied to the stack!)
33ad4f:   9685        MOV          *XAR5++, AL
192                 for (i = 0; i < le; i++)
33ad50:   D801        ADDB         XAR0, #1            // increment for loop counter (=i)
33ad51:   000EFFFB    BANZ         -5,AR6--
        C$DW$L$_Var2buff$5$E:
33ad53:   6F0E        SB           C$L4, UNC
-------------------------------------------------------------------------------------------------------------------------
184                 int tmp = *((int*)var);
        C$L2:                                                // Two bytes
33ad54:   92C7        MOV          AL, *+XAR7[0]            // copy value at address var to 16 bit Accumulator Low
33ad55:   9645        MOV          *-SP[5], AL                // Copy Accumulator Low to stack => tmp is located at SP-5
185                 buf[0] = __byte((int *)&tmp,0);
33ad56:   5CAD        MOVZ         AR4, @SP                // mov 16 bit waarde van stack pointer?????, clear AR4-high
33ad57:   DC85        SUBB         XAR4, #5                // trek 5 af, XAR4 wijst nu naar SP-5, dus naar inhoud van tmp
33ad58:   C6C4        MOVB         AL.LSB, *+XAR4[0]        // Even argument: LSbyte (8 bit) wordt gekopieerd naar Accumulator Low,  MSB gewist (pag 347 van spru430)
33ad59:   96C5        MOV          *+XAR5[0], AL            // Kopieer Accumulator Low naar buf[0]
186                 buf[1] = __byte((int *)&tmp,1);
33ad5a:   5CAD        MOVZ         AR4, @SP                // mov 16 bit waarde van stack pointer?????, clear AR4-high
33ad5b:   DC85        SUBB         XAR4, #5                // trek 5 af, XAR4 wijst nu naar SP-5, dus naar inhoud van tmp
33ad5c:   C6CC        MOVB         AL.LSB, *+XAR4[1]        // Oneven argument: MSbyte (8 bit) wordt gekopieerd naar Accumulator Low,  MSB gewist (pag 347 van spru430)
33ad5d:   96CD        MOV          *+XAR5[1], AL            // Kopieer Accumulator Low naar buf[1]
188                 break;
33ad5e:   6F03        SB           C$L4, UNC
-------------------------------------------------------------------------------------------------------------------------
180                 buf[0] = __byte((int *)var,0);
        C$L3:                                                // Eén byte
33ad5f:   C6C7        MOVB         AL.LSB, *+XAR7[0]        // laad lsb van AL met waarde uit var[0]. NB: wordt MSB gewist? Ja, pag 347 van spru430
33ad60:   96C5        MOV          *+XAR5[0], AL            // Kopieer Accumulator Low naar buf[0]
199     }
-------------------------------------------------------------------------------------------------------------------------
        C$L4:                                                // exit
33ad61:   FE86        SUBB         SP, #6                    // release stack space
33ad62:   0006        LRETR        
202     {
        Par2buff:
33ad63:   FE06        ADDB         SP, #6

The default case is indicated by the comment: "// Implementation of default:"

When I follow the code, data is copied from the stack pointed to be XAR4) to the buffer (pointed to by XAR5),

but the data was never copied from var to the stack!

The C source line             int64 tmp = *((int64*)var);

is not translated and not present in the annotated assembly generated by the compiler.

Does anybody have an idea if this is a compiler bug or just ambiguous code and some bad luck?

Best regards,

Paul

  • Thank you for a detailed test case.  I am also able to generate incorrect assembly code.  I filed SDSCM00048158 in the SDOWP system to have this issue addressed.  Feel free to follow it with the SDOWP link below in my signature.

    Thanks and regards,

    -George

  • I believe that your code is violating the "strict aliasing" rule, because __byte is documented to operate on int or array of int, and in the default case you're giving it the address of an int64.  tmp, therefore, is being accessed through a type that is not its effective type.

    Why not use

      for (i = 0; i < le; i++)
        buf[i] = __byte((int *)var, i);

    in all cases -- it casts the pointer-to-void var to pointer-to-int and thus operates on array of int without the creation of a temporary variable.

    Or for full control, use something like

      for (i = 0; i < le; i+=2)
      {
        buf[2*i]   = (((int*)var)[i]) & 0xff;
        buf[2*i+1] = ((((int*)var)[i])>>8) & 0xff;
      }

    which again treats var like an array of int, but explicitly takes out 8-bit pieces to store into buf[].

  • Thanks very much!

    Although I cannot find any source explicitly confirming it, not only the char pointer, but also the void pointer is excepted from the strict aliasing rule?! Otherwise casting void* to int* would also be invalid.

    Paul

  • Well, no.  You're always using pointer-to-int.  It's just that you've stored the pointer in a pointer-to-void variable along the way.

    A pointer to char can legally point to any type.  A pointer to void is more of a placeholder for a pointer;  since you can't dereference it (it has no pointed-to type!), it doesn't really point to anything.  But when you take that value and convert it to a pointer-to-nonvoid, then it's following the rules for that type.

    In this case, the reasoning is a little circular.  Because you're using __byte, you must be pointing to a set of contiguous ints.  Therefore the pointer was original pointer-to-int before you put it into a pointer-to-void variable, and therefore it's okay to convert it to pointer-to-int.

    If you're intentionally using __byte to read 8-bit pieces of something that isn't int, well, then you're breaking the rules.

  • I don't expect a programming course, so feel free not to answer, but the following question arise:

    In the original code the problem seemed to be the source line

    int64 tmp = *((int64*)var);

    and not

    buf[i] = __byte((int *)&tmp,i);

    since the first is not present in the annotated assembly and the decoded assembly misses the corresponding instructions to fill the tmp stack variable. The code to read from the tmp stack variable into buf[i] is present in the assembly.

    In the first line, the strict aliasing rule is not necessarily violated: If the original variable pointed to by the void pointer 'var' is int64, this C code should be correct, but the compiler does not generate code.

    In the second line, the strict aliasing rule is obviously violated since an int64 pointer is cast to an int pointer. The generated code seems to be correct though. I'm now less sure that the original problem is a violation of 'strict aliasing' rule. What do you think?

    The second question is: what is the correct way to read 8-bit pieces of anything that isn't int? Is a union with an int array followed by __byte() the only way?

    This requires changing the function interface from passing the variable length to passing the variable type since from your answer I understand that I must always convert the pointer back to the original type, thus a pointer to long integer and a pointer to float must not be confused although both types are 32 bits.Is this correct?

    Paul

  • The short answer is that all this is in the realm of "undefined behavior," as defined by the language standard, and, to quote http://c-faq.com/ansi/undef.html, "In particular, there is no guarantee that at most the undefined bit of the program will behave badly, and that the rest of the program will perform normally."

    The longer answer is partly that the compiler doesn't necessarily operate line-by-line, and partly that in this case the flaw in the second line causes the compiler to believe that the first line is unneeded.  That's the effect of undefined behavior:  the compiler is free to believe that it can't happen, which may cause effects far away from the problem spot.

    Yes, a union is the usual recommended way to do this kind of thing.  You could do something like this untested bit of code:

    union
    {
      long      l;
      long long ll;
      float     f;
      int       ints[sizeof(long long)];
    } U;

    extern int *B;

    void Var2buff(int *var, int *buf, int le)
    {
      int i;
      for (i = 0; i < le; i++)
        buf[i] = __byte(var, i);
    }

    void f()
    {
      U.ll = 0x1234LL;
      Var2buff(&U.ints[0], B, sizeof(long long));
    }

    Of course this is all rather target-dependent.  You have the __byte intrinsic, and you have some implicit knowledge that int and char are 16 bits, and the device on the other end of the UART knows how to interpret C2000 data formats once they're reassembled from 8-bit pieces.