Compiler/DRA786: C-compiler generating incorrect code if using optimization (-O2 or higher)

Part Number: DRA786

Tool/software: TI C/C++ Compiler

A customer observed NaN errors in a function operating on float numbers. He identified where the error occurs; the related code is shown below.

;----------------------------------------------------------------------
; 57 | pEx->QualityParams->statesBuffer2[0] = z;
;----------------------------------------------------------------------
LDW .D1T2 *+A23[A17],B9 ; [A_D64P] |57| <0,44>
NOP 1 ; [A_L66]
.dwpsn file "../main.c",line 55,column 9,is_stmt,isa 0
FADDSP .L2 B9,B4,B4 ; [B_L66] |55| <0,46> ^
NOP 2 ; [A_L66]
FSUBSP .L2 B4,B5,B4 ; [B_L66] |55| <0,49> ^
|| ADDAD .D2 B9,15,B5 ; [B_D64P] |59| <0,49>

There's just one NOP between reading data into the B9 register using the LDW instruction and using the B9 register in FADDSP, which doesn't match the four delay slots required by the LDW instruction. The problem is seen with C6000 cgtools v8.1.3 and all other 8.1.x revisions up to v8.1.8, and only when the compiler options -O2 or -O3 are used. The generated code works as expected with C6000 cgtools v8.2.0 or newer. The customer is now evaluating whether it's safe to switch to the newer compiler version (v8.2.x) even at a very late stage of his project.

His basic question, which I haven't been able to answer so far: Is the compiler error described above known, and if so, is it certain that the problem was fixed in the newer v8.2.x releases? I'm hoping that somebody from the compiler team will have the answer. If needed, I can also provide test code that reproduces the error.


Best regards,

Manfred

  • Manfred Becker said:
    There's just one NOP between reading data into the B9 register using the LDW instruction and using the B9 register in FADDSP

    That's probably intentional. The <0,44> (and similar) annotations in the comments indicate that this code is inside a software-pipelined loop. Code inside such a loop can be scheduled out of order. In this specific case, the compiler intends to read the old value of B9 even though a new value is still in flight to it. That is unsafe when interrupts can occur, but interrupts are disabled inside a software-pipelined loop.
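
    To make the "old value" read concrete, here is a simplified C model of a load with delay slots. It assumes the usual C6000 rule that LDW has four delay slots, so the loaded value becomes visible to instructions executing five or more cycles after the load issues. The type and function names are hypothetical (not a TI API), and the model ignores interrupts and other pipeline details. In the listing above, FADDSP executes two cycles after LDW (<0,44> vs. <0,46>), so under this model it sees the previous contents of B9.

```c
/* Simplified model of a C6000-style load with delay slots.
   Assumption: LDW has 4 delay slots, so its result is visible
   to instructions executing 5 or more cycles after issue. */
#define LDW_DELAY_SLOTS 4

/* Hypothetical register model (not a TI API). */
typedef struct {
    int value;        /* contents visible to executing instructions */
    int pending;      /* value still in flight from a load */
    int cycles_left;  /* cycles until the pending value lands; 0 = none */
} reg_model;

static void issue_load(reg_model *r, int loaded)
{
    r->pending = loaded;
    r->cycles_left = LDW_DELAY_SLOTS + 1;
}

/* Advance the model by one cycle; the write-back happens when the
   countdown reaches zero. */
static void tick(reg_model *r)
{
    if (r->cycles_left > 0 && --r->cycles_left == 0)
        r->value = r->pending;
}

/* Value an instruction sees when it executes `cycles` cycles after a
   load issued into a register that previously held `old_value`. */
static int value_seen(int old_value, int loaded, int cycles)
{
    reg_model r;
    int c;
    r.value = old_value;
    r.pending = 0;
    r.cycles_left = 0;
    issue_load(&r, loaded);
    for (c = 0; c < cycles; c++)
        tick(&r);
    return r.value;
}
```

    For example, value_seen(7, 42, 2) returns 7 (the FADDSP case: old value still visible), while value_seen(7, 42, 5) returns 42.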

    So far, then, this code does not look wrong. Nonetheless, the NaN means something is wrong. This could be due to an error in either the user's code or the compiler. It is not common, but I have seen compiler optimization uncover problems in the source code.

    Is it practical to narrow the scope down to something like this: these are the inputs to the function, this is the output seen when no optimization is used, and this is the output seen when -O2 is used? Hopefully that's practical, and you can follow the directions in the article How to Submit a Compiler Test Case.
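
    A minimal sketch of such a comparison harness, assuming 32-bit IEEE-754 floats. process_block and dump_outputs are hypothetical names; process_block is only a placeholder for the customer's real function. Printing raw bit patterns rather than decimal values makes the -O0 and -O2 runs directly diffable.

```c
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the function under test; replace it with
   the real code. */
static void process_block(const float *in, float *out, int n)
{
    float state = 0.0f;
    int i;
    for (i = 0; i < n; i++) {
        state = 0.5f * state + in[i];
        out[i] = state;
    }
}

/* Print each output as its raw IEEE-754 bit pattern so builds made at
   different optimization levels can be compared exactly.
   Returns the number of NaN outputs. */
static int dump_outputs(const float *out, int n)
{
    int i, nan_count = 0;
    for (i = 0; i < n; i++) {
        uint32_t bits;  /* assumes float is 32 bits wide */
        memcpy(&bits, &out[i], sizeof bits);
        printf("out[%d] = 0x%08lx%s\n", i, (unsigned long)bits,
               isnan(out[i]) ? "  NaN" : "");
        if (isnan(out[i]))
            nan_count++;
    }
    return nan_count;
}
```

    Building the same file once without optimization and once with -O2, running both on identical inputs, and diffing the printed lines shows which outputs differ; those inputs and outputs are what the test case needs.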

    Thanks and regards,

    -George

  • Thank you for submitting a test case through other channels. I can reproduce the problem behavior. On the first iteration of a loop, an MPYSP instruction reads a register that is not initialized within the function and thus contains an unknown value. I submitted the entry CODEGEN-6159 in the SDOWP system. You are welcome to follow it using the SDOWP link in my signature below.

    Thanks and regards,

    -George

  • George,

    I did some more analysis of the loop code generated by the compiler. The PIPED LOOP KERNEL code uses conditional instructions to exclude some of the instructions on the first iteration of the loop. Uninitialized register data is used during this first iteration, which causes warning flags (such as NaN) to be set in the FADCR, FAUCR, and/or FMCR registers. However, the invalid results computed during the first iteration are never stored to memory, because the related store instructions are also executed conditionally, and the condition for the store instruction is still FALSE at that point. This means the output of the optimized loop is correct, but unfortunately the warning flags in the floating-point configuration registers are set anyway.
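
    The mechanism described above can be illustrated with a simplified C model; this is not the compiler's actual output. It is a two-stage pipelined loop computing y[i] = x[i] * w[i] whose kernel runs n+1 times: the multiply runs unconditionally, so on the first pass it consumes leftover register contents, while the store is predicated off. INFINITY and 0.0f are arbitrary example leftovers, and sticky_invalid is a hypothetical software stand-in for a hardware status bit, modelled as sticky (set, and not cleared by later results), which is what an end-of-loop check relies on.

```c
#include <math.h>

/* Hypothetical software stand-in for a sticky "invalid operation"
   status bit; the real FADCR/FAUCR/FMCR bit layout is not modelled. */
static unsigned int sticky_invalid = 0;

/* Multiply and set the sticky bit if the operation produced a new NaN. */
static float mul_tracked(float a, float b)
{
    float p = a * b;
    if (isnan(p) && !isnan(a) && !isnan(b))
        sticky_invalid = 1u;   /* never cleared by later results */
    return p;
}

/* Two-stage model: stage 1 loads x[k], w[k]; stage 2 multiplies the
   values loaded on the previous kernel pass and stores the product. */
static void mul_pipelined(const float *x, const float *w, float *y, int n)
{
    float rx = INFINITY;   /* example leftover register contents */
    float rw = 0.0f;
    int k;
    for (k = 0; k <= n; k++) {
        float prod = mul_tracked(rx, rw); /* unconditional, even when k == 0 */
        if (k > 0)                        /* predicated store: false on first pass */
            y[k - 1] = prod;
        if (k < n) {                      /* loads predicated off on last pass */
            rx = x[k];
            rw = w[k];
        }
    }
}
```

    With x = {1, 2, 3} and w = {4, 5, 6}, y ends up as {4, 10, 18}, which is correct, yet sticky_invalid is 1 because the first-pass INFINITY * 0.0f produced a NaN that was never stored.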

    I could see the same behavior even with the newer v8.2.x and v8.3.x compiler versions when using the build option "--opt_for_speed=3" in combination with -O2 or -O3.

    The customer tests several warning flags of the floating-point configuration registers at the end of the audio processing loop to detect calculation problems that may lead to faulty audio data, so that audio can be muted before invalid data is sent to the amplifier. Given the mechanism described above for setting these warning flags, does that mean the FADCR, FAUCR, and FMCR cannot be used for this kind of error detection?
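
    One possible complement to the flag check, not proposed in this thread, is to validate the block that was actually written before it goes to the amplifier. A minimal sketch, assuming float samples and that muting means zeroing the block; block_is_finite and guard_block are hypothetical names. Because it inspects only stored samples, discarded speculative results cannot trigger it, but it also cannot see intermediate NaNs that never reach the output buffer, and it costs one extra pass over each block.

```c
#include <math.h>
#include <string.h>

/* Returns 1 if every sample in the block is finite (not NaN or Inf). */
static int block_is_finite(const float *buf, int n)
{
    int i;
    for (i = 0; i < n; i++)
        if (!isfinite(buf[i]))
            return 0;
    return 1;
}

/* Hypothetical end-of-block guard: mute (zero) the block if any sample
   is NaN or Inf. Returns 1 if the block was muted. All-zero bits are
   +0.0f in IEEE-754, so memset produces silence. */
static int guard_block(float *buf, int n)
{
    if (block_is_finite(buf, n))
        return 0;
    memset(buf, 0, (size_t)n * sizeof *buf);
    return 1;
}
```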

    Best regards,

    Manfred

  • This situation is addressed in the C6000 compiler manual.  Please search for the sub-chapter titled Floating Point Control Register Side Effects.  

    Regarding the MPYSP instruction I mentioned above: during the first iteration of the loop, it executes even though one of its input registers is not initialized. We call this speculative execution. Here is a way to prevent the compiler from using speculative execution that can change the bits in the floating point control registers. Add this line near the start of the file ...

    #include <c6x.h>

    That provides a definition of the floating point control registers FADCR, FAUCR, and FMCR.  Then, near the end of a function where you want to prevent speculative execution, write a statement that merely refers to one of these registers ...

    FADCR;

    This causes all the floating point instructions to be predicated just like the load and store instructions.  Another way to say this: There is no more speculative execution of floating point instructions that can change the floating point control registers.

    This change does not appear to have much performance impact on the code example you sent. However, it may cause other functions to run slower.

    Note the reference to FADCR must appear inside each function where speculative execution should not occur.  This may not be practical in the full application.
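
    Put together, the suggestion above looks roughly like this. The function body is a hypothetical example. The #if guard relies on _TMS320C6X, which the TI C6000 compiler predefines, so the same file also builds on a host compiler where c6x.h and FADCR do not exist; if the code only ever builds with the TI compiler, the guard lines can be dropped.

```c
#if defined(_TMS320C6X)
#include <c6x.h>   /* declares the FADCR, FAUCR, and FMCR control registers */
#endif

/* Hypothetical example function: scales a block of samples. */
void apply_gain(const float *x, float *y, int n, float gain)
{
    int i;
    for (i = 0; i < n; i++)
        y[i] = x[i] * gain;

#if defined(_TMS320C6X)
    /* Merely referring to FADCR near the end of the function keeps the
       compiler from speculatively executing floating-point instructions
       that could change the floating-point control registers. */
    FADCR;
#endif
}
```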

    Manfred Becker said:
    does that mean the FADCR, FAUCR, and FMCR cannot be used for this kind of error detection?

    If it is not practical to prevent speculative execution as I describe in this post, then yes.

    Thanks and regards,

    -George