Binary Comparison for builds with same source mismatch

We're tracking an issue where most of our users see differences between binaries that were built from the same source. Each binary is produced by converting an EABI ELF file with the hex6x tool. The issue is very inconsistent: a user can build twice and get the same binary output, then get something different on the third build. We are all using different OSes:

Windows XP 64-bit, Windows XP 32-bit, Windows 7 64-bit. 

Another user had to build 7 times before the binary file differed. Because the same source code produces different binaries so inconsistently, it almost feels like a "race" condition.

Looking at the symbol map file, it's usually the .text section that changes. Further analysis reveals which object (.obj) file to look at to determine where the issue is. That being said, the issue "moves" around from build to build. If it's a structure definition in one build, it's an __STI__ function in another.

Any help is appreciated. 

Thanks,

W.

Code Composer Studio version: 5.1.0.09000

Windows XP, Windows 7, 32-bit and 64-bit.

Project settings: -O2, no debug symbols.

Build settings:  gmake -k -j ${NUM_PROCESSORS} 

When we remove the -j ${NUM_PROCESSORS} flag, we still see the problem.
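One way to narrow down where two converted images diverge is to list the byte ranges that differ and then look those offsets up in the map file. The following is a minimal Python sketch, not part of the original post: `diff_ranges` and `compare_files` are hypothetical helper names, and it assumes both images are small enough to read into memory.

```python
from pathlib import Path


def diff_ranges(a, b, merge_gap=0):
    """Return half-open (start, end) byte ranges where a and b differ.

    If the inputs have different lengths, the tail of the longer one is
    reported as a difference. Ranges separated by at most `merge_gap`
    identical bytes are merged into one range.
    """
    ranges = []
    common = min(len(a), len(b))
    start = None
    for i in range(common):
        if a[i] != b[i]:
            if start is None:
                start = i          # a differing run begins here
        elif start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:
        ranges.append((start, common))
    if len(a) != len(b):
        ranges.append((common, max(len(a), len(b))))

    merged = []
    for s, e in ranges:
        if merged and s - merged[-1][1] <= merge_gap:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return merged


def compare_files(path_a, path_b, merge_gap=0):
    """Read two binary files and return the byte ranges where they differ."""
    return diff_ranges(Path(path_a).read_bytes(),
                       Path(path_b).read_bytes(),
                       merge_gap)
```

Calling `compare_files` on the outputs of two builds (the paths are whatever your build produces) returns an empty list when the images are identical. A non-zero `merge_gap` groups nearby differences so that one changed function shows up as a single range rather than many small ones.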

  • What is the version of the compiler you use?  Please see this wiki article for more detail about tools versions.

    Thanks and regards,

    -George

  • George,

    Code Composer Studio Compiler Tools : 7.3.1
    BIOS: 6.32.5.54
    C6000 IPC 1.23.5.40
    NDK (Target Content) 2.20.6.35
    XDCTools 3.22.4.46
    RTSC/XDCtools (Target Runtime Support) 3.23.2.47
  • Based on what we know so far, we do not know the root cause of the bug, nor do we know exactly what might fix it.

    I have two suggestions to consider.  They both involve upgrading the compiler to a more recent version.

    We are aware of one bug which roughly meets this description.  The first release which fixes this bug is 8.0.x.  Are you able to upgrade to this release?  To understand what such an upgrade means, please see this wiki article.  Also, 8.0.x only supports EABI, and not the older COFF ABI.  If you build for COFF ABI, then you cannot use 8.0.x.

    A lower risk upgrade is to change to version 7.3.23.  This is the latest release in the 7.3.x release stream.  The only difference from 7.3.1 is bug fixes.  There is no guarantee it fixes your particular problem.  And we know it has the problem that is not fixed until 8.0.x.  

    Thanks and regards,

    -George

  • George,

    Thank you for your reply. Upgrading our CCS environment would involve many man-hours of testing and integration. Perhaps the situation we're seeing can be mitigated by other means that would instruct, or trick, the compiler/linker into working around it. Granted, this may be more optimistic than realistic.

    Is there a link you could provide in your next reply to Texas Instruments' bug database that would hold a description of the issue you think may be related?

    Thanks in advance,

    -Will

  • William Martin said:
    Is there a link you could provide in your next reply to Texas Instruments' bug database that would hold a description of the issue you think may be related?

    We do have an entry in the SDOWP system.  But it needs some changes and additions to make it usable by general customers.  Rather than do that, it is easier for me to summarize it here.  A customer experienced the compiler generating different assembly output on builds that should see no difference.  It was difficult for us to reproduce and track down.  But we finally found the root cause.  The cause lies in an error deep in the infrastructure of the compiler.  Uninitialized memory was being read.  The compiler infrastructure has since undergone a significant change.  The customer agreed to upgrade to a version of the compiler which uses the new infrastructure.  Thus the bug in the old infrastructure was not fixed.

    William Martin said:
    Perhaps the situation we're seeing can be mitigated by other means that would instruct, or trick, the compiler/linker into working around it.

    Unfortunately, this is not practical.  The uninitialized memory read occurs in a processing phase that is many times removed from the C source.  It is not possible to say, with any certainty, how it can be avoided.

    Thanks and regards,

    -George

  • Hi George,

         Thanks. 

    1.  Can you tell me which version of Code Composer Studio would hold the CGT tools version you mentioned above?

    2.  Can you zip up the example project/source which can cause the issue to occur?

    Will  

       

  • William Martin said:
    Can you tell me which version of Code Composer Studio would hold the CGT tools version you mentioned above?

    The bug was reported in C2000 compiler version 5.2.15.  All TI compilers, including those for C2000 and C6000, are implemented from a common code base, with customization as needed.  The bug is from the common code.  CCS ships on an independent schedule, and uses whatever compiler versions are most recent at the time of release.  This compiler version probably went with CCS 5.5, though it might be CCS 5.4.

    William Martin said:
    Can you zip up the example project/source which can cause the issue to occur?

    It's a C2000 project.  And it is customer code that I cannot make available.

    Given that you have focused tightly on this one bug, I should point out that we do not know if this bug is the cause of your problem.  It could be something else entirely.

    Thanks and regards,

    -George

  • George,

    Does this issue occur only when optimization settings are turned on? I'm experimenting with my build to isolate the factors further and didn't see the binary comparison between two unoptimized builds change for 6 iterations. I wrote a script to get a larger sample.

    For the unoptimized builds I use -g, clear the size and speed settings, and then, in the optimization tab, clear the optimize setting (from -O5).
     
    thanks,

    Will

  • Whether optimization settings will or will not expose the problem changes from source file to source file. There is no safe option set.
  • George or Archaeologist,

    Should the latest Compiler version work with CCSv5.1 that Will is using?

    Regards,
    RandyP
  • It is very likely that you will have no problems.  But I cannot say I am as confident about it as if you were using CCS 6.1.  

    Thanks and regards,

    -George

  • When I attempt this, I get some really interesting warnings/errors. I have to hand-type them:

    C60_abi.c:356 internal warning #10282: (".bss")
    C60_abi.c:356 internal warning #10282: (".neardata")
    C60_abi.c:356 internal warning #10282: (".rodata")

    Undefined symbol
    _________________
    __c6xabi_unwind_cpp_pr0
    __c6xabi_unwind_cpp_pr3
    __c6xabi_unwind_cpp_pr4
    __cxa_allocate_exception
    __cxa_throw

    I read something about placing these sections together in a GROUP so that they are close together, because they all end up as near data, by definition. That didn't seem to fix it.

    Any thoughts on these? Are the Undefined symbols because of the warning #10282s?

    Thanks,
    Will
  • I don't recognize ...

    William Martin said:
    C60_abi.c:356 internal warning #10282: (".bss")
    C60_abi.c:356 internal warning #10282: (".neardata")
    C60_abi.c:356 internal warning #10282: (".rodata")

    I presume these appear in the Console window when you build.  It would be good to see the entire console output from a build.  Create the text file with the icon in the upper right bar of the Console view titled Copy Build Log.

    To see these ...

    William Martin said:
    Undefined symbol
    _________________
    __c6xabi_unwind_cpp_pr0
    __c6xabi_unwind_cpp_pr3
    __c6xabi_unwind_cpp_pr4
    __cxa_allocate_exception
    __cxa_throw

    ... you must be building C++ code with exceptions.  Is that correct?  Those symbols are defined in the RTS library.  This library is not shipped with the compiler installation, but the linker should automatically build it for you when needed.  That must not be happening for some reason.  Seeing all the console build output should help explain what happened.

    Thanks and regards,

    -George

  • George Mock said:
    C60_abi.c:356 internal warning #10282: (".bss")

    The internal text for that error message is: "While defining Static Base symbol, .bss not placed. It must have a run address to allow definition of __TI_STATIC_BASE__."  Make sure your linker command file explicitly places the .bss section.

  • Thanks. I disabled exceptions and don't recall that warning/message showing up anymore. The linker command file has .bss, .rodata, and .neardata all "near" each other in a group. I have successfully built the image in Code Composer v.5.x, using the 8.x CGTs.

    I'm now running the build over and over, in a script, to determine if the CRC changes between builds. I'll post when I know more.
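A loop like the one described above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the script used in the thread: `file_digest` and `sample_builds` are hypothetical names, the build command and output path are supplied by the caller, and SHA-256 stands in for the CRC mentioned in the post.

```python
import hashlib
import subprocess
from collections import Counter
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def sample_builds(build_cmd, output_path, runs):
    """Run build_cmd `runs` times and count how often each output occurs.

    build_cmd is an argument list passed to subprocess.run. It is assumed
    to perform a full clean rebuild; otherwise later runs may simply reuse
    object files from an earlier run and hide the variation.
    Returns a Counter mapping digest -> number of builds that produced it.
    """
    digests = Counter()
    for _ in range(runs):
        subprocess.run(build_cmd, check=True)   # raises if the build fails
        digests[file_digest(output_path)] += 1
    return digests
```

A result with a single key means every run produced the same image. More than one key means the build is not deterministic, and the counts give a rough idea of how often each variant appears.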
  • George,
    Does the bug that was fixed in 8.0.x affect only ELF output files? That is, if I switched my project over to COFF format, would there be no risk of that same compiler issue?

    Thanks
  • COFF ABI is not supported in the C6000 compiler versions 8.0.x and higher.

    Thanks and regards,

    -George

  • George,

    Thank you. I believe your first answer pertained to our situation, where a person would stay at CCS v.5, using CGT 7.x. COFF may fix the problem, or it may not.

    I have had a chance to migrate our projects over to CCS v.6, but am waiting on a few outside libraries to be rebuilt to conform to the new toolset. I'm getting linker errors stating I can't use these libs built with CGT 7.3.x.

    Thanks,

    Will  

        

  • Just another data point: I'm seeing the same CRC issue with COFF formatted output files. I switched my projects to COFF and it's still there. No reply needed from TI support; this is just another fact about my current situation. I still have not gotten the whole CCS 6.x build put together. We'll have an answer next week.

  • I think we need to take a step back and re-analyze this issue. From the TI side, we don't have a sufficiently precise description of the problem to say with any certainty at all whether what you are facing is a known defect. We've guessed at a few things that, had we been right, would have been a quick analysis, but I'm not convinced. I'd like to ask some questions to make sure we're barking up the right tree.

    You say that your programs usually differ in the .text section. Have you looked at the disassembly for two .out files where the difference occurs? Is the difference a slight reordering of some instructions? Is the register allocation slightly different? Do the programs operate correctly except for the failed binary comparison?
  • Archaeologist,

    I can't see your post from Saturday(?) in this view, but I saw that you were trying to gather all the facts about the problem.

    1.  Here's an example of the re-ordering (of registers only), from two actual builds.  There are many more changes in the assembly dump, but this was the easiest to hand-copy from our closed system.  I've seen instructions actually being re-ordered as well.

    File1.asm, build 13

               CALLP   .S2     __mpyd, B3
    ||         MV      .L2X    A5, B5          ; | 219|
    ||         MV      .L1     A11, A4         ; | 219|
    ||         MV      .S1     A10, A5         ; | 219|

    File1.asm, build 14

               CALLP   .S2     __mpyd, B3
    ||         MV      .L2X    A5, B5          ; | 219|
    ||         MV      .L1     A11, A5         ; | 219|
    ||         MV      .S1     A10, A4         ; | 219|

    2.  The object file sizes are different.  The dump of the assembly of those object files shows re-ordering of assembly, different units (in some cases), and different registers being used.

    3.  I've reproduced the problem in both COFF/ELF formatted output now with CGT 7.3.1 (same listed above). 

    4.  Still working on the CGT 8.x solution.  Sometime this week we'll have the answer.

    5.  Optimization seems to make the problem happen.  I haven't seen the problem in an unoptimized build. 

    That's all I can remember from your post on Sat.  Let me know what else you need to match this up to the SDO you have. 

    -Will

  • Do the different executables still run correctly?
  • We can't confirm or deny this. It is believed there is no problem running with an image that has a different binary makeup. My analysis of the assembly language thus far hasn't shown we are losing, or changing, computation of said algorithms.
  • Archaeologist,

    During some analysis of the assembly, I see that the data most frequently re-ordered in the generated assembly code either uses math functions (e.g., sqrt) or is declared as const float in a namespace.
    E.g.,

    filename: MyTypesFile.hpp

    #ifndef guarded
    #define guarded

    namespace MyVariables
    {
    const float GENERIC_SPEED_OF_HYPERLOOP = 714.123; // mph
    const float GENERIC_ALTITUDE = 200.0; // above sea level
    const float AIR_VELOCITY = 1.13;
    const float AIR_DENSITY = 1.225; // kg/m^3
    const float FORCE_OF_DRAG = (AIR_VELOCITY * AIR_VELOCITY) * 1/2 * AIR_DENSITY; // not the complete equation
    }

    #endif

    Then, in a class where this header is used, these constants get defined at run time in STI__classname.cpp. At that point, I see the variables defined in different locations when comparing builds: sometimes 2-3 lines apart, and sometimes 10 lines.

    After looking at the object file dump of these constants, some appear in the .cinit section, and others in .text.  


    Is this something you might believe is occurring due to the SDO in question?

    -Will

  • Possibly, but I suspect an earlier stage of the compiler. Let's continue to narrow it down. Please add the options --src_interlist and --keep_asm and diff the resulting .asm files. You should see optimized C code in the assembly comments. I want to know if those differ when you see the assembly code differ.
  • Archaeologist,

    I have two answers for you.  I turned on just those two options.

    1.  I have one file, in one build, that shows no change in the source interlist.  The branch labels have changed, however.  E.g., C$73 in one file and C$74 in the other, with B C$73 and B C$74, respectively.  Essentially, the code branches to a new label created by the compiler/optimizer.

    2.  I have another build where two files are different.  The one in #1 shows up again, with the same differences.  However, another file has very distinct, noticeable differences in the source interlist.  In some cases, it looks innocuous.  In one case, however, entire instructions are missing from one side.  Posting an example:

    Left file                                       | Right file
    ;C$L50                                          | ;C$L49
    ; PIPED LOOP EPILOG                             | ; PIPED LOOP EPILOG
    ;C$L51:                                         | ;C$L50:
    "------G29:"                                    | "------G29:"
    "------U$351: = (const double* const) obj+2096; | "------K$350: = (const double* const) obj+2096;
    "------U$354: = (const double*) this+2096;      | "------K$351: = (const double*) this+2096;
    "------U$361: = (double* const) obj+2104;       | " 185 --------L$11 = 3
    "------U$365: = (const double* const) obj+2112; | ";--------U$76=0   ;missing on left side
    "------U$368: = (const double*) this+2096;      | <NOTHING>
    (REPEATED assembly of above)                    | <NOTHING>
    … (REPEATED assembly of above)                  | <NOTHING>
    "------U$410: = (const double*) this+2160;      | <NOTHING>
    " 185 --------L$11 = 3                          | <NOTHING>
    ";-------   #pragma MUST_ITERATE (3,3,3)        | ;-------   #pragma MUST_ITERATE (3,3,3)
    ";------   #pragma LOOP_FLAGS(4096u)            | ";------   #pragma LOOP_FLAGS(4096u)

    So, where it's "repeated" on the left, there is no equivalent on the right.  I am hand-typing from a closed system, and putting "repeated" in cases where the tag increments and the assembly looks similar to what I've typed, following the pattern ( C$N: = (const double* const) obj/this+X)).  These lines aren't just shifted down in the second build; they are not found anywhere later in the right file.  Just to be clear, I'm not making a statement about the correctness, or executability, of the code, only that you were looking for specific cases where the two differ.
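When only compiler-generated labels have been renumbered (the C$73 versus C$74 case in item 1), a plain diff of the .asm files is noisy. The following Python sketch renumbers such labels in order of first appearance before diffing, so that pure renumbering drops out and structural differences like the one in item 2 remain. It is an illustration, not a TI tool: `normalize_labels` and `normalized_diff` are hypothetical names, and the label pattern is an assumption based only on the forms seen in this thread (C$73, C$L50, U$351, K$350, L$11).

```python
import difflib
import re

# Assumed label shape: one capital letter, '$', optional capitals, digits.
LABEL = re.compile(r"\b([A-Z]\$[A-Z]*)(\d+)\b")


def normalize_labels(text):
    """Renumber generated labels by order of first appearance.

    Each distinct label keeps its prefix (e.g. 'C$L') but gets a new
    sequence number, written as '<prefix>#<n>', so two listings that
    differ only in label numbering normalize to the same text.
    """
    mapping = {}    # original label -> normalized label
    counters = {}   # prefix -> how many distinct labels seen so far

    def rename(match):
        token = match.group(0)
        if token not in mapping:
            prefix = match.group(1)
            counters[prefix] = counters.get(prefix, 0) + 1
            mapping[token] = f"{prefix}#{counters[prefix]}"
        return mapping[token]

    return LABEL.sub(rename, text)


def normalized_diff(left_text, right_text):
    """Return unified-diff lines between two label-normalized listings."""
    left = normalize_labels(left_text).splitlines()
    right = normalize_labels(right_text).splitlines()
    return list(difflib.unified_diff(left, right,
                                     fromfile="left", tofile="right",
                                     lineterm=""))
```

An empty result from `normalized_diff` suggests the two listings differ only in label numbering. Note the limitation: if one build contains an extra label, every later label with the same prefix is numbered differently after normalization, so the diff will still show those lines.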

    What would you like me to try next?

    -Will

  • This is definitely not the bug that George originally referred to. I'm going to have some other experts look at this. If they don't recognize this as a known issue, we're most likely going to need a test case to analyze.
  • Okay, we have a new candidate for the culprit bug: SDSCM00047263. This bug was fixed in C6000 version 7.3.13. Upgrade to that version or higher and see if the differences in the optimized C code go away. Either way, you should seriously consider upgrading to the latest release on the 7.3.x branch, as there are many bug fixes between 7.3.0 and 7.3.23.
  • Archaeologist, would you expect a 7.3.0 build and a 7.3.23 build to produce exactly the same binary after running the output through the hex6x tool to generate a flashable binary? I upgraded and am running into a new "warning" that NDK has some libraries in "COFF" format, so I'm not certain whether that would affect the final binary comparison between 7.3.x and 7.3.23, or whether the binaries would simply be different. From a quick glance at the generated assembly for both, I see many differences (I've only compared one instance of the 7.3.1 build to 7.3.23): .fields are now .bits, and the C interlist/assembly is different.

    Thanks,

    Will

  • I would expect it to be similar between 7.3.0 and 7.3.23 unless one of the bug fixes affected that bit of code. However, I would not at all be surprised to see a few minor differences come up (such as .bits), especially given the long gap between patch versions.
    What's more important is to see if 7.3.23 is deterministic. You should be comparing multiple runs of 7.3.23, not comparing 7.3.23 to 7.3.0.
  • Agreed. Software management will ask the amount of variability one can expect with the upgrade. I'm satisfied that we know it will be different. The amount (time and $$) of testing and analysis will be thorough. Thanks.

    I've performed 15 builds with consistent binary output.