This thread has been locked.

If you have a related question, please click the "Ask a related question" button in the top right corner. The newly created question will be automatically linked to this question.

  • TI Thinks Resolved

Compiler: cl6x compiler to produce bitwise reproducible output

Prodigy 160 points

Replies: 16

Views: 498

Tool/software: TI C/C++ Compiler

Hello,

I need the cl6x compiler to produce bitwise reproducible output (see also https://reproducible-builds.org), i.e. multiple compilations of the same source base (done by different users, in their own directories) should give exactly the same binary. I am using CGT 7.3.23.

Two issues found:

Issue #1

I've found random bytes changing in the binary. The compiler creates a temporary file, which it then compiles, and that temporary file's name is included in the binary's .symtab section:

$  readelf -s .symtab myobject.obj

Symbol table '.symtab' contains 999 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 00000000     0 FILE    LOCAL  HIDDEN   ABS 07894VfUHkc
[...]

How can I get rid of this? The entry is meaningless, since the temporary file is removed after compilation anyway.

I did some reverse-engineering, and it seems that the compiler uses some sort of gen_tempname function (https://github.molgen.mpg.de/git-mirror/glibc/blob/master/sysdeps/posix/tempname.c), as I can see getpid and gettimeofday syscalls when I run the compiler under strace. But I am unable to use LD_PRELOAD, as the compiler is statically linked...

Issue #2

The build path is included in the binary's debugging symbols. I would like to be able to map this variable string to some arbitrary one. GCC provides a similar option, -ffile-prefix-map (see the description here: https://gcc.gnu.org/onlinedocs/gcc/Overall-Options.html).

  • I am interested in this topic, because this is one of the blockers to delivering substantial build time savings by improving the build cache hit ratio.
  • I have a similar problem in https://e2e.ti.com/support/microcontrollers/hercules/f/312/t/743189. I have found the following workaround: I set "Keep generated assembly language (.asm) file (--keep_asm, -k)" in Assembler Options. This results in deterministic names.

  • In reply to Lukas Sk:

    I saw that request, but did not see any solution, and the topic was locked, so I created my own (and my issue is with cl6x, not the ARM compiler).

    Marvelous! That somewhat works around the first issue I mentioned. It is not production-ready, though... Those assembly files will significantly increase the build directory size (4 times the object file size). When I have thousands of objects (yeah, quite a big project), I can count that in gigabytes...

    In the worst case, I'll just implement yet-another-cl6x-wrapper (alongside the "line buffering" wrapper and similar ones), which will remove those files right afterwards. Or use --asm_directory=$(BUILDDIR)/trash, and remove it after the build. Need to rethink...
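The wrapper idea above could be sketched roughly as follows. This is only a minimal illustration, not a tested production tool: the function names `build_command` and `run_wrapped` and the default directory `asm_trash` are made up for this example, and only the `--keep_asm` and `--asm_directory` options come from this thread. The scratch directory is deliberately a fixed path rather than a random temporary one, on the assumption that the .asm path may end up inside the object file.

```python
import os
import shutil
import subprocess


def build_command(compiler, asm_dir, args):
    """Return the compiler command line with kept .asm files sent to asm_dir."""
    return [compiler, "--keep_asm", f"--asm_directory={asm_dir}", *args]


def run_wrapped(args, asm_dir="asm_trash", compiler="cl6x"):
    """Run the compiler, then delete the directory holding the kept .asm files.

    asm_dir is a hypothetical fixed location; it is removed even when the
    compiler fails. The compiler's exit code is returned unchanged.
    """
    os.makedirs(asm_dir, exist_ok=True)
    try:
        return subprocess.call(build_command(compiler, asm_dir, args))
    finally:
        shutil.rmtree(asm_dir, ignore_errors=True)
```

A script would forward its own arguments, e.g. `sys.exit(run_wrapped(sys.argv[1:]))`. Note that parallel compiler invocations sharing one `asm_dir` would delete each other's files, so a real wrapper would likely need one directory per object.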

    Thanks anyway, that is a good initial approach until we get something production-ready from the TI experts.

  • In reply to Bartlomiej Kucharczyk:

    Now, when using --keep_asm, I also get some numbers in the binary that match the last-modification timestamp of the compiled *.asm file.

    Looking at binary hexdump, I see strings like the following:
    /path/to/asm/file.asm:$C$L6:1546604415

    Looking at file.asm:
    $ stat /path/to/asm/file.asm -c "%Y"
    1546604415

    So, it turns out that setting --keep_asm does not solve my issue...
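To check how many such strings an object file contains, a small scan along these lines might help. It is a sketch only: the pattern is inferred from the single string shown above (`<path>.asm:$C$L<n>:<timestamp>`), so the actual layout in other objects is an assumption, and `find_asm_timestamps` is a made-up name.

```python
import re

# Matches byte strings shaped like b"/path/to/asm/file.asm:$C$L6:1546604415".
# The shape is guessed from one observed example, not from any documentation.
LABEL_STAMP = re.compile(rb"([\x21-\x7e]+\.asm):(\$C\$L\d+):(\d{9,10})")


def find_asm_timestamps(data):
    """Return (path, label, timestamp) tuples found in raw object-file bytes."""
    return [
        (m.group(1).decode("ascii"), m.group(2).decode("ascii"), int(m.group(3)))
        for m in LABEL_STAMP.finditer(data)
    ]
```

It would be called on the whole file contents, e.g. `find_asm_timestamps(open("myobject.obj", "rb").read())`, and each returned timestamp can then be compared with the `stat` output of the corresponding .asm file.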

  • In reply to Bartlomiej Kucharczyk:

    A good summary on this topic is in this forum thread.  

    Consider using the utility objdiff from the cg_xml package.  By default, it ignores the debug information and the symbols.  This reduces the constraints imposed on the build.

    Thanks and regards,

    -George


    TI C/C++ Compiler Forum Moderator
    Please click This Resolved My Issue on the best reply to your question
    The CCS Youtube Channel has short how-to videos
    The Compiler Wiki answers most common questions
    Track an issue with SDOWP. Enter your bug id in the Search box.

  • In reply to George Mock:

    Hello George,

    Thanks for the answer. It shed some light on the topic.

    I agree that some aspects of the build process are not the compiler's or linker's responsibility (e.g. maintaining the order of inputs), but some others are, and I think that my request addresses such things.
    When I execute the same command, on the same host, in the same directory, I'd expect exactly the same result (i.e. md5sum/sha256sum should match in both).
    Or I'd expect at least some easy method to fake the build environment, so that the compiler gives predictable results...

    In such a case, I can keep only a fingerprint (e.g. an md5sum hash) of an executable plus an environment description (a few kilobytes), and compare rebuilt binaries with it, to make sure I got exactly the same content (using tools that are available on any Linux box). I cannot imagine how to achieve this efficiently with objdiff...
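The fingerprint check I have in mind is roughly the following, sketched here with Python's standard hashlib instead of the md5sum/sha256sum command-line tools. The function names are made up for illustration; the check only works once the compiler output is actually bitwise reproducible.

```python
import hashlib


def fingerprint(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_baseline(path, expected_hex):
    """Compare a rebuilt binary against a stored fingerprint string."""
    return fingerprint(path) == expected_hex.strip().lower()
```

Only the short hex string needs to be archived next to the environment description; the rebuilt binary is then verified with a single `matches_baseline` call.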

    The argument that "we don't test something, thus we don't deliver it" does not seem relevant to this discussion. It's not a matter of testing or not, but of willingness to support this kind of use case and actually start doing something about it. And, based on the number of questions similar to mine, it seems there are some people who would be interested in bitwise identical binaries.

    So, maybe the question should be: will you add tests (and support) for this?

  • In reply to Bartlomiej Kucharczyk:

    The solution currently provided by TI compilers does not work this way ...

    Bartlomiej Kucharczyk
    When I execute the same command, on the same host, in the same directory, I'd expect exactly the same result (i.e. md5sum/sha256sum should match in both).

    Instead, some executable or library is established as the baseline, and then objdiff is used to test whether subsequent builds are the same.

    Bartlomiej Kucharczyk
    maybe the question should be: will you add tests (and support) for this?

    Unfortunately, that is not on our roadmap.

    Thanks and regards,

    -George



  • In reply to George Mock:

    Hmm... that's sad news.

    Can anything be done so that you add this topic to your roadmap?

    Anyway, how could I compute a fingerprint (e.g. an MD5 hash) of an executable/library that could be used later to compare against a newly built executable/library?

    It would also be acceptable for me to get some way to strip those debugging symbols (the strip6x tool did not work for me: some build paths were still in the objects).
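One fallback I am considering is to hash the binary only after overwriting the build path, roughly as sketched below. This is an assumption-heavy illustration: `mask_build_root` and `normalized_fingerprint` are made-up names, the build root is assumed to appear as one contiguous byte string, and the trick only helps when the build roots being compared have the same length (otherwise sizes and offsets in the file differ anyway). It does nothing about the temporary file names or timestamps.

```python
import hashlib


def mask_build_root(data, build_root, filler=b"X"):
    """Overwrite every occurrence of build_root with filler bytes of equal length."""
    return data.replace(build_root, filler * len(build_root))


def normalized_fingerprint(data, build_root):
    """SHA-256 hex digest of the bytes with the build root masked out."""
    return hashlib.sha256(mask_build_root(data, build_root)).hexdigest()
```

Because the filler has the same length as the path it replaces, the file layout is preserved and only the path bytes themselves are neutralized before hashing.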

  • In reply to Bartlomiej Kucharczyk:

    Bartlomiej Kucharczyk
    Can anything be done so that you add this topic to your roadmap?

    I filed CODEGEN-5738 in the SDOWP system.  This does not report a bug, but requests support in the compiler for reproducible builds.  You are welcome to follow it with the SDOWP link below in my signature.  (However, it seems SDOWP is having problems today.  It should be resolved soon.)

    Thanks and regards,

    -George


    TI C/C++ Compiler Forum Moderator
    Please click This Resolved My Issue on the best reply to your question
    The CCS Youtube Channel has short how-to videos
    The Compiler Wiki answers most common questions
    Track an issue with SDOWP. Enter your bug id in the Search box.

  • In reply to George Mock:

    Thank you! In the meantime, I've made a tool which erases some of the useless data from the binary:
    github.com/.../erase.py
