This thread has been locked.

If you have a related question, please click the "Ask a related question" button in the top right corner. The newly created question will be automatically linked to this question.

  • TI Thinks Resolved

Compiler: cl6x compiler to produce bitwise reproducible output

Prodigy 160 points

Replies: 16

Views: 629

Tool/software: TI C/C++ Compiler

Hello,

I need the cl6x compiler to produce bitwise reproducible output (see also https://reproducible-builds.org), i.e. multiple compilations of the same source base (done by different users, in their own directories) should give exactly the same binary. I am using CGT 7.3.23.

Two issues found:

Issue #1

I've found random bytes changing in the binary. It turns out that the compiler creates a temporary file, which it then compiles, and that temporary file's name ends up in the binary's .symtab section:

$ readelf -s myobject.obj

Symbol table '.symtab' contains 999 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 00000000 0 NOTYPE LOCAL DEFAULT UND
1: 00000000 0 FILE LOCAL HIDDEN ABS 07894VfUHkc
[...]

How can I get rid of this? The entry is meaningless, since the temporary file is removed after compilation anyway.
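To narrow down where such random bytes sit, one option is to build the same object twice and list the byte ranges that differ. The following is only a minimal sketch in Python; the function name differing_ranges is made up for illustration and is not part of any TI tool.

```python
def differing_ranges(a: bytes, b: bytes):
    """Return half-open (start, end) byte ranges where a and b differ."""
    ranges = []
    start = None
    common = min(len(a), len(b))
    for i in range(common):
        if a[i] != b[i]:
            # Open a new range at the first mismatching byte.
            if start is None:
                start = i
        elif start is not None:
            # Bytes match again: close the currently open range.
            ranges.append((start, i))
            start = None
    if start is not None:
        ranges.append((start, common))
    if len(a) != len(b):
        # Trailing bytes present in only one input count as a difference.
        ranges.append((common, max(len(a), len(b))))
    return ranges
```

Reading two builds of myobject.obj in binary mode and passing their contents to this function gives offsets that can then be matched against the section layout reported by readelf.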

I did some reverse engineering, and it seems that the compiler uses some sort of gen_tempname function (https://github.molgen.mpg.de/git-mirror/glibc/blob/master/sysdeps/posix/tempname.c), as I can see getpid and gettimeofday syscalls when I run the compiler under strace. But I am unable to use LD_PRELOAD, as the compiler is statically linked...

Issue #2

The build path is included in the binary's debugging symbols. I would like to be able to map that variable string to some arbitrary one. GCC provides a similar option: -ffile-prefix-map (see the description here: https://gcc.gnu.org/onlinedocs/gcc/Overall-Options.html).

  • In reply to Bartlomiej Kucharczyk:

    Bartlomiej Kucharczyk
    I've made a tool which is erasing some of the useless data from the binary

    Thank you for the contribution.  But I don't see the advantage of this approach over the one used by objdiff.  objdiff doesn't erase anything, it just skips over the "useless data".

    Thanks and regards,

    -George



  • In reply to George Mock:

    For me:

    • I'm able to compute and store a fingerprint using md5sum (or another standard hash tool available on Linux).
    • It was easier to write it than to examine the objdiff source code and work out what counts as "important data" in order to hash only that. Working with an ELF library would most probably be better.
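The erase-then-hash idea can be sketched roughly as follows. This is not the actual tool from this thread. It assumes the volatile bytes are already known as (offset, length) pairs, and how to locate them in a real object file (for example with an ELF library) is left open. The name fingerprint is hypothetical.

```python
import hashlib


def fingerprint(data: bytes, volatile_ranges):
    """Zero out the given (offset, length) ranges, then return the MD5 hex digest."""
    buf = bytearray(data)
    for offset, length in volatile_ranges:
        # Clamp to the buffer so an oversized range cannot change its length.
        start = min(offset, len(buf))
        end = min(offset + length, len(buf))
        buf[start:end] = bytes(end - start)
    return hashlib.md5(buf).hexdigest()
```

Two builds that differ only inside the listed ranges then produce the same fingerprint, which can be stored and compared just like plain md5sum output.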
  • In reply to Bartlomiej Kucharczyk:

    "When I execute the same command, on the same host, in the same directory, I'd expect exactly same result (i.e. md5sum/sha256sum should match in both)."

    I think this runs afoul of the C standard, since at the very least __DATE__ and __TIME__ are populated with the time and date of the build.
  • In reply to Keith Barkley:

    You are right, thanks for pointing that out.

    However, I do not understand how this contradicts the C standard. I never said that "all C source code will be 100% bitwise reproducible". :-)

    We simply try to avoid those macros in source code (and other pitfalls that make builds irreproducible). If someone does use them, we'll catch that immediately (md5sum will differ even with no change in the source code or build environment).
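Uses of these macros can also be flagged before the build rather than only via a differing md5sum. Below is a minimal sketch, assuming a plain text scan is good enough; it also reports matches inside comments and string literals, and the name find_timestamp_macros is made up for illustration.

```python
import re

# Matches the two standard C macros that expand to the build date and time.
TIMESTAMP_MACRO = re.compile(r"\b__(?:DATE|TIME)__\b")


def find_timestamp_macros(source: str):
    """Return (line_number, macro) pairs for each use found in the source text."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in TIMESTAMP_MACRO.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits
```

Running this over every source and header file in the tree and failing the build on any hit would point at the offending line directly.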

    But let's take one step back and ask: do you have any reason for using the __DATE__ and __TIME__ macros? Personally I cannot see any, so this could be a good lesson for me. ;-)

  • In reply to Bartlomiej Kucharczyk:

    Typically as part of a revision string. Someone must think it's a good idea, or it would not be an ancient part of the standard. 8^)
  • In reply to Keith Barkley:

    Maybe that was useful when programming was done by carving rocks, and there was no git. ;-) I don't know...
    If you use a version control system, the revision string can be generated from it deterministically.
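As a rough illustration of generating the revision string from git, here is a minimal sketch. It assumes git is on the PATH and the sources are in a git checkout. The header guard, the BUILD_REVISION macro and both function names are hypothetical.

```python
import subprocess


def git_revision(repo_dir="."):
    """Return the output of `git describe --always --dirty` for repo_dir."""
    result = subprocess.run(
        ["git", "describe", "--always", "--dirty"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def render_version_header(revision: str) -> str:
    """Render a C header that defines the revision as a string literal."""
    # Escape backslashes and quotes so the value stays a valid C string.
    escaped = revision.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "#ifndef BUILD_REVISION_H\n"
        "#define BUILD_REVISION_H\n"
        f'#define BUILD_REVISION "{escaped}"\n'
        "#endif\n"
    )
```

Writing render_version_header(git_revision()) to a header before compiling gives the same text for the same commit, unlike __DATE__ and __TIME__, so the resulting object stays comparable across builds.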

    Topic is also discussed here:
    reproducible-builds.org/.../
    reproducible-builds.org/.../
