Application goes wrong when applying CCS C optimization level 3

Hi

I am porting the RSA public key infrastructure from PolarSSL (http://polarssl.org/) to the C6748. It works fine, except that signing and decryption are very slow, taking roughly 1.5 s per operation. When I optimized at level 1 it still worked, but the speed did not improve much. I then tried level 3, and the results became incorrect. I added logging and found the part that is incorrect, but I don't know the reason or how to fix it.

https://www.dropbox.com/s/6vm4edlfnogun99/bn_mul.h

https://www.dropbox.com/s/tr4nkyd27sjr6zr/bignum.h

https://www.dropbox.com/s/edaqs027jxd3h0y/bignum.c

In bignum.c, the function mpi_montmul computes the Montgomery multiplication A = A * B * R^-1 mod N. The values of u1 are not correct at optimization level 2 or 3; these values are affected by the function mpi_mul_hlp. Looking into mpi_mul_hlp in more detail, it uses the macros MULADDC_INIT, MULADDC_CORE and MULADDC_STOP, which have optimized implementations for specific architectures such as Intel i386, AMD, ARM v3, Alpha, ..., but not for the C6748, so it falls back to the default C code below:

#define MULADDC_INIT \
{ \
t_uint s0, s1, b0, b1; \
t_uint r0, r1, rx, ry; \
b0 = ( b << biH ) >> biH; \
b1 = ( b >> biH );

#define MULADDC_CORE \
s0 = ( *s << biH ) >> biH; \
s1 = ( *s >> biH ); s++; \
rx = s0 * b1; r0 = s0 * b0; \
ry = s1 * b0; r1 = s1 * b1; \
r1 += ( rx >> biH ); \
r1 += ( ry >> biH ); \
rx <<= biH; ry <<= biH; \
r0 += rx; r1 += (r0 < rx); \
r0 += ry; r1 += (r0 < ry); \
r0 += c; r1 += (r0 < c); \
r0 += *d; r1 += (r0 < *d); \
c = r1; *(d++) = r0;

#define MULADDC_STOP \
}

So in summary, I would like the experts here to help me find out:

1. Why it produces wrong results if the optimization level is >= 2

2. How to implement MULADDC_INIT / MULADDC_CORE for the C6748. This part consumes most of the time during RSA decryption or signing, so a C6748-specific implementation could improve performance a lot.
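As a starting point for question 2, here is a minimal sketch of the same multiply-accumulate step written with a 64-bit intermediate instead of the four half-word products in the default MULADDC_CORE. It assumes t_uint is a 32-bit unsigned type and that the compiler provides a 64-bit unsigned type through stdint.h. The function name muladdc_words is hypothetical and is not part of PolarSSL, and whether this is actually faster on the C6748 would have to be measured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a PolarSSL function): for each of the n limbs,
 * compute d[i] += s[i] * b plus the incoming carry c, and return the final
 * carry. This is the same arithmetic that MULADDC_CORE performs per limb.
 */
static uint32_t muladdc_words(const uint32_t *s, uint32_t *d, size_t n,
                              uint32_t b, uint32_t c)
{
    size_t i;

    assert(s != NULL && d != NULL);

    for (i = 0; i < n; i++) {
        /*
         * The sum cannot overflow 64 bits:
         * (2^32 - 1)^2 + 2 * (2^32 - 1) = 2^64 - 1.
         */
        uint64_t r = (uint64_t)s[i] * b + d[i] + c;

        d[i] = (uint32_t)r;          /* low word is the new limb   */
        c    = (uint32_t)(r >> 32);  /* high word is the new carry */
    }
    return c;
}
```

Comparing the output of this function against the default macros on the same inputs is one way to check that a replacement behaves identically before measuring its speed.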

Thank you very much!

Long

  • Long,

    This is a compiler-related issue, so I am moving it to the compiler forum. Providing the version of the compiler you are using might help in replicating this issue.

    It would also help narrow down the issue if you could single-step through the code and find which part is causing the problem. For this you will have to build the code at both O2 and O3 with the -g (full symbolic debug) option enabled.

    Regards,

    Rahul

  • Are you using COFF ABI or EABI?  Be aware that in COFF, the type "long" is a 40-bit type, and it looks like the code assumes "long" is a 32-bit type.  Try using EABI.
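    The point about the width of "long" can be checked directly on the target. The sketch below counts the value bits of a type's maximum; the helper names are hypothetical, and it assumes the bignum limb type is meant to be exactly 32 bits wide, as the code appears to expect.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Count how many value bits are needed to hold max_value. */
static unsigned value_bits(unsigned long long max_value)
{
    unsigned bits = 0;

    while (max_value != 0) {
        bits++;
        max_value >>= 1;
    }
    return bits;
}

/* Returns 1 when a type whose maximum is limb_max is exactly 32 bits wide. */
static int limb_is_32_bits(unsigned long long limb_max)
{
    return value_bits(limb_max) == 32;
}

/* Aborts (via assert) when the candidate limb type is not 32 bits wide. */
static void require_32_bit_limb(unsigned long long limb_max)
{
    assert(limb_is_32_bits(limb_max));
}

/* Width of unsigned long under the ABI the code was compiled for. */
static unsigned ulong_bits(void)
{
    return value_bits(ULONG_MAX);
}
```

    Calling require_32_bit_limb(ULONG_MAX) early in a test build would stop immediately under an ABI where "long" is wider than 32 bits, provided the limb type really is defined in terms of unsigned long.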

    Usually programs that work well at low optimization and fail at high optimization have an error in the source code, such as an uninitialized variable.

    Do you get any warnings when compiling this program?

    This test case is not complete.  You do not show the header file config.h, the function main, or the command-line arguments.  We need all of these, plus the version of the compiler (which is not the same as the CCS version).

  • Hi Rahul & Archaeologist

    I am using TI compiler version 7.3.1, and the output format is EABI. I did not attach the full, runnable project because it is quite complicated to separate it from my own (private) project code. So I will first try to describe the situation as clearly as possible; if that does not solve it, I will try to prepare a test package.

    There is promising progress on this problem: I found the place that goes wrong at optimization level 3 and can fix it, but unfortunately I don't know why it happens or whether my fix is a good solution. Please check bignum.c, function mpi_mul_hlp, line 988, where there is a loop:

    do {
        *d += c; c = ( *d < c ); d++;
    }
    while( c != 0 );

    For some reason, the compiler optimizes it in such a way that the value of *d is not correct. Before the loop, *d = 0 and c = 0, so the loop is expected to run only once and *d must still be 0 after the loop (note that I am referring to the location d pointed to before it was advanced by d++). However, the value of *d turns out to be a number other than 0 (I suspect it may be the address held in pointer d instead).

    Now, I changed it this way:

    do {
        t_uint temp = *d;
        if (c != 0)
            *d = temp + c;
        c = ( *d < c );
        d++;
    }
    while( c != 0 );

    => It works now: the value of *d after this loop is 0 as expected (at the original location of d, not after d++), but I am really confused about how this can be. Could you please explain what happens here and what a better solution would be?

    *** Now the decryption function works, but another bug has appeared :( After decrypting the data, I go on to do an RC2 decryption, and the optimization makes it go wrong again: the function psRc2Decrypt in rc2.c produces wrong results. So far I have not found the place or the reason.

    https://www.dropbox.com/s/ftt1yd73iavq0t8/rc2.h

    https://www.dropbox.com/s/07oan8yswfrt8nq/rc2.c

    These kinds of bugs are really annoying and take a lot of my time to debug and fix, so it would be very helpful if you could tell me the reason and how to write code that avoids these bugs in the future.

    Please excuse the inconvenience of not including the full project here. I have tried to describe all the required information as well as I could; if you need more information, please let me know.

    Thank you very much!

    Long

  • Do you get any warnings when you compile the program?

    You could try upgrading to compiler version 7.3.6; perhaps the problem is a bug that is already fixed.

    I understand that you are reluctant to reveal your source code, but please understand that we must have a reproducible test case in order to analyze the bug, particularly when optimization is involved.  I cannot see any problem with the code fragments you show above.  If the bug manifests in the files you've already sent, it should be fairly easy to create a dummy test harness that populates the input to those functions with the data that would have been calculated by your own code.
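    A dummy harness of the kind described above could be as small as the following sketch. It copies the carry-propagation loop quoted earlier in the thread into a standalone function and feeds it fixed inputs. The typedef of t_uint as a 32-bit type and the function name propagate_carry are assumptions made for this illustration; the real definitions come from bignum.h and bignum.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t t_uint;  /* assumption: the limb type is 32 bits wide */

/*
 * Carry-propagation loop as quoted from mpi_mul_hlp, wrapped in a
 * hypothetical function so it can be built at different optimization
 * levels. Returns the number of limbs the loop touched.
 */
static int propagate_carry(t_uint *d, t_uint c)
{
    int touched = 0;

    assert(d != NULL);

    do {
        *d += c; c = ( *d < c ); d++;
        touched++;
    }
    while( c != 0 );

    return touched;
}
```

    With *d = 0 and c = 0 the loop should run once and leave the first limb at 0. Building this file at each optimization level and comparing the results would show whether the fragment alone reproduces the problem.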

  • Hi Archaeologist

    No, I don't get any warnings for this code. I thought that by showing the failing code and the working code you might be able to figure something out.

    I will try updating the compiler to see if that fixes it.

    Thanks,

    Long

  • Does PolarSSL come with its own test vectors?  Do they pass on C6000?

  • They have some tests, but I didn't port that part to C6000. I will also check whether I can port a specific test case and give it to you.

    Thanks!