Tool/software: TI C/C++ Compiler
Hi C2000 Team,
My customer is looking into using the F280049 and has noticed this compiler discrepancy. They used the F2808 in their previous design.
I’ve been doing some testing with the F2808 and the F280049 in an attempt to quantify the performance increase we could expect from a migration from the ’08 to the ’0049. Before I can get into the “meat” of these tests, though, I need to do some baseline testing to make sure everything is “set” correctly. So I created some simple test code to execute on both, and I get good results. Here’s my test code (matrix multiplication):
for( i = 0; i < N; i += 1 )
{
    for( j = 0; j < N; j += 1 )
    {
        for( k = 0; k < N; k += 1 )
        {
            C[i][j] += A[i][k] * B[k][j];
        }
    }
}
My Environment:
CCS v 6.2.0.00050
Compiler version TI v18.1.1.LTS
Test code executes from M0 RAM
Core = 100MHz
Matrices A, B, and C are all 64x64
This snippet of code was purposely written “textbook” style, without any regard to optimization (this is part of the test). Furthermore, the algorithm itself is a good representation of how we currently use the F2808, so it should give me a good idea of how the F280049 would perform for the same application. I measure the execution speed by toggling a GPIO pin after each pass and watching it on a scope (the old-fashioned way).
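For anyone who wants to reproduce the measurement, a minimal self-contained sketch of one test pass might look like the following. The element type (int32_t), the helper names init_matrices, matmul_pass and toggle_test_pin, and the toggle counter are all assumptions made for illustration; the original test does not state them. On the target, toggle_test_pin would write the GPIO toggle register of whichever pin is on the scope; here it only counts calls so the sketch also runs on a host.

```c
#include <stdint.h>

#define N 64  /* matrix dimension from the environment list above */

/* Assumption: 32-bit integer elements; the post does not state the type. */
int32_t A[N][N], B[N][N], C[N][N];

/* Hypothetical stand-in for the GPIO toggle. It only counts calls here. */
unsigned long toggle_count = 0;
void toggle_test_pin(void)
{
    toggle_count++;
}

/* Fill A with a simple pattern, make B the identity matrix, and clear C. */
void init_matrices(void)
{
    int i, j;
    for (i = 0; i < N; i += 1) {
        for (j = 0; j < N; j += 1) {
            A[i][j] = i + j;
            B[i][j] = (i == j) ? 1 : 0;
            C[i][j] = 0;
        }
    }
}

/* One "textbook" pass: C += A * B, then toggle the pin once. */
void matmul_pass(void)
{
    int i, j, k;
    for (i = 0; i < N; i += 1) {
        for (j = 0; j < N; j += 1) {
            for (k = 0; k < N; k += 1) {
                C[i][j] += A[i][k] * B[k][j];
            }
        }
    }
    toggle_test_pin();
}
```

Because B is the identity in this sketch, each pass adds A to C, which gives an easy way to confirm that the loop under test still computes the right result after changing optimization settings. Calling matmul_pass in an endless loop gives one pin edge per pass, so the time between edges on the scope is the per-pass execution time.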
I’m glad to say that the assembly code generated by the compiler for each processor is identical, as expected (thankfully), and both execute in exactly the same time when both are running their cores at 100MHz (82.1ms execution time per pass). This is a good baseline for my performance comparison investigation, and I am happy to see this.
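To make the baseline easier to compare against an assembly listing, the per-pass time can be converted into an average cycle count per inner-loop iteration. The helper below is only a sketch; the function name cycles_per_iteration is an assumption, and the result is an average that folds the outer-loop overhead into each iteration.

```c
/* Average CPU cycles per inner-loop iteration for an n x n x n
 * triple loop, given the measured time for one pass and the core clock. */
double cycles_per_iteration(double pass_time_s, double core_hz, long n)
{
    double total_cycles = pass_time_s * core_hz;
    double iterations = (double)n * (double)n * (double)n;
    return total_cycles / iterations;
}
```

With the figures above (82.1 ms per pass, 100 MHz core, N = 64), cycles_per_iteration(0.0821, 100e6, 64) works out to roughly 31.3 cycles per iteration, since 8,210,000 cycles are spread over 262,144 iterations.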
However, when I set the compiler optimization level to “2” (speed = 2 = default), the F2808 gains an advantage in terms of execution speed. Upon further investigation, I found that this advantage is a result of a difference in the compiler optimizations, which is puzzling. The F280049 fully supports the entire F2808 instruction set, so I would expect (at the very least) identical assembly code generated – but that is not the case. The F2808 optimizer “wins” by creating a tighter “inner loop” for this test code snippet (see assembly listings below). This results in the F280049 taking 11% longer to perform the calculation. Should I expect my code to run 11% slower on the F280049 if I decide to migrate? Please tell me “no” and that a new compiler revision is in the works.
Below is a side-by-side assembly listing (generated by the compiler) for the F280049 and F2808 – I’ve highlighted the differences that result in a better optimization for the F2808 (not sure why there is ANY difference).
I literally copied the F2808 asm code and placed it in the F280049 “main()” procedure, and it worked fine. Execution time is now identical between the two processors. Still not sure why the F280049 compiler chose a sub-par solution.
Please let me know if you have any questions. I appreciate your feedback.
Regards,
~John