Hi,
I'm trying to run the Linpack benchmark on a C6678 EVM board. This is the source file of Linpack I'm using.
I saw some slides from TI claiming 11 GFLOPs/core on the same chip running that benchmark.
(page 5 of the slides)
So I compiled and ran the Linpack benchmark on the C6678 with the same configuration, but it only reached 130 MFLOPs/core without optimization. I then manually optimized the source code and turned on all the compiler optimizations, but could only get up to 500 MFLOPs. (The theoretical limit is 16 GFLOPs @ 1 GHz.)
Even though I didn't use OpenMP and only ran the benchmark on one core, it should reach at least a few GFLOPs rather than MFLOPs.
So I'm wondering whether I missed something important for getting high performance. Could you please point me in the right direction on how to get the numbers in the slides?
Thanks in advance,
Shang