Hi,
I tested double-precision floating-point arithmetic and found that 100 multiply-and-add calculations take roughly 2900 CPU cycles (the cycle counter went from 646338 to 649270).
That is much slower than the datasheet describes.
The GMAC performance in the data sheet is based on the raw performance of the DSP core when the code is pipelined across the DSP instruction pipeline. The performance you get from compiled code may be slightly lower, depending on how the code is written, which optimization level is used, where the code is located and how much memory latency that adds, and so on.
Please place all code and data in L2 memory, turn on the L1P and L1D caches, and compile with the -O3 option. Make sure the data is aligned to a cache line boundary, and use simple optimizations as indicated here to see the highest performance from the core.
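As a rough illustration of the kind of simple optimizations meant here, below is a minimal sketch of a 100-element multiply-accumulate loop written so that the compiler has a better chance of pipelining it. The names `dot_mac`, `fill_and_run`, `x` and `y` are made up for this example, and the 128-byte alignment is an assumption (check the cache line size for your device). The TI-specific pragmas are guarded so the same file also builds with a host compiler.

```c
#include <assert.h>

#define N 100  /* element count taken from the question */

/* On TI compilers, align the buffers to an assumed 128-byte cache line.
 * The value 128 is an assumption; use the line size of your device. */
#ifdef __TI_COMPILER_VERSION__
#pragma DATA_ALIGN(x, 128)
#pragma DATA_ALIGN(y, 128)
#endif
static double x[N], y[N];

/* Multiply-accumulate over n elements.
 * 'restrict' promises the compiler that a and b do not alias, and the two
 * independent accumulators shorten the loop-carried dependency on the add.
 * Note that this changes the floating-point summation order. */
static double dot_mac(const double *restrict a,
                      const double *restrict b, int n)
{
    double acc0 = 0.0, acc1 = 0.0;
    int i;

    assert(n > 0 && n % 2 == 0);  /* the loop below consumes pairs */

#ifdef __TI_COMPILER_VERSION__
#pragma MUST_ITERATE(1, , 1)      /* at least one trip through the loop */
#endif
    for (i = 0; i < n; i += 2) {
        acc0 += a[i] * b[i];
        acc1 += a[i + 1] * b[i + 1];
    }
    return acc0 + acc1;
}

/* Hypothetical driver: fills the buffers with simple test data and runs
 * the kernel once. On the target, this call is what you would time. */
static double fill_and_run(void)
{
    int i;
    for (i = 0; i < N; i++) {
        x[i] = (double)i;
        y[i] = 2.0;
    }
    return dot_mac(x, y, N);
}
```

When timing this on the device, read the cycle counter immediately before and after the call to `dot_mac` only, and run it once beforehand so the caches are warm. Otherwise the measurement includes cache misses and setup code rather than the loop itself.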
The performance of the core has been independently evaluated by BDTI and by other customers. You can also check out the core benchmarks here and refer to our DSP lib for performance entitlement techniques.
Regards,
Rahul