Hello:
We're currently using the TI TMS320C6713 and TMS320C6748 in our designs, and we're looking for higher performance for our next-generation products. It sounds like the roadmap won't have higher speed parts until 2014, which won't meet my project schedule, so I'm looking to decrease the cycle time of the algorithm instead. Our timing analysis of the existing design, which runs at 288MHz, shows that even after scaling to the 456MHz clock rate of the higher speed grade part, I'll be about 1% over our timing budget.
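The scaling from 288MHz to 456MHz is proportional arithmetic. Here is a minimal sketch, assuming the loop's cycle count stays the same at the faster clock (that is, the loop is core-bound; time spent waiting on EDMA transfers or external memory may not shrink in proportion). The function names `scale_time_ns` and `pct_over_budget` are hypothetical.

```c
/* Hypothetical helper: rescale a time measured at one core clock to another.
 * Assumes the cycle count is unchanged, which only holds for core-bound code. */
double scale_time_ns(double measured_ns, double measured_mhz, double target_mhz)
{
    return measured_ns * measured_mhz / target_mhz;
}

/* Hypothetical helper: percent by which a time exceeds a budget
 * (a negative result means the time is under budget). */
double pct_over_budget(double time_ns, double budget_ns)
{
    return 100.0 * (time_ns - budget_ns) / budget_ns;
}
```

For example, a hypothetical 100 ns measured at 288MHz scales to 100 × 288 / 456 ≈ 63.2 ns at 456MHz.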
Since the C66x part won't be available for a while, let me ask a few additional questions. First, a description of what I'm doing. The application in question will use EDMA to transfer pairs of 16-bit unsigned integers from an external FIFO into buffers in IRAM. Each data pair is scaled and ratioed, and then a histogram bin (also in IRAM) is incremented. So for each data pair there are a few multiplies, one addition and one divide, followed by an array dereference and increment.
I compared the performance of our 6713 design (core clocked at 300MHz) and our 6746-based design (core clocked at 288MHz), scaling both to the 456MHz clock of the higher speed grade 6746. I know the 6713 isn't available in the higher speed grade, but it was what I had available at the moment, so I used it for my initial benchmark; when I finally got around to running it on the 6746, I was surprised at the results. The 6713 code was compiled with cl6x V5.1.0 under CCStudio V3.1, and the 6746 code was compiled with cl6x V7.32.6 under CCStudio V5.2. Both were compiled at -O3 (no -g) with the -mv6710 or -mv6740 flag set (for the 6713 and 6746, respectively). Here are the questions:
- The normalized results give me a cycle time of about 83ns for the 6713 and 101ns for the 6746, so the 6713's cycle time is about 18% shorter than the 6746's. This surprised me, as I would have expected the 6746 core to perform at least as well as the 6713 at the same clock rate, if not better. Is this what you would expect as well?
- I set the compiler flags to output the annotated assembly with optimization info. Interestingly, the output for the main processing loop is significantly different between the two; I would have expected much more similar output from the two compilers. Or, to ask the question another way: would you expect the V7.x compiler to give equivalent or better performance than the V5.x compiler?
- I tried a number of modifications, such as decrementing the main for() loop instead of incrementing it and changing from 16-bit to 32-bit variables, but despite all my tweaking, the results above were the best I could achieve. Is there a single document for the latest version of the compiler that describes coding for performance, or is the Wiki the best place to get this info these days? Most of what I have is a couple of years old at this point.
- Finally, is 456MHz going to be the highest speed grade offered for the 6746, or is a faster part coming in the near future?
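For reference, the per-pair processing described earlier (scale, ratio, then increment a histogram bin) might look roughly like the following C loop. This is a minimal sketch only: the actual scaling and ratio formula isn't stated, so the ratio `a / (a + b)`, the scale factors, the bin count, and the names `process_pairs`, `SCALE_A`, `SCALE_B` and `NUM_BINS` are all hypothetical. The input is assumed to be interleaved (a, b) pairs as the EDMA would deposit them in IRAM.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_BINS 256u /* hypothetical histogram size */

/* Hypothetical, non-negative scale factors; real values are application-specific. */
static const float SCALE_A = 1.0f;
static const float SCALE_B = 1.0f;

/* pairs: interleaved a0, b0, a1, b1, ... (assumed EDMA buffer layout in IRAM)
 * hist:  NUM_BINS counters, also assumed to live in IRAM */
void process_pairs(const uint16_t *pairs, size_t num_pairs, uint32_t *hist)
{
    for (size_t i = 0; i < num_pairs; i++) {
        float a = (float)pairs[2 * i] * SCALE_A;     /* multiply */
        float b = (float)pairs[2 * i + 1] * SCALE_B; /* multiply */
        float sum = a + b;                           /* addition */
        if (sum <= 0.0f)
            continue;                                /* guard against divide by zero */
        float ratio = a / sum;                       /* divide; in [0, 1] for non-negative scales */
        uint32_t bin = (uint32_t)(ratio * (float)(NUM_BINS - 1)); /* multiply, truncate to a bin */
        hist[bin]++;                                 /* array dereference and increment */
    }
}
```

With these placeholder values, a pair (3, 1) gives a ratio of 0.75 and lands in bin 191; a pair of zeros is skipped.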
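On the arithmetic in the first question: the "18%" figure uses the 6746's cycle time as the baseline, (101 − 83) / 101 ≈ 17.8%. Using the 6713 as the baseline instead, the 6746 takes about (101 − 83) / 83 ≈ 21.7% longer. A small sketch (function names hypothetical) makes the two baselines explicit:

```c
/* Hypothetical helper: percent by which fast_ns is shorter than slow_ns,
 * using slow_ns as the baseline. */
double pct_shorter(double fast_ns, double slow_ns)
{
    return 100.0 * (slow_ns - fast_ns) / slow_ns;
}

/* Hypothetical helper: percent by which slow_ns is longer than fast_ns,
 * using fast_ns as the baseline. */
double pct_longer(double fast_ns, double slow_ns)
{
    return 100.0 * (slow_ns - fast_ns) / fast_ns;
}
```

Calling `pct_shorter(83.0, 101.0)` gives about 17.8, and `pct_longer(83.0, 101.0)` gives about 21.7.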
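The two loop changes mentioned in the third question can be illustrated on the histogram update alone. This is a sketch, assuming the bin indices have already been computed into an array; `hist_update_up16` and `hist_update_down32` are hypothetical names. Both variants produce identical histograms, and which one compiles to a faster loop depends on the code generated for the target.

```c
#include <stdint.h>

/* Variant 1: incrementing loop with a 16-bit counter and 16-bit bin indices.
 * The counter type limits n to 65535 entries. */
void hist_update_up16(const uint16_t *bins, uint16_t n, uint32_t *hist)
{
    for (uint16_t i = 0; i < n; i++)
        hist[bins[i]]++;
}

/* Variant 2: decrementing loop with a 32-bit counter and 32-bit bin indices.
 * Entries are visited in reverse order, which does not change the histogram. */
void hist_update_down32(const uint32_t *bins, uint32_t n, uint32_t *hist)
{
    for (uint32_t i = n; i != 0; i--)
        hist[bins[i - 1]]++;
}
```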
Thanks much for any additional info you can provide.
Best regards,
Paul