C66+ cycle accurate simulation for C6678 DSP

Howdy Forum,

I'm trying to verify this result from the test reports using the existing FFT_Example_66_LE_COFF.

DSP_fft16x16_66_LE_COFF_c66-LE-COFF 1 (TCI6608-DevFuncSim-LE) ccs_base Passed " 461 (N=128) 752 (N=256)" " 1472 bytes"

When I run, however, I get the following results:

[TMS320C66x_0]
overhead = 2
start, stop, time fft16x16 = 337239, 337962, 721
start, stop, time fft16x32 = 348360, 349767, 1405
start, stop, time fft32x32 = 360340, 361739, 1397

I should note that I changed N from 256 to 128 in the example project, so my expected result should be the 461 cycles.

My target config is selected as C6678 Device Cycle Approximate Simulator, little endian.  However, when I go to Build Properties, the variant is "Custom C6000 Device".

  1. How should I change this to get the correct cycle estimates found in the benchmarks test report?
  2. How can I change the variant without blowing away all the "includes" and other settings relative to the project?
Thanks!
Jim
  • Jim,

    I would like to look into your question. Would you please send me the complete path to the project you are using, including the complete release name?

    Thanks,

    Ran

  • Hi Ran,

    Sure, I'm using the FFT_Example_66_LE_COFF example in the C66x DSPLIB (v3.1.1.1):

    • C:\ti\dsplib_c66x_3_1_1_1\examples\FFT_Example_66_LE_COFF
    The only modification to the project is reading the TSCL register to get cycle counts, as shown below:

    //TSCL is declared in <c6x.h>; printf needs <stdio.h>
    //Write TSCL reg to start counting (any write starts the counter; the value is ignored)
    TSCL = 0;

    //Show me the overhead of two back-to-back TSCL reads
    start = TSCL;
    stop = TSCL;
    overhead = stop-start;
    printf("overhead = %d\n", overhead);

    /* Generate the input data */
    generateInput ();

    /* Generate the various twiddle factors */
    gen_twiddle_fft16x16(w_16x16, N);
    gen_twiddle_fft16x32(w_16x32, N);
    gen_twiddle_fft32x32(w_32x32, N, 2147483647.5);

    /* Call the various FFT routines, timing each one with TSCL */
    start = TSCL;
    DSP_fft16x16(w_16x16, N, x_16x16, y_16x16);
    stop = TSCL;
    count_time = stop-start;
    printf("start, stop, time fft16x16 = %d, %d, %d\n", start, stop, count_time-overhead);

    etc...

    Thoughts?

    Jim

  • I ran the project on an EVM and got slightly better readings than you, but very similar:

     

    Routine       N=256 (cycles)   N=128 (cycles)
    FFT 16x16          1018              621
    FFT 16x32          2346             1195
    FFT 32x32          2290             1149

     

    Next I am going to "play" with the project to see if I can reproduce the test report numbers. Stay tuned.

    Ran

  • Explanation

    The results in the test report were generated with an old version of the simulator that did not take cache bank stalls into account. The simulator that you use (and, of course, the real hardware that I use) does take cache bank conflicts into account, hence the difference.

    I have submitted a request to correct the numbers.

    Thanks,

    Ran

  • Ran,

    EXCELLENT follow-up, I appreciate it.

    It's a bummer to hear that; it's something like a 30-50% hit on the quoted latency... Bummer!

    Jim