Comparing benchmarks: C6747 on OMAPL137 and INTEGRA

Other Parts Discussed in Thread: SYSBIOS

Hello,

we were comparing benchmarks of FFT algorithms on the C6747 on OMAPL137 and on INTEGRA. The DSP clock frequencies are 300 MHz (OMAP) vs. 800 MHz (INTEGRA), so we expected a speedup of roughly 2.7 times. But our benchmarks show a speedup of only about 1.2 to 1.3, so we think there is something wrong with our settings or with our benchmark.
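As a rough sanity check on these numbers, here is a minimal sketch of the arithmetic. The function names are illustrative only and are not part of the attached project. It assumes that run time scales purely with the DSP clock, which is exactly the assumption that breaks down when the code is limited by memory access rather than by the core.

```c
#include <stdio.h>

/* Ideal speedup if run time scaled only with the DSP clock frequency. */
double ideal_speedup(double f_slow_mhz, double f_fast_mhz)
{
    return f_fast_mhz / f_slow_mhz;
}

/* Fraction of the ideal speedup that a measured speedup actually reaches. */
double scaling_efficiency(double measured_speedup, double ideal)
{
    return measured_speedup / ideal;
}

/* Print a one-line summary for a measured speedup (300 MHz vs. 800 MHz). */
void report(double measured_speedup)
{
    double ideal = ideal_speedup(300.0, 800.0);
    printf("measured %.2fx of ideal %.2fx (%.0f%% of clock scaling)\n",
           measured_speedup, ideal,
           100.0 * scaling_efficiency(measured_speedup, ideal));
}
```

Calling `report(1.2)` and `report(1.3)` shows that the measured speedup reaches less than half of what the clock ratio alone would suggest, which points towards the memory system rather than the core.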

We checked the clock of the DSP on Integra and it seems to really be at 800MHz. The benchmark algorithm should make use of the IRAM as much as possible - although with this comparison the absolute speed should not matter too much.

 

I'm attaching the project (CCS5) so I hope I don't forget any important information.

The benchmark is started in main.c (big surprise). The files fft_tms_v1.c to fft_tms_v5.c contain the benchmark algorithms: an FFT that can process many lines, in several differently optimised versions. Since we're currently evaluating INTEGRA to see if we can use it in the future, this benchmark is quite important to us. Any help will be very much appreciated.

 

cu

Markus

3056.benchmark_Test1_clean.zip

 

 

  • Things seem to be a bit more complicated:

    The state of the benchmark I sent you yesterday was: The integra benchmark is too slow. My colleague found out that the L2 cache size is set to 0 by default (see attached screenshot below).

    So the L2 could not be used and the benchmark was slow. He used the sysbios to configure 64k as L2 cache size (see screenshot) and the benchmark was running much faster. These lines are in the .cfg file:

    ti_sysbios_family_c64p_Cache.initSize.l2Size = ti_sysbios_family_c64p_Cache.L2Size_64K;


    Unfortunately, the FFT results were now wrong :( Using the API function ti_sysbios_family_c64p_Cache_getSize, he found that the whole internal RAM was being used as L2 cache, which corrupted some data structures and led to wrong results.

    Now, using the following code, he managed to get faster benchmarks (INTEGRA is 1.9 times faster than OMAPL137 for a 262144-line FFT) with correct results:

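    #include <ti/sysbios/interfaces/ICache.h>
    #include <ti/sysbios/family/c64p/Cache.h>

        ti_sysbios_family_c64p_Cache_Size csize;   /* holds L1P, L1D and L2 cache sizes */

        /* Read the current sizes, change only L2 to 64K, then write them back. */
        ti_sysbios_family_c64p_Cache_getSize(&csize);
        csize.l2Size = ti_sysbios_family_c64p_Cache_L2Size_64K;
        ti_sysbios_family_c64p_Cache_setSize(&csize);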

     

    I would like to know why the Cache setting in the config file was not working. It seems to mean that we can't rely on the settings that we make in the config file, which is a little bit disturbing...

     

    Here come the screenshots for the settings that we made:

    Thanks for your help,

    Markus

  • Markus,

    Could you let me know what version of SYSBIOS you are using?

    Since you are changing the Cache.initSize in your .cfg file, you need to make sure that you aren't placing code/data into the part of L2 RAM that will become cache.  Would it be an option for you to change your platform to set the L2 Cache to "64K" instead of in the .cfg file?  This would prevent any code/data in L2 RAM from being placed into the memory that will become the cache.
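To illustrate the overlap concern, here is a minimal sketch that checks whether a section would fall into the part of L2 that becomes cache. It rests on two assumptions that should be verified against the device documentation: that L2 is 256 KB at 0x11800000 (adjust to the IRAM segment of your platform), and that the cache is carved out of the highest L2 addresses. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define L2_BASE 0x11800000u   /* ASSUMPTION: start of L2 RAM (IRAM segment) */
#define L2_SIZE 0x00040000u   /* ASSUMPTION: 256 KB of L2 in total */

/*
 * Returns true if [addr, addr + len) overlaps the part of L2 that is used
 * as cache. ASSUMPTION: the cache occupies the top cache_bytes of L2.
 */
bool overlaps_l2_cache(uint32_t addr, uint32_t len, uint32_t cache_bytes)
{
    uint64_t cache_start = (uint64_t)L2_BASE + L2_SIZE - cache_bytes;
    uint64_t cache_end   = (uint64_t)L2_BASE + L2_SIZE;
    uint64_t sect_start  = addr;
    uint64_t sect_end    = (uint64_t)addr + len;   /* 64-bit sum avoids wraparound */

    return len > 0 && sect_start < cache_end && sect_end > cache_start;
}
```

With a 64 KB cache under these assumptions, anything placed at or above 0x11830000 would collide with the cache, while sections that live entirely in external memory would not.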

    How did you determine that all of L2 is being used as cache when doing the setting in the .cfg file?

    Perhaps post a screen shot of your .map file where it shows the memory segments and usage.  This would also give a clue whether this is the problem.

    Judah

  • Hello,

    I'm forwarding the information from my colleague:

    SYSBIOS version is 6.31.04.27

    The map file:

    ******************************************************************************
                   TMS320C6x Linker PC v7.0.4                     
    ******************************************************************************
    >> Linked Wed Mar 23 12:23:14 2011

    OUTPUT FILE NAME:   <benchmark_Test1.out>
    ENTRY POINT SYMBOL: "_c_int00"  address: c3c21080


    MEMORY CONFIGURATION

             name            origin    length      used     unused   attr    fill
    ----------------------  --------  ---------  --------  --------  ----  --------
      IROM                  11700000   00100000  00000000  00100000  R  X
      IRAM                  11800000   00040000  00000000  00040000  RW X
      L3_CBA_RAM            80000000   00020000  00000000  00020000  RW X
      SDRAM                 c3000000   01000000  00c2d770  003d2890  RWIX


    SECTION ALLOCATION MAP

     output                                  attributes/
    section   page    origin      length       input sections
    --------  ----  ----------  ----------   ----------------
    .pinit     0    c3000000    00000000     UNINITIALIZED

    .data      0    c3000000    00000000     UNINITIALIZED

    xdc.meta   0    c3000000    000000e7     COPY SECTION
                      c3000000    000000e7     benchmark_Test1.p674.obj (xdc.meta)

    .far       0    c3000000    00c107a0     UNINITIALIZED

    ....

     

    My colleague used the debugger to fill the L2 with 0 before the program started. After the program finished, the whole L2 area had been modified.
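The same fill-and-inspect check can also be scripted instead of being done by hand in the debugger. The following is only a sketch with hypothetical helper names. It uses a caller-chosen pattern rather than 0, on the reasoning that a distinctive non-zero pattern also reveals locations that were overwritten with zeros.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill a memory region with a known pattern before the benchmark runs. */
void fill_pattern(volatile uint32_t *region, size_t words, uint32_t pattern)
{
    for (size_t i = 0; i < words; i++) {
        region[i] = pattern;
    }
}

/* Count how many words no longer hold the pattern after the benchmark. */
size_t count_modified(const volatile uint32_t *region, size_t words,
                      uint32_t pattern)
{
    size_t modified = 0;
    for (size_t i = 0; i < words; i++) {
        if (region[i] != pattern) {
            modified++;
        }
    }
    return modified;
}

/* Index of the first modified word, or `words` if nothing was changed. */
size_t first_modified(const volatile uint32_t *region, size_t words,
                      uint32_t pattern)
{
    for (size_t i = 0; i < words; i++) {
        if (region[i] != pattern) {
            return i;
        }
    }
    return words;
}
```

Comparing the count and the first modified index against the expected cache size gives a quick indication of whether only the cache portion or the whole region was touched.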

     

    Thanks for your help!

     

  • Markus,

    We weren't able to reproduce this on our end.  When we set the L2 Cache to be 64K from our .cfg file, we see that the cache gets configured to 64K correctly.

    We confirmed this by looking at the L2 cache config register (0x01840000).  It contained a value of 0x2, which maps to 64K.  We also confirmed that doing this at runtime with the Cache APIs does the same thing.  In our case, it didn't change the value since it was already set to 64K.
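A minimal sketch of how such a register value could be decoded is shown below. Only the 0x2 to 64K mapping comes from this thread. The position of the mode field (lowest three bits) and the other table entries are assumptions based on the usual C64x+/C674x encoding and should be checked against the device's cache documentation. The function name is hypothetical.

```c
#include <stdint.h>

#define L2CFG_ADDR        0x01840000u  /* L2 cache config register (from this thread) */
#define L2CFG_L2MODE_MASK 0x7u         /* ASSUMPTION: mode field is bits 2..0 */

/*
 * Translate an L2CFG register value into the L2 cache size in KB.
 * Returns -1 for mode values this sketch does not know about.
 */
int l2cfg_cache_kbytes(uint32_t l2cfg)
{
    switch (l2cfg & L2CFG_L2MODE_MASK) {
    case 0x0: return 0;     /* ASSUMPTION: cache disabled, all L2 is RAM */
    case 0x1: return 32;    /* ASSUMPTION */
    case 0x2: return 64;    /* confirmed in this thread */
    case 0x3: return 128;   /* ASSUMPTION */
    case 0x4: return 256;   /* ASSUMPTION */
    default:  return -1;    /* not covered here */
    }
}

/* On the target only, the register itself could be read like this:
 *     uint32_t l2cfg = *(volatile uint32_t *)L2CFG_ADDR;
 */
```

Calling `l2cfg_cache_kbytes()` with the register value once right after startup and once after the runtime Cache API calls would show whether the two configuration paths really end up with the same setting.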

    Since you're placing everything into SDRAM, you don't run the risk of overlapping your L2 RAM with L2 Cache settings unless you were copying into this memory region at runtime.

    My recommendation would be to confirm whether the Cache is getting set correctly or incorrectly.  If it's actually correct, then determine what's writing to the L2 memory that is not cache.  If some L2 Cache was enabled, you should expect some of the L2 memory to be modified.

    Judah