This thread has been locked.

speed problem OMAP-L138 / C6748 vs. C5510A

Other Parts Discussed in Thread: OMAP-L138, OMAPL138

Hello,

I wanted to test and compare the speed of the OMAP-L138 (actually the C6748 core) with the C5510A that we currently use in our systems.

Therefore I've written a simple program which multiplies two integer variables 1000 times and toggles a GPIO pin when it's done.

Unfortunately the C6748 is much slower than the C5510A. I guess I didn't configure the device properly.

I'm using the Logic OMAP-L138 development kit.

 

The compiler is set to the release configuration, and after a while I changed the linker command file to use DSP L2 memory instead of the default shared memory.

This setting boosts the speed to roughly half (!) of the C5510A performance.

 

How could I speed up the device?

The core is running at 300 MHz, and all code and variables are located in the DSP L1 RAM. The DSP has been woken up and released from reset.

The program is started with Run Free.

 

Regards

Matthias

  • Matthias,

    What are you using to profile the code?

    So all your program and data are in L1, and L1 is configured as 32K RAM and 0 cache?

    You can attach a test case and we can take a look.

     

  • Also, the end trigger is a GPIO toggle; what is the starting trigger? Another GPIO toggle?

  • Hi,

    Mariana said:
    What are you using to profile the code?

    I'm setting a GPIO Pin high at the beginning of a while loop. After the calculations are done, the GPIO Pin is set to low.

    I'm using an Oscilloscope to measure the time between the transitions.

    ---

    volatile int a, b, c;

    a = 123;
    b = 17;
    c = 0;

    while (1) {
        SETBIT(gpio_bank->OUT_DATA, gpio_bit);
        for (i = 0; i < 1000; i++) {
            c = a * b;
        }
        CLRBIT(gpio_bank->OUT_DATA, gpio_bit);
    }

    The complete for loop takes 117.2 µs. Without the multiplication it takes 47 µs.
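
    To put these timings in terms of CPU cycles, the conversion can be sketched as below. This is an illustration only: the helper name `cycles_per_iteration` is hypothetical, and 300 MHz is the clock the core is assumed to run at, not a measured value.

    ```c
    #include <stdint.h>

    /* Convert a measured loop time into CPU cycles per iteration.
     *   total_us   - measured time for the whole loop, in microseconds
     *   cpu_mhz    - ASSUMED CPU clock in MHz (300 in this thread)
     *   iterations - number of loop iterations (1000 here)
     * Microseconds times MHz gives cycles directly. */
    static double cycles_per_iteration(double total_us, double cpu_mhz,
                                       uint32_t iterations)
    {
        return total_us * cpu_mhz / (double)iterations;
    }
    ```

    Under that assumption, 117.2 µs for 1000 iterations works out to about 35 cycles per iteration and 47 µs to about 14, so the volatile multiply would account for roughly 21 cycles per pass.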

    ---

    Mariana said:
    So all your program and data are in L1, and L1 is configured as 32K RAM and 0 cache?

    Yes, all code and data are located at L1 and it is configured as 32K.

    void config_cache(void)
    {
      CSL_CacheRegsOvly cacheRegs = (CSL_CacheRegsOvly)CSL_CACHE_0_REGS;
      volatile unsigned int stall;

      // Set L1P size to 32K
      CSL_FINST(cacheRegs->L1PCFG,CACHE_L1PCFG_MODE,32K);
      stall = cacheRegs->L1PCFG;

      // Set L1D size to 32K
      CSL_FINST(cacheRegs->L1DCFG,CACHE_L1DCFG_MODE,32K);
      stall = cacheRegs->L1DCFG;

      // Set L2 size to 64K and normal operation
      cacheRegs->L2CFG = CSL_FMKT(CACHE_L2CFG_MODE,64K)
                       | CSL_FMKT(CACHE_L2CFG_L2CC,NORMAL);
      stall = cacheRegs->L2CFG;

      // Set MAR[192] as cacheable
      CSL_FINST(cacheRegs->MAR[192],CACHE_MAR_PC,CACHEABLE);

    }
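
    For reference, the MAR index used above follows from the address: on the C64x+/C674x megamodule each MAR register controls cacheability for one 16 MB region, so the index is the top 8 bits of the 32-bit address. A minimal sketch, with a hypothetical helper name `mar_index`:

    ```c
    #include <stdint.h>

    /* Each MAR register covers one 16 MB (0x01000000-byte) region,
     * so the register index is the address shifted right by 24 bits.
     * mar_index() is an illustrative helper, not part of the CSL. */
    static uint32_t mar_index(uint32_t address)
    {
        return address >> 24;
    }
    ```

    With this mapping, MAR[192] covers 0xC0000000 to 0xC0FFFFFF, which on the OMAP-L138 is the start of external DDR. An address in the 0x80000000 range, where the shared RAM is assumed to sit, would map to MAR[128] instead, and config_cache() does not touch that register.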

    I'm using the OMAPL138_ARM.gel file.

     

    Regards

    Matthias

  • Matthias,

    I see under 81us with this code:

    #include <stdio.h>

    #define Uint32 unsigned int

    #define TMR0_BASE  0x01C20000
    #define TMR0_TIM12 (*(volatile Uint32*)(TMR0_BASE+0x10))
    #define TMR0_PRD12 (*(volatile Uint32*)(TMR0_BASE+0x18))
    #define TMR0_TCR   (*(volatile Uint32*)(TMR0_BASE+0x20))
    #define TMR0_TGCR  (*(volatile Uint32*)(TMR0_BASE+0x24))

    #pragma DATA_SECTION(a, ".l1d");
    #pragma DATA_SECTION(b, ".l1d");
    #pragma DATA_SECTION(c, ".l1d");
    #pragma DATA_SECTION(i, ".l1d");
    volatile Uint32 a, b, c, i;

    void main (void)
    {
        Uint32 time_start, time_stop;

        /* Enable Timer */
        TMR0_TCR   = 0;
        TMR0_TGCR  = 0;
        TMR0_TIM12 = 0;
        TMR0_PRD12 = 0xFFFFFFFF;

        TMR0_TGCR  = 0x05;  // 32-bit, TIM12 out of reset
        TMR0_TCR   = 0x80; // Continuous Mode

        while(1) {
          time_start = TMR0_TIM12;

          for(i=0;i<1000;i++) {
            c = a * b;
          }

          time_stop = TMR0_TIM12;

          // Ignore timer overflow
          if(time_stop > time_start) {
            // TMR0 runs at oscin frequency of 24MHz
            printf("Time = %fus\n", ((time_stop-time_start)/24.0));
          }
        }
    }
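
    The overflow check in the loop above could be dropped, because unsigned 32-bit subtraction wraps modulo 2^32 and still gives the right difference across a single counter rollover. A sketch of that idea, assuming the timer counts up at 24 MHz as the comment above states; `TIMER_HZ`, `elapsed_ticks` and `ticks_to_us` are illustrative names, not part of any TI header:

    ```c
    #include <stdint.h>

    /* ASSUMPTION: the timer is clocked from the 24 MHz OSCIN. */
    #define TIMER_HZ 24000000u

    /* Ticks between two reads of a free-running 32-bit up-counter.
     * Unsigned arithmetic wraps modulo 2^32, so the result is correct
     * even if the counter rolled over once between the two reads. */
    static uint32_t elapsed_ticks(uint32_t start, uint32_t stop)
    {
        return stop - start;
    }

    /* Convert timer ticks to microseconds. */
    static double ticks_to_us(uint32_t ticks)
    {
        return (double)ticks * 1e6 / (double)TIMER_HZ;
    }
    ```

    This only holds while the measured interval is shorter than one full counter period, which is about 179 seconds at 24 MHz.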

     

  • Hello,

    I checked the PLL using OBSCLK and found that SYSCLK1 is running at 300 MHz.

    I observed SYSCLK1, SYSCLK2 and SYSCLK4 on an oscilloscope.

    But the CPU is very slow, and I don't know the reason.

    The speed looks like 24 MHz rather than 300 MHz. The EVM has a 24 MHz OSCIN.

    I have experience with the C6424, DM648 and C6747.
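
    One way to turn that impression into a number is to count CPU cycles over a window whose length is measured with a timer of known frequency, then scale. The helper below does only the arithmetic. How the cycle count is obtained (for example from the DSP's TSCL time-stamp counter) is an assumption, and `effective_cpu_mhz` is a hypothetical name.

    ```c
    #include <stdint.h>

    /* Estimate the effective CPU clock in MHz.
     *   cpu_cycles  - CPU cycles counted during the window (source assumed)
     *   timer_ticks - reference timer ticks counted over the same window
     *   timer_mhz   - reference timer frequency in MHz (24 for OSCIN here)
     * Returns 0.0 if no timer ticks elapsed, to avoid dividing by zero. */
    static double effective_cpu_mhz(uint32_t cpu_cycles, uint32_t timer_ticks,
                                    double timer_mhz)
    {
        if (timer_ticks == 0u) {
            return 0.0;
        }
        return (double)cpu_cycles * timer_mhz / (double)timer_ticks;
    }
    ```

    A result near 300 would suggest the core clock is as configured and the slowdown comes from somewhere else, such as memory placement or cache settings. A result near 24 would point at the PLL being bypassed.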

    The C6748 in EVM was ???