Slow Shared RAM performance on OMAPL137

Hi,
I have been evaluating the Shared RAM performance of the OMAPL137.
The basic approach is to copy data from one Shared RAM location to another.
The experimental result shows that it takes about 1us to copy 1 byte, which is very slow.
This figure is an average taken over a large copy.

Shared RAM is considered internal RAM and is therefore expected to be fast. However, the experimental result shows otherwise.

Has anybody else observed the same behavior?

My ARM processor is running at 40MHz.
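
As a rough sanity check on the figure above, here is a minimal sketch that converts a per-byte copy time into CPU cycles at a given core clock. The helper name `cycles_per_byte` is hypothetical and does not come from any TI library; it only does the arithmetic cycles = time_ns * f_MHz / 1000.

```c
/* Hypothetical helper (not part of any TI library):
 * converts a per-byte copy time, given in nanoseconds, into the
 * number of CPU cycles spent per byte at a core clock given in MHz.
 * One cycle at f MHz lasts 1000 / f nanoseconds, so
 * cycles = time_ns * f_MHz / 1000 (integer division, rounds down). */
unsigned long cycles_per_byte(unsigned long ns_per_byte, unsigned long cpu_mhz)
{
    return (ns_per_byte * cpu_mhz) / 1000UL;
}
```

With the numbers from this post, `cycles_per_byte(1000, 40)` gives 40, i.e. roughly 40 ARM cycles per byte copied at 40MHz.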

  • Hi Guosheng,

    How did you test this?

    Which core are you using for the test, and at what operating frequency?

    Have you run the same test on SDRAM and on internal RAM (DSP and ARM)? If so, what were the results?

    "My ARM Processor running at 40MHz"

    Why do you run the ARM at 40MHz?

  • Stalin,

    I run the ARM at 40MHz to reduce power consumption. I tested it on the OMAPL137.

    I created the test code below to run the Shared RAM test. It gives about 7us per byte copied, which is even worse than what I previously observed. (My code also executes from Shared RAM.)

    I have also tested SDRAM (external, running at 71MHz, again a lower frequency for the sake of power). With the same code executing from SDRAM and copying SDRAM data, the performance is slightly worse.

    For all the tests I ran, the DSP was in the suspended state, so there was no contention for Shared RAM between the processors that could slow down the performance.

    ------------------------------------------Main.c-------------------------------------------

    typedef unsigned int Uint32;
    typedef unsigned char Uint8;
    typedef volatile unsigned int VUint32;
    typedef volatile unsigned char VUint8;

    // System Control Module register structure
    typedef struct _DEVICE_SYS_MODULE_REGS_
    {
      VUint32 REVID;              //0x00
      VUint32 DIEIDR[4];          //0x04
      VUint8  RSVD0[12];          //0x14
      VUint32 BOOTCFG;            //0x20
      VUint8  RSVD1[20];          //0x24
      VUint32 KICKR[2];           //0x38
      VUint32 HOSTCFG[2];         //0x40
      VUint8  RSVD2[152];         //0x48
      VUint32 IRAWSTRAT;          //0xE0
      VUint32 IENSTAT;            //0xE4
      VUint32 IENSET;             //0xE8
      VUint32 IENCLR;             //0xEC
      VUint32 EOI;                //0xF0
      VUint32 FLTADDRR;           //0xF4
      VUint32 FLTSTAT;            //0xF8
      VUint8  RSVD3[20];          //0xFC
      VUint32 MSTPRI[3];          //0x110
      VUint8  RSVD4[4];           //0x11C
      VUint32 PINMUX[20];         //0x120
      VUint32 SUSPSRC;            //0x170
      VUint32 CHIPSIG;            //0x174
      VUint32 CHIPSIG_CLR;        //0x178
      VUint32 CFGCHIP[5];         //0x17C
    }DEVICE_SysModuleRegs;
    #define GPIO_BASE               0x01E26000
    #define SYSTEM ((DEVICE_SysModuleRegs*) 0x01C14000)
    #define GPIO_DIR23              *( volatile Uint32* )( GPIO_BASE + 0x38 )
    #define GPIO_OUT_DATA23         *( volatile Uint32* )( GPIO_BASE + 0x3C )
    #define GPIO_SET_DATA23         *( volatile Uint32* )( GPIO_BASE + 0x40 )
    #define GPIO_CLR_DATA23         *( volatile Uint32* )( GPIO_BASE + 0x44 )


    void DevicePinMuxControl(Uint32 regOffset, Uint32 mask, Uint32 value)
    {

      SYSTEM->PINMUX[regOffset] &= ~mask;
      SYSTEM->PINMUX[regOffset] |= (mask & value);

    }

     

    void BenchRamSpeed()
    {
    #define LOOP_IN_MAX  1508

        Uint32 LOOP_OUT_MAX= 1;
        Uint32 tmp;
        Uint32 loopOut = 0;  // initialized: the outer loop is commented out, but loopOut is still used as an offset below
        Uint32 loopIn;
        volatile Uint8 *pSrc;
        volatile Uint8 * pDst;

        SYSTEM->KICKR[0] = 0x83e70b13;  /* Kick0 register + data (unlock) */
        SYSTEM->KICKR[1] = 0x95a4f1e0;  /* Kick1 register + data (unlock) */

        DevicePinMuxControl(11, 0x0000F000, 0x00008000); // UART1_TXD --> GPIO
        tmp = GPIO_DIR23;
        tmp &= (~(1<< 26));
        GPIO_DIR23 = tmp;

        //reset
        GPIO_CLR_DATA23 = (1<< 26);

        // Performance test loop
        // SDRAM test addresses (disabled)
        //pSrc = (Uint8 *)0xC0000000;
        //pDst = (Uint8 *)0xC0000400;

        // Shared RAM test addresses
        pSrc = (Uint8 *)0x80001000;
        pDst = (Uint8 *)0x80001400;

        GPIO_SET_DATA23 = (1<< 26); // set to start
        //for (loopOut = 0; loopOut < LOOP_OUT_MAX; loopOut++)
        {
    //        pDst += loopOut;
    //        pSrc += loopOut;
            for (loopIn = 0; loopIn < LOOP_IN_MAX; loopIn++)
            {
    //            GPIO_SET_DATA23 = (1<< 26); // set to start
    #if 0
                *(pDst)= * (pSrc);
    #else
                *(pSrc+loopOut) = *(pDst+loopOut); // active test: one byte copied from pDst to pSrc, same addresses every iteration
    //            pDst+=2;
    //            pSrc++;
    #endif
     //           GPIO_CLR_DATA23 = (1<< 26); //clear to end
            }
        }
        GPIO_CLR_DATA23 = (1<< 26); //clear to end

    }


    /*
     * main.c
     */
    int main(void) {
     
        while(1)
        {
            BenchRamSpeed();
        }

     return 0;
    }

     

  • Hi Guosheng,

    Please look at the following wiki page, which has benchmarks for Shared RAM access.

    http://processors.wiki.ti.com/index.php/Shared_RAM_Access_Considerations_on_OMAPL1x/C674x/AM1x