[OMAP-L137] ARM and DSP performance sanity check

I have just fired up the ARM side of my OMAP-L137 processor after using the DSP side exclusively.

It seems about 10X slower than it should be, so I wrote a simple delay loop that runs entirely from internal memory (0xFFFF0000).

(Note I have no external memory on my PC board)

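void hdelay(uint32 count)   /* unsigned, to match the loop counter */
{
   volatile uint32 i;       /* volatile keeps the empty loop from being optimized away */
   for(i=0;i<count;i++)
       ;
}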

I am calling it as such:

  hdelay(100000000);

to iterate 100 million loops. I am timing the result with the core running at 300MHz, no interrupts or DMA, 16-bit instructions, and optimizations ON.

On the ARM side it takes 30 seconds for 100M loops. Since the inner loop is about 6 instructions, that comes out to 50ns per instruction (20MIPS).

On the 6747 side it takes 16 seconds for 100M loops (12 instructions including NOP stalls) = about 13ns per instruction (roughly 75MIPS).

On another product, the 6713 (300MHz) takes 8 seconds for 100M loops = (21 cycles per loop) 3.8ns per instruction (260MIPS).
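For anyone who wants to check the arithmetic, here is a minimal sketch of how I am turning the stopwatch times into per-instruction figures. The helper names are hypothetical (not from any TI library), and the instructions-per-loop count is my own estimate from the disassembly, so treat the results as rough.

```c
#include <assert.h>

/* Hypothetical helpers for illustration only; not part of any TI library. */

/* Average time per loop iteration, in nanoseconds. */
double ns_per_loop(double seconds, unsigned long loops)
{
    assert(loops > 0);
    return seconds * 1e9 / (double)loops;
}

/* Average time per instruction, given an estimated instruction count per loop. */
double ns_per_instruction(double seconds, unsigned long loops,
                          unsigned instr_per_loop)
{
    assert(instr_per_loop > 0);
    return ns_per_loop(seconds, loops) / (double)instr_per_loop;
}

/* Effective MIPS: 1000 ns per instruction corresponds to 1 MIPS. */
double mips(double seconds, unsigned long loops, unsigned instr_per_loop)
{
    return 1000.0 / ns_per_instruction(seconds, loops, instr_per_loop);
}
```

With the ARM numbers (30 seconds, 100M loops, 6 instructions per loop), ns_per_instruction(30.0, 100000000UL, 6) gives 50 and mips(30.0, 100000000UL, 6) gives 20. The same calls can be reused for the DSP measurements.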

I checked the clocks using the OBSCLK pin.

Am I missing something somewhere in my memory setup?

When I run the ARM test from SHARED memory I get similar results, although I would expect it to take longer in shared RAM.

Do I need to enable caching for internal memory on the ARM?

Where can I find wait state and cycle count information on the various memories inside the chip?

Thanks,

-howy