Hi,
We've been getting poorer-than-expected DDR memory performance on three ARM platforms and were wondering if someone could help explain the results we're seeing and/or share some of their own experience.
To be clear, we're talking about uncached DDR reads and writes on OMAP3 and AM335x, performed with regular ARM instructions, not DMA.
Overall performance, including instruction rates and cached memory bandwidth, is satisfactory and in line with expectations.
Just for background, we run Windows Embedded Compact 7.0 on all our platforms and have been using the 'simple' benchmark application developed by BSquare and included in a number of OMAP3 BSPs provided by TI and, later, Adeneo.
We run the 'simple' application unchanged on the three platforms. It performs a number of tests, some of which involve allocating a 4KB page of uncached memory, and then using a series of ARM load/store-single or load/store-multiple instructions to access the memory.
Results:
Platform | CPU | Core clock | DDR type | Theoretical bandwidth | Test 1, uncached single read (ldr) | Test 2, uncached single write (str) | Test 3, uncached multiple read (ldm) | Test 4, uncached multiple write (stm)
1 | AM3356 (custom board) | 800 MHz | 400 MHz 16-bit DDR3 | 1600 MB/s | 20 MB/s | 25 MB/s | 40 MB/s | 51 MB/s
2 | AM3359 (TI Starter Kit) | 720 MHz | 333 MHz 16-bit DDR3 | 1333 MB/s | 18 MB/s | 23 MB/s | 37 MB/s | 47 MB/s
3 | OMAP3503 (custom board) | 600 MHz | 166 MHz 32-bit LPDDR | 1333 MB/s | 20 MB/s | 28 MB/s | 39 MB/s | 56 MB/s
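For reference, the theoretical figures in the table follow from the usual DDR peak-rate arithmetic: memory clock, times two transfers per clock, times the bus width in bytes. A minimal Python sketch of that calculation, using the clocks and bus widths from the table (the function name is just illustrative, and 1 MB is taken as 10^6 bytes):

```python
# Peak DDR bandwidth = clock (MHz) x 2 transfers per clock x bus width in bytes.
# Clock and bus-width values come from the results table; the function name
# is illustrative only.

def peak_bandwidth_mb_s(clock_mhz: float, bus_bits: int) -> float:
    """Theoretical peak in MB/s, with 1 MB assumed to be 10^6 bytes."""
    transfers_per_clock = 2  # double data rate: one transfer per clock edge
    return clock_mhz * transfers_per_clock * (bus_bits / 8)

platforms = {
    "1: AM3356, 400 MHz 16-bit DDR3": (400, 16),
    "2: AM3359, 333 MHz 16-bit DDR3": (333, 16),
    "3: OMAP3503, 166 MHz 32-bit LPDDR": (166, 32),
}

for name, (clock_mhz, bus_bits) in platforms.items():
    print(f"{name}: {peak_bandwidth_mb_s(clock_mhz, bus_bits):.0f} MB/s")
```

With the clocks rounded down to 333 and 166 MHz this gives 1332 and 1328 MB/s for platforms 2 and 3, which matches the 1333 MB/s in the table to within rounding of the nominal clock.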
Suffice it to say, these numbers are well below what we expected. For instance, reading at 40 MB/s (Test 3) on platform 1 is only 2.5% of the theoretical 1600 MB/s.
We understand that 1600 MB/s is not realistic, but we were hoping to achieve something like 50-70% of that.
Or is it that our expectations are wrong?
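To put the measured rates in perspective, they can be turned into a fraction of the theoretical peak and into an implied time per access. The sketch below does this for platform 1. It assumes each ldr transfers 4 bytes and that the benchmark reports MB as 10^6 bytes; both are assumptions, and the helper names are illustrative only.

```python
# Convert a measured rate into (a) percent of theoretical peak and
# (b) implied time and core cycles per access.
# Assumptions: 1 MB = 10^6 bytes, and one ldr moves 4 bytes.

def efficiency_percent(measured_mb_s: float, peak_mb_s: float) -> float:
    return 100.0 * measured_mb_s / peak_mb_s

def ns_per_access(measured_mb_s: float, bytes_per_access: int = 4) -> float:
    accesses_per_second = measured_mb_s * 1e6 / bytes_per_access
    return 1e9 / accesses_per_second

def core_cycles_per_access(measured_mb_s: float, core_mhz: float,
                           bytes_per_access: int = 4) -> float:
    # core_mhz / 1000 is the number of core cycles per nanosecond
    return ns_per_access(measured_mb_s, bytes_per_access) * core_mhz / 1000.0

# Platform 1: AM3356, 800 MHz core, 1600 MB/s theoretical peak
print(efficiency_percent(40, 1600))     # Test 3 (ldm) as percent of peak
print(ns_per_access(20))                # Test 1 (ldr), nanoseconds per load
print(core_cycles_per_access(20, 800))  # Test 1 (ldr), core cycles per load
```

Under those assumptions, Test 1 on platform 1 works out to roughly 200 ns, or about 160 core cycles, per uncached ldr.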
The DDR timings were calculated using TI's spreadsheet and the datasheets for the memory devices. I have attached the spreadsheet for platform 1 in case someone is able to spot something obvious. The memory device used on platform 1 is the MT41K256M16HA-125IT.
The platforms are generally stable with no signs of memory integrity issues. DDR3 leveling has been performed where applicable.