
DM355 DDR2 read performance is very slow, write performance seems ok

We have two products based on the DM355EVM. I am now trying to implement HD video capture at 720p, 20 FPS. We are running the DM355 at 216 MHz and the DDR2 at 171 MHz. With a 16-bit bus, this should give us a theoretical maximum memory bandwidth of (171 * 2 * 16) / 8 = 684 MB/sec. My application shouldn't be coming up against that limit, but I keep seeing symptoms that look like memory starvation.
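The peak figure follows from the DDR clock, the two transfers per clock, and the 16-bit bus width. A minimal C sketch of that arithmetic, with a function name made up for illustration:

```c
/* Theoretical peak DDR bandwidth in MB/s, where 1 MB = 10^6 bytes.
 * clock_mhz: DDR clock in MHz; bus_bits: data bus width in bits.
 * DDR moves data on both clock edges, hence the factor of 2.
 * (Hypothetical helper, not part of any TI library.) */
unsigned ddr_peak_mb_per_s(unsigned clock_mhz, unsigned bus_bits)
{
    return clock_mhz * 2u * bus_bits / 8u;
}
```

For the setup described here, `ddr_peak_mb_per_s(171, 16)` returns 684. Note that the benchmark tool counts 1 MB as 1048576 B, so in its units the same peak is roughly 652 MB/s.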

I found this application to test memory bandwidth. I compiled it and ran it on the two products we have made and the DM355EVM. The results of that memory test are shown below:

root@MV0007:~# ./bandwidth-arm
This is bandwidth version 0.23d.
Copyright (C) 2005-2010 by Zack T Smith.

This software is covered by the GNU Public License.
It is provided AS-IS, use at your own risk.
See the file COPYING for more information.

Using 32-bit transfers.
Notation: kB = 1024 B, MB = 1048576 B.

Sequential read (32-bit), size = 256 B, loops = 14417920, 701.6 MB/s
Sequential read (32-bit), size = 512 B, loops = 6815744, 655.9 MB/s
Sequential read (32-bit), size = 768 B, loops = 4543812, 662.0 MB/s
Sequential read (32-bit), size = 1 kB, loops = 3407872, 665.2 MB/s
Sequential read (32-bit), size = 2 kB, loops = 1736704, 669.4 MB/s
Sequential read (32-bit), size = 3 kB, loops = 1157785, 670.8 MB/s
Sequential read (32-bit), size = 4 kB, loops = 868352, 669.7 MB/s
Sequential read (32-bit), size = 6 kB, loops = 578866, 666.4 MB/s
Sequential read (32-bit), size = 8 kB, loops = 425984, 659.7 MB/s
Sequential read (32-bit), size = 12 kB, loops = 76454, 169.6 MB/s
Sequential read (32-bit), size = 16 kB, loops = 45056, 136.4 MB/s
Sequential read (32-bit), size = 20 kB, loops = 32760, 124.6 MB/s
Sequential read (32-bit), size = 24 kB, loops = 27300, 116.4 MB/s
Sequential read (32-bit), size = 28 kB, loops = 23400, 116.3 MB/s
Sequential read (32-bit), size = 32 kB, loops = 20480, 116.2 MB/s
Sequential read (32-bit), size = 40 kB, loops = 16380, 116.2 MB/s
Sequential read (32-bit), size = 48 kB, loops = 13650, 116.2 MB/s
Sequential read (32-bit), size = 64 kB, loops = 10240, 116.2 MB/s
Sequential read (32-bit), size = 128 kB, loops = 5120, 116.3 MB/s
Sequential read (32-bit), size = 192 kB, loops = 3410, 116.5 MB/s
Sequential read (32-bit), size = 256 kB, loops = 2560, 116.3 MB/s
Sequential read (32-bit), size = 384 kB, loops = 1700, 114.9 MB/s
Sequential read (32-bit), size = 512 kB, loops = 1152, 114.9 MB/s
Sequential read (32-bit), size = 768 kB, loops = 680, 91.2 MB/s
Sequential read (32-bit), size = 1 MB, loops = 576, 109.1 MB/s
Sequential read (32-bit), size = 1.25 MB, loops = 459, 113.4 MB/s
Sequential read (32-bit), size = 1.5 MB, loops = 420, 114.9 MB/s
Sequential read (32-bit), size = 1.75 MB, loops = 360, 114.9 MB/s
Sequential read (32-bit), size = 2 MB, loops = 288, 114.9 MB/s
Sequential read (32-bit), size = 2.25 MB, loops = 280, 114.9 MB/s
Sequential read (32-bit), size = 2.5 MB, loops = 250, 114.9 MB/s
Sequential read (32-bit), size = 2.75 MB, loops = 230, 114.9 MB/s
Sequential read (32-bit), size = 3 MB, loops = 210, 114.9 MB/s
Sequential read (32-bit), size = 4 MB, loops = 144, 114.9 MB/s
Sequential read (32-bit), size = 5 MB, loops = 120, 114.9 MB/s
Sequential read (32-bit), size = 6 MB, loops = 100, 114.9 MB/s

Random read (32-bit), size = 256 B, loops = 12058624, 586.2 MB/s
Random read (32-bit), size = 512 B, loops = 6160384, 594.6 MB/s
Random read (32-bit), size = 768 B, loops = 4106907, 597.5 MB/s
Random read (32-bit), size = 1 kB, loops = 3080192, 599.4 MB/s
Random read (32-bit), size = 2 kB, loops = 1540096, 601.4 MB/s
Random read (32-bit), size = 3 kB, loops = 1026715, 601.2 MB/s
Random read (32-bit), size = 4 kB, loops = 770048, 600.4 MB/s
Random read (32-bit), size = 6 kB, loops = 513334, 595.9 MB/s
Random read (32-bit), size = 8 kB, loops = 327680, 508.1 MB/s
Random read (32-bit), size = 12 kB, loops = 60071, 131.7 MB/s
Random read (32-bit), size = 16 kB, loops = 36864, 105.7 MB/s
Random read (32-bit), size = 20 kB, loops = 26208, 97.0 MB/s
Random read (32-bit), size = 24 kB, loops = 21840, 93.4 MB/s
Random read (32-bit), size = 28 kB, loops = 18720, 91.5 MB/s
Random read (32-bit), size = 32 kB, loops = 16384, 90.6 MB/s
Random read (32-bit), size = 40 kB, loops = 11466, 89.5 MB/s
Random read (32-bit), size = 48 kB, loops = 9555, 89.1 MB/s
Random read (32-bit), size = 64 kB, loops = 7168, 89.0 MB/s
Random read (32-bit), size = 128 kB, loops = 3584, 88.5 MB/s
Random read (32-bit), size = 192 kB, loops = 2387, 88.2 MB/s
Random read (32-bit), size = 256 kB, loops = 1792, 87.8 MB/s
Random read (32-bit), size = 384 kB, loops = 1190, 82.6 MB/s
Random read (32-bit), size = 512 kB, loops = 896, 80.8 MB/s
Random read (32-bit), size = 768 kB, loops = 510, 75.1 MB/s
Random read (32-bit), size = 1 MB, loops = 384, 74.2 MB/s
Random read (32-bit), size = 1.25 MB, loops = 306, 73.7 MB/s
Random read (32-bit), size = 1.5 MB, loops = 252, 73.3 MB/s
Random read (32-bit), size = 1.75 MB, loops = 216, 73.0 MB/s
Random read (32-bit), size = 2 MB, loops = 192, 72.9 MB/s
Random read (32-bit), size = 2.25 MB, loops = 168, 72.7 MB/s
Random read (32-bit), size = 2.5 MB, loops = 150, 72.7 MB/s
Random read (32-bit), size = 2.75 MB, loops = 138, 72.5 MB/s
Random read (32-bit), size = 3 MB, loops = 126, 72.5 MB/s
Random read (32-bit), size = 4 MB, loops = 96, 72.2 MB/s
Random read (32-bit), size = 5 MB, loops = 72, 71.9 MB/s
Random read (32-bit), size = 6 MB, loops = 60, 71.7 MB/s

Sequential write (32-bit), size = 256 B, loops = 8912896, 427.7 MB/s
Sequential write (32-bit), size = 512 B, loops = 4456448, 427.6 MB/s
Sequential write (32-bit), size = 768 B, loops = 2970954, 427.7 MB/s
Sequential write (32-bit), size = 1 kB, loops = 2228224, 428.1 MB/s
Sequential write (32-bit), size = 2 kB, loops = 1114112, 427.6 MB/s
Sequential write (32-bit), size = 3 kB, loops = 742730, 427.5 MB/s
Sequential write (32-bit), size = 4 kB, loops = 557056, 427.0 MB/s
Sequential write (32-bit), size = 6 kB, loops = 371348, 427.5 MB/s
Sequential write (32-bit), size = 8 kB, loops = 278528, 426.8 MB/s
Sequential write (32-bit), size = 12 kB, loops = 185674, 427.0 MB/s
Sequential write (32-bit), size = 16 kB, loops = 139264, 426.9 MB/s
Sequential write (32-bit), size = 20 kB, loops = 111384, 427.1 MB/s
Sequential write (32-bit), size = 24 kB, loops = 92820, 426.8 MB/s
Sequential write (32-bit), size = 28 kB, loops = 79560, 427.2 MB/s
Sequential write (32-bit), size = 32 kB, loops = 69632, 427.0 MB/s
Sequential write (32-bit), size = 40 kB, loops = 55692, 427.1 MB/s
Sequential write (32-bit), size = 48 kB, loops = 46410, 427.3 MB/s
Sequential write (32-bit), size = 64 kB, loops = 34816, 427.2 MB/s
Sequential write (32-bit), size = 128 kB, loops = 17408, 427.2 MB/s
Sequential write (32-bit), size = 192 kB, loops = 11594, 426.9 MB/s
Sequential write (32-bit), size = 256 kB, loops = 8704, 425.6 MB/s
Sequential write (32-bit), size = 384 kB, loops = 5440, 404.5 MB/s
Sequential write (32-bit), size = 512 kB, loops = 4096, 404.6 MB/s
Sequential write (32-bit), size = 768 kB, loops = 2890, 432.0 MB/s
Sequential write (32-bit), size = 1 MB, loops = 2176, 432.0 MB/s
Sequential write (32-bit), size = 1.25 MB, loops = 1734, 431.7 MB/s
Sequential write (32-bit), size = 1.5 MB, loops = 1470, 431.9 MB/s
Sequential write (32-bit), size = 1.75 MB, loops = 1260, 431.5 MB/s
Sequential write (32-bit), size = 2 MB, loops = 1088, 431.5 MB/s
Sequential write (32-bit), size = 2.25 MB, loops = 980, 431.5 MB/s
Sequential write (32-bit), size = 2.5 MB, loops = 875, 431.2 MB/s
Sequential write (32-bit), size = 2.75 MB, loops = 805, 431.3 MB/s
Sequential write (32-bit), size = 3 MB, loops = 735, 431.0 MB/s
Sequential write (32-bit), size = 4 MB, loops = 544, 430.5 MB/s
Sequential write (32-bit), size = 5 MB, loops = 432, 429.8 MB/s
Sequential write (32-bit), size = 6 MB, loops = 360, 429.3 MB/s

Random write (32-bit), size = 256 B, loops = 4194304, 199.7 MB/s
Random write (32-bit), size = 512 B, loops = 2097152, 199.7 MB/s
Random write (32-bit), size = 768 B, loops = 1398096, 202.9 MB/s
Random write (32-bit), size = 1 kB, loops = 1048576, 200.9 MB/s
Random write (32-bit), size = 2 kB, loops = 524288, 200.7 MB/s
Random write (32-bit), size = 3 kB, loops = 349520, 199.9 MB/s
Random write (32-bit), size = 4 kB, loops = 262144, 199.8 MB/s
Random write (32-bit), size = 6 kB, loops = 174752, 199.4 MB/s
Random write (32-bit), size = 8 kB, loops = 131072, 199.0 MB/s
Random write (32-bit), size = 12 kB, loops = 87376, 199.0 MB/s
Random write (32-bit), size = 16 kB, loops = 65536, 199.1 MB/s
Random write (32-bit), size = 20 kB, loops = 52416, 199.1 MB/s
Random write (32-bit), size = 24 kB, loops = 43680, 199.1 MB/s
Random write (32-bit), size = 28 kB, loops = 37440, 199.2 MB/s
Random write (32-bit), size = 32 kB, loops = 32768, 199.2 MB/s
Random write (32-bit), size = 40 kB, loops = 26208, 199.3 MB/s
Random write (32-bit), size = 48 kB, loops = 21840, 199.3 MB/s
Random write (32-bit), size = 64 kB, loops = 16384, 199.3 MB/s
Random write (32-bit), size = 128 kB, loops = 8192, 199.2 MB/s
Random write (32-bit), size = 192 kB, loops = 5456, 198.9 MB/s
Random write (32-bit), size = 256 kB, loops = 4096, 196.4 MB/s
Random write (32-bit), size = 384 kB, loops = 2380, 171.9 MB/s
Random write (32-bit), size = 512 kB, loops = 1664, 160.2 MB/s
Random write (32-bit), size = 768 kB, loops = 1020, 149.5 MB/s
Random write (32-bit), size = 1 MB, loops = 768, 145.4 MB/s
Random write (32-bit), size = 1.25 MB, loops = 612, 141.9 MB/s
Random write (32-bit), size = 1.5 MB, loops = 504, 141.2 MB/s
Random write (32-bit), size = 1.75 MB, loops = 432, 140.4 MB/s
Random write (32-bit), size = 2 MB, loops = 352, 138.6 MB/s
Random write (32-bit), size = 2.25 MB, loops = 336, 139.7 MB/s
Random write (32-bit), size = 2.5 MB, loops = 300, 139.2 MB/s
Random write (32-bit), size = 2.75 MB, loops = 253, 138.5 MB/s
Random write (32-bit), size = 3 MB, loops = 231, 138.0 MB/s
Random write (32-bit), size = 4 MB, loops = 176, 136.4 MB/s
Random write (32-bit), size = 5 MB, loops = 144, 135.8 MB/s
Random write (32-bit), size = 6 MB, loops = 120, 135.1 MB/s

Main register to main register transfers (32-bit) 674.0 MB/s

Stack-to-register transfers (32-bit) 683.2 MB/s
Register-to-stack transfers (32-bit) 227.6 MB/s

Library: memset 193.5 MB/s

As can be seen, read performance is terrible once the block size exceeds about 8 kB. If I am topping out at 116 MB/sec, then I am definitely having memory starvation problems. Write performance seems OK, but read performance is miserable. We are using a Micron MT47H128M16-3 part, and UBL initializes the DDR timing. The code that does the initialization is shown below:

    /* per GEL file */
    LPSCTransition(LPSC_DDR2, PSC_ENABLE);
    SYSTEM->VTPIOCR &= 0xFFFFDF3F;// Clear bit CLRZ & PWRDN & LOCK bit(bit 13/6/7)
    SYSTEM->VTPIOCR |=  0x00002000;  // Set bit CLRZ (bit 13)

    while(!(SYSTEM->VTPIOCR & 0x8000));

    SYSTEM->VTPIOCR |= 0x00004000;  // Set bit VTP_IO_READY(bit 14)
    SYSTEM->VTPIOCR |= 0x00000180;  // Set bit LOCK(bit 7) and  PWRSAVE  (bit 8)
    SYSTEM->VTPIOCR |= 0x00000040;  // Powerdown VTP as it is locked  (bit 6)

    waitloop(11*33);              // Wait for calibration to complete

    /* DDR2 controller initialization */
    DDR->DDRPHYCR = 0x51006494;   //External DQS gating enabled
    LPSCTransition(LPSC_DDR2, PSC_SYNCRESET);
    LPSCTransition(LPSC_DDR2, PSC_ENABLE);
   
    DDR->PBBPR = 0x000000FE;       //VBUSM Burst Priority Register, pr_old_count = 0xFE
           
    DDR->SDBCR = 0x0000C632;       //Program SDRAM Bank Config Register

    DDR->SDTIMR0 = 0x2A923249;        //Program SDRAM Timing Control Register1
    DDR->SDTIMR1 = 0x3c17C763;        //Program SDRAM Timing Control Register2

    DDR->SDBCR   = 0x00004632;       //Program SDRAM Bank Config Register
    DDR->SDRCR = 0x00000535;        //Program SDRAM Refresh Control Register

The one thing I noticed is that the T_XSNR parameter appears to be wrong (too fast), and T_RFC also appears to be incorrect. Would that have a large effect on read performance? Has anyone else seen problems like this?
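One way to sanity-check timing fields such as T_RFC is to convert the datasheet minimum into DDR clock cycles. The sketch below rests on several assumptions that should be verified against the DM355 DDR2 controller guide and the Micron datasheet: that a timing field holds (cycles - 1), that T_RFC sits in bits 31:25 of SDTIMR0, and that tRFC is about 127.5 ns for a 1 Gb DDR2 device and 195 ns for a 2 Gb device such as a 128M x 16 part. The function names are made up for illustration.

```c
/* Convert a minimum timing, given in tenths of a nanosecond, to a
 * register field value. ASSUMPTION: the field encodes (cycles - 1).
 * Rounds up so the programmed delay is never shorter than the minimum. */
unsigned timing_field(unsigned t_tenth_ns, unsigned ddr_mhz)
{
    /* cycles = ceil(t_ns * f_MHz / 1000), done in integer arithmetic */
    unsigned cycles = (t_tenth_ns * ddr_mhz + 9999u) / 10000u;
    return cycles > 0u ? cycles - 1u : 0u;
}

/* Extract T_RFC from an SDTIMR0 value.
 * ASSUMPTION: T_RFC occupies bits 31:25; check the controller guide. */
unsigned sdtimr0_t_rfc(unsigned long sdtimr0)
{
    return (unsigned)((sdtimr0 >> 25) & 0x7Fu);
}
```

Under those assumptions, `timing_field(1950, 171)` gives 33 and `timing_field(1275, 171)` gives 21, while `sdtimr0_t_rfc(0x2A923249)` gives 21. That would be consistent with a T_RFC value sized for a 1 Gb part rather than a 2 Gb one.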

 

Thank you.

  • FWIW, I downloaded and compiled the same test app and got about 200 MB/s for reads on my DM368 board, with CPU/DDR2 at 420/340 MHz.

    John A

  • Doesn't that seem wrong? What did you get for write performance? I've run this on an Intel Atom product, and once the read/write sizes exceed the L2 cache, I see about 3 GB/sec for sequential reads. The max performance of that part should be 8,528 MB/sec. Its sequential write performance is around 1 GB/sec. I was concerned that the program might not report read performance correctly.

    I would be happy with even 200 MB/sec on the DM355. But at 116 MB/sec I am suffering from memory starvation.

    Thanks for trying it on your target. That is a helpful data point.

  • Seq write from 856 down to 783.

    Rand write from 313 down to 215

    I guess it just means that if you want good mem performance you need to use the EDMA.  I've got my hands full just trying to figure out how to get a good interlaced D1 image, and decent quality on other lower resolutions.

    John A

     

  • I went back and checked the CPU usage, and sure enough it is pegged when reading/writing memory. But why would reads be "more expensive" than writes?

     

    What kind of problems are you having with the D1 image? I've been trying to get 720p video working on the DM355. I've had some luck, but the video is jittery. If I play with the IPIPE clock and the frame rate I can get the problem to be better or worse. What it appears to be is the DMA module (or more likely the memory) can't keep up with the required read or write rates of the IPIPE and the MPEG encoder.

  • Got no ideas about the memory performance.

    The problem I'm having with the video is that I think it's deinterlaced at full 720x480 resolution.  And when I capture at lower resolutions it looks very pixelated.  I don't believe that any filtering is done on the image before scaling and I want interlaced video at D1 res.  Capturing 720x480 and encoding into 3Mbit H264 is only showing about a 10-15% CPU hit in top.  I'm definitely not losing any frames.

    John A

  • Nick Butts said:
    I went back and checked the CPU usage, and sure enough it is pegged when reading/writing memory. But why would reads be "more expensive" than writes?

    Writes are "fire and forget" while reads will stall the CPU until the data is actually returned.  In your real application does the CPU read large blocks of memory or are those large blocks handled only by the accelerators and video ports?
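The difference between the two access patterns can be seen in the shape of the loops a CPU-driven benchmark runs. This is a minimal sketch, not the benchmark's actual code; the function names are made up, and the comments assume a core with a write buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Read loop: every load feeds the running sum, so on a cache miss the
 * CPU has to wait until the data actually comes back from DDR. */
uint32_t sum_words(const volatile uint32_t *buf, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += buf[i];
    return sum;
}

/* Write loop: nothing depends on a store completing, so (assuming a
 * write buffer) the CPU can issue it and move on to the next one. */
void fill_words(volatile uint32_t *buf, size_t n, uint32_t value)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = value;
}
```

Timing each function over a buffer much larger than the data cache should show the kind of read/write gap reported in the log above, although the exact numbers will depend on the compiler and cache configuration.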

  • Nick,

    I believe your program to measure DDR bandwidth uses CPU reads and writes, is that correct? If so, I am afraid it may not be the right way to measure it. The various modules inside the SoC do not access DDR through the CPU. For example, the MPEG4 codec uses EDMA to access DDR, while the VPSS has its own master on the VBUS. Both have much better throughput than CPU access. In the system under consideration, the typical practical DDR throughput will be 50-70% of the theoretical bandwidth, with the exact value depending on the data flow of the application.

     

    regards

    Yashwant
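To put the 50-70% figure next to the video load, the arithmetic can be sketched as follows. The function names are hypothetical, and the 2 bytes per pixel used in the example (as for a YUV 4:2:2 frame) is an assumption, not something stated in the thread.

```c
/* Bandwidth of one pass over a video stream, in bytes per second. */
unsigned long stream_bytes_per_s(unsigned width, unsigned height,
                                 unsigned bytes_per_pixel, unsigned fps)
{
    return (unsigned long)width * height * bytes_per_pixel * fps;
}

/* Practical DDR bandwidth in MB/s for a given efficiency in percent
 * (integer division, so the result is rounded down). */
unsigned practical_mb_per_s(unsigned peak_mb_per_s, unsigned percent)
{
    return peak_mb_per_s * percent / 100u;
}
```

With a 684 MB/s peak, `practical_mb_per_s` gives 342 and 478 for 50% and 70%. One pass over a 1280x720 stream at 20 FPS and 2 bytes per pixel is 36,864,000 bytes/s, about 37 MB/s. A real pipeline touches each frame several times (capture write, resizer or encoder read, and so on), so the total depends on how many passes the data flow needs.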

  • I understand the distinction. I am having problems with the DM355 taking smooth video. I've posted another question:

    http://e2e.ti.com/support/embedded/f/354/p/84900/292912.aspx#292912

    When I ran this memory test, I thought I had found the root cause of this problem. But upon further investigation, I believe my problem lies elsewhere.