C6748 LCDK DDR2 Throughput

Hi,

We are trying to benchmark DDR2 RAM write/read speeds on the C6748 LCDK. We are running a small program that writes directly into DDR2 and verifies the data after writing is complete. The code snippet is below.

What we observe is that writes to RAM take a very long time. In the example below, we write the whole RAM (256MB) with known data, then read it back and compare it against the known data. Write times are very slow, on the order of tens of seconds.

What could be the reason for this very low speed? We are using the C6748_LCDK GEL file while generating the binary file.

Regards,

Manj

#include <stdint.h>
#include <stdio.h>

// ERR_NO_ERROR and ERR_FAIL are assumed to be defined elsewhere in the project.
uint32_t verifyPattern(uint32_t in_begin_addr, uint32_t in_num_bytes)
{
    uint32_t rtn = ERR_NO_ERROR;
    uint32_t offset;
    uint32_t *test_addr = (uint32_t *)in_begin_addr;

    // write ram under test to all 5's.
    for (offset = 0; offset < in_num_bytes; offset += sizeof(uint32_t))
    {
        *test_addr++ = 0x55555555;
    }

    // verify ram under test is 5's.
    test_addr = (uint32_t *)in_begin_addr;
    for (offset = 0; offset < in_num_bytes; offset += sizeof(uint32_t))
    {
        if (*test_addr != 0x55555555)
        {
            // report the failing word itself, before the pointer advances
            printf("data pattern (5) test failed at address: %08X\r\n", (unsigned int)test_addr);
            rtn = ERR_FAIL;
        }
        test_addr++;
    }

    // write ram under test to all A's.
    test_addr = (uint32_t *)in_begin_addr;
    for (offset = 0; offset < in_num_bytes; offset += sizeof(uint32_t))
    {
        *test_addr++ = 0xAAAAAAAA;
    }

    // verify ram under test is A's.
    test_addr = (uint32_t *)in_begin_addr;
    for (offset = 0; offset < in_num_bytes; offset += sizeof(uint32_t))
    {
        if (*test_addr != 0xAAAAAAAA)
        {
            // report the failing word itself, before the pointer advances
            printf("data pattern (A) test failed at address: %08X\r\n", (unsigned int)test_addr);
            rtn = ERR_FAIL;
        }
        test_addr++;
    }

    return (rtn);
}

  • Mithun,

    What do you expect the speed to be? If you take the DDR2 clock period, multiply it by the number of reads and writes you do, and multiply that by the number of DDRCLK cycles required per single read or write operation, what order of magnitude of total time do you calculate?
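
    As a rough sketch of that estimate (not a measured result), the calculation can be written as a small C helper. The function name `estimate_transfer_seconds` and all of its parameters are hypothetical. In particular, `cycles_per_access` is a figure you have to measure or assume for your own setup, not a datasheet value.

```c
#include <assert.h>
#include <stdint.h>

/* Estimate the seconds needed to move num_bytes when every access moves
 * access_bytes and costs cycles_per_access DDRCLK cycles.
 * cycles_per_access is an assumption supplied by the caller. */
double estimate_transfer_seconds(uint64_t num_bytes,
                                 uint32_t access_bytes,
                                 double ddr_clk_hz,
                                 double cycles_per_access)
{
    uint64_t accesses;

    assert(access_bytes > 0);
    assert(ddr_clk_hz > 0.0);

    /* Round up so a partial final access still counts as one access. */
    accesses = (num_bytes + access_bytes - 1) / access_bytes;

    /* total cycles divided by cycles per second gives seconds */
    return (double)accesses * cycles_per_access / ddr_clk_hz;
}
```

    For example, `estimate_transfer_seconds(256u * 1024u * 1024u, 4, 150e6, 10.0)` gives roughly 4.5 seconds for one pass over 256MB with 32-bit accesses at a 150MHz DDRCLK. The 10 cycles per access is only a placeholder, to be replaced with whatever you actually observe.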

    Reads require extra time inside the DSP compared to writes, since the result of the write is external while the result of the read is internal. This can add a few DDRCLK cycles to your total for each read.

    Code optimization can improve performance.
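
    One example of the kind of source-level change this can mean, as a sketch only: filling memory with 64-bit stores instead of 32-bit stores halves the number of store instructions in the loop. The function name `fill_pattern64` is hypothetical. Whether this helps at all depends on your compiler and optimization level, since an optimizing build may already generate wide stores from the original loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill num_bytes starting at base with a repeated 32-bit pattern,
 * writing 64 bits per store. base must be 8-byte aligned and
 * num_bytes must be a multiple of 8. */
void fill_pattern64(void *base, size_t num_bytes, uint32_t pattern)
{
    uint64_t wide = ((uint64_t)pattern << 32) | pattern;
    uint64_t *p = (uint64_t *)base;
    uint64_t *end = p + num_bytes / sizeof(uint64_t);

    assert(((uintptr_t)base % sizeof(uint64_t)) == 0);
    assert((num_bytes % sizeof(uint64_t)) == 0);

    while (p < end)
    {
        *p++ = wide;
    }
}
```

    The two assertions make the alignment and size requirements explicit. A caller with an odd-sized region would need to handle the leftover bytes separately.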

    CPUCLK and DDRCLK affect the total time. Also, what is the physical bus width to the DDR2 devices: 16 or 32 bits?

    Using a scope or logic analyzer, look at the DDR2 control signals, CS, RAS, CAS, and WE, and add a data line or two. Then see how many DDRCLK cycles there are between operations. You should be able to determine how much of that time is spent in DDR2 device access and how much time is spent waiting for the next CPU command.

    Regards,
    RandyP
  • Dear Randy,

    Thanks for your reply.

    Can we expect practical speeds of about 150MB/sec? Our application requires this to capture images at high speed. In a test application we are writing a 10MB buffer to RAM (DDR2) running at a clock frequency of 150MHz, and the operation takes more than 7-8 seconds to complete.

    I know the clock is running at 300MHz for sure, but I am not able to figure out why it's taking so much time to write to RAM.

    CPU Clock is 300MHz and bus width is 16 bit to the DDR device.

    Let me know your thoughts.

    Regards,
    Mithun
  • Mithun,

    Thought 1: You said the 256MB write/read/write/read was taking tens of seconds. Now you are saying a 10MB write is taking 7-8 seconds. Those sound very different in magnitude. Why is there this discrepancy between the sizes and times?
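
    To put numbers on that discrepancy, here is a minimal sketch of the arithmetic in C. The helper names `throughput_mb_per_s` and `shortfall_factor` are hypothetical. The inputs are the figures quoted in this thread, except where noted as assumptions.

```c
#include <assert.h>

/* Throughput implied by moving `megabytes` in `seconds`. */
double throughput_mb_per_s(double megabytes, double seconds)
{
    assert(seconds > 0.0);
    return megabytes / seconds;
}

/* How many times faster the required rate is than the achieved rate. */
double shortfall_factor(double required_mb_per_s, double achieved_mb_per_s)
{
    assert(achieved_mb_per_s > 0.0);
    return required_mb_per_s / achieved_mb_per_s;
}
```

    Taking 7.5 seconds as the midpoint of the 7-8 second report, `throughput_mb_per_s(10.0, 7.5)` is about 1.3MB/sec, and `shortfall_factor(150.0, ...)` on that value is a little over 112. For the 256MB test, four passes move 1024MB. If "tens of seconds" is assumed to mean 30 seconds, which is only a stand-in value, `throughput_mb_per_s(1024.0, 30.0)` is about 34MB/sec. Those two rates differ by more than an order of magnitude.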

    Before I can give you any more thoughts, I will need you to address the ones I posted above. Do the DDR parameter calculations, show your parameter values (not register values, please), and show your calculations. Also, look at the control lines to see how much time is being spent in DDR access and how much is spent waiting for the DSP.

    Having the DSP do the reads and writes itself is very inefficient if you need good performance. Once you figure out what your timing parameters and situation are, you may need to improve the performance by using the EDMA3 to transfer data between external and internal memory. This will be much faster in many cases.

    Regards,
    RandyP