DM6446 DDR2 memory bandwidth

Hi,

I'm having memory bandwidth issues with the C64x+ core on the DM6446.
Our algorithms run slower on the C64x+ core than on older DSPs with
slower clocks and similar cache sizes.

Closer measurements show an unexpected amount of stalling.

The DDR2 is 32 bits wide and clocked at 162 MHz. Measurements show a write
bandwidth of 210 MB/s and a read bandwidth of 480 MB/s, while the "theoretical
maximum" for the DDR2 is ~1300 MB/s. I measured this by simply reading or writing
one byte every 128 bytes across a large data set. Each of these accesses causes
a stall of about 160 cycles for a read and about 360 cycles for a write. (!?)

I've gone through everything from bandwidth control and DSP priority to the errata,
and adjusted the settings as recommended, but the results are still the same.

Has anybody else seen the same behavior?

/N 

  • Please note that DDR2 access (read and write) is always done in bursts of 32 bytes; this means that accessing 1 byte takes the same amount of time as accessing 32 bytes. Hopefully this helps explain some of the numbers you are seeing. We have measured throughput of ~95% of the theoretical maximum of 1296 MB/s.

  • Hi, 

    >Please note that DDR2 access (read and write) is always done in bursts of 32 bytes;

    The point of reading/writing every 128th byte was to trigger a cache miss on each access and force
    a whole cache line (128 bytes) to be transferred between DDR2 and the cache, with a minimal number
    of instruction cycles interfering with the measurement. In the real application all data is read
    sequentially, and we see similar numbers there.

    >We have measured throughput numbers of ~95% of theoretical 1296 MB/sec theoretical maximum

    How did you achieve it? How was the memory moved? Manually? Relying on the caches? DSP code or DMA?

    /N

  • The ~95% throughput was achieved using EDMA transfers from DDR2 -> DSP L1D and from DSP L1D -> DDR2. I do not have details on the test environment (software or hardware setup), but I will try to get more.

  • Hi,

    We are interested in the bandwidth between the DSP cache and DDR2. Moving the memory manually doesn't help us:

    1. It would require a major rewrite of the existing code.
    2. The setup time for each DMA transfer costs too much (especially if you use the EDMA3 APIs).

    The whole point of having a cache subsystem is to avoid moving the memory manually. ;)

    I'm hoping someone has numbers for DSP cache <-> DDR2 that are more encouraging than mine.

     /N