C6678 Memory performance

Hi,

I am looking for information about memory performance in the C6678 DSP.

More specifically, when L1 and L2 are configured entirely as SRAM (no cache), what is the throughput of each?

Also, what is the throughput of the shared memory when it is accessed by all cores simultaneously (that is, how many cores can read from the shared memory at the same time)?

Same question for DDR3. Does DDR3 access interfere with shared memory access (or anything else)?

I looked through the documentation and could not find anything related.

 

Thank you.

  • Ya,

    There is a performance application note, to be released soon, that will address these and other questions regarding memory and peripheral performance.

    Best Regards,

    Chad

  • Thank you.

    Do you know when? (It's a little bit urgent.)

    Is there any way to get quick answers soon?

    Thank you again.

    I expect it will be out next week. I'll have someone follow up with some data points. That said, you'll want to be more specific about the source and destination, and about the transfer mechanism (EDMA, IDMA, CorePacs, peripherals).

    Best Regards,

    Chad

  • Thank you.

    I am using EDMA to move data from DDR to L2 and to shared memory.

    The CorePacs operate directly on L2 and shared memory.

    But I am also interested (for comparison only) in direct access to DDR from the core.

  • Ya,

    Hopefully, the below information will be adequate until the performance application note is published.

    All throughput figures listed below assume that the DSP is operating at 1 GHz (1000 MHz).

    Raw Memory Throughput:

    • L1D SRAM: L1D operates at DSP/1 frequency, so it can be accessed every cycle. L1D is accessed through a 256 bit interface, so the maximum throughput of L1D is 32,000 megabytes per second. ((256 bit buswidth)/(8 bits/byte)*(1000M)=32000MB/s)
    • L2 SRAM: L2 operates at DSP/2 frequency, so it can be accessed every other cycle. L2 is accessed through a 256 bit interface, so the maximum throughput of L2 is 16,000 megabytes per second. ((256 bit buswidth)/(8 bits/byte)*(1000M/2)=16000MB/s)
    • MSMC SRAM: MSMC operates at DSP/2 frequency, so it can be accessed every other cycle. MSMC has 4 memory banks, each of which is accessed through a 256 bit interface. All 4 banks can be accessed simultaneously by separate cores or other system masters. If the data in MSMC SRAM is allocated so that all 4 banks are fully utilized simultaneously, the maximum throughput of the shared memory is 64,000 megabytes per second. ((256 bit buswidth)/(8 bits/byte)*(1000M/2)*(4 banks)=64000MB/s). Up to 4 masters can access MSMC simultaneously.
    • DDR3: DDR3 data is accessed through a 64-bit interface. Assuming that DDR3-1333 is operating at the maximum rate, the theoretical maximum throughput is 10,664 megabytes per second. ((64 bits)/(8 bits/byte)*(1333M)=10664MB/s). Only one master can access the DDR3 at a time.
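    All four raw figures above follow one formula: bytes per access, times access rate, times the number of banks that can be used in parallel. The minimal Python sketch below reproduces them, assuming the 1000 MHz DSP clock stated above and 1 MB = 10^6 bytes. The function name `peak_mb_per_s` is hypothetical and not part of any TI API.

```python
def peak_mb_per_s(bus_width_bits, clock_mhz, clock_divider=1, banks=1):
    """Theoretical peak throughput in MB/s (1 MB = 10**6 bytes).

    bus_width_bits: width of the data path in bits
    clock_mhz:      clock (or transfer rate in MT/s), in millions per second
    clock_divider:  memory runs at clock_mhz / clock_divider (2 for DSP/2)
    banks:          independent banks that can be accessed in parallel
    """
    bytes_per_access = bus_width_bits / 8
    accesses_per_us = clock_mhz / clock_divider  # millions of accesses per second
    return bytes_per_access * accesses_per_us * banks


DSP_MHZ = 1000  # assumed 1 GHz DSP clock, as stated above

raw = {
    "L1D SRAM": peak_mb_per_s(256, DSP_MHZ),                   # DSP/1
    "L2 SRAM": peak_mb_per_s(256, DSP_MHZ, clock_divider=2),   # DSP/2
    "MSMC SRAM": peak_mb_per_s(256, DSP_MHZ, clock_divider=2, banks=4),
    "DDR3-1333": peak_mb_per_s(64, 1333),                      # 64-bit, 1333 MT/s
}

for name, mb_s in raw.items():
    print(f"{name:10s} {mb_s:8.0f} MB/s")
```

    The MSMC figure is reached only when traffic is spread across all four banks; traffic concentrated in one bank is limited to the single-bank rate of 16,000 MB/s.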

     

    Maximum Throughputs for C66x core and DMA masters:

    • C66x Core: (128 bit buswidth)/(8bit/byte)*(1000M)=16000MB/s
    • IDMA: (256 bit buswidth)/(8bit/byte)*(1000M/2)=16000MB/s
    • EDMA0: (256 bit buswidth)/(8bit/byte)*(1000M/2)=16000MB/s
    • EDMA1: (128 bit buswidth)/(8bit/byte)*(1000M/3)=5333MB/s
    • EDMA2: (128 bit buswidth)/(8bit/byte)*(1000M/3)=5333MB/s
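    The master-side limits use the same arithmetic; EDMA1 and EDMA2 run at DSP/3, which is where the 5333 MB/s figure comes from. A small sketch under the same 1 GHz assumption (the helper name is hypothetical):

```python
def peak_mb_per_s(bus_width_bits, clock_mhz, clock_divider=1):
    """Peak throughput in MB/s for one master port (1 MB = 10**6 bytes)."""
    return bus_width_bits / 8 * clock_mhz / clock_divider


DSP_MHZ = 1000  # assumed 1 GHz DSP clock

masters = {
    "C66x core": peak_mb_per_s(128, DSP_MHZ),    # 128-bit at DSP/1
    "IDMA": peak_mb_per_s(256, DSP_MHZ, 2),      # 256-bit at DSP/2
    "EDMA0": peak_mb_per_s(256, DSP_MHZ, 2),     # 256-bit at DSP/2
    "EDMA1": peak_mb_per_s(128, DSP_MHZ, 3),     # 128-bit at DSP/3
    "EDMA2": peak_mb_per_s(128, DSP_MHZ, 3),     # 128-bit at DSP/3
}

for name, mb_s in masters.items():
    print(f"{name:10s} {mb_s:8.0f} MB/s")
```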

     

    DDR3 Accesses using EDMA:

    If you use EDMA0 to access DDR3, you should be able to get data at a maximum rate of 10664 megabytes per second, excluding any idle time required to set up the EDMA transaction. If you use EDMA1 or EDMA2 to access DDR3, the maximum rate at which you can get data from DDR3 is 5333 megabytes per second, again excluding any idle time required to set up the EDMA transaction.
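    Both EDMA-to-DDR3 figures are simply the smaller of two limits: a transfer can go no faster than the slower of the master and the memory. A minimal sketch of that bound, hard-coding the figures from the lists above and, like the text, ignoring EDMA setup time, contention, and other overheads (the function name is hypothetical):

```python
DDR3_MB_S = 64 / 8 * 1333  # 10664 MB/s, DDR3-1333 on a 64-bit interface

# Master-side peaks at an assumed 1 GHz DSP clock (1 MB = 10**6 bytes)
EDMA_MB_S = {
    "EDMA0": 256 / 8 * 1000 / 2,  # 16000 MB/s
    "EDMA1": 128 / 8 * 1000 / 3,  # ~5333 MB/s
    "EDMA2": 128 / 8 * 1000 / 3,  # ~5333 MB/s
}


def ddr3_read_ceiling(engine):
    """Upper bound, in MB/s, on reading DDR3 through one EDMA engine.

    The slower side of the path sets the ceiling; setup time and
    contention with other masters are not modeled.
    """
    return min(EDMA_MB_S[engine], DDR3_MB_S)


for engine in EDMA_MB_S:
    print(f"{engine}: {ddr3_read_ceiling(engine):.0f} MB/s")
```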

     

    C66x Core Accesses to DDR3:

    Assuming that L1D, L2, and MSMC are all configured as SRAM (caching disabled), single reads to DDR3 will stall the DSP for 89 cycles. Burst reads to DDR3 will stall the DSP for 43.2 cycles. Writes to DDR3 will not stall the DSP, since the data will be written to DDR3 by the DDR3 controller; however, it will take some time for the data to actually arrive in DDR3.
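    Because these are per-access stall cycles, their cost scales with the number of reads the core issues. A rough sketch of the total stall time for a run of uncached DDR3 reads, assuming the 1 GHz clock and the per-access figures quoted above (the function name and the 10,000-read example are hypothetical):

```python
DSP_MHZ = 1000            # assumed 1 GHz DSP clock
SINGLE_READ_STALL = 89.0  # stall cycles per single DDR3 read, as quoted above
BURST_READ_STALL = 43.2   # stall cycles per burst DDR3 read, as quoted above


def stall_time_us(n_reads, stall_cycles, clock_mhz=DSP_MHZ):
    """Total time, in microseconds, the core spends stalled on n_reads reads."""
    return n_reads * stall_cycles / clock_mhz


# Example: 10,000 reads with L1D, L2 and MSMC configured as SRAM (no caching)
print(f"single reads: {stall_time_us(10_000, SINGLE_READ_STALL):.1f} us")
print(f"burst reads:  {stall_time_us(10_000, BURST_READ_STALL):.1f} us")
```

    At 1 GHz one cycle is 1 ns, so the quoted figures also read directly as nanoseconds per access.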

     

    Regards,

    Derek

  • Thank you very much.

    This is exactly what I need.

  • Hi,

    One last question,

    You said that direct access to DDR3 stalls the DSP for 89 cycles. What causes the stall? DDR3 latency? How do you get this number?

    Thank you

  • Ya,

    Yes, this delay is primarily caused by DDR3 latency. There is also some delay in the bus for accessing DDR3, since the bus that is used to access DDR3 operates at a slower rate than the DSP.

    I will get back to you on how these numbers are being measured.

    Regards,

    Derek

  • Hi,

    Any updates?

    Is the performance document published yet?

    I could not find it anywhere.

    Thanks.

  • Ya,

    We had a number of internal reviews with many changes for clarification and haven't published it yet.  We are really close and it should be out in the next few weeks.

    Best Regards,

    Chad

  • Hi,

    I have a problem with IDMA1 (L2<->L1) performance (only approx. 3.3 Gb/s); please see the thread http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/t/182014.aspx#656195

    Thanks

    Ivan

  • Ivan,

    I have someone who's digging into this and will be replying on the other thread.

    Best Regards,
    Chad

  • Dear Chad,

    Has the document been published yet?

    Best regards,

    Teun

  • Teun,

    It's documented in the Keystone Throughput Performance Guide http://www.ti.com/lit/an/sprabk5/sprabk5.pdf

    Best Regards,

    Chad

  • Hello

    As per your statement, DDR3 access using EDMA should be able to get data at a maximum rate of 10664 MB/s.

    Does that mean the core can access 11.18 B/ns from DDR3?

    For my project, I'm storing 400-500 MB of input data in RAM. We need 492 KB/ns of data from DDR3.

    Is that possible?

    Summary:

    I have 400-500 MB of data in RAM. For my processing, I need to retrieve 492 KB from DDR. Can it be done or not? If it can be done, kindly give me an idea of how.

  • Mark,

    I have the same problem.

    Thanks.

  • Meng Zhang,

    I do not see anyone named Mark on this thread, and I see multiple problems described.

    Please post your new question on a new thread with an appropriate subject line so it can be addressed separately and accurately for your problem. If you wish, in that new post you can include a link to this thread for possible reference.

    Regards,
    RandyP

  • Sangili Kumar,

    As with the other recent poster on this thread, it will be best for you to post a new thread, possibly including a link to this one if you want to.

    You will want to clarify some things in your new post, please:

    SANGILI KUMAR said:

    As per your statement, DDR3 access using EDMA should be able to get data at a maximum rate of 10664 MB/s.

    Does that mean the core can access 11.18 B/ns from DDR3?

    Where did you get the 11.18 B/ns number?

    Unless the app note specifically says it can move data from DDR3 to a CorePac's L2 at that rate, you should not make this assumption. There may be datapoints in the app note that do address the rate at which data may be copied from DDR3 to a CorePac's L2.
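    For reference, converting the quoted peak to bytes per nanosecond depends on how "MB" is read. The 10664 MB/s figure was computed with 1 MB = 10^6 bytes, which gives 10.664 B/ns; reading MB as 2^20 bytes instead gives about 11.18 B/ns, which is possibly (not confirmed) where that number came from. Either way it is a theoretical peak for the DDR3 interface, not a measured DDR3-to-L2 rate. The helper name below is hypothetical:

```python
PEAK_MB_PER_S = 10664  # theoretical DDR3-1333 peak quoted earlier in the thread


def mb_per_s_to_bytes_per_ns(mb_per_s, bytes_per_mb=10**6):
    """Convert MB/s to bytes/ns (1 s = 10**9 ns)."""
    return mb_per_s * bytes_per_mb / 10**9


decimal = mb_per_s_to_bytes_per_ns(PEAK_MB_PER_S)        # 1 MB = 10**6 bytes
binary = mb_per_s_to_bytes_per_ns(PEAK_MB_PER_S, 2**20)  # 1 MB read as 2**20 bytes

print(f"{decimal:.3f} B/ns (decimal MB)")
print(f"{binary:.2f} B/ns (binary MB)")
```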

    SANGILI KUMAR said:

    For my project, I'm storing 400-500 MB of input data in RAM. We need 492 KB/ns of data from DDR3.

    ...

    I have 400-500 MB of data in RAM. For my processing, I need to retrieve 492 KB from DDR. Can it be done or not?

    These two statements seem similar but differ in the important "/ns" modifier, which is missing from the second statement.
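    To see how much the "/ns" matters, here is a quick sketch comparing both readings against the theoretical DDR3 peak quoted earlier. It assumes decimal units (1 KB = 10^3 bytes) and, like the peak itself, ignores setup time and contention, so real figures will be worse:

```python
PEAK_BYTES_PER_NS = 10664 * 10**6 / 10**9  # 10.664 B/ns, theoretical DDR3-1333 peak

# Reading 1: a sustained rate of 492 KB every nanosecond
required_bytes_per_ns = 492 * 10**3
shortfall = required_bytes_per_ns / PEAK_BYTES_PER_NS

# Reading 2: a one-off transfer of 492 KB in total
transfer_bytes = 492 * 10**3
best_case_us = transfer_bytes / PEAK_BYTES_PER_NS / 10**3  # ns -> us

print(f"492 KB/ns needs ~{shortfall:,.0f}x the DDR3 peak")
print(f"492 KB at the DDR3 peak takes at least ~{best_case_us:.1f} us")
```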

    In your new post, please restate your requirement. From what you have said here, I do not understand the requirement or how the 400-500MB is related to 492KB. More detail or explanation will be helpful in the new post.

    Regards,
    RandyP