Hi,
I am looking for information about memory performance in the C6678 DSP.
More specifically, when L1 and L2 are SRAM only, what is the throughput for each of them?
Also, what is the throughput of the shared memory when accessed by all cores simultaneously
(that is, how many cores can read from the shared memory together)?
Same for the DDR3. Does DDR3 access interfere with shared memory access (or anything else)?
I looked in the documentation and could not find anything related.
Thank you.
Ya,
A performance application note will be released soon; it will address these and other questions regarding memory and peripheral performance.
Best Regards,
Chad
Do you know when? (It's a little bit urgent.)
Is there any way to get some answers sooner?
Thank you again.
I expect it will be out next week. I'll have someone try to follow up with some data points. That said, you'll want to be more specific about what the source and destination are and what the transfer mechanism is (EDMA, IDMA, CorePacs, Peripherals).
I am using EDMA for DDR to L2 and shared memory.
The CorePacs operate directly on L2 and shared memory.
But I am also interested (for comparison only) in direct access to DDR from the core.
Hopefully, the below information will be adequate until the performance application note is published.
All throughput figures listed below assume that the DSP is operating at 1 GHz (1000 MHz).
Raw Memory Throughput:
Maximum Throughputs for C66x core and DMA masters:
DDR3 Accesses using EDMA:
If you use EDMA0 to access DDR3, then you should be able to get data at a maximum rate of 10664 megabytes per second, excluding any idle time required for setting up the EDMA transaction. If using EDMA1 or EDMA2 to access DDR3, then the maximum rate at which you can get data from DDR3 is 5333 megabytes per second, excluding any idle time required for setting up the EDMA transaction.
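As a rough illustration of what these peak rates mean for a real buffer, the sketch below computes a lower bound on the transfer time. It assumes that one megabyte is 10^6 bytes and ignores EDMA setup time; the 512 KiB buffer size and the function name are hypothetical, chosen only for the example.

```python
# Peak EDMA-to-DDR3 rates quoted above, in megabytes per second.
# Assumption: 1 megabyte = 10**6 bytes; EDMA setup time is ignored.
PEAK_MB_PER_S = {"EDMA0": 10664, "EDMA1": 5333, "EDMA2": 5333}


def min_transfer_time_us(num_bytes, controller):
    """Lower bound on the transfer time in microseconds at the quoted peak rate."""
    # 1 MB/s is numerically equal to 1 byte per microsecond.
    bytes_per_us = PEAK_MB_PER_S[controller]
    return num_bytes / bytes_per_us


buffer_bytes = 512 * 1024  # hypothetical 512 KiB buffer
for name in ("EDMA0", "EDMA1"):
    print(f"{name}: at least {min_transfer_time_us(buffer_bytes, name):.1f} us")
```

Measured times will be longer than this bound, since it leaves out transaction setup and any contention from other masters.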
C66x Core Accesses to DDR3:
Assuming that L1D, L2, and MSMC are all configured as SRAM (caching disabled), single reads to DDR3 will stall the DSP for 89 cycles. Burst reads to DDR3 will stall the DSP for 43.2 cycles. Writes to DDR3 will not stall the DSP, since the data will be written to DDR3 by the DDR3 controller; however, it will take some time for the data to actually arrive in DDR3.
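To put the stall counts in time units, the sketch below converts cycles to nanoseconds at the 1 GHz clock assumed throughout this post and totals the stall for a number of reads. It assumes the burst figure is an average per read; the function names and the read count are hypothetical.

```python
CPU_HZ = 1_000_000_000  # 1 GHz, as assumed for all figures in this post

# Stall cycles per DDR3 read quoted above (L1D, L2 and MSMC all SRAM, no caching).
# Assumption: the burst figure is an average stall per read.
STALL_CYCLES = {"single": 89.0, "burst": 43.2}


def stall_ns(cycles, cpu_hz=CPU_HZ):
    """Convert a stall measured in CPU cycles to nanoseconds."""
    return cycles * 1e9 / cpu_hz


def total_stall_us(num_reads, kind):
    """Total stall time in microseconds for num_reads reads of the given kind."""
    return num_reads * stall_ns(STALL_CYCLES[kind]) / 1000.0


reads = 1000  # hypothetical number of reads issued by the core
for kind in ("single", "burst"):
    print(f"{reads} {kind} reads: about {total_stall_us(reads, kind):.1f} us stalled")
```

At 1 GHz one cycle is one nanosecond, so the cycle counts can be read directly as nanoseconds; passing a different `cpu_hz` rescales them for other clock rates.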
Regards,
Derek
Thank you very much.
This is exactly what I need.
One last question,
You said that direct access to DDR3 stalls the DSP for 89 cycles. What causes the stall? DDR3 latency? How do you get this number?
Thank you
Yes, this delay is primarily caused by DDR3 latency. There is also some delay in the bus for accessing DDR3, since the bus that is used to access DDR3 operates at a slower rate than the DSP.
I will get back to you on how these numbers are being measured.
Any updates?
Is the performance document published yet?
I could not find it anywhere.
Thanks.
We had a number of internal reviews with many changes for clarification and haven't published it yet. We are really close, and it should be out in the next few weeks.
I have a problem with IDMA1 (L2<->L1) performance (only approx. 3.3 Gb/s); please see this thread: http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/t/182014.aspx#656195
Thanks
Ivan
Ivan,
I have someone who's digging into this and will be replying on the other thread.
Best Regards,
Chad
Dear Chad,
Is the document published at this moment?
Best regards,
Teun
Teun,
It's documented in the Keystone Throughput Performance Guide: http://www.ti.com/lit/an/sprabk5/sprabk5.pdf