Part Number: TMS320C6678
I would like to ask what is the fastest way to transfer data from MSMC to MSMC.
The source buffer is at a virtual, non-cached MSMC address, while the destination buffer is in the standard cached MSMC region. Only L1 cache is enabled.
The buffer holds 2048 integers (8192 B, 8 kB), and timings are measured with the TSCL/TSCH registers. I ran several tests:
1) Transferring data with memcpy using the real cached address (instead of the virtual address) for the source buffer, without any cache_inv or cache_wb, takes 2.3 us on average. Of course, in this case the data is not really transferred from memory to memory; it sits in the cache of the specific core instead. Memory performance (in cache) is 8 kB/2.3 us = 3.48 GB/s, not even close to the declared 16 GB/s of MSMC, even though this is cache, which should be faster.
2) Transferring data with memcpy using the real cached address for the source buffer, with cache_wb only, takes 3 us on average: the memcpy takes a little more than 2 us and the cache_wb a little less than 1 us. Memory performance (data is read from cache and written to MSMC after the writeback) is 8 kB/3 us = 2.7 GB/s.
3) Transferring data with memcpy using the real cached address for the source buffer, with both cache_wb and cache_inv, takes 6.1 us on average: the memcpy takes 4.5 us, the invalidate 0.9 us and the writeback 0.7 us. I can't understand why the memcpy takes so much longer than before for the same operation. Memory performance (data correctly transferred from MSMC to MSMC) is 8 kB/6.1 us = 1.3 GB/s.
4) Transferring data with memcpy using the virtual address for the source buffer, with cache_wb, takes 40 us on average. Without going into the details of the two operations, the memcpy is the one that takes around 39 us, and I can't understand why. Memory performance (data correctly transferred from MSMC to MSMC) is 8 kB/40 us = 200 MB/s.
5) Transferring data with EDMA needs no cache operations; data is correctly transferred from MSMC to MSMC and takes around 4 us. Assuming 1 us of overhead (even though I know it is less), memory performance is 8 kB/3 us = 2.7 GB/s.
Based on this topic, I would expect a much faster transfer. Is there something I am doing wrong?
I would like this data transfer to take as little time as possible. Since MSMC is accessed twice (read and write), I would expect something like 8 GB/s for the complete transfer. Am I wrong?
Any advice or suggestions would be much appreciated.
Thank you very much for your help in advance.
In reply to Yordan Kovachev:
In reply to Fabrizio Fortino:
In reply to lding:
Thank you very much for your reply.
I have already read the document, which is why I can't understand why the transfer takes so long.
In my previous post I didn't mention that the measurements cover the transfer only: the channel is set up beforehand, and only the transfer trigger is timed. I have assumed 1 us of overhead; if it is more than that, please advise.
Triggering multiple EDMA transfers in parallel looks like a good idea to me. How can I do that? How can I know when all the transfers are complete?
Thank you in advance.
Thank you for your reply. Are my results in line with what you would expect from your benchmarks?
Thank you in advance,
Thank you, Eric. Parallel transfers sped up the copy.