Hi,
I would like to ask what the fastest way is to transfer data from MSMC to MSMC.
The source buffer is accessed through a virtual non-cached address, while the destination buffer is in the standard cached MSMC. Caching is done in L1.
The buffer holds 2048 integers (8192 B, 8 kB). Measurements are taken with the TSCL/TSCH registers. I have run several tests:
1) Transferring data with memcpy, using the real cached address instead of the virtual address for the source buffer and without any cache_inv or cache_wb, takes 2.3 us on average. Of course, in this case the data is not really transferred from memory to memory; it stays in the cache of the specific core. Memory performance (in cache) is 8 kB / 2.3 us = 3.48 GB/s, which is not even close to the 16 GB/s declared for MSMC, even though this is in cache and should be faster.
2) Transferring data with memcpy, using the real cached address instead of the virtual address for the source buffer and doing cache_wb only, takes 3 us on average. The memcpy takes a little more than 2 us and the cache_wb a little less than 1 us. Memory performance (data is taken from cache and written to MSMC by the write-back) is 8 kB / 3 us = 2.7 GB/s.
3) Transferring data with memcpy, using the real cached address instead of the virtual address for the source buffer and doing both cache_wb and cache_inv, takes 6.1 us on average. The memcpy takes 4.5 us, the invalidate 0.9 us and the write-back 0.7 us. I can't understand why the memcpy takes so much longer than before for the same operation. Memory performance (data correctly transferred from MSMC to MSMC) is 8 kB / 6.1 us = 1.3 GB/s.
4) Transferring data with memcpy, using the virtual address for the source buffer and doing cache_wb, takes 40 us on average. I will not go into the detail of the two operations; the memcpy is the one that takes around 39 us, and I can't understand why. Memory performance (data correctly transferred from MSMC to MSMC) is 8 kB / 40 us = 200 MB/s.
5) Transferring data with EDMA needs no cache operations; the data is correctly transferred from MSMC to MSMC in around 4 us. Let's say there is 1 us of overhead (even though I know it is less). Memory performance is 8 kB / 3 us = 2.7 GB/s.
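As a cross-check of the arithmetic in the tests above, here is a minimal C sketch that turns two TSCL/TSCH readings into microseconds and then into GB/s. It is plain host-side C, not device code: the helper names (tsc_value, cycles_to_us, throughput_gbps) are hypothetical, and the core clock passed as core_hz is an assumption that has to match the actual device setting. With the buffer taken as exactly 8192 B, the results come out slightly higher than the rounded figures quoted above (for example about 3.56 GB/s for 2.3 us).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Combine the low and high halves of the time stamp counter into 64 bits. */
static uint64_t tsc_value(uint32_t tscl, uint32_t tsch)
{
    return ((uint64_t)tsch << 32) | (uint64_t)tscl;
}

/* Convert a cycle interval to microseconds.
   core_hz is an assumed core clock (for example 1e9 for 1 GHz). */
static double cycles_to_us(uint64_t start, uint64_t end, double core_hz)
{
    assert(core_hz > 0.0 && end >= start);
    return (double)(end - start) * 1e6 / core_hz;
}

/* Throughput in GB/s (1 GB = 1e9 B) for `bytes` moved in `elapsed_us`. */
static double throughput_gbps(size_t bytes, double elapsed_us)
{
    assert(elapsed_us > 0.0);
    return (double)bytes / (elapsed_us * 1e3);
}
```

For example, throughput_gbps(8192, cycles_to_us(t0, t1, 1e9)) would give the figure for one test, where t0 and t1 are counter values captured immediately before and after the transfer.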
Based on this topic I would expect a much faster transfer. Is there something wrong with what I am doing?
C6678 Memory performance - Processors forum - Processors - TI E2E support forums
I would like this data transfer to take as little time as possible. With two accesses to MSMC (read and write), I would expect something close to 8 GB/s for the complete transfer. Am I wrong?
Any advice or suggestion is much appreciated.
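To make that expectation concrete, here is a small C sketch of the estimate. It rests on the assumption stated in this post, not on a documented figure: that the read and the write share the declared peak bandwidth equally, so the effective copy rate is half the peak. The function name ideal_copy_time_us is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Ideal copy time in microseconds when every byte is read once and
   written once through a memory with peak bandwidth peak_gbps
   (1 GB = 1e9 B). Assumes read and write split the peak equally. */
static double ideal_copy_time_us(size_t bytes, double peak_gbps)
{
    assert(peak_gbps > 0.0);
    double effective_gbps = peak_gbps / 2.0;   /* read + write share the peak */
    return (double)bytes / (effective_gbps * 1e3);
}
```

Under that assumption, ideal_copy_time_us(8192, 16.0) evaluates to roughly 1 us, which is the scale the measured times are being compared against.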
Thank you very much for your help in advance.
Best Regards,
Fabrizio
Hi,
Thank you very much for your reply.
I have already read the document, which is why I can't understand why the data transfer takes so long.
In the previous post I didn't mention that the measurements cover the transfer only. The channel setup is done beforehand, and then only the transfer is triggered. I have assumed 1 us of overhead; if it is more than that, please advise.
Triggering multiple EDMA transfers in parallel looks like a good idea to me. How can I do that? How can I know when all the transfers are complete?
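One possible way to think about it, sketched under assumptions rather than taken from TI documentation or from this thread: split the buffer into chunks, give each chunk its own completion code, and treat the whole transfer as finished once every expected bit is set in a pending-completion word. On EDMA3 that word would correspond to the interrupt pending register (IPR); here it is simulated by a plain integer so the logic can be checked on a host. The names chunk_t, plan_chunks and all_done are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One sub-transfer: where it starts, how long it is, and which
   completion code (bit position) signals that it has finished. */
typedef struct {
    size_t   offset;
    size_t   length;
    unsigned tcc;
} chunk_t;

/* Split total_bytes into n chunks and fill out[0..n-1].
   Returns the mask of completion bits that must all be set. */
static uint32_t plan_chunks(size_t total_bytes, unsigned n, chunk_t *out)
{
    assert(n > 0 && n <= 32);
    size_t base = total_bytes / n;
    size_t rem  = total_bytes % n;
    size_t off  = 0;
    uint32_t mask = 0;

    for (unsigned i = 0; i < n; i++) {
        out[i].offset = off;
        out[i].length = base + (i < rem ? 1 : 0); /* spread the remainder */
        out[i].tcc    = i;
        off  += out[i].length;
        mask |= (uint32_t)1 << i;
    }
    return mask;
}

/* True once every expected completion bit is set in `pending`. */
static int all_done(uint32_t pending, uint32_t mask)
{
    return (pending & mask) == mask;
}
```

On the device, each chunk would be programmed into its own channel, all channels would be triggered back to back, and the core would poll (or take an interrupt) until all_done reports true for the value read back from the pending register.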
Thank you in advance.
Regards,
Fabrizio
Hi,
Thank you for your reply. Do you think that my results are as expected from your benchmarks?
Thank you in advance,
BR
Fabrizio