[Edit RandyP: This thread is related to Debug help with C6474 EDMA3 example.
CCSv4.2.4.00033
Code Generation Tool TI v7.3.0
C6474 Symmetric Device Cycle Accurate Simulator, Big-endian
Files used are included in this thread and/or the one above for reference.]
Below are my benchmark results, followed by some anomalies, including bugs that I have not been able to fix. Benchmarks first:
Setup:
Buffer size: Uint32 80, so 320 bytes
Allocation: All buffers are in L2 memories, lo* are in local (core 0) and rm* are in remote (core 1)
Execution: The code runs on core 0
CPU clock speed: 1GHz (cycle time 1ns)
Compile optimization level: o3
Numbers below are in cycles (1 cycle = 1ns at 1GHz)
Benchmarks:
transfers on platform:         transfers on simulator:
---------------------------    ----------------------------
loRd, loWr cpu  :  205         loRd, loWr cpu  :  102
loRd, loWr cpu  :  281         loRd, loWr cpu  :  218
loRd, loWr cpu  :  299         loRd, loWr cpu  :  240
loRd, loWr edma :  561         loRd, loWr edma :  576
loRd, loWr edma :  740         loRd, loWr edma :  578
---------------------------    ----------------------------
loRd, rmWr cpu  :  618         loRd, rmWr cpu  :  643
loRd, rmWr cpu  :  618         loRd, rmWr cpu  :  645
loRd, rmWr cpu  :  618         loRd, rmWr cpu  :  724
loRd, rmWr edma :  607         loRd, rmWr edma :  522
loRd, rmWr edma :  611
---------------------------    ----------------------------
rmRd, loWr cpu  : 3431         rmRd, loWr cpu  : 2322
rmRd, loWr cpu  : 3435         rmRd, loWr cpu  : 2401
rmRd, loWr cpu  : 3437         rmRd, loWr cpu  : 2417
rmRd, loWr edma :  548         rmRd, loWr edma :  558
rmRd, loWr edma :  660         rmRd, loWr edma :  562
rmRd, loWr edma :  662
---------------------------    ----------------------------
rmRd, rmWr cpu  : 4072         rmRd, rmWr cpu  : 3011
rmRd, rmWr cpu  : 4158         rmRd, rmWr cpu  : 3081
rmRd, rmWr cpu  : 4158         rmRd, rmWr cpu  : 3091
rmRd, rmWr edma :  604         rmRd, rmWr edma :  576
rmRd, rmWr edma :  608
---------------------------    ----------------------------
Anomalies:
a) higher than expected EDMA transfer time
Documented EDMA steady state throughput: at least 2GB/s
Documented EDMA prolog and epilog costs: ~150 cycles (sum)
Expected EDMA transfer time: 160ns (320 bytes at 2GB/s) + 150ns = 310ns
Observed EDMA transfer time: about 610ns
b) platform and simulator cycle count mismatch
The simulator seems to underestimate most transfer times,
even though it is supposed to be cycle accurate.
c) buffer manipulation errors on platform
On the simulator the code works, and the destination buffer matches
the source buffer after every transfer. On the platform, only the
(loRd, loWr) case matches; in all other cases the buffers differ.
Any comments or debug advice?
Thanks,
Manu
P.S.: The expected EDMA transfer time is based on http://focus.ti.com/lit/an/spraag8/spraag8.pdf (written for the TCI6482, so it may not necessarily apply to the C6474).