
transfers on platform:                  transfers on simulator:
---------------------------             ----------------------------

loRd, loWr cpu  : 205                   loRd, loWr cpu  : 102
loRd, loWr cpu  : 281                   loRd, loWr cpu  : 218
loRd, loWr cpu  : 299                   loRd, loWr cpu  : 240

loRd, loWr edma : 561                   loRd, loWr edma : 576
loRd, loWr edma : 740                   loRd, loWr edma : 578
---------------------------             ----------------------------

loRd, rmWr cpu  : 618                   loRd, rmWr cpu  : 643
loRd, rmWr cpu  : 618                   loRd, rmWr cpu  : 645
loRd, rmWr cpu  : 618                   loRd, rmWr cpu  : 724

loRd, rmWr edma : 607                   loRd, rmWr edma : 522
loRd, rmWr edma : 611
---------------------------             ----------------------------

rmRd, loWr cpu  : 3431                  rmRd, loWr cpu  : 2322
rmRd, loWr cpu  : 3435                  rmRd, loWr cpu  : 2401
rmRd, loWr cpu  : 3437                  rmRd, loWr cpu  : 2417

rmRd, loWr edma : 548                   rmRd, loWr edma : 558
rmRd, loWr edma : 660                   rmRd, loWr edma : 562
rmRd, loWr edma : 662
---------------------------             ----------------------------

rmRd, rmWr cpu  : 4072                  rmRd, rmWr cpu  : 3011
rmRd, rmWr cpu  : 4158                  rmRd, rmWr cpu  : 3081
rmRd, rmWr cpu  : 4158                  rmRd, rmWr cpu  : 3091

rmRd, rmWr edma : 604                   rmRd, rmWr edma : 576
rmRd, rmWr edma : 608
---------------------------             ----------------------------


Setup:
------
Buffer size: 80 Uint32 elements (320 bytes)
CPU clock speed: 1GHz (cycle time 1ns)
Compile optimization level: o3
Numbers above are in CPU cycles (1 cycle = 1ns at 1GHz)

Anomalies:
----------
a) higher than expected EDMA transfer time
Documented EDMA steady state throughput: at least 2GB/s
Documented EDMA prolog and epilog costs: ~150 cycles (sum)
Expected EDMA transfer time: 320 bytes at 2GB/s = 160ns, plus 150ns = 310ns
Observed EDMA transfer time: about 610ns
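
The arithmetic above can be written out as a small C sketch. The function
names are made up for illustration, and 2GB/s is taken as 2 bytes per ns
(GB = 10^9 bytes), which is an assumption about how the documented figure
is defined.

```c
#include <stdint.h>

/* Expected transfer time in ns: payload time at steady-state throughput
   plus the fixed prolog/epilog cost. */
static double expected_edma_ns(uint32_t bytes, double bytes_per_ns,
                               double overhead_ns)
{
    return (double)bytes / bytes_per_ns + overhead_ns;
}

/* Fixed cost implied by a measurement, if the payload really moves at the
   documented steady-state throughput. */
static double implied_overhead_ns(double observed_ns, uint32_t bytes,
                                  double bytes_per_ns)
{
    return observed_ns - (double)bytes / bytes_per_ns;
}
```

With the values from this post, expected_edma_ns(320, 2.0, 150.0) gives
310, and implied_overhead_ns(610.0, 320, 2.0) gives 450, i.e. about three
times the documented prolog and epilog cost.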

b) platform and simulator cycle count mismatch
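The simulator reports lower cycle counts than the platform for most
transfers, even though it is supposed to be cycle accurate. The gap is
largest for CPU copies with remote reads (rmRd, loWr cpu: about 2300-2400
cycles on the simulator vs. about 3430 on the platform).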
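
To put a number on the mismatch, the relative difference can be computed
per measurement. This is only a sketch; the helper name is hypothetical.

```c
/* Relative difference of the simulator count with respect to the platform
   count, in percent. Negative means the simulator reports fewer cycles.
   platform_cycles must be nonzero. */
static double rel_diff_percent(double platform_cycles, double sim_cycles)
{
    return (sim_cycles - platform_cycles) / platform_cycles * 100.0;
}
```

For example, rel_diff_percent(3431.0, 2322.0) compares the first
(rmRd, loWr cpu) pair from the table.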

c) buffer manipulation errors on platform
On the simulator the code works fine: the destination buffer matches the
source buffer after every transfer. On the platform the buffers match only
in the (loRd, loWr) case; in all other cases they differ.
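
For reference, the kind of check meant by "buffers match" is sketched
below. This is not the actual test code; the function name is hypothetical,
and uint32_t from <stdint.h> stands in for Uint32.

```c
#include <stddef.h>
#include <stdint.h>

/* Compare source and destination element by element.
   Returns the index of the first mismatch, or count if they are identical. */
static size_t first_mismatch(const uint32_t *src, const uint32_t *dst,
                             size_t count)
{
    size_t i;
    for (i = 0; i < count; i++) {
        if (src[i] != dst[i]) {
            return i;
        }
    }
    return count;
}
```

With the 80-element buffers used here, a return value of 80 means the
transfer was correct. Any smaller value is the index where the destination
first diverges, which helps tell a partial transfer apart from one that
never arrived.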
