Simulator inaccuracies in C6474 EDMA3 example



[Edit RandyP: This thread is related to Debug help with C6474 EDMA3 example.
CCSv4.2.4.00033
Code Generation Tool TI v7.3.0
C6474 Symmetric Device Cycle Accurate Simulator, Big-endian
Files used are included in this thread and/or the one above for reference.]

 

Below I describe some anomalies, including bugs that I have not been able to fix. Setup and benchmarks first:

 

 

Setup:

Buffer size: 80 Uint32 words, so 320 bytes
Allocation: All buffers are in L2 memory; lo* buffers are local (core 0) and rm* buffers are remote (core 1)
Execution: The code runs on core 0
CPU clock speed: 1GHz (cycle time 1ns)
Compile optimization level: o3
Numbers below are in cycles (1 cycle = 1ns at 1GHz)

 

Benchmarks:

transfers on platform:                  transfers on simulator:
---------------------------             ----------------------------
loRd, loWr cpu  : 205                   loRd, loWr cpu  : 102
loRd, loWr cpu  : 281                   loRd, loWr cpu  : 218
loRd, loWr cpu  : 299                   loRd, loWr cpu  : 240

loRd, loWr edma : 561                   loRd, loWr edma : 576
loRd, loWr edma : 740                   loRd, loWr edma : 578
---------------------------             ----------------------------
loRd, rmWr cpu  : 618                   loRd, rmWr cpu  : 643
loRd, rmWr cpu  : 618                   loRd, rmWr cpu  : 645
loRd, rmWr cpu  : 618                   loRd, rmWr cpu  : 724

loRd, rmWr edma : 607                   loRd, rmWr edma : 522
loRd, rmWr edma : 611
---------------------------             ----------------------------
rmRd, loWr cpu  : 3431                  rmRd, loWr cpu  : 2322
rmRd, loWr cpu  : 3435                  rmRd, loWr cpu  : 2401
rmRd, loWr cpu  : 3437                  rmRd, loWr cpu  : 2417

rmRd, loWr edma : 548                   rmRd, loWr edma : 558
rmRd, loWr edma : 660                   rmRd, loWr edma : 562
rmRd, loWr edma : 662
---------------------------             ----------------------------
rmRd, rmWr cpu  : 4072                  rmRd, rmWr cpu  : 3011
rmRd, rmWr cpu  : 4158                  rmRd, rmWr cpu  : 3081
rmRd, rmWr cpu  : 4158                  rmRd, rmWr cpu  : 3091

rmRd, rmWr edma : 604                   rmRd, rmWr edma : 576
rmRd, rmWr edma : 608
---------------------------             ----------------------------

 

Anomalies:

a) higher than expected EDMA transfer time

Documented EDMA steady state throughput: at least 2GB/s

Documented EDMA prolog and epilog costs: ~150 cycles (sum)

Expected EDMA transfer time: 320 bytes at 2GB/s = 160ns, plus 150ns of prolog/epilog = 310ns

Observed EDMA transfer time: about 610ns
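
The estimate above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the constants are the figures quoted in this post (320-byte buffer, 2GB/s, ~150 cycles of overhead, 1GHz clock), and the function name is made up for illustration.

```python
# Figures quoted in the post; the 2 GB/s and ~150-cycle numbers come from the
# TCI6482 application note, so they are assumptions for the C6474.
BUFFER_BYTES = 80 * 4            # 80 Uint32 words
THROUGHPUT_BYTES_PER_NS = 2.0    # 2 GB/s is 2 bytes per ns
OVERHEAD_CYCLES = 150            # prolog + epilog, summed
CYCLE_NS = 1.0                   # 1 GHz CPU clock

def expected_edma_ns(nbytes,
                     throughput=THROUGHPUT_BYTES_PER_NS,
                     overhead_cycles=OVERHEAD_CYCLES,
                     cycle_ns=CYCLE_NS):
    """Steady-state transfer time plus fixed setup/teardown overhead, in ns."""
    return nbytes / throughput + overhead_cycles * cycle_ns

expected = expected_edma_ns(BUFFER_BYTES)
observed = 610.0                 # "about 610ns" from the platform runs

print(f"expected: {expected:.0f} ns")
print(f"observed/expected: {observed / expected:.2f}")
```

For a buffer this small the fixed overhead is about half of the expected total, so any extra per-transfer cost (for example the time to trigger the transfer and detect completion) shows up as a large relative error.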

 

b) platform and simulator cycle count mismatch

The simulator appears to underestimate most transfer times, even though it is supposed to be cycle accurate.
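
One way to quantify the mismatch is to compare the median run of each case. The sketch below does that for the benchmark table above; the dictionary layout and the choice of the median are assumptions made for illustration, while the cycle counts are copied from the table.

```python
from statistics import median

# Cycle counts copied from the benchmark table: (platform runs, simulator runs).
RUNS = {
    "loRd, loWr cpu":  ([205, 281, 299],    [102, 218, 240]),
    "loRd, loWr edma": ([561, 740],         [576, 578]),
    "loRd, rmWr cpu":  ([618, 618, 618],    [643, 645, 724]),
    "loRd, rmWr edma": ([607, 611],         [522]),
    "rmRd, loWr cpu":  ([3431, 3435, 3437], [2322, 2401, 2417]),
    "rmRd, loWr edma": ([548, 660, 662],    [558, 562]),
    "rmRd, rmWr cpu":  ([4072, 4158, 4158], [3011, 3081, 3091]),
    "rmRd, rmWr edma": ([604, 608],         [576]),
}

def sim_to_platform_ratio(platform, simulator):
    """Median simulator time divided by median platform time."""
    return median(simulator) / median(platform)

for case, (platform, simulator) in RUNS.items():
    ratio = sim_to_platform_ratio(platform, simulator)
    print(f"{case:16s} sim/platform = {ratio:.2f}")
```

A ratio below 1 means the simulator reports fewer cycles than the hardware. On these numbers the remote-read CPU cases come out at roughly 0.7, while loRd, rmWr cpu comes out slightly above 1, so the simulator does not underestimate in every case.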

 

c) buffer manipulation errors on platform

The code works fine on the simulator, where the destination and source buffers match after every transfer. On the platform, however, only the (loRd, loWr) case matches; in every other case the buffers differ.

 

Any comments or debug advice?

Thanks,
Manu 

P.S.: The reference used for the expected EDMA transfer time is http://focus.ti.com/lit/an/spraag8/spraag8.pdf. It is written for the TCI6482, so it may not necessarily apply to the C6474.

 

  • Manu,

    You have done a lot of good work here.

    Is the project supplied in the previous post, remote_memory_test.zip, the exact project that generates the data above?
    Which version of CCS are you using, CCSvn.m.x.yyyyy?
    What is the full name of the simulator you are using?

    Manu Bansal said:
    a) higher than expected EDMA transfer time

    Please expand on your statements here, for example by pointing to exactly where these figures are documented. I do not know what results or analysis will come from your test code, especially since the expected numbers are not based on documentation for this device, but at first glance your statements do make sense.

    Manu Bansal said:
    b) platform and simulator cycle count mismatch

    It is true that for the CPU instructions inside the DSP core, the simulators are Cycle Accurate, but they may lose some accuracy outside the DSP core, especially when dealing with peripherals and cache interactions. We have made some (I think intelligent) choices in trading off simulator speed against gate-level accuracy. If your results can be duplicated by the simulation team, it is possible your analysis could lead to updates, depending on what the causes are.

    Manu Bansal said:
    c) buffer manipulation errors on platform

    This needs to be expanded more, too. By normal definition, the silicon is working correctly. So if the simulator gives different results, I would suspect the program as well as the simulator. It is an interesting scenario, though. Can you describe the scenario and the failure?

    Sorry that I am not offering any answers here. This is interesting, though.

    Regards,
    RandyP

  • Randy,

    Yes, the project above is exactly the code I used for those benchmarks. The good news is that the buffer transfers now match. It just took power cycling the board. However, that changed my benchmarks, which I am reproducing below. For completeness, a new project export is attached, with the fresh benchmarks included in the doc folder.

    Attachment: 1882.remote_memory_test.zip

    I used Code Composer Studio Version: 4.2.4.00033, Code Generation Tool TI v7.3.0 in big-endian configuration. The simulator was C6474 Symmetric Device Cycle Accurate Simulator, Big-endian. These details are also in doc/README of the enclosed project. Target configuration files for both the platform and the simulator are in the targetconfs folder. The project is self-contained; only the CG_TOOL_ROOT environment variable needs to be set.

    I still have questions about whether the benchmarks make sense. Also, with the fresh ones I post below, it seems that even the EDMA transfer time depends on which memories are involved, which is surprising. It's good to know that the simulator is only cycle approximate when it comes to peripherals.

    Thanks,
    Manu 

  • Fresh benchmarks:

     


    platform                                                simulator
    -------------------                                     -------------------
    loRd, loWr cpu  buf operation time: 299                 loRd, loWr cpu  buf operation time: 104
    loRd, loWr cpu  buf operation time: 299                 loRd, loWr cpu  buf operation time: 104
    loRd, loWr cpu  buf operation time: 299                 loRd, loWr cpu  buf operation time: 104
    loRd, loWr cpu  buf operation time: 299                 loRd, loWr cpu  buf operation time: 254

    loRd, loWr edma buf operation time: 699                 loRd, loWr edma buf operation time: 482
    loRd, loWr edma buf operation time: 699                 loRd, loWr edma buf operation time: 482
    loRd, loWr edma buf operation time: 705                 loRd, loWr edma buf operation time: 488
    loRd, loWr edma buf operation time: 705                 loRd, loWr edma buf operation time: 576
    -------------------                                     -------------------

    loRd, rmWr cpu  buf operation time: 622                 loRd, rmWr cpu  buf operation time: 645
    loRd, rmWr cpu  buf operation time: 622                 loRd, rmWr cpu  buf operation time: 645
    loRd, rmWr cpu  buf operation time: 622                 loRd, rmWr cpu  buf operation time: 645
    loRd, rmWr cpu  buf operation time: 622                 loRd, rmWr cpu  buf operation time: 645

    loRd, rmWr edma buf operation time: 707                 loRd, rmWr edma buf operation time: 574
    loRd, rmWr edma buf operation time: 709                 loRd, rmWr edma buf operation time: 582
    loRd, rmWr edma buf operation time: 711                 loRd, rmWr edma buf operation time: 586
    loRd, rmWr edma buf operation time: 715
    -------------------                                     -------------------

    rmRd, loWr cpu  buf operation time: 3431                rmRd, loWr cpu  buf operation time: 2334
    rmRd, loWr cpu  buf operation time: 3431                rmRd, loWr cpu  buf operation time: 2346
    rmRd, loWr cpu  buf operation time: 3443                rmRd, loWr cpu  buf operation time: 2348
    rmRd, loWr cpu  buf operation time: 3449                rmRd, loWr cpu  buf operation time: 2392

    rmRd, loWr edma buf operation time: 765                 rmRd, loWr edma buf operation time: 544
    rmRd, loWr edma buf operation time: 765                 rmRd, loWr edma buf operation time: 548
    rmRd, loWr edma buf operation time: 767                 rmRd, loWr edma buf operation time: 548
    rmRd, loWr edma buf operation time: 769                 rmRd, loWr edma buf operation time: 548
                                                            rmRd, loWr edma buf operation time: 550
    -------------------                                     -------------------

    rmRd, rmWr cpu  buf operation time: 4154                rmRd, rmWr cpu  buf operation time: 3100
    rmRd, rmWr cpu  buf operation time: 4154                rmRd, rmWr cpu  buf operation time: 3102
    rmRd, rmWr cpu  buf operation time: 4160                rmRd, rmWr cpu  buf operation time: 3108
    rmRd, rmWr cpu  buf operation time: 4160                rmRd, rmWr cpu  buf operation time: 3116 

    rmRd, rmWr edma buf operation time: 821                 rmRd, rmWr edma buf operation time: 582
    rmRd, rmWr edma buf operation time: 825                 rmRd, rmWr edma buf operation time: 594
    rmRd, rmWr edma buf operation time: 829                 rmRd, rmWr edma buf operation time: 606
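
To make the fresh platform numbers easier to compare, the sketch below converts the fastest run of each case into cycles per 32-bit word (for the CPU copy) and effective throughput (for the EDMA). The table layout and helper names are assumptions made for illustration; the cycle counts are the smallest platform values from the table above, and the throughput figure includes all setup overhead, so it is not a steady-state number.

```python
WORDS = 80            # buffer length in Uint32 words (from the setup)
BYTES = WORDS * 4     # 320 bytes

# Fastest platform run per case, copied from the fresh benchmark table.
# Values are cycles at 1 GHz, i.e. nanoseconds.
PLATFORM_MIN = {
    "loRd, loWr": {"cpu": 299,  "edma": 699},
    "loRd, rmWr": {"cpu": 622,  "edma": 707},
    "rmRd, loWr": {"cpu": 3431, "edma": 765},
    "rmRd, rmWr": {"cpu": 4154, "edma": 821},
}

def cycles_per_word(cycles, words=WORDS):
    """Average cost of copying one 32-bit word."""
    return cycles / words

def throughput_gb_per_s(cycles, nbytes=BYTES):
    """Effective throughput; 1 byte per ns equals 1 GB/s at a 1 ns cycle."""
    return nbytes / cycles

for case, t in PLATFORM_MIN.items():
    print(f"{case}: cpu {cycles_per_word(t['cpu']):5.1f} cycles/word, "
          f"edma {throughput_gb_per_s(t['edma']):.2f} GB/s effective")
```

On these numbers the CPU copy costs roughly 43 cycles per word when reading from remote L2, against under 4 cycles per word when both buffers are local, while the EDMA total only moves from 699 to 821 cycles across the four cases.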