What is the maximum throughput for EDMA, EMIF and uPP on the OMAP-L138?

Hi all,

A) Can anyone please tell me the maximum throughput achievable for the following interfaces on the OMAP-L138 (ARM):

      a) EDMA

      b) EMIF [with the clock at about 91.2 MHz]

      c) uPP

B) Alternatively, can anyone tell me how to determine these values?

[It would be helpful if the values are for Linux.]

Thank you,

Ashish Mishra

[Bangalore]

  • Dear Rahul, thanks for the link; it gave me a way to check the throughput.

    Can you please help me with the following, and correct me if my understanding is heading in the wrong direction:

    A) EDMA throughput

         From the link, I understood that the MAXIMUM achievable throughput is around 532 MBytes/sec.

         This throughput is for a 4K buffer with A-sync mode and self-chained, self-linked settings.

         a.1) Are the source and destination both on-board memory,

                              or

                  is one end on-board memory and the other an external interface, such as an FPGA mapped via EMIF?

        b.1) In my tests I am getting around 82 MBytes/sec on board [i.e. source and destination both in on-board memory],

                 and around 17 MBytes/sec from the OMAP to the FPGA with EMIF in between.

                 [My configuration is a 1K buffer, A-sync mode, and a self-linked PaRAM set.]

                 I am not configuring the RDRATE register [default value "0"] and I use the default burst size [16 bytes].

    Can you please suggest a way to increase the throughput?
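    To make the A-sync versus AB-sync trade-off concrete, here is a minimal sketch of how a self-linked, A-synchronized PaRAM set for a 1K buffer could be packed. The struct, the helper names and the OPT bit position are assumptions based on the EDMA3 PaRAM layout described in the OMAP-L138 TRM, not code from this thread; verify them against your TRM revision.

```c
#include <stdint.h>

/* ASSUMPTION: offsets and bit positions follow the EDMA3 PaRAM description in
 * the OMAP-L138 TRM; verify them against your TRM revision before use. */
#define PARAM_BASE_OFFSET 0x4000u    /* offset of PaRAM set 0 from the EDMA3CC base */
#define PARAM_SET_SIZE    0x20u      /* each set is eight 32-bit words */
#define OPT_SYNCDIM       (1u << 2)  /* 0 = A-synchronized, 1 = AB-synchronized */

/* One PaRAM set, in register order. */
struct edma3_param {
    uint32_t opt;
    uint32_t src;
    uint32_t a_b_cnt;      /* BCNT[31:16] | ACNT[15:0] */
    uint32_t dst;
    uint32_t src_dst_bidx; /* DSTBIDX[31:16] | SRCBIDX[15:0] */
    uint32_t link_bcntrld; /* BCNTRLD[31:16] | LINK[15:0] */
    uint32_t src_dst_cidx; /* DSTCIDX[31:16] | SRCCIDX[15:0] */
    uint32_t ccnt;         /* CCNT[15:0] */
};

/* Hypothetical helper: a self-linked, A-synchronized set that copies
 * bcnt back-to-back arrays of acnt bytes (acnt * bcnt bytes in total). */
static struct edma3_param make_self_linked_async(unsigned set, uint32_t src,
                                                 uint32_t dst, uint16_t acnt,
                                                 uint16_t bcnt)
{
    struct edma3_param p = {0};
    /* LINK holds the low 16 bits of the PaRAM address to reload from;
     * pointing it at this set's own address makes the set self-linked. */
    uint16_t self_link = (uint16_t)(PARAM_BASE_OFFSET + set * PARAM_SET_SIZE);

    p.opt = 0; /* SYNCDIM = 0 (A-sync), STATIC = 0 so linking is allowed */
    p.src = src;
    p.dst = dst;
    p.a_b_cnt = ((uint32_t)bcnt << 16) | acnt;
    p.src_dst_bidx = ((uint32_t)acnt << 16) | acnt; /* arrays are contiguous */
    p.link_bcntrld = ((uint32_t)bcnt << 16) | self_link;
    p.src_dst_cidx = 0;
    p.ccnt = 1;
    return p;
}

/* Sync events needed to finish the set: A-sync moves one ACNT-byte array per
 * event (BCNT * CCNT events); AB-sync moves one ACNT * BCNT frame per event. */
static uint32_t sync_events(const struct edma3_param *p)
{
    uint32_t bcnt = p->a_b_cnt >> 16;
    uint32_t ccnt = p->ccnt & 0xFFFFu;
    return (p->opt & OPT_SYNCDIM) ? ccnt : bcnt * ccnt;
}
```

    With A-sync, each event moves only ACNT bytes, so a 1K buffer split into 16-byte arrays needs 64 events, whereas the same buffer in AB-sync mode needs one. Whether fewer events helps in practice depends on how the events are generated, so treat this as one thing to try rather than a guaranteed fix.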

     


    B) EMIF interface

         Presently I have configured the EMIF setup/strobe/hold as 2/4/2,

         with the clock at 91.2 MHz.

        What approximate THROUGHPUT should I expect from the interface? I am driving EMIF with EDMA.

        [My understanding is that I will get about 91 / value MBytes/sec, where value = 2 + 4 + 2.]


    Please correct me wherever my understanding is wrong.

    I am working with these interfaces for the first time.



    Thank you,

    Ashish  


  • Dear Rahul,

    Can you please help me resolve this issue?

    Thank you,

    Ashish Mishra

  • Hi Ashish

    Your understanding of the EMIF throughput is correct: with a 91.2 MHz clock and 2/4/2 setup/strobe/hold, the maximum theoretical throughput is (EMIFA clock / (setup + strobe + hold)) * bus width (bytes) per second.

    The actual throughput will be somewhat lower because of overheads in the chip topology and things like unoptimized EDMA transfers (due to the geometry constraints of your transfer, etc.).
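    A minimal sketch of that formula with the numbers from this thread. The 16-bit bus width is an assumption (the width was not stated), and the helper name is hypothetical. For 91.2 MHz and 2/4/2 timing it gives 91.2 / 8 * 2 = 22.8 MB/s as an upper bound.

```c
/* Upper-bound estimate for EMIFA asynchronous accesses, using the formula
 * (EMIFA clock / (setup + strobe + hold)) * bus width.
 * Timing values are actual cycle counts; if you program the CEnCFG register
 * fields directly, check in the TRM whether each field holds the count or the
 * count minus one. Turnaround cycles and EDMA overhead are ignored, so real
 * transfers will be slower. Returns MB/s (1 MB = 10^6 bytes). */
static double emifa_async_peak_mbytes(double emifa_clk_mhz, unsigned setup,
                                      unsigned strobe, unsigned hold,
                                      unsigned bus_width_bits)
{
    unsigned cycles_per_access = setup + strobe + hold;

    if (cycles_per_access == 0)
        return 0.0;
    /* million accesses per second, times bytes per access */
    return (emifa_clk_mhz / cycles_per_access) * (bus_width_bits / 8.0);
}
```

    For example, emifa_async_peak_mbytes(91.2, 2, 4, 2, 16) evaluates to about 22.8, and emifa_async_peak_mbytes(100.0, 1, 1, 1, 16) to about 66.67.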

    For EDMA, the maximum throughput you quoted is roughly 80-90% of the throughput possible from external memory (mDDR) or internal RAM; e.g. for DDR2 it would be 150 MHz x 2 (double data rate) x 2 bytes (16 bits) = 600 MBytes/sec. In general, the best throughput you can get from EDMA depends on the maximum throughput of the slower endpoint (source or destination), so for L2-to-EMIFA transfers the best possible throughput is the maximum theoretical throughput of EMIFA (not L2).
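    The same reasoning as a small sketch with hypothetical helper names: the DDR peak from the example above, and the rule that a transfer is capped by its slower endpoint. The 80-90% efficiency factor is not modelled, and the 22.8 MB/s EMIFA figure used in the usage note assumes a 16-bit bus at 91.2 MHz with 2/4/2 timing.

```c
/* Peak bandwidth of a double-data-rate interface in MB/s:
 * clock (MHz) * 2 transfers per clock * bus width in bytes. */
static double ddr_peak_mbytes(double ddr_clk_mhz, unsigned bus_width_bits)
{
    return ddr_clk_mhz * 2.0 * (bus_width_bits / 8.0);
}

/* An EDMA transfer can go no faster than its slower endpoint, so the
 * theoretical ceiling is the smaller of the two peak figures. */
static double edma_upper_bound_mbytes(double src_peak, double dst_peak)
{
    return src_peak < dst_peak ? src_peak : dst_peak;
}
```

    For example, edma_upper_bound_mbytes(ddr_peak_mbytes(150.0, 16), 22.8) returns 22.8: a DDR-to-EMIFA copy is bounded by EMIFA, not by the 600 MB/s memory.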

    Hope this helps

    Regards

    Mukul

  • Dear Mukul,

    Thanks for the input.

    Referring to the link http://processors.wiki.ti.com/index.php/OMAP-L1x/C674x/AM1x_SoC_Constraints:

    EMIFA Asynchronous Memories: 66.67

      33.33 MHz x 16 bit   [(EMA_CLK)/(Setup+Strobe+Hold) * 16 bit]

    • Illustrated calculations assume a Setup/Strobe/Hold value of 1 cycle
    • Assumes a 16-bit async interface
    By the same formula, (33.33 / 3) * 16 => 177.76 MBytes,
    but the page states only 66.67 MBytes/sec.
    Can you please let me know if I am missing any details?
    Thank you,
    Ashish Mishra
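    One reading that reconciles the wiki figures, offered as an assumption: if the wiki's EMA_CLK is 100 MHz (the excerpt above does not state it), then the quoted 33.33 MHz is already EMA_CLK divided by the 3 cycles per access, and a 16-bit bus moves 2 bytes per access:

```latex
\begin{aligned}
R &= \frac{f_{\text{EMA\_CLK}}}{t_{\text{setup}} + t_{\text{strobe}} + t_{\text{hold}}}
   = \frac{100\ \text{MHz}}{1 + 1 + 1} \approx 33.33 \times 10^{6}\ \text{accesses/s} \\
B &= R \times \frac{16\ \text{bit}}{8\ \text{bit/byte}}
   \approx 33.33 \times 10^{6} \times 2\ \text{bytes/s} \approx 66.67\ \text{MB/s}
\end{aligned}
```

    Under that assumption, dividing 33.33 by 3 again counts the cycles twice, and (33.33 / 3) * 16 = 177.76 is in Mbit/s (about 22.2 MB/s), not MBytes/s.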
  • Dear Mukul,

    Can you please guide me on this?

  • Hi all,

    After some work, these are the outcomes we have arrived at:

    a) EMIF at 92.2 MHz

         Theoretical: 21 MBytes/sec [with 1 clock for strobe, 1 clock for hold, 1 for start]

         Practical: 17 MBytes/sec [with 2 clocks for strobe, 4 clocks for hold, 2 for start]

    b) EDMA with a 1K buffer: got 82 MBytes/sec

    c) uPP: just started on this; will keep you updated

  • for DDR2 it would be 150 MHz x 2 (double data rate) x 2 (16 bits) = 600 Mbytes/sec

    Interesting. We have the DDR set up to run at 150 MHz, ran some benchmarks before and after, and indeed saw a 10% improvement over the 133 MHz numbers. The best we've seen is about 75 MB/s reading from DDR RAM to the CPU. Can you explain why reading RAM runs at only 15% of what one could expect?

  • Hi Mike,

    1. Is the throughput you are getting from the ARM or the DSP? I am evaluating with ARM + Linux.

    2. Also, I have configured the EDMA PaRAM to self-link.

        Perhaps you can configure SELF LINK mode as well.

    Ashish K. Mishra

    [Bangalore / India]

  • I'm only using the ARM side under Linux. I tried with some of my own code and with the publicly available "lmbench" code, which agrees with my findings. The CPU runs at 456 MHz, the DDR at 150 MHz.

    But I'm not using any DMA, so I may be hijacking your thread here... Still, I'd expect the CPU-to-RAM interface to be at least as fast as the DMA.
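    Mike's own benchmark code is not shown. As a rough illustration of the same idea (time a large memory copy on the ARM under Linux and divide bytes by seconds), here is a minimal sketch; it is neither lmbench nor Mike's code, and the function name is hypothetical. Results depend on buffer size relative to the caches, compiler flags, and whether a "copy" figure counts each byte once or twice (this sketch counts it once).

```c
#define _POSIX_C_SOURCE 199309L
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Rough CPU copy bandwidth: time `iterations` memcpy calls of `size` bytes.
 * Returns MB/s (1 MB = 10^6 bytes, bytes copied counted once) or -1.0 on error. */
static double memcpy_bandwidth_mbytes(size_t size, unsigned iterations)
{
    /* Calling through a volatile pointer stops the compiler from
     * optimizing the repeated copies away. */
    void *(*volatile copy_fn)(void *, const void *, size_t) = memcpy;
    char *src = malloc(size);
    char *dst = malloc(size);
    struct timespec t0, t1;
    double seconds;

    if (!src || !dst || size == 0 || iterations == 0) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0x5A, size); /* touch every page so page faults are not timed */
    memset(dst, 0, size);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (unsigned i = 0; i < iterations; i++)
        copy_fn(dst, src, size);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    seconds = (double)(t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    free(src);
    free(dst);
    return seconds > 0.0 ? (double)size * iterations / seconds / 1e6 : -1.0;
}
```

    For example, memcpy_bandwidth_mbytes(16u << 20, 10) approximates a 16 MB copy test similar in spirit to the "cp" rows in the tables below.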

  • Dear Mike,

    It's always good to have the THROUGHPUT analysis of one more entity to give a clear picture of the device.

    Can you please share the TEST SETUP, as I haven't evaluated DDR2?

    Nor have I ever used "lmbench"!

    Ashish K. Mishra

    [Bangalore / India]

  • We use a LogicPD SOM module, which has mDDR if I'm not mistaken. I can share the benchmark results. "lmbench" is open source and can easily be cross-compiled for the ARM. It copies data around in memory and measures throughput.

    The columns are the CPU speed in MHz; the values are in MB/s. These results use 16 MB of RAM.

    Test        96 MHz   200 MHz   300 MHz   408 MHz   456 MHz
    bcopy        38.08     61.95     68.15     72.19     73.56
    bzero         93.1    192.65    252.07    253.93    254.11
    cp           52.46     75.01     84.86     91.34     93.47
    fcp          38.05     61.94     68.14     71.02     73.56
    frd           46.6     84.74    112.63    135.87    144.19
    fwr          93.05    192.53    252.98    253.86    254.02
    rd           66.15     110.5    134.45    156.46    162.15
    rdwr         43.66     72.59     92.97    110.42    109.96
    wr          350.73    712.09    916.33     929.8    936.22

    I made some changes to the DDR timing settings and repeated the tests with the CPU running at 456 MHz.

    Frequency (MHz)        133       150       150       150
    RD latency               5         5         4         4
    CAS latency              3         3         3         3
    Refresh (us)          4.44      3.91      3.91      4.44
    Test   Size (MB)
    rd             1    159.95    165.65    173.25    175.16
    rd             4    161.83    167.17    175.18    175.48
    rd             8    162.00    167.45    175.12    175.87
    rd            16    162.15    167.56    175.41    175.82
    wr             1    927.21   1022.84   1040.04   1025.47
    wr             4    930.12   1027.49   1043.02   1033.19
    wr             8    929.69   1027.88   1035.73   1034.53
    wr            16    936.22   1034.39   1035.46   1040.72
    rdwr           1    111.12    112.21    113.69    114.92
    rdwr           4    109.32    111.76    115.28    114.18
    rdwr           8    109.12    112.72    114.29    114.27
    rdwr          16    109.96    112.14    113.98    114.49
    cp             1     92.72     97.51    100.09    100.14
    cp             4     93.39     97.98    100.72    100.46
    cp             8    138.63    142.55    149.20    149.09
    cp            16     93.47     98.25    100.90    100.98
    fwr            1    252.43    279.60    280.27    280.86
    fwr            4    253.45    281.04    280.98    281.57
    fwr            8    253.79    281.68    282.02    282.38
    fwr           16    254.02    282.03    282.51    282.73
    frd            1    143.53    143.95    149.68    149.14
    frd            4    144.02    145.33    150.72    142.89
    frd            8    144.01    145.73    151.06    150.99
    frd           16    144.19    145.77    151.14    151.41
    fcp            1     73.13     77.18     78.74     77.82
    fcp            4     73.46     76.90     78.96     78.97
    fcp            8    107.99    112.27    114.72    115.29
    fcp           16     73.56     77.57     79.05     79.04
    bzero          1    252.59    280.31    280.43    281.14
    bzero          4    253.44    281.47    281.79    281.93
    bzero          8    253.87    282.38    281.59    283.21
    bzero         16    254.11    282.92    282.52    283.57
    bcopy          1     73.31     77.40     78.68     78.85
    bcopy          4     73.48     77.80     79.27     79.34
    bcopy          8    107.99    112.63    115.45    115.77
    bcopy         16     73.56     75.97     79.63     79.69

    As you can see, increasing the frequency and tightening the timings had a positive effect. Even with the fastest settings shown above, the system is stable, and memtest can run for hours without errors.

  • Hi Mike,

    I have just started working on EMIF-to-FPGA on a Linux system. From your post, I think you are already ahead of me on this topic.

    Could you help me with how to map the EMIF memory in Linux? Which documents did you reference while working on this, and which files should I look into? If you could post the steps for this mapping, that would be greatly appreciated.

    Best Regards,

    Joe

  • Hi Joe,

    It is always better to create a new thread for your question or problem, since old threads may get less attention than new ones.

    Try using the ioremap() function to map your physical address to a logical (virtual) address, and then use the ioread*()/iowrite*() functions or __raw_readl()/__raw_writel().

    Please refer to the following thread, which has FPGA init code.

    http://e2e.ti.com/support/embedded/linux/f/354/t/122942.aspx