Why is the efficiency of LL2 close to that of L1D SRAM on the 6678?

A few days ago I ran an experiment on the 6678. This is what I did:

1. I disabled the cache and measured the FFT efficiency with the data in L1D, that is, the efficiency of L1D SRAM.

2. I enabled the L1D cache and measured the FFT efficiency with the data in LL2.

The results show that the efficiency of LL2 is close to that of L1D SRAM, and that the FFT efficiency in L1D SRAM depends strongly on where the input, output, and twiddle factors are placed.

Question 1: Why is the efficiency of LL2 close to that of L1D SRAM? In theory L1D SRAM should be faster than LL2, so what is the bottleneck?

Question 2: Why is the FFT efficiency in L1D SRAM affected so much by the placement of the input, output, and twiddle factors?

 
  • Hi,

    Refer to sections "3 Memory Access Throughput Performance" and "7 FFTC Throughput" in the Keystone Throughput Performance Guide. This document has detailed throughput performance data for Keystone Architecture C66x devices.

    http://www.ti.com/lit/sprabk5

    Thanks,
  • Your post is very interesting.

    However, many details are missing, such as the FFT size and whether it is
    floating point or fixed point, and we do not have the code.
    I do not know whether, when you run with the data in L2, you configure L1D as SRAM rather than cache.
    If L1D is configured as cache, you are really accessing the data from L1 and not from L2.

    The document that Ganapathi suggested does show the differences in access time
    between L1D and L2.

    I can try to answer the second question. The core can access L1D via two ports, so
    the bandwidth is two reads from L1D per cycle (or one read and one write). But if the two accesses
    are to the same bank of memory, the core stalls for one cycle, because a single bank of memory
    cannot support two operations at the same time. The alignment of the arrays can therefore change
    the performance. This is true not only of FFT but of any code that performs more than one I/O
    operation in a cycle.

    L1D has 8 banks of memory, each 32 bits wide, and each I/O operation is 64 bits wide, so
    it uses a pair of banks. Reading two values from the same pair of banks in one cycle will
    slow the execution.
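
To make the bank argument concrete, here is a minimal sketch in C of the bank mapping as described in this reply: 8 banks of 32 bits, with each 64-bit access occupying one pair of banks. The function names and the addresses are hypothetical, and the model is a simplification based only on the description above, not on the device documentation. It ignores any cases where the real hardware can serve two accesses to the same bank without a stall.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model from the description above (an assumption, not a spec):
 * 8 banks, each 4 bytes wide; one 64-bit access covers a pair of banks. */
#define BANK_WIDTH_BYTES 4u
#define NUM_BANKS        8u
#define ACCESS_BYTES     8u
#define NUM_BANK_PAIRS   (NUM_BANKS * BANK_WIDTH_BYTES / ACCESS_BYTES) /* 4 */

/* Index (0..3) of the bank pair touched by an 8-byte-aligned 64-bit access. */
static unsigned bank_pair(uint32_t addr)
{
    assert(addr % ACCESS_BYTES == 0u); /* this model assumes aligned accesses */
    return (addr / ACCESS_BYTES) % NUM_BANK_PAIRS;
}

/* True if two 64-bit accesses issued in the same cycle use the same bank pair. */
static bool accesses_conflict(uint32_t addr_a, uint32_t addr_b)
{
    return bank_pair(addr_a) == bank_pair(addr_b);
}

/* Count the iterations in which a loop that reads a[i] and b[i] (64 bits
 * each) in the same cycle would hit the same bank pair with both reads. */
static unsigned count_conflicts(uint32_t base_a, uint32_t base_b, unsigned n)
{
    unsigned conflicts = 0u;
    for (unsigned i = 0u; i < n; ++i) {
        if (accesses_conflict(base_a + i * ACCESS_BYTES,
                              base_b + i * ACCESS_BYTES)) {
            ++conflicts;
        }
    }
    return conflicts;
}
```

In this model, two arrays whose base addresses differ by a multiple of 32 bytes (for example, hypothetical bases 0x1000 and 0x1020) collide on every iteration of a loop that reads both at the same index, while shifting one base by 8 bytes (0x1000 and 0x1008) removes the collisions. That is one way the placement of the input, output, and twiddle arrays could change the measured FFT efficiency.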

    I hope this answers your question.

    Ran