Why is the efficiency of LL2 close to that of L1DSRAM on the 6678?

user4236509

A few days ago, I ran an experiment on the 6678. Here is what I did.

1. I disabled the cache and measured the FFT efficiency with the data in L1D, that is, the efficiency of L1DSRAM.

2. I enabled the L1D cache and measured the FFT efficiency with the data in LL2.

The results show that the efficiency of LL2 is close to that of L1DSRAM, and that the FFT efficiency in L1DSRAM depends strongly on where the input, output, and twiddle factors are placed.

Question 1: Why is the efficiency of LL2 close to that of L1DSRAM? In theory L1DSRAM should be faster than LL2, so what is the bottleneck?

Question 2: Why is the FFT efficiency in L1DSRAM affected so much by the placement of the input, output, and twiddle factors?

over 9 years ago

Ganapathi Dhandapani95 over 9 years ago

TI__Mastermind 28085 points

Hi,

Refer to sections "3 Memory Access Throughput Performance" and "7 FFTC Throughput" of the Keystone Throughput Performance Guide. This document has detailed throughput performance data for Keystone architecture C66x devices.

http://www.ti.com/lit/sprabk5

Thanks,

ran35366 over 9 years ago in reply to Ganapathi Dhandapani95

TI__Genius 12805 points

Your post is very interesting.

However, a lot of details are missing, such as the FFT size and whether it is floating point or fixed point, and we do not have the code.
I also do not know whether, when you run with the data in L2, you configure L1D as SRAM rather than as cache.
If L1D is configured as cache, you are really accessing the data from L1 and not from L2.

The document that Ganapathi suggested does show the difference in access time between L1D and L2.

I can try to answer the second question. The core can access L1D through two ports, so the bandwidth is two reads from L1D per cycle (or one read and one write). But if the two reads are from the same bank of memory, the core stalls for one cycle, because a single bank of memory cannot support two operations at the same time. Thus the alignment of the arrays can change the performance. (This is true not only for FFT but for any code that performs more than one I/O operation per cycle.)

L1D has 8 banks of memory, each 32 bits wide, and each I/O operation is 64 bits wide, so it is done on a pair of banks. Reading two values from the same pair of banks in the same cycle will slow the execution.
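
To make the bank arithmetic above concrete, here is a minimal C sketch that checks whether two 64-bit accesses would land on the same pair of banks. It assumes exactly the layout described above (8 banks, 32 bits each, consecutive words in consecutive banks) and 8-byte-aligned addresses. The helper names `bank_pair` and `accesses_conflict` are made up for illustration and are not part of any TI library, and the real hardware may apply additional rules, so treat this as a rough model only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed model taken from the description above, not from a datasheet:
 * 8 banks, each 32 bits (4 bytes) wide, interleaved word by word. */
#define L1D_NUM_BANKS  8u
#define L1D_BANK_BYTES 4u

/* Hypothetical helper: index (0..3) of the bank pair touched by an
 * 8-byte-aligned 64-bit access at the given byte address. */
static unsigned bank_pair(uintptr_t addr)
{
    unsigned bank = (unsigned)((addr / L1D_BANK_BYTES) % L1D_NUM_BANKS);
    return bank / 2u;
}

/* Hypothetical helper: true if two aligned 64-bit accesses issued in the
 * same cycle would use the same bank pair under this simplified model. */
static bool accesses_conflict(uintptr_t a, uintptr_t b)
{
    return bank_pair(a) == bank_pair(b);
}
```

Under this model, two arrays whose start addresses differ by a multiple of 32 bytes use the same bank pair when they are read at the same index in the same cycle. Offsetting one array by 8, 16, or 24 bytes moves it to a different pair, which is one way the placement of the input, output, and twiddle buffers could change the measured FFT efficiency.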

Hope this answers your question.

Ran
