TDA3: DSP & CM4 Cache Usage and Data Access Time

Part Number: TDA3

Hello Experts,

Could you please help with the following request:

On TDA3x, with the DSP L2D cache enabled (DSP @ 500 MHz):

1) How many DSP cycles are needed to load a value from L2D cache into the 2x32 core registers?

2) How many DSP cycles are needed, if the data is already in cache?

3) How many DSP cycles are needed, if the data is in internal RAM?

4) How many DSP cycles are needed, if the data is in external RAM (DDR3 / 16-bit)?

CM4 cache: similar questions

1) How many M4 cycles are needed, if data is in cache already?

2) How many M4 cycles are needed, if data is within internal RAM?

3) How many M4 cycles are needed, if data is in external RAM (DDR3 / 16-bit)?

Thanks and best regards,

Gregor

  • Hi Gregor,

    I have forwarded your question to an expert.

    Regards,
    Yordan
  • Hi Gregor,

    For the DSP case, the following are the numbers:

    DSP stall cycles for L1D access: 0 cycles

    DSP stall cycles for L2 SRAM hit: 5 cycles

    DSP stall cycles for L2 cache hit: 7 cycles

    I measured the following access latencies from the DSP MDMA port using statcoll (statistics collector) measurements:

    DDR access (average latency, command to response): 47 L3 cycles at 266 MHz, equivalent to 88 DSP cycles.

    OCMC access (average latency, command to response): 9 L3 cycles at 266 MHz, equivalent to 17 DSP cycles.
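    Converting L3 cycles to DSP cycles is a ratio of clock frequencies: DSP cycles = L3 cycles × 500 MHz / 266 MHz. The short sketch below reproduces the two figures above, assuming the L3 clock is exactly 266 MHz and the DSP runs at 500 MHz; the constant and function names are illustrative only.

```python
L3_CLOCK_MHZ = 266   # L3 interconnect clock used for the statcoll measurement
DSP_CLOCK_MHZ = 500  # DSP clock from the original question


def l3_to_dsp_cycles(l3_cycles: float) -> float:
    """Convert a latency measured in L3 cycles into equivalent DSP cycles."""
    latency_ns = l3_cycles * 1000.0 / L3_CLOCK_MHZ  # L3 cycles -> nanoseconds
    return latency_ns * DSP_CLOCK_MHZ / 1000.0      # nanoseconds -> DSP cycles


for name, l3 in [("DDR", 47), ("OCMC", 9)]:
    print(f"{name}: {l3} L3 cycles ~= {round(l3_to_dsp_cycles(l3))} DSP cycles")
```

    This prints 88 DSP cycles for DDR (about 176.7 ns) and 17 DSP cycles for OCMC (about 33.8 ns).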

    Note: Accesses are usually pipelined, so the total time for a transfer is not simply these per-access cycles multiplied by the number of bytes to be transferred.
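    To illustrate why pipelining matters, consider an idealized model in which the first access pays the full latency and each later access completes a fixed issue interval after the previous one. This is a simplification for illustration, not a model of the TDA3 memory system, and the 4-cycle issue interval below is a hypothetical value, not a measured one.

```python
def serialized_cycles(n_accesses: int, latency: int) -> int:
    """Every access waits for the previous one to complete."""
    return n_accesses * latency


def pipelined_cycles(n_accesses: int, latency: int, issue_interval: int) -> int:
    """Idealized pipeline: the first access pays the full latency, and each
    later access completes issue_interval cycles after the previous one."""
    if n_accesses <= 0:
        return 0
    return latency + (n_accesses - 1) * issue_interval


DDR_LATENCY_DSP = 88  # average DDR latency in DSP cycles, from the statcoll data
ISSUE_INTERVAL = 4    # hypothetical issue interval, not measured

print(serialized_cycles(16, DDR_LATENCY_DSP))                 # 1408
print(pipelined_cycles(16, DDR_LATENCY_DSP, ISSUE_INTERVAL))  # 148
```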

    Thanks and Regards,

    Piyali

  • Hi Piyali,


    thank You very much! Do we have also similar data for CM4?

    Thanks and best regards,

    Gregor

  • Hi Gregor,

    Based on LMbench tests performed earlier on the M4, here are some M4 access latencies:

    L1 hit latency: 6.30 ns

    L1 miss, L2 hit latency: ~50 ns

    L3 (main memory) latency: ~1.8 µs

    Thanks and Regards,

    Piyali

  • Hi Piyali,

    Thank you very much!

    Could you also comment on whether these figures are for first access or burst access, and what the access times are for write operations?

    Many thanks and best regards,
    Gregor
  • Hi Gregor,

    I will check whether this data is already available; if not, we will need to measure it. I will get back to you as soon as possible.

    Thanks and Regards,

    Piyali

  • Hi Piyali,

    Thank you very much, that would be great!

    Many thanks and best regards,
    Gregor
  • Hi Gregor,

    I have measured the Cortex-M4 access latencies using simple memory accesses timed with SysTick over multiple runs.
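    The SysTick-based method mentioned above can be sketched as host-side post-processing of raw counter samples. SysTick counts down and its current-value register (SYST_CVR) is 24 bits wide, so a reading taken after the counter reloads must be corrected with modulo-2^24 arithmetic; subtracting the fixed cost of the timing code itself (measured by timing an empty region) and averaging over runs gives the per-access figure. This sketch assumes the reload value is 0xFFFFFF and that SysTick is clocked from the processor clock, so one tick equals one M4 cycle; the sample values and overhead are hypothetical, not measured data.

```python
SYSTICK_MASK = 0xFFFFFF  # SYST_CVR is 24 bits wide


def systick_elapsed(start: int, end: int) -> int:
    """Ticks between two SYST_CVR reads. SysTick counts down, so elapsed
    time is start - end, taken modulo 2**24 to handle a reload in between
    (assumes the reload value is 0xFFFFFF)."""
    return (start - end) & SYSTICK_MASK


def mean_access_ticks(samples, overhead: int) -> float:
    """Average ticks per access over several (start, end) samples, after
    removing the fixed overhead of the timing code itself."""
    deltas = [max(systick_elapsed(s, e) - overhead, 0) for s, e in samples]
    return sum(deltas) / len(deltas)


# Hypothetical raw samples; the second one wraps through the reload value.
samples = [(1000, 982), (0x000008, 0xFFFFF6), (500, 482)]
print(systick_elapsed(0x000008, 0xFFFFF6))     # 18
print(mean_access_ticks(samples, overhead=8))  # 10.0
```

    On the target, start and end would be read from SYST_CVR (address 0xE000E018 on Armv7-M) immediately before and after the access under test.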

    Situation                               M4 cycles   Time
    L1 hit (read/write)                             1   4.7 ns
    L1 miss, L2 hit (cache line read)              10   47 ns
    L1 miss, L2 hit (write-through)                 5   23.5 ns
    L1 miss, DDR (cache line read)                 51   239.7 ns
    L1 miss, DDR (write allocate + write)          52   244.4 ns
    L2, non-cached (read/write)                     5   23.5 ns
    DDR, non-cached read (single)                  46   216.2 ns
    DDR, non-cached write (single)                 40   188 ns
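    The Time column is the cycle count multiplied by a cycle time of 4.7 ns, which corresponds to an M4 clock of roughly 212.8 MHz. That clock frequency is inferred from the table here rather than stated in the thread. A quick consistency check:

```python
M4_CYCLE_NS = 4.7  # cycle time implied by the table (1 cycle = 4.7 ns)

table = {
    "L1 hit (read/write)": 1,
    "L1 miss, L2 hit (cache line read)": 10,
    "L1 miss, DDR (cache line read)": 51,
    "DDR, non-cached write (single)": 40,
}


def cycles_to_ns(cycles: int) -> float:
    """Convert M4 cycles to nanoseconds, rounded to 0.1 ns as in the table."""
    return round(cycles * M4_CYCLE_NS, 1)


for situation, cycles in table.items():
    print(f"{situation}: {cycles} cycles = {cycles_to_ns(cycles)} ns")

print(f"Implied M4 clock: {1000 / M4_CYCLE_NS:.1f} MHz")  # 212.8 MHz
```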

    Please note that the DDR access numbers represent the best to average case and depend on the state of the DDR at the moment it is accessed. If your access goes to a location in a closed page, there is significant protocol overhead (precharging, activating the page, and so on) before the actual read or write takes place. If a periodic refresh is scheduled in between, it adds further delay. If you access a location in a page that is already open, the latencies will match the best-case to average-case numbers above.
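    The page-state effect described above can be illustrated with a simplified DRAM read-latency model built from the standard DDR timing parameters: CAS latency (CL), activate-to-read delay (tRCD), and precharge time (tRP). An access to an already-open row costs only CL; an access to an idle bank must first activate the row (tRCD + CL); and an access to a bank with a different row open must also precharge it first (tRP + tRCD + CL). The 7-7-7 timings below are hypothetical placeholders, not the TDA3 EMIF configuration, and controller queuing and refresh are not modeled.

```python
def dram_read_latency(row_state: str, t_rp: int, t_rcd: int, cl: int) -> int:
    """Simplified read latency in DRAM clock cycles.

    row_state:
      "hit"      - the target row is already open in the bank
      "closed"   - the bank is idle; the row must be activated first
      "conflict" - another row is open; precharge, then activate
    """
    if row_state == "hit":
        return cl
    if row_state == "closed":
        return t_rcd + cl
    if row_state == "conflict":
        return t_rp + t_rcd + cl
    raise ValueError(f"unknown row state: {row_state!r}")


# Hypothetical DDR3 timings (tRP = tRCD = CL = 7 DRAM clocks).
for state in ("hit", "closed", "conflict"):
    print(state, dram_read_latency(state, t_rp=7, t_rcd=7, cl=7))
```

    With these placeholder timings, a row conflict costs three times as long as an open-row hit, which is the kind of spread that separates the best-case and worst-case DDR numbers.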

    Thanks and Regards,

    Piyali