TDA3: DSP & CM4 Cache Usage and Data Access Time

ROGERG

Mastermind 36420 points

Part Number: TDA3

Hello Experts,

Could you please help with the following request:

On TDA3x

DSP L2D Cache is enabled (DSP @ 500 MHz):

1) How many DSP cycles are needed to load a value from L2D cache into the 2x32 core registers?

2) How many DSP cycles are needed, if the data is already in cache?

3) How many DSP cycles are needed, if the data is in internal RAM?

4) How many DSP cycles are needed, if the data is in external RAM (DDR3 / 16-bit)?

CM4 Cache - similar questions

1) How many M4 cycles are needed, if data is in cache already?

2) How many M4 cycles are needed, if data is within internal RAM?

3) How many M4 cycles are needed, if data is in external RAM (DDR3 / 16-bit)?

Thanks and best regards,

Gregor

over 8 years ago

0 Yordan Kamenov over 8 years ago

TI__Mastermind 42515 points

Hi Gregor,

I have forwarded your question to an expert.

Regards,
Yordan

0 Piyali Goswami over 8 years ago

TI__Mastermind 30245 points

Hi Gregor,

For the DSP case, the following are the numbers:

DSP stall cycles for L1D access: 0 cycles

DSP Stall cycles for L2 SRAM Hit: 5 cycles

DSP Stall cycles for L2 Cache Hit: 7 cycles

I measured the following access latencies at the DSP MDMA port using statcoll:

DDR access (average latency, command to response): 47 L3 cycles (266 MHz), equivalent to about 88 DSP cycles at 500 MHz.

OCMC access (average latency, command to response): 9 L3 cycles (266 MHz), equivalent to about 17 DSP cycles at 500 MHz.

Note: Accesses are usually pipelined, so the total access time is not simply these cycle counts multiplied by the number of bytes to be transferred.
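
The conversion from L3 cycles to DSP cycles can be reproduced with a short sketch like the one below. The 266 MHz and 500 MHz clocks come from this thread. The `pipelined_total_cycles` helper is only a simplified illustration of why pipelined accesses do not scale linearly; its `issue_interval` parameter is a hypothetical value, not a measured TDA3x number.

```python
L3_MHZ = 266.0   # L3 interconnect clock quoted in the post
DSP_MHZ = 500.0  # DSP clock quoted in the question


def l3_to_dsp_cycles(l3_cycles, l3_mhz=L3_MHZ, dsp_mhz=DSP_MHZ):
    """Convert a latency in L3 cycles to the equivalent number of DSP cycles."""
    return l3_cycles * dsp_mhz / l3_mhz


def pipelined_total_cycles(n_accesses, first_latency, issue_interval):
    """Simplified pipeline model (assumption): the first access pays the full
    latency, and every further access adds only issue_interval cycles."""
    if n_accesses <= 0:
        return 0
    return first_latency + (n_accesses - 1) * issue_interval


ddr_dsp_cycles = l3_to_dsp_cycles(47)   # about 88.3 DSP cycles
ocmc_dsp_cycles = l3_to_dsp_cycles(9)   # about 16.9 DSP cycles

# Hypothetical comparison: 4 DDR accesses with an assumed 8-cycle issue interval.
naive_total = 4 * 88
pipelined_total = pipelined_total_cycles(4, first_latency=88, issue_interval=8)
```

With these assumed values the naive estimate is much larger than the pipelined one, which is the point of the note above; the real issue interval depends on the access pattern and the interconnect.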

Thanks and Regards,

Piyali

0 ROGERG over 8 years ago in reply to Piyali Goswami

TI__Mastermind 36420 points

Hi Piyali,

Thank you very much! Do we also have similar data for the CM4?

Thanks and best regards,

Gregor

0 Piyali Goswami over 8 years ago in reply to ROGERG

TI__Mastermind 30245 points

Hi Gregor,

Based on LMBENCH tests performed on M4 earlier, please find below some data for M4 access latencies:

L1 hit latency is 6.30 ns.

L1 miss, L2 hit latency is ~50 ns.

L3 latency (main memory) is ~1.8 µs.

Thanks and Regards,

Piyali

0 ROGERG over 8 years ago in reply to Piyali Goswami

TI__Mastermind 36420 points

Hi Piyali,

Thank you very much!

Could you also please comment on whether these figures are for "First Access" or "Burst Access", and what the access times are for write operations?

Many thanks and best regards,
Gregor

0 Piyali Goswami over 8 years ago in reply to ROGERG

TI__Mastermind 30245 points

Hi Gregor,

I will check whether this data is already available. If not, we will need to measure it. I will get back to you as soon as possible.

Thanks and Regards,

Piyali

0 ROGERG over 8 years ago in reply to Piyali Goswami

TI__Mastermind 36420 points

Hi Piyali,

Thank you very much, that would be great!

Many thanks and best regards,
Gregor

0 Piyali Goswami over 8 years ago in reply to ROGERG

TI__Mastermind 30245 points

Hi Gregor,

I have measured the access latencies for the Cortex-M4 as shown below. I used simple memory accesses and the SysTick timer, with multiple runs.

Situation	Number of M4 cycles	Time
L1 Hit (read/write)	1	4.7 ns
L1 Miss, L2 Hit (cache line read)	10	47 ns
L1 Miss, L2 Hit (write through)	5	23.5 ns
L1 Miss, DDR (cache line read)	51	239.7 ns
L1 Miss, DDR (write allocate + write)	52	244.4 ns
L2 Non Cache read/write	5	23.5 ns
DDR Non cache read (single)	46	216.2 ns
DDR Non cache write (single)	40	188 ns
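
As a cross-check of the table above, the Time column follows from the cycle counts at 4.7 ns per M4 cycle, which implies a core clock of roughly 212.8 MHz. That clock value is derived here from the table and is not stated in the thread, so treat it as an assumption. A minimal sketch:

```python
NS_PER_M4_CYCLE = 4.7  # taken from the table: 1 cycle = 4.7 ns

# Cycle counts copied from the table above.
M4_CYCLES = {
    "L1 hit (read/write)": 1,
    "L1 miss, L2 hit (cache line read)": 10,
    "L1 miss, L2 hit (write through)": 5,
    "L1 miss, DDR (cache line read)": 51,
    "L1 miss, DDR (write allocate + write)": 52,
    "L2 non-cached read/write": 5,
    "DDR non-cached read (single)": 46,
    "DDR non-cached write (single)": 40,
}


def cycles_to_ns(cycles, ns_per_cycle=NS_PER_M4_CYCLE):
    """Convert M4 cycles to nanoseconds, rounded to one decimal place."""
    return round(cycles * ns_per_cycle, 1)


# Clock frequency implied by the cycle time (derived value, an assumption).
implied_clock_mhz = 1000.0 / NS_PER_M4_CYCLE

times_ns = {name: cycles_to_ns(c) for name, c in M4_CYCLES.items()}
```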

Please note that the DDR access numbers are best-case to average-case values and depend on the state of the DDR at the moment of the access. If the access goes to a location in a closed page, there is considerable protocol overhead (precharging, activating the page, and so on) before the location is actually read or written. If a periodic refresh is scheduled in between, that adds further delay. If the access goes to a location in a page that is already open, the latency will match the best-case to average-case numbers above.
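
For readers who want to repeat a SysTick-based measurement like the one mentioned above, the arithmetic on the counter readings might look like the following sketch. It is not the code used for the table. It assumes SysTick (a 24-bit down counter) runs from the core clock with the maximum reload value 0xFFFFFF, that at most one wrap occurs per run, and that `overhead_ticks` (the cost of the timer reads themselves) has been measured separately. All function and parameter names are hypothetical.

```python
SYSTICK_MASK = 0xFFFFFF  # SysTick current-value register is 24 bits wide


def elapsed_ticks(start, end):
    """Ticks between two reads of a 24-bit down counter.
    The mask handles a single wrap through zero (reload assumed 0xFFFFFF)."""
    return (start - end) & SYSTICK_MASK


def average_access_cycles(samples, overhead_ticks, accesses_per_run):
    """Average cycles per access over several runs.

    samples: list of (start, end) counter readings, one pair per run.
    overhead_ticks: measured cost of the timer reads themselves (assumption).
    accesses_per_run: number of memory accesses timed in each run.
    """
    total = sum(elapsed_ticks(s, e) - overhead_ticks for s, e in samples)
    return total / (len(samples) * accesses_per_run)


# Made-up readings, for illustration only (not measured data).
example = average_access_cycles([(1000, 480), (2000, 1490)],
                                overhead_ticks=10, accesses_per_run=10)
```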

Thanks and Regards,

Piyali
