LDW from L2 in 6455

Sevan Varypaev

Prodigy 20 points

Has anyone had this problem? Maybe someone knows the solution?

over 16 years ago

Brad Griffis over 16 years ago

TI__Guru*** 125430 points

That's in the ballpark of a single read-miss to L2 as documented in Chapter 3.2 of spru862.

Sevan Varypaev over 16 years ago in reply to Brad Griffis

Prodigy 20 points

Thanks.

So if I understand correctly, functions with many LDW instructions will execute much faster on the 6416 than on the 6455 because of read misses? For example, will an 8k-point FFT (larger than L1D) execute slower on the 6455 than on the 6416?

Brad Griffis over 16 years ago in reply to Sevan Varypaev

TI__Guru*** 125430 points

Sevan Varypaev said:

So if I understand correctly, functions with many LDW instructions will execute much faster on the 6416 than on the 6455 because of read misses? For example, will an 8k-point FFT (larger than L1D) execute slower on the 6455 than on the 6416?

That's a difficult question to answer. Keep in mind that the number I mentioned is for a single read miss. "Miss pipelining" can reduce that overhead in many algorithms. So although an L2 miss takes longer on 64x+ than on 64x, 64x+ has many other advantages that can also help from a performance perspective:

Larger L1 cache sizes (fewer misses).
Larger L2 cache sizes (fewer misses).
Compact instructions (better code density, i.e. more instructions can fit in a cache line)
Non-inclusive cache (reduces the number of "unnecessary" writeback-invalidates which ultimately result in a cache miss)
SPLOOP buffer (better code density, interruptible loops)
Additional instructions
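
To make the miss pipelining point concrete, here is a rough first-order sketch in C of why the single-miss number does not simply multiply by the number of misses. It is only an illustrative model, not something taken from spru862: the function names and the `incremental_stall` parameter are hypothetical, and the actual overlap depends on the access pattern.

```c
#include <assert.h>

/* Hypothetical stall model. Every cycle count passed in is a placeholder
 * chosen by the caller, not a figure from spru862. */

/* Stall cycles if each of n_misses pays the full single-miss penalty. */
unsigned long stalls_serialized(unsigned long n_misses,
                                unsigned long single_miss_stall)
{
    return n_misses * single_miss_stall;
}

/* Stall cycles if back-to-back misses overlap: the first miss pays the
 * full penalty, and each later miss adds only a smaller incremental
 * stall (assumption of this sketch). */
unsigned long stalls_pipelined(unsigned long n_misses,
                               unsigned long single_miss_stall,
                               unsigned long incremental_stall)
{
    /* The model only makes sense if overlap does not cost extra. */
    assert(incremental_stall <= single_miss_stall);
    if (n_misses == 0)
        return 0;
    return single_miss_stall + (n_misses - 1) * incremental_stall;
}
```

With placeholder values of 10 cycles for a single miss and 2 cycles for each additional overlapped miss, four consecutive misses cost 40 stall cycles in the serialized model but 16 in the pipelined one. The real numbers for 64x and 64x+ should come from the cache user's guide or from measurement.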

Brad
