LDW from L2 in 6455

Has anyone had this problem? Maybe someone knows the solution?

  • That's in the ballpark of a single read-miss to L2 as documented in Chapter 3.2 of spru862.

  • Thanks.

    So if I understand correctly, functions with many LDW instructions will execute much faster on the 6416 than on the 6455 because of read misses? For example, will an 8k-point FFT (larger than L1D) execute slower on the 6455 than on the 6416?

  • Sevan Varypaev said:

    So if I understand correctly, functions with many LDW instructions will execute much faster on the 6416 than on the 6455 because of read misses? For example, will an 8k-point FFT (larger than L1D) execute slower on the 6455 than on the 6416?

    That's a difficult question to answer. Keep in mind that the number I mentioned is for a single read miss. There is "miss pipelining," which reduces that overhead in many algorithms. So although an L2 miss takes longer on 64x+ than on 64x, you get many other advantages on 64x+ that could also help from a performance perspective:

    • Larger L1 cache sizes (fewer misses).
    • Larger L2 cache sizes (fewer misses).
    • Compact instructions (better code density, i.e. more instructions can fit in a cache line)
    • Non-inclusive cache (reduces the number of "unnecessary" writeback-invalidates which ultimately result in a cache miss)
    • SPLOOP buffer (better code density, interruptible loops)
    • Additional instructions
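To make the miss-pipelining point concrete, here is a minimal sketch of a stall-cycle model in C. It is not taken from spru862: the function names and the idea of a fixed "incremental" cost per overlapped miss are assumptions for illustration only, and real stall counts depend on the device, the memory configuration, and the access pattern.

```c
#include <assert.h>

/* Hypothetical model: every read miss pays the full single-miss
 * penalty, i.e. no overlap between misses at all. */
static unsigned long stall_serial(unsigned long misses,
                                  unsigned long single_miss)
{
    return misses * single_miss;
}

/* Hypothetical model: consecutive misses overlap ("miss pipelining").
 * The first miss pays the full penalty; each further miss adds only a
 * smaller incremental cost. Both cycle counts are caller-supplied
 * assumptions, not documented device figures. */
static unsigned long stall_pipelined(unsigned long misses,
                                     unsigned long single_miss,
                                     unsigned long incremental)
{
    /* An overlapped miss should never cost more than an isolated one. */
    assert(incremental <= single_miss);

    if (misses == 0) {
        return 0;
    }
    return single_miss + (misses - 1) * incremental;
}
```

With made-up values of 12 cycles for an isolated miss and 4 cycles per overlapped miss, 128 back-to-back misses would cost 1536 stall cycles under the serial model but 520 under the pipelined one. That gap is why a single-miss figure alone is a poor predictor of how an LDW-heavy loop will perform.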

    Brad