Has anyone had this problem? Maybe someone knows the solution?
That's in the ballpark of a single read-miss to L2 as documented in Chapter 3.2 of spru862.
Thanks.
So if I understand correctly, functions with many LDW instructions will execute much faster on the 6416 than on the 6455 because of read misses? For example, will an 8k-point FFT (larger than L1D) execute slower on the 6455 than on the 6416?
Sevan Varypaev said: So if I understand correctly, functions with many LDW instructions will execute much faster on the 6416 than on the 6455 because of read misses? For example, will an 8k-point FFT (larger than L1D) execute slower on the 6455 than on the 6416?
That's a difficult question to answer. Keep in mind the number I mentioned is for a single read miss. There is "miss pipelining", which reduces that overhead in many algorithms. So although an L2 miss takes longer on 64x+ than on 64x, you get many other advantages on 64x+ that could also help from a performance perspective:
Brad
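To make the difference between a single read miss and pipelined misses concrete, here is a minimal sketch of a stall-cycle model in C. It is only an illustration under assumptions: the struct `miss_model`, its fields, and both functions are hypothetical names, and the cycle counts are parameters you would have to fill in from spru862 for your device and memory configuration. The real miss-pipelining behavior is more detailed than this linear model.

```c
/* Hypothetical stall model for L1D read misses served by L2.
 * The cycle counts are placeholders supplied by the caller,
 * not values taken from spru862. */
typedef struct {
    unsigned first_miss_stall;  /* stall cycles for one isolated read miss */
    unsigned pipelined_stall;   /* assumed extra stall per further overlapped miss */
} miss_model;

/* Every miss pays the full single-miss penalty (no overlap). */
unsigned long stall_serial(const miss_model *m, unsigned long misses)
{
    return misses * m->first_miss_stall;
}

/* Back-to-back misses overlap: the first pays the full penalty,
 * each later one pays only the smaller incremental stall. */
unsigned long stall_pipelined(const miss_model *m, unsigned long misses)
{
    if (misses == 0) {
        return 0;
    }
    return m->first_miss_stall + (misses - 1) * m->pipelined_stall;
}
```

Calling both functions with the same miss count shows why multiplying the single-miss figure by the number of LDW misses tends to overestimate the cost of a streaming algorithm such as a large FFT.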