Hi,
Recently I have been testing the effect of turning hardware prefetch on and off.
I wrote a simple benchmark based on the vector-add from the sample code, and I have attached the source file.
The benchmark does a vector add, i.e. c[i] = a[i] + b[i]. The idea is that when I turn hardware prefetch off for vectors a, b, and c, I would expect performance to drop significantly, because the elements of each vector are accessed contiguously. However, that did not happen: the time with hardware prefetch off is exactly the same as with it on.
[core 0] Hardware prefetch on, time: 0.56 sec
[core 0] Hardware prefetch off, time: 0.56 sec
[core 1] Hardware prefetch on, time: 0.56 sec
[core 1] Hardware prefetch off, time: 0.56 sec
[core 2] Hardware prefetch on, time: 0.56 sec
[core 2] Hardware prefetch off, time: 0.56 sec
[core 3] Hardware prefetch on, time: 0.56 sec
[core 3] Hardware prefetch off, time: 0.56 sec
[core 4] Hardware prefetch on, time: 0.56 sec
[core 4] Hardware prefetch off, time: 0.56 sec
[core 5] Hardware prefetch on, time: 0.56 sec
[core 5] Hardware prefetch off, time: 0.56 sec
[core 6] Hardware prefetch on, time: 0.56 sec
[core 6] Hardware prefetch off, time: 0.56 sec
[core 7] Hardware prefetch on, time: 0.56 sec
[core 7] Hardware prefetch off, time: 0.56 sec
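For context, the kind of loop being timed looks roughly like the following. This is only a minimal sketch in plain C, not the attached source: the names vector_add and time_vector_add are placeholders, timing uses the standard clock() call, and the sketch does not show how hardware prefetch is switched on or off, since that part is platform-specific.

```c
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Element-wise add over n contiguous floats: c[i] = a[i] + b[i]. */
void vector_add(const float *a, const float *b, float *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* Hypothetical helper: runs `reps` passes of vector_add over freshly
 * allocated buffers of n floats and returns the elapsed CPU time in
 * seconds, or -1.0 if n is zero or an allocation fails. */
double time_vector_add(size_t n, int reps)
{
    if (n == 0)
        return -1.0;

    float *a = malloc(n * sizeof *a);
    float *b = malloc(n * sizeof *b);
    float *c = malloc(n * sizeof *c);
    if (!a || !b || !c) {
        free(a);
        free(b);
        free(c);
        return -1.0;
    }

    /* Touch every element before timing so page faults are not measured. */
    for (size_t i = 0; i < n; i++) {
        a[i] = (float)i;
        b[i] = 2.0f * (float)i;
        c[i] = 0.0f;
    }

    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        vector_add(a, b, c, n);
    clock_t end = clock();

    /* Read one result through a volatile so the work is not optimized away. */
    volatile float sink = c[n - 1];
    (void)sink;

    free(a);
    free(b);
    free(c);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

The prefetch setting would be changed between two calls to time_vector_add with the same n and reps, and the two returned times compared. If the three buffers together fit in cache, the repeated passes mostly hit in cache and a prefetch setting may make little difference, so n should be chosen with that in mind.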
I compiled both the host and target code at optimization level O3, so I assume hardware prefetch is on by default.
Could you please look at the attached code and let me know if I have missed something?
Thanks
Cheng