I am comparing the performance of the AM335x (on a BeagleBone Black) vs the AM437x (on a MYIR Rico board).
Using U-Boot on both platforms, I run the identical program (pseudocode listed):
while (1) {
    setGPIO();
    clearGPIO();
    for (ii = 0; ii < 50000; ++ii);   /* empty delay loop */
}
I made sure the "for" loop is not optimized away and that the same assembly code is generated for both platforms.
The AM335x runs this loop about 2.5 times faster than the AM437x. Is that an expected result?
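For reference, the test loop can be written as compilable C roughly as follows. This is only a sketch under assumptions: the real setGPIO()/clearGPIO() write to a memory-mapped GPIO register, which is replaced here by a hypothetical stand-in variable (fake_gpio_reg) so the code builds on any host, and the volatile counter is just one possible way to keep the compiler from deleting the empty loop. Note that a volatile counter adds a load and store per iteration, so it changes the generated code compared with a register-only loop.

```c
#include <stdint.h>

#define DELAY_ITERATIONS 50000u

/* Hypothetical stand-in for the GPIO data register. On real hardware this
 * would be a pointer to the SoC's memory-mapped register instead. */
static volatile uint32_t fake_gpio_reg;
static uint32_t gpio_write_count;   /* counts register writes, for checking only */

static void setGPIO(void)   { fake_gpio_reg = 1u; gpio_write_count++; }
static void clearGPIO(void) { fake_gpio_reg = 0u; gpio_write_count++; }

/* Empty delay loop. The volatile counter forces the compiler to perform
 * every increment and comparison, so the loop cannot be optimized away. */
static uint32_t delay_loop(void)
{
    volatile uint32_t ii;
    for (ii = 0; ii < DELAY_ITERATIONS; ++ii) {
        /* intentionally empty */
    }
    return ii;
}

/* One pass of the benchmark body; the real test wraps this in while (1). */
static uint32_t benchmark_pass(void)
{
    setGPIO();      /* rising edge seen on the oscilloscope */
    clearGPIO();    /* falling edge */
    return delay_loop();
}
```

The time between two consecutive GPIO spikes on the oscilloscope is then the duration of one pass, dominated by the 50000 loop iterations.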
Additional info:
- the GPIO spikes are used to measure timing with an oscilloscope.
- the GPIO timings without the "for" loop are about 4-8 nanoseconds slower on the 437x (i.e. barely any difference in timing)
- when the "for" loop is replaced with a giant function larger than 256 KB, so as to force the processors to access DRAM, the 437x gradually begins to outperform the 335x because of its wider memory path.
But the big question remains: why is the 335x outperforming the 437x when running from cache?
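To separate the effect of core clock rate from the effect of the pipeline itself, the oscilloscope reading can be converted into CPU cycles per loop iteration. The helper below is a minimal sketch; the function name and its parameters are my own, and the clock frequency has to be whatever each board is actually configured for in U-Boot.

```c
#include <stdint.h>

/* Convert the measured time between GPIO spikes into average CPU cycles
 * per loop iteration.
 *   period_ns  : spike-to-spike time read from the oscilloscope, in ns
 *   cpu_mhz    : core clock actually configured on the board, in MHz
 *   iterations : number of iterations of the delay loop (50000 here)
 */
static double cycles_per_iteration(double period_ns, double cpu_mhz,
                                   uint32_t iterations)
{
    /* 1 MHz corresponds to 0.001 cycles per nanosecond */
    double total_cycles = period_ns * cpu_mhz / 1000.0;
    return total_cycles / (double)iterations;
}
```

As an illustration with made-up numbers (not my measurements): at a 1000 MHz core clock, a period of 100000 ns gives 2.0 cycles per iteration, while 250000 ns gives 5.0. If the two boards run at different clocks, comparing cycles per iteration rather than raw time shows how much of the gap comes from the core rather than the clock setting.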
- Chuan Neng Lee
Precise Automation, LLC