Hi all,
we have a problem with slow program execution on the TMS570LC4357 compared to the TMS570LS3137. The difference is too significant to ignore. At this point I hope it is just a bug in our code, but we have run out of ideas about where it could be.
Here is the most simplified code we use to measure it:
speedTest:
    mrc p15, #0, r1, c9, c13, #0   // Read PMCCNTR Register (start value)
    nop   // 1
    nop   // 2
    nop   // 3
    nop   // 4
    nop   // 5
    nop   // 6
    nop   // 7
    nop   // 8
    nop   // 9
    nop   // 10
    nop   // 11
    nop   // 12
    nop   // 13
    nop   // 14
    nop   // 15
    nop   // 16
    nop   // 17
    nop   // 18
    nop   // 19
    nop   // 20
    nop   // 21
    nop   // 22
    nop   // 23
    nop   // 24
    nop   // 25
    nop   // 26
    nop   // 27
    nop   // 28
    nop   // 29
    nop   // 30
    nop   // 31
    nop   // 32
    nop   // 33
    nop   // 34
    nop   // 35
    nop   // 36
    nop   // 37
    nop   // 38
    nop   // 39
    nop   // 40
    nop   // 41
    nop   // 42
    nop   // 43
    nop   // 44
    nop   // 45
    nop   // 46
    nop   // 47
    nop   // 48
    nop   // 49
    mrc p15, #0, r0, c9, c13, #0   // Read PMCCNTR Register (end value)
    sub r0, r0, r1                 // r0 = end - start = elapsed ticks
    bx lr                          // return elapsed ticks in r0
The result on the TMS570LS3137 is 6 clocks for the MRC + 49 * 1 clock for the NOPs. The function returns 55 ticks, as expected.
On the TMS570LC4357 it is much slower. We expected the same result, but the returned value is 81 ticks.
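To make the numbers above explicit, here is a small host-side sketch in C of the cycle budget. It only restates the arithmetic from this post, assuming the model "6 cycles for the MRC plus 1 cycle per NOP" that matched the TMS570LS3137 measurement. The function names are made up for illustration and are not part of any TI or HALCoGen API.

```c
#include <stdint.h>

/* Expected PMCCNTR delta under the assumed model:
 * a fixed cost for the MRC read plus one cycle per NOP. */
uint32_t expected_ticks(uint32_t mrc_cycles, uint32_t nop_count)
{
    return mrc_cycles + nop_count;
}

/* Ticks that the model does not explain (measured minus expected).
 * A positive value means the core was slower than the model predicts. */
int32_t unexplained_ticks(uint32_t measured, uint32_t expected)
{
    return (int32_t)measured - (int32_t)expected;
}
```

With the values from this post, `expected_ticks(6, 49)` gives 55, and `unexplained_ticks(81, 55)` gives 26, so the TMS570LC4357 spends 26 more ticks than the model accounts for on this 51-instruction sequence.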
And now the bad news: the difference is even bigger in real code. For example, one real function takes 600 ticks on the TMS570LS3137 (about 3.3 us at 180 MHz), but the same function needs 1700 ticks on the TMS570LC4357 (about 5.7 us at 300 MHz)!
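For reference, the tick-to-time conversion behind these figures can be sketched in C as below. This is only an illustration and assumes that PMCCNTR counts every CPU clock cycle (no divide-by-64 enabled), so that one tick equals one GCLK period. The helper name `ticks_to_ns` is hypothetical.

```c
#include <stdint.h>

/* Convert a cycle count to nanoseconds for a CPU clock given in MHz.
 * Assumes one counter tick per CPU clock cycle. The result is truncated
 * to whole nanoseconds; a 64-bit intermediate avoids overflow. */
uint32_t ticks_to_ns(uint32_t ticks, uint32_t cpu_mhz)
{
    return (uint32_t)(((uint64_t)ticks * 1000u) / cpu_mhz);
}
```

With the values from this post, `ticks_to_ns(600, 180)` gives 3333 ns and `ticks_to_ns(1700, 300)` gives 5666 ns. So despite the higher clock, the function is slower in wall-clock time on the TMS570LC4357, not only in ticks.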
Where could the problem be? GCLK = 300 MHz, HCLK = 150 MHz, flash data wait states = 3, flash prefetch is enabled, and the cache is enabled. The boot code (flash and cache init) comes from HALCoGen.