Some code I was given to integrate uses a 2D array of floats in a loop. It ran fine in simulation in CCS, but moving it to the hardware increased the execution time of the loop from 400 us to 4000 us.
Changing the code to use a 1D array brought the performance back in line, at around 600 us.
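
To illustrate the kind of change, here is a minimal sketch of the two access patterns. It is not the actual code: the dimensions `ROWS` and `COLS`, the function names, and the summing loop body are all hypothetical stand-ins.

```c
#include <stddef.h>

#define ROWS 8   /* hypothetical dimensions, not from the real code */
#define COLS 16

/* 2D form: each access a[r][c] implies an offset of r * COLS + c. */
float sum_2d(float a[ROWS][COLS])
{
    float sum = 0.0f;
    for (size_t r = 0; r < ROWS; r++) {
        for (size_t c = 0; c < COLS; c++) {
            sum += a[r][c];
        }
    }
    return sum;
}

/* 1D form: a flat buffer of ROWS * COLS floats, indexed explicitly. */
float sum_1d(const float *a)
{
    float sum = 0.0f;
    for (size_t r = 0; r < ROWS; r++) {
        for (size_t c = 0; c < COLS; c++) {
            sum += a[r * COLS + c];
        }
    }
    return sum;
}
```

Both versions visit the same number of elements in the same row-major order, so any timing difference would have to come from how the compiler generates the addressing code or from where the arrays are placed in memory.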
Is there any known reason why a 2D array would perform worse on real hardware?
The 2D and 1D code perform the same in simulation.