Recently I ported my algorithm to the DM648. It ran so slowly that I have to optimize it to improve performance.
I found some suggestions in the E2E community. Most of them said something like this:
1. The CPU has to access external memory (DDR), and DDR is very slow; that is the bottleneck.
2. Using on-chip memory, such as L1D cache/RAM, L1P cache/RAM, or L2 cache/RAM, can improve code performance.
3. Someone said you could place the code that takes the most time in L1D or L1P.
Following the third point, I placed Function1 (the function that takes the most time in my project) into L1P SRAM. (On the DM648, L1P can be configured as 0 KB, 16 KB, or 32 KB;
in my project I configured L1P as 16 KB cache and 16 KB SRAM.) But it did not improve performance at all: placing Function1 in DDR or in L1P SRAM
made no difference, and it was still very slow. All of my DDR is cache-enabled, but that was not enough to improve performance, so I want to use the on-chip memory (L1D, L1P, L2).
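A placement like the one described above can be sketched as follows. This is only an illustration, not my exact project code: it assumes the TI compiler's CODE_SECTION pragma, the section name .l1p_code is hypothetical, and the function body is a placeholder for the real hot function (named Function1 here). The linker command file also needs a matching entry that maps that section to the L1P SRAM range.

```c
#include <stddef.h>

/* Only the TI compiler understands CODE_SECTION, so guard it to keep the
   file buildable with other compilers. ".l1p_code" is a hypothetical
   section name; it must match an output section that the linker command
   file maps to the L1P SRAM memory range. */
#ifdef __TI_COMPILER_VERSION__
#pragma CODE_SECTION(Function1, ".l1p_code")
#endif

/* Placeholder body: sums a buffer of 16-bit samples. The real Function1
   is not shown in this post. */
int Function1(const short *samples, size_t count)
{
    int sum = 0;
    size_t i;

    for (i = 0; i < count; i++) {
        sum += samples[i];
    }
    return sum;
}
```

Note that the pragma only moves the instructions of Function1; the buffer passed in as samples stays wherever it was allocated, so data accesses still go to that memory.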
Can anyone tell me what the problem is in my case? Why is there no difference between placing the code in DDR and in on-chip SRAM? How should I use on-chip memory to improve code performance?
Thanks!