Hi TI 8168 experts,
I am migrating an algorithm to the DSP core on RDK 2.0 for the 8168 EVM board. The code migration was successful, but the speed is much lower than I expected: previously, using c6run to run on the DSP core, tests on the board showed about 3 fps, but with the MCFW framework I get only 0.5 fps, so I think I have missed something.
I have a couple of questions that I hope you can help with:
1. How can I get a local timestamp on the DSP? Currently I get the timestamp from another core, which is not precise enough to measure how long the algorithm takes.
2. How can I check or change the DSP's cache settings for a memory region? The algorithm needs about 300 ms per frame, but from the documentation it seems that caching is not enabled for SR2 and the frame buffer region. What I want is to invalidate the cache for the frame memory and then let the subsequent processing use the cache. Also, because the DSP system heap is too small, I moved about 60 MB of memory from another section to the DSP system heap and modified the configuration file accordingly. It runs OK, but I am not sure whether DSP/BIOS 6 enables caching on the system heap automatically; if it does not, that could explain the poor performance. I searched the configuration file, but there does not seem to be an obvious line specifying it.
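To make question 2 concrete, here is a minimal sketch of the lookup I have in mind. It assumes the C674x megamodule layout as I understand it (MAR0 at 0x01848000, one MAR per 16 MB of address space, bit 0 "PC" meaning the region may be cached); please correct me if that is wrong for this device. The helper names are my own, and the register read is only compiled when building with the TI C6000 compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed C674x megamodule layout; verify against the device documentation. */
#define MAR_BASE   0x01848000u  /* address of MAR0 */
#define MAR_SHIFT  24           /* each MAR covers 2^24 bytes = 16 MB */
#define MAR_PC_BIT 0x1u         /* "permit copy" (cacheable) bit */

/* Index of the MAR that covers a given byte address. */
static unsigned mar_index(uint32_t addr)
{
    return (unsigned)(addr >> MAR_SHIFT);
}

/* Address of the memory-mapped register MAR<idx>. */
static uint32_t mar_reg_addr(unsigned idx)
{
    return MAR_BASE + 4u * idx;
}

/* First and last MAR index covering the range [start, start + size). */
static void mar_range(uint32_t start, uint32_t size,
                      unsigned *first, unsigned *last)
{
    assert(size > 0u);
    assert(start + (size - 1u) >= start);  /* range must not wrap */
    *first = mar_index(start);
    *last  = mar_index(start + (size - 1u));
}

#ifdef _TMS320C6X
/* On the DSP only: read the PC bit of MAR<idx>. */
static int mar_is_cacheable(unsigned idx)
{
    volatile uint32_t *reg = (volatile uint32_t *)mar_reg_addr(idx);
    return (int)(*reg & MAR_PC_BIT);
}
#endif
```

For example, if the enlarged heap were placed at 0x90000000 (a made-up address, not taken from my memory map) with a size of 60 MB, `mar_range` reports MAR144 through MAR147, and an address of 0x80000000 maps to MAR128 at register address 0x01848200. Those are the registers whose PC bit I would want to inspect for the heap and frame buffer regions.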
I would appreciate any advice, or an assessment of what could be causing the slowdown; the speed is critical for the competitiveness of the platform.
My platform:
8168evm+ddr3
DVR_RDK2.0
Algorithm speed on the DSP: 300 ms/frame with c6run, 2000 ms/frame with the MCFW AlgLink. For each frame, all Y data in memory is accessed.
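The per-frame times above were not measured on the DSP itself. A minimal sketch of how I would like to measure them locally is below, assuming the C6000 time stamp counter (TSCL from `c6x.h`) is usable inside the MCFW DSP build. The helper names and the clock frequency parameter are my own placeholders, and the `clock()` fallback exists only so the sketch also builds on a host PC.

```c
#include <assert.h>
#include <stdint.h>

#ifdef _TMS320C6X
#include <c6x.h>   /* declares the TSCL/TSCH control registers */
/* Any write to TSCL starts the free-running cycle counter. */
static void cycles_start(void) { TSCL = 0; }
static uint32_t cycles_now(void) { return TSCL; }
#else
#include <time.h>
/* Host fallback: clock() ticks stand in for DSP cycles. */
static void cycles_start(void) { }
static uint32_t cycles_now(void) { return (uint32_t)clock(); }
#endif

/* Convert a counter interval to milliseconds.
 * hz is the counter frequency: the DSP clock on target (placeholder,
 * to be taken from the actual board setup), CLOCKS_PER_SEC on a host.
 * Unsigned subtraction stays correct across one 32-bit wrap. */
static double cycles_to_ms(uint32_t start, uint32_t end, double hz)
{
    uint32_t delta = end - start;
    assert(hz > 0.0);
    return 1000.0 * (double)delta / hz;
}

/* Frames per second for a given time per frame in milliseconds. */
static double ms_to_fps(double ms_per_frame)
{
    assert(ms_per_frame > 0.0);
    return 1000.0 / ms_per_frame;
}
```

The idea is to call `cycles_start()` once, then read `cycles_now()` before and after the algorithm's per-frame call and pass both values to `cycles_to_ms()`. With the figures above, `ms_to_fps(2000.0)` gives 0.5 fps and `ms_to_fps(300.0)` gives about 3.3 fps. Since the 32-bit counter wraps after 2^32 cycles, intervals longer than that would need TSCH as well.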