Hello everyone,
I already posted about this at http://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/99/t/5771.aspx#98164, but that thread is somewhat outdated, so I am starting a new thread on the same problem.
I am using dsplink_1_65_01_05_eng on a DM3730 and am trying to run an algorithm on the DSP using data in buffers allocated with CMEM on the ARM. The problem is that the algorithm is extremely slow when it runs on the DSP; even a plain buffer copy is very slow.
What I did:
- On the ARM side, I allocated my buffers with CMEM using either type POOL or HEAP, with the CACHED flag, and with an alignment of 4096 or 128.
- On the DSP side, I configured the *.tci file with the CMEM region of my physical DDR. I also set the MAR registers for the whole DDR range (to enable the DSP cache).
- On the ARM side, I configured the *.c configuration file by adding a CMEM entry to the LINKCFG_memTable_00 array.
- When my ARM code runs, it passes the pointers to the input and output CMEM buffers to the DSP code. The DSP code then processes the input buffers and writes the results into the output buffers. When it is done, the ARM code reads the output buffers.
- Both sides run correctly, except that the DSP code is very slow.
What could be the reason the DSP code runs so inefficiently?
Thank you for any suggestions!