First of all, since I'm using DVSDK 2.00.00.22, do I need to close down or unload anything else to free up resources on the DSP?
Second, I have modified the readwrite example on the DSP side so that it runs a simple MAC loop instead (on char-sized buffers). Looking at the compiler output, I get:
ii = 1 Schedule found with 8 iterations in parallel
Good: each iteration of the MAC loop takes 1 cycle. I have measured (by doing nothing besides the memory handling) that the buffer-passing overhead is negligible, so almost all of the time should be spent in this loop. Still, I only measure ~290 MMAC/s. With the DSP at 594 MHz and 1 MAC per cycle I would expect 594 MMAC/s, so I'm at roughly half the expected performance. What am I missing?
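To make the expectation concrete, here is a minimal sketch. It assumes the loop is a plain one-MAC-per-element loop over unsigned bytes, because the actual DSP-side code was not posted, so `mac_scalar` is hypothetical. At ii = 1 with one MAC per iteration, the ceiling is clock (MHz) × MACs per iteration / ii = 594 MMAC/s. The measured 290 MMAC/s is about 49% of that.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-in for the modified readwrite loop: one 8-bit x 8-bit
   multiply-accumulate per element (the original DSP code was not posted).
   The 32-bit accumulator can overflow for very long buffers
   (255 * 255 * n > INT32_MAX once n exceeds roughly 33000). */
int32_t mac_scalar(const uint8_t *restrict a, const uint8_t *restrict b, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)a[i] * (int32_t)b[i];
    return acc;
}

/* Throughput ceiling of a software-pipelined loop, in MMAC/s:
   clock in MHz times MACs per iteration, divided by the initiation
   interval (ii, cycles per iteration). Ignores prolog/epilog and stalls. */
double expected_mmacs(double clock_mhz, double macs_per_iter, double ii)
{
    return clock_mhz * macs_per_iter / ii;
}
```

For example, `expected_mmacs(594.0, 1.0, 1.0)` gives 594, while a loop that did eight byte MACs per cycle would give `expected_mmacs(594.0, 8.0, 1.0)`, i.e. 4752. Any gap between the ceiling and the measurement comes from cycles the formula does not count, such as memory stalls or loop overhead.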
Moving on, I've been searching for how to use the four 16x16 or eight 8x8 MACs per cycle that the C64x+ DSP can do, but haven't found anything. Does anyone know of a "peak MMAC/s" example?
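A minimal sketch of the packed-SIMD idea follows, written as portable C so it runs on a host. `dotpu4_emul` is a hypothetical emulation modeled on what I understand the C64x+ `_dotpu4` intrinsic to do: a dot product of four unsigned byte pairs packed in two 32-bit words. Please verify the exact semantics in the TI compiler guide before relying on it. The two independent accumulators illustrate how two .M units could each issue one packed operation per cycle, which is where "eight 8x8 MACs per cycle" comes from. On the DSP, the emulation would be replaced by the intrinsic and the `memcpy` calls by aligned word loads.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical portable emulation of a packed 4 x (8-bit unsigned) dot
   product, modeled on the C64x+ _dotpu4 intrinsic (semantics assumed;
   check the compiler guide). Each 32-bit word holds four bytes. */
static uint32_t dotpu4_emul(uint32_t x, uint32_t y)
{
    uint32_t sum = 0;
    for (int k = 0; k < 4; k++) {
        uint32_t xa = (x >> (8 * k)) & 0xFFu;
        uint32_t yb = (y >> (8 * k)) & 0xFFu;
        sum += xa * yb;
    }
    return sum;
}

/* MAC over byte buffers, eight pairs per iteration split across two
   independent accumulators (one per .M unit, conceptually).
   Assumption for this sketch: n is a multiple of 8. Lane pairing is the
   same for a and b, so the result does not depend on host endianness. */
uint32_t mac_packed(const uint8_t *a, const uint8_t *b, size_t n)
{
    uint32_t acc0 = 0, acc1 = 0;
    for (size_t i = 0; i < n; i += 8) {
        uint32_t a0, a1, b0, b1;
        memcpy(&a0, a + i, 4);      /* on the DSP: aligned 32-bit loads */
        memcpy(&a1, a + i + 4, 4);
        memcpy(&b0, b + i, 4);
        memcpy(&b1, b + i + 4, 4);
        acc0 += dotpu4_emul(a0, b0);
        acc1 += dotpu4_emul(a1, b1);
    }
    return acc0 + acc1;
}
```

If such a loop reached ii = 1 with two packed operations per iteration, the ceiling at 594 MHz would be 594 × 8 = 4752 MMAC/s for 8-bit data. With two 16x16 pairs per packed operation (as with `_dotp2`), it would be 594 × 4 = 2376 MMAC/s.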
Thanks,
Orjan