I've written an algorithm for the C674x DSP core on my DM8168 chip and I need to profile it to work out which parts should be parallelised etc.
How do I do this? Ideally I'd get information on how many microseconds particular sections of C code take to execute.
Thanks,
Ralph