I'm hoping to get some hints, or maybe a pointer to the right document that I haven't found yet. I have a Huffman encoder that I've ported from the ARM to the DSP on a DM3730 SoC. The functionality is there, but it takes a long time to run. The implementation is a straight port, and the ARM runs at 600 MHz, which should mean the DSP runs at 520 MHz.
Optimization is set to -O2, but I'm not sure how to investigate the other possibilities, such as:
Am I wasting cycles on tracing outside my algorithm? I define GT_TRACE to 0 right before including gt.h, which I understand compiles out all of the GT_?trace calls in my module. But is there tracing in other Codec Engine components that are called in the process of running this algorithm remotely? If so, how do I know, and how do I turn it off? My gut tells me there is still overhead in the system that I just need to find and root out.
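For concreteness, this is the compile-time pattern I'm using to strip trace from my own module (a minimal sketch; gt.h comes in via the Codec Engine include path, and the exact macro behavior may differ by release):

    /* Compile-time kill switch: GT_TRACE must be defined before gt.h is
       included.  With GT_TRACE set to 0, the GT_?trace() macros in this
       translation unit expand to nothing, so my own trace calls cost
       zero cycles in this build. */
    #define GT_TRACE 0
    #include "gt.h"

My understanding is that this only affects code I compile myself; the prebuilt Codec Engine libraries keep their GT calls and gate them at run time via trace masks (driven on the ARM side by the CE_DEBUG environment variable, with tracing off when it is unset), so their cost should be small but is not necessarily zero.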
Another thing I've noticed is that when I do have CE_DEBUG enabled, the timestamps aren't correct. My algorithm takes about 50 ms measured round trip from the ARM, but the CE_DEBUG messages report 100 ms. A clean factor-of-two error like that suggests the timestamper is assuming a clock or tick rate that is half the actual value somewhere. I tried modifying this in the server.cfg of my codec server, setting the clock frequency to 520 (MHz). Is this the right thing to do, and the right place to do it?
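In case it matters, here is my understanding of where the DSP clock rate gets declared. This is a sketch assuming a DSP/BIOS 5.x based server, where the rate is the GBL.CLKOUT property in the server's TCONF script (server.tcf) rather than in server.cfg; the arithmetic in the comment is just my inference from the 2x error:

    /* server.tcf (DSP/BIOS TCONF script).  GBL.CLKOUT is the DSP clock in
       MHz, and BIOS uses it to convert timer ticks into wall-clock time,
       so an assumed rate of half the real one (e.g. 260 instead of 520)
       would make every reported time exactly 2x too large -- which is
       what I'm seeing. */
    bios.GBL.CLKOUT = 520.0;   /* C64x+ on the DM3730 runs at 520 MHz */

I'm not sure whether the CE_DEBUG timestamps are actually derived from this setting, which is really what I'm asking.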
Any tips or tricks are appreciated. I've looked in a few places so far, but haven't found any definitive reasons why this would be significantly slower.
I've looked in the GPP to DSP Porting Guide [1] and the Codec Engine FAQ, among a few others.
[1]