Dear All,
I'm trying to speed up my program on the DM642 DSP. Specifically, I'm using a DM642 Evaluation Module with CCS version 5.1.1.00031.
My program contains a port of the OpenCV 1.1 core that computes the optical flow and homography between two images. Plain C code can be quite heavy to run, so I was looking for ways to optimize it.
After reading a lot of documentation, I opted to use the C62x/C64x Fast Run-Time Support (RTS) Library to speed up the floating-point operations.
My questions concern the example included with the library, in particular the computation times I obtained after enabling the clock in Code Composer Studio.
I configured the target as a simulator and ran the program in both Debug and Release mode (optimization level 3).
I compared the functions addsp_i, subsp_i, mpysp_i, divsp_i and recipsp_i with the native operators +, -, *, / and 1./x.
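The comparison described above can be sketched as a small C timing harness. This is only a minimal sketch under a few assumptions: that "pipelined" in the results refers to loops over arrays that the compiler can software-pipeline, and that the standard clock() function returns the CCS profile clock when it is enabled in the simulator. The names add_native, recip_native and ticks_per_element are hypothetical. The Fast RTS variants (built around addsp_i, recipsp_i, etc.) are not shown because the exact headers are not given here. One detail is worth checking in any such benchmark. In C, 1./x uses a double literal, so a float x is promoted and the division is done in double precision. By contrast, 1.0f / x stays in single precision.

```c
#include <assert.h>
#include <time.h>

/* Native-operator kernels written as plain array loops, the kind of loop
   the compiler can software-pipeline at optimization level 3. */
static void add_native(const float *a, const float *b, float *c, int n)
{
    int i;
    for (i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

static void recip_native(const float *x, float *r, int n)
{
    int i;
    for (i = 0; i < n; i++)
        r[i] = 1.0f / x[i];  /* float literal: division stays single precision */
}

/* Hypothetical harness: a Fast RTS kernel (e.g. a loop calling addsp_i)
   would be written with the same signature and passed in the same way. */
typedef void (*binary_kernel)(const float *, const float *, float *, int);

/* Average clock() ticks per element for one call of kernel k. */
static double ticks_per_element(binary_kernel k,
                                const float *a, const float *b, float *c, int n)
{
    clock_t start, stop;
    assert(n > 0);
    start = clock();
    k(a, b, c, n);
    stop = clock();
    return (double)(stop - start) / (double)n;
}
```

In a main function, one would fill two input arrays, call ticks_per_element(add_native, a, b, c, N) once per kernel, and print the results. The function pointer is called once per array, so the loop inside each kernel can still be pipelined.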
The computation times I obtained are as follows:
Pipelined time   Debug, native   Debug, Fast RTS   Release, native   Release, Fast RTS
Addition         101.562500      285.687500        78.250000         37.187500
Subtraction      106.19          298.56            82.88             38.19
Multiplication   98.19           224.94            74.75             7.81
Division         328.25          646.56            306.44            59.13
Reciprocal       1394.88         609.56            1373.50           18.44
(native = +, -, *, /, 1./x; Fast RTS = addsp_i, subsp_i, mpysp_i, divsp_i, recipsp_i)
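To make the comparison easier to read, the ratio of native time to Fast RTS time can be computed directly from the results above. This is a minimal sketch. The array names and the helper speedup are hypothetical, and the units are whatever the library example reported.

```c
#include <assert.h>
#include <stdio.h>

/* Timings copied from the results above (units as reported by the example). */
static const char *ops[5] = {"addition", "subtraction", "multiplication",
                             "division", "reciprocal"};
static const double debug_native[5]    = {101.5625, 106.19, 98.19, 328.25, 1394.88};
static const double debug_fastrts[5]   = {285.6875, 298.56, 224.94, 646.56, 609.56};
static const double release_native[5]  = {78.25, 82.88, 74.75, 306.44, 1373.50};
static const double release_fastrts[5] = {37.1875, 38.19, 7.81, 59.13, 18.44};

/* Speed-up of Fast RTS over the native operator: > 1 means Fast RTS is faster. */
static double speedup(double native, double fastrts)
{
    assert(fastrts > 0.0);
    return native / fastrts;
}

/* Print one line per operation with the Debug and Release speed-ups. */
static void print_speedups(void)
{
    int i;
    for (i = 0; i < 5; i++)
        printf("%-15s debug %5.2fx  release %6.2fx\n", ops[i],
               speedup(debug_native[i], debug_fastrts[i]),
               speedup(release_native[i], release_fastrts[i]));
}
```

For the figures above, the Release ratios come out at roughly 2.1x for addition and subtraction, 9.6x for multiplication, 5.2x for division and 74x for the reciprocal. In Debug mode, every ratio except the reciprocal (about 2.3x) is below 1.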
The Release-mode results suggest that the Fast RTS library is worth using. However, I could not properly evaluate the performance with the emulator. As a novice, I would like to ask whether Fast RTS on the DM642 should really be as fast as the simulator shows.
Could you kindly confirm that Fast RTS will reduce the computation time for similar operations on the DM642?
Could you also advise me on which library I should use to speed up fixed-point operations, or which documentation I should read? There is a huge amount of information on this topic, and (in my opinion, as a novice) it is sometimes scattered.
Thank you in advance for any help.
Regards,
Alessandro