Fast RTS for DM642 - Are the computation times comparable (simulator vs. emulator)?
Dear All,
I'm trying to speed up my program for the DM642 DSP. In particular, I'm using a DM642 Evaluation Module with CCS version 5.1.1.00031. My program contains a port of the core of OpenCV 1.1 that calculates the optical flow and homography of two images. Plain C code can be pretty heavy to compute, so I was looking at how to optimize it. I read a lot of information and opted to use the C62x/C64x Fast Run-Time Support (RTS) Library to speed up the operations.
My questions relate to the example contained in the library, in particular the computation times I obtained once I enabled the clock in Code Composer Studio. I configured the target as a simulator and ran the program in both debug and release mode (optimization level 3). I compared the operations addsp_i, subsp_i, mpysp_i, divsp_i, recipsp_i with +, -, *, /, 1./x.
The computation times I got are as follows.

Debug mode: +, -, *, /, 1./x
Pipelined addition time: 101.562500
Pipelined subtraction time: 106.19
Pipelined multiplication time: 98.19
Pipelined division time: 328.25
Pipelined reciprocal time: 1394.88

Debug mode: addsp_i, subsp_i, mpysp_i, divsp_i, recipsp_i
Pipelined addition time: 285.687500
Pipelined subtraction time: 298.56
Pipelined multiplication time: 224.94
Pipelined division time: 646.56
Pipelined reciprocal time: 609.56

Release mode: +, -, *, /, 1./x
Pipelined addition time: 78.250000
Pipelined subtraction time: 82.88
Pipelined multiplication time: 74.75
Pipelined division time: 306.44
Pipelined reciprocal time: 1373.50

Release mode: addsp_i, subsp_i, mpysp_i, divsp_i, recipsp_i
Pipelined addition time: 37.187500
Pipelined subtraction time: 38.19
Pipelined multiplication time: 7.81
Pipelined division time: 59.13
Pipelined reciprocal time: 18.44

The results obtained in release mode suggest using the Fast RTS library. However, I could not properly evaluate the performance with the emulator.
I'm a novice, and I would like to ask for confirmation that Fast RTS on the DM642 should be as fast as the simulator shows. Can you kindly confirm that Fast RTS will reduce the computation time, for similar operations, on the DM642?
Can you give me advice about which library I should use to speed up fixed-point operations, or which documentation I should read? The amount of information about this topic is pretty huge, and sometimes the information is dispersed (just in my opinion, as a novice).
Thank you in advance for any help.
You are obviously talented, knowledgeable, insightful, organized, and precise (1./x instead of 1/x). You are definitely more than a novice, and we are glad you are working with TI processors.
Just for your information, there is a DM64x Forum which might be more appropriate for your questions in the future; in this case, for purely DSP core-related questions, you are asking about things that are exact overlaps between this C64x forum and that DM64x one. If your questions were more directly related to the video ports or other peripherals on the DM642, the DM64x forum would be the better choice. There is also a TI C/C++ Compiler forum for optimization questions and a Code Composer Forum for simulator questions. A lot of choices and opportunities, and not as confusing as I make it sound.
Your questions are really asking whether the simulator is accurate and what optimization techniques we would recommend.
There are various simulator names, and the people on the Code Composer Forum can recite the names and features. I always use the ones that say Device in the name and have the part number, but I do not see a CCSv5 device simulator for the DM642. Which simulator are you using? If it does not model the memory that you are using, then the cycle counts will probably not match with the EVM. A device simulator will generally be within 5% at worst, and usually within 1-2% for most algorithms.
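If you do get cycle counts from both the simulator and the EVM, a small helper like the one below is one way to check them against the 5% figure. This is only a sketch: the function names are made up for illustration, and any counts you pass in would have to come from your own runs, since the thread contains no EVM measurements.

```c
#include <assert.h>

/* Relative difference between a simulated and a measured cycle count,
   expressed as a fraction of the measured (hardware) value. */
double relative_error(double simulated, double measured)
{
    double diff;
    assert(measured > 0.0);
    diff = simulated - measured;
    if (diff < 0.0) {
        diff = -diff;
    }
    return diff / measured;
}

/* Returns 1 when the simulator figure is within `tolerance`
   (for example 0.05 for 5%) of the hardware figure, 0 otherwise. */
int within_tolerance(double simulated, double measured, double tolerance)
{
    return relative_error(simulated, measured) <= tolerance;
}
```

A result outside the tolerance would be a hint that the simulator is not modeling the memory you are actually using.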
But for relative comparisons, your analysis above gives you all the right answers. Perhaps the pipelined multiplication will take a little more than 7.81 of whatever your units are, but the Release Configuration with the Fast RTS library will give you the fastest performance on the EVM, just as it did on the simulator.
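To put numbers on that relative comparison, here is a minimal sketch in C that divides each Release-mode time for the plain C operators by the matching Fast RTS time from the question. The function and array names are invented for this illustration, and the units are whatever the CCS clock reported, since the thread does not state them.

```c
#include <assert.h>
#include <stdio.h>

#define NUM_OPS 5

static const char *const op_name[NUM_OPS] = {
    "addition", "subtraction", "multiplication", "division", "reciprocal"
};

/* Release-mode times copied from the question (simulator results). */
static const double release_c_ops[NUM_OPS]   = { 78.25, 82.88, 74.75, 306.44, 1373.50 };
static const double release_fastrts[NUM_OPS] = { 37.1875, 38.19, 7.81, 59.13, 18.44 };

/* A ratio above 1 means the second timing was the faster one. */
double speedup(double baseline_time, double fast_time)
{
    assert(fast_time > 0.0);
    return baseline_time / fast_time;
}

/* Print one line per operation with its Release-mode speedup. */
void print_speedups(void)
{
    int i;
    for (i = 0; i < NUM_OPS; i++) {
        printf("%-14s %6.2fx\n", op_name[i],
               speedup(release_c_ops[i], release_fastrts[i]));
    }
}
```

With the posted figures, addition and subtraction come out at roughly 2x and the reciprocal at roughly 74x, so the ranking would hold even if the EVM numbers shift by a few percent.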
Since you are running these tests on a simulator and an EVM, you might be at an early stage of this program. If so, I would strongly recommend moving to a newer processor. If you require some video ports, then there are DaVinci parts that would work, one of the best matches being the DM8148 or one of its derivative parts. But that would depend on more of your system requirements. Just moving to the DM647 would get you some more performance with just about the same peripheral architecture.
The DM647 gives you the C64x+ core. It is still a fixed-point processor, so it would need the Fast RTS library for better floating point performance.
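Since you also asked about fixed-point operations, here is a minimal sketch of what staying in fixed point can look like on a core like this: a Q15 multiply in plain C. It is an illustration only, not code from the Fast RTS library or any other TI library; the type and function names are made up, and it assumes that right-shifting a negative value is an arithmetic shift, which the C standard leaves implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* Q15: a 16-bit signed value representing x / 32768, range [-1, 1). */
typedef int16_t q15_t;

/* Convert a float to Q15 with rounding and saturation. */
q15_t q15_from_float(float x)
{
    float scaled = x * 32768.0f;
    if (scaled >= 32767.0f) {
        return 32767;
    }
    if (scaled <= -32768.0f) {
        return -32768;
    }
    return (q15_t)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
}

/* Convert Q15 back to float, for checking results. */
float q15_to_float(q15_t q)
{
    return (float)q / 32768.0f;
}

/* Multiply two Q15 values using only integer arithmetic. */
q15_t q15_mul(q15_t a, q15_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;  /* Q30 intermediate */
    product += (int32_t)1 << 14;                /* round to nearest */
    product >>= 15;                             /* back to Q15 (assumes arithmetic shift) */
    if (product > 32767) {                      /* only reachable for -1 * -1 */
        product = 32767;
    }
    return (q15_t)product;
}
```

For example, q15_mul(q15_from_float(0.5f), q15_from_float(0.5f)) gives 8192, which is 0.25 in Q15. Whether the loss of range and precision is acceptable depends on your algorithm.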
The DM8148, C6748, and some other processors, have the C674x core. It has all the enhanced performance of the C64x+ fixed point core plus native floating point instructions; it is quite truly the best of both worlds since we were able to get the high clock speeds of the fixed point DSP and add very fast floating point, too.
Way too much information for your questions, but those are my opinions on what might be helpful to you.
If you need more help, please reply back. If this answers the question, please click Verify Answer, below.
Thank you so much for the kind words, and for the helpful and precious information.
Thanks to your post, I have a lot of things I can study, search, and take into consideration. That helps me a lot.
Thank you again!