Fast RTS for DM642 - The computation times are comparable (simulator emulator)?

This question is answered
Alessandro Moro
Posted by Alessandro Moro
on Apr 02 2012 04:09 AM
Prodigy 90 points

Dear All,

I'm trying to speed up my program for the DM642 DSP. In particular, I'm using a DM642 Evaluation Module with CCS version 5.1.1.00031.

My program contains a port of the core of OpenCV 1.1 to calculate the optical flow and homography of two images. Pure C code can be pretty heavy to compute, and I was looking at how to optimize my code.

I read a lot of information and opted to use the C62x/C64x Fast Run-Time Support (RTS) Library to speed up the operations.

My questions are related to the example contained in the library, in particular to the computation times that I obtained once I enabled the clock in Code Composer Studio.

I configured the target as a simulator and ran the program in both debug and release mode (optimization level 3).

I compared the operations addsp_i, subsp_i, mpysp_i, divsp_i, recipsp_i with +, -, *, /, 1./x
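A comparison of this kind can be sketched as a small timing harness. The version below is only an illustration under stated assumptions, not the example shipped with the library: the names kernel_fn, add_native, recip_native, and ticks_per_element are hypothetical, and it uses the standard clock() rather than the Code Composer Studio profile clock. A Fast RTS kernel would have the same shape, with the loop body calling the library function (for example addsp_i or recipsp_i) instead of the native operator.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical kernel signature: one element-wise pass over n floats. */
typedef void (*kernel_fn)(const float *x, const float *y, float *out, size_t n);

/* Native '+' kernel. Keeping the loop inside the kernel lets the
 * compiler optimize (and possibly pipeline) it as a unit. */
void add_native(const float *x, const float *y, float *out, size_t n)
{
    size_t i;
    for (i = 0; i < n; ++i)
        out[i] = x[i] + y[i];
}

/* Native '1./x' kernel; y is unused but kept so both kernels share a type. */
void recip_native(const float *x, const float *y, float *out, size_t n)
{
    size_t i;
    (void)y;
    for (i = 0; i < n; ++i)
        out[i] = 1.0f / x[i];
}

/* Runs the kernel `repeats` times and returns the average number of
 * clock() ticks spent per element. */
double ticks_per_element(kernel_fn kernel, const float *x, const float *y,
                         float *out, size_t n, int repeats)
{
    clock_t start, stop;
    int r;

    assert(kernel != NULL && n > 0 && repeats > 0);

    start = clock();
    for (r = 0; r < repeats; ++r)
        kernel(x, y, out, n);
    stop = clock();

    return (double)(stop - start) / ((double)n * (double)repeats);
}
```

The function pointer is called once per pass rather than once per element, so its overhead stays small next to the loop being measured. The tick unit depends on the platform's clock() implementation, so the numbers are mainly useful for relative comparisons between kernels.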

The computation times I got are as follows:

Debug Mode: +, -, *, /, 1./x

Pipelined addition time: 101.562500
Pipelined substraction time: 106.19
Pipelined multiplication time: 98.19
Pipelined division time: 328.25
Pipelined reciprocal time: 1394.88

Debug Mode: addsp_i, subsp_i, mpysp_i, divsp_i, recipsp_i

Pipelined addition time: 285.687500
Pipelined substraction time: 298.56
Pipelined multiplication time: 224.94
Pipelined division time: 646.56
Pipelined reciprocal time: 609.56

Release Mode: +, -, *, /, 1./x

Pipelined addition time: 78.250000
Pipelined substraction time: 82.88
Pipelined multiplication time: 74.75
Pipelined division time: 306.44
Pipelined reciprocal time: 1373.50

Release Mode: addsp_i, subsp_i, mpysp_i, divsp_i, recipsp_i

Pipelined addition time: 37.187500
Pipelined substraction time: 38.19
Pipelined multiplication time: 7.81
Pipelined division time: 59.13
Pipelined reciprocal time: 18.44
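
To make the release-mode comparison easier to read, the ratio of the two timings can be computed per operation. The sketch below simply divides the native-operator time by the Fast RTS time, using the release-mode numbers listed above; the array and function names are hypothetical, and the unit of the timings is not stated, which does not matter for a ratio. With these numbers the ratios come out at roughly 2.1x for addition, 2.2x for subtraction, 9.6x for multiplication, 5.2x for division, and about 74x for the reciprocal.

```c
#include <assert.h>
#include <stdio.h>

#define NUM_OPS 5

const char *const OP_NAMES[NUM_OPS] = {
    "addition", "subtraction", "multiplication", "division", "reciprocal"
};

/* Release-mode timings copied from the lists above (units as reported). */
const double NATIVE_RELEASE[NUM_OPS]  = {78.25, 82.88, 74.75, 306.44, 1373.50};
const double FASTRTS_RELEASE[NUM_OPS] = {37.1875, 38.19, 7.81, 59.13, 18.44};

/* How many times faster the candidate is than the baseline. */
double speedup(double baseline, double candidate)
{
    assert(candidate > 0.0);
    return baseline / candidate;
}

/* Prints one line per operation with its release-mode speedup. */
void print_release_speedups(void)
{
    int i;
    for (i = 0; i < NUM_OPS; ++i) {
        printf("%-14s %6.1fx\n", OP_NAMES[i],
               speedup(NATIVE_RELEASE[i], FASTRTS_RELEASE[i]));
    }
}
```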

The release-mode results suggest using the Fast RTS library. However, I could not properly evaluate the performance with the emulator. I'm a novice, and I would like to ask for confirmation that the Fast RTS library on the DM642 will be as fast as the simulator shows.
Can you kindly confirm that Fast RTS will reduce the computation time, for similar operations, on the DM642?

Can you give me advice about which library I should use to speed up fixed-point operations, or which documentation I should read? The amount of information on this topic is huge, and sometimes it is scattered (just in my opinion, as a novice).

Thank you in advance for any help.

Regards,

Alessandro

Simulator DM642 CCSv5.1
All Replies
  • RandyP
    Posted by RandyP
    on Apr 15 2012 19:52 PM
    Verified Answer
    Verified by Alessandro Moro
    Guru 60000 points

    Alessandro,

    You are obviously talented, knowledgeable, insightful, organized, and precise (1./x instead of 1/x). You are definitely more than a novice, and we are glad you are working with TI processors.

    Just for your information, there is a DM64x Forum which might be more appropriate for your questions in the future; in this case, for purely DSP core-related questions, you are asking about things that are exact overlaps between this C64x forum and that DM64x one. If your questions were more directly related to the video ports or other peripherals on the DM642, the DM64x forum would be the better choice. There is also a TI C/C++ Compiler forum for optimization questions and a Code Composer Forum for simulator questions. A lot of choices and opportunities, and not as confusing as I make it sound.

    Your questions are really asking whether the simulator is accurate and what optimization techniques we would recommend.

    There are various simulator names, and the people on the Code Composer Forum can recite the names and features. I always use the ones that say Device in the name and have the part number, but I do not see a CCSv5 device simulator for the DM642. Which simulator are you using? If it does not model the memory that you are using, then the cycle counts will probably not match with the EVM. A device simulator will generally be within 5% at worst, and usually within 1-2% for most algorithms.

    But for relative comparisons, your analysis above gives you all the right answers. Perhaps the pipelined multiplication will take a little more than 7.81 of whatever your units are, but the Release Configuration with the Fast RTS library will give you the fastest performance on the EVM, just as it did on the simulator.

    Since you are running these tests on a simulator and an EVM, you might be at an early stage of this program. If so, I would strongly recommend moving to a newer processor. If you require some video ports, then there are DaVinci parts that would work, one of the best matches being the DM8148 or one of its derivative parts. But that would depend on more of your system requirements. Just moving to the DM647 would get you some more performance with just about the same peripheral architecture.

    The DM647 gives you the C64x+ core. It is still a fixed-point processor, so it would need the Fast RTS library for better floating point performance.
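
For readers wondering what "fixed-point" means in practice: where the required precision allows it, values can be kept as scaled integers so that the arithmetic uses only integer multiplies and shifts. The sketch below shows one common convention, Q15 (a signed 16-bit integer representing a value in [-1, 1) with 15 fractional bits). It is a generic illustration, not code from the Fast RTS library; the names q15_t, q15_from_float, q15_to_float, and q15_mul are hypothetical, and it assumes <stdint.h> is available and that right-shifting a negative integer is an arithmetic shift, which is the usual behavior but is implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Q15: signed 16-bit value with 15 fractional bits, range [-1, 1). */
typedef int16_t q15_t;

/* Converts a float in [-1, 1) to Q15, truncating toward zero. */
q15_t q15_from_float(float v)
{
    assert(v >= -1.0f && v < 1.0f);
    return (q15_t)(v * 32768.0f);
}

/* Converts a Q15 value back to float. */
float q15_to_float(q15_t v)
{
    return (float)v / 32768.0f;
}

/* Multiplies two Q15 values with rounding and saturation. */
q15_t q15_mul(q15_t a, q15_t b)
{
    int32_t product = (int32_t)a * (int32_t)b; /* Q30 intermediate */

    product += (int32_t)1 << 14;               /* add half an LSB to round */
    product >>= 15;                            /* back to Q15 (assumes arithmetic shift) */

    if (product > 32767)                       /* only -1 * -1 can overflow */
        product = 32767;

    return (q15_t)product;
}
```

For example, 0.5 is stored as 16384 and 0.25 as 8192; their Q15 product is 4096, which converts back to 0.125.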

    The DM8148, C6748, and some other processors have the C674x core. It has all the enhanced performance of the C64x+ fixed-point core plus native floating-point instructions; it is truly the best of both worlds, since we were able to keep the high clock speeds of the fixed-point DSP and add very fast floating point, too.

    Way too much information for your questions, but those are my opinions on what might be helpful to you.

    Regards,
    RandyP

     

  • Alessandro Moro
    Posted by Alessandro Moro
    on Apr 22 2012 05:46 AM
    Prodigy 90 points

    Dear RandyP,

    Thank you so much for the kind words and for the helpful, valuable information.

    Thanks to your post, I have a lot of things I can study, search for, and take into consideration. That helps me a lot.

    Thank you again!

    Best regards,

    Alessandro
