    fft time in evm6678l

    This question is not answered
    Posted by jie wang75279
    on Jul 28 2011 08:33 AM
    Prodigy240 points

    Hello, I have just bought a TMDXEVM6678L and have started using it.

    I am debugging a project under the path "Texas Instruments\dsplib_c66x_3_0_7"; the project name is DSPF_sp_fftSPxSP_66_LE_ELF.

    In the target configuration (.ccxml) I selected the Texas Instruments XDS100v1 USB Emulator and the TMS320C6678 device.

    I want to run a 1024-point single-precision float FFT. Before the FFT I call t1=clock(), and after the FFT I call t2=clock().

    My result is:    [C66xx_0] DSPF_sp_fftSPxSP Iter#: 1 Result Successful N = 1024 radix = 4 natC: 570776, optC: 379303

    Since the C6678 runs at 1.25 GHz, I calculate an FFT time of about 303 us, which is too long. According to data presented by AVNET, a 2048-point, radix-4, single-precision floating-point FFT on a C66x at 1.25 GHz takes 14 us.

    Why does this happen?

    Thanks in advance!
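
For readers checking the arithmetic in this post, here is a minimal sketch of the cycle-to-time conversion. The function names `cycles_to_us` and `us_to_cycles` are hypothetical helpers introduced for illustration; they are not part of DSPLIB. It assumes the reported count is in CPU cycles and that the core clock is known.

```c
#include <assert.h>

/* Convert a CPU cycle count to microseconds.
 * core_mhz is the core clock in MHz (1250.0 for 1.25 GHz). */
double cycles_to_us(unsigned long cycles, double core_mhz)
{
    assert(core_mhz > 0.0);
    return (double)cycles / core_mhz;
}

/* Convert a quoted execution time in microseconds back to a cycle budget
 * at the given core clock, so that two figures can be compared in cycles. */
double us_to_cycles(double us, double core_mhz)
{
    assert(core_mhz > 0.0);
    return us * core_mhz;
}
```

With these helpers, `cycles_to_us(379303UL, 1250.0)` gives roughly 303.4 us, while `us_to_cycles(14.0, 1250.0)` gives 17500 cycles, which is the budget implied by the quoted 14 us figure at 1.25 GHz.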

    All Replies
    • Xiaohui Li
      Posted by Xiaohui Li
      on Jul 29 2011 16:45 PM
      Prodigy300 points

      Hi,

      I tried dsplib_c66x_3_0_8 from the latest mcsdk_2_00_00_11 on C6678 EVM.  I set MAXN to 1024.  For 1024-point SP FFT, I got 12873 cycles.

      In your setup, could you change MAXN to 1024 and see what happens?

      -Xiaohui

       

    • jie wang75279
      Posted by jie wang75279
      on Aug 04 2011 01:30 AM
      Prodigy240 points

      Hi,

      I changed my configuration from debug mode to release mode. For a 1024-point SP FFT I now get 15304 cycles, and for a 2048-point SP FFT I get 33751 cycles. My EVM is configured to run at 1 GHz, so I calculate that the 2048-point SP FFT takes about 33 us. That is much more than the figure from AVNET (15 us). Why is that?

      Also, what is the difference between debug mode and release mode?

    • DanRinkes
      Posted by DanRinkes
      on Aug 04 2011 09:04 AM
      Expert8035 points

      Jie,

      The difference between debug and release mode is typically two things: 1. the majority of debug information is removed from the release version, and 2. a higher optimization level is typically used in release mode.

      Where is the data that you are operating on?  Is it in internal memory?  Or external?  If external, is data cache turned on and is the cache size large enough?

      Regards,

      Dan

       

      Be sure to check out the Embedded Processors Wiki at http://processors.wiki.ti.com.  You'll find additional information on a variety of TI Technology Related Topics ranging from Hardware to Software to Tools and more. 

      Please Create New Threads for New Issues, and even for similar issues that have already been reported.  Do not reply to a thread that has already been answered and say "I'm having the same problem".  You will get a faster response if you create a new issue that is not already marked answered.  

      -----------------------------------

      Don't forget to verify answers to your forum questions by using the green "Verify Answer" button.

      Did you read the CCS Forum Guidelines & FAQ? If not, PLEASE read it. If you haven't read it in awhile, please read it again to see if any updates were made.

      Having CCSv4.x problems? Check out the CCSv4 Troubleshooting Guide

      Reporting a problem?  Please try to include the relevant details requested in the Forum Usage Guidelines

    • jie wang75279
      Posted by jie wang75279
      on Aug 05 2011 02:25 AM
      Prodigy240 points

      Hi,

      I use multicore shared memory. Does it need the data cache turned on? How large should the cache be?

    • DanRinkes
      Posted by DanRinkes
      on Aug 05 2011 07:18 AM
      Expert8035 points

      Jie,

      Yes, the shared memory is external and does need cache turned on.

      The best answer that I can give you to the size of the data cache is "as large as you can afford".  Keep in mind, though, that with a 2-way set associative cache, you won't get any benefit of a cache larger than 1/2 the size of your data set. 

      Regards,

      Dan
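
To put the cache-size question in concrete terms, the following sketch estimates the data footprint of one FFT call and compares it with the per-core memory sizes of the C6678 (32 KB L1D, 512 KB L2). It assumes three interleaved complex float arrays (input, output, twiddle factors) of 2*N floats each; the twiddle array size and the helper names are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define L1D_BYTES (32u * 1024u)   /* L1D size per C66x core on the C6678 */
#define L2_BYTES  (512u * 1024u)  /* L2 size per C66x core on the C6678 */

/* Bytes touched by an n-point complex single-precision FFT, assuming
 * input, output and twiddle arrays of 2*n floats each. */
size_t fft_working_set_bytes(size_t n)
{
    assert(n > 0);
    const size_t complex_array = 2u * n * sizeof(float);
    return 3u * complex_array;
}

/* Return 1 if the working set fits in the given capacity, else 0. */
int fits_in(size_t bytes, size_t capacity)
{
    return bytes <= capacity;
}
```

Under these assumptions a 1024-point FFT touches 24 KB and a 2048-point FFT touches 48 KB, so the larger transform no longer fits in a 32 KB L1D even when everything else is ideal.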

       

    • jie wang75279
      Posted by jie wang75279
      on Aug 15 2011 00:33 AM
      Prodigy240 points

      DanRinkes,

      I moved my data from shared memory to L2, but a 2048-point float FFT still takes 33 us.

      Why is that? Since this is not external memory, do I still need to turn the cache on?

      Regarding Xiaohui Li's earlier reply (12873 cycles for a 1024-point SP FFT with MAXN set to 1024, using dsplib_c66x_3_0_8 from mcsdk_2_00_00_11): is 12873 cycles (12 us) the final result with the cache in use? Is that too slow? Can it be faster?

      Also, can I use the cache without BIOS?

      Regards,

      Jie

    • Xiaohui Li
      Posted by Xiaohui Li
      on Aug 15 2011 09:05 AM
      Prodigy300 points

      Jie,

      12873 cycles for a 1024-point floating-point FFT is the performance with both code and data (in, out, and twiddle factors) placed in L2 SRAM.  Were you able to duplicate the performance?  This is the performance we can get from the current version of C66x DSPLIB.  There will be future updates and we can expect some performance improvement.

      What kind of performance are you looking for for both 1024 and 2048 FFT?

      Regards,

      Xiaohui
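
As a rough illustration of placing the FFT buffers in L2 SRAM, the sketch below uses the TI compiler's `DATA_SECTION` and `DATA_ALIGN` pragmas. The section name `.l2_fft_data` is hypothetical and would have to be mapped to L2 SRAM in the linker command file; the 2*N twiddle array size and the 8-byte alignment are assumptions that should be checked against the DSPLIB documentation for the kernel. The pragmas are guarded so the code also compiles with other compilers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FFT_N 1024u /* FFT length discussed in this thread */

/* TI compiler only: put each buffer in a named section, 8-byte aligned.
 * ".l2_fft_data" is a hypothetical section name; the linker command file
 * has to map it to the core's L2 SRAM. */
#if defined(__TI_COMPILER_VERSION__)
#pragma DATA_SECTION(fft_in, ".l2_fft_data")
#pragma DATA_ALIGN(fft_in, 8)
#pragma DATA_SECTION(fft_out, ".l2_fft_data")
#pragma DATA_ALIGN(fft_out, 8)
#pragma DATA_SECTION(fft_twiddle, ".l2_fft_data")
#pragma DATA_ALIGN(fft_twiddle, 8)
#endif

/* Interleaved real/imaginary floats; the 2*N twiddle size is an assumption. */
float fft_in[2u * FFT_N];
float fft_out[2u * FFT_N];
float fft_twiddle[2u * FFT_N];

/* Return 1 if the address is a multiple of 8 bytes, else 0. */
int is_dword_aligned(const void *p)
{
    return ((uintptr_t)p % 8u) == 0u;
}

/* Abort early if any buffer misses the assumed alignment requirement. */
void check_fft_buffers(void)
{
    assert(is_dword_aligned(fft_in));
    assert(is_dword_aligned(fft_out));
    assert(is_dword_aligned(fft_twiddle));
}
```

Calling `check_fft_buffers()` once at start-up is a cheap way to catch a misplaced or misaligned buffer before timing anything. Checking the generated map file is still needed to confirm that the section really landed in L2 SRAM.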

       

       

    • Tim Wentz
      Posted by Tim Wentz
      on Aug 15 2011 09:54 AM
      Intellectual1280 points

      As an innocent bystander: I looked at the example, and I only see a macro for N and not MAXN -- so is that the right file? dsplib/examples/FFT_Example_66_LE_COFF? (and I'm building ELF, but as long as I link the library, I think it's fine)

       

      And that example file has 3 calls to an FFT routine. Are the times that you all are quoting for 1 of those or all 3?  For all 3, I'm getting 30000 cycles for all debug / optimization options (implying the library is optimized only), and for the 16x16 I get 5337 cycles, 16x32 12242, 32x32 13022 cycles.  Are those what you are talking about?

    • Xiaohui Li
      Posted by Xiaohui Li
      on Aug 15 2011 10:00 AM
      Prodigy300 points

      We were talking about single precision floating point FFT.

    • jie wang75279
      Posted by jie wang75279
      on Aug 15 2011 21:12 PM
      Prodigy240 points

      Xiaohui Li ,

      I have duplicated this performance; my result is 14843 cycles.

      The AVNET figure is 14 us for a single-precision floating-point FFT (2048-point, radix 4, 1.25 GHz), but my test result on the EVM differs from the AVNET data, so I am afraid I made a mistake.

      Regards,

      Jie wang.

    • jie wang75279
      Posted by jie wang75279
      on Aug 15 2011 21:33 PM
      Prodigy240 points

      DanRinkes,

      I moved my data from shared memory to L2, but a 2048-point float FFT still takes 33 us.

      Why is that? Since this is not external memory, do I still need to turn the cache on?

      Also, can I use the cache without BIOS?

      Regards,

      Jie

    • James Steed
      Posted by James Steed
      on Sep 19 2011 13:33 PM
      Intellectual890 points

      What is the AVNET publication with cycle counts for FFT you referenced?  Would someone please provide me a link to it?

    • jie wang75279
      Posted by jie wang75279
      on Sep 20 2011 22:23 PM
      Prodigy240 points

      Hello,

      I got this data at an AVNET conference.

    • Alberto Chessa
      Posted by Alberto Chessa
      on Sep 21 2011 01:09 AM
      Genius3740 points

       

      Hello,

      I can obtain more or less the declared performance only without a linker command file, that is, with code and data mapped from location 0. Since address 0 is declared reserved, I suppose it maps to L1 RAM (maybe for compatibility with other CPUs).

      With the following scenario:

      - code in MSMC (not L2 cacheable)

      - FFT input, output and twiddle factors in DDR3, cacheable

      - L2 RAM configured as all cache

      I obtain the following results:

      - 1024 complex: min=12.934us, max=26.482us

      - 2048 complex: min=29.586us, max=58.847us

      Here max is from the first execution, just before a code cache invalidate and a data cache flush, while min is from the second execution.
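
The min/max measurement described above can be sketched as a small harness that runs a kernel several times and keeps the first, smallest and largest elapsed values. This uses the standard `clock()` call, as earlier posts in this thread do; `time_kernel`, `timing_stats` and `dummy_kernel` are hypothetical names, and the dummy workload only stands in for the FFT call. The overhead of calling `clock()` itself is not subtracted here.

```c
#include <assert.h>
#include <time.h>

typedef void (*kernel_fn)(void);

typedef struct {
    clock_t first; /* first run, typically with cold caches */
    clock_t min;   /* fastest run seen */
    clock_t max;   /* slowest run seen */
} timing_stats;

/* Run the kernel `runs` times and collect elapsed clock() ticks per run. */
timing_stats time_kernel(kernel_fn kernel, int runs)
{
    timing_stats s = {0, 0, 0};
    int i;

    assert(kernel != 0 && runs > 0);
    for (i = 0; i < runs; i++) {
        clock_t t1 = clock();
        kernel();
        clock_t elapsed = clock() - t1;

        if (i == 0) {
            s.first = s.min = s.max = elapsed;
        } else {
            if (elapsed < s.min) s.min = elapsed;
            if (elapsed > s.max) s.max = elapsed;
        }
    }
    return s;
}

/* Stand-in workload; replace with the FFT call under test. */
volatile float sink;
void dummy_kernel(void)
{
    float acc = 0.0f;
    int i;
    for (i = 0; i < 10000; i++) acc += (float)i * 0.5f;
    sink = acc;
}
```

Reporting both the first run and the minimum makes it clear whether a quoted figure is a cold-cache or a warm-cache number, which matters when comparing results from different posts.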

       

    • Hui Zhang84600
      Posted by Hui Zhang84600
      on Sep 21 2011 21:50 PM
      Prodigy10 points

      Hi, DanRinkes,

      I tried the code in "..\dsplib_c66x_3_0_0_8\packages\ti\dsplib\src\DSPF_sp_fftSPxSP\c66\DSPF_sp_fftSPxSP_66_LE_ELF" on the EVM6678L, and I got the same results as Liang Wen:

      Liang Wen, from "TMS320C6670 Breakthrough performance for process-intensive applications":

      C66x @1.2 GHz Single precision floating-point FFT, 2048 pt. radix 4 costs 14.60 us.

      However, the code in DSPLIB does not achieve this performance; it runs at perhaps only half that speed, and I do not know why.

      Here are the results and the scenario:

      • [C66xx_0] DSPF_sp_fftSPxSP Iter#: 9 Result Successful N = 2048 radix = 2 natC: 97296 optC: 33197 cycles 
      • [C66xx_0] DSPF_sp_fftSPxSP Iter#: 8 Result Successful N = 1024 radix = 4 natC: 40476 optC: 14762 cycles 

       

      • both code and data (in, out, and twiddle factors) placed in L2 SRAM
      • ccxml:  texas instruments xds100v1 usb emulator
      • I use clock() as well as on-chip Timer to measure performance, the results are almost the same
      and the project file: 7750.DSPF_sp_fftSPxSP.zip
      How can I replicate the results “C66x @1.2 GHz Single precision floating-point FFT, 2048 pt. radix 4 costs 14.60 us.” on EVM6678L?
      Thanks!
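
One detail visible in the logs above is that N = 1024 reports radix = 4 while N = 2048 reports radix = 2. That is consistent with 2048 = 2^11 not being a power of 4. The sketch below reproduces that pattern; `fft_final_radix` is a hypothetical helper, and the rule it encodes is inferred from the logs rather than taken from the DSPLIB source.

```c
#include <assert.h>

/* Return 4 when n is a power of 4, 2 when n is any other power of 2,
 * and 0 when n is not a power of 2 at all. */
int fft_final_radix(unsigned int n)
{
    unsigned int log2n = 0u;

    if (n == 0u || (n & (n - 1u)) != 0u) {
        return 0; /* not a power of two */
    }
    while ((n >> log2n) > 1u) {
        log2n++;
    }
    /* An even exponent means n = 4^k. */
    return (log2n % 2u == 0u) ? 4 : 2;
}
```

For the comparison in this post, note also that 14.60 us at 1.2 GHz corresponds to about 17520 cycles, against the 33197 cycles measured for N = 2048, so the gap is a little under a factor of two regardless of the clock the EVM runs at.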