    The difference between library file *.lib and *.ae66?

    This question is answered
    tianxing hou
    Posted by tianxing hou
    on Jun 21 2012 02:26 AM
    Intellectual990 points

    Hello,

    I would like to know the difference between *.lib and *.ae66 library files. Could you explain it?

    Also, how do I use *.ae66 files in a CCS v5 project?

    I added tsu_a.ae66 and tsu_c.ae66 to the project, but when I build it I get the following errors:

    I have added the source and header files to the project as follows:

    Thank you.

    C66x *.ae66 MCSDK_Video
    All Replies
    • Chad Courtney
      Posted by Chad Courtney
      on Jun 21 2012 17:02 PM
      Mastermind22595 points

      They're both extensions for Library files.  .ae66 is used for ELF format and .lib is used for COFF format in general to identify them separately.

      Best Regards,

      Chad

    • Hongmei Gou
      Posted by Hongmei Gou
      on Jun 22 2012 12:55 PM
      Verified Answer
      Verified by tianxing hou
      Intellectual2940 points

      Hi tianxing,

      It looks like you are using the TSU component from MCSDK Video 2.0. If that is true, please define tsuContext (and its entries) in your application to address the last linking error.

      The first two linking errors can be due to project settings. If it is possible, please provide the complete compilation log or the CCS project so that we can take a look.

      Thanks,

      Hongmei

    • tianxing hou
      Posted by tianxing hou
      on Jun 26 2012 02:15 AM
      Intellectual990 points

      Thank you, I have resolved the question.

    • tianxing hou
      Posted by tianxing hou
      on Jun 27 2012 20:39 PM
      Intellectual990 points

      Hi Hongmei,

      I used the TSU component from MCSDK Video 2.0. However, I found that it takes too much time.

      Could you tell me the expected performance of the TSU component?

      Thank you.

    • Hongmei Gou
      Posted by Hongmei Gou
      on Jun 28 2012 10:14 AM
      Intellectual2940 points

      Hi tianxing,

      The number of cycles consumed by TSU depends on input/output resolutions, as well as memory/cache configuration for the application.

      Below are the cycle counts per frame, in millions of cycles, from our benchmarking:

      Conversion        Cycles per frame
      1080p to 720p     14M
      720p to 1080p     18M
      1080p to D1       8M
      D1 to 1080p       16M
      720p to D1        5M
      D1 to 720p        7M

      These numbers are obtained with:

      L1D cache: 32K

      L1P cache: 32K

      L2 cache: 64K

      DDR: cache enabled; pre-fetch enabled.

      TSU scratch: placed in local L2

      Program: placed in MSMC

      We are also optimizing the cycle performance of TSU. The optimized TSU will be packaged in the next MCSDK Video release.
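
For a rough sense of what these cycle counts mean as a frame rate, the sketch below divides a core clock by the cycles per frame. The thread does not state the clock frequency used for the benchmark, so any clock value passed in is an assumption, and the function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: estimated frames per second on one core,
 * given the core clock in Hz and the measured cycles per frame.
 * Returns 0.0 if cycles_per_frame is 0. */
double tsu_estimated_fps(uint64_t core_clock_hz, uint64_t cycles_per_frame)
{
    if (cycles_per_frame == 0) {
        return 0.0;
    }
    return (double)core_clock_hz / (double)cycles_per_frame;
}
```

With an assumed 1 GHz clock, 14M cycles per frame works out to roughly 71 frames per second and 18M cycles to roughly 55.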

      Thanks,

      Hongmei

    • tianxing hou
      Posted by tianxing hou
      on Jun 28 2012 20:35 PM
      Intellectual990 points

      Thanks, Hongmei.

      I found that the TSU has two interpolation algorithms: one based on the bicubic algorithm and the other on the polyphase algorithm. What is the difference in performance between them?

      I have also tried them without any memory/cache configuration, and the results are not satisfactory. Could you provide an example for us?

      In the TSU component directory, I can't find a datasheet with benchmarking or further information.

      I have some other questions about TSU in the forum thread below. Could you give me some advice?

      http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/t/198056.aspx

      Thank you for your help again.

      Tianxing

    • Hongmei Gou
      Posted by Hongmei Gou
      on Jun 29 2012 18:28 PM
      Intellectual2940 points

      Hi Tianxing,

      Attached below please find our optimized TSU (CPU copy replaced with EDMA transfers) along with the TSU unit test application. Please unzip it into <mcsdk_video_2_0_0_10_install_dir>\components\ti\mas\tsu and try it out.

      1830.tsu.zip

      The unit test application is a CCSv5 project located at tsu\test\ccsProject. Several notes:
      1) The current TSU supports resolutions up to 1920x1088.
      2) The unit test uses EDMA, as "USE_EDMA" is pre-defined for the project, so performance will be better than the memcpy-based numbers we posted earlier.
      3) The unit test configuration is defined in tsu\test\testVecs\config\testVecs.cfg. Please modify it according to your application:
         Line 1) Input YUV 4:2:0 clip name
         Line 2) Output YUV 4:2:0 clip name
         Line 3) Input Image Width
         Line 4) Input Image Height
         Line 5) Output Image Width
         Line 6) Output Image Height
         Line 7) TSU Algorithm (TSU_POLYPHASE or TSU_BICUBIC)
         Line 8) READ_INPUT_FROM_FILE or READ_INPUT_FROM_DDR2
      4) For file IO, use READ_INPUT_FROM_FILE in Line 8). Also place your YUV input at tsu\test\testVecs\input. The output will be generated at tsu\test\testVecs\output.
      5) To avoid slow data IO, please use READ_INPUT_FROM_DDR2 in Line 8) and pre-load the input to 0x85000000 (#define DDR_READ_ADDR  0x85000000 as in tsu\test\src\main.c). In this mode, the number of frames is set to 10 in the code (tsuTask()). Please change that if needed.
      6) The cycles taken by each frame are printed to the CCS console and are also recorded in the global cycleArray[]. Please use this for performance evaluation.
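
The eight configuration lines listed in note 3 could be read into a struct along the following lines. This is only a sketch: it assumes one whitespace-free value per line in the order given, and the struct, field, and function names are hypothetical rather than taken from the unit test source.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory form of the eight configuration lines. */
typedef struct {
    char input_name[128];
    char output_name[128];
    int  in_width, in_height;
    int  out_width, out_height;
    int  use_polyphase;   /* 1 = TSU_POLYPHASE, 0 = TSU_BICUBIC */
    int  read_from_ddr;   /* 1 = READ_INPUT_FROM_DDR2, 0 = READ_INPUT_FROM_FILE */
} TestConfig;

/* Parses eight whitespace-separated tokens in the order of Lines 1) to 8).
 * Returns 0 on success, -1 on malformed input. */
int parse_test_config(const char *text, TestConfig *cfg)
{
    char algo[32], io_mode[32];
    int n = sscanf(text, "%127s %127s %d %d %d %d %31s %31s",
                   cfg->input_name, cfg->output_name,
                   &cfg->in_width, &cfg->in_height,
                   &cfg->out_width, &cfg->out_height,
                   algo, io_mode);
    if (n != 8) return -1;

    if (strcmp(algo, "TSU_POLYPHASE") == 0) cfg->use_polyphase = 1;
    else if (strcmp(algo, "TSU_BICUBIC") == 0) cfg->use_polyphase = 0;
    else return -1;

    if (strcmp(io_mode, "READ_INPUT_FROM_DDR2") == 0) cfg->read_from_ddr = 1;
    else if (strcmp(io_mode, "READ_INPUT_FROM_FILE") == 0) cfg->read_from_ddr = 0;
    else return -1;

    return 0;
}
```

Clip names containing spaces would not survive this parser; the real unit test may handle the file differently.
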
      Thanks,
      Hongmei
    • tianxing hou
      Posted by tianxing hou
      on Jul 01 2012 21:54 PM
      Intellectual990 points

      Hi Hongmei,

      Thank you for your reply.

      I have built the project and can run it successfully when the image resolution is 176*144. If my image resolution is 1920*1088, what should the value of SIU_TSU_SCRATCH_SIZE be? I don't know the relationship between the scratch size and the image resolution.

      When the image resolution is 1920*1088, I can't read the file successfully; the program hangs at line 339 of main.c. My file is attached:

      4760.yuv420_1080p.rar

      In addition, I have some questions about the use of EDMA3. You used the ECPY APIs; where can I get documentation for them? What should I do if I want to use EDMA3?

      Thanks,

      Tianxing.

    • Hongmei Gou
      Posted by Hongmei Gou
      on Jul 02 2012 15:44 PM
      Intellectual2940 points

      Hi Tianxing,

      Glad to know that you can build and run the TSU unit test application.

      The TSU unit test provides two ways for data IO: 1) fread and fwrite; 2) read input data from pre-loaded DDR (starting from 0x85000000) and save output data in DDR (starting from 0x88000000). For testing HD, e.g., 1920x1088 as you tried, please use method 2 to avoid slow fread and fwrite, as follows:

      1) Use READ_INPUT_FROM_DDR2 in Line 8) of tsu\test\testVecs\config\testVecs.cfg

      2) Pre-load input YUV to 0x85000000 through "Memory Browser"

      3) Run .out file

      4) Save output YUV from 0x88000000 to PC through "Memory Browser"
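
When pre-loading several frames, it can help to check that the input region does not run into the output region. The sketch below uses the two base addresses from this thread and assumes that frames are packed back to back starting at the input base, which the thread does not confirm. DDR_WRITE_ADDR and both function names are hypothetical; only DDR_READ_ADDR is quoted from main.c.

```c
#include <stdint.h>

#define DDR_READ_ADDR   0x85000000u  /* input base, as quoted from main.c */
#define DDR_WRITE_ADDR  0x88000000u  /* output base from this thread; macro name is hypothetical */

/* Address of input frame `index`, assuming frames are packed back to back. */
uint32_t ddr_input_frame_addr(uint32_t index, uint32_t frame_bytes)
{
    return DDR_READ_ADDR + index * frame_bytes;
}

/* Returns 1 if `num_frames` packed input frames end at or before the output base. */
int ddr_input_fits(uint32_t num_frames, uint32_t frame_bytes)
{
    uint64_t needed = (uint64_t)num_frames * frame_bytes;
    return needed <= (uint64_t)(DDR_WRITE_ADDR - DDR_READ_ADDR);
}
```

Under that packing assumption, ten 1920x1088 YUV 4:2:0 frames (3133440 bytes each) fit below 0x88000000.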

      The program is not getting stuck when running with 1920x1088. It is reading the input, which can take ~18 minutes for a single 1920x1088 frame when using XDS560v2. If you are using the XDS100 USB emulator, it will take even longer.

      SIU_TSU_SCRATCH_SIZE in the unit test has the same value as TSU_SCRATCH_SIZE in tsu\src\tsuinit.c. Currently this scratch size is defined to support up to 1920x1088. The cross check on the scratch size is in tsu\src\polyphase\tsuPolyphaseScaling.c, lines 374-376.

      As for your question about EDMA, the optimized TSU is using ECPY/RMAN/IRES modules from framework components to achieve EDMA based data transfers. Underneath, it still uses the EDMA3 peripheral on C6678. Hope this clarifies. For details of ECPY/RMAN/IRES, please refer to link of framework components @ http://software-dl.ti.com/dsps/dsps_public_sw/sdo_sb/targetcontent/fc/index.html.

      Thanks,

      Hongmei

    • tianxing hou
      Posted by tianxing hou
      on Jul 02 2012 22:34 PM
      Intellectual990 points

      Hi Hongmei,

      Following your instructions, I ran the project successfully. I tried converting 1920*1088 to 720*480, and it consumes 6784480 cycles on average. Thank you very much for your help.

      I have a question about the TSU. Currently it supports resolutions only up to 1080p, but my image resolution is 2432*2048. What should I do if I want to resize the image to other resolutions, for example 1080p or D1?

      I have modified the code as follows:

      #define GG_TSU_BLOCK_SIZE 24064        --> #define GG_TSU_BLOCK_SIZE 35840
      #define IN_OUT_SIZE 3133440                     --> #define IN_OUT_SIZE 7471104
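
Both IN_OUT_SIZE values equal the size in bytes of one 8-bit YUV 4:2:0 frame at the corresponding resolution (1920x1088 and 2432x2048). Reading IN_OUT_SIZE as a frame-buffer size is an inference from the numbers, not something stated in the thread, and the function name below is hypothetical.

```c
#include <stdint.h>

/* One 8-bit YUV 4:2:0 frame: a full-resolution luma plane plus two
 * chroma planes, each subsampled by 2 horizontally and vertically. */
uint32_t yuv420_frame_bytes(uint32_t width, uint32_t height)
{
    uint32_t luma   = width * height;
    uint32_t chroma = (width / 2u) * (height / 2u);
    return luma + 2u * chroma;
}
```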

      I don't know how to modify SIU_TSU_SCRATCH_SIZE. I tried changing its value, but the program did not run successfully. I would like to know whether the scratch size has an upper limit, and whether that limit is related to the scratch being placed in local L2.

      The YUV data of the image is attached:

      3326.yuv.rar

      I also have some further questions.

      1. What is the role of tsuContext? Is it used only in the polyphase filter algorithm?

      2. Why is EDMA3 or memcpy used in the program? Is it used only in the polyphase filter algorithm?

      Thank you again for your guidance over these days; it has been very helpful for us.

    • Hongmei Gou
      Posted by Hongmei Gou
      on Jul 03 2012 18:08 PM
      Intellectual2940 points

      Hi Tianxing,

      To use TSU for resolutions higher than 1080p, we need to make changes in TSU source code and then recompile TSU libs. The following defines in tsu\src\bicubic\tsuCubic.h need to be increased for higher resolutions:

      #define MAX_SIZE_X 1920
      #define MAX_SIZE_Y 1088

      The steps of recompiling TSU libs:

      1) In a command window, go to dsp\mkrel, and then run "setupenvMsys.bat bypass" (as for sv01 or sv04 described in http://processors.wiki.ti.com/index.php/MCSDK_VIDEO_2.0_Getting_Started_Guide#Set_up_environment_variables)

      2) go to TSU directory: bash-3.1$ cd ../../components/ti/mas/tsu

      3) Run xdc command to rebuild: bash-3.1$ xdc XDCARGS="c66le_elf src"

      Your changes for GG_TSU_BLOCK_SIZE and IN_OUT_SIZE are good. For SIU_TSU_SCRATCH_SIZE, you can start with a big number, say "#define SIU_TSU_SCRATCH_SIZE 177824". As there is a cross check in tsu\src\polyphase\tsuPolyphaseScaling.c, an exception will be reported if this size is still not large enough. If no exception is reported, the actual scratch size needed can be found by recording the maximal value of (store_index + prev_pos) as used in the cross check. You can use a global variable to record this maximal usage in tsuPolyphaseScaling.c, recompile the TSU lib and unit test, and then find its value in the watch window after transizing is completed.

      /* Cross check on size of the TSU scratch */
      if( (store_index + prev_pos) > tsuContext.scratchSize) {
          tsu_exception(instId, TSU_EXC_UNEXPECTED_ERROR);
      }
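
One way to record the maximal usage described above is a small high-water-mark helper called next to the cross check. This is a sketch only: the global and function names are hypothetical, and the unsigned 32-bit types for store_index and prev_pos are an assumption about the TSU source.

```c
#include <stdint.h>

/* Hypothetical global: largest (store_index + prev_pos) seen so far.
 * Declared volatile so it stays observable in the debugger. */
volatile uint32_t g_tsuScratchMaxUsed = 0;

/* Call next to the existing cross check, with the same two values. */
void tsu_record_scratch_usage(uint32_t store_index, uint32_t prev_pos)
{
    uint32_t used = store_index + prev_pos;
    if (used > g_tsuScratchMaxUsed) {
        g_tsuScratchMaxUsed = used;
    }
}
```

After a run completes, the value of g_tsuScratchMaxUsed in the watch window gives the scratch size actually needed for that resolution pair.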

      With the above changes, I tried your 2432*2048 YUV input and it can be transized to 720p successfully.

      Thanks,

      Hongmei

    • tianxing hou
      Posted by tianxing hou
      on Jul 03 2012 22:28 PM
      Intellectual990 points

      Hi Hongmei,

      Thank you for your help. I ran the project successfully and converted 2432*2048 --> 1920*1088; it consumes 3185471 cycles on average.

      I have some further questions about the program.

      1. What do tsuContext.alloc, tsuContext.free, tsuContext.availCoef, and tsuContext.coeffHandle mean? I don't understand how to use the tsuContext struct. I found tsuContext.dataCopy and tsuContext.dataWait used in tsuPolyphaseDat.h, where they replace DAT_copy and DAT_wait in tsuPolyphaseScaling.c. However, I can't find coeffHandle, availCoeff, alloc, or free used anywhere in the TSU code; they are only initialized in main.c.

      2. There are some differences between your code and the earlier code. For example, you modified the tsuContext_t struct and added DataCopy and DataWait to it. What do they mean?

      3. I tried setting SIU_TSU_SCRATCH_SIZE to a very large value, for example 2432*2048, but I get errors when I build the project after this change:

      #define SIU_TSU_SCRATCH_SIZE 77824    -->  #define SIU_TSU_SCRATCH_SIZE (2432*2048)

      4. Is the TSU compliant with the XDAIS standard? If I place the TSU scratch in L2SRAM, can other XDAIS algorithms place their scratch in L2SRAM too?

      5. What are the GMP and GMC modules?

      6. With the cubic algorithm, it consumes 1798293871 cycles. Does the cubic algorithm need further modifications?

    • Hongmei Gou
      Posted by Hongmei Gou
      on Jul 05 2012 18:03 PM
      Verified Answer
      Verified by tianxing hou
      Intellectual2940 points

      Hi Tianxing,

      Glad that we can help.

      tsuContext allows the test application (instead of the TSU lib) to control items such as buffer assignment, memory allocation/free, how data copying is done, and so on. This enables a more generic TSU. The structure of tsuContext_t is defined in tsu.h. The content of tsuContext is supplied by the test application (e.g., main.c), including function pointers and the base addresses and sizes of buffers. The test application also implements the related functions and allocates the related buffers. Internally, TSU just uses the function pointers and buffers supplied by the test application. You can search for "tsuContext." inside the TSU lib to find out how the tsuContext entries are used.

      For example, as you pointed out, dataCopy and dataWait are newly added as two entries of tsuContext. This allows application to choose how to do data copying and how to wait until data copying is completed. In test application (main.c), tsuContext.dataCopy is pointing to function siutsu_data_xfer, which implements data copying and application can choose either "EDMA" or "memcpy" for it. For tsuContext.dataWait (siutsu_data_wait()), no actions are needed for memcpy, while wait is needed to complete the data copying with EDMA before the output data is used and/or input data is modified.
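
The pattern described here, where the application supplies copy and wait hooks and the library only calls them, can be sketched as follows. The struct below is a reduced, hypothetical stand-in for tsuContext_t: the real field signatures are defined in tsu.h and may differ, and all demo_ names are invented for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced stand-in for tsuContext_t: only the two
 * data-movement hooks are modeled; their signatures are assumptions. */
typedef struct {
    void (*dataCopy)(void *dst, const void *src, size_t bytes);
    void (*dataWait)(void);
} demoContext_t;

/* memcpy-backed hooks: the copy is synchronous, so the wait does nothing. */
static void demo_copy_memcpy(void *dst, const void *src, size_t bytes)
{
    memcpy(dst, src, bytes);
}

static void demo_wait_noop(void)
{
}

/* Library-side code only sees the hooks, never the transfer mechanism. */
void demo_move_block(const demoContext_t *ctx, void *dst, const void *src, size_t bytes)
{
    ctx->dataCopy(dst, src, bytes);  /* start (or perform) the transfer */
    ctx->dataWait();                 /* return only when dst is safe to read */
}

/* Application-side setup, analogous to filling in tsuContext in main.c. */
demoContext_t demo_make_memcpy_context(void)
{
    demoContext_t ctx = { demo_copy_memcpy, demo_wait_noop };
    return ctx;
}
```

An EDMA-backed variant would keep demo_move_block unchanged and swap in hooks that submit a transfer and then wait for its completion.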

      As for how to set SIU_TSU_SCRATCH_SIZE, please refer to our earlier post on 07/03 to find the maximal usage. There is no need to over-allocate. As the scratch buffer is allocated from local L2 (tsu\test\ccsProject\linker_c6678.cmd), a size that exceeds local L2 will result in a linking error, as you reported.
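
A quick size check shows why a 2432*2048-byte scratch cannot link. The sketch assumes 512 KB of local L2 per core on C6678 with 64 KB of it configured as cache (the setting from the benchmark earlier in this thread); the actual split in a given project is defined by its linker command file and cache setup, and the function name is hypothetical.

```c
#include <stdint.h>

/* Assumed sizes: 512 KB of local L2 per C66x core on C6678, with 64 KB
 * of it configured as cache (the setting from the earlier benchmark). */
#define L2_TOTAL_BYTES  (512u * 1024u)
#define L2_CACHE_BYTES  (64u * 1024u)

/* Returns 1 if a scratch buffer plus other L2-resident data fits in
 * the L2 space left over after the cache carve-out. */
int scratch_fits_in_l2(uint32_t scratch_bytes, uint32_t other_l2_bytes)
{
    uint32_t sram = L2_TOTAL_BYTES - L2_CACHE_BYTES;
    if (other_l2_bytes > sram) {
        return 0;
    }
    return scratch_bytes <= sram - other_l2_bytes;
}
```

Under these assumptions, 77824 and 177824 bytes both fit, while 2432*2048 = 4980736 bytes is far larger than the whole local L2.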

      TSU is not compliant with the XDAIS standard. If you can ensure that TSU and the other XDAIS-based algorithms will not access the scratch at the same time, they can share the same scratch. If not, you can allocate another scratch from local L2 for the other XDAIS-based algorithms, as long as it fits in local L2.

      GMP and GMC are global memory pool and global memory cell. Implementation details can be found from tsu\test\src\siuVigdkGmp.c and siuVigdkGmp.h.

      For cycles with bicubic interpolation, is "1798293871 cycles" collected from your application or the TSU unit test we recently provided? If it's the former, please recollect with TSU unit test. Cache settings can largely affect the cycle performance.

      Thanks,

      Hongmei

    • Vivek Chengalvala
      Posted by Vivek Chengalvala
      on Jul 05 2012 18:45 PM
      Verified Answer
      Verified by tianxing hou
      Intellectual1100 points

      Tianxing,

      Bicubic interpolation is not hand-optimized for DSP, whereas the polyphase filter is optimized with scheduled assembly. Please use polyphase for your application.

      Regards,

      Vivek

    • tianxing hou
      Posted by tianxing hou
      on Jul 05 2012 19:49 PM
      Intellectual990 points

      Hi Hongmei,

      Thank you for your reply, it is so useful for us.

      Regards,

      Tianxing
