Arm Neon Intrinsic optimization Support in CCS

Hi,

       I am using Code Composer Studio (CCS v5) and want to develop Neon intrinsic code for the ARM Cortex-A8.

For that I have enabled the

   "properties(of a created CCS project for Generic cortex A8)->Build->Advanced Options->Run Time Model Options->Generate SIMD Instructions targeting Neon(--neon)"

and

"properties->Build->ARM Compiler->Optimization->Optimization Level(--opt_level,O) - 3".

Summary of flags set: 

-mv7A8 --code_state=32 --abi=eabi -me -O3 -g --include_path="/opt/ti/ccsv5/tools/compiler/tms470_4.9.1/include" --diag_warning=225 --display_error_number --neon

But when I build the sample ARM Neon intrinsic code, the generated assembly is ordinary ARM assembly code, not optimized ARM Neon assembly code.

Please suggest how to fix this problem.

Thanks & Regards,

    J.Moorthi

  • Please see this wiki article.  

    Thanks and regards,

    -George

  • Hi,

           I have set the following options in CCSv5 on Ubuntu (following the link you mentioned) and built the sample C program. But the disassembly I get is still not optimized Neon assembly code (no D, Q, or S registers are used in the generated assembly).

                --float_support=VFPv3

               --abi=eabi

              --opt_level=2

              --opt_for_speed=3

              Run Time Model Options->Generate SIMD Instructions targeting Neon(--neon)

    and the sample C program I used is

    -----------------------------------------------

    int main(void) {
        int a[200],b[200],c[200];
        int i;
        for (i = 0; i < 200; i++)
         {
            a[i]= b[i]=i+1;
         }
        for (i = 0; i < 200; i++)
         {
             c[i]= a[i] * b[i];
         }
    }

    -----------------------------------------------

    Also    Properties->Build->ARM Compiler : Summary of flags set

    -mv7A8 --code_state=32 --float_support=VFPv3 --abi=eabi -me -O3 --opt_for_speed=3 -g --optimize_with_debug --include_path="/opt/ti/ccsv5/tools/compiler/tms470_4.9.1/include" --diag_warning=225 --display_error_number --neon

    Please guide me on how to get optimized Neon assembly code.

    Thanks & Regards,

        J.Moorthi

  • When I use exactly your test case with exactly the options shown with ARM compiler version 4.9.1, I do in fact get assembly code which uses VMUL on Q registers.  If you do not, please post the output assembly language here.

  • For the second loop I get VLD1, VMUL, and VST1 instructions.

    Thanks and regards,

    -George

  • Hi,

          The following is the disassembly generated in CCS Debug mode.

    int main(void) {
    main:
      E24DDE96 SUB           R13, R13, #0x960
      E24D0010 SUB           R0, R13, #0x10
      E28D1E31 ADD           R1, R13, #0x310
        for (i = 0; i < 200; i++)
      E3A0C000 MOV           R12, #0x0
            a[i]= b[i]=i+1;
    $C$L1:
      E28C2001 ADD           R2, R12, #0x1
      E5A02010 STR           R2, [R0, #0x10]!
      E5A12010 STR           R2, [R1, #0x10]!
      E28C3003 ADD           R3, R12, #0x3
      E28C2004 ADD           R2, R12, #0x4
      E28CC002 ADD           R12, R12, #0x2
      E580C004 STR           R12, [R0, #0x4]
      E581C004 STR           R12, [R1, #0x4]
      E5803008 STR           R3, [R0, #0x8]
        for (i = 0; i < 200; i++)
      E1A0C002 MOV           R12, R2
      E35C00C8 CMP           R12, #0xC8
            a[i]= b[i]=i+1;
      E581200C STR           R2, [R1, #0xC]
      E5813008 STR           R3, [R1, #0x8]
      E580200C STR           R2, [R0, #0xC]
        for (i = 0; i < 200; i++)
      BAFFFFF0 BLT           $C$L1
    $C$DW$L$main$2$E:
      E28D0E32 ADD           R0, R13, #0x320
        for (i = 0; i < 200; i++)
      E3A0C032 MOV           R12, #0x32
      E1A0200D MOV           R2, R13
      E28D1D19 ADD           R1, R13, #0x640
             c[i]= a[i] * b[i];
    $C$L2:
      F4222A8D STRNVT        R2, [R2], #-0xA8D
      F4200A8D STRNVT        R0, [R0], #-0xA8D
    $C$DW$L$main$4$E:
      F2220950 EORNV         R0, R2, #0x140000
        for (i = 0; i < 200; i++)
      E25CC001 SUBS          R12, R12, #0x1
             c[i]= a[i] * b[i];
      F4010A8D STRNV         R0, [R1], #-0xA8D
        for (i = 0; i < 200; i++)
      1AFFFFF9 BNE           $C$L2
    }

    However, I do get Neon assembly code in the main.asm file, which appears in the Debug folder when I enable the --keep_asm (-k) option; I have attached it to this post. When I switch to CCS Debug mode, the disassembly shown is not the same as the Neon assembly code in main.asm, and there is no Neon register set (Q0, D0, S0, etc.) in CCS Debug mode. I want to debug the Neon assembly code.

    8372.main.asm

    Please suggest how to fix this problem.

    Thanks & Regards,

        J.Moorthi

  • The assembly file is as expected.  The disassembly has the right opcodes, but the wrong disassembled instructions.  This is not a compiler issue.  Perhaps the disassembler in CCS is incorrectly configured for this device?

  • Moorthi,

    Which exact version of CCSv5 are you using? I tried reproducing this with CCS 5.4 and I see the expected instructions in the CCS disassembly view. Please see screenshot below.

     

  • The screenshot didn't seem to make it. Trying again as a doc.

    5037.arm_neon_disassembly.docx

     

  • George,

        I am using CCS version 5.2.0.00069. What is the correct disassembler configuration to get Neon assembly code in the disassembly view?

    Thanks & Regards,

        J.Moorthi

  • Moorthi,

    I created a test project using your code and was able to see the correct/expected disassembly on both CCS 5.4 and 5.2. I am attaching a zip file of my project folder (I created it for Beaglebone Cortex A8 as I did not know the specific device you are working with). I loaded this to a Beaglebone board using both CCS 5.4 and 5.2 and the disassembly in both cases looks correct as in the screenshot I attached in my previous post.

    Could you try to load the .out file from the attached project to see if it works for you? Depending on the device you are working with you may or may not be able to use it directly, but at least it may help identify why your results are different than mine.

    2475.arm_neon_test.zip

     

  • George,

               I couldn't import your "arm_neon_test" project. When I try to import it, CCS gives the following error:

    Error: Import failed for project 'arm_neon_test' because its compiler definition is not available. Please install the ARMv5.0 compiler before importing this project.

     In my CCS, the ARM compiler version is 4.9.1 and I have selected the Generic Cortex A8 device. I have attached a screenshot of my project properties, along with the version details of the components in my CCS installation.

    6470.CCS_version_Details.txt
     
    Installation Details:
    ===================
    
        ARM BIOS6					6.33.4.39
        ARM Compiler Tools				4.9.1
        ARM Compiler v4.9 Help			4.9.0.201203231355
        ARM IPC					1.24.2.27
        ARM XDCTools				3.23.3.53
        BIOS 5					5.41.13.42
        BIOS 6					6.33.4.39
        Blackhawk CCSv5.2 Emulation for Linux	5.2.0.200
        Blackhawk Emulators				5.2.0.200
        C2000 BIOS6					6.33.4.39
        C2000 Compiler v6.1 Help			6.1.0.201203281421
        C2000 Emulation Flash			1.0.0.2
        C2000 Emulators				3.1.0.2
        C2000 IPC					1.24.2.27
        C2000 XDCTools				3.23.3.53
        C2800 Compiler Tools			6.1.0
        C5400 Compiler Tools			4.2.0
        C5500 Compiler Tools			4.4.1
        C5500 Compiler v4.4 Help			4.4.0.201203231357
        C6000 BIOS6					6.33.4.39
        C6000 Compiler Tools			7.3.4
        C6000 Compiler v7.3 Help			7.3.0.201203231357
        C6000 Device Support			1.0.4
        C6000 IPC					1.24.2.27
        C6000 Multicore Device Support		1.0.3
        C6000 XDCTools				3.23.3.53
        C6EZFlo					3.2.0.201205011655
        C6Flo					3.2.0.201205011655
        CCS NowFlash Emulators			5.1.0.0
        CCSINIT					5.1.0.0
        CCStudio p2 Tool Feature			5.2.0.201204111758
        Code Composer Studio Base Components	5.2.0.237
        Code Composer Studio IDE ARM Components	5.1.0.201205041800
        Code Composer Studio IDE C2000 Components	5.1.0.201205041800
        Code Composer Studio IDE C5400 Components	5.1.0.201205041800
        Code Composer Studio IDE C5500 Components	5.1.0.201205041800
        Code Composer Studio IDE C6000 Components	5.1.0.201205041800
        Code Composer Studio IDE Main Feature	5.0.1.201205041800
        Code Composer Studio IDE Workflow		5.0.1.201205041800
        Compiler Tools On-line Documentation	1.0.5.0
        DaVinci Device Support			1.0.8
        Debug Server				5.2.0.237
        DSP/BIOS (IDE Client)			5.41.13.42
        DSP/BIOS (Target Content)			5.41.13.42
        DVT - Graph Visualization			3.2.0.201205011655
        DVT - Profiler Analysis Manager		3.2.0.201205011655
        DVT - RTA Feature				3.2.0.201205011655
        DVT - System Analyzer			3.2.0.201205011655
        DVT - Trace Control				3.2.0.201205011655
        Eclipse IDE for C/C++ Developers		1.4.2.20120213-0813
        Image Analyzer				3.2.0.201205011655
        Integra Device Support			1.0.7
        IPC						1.24.2.27
        IPC (Multicore and I/O) (IDE Client)	1.24.2.27
        IPC (Multicore and I/O) (Target Content)	1.24.2.27
        Keystone1					1.0.3.0
        LWInstaller					5.2.0.00069
        OMAP Device Support				1.0.3
        Qt Eclipse integration			1.6.1
        Qt Integration				1.6.1
        ROV						3.2.0.201205011655
        RTSC/XDCtools (IDE Client)			3.23.3.53
        RTSC/XDCtools (Target Runtime Support)	3.23.3.53
        Shared Device Support			1.0.4
        Sitara Device Support			1.0.8
        Spectrum Digital Emulators			5.2.0.00
        SYS/BIOS (IDE Client)			6.33.4.39
        SYS/BIOS (Target Content)			6.33.4.39
        System Analyzer(UIA Target)(IDE Client)	1.1.0.04
        System Analyzer(UIA Target)(Target Content)	1.1.0.04
        Third-Party Components for GMF Runtime	1.5.0.v20110426-2230-7P8W6FHV2CYms9gAtyKPaw311A16
        TI Emulators				5.0.681.1
        TI Simulators				5.2.3.10
        Trace Analyzer				3.2.0.201205011655
        UIA						1.1.0.04
        XDAIS					7.21.1.07
        XDAIS (IDE Client)				7.21.1.07
        XDAIS (Target Content)			7.21.1.07
        XDCTools					3.23.3.53
    

     

    Should I upgrade CCS, or is this version sufficient?

     Thanks & Regards,

        J.Moorthi

  • Ah, I see that you are working with a Linux version of CCS, not Windows. I wonder if the disassembly issue could be specific to Linux, as I had been trying to reproduce it on Windows. Let me try this out on a Linux system, and I will get back to you on whether I can reproduce it.

    Regardless of this specific issue, it is recommended to update to the latest version whenever possible. The latest is CCS 5.4 and is available for download at
    http://processors.wiki.ti.com/index.php/Download_CCS

  • Hai,

         I followed the link ( http://processors.wiki.ti.com/index.php/How_to_create_GCC_projects_in_CCSv5 ) to create a GCC project for ARM in CCSv5 on Ubuntu Linux. I created a sample GCC project containing sample Neon assembly code and tried to build and debug it. I could build it but could not debug it. When switching to CCS Debug mode, I got the following error:

      /bin/bash: /home/user1/test1/neongcctst/Debug/neongcctst: cannot execute binary file.

     I want to debug the Neon assembly code, so please advise me on this as well.

    Thanks & Regards,

        J.Moorthi

  • AartiG said:
    I wonder if the disassembly issue could be specific to Linux, as I had been trying to reproduce it on Windows. Let me try this out on a Linux system, and I will get back to you on whether I can reproduce it.

    Sorry about the delay in getting back on this issue. I finally tried it out on a Linux host machine with versions CCS 5.2.1 and CCS 5.4 and see the correct disassembly with both. The VLD1, VMUL, and VST1 instructions appear as expected in the disassembly view just like the screenshot I had attached earlier. So I'm afraid I'm not able to reproduce the behavior you are seeing. If possible, try updating to CCS 5.4.

    You also mentioned having trouble importing the project I attached, but one thing you could try is simply loading the .out file instead of importing and rebuilding the project. Start a project-less debug session in CCS , connect to the target and go to menu Load Program and browse to the .out file from the project I attached. Also as I mentioned earlier this program is targeted for a Beaglebone board. You did not mention the specific device/board you are using, but I assume the code should load to any Cortex A8 device.

  • Moorthi Jayaraman said:
    I followed the link ( http://processors.wiki.ti.com/index.php/How_to_create_GCC_projects_in_CCSv5 ) to create a GCC project for ARM in CCSv5 on Ubuntu Linux. I created a sample GCC project containing sample Neon assembly code and tried to build and debug it. I could build it but could not debug it. When switching to CCS Debug mode, I got the following error...

    Please start a new thread for new questions. Your earlier post with this same question was split into a new thread and is being worked on in the CCS forum so please continue tracking it there.

  • Hi,

         I have installed CCSv5.4.0 and checked it with sample Neon code for the ARM Cortex-A8 device. I now get Neon assembly code in the disassembly, but I cannot debug it. In Debug mode I get a GEL expression error, and CCS reports that there is no source for the assembly function in the executable file. (I selected the Cortex-R4 CPU Functional Simulator, Little Endian, because I could not find a simulator for the Cortex-A8.) Which simulator should I select in NewTargetConfiguration.ccxml? Also, is it possible to compile and debug Neon intrinsics using CCSv5.4.0?

    Please help me debug the Neon assembly code for the ARM Cortex-A8 device. (I have attached some screenshots of this error.)





    Thanks & Regards,

        J.Moorthi

  • This thread has covered a lot of ground since it started.  None of the issues above are related to the compiler.  I suggest you start a new thread in the CCS forum.  And rather than try to address all your problems at once, I suggest you pick the one problem that is causing the most difficulty, and focus exclusively on it.

    Thanks and regards,

    -George