Please help with Cortex-A8 NEON instruction generation

Dear TI forum supporters:

To boost my algorithm's performance on the Cortex-A8, I have tried to generate NEON instructions by following other threads on the forum, such as this one:

http://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/716/t/299516.aspx?pi199607=2

I used the simple code example from that thread for testing:

int a[200], b[200], c[200];
int i;

for (i = 0; i < 200; i++)
{
    a[i] = b[i] = i + 1;
}

for (i = 0; i < 200; i++)
{
    c[i] = a[i] * b[i];
}

Still, I failed to generate NEON code for my target platform. Please see my assembly dump from the CCS 5.2.1 disassembly window, compared with the successful one.

My compiler options are as follows:

CFLAGS_INTERNAL = -c -qq -pdsw225 --endian=$(ENDIAN) -mv7A8 -O2 -g --opt_for_speed=5 --define=dm8146 --define=dm8148 --abi=$(CSWITCH_FORMAT) -eo.$(OBJEXT) --symdebug:dwarf -Dfar= -D_DEBUG_=1 -DMULTICHANNEL_OPT=1 --neon -k

I wonder whether my development environment makes the difference. Currently I have CCS installed on Windows, mainly for debugging. I compile the code with gmake on Windows, and the compiler path is set to the SDK's tools/tms470_5_0_1.

Below is my platform info. Could someone please walk me through the NEON generation process?

My platform info: TMS320DM8148 (Vision-Mid) 

600-MHz ARM® Cortex™-A8 RISC MPU

500-MHz C674x™ VLIW DSP

200-MHz M3-ISS/M3-HDVPSS 

BIOS: avsdk_00_08_00_00 (sys-bios)

Thanks in advance,

Joey from Altek

  • I can get NEON instructions for that C source.  I suspect something subtle is getting in the way.

    Please send a test case I can build, perhaps preprocessed like this.  Please show the command line options exactly as the compiler sees them, and the compiler version number.

    Thanks and regards,

    -George

  • Hi, George:

    Thank you for your quick response. In addition to the .pp file you requested, I have also attached the .asm file for your reference.

    Since I don't have a CCS project, I used the Makefile method to create the .pp file.

    0844.Eagle_TaskMainrar.rar

    The test sample was written in a separate task, named "EagleK_TaskMain()". The following command line is copied from my console window:

    " # Compiling a8/src/AlleyView/Eagle_TaskMain.c to D:/EAGLE-II/src/outfiles/avsk_app/obj/ti814x-evm/a8host/debug/Eagle_TaskMain.oea8f ...
    D:/EAGLE-II/tools/tms470_5_0_1/bin/armcl --gcc -O3 -D_INCLUDE_NIMU_CODE -DA8_COMP_TASKS=1 -c -qq -pdsw225 --endian=little -mv7A8 -O2 -g --opt_for_speed=5 --def
    ine=dm8146 --define=dm8148 --abi=eabi -eo.oea8f --symdebug:dwarf -Dfar= -D_DEBUG_=1 -DMULTICHANNEL_OPT=1 --neon -k --preproc_with_comment --preproc_with_compile
    -Dxdc_target_name__=A8F -Dxdc_target_types__=ti/targets/arm/elf/std.h -Dxdc_bld__profile_debug -Dxdc_bld__vers_1_0_4_3_3 -Dxdc_cfg__header__='D:/EAGLE-II/src/
    outfiles/avsk_app/obj/ti814x-evm/a8host/debug/avsk_app_configuro/package/cfg/CortexA8AppMain_pea8f.h' -DHAVE_ERRNO_H -DMSGLEVEL=1 -DHAVE_INTTYPES_H -DHAVE_NETM
    AIN_H -D_INCLUDE_NIMU_CODE -D_NDK_EXTERN_CONFIG -DTI_CAMERA_MODE -D_LOCAL_CORE_a8host_ -D_REMOTE_hdvpss_drivers_ -D_REMOTE_hdvpss_examples_utility_ -D_REMOT
    E_hdvpss_platform_ -D_REMOTE_hdvpss_i2c_ -D_REMOTE_hdvpss_devices_ -D_REMOTE_hdvpss_proxyServer_ -D_REMOTE_iss_drivers_ -D_REMOTE_iss_platform_ -D_REMOTE_iss_i2
    c_ -D_REMOTE_iss_devices_ -D_BUILD_hdvpss_drivers_ -D_BUILD_hdvpss_examples_utility_ -D_BUILD_hdvpss_platform_ -D_BUILD_hdvpss_i2c_ -D_BUILD_hdvpss_devices_ -D_
    BUILD_hdvpss_proxyServer_ -D_BUILD_iss_drivers_ -D_BUILD_iss_platform_ -D_BUILD_iss_i2c_ -D_BUILD_iss_devices_ -DPLATFORM_EVM_SI -DTI_814X_BUILD -DVPS_TRACE_
    ENABLE -DVPS_ASSERT_ENABLE -DLOGGER_ENABLE -ID:/EAGLE-II/tools/tms470_5_0_1/include -ID:/EAGLE-II/tools/tms470_5_0_1/include/rts -Icommon/inc -Icommon/inc/Drive
    r/ -Icommon/inc/Driver/GPIO -Icommon/inc/Driver/I2C -Icommon/inc/Driver/Timer -Icommon/inc/Driver/DMADRV -Icommon/inc/Driver/UART -Icommon/inc/RcBrg -Icommon/in
    c/Cli -Icommon/inc/Util -Ia8/inc -Ia8/inc/AlleyView -Ia8/inc/App/EagleApp -Ia8/inc/App/ModApp -Ia8/inc/App/RawDump -Ia8/inc/Driver -Ia8/inc/Framework -Ia8/inc/F
    ramework -Ia8/inc/Framework/CaliDataLoader -Ia8/inc/Framework/CliCmd -Ia8/inc/Framework/CmdDxp -Ia8/inc/Framework/watchdog -Ia8/inc/Framework/FatFs -Ia8/inc/Fra
    mework/ImageDump -Ia8/inc/Framework/RawEthernet -Ia8/inc/Framework/FileHandler -Ia8/inc/Framework/Thermal -Ia8/inc/Framework/ULC -Ia8/inc/Framework/VehicleMgr -
    Ia8/inc/Framework/AlarmMgr -Ia8/src/Framework/DMASample -Ia8/inc/Framework/Monitor -Ia8/inc/Util -Ia8/src/App/AVB/AVB_Talker -Ia8/src/App/AVB/AVBTP/IEC61883 -Ia
    8/src/App/AVB/AVBTP/IEEE1722/inc -Ia8/src/App/AVB/AVBTP/IEEE1722/inc/cpts -Ia8/src/App/AVB/PTP/src/Ti-814x-PTP/gptp -Ia8/inc/Framework/UDS -Ia8/inc/Framework/Et
    herConsole -Ia8/inc/Driver/DCAN -Ia8/inc/Framework/eagleK/ -ID:/EAGLE-II/avsdk_00_08_00_00/bios_6_34_02_18/packages -ID:/EAGLE-II/avsdk_00_08_00_00/xdctools_3_2
    4_03_33/packages -ID:/EAGLE-II/avsdk_00_08_00_00/pdk/hdvpss_01_00_01_42/packages -ID:/EAGLE-II/avsdk_00_08_00_00/ipc_1_25_00_04/packages -ID:/EAGLE-II/avsdk_00_
    08_00_00/edma3_lld_02_11_06_01/packages -ID:/EAGLE-II/avsdk_00_08_00_00/pdk/biospsp_03_10_06_00 -ID:/EAGLE-II/avsdk_00_08_00_00/ndk_2_21_01_38/packages -ID:/EAG
    LE-II/avsdk_00_08_00_00/pdk/nsp_dm814x_01_00_00_10/packages -ID:/EAGLE-II/avsdk_00_08_00_00/pdk/VisionISS_01_05_00_00/packages -fr=D:/EAGLE-II/src/outfiles/avsk
    _app/obj/ti814x-evm/a8host/debug -fs=D:/EAGLE-II/src/outfiles/avsk_app/obj/ti814x-evm/a8host/debug -o D:/EAGLE-II/src/outfiles/avsk_app/obj/ti814x-evm/a8host/de
    bug/Eagle_TaskMain.oea8f a8/src/AlleyView/Eagle_TaskMain.c
    >> WARNING: object file specified, but linking not enabled
    "a8/src/AlleyView/Eagle_TaskMain.c", line 70: warning: last line of file ends without a newline"

    Thank you very much,

    Joey from Altek

  • I cannot explain why you don't see the NEON instructions.  I filed SDSCM00050200 in the SDOWP system to have this investigated and explained.  If we are lucky, I missed something simple.  Feel free to follow this issue with the SDOWP link below in my signature.

    Thanks and regards,

    -George

  • The key bit seems to be the presence of the "while(1)" loop.  If I comment out the "while" line, or if I replace it with another for-loop, I get VMUL (and VLD and VST).

    I don't yet have a good explanation why the infinite loop perturbs things in a way that interferes with vectorisation.
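One way to act on this observation, without removing the infinite loop from the task, might be to keep the multiply loop in its own function so that it stays a simple counted loop. The sketch below is only an illustration: the names `fill_arrays`, `multiply_arrays` and `task_main_sketch` are hypothetical and not from the posted test case, the `restrict` qualifiers assume a C99-capable build, and it has not been verified against the TI ARM compiler. In particular, the optimizer may inline the helper back into the task function, so the generated .asm should still be checked for VMUL, VLD and VST.

```c
#include <stddef.h>

#define N 200

/* Hypothetical helper: fills both inputs with 1..n, as in the test case. */
void fill_arrays(int *a, int *b, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        a[i] = b[i] = (int)(i + 1);
    }
}

/* Hypothetical helper: element-wise multiply kept as a plain counted loop.
 * restrict tells the compiler the arrays do not overlap (an assumption
 * that the caller must honor). */
void multiply_arrays(int *restrict c, const int *restrict a,
                     const int *restrict b, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        c[i] = a[i] * b[i];
    }
}

/* Hypothetical task body: the infinite loop only calls the kernel,
 * so the counted loops live in separate functions. */
void task_main_sketch(void)
{
    static int a[N], b[N], c[N];

    fill_arrays(a, b, N);
    while (1) {
        multiply_arrays(c, a, b, N);
        /* ... wait for the next unit of work here ... */
    }
}
```

Whether this restores vectorisation depends on the compiler version and options; it is a guess based on the behaviour described above, not a confirmed fix.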

  • Hi, pf:

    Yes, after removing the while(1), I successfully obtained NEON instructions in the assembly for the test example.

    I applied the same rule to my algorithm on the Cortex-A8. Unfortunately, the performance did not improve significantly.

    Regards, 

    Joey from Altek