How to add NEON assembly and intrinsic code on DM8148

Hi all.

I'm planning to use NEON assembly and intrinsic code for extra work such as memory copy and color space conversion to YUV420 planar on the DM8148 with EZSDK 5.03.01.15. As a test, I copied memcpy_neon.S (from memcpy-neon.tar.gz) into the OMX decoder demo (~/ti-ezsdk_dm814x-evm_5_03_01_15/component-sources/omx_05_02_00_30/examples/ti/omx/demos/decode) and changed the makefile to add the source file and the flags (-mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp), but the build fails as shown below.

If I use memcpy_neon.S, I get the following:

~/ti-ezsdk_dm814x-evm_5_03_01_15/component-sources/omx_05_02_00_30/makerules/rules_a8.mk:76: target `src/memcpy_neon.S' doesn't match the target pattern

If I rename it to memcpy_neon.c, the result looks like this:

./src/memcpy_neon.c:25: error: expected identifier or '(' before '.' token
./src/memcpy_neon.c:34: error: stray '#' in program
./src/memcpy_neon.c:35:7: error: invalid suffix "f" on integer constant
./src/memcpy_neon.c:39: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'r2'
./src/memcpy_neon.c:39: error: stray '#' in program
./src/memcpy_neon.c:40:7: error: invalid suffix "f" on integer constant
./src/memcpy_neon.c:41: error: stray '#' in program
./src/memcpy_neon.c:42: error: stray '#' in program
./src/memcpy_neon.c:43: error: stray '#' in program
./src/memcpy_neon.c:44: error: stray '#' in program
./src/memcpy_neon.c:45: error: stray '#' in program
./src/memcpy_neon.c:46:7: error: invalid suffix "b" on integer constant
./src/memcpy_neon.c:47: error: stray '#' in program
...
I also noticed that rules_a8.mk sets an armv5t flag: "CFLAGS_INTERNAL = -fPIC -fno-strict-aliasing -MD -MF $(DEPFILE).P -march=armv5t -Dfar= -D_DEBUG_=1 -DMULTICHANNEL_OPT=1".
I think it should be something like "-march=armv7-a", but I'm not sure. If anyone has an idea or information on how to get this working, please let me know.
Regards,
SK
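
The errors above are what GCC typically reports when GNU assembler source is parsed as C: '#' immediates become stray tokens, and local label references (presumably of the form 1f or 1b) look like integer constants with an invalid suffix. For the intrinsic route mentioned in the question, the NEON code goes in an ordinary .c file instead. The following is a minimal sketch of that idea, not the contents of memcpy_neon.S: copy_bytes_neon is a hypothetical name, and the non-NEON fallback is an assumption added so the file still builds when -mfpu=neon is not passed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* arm_neon.h is only usable when the compiler targets NEON
 * (for GCC on ARMv7: -mfpu=neon with -mfloat-abi=softfp or hard). */
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper (not part of the EZSDK): copy n bytes from src to dst.
 * The buffers must not overlap. */
void copy_bytes_neon(uint8_t *dst, const uint8_t *src, size_t n)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Move 16 bytes per iteration through one 128-bit Q register. */
    while (n >= 16) {
        vst1q_u8(dst, vld1q_u8(src));
        src += 16;
        dst += 16;
        n -= 16;
    }
#endif
    /* Remaining tail bytes, or the whole buffer when NEON is not enabled. */
    if (n > 0) {
        memcpy(dst, src, n);
    }
}
```

Because this is plain C, it goes through the same makefile rule as the other .c files once the NEON flags are in effect. Whether it beats the hand-written assembly or the C library memcpy would have to be measured on the target.
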
  • To build with Neon assembly code, you should use the following compiler flags:

    -march=armv7-a -mtune=cortex-a8 -mfpu=neon -mfloat-abi=softfp

    For best results, you should hand-code the Neon assembly routines.

  • Hi Anand,

    Thank you for the response. I think my question was not clear. I have now got the NEON module compiling and working: I replaced memcpy with a NEON memcpy written in assembly. In short, I compile the assembly code on the command line with arm-none-linux-gnueabi-gcc -I. -O2 -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp -c ./src/memcpy_neon.S and then add the object file to the Makefile as shown below. I'm using the OMX decode demo application.

    EXTLIBS_a8host = $(omx_LIBPATH)/../lib/omxcore.av5T \
    $(omx_LIBPATH)/../lib/memcfg.av5T \
    $(omx_LIBPATH)/../lib/domx.av5T \
    $(omx_LIBPATH)/../lib/domx_delegates_shmem.av5T \
    $(omx_LIBPATH)/../lib/timmosal.av5T \
    $(omx_LIBPATH)/../lib/omxcfg.av5T \
    $(osal_PATH)/packages/linuxdist/build/lib/osal.a \
    $(osal_PATH)/packages/linuxdist/cstubs/lib/cstubs.a \
    $(fc_PATH)/packages/ti/sdo/rcm/lib/debug/rcm_syslink.av5T \
    $(fc_PATH)/packages/ti/sdo/fc/memutils/lib/release/memutils.av5T \
    $(osal_PATH)/packages/ti/sdo/xdcruntime/linux/lib/debug/osal_linux_470.av5T \
    $(fc_PATH)/packages/ti/sdo/fc/global/lib/debug/fcsettings.av5T \
    $(syslink_PATH)/packages/ti/syslink/lib/syslink.a_debug \
    $(linuxutils_PATH)/packages/ti/sdo/linuxutils/cmem/lib/cmem.a470MV \
    $(uia_PATH)/packages/ti/uia/linux/lib/servicemgr.a \
    ./memcpy_neon.o
    If I add the compile flags (-mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp) to the makefile, it doesn't work.
    
    
    Best Regards,
    SK.
  • Anand,

    Can I clarify something? http://processors.wiki.ti.com/index.php/Cortex-A8 says that the Neon auto-vectorization compiler flags are:

        -march=armv7-a -mtune=cortex-a8 -mfpu=neon -ftree-vectorize -ffast-math -mfloat-abi=softfp

    But in your post, you do not specify "-mfpu=neon -ftree-vectorize".  Are these necessary?

    Also, is it important to specify -O3 for gcc toolchains?  In previous projects, we have tended to set things to -Os ...

    Thanks,

    Dan -
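
As background to the flag question: in GCC releases of this era, -ftree-vectorize is turned on by -O3 but not by -Os, and with -mfpu=neon the vectorizer leaves floating-point loops alone unless -funsafe-math-optimizations (which -ffast-math implies) is also given, because ARMv7 NEON arithmetic is not fully IEEE 754 compliant. The loop below is a small sketch of the kind of code those flags are aimed at. scale_add is a hypothetical kernel, not taken from the wiki page or the SDK, and it is written as C99 (older GCC may need -std=gnu99).

```c
#include <stddef.h>

/* Hypothetical kernel: out[i] = a[i] * gain + b[i].
 * The restrict qualifiers promise that the arrays do not overlap,
 * which spares the vectorizer from emitting runtime alias checks. */
void scale_add(float *restrict out, const float *restrict a,
               const float *restrict b, float gain, size_t n)
{
    /* Simple counted loop with unit stride and no early exits:
     * the shape the auto-vectorizer handles most reliably. */
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] * gain + b[i];
    }
}
```

Compiling this file once with -Os and once with -O3 plus the wiki's flags, then comparing the generated assembly (for example with -S), is one way to check whether NEON instructions are actually being emitted for a given toolchain.
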