Dear TI forum supporters:
To boost my algorithm's performance on the Cortex-A8, I've tried to generate NEON instructions by following
other articles on the forum, such as this one:
http://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/716/t/299516.aspx?pi199607=2
For testing purposes, I used the simple code example from that thread:
int a[200], b[200], c[200];
int i;

for (i = 0; i < 200; i++)
{
    a[i] = b[i] = i + 1;
}

for (i = 0; i < 200; i++)
{
    c[i] = a[i] * b[i];
}
However, the compiler still does not generate NEON code for my target platform. Please see the assembly dump from my CCS 5.2.1
disassembly window, compared with the successful one.
My compiler options are as follows:
CFLAGS_INTERNAL = -c -qq -pdsw225 --endian=$(ENDIAN) -mv7A8 -O2 -g --opt_for_speed=5 --define=dm8146 --define=dm8148 --abi=$(CSWITCH_FORMAT) -eo.$(OBJEXT) --symdebug:dwarf -Dfar= -D_DEBUG_=1 -DMULTICHANNEL_OPT=1 --neon -k
I wonder whether my development environment is what makes the difference. I currently have CCS installed on Windows, mainly for debugging. I compile the code through gmake on Windows, with the compiler path set to SDK tools/tms470_5_0_1.
Could someone please walk me through the NEON code generation process? My platform info is below:
TMS320DM8148 (Vision-Mid)
600-MHz ARM® Cortex™-A8 RISC MPU
500-MHz C674x™ VLIW DSP
200-MHz M3-ISS/M3-HDVPSS
BIOS: avsdk_00_08_00_00 (sys-bios)
Thanks in advance,
Joey from Altek