Hi,
I'm using the Mistral EVM 8148 board for an automotive application. My toolchain is as follows:
av_bios_sdk_00_08_00_00
bios_6_34_02_18
ndk_2_21_01_38
ipc_1_25_00_04
I have moved part of my image processing algorithm onto the ARM Cortex-A8 core. These are the compiler options I have set:
-mv7A8 --code_state=32 --abi=eabi -me -O3 --opt_for_speed=3 --diag_warning=225 --display_error_number --neon
My algorithm takes a very long time to execute: around 60 s on the A8, whereas the same algorithm takes only 2 s on the DSP. I have the following questions:
1> How do I make sure that the A8 cache is enabled? I believe I have set up the MMU correctly in the .cfg file:
var attri = {
    type: Mmu.FirstLevelDesc_SECTION, // SECTION descriptor
    bufferable: false,                // bufferable
    cacheable: true,                  // cacheable
    accPerm: 3,                       // read or write permissions
};
and
/* configure the PRIVATE_DATA_CORE_HOS as cacheable */
for (var i= 0x81100000; i < 0x83400000; i = i + 0x100000)
{
Mmu.setFirstLevelDescMeta(i, i, attri);
}
/* configure SHARED_FRAME_BUFFER */
for (var i= 0x88b00000; i < 0x8FF00000; i = i + 0x100000)
{
Mmu.setFirstLevelDescMeta(i, i, attri);
}
var Cache = xdc.useModule('ti.sysbios.family.arm.a8.Cache');
Cache.enableCache = true;
Is this correct?
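One run-time check I could think of is to read the CP15 System Control Register (SCTLR, `MRC p15, 0, <Rt>, c1, c0, 0`) in privileged mode and look at its M (bit 0), C (bit 2) and I (bit 12) bits. The sketch below only decodes such a value; `decode_sctlr` and `a8_cache_state` are hypothetical names of mine, and reading the register itself (inline assembly or a small assembly routine) is left out because it depends on the compiler.

```c
#include <stdint.h>

/* ARMv7-A SCTLR bit positions (CP15 c1, c0, 0). */
#define SCTLR_M (1u << 0)   /* MMU enable               */
#define SCTLR_C (1u << 2)   /* data/unified cache enable */
#define SCTLR_I (1u << 12)  /* instruction cache enable  */

/* Hypothetical helper type: not part of SYS/BIOS. */
typedef struct {
    int mmu_on;
    int dcache_on;
    int icache_on;
} a8_cache_state;

/* Decode a raw SCTLR value that was read elsewhere, in privileged mode. */
a8_cache_state decode_sctlr(uint32_t sctlr)
{
    a8_cache_state s;
    s.mmu_on    = (sctlr & SCTLR_M) != 0;
    s.dcache_on = (sctlr & SCTLR_C) != 0;
    s.icache_on = (sctlr & SCTLR_I) != 0;
    return s;
}
```

If all three fields come back as 1 after BIOS start-up, the MMU and both L1 caches are switched on; this does not by itself show that a given memory section is marked cacheable in the page table.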
2> No NEON instructions are generated for my code, although I have provided the --neon option. For example, the loop
for (i = 0; i < 999; i++)
{
    array[i] = i * i;
}
is generated in assembly as
;* --------------------------------------------------------------------------*
;* BEGIN LOOP ||$C$L2||
;*
;* Loop source line : 132
;* Loop closing brace source line : 136
;* Loop Unroll Multiple : 3x
;* Known Minimum Trip Count : 333
;* Known Maximum Trip Count : 333
;* Known Max Trip Count Factor : 333
;* --------------------------------------------------------------------------*
||$C$L2||:
$C$DW$L$SetupMessageQueue$8$B:
.dwpsn file "F:/Vivek/Projects/DPD/SVN/trunk/Source Code/DPM/Embedded/a8/src/DPD_IPC.c",line 133,column 0,is_stmt,isa 2
;** -----------------------g9:
;** 134 ----------------------- *(U$37 += 3) = _smulbb(i, i);
;** 134 ----------------------- C$11 = i+1;
;** 134 ----------------------- U$37[1] = _smulbb(C$11, C$11);
;** 134 ----------------------- C$10 = i+2;
;** 134 ----------------------- U$37[2] = _smulbb(C$10, C$10);
;** 132 ----------------------- if ( (i += 3) < 999 ) goto g9;
;** 138 ----------------------- return s8Error;
SMULBB LR, V9, V9 ; [DPU_8_PIPE0] |134|
ADD A3, V9, #1 ; [DPU_8_PIPE1] |134|
ADD A4, V9, #2 ; [DPU_8_PIPE0] |134|
ADD V9, V9, #3 ; [DPU_8_PIPE1] |132|
SMULBB A3, A3, A3 ; [DPU_8_PIPE0] |134|
CMP A1, V9 ; [DPU_8_PIPE1] |132|
SMULBB A4, A4, A4 ; [DPU_8_PIPE0] |134|
STR LR, [A2, #12]! ; [DPU_8_PIPE0] |134|
STR A3, [A2, #4] ; [DPU_8_PIPE0] |134|
STR A4, [A2, #8] ; [DPU_8_PIPE0] |134|
that is, without any vector instructions.
Please let me know what is wrong with my settings.
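For comparison, this is roughly the kind of vector code I was expecting. It is only a hand-written sketch using the standard `arm_neon.h` intrinsics; the function name `fill_squares` and the `__ARM_NEON` / `__ARM_NEON__` guard are my own assumptions (the TI compiler may predefine a different macro), and a scalar loop handles the tail and acts as a fallback so the code also builds without NEON.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: writes dst[i] = i * i for 0 <= i < n. */
void fill_squares(int32_t *dst, int n)
{
    int i = 0;

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Process four 32-bit lanes per iteration. */
    const int32_t init[4] = { 0, 1, 2, 3 };
    int32x4_t idx = vld1q_s32(init);          /* lanes hold i, i+1, i+2, i+3 */
    const int32x4_t step = vdupq_n_s32(4);

    for (; i + 4 <= n; i += 4) {
        vst1q_s32(dst + i, vmulq_s32(idx, idx));
        idx = vaddq_s32(idx, step);
    }
#endif

    /* Scalar tail (999 is not a multiple of 4) and non-NEON fallback. */
    for (; i < n; i++) {
        dst[i] = i * i;
    }
}
```

Calling `fill_squares(array, 999)` would replace the loop above, assuming `array` holds 32-bit integers.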
Additional info:
My output type is a8F (since I'm using the NDK, for which there is no A8Fnv library).
The compiler version is ARM 5.0.1.
The library is rtsv7A8_T_le_n_v3_eabi.lib.
Regards,
Vivek