DSP: TMS320C6455
Board: customized
Frequency: 1 GHz
VLIB: vlib_c64Px_3_2_1_0
CCS: 5.5
Compiler: 7.4.14 DEBUG
Command: -mv64+ --abi=coffabi -O3 --include_path="C:/ti/ccsv5/tools/compiler/c6000_7.4.14/include" --include_path="C:/Users/Hardware/Desktop/TLD_C6455_20150809_1/inc" --define=c6455 --display_error_number --diag_warning=225 --diag_wrap=off
Code:
int16_t *gradx = memalign(8, width * height * sizeof(int16_t));
assert(gradx != NULL);
int16_t *grady = memalign(8, width * height * sizeof(int16_t));
assert(grady != NULL);
uint8_t *previousImage = memalign(CACHE_L1P_LINESIZE, TLD_IMG_WIDTH * TLD_IMG_HEIGHT * sizeof(uint8_t));
assert(previousImage != NULL);
int a = TSCL;
VLIB_xyGradients(previousImage, gradx + width + 1, grady + width + 1, width, height - 1);
// width = 512, height = 384, pt = width * height = 196608
int b = TSCL - a;
Result: b = 2,755,536 cycles (about 14 cycles/pt), which is much slower than the figure shown in the VLIB cycle benchmarks (about 1 cycle/pt on average).
Is there anything wrong in my setup, or something I have missed?
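For reference, this is roughly how I get the cycles/pt figure from the measured count. It is only a minimal sketch: the helper names elapsed_cycles and cycles_per_point are my own (hypothetical, not part of VLIB or the CSL), and it assumes the two counter reads are taken as 32-bit unsigned values so that the subtraction stays valid if the counter wraps once.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: elapsed cycles between two 32-bit counter reads.
 * Unsigned subtraction wraps modulo 2^32, so the result stays correct
 * even if the counter overflowed once between the two reads. */
static uint32_t elapsed_cycles(uint32_t start, uint32_t end)
{
    return (uint32_t)(end - start);
}

/* Hypothetical helper: average cycles per pixel for a width x height image. */
static double cycles_per_point(uint32_t cycles, uint32_t width, uint32_t height)
{
    assert(width > 0 && height > 0);
    return (double)cycles / ((double)width * (double)height);
}
```

With the numbers from this post, cycles_per_point(2755536, 512, 384) divides 2,755,536 by 196,608 points, which gives roughly 14 cycles/pt.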
Best regards