I used GenCodecPkg to create an algorithm that runs on the DSP of a DaVinci DM6467T, using the IUNIVERSAL interface defined by XDM. The algorithm is a no-frills video deinterlacer using linear interpolation: it throws away the odd lines and regenerates each one as the average of the pixels directly above and below. I use an inOut buffer, and the deinterlacing algorithm operates in place on that buffer. I have modified the encode example application provided with DVSDK 3.10 to call the algorithm. The encode example has four threads: main, capture, video, and write.
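In other words, each discarded odd line is regenerated roughly like this (illustrative only; "pitch" stands for the 720-byte row pitch used in the real loop, which is in the process() code at the end of this post):

ioBuf[y*pitch + x] = (ioBuf[(y-1)*pitch + x] + ioBuf[(y+1)*pitch + x]) / 2;   /* for each odd row y */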
The algorithm compiles and runs, but exhibits strange behavior. The top 3/4 of the video is deinterlaced properly, but the bottom 1/4 is intermittent. Some lines are deinterlaced correctly, other lines are left untouched, and some lines are partially untouched and partially deinterlaced. It is worth noting that when a line is partially deinterlaced, the blocks of successfully deinterlaced data are always a multiple of 128 bytes long. In other words, only certain 128-byte blocks of data are successfully copied back to the buffer during the last 25% of iterations of the primary for loop in the algorithm.
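For what it's worth, this is roughly the kind of check that exposes the 128-byte pattern (a hypothetical sketch only, not my actual test code; the function name, the 0xFF marker value, and the 960 rows x 720 bytes layout are assumptions for illustration): fill the odd lines with the marker before calling UNIVERSAL_process(), then scan each odd line afterwards and report the runs of marker bytes that survived.

#include <stdio.h>

/* Hypothetical diagnostic on the ARM side, not part of the application:
 * any marker byte still present after UNIVERSAL_process() returns was
 * never written back to the buffer by the DSP. */
static void report_untouched_runs(const unsigned char *buf)
{
    int row, col, run;
    for (row = 1; row < 960; row += 2)
    {
        run = 0;
        for (col = 0; col <= 720; col++)
        {
            if (col < 720 && buf[row * 720 + col] == 0xFF)
            {
                run++;    /* still the pre-filled marker */
            }
            else if (run > 0)
            {
                printf("row %d: %d untouched bytes ending at col %d\n", row, run, col);
                run = 0;
            }
        }
    }
}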
When I use a simple line-doubling algorithm instead, which performs fewer memory accesses and fewer operations and therefore runs faster, the full video is deinterlaced properly. To dig further, I reversed the order of the linear-interpolation deinterlacer so that it starts at the bottom of the video and works its way up to the top. As suspected, the intermittent failed lines then occur at the top, further supporting the idea that the problem appears some time after the algorithm starts running.
Any ideas?
Tool Versions:
DVSDK 3.10.00.19
Code Gen Tools 7.3.0
CCS 5.0.3.00028 (Linux)
The setup and calling of the algorithm in the application (video.c from the encode example):
// Set up the input/output args and the inOut buffer descriptor
genericInArgs.base.size = sizeof(IGENERIC_InArgs);
genericInArgs.input_foo = envp->deinterlaceAlgorithm;
genericOutArgs.base.size = sizeof(IGENERIC_OutArgs);

genericInOutBufs.numBufs = 1;
genericInOutBufs.descs[0].bufSize = 691200; // 518400 691200 (691200 = 960 rows x 720 bytes)
genericInOutBufs.descs[0].buf = (XDAS_Int8 *)Buffer_getUserPtr(hCapBuf);

UNIVERSAL_process(hGeneric, NULL, NULL, &genericInOutBufs,
                  (IUNIVERSAL_InArgs *)&genericInArgs,
                  (IUNIVERSAL_OutArgs *)&genericOutArgs);
The process() function of the algorithm running on the DSP:
/*
 * ======== GENERIC_GR_process ========
 */
/* ARGSUSED - this line tells the TI compiler not to warn about unused args. */
XDAS_Int32 GENERIC_GR_process(IUNIVERSAL_Handle h,
    XDM1_BufDesc *inBufs, XDM1_BufDesc *outBufs, XDM1_BufDesc *inOutBufs,
    IUNIVERSAL_InArgs *universalInArgs,
    IUNIVERSAL_OutArgs *universalOutArgs)
{
    /* Local casted variables to ease operating on our extended fields */
    IGENERIC_InArgs *inArgs = (IGENERIC_InArgs *)universalInArgs;
    IGENERIC_OutArgs *outArgs = (IGENERIC_OutArgs *)universalOutArgs;

    // Get a pointer to the inOut buffer we deinterlace in place
    unsigned char *ioBuf = (unsigned char *)(inOutBufs->descs[0].buf);

    int i = 0;
    int j = 0;
    int deinterlace_algorithm;

    deinterlace_algorithm = inArgs->input_foo;

    /*
     * Note that the rest of this function will be algorithm-specific. In
     * the initial generated implementation, this process() function simply
     * copies the first inBuf to the first outBuf. But you should modify
     * this to suit your algorithm's needs.
     */
    outArgs->output_foo = 0;

    // Line doubling: copy each even line over the odd line below it
    if (deinterlace_algorithm == 0)
    {
        for (i = 0; i < 960; i += 2)
        //for(i = 0; i < (GR_IMAGE_HEIGHT/2) ; i ++)
        {
            custom_memcpy(&ioBuf[(i + 1) * 720], &ioBuf[i * 720], 720);
            outArgs->output_foo++; //inArgs->input_foo * 5;
        }
    }
    // Linear interpolation: rebuild each odd line as the average of the
    // lines above and below, working from the bottom of the frame upward
    else if (deinterlace_algorithm == 1)
    {
        for (i = 476; i >= 0; i -= 2)
        {
            for (j = 0; j < 720; j++)
            {
                ioBuf[((i + 1) * 720) + j] =
                    (ioBuf[((i + 0) * 720) + j] + ioBuf[((i + 2) * 720) + j]) / 2;
            }
        }
    }

    return (IUNIVERSAL_EOK);
}