Buffers from VFCC are 6 times slower than locally allocated buffers

Josh Watts

I have a simple OMX pipeline set up with the VFCC feeding CMUX, which then calls my own function to process each video frame. The VFCC keeps up with the 45 frames per second my imager produces, but I can't access the data in the frame buffer at anywhere near that rate. My video frames are 16-bit YUV, 1280x960 (2467840 bytes), and it currently takes around 77 ms to perform a fairly simple operation on every pixel in the frame:

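/* View the frame as 64-bit words, i.e. four 16-bit samples per read. */
src = (uint64_t*)be->pEncodeBuffer;
/* Destination is allocated once with malloc() and reused for every frame. */
static uint16_t *shiftBuffer = NULL;
if (shiftBuffer == NULL) {
        shiftBuffer = (uint16_t*)malloc(1280*960*sizeof(uint16_t));
        memset(shiftBuffer, 0, 1280*960*sizeof(uint16_t));
}
dst = shiftBuffer;

/* x counts bytes, so each iteration consumes one 64-bit word from src. */
for (x = 0; x < pMetaData->framesize; x += sizeof(uint64_t)) {
        temp_word = *src;
        *dst = ((temp_word>>12) & 0x0FFF0FFF);
        src++;
        dst++;
}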

If I instead read from a buffer allocated locally in the same way as shiftBuffer, the same block of code takes around 14 ms to execute:

static uint16_t *fakeBuffer = NULL;
if (fakeBuffer == NULL) {
        fakeBuffer = (uint16_t*)malloc(1280*960*sizeof(uint16_t));
}
src = (uint64_t*)fakeBuffer;

static uint16_t *shiftBuffer = NULL;
if (shiftBuffer == NULL) {
        shiftBuffer = (uint16_t*)malloc(1280*960*sizeof(uint16_t));
        memset(shiftBuffer, 0, 1280*960*sizeof(uint16_t));
}
dst = shiftBuffer;

for (x = 0; x < pMetaData->framesize; x += sizeof(uint64_t)) {
        temp_word = *src;
        *dst = ((temp_word>>12) & 0x0FFF0FFF);
        src++;
        dst++;
}

Why is this happening? Is it because the framebuffers were allocated by one of the M3 cores?
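To put the two timings in perspective, here is a minimal sketch of the arithmetic, assuming the 2467840-byte frame size quoted above and counting only the bytes read from the source buffer. The helper names frame_budget_ms and throughput_mb_per_s are made up for illustration and are not part of any OMX API.

```c
#include <assert.h>

/* Time available per frame, in milliseconds, at a given frame rate. */
double frame_budget_ms(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}

/* Effective read throughput in MB/s (1 MB = 1e6 bytes) when one frame
 * of frame_bytes bytes is processed in ms milliseconds. */
double throughput_mb_per_s(double frame_bytes, double ms)
{
    assert(ms > 0.0);
    return frame_bytes / (ms / 1000.0) / 1e6;
}
```

Calling frame_budget_ms(45.0) gives about 22.2 ms per frame. throughput_mb_per_s(2467840.0, 77.0) gives roughly 32 MB/s for the VFCC buffer, and throughput_mb_per_s(2467840.0, 14.0) gives roughly 176 MB/s for the malloc'd buffer. So the 14 ms case fits inside the 45 fps budget and the 77 ms case does not.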

over 14 years ago

RV over 14 years ago

Buffers from video devices and CMEM are usually not cached on the local processor. With the cache disabled, every CPU read of the frame goes out to external memory, which is fine if you do the video processing on the DSP using DMA.

I would check to see if this is the case.
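One rough way to check, sketched below, is to time a plain read pass over each buffer and compare. This is only an illustration: read_pass and time_read_pass_ms are hypothetical helper names, the buffer is assumed to be 8-byte aligned, and clock() reports CPU time, which is a reasonable stand-in for elapsed time in a busy loop like this one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Sum every 64-bit word; volatile keeps the compiler from dropping reads. */
uint64_t read_pass(const volatile uint64_t *src, size_t bytes)
{
    uint64_t sum = 0;
    size_t i, words = bytes / sizeof(uint64_t);

    for (i = 0; i < words; i++)
        sum += src[i];
    return sum;
}

/* CPU time in milliseconds for one read pass over buf.
 * The checksum is stored in *sum_out when sum_out is not NULL. */
double time_read_pass_ms(const void *buf, size_t bytes, uint64_t *sum_out)
{
    clock_t start, end;
    uint64_t sum;

    assert(buf != NULL);
    start = clock();
    sum = read_pass((const volatile uint64_t *)buf, bytes);
    end = clock();
    if (sum_out != NULL)
        *sum_out = sum;
    return 1000.0 * (double)(end - start) / CLOCKS_PER_SEC;
}
```

For the code in the question this would mean calling time_read_pass_ms(be->pEncodeBuffer, pMetaData->framesize, NULL) and then the same on a malloc'd buffer of at least that size that has been memset first, so that page faults are not part of the measurement. A large gap between the two results would be consistent with the frame buffer being mapped uncached, though it does not prove it.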

Josh Watts over 14 years ago in reply to RV

Ah, that would certainly explain it. My next task is indeed to forward the framebuffer pointer over to the DSP for further processing anyway. Thanks for the tip!
