MAC operation with VICP

Abhishek Singh Sisodia

Other Parts Discussed in Thread: CCSTUDIO

I am working on the DM648. In my application I perform a MAC (multiply-accumulate) operation. Due to performance issues I want to do this task with VICP. I am using VICP version 3.2.0, and it seems there is no MAC operation in this version, which surprised me. After some analysis I found that we can perform this operation (a0*b0 + a1*b1, where a0 and a1 are matrices and b0 and b1 are constants) using two APIs, namely CPIS_arrayScalarOp() and CPIS_matMul(). I have to do two input DMA operations and two output DMA operations, so it seems quite difficult to schedule and perform this task. Moreover, the DMA overhead also seems to be huge, since my operational buffers are 1920*1080.
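To judge whether VICP is worth it, it helps to have a plain-C baseline of the same operation running on the DSP; the same code also serves as a golden reference for checking VICP output. This is a sketch, not a TI API: the name mac2_ref and the data types (8-bit pixels, 16-bit coefficients, 32-bit results, chosen so the products cannot overflow) are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Plain-C reference for out = a0*b0 + a1*b1, element by element.
 * a0, a1: input arrays of n 8-bit samples (e.g. a 1920*1080 plane).
 * b0, b1: scalar coefficients (assumed 16-bit).
 * out:    32-bit results, wide enough that 255*32768*2 cannot overflow. */
void mac2_ref(const uint8_t *a0, const uint8_t *a1,
              int16_t b0, int16_t b1,
              int32_t *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        out[i] = (int32_t)a0[i] * b0 + (int32_t)a1[i] * b1;
    }
}
```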

My question is: "Is it advantageous to do such a computation with VICP?" Although it would offload the DSP, I have little else to compute on the DSP in parallel.

Moreover, does TI have such an implementation using the VICP API? Please let me know, since it will save a lot of effort and will help me judge how advantageous VICP is in my implementation.

Thanks and Regards

Abhishek Singh

over 15 years ago

Gagan Maur over 15 years ago

TI__Expert 8150 points

Can you please describe your processing? Have you considered CPIS_filter?

> Moreover, the DMA overhead also seems to be huge, since my operational buffers are 1920*1080.
You can chain the APIs so that both processing steps are applied back to back and the frame doesn't have to be refetched. This capability is enabled by using the underlying libraries (the computation library and the scheduling library), as described in the document "VICP Computation Unit Library and VICP Scheduling Unit Library for DM6446, DM6441, DM647, and DM648" here: http://focus.ti.com/docs/toolsw/folders/print/sprc831.html

> Though we can offload the DSP, I have nothing much to compute on the DSP in parallel.
Can you split the frame? Have half the frame processed by the DSP and the other half by VICP?
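A rough sketch of such a split, assuming the VICP part should cover only whole processing blocks. split_frame and FrameSplit are hypothetical helpers, not part of the VICP library, and the share percentage would have to be tuned by profiling both paths.

```c
/* Hypothetical helper: decide which rows go to VICP and which stay on the DSP.
 * vicp_share_pct: percentage of rows given to VICP (clamped to 0..100).
 * block_rows:     VICP processing-block height; the VICP part is rounded down
 *                 to a multiple of it so VICP only ever sees whole blocks. */
typedef struct {
    unsigned vicp_first, vicp_rows; /* VICP processes rows [vicp_first, vicp_first + vicp_rows) */
    unsigned dsp_first, dsp_rows;   /* DSP processes rows [dsp_first, dsp_first + dsp_rows)   */
} FrameSplit;

FrameSplit split_frame(unsigned height, unsigned vicp_share_pct, unsigned block_rows)
{
    FrameSplit s;
    if (vicp_share_pct > 100u)
        vicp_share_pct = 100u;
    unsigned vicp_rows = (unsigned)((unsigned long)height * vicp_share_pct / 100u);
    if (block_rows > 0u)
        vicp_rows -= vicp_rows % block_rows; /* whole VICP blocks only */
    s.vicp_first = 0u;
    s.vicp_rows = vicp_rows;
    s.dsp_first = vicp_rows;                 /* DSP takes the remainder */
    s.dsp_rows = height - vicp_rows;
    return s;
}
```

For a 1080-line frame with a 50% share and 16-row blocks, VICP would get rows 0..527 and the DSP rows 528..1079; the DSP rows can be processed while VICP works on its part.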

Regards,
Gagan

Abhishek Singh Sisodia over 15 years ago in reply to Gagan Maur

Intellectual 535 points

Hi Gagan

Thanks for your reply. Actually, I have to perform bilinear interpolation. I am doing this by performing vertical and then horizontal interpolation. I doubt that I can use any of the existing APIs for interpolation, so I decided to implement vertical and horizontal interpolation separately using VICP. For vertical interpolation my operations are as follows:

row1 - array of first row elements.

row2 - array of second row elements.

weights1 - array of weights for first row.

weights2 - array of weights for second row.

So the operation is: output_array = row1*weights1 + row2*weights2
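For reference, the vertical step in plain C, assuming 8-bit pixels and Q8 fixed-point weights with weights1[i] + weights2[i] == 256 (that is, 1.0). The name vert_interp_q8 and the Q8 format are illustrative assumptions; the point is the element-wise multiply, add, round, and shift that the VICP array operations would have to reproduce.

```c
#include <stddef.h>
#include <stdint.h>

/* Vertical pass: out[i] = row1[i]*w1[i] + row2[i]*w2[i], computed in Q8.
 * Assumes w1[i] + w2[i] == 256 so the rounded result stays in 0..255. */
void vert_interp_q8(const uint8_t *row1, const uint8_t *row2,
                    const uint16_t *w1, const uint16_t *w2,
                    uint8_t *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        uint32_t acc = (uint32_t)row1[i] * w1[i] + (uint32_t)row2[i] * w2[i];
        out[i] = (uint8_t)((acc + 128u) >> 8); /* round to nearest, back to 8 bits */
    }
}
```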

I will then use this output_array for the horizontal interpolation.
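The horizontal step differs in one respect: each output pixel reads two neighboring input pixels at a position that depends on the scale factor, so it is a gather rather than a pure element-wise operation. A minimal plain-C sketch, assuming precomputed Q8 index/fraction tables and a scaler that maps the first and last input pixels onto the first and last output pixels; build_x_tables and horiz_interp_q8 are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Precompute, for each output pixel x, the left source index x0[x] and the
 * Q8 fraction f[x] (0..256) toward x0[x] + 1. Requires in_w >= 2. */
void build_x_tables(unsigned in_w, unsigned out_w, uint16_t *x0, uint16_t *f)
{
    for (unsigned x = 0; x < out_w; ++x) {
        /* Q8 source position; endpoints map onto endpoints */
        uint32_t pos = (out_w > 1u) ? (uint32_t)x * (in_w - 1u) * 256u / (out_w - 1u) : 0u;
        uint32_t i = pos >> 8;
        uint32_t frac = pos & 255u;
        if (i >= in_w - 1u) { /* last input pixel: put the full weight on in[in_w - 1] */
            i = in_w - 2u;
            frac = 256u;
        }
        x0[x] = (uint16_t)i;
        f[x] = (uint16_t)frac;
    }
}

/* Horizontal pass: out[x] = in[x0]*(256 - f) + in[x0 + 1]*f, rounded to 8 bits. */
void horiz_interp_q8(const uint8_t *in, const uint16_t *x0, const uint16_t *f,
                     uint8_t *out, size_t out_len)
{
    for (size_t x = 0; x < out_len; ++x) {
        uint32_t a = in[x0[x]];
        uint32_t b = in[x0[x] + 1u];
        uint32_t acc = a * (256u - f[x]) + b * f[x];
        out[x] = (uint8_t)((acc + 128u) >> 8);
    }
}
```

Scaling the 3-pixel line {0, 100, 200} to 5 pixels with these helpers gives {0, 50, 100, 150, 200}.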

Please let me know how I can do this.

One more question: for alpha blending the required input is 422_ILE, but my input is 422 planar. So I need to call the YCbCr pack and YCbCr unpack APIs before and after alpha blending. Correct me if I am wrong, or if there is another way.

Thanks and Regards

Gagan Maur over 15 years ago in reply to Abhishek Singh Sisodia

TI__Expert 8150 points

You will need to use the array_op APIs. There are two things you can do. The first is to use the CPIS APIs CPIS_arrayOp or CPIS_arrayScalarOp, depending on whether weights1 and weights2 are arrays or scalars. This method is easy but has IO overhead, as you understand. The other option is to work at the level of the computation unit library and chain the array operations together. Basically (pseudo code):

imxenc_array_op (row1 * weight1);
imxenc_array_op (row2 * weight2);
imxenc_array_op (row1 + row2);
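The pseudo code above can be modelled in plain C to show why chaining avoids extra IO: each step is one element-wise pass over a block that is already in local memory, and each result overwrites an input buffer instead of going out by DMA. This is not the imxenc API (its signatures are not shown here); the helper names, the 16-bit working buffers, and the Q7 weights (weights1[i] + weights2[i] == 128, so 8-bit pixel products fit in 16 bits) are assumptions. The final right shift by 7, which brings the sum back to pixel range, is an addition to the three pseudo-code steps.

```c
#include <stddef.h>
#include <stdint.h>

/* Steps 1 and 2: dst[i] = dst[i] * w[i], written back in place. */
static void array_mul_inplace(int16_t *dst, const int16_t *w, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = (int16_t)(dst[i] * w[i]);
}

/* Step 3: dst[i] = (dst[i] + src[i] + rounding) >> shift, written back in place.
 * Assumes non-negative values, so the right shift is a plain division. */
static void array_add_shift_inplace(int16_t *dst, const int16_t *src,
                                    unsigned shift, size_t n)
{
    const int32_t round = (shift > 0u) ? (1 << (shift - 1u)) : 0;
    for (size_t i = 0; i < n; ++i)
        dst[i] = (int16_t)(((int32_t)dst[i] + src[i] + round) >> shift);
}

/* Chained vertical interpolation on one block: row1 <- (row1*w1 + row2*w2) >> 7.
 * Both row buffers are overwritten; no intermediate buffer is needed. */
void vert_interp_chained(int16_t *row1, int16_t *row2,
                         const int16_t *w1, const int16_t *w2, size_t n)
{
    array_mul_inplace(row1, w1, n);            /* row1 * weight1 */
    array_mul_inplace(row2, w2, n);            /* row2 * weight2 */
    array_add_shift_inplace(row1, row2, 7u, n); /* row1 + row2, back to pixel range */
}
```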

You can refer to the examples here: C:\CCStudio_v3.3\c64plus\vicplib\src\src_hw
In particular, look at C:\CCStudio_v3.3\c64plus\vicplib\src\src_hw\_alphablend.c, which has an example of chaining array_op APIs. This method is more efficient and has minimal IO overhead, but is a little more work.

> One more question: for alpha blending the required input is 422_ILE, but my input is 422 planar. So I need to call the YCbCr pack and YCbCr unpack APIs before and after alpha blending. Correct me if I am wrong, or if there is another way.

Yes, the above is what you have to do. Alternatively, you can apply alpha blending using multiple array_op APIs.
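For reference, the conversion that the pack/unpack step performs on one line is sketched below. The Cb Y0 Cr Y1 byte order is an assumption; check which order the library's 422_ILE format actually uses. The function names are hypothetical. With a single global alpha, blending each plane separately avoids both conversions, since the blend is a per-sample operation.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack one line of planar 4:2:2 (Y: w samples, Cb/Cr: w/2 samples each)
 * into interleaved Cb Y0 Cr Y1 order (assumed). w must be even. */
void pack422_line(const uint8_t *y, const uint8_t *cb, const uint8_t *cr,
                  uint8_t *ile, size_t w)
{
    for (size_t i = 0; i < w / 2; ++i) {
        ile[4 * i + 0] = cb[i];
        ile[4 * i + 1] = y[2 * i];
        ile[4 * i + 2] = cr[i];
        ile[4 * i + 3] = y[2 * i + 1];
    }
}

/* Inverse: interleaved Cb Y0 Cr Y1 back to the three planes. */
void unpack422_line(const uint8_t *ile, uint8_t *y, uint8_t *cb, uint8_t *cr,
                    size_t w)
{
    for (size_t i = 0; i < w / 2; ++i) {
        cb[i]        = ile[4 * i + 0];
        y[2 * i]     = ile[4 * i + 1];
        cr[i]        = ile[4 * i + 2];
        y[2 * i + 1] = ile[4 * i + 3];
    }
}
```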

Regards,
Gagan

Abhishek Singh Sisodia over 15 years ago in reply to Gagan Maur

Intellectual 535 points

Thanks for the help Gagan.

But there are some more issues on which I would like clarification. Here they are:

To implement alpha blending, I plan to modify the CPIS_arrayOp() VICP call. The reason is that this API takes two input buffer pointers, so the two input DMAs are already happening. Thus I can simply edit the _CPIS_setArrayOpProcessing() function and perform these operations:

imxenc_array_scalar_op() //for foreground

imxenc_array_scalar_op() //for background

imxenc_array_op() // foreground + background
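A plain-C model of this three-step chain, useful for checking the modified processing function against. The names, the 16-bit working buffers, and the Q7 scalar alpha (0..128, with 128 meaning fully foreground) are assumptions, and the final rounding shift is added to return to pixel range. Results are written back into the input buffers; for planar 4:2:2 input the function would be called once per plane (Y, Cb, Cr).

```c
#include <stddef.h>
#include <stdint.h>

/* Array-scalar op: buf[i] = buf[i] * k, written back in place. */
static void scalar_mul_inplace(int16_t *buf, int16_t k, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        buf[i] = (int16_t)(buf[i] * k);
}

/* fg <- (fg*alpha + bg*(128 - alpha) + 64) >> 7, for 8-bit samples in 16-bit buffers. */
void alpha_blend_plane_q7(int16_t *fg, int16_t *bg, int16_t alpha, size_t n)
{
    scalar_mul_inplace(fg, alpha, n);                   /* foreground */
    scalar_mul_inplace(bg, (int16_t)(128 - alpha), n);  /* background */
    for (size_t i = 0; i < n; ++i)                      /* foreground + background */
        fg[i] = (int16_t)(((int32_t)fg[i] + bg[i] + 64) >> 7);
}
```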

Now my questions are:

1. Do I need to use local scratch memory (coefficient memory) to hold the intermediate results of the two imxenc_array_scalar_op() calls?

2. Can I reuse the same input buffer, i.e. use the source buffer as the destination buffer too? If yes, what is the cost in terms of performance, and how will it affect the maximum processing block size (will it be reduced by half)?

3. If I customize the library and add the changed files to my project, will they override the current library implementation? (I guess they will.)

I hope my choice of customizing CPIS_arrayOp() for alpha blending in planar format is a good and easier option. (Please comment.)

Reply awaited.

Regards

Abhishek Singh

Gagan Maur over 15 years ago in reply to Abhishek Singh Sisodia

TI__Expert 8150 points

> To implement alpha blending, I plan to modify the CPIS_arrayOp() VICP call.

Sounds good.

> 2. Can I reuse the same input buffer, i.e. use the source buffer as the destination buffer too? If yes, what is the cost in terms of performance, and how will it affect the maximum processing block size (will it be reduced by half)?

Yes. I think that is what you should do. Use the source buffer as the destination buffer for the first two APIs (imxenc_array_scalar_op) and use one of the source buffers as the destination buffer for the third API (imxenc_array_op). There is no performance hit for doing that.
The block size will not double. You need to bring both bufferA and bufferB into memory, just as is done for the CPIS_arrayOp API. Note that CPIS_arrayOp also overlaps the destination with one of the source buffers. Thus, the IMGBUF used is basically for the two input arrays.

> 3. If I customize the library and add the changed files to my project, will they override the current library implementation? (I guess they will.)

You can do that, or you can add the function with a different name. If you give it a different name, you will need to update the header files below to keep the same interface:
C:\CCStudio_v3.3\c64plus\vicplib\src\inc\_vicplib.h
C:\CCStudio_v3.3\c64plus\vicplib\inc\vicplib.h

> I hope my choice of customizing CPIS_arrayOp() for alpha blending in planar format is a good and easier option. (Please comment.)

Yes it is.

Gagan

Abhishek Singh Sisodia over 15 years ago in reply to Gagan Maur

Intellectual 535 points

Hi Gagan

I have done it successfully.

Thanks for great help.

Regards

Abhishek Singh
