TDA4VM: Custom kernel using MMALIB - C7x - Memory issue

Vineeth CN

Part Number: TDA4VM

Hello TI,

We are trying to create a custom kernel for image rotation using the MMALIB_linalg functions. We are using SDK version 08_02_00_05 and MMALIB version 02_03_00_04.

To achieve rotation, the image matrix is first transposed and then multiplied by a flipped identity matrix. This gives a 90-degree rotation.
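The transpose-then-multiply idea can be checked on a host PC with plain C before MMALIB is involved. The following is only a reference sketch: the helper names (transpose_u8, flipped_identity_u8, matmul_u8, rotate90_cw_u8) are hypothetical and are not MMALIB functions, and it assumes packed row-major 8-bit data. For an image A with H rows and W columns, the transpose is W x H, the flipped identity J is H x H, and the product gives out[r][c] = A[H-1-c][r], which is a 90-degree clockwise rotation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* dst (cols x rows) = transpose of src (rows x cols); both packed row-major. */
static void transpose_u8(const uint8_t *src, uint8_t *dst, size_t rows, size_t cols)
{
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            dst[c * rows + r] = src[r * cols + c];
}

/* Fill an n x n flipped identity: ones on the anti-diagonal, zeros elsewhere. */
static void flipped_identity_u8(uint8_t *j, size_t n)
{
    for (size_t r = 0; r < n; r++)
        for (size_t c = 0; c < n; c++)
            j[r * n + c] = (uint8_t)(c == n - 1 - r);
}

/* dst (m x n) = a (m x k) * b (k x n). Naive reference; the result is
 * truncated to 8 bits, which is exact here because each output element
 * is the sum of a single non-zero product. */
static void matmul_u8(const uint8_t *a, const uint8_t *b, uint8_t *dst,
                      size_t m, size_t k, size_t n)
{
    for (size_t r = 0; r < m; r++) {
        for (size_t c = 0; c < n; c++) {
            unsigned acc = 0;
            for (size_t i = 0; i < k; i++)
                acc += (unsigned)a[r * k + i] * b[i * n + c];
            dst[r * n + c] = (uint8_t)acc;
        }
    }
}

/* Rotate a rows x cols image 90 degrees clockwise into dst (cols x rows).
 * Returns 0 on success, -1 if a temporary buffer could not be allocated. */
static int rotate90_cw_u8(const uint8_t *src, uint8_t *dst, size_t rows, size_t cols)
{
    assert(src != NULL && dst != NULL && rows > 0 && cols > 0);

    uint8_t *t = malloc(rows * cols); /* transposed image, cols x rows */
    uint8_t *j = malloc(rows * rows); /* flipped identity, rows x rows */
    if (t == NULL || j == NULL) {
        free(t);
        free(j);
        return -1;
    }

    transpose_u8(src, t, rows, cols);
    flipped_identity_u8(j, rows);
    matmul_u8(t, j, dst, cols, rows, rows);

    free(t);
    free(j);
    return 0;
}
```

For a 2x3 input {1,2,3 / 4,5,6}, rotate90_cw_u8 produces the 3x2 result {4,1 / 5,2 / 6,3}. Comparing the output of the target kernel against a reference like this makes it easier to separate algorithm errors from memory or DMA problems.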

We started working from vision_apps/apps/basic_demos/app_c7x_kernel as a base and built our code from there onward. The edited custom code for target kernel is attached below for your reference.

We are facing an error while running this app on the EVM. It is a memory-related issue.

We were getting a compilation error when we configured "L2dmemory" to "L2RAM_C7x_1", so we configured "L2dmemory" to "DDR_C7x_1" in the "ti-processor-sdk-rtos-j721e-evm-08_02_00_05_lin_evm/vision_apps/platform/j721e/rtos/c7x_1/j721e_linker_freertos.cmd" file. See the screenshot below for reference.

Now compilation is fine, but we get a run-time error: the application gets stuck midway and we read garbage values.

Suppose my image size is 512x512; I then need to create a flipped identity matrix of size 512x512.

Currently I am hardcoding the values for testing. If my array size is more than 512x200 (>100 KB), I read garbage values for the other parameters in the code. In the screenshot below, with an array size of 320x320, the values are fine but execution still gets stuck.

In the screenshot below, with an array size of 512x300, I am getting garbage values.

app_c7x_target_kernel_img_rot.cpp

I have attached the cpp file for your reference. Please look at the "static vx_status rot_img_pipeline_blocks" function at line 440.

Execution gets stuck at line 564.

Please let me know if you need any other information.

over 2 years ago

0 William Leven over 2 years ago

TI__Intellectual 2410 points

Hello Vineeth,

There is an issue with l2dmemory and the matrixTranspose function that has been fixed in the latest release. Please see https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1176001/tda4vm-error-with-c7x-namespace-while-building-vision-apps-integrated-with-mmalib

Please note that defining the l2dmemory section does not work; the only known fix is to upgrade per the post above.

0 Vineeth CN over 2 years ago in reply to William Leven

Prodigy 130 points

Thank you very much William. I have downloaded the latest SDK (ti-processor-sdk-rtos-j721e-evm-08_05_00_11) and this resolved the issue with .l2dmemory.

But I have some other issues with the MMA transpose library. As mentioned at the beginning, we started working from "vision_apps/apps/basic_demos/app_c7x_kernel" and "/ti-processor-sdk-rtos-j721e-evm-08_05_00_11/mmalib_02_05_00_07/ti/mmalib/src/linalg_c7xmma/MMALIB_LINALG_matrixTranspose_ixX_oxX" as a base and built our code from there onward.

The edited custom code for the target kernel (app_c7x_kernel_transpose), the log file (Log.txt), and the input (img_1.bmp) and output (app_c7x_out_img.bmp) images (after running the custom kernel code) are attached for your reference.

2806.Log.txt

root@j7-evm:/opt/vision_apps# ./run_app_c7x_transpose.sh 
APP: Init ... !!!
MEM: Init ... !!!
MEM: Initialized DMA HEAP (fd=4) !!!
MEM: Init ... Done !!!
IPC: Init ... !!!
IPC: Init ... Done !!!
REMOTE_SERVICE: Init ... !!!
REMOTE_SERVICE: Init ... Done !!!
   823.857228 s: GTC Frequency = 200 MHz
APP: Init ... Done !!!
   823.865227 s:  VX_ZONE_INIT:Enabled
   823.865256 s:  VX_ZONE_ERROR:Enabled
   823.865262 s:  VX_ZONE_WARNING:Enabled
   823.866242 s:  VX_ZONE_INIT:[tivxInitLocal:145] Initialization Done !!!
   823.867401 s:  VX_ZONE_INIT:[tivxHostInitLocal:93] Initialization Done for HOST !!!
VCN: Into kernel register function 
VCN: into kernel validate function 
[C7x_1 ]    823.871955 s: VCN: In transpose create function!!
[C7x_1 ]    823.871990 s: In height: 320        , in width: 180 , In stride: 320
[C7x_1 ]    823.872007 s: BLK width: 320
[C7x_1 ]    823.872031 s: Available size: 385875968     Inside Create Function
[C7x_1 ]    823.872067 s: Iteration: 1   Block Height: 1         MBlock Height: 2        remaining height: 0     total size: 2560
[C7x_1 ]    823.872107 s: Iteration: 2   Block Height: 2         MBlock Height: 4        remaining height: 0     total size: 5120
[C7x_1 ]    823.872147 s: Iteration: 3   Block Height: 4         MBlock Height: 8        remaining height: 4     total size: 10240
[C7x_1 ]    823.872166 s: Num Sets: 90
 Loading [/opt/vision_apps/test_data/psdkra/app_c7x/img_1.bmp] ...
 Running graph ...
[C7x_1 ]    823.914125 s: VCN: In pipline function!!
[C7x_1 ]    823.914159 s: ----------------------------------------------------------------------------------------------------------------------
[C7x_1 ]    823.914194 s:                MMALIB_LINALG_matrixTranspose_ixX_oxX testing starts.
[C7x_1 ]    823.914224 s: ----------------------------------------------------------------------------------------------------------------------
[C7x_1 ]    823.914265 s: | No  | ID  | Status | Num pt  | Kernel Init   | Kernel Compute  | NatC Compute  | Arch. Compute | Efficiency  | Est.n
[C7x_1 ]    823.914304 s: |     |     |        |         |  cyc          |  cyc            |  cyc          | cyc (est.)    | vs Arch.( | cyc (e 
[C7x_1 ]    823.914344 s: ----------------------------------------------------------------------------------------------------------------------
[C7x_1 ]    823.914377 s: VCN: Image pointer setup 
[C7x_1 ]    823.914398 s: VCN: kernel processing parameters setup done !!
[C7x_1 ]    823.914415 s: VCN: In loop - 0 
[C7x_1 ]    823.914430 s: VCN: In loop - 1 
[C7x_1 ]    824.029278 s: VCN: In loop - 2 
[C7x_1 ]    824.144144 s: VCN: In loop - 3 
[C7x_1 ]    824.258983 s: VCN: In loop - 4 
[C7x_1 ]    824.373841 s: VCN: In loop - 5 
[C7x_1 ]    824.488627 s: VCN: In loop - 6 
[C7x_1 ]    824.603396 s: VCN: In loop - 7 
[C7x_1 ]    824.718151 s: VCN: In loop - 8 
[C7x_1 ]    824.833031 s: VCN: In loop - 9 
[C7x_1 ]    824.947841 s: VCN: In loop - 10 
[C7x_1 ]    825.062714 s: VCN: In loop - 11 
[C7x_1 ]    825.177572 s: VCN: In loop - 12 
[C7x_1 ]    825.292396 s: VCN: In loop - 13 
[C7x_1 ]    825.407341 s: VCN: In loop - 14 
[C7x_1 ]    825.522173 s: VCN: In loop - 15 
[C7x_1 ]    825.637035 s: VCN: In loop - 16 
[C7x_1 ]    825.751935 s: VCN: In loop - 17 
[C7x_1 ]    825.866784 s: VCN: In loop - 18 
[C7x_1 ]    825.981569 s: VCN: In loop - 19 
[C7x_1 ]    826.096324 s: VCN: In loop - 20 
[C7x_1 ]    826.211294 s: VCN: In loop - 21 
[C7x_1 ]    826.326156 s: VCN: In loop - 22 
[C7x_1 ]    826.441136 s: VCN: In loop - 23 
[C7x_1 ]    826.555979 s: VCN: In loop - 24 
[C7x_1 ]    826.670761 s: VCN: In loop - 25 
[C7x_1 ]    826.785620 s: VCN: In loop - 26 
[C7x_1 ]    826.900426 s: VCN: In loop - 27 
[C7x_1 ]    827.015299 s: VCN: In loop - 28 
[C7x_1 ]    827.130141 s: VCN: In loop - 29 
[C7x_1 ]    827.244974 s: VCN: In loop - 30 
[C7x_1 ]    827.359840 s: VCN: In loop - 31 
[C7x_1 ]    827.474688 s: VCN: In loop - 32 
[C7x_1 ]    827.597046 s: VCN: In loop - 33 
[C7x_1 ]    827.711933 s: VCN: In loop - 34 
[C7x_1 ]    827.826845 s: VCN: In loop - 35 
[C7x_1 ]    827.941701 s: VCN: In loop - 36 
[C7x_1 ]    828.056518 s: VCN: In loop - 37 
[C7x_1 ]    828.171396 s: VCN: In loop - 38 
[C7x_1 ]    828.286217 s: VCN: In loop - 39 
[C7x_1 ]    828.401119 s: VCN: In loop - 40 
[C7x_1 ]    828.515867 s: VCN: In loop - 41 
[C7x_1 ]    828.630650 s: VCN: In loop - 42 
[C7x_1 ]    828.745442 s: VCN: In loop - 43 
[C7x_1 ]    828.860271 s: VCN: In loop - 44 
[C7x_1 ]    828.975184 s: VCN: In loop - 45 
[C7x_1 ]    829.089942 s: VCN: In loop - 46 
[C7x_1 ]    829.204789 s: VCN: In loop - 47 
[C7x_1 ]    829.319599 s: VCN: In loop - 48 
[C7x_1 ]    829.434377 s: VCN: In loop - 49 
[C7x_1 ]    829.549259 s: VCN: In loop - 50 
[C7x_1 ]    829.664034 s: VCN: In loop - 51 
[C7x_1 ]    829.778892 s: VCN: In loop - 52 
[C7x_1 ]    829.893744 s: VCN: In loop - 53 
[C7x_1 ]    830.008548 s: VCN: In loop - 54 
[C7x_1 ]    830.123408 s: VCN: In loop - 55 
[C7x_1 ]    830.238257 s: VCN: In loop - 56 
[C7x_1 ]    830.353117 s: VCN: In loop - 57 
[C7x_1 ]    830.467965 s: VCN: In loop - 58 
[C7x_1 ]    830.582813 s: VCN: In loop - 59 
[C7x_1 ]    830.697604 s: VCN: In loop - 60 
[C7x_1 ]    830.812468 s: VCN: In loop - 61 
[C7x_1 ]    830.927284 s: VCN: In loop - 62 
[C7x_1 ]    831.042212 s: VCN: In loop - 63 
[C7x_1 ]    831.157053 s: VCN: In loop - 64 
[C7x_1 ]    831.271928 s: VCN: In loop - 65 
[C7x_1 ]    831.386817 s: VCN: In loop - 66 
[C7x_1 ]    831.501647 s: VCN: In loop - 67 
[C7x_1 ]    831.616469 s: VCN: In loop - 68 
[C7x_1 ]    831.731354 s: VCN: In loop - 69 
[C7x_1 ]    831.846411 s: VCN: In loop - 70 
[C7x_1 ]    831.961257 s: VCN: In loop - 71 
[C7x_1 ]    832.076113 s: VCN: In loop - 72 
[C7x_1 ]    832.190946 s: VCN: In loop - 73 
[C7x_1 ]    832.305823 s: VCN: In loop - 74 
[C7x_1 ]    832.420696 s: VCN: In loop - 75 
[C7x_1 ]    832.535553 s: VCN: In loop - 76 
[C7x_1 ]    832.650338 s: VCN: In loop - 77 
[C7x_1 ]    832.765214 s: VCN: In loop - 78 
[C7x_1 ]    832.880047 s: VCN: In loop - 79 
[C7x_1 ]    832.994885 s: VCN: In loop - 80 
[C7x_1 ]    833.109733 s: VCN: In loop - 81 
[C7x_1 ]    833.224534 s: VCN: In loop - 82 
[C7x_1 ]    833.339396 s: VCN: In loop - 83 
[C7x_1 ]    833.454244 s: VCN: In loop - 84 
[C7x_1 ]    833.569086 s: VCN: In loop - 85 
[C7x_1 ]    833.683893 s: VCN: In loop - 86 
[C7x_1 ]    833.798782 s: VCN: In loop - 87 
[C7x_1 ]    833.913545 s: VCN: In loop - 88 
[C7x_1 ]    834.028401 s: VCN: In loop - 89 
[C7x_1 ]    834.143252 s: VCN: In loop - 90 
 Saving [app_c7x_out_img.bmp] ...
 Done !!!
   834.261086 s:  VX_ZONE_INIT:[tivxHostDeInitLocal:107] De-Initialization Done for HOST !!!
[C7x_1 ]    834.258093 s: VCN: In loop - 91 
[C7x_1 ]    834.258115 s: VCN: In loop - 92 
[C7x_1 ]    834.258130 s: VCN: In loop - 93 
[C7x_1 ]    834.258148 s: UDMA : ERROR: TR Response not completed!!
[C7x_1 ]    834.258163 s: UDMA : ERROR: TR Response not completed!!
   834.265453 s:  VX_ZONE_INIT:[tivxDeInitLocal:223] De-Initialization Done !!!
APP: Deinit ... !!!
REMOTE_SERVICE: Deinit ... !!!
REMOTE_SERVICE: Deinit ... Done !!!
IPC: Deinit ... !!!
IPC: DeInit ... Done !!!
MEM: Deinit ... !!!
DDR_SHARED_MEM: Alloc's: 5 alloc's of 408458 bytes 
DDR_SHARED_MEM: Free's : 5 free's  of 408458 bytes 
DDR_SHARED_MEM: Open's : 0 allocs  of 0 bytes 
DDR_SHARED_MEM: Total size: 536870912 bytes 
MEM: Deinit ... Done !!!
APP: Deinit ... Done !!!
root@j7-evm:/opt/vision_apps#

I have made some modifications in other files. Details are below.

1) I modified vision_apps/platform/j721e/rtos/concerto_c7x_inc.mak as below (added these lines).

LDIRS += $(VISION_APPS_PATH)/../mmalib_02_05_00_07/lib/C7100/debug
$(info $$LDIRS is [${LDIRS}])
#STATIC_LIBS += common_$(TARGET_CPU)00
STATIC_LIBS += common_C7100
#STATIC_LIBS += test_$(TARGET_CPU)00
STATIC_LIBS += test_C7100
STATIC_LIBS += mmalib_C7100
STATIC_LIBS += mmalib_cn_C7100
STATIC_LIBS += vx_app_c7x_target_kernel_rot

2) Commented out TI_cache_inval() in TI_profile.h

3) Added the lines below in

vision_apps/platform/j721e/rtos/common/app_init.c >>

        appInit >>

            appRegisterOpenVXTargetKernels >>

 #ifdef C71
 {
 void app_c7x_target_kernel_img_add_register(void);
 app_c7x_target_kernel_img_add_register();

 //void app_c7x_target_kernel_img_rot_register(void);
 //app_c7x_target_kernel_img_rot_register();

 void app_c7x_target_kernel_img_transpose_register(void);
 app_c7x_target_kernel_img_transpose_register();
 }
 #endif

4) Copied the C7100 MMA libs into vision_apps

from "/ti-processor-sdk-rtos-j721e-evm-08_05_00_11/mmalib_02_05_00_07/lib/C7100/debug"

to "/ti-processor-sdk-rtos-j721e-evm-08_05_00_11/vision_apps/out/J7/C71/FREERTOS/debug"

I am not getting the transposed output image as expected.

Please let me know if you need any other information. Kindly help me to resolve this issue.

Thanks and Regards,

Vineeth

app_c7x_kernel_transpose.zip

+1 William Leven over 2 years ago in reply to Vineeth CN

TI__Intellectual 2410 points

Hi Vineeth,

Let's start by checking the inputs to MMALIB_LINALG_matrixTranspose.

1. Would you share the fields and values of Transpose_InitArgs at this line:

     MMALIB_STATUS status_init = MMALIB_LINALG_matrixTranspose_ixX_oxX_init_checkParams(

2. Would you share the value of status_init as a result of the above line.

3. Would you share the value of status_init as a result of this line:

  status_init = MMALIB_LINALG_matrixTranspose_ixX_oxX_init(TransposeKernelHandle,

4. To make sure the correct data is getting to the matrixTranspose function, would you print/check the values of the input pSrc0L2[ping_npong] just prior to:

status_opt = MMALIB_LINALG_matrixTranspose_ixX_oxX_exec(TransposeKernelHandle,
                                                                        pSrc0L2[ping_npong],
                                                                        pOutTranspose);

5. It's possible matrixTranspose has correct inputs and is generating the correct output, but the DMA of the results is not good. To begin investigating this, please print the result just after the above line.

With these steps we should be able to determine if the issue is with the DMA of the data coming in, the parameter settings for matrixTranspose or the DMA of the results back out.
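Steps 4 and 5 can be done with two small helpers, sketched here in plain C. Both names (dump_corner_u8 and count_transpose_mismatches_u8) are hypothetical and not part of MMALIB or TIOVX, and the sketch assumes 8-bit row-major data with strides given in bytes. The first prints a corner of a buffer so the input can be inspected just before the exec call and the output just after it; the second counts elements where the output is not the transpose of the input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Print the top-left corner (at most max_rows x max_cols) of a row-major
 * 8-bit buffer. stride_bytes is the distance in bytes between row starts. */
static void dump_corner_u8(const char *tag, const uint8_t *buf,
                           size_t rows, size_t cols, size_t stride_bytes,
                           size_t max_rows, size_t max_cols)
{
    assert(tag != NULL && buf != NULL && stride_bytes >= cols);

    size_t nr = (rows < max_rows) ? rows : max_rows;
    size_t nc = (cols < max_cols) ? cols : max_cols;

    printf("%s: %lu rows x %lu cols, stride %lu bytes\n", tag,
           (unsigned long)rows, (unsigned long)cols, (unsigned long)stride_bytes);
    for (size_t r = 0; r < nr; r++) {
        for (size_t c = 0; c < nc; c++)
            printf(" %3u", (unsigned)buf[r * stride_bytes + c]);
        printf("\n");
    }
}

/* Count positions where dst[c][r] differs from src[r][c].
 * src is rows x cols, dst is cols x rows; a result of 0 means dst is
 * an exact transpose of src. */
static size_t count_transpose_mismatches_u8(const uint8_t *src, size_t src_stride,
                                            const uint8_t *dst, size_t dst_stride,
                                            size_t rows, size_t cols)
{
    assert(src != NULL && dst != NULL);
    assert(src_stride >= cols && dst_stride >= rows);

    size_t bad = 0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            if (dst[c * dst_stride + r] != src[r * src_stride + c])
                bad++;
    return bad;
}
```

In the kernel, dump_corner_u8 would be called on pSrc0L2[ping_npong] right before MMALIB_LINALG_matrixTranspose_ixX_oxX_exec and on pOutTranspose right after it. If the input corner already looks wrong, the incoming DMA is the suspect; if the input is right but the mismatch count is non-zero, the parameter settings are; if both are fine, the outgoing DMA is.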

-Will

0 Vineeth CN over 2 years ago in reply to William Leven

Prodigy 130 points

Hi William,

Thank you very much for your valuable inputs. I have resolved most of the issues and am now getting data in the "pSrc0L2" buffer (the input to matrixTranspose).

But now I am facing another issue. I am testing matrixTranspose with hardcoded values, and I have observed that it works correctly only with square matrices (for example, a 3x3 matrix). In my test I gave it a 4x3 matrix, but the output I get is only 3x3, and the last row is filled with zeros. See the screenshot below for reference.

The actual image I am trying to transpose is 320x180, so because of this issue part of my transposed array is filled with zeros. The code is attached for your reference. Please let me know if I am doing anything wrong.

I would highly appreciate your quick help.

app_c7x_target_kernel_img_transpose.cpp

Thanks and regards,

Vineeth

0 William Leven over 2 years ago in reply to Vineeth CN

TI__Intellectual 2410 points

Vineeth,

Looks like you've made great progress and are almost there. One quick fix and I think you'll have the hardcoded matrix transposing. The current code you sent me has

   vxlib_src01.data_type=MMALIB_INT8;
   vxlib_src01.dim_x= 4;
   vxlib_src01.dim_y= 3;//prms->blkHeight;
   vxlib_src01.stride_y= 3;//prms->blkHeight;
   printf("VCN: Input data blkWidth valus is %d \n",vxlib_src01.dim_x);
   printf("VCN: Input data blkHeight and stride value is %d \n",vxlib_src01.dim_y);

   vxlib_dst1.data_type=MMALIB_INT8;
   vxlib_dst1.dim_x= 3;//prms->blkHeight;
   vxlib_dst1.dim_y= 4;
   vxlib_dst1.stride_y= 4;

The definition of the "stride_y" field of MMALIB_bufParams2D_t (i.e. vxlib_src01.stride_y and vxlib_dst1.stride_y) is the number of bytes from the beginning of one row to the beginning of the next. Assuming there are no gaps in your data, this would mean that vxlib_src01.stride_y == vxlib_src01.dim_x (i.e. the width of the matrix). Your current code is instead using the height (vxlib_src01.dim_y) of the matrix for this parameter.

If you change this bit of code to this, it should work...

 vxlib_src01.data_type=MMALIB_INT8;
   vxlib_src01.dim_x= 4;
   vxlib_src01.dim_y= 3;//prms->blkHeight;
   vxlib_src01.stride_y= 4;//prms->blkWidth;  CHANGED THIS...
   printf("VCN: Input data blkWidth value is %d \n",vxlib_src01.dim_x);
   printf("VCN: Input data blkHeight and stride value is %d \n",vxlib_src01.dim_y);

   vxlib_dst1.data_type=MMALIB_INT8;
   vxlib_dst1.dim_x= 3;//prms->blkHeight;
   vxlib_dst1.dim_y= 4;
   vxlib_dst1.stride_y= 3; // ...AND CHANGED THIS TOO

-Will

0 Vineeth CN over 2 years ago in reply to William Leven

Prodigy 130 points

Hi William,

After making the changes you mentioned, I am still not getting the expected output. The output and a screenshot of the changed code are attached for your reference.

Thanks and regards,

Vineeth

+1 William Leven over 2 years ago in reply to Vineeth CN

TI__Intellectual 2410 points

Looks like I missed that your input matrix has 4 rows and 3 columns, which means that vxlib_src01.dim_x = 3 and vxlib_src01.dim_y = 4 and vxlib_src01.stride_y = 3, and vice versa for vxlib_dst1. Hopefully swapping those all around will solve the issue.

dim_x -> columns

dim_y -> rows

stride_y -> number of bytes from start of 1 row to start of the next
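The three rules above can be written down as a small helper so the source and destination descriptions cannot drift apart. This is only a sketch: buf2d_sketch_t is a hypothetical stand-in that mirrors the three fields discussed here, not the real MMALIB_bufParams2D_t (which also carries data_type), and it assumes packed rows with no padding between them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the dim_x / dim_y / stride_y fields. */
typedef struct {
    uint32_t dim_x;    /* number of columns (elements per row) */
    uint32_t dim_y;    /* number of rows */
    int32_t  stride_y; /* bytes from the start of one row to the start of the next */
} buf2d_sketch_t;

/* Describe a packed matrix with the given number of rows and columns.
 * elem_bytes is the size of one element (1 for 8-bit data, 4 for 32-bit). */
static buf2d_sketch_t packed_desc(uint32_t rows, uint32_t cols, uint32_t elem_bytes)
{
    assert(rows > 0 && cols > 0 && elem_bytes > 0);

    buf2d_sketch_t d;
    d.dim_x = cols;
    d.dim_y = rows;
    d.stride_y = (int32_t)(cols * elem_bytes);
    return d;
}

/* Describe the packed transpose of src: rows and columns swap, and the
 * stride is recomputed from the new width rather than copied over. */
static buf2d_sketch_t transposed_desc(buf2d_sketch_t src, uint32_t elem_bytes)
{
    return packed_desc(src.dim_x, src.dim_y, elem_bytes);
}
```

For the 4-row by 3-column 8-bit test matrix, packed_desc(4, 3, 1) gives dim_x = 3, dim_y = 4, stride_y = 3, and transposed_desc of that gives dim_x = 4, dim_y = 3, stride_y = 4. The key point is that stride_y always follows dim_x of the same buffer, so it has to change whenever dim_x and dim_y are swapped.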

-Will

0 Vineeth CN over 2 years ago in reply to William Leven

Prodigy 130 points

Hi William,

I am still getting the wrong transposed output. The code and an output screenshot are attached.

Thanks and Regards,

Vineeth

0 William Leven over 2 years ago in reply to Vineeth CN

TI__Intellectual 2410 points

In this case (and normally) stride_y should be equal to dim_x. When you swapped dim_x and dim_y, stride_y also needed to change.

+1 Vineeth CN over 2 years ago in reply to William Leven

Prodigy 130 points

Thank you so much William. I am getting the correct output now. I will work on my actual image matrix next. I appreciate your quick help.

Best Regards,

Vineeth

0 Vineeth CN over 2 years ago in reply to Vineeth CN

Prodigy 130 points

Hello William,

I need help with one more thing. My image transpose is working fine with grayscale images. Now I want to do the same with an RGB image in interleaved RGB format. Could you please give me some tips, or point out anything I need to take care of, for transposing an RGB image?

Code is attached for your reference.
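One approach that could be considered, and which the attached code appears to be heading towards by setting data_type to MMALIB_INT32, is to treat each pixel as a single 32-bit element so that the R, G and B bytes move together during the transpose. The sketch below illustrates that idea in plain C on the host. All function names are hypothetical, transpose_u32 is only a reference stand-in for the library call, and the byte order within the 32-bit word (R in the low byte, top byte unused) is an assumption made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Pack interleaved R,G,B bytes into one 32-bit word per pixel. */
static void pack_rgb_to_u32(const uint8_t *rgb, uint32_t *packed, size_t pixels)
{
    for (size_t i = 0; i < pixels; i++)
        packed[i] = (uint32_t)rgb[3 * i]
                  | ((uint32_t)rgb[3 * i + 1] << 8)
                  | ((uint32_t)rgb[3 * i + 2] << 16);
}

/* Unpack one 32-bit word per pixel back into interleaved R,G,B bytes. */
static void unpack_u32_to_rgb(const uint32_t *packed, uint8_t *rgb, size_t pixels)
{
    for (size_t i = 0; i < pixels; i++) {
        rgb[3 * i]     = (uint8_t)(packed[i] & 0xFFu);
        rgb[3 * i + 1] = (uint8_t)((packed[i] >> 8) & 0xFFu);
        rgb[3 * i + 2] = (uint8_t)((packed[i] >> 16) & 0xFFu);
    }
}

/* Reference transpose of 32-bit elements (rows x cols -> cols x rows). */
static void transpose_u32(const uint32_t *src, uint32_t *dst, size_t rows, size_t cols)
{
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            dst[c * rows + r] = src[r * cols + c];
}

/* Transpose a rows x cols interleaved RGB image into dst_rgb (cols x rows).
 * Returns 0 on success, -1 if a temporary buffer could not be allocated. */
static int transpose_rgb_interleaved(const uint8_t *src_rgb, uint8_t *dst_rgb,
                                     size_t rows, size_t cols)
{
    assert(src_rgb != NULL && dst_rgb != NULL && rows > 0 && cols > 0);

    size_t pixels = rows * cols;
    uint32_t *in = malloc(pixels * sizeof *in);
    uint32_t *out = malloc(pixels * sizeof *out);
    if (in == NULL || out == NULL) {
        free(in);
        free(out);
        return -1;
    }

    pack_rgb_to_u32(src_rgb, in, pixels);
    transpose_u32(in, out, rows, cols);
    unpack_u32_to_rgb(out, dst_rgb, pixels);

    free(in);
    free(out);
    return 0;
}
```

With this layout the stride of the 32-bit buffer is dim_x * 4 bytes, and the result needs an explicit unpack step: copying width * height * 3 bytes straight out of a 4-byte-per-pixel buffer would not remove the unused byte of each pixel. An alternative would be to split the image into three 8-bit planes and transpose each plane separately.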

Thanks

/*
*
* Copyright (c) 2017 Texas Instruments Incorporated
*
* All rights reserved not granted herein.
*
* Limited License.
*
* Texas Instruments Incorporated grants a world-wide, royalty-free, non-exclusive
* license under copyrights and patents it now or hereafter owns or controls to make,
* have made, use, import, offer to sell and sell ("Utilize") this software subject to the
* terms herein.  With respect to the foregoing patent license, such license is granted
* solely to the extent that any such patent is necessary to Utilize the software alone.
* The patent license shall not apply to any combinations which include this software,
* other than combinations with devices manufactured by or for TI ("TI Devices").
* No hardware patent is licensed hereunder.
*
* Redistributions must preserve existing copyright notices and reproduce this license
* (including the above copyright notice and the disclaimer and (if applicable) source
* code license limitations below) in the documentation and/or other materials provided
* with the distribution
*
* Redistribution and use in binary form, without modification, are permitted provided
* that the following conditions are met:
*
* *       No reverse engineering, decompilation, or disassembly of this software is
* permitted with respect to any software provided in binary form.
*
* *       any redistribution and use are licensed by TI for use only with TI Devices.
*
* *       Nothing shall obligate TI to provide you with source code for the software
* licensed and provided to you in object code.
*
* If software source code is provided to you, modification and redistribution of the
* source code are permitted provided that the following conditions are met:
*
* *       any redistribution and use of the source code, including any resulting derivative
* works, are licensed by TI for use only with TI Devices.
*
* *       any redistribution and use of any object code compiled from the source code
* and any resulting derivative works, are licensed by TI for use only with TI Devices.
*
* Neither the name of Texas Instruments Incorporated nor the names of its suppliers
*
* may be used to endorse or promote products derived from this software without
* specific prior written permission.
*
* DISCLAIMER.
*
* THIS SOFTWARE IS PROVIDED BY TI AND TI'S LICENSORS "AS IS" AND ANY EXPRESS
* OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
* IN NO EVENT SHALL TI AND TI'S LICENSORS BE LIABLE FOR ANY DIRECT, INDIRECT,
* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
* OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
* OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
* OF THE POSSIBILITY OF SUCH DAMAGE.
*
*/

/**
 * \file app_c7x_target_kernel_img_add.c Target Kernel implementation for Phase to RGB conversion function
 *
 *  This file shows a sample implementation of a target kernel function.
 *
 *  To implement a target kernel the below top level interface functions are implemented
 *  - vxTutorialAddTargetKernelPhaseRgb() : Registers target kernel to TIOVX target framework
 *  - vxTutorialRemoveTargetKernelPhaseRgb() : Un-Registers target kernel from TIOVX target framework
 *
 *  When registering a target kernel, the following callback function are implemented and registered with the TIOVX framework
 *  - app_c7x_target_kernel_img_add() : kernel execute/run function
 *  - app_c7x_target_kernel_img_addCreate() : kernel init function
 *  - app_c7x_target_kernel_img_addDelete() : kernel deinit function
 *  - app_c7x_target_kernel_img_addControl(): kernel control function
 *
 *  When working with target kernel
 *  - vxTutorialAddTargetKernelPhaseRgb() MUST be called during TIOVX target framework system init
 *     - This is done by using function tivxRegisterTutorialTargetKernels() in \ref vx_tutorial_target_kernel.c
 *  - vxTutorialRemoveTargetKernelPhaseRgb() MUST be called during TIOVX target framework system deinit
 *     - This is done by using function tivxUnRegisterTutorialTargetKernels() in \ref vx_tutorial_target_kernel.c
 *
 *  When registering a target kernel a unique name MUST be used to register the
 *  kernel on target side and HOST side.
 *
 *  Follow the comments for the different functions in the file to understand how a user/target kernel is implemented.
 */

#include <TI/tivx.h>
#include <TI/tivx_target_kernel.h>
#include "../app_c7x_kernel_transpose.h"
#include <stdio.h>
#include <stdlib.h>   /* malloc, free */
#include <string.h>   /* memcpy */

#if defined(SOC_AM62A)
#include <utils/udma/include/app_udma_utils.h>
#else
#include <utils/udma/include/app_udma.h>
#endif
#include <c7x.h>
#include "../../../../../mmalib_02_05_00_07/ti/mmalib/src/test/MMALIB_test.h"

#include <mmalib.h>
#include "../../../../../mmalib_02_05_00_07/ti/mmalib/src/linalg_c7xmma/MMALIB_LINALG_matrixTranspose_ixX_oxX/MMALIB_LINALG_matrixTranspose_ixX_oxX_idat.h"
#include "VXLIB_types.h"

vx_uint8 *Final_Data;
vx_uint8 *Output_Data;
vx_uint8 *Flipped_Data;
uint8_t *Flip_Identity_Data;


/**
 * \brief Target kernel handle [static global]
 */
static tivx_target_kernel app_c7x_target_kernel_img_transpose_handle = NULL;

typedef struct {

    vx_uint32 blkWidth;
    vx_uint32 blkHeight;

} tivxC7xKernelParams;


static vx_status Img_Transpose(tivxC7xKernelParams *prms,
                             tivx_obj_desc_image_t *src_desc0,
                             tivx_obj_desc_image_t *dst_desc,
                             void *src_desc0_target_ptr,
                             void *dst_desc_target_ptr)
{
    
    vx_imagepatch_addressing_t *PathchInput = (vx_imagepatch_addressing_t *)&src_desc0->imagepatch_addr[0];
    uint32_t Mem_Required;
    Mem_Required=PathchInput->dim_x* PathchInput->dim_y;
    printf("Memory required for calloc is %d \n",Mem_Required);
    /*
    Output_Data =(vx_uint8 *)calloc((Mem_Required*3),sizeof(vx_uint8));
    if(Output_Data == NULL)
        printf("VCN: ERROR ! Memory not allocated!\n");
    */
    vx_uint8 flip_data[(320*160)*4];     
    //vx_uint8 flip_data[51200];
    Output_Data=&flip_data[0];
    MMALIB_bufParams2D_t vxlib_src01, vxlib_dst1;
    MMALIB_LINALG_matrixTranspose_ixX_oxX_InitArgs Transpose_InitArgs;
    int32_t handleSize = MMALIB_LINALG_matrixTranspose_ixX_oxX_getHandleSize(&Transpose_InitArgs);
    MMALIB_kernelHandle TransposeKernelHandle = malloc(handleSize);
    char temp1[] = "MMALIB_LINALG_matrixTranspose_ixX_oxX";
    TI_profile_init(temp1);
    
    vx_status status = VX_SUCCESS;
    printf("VCN:X- Stride is %d Y-Stride is %d \n",PathchInput->stride_x,PathchInput->stride_y);
    vxlib_src01.data_type=MMALIB_INT32;
    //vxlib_src01.data_type=MMALIB_INT32;
    vxlib_src01.dim_x= PathchInput->dim_x;
    vxlib_src01.dim_y= PathchInput->dim_y;//prms->blkHeight;
    vxlib_src01.stride_y= PathchInput->dim_x*4;//prms->blkHeight;
    //printf("VCN: Input data blkWidth valus is %d \n",vxlib_src01.dim_x);
    //printf("VCN: Input data blkHeight and stride value is %d \n",vxlib_src01.dim_y);

    vxlib_dst1.data_type=MMALIB_INT32;
    //vxlib_dst1.data_type=MMALIB_INT32;
    vxlib_dst1.dim_x= PathchInput->dim_y;//prms->blkHeight;
    vxlib_dst1.dim_y= PathchInput->dim_x;
    vxlib_dst1.stride_y= PathchInput->dim_y*4;
    //printf("VCN: Output data blkWidth valus is %d \n",vxlib_dst1.dim_x);
    //printf("VCN: Output data blkHeight and stride value is %d \n",vxlib_dst1.dim_y);
    MMALIB_STATUS status_opt;
    MMALIB_STATUS status_init = MMALIB_LINALG_matrixTranspose_ixX_oxX_init_checkParams(
                    TransposeKernelHandle,
                    &vxlib_src01,
                    &vxlib_dst1,
                    &Transpose_InitArgs);

    if (status_init == MMALIB_SUCCESS)
    {
        //printf("success- MMALIB_LINALG_matrixTranspose_ixX_oxX_init_checkParams \n");
        TI_profile_start(TI_PROFILE_KERNEL_INIT);
        // MMALIB_asm(" MARK 0");
        Transpose_InitArgs.funcStyle = MMALIB_FUNCTION_OPTIMIZED;
        //Transpose_InitArgs.funcStyle = MMALIB_FUNCTION_NATC;
        status_init = MMALIB_LINALG_matrixTranspose_ixX_oxX_init(TransposeKernelHandle,
                                                                    &vxlib_src01,
                                                                    &vxlib_dst1,
                                                                    &Transpose_InitArgs);
        //TI_profile_start(TI_PROFILE_KERNEL_CN);

        TI_profile_stop();
    }
    else
        printf("VCN: Failed- MMALIB_LINALG_matrixTranspose_ixX_oxX_init_checkParams \n");
        
    status_opt = MMALIB_LINALG_matrixTranspose_ixX_oxX_exec_checkParams(TransposeKernelHandle,
                                                                                        Final_Data,
                                                                                        Output_Data);
    
    if (status_opt == MMALIB_SUCCESS)
    {
        //TI_profile_start(TI_PROFILE_KERNEL_INIT); // OPT - > INIT
        TI_profile_start(TI_PROFILE_KERNEL_OPT);
        
        status_opt = MMALIB_LINALG_matrixTranspose_ixX_oxX_exec(TransposeKernelHandle,
                                                                (vx_uint8 *)Final_Data,
                                                                Output_Data);

        // MMALIB_asm(" MARK 3");
        //printf("Finished- MMALIB_LINALG_matrixTranspose_ixX_oxX_exec \n");
        TI_profile_stop();
    }
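    //free(Final_Data);
    memcpy((uint8_t *)dst_desc_target_ptr,Output_Data,(320*160)*3);
    free(TransposeKernelHandle); /* release the handle allocated with malloc() above */
    if ((status_init != MMALIB_SUCCESS) || (status_opt != MMALIB_SUCCESS))
    {
        /* report MMALIB failures to the caller instead of always returning success */
        status = VX_FAILURE;
    }
    return status;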
}


static vx_status Img_Rotate(tivxC7xKernelParams *prms,
                             tivx_obj_desc_image_t *src_desc0,
                             tivx_obj_desc_image_t *dst_desc,
                             void *src_desc0_target_ptr,
                             void *dst_desc_target_ptr)
{
     vx_status status = VX_SUCCESS;
/*
    uint32_t Mem_Required;
    uint32_t row,clm,num;
    vx_imagepatch_addressing_t *PathchInput = (vx_imagepatch_addressing_t *)&src_desc0->imagepatch_addr[0];
    Mem_Required=PathchInput->dim_x* PathchInput->dim_y;
    
    vx_uint8 rotated_data[51200];
    Flipped_Data=&rotated_data[0];
    //Flipped_Data=(vx_uint8 *)calloc(Mem_Required,sizeof(vx_uint8));
    //if(Flipped_Data == NULL)
    //    printf("VCN: ERROR ! Flipped_Data Memory not allocated!\n");
    
    //vx_uint8 Flip_Identity[160][160];
    //Flip_Identity_Data =&Flip_Identity[0][0];
    Flip_Identity_Data =(vx_uint8 *)calloc((PathchInput->dim_y*PathchInput->dim_y),sizeof(vx_uint8));
    if(Flip_Identity_Data == NULL)
        printf("VCN: ERROR ! Memory not allocated!\n");
    
    //VXLIB_bufParams2D_t vxlib_src0, vxlib_dst;
    MMALIB_bufParams2D_t mvxlib_src1, mvxlib_dst1, vxlib_src1_mult;

    MMALIB_STATUS status_init_multiply_check;
    MMALIB_STATUS status_init_multiply;

    MMALIB_LINALG_matrixMatrixMultiply_ixX_ixX_oxX_InitArgs Mutiply_InitArgs;
    int32_t handleSize_multiply = MMALIB_LINALG_matrixMatrixMultiply_ixX_ixX_oxX_getHandleSize(&Mutiply_InitArgs);
    MMALIB_kernelHandle MultiplyKernelHandle = malloc(handleSize_multiply);
    char temp[] = "MMALIB_LINALG_matrixMatrixMultiply_ixX_ixX_oxX";
    TI_profile_init(temp);
    num=PathchInput->dim_y-1;
    //constructing a flipped identity matrix for 90 degrees clockwise rotation in the form of a 1D matrix
    for (row = 0; row < PathchInput->dim_y; row++){
        for (clm = 0; clm < PathchInput->dim_y; clm++){
            if ((num-row) == clm)
            {
                //Flipped_identity[row][clm] = 1;
                Flip_Identity_Data[clm+(row*PathchInput->dim_y)] = 1;
                //printf("%d-%d  \n",(clm+(row*num)),Flipped_identity[row][clm]);
                //printf("%d-%d  \n",(clm+(row*PathchInput->dim_y)),Flip_Identity_Data[clm+(row*PathchInput->dim_y)]);
            }
                
            else
                //Flipped_identity[row][clm]  = 0;
                Flip_Identity_Data[clm+(row*PathchInput->dim_y)] = 0;
        }
    }
    
    
  

    vxlib_src1_mult.data_type=MMALIB_INT8;
    vxlib_src1_mult.dim_x= PathchInput->dim_y;//prms->blkHeight;
    vxlib_src1_mult.dim_y= PathchInput->dim_y;
    vxlib_src1_mult.stride_y= PathchInput->dim_y;


    mvxlib_src1.data_type=MMALIB_INT8;
    mvxlib_src1.dim_x= PathchInput->dim_y;//prms->blkHeight;
    mvxlib_src1.dim_y= PathchInput->dim_x;
    mvxlib_src1.stride_y= PathchInput->dim_y;

    
    mvxlib_dst1.data_type=MMALIB_INT8;
    mvxlib_dst1.dim_x= PathchInput->dim_y;//prms->blkHeight;
    mvxlib_dst1.dim_y= PathchInput->dim_x;
    mvxlib_dst1.stride_y= PathchInput->dim_y;

    MMALIB_STATUS status_opt;
 
    status_init_multiply_check = MMALIB_LINALG_matrixMatrixMultiply_ixX_ixX_oxX_init_checkParams(MultiplyKernelHandle,
                                                                                                &mvxlib_src1,
                                                                                                &vxlib_src1_mult,
                                                                                                &mvxlib_dst1,
                                                                                                &Mutiply_InitArgs);
                
    if (status_init_multiply_check == MMALIB_SUCCESS)
    {
        //printf("Success - MMALIB_LINALG_matrixMatrixMultiply_ixX_ixX_oxX_init_checkParams \n");
        //TI_profile_start(TI_PROFILE_KERNEL_INIT);
        TI_profile_start(TI_PROFILE_KERNEL_INIT);
        Mutiply_InitArgs.funcStyle = MMALIB_FUNCTION_OPTIMIZED;
        //Mutiply_InitArgs.funcStyle = MMALIB_FUNCTION_NATC;
        status_init_multiply = MMALIB_LINALG_matrixMatrixMultiply_ixX_ixX_oxX_init(MultiplyKernelHandle,
                                                                                    &mvxlib_src1,
                                                                                    &vxlib_src1_mult,
                                                                                    &mvxlib_dst1,
                                                                                    &Mutiply_InitArgs);
        TI_profile_stop();
    }
    else
        printf("Error - MMALIB_LINALG_matrixMatrixMultiply_ixX_ixX_oxX_init_checkParams \n");

    if (status_init_multiply == MMALIB_SUCCESS)
    {

     
        //printf("Success - MMALIB_LINALG_matrixMatrixMultiply_ixX_ixX_oxX_init \n");
        status_opt = MMALIB_LINALG_matrixMatrixMultiply_ixX_ixX_oxX_exec_checkParams(
            MultiplyKernelHandle,
            (uint8_t *)Output_Data,
            (uint8_t *)Flip_Identity_Data,
            Flipped_Data); 
    }
                
    if (status_opt == MMALIB_SUCCESS)
    {
        //printf("Success - MMALIB_LINALG_matrixMatrixMultiply_ixX_ixX_oxX_exec_checkParams \n");
        TI_profile_start(TI_PROFILE_KERNEL_OPT);
        //TI_profile_start(TI_PROFILE_KERNEL_CN);
        
        status_opt = MMALIB_LINALG_matrixMatrixMultiply_ixX_ixX_oxX_exec(
            MultiplyKernelHandle,
            (uint8_t *)Output_Data,
            (uint8_t *)Flip_Identity_Data,
            Flipped_Data);
            TI_profile_stop();
    }
    memcpy((uint8_t *)dst_desc_target_ptr,Flipped_Data,Mem_Required);


    uint32_t i;
    uint32_t j;
    printf("\n");
    printf("Transposed data is \n");

    for(i=0;i<2;i++)
    {
        for(j=0;j<PathchInput->dim_y;j++)
        {
            printf(" %d ",Output_Data[j+(i*PathchInput->dim_y)]);
        }
        printf("\n");
        printf("\n");
    }

    printf("\n");
    printf("Flipped data is \n");

    for(i=0;i<2;i++)
    {
        for(j=0;j<PathchInput->dim_y;j++)
        {
            printf(" %d ",Flipped_Data[j+(i*PathchInput->dim_y)]);
        }
        printf("\n");
        printf("\n");
    }


    if(status!=VX_SUCCESS)
    {
        printf("Kernel processing failed !!!\n");
        status = VX_FAILURE;
    }
   
    //free(Output_Data);
    //free(Flipped_Data);
    free(Flip_Identity_Data);
*/
    return status;
}



static vx_status add_img_execute(tivxC7xKernelParams *prms,
                             tivx_obj_desc_image_t *src_desc0,
                             tivx_obj_desc_image_t *dst_desc1,
                             tivx_obj_desc_image_t *dst_desc,
                             void *src_desc0_target_ptr,
                             void *dst_desc1_target_ptr,
                             void *dst_desc_target_ptr)
{
    vx_status status = VX_SUCCESS;

    /* Capture the transpose result so the check below is meaningful */
    status = Img_Transpose(prms, src_desc0, dst_desc, src_desc0_target_ptr, dst_desc_target_ptr);
    if (VX_SUCCESS == status)
    {
        status = Img_Rotate(prms, src_desc0, dst_desc, src_desc0_target_ptr, dst_desc_target_ptr);
    }
    else
    {
        printf("Transpose failed \n");
    }
    return status;
}

/**
 * \brief Target kernel run function
 *
 * \param kernel [in] target kernel handle
 * \param obj_desc [in] Parameter object descriptors
 * \param num_params [in] Number of parameter object descriptors
 * \param priv_arg [in] kernel instance priv argument
 */
vx_status VX_CALLBACK app_c7x_target_kernel_img_transpose(
    tivx_target_kernel_instance kernel, tivx_obj_desc_t *obj_desc[],
    uint16_t num_params, void *priv_arg)
{
    vx_status status = VX_SUCCESS;
    tivx_obj_desc_image_t *src_desc0, *dst_desc1,*dst_desc;
    tivxC7xKernelParams *prms = NULL;

    if ((num_params != APP_C7X_IMG_TRANSPOSE_MAX_PARAMS)
        || (NULL == obj_desc[APP_C7X_IMG_TRANSPOSE_IN0_IMG_IDX])
        || (NULL == obj_desc[APP_C7X_IMG_TRANSPOSE_OUT0_IMG_IDX])
        )
    {
        status = VX_FAILURE;
    }

    if(status==VX_SUCCESS)
    {
        uint32_t size;

        status = tivxGetTargetKernelInstanceContext(kernel,
            (void **)&prms, &size);
        if ((VX_SUCCESS != status) || (NULL == prms) ||
            (sizeof(tivxC7xKernelParams) != size))
        {
            status = VX_FAILURE;
        }
    }

    if(status==VX_SUCCESS)
    {
        void *src_desc0_target_ptr;
        void *dst_desc1_target_ptr;
        void *dst_desc_target_ptr;

        /* Get the Src and Dst descriptors */
        src_desc0 = (tivx_obj_desc_image_t *)obj_desc[APP_C7X_IMG_TRANSPOSE_IN0_IMG_IDX];
        dst_desc1 = (tivx_obj_desc_image_t *)obj_desc[APP_C7X_IMG_TRANSPOSE_IN0_IMG_IDX];
        dst_desc  = (tivx_obj_desc_image_t *)obj_desc[APP_C7X_IMG_TRANSPOSE_OUT0_IMG_IDX];

        /* Get the target pointer from the shared pointer for all
           buffers */
        src_desc0_target_ptr = tivxMemShared2TargetPtr(&src_desc0->mem_ptr[0]);
        dst_desc1_target_ptr = tivxMemShared2TargetPtr(&dst_desc1->mem_ptr[0]);
        dst_desc_target_ptr = tivxMemShared2TargetPtr(&dst_desc->mem_ptr[0]);

        /* Map all buffers, which invalidates the cache */
        tivxMemBufferMap(src_desc0_target_ptr,
            src_desc0->mem_size[0], VX_MEMORY_TYPE_HOST,
            VX_READ_ONLY);

        tivxMemBufferMap(dst_desc1_target_ptr,
            dst_desc1->mem_size[0], VX_MEMORY_TYPE_HOST,
            VX_WRITE_ONLY);

        tivxMemBufferMap(dst_desc_target_ptr,
            dst_desc->mem_size[0], VX_MEMORY_TYPE_HOST,
            VX_WRITE_ONLY);
        vx_uint8 Input_Data[(320*160)*4];
        Final_Data = &Input_Data[0];
        /*
        Final_Data =(uint8_t *)calloc(153600,sizeof(uint8_t));  
        //Final_Data =(uint8_t *)calloc(51200,sizeof(uint8_t));
        if(Final_Data == NULL)
            printf("VCN: ERROR ! I/P Memory not allocated!\n");
*/
        memcpy(Final_Data,(uint8_t *)src_desc0_target_ptr,(320*160)*4);
        
        //memcpy(Final_Data,(uint8_t *)src_desc0_target_ptr,51200);
        printf("VCN: memory test without DMA\n");


        status = add_img_execute(prms, src_desc0, dst_desc1, dst_desc, src_desc0_target_ptr, dst_desc1_target_ptr, dst_desc_target_ptr);
        tivxMemBufferUnmap(src_desc0_target_ptr,
            src_desc0->mem_size[0], VX_MEMORY_TYPE_HOST,
            VX_READ_ONLY);

        tivxMemBufferUnmap(dst_desc1_target_ptr,
            dst_desc1->mem_size[0], VX_MEMORY_TYPE_HOST,
            VX_WRITE_ONLY);

        tivxMemBufferUnmap(dst_desc_target_ptr,
            dst_desc->mem_size[0], VX_MEMORY_TYPE_HOST,
            VX_WRITE_ONLY);

    }

    return (status);
}

/**
 * \brief Target kernel create function
 *
 * \param kernel [in] target kernel handle
 * \param param_obj_desc [in] Parameter object descriptors
 * \param num_params [in] Number of parameter object descriptors
 * \param priv_arg [in] kernel instance priv argument
 */
vx_status VX_CALLBACK app_c7x_target_kernel_img_transpose_create(tivx_target_kernel_instance kernel, tivx_obj_desc_t *param_obj_desc[], uint16_t num_params, void *priv_arg)
{
    vx_status status = VX_SUCCESS;
    uint32_t i;

    for (i = 0U; i < num_params; i ++)
    {
        if (NULL == param_obj_desc[i])
        {
            status = VX_FAILURE;
            break;
        }
    }

    if (VX_SUCCESS == status)
    {
        tivxC7xKernelParams * kernelParams = NULL;

        kernelParams = (tivxC7xKernelParams *)tivxMemAlloc(sizeof(tivxC7xKernelParams), TIVX_MEM_EXTERNAL);

        if(kernelParams == NULL)
        {
            status = VX_FAILURE;
        }
        
        tivx_mem_stats l2_stats;
        tivxMemFree(NULL, 0, (vx_enum)TIVX_MEM_INTERNAL_L2);
        tivxMemStats(&l2_stats, (vx_enum)TIVX_MEM_INTERNAL_L2);
            
        tivx_obj_desc_image_t *in_img_desc  = (tivx_obj_desc_image_t *)param_obj_desc[APP_C7X_IMG_TRANSPOSE_IN0_IMG_IDX];

        vx_imagepatch_addressing_t *pIn = (vx_imagepatch_addressing_t *)&in_img_desc->imagepatch_addr[0];
        vx_int32  in_width  = pIn->dim_x;
        vx_uint32 in_height = pIn->dim_y;

        if(status == VX_SUCCESS)
        {
            /* Only dereference kernelParams once the allocation is known to have succeeded */
            kernelParams->blkWidth = in_width;
            kernelParams->blkHeight = in_height;

            tivxSetTargetKernelInstanceContext(kernel, kernelParams,  sizeof(tivxC7xKernelParams));
        }
    }

    return status;
}

/**
 * \brief Target kernel delete function
 *
 * \param kernel [in] target kernel handle
 * \param obj_desc [in] Parameter object descriptors
 * \param num_params [in] Number of parameter object descriptors
 * \param priv_arg [in] kernel instance priv argument
 */
vx_status VX_CALLBACK app_c7x_target_kernel_img_transpose_delete(tivx_target_kernel_instance kernel, tivx_obj_desc_t *param_obj_desc[], uint16_t num_params, void *priv_arg)
{
    vx_status status = VX_SUCCESS;
    uint32_t i;

    for (i = 0U; i < num_params; i ++)
    {
        if (NULL == param_obj_desc[i])
        {
            status = VX_FAILURE;
            break;
        }
    }

    if (VX_SUCCESS == status)
    {
        uint32_t size;
        tivxC7xKernelParams *prms = NULL;

        status = tivxGetTargetKernelInstanceContext(kernel,
            (void **)&prms, &size);

        if (VX_SUCCESS == status)
        {

            tivxMemFree(prms, sizeof(tivxC7xKernelParams), TIVX_MEM_EXTERNAL);
        }
    }

    return status;
}

/**
 * \brief Add target kernel to TIOVX framework
 *
 */
void app_c7x_target_kernel_img_transpose_register(void)
{
    char target_name[TIVX_TARGET_MAX_NAME];
    vx_enum self_cpu;

    /**
     * - Get CPU ID of the running CPU
     *
     * Add kernel to target framework only if it is supported on this target
     * \code
     */
    self_cpu = tivxGetSelfCpuId();
    /** \endcode */

    if ((self_cpu == TIVX_CPU_ID_DSP_C7_1))
    {
        /**
         * - Find target name based on currently running CPU
         *
         * \code
         */
        strncpy(target_name, TIVX_TARGET_DSP_C7_1,
                TIVX_TARGET_MAX_NAME);
        /** \endcode */

        /**
         * - Register target kernel to TIOVX framework
         *
         * "APP_C7X_KERNEL_IMG_TRANSPOSE_NAME" is the name of the target kernel.
         * See also \ref app_c7x_kernel.h
         *
         * This MUST match the name specified when registering the same kernel
         * on the HOST side.
         *
         * The registered target kernel handle is stored in a global variable.
         * This is used during app_c7x_target_kernel_img_transpose_unregister()
         *
         * \code
         */
        app_c7x_target_kernel_img_transpose_handle = tivxAddTargetKernelByName(
                    (char*)APP_C7X_KERNEL_IMG_TRANSPOSE_NAME,
                    target_name,
                    app_c7x_target_kernel_img_transpose,
                    app_c7x_target_kernel_img_transpose_create,
                    app_c7x_target_kernel_img_transpose_delete,
                    NULL,
                    NULL);
        /** \endcode */
    }
}

/**
 * \brief Remove target kernel from TIOVX framework
 *
 */
vx_status app_c7x_target_kernel_img_transpose_unregister(void)
{
    vx_status status = VX_SUCCESS;

    /**
     * - UnRegister target kernel from TIOVX framework
     *
     * The target kernel handle used is the one returned by
     * tivxAddTargetKernelByName()
     *
     * \code
     */
    status = tivxRemoveTargetKernel(app_c7x_target_kernel_img_transpose_handle);
    /** \endcode */

    if (VX_SUCCESS == status)
    {
        app_c7x_target_kernel_img_transpose_handle = NULL;
    }
    return status;
}

Thanks and Regards,

Vineeth

0 William Leven over 2 years ago in reply to Vineeth CN

TI__Intellectual 2410 points

Vineeth,

Are you trying to keep the RGB data interleaved, or are you trying to split the interleaved data into 3 planes of RGB and then transpose the planes individually? If the second option, then it would be simply creating 3 images corresponding to R, G and B, and then calling transpose on each of them individually just like you called transpose on the greyscale image.
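A minimal sketch of the plane-splitting step, assuming tightly packed 8-bit samples (3 bytes per pixel, no row padding). The helper names `rgb888_split` and `rgb888_merge` are hypothetical, and this is plain scalar C rather than an optimized C7x implementation. Each resulting plane could then go through the same 8-bit transpose already used for the greyscale image.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: split packed 3-byte pixels (R,G,B,R,G,B,...)
 * into three separate 8-bit planes of num_pixels bytes each. */
static void rgb888_split(const uint8_t *rgb, uint8_t *r, uint8_t *g,
                         uint8_t *b, size_t num_pixels)
{
    size_t i;

    assert(rgb != NULL && r != NULL && g != NULL && b != NULL);

    for (i = 0; i < num_pixels; i++)
    {
        r[i] = rgb[3 * i + 0];
        g[i] = rgb[3 * i + 1];
        b[i] = rgb[3 * i + 2];
    }
}

/* Hypothetical helper: merge three 8-bit planes back into packed pixels. */
static void rgb888_merge(const uint8_t *r, const uint8_t *g,
                         const uint8_t *b, uint8_t *rgb, size_t num_pixels)
{
    size_t i;

    assert(rgb != NULL && r != NULL && g != NULL && b != NULL);

    for (i = 0; i < num_pixels; i++)
    {
        rgb[3 * i + 0] = r[i];
        rgb[3 * i + 1] = g[i];
        rgb[3 * i + 2] = b[i];
    }
}
```

The channel order does not matter to the split itself; if the source stores B, G, R instead, the three planes simply carry different labels.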

If the first option, what is the size of each pixel in bytes (I quickly looked, but didn't find this in your code above)? For instance, if you have RGB as 5 bits of R, 6 bits of G, and 5 bits of B, then each pixel is 16 bits -> 2 bytes. In this case, you would need to configure the transpose to work on 16-bit data. The MMA transpose supports 8-bit, 16-bit and 32-bit integer data types.
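To illustrate why the element size matters for interleaved data, here is a minimal scalar sketch in plain C. It is not the MMALIB API and is not MMA-accelerated, and the helper name `transpose_elems` is hypothetical. It moves each pixel as one unit of `elem_size` bytes, so the two bytes of a 16-bit pixel stay together; transposing the same buffer as 8-bit data would instead scatter each pixel's bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reference helper (not part of MMALIB): transpose a
 * rows x cols matrix whose elements are elem_size bytes each.
 * src is row-major rows x cols; dst becomes row-major cols x rows. */
static void transpose_elems(const void *src, void *dst,
                            size_t rows, size_t cols, size_t elem_size)
{
    const uint8_t *s = (const uint8_t *)src;
    uint8_t *d = (uint8_t *)dst;
    size_t r, c;

    assert(src != dst); /* out-of-place only */

    for (r = 0; r < rows; r++)
    {
        for (c = 0; c < cols; c++)
        {
            /* Element (r, c) of src lands at (c, r) of dst. */
            memcpy(d + (c * rows + r) * elem_size,
                   s + (r * cols + c) * elem_size,
                   elem_size);
        }
    }
}
```

For a 16-bit pixel format, the call would pass `sizeof(uint16_t)` as `elem_size`; for 8-bit greyscale it would pass 1.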

Best,

Will

0 Vineeth CN over 2 years ago in reply to William Leven

Prodigy 130 points

Hello Will,

I am reading the .bmp image as interleaved data, i.e., not as 3 separate planes of RGB.

The image I am reading has a bit depth of 24, i.e., 8 bits each for R, G and B. What should the configuration be for this?

Thanks and regards,

Vineeth

0 William Leven over 2 years ago in reply to Vineeth CN

TI__Intellectual 2410 points

Vineeth,

The MMA transpose function does not transpose 24-bit data; the supported element/pixel sizes are 8-, 16- and 32-bit.

This leaves a few options:

1. De-interleave the data, transpose the 3 color planes individually as 8-bit data, re-interleave the data (interleaving/de-interleaving can be accelerated with c7x streaming engines, I think).

2. Promote the data to 32-bit data (add 1 byte to the end of every pixel), transpose using the 32-bit transpose, reduce the data back to 24-bit (if necessary). (promotion/reduction might be accelerated with c7x streaming engines)

3. Transpose using c7x streaming engines and C code (not accelerated by MMA)

4. Other ideas?
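Option 2 above can be sketched as follows. This is a plain C illustration under stated assumptions, not an MMALIB or streaming-engine implementation: all function names are hypothetical, `transpose_u32` is a scalar stand-in for whatever 32-bit transpose is actually used, and the image is assumed to be tightly packed with 3 bytes per pixel. The padding byte is placed in the top 8 bits of each 32-bit element, which on a little-endian target corresponds to one extra byte after each pixel in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Widen packed 24-bit pixels to one 32-bit element each (top byte = 0). */
static void pack24_to_32(const uint8_t *in, uint32_t *out, size_t num_pixels)
{
    size_t i;
    for (i = 0; i < num_pixels; i++)
    {
        out[i] = (uint32_t)in[3 * i]
               | ((uint32_t)in[3 * i + 1] << 8)
               | ((uint32_t)in[3 * i + 2] << 16);
    }
}

/* Drop the padding byte again, producing packed 24-bit pixels. */
static void unpack32_to_24(const uint32_t *in, uint8_t *out, size_t num_pixels)
{
    size_t i;
    for (i = 0; i < num_pixels; i++)
    {
        out[3 * i]     = (uint8_t)(in[i] & 0xFFu);
        out[3 * i + 1] = (uint8_t)((in[i] >> 8) & 0xFFu);
        out[3 * i + 2] = (uint8_t)((in[i] >> 16) & 0xFFu);
    }
}

/* Scalar stand-in for a 32-bit transpose: src rows x cols -> dst cols x rows. */
static void transpose_u32(const uint32_t *src, uint32_t *dst,
                          size_t rows, size_t cols)
{
    size_t r, c;
    for (r = 0; r < rows; r++)
    {
        for (c = 0; c < cols; c++)
        {
            dst[c * rows + r] = src[r * cols + c];
        }
    }
}

/* Promote, transpose, reduce. tmp_a and tmp_b are caller-provided scratch
 * buffers holding rows * cols 32-bit elements each. */
static void rgb24_transpose_via_32(const uint8_t *in, uint8_t *out,
                                   uint32_t *tmp_a, uint32_t *tmp_b,
                                   size_t rows, size_t cols)
{
    assert(in != out && tmp_a != tmp_b);

    pack24_to_32(in, tmp_a, rows * cols);
    transpose_u32(tmp_a, tmp_b, rows, cols);
    unpack32_to_24(tmp_b, out, rows * cols);
}
```

The cost of this route is the two scratch buffers, each 4/3 the size of the original image, plus the two extra passes over the data.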

-Will

0 Vineeth CN over 2 years ago in reply to William Leven

Prodigy 130 points

Hello Will,

Thank you so much for the detailed explanation and inputs. I will definitely try to implement this.

I have one more question. Basically, I will be getting YUV NV12 data; we were converting it to RGB before sending it to the MMA transpose (some of our algorithms need RGB data).

So if I configure the transpose for 16-bit data on the NV12 input, that should be an easy way to do the transpose, right? Please let me know if there are any challenges with that.

Thanks and Regards,

Vineeth

0 William Leven over 2 years ago in reply to Vineeth CN

TI__Intellectual 2410 points

Vineeth,

I'm not familiar with NV12 data (or image processing in general), but from my quick read, this is my understanding of NV12.

- It is biplanar.
- The first plane is the full-resolution Y channel with 8-bit data. This plane can be transposed just like the greyscale image you already have working (8-bit mode).
- The second plane is half resolution in each dimension (4x reduction), but the U and V channels are interleaved, so this plane is 1/2 the size of the Y plane. Each "pixel" consists of one 8-bit U value and one 8-bit V value, so effectively 16 bits per matrix element. The first row would be [U00,V00] [U01,V01]... where [...] indicates a matrix element.
- For the second plane, you would configure the transpose in 16-bit mode.

In summary, you'll need 2 transposes: one 8-bit transpose on the Y plane and one 16-bit transpose on the UV plane, with the dimensions changed to match the actual plane sizes.
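As a rough illustration of the plane dimensions involved, here is a scalar C sketch. The function names are hypothetical, the two small transposes are stand-ins for the 8-bit and 16-bit MMA transposes, and the code assumes an even width and height with no row padding (stride equal to width). The Y plane is `height` rows by `width` 8-bit elements, and the UV plane is `height / 2` rows by `width / 2` 16-bit elements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar stand-in for an 8-bit transpose: src rows x cols -> dst cols x rows. */
static void transpose_u8(const uint8_t *src, uint8_t *dst,
                         size_t rows, size_t cols)
{
    size_t r, c;
    for (r = 0; r < rows; r++)
    {
        for (c = 0; c < cols; c++)
        {
            dst[c * rows + r] = src[r * cols + c];
        }
    }
}

/* Scalar stand-in for a 16-bit transpose; each element is one (U, V) pair. */
static void transpose_u16(const uint16_t *src, uint16_t *dst,
                          size_t rows, size_t cols)
{
    size_t r, c;
    for (r = 0; r < rows; r++)
    {
        for (c = 0; c < cols; c++)
        {
            dst[c * rows + r] = src[r * cols + c];
        }
    }
}

/* Transpose both NV12 planes. width and height describe the source image;
 * the output image is height pixels wide and width pixels tall. */
static void nv12_transpose(const uint8_t *y_src, const uint16_t *uv_src,
                           uint8_t *y_dst, uint16_t *uv_dst,
                           size_t width, size_t height)
{
    assert((width % 2u) == 0u && (height % 2u) == 0u);

    /* Y plane: height x width elements of 8 bits. */
    transpose_u8(y_src, y_dst, height, width);

    /* UV plane: (height / 2) x (width / 2) elements of 16 bits. */
    transpose_u16(uv_src, uv_dst, height / 2u, width / 2u);
}
```

Because each (U, V) pair moves as a single 16-bit element, the byte order inside the pair is preserved and the output is still valid NV12 for the transposed image size.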

Best,

Will

0 Vineeth CN over 2 years ago in reply to William Leven

Prodigy 130 points

Hi Will,

Sorry for the late reply, I was tied up with some other high priority work.

I have tested all of the approaches you suggested above, and they all work perfectly. Thank you so much.

Best regards,

Vineeth

0 William Leven over 2 years ago in reply to Vineeth CN

TI__Intellectual 2410 points

Hi Vineeth,

Great to hear it is working, and thanks for following up.

Best,

Will
