CCS/TDA4VM: MMALIB_LINALG_matrixTranspose calculation error

Part Number: TDA4VM
Other Parts Discussed in Thread: SYSBIOS

Tool/software: Code Composer Studio

Hi,

I am using the MMALIB_LINALG_matrixTranspose interface in mmalib. My PSDK version is 06_02_00_21.

Below is my code:

static void matTrans(int* out, int* in, int M, int N) {
    MMALIB_bufParams2D_t aBuffer;
    aBuffer.data_type = MMALIB_INT32;
    aBuffer.dim_x = N;
    aBuffer.dim_y = M;
    aBuffer.stride_y = aBuffer.dim_x * MMALIB_sizeof(aBuffer.data_type);

    MMALIB_bufParams2D_t bBuffer;
    bBuffer.data_type = MMALIB_INT32;
    bBuffer.dim_x = M;
    bBuffer.dim_y = N;
    bBuffer.stride_y = bBuffer.dim_x * MMALIB_sizeof(bBuffer.data_type);

    MMALIB_LINALG_matrixTranspose_ixX_oxX_InitArgs initArgs;

    //initArgs.funcStyle = MMALIB_FUNCTION_NATC;
    initArgs.funcStyle = MMALIB_FUNCTION_OPTIMIZED;

    int32_t handleSize = MMALIB_LINALG_matrixTranspose_ixX_oxX_getHandleSize(&initArgs);
    MMALIB_kernelHandle kernelHandle = malloc(handleSize);

    // Check that the parameters will generate a valid handle
    MMALIB_STATUS initCheck = MMALIB_LINALG_matrixTranspose_ixX_oxX_init_checkParams(kernelHandle,
      &aBuffer,
      &bBuffer,
      &initArgs);

    printf("Init check = %d.\n", initCheck);

    // Generate the handle
    MMALIB_STATUS initStatus = MMALIB_LINALG_matrixTranspose_ixX_oxX_init(kernelHandle, &aBuffer, &bBuffer, &initArgs);
    printf("Init status = %d.\n", initStatus);

    // Check that the execute arguments are valid for execution
    MMALIB_STATUS execCheck = MMALIB_LINALG_matrixTranspose_ixX_oxX_exec_checkParams(kernelHandle,
      in,
      out);
    printf("Exec check = %d.\n", execCheck);

    // Execute the kernel
    MMALIB_STATUS execStatus = MMALIB_LINALG_matrixTranspose_ixX_oxX_exec(kernelHandle,
        in,
        out);

    printf("Exec status = %d.\n", execStatus);

    printf("MatMulIntrinsics done...\n");
    free(kernelHandle);
}

int main(){

    int m, n;
    int M = 4;
    int N = 4;

    int matA[16] = { 0, 1,  2, 3,
                     4, 5,  6, 7,
                     8, 9, 10, 11,
                     12,13,14, 15 };
    int matB[16] = {0};

    matTrans(&matB[0],&matA[0],M,N);

    printf("matB = \n");
    for ( m = 0; m < M; m++)
    {
        for ( n = 0; n < N; n++)
        {
            printf("%d ",matB[ m*N + n]);
        }
        printf("\n");
    }
    return 0;
}

If the funcStyle parameter is set to MMALIB_FUNCTION_NATC, the result is correct:

[C7x_1 ] 25.874141 s: Init check = 0.
[C7x_1 ] 25.874162 s: Init status = 0.
[C7x_1 ] 25.874178 s: Exec check = 0.
[C7x_1 ] 25.874195 s: Exec status = 0.
[C7x_1 ] 25.874212 s: MatMulIntrinsics done...
[C7x_1 ] 25.874226 s: matB =
[C7x_1 ] 25.874242 s: 0 4 8 12
[C7x_1 ] 25.874256 s: 1 5 9 13
[C7x_1 ] 25.874271 s: 2 6 10 14
[C7x_1 ] 25.874286 s: 3 7 11 15

When the funcStyle parameter is set to MMALIB_FUNCTION_OPTIMIZED, every status code is still 0, but the output matrix is all zeros:

[C7x_1 ] 32.679648 s: Init check = 0.
[C7x_1 ] 32.679671 s: Init status = 0.
[C7x_1 ] 32.679687 s: Exec check = 0.
[C7x_1 ] 32.679705 s: Exec status = 0.
[C7x_1 ] 32.679723 s: MatMulIntrinsics done...
[C7x_1 ] 32.679738 s: matB =
[C7x_1 ] 32.679754 s: 0 0 0 0
[C7x_1 ] 32.679771 s: 0 0 0 0
[C7x_1 ] 32.679785 s: 0 0 0 0
[C7x_1 ] 32.679800 s: 0 0 0 0

How should I solve this problem?

Regards,

Henry
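
A failure like the one above, where every status code is 0 but the output is wrong, can be caught automatically by comparing the kernel output against a plain C reference. The following is a minimal sketch, not part of MMALIB: the helper names ref_transpose and first_mismatch are hypothetical, and row-major storage is assumed, as in the test program above.

```c
#include <stddef.h>
#include <stdint.h>

/* Plain C reference: out (N x M) = transpose of in (M x N), both row-major. */
static void ref_transpose(int32_t *out, const int32_t *in, int M, int N)
{
    for (int m = 0; m < M; m++) {
        for (int n = 0; n < N; n++) {
            out[(size_t)n * M + m] = in[(size_t)m * N + n];
        }
    }
}

/* Returns the index of the first differing element, or -1 if a and b match. */
static int first_mismatch(const int32_t *a, const int32_t *b, int count)
{
    for (int i = 0; i < count; i++) {
        if (a[i] != b[i]) {
            return i;
        }
    }
    return -1;
}
```

Calling ref_transpose on the same input and then first_mismatch on the two outputs would flag the all-zero result without relying on the returned status codes.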

  • Hi Henry,

    The MMALIB_FUNCTION_OPTIMIZED variant of the matrixTranspose implementation uses an internal buffer for the calculation. This internal buffer is assigned the section .l2dmemory. This section .l2dmemory must be mapped to a valid L2 memory area in your linker command file. Would you please double-check if your linker command file maps this section correctly?

    Thank you,

    Sandeep

  • Hi Sandeep,

    Is this the linker command file?

    vision_apps\apps\basic_demos\app_tirtos\tirtos_linux\c7x_1\linker_mem_map

    MEMORY
    {
    /* L2 for C7x_1 [ size 480.00 KB ] */
    L2RAM_C7x_1 ( RWIX ) : ORIGIN = 0x64800000 , LENGTH = 0x00078000
    /* L1 for C7x_1 [ size 16.00 KB ] */
    L1RAM_C7x_1 ( RWIX ) : ORIGIN = 0x64E00000 , LENGTH = 0x00004000
    /* MSMC for C7x_1 [ size 7.78 MB ] */
    MSMC_C7x_1 ( RWIX ) : ORIGIN = 0x70020000 , LENGTH = 0x007C8000
    /* DDR for C7x_1 for Linux IPC [ size 1024.00 KB ] */
    DDR_C7x_1_IPC ( RWIX ) : ORIGIN = 0xA8000000 , LENGTH = 0x00100000
    /* DDR for C7x_1 for Linux resource table [ size 1024 B ] */
    DDR_C7x_1_RESOURCE_TABLE ( RWIX ) : ORIGIN = 0xA8100000 , LENGTH = 0x00000400
    /* DDR for C7x_1 for boot section [ size 1024 B ] */
    DDR_C7x_1_BOOT ( RWIX ) : ORIGIN = 0xA8200000 , LENGTH = 0x00000400
    /* DDR for C7x_1 for vecs section [ size 16.00 KB ] */
    DDR_C7x_1_VECS ( RWIX ) : ORIGIN = 0xA8400000 , LENGTH = 0x00004000
    /* DDR for C7x_1 for secure vecs section [ size 16.00 KB ] */
    DDR_C7x_1_SECURE_VECS ( RWIX ) : ORIGIN = 0xA8600000 , LENGTH = 0x00004000
    /* DDR for C7x_1 for code/data [ size 9.98 MB ] */
    DDR_C7x_1 ( RWIX ) : ORIGIN = 0xA8604000 , LENGTH = 0x009FC000
    /* Memory for IPC Vring's. MUST be non-cached or cache-coherent [ size 32.00 MB ] */
    IPC_VRING_MEM : ORIGIN = 0xAA000000 , LENGTH = 0x02000000
    /* Memory for remote core logging [ size 256.00 KB ] */
    APP_LOG_MEM : ORIGIN = 0xAC000000 , LENGTH = 0x00040000
    /* Memory for TI OpenVX shared memory. MUST be non-cached or cache-coherent [ size 31.62 MB ] */
    TIOVX_OBJ_DESC_MEM : ORIGIN = 0xAC040000 , LENGTH = 0x01FA0000
    /* Memory for shared memory buffers in DDR [ size 512.00 MB ] */
    DDR_SHARED_MEM : ORIGIN = 0xAE000000 , LENGTH = 0x20000000
    /* DDR for c7x_1 for local heap [ size 256.00 MB ] */
    DDR_C7X_1_LOCAL_HEAP ( RWIX ) : ORIGIN = 0xDC000000 , LENGTH = 0x10000000
    /* DDR for c7x_1 for Scratch Memory [ size 240.00 MB ] */
    DDR_C7X_1_SCRATCH ( RWIX ) : ORIGIN = 0xEC000000 , LENGTH = 0x0F000000
    }

    Could you help me check the linker command file?

    Thank you,

    Henry

  • Hi Henry,

    Thank you for the portion of the map file.

    I was looking for where the section .l2dmemory is mapped in memory. Could you search the map file for .l2dmemory or MMALIB_LINALG_matrixTranspose_i32s_o32s_IdentityBlock and provide the address where it is mapped?

    In addition, could you add the following code (or similar) just before the call to MMALIB_LINALG_matrixTranspose_ixX_oxX_exec(), redirect the printed output to a file, and share that file for further debugging?


    int i, j;

    /* Print the address as a pointer; casting it to int32_t would truncate it on a 64-bit target. */
    printf("Identity block address: %p\n", (void *)&MMALIB_LINALG_matrixTranspose_i32s_o32s_IdentityBlock[0]);

    /* Dump the identity block as a 16x16 matrix. */
    printf("Content of identity block:\n");
    for (i = 0; i < 16; i++) {
        for (j = 0; j < 16; j++) {
            printf("%d, ", MMALIB_LINALG_matrixTranspose_i32s_o32s_IdentityBlock[i*16+j]);
        }
        printf("\n");
    }
    printf("End of identity block content\n");

    /* Dump the kernel handle as 32-bit words. */
    printf("Content of handle:\n");
    for (i = 0; i < (int)(handleSize/sizeof(int32_t)); i++) {
        printf("%d, ", *(((int32_t *)kernelHandle)+i));
    }
    printf("\n");
    printf("End of handle content\n");


    Thank you,

    Sandeep
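
Once the identity block address is known, it can be checked against the L2 range from the memory map posted earlier in this thread (L2RAM_C7x_1: ORIGIN 0x64800000, LENGTH 0x00078000). A minimal sketch of that range check follows. The macro and function names are hypothetical and not part of MMALIB; only the two constants come from the thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Copied from the L2RAM_C7x_1 entry of the MEMORY block posted in this thread. */
#define L2RAM_C7X_1_ORIGIN 0x64800000ull
#define L2RAM_C7X_1_LENGTH 0x00078000ull

/* True if addr lies in the half-open range [origin, origin + length). */
static bool addr_in_region(uint64_t addr, uint64_t origin, uint64_t length)
{
    return addr >= origin && (addr - origin) < length;
}
```

For example, passing (uint64_t)(uintptr_t)&MMALIB_LINALG_matrixTranspose_i32s_o32s_IdentityBlock[0] together with the two constants would indicate whether the block was placed in L2 or ended up elsewhere, such as DDR.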

  • Hi, Sandeep

    I noticed a warning when building my application:

    warning: creating output section ".l2dmemory" without a SECTIONS specification

    Is this the cause of the problem?

    How do I fix it?

    Thank you,

    Henry

  • Hi Henry,

    Indeed, I suspect that the misplacement of .l2dmemory is the source of the problem.

    When you build your executable, the link step uses a linker command file (.cmd extension). In that file, the .l2dmemory section must be assigned to a valid L2 memory region.

    A sample linker command file, lnk.cmd, is provided in the ti\mmalib\concerto\c71\ folder. It contains the line

    .l2dmemory        > L2SRAM

    which you can adapt for your own linker command file.

    Thank you,

    Sandeep

  • Hi, Sandeep

    Here is my linker command file.

    -c
    -heap 0x20000
    -stack 0x20000
    -e _c_int00_secure


    SECTIONS
    {

    .vecs > DDR_C7x_1_VECS ALIGN(0x200000)
    .secure_vecs > DDR_C7x_1_SECURE_VECS ALIGN(0x200000)
    .text:_c_int00_secure > DDR_C7x_1_BOOT ALIGN(0x200000)
    .text > DDR_C7x_1

    .bss > DDR_C7x_1 /* Zero-initialized data */
    .data > DDR_C7x_1 /* Initialized data */

    .cinit > DDR_C7x_1 /* could be part of const */
    .init_array > DDR_C7x_1 /* C++ initializations */
    .stack > DDR_C7x_1 ALIGN(0x20000) /* MUST be 128KB aligned to handle nested interrupts */
    .args > DDR_C7x_1
    .cio > DDR_C7x_1
    .const > DDR_C7x_1
    .switch > DDR_C7x_1
    .sysmem > DDR_C7x_1 /* heap */

    /* .bss:taskStackSection:tiovx (NOLOAD) : {} > L2RAM_C7x_1 */
    .bss:taskStackSection > DDR_C7x_1
    .bss:ddr_shared_mem (NOLOAD) : {} > DDR_C7X_1_LOCAL_HEAP
    .bss:ddr_scratch_mem (NOLOAD) : {} > DDR_C7X_1_SCRATCH

    .bss:app_log_mem (NOLOAD) : {} > APP_LOG_MEM
    .bss:tiovx_obj_desc_mem (NOLOAD) : {} > TIOVX_OBJ_DESC_MEM
    .bss:ipc_vring_mem (NOLOAD) : {} > IPC_VRING_MEM

    .bss:l1mem (NOLOAD)(NOINIT) : {} > L1RAM_C7x_1
    .bss:l2mem (NOLOAD)(NOINIT) : {} > L2RAM_C7x_1
    /* .l2dmemory > L2RAM_C7x_1*/
    .bss:l3mem (NOLOAD)(NOINIT) : {} > MSMC_C7x_1

    .resource_table > DDR_C7x_1_RESOURCE_TABLE

    GROUP: > DDR_C7x_1
    {
    .data.ti_sysbios_family_c7x_Mmu_tableArray : type=NOINIT
    .data.ti_sysbios_family_c7x_Mmu_tableArraySlot : type=NOINIT
    .data.ti_sysbios_family_c7x_Mmu_level1Table : type=NOINIT

    .data.ti_sysbios_family_c7x_Mmu_tableArray_NS : type=NOINIT
    .data.ti_sysbios_family_c7x_Mmu_tableArraySlot_NS : type=NOINIT
    .data.ti_sysbios_family_c7x_Mmu_level1Table_NS : type=NOINIT

    }


    }

    When I add the line [ .l2dmemory     >   L2RAM_C7x_1 ] to it, I get the following error:

    program will not fit into available memory, or the section contains a call
    site that requires a trampoline that can't be generated for this section.
    run placement with alignment fails for section ".l2dmemory" size 0x1c00.
    Available memory ranges:
    L2RAM_C7x_1 size: 0x78000 unused: 0x0 max hole: 0x0

    Could you help me modify this file?

    Thank you,

    Henry
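
The linker message above reports "unused: 0x0" for L2RAM_C7x_1, so the region is already full before .l2dmemory (size 0x1c00) is placed. In the SECTIONS block shown, the only section mapped to that region is .bss:l2mem. The sketch below works through the arithmetic, assuming that .bss:l2mem is what fills the region and that the application can reduce its size. The function name and the alignment parameter are hypothetical, and the alignment actually required for .l2dmemory is not stated in this thread.

```c
#include <stdint.h>

#define L2_REGION_SIZE 0x78000u /* LENGTH of L2RAM_C7x_1 in the MEMORY block */
#define L2DMEMORY_SIZE 0x1c00u  /* size of .l2dmemory reported by the linker */

/*
 * Largest size, rounded down to a multiple of align, that the other L2
 * occupant may have while still leaving `needed` bytes free in the region.
 * Returns 0 if `needed` does not fit at all or align is 0.
 */
static uint32_t max_other_l2_bytes(uint32_t region, uint32_t needed, uint32_t align)
{
    if (align == 0u || needed > region) {
        return 0u;
    }
    uint32_t remaining = region - needed;
    return remaining - (remaining % align);
}
```

With the numbers from the error message, 0x78000 - 0x1c00 = 0x76400 bytes would remain for everything else in L2, before accounting for any alignment padding the linker adds.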

  • Hi Henry,

    A sample linker command file is available at ti/mmalib/concerto/c71/lnk.cmd for you to make an MMALIB build.

    To make a sample test build, please use the build instructions given in the user guide.

    Thank you,

    Sandeep