TDA4VH-Q1: seg model infer fail

Shinting

Part Number: TDA4VH-Q1

Tool/software:

Issue Description: The same model, inferring the same image. The result obtained using the TIDL simulation tool is normal. However, when running the inference on the evaluation board, the result is abnormal. Comparing the binary results of both, it is found that most parts are the same, with the matching parts corresponding to the segmented results, which validate correctly. The parts where the binary results differ between the two correspond to the erroneous results.

Original Model:
segod_res18_410.zip
Converted Model:
tidl_io_seg_1.zip

tidl_net_seg.zip
Simulation Inference Result:
seg_out.zip
Evaluation Board Inference Result:
em_seg_output_nopad.zip
Binary Comparison Result:

Sample Image:

Floating-point numbers of each layer output:
seg_double_result.zip

3 months ago

0 Shinting 3 months ago

Prodigy 30 points

Additional Information:
Inference results obtained on the PC, which we believe are correct
import config:

Fullscreen tidl_import_seg.txt Download

#Basic Configuration Parameters
modelType          = 2
numParamBits       = 8


inputNetFile      = "../../test/testvecs/models/public/onnx/segod_res18_410.onnx"
outputNetFile      = "../../test/testvecs/output/seg/tidl_net_seg.bin"
outputParamsFile   = "../../test/testvecs/output/seg/tidl_io_seg_"

inDataNamesList = "data"
outDataNamesList = "od_pred_center_heat od_reg_map od_points_refine_map feature index"
inWidth  = 384
inHeight = 384
inNumChannels = 3

# Input Pre-Processing Parameters
inDataNorm  = 1
inMean = 123.675 116.28 103.53
inScale = 0.0171247538 0.0175070028 0.0174291939
inDataFormat = 0

# Performance Simulation Parameters
#perfSimTool = 	../../utils/perfsim/ti_cnnperfsim.out
#perfSimConfig = ../../test/testvecs/config/import/device_config.cfg
#graphVizTool = ../../utils/tidlModelGraphviz/out/tidl_graphVisualiser.out

# Range Collection (Quant) Parameters
#tidlStatsTool = ../../test/PC_dsp_test_dl_algo.out
inData = ../../test/testvecs/config/quant/quant_list_seg.txt
inFileFormat 		    = 2
postProcType 		    = 0
CalibrationOption = 7
quantizationStyle = 3
debugTraceLevel = 2
writeTraceLevel = 3

import log:
seg_0527.log

0 Adam Hua 3 months ago in reply to Shinting

TI__Expert 4910 points

Hi Shinting,

Shinting said:
Floating-point numbers of each layer output:

1. How did you get the results of each layer? In pc emulation mode or target evm mode? And how did you convert from bin to the txt files?

2. Could you also provide the infer config?

Regards,

Adam

0 Shinting 3 months ago in reply to Adam Hua

Prodigy 30 points

I obtained the results of each layer by adding the parameter writeTraceLevel = 3 during model conversion on the PC. I converted every 4 bytes in the binary file to a float, and then stored them in a txt file. As shown in the figure below:

infer config:

Fullscreen config_list_seg.txt Download

./seg/tidl_infer_seg.txt
0

tidl_infer_seg.zip

Fullscreen test_seg.txt Download

./seg/20210313_000506_000025.bmp

0 Adam Hua 3 months ago in reply to Shinting

TI__Expert 4910 points

Can you also try this on evm?

Shinting said:
I obtained the results of each layer by adding the parameter writeTraceLevel = 3 during model conversion on the PC

You may see a folder called "trace" where you execute the executable and the layer result will be saved into it. You may also check the /tmp folder if there is no "trace" folder

The target for these steps is to know which layer starts to produce this mismatch and will help further debug

Regards,

Adam

0 Shinting 3 months ago in reply to Adam Hua

Prodigy 30 points

On the evm, we perform inference using the encapsulated vision_apps.
How do I set writeTraceLevel = 3 to output the results of each layer?
Key code snippet:

Source code:

Fullscreen 2262.main.c Download

#include "em_api.h"
#include <stdlib.h>
#include <stdio.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

//#define NOPAD

int InferPld(const char* inputDataPath)
{
    int ret = 0;
    uint8_t *pldInputP = (uint8_t *)malloc(768*768*3);
    uint8_t* tmpP = (uint8_t *)malloc(768*768*3);
    FILE *inputFp = fopen(inputDataPath, "r");
    if(inputFp != NULL)
    {
        fseek(inputFp, 54, SEEK_SET);
        fread(tmpP, 768*768*3, 1, inputFp);
        fclose(inputFp);
        /* Deinterleave RGB to planar */
        uint32_t chOffset = 768 * 768;
        for(int i = 0; i < 768; i ++)
        {
            uint8_t* pData = tmpP + ((768-1-i)*(768*3));
            for(int j = 0; j < 768; j ++)
            {
                pldInputP[(0 * chOffset) + (i * 768) + j] = *pData++;
                pldInputP[(1 * chOffset) + (i * 768) + j] = *pData++;
                pldInputP[(2 * chOffset) + (i * 768) + j] = *pData++;
            }
        }
        printf("---------pld test input\n");
    }


    uint8_t *outputP[10];
    for(int i=0; i<10; i++){
        outputP[i] = (uint8_t *)malloc(100000);
    }

    ret = em_tidl_init();
    printf("0======%d\n", ret);
    int netId = em_tidl_net_init("/run/media/rootfs-mmcblk0p1/ai_test/tidl_test_lnx/test_data/tidl_io_pld_1.bin", "/run/media/rootfs-mmcblk0p1/ai_test/tidl_test_lnx/test_data/tidl_net_pld.bin");
    printf("1======%d\n", netId);

    ret = em_tidl_infer(netId, pldInputP);
    ret = em_tidl_getOutput(netId, outputP);
    printf("2======%d\n", ret);

    FILE *outputFp = fopen("./em_pld_output", "w+");

#ifdef NOPAD
    for(int i=0;i<3;i++) //corner_cls corner_angle corner_reg  192*221*2
    {
        int index = 0;
        int dataSize = 192*192;
        int padSize = 192*29;
        fwrite(outputP[i]+index, dataSize, 1, outputFp);
        index += dataSize;
        index += padSize;
        fwrite(outputP[i]+index, dataSize, 1, outputFp);
    }
    
    //points_map 48*99*8
    int index = 0;
    int dataSize = 48*48;
    //int padSize = 48*51;
    int padSize = 0x980;
    int channel = 8;
    int outputIndex=3;
    for(int i=0;i<channel;i++)
    {
        fwrite(outputP[outputIndex]+index, dataSize, 1, outputFp);
        index += dataSize;
        index += padSize;
    }

    //heat_map 48*99*3
    index = 0;
    dataSize = 48*48;
    //padSize = 48*51;  0x990
    padSize = 0x980;
    channel = 3;
    outputIndex=4;
    for(int i=0;i<channel;i++)
    {
        fwrite(outputP[outputIndex]+index, dataSize, 1, outputFp);
        index += dataSize;
        index += padSize;
    }

    //first_pt_map 48*99*4
    index = 0;
    dataSize = 48*48;
    //padSize = 48*51;
    padSize = 0x980;
    channel = 4;
    outputIndex=5;
    for(int i=0;i<channel;i++)
    {
        fwrite(outputP[outputIndex]+index, dataSize, 1, outputFp);
        index += dataSize;
        index += padSize;
    }
#else
    fwrite(outputP[0], 192*221*2, 1, outputFp); //corner_cls
    fwrite(outputP[1], 192*221*2, 1, outputFp); //corner_angle
    fwrite(outputP[2], 192*221*2, 1, outputFp); //corner_reg
    fwrite(outputP[3], 48*99*8, 1, outputFp); //points_map
    fwrite(outputP[4], 48*99*3, 1, outputFp); //heat_map
    fwrite(outputP[5], 48*99*4, 1, outputFp); //first_pt_map
#endif
    
    fclose(outputFp);

    ret = em_tidl_net_deinit(netId); 
    ret = em_tidl_deinit(); 
    printf("3======%d\n", ret);
}

int InferOd(const char* inputDataPath)
{
    int ret = 0;
    /*inWidth  = 512
inHeight = 320
inNumChannels = 3*/
 
    uint8_t *odInputP = (uint8_t *)malloc(512*320*3);
    uint8_t* tmpP = (uint8_t *)malloc( 512*320*3);
    FILE *inputFp = fopen(inputDataPath, "r");
    if(inputFp != NULL)
    {
        fseek(inputFp, 54, SEEK_SET);
        fread(tmpP, 512*320*3, 1, inputFp);
        fclose(inputFp);
        /* Deinterleave RGB to planar */
        uint32_t chOffset = 512 * 320;
        for(int i = 0; i < 320; i ++)
        {
            uint8_t* pData = tmpP + ((320-1-i)*(512*3));
            for(int j = 0; j < 512; j ++)
            {
                odInputP[(0 * chOffset) + (i * 512) + j] = *pData++;
                odInputP[(1 * chOffset) + (i * 512) + j] = *pData++;
                odInputP[(2 * chOffset) + (i * 512) + j] = *pData++;
            }
        }
        printf("---------od test input\n");
    }


    uint8_t *outputP[10];
    for(int i=0; i<10; i++){
        outputP[i] = (uint8_t *)malloc(100000);
    }

    ret = em_tidl_init();
    printf("0======%d\n", ret);
    int netId = em_tidl_net_init("/run/media/rootfs-mmcblk0p1/ai_test/tidl_test_lnx/test_data/tidl_io_od_1.bin", "/run/media/rootfs-mmcblk0p1/ai_test/tidl_test_lnx/test_data/tidl_net_od.bin");
    printf("1======%d\n", netId);

    ret = em_tidl_infer(netId, odInputP);
    ret = em_tidl_getOutput(netId, outputP);
    printf("2======%d\n", ret);

    FILE *outputFp = fopen("./em_od_output", "w+");

#ifdef NOPAD
    /*
    hm      128*110*5
    hm_det  128*110*6
    wh      128*110*2
    reg     128*110*2
    hps_coord   128*110*8
    hps_conf    128*110*4
    hm_hp_car4  128*110*4
    hp_offset   128*110*2
    */

    //hm      128*110*5
    int index = 0;
    int dataSize = 128*80;
    int padSize = 128*30; //128*30
    int channel = 5;
    int outputIndex=0;
    for(int i=0;i<channel;i++)
    {
        fwrite(outputP[outputIndex]+index, dataSize, 1, outputFp);
        index += dataSize;
        index += padSize;
    }
    //hm_det  128*110*6
    index = 0;
    dataSize = 128*80;
    padSize = 128*30; //128*30
    channel = 6;
    outputIndex=1;
    for(int i=0;i<channel;i++)
    {
        fwrite(outputP[outputIndex]+index, dataSize, 1, outputFp);
        index += dataSize;
        index += padSize;
    }
    //wh      128*110*2
    index = 0;
    dataSize = 128*80;
    padSize = 128*30; //128*30
    channel = 2;
    outputIndex=2;
    for(int i=0;i<channel;i++)
    {
        fwrite(outputP[outputIndex]+index, dataSize, 1, outputFp);
        index += dataSize;
        index += padSize;
    }
    //reg     128*110*2
    index = 0;
    dataSize = 128*80;
    padSize = 128*30; //128*30
    channel = 2;
    outputIndex=3;
    for(int i=0;i<channel;i++)
    {
        fwrite(outputP[outputIndex]+index, dataSize, 1, outputFp);
        index += dataSize;
        index += padSize;
    }
    //hps_coord   128*110*8
    index = 0;
    dataSize = 128*80;
    padSize = 128*30; //128*30
    channel = 8;
    outputIndex=4;
    for(int i=0;i<channel;i++)
    {
        fwrite(outputP[outputIndex]+index, dataSize, 1, outputFp);
        index += dataSize;
        index += padSize;
    }
    //hps_conf    128*110*4
    index = 0;
    dataSize = 128*80;
    padSize = 128*30; //128*30
    channel = 4;
    outputIndex=5;
    for(int i=0;i<channel;i++)
    {
        fwrite(outputP[outputIndex]+index, dataSize, 1, outputFp);
        index += dataSize;
        index += padSize;
    }
    //hm_hp_car4  128*110*4
    index = 0;
    dataSize = 128*80;
    padSize = 128*30; //128*30
    channel = 4;
    outputIndex=6;
    for(int i=0;i<channel;i++)
    {
        fwrite(outputP[outputIndex]+index, dataSize, 1, outputFp);
        index += dataSize;
        index += padSize;
    }
    //hp_offset   128*110*2
    index = 0;
    dataSize = 128*80;
    padSize = 128*30; //128*30
    channel = 2;
    outputIndex=7;
    for(int i=0;i<channel;i++)
    {
        fwrite(outputP[outputIndex]+index, dataSize, 1, outputFp);
        index += dataSize;
        index += padSize;
    }
#else
    fwrite(outputP[0], 128*110*5, 1, outputFp); //hm
    fwrite(outputP[1], 128*110*6, 1, outputFp); //hm_det
    fwrite(outputP[2], 128*110*2, 1, outputFp); //wh
    fwrite(outputP[3], 128*110*2, 1, outputFp); //reg
    fwrite(outputP[4], 128*110*8, 1, outputFp); //hps_coord
    fwrite(outputP[5], 128*110*4, 1, outputFp); //hps_conf
    fwrite(outputP[6], 128*110*4, 1, outputFp); //hm_hp_car4
    fwrite(outputP[7], 128*110*2, 1, outputFp); //hp_offset
#endif
    fclose(outputFp);

    ret = em_tidl_net_deinit(netId); 
    ret = em_tidl_deinit(); 
    printf("3======%d\n", ret);
}

int InferSeg(const char* inputDataPath)
{
    int ret = 0;
    /*inWidth  = 384
inHeight = 384
inNumChannels = 3*/
    uint8_t *segInputP = (uint8_t *)malloc(384*384*3);
    uint8_t* tmpP = (uint8_t *)malloc( 384*384*3);
    FILE *inputFp = fopen(inputDataPath, "r");
    if(inputFp != NULL)
    {
        fseek(inputFp, 54, SEEK_SET);
        fread(tmpP, 384*384*3, 1, inputFp);
        fclose(inputFp);
        /* Deinterleave RGB to planar */
        uint32_t chOffset = 384 * 384;
        for(int i = 0; i < 384; i ++)
        {
            uint8_t* pData = tmpP + ((384-1-i)*(384*3));
            for(int j = 0; j < 384; j ++)
            {
                segInputP[(0 * chOffset) + (i * 384) + j] = *pData++;
                segInputP[(1 * chOffset) + (i * 384) + j] = *pData++;
                segInputP[(2 * chOffset) + (i * 384) + j] = *pData++;
            }
        }
        printf("---------seg test input\n");
    }

    uint8_t *outputP[10];
    for(int i=0; i<10; i++){
        outputP[i] = (uint8_t *)malloc(150000);
        memset(outputP[i], 0, 150000); // 将申请的空间全部设为0
    }

    ret = em_tidl_init();
    printf("0======%d\n", ret);
    int netId = em_tidl_net_init("/run/media/rootfs-mmcblk0p1/ai_test/tidl_test_lnx/test_data/tidl_io_seg_1.bin", "/run/media/rootfs-mmcblk0p1/ai_test/tidl_test_lnx/test_data/tidl_net_seg.bin");
    printf("1======%d\n", netId);

    ret = em_tidl_infer(netId,segInputP);
    ret = em_tidl_getOutput(netId, outputP);
    printf("2======%d\n", ret);

    FILE *outputFp = fopen("./em_seg_output", "w+");

#ifdef NOPAD
    /*
      od_pred_center_heat    96*(96+51)*5
      od_reg_map             96*(96+51)*2
      od_points_refine_map   96*(96+51)*4
      feature                384*(384+11)*9
      index                  384*384*1
    */

    //od_pred_center_heat    96*(96+51)*5
    int index = 0;
    int dataSize = 96*96;
    int padSize = 96*51; //96*51
    int channel = 5;
    int outputIndex=0;
    for(int i=0;i<channel;i++)
    {
        fwrite(outputP[outputIndex]+index, dataSize, 1, outputFp);
        index += dataSize;
        index += padSize;
    }
    //od_reg_map             96*(96+51)*2
    index = 0;
    dataSize = 96*96;
    padSize = 96*51; //96*51
    channel = 2;
    outputIndex=1;
    for(int i=0;i<channel;i++)
    {
        fwrite(outputP[outputIndex]+index, dataSize, 1, outputFp);
        index += dataSize;
        index += padSize;
    }
    //od_points_refine_map   96*(96+51)*4
    index = 0;
    dataSize = 96*96;
    padSize = 96*51; //96*51
    channel = 4;
    outputIndex=2;
    for(int i=0;i<channel;i++)
    {
        fwrite(outputP[outputIndex]+index, dataSize, 1, outputFp);
        index += dataSize;
        index += padSize;
    }
    //feature                384*(384+11)*9
    index = 0;
    dataSize = 384*384;
    padSize = 384*11 - 0x80; //384*11
    channel = 9;
    outputIndex=3;
    for(int i=0;i<channel;i++)
    {
        fwrite(outputP[outputIndex]+index, dataSize, 1, outputFp);
        index += dataSize;
        index += padSize;
    }
    //index                  384*384*1
    index = 0;
    dataSize = 384*384;
    padSize = 0; //no pad
    channel = 1;
    outputIndex=4;
    for(int i=0;i<channel;i++)
    {
        fwrite(outputP[outputIndex]+index, dataSize, 1, outputFp);
        index += dataSize;
        index += padSize;
    }
#else
    fwrite(outputP[0], 96*147*5, 1, outputFp); //od_pred_center_heat
    fwrite(outputP[1], 96*147*2, 1, outputFp); //od_reg_map
    fwrite(outputP[2], 96*147*4, 1, outputFp); //od_points_refine_map
    fwrite(outputP[3], 384*395*9, 1, outputFp); //feature
    fwrite(outputP[4], 384*384*1, 1, outputFp); //index
#endif
    fclose(outputFp);

    ret = em_tidl_net_deinit(netId); 
    ret = em_tidl_deinit(); 
    printf("3======%d\n", ret);
}



int main(int argc, char *argv[])
{
    

    InferPld("./test_data/pld/20210403_101047_000050.bmp");
    //InferOd("./test_data/od/20210706_154150_226_right.bmp");
    //InferSeg("./test_data/seg/20210313_000506_000025.bmp");

    return 0;
}

app_tidl_em.zip

0 Adam Hua 3 months ago in reply to Shinting

TI__Expert 4910 points

As we communicate, you can search for traceWriteLevel in your code and set it to 3 to get the intermediate results of each layer.

And it is good to know the results of using ti rtos executable produces the same results in PC and EVM. The issue seems to be in postprocessing of customer code. There isn't mismatch problem now.

This ticket can be closed then.

Regards,

Adam

Processors

Processors forum

TDA4VH-Q1: seg model infer fail