Abstract: Multi-batch model inference in PC emulation mode has a precision problem caused by a padding error. While TIDL_layerPadding fills the output of one particular layer, an index is calculated incorrectly, so valid data is overwritten with padValue (0). We have a proposed fix; please review it.
Description: Inference of a multi-batch model (numBatches = 10) in PC mode shows a precision problem. We feed the same input data to every batch and expect identical outputs, but the batches after the first produce results that differ from the first batch.
1. Model Importer Operation
Onnx: resnet18v2.onnx (https://s3.amazonaws.com/onnx-model-zoo/resnet/resnet18v2/resnet18v2.onnx)
Shell command: ./tidl_model_import.out resnet18v2_importer.txt
Please refer to the attachment resnet18v2.zip (1715.resnet18v2.zip) for details about the configuration and model.
2. Model Inference Operation
Environment: Ubuntu 20, x86
First, we create the params from the TIDL model bin files, following the example in vision_apps. Second, we create the tivxTIDLNode and the vxGraph. Then we call vxVerifyGraph and vxProcessGraph.
In our program, the tiovx-related symbols are resolved by linking against libvx_tidl_rt.so.
The input vx_tensor holds 10 copies of the same image; see ILSVRC2012_val_00008685.bin.
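For readers who want to build the same kind of input, here is a minimal sketch of replicating one image into every batch slot. The helper name replicate_image is hypothetical, and the sketch assumes the batches sit back to back in one contiguous host buffer with no extra padding between them; the real vx_tensor layout may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy one image into num_batches consecutive slots.
 * Assumption: dst is a contiguous, batch-major buffer of at least
 * num_batches * image_bytes bytes. */
void replicate_image(void *dst, const void *src,
                     size_t image_bytes, size_t num_batches)
{
    assert(dst != NULL && src != NULL);
    unsigned char *out = (unsigned char *)dst;
    for (size_t b = 0; b < num_batches; ++b) {
        /* Slot b starts b * image_bytes bytes into the buffer. */
        memcpy(out + b * image_bytes, src, image_bytes);
    }
}
```

With numBatches = 10, the call would be replicate_image(batched, image, image_bytes, 10) before the data is copied into the tensor.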
3. Error Information
As shown in the following picture, the outputs of the other batches differ from the output of the first batch.
4. Analysis and Solution
We dumped the output of every layer during inference with “traceWriteLevel=3” and compared the batches within each dump file. The batches start to differ at the 43rd layer, which is a Pooling layer.
Further analysis shows that the problem is caused by the function TIDL_layerPadding, which is called after the 42nd layer (BatchReshape) has been processed. In this function, output values that were originally correct are overwritten with padValue (0).
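The per-batch comparison can be sketched as follows. This is only an illustration: the function name first_mismatching_batch is hypothetical, and it assumes each dump has been loaded into memory with the batches stored back to back, each occupying the same number of bytes. The actual trace file layout should be checked before relying on it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the index of the first batch whose bytes
 * differ from batch 0, or -1 if all batches are identical.
 * Assumption: batches are contiguous and bytes_per_batch bytes each. */
long first_mismatching_batch(const void *buf, size_t num_batches,
                             size_t bytes_per_batch)
{
    assert(buf != NULL);
    const unsigned char *base = (const unsigned char *)buf;
    for (size_t b = 1; b < num_batches; ++b) {
        /* Byte-wise compare of batch b against batch 0. */
        if (memcmp(base, base + b * bytes_per_batch, bytes_per_batch) != 0) {
            return (long)b;
        }
    }
    return -1;
}
```

Running this over the dumps in layer order points to the first layer at which any batch stops matching batch 0.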
/* tidl_alg_utils.c TIDL_layerPadding */
if (((padRFillZeros > 0) || (TIDL_PADDING_TYPE_TOP_LEFT == paddingType)) && (TIDL_PADDING_TYPE_PAD_LAYER != paddingType))
{
  status = TIDL_FillPaddedRows((uint8_t *)outPtrs[j], ...); // not called for the 42nd layer
}
if((padC > 0) && (TIDL_PADDING_TYPE_PAD_LAYER != paddingType) && (status == IALG_EOK))
{
  status = TIDL_FillPaddedCols((uint8_t *)outPtrs[j], ...); // bufInfo->bufHeight has not been updated here
}
The root cause is that the statement “bufInfo->bufHeight = bufInfo->bufHeight / numBatches” exists in TIDL_FillPaddedRows but not in TIDL_FillPaddedCols. For the 42nd layer, TIDL_FillPaddedRows is not called, so bufInfo->bufHeight is never updated and the fill index computed for that layer's output is wrong. For reference, bufInfo->bufHeight inside TIDL_FillPaddedCols for the 42nd layer is 5120, whereas it should be 512.
Solution: Move the update of bufInfo->bufHeight from TIDL_FillPaddedRows up into its parent function TIDL_layerPadding, as follows:
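The dependency between the two fill functions can be reproduced with a toy model. Everything here is a hypothetical stand-in, not TIDL code: ToyBufInfo models only the bufHeight field, the toy_ functions do no actual filling, and the column fill simply reports the height it sees. The second flow normalizes the height in the caller instead, so the result no longer depends on whether the row fill ran.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for sBufferInfo_t; only bufHeight is modeled. */
typedef struct { int bufHeight; } ToyBufInfo;

/* Stand-in for the row fill: the only place where the height is divided. */
void toy_fill_rows(ToyBufInfo *info, int numBatches)
{
    assert(numBatches > 0);
    info->bufHeight = info->bufHeight / numBatches;
}

/* Stand-in for the column fill: returns the height it would index with. */
int toy_fill_cols(const ToyBufInfo *info)
{
    return info->bufHeight;
}

/* Flow as described in the post: the division happens only if rows are filled. */
int toy_padding_original(int totalHeight, int numBatches, bool fillRows)
{
    ToyBufInfo info = { totalHeight };
    if (fillRows) {
        toy_fill_rows(&info, numBatches);
    }
    return toy_fill_cols(&info);
}

/* Alternative flow: the caller normalizes once, before any fill runs. */
int toy_padding_normalized(int totalHeight, int numBatches, bool fillRows)
{
    assert(numBatches > 0);
    ToyBufInfo info = { totalHeight / numBatches };
    (void)fillRows; /* a row fill here would reuse the normalized height */
    return toy_fill_cols(&info);
}
```

With a total height of 5120 and 10 batches, the original flow hands 512 to the column fill only when the row fill has run first and 5120 otherwise, while the normalized flow yields 512 in both cases.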
/* tidl_alg_utils.c TIDL_layerPadding */
sBufferInfo_t *bufInfo = &intAlgHandle->perfSimOutput->sdataFlowInfo[i].bufInfo[OUT_FEAT_MAP][WRITE];
bufInfo->bufHeight = bufInfo->bufHeight / TIDLLayer->outData.dimValues[TIDL_DIM_BATCH]; // update bufHeight before filling
if (((padRFillZeros > 0) || (TIDL_PADDING_TYPE_TOP_LEFT == paddingType)) && (TIDL_PADDING_TYPE_PAD_LAYER != paddingType))
{
  status = TIDL_FillPaddedRows((uint8_t *)outPtrs[j], ...); // not called for the 42nd layer
}
if((padC > 0) && (TIDL_PADDING_TYPE_PAD_LAYER != paddingType) && (status == IALG_EOK))
{
  status = TIDL_FillPaddedCols((uint8_t *)outPtrs[j], ...); // bufInfo->bufHeight is already updated above
}
The output data is correct after this modification:
5. Questions and Requirements
a. Is this solution feasible, and is there anything else that we have not considered?
b. Will this issue be fixed in a later version?