PROCESSOR-SDK-DRA8X-TDA4X: tivxMapTensorPatch() Stride

Part Number: PROCESSOR-SDK-DRA8X-TDA4X

Hello,

I am trying to create an application using QNX 7.0 SDP to test out the DNN process. I am converting an ONNX-based model using the TIDL importer. The model uses float input for a point cloud (x, y, z, r). I am trying to figure out the nomenclature for the stride input to the tivxMapTensorPatch() function. I see plenty of examples for camera-based input, but none for a point cloud or other generic tensor input.

So far, I am trying to base my code on the example located in tiovx/ch05_tidl/vx_tutorial_tidl.c:readInput. When creating the tensor I do not populate the stride, but I am unclear what I need for the tivxMapTensorPatch() function. The parameter description of "An array of stride in all dimensions in bytes. The stride value at index 0 must be size of the tensor data element type." does not make sense to me. The input type will be uint8_t after the importer. Is there any pictorial documentation that describes these terms for a generic tensor (not a picture or camera-based example), in terms of the rank of the tensor, and that maps them to the sTIDL_IOBufDesc_t structure?

Thanks,

Weston

  • Hi,

    If the tensor is going to be passed to TI-DL, then it has to be one of the following types: VX_TYPE_UINT8, VX_TYPE_INT8, VX_TYPE_UINT16, VX_TYPE_INT16. TI-DL doesn’t support float as input. The import tool converts a floating point model into fixed point, and therefore the input has to be an 8-bit or 16-bit integer type. If the model was trained with floating point input, TI-DL can apply scaling to the input in order to map back from integer space to float space. This scaling can be specified in the import configuration file through the parameters inDataNorm, inMean and inScale. See the TI-DL user’s guide, chapter ‘TIDL importer’. Usually when we run computer vision algorithms, the input data are either 8-bit or 12-bit pixels, so we don’t specify these parameters during import. For models like yours, however, they need to be specified. See for example the import config file tidl_import_mobileNetv2_2MP.txt, where we normalize the pixel values [0, 255] to the range [-1, 1] because the model was trained with float values.
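
    As a rough illustration of the integer-to-float relationship described above, here is a minimal sketch. It assumes the internal conversion has the affine form (q - mean) * scale, and it uses illustrative values (mean 127.5, scale 1/127.5) for the [0, 255] to [-1, 1] case; neither the formula nor the numbers are taken from the actual config file. The helper names quantize_u8 and dequantize_u8 are hypothetical and are not part of TI-DL or TIOVX.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pick the uint8 code q such that
 * (q - mean) * scale is as close as possible to `value`.
 * ASSUMPTION: the affine form mirrors what inMean/inScale do. */
static uint8_t quantize_u8(float value, float mean, float scale)
{
    assert(scale > 0.0f);
    float q = value / scale + mean;
    if (q < 0.0f)   { q = 0.0f; }    /* clamp to the uint8 range */
    if (q > 255.0f) { q = 255.0f; }
    return (uint8_t)(q + 0.5f);      /* round to nearest */
}

/* Hypothetical helper: the assumed integer-to-float mapping. */
static float dequantize_u8(uint8_t q, float mean, float scale)
{
    return ((float)q - mean) * scale;
}
```

    With these illustrative parameters, the code 0 maps back to about -1 and the code 255 maps back to about 1, so the application would fill the tensor with the uint8 codes and let the import-time parameters describe the float range.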

    Now regarding the dimension of the tensor, please refer to function createInputTensors() in vision_apps/apps/dl_demos/app_tidl/main.c

    You should use this function as it is. The only thing you may want to change is to replace ‘VX_TYPE_UINT8’ with the actual type you are using.

    Although the rank of your tensor is 2, TI-DL can only deal with tensors of rank 3, because the 3rd dimension is the number of channels. In your case, the number of channels should be 1. That value should be taken from ioBufDesc->inNumChannels[id], which comes from the output of the import tool.

    Also, since each element in the tensor is a quadruplet x,y,z,r, the width of the tensor should be 4x the number of data points. This has to be passed to the import tool through the parameter ‘inWidth’.

    For instance if your tensor is a 2-D array of 100 points x 100 lines with each point being a quadruplet x,y,z,r, then the parameters are inWidth=400 and inHeight=100, inNumChannels=1 .

    The function createInputTensors() will assign the following values:

    input_sizes[0] = ioBufDesc->inWidth[id] /*400*/  + ioBufDesc->inPadL[id] + ioBufDesc->inPadR[id];

    input_sizes[1] = ioBufDesc->inHeight[id] /*100*/+ ioBufDesc->inPadT[id] + ioBufDesc->inPadB[id];

    input_sizes[2] = ioBufDesc->inNumChannels[id] /*1*/;
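
    The same size computation, extended to the byte strides that tivxMapTensorPatch() expects, can be sketched as follows. This is a sketch under assumptions: the tensor is densely packed, so each stride is the previous stride times the previous size, and stride[0] is the element size as the parameter description quoted in the question requires. InputDesc is a hypothetical stand-in for the handful of sTIDL_IOBufDesc_t fields used here (the real structure holds arrays indexed by id), and tensor_sizes_and_strides is a made-up helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the sTIDL_IOBufDesc_t fields of ONE input. */
typedef struct {
    int32_t inWidth, inHeight, inNumChannels;
    int32_t inPadL, inPadR, inPadT, inPadB;
} InputDesc;

/* Compute padded sizes and dense byte strides for a rank-3 tensor.
 * ASSUMPTION: rows and channels are stored back to back with no gaps. */
static void tensor_sizes_and_strides(const InputDesc *d, size_t elem_size,
                                     size_t sizes[3], size_t strides[3])
{
    assert(d != NULL && elem_size > 0);

    sizes[0] = (size_t)(d->inWidth + d->inPadL + d->inPadR);   /* padded width  */
    sizes[1] = (size_t)(d->inHeight + d->inPadT + d->inPadB);  /* padded height */
    sizes[2] = (size_t)d->inNumChannels;                       /* channels      */

    strides[0] = elem_size;               /* bytes per element    */
    strides[1] = strides[0] * sizes[0];   /* bytes per padded row */
    strides[2] = strides[1] * sizes[1];   /* bytes per channel    */
}
```

    For the 400 x 100 x 1 example with uint8 elements and, purely for illustration, padding of 1 on the left, 1 on top and 2 at the bottom, this gives sizes {401, 103, 1} and strides {1, 401, 41303}.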

    The padding values are determined by your model: if the first layer is a convolution layer, then your model must have specified some padding.
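
    To show how padding affects where the point-cloud bytes land in the mapped buffer, here is a minimal sketch for one channel. It assumes that valid data starts inPadT rows down and inPadL columns in from the start of the padded plane, and that the buffer has already been zeroed; this layout is an assumption to check against the TIDL tensor format documentation. The function name copy_into_padded_plane and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy `height` rows of `width` bytes from a tightly packed source into a
 * padded destination plane whose rows are `row_stride` bytes apart.
 * ASSUMPTION: valid data begins at row `pad_top`, column `pad_left`. */
static void copy_into_padded_plane(uint8_t *dst, size_t row_stride,
                                   size_t pad_left, size_t pad_top,
                                   const uint8_t *src,
                                   size_t width, size_t height)
{
    assert(dst != NULL && src != NULL);
    assert(pad_left + width <= row_stride);

    for (size_t row = 0; row < height; ++row) {
        memcpy(dst + (row + pad_top) * row_stride + pad_left,
               src + row * width,
               width);
    }
}
```

    Here `width` would be 4 times the number of points per line for uint8 x,y,z,r quadruplets, and `row_stride` would be the stride at index 1 of the mapped tensor patch.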

    To verify that these padding values were correctly picked up by the import tool, you can open the *.svg file that the import tool generates along with the output *.bin file. This *.svg file contains a graphical representation of the network, with metadata for each layer describing its layout. If you open the file in Google Chrome, you can hover the mouse over a layer and a box will pop up with all its attributes, including the padding values.

    For a description of the sTIDL_IOBufDesc_t structure, please refer to the TIDL user’s guide, Feature Specific Guides -> TIDL Input and Output tensors Format.

    regards,

    Victor

  • Hi Victor,

    Thank you for the information. I was able to generate an SVG for the model tidl_net_mobilenetv2.onnx, since it uses the configuration file parameters inDataNorm, inMean and inScale. I am not able to match these values directly to the structure sTIDL_IOBufDesc_t. Can you map the terms in bold to the proper structure variable in sTIDL_IOBufDesc_t?

    <!-- Layer 0 Data ID: 0: TIDL_DataLayer 0_original
    Input Dimensions : 0x0x0x0
    Output Dimensions : 1x3x224x224
    In Data IDs:

    Space : 2
    Base Mem: 0 (0.000) &#45; 152576 (0.153)
    Size : 152576 (0.153)
    Ch Pitch : 50852
    PadC_IO = 0, 1
    PadCR = 1, 1
    PadCRZeros = 1, 1
    PadCRFillZeros = 0, 0
    -->

    I also reviewed the "TIDL: Input and Output Tensors Format" documentation. This article only describes the case where the input is an image, not another data type (e.g. a point cloud). I also read that when using inDataNorm, inMean and inScale, "TIDL aby adding BatchNorm Layer before passing input tensor to first processing layer". Does this mean that I can create the input tensor as float, with vxCreateTensor(context, 3, input_sizes, VX_TYPE_FLOAT32, 0), because TIDL will handle the conversion before the data layer of the model? Would I then use the vxMapUserDataObject() functions to populate the input tensor with my float inputs? I am really struggling to understand this because all of the variable nomenclature, code examples, and helper util functions (load image) are image based.

  • Hi Weston,

    In:

    Space : 2 
    Base Mem: 0 (0.000) &#45; 152576 (0.153) 
    Size : 152576 (0.153) 
    Ch Pitch : 50852
    PadC_IO = 0, 1
    PadCR = 1, 1
    PadCRZeros = 1, 1
    PadCRFillZeros = 0, 0

    PadC_IO=A, B. The 'I' means that the first number A is the amount of column padding at the input, and the 'O' means that the second number B is the padding at the output. So here we have '0' padding at the input and '1' padding at the output. Since you are dealing with a data layer, padding at the input will always be zero, and you only need to take into account the value '1' as padding. A data layer doesn't really have any input; it is always a producer, never a consumer. If the layer were a convolution layer, then the input padding of that layer would likely be a non-zero value.

    PadCR=A, B means that the first number A is the amount of column padding at the output and the second number B is the amount of row padding at the output. Usually A=B. 

    You can ignore PadCRZeros and PadCRFillZeros.

    Regarding your second question, the input tensor should always be in integer format, so VX_TYPE_FLOAT32 will produce incorrect results. The correct format would be one of VX_TYPE_UINT8, VX_TYPE_INT8, VX_TYPE_UINT16, VX_TYPE_INT16.

    The parameters inDataNorm, inMean and inScale are used by TIDL to convert the integer input to float, internally.

    regards,

    Victor

     

  • The function TIDL_writeInfo() in the file tidl_import_common.cpp has more details:

    gIOParams.inWidth[numDataBuf] = tIDLNetStructure->TIDLLayers[i].outData[j].dimValues[TIDL_DIM_WIDTH];
    gIOParams.inHeight[numDataBuf] = tIDLNetStructure->TIDLLayers[i].outData[j].dimValues[TIDL_DIM_HEIGHT];
    gIOParams.inNumChannels[numDataBuf] = tIDLNetStructure->TIDLLayers[i].outData[j].dimValues[TIDL_DIM_NUMCH];

    gIOParams.inPadL[numDataBuf] = tIDLNetStructure->TIDLLayers[i].outData[j].padW;
    gIOParams.inPadT[numDataBuf] = tIDLNetStructure->TIDLLayers[i].outData[j].padH;
    gIOParams.inPadR[numDataBuf] = 0;
    gIOParams.inChannelPitch[numDataBuf] = tIDLNetStructure->TIDLLayers[i].outData[j].pitch[TIDL_CHANNEL_PITCH];

    int32_t totalHeight = (gIOParams.inChannelPitch[numDataBuf] +
    gIOParams.inWidth[numDataBuf] + gIOParams.inPadL[numDataBuf] -1)/ (gIOParams.inWidth[numDataBuf] + gIOParams.inPadL[numDataBuf]);
    gIOParams.inPadB[numDataBuf] = totalHeight - gIOParams.inPadT[numDataBuf] - gIOParams.inHeight[numDataBuf];

    Basically, padW gets mapped to inPadL, padH gets mapped to inPadT, inPadR is set to 0, and inPadB is derived from the channel pitch using the formula above.
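
    As a worked check of the inPadB formula, here is a minimal sketch that applies it to the numbers from the SVG dump earlier in the thread (Ch Pitch 50852, output dimensions 1x3x224x224). It assumes PadCR = 1, 1 corresponds to padW = 1 and padH = 1. The function name bottom_padding is hypothetical; the arithmetic mirrors the quoted lines of TIDL_writeInfo().

```c
#include <assert.h>
#include <stdint.h>

/* Derive the bottom padding from the channel pitch.
 * total_height is the channel pitch divided by the row length
 * (width + left padding), rounded up with integer arithmetic. */
static int32_t bottom_padding(int32_t channel_pitch, int32_t width,
                              int32_t pad_left, int32_t height,
                              int32_t pad_top)
{
    int32_t row_length = width + pad_left;
    assert(row_length > 0);

    int32_t total_height = (channel_pitch + row_length - 1) / row_length;
    return total_height - pad_top - height;
}
```

    For the dump above the row length is 225, the rounded-up total height is 227, and the result is inPadB = 227 - 1 - 224 = 2.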

    You can insert some printf() and rebuild the import tools so this information gets displayed during import.

    regards,

    Victor