
TDA4VM: Getting information for bounding box

Part Number: TDA4VM


Hi

In the app_tidl_od demo application from the TI Processor SDK RTOS, how can I access and read the detailed information of the detected objects?
Specifically, I want to retrieve the detection results, such as scores and bounding box coordinates, for each object so that I can use or log this data further.

I understand that the inference output from TIDL is typically returned as a tensor or array. 

How can I access the detection metadata (scores, coordinates, class IDs) from the output tensors?

Is there a utility function or a specific data structure in the demo code that decodes the tensor into meaningful detection results?

If I want to log or export the raw detection results, what would be the correct way to read and interpret the tensor contents?

For reference, I am attaching the image here.

Thank You 
Regards,
Komal 

  • Hi Komal,

    If you turn up the write trace level, you will get the intermediate outputs in a trace/ directory (it must already exist).  In your inference file:


    writeTraceLevel = 3

    This will generate a .y file (integer representation) and a _float.bin file (float32) for each layer.  This data is the intermediate output of all the layers; you can then use the generated SVG file in model-artifacts to match each layer's trace with the ONNX graph.
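    For example, a minimal way to inspect one of the float trace files with numpy (the file name below is a placeholder; use an actual name from your trace/ directory, and reshape to that layer's dimensions from the SVG graph):

    import numpy as np

    # Placeholder file name; substitute a real file from trace/
    layer_out = np.fromfile('trace/layer_0001_float.bin', dtype=np.float32)
    print(layer_out.size, layer_out.min(), layer_out.max())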

    Regards,

    Chris

  • Hi 
    Thanks for the help. 

    After running inference and dequeuing the output inside
    vx_status app_run_graph_for_one_frame_pipeline(AppObj *obj, vx_int32 frame_id)

    I would like to retrieve the actual detection results (bounding boxes, scores, and class IDs) for each detected object.

    I understand that TIDL inference outputs tensors/arrays, but my question is:

    - How can I retrieve the number of detected objects for a single frame inference?

    - Which API or data structure contains these decoded detection results, for both sequential and pipelined execution?

    - How do I access the metadata of each detection (bounding box coordinates, class ID, confidence score)?

    - Where in the demo code is the detection output tensor decoded into bounding boxes + scores + class IDs?

    - Is there a provided utility or structure in app_tidl_od that parses the raw tensor into object metadata?

    - If I want to log/export these results, what is the correct way to interpret the tensor contents (e.g., format of [x, y, w, h, score, class_id])?

    To clarify, I’m not asking about intermediate feature maps (writeTraceLevel output). Instead, I want the final post-processed detection results available after vxGraphParameterDequeueDoneRef() returns the output.

    Could you please point me to the exact function / API where the output tensor is parsed, so I can log or save bounding boxes and scores?

    Thank you 

    Regards,
    Komal

  • Hi Komal,

    I do not have your model, so I can only provide general guidelines.  Also, I can only comment on TIDL; please focus on that before adding the TIOVX complexity.  Ensure your model is working correctly stand-alone first.  The output tensors are written to a file designated in the inference configuration file.  For example:

    outData = ./out/result1.bin

    This inference configuration file is the input to PC_dsp_test_dl_algo.out in emulation, for example: ./PC_dsp_test_dl_algo.out s:myinference_file.txt

    After running your model, the above will place all the output tensors in the file designated by outData.  They will be in the model's order, but you can set the order explicitly.  For example, if you have offset, embed, and attr as outputs in your model, you can set this in the import config file by:

    outDataNamesList = "offset,embed,attr"

    Now let's say offset is a 1x3x24x24 tensor, embed is a 1x1x24x24 tensor, and attr is a 1x1x5x5 tensor.  The code to read the output tensors (note that each slice must start where the previous one ends):

    import numpy as np

    offset_size = 1*3*24*24
    embed_size = 1*1*24*24
    attr_size = 1*1*5*5

    all_outputs = np.fromfile('out/result1.bin', dtype=np.uint8)

    # The tensors are concatenated in outDataNamesList order
    offset = all_outputs[0:offset_size]
    embed = all_outputs[offset_size:offset_size + embed_size]
    attr = all_outputs[offset_size + embed_size:offset_size + embed_size + attr_size]
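    Continuing the snippet above, each flat slice can be reshaped back to its tensor dimensions for easier inspection (shapes are from this hypothetical example, not your model):

    offset = offset.reshape((1, 3, 24, 24))
    embed = embed.reshape((1, 1, 24, 24))
    attr = attr.reshape((1, 1, 5, 5))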

    The interpretation at this point is model-specific, and you, as the model user, should be able to make sense of what each tensor means.

    Regards,

    Chris

  • Hi,

    Vision SDK Used:
    /RTOS/ti-processor-sdk-rtos-j721e-evm-11_00_00_06/vision_apps

    Sample Application:
    app_tidl_od (YOLO-based object detection demo)
    The config and network files below are the ones already present in the TIDL samples:
    -> Config file:
    /ti/j7/workarea/tiovx/conformance_tests/test_data/psdkra/tidl_models/tidl_io_onnx_yolo_8200_416_1.bin

    -> Network file:
    /ti/j7/workarea/tiovx/conformance_tests/test_data/psdkra/tidl_models/tidl_net_onnx_yolo_8200_416.bin

     Target Platform:
    On-board testing with TDA4VM J721e EVM

    Generated outputs (for a single frame):

    0000000500_scaler_416x416.yuv
    pre_proc_output_0000000500_416x416_ch0.yuv
    tidl_output_0000000500_tensor_1_200x1x1_ch0.bin
    tidl_output_0000000500_tensor_0_5x200x1_ch0.bin
    mosaic_output_0000000500_1920x1080.yuv

    The .yuv files are clear (scaler input, pre-processed input, and mosaic output).

    The .bin files (tidl_output_*) contain raw tensor data.

      We need to access and interpret the detection results (bounding box coordinates, class IDs, and confidence scores) from the tensor output files. Specifically:

    1) What metadata format do these tensors follow?

    2) How can we parse the .bin tensor outputs to extract bounding boxes and detection scores?

    3) Is there an example or utility provided in Vision Apps / TIDL SDK that shows how to decode these tensors into human-readable detection information?


    Our interpretation:
    Based on the YOLO model structure, our reading is:

    tensor_0 (shape: 5×200×1) → Possibly contains bounding box coordinates and objectness scores. Each row may represent [x, y, w, h, confidence] for up to 200 proposals.

     tensor_1 (shape: 200×1×1) → Possibly contains the class ID (or class confidence distribution) for each of the 200 proposals.

    The final detection would look like:

    bounding_box = (x, y, w, h)
    score        = confidence
    class_id     = from tensor_1
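    A sketch of how we would decode under this interpretation (dtype assumed uint8 per the earlier reply; both the dtype and the [x, y, w, h] center/size convention are exactly what we want confirmed):

    import numpy as np

    # Decode under our (unconfirmed) interpretation: tensor_0 rows are
    # [x, y, w, h, confidence] per proposal, tensor_1 is the class ID.
    t0 = np.fromfile('tidl_output_0000000500_tensor_0_5x200x1_ch0.bin',
                     dtype=np.uint8).reshape((5, 200, 1)).astype(np.float32)
    t1 = np.fromfile('tidl_output_0000000500_tensor_1_200x1x1_ch0.bin',
                     dtype=np.uint8).reshape((200,))

    for i in range(200):
        x, y, w, h, conf = t0[:, i, 0]
        if conf == 0:  # skip empty proposals
            continue
        # Center/size to corner form, if (x, y) is indeed the box center
        x1, y1, x2, y2 = x - w / 2, y - h / 2, x + w / 2, y + h / 2
        print(f'det {i}: box=({x1:.0f},{y1:.0f})-({x2:.0f},{y2:.0f}) '
              f'score={conf:.0f} class={int(t1[i])}')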

    I would like confirmation on whether this interpretation is correct. If not, could you please provide the exact tensor decoding scheme used by app_tidl_od with the YOLO model? In general, bounding boxes have four corner points, but in this case we have only x, y, w, h; how do we verify/validate that the bounding-box output is correct?

    Thank you
    Regards,
    Komal

  • Hi Komal,

    Use something like this to read the .bin files:

    tensor1 = np.fromfile('tidl_output_0000000500_tensor_1_200x1x1_ch0.bin',dtype=np.uint8)

    The uint8 is the most common TIDL output, but if you changed the output to int8/uint16/int16, adjust accordingly.  Then do:

    tensor1 = tensor1.reshape((200, 1, 1))  # shape taken from the file name; reshape returns a new array

    You really should be testing this in emulation first and without OpenVX.  Isolate the problem, then add complexity.  You must have edgeai-tidl-tools installed somewhere to compile the model, so run your model in emulation there first (https://github.com/TexasInstruments/edgeai-tidl-tools).

    Regards,

    Chris