
TDA4VH-Q1: Encountered exceptions when looping through file reads for TIDL inference

Part Number: TDA4VH-Q1

Tool/software:

While testing with the original SDK, we encountered exceptions when running inference in a loop that reads input files.

Sometimes it prints exceptions that look like register errors, and sometimes the program simply freezes without any feedback.

In most cases, exceptions occur when the process of reading data -> performing TIDL inference -> saving data is looped around 3000 times.

However, if the program only performs a looped read test on a single image, it works normally.

The demo I run is TI_DEVICE_armv8_test_dl_algo_host_rt.out, which is compiled in c7x-mma-tidl/arm-tidl/rt.

 

This is my inference config:

/cfs-file/__key/communityserver-discussions-components-files/791/tidl_5F00_infer_5F00_east_5F00_single.txt

This is my inference dataset list:

/cfs-file/__key/communityserver-discussions-components-files/791/test_5F00_east_5F00_single.txt

I modified this part of the source code to support reading per-frame feature maps. This file is located at c7x-mma-tidl/arm-tidl/rt/test/src/

/cfs-file/__key/communityserver-discussions-components-files/791/tidl_5F00_tb_5F00_utils.c

  • Hi,

    I have redirected your question to the appropriate engineer.

    Thank you,

    Fabiana

  • Hi Guang,

    Are you opening and closing ort::sessions between runs?  A memory leak can happen in this situation.

    Additionally, here is what I will need to accurately debug this.

    • What rev of SDK/TIDL are you using?
    • Is this on the host or device?
    • A zip file of the .bin files listed in test_east_single.txt. I assume they are the inputs; the text file only lists their names, so the data itself is missing.
    • How you run this on the PC and the device (command line invocation)
    • What was modified in tidl_tb_utils?
    • What is meant by "However, if the program only performs a looped read test on a single image"? Are you reading the same image over and over, or do you save a copy of the image in memory and re-run it? Also, please provide the command line you use to run this.

    Chris

  • No, the demo TI_DEVICE_armv8_test_dl_algo_host_rt.out does not involve ort::sessions.

    • The SDK I used is ti-processor-sdk-rtos-j784s4-evm-09_02_00_05.tar.gz.
    • On the device.
    • You can use the following Python script to generate the input data:

                           

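    import os
    import numpy as np

    # Create the output directory used by the paths in test_east_single.txt.
    directory_path = "./data"
    if not os.path.exists(directory_path):
        os.makedirs(directory_path)
        print(f"Directory '{directory_path}' has been created.")
    else:
        print(f"Directory '{directory_path}' already exists.")

    # One raw uint8 feature map: 3 channels x 512 x 512.
    featuremap_size = 3 * 512 * 512
    with open("./test_east_single.txt") as list_file:
        for line in list_file:
            line = line.strip()
            if not line:
                continue  # skip blank lines instead of writing to an empty path
            # Random values in [0, 255); the upper bound of randint is exclusive.
            t1 = np.random.randint(0, 255, featuremap_size).astype(np.uint8)
            t1.tofile(line)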

    • I run the command "./TI_DEVICE_armv8_test_dl_algo_host_rt.out".
    • You can compare tidl_tb_utils.c with the original file in ti-processor-sdk-rtos-j784s4-evm-09_02_00_05.tar.gz. The modified code reads binary files line by line from the list in test_east_single.txt.
    • If every line in test_east_single.txt is the same, it works fine.
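Before looping the test, it may help to confirm that every entry in the list file points at a readable input of the expected size. The following is a minimal sketch, not part of the SDK: `check_input_list` is a hypothetical helper, and the expected size assumes each feature map is a raw uint8 buffer of 3 x 512 x 512 bytes, as in the generator script above.

```python
import os

# Assumption: each feature map is a raw uint8 buffer of 3 x 512 x 512 bytes,
# matching featuremap_size in the generator script from this thread.
EXPECTED_BYTES = 3 * 512 * 512


def check_input_list(list_path, expected_bytes=EXPECTED_BYTES):
    """Return (entries, problems) for the file list at list_path.

    entries  : non-blank paths found in the list, in order
    problems : (line_number, path, reason) tuples for suspicious lines
    """
    entries = []
    problems = []
    with open(list_path, "r") as list_file:
        for line_no, raw in enumerate(list_file, start=1):
            path = raw.strip()
            if not path:
                problems.append((line_no, path, "blank line"))
                continue
            entries.append(path)
            if not os.path.isfile(path):
                problems.append((line_no, path, "missing file"))
            elif os.path.getsize(path) != expected_bytes:
                problems.append((line_no, path, "unexpected size"))
    return entries, problems
```

Calling `check_input_list("./test_east_single.txt")` returns the number of usable entries (via `len(entries)`), which can then be compared against the number of frames the test is configured to process.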

  • Hi Guang,

    Would it be possible for you to capture the values in the IEER and IEAR registers?  In the initial screenshot, the values may have been printed below what was captured in the screen capture.  A log output may be better than a screen capture for debugging purposes.  
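One way to get a text log instead of a screen capture is to wrap the test binary and write everything it prints to a file. This is only a sketch under assumptions: `run_and_log` is a hypothetical helper, Python is assumed to be available on the target, and it captures only what the process itself writes to stdout/stderr, so output produced elsewhere (for example by a remote core) may need to be collected separately.

```python
import subprocess


def run_and_log(cmd, log_path):
    """Run cmd, echo its stdout/stderr to the console, and save it to log_path.

    Returns the process exit code.
    """
    with open(log_path, "w", errors="replace") as log_file:
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into the same stream
            text=True,
            errors="replace",          # tolerate non-UTF-8 bytes in the output
        )
        for line in proc.stdout:
            print(line, end="")
            log_file.write(line)
            log_file.flush()  # keep the log current in case the run hangs
        return proc.wait()
```

For example, `run_and_log(["./TI_DEVICE_armv8_test_dl_algo_host_rt.out"], "tidl_run.log")` would leave the output captured so far in `tidl_run.log` even if the program later freezes and has to be killed.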

    Regards,

    Chris

  • Hi Guang,

    I tried to reproduce this on our EVM with SDK 9.2 but could not see the problem you describe:

    # NETWORK_EXECUTION_TIME =     0.91 (in ms, c7x @1GHz) with DDR_BANDWIDTH (Read + Write) =     0.00,     0.00,     0.00 (in Mega Bytes/frame) ... .... .....opening file ./data/00000063935527256.bin
     Freeing memory for user provided Net
     ----------------------- TIDL Process with TARGET DATA FLOW ------------------------
    
    # NETWORK_EXECUTION_TIME =     0.89 (in ms, c7x @1GHz) with DDR_BANDWIDTH (Read + Write) =     0.00,     0.00,     0.00 (in Mega Bytes/frame) ... .... .....opening file ./data/00000548692561272.bin
     Freeing memory for user provided Net
     ----------------------- TIDL Process with TARGET DATA FLOW ------------------------
    
    # NETWORK_EXECUTION_TIME =     0.89 (in ms, c7x @1GHz) with DDR_BANDWIDTH (Read + Write) =     0.00,     0.00,     0.00 (in Mega Bytes/frame) ... .... .....opening file ./data/011053.bin
     Freeing memory for user provided Net
     ----------------------- TIDL Process with TARGET DATA FLOW ------------------------
    
    # NETWORK_EXECUTION_TIME =     0.89 (in ms, c7x @1GHz) with DDR_BANDWIDTH (Read + Write) =     0.00,     0.00,     0.00 (in Mega Bytes/frame) ... .... .....opening file ./data/00001346662392672.bin
     Freeing memory for user provided Net
     ----------------------- TIDL Process with TARGET DATA FLOW ------------------------
    
    # NETWORK_EXECUTION_TIME =     0.89 (in ms, c7x @1GHz) with DDR_BANDWIDTH (Read + Write) =     0.00,     0.00,     0.00 (in Mega Bytes/frame) ... .... .....opening file ./data/00000138717023207.bin
     Freeing memory for user provided Net
     ----------------------- TIDL Process with TARGET DATA FLOW ------------------------
    
    # NETWORK_EXECUTION_TIME =     0.89 (in ms, c7x @1GHz) with DDR_BANDWIDTH (Read + Write) =     0.00,     0.00,     0.00 (in Mega Bytes/frame) ... .... .....opening file ./data/00000521592019049.bin
     Freeing memory for user provided Net
     ----------------------- TIDL Process with TARGET DATA FLOW ------------------------
    
    # NETWORK_EXECUTION_TIME =     0.89 (in ms, c7x @1GHz) with DDR_BANDWIDTH (Read + Write) =     0.00,     0.00,     0.00 (in Mega Bytes/frame) ... .... .....opening file ./data/00001189092925982.bin
     Freeing memory for user provided Net
     ----------------------- TIDL Process with TARGET DATA FLOW ------------------------
    
    # NETWORK_EXECUTION_TIME =     0.88 (in ms, c7x @1GHz) with DDR_BANDWIDTH (Read + Write) =     0.00,     0.00,     0.00 (in Mega Bytes/frame) ... .... .....opening file ./data/00001189092925982.bin
     Freeing memory for user provided Net
     ----------------------- TIDL Process with TARGET DATA FLOW ------------------------
    
    # NETWORK_EXECUTION_TIME =     0.89 (in ms, c7x @1GHz) with DDR_BANDWIDTH (Read + Write) =     0.00,     0.00,     0.00 (in Mega Bytes/frame) ... .... .....opening file p�
                                       ���
    featuremap open failed p�
                             ���
     Freeing memory for user provided Net
     ----------------------- TIDL Process with TARGET DATA FLOW ------------------------
    
    # NETWORK_EXECUTION_TIME =     0.89 (in ms, c7x @1GHz) with DDR_BANDWIDTH (Read + Write) =     0.00,     0.00,     0.00 (in Mega Bytes/frame) ... .... .....opening file p�
                                       ���
    featuremap open failed p�
                             ���
     Freeing memory for user provided Net
     ----------------------- TIDL Process with TARGET DATA FLOW ------------------------
    
    # NETWORK_EXECUTION_TIME =     0.89 (in ms, c7x @1GHz) with DDR_BANDWIDTH (Read + Write) =     0.00,     0.00,     0.00 (in Mega Bytes/frame) ... .... .....opening file p�
                                       ���
    featuremap open failed p�
                             ���
     Freeing memory for user provided Net
     ----------------------- TIDL Process with TARGET DATA FLOW ------------------------
    
    # NETWORK_EXECUTION_TIME =     0.88 (in ms, c7x @1GHz) with DDR_BANDWIDTH (Read + Write) =     0.00,     0.00,     0.00 (in Mega Bytes/frame) ... .... .....opening file p�
    

    The test code cannot read more inputs because it has already reached the end of your input list:

    ........
    ./data/0000032694881128.bin
    ./data/00000095986168515.bin
    ./data/413397.bin
    ./data/00000455051040242.bin
    ./data/0000037697380986.bin
    ./data/1561473.bin
    ./data/0000047811901727.bin
    ./data/2349480.bin
    ./data/20220212_190948_0000242.bin
    ./data/684389.bin
    ./data/0000022734689647.bin
    ./data/00000063935527256.bin
    ./data/00000548692561272.bin
    ./data/011053.bin
    ./data/00001346662392672.bin
    ./data/00000138717023207.bin
    ./data/00000521592019049.bin
    ./data/00001189092925982.bin
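The log above shows the frame loop continuing after the last list entry and trying to open garbage file names. The idea of bounding the loop by the list length can be sketched as follows. This is a Python illustration only, not the C code in tidl_tb_utils.c (which is not shown in this thread); `frame_paths` and its parameters are hypothetical names.

```python
def frame_paths(list_path, num_frames, wrap=True):
    """Yield up to num_frames input paths without reading past the list end.

    With wrap=True the list is reused from the top once exhausted;
    with wrap=False iteration stops cleanly after the last entry.
    """
    with open(list_path, "r") as list_file:
        paths = [line.strip() for line in list_file if line.strip()]
    if not paths:
        raise ValueError("input list is empty: " + list_path)
    for frame_idx in range(num_frames):
        if frame_idx >= len(paths) and not wrap:
            return  # stop at the end of the list instead of reading garbage
        yield paths[frame_idx % len(paths)]
```

With a three-entry list and `num_frames=5`, the generator yields the three paths and then the first two again; with `wrap=False` it yields only the three.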

    Regards,

    Adam

  • Register_print.log

    This is the complete register print log.

  • Hi Guang,

    What SDK are you using? Why does your model run longer than on my side? Can you help confirm that we are running the same model?

    ht@ht-OMEN:~/customer/hikauto/6318.Tmp$ md5sum tidl_net_onnx.bin 
    c7b34476e96b5a643b2db81be624e29c  tidl_net_onnx.bin
    ht@ht-OMEN:~/customer/hikauto/6318.Tmp$ md5sum tidl_io_onnx_1.bin 
    64e7e63f1f3c4bc7b7baa2138d0741b5  tidl_io_onnx_1.bin
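Where `md5sum` is not at hand, the same comparison can be done with Python's standard `hashlib` module. A minimal sketch, with `md5_of_file` as a hypothetical helper name:

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at path, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large model files are not loaded at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Running `md5_of_file("tidl_net_onnx.bin")` and `md5_of_file("tidl_io_onnx_1.bin")` on both sides and comparing the results against the digests listed above would show whether the two setups use the same model files.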

    Regards,

    Adam

  • For confidentiality reasons, we pruned the model before sharing it externally.
    This does not affect the ability to reproduce the issue.

  • Hi Guang,

    I see, but I cannot reproduce your problem on my side. Can you try again on a TI EVM with the default SDK?

    Regards,

    Adam