[TDA4VM] app_tidl_od: TIDL Intermediate Output Results

Hi

We have observed that the outputs generated from the tidl_od application using the "writeTIDLOutput" function (with the "WRITE_INTERMEDIATE_OUTPUTS" macro enabled) were different across boots.

We have also run a single image as multiple frames (800 or more frames) with our custom model across different boots and observed the same issue.

Is this behaviour expected?

This issue was not observed with standalone TIDL model inference (PC_dsp_test_dl_algo.out) over multiple runs of our custom model.
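
For reference, the per-frame dumps from two boots can be compared offline with a simple byte-wise diff; a minimal sketch in C is given below (the file names boot1_frame0000.bin and boot2_frame0000.bin are placeholders for the same frame dumped on two different boots):

#include <stdio.h>

/* Compare two TIDL output dumps byte by byte and report the first mismatch.
 * The file names are placeholders; the dumps come from writeTIDLOutput. */
static int compare_dumps(const char *file_a, const char *file_b)
{
    FILE *fa = fopen(file_a, "rb");
    FILE *fb = fopen(file_b, "rb");
    long  offset = 0;
    int   ca = EOF, cb = EOF;

    if ((fa == NULL) || (fb == NULL))
    {
        printf("Unable to open input files\n");
        if (fa != NULL) fclose(fa);
        if (fb != NULL) fclose(fb);
        return -1;
    }

    while (1)
    {
        ca = fgetc(fa);
        cb = fgetc(fb);
        if ((ca == EOF) || (cb == EOF) || (ca != cb))
        {
            break;
        }
        offset++;
    }

    if (ca != cb)
    {
        printf("First difference at byte offset %ld\n", offset);
    }
    else
    {
        printf("Files are identical (%ld bytes)\n", offset);
    }

    fclose(fa);
    fclose(fb);
    return (ca != cb) ? 1 : 0;
}

int main(void)
{
    return compare_dumps("boot1_frame0000.bin", "boot2_frame0000.bin");
}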

We also observed that the DHCP client timed out before de-init. Would this cause any issue? Please find the logs below:

J7EVM@QNX:/ti_fs/vision_apps# ./vx_app_tidl_od.out --cfg app_od.cfg
APP: Init QNX ... !!!
appIpcInit: IPC: Init QNX ... !!!
appIpcInit: IPC: Init ... Done !!!
334.805994 s: REMOTE_SERVICE: Init ... !!!
334.806108 s: REMOTE_SERVICE: Init ... Done !!!
APP: Init ... Done !!!
334.806146 s: VX_ZONE_INIT:Enabled
334.806160 s: VX_ZONE_ERROR:Enabled
334.806173 s: VX_ZONE_WARNING:Enabled
334.806382 s: VX_ZONE_INIT:[tivxInit:85] Initialization Done !!!
334.806416 s: VX_ZONE_INIT:[tivxHostInit:48] Initialization Done for HOST !!!
Default param set!
Computing checksum at 0x00000057DA99FC80, size = 887792
[C7x_1 ] 335.153376 s: VX_ZONE_WARNING:[tivxKernelTIDLCreate:614] All Interrupts DISABLED during TIDL process
[MCU2_0] 340.967601 s: DHCP client timed out. Retrying.....
[MCU2_0] 508.967560 s: DHCP client timed out. Retrying.....
app_tidl_od: Iteration 0 of 1 ... Done. 570.238080 s: VX_ZONE_INIT:[tivxHostDeInit:56] De-Initialization Done for HOST !!!
570.246042 s: VX_ZONE_INIT:[tivxDeInit:130] De-Initialization Done !!!
APP: Deinit ... !!!
570.246108 s: REMOTE_SERVICE: Deinit ... !!!
570.246169 s: REMOTE_SERVICE: Deinit ... Done !!!
IPC: Deinit ... !!!
IPC: Deinit ... Done !!!
APP: Deinit ... Done !!!

Note: SDK used: ti-processor-sdk-rtos-j721e-evm-07_01_00_11

Thanks

Sithara Tresa Chacko

  • Hi Sithara,

    Instead of comparing the complete tensor, can you try comparing the final output only for the valid detected objects? I suspect that the final output dumps the complete tensor sized for the worst case, and the invalid detections contain garbage values. So if you compare the output only for the valid detected objects and something still differs, then we need to look into it in more detail.

    You can refer to the following function for how to read the final output:

    Function Name: tidl_tb_postProc

    File Name: tidl_image_preproc.c

    Look for the code inside the condition else if (gParams.postProcType == 2).
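
    For illustration, below is a minimal sketch of such a comparison, assuming a Caffe-SSD-style layout of 7 floats per detection [batch, label, score, xmin, ymin, xmax, ymax] and a 0.3 score threshold (both are assumptions; the actual layout and threshold must be taken from the postProcType == 2 path of tidl_tb_postProc):

    #include <stdio.h>

    /* Illustrative only: assumed Caffe-SSD-like layout of 7 floats per
     * detection [batch, label, score, xmin, ymin, xmax, ymax]. Confirm the
     * real layout from tidl_tb_postProc in tidl_image_preproc.c. */
    #define DET_FIELDS      (7)
    #define SCORE_THRESHOLD (0.3f)

    /* Print and count only the valid detections; padded entries in the
     * worst-case-sized output tensor are skipped. */
    static int count_valid_detections(const float *out, int max_dets)
    {
        int i, num_valid = 0;

        for (i = 0; i < max_dets; i++)
        {
            const float *det   = &out[i * DET_FIELDS];
            float        label = det[1];
            float        score = det[2];

            if ((label >= 0.0f) && (score >= SCORE_THRESHOLD))
            {
                printf("det %d: label=%.0f score=%.3f box=(%.2f, %.2f, %.2f, %.2f)\n",
                       num_valid, label, score, det[3], det[4], det[5], det[6]);
                num_valid++;
            }
        }
        return num_valid;
    }

    Comparing only these filtered entries between two boots avoids diffing the padded region of the worst-case tensor, which can legitimately contain garbage.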


    Regards,

    Anshu

  • Hi Anshu,

    Thank you for the response.
    Our custom deep learning model has three output layers with C post-processing (which we have added to app_tidl_od as a new node).
    We were able to get meaningful detections, but they were not consistent across boots. We have debugged this and observed that the TIDL output is different for different boot iterations over a dataset of 800+ frames. Can TIDL outputs be inconsistent across runs?

    Thanks & Regards
    Sithara Tresa Chacko

  • Hi Anshu,

    Our observation is that the TIDL output is inconsistent only across different boots.

    For multiple runs of the application within a single boot, the TIDL output is consistent.

    Does it have anything to do with the vision_apps_init.sh execution on every boot?

    Thanks & Regards
    Sithara Tresa Chacko

  • Hi Sithara,

    Sorry for the delay in the response (I was on sick leave). The behaviour mentioned here is not expected, but the SDK you are using, i.e. SDK 7.1, is more than a year old, so I would recommend that you try with the latest SDK (SDK 8.1).

    Regards,
    Anshu

  • Hi Anshu

    Thank you for the reply.

    Sure, I will check the behaviour on the latest SDK.

    For the tidl_od demo application on SDK 7.1 (QNX + TI RTOS):

    Although the bounding box information is the same across different boots, we observed the inconsistency in the TIDL output across different boots even with PeleeNet.

    We have run the same image as 128 frames.

    The attached file contains the writeTIDLOutput dumps and logs for your reference.

    app_tidl_od_out_diff.zip

    Please let us know if this difference in the TIDL output is expected for any SDK version.

    Thanks

    Sithara Tresa Chacko

  • Hi Anshu

    We have checked the behaviour of the TIDL output on ti-processor-sdk-rtos-j721e-evm-08_01_00_13. Please find the attached zip file for multiple iterations of a single image (0000000505.yuv from tidl_demo_images in the TI dataset).

    tidl_outputs.zip

    The tidl_od application was modified to run in sequential mode.

    1. Does the TIDL binary output (the bin files dumped for each frame using WRITE_INTERMEDIATE_OUTPUTS) differ for each run/boot of the application?

    2. We observed that the bounding box info also differs by 3-4 pixels in the xmin and ymin values for the same image for each run/boot (see the sketch after this list).

    3. We basically want to understand the impact of this difference on succeeding nodes such as the tracker.

    Let us know if the above observations are expected, and if so, what are the possible reasons?
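
    For point 2, this is roughly how the boxes can be compared across runs; a minimal sketch, where the 4-pixel tolerance is only the magnitude of the difference we observed, not a recommended value:

    #include <math.h>
    #include <stdbool.h>

    /* Treat two boxes as matching if every coordinate agrees within tol_pixels. */
    typedef struct
    {
        float xmin, ymin, xmax, ymax;
    } BBox;

    static bool boxes_match(const BBox *a, const BBox *b, float tol_pixels)
    {
        return (fabsf(a->xmin - b->xmin) <= tol_pixels) &&
               (fabsf(a->ymin - b->ymin) <= tol_pixels) &&
               (fabsf(a->xmax - b->xmax) <= tol_pixels) &&
               (fabsf(a->ymax - b->ymax) <= tol_pixels);
    }

    /* Usage example: boxes_match(&boot1_box, &boot2_box, 4.0f) */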

    Thanks & Regards

    Sithara Tresa Chacko

  • Hi Sithara,

    Just to make sure I understand the problem correctly: what you are observing is that, with the same image, the intermediate layer output changes across different boots. Is this understanding correct? If yes, can you confirm whether the mismatch happens at the same layer or is random?

    1. Does the TIDL binary output (the bin files dumped for each frame using WRITE_INTERMEDIATE_OUTPUTS) differ for each run/boot of the application?

    For the same input, I would not expect the output to differ.

    Regards,

    Anshu

  • Hi Anshu 

    Software modification: in ti-processor-sdk-rtos-j721e-evm-08_01_00_13, we observed that the creation and initialisation of the "trace data" user object array was missing in modules/app_tidl_module.c, and we have added the code back based on the previous SDK releases.
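
    For reference, the code we added back is roughly of the following form (a sketch only: TIDL_TRACE_DATA_SIZE, the "TIDL_traceData" type name and the function name are placeholders; the actual creation code was copied from the previous SDK's app_tidl_module.c):

    #include <VX/vx.h>
    #include <TI/tivx.h>

    /* Placeholder size; the real value comes from the previous SDK release. */
    #define TIDL_TRACE_DATA_SIZE  (256U * 1024U)

    /* Re-create the "trace data" user object array that was missing in
     * modules/app_tidl_module.c (sketch modeled on the earlier SDK). */
    static vx_status create_trace_data_array(vx_context context,
                                             vx_object_array *trace_data_arr,
                                             vx_uint32 num_ch)
    {
        vx_status           status;
        vx_user_data_object exemplar;

        exemplar = vxCreateUserDataObject(context, "TIDL_traceData",
                                          TIDL_TRACE_DATA_SIZE, NULL);
        status = vxGetStatus((vx_reference)exemplar);

        if (status == VX_SUCCESS)
        {
            *trace_data_arr = vxCreateObjectArray(context,
                                                  (vx_reference)exemplar,
                                                  num_ch);
            status = vxGetStatus((vx_reference)*trace_data_arr);
            vxReleaseUserDataObject(&exemplar);
        }

        return status;
    }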

    We were able to capture the layer trace for 24 identical frames (0000000505.yuv from tidl_demo_images in the TI dataset) over two different boots, and the results matched. We were not able to validate the results for the remaining 104 identical frames, as the tidl_trace dumping was time consuming and the logs froze.

    For example, it took around 5 hours to dump the traces for 27 frames.

    Our previous observation was that we were able to see mismatching TIDL outputs after some initial frames, but the initial frame count was also inconsistent.

    It would be better if you could validate the same at your end and let us know your observation. Or is there any other way to check this?

    Thanks & Regards

    Sithara Tresa Chacko

  • Hi

    Are there any updates?

  • Hi Sithara,

    I don't expect any dump to take more than 5 hours. This indicates that something is going wrong during the trace dump. What is the size of the tensor where the execution gets stuck?


    Regards,

    Anshu

  • Hi Anshu

    For the first iteration it got stuck at the 27th frame, 89th layer, with tensor size (1*64*64*32) and a dump of 0 bytes,

    and for the second iteration it got stuck at the 25th frame, 11th layer, with tensor size (1*16*256*128) and 0 bytes.

    Thanks 

    Sithara Tresa Chacko

  • Hi Sithara,  

    Just one more clarification: if you run the network without intermediate traces, then you do not see any issue?


    Regards,

    Anshu

  • Hi Anshu

    No, we did not observe any such issues with intermediate traces disabled.

    Thanks & Regards

    Sithara