SK-TDA4VM: Question about EdgeAI App's performance

Part Number: SK-TDA4VM
Other Parts Discussed in Thread: TDA4VM

Hi,

I checked the DL performance of the SK-TDA4VM using our custom model.
I have a few questions, including some comparisons with the TDA4VM.

Q1. I compiled a custom model using EdgeAI benchmark, and it works. Are all of the compiled model's nodes assigned to and executed on the C7x core?

Q2. In the artifacts folder there are SVG files. In `tidl_net.bin.svg`, the nodes have different colors. What do the colors mean? Do they indicate the core that executes each node? I attached the SVG files.

Q3. I checked the performance using perf_stats (in edge_ai_apps/scripts). It shows only MSC statistics under HWA performance.
       The EdgeAI Apps docs show it reporting MSC, HWD and DSS. Why aren't HWD and DSS printed by perf_stats?

Q4. When comparing the performance of the SK-TDA4VM and the TDA4VM, the A72 and C66x core usage measures higher on the SK-TDA4VM.
      I think this is because the A72 core runs GStreamer plugins and the C66x runs two tiovxcolorconverter elements in the SK-TDA4VM's EdgeAI Apps. Is that right?

artifacts.zip

Regards,
Lee.

  • Hi Lee,

    I will let a TIDL expert answer your questions Q1 and Q2.

    For Q3
    The perf_stats tool mostly shows performance statistics related to VISS/MSC/LDC/NV/DOF/SDE/C66x/C7x loads and DDR bandwidth. On the EdgeAI SDK, the CSI2Rx, DSS, HWD and HWE are Linux device drivers, so they are not measured like the other RTOS cores and accelerators. I am surprised that you are showing LDC load in the TDA4VM column; what demo are you running?

    For Q4
    There could be multiple contributing factors depending on what you are using. If you have connected a USB camera, then the JPEG decoder will run on the A72, which would increase the load. Also, some of the post-processing logic after DL uses OpenCV on ARM for drawing rectangles, text etc., which might take some load. GStreamer standalone as such does not take much A72 load, and the majority of the custom elements are offloaded to accelerators.

    Regards,
    Shyam

  • Hi Shyam,
    Thanks for the answer.

    For Q3.
    I forgot the difference between the TDA4VM and the SK-TDA4VM that I saw in the FAQ, sorry.
    In the table above, the LDC usage is correct for the TDA4VM (camera-based custom DL app), but the SK-TDA4VM's LDC column is a typo, since LDC cannot be measured on the SK-TDA4VM.

    For Q4.
    Okay. The video/RTSP input is the cause.

    And I have a few more questions about the EdgeAI devices.

    Q5. In my case, the DL inference time is fast, but the total time and frame rate are too slow. Is it possible to measure the time of the parts other than DL inference?
          The EdgeAI App's data flow has three color-convert stages (including pre-proc), so I think that is why FHD input is much slower than HD input. (A rough sketch of the per-stage timing I would like to collect is shown after the logs below.)

    [UTILS] [Model Name:   ssd_lite_mobilenet_v2_fpn_512x512_onnx] Time for 'dl-inference':  9.00 ms (avg  9.16 ms)
    [UTILS] [Model Name:   ssd_lite_mobilenet_v2_fpn_512x512_onnx] Metric 'total time': 39.89 ms
    [UTILS] [Model Name:   ssd_lite_mobilenet_v2_fpn_512x512_onnx] Metric 'framerate': 25.39 fps
    [UTILS] [Model Name:   ssd_lite_mobilenet_v2_fpn_512x512_onnx] Time for 'dl-inference': 10.00 ms (avg  9.16 ms)
    [UTILS] [Model Name:   ssd_lite_mobilenet_v2_fpn_512x512_onnx] Metric 'total time': 39.89 ms
    [UTILS] [Model Name:   ssd_lite_mobilenet_v2_fpn_512x512_onnx] Metric 'framerate': 25.39 fps
    [UTILS] [Model Name:   ssd_lite_mobilenet_v2_fpn_512x512_onnx] Time for 'dl-inference':  8.00 ms (avg  9.16 ms)
    [UTILS] [Model Name:   ssd_lite_mobilenet_v2_fpn_512x512_onnx] Metric 'total time': 39.89 ms
    [UTILS] [Model Name:   ssd_lite_mobilenet_v2_fpn_512x512_onnx] Metric 'framerate': 25.39 fps
    [UTILS] [Model Name:   ssd_lite_mobilenet_v2_fpn_512x512_onnx] Time for 'dl-inference':  7.00 ms (avg  9.16 ms)
    [UTILS] [Model Name:   ssd_lite_mobilenet_v2_fpn_512x512_onnx] Metric 'total time': 39.89 ms
    [UTILS] [Model Name:   ssd_lite_mobilenet_v2_fpn_512x512_onnx] Metric 'framerate': 25.39 fps
    [UTILS] [Model Name:   ssd_lite_mobilenet_v2_fpn_512x512_onnx] Time for 'dl-inference':  7.00 ms (avg  9.16 ms)
    [UTILS] [Model Name:   ssd_lite_mobilenet_v2_fpn_512x512_onnx] Metric 'total time': 39.89 ms
    [UTILS] [Model Name:   ssd_lite_mobilenet_v2_fpn_512x512_onnx] Metric 'framerate': 25.39 fps

    Input: FHD → Output: FHD

    [UTILS] [Model Name:   ssd_lite_mobilenet_v2_fpn_512x512_onnx] Time for 'dl-inference':  7.00 ms (avg  7.24 ms)
    [UTILS] [Model Name:   ssd_lite_mobilenet_v2_fpn_512x512_onnx] Metric 'total time': 33.28 ms
    [UTILS] [Model Name:   ssd_lite_mobilenet_v2_fpn_512x512_onnx] Metric 'framerate': 30.40 fps
    [UTILS] [Model Name:   ssd_lite_mobilenet_v2_fpn_512x512_onnx] Time for 'dl-inference':  7.00 ms (avg  7.24 ms)
    [UTILS] [Model Name:   ssd_lite_mobilenet_v2_fpn_512x512_onnx] Metric 'total time': 33.27 ms
    [UTILS] [Model Name:   ssd_lite_mobilenet_v2_fpn_512x512_onnx] Metric 'framerate': 30.40 fps
    [UTILS] [Model Name:   ssd_lite_mobilenet_v2_fpn_512x512_onnx] Time for 'dl-inference':  7.00 ms (avg  7.24 ms)
    [UTILS] [Model Name:   ssd_lite_mobilenet_v2_fpn_512x512_onnx] Metric 'total time': 33.27 ms
    [UTILS] [Model Name:   ssd_lite_mobilenet_v2_fpn_512x512_onnx] Metric 'framerate': 30.40 fps

    Input: HD → Output: HD
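
    For reference, this is the kind of per-stage timing I would like to collect (a minimal Python sketch with hypothetical stage functions, not the actual edge_ai_apps code):

    # Per-stage timing sketch (hypothetical stage functions, not the edge_ai_apps code).
    import time

    def timed(label, fn, *args):
        t0 = time.perf_counter()
        out = fn(*args)
        print(f"{label}: {(time.perf_counter() - t0) * 1000.0:.2f} ms")
        return out

    # frame  = timed("capture/decode", read_frame)                          # FHD or HD input
    # tensor = timed("pre-proc (color convert + resize)", preprocess, frame)
    # result = timed("dl-inference", run_inference, tensor)
    # out    = timed("post-proc (draw + color convert)", postprocess, frame, result)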

    Q6. The EdgeAI Apps sometimes shut down with the error below. Why is this happening?
           It occurs frequently with RTSP input as well as video input, so it seems quite unstable. (Single FHD input → single 512x512 DL → FHD output)

    [UTILS] [Model Name:   ssd_lite_mobilenet_v2_fpn_512x512_onnx] Time for 'dl-inference':  7.00 ms (avg  9.08 ms)
    [UTILS] [Model Name:   ssd_lite_mobilenet_v2_fpn_512x512_onnx] Metric 'total time': 39.70 ms
    [UTILS] [Model Name:   ssd_lite_mobilenet_v2_fpn_512x512_onnx] Metric 'framerate': 25.59 fps
    [UTILS] [Model Name:   ssd_lite_mobilenet_v2_fpn_512x512_onnx] Time for 'dl-inference':  7.00 ms (avg  9.08 ms)
    [UTILS] [Model Name:   ssd_lite_mobilenet_v2_fpn_512x512_onnx] Metric 'total time': 39.70 ms
    [UTILS] [Model Name:   ssd_lite_mobilenet_v2_fpn_512x512_onnx] Metric 'framerate': 25.59 fps
    [UTILS] [Model Name:   ssd_lite_mobilenet_v2_fpn_512x512_onnx] Time for 'dl-inference':  7.00 ms (avg  9.08 ms)
    [00:14:18.000.000000]:ERROR:[getBuffer:0240] [flow0_sensor0] Could not get data from Gstreamer appsink.
    [00:14:18.000.000082]:ERROR:[inferenceThread:0292] Could not get 'camera' buffer from Gstreamer   445.949788 s:  VX_ZONE_INIT:[tivxHostDeInitLocal:100] De-Initialization Done for HOST !!!
       445.954246 s:  VX_ZONE_INIT:[tivxDeInitLocal:193] De-Initialization Done !!!
    APP: Deinit ... !!!
    REMOTE_SERVICE: Deinit ... !!!
    REMOTE_SERVICE: Deinit ... Done !!!
    IPC: Deinit ... !!!
    IPC: DeInit ... Done !!!
    MEM: Deinit ... !!!
    MEM: Alloc's: 64 alloc's of 211693184 bytes
    MEM: Free's : 64 free's  of 211693184 bytes
    MEM: Open's : 0 allocs  of 0 bytes
    MEM: Deinit ... Done !!!
    APP: Deinit ... Done !!!
    root@j7-evm:/opt/edge_ai_apps/apps_cpp#

    Q7. The SK-TDA4VM module gets very hot. I ran DL inference several times and the module rebooted, I think because of overheating. I measured the module's temperature at about 85℃. Is there a program that can check the SoC temperature?
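
    So far I have only checked it externally; I am thinking of reading the standard Linux thermal sysfs entries with something like the script below (assuming the kernel exposes the SoC sensors as thermal zones):

    # Read SoC temperatures from the Linux thermal sysfs interface.
    # Assumes the kernel exposes the sensors as /sys/class/thermal/thermal_zone*.
    import glob

    for zone in sorted(glob.glob("/sys/class/thermal/thermal_zone*")):
        try:
            with open(zone + "/type") as f:
                name = f.read().strip()
            with open(zone + "/temp") as f:
                millideg = int(f.read().strip())
            print(f"{name}: {millideg / 1000.0:.1f} C")
        except OSError:
            pass  # skip zones that cannot be read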

    Regards,
    Lee.

  • Hi Lee,

    Regarding Q1 and Q2,

    1. In the runtimes_visualization.svg, the nodes of the original model are depicted, and the single box around the graph indicates that all the nodes are delegated to the C7x. Multiple boxes would indicate multiple subgraphs being offloaded to the C7x, with the remaining nodes being executed on ARM.

    2. The *tidl_net.bin.svg files show the TIDL-imported subgraphs that are delegated to the C7x. The nodes are colored by layer type, and all nodes shown there are executed on the C7x.
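
    As a quick sanity check (assuming the default artifact layout, where each offloaded subgraph gets its own *tidl_net.bin), you can also count the subgraphs directly from the artifacts folder:

    # Count TIDL subgraphs in a compiled artifacts folder.
    # Assumes each offloaded subgraph has its own *tidl_net.bin file.
    import glob

    artifacts_dir = "path/to/artifacts"  # replace with your artifacts folder
    nets = glob.glob(artifacts_dir + "/**/*tidl_net.bin", recursive=True)
    print(f"{len(nets)} subgraph(s) offloaded to C7x")
    for n in nets:
        print("  ", n)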

    Regards,

    Anand 

  • Hi Anand,

    Thanks for your kind answer.

    Does the model's GigaMACs affect the C7x core usage?
    In my tests, models with larger GigaMACs showed higher C7x core usage than models with smaller GigaMACs.

    Regards,
    Lee.

  • Hi Lee,

    Can you please explain how you are defining C7x usage here? The larger the model, the larger the compute. However, the performance of the model is not necessarily directly proportional to the GMACs, since the efficiency of each individual layer's computation on the C7x depends on the layer's properties and our hardware architecture.
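
    As a rough illustration with made-up numbers: two models with identical GMACs can still take very different times on the C7x if their layers achieve different hardware utilization:

    # Hypothetical numbers only: same GMACs, different per-layer efficiency.
    PEAK_MACS_PER_SEC = 8e12  # assumed peak MAC rate, for illustration

    def inference_time_ms(gmacs, utilization):
        return gmacs * 1e9 / (PEAK_MACS_PER_SEC * utilization) * 1000.0

    print(f"Model A (4 GMACs, 80% utilization): {inference_time_ms(4, 0.80):.2f} ms")
    print(f"Model B (4 GMACs, 30% utilization): {inference_time_ms(4, 0.30):.2f} ms")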

    Regards,

    Anand