TDA4VM: Performance gap with <Performance and efficiency benchmarking with TDA4 Edge AI processors>

Part Number: TDA4VM

Hi,

As described in <Performance and efficiency benchmarking with TDA4 Edge AI processors>, resnet50 can reach 162 fps and ssd-mobilenetv1 can reach 385 fps.

But when I use download_models.sh in edgeai-gst-apps to download resnet50 (ONR-CL-6110-resNet50) and ssd-mobilenetv1 (TFL-OD-2000-ssd-mobV1-coco-mlperf-300x300) and test them with app_edgeai, the log shows a dl-inference time of 7.54 ms (133 fps) for resnet50 and 3.08 ms (325 fps) for ssd-mobilenetv1, a gap of about 15%~20%.
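
Just to make the gap explicit, here is the arithmetic as a minimal Python sketch, using the dl-inference times from the app_edgeai log:

    # Convert the measured dl-inference latencies (ms) to fps and compare
    # them against the numbers published in the app note.
    app_note_fps = {"resnet50": 162.0, "ssd-mobilenetv1": 385.0}
    measured_ms = {"resnet50": 7.54, "ssd-mobilenetv1": 3.08}

    for model, latency_ms in measured_ms.items():
        fps = 1000.0 / latency_ms              # per-frame latency -> throughput
        gap = 1.0 - fps / app_note_fps[model]  # fraction below the app note figure
        print(f"{model}: {fps:.0f} fps, {gap:.0%} below the app note")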

Is there anything I missed that would boost performance to reach the benchmark numbers?

Thanks!

Jacky.Lin

  • Hi Jacky,

    Apologies for the delayed reply. One difference could be how the performance numbers were collected. The referenced appnote most likely uses a different application to collect its benchmarking numbers.

    In the appnote <Performance and efficiency benchmarking with TDA4 Edge AI processors>, they most likely used a repository called edgeai-benchmark instead of edge_ai_apps: https://github.com/TexasInstruments/edgeai-benchmark. edgeai-benchmark is a smaller example focused purely on benchmarking the models, whereas edge_ai_apps is a full demo that exercises more of the SoC, including the DDR, capture, and display. The gap is likely due to more parts of the SoC being utilized and resources being shared between cores in ways that edgeai-benchmark does not exercise.

    I will need a week or so to gather information on the appnote and try to reproduce those numbers. Please expect a response by April 21 at the latest.

    Regards,

    Takuma

  • Thanks Takuma. It would be helpful if there were a guide for easily reproducing those numbers with edgeai-benchmark, like the one for edge_ai_apps.

    BR,

    Jacky

  • Hi Jacky,

    If you have some bandwidth, edgeai-benchmark comes with setup instructions, so you could run an experiment to see whether the numbers can be reproduced with it: https://github.com/TexasInstruments/edgeai-benchmark

    Unfortunately, I will need a few more days to take a deeper look. Thank you for your patience.

    Regards,

    Takuma

  • Hi Jacky,

    Here is the result of my experiment. Using edgeai-benchmark for cl-6110_onnxrt_imagenet1k_torchvision_resnet50_onnx, I was able to get the following numbers with SDK 8.6:

    SUCCESS:20230422-002649: benchmark results - {'infer_path': 'cl-6110_onnxrt_imagenet1k_torchvision_resnet50_onnx', 'accuracy_top1%': 75.48, 'num_subgraphs': 1, 'infer_time_core_ms': 6.615724, 'infer_time_subgraph_ms': 6.500432, 'ddr_transfer_mb': 26.965135, 'perfsim_time_ms': 0.0, 'perfsim_ddr_transfer_mb': 0.0, 'perfsim_gmacs': 0.0}

    The points of interest are infer_time_core_ms, the total inference time for one frame, and infer_time_subgraph_ms, the portion of that time spent in the AI hardware accelerator. With edgeai-benchmark we get around 6.6 ms, which equates to around 151 FPS. That is about a 7% difference from the number reported in the app note, but a significant improvement over the 20% gap originally seen with edgeai-gst-apps.
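
    For reference, here is that conversion as a minimal Python sketch, using the values from the results dict above (162 fps is the app note's resnet50 number):

        # Fields copied from the edgeai-benchmark log line above.
        results = {
            "infer_time_core_ms": 6.615724,      # total inference time per frame
            "infer_time_subgraph_ms": 6.500432,  # portion spent in the accelerator
        }

        fps = 1000.0 / results["infer_time_core_ms"]  # ~151 fps
        gap = 1.0 - fps / 162.0                       # vs. the app note's 162 fps
        print(f"{fps:.0f} fps, {gap:.0%} below the app note number")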

    So to answer your original question: no, I do not think anything was missed that would boost performance. The decrease you are observing is most likely additional overhead from the full application, compared with a standalone benchmark that eliminates the overhead of capture, display, GStreamer, and other hardware/software. For resnet50, that overhead works out to roughly 7.54 - 6.62 ≈ 0.9 ms per frame.

    Regards,

    Takuma