Part Number: SK-AM62A-LP
Tool/software:
Hi,
(please excuse if this question is not on-topic for this forum!)
We are using the "Linux SDK for edge AI applications on AM62A", version 10.00.00.08. In order to gain a better understanding of TensorFlow Lite performance, we want to compare inference on float and int8-quantized models *without* using the "tidl_tfl_delegate"; that is, we expect TensorFlow Lite to use the (multithreaded) XNNPACK delegate for the float models and "ruy"-accelerated kernels for the int8-quantized models.
Using the "perf record" tool as well as runtime inspection, we see that these assumptions are correct: for both model types, inference runs the expected operations, is multithreaded, and utilizes the Arm SIMD (Neon) instructions. However, inference on the int8-quantized models is much slower than XNNPACK inference on the float models.
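For reference, we measure per-inference latency with a small harness along these lines (a minimal sketch; the model path and thread count shown are illustrative, not our exact setup):

```python
import statistics
import time

def benchmark(invoke, warmup=5, runs=50):
    """Time a zero-argument callable; return the median latency in milliseconds."""
    for _ in range(warmup):          # warm-up iterations are not timed
        invoke()
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        invoke()
        samples.append((time.perf_counter() - t0) * 1e3)
    return statistics.median(samples)

# Hypothetical usage with the TFLite runtime (paths/threads are placeholders):
# from tflite_runtime.interpreter import Interpreter
# interp = Interpreter(model_path="model_int8.tflite", num_threads=4)
# interp.allocate_tensors()
# print(f"median latency: {benchmark(interp.invoke):.2f} ms")
```

The same harness is used for the float and the int8 models, so only the kernels being dispatched differ between the two measurements.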
We found a seemingly related issue in the TensorFlow repository on GitHub ("INT TFLITE very much slower than FLOAT TFLITE", tensorflow/tensorflow#21698); one explanation offered there pointed to a less-optimized ISA path on x86, which does not cover our (Arm) case.
Other than using the TIDL delegate, are there other ways to improve inference performance on the int8 models?
Kind regards
Stefan Birkholz