
AM68A: How to run my face/landmark detection model

Part Number: AM68A
Other Parts Discussed in Thread: SK-AM68, MATHLIB

Hi,

After the last inquiry, we continued our investigation and were able to partially operate the face detection and landmark detection models on the AM68A.
However, the model produces errors and the estimation accuracy is poor; I would like some advice on improving it.
I am attaching the edgeai-tidl-tools@Host / edgeai-gst-apps@EVM code for your reference.

[my selection  model]
  *We use the same DNN models across devices because we want to compare performance with other companies' products.
  - Face detection
 github.com/.../Ultra-Light-Fast-Generic-Face-Detector-1MB
  - Landmark detection

[create model-artifact]
  - Exported an ONNX file (opset 11) from the original checkpoint file.
  - Created the model artifacts from the ONNX file using edgeai-tidl-tools.

[Issue]
-Face detection-1: I want to resolve the build error. I would like to know why the graph output is split into two parts.
>> python3 onnxrt_ep.py -c -m LightFace_version-slim-320

-Face Detection-2: Works on the AM68A after removing the box calculations from Face Detection-1. However, I don't know how to create the prototxt, so the model is not optimized; I would like to solve this problem.
>> python3 onnxrt_ep.py -c -m LightFace_version-slim-320_without_postprocessing
      
-Landmark estimation: Works on the AM68A, but the accuracy of the estimated positions is low. I am using a small number of images for model-artifact generation; should I increase them?
>> python3 onnxrt_ep.py -c -m 3DDFA_mb1_120x120

1018.test_project.zip

  • Hi,

    Could you please confirm my understanding of your environment setup:

    1. You are using am68a SK

    2. You are using edgeai-linux-sdk, what version ?

    3. You are using tidl-tools for artifacts generation, what version of tidl-tools (which github tag you are using ?)


    Let's take one issue at a time.

    [Issue]
    -Face detection-1: I want to resolve the build error. I would like to know why the graph output is split into two parts.
    >> python3 onnxrt_ep.py -c -m LightFace_version-slim-320

    Can you elaborate on the model compilation flow you are following?

    Also, can you share the debug_level 2 compilation logs?

    Before model compilation, have you tried ARM-mode execution for the same model? If not, I would recommend trying that first (you can use -d to run on the ARM core).

    I will be able to understand the issue better with the above info.

  • Hi,

    Thank you for your reply; here are my answers.

    1. You are using am68a SK
    → SK-AM68
    2. You are using edgeai-linux-sdk, what version ?
    → Ubuntu 22.04.LTS
    3. You are using tidl-tools for artifacts generation, what version of tidl-tools (which github tag you are using ?)
    → edgeai-tidl-tools-09_01_03_00

    4. Can you elaborate the model compilation flow you are following?
    → Attach the work code.


    Following your advice, I generated a debug_level 2 compile log and checked the output.

    I found that the error occurred because torch.exp is an unsupported operation. I created a lookup table to work around the problem, but then encountered another error where the association between layers failed. The original code uses slice to divide the tensor in two, but this causes an error, so I use split instead. Please tell me the correct way to do this.
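For what it's worth, the lookup-table workaround described above can be sketched as follows. This is a NumPy illustration under my own naming (make_exp_lut, lut_exp), not the actual model or TIDL code: quantize the bounded input range into bins so that exp becomes a single one-dimensional table read.

```python
import numpy as np

# A minimal sketch of the LUT idea, in NumPy rather than the actual
# torch/TIDL code; make_exp_lut and lut_exp are my own illustrative names.
# The bounded input range is quantized into bins so exp becomes a 1-D read.

def make_exp_lut(x_min=-10.0, x_max=10.0, n=1024):
    grid = np.linspace(x_min, x_max, n, dtype=np.float32)
    return grid, np.exp(grid).astype(np.float32)

def lut_exp(x, grid, table):
    # Map each input to its nearest bin index: a 1-D "line/vector" gather.
    step = grid[1] - grid[0]
    idx = np.clip(np.round((x - grid[0]) / step), 0, len(grid) - 1).astype(np.int64)
    return table[idx]

grid, table = make_exp_lut()
x = np.array([-1.0, 0.0, 1.0], dtype=np.float32)
approx = lut_exp(x, grid, table)
# approx matches np.exp(x) to within the table's resolution
```

The accuracy is bounded by the bin width, so the table range and size would need to be chosen from the actual dynamic range of the exp input in this model.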

    (working test code)
    /Ultra-Light-Fast-Generic-Face-Detector-1MB/vision/ssd/ssd.py
    /Ultra-Light-Fast-Generic-Face-Detector-1MB/vision/utils/box_utils.py
    - center_form_to_corner_form(original)→center_form_to_corner_form_expands(modified)
    - convert_locations_to_boxes(original)→convert_locations_to_boxes_expands(modified)

    (How to create version-slim-320_op11.onnx)
      sh convert_to_onnx.sh

  • Sorry, there was a mistake in my upload. I have re-uploaded the dbgLV2 log files.

    dbgLV2_LightFace_version-slim-320.zip

  • Hi Kojima,

    2. You are using edgeai-linux-sdk, what version ?
    → Ubuntu 22.04.LTS

    I was expecting the SDK version here: the edgeai-linux-sdk version you have flashed on the target!

    After the advice, I specified debug_level 2 compile log and checked the output result.

    https://e2e.ti.com:443/home/parallels/Downloads/dbgLV2_LightFace_version-slim-320_240318.log

    This link is not visible to me, so I am referring to the latest log that you attached here:

    Sorry, there was an error in the operation. Re-upload the dbgLV2 log files.

    dbgLV2_LightFace_version-slim-320.zip

    From the debug logs shared above, it seems like the following; correct me if I'm wrong:

    The flow you are using is edgeai-tidl-tools, SDK version 09_01_03_00, SoC AM68A.

    The model you compiled is LightFace_version-slim-320; can you help me locate it in the shared zip, as its content is overwhelming?

    Secondly, I see there are 108 layers in your model, of which 107 are currently supported by TIDL; the exp layer is not supported, which leads to the creation of 2 subgraphs.

    From the error logs, "Calling ialg.algAlloc failed with status = -1120" is specific to an unsupported layer.
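As an aside, when a layer is known to be unsupported, one option is to keep it on the ARM core explicitly via the deny list, so the subgraph split is intentional rather than an error path. A hedged sketch of that compile-option fragment (the option name follows the edgeai-tidl-tools documentation; verify it against your tools version):

```python
# Hedged sketch: explicitly keep the unsupported op on the ARM core via the
# deny list; with the OSRT flow this yields the expected 2-subgraph split.
# Option name per edgeai-tidl-tools docs; check your tools version.
compile_options = {
    "deny_list": "Exp",  # comma-separated ONNX op types to keep off the C7x/MMA
}
```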

    Can you share,

    common_utils.py to check the compiler flags setting

    model_configs.py to check mean, scales and num classes specific info

    the generated SVG file for the above model, to look into

    You can look into the standard examples we have listed in the model_configs.py file; look for the detection-specific ones relevant to your current use case.
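For reference, a hypothetical model_configs.py entry for this model might look like the following. The field names follow the public detection examples in edgeai-tidl-tools (9.1-era format); the values are assumptions to verify — mean 127 / scale 1/128 match the Ultra-Light detector's usual preprocessing, but the calibration count and class count are placeholders.

```python
# Hypothetical entry for model_configs.py, modeled on the public detection
# examples in edgeai-tidl-tools. All values below are assumptions to verify.
models_configs = {
    "LightFace_version-slim-320": {
        "model_path": "models/public/version-slim-320_op11.onnx",
        "mean": [127.0, 127.0, 127.0],               # per-channel mean subtraction
        "scale": [1 / 128.0, 1 / 128.0, 1 / 128.0],  # per-channel scaling
        "num_images": 100,                           # calibration image count
        "num_classes": 2,                            # background + face
        "session_name": "onnxrt",
        "model_type": "od",                          # object detection
    }
}
```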

  • Regarding other 2 issues,

    Face Detection-2: Works on the AM68A after removing the box calculations from Face Detection-1. However, I don't know how to create the prototxt, so the model is not optimized; I would like to solve this problem.
    >> python3 onnxrt_ep.py -c -m LightFace_version-slim-320_without_postprocessing

    You can refer to our documentation here : https://github.com/TexasInstruments/edgeai-tidl-tools/blob/09_01_06_00/examples/osrt_python/README.md#object-detection-model-specific-options
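Per that README, folding the detection post-processing into TIDL comes down to pointing the compiler at a meta-architecture prototxt. A hedged sketch of the two options involved — the prototxt path is a placeholder you must author yourself, and the meta_arch_type value (3 here, as used by SSD-style heads in the public examples) must be confirmed against the README's table for your tools version:

```python
# Hedged sketch of the object-detection-specific compile options from the
# osrt_python README. The prototxt path is a placeholder; meta_arch_type
# must match your detection head type (see the README's table of values).
od_options = {
    "object_detection:meta_arch_type": 3,
    "object_detection:meta_layers_names_list": "models/public/version-slim-320.prototxt",
}
```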

    Landmark estimation: Works on the AM68A, but the accuracy of the estimated positions is low. I am using a small number of images for model-artifact generation; should I increase them?
    >> python3 onnxrt_ep.py -c -m 3DDFA_mb1_120x120

    Could you let us know how you are comparing the accuracy in this flow? You can use more images in your calibration set and increase the calibration iterations to a significant number; this will help.

  • Hi,
    Thank you for your reply.

    <My edgeai-linux-sdk, version>
    I checked the versions on the EVM:
    - PROCESSOR-SDK-LINUX-AM68 : 09.01.00.06(06 Dec 2023)
    - Linux version : Linux version 6.1.46-g5892b80d6b
    - Linux kernel : 5.15.0-46-generic

    <Face Detection-2>
    Since then, I have been trying to replace torch.exp with a LUT, but a problem has occurred.
    Specifically, when I implement the LUT with the Gather operation and export to ONNX, it is converted to the GatherElements operation, which is unsupported.
    I am also considering torch.index_select, but I don't know how to use a 2-D index effectively with it.

    pytorch_tensorLutTest.zip

    I will send you the test code for the relevant part, so could you give me some advice?
    - see : ee = exp_tbl.gather(dim=1, index=idx)

    <Landmark estimation>
    The original ONNX model was trained on the 300W-LP dataset. Does building with TIDL require the same amount of training data?

  • Hi Kojima San

    Firstly, I would appreciate the details as asked here:

    The model you compiled is LightFace_version-slim-320; can you help me locate it in the shared zip, as its content is overwhelming?

    Can you share,

    common_utils.py to check the compiler flags setting

    model_configs.py to check mean, scales and num classes specific info

    generated svg file for the above model, to look into

    I wanted to check the exp operator's delegation to the ARM core.

    Are you planning to replace the exp operator in your torch model with a supported one, to delegate it to the DSP for better speedup?

    I can add an algorithm expert for better comments on alternative suggestions. Meanwhile, to answer on gather operator usage:

    on the TIDL side we do support gather, with a few constraints:

    • only 'line/vector' gathers are supported for now
    • the 'index_select' operator in PyTorch can be used to generate this operator in ONNX, with the restriction that the indices tensor must be one-dimensional

    You can read more about supported operators here : github.com/.../supported_ops_rts_versions.md
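One way to live with the one-dimensional-indices restriction is to flatten the table and fold the row offset into the index, so a per-row 2-D lookup becomes a single 1-D gather. Sketched here with NumPy (my own example data); the same index arithmetic carries over to torch.index_select with a 1-D index tensor:

```python
import numpy as np

# Emulate a per-row 2-D gather with a single 1-D gather: convert each
# (row, col) pair into a flat offset into the raveled table.

table = np.arange(12, dtype=np.float32).reshape(3, 4)  # (rows, cols)
col_idx = np.array([[1, 3], [0, 2], [2, 2]])           # per-row column picks

rows = np.arange(table.shape[0])[:, None]              # (3, 1) row numbers
flat_idx = (rows * table.shape[1] + col_idx).ravel()   # 1-D flat offsets
gathered = table.ravel()[flat_idx].reshape(col_idx.shape)
# gathered == [[1., 3.], [4., 6.], [10., 10.]]
```

Because `flat_idx` is one-dimensional, the equivalent torch code should export as a plain Gather rather than GatherElements, which would stay within the stated constraint.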

    <Landmark estimation>
    The original ONNX model was trained on the 300W-LP dataset. Does building with TIDL require the same amount of training data?

    What do you mean by TIDL training here?

  • Thank you for your reply.

    Since this is getting long, I will respond only about face detection.

    Regarding face detection, I had expected it would be easiest if you obtained the original from GitHub and compared it against my code with the diff command,
    but since that is difficult, I will present the differences here.
    There are 7 differences; the points marked with (*) are the important ones. Please check mainly the /vision directory.

    ---
    [ diff_command : diff -r /home/user/Origin/Ultra-Light-Fast-Generic-Face-Detector-1MB-master /home/user/Work/test_project_240327/host/onnx/Ultra-Light-Fast-Generic-Face-Detector-1MB ]

    original : /home/user/Origin/Ultra-Light-Fast-Generic-Face-Detector-1MB-master
    test : /home/user/Work/test_project_240327/host/onnx/Ultra-Light-Fast-Generic-Face-Detector-1MB

        1. Only in /home/user/Work/test_project_240327/host/onnx/Ultra-Light-Fast-Generic-Face-Detector-1MB: convert_to_onnx.sh
            -> batch script for ONNX file export
        2. Only in /home/user/Work/test_project_240327/host/onnx/Ultra-Light-Fast-Generic-Face-Detector-1MB: detect_imgs_onnx.sh
            -> batch script for ONNX running check
        3. diff -r /home/user/Origin/Ultra-Light-Fast-Generic-Face-Detector-1MB-master/convert_to_onnx.py /home/user/Work/test_project_240327/host/onnx/Ultra-Light-Fast-Generic-Face-Detector-1MB/convert_to_onnx.py
            -> script for ONNX file export
        4. diff -r /home/user/Origin/Ultra-Light-Fast-Generic-Face-Detector-1MB-master/detect_imgs_onnx.py /home/user/Work/test_project_240327/host/onnx/Ultra-Light-Fast-Generic-Face-Detector-1MB/detect_imgs_onnx.py
            -> script for ONNX running check

        5. diff -r /home/user/Origin/Ultra-Light-Fast-Generic-Face-Detector-1MB-master/requirements.txt /home/user/Work/test_project_240327/host/onnx/Ultra-Light-Fast-Generic-Face-Detector-1MB/requirements.txt
            -> python requirements files.
    (*) : 6. diff -r /home/user/Origin/Ultra-Light-Fast-Generic-Face-Detector-1MB-master/vision/ssd/ssd.py /home/user/Work/test_project_240327/host/onnx/Ultra-Light-Fast-Generic-Face-Detector-1MB/vision/ssd/ssd.py
            -> modified to export for TIDL @ is_test:__init__() , forward()
    (*) : 7. diff -r /home/user/Origin/Ultra-Light-Fast-Generic-Face-Detector-1MB-master/vision/utils/box_utils.py /home/user/Work/test_project_240327/host/onnx/Ultra-Light-Fast-Generic-Face-Detector-1MB/vision/utils/box_utils.py
            -> modified to export for TIDL @ convert_locations_to_boxes() -> convert_locations_to_boxes_expands()
    -> modified to export for TIDL @ center_form_to_corner_form() -> center_form_to_corner_form_expands()
    ---

    Next, regarding the unsupported operation torch.exp: can it be delegated to the DSP?
    It appears not to be provided by the C7x MATHLIB, so I considered a LUT memory reference instead.
    Following the previous advice, I tried to compile with a 'line/vector' gather. The error in that part seems to be resolved,
    but the whole process stops midway and does not complete (a core dump file is output).
    Please take a look at the execution log, which is stored under /build_error_log in the attached file.
    I don't understand why "Invalid Index" is output with debug_level=3,4.

    I have uploaded my current project files.

    test_project_240327.zip

    ---
    + how_to_setup.txt : setup & running....
    |
    + build_error_log
    |     |
    |     + model-artifacts : debug_level=2 build file output
    |     + dbgLV1_LightFace_version-slim-320_240327.log : debug_level=1 build log
    |     + dbgLV2_LightFace_version-slim-320_240327.log : debug_level=2 build log
    |     + dbgLV3_LightFace_version-slim-320_240327.log : debug_level=3 build log (Invalid Index ?????)
    |     + dbgLV4_LightFace_version-slim-320_240327.log : debug_level=4 build log (Invalid Index ?????)
    |     + dbgLV5_LightFace_version-slim-320_240327.log : debug_level=5 build log
    |     + dbgLV6_LightFace_version-slim-320_240327.log : debug_level=6 build log
    |
    + host
        |
        + edgeai-tidl-tools_DIFF : diff edgeai-tidl-tools
        |   |
        |   + examples : AM68A compile script
        |   + model-artifacts : debug_level=2 build file output
        |   + models : onnx file (version-slim-320_op11.onnx)
        |   + output_images : -d offload on arm core file output
        |
        + onnx
            |
            + Ultra-Light-Fast-Generic-Face-Detector-1MB : Test project for AM68A evaluation
            + vision : modified to export for TIDL
            + convert_to_onnx.sh : batch script for ONNX file export
            + detect_imgs_onnx.sh : batch script for ONNX running check
    ---

    Best regards, and thank you in advance.

  • Hi Kojima,

    Thanks for the effort to put down all the details; the shared info is overwhelming! It would have been great if you had shared only the requested info:

    Can you share,

    common_utils.py to check the compiler flags setting

    model_configs.py to check mean, scales and num classes specific info

    generated svg file for the above model, to look into

    A few inputs are still missing from your end; listing them here for clarity:

    The model you compiled is LightFace_version-slim-320; can you help me locate it in the shared zip, as its content is overwhelming?
    Regarding face detection, I expected that it would be easier to understand if you could obtain the original from GITHUB and compare it with the difference command output,

    The model is user-specific; I would really appreciate it if you could share a model zip file which has:

    1. Model with exp layer (with OSRT flow, there should be 2 sub graphs created, if only single layer is unsupported)

    2. Model without exp and line gather as suggested previously (with OSRT flow there should be single sub graph as gather operator are supported)

    This was not answered correctly:

    <Landmark estimation>
    The original ONNX model was trained on the 300W-LP dataset. Does building with TIDL require the same amount of training data?

    What do you mean by TIDL training here?

    Again, I would deeply appreciate it if you share only the required details so we can come to the point directly.

  • Thank you for your reply.

    From now on, we will consult only on face detection; if it is still too difficult to investigate, we will stop the investigation.

    The data necessary for the investigation was uploaded previously.

    >> The model is user specific, i would really appreciate if you can help by sharing the model zip file which has,

    The requested TIDL model is here:
    \host\edgeai-tidl-tools_DIFF\models\public\version-slim-320_op11.onnx

    The failing TIDL model artifacts are here:
    \build_error_log

    Additionally, the original TIDL model is here:
    host\onnx\Ultra-Light-Fast-Generic-Face-Detector-1MB\models\onnx\version-slim-320.onnx

    The original model zip file can be downloaded from:
    github.com/.../Ultra-Light-Fast-Generic-Face-Detector-1MB

    >> 1. Model with exp layer (with OSRT flow, there should be 2 sub graphs created, if only single layer is unsupported)

    I would like to compute on the C7x/MMA as much as possible, so I am considering other methods, such as memory references, to avoid the unsupported exp layer.

    >> 2. Model without exp and line gather as suggested previously (with OSRT flow there should be single sub graph as gather operator are supported). This is not answered correctly,

    If you know of a way to perform torch.exp processing using the DSP or MATHLIB, please let me know. Currently, the CPU post-processing time, including the part in question, is 29 ms, which is long. (The C7x/MMA part is 5 ms.)
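If exp ultimately stays on the ARM side, one lever on that 29 ms is to make sure the whole box decode is vectorized rather than looped per box. A sketch of the SSD-style decode this detector uses (variances 0.1/0.2 as in the Ultra-Light repo's config; decode_boxes is my own name, not TI or repo code), in NumPy:

```python
import numpy as np

# Sketch (my own helper, not TI code): vectorized SSD box decode matching
# the Ultra-Light detector's post-processing, center/size variances 0.1/0.2.
def decode_boxes(locations, priors, center_var=0.1, size_var=0.2):
    # locations, priors: (N, 4) arrays in center form (cx, cy, w, h)
    cxcy = locations[:, :2] * center_var * priors[:, 2:] + priors[:, :2]
    wh = np.exp(locations[:, 2:] * size_var) * priors[:, 2:]
    # center form -> corner form (x1, y1, x2, y2)
    return np.concatenate([cxcy - wh / 2, cxcy + wh / 2], axis=1)

# Sanity check: zero offsets decode back to the priors, in corner form.
locations = np.zeros((1, 4), dtype=np.float32)
priors = np.array([[0.5, 0.5, 0.2, 0.2]], dtype=np.float32)
boxes = decode_boxes(locations, priors)
# boxes == [[0.4, 0.4, 0.6, 0.6]]
```

A single call of np.exp over all N boxes at once is typically far cheaper than per-box Python work, which may narrow the gap to the 5 ms C7x/MMA part even without moving exp to the DSP.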