Custom Yolov5-face model results issue

Sourabh Patil

Other Parts Discussed in Thread: TDA4VM

Hi,

I am using a TDA4VM (J721E) board with SDK 09_00_00_00. I have compiled a Yolov5-face detection model, which detects faces and also gives 5 key points on each face. The repo I used is https://github.com/deepcam-cn/yolov5-face#pretrained-models. This model gives its output as [x, y, w, h, prob, x1, y1, x2, y2, x3, y3, x4, y4, x5, y5, class] (16 dimensions), where x, y, w, h describe the box and x1, y1, ..., x5, y5 are the key points on the face. I converted this model to ONNX by closely following the EdgeAI Yolov5 repo https://github.com/TexasInstruments/edgeai-yolov5, which the docs recommend. I also exported the NMS to ONNX, which is required to use TIDL for faster post-processing. During compilation I can see that all the operations in my model are supported by TIDL. But the output I get from the compiled model is 300x6 (I expected 300x16, including the key points).
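To make the expected layout concrete, here is a minimal sketch of how one 16-value detection row could be split into its parts. It follows the column order described above; the names `FaceDetection` and `parse_face_row` are hypothetical and only for illustration, they are not part of yolov5-face or TIDL.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class FaceDetection:
    """One detection in the 16-value layout described in the post (hypothetical type)."""
    box_xywh: Tuple[float, float, float, float]
    score: float
    landmarks: List[Tuple[float, float]]
    class_id: int


def parse_face_row(row: Sequence[float]) -> FaceDetection:
    """Split [x, y, w, h, prob, x1, y1, ..., x5, y5, class] into named fields."""
    if len(row) != 16:
        # A 6-value row (plain object detection) carries no key points.
        raise ValueError(f"expected 16 values per detection, got {len(row)}")
    x, y, w, h, score = row[0:5]
    coords = row[5:15]
    # Pair up the ten key-point coordinates as (x_i, y_i).
    landmarks = [(coords[i], coords[i + 1]) for i in range(0, 10, 2)]
    return FaceDetection((x, y, w, h), score, landmarks, int(row[15]))
```

A 300x6 output fails the length check here, which is exactly the mismatch in question: the six columns have no room for the ten key-point coordinates.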

I have also compiled sample torch models (such as the widerface- and coco-based models) taken from the edgeai-yolov5 model zoo, after converting them to ONNX. These work fine because they do normal object detection without any key point prediction. They gave me the expected output of shape 300x6 (xmin, ymin, xmax, ymax, prob, class) and ran smoothly on the board, with post-processing (NMS) handled by TIDL. In my case I also have a valid ONNX file (including NMS for my model) and a valid prototxt file, which is required for accelerated post-processing (in the compilation options we need to specify meta_arch_type as well as meta_layers_name_list, taken from the prototxt). So why do I still get an output of shape 300x6? My ONNX file gives the expected output of (num_of_faces x 16) after applying the NMS embedded in the ONNX model. But when I compile it for the TI board, the compiled model does not produce the key points.
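For comparing the two outputs side by side, a 16-value ONNX row can be reduced to the 6-value layout that the compiled model returns. This is only a sketch: it assumes that x, y are the box center (the usual YOLOv5 convention, which is not stated above), and the helper names are hypothetical.

```python
from typing import Sequence, Tuple


def xywh_to_xyxy(x: float, y: float, w: float, h: float) -> Tuple[float, float, float, float]:
    """Convert a center-based box to corner form.

    Assumption: (x, y) is the box center, as in upstream YOLOv5.
    """
    half_w, half_h = w / 2, h / 2
    return (x - half_w, y - half_h, x + half_w, y + half_h)


def to_plain_detection(row: Sequence[float]) -> Tuple[float, ...]:
    """Reduce a 16-value face row to (xmin, ymin, xmax, ymax, prob, class).

    The ten key-point values in columns 5..14 are dropped.
    """
    if len(row) != 16:
        raise ValueError(f"expected 16 values per detection, got {len(row)}")
    xmin, ymin, xmax, ymax = xywh_to_xyxy(*row[0:4])
    return (xmin, ymin, xmax, ymax, row[4], row[15])
```

If the boxes and scores produced this way match the 300x6 rows from the board, the compiled model is detecting the same faces and only the key-point columns are missing.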

My question is: is it still not possible to run a custom model that does some extra work, such as key point detection along with face detection, on the TI board with TIDL acceleration? I suspect I am getting the 300x6 output because I specified meta_arch_type and meta_layers_name_list, which presumes that the model does only object detection and not what I am trying to do.

It would be really great if one of the experts could clarify this. I have already posted other queries that are still unanswered!

Thanks and regards,
Sourabh

over 2 years ago