PROCESSOR-SDK-AM62A: Error during inference of a face landmarks model on the AM62A EVM

Part Number: PROCESSOR-SDK-AM62A

Hi there,

I used the AM62A to import and run inference with the MediaPipe face landmarks model. Import and inference work without problems on the PC, and the accuracy of our model was also very good when I set tensor_bits to 32 or 16. I then copied the artifacts folder and the model to the EVM and ran inference there. However, we encountered the following errors and warnings:

 ****** tidlDelegate::Invoke ****** 
*******   In TIDL_subgraphRtInvoke  ******** 
MEM: ERROR: Alloc failed with status = 12 !!!
 33484.014255 s:  VX_ZONE_ERROR:[tivxMemBufferAlloc:87] Shared mem ptr allocation failed
 33484.014286 s:  VX_ZONE_ERROR:[ownAllocTensorBuffer:85] Could not allocate tensor memory
 33484.014304 s:  VX_ZONE_ERROR:[tivxUnmapTensorPatch:820] map address is null
 33484.014315 s:  VX_ZONE_ERROR:[tivxUnmapTensorPatch:824] map size is equal to 0
MEM: ERROR: Alloc failed with status = 12 !!!
 33484.014349 s:  VX_ZONE_ERROR:[tivxMemBufferAlloc:87] Shared mem ptr allocation failed
 33484.014362 s:  VX_ZONE_ERROR:[ownAllocTensorBuffer:85] Could not allocate tensor memory
 33484.014377 s:  VX_ZONE_ERROR:[tivxUnmapTensorPatch:820] map address is null
 33484.014390 s:  VX_ZONE_ERROR:[tivxUnmapTensorPatch:824] map size is equal to 0
MEM: ERROR: Alloc failed with status = 12 !!!
 33484.014416 s:  VX_ZONE_ERROR:[tivxMemBufferAlloc:87] Shared mem ptr allocation failed
 33484.014429 s:  VX_ZONE_ERROR:[ownAllocTensorBuffer:85] Could not allocate tensor memory
 33484.014442 s:  VX_ZONE_ERROR:[tivxUnmapTensorPatch:820] map address is null
 33484.014455 s:  VX_ZONE_ERROR:[tivxUnmapTensorPatch:824] map size is equal to 0
ERROR: Running TIDL graph ... Failed !!!
Sub Graph Stats 7366.000000 18301523863390768.000000 11140674680522872.000000 
*******  TIDL_subgraphRtInvoke done  ******** 


************ in ~tidlDelegate ************ 
 ************ in TIDL_subgraphRtDelete ************ 
  33486.374886 s:  VX_ZONE_ERROR:[ownReleaseReferenceInt:294] Invalid reference
 33486.374938 s:  VX_ZONE_ERROR:[ownReleaseReferenceInt:294] Invalid reference
************ in ~tidlDelegate ************ 
 ************ in TIDL_subgraphRtDelete ************ 
  33486.375791 s:  VX_ZONE_ERROR:[ownReleaseReferenceInt:294] Invalid reference
 33486.375843 s:  VX_ZONE_ERROR:[ownReleaseReferenceInt:294] Invalid reference
 
 
 ************ in TIDL_subgraphRtDelete ************ 
  33486.385346 s:  VX_ZONE_ERROR:[ownReleaseReferenceInt:294] Invalid reference
 33486.385391 s:  VX_ZONE_ERROR:[ownReleaseReferenceInt:294] Invalid reference
 33486.386197 s:  VX_ZONE_WARNING:[vxReleaseContext:1055] Found a reference 0xffff8e141bc8 of type 00000816 at external count 1, internal count 0, releasing it
 33486.386240 s:  VX_ZONE_WARNING:[vxReleaseContext:1057] Releasing reference (name=user_data_object_102) now as a part of garbage collection
 33486.386271 s:  VX_ZONE_WARNING:[vxReleaseContext:1055] Found a reference 0xffff8e142888 of type 00000816 at external count 1, internal count 0, releasing it
 33486.386286 s:  VX_ZONE_WARNING:[vxReleaseContext:1057] Releasing reference (name=user_data_object_115) now as a part of garbage collection
 33486.386308 s:  VX_ZONE_WARNING:[vxReleaseContext:1055] Found a reference 0xffff8e143548 of type 00000816 at external count 1, internal count 0, releasing it
 33486.386323 s:  VX_ZONE_WARNING:[vxReleaseContext:1057] Releasing reference (name=user_data_object_128) now as a part of garbage collection
 33486.386345 s:  VX_ZONE_WARNING:[vxReleaseContext:1055] Found a reference 0xffff8e144208 of type 00000816 at external count 1, internal count 0, releasing it
 33486.386360 s:  VX_ZONE_WARNING:[vxReleaseContext:1057] Releasing reference (name=user_data_object_141) now as a part of garbage collection


Our compile_options:

compile_options = {
    'tidl_tools_path' : os.environ['TIDL_TOOLS_PATH'],
    'artifacts_folder' : output_dir,
    'tensor_bits' : 16,
    'accuracy_level' : 9,
    'debug_level' : 5,

    'advanced_options:calibration_frames' : len(calib_images),  # min 10, default 20
    'advanced_options:calibration_iterations' : 50,              # min 10, default 50
    'advanced_options:quantization_scale_type' : 0,         # 0, 1, 3, 4

}

Our TFLite model looks like this:

There are no other special operations in our model.

Moreover, what is the meaning of "there are 16 subgraphs, 93 nodes delegated out of 493 nodes"? This sentence appeared in our log as well. Does it mean that only 93 nodes are offloaded to the C7x-MMA?

We also found that we could set tensor_bits to 32 and still run inference on the EVM, which seems strange. Could you clarify this behavior?

Best regards,

Yunfeng Kang

  • Hello Yunfeng Kang,

    Which SDK version are you on? There were several features added in 9.1 that may help for some of the layers that were not offloaded. Your understanding of the "93 of 493 nodes delegated" message is correct -- only those 93 layers run on the accelerator and the rest run on the CPU. I would expect performance to be quite slow here. 16 is the maximum number of subgraphs, so even though more layers could run on the accelerator, the topology of the supported/unsupported nodes prevents them from being grouped into additional subgraphs once that limit is reached.

    The tensor_bits option is only used for compilation, I believe. Inference on the EVM ignores this option and uses whatever the compiled binaries contain.
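    For reference, here is a minimal sketch of how the two stages differ in the Python flow, loosely following the edgeai-tidl-tools examples. The delegate library names, the model file name, and the artifact paths are assumptions on my side, so please check them against your SDK version:

    import os
    import tflite_runtime.interpreter as tflite

    # --- PC: compilation / calibration pass (assumed delegate name) ---
    # tensor_bits and the other compile options are consumed here, while
    # the TIDL artifacts are written into 'artifacts_folder'.
    compile_options = {
        'tidl_tools_path' : os.environ['TIDL_TOOLS_PATH'],
        'artifacts_folder' : 'artifacts',
        'tensor_bits' : 16,
    }
    compile_delegate = tflite.load_delegate(
        os.path.join(os.environ['TIDL_TOOLS_PATH'], 'tidl_model_import_tflite.so'),
        compile_options)
    interpreter = tflite.Interpreter(
        model_path='face_landmarks.tflite',          # placeholder model name
        experimental_delegates=[compile_delegate])
    # ...then run the calibration images through interpreter.invoke()
    # so the artifacts get generated.

    # --- EVM: runtime inference (assumed delegate name) ---
    # Only the generated artifacts are read back here; a tensor_bits entry
    # in this dict would have no effect on the target.
    runtime_delegate = tflite.load_delegate(
        'libtidl_tfl_delegate.so',
        {'artifacts_folder': 'artifacts'})
    interpreter = tflite.Interpreter(
        model_path='face_landmarks.tflite',
        experimental_delegates=[runtime_delegate])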

    When compiling, there should be logs that list every individual layer parsed and whether it is supported or not. I know this shows with debug_level:2, but I am unsure about =5. debug_level>2 starts to do per-layer traces during compilation, which may not be useful at this stage (and will slow down the compilation process).

    My recommendations here:

    • Update to the 9.1 SDK / tidl-tools for testing this model.
    • Enable debug_level:2 and share logs if there are still many subgraphs (see the sketch after this list).
      • You can also call "export TIDL_RT_DEBUG=1" and "vx_app_arm_remote_log.out &" to get further logging on the target.
      • If you see many OpenVX errors, I would recommend resetting the target, as these may be the result of a non-graceful shutdown.
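    For the second bullet, the change would just be in your compile options dict; a minimal sketch based on the snippet you posted (output_dir and calib_images are the variables from your script):

    compile_options = {
        'tidl_tools_path' : os.environ['TIDL_TOOLS_PATH'],
        'artifacts_folder' : output_dir,
        'tensor_bits' : 16,
        'accuracy_level' : 9,
        'debug_level' : 2,   # lists each parsed layer and whether it is offloaded
        'advanced_options:calibration_frames' : len(calib_images),
        'advanced_options:calibration_iterations' : 50,
        'advanced_options:quantization_scale_type' : 0,
    }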

    Best,
    Reese

  • Hello Reese,

    Thanks for your reply. I am using SDK version 08.06. If I understand correctly, when tensor_bits is set to 32, then after compilation is done and the artifacts are copied to the EVM, inference is also possible on the EVM, although much slower than with int16 or int8.

    Could you list the operators that are supported by the 9.1 SDK but not by the 8.6 SDK? My model is relatively simple and only includes Conv2d, Relu, DepthwiseConv2d, Neg, Add, Mul, and Maxpool2d operators, so I do not understand why only 93 layers are offloaded.

    Moreover, could you explain what you mean by "If you see many OpenVX errors, I would recommend resetting the target as these may be the result of non-graceful shutdown."? Should I reboot the EVM?

    Best regards,

    Yunfeng

  • Hi Yunfeng Kang,

     If I understand correctly, when tensor bits is set to 32,

    I believe this only works for emulation mode on the PC, and that the compiled artifacts will likely cause a hang on the target processor.

    Could you list the operators supported by the 9.1 SDK over the 8.6 SDK?

    You can compare the supported operators pages from the 8.6 release and the current release:

    There are additions like Gather, Slice, more supported dimensions for Mat, Add, Sub, etc., additional kernel/stride dimensions for pooling layers, and so on, as well as bug fixes across the SW stack.

    "If you see many OpenVX errors, I would recommend resetting the target as these may be the result of non-graceful shutdown.".

    I see many prints in the log like:

    MEM: ERROR: Alloc failed with status = 12 !!!
     33484.014255 s:  VX_ZONE_ERROR:[tivxMemBufferAlloc:87] Shared mem ptr allocation failed
     33484.014286 s:  VX_ZONE_ERROR:[ownAllocTensorBuffer:85] Could not allocate tensor memory
     33484.014304 s:  VX_ZONE_ERROR:[tivxUnmapTensorPatch:820] map address is null
     33484.014315 s:  VX_ZONE_ERROR:[tivxUnmapTensorPatch:824] map size is equal to 0

    My first suggestion on seeing these is to reset the EVM (either a reboot command or a hard reset / power cycle). The OpenVX channels are stable during runtime, but they sometimes run into errors if a program is terminated from Linux without the communication channels being shut down on both sides. This can leave the remote cores in an incorrect state, and a reboot is the easiest solution.

    -Reese