AM62A7: Shared memory usage

Part Number: AM62A7

Hello TI experts,
We are adjusting the memory map of the AM62A7 to allocate more DDR to Linux. There are currently two shared memory regions, and we are not sure under what circumstances they are used. As shown in the attached image, the two regions are TIOVX_OBJ_DESC_MEM and DDR_SHARED_MEM. Our functional requirements are roughly as follows: we obtain the camera image through a GStreamer pipeline and then call the C7x for inference. In the GStreamer pipeline, the raw camera image is processed by plugins such as tiovxisp and tiovxmultiscaler. We know that these plugins are implemented on top of TIOVX and use compute cores other than the A53. We have two questions that we hope to get support for:
1. We are not sure which of the two regions mentioned above will be used in the process of Gstreamer and deep learning inference, or whether both will be used. If possible, we would like to know the details, such as which memory is used to share image data when Linux hands over images to ISP or MSC for processing.
2. Is there a way to see the real-time usage of these shared memories in Linux through some means?

[Attached image: 5d07b0a5-196b-43ac-af32-514d7ebca186.png]

  • Hi yangtian,

    It is our pleasure to help. 

    1. We are not sure which of the two regions mentioned above will be used in the process of Gstreamer and deep learning inference, or whether both will be used. If possible, we would like to know the details, such as which memory is used to share image data when Linux hands over images to ISP or MSC for processing.

    Both of these regions are used. TIOVX_OBJ_DESC_MEM is used by TIOVX to share information about data objects (object descriptors) between cores through IPC and RPMsg communication, while DDR_SHARED_MEM is used to share the actual data (frames/images) between the various cores and HW IPs.

    2. Is there a way to see the real-time usage of these shared memories in Linux through some means?

    When a TIOVX application exits, it prints statistics about its memory utilization. For example, this is what was printed when I ran edgeai-tiovx-apps with the object detection config:

    root@am62axx-evm:/opt/edgeai-tiovx-apps# ./bin/Release/edgeai-tiovx-apps-main configs/linux/object_detection.yaml 
    APP: Init ... !!!
    610183.110921 s: MEM: Init ... !!!
    610183.111020 s: MEM: Initialized DMA HEAP (fd=5) !!!
    610183.111247 s: MEM: Init ... Done !!!
    610183.111267 s: IPC: Init ... !!!
    610183.127892 s: IPC: Init ... Done !!!
    REMOTE_SERVICE: Init ... !!!
    REMOTE_SERVICE: Init ... Done !!!
    .
    .
    .
    .
    .
    APP: Deinit ... !!!
    REMOTE_SERVICE: Deinit ... !!!
    REMOTE_SERVICE: Deinit ... Done !!!
    610186.206053 s: IPC: Deinit ... !!!
    610186.208457 s: IPC: DeInit ... Done !!!
    610186.208519 s: MEM: Deinit ... !!!
    610186.208532 s: DDR_SHARED_MEM: Alloc's: 47 alloc's of 48344052 bytes 
    610186.208543 s: DDR_SHARED_MEM: Free's : 47 free's  of 48344052 bytes 
    610186.208553 s: DDR_SHARED_MEM: Open's : 0 allocs  of 0 bytes 
    610186.208570 s: MEM: Deinit ... Done !!!
    APP: Deinit ... Done !!!
    root@am62axx-evm:/opt/edgeai-tiovx-apps#

    I suggest that you check the DDR_SHARED_MEM utilization while running your own application. You can then reduce the size of that region in the memory map to match what your application actually needs.
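To turn the deinit printout above into numbers you can compare against the region size, a small parser can help. This is a minimal sketch, not a TI tool: the function name `parse_mem_stats` is hypothetical, and the regular expression assumes the log lines look exactly like the `DDR_SHARED_MEM` lines shown in this thread. Note that the printed figure is the cumulative number of bytes allocated over the whole run, so it is at most an upper bound on the peak usage, not the peak itself.

```python
import re

# Assumed line format, copied from the log in this thread, for example:
#   610186.208532 s: DDR_SHARED_MEM: Alloc's: 47 alloc's of 48344052 bytes
# The pattern tolerates the optional space before the colon ("Free's :")
# and the double space before "of".
STAT_RE = re.compile(
    r"(?P<region>\w+): (?P<kind>Alloc's|Free's|Open's)\s*: "
    r"(?P<count>\d+) \S+\s+of (?P<bytes>\d+) bytes"
)


def parse_mem_stats(log_text):
    """Return {region: {kind: (count, total_bytes)}} for every stats line found."""
    stats = {}
    for line in log_text.splitlines():
        match = STAT_RE.search(line)
        if match is None:
            continue  # not a memory statistics line
        region = stats.setdefault(match.group("region"), {})
        region[match.group("kind")] = (
            int(match.group("count")),
            int(match.group("bytes")),
        )
    return stats


def bytes_to_mib(num_bytes):
    """Convert a byte count to MiB (1 MiB = 2**20 bytes)."""
    return num_bytes / (1 << 20)
```

One way to use it is to save the application output to a file, pass the file contents to `parse_mem_stats`, and convert the `Alloc's` byte total with `bytes_to_mib` before comparing it with the DDR_SHARED_MEM size in your memory map. If the `Alloc's` and `Free's` totals differ, some buffers were not released before deinit.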

    DDR_C7X_1_LOCAL_HEAP and DDR_C7X_1_SCRATCH are two more memory regions that you can reduce. You can check the usage of these regions by running:

    root@am62axx-evm:/opt/vision_apps# ./vx_app_heap_stats.out

    Best regards,

    Qutaiba

  • Hello,

    Thank you for your answer and suggestion. Through testing, we found that the utilization of DDR_C7X_1_LOCAL_HEAP and DDR_C7X_1_SCRATCH is approximately 40% and 2%, respectively. Since the two regions differ so much in utilization, I would like to ask a follow-up question: what are the specific roles of these two memory regions during deep learning inference?

    Best regards,

    Yangtian

  • Hi Yangtian,

    Good questions. 

    I'll add one more point to Qutaiba's response: you can see DDR_SHARED_MEM utilization in Linux from a debugfs entry, like so:

    cat /sys/kernel/debug/dma_buf/bufinfo
    # Each allocated buffer is shown; there can be multiple. This is instantaneous data.
    # The printout TIOVX produces for this region at the end of a run shows the total sum, not the instantaneous usage.
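If you want to sample this value periodically while your pipeline runs, the file can be read from a script. The following is a minimal sketch under stated assumptions: the function names are hypothetical, and it assumes that `bufinfo` ends with a summary line of the form `Total <N> objects, <M> bytes`, as printed by the mainline kernel's dma-buf debugfs code. Check the output on your own kernel first. Also keep in mind that `bufinfo` lists every dma-buf in the system, so the total may include buffers that do not belong to DDR_SHARED_MEM, and that reading debugfs normally requires root.

```python
import re
from pathlib import Path

# Path taken from the command shown above in this thread.
BUFINFO_PATH = Path("/sys/kernel/debug/dma_buf/bufinfo")

# Assumed summary line format: "Total 12 objects, 48344052 bytes"
TOTAL_RE = re.compile(r"Total\s+(\d+)\s+objects,\s+(\d+)\s+bytes")


def parse_bufinfo_total(text):
    """Return (object_count, total_bytes) from bufinfo text, or None if absent."""
    match = TOTAL_RE.search(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def read_bufinfo_total(path=BUFINFO_PATH):
    """Read the bufinfo file once and return its parsed summary line."""
    return parse_bufinfo_total(Path(path).read_text())


def track_peak(samples):
    """Return the largest total_bytes seen in an iterable of (count, bytes) samples."""
    peak = 0
    for sample in samples:
        if sample is not None:
            peak = max(peak, sample[1])
    return peak
```

Calling `read_bufinfo_total()` in a loop with a short sleep, and feeding the results to `track_peak`, gives an approximate peak of the instantaneous usage. Because this is sampling, short-lived allocations between two reads can be missed.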

    Through testing, we have found that the usage rates of DDR_C7X_1_LOCAL_HEAP and DDR_C7X_1_SCRATCH are approximately 40% and 2%, respectively.

    Yes, both of these regions are used by the TI Deep Learning software. The heap holds model weights and model metadata. Intermediate tensors are stored sometimes in scratch and sometimes in heap. There is a model-preemption feature (enabled by default) that can be disabled, in which case all intermediate tensors are placed in scratch. By default (with preemption enabled), the scratch region may show low utilization on some models.

    BR,
    Reese