TDA4VM: Using DmaUtilsAutoInc3d without interfering with TIDL

Part Number: TDA4VM

Hi,

We've been using the appUdma API from vision_apps until now. I'm trying to learn the DmaUtilsAutoInc3d API instead as the other one is quite limiting. My use-case is doing block-based processing on the C7X implemented in a variety of custom kernels.

I'm trying to understand the API by looking at the unit test and the header file:

  • ti-processor-sdk-rtos-j721e-evm-08_06_00_12/pdk_jacinto_08_06_00_31/packages/ti/drv/udma/dmautils/include/dmautils_autoincrement_3d.h
  • /ti-processor-sdk-rtos-j721e-evm-08_06_00_12/pdk_jacinto_08_06_00_31/packages/ti/drv/udma/dmautils/test/dmautils_autoincrement_test/dmautils_autoincrement_test.c

I have a few questions regarding usage and performance:

DmaUtilsAutoInc3d_init()

Since I will use block-processing in multiple kernels, should I call DmaUtilsAutoInc3d_init() only once at startup, or for every frame of every kernel? Does this function reserve some resources, which would then break TIDL? Or does it only allocate resources for the given context?

autoIncrementContext allocation

The documentation of DmaUtilsAutoInc3d_init() says to allocate the context in the fastest memory. I assume this would be the L2SRAM for the C7X, is that correct? However, I think L2SRAM should be reserved for block-processing data. What's the next best choice?

Udma_init()

Should every kernel context call Udma_init() to get a driver handle? Or can this be done only once and shared between all kernels?  
  
I know that it's currently called by appInit() through vision_apps' appUdmaInit(). Is it ok to init multiple times?  

Alignment

I see that dmautils_autoincrement_3d.c aligns everything to 128 bytes. I assume this is a requirement from DMA. Should I do the same when allocating in L2SRAM? Or is this handled by tivxMemAlloc(size, TIVX_MEM_INTERNAL_L2)?   

Compression

There's an example about compression, but I'm not sure I understand the use case. Is this tied to video streaming?  
  
Thanks, 
 
Fred
  • Hi Fred,

         Please find my responses below:

    Since I will use block-processing in multiple kernels, should I call DmaUtilsAutoInc3d_init() only once at startup, or for every frame of every kernel? Does this function reserve some resources, which would then break TIDL? Or does it only allocate resources for the given context?

       Whenever DmaUtilsAutoInc3d_init() is called, it requests the UDMA driver underneath to allocate DRU channels. In TIDL we always call these APIs as part of the TIDL_activate API, and the resources are released as part of the TIDL_deactivate API. If DmaUtilsAutoInc3d_init() is called between TIDL_activate and TIDL_deactivate, there shouldn't be a problem as long as the UDMA driver is still able to allocate C7x channels. Once the UDMA driver fails to allocate a channel, TIDL_activate is expected to return an error. Hope this clarifies.
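The channel-sharing behaviour described in the reply above can be pictured with a toy model. This is only a sketch of the bookkeeping, not PDK code: `ChannelPool`, `pool_alloc`, `pool_free` and the pool size of 8 are hypothetical names and values chosen for illustration, and the real number of DRU channels available to the C7x depends on the SoC and its resource configuration.

```c
#include <assert.h>

/* Hypothetical pool size, for illustration only. The real number of DRU
 * channels depends on the SoC and the resource-manager configuration. */
#define POOL_CHANNELS 8

/* Toy stand-in for the driver's channel bookkeeping (not a PDK type). */
typedef struct {
    int free_channels;
} ChannelPool;

/* Models an init/activate step: take n channels from the shared pool.
 * Returns 0 on success, -1 if the pool cannot satisfy the request. */
static int pool_alloc(ChannelPool *p, int n)
{
    if (n < 0 || n > p->free_channels) {
        return -1;
    }
    p->free_channels -= n;
    return 0;
}

/* Models a deinit/deactivate step: give n channels back to the pool. */
static void pool_free(ChannelPool *p, int n)
{
    p->free_channels += n;
}
```

In this model, a custom kernel that holds 3 channels leaves 5 for everyone else, so a later request for 6 fails until the kernel releases its channels. That mirrors the point in the reply: whoever asks last is the one that sees the allocation error.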

    The documentation of DmaUtilsAutoInc3d_init() says to allocate the context in the fastest memory. I assume this would be the L2SRAM for the C7X, is that correct? However, I think L2SRAM should be reserved for block-processing data. What's the next best choice?

    We keep this handle in the L1D or L3 memory of the C7x.

    Should every kernel context call Udma_init() to get a driver handle? Or can this be done only once and shared between all kernels?  
      
    I know that it's currently called by appInit() through vision_apps' appUdmaInit(). Is it ok to init multiple times?  

     The expectation is to have one global handle that is passed across the different modules.
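A minimal sketch of the "one global handle" pattern from the reply above, written against stand-in types so that it compiles on its own. `FakeDrv`, `fake_drv_init_once`, `fake_drv_get`, `KernelCtx` and `kernel_create` are all hypothetical names; in a real application the handle would be the UDMA driver handle created once at startup, and each kernel would only borrow it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the real driver object; not a PDK type. */
typedef struct {
    int initialized;
    int init_count;   /* how many times the underlying init actually ran */
} FakeDrv;

static FakeDrv g_drv; /* the single, global driver object */

/* Hypothetical one-time init, standing in for the call made at app startup.
 * Calling it again is a no-op, so the driver is never initialized twice. */
static void fake_drv_init_once(void)
{
    if (!g_drv.initialized) {
        g_drv.initialized = 1;
        g_drv.init_count++;
    }
}

/* Modules ask for the shared handle instead of initializing their own.
 * Returns NULL if startup init has not run yet. */
static FakeDrv *fake_drv_get(void)
{
    return g_drv.initialized ? &g_drv : NULL;
}

/* Hypothetical per-kernel context that only borrows the handle. */
typedef struct {
    FakeDrv *drv;
} KernelCtx;

/* Returns 0 on success, -1 if the shared handle is not available yet. */
static int kernel_create(KernelCtx *ctx)
{
    ctx->drv = fake_drv_get();
    return (ctx->drv != NULL) ? 0 : -1;
}
```

The design choice here is that kernels never own the driver: they fail cleanly if startup has not happened, and two kernels created later end up pointing at the same object.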

    I see that dmautils_autoincrement_3d.c aligns everything to 128 bytes. I assume this is a requirement from DMA. Should I do the same when allocating in L2SRAM? Or is this handled by tivxMemAlloc(size, TIVX_MEM_INTERNAL_L2)?

       The TR should be aligned to a 64-byte boundary; the rest of the alignment is done to keep accesses within the cache-line size (but this is not mandatory).
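As a rough illustration of the alignment rule in the reply above, here is a small helper that rounds an address up to a power-of-two boundary and carves an aligned region out of a larger raw buffer. The helper names (`align_up`, `carve_aligned`) and macro names are my own; the 64 and 128 values come from this thread. Whether tivxMemAlloc() already returns suitably aligned memory is not stated here, so the sketch assumes nothing about the allocator and aligns manually.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values taken from this thread: 64 bytes for TR memory, 128 bytes for the
 * optional cache-line-friendly alignment used by dmautils. */
#define TR_ALIGN_BYTES        ((uintptr_t)64u)
#define CACHELINE_ALIGN_BYTES ((uintptr_t)128u)

/* Round addr up to the next multiple of align.
 * align must be a non-zero power of two. */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    assert(align != 0u && (align & (align - 1u)) == 0u);
    return (addr + (align - 1u)) & ~(align - 1u);
}

/* Carve an aligned region of `size` bytes out of a raw allocation.
 * Returns NULL if the aligned region does not fit in raw_size bytes. */
static void *carve_aligned(void *raw, size_t raw_size, size_t size,
                           uintptr_t align)
{
    uintptr_t start = (uintptr_t)raw;
    uintptr_t aligned = align_up(start, align);
    size_t pad = (size_t)(aligned - start);

    if (pad > raw_size || size > raw_size - pad) {
        return NULL;
    }
    return (void *)aligned;
}
```

A caller would over-allocate by up to `align - 1` bytes and pass the raw pointer through `carve_aligned()` with `TR_ALIGN_BYTES` for TR memory, or `CACHELINE_ALIGN_BYTES` for data buffers where the cache-line alignment is wanted.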

    There's an example about compression, but I'm not sure I understand the use case. Is this tied to video streaming?  

      The DRU in J721S2 and J784S4 supports on-the-fly compression and decompression, i.e. compression can be done as part of a DRU transfer, and the same data can then be decompressed via the DRU.

    Regards,

    Anshu

  • Thanks, Anshu.