EdmaMgr usage in OpenCL

Hi champions,

I want to use EdmaMgr to offload memory copies in OpenCL. I studied the edmamgr example and have some questions about it. Could you help clarify them?

1. The kernel code doesn't call EdmaMgr_init, which should be called before EdmaMgr_alloc. I checked the Makefile and no other source files are included, so this confuses me.

2. There should be an EDMA resource configuration file that tells EdmaMgr about the hardware resources and initializes the global variable "ti_sdo_fc_edmamgr_region2Instance", but I can't find this information in the example.

3. Does EdmaMgr support multi-core operation? The example limits the test to one DSP. Is it feasible to raise that number and submit more EDMA tasks?

Zhan. 

  • Hi Zhan,

    1) The OpenCL monitor calls EdmaMgr_init during initialization, before any kernels are offloaded to the DSP, so kernel code doesn't need to call it. If you are not using OpenCL to manage the DSPs (i.e. running bare metal), then the DSP code must call EdmaMgr_init itself.

    2) EDMA resource configuration is hidden from end users (inside the OpenCL monitor) for ease of use. The available EDMA resources are statically partitioned. On K2H, there are 5 EDMA instances. One instance is reserved for the A15s. The other 4 instances are partitioned equally among the 8 DSP cores, so each core gets about half of an EDMA instance.

    3) Yes, EdmaMgr supports multicore. Each core can alloc/free EDMA resources (params, channels) from its own portion of the statically partitioned resources.
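
    To illustrate points 1) and 3), here is a minimal sketch of the call order inside kernel code: allocate a channel, start the copy, wait, free, with no EdmaMgr_init anywhere. The EdmaMgr_* functions in the first half are host-side stand-ins written for this sketch so that it compiles off-target. Their signatures are an assumption modeled on the calls named in this thread and should be checked against the EdmaMgr header in your release. dma_copy and FakeChannel are hypothetical names.

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* ---- Host-side stand-ins (ASSUMPTION: not TI code) ----
     * They mimic the EdmaMgr calls so this sketch builds without the DSP
     * toolchain. On target, drop this section and use the real declarations. */
    typedef struct { int in_use; } FakeChannel;
    typedef FakeChannel *EdmaMgr_Handle;

    static FakeChannel fake_channel;   /* a one-channel pool is enough here */

    static EdmaMgr_Handle EdmaMgr_alloc(int32_t max_linked_transfers)
    {
        (void)max_linked_transfers;
        if (fake_channel.in_use)
            return NULL;               /* pool exhausted */
        fake_channel.in_use = 1;
        return &fake_channel;
    }

    static int32_t EdmaMgr_copy1D1D(EdmaMgr_Handle h, void *src, void *dst,
                                    int32_t num_bytes)
    {
        (void)h;
        memcpy(dst, src, (size_t)num_bytes);  /* stand-in copies synchronously */
        return 0;
    }

    static void EdmaMgr_wait(EdmaMgr_Handle h) { (void)h; }

    static int32_t EdmaMgr_free(EdmaMgr_Handle h) { h->in_use = 0; return 0; }

    /* ---- The part that matters: call order as seen from kernel code ----
     * No EdmaMgr_init here; per the reply above, the OpenCL monitor has
     * already called it. Returns 0 on success, -1 on failure. */
    static int dma_copy(void *src, void *dst, int32_t num_bytes)
    {
        EdmaMgr_Handle ch = EdmaMgr_alloc(1);   /* single transfer, no linking */
        if (ch == NULL)
            return -1;

        int32_t status = EdmaMgr_copy1D1D(ch, src, dst, num_bytes);
        if (status == 0)
            EdmaMgr_wait(ch);                   /* wait for the copy to finish */
        EdmaMgr_free(ch);                       /* give the channel back */
        return status == 0 ? 0 : -1;
    }
    ```

    Since each core allocates from its own share, the same sequence can run on several cores at once without the cores coordinating with each other.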

    It is possible to partition the EDMA resources in a custom way, but that requires rebuilding the OpenCL monitor after making the change.
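
    As a rough illustration of the static split described in 2), the sketch below maps a DSP core to half of one of the 4 DSP-owned instances. The mapping rule (adjacent cores share an instance), the instance numbering, and the EdmaShare and edma_share_for_core names are all hypothetical. The real assignment is inside the monitor and may differ.

    ```c
    #include <assert.h>

    /* Figures taken from the reply above: 8 DSP cores, 4 EDMA instances for them. */
    enum { NUM_DSP_CORES = 8, NUM_DSP_EDMA_INSTANCES = 4 };

    typedef struct {
        int instance;       /* index among the DSP-owned instances, 0..3 (hypothetical numbering) */
        int first_channel;  /* first channel of this core's share */
        int num_channels;   /* number of channels in the share */
    } EdmaShare;

    /* HYPOTHETICAL rule: cores 0 and 1 share instance 0, cores 2 and 3 share
     * instance 1, and so on; each core takes one half of the channels.
     * channels_per_instance is supplied by the caller, not assumed here. */
    static EdmaShare edma_share_for_core(int core, int channels_per_instance)
    {
        const int cores_per_instance = NUM_DSP_CORES / NUM_DSP_EDMA_INSTANCES; /* 2 */

        assert(core >= 0 && core < NUM_DSP_CORES);
        assert(channels_per_instance > 0 &&
               channels_per_instance % cores_per_instance == 0);

        EdmaShare s;
        s.num_channels  = channels_per_instance / cores_per_instance;
        s.instance      = core / cores_per_instance;
        s.first_channel = (core % cores_per_instance) * s.num_channels;
        return s;
    }
    ```

    For example, with an illustrative 64 channels per instance, edma_share_for_core(1, 64) gives the second half of the first DSP-owned instance (channels 32 and up).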

    Regards,

    Vivek

  • Hi Vivek,

    I can only find monitor_evmk2h.out and monitor_evmk2h.syms in <opencle_install_Path>/lib. Is it feasible to share the OpenCL monitor project, so that the customer can modify it based on the application requirements?

    Zhan.

  • Hi Zhan,

    So far, OpenCL has been released only as an object release.

    Going forward, it will be released as source. We are also planning to make the complete source code available on git.ti.com shortly, with a recipe for building both the host side and the DSP monitor. It may take a few weeks to have everything in place on the external git.

    Please let us know the customer's use case, so that we understand why they need to change the default EDMA resource partitioning.

    Regards,

    Vivek

  • Hi Vivek,

    1. Some customers want to use MCSDK-HPC in a non-HPC scenario: OpenCL would schedule 2 or 4 DSP cores, and the remaining DSP cores would run the RTOS and use the peripherals and EDMA. If we can't provide the code now, is it feasible to share the default EDMA resource partitioning, so that the customer can use the remaining EDMA resources?

    2. I tested multi-core EdmaMgr, and OpenCL manages all 8 DSP cores. Is it feasible to configure OpenCL to manage only specific DSP cores?

    Zhan.

  • Zhan,

    1. Can you please elaborate on the use case? Why does the customer need to split the DSP cores between OpenCL-managed and non-OpenCL use? If the customer is willing to learn the memory map and hand-partition the memory (MSMC, L2) and resources (EDMA, etc.), why use OpenCL at all?

    2. At the moment, I believe OpenCL cannot be restricted to only a few cores (while the rest of the cores are used for something else). We can revisit this once you can justify the application's need for it.

    Regards,

    Vivek

     

  • Hi Vivek,

    1. The customer's scenario has two interfaces, PCIe and GE. The ARM gets data from PCIe and sends it to the DSP for encryption and decryption. The board also receives data from GE that needs encryption and decryption; once that is finished, the data must be sent back to the remote client via GE.

    2. The encryption and decryption can't use the SA, so the A15 uses the DSP to accelerate them, and OpenCL is an easy way to implement the framework.

    3. For packets from GE, the PA could dispatch the packet to the DSP core directly. If the A15 receives the packet and sends it to the DSP with OpenCL, it will need more memory and more cycles.

    The customer is still evaluating the solution, and the final solution has not been nailed down.

    Zhan.

     

  • Zhan,

    What other processing is happening on the ARM and the DSP, besides encryption and decryption? I'm interested in seeing a system block diagram with both data I/O and compute. If you cannot post that information to the forum, we can set up a WebEx and conference call to go over it.

    Regards,

    Vivek

  • Zhan,

    The feature you are asking for is device partitioning, which is a feature of OpenCL 1.2. The implementation that TI supports today is OpenCL 1.1, and device partitioning is not part of that specification. Device partitioning will likely be the first feature from the OpenCL 1.2 spec that we support. However, it will likely not be before 4Q14.

  • Vivek,

    Thanks for your support. I'll draw the block diagram later and email it to you.

    Zhan.