Part Number: AM5729
Is it possible to add a custom DSP algorithm on a DSP while running TIDL?
Should the customer separate the DSPs, e.g. run their algorithm on DSP1 and TIDL on DSP2?
Please let me know how the customer can add their DSP algorithm alongside TIDL, and provide a guide for this.
Thanks and Best Regards,
Hi SI, it is possible to use only one DSP for TIDL, which leaves the other DSP free. There are some examples in our User Guide; see the link and the classification examples below:
./tidl_classification -g 2 -d 1 -e 2 -l ./imagenet.txt -s ./classlist.txt -i 1 -c ./stream_config_j11_v2.txt
./tidl_classification -g 1 -d 2 -e 2 -l ./imagenet.txt -s ./classlist.txt -i ./clips/test10.mp4 -c ./stream_config_j11_v2.txt
In reply to Paula Carrillo:
Thanks for this information.
Actually my customer has already tried to run their custom algorithm on DSP1 while running TIDL on DSP2 with the EVEs, but TIDL would not run in this configuration.
They suspect a conflict in the IPC, and they want to know how they can use IPC in their own algorithm while running TIDL on the other DSP.
Could you please provide a guide on this?
They suspect the IPC buffers overlapped, and that this is why TIDL did not work alongside their custom algorithm.
In reply to Sung-IL:
Please see if you can use the environment variable "TI_OCL_COMPUTE_UNIT_LIST" to partition the DSP cores, e.g.
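For illustration, here is a minimal sketch of such partitioning. The two `sh -c` commands are placeholders that just echo the variable; in practice each would be the real customer application and the TIDL binary ("0" = DSP1, "1" = DSP2):

```shell
# Each process only sees the DSP cores named in its own environment.
# "0" = DSP1, "1" = DSP2; the sh -c commands stand in for real applications.
TI_OCL_COMPUTE_UNIT_LIST="0" sh -c 'echo "custom app uses DSP core(s): $TI_OCL_COMPUTE_UNIT_LIST"'
TI_OCL_COMPUTE_UNIT_LIST="1" sh -c 'echo "TIDL app uses DSP core(s): $TI_OCL_COMPUTE_UNIT_LIST"'
```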
In reply to Yuan Zhao:
They already use the environment variable "TI_OCL_COMPUTE_UNIT_LIST" to partition the DSP cores as below.
root@am57xx-evm:~# cat /etc/ti-mctd/ti_mctd_config.json
"cmem-block-offchip" : "0",
"cmem-block-onchip" : "1",
"compute-unit-list" : "1",
"eve-devices-disable" : "0",
"linux-shmem-size-KB" : "256",
But, it still does not work.
Do you think this configuration is right?
They still suspect there was an issue in the IPC. They have since resolved it and can now run their DSP application alongside TIDL, but there are 2 remaining issues in this case.
Please let me know your opinion on the points below and guide us.
While running TIDL on DSP2 and IPU1, they moved the CMA of DSP1 to the 0x9000_0000 - 0x9D00_0000 region as shown below (i.e. they moved PHYS_MEM_IPC_VRING in rsc_table_vayu_dsp.c to 0x9000_0000), and they confirmed that TIDL worked well on DSP2 alongside their DSP application on DSP1.
[ 0.000000] Reserved memory: created CMA memory pool at 0x0000000090000000, size 208 MiB
[ 0.000000] OF: reserved mem: initialized node dsp1-memory@90000000, compatible id shared-dma-pool
[ 0.000000] Reserved memory: created CMA memory pool at 0x000000009d000000, size 32 MiB
[ 0.000000] OF: reserved mem: initialized node ipu1-memory@9d000000, compatible id shared-dma-pool
[ 0.000000] Reserved memory: created CMA memory pool at 0x000000009f000000, size 8 MiB
[ 0.000000] OF: reserved mem: initialized node dsp2-memory@9f000000, compatible id shared-dma-pool
[ 0.000000] Reserved memory: created CMA memory pool at 0x000000009f800000, size 120 MiB
[ 0.000000] OF: reserved mem: initialized node ipu2-memory@9f800000, compatible id shared-dma-pool
[ 0.000000] cma: Reserved 24 MiB at 0x00000000be400000
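As a sanity check on the layout above, the pool sizes in the log can be recomputed from the base addresses (a quick shell sketch; the addresses are taken directly from the log):

```shell
# DSP1 pool: 0x90000000 up to the IPU1 pool at 0x9D000000
printf 'dsp1: %d MiB\n' $(( (0x9D000000 - 0x90000000) / 1024 / 1024 ))   # -> dsp1: 208 MiB
# IPU1 pool: 0x9D000000 up to the DSP2 pool at 0x9F000000
printf 'ipu1: %d MiB\n' $(( (0x9F000000 - 0x9D000000) / 1024 / 1024 ))   # -> ipu1: 32 MiB
```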
Do you think this approach is right?
In this case, there are 2 issues.
1. There are only 204 MB remaining for their DSP application software (0x9000_0000 - 0x9D00_0000).
2. They tested IPC communication between the Cortex-A15 ARM core and DSP1 while running TIDL on DSP2 and the custom DSP application on DSP1, but they found some issues in the IPC communication after 1000 tests.
The details for IPC communication test are....
* They send an IPC Qmsg (A15->DSP1) and run the OpenCL application repeatedly with a 200 ms delay.
* There is no issue in IPC communication when only the OpenCL userspace application is running, without DSP1 communication.
* But when the OpenCL userspace application runs together with DSP1 communication repeatedly with a 200 ms delay, they found an issue in the IPU1 IPC after 1000 trials.
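The test pattern described above can be sketched as a simple soak loop. The echo commands are placeholders for the real MessageQ test and OpenCL application, and the iteration count is shortened here:

```shell
# Soak loop: MessageQ round trip + OpenCL dispatch, 200 ms apart.
for i in 1 2 3; do
    echo "iter $i: A15->DSP1 MessageQ round trip"   # placeholder for the real IPC test
    echo "iter $i: OpenCL dispatch to DSP"          # placeholder for the OpenCL app
    sleep 0.2                                       # 200 ms delay between iterations
done
```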
Do you think their approach is right? or is there other way to run their DSP application with TIDL?
I am not sure if the IPC issues are related to the movement of DSP/IPU CMA regions.
1) One thing the customer could try is to make no modifications: put the stock PSDK filesystem on the EVM SD card. Then use the environment variable I mentioned in the previous post, run the OpenCL application on the Arm using DSP1 only and the TIDL application on the Arm using DSP2 only, and see if the aforementioned IPC problems manifest.
2) Can you clarify what images are running on each core? DSP1, DSP2, IPU1 are running OpenCL firmware? What is running in IPU2?
3) The CMA for IPU2 could be problematic:
This will step into 0xA000_0000, which is the DDR CMEM region. For OpenCL/TIDL to run properly on AM57x, we need the first CMEM block to start from 0xA000_0000.
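The overlap is easy to verify from the kernel log earlier in the thread (IPU2 pool base 0x9F800000, size 120 MiB):

```shell
# End of the IPU2 CMA pool = base + size
printf 'ipu2 pool ends at 0x%X\n' $(( 0x9F800000 + 120 * 1024 * 1024 ))
# -> 0xA7000000, which runs past 0xA0000000 where the first CMEM block must start
```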
4) TIDL-API under the hood is using OpenCL. The "/etc/ti-mctd/ti_mctd_config.json" is a global setting for OpenCL. When the customer sets "compute-unit-list" to "1", all OpenCL and TIDL applications will talk to DSP2 only. Maybe the customer is not running OpenCL firmware on DSP1? Are they running their own firmware on DSP1?
I don't have a booted AM57 EVM available at the moment. I'll see if I can find somebody in office to help press the power button on my EVM.
Thanks for your response. I have added my responses below.
Do you mean the customer should go back to the original setup and try again after modifying only the environment variable?
TIDL is running on DSP2 and IPU1. The customer's DSP algorithm SW is on DSP1. There is no custom SW on IPU2.
The customer's DSP SW running on DSP1 is just a simple program communicating with the A15, for testing purposes at the moment.
Do you think this IPU2 memory overrun into the 0xA000_0000 region may cause IPC instability? I'll ask the customer to run the IPC test again after securing the 0xA000_0000 region.
Yes. Their own firmware runs on DSP1 without the OpenCL firmware. They don't use OpenCL; they just use IPC from their own application on DSP1 to communicate with the A15.
[YZ] Yes, put a different SD card in the EVM, running the stock PSDK filesystem unmodified. No custom firmware. Modify an OpenCL application (e.g. vecadd) to run in an infinite loop and run it using DSP1 only. Run the TIDL application using IPU1 (EVEs) and DSP2 only.
[YZ] I am not sure. But the reserved memory regions should not overlap.
[YZ] The customer might also look into all things IPC related, and into why IPC traffic on DSP1 interfered with IPC traffic on IPU1. How many buffers did they define for their MessageQ? Is the DSP1 firmware responding to each MessageQ message? Is any flow control in place? Where does the IPC daemon log go? Has that exhausted the "/tmp" filesystem after 1000 iterations? This is why I suggested the experiment in 1), for testing the stock IPC implementation in OpenCL.
I have conducted experiment 1) mentioned in the post above. The A15 can have independent IPC communications with DSP1 and DSP2/IPU1. Enclosed are the details.
Window 1: running the OpenCL "conv1d" example, using DSP1 only - source patched for running many iterations and printing program progress
Window 2: running the TIDL-API "mcbench" test on the jacinto11_v2 network, using 4 EVEs (IPU1) and DSP2 only - source patched for progress print
Start these two programs in two windows at roughly the same time:
- conv1d example for 10 million iterations
- mcbench jacinto11_v2 network on 50 million frames
- start time: Sat Sep 12 03:09:36 UTC 2020
Status two days later:
- Mon Sep 14 13:28:39 UTC 2020
- Both applications are still running/progressing
- Window 1, "conv1d", is at iteration 278270
- Window 2, "mcbench" on the jacinto11_v2 network, is at frame 14000200
Conclusion:
- The two applications can have independent IPC communications to DSP1 and DSP2/IPU1, and do not interfere with each other.
Recommendation to customer: make small changes one at a time and debug at each step
- Step 1: set up this experiment to validate independent IPC communications among the A15 and the other cores
- Step 2: replace the "Window 2" application with the customer's TIDL application, validate IPC communications
- Step 3: replace "Window 1" with the customer's IPC test (replace DSP1 firmware, but do not move DSP1 CMA memory yet), validate
- Step 4: replace "Window 1" with the customer's IPC test (move DSP1 CMA memory), validate
root@am57xx-evm:/usr/share/ti/examples/opencl/conv1d# diff -u main.cpp.orig main.cpp
@@ -79,6 +79,9 @@
int input_numcompunits = 0;
if (argc > 1) input_numcompunits = atoi(argv[1]); // valid: 1, 2, 4, 8
+ for (int iter = 0; iter < input_numcompunits; iter++)
+ if (iter % 10 == 0) printf("\n\n\nITER %d:\n\n", iter);
Context context (CL_DEVICE_TYPE_ACCELERATOR);
@@ -275,6 +278,7 @@
cerr << "ERROR: " << err.what() << "(" << err.err() << ", "
<< ocl_decode_error(err.err()) << ")" << endl;
if (num_errors != 0)
root@am57xx-evm:/usr/share/ti/examples/opencl/conv1d# make clean; make
cl6x -mv6600 --abi=eabi -I/usr/share/ti/opencl -I/usr/share/ti/opencl -I/usr/share/ti/cgt-c6x/include -c -o3 -mw --symdebug:none k_extc.c
/usr/bin/clocl -t ti_kernels.cl k_extc.obj
g++ -c -O3 -I/usr/include -Wall main.cpp
root@am57xx-evm:/usr/share/ti/examples/opencl/conv1d# TI_OCL_COMPUTE_UNIT_LIST="0" ./conv1d 10000000; date
root@am57xx-evm:/usr/share/ti/tidl/examples/mcbench# diff -u main.cpp.orig main.cpp
@@ -174,6 +174,7 @@
for (uint32_t frame_idx = 0;
frame_idx < opts.num_frames + num_eops; frame_idx++)
+ if (frame_idx % 100 == 0) printf("\n\n\nFrame: %d\n\n", frame_idx);
ExecutionObjectPipeline* eop = eops[frame_idx % num_eops];
// Wait for previous frame on the same eop to finish processing
root@am57xx-evm:/usr/share/ti/tidl/examples/mcbench# make clean; make
rm -f mcbench stats_tool_out.* *.out
rm -f frame_*.png multibox_*.png
g++ -I/usr/include -O3 -Wall -Werror -Wno-error=ignored-attributes -I. -I/home/root/tidl/tidl_api/inc -std=c++11 -I/usr/share/ti/opencl -I/usr/share/ti/opencl main.cpp ../common/utils.cpp ../common/video_utils.cpp /home/root/tidl/tidl_api/tidl_api.a -L/usr/lib -lOpenCL -locl_util -lpthread -lopencv_highgui -lopencv_imgcodecs -lopencv_videoio -lopencv_imgproc -lopencv_core -o mcbench
root@am57xx-evm:/usr/share/ti/tidl/examples/mcbench# TI_OCL_COMPUTE_UNIT_LIST="1" TIDL_PARAM_HEAP_SIZE_EVE=3000000 TIDL_PARAM_HEAP_SIZE_DSP=3000000 TIDL_NETWORK_HEAP_SIZE_EVE=20000000 TIDL_NETWORK_HEAP_SIZE_DSP=2000000 ./mcbench -g 2 -d 1 -e 4 -c ../test/testvecs/config/infer/tidl_config_j11_v2_lg2.txt -f 50000000 -i ../test/testvecs/input/preproc_0_224x224_multi.y -v; date
My customer is now trying to run based on your scenarios, but they are worried it will be hard for them to increase the memory size available for their custom DSP application.
Could you please let me know the memory map information for TIDL and OpenCL?
Is the only limitation for TIDL/OpenCL the one you mentioned, that "For OpenCL/TIDL to run properly on AM57x, we need the first CMEM block to start from 0xA000_0000"?
Could you please help with this, and tell us which CMEM regions should be reserved?
They want to know which region can be used for their custom DSP application.