IWR6843AOP: Executing HWA-accelerated object detection control process on DSP?

Part Number: IWR6843AOP


I'm trying to understand the various configurations of the object detection DPC and how to utilize the DSP core.

Reading the documentation and the source code for the demos, it seems like there are two methods in the examples. The AOP example uses the HWAs (objdethwa) for all of the steps of object detection, and the ARM core to handle control and callbacks. The ISK demo uses the DSP (objdetdsp) for the entire chain after RangeProc.

But reading the documentation, it seems like there's an option to run HWA accelerated algorithms but dispatched from the DSP. I'm drawing this conclusion from the different DPU docs each saying the "HWA" version will run on the R4F or the DSP cores. Am I understanding that correctly?

Is there a code example of running the accelerated algorithms from the DSP core? Which documentation would help me in understanding the software development model used for building/linking combined MSS+DSS programs?

(What am I trying to do? Send back the largest, densest pointcloud possible to the host over UART. After switching to external 2Mbps UART and compressing coordinates, my bottleneck now is simply inter-frame processing time in the AOP demo. I want to try moving some or all processing to the DSP, as it has a faster core clock.)
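
The UART side of this goal can be bounded with simple arithmetic. The sketch below assumes 8N1 framing (10 bits on the wire per payload byte). The frame period, bytes per point, and header size are hypothetical placeholders, not values taken from the demo.

```python
def max_points_per_frame(baud_bps, frame_period_s, bytes_per_point,
                         header_bytes=0, bits_per_byte_on_wire=10):
    """Upper bound on the points that fit in one frame period on the UART link.

    bits_per_byte_on_wire=10 assumes 8N1 framing (1 start + 8 data + 1 stop).
    """
    bytes_per_second = baud_bps / bits_per_byte_on_wire
    budget_bytes = bytes_per_second * frame_period_s - header_bytes
    if budget_bytes <= 0:
        return 0
    return int(budget_bytes // bytes_per_point)


# Hypothetical example: 2 Mbps link, 50 ms frame period,
# 8 bytes per compressed point, 40-byte per-frame header.
points = max_points_per_frame(2_000_000, 0.050, 8, header_bytes=40)
```

If the detection chain can produce more points per frame than this bound, the link is still the limit; otherwise the inter-frame processing time is the bottleneck, as described above.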

  • Hello.

    I'm drawing this conclusion from the different DPU docs each saying the "HWA" version will run on the R4F or the DSP cores. Am I understanding that correctly?

    Your understanding is mostly correct. To clarify, the AOP version of the project uses only the HWA, rather than the HWA and DSP jointly. You can use both together, or just the HWA.

    (What am I trying to do? Send back the largest, densest pointcloud possible to the host over UART. After switching to external 2Mbps UART and compressing coordinates, my bottleneck now is simply inter-frame processing time in the AOP demo. I want to try moving some or all processing to the DSP, as it has a faster core clock.)

    If you want to use the AOP and use the DSP with the HWA, you can simply use the ISK project and swap out the antenna_geometry.c file with the corresponding file in the AOP project.  You will also have to change the pinmuxing done in the ISK project to match the pinmuxing done in the AOP project as these devices are not pin-to-pin compatible.

    Which documentation would help me in understanding the software development model used for building/linking combined MSS+DSS programs?

    Looking through the ISK project should help with understanding the building/linking/post-build steps to generate an image for an MSS+DSS program.  At a high level, you want to build the DSS project first, then the MSS, and building the MSS project will also generate a single multi-core image that you can run.

    Sincerely,

    Santosh

  • Thank you for the reply!

    If you want to use the AOP and use the DSP with the HWA, you can simply use the ISK project and swap out the antenna_geometry.c file with the corresponding file in the AOP project. 

    I had originally thought I could do this (in fact the antenna geometry file is the same, and there is a define of ISK vs AOP that determines which is used)... but is the ISK example actually using the HWA? Or is it using the DSP implementation for each of the Procs?

    I'm looking at the document file `mmwave_sdk_03_06_02_00-LTS/packages/ti/datapath/dpc/dpu/aoaproc/docs/doxygen/html/index.html`.

    It says "Data processing is split in two paths, ping and pong path, one detected object per path. In each path processing is split between HWA and local core and it is divided in four stages: 2D-FFT calculation by HWA, Doppler compensation by local core, 3D-FFT calculation by HWA and XYZ estimation by local core." At the top of the page, it says that it can run on either the R4F or the DSP, which means either could be the "local core", right? And the DSP is faster than the ARM, so it's more performant to run on the DSP?
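
    The split described in that quote can be written out as a small table of stages. This is only a sketch of who does what according to the quoted text. The even/odd assignment of objects to the ping and pong paths is an assumption for illustration, and the sketch does not model overlap between the paths or any data transfers.

```python
# Stage order and engine taken from the quoted aoaproc description.
STAGES = [
    ("2D-FFT", "HWA"),
    ("Doppler compensation", "local core"),
    ("3D-FFT", "HWA"),
    ("XYZ estimation", "local core"),
]


def schedule(num_objects):
    """Return (object index, path, stage, engine) tuples, one object per path.

    Assumption: even-numbered objects go to the ping path, odd to pong.
    """
    plan = []
    for obj in range(num_objects):
        path = "ping" if obj % 2 == 0 else "pong"
        for stage, engine in STAGES:
            plan.append((obj, path, stage, engine))
    return plan


for row in schedule(2):
    print(row)
```

    Here "local core" stands for whichever core hosts the DPU, which is the point the question turns on.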

    But looking at the code in `out_of_the_box_6843_isk_dss`, the string "hwa" doesn't appear anywhere in the code. In the AOP example, it appears dozens of times. So it would seem that the ISK is using the actual DSP implementation of each of the DPC Procs, not the DSP-hosted HW accelerated path described in the document string I quoted above.
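
    A check like the one described above can be reproduced with a short script. This is a minimal sketch: the file extensions searched are an assumption. A text search only shows which sources mention the token, not what executes at runtime.

```python
import os


def count_token(root, token="hwa", exts=(".c", ".h")):
    """Count case-insensitive occurrences of token in source files under root."""
    token = token.lower()
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(exts):
                continue
            path = os.path.join(dirpath, name)
            # errors="ignore" skips bytes that are not valid text.
            with open(path, "r", errors="ignore") as f:
                total += f.read().lower().count(token)
    return total


# Usage (the directory name is the one mentioned in this thread):
# print(count_token("out_of_the_box_6843_isk_dss"))
```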

    I thought it would be easy to experiment and profile performance of HWA vs DSP, but I guess a good question now is whether I'm even on the right path? I want maximum performance from CFAR and AOA calculations. I want to process the most possible points in range and doppler space without locking up, compress the coordinates, and stream them over UART. I don't care about power. I only care about processing the most points I can before the next frame interrupt. Is DSP + HWA even the right way? Or is pure DSP faster?

  • Hello.

    Let me look into this and provide an update by the end of the day tomorrow.

    Sincerely,

    Santosh

  • But looking at the code in `out_of_the_box_6843_isk_dss`, the string "hwa" doesn't appear anywhere in the code.

    This is because this portion of the demo is run on the DSP.

    In the AOP example, it appears dozens of times

    As I stated earlier, the AOP project only utilizes the HWA. 

    I thought it would be easy to experiment and profile performance of HWA vs DSP, but I guess a good question now is whether I'm even on the right path? I want maximum performance from CFAR and AOA calculations. I want to process the most possible points in range and doppler space without locking up, compress the coordinates, and stream them over UART. I don't care about power. I only care about processing the most points I can before the next frame interrupt. Is DSP + HWA even the right way? Or is pure DSP faster?

    Looking through the source code, the ISK project is indeed using only the DSP for all of the signal processing, and the MSS serves primarily to configure the device and send data back to the host machine via UART.  DSP is much faster, so all the signal processing is done there.

    Sincerely,

    Santosh

  • Thank you!

    DSP is much faster, so all the signal processing is done there.

    To be clear, you're saying that the DSP alone is faster than DSP+HWA?

  • To be clear, you're saying that the DSP alone is faster than DSP+HWA?

    I was comparing the DSP to the HWA standalone.  That said, the processing chain is already parallelized across chirps: the range FFT runs as soon as the device has received the raw ADC data for a chirp.  For the remaining stages of the signal processing, hopping between the DSP and the HWA would cost time transferring data between the two, on top of the HWA being slower.  Therefore, it would be faster to just do everything on the DSP.

    Sincerely,

    Santosh
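
    The reasoning in the reply above can be expressed as a toy timing model: each change of engine between consecutive stages adds a fixed handoff cost. All numbers below are hypothetical placeholders chosen only to illustrate the argument. They are not measurements from the device, and only profiling on hardware would settle the question.

```python
def chain_time_us(stages, handoff_us):
    """Sum stage times, adding handoff_us whenever consecutive stages change engine."""
    total = 0.0
    previous_engine = None
    for engine, stage_us in stages:
        if previous_engine is not None and engine != previous_engine:
            total += handoff_us
        total += stage_us
        previous_engine = engine
    return total


# Hypothetical per-stage times in microseconds (NOT measured values).
dsp_only = [("DSP", 100.0), ("DSP", 50.0), ("DSP", 100.0), ("DSP", 50.0)]
mixed = [("HWA", 120.0), ("DSP", 50.0), ("HWA", 120.0), ("DSP", 50.0)]

print(chain_time_us(dsp_only, handoff_us=20.0))
print(chain_time_us(mixed, handoff_us=20.0))
```

    In this model the mixed chain loses twice: once from the slower HWA stages and once from the three handoffs. With different stage times the comparison could go the other way.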

  • Thank you! I believe I am starting to understand.

    I think my misunderstanding was around the HWA's performance. I had mistakenly thought it was also clocked at 600 MHz. But looking at block diagrams from several different data sheets for products not containing the DSP, and from what you're saying, it would seem that it's actually clocked at a much lower speed.

    Thank you very much for the insight you've offered!

  • Can you give any insight on the 2D AOA calculation? It seems this is a key part of the AOP demo which is missing from the ISK demo. How does the 2D AOA calculation differ from the standard AOA calculation?

  • Hello Aubrey.

    This question has started to branch from the original question; please create a new post on the E2E forums and we will provide a response as soon as possible.

    Sincerely,

    Santosh