AWR2944EVM: Maximum number of HWA computations/operations

Part Number: AWR2944EVM
Other Parts Discussed in Thread: AWR2944, MATHLIB

Hello,

I plan to write a resource-intensive program that'll run onboard the AWR2944EVM using the HWA 2.0. Each iteration (per frame, perhaps) computes about 200,000 operations in total, mainly complex arithmetic, but also some FFTs.

1. Is it practical to run every single operation on the HWA or should the cores be used as well? If the latter is true, what is the best way to combine the two?

2. How can I estimate the time it will take for the AWR2944EVM/HWA to complete these operations? Is this quantity feasible at all or is there an upper limit induced by the HWA clock and available memory?

3. Also, I noticed that there are two R5F cores and one DSP core. All the applications I've seen use only two of these three. What happens to the third?

4. On a note related to the program: is it possible for the AWR2944EVM to receive data from external sensors, such as an IMU or GPS?

Thanks,

Aaron

  • Hi Aaron,

    Thanks for reaching out on e2e! Please find my responses on those topics as follows:

    1. Ideally, most of the compute should be offloaded to the HWA. Any operation that cannot be done on the HWA can be done on the DSP/R5F. The best way to combine them is to split the signal chain across the HWA and the cores, with DMA moving data around independently and the load distributed evenly to get the best timings.
    2. For most operations, the HWA works as a streaming engine (i.e. one complex output per cycle). For example, an FFT takes ACNT * BCNT cycles plus an overhead of ACNT cycles. This relation holds for most operations, so ACNT and BCNT dictate the time taken.
    3. The R5F cores run in dual-core lockstep mode. The two cores operate as one and cannot be used independently, since that would break the lockstep, which is essential because the AWR2944 is an automotive safety device.
    4. There are many data transfer interfaces available on the AWR2944 such as CSI2, SPI, I2C etc. which can be used for this purpose. I highly recommend you go through the datasheet to understand what is available and what can be used as per your use case.
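
To illustrate why splitting the chain and balancing the load (point 1) helps, here is a minimal Python sketch of an idealized pipeline model. The stage times and function names are hypothetical, chosen only for illustration; the model assumes every block passes through the same stages in order and ignores bus contention, buffer limits and synchronization cost, so treat it as a rough guide rather than a prediction for the AWR2944.

```python
def serial_time_us(stage_us, n_blocks):
    """Total time when each block runs through all stages before the next starts."""
    return sum(stage_us) * n_blocks


def pipelined_time_us(stage_us, n_blocks):
    """Ideal overlap: the pipeline fills once, then the slowest stage sets the pace."""
    return sum(stage_us) + (n_blocks - 1) * max(stage_us)


# Hypothetical per-block stage times in microseconds: HWA, DMA transfer, DSP.
stages = [4.0, 2.0, 6.0]
print(serial_time_us(stages, 8), pipelined_time_us(stages, 8))  # 96.0 54.0
```

In this model the steady-state cost per block is the slowest stage, which is why an evenly distributed load gives the best timings.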

    Regards,

    Kaushik

  • Hi Kaushik,

    Thanks for the quick response.

    1. I would like to be able to perform the following operations: complex addition/subtraction/multiplication/sqrt, sine and cosine, FFT, and floor. I have seen the sine and cosine, FFT, and complex multiplication operations in the HWA documentation, but not addition/subtraction, sqrt and floor. Will the latter two have to be done on the DSP/R5F using the mathlib library? What is the recommended method of splitting the signal chain and distributing the load and how much slowdown would it cause to transfer data back and forth between the HWA and DSP? Are there any documents/code examples I can look at?

    2. Just to clarify about the ACNT and BCNT. According to my understanding from the swru526b document on the HWA, ACNT refers to the number of samples to be processed while BCNT refers to the number of processing chains (number of RX channels). So if I have 256 ADC samples for each of 4 RX channels, then the FFT would take 256*4+256=1280 clock cycles, which at 300 MHz, translates to 4.27 us. Is this correct?
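
The arithmetic in point 2 can be checked with a short Python sketch. The function names are made up for illustration, and the cycle formula is only the rule of thumb quoted in this thread (ACNT * BCNT plus an ACNT overhead), not a figure taken from the datasheet.

```python
def hwa_cycles(acnt: int, bcnt: int) -> int:
    """Cycle estimate quoted in this thread: ACNT * BCNT plus an ACNT overhead."""
    return acnt * bcnt + acnt


def cycles_to_us(cycles: int, clock_hz: float) -> float:
    """Convert a cycle count to microseconds at the given clock rate."""
    return cycles / clock_hz * 1e6


# Values from the example above: 256 samples, 4 RX channels, 300 MHz clock.
cycles = hwa_cycles(acnt=256, bcnt=4)
print(cycles, round(cycles_to_us(cycles, 300e6), 2))  # 1280 4.27
```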

    Best,

    Aaron

  • Hi Aaron,

    Please find my responses as follows:

    1. As a first option, the operations you mentioned can be performed on the DSP/R5F cores. But there are ways to perform some of them on the HWA as well.
      1. Addition - You can use the channel combination or Stats block for summing a sequence of numbers.
      2. Subtraction - You can reuse the same blocks as above to achieve a subtraction too (use BPM removal or vector multiplication to introduce a sign flip).
      3. Operations like sqrt and floor can be implemented in the DSP.
        • What is the recommended method of splitting the signal chain and distributing the load and how much slowdown would it cause to transfer data back and forth between the HWA and DSP?
        Things to consider are:
        1. Whether the operation can feasibly be implemented on the HWA.
        2. The total time budget available.
        3. The total memory that can be used.
        Based on these inputs, you can come up with a straightforward first approach. You can then optimize based on each CPU's utilization within your time budget and try to achieve more parallelization.
        For a reference, look at the datapath processing chain of the mmWave MCUPlus SDK OOB demo.
    2. Yes. Your understanding is spot on.
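
The subtraction approach in point 1 (a sign flip followed by a sum) can be sketched in plain Python. This models only the arithmetic, a - b = a + (-1) * b; it does not configure the HWA or use any SDK API, and the helper names are hypothetical.

```python
def negate(vec):
    """Sign flip: multiply every element by -1 (the role of the multiplier stage)."""
    return [-x for x in vec]


def combine(a, b):
    """Element-wise sum of two equal-length vectors (the role of the summing stage)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [x + y for x, y in zip(a, b)]


a = [3 + 4j, 1 - 2j]
b = [1 + 1j, 2 + 2j]
diff = combine(a, negate(b))  # same result as a - b, element by element
print(diff)  # [(2+3j), (-1-4j)]
```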

    Regards,

    Kaushik