Other Parts Discussed in Thread: TDA4VH
We are trying to replicate the cycle counts reported in the performance section of the DSPLIB user guide for vector operations on C7x, but our results do not match. Our questions are at the end. Below are the details of our setup.
- Using J784S4 RTOS SDK 10.01.00.04 on Ubuntu 22.04.1 host
- Have followed the DSPLIB build instructions at this link: https://software-dl.ti.com/jacinto7/esd/processor-sdk-rtos-j784s4/10_01_00_04/exports/docs/dsplib/docs/user_guide/build_instructions_linux.html
- Have followed the CCS baremetal instructions at this link: https://software-dl.ti.com/jacinto7/esd/processor-sdk-rtos-j784s4/latest/exports/docs/psdk_rtos/docs/user_guide/ccs_setup_j784s4.html#debugging-without-hlos-running-on-a72-rtos-only-baremetal
- Using a new J784S4XG01EVM rev PROC141E5(001)
We read the TSC register around the kernel call in the DSPLIB_add example (dsplib/examples/DSPLIB_add/DSPLIB_add_examples.cpp), as shown in the code snippet below, and measured the following.
The results do not match those published in the DSPLIB user guide here: https://software-dl.ti.com/jacinto7/esd/processor-sdk-rtos-j784s4/10_01_00_04/exports/docs/dsplib/docs/user_guide/performance_summary.html#DSPLIB_grouped
Now a couple of questions.
- C71x_0 always runs faster than the other DSP cores (C71x_1/2/3). This is not expected, is it? If not, is it an artifact of the launch.js script?
- DSPLIB_add is a simple example in that the input size is only 14, so I expected the EVM cycle count to be ~100 per the DSPLIB performance summary. However, the measured count is roughly 3 times larger. Can you please help me figure out what changes are needed in DSPLIB_add_examples.cpp or in the build command to achieve the cycle counts in the performance summary?
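To make the second question concrete, here is a minimal sketch of how a first-call count could be separated from a warm-cache count. It is not the code from DSPLIB_add_examples.cpp. The names `read_cycles`, `CycleStats` and `profile_kernel` are hypothetical helpers introduced only for illustration. It assumes `__TSC` is readable through `c7x.h` when building with the TI C7000 compiler, and it falls back to a host clock (nanoseconds, not cycles) so that it also compiles off-target. We do not know whether the published numbers were taken this way.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <limits>

#if defined(__C7000__)
#include <c7x.h>  // assumed to expose the __TSC control register on C7x builds
#endif

// Hypothetical helper: read a free-running counter.
static inline std::uint64_t read_cycles() {
#if defined(__C7000__)
    return static_cast<std::uint64_t>(__TSC);  // CPU cycles on the C7x core
#else
    // Host fallback so the sketch builds off-target: nanoseconds, NOT cycles.
    using clock = std::chrono::steady_clock;
    return static_cast<std::uint64_t>(
        std::chrono::duration_cast<std::chrono::nanoseconds>(
            clock::now().time_since_epoch()).count());
#endif
}

// Hypothetical result type for one profiling session.
struct CycleStats {
    std::uint64_t first_call;     // first invocation (caches likely cold)
    std::uint64_t best_warm;      // minimum over the repeated invocations
    std::uint64_t read_overhead;  // cost of two back-to-back counter reads
};

// Calls `kernel` once, then `warm_runs` more times, timing each call.
template <typename Fn>
CycleStats profile_kernel(Fn&& kernel, int warm_runs) {
    assert(warm_runs > 0);

    // Estimate the cost of the measurement itself.
    std::uint64_t t0 = read_cycles();
    std::uint64_t t1 = read_cycles();
    const std::uint64_t overhead = t1 - t0;

    // First call: includes any cache misses on code and data.
    t0 = read_cycles();
    kernel();
    t1 = read_cycles();
    const std::uint64_t first = t1 - t0;

    // Repeated calls: keep the smallest observed count.
    std::uint64_t best = std::numeric_limits<std::uint64_t>::max();
    for (int i = 0; i < warm_runs; ++i) {
        t0 = read_cycles();
        kernel();
        t1 = read_cycles();
        const std::uint64_t elapsed = t1 - t0;
        if (elapsed < best) {
            best = elapsed;
        }
    }
    return {first, best, overhead};
}
```

In the example, the lambda passed to `profile_kernel` would wrap only the kernel execution call, with the init call kept outside the timed region. Comparing `first_call` against `best_warm` (each minus `read_overhead`) would show how much of the gap to the published figure comes from cold caches rather than from the build options.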