DRA80XMEVM: Performance measurements on the C7x DSP

Part Number: DRA80XMEVM

Hello TI support team,

I was wondering what options are available for measuring performance directly on the DSPs. I'm currently working with a DRA8 EVM board and I'm able to execute code on the C7x/C66x directly. At the moment we have no way to offload work from the CPU, so we can't measure execution times of code on the DSPs through the CPU's wall clock. I know that there are TSC registers on the DSPs to get the cycle count.

Is there also a way to

  1. get an instruction count?
  2. or a wall clock time?

Otherwise, what is the clock rate of the DSPs? In the documentation we found numbers saying the C66x DSPs run at 1 GHz and the C7x runs at 1.7 GHz. Can you confirm these numbers? Are the clock rates fixed, or are they affected by different operating modes, if there are any (e.g. power save, performance mode)?

Kind regards,

Florian

  • Hi Florian,

    This thread was wrongly assigned to me. I just wanted to let you know that I have notified my colleague, who will respond to this question.

    Regards

    Karthik

  • Hi Florian,

    Yes, you can use the 64-bit TSC register on the C7x and the 32-bit TSCL and TSCH registers on the C66x to measure CPU cycles.

    The C7x DSP is running at 1 GHz and the C66x DSP is running at 1.25 GHz.

    Regards,
    Shyam
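A minimal sketch of how a cycle count could be turned into a wall clock time, following the reply above. The helper names `combine_tsc`, `elapsed_cycles` and `cycles_to_ms` are hypothetical and not part of any TI header. The register values are passed in as plain integers so the arithmetic can be checked off-target. The clock rate is a parameter because the thread quotes different figures for it. On a C66x the usual guidance is to read TSCL before TSCH; treat that ordering as an assumption to verify against the CPU reference guide.

```cpp
#include <cassert>
#include <cstdint>

// Combine the two 32-bit halves (TSCH = upper, TSCL = lower) into one 64-bit cycle count.
constexpr std::uint64_t combine_tsc(std::uint32_t high, std::uint32_t low)
{
    return (static_cast<std::uint64_t>(high) << 32) | low;
}

// Elapsed cycles between two readings. Unsigned subtraction stays correct
// even if the counter wrapped around once between the two readings.
constexpr std::uint64_t elapsed_cycles(std::uint64_t start, std::uint64_t end)
{
    return end - start;
}

// Convert a cycle count to milliseconds for a clock rate given in Hz.
inline double cycles_to_ms(std::uint64_t cycles, double clock_hz)
{
    assert(clock_hz > 0.0);
    return static_cast<double>(cycles) * 1000.0 / clock_hz;
}
```

With the rates quoted in the reply, `cycles_to_ms(n, 1.0e9)` would apply to the C7x and `cycles_to_ms(n, 1.25e9)` to the C66x.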

  • For IPC (instructions per cycle) it's tricky, as multiple instructions can be issued in the same cycle.

    For the C66x we can have anywhere from 1 to 8 instructions per cycle (the instruction fetch packet is 256 bits, an opcode is 32 bits).

    For the C7x we can have anywhere from 1 to 13 instructions per cycle (the instruction fetch packet is 512 bits, an opcode is 32 bits).

    Regards,
    Shyam
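The packet arithmetic in the reply above can be sketched as follows. Both function names are hypothetical. The thread does not name a hardware instruction counter, so the instruction count passed to `instructions_per_cycle` is assumed to come from elsewhere, for example counted by hand from the disassembly of the measured loop.

```cpp
#include <cassert>
#include <cstdint>

// Number of fixed-size opcodes that fit into one instruction fetch packet.
constexpr unsigned slots_per_fetch_packet(unsigned packet_bits, unsigned opcode_bits)
{
    return packet_bits / opcode_bits;
}

// Average instructions per cycle over a measured region.
inline double instructions_per_cycle(std::uint64_t instructions, std::uint64_t cycles)
{
    assert(cycles > 0);
    return static_cast<double>(instructions) / static_cast<double>(cycles);
}
```

For the C66x figures, 256 / 32 gives the 8 slots quoted in the reply. For the C7x, 512 / 32 gives 16 slots, so the quoted maximum of 13 instructions per cycle is lower than what the fetch packet alone could hold.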

  • Hi Shyam,

    Thanks for your answer. So doing something like

    #include <c7x.h>
    #include <iostream>

    // TSC ticks per millisecond, i.e. a 1 GHz clock rate
    // (clock rate measured with clock speed program)
    static constexpr float c7x_clockRate = 1000000.0f;

    // static variable to store ticks
    static unsigned long m_ticks = 0;

    // store the current cycle count as the start of the measurement
    void tic_ms() { m_ticks = __TSC; }

    // return the milliseconds elapsed since the last call to tic_ms()
    float toc_ms()
    {
        const float elapsedTicks = static_cast<float>(__TSC - m_ticks);
        const float elapsedMilliseconds = elapsedTicks / c7x_clockRate;
        return elapsedMilliseconds;
    }

    int main()
    {
        tic_ms();
        // do something
        const float time_ms = toc_ms();
        std::cout << "this took " << time_ms << " ms" << std::endl;
        return 0;
    }

    should give me a somewhat reliable time for C7x measurements?

    If not, what would be another way of getting computation times?

    Kind regards,

    Florian
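As a hedged variation on the tic/toc idea above, the counter read can be injected so that the same timing logic can be checked on a host machine before running it on the DSP. The class name `TickTimer` and its members are hypothetical. The sketch keeps the tick difference in 64-bit integers and converts to `double` only at the end, which avoids the rounding that a `float` conversion introduces for long measurements.

```cpp
#include <cstdint>
#include <utility>

// Tic/toc timer whose counter source is any callable returning the current tick count.
template <typename ReadCounter>
class TickTimer {
public:
    TickTimer(ReadCounter read_counter, double ticks_per_ms)
        : read_counter_(std::move(read_counter)), ticks_per_ms_(ticks_per_ms), start_(0) {}

    // Store the current tick count as the start of the measurement.
    void tic() { start_ = static_cast<std::uint64_t>(read_counter_()); }

    // Milliseconds elapsed since the last call to tic().
    double toc_ms()
    {
        const std::uint64_t now = static_cast<std::uint64_t>(read_counter_());
        return static_cast<double>(now - start_) / ticks_per_ms_;
    }

private:
    ReadCounter read_counter_;
    double ticks_per_ms_;
    std::uint64_t start_;
};
```

On the C7x, the callable would presumably be something like a lambda returning `__TSC` with `<c7x.h>` included, and `ticks_per_ms` would be 1.0e6 for a 1 GHz clock. Both are assumptions taken from this thread rather than something verified here.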

  • Hi Florian,

    Apologies for the delay. Yes, this should work perfectly fine.

    Another option is to use BIOS and use a timer. But for bare-metal profiling, what you have is good enough.

    Regards,
    Shyam