Real time performance counter

matteo lucarelli

Other Parts Discussed in Thread: HALCOGEN

Hi,

I'm working with the RM48HDK and need some kind of automatic counter, updated at a known, precise frequency (at least 1 MHz) without CPU intervention (so not using an IRQ). This counter will be accessed read-only and used for elapsed-time verification.

There is probably more than one way to do this. Which one is the simplest?

Thank you,

Matteo

over 11 years ago

0 matteo lucarelli over 11 years ago

Genius 3265 points

Anyone? Nobody wants to give me some example code or some instructions for using the RTI as a performance/benchmarking counter?

0 matteo lucarelli over 11 years ago in reply to matteo lucarelli

Genius 3265 points

The problem could be solved by simply putting this in my code:

rtiInit();

rtiStartCounter(rtiCOUNTER_BLOCK0);

and when I need it:

uint32 frc = rtiREG1->CNT[0].FRCx;

This gives me the value of the "free running counter", but as far as I know HalCoGen does not show an FRCCLK anywhere. At what frequency does this counter run? Is it the "PLL2" or the "RTI1CLK" shown in HalCoGen?

Can I obtain its actual frequency from within the code? Is it stored in some register?

Thank you
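A minimal sketch of how two reads of the free running counter could be turned into an elapsed time. It assumes the counter frequency `frc_hz` is already known (how to determine it is discussed later in the thread); the helper names are hypothetical, not HalCoGen functions. The timestamps are passed in as plain values so the arithmetic can be checked on a host.

```c
#include <assert.h>
#include <stdint.h>

/* Ticks elapsed between two reads of a 32-bit up-counter.
 * Unsigned subtraction is modulo 2^32, so the result stays correct
 * even if the counter wrapped once between the two reads. */
static uint32_t frc_elapsed_ticks(uint32_t start, uint32_t now)
{
    return now - start;
}

/* Convert a tick count to microseconds for a counter running at frc_hz.
 * The 64-bit intermediate avoids overflowing ticks * 1000000. */
static uint64_t frc_ticks_to_us(uint32_t ticks, uint32_t frc_hz)
{
    assert(frc_hz != 0u);
    return ((uint64_t)ticks * 1000000u) / frc_hz;
}
```

On the target, `start` and `now` would be two reads of the counter taken before and after the code under test. The result is only valid if less than one full counter period passes between the reads (2^32 ticks, about 71.6 minutes at 1 MHz).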

0 Chuck Davenport over 11 years ago in reply to matteo lucarelli

TI__Guru 59540 points

Hello Matteo,

My apologies for the delays in responding to your posts. Generally, we recommend use of the PMU within the CPU for performance monitoring/measurement. However, it is possible to utilize the RTI in a similar manner if you account for the cycles spent reading the counters.
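The cost of reading the counters can be estimated by reading the counter twice back-to-back: the difference is roughly the cost of one read, which can then be subtracted from each measurement. A minimal sketch under that assumption; `read_counter_fn` is a hypothetical hook standing in for whatever read is used (an RTI free running counter read or a PMU cycle-count read), and the fake counter exists only so the logic can be exercised on a host.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t (*read_counter_fn)(void);

/* Estimate the cost of one counter read: take several back-to-back
 * pairs and keep the smallest difference (least disturbed by interrupts). */
static uint32_t calibrate_read_overhead(read_counter_fn read_fn, unsigned pairs)
{
    assert(pairs > 0u);
    uint32_t best = UINT32_MAX;
    for (unsigned i = 0; i < pairs; ++i) {
        uint32_t a = read_fn();
        uint32_t b = read_fn();
        uint32_t d = b - a;              /* modulo 2^32, wrap-safe */
        if (d < best) best = d;
    }
    return best;
}

/* Elapsed ticks minus the calibrated read overhead, clamped at zero. */
static uint32_t corrected_ticks(uint32_t start, uint32_t end, uint32_t overhead)
{
    uint32_t raw = end - start;
    return raw > overhead ? raw - overhead : 0u;
}

/* Host-only fake: every read advances the "counter" by 7 ticks. */
static uint32_t fake_now;
static uint32_t fake_read(void) { fake_now += 7u; return fake_now; }
```

Taking the minimum over several pairs is a design choice: a single pair can be inflated by an interrupt landing between the two reads.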

The free running counter within the RTI will be clocked by the RTICLK and the frequency will be dependent on which clock source you select as the input to the RTICLK. Generally, the RTI SRC will be either OSCIN, HCLK, or VCLK.
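One detail worth adding: the free running counter does not count RTICLK edges directly. The up counter RTIUC0 is clocked by RTICLK, and RTIFRC0 increments each time RTIUC0 reaches the value in the compare up counter RTICPUC0, so f_FRC = f_RTICLK / (RTICPUC0 + 1). A minimal sketch of that arithmetic; the function name is hypothetical, and the special case for RTICPUC0 = 0 (a divide by 2^32 + 1) is our reading of the TRM and should be checked there.

```c
#include <assert.h>
#include <stdint.h>

/* Free running counter rate for a given RTICLK and RTICPUCx prescale value.
 * Assumption (verify in the TRM): RTICPUCx == 0 means divide by 2^32 + 1. */
static double rti_frc_hz(double rticlk_hz, uint32_t cpuc)
{
    assert(rticlk_hz > 0.0);
    double divider = (cpuc == 0u) ? 4294967297.0 : (double)cpuc + 1.0;
    return rticlk_hz / divider;
}
```

For example, an 80 MHz RTICLK with RTICPUC0 = 79 gives a 1 MHz free running counter, that is, 1 µs per tick.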

I have also copied an expert on the device you are using in case he may have any additional insight or comments to make on your questions.

0 matteo lucarelli over 11 years ago in reply to Chuck Davenport

Genius 3265 points

Hi Chuck,

Thank you for the reply. So, to obtain the RTICLK frequency, I must check which source is selected and then get the frequency of that source, right?

I could also use the PMU... if I knew what it is!

I've only found it referenced in the datasheet and found nothing in the TRM; perhaps I'm looking in the wrong place.

Can you give me some details about it? How is it supposed to be used?

Matteo

0 Chuck Davenport over 11 years ago in reply to matteo lucarelli

TI__Guru 59540 points

Hello Matteo,

You can find out the RTICLK frequency by checking the RCLKSRC register located at offset 50h within the system module frame. It tells you which clock source drives RTICLK and whether a divider is applied to that source.
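A minimal sketch of decoding a raw RCLKSRC value. The field layout used here, RTI1SRC in bits 3:0 and RTI1DIV in bits 9:8 with a divide-by-1/2/4/8 encoding, is an assumption recalled from the RM48 TRM and must be checked against the register description there. Mapping an RTI1SRC code to an actual clock and its frequency depends on the device and its configuration, so the caller supplies that frequency. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED field layout of RCLKSRC (verify in the RM48 TRM). */
#define RCLKSRC_RTI1SRC_MASK  0x0000000Fu  /* bits 3:0, clock source select */
#define RCLKSRC_RTI1DIV_SHIFT 8u           /* bits 9:8, RTI1 divider        */
#define RCLKSRC_RTI1DIV_MASK  0x3u

typedef struct {
    uint32_t source;   /* raw RTI1SRC code       */
    uint32_t divider;  /* 1, 2, 4 or 8 (assumed) */
} rti_clk_cfg;

static rti_clk_cfg decode_rclksrc(uint32_t rclksrc)
{
    rti_clk_cfg cfg;
    cfg.source  = rclksrc & RCLKSRC_RTI1SRC_MASK;
    cfg.divider = 1u << ((rclksrc >> RCLKSRC_RTI1DIV_SHIFT) & RCLKSRC_RTI1DIV_MASK);
    return cfg;
}

/* RTICLK = frequency of the selected source / divider. */
static uint32_t rticlk_hz(uint32_t source_hz, rti_clk_cfg cfg)
{
    assert(cfg.divider != 0u);
    return source_hz / cfg.divider;
}
```

On the target, the raw value would be read from offset 50h of the system module frame; in HalCoGen-generated code this is likely exposed as a `RCLKSRC` member of the system register structure, but check the generated headers for the exact name.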

Also, you asked earlier which register to read for the free running counter. I believe this is the RTIFRC0 located in the RTI register space.

In regard to the PMU, this module is located within the core and is ARM IP. As such, it is described in the Cortex-R4 TRM available from the ARM website. There are also other posts about the PMU on this forum; searching for "PMU" will turn up useful information.

0 Jean-Marc Mifsud over 11 years ago in reply to Chuck Davenport

TI__Mastermind 22375 points

Matteo,

The PMU is supported in HalCoGen; there is an "R4-MPU-PMU" tab.
The PMU has 3 counters that can be set up to count specific CPU events, such as:
total cycle count,
number of instructions executed,
number of exceptions, and so on.

I will try to prepare an example, but I'm busy the next 3 days.

0 Charles Tsai over 11 years ago in reply to Jean-Marc Mifsud

TI__Guru**** 190456 points

Hello Matteo,

Here is one example of using the PMU. The code is generated by HalCoGen. The three event counters are used to count the number of cycles, the number of instructions executed, and the number of reads.

3755.PMU_example.zip

regards,

Charles

0 Charles Tsai over 11 years ago in reply to Charles Tsai

TI__Guru**** 190456 points

I forgot to say that you can refer to sys_pmu.h, which lists all the available PMU events. All the events are enumerated in the pmuEvent type. I only use three of them in the example; you can try other events if you want. Note that some events have no effect; for example, counting cache misses is meaningless because no cache is implemented.

regards,

Charles

0 matteo lucarelli over 11 years ago in reply to Charles Tsai

Genius 3265 points

Hi all,

thank you for the answers.

Well, the PMU seems more practical than the RTI, but I don't understand which counter to use to get a free running count at a known frequency. The cycle count does not seem to be regular in time.

Thank you

0 Jean-Marc Mifsud over 11 years ago in reply to matteo lucarelli

TI__Mastermind 22375 points

Matteo,

In the example sent by Charles, g_pmuCounter0 is used to count the total number of cycles.
This is the one you want to use.
The cycle count is not always constant because of the Branch Prediction Unit.
When you reset the CPU, the Branch Prediction Unit is cleared and all entries in its table are wiped out.
Once the CPU executes code and branches, it "learns" your code. So it is normal to see differences from run to run, especially if you monitor a function that is called many times from a loop.
Run after run, the cycle count for this specific function should improve and reach a plateau.
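Because the first runs include branch-predictor warm-up, one common approach is to repeat the measurement, discard the first few results, and look at the minimum once the count has settled. A minimal sketch; `measure_fn` is a hypothetical hook returning the cycle count of one run of the function under test (on the target it could wrap the cycle-count reads from Charles's example), and the fake below only simulates a cost that drops to a plateau.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t (*measure_fn)(void);

/* Run the measurement `runs` times, ignore the first `warmup` results,
 * and return the smallest remaining cycle count. */
static uint32_t min_cycles(measure_fn measure, unsigned warmup, unsigned runs)
{
    assert(runs > warmup);
    uint32_t best = UINT32_MAX;
    for (unsigned i = 0; i < runs; ++i) {
        uint32_t c = measure();
        if (i >= warmup && c < best) best = c;
    }
    return best;
}

/* Host-only fake: cost starts at 140, drops by 10 per run, plateaus at 100. */
static uint32_t fake_cost = 150u;
static uint32_t fake_measure(void)
{
    if (fake_cost > 100u) fake_cost -= 10u;
    return fake_cost;
}
```

Reporting the minimum gives the "learned" best case; if worst-case timing matters instead, the maximum over all runs (including the first, cold one) is the more relevant figure.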

I suggest referring to the ARM Cortex-R4 TRM to better understand the PMU and the Branch Prediction Unit.

0 matteo lucarelli over 11 years ago in reply to Jean-Marc Mifsud

Genius 3265 points

So the PMU cannot really be used to measure time as a constant-frequency counter, right?

If I understand correctly, the RTI is then the way to go.

0 Jean-Marc Mifsud over 11 years ago in reply to matteo lucarelli

TI__Mastermind 22375 points

Matteo,

What I've tried to explain in my previous post is the way the CPU behaves.
The PMU counts the number of cycles the CPU uses to execute a given task.
On a for-loop kind of code, the CPU execution time may vary from loop to loop.

Here is an extract from the Cortex-R4 TRM concerning branch prediction.

5.2 Branch prediction

The PFU normally fetches instructions from sequential addresses. If a branch instruction is fetched, the next instruction to be fetched can only be determined with certainty after the instruction has completed execution at the end of the pipeline in the DPU. If the branch is taken, the next instruction to be executed is not sequential. The sequential instructions that the PFU has fetched while the branch instruction was executing must be flushed from the pipeline and the correct instruction fetched. This has the effect of reducing the performance of the processor.

The PFU can detect branches in the Pd-stage of the pipeline, predict whether or not the branch is taken, and determine or predict the target address for a taken branch. This enables the PFU to start fetching instructions at the destination of a taken branch before the branch has completed execution in the DPU. The branch instruction is still executed in the DPU to determine the accuracy of the prediction. If the branch was mispredicted, the pipeline must be flushed and the correct instruction fetched. In general, more branches are correctly predicted than mispredicted so fewer pipeline flushes occur and the performance of the processor is enhanced.

Two major classes of branch are addressed in the processor prediction scheme:
1. Direct branches, including B, BL, CZB, and BLX immediate, where the target address is a fixed offset, encoded in the instruction, from the program counter. If such an instruction has been fetched, and the program counter is known, predicting the destination of the branch only involves predicting whether the instruction passes or fails its condition code, that is, whether the branch is taken or not taken.
2. Indirect branches such as load and Branch and eXchange (BX), instructions which write to the PC, that can be identified as a likely return from a procedure call. Two identifiable cases are:
• loads to the PC from an address derived from R13
• BX from R0-R14.
In these cases, if the calling operation can also be identified, the likely return address can be stored in the return stack. Typical calling operations are BL and BLX instructions.

There is a lot to read in the TRM. Do you have access to the ARM Cortex-R4 TRM?

0 matteo lucarelli over 11 years ago in reply to Jean-Marc Mifsud

Genius 3265 points

Yes, I have downloaded and read the Cortex-R4 TRM (although I have not completely understood it).

My last question was: is the PMU cycle count meant for performance measurement only, so that it cannot be used as a time base? The answer seems to be "yes"... or is there some other PMU mode for measuring time?

Otherwise, the RTI is the way to go for measuring time, right?

0 Jean-Marc Mifsud over 11 years ago in reply to matteo lucarelli

TI__Mastermind 22375 points

Matteo,

The PMU counts events (many kinds, as you can see in the TRM), and CPU cycles are treated as one such event.
It is possible to configure the PMU to count CPU cycles. It then works like a chronograph:
you reset the counter, start it, read it at different times, and eventually stop it.
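The chronograph usage described above (reset, start, read, stop) can be sketched independently of the counter source. This is a minimal sketch, not the HalCoGen PMU API: `read_fn` is a hypothetical hook for the raw 32-bit counter and `hz` is its known frequency. If the source is the PMU cycle counter, `hz` would be the CPU clock frequency, assuming the cycle counter's divide-by-64 option is not enabled. The fake counter is for host testing only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t (*read_counter_fn)(void);

typedef struct {
    read_counter_fn read_fn;  /* raw 32-bit counter read           */
    uint32_t hz;              /* counter frequency                 */
    uint64_t accumulated;     /* ticks from finished intervals     */
    uint32_t started_at;      /* counter value at the last start   */
    bool running;
} chrono;

static void chrono_reset(chrono *c) { c->accumulated = 0u; c->running = false; }

static void chrono_start(chrono *c)
{
    if (!c->running) { c->started_at = c->read_fn(); c->running = true; }
}

static void chrono_stop(chrono *c)
{
    if (c->running) {
        c->accumulated += (uint32_t)(c->read_fn() - c->started_at); /* wrap-safe */
        c->running = false;
    }
}

/* Total ticks so far, including the interval in progress, if any. */
static uint64_t chrono_ticks(const chrono *c)
{
    uint64_t t = c->accumulated;
    if (c->running) t += (uint32_t)(c->read_fn() - c->started_at);
    return t;
}

static uint64_t chrono_us(const chrono *c)
{
    assert(c->hz != 0u);
    return chrono_ticks(c) * 1000000u / c->hz;
}

/* Host-only fake counter, advanced manually by the test. */
static uint32_t fake_ticks;
static uint32_t fake_read(void) { return fake_ticks; }
```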

If what you need is an event (interrupt) at a specific rate, the RTI is the solution.
The RTI is able to generate 4 independent interrupts with different periods.
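Each RTI compare unit raises its interrupt when the selected counter block's free running counter matches its compare register, after which the update compare value is added so that the next match comes one period later. The value to program is therefore the period expressed in free running counter ticks. A minimal sketch of that conversion; the function name is hypothetical and rounding to the nearest tick is our choice.

```c
#include <assert.h>
#include <stdint.h>

/* Period in free running counter ticks, rounded to the nearest tick.
 * Returns 0 if the period is shorter than half a tick. */
static uint32_t rti_period_ticks(uint32_t frc_hz, uint32_t period_us)
{
    uint64_t ticks = ((uint64_t)frc_hz * period_us + 500000u) / 1000000u;
    assert(ticks <= UINT32_MAX);
    return (uint32_t)ticks;
}
```

With a 1 MHz counter, a 1 ms period is 1000 ticks; the same value would serve both as the initial compare value and as the update compare value.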

0 Chuck Davenport over 11 years ago in reply to Jean-Marc Mifsud

TI__Guru 59540 points

Matteo,

Looking back at your original post, I believe I now understand what you are after. Correct me if I am wrong, but you are looking for something like a tick counter running in the background at a minimum of 1 MHz. The confusion came from your request for a performance monitor, which is the purpose of the PMU: it counts cycles of execution.

If this is the case, the RTI counter may serve your needs, provided you can set the divider so that the source clock for RTICLK is slowed down to an acceptable counter frequency and granularity.
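Working backwards from the requirement in the original post (at least 1 MHz): since the free running counter runs at f_RTICLK / (RTICPUC0 + 1), the largest RTICPUC0 that still meets the target is floor(f_RTICLK / target) - 1, and the resulting tick period is the granularity. A minimal sketch with hypothetical names. It refuses a divider of 1 because RTICPUC0 = 0 does not mean divide-by-1 (per our reading of the TRM it selects a divide by 2^32 + 1; verify there), so it returns false when RTICLK is less than twice the target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Pick the largest prescale value RTICPUC0 (>= 1) such that
 * RTICLK / (RTICPUC0 + 1) is still at least min_frc_hz. */
static bool rti_pick_cpuc(uint32_t rticlk_hz, uint32_t min_frc_hz,
                          uint32_t *cpuc, double *frc_hz)
{
    assert(min_frc_hz != 0u);
    uint32_t divider = rticlk_hz / min_frc_hz;  /* floor */
    if (divider < 2u) return false;             /* RTICPUC0 = 0 is not /1 */
    *cpuc = divider - 1u;
    *frc_hz = (double)rticlk_hz / divider;
    return true;
}
```

For example, an 80 MHz RTICLK with a 1 MHz target gives RTICPUC0 = 79 and exactly 1 µs per tick, while a 100 MHz RTICLK with a 3 MHz target gives RTICPUC0 = 32 and about 3.03 MHz.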
