Real time performance counter

Other Parts Discussed in Thread: HALCOGEN

Hi,

I'm working with the RM48HDK and need some kind of automatic counter that is updated at a known, precise frequency (at least 1 MHz) without CPU intervention (so not using an IRQ). The counter will be accessed read-only and used to verify elapsed time.

There is probably more than one way to do this. Which one is the simplest?

Thank you,

Matteo

  • Anyone? Nobody wants to give me some example code or some instructions for using the RTI as a performance/benchmarking counter?

  • The problem could be solved by simply putting this in my code:

       rtiInit();

       rtiStartCounter(rtiCOUNTER_BLOCK0);

    and when I need it:

       uint32 frc = rtiREG1->CNT[0].FRCX;

    This gives me the value of the "free running counter", but nowhere (AFAIK) in HALCoGen is an FRCCLK indicated. At which frequency does this counter run? Is it the "PLL2" or the "RTI1CLK" in HALCoGen?

    Can I obtain its actual frequency from code? Is it written in some register?

    Thank you
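
For the elapsed-time check described in this post, two snapshots of the free running counter can be subtracted directly. The following is a minimal sketch, not code from the thread: the helper names and the `frc_hz` parameter are hypothetical, and the counter frequency must be supplied by the caller because it depends on the RTI clock configuration.

```c
#include <stdint.h>

/* Ticks elapsed between two snapshots of a free-running 32-bit up counter.
 * Unsigned subtraction is modulo 2^32, so a single wraparound between the
 * two reads is handled without any special case. */
static uint32_t frc_elapsed_ticks(uint32_t start, uint32_t now)
{
    return now - start;
}

/* Convert a tick count to microseconds.
 * frc_hz is the counter frequency in Hz (hypothetical parameter; it is an
 * assumption that the caller knows this value). The multiplication is done
 * in 64 bits so it cannot overflow for any 32-bit tick count. */
static uint64_t frc_ticks_to_us(uint32_t ticks, uint32_t frc_hz)
{
    return ((uint64_t)ticks * 1000000u) / frc_hz;
}
```

On the target, `start` and `now` would be two reads of `rtiREG1->CNT[0].FRCX` as shown above. The result is only valid if fewer than 2^32 ticks pass between the two reads.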

  • Hello Matteo,

    My apologies for the delay in responding to your posts. Generally, we recommend using the PMU within the CPU for performance monitoring/measurement. However, it is possible to use the RTI in a similar manner if you account for the cycles needed to read the counters.

    The free running counter within the RTI will be clocked by the RTICLK and the frequency will be dependent on which clock source you select as the input to the RTICLK. Generally, the RTI SRC will be either OSCIN, HCLK, or VCLK.

    I have also copied an expert on the device you are using in case he may have any additional insight or comments to make on your questions.

  • Hi Chuck,

    thank you for the reply. So to obtain the RTICLK frequency I must check which source is selected and then get the frequency of that source. Right?

    I could also use the PMU... if I knew what it is!

    I've found it referenced only in the datasheet, with nothing in the TRM; perhaps I'm looking in the wrong place.

    Can you give me some details about it? How is it supposed to be used?

    Matteo

  • Hello Matteo,

    You can find out what the RTICLK frequency is by checking the RCLKSRC register located at offset 50h within the system module frame. This tells you which clock source drives the RTICLK and whether a divider is applied to that source.

    Also, you asked earlier which register to read for the free running counter. I believe this is the RTIFRC0 located in the RTI register space.

    In regard to the PMU, this module is located within the core and is ARM IP. As such, it is described in the Cortex-R4 TRM available through the ARM website. There are also additional posts about the PMU on the forum that provide useful information; search for PMU to find them.
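
The RCLKSRC check described in this reply can be sketched as a small decoder. This is an illustration under stated assumptions, not verified register documentation: it assumes the source select sits in bits 3:0 and a two-bit divider field in bits 9:8 with divide values 1, 2, 4 and 8. Check both against the TRM for your device. The type and function names are hypothetical.

```c
#include <stdint.h>

/* ASSUMED RCLKSRC field layout (verify against the device TRM):
 *   bits 3:0  source select index
 *   bits 9:8  divider code, divide value = 1 << code (1, 2, 4, 8) */
typedef struct {
    uint32_t source_index; /* which clock source feeds the RTI clock */
    uint32_t divider;      /* divide value applied to that source */
} rti_clk_cfg_t;

/* Split a raw RCLKSRC value into source index and divide value. */
static rti_clk_cfg_t rti_decode_rclksrc(uint32_t rclksrc)
{
    rti_clk_cfg_t cfg;
    cfg.source_index = rclksrc & 0xFu;
    cfg.divider = 1u << ((rclksrc >> 8) & 0x3u);
    return cfg;
}

/* RTICLK in Hz, given the raw register value and the frequency of the
 * selected source. source_hz must be supplied by the caller. */
static uint32_t rti_clk_hz(uint32_t rclksrc, uint32_t source_hz)
{
    return source_hz / rti_decode_rclksrc(rclksrc).divider;
}
```

The decoder takes the raw register value as an argument so it can be checked on a host. On the target, the value would come from a read of the register at offset 50h in the system module frame.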

  • Matteo,

    The PMU is supported in HALCoGen. There is a tab "R4-MPU-PMU".
    The PMU has 3 counters that can be set up to count specific CPU events, such as:
    total cycle count,
    number of instructions executed,
    number of exceptions...

    I will try to prepare an example, but I'm busy for the next 3 days.
     

  • Hello Matteo,

    Here is an example of using the PMU. The code is generated by HALCoGen. The three event counters are used to count the number of cycles, the number of instructions executed, and the number of reads.

    3755.PMU_example.zip

    regards,

    Charles

    I forgot to say that you can refer to sys_pmu.h, where all the available PMU events are enumerated in the pmuEvent type. I only use three of them in the example. You can try other events if you want. Note that some events may have no effect, such as counting cache misses, because no cache is implemented.

    regards,

    Charles

  • Hi all,

    thank you for the answers.

    Well, the PMU seems more practical than the RTI, but I don't understand which counter to use to get a free-running count at a known frequency. The cycle count does not seem to be time-regular.

    Thank you

  • Matteo,

    In the example sent by Charles, g_pmuCounter0 is used to count the total number of cycles.
    This is the one you want to use.
    The cycle count is not always constant because of the Branch Prediction Unit.
    When you reset the CPU, the Branch Prediction Unit is cleared and all entries in its table are wiped out.
    Once the CPU executes code and branches, it will "learn" your code. So it is normal to see differences from run to run, especially if you monitor a function that is called many times from a loop.
    Run after run, the cycle count for this specific function should improve and reach a plateau.

    I suggest referring to the ARM Cortex-R4 TRM to better understand the PMU and the Branch Prediction Unit.

     

  • So the PMU cannot really be used to measure time as a constant-frequency counter. Right?

    If I understand correctly, the RTI is then the way to go.

  • Matteo,

    What I tried to explain in my previous post is how the CPU behaves.
    The PMU counts the number of cycles the CPU uses to execute a given task.
    In for-loop style code, the CPU execution time may vary from iteration to iteration.

    Here is an extract from the Cortex-R4 TRM concerning branch prediction.

    5.2 Branch prediction


    The PFU normally fetches instructions from sequential addresses. If a branch
    instruction is fetched, the next instruction to be fetched can only be determined with
    certainty after the instruction has completed execution at the end of the pipeline in the
    DPU. If the branch is taken, the next instruction to be executed is not sequential. The
    sequential instructions that the PFU has fetched while the branch instruction was
    executing must be flushed from the pipeline and the correct instruction fetched. This has
    the effect of reducing the performance of the processor.
    The PFU can detect branches in the Pd-stage of the pipeline, predict whether or not the
    branch is taken, and determine or predict the target address for a taken branch. This
    enables the PFU to start fetching instructions at the destination of a taken branch before
    the branch has completed execution in the DPU. The branch instruction is still executed
    in the DPU to determine the accuracy of the prediction. If the branch was mispredicted,
    the pipeline must be flushed and the correct instruction fetched. In general, more
    branches are correctly predicted than mispredicted so fewer pipeline flushes occur and
    the performance of the processor is enhanced.
    Two major classes of branch are addressed in the processor prediction scheme:
    1. Direct branches, including B, BL, CZB, and BLX immediate, where the target address
    is a fixed offset, encoded in the instruction, from the program counter. If such an
    instruction has been fetched, and the program counter is known, predicting the
    destination of the branch only involves predicting whether the instruction passes
    or fails its condition code, that is, whether the branch is taken or not taken.
    2. Indirect branches such as load and Branch and eXchange (BX), instructions which
    write to the PC, that can be identified as a likely return from a procedure call. Two
    identifiable cases are:
    • loads to the PC from an address derived from R13
    • BX from R0-R14.
    In these cases, if the calling operation can also be identified, the likely return
    address can be stored in the return stack. Typical calling operations are BL and BLX
    instructions.

    There is a lot to read in the TRM. Do you have access to the ARM Cortex-R4 TRM?

  • Yes, I have downloaded and read (although not completely understood) the Cortex R4 TRM.

    My last question was: the PMU cycle count is meant for performance measurement but cannot be used as a time base, correct? The answer seems to be "yes"... or is there some other mode in the PMU for measuring time?

    Otherwise, the RTI is the way to go to measure time. Right?

  • Matteo,

    The PMU counts events (many kinds, as you can see in the TRM).
    CPU cycles are viewed as an event.
    It is possible to configure the PMU to count CPU cycles. The functionality is similar to a chronograph.
    You reset the timer, start it, read it at different times, and eventually stop the counter.

    If what you need is an event (interrupt) at a specific rate, then the RTI is the solution.
    The RTI is able to generate 4 independent interrupts with different periods.
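
For the chronograph-style use described in this reply, a cycle count turns into a time only once it is divided by the CPU clock frequency. The following is a minimal sketch, assuming the CPU clock is constant and known while the counter runs. The function name and the `cpu_hz` and `div64` parameters are hypothetical. `div64` models the ARM PMU option that makes the cycle counter increment once every 64 CPU cycles.

```c
#include <stdint.h>

/* Convert a PMU cycle-counter reading to nanoseconds.
 * count  : difference between two cycle-counter reads
 * cpu_hz : CPU clock in Hz (assumed constant and known; must be nonzero)
 * div64  : nonzero if the counter was configured to count every 64th cycle
 * The conversion is split into whole seconds plus remainder so the
 * intermediate products stay within 64 bits for realistic CPU clocks. */
static uint64_t pmu_cycles_to_ns(uint32_t count, uint32_t cpu_hz, int div64)
{
    uint64_t cycles = (uint64_t)count * (div64 ? 64u : 1u);
    uint64_t whole_s = cycles / cpu_hz;
    uint64_t rem = cycles % cpu_hz;
    return whole_s * 1000000000ull + (rem * 1000000000ull) / cpu_hz;
}
```

The counter measures the cycles spent between the two reads, so run-to-run differences in a function's cycle count show up as differences in the computed time.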

  • Matteo,

    Looking back at your original post, I believe I understand what you are after now. Correct me if I am wrong, but what you are looking for is something on the order of a tick counter running in the background at a minimum of 1 MHz. The confusion came from your request for a performance monitor, which is the purpose of the PMU, which counts cycles of execution.

    If this is the case, the RTI counter may serve your needs, provided you can set the divider large enough to slow the source clock for the RTICLK down to the counter frequency and granularity you are looking for.
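
The divider choice mentioned in this reply can be worked out ahead of time. The sketch below rests on an assumption about the RTI prescaler that should be confirmed in the device TRM: the free running counter increments at RTICLK / (CPUC + 1), where CPUC is the compare up counter value, and CPUC = 0 is a special case that divides by 2^32. The function names are hypothetical.

```c
#include <stdint.h>

/* ASSUMPTION (verify in the TRM): f_FRC = RTICLK / (CPUC + 1) for CPUC != 0,
 * and CPUC == 0 means divide by 2^32. */

/* Compare value that brings the counter to target_hz or just above it.
 * Returns 0 when no usable value exists (target is zero or the required
 * divide ratio is below 2); 0 is never a useful setting here because of
 * the assumed special case. */
static uint32_t rti_cpuc_for_target(uint32_t rticlk_hz, uint32_t target_hz)
{
    if (target_hz == 0u) {
        return 0u;
    }
    uint32_t ratio = rticlk_hz / target_hz; /* integer divide ratio */
    if (ratio < 2u) {
        return 0u;
    }
    return ratio - 1u;
}

/* Counter frequency in Hz that results from a given compare value. */
static uint32_t rti_frc_hz(uint32_t rticlk_hz, uint32_t cpuc)
{
    if (cpuc == 0u) {
        /* Assumed divide by 2^32: truncates to 0 Hz for a 32-bit rticlk_hz. */
        return (uint32_t)((uint64_t)rticlk_hz >> 32);
    }
    return rticlk_hz / (cpuc + 1u);
}
```

With an 80 MHz RTICLK and a 1 MHz target this gives a compare value of 79. When the ratio is not an integer, the resulting counter runs slightly faster than the target, which still satisfies a "at least 1 MHz" requirement.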