AM6442: R5F FreeRTOS CPU load measurement grossly incorrect!?

Part Number: AM6442

Hi,

I have a small application mainly consisting of a 10 us period ISR that services an FPGA connected via GPMC. Measured with a 125 MHz ECAP counter, my ISR (the user part) takes more than 1000 of the 1250 cycles of the period, i.e., the CPU load is surely > 80 %.

I also have a 1 ms clock ISR that does

    /* measure cpu load (percentage with 2 decimals, 100 = 1%) */
    clock1ms_cpuload = TaskP_loadGetTotalCpuLoad();
    TaskP_loadResetAll();

cheerfully returning the grossly incorrect clock1ms_cpuload = 865, suggesting a CPU load of 8.65 %.
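
For comparison with the TaskP figure, the ECAP-based measurement can be accumulated over the same 1 ms window. The following is only a sketch: `isr_load_acc_t`, `isr_load_add()`, `isr_load_sample_and_reset()`, `LOAD_SCALE` and `WINDOW_TICKS` are hypothetical names that are not part of the SDK, and the window length assumes the 125 MHz timestamp clock mentioned above.

```c
#include <stdint.h>

#define LOAD_SCALE   10000u   /* assumption: same scaling as above, 100 = 1 % */
#define WINDOW_TICKS 125000u  /* assumption: 1 ms window at 125 MHz */

/* Hypothetical accumulator for ISR busy time within one window. */
typedef struct {
    uint32_t busy_ticks;      /* ISR ticks summed up in the current window */
} isr_load_acc_t;

/* Call at the end of the 10 us ISR with its measured duration in ticks. */
static void isr_load_add(isr_load_acc_t *acc, uint32_t isr_ticks)
{
    acc->busy_ticks += isr_ticks;
}

/* Call from the 1 ms clock ISR: returns the ISR load of the window that
 * just ended (0..LOAD_SCALE) and starts a new window. */
static uint32_t isr_load_sample_and_reset(isr_load_acc_t *acc)
{
    uint32_t load =
        (uint32_t)(((uint64_t)acc->busy_ticks * LOAD_SCALE) / WINDOW_TICKS);

    acc->busy_ticks = 0u;
    return load;
}
```

With 100 ISR invocations of 1000 ticks each per window, this yields 8000, i.e. 80.00 %. The sketch assumes that the 1 ms ISR cannot interrupt the 10 us ISR in the middle of an update; otherwise the accumulator would need protection.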

The CPU does a lot of slow accesses to GPMC, uncached shared memory and other peripherals. Doesn't TaskP_loadGetTotalCpuLoad() take this into account?

Any explanation?

Thanks & regards,

Frank

  • Hello Frank,
    The `TaskP_loadGetTotalCpuLoad()` function measures overall CPU utilization in FreeRTOS-based applications by calculating the percentage of time the CPU is NOT idle.

    Basic formula:

    CPU Load (%) = 100% - (Idle Task Time / Total Time) × 100%

    The function returns a value scaled by `TaskP_LOAD_CPU_LOAD_SCALE` (10000), so:

    **0** = 0.00% CPU usage
    **5000** = 50.00% CPU usage
    **10000** = 100.00% CPU usage
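
The formula and the scaling can be illustrated with a small sketch. It is not the SDK implementation: `cpu_load_scaled()`, `load_to_percent()` and `LOAD_SCALE` are hypothetical names, and `LOAD_SCALE` merely mirrors the value 10000 quoted above for `TaskP_LOAD_CPU_LOAD_SCALE`.

```c
#include <stdint.h>

/* Assumption: mirrors the TaskP_LOAD_CPU_LOAD_SCALE value (10000) quoted above. */
#define LOAD_SCALE 10000u

/* Hypothetical helper: load = scale - (idle / total) * scale. */
static uint32_t cpu_load_scaled(uint64_t idle_time, uint64_t total_time)
{
    if (total_time == 0u) {
        return 0u;               /* nothing measured yet */
    }
    if (idle_time > total_time) {
        idle_time = total_time;  /* clamp so the result stays in 0..LOAD_SCALE */
    }
    return (uint32_t)(LOAD_SCALE - (idle_time * LOAD_SCALE) / total_time);
}

/* Split a scaled load into whole percent and hundredths of a percent. */
static void load_to_percent(uint32_t scaled, uint32_t *whole, uint32_t *hundredths)
{
    *whole = scaled / 100u;
    *hundredths = scaled % 100u;
}
```

Under this formula, the reported value of 865 corresponds to 8.65 %, which means the idle task was credited with about 91 % of the measured time.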
    GPMC is a slow peripheral. I am not sure which memory type you interfaced—is it NAND, NOR, or PSRAM?

    If you still want more speed, then you need to use DMA with GPMC.

    How much throughput are you expecting?

    When the CPU accesses uncached memory, each access takes longer than an access to cached memory.

    What is the use case here for the CPU to read from uncached memory (IPC)?
    Regards,
    Anil.
     
  • Hello Swargam,

    thank you for your reply.

    We use 16-bit AD-muxed NOR to access the FPGA registers; each access takes some 70 - 100 ns. We know how to speed this up: a) optimise the FPGA code, b) use burst accesses.

    We also fixed the cache problem: We now use the r5fss0-1 TCMB instead of MSRAM as shared memory for data exchange between the CPU cores r5fss0-0 and r5fss0-1. The r5fss0-1 core, which communicates with the FPGA, accesses it directly (non-cached @ 0x4101 0000); the other core accesses it cached @ 0x78x00 0000. This speeds things up a lot, mainly for r5fss0-0.

    All this does not explain the IMHO wrong load measurement: We use a free-running ECAP counter (0x0 to 0xFFFF FFFF) as a 125 MHz timestamp source and read it at the beginning and at the end of the ISR. The difference is > 1000 counts, which for an ISR period of 10 us = 1250 ECAP counts means a CPU load of > 1000/1250 = 80 %.

    We also read another counter from the FPGA and get similar results.

    TaskP_loadGetTotalCpuLoad() should return a value > 8000, not 865 or so.
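
The timestamp arithmetic described above can be sketched as follows. `ticks_elapsed()`, `isr_load_scaled()` and `LOAD_SCALE` are hypothetical names, and the two timestamps are assumed to be plain 32-bit reads of the free-running ECAP counter. The sketch only shows how the expected value is derived; it does not settle how `TaskP_loadGetTotalCpuLoad()` accounts for time spent in ISRs.

```c
#include <stdint.h>

#define LOAD_SCALE 10000u  /* assumption: 0.01 % units, as discussed in the thread */

/* Elapsed ticks between two reads of a free-running 32-bit up-counter.
 * Unsigned subtraction is modulo 2^32, so one counter wrap between the
 * two reads is handled without extra code. */
static uint32_t ticks_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Load of one ISR invocation relative to its period, in 0.01 % units. */
static uint32_t isr_load_scaled(uint32_t busy_ticks, uint32_t period_ticks)
{
    if (period_ticks == 0u) {
        return 0u;  /* avoid division by zero */
    }
    return (uint32_t)(((uint64_t)busy_ticks * LOAD_SCALE) / period_ticks);
}
```

For 1000 busy ticks in a 1250-tick period, `isr_load_scaled(1000, 1250)` gives 8000, i.e. the 80 % lower bound stated above.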

    Thanks & regards,

    Frank

  • PS: CycleCounterP_getCount32() cannot be used as a timestamp source because its counter stops when the idle loop executes the wfi instruction.