
AM5726: CPU utilization in PET

Part Number: AM5726

Hi,

I would like to estimate power consumption with the PET, but I am unsure how to set the CPU utilization (%) parameter.
Please tell me how to estimate the utilization of each A15 and C66x core.
Can you provide sample software that could serve as a reference for estimating this utilization?
Knowing how much of the CPU resources our system can use is important for our thermal design and power consumption.

Best regards,
H.U

  • The PET experts have been notified. They will respond here.
  • Hello H.U,

    The Dhrystone benchmark is the highest-power reference application we have for the Cortex-A15 (the runDhrystone command in the Linux SDK).

    There are other DSP software examples in "/usr/share/ti/examples/opencl/", and at least one of them can be modified for taking power measurements. I took the runOclFloatCompute.sh program and modified it to remove any computations done on the ARM. I also increased the test size and made the test repeat long enough to capture power measurements. See the attached archive, float_compute.tar.gz (runs on Processor SDK Linux v4.2).

    We otherwise don't have a way to correlate your specific application to a % number to use in the Power Estimation Tool. 

  • Hi Ahmad_Rashed,

    Thank you for your reply.
    Is my understanding correct that the runDhrystone command runs a single A15 core at almost 100% utilization, and that the DSP software you provided runs both DSP cores at 100% utilization?

    I am using the RTOS SDK. Is there a way to evaluate this in that environment?

    Best regards,
    H.U

  • Hi H.U,

    It's difficult to make an assumption. The Dhrystone test is 100% integer only so you are using 100% of a subsystem in the Cortex-A15. It is the highest power in that sense. But an application that uses both integer and float could have higher or lower power consumption while still at 100% utilization. You would have to run your own application and characterize it. 
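For the "run your own application and characterize it" step on the Linux SDK, one possible starting point is to sample per-core busy time from /proc/stat while the application runs. The sketch below is an illustration, not a TI tool: the function names are hypothetical, and it assumes the standard Linux /proc/stat layout (user, nice, system, idle, iowait, irq, softirq, steal). As the reply above notes, the resulting percentage is time-based and does not say how much power that busy time draws.

```python
import time


def parse_proc_stat(text):
    """Return {core_name: (busy_jiffies, total_jiffies)} for each per-core line."""
    cores = {}
    for line in text.splitlines():
        fields = line.split()
        # Per-core lines are named cpu0, cpu1, ...; skip the aggregate "cpu" line.
        if not fields or not fields[0].startswith("cpu"):
            continue
        if not fields[0][3:].isdigit():
            continue
        # Use user..steal only; guest time is already counted inside user/nice.
        values = [int(v) for v in fields[1:9]]
        idle = values[3] + (values[4] if len(values) > 4 else 0)  # idle + iowait
        total = sum(values)
        cores[fields[0]] = (total - idle, total)
    return cores


def utilization_percent(before, after):
    """Per-core busy percentage between two parse_proc_stat() snapshots."""
    result = {}
    for name, (busy1, total1) in after.items():
        if name not in before:
            continue
        busy0, total0 = before[name]
        delta_total = total1 - total0
        result[name] = 100.0 * (busy1 - busy0) / delta_total if delta_total > 0 else 0.0
    return result


def sample_utilization(interval_s=1.0, path="/proc/stat"):
    """Take two snapshots interval_s apart and return per-core utilization."""
    with open(path) as f:
        before = parse_proc_stat(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_proc_stat(f.read())
    return utilization_percent(before, after)
```

Calling `sample_utilization(1.0)` on the target while the workload is running returns a dictionary such as `{"cpu0": ..., "cpu1": ...}`, one entry per A15 core that Linux exposes.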

    The DSP test is just an OpenCL coding sample that I modified to loop long enough for power measurements. I think it does automatically deploy across all available DSPs. I would not say that this represents 100% power consumption (I haven't characterized the application, so I can't give an estimate). Power consumption on the DSP depends on how well the program is optimized for the C66x architecture, essentially how close you can get to the raw limit of the architecture shown in Table 1-1 of SPRUGH7. We have a high-intensity test program in assembly code that maxes out every part of the C66x core, but its power is not representative of the real world, so we don't share or support it. The coding sample is just float computation on an array, so it is not vectorized or DSP-optimized at all.

    Unfortunately I don't have a way to run these programs in the RTOS environment. 

  • Hi H.U,

    I found an application note from Arm on how to run Dhrystone. See link. You could likely either take the source code and build an RTOS application from it, or run it bare-metal with no OS.

  • Hello H.U,

    A team member notified me that we have Dhrystone on RTOS. Navigate to <sdk-dir>/demos/posix-smp/bin/AM572x for the "out" binary file that you can run with CCS. Additional info / instructions: processors.wiki.ti.com/.../Processor_SDK_Posix-SMP_Demo
  • Hi Ahmad_Rashed,

    Thank you for your kind support.
    I ran the test using the additional information you provided: processors.wiki.ti.com/.../Processor_SDK_Posix-SMP_Demo

    The following DMIPS values were obtained from the test results of the A15 core:
    Since I was using OPP_NOM, the maximum DMIPS of the A15 is 3500 (= 1000 MHz * 3.5 DMIPS/MHz), so I think the CPU utilization of this test program on the A15 core is 88.74% (= 3106/3500 * 100).


    In the same way, the following DMIPS values were obtained from the test results of the C66x core.
    Since I was using OPP_NOM, the maximum DMIPS of the C66x is 4800 (= 600 MHz * 8), so I think the CPU utilization of this test program on the C66x core is 5.02% (= 241/4800 * 100).
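
The arithmetic in this post can be reproduced with a few lines of Python. This is only a sketch of the poster's proposed calculation, using the figures quoted in the post (3106 and 241 measured DMIPS, 1000 MHz and 600 MHz clocks, 3.5 and 8 DMIPS/MHz); the function names are hypothetical, and whether a DMIPS ratio is a valid stand-in for the PET utilization parameter is exactly the question being asked.

```python
def peak_dmips(clock_mhz, dmips_per_mhz):
    """Theoretical peak DMIPS at a given clock, as assumed in the post."""
    return clock_mhz * dmips_per_mhz


def dmips_ratio_percent(measured_dmips, clock_mhz, dmips_per_mhz):
    """Measured DMIPS as a percentage of the assumed theoretical peak."""
    return 100.0 * measured_dmips / peak_dmips(clock_mhz, dmips_per_mhz)


# Figures quoted in the post (OPP_NOM).
a15_percent = dmips_ratio_percent(3106, clock_mhz=1000, dmips_per_mhz=3.5)
c66x_percent = dmips_ratio_percent(241, clock_mhz=600, dmips_per_mhz=8)

print(f"A15:  {a15_percent:.2f}%")   # 88.74%
print(f"C66x: {c66x_percent:.2f}%")  # 5.02%
```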

    Is the above method of calculating CPU utilization correct?

    Best regards,
    H.U

  • Hi Ahmad_Rashed,

    I would appreciate your prompt attention to this issue. We need your help.

    Best regards,
    H.U

  • Sorry for the delay. I am investigating this to find the proper way to respond.
  • H.U. -san

    Ahmad has been discussing this thread internally and is looking to provide additional guidance on this.

    However, I wanted to help calibrate expectations.

    Your original question was:

    "Can you provide sample software that could serve as a reference for estimating this utilization? Knowing how much of the CPU resources our system can use is important for our thermal design and power consumption."

    In general, for most of our power estimation tools/spreadsheets for ARM, the worst-case test we use is Dhrystone; this is consistent with most other vendors.

    For the DSP, we typically use an internal-only, hand-coded test that essentially executes 8 instructions per cycle (IPC of 8) on every cycle in an infinite loop.

    This is a pseudo max test and represents 100% max utilization. 

    We are confirming whether this is the same test used for the AM57x PET.

    If it is confirmed, I want to make sure the following is understood:

    1) We do not share this internal test 

    2) In general, no core kernels or real-world applications come close to achieving an IPC of 8, so for all practical purposes you should be OK entering 50% utilization. This accounts for typical parallelism and for any cache/memory access stalls. If you want to be more conservative, you can use 75% utilization.

    This should be used as a guiding principle for what to enter in the PET.

    If your customer already has their algorithms ready, they can also run those for actual thermal/application profiling.

    3) If they do not have their application test ready, they can use any of the DSP algorithms in the Processor SDK, or use DSP core benchmarks like the one listed here

    There is an application note that shows how these benchmarks can be run. To measure power, you may need to modify them to run in a continuous/infinite loop instead of a single one-shot run.
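
Point 2 above ties 100% utilization to a test that sustains an IPC of 8. One way to read that, sketched below, is as a linear scale from achieved IPC to a PET utilization figure. This mapping is an assumption made for illustration, not a definition confirmed in this thread, and the instruction and cycle counts are hypothetical inputs (how to obtain them on the C66x is outside this sketch).

```python
C66X_PEAK_IPC = 8  # peak instructions per cycle quoted in the reply above


def utilization_from_ipc(instructions, cycles, peak_ipc=C66X_PEAK_IPC):
    """Map achieved IPC to a percentage of peak, assuming a linear scale."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    achieved_ipc = instructions / cycles
    # Clamp at 100% so a bad measurement cannot exceed the pseudo max.
    return min(100.0, 100.0 * achieved_ipc / peak_ipc)


# Hypothetical counts: an average IPC of 4 lands on the suggested 50% entry,
# and an average IPC of 6 on the more conservative 75% entry.
typical = utilization_from_ipc(instructions=4_000_000, cycles=1_000_000)
conservative = utilization_from_ipc(instructions=6_000_000, cycles=1_000_000)
print(typical, conservative)
```

Under this reading, the suggested 50% and 75% entries correspond to kernels averaging roughly 4 and 6 instructions per cycle, stalls included.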

    Hope this helps.

    Regards

    Mukul