RTOS/TDA3LA: IPU SMP load computation

Part Number: TDA3LA


Tool/software: TI-RTOS

Hello Experts,

We are adding SMP support to our SDK on IPU1 of the TDA3xx processor. After making the changes suggested by the SMP wiki, it works functionally (ROV shows tasks running on both cores).

However, we have an issue with the load computation for the two cores: the reported load is much higher than in the single-core case.

As per http://processors.wiki.ti.com/index.php/SMP/BIOS#How_do_I_measure_CPU_Load_when_running_in_SMP_mode_.3F, we need to run idle tasks on both cores and measure the CPU load separately on each core. Is this still valid? Do we need to measure the idle time of each core?

The wiki also mentions bug SDOCM00115771. Has it been fixed?

Thanks.

  • Hi Prasad,
    SDOCM00115771 was fixed in BIOS 6.42.02. What version of BIOS are you using?
    Thanks,
    Janet
  • Hello Janet,

    I am using bios_6_46_04_53, so the bug is fixed in my package. According to the wiki, the workaround for SDOCM00115771 was to:
    - Call Load_reset() from main() before calling BIOS_start().
    - Register an idle function on Core 1, 2, ... that calls Load_updateCurrentThreadTime()

    Since the issue is fixed:
    1. Do I need to remove Load_reset() and Load_updateCurrentThreadTime() from my code?
    2. Do I still need to register separate idle functions for both cores?
  • Hello Janet,

    I am also seeing an issue with the load calculation in the SMP case. The totalTime counts do not match between the two cores, even though the statistics are read together.

    Below is the code snippet for the load calculation.

           /* Get all the loads first */
           idlTskHndl[0U] = Task_getIdleTaskHandle(0U);
           Load_getTaskLoad(idlTskHndl[0U], &idlTskLoadStat[0U]);
           Load_getGlobalHwiLoad(&hwiLoadStat);
           Load_getGlobalSwiLoad(&swiLoadStat);

    #if defined(BUILD_M4_0) && defined(IPU1_SMP_BIOS_INCLUDE)
           idlTskHndl[1U] = Task_getIdleTaskHandle(1U);
           Load_getTaskLoad(idlTskHndl[1U], &idlTskLoadStat[1U]);
    #endif

           /* Accumulate core0 idle time into the 64-bit Lo/Hi pair */
           time64 = (uint64_t)pAccPrfLoadLcl->totalIdlTskTimeLo[0U] & 0xFFFFFFFFU;
           temp = (uint64_t)pAccPrfLoadLcl->totalIdlTskTimeHi[0U] & 0xFFFFFFFFU;
           time64 = time64 | (temp << 32U);
           time64 += idlTskLoadStat[0U].threadTime;
           pAccPrfLoadLcl->totalIdlTskTimeLo[0U] = time64 & 0xFFFFFFFFU;
           pAccPrfLoadLcl->totalIdlTskTimeHi[0U] = (time64 >> 32U) & 0xFFFFFFFFU;

    #if defined(BUILD_M4_0) && defined(IPU1_SMP_BIOS_INCLUDE)
           uint32_t temp2,temp3;

           temp2 = idlTskLoadStat[0U].totalTime;
           temp3 = idlTskLoadStat[0U].threadTime;

           /* Save core1 CPU idle stats in totalIdlTskTimeLo[1U] for core1 CPU
              load computation */
           time64 = (uint64_t)pAccPrfLoadLcl->totalIdlTskTimeLo[1U] & 0xFFFFFFFFU;
           temp = (uint64_t)pAccPrfLoadLcl->totalIdlTskTimeHi[1U] & 0xFFFFFFFFU;
           time64 = time64 | (temp << 32U);
           time64 += idlTskLoadStat[1U].threadTime;
           pAccPrfLoadLcl->totalIdlTskTimeLo[1U] = time64 & 0xFFFFFFFFU;
           pAccPrfLoadLcl->totalIdlTskTimeHi[1U] = (time64 >> 32U) & 0xFFFFFFFFU;
    #endif

           Vps_printf( " Totaltime core0=%d, core1=%d, hwi=%d, swi=%d\n", temp2, idlTskLoadStat[1U].totalTime, hwiLoadStat.totalTime, swiLoadStat.totalTime);

    When I print these values, the counts differ between the cores and the Hwi/Swi statistics, as the prints below show. This causes a problem in the total load computation, because I add the loads of both cores together with the Swi, Hwi and task loads.

    [IPU1-0] 88.316110 s: Totaltime core0=10003338, core1=10003350, hwi=10001604, swi=10001604
    [IPU1-0] 88.816903 s: Totaltime core0=10015611, core1=10015608, hwi=10015647, swi=10015647
    [IPU1-0] 89.317910 s: Totaltime core0=10019844, core1=10019853, hwi=10019958, swi=10019958
    [IPU1-0] 89.819038 s: Totaltime core0=10022043, core1=10022037, hwi=10021959, swi=10021959
    [IPU1-0] 90.319953 s: Totaltime core0=10018053, core1=10018053, hwi=10018026, swi=10018026
    [IPU1-0] 90.820960 s: Totaltime core0=10019910, core1=10019910, hwi=10019916, swi=10019916
    [IPU1-0] 91.321966 s: Totaltime core0=10019970, core1=10019967, hwi=10019964, swi=10019964
    [IPU1-0] 91.823064 s: Totaltime core0=10021551, core1=10021551, hwi=10021575, swi=10021575
    [IPU1-0] 92.324132 s: Totaltime core0=10020981, core1=10020987, hwi=10020978, swi=10020978
    [IPU1-0] 92.824132 s: Totaltime core0=10000065, core1=10000062, hwi=10000071, swi=10000071
    [IPU1-0] 93.325016 s: Totaltime core0=10017534, core1=10017531, hwi=10017495, swi=10017495
    [IPU1-0] 93.826084 s: Totaltime core0=10020801, core1=10020804, hwi=10020852, swi=10020852
    [IPU1-0] 94.327060 s: Totaltime core0=10019181, core1=10019178, hwi=10019178, swi=10019178
    [IPU1-0] 94.828066 s: Totaltime core0=10019904, core1=10019907, hwi=10019979, swi=10019979
    [IPU1-0] 95.328920 s: Totaltime core0=10011096, core1=10011087, hwi=10010988, swi=10010988
    [IPU1-0] 95.829256 s: Totaltime core0=10012434, core1=10012437, hwi=10012395, swi=10012395
    [IPU1-0] 96.330110 s: Totaltime core0=10016604, core1=10016604, hwi=10016613, swi=10016613
    [IPU1-0] 96.831116 s: Totaltime core0=10019859, core1=10019859, hwi=10019913, swi=10019913
    [IPU1-0] 97.332123 s: Totaltime core0=10019982, core1=10019982, hwi=10019982, swi=10019982
    [IPU1-0] 97.833190 s: Totaltime core0=10014138, core1=10014138, hwi=10020228, swi=10020228
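
One way to make the computation insensitive to these small window differences is to normalise each core's idle time by that core's own totalTime, rather than assuming one shared window. The following is only a minimal sketch in plain C, not code from the SDK or from SYS/BIOS: `CoreLoadAcc`, `join64`, `add64`, `coreLoadAccumulate` and `coreLoadPercent` are hypothetical names, and the two inputs stand for the `threadTime` and `totalTime` fields read from the idle task's `Load_Stat` on one core. It assumes that everything which is not idle-task time counts as load.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-core accumulator mirroring the Lo/Hi word pairs above. */
typedef struct {
    uint32_t idleLo, idleHi;   /* accumulated idle-task threadTime */
    uint32_t totalLo, totalHi; /* accumulated window length (totalTime) */
} CoreLoadAcc;

/* Join a Lo/Hi pair of 32-bit words into one 64-bit count. */
static uint64_t join64(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32U) | (uint64_t)lo;
}

/* Add a 32-bit delta to a 64-bit count stored as two 32-bit words. */
static void add64(uint32_t *lo, uint32_t *hi, uint32_t delta)
{
    uint64_t sum = join64(*lo, *hi) + (uint64_t)delta;

    *lo = (uint32_t)(sum & 0xFFFFFFFFU);
    *hi = (uint32_t)(sum >> 32U);
}

/* Fold one measurement window of one core into its accumulator. */
static void coreLoadAccumulate(CoreLoadAcc *acc, uint32_t idleThreadTime,
                               uint32_t totalTime)
{
    /* Assumption of this sketch: idle time never exceeds the window. */
    assert(idleThreadTime <= totalTime);
    add64(&acc->idleLo, &acc->idleHi, idleThreadTime);
    add64(&acc->totalLo, &acc->totalHi, totalTime);
}

/* Load in percent = share of the accumulated window not spent idle. */
static uint32_t coreLoadPercent(const CoreLoadAcc *acc)
{
    uint64_t idle = join64(acc->idleLo, acc->idleHi);
    uint64_t total = join64(acc->totalLo, acc->totalHi);

    if (total == 0U) {
        return 0U; /* nothing accumulated yet */
    }
    return (uint32_t)(((total - idle) * 100U) / total);
}
```

With one accumulator per core, a call such as `coreLoadAccumulate(&acc[1], idlTskLoadStat[1U].threadTime, idlTskLoadStat[1U].totalTime)` would replace the open-coded Lo/Hi arithmetic, and each core's percentage no longer depends on the other core's window length.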

  • Hi Prasad,

    I'm not sure what the problem is, but maybe you can try running one of our regression tests and see if that passes.  I loaded this test on one of the M3 cores of a TI8148 and it passed.  It's not calculating any swi or hwi load.  Here are the .c and .cfg files.

    /*
     * Copyright (c) 2016, Texas Instruments Incorporated
     * All rights reserved.
     *
     * Redistribution and use in source and binary forms, with or without
     * modification, are permitted provided that the following conditions
     * are met:
     *
     * *  Redistributions of source code must retain the above copyright
     *    notice, this list of conditions and the following disclaimer.
     *
     * *  Redistributions in binary form must reproduce the above copyright
     *    notice, this list of conditions and the following disclaimer in the
     *    documentation and/or other materials provided with the distribution.
     *
     * *  Neither the name of Texas Instruments Incorporated nor the names of
     *    its contributors may be used to endorse or promote products derived
     *    from this software without specific prior written permission.
     *
     * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
     * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
     * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
     * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
     * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
     * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
     * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
     * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
     * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
     * OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
     * EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
     */
    
    /*
     *  ======== LoadTest2.c ========
     *  A simple test to check Task loads on two cores.  The load tasks should
     *  have loads of about 25% (core 0) and 35% (core 1).  The corresponding
     *  idle task loads should be about 75% and 65%.  This test can be used to
     *  verify that the Load module computes the correct load, even if timers
     *  are not initialized until after main().
     */
    
    #include <xdc/std.h>
    
    #define ti_sysbios_utils_Load__internalaccess 1
    
    #include <xdc/runtime/Assert.h>
    
    #include <xdc/runtime/System.h>
    #include <xdc/runtime/Timestamp.h>
    #include <xdc/runtime/Types.h>
    
    #include <ti/sysbios/BIOS.h>
    #include <ti/sysbios/knl/Clock.h>
    #include <ti/sysbios/knl/Task.h>
    
    #include <ti/sysbios/utils/Load.h>
    
    #include <xdc/cfg/global.h>
    
    #define NLOOPS        3
    
    #define TASKLOAD_0 25   /* For a load of about 25% */
    #define IDLELOAD_0 (100 - TASKLOAD_0)
    
    #define TASKLOAD_1 35   /* For a load of about 35% */
    #define IDLELOAD_1 (100 - TASKLOAD_1)
    
    #define THRESHOLD 2
    /* True when v lies strictly within p +/- THRESHOLD */
    #define WITHIN_THRESHOLD(v, p) \
            (((p) - THRESHOLD < (v)) && ((v) < ((p) + THRESHOLD)))
    
    
    Void loadTask0Fxn(UArg arg1, UArg arg2);
    Void loadTask1Fxn(UArg arg1, UArg arg2);
    Void printLoads(UArg arg1, UArg arg2);
    
    Task_Handle loadTask_0, loadTask_1;
    Int updateCount = 0;
    
    static struct Load_Module_State *Load_module;
    
    
    
    /*
     *  ======== main ========
     */
    Int main(Int argc, Char* argv[])
    {
        Task_Params taskParams;
    
        Task_Params_init(&taskParams);
        taskParams.affinity = 0;
        loadTask_0 = Task_create(loadTask0Fxn, &taskParams, NULL);
    
        taskParams.affinity = 1;
        loadTask_1 = Task_create(loadTask1Fxn, &taskParams, NULL);
    
        /*
         *  The Load module's t0 is initialized to 0 in Load.xs.  This is the
         *  start of the Load update window.  We need to initialize the start
         *  of the update window start time to the current time.
         *  This should prevent the Load module from doing an update too early.
         *  Eg, with ducati timestamp, the counter may be at some very high
         *  value, which would cause an immediate update from the Load idle
         *  function.
         */
        Load_module = (Load_Module_State *)&ti_sysbios_utils_Load_Module__state__V;
        Load_module->t0 = Timestamp_get32();
    
        BIOS_start();
    
        return (0);
    }
    
    /*
     *  ======== loadTask0Fxn ========
     */
    Void loadTask0Fxn(UArg arg1, UArg arg2)
    {
        UInt32 startTicks;
        UInt32 curTicks;
    
        for (;;) {
            Task_sleep(IDLELOAD_0);
    
            curTicks = startTicks = Clock_getTicks();
    
            while (curTicks - startTicks < TASKLOAD_0) {
                curTicks = Clock_getTicks();
            }
        }
    }
    
    /*
     *  ======== loadTask1Fxn ========
     */
    Void loadTask1Fxn(UArg arg1, UArg arg2)
    {
        UInt32 startTicks;
        UInt32 curTicks;
    
        for (;;) {
            Task_sleep(IDLELOAD_1);
    
            curTicks = startTicks = Clock_getTicks();
    
            while (curTicks - startTicks < TASKLOAD_1) {
                curTicks = Clock_getTicks();
            }
        }
    }
    
    /*
     *  ======== printLoads ========
     */
    Void printLoads(UArg arg1, UArg arg2)
    {
        Load_Stat stat;
        UInt      idlTaskLoad_0;
        UInt      loadTaskLoad_0;
        UInt      idlTaskLoad_1;
        UInt      loadTaskLoad_1;
    
        updateCount++;
    
        if (updateCount > NLOOPS) {
            System_printf("LoadTest2 finished!\n");
            BIOS_exit(0);
        }
    
        /* Check loads on core 0 */
        Load_getTaskLoad(Task_getIdleTaskHandle(0), &stat);
        idlTaskLoad_0 = Load_calculateLoad(&stat);
    
        Load_getTaskLoad(loadTask_0, &stat);
        loadTaskLoad_0 = Load_calculateLoad(&stat);
    
        if (!WITHIN_THRESHOLD(loadTaskLoad_0, TASKLOAD_0)) {
            System_printf("Core 0 Load Task load: %d\n", loadTaskLoad_0);
        }
        if (!WITHIN_THRESHOLD(idlTaskLoad_0, IDLELOAD_0)) {
            System_printf("Core 0 Idle Task load: %d\n", idlTaskLoad_0);
        }
    
        /* Check loads on core 1 */
        Load_getTaskLoad(Task_getIdleTaskHandle(1), &stat);
        idlTaskLoad_1 = Load_calculateLoad(&stat);
    
        Load_getTaskLoad(loadTask_1, &stat);
        loadTaskLoad_1 = Load_calculateLoad(&stat);
    
        if (!WITHIN_THRESHOLD(loadTaskLoad_1, TASKLOAD_1)) {
            System_printf("Core 1 Load Task load: %d\n", loadTaskLoad_1);
        }
        if (!WITHIN_THRESHOLD(idlTaskLoad_1, IDLELOAD_1)) {
            System_printf("Core 1 Idle Task load: %d\n", idlTaskLoad_1);
        }
    
        Assert_isTrue(WITHIN_THRESHOLD(loadTaskLoad_0, TASKLOAD_0), NULL);
        Assert_isTrue(WITHIN_THRESHOLD(idlTaskLoad_0, IDLELOAD_0), NULL);
    
        Assert_isTrue(WITHIN_THRESHOLD(loadTaskLoad_1, TASKLOAD_1), NULL);
        Assert_isTrue(WITHIN_THRESHOLD(idlTaskLoad_1, IDLELOAD_1), NULL);
    }
    

    LoadTest2.cfg

    Best regards,

    Janet
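
The expected loads in the test follow from the duty cycle of each load task: it sleeps for IDLELOAD ticks and then spins for TASKLOAD ticks, so its share of the period is TASKLOAD / (TASKLOAD + IDLELOAD). The sketch below restates that arithmetic and the tolerance check in plain C. The function names `expectedLoadPercent` and `withinThreshold` are hypothetical and are not part of the test or of SYS/BIOS, and the check is written as a symmetric band around the expected value, which is what the macro name suggests is intended.

```c
#include <assert.h>
#include <stdint.h>

/* Expected load in percent for a task that spins for busyTicks and then
 * sleeps for sleepTicks, assuming nothing else competes for the core. */
static uint32_t expectedLoadPercent(uint32_t busyTicks, uint32_t sleepTicks)
{
    uint32_t period = busyTicks + sleepTicks;

    assert(period > 0U);
    return (busyTicks * 100U) / period;
}

/* Returns 1 when measured lies strictly inside expected +/- threshold. */
static int withinThreshold(int32_t measured, int32_t expected,
                           int32_t threshold)
{
    return (expected - threshold < measured) &&
           (measured < expected + threshold);
}
```

For the values in the test, `expectedLoadPercent(25U, 75U)` gives 25 for core 0 and `expectedLoadPercent(35U, 65U)` gives 35 for core 1. With a threshold of 2, a measured load of 24 to 26 passes for core 0, while 23 or 27 does not.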

  • Hello Janet,

    Can you please answer my earlier questions?

    Since the issue is fixed:
    1. Do I need to remove Load_reset() and Load_updateCurrentThreadTime() from my code?
    2. Do I still need to register separate idle functions for both cores?

    I don't see your test adding separate idle functions for each core in SMP mode, as the wiki describes.

  • Hi Prasad,

    No, you don't need to call Load_reset() and Load_updateCurrentThreadTime() from your code.  The Load module itself adds an idle function, on every core except core 0, that calls Load_updateCurrentThreadTime().  So you no longer need to register one yourself.

    Best regards,

       Janet

  • Does this mean an idle function is only needed on core 0?
  • The idle loop (Idle_run) is run on all cores.  Only the idle functions designated to run on a particular core are run on that core.  Here is the code for Idle_run():

    /*
     *  ======== Idle_run ========
     */
    Void Idle_run(Void)
    {
        Int i;

        /* CWARN.CONSTCOND.IF */
        if (BIOS_smpEnabled == TRUE) {
            /* UNREACH.GEN */
            UInt coreId = Core_getId();
            for (i = 0; i < Idle_funcList.length; i++) {
                if (Idle_coreList.elem[i] == coreId) {
                    Idle_funcList.elem[i]();
                }
            }
        }
        else {
            for (i = 0; i < Idle_funcList.length; i++) {
                Idle_funcList.elem[i]();
            }
        }
    }

    In the case of SMP, the Load module uses Idle.addCoreFunc() to add to Idle_funcList, specifying the core that the function is to be run on.  The code in Load.xs to add the idle functions on cores other than core 0 is this:

                for (var i = 1; i < numCores; i++) {
                    Idle.addCoreFunc(Load.updateCurrentThreadTime, i);
                }

    So you see, Load_updateCurrentThreadTime() is added for all cores but 0.  However, the Load module adds Load_idleFxn() using Idle.addFunc() with no core specified.  This causes Load_idleFxn() to run on all cores.  I hope that makes sense.

    Best regards,

    Janet
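
The per-core dispatch described above can be modelled off-target in plain C. This is only a sketch under stated assumptions, not SYS/BIOS code: every `sketch...` name is hypothetical, the core ID is passed in as a parameter instead of being read with Core_getId(), and only the SMP branch of Idle_run() plus the Load.xs loop for cores 1 and above are modelled. How a function added with Idle.addFunc() is recorded in the core list is not shown in this thread, so it is left out.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*IdleFxn)(void);

#define SKETCH_MAX_FUNCS 8U
#define SKETCH_NUM_CORES 2U

/* Parallel lists, standing in for Idle_funcList and Idle_coreList. */
static IdleFxn  sketchFuncList[SKETCH_MAX_FUNCS];
static unsigned sketchCoreList[SKETCH_MAX_FUNCS];
static size_t   sketchNumFuncs = 0U;

/* Bookkeeping for the model: which core is "running" and call counts. */
static unsigned sketchCurrentCore = 0U;
static unsigned updateCalls[SKETCH_NUM_CORES];

/* Stand-in for Load_updateCurrentThreadTime(): just counts its calls. */
static void sketchUpdateCurrentThreadTime(void)
{
    updateCalls[sketchCurrentCore]++;
}

/* Models Idle.addCoreFunc(func, coreId): bind a function to one core. */
static void sketchAddCoreFunc(IdleFxn fxn, unsigned coreId)
{
    assert(sketchNumFuncs < SKETCH_MAX_FUNCS);
    sketchFuncList[sketchNumFuncs] = fxn;
    sketchCoreList[sketchNumFuncs] = coreId;
    sketchNumFuncs++;
}

/* Models the Load.xs loop: register the update on every core but 0. */
static void sketchRegisterLoadFuncs(void)
{
    unsigned core;

    for (core = 1U; core < SKETCH_NUM_CORES; core++) {
        sketchAddCoreFunc(sketchUpdateCurrentThreadTime, core);
    }
}

/* Models the SMP branch of Idle_run(): run only this core's functions. */
static void sketchIdleRun(unsigned coreId)
{
    size_t i;

    sketchCurrentCore = coreId;
    for (i = 0U; i < sketchNumFuncs; i++) {
        if (sketchCoreList[i] == coreId) {
            sketchFuncList[i]();
        }
    }
}
```

After `sketchRegisterLoadFuncs()`, calling `sketchIdleRun(0U)` leaves `updateCalls[0]` at zero, while each `sketchIdleRun(1U)` increments `updateCalls[1]`. That matches the statement that the update function is added for all cores but 0.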