Using DSP/BIOS IDL_busyObj for CPU load in 5.41.02.14

I am trying to use the DSP/BIOS IDL_busyObj to compute CPU idle time, and hence CPU load, in BIOS 5.41.02.14 on a DM648. I understand how the three members, IDL_busyObj.acc, IDL_busyObj.num, and IDL_busyObj.max, are used to compute idle load. It works great in debug builds. However, in release builds, the .num and .acc members are correct, but .max is garbage (a very large negative value). For some reason, DSP/BIOS does not appear to run the idle loop minimum cycle time calibration in the release build, even though all of the BIOS settings are the same as in debug.

I tried adding

bios.IDL.AUTOCALCULATE=1; //auto-calculate idle loop cycle count
bios.GBL.ENABLEINST = true; //enable instrumentation

to the .tcf, just in case the defaults are different in release builds, but that did not help.

Interestingly, when I run the release build and then start the CPU load graph in CCS (3.3), not only does the graph work, but suddenly the value of IDL_busyObj.max in my executable becomes valid! So, CCS is somehow kicking off the idle loop calibration that DSP/BIOS is not doing by default (again, the problem is only in release builds).

Can anyone help me with this? Thank you in advance for any suggestions.

Jim Gort

 

loopCount = IDL_busyObj.num-loopCountold;
totalTime = IDL_busyObj.acc-totalTimeold;
curMinIdleLoop = IDL_busyObj.max;

  • Jim -

    It looks like you've seen this article, that's good.

    I don't have knowledge of how the minimum idle loop is calculated, so unfortunately I can't give a great answer here. 

    However, I did have a thought. You said the 'max' value is correct if you run in CCS? If you run your application repeatedly in CCS, is the 'max' value consistent?

    Because the 'max' field is used to calculate the overhead of the idle loop, which I imagine is a fixed value for your device, I wonder if you could simply hardcode the value into your application. That is, don't bother looking at the 'max' field at runtime, just use a fixed value for 'minIdleLoop'.

    Maybe give that a try and see if the load values you calculate seem reasonable?

    Thanks,

    Chris

  • Hi Chris-

    Thank you for your response.

    Yes, the "max" value is always basically the same in debug build, and it basically matches the value I get in release build after "enabling" it by starting CPU load trace in CCS. Hard-coding to this is the fallback plan. However, we want this code to work on multiple platforms (DM642 and DM648), and be robust to upgrading the DSP/BIOS, etc. Hence, I really want to do it the *right* way.

    In the release build, it does not work even if I take off all optimizations and tell it to generate debug info. At that point, there is no difference I can find between the release and debug builds, other than that we have a _DEBUG preprocessor definition in the debug build (for our use). They are linking with the same version of BIOS and libraries. Also, I know there is a way to make it work, if I can just figure out how to kick the OS the way CCS does when it starts the CPU trace.

    I noticed a function in the map file (IDL_F_calibrate), but I cannot figure out how to explicitly call it to see if that is the magic trick. I also can't figure out why it would be called in debug and not in release, though it seems a whole lot of BIOS decisions are made by the tools based on the .tcf, and they may take into account whether it's a release or debug build.

    Jim

  • I'm not familiar with the RTA implementation in CCSv3, but I am in CCSv4, and I know that the CCSv4 solution doesn't do anything special to kick the target.  

    I built the BIOS stairstep example with the release profile, and ran it in BIOS 5 RTA in CCSv4. I get a reasonable value (326 on an evmDM6437) on the first read of the STS records.

    RTA reads the STS records by sending down a command to call RTA_F_getsts, and this reads and clears the STS object and copies it to a buffer to be sent to the host. This reading and clearing is done in the idle loop.

    At what point in the execution of your application are you reading the object? It might be a problem if you're trying to read it before main...

    Thanks,

    Chris

  • Btw, I found IDL_F_calibrate and it does look key to the calculation of the idle loop overhead. It has some constraints on when it's called, though, so I don't think it would work to call it yourself. You could try placing a breakpoint on it, though, to ensure that it's being called before you try to read the STS object?

    Thanks,

    Chris

  • Hi Chris-

    I got to the bottom of it. For posterity, here is what is going on:

    1) Not sure what IDL_F_calibrate function does, but both builds (release and debug) call it only twice at the very top of code.

    2) The IDL_busyObj.max, which is the "min execution time of the idle loop" is actually computed as the min delta time between two consecutive calls to the idle loop. Presumably, when nothing is going on, and IDL loop is called twice in a row, this will be reached.

    3) Contrary to documentation (at least in bios 5.41.02.14 for the DM648) the global IDL_busyObj.max is updated with min delta every time the idle loop is entered, not just a certain number of times through (10,000, I think, according to documentation).

    4) If the time between calls to the idle loop is VERY LONG, the counter wraps, so that the delta is computed as positive (it is usually a small negative number). When this happens, the large positive wrapped number becomes the new min, never to be changed. Of course, the total cycle count also wraps during this, but that doesn't bother me, because I am not yet at the point where I am using its deltas for CPU load. The free-running cycle counter is a decrementing counter, and the "max" is computed as current-previous each time through the idle loop, so it is usually a small negative number. It "wraps" when current-previous is more than 1/2 full scale, so that the delta is saved in "max" as a positive number.

    This is what was happening in my release builds. Because I do a long atomic process (erase and write flash) prior to the main execution loop, this was causing .max to become a wrapped, nonsense value, pinned as "max". My workaround is to just sleep for a bit, read the "max", and save it as a global prior to doing any long atomic operations.

    Running the CPU load graph in CCS must somehow reset the pinned max, which is what caused it to work (I would start the CPU load graph *after* the long atomic operation. If I start it before, it does not make it "work").

    Bugs in BIOS are 3) and 4). It *should* only test for min delta a limited number of times, as documented, not every time into the idle loop. It *should* check for a positive delta, and not consider it a min if it is positive! (A positive delta means a negative number of cycles between 2 iterations of the idle loop--impossible!)
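
To make items 2) to 4) concrete, here is a small host-side sketch of the arithmetic described above. It is not the BIOS source: the function names `idle_delta` and `track_max_delta` are hypothetical, and the code only assumes what the post states (a free-running, decrementing 32-bit counter, a delta computed as current-previous, and "max" tracking of that delta). The `reject_positive` flag models the check the post says BIOS should have.

```c
#include <stdint.h>

/* Signed difference between two reads of a free-running, decrementing
   32-bit counter. The unsigned subtraction wraps modulo 2^32; converting
   the result to int32_t assumes the usual two's-complement behaviour. */
int32_t idle_delta(uint32_t current, uint32_t previous)
{
    return (int32_t)(current - previous);
}

/* "max" tracking of the deltas (hypothetical model of what the post
   describes). Because normal deltas are negative, the largest one is the
   shortest pass through the idle loop. With reject_positive set, a
   positive (wrapped) delta is ignored instead of being pinned as "max". */
int32_t track_max_delta(int32_t stored, int32_t delta, int reject_positive)
{
    if (reject_positive && delta > 0) {
        return stored;
    }
    return (delta > stored) ? delta : stored;
}
```

With a short gap, `idle_delta(900, 1000)` is -100. With a gap of 0x90000000 counts (more than half of full scale), `idle_delta(0x70000000, 0)` comes out positive, and without the guard that value replaces any sane negative "max" and stays there.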

     

    Additional information to clarify usage, for others that want this functionality:

    Variables:

    extern STS_Obj IDL_busyObj;

    double m_load, m_percentIdle;
     int m_minIdleLoop;
     unsigned int m_loopCount, m_loopCountOld;
     unsigned int m_totalTime, m_totalTimeOld;

    1) At startup, when not much is going on, and from a task, set the min time of idle loop execution as follows: 

    TSK_sleep(10); // make sure we hit idle so first delta below not bad (in case
           // we just finished long atomic operation prior to calling this init)

     // reset the stats in case .max pinned to nonsense value prior to calling this init
     IDL_busyObj.max=0x80000000;
     IDL_busyObj.acc=0;
     IDL_busyObj.num=0;

     TSK_sleep(100); // make sure we have a good read of min--lots of loops through idle

     m_minIdleLoop = IDL_busyObj.max; //the minimum delta between idle loops, per BIOS
     m_minIdleLoop *= -1; //because it's saved as negative

     m_loopCountOld = IDL_busyObj.num;  //init the number of times through idle loop
     m_totalTimeOld = IDL_busyObj.acc;  //init the total instruction count

    2) In your main loop, compute the (instantaneous) % Idle time as follows:

     m_loopCount = IDL_busyObj.num-m_loopCountOld; //positive counter
     m_totalTime = IDL_busyObj.acc-m_totalTimeOld;

     m_loopCountOld=m_loopCountOld+m_loopCount;
     m_totalTimeOld=m_totalTimeOld+m_totalTime;

     m_totalTime *= -1;  //decrementing counter
     
     if(m_totalTime>0) //totaltime=0 means no OS tics between calls
     {
      // Calculate the percentage of time (0 to 1) spent idle.
      m_percentIdle = ((double) m_loopCount * (double) m_minIdleLoop) / (double) m_totalTime;
     
      /* Calculate the CPU load as a value 0 - 100. */
      m_load = ((1 - m_percentIdle) * 100);

    }

    Note that the resolution is limited by the time through the idle loop, since idle time is computed as (number of times through) * time_per_idle_loop. Also, the IDL_busyObj is only updated each DSP/BIOS tic. Hence, depending on how often the main loop executes, you may want to average m_load over many loops. Also note that m_totalTime is a 32-bit counter, so you need to call 2) often enough that it does not wrap on itself.
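
For readers who want to check the arithmetic of step 2) off-target, the following is a minimal sketch that folds it into one function. It is an illustration under assumptions, not the original code: `IdleStats` is a stand-in for the three statistics fields (it is not the real DSP/BIOS `STS_Obj` type), `LoadState` and `load_update` are hypothetical names, and the guard on a zero loop count and the clamp on the idle ratio are additions.

```c
#include <stdint.h>

/* Stand-in for the three statistics fields used above (assumption: this
   is NOT the real DSP/BIOS STS_Obj type, just the same three members). */
typedef struct {
    int32_t num;   /* number of passes through the idle loop */
    int32_t acc;   /* accumulated count; decreases over time */
    int32_t max;   /* largest (least negative) delta seen    */
} IdleStats;

/* Values saved between calls (hypothetical helper type). */
typedef struct {
    uint32_t loopCountOld;
    uint32_t totalTimeOld;
    uint32_t minIdleLoop;  /* calibrated counts per idle pass, positive */
} LoadState;

/* Returns 1 and writes a load of 0..100 to *load when a new value could
   be computed; returns 0 (leaving *load untouched) otherwise. */
int load_update(LoadState *s, const IdleStats *now, double *load)
{
    uint32_t loops = (uint32_t)now->num - s->loopCountOld;
    /* acc decreases, so old - new is the positive elapsed count */
    uint32_t elapsed = s->totalTimeOld - (uint32_t)now->acc;
    double idle;

    s->loopCountOld = (uint32_t)now->num;
    s->totalTimeOld = (uint32_t)now->acc;

    /* Nothing to compute if the idle loop did not run or no time passed. */
    if (loops == 0u || elapsed == 0u || s->minIdleLoop == 0u) {
        return 0;
    }

    idle = ((double)loops * (double)s->minIdleLoop) / (double)elapsed;
    if (idle > 1.0) {
        idle = 1.0;  /* added clamp against rounding or a wrapped counter */
    }
    *load = (1.0 - idle) * 100.0;
    return 1;
}
```

On the target, the `IdleStats` argument would be filled from `IDL_busyObj`, and `minIdleLoop` from the calibration in step 1). Unsigned subtraction keeps the deltas correct across a single wrap of the 32-bit values, but not across several.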

  • Hey James,
    Thanks for your reply. It is really very helpful. However, I am still confused, since I am a newbie in this area.
    How do I use IDL_busyObj? Is it required to modify the BIOS configuration?
    I am not sure whether the default tconfini.tcf is the right file to modify. And what commands should I add to this .tcf file?
    Once I rebuild the BIOS after the modification, should I port the new BIOS to the board where the DSP resides? Or can I just use it as part of the tool chain for cross compilation?
    If you don't mind, could you please send me a copy of the modified .tcf file and the code to read DSP utilization? You can reach me at chengwang@wayne.edu. Thanks for your time.
    Regards,
    Cheng

  • Hi Cheng-

    You don't have to rebuild the BIOS. The .tcf configures which components of the BIOS will be included in your project. I'm not 100% certain which (if any) is required to include the IDL_busyObj, but the "special" configurations I have in my .tcf that may be related to this are:

    bios.enableRealTimeAnalysis(prog);
    bios.enableRtdx(prog);
    bios.enableTskManager(prog);

    You can figure out which (if any) are required to have IDL_busyObj supported, but with all three you should be fine, assuming the version of bios you have supports it.

    Cheers,

    Jim

  • Hi:

    I am very happy to see your reply. I have a question about the DSP load calculation. In the code above, is the first step required? Why? Is it correct to calculate m_minIdleLoop like this?

    Thanks,

    Best regards


  • Hi Fengwei-

    It is only necessary to calibrate the minIdleLoop as I did if your application can, as mine does, sometimes do chunks of processing in a task that has higher priority than the Bios Idle task. When this happens, the counter for the Bios minIdleLoop value wraps, and the Bios comes up with a negative value for the minIdleLoop, which is interpreted as a large positive number (both bugs in Bios, imho). The net result is you end up with a HUGE, erroneous, value for minIdleLoop, which ruins the CPU load calculation.

    minIdleLoop is *supposed* to be the minimum execution time of one idle loop, and is measured by the Bios by keeping track of the minimum delta counter between two executions of the idle loop. Thus, the minimum is reached when the idle loop is run consecutively (nothing else going on in the system). I created a way to explicitly do this (all tasks are sleeping), and just save the value.

    Hope this clears it up,

    Jim

  • Hi James Gort:

    I copied your program into CCS, but when I compile and run it, the value of m_load is very large. Below is the program in my project; please help me find the error.

    Thank you very much.

    double m_load, m_percentIdle;
    int  long m_minIdleLoop;
    unsigned int m_loopCount, m_loopCountOld;
    unsigned int m_totalTime, m_totalTimeOld;

    void LoadInit(void)
    {
     TSK_sleep(10);
     IDL_busyObj.max = 0x80000000;
     IDL_busyObj.acc = 0;
     IDL_busyObj.num = 0;
     
     TSK_sleep(100);
     
     m_minIdleLoop = IDL_busyObj.max;
     m_minIdleLoop *= -1;
     
     m_loopCountOld = IDL_busyObj.num;
     m_totalTimeOld = IDL_busyObj.acc;
    }
    void LoadMain(void)
    {
     m_loopCount = IDL_busyObj.num - m_loopCountOld;   
     m_totalTime = IDL_busyObj.acc - m_totalTimeOld;
     
     m_loopCountOld = m_loopCountOld + m_loopCount;
     m_totalTimeOld = m_totalTimeOld + m_totalTime;
     
     m_totalTime *= -1; 
     
     m_percentIdle = ((double) m_loopCount * (double) m_minIdleLoop) / (double) m_totalTime;
     m_load = (1 - m_percentIdle) * 100;
    }

  • Hi Fengwei-

    I'm a little confused--the code snippet that came through in the email I got from your post is not the same as the one above. In the email, your code has DSP_OPERATING_RATE as a divisor in the computation of m_percentIdle. That should not be there--m_percentIdle is just a ratio of values that are all in units of "processor tics", and hence no constants should be used in calculating the % (that is, the snippet above is correct).

    Other than that, make sure that your "LoadMain" function above is called at least once every 500 millisecs or so. If there is too much time between calls, the IDL_busyObj.acc will wrap, giving wrong value for m_totalTime, usually smaller than it *should* be, hence giving a percentIdle of more than 100% (the more times it wraps, the larger your erroneous % idle will be).

    Also, you should check that m_loopCount is not zero--this can happen if you call LoadMain too quickly back-to-back, and the Bios has not entered the idle loop at all between calls. In this case, just don't compute m_percentIdle.

    Finally, you should add a low-pass filter on m_load, as it jitters due to the fact that the Idle loop may only be called a couple of times (or zero) between calls to LoadMain.
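
One simple way to do the low-pass filtering mentioned above is a first-order exponential moving average. The sketch below is only an illustration: the names `LoadFilter`, `load_filter_init` and `load_filter_update` are hypothetical, and the smoothing factor `alpha` (between 0 and 1, smaller means smoother) is something you would tune for your own call rate.

```c
/* First-order low-pass (exponential moving average) for load samples.
   All names here are hypothetical helpers, not DSP/BIOS APIs. */
typedef struct {
    double value;   /* current filtered load                   */
    double alpha;   /* smoothing factor, 0 < alpha <= 1        */
    int    primed;  /* 0 until the first sample has been taken */
} LoadFilter;

void load_filter_init(LoadFilter *f, double alpha)
{
    f->value = 0.0;
    f->alpha = alpha;
    f->primed = 0;
}

/* Feed one new load sample (0..100) and get the filtered value back. */
double load_filter_update(LoadFilter *f, double sample)
{
    if (!f->primed) {
        f->value = sample;  /* start at the first sample, not at zero */
        f->primed = 1;
    } else {
        f->value += f->alpha * (sample - f->value);
    }
    return f->value;
}
```

You would call `load_filter_update` only on passes where a new m_load was actually computed, so that skipped passes (m_loopCount of zero) do not drag the filtered value around.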

    If you still have problems, please post the values of all of the variables in a call to LoadMain (break on it after you are in steady-state for a while), including the IDL_busyObj values, along with the delta time (as measured by Bios timer) between the call you broke on and the previous call, and then I can help you find problem.

    Jim

  • Hi James Gort,

    First, I appreciate your timely reply. My English is poor, so my wording may not be precise; I hope you can forgive me.
    Next, about this program: the "LoadMain" function is called every 1000 ms, which I think is consistent with what you said. How should I add a low-pass filter on m_load? Please explain explicitly.
    Finally, I have another question. My platform is the DM648, and I want to calculate the maximal idle count. How should I do that? Can you tell me a method?
    Thank you very much.
    Best regards,
    fengweichen
  • Hi,

    I'm just wondering if this method works together well with DSP/BIOS LINK? My task is blocked at calling MPCS_enter() but I'm not sure if the IDL_busyObj is updated during this time, because its fields don't seem to be changed at all. Any clues?

    Thanks,

    -Tamas

     

  • Hi Tamas-

    I'm not familiar with DSP LINK, but maybe I can help. The DSP/BIOS Idle task is the lowest priority task, so if any higher priority task is active, the Idle task will not be called. This becomes a problem if a task is doing a long, atomic (meaning no task switching--no Sleep, no wait on semaphore, etc.) operation. The Idle task will not be called until it is done. And the IDL_busyObj is only updated when the Idle task is called.

    So, if the number of times the Idle task is called is zero, it simply means you are at 100% CPU load! Also, note the above thread for dealing with counter wrap issues in cases where you starve the Idle task for long periods of time.

    Jim

  • Hi Jim,

    Thanks for your reply. I think, it's clear from your earlier posts how this is supposed to work. Actually, my question was if the MPCS_enter() function call yields the CPU to the idle thread or not.

    Nevertheless, you gave me an idea: I inserted a TSK_sleep(10) just before calling MPCS_enter() so that my worker task surely gives the CPU to the idle task for some time. And now, IDL_busyObj is updated! I think, this is a rather disappointing result, because it means that MPCS_enter() does some busy-loop kind of thing rather than a sleep kind of thing. I wish, I was wrong. :-(

    Is there anybody out there knowing more about DSP/BIOS LINK to explain if it's compatible with IDL_busyObj? Any help is greatly appreciated.

    Thanks,

    -Tamás

  • Hi Tamas-

    In the little bit of research I did about MPCS_enter() before I replied earlier, I think you came to correct conclusion--it waits in a stupid polling loop with no sleep! Bad Programmer!

    :)

    Jim

     

  • very hard to understand!