TMS320F28066: Task Entry Function is not called from SYSBIOS even though task priority is 14 and its mode is READY

Part Number: TMS320F28066
Other Parts Discussed in Thread: SYSBIOS

Hi,

I am having an issue where a Task entry function is never called by SYSBIOS even though the task's priority is 14 (not -1) and its mode is READY (I checked the task status from a different task at run time). The issue occurs on some of our boards but not on the others, which have the identical HW design. The task is statically created. If I modify the code just to output some debug logs, the issue does not occur. It also does not occur with the debugger running. I confirmed that the task stack and the system (ISR) stack are large enough, so there is no stack overflow.

My Environment:

CCS version: 5.5.0.00077 

Compiler version: TI v6.1.3

XDCtool version: 3.25.4.88

SYSBIOS version: 6.35.1.29

The task is statically created as shown below (instance#0):

/* Object__table__V */
ti_sysbios_knl_Task_Object__ ti_sysbios_knl_Task_Object__table__V[6] = {
    {/* instance#0 */
        {
            ((ti_sysbios_knl_Queue_Elem*)((void*)&ti_sysbios_knl_Task_Object__table__V[0].qElem)),  /* next */
            ((ti_sysbios_knl_Queue_Elem*)((void*)&ti_sysbios_knl_Task_Object__table__V[0].qElem)),  /* prev */
        },  /* qElem */
        (xdc_Int)0xe,  /* priority */
        (xdc_UInt)0x4000,  /* mask */
        ((xdc_Ptr)0),  /* context */
        ti_sysbios_knl_Task_Mode_INACTIVE,  /* mode */
        ((ti_sysbios_knl_Task_PendElem*)0),  /* pendElem */
        (xdc_SizeT)0x200,  /* stackSize */
        ((void*)ti_sysbios_knl_Task_Instance_State_0_stack__A),  /* stack */
        0,  /* stackHeap */
        ((xdc_Void(*)(xdc_UArg,xdc_UArg))((xdc_Fxn)taskReceive)),  /* fxn */
        ((xdc_UArg)(0x0)),  /* arg0 */
        ((xdc_UArg)(0x0)),  /* arg1 */
        ((xdc_Ptr)0),  /* env */
        ((void*)ti_sysbios_knl_Task_Instance_State_0_hookEnv__A),  /* hookEnv */
        1,  /* vitalTaskFlag */
        0,  /* readyQ */
        (xdc_UInt)0x0,  /* curCoreId */
        (xdc_UInt)0x0,  /* affinity */
    },
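For reference, the `mask` field in this object is consistent with the priority: 0x4000 is bit 14. The sketch below assumes the mask is simply `1 << priority` for non-negative priorities and 0 for the inactive priority -1. `priority_to_mask` is a hypothetical helper for illustration, not a SYS/BIOS function.

```c
#include <assert.h>

/* Hypothetical helper: bit mask for a task priority.
 * Assumption: mask = 1 << priority, and 0 for priority -1 (inactive). */
unsigned int priority_to_mask(int priority)
{
    assert(priority >= -1 && priority < 16);
    return (priority < 0) ? 0u : (1u << priority);
}
```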

If you have any info/clue,  please kindly provide support.

Thank you!

  • Hi,

    My first guess would be that a higher priority task is starving out the lower priority task. If the problem happened with CCS attached, I'd recommend looking at ROV to see which task is running, or enabling logging and using the execution graph.

    Since the problem only occurs without the debugger, can you add an idle function that flashes an LED or toggles a GPIO? If it stops toggling, starvation is probably occurring. How do the higher priority tasks give up the processor?

    Todd
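A minimal sketch of the idle heartbeat suggested above, written so it also builds on a host. The pin is simulated by a variable; on the target the toggle would be a write to the device's GPIO toggle register. The names `idleHeartbeat`, `heartbeatPin`, and `IDLE_PASSES_PER_TOGGLE` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: idle passes per pin toggle; tune so the blink is visible. */
#define IDLE_PASSES_PER_TOGGLE 1000u

/* Hypothetical stand-in for the GPIO output; on the target this would be
 * a write to the device's GPIO toggle register. */
static volatile uint16_t heartbeatPinState;
static uint32_t idlePassCount;

/* Intended to be registered as an idle function, so it runs only when
 * no task is ready to run. */
void idleHeartbeat(void)
{
    assert(idlePassCount < IDLE_PASSES_PER_TOGGLE);
    if (++idlePassCount >= IDLE_PASSES_PER_TOGGLE) {
        idlePassCount = 0;
        heartbeatPinState ^= 1u;   /* toggle the (simulated) pin */
    }
}

/* Accessor used only to inspect the simulated pin in a host test. */
uint16_t heartbeatPin(void)
{
    return heartbeatPinState;
}
```

In a SYS/BIOS configuration, a function like this is typically registered in the .cfg file with `Idle.addFunc('&idleHeartbeat');`. If the pin stops toggling, something is keeping the CPU out of the idle loop.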

  • Thanks for the reply.  This is the highest priority task so it should not be starved.  The task function is not even called.  Other lower priority tasks are fine.

  • What info if any can you gather from the system when the bad state occurs?

    Is the program out in the weeds or is another thread still running?

  • Other threads are running (their task functions are called). There are two tasks with priority 14 (the highest in the system). The other priority 14 task is OK (its task function is called and it runs), but this task is not (its task function is never called) even though its mode is READY and its priority is 14 (not -1). I gathered this info from a different task using the Task_stat API. The lower priority tasks are OK as well.

  • The next set of questions would be:

    1.  What are you doing in the system?  You keep mentioning that the priority is not -1, which seems to suggest that at some point the Task priority is set to -1?

    2.  How easily can you reproduce the problem?  How long does it take you to see the problem?

    3.  Have you tried "run free" in the debugger to see if you can reproduce the issue?

  • 1) Yes. At the beginning, one startup task (the lowest priority task) runs while the rest of the tasks are set to priority -1. Then the startup task restores their original valid priorities.

    In startup task,

    taskKey = Task_disable();

    // sets back the valid priority to all the tasks - Task_setPri
    Task_restore(taskKey);
    

    NOTE: The main reason why I mentioned the priority -1 is that a task with -1 priority is not executed so I just wanted to say this is not the case.

    2) This is not 100% reproducible, but once it has happened, it recurs easily.

    3) I tried to reproduce the issue with the debugger connected, but so far no luck.

    Thanks!

     

  • Hi, any help is appreciated. What could be the reason for the task function not being called even though the task's priority is 14 and its mode is READY? I also confirmed at run time that the correct function pointer is stored in the task object table, ti_sysbios_knl_Task_Object__table__V. The mode is READY but the task function is not called. Also, lower priority tasks are being executed, which should not happen because this task has the highest priority and its mode is READY.

  • Do you have a way to read state of memory and print it once the problem happens?

    Unless you have some way to do that, there is no way to tell where the Task is.

    Each Task priority level has a ready queue (a doubly linked list) associated with it. Most likely the Task was removed from the priority 14 ready queue. If you could read memory, then we could look at the queues and try to figure out where the task is.

    In the Task module state there's an array of ready queues (readyQ). Is there any way you can print the state of those queues from another Task?
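As a rough illustration of what inspecting such a queue involves, here is a simplified host-side model of a circular doubly linked queue with a self-pointing head. It is a sketch under that assumption, not the SYS/BIOS Queue implementation; `QueueElem` and the `queue_*` functions are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model of a circular doubly linked queue element. */
typedef struct QueueElem {
    struct QueueElem *next;
    struct QueueElem *prev;
} QueueElem;

/* An empty queue head points at itself in both directions. */
void queue_init(QueueElem *head)
{
    head->next = head;
    head->prev = head;
}

int queue_is_empty(const QueueElem *head)
{
    return head->next == head;
}

/* Append elem at the tail of the queue. */
void queue_put(QueueElem *head, QueueElem *elem)
{
    elem->next = head;
    elem->prev = head->prev;
    head->prev->next = elem;
    head->prev = elem;
}

/* Walk the queue. Returns 1 if elem is linked, 0 if it is not, and -1 if
 * a link is inconsistent or the walk exceeds maxElems (possible corruption). */
int queue_contains(const QueueElem *head, const QueueElem *elem, int maxElems)
{
    const QueueElem *e = head->next;
    int n;

    assert(maxElems > 0);
    for (n = 0; e != head; e = e->next, n++) {
        if (n >= maxElems || e == NULL || e->next == NULL ||
            e->next->prev != e) {
            return -1;
        }
        if (e == elem) {
            return 1;
        }
    }
    return 0;
}
```

A task that is READY but not linked into its priority's queue (result 0), or a walk that hits a bad link (result -1), would both point at a corrupted queue element rather than a scheduling decision.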

  • Thanks for the reply.  I will try to print readyQ once the issue happens.

  • I have not output the entire readyQ in the log but I logged "readyQ" in the task object (ti_sysbios_knl_Task_Object__table__V).

    When the issue does not occur, this readyQ has 0xb4f8.  This address comes from the area below so it looks ok.

    0000b4c0 _ti_sysbios_knl_Task_Module_State_0_readyQ__A

    When the issue occurs, this readyQ has 0x38.  This comes from the area below.   

    00000037 ___PLAT__

    As per the OFD200 output, this is xdc.meta


    <2> "xdc.meta"
        Load Address: 0x00000000
        Run Address: 0x00000000
        Size: 0xf6
        Alignment: 1
        Loaded Onto Device: No
        Address Unit Size: 16 bits
        File Offset: 0xfb6
        # Relocs: 0
        Reloc File Offset: 0x00000000
        # Lines: 0
        Line File Offset: 0x00000000
        TI-COFF s_flags: 0x00000050
        TI-COFF s_flag: STYP_COPY
        TI-COFF s_flag: STYP_DATA

    If you have any info on this, please kindly advise.
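One thing the quoted numbers show: 0xb4f8 - 0xb4c0 = 0x38, so the bad value equals the offset of the good pointer from the start of the readyQ array, as if the array base had been zero when the pointer was computed. This may be a coincidence and is not a diagnosis. The sketch below only checks the arithmetic; the assumption that entry N starts at N times a fixed entry size, and the helper names, are mine.

```c
#include <assert.h>

/* Values quoted in the post above (addresses in 16-bit units). */
#define READYQ_ARRAY_BASE 0xb4c0ul  /* _ti_sysbios_knl_Task_Module_State_0_readyQ__A */
#define READYQ_GOOD_VALUE 0xb4f8ul  /* readyQ field when the task runs */
#define READYQ_BAD_VALUE  0x38ul    /* readyQ field when the task never runs */
#define TASK_PRIORITY     14ul

/* Offset of the good pointer from the start of the readyQ array. */
unsigned long good_offset(void)
{
    assert(READYQ_GOOD_VALUE >= READYQ_ARRAY_BASE);
    return READYQ_GOOD_VALUE - READYQ_ARRAY_BASE;
}

/* Implied size of one readyQ entry, assuming entry N starts at N * size. */
unsigned long implied_entry_size(void)
{
    return good_offset() / TASK_PRIORITY;
}
```

With these inputs the offset is 0x38 and the implied entry size is 4 address units, which would fit a queue head made of two pointers.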

  • I checked the readyQ array in the Task module state from a lower priority task. The readyQ entry at index 14 has its own address in its "next"/"prev" fields, which means that it is empty. Because I checked this from a lower priority task, the value is the same whether or not the issue is present.

    I also checked one more thing: the qElem value in the task object table entry for this task. When the issue is not present, "qElem.next" points to the task object of the other task with the same priority (14). When the issue happens, "qElem.next" contains 0x38, which does not make sense.

    To make sure that the memory area is not corrupted, I checked curCoreId in the task object table of this task.  This value is zero regardless of the issue being there or not.

     

  • I gained more info on the issue.

    At the beginning of the main function, readyQ in the task object of this task contains the correct address, index 14 of _ti_sysbios_knl_Task_Module_State_0_readyQ__A.
    Our software initializes the HW (clock, GPIO, UART, etc.). Then it changes the priorities of all the tasks to -1 by calling Task_setPri(), except for the housekeeping (i.e. startup) task, which is the lowest priority task. Thus, all the tasks with priority -1 are in inactiveQ.

    Then our software starts SYSBIOS by calling tibios_start(). The housekeeping task runs, and the application related initialization is performed. After this, I checked readyQ in the task object of this task. It contains the address of inactiveQ in the Task module state, which is correct. Thus, at this point, it looks fine.

    Then the priorities of all the other tasks are set back to their original values by Task_setPri(). Then Task_restore() is called, which triggers a task switch to the highest priority task.

    When the issue occurs, the task function of this task is not called, and readyQ in the task object of this task holds 0x38. The issue easily becomes non-reproducible when I change the code for debugging purposes, which makes it hard for me to find where 0x38 is written to readyQ.

    There are a total of 6 tasks in our software. This task is the 1st entry in the object table (ti_sysbios_knl_Task_Object__table__V). Thus, when there's no issue, this task is the 1st one that is executed, as its priority is the highest, 14. When the issue occurs, this task is not executed, so the other task with priority 14 is executed first.

    I also checked whether the task is in inactiveQ or terminatedQ when the issue happens, but both of them are empty. I will check whether a Hwi (hardware interrupt) is involved, but that does not look likely. Please kindly advise if you have any clue/info.

    [Code snippet]

    In Main function,
    // init HW
    initHW();

    // set priority to -1, except housekeeping task
    for (j = 0; j < COUNT_OF(taskHandles); j++)
    {
        if (taskHandles[j] == STARTUP_TASK)
            continue;
        originalPriorities[j] = Task_getPri(taskHandles[j]);
        Task_setPri(taskHandles[j], -1);
    }

    // start the OS. This function never returns.
    tibios_start();

    In housekeeping task,

    xdc_UInt taskKey;

    // init application
    initApplication();

    // disable task scheduling while the priorities are restored
    taskKey = Task_disable();

    // turn on all tasks.
    for (j = 0; j < COUNT_OF(taskHandles); j++)
    {
        if (taskHandles[j] == STARTUP_TASK)
            continue;

        Task_setPri(taskHandles[j], originalPriorities[j]);
    }

    // re-enable task preemption
    Task_restore(taskKey);

  • Can you explain whether this process is run over and over, and it's only after a while that you see the problem occur?

    Looks like something could be corrupting that pointer in the Task object.

    Could you try adding to your .cfg file:

    Task.objectCheckFlag = true;

    You could also install your own custom check function in the *.cfg file:

    var Task = xdc.useModule('ti.sysbios.knl.Task');

    // Enable Task object data integrity check
    Task.objectCheckFlag = true;

    // Install custom Task object check function
    Task.objectCheckFxn = "&myCheckFunc";


    By default we only check fields that do not change in the task object, but you might be able to customize it.
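To illustrate the idea of checking only fields that do not change, here is a generic host-side sketch: record a check value over the invariant fields at creation and compare it later. The struct layout, the additive check value, and the function names are all hypothetical. This is not the SYS/BIOS implementation, nor the signature expected by `Task.objectCheckFxn`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of task-object fields that should never change
 * after creation. This is not the SYS/BIOS Task_Object layout. */
typedef struct {
    uintptr_t stack;      /* stack base address */
    size_t    stackSize;  /* stack size */
    uintptr_t fxn;        /* entry function address */
    uintptr_t arg0;
    uintptr_t arg1;
} TaskInvariants;

/* Simple additive check value over the invariant fields. */
uint32_t task_check_value(const TaskInvariants *t)
{
    assert(t != NULL);
    return (uint32_t)(t->stack + t->stackSize + t->fxn + t->arg0 + t->arg1);
}

/* Returns 1 if the object still matches the value recorded at creation. */
int task_check_ok(const TaskInvariants *t, uint32_t recorded)
{
    return task_check_value(t) == recorded;
}
```

Note that a check like this would not cover a field such as readyQ, which changes legitimately whenever the task's priority changes. Covering it would need a range check on the pointer instead.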

  • Thanks for the reply. The issue is that the task function is never called, so the task does not run at all. As I mentioned in my previous post, I checked readyQ in the task object of this task after the application related initialization, and it holds the address of inactiveQ, so it looks fine. I also checked readyQ in the module state, and it holds the correct address of the readyQ array (i.e. _ti_sysbios_knl_Task_Module_State_0_readyQ__A). That looks fine too, since readyQ in the task object is assigned the address of readyQ entry 14 when the priority of the task is changed from -1 to 14. Hwi interrupts are disabled inside Task_setPri(), so an interrupt should not be able to interfere there.

    I will try your suggestion. 

    Thanks!

  • objectCheckFlag is not available in the sysbios version (6.35.1.29) that I am using.

  • Is there a possibility of a HW failure (e.g. RAM or flash)? If so, could you let me know how to check for it?

  • No, I'm quite sure that's not the case.  In a case like that, I would expect some sort of exception or bus error...something more catastrophic.
    This sounds more like some sort of race condition that is causing a corruption.

  • Ok, it's just that the issue occurs on only 2 boards. If you have any clue/info on how readyQ in the task object of this task (I've already provided the code snippet) gets the wrong value (0x38), please kindly advise.

  • I have no idea why a value of 0x38 would be put there. This is where you need a debugger set up so you can set a hardware probe point at the address and see when that value gets written.

    Would it be possible to move to a later BIOS version?
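When a debugger cannot be attached, one software alternative is a tripwire: call a cheap range check from several points in the code and latch the first call site at which the watched pointer is no longer valid. The sketch below is host-buildable and makes several assumptions: all names are hypothetical, and the array size (16 entries) and entry size (4 address units) used in the usage note are guesses. In a real system the address of the inactive queue would also have to be accepted as a valid value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record of the first call site that saw a bad pointer. */
static int firstBadSite = -1;

/* Returns 1 if value points at the start of an entry in an array that
 * begins at base and holds count entries of entrySize address units. */
int ptr_in_array(uintptr_t value, uintptr_t base,
                 unsigned count, unsigned entrySize)
{
    assert(count > 0 && entrySize > 0);
    if (value < base) {
        return 0;
    }
    if ((value - base) % entrySize != 0) {
        return 0;
    }
    return (value - base) / entrySize < count;
}

/* Call from several places with a distinct siteId. Latches the first
 * site at which the watched pointer is outside the array. */
void tripwire_check(int siteId, uintptr_t watched, uintptr_t base,
                    unsigned count, unsigned entrySize)
{
    if (firstBadSite < 0 && !ptr_in_array(watched, base, count, entrySize)) {
        firstBadSite = siteId;
    }
}

int tripwire_first_bad_site(void)
{
    return firstBadSite;
}
```

For example, `tripwire_check(2, 0x38, 0xb4c0, 16, 4)` would latch site 2, while a watched value of 0xb4f8 would pass. Since any code change in this thread tended to make the problem disappear, a tripwire like this may hide the issue as well.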

  • Unfortunately, I have not been able to reproduce the issue using a debugger.  Yes, I can try a later BIOS version to find the root cause.  Which version do you recommend?   The only thing is that the issue may not be reproducible.  As I mentioned, just adding some code for debugging purpose makes the issue non-reproducible.  

  • I have not changed the SYSBIOS version, but I investigated further.

    I found that adding a single asm("nop") line makes the issue go away. I put it in code that does not even execute at that point (it only executes more than 5 seconds after the issue would occur). This change causes a 2 byte shift in part of the .text section in Flash, but the RAM sections remain the same.

    Also, instead of putting asm("nop") at that location, I put it in Task_self() in Task.c. This keeps the impact on the .text section to a minimum, as Task_self is placed much closer to the end of the .text section. My application code's text section remains the same in Flash. With this change, the issue also goes away.

    This 2 byte shift does not affect how the code is placed across the Flash sectors.

    Please kindly advise if you have any info/clue.  

      

  • There is no known bug where a shift in code by 2 bytes would fix an issue.

  • Thanks for the reply. I tried a later version of SYS/BIOS (6.35.6.56). With that, the issue goes away. Since the aforementioned 2 byte shift in the flash also makes the issue go away, I cannot say for sure what the root cause is.