When all tasks go to sleep, code jumps (crashes?) to FXN_F_selfLoop

BryanHughes

I'm running CCS 3.3 with DSP/BIOS 5.33.06 and code generation tools 5.2.2. I have a series of tasks that don't always run at the same time. When the right circumstances occur and all tasks go to sleep (priority -1), my code jumps to FXN_F_selfLoop, and all of my SWIs stop working (aka nothing can get it to leave that code). Any ideas what is causing this? If I take the TSK_setpri() calls out of one task that causes it to run all the time, and consequently one task is running all the time, then this behavior does not occur. Any ideas what is causing this?

over 15 years ago

0 BryanHughes over 15 years ago

Intellectual 570 points

Slight update...hardware interrupts stopped working too. The flags are still being set in IFR, and IER is still set properly, but the interrupt routine is never called. I've done a temporary workaround...instead of other tasks waking this task up, I set it to poll some variables that the other tasks (started by interrupts) use to wake up this thread. It works, but isn't ideal and I would like to properly fix this.

0 RandyP over 15 years ago in reply to BryanHughes

TI__Guru* 84110 points

It probably will not matter, but which specific C2000 device are you using?

Every application is different and how priorities are handled is different. Even though I do not fully understand how you are starting and managing your tasks, you do not seem to be doing anything that should cause the bad behavior that you get.

The BIOS scheduler should always be running, and it should be able to handle the absence of any active tasks. In fact, it is perfectly valid to build a BIOS project that only uses HWIs or only HWIs and SWIs.

To help us figure out a fix for your case, it would be helpful to understand a little more about your intended program flow. To start, the interaction between TSK/SWI/HWI and any IDL tasks might be useful.

0 BryanHughes over 15 years ago in reply to RandyP

Intellectual 570 points

I'm using the F2808. The project itself involves getting some F2808's connecting via SPI to do some parallel processing. I'm currently implementing a subset of MPI, which is when I noticed the behavior.

I have a total of 9 tasks (excluding idle) that basically all try and sleep whenever they aren't doing anything in order to increase performance (I found time slicing is very expensive). The tasks are assigned to two different priority levels, and time slice with other threads of the same priority.

FollowUpMonitor - this task basically provides PRD like functionality while providing a higher level management functionality. This task runs off of a PRD (the PRD itself raises this tasks priority from -1).

ProcessInboundMessages - this task processes any inbound messages and provides inbound buffer capabilities. This task is woken anytime a message arrives on the SPI (interrupt driven)

ProcessOutboundMessagesonPortx - this task processes any outbound messages on port x and provides outbound buffer capabilities (there are four total for the four SPI ports) and is woken up by any other task that needs to transmit something on the given port

MPISendService - in charge of coordinating MPI transmissions. This is woken up as a result of an MPI_Send() or MPI_ISend() call

MPIReceiveService - in charge of coordinating MPI receives. This is woken up whenever a data transfer specific to MPI is received by the ProcessInboundMessages task

UserService - this is the MPI "program" and is woken up once the system has finished initializing itself. This is the task that I set to repeatedly yield instead of sleeping to keep the system stable. Ideally this task would sleep whenever a blocking MPI send/receive is called.

I only use the idle task to pump RTDX info (aka I don't do anything custom with it)

I don't have any interrupts enabled except for the four SPI receive interrupts (fyi, I'm using the SPI FIFOs, and the interrupt is the FIFO level interrupt).

I have three other PRDs: one periodically performs some house keeping (handles transmission timeouts, etc), another refreshes a seven segment display, and the last drives a heartbeat LED

0 RandyP over 15 years ago in reply to BryanHughes

TI__Guru* 84110 points

My goal is to help you get your system working rather than trying to debug the specific issue you have run into. My recommendation would be to use semaphores to cause your tasks to "sleep" instead of using TSK_setpri. And the only reason is that this is how I have always seen task management done and the way we teach it in our workshops.

When a task gets to the point where it is ready to sleep, that means it needs new data or new conditions before there will be anything useful to do. At this point, you can add the line of code SEM_pend( &SEM_mySemaphore, SYS_FOREVER ); and the task will sleep until some other code sets the semaphore with the code SEM_post( &SEM_mySemaphore );

For example, whenever a message arrives on the SPI, the HWI would call SEM_post( &SEM_ProcessInboundMessages ); which would then allow ProcessInboundMessages to come out of a SEM_pend( &SEM_ProcessInboundMessages, SYS_FOREVER ) statement. When ProcessInboundMessages is finished processing on that message, then it will loop back to its SEM_pend line and wait for the next message.

I believe that any place where you are switching the priority back-and-forth, you can also use a SEM_pend/SEM_post pair and get the same functionality. I have never heard of anyone have the problems that you have doing it that way.

0 Brad Griffis over 15 years ago in reply to BryanHughes

TI__Guru*** 125430 points

BryanHughes said:

Slight update...hardware interrupts stopped working too. The flags are still being set in IFR, and IER is still set properly, but the interrupt routine is never called. I've done a temporary workaround...instead of other tasks waking this task up, I set it to poll some variables that the other tasks (started by interrupts) use to wake up this thread. It works, but isn't ideal and I would like to properly fix this.

Did your code clear GIE and not re-enabled it?

0 BryanHughes over 15 years ago in reply to Brad Griffis

Intellectual 570 points

I double checked and no, it's not an issue with GIE. I just enable it during initialization and then don't touch it...I only mess with PIE enables, and I haven't updated the interrupt code in months, and this behavior didn't appear until a few days ago.

I'm looking into using semaphores, but it will take a while to vet the changes. I'll get back to you when I have more info.

0 BryanHughes over 15 years ago in reply to RandyP

Intellectual 570 points

I've switched everything over to semaphores: TSK_sleep() and TSK_setpri() calls have been replaced with SEM_pend() calls. My code got a nice speed boost, but it still crashes.

I have had a new problem crop up: kernel view crashes CCS. Not sure if it's related, but at this point in time who knows. I created a new thread over at http://e2e.ti.com/forums/t/11312.aspx

0 RandyP over 15 years ago in reply to BryanHughes

TI__Guru* 84110 points

This is good news for TI because you are less likely to be worrying about a DSP/BIOS bug in the interrupt processing, which is what the "all tasks at priority -1" problem was pointing to.
This is good news for you since you get a speed boost. And it is a good datapoint for me to remember when considering code optimization techniques.
Did you do a search on "kernel object viewer" and "KOV" to see if there are any other reports on this? We had problems with KOV in CCS 2, but I thought it was working well in CCS 3. I have flagged your KOV posting to watch the progress there. I have not used it much so I have no useful opinion.
Even though changing to semaphores was a time-consuming effort that did not solve your problem, it may make it easier to debug. And at least you got the speed boost.
Since you are using a common system approach (i.e. one that I understand) with the SEM_pend/post calls, we are pretty much back at the beginning of trying to find why you are crashing.

BryanHughes said:

I double checked and no, it's not an issue with GIE. I just enable it during initialization and then don't touch it...I only mess with PIE enables,

This implies to me that you searched the code for changes to GIE and found none. Even though Brad specifically asked about "your code" affecting GIE, he was most likely looking for what the value is for GIE when the crash occurs.

Do you have CCS access when it crashes so you can inspect registers? It is implied that you do have this access since you said that IFR and IER are still behaving. If IFR is set and IER is set and GIE is set, then you should immediately branch to an interrupt routine.

BryanHughes said:

... and I haven't updated the interrupt code in months, and this behavior didn't appear until a few days ago.

What did change in your code "a few days ago"? That is usually where the problem is.

Can you add a slow PRD function that you can use to detect a heartbeat? When your code crashes, you could put a breakpoint inside the PRD function to see if you can get to it, and that will tell a little more about what is going on.

You could also set a breakpoint on FXN_F_selfLoop then look at the Call Stack to find out how it got there.

0 BryanHughes over 15 years ago in reply to RandyP

Intellectual 570 points

RandyP said:

This is good news for TI because you are less likely to be worrying about a DSP/BIOS bug in the interrupt processing, which is what the "all tasks at priority -1" problem was pointing to.

This is good news for you since you get a speed boost. And it is a good datapoint for me to remember when considering code optimization techniques.

Did you do a search on "kernel object viewer" and "KOV" to see if there are any other reports on this? We had problems with KOV in CCS 2, but I thought it was working well in CCS 3. I have flagged your KOV posting to watch the progress there. I have not used it much so I have no useful opinion.

Even though changing to semaphores was a time-consuming effort that did not solve your problem, it may make it easier to debug. And at least you got the speed boost.

Since you are using a common system approach (i.e. one that I understand) with the SEM_pend/post calls, we are pretty much back at the beginning of trying to find why you are crashing.

Yes, I was quite pleased with the increase. Note: my project is very data driven with a variety of timers...I suspect that the speed increase from the semaphores also had an affect on the timers (things got in on the clock before), or possibly tasks contending for resources were able to get them faster due to more effecient time slicing...or something to that effect. I am glad I made the switch.

RandyP said:

This implies to me that you searched the code for changes to GIE and found none. Even though Brad specifically asked about "your code" affecting GIE, he was most likely looking for what the value is for GIE when the crash occurs.

Do you have CCS access when it crashes so you can inspect registers? It is implied that you do have this access since you said that IFR and IER are still behaving. If IFR is set and IER is set and GIE is set, then you should immediately branch to an interrupt routine.

Yes I can inspect registers. Technical question, when we say "GIE", are we really talking about INTM in the Status 1 register? Or am I missing something? This is interesting, INTM is set to 1 after the crash (I verified it's set to 0 before the crash). I never set the INTM flag at all in my code. I guess that explains why it gets stuck...now to just figure out why.

RandyP said:

What did change in your code "a few days ago"? That is usually where the problem is.

Can you add a slow PRD function that you can use to detect a heartbeat? When your code crashes, you could put a breakpoint inside the PRD function to see if you can get to it, and that will tell a little more about what is going on.

You could also set a breakpoint on FXN_F_selfLoop then look at the Call Stack to find out how it got there.

The code I changed is a bit of a problem: the specific code I changed when I first noticed the behavior was I added a parallel implementation of convolution that calls some of the other MPI functions, but other than that does nothing with semaphores or other task related activities. The MPI functions were tested independently first and didn't cause this behavior, but when brought together they now cause problems. Other things I've done recently: until about two weeks ago I had most of my code loaded into FLASH, but there were some odds and ends still being loaded directly into RAM, but I've since switched everything to run from FLASH. It tested out OK at the time. I also added the MPI routines, including the two MPI tasks.

When I first started designing this system about two years ago, one of the first things I did was add a heartbeat LED tied to an SWI that goes off twice a second. This was actually how I first noticed something was wrong when the heartbeat LED stopped flashing (I verified with breakpoints that the SWI is not triggering after the crash).

I hadn't thought about setting a breakpoint in FXN_F_selfLoop (good idea). Unfortunately the call stack only shows FXN_F_selfLoop()

0 Brad Griffis over 15 years ago in reply to BryanHughes

TI__Guru*** 125430 points

Sorry, INTM is the correct bit. It's the same purpose though opposite polarity of what we call "GIE" in some of our other devices. When an interrupt occurs INTM will automatically be set to 1 in the hardware (so that interrupts won't nest automatically) as you execute your ISR. If you are executing code in your ISR that goes awry then perhaps you would end up in this state. ? Is your convolution code that you changed executed from an ISR?

Are you making any BIOS function calls from your ISR, e.g. SEM_post? If so, you must use "the dispatcher".

0 BryanHughes over 15 years ago in reply to Brad Griffis

Intellectual 570 points

My interrupt code is pretty brief. I do call SEM_post() and SEM_count() and I am using the dispatcher (e.g. bios.HWI.instance("HWI_INT6").useDispatcher = 1;). Below is my interrupt code. If it would be helpful, I can attach my project code, although it is kinda large (~7500 lines).

void PortARXInterrupt()
{
    Uint16 i, flit[NUM_CHARACTERS_IN_FLIT];

    // Buffer the flit
    for(i = 0; i < NUM_CHARACTERS_IN_FLIT; i++)
        flit[i] = (spiaRegisters.SPIRXBUF & 0x0FFF);

    if(flit[0] < 0x0F00)
    {
        // Add the port to the left-most 4 bits of the first flit characer
        flit[0] += PORTA << 12;

        // Enque the flit if there is room and wake up the processing thread
        Uint16 i;

        // If there is room, que the flit
        if(!globals.processing.inboundFlitQueFull)
        {
            // Copy the flit to the que
            for(i = 0; i < NUM_CHARACTERS_IN_FLIT; i++)
                globals.processing.inboundFlitQue[globals.processing.inboundFlitQueHead][i] = flit[i];

            // Check if the buffer is full
            if(globals.processing.inboundFlitQueHead == globals.processing.inboundFlitQueTail - 1 ||
                (globals.processing.inboundFlitQueTail == 0 && globals.processing.inboundFlitQueHead == INBOUND_FLIT_BUFFER_SIZE - 1))
            {
                // Assert all MARB pins to prevent anything else from being transmitted
                gpioCtrlRegisters.GPADIR.bit.MARBA = 1;
                gpioDataRegisters.GPADAT.bit.MARBA = 1;
                gpioCtrlRegisters.GPADIR.bit.MARBB = 1;
                gpioDataRegisters.GPADAT.bit.MARBB = 1;
                gpioCtrlRegisters.GPADIR.bit.MARBC = 1;
                gpioDataRegisters.GPADAT.bit.MARBC = 1;
                gpioCtrlRegisters.GPADIR.bit.MARBD = 1;
                gpioDataRegisters.GPADAT.bit.MARBD = 1;

                // Set the inbout flit que full flag
                globals.processing.inboundFlitQueFull = true;

                // Update the statistics
                globals.statistics.inboundQueFullCount++;

                // Raise an error
                LogWarning(__FILE__,__LINE__);
                return ERROR;
            }

            // Inrement the index
            globals.processing.inboundFlitQueHead++;
            if(globals.processing.inboundFlitQueHead == INBOUND_FLIT_BUFFER_SIZE)
                globals.processing.inboundFlitQueHead = 0;

            // Update the statistics
            globals.statistics.currentInboundQueLoad++;
            if(globals.statistics.currentInboundQueLoad > globals.statistics.maxInboundQueLoad)
                globals.statistics.maxInboundQueLoad = globals.statistics.currentInboundQueLoad;

            // Wake up the processing thread
            if(SEM_count(&ProcessInboundFlitsSem) == 0)
                SEM_post(&ProcessInboundFlitsSem);

            return SUCCESS;
        }
        else
        {
            // Wake up the processing thread
            if(SEM_count(&ProcessInboundFlitsSem) == 0)
                SEM_post(&ProcessInboundFlitsSem);

            // Set the error
            LogWarning(__FILE__,__LINE__);
            return ERROR;
        }
    }

    // Reset the SPI to prevent things from getting shifted around
    spidRegisters.SPICCR.bit.SPISWRESET = 0;
    spidRegisters.SPICCR.bit.SPISWRESET = 1;

    // Acknowledge the interrupt and re-enable it
    pieRegisters.PIEACK.bit.ACK6 = 1;
    spiaRegisters.SPIFFRX.bit.RXFFINTCLR = 1;
}

void LogError(char* file, Uint16 line)
{
    // Update the statistics
    globals.statistics.numErrors++;

#if PRINT_ERRORS == true
    LOG_printf(&LOG_error,"Error: %s at line %d.",file,line);
#endif
}

Note: PRINTER_ERRORS is currently false and NUM_CHARACTERS_IN_FLIT is 13, so the for loops are pretty short.

0 BryanHughes over 15 years ago in reply to BryanHughes

Intellectual 570 points

I fixed it. Turns out it was a stack overflow in the MPI tasks. After increasing the stack sizes it works fine now. Why the behavior manifested itself the way it did is still a mystery, but it does work now, and I can successfully open the KOV (if only it hadn't been crashing before...).

Thank you all for your help, you two have been invaluable.

C2000™︎ microcontrollers

C2000 microcontrollers forum

When all tasks go to sleep, code jumps (crashes?) to FXN_F_selfLoop