GateMutex badContext error

I'm looking for an explanation for why I see the GateMutex_A_badContext error and info on ways to get rid of it. I'm using both cores of an F28M35H52C1 Concerto part, BIOS version 6.34.04.22, and CCS version 5.3.0.00090. The M3 core has 12 HWIs and the C28 core has 13 HWIs. I get the error when I add a task to the M3 that uses MessageQ to repeatedly send a message to the C28. In the C28, code execution never gets past MessageQ_get() and eventually it hits the abort() function and prints the following error to the console:

ti.sysbios.gates.GateMutex: line 97: assertion failure: A_badContext: bad calling context. See GateMutex API doc for details.

xdc.runtime.Error.raise: terminating execution

From looking at the API documentation for GateMutex, I found that using SysMin instead of SysStd and setting BIOS.rtsGateType to BIOS.GateHwi eliminates the error. This worked for my project and allows my code to run fine. However, in this project real-time response to interrupts is important, which makes me concerned about using BIOS.GateHwi since the BIOS API documentation states:
GateHwi — Interrupts are disabled and restored to maintain re-entrancy. This is a very efficient lock but will also result in unbounded interrupt latency times. If real-time response to interrupts is important, you should not use this gate to lock the RTS library.

I stumbled upon other workarounds for this error that I can't explain:

1. Getting rid of any one of the Hwi.creates in the C28's .cfg. It doesn't matter which one.

2. Adding another task to the C28's .cfg. It can be a task w/ no code in it.

Both of these get rid of the GateMutex error and allow the program to run fine, but I would like to know why. Why do I see the error in the first place and why do these workarounds get rid of it?

Thanks,

Nick

  • Hi Nick,

    The "A_badContext" assertion failure occurs if GateMutex_enter() is called from a Hwi or Swi. Can you confirm whether any of your Hwi's or Swi's are calling GateMutex_enter()?

    There is one other possibility. The rtsGateType is GateMutex by default, which means that the run-time support (RTS) library APIs that use the rtsGate can only be called from main() or from a Task context. If Hwi's or Swi's are making calls to RTS APIs, it can result in this assertion failure. This is most likely what you are seeing, since the assertion failure does not occur once you change the rtsGateType. Are there any RTS library calls in any of your Hwi's or Swi's?

    I am not sure how the workarounds you listed are helping. My guess is that one of the above reasons is causing the assertion failure in your project.

    Best,

    Ashish
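
The context rule described above can be modeled in a few lines of plain C. This is only a sketch: the enum names and functions below are hypothetical stand-ins, not the SYS/BIOS API, and the real check lives inside GateMutex_enter(). It assumes, per the reply above, that a GateMutex-style gate may only be entered from main() or a Task, while a GateHwi-style gate may be entered from any context.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the calling contexts; not the real BIOS.h enum. */
typedef enum { CTX_HWI, CTX_SWI, CTX_TASK, CTX_MAIN } ctx_t;

/* Hypothetical gate kinds mirroring the two rtsGateType choices discussed. */
typedef enum { GATE_MUTEX, GATE_HWI } gate_t;

/* A mutex-style gate may block, so only contexts that can block may enter it. */
bool gate_enter_allowed(gate_t gate, ctx_t ctx)
{
    if (gate == GATE_HWI) {
        return true; /* disables interrupts instead of blocking */
    }
    return ctx == CTX_TASK || ctx == CTX_MAIN;
}

/* An RTS call such as printf() takes the RTS gate before doing any work. */
bool rts_call_allowed(gate_t rts_gate, ctx_t caller)
{
    return gate_enter_allowed(rts_gate, caller);
}

/* Model of the call itself: a bad context trips an assertion. */
void rts_call(gate_t rts_gate, ctx_t caller)
{
    assert(rts_call_allowed(rts_gate, caller));
    /* ... the library routine would run here ... */
}
```

Under these assumptions, rts_call_allowed(GATE_MUTEX, CTX_SWI) is false, which corresponds to the A_badContext assertion, while the same call with GATE_HWI is allowed at the cost of running with interrupts disabled.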

  • Hi Ashish,

    Thanks for your response. None of my Hwi's or Swi's are calling GateMutex_enter(). I don't believe any of my Hwi's or Swi's have RTS calls. The only TI calls I found in the Hwi's were to Timestamp_get32(). There are calls to functions from IQmathLib.h in the Swi's, but I don't think those are RTS calls, right? I also noticed that decreasing Program.stack in the C28 .cfg gets rid of the error and lets my program run. Maybe it is a memory issue?

    Thanks,

    Nick

  • Hi Nick,

    Timestamp_get32() should be safe to call, and I think you are right about the IQmathLib calls. I am surprised that reducing the Program stack gets rid of the error. Usually it's the other way around.

    Anyway, what you can try is to go back to your old code so that you can reproduce the problem, and then add a breakpoint at the xdc_runtime_Assert_raise__I function. You can find it by entering the function name in the disassembler window search. You should hit this breakpoint when the assertion failure condition is met. At that point you can use ROV to determine the currentThreadType.

    At this point you can also look at the stack back trace. I think the call trace along with the thread type should provide some valuable info that we can use to find the root cause.

    Best,

    Ashish

  • Hi Ashish,

    Ok, I got it to break on xdc_runtime_Assert_raise__I. The currentThreadType is blank.

    Here is the call trace:

    I notice the RtsGateProxy_enter call, but nothing is under it. What does that mean?

    Thanks,

    Nick

  • Hi Ashish,

    After increasing the sizes of some of my tasks, I broke on xdc_runtime_Assert_raise__I and the currentThreadType shows Swi:


    Does this mean a Swi is making an RTS call?

    Nick

  • I removed the Swi's from my project's .cfg file and my program runs a little longer than it did before. It still asserts the GateMutex error though and the currentThreadType in ROV shows Swi. The only Swi is the ti_sysbios_knl_Clock_workFunc__E:

    Nick

  • Hi Nick,

    Yes, that should mean a Swi is calling a library function that is trying to acquire the RTS lock.

    Unfortunately the stack trace is not deep enough, so we don't know who called rtsLock. Can you add a breakpoint at ti_sysbios_BIOS_rtsLock__I? I know it is a bit painful if you have multiple library calls, but the last hit before the assert breakpoint will identify the caller. Please note the name of the caller function each time you hit this breakpoint.

    Best,

    Ashish

  • Nick,

    Clock_workFunc__E is safe and cannot cause the assertion failure, as it never calls rtsLock. Can you try adding a breakpoint at rtsLock as I suggested in my previous post?

    Best,

    Ashish

  • Hi Ashish,

    I put a breakpoint in rtsLock() in BIOS.c. It looks like it's called from fputc() in fputc.c. The currentThreadType shows Swi. When I resume, I see the GateMutex error, so it only hits this breakpoint once.

    Thanks,

    Nick

  • Ashish,

    I found something interesting when I looked at the Task module in ROV when the Assert breakpoint is hit. The highest priority Ready task shows a stackSize of 0. It is created with a stackSize of 350 and runs several thousand times before the breakpoint is hit. Here are 2 screenshots of this from 2 different trials:

    Nick

  • I set a breakpoint in rtsLock() in BIOS.c and it looks like it's called from fputc() in fputc.c. When I hit Resume, the GateMutex error shows and the core is in the abort() function.

    I only see it hit this breakpoint once.

    Nick

  • Ashish,

    I added the Swi's back that I removed yesterday and now the project has been running fine for several minutes. Earlier it would only run a few seconds. I don't quite understand it because I'm pretty sure it is the same code I had yesterday that was giving me the GateMutex error. I've run it several different times and cannot reproduce the error. Earlier, I was experimenting with .taskStack in the .cmd file, putting it in shared memory, but the error still occurred. I commented out .taskStack from the .cmd (it wasn't in the .cmd at all originally) and added back my Swi's and the program runs fine. Maybe memory had gotten corrupted somehow before, and something I did fixed it, I don't know.

    Nick

  • Hi Nick,

    The Task ROV view does look a bit worrying. Some sort of memory corruption might be happening. When you see a red box showing up in ROV, if you hover the mouse over it, you will see the error message. The message might be helpful.

    I also wanted to point out that if fputc() is calling rtsLock, then it is possible that System_printf() or printf() was called.

    Can you try cleaning your project and rebuilding? That might help reproduce the problem. I think we need to figure out the cause so that this problem does not recur.

    Best,

    Ashish

  • Nick,

    Are you disabling stack checking for the tasks in the *.cfg file? I am wondering if a task stack is overflowing and the overflow is not being caught. Can you also check the Hwi ROV view to verify that the Hwi stack looks OK?

    Best,

    Ashish
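
One common way to catch the kind of overflow asked about above is to pre-fill a stack with a known pattern and later check how much of the pattern survives. The sketch below is a generic illustration of that idea, not SYS/BIOS code: the fill value, the function names, and the assumption that the stack grows from the highest index toward index 0 are all choices made for this example (growth direction is architecture-specific).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FILL 0xBEu /* arbitrary sentinel chosen for this sketch */

/* Paint the whole stack with the sentinel before the thread first runs. */
void stack_fill(uint8_t *stack, size_t size)
{
    assert(stack != NULL);
    for (size_t i = 0; i < size; i++) {
        stack[i] = FILL;
    }
}

/*
 * Peak usage in bytes, assuming the stack grows downward from
 * stack[size - 1] toward stack[0]. Everything from the first
 * overwritten byte up to the top counts as used.
 */
size_t stack_peak(const uint8_t *stack, size_t size)
{
    size_t untouched = 0;

    assert(stack != NULL);
    while (untouched < size && stack[untouched] == FILL) {
        untouched++;
    }
    return size - untouched;
}

/* If the last byte of the stack lost its sentinel, assume an overflow. */
bool stack_overflowed(const uint8_t *stack, size_t size)
{
    assert(stack != NULL);
    return size == 0 || stack[0] != FILL;
}
```

A check like this is heuristic: data that happens to equal the sentinel is counted as unused, and a write that jumps past the end of the stack without touching the last byte goes unnoticed.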

  • Ashish,

    I verified that I'm not disabling stack checking for tasks. I did a clean and build and the project still runs fine. To try to reproduce the problem, I took out the Swi's like I did yesterday. This time I see the GateMutex error after a few seconds and I see the task stackSize of 0 in ROV. The error when I hover over it is:

    Error: Problem fetching Task stack: Error: fetchArray called with length 0.

    The Hwi ROV view looks ok. The stack peak is well below the stack size:

    Thanks,

    Nick

  • Hi Nick,

    When you remove all the Swi's from your app, you still have the Clock Swi function, which calls the clock handler function. The clock handler is therefore called within Swi context. Is there a System_printf() call within the clock handler function? Can you share your clock handler function?

    Looking at the Task Module ROV screenshots you posted earlier it looks like some sort of memory corruption is occurring. I am not sure how that is connected with the GateMutex error, but I feel it might be causing it.

    Best,

    Ashish

  • Hi Ashish,

    I haven't touched the Clock Swi function. This is it:

    Void Clock_workFunc(UArg arg0, UArg arg1)
    {
        Queue_Elem  *elem;
        UInt hwiKey, count;
        UInt32 time, compare;
        Clock_Object *obj;
        Queue_Handle clockQ;

        hwiKey = Hwi_disable();
        time = Clock_module->ticks;
        count = Clock_module->swiCount;
        Clock_module->swiCount = 0;
        Hwi_restore(hwiKey);

        /* Log when count > 1, meaning Clock_swi is delayed */
        if (count > 1) {
            Log_write1(Clock_LW_delayed, (UArg)count);
        }

        compare = time - count;

        /*
         * Here count can be zero. When Clock_tick() runs it increments
         * swiCount and posts the Clock_workFunc. In Clock_workFunc we
         * get the value of swiCount atomically. Before we read swiCount, an
         * interrupt could occur, Clock_tick() will post the swi again.
         * That post is unnecessary as we are getting ready to process that
         * tick. The next time this swi runs the count will be zero.
         */

        while (count) {

            compare = compare + 1;
            count = count - 1;

            /* Traverse clock queue */

            clockQ = Clock_Module_State_clockQ();
            elem = Queue_head(clockQ);

            while (elem != (Queue_Elem *)(clockQ)) {
                obj = (Clock_Object *)elem;
                elem = Queue_next(elem);
                /* if event has timed out */
                if ((obj->active == TRUE) && (obj->currTimeout == compare)) {

                    if (obj->period == 0) { /* oneshot? */
                        /* mark object idle */
                        obj->active = FALSE;
                    }
                    else {                  /* periodic */
                        /* refresh timeout */
                        obj->currTimeout += obj->period;
                    }

                    Log_write2(Clock_LM_begin, (UArg)obj, (UArg)obj->fxn);

                    /* call handler */
                    obj->fxn(obj->arg);
                }
            }
        }
    }

    I have begun to think this may be a CCS or emulator issue. I can reproduce the problem by terminating my Debug session in the Debug perspective, then launching my ccxml file again and loading the same code. The error will occur. If I then comment out the creation of the Swi's and the calls to Swi_post() in my app, build and load it, the code will run for a few seconds and then it will assert the GateMutex error. Then if I undo those changes by putting the Swi's and Swi_post()'s back in my app, build, load, and run, the code runs fine with no error. I can load the code again at this point with no changes and it will run fine. But if I terminate the Debug session, launch the same ccxml, and load the same code again, the error will occur.

    Thanks,

    Nick
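
The catch-up loop in the function quoted above is easier to follow with concrete numbers. The following is a stripped-down model in plain C, assuming a single clock object instead of a queue and no interrupt locking or logging; the struct and function names are hypothetical and only the compare/count arithmetic is taken from the quoted code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for a clock object. */
typedef struct {
    bool active;
    uint32_t curr_timeout; /* tick at which the object next expires */
    uint32_t period;       /* 0 means one-shot */
    int fired;             /* how many times the handler would have run */
} clock_obj_t;

/*
 * Process `count` pending ticks, the last of which is tick `time`.
 * With time = 105 and count = 3, compare takes the values 103, 104, 105.
 */
void process_ticks(clock_obj_t *obj, uint32_t time, uint32_t count)
{
    uint32_t compare = time - count;

    assert(obj != NULL);
    while (count) {
        compare = compare + 1;
        count = count - 1;

        if (obj->active && obj->curr_timeout == compare) {
            if (obj->period == 0) {
                obj->active = false;               /* one-shot: go idle */
            } else {
                obj->curr_timeout += obj->period;  /* periodic: re-arm */
            }
            obj->fired++; /* stands in for obj->fxn(obj->arg) */
        }
    }
}
```

In this model a one-shot object due at tick 104 fires once when ticks 103 to 105 are processed together, and a call with count equal to zero does nothing, which matches the case described in the source comment.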

  • A very fruitful debug session today.

    Memory_alloc() is raising an Error within the C28 because there are no more HeapBuf blocks available to provide to MessageQ_alloc() within TransportCirc_swiFxn().

    The Error_raise() code is trying to print the error to the console which results in the GateMutex_enter() Assert...

    It appears that the C28 is being interrupted so often that it cannot return to the background thread long enough to switch to the task that is dedicated to servicing the incoming messages, which leads to the HeapBuf exhaustion.

    The condition could be caused by the M3 core starting, for some reason, to issue messages too often, or by the C28 not servicing the interrupt properly and leaving it always pending.

    We'll do some research at our end and provide more advice ASAP.

    Alan
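
The exhaustion mechanism in this diagnosis can be illustrated with a small fixed-block pool model in plain C. This is a sketch under stated assumptions, not the HeapBuf or MessageQ implementation: the pool depth, the names, and the tick-based simulation are all hypothetical. Each incoming message takes one block in the receive path, and only the consumer task gives blocks back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_BLOCKS 4 /* hypothetical pool depth */

typedef struct {
    int free_blocks;   /* blocks currently available */
    int failed_allocs; /* allocation attempts that found the pool empty */
} pool_t;

void pool_init(pool_t *p)
{
    assert(p != NULL);
    p->free_blocks = NUM_BLOCKS;
    p->failed_allocs = 0;
}

/* Receive path: one block per incoming message, fails when none are left. */
bool pool_alloc(pool_t *p)
{
    if (p->free_blocks == 0) {
        p->failed_allocs++;
        return false;
    }
    p->free_blocks--;
    return true;
}

/* Consumer task: returns one block after handling a message. */
void pool_free(pool_t *p)
{
    assert(p->free_blocks < NUM_BLOCKS);
    p->free_blocks++;
}

/*
 * Run `ticks` steps. In each step `arrivals` messages come in and the
 * consumer handles up to `serviced` of them (0 models a starved task).
 * Returns the number of failed allocations.
 */
int simulate(int ticks, int arrivals, int serviced)
{
    pool_t p;
    int in_flight = 0;

    pool_init(&p);
    for (int t = 0; t < ticks; t++) {
        for (int a = 0; a < arrivals; a++) {
            if (pool_alloc(&p)) {
                in_flight++;
            }
        }
        for (int s = 0; s < serviced && in_flight > 0; s++) {
            pool_free(&p);
            in_flight--;
        }
    }
    return p.failed_allocs;
}
```

In this model, a consumer that keeps pace never sees a failed allocation, while a consumer that never runs exhausts the pool after NUM_BLOCKS messages no matter how slowly they arrive.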

  • Please send us the M3 and 28x CCS projects we were using today that exhibit the problem with a single Concerto device running.

    Alan

  • Alan,

    I tried sending you a zip file of the code this morning using my company's file upload service (since we cannot email zips). It should appear in your inbox as from "files@sendthisfile.com". If you didn't get it, let me know and I will try something else. Or you could contact Lenio. I sent him the code for all 6 projects last week; the projects from our meeting are Rectifier_C28x and Rectifier_M3.

    As for the code, I tried reducing the rate of messages being sent to the C28. Originally, the M3 was sending 2 messages per ms. I changed it to 1 per 100 ms and the error still occurs.

    Nick

  • I received your email. Thanks for the info regarding slowing the message rate.

    Alan