Hello again,
I've been getting great support from this community and hope you can help once again.
I am using the production silicon part on the launchpad rev 4.1. Service packs have been applied.
I think I may have found a condition that can cause either a deadlock or a CPU fault. The problem involves the asynchronous function SimpleLinkHttpServerCallback and other SimpleLink API calls made at the same time (e.g. sl_WlanRxStatGet).
What I am trying to do is periodically get the Rx Stats while also running the web server. A background task executes the sl_WlanRxStatGet() call every 3 seconds. Additionally, I have some web pages that update some data every so often. It runs great for a while, but then dies after a random amount of time. I aggravated the issue by requesting web pages with user-defined tokens very quickly, and it would then last only seconds before faulting or deadlocking. The issue also occurs when pages are requested about every 2 seconds; I just sped up the requests to make it happen faster.
I think the asynchronous calls for the HTTP user tokens and the API calls are interleaving. In the ideal case, the calls would go like this:
Request HTTP Token
Response HTTP Token
Request Rx Stats
Response Rx Stats
When the faults occur, I think the sequence is this:
Request Rx Stats
Request HTTP Token
Response HTTP Token
<Deadlock waiting on a semaphore>
I initially ran my program under FreeRTOS and then switched to TI-RTOS for better thread diagnostics. Both OSes produced the same results. While running under TI-RTOS, I triggered the condition and then paused the debugger to investigate. Using the RTOS tools built into CCS, I looked at the threads. The thread that calls the API for the Rx Stats (measurementsTask) has the following status:
0x20018140,,0x1,1,binary,0,Fxn: measurementsTask, priority: 1, pendState: Waiting forever
0x20018b98,,1,Blocked,measurementsTask,0x0,0x0,984,2048,0x20018be8,n/a,n/a,Semaphore: 0x20018140
I looked into the call stack of this thread and it produced this:
0x20018b98, Fxn: measurementsTask, Task Mode: Blocked,
0 ti_sysbios_knl_Task_schedule__I() at Task.c:124,PC = 0x2000D8DB FP = 0x20019288
1 ti_sysbios_knl_Task_restore__E(unsigned int) at Task.c:324,PC = 0x20011E4E FP = 0x200192A8
2 ti_sysbios_knl_Semaphore_pend__E(struct ti_sysbios_knl_Semaphore_Object *, unsigned int) at Semaphore.c:191,PC = 0x200092D0 FP = 0x200192B0
3 osi_SyncObjWait(void * *, unsigned int) at osi_tirtos.c:250,PC = 0x20007812 FP = 0x20019318
4 _SlDrvMsgReadCmdCtx() at driver.c:1031,PC = 0x2000DDEE FP = 0x20019328
5 _SlDrvCmdOp(struct <unnamed> *, void *, struct <unnamed> *) at driver.c:251,PC = 0x2000F21E FP = 0x20019340
6 sl_WlanRxStatGet(struct <unnamed> *, unsigned long) at wlan.c:779,PC = 0x20011824 FP = 0x20019360
7 measurementsTask(void *) at sensors.c:143,PC = 0x200074C4 FP = 0x20019370
8 ti_sysbios_knl_Task_exit__E() at Task.c:435,PC = 0x2000DB50 FP = 0x200193D0
From what I can tell, a call was made by sl_WlanRxStatGet(), but before the response could come back, the NWP initiated a call to translate an HTTP token to a value.
After experimenting a little more, I stopped seeing the above issue and instead started getting faults. Under FreeRTOS, I would end up in the FaultISR() function. Under TI-RTOS, I was able to get a dump of the current CPU status, which shed some light on the issue. This is the debug message from TI-RTOS:
ti.sysbios.family.arm.m3.Hwi: line 1036: E_hardFault: FORCED
ti.sysbios.family.arm.m3.Hwi: line 1113: E_busFault: IMPRECISERR: Delayed Bus Fault, exact addr unknown, address: e000ed38
Exception occurred in background thread at PC = 0x2000d6f0.
Core 0: Exception occurred in ThreadType_Task.
Task name: {unknown-instance-name}, handle: 0x20017090.
Task stack base: 0x200170e0.
Task stack size: 0x800.
R0 = 0x44022130  R8 = 0x20202024
R1 = 0x00000011  R9 = 0xffffffff
R2 = 0xffffffff  R10 = 0xffffffff
R3 = 0x00000000  R11 = 0xffffffff
R4 = 0x00000048  R12 = 0x20020cc8
R5 = 0x00000048  SP(R13) = 0x20017850
R6 = 0x00000048  LR(R14) = 0x2000d6e1
R7 = 0x44022000  PC(R15) = 0x2000d6f0
PSR = 0x21000000
ICSR = 0x0400f803
MMFSR = 0x00
BFSR = 0x04
UFSR = 0x0000
HFSR = 0x40000000
DFSR = 0x0000000a
MMAR = 0xe000ed34
BFAR = 0xe000ed38
AFSR = 0x00000000
Terminating execution...
After looking into the PC address, I found that the fault is occurring in simplelink/cc_pal.c line 179. In the assembly, it is running a load instruction to read the value located at the address stored in R0 (0x44022130). This address is the GSPI peripheral and offset 0x130 is the Channel Status Register. So for some reason, the CPU is hitting a fault when trying to access the GSPI peripheral. However, the module can work for a long time before this fault occurs. It only occurs if I request pages containing user-defined tokens from the CC3200 very quickly.
A simple modification to either the out-of-box demo or the email demo demonstrates this issue. The out-of-box demo is based on FreeRTOS and the email demo is based on TI-RTOS.
To modify the out-of-box demo:
Change the OOBTask routine to match this:
static void OOBTask(void *pvParameters)
{
    long lRetVal = -1;
    SlGetRxStatResponse_t rxStat;

    //Read Device Mode Configuration
    ReadDeviceConfiguration();

    //Connect to Network
    lRetVal = ConnectToNetwork();
    if(lRetVal < 0)
    {
        ERR_PRINT(lRetVal);
        LOOP_FOREVER();
    }

    //Handle Async Events
    while(1)
    {
        //LED Actions
        if(g_ucLEDStatus == LED_ON)
        {
            GPIO_IF_LedOn(MCU_RED_LED_GPIO);
            osi_Sleep(500);
        }
        if(g_ucLEDStatus == LED_OFF)
        {
            GPIO_IF_LedOff(MCU_RED_LED_GPIO);
            osi_Sleep(500);
        }
        if(g_ucLEDStatus==LED_BLINK)
        {
            GPIO_IF_LedOn(MCU_RED_LED_GPIO);
            osi_Sleep(500);
            GPIO_IF_LedOff(MCU_RED_LED_GPIO);
            osi_Sleep(500);
        }

        osi_Sleep(3000);
        sl_WlanRxStatGet(&rxStat,0);
        Report("RSSI: Data+Ctrl %d dBi\t\tMgMNT %d dBi\r\n", rxStat.AvarageDataCtrlRssi, rxStat.AvarageMgMntRssi);
    }
}
Fill in the AP details and remove the forceAP jumper so the board can connect to a WiFi network accessible from a Linux computer.
On a Linux computer, run this from the command line to send a lot of requests to the LaunchPad:
while true; do wget -q -O - http://<ip_address>/param_online.html; echo; done
After a few seconds, the board will lock up, requests will time out, and if you pause the debugger you will be in the FaultISR() routine. If you change the out-of-box demo to TI-RTOS, you can diagnose it much more easily.
I have repeated everything described above with the sl_WlanRxStatGet() line commented out, and there are no issues. The issue is only present when I am reading the Rx Stats.
In summary, am I doing something wrong? Is there something I should be doing to prevent this from happening? Is this a bug with the SimpleLink driver? I am open to questions and to test different theories.