Other Parts Discussed in Thread: MSP432E401Y, SIMPLELINK-CC32XX-SDK, SIMPLELINK-SDK-WIFI-PLUGIN, CC3135
This thread follows on from a previous thread here.
We are using the CC3135MOD in an IoT application that requires the CC3135MOD to go into AP mode and host a password-protected user settings webpage. We then need the CC3135MOD to communicate with other CC3135MOD devices in transceiver mode (raw socket), where sl_Send() and sl_Recv() are called repeatedly. This is because we have devised a very simple, lightweight proprietary communication protocol that is intended to be very low power and does not require us to set up a WiFi network.
The problems that we are facing are:
>> The Simplelink WiFi driver and CC3135MOD network processor constantly crash with the following errors: SL_DEVICE_EVENT_FATAL_DEVICE_ABORT, SL_DEVICE_EVENT_FATAL_SYNC_LOSS, SL_DEVICE_EVENT_FATAL_NO_CMD_ACK and SL_DEVICE_EVENT_FATAL_DRIVER_ABORT.
>> The Rx Filters have never worked. We need the Rx Filters on each CC3135MOD device to be set to a specific frame header pattern so that all devices can filter out ‘normal’ WiFi traffic and only see our own proprietary communication packets.
The Simplelink driver or network processor always crashes at a random point after 1-3 minutes of operation. When the application detects this, it calls sl_Stop(…), then performs a hardware reset by pulling the RST pin low for 200 ms and then high for 200 ms, before calling sl_Start(…) again.
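The recovery sequence described above can be sketched as follows. This is a minimal, hedged sketch: `nwp_stop()`, `nwp_reset_pin()`, `delay_ms()` and `nwp_start()` are hypothetical wrappers standing in for sl_Stop(), a GPIO write to the RST pin, usleep() and sl_Start(). Here they only record each step in a trace buffer so that the ordering and timing can be checked off-target.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical step log used in place of real hardware calls. */
#define TRACE_MAX 16
static char trace[TRACE_MAX][24];
static size_t trace_len = 0;

static void trace_step(const char *step)
{
    if (trace_len < TRACE_MAX) {
        strncpy(trace[trace_len], step, sizeof trace[0] - 1);
        trace[trace_len][sizeof trace[0] - 1] = '\0';
        trace_len++;
    }
}

/* Hypothetical wrappers: on target these would call sl_Stop(timeout_ms),
 * GPIO_write() on the RST pin, usleep() and sl_Start(). */
static int  nwp_stop(uint16_t timeout_ms) { (void)timeout_ms; trace_step("stop"); return 0; }
static void nwp_reset_pin(int level)      { trace_step(level ? "rst_high" : "rst_low"); }
static void delay_ms(uint32_t ms)         { trace_step(ms == 200u ? "wait_200ms" : "wait_other"); }
static int  nwp_start(void)               { trace_step("start"); return 0; }

#define RESET_LOW_MS  200u  /* RST held low, as described in the post */
#define RESET_HIGH_MS 200u  /* settle time after releasing RST */

/* Stop the host driver, pulse RST, then restart; returns nwp_start()'s result. */
int nwp_recover(void)
{
    (void)nwp_stop(200);    /* result ignored: the NWP may already be unresponsive */
    nwp_reset_pin(0);
    delay_ms(RESET_LOW_MS);
    nwp_reset_pin(1);
    delay_ms(RESET_HIGH_MS);
    return nwp_start();
}
```

The sl_Stop() result is deliberately ignored because, by the time recovery starts, the network processor may no longer answer commands.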
I am confident that this isn’t a power supply issue, as the DC/DC converter is well and truly oversized and has proper feed-through capacitor noise suppression. There are two 100 nF ceramic and two 100 µF tantalum capacitors on the module power supply lines. The SPI lines between the MSP432E401Y and the CC3135MOD are very short and have a good ground plane above and beneath them.
We have tried many service pack and Simplelink versions, all with the same results. The following are the issues we have faced and how we are trying to deal with them. These issues tend to occur in transceiver mode (raw socket), where sl_Send() and sl_Recv() are called repeatedly.
1) DMA Controller Race Condition Crashes. We are using SPI3 for the Simplelink driver to communicate with the CC3135MOD, and SPI2 to communicate with external memory. At random points during execution the DMA (Direct Memory Access) controller faults and calls void dmaErrorFxn(uintptr_t arg), after which the program crashes. I have read in other forum threads that this DMA race condition is a known issue, and that the fix is to follow the instructions here to copy the SIMPLELINK-CC32XX-SDK files into the SIMPLELINK-SDK-WIFI-PLUGIN. However, this has not helped in our case. As we weren’t able to diagnose what was causing this issue, I instead rewrote the SPI2 driver to use polled DriverLib calls rather than DMA. That has worked well, as we only need to send small amounts of data over SPI2 to the external memory.
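A polled (non-DMA) transfer of the kind described for SPI2 can be sketched as below. `ssi_put()` and `ssi_get()` are hypothetical placeholders for the MSP432E4 DriverLib calls (on target they would wrap SSIDataPut()/SSIDataGet() on the SPI2 base address). Here they emulate a loopback so the sketch runs on a host. Because each byte written is followed by a blocking read of the byte clocked in, the RX FIFO cannot overflow regardless of transfer length.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical loopback stand-ins for SSIDataPut()/SSIDataGet(): MOSI is
 * treated as wired back to MISO, so each received byte equals the byte sent. */
static uint32_t loopback_reg;
static void ssi_put(uint32_t data)  { loopback_reg = data & 0xFFu; }
static void ssi_get(uint32_t *data) { *data = loopback_reg; }

/* Full-duplex polled transfer: one byte is clocked in for every byte clocked
 * out. tx may be NULL (0xFF filler is sent); rx may be NULL (input discarded).
 * Chip-select handling is left to the caller. */
size_t spi_polled_transfer(const uint8_t *tx, uint8_t *rx, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        uint32_t in;
        ssi_put(tx ? tx[i] : 0xFFu);
        ssi_get(&in);   /* on target this blocks until the byte arrives */
        if (rx) {
            rx[i] = (uint8_t)in;
        }
    }
    return len;
}
```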
2) WiFi_Thread() and sl_Task() SPI driver clash. When analysing the SPI3 lines used by the Simplelink driver, we can see that just before the driver crashes the CS pin (FSS) is pulled high, causing the CC3135MOD to ignore the incoming SPI3 data and resulting in a crash. From what I understand, the sl_Task() thread is responsible for servicing asynchronous events, which are signalled when the CC3135MOD pulls the IRQ pin high (.hostIRQPin = CC3135_IRQ_INTR,). What I am seeing is that while WiFi_Thread() is running and calling sl_Send() and sl_Recv() repeatedly, sl_Task() jumps in to service the asynchronous event raised by the IRQ pin going high.
In the cc_pal.c file (porting file), both of these threads call int spi_Write( Fd_t fd, unsigned char *pBuff, int len ) at the same time, so they clash and the CS pin (FSS) is set or cleared at the wrong time. There doesn’t appear to be any synchronisation of these two threads’ access to the SPI3 driver.
I always give sl_Task() the highest priority of all the threads, but it doesn’t seem to matter whether sl_Task() has a higher, lower or the same priority as WiFi_Thread(): the Simplelink driver still crashes with SL_DEVICE_EVENT_FATAL_SYNC_LOSS or SL_DEVICE_EVENT_FATAL_DRIVER_ABORT.
I have made the following code changes to stop one thread from using spi_Write(...) while the other is using it. This seems to have stopped the driver from crashing for this reason. Interestingly, I added the WriteWait_ctr counter to see how often this patch is needed, and after WiFi_Thread() has run for only a minute the counter is always above 600.
uint32_t ReadAttempt_ctr = 0;
uint32_t WriteAttempt_ctr = 0;
uint32_t ReadWait_ctr = 0;
uint32_t WriteWait_ctr = 0;
volatile uint8_t ReadInProgress_flg = 0;   /* shared between threads */
volatile uint8_t WriteInProgress_flg = 0;

int spi_Write( Fd_t fd, unsigned char *pBuff, int len )
{
    SPI_Transaction transact_details;
    int write_size = 0;
    int attempts;
    bool transferred;

    /* Wait until no other thread is reading or writing.
       Note: the test and the set below are not atomic, so this narrows
       the race window rather than closing it. */
    while(ReadInProgress_flg || WriteInProgress_flg)
    {
        usleep(100);
        WriteWait_ctr++;
    }
    WriteInProgress_flg = 1;

    GPIO_write(curDeviceConfiguration->csPin, 0);

    /* check if the link SPI has been initialized successfully */
    if(fd < 0)
    {
        GPIO_write(curDeviceConfiguration->csPin, 1);
        WriteInProgress_flg = 0;
        return (-1);
    }

    transact_details.rxBuf = NULL;
    transact_details.arg = NULL;
    transact_details.txBuf = (void*) (pBuff);

    while(len > 0)
    {
        if(len > curDeviceConfiguration->maxDMASize)
            transact_details.count = curDeviceConfiguration->maxDMASize;
        else
            transact_details.count = len;

        /* Retry each chunk up to 100 times and record success explicitly.
           Testing "attempts != 0" after "while (attempts-- > 0)" is off by one:
           attempts ends at -1 when every retry fails, so failure looked like success. */
        transferred = false;
        for(attempts = 0; attempts < 100; attempts++)
        {
            if(SPI_transfer( (SPI_Handle) fd, &transact_details ))
            {
                transferred = true;
                break;
            }
            ClockP_usleep(100);
            WriteAttempt_ctr++;
        }

        if(!transferred)
        {
            GPIO_write(curDeviceConfiguration->csPin, 1);
            WriteInProgress_flg = 0;   /* release the flag on the error path too */
            return (-1);
        }
        write_size += transact_details.count;
        len = len - transact_details.count;
        transact_details.txBuf = ((unsigned char *) (transact_details.txBuf) + transact_details.count);
    }

    GPIO_write(curDeviceConfiguration->csPin, 1);
    WriteInProgress_flg = 0;
    return (write_size);
}
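Busy flags leave a check-then-set window: both threads can see the flags clear and enter together, because the test and the assignment are not atomic. A hedged alternative is to hold a real mutex across the whole CS-low-to-CS-high window. The sketch below uses a POSIX `pthread_mutex_t` (the SimpleLink SDKs ship a POSIX layer). Whether the same lock must also cover spi_Read() and interact with the driver's own locking is an assumption to verify. `cs_write()` and `link_transfer()` are hypothetical stand-ins for GPIO_write() on the CS pin and SPI_transfer(), and `MAX_CHUNK` stands in for maxDMASize.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHUNK    32   /* hypothetical; stands in for maxDMASize */
#define MAX_ATTEMPTS 100  /* retries per chunk, as in the patch above */

/* Intended to be taken by both spi_Write() and spi_Read(), so that only one
 * thread at a time owns the chip-select line. */
static pthread_mutex_t link_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-ins for GPIO_write(csPin, level) and SPI_transfer(). */
static int cs_level = 1;
static int cs_low_count = 0;   /* diagnostics: number of CS assertions */
static void cs_write(int level) { if (level == 0) cs_low_count++; cs_level = level; }
static bool link_transfer(const unsigned char *buf, size_t count) { (void)buf; return count > 0; }

int spi_write_locked(int fd, const unsigned char *buf, int len)
{
    int written = 0;

    if (fd < 0 || buf == NULL || len < 0) {
        return -1;
    }

    pthread_mutex_lock(&link_lock);
    cs_write(0);

    while (len > 0) {
        size_t count = (len > MAX_CHUNK) ? MAX_CHUNK : (size_t)len;
        bool ok = false;

        /* Fresh retry budget per chunk; success is tracked explicitly.
         * (On target, sleep ~100 us between retries as the patch does.) */
        for (int attempt = 0; attempt < MAX_ATTEMPTS && !ok; attempt++) {
            ok = link_transfer(buf + written, count);
        }
        if (!ok) {
            written = -1;
            break;
        }
        written += (int)count;
        len -= (int)count;
    }

    cs_write(1);                       /* CS released on every exit path */
    pthread_mutex_unlock(&link_lock);  /* lock released on every exit path */
    return written;
}
```

Unlike polling with usleep(), a blocked thread sleeps on the mutex and is woken as soon as the owner releases it, and there is a single unlock point, so no error path can leave the lock held.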
3) Both WiFi_Thread() and sl_Task() block forever on a semaphore. The Simplelink driver keeps locking up because both WiFi_Thread() and sl_Task() block forever on a semaphore and never return. There is no way for the application to know when this happens, so it just freezes. It seems that when WiFi_Thread() calls sl_Send() or sl_Recv(), it pends on a semaphore and waits for sl_Task() to receive an asynchronous event when the CC3135MOD sets the IRQ pin high. Often, however, this never happens: the CC3135MOD seems to crash, or it keeps replying with the same 4 bytes, which are not what sl_Task() is looking for, so sl_Task() keeps going back to pending on a semaphore.
The only workaround we have found is the change below to the _SlDrvSyncObjWaitForever(…) function in driver.c, which makes the WiFi_Thread() semaphore wait time out after 10 seconds and then sets SL_SET_RESTART_REQUIRED. This doesn’t fix the underlying problem, but at least the driver now fails with an error instead of locking up forever.
_SlReturnVal_t _SlDrvSyncObjWaitForever(_SlSyncObj_t *pSyncObj)
{
_SlReturnVal_t RetVal = sl_SyncObjWait(pSyncObj, 10000); // <-- was SL_OS_WAIT_FOREVER; time out after 10 s (assuming a 1 ms system tick) so neither WiFi thread can lock up.
if(RetVal < 0) { SL_SET_RESTART_REQUIRED; } // <-- If the semaphore wait timed out, the NWP has locked up. The wait above returns SemaphoreP_TIMEOUT = -1.
/* if the wait is finished and we detect that restart is required (we are in the middle of error handling),
then we should abort immediately from the current API command execution
*/
if (SL_IS_RESTART_REQUIRED)
{
return SL_API_ABORTED;
}
return RetVal;
}
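If, as in the stock cc_pal.c, sl_SyncObjWait() maps onto SemaphoreP_pend(), its timeout is counted in ClockP ticks rather than milliseconds, so `10000` only equals 10 s with a 1 ms tick. A small, hedged helper for converting a millisecond budget into ticks is sketched below. `tick_period_us` is an assumed input; on target it could come from ClockP_getSystemTickPeriod(), if your DPL version provides it.

```c
#include <stdint.h>

/* Convert a timeout in milliseconds into scheduler ticks, rounding up so the
 * wait is never shorter than requested. The result is capped one below
 * UINT32_MAX so it cannot collide with an all-ones "wait forever" value. */
uint32_t ms_to_ticks(uint32_t timeout_ms, uint32_t tick_period_us)
{
    if (tick_period_us == 0u) {
        return 0u;  /* invalid tick period: caller should treat this as an error */
    }
    uint64_t us = (uint64_t)timeout_ms * 1000u;
    uint64_t ticks = (us + tick_period_us - 1u) / tick_period_us;
    return (ticks >= UINT32_MAX) ? (UINT32_MAX - 1u) : (uint32_t)ticks;
}
```

With this helper the patched line would read something like `sl_SyncObjWait(pSyncObj, ms_to_ticks(10000, tick_us))`, which keeps the intended 10 s budget even if the RTOS tick rate changes.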
4) sl_Task() randomly causes an MCU crash on a bad memory access. As can be seen in the call stack below, sl_Task() is causing the MCU to crash.
This is very tricky to diagnose because the Simplelink driver is not documented, but the crash seems to occur right after Drv_DeBug_ctr = 450; in the _SlDrvMsgRead(…) function in driver.c:
/* if _SlDrvFindAndSetActiveObj returned an error, release the protection lock, and return. */
if(RetVal < 0)
{
    Drv_DeBug_ctr = 448;
    SL_DRV_PROTECTION_OBJ_UNLOCK();
    SL_DRV_LOCK_GLOBAL_UNLOCK(TRUE);
    return SL_API_ABORTED;
}
/* Verify data is waited on this socket. The pArgs should have been set by _SlDrvDataReadOp(). */
Drv_DeBug_ctr = 449;
VERIFY_SOCKET_CB(NULL != ((_SlArgsData_t *)(g_pCB->ObjPool[g_pCB->FunctionParams.AsyncExt.ActionIndex].pData))->pArgs); // <-- AIH
Drv_DeBug_ctr = 450;
sl_Memcpy( ((_SlArgsData_t *)(g_pCB->ObjPool[g_pCB->FunctionParams.AsyncExt.ActionIndex].pRespArgs))->pArgs, &uBuf.TempBuf[4], RECV_ARGS_SIZE);
Drv_DeBug_ctr = 451;
if(ExpArgSize > (_u8)RECV_ARGS_SIZE)
{
    Drv_DeBug_ctr = 452;
    NWP_IF_READ_CHECK(g_pCB->FD, ((_SlArgsData_t *)(g_pCB->ObjPool[g_pCB->FunctionParams.AsyncExt.ActionIndex].pRespArgs))->pArgs + RECV_ARGS_SIZE, ExpArgSize - RECV_ARGS_SIZE);
}
Drv_DeBug_ctr = 453;
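In the excerpt, VERIFY_SOCKET_CB checks `pData->pArgs`, but sl_Memcpy() writes through `pRespArgs->pArgs`. A guard on the pointer that is actually written, plus a range check on the action index, may help narrow the fault to a bad index or a NULL destination. The sketch below uses a simplified, hypothetical model of the object pool; `MAX_ACTIONS`, `RECV_ARGS_LEN` and the struct layouts are placeholders, not the driver's real definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_ACTIONS   10  /* hypothetical; stands in for the object-pool size */
#define RECV_ARGS_LEN 12  /* hypothetical; stands in for RECV_ARGS_SIZE */

/* Simplified, hypothetical model of one object-pool entry. */
typedef struct { uint8_t *pArgs; } args_t;
typedef struct { args_t *pData; args_t *pRespArgs; } pool_entry_t;

/* Copy received arguments only if the index and destination pointers are
 * valid; returns 0 on success or -1 so the caller can abort cleanly. */
int copy_recv_args(pool_entry_t *pool, size_t action_index,
                   const uint8_t *src, size_t src_len)
{
    if (pool == NULL || src == NULL || action_index >= MAX_ACTIONS) {
        return -1;
    }
    args_t *resp = pool[action_index].pRespArgs;
    if (resp == NULL || resp->pArgs == NULL || src_len < RECV_ARGS_LEN) {
        return -1;
    }
    memcpy(resp->pArgs, src, RECV_ARGS_LEN);
    return 0;
}
```

Recording which check failed (for example in a counter like Drv_DeBug_ctr) before returning SL_API_ABORTED would show whether the index or the destination is corrupted when the crash would otherwise occur.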
I have no way of diagnosing or patching this issue; the only mitigation is a watchdog timer recovery.
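Because a hung call gives the application no error, one hedged way to at least detect it is a heartbeat. WiFi_Thread() records a timestamp each time sl_Send()/sl_Recv() returns, and a separate monitor triggers the sl_Stop()/reset/sl_Start() recovery (or stops feeding the hardware watchdog) when the heartbeat goes stale. The function names and the 5 s threshold below are hypothetical. The millisecond clock is passed in so the logic can be exercised off-target.

```c
#include <stdbool.h>
#include <stdint.h>

#define HEARTBEAT_TIMEOUT_MS 5000u  /* hypothetical staleness threshold */

/* Written by WiFi_Thread(), read by the monitor. An aligned 32-bit store is
 * assumed to be atomic on the Cortex-M4F; use C11 atomics where available. */
static volatile uint32_t last_heartbeat_ms;

/* Called by WiFi_Thread() whenever sl_Send()/sl_Recv() returns. */
void wifi_heartbeat(uint32_t now_ms)
{
    last_heartbeat_ms = now_ms;
}

/* Called periodically by the monitor; unsigned subtraction handles the
 * wrap-around of a free-running 32-bit millisecond counter. */
bool wifi_is_stalled(uint32_t now_ms)
{
    return (uint32_t)(now_ms - last_heartbeat_ms) > HEARTBEAT_TIMEOUT_MS;
}
```

Running the monitor in its own thread means it still executes when both WiFi_Thread() and sl_Task() are blocked, so a lock-up becomes a controlled recovery instead of a silent freeze.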
5) CC3135MOD Network Driver Randomly Stops Communicating. It seems that the network processor crashes and stops communicating at random, usually after only about 2-3 minutes of operation. This can be clearly observed by monitoring the SPI3 communication lines, as shown below.
Some time before the CC3135MOD crashes, we tend to see the MISO line drifting randomly between low and high. This makes me wonder whether the CC3135MOD SPI port has a configuration problem.
6) The Rx Filters have never worked. We need the Rx Filters on each CC3135MOD device to be set to a specific frame header pattern so that all devices can filter out ‘normal’ WiFi traffic and only see our own proprietary communication packets. Even a simple test that tries to drop all frames except Beacon frames (0x80) still lets all other frames through.
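While the Rx Filter problem is being investigated, a host-side fallback is to discard unwanted frames in WiFi_Thread() after sl_Recv(). The sketch assumes each received buffer starts with an 8-byte transceiver metadata header (the size of SlTransceiverRxOverHead_t in the SimpleLink headers; verify this for your SDK version), followed by the 802.11 frame, whose first Frame Control byte is 0x80 for a Beacon. `MY_PATTERN` and its offset are hypothetical placeholders for the proprietary header.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RX_META_LEN 8u     /* assumed transceiver metadata header size */
#define FC_BEACON   0x80u  /* Frame Control byte 0 of a Beacon frame */

/* Hypothetical proprietary marker and its offset within the 802.11 frame. */
static const uint8_t MY_PATTERN[4] = { 0xC0, 0xFF, 0xEE, 0x01 };
#define MY_PATTERN_OFFSET 24u  /* e.g. just after a 24-byte MAC header */

/* Returns true if the buffer returned by sl_Recv() should be kept. */
bool keep_frame(const uint8_t *buf, size_t len, bool beacons_only)
{
    if (buf == NULL || len <= RX_META_LEN) {
        return false;
    }
    const uint8_t *frame = buf + RX_META_LEN;
    size_t frame_len = len - RX_META_LEN;

    if (beacons_only) {
        return frame[0] == FC_BEACON;
    }
    if (frame_len < MY_PATTERN_OFFSET + sizeof MY_PATTERN) {
        return false;
    }
    return memcmp(frame + MY_PATTERN_OFFSET, MY_PATTERN, sizeof MY_PATTERN) == 0;
}
```

This is a stopgap rather than a replacement for working Rx Filters: every frame still crosses the SPI link and wakes the host, which costs the bandwidth and power the NWP-side filters are meant to save.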
Please advise what we can try to improve the stability of the Simplelink driver and CC3135MOD. We are beginning to suspect that this hardware may not be suitable for transceiver mode (raw socket), where sl_Send() and sl_Recv() are called repeatedly.