Hi all,
We have put a lot of effort lately into making the BLE stack more stable and have made some progress. Unfortunately we are not there yet, as explained below.
Setup
- SDK 2.30.00.34 (12 Oct release)
- Simple_Peripheral example from this release
- Three bug fixes applied to the example, one change
- Three Android phones
Procedure
- Connect all phones (e.g. with the nRF Connect app), then keep disconnecting and reconnecting them for at least 30 minutes. Sometimes the issue occurs very quickly and sometimes it takes a long time; we don't know why. We often use up to 7 phones.
Result
- The application stops advertising (no crash) and never resumes.
Applied changes to simple_peripheral
We have applied the bug fixes and changes below to the simple_peripheral project. Without the bug fixes, the application crashes very quickly when connection parameter updates are sent (as any Android phone seems to do at least 3 times).
Fix 1 (we stop the clock and free it, as is already done in the removeConn function):
/*********************************************************************
 * @fn      SimplePeripheral_processParamUpdate
 *
 * @brief   Process a parameters update request
 *
 * @return  None
 */
static void SimplePeripheral_processParamUpdate(uint16_t connHandle)
{
  gapUpdateLinkParamReq_t req;
  uint8_t connIndex;

  req.connectionHandle = connHandle;
  req.connLatency = DEFAULT_DESIRED_SLAVE_LATENCY;
  req.connTimeout = DEFAULT_DESIRED_CONN_TIMEOUT;
  req.intervalMin = DEFAULT_DESIRED_MIN_CONN_INTERVAL;
  req.intervalMax = DEFAULT_DESIRED_MAX_CONN_INTERVAL;

  connIndex = SimplePeripheral_getConnIndex(connHandle);
  // SIMPLEPERIPHERAL_ASSERT(connIndex < MAX_NUM_BLE_CONNS);
  if (connIndex < MAX_NUM_BLE_CONNS)
  {
    if (connList[connIndex].pUpdateClock != NULL)
    {
      // Stop and destruct the RTOS clock if it's still alive
      if (Util_isActive(connList[connIndex].pUpdateClock))
      {
        Util_stopClock(connList[connIndex].pUpdateClock);
      }
      // Deconstruct the clock object
      Clock_destruct(connList[connIndex].pUpdateClock);
      // Free clock struct
      ICall_free(connList[connIndex].pUpdateClock);
      connList[connIndex].pUpdateClock = NULL;
      // Free ParamUpdateEventData
      ICall_free(connList[connIndex].pParamUpdateEventData);
    }

    // Send parameter update
    bStatus_t status = GAP_UpdateLinkParamReq(&req);

    // If there is an ongoing update, queue this for when the update completes
    if (status == bleAlreadyInRequestedMode)
    {
      spConnHandleEntry_t *connHandleEntry = ICall_malloc(sizeof(spConnHandleEntry_t));
      if (connHandleEntry)
      {
        connHandleEntry->connHandle = connHandle;
        List_put(&paramUpdateList, (List_Elem *)connHandleEntry);
      }
    }
  }
  else
  {
    Display_printf(dispHandle, SP_ROW_STATUS_1, 0, ANSI_COLOR_RED"Not Matched Handle"ANSI_COLOR_RESET);
  }
}
Fix 2 (change line 1296 to set connHandleEntry to NULL)
if (connHandleEntry != NULL) {ICall_free(connHandleEntry); connHandleEntry = NULL;}
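For readers wondering why Fix 2 matters, here is a minimal sketch of the free-and-clear pattern in plain C. It uses the standard free() instead of ICall_free, and HandleEntry and release_entry are hypothetical names made up for this illustration, not SDK identifiers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the queued entry type. */
typedef struct {
    uint16_t connHandle;
} HandleEntry;

/* Free *pEntry and reset it to NULL so that a later check against NULL
 * (or a second release) cannot touch the freed block.
 * Returns 1 if something was freed, 0 if the pointer was already NULL. */
static int release_entry(HandleEntry **pEntry)
{
    assert(pEntry != NULL);
    if (*pEntry == NULL) {
        return 0;
    }
    free(*pEntry);
    *pEntry = NULL;
    return 1;
}
```

With the pointer cleared, a second call to release_entry on the same variable is a harmless no-op instead of a double free.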
Fix 3 (remove ampersand)
We removed the ampersand in front of connHandleEntry in the List_put call (see fix 1 code), as discussed in: e2e.ti.com/.../2714998
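To show why that ampersand is harmful, here is a small self-contained sketch. Elem, ElemList, HandleEntry, list_put, list_get, queue_handle and dequeue_handle are hypothetical stand-ins written for this post, not the SDK's List API, and the sketch assumes the list element is the first member of the entry struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the SDK list types (not TI's List.h). */
typedef struct Elem {
    struct Elem *next;
    struct Elem *prev;
} Elem;

typedef struct {
    Elem *head;
    Elem *tail;
} ElemList;

/* Assumed layout: the list element is the first member, so a pointer to
 * the entry may be cast to a pointer to its element. */
typedef struct {
    Elem     elem;
    uint16_t connHandle;
} HandleEntry;

/* Append e to the tail. This writes to e->next and e->prev, so e must
 * point at storage that outlives the call. */
static void list_put(ElemList *list, Elem *e)
{
    assert(list != NULL && e != NULL);
    e->next = NULL;
    e->prev = list->tail;
    if (list->tail != NULL) {
        list->tail->next = e;
    } else {
        list->head = e;
    }
    list->tail = e;
}

/* Remove and return the head element, or NULL if the list is empty. */
static Elem *list_get(ElemList *list)
{
    assert(list != NULL);
    Elem *e = list->head;
    if (e != NULL) {
        list->head = e->next;
        if (list->head != NULL) {
            list->head->prev = NULL;
        } else {
            list->tail = NULL;
        }
    }
    return e;
}

/* Queue a connection handle. Returns 1 on success, 0 if allocation fails. */
static int queue_handle(ElemList *list, uint16_t connHandle)
{
    HandleEntry *entry = malloc(sizeof *entry);
    if (entry == NULL) {
        return 0;
    }
    entry->connHandle = connHandle;
    /* Pass the heap block itself. (Elem *)&entry would instead hand over
     * the address of the local pointer variable, so list_put would write
     * its links into this function's stack frame. */
    list_put(list, (Elem *)entry);
    return 1;
}

/* Dequeue the oldest handle into *out and free its entry.
 * Returns 1 if a handle was dequeued, 0 if the list was empty. */
static int dequeue_handle(ElemList *list, uint16_t *out)
{
    assert(out != NULL);
    HandleEntry *entry = (HandleEntry *)list_get(list);
    if (entry == NULL) {
        return 0;
    }
    *out = entry->connHandle;
    free(entry);
    return 1;
}
```

With the extra ampersand, the list would keep pointing at stack memory after the function returns, which matches the kind of quick crash we saw before applying the fix.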
Change 1
We changed the DEFAULT_ADDRESS_MODE to ADDRMODE_PUBLIC
What's next?
After the issue occurred (advertising stopped), we tried, as a workaround, to disable advertising and enable it again from a timer. We see the callback to SimplePeripheral_processAdvEvent when we issue the disable command, but when we issue the enable command, the stack doesn't send a callback to SimplePeripheral_processAdvEvent (note: we are not using 8 phones, so the device should still be advertising).
static void SimplePeripheral_performPeriodicTaskAdvRestart(void)
{
  GapAdv_disable(advHandleLegacy, GAP_ADV_ENABLE_OPTIONS_USE_MAX, 0);
  GapAdv_disable(advHandleLongRange, GAP_ADV_ENABLE_OPTIONS_USE_MAX, 0);
  GapAdv_enable(advHandleLegacy, GAP_ADV_ENABLE_OPTIONS_USE_MAX, 0);
  GapAdv_enable(advHandleLongRange, GAP_ADV_ENABLE_OPTIONS_USE_MAX, 0);
}
What causes the stack to stop advertising? Is there any workaround for this issue? We can't launch our product beta, because within a few hours people can no longer connect.