I've suspected this for a while, but have only just built the code to confirm it.
The problem:
Over a period of continuous usage, messages (nwkDB_t) build up in the sent state (NWK_DATABUF_SENT), eventually leading to an out-of-memory situation.
Testing for this issue:
We built the following MT command to dump the state of the buffered packets.
static uint8 nwkAreqSend(nwkDB_t* db, void* _mf )
{
  nwk_outgoing_result_t* mf = (nwk_outgoing_result_t*)_mf;
  uint8 buf[9];

  if(!(mf->flags & (1 << db->state)))
    return 0;
  if(!db->pDataReq)
    return 0;
  if(mf->dstAddr != 0xfffe &&
     (db->pDataReq->DstAddr.addrMode != Addr16Bit ||
      db->pDataReq->DstAddr.addr.shortAddr != mf->dstAddr))
    return 0;

  mf->count ++;
  mf->limit --;

  buf[0] = LO_UINT16( db->pDataReq->DstAddr.addr.shortAddr );
  buf[1] = HI_UINT16( db->pDataReq->DstAddr.addr.shortAddr );
  buf[2] = db->state;
  buf[3] = db->retries;
  buf[4] = db->apsRetries;
  buf[5] = db->lastCnfStatus;
  buf[6] = LO_UINT16( db->macSrcAddr );
  buf[7] = HI_UINT16( db->macSrcAddr );
  buf[8] = db->nsduHandle;

  MT_BuildAndSendZToolResponse(((uint8)MT_RPC_CMD_AREQ | (uint8)MT_RPC_SYS_NWK),
                               MT_NWK_QUEUED_ENTRY, sizeof(buf), buf );
  return !mf->limit;
}

static void MT_NwkQueuedDump(uint8 *pBuf)
{
  uint8 cmdId;
  uint8 retValue[4];
  nwk_outgoing_result_t hd;

  /* parse header */
  cmdId = pBuf[MT_RPC_POS_CMD1];
  pBuf += MT_RPC_FRAME_HDR_SZ;

  /* read network address */
  hd.dstAddr = osal_build_uint16( pBuf );
  retValue[0] = pBuf[0];
  retValue[1] = pBuf[1];
  pBuf += sizeof( hd.dstAddr );

  /* read type */
  hd.flags = pBuf[0];
  retValue[2] = pBuf[0];
  pBuf += sizeof( hd.flags );

  /* initialize count */
  hd.count = 0;

  /* read byte limit */
  hd.limit = pBuf[0];

  /* get buffered packet count */
  nwkDB_FindMatch(&nwkAreqSend, &hd);
  retValue[3] = hd.count;
  //retValue = nwkDB_MaxIndirectSent(dstAddr);

  /* Build and send back the response */
  MT_BuildAndSendZToolResponse(((uint8)MT_RPC_CMD_SRSP | (uint8)MT_RPC_SYS_NWK),
                               cmdId, 4, retValue);
}
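For anyone who wants to drive this command from the host side, here is a minimal sketch of the payload layout as read from the code above: the request carries the destination short address (low byte first), a bitmask of states, and an entry limit, and each MT_NWK_QUEUED_ENTRY AREQ carries 9 bytes. The names queued_entry_t, queued_dump_encode and queued_entry_decode are made up for this illustration and are not part of Z-Stack or any host tool; MT framing (SOF, length, command bytes, FCS) is left out.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical host-side view of one 9-byte MT_NWK_QUEUED_ENTRY payload. */
typedef struct {
    uint16_t dstAddr;        /* buf[0..1], low byte first */
    uint8_t  state;          /* buf[2] */
    uint8_t  retries;        /* buf[3] */
    uint8_t  apsRetries;     /* buf[4] */
    uint8_t  lastCnfStatus;  /* buf[5] */
    uint16_t macSrcAddr;     /* buf[6..7], low byte first */
    uint8_t  nsduHandle;     /* buf[8] */
} queued_entry_t;

/* Build the 4-byte request payload: address, state bitmask, entry limit.
 * A dstAddr of 0xfffe matches any destination in the firmware code above. */
static size_t queued_dump_encode(uint8_t out[4], uint16_t dstAddr,
                                 uint8_t stateMask, uint8_t limit)
{
    out[0] = (uint8_t)(dstAddr & 0xff);
    out[1] = (uint8_t)(dstAddr >> 8);
    out[2] = stateMask;   /* bit n set => report entries whose state == n */
    out[3] = limit;
    return 4;
}

/* Decode one AREQ payload; returns 0 on success, -1 if it is too short. */
static int queued_entry_decode(const uint8_t *buf, size_t len,
                               queued_entry_t *e)
{
    if (len < 9)
        return -1;
    e->dstAddr       = (uint16_t)(buf[0] | (buf[1] << 8));
    e->state         = buf[2];
    e->retries       = buf[3];
    e->apsRetries    = buf[4];
    e->lastCnfStatus = buf[5];
    e->macSrcAddr    = (uint16_t)(buf[6] | (buf[7] << 8));
    e->nsduHandle    = buf[8];
    return 0;
}
```

The SRSP that follows the AREQs echoes the first three request bytes and puts the number of matched entries in the fourth byte.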
Having left a gateway running for a few days (25 devices, a mix of routers and end devices, under bidirectional unicast and multicast traffic), there are a few buffered packets in the sent state that are never cleared. Eventually (after weeks) their number can exceed the total limit (NWK_MAX_DATABUFS_TOTAL).
We have confirmed this against an otherwise unmodified 3.0.2 build (only the modification above and the known-issue fixes applied) as well as against our patched build.
Hypothesis: This seems to occur most frequently when a high number of new nsdu handles are being created. Perhaps a rollover collides with packets that are still scheduled? nsduHandle is only an 8-bit unsigned integer (0...255). If that happens, the delete function, which deletes via nsduHandle, would be unable to clear the packet. This all happens inside the binary blob, so I can't prove or disprove this behaviour.
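To make the hypothesis concrete, here is a toy model of how an 8-bit handle rollover could orphan a buffer. It is not Z-Stack code and says nothing about what the binary actually does. Everything in it (the toy_ names, the pool size, and especially the confirm policy of acting on the first matching handle and ignoring entries that are not in the sent state) is an assumption chosen purely to show one way the leak could arise.

```c
#include <stdint.h>

#define TOY_POOL_SIZE 4   /* hypothetical; stands in for NWK_MAX_DATABUFS_TOTAL */

enum { TOY_FREE = 0, TOY_HELD, TOY_SENT };

typedef struct { uint8_t state; uint8_t handle; } toy_buf_t;

static toy_buf_t toy_pool[TOY_POOL_SIZE];
static uint8_t   toy_next_handle;   /* 8-bit counter: wraps 255 -> 0 */

/* Take the first free slot and give it the next handle; -1 if pool is full. */
static int toy_alloc(uint8_t state)
{
    for (int i = 0; i < TOY_POOL_SIZE; i++) {
        if (toy_pool[i].state == TOY_FREE) {
            toy_pool[i].state  = state;
            toy_pool[i].handle = toy_next_handle++;
            return i;
        }
    }
    return -1;
}

/* ASSUMED policy: a confirm acts on the first entry with a matching handle
 * and frees it only if that entry is in the sent state. */
static int toy_confirm(uint8_t handle)
{
    for (int i = 0; i < TOY_POOL_SIZE; i++) {
        if (toy_pool[i].state != TOY_FREE && toy_pool[i].handle == handle) {
            if (toy_pool[i].state != TOY_SENT)
                return 0;               /* wrong entry found, confirm dropped */
            toy_pool[i].state = TOY_FREE;
            return 1;
        }
    }
    return 0;
}

static int toy_count(uint8_t state)
{
    int n = 0;
    for (int i = 0; i < TOY_POOL_SIZE; i++)
        n += (toy_pool[i].state == state);
    return n;
}

/* Returns the number of entries left in the sent state after the scenario. */
static int toy_leaked_after_rollover(void)
{
    for (int i = 0; i < TOY_POOL_SIZE; i++)
        toy_pool[i].state = TOY_FREE;
    toy_next_handle = 0;

    int held = toy_alloc(TOY_HELD);     /* handle 0, waits a long time */

    for (int n = 0; n < 255; n++) {     /* handles 1..255 come and go */
        int i = toy_alloc(TOY_SENT);
        toy_confirm(toy_pool[i].handle);
    }

    int dup = toy_alloc(TOY_SENT);      /* counter wrapped: handle 0 again */
    (void)dup;
    toy_confirm(0);                     /* meant for dup, hits the held entry */

    toy_pool[held].state = TOY_SENT;    /* old packet finally goes out */
    toy_confirm(0);                     /* frees the old entry */

    return toy_count(TOY_SENT);         /* dup is still there, never cleared */
}
```

In this model one entry stays in the sent state with nothing left that could ever free it, which matches the symptom seen in the dump. Checking the dump for two live entries sharing an nsduHandle would be one way to support or rule out this explanation.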