CC2640R2F: Application streaming large data blob using notifications on a characteristic eventually hits a timeout and ICall_abort()

Part Number: CC2640R2F
Other Parts Discussed in Thread: CC2640

We have a custom board running firmware based on the ble5 simple peripheral example (using CCS v10, CC2640R2 SDK v4.30.00.08). The application runs well, except when transferring large amounts of data to a client. By "large", I mean over a megabyte (> 80,000 packets). The application uses notifications on a characteristic to repeatedly send chunks of data to the client (an Android (v8.1.0) app on a tablet), which reassembles the chunks into the original blob. It sends successfully for anywhere from 20,000 to 60,000 packets, but at an arbitrary point in the transfer it goes through this series of calls:

GATTServApp_ProcessCharCfg

gattServApp_SendNotiInd

GATT_Notification

icall_directAPI

ICall_waitMatch

and ICall_waitMatch returns ICALL_ERRNO_TIMEOUT, so execution falls through to ICall_abort().

The peripheral configuration includes these parameters:

// Minimum connection interval (units of 1.25ms, 80=100ms) for parameter update request
#define DEFAULT_DESIRED_MIN_CONN_INTERVAL     80

// Maximum connection interval (units of 1.25ms, 104=130ms) for parameter update request
#define DEFAULT_DESIRED_MAX_CONN_INTERVAL     104

// Slave latency to use for parameter update request
#define DEFAULT_DESIRED_SLAVE_LATENCY         0

// Supervision timeout value (units of 10ms, 300=3s) for parameter update request
#define DEFAULT_DESIRED_CONN_TIMEOUT          300

I tried increasing DEFAULT_DESIRED_MAX_CONN_INTERVAL, but performance was significantly worse.

I am working on getting a packet sniffer set up to see if it sheds any light on where things are breaking down, but does this timeout point to any particular area to focus on?

  • Hello Tony,

    I have assigned an expert to help with your query. In the meantime, I have a couple of questions that will help us better understand the problem. Can you verify what the return status of the notification functions is in the lead-up to the behavior happening? Are all functions returning success until the behavior occurs? Can you verify whether this behavior also occurs on a LaunchPad?

    Regards,

    Jan

  • Thanks, Jan.  I added some code to capture a small portion of the history leading up to the failure, and when this call sequence returns unsuccessfully, the return value is always 22 (0x16).  Since it is making it out of icall_directAPI in these cases, the value appears to come from this code:

      // The return parameter is set in the runtime stack, at the location of the
      // first parameter.
      res = liteMsg.msg.pointerStack[0];
    
      va_end(argp);
    
      return (res);
    

    So res = 22

    When this or any non-zero return code comes back, the transfer is immediately retried.
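    For reference, the retry logic has roughly this shape (a simplified sketch, not our actual firmware; `send_fn` stands in for the GATTServApp_ProcessCharCfg call path, and the names and the attempt bound are illustrative):

```c
#include <stdint.h>

/* Hypothetical stand-in for the notification send path
 * (GATTServApp_ProcessCharCfg -> GATT_Notification on the real stack):
 * returns 0 (SUCCESS) or a non-zero status such as the 0x16 we observed. */
typedef uint8_t (*send_fn)(void);

/* Immediate-retry loop as described above: on any non-zero status the same
 * chunk is resent right away.  A bound on attempts is added here only so
 * the sketch cannot spin forever. */
static uint32_t send_with_retry(send_fn send, uint32_t max_attempts)
{
    for (uint32_t attempts = 1; attempts <= max_attempts; attempts++) {
        if (send() == 0) {
            return attempts;   /* number of tries it took */
        }
        /* non-zero status: retry the same chunk immediately */
    }
    return 0;                  /* gave up */
}

/* Illustrative sender that fails twice with 0x16, then succeeds. */
static uint32_t g_calls;
static uint8_t flaky_send(void)
{
    return (++g_calls < 3) ? 0x16 : 0;
}

/* Demo: reset the counter and run the retry loop once. */
static uint32_t demo(void)
{
    g_calls = 0;
    return send_with_retry(flaky_send, 10);
}
```

    In the actual firmware there is no bound - failures are retried until the chunk goes out.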

    I do not think I'm going to have time to port all the application functionality back to the launchpad, unless that becomes a last resort.

    Not sure if it's relevant, but I have observed that during the download, the transfer rate typically starts out around 100 record/packet transfers per second, then slows to 80; it runs for a while, decreases to 50, then 30, and eventually fails.  The two tests I ran this morning transferred around 50,000 of the 80,000 records.  The transfer rate is faster with stronger signal strength, so if we put the tablet antenna next to the device, the transfer is faster, but it still eventually fails.

  • Hi Tony,

    A few debug tips are provided here. I think it would help if you could connect the debugger after the board has crashed and, using the ROV, try to identify the issue.

    The issue makes us suspect a memory/heap problem. Using the same process as you mentioned, you could keep track of the heap metrics throughout the transfer - heap metrics can be accessed as shown here.

    Regards,

  • I will need to pursue the heap angle.  I just ran another test with the device positioned such that the advertising signal strength was -14 dBm, and the download ran to completion; fewer retries would be needed in that scenario.  A second test immediately following it failed.  I will need to check the state of the heap after a download and see what component is consuming it, though the application does not use dynamic memory allocation.

  • Hi,

    Keep us updated :)

    You could also add a retry counter; it would help to confirm the idea.

    Regards,

  • As an update:  A call to ICall_getHeapStats at failure shows a totalSize of 7032, totalFreeSize of 536, and largestFreeSize of 48, so the heap is exhausted.  The number of retries at failure has ranged from 1.1 million to just over 4 million.  What I'm trying to understand now is what is consuming the heap, and the nature of the failure leading to the retries.  Since the application is sending a notification rather than an indication, I do not know where this would break down, though it becomes much more common as signal strength decreases.
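    For context on why largestFreeSize is the number that matters (a minimal sketch; the struct mirrors the relevant ICall_heapStats_t fields from the SDK, and the stub values are the ones captured at failure):

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the ICall_heapStats_t fields that matter here.  On target,
 * these are filled in by ICall_getHeapStats(). */
typedef struct {
    uint32_t totalSize;
    uint32_t totalFreeSize;
    uint32_t largestFreeSize;
} heap_stats_t;

/* An allocation request must be satisfied from a single contiguous free
 * block, so it is the largest free block - not the total free space -
 * that decides success. */
static bool alloc_can_succeed(const heap_stats_t *s, uint32_t request)
{
    return request <= s->largestFreeSize;
}

/* Values captured at the failure point. */
static heap_stats_t at_failure(void)
{
    heap_stats_t s = { 7032u, 536u, 48u };
    return s;
}
```

    Even with 536 bytes nominally free, anything larger than 48 bytes cannot be allocated, which matches GATT_bm_alloc failing for our notification payloads.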

    In gattServApp_SendNotiInd, it starts with a call to GATT_bm_alloc, and if the call to GATT_Notification fails, it calls GATT_bm_free.  The application reuses the same buffer to hold the characteristic value as each successive notification is sent, so I am not seeing where the memory leaks on the apparent transmit failures.

  • Hi Tony,

    The ROV has a module showing which heap blocks have been allocated. Could you check if this helps?

    Another option could be to disable some pieces of the application and see if that has an effect on the heap. For example, what happens if you stop collecting data and just send the same value every time?

    I hope this helps,

    Regards,

  • In both the RTOS Object Viewer and the Runtime Object Viewer, selecting HeapMem shows "No data to display".  Is there another option to show allocated heap blocks?  HEAPMGR_METRICS is enabled, so I'm not sure if there is another step needed to make this info visible.

    Each record sent is read from external flash; I can take a closer look at that area to see if anything turns up there.  Since there are so many packet retries, and around 50,000 successful transmissions, it does not look like a per-retry or per-record leak.

  • Hi,

    [EDIT] There was a typo - HEAPMGR_CONFIG should be set to 0x81

    Please make sure HEAPMGR_CONFIG = 0x81; in your cfg file. The cfg file is in the TOOLS folder of your project. In general, that file points to other files, and the configuration you are looking for is in <SDK>\source\ti\ble5stack\common\cc26xx\kernel\cc2640\config\ble_stack_heap.cfg.

    If it helps, you can review this documentation.

     Regards,

  • Thanks, Clément.  From app_ble.cfg:

    utils.importFile("common/cc26xx/kernel/cc2640/config/ble_stack_heap.cfg");

    from ble_stack_heap.cfg:

    var Memory = xdc.useModule('xdc.runtime.Memory');
    var HEAPMGR_CONFIG = 0x80;
    var HEAPMGR_SIZE = 0x1900; //only valid if static size is used. This is the size of the buffer allocated for Heap.

    if (typeof HEAPMGR_CONFIG == 'undefined' )
    {
    var HEAPMGR_CONFIG = 0x80;
    }

    This appears to match the documentation in the Configuring the Heap section of the BLE5 stack users' guide.

  • Hi,

    Actually, I made a typo - I meant to write "please make sure HEAPMGR_CONFIG = 0x81;". Sorry for that.

    I have edited my previous message.

    Regards,

  • That does show quite a lot of fragmentation (could not get a screen-capture .png to insert here):

    [
    	{
    		"module": "ti.sysbios.heaps.HeapMem",
    		"view": "FreeList"
    	},
    	[
    		{
    			"label": "HeapMem@20001740-freeList",
    			"elements": [
    				{
    					"Address": "0x20002518",
    					"size": "0x988",
    					"next": "0x20002ea0",
    					"status": "In Use"
    				},
    				{
    					"Address": "0x20002ea0",
    					"size": "0x8",
    					"next": "0x20002f30",
    					"status": "Free"
    				},
    				{
    					"Address": "0x20002ea8",
    					"size": "0x88",
    					"next": "0x20002f30",
    					"status": "In Use"
    				},
    				{
    					"Address": "0x20002f30",
    					"size": "0x8",
    					"next": "0x200030e8",
    					"status": "Free"
    				},
    				{
    					"Address": "0x20002f38",
    					"size": "0x1b0",
    					"next": "0x200030e8",
    					"status": "In Use"
    				},
    				{
    					"Address": "0x200030e8",
    					"size": "0x8",
    					"next": "0x20003170",
    					"status": "Free"
    				},
    				{
    					"Address": "0x200030f0",
    					"size": "0x80",
    					"next": "0x20003170",
    					"status": "In Use"
    				},
    				{
    					"Address": "0x20003170",
    					"size": "0x8",
    					"next": "0x20003670",
    					"status": "Free"
    				},
    				{
    					"Address": "0x20003178",
    					"size": "0x4f8",
    					"next": "0x20003670",
    					"status": "In Use"
    				},
    				{
    					"Address": "0x20003670",
    					"size": "0x8",
    					"next": "0x200036a8",
    					"status": "Free"
    				},
    				{
    					"Address": "0x20003678",
    					"size": "0x30",
    					"next": "0x200036a8",
    					"status": "In Use"
    				},
    				{
    					"Address": "0x200036a8",
    					"size": "0x8",
    					"next": "0x200036f0",
    					"status": "Free"
    				},
    				{
    					"Address": "0x200036b0",
    					"size": "0x40",
    					"next": "0x200036f0",
    					"status": "In Use"
    				},
    				{
    					"Address": "0x200036f0",
    					"size": "0x8",
    					"next": "0x20003728",
    					"status": "Free"
    				},
    				{
    					"Address": "0x200036f8",
    					"size": "0x30",
    					"next": "0x20003728",
    					"status": "In Use"
    				},
    				{
    					"Address": "0x20003728",
    					"size": "0x8",
    					"next": "0x20003790",
    					"status": "Free"
    				},
    				{
    					"Address": "0x20003730",
    					"size": "0x60",
    					"next": "0x20003790",
    					"status": "In Use"
    				},
    				{
    					"Address": "0x20003790",
    					"size": "0x8",
    					"next": "0x200037d8",
    					"status": "Free"
    				},
    				{
    					"Address": "0x20003798",
    					"size": "0x40",
    					"next": "0x200037d8",
    					"status": "In Use"
    				},
    				{
    					"Address": "0x200037d8",
    					"size": "0x8",
    					"next": "0x20003810",
    					"status": "Free"
    				},
    				{
    					"Address": "0x200037e0",
    					"size": "0x30",
    					"next": "0x20003810",
    					"status": "In Use"
    				},
    				{
    					"Address": "0x20003810",
    					"size": "0x8",
    					"next": "0x20003898",
    					"status": "Free"
    				},
    				{
    					"Address": "0x20003818",
    					"size": "0x80",
    					"next": "0x20003898",
    					"status": "In Use"
    				},
    				{
    					"Address": "0x20003898",
    					"size": "0x8",
    					"next": "0x20003940",
    					"status": "Free"
    				},
    				{
    					"Address": "0x200038a0",
    					"size": "0xa0",
    					"next": "0x20003940",
    					"status": "In Use"
    				},
    				{
    					"Address": "0x20003940",
    					"size": "0x8",
    					"next": "0x200039f8",
    					"status": "Free"
    				},
    				{
    					"Address": "0x20003948",
    					"size": "0xb0",
    					"next": "0x200039f8",
    					"status": "In Use"
    				},
    				{
    					"Address": "0x200039f8",
    					"size": "0x8",
    					"next": "0x20003a10",
    					"status": "Free"
    				},
    				{
    					"Address": "0x20003a00",
    					"size": "0x10",
    					"next": "0x20003a10",
    					"status": "In Use"
    				},
    				{
    					"Address": "0x20003a10",
    					"size": "0x8",
    					"next": "0x20003a68",
    					"status": "Free"
    				},
    				{
    					"Address": "0x20003a18",
    					"size": "0x50",
    					"next": "0x20003a68",
    					"status": "In Use"
    				},
    				{
    					"Address": "0x20003a68",
    					"size": "0x8",
    					"next": "0x20003ab0",
    					"status": "Free"
    				},
    				{
    					"Address": "0x20003a70",
    					"size": "0x40",
    					"next": "0x20003ab0",
    					"status": "In Use"
    				},
    				{
    					"Address": "0x20003ab0",
    					"size": "0x8",
    					"next": "0x20003ad8",
    					"status": "Free"
    				},
    				{
    					"Address": "0x20003ab8",
    					"size": "0x20",
    					"next": "0x20003ad8",
    					"status": "In Use"
    				},
    				{
    					"Address": "0x20003ad8",
    					"size": "0x8",
    					"next": "0x20003b40",
    					"status": "Free"
    				},
    				{
    					"Address": "0x20003ae0",
    					"size": "0x60",
    					"next": "0x20003b40",
    					"status": "In Use"
    				},
    				{
    					"Address": "0x20003b40",
    					"size": "0x8",
    					"next": "0x20003c48",
    					"status": "Free"
    				},
    				{
    					"Address": "0x20003b48",
    					"size": "0x100",
    					"next": "0x20003c48",
    					"status": "In Use"
    				},
    				{
    					"Address": "0x20003c48",
    					"size": "0x8",
    					"next": "0x20003cb0",
    					"status": "Free"
    				},
    				{
    					"Address": "0x20003c50",
    					"size": "0x60",
    					"next": "0x20003cb0",
    					"status": "In Use"
    				},
    				{
    					"Address": "0x20003cb0",
    					"size": "0x8",
    					"next": "0x20003d28",
    					"status": "Free"
    				},
    				{
    					"Address": "0x20003cb8",
    					"size": "0x70",
    					"next": "0x20003d28",
    					"status": "In Use"
    				},
    				{
    					"Address": "0x20003d28",
    					"size": "0x8",
    					"next": "0x20003d70",
    					"status": "Free"
    				},
    				{
    					"Address": "0x20003d30",
    					"size": "0x40",
    					"next": "0x20003d70",
    					"status": "In Use"
    				},
    				{
    					"Address": "0x20003d70",
    					"size": "0x8",
    					"next": "0x20003d88",
    					"status": "Free"
    				},
    				{
    					"Address": "0x20003d78",
    					"size": "0x10",
    					"next": "0x20003d88",
    					"status": "In Use"
    				},
    				{
    					"Address": "0x20003d88",
    					"size": "0x8",
    					"next": "0x20003e40",
    					"status": "Free"
    				},
    				{
    					"Address": "0x20003d90",
    					"size": "0xb0",
    					"next": "0x20003e40",
    					"status": "In Use"
    				},
    				{
    					"Address": "0x20003e40",
    					"size": "0x8",
    					"next": "0x20003ea8",
    					"status": "Free"
    				},
    				{
    					"Address": "0x20003e48",
    					"size": "0x60",
    					"next": "0x20003ea8",
    					"status": "In Use"
    				},
    				{
    					"Address": "0x20003ea8",
    					"size": "0x8",
    					"next": "0x20003f00",
    					"status": "Free"
    				},
    				{
    					"Address": "0x20003eb0",
    					"size": "0x50",
    					"next": "0x20003f00",
    					"status": "In Use"
    				},
    				{
    					"Address": "0x20003f00",
    					"size": "0x8",
    					"next": "0x20003f68",
    					"status": "Free"
    				},
    				{
    					"Address": "0x20003f08",
    					"size": "0x60",
    					"next": "0x20003f68",
    					"status": "In Use"
    				},
    				{
    					"Address": "0x20003f68",
    					"size": "0x8",
    					"next": "0x20003ff0",
    					"status": "Free"
    				},
    				{
    					"Address": "0x20003f70",
    					"size": "0x80",
    					"next": "0x20003ff0",
    					"status": "In Use"
    				},
    				{
    					"Address": "0x20003ff0",
    					"size": "0x10",
    					"next": "0x0",
    					"status": "Free"
    				}
    			]
    		}
    	]
    ]

    Besides the allocated memory left unfreed, there are 8-byte free blocks alternating with the blocks that remain in use.

    I enabled HeapTrack, but performance was degraded too much to get far enough through the data transfer to tell which task is responsible, so I'm still trying to track that down.  ICall_taskEntry and SimplePeripheral_taskFxn are the two active tasks.

  • Hi,

    Interesting. The 8-byte free blocks should not worry you - they appear in the out-of-the-box project too. Same for the bigger blocks marked "In Use" at the beginning of the list.

    The issue comes from the numerous allocations that are not freed. Not sure if it will help, but all of these allocations have sizes that are multiples of 16. Could you try to find which function/task is responsible for these allocations? I don't think any tool can do this for you. The approach could be one of the following: 1) Look at the content of the memory and see if you can recognize some specific data; otherwise, you could run a test where you fill the buffers with known data. 2) Step through the code to find which function allocates these buffers. Again, you could disable some subset of the code to see if the allocations remain.

    Best regards,

  • I started down that road last night.  The data being allocated was not immediately recognizable, so I'm still reviewing.  I was able to get through a download of just over 8,000 records with HeapTrack enabled, and saw that once the download begins, there are series of pairs of allocations of 12 and 16 bytes, each 500,000 ticks apart, split evenly between the SimplePeripheral_taskFxn task and the ICall task, but not alternating exactly.  I'm looking into what these are.

  • I need to test a lot more, but a download did run to completion without any degradation in transfer speed.  The 500,000-tick period shown in the HeapTrack data and the 6 in the allocated blocks pointed at the periodic event with the 5-second period in the original SimplePeripheral application.  I am not using that functionality, but I still had the clock running and was letting it post the event on each period.  I removed that periodic functionality and things look much better.  The heap looked good after the download.

    The ICall_free at the end of SimplePeripheral_processAppMsg is only called if the pData pointer is not NULL, and pData is NULL for the SP_PERIODIC_EVT event.  It looks like the spEvt_t struct itself is being left in memory - which would be the case even when pData is not NULL, since only pData is freed.  I'll have to look at this more closely if we eventually need a periodic event.

    I could not run the test with HeapTrack enabled, as it showed heap corruption almost immediately on download start.  I'm not sure what to attribute that to, other than that performance is degraded to a point where our app will not run with it enabled, whereas previously it would run long enough to capture the memory leak.

  • Lots of red herrings on this one.  The spEvt_t structs are not leaking.  While the application is pulling the records off the device over the air as fast as possible, the application task loop never gets a chance to take application events off the queue, so every 5 seconds events were queued and never processed, steadily consuming the heap.  Removing the unused periodic clock eliminated the events and the heap fragmentation.  If we want to use that periodic clock, some design modifications will be in order.
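    The failure mode can be sketched like this (simplified; event_t stands in for spEvt_t and the queue for the application's message queue - the names are illustrative): a clock callback enqueues a freshly allocated event every period, and if the consumer task never runs, heap is consumed without bound.

```c
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for spEvt_t: an event id plus optional payload. */
typedef struct event {
    uint8_t       id;
    void         *pData;
    struct event *next;
} event_t;

typedef struct {
    event_t *head, *tail;
    uint32_t depth;
} event_queue_t;

/* Clock-callback side: allocates an event and enqueues it.  If the
 * application task is starved (busy streaming notifications), nothing
 * ever dequeues, so each period permanently ties up a heap block. */
static int enqueue_event(event_queue_t *q, uint8_t id)
{
    event_t *evt = malloc(sizeof(event_t));
    if (evt == NULL) {
        return -1;             /* heap exhausted - the observed failure */
    }
    evt->id    = id;
    evt->pData = NULL;         /* SP_PERIODIC_EVT carries no payload */
    evt->next  = NULL;
    if (q->tail) q->tail->next = evt; else q->head = evt;
    q->tail = evt;
    q->depth++;
    return 0;
}

/* Simulate N clock periods with zero consumer progress. */
static uint32_t starve(event_queue_t *q, uint32_t periods)
{
    for (uint32_t i = 0; i < periods; i++) {
        if (enqueue_event(q, 1 /* periodic-event-like id */) != 0) break;
    }
    return q->depth;           /* blocks outstanding until drained */
}
```

    Draining the queue would return the blocks, but while the task loop is saturated by the transfer, nothing drains it - hence the exhaustion.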

    Thanks for your attention while tracking this down.

  • Hi,

    Well done! I am glad you figured that out.

    Kind regards,