SIMPLELINK-CC13X2-26X2-SDK: Firmware not stable since SDK 6.20.00.29

Koen Kanters

Part Number: SIMPLELINK-CC13X2-26X2-SDK
Other Parts Discussed in Thread: Z-STACK, ARM-CGT, CC2652RB, SYSCONFIG, CC2652P, CC2652P7, CC1352P

With recent SDKs, many users are reporting stability issues (total crashes, MAC errors, NWK table full errors, devices dropping off). The problem appears to have been introduced in 6.20.00.29, since 6.10.01.01 works well for many users. To narrow this down, I've compiled two firmware builds that differ only in SDK version. See an overview of all the results in this spreadsheet.

I do not see any significant changes in the changelog of 6.20.00.29.

My question: what has been changed in 6.20.00.29 and up which could cause these issues?

over 1 year ago

Ryan Brown1 over 1 year ago

TI__Guru**** 210357 points

Hi Koen,

I have reviewed the Z-Stack source code and found no significant changes between the v6.10 and v6.20 SDKs, which is consistent with the changelog, as you've already noticed. I have also not seen similar reports from other customers using the newer SDKs. The v6.20 SDK Release Notes list many global changes which affect Z-Stack:

- Removed support for TI ARM-CGT compiler examples and libraries, in favor of the TI Clang Compiler
- Deprecated and removed support for PIN Driver in favor of GPIO++
- Deprecated and removed support for UART driver, in favor of UART2 in all cases except in BLE5-Stack’s NPI.

What dependencies are you using to build the v6.20 Z-Stack project, how did you migrate your resources from v6.10, and which of your own changes did you carry over? Note that ZNwkTableFull typically relates to MAX_RTG_ENTRIES/MAX_RREQ_ENTRIES/MAX_RTG_SRC_ENTRIES, and recall that the default heap allocation has changed across these SDKs.

Regards,
Ryan

Koen Kanters over 1 year ago in reply to Ryan Brown1

Intellectual 925 points

- How can I see the dependencies? I've compiled both the working (6.10) and non working (6.20) firmware with CCS 12.4.0

- I've migrated the changes from 6.10 to 6.20 by creating a patch file of all the changes

- Here is a link to this patch (for the CC26X2R1 only, but the patch for CC2652RB/CC1352 is the same): patch. Here you can review all the table sizes. Apart from some file hashes that changed, the patch for 6.10 is the same.

- I'm aware of MAX_RTG_ENTRIES and MAX_RTG_SRC_ENTRIES, but not of MAX_RREQ_ENTRIES! Thanks for pointing me to it. Could you also review the other table sizes I'm using? You can find them in the preinclude.h in the patch I linked.

- For example, I'm still unsure what value to use for SRC_RTG_EXPIRY_TIME/ROUTE_EXPIRY_TIME. I know 255 disables route removal, but what happens if the table gets full? On the other hand, a value of 10 would expire the route in 10 seconds, but AFAIK it will not be removed until the table gets full. So I don't understand why anyone would ever use 255 here.

- Regarding the default heap allocation, this was already changed in 6.10 (which works fine), so this cannot be the issue.

- The ARM-CGT compiler also cannot cause this issue; I was already using the TI Clang compiler for my 6.10 firmware

Ryan Brown1 over 1 year ago in reply to Koen Kanters

TI__Guru**** 210357 points

Project Properties -> CCS General -> Project/Products tabs (for compiler, SDK, and SysConfig dependencies). Also CCS Build -> Environment tab. I don't think MAX_RREQ_ENTRIES should make a difference; I just listed all definitions which could return a NWK table full error. You may also consider CONFLICTED_ADDR_TABLE_SIZE. We have previously reviewed most, if not all, of the table sizes listed in the patch, but the sheer number of changes makes it difficult to track and quantify their effect further. It is true that expired active routes are not removed until necessary, and if route removal is disabled, a route that would normally expire remains even when the table gets full. Disabling route removal is a deliberate choice the developer makes, whatever the purpose.

Is the common thread in every instance that the NWK table is full, new devices cannot join/rejoin, and the device possibly crashes? Can any sniffer logs (with important packets highlighted) or debugging logs be provided? I will ask the Software Development Team for their opinions, but the best support will be possible if the issue can be replicated with a reduced Z-Stack patch that still causes the issue.

Edit: Software Development also does not know of any significant differences between the SDK versions.

Regards,
Ryan

Koen Kanters over 1 year ago in reply to Ryan Brown1

Intellectual 925 points

I've compiled a new firmware for users to test with the table sizes adjusted as you mentioned: Z-Stack_3.x.0 coordinator 20231111/20231112 feedback · Koenkk/Z-Stack-firmware · Discussion #483 (github.com)

> if route removal is disabled then a typically-expired route would remain even if the table gets full.

If route removal is disabled (SRC_RTG_EXPIRY_TIME = 255), the table gets full (MAX_RTG_SRC_ENTRIES) and a new route is discovered, will the table overflow and cause e.g. the fw to crash? Or will new routes not be added anymore?

Here are all the dependencies I use to build with the 6.20 SDK:

Ryan Brown1 over 1 year ago in reply to Koen Kanters

TI__Guru**** 210357 points

You may consider using the dependencies listed in the Z-Stack Release Notes for v6.20, but I do not expect this to make a significant difference to the topic at hand:

TI Code Composer Studio: CCS-11.2.0
TI ARM Clang Compiler tools: 2.1.0.LTS
XDCTools: 3.62.01.15
SysConfig Standalone tool for IAR IDE: 1.13.0

If MAX_RTG_SRC_ENTRIES is met, the routing layer will exit with RTG_SRC_TBL_FULL without further action. Please let me know the consensus of the new image which is being tested when it is available.

Regards,
Ryan

Koen Kanters over 1 year ago in reply to Ryan Brown1

Intellectual 925 points

Hi Ryan,

A user captured a log + sniff of a crash on a 6.20 firmware with minimal changes.

- All changes of this fw: diff.patch (since these changes look so minimal and standard to me, I think we can rule them out as the culprit)

- Link to the log + sniff: link. Some things that caught my attention:

  - Several MEM_ERROR (0x10) responses appear in the log (zigbee-herdsman:adapter:zStack:znp:SRSP <-- AF - dataRequest - {"status":16})

  - Several BUFFER_FULL (0x11) responses appear in the log (zigbee-herdsman:adapter:zStack:znp:SRSP <-- AF - dataRequest - {"status":17})

  - At some point it crashes completely (failed (SRSP - AF - dataRequest after 6000ms))

- I've asked for the network key so that the sniff can be decrypted

Alex Fager over 1 year ago in reply to Koen Kanters

TI__Genius 9591 points

Hello Koen,

Ryan is out for the moment (holidays), and I wanted to let you know that I took a look at your links. We can't open your second link, the log + sniff on Google Drive (not a problem on your end; we just can't open that format). Could you include your results in a zipped file so we can take a look? We may be a bit delayed due to the holidays; I apologize for any inconvenience.

Thanks,
Alex F

Koen Kanters over 1 year ago in reply to Alex Fager

Intellectual 925 points

Hi Alex

I've attached the files. I think the first clue is the MEM_ERROR.

- Why does the SDK generate this error?

- What can be done to prevent it?

Archive.zip

Ryan Brown1 over 1 year ago in reply to Koen Kanters

TI__Guru**** 210357 points

Hey Koen,

ZMemError can be returned if there is not enough heap memory to complete the request, and ZBufferFull could indicate that the NWK/MAC buffers are temporarily full. Both can be searched for within the znp project. I have provided some comments in this relevant E2E response concerning ways to alleviate these issues. I apologize if we had not addressed this previously, or if it was missed in the minimal patch.

Regards,
Ryan

Dhanraj over 1 year ago in reply to Ryan Brown1

Guru 14820 points

We are also facing instability with SDK 7.10 on the CC2652P, primarily with the ZCL report command API, which does not work reliably: sometimes it works, and sometimes it gets stuck in the policy error for loop.

Ryan Brown1 over 1 year ago in reply to Dhanraj

TI__Guru**** 210357 points

Hi Dhanraj,

Please start a new thread with all details including your observations, stack changes, sniffer/debug logs, and versions tested.

Regards,
Ryan

Koen Kanters over 1 year ago in reply to Ryan Brown1

Intellectual 925 points

It took some time, but I finally managed to get a sniff + log of a crash. This user tested several firmware builds:

- 20230922 (= 6.10 SDK + all my changes): This firmware works fine, with no crashes, and performs well

- 20230923 (= 6.20 SDK + all my changes): This firmware crashes and performs badly

- 20231221 (= 6.20 SDK + minimal changes): This firmware crashes and performs badly

Links to my changes:

- All changes

- Minimal changes

To make sure the crash is not because of "all my changes", the sniff + log below is from 20231221, so a firmware with the minimal changes.

Log + sniff: nick_crash_sniff_log_20231221.zip

Notes:

- I see that there are a lot of route requests on the network, maybe this contributes to the crash?

- The last message sent by the coordinator is #2992071

- In the log, the ZNP stops communicating at "2023-12-23T22:24:00.450Z", (search for "failed (SRSP - AF - dataRequest after 6000ms)")

- To get all the communication between ZNP and Z2M, filter on "zigbee-herdsman:adapter:zStack:znp"

Jan over 1 year ago in reply to Koen Kanters

TI__Mastermind 38970 points

Hello E2E community member,

Thank you for asking your question concerning TI's SimpleLink Devices on the E2E Forum! The subject expert who can best address your inquiry is out of office for the holidays. After returning in early January, they will review your post and provide an initial response within 24 hours.

Regards,
Jan

Ryan Brown1 over 1 year ago in reply to Koen Kanters

TI__Guru**** 210357 points

Hi Koen,

Thank you for the sniffer and host logs. How many devices are actively communicating on the network when the failure occurs, and is there a correlation between the number of active devices and the stability of the ZNP? This is a lot of information to process; however, it does appear to reinforce the idea that MTO route requests are leading to a heap memory overflow. Have you been able to debug an active session in which the ZNP has crashed in order to review the call stack? And does the device recover after a soft reset? Compiler migration or changes to the lower-level 15.4-Stack MAC could be resulting in larger heap requirements. Excuse me if I do not remember, but have you evaluated v7.10 yet?

You could try increasing the heap again (buffers should be sufficient at their current size); however, this will soon hit a ceiling with only 88 kB of RAM available. Some updated devices have more RAM, such as the CC2652P7 with 144 kB of RAM, which may be worth evaluating. There are other considerations involving routing and discovery times which may not have been accounted for; see Table 1 of SWRA650 for details.

I understand that the greatest difficulty is replicating and observing the issue, given that it requires many devices to be connected and communicating. Is there a specific reason that you need to upgrade SDKs? The Z-Stack solution on v6.10 is stable, and newer versions do not include many notable updates or bug fixes to be concerned with.

Regards,
Ryan

dAVID over 1 year ago in reply to Ryan Brown1

Intellectual 790 points

Hi Koen and Ryan,

Happy New Year.

In our testing, a network of 30+ devices (CC1352P) is more stable with ZNP built on SDK v4.40 than with ZNP built on SDK v6.40.00.13.

Best regards,

David

Ryan Brown1 over 1 year ago in reply to dAVID

TI__Guru**** 210357 points

Hi David,

Happy New Year to you as well. Can you please share your observations, including stack changes and sniffer/debug logs, which resulted in this conclusion?

Regards,
Ryan

Koen Kanters over 1 year ago in reply to Ryan Brown1

Intellectual 925 points

Hi Ryan,

Happy new year!

I understand that there are a lot of variables; however, I don't expect increasing the heap to fix it. This issue also occurs on a CC2652P7 running 7.10. I would like to stick to 6.10 but then we cannot support the P7 (since it is not supported by 6.10).

Given the 6.10 vs 6.20 diff you sent me earlier, it's unlikely that any of these changes causes this issue. Therefore I expect there is a bug in one of the lower-level libraries. Is it possible to use e.g. the 6.10 15.4 Stack MAC with 6.20?

Ryan Brown1 over 1 year ago in reply to Koen Kanters

TI__Guru**** 210357 points

Koen Kanters said:
I would like to stick to 6.10 but then we cannot support the P7 (since it is not supported by 6.10)

Can you please explain this further? For the v6.10 SDK, the CC2652P7 is listed in the Release Notes and there are CC1352P7 examples which directly support the CC2652P7.

Koen Kanters said:
Is it possible to use e.g. the 6.10 15.4 Stack MAC with 6.20?

I will ask the Software Development Teams about this, I appreciate further patience as several experts are still out of office at the moment.

Regards,
Ryan

Koen Kanters over 1 year ago in reply to Ryan Brown1

Intellectual 925 points

Hi Ryan,

I dove a bit deeper into this and found out it's possible to use the libs from 6.10. First I tried with the 15.4 stack from 6.10. I did this by replacing the contents of 'simplelink_cc13xx_cc26xx_sdk_6_20_00_29/source/ti/ti154stack/lib/ticlang/m4f' with 'simplelink_cc13xx_cc26xx_sdk_6_10_01_01/source/ti/ti154stack/lib/ticlang/m4f'. The firmware still crashes after this.

Then I tried using the closed source zstack libs from 6.10 (under 'source/ti/zstack/lib/ticlang/m4f'), after this the firmware does not crash anymore! So it seems one of the changes in the closed source libs of zstack causes this regression. What changed in these libs compared to 6.10?

Ryan Brown1 over 1 year ago in reply to Koen Kanters

TI__Guru**** 210357 points

Thanks for the update, Koen. I've alerted the R&D Teams accordingly. Are you able to replace just the zstack source libraries or are both (i.e. zstack and ti154stack) required?

Regards,
Ryan

Koen Kanters over 1 year ago in reply to Ryan Brown1

Intellectual 925 points

Only zstack is enough. I'm now going to test the same for the latest SDK (so 7_10_02_23 + 6.10 zstack libs).

Ryan Brown1 over 1 year ago in reply to Koen Kanters

TI__Guru**** 210357 points

Hello Koen,

I have aligned with R&D on this issue and submitted a bug ticket so that this can be further explored internally. However, I cannot currently provide a timeline for the results of this investigation.

Regards,
Ryan

Ryan Brown1 over 1 year ago in reply to Koen Kanters

TI__Guru**** 210357 points

Hi Koen,

TI R&D has requested that you replicate this issue using SimpleLink F2 SDK 7.10.02.23 with TI ARM Clang Compiler v2.1.2.LTS in accordance with the Release Notes, and in doing so confirm that a compiler version difference (albeit minor) does not cause the issue.

Thanks,
Ryan

Akhilesh Premkumar over 1 year ago in reply to Ryan Brown1

Intellectual 995 points

Hi,

Maybe you could try the dynamic heap configuration from the 4.40 SDK's app.cfg file instead of the static configuration used in 6.10+:

/*
 * Heap Configuration defines the type of Heap you want to use for the system (application + Stack)
 * Only one Heap buffer will be allocated. This heap will be shared by the system and the stack through
 * one manager (HeapMem, HeapMem+HeapTrack or OSAL)
 * You can still decide to create several heaps if you want, but at least one heap needs to be created.
 * The stack must have a Heap to run.
 * The different Heap manager available are :
 * OSAL HEAP: legacy Heap manager provided with all BLE SDKs. By default, this Heap manager is used.
 *  HeapMem: heap manager provided by TI-RTOS (see TI-RTOS user guide for properties)
 * HeapTrack: module on top of HeapMem allowing easy debugging of memory allocated through HeapMem.
 *
 * The heap manager to use is selected by setting HEAPMGR_CONFIG to the corresponding value (see below)
 * 0    = osal Heap manager, size is static.
 * 0x80 = osal Heap manager, with auto-size: The remaining RAM (not used by the system) will be fully assigned to the Heap.
 * 1    = HeapMem with Static size
 * 0x81 = HeapMem with auto-size. The remaining RAM (not used by the system) will be fully assigned to the Heap.
 * 2    = HeapTrack (with HeapMem) with fixed size
 * 0x82 = HeapTrack (with HeapMem) with auto-size: The remaining RAM (not used by the system) will be fully assigned to the Heap.
 *
 * If HEAPMGR_CONFIG is not defined, but the configuration file ble_stack_heap.cfg is used, then the value
 * HEAPMGR_CONFIG = 0x80 is assumed.
 * If HEAPMGR_CONFIG is not defined, and the file ble_stack_heap.cfg is not used, then the value
 * HEAPMGR_CONFIG = 0x80 is assumed and the default Heap size will be 3072
 * unless you define HEAPMGR_SIZE to a different value in the project option (0 meaning auto-size).
 *
 * From the configuration below, two #define will be created that will be used by the application to setup the Heap:
 * #define HEAPMGR_SIZE
 * #define HEAPMGR_CONFIG
 * In order to use those define, this include line needs to be added: #include <xdc/cfg/global.h>
 *
 * In order for the auto-size Heap to work, the following symbols need to be created by the linker:
 *  heapStart
 *  heapEnd
 */

/*
 * DISCLAIMER: The HeapMem module in ROM can only use a GateMutex module. This means the malloc()
 * function cannot be used in a Hwi/Swi.
 * This means also that other access to the heap, with Icall_alloc for example, can potentially break the Heap...
 * Therefore this solution is most effective when TI-RTOS is located in FLASH, so that a GateHwi can be used.
 * If you try to use it in ROM, a workaround using HeapCallback is used, which will degrade performance.
 */
var Memory = xdc.useModule('xdc.runtime.Memory');
var HEAPMGR_CONFIG = 0x80;
var HEAPMGR_SIZE   = 30000; //only valid if static size is used. This is the size of the buffer allocated for Heap.

if (typeof HEAPMGR_CONFIG === 'undefined' )
{
  var HEAPMGR_CONFIG = 0x80;
}

// The following will create the #define HEAPMGR_CONFIG. It can then be used by include  <xdc/cfg/global.h>
Program.global.HEAPMGR_CONFIG = HEAPMGR_CONFIG;

if (HEAPMGR_CONFIG === 1 || HEAPMGR_CONFIG === 0x81)
{
  var HeapMem = xdc.useModule('ti.sysbios.heaps.HeapMem');
  var heapMemParams = new HeapMem.Params();

  if (HEAPMGR_CONFIG === 0x1)
  {
    heapMemParams.size = HEAPMGR_SIZE;
    Program.global.HEAPMGR_SIZE = HEAPMGR_SIZE;
  }
  else
  {
    // if you get an undefined error for the symbol below it means that AUTOHEAPSIZE has been defined in the application.
    Program.global.HEAPMGR_SIZE = 0;
    heapMemParams.usePrimaryHeap = true;
    HeapMem.primaryHeapBaseAddr = "&heapStart";
    HeapMem.primaryHeapEndAddr = "&heapEnd";
  }

  Program.global.stackHeap = HeapMem.create(heapMemParams);

  var GateHwi = xdc.useModule('ti.sysbios.gates.GateHwi');
  HeapMem.common$.gate = GateHwi.create();
  Memory.defaultHeapInstance = Program.global.stackHeap;
}
else if (HEAPMGR_CONFIG === 2 || HEAPMGR_CONFIG === 0x82)
{
  var HeapMem = xdc.useModule('ti.sysbios.heaps.HeapMem');
  var heapMemParams = new HeapMem.Params();
  if (HEAPMGR_CONFIG === 2)
  {
    heapMemParams.size =  HEAPMGR_SIZE;
    Program.global.HEAPMGR_SIZE = HEAPMGR_SIZE;
  }
  else
  {
    // if you get an undefined error for the symbol below it means that AUTOHEAPSIZE has been defined in the application.
    //
    heapMemParams.usePrimaryHeap = true;
    HeapMem.primaryHeapBaseAddr = "&heapStart";
    HeapMem.primaryHeapEndAddr = "&heapEnd";
    Program.global.HEAPMGR_SIZE = 0;
  }

  var tempHeap = HeapMem.create(heapMemParams);

  var GateHwi = xdc.useModule('ti.sysbios.gates.GateHwi');
  HeapMem.common$.gate = GateHwi.create();

  var HeapTrack = xdc.useModule('ti.sysbios.heaps.HeapTrack');
  var heapTrackParams = new HeapTrack.Params();
  heapTrackParams.heap = tempHeap;
  Program.global.stackHeap = HeapTrack.create(heapTrackParams);
  Memory.defaultHeapInstance = Program.global.stackHeap;
}
else if (HEAPMGR_CONFIG === 0 || HEAPMGR_CONFIG === 0x80)
{

  var HeapCallback = xdc.useModule('ti.sysbios.heaps.HeapCallback');
  var params = new HeapCallback.Params();
  params.arg = 1;
  Program.global.heap0 = HeapCallback.create(params);
  HeapCallback.initInstFxn = '&osalHeapInitFxn';              // Call First When BIOS boot. Initialize the Heap Manager.
  HeapCallback.allocInstFxn = '&osalHeapAllocFxn';            // Call for allocating a buffer
  HeapCallback.freeInstFxn = '&osalHeapFreeFxn';              // Call for Freeing a buffer
  HeapCallback.getStatsInstFxn = '&osalHeapGetStatsFxn';      // Return Statistic on the Heap.
  HeapCallback.isBlockingInstFxn = '&osalHeapIsBlockingFxn';  // Return TRUE: This heap is always blocking ('Hwi Gate' like )
  //HeapCallback.createInstFxn = '&osalHeapCreateFxn';        // Not Supported
  //HeapCallback.deleteInstFxn = '&osalHeapDeleteFxn';        // Not supported
  Memory.defaultHeapInstance = Program.global.heap0;

  if (HEAPMGR_CONFIG === 0)
  {
    // the following definition will create the #define HEAPMGR_SIZE,
    // which is used by the stack to have information about the heap manager size.
    // if set to 0, this implies an auto-size heap
    Program.global.HEAPMGR_SIZE = HEAPMGR_SIZE;
  }
  else
  {
    // the following definition will create the #define HEAPMGR_SIZE,
    // which is used by the stack to have information about the heap manager size.
    // if set to 0, this implies an auto-size heap
    // The heap buffer will be created automatically by using all the remaining RAM available at the end of the build/link.
    // For this, 2 symbols need to be created by the linker file: heapStart and heapEnd
    Program.global.HEAPMGR_SIZE = 0;
  }
}

Thanks,

Akhilesh

Ryan Brown1 over 1 year ago in reply to Akhilesh Premkumar

TI__Guru**** 210357 points

Hi Koen,

TI's Zigbee Test Team applied your minimal changes and ran a network of 43 devices for 24 hours. This included a mix of ZRs, sleepy ZEDs, and non-sleepy ZEDs (maximum 6 children per router and 4 hops total), default poll rates, and a custom "Large Network Test" packet with various payload sizes sent every 60 s. The ZC (or other devices, for that matter) never crashed. Do you have any suggestions which could help trigger the issue within this setup?

Regards,
Ryan

Koen Kanters over 1 year ago in reply to Ryan Brown1

Intellectual 925 points

Hi Ryan,

I've now compiled 7.10.02.23 firmware with TI ARM Clang Compiler v2.1.2.LTS and CCS 12.2.0; the firmware still crashes after anywhere from a couple of hours to a couple of days. I've verified this with 2 users.

Regarding reproducibility, I still don't know what triggers the crashes. Given the many variables, I think this is very complex to figure out. As noted before, 7.10 also works reliably for many people (including me).

Previously I mentioned that 6.20 (the SDK where this issue was first introduced) could be made stable by using the 6.10 zstack libs. It turns out this was not true; it seems that the combination of the 6.20 SDK with just the 6.10 ti154 libs makes it stable. I'm testing more with users to confirm whether this is indeed the case.

Once this is confirmed (6.20 SDK + 6.10 ti154 libs is stable), would it be possible to get some insight into the changes between the 6.10 and 6.20 ti154 libs? I expect that the bug is still present in the 7.10 ti154 libs.

Ryan Brown1 over 1 year ago in reply to Koen Kanters

TI__Guru**** 210357 points

Hi Koen,

I will update the TI Software Development Team with your latest feedback and ask about the differences in the 15.4-Stack libraries between SDK versions 6.10 and 6.20.

Regards,
Ryan

Koen Kanters over 1 year ago in reply to Ryan Brown1

Intellectual 925 points

Hi Ryan,

It seems the crash occurs right after the `AssocGetWithAddress` call. Note that many calls to `AssocGetWithAddress` were made before the crash (3000+). Could this function maybe have a memory leak?

Ryan Brown1 over 1 year ago in reply to Koen Kanters

TI__Guru**** 210357 points

Hi Koen,

That API returns the association table entry, and there haven't been any assoc/neighbor table updates in years (especially around v6.10 <-> v6.20+). So the issue is likely unrelated to those tables specifically.

Regards,
Ryan

Koen Kanters over 1 year ago in reply to Ryan Brown1

Intellectual 925 points

Hi Ryan,

I see, but isn't it very suspicious that, out of the many different requests, it consistently crashes on this one? Could it be that, e.g., another method writes a corrupt entry to this table, which then causes the crash upon retrieval?

What I will do next is disable this call from z2m and see if the crash still occurs.

Ryan Brown1 over 1 year ago in reply to Koen Kanters

TI__Guru**** 210357 points

I have sent your observation to the Software Development Team for further review.

Regards,
Ryan

Koen Kanters over 1 year ago in reply to Ryan Brown1

Intellectual 925 points

We tried running the firmware without the assoc get calls. The firmware seems to stay up longer (a couple of days) but still crashes in the end. Do you have any more insight into the ti154 changes between 6.10 and 6.20?

I would still really like to start using the 7.10 SDK, since we currently cannot support the new P10 chips.

Ryan Brown1 over 1 year ago in reply to Koen Kanters

TI__Guru**** 210357 points

There is no insight to share concerning TI 15.4-Stack changes between SDK versions. The Test Team also has not been able to replicate the behavior with the test conditions provided. Are you not able to use the v6.10 ti154 source on v7.10 project builds?

Regards,
Ryan

Koen Kanters over 1 year ago in reply to Ryan Brown1

Intellectual 925 points

ti154 from 6.10 is not compatible with 7.10 (getting invalid param errors).

What do you mean by “no insight to share”? Isn’t TI willing to share the changes so we can debug this issue, or are there no changes?

Ryan Brown1 over 1 year ago in reply to Koen Kanters

TI__Guru**** 210357 points

I apologize for the vague response. I mean that these representatives do not see any differences between the source code versions which could explain the behavior you are observing.

Regards,
Ryan

Ryan Brown1 12 months ago in reply to Koen Kanters

TI__Guru**** 210357 points

Could you benefit from the investigation by Mathew Heard? The R&D team will further investigate possible SRC match table changes which could be causing this.

Regards,
Ryan

Koen Kanters 12 months ago in reply to Ryan Brown1

Intellectual 925 points

Yes, definitely! Although I think it won't fix the crashes.

Koen Kanters 12 months ago in reply to Ryan Brown1

Intellectual 925 points

Hi Ryan, I noticed the 6.10 and 6.20 SDKs have been removed from the download site: SIMPLELINK-LOWPOWER-F2-SDK Software development kit (SDK) | TI.com

Why has this happened and can they be put back?

Ryan Brown1 12 months ago in reply to Koen Kanters

TI__Guru**** 210357 points

Hi Koen, thanks for reporting this. I've notified the correct stakeholders so that they can resolve this error.

Regards,
Ryan

Ryan Brown1 11 months ago in reply to Koen Kanters

TI__Guru**** 210357 points

The missing versions have been restored to https://www.ti.com/tool/download/SIMPLELINK-LOWPOWER-F2-SDK

Regards,
Ryan

Koen Kanters 11 months ago in reply to Ryan Brown1

Intellectual 925 points

Many thanks!

In the meantime we did some more testing. It turns out my earlier statement, that using the 6.10 TI154/zstack libraries with 6.20 fixes the 6.20 stability issues, was wrong; unfortunately, this does not fix the issue.

We also tested using the UART driver (instead of the UART2 driver) with 6.20; this also does not fix the issue.

Do you or the R&D team have any idea which direction to look in next? I'm still committed to finding the root cause, since I don't want to get stuck on the 6.10 SDK. I'm compiling the firmware on a Mac, while the release notes recommend Windows. Could this potentially cause this issue?

Ryan Brown1 11 months ago in reply to Koen Kanters

TI__Guru**** 210357 points

Hi Koen,

The OS should not matter so long as you are following all of the dependencies listed in the Release Notes. I will message you privately to continue our discussion.

Edit:

Koen was able to resolve the issue offline:

After a lot of trial and error, we finally found out what causes the 6.20 firmware to break: it's not one thing but two. That, plus the fact that it takes 3-7 days for the firmware to crash, is why it took so long. The following 2 changes make the 6.20 firmware stable:

- Define `NVOCMP_RECOVER_FROM_COMPACT_FAILURE` (this was added in 6.20).

- Reverting to the UART driver (instead of UART2).

Regards,
Ryan

Zigbee & Thread forum