CC1352P7: Linux collector + mac coprocessor - lockup/resource exhaustion - Beagle Play hardware

Joshua van Tol

Part Number: CC1352P7


I'm using a Beagle Play as a collector in a Sub-Gig system, configured for 200 kbps frequency-hopping operation.

The collector binary has some light customization to add some messages that are custom to our application, and likewise the collector that runs on the Linux side has matching changes. The co-processor binary is running SDK 7.10.2.23, and the Linux binary is using the latest release, which is 4.40.00.03. The only modifications to the co-processor are to increase the number of Tx and Rx queue entries.

The sensors are running in a dual-mode configuration, using DMM to run the Sub-GHz stack and the BLE stack simultaneously. We are using SDK 6.10.0.29. I am using a custom application, written in Python, to talk to the collector over the TCP socket, the same as the TI example app.

The issue I've run into is that after a long period of runtime (6 weeks or so) with 8 sensors joined and reporting, the co-processor seemed to have stopped responding. Sensors were NOT joined as indicated by a few dead batteries from attempted re-joins, and by reading the provisioning profile sensor status characteristic. The collector socket was still open, and there was nothing observably wrong, but it was apparent that the co-processor to upper level communication had broken down.

What I'm looking for are some suggestions as to what might have happened, and how I might gather better information for the next time that this happens. I will need to implement a solution that can be deployed to the field and inspected after failure. It will not always be practical to have direct, real-time access to misbehaving units. I suspect that there may be a leak of some resource, for example if the MAC Tx or Rx queue had entries that were never released.
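For the "inspected after failure" part, one option on the Linux side is a small flight recorder that appends periodic health snapshots to a size-limited rotating file, so the last hours before a lockup are still on disk when the unit is retrieved. The following is only a minimal sketch using Python's standard `logging.handlers.RotatingFileHandler`; the function names and the snapshot fields (`joined_sensors`, `packets_last_minute`) are hypothetical placeholders, not anything provided by the TI collector.

```python
import json
import logging
import logging.handlers
import time


def make_flight_recorder(path: str, max_bytes: int = 1_000_000,
                         backups: int = 5) -> logging.Logger:
    """Return a logger that writes to a rotating file capped at
    roughly max_bytes * (backups + 1) bytes on disk."""
    logger = logging.getLogger("collector.flight_recorder")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep snapshots out of the root logger
    if not logger.handlers:   # avoid attaching a second handler on repeat calls
        handler = logging.handlers.RotatingFileHandler(
            path, maxBytes=max_bytes, backupCount=backups)
        handler.setFormatter(logging.Formatter("%(message)s"))
        logger.addHandler(handler)
    return logger


def record_snapshot(logger: logging.Logger, **fields) -> None:
    """Write one JSON line with a wall-clock timestamp plus caller-supplied fields."""
    entry = {"ts": time.time(), **fields}
    logger.info(json.dumps(entry, sort_keys=True))
```

Calling `record_snapshot(log, joined_sensors=8, packets_last_minute=12)` once a minute would leave one JSON object per line, which is easy to grep or plot after the fact.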

What I've done in the meantime is to put a watchdog on the communications from the collector to my application. If I don't see any sensor packets for fifteen minutes, I kill the collector, reset the co-processor, restart the collector and re-establish the TCP connection. This is quite a crude solution. It should be effective, but it's not great.
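Roughly, the watchdog logic looks like the sketch below. This is a simplified illustration rather than my actual code: the class name, the injectable `clock`, and the `recover` placeholder are all hypothetical, and the real recovery step (kill the collector, reset the co-processor, restart the collector, reconnect the TCP socket) is platform-specific and left as a stub.

```python
import time
from typing import Callable


class PacketWatchdog:
    """Triggers a recovery callback when no sensor packet arrives within timeout_s."""

    def __init__(self, timeout_s: float, on_timeout: Callable[[], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.on_timeout = on_timeout
        self.clock = clock            # monotonic, so wall-clock jumps do not matter
        self.last_packet = clock()
        self.recoveries = 0           # worth logging: shows how often lockups occur

    def packet_seen(self) -> None:
        """Call whenever a sensor packet is received from the collector socket."""
        self.last_packet = self.clock()

    def poll(self) -> bool:
        """Call periodically; returns True if recovery was triggered."""
        if self.clock() - self.last_packet < self.timeout_s:
            return False
        self.recoveries += 1
        self.on_timeout()
        # Restart the timer so recovery is not re-triggered on every poll.
        self.last_packet = self.clock()
        return True


def recover() -> None:
    # Hypothetical placeholder for: kill collector, reset co-processor,
    # restart collector, re-establish the TCP connection.
    print("recovering")


watchdog = PacketWatchdog(timeout_s=15 * 60, on_timeout=recover)
```

The main loop calls `watchdog.packet_seen()` on every sensor packet and `watchdog.poll()` on a timer.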

4 months ago

0 Arthur R☑️ 4 months ago

TI__Mastermind 21679 points

Hi Joshua,

Had you had direct access to the devices, I would have suggested using the following ROV dashboard for the BLE stack: https://dev.ti.com/tirex/explore/content/simplelink_cc13xx_cc26xx_sdk_7_41_00_17/docs/ble5stack/ble_user_guide/html/ble-stack-5.x-guide/debugging-index.html#debugging-common-heap-issues

But it seems like this is not an option, as you would need to connect to the CC1352P7 JTAG and leave a PC on.

What you can do instead is use our logging framework and dump the logs over another UART on the CC1352P7.
You would need an XDS110 probe (connected to the BeaglePlay USB port) to connect to the CC1352_TDO and CC1352_TDI pins (configured as another UART) on the TagConnect connector.

With our logging tool located under \tools\log\tiutil, you can then dump those logs to a file.

Regards,

Arthur

0 Joshua van Tol 4 months ago

Intellectual 705 points

Arthur,

It's not the BLE stack I'm concerned with. I'm running a collector here, in 15.4 sub gig mode, with no BLE stack involved.

Also, I like the idea of the logging, but I am going to need more info on how to configure the 15.4 stack to emit logging messages that will be useful to narrow down the cause of the coprocessor locking up.

I'm not entirely sure what the issue is here. It could be heap exhaustion, but it could also be some sort of deadlock in the transport mechanism that sends data to and from the co-processor, or it could be an issue with the 15.4 stack where Tx or Rx radio buffers become exhausted or wedged in some way.

Josh

0 Theo Lange 4 months ago in reply to Joshua van Tol

TI__Expert 4485 points

Hi Josh,

The status of the TI 15.4-Stack is written over UART by default, as shown in the training: https://dev.ti.com/tirex/explore/content/cc13xx_cc26xx_simplelink_academy_7_41_00_00/_build_cc13xx_cc26xx_simplelink_academy_7_41_00_00/source/154-stack/154-stack_01_sensor_collector/154-stack_01_sensor_collector.html#task-3-using-the-collector-and-sensor

We also have an extensive debugging guide: https://dev.ti.com/tirex/explore/content/simplelink_cc13xx_cc26xx_sdk_7_41_00_17/docs/ti154stack/html/ti154stack-guide/debugging-index.html#

In your case, I think the best approach is to follow Arthur's suggestion: dump the logs over UART and use the logging tool located under \tools\log\tiutil. There is an explanation in the shipped readme, and you can go through the Log.h file included in collector.c.

Kind regards,
Theo

0 Joshua van Tol 4 months ago in reply to Theo Lange

Intellectual 705 points

Theo,

A couple of things here.

1.) I'm running the co-processor, which means half of the 15.4 stack (the part that emits status reports via UART, or in the case of the Linux collector, to STDOUT) is running on the Linux side. I have some visibility there, and I don't think that the upper layers of the stack are broken.

2.) The MAC part of the stack is running on the CC1352P7. I've got a mechanism in place to send debug data to a log. I don't really need help with that. What I do need to know is which information I should be inspecting? There's very little obvious stuff to log. What I'd like to know is how to get the status of the MAC, how many buffers of each type are allocated, internal state variables, etc.

0 Theo Lange 4 months ago in reply to Joshua van Tol

TI__Expert 4485 points

Hi Joshua,

thank you for the clarification.

For the TI 15.4-Stack, you can follow the link controller implementation (jdllc.c, cllc.c).
There you can see which state transitions occur.

Kind regards,
Theo
