LAUNCHXL-CC1352R1: No call to Scan confirmation pScanCnfCb after scanning for beacons

Part Number: LAUNCHXL-CC1352R1

Hello,

I am working on a product using the CC1352R1 Sub-GHz 15.4 radio stack, and it has been working fine in our "shorter" tests. However, after the devices (a mix of heavily modified versions of the collector and sensor examples) have been running for a couple of weeks, the sensor simply disconnects and stops generating scan confirmation calls, even after receiving multiple collector beacons. Nor does the sensor seem able to request a beacon from the collector. Both run in non-beacon mode.

However, if I get another sensor to request a beacon, the malfunctioning device also receives it during its scanning phase. Yet even though it receives the beacon correctly, it does not call the pScanCnfCb callback at the end of the scan, as it does during normal operation.

I have seen several other questions on E2E regarding the scan callback not being called:

https://e2e.ti.com/support/wireless-connectivity/sub-1-ghz/f/156/t/630595

https://e2e.ti.com/support/wireless-connectivity/sub-1-ghz/f/156/p/669240/2461426 

My question, then: is a scan that does not trigger the scan callback a known issue, and what could cause it?

Since I do not have access to the underlying source code for the 15.4 stack, I have not been able to get very far with debugging, other than confirming the behavior described above.

It seems to be completely arbitrary: some devices do this after just a couple of days, others about a month after being shipped to the customer.

  • Hi

    I will assign someone to look into this. In the meantime, please provide info regarding what SDK version you are running and also if you are running on custom HW or on our LPs.

    BR

    Siri

  • Hello Siri,

    Thank you for trying to find someone to help.

    The project is running on custom hardware using the CC1352R1F3 MCU. The project was built with the SimpleLink 4.10.0.78 SDK, 3.61.0.16 XDCtools and compiled using your "TI v20.2.1.LTS" compiler.

    I should probably add that by "heavily modified" examples I mean that we kept the example projects' "software stack" and replaced all of the actual application code.

    Regards,

    Felix.

  • Hello again,

    I understand that it takes some time to look into, but it would be nice to know whether someone is actually doing so and that it has not simply been dropped (as I have seen happen to several other posts on E2E). This issue is causing a lot of problems for our customers. We would not mind investigating it on our own, but, as I wrote, the source code is not available to us.

    Regards,

    Felix.

  • Hi Felix,

    The issues in the threads you linked should be solved in the 4.10 SDK, so I don't think you are seeing the same thing.

    I was looking at the following paragraph: 

    However, if I get another sensor to request a beacon the malfunctioning device also receives it during its scanning phase. Though, even after receiving the beacon correctly, at the end of the scan it does not generate a call to the pScanCnfCb callback, which it normally does during normal operation.

    Can you please give me details on how you debugged this, and how you found out that the sensor is receiving the beacon correctly?

  • Hello Marie,

    We let a couple of devices run for testing under the same conditions as the production devices. Once they showed symptoms of the problem (not being able to reconnect to a coordinator, even after an external button signal requested a reconnect), I connected to them in CCS as a running target. We had built the firmware to print debug messages from the callbacks (we have verbose console-log builds for cases just like this), reporting which coordinator each device connected to and which beacons it received during its scanning phase. I confirmed that the device received the reconnect requests (from both internal timers and external triggers) and handled them by calling the appropriate exposed functions in TI's radio stack, without getting any beacon response. I then had one of the still-working devices send the beacon request instead.

    I let the malfunctioning device (which shows no symptoms of malfunction apart from its behavior in the radio stack) enter its periodic scanning phase, and then triggered the working sensor device to enter its own scanning phase as well. The working device then requested a beacon on all of its configured radio channels from any available coordinators (running in non-beacon mode). The same coordinators that all the devices had been connected to throughout the test answered the request and broadcast their beacons to the working sensor device. The malfunctioning device also picked these up and called its "beacon notification" callback.

    The received beacon appears to be intact, contains the expected ID and triggers the normal callbacks for receiving a beacon (verified with breakpoints on the running malfunctioning sensor device). The only callback that was not called as expected was the "scan confirmation". As I have written in previous posts, I am not sure what the exact cause is, since without the underlying source code I cannot follow the issue further down the call stack.

    I suspect that this is triggered by repeated cycles of connecting to a coordinator, disconnecting a couple of hours later and reconnecting to a new (or the same) coordinator (though this is just a hunch, not based on any code or debugging observations!). It is the only correlation I have seen that could explain why some of our production devices are more susceptible to this problem than others. It is normal for our devices to switch coordinator after a couple of hours due to changes in radio quality, and we cannot influence this, as it is inherent to the use case of our product.

    Regards,

    Felix.

  • Hello Marie,

    Any news on this issue?

    Regards,

    Felix.

  • Hi Felix,

    Are you able to reproduce the issue on a default sensor project (or at least a sensor project with just a few alterations)?

  • Hello Marie,

    Unfortunately we have not used the example projects at all, apart from the 'software stack' folder and TI's proprietary Sub-GHz stack. The application code for our product was built on top of the 'software stack' and the interaction framework between the coordinator and sensor is not the same as in the examples. The product code base is quite complicated and big at this point, so I am not able to port it to an example project either.

    Of course, I understand that being able to reproduce the issue in a normal TI example project would make it easier to fix. I am not yet able to reproduce the problem on command. That is not to say the issue is a one-off: quite a few sensor devices end up with callbacks not triggering. But it only happens after quite some time in networks with a lot of traffic.

    I am very interested in sorting out this issue, as it is causing quite a few problems. Is it possible to read some debug flags from the proprietary radio stack, or at least load some symbols, so that I can debug the radio call stack?

    Regards,

    Felix.

  • Hi Felix,

    I understand.

    I assume if you restart the device it goes back to sending the callback?

    I would ask you to run some tests with different polling/reporting intervals, with the debug prints disabled. It sounds like your problem could be some sort of resource gap, and your debug prints are probably making it worse, since they usually take a good amount of time.

    For the TI 15.4-Stack we recommend polling/reporting intervals of around 10 s; if you are using a much shorter interval, you could be seeing some sort of buffer overflow.

  • Hello Marie,

    Thank you, I will try different poll rates. The affected devices use an interval of under 10 seconds, so it is possible. However, this will probably take quite some time to test.

    When you say that it could be a resource gap, are you referring to the device's overall shared resources (clock cycles for timing, shared memory, etc.) or specifically to non-exposed resources used by the proprietary Sub-GHz stack? When I looked at an affected device during debugging, no resource seemed to be exhausted; quite the opposite, actually.

    Regards,

    Felix.

  • Hi Felix,

    I mean in the application or in the TI 15.4-Stack.

  • Hello,

    This is a continuation of a post I made a while back that was never concluded:

    e2e.ti.com/.../launchxl-cc1352r1-no-call-to-scan-confirmation-pscancnfcb-after-scanning-for-beacons

    Unfortunately, that post was locked for some reason while we were testing the suggested changes, which, as I wrote in that post, takes a lot of time. I can now at least confirm that those changes do not resolve the issue. Our devices still end up in a bad network state in which the Sensor device's network stack no longer generates any callback on network connection attempts. The previous post suggested increasing the polling interval of the devices to "around 10s", so we set it to 10 s. This does not fix the problem: devices that occasionally switch coordinator eventually end up with a network stack that still recognizes coordinator responses when other Sensor devices request to join the network (i.e., it can still receive Sub-GHz packets), but that does not generate any requests of its own or call any end-of-scan callbacks (just as described in the linked post).

    We are very much interested in fixing this issue, as it is affecting the quality of our product. I would like to give more information and/or pursue the issue further down the call stack, but, as I stated in the previous post, as far as I know TI's network stack source is not available for building with symbols.

    Regards,

    Felix.

  • I will assign someone to follow up.

    Siri

  • Hi Felix,

    I joined the threads together so we have everything in one place.

    Thank you for doing the test. From your post it sounds like it doesn't make a difference, i.e., it does not take longer to reproduce the issue when you use a 10 s polling interval?

    Did you get any further in trying to make a minimal example for reproduction?

  • Hello Marie,

    Yes, I cannot say with certainty that increasing the poll interval to 10 s delays the inevitable network stack problem. It has always been very sporadic and dependent on the overall environment the devices are placed in. Devices in the harsh networking environment we work with can show the problem after just a couple of days, others after a couple of months.

    Again, unfortunately I do not have a minimal example that reproduces the issue, as it is not clear to us what causes the problem; we cannot follow the code execution past the API level of TI's network stack. As much as I would like to provide one, I do not have the time to identify the issue through trial and error: it can take several months just to find out whether an example would even capture the issue, let alone rewrite the entire repository, since we cannot give away company production source code. As I wrote in earlier posts, we use a custom implementation for interacting with the network stack, not one of TI's standard examples.

    Honestly, all I really want is to be able to trace the network callback stack to try to identify where the issue is located. I think that is a far more feasible alternative. Is there someone I can contact to achieve this?

    Regards,

    Felix.

  • Hi Felix,

    I see what you mean. As you know, we don't provide the source code in the SDK, and I cannot provide it to you.

    I'm not sure how to move forward here. Can you update to the newest SDK (5.10) and retest?

  • Hello Marie,

    I accidentally pressed the resolve button. I do not know how to undo it.

    I will try updating the stack to version 5.10 and testing again; with any luck, the issue is fixed there.

    This will, however, again take a couple of months. I will return with the results then.

    Regards,

    Felix.

  • Ok, let me know how it goes.

  • Hi Felix,

    Did you have time to do a test?

  • Hello Marie,

    Unfortunately not. There is a lot going on right now, but I still hope to be able to attempt a switch at some point.

    Regards,

    Felix.