CC1310: CC1310 stability of radio core on long runs

Part Number: CC1310

Hello,

We have been mass-manufacturing sensors based on the CC1310 for almost 3 years now. All of them are battery-operated and run continuously for years without restarts or user interaction.

They are based on the EasyLink rfWsnNode project, with major modifications to both the operating logic and the radio part (we only transmit sensor data, without waiting for an ACK or any other data from the collector; a kind of simple broadcast). Both of the cases described below involve an older version of SimpleLink (most likely 2.10 or 2.40).

After some time I noticed that some sensors stopped sending data. Investigating these cases is always difficult, since the devices are installed at customer sites in different countries. I will describe two situations that I saw myself:

- One device out of several hundred from the same batch, all installed at the same time, stopped sending data after approximately a year. I have a feature that allows the device to be restarted without physical interaction with the board, by holding a strong magnet close to the device for 20 seconds. If the magnetometer values exceed some limit, we call SysCtrlSystemReset(). Exactly after 20 s the device restarted and started sending messages correctly again.

- Two devices from another batch, with a different firmware version, stopped sending after several months. Other devices from the same batch are still working fine. I had physical access to these boards, and the power consumption on both of them was within the limits for this firmware version, so the main core was operating normally. A power reset returned them to normal operation and the radio started to work.
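
The magnet-triggered restart from the first case can be sketched roughly as below. This is only a minimal illustration, not the actual firmware: `MagnetReset`, `magnet_reset_update()` and the threshold value are hypothetical names and numbers, and only the 20 second hold time comes from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values: the post only says "some limit" and 20 seconds. */
#define MAGNET_THRESHOLD 1000u   /* assumed magnetometer magnitude limit */
#define MAGNET_HOLD_MS   20000u  /* 20 s hold time, as described above */

typedef struct {
    bool     active;    /* true while the field is above the threshold */
    uint32_t start_ms;  /* timestamp of the first sample above the threshold */
} MagnetReset;

/* Feed one magnetometer sample. Returns true once the field has stayed
 * above the threshold for the full hold time. */
static bool magnet_reset_update(MagnetReset *m, uint32_t magnitude, uint32_t now_ms)
{
    assert(m != NULL);
    if (magnitude <= MAGNET_THRESHOLD) {
        m->active = false;       /* field dropped: restart the hold timer */
        return false;
    }
    if (!m->active) {
        m->active = true;
        m->start_ms = now_ms;
    }
    /* Unsigned subtraction stays correct across tick-counter wraparound. */
    return (uint32_t)(now_ms - m->start_ms) >= MAGNET_HOLD_MS;
}
```

In the firmware, a `true` return value would be the point where SysCtrlSystemReset() is called. Because this path depends only on the sensor and the main core, a successful magnet restart shows that both were still running.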

So to me both of these cases look the same: the main core is operating normally, but the radio core is stuck.

Now I need your suggestions on how to improve the long-run stability of radio transmissions, so that a device will work for 5+ years without modifications or restarts.

I am planning to upgrade to the latest SimpleLink 3.20, since I saw several fixes for EasyLink (such as EZLINKPROP-544, for example). Will it improve the situation?

I am also thinking about calling EasyLink_init() before sending every message, not only at startup in NodeRadioTask.c, since it calls RF_close(rfHandle) to restart the radio core. Is this a good idea?

What other ideas can you propose to improve stability on long runs? Is my assumption of a stuck radio core correct, and what could actually cause it?

Looking forward to your ideas. Best regards,

Max

  • I am afraid that it is not possible to say anything about what would be a good "fix" when we do not know anything about what the problem is.

    You say that the radio stops transmitting, but how do you know that? Because you do not receive anything, or because you have some way of verifying that it does not transmit at all?

    Have you debugged your code to make sure that it handles all possible error cases you can think of?

    The devices that stop working, are they only implementing TX functionality, or do they do RX as well? If they also do RX, have you made sure that the code takes care of all possible error situations (CRC errors, addressing errors, length errors, buffer overflows, queue handling, etc.)?

    Siri

  • I know it is hard to say where the problem is. I was more interested in hearing best practices from the TI side about using EasyLink and the RF API. I know that I am not the only one who has trouble with the RF core (e.g. here)

    Yes, I know that the radio stopped sending because I don't receive any messages from the device. And I am also sure the main core was up and running properly.

    I don't know what exactly to debug, or how to simulate the RF core being stuck (maybe you can give me an idea?)

    Those devices only transmit; they don't receive anything. A watchdog for the main core is also set up.

    One more specific question from my side. Is it OK to restart the RF core (with RF_close() and other commands) before sending each message (say we plan to send more than 1 million messages during the sensor's lifetime)? Do you see any problems that may occur later because of this (for example, if the RF core reads something from flash on startup, and flash has a limited number of reads/writes, or something like this)?

  • I have talked to the designer of the RF driver, and an RF_close will not clean up anything extra (if the RF core is in a bad state), so you will not gain anything from doing RF_close and RF_open compared to just letting the power driver turn off the RF core when it is not in use.

    Are you using any kind of scheduling for your TX command, so that the problem could be related to this, or are you using TRIG_NOW?

    Is there any chance that your application, under certain circumstances, writes to locations it is not supposed to write to, and thereby overwrites some of your radio commands by mistake so that they fail?

    Unfortunately I am not able to give you any advice before we know more about where your code is failing. You say that it does not send, but we do not know if this is because the TX command was never sent, or because it was sent but never returned, etc.
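
One way to tell these cases apart on a device that cannot be debugged in the field is to keep a few counters around the TX call. The following is a minimal sketch under stated assumptions, not part of EasyLink or the RF driver: `TxDiag`, the `tx_diag_*` functions and `TX_STATUS_OK` are hypothetical names, and the status code stands in for whatever the actual TX call returns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TX_STATUS_OK 0  /* hypothetical "success" code of the TX call */

/* Counters kept in RAM so they can be read back later, for example with a
 * debugger on a returned unit or copied into a later payload. */
typedef struct {
    uint32_t submitted;    /* TX calls started */
    uint32_t returned;     /* TX calls that came back, with any status */
    uint32_t failed;       /* TX calls that came back with an error */
    int32_t  last_status;  /* status of the most recent returned call */
} TxDiag;

typedef enum {
    TX_NEVER_SUBMITTED,  /* the application never reached the TX call */
    TX_NOT_RETURNED,     /* a TX call was started but has not returned */
    TX_RETURNED_ERROR,   /* the last TX call returned an error status */
    TX_RETURNED_OK       /* the last TX call reported success */
} TxVerdict;

/* Call immediately before submitting the TX command. */
static void tx_diag_before(TxDiag *d)
{
    assert(d != NULL);
    d->submitted++;
}

/* Call immediately after the TX call returns, with its status code. */
static void tx_diag_after(TxDiag *d, int32_t status)
{
    assert(d != NULL);
    d->returned++;
    d->last_status = status;
    if (status != TX_STATUS_OK) {
        d->failed++;
    }
}

/* Classify the current state from the counters alone. */
static TxVerdict tx_diag_classify(const TxDiag *d)
{
    assert(d != NULL);
    if (d->submitted == 0u) {
        return TX_NEVER_SUBMITTED;
    }
    if (d->submitted != d->returned) {
        return TX_NOT_RETURNED;
    }
    return (d->last_status == TX_STATUS_OK) ? TX_RETURNED_OK : TX_RETURNED_ERROR;
}
```

A `submitted` counter that stops increasing between two readouts would point at the application rather than the radio, while `TX_NOT_RETURNED` would point at a TX call that is blocked.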

    Have you measured the current consumption on the failing device to figure out what state it is in?

    Since this is failing in field and you have not been able to debug the problem, how can you be sure that it is the RF part that fails and not some other part of the application? How can you confirm that the application sends a TX command with the correct configuration, and that it is this command that fails?

    I am sorry that I could not be of any more help at the moment.

    BR

    Siri

  • Please read my original post carefully. 

    I am sure that the main application is running normally because it reads data from the sensor and behaves as expected (e.g. the device performed a complete restart on detecting a big magnet nearby). After that restart (using SysCtrlSystemReset();) the radio started to operate normally and has kept working up until now.

    Also, as I wrote, on the other two devices with the same behaviour I was able to measure the current consumption, and it was normal for standby mode (around 60 uA including sensors and other peripherals). So I am really sure that the main core operates as expected and is not stuck.

    No, I don't think overwriting is possible; I am not using any dangerous techniques such as memcpy() or NVS.

    absTime is set to 0, so it is TRIG_NOW.

    OK, so if the RF core is in a bad state, what will EasyLink_init() return? I assume it wouldn't be EasyLink_Status_Success, right? So will this bit of code help to recover the device with a full restart in case the RF core is stuck, assuming I call it at the beginning of each TX event? Is it safe to reinitialise EasyLink on every message?

    /* Reinitialize EasyLink */
    if (EasyLink_init(&easyLink_params) != EasyLink_Status_Success) {
        SysCtrlSystemReset();
    }

  • EasyLink_init runs the setup command and the FS command, but it does not check whether they succeeded before returning EasyLink_Status_Success. The success status indicates that the PHY you have selected is supported etc., but it does not tell you whether the commands completed OK. I guess that CMD_FS could fail, but that in turn would cause the TX command to fail. However, if the synth fails, this would most likely happen just once, and the next time you run the FS command it would be OK again.
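
Since the init status does not reflect command failures, a check on the result of the TX call itself is the more meaningful place for a recovery decision. Below is a minimal sketch of that idea, assuming a failed TX is reported through the return status; `TxMonitor`, `tx_monitor_report()` and the limit of 3 are hypothetical. It tolerates a few isolated failures (such as a one-off synth failure) and only asks for a reset after several in a row.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TX_FAIL_LIMIT 3u  /* assumed number of consecutive failures tolerated */

typedef struct {
    uint32_t consecutive_failures;  /* failed TX attempts since the last success */
} TxMonitor;

/* Report the outcome of one TX attempt.
 * Returns true when the caller should perform a full system reset. */
static bool tx_monitor_report(TxMonitor *m, bool tx_ok)
{
    assert(m != NULL);
    if (tx_ok) {
        m->consecutive_failures = 0u;  /* any success clears the history */
        return false;
    }
    if (m->consecutive_failures < TX_FAIL_LIMIT) {
        m->consecutive_failures++;     /* saturate at the limit */
    }
    return m->consecutive_failures >= TX_FAIL_LIMIT;
}
```

In the node firmware, `tx_ok` could be the result of comparing the status returned by EasyLink_transmit() with EasyLink_Status_Success, and a `true` return value would lead to SysCtrlSystemReset(). Whether the failure seen in the field shows up as an error status at all is not known from this thread.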

    A current consumption of 60 uA indicates that the RF core is off; hence it is not the RF core that is "hanging" in some wrong state.

    Siri

  • Ok, thanks for the info