Compiler/CC3220MODA: Queries / Best practices on using Watchdog in CC3220

Part Number: CC3220MODA
Other Parts Discussed in Thread: CC3220SF, CC3200

Tool/software: TI C/C++ Compiler

Team,

We have a few queries regarding the use of the Watchdog on the CC3220. The answers are necessary to build a stable system that always recovers from a crash. We would appreciate help understanding these in detail, and please let us know if there are any best practices for using the Watchdog on the CC3220. The example works well; however, we have a multi-threaded application, so our implementation is a little more complex.

  1. Is it advisable to start the Watchdog in the main thread itself, i.e. from int main(void), soon after Board_initGeneral() and before initialising any other threads?
  2. If the Watchdog has to be started in a separate detached thread, should it have the lowest priority (1), and is a stack size of 1024 sufficient, as shown in the example?
  3. Is it fine to increase the timeout / reload interval of the watchdog to 5 seconds? I have other threads that loop every 3 to 4 seconds, so 5 seconds seems optimal.
  4. Does the callback function always need a while (1) {} loop?
  5. In our multi-threaded application, we use a kicker thread that checks the health of all other threads and, when they are healthy, clears the watchdog interrupt. Is this approach correct and, if so, what is the ideal priority for this thread?
  6. The watchdog.c example clears the interrupt twice, i.e. after Power_disablePolicy() and after Power_enablePolicy(). Is this always necessary? Could you please explain this usage?
  7. Will a while (1) {} loop in the application (mostly used to handle errors) trigger the watchdog reset?
  8. Is the watchdog guaranteed to do a full reset on the CC3220 on a timeout? Are there any additional considerations or steps to be followed?
  9. Are there any cases where the watchdog might fail, for example due to an application error, memory issues or hardware-related trouble?

Really appreciate your help.

Zac

  • Team,

    Appreciate if someone can help me with this on priority.

    Regards,

    Zac

  • Hi Zac,

    First, I recommend that you read chapter 10 of the TRM (http://www.ti.com/lit/ug/swru465/swru465.pdf).

    The watchdog example is not optimized for the CC3220 implementation (the example is shared between CC3220 and other SimpleLink devices which support other watchdog mechanisms).

    In CC3220 the callback (ISR) is invoked once the timeout expires. The watchdog keeps counting until a second expiration occurs and then automatically triggers the RESET.

    This means that you can use the callback to clear the watchdog (after doing some self-tests to verify that all the threads are functional).

    Answers to your questions:

    1. It is good practice to protect your initialization code, but you need to be careful if the init sequences are longer than the timeout. If you use the callback to check certain states/flags in your code, you should prepare them in advance.

    2. The watchdog thread priority (if a thread is used) should be the lowest. The stack size depends on your implementation.

    3. You can increase the timeout up to 53.5 seconds (again, it will count the timeout twice before the RESET occurs). Of course the best value is implementation dependent (you need to make sure that the low-priority thread gets invoked, or that the status maintained by the watchdog ISR is updated, within the selected interval).

    4. No. In fact this is not optimized for the CC3220 watchdog mechanism (in this case, if you set the timeout to 5 seconds, it will stay in the while loop for another 5 seconds waiting for the reset).

    5. There are many methods that will work. The thread you mentioned can be the low-priority watchdog thread. Alternatively, you can create a higher-priority thread that runs periodically (sleeping for less than the watchdog interval, then checking the status and clearing the watchdog). You can also use the ISR (callback) to perform the check and clear the timer.

    6. This is just an example

    7. no

    8. after the 2nd timeout the MCU will reset.

    9. There are no issues that we are familiar with.
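
    To make the numbers in answers 3 and 4 concrete, here is a minimal sketch of the arithmetic. It assumes the watchdog counts down a 32-bit register from an 80 MHz clock (an assumption that is consistent with the roughly 53.5 second maximum quoted above; please verify it against the TRM). The helper names are illustrative and are not part of any TI API.

```c
#include <stdint.h>

/* Assumed watchdog clock rate in Hz (hypothetical constant; verify in the TRM). */
#define WDT_CLOCK_HZ 80000000ULL

/* Convert a timeout in milliseconds to reload ticks.
 * Returns 0 if the value does not fit in the 32-bit reload register. */
static uint32_t wdt_ms_to_ticks(uint32_t timeout_ms)
{
    uint64_t ticks = (WDT_CLOCK_HZ * timeout_ms) / 1000ULL;
    return (ticks > UINT32_MAX) ? 0u : (uint32_t)ticks;
}

/* Worst-case delay from the last clear to the hardware reset: the first
 * expiry only raises the interrupt, the second one resets the MCU. */
static uint32_t wdt_worst_case_reset_ms(uint32_t timeout_ms)
{
    return 2u * timeout_ms;
}
```

    With a 5 second timeout, the reset can arrive up to 10 seconds after the last clear. On target, the TI Drivers Watchdog module provides Watchdog_convertMsToTicks() and Watchdog_setReload() for the same purpose.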

    Br,

    Kobi

  • Hi,

    Just a comment on point 9. I know of at least two ways the watchdog can fail:

    • Collapse of the main clock. This can be simulated by touching around the 40 MHz XTAL on the CC3220 LaunchPad. In this case the CC3220SF chip will freeze "forever". The MOD is much less susceptible because the XTAL is inside the metal can.
    • Once the WDT on CC32xx devices is started, it cannot be deactivated. But the WDT uses the main clock, and this clock is gated by the PRCM peripheral. That means the clock for the WDT can be disabled, which effectively deactivates the WDT. It is unlikely that you would do this unintentionally from software.

    Jan

  • Thanks a lot Kobi. Somehow, despite having the watchdog set, in a few systems the WDT reset did not trigger as expected.

    My watchdog implementation is as follows - I am using one global variable for each thread which holds the last run timestamp(integer) and this timestamp will be updated based on some thread activity which runs periodically. A kicker thread checks all these global variables against some threshold and will clear the ISR only when the timestamps are valid. I realise that sharing variables among threads isn't safe due to read-write race condition, however since we are updating an integer timestamp, the probability of error may be minute and even if it happens it should either result in WDT reset or it should run fine in the next iteration (assumption though). So the remaining possibility is that the WDT may have failed or something abruptly went wrong with the timestamp variables (corruption or something).
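
    A minimal sketch of the timestamp check described above, assuming a toolchain with C11 atomics so that the shared timestamps are read and written without a data race. The thread count, function names and time unit are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_THREADS 3u   /* illustrative thread count */

/* One last-run timestamp (in seconds) per monitored thread. */
static _Atomic uint32_t last_run[NUM_THREADS];

/* Called by each worker thread after it completes a cycle. */
static void report_alive(uint32_t thread_id, uint32_t now_s)
{
    atomic_store(&last_run[thread_id], now_s);
}

/* Called by the kicker thread: true only if every thread reported
 * within max_age_s. Unsigned subtraction keeps the comparison valid
 * when the timestamp counter wraps around. */
static bool all_threads_fresh(uint32_t now_s, uint32_t max_age_s)
{
    for (uint32_t i = 0; i < NUM_THREADS; i++) {
        uint32_t age = now_s - atomic_load(&last_run[i]);
        if (age > max_age_s) {
            return false;
        }
    }
    return true;
}
```

    The kicker would clear the watchdog only when all_threads_fresh() returns true. Computing an age by subtraction, rather than comparing absolute timestamps, avoids a false "healthy" or "stale" verdict when the counter wraps.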

    Please find my remarks below. Appreciate further help.

    1. So this means that the watchdog does not necessarily need to be started inside a detached thread, right?
    2. Ok. Since I am invoking the watchdog from main itself, I am not sure if the priority can be lowered. Is this acceptable?
    3. Ok.
    4. Ok. My current implementation had while (1) {} and I hope this would not have any side effects.
    5. I need to perform some file operations during a watchdog reset and hence used a separate thread instead of the WDT ISR. In my implementation I set this thread to the maximum priority since I need the file operation to happen irrespective of the state of the other threads.
    6. So I need not use Power_disablePolicy() and Power_enablePolicy() in my implementation, right?
    7. Ok.
    8. Ok.
    9. Ok.

    Regards

    Zac

  • Thanks Jan,

    Both conditions aren't expected in our scenario, hence I am unsure why it had failed.

    Regards

    Zac

  • What do you mean "in few systems WDT reset did not trigger"? what is the difference between the systems?

    Is the callback (ISR) invoked? 

    Did you check that CLEAR is not happening?

    The Watchdog is hardware based (as explained) and can be verified easily (e.g. using the example).

    Br,

    Kobi  

  • Hi Kobi,

    These are modules installed at customer locations; they have similar h/w and run the same code. The watchdog is implemented in such a way that, if any thread fails OR if an MQTT connection cannot be established within 6 hours of retrying, the watchdog clear will not be called, which should result in an MCU reset. This was tested and worked fine during development.

    However, 3 out of 50 units running in production did not make an MQTT connection for 4 or 5 days and did not reset as expected, and we aren't sure what went wrong. We were also not able to get any logs from these systems, and no issues appeared during our development. Hence we are analyzing the implementation to find any anomalies in the code. Appreciate your help.

    Could you please comment on the queries ?

    Regards,

    Zac

  • Hi Kobi,

    Appreciate if you can help me with my queries.

    Regards,

    Zac

  • Hi Zac,

    What queries? There are answers to all your queries, what is missing?

    Br,

    Kobi

  • Hi Kobi,

    Please find the queries below (continuation to the initial questions).

    1. So this means that the watchdog does not necessarily need to be started inside a detached thread, right?
    2. Ok. Since I am invoking the watchdog from main itself, I am not sure if the priority can be lowered. Is this acceptable?
    3. Ok.
    4. Ok. My current implementation had while (1) {} and I hope this would not have any side effects.
    5. I need to perform some file operations during a watchdog reset and hence used a separate thread instead of the WDT ISR. In my implementation I set this thread to the maximum priority since I need the file operation to happen irrespective of the state of the other threads.
    6. So I need not use Power_disablePolicy() and Power_enablePolicy() in my implementation, right?
    7. Ok.
    8. Ok.
    9. Ok.

    Regards

    Zac

  • Hi Kobi,

    Any update on the query?

    Regards,

    Zac

  • 1. The watchdog doesn't require a detached thread (you can work in the interrupt handler or use any thread context to reset the watchdog).

    2. I don't understand your concern about the context where you invoke the watchdog. The only thread-related concern is the context in which the watchdog gets reset. Anyway, you can create a lower-priority thread for the watchdog in your main (there are no restrictions on the priority you set when creating a thread).

    4. It means that you basically take the decision to reset after X seconds (the watchdog timeout), but then your app will continue running for another X seconds until the actual reset is performed. This is an unnecessary delay that can be handled better. If the while(1) happens in a high-priority context, the system will be unresponsive until the reset occurs (i.e. for X seconds).

    5. this should work.

    6. if you want to use low power you will need to enable the power policy (at init). You won't need to move between disabling and enabling as in the example. 

    Br,

    Kobi

     

  • Thanks Kobi for the feedback. Our WDT routine did not work as expected on a few modules, hence we wanted to ensure that our watchdog usage is correct.

    The watchdog is implemented in such a way that the WDT triggers a reset if the MQTT connection cannot be established continuously for more than 4 hours. However, a few of our modules (3 out of 70 units in production) did not reset or recover even after 2 days of failed connections. There is no way to get logs from these modules, and such behaviour isn't expected unless the WDT fails, which doesn't seem to be the case. These modules came back online only after we did a hard reset. We had tested the WDT reset routine multiple times during development and it always worked.

    Does the WDT reset both the MCU and the NWP completely? Something we noticed recently during debugging is that the NWP once did not recover after a few WDT resets, i.e. the reset routine worked, the MCU restarted and WLAN connected, but it was unable to get a successful socket connection. The SNTP call failed, the OTA call failed and the MQTT connection failed, even though there was an internet connection. It finally worked when we did a hard reset using the reset button. Though this happened only once, it could be a reason for the issue I stated above. The WDT reset may have happened, but the NWP may not have recovered completely, which would have ended up in failed connections for days. Do you think this could happen, or should we look elsewhere? Appreciate your help.

    Regards,
    Zac

  • Hi Kobi,

    Also if the NWP fails due to a driver abort or due to some memory issues, can it not be recovered by the WDT reset ? Or should we explicitly call sl_Stop(300) or manually reset using Platform_Reset() during any driver failure ?

    Regards,

    Zac

  • Hi Zac,

    In theory WDT should be used for problems you weren't expecting in advance. If you are monitoring a specific error, you can trigger the Platform_reset.

    It is good practice to reset the NWP (sl_Stop/sl_Start) when the MCU wakes up from a WDT reset. I'm not sure whether the WDT resets the NWP.

    Br,

    Kobi

      

  • Hi Kobi,

    Yes, on CC3220 and CC3235 the ROM bootloader restarts the NWP (and the MAC and baseband as well) after recovery from a WDT reset. This recovery issue existed only on the CC3200. See TRM chapter 10.4.1.

    Jan

  • Thanks Jan!

    Br,

    Kobi

  • Thanks Kobi & Jan,

    I too read that the CC3220, unlike the CC3200, may not require an explicit NWP reset. So what could be the reason for our module failures? We are seeing more and more modules failing in production and this has become a major concern now. It could be noise, a memory overflow or even some application bug; however, the WDT is supposed to help in recovery, right?

    We would really appreciate it if someone could look into this on priority and help us soon. We have already gone ahead with the product and it's not easy to recall. We hope you understand.

    Regards,

    Zac

  • Hi Zac,

    I don't know of any other potential issue that may cause the WDT to get stuck, apart from the two reasons described in my previous answer.

    There may be one additional reason. If I am not wrong, the clock for the WDT is gated in power save modes, so maybe something around power save is going wrong. It is hard to give you reasonable advice if you are not able to simulate the issue at your desk. But I agree with Kobi: the WDT should be used as the last line of defence, and all expected issues should be solved inside your code.

    To be honest, I don't fully understand the design of your code around the WDT. That design sounds very complex to me and prone to implementation issues. But maybe it is only my problem with understanding.

    Jan

  • Hi Jan,

    Can you please provide more info or scenarios on the second reason you mentioned i.e WDT uses main clock and this clock is gated by PRCM peripheral ?

    Also, will any of the power save modes interfere with WDT in CC3220? Could you please give more details?

    Thanks

    Zac

  • Hi Kobi,

    Do you think a stuck sl_Stop function may interfere with the WDT timer?

    I was reading the following thread and finds it similar - https://e2e.ti.com/support/wireless-connectivity/wifi/f/968/t/759892?tisearch=e2e-quicksearch&keymatch=CC3220:%20WatchDog%20did%20not%20trigger

    In our code, we call sl_Stop in case of any driver abort, and at that time some of the peripherals, like the ADC, would still be open.

    What are the most appropriate steps to perform during a driver abort? We have some files to be written, so I assume an sl_Stop and sl_Start is necessary, right?

    For any specific errors, should we call sl_Stop(with zero timeout) before Platform_reset?

    Really appreciate your quick support.

    Regards

    Zac

  • Hi Zac,

    The 2nd point is about disabling the clock for the WDT peripheral via the driverlib PRCM API. It is generally about calling an API like PRCMPeripheralClkDisable(PRCM_WDT, PRCM_RUN_MODE_CLK). I don't expect that you would do this inadvertently. The WDT on CC32xx devices also has a feature that stalls the WDT when you hit a breakpoint during debug. This should not be your issue either.

    The power management driver of CC32xx devices gates the WDT during power save modes, because it simply makes sense and is mandatory for proper function. You can look for yourself at the code inside \source\ti\drivers\power\PowerCC32XX.c and figure out how this driver works internally. The important parts are the line with MAP_PRCMPeripheralClkDisable() and the PowerCC32XX_module structure with the PRCM_WDT element. Theoretically there may be some issue inside this driver, but I think this is very unlikely. There is some code inside the CC32xx SDK that I don't use because I don't trust it, but this power management code is not part of it.

    If I had to guess, I would suspect the design of your code and its use of the WDT. Why? It is simple: I was not able to fully understand the design of your code. I have written a lot of firmware for wired/wireless Ethernet devices where functional safety was a key part of the design. Your functional safety design is too complex for me. A few pieces of advice from my side:

    • try to redesign your code so that the WDT is the last line of defence and not an integral part of your recovery mechanism for issues that you can anticipate and solve in a different way
    • try asking someone outside your project (e.g. a different department of your company) for an independent inspection of your code
    • review your code module by module, and take the time to try to simulate your issue

    BTW ... the topic of where you should and should not call sl_Stop() before a reset is very complex, with many consequences. I'll leave the answer to Kobi, because I have already written too many lines in this answer.

    Jan

  • Thanks Jan. We are using WDT as a last defence border and since there are multiple threads, it has become a little complex I guess. We have also removed all global variables (as mentioned earlier) and have used mqueue for the kicker thread. We have 6 threads which we want to safeguard and hence each thread is expected to give a heart beat to the kicker thread which ultimately clears the WDT interrupt (if all the threads are reporting at defined interval). The reason we have included a connectivity check(i.e no internet connection reported for few hours) to the kicker thread is because we had instances where the NWP failed due to unknown reason and there was no way to find them and recover such failures.
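
    The heartbeat scheme described above could be sketched roughly as follows. This is an illustration under stated assumptions and not our production code: the struct, the function names and the offline limit are hypothetical, and the message queue and the actual watchdog clear are left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_WORKERS 6u
#define ALL_WORKERS_MASK ((1u << NUM_WORKERS) - 1u)

typedef struct {
    uint32_t seen_mask;      /* bit i is set once worker i has reported */
    uint32_t offline_s;      /* seconds since the last good connection */
    uint32_t max_offline_s;  /* stop kicking after this long offline */
} Supervisor;

/* Record one heartbeat message taken from the queue. */
static void supervisor_on_heartbeat(Supervisor *s, uint32_t worker_id)
{
    if (worker_id < NUM_WORKERS) {
        s->seen_mask |= (1u << worker_id);
    }
}

/* Run once per check interval. Returns true if the watchdog should be
 * cleared; the caller would then invoke the real clear routine. */
static bool supervisor_should_kick(Supervisor *s, bool online, uint32_t interval_s)
{
    bool all_alive = (s->seen_mask == ALL_WORKERS_MASK);
    s->seen_mask = 0u;  /* every worker must report again next interval */

    s->offline_s = online ? 0u : s->offline_s + interval_s;
    return all_alive && (s->offline_s < s->max_offline_s);
}
```

    Clearing the mask on every check means a worker that stops reporting is noticed within one interval, and the offline counter gives the connectivity rule a single, testable place to live.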

    Since you mentioned about the debug stall for WDT, do you think the WDT may fail in case this got enabled in production module (though no break points are expected)?

    The WDT, being the last line of defence, is expected to work reliably regardless of any logic, memory, LPDS or even driver failures. However, it's really alarming that the WDT somehow failed in many instances. The WDT is started in the main thread before spawning other threads, and it looks like it is getting stalled in some cases (driver abort, sl_Stop, LPDS etc., which ideally should not interfere).

    We now have 7 modules (7 customers) that have reported random crashes despite having the WDT logic in place, and this is a major concern. We would really appreciate it if any issue with the WDT could be identified and fixed with priority so as to avoid further concerns. I do understand that without replicating the problem in a debugger it's hard to find the cause; however, it may help if someone can re-check the WDT libraries and implementation.

    Appreciate all support and quick help.

    Regards,

    Zac

  • Hi Zac,

    In my application I use the WDT configured from driverlib. I have never seen any issue with the CC32xx WDT, and I have a very complex application with 18 tasks.

    In my previous post I talked about the STALL feature of the CC32xx WDT (please see TRM chapter 10.3.6, register WDTTEST and the STALL bit). But I think this should not be related to your issue.

    The WDT is a very simple peripheral, and it is very unlikely to fail if it has the main clock enabled and you kick it properly.

    BTW ... where exactly do you kick the WDT? Is it in your "supervisor" task or in the ISR from the WDT?

    Jan

  • Hi Jan,

    We are clearing watchdog from the supervisor task and we don't do anything inside the ISR.

    WDT open is the first task we do in our application and it is very unlikely that the timer initialization have failed. The only change we have is that the reload time is dynamically changed to 5 seconds soon after WDT initialization. This had also been tested.

    Does Platform_reset has any dependency on WDT? Should we stop WDT before calling Platform_reset ?

    We are really stuck at this.

    Regards,

    Zac

  • Hi Zac,

    "We are clearing watchdog from the supervisor task and we don't do anything inside the ISR."

    OK, that is the correct way, because clearing the WDT inside the ISR may be a potential risk.

    "Does Platform_reset has any dependency on WDT? Should we stop WDT before calling Platform_reset ?"

    I am not aware of any dependency between reset and the WDT. The Platform_reset() function internally calls PRCMHibernateCycleTrigger(), which triggers a short hibernation cycle. This is because a full SoC reset must also restart the MAC and baseband (not only the application MCU and NWP). The CC32xx WDT hardware cannot be stopped once you start it. The only way is to cut off its main clock through the PRCM API.

    Yes, I understand that you have an issue. But I am not sure how someone else could help you. If you are not able to simulate the issue at your desk, then it is even less realistic that an outsider without your code and hardware will be able to figure out what is wrong in your case.

    Jan

  • Thank Jan.

    I have been trying to trace the scenario in which the modules crashed, and it looks like the issue happened to those modules after a successful OTA. These devices did come back online after the OTA, but failed after some time (especially when WLAN was disconnected for a while or when the internet was not available). Moreover, these modules worked fine after we did the manual/hard reset. Does it mean that the Platform_reset along with the sl_Stop in OTA did not reset the SoC completely? Or did it interfere with the WDT?

    Regards,

    Zac

  • Hi Zac,

    I don't have an explanation for such behaviour, especially if you are using the TI OTA code and the OTA procedure completed successfully. Proper function of the WDT is an integral part of the OTA update procedure, but I am not sure how this could affect your case. Please wait for an answer from the TI side. Maybe they will have an idea.

    Jan

  • When doing the TI OTA, after the MCU reset, the bootloader triggers the WDT to protect the validation of the new image. This behavior is based on the code in "Platform_CommitWdtConfig" - if the bootloader finds the "/sys/mcubootinfo.bin"  system file with valid configuration (of timeout) when device is powered up in OTA mode (PENDING COMMIT), it will start the watchdog.

    In our example (e.g. cloud_ota.c), once you decide to commit the new  image (check OtaCheckAndDoCommit() ), the "Platform_CommitWdtStop"  resets the WDT.

    This behavior may conflict with your usage.

    Br,

    Kobi

  • Hi Kobi,

    Will the OTA watchdog call, Platform_CommitWdtStop, disable the WDT started in the application? We assumed that the two use different timers.

    I am not able to find any documentation stating such issues. Could you please provide further details? If there is a conflict, then how should we use app WDT along with OTA watchdog?

    Regards,

    Zac

  • Hi Zac,

    I think with your discovery about OTA and the last comment from Kobi, we are able to move forward. I think this can really be the reason for your issue, because OTA uses the same WDT as you are using (there are not multiple WDTs on CC32xx designed to be used from the application MCU).

    The function Platform_CommitWdtStop() internally calls the TI-Driver API PowerCC32XX_reset(PowerCC32XX_PERIPH_WDT), which internally calls MAP_PRCMPeripheralReset(PRCM_WDT).

    From TRM:

    I think your next step should be to test your WDT system right after an OTA update, just to be sure that you are on the right track.

    Jan

  • Thanks Jan, 

    I guess this was the reason the WDT stopped. I don't have the development board right now and shall test this at the earliest.

    Meanwhile what could be a solution to this problem? I think both Watchdog's are necessary and some kind of logic has to be applied.

    Regards

    Zac

  • Hi Zac,

    I am not quite sure. Maybe a device restart, or just configuring and starting the WDT again. You can wait for an answer from Kobi, or you can find the best solution for yourself. The good thing is that this can be easily simulated.

    Jan

  • Hi Jan,

    I just tested my WDT with OTA and confirmed that it was indeed the PowerCC32XX_reset(PowerCC32XX_PERIPH_WDT) call which stopped the application WDT and finally ended up in system failure. I think the same issue will happen to anyone who uses WDT with OTA and this needs a quick fix.

    Hi Kobi,

    Could you please look into this on priority and give us the best solution? I would also request you to add a warning in the OTA example as well as the WDT example that the WDT is shared and will interfere with the application.

    Regards,

    Zac

  • You can either initiate the WDT after the commit is performed or reset the MCU (again) after the commit (so it will wake up in operational mode).

    In both cases your WDT setting will be used. 
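
    The two options can be reduced to one rule: the application (re)starts the watchdog whenever ownership of the single hardware WDT passes to it. A minimal sketch of that rule follows, assuming the application can learn whether the device booted in PENDING COMMIT and whether the commit has completed. All names are hypothetical and only model the decision; they do not call any SDK function.

```c
#include <stdbool.h>

/* Who is entitled to the single hardware watchdog at a given moment. */
typedef enum { WDT_OWNER_NONE, WDT_OWNER_OTA, WDT_OWNER_APP } WdtOwner;

/* pending_commit: the boot code armed the watchdog to guard a new image.
 * commit_done:    the application has committed that image, which resets
 *                 the watchdog peripheral as a side effect. */
static WdtOwner wdt_owner(bool pending_commit, bool commit_done)
{
    if (pending_commit && !commit_done) {
        return WDT_OWNER_OTA;
    }
    return WDT_OWNER_APP;
}

/* The application must (re)start its watchdog whenever ownership passes
 * to it, including right after a commit. */
static bool app_must_start_wdt(WdtOwner previous, WdtOwner current)
{
    return (current == WDT_OWNER_APP) && (previous != WDT_OWNER_APP);
}
```

    If the MCU is reset after the commit instead, the next boot is a normal one: the previous owner is WDT_OWNER_NONE, the current owner is WDT_OWNER_APP, and the application starts its watchdog as usual.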

    Br,

    Kobi

  • Hi Kobi,

    Please give more details on the changes required. A code snippet would be really helpful.

    What happens when both WDT timers are started? The application WDT is set to run every 5 seconds where as the OTA WDT has 50 seconds.

    Appreciate if you could analyze this a little further and help us to add an elegant solution.

    Regards,

    Zac

  • Hi Kobi & TI Team,

    Appreciate if you can provide a solution to this at the earliest.

    Regards,

    Zac

  • Hi Zac,

    I think that Kobi already answered your question. After the commit you can restart the MCU or start the WDT again. The choice is up to you.

    As I said before, there is only one WDT. The OTA code uses the same WDT hardware as you are using.

    Jan

  • Hi Jan,

    The application sets the WDT interval to 5 seconds and I believe the OTA WDT will be impacted due to that. Starting application WDT after the OTA thread also has some cons. I would request TI team for a careful analysis and give a stable solution instead of some workaround which may break the system later.

    Regards,

    Zac

  • Hi Zac,

    I'm not sure what are your expectations. I've explained the WDT mechanism and the OTA support and now it is up to you to decide how to use it in your application.

    In addition to the previous suggestions you can also do one of the following:

    1. Disable the OTA WDT (by not writing the "/sys/mcubootinfo.bin" file in Platform_CommitWdtConfig(); make sure the file does not already exist in your file system), but then you risk that a corrupted image will not run (and no WDT will be triggered).

    2. Change the OTA WDT default timeout (50 seconds set by the OTA example, see in OtaInit())

    Br,

    Kobi

  • Hi Kobi,

    We wish you would understand how critical these issues are, and how they would impact a product's brand and presence in the market. The suggestion to disable the OTA WDT is not at all recommended for any product when you know that it may fail one day. We also haven't seen any comments from you explaining how the whole OTA works along with the WDT rollback. The user manual does not seem to have the WDT section explained for OTA process. Please let us know if we have missed something.

    We also don't understand how the system would behave(in a long run) when the WDT is started or opened twice (first by the OTA boot file and then by application). We choose TI based on its value and quality, and we are really worried about the number of issues we had come across so far (most of them are reported in this forum). As a startup, majority of the implementations are derived from the examples from TI (especially the OTA and provisioning) and would really appreciate if you could make them more complete, stable and documented. We do accept the fact that we failed to explore everything before using them, but our choices were limited.

    Though we had reported about the WDT failure, no one really pointed about the OTA WDT usage and it wasn't discoverable in any test or debug. We assume that there may be other companies who may even have this undiscovered. And by the time we identified the issue, more than 10% of our clients were affected.  

    We still believe that the best solution comes from the makers who knows the complete system and hence requested for a better approach. I hope you would understand the situation and help us.

    Regards,

    Zac

  • Hi Zac,

    I understand that this is critical and understand your frustration but I can only explain the mechanisms and the options you have.

    We are always trying to improve the examples and the documentation.

    I agree that disabling the OTA WDT is not recommended.

    I did suggest a couple of alternatives that you can use. I don't know what else you are looking for.

    The OTA WDT is used when a new image is triggered (i.e. following a successful OTA download) . In case everything works, you should commit the update and then you can reset the OTA WDT and use your application WDT (or you can reset the MCU and the OTA WDT will not be triggered again).

    If you find something wrong with the image, you can trigger a reversion (or just reset the MCU). You can also wait for the WDT to expire and reset the MCU. In all those cases the previous image will be triggered following the reset (due to the fail-safe mechanism).

    Br,

    Kobi

     

  • Hi Kobi,

    I assume the OTA WDT is started during boot-up (if mcubootinfo.bin is present) before the application starts. What would happen if we start the application WDT again with another timeout (5 seconds in our case)? Will the WDT be updated to 5 seconds, and will the WDT still reset the MCU to roll back the image as set up by the OTA WDT?

    Also, if we plan to reset the MCU after the OTA commit, is Platform_Reset the command to use? Should I also close the peripherals and call sl_Stop (with timeout) before calling Platform_Reset?

    Regards,

    Zac

  • The OTA WDT is started by the boot code before the application is started.

    I'm not sure what will happen if the app changes the timeout. I don't think the application timeout will impact the original OTA version (because the timeout value is loaded only when the WDT is started or upon a timeout), but it is possible that something goes wrong during such configuration. Please try to avoid this initialization while the device is in the PENDING COMMIT state. You can poll the state using:

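    int32_t isPendingCommit;      /* receives the pending-commit flag */
    int32_t isPendingCommit_len;  /* receives the length of the returned value */
    int32_t Status;               /* return code of the query */

    /* Ask the OTA library whether the running image is still awaiting commit. */
    Status = Ota_get(EXTLIB_OTA_GET_OPT_IS_PENDING_COMMIT, &isPendingCommit_len, (uint8_t *)&isPendingCommit);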

    Platform_Reset should be fine. No other peripherals need to be closed. To be on the safe side, please call sl_Stop before the MCU reset.

    br,

    Kobi

  • Hi Kobi,

    Sorry, I deleted a few posts to avoid confusion. We need a few more clarifications before we make changes for the OTA WDT. Please correct us if any of our statements are wrong.


    1) How is the firmware rolled back upon OTA timeout expiration? Our understanding is that during the boot after OTA download, in case of OTA WDT timeout (2nd expiration) the MCU will be reset(WDT MCU reset) and since the IMAGE COMMIT is not done, on next boot the system will rollback to the old image. Will the isPendingCommit be false next time and does OTA WDT timer start again? It would be really helpful if you could explain how it works.

    2) Querying isPendingCommit using Ota_get(EXTLIB_OTA_GET_OPT_IS_PENDING_COMMIT, &isPendingCommit_len, (uint8_t *)&isPendingCommit) and then starting the app WDT seems fine. However, when should we start the OTA WDT (i.e. write mcubootinfo.bin)? It is currently done inside OTA_Init(), and OTA_Init() runs on each boot and at a periodic interval (once daily). Will this not result in the OTA WDT always being started, i.e. even if there are no OTA updates? And will this not interfere with the app WDT again?


    3) We had few instances where the sl_Stop inside OtaImageTestingAndReset got stuck and did not return because the ADC pins weren't closed. I am doubtful if similar issues may happen whenever sl_Stop is being called. Moreover the application WDT is disabled during the OTA process (as per the recommended approach) and it is critical that we perform a full SoC reset with 100% reliability. Hence closing the peripherals, invoking sl_Stop and then calling Platform_Reset seems to be safe for a complete reboot. You may disagree if this is not necessary.
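
    The reboot order proposed in point 3 could be sketched as below. The three hooks are hypothetical function pointers standing in for the driver close calls, sl_Stop(timeout) and the platform reset, so that the ordering can be exercised on a host; the stubs at the end exist only for that purpose.

```c
#include <stdint.h>

/* Hooks for the three shutdown stages (hypothetical wrappers). */
typedef struct {
    void    (*close_peripherals)(void);
    int16_t (*stop_nwp)(uint16_t timeout_ms);
    void    (*reset_mcu)(void);
} ShutdownOps;

/* Run the stages in order. A failing stop_nwp() is reported but does not
 * block the reset, so an NWP error cannot prevent recovery. */
static int16_t shutdown_and_reset(const ShutdownOps *ops, uint16_t stop_timeout_ms)
{
    ops->close_peripherals();
    int16_t stop_status = ops->stop_nwp(stop_timeout_ms);
    ops->reset_mcu();      /* does not return on real hardware */
    return stop_status;    /* reachable only with host-side stubs */
}

/* Host-side stubs that log the call order (illustration only). */
static char call_log[4];
static int  call_count;
static void    stub_close(void)        { call_log[call_count++] = 'C'; }
static int16_t stub_stop(uint16_t t)   { (void)t; call_log[call_count++] = 'S'; return -1; }
static void    stub_reset(void)        { call_log[call_count++] = 'R'; }
```

    Note that this only covers a stop call that returns an error. A stop call that never returns would still need a running watchdog (or a separate timeout) to force the reset.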

    Regards,

    Zac

  • 1. This is correct. isPendingCommit will be false after the device wakes up from the WDT reset (the OTA WDT will not be triggered). The boot code maintains an internal state that is updated during OTA.

    The logic is basically the following:

    The state is set to IMAGE_TEST when you call sl_Stop after a successful download (i.e. when you write a file with the bundle protection flag).

    When the device wakes up in IMAGE_TEST, it triggers the watchdog and updates the state to PENDING_COMMIT.

    Revert or Commit sets the state to OPERATIONAL.

    At the next wake-up, if the device is in PENDING_COMMIT (e.g. due to a WDT timer reset), it updates the state to OPERATIONAL (without triggering the WDT).

    2. Writing to mcubootinfo shouldn't impact the behavior, as the OTA WDT will be triggered only in a specific state (IMAGE_TEST). However, you shouldn't overwrite the file upon every init, to prevent flash wear-out.

    3. The PlatformReset only resets the MCU, so this seems like good practice. However, I'm not sure how the ADC closure impacts the sl_Stop.
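
    The state logic in point 1 can be modelled as a small state machine. This is a simplified model of the description above and not the bootloader's actual implementation; the enum and function names are hypothetical.

```c
#include <stdbool.h>

typedef enum { OTA_OPERATIONAL, OTA_IMAGE_TEST, OTA_PENDING_COMMIT } OtaState;

/* sl_Stop after a successful download marks the new image for testing. */
static OtaState ota_on_download_complete(OtaState s)
{
    (void)s;
    return OTA_IMAGE_TEST;
}

/* Boot-time transition. *start_wdt reports whether the OTA watchdog is
 * armed on this boot. */
static OtaState ota_on_boot(OtaState s, bool *start_wdt)
{
    switch (s) {
    case OTA_IMAGE_TEST:        /* first boot of the new image */
        *start_wdt = true;
        return OTA_PENDING_COMMIT;
    case OTA_PENDING_COMMIT:    /* rebooted without a commit: fall back */
        *start_wdt = false;
        return OTA_OPERATIONAL;
    default:                    /* normal boot */
        *start_wdt = false;
        return OTA_OPERATIONAL;
    }
}

/* Commit and revert both end the trial period. */
static OtaState ota_on_commit_or_revert(OtaState s)
{
    (void)s;
    return OTA_OPERATIONAL;
}
```

    In this model the OTA watchdog is armed on exactly one boot per download, which is why a watchdog reset during the trial does not arm it again on the following boot.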

    Br,

    Kobi

  • Thanks Kobi,

    Regarding point 2) Is it necessary to write mcubootinfo during OTA_init? Can we do this once the image download is complete? Will it have other side effects?

    Regards

    Zac 

  • You can do it anytime before calling sl_Stop and MCU reset (after the image is downloaded).

    Br,

    Kobi

  • Hi Kobi,

    Writing mcubootinfo soon after successful download and before sl_Stop seems to work fine in our dev environment. I may not have tested all cases though.

    Was there any specific reason the mcubootinfo write was done inside ota_init in the sample code? The OTA check would be done frequently in most cases and will lead to flash wear out if not changed. 

    Regards,

    Zac

  • There was no specific reason for that implementation. It is just an example.

    The mcubootinfo is just another file on the file system. You can write it anytime. It will only be referred by the bootloader upon MCU Reset.
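
    One way to avoid rewriting the file on every check is to compare the stored content with the desired content first. A minimal sketch of that comparison, assuming the caller has already read the existing file (if any) into a buffer; the function name is hypothetical and the file I/O itself is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if the file must be written: either it does not exist yet
 * (stored == NULL) or its bytes differ from the wanted configuration. */
static bool bootinfo_needs_write(const void *stored, size_t stored_len,
                                 const void *wanted, size_t wanted_len)
{
    if (stored == NULL || stored_len != wanted_len) {
        return true;
    }
    return memcmp(stored, wanted, wanted_len) != 0;
}
```

    Skipping the write when nothing changed costs one read and keeps the flash write count tied to configuration changes rather than to how often the OTA check runs.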

    Br,

    Kobi