CC1352R: FLASH reads / writes leading to memory corruption / system failure

Part Number: CC1352R
Other Parts Discussed in Thread: SYSCONFIG, SYSBIOS

Hello,

This is a follow-up to this thread.

In that thread, a severe problem with the FLASH drivers or hardware came to light. A solution was proposed, but I am not convinced.
Currently I have a system that runs for over 2 weeks when I disable all NVS_write() and NVS_read() calls. If I enable them again, the system will crash within 1 hour.

In the previous thread the problem was visible as a mangled printf(); in this system it only shows up as the system failing. I see no difference in output (and I am no longer using printf(), but the UART).
Another difference is that in the system described in the previous thread, which was designed to isolate the problem, the NVS calls were in the main function, so before BIOS_start(). In this new system, the calls are in actual running tasks.

Yet, they still cause a system failure. Everything points to the NVS driver / FLASH driver / hardware. Disabling all tasks using NVS: system runs for weeks. Enabling those tasks but commenting out the NVS calls: system runs for weeks. Using NVS drivers: system fails within an hour.

I would very much appreciate more insight, as I cannot move on to the production phase of our new product with these problems still present.

Thank you.

--------------------------------------------------

SDK Version: 3.20.0.68, did not upgrade yet as it is a large codebase and I am not convinced that upgrading the SDK will solve the problems without knowing which changes were made.

  • If you are not able to update the entire SDK you can update the NVS driver only. This can be done by copying the source of the NVS driver into your CCS/IAR project.

  • Hi Severin,

    Thank you very much for your quick reply.

    I have run a diff on NVS files in the C:\ti\simplelink_cc13x2_26x2_sdk_3_20_00_68\source\ti\drivers and C:\ti\simplelink_cc13x2_26x2_sdk_3_20_00_68\source\ti\drivers\nvs folders. However, there are no differences except the renaming required for using SysConfig.

    So I do not think that this is the solution? That is why I would like more clarification on what has been changed, and why the problem would now be fixed. More information on the actual cause would be even better.

  • Hi,

    Let's continue here the conversation we started in this thread.
    I ran new tests this morning regarding your NVS issue.
    I am now wondering if I reproduced the right issue. In fact, the "issue" I produced was due to the display driver, not to the NVS driver. In short, between SDK 3.20 and 3.40 the default size of the display buffers has been changed. In the SDK 3.20, they were smaller and that is why I could not display the totality of the buffer.


    I reviewed your code once more and found a couple of issues that I would like to point out:
    - I actually found that the printf() is no longer being mangled after commenting out: ICall_createRemoteTasks(). Can you verify on your side?

    - I also found that moving the memcpy() to the actual task function of the mangled printf() makes the problem disappear. Can you verify this too?
    Note: this does not necessarily mean that the problem has been resolved. Perhaps the bits are now being flipped in another location, outside the printf() string. (so we need to investigate further)

    - From the previous points, I am pretty sure you are running into a heap problem here. The problem might be caused by a conflict between different heaps. I have noticed you are using malloc() and ICall. However, ICall uses its own heap... An interesting test would be to see whether the malloc() call maps to the TI-RTOS HeapMem module or to the libc version. These cannot coexist and may overlap.
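
One way to approach the heap question above is to compare the address returned by an allocator against the bounds of the heap it is expected to use. The following is a minimal host-side sketch, not target code: `block_in_region` and `fake_heap` are hypothetical names, and on the target the bounds would presumably come from linker symbols such as the `heapStart` / `heapEnd` pair referenced in the .cfg file (an assumption).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true when [ptr, ptr + len) lies entirely inside [start, end). */
bool block_in_region(const void *ptr, size_t len,
                     const void *start, const void *end)
{
    uintptr_t p = (uintptr_t)ptr;
    uintptr_t s = (uintptr_t)start;
    uintptr_t e = (uintptr_t)end;

    if (p < s || p > e) {
        return false;
    }
    return len <= (size_t)(e - p);
}

/* Host-side stand-in for the RTOS heap; hypothetical, for illustration only. */
static uint8_t fake_heap[256];

void demo_heap_check(void)
{
    const void *start = fake_heap;
    const void *end = fake_heap + sizeof(fake_heap);
    uint8_t outside[16]; /* a stack buffer, so not part of fake_heap */

    assert(block_in_region(fake_heap + 8, 64, start, end));
    assert(!block_in_region(fake_heap + 200, 100, start, end)); /* runs past the end */
    assert(!block_in_region(outside, sizeof(outside), start, end));
}
```

If a block returned by malloc() fell outside the expected range, that would suggest a second heap is in play.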

    Let me know if this might help you.

    Kind regards,

  • Hi Clément,

    Thank you very much for your response and thorough investigation.

    As for your points:
    - Indeed, the printf() is no longer mangled after commenting out ICall_createRemoteTasks(). (But perhaps it is still corrupting other memory)

    - I observe the same, and I agree with your remark. In fact, I think that indeed the problem is just happening where we cannot see it. This is backed up by my experiment with disabling entire tasks that use NVS vs. just disabling NVS in those tasks. In both cases, the system works fine. If I enable NVS again, it fails quite quickly.

    - I had not thought about this, I really appreciate your feedback!
    When I use the CCS "open declaration" functionality by right-clicking a malloc() call in my code, it shows that it is implemented to call ti_sysbios_rts_gnu_MemAlloc_alloc(). In turn, that function calls xdc_runtime_Memory_alloc(), which I believe uses the default heap?

    In my .cfg file, I have these settings active:

      var HeapMem = xdc.useModule('ti.sysbios.heaps.HeapMem');
      var heapMemParams = new HeapMem.Params();
      if (HEAPMGR_CONFIG === 2)
      {
        heapMemParams.size =  HEAPMGR_SIZE;
        Program.global.HEAPMGR_SIZE = HEAPMGR_SIZE;
      }
      else
      {
        // if you get an undefined error for the symbol below it means that AUTOHEAPSIZE has been defined in the application.
        //
        heapMemParams.usePrimaryHeap = true;
        HeapMem.primaryHeapBaseAddr = "&heapStart";
        HeapMem.primaryHeapEndAddr = "&heapEnd";
        Program.global.HEAPMGR_SIZE = 0;
      }
    
      var tempHeap = HeapMem.create(heapMemParams);
    
      var HeapTrack = xdc.useModule('ti.sysbios.heaps.HeapTrack');
      HeapTrack.common$.diags_ASSERT = xdc.module("xdc.runtime.Diags").ALWAYS_ON;
      var heapTrackParams = new HeapTrack.Params();
      heapTrackParams.heap = tempHeap;
      Program.global.stackHeap = HeapTrack.create(heapTrackParams);
    
      var HeapCallback = xdc.useModule('ti.sysbios.heaps.HeapCallback');
      var params = new HeapCallback.Params();
      params.arg = 1;
      Program.global.heap0 = HeapCallback.create(params);
      HeapCallback.initInstFxn = '&myHeapTrackInitFxn';              // Call First When BIOS boot. Initialize the Heap Manager.
      HeapCallback.allocInstFxn = '&myHeapTrackAllocFxn';            // Call for allocating a buffer
      HeapCallback.freeInstFxn = '&myHeapTrackFreeFxn';              // Call for Freeing a buffer
      HeapCallback.getStatsInstFxn = '&myHeapTrackGetStatsFxn';      // Return Statistic on the Heap.
      HeapCallback.isBlockingInstFxn = '&myHeapTrackIsBlockingFxn';  // Return TRUE: This heap is always blocking ('Hwi Gate' like )
      Memory.defaultHeapInstance = Program.global.heap0;

    I would assume that HeapMem is being used, but perhaps HeapTrack also has an effect?

    EDIT: I have also tried with the simple_peripheral_app.cfg set to HEAPMGR_CONFIG = 0x80 and HEAPMGR_CONFIG = 0x81, but no differences.

  • Hi again,

    Mmmh... I don't trust CCS [it can sometimes be fooled, especially with this type of function]... Can you step into the malloc() call? I want to be 100% sure of which code is executed there. [To do so, use the debugger: set a breakpoint at a malloc() call and run the program as usual. When you hit the breakpoint, start single-stepping.]

    Another interesting move would be to comment out only the calls to the NVS driver (possibly replacing them with a couple of dummy functions). First you can comment out only the NVS_write() calls, and then only the NVS_read() calls. That should give us some interesting information too.
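
The dummy-function idea above could be arranged so that reads and writes are switched off independently at compile time. This is a minimal host-side sketch under stated assumptions: `nv_read`, `nv_write`, `ENABLE_NV_READ`, `ENABLE_NV_WRITE` and `fake_flash` are hypothetical names, and in the real project the enabled branches would forward to NVS_read() / NVS_write() instead of a RAM array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Set either flag to 0 to replace that operation with a dummy. */
#ifndef ENABLE_NV_READ
#define ENABLE_NV_READ  0
#endif
#ifndef ENABLE_NV_WRITE
#define ENABLE_NV_WRITE 1
#endif

#define FAKE_FLASH_SIZE 64u

/* Host-side stand-in for a flash region (hypothetical). */
static uint8_t fake_flash[FAKE_FLASH_SIZE];

/* Returns 0 on success, -1 on a range error. */
int nv_write(size_t offset, const void *src, size_t len)
{
    if (offset > FAKE_FLASH_SIZE || len > FAKE_FLASH_SIZE - offset) {
        return -1;
    }
#if ENABLE_NV_WRITE
    memcpy(&fake_flash[offset], src, len); /* target: forward to the driver */
#else
    (void)src;                             /* dummy: touch nothing */
#endif
    return 0;
}

/* Returns 0 on success, -1 on a range error. */
int nv_read(size_t offset, void *dst, size_t len)
{
    if (offset > FAKE_FLASH_SIZE || len > FAKE_FLASH_SIZE - offset) {
        return -1;
    }
#if ENABLE_NV_READ
    memcpy(dst, &fake_flash[offset], len); /* target: forward to the driver */
#else
    memset(dst, 0xFF, len);                /* dummy: erased-flash pattern */
#endif
    return 0;
}

void demo_bisect(void)
{
    const uint8_t in[4] = {1, 2, 3, 4};
    uint8_t out[4] = {0};
    int rc;

    rc = nv_write(0, in, sizeof(in));
    assert(rc == 0);
    rc = nv_read(0, out, sizeof(out));
    assert(rc == 0);
    rc = nv_read(FAKE_FLASH_SIZE - 2u, out, sizeof(out));
    assert(rc == -1); /* range errors are still reported by the stubs */
    (void)rc;
}
```

Keeping the call sites unchanged and flipping one flag at a time means task timing and stack usage stay as close as possible to the failing build.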

    Regards,

  • Hi,

    I followed your advice. Though I do have one remark: I have a test system running for over 3 weeks without problems, that uses malloc(), and also has the ICall_createRemoteTasks() call. So I am not sure if it is a heap problem.

    - Stepping into the malloc() call gives the same result as the open declaration functionality (in this case at least): It goes to the ti_sysbios_rts_gnu_MemAlloc_alloc() function. When stepping into this function, I cannot step into the xdc_runtime_Memory_alloc() call. That call is made with NULL for the heap instance, so it should use the default heap instance.

    - The system I described above that has been running for weeks without problems does only 1 NVS_read(), in which NO bytes are copied. It does many more NVS_write() calls. So the problem really looks to be in reading (perhaps plus copying) data from FLASH?

    I have done some more experiments (code included below) that gave interesting results:
    - When using malloc() and free(), the print shows that the total free size reduces to 0x6710.
    - When using ICall_malloc() and ICall_free(), the total free size reduces to 0x6718. (This differs from the value above, which is likely why using e.g. ICall_malloc() with a normal free() will result in an exception, even though they use the same heap. But this is unlikely to be related to our problem, and more to the difference in implementations.)
    - The printf() is being mangled with both. (Note that to see this happening, the printf() with the total free memory sizes must be commented out)
    - They are using the same heap, as I am using the same function to read the default heap total free size, and we can see it being affected in both cases. (But with different values)
    - Allocating more bytes than what we are copying from FLASH has no effect.
    - Reading (copying) from FLASH to heap only mangles the printf() if we are copying a lot of data (like 0x1000 bytes). It does not happen for e.g. 0x100 bytes. (But perhaps the memory just shifted so we cannot see it happening anymore)

    int main()
    {
      /* Register Application callback to trap asserts raised in the Stack */
      RegisterAssertCback(AssertHandler);
    
      Board_initGeneral();
    
      // Enable iCache prefetching
      VIMSConfigure(VIMS_BASE, TRUE, TRUE);
      // Enable cache
      VIMSModeSet(VIMS_BASE, VIMS_MODE_ENABLED);
    
    #if !defined( POWER_SAVING )
      /* Set constraints for Standby, powerdown and idle mode */
      // PowerCC26XX_SB_DISALLOW may be redundant
      Power_setConstraint(PowerCC26XX_SB_DISALLOW);
      Power_setConstraint(PowerCC26XX_IDLE_PD_DISALLOW);
    #endif // POWER_SAVING
    
      /* Update User Configuration of the stack */
      user0Cfg.appServiceInfo->timerTickPeriod = Clock_tickPeriod;
      user0Cfg.appServiceInfo->timerMaxMillisecond  = ICall_getMaxMSecs();
    
      /* Initialize ICall module */
      ICall_init();
    
      /* Start tasks of external images - Priority 5 */
      ICall_createRemoteTasks(); // NOTE: If this line is commented, the printf() is left intact
    
      ICall_heapStats_t ICallheapStats;
      ICall_getHeapStats(&ICallheapStats);
    
      uint32_t memBefore = ICallheapStats.totalFreeSize;
    
      uint8_t *pInitReadBuffer = malloc(0x1000);
      if (!pInitReadBuffer)
      {
        printf("Could not allocate memory (%s:%d)\n", __FILE__, __LINE__);
    
        return -1;
      }
    
      memcpy((void *) pInitReadBuffer, (void *) 0x48000, 0x1000);
    
      ICall_getHeapStats(&ICallheapStats);
      uint32_t memDuring = ICallheapStats.totalFreeSize;
    
      free(pInitReadBuffer);
    
      ICall_getHeapStats(&ICallheapStats);
      uint32_t memAfter = ICallheapStats.totalFreeSize;
    
      printf("0x%X -> 0x%X -> 0x%X\n", memBefore, memDuring, memAfter);
    
      //SimplePeripheral_createTask();
      gprs_createTask();
      /* enable interrupts and start SYS/BIOS */
      BIOS_start();
    
      return 0;
    }
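
Several remarks above note that the corruption may simply have moved to a place where it is not visible. One way to look for it is to fill a watch region with a known pattern before the suspect copy and scan it afterwards. This is a minimal host-side sketch: `watch_arm`, `watch_first_change` and `WATCH_PATTERN` are hypothetical names, and which RAM area to watch on the target is left open.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WATCH_PATTERN 0xA5u

/* Fill the watch region with a known byte before the suspect operation. */
void watch_arm(uint8_t *region, size_t len)
{
    memset(region, WATCH_PATTERN, len);
}

/* Returns the offset of the first changed byte, or len if none changed. */
size_t watch_first_change(const uint8_t *region, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (region[i] != WATCH_PATTERN) {
            return i;
        }
    }
    return len;
}

void demo_watch(void)
{
    uint8_t region[32];

    watch_arm(region, sizeof(region));
    assert(watch_first_change(region, sizeof(region)) == sizeof(region));

    region[17] ^= 0x01; /* simulate a single flipped bit */
    assert(watch_first_change(region, sizeof(region)) == 17);
}
```

A returned offset smaller than the region length gives a concrete address to compare between runs.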

  • Changing the program so that pInitReadBuffer is now a stack instead of a heap variable results in an untouched printf().

    Whether this means that you were right and the problem is indeed heap-related, or that the corruption is now simply somewhere else, I cannot tell.

  • Hi,

    Do you have more results? Have you done more tests? What can I do to help you further?

    Best regards,

  • Hi Clément,

    Unfortunately, I do not have more results. I am out of ideas of how to find the source of this weird behavior.

    Suppose someone wants to read in a large configuration that is passed to the tasks at creation: this requires reading a large amount of data from FLASH, and hence the problem would show.
    The alternatives are not very desirable: either doing it on the stack, or waiting until all tasks have started and then letting each task fetch its own configuration (assuming that not seeing the problem after NVS calls in tasks means it is not there). The first would mean reserving stack (Program.Stack in .cfg file) that is no longer needed after startup. The second would mean a lot of overhead, either because the functions all read the entire configuration (including the parts for other tasks), or because separate getter functions need to be written (more code and more FLASH reads).

    The problem can be worked around, but not without sacrifices. As I am using the available RAM and internal FLASH quite heavily already, I would rather not make these sacrifices.

    Thus, I would like to read from FLASH to heap once, during startup.
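
For reference, the "separate getter functions" alternative mentioned in this post could look roughly as follows. It is a minimal host-side sketch under stated assumptions: the `SystemCfg` layout, the task names and `flash_image` are all hypothetical, and on the target the source would be the configuration stored in FLASH rather than a constant in the program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout of the configuration image stored in flash. */
typedef struct {
    uint32_t baudRate;
    uint8_t  retries;
} UartTaskCfg;

typedef struct {
    uint16_t intervalMs;
    uint8_t  channel;
} RadioTaskCfg;

typedef struct {
    UartTaskCfg  uart;
    RadioTaskCfg radio;
} SystemCfg;

/* Host-side stand-in for the configuration image (hypothetical values). */
static const SystemCfg flash_image = {
    .uart  = { .baudRate = 115200u, .retries = 3u },
    .radio = { .intervalMs = 100u, .channel = 37u },
};

/* Copy only one member of the image, so a task never holds the full config. */
#define CFG_READ_MEMBER(dst, member)                                    \
    memcpy((dst),                                                       \
           (const uint8_t *)&flash_image + offsetof(SystemCfg, member), \
           sizeof(((SystemCfg *)0)->member))

void uart_task_get_cfg(UartTaskCfg *out)
{
    CFG_READ_MEMBER(out, uart);
}

void radio_task_get_cfg(RadioTaskCfg *out)
{
    CFG_READ_MEMBER(out, radio);
}

void demo_cfg(void)
{
    RadioTaskCfg radio;

    radio_task_get_cfg(&radio);
    assert(radio.intervalMs == 100u && radio.channel == 37u);
}
```

Each getter copies only a few bytes, which keeps per-task RAM low at the cost of one extra function and one extra read per task, the trade-off described above.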

  • Hi,

    Understood.

    Let's now focus on the NV memory itself this time. Are you using BLE pairing / bonding? Is it possible you are having conflicts between the pages reserved by the BLE stack and the pages you are using?

    Regards,

  • Hi,

    Thank you for your quick response.

    I am using the Simple Peripheral base, but no active connections are made. It is advertising though.
    The FLASH pages I am using are placed after the flashBuf / NVS_REGIONS_BASE.

    I did two small tests in which I read from a FLASH address a couple of pages higher or lower, the results are the same.

  • Hi,

    Can you share once more the code snippet you are using to access (read and write) the FLASH?

    Regards,

  • Hi,

    I have one more remark on your project. In your board file (IPC_V2.c), you are defining five flash regions. The first flash region is the one used by the BLE stack. The four other regions are (in theory) used by your code. Now, if I look inside the file nvoctp.c, I see you are opening and using the BLE stack's NVS region [in other words, you call NVS_open() with 0 as first parameter].

    Maybe this is only true in the test project you sent me. But can you verify you are calling NVS_open() for all the flash regions you are using? Can you also verify you are not trying to read/write on non-opened NVS regions?
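
The check requested above (no reads or writes on regions that were never opened) could be enforced with a small guard around every access. This is a minimal host-side sketch assuming five regions, as in the board file mentioned above; `region_mark_opened` and `region_access_allowed` are hypothetical helpers, not part of the NVS driver. On the target, the first would be called after NVS_open() returns a valid handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define REGION_COUNT 5u /* five regions, as in the board file discussed above */

static bool region_opened[REGION_COUNT];

/* Record that a region was opened; returns false for an invalid index. */
bool region_mark_opened(size_t index)
{
    if (index >= REGION_COUNT) {
        return false;
    }
    region_opened[index] = true;
    return true;
}

/* Call before every read or write; false means the region is not usable. */
bool region_access_allowed(size_t index)
{
    return index < REGION_COUNT && region_opened[index];
}

void demo_region_guard(void)
{
    bool ok = region_mark_opened(1);

    assert(ok);
    assert(region_access_allowed(1));
    assert(!region_access_allowed(2)); /* never opened */
    assert(!region_access_allowed(7)); /* out of range */
    (void)ok;
}
```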

    Best regards,

  • Hi Clément,

    Thank you again for your feedback!

    In my real project I am either opening the NVS regions in the initialization functions called in main() (before BIOS_start()), or in the init() functions of tasks. So not all NVS regions are opened when I start accessing some, but the ones I do try to access are all opened.

    Furthermore, I never close NVS regions.

    Also, I am calling NVS_init() before all NVS_open() calls. So one NVS_init() per NVS_open(). Maybe this is not the right way, but in the example that is demonstrating the problem, I do not do this.

    Inspecting the driver source code shows that NVS_init() and NVS_open() do not really do anything to the FLASH memory, they mainly setup semaphores or check that the provided parameters are valid.
    In my minimal example, this is skipped, but the problem is there. If I do use the NVS driver, the problem is still present.

  • Hi,

    From your previous message, the way you are using the NVS driver does not seem to be the root cause of the problem.

    Have you tried using the ROV on your project? Did you get any result out of it? (You can consult the BLE Stack's User guide's chapter dedicated to debugging for details)

    Finally, I wanted to come back to one of your previous messages:

    user5943842 said:

    Changing the program so that pInitReadBuffer is now a stack instead of a heap variable results in an untouched printf().

    - Have you done more tests, and could you conclude whether or not there is still any memory corruption after this change?

    - If you still have some memory corruption could you find at which address it occurs? Is it still the same address?

    - If the memory corruption has completely disappeared, can you verify if the amount of available heap is necessary?  

    Best regards,

  • Hi Clément,

    Thank you again for your inquiry into my issue.

    I have been using ROV extensively throughout the development process. It is a very nice tool, but unfortunately it does not show me anything that I think is related to the problem. (No errors, overflows, leaks, hangs, exceptions, etc.)

    More extensive testing with using the stack instead of the heap, also on my actual system, shows no failures anymore. Of course that does not mean that the fault has been resolved, but it does look promising.

    I do not understand your last bullet point.

    As for now, I cannot sacrifice a lot more time to solve this problem. I will just sacrifice some stack memory and try to compensate for it elsewhere.

  • Hi again,

    Sorry, my last bullet was unclear... and I now think you have already tested it.
    Translation of my last bullet: "As the issue has disappeared by moving the variable from the heap to the stack, I am wondering if the issue was caused by an undersized heap. In other words, if you put the variable back in the heap and increase the heap size, what happens?"

    user5943842 said:
    More extensive testing with using the stack instead of the heap, also on my actual system, shows no failures anymore. Of course that does not mean that the fault has been resolved, but it does look promising.

    Then, should I consider this issue as closed from my side?

    Regards,

     

  • Ah, yes I tested that.

    The problem of reading from FLASH to heap is still unresolved, I just worked around it now.
    If it is important for your record I will mark this thread as resolved, but of course that is not really the case.