CC1352R: Device suddenly stops BLE communication

Part Number: CC1352R
Other Parts Discussed in Thread: BLE-STACK, SYSCONFIG

Hi all,

In an industrial project we are currently facing a critical bug in BLE communication. We use TI SDK 4.10.0.78.

Prior to the bug occurrence, the device is performing some measurements while not in a connection. Then a connection is established and the device sends the measured data to the master device on command.

Most of the time there is no problem. In very rare situations, however, the device suddenly stops communicating. In Wireshark I can see lots of "empty PDUs", one per connection interval. When the communication breaks down, I can clearly see that our device simply stops answering the master's connection events, without any clean termination procedure. After the configured supervision timeout, the master considers the connection lost.

When I then start a debugging session on the running device, I can see that the stack still considers the connection to be active: the BLE stack does not recognize the dead connection.

We implemented a monitor function to detect lost connections by observing the connection events. If we have not seen any connection event within a certain time period, we consider the connection lost.
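
A monitor of the kind described above might look roughly like the following sketch. It is not the code used in the project: the `ConnMonitor` type, its function names, and the millisecond timestamps are all hypothetical, and the application is assumed to call `ConnMonitor_onConnEvent` from its connection event handler and to poll `ConnMonitor_isLost` periodically.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical monitor state; none of these names come from the SDK. */
typedef struct {
    uint32_t lastEventMs; /* timestamp of the last observed connection event */
    uint32_t timeoutMs;   /* silence after which the link is treated as lost */
    bool     connected;   /* true while the application believes a link exists */
} ConnMonitor;

/* Call when the link is established (assumed: from the link-established handler). */
void ConnMonitor_onLinkEstablished(ConnMonitor *m, uint32_t nowMs, uint32_t timeoutMs)
{
    m->lastEventMs = nowMs;
    m->timeoutMs = timeoutMs;
    m->connected = true;
}

/* Call for every connection event reported to the application. */
void ConnMonitor_onConnEvent(ConnMonitor *m, uint32_t nowMs)
{
    m->lastEventMs = nowMs;
}

/* Call when the link is terminated, whether cleanly or by the monitor. */
void ConnMonitor_onLinkTerminated(ConnMonitor *m)
{
    m->connected = false;
}

/* Returns true if no connection event was seen for longer than timeoutMs. */
bool ConnMonitor_isLost(const ConnMonitor *m, uint32_t nowMs)
{
    /* Unsigned subtraction stays correct across a wrap of the ms counter. */
    return m->connected && (uint32_t)(nowMs - m->lastEventMs) > m->timeoutMs;
}
```

The timeout would presumably be chosen somewhat larger than the supervision timeout, so that the monitor only fires once the master has already given up on the link.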

When this happens, we call GAP_TerminateLinkReq, which returns 0x0 = SUCCESS. Afterwards we call GapAdv_enable, which returns 0x18 (bleInvalidRange).

As a result, we cannot activate advertising, and our device is no longer connectable.

What is strange is that we always call GapAdv_enable with the same parameters. Before the bug happens, the return status is always 0x0 = SUCCESS. Only in this single rare case does it fail. There may be hundreds of successful calls to GapAdv_enable before it.

Apart from the BLE stack, which seems to be confused, the device is working just fine. Our RTOS tasks are running without trouble.

In the SDK Release Notes we have not found a known bug that may lead to this behaviour. Does anybody have an idea what the issue may be here?

  • Hi Gabriel,

    Do you have the possibility to test with the latest 6.20 SDK, to see if the issue is still present?

    Regards,

    Arthur

  • Hi Arthur,

    I will give this a try. As the bug shows up very infrequently, it may take a while to reach a conclusive result.

    I'll come back to this when I have the result.

    Thanks for your support!

  • Hi Arthur,

    It took me quite a while to do the update; a lot has changed since SDK 4.10.

    What I did:

    - updated SDK 4.10.0.78 -> 6.20.0.29

    - changed Compiler TI v20.2.0.LTS -> TI Clang v1.3.1.LTS

    - updated TI-RTOS6 -> TI-RTOS7

    - translated SYS/BIOS config XCONF -> SYSCONFIG

    - changed paths, pragmas etc...

    - created tasks dynamically that were created statically before

    Now the code builds and runs on our device.

    From ROV I can see that all tasks, events, mailboxes etc. seem to be fine.

    After some struggle with SPI (the SPI pins now have to be configured in the GPIO section as well, no longer only in the SPI section), the device starts up pretty nicely.

    BUT... it does not advertise.

    I can see that GAP_DeviceInit() returns SUCCESS, which looks good. Some ticks later, the GAP_DEVICE_INIT_DONE_EVENT comes up and triggers GapAdv_create. This in turn returns 0x13 = bleMemAllocError.

    Needless to say, subsequent GapAdv calls fail with 0x34 = bleGAPNotFound.

    I have also tried the example project simple peripheral OAD offchip.

    Here too, GAP_DeviceInit() returns SUCCESS. But GAP_DEVICE_INIT_DONE_EVENT never appears, so I can't see any advertising with this project either.
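
    When chasing return codes like these, a small decoder for log output can help. The sketch below only covers the four status values quoted in this thread; the function name `bleStatusName` is hypothetical, and a real project would presumably take the names and values from the stack's own headers instead of hard-coding them.

    ```c
    #include <stdint.h>

    /* Maps the status values quoted in this thread to their names.
     * Hypothetical helper: any value not mentioned in the thread
     * is reported as "unknown". */
    const char *bleStatusName(uint8_t status)
    {
        switch (status) {
        case 0x00: return "SUCCESS";
        case 0x13: return "bleMemAllocError";
        case 0x18: return "bleInvalidRange";
        case 0x34: return "bleGAPNotFound";
        default:   return "unknown";
        }
    }
    ```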

    Is there any known issue on this?

  • Hi Arthur,

    Again I am a few steps further...

    It seems to me that icall is not really heapTrack-enabled with TI-RTOS7.

    There are some predefined symbols in the example project that look like a quick workaround for symbols that would be defined in app_pem4f.h for the old RTOS.

    By default, icall uses osalheap, which overflowed in my program.

    Telling icall to use heapTrack, as it did before the SDK update, led to compile-time errors, as rtos_heaptrack is not suited for TI-RTOS7.

    After bending this a bit, I am now able to run icall on heapTrack, as required.

    What I did for this:

    • In icall.c, line 365: changed default heap to heapTrack (#include <rtos_heaptrack.h>)
    • In rtos_heaptrack.h:
      • only include xdc/cfg/global.h, when TIRTOS7_SUPPORT is not defined
      • changed datatype of Memory_defaultHeapInstance to IHeap_Handle
      • use Memory_defaultHeapInstance instead of stackHeap

    Now the device advertises - pretty cool.

    The advertising data is the same as it was before the SDK update.

    BUT... next problem :( I cannot connect to the device (it is set connectable). Approx. 4 out of 5 connect attempts go completely unrecognized by the application layer: GAP_LINK_ESTABLISHED_EVENT is not posted.

    When the event eventually pops up, the app receives a HCI_BLE_HARDWARE_ERROR_EVENT_CODE event only a short time later. What happens here?

    ROV does not help a lot here. "Scan for errors" results in "... no errors..."

    Btw, speaking of ROV: I tried to watch the Stacks Graph, but it is always empty, also with the example project. Is this expected behaviour with the current SDK? Or are there any new configs to adjust that I haven't found yet?

    Thanks for your answers.

  • Some additional information:
    When GAP_LINK_ESTABLISHED_EVENT comes up, *pMsg looks good. All parameters inside the structure are valid and as expected.

    And regarding the Stacks Graph view in ROV, just to be clear: this has nothing to do with task operation. In the Tasks view of ROV I can see that all tasks are running well. All user tasks also receive events as expected. It is only the Stacks Graph view that does not show any information.

  • Hi Gabriel,

    Thanks for all the information, this will help, but as you see, this is a pretty complex issue to reproduce.

    Could you, by any chance, provide us a project reproducing the issue? (CCS preferably, IAR works too)

    Regards,

    Arthur

  • Hi Arthur,

    I have a modified example project ready for you to look at.

    How can I give the project to you without publishing it on the web?

    Kind regards

    Gabriel

  • Hi,
    I will contact you by direct message on e2e. Regards,

    Arthur

  • Hi Arthur,

    thank you very much for your support.

    The cause of the bug is an unbalanced use of setting and releasing power constraints. When we release more constraints than we set, the bug occurs.

    I can reproduce the bug in an example project (simple_peripheral_oad_offchip_CC13X2R1_LAUNCHXL_tirtos_ccs) from the SDK.

    To reproduce the bug, a clock sets and releases a power constraint. The first 10 times, each set is matched by one release. After that, each release step releases the constraint twice, so releases can outnumber sets. After a short time the bug occurs.

    In this graph you can see the counts of setting (Test_SetConstrain) and releasing (Test_ReleaseConstrain) a constraint. Test_RfClockOffCount is incremented every time the RF_clkPowerUp clock is found stopped. When Test_RfClockOffCount is greater than 5, the test stops, and the RF clock never becomes active again.

    The following code changes are needed in the example project, in the file simple_peripheral_oad_offchip.c:

    Add the following variables before the LOCAL FUNCTIONS:

    /* Variables to reproduce the bug */
    static uint32_t Test_SetConstrain;
    static uint32_t Test_ReleaseConstrain;
    static uint32_t Test_RfClockOffCount;
    static Clock_Params Test_ClockParams;
    static Clock_Handle Test_ClockHandle;
    static ClockP_Struct *Test_RF_clkPowerUpObj;

    Add this function prototype at the end of the LOCAL FUNCTIONS:

    static void Test_clockCb(UArg arg);

    Add at the end of SimplePeripheral_init():

    /* initialize the test variables */
      Clock_Params_init(&Test_ClockParams);
      Test_ClockParams.startFlag = true;
      Test_ClockParams.period = 1000 * (1000 / Clock_tickPeriod); //ms -> clock ticks
      Test_ClockHandle = Clock_create(&Test_clockCb, 5000 * (1000 / Clock_tickPeriod), &Test_ClockParams, NULL);
    
      Test_SetConstrain = 0u;
      Test_ReleaseConstrain = 0u;
      Test_RfClockOffCount = 0u;
      Test_RF_clkPowerUpObj = (ClockP_Struct*) 0x20003354; // check that address in memory browser

    Add at the end of the file:

    /* Periodic clock callback that alternates between setting and releasing
     * the power constraints and, after 10 sets, deliberately over-releases. */
    static void Test_clockCb(UArg arg)
    {
        if(Test_SetConstrain <= Test_ReleaseConstrain)
        {
            // set the power constraints
            Power_setConstraint(PowerCC26XX_DISALLOW_IDLE);
            Power_setConstraint(PowerCC26XX_DISALLOW_STANDBY);
            Power_setConstraint(PowerCC26XX_DISALLOW_SHUTDOWN);
            Test_SetConstrain++;
        }
        else if(Test_SetConstrain > Test_ReleaseConstrain)
        {
            // release the power constraints
            Power_releaseConstraint(PowerCC26XX_DISALLOW_IDLE);
            Power_releaseConstraint(PowerCC26XX_DISALLOW_STANDBY);
            Power_releaseConstraint(PowerCC26XX_DISALLOW_SHUTDOWN);
            Test_ReleaseConstrain++;

            if(Test_SetConstrain > 10u)
            {
                // deliberate extra release to provoke the bug
                Power_releaseConstraint(PowerCC26XX_DISALLOW_IDLE);
                Power_releaseConstraint(PowerCC26XX_DISALLOW_STANDBY);
                Power_releaseConstraint(PowerCC26XX_DISALLOW_SHUTDOWN);
                Test_ReleaseConstrain++;
            }
        }

        /* check whether the RF power-up clock is still active */
        if(ClockP_isActive(Test_RF_clkPowerUpObj) == false)
        {
            /* RF clock has stopped; end the test after more than 5 observations */
            Test_RfClockOffCount++;
            if(Test_RfClockOffCount > 5u)
            {
                Clock_stop(Test_ClockHandle);
            }
        }

    }

    I hope you can also reproduce it.

    Our follow-up questions are:

    1. Is there a way to detect an unbalanced use of setting and releasing constraints?
    2. Is there a way to reactivate the RF_clkPowerUp clock?

    Regards,

    Karl

  • Hi Karl,

    Thank you for the source code.

    About your first question:

    1. You can actually keep track of constraint counts, although it is not recommended, using the following structure, PowerCC26XX_ModuleState: https://dev.ti.com/tirex/explore/content/simplelink_cc13xx_cc26xx_sdk_6_30_01_03/docs/drivers/doxygen/html/struct_power_c_c26_x2___module_state.html#aea286d524dd4afd4b9a8bf2b8954e2c8

    There is also the option to modify the driver to create your own counters.

    2. I do not think we have this kind of granularity in the controls of the RF driver clocks when using the BLE5 stack. What you could try to do, if not done already, is a graceful reset of the device when such a problem occurs.
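
    As an alternative to modifying the driver, the "own counters" idea from point 1 could also be kept in application code. The following is only a sketch under that assumption: the `ConstraintTracker` type and its functions are hypothetical, the constraint indices are illustrative, and the places where the real Power_setConstraint / Power_releaseConstraint calls would go are marked in comments so that the logic can be compiled and checked on a host.

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    #define TRACKED_CONSTRAINTS 3u /* e.g. idle, standby, shutdown (illustrative) */

    /* Hypothetical per-module bookkeeping of constraints held by the application. */
    typedef struct {
        uint32_t held[TRACKED_CONSTRAINTS]; /* constraints currently held */
        uint32_t underflows;                /* rejected releases (imbalance detected) */
    } ConstraintTracker;

    /* Records a set; the real Power_setConstraint() call would go here. */
    void ConstraintTracker_set(ConstraintTracker *t, uint32_t id)
    {
        if (id >= TRACKED_CONSTRAINTS) {
            return;
        }
        t->held[id]++;
        /* On the target: Power_setConstraint(<constraint for id>); */
    }

    /* Releases only what this tracker has set; returns false on imbalance. */
    bool ConstraintTracker_release(ConstraintTracker *t, uint32_t id)
    {
        if (id >= TRACKED_CONSTRAINTS) {
            return false;
        }
        if (t->held[id] == 0u) {
            t->underflows++; /* unbalanced release: do not forward to the driver */
            return false;
        }
        t->held[id]--;
        /* On the target: Power_releaseConstraint(<constraint for id>); */
        return true;
    }
    ```

    A non-zero `underflows` count then points at an unbalanced release before it reaches the power driver, where it could otherwise remove a constraint that another module relies on.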

    Regards,

    Arthur