BLE Bond Occasionally Not Stored

Other Parts Discussed in Thread: CC2541, CC2540

Hi TI devs,

We're having a recurring problem with bonding on a CC2541, and I was hoping someone might have some insight - I am stumped.

About 50% of the time, everything works as expected - an iOS client can connect and bond with our CC2541 and exchange data. If we turn the CC2541 off and on again, the iOS device can reconnect with the same bond keys.

The other 50%? The iOS device can connect, bond, and send some data, but if it disconnects and attempts to reconnect, the bond is gone from the CC2541! Of course iOS is now confused because it is attempting to connect to a device with an encrypted channel, but the CC2541 has lost its half of the bond.

Once this happens, it seems like the bond never sticks on that CC2541 anymore until I re-flash the firmware back to stock. It's the same code, but suddenly bonds start working reliably again...until they don't. I also connect from a Python client, and have experienced the same issue, so I don't think it has anything to do with iOS in particular.

Here's where we initialize the bond manager:

static void initialize_gap_bond_manager(void)
{
    // Parameter Values
    uint8 pairMode = GAPBOND_PAIRING_MODE_WAIT_FOR_REQ;
    uint8 bonding = TRUE;
    uint8 mitm = FALSE;
    uint8 ioCap = GAPBOND_IO_CAP_NO_INPUT_NO_OUTPUT;

    // If we are already bonded with a device, disable pairing with further
    // devices
    uint8 bondCount = 0;
    GAPBondMgr_GetParameter(GAPBOND_BOND_COUNT, &bondCount);

    if (bondCount != 0)
    {
        pairMode = GAPBOND_PAIRING_MODE_NO_PAIRING;
    }

    // Set parameters
    GAPBondMgr_SetParameter(GAPBOND_PAIRING_MODE, sizeof(uint8), &pairMode);
    GAPBondMgr_SetParameter(GAPBOND_MITM_PROTECTION, sizeof(uint8), &mitm);
    GAPBondMgr_SetParameter(GAPBOND_IO_CAPABILITIES, sizeof(uint8), &ioCap);
    GAPBondMgr_SetParameter(GAPBOND_BONDING_ENABLED, sizeof(uint8), &bonding);
}

and here's our pair state callback:

static void _PairStateCB(uint16 connHandle, uint8 state, uint8 status)
{
    if (state == GAPBOND_PAIRING_STATE_COMPLETE && status == SUCCESS)
    {
        linkDBItem_t *pItem;

        log_message(LOG_INFO, LOG_TAG, "%s",
                "New device connected via pairing, setting up a bond");

        if ((pItem = linkDB_Find(connHandle)) != NULL)
        {
            if ((pItem->stateFlags & LINK_BOUND) == LINK_BOUND)
            {
                if(_connectionStateChangedCallback != NULL)
                {
                    _connection_state = BLE_CONN_STATE_BONDED;
                    _connectionStateChangedCallback(_connection_state);
                }
            }
        }
    }
    else if (state == GAPBOND_PAIRING_STATE_BONDED && status == SUCCESS)
    {
        log_message(LOG_INFO, LOG_TAG, "%s",
                "Previously bonded device connected");
        if(_connectionStateChangedCallback != NULL)
        {
            _connection_state = BLE_CONN_STATE_BONDED;
            _connectionStateChangedCallback(_connection_state);
        }
    }
}

I can post more sections of code, but honestly I'm not sure what would have an effect on the ability to store the bond - that's all inside the TI stack as far as I can tell.

Any advice would be most appreciated! Thanks.

  • Please try with following configuration:

    uint8 pairMode = GAPBOND_PAIRING_MODE_INITIATE;

  • I encountered a very similar issue about a week before you did. Since then I've been stumped in trying to intentionally reproduce the problem; it happens rather randomly when I'm not looking for it.

    I've discovered that triggering 'ERASE_ALL_BONDS' is sufficient to recover the unit. Fortunately, I have a UI hook to do that. So it doesn't need to be reflashed to be fixed.

    I'm now at the stage where I've a debugger hooked onto a 'bad' unit that keeps losing the link key after the disconnect. I've also demonstrated that the LTK is not getting stored onto the flash by doing a 'before' and 'after' comparison of the flash dump. I think I should be closer to a root-cause in a day or two. I'll update my original post when I do so.

  • Hi Roshan, thanks, your problem sounds very similar if not identical to mine and you seem better equipped to debug it than I am. I've forwarded both our posts to our TI rep to try and get some priority assistance with the issue.

    One correction to my original post: triggering 'ERASE_ALL_BONDS' is also sufficient for us to recover. We are unfortunately encountering what might be a separate problem, where even when we send 'ERASE_ALL_BONDS' the bonds do not actually get erased - that's why our "fix" for the moment whenever our BLE goes bad is to just reflash it.

  • A thing I learned over the past week w.r.t. iOS devices is that they randomize device addresses (gaaah!). So, even with a given iOS central and GAP_BONDINGS_MAX=10, it won't be long before the bonds[] array gets filled. Once the array is full, the next generated LTK will not make it to flash.

    Section 3.7 "The accessory should be able to resolve a Resolvable Private Address in all situations. Due to privacy concerns, the Apple product will use a Random Device Address as defined in the Bluetooth 4.0 Specification, Volume 3, Part C, Section 10.8."
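The "table full" failure mode described in this post can be illustrated with a toy model. This is a minimal sketch in plain C, not the TI bond manager source: `bond_table_add`, `bond_table_count` and `BOND_TABLE_SIZE` are hypothetical names, with `BOND_TABLE_SIZE` standing in for GAP_BONDINGS_MAX. It only shows how a fixed-size table silently rejects a new bond once every slot is taken.

```c
#include <stdint.h>
#include <string.h>

#define BOND_TABLE_SIZE 10 /* stands in for GAP_BONDINGS_MAX (assumption) */
#define ADDR_LEN 6

/* Hypothetical bond record: only the peer address is modelled here. */
typedef struct {
    uint8_t in_use;
    uint8_t addr[ADDR_LEN];
} bond_slot_t;

static bond_slot_t bond_table[BOND_TABLE_SIZE];

/* Returns the slot index holding the bond, or -1 when the table is full
 * and the new bond is therefore not stored. */
static int bond_table_add(const uint8_t addr[ADDR_LEN])
{
    int i;

    /* A peer that is already known keeps its existing slot. */
    for (i = 0; i < BOND_TABLE_SIZE; i++) {
        if (bond_table[i].in_use &&
            memcmp(bond_table[i].addr, addr, ADDR_LEN) == 0) {
            return i;
        }
    }

    /* Otherwise take the first free slot, if there is one. */
    for (i = 0; i < BOND_TABLE_SIZE; i++) {
        if (!bond_table[i].in_use) {
            bond_table[i].in_use = 1;
            memcpy(bond_table[i].addr, addr, ADDR_LEN);
            return i;
        }
    }

    return -1; /* table full: the caller's keys never reach storage */
}

/* Number of occupied slots. */
static int bond_table_count(void)
{
    int i, n = 0;
    for (i = 0; i < BOND_TABLE_SIZE; i++) {
        n += bond_table[i].in_use ? 1 : 0;
    }
    return n;
}
```

In this model, a central that shows up under a fresh address each time consumes one slot per pairing, so the eleventh pairing returns -1 while everything up to that point looked healthy.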
  • Hi Roshan, Christopher,

    Sorry for the long delay in answering this.

    Even if iOS uses private address, what gets stored (as you have no doubt found out on your own by now) is the 'public' address and the 'IRK' of the iPhone.

    These are static for each device, or should be at any rate. So when it connects, the bond manager is called with the random address, it gets resolved, and the bond is found.

    I see that you identified the root cause in this thread as being that there is a maximum number of bonds that can be stored, and excess pairings are not stored when the bond table is full. So for my understanding of the above:

    - When you refer to "The iOS device", does that mean one of N iOS devices, or the same one? If the same one, you should not see this problem. Unless you "Forget this device" in the Bluetooth settings. Ref the IRK,PublicAddr paragraph above.

    - Your very last post seems to indicate a problem when reconnecting with the same iOS device. This should not be a problem at all. If it is, please let me know, and if possible please supply an air trace of initial and subsequent reconnections, using e.g. the TI SmartRF Packet Sniffer.

    Best regards,
    Aslak

  • Hi Aslak, thanks for the response.

    I haven't identified the root cause of our issues as being the same as Roshan's. It's one theory but I don't completely understand how it would cause the behavior we are seeing. I will summarize with a timeline of what I have recorded:

    1. Erase, program and verify the CC2541 with our firmware, using GAP_BONDINGS_MAX = 1 as we only ever want to connect to 1 device.
    2. Enable advertising on the CC2541.
    3. Connect and bond from an iOS client.
    4. Read GATT characteristics that required encryption, proving that bonding worked.
    5. Query for the bond count via an internal SPI connection from the CC2541 to another micro on our board - confirms 1 bond. 
    6. Disable advertising on the CC2541.
    7. Query again for the bond count - now it says 0 bonds.
    8. Try and reconnect from the same iOS client - fails to bond.

    To clarify "the iOS device" - we only ever want one device connected at a time. If we ever want to switch the device that's connecting, we need to trigger a factory reset of our board which erases the bonds using:

    GAPBondMgr_SetParameter(GAPBOND_ERASE_ALLBONDS, 0, 0);

    This issue with losing the bonds only happens with the 2nd, 3rd, etc device to attempt to bond with the chip. For example, the first iOS device to connect after erasing, programming and verifying has stable bonds. We then factory reset (i.e. GAPBOND_ERASE_ALLBONDS) and connect either from another iOS device or from the same iOS device but after activating "forget this device" from the iOS settings. We have only seen this failure occur with one of these subsequent bonding attempts.

    I can attempt to use the packet sniffer but I will have to order the CC2540 USB dongle, we don't have any.

  • Hi,

    Can you please try with 'gapBondMgrEraseAllBondings()' instead of 'GAPBondMgr_SetParameter(GAPBOND_ERASE_ALLBONDS, 0, 0)'? I think it should work. Also you can try increasing the number of bonds from 1 to 2 and check whether it changes the behavior. It might give us some pointers.

    Thanks,

    Dhaval

  • Hi Christopher,

    Just to clarify, your later statements indicate re-connection works fine if there's only one iOS device involved. But your steps do not include this information, so can you confirm if the steps are wrong?

    The 2nd, 3rd etc device not bonding properly would point to the bond table in flash not being erased.

    I expect you are reading the parameter GAPBOND_BOND_COUNT to get number of bonds.
    If the steps are accurate, then obviously step 7 is an error.

    Are you using a 128 kB device? Which BLE SDK version?

    BR,
    Aslak

  • Thanks, Dhaval. gapBondMgrEraseAllBondings() is a static function inside gapbondmgr.c so I can't really call it from my code without modifying the TI stack. Is it not static in yours?

  • Hi,

    It is defined as static, but feel free to change anything you feel like changing in the source files.

    The idea behind the SetParameter interface is that if you are in a connection and choose to erase bonds, it may cause an SNV compaction, which in turn may cause a page erase in the process. This may halt the processor for ~20ms and may break your link. The SetParameter interface for erasing instead sets a flag so that the erase happens on disconnect.

    At least that's the idea. If Dhaval has found different and if you do, I would be interested to hear about it.

    Aslak
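The deferred-erase idea described above can be sketched as a small state model. This is an illustration in plain C under stated assumptions, not the code in gapbondmgr.c: `bond_state_t`, `request_erase_all` and `on_disconnect` are hypothetical names, and erasing immediately when no link is up is an assumption of the sketch rather than something confirmed in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the bond manager state relevant to erasing. */
typedef struct {
    bool connected;     /* a link is currently up */
    bool erase_pending; /* erase was requested while connected */
    uint8_t bond_count; /* number of stored bonds */
} bond_state_t;

/* Models an "erase all bonds" request. While connected, the request is
 * only recorded, so that a flash page erase cannot stall the CPU and
 * break the link. When idle, this sketch assumes the erase runs at once. */
static void request_erase_all(bond_state_t *s)
{
    if (s->connected) {
        s->erase_pending = true;
    } else {
        s->bond_count = 0;
    }
}

/* Models the disconnect handler: a pending erase is carried out here. */
static void on_disconnect(bond_state_t *s)
{
    s->connected = false;
    if (s->erase_pending) {
        s->bond_count = 0;
        s->erase_pending = false;
    }
}
```

One consequence of this design is that a bond count read between the request and the disconnect still reports the old value, which is worth keeping in mind when checking whether an erase "worked".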

  • Hi Aslak, thanks - I can provide the details of our setup here:

    We are reading the bond count like so:

    GAPBondMgr_GetParameter(GAPBOND_BOND_COUNT, &bondCount); 

    We are using a 256k part, the CC2541 to be exact.

    We are using stack version BLE-CC254x-1.4.0.

    We are using the ti_51ew_cc2540b.xcl linker script from 1.4.0 with the modification to the virtual registers line to work with the newer IAR:

    -Z(DATA)VREG=08-7F

    "Just to clarify, your later statements indicate re-connection works fine if there's only one iOS device involved."

    Re-connection works fine if a bond is created once after flashing the chip with 1 device. Once we start to erase the bonds and reconnect from either that same device again, or from another device, that's when we typically start to experience this issue.

    Thanks,

    Chris

  • Hi Christopher,

    I can reproduce that there is something fishy when max bonds is set to 1.

    Will need to dig a bit deeper.

    BR,
    Aslak

  • That's good to hear, Aslak, I'm glad someone else has been able to reproduce the issues, or at least some derivative of them. Any updates you can share on your investigation?

    Today I tested using 'gapBondMgrEraseAllBondings()' instead of 'GAPBondMgr_SetParameter(GAPBOND_ERASE_ALLBONDS, 0, 0)' and unfortunately that had little effect. After about 10 cycles of "bond with an iOS client, erase bonds on both, repeat" the CC2541 was unable to erase the bonds. I would call gapBondMgrEraseAllBondings() and it would still have bonds, and the iOS device would not be able to connect.

  • I have more data to report. We are also using BTool to try and debug this. This happens with some regularity: I can connect once, but then after power cycling, I cannot bond and get an error back with the AuthenticationComplete message.

    UPDATE: I think I figured out the root cause for the following problem: our tool was attempting to create a new bonding on the subsequent connection instead of re-using the same LTK. However, I'd still like some clarity on the error message ("Invalid Msg Pointer"). The original issue in this thread still stands - we lose the bonds after a few erasures.

    Here's the first authentication, this works:

    ------------------------------------------------------------------------------------------------------------------------
    [49] : <Tx> - 07:24:19.854
    -Type        : 0x01 (Command)
    -Opcode        : 0xFE0B (GAP_Authenticate)
    -Data Length    : 0x1D (29) byte(s)
     ConnHandle    : 0x0000 (0)
     sec.ioCaps    : 0x04 (KeyboardDisplay)
     sec.oobAvail    : 0x00 (False)
     sec.oob        : 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
     sec.authReq    : 0x01 (Bonding - exchange and save key information)
     sec.maxEKeySize    : 0x10 (16)
     sec.keyDist    : 0x3F (Slave Encryption Key
                     Slave Identification Key
                     Slave Signing Key
                     Master Encryption Key
                     Master Identification Key
                     Master Signing Key)
     pair.Enable    : 0x00 (Disable)
     pair.ioCaps    : 0x03 (NoInputNoOutput)
     pair.oobDFlag    : 0x00 (Disable)
     pair.authReq    : 0x01 (Bonding - exchange and save key information)
     pair.maxEKeySize    : 0x10 (16)
     pair.keyDist    : 0x3F (Slave Encryption Key
                     Slave Identification Key
                     Slave Signing Key
                     Master Encryption Key
                     Master Identification Key
                     Master Signing Key)
    Dump(Tx):
    01 0B FE 1D 00 00 04 00 00 00 00 00 00 00 00 00 
    00 00 00 00 00 00 00 00 01 10 3F 00 03 00 01 10 
    3F 
    ------------------------------------------------------------------------------------------------------------------------
    [50] : <Rx> - 07:24:19.867
    -Type        : 0x04 (Event)
    -EventCode    : 0xFF (HCI_LE_ExtEvent)
    -Data Length    : 0x06 (6) bytes(s)
     Event        : 0x067F (GAP_HCI_ExtentionCommandStatus)
     Status        : 0x00 (Success)
     OpCode        : 0xFE0B (GAP_Authenticate)
     DataLength    : 0x00 (0)
    Dump(Rx):
    04 FF 06 7F 06 00 0B FE 00 
    ------------------------------------------------------------------------------------------------------------------------
    [51] : <Rx> - 07:24:22.102
    -Type        : 0x04 (Event)
    -EventCode    : 0xFF (HCI_LE_ExtEvent)
    -Data Length    : 0x6A (106) bytes(s)
     Event        : 0x060A (GAP_AuthenticationComplete)
     Status        : 0x00 (Success)
     ConnHandle    : 0x0000 (0)
     AuthState    : 0x01 (Bonding - exchange and save key information)
     SecInf.Enable    : 0x01 (1)
     SecInf.LTKSize    : 0x10 (16)
     SecInf.LTK    : F1:B8:40:04:E7:73:95:C8:0E:EF:C3:05:0F:13:53:C6
     SecInf.DIV    : 0x32BF (12991)
     SecInf.Rand    : 0C:14:A3:86:B1:BB:54:34
     DSInf.Enable    : 0x01 (1)
     DSInf.LTKSize    : 0x10 (16)
     DSInf.LTK    : 7D:AB:08:E1:63:85:FD:AB:A0:81:E1:7B:D5:C8:6A:9F
     DSInf.DIV    : 0x2D6E (11630)
     DSInf.Rand    : 29:9D:69:9E:4D:EE:C7:F5
     IdInfo.Enable    : 0x01 (1)
     IdInfo.IRK    : 87:F5:3B:A0:AC:91:71:BE:48:E4:A7:75:E1:78:25:88
     IdInfo.BD_Addr    : 00:D0:C8:01:14:4D
     SignInfo.Enable    : 0x01 (1)
     SignInfo.CSRK    : CB:E0:7F:55:78:A2:28:61:E9:9D:25:8E:89:1B:4B:E6
     SignCounter    : 0xFFFFFFFF (4294967295)
    Dump(Rx):
    04 FF 6A 0A 06 00 00 00 01 01 10 F1 B8 40 04 E7 
    73 95 C8 0E EF C3 05 0F 13 53 C6 BF 32 0C 14 A3 
    86 B1 BB 54 34 01 10 7D AB 08 E1 63 85 FD AB A0 
    81 E1 7B D5 C8 6A 9F 6E 2D 29 9D 69 9E 4D EE C7 
    F5 01 87 F5 3B A0 AC 91 71 BE 48 E4 A7 75 E1 78 
    25 88 4D 14 01 C8 D0 00 01 CB E0 7F 55 78 A2 28 
    61 E9 9D 25 8E 89 1B 4B E6 FF FF FF FF 
    ------------------------------------------------------------------------------------------------------------------------
    

    Now I power cycle the entire board, including the CC2541. I power it up again and attempt to connect and authenticate from the same computer:

    [63] : <Tx> - 07:25:08.571
    -Type        : 0x01 (Command)
    -Opcode        : 0xFE0B (GAP_Authenticate)
    -Data Length    : 0x1D (29) byte(s)
     ConnHandle    : 0x0000 (0)
     sec.ioCaps    : 0x04 (KeyboardDisplay)
     sec.oobAvail    : 0x00 (False)
     sec.oob        : 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
     sec.authReq    : 0x01 (Bonding - exchange and save key information)
     sec.maxEKeySize    : 0x10 (16)
     sec.keyDist    : 0x3F (Slave Encryption Key
                     Slave Identification Key
                     Slave Signing Key
                     Master Encryption Key
                     Master Identification Key
                     Master Signing Key)
     pair.Enable    : 0x00 (Disable)
     pair.ioCaps    : 0x03 (NoInputNoOutput)
     pair.oobDFlag    : 0x00 (Disable)
     pair.authReq    : 0x01 (Bonding - exchange and save key information)
     pair.maxEKeySize    : 0x10 (16)
     pair.keyDist    : 0x3F (Slave Encryption Key
                     Slave Identification Key
                     Slave Signing Key
                     Master Encryption Key
                     Master Identification Key
                     Master Signing Key)
    Dump(Tx):
    01 0B FE 1D 00 00 04 00 00 00 00 00 00 00 00 00 
    00 00 00 00 00 00 00 00 01 10 3F 00 03 00 01 10 
    3F 
    ------------------------------------------------------------------------------------------------------------------------
    [64] : <Rx> - 07:25:08.589
    -Type        : 0x04 (Event)
    -EventCode    : 0xFF (HCI_LE_ExtEvent)
    -Data Length    : 0x06 (6) bytes(s)
     Event        : 0x067F (GAP_HCI_ExtentionCommandStatus)
     Status        : 0x00 (Success)
     OpCode        : 0xFE0B (GAP_Authenticate)
     DataLength    : 0x00 (0)
    Dump(Rx):
    04 FF 06 7F 06 00 0B FE 00 
    ------------------------------------------------------------------------------------------------------------------------
    
    
    [65] : <Rx> - 07:25:08.719
    -Type        : 0x04 (Event)
    -EventCode    : 0xFF (HCI_LE_ExtEvent)
    -Data Length    : 0x6A (106) bytes(s)
     Event        : 0x060A (GAP_AuthenticationComplete)
     Status        : 0x05 (Invalid Msg Pointer)
     ConnHandle    : 0x0000 (0)
     AuthState    : 0x00 (Gap Auth Req Bit Mask Is Not Set)
     SecInf.Enable    : 0x00 (0)
     SecInf.LTKSize    : 0x00 (0)
     SecInf.LTK    : 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
     SecInf.DIV    : 0x0000 (0)
     SecInf.Rand    : 00:00:00:00:00:00:00:00
     DSInf.Enable    : 0x00 (0)
     DSInf.LTKSize    : 0x00 (0)
     DSInf.LTK    : 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
     DSInf.DIV    : 0x0000 (0)
     DSInf.Rand    : 00:00:00:00:00:00:00:00
     IdInfo.Enable    : 0x00 (0)
     IdInfo.IRK    : 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
     IdInfo.BD_Addr    : 00:00:00:00:00:00
     SignInfo.Enable    : 0x00 (0)
     SignInfo.CSRK    : 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
     SignCounter    : 0x00000000 (0)
    Dump(Rx):
    04 FF 6A 0A 06 05 00 00 00 00 00 00 00 00 00 00 
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
    00 00 00 00 00 00 00 00 00 00 00 00 00 

    It fails with status == 5, "Invalid Msg Pointer".  All of the data in the response is 0.

    Have you made any progress looking into the issue?

    Thanks,

    Chris

  • If you can now reproduce it with some regularity that's great. As a TI CC2541 user myself, one thing I might suggest is to extract the flash contents from the 'bad' unit (using the CC debugger and SmartRF programmer) and provision it onto a different unit. Then check to see if the other unit reports the same error or not. If yes, then you're in a much better position as you can provision as many units as you want and have reproducibility at will. But you'll have to hook up the CC debugger and depending on your BLE device it may be some work (in my case I had to spend about an hour to get the right wires out and then rig a harness to hook it up to the CC debugger).

    The next step will be to do source stepping in the Bond Manager code while reproducing the error. [Remember to simply connect the debugging session without flashing new code in.] In my case, that's how I realized that once the bonds[] array was 'full' the next bond was no longer getting 'added'. Took me less than 30 mins of putting breakpoints in various places and seeing which one was failing.

  • Hi,

    Error code 5 is unfortunately not Invalid Msg Pointer, BTool uses the wrong lookup for some reason. It's really

    0x05 Pairing Not Supported: "Pairing is not supported by the device." Ref Bluetooth Core Spec 4.1, Vol 3 Host, Part H Security Manager, Chapter 3.5.5 Pairing Failed.

    This is only supposed to happen if the pairing mode is set to: GAPBOND_PAIRING_MODE_NO_PAIRING. During user task init you probably set this to GAPBOND_PAIRING_MODE_WAIT_FOR_REQ or similar.

    Do you change this to NO_PAIRING at any point?

    Best regards,
    Aslak
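To read the status without relying on BTool's text lookup, the raw dump can be decoded by hand. The following is a minimal sketch in plain C, assuming the byte layout visible in the dumps earlier in the thread (packet type 0x04, event code 0xFF, length, 16-bit little-endian event, then status) and assuming a non-zero status carries the Security Manager "Pairing Failed" reason as described above. The function names are hypothetical, and values outside the listed reasons are simply reported as unknown.

```c
#include <stddef.h>
#include <stdint.h>

#define HCI_PACKET_TYPE_EVENT   0x04
#define HCI_LE_EXT_EVENT_CODE   0xFF
#define GAP_AUTH_COMPLETE_EVENT 0x060A

/* Maps a status byte to the Security Manager "Pairing Failed" reason text. */
static const char *smp_pairing_failed_reason(uint8_t code)
{
    switch (code) {
    case 0x00: return "Success";
    case 0x01: return "Passkey Entry Failed";
    case 0x02: return "OOB Not Available";
    case 0x03: return "Authentication Requirements";
    case 0x04: return "Confirm Value Failed";
    case 0x05: return "Pairing Not Supported";
    case 0x06: return "Encryption Key Size";
    case 0x07: return "Command Not Supported";
    case 0x08: return "Unspecified Reason";
    case 0x09: return "Repeated Attempts";
    case 0x0A: return "Invalid Parameters";
    default:   return "Unknown";
    }
}

/* Returns the status byte of a GAP_AuthenticationComplete event taken
 * from a raw Rx dump, or -1 if the bytes are not such an event. */
static int auth_complete_status(const uint8_t *pkt, size_t len)
{
    uint16_t event;

    if (len < 6) {
        return -1;
    }
    if (pkt[0] != HCI_PACKET_TYPE_EVENT || pkt[1] != HCI_LE_EXT_EVENT_CODE) {
        return -1;
    }
    /* pkt[2] is the data length; the event ID follows, low byte first. */
    event = (uint16_t)(pkt[3] | ((uint16_t)pkt[4] << 8));
    if (event != GAP_AUTH_COMPLETE_EVENT) {
        return -1;
    }
    return pkt[5];
}
```

Feeding it the first six bytes of the failing dump (04 FF 6A 0A 06 05) gives status 5, which the lookup renders as "Pairing Not Supported".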

  • Hi Aslak,

    Thanks, that makes more sense. We do use GAPBOND_PAIRING_MODE_NO_PAIRING as you can see in the top post in this thread - we want to only allow 1 bond at a time.

    I can try debugging a bad .hex file as Roshan suggested but I have been pulled away to other parts of our product. 

    Aslak, you mentioned that you recognize there was an issue in the TI stack. I'm very interested in knowing if there is any debugging you or your team has been able to do, and if there's any chance of a stack update that would fix this issue. Are you still waiting on more information from me, or has it really been confirmed?

    Thanks,

    Chris

  • Hi,

    I haven't been able to figure out further what could be wrong with the stack, unfortunately. It's in our bug tracking still - awaiting sharper minds than mine to look into it and verify/fix.

    Best regards,
    Aslak

  • Thanks for the update. We're wondering if the issue might have something to do with the way we turn off the BLE module. We have another microcontroller on the board that at some point will decide to shut everything off - literally the power to the CC2541 is cut. Is there perhaps a "proper" shutdown we should be initiating to make sure the stack is shut down safely and all NV storage is written out?

  • Hi,

    This could of course be a problem. You could do some sort of handshake to turn it off. E.g., when there are no active events in osal_run_system(), the CC2541 can check a GPIO, assert one of its own and go into a loop while waiting to be shut off.

    On a sidenote, if you are not in a connection, and not using SNV on your own in your application, it should always be safe to switch off the device.

    BR,
    Aslak
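The handshake suggested above could look roughly like the following. This is a sketch in plain C, not stack code: the two "pins" are modelled as fields of a hypothetical `handshake_pins_t` struct, whereas real firmware would read and write port registers, and the `events_pending` flag stands in for whatever idle check is made in the main loop.

```c
#include <stdbool.h>

/* Hypothetical view of the two handshake lines between the host
 * microcontroller and the CC2541. */
typedef struct {
    bool shutdown_requested;  /* input: host wants to cut power */
    bool ready_for_power_off; /* output: safe to cut power now */
} handshake_pins_t;

/* Intended to be called at the idle point of the main loop.
 * Returns true once the device has acknowledged the shutdown request;
 * the caller should then stop doing work and wait for power to drop. */
static bool idle_shutdown_check(handshake_pins_t *pins, bool events_pending)
{
    /* Never acknowledge while work (for example an NV write) may be queued. */
    if (events_pending || !pins->shutdown_requested) {
        return false;
    }

    pins->ready_for_power_off = true;
    return true;
}
```

On the host side, power would only be cut after `ready_for_power_off` is seen, with a timeout as a fallback in case the CC2541 never becomes idle.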

  • Hi,

    I tried to reproduce again
    - Using SimpleBLEPeripheral
    - Setting GAP_BONDINGS_MAX to 1.
    - Erase all bonds on writes to Char 1
    - Send numBonds as periodic notification

    Pair, disconnect, reset, reconnect, re-encrypt with existing, numbonds==1, erase allbonds, disconnect, reconnect, expected fail to re-encrypt with existing, numbonds==0, reset, repeat from start.

    At this time I could find no monkey business, and I could verify that the variable 'bonds' which is the RAM shadow of the SNV bond table was updated correctly to either the bonded device or to all FF's when none existed.

    Best regards,
    Aslak