Connection stability issues communicating with iOS and Android simultaneously

Martin Bertsche

Other Parts Discussed in Thread: TM4C1290NCPDT, CC2564

Mr. Sanchez requested me to post this here for better traceability.

In our product we are using the CC2564 together with Bluetopia 1.2R2 on a TM4C1290NCPDT. We are having problems when we are communicating with an Android device via Bluetooth Classic SPP, and at the same time with an iOS device using our own SPP implementation on top of the Bluetooth Smart GATT profile.

Here is a recap of what has happened so far, just to give you our perspective:

Our product requires simultaneous communication with multiple Android devices via SPP and multiple iOS devices via GATT as described above.
- The sum of Android and iOS devices was specified to be a maximum of four. There is no limitation on the composition of this sum so there can be up to 4 iOS devices (in peripheral mode) or up to 4 Android devices.
- The purpose of the app is that of a remote control. The main use case is that the changes one user makes on their device are synchronized with all other connected devices. So the commands our board receives are executed and transformed into status updates for all connected devices. The devices display the updated status.
We use Bluetopia in the following way:
- The TM4C runs FreeRTOS 8.1.2
- We provide our own HCI UART driver.
- We implemented the OS abstraction layer as described in the comments of the corresponding header files.
- We run the threaded version of Bluetopia
- All our calls to Bluetopia are made from within a Bluetopia controlled thread or from a single thread that we use to control Bluetopia (aka the Bluetooth thread).
- Bluetopia events are not handled within the context they occur. They are deep copied and put into a message queue to be handled by the Bluetooth thread.
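
To illustrate the event hand-off described in the last bullet, here is a minimal sketch of deep copying an event inside a callback and queuing it for the Bluetooth thread. Everything in it is an assumption made for illustration: `bt_event_t`, `event_queue_post` and `event_queue_take` are hypothetical names, not Bluetopia types or calls, and the ring buffer has no locking. On the real target a FreeRTOS queue would take its place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical event shape (not a Bluetopia type). In a callback the
 * payload is only borrowed, so it must be copied before returning. */
typedef struct {
    uint16_t       type;     /* application-defined event code        */
    uint16_t       conn_id;  /* connection the event belongs to       */
    size_t         length;   /* number of valid bytes in data         */
    const uint8_t *data;     /* borrowed in callback, owned in queue  */
} bt_event_t;

#define EVENT_QUEUE_DEPTH 8u

/* Plain ring buffer. NOT thread-safe: a real build would guard it or
 * use the RTOS queue primitive instead. */
typedef struct {
    bt_event_t slots[EVENT_QUEUE_DEPTH];
    size_t     head;   /* next slot to read       */
    size_t     count;  /* number of queued events */
} event_queue_t;

void event_queue_init(event_queue_t *q)
{
    memset(q, 0, sizeof *q);
}

/* Called from the callback context. Deep copies the payload.
 * Returns 0 on success, -1 if the queue is full or memory ran out. */
int event_queue_post(event_queue_t *q, const bt_event_t *ev)
{
    assert(q != NULL && ev != NULL);
    if (q->count == EVENT_QUEUE_DEPTH) {
        return -1;
    }

    uint8_t *copy = NULL;
    if (ev->length > 0) {
        copy = malloc(ev->length);
        if (copy == NULL) {
            return -1;
        }
        memcpy(copy, ev->data, ev->length);
    }

    size_t tail = (q->head + q->count) % EVENT_QUEUE_DEPTH;
    q->slots[tail] = *ev;        /* copy the fixed-size fields       */
    q->slots[tail].data = copy;  /* replace borrowed pointer by copy */
    q->count++;
    return 0;
}

/* Called from the Bluetooth thread. On success the caller owns
 * out->data and must free() it. Returns -1 if the queue is empty. */
int event_queue_take(event_queue_t *q, bt_event_t *out)
{
    assert(q != NULL && out != NULL);
    if (q->count == 0) {
        return -1;
    }
    *out = q->slots[q->head];
    q->head = (q->head + 1) % EVENT_QUEUE_DEPTH;
    q->count--;
    return 0;
}
```

The point of the copy is that the callback can return immediately while the Bluetooth thread works on data that nobody else will modify. The cost is one allocation per event, which matters once traffic increases.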
We implemented the Android / SPP stuff first since SPP was already available in Bluetopia so we could quickly prove our concept.
- The simultaneous communication with up to 4 Android devices works fine.
Later we added iOS and our SPP implementation on top of GATT.
- Everything works fine as well when communicating with just one iOS device at a time.
Problems began when we started testing with Android and iOS devices simultaneously.
- Our device and the connected apps exchange what we call heartbeat messages every second in order to be able to tell the user quickly about connection loss. If a device - no matter whether it is an app or the TM4C - does not receive a heartbeat message for more than 10 seconds it will automatically disconnect. On iOS devices we are unable to control this behavior, so it is only our application on the TM4C that performs this action.
- Heartbeat exchange worked fine even if iOS and Android devices were simultaneously connected.
- However, as soon as we start to generate more than the basic traffic on one of the mobile devices, the Bluetooth Smart connection with the iOS device seems to die after a short while (8 to 10 seconds), while the Bluetooth Classic connection remains intact.
  - The observation we make in our logs is that we do not receive a disconnect event for the iOS device. What we do see is that the Bluetopia GATT write queue for sending data to the iOS device starts to clog up with data. The event that the queue is empty again which is specified in the Bluetopia API never occurs.
- We started looking for mistakes on our side using debug log messages everywhere so we could track every event that we received from Bluetopia and every call that we make to the Bluetopia API. After we were able to track down and fix some issues in our code the problem persisted. Looking at our logs we were pretty sure that the issue was out of our hands.
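
For reference, the heartbeat rule described above (one heartbeat per second, disconnect after more than 10 seconds of silence, at most four devices) can be sketched as follows. This is a minimal illustration under assumptions, not our production code: the `hb_*` names are hypothetical, and time is taken to be a free-running 32-bit millisecond tick.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PEERS            4u      /* at most four connected devices     */
#define HEARTBEAT_TIMEOUT_MS 10000u  /* silence longer than this is fatal  */

typedef struct {
    bool     connected;
    uint32_t last_rx_ms;  /* tick of the last heartbeat (or of connect) */
} hb_peer_t;

typedef struct {
    hb_peer_t peers[MAX_PEERS];
} hb_monitor_t;

void hb_init(hb_monitor_t *m)
{
    for (size_t i = 0; i < MAX_PEERS; i++) {
        m->peers[i].connected = false;
        m->peers[i].last_rx_ms = 0;
    }
}

void hb_on_connect(hb_monitor_t *m, size_t idx, uint32_t now_ms)
{
    assert(idx < MAX_PEERS);
    m->peers[idx].connected = true;
    m->peers[idx].last_rx_ms = now_ms;  /* grace period starts at connect */
}

void hb_on_disconnect(hb_monitor_t *m, size_t idx)
{
    assert(idx < MAX_PEERS);
    m->peers[idx].connected = false;
}

void hb_on_heartbeat(hb_monitor_t *m, size_t idx, uint32_t now_ms)
{
    assert(idx < MAX_PEERS);
    if (m->peers[idx].connected) {
        m->peers[idx].last_rx_ms = now_ms;
    }
}

/* Returns a bitmask with bit i set for every connected peer that has
 * been silent for more than HEARTBEAT_TIMEOUT_MS. The unsigned
 * subtraction keeps the result correct across tick wrap-around. */
uint32_t hb_expired(const hb_monitor_t *m, uint32_t now_ms)
{
    uint32_t mask = 0;
    for (size_t i = 0; i < MAX_PEERS; i++) {
        uint32_t silent_ms = now_ms - m->peers[i].last_rx_ms;
        if (m->peers[i].connected && silent_ms > HEARTBEAT_TIMEOUT_MS) {
            mask |= (uint32_t)1u << i;
        }
    }
    return mask;
}
```

The Bluetooth thread would call `hb_expired` periodically and disconnect every peer whose bit is set. This is why a stalled iOS link only shows up in our logs as a disconnect roughly 10 seconds after the data stops flowing.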
Next we rented an Ellisys Logger in order to look at the data that was actually going over the air and to verify that nothing was wrong with our HCI driver.
- The message payload consisting of our own protocol turned out to be correct.
- The Ellisys sniffer did not report any malformed messages on the HCI connection.
- However we could detect messages sent from the CC2564 that contain invalid checksums. That was the only problem we could see in the logs. It also reproducibly coincided with the connection issues.
- Posting the Ellisys log on E2E caught the attention of the Support team.
We have been talking to the CC2564 software team for the last three months while they tried to reproduce the issue at their site in Israel. The reproduction finally succeeded four weeks ago.
- The CC2564 software team is now telling us that to their best knowledge the CC2564 is behaving in the way it should up to the point when it receives a disconnect for the iOS device – either from the iOS device itself or from the TM4C.
  - We know from our logs that our application does not actively disconnect from the iOS device until 10 seconds after the problem occurs due to no heartbeat messages being received.
  - Nor do we receive a disconnect event for the iOS device as described above
- The CC2564 software team admits that a short time after said disconnect occurs there indeed are checksum errors in the Ellisys log for messages the CC2564 sends that they cannot explain.

Our current state is that there is unexplained behavior on the air. The CC2564 team says their software works correctly. However, there could be an issue with Bluetopia or the way we use the Bluetopia API. We would be glad if the problem were on our side. BUT Bluetopia does not complain about the way we use it. So we cannot track down possible mistakes in any way.

Therefore, using the information above, we hope you can help us shed some light on the issue. If you need any further information please let me know about it.

Regards

over 10 years ago

0 Miguel over 10 years ago

TI__Mastermind 23565 points

Hi Martin,

We will look at it and provide some feedback. BTW, are you using TI's SPPLE application?

Regards,

~Miguel

0 Martin Bertsche over 10 years ago in reply to Miguel

Prodigy 250 points

Hi Miguel,

Yes! I started out from that point. So everything we do is based on the GATT SPP implementation and the "real SPP" code that is provided with the examples. After all, those examples are pretty much the only "sources of documentation" that put the whole API into context.

Yet this example isn't the only source I have been using. Somewhere here on E2E I found another example that shed some light on how to deal with the iOS random addresses and that shows how to use Bluetopia's address resolution API. I think there was an earlier example that worked with iOS for 15 minutes or so and then failed due to the changing address. The version of that example which solves the issue can only be found by searching this forum.

Regards

0 Joseph Gigi over 10 years ago in reply to Martin Bertsche

TI__Guru* 85400 points

Hi,

Can you please confirm that if using the TI SPP LE Demo app as is, you don't see this disconnect?

Regards,
Gigi Joseph.

0 Martin Bertsche over 10 years ago in reply to Joseph Gigi

Prodigy 250 points

Hi Joseph,

I'm sorry, I can't!

It is very tedious to do actual tests with LightBlue on iOS because I have to act like an SPP LE client typing on a screen. How would I generate throughput that way?

Moreover, the SPP LE Demo does not entirely cover our use case, for two reasons.

In your example the App has to be Central / Client of the Service --> In our case the App acts as the peripheral.
In your example I have to type in data via the command line --> Again very tedious to generate throughput.

Therefore I don't think this will provide any hints.

As I said, everything seems to work fine as long as iOS and Android exchange just a little data each second or so. I don't think I would be able to generate any more data than that using the demo's command line.

Regards,

Martin

0 Joseph Gigi over 10 years ago in reply to Martin Bertsche

TI__Guru* 85400 points

Hi Martin,

Understood.

In that case, can you please share the code snippet that you have? I am mainly interested in the changes that you have made to the SPP Demo App.
I would need this if I have to comment on "However, there could be an issue with Bluetopia or the way we use the Bluetopia API"

Regards,
Gigi Joseph.

0 David Clus over 10 years ago in reply to Joseph Gigi

Prodigy 185 points

Hi Joseph,

Martin will come to you regarding the code snippets later.

I just want to draw your attention to where we think the problem could be located:

Before we started the discussion here, we thoroughly instrumented our code with logs and looked at the hardware handshaking between the CC2564 and our host controller. We got hints that the problems start when flow control between the CC2564 and the host comes more and more into play, even though the flow control itself appears to work fine.

The last Ellisys log we shared with TI was captured over the air as well as at the HCI level. Here we can see that the last payload that is sent correctly takes about 400 ms to get from HCI to BLE, and in the meantime 6 more ATT write requests, interleaved with two SPP packets, are sent to the CC2564.

Only the two SPP packets are seen over the air 20 ms later; the 6 additional BLE packets were never sent. And what we thought was odd: the last BLE packet is retried 112 times with a wrong checksum. Other logs that the CC2564 team has taken show that even before things get worse with these retries, there is already something wrong on the BLE side.
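
One way to make this kind of stall visible on the host well before the 10-second heartbeat timeout is a small watchdog on outstanding writes. The following is only a sketch under assumptions: the `tx_watch_*` names and the stall limit passed by the caller are hypothetical, and the two hooks would have to be called wherever the application hands a write to the stack and wherever it learns that one has completed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Tracks writes that were handed to the stack but not yet confirmed. */
typedef struct {
    uint32_t pending;           /* submitted, not yet confirmed        */
    uint32_t last_progress_ms;  /* tick of the last sign of progress   */
} tx_watch_t;

void tx_watch_init(tx_watch_t *w)
{
    w->pending = 0;
    w->last_progress_ms = 0;
}

/* Call when a write is handed to the stack. */
void tx_watch_on_submit(tx_watch_t *w, uint32_t now_ms)
{
    if (w->pending == 0) {
        /* Link was idle: start measuring from this submit. */
        w->last_progress_ms = now_ms;
    }
    w->pending++;
}

/* Call when the application learns that one write has completed. */
void tx_watch_on_confirm(tx_watch_t *w, uint32_t now_ms)
{
    if (w->pending > 0) {
        w->pending--;
    }
    w->last_progress_ms = now_ms;
}

/* True if writes are outstanding and nothing has completed for more
 * than limit_ms. Unsigned subtraction tolerates tick wrap-around. */
bool tx_watch_stalled(const tx_watch_t *w, uint32_t now_ms, uint32_t limit_ms)
{
    uint32_t idle_ms = now_ms - w->last_progress_ms;
    return w->pending > 0 && idle_ms > limit_ms;
}
```

With the numbers from the log above (one write completing after about 400 ms, then six more that never complete), such a watchdog with a limit of, say, one second would flag the link as stalled long before the heartbeat timeout fires.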

Does it make sense to provide additional hardware with logging capabilities?

Kind Regards

David Clus

0 David Clus over 10 years ago in reply to Joseph Gigi

Prodigy 185 points

Hi Joseph,
I'd like to send you the relevant code parts (HCI, Bluetooth, RTOS bindings) in order to give you an insight into how we use the Bluetopia API. The code is quite large and contains proprietary information. Could you please contact me so that I have your e-mail address? Thank you, Regards David
