WL1837MOD: Missing Bluetooth LE Advertisements when scanning for long periods of time

Part Number: WL1837MOD
Other Parts Discussed in Thread: CC2400, WL1837

Hi, we are using the WL1837MOD in our Linux ARMv7-based thermostat to scan for wireless BLE temperature sensors, which send a set of three advertisements (not connection-oriented) every 15 seconds. The thermostat scans indefinitely with the maximum scan interval set via HCI and is connected to a wired power source, so the CPU never sleeps and the WL1837MOD never stops scanning. During testing, we have seen random periods of 5-10 minutes where other Bluetooth LE advertisements are seen by the WL1837MOD, but not our sensors. What makes this even more confusing is the fact that in the same environment other Bluetooth hardware, including sniffers, does see the missing advertisement packets during the same period. This data, which was compiled with hcidump, btmon, and a dedicated sniffer (based on the TI CC2400), indicates that there is either a hardware issue or, more likely, a firmware issue that filters the LE advertisement events and never sends them via HCI to Linux.

We've tried long-running scans, and restarting the scan every 2 minutes. Some sites go days without consistent packet loss; other sites have multiple 5-minute gaps per day. We have one environment at an engineer's house that consistently reproduces the problem multiple times a day. As far as signal strength goes, there appears to be no correlation with performance. The second graph, with just one color, is the average signal strength for the three sensors. Notice that the period of no data on the right is roughly indistinguishable from the center, where there are strong packet counts. Indeed, even on the left, where the signal strength is relatively poor, we have good packet counts. We've also attempted to eliminate Wi-Fi coexistence as a contributing factor by disabling the Wi-Fi radio entirely and relying on Ethernet for one test. This unit has the same patterns as devices with Wi-Fi active.

We're not using sleep mode of any kind on the processor, and we can see evidence of other BTLE packets being received during this time, but at much, much lower levels. I've attached that graph below as well. (It's using a logarithmic scale because the packet counts are substantially higher; you would not be able to see our sensors without it.) Notice that the average packet count for the entire population of BTLE traffic is depressed during the period in question. However, it's worth noting that we're still receiving on the order of one thousand packets a second during this period. At this point, I don't see any explanation other than that they're going missing in the WL1837MOD itself.

Is there any way to dig deeper into the firmware of the WL1837MOD, or are there any known issues related to this that may be relevant?  At this point this problem is blocking a very important product release, and we're getting a lot of pressure to move forward.

  • Hi Alsey,

    Can you take firmware and HCI logs on the WL1837 according to this guide? You should be able to use the BT_DBG pin on the WL1837.

    Thanks,
    Jacob 

  • Hi Jacob, I work with Alsey, and I was able to capture an event.  (It's my lab that has a multiple-event-per-day rate).

    This event occurred on the 28th, and I was running the TI logger as well as btmon on the Linux side. Loading the btmon HCI packets into Wireshark reveals that the total packet count (not just our sensors) was significantly depressed, even going as low as a few hundred a minute (down from an average of 2500).

    Our logging system, which is able to track the packets received from our sensors, showed that the loss of packets was enough to bring the count from our sensors down to zero. This would cause the thermostat to go into protection mode.
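
A per-minute count like the one described above can be reproduced from any list of packet timestamps. The following is a minimal sketch, assuming the timestamps have already been exported from the capture as seconds; the function names and the 25% threshold are illustrative and not part of the poster's tooling.

```python
from collections import Counter


def packets_per_minute(timestamps):
    """Count packets in each one-minute bucket.

    timestamps: iterable of capture times in seconds (assumed input format).
    Returns a dict mapping minute index (timestamp // 60) to packet count,
    including zero-count minutes between the first and last packet.
    """
    counts = Counter(int(t // 60) for t in timestamps)
    if not counts:
        return {}
    first, last = min(counts), max(counts)
    return {minute: counts.get(minute, 0) for minute in range(first, last + 1)}


def depressed_minutes(per_minute, baseline, fraction=0.25):
    """Return the minute indices whose count is below fraction * baseline."""
    threshold = baseline * fraction
    return [m for m, c in sorted(per_minute.items()) if c < threshold]
```

Run once over all HCI traffic and once over the sensor packets only, this makes it easier to see whether both counts drop in the same minutes.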

    13:12 local, 20:12 UTC

    0005.logs.zip
    hcidump.pcap.zip

  • Hi William,

     I’ll follow up next week.

     Thanks,

    Jacob

  • Hi William and Alsey,

    "During testing, we have seen random periods of 5-10 minutes where other Bluetooth LE advertisements are seen by the WL1837MOD, but not our sensors. What makes this even more confusing is the fact that in the same environment other Bluetooth hardware, including sniffers, does see the missing advertisement packets during the same period."

    Does the other Bluetooth hardware (like the sniffers you mention) capture these advertisements in addition to the thermostat advertisements? I'm trying to understand if these other advertisements are causing the WL1837 to drop the thermostat advertisements. 

    Thanks,
    Jacob 

  • Thanks for starting to look at this Jacob!

    I'm not totally sure I follow, but I think the answer is that yes, the other hardware (including our older model that uses an Atheros chipset) sees all the advertisements. In the period during this event:

    There was no change in packet counts on the atheros-based system.

  • Hi William,

    In the pcap you shared, are you able to point me to the advertisements that come from your thermostats? Do you know what type of advertising packets the thermostats are sending (ex: connectable, undirected advertising)? In BLE terminology, this is referred to as the PDU type.

    Thanks,
    Jacob

  • The `oui` of all of our sensors' advertisements is `00:1a:ae`, and the manufacturer data has our company ID, which you can filter with `btcommon.eir_ad.entry.company_id == 0x01d9`. We are sending Non-Connectable Undirected Advertising packets. Just to be clear, we periodically disable scanning and advertise on thermostats, then go back to scanning, but the outages we are seeing are not during advertising; they occur during the scan intervals. We also have builds where the thermostats only scan, and the issue is present there too. I tried alternating between scanning and advertising to see if that helped, but it's the same.
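
The same filter can be applied outside Wireshark to raw advertising data. This is a minimal sketch, assuming the advertiser address and the advertising data bytes have already been extracted from the HCI LE Advertising Report events; the function names are hypothetical. It walks the standard length/type/payload AD structures and checks the Manufacturer Specific Data entry (AD type 0xFF), whose first two bytes are the little-endian company ID.

```python
MANUFACTURER_SPECIFIC = 0xFF  # AD type for Manufacturer Specific Data


def iter_ad_structures(adv_data: bytes):
    """Yield (ad_type, payload) pairs from raw advertising data."""
    i = 0
    while i < len(adv_data):
        length = adv_data[i]
        # A zero length marks padding; a length past the end is truncated data.
        if length == 0 or i + 1 + length > len(adv_data):
            break
        ad_type = adv_data[i + 1]
        payload = adv_data[i + 2 : i + 1 + length]
        yield ad_type, payload
        i += 1 + length


def is_our_sensor(address: str, adv_data: bytes,
                  oui: str = "00:1a:ae", company_id: int = 0x01D9) -> bool:
    """Match on both the address OUI and the manufacturer company ID."""
    if not address.lower().startswith(oui.lower()):
        return False
    for ad_type, payload in iter_ad_structures(adv_data):
        if ad_type == MANUFACTURER_SPECIFIC and len(payload) >= 2:
            if int.from_bytes(payload[:2], "little") == company_id:
                return True
    return False
```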

     

  • Hey Alsey,

    Thanks for the picture. I'll review your logs again to see if I can detect an issue.

    One workaround for you to prevent other non-sensor advertisements from being sent to your thermostat is to create an accept list. This allows the WL1837 to filter advertisements so that you only receive advertisements from the desired BLE temperature sensors. Perhaps this would help with the dropped advertisements from the temperature sensors.

    Thanks,
    Jacob
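
For reference, an accept list entry is added with the standard HCI LE Add Device To Filter Accept List command (OGF 0x08, OCF 0x0011, named "White List" in older specification versions). The sketch below only builds the command bytes, without the UART packet-type byte. The helper names are hypothetical, and how the bytes are sent to the controller depends on the host stack, which the thread has not specified.

```python
import struct

OGF_LE = 0x08
OCF_LE_ADD_DEVICE_TO_ACCEPT_LIST = 0x0011


def hci_command(ogf: int, ocf: int, params: bytes = b"") -> bytes:
    """Build an HCI command: 16-bit opcode (little-endian), length, parameters."""
    opcode = (ogf << 10) | ocf
    return struct.pack("<HB", opcode, len(params)) + params


def add_to_accept_list(address: str, address_type: int = 0x00) -> bytes:
    """Build LE Add Device To Filter Accept List for one sensor.

    address: "AA:BB:CC:DD:EE:FF" as usually printed (most significant byte first).
    address_type: 0x00 public, 0x01 random.
    """
    addr = bytes.fromhex(address.replace(":", ""))
    if len(addr) != 6:
        raise ValueError("Bluetooth address must be 6 bytes")
    # HCI carries the address least significant byte first.
    return hci_command(OGF_LE, OCF_LE_ADD_DEVICE_TO_ACCEPT_LIST,
                       bytes([address_type]) + addr[::-1])
```

The list only takes effect when the scanning filter policy in the scan parameters is set to use it, and the controller rejects changes to the list while a scan that uses it is enabled.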

  • Do you suspect that non-desired advertisements are causing the issue?  Are there some throughput limitations in the stack somewhere?

  • Hi William,

    I am not aware of any throughput limitations in the stack. I'm not sure if the non-desired advertisements are causing the issue. If the non-desired advertisements show up in the logs when the wireless sensor ones are missing, that would lead me to believe they are an issue.

    Thanks,
    Jacob

  • I'm kinda confused by that.  I'm trying to follow along, so let me ask in another way...  

    If we're doing an unrestricted scan, we'd expect to see advertisements from a whole lot of devices, right?  In my case, that's on the order of around 2000 a minute.  When the WL1800 is having trouble, the global count of packets received can fall down so far that 1/4 or less of those are received.  In that condition, one would expect that there would be advertisements from a wide range of devices, right?  I think that, on balance, the loss of so many packets is just depressing the number we get from our device enough that our thermostat has to go into safety mode.

    We are implementing the accept list currently, but haven't finished yet.  It's a little tricky because we have to implement two scanning modes, one for adding new sensors and one for steady-state.  So, it's taking a bit.  I'll report back with what we find.

  • Hi William,

    The accept list will help you limit the advertisements you see on the WL1837 to the ones from the wireless temperature sensors. It may not help you fix the dropped advertisements you mentioned. 

    The core issue is that the WL1837MOD is dropping advertisements from the wireless sensors, correct? This sentence you mentioned earlier is confusing me because it suggests the opposite:

    "During testing, we have seen random periods of 5-10 minutes where other Bluetooth LE advertisements are seen by the WL1837MOD, but not our sensors."

    Additionally, can you specify which Bluetooth stack you are using on the Linux host?

    Are you able to take the firmware logs as mentioned before via the TX debug pin? This will help me understand what is happening on the WL1837MOD. 

    Thanks,
    Jacob

  • Ah, yes..  So, when there are periods of depressed packet counts there are times that we miss our sensor advertisements. But, it is my hypothesis that this is related to a general loss of packets.  We started noticing because our packets were missing, but as we dug into it more completely (by logging _all_ hci traffic) we noticed that, yes there were other packets, but that all packets were less frequent.

    As far as the stack details, Alsey will have to answer that.  He's the primary engineer on the project.

    We did download the firmware logs, they're linked above.  0005.logs.zip  These logs are from the same session that I've been sharing graphs from.  The hci data from btmon is also included for the same timeframe. 

  • Sorry William, 

    I missed your firmware logs from before. Thanks for sharing those, I'll look at them now.

    Another thing you could try: disable and then immediately re-enable scanning on the WL1837 when you notice that you are receiving far fewer advertisements. If you then receive the same number of packets as before the drop, this points to a buffer overflow. Otherwise, there may be some RF antenna issues in certain environments.

    Thanks,
    Jacob
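
The disable/re-enable test maps to two standard HCI LE Set Scan Enable commands (OGF 0x08, OCF 0x000C) sent back to back. A minimal sketch of the command bytes, without the UART packet-type byte, follows; the helper names are hypothetical, and the transport used to send them is left to the host stack.

```python
import struct

OGF_LE = 0x08
OCF_LE_SET_SCAN_ENABLE = 0x000C


def hci_command(ogf: int, ocf: int, params: bytes = b"") -> bytes:
    """Build an HCI command: 16-bit opcode (little-endian), length, parameters."""
    opcode = (ogf << 10) | ocf
    return struct.pack("<HB", opcode, len(params)) + params


def set_scan_enable(enable: bool, filter_duplicates: bool = False) -> bytes:
    """Build LE Set Scan Enable (LE_Scan_Enable, Filter_Duplicates)."""
    params = bytes([1 if enable else 0, 1 if filter_duplicates else 0])
    return hci_command(OGF_LE, OCF_LE_SET_SCAN_ENABLE, params)


def restart_scan_commands(filter_duplicates: bool = False):
    """Return the disable command followed by the enable command."""
    return [set_scan_enable(False), set_scan_enable(True, filter_duplicates)]
```

Comparing the per-minute packet count just before and just after the restart gives the data point this test is looking for.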

  • Hi William,

    One other note: in your firmware logs, I see a lot of "Warning: BLE/ANT blocked by arbiter!" messages. These occur throughout the logs and I will see if these could be related to your issue.

    Best,
    Jacob

  • Hi William,

    I wanted to share the WiLink 8 Vendor Specific HCI commands for your debugging. 

    Best,
    Jacob

  • Based on our previous conversation we enabled the whitelist when scanning, but are still seeing the same drops in received packets. 

  • Hi Alsey,

    Are you able to take Bluetooth air sniffer logs with a Protocol Analyzer that can show different levels of Bluetooth communication? For example:

      

    Thanks,
    Jacob

  • We have included the PCAP logs from an air sniffer (Ubertooth One) running at the same time as the TI debug port was scanning.  It's available in this post.  If that's not what you mean, then you might have to elaborate a little more.

  • Hi William,

    The pcap you sent is helpful to some degree, but it only displays the HCI traffic. I'm looking for the other Bluetooth layers (like L2CAP, SDP, etc.).

    Is it possible to get those?

    Thanks,
    Jacob

  • I guess I'm confused how to get meaningful L2CAP data from an external Protocol Analyzer.  Also, our protocol is strictly advertisement-based, and doesn't use link-oriented operation at all.

  • Hi William,

    I've seen other Bluetooth air traffic captures that contain data besides the HCI layer, so that's why I recommend it. It's difficult to root cause your issue because the issue is very intermittent. If you are able to directly tie the issue to an action/environment/situation etc. that would be helpful.

    I can say that the "Warning: BLE/ANT blocked by arbiter!" is likely caused by an interrupt related to coexistence between the Wi-Fi and BLE peripherals.

    Thanks,
    Jacob

  • We analyzed the packets and added code to reject all incoming connection requests and only scan for devices in the whitelist. There is no evidence of anything in the environment trying to connect via BLE to the thermostat. We are still seeing sensors having random outages of 4-7 minutes on the thermostat model with the WL1837MOD. Is there something in the Wi-Fi environment we should look at? Would disabling Wi-Fi have an effect? Could you provide more details about the implications of "Warning: BLE/ANT blocked by arbiter!"?

  • Hi Alsey,

    Disabling the Wi-Fi could help ease network congestion. Have you tried observing this behavior if Wi-Fi is disabled? 

    The arbiter helps to handle radio coexistence with the other PHY protocols, such as BLE and Wi-Fi. It seems like Wi-Fi might be demanding priority.

    Can you try disabling Wi-Fi?

    Best,
    Jacob

  • Hi Jacob. We've disabled Wi-Fi on one unit and let it sit for a week. We compared its performance to another that had Wi-Fi enabled. I've attached a screenshot of the Influx data for the time period. As before, these are packet counts per minute. The upper trace is Wi-Fi disabled, and the lower is Wi-Fi enabled. Notice that the Wi-Fi disabled case has a much tighter cluster of packet counts, and rarely goes to zero. I checked each of the times it reaches zero, and counted the number of minutes that each gap lasts. The longest is 4 minutes, and there are three such periods. This isn't "desired", but it is acceptable for our use case. The Wi-Fi case, however, has an almost uncountably large number of 5+ minute gaps.
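
The manual gap count described above can be automated. This is a minimal sketch, assuming the per-minute sensor packet counts are available as a plain list in time order; the function name is hypothetical and not part of the poster's logging system.

```python
def zero_gaps(counts_per_minute):
    """Return (start_index, length_in_minutes) for each run of zero-count minutes."""
    gaps = []
    run_start = None
    for i, count in enumerate(counts_per_minute):
        if count == 0:
            if run_start is None:
                run_start = i  # a new gap begins here
        elif run_start is not None:
            gaps.append((run_start, i - run_start))
            run_start = None
    if run_start is not None:
        # The series ended while still inside a gap.
        gaps.append((run_start, len(counts_per_minute) - run_start))
    return gaps


# Example with made-up counts: three gaps, one of them 5 minutes long.
counts = [12, 0, 0, 0, 0, 11, 0, 9, 0, 0, 0, 0, 0]
long_gaps = [g for g in zero_gaps(counts) if g[1] >= 5]
```

Counting the gaps of 5 minutes or more for each unit turns the two traces into numbers that can be compared directly.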

    What are the next steps?  How can we work around the arbiter, and ensure that there's enough time scanning for BTLE packets?

    Thanks!

  • Hey William,

    I'll look to see if I can provide you more information on how the arbiter works - specifically with timing intervals. In the meantime, can you try adjusting the WL1837 BLE scan interval? Instead of scanning continuously, you could try different intervals of every 1 sec, 5 sec, 30 sec, etc. I think having this information in a table will be insightful for your application. I'm imagining something like this:

    Scan Interval | Performance (No. of 5+ min gaps)
    1 sec         | really large
    5 sec         | 100
    30 sec        | 10

    Best,
    Jacob
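
For the controller-level scan interval, the standard HCI LE Set Scan Parameters command (OGF 0x08, OCF 0x000B) takes the interval and window in units of 0.625 ms, with an allowed range of 0x0004 to 0x4000 (2.5 ms to 10.24 s). A 30-second period is outside that range, so it would presumably have to be implemented as host-side duty cycling instead; that reading of the suggestion is an assumption. The sketch below builds the command bytes, without the UART packet-type byte, using hypothetical helper names.

```python
import struct

OGF_LE = 0x08
OCF_LE_SET_SCAN_PARAMETERS = 0x000B


def hci_command(ogf: int, ocf: int, params: bytes = b"") -> bytes:
    """Build an HCI command: 16-bit opcode (little-endian), length, parameters."""
    opcode = (ogf << 10) | ocf
    return struct.pack("<HB", opcode, len(params)) + params


def ms_to_scan_units(ms: float) -> int:
    """Convert milliseconds to 0.625 ms units, checking the legacy HCI range."""
    units = round(ms / 0.625)
    if not 0x0004 <= units <= 0x4000:
        raise ValueError("value must be between 2.5 ms and 10240 ms")
    return units


def set_scan_parameters(interval_ms: float, window_ms: float,
                        active: bool = False, own_address_type: int = 0x00,
                        filter_policy: int = 0x00) -> bytes:
    """Build LE Set Scan Parameters.

    filter_policy 0x00 accepts all advertisements; 0x01 uses the accept list.
    """
    interval = ms_to_scan_units(interval_ms)
    window = ms_to_scan_units(window_ms)
    if window > interval:
        raise ValueError("scan window must not exceed scan interval")
    params = struct.pack("<BHHBB", 1 if active else 0, interval, window,
                         own_address_type, filter_policy)
    return hci_command(OGF_LE, OCF_LE_SET_SCAN_PARAMETERS, params)
```

Setting the window equal to the interval requests continuous scanning; a smaller window leaves idle time in each interval.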