
WL1837MOD - Bluetooth: latency and QoS implementation?

Other Parts Discussed in Thread: WL1837MOD

Team,

Can you please help with the following questions for the WL1837MOD?

Thanks in advance.
Anthony

1) Latency in Bluetooth:
Roughly every 1-10 seconds we see additional latency on L2CAP data packets (DM1 packets). We have been struggling to find the cause, but so far we cannot reach any conclusion other than that something happens in the radio. We have turned off Wi-Fi, BT connectability and BT discoverability, and eliminated our application, but we still see this additional latency from time to time. Do you have any suggestion as to what this could be?


2) QoS:
Because of 1) we started to look at QoS: we are trying to find information on how to implement quality of service so that we get low latency when sending data.

The following HCI commands are from BT 4.0 Core specification:
HCI_QoS_Setup_Command
HCI_FlowSpecification_Command

There are also vendor-specific commands:
HCI_CS_QoS_Scheduler_Configuration
(There is also a vendor-specific command in our old software, HCI_SetQoSInterval, but I don't think it is applicable anymore.)

Test performed so far:
a) Test 1 using device A and B:
Steps to reproduce:
1.    Initiate an ACL link from A to B (A is master). This is a point-to-point connection, no other connections are enabled.
2.    Send HCI_QoS_Setup_Command from A where all parameters are set to their defaults except the latency parameter, which is set to 4*625 (i.e. 2500 µs).
Results:
-    Successful response from the radio: QoS Setup Complete event.
-    Sniffed LMP message:
                             Role: Master
                             Address: 5
                             Opcode: LMP_quality_of_service
                             Transaction ID: Initiated by master
                             Poll Interval: 40
                             NBC: 3
Comment:
No matter how the latency is set, we always get poll interval 40.
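
For reference, a minimal sketch (in C) of how the HCI_QoS_Setup command used in this test can be assembled for the H4 UART transport per the BT 4.0 Core specification (OGF 0x02, OCF 0x0007, 20 parameter bytes). The connection handle 0x0001 and the "don't care" default values are assumptions for illustration only:

    /* Minimal sketch: assemble the HCI_QoS_Setup command (OGF 0x02, OCF 0x0007)
     * for the H4 UART transport, with Latency = 4 * 625 us as in Test 1.
     * The connection handle and the "don't care" defaults are assumptions. */
    #include <stdint.h>
    #include <stdio.h>

    static int put_le16(uint8_t *p, uint16_t v) { p[0] = (uint8_t)v; p[1] = (uint8_t)(v >> 8); return 2; }
    static int put_le32(uint8_t *p, uint32_t v) { put_le16(p, (uint16_t)v); put_le16(p + 2, (uint16_t)(v >> 16)); return 4; }

    int main(void)
    {
        uint8_t pkt[4 + 20];
        int i = 0;

        pkt[i++] = 0x01;                     /* H4 indicator: HCI command packet        */
        i += put_le16(&pkt[i], 0x0807);      /* opcode = (OGF 0x02 << 10) | OCF 0x0007  */
        pkt[i++] = 20;                       /* parameter total length                  */
        i += put_le16(&pkt[i], 0x0001);      /* Connection_Handle (assumed)             */
        pkt[i++] = 0x00;                     /* Flags (reserved)                        */
        pkt[i++] = 0x01;                     /* Service_Type: 0x01 = best effort        */
        i += put_le32(&pkt[i], 0);           /* Token_Rate: 0 = don't care              */
        i += put_le32(&pkt[i], 0);           /* Peak_Bandwidth: 0 = don't care          */
        i += put_le32(&pkt[i], 4 * 625);     /* Latency: 2500 microseconds              */
        i += put_le32(&pkt[i], 0xFFFFFFFF);  /* Delay_Variation: don't care             */

        for (int n = 0; n < i; n++)          /* hex dump of the bytes for the UART      */
            printf("%02X ", pkt[n]);
        printf("\n");
        return 0;
    }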

b) Test 2 using device A and B
Steps to reproduce:
1.    Initiate an ACL link from A to B (A is master). This is a point-to-point connection, no other connections are enabled.
2.    Send HCI_FlowSpecification_Command from A where all parameters are set to their defaults except the latency parameter, which is set to 4*625, and flowDirection, which is set to 1 ("incoming data").
Results:
-    Successful response from the radio: Flow Specification Complete event.
-    Short latency when sending data.
-    Sniffed LMP message:
                             Role: Master
                             Address: 5
                             Opcode: LMP_quality_of_service
                             Transaction ID: Initiated by master
                             Poll Interval: 4
                             NBC: 3
Comment:
We only get this to work when we send the HCI_FlowSpecification_Command with flowDirection '1'. After I've sent it, I can send a subsequent HCI_FlowSpecification_Command with flowDirection '0', but I never see any difference in the latency for outgoing packets.
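
And a corresponding sketch of the HCI_Flow_Specification command used in Test 2 (OGF 0x02, OCF 0x0010, 21 parameter bytes), with Flow_Direction = 1 ("incoming") and Access_Latency = 4*625 µs. Again, the connection handle and the remaining "default" values are illustrative assumptions:

    /* Minimal sketch: the HCI_Flow_Specification command from Test 2 as a raw
     * H4 byte sequence. Handle and "don't care" fields are assumptions.      */
    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        static const uint8_t flow_spec[] = {
            0x01,                    /* H4 indicator: HCI command               */
            0x10, 0x08,              /* opcode 0x0810, little-endian            */
            21,                      /* parameter total length                  */
            0x01, 0x00,              /* Connection_Handle = 0x0001 (assumed)    */
            0x00,                    /* Flags (reserved)                        */
            0x01,                    /* Flow_Direction: 1 = incoming            */
            0x01,                    /* Service_Type: best effort               */
            0x00, 0x00, 0x00, 0x00,  /* Token_Rate: don't care                  */
            0x00, 0x00, 0x00, 0x00,  /* Token_Bucket_Size: don't care           */
            0x00, 0x00, 0x00, 0x00,  /* Peak_Bandwidth: don't care              */
            0xC4, 0x09, 0x00, 0x00,  /* Access_Latency: 2500 us (4 * 625), LE   */
        };

        for (unsigned i = 0; i < sizeof flow_spec; i++)
            printf("%02X ", flow_spec[i]);
        printf("\n");
        return 0;
    }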

My questions are:
1.    Is the suggested implementation to use the HCI_FlowSpecification_Command from master and parameter flowDirection ‘1’?
If not – how is this best implemented? Do you have an example?
2.    Why do I only get a fixed poll interval for HCI_QoS_Setup_Command?
3.    Could you explain the HCI_CS_QoS_Scheduler_Configuration command? I am not sure I fully understand how to use the parameters. Do you have an example?
4.    Is the vendor specific command HCI_SetQoSInterval obsolete?

  • TI BT Stack version used is 3.9. Thanks for your support!
  • This query has been assigned to our Bluetooth expert.
    BR,
    Eyal
  • Hi,

    Please see below the answers to your questions:

    1.    Is the suggested implementation to use the HCI_FlowSpecification_Command from master and parameter flowDirection ‘1’? If not – how is this best implemented? Do you have an example?

    [Tomer] - There's no meaning for the outgoing direction, as the master controls the poll interval and decides when to access the slave. The slave must listen all the time, and the master must access the slave at least once every poll interval.

    2.    Why do I only get a fixed poll interval for HCI_QoS_Setup_Command? 

    [Tomer] - The command's purpose is to guarantee a certain throughput and latency. I'm not sure why you don't get the desired result. When the device receives the data from the host, it will send it at the next poll interval. If you would still like to control/minimize the poll interval in order to make sure data is sent within a certain latency, then you should use "Send_HCI_VS_Set_QoS_Interval 0xFF33, 0x0001, poll_interval"
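
    As a purely illustrative sketch, such a vendor-specific command might be framed over the H4 UART transport as below. Only the opcode (0xFF33), the handle (0x0001) and the poll-interval argument come from the command line quoted above; the parameter byte widths and ordering are assumptions that must be verified against TI's vendor-specific HCI command documentation:

        /* ASSUMED framing of the vendor-specific Set_QoS_Interval command.
         * Opcode 0xFF33 and the "handle, poll_interval" arguments come from the
         * thread; the 2-byte widths below are guesses, not TI documentation.   */
        #include <stdio.h>
        #include <stdint.h>

        int main(void)
        {
            uint16_t poll_interval = 4;      /* slots of 0.625 ms -> 2.5 ms poll   */
            uint8_t cmd[] = {
                0x01,                        /* H4 indicator: HCI command          */
                0x33, 0xFF,                  /* opcode 0xFF33, little-endian       */
                0x04,                        /* parameter total length (assumed)   */
                0x01, 0x00,                  /* connection handle 0x0001 (assumed) */
                (uint8_t)poll_interval,      /* poll interval, low byte (assumed)  */
                (uint8_t)(poll_interval >> 8),
            };

            for (unsigned i = 0; i < sizeof cmd; i++)
                printf("%02X ", cmd[i]);
            printf("\n");
            return 0;
        }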

     
    3.    Could you explain the HCI_CS_QoS_Scheduler_Configuration command? I am not sure I fully understand how to use the parameters. Do you have an example?

    [Tomer] - Please don't use this command


    4.    Is the vendor specific command HCI_SetQoSInterval obsolete?

    [Tomer] - This is the recommended command to change only the poll interval, which has direct impact on the latency. 

    In case you still don't get the desired latency after using "Send_HCI_VS_Set_QoS_Interval", please send an air capture, FW log and the desired latency you're trying to achieve.

    Regards,

    Tomer

  • Thank you so much for your reply. Based on your answer I will do some more testing, and I will get back with a trace if I don't sort it out myself.

    Would a trace be helpful for looking at the latency issue as well?

  • To determine the real latency you'll need to provide an HCI trace synchronized to an air capture. You can do so with a Frontline sniffer, for instance.

    Regards,
    Tomer
  • We don't have the HCI trace functionality enabled yet; I will try to enable it ASAP and send it to you.

    Thanks!

  • I have now made some FTS logs for you. I used the virtual sniffer for HCI logging and sniffed the air with my Frontline Bluetooth sniffer. All logs are available here: drive.google.com/.../0B55ZqKqlP6i-dXVzb2o0bk5haGM Please let me know if you have issues with the link.
    The files named xx_cap_flowspec.cfa show the behavior when using the Flow Specification HCI command, and xx_qossetup.cfa the behavior when using the QoS Setup HCI command. I have also sent you an HCI log and an air log where you can see our latency problem, which is really our biggest problem. I don't know whether you can give any tips there, though, since you cannot really see anything strange in the log where we have the delay. The delay only occurs for short packets sent from master to slave. In the air capture you can see the delay on frame 11162 and in the HCI log on frame 815.
    Thank you so much for your support.
    Hanna

  • Hi,

    Indeed, the delay occurs because the default poll interval of 20 frames is selected. This is done to save current consumption.
    If you decrease the poll interval, you will obviously have less delay until the remote receives the packet from the master.
    In "iar_cap_flowspec.cfa" I could see that you changed the poll interval to 4. Have you experienced any delay or issues with this value?

    Regards,
    Tomer
  • Please correct me if I have misunderstood your reply. I have seen the delay even when I decrease the poll interval. I see perfect QoS with the Flow Specification HCI command, but only for incoming data; as you told me earlier, "There's no meaning for the outgoing direction, as the master controls the poll interval, and when to access the slave". From the different tests we have done, we can see that the delay occurs on the slave side, somewhere between the radio and the point where the data reaches the controller. Overall we see very good latency; the long delay only happens sporadically.
  • I thought it was for the master's case...
    When you say long delay, how long is it? The device could do page scan, inquiry scan, calibrations...
  • Thanks for your quick response! The delays are quite rare (let's say 1 in 200 packets) and they are 300-1000 ms. We turned off page scan and inquiry scan. I do see a difference in the general spread of latency, but those long, rare delays are still there. Do you have any suggestion as to which calibrations could cause this?
  • Hi,

    The delays I was referring to, due to page/inquiry scans or calibrations, are a matter of several frames, definitely not up to 1 second.
    I'm pretty certain the delay is not caused by TI's controller. A virtual HCI sniff is not good enough, as the problem may reside in an upper layer (transport, stack...).
    Please provide a synchronized capture of the air + HCI (UART, HW) interface. As I said, I'm pretty sure the delay is not a result of the controller.

    Regards,
    Tomer
  • Reading this again, I'm confused now, as you said "the delay only occurs for short packets sent from Master to Slave", but then you said that the delay is on the slave's side... so which is it?
  • Also, the delay is on the transmit side or on the receive side?
  • I am so sorry, I have not been clear. It is always hard to give a proper explanation. This is the test case: I have two modules with the WL1837MOD; let's call them A and B. Device A initiates the connection and acts as master. Data is transferred periodically, every 10 ms, from A to B in short packets.

    I have seen that data is sometimes delayed somewhere along the way. When I look at the HCI traces I see no delays on A. I see no delays in the air, but from what I can understand there is a delay somewhere between when the data is received by the radio on B and when the controller on B delivers the data.

    In short: when sending data from master to slave, I see a delay on the receive side.

    Hope this is clear enough!! Otherwise, please let me know.
  • Hi Hanna,

    Now it's clear thanks. 

    So now that I understand the delay occurs on device B (the slave), I would still need to get a FW log, an air trace, and an HCI trace taken over the UART rather than a virtual one.

    We could then see whether the packet is delayed in the controller (TI's chip) or somewhere at the host (probably the transport layer). As I said, I'm pretty certain the issue is with the host.

    Whenever TI's controller receives a packet (regardless of whether it's short or not), it decodes it and copies it directly to the UART FIFO.

    Thanks,

    Tomer

  • OK, thank you for your reply and for taking the time. Then I know there is no obvious explanation for the delays. Right now, unfortunately, I only have the possibility of making a virtual trace of the HCI commands. We will investigate further to see whether we can find a host issue. I will get back to this thread if new questions come up.
  • I have uploaded two new files in this folder. Please have a look.
    drive.google.com/.../0B55ZqKqlP6i-dXVzb2o0bk5haGM

    In the attached log files deviceB-master and deviceB-slave, 100 bytes of data were sent every ~400 ms from device A. In the air sniff log you can see that the packets are sent on a regular basis from device A, but the time between two adjacent packets on the receiving side fluctuates when device B is acting as slave. The only difference between the two test setups is the master/slave role of device B.

    We cannot find any explanation for this behavior and hope that you can help us to find the root cause.

    Thanks for your time!

  • Hi Hanna,

    I looked at the air trace, and indeed I could see that the slave is receiving an LE data packet every 400 ms.

    To pinpoint the issue, an HCI capture (not virtual) is a must. I could start adding traces in the FW, but I find that redundant, as the delay could be anywhere, and it would also not prove anything (it could still be a host or controller issue). Once you provide an HCI capture that proves it's a controller issue, I can start working on it from the controller/FW point of view.

    Regards,

    Tomer

  • Hi,
    Do you have any new input on this? Should we keep this thread open?
    BR,
    Eyal
  • Hi,

    The problem is still not solved; I hope you can keep the case open. I've written two scripts for TI's HCI Tester (TI_master.txt and TI_slave.txt) and I have also added a trace (air_cap_hci_tool.cfa). I assume you are able to run those scripts? What is your comment on the result?

    Thanks for your time!

    Hanna

  • Hi Hanna,

    Please attach the HCI tester scripts you've prepared. I'd like to review them and run them on my machine.

    Thanks,

    Tomer

  • Hi Tomer,

    Hanna is on vacation and I will follow up this case in the meantime.

    The requested tester scripts are available here:
    drive.google.com/.../0B55ZqKqlP6i-dXVzb2o0bk5haGM

    Thanks,
    Ramtin
  • Hi Tomer,

    It seems that the latency problem may be related to the packet type chosen in the baseband. There is no latency problem if the payload fits into a 2-DH1 packet (payload < 55 bytes), while you can see delays if the payload is 55 bytes or larger and the data is sent with 2-DH3.

    I ran the tests with TI's HCI tester, and the logs and scripts can be found here:
    drive.google.com/.../0Bw_amlbVXlsMa3ppUENvblhLMTQ

    If you compare the timestamps at which the data is sent by the client and received by the server, you'll see the difference between the 54-byte and 55-byte cases:

    client (54 bytes)    server
    11:30:19.950 11:30:19.974
    11:30:20.456 11:30:20.469
    11:30:20.957 11:30:20.981
    11:30:21.465 11:30:21.477
    11:30:21.969 11:30:21.989
    11:30:22.471 11:30:22.484
    11:30:22.972 11:30:22.999
    11:30:23.476 11:30:23.493
    11:30:23.978 11:30:23.989
    11:30:24.485 11:30:24.505
    11:30:25.001 11:30:25.015


    client (55 bytes)    server
    12:16:59.452 12:16:59.476
    12:16:59.955 12:17:00.493
    12:17:00.461 12:17:00.499
    12:17:00.972 12:17:01.665
    12:17:01.474 12:17:01.687
    12:17:01.983 12:17:02.002
    12:17:02.489 12:17:02.771
    12:17:02.992 12:17:03.011
    12:17:03.501 12:17:03.528
    12:17:04.004 12:17:04.021
    12:17:04.509 12:17:04.807
    12:17:05.018 12:17:05.042
    12:17:05.521 12:17:06.036
    12:17:06.024 12:17:06.059
    12:17:06.526 12:17:07.227
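
    To quantify those gaps, a small illustrative sketch that computes the client-to-server delay for a few of the rows above (the timestamps are copied from the 55-byte table; the parsing itself is just for illustration):

        /* Compute per-packet client -> server delay from "HH:MM:SS.mmm" pairs. */
        #include <stdio.h>

        /* Parse "HH:MM:SS.mmm" into milliseconds since midnight. */
        static long to_ms(const char *ts)
        {
            int h, m, s, ms;
            if (sscanf(ts, "%d:%d:%d.%d", &h, &m, &s, &ms) != 4)
                return -1;
            return ((h * 60L + m) * 60L + s) * 1000L + ms;
        }

        int main(void)
        {
            /* Client send time, server receive time (rows from the 55-byte run). */
            static const char *pairs[][2] = {
                { "12:16:59.955", "12:17:00.493" },   /* ~538 ms */
                { "12:17:00.972", "12:17:01.665" },   /* ~693 ms */
                { "12:17:01.983", "12:17:02.002" },   /*  ~19 ms */
            };

            for (int i = 0; i < 3; i++)
                printf("%s -> %s : %ld ms\n", pairs[i][0], pairs[i][1],
                       to_ms(pairs[i][1]) - to_ms(pairs[i][0]));
            return 0;
        }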
  • Hi,

    I will look at the script you attached, but the HCI tester timestamp event is not the way to measure latency; it is far from accurate. There's an inherent delay in the tool which can reach more than tens of ms.

    Have you tested this with a UART sniffer? This is the proper way to measure.

    Regards,

    Tomer

  • Hi,

    I ran the script you provided (with a 500 ms delay between packets), and I don't see any issue when I look at the air capture. I see a ~500 ms gap between packets. Please see the attached file (a screenshot I took from the run).

    But then again, the proper way is to use a UART sniffer to track a certain packet's delay. 

  • Hi Tomer,

    Thank you for your reply. It seems that I need to explain the test setup in more detail, and why we think this is a controller issue:

    We have three cases where we send packets of data every 500 ms from master to slave.
    1. If the payload is 54 bytes, the packets are sent using 2-DH1, which is then 100% full, and the packets are received at the same interval (500 ms) by the receiving host.
    2. If the payload is 55 bytes, it does not fit into a 2-DH1 packet and the packets are sent with 2-DH3 (filled to only about 15% of its maximum size); here we see irregularities and delays of up to over 500 ms.
    3. If we only allow one-slot packets and send 55 bytes, the packets are sent with 3-DH1 and we do not see the delays.

    The above results are the same using both our application and HCI tester.

    The inaccuracy in the HCI tester timestamps should be present in all cases, and the delays we see are hundreds of ms, not tens.
    The delays are not seen in the air sniff, since it seems that the delay occurs in the receiving controller.
    With 54 bytes (2-DH1 packets), the receiving side gets a packet almost exactly every 500 ms:
    19.974
    20.469
    20.981
    21.477
    21.989
    22.484
    22.999
    23.493
    23.989
    24.505
    25.015
    but with 55 bytes (2-DH3) you'll see delays and irregularities.

    I cannot reach the HCI UART in our product; for that I would need to send the hardware to someone who can tear it apart.

    We cannot explain the difference between cases 2 and 3 from the host's point of view; the only difference is the chosen packet type.
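
    For context, the maximum ACL user payloads in the Bluetooth Core specification explain the boundary we see: 2-DH1 carries up to 54 bytes, 3-DH1 up to 83, DM3 up to 121 and 2-DH3 up to 367. Below is a small illustrative sketch of which type a given payload first fits into (the preference order is illustrative only; the actual baseband selection is controller policy):

        /* Which baseband packet type does a given L2CAP payload first fit into?
         * Payload limits are from the Bluetooth Core spec; the ordering is only
         * an illustration of the two configurations described above.            */
        #include <stdio.h>

        struct pkt { const char *name; int max_payload; };

        static const char *first_fit(const struct pkt *set, int n, int payload)
        {
            for (int i = 0; i < n; i++)
                if (payload <= set[i].max_payload)
                    return set[i].name;
            return "needs fragmentation";
        }

        int main(void)
        {
            /* EDR2 types, multi-slot allowed: cases 1 and 2 above */
            static const struct pkt edr2[] = { {"2-DH1", 54}, {"2-DH3", 367}, {"2-DH5", 679} };
            /* Single-slot only, EDR3 allowed: case 3 above */
            static const struct pkt one_slot[] = { {"2-DH1", 54}, {"3-DH1", 83} };

            const int sizes[] = { 54, 55 };
            for (int i = 0; i < 2; i++)
                printf("%2d bytes: multi-slot EDR2 -> %-6s  single-slot -> %s\n",
                       sizes[i], first_fit(edr2, 3, sizes[i]),
                       first_fit(one_slot, 2, sizes[i]));
            return 0;
        }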

    Thanks,
    Ramtin
  • I've been trying to reproduce the issue with the scripts you sent for more than an hour, but without success.

    Are you running the WL8 init script before you conduct the tests?

    Can you send FW logs?

    What number of host buffers do you define? I could also see this from the FW logs.

    By the way, why are you not enabling all packet types?

    Regards,

    Tomer

  • Hi,

    Yes, we run the WL8 init script and send patch version 3.6.1 at start-up.

    I'm not sure which FW logs you mean.

    We don't send the Host Buffer Size command, so the default values apply.

    In the end product we disable 3 Mbit packets; all other packet types are enabled.

    Could you send me your HCI tester logs (from both sides) from when you run our script?

    Thanks,
    Ramtin
  • Hi,

    Please find the HCI tester logs.

    Regards,

    Tomer

    Attachments: server_55.txt, client_55.txt

  • Hi,

    Maybe there is something we're doing differently at start-up; I cannot see why we're seeing different behaviour.
    I've uploaded new scripts that include the patch file we send at start-up. Please try to reproduce with these:

    drive.google.com/.../0Bw_amlbVXlsMa3ppUENvblhLMTQ
    The files are located under the folder TI_latency_issue_02

    The baseband payload is 55 bytes in both cases.
    We see the packet delays with script_TI-slave / script_TI-master, where 2-DH3 is being used.
    With script_TI-slave_dm1_dh1 / script_TI-master_dm1_dh1, where 3-DH1 is used, there are no delays on the receiving side.

    I've also added the log file when we run the scripts.

    Thanks
    Ramtin
  • Attachment: updated_log.txt

    Hi,

    I'm sorry, but I still can't see the issue. I must get BT FW logs so that I have internal visibility. The logger utility can be obtained from:

    There is a document about the logger and how to use it. Also, for additional information you can refer to:

    Regards,

    Tomer

  • Hi,

    Please find the FW logs in the folder FW-logs:
    drive.google.com/open

    Thanks,
    Ramtin
  • Hi,

    The FW logs seem fine.

    I'm sorry, but I don't see how we can proceed without a UART sniffer. Once a UART capture proves the issue is with the controller, I will be able to start adding traces in the FW code (this will require new init scripts).

    Regards,

    Tomer

  • Hi again,

    We tried with a 115.2 kbaud rate on the HCI UART and could not see the behavior. Could you try with 3 Mbit/s on the HCI UART, since this is what we have in our embedded system?

    Regards
  • Hi Tomer,

    Just a reminder: have you been able to try this with a higher HCI baud rate?

    Regards,
    Ramtin
  • Hi,

    I will only be able to try this out next week. Have you had a chance to get a UART sniffer?

    Regards,

    Tomer

  • Hi,

    OK, thanks. I don't have a module that I can connect to the HCI UART yet; we're working on it.
    You should be able to reproduce it with a 3 Mbit UART, since everything else seems to be the same in our setup.

    Regards,
    Ramtin
  • Hi Ramtin,

    I've been able to reproduce the issue with a baud rate of 3 Mbps. I'll be looking into it to see whether it's a bug or can be fixed by a different configuration of the device.

    As a workaround, are you able to work at 921600 bps? You will not experience this issue at that rate.

    Regards,

    Tomer

  • Hi Tomer,

    It is good news that you can reproduce it on your side, and I hope there is an easy fix.
    I still see overly long delays with 921600 baud and some packet sizes; with 460800 it is OK. However, we need at least 2 Mbps in our product.

    Thanks,
    Ramtin
  • Hi, I just wanted to hear whether you have come closer to a solution or have any more findings?
  • Hi Tomer, I just wanted to hear whether you have come closer to a solution or have any more findings?
  • Hi,

    We're still looking into this.

    Can you explain again why you disable EDR3? As I understand, the issue doesn't occur when EDR3 is enabled.

    Regards,

    Tomer

  • Hi,

    We get worse range with EDR3.

    However, the issue also occurs when all packet types are enabled; it depends on how big the L2CAP data packet is and how much of the baseband packet is filled with data. For instance, if the payload in the air is about 110 bytes, you get the best result if DM3 is carrying the data.

    From what I have experienced, if the chosen baseband packet type is filled with more data (more than about 60% of its maximum capacity), there are fewer latency problems. Also, if you send the data with less delay between two consecutive packets (<100 ms), you'll see less fluctuation in the latency. It feels as though some sleep policy or rescheduling kicks in in these cases.

    Regards,
    Ramtin
  • Hi,

    We're still looking into it, but I'm not sure we'll be able to address this issue assuming it's caused by the MAC/HW layer. I'll keep you updated.

    Regards,
    Tomer
  • Hi Tomer,
    Have you had any more progress today?
  • Hi,

    It's currently being investigated by the HW/System design team. I don't have any update at this stage.

    Regards,

    Tomer

  • Hi,

    I have two workarounds you can use, and this in fact clarifies why we haven't seen the issue in the past 10 years.

    The default mode of the controller should be sleep mode (Send_HCI_VS_Sleep_Mode_Configurations 0xFD0C); otherwise the current consumption will be much higher. There's no reason not to work in HCILL mode (sleep mode). The controller will strive to enter this mode autonomously whenever possible. The host must, however, comply with the HCILL protocol ()

    Another point is that whenever there's an ACL connection with low or intermediate throughput, the host should strive to keep the connection in sniff mode. There's no reason to keep the connection active, as it consumes much more current.

    When working with a low sniff interval, or when working in HCILL mode, I have not seen the issue.
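
    For the sniff-mode point, a minimal sketch of the standard HCI_Sniff_Mode command (OGF 0x02, OCF 0x0003) with a low sniff interval. The connection handle and the specific interval/attempt/timeout values below are illustrative assumptions, not recommended settings:

        /* HCI_Sniff_Mode as a raw H4 byte sequence; interval values are in
         * 0.625 ms slots, and all parameter values here are examples only.   */
        #include <stdio.h>
        #include <stdint.h>

        int main(void)
        {
            static const uint8_t sniff_mode[] = {
                0x01,               /* H4 indicator: HCI command                 */
                0x03, 0x08,         /* opcode 0x0803, little-endian              */
                10,                 /* parameter total length                    */
                0x01, 0x00,         /* Connection_Handle = 0x0001 (assumed)      */
                0x10, 0x00,         /* Sniff_Max_Interval = 16 slots (10 ms)     */
                0x08, 0x00,         /* Sniff_Min_Interval = 8 slots (5 ms)       */
                0x04, 0x00,         /* Sniff_Attempt = 4                         */
                0x01, 0x00,         /* Sniff_Timeout = 1                         */
            };

            for (unsigned i = 0; i < sizeof sniff_mode; i++)
                printf("%02X ", sniff_mode[i]);
            printf("\n");
            return 0;
        }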

    Regards,

    Tomer