Linux/WL1835MOD: CPU frequency scaling and HCI UART

Part Number: WL1835MOD
Other Parts Discussed in Thread: TPS65217

Tool/software: Linux

Hi all,

Our product uses CPU frequency scaling with the "ondemand" governor. Most of the time, streaming music while the CPU frequency changes works fine, but from time to time we have an issue with the HCI UART link: it seems that after a CPU frequency change, the host can no longer communicate with the BT controller.

This issue can be reproduced by playing audio over A2DP and running the cpu_scaling.sh script in parallel (see link below). This script sets the CPU governor to "userspace" and changes the CPU frequency randomly, without creating any additional CPU load. Switching the frequency from 800 MHz to 1 GHz also triggers the issue, which suggests it is not caused by a CPU shortage but rather by a freeze/lock that happens while the kernel changes the CPU frequency.
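The cpu_scaling.sh script itself is not reproduced in this thread. As a rough illustration of the same stress pattern, here is a minimal Python sketch. It assumes the standard cpufreq sysfs files (`scaling_governor`, `scaling_available_frequencies`, `scaling_setspeed` under `/sys/devices/system/cpu/cpu0/cpufreq`) and root privileges. The function names, the 0.5 s period and the `cpufreq_dir` parameter (used to point the code at a scratch directory for testing) are hypothetical and not taken from the original script.

```python
import os
import random
import time

CPUFREQ_DIR = "/sys/devices/system/cpu/cpu0/cpufreq"


def _read(path):
    with open(path) as f:
        return f.read().strip()


def _write(path, value):
    # sysfs attributes are plain text; one write() per value
    with open(path, "w") as f:
        f.write(str(value))


def available_frequencies(cpufreq_dir=CPUFREQ_DIR):
    """Frequencies (kHz) advertised by the cpufreq driver."""
    text = _read(os.path.join(cpufreq_dir, "scaling_available_frequencies"))
    return [int(tok) for tok in text.split()]


def random_scaling(iterations, period_s=0.5, cpufreq_dir=CPUFREQ_DIR, rng=None):
    """Select the userspace governor, then hop between random frequencies.

    Returns the requested frequencies (kHz) in order.
    """
    rng = rng or random.Random()
    _write(os.path.join(cpufreq_dir, "scaling_governor"), "userspace")
    freqs = available_frequencies(cpufreq_dir)
    requested = []
    for _ in range(iterations):
        target = rng.choice(freqs)
        # scaling_setspeed only accepts writes while "userspace" is the governor
        _write(os.path.join(cpufreq_dir, "scaling_setspeed"), target)
        requested.append(target)
        time.sleep(period_s)
    return requested
```

On the target, calling `random_scaling(iterations=1000)` as root during A2DP playback should approximate the reproduction described above. No extra CPU load is generated, because the loop spends almost all of its time sleeping.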

HCILL:

After disabling BT deep sleep and HCILL, when the issue happens, the HCI link stops working for a few hundred milliseconds but then recovers. My guess is that with HCILL, the BT controller enters deep sleep while the host is changing its CPU frequency, so the host misses that information and can no longer communicate with the BT controller. Without HCILL, packets are just delayed, causing an audio cut, but the communication stays alive.
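To make the deep-sleep hypothesis more concrete, the following Python sketch models the host side of HCILL. It is based on the opcodes and states of the Linux kernel's open-source driver (drivers/bluetooth/hci_ll.c), not on Bluetopia's closed implementation. It is also deliberately simplified, with no retries, no timers and illegal states ignored, and the class and method names are hypothetical. The model shows how a single lost HCILL byte, for example a WAKE_UP_ACK dropped during a frequency transition, can leave the host waiting indefinitely with HCI traffic queued. That would be consistent with the difference observed with and without HCILL, but it does not prove that this is what happens.

```python
from enum import Enum

# HCILL opcodes as defined in the Linux kernel's drivers/bluetooth/hci_ll.c
GO_TO_SLEEP_IND = 0x30
GO_TO_SLEEP_ACK = 0x31
WAKE_UP_IND = 0x32
WAKE_UP_ACK = 0x33


class State(Enum):
    AWAKE = "awake"
    ASLEEP = "asleep"
    ASLEEP_TO_AWAKE = "asleep_to_awake"  # host sent WAKE_UP_IND, waiting for ack


class HostHcill:
    """Simplified host-side HCILL state machine (a model, not Bluetopia's code)."""

    def __init__(self):
        self.state = State.AWAKE
        self.tx_wire = []  # everything the host has put on the UART, in order
        self.pending = []  # HCI packets held back while the controller sleeps

    def send_hci(self, packet):
        if self.state is State.AWAKE:
            self.tx_wire.append(packet)
            return
        self.pending.append(packet)
        if self.state is State.ASLEEP:
            # Ask the controller to wake up; data stays queued until the ack.
            self.tx_wire.append(WAKE_UP_IND)
            self.state = State.ASLEEP_TO_AWAKE

    def receive(self, opcode):
        if opcode == GO_TO_SLEEP_IND and self.state is State.AWAKE:
            self.tx_wire.append(GO_TO_SLEEP_ACK)
            self.state = State.ASLEEP
        elif opcode == WAKE_UP_IND and self.state in (State.ASLEEP, State.ASLEEP_TO_AWAKE):
            # Controller-initiated wake-up (or both sides woke at once): ack it.
            self.tx_wire.append(WAKE_UP_ACK)
            self._wake()
        elif opcode == WAKE_UP_ACK and self.state is State.ASLEEP_TO_AWAKE:
            self._wake()

    def _wake(self):
        # Flush everything that was queued while the controller was asleep.
        self.state = State.AWAKE
        self.tx_wire.extend(self.pending)
        self.pending.clear()
```

For example, after `h.receive(GO_TO_SLEEP_IND)` followed by `h.send_hci(b"...")`, the host sits in `ASLEEP_TO_AWAKE` until a `WAKE_UP_ACK` arrives. If that byte is lost, every later packet simply accumulates in `pending`, and the model never recovers on its own.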

I'm not able to capture the HCI lines with a logic analyzer on my board, because the lines are routed on a PCB layer that is not accessible.

UART loopback:

I did some low-level tests to isolate the problem and reproduce it more easily without the BT controller (see serial_loopback.py below). To run the test, connect UART TX to RX, run "./serial_loopback.py /dev/ttyO1" with the correct serial port, and run cpu_scaling.sh in parallel.

test01_delay shows no issue: basic delays (at least time.sleep()) are respected even while the frequency changes.

test11_serial_loopback shows that sometimes when changing the CPU frequency, the UART loopback takes up to ~750 ms to complete (~10 ms in the normal case). Recompiling the kernel with CONFIG_PREEMPT_VOLUNTARY=y reduces the normal loopback time to a few ms, but the issue with the ~750 ms loopback is still here.
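Since serial_loopback.py is not included in the thread, here is a minimal sketch of the kind of measurement it performs, using only the Python standard library (os, termios, tty, select) on Linux. It assumes TX is wired to RX, a baud rate of 115200 and no hardware flow control. The function names, payload size and 2 s timeout are hypothetical choices and are not taken from the original script.

```python
import os
import select
import termios
import time
import tty


def open_raw(port, baud=termios.B115200):
    """Open a UART in raw mode at `baud`, without hardware flow control."""
    fd = os.open(port, os.O_RDWR | os.O_NOCTTY)
    tty.setraw(fd)                          # no echo, no line editing, 8-bit chars
    attrs = termios.tcgetattr(fd)
    attrs[2] &= ~termios.CRTSCTS            # cflag: assume only TX/RX are looped
    attrs[2] |= termios.CLOCAL              # ignore modem control lines
    attrs[4] = attrs[5] = baud              # ispeed / ospeed
    termios.tcsetattr(fd, termios.TCSANOW, attrs)
    termios.tcflush(fd, termios.TCIOFLUSH)  # drop stale bytes
    return fd


def roundtrip(write_fd, read_fd, payload, timeout_s=2.0):
    """Send `payload`, read the same number of bytes back, return elapsed seconds."""
    start = time.monotonic()
    view = memoryview(payload)
    while view:                             # handle partial writes
        view = view[os.write(write_fd, view):]
    received = b""
    while len(received) < len(payload):
        remaining = timeout_s - (time.monotonic() - start)
        if remaining <= 0:
            raise TimeoutError(f"received {len(received)}/{len(payload)} bytes")
        ready, _, _ = select.select([read_fd], [], [], remaining)
        if ready:
            received += os.read(read_fd, len(payload) - len(received))
    if received != payload:
        raise ValueError("loopback data mismatch")
    return time.monotonic() - start


def run(port, iterations=1000, size=64):
    """Repeat the loopback and report per-iteration and worst-case latency."""
    fd = open_raw(port)
    worst = 0.0
    try:
        for i in range(iterations):
            elapsed = roundtrip(fd, fd, os.urandom(size))
            worst = max(worst, elapsed)
            print(f"{i}: {elapsed * 1e3:.1f} ms (worst {worst * 1e3:.1f} ms)")
    finally:
        os.close(fd)
    return worst
```

For example, `run("/dev/ttyO1")` executed while cpu_scaling.sh is running prints the latency of each iteration and the worst case seen so far. `roundtrip()` takes separate write and read descriptors so that it can also be exercised on a pipe without any hardware.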

Could you help me understand why this 750 ms delay happens? Do you see the same behavior on your boards? Is there a link between the CPU clock and the UART module?

Environment:

  • AM3352 + WL1835MOD
  • TI Processor SDK 04.00.00.04
  • Bluetopia 4.2.1.0
  • TPS65217 PMIC

Thanks for your help.

-Julien

  • Hi Julien,

    I will try to replicate the issue on a BeagleBone Black first. I will keep you posted.
  • Julien,

    I was unable to complete the test on a BBB with the TI kernel ti2017.01 (the one in Processor SDK 4.0.0.4): serial_loopback.py always fails within a few hundred iterations, due to the assert failure at line 84.

    However, I can run the test successfully with TI kernel ti2017.06, although I have to uncomment line 82 to add a small delay. The log attached below shows that the max delay is only about 3x the normal delay; it doesn't show the 750 ms problem. I also attached the kernel config for your reference.

    Is it possible for you to test the Processor SDK v4.3.0.5 kernel?

    ti2017.06.log

    _config.gz

  • Hi Bin,

    Thanks for your help. You're right, on some boards you have to uncomment the sleep on line 82.

    Comparing your kernel config with ours, I noticed we had CONFIG_PREEMPT_NONE wrongly set. With the following configuration, the behavior is the same as on your board: the max delay in the UART loopback test is around 3x the normal delay (~20 ms).
    # CONFIG_PREEMPT_NONE is not set
    CONFIG_PREEMPT=y
    CONFIG_PREEMPT_COUNT=y
    CONFIG_HZ_250=y
    CONFIG_HZ=250
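Since the fix here came down to which preemption model the kernel was built with, a small parser that reports the model and HZ from a `.config` can make this kind of board-to-board comparison quicker. The same parser can also read the running kernel's `/proc/config.gz`, which assumes the kernel was built with `CONFIG_IKCONFIG_PROC=y`. This is a sketch, and the helper names are hypothetical.

```python
import gzip
import re

# Listed from least to most preemptible; exactly one should be "y".
PREEMPT_MODELS = ("CONFIG_PREEMPT_NONE", "CONFIG_PREEMPT_VOLUNTARY", "CONFIG_PREEMPT")

_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Map each CONFIG_ symbol to its value; '# X is not set' maps to 'n'."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        m = _SET.match(line)
        if m:
            options[m.group(1)] = m.group(2).strip('"')
            continue
        m = _UNSET.match(line)
        if m:
            options[m.group(1)] = "n"
    return options


def preemption_summary(options):
    """Return the selected preemption model and CONFIG_HZ (0 if missing)."""
    model = next((sym for sym in PREEMPT_MODELS if options.get(sym) == "y"), None)
    return {"model": model, "hz": int(options.get("CONFIG_HZ", "0"))}


def load_running_config(path="/proc/config.gz"):
    """Read the running kernel's config (needs CONFIG_IKCONFIG_PROC=y)."""
    with gzip.open(path, "rt") as f:
        return parse_config(f.read())
```

For the configuration above, `preemption_summary()` reports `CONFIG_PREEMPT` with HZ 250.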

    The problem is that we still have the BT UART issue. It seems fixing the low-level test is not enough to completely cure the real problem.

    When playing music over A2DP and running the cpu_scaling.sh script in parallel, we still observe (after around 10 min or less):
    - With HCILL enabled, when the issue happens, the link is broken.
    - With BT deep sleep and HCILL disabled, when the issue happens, the HCI link stops working for a few hundred milliseconds but then recovers.

    Do you have a board with a WL18xx to verify that the HCILL protocol works properly with a non-real-time kernel when the CPU frequency changes?

    Best regards,

    -Julien
  • Julien,

    While checking why the test fails on the SDK v4.0.0.4 kernel, I noticed that the 8250 driver in that kernel has a bug that is fixed in newer kernels. Can you please apply the following patch to see if it affects your BT use case?

    0001-serial-8250-8250_omap-Fix-race-b-w-dma-completion-an.patch.txt
    From 4279e80c27d88ba6ad360312282b8b69ccea4298 Mon Sep 17 00:00:00 2001
    From: Vignesh R <vigneshr@ti.com>
    Date: Fri, 9 Jun 2017 17:52:09 +0530
    Subject: [PATCH] serial: 8250: 8250_omap: Fix race b/w dma completion and RX
     timeout
    
    DMA RX completion handler for UART is called from a tasklet and hence
    may be delayed depending on the system load. In meanwhile, there may be
    RX timeout interrupt which can get serviced first before DMA RX
    completion handler is executed for the completed transfer.
    omap_8250_rx_dma_flush() which is called on RX timeout interrupt makes
    sure that the DMA RX buffer is pushed and then the FIFO is drained and
    also queues a new DMA request. But, when DMA RX completion handler
    executes, it will erroneously flush the currently queued DMA transfer
    which sometimes results in data corruption and double queueing of DMA RX
    requests.
    
    Fix this by checking whether RX completion is for the currently queued
    transfer or not. And also hold port lock when in DMA completion to avoid
    race wrt RX timeout handler preempting it.
    
    Signed-off-by: Vignesh R <vigneshr@ti.com>
    ---
     drivers/tty/serial/8250/8250_omap.c | 21 +++++++++++++++++++--
     1 file changed, 19 insertions(+), 2 deletions(-)
    
    diff --git a/drivers/tty/serial/8250/8250_omap.c b/drivers/tty/serial/8250/8250_omap.c
    index db70e6946951..c6bd5c37f6df 100644
    --- a/drivers/tty/serial/8250/8250_omap.c
    +++ b/drivers/tty/serial/8250/8250_omap.c
    @@ -786,8 +786,25 @@ static void __dma_rx_do_complete(struct uart_8250_port *p)
     
     static void __dma_rx_complete(void *param)
     {
    -	__dma_rx_do_complete(param);
    -	omap_8250_rx_dma(param);
    +	struct uart_8250_port *p = param;
    +	struct uart_8250_dma *dma = p->dma;
    +	unsigned long flags;
    +
    +	spin_lock_irqsave(&p->port.lock, flags);
    +
    +	/*
    +	 * If the completion is for the current cookie then handle it,
    +	 * else a previous RX timeout flush would have already pushed
    +	 * data from DMA buffers, so exit.
    +	 */
    +	if (dma->rx_cookie != dma->rxchan->completed_cookie) {
    +		spin_unlock_irqrestore(&p->port.lock, flags);
    +		return;
    +	}
    +	__dma_rx_do_complete(p);
    +	omap_8250_rx_dma(p);
    +
    +	spin_unlock_irqrestore(&p->port.lock, flags);
     }
     
     static void omap_8250_rx_dma_flush(struct uart_8250_port *p)
    -- 
    2.17.1
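The race described in the commit message can be illustrated with a toy Python model. This is not driver code. The class and method names are hypothetical, a `threading.Lock` stands in for `p->port.lock`, and integer cookies stand in for the dmaengine cookies compared in `__dma_rx_complete()`. The model shows what happens when a completion arrives late for a transfer that the RX timeout path has already flushed. With the cookie check, the late completion is ignored. Without it, the completion wrongly pushes and requeues whatever transfer is currently queued.

```python
import threading


class RxDmaModel:
    """Toy model of the UART RX DMA bookkeeping touched by the 8250_omap patch."""

    def __init__(self):
        self.lock = threading.Lock()  # stands in for p->port.lock
        self.next_cookie = 0
        self.rx_cookie = None         # cookie of the transfer currently queued
        self.queued = 0               # number of transfers queued so far
        self.pushed = []              # cookies whose buffers reached the tty layer
        self._queue_rx()

    def _queue_rx(self):
        # Caller holds self.lock (or is the constructor).
        self.next_cookie += 1
        self.rx_cookie = self.next_cookie
        self.queued += 1

    def rx_timeout_flush(self):
        """RX timeout IRQ: push the current buffer, then queue a fresh transfer."""
        with self.lock:
            self.pushed.append(self.rx_cookie)
            self._queue_rx()

    def dma_complete(self, completed_cookie, check_cookie=True):
        """Deferred (tasklet) completion for the transfer `completed_cookie`."""
        with self.lock:
            if check_cookie and completed_cookie != self.rx_cookie:
                return  # stale: the timeout path already pushed this data
            # Without the check, the transfer queued *now* is pushed early and
            # another one is queued: the double queueing described above.
            self.pushed.append(self.rx_cookie)
            self._queue_rx()
```

With the check, a timeout flush followed by a late `dma_complete(1)` leaves only transfer 1 pushed and transfer 2 still queued. With `check_cookie=False`, transfer 2 is pushed prematurely and a third transfer is queued.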
    
    

  • Bin,

    I applied the patch of the 8250 driver but it does not solve the issue. The behavior is the same.

    The HCILL protocol is implemented in userspace, isn't it? Do you think that could be a problem?
  • Julien,

    Sorry, I don't know much about the BT stack. I guess you will have to debug the issue from the userspace application down to the kernel to understand why the link is broken.
  • Julien,

    I recommend you first test with the latest Processor SDK v5.0, which has newer userspace libraries and kernel.
  • The problem is that I don't have access to the source code of the HCI/HCILL driver. It comes with the BT stack, as a binary provided by Bluetopia/TI.

    Do you have an internal contact who could help to debug this issue, or who could share this part of the code?

    In the meantime, I will try to upgrade to the latest SDK.

  • Julien,

    Have you tested on the latest SDK yet?
    I will try to figure out which team in TI is responsible for the BT stack.
  • Hi Bin,

    I haven't tested with the latest SDK yet; I have some tasks to complete before migrating to the new SDK.
    In the meantime, if you find out who could help debug the HCILL issue, I can help them reproduce it.

    Thanks.
  • Julien,

    Is the HCILL driver included in the Processor SDK, or is it a separate package you received from TI? If the latter, can you please share the download link?
  • Hi Bin,

    The HCILL driver is part of the Bluetopia stack that you can download at the following link, under "TI Bluetooth Linux Add-On for AM335x EVM, AM437x EVM and BeagleBone With WL18xx and CC256x".

    I guess the HCILL driver is in the libBTPS.a library:

    objdump -t lib/BluetopiaPM/Bluetopia/lib/libBTPS.a | grep HCILL
    HCILL.o:     file format elf32-little
    00000000 l    df *ABS*  00000000 HCILL.c
    00000000 l     O .rodata        00000004 HCILL_Message
    000000b0 g     F .text  00000098 HCILL_Initialize
    00000148 g     F .text  00000058 HCILL_Shutdown
    000001a0 g     F .text  00000064 HCILL_Reconfigure
    00000204 g     F .text  00000038 HCILL_Resynchronize
    0000023c g     F .text  0000011c HCILL_TransmitBytes
    00000358 g     F .text  0000013c HCILL_ReceiveBytes
    00000000 l     O .rodata        00000018 HCILL_Protocol_Information
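To list the HCILL entry points that the closed library exports, the `objdump -t` output can be filtered programmatically. This is a sketch that assumes GNU objdump's symbol-table layout: value, a 7-character flags column, section, size and name. The helper names are hypothetical.

```python
import re

# objdump -t line: VALUE FLAGS(7 chars) SECTION SIZE NAME
SYMBOL_RE = re.compile(
    r"^(?P<value>[0-9a-fA-F]+)\s(?P<flags>.{7})\s(?P<section>\S+)"
    r"\s+(?P<size>[0-9a-fA-F]+)\s+(?P<name>\S+)$"
)


def parse_symbols(text):
    """Return one dict per symbol-table line; other lines are skipped."""
    symbols = []
    for line in text.splitlines():
        m = SYMBOL_RE.match(line.strip())
        if m:
            sym = m.groupdict()
            sym["size"] = int(sym["size"], 16)
            symbols.append(sym)
    return symbols


def global_functions(text, prefix=""):
    """Map name -> size for global ('g') function ('F') symbols."""
    return {
        s["name"]: s["size"]
        for s in parse_symbols(text)
        if s["flags"][0] == "g" and s["flags"][6] == "F" and s["name"].startswith(prefix)
    }
```

To get the input text, you could pass it the output of `subprocess.run(["objdump", "-t", path], capture_output=True, text=True).stdout`. On the listing above, `global_functions(text, prefix="HCILL_")` would return the six global HCILL functions with their sizes.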


    Please note that we are not using Bluetopia PM (the daemon/client architecture); our application links the Bluetopia library directly.

     

  • I was wondering whether the AM33x is entering deep sleep or high-latency C-states during idle. Can you check which C-states are allowed in the system and try disabling them?

    echo 1 > /sys/devices/system/cpu/cpu0/cpuidle/state0/disable
    echo 1 > /sys/devices/system/cpu/cpu0/cpuidle/state1/disable
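The same check can be scripted across all CPUs and idle states. This is a sketch that assumes the standard cpuidle sysfs attributes (`name`, `latency` in microseconds, `disable`) and root privileges for writing. The latency threshold and helper names are hypothetical. If the kernel is built without `CONFIG_CPU_IDLE`, the `cpuidle` directories do not exist and `list_states()` simply returns an empty list.

```python
import glob
import os

CPUIDLE_GLOB = "/sys/devices/system/cpu/cpu*/cpuidle/state*"


def _read(path):
    with open(path) as f:
        return f.read().strip()


def list_states(pattern=CPUIDLE_GLOB):
    """Return (state_dir, name, exit_latency_us, disabled) for every idle state."""
    states = []
    for state_dir in sorted(glob.glob(pattern)):
        states.append((
            state_dir,
            _read(os.path.join(state_dir, "name")),
            int(_read(os.path.join(state_dir, "latency"))),
            _read(os.path.join(state_dir, "disable")) == "1",
        ))
    return states


def disable_states(min_latency_us=0, pattern=CPUIDLE_GLOB):
    """Write 1 to 'disable' for each enabled state with latency >= min_latency_us."""
    changed = []
    for state_dir, name, latency, disabled in list_states(pattern):
        if latency >= min_latency_us and not disabled:
            with open(os.path.join(state_dir, "disable"), "w") as f:
                f.write("1")
            changed.append(name)
    return changed
```

With the default threshold of 0, `disable_states()` disables every state. This is the scripted equivalent of the two `echo` commands above, extended to all CPUs.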


    Thanks
  • Hi Hari,

    Thanks for your suggestion, but in our system, cpuidle is disabled (see below).

    In the next few days, I will spend some time upgrading to SDK5 to see if it helps.

    #
    # CPU Frequency scaling
    #
    CONFIG_CPU_FREQ=y
    CONFIG_CPU_FREQ_GOV_ATTR_SET=y
    CONFIG_CPU_FREQ_GOV_COMMON=y
    CONFIG_CPU_FREQ_STAT=y
    CONFIG_CPU_FREQ_STAT_DETAILS=y
    # CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE is not set
    # CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE is not set
    CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE=y
    # CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND is not set
    # CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE is not set
    CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
    CONFIG_CPU_FREQ_GOV_POWERSAVE=y
    CONFIG_CPU_FREQ_GOV_USERSPACE=y
    CONFIG_CPU_FREQ_GOV_ONDEMAND=y
    CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y
    CONFIG_GENERIC_CPUFREQ_CPU0=y

    #
    # CPU frequency scaling drivers
    #
    CONFIG_CPUFREQ_DT=y
    CONFIG_CPUFREQ_DT_PLATDEV=y
    # CONFIG_ARM_KIRKWOOD_CPUFREQ is not set
    # CONFIG_ARM_OMAP2PLUS_CPUFREQ is not set
    CONFIG_ARM_TI_CPUFREQ=y
    # CONFIG_QORIQ_CPUFREQ is not set

    #
    # CPU Idle
    #
    # CONFIG_CPU_IDLE is not set
    # CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED is not set

    #
    # Kernel Features
    #
    # CONFIG_SMP is not set
    CONFIG_HAVE_ARM_ARCH_TIMER=y
    # CONFIG_VMSPLIT_3G is not set
    CONFIG_VMSPLIT_2G=y
    # CONFIG_VMSPLIT_1G is not set
    CONFIG_PAGE_OFFSET=0x80000000
    CONFIG_ARM_PSCI=y
    CONFIG_ARCH_NR_GPIO=0
    # CONFIG_PREEMPT_NONE is not set
    # CONFIG_PREEMPT_VOLUNTARY is not set
    CONFIG_PREEMPT=y
    CONFIG_PREEMPT_COUNT=y
    CONFIG_HZ_FIXED=0
    # CONFIG_HZ_100 is not set
    # CONFIG_HZ_200 is not set
    CONFIG_HZ_250=y
    # CONFIG_HZ_300 is not set
    # CONFIG_HZ_500 is not set
    # CONFIG_HZ_1000 is not set
    CONFIG_HZ=250
    CONFIG_SCHED_HRTICK=y
    # CONFIG_THUMB2_KERNEL is not set
    CONFIG_AEABI=y
    # CONFIG_OABI_COMPAT is not set
    CONFIG_ARCH_HAS_HOLES_MEMORYMODEL=y
    # CONFIG_ARCH_SPARSEMEM_DEFAULT is not set
    # CONFIG_ARCH_SELECT_MEMORY_MODEL is not set
    CONFIG_HAVE_ARCH_PFN_VALID=y
    CONFIG_HIGHMEM=y
    # CONFIG_HIGHPTE is not set
    # CONFIG_HW_PERF_EVENTS is not set
    CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
    CONFIG_FLATMEM=y
    CONFIG_FLAT_NODE_MEM_MAP=y
    CONFIG_HAVE_MEMBLOCK=y
    CONFIG_NO_BOOTMEM=y
    # CONFIG_HAVE_BOOTMEM_INFO_NODE is not set
    CONFIG_PAGEFLAGS_EXTENDED=y
    CONFIG_SPLIT_PTLOCK_CPUS=4
    # CONFIG_COMPACTION is not set
    # CONFIG_PHYS_ADDR_T_64BIT is not set
    CONFIG_ZONE_DMA_FLAG=0
    CONFIG_BOUNCE=y
    # CONFIG_KSM is not set
    CONFIG_DEFAULT_MMAP_MIN_ADDR=4096
    # CONFIG_CROSS_MEMORY_ATTACH is not set
    CONFIG_NEED_PER_CPU_KM=y
    # CONFIG_CLEANCACHE is not set
    # CONFIG_CMA is not set
    # CONFIG_ZBUD is not set
    # CONFIG_ZSMALLOC is not set
    CONFIG_FORCE_MAX_ZONEORDER=12
    CONFIG_ALIGNMENT_TRAP=y
    CONFIG_UACCESS_WITH_MEMCPY=y
    # CONFIG_SECCOMP is not set
    CONFIG_SWIOTLB=y
    CONFIG_IOMMU_HELPER=y
    # CONFIG_XEN is not set
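Since this configuration enables `CONFIG_CPU_FREQ_STAT`, the kernel exposes cpufreq statistics in sysfs. These could be used to timestamp frequency transitions and line them up against audio cuts or HCI logs. This is a sketch that assumes the `stats/time_in_state` and `stats/total_trans` files under `/sys/devices/system/cpu/cpu0/cpufreq`. According to the kernel's cpufreq-stats documentation, `time_in_state` residency is reported in 10 ms units. Polling can miss closely spaced transitions, so the timing is only approximate. The helper names and the 10 ms poll period are hypothetical.

```python
import os
import time

STATS_DIR = "/sys/devices/system/cpu/cpu0/cpufreq/stats"


def time_in_state(stats_dir=STATS_DIR):
    """Map frequency (kHz) -> residency in seconds (kernel reports 10 ms units)."""
    result = {}
    with open(os.path.join(stats_dir, "time_in_state")) as f:
        for line in f:
            parts = line.split()
            if len(parts) == 2:
                result[int(parts[0])] = int(parts[1]) / 100
    return result


def total_transitions(stats_dir=STATS_DIR):
    """Number of frequency transitions since boot (or since stats were reset)."""
    with open(os.path.join(stats_dir, "total_trans")) as f:
        return int(f.read())


def log_transitions(duration_s, poll_s=0.01, stats_dir=STATS_DIR):
    """Poll total_trans; return (monotonic_time, new_transitions) whenever it moves."""
    events = []
    last = total_transitions(stats_dir)
    end = time.monotonic() + duration_s
    while time.monotonic() < end:
        now = total_transitions(stats_dir)
        if now != last:
            events.append((time.monotonic(), now - last))
            last = now
        time.sleep(poll_s)
    return events
```

Running `log_transitions(600)` alongside A2DP playback gives a list of approximate transition times. These can then be compared with the moments the HCI link stalls.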