Linux/PROCESSOR-SDK-AM335X: Kernel 4.4 will sometimes hang on boot @ clocksource: jiffies

Part Number: PROCESSOR-SDK-AM335X

Tool/software: Linux

Hello,

We currently have a couple of boards based on the AM335x EVM, and we're now looking at moving to a newer kernel (4.4). We noticed that roughly 1 boot in 20 hangs, always at the same spot.

Below is where it hangs (at clocksource: jiffies):

[    0.000000] Booting Linux on physical CPU 0x0
[    0.000000] Linux version 4.4.12-svn100 (dev@aaa) (gcc version 4.8.3 (GCC) ) #3 Tue Feb 14 15:17:47 PST 2017
[    0.000000] CPU: ARMv7 Processor [413fc082] revision 2 (ARMv7), cr=10c5387d
[    0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
[    0.000000] Machine model: TI AM335x
[...]
[    0.112165] CPU: Testing write buffer coherency: ok
[    0.113980] Setting up static identity map for 0x80008200 - 0x80008258
[    0.122119] devtmpfs: initialized
[    0.172404] VFP support v0.3: implementor 41 architecture 3 part 30 variant c rev 3
[    0.237777] omap_hwmod: debugss: _wait_target_disable failed
^^^ last entry that shows up when it hangs

[    0.296301] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
^^^ this line no longer shows up

[    0.299314] pinctrl core: initialized pinctrl subsystem
[    0.306278] NET: Registered protocol family 16
[    0.314393] DMA: preallocated 256 KiB pool for atomic coherent allocations
[    0.331495] GPIO line 20 (ddr vtt enable) hogged as output/high
[...]

We have never encountered this issue with TISDK6's 3.2 kernel, so we're wondering if anyone has seen this before.

Much appreciated!

  • Hi,

    FYI, the 4.4.12 kernel is not an official TI kernel release. Could you share where you downloaded the sources from, so that I can take a look at them?

    Best Regards,
    Yordan
  • Hi Yordan,

    Thanks for looking into this. We used the kernel from 3.00.00.04 (software-dl.ti.com/.../software_manifest.htm) as it was our last SDK at the time. But now we've switched to the latest one (03.02.00.05), and also saw the same thing happen with 4.4.32.
  • Hi,

    Let me take a look at the kernel sources and I'll get back with my feedback.

    Best Regards,
    Yordan
  • Hi Yordan,

    Great, thanks for your help! Unfortunately, it hangs before anything is written to console (after uboot prints "Starting Kernel") and there's no kernel dump available, so that's all the info I can provide so far.

  • Hi,

    I think the issue is NOT in the clocksource:
    [ 0.296301] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
    This exact same line can be seen on devices that boot normally, e.g. the line below is from my BBB, running kernel 4.4.19:
    [ 0.169660] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns

    Your system actually hangs on:
    [ 0.331495] GPIO line 20 (ddr vtt enable) hogged as output/high

    which comes from gpiod_hog() in drivers/gpio/gpiolib.c:

        /* Mark GPIO as hogged so it can be identified and removed later */
        set_bit(FLAG_IS_HOGGED, &desc->flags);

        pr_info("GPIO line %d (%s) hogged as %s%s\n",
            desc_to_gpio(desc), name,
            (dflags&GPIOD_FLAGS_BIT_DIR_OUT) ? "output" : "input",
            (dflags&GPIOD_FLAGS_BIT_DIR_OUT) ?
              (dflags&GPIOD_FLAGS_BIT_DIR_VAL) ? "/high" : "/low":"");

    The line name in your case is "ddr vtt enable". Can you post your dts file and full boot log (u-boot output and kernel output up until it hangs)?

    Is this issue reproducible on just one board or on many boards? Did you review your schematic, especially the DDR connection (make sure you strictly follow the Data Manual guidelines)? I don't exclude a hardware problem.

    Best Regards,
    Yordan
  • Hi Yordan,

    Interesting, I had thought that since we couldn't see the clocksource jiffies line being printed, that was the problem area. We'll look into scoping the DDR lines and checking registers in u-boot and after successfully booting into Linux. However, as the board was able to boot fine with kernel 3.2, we didn't think it would be HW related.

    This issue has occurred on several boards. It's quite random (i.e. it only happens in ~1 of 20 boots), but always at the same spot.

    Attached is our boot log and dts file.

    4150.boot.txt
    6505.am335x-dev.dts.txt

  • Hi Yordan,

    After scoping the ddr_vtt_en line, we noticed that there is a brief drop to 0V. In older kernels, this didn't happen.
    We are now wondering if the whole GPIO bank is being re-initialized when the kernel starts or when the device tree is loaded.
    As such, is there any way to prevent this pin from being re-initialized (since it's already set up correctly in u-boot) before we pin mux it?

    We have also added the following pinmux:

    &am33xx_pinmux {
        pinctrl-names = "default";
        pinctrl-0 = <&ddr_vtt_pin &hw_id_pins &rst_button_pin &led_gpio_pins &hw_reboot_pins>;

        ddr_vtt_pin: ddr_vtt_pin {
            pinctrl-single,pins = <
                0x1B4 (PIN_OUTPUT_PULLUP | MUX_MODE7) /* xdma_event_intr1.gpio0_20 */
            >;
        };
    };

    Thanks!
  • Hi,

    As such, is there any way to prevent this pin from being re-init (since it's already init correctly in u-boot) before we pin mux it?


    If you've set this pin (associated with the DDR) correctly in u-boot, try removing it from the kernel side. As a general rule, DDR configuration is done only once, in u-boot, and you should not perform any DDR initialization in the Linux kernel.

    Best Regards,
    Yordan
  • Hi Yordan,

    Unfortunately we tried to leave it out in linux (i.e. have uboot configure it only) and it wouldn't work. We'd see the ddr_vtt_en voltage drop, and the system would hang at that point. That was how we ended up using the gpio-hog, to ensure it's always run. As we're having trouble determining what is causing the GPIO pin to be reset in the first place, do you have suggestions on where we can look to try to prevent this re-init?

    Thanks!

  • Hi, 

    Let me test it on my BBB.

    Best Regards, 

    Yordan

  • Hi Yordan,

    Thanks, were you able to test it?

  • Hi,

    Yes, I tested it with the following configuration:

    KERNEL PART:
    am335x-bone-common.dtsi:
    &am33xx_pinmux {
        pinctrl-names = "default";
        pinctrl-0 = <&gpio0_pins>;
        ..........
        .........
        gpio0_pins: gpio0_pins {
            pinctrl-single,pins = <
                0x1b4 (PIN_OUTPUT_PULLUP | MUX_MODE7) /* xdma_event_intr1.gpio0_20 (previously mode 3, clkout2) */
            >;
        };
    };

    U-BOOT PART:

    mux.c:

    + static struct module_pin_mux gpio0_20_pin_mux[] = {
    +     {OFFSET(xdma_event_intr1), (MODE(7) | PULLUP_EN)},
    +     {-1},
    + };

    + void enable_gpio0_20_pin_mux(void)
    + {
    +     configure_module_pin_mux(gpio0_20_pin_mux);
    + }

    void enable_board_pin_mux(void)
    {
        /* Do board-specific muxes. */
        if (board_is_bone()) {
            /* Beaglebone pinmux */
            configure_module_pin_mux(mii1_pin_mux);
            configure_module_pin_mux(mmc0_pin_mux);
    +       configure_module_pin_mux(gpio0_20_pin_mux);

    board.c:

    void set_uart_mux_conf(void)
    {
        /* addition for gpio */
    +   enable_gpio0_20_pin_mux(); /* added here to be certain that the gpio is enabled as early as the debug console (MLO stage) */

    And I got a consistent GPIO0_20 HIGH on the scope, with NO signal drop between u-boot and the kernel.

    When you designed your board, did you strictly follow the recommendations given in Section 7.7.2.3.3.1 DDR3 Interface Schematic of the AM335x Data Manual? Especially the VTT termination part.

    Best Regards,
    Yordan
  • Actually can you post the relevant part of the schematic?

    EDIT: Also post the full bootlog. 

    Best Regards,
    Yordan

  • Hi Yordan,

    Thanks for providing your diffs, as it pointed something out to us that we're validating now.

    Our u-boot originally set only PULLUDEN, not PULLUP_EN. We believe the GPIO was temporarily being switched to input during the handover from u-boot to Linux, and the pull-down was causing the blip. We've now set PULLUP_EN and no longer see the blip. I'm wondering if you can see something similar if you only set PULLUDEN in u-boot? We still need to determine where in the Linux code the pin is first initialized as input.

    I apologise as I thought I already included the boot log. I'll include it here for reference, but we'll be testing with PULLUP_EN on that line to see if it solves the issue.

    5228.boot.txt

  • Hi,

    I'm wondering if you can see something similar if you only set PULLUDEN in uboot?

    As expected, with only PULLUDEN set like below:

    static struct module_pin_mux gpio0_20_pin_mux[] = {
        {OFFSET(xdma_event_intr1), (MODE(7) | PULLUDEN)},
        {-1},
    };

    the internal pullup is NOT set in u-boot:

    => md 0x44E109b4

    44e109b4: 00000007 00000030 00000028 00000030    ....0...(...0... 

    As you can see, conf_xdma_event_intr1 is equal to 0x7, which means MUXMODE7 (GPIO0_20), PULLUD enabled (bit 3 = 0x0), PULLDOWN selected (bit 4 = 0x0).

    On the scope, gpio0_20 is low during u-boot and goes high when the kernel takes over.

    Best Regards, 
    Yordan

  • Hi Yordan,

    Awesome, thank you very much for checking. In our uboot, we also manually set it to output high in board.c:

            gpio_request(GPIO_DDR_VTT_EN, "ddr_vtt_en");
            gpio_direction_output(GPIO_DDR_VTT_EN, 1);

    This does indeed set it to output high. As a result, the overall sequence is: the line is high while in u-boot, drops when the kernel is loading, then goes high again when the device tree pin configuration is applied.

    So now we would like to determine where in the driver the pin is being reinit (we believe it's being reinit as input, then after that, device tree settings is used.) Would you be able to help provide some clues on where to look? We would like linux to not touch the pins at all if possible.

    Thanks!