Linux/AM5728: Kernel boot hangs

Part Number: AM5728
Other Parts Discussed in Thread: PMP, AM5726

Tool/software: Linux

Hi,

We are facing an issue where the boot hangs at "Starting kernel ...". It does not happen on every board: about 35% of the boards hang at that point, while the other 65% boot successfully and work fine.

We have verified that U-Boot successfully hands over to the kernel, but the kernel appears to hang somewhere after its entry point. No message is printed on the console after the kernel is loaded.

=====Logs========

initializing smps
PHY is initialized
OV798 Sensor reset
GPIO7_5 reset
GPIO7_4 reset
SCSI: SATA link 0 timeout.
AHCI 0001.0300 32 slots 1 ports 3 Gbps 0x1 impl SATA mode
flags: 64bit ncq stag pm led clo only pmp pio slum part ccc apst
scanning bus for devices...
Found 0 device(s).
Net: <ethaddr> not set. Validating first E-fuse MAC
cpsw
Hit any key to stop autoboot: 0
switch to partitions #0, OK
mmc0 is current device
SD/MMC found on device 0
reading boot.scr
reading uEnv.txt
switch to partitions #0, OK
mmc0 is current device
SD/MMC found on device 0
3601144 bytes read in 190 ms (18.1 MiB/s)
100756 bytes read in 19 ms (5.1 MiB/s)
Kernel image @ 0x82000000 [ 0x000000 - 0x36f2f8 ]
## Flattened Device Tree blob at 88000000
Booting using the fdt blob at 0x88000000
Loading Device Tree to 8ffe4000, end 8ffff993 ... OK

Starting kernel ...

  • Please post the Linux version you use and the complete boot log.
  • Linux Version: 4.4.32-gadde2ca9f8

    ==================Non Working Boot Logs======================
    U-Boot SPL 2016.05-00304-g323bf10-dirty (Dec 12 2017 - 15:00:36)
    G3Z UBL VER 1
    DRA752-GP ES2.0
    Trying to boot from MMC1
    reading u-boot.img
    reading u-boot.img
    U-Boot 2016.05-00304-g323bf10-dirty (Dec 12 2017 - 15:00:36 +0530)
    CPU : DRA752-GP ES2.0
    Board: AM5726 G3Z REV 2.0 VER 1
    I2C: ready
    DRAM: 2 GiB
    NAND: 1024 MiB
    MMC: OMAP SD/MMC: 0, OMAP SD/MMC: 1
    reading uboot.env

    Using default environment

    initializing smps
    PHY is initialized
    OV798 Sensor reset
    GPIO7_5 reset
    GPIO7_4 reset
    SCSI: SATA link 0 timeout.
    AHCI 0001.0300 32 slots 1 ports 3 Gbps 0x1 impl SATA mode
    flags: 64bit ncq stag pm led clo only pmp pio slum part ccc apst
    scanning bus for devices...
    Found 0 device(s).
    Net: <ethaddr> not set. Validating first E-fuse MAC
    cpsw
    Hit any key to stop autoboot: 0
    switch to partitions #0, OK
    mmc0 is current device
    SD/MMC found on device 0
    reading boot.scr
    reading uEnv.txt
    switch to partitions #0, OK
    mmc0 is current device
    SD/MMC found on device 0
    3601144 bytes read in 190 ms (18.1 MiB/s)
    100756 bytes read in 19 ms (5.1 MiB/s)
    Kernel image @ 0x82000000 [ 0x000000 - 0x36f2f8 ]
    ## Flattened Device Tree blob at 88000000
    Booting using the fdt blob at 0x88000000
    Loading Device Tree to 8ffe4000, end 8ffff993 ... OK

    Starting kernel ...

    ===========================================================
  • Hello Mahesh,

    You should first find out which hardware part of these 35% boards differs from the other 65% boards.

    Best regards,
    Kemal
  • There is no hardware difference between the working and non-working boards; all of them come from the same fabrication and assembly batch.
  • Can you enable low-level debugging and see whether any additional printk output appears? From make menuconfig, select Kernel hacking, then Kernel low-level debugging functions (read the help!), and then select your kernel low-level debugging port.

    Steve K.
  • I have enabled the low-level debug functions, but nothing is printed on the console when the kernel starts.
  • After disabling the PCIe interface in the DTS, the board boots. Do you have any idea why disabling the PCIe interface allows the device to boot?

    The following nodes have been disabled in dra7.dtsi:

    axi@0 {
        compatible = "simple-bus";
        #size-cells = <1>;
        #address-cells = <1>;
        ranges = <0x51000000 0x51000000 0x3000
                  0x0 0x20000000 0x10000000>;
        /**
         * To enable PCI endpoint mode, disable the pcie1_rc
         * node and enable pcie1_ep mode.
         */
        pcie1_rc: pcie_rc@51000000 {
            compatible = "ti,dra7-pcie";
            reg = <0x51000000 0x2000>, <0x51002000 0x14c>, <0x1000 0x2000>;
            reg-names = "rc_dbics", "ti_conf", "config";
            interrupts = <0 232 0x4>, <0 233 0x4>;
            #address-cells = <3>;
            #size-cells = <2>;
            device_type = "pci";
            ranges = <0x81000000 0 0 0x03000 0 0x00010000
                      0x82000000 0 0x20013000 0x13000 0 0xffed000>;
            #interrupt-cells = <1>;
            num-lanes = <1>;
            linux,pci-domain = <0>;
            ti,hwmods = "pcie1";
            phys = <&pcie1_phy>;
            phy-names = "pcie-phy0";
            interrupt-map-mask = <0 0 0 7>;
            interrupt-map = <0 0 0 1 &pcie1_intc 1>,
                            <0 0 0 2 &pcie1_intc 2>,
                            <0 0 0 3 &pcie1_intc 3>,
                            <0 0 0 4 &pcie1_intc 4>;
            status = "disabled";
            pcie1_intc: interrupt-controller {
                interrupt-controller;
                #address-cells = <0>;
                #interrupt-cells = <1>;
            };
        };

        pcie1_ep: pcie_ep@51000000 {
            compatible = "ti,dra7-pcie-ep";
            reg = <0x51000000 0x28>, <0x51002000 0x14c>, <0x51001000 0x28>, <0x1000 0x10000000>;
            reg-names = "ep_dbics", "ti_conf", "ep_dbics2", "addr_space";
            interrupts = <0 232 0x4>;
            num-lanes = <1>;
            num-ib-windows = <4>;
            num-ob-windows = <16>;
            ti,hwmods = "pcie1";
            phys = <&pcie1_phy>;
            phy-names = "pcie-phy0";
            syscon-legacy-mode = <&scm_conf1 0x14 2>;
            status = "disabled";
        };
    };

    axi@1 {
        compatible = "simple-bus";
        #size-cells = <1>;
        #address-cells = <1>;
        ranges = <0x51800000 0x51800000 0x3000
                  0x0 0x30000000 0x10000000>;
        /* status = "disabled"; */
        status = "disabled";
        pcie@51800000 {
            compatible = "ti,dra7-pcie";
            reg = <0x51800000 0x2000>, <0x51802000 0x14c>, <0x1000 0x2000>;
            reg-names = "rc_dbics", "ti_conf", "config";
            interrupts = <0 355 0x4>, <0 356 0x4>;
            #address-cells = <3>;
            #size-cells = <2>;
            device_type = "pci";
            ranges = <0x81000000 0 0 0x03000 0 0x00010000
                      0x82000000 0 0x30013000 0x13000 0 0xffed000>;
            #interrupt-cells = <1>;
            num-lanes = <1>;
            linux,pci-domain = <1>;
            ti,hwmods = "pcie2";
            phys = <&pcie2_phy>;
            phy-names = "pcie-phy0";
            interrupt-map-mask = <0 0 0 7>;
            interrupt-map = <0 0 0 1 &pcie2_intc 1>,
                            <0 0 0 2 &pcie2_intc 2>,
                            <0 0 0 3 &pcie2_intc 3>,
                            <0 0 0 4 &pcie2_intc 4>;
            pcie2_intc: interrupt-controller {
                interrupt-controller;
                #address-cells = <0>;
                #interrupt-cells = <1>;
            };
        };
    };

    At the U-Boot prompt I see the message below, which I do NOT see on any board except this one:
    SCSI: omap_pipe3_wait_lock: DPLL failed to lock
  • Hi Steve -

    Thanks for the support on this. I wanted to pass along some additional information from Mahesh's troubleshooting:

    After disabling the PCIe interface in the DTS, the board boots. The following changes were made in dra7.dtsi. Do you have any idea why disabling the PCIe interface allows the device to boot?

    These changes were suggested by TI to enable the PCIe interface for the Edgewater Wi-Fi modules.



    [These are the same axi@0/axi@1 PCIe node changes already quoted above.]

    At the U-Boot prompt I see the message below, which I do NOT see on any board except this one:

    SCSI: omap_pipe3_wait_lock: DPLL failed to lock

    Please let me know if you have any questions.

    Thank you!

  • Mahesh,
    Can you add some temporary prints to your kernel source? Specifically, let's see whether we are even getting into the kernel startup routine. In init/main.c, add a pr_debug() print at the start of the start_kernel() function:

    init/main.c

    asmlinkage __visible void __init start_kernel(void)
    {
        char *command_line;
        char *after_dashes;

        pr_debug("start_kernel()\n");

        /*
         * Need to run as early as possible, to initialize the
         * lockdep hash:
         */
        lockdep_init();
        set_task_stack_end_magic(&init_task);
        smp_setup_processor_id();
        debug_objects_early_init();
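A caveat that may explain an empty console at this step: in the 4.4 headers, pr_debug() expands to no_printk(), which only type-checks its arguments and never logs them, unless DEBUG is defined for that file or CONFIG_DYNAMIC_DEBUG is enabled. Even when it is emitted, a KERN_DEBUG (level 7) message reaches the console only if the console loglevel is above 7 (for example with the debug or loglevel=8 boot arguments), and if the kernel hangs before the serial console is registered, buffered messages are only visible through an early console (CONFIG_EARLY_PRINTK plus the earlyprintk boot argument). The user-space model below mimics the first two filters. mock_printk(), pr_debug_model() and pr_emerg_model() are hypothetical names for illustration, not kernel APIs, and the dynamic-debug path is left out.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical user-space model of two printk filters; not kernel code. */
#define LOGLEVEL_EMERG 0 /* KERN_EMERG */
#define LOGLEVEL_DEBUG 7 /* KERN_DEBUG */

/* Console loglevel: 7 is the 4.4 default; "loglevel=8" or "debug" raise it. */
static int console_loglevel = 7;

/* Last message that would have reached the console ("" if none yet). */
static char console_out[128];

static void mock_printk(int level, const char *msg)
{
    /* Only messages with level < console_loglevel are printed. */
    if (level < console_loglevel)
        snprintf(console_out, sizeof console_out, "%s", msg);
}

#ifdef DEBUG
/* DEBUG defined: pr_debug() becomes a real KERN_DEBUG printk. */
#define pr_debug_model(msg) mock_printk(LOGLEVEL_DEBUG, (msg))
#else
/* No DEBUG: like no_printk(), the call is type-checked but never runs. */
#define pr_debug_model(msg) \
    do { if (0) mock_printk(LOGLEVEL_DEBUG, (msg)); } while (0)
#endif

/* pr_emerg() is never compiled out and passes any console loglevel >= 1. */
#define pr_emerg_model(msg) mock_printk(LOGLEVEL_EMERG, (msg))
```

In this model, switching the temporary print to pr_emerg() (or printk(KERN_EMERG ...)) sidesteps both filters, which makes it a safer choice for a throwaway boot-hang print.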
  • I had already added the debug print in main.c as you suggested and also enabled low-level kernel debugging, but no print appeared on the console.

    I also verified this by adding a manual print in head-common.S just before the jump to start_kernel, and that message was printed on the console. The b __error_p line in the assembly below was added to confirm that execution reaches this point.

    __mmap_switched:
        adr     r3, __mmap_switched_data

        ldmia   r3!, {r4, r5, r6, r7}
        cmp     r4, r5                  @ Copy data segment if needed
    1:  cmpne   r5, r6
        ldrne   fp, [r4], #4
        strne   fp, [r5], #4
        bne     1b

        mov     fp, #0                  @ Clear BSS (and zero fp)
    1:  cmp     r6, r7
        strcc   fp, [r6], #4
        bcc     1b

     ARM(   ldmia   r3, {r4, r5, r6, r7, sp})
     THUMB( ldmia   r3, {r4, r5, r6, r7}    )
     THUMB( ldr     sp, [r3, #16]           )

        str     r9, [r4]                @ Save processor ID
        str     r1, [r5]                @ Save machine type
        str     r2, [r6]                @ Save atags pointer
        cmp     r7, #0
        strne   r0, [r7]                @ Save control register values

        b       __error_p               /* Added for debug only: confirms execution reaches this line */
        b       start_kernel
    ENDPROC(__mmap_switched)
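For readers less used to ARM assembly, the two loops at the top of __mmap_switched can be modelled in C as below. This is a hypothetical user-space sketch (mmap_switched_model() is not a kernel function), and the mapping of r4-r7 to __data_loc, _sdata, __bss_start and _end is assumed from the 4.4-era __mmap_switched_data table. It illustrates why reaching the added b __error_p means the data copy, the BSS clear and the register saves had all completed, leaving only the branch into start_kernel().

```c
#include <stdint.h>

/*
 * Hypothetical C model of the copy/clear loops in __mmap_switched.
 * Assumed register mapping (4.4-era head-common.S):
 *   r4 = __data_loc, r5 = _sdata, r6 = __bss_start, r7 = _end
 */
static void mmap_switched_model(const uint32_t *data_loc, uint32_t *sdata,
                                uint32_t *bss_start, uint32_t *bss_end)
{
    /* cmp r4, r5 / cmpne r5, r6: copy only if .data was not loaded at its
     * run address, one word at a time until _sdata reaches __bss_start. */
    if (data_loc != sdata) {
        while (sdata != bss_start)
            *sdata++ = *data_loc++;
    }

    /* cmp r6, r7 / strcc / bcc: zero BSS words while r6 < r7. */
    while (bss_start < bss_end)
        *bss_start++ = 0;
}
```

For a normal (non-XIP) zImage, __data_loc and _sdata are typically the same address, so only the BSS loop does any work.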

  • It doesn't make a lot of sense that it would behave like that. Did you have the pr_debug() in start_kernel() without the b __error_p jump in __mmap_switched?

    Do you have the ability to connect to the target using JTAG?
  • Yes; without the b __error_p jump, I did not see any pr_debug() print.

    Yes, we can connect to the target over JTAG, but the board only has test points (TP) for JTAG, so connecting it would require a lot of rework.
  • Okay, let's hold off on JTAG for now. I was hoping we could use a memory window to validate that the kernel image is indeed loaded into memory as expected. We might be able to do the same thing from U-Boot.

    Can you stop in U-Boot, manually load the kernel image, and then dump the memory contents starting at the kernel start function? I believe you can get the address from System.map. Do this on a good board and on a bad board and compare the results. You may want to do the same for the DTB file.

    BTW, have you validated that the kernel and DTB are flashed correctly in eMMC? Have you tried completely reflashing?
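A checksum is a compact way to compare what is in RAM across boards instead of diffing raw md dumps. Assuming the U-Boot build includes the crc32 command (CONFIG_CMD_CRC32), running crc32 0x82000000 0x36f2f8 right after the kernel is loaded (0x36f2f8 is the 3601144-byte image size from the log) and crc32 0x88000000 0x18994 for the 100756-byte DTB, before booting, should give values that can be compared between good and bad boards and against a host-side CRC of the same files. As far as I know, U-Boot's crc32 uses the standard zlib-compatible CRC-32, which the sketch below computes; crc32_update() and crc32_file() are hypothetical helper names.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Standard reflected CRC-32 (polynomial 0xEDB88320, initial value and final
 * XOR 0xFFFFFFFF), chainable like zlib's crc32(): start with crc = 0. */
static uint32_t crc32_update(uint32_t crc, const uint8_t *buf, size_t len)
{
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* CRC-32 of a whole file (e.g. the zImage or .dtb); returns 0 on success. */
static int crc32_file(const char *path, uint32_t *out)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    uint8_t buf[4096];
    uint32_t crc = 0;
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, f)) > 0)
        crc = crc32_update(crc, buf, n);

    int err = ferror(f);
    fclose(f);
    if (err)
        return -1;
    *out = crc;
    return 0;
}
```

On the host, a small main() that calls crc32_file() and prints the result with printf("%08x\n", crc) gives the reference value. A mismatch on a bad board would point at storage or loading; matching CRCs on good and bad boards would point away from the image itself.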
  • Do all of the 35% of the bad boards print this:

    SCSI: omap_pipe3_wait_lock: DPLL failed to lock

    at the U-Boot prompt? If so, this seems to be what needs to be tracked down.

    Do any of the good boards print this?
  • Not all of them; some of the boards show this error at the U-Boot prompt and some do not.
  • Hi, Mahesh,

    I checked the PCIe configuration in the dtsi file and it is identical to TI's configuration. Would the kernel boot if you keep the PCIe configuration in the dtsi file but disable it in the dra7-evm.dts file (I assume you use the DRA7 EVM, based on the U-Boot logs)?

    &pcie1_rc {
    - status = "okay";
    + status = "disabled";
    };

    Also, do the boards that fail to boot print the "SCSI: omap_pipe3_wait_lock: DPLL failed to lock" message, or is it irrelevant?

    Rex
  • Yes, we are using the DRA7 EVM dts file, but it is not identical to the one in the default SDK; we added some changes for our custom board.

    Not all of the boards get the "SCSI: omap_pipe3_wait_lock: DPLL failed to lock" error; only some of them do.
  • The error message you see is from SATA. Do your custom boards have SATA?

    Steve K.
  • Yes, it comes from SATA at the U-Boot prompt, and we don't use SATA on our custom board.

  • Since you do not use SATA, can you disable it in include/configs/am57xx_evm.h? That should make the message go away.

    Steve K.
  • Yes, I disabled SATA in U-Boot and the message no longer appears at the U-Boot prompt.
  • Hi, Mahesh,

    I reviewed your 4 changes. After the kernel boots with the changes (except #1), does PCIe still work? Removing sys_clkin1 and changing phy-cells shouldn't make any difference; as far as I know, phy-cells is not used by the PCIe driver.

    If you use any of the modified images, does the kernel still boot on the working boards?

    Rex
  • PCIe does not work on the working boards when the modified changes are applied.

    We found that the processor was not getting the proper 1.8 V supply on VDDA_PCIE0/VDDA_PCIE1/VDDA_PCIE for PCIe.

    It was an assembly issue: FB18 was not properly assembled on the board.
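Because the root cause was a missing supply rather than software, a simple rail check during board bring-up or production test could catch this class of assembly fault before any boot debugging. Below is a minimal sketch, assuming the VDDA_PCIE* rails are measured (by multimeter or an on-board ADC) and entered in millivolts. struct rail, rail_ok() and count_bad_rails() are hypothetical, and the 5% tolerance used in the example is a placeholder, not an AM572x specification; the real limits belong to the device data manual.

```c
#include <stdbool.h>
#include <stddef.h>

/* One measured supply rail (hypothetical test-fixture record). */
struct rail {
    const char *name;
    int nominal_mv;  /* e.g. 1800 for a 1.8 V rail */
    int measured_mv; /* value read by the fixture */
};

/* True if the measurement is within +/- tol_pct percent of nominal.
 * tol_pct is a placeholder; use the data manual limits in practice. */
static bool rail_ok(const struct rail *r, int tol_pct)
{
    int margin = r->nominal_mv * tol_pct / 100;
    return r->measured_mv >= r->nominal_mv - margin &&
           r->measured_mv <= r->nominal_mv + margin;
}

/* Number of rails outside tolerance; non-zero means the board is flagged. */
static size_t count_bad_rails(const struct rail *rails, size_t n, int tol_pct)
{
    size_t bad = 0;
    for (size_t i = 0; i < n; i++)
        if (!rail_ok(&rails[i], tol_pct))
            bad++;
    return bad;
}
```

A board with an unpopulated supply component would typically show a rail near 0 mV and be flagged immediately.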