[SOLVED] AM3352 linux 3.12 (SDK 7.0) random hangs

Other Parts Discussed in Thread: AM3352, AM3517, AM3359

Hi all, 

I am testing the new 3.12 kernel from the SDK 7.0 on a custom AM3352 board. 

This board has the following features:

- AM3352ZCZ60 (revision 1.0)
- SPI flash for boot
- micro SD card on MMC0 port
- 512MB NAND flash on GPMC
- DP83848 Ethernet PHY
- 2 USB ports
- TPS65910A PMIC

The hardware is validated: it has been running the 3.2 kernel from SDK 6.0 for a while, but we need to upgrade to the 3.12 kernel with device tree support.

The problem is that the board is very unstable with the new kernel. 

The behavior is really random. We have a kernel that can boot, but sometimes it hangs during the boot process: sometimes very early (just after the "starting kernel.." message), sometimes when mounting the rootfs. Sometimes it boots completely and we can log on and run some commands, but it freezes after a few minutes (usually less than 10 min).

The issue is never the same. I am pretty sure that this is not a SW bug.

I do not expect a ready-made solution, but has anyone seen the same kind of issue with the new 3.12 kernel? Does anyone have a suggestion on what to do or where to look?

Here is my last idea:

Could it be an OPP issue? I tried to work out from the kernel and the device tree how the OPP is configured, but I have not figured it out yet. Could these random hangs be caused by bad OPP settings?

  • Hi Sylvain,

    What type of SDRAM do you use? Have you checked the settings in the new Linux you use?

  • Hi, 

    We are using 256MB DDR3. 

    We followed the tuning procedure and put the data into u-boot. Is there another configuration in the kernel?

    As I said, the board has been running for weeks with the 3.2 kernel, with the same u-boot and therefore the same DDR3 settings.

    Some new information:

    I was able to test some OPPs.

    The settings were not optimal: 600MHz with VDD MPU at 1.1V, whereas this is a revision 1.0 part, for which the OPP is 600MHz at 1.2V.

    I changed the sources to set 1.2V, but it had no effect...

    Then I tried reducing the frequency to 500MHz... no better...

    Now I am testing at 275MHz... I will update soon.

    During these tests, the only things running are a heartbeat LED and a script that prints the uptime every 5 seconds on the console.

  • The test at 275MHz still crashes.

    I had a panic after 5 min, starting with:

    pagealloc : memory corruption. 

    As I described, the issue is never the same, but sometimes I get panic messages related to page allocation, such as "Unable to handle kernel paging request..."

  • Hi All, 

    After discussion, we think that our issue is really due to memory corruption. We identified 4 types of failure:

    1. System freeze without any error message during boot, after boot, logged on.
    2. Kernel panic after boot with message Unable to handle kernel paging request
    3. System freeze with last message : pagealloc : Memory corruption
    4. Panic due to NULL pointer

    We checked the DDR configuration by running u-boot mtest overnight, and it was still running in the morning. It was run over approximately 200MB of the 256MB DDR.

    In addition, u-boot never hangs, and it can load the 3.2 kernel and run it without any issue. So we can consider that the DDR3 memory is correctly configured.

    So what could cause this memory issue? We need someone's help here...

    1. Bad MMU configuration?
    2. Modification of the DDR configuration by the kernel?
      1. Voluntarily?
      2. By a buffer overflow?
    3. Bad DDR power supply?
      1. We will check this
      2. Why would it change after kernel loading?

    Once again, our processor is an AM3352ZCZ60 (revision 1.0, 600MHz).

  • I'm running 3.14 (linux-omap repo) and so far the system is stable, aside from WLAN crashes like this:

    # calibrator wlan0 plt power_mode on

    [ 31.586550] wlcore: power up
    [ 32.167762] wlcore: firmware booted in PLT mode PLT_ON (PLT 6.3.10.0.133)
    # [ 82.175488] wlcore: ERROR Tx stuck (in FW) for 5000 ms. Starting recovery
    [ 82.182983] ------------[ cut here ]------------
    [ 82.187953] WARNING: CPU: 0 PID: 12 at drivers/net/wireless/ti/wlcore/main.c:789 wl12xx_queue_recovery_work+0x60/0x6c()
    [ 82.199408] Modules linked in:
    [ 82.202768] CPU: 0 PID: 12 Comm: kworker/u2:1 Not tainted 3.14.0-rc4-12738-g674748b-dirty #241
    [ 82.211865] Workqueue: phy0 wl12xx_tx_watchdog_work
    [ 82.217196] [<c00151e0>] (unwind_backtrace) from [<c0011e38>] (show_stack+0x10/0x14)
    [ 82.225487] [<c0011e38>] (show_stack) from [<c05c5688>] (dump_stack+0x7c/0x94)
    [ 82.233224] [<c05c5688>] (dump_stack) from [<c0040274>] (warn_slowpath_common+0x6c/0x90)
    [ 82.241763] [<c0040274>] (warn_slowpath_common) from [<c0040334>] (warn_slowpath_null+0x1c/0x24)
    [ 82.251125] [<c0040334>] (warn_slowpath_null) from [<c038ddd8>] (wl12xx_queue_recovery_work+0x60/0x6c)
    [ 82.261047] [<c038ddd8>] (wl12xx_queue_recovery_work) from [<c038def0>] (wl12xx_tx_watchdog_work+0x10c/0x140)
    [ 82.271603] [<c038def0>] (wl12xx_tx_watchdog_work) from [<c0058fac>] (process_one_work+0x1ac/0x4c4)
    [ 82.281236] [<c0058fac>] (process_one_work) from [<c0059e40>] (worker_thread+0x114/0x3b4)
    [ 82.289961] [<c0059e40>] (worker_thread) from [<c005f9fc>] (kthread+0xcc/0xe8)
    [ 82.297684] [<c005f9fc>] (kthread) from [<c000e388>] (ret_from_fork+0x14/0x2c)
    [ 82.305378] ---[ end trace 8c7ebca2e4a9c209 ]---

    Just a guess: have you enabled all of the applicable errata workarounds in the kernel? I had a similar issue with an AM3517 and one of the latest kernels, where I had not activated some of these errata workarounds and got arbitrary memory crashes that I did not have with 2.6.37.

  • Thanks for your suggestion, but I found the issue!

    It was not easy to find... 

    Here is the story... 

    Our board is a "copy" of the AM3359 starter kit, at least for the power and DDR3 parts. We used SDK 6.0 and kernel 3.2, and it was working great.

    The issue first appeared when we tried the 3.8 kernel from the BeagleBone repo. We thought it was due to the BeagleBone optimizations and did not investigate further. We were waiting for SDK 7.0, but the issue was still there with SDK 7.0...

    After testing a lot of kernel configurations, we went back this morning to hardware measurements... and there we found the surprise! The DDR3 termination regulator enable (DDR_VTT_EN) was falling to 0 just after kernel start!

    We were surprised because this had always worked before, and we thought that if this pin was set to 0, the DDR3 would fail immediately, not with this random behavior.

    Then we had to understand why... and we were a bit disappointed by the starter kit's method of controlling this pin...

    This pin is first configured by u-boot to enable the DDR3. Then, in the kernel device tree, a pinmux section is dedicated to this pin:

    /* Pinmux only: pad at offset 0x164 set to mux mode 7 (GPIO0_7) */
    ddr3_vtt_toggle: ddr3_vtt_toggle {
      pinctrl-single,pins = <
        0x164 0x7
      >;
    };

    We also have this section in our dts, again copied from the starter kit.

    But we removed 3 lines that were in the starter kit dts and that we thought were useless for us:

    /* Keep the GPIO0 state left by u-boot: do not reset the bank at kernel init */
    &gpio0 {
      ti,no-reset;
    };

    In fact, the first section only configures the multiplexing to GPIO mode. It does not configure the pin as an output at high level. So instead of configuring the pin as output high, the TI configuration just says "don't touch GPIO0" and keeps the configuration that was set in u-boot...

    Is there another method to explicitly configure the GPIO0_7 pin as output high in the device tree?

    We were very disappointed by this issue and the solution... We think that at least a comment in the starter kit dts is necessary.
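
    One possible way to describe the pin explicitly, as an untested sketch only, would be a fixed regulator node that drives GPIO0_7 high and is marked always-on. The node name, label, regulator name and voltage values below are assumptions for illustration and are not taken from our dts. Note that this would probably not replace `ti,no-reset`: if the GPIO0 bank is reset at kernel init, the pin could still drop briefly before the regulator driver takes control of it.

    ```dts
    / {
      /* Hypothetical node: name, label and voltages are illustrative only */
      vtt_fixed: fixedregulator-vtt {
        compatible = "regulator-fixed";
        regulator-name = "vtt";
        regulator-min-microvolt = <1500000>;  /* assumed value */
        regulator-max-microvolt = <1500000>;  /* assumed value */
        gpio = <&gpio0 7 0>;                  /* DDR_VTT_EN on GPIO0_7 */
        enable-active-high;                   /* enabled when the pin is high */
        regulator-always-on;                  /* never switch it off */
        regulator-boot-on;                    /* already enabled by u-boot */
      };
    };
    ```

    With such a node, the intent (DDR_VTT_EN must stay high) is visible in the dts itself instead of relying only on the state left by u-boot.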

  • Hi Sylvain,

    Great news, and thanks for sharing your results on the forum! This SDK is new to all of us, and all shared information helps a lot. Thanks again!

  • Hi Sylvain,

    I am also facing the same issue with Linux 3.12, but I am using DDR2 (512MB). It works with Linux 3.2, but with 3.12 there are random hangs.

    What is the reason? I couldn't figure out the issue.

    Could you tell me how you fixed your issue?

    Regards,

    Anil