This thread has been locked.

If you have a related question, please click the "Ask a related question" button in the top right corner. The newly created question will be automatically linked to this question.

Linux/AM4376: Linux 4.4 boot failures, potential QSPI JFFS2

Part Number: AM4376


Tool/software: Linux

Hi,

we need to post an issue we are seeing with Linux 4.4 kernel on a customer target HW that has been running previous Linux 4.1 from previous SDKs without these issues. The suspicion is that it may have to do with QSPI NOR JFFS2. Any pointers are greatly appreciated.

With the Linux Kernel 4.4 we are running, we notice that there are occasional boot failures which manifest in the form of "Init" process not starting or a kernel crash with a stack dump.

 

While we are trying to analyze the reason for the failures, we notice a suspicious message coming from the ti-qspi driver that appears to result in a possible jffs2 read failure.

For background, we use jffs2 filesystem for our QSPI NOR. We see these issues on both the GP and HS based boards. Our kernel is 4.4.y from Krogoth branch.

 

On an average we see about 25 "ti-qspi 47900000.qspi: DMA wait_for_completion_timeout" messages for every 1000 cold boot cycles. Of this the Init failure is about 4 to 5 times.

 

This message from the ti-qspi also seems to appear on runs from the 4.1.x fido branch. However, it does not result in any lockups of crashes over 2000 cold cycle boot.

 

Suspicious message from the ti-qspi driver (init failed to start in this case):

[    1.772762] omap_i2c 4802a000.i2c: bus 1 rev0.12 at 100 kHz

[    1.780375] hctosys: unable to open rtc device (rtc0)

[    1.793443] Freeing unused kernel memory: 2004K (c07ef000 - c09e4000)

[    1.799882] This architecture does not have kernel memory protection.

[    8.132448] random: nonblocking pool is initialized

[    8.562683] jffs2: notice: (195) jffs2_build_xattr_subsystem: complete building xattr subsystem, 0 of xdatum (0 unchecked, 0 orphan) and 0 of xref (0 dead, 0 orphan) found.

[    8.606267] EXT4-fs (mmcblk0p4): couldn't mount as ext3 due to feature incompatibilities

[    8.616046] EXT4-fs (mmcblk0p4): couldn't mount as ext2 due to feature incompatibilities

[    8.659588] EXT4-fs (mmcblk0p4): recovery complete

[    8.665091] EXT4-fs (mmcblk0p4): mounted filesystem with ordered data mode. Opts: (null)

mount: mounting /dev/mtdblock5 on /rootfs failed: Device or resource busy

[    8.861456] ti-qspi 47900000.qspi: DMA wait_for_completion_timeout

[    8.867982] jffs2: error: (1) read_direntry: read remainder of name: error -110

[    8.875656] jffs2: error: (1) jffs2_do_read_inode_internal: cannot read nodes for ino 13, returned error is -5

[    8.885788] jffs2: iget() failed for ino #13

switch_root: can't execute '/sbin/init': Input/output error

[    8.912758] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100

[    8.912758]

[    8.921893] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100

[    8.921893]

 

 

Suspicious message from the ti-qspi driver (normal init start in this case):

[    1.642776] omap_i2c 4802a000.i2c: bus 1 rev0.12 at 100 kHz

[    1.650388] hctosys: unable to open rtc device (rtc0)

[    7.589789] random: nonblocking pool is initialized

[    8.210623] jffs2: notice: (1) jffs2_build_xattr_subsystem: complete building xattr subsystem, 0 of xdatum (0 unchecked, 0 orphan) and 0 of xref (0 dead, 0 orphan) found.

[    8.229641] VFS: Mounted root (jffs2 filesystem) on device 31:5.

[    8.236097] devtmpfs: mounted

[    8.239535] Freeing unused kernel memory: 260K (c080b000 - c084c000)

[    8.245909] This architecture does not have kernel memory protection.

INIT: version 2.88 booting

[    9.343013] EXT4-fs (mmcblk0p3): recovery complete

[    9.348311] EXT4-fs (mmcblk0p3): mounted filesystem with ordered data mode. Opts: (null)

[    9.361185] EXT4-fs (mmcblk0p4): recovery complete

[    9.366126] EXT4-fs (mmcblk0p4): mounted filesystem with ordered data mode. Opts: (null)

Starting udev

[    9.461479] ti-qspi 47900000.qspi: DMA wait_for_completion_timeout

[    9.467681] jffs2: error: (83) read_direntry: read remainder of name: error -110

[    9.475239] jffs2: error: (83) jffs2_do_read_inode_internal: cannot read nodes for ino 810, returned error is -5

[    9.485656] jffs2: iget() failed for ino #810

[    9.881995] udevd[97]: starting version 3.1.5

[   10.236832] omap_rtc 44e3e000.rtc: already running

[   10.296200] omap_rtc 44e3e000.rtc: rtc core: registered 44e3e000.rtc as rtc0

[   10.324644] omap_wdt: OMAP Watchdog Timer Rev 0x01: initial timeout 60 sec

 

Comparable boot with no such message:

[    1.642787] omap_i2c 4802a000.i2c: bus 1 rev0.12 at 100 kHz

[    1.650397] hctosys: unable to open rtc device (rtc0)

[    7.598711] random: nonblocking pool is initialized

[    8.067240] jffs2: notice: (1) jffs2_build_xattr_subsystem: complete building xattr subsystem, 0 of xdatum (0 unchecked, 0 orphan) and 0 of xref (0 dead, 0 orphan) found.

[    8.085259] VFS: Mounted root (jffs2 filesystem) on device 31:5.

[    8.091747] devtmpfs: mounted

[    8.095196] Freeing unused kernel memory: 260K (c080b000 - c084c000)

[    8.101577] This architecture does not have kernel memory protection.

INIT: version 2.88 booting

[    9.163412] EXT4-fs (mmcblk0p3): recovery complete

[    9.168675] EXT4-fs (mmcblk0p3): mounted filesystem with ordered data mode. Opts: (null)

[    9.182149] EXT4-fs (mmcblk0p4): recovery complete

[    9.186962] EXT4-fs (mmcblk0p4): mounted filesystem with ordered data mode. Opts: (null)

Starting udev

udev: Not using udev cache because of changes detected in the following files:

udev:     /proc/version /proc/cmdline /proc/devices

udev:     lib/udev/rules.d/* etc/udev/rules.d/*

udev: The udev cache will be regenerated. To identify the detected changes,

udev: compare the cached sysconf at   /etc/udev/cache.data

udev: against the current sysconf at  /dev/shm/udev.cache

[    9.549253] udevd[96]: starting version 3.1.5

[    9.934537] omap_rtc 44e3e000.rtc: already running

[    9.967047] omap_rtc 44e3e000.rtc: rtc core: registered 44e3e000.rtc as rtc0

[   10.025894] omap_wdt: OMAP Watchdog Timer Rev 0x01: initial timeout 60 sec

[   10.093914] omap_rng 48310000.rng: OMAP Random Number Generator ver. 20

 

The Kernel crash also appears to have signatures indicating jffs2 operation in progress (from backtrace). Please see attached crash_log.txt.

 

On an initial review, we noticed a couple of checkins related to enabling DMA and MMAP reads for QSPI in the krogoth branch for spi-ti-qspi.c

It would be very helpful if someone can take a quick look at this issue and give us some pointers on how to debug this.

 

Have these messages appeared in SDK system testing?

 

 

Thanks,

--Gunter

7563.bootissue.zip

 

 

  • Hi Gunter,

    Could you check whether the same booting issue appears with the last PDK v04.00 with Linux kernel 4.9.28 ?

    BR
    Tsvetolin Shulev
  • Gunter,

    After some investigation I found where the DMA timeouts. Seem there is a timing issue in the "qspi_dma_transfer" function of ./drivers/spi/spi-ti-qspi.c file. Waiting period of "wait_for_completion_timeout" expires and this causing the error message "DMA wait_for_completion_timeout". This means the task is marked as TASK_UNINTERRUPTIBLE and will wait for some period of time (in jiffies).
    I suggest you to increase or multiply the period only for experiment to check whether the issue appearing will be changed.
    For example change

    ret = wait_for_completion_timeout(&qspi->transfer_complete,
        msecs_to_jiffies(len));


    with

    ret = wait_for_completion_timeout(&qspi->transfer_complete,
        2 * msecs_to_jiffies(len));

    or use higher multiply factor (4 for example).

    BR
    Tsvetolin Shulev