DM3730: Kernel 5.10 : MTD partitions mount, ubi doesn't consistently

Nicole Baldy

Part Number: DM3730

Hi,

I upgraded our Linux kernel and root filesystem, and in general the new versions work well, but there is one particularly pesky issue that I haven't been able to get past.

Roughly four boots out of five, the board boots correctly, but every once in a while I get a timeout when mounting the root filesystem (RFS). When that happens, the MTD partitions still look okay:

[ 8.376464] nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xb3

[ 8.384735] nand: Micron MT29X

[ 8.390075] nand: 1024 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64

[ 8.398345] omap2-nand 20000000.nand: using prefetch polled xfer mode

[ 8.405151] omap2-nand 20000000.nand: nand: using OMAP_ECC_HAM1_CODE_HW

[ 8.413940] 18 fixed-partitions partitions found on MTD device omap2-nand.0

[ 8.421630] Creating 18 MTD partitions on "omap2-nand.0":

[ 8.427368] 0x000000000000-A : "MLO"

[ 8.473144] A-B : "U-Boot"

[ 8.514190] B-C : "Environment"

[ 8.553131] C-D : "Kernel 1"

...

[ 8.708160] J-K : "Filesystem 1"

...

The boot process looks identical, up to the point of

[ 11.992401] ubi0: attaching mtd8

Where I see:

[ 12.112304] ubi0 error: ubi_io_is_bad: error -110 while checking if PEB 149 is bad

[ 12.130645] ubi0 error: ubi_attach_mtd_dev: failed to attach mtd8, error -110

[ 12.139129] UBI error: cannot attach mtd8

This occurs before the "rootwait/rootdelay" options come into play, because I see the following (I have rootwait and rootdelay=1):

[ 12.219329] Waiting 1 sec before mounting root device...

[ 13.298339] VFS: Cannot open root device "ubi0:compu-tdi7-rootfs" or unknown-block(0,0): error -19

[ 13.308349] Please append a correct "root=" boot option; here are the available partitions:

[ 13.317321] 0100 16384 ram0

[ 13.317352] (driver?)

[ 13.324310] 0101 16384 ram1

[ 13.324340] (driver?)

...

[ 13.346008] 1f00 512 mtdblock0

[ 13.346038] (driver?)

[ 13.353210] 1f01 1792 mtdblock1

[ 13.353271] (driver?)

[ 13.360473] 1f02 128 mtdblock2

[ 13.360504] (driver?)

...

[ 13.467224] (driver?)

[ 13.474395] 1f11 49152 mtdblock17

...

This does seem to be tied to something in my kernel: when I fall back to our older 3.19 kernel, it doesn't happen.

These related defconfigs are enabled for ti-staging-linux 5.10:

CONFIG_MTD_NAND_CORE=y

CONFIG_MTD_ONENAND=y

CONFIG_MTD_ONENAND_VERIFY_WRITE=y

CONFIG_MTD_ONENAND_OMAP2=y

CONFIG_MTD_NAND_ECC_SW_HAMMING=y

CONFIG_MTD_NAND_ECC_SW_HAMMING_SMC=y

CONFIG_MTD_RAW_NAND=y

CONFIG_MTD_NAND_OMAP2=y

&gpmc {

nand@0,0 {

compatible = "ti,omap2-nand";

reg = <0 0 4>;

interrupt-parent = <&gpmc>;

interrupts = <0 IRQ_TYPE_NONE>, /* fifoevent */

<1 IRQ_TYPE_NONE>; /* termcount */

linux,mtd-name= "micron,mt29X";

nand-bus-width = <16>;

ti,nand-ecc-opt = "ham1";

gpmc,device-width = <2>;

gpmc,cs-on-ns = <0>;

gpmc,cs-rd-off-ns = <36>;

gpmc,cs-wr-off-ns = <36>;

gpmc,adv-on-ns = <6>;

gpmc,adv-rd-off-ns = <24>;

gpmc,adv-wr-off-ns = <36>;

gpmc,oe-on-ns = <6>;

gpmc,oe-off-ns = <48>;

gpmc,we-on-ns = <6>;

gpmc,we-off-ns = <30>;

gpmc,rd-cycle-ns = <72>;

gpmc,wr-cycle-ns = <72>;

gpmc,access-ns = <54>;

gpmc,wr-access-ns = <30>;

#address-cells = <1>;

#size-cells = <1>;

/* MTD partition table */

/* Partitions size minimal length

* which can be independently programmed is a block.

* Erase is block based.

* For this NAND flash this is equal to 128K */

partition@0 {

label = "MLO";

reg = <0 A>;

};

partition@1 {

label = "U-Boot";

reg = <A B-A>;

};

partition@2 {

label = "Environment";

reg = <B C-B>;

};

partition@3 {

label = "Kernel 1";

reg = <C D-C>;

};

...

partition@7 {

label = "Filesystem 1";

reg = <G H-G>;

};

partition@8 {

label = "Filesystem 2";

reg = <H I-H>;

};

...

};

};

Does anything stand out as a reason why we would intermittently fail to attach and mount this properly? If nothing else, can you give any pointers on the best way to recover automatically via a reboot, rather than having the kernel crash at the next step? I will start looking into fallback initialization options. Could the timeouts for UBI checks have decreased between kernel versions? I also occasionally see timeouts at runtime (-110 is a timeout, I think):

[ 132.486480] ubi3 warning: ubi_io_read: error -110 while reading 64 bytes from PEB 151:0, read only 0 bytes, retry

Note that I also see UBI mounts fail on other partitions; when I rerun the command, it succeeds without a reboot. Since all of the failures are timeouts, this is starting to smell like a clock speed issue.

I looked for clock frequency changes between the two kernel versions and didn't find any, so I'm going to keep looking.

Any thoughts would be appreciated!

5 months ago

Nicole Baldy, 5 months ago

I meant to add: I do use ubiformat to flash, but the problem doesn't seem to depend on *how* the image is flashed. The kernel is backwards-compatible with the old RFS, so I can test the same 3.19 RFS with both kernels: it never does this with the 3.19 kernel, but it occasionally happens with the 5.10 kernel.

Also meant to add that the commandline is: console=ttyO0,115200n8 rootwait rw rootdelay=1 ubi.mtd=8,512 rootfstype=ubifs root=ubi0:compu-tdi7-rootfs mtdoops.mtddev=omap2.nand earlyprintk=ttyO0,115200n8 nohlt omapfb.rotate=0 6 systemd.unified_cgroup_hierarchy=0

Mukul Bhatnagar, 5 months ago, in reply to Nicole Baldy

Hello Nicole

DM3730 is now a legacy device, and we may not be able to provide much guidance on this.

https://www.ti.com/product/DM3730 (see note on the product folder)

I will see if someone from the team can provide some pointers. If you are willing to work directly with third parties (3Ps), please let us know and we can make some recommendations.

Nicole Baldy, 5 months ago, in reply to Mukul Bhatnagar

Thank you - I think I ended up getting it. Since adding the rb-gpios property to the NAND node, I haven't seen any timeouts. I didn't go as far as tracing what changed to make this required (whether it was defaulting before, or was actually able to work with timeouts without this interrupt), but it seems to have solved the problem. I'm generally willing to work with 3Ps on future issues, if you have any recommendations. Thanks!
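For anyone landing here later, a minimal sketch of what adding rb-gpios to the NAND node might look like. This is not copied from the board's actual device tree: it assumes the NAND ready/busy (R/B#) pin is wired to GPMC WAIT0, which the GPMC node exposes as GPIO 0 on many OMAP3 boards, and that GPIO_ACTIVE_HIGH is available through the usual dt-bindings/gpio/gpio.h include. The specifier has to match the real wiring.

```dts
&gpmc {
	nand@0,0 {
		/* Existing compatible, reg, timing and partition properties stay as they are. */

		/*
		 * Assumption: R/B# is routed to gpmc_wait0 (GPIO 0 of the gpmc node).
		 * Use wait line 1, or a regular GPIO, if the board is wired differently.
		 */
		rb-gpios = <&gpmc 0 GPIO_ACTIVE_HIGH>; /* gpmc_wait0 */
	};
};
```

With a ready/busy line described, the driver can wait on the chip's actual ready signal, which fits the disappearance of the -110 timeouts reported above. I have not traced why the 3.19 kernel worked without it.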

Mukul Bhatnagar, 5 months ago, in reply to Nicole Baldy

Good to hear, Nicole.

I would recommend talking to 3Ps like Baylibre, Bootlin, or Mistral Inc.
