This thread has been locked.

If you have a related question, please click the "Ask a related question" button in the top right corner. The newly created question will be automatically linked to this question.

AM3351: ECC discrepancy between u-boot and the kernel

Part Number: AM3351

Hello TI experts,

I'm working on a product based on the AM3351 chip, with a "MT29F2G08ABAGAH4-IT:G" NAND.

Our early prototype was using sqfs-over-mtd but we are now moving to use ubifs/ubi for the system rootfs.

Everything seems to be okay, system is booting and running. However, if the kernel modifies the rootfs, it results in some weird ECC failures. And we suspect some ECC discrepancies between u-boot and the kernel.

 

It seems that the OOB data written by u-boot is much longer than the OOB data written by the kernel.

The problem can be reproduced as follow:

  • use a ubifs rootfs,
  • boot linux and remount the rootfs as rw (mount / -o remount,tw && touch /rootfs-was-writable)
  • reboot linux - it fails which an avalanche of logs like the following: 
[    4.459922] omap2-nand 8000000.nand: uncorrectable bit-flips found
[    4.466156] omap2-nand 8000000.nand: uncorrectable bit-flips found
[    4.473037] omap2-nand 8000000.nand: uncorrectable bit-flips found
[    4.479261] omap2-nand 8000000.nand: uncorrectable bit-flips found
[    4.486006] omap2-nand 8000000.nand: uncorrectable bit-flips found
[    4.492278] omap2-nand 8000000.nand: uncorrectable bit-flips found
[    4.498976] omap2-nand 8000000.nand: uncorrectable bit-flips found
[    4.505261] omap2-nand 8000000.nand: uncorrectable bit-flips found
[    4.511986] omap2-nand 8000000.nand: uncorrectable bit-flips found
[    4.518204] omap2-nand 8000000.nand: uncorrectable bit-flips found
[    4.538076] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 129024 bytes from PEB 65:2048, read only 129024 bytes, retry

 

Here are the OOB area before (left) and after (right) we remount the filesystem a RW.

=> nand dump.oob 0xc00000   => nand dump.oob 0xc00000
Page 00c00000 dump: Page 00c00000 dump:
OOB: OOB:
ff ff 94 91 0b ab f2 07 ff ff f6 a9 b0 6e e3 38
f8 86 0a 1b b7 21 8b 25 06 68 7c bb ce c0 fe 98
e1 63 74 6c 9a 7e 7d 91 1d b0 22 71 74 64 e6 78
8e a3 49 49 ce df f4 14 f3 3d 43 0c ff ff ff ff
dd 17 ef a7 74 05 fa df ff ff ff ff ff ff ff ff
49 cd dd c1 bd ed ff 11 ff ff ff ff ff ff ff ff
b5 6c 45 04 00 d4 65 28 ff ff ff ff ff ff ff ff
10 6e 77 7f 04 08 f9 c5 ff ff ff ff ff ff ff ff
a3 60 b6 db 2f 8a fd 1c ff ff ff ff ff ff ff ff
a6 1f 1b 43 e1 df 8f d1 ff ff ff ff ff ff ff ff
65 28 10 6e 77 7f 04 08 ff ff ff ff ff ff ff ff
f9 c5 a3 60 b6 db 2f 8a ff ff ff ff ff ff ff ff
fd 1c a6 1f 1b 43 e1 df ff ff ff ff ff ff ff ff
8f d1 ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

It looks like the kernel is writing much less data in the OOB.

 

In December, we received from TI a quite big patch to support 128b OOB area. (For reference, I'll attach the patch we received.)

We are probably missing a similar patch for the Linux kernel. Can TI provide a similar patch for the kernel?

 

I'll attach the DTS section for the GPMC for both the kernel and u-boot (at first sight, they seems to match).

  

Thanks in advance for your help on this issue,

  • This is the patch we received from TI to support 128b OOB area. 

    From c3292eee9b2733044d2e36eb5efb05cae65e4bfa Mon Sep 17 00:00:00 2001
    Date: Fri, 8 Dec 2023 16:51:21 -0500
    Subject: [PATCH] Fix ECC handling in uboot
    
    These changes enable proper ECC support in the SPL.
    
    Prior to these changes, TI's code did not support 128b OOB areas.  
    Additionally, there was code missing ELM initialization which would cause 
    crashes should any ECC errors be detected.
    
    ---
     drivers/mtd/nand/raw/omap_elm.c  |  5 +-
     drivers/mtd/nand/raw/omap_elm.h  |  6 --
     drivers/mtd/nand/raw/omap_gpmc.c | 96 ++++++++++----------------------
     3 files changed, 32 insertions(+), 75 deletions(-)
    
    diff --git a/drivers/mtd/nand/raw/omap_elm.c b/drivers/mtd/nand/raw/omap_elm.c
    index 35a066df41..ca88b65f9e 100644
    --- a/drivers/mtd/nand/raw/omap_elm.c
    +++ b/drivers/mtd/nand/raw/omap_elm.c
    @@ -185,7 +185,6 @@ void elm_reset(void)
     		;
     }
     
    -#ifdef ELM_BASE
     /**
      * elm_init - Initialize ELM module
      *
    @@ -194,10 +193,11 @@ void elm_reset(void)
      */
     void elm_init(void)
     {
    +#ifdef ELM_BASE
     	elm_cfg = (struct elm *)ELM_BASE;
     	elm_reset();
    -}
     #endif
    +}
     
     #ifdef CONFIG_SYS_NAND_SELF_INIT
     static int elm_probe(struct udevice *dev)
    @@ -209,7 +209,6 @@ static int elm_probe(struct udevice *dev)
     	elm_cfg = devm_ioremap(dev, res.start, resource_size(&res));
     	elm_reset();
     #endif
    -
     	return 0;
     }
     
    diff --git a/drivers/mtd/nand/raw/omap_elm.h b/drivers/mtd/nand/raw/omap_elm.h
    index a7f7bacb15..f3db00d55d 100644
    --- a/drivers/mtd/nand/raw/omap_elm.h
    +++ b/drivers/mtd/nand/raw/omap_elm.h
    @@ -74,12 +74,6 @@ int elm_check_error(u8 *syndrome, enum bch_level bch_type, u32 *error_count,
     		u32 *error_locations);
     int elm_config(enum bch_level level);
     void elm_reset(void);
    -#ifdef ELM_BASE
     void elm_init(void);
    -#else
    -static inline void elm_init(void)
    -{
    -}
    -#endif
     #endif /* __ASSEMBLY__ */
     #endif /* __ASM_ARCH_ELM_H */
    diff --git a/drivers/mtd/nand/raw/omap_gpmc.c b/drivers/mtd/nand/raw/omap_gpmc.c
    index 43f7363b9d..622d9e95ad 100644
    --- a/drivers/mtd/nand/raw/omap_gpmc.c
    +++ b/drivers/mtd/nand/raw/omap_gpmc.c
    @@ -293,7 +293,7 @@ static void __maybe_unused omap_enable_hwecc_bch(struct mtd_info *mtd,
     		break;
     	case OMAP_ECC_BCH8_CODE_HW:
     		bch_type = 1;
    -		nsectors = chip->ecc.steps;
    +		nsectors = 1;
     		if (mode == NAND_ECC_READ) {
     			wr_mode   = BCH_WRAPMODE_1;
     			ecc_size0 = BCH8R_ECC_SIZE0;
    @@ -306,7 +306,7 @@ static void __maybe_unused omap_enable_hwecc_bch(struct mtd_info *mtd,
     		break;
     	case OMAP_ECC_BCH16_CODE_HW:
     		bch_type = 0x2;
    -		nsectors = chip->ecc.steps;
    +		nsectors = 1;
     		if (mode == NAND_ECC_READ) {
     			wr_mode   = 0x01;
     			ecc_size0 = 52; /* ECC bits in nibbles per sector */
    @@ -345,17 +345,16 @@ static void __maybe_unused omap_enable_hwecc_bch(struct mtd_info *mtd,
     }
     
     /**
    - * _omap_calculate_ecc_bch - Generate BCH ECC bytes for one sector
    + * omap_calculate_ecc_bch - Generate BCH ECC bytes for one sector
      * @mtd:        MTD device structure
      * @dat:        The pointer to data on which ecc is computed
      * @ecc_code:   The ecc_code buffer
    - * @sector:     The sector number (for a multi sector page)
      *
      * Support calculating of BCH4/8/16 ECC vectors for one sector
      * within a page. Sector number is in @sector.
      */
    -static int _omap_calculate_ecc_bch(struct mtd_info *mtd, const u8 *dat,
    -				   u8 *ecc_code, int sector)
    +static int omap_calculate_ecc_bch(struct mtd_info *mtd, const u8 *dat,
    +				   u8 *ecc_code)
     {
     	struct nand_chip *chip = mtd_to_nand(mtd);
     	struct omap_nand_info *info = nand_get_controller_data(chip);
    @@ -368,7 +367,7 @@ static int _omap_calculate_ecc_bch(struct mtd_info *mtd, const u8 *dat,
     	case OMAP_ECC_BCH8_CODE_HW_DETECTION_SW:
     #endif
     	case OMAP_ECC_BCH8_CODE_HW:
    -		ptr = &gpmc_cfg->bch_result_0_3[sector].bch_result_x[3];
    +		ptr = &gpmc_cfg->bch_result_0_3[0].bch_result_x[3];
     		val = readl(ptr);
     		ecc_code[i++] = (val >>  0) & 0xFF;
     		ptr--;
    @@ -383,21 +382,21 @@ static int _omap_calculate_ecc_bch(struct mtd_info *mtd, const u8 *dat,
     
     		break;
     	case OMAP_ECC_BCH16_CODE_HW:
    -		val = readl(&gpmc_cfg->bch_result_4_6[sector].bch_result_x[2]);
    +		val = readl(&gpmc_cfg->bch_result_4_6[0].bch_result_x[2]);
     		ecc_code[i++] = (val >>  8) & 0xFF;
     		ecc_code[i++] = (val >>  0) & 0xFF;
    -		val = readl(&gpmc_cfg->bch_result_4_6[sector].bch_result_x[1]);
    +		val = readl(&gpmc_cfg->bch_result_4_6[0].bch_result_x[1]);
     		ecc_code[i++] = (val >> 24) & 0xFF;
     		ecc_code[i++] = (val >> 16) & 0xFF;
     		ecc_code[i++] = (val >>  8) & 0xFF;
     		ecc_code[i++] = (val >>  0) & 0xFF;
    -		val = readl(&gpmc_cfg->bch_result_4_6[sector].bch_result_x[0]);
    +		val = readl(&gpmc_cfg->bch_result_4_6[0].bch_result_x[0]);
     		ecc_code[i++] = (val >> 24) & 0xFF;
     		ecc_code[i++] = (val >> 16) & 0xFF;
     		ecc_code[i++] = (val >>  8) & 0xFF;
     		ecc_code[i++] = (val >>  0) & 0xFF;
     		for (j = 3; j >= 0; j--) {
    -			val = readl(&gpmc_cfg->bch_result_0_3[sector].bch_result_x[j]
    +			val = readl(&gpmc_cfg->bch_result_0_3[0].bch_result_x[j]
     									);
     			ecc_code[i++] = (val >> 24) & 0xFF;
     			ecc_code[i++] = (val >> 16) & 0xFF;
    @@ -431,22 +430,6 @@ static int _omap_calculate_ecc_bch(struct mtd_info *mtd, const u8 *dat,
     	return 0;
     }
     
    -/**
    - * omap_calculate_ecc_bch - ECC generator for 1 sector
    - * @mtd:        MTD device structure
    - * @dat:	The pointer to data on which ecc is computed
    - * @ecc_code:	The ecc_code buffer
    - *
    - * Support calculating of BCH4/8/16 ECC vectors for one sector. This is used
    - * when SW based correction is required as ECC is required for one sector
    - * at a time.
    - */
    -static int __maybe_unused omap_calculate_ecc_bch(struct mtd_info *mtd,
    -				  const u_char *dat, u_char *ecc_calc)
    -{
    -	return _omap_calculate_ecc_bch(mtd, dat, ecc_calc, 0);
    -}
    -
     static inline void omap_nand_read_buf(struct mtd_info *mtd, uint8_t *buf, int len)
     {
     	struct nand_chip *chip = mtd_to_nand(mtd);
    @@ -572,34 +555,6 @@ static void omap_nand_read_prefetch(struct mtd_info *mtd, uint8_t *buf, int len)
     
     #ifdef CONFIG_NAND_OMAP_ELM
     
    -/**
    - * omap_calculate_ecc_bch_multi - Generate ECC for multiple sectors
    - * @mtd:	MTD device structure
    - * @dat:	The pointer to data on which ecc is computed
    - * @ecc_code:	The ecc_code buffer
    - *
    - * Support calculating of BCH4/8/16 ecc vectors for the entire page in one go.
    - */
    -static int omap_calculate_ecc_bch_multi(struct mtd_info *mtd,
    -					const u_char *dat, u_char *ecc_calc)
    -{
    -	struct nand_chip *chip = mtd_to_nand(mtd);
    -	int eccbytes = chip->ecc.bytes;
    -	unsigned long nsectors;
    -	int i, ret;
    -
    -	nsectors = ((readl(&gpmc_cfg->ecc_config) >> 4) & 0x7) + 1;
    -	for (i = 0; i < nsectors; i++) {
    -		ret = _omap_calculate_ecc_bch(mtd, dat, ecc_calc, i);
    -		if (ret)
    -			return ret;
    -
    -		ecc_calc += eccbytes;
    -	}
    -
    -	return 0;
    -}
    -
     /*
      * omap_reverse_list - re-orders list elements in reverse order [internal]
      * @list:	pointer to start of list
    @@ -752,32 +707,38 @@ static int omap_read_page_bch(struct mtd_info *mtd, struct nand_chip *chip,
     {
     	int i, eccsize = chip->ecc.size;
     	int eccbytes = chip->ecc.bytes;
    -	int ecctotal = chip->ecc.total;
     	int eccsteps = chip->ecc.steps;
     	uint8_t *p = buf;
     	uint8_t *ecc_calc = chip->buffers->ecccalc;
     	uint8_t *ecc_code = chip->buffers->ecccode;
     	uint32_t *eccpos = chip->ecc.layout->eccpos;
     	uint8_t *oob = chip->oob_poi;
    +	uint32_t data_pos;
     	uint32_t oob_pos;
     
    +	data_pos = 0;
     	/* oob area start */
     	oob_pos = (eccsize * eccsteps) + chip->ecc.layout->eccpos[0];
     	oob += chip->ecc.layout->eccpos[0];
     
    -	/* Enable ECC engine */
    -	chip->ecc.hwctl(mtd, NAND_ECC_READ);
    +	for (i = 0; eccsteps; eccsteps--, i += eccbytes, p += eccsize,
    +				oob += eccbytes) {
    +		/* Enable ECC engine */
    +		chip->ecc.hwctl(mtd, NAND_ECC_READ);
     
    -	/* read entire page */
    -	chip->cmdfunc(mtd, NAND_CMD_RNDOUT, 0, -1);
    -	chip->read_buf(mtd, buf, mtd->writesize);
    +		/* read data */
    +		chip->cmdfunc(mtd, NAND_CMD_RNDOUT, data_pos, -1);
    +		chip->read_buf(mtd, p, eccsize);
     
    -	/* read all ecc bytes from oob area */
    -	chip->cmdfunc(mtd, NAND_CMD_RNDOUT, oob_pos, -1);
    -	chip->read_buf(mtd, oob, ecctotal);
    +		/* read respective ecc from oob area */
    +		chip->cmdfunc(mtd, NAND_CMD_RNDOUT, oob_pos, -1);
    +		chip->read_buf(mtd, oob, eccbytes);
    +		/* read syndrome */
    +		chip->ecc.calculate(mtd, p, &ecc_calc[i]);
     
    -	/* Calculate ecc bytes */
    -	omap_calculate_ecc_bch_multi(mtd, buf, ecc_calc);
    +		data_pos += eccsize;
    +		oob_pos += eccbytes;
    +	}
     
     	for (i = 0; i < chip->ecc.total; i++)
     		ecc_code[i] = chip->oob_poi[eccpos[i]];
    @@ -945,6 +906,7 @@ static int omap_select_ecc_scheme(struct nand_chip *nand,
     		nand->ecc.hwctl		= omap_enable_hwecc_bch;
     		nand->ecc.correct	= omap_correct_data_bch_sw;
     		nand->ecc.calculate	= omap_calculate_ecc_bch;
    +		nand->ecc.steps		= eccsteps;
     		/* define ecc-layout */
     		ecclayout->eccbytes	= nand->ecc.bytes * eccsteps;
     		ecclayout->eccpos[0]	= BADBLOCK_MARKER_LENGTH;
    @@ -986,6 +948,7 @@ static int omap_select_ecc_scheme(struct nand_chip *nand,
     		nand->ecc.hwctl		= omap_enable_hwecc_bch;
     		nand->ecc.correct	= omap_correct_data_bch;
     		nand->ecc.calculate	= omap_calculate_ecc_bch;
    +		nand->ecc.steps		= eccsteps;
     		nand->ecc.read_page	= omap_read_page_bch;
     		/* define ecc-layout */
     		ecclayout->eccbytes	= nand->ecc.bytes * eccsteps;
    @@ -1019,6 +982,7 @@ static int omap_select_ecc_scheme(struct nand_chip *nand,
     		nand->ecc.hwctl		= omap_enable_hwecc_bch;
     		nand->ecc.correct	= omap_correct_data_bch;
     		nand->ecc.calculate	= omap_calculate_ecc_bch;
    +		nand->ecc.steps		= eccsteps;
     		nand->ecc.read_page	= omap_read_page_bch;
     		/* define ecc-layout */
     		ecclayout->eccbytes	= nand->ecc.bytes * eccsteps;
    diff --git a/drivers/mtd/nand/raw/omap_elm.c b/drivers/mtd/nand/raw/omap_elm.c
    index 35a066df41..ca88b65f9e 100644
    --- a/drivers/mtd/nand/raw/omap_elm.c
    +++ b/drivers/mtd/nand/raw/omap_elm.c
    diff --git a/include/configs/am335x_custom.h b/include/configs/am335x_custom.h
    index 18b6607..6e6a2cb 100644
    --- a/include/configs/am335x_custom.h
    +++ b/include/configs/am335x_custom.h
    @@ -130,10 +130,16 @@
                          26, 27, 28, 29, 30, 31, 32, 33, \
                          34, 35, 36, 37, 38, 39, 40, 41, \
                          42, 43, 44, 45, 46, 47, 48, 49, \
    -                     50, 51, 52, 53, 54, 55, 56, 57, }
    +                     50, 51, 52, 53, 54, 55, 56, 57, \
    +                     58, 59, 60, 61, 62, 63, 64, 65, \
    +                     66, 67, 68, 69, 70, 71, 72, 73, \
    +                     74, 75, 76, 77, 78, 79, 80, 81, \
    +                     82, 83, 84, 85, 86, 87, 88, 89, \
    +                     90, 91, 92, 93, 94, 95, 96, 97, \
    +                     98, 99, 100, 101, 102, 103, 104, 105,}
     
     #define CFG_SYS_NAND_ECCSIZE        512
    -#define CFG_SYS_NAND_ECCBYTES   14
    +#define CFG_SYS_NAND_ECCBYTES   26
     #endif /* !CONFIG_MTD_RAW_NAND */
     
     /* USB Device Firmware Update support */
    

    And here is the section of the DTS related to `gpmc`:

    &gpmc {
    	status = "okay";
    	pinctrl-names = "default";
    	pinctrl-0 = <&nandflash_pins_s0>;
    	ranges = <0 0 0x08000000 0x10000000>;	/* CS0: 256MB for NAND */
    	nand@0,0 {
    		compatible = "ti,omap2-nand";
    		reg = <0 0 4>; /* CS0, offset 0, IO size 4 */
    		interrupt-parent = <&gpmc>;
    		interrupts = <0 IRQ_TYPE_NONE>, /* fifoevent */
    					 <1 IRQ_TYPE_NONE>;	/* termcount */
    		rb-gpios = <&gpmc 0 GPIO_ACTIVE_HIGH>; /* gpmc_wait0 */
    		ti,nand-xfer-type = "prefetch-polled";
    		ti,nand-ecc-opt = "bch16";
    		ti,elm-id = <&elm>;
    		nand-bus-width = <8>;
    		gpmc,device-width = <1>;

  • Hi Arnold,

    this will require some digging.

    This is the patch we received from TI to support 128b OOB area. 

    First, who sent you this patch? The patch itself doesn't have a name/committer in it, which is unusual for a Git-originated patch.

    Regards, Andreas

  • Hi Andreas.

    Original patch was received via Michael Shklyarman (no names associated on it at the time) on Nov 27th

    diff --git a/drivers/mtd/nand/raw/omap_gpmc.c b/drivers/mtd/nand/raw/omap_gpmc.c
    index 43f7363b9dd..fc8383d84b4 100644
    --- a/drivers/mtd/nand/raw/omap_gpmc.c
    +++ b/drivers/mtd/nand/raw/omap_gpmc.c
    @@ -293,7 +293,7 @@ static void __maybe_unused omap_enable_hwecc_bch(struct mtd_info *mtd,
     		break;
     	case OMAP_ECC_BCH8_CODE_HW:
     		bch_type = 1;
    -		nsectors = chip->ecc.steps;
    +		nsectors = 1;
     		if (mode == NAND_ECC_READ) {
     			wr_mode   = BCH_WRAPMODE_1;
     			ecc_size0 = BCH8R_ECC_SIZE0;
    @@ -306,7 +306,7 @@ static void __maybe_unused omap_enable_hwecc_bch(struct mtd_info *mtd,
     		break;
     	case OMAP_ECC_BCH16_CODE_HW:
     		bch_type = 0x2;
    -		nsectors = chip->ecc.steps;
    +		nsectors = 1;
     		if (mode == NAND_ECC_READ) {
     			wr_mode   = 0x01;
     			ecc_size0 = 52; /* ECC bits in nibbles per sector */
    @@ -345,17 +345,16 @@ static void __maybe_unused omap_enable_hwecc_bch(struct mtd_info *mtd,
     }
     
     /**
    - * _omap_calculate_ecc_bch - Generate BCH ECC bytes for one sector
    + * omap_calculate_ecc_bch - Generate BCH ECC bytes for one sector
      * @mtd:        MTD device structure
      * @dat:        The pointer to data on which ecc is computed
      * @ecc_code:   The ecc_code buffer
    - * @sector:     The sector number (for a multi sector page)
      *
      * Support calculating of BCH4/8/16 ECC vectors for one sector
      * within a page. Sector number is in @sector.
      */
    -static int _omap_calculate_ecc_bch(struct mtd_info *mtd, const u8 *dat,
    -				   u8 *ecc_code, int sector)
    +static int omap_calculate_ecc_bch(struct mtd_info *mtd, const u8 *dat,
    +				  u8 *ecc_code)
     {
     	struct nand_chip *chip = mtd_to_nand(mtd);
     	struct omap_nand_info *info = nand_get_controller_data(chip);
    @@ -368,7 +367,7 @@ static int _omap_calculate_ecc_bch(struct mtd_info *mtd, const u8 *dat,
     	case OMAP_ECC_BCH8_CODE_HW_DETECTION_SW:
     #endif
     	case OMAP_ECC_BCH8_CODE_HW:
    -		ptr = &gpmc_cfg->bch_result_0_3[sector].bch_result_x[3];
    +		ptr = &gpmc_cfg->bch_result_0_3[0].bch_result_x[3];
     		val = readl(ptr);
     		ecc_code[i++] = (val >>  0) & 0xFF;
     		ptr--;
    @@ -383,21 +382,21 @@ static int _omap_calculate_ecc_bch(struct mtd_info *mtd, const u8 *dat,
     
     		break;
     	case OMAP_ECC_BCH16_CODE_HW:
    -		val = readl(&gpmc_cfg->bch_result_4_6[sector].bch_result_x[2]);
    +		val = readl(&gpmc_cfg->bch_result_4_6[0].bch_result_x[2]);
     		ecc_code[i++] = (val >>  8) & 0xFF;
     		ecc_code[i++] = (val >>  0) & 0xFF;
    -		val = readl(&gpmc_cfg->bch_result_4_6[sector].bch_result_x[1]);
    +		val = readl(&gpmc_cfg->bch_result_4_6[0].bch_result_x[1]);
     		ecc_code[i++] = (val >> 24) & 0xFF;
     		ecc_code[i++] = (val >> 16) & 0xFF;
     		ecc_code[i++] = (val >>  8) & 0xFF;
     		ecc_code[i++] = (val >>  0) & 0xFF;
    -		val = readl(&gpmc_cfg->bch_result_4_6[sector].bch_result_x[0]);
    +		val = readl(&gpmc_cfg->bch_result_4_6[0].bch_result_x[0]);
     		ecc_code[i++] = (val >> 24) & 0xFF;
     		ecc_code[i++] = (val >> 16) & 0xFF;
     		ecc_code[i++] = (val >>  8) & 0xFF;
     		ecc_code[i++] = (val >>  0) & 0xFF;
     		for (j = 3; j >= 0; j--) {
    -			val = readl(&gpmc_cfg->bch_result_0_3[sector].bch_result_x[j]
    +			val = readl(&gpmc_cfg->bch_result_0_3[0].bch_result_x[j]
     									);
     			ecc_code[i++] = (val >> 24) & 0xFF;
     			ecc_code[i++] = (val >> 16) & 0xFF;
    @@ -431,22 +430,6 @@ static int _omap_calculate_ecc_bch(struct mtd_info *mtd, const u8 *dat,
     	return 0;
     }
     
    -/**
    - * omap_calculate_ecc_bch - ECC generator for 1 sector
    - * @mtd:        MTD device structure
    - * @dat:	The pointer to data on which ecc is computed
    - * @ecc_code:	The ecc_code buffer
    - *
    - * Support calculating of BCH4/8/16 ECC vectors for one sector. This is used
    - * when SW based correction is required as ECC is required for one sector
    - * at a time.
    - */
    -static int __maybe_unused omap_calculate_ecc_bch(struct mtd_info *mtd,
    -				  const u_char *dat, u_char *ecc_calc)
    -{
    -	return _omap_calculate_ecc_bch(mtd, dat, ecc_calc, 0);
    -}
    -
     static inline void omap_nand_read_buf(struct mtd_info *mtd, uint8_t *buf, int len)
     {
     	struct nand_chip *chip = mtd_to_nand(mtd);
    @@ -572,34 +555,6 @@ static void omap_nand_read_prefetch(struct mtd_info *mtd, uint8_t *buf, int len)
     
     #ifdef CONFIG_NAND_OMAP_ELM
     
    -/**
    - * omap_calculate_ecc_bch_multi - Generate ECC for multiple sectors
    - * @mtd:	MTD device structure
    - * @dat:	The pointer to data on which ecc is computed
    - * @ecc_code:	The ecc_code buffer
    - *
    - * Support calculating of BCH4/8/16 ecc vectors for the entire page in one go.
    - */
    -static int omap_calculate_ecc_bch_multi(struct mtd_info *mtd,
    -					const u_char *dat, u_char *ecc_calc)
    -{
    -	struct nand_chip *chip = mtd_to_nand(mtd);
    -	int eccbytes = chip->ecc.bytes;
    -	unsigned long nsectors;
    -	int i, ret;
    -
    -	nsectors = ((readl(&gpmc_cfg->ecc_config) >> 4) & 0x7) + 1;
    -	for (i = 0; i < nsectors; i++) {
    -		ret = _omap_calculate_ecc_bch(mtd, dat, ecc_calc, i);
    -		if (ret)
    -			return ret;
    -
    -		ecc_calc += eccbytes;
    -	}
    -
    -	return 0;
    -}
    -
     /*
      * omap_reverse_list - re-orders list elements in reverse order [internal]
      * @list:	pointer to start of list
    @@ -752,32 +707,38 @@ static int omap_read_page_bch(struct mtd_info *mtd, struct nand_chip *chip,
     {
     	int i, eccsize = chip->ecc.size;
     	int eccbytes = chip->ecc.bytes;
    -	int ecctotal = chip->ecc.total;
     	int eccsteps = chip->ecc.steps;
     	uint8_t *p = buf;
     	uint8_t *ecc_calc = chip->buffers->ecccalc;
     	uint8_t *ecc_code = chip->buffers->ecccode;
     	uint32_t *eccpos = chip->ecc.layout->eccpos;
     	uint8_t *oob = chip->oob_poi;
    +	uint32_t data_pos;
     	uint32_t oob_pos;
     
    +	data_pos = 0;
     	/* oob area start */
     	oob_pos = (eccsize * eccsteps) + chip->ecc.layout->eccpos[0];
     	oob += chip->ecc.layout->eccpos[0];
     
    -	/* Enable ECC engine */
    -	chip->ecc.hwctl(mtd, NAND_ECC_READ);
    +	for (i = 0; eccsteps; eccsteps--, i += eccbytes, p += eccsize,
    +				oob += eccbytes) {
    +		/* Enable ECC engine */
    +		chip->ecc.hwctl(mtd, NAND_ECC_READ);
     
    -	/* read entire page */
    -	chip->cmdfunc(mtd, NAND_CMD_RNDOUT, 0, -1);
    -	chip->read_buf(mtd, buf, mtd->writesize);
    +		/* read data */
    +		chip->cmdfunc(mtd, NAND_CMD_RNDOUT, data_pos, -1);
    +		chip->read_buf(mtd, p, eccsize);
     
    -	/* read all ecc bytes from oob area */
    -	chip->cmdfunc(mtd, NAND_CMD_RNDOUT, oob_pos, -1);
    -	chip->read_buf(mtd, oob, ecctotal);
    +		/* read respective ecc from oob area */
    +		chip->cmdfunc(mtd, NAND_CMD_RNDOUT, oob_pos, -1);
    +		chip->read_buf(mtd, oob, eccbytes);
    +		/* read syndrome */
    +		chip->ecc.calculate(mtd, p, &ecc_calc[i]);
     
    -	/* Calculate ecc bytes */
    -	omap_calculate_ecc_bch_multi(mtd, buf, ecc_calc);
    +		data_pos += eccsize;
    +		oob_pos += eccbytes;
    +	}
     
     	for (i = 0; i < chip->ecc.total; i++)
     		ecc_code[i] = chip->oob_poi[eccpos[i]];
    @@ -945,6 +906,7 @@ static int omap_select_ecc_scheme(struct nand_chip *nand,
     		nand->ecc.hwctl		= omap_enable_hwecc_bch;
     		nand->ecc.correct	= omap_correct_data_bch_sw;
     		nand->ecc.calculate	= omap_calculate_ecc_bch;
    +		nand->ecc.steps		= eccsteps;
     		/* define ecc-layout */
     		ecclayout->eccbytes	= nand->ecc.bytes * eccsteps;
     		ecclayout->eccpos[0]	= BADBLOCK_MARKER_LENGTH;
    @@ -987,6 +949,7 @@ static int omap_select_ecc_scheme(struct nand_chip *nand,
     		nand->ecc.correct	= omap_correct_data_bch;
     		nand->ecc.calculate	= omap_calculate_ecc_bch;
     		nand->ecc.read_page	= omap_read_page_bch;
    +		nand->ecc.steps		= eccsteps;
     		/* define ecc-layout */
     		ecclayout->eccbytes	= nand->ecc.bytes * eccsteps;
     		for (i = 0; i < ecclayout->eccbytes; i++)
    @@ -1020,6 +983,7 @@ static int omap_select_ecc_scheme(struct nand_chip *nand,
     		nand->ecc.correct	= omap_correct_data_bch;
     		nand->ecc.calculate	= omap_calculate_ecc_bch;
     		nand->ecc.read_page	= omap_read_page_bch;
    +		nand->ecc.steps		= eccsteps;
     		/* define ecc-layout */
     		ecclayout->eccbytes	= nand->ecc.bytes * eccsteps;
     		for (i = 0; i < ecclayout->eccbytes; i++)
    

    It was later upstreamed here it though it seems (by Roger):

    https://git.ti.com/cgit/ti-u-boot/ti-u-boot/commit/?h=ti-u-boot-2023.04&id=c0176ab8dd53f86e193c578ec4781400e9fb18e3

    When that failed to work, we were then provided this patch email again by Michael on Dec 1 (again seems upstreamed by Roger)

    From 51033711fb849e02c392187e3f4107aebf7bd9ea Mon Sep 17 00:00:00 2001
    From: Roger Quadros <rogerq@kernel.org>
    Date: Fri, 1 Dec 2023 18:25:27 +0200
    Subject: [PATCH] mtd: rawnand: omap_elm: Fix elm_init definition
    
    The macro ELM_BASE is defined in mach/hardware.h and is
    not visible at the omap_elm.h header file. Avoid using it
    in omap_elm.h.
    
    Fixes: 7363cf0581a3 ("mtd: rawnand: omap_elm: u-boot driver model support")
    Signed-off-by: Roger Quadros <rogerq@kernel.org>
    ---
     drivers/mtd/nand/raw/omap_elm.c | 4 ++--
     drivers/mtd/nand/raw/omap_elm.h | 6 ------
     2 files changed, 2 insertions(+), 8 deletions(-)
    
    diff --git a/drivers/mtd/nand/raw/omap_elm.c b/drivers/mtd/nand/raw/omap_elm.c
    index 35a066df41a..0e4954a0515 100644
    --- a/drivers/mtd/nand/raw/omap_elm.c
    +++ b/drivers/mtd/nand/raw/omap_elm.c
    @@ -185,7 +185,6 @@ void elm_reset(void)
     		;
     }
     
    -#ifdef ELM_BASE
     /**
      * elm_init - Initialize ELM module
      *
    @@ -194,10 +193,11 @@ void elm_reset(void)
      */
     void elm_init(void)
     {
    +#ifdef ELM_BASE
     	elm_cfg = (struct elm *)ELM_BASE;
     	elm_reset();
    -}
     #endif
    +}
     
     #ifdef CONFIG_SYS_NAND_SELF_INIT
     static int elm_probe(struct udevice *dev)
    diff --git a/drivers/mtd/nand/raw/omap_elm.h b/drivers/mtd/nand/raw/omap_elm.h
    index a7f7bacb154..f3db00d55de 100644
    --- a/drivers/mtd/nand/raw/omap_elm.h
    +++ b/drivers/mtd/nand/raw/omap_elm.h
    @@ -74,12 +74,6 @@ int elm_check_error(u8 *syndrome, enum bch_level bch_type, u32 *error_count,
     		u32 *error_locations);
     int elm_config(enum bch_level level);
     void elm_reset(void);
    -#ifdef ELM_BASE
     void elm_init(void);
    -#else
    -static inline void elm_init(void)
    -{
    -}
    -#endif
     #endif /* __ASSEMBLY__ */
     #endif /* __ASM_ARCH_ELM_H */
    
    base-commit: 11dcd9e84f90e876d9cc7259f3cdb98b7959676e
    -- 
    2.34.1
    
    

    Finally, after the 2nd patch still didn't solve the problem, we were provided a third patch on Dec 8 (Again by Michael over email) - but upstreamed by Hong:

    From 2dab9b3a78795824b8897fce789485031fdab8e7 Mon Sep 17 00:00:00 2001
    From: Hong Guan <hguan@ti.com>
    Date: Fri, 8 Dec 2023 13:57:47 -0600
    Subject: [PATCH 1/1] Add u-boot GPMC-NAND support for MT29F2G08ABAGAH4-IT:G in
     AM335x SDK 9.1
    
    Signed-off-by: Hong Guan <hguan@ti.com>
    ---
     configs/am335x_evm_defconfig |  3 ++-
     include/configs/am335x_evm.h | 10 ++++++++--
     2 files changed, 10 insertions(+), 3 deletions(-)
    
    diff --git a/configs/am335x_evm_defconfig b/configs/am335x_evm_defconfig
    index 84fb0c2..dc27f27 100644
    --- a/configs/am335x_evm_defconfig
    +++ b/configs/am335x_evm_defconfig
    @@ -83,11 +83,12 @@ CONFIG_SYS_I2C_EEPROM_ADDR=0x50
     CONFIG_MMC_OMAP_HS=y
     CONFIG_MTD=y
     CONFIG_MTD_RAW_NAND=y
    +CONFIG_NAND_OMAP_ECCSCHEME_BCH16_CODE_HW=y
     CONFIG_SYS_NAND_BLOCK_SIZE=0x20000
     CONFIG_SYS_NAND_ONFI_DETECTION=y
     CONFIG_SYS_NAND_PAGE_COUNT=0x40
     CONFIG_SYS_NAND_PAGE_SIZE=0x800
    -CONFIG_SYS_NAND_OOBSIZE=0x40
    +CONFIG_SYS_NAND_OOBSIZE=0x80
     CONFIG_SYS_NAND_U_BOOT_LOCATIONS=y
     CONFIG_SYS_NAND_U_BOOT_OFFS=0xc0000
     CONFIG_DM_SPI_FLASH=y
    diff --git a/include/configs/am335x_evm.h b/include/configs/am335x_evm.h
    index 5b47778..743160e 100644
    --- a/include/configs/am335x_evm.h
    +++ b/include/configs/am335x_evm.h
    @@ -174,10 +174,16 @@
     					 26, 27, 28, 29, 30, 31, 32, 33, \
     					 34, 35, 36, 37, 38, 39, 40, 41, \
     					 42, 43, 44, 45, 46, 47, 48, 49, \
    -					 50, 51, 52, 53, 54, 55, 56, 57, }
    +					 50, 51, 52, 53, 54, 55, 56, 57, \
    +					 58, 59, 60, 61, 62, 63, 64, 65, \
    +					 66, 67, 68, 69, 70, 71, 72, 73, \
    +					 74, 75, 76, 77, 78, 79, 80, 81, \
    +					 82, 83, 84, 85, 86, 87, 88, 89, \
    +					 90, 91, 92, 93, 94, 95, 96, 97, \
    +					 98, 99, 100, 101, 102, 103, 104, 105,}
     
     #define CFG_SYS_NAND_ECCSIZE		512
    -#define CFG_SYS_NAND_ECCBYTES	14
    +#define CFG_SYS_NAND_ECCBYTES	26
     #endif /* !CONFIG_MTD_RAW_NAND */
     
     /* USB Device Firmware Update support */
    -- 
    2.7.4
    
    

    The patch my colleague Arnold shared should be a cullumanation of the 3.

  • Hi Andreas,

    Terry has answered regarding the origin of the patch. Sorry if my first post was a bit obscure.

    We dived a bit deeper and we found that that the ECC as written by the kernel was shorter because not every subpage were written.

    We suspect that the method used to write the UBI partition in the first place, from u-boot, was not very UBI compliant. We were using `fastboot erase` and `fastboot write system system.ubi` (which ends up calling `nand_write_skip_bad()` in u-boot).

    As a experiment, we manually rewrote the system partition within u-boot using:
    ```
    ubi remove rootfs
    ubi create rootfs -
    ubi write $loadaddr rootfs 0x3700800
    ```
    Using this way of writing the UBI data, the filesystem can be remounted as RW and it does not result anymore into ECC errors. So the origin of our problem is probably identified.

    What we don't understand well, and that's the part that is worrying us, why is there causing ECC errors.
    What is happening under the hood? We could see our "nandwrite-like" is causing corruption of ubi, but why ecc errors. ECC is handled at the MTD level.

    Regards,

    Arnaud

  • Hi Arnaud,

    what exact U-Boot and Kernel versions are you using?

    What we don't understand well, and that's the part that is worrying us, why is there causing ECC errors.

    Just to re-confirm, those errors appear immediately after trying to write data to the filesystem, from Linux, right? And that happens on all your devices the same way, right? So it's almost certainly not an actual Flash data corruption / wear issue or anything like this, but rather SW stack usage-related, right?

    Regards, Andreas

  • Hello Andreas,

    We're using Yocto, pointing to TI meta-layer that are tagged with TISDK-09.01.00.001. So we're using:

    - u-boot-2023.04 (d74d0993e2)

    - linux-ti-staging-6.1.46 (1d4b5da681)

    However, since my answer yesterday, we dived a bit further and found two explanations in the UBI/UBIFS documentation:

    http://www.linux-mtd.infradead.org/faq/ubifs.html#L_why_ubiformat

    http://www.linux-mtd.infradead.org/faq/ubifs.html#L_free_space_fixup

    My understanding of this doc is that UBI images are better written with proper ubi tools (like ubiformat). If using MTD routines, the ECC of empty pages may be different, especially with HW backed ECC. Using the `-F` option to mkfs.ubifs forces the system to fix those ECC at first mount.

    Is it something TI could confirm? That the BCH16 ECC for an "all-0xFF" subpage would not be "all-0xFF"?

    I think we can consider this issue as resolved. All the ECC logs put us on the wrong track, and made the issue look more worrying that it is.

    Thanks,
    Arnold