Part Number: AM3358
Tool/software:
Hello,
We came across a peculiar problem and we suspect it's a bug with the omap2.c driver.
What we observed: we were testing bit flips in NAND and in one case, we saw a single bit flip in the last byte of the ECC (Byte 26 stored in OOB)
corresponding to a 512 byte sector of an empty page. We are using BCH16.
This was not corrected (even though it should have since its a single flip, much less than the ECC strength) and resulted in the page being marked as non-empty by UBIFS, which led to a kernel panic.
Concerned areas of code:
in the file omap2.c, in function omap_nand_attach_chip(), where it sets up ECC for each scheme
|
case OMAP_ECC_BCH8_CODE_HW:
pr_info("nand: using OMAP_ECC_BCH8_CODE_HW ECC scheme\n");
chip->ecc.engine_type = NAND_ECC_ENGINE_TYPE_ON_HOST;
chip->ecc.size = 512;
/* 14th bit is kept reserved for ROM-code compatibility */
chip->ecc.bytes = 13 + 1;
...
case OMAP_ECC_BCH16_CODE_HW:
pr_info("Using OMAP_ECC_BCH16_CODE_HW ECC scheme\n");
chip->ecc.engine_type = NAND_ECC_ENGINE_TYPE_ON_HOST;
chip->ecc.size = 512;
chip->ecc.bytes = 26;
chip->ecc.strength = 16;
|
Notice that for BCH8, even though the ECC size is 13 size bytes (from the TRM) , the driver treats it as 14 bytes (The comments are not very helpful why this is being done). In case of BCH16, this is not the case. It uses the ECC size of 26 bytes, matching the TRM.
Moving on the function erased_sector_bitflips(), there is a check to count number of bit flips in data and OOB areas of an empty page. This function decides if this page needs to be further corrected by ELM or not.
|
for (i = 0; i < info->nand.ecc.size; i++) {
flip_bits += hweight8(~data[i]);
}
for (i = 0; i < info->nand.ecc.bytes - 1; i++) {
flip_bits += hweight8(~oob[i]);
}
|
Notice that when it loops over the bytes of OOB, we skip the last byte of ECC in OOB.
This is ok for BCH4, BCH8, as their defined ECC byte size is (size +1),
but results in the last byte of ECC of BCH16 being ignored for checks, thus, corrections.
For now, we just added a conditional check in this function:
|
if(info->ecc_opt == OMAP_ECC_BCH16_CODE_HW)
bytes = info->nand.ecc.bytes;
else
bytes = info->nand.ecc.bytes - 1;
for (i = 0; i < bytes; i++) {
flip_bits += hweight8(~oob[i]);
}
|
Do you think this wrong size is affecting other areas of the driver too?
Regards