AM623: Running CQE recovery

Gibbs Shih

Part Number: AM623

Hi experts,

This is a continuation of the thread linked below; I am opening a new one because the original thread was locked.

https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1504295/am623-mmc0-running-cqe-recovery-problem

(1) Issue: The Linux UART log shows "mmc0: running CQE recovery", and some files become corrupted. The eMMC runs in HS200 mode, and we have already confirmed that DDR mode is disabled in both the U-Boot and Linux DTS, because AM62 does not support eMMC DDR mode.

(2) SDK : 10.01.10.04

(3) eMMC parts : THGBMUG6C1LBAIL (KIOXIA)

(4) SDIO0 (eMMC) signal integrity : HS200 Pass (hardware)

(5) SDIO0 (eMMC) switching characteristics : HS200 Pass (hardware)

(6) Clue: If we reduce the speed to HS mode, the "CQE recovery" messages disappear.

(7) Not every board shows the issue. As far as I know, the failure rate is about 15%; we are still tracking it.

(8) Error logs:

* Log 1 :

[    4.711702] mmc0: running CQE recovery
[    4.735046] mmc0: running CQE recovery
[    4.758311] mmc0: running CQE recovery
[    4.766138] I/O error, dev mmcblk0, sector 102744 op 0x0:(READ) flags 0x3000 phys_seg 1 prio class 2
[    4.775291] EXT4-fs warning (device mmcblk0p1): htree_dirblock_to_tree:1082: inode #880: lblock 0: comm systemd: error -5 reading directory block
[    4.788391] systemd[1]: tmp.mount: Failed to check directory /tmp: Input/output error
[    4.815487] mmc0: running CQE recovery
[    4.838846] mmc0: running CQE recovery
[    4.862113] mmc0: running CQE recovery
[    4.869930] I/O error, dev mmcblk0, sector 291832 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2

* Log 2 :

[   11.193981] mmc0: running CQE recovery
[   11.199428] Buffer I/O error on dev mmcblk0p1, logical block 0, lost sync page write
[   11.207195] EXT4-fs (mmcblk0p1): I/O error while writing superblock
[   11.229373] mmc0: running CQE recovery
[   11.252665] mmc0: running CQE recovery
[   11.275931] mmc0: running CQE recovery

* Log 3 :

[    1.309542] mmc0: Command Queue Engine enabled
[    1.314030] mmc0: new HS200 MMC card at address 0001
[    1.319843] mmcblk0: mmc0:0001 008GB1 1.16 GiB
[    1.325811] mmc0: running CQE recovery
[    1.332062] mmc0: running CQE recovery
[    1.338170] GPT:Primary header thinks Alt. header is not at the end of the disk.
[    1.345632] GPT:2113535 != 2441215
[    1.349132] GPT:Alternate GPT header not at the end of the disk.
[    1.355233] GPT:2113535 != 2441215
[    1.358647] GPT: Use GNU Parted to correct GPT errors.
[    1.363807]  mmcblk0: p1 p2
[    1.367355] mmcblk0boot0: mmc0:0001 008GB1 4.00 MiB
[    1.373287] mmcblk0boot1: mmc0:0001 008GB1 4.00 MiB
[    1.379213] mmcblk0gp0: mmc0:0001 008GB1 2.55 GiB
[    1.385927] mmc0: running CQE recovery
[    1.392012] mmc0: running CQE recovery
[    1.397284] mmc0: running CQE recovery
[    1.402967]  mmcblk0gp0: p1 p2
[    1.406607] mmcblk0rpmb: mmc0:0001 008GB1 4.00 MiB, chardev (243:0)
[    1.474043] mmc2: Failed to initialize a non-removable card
[    2.730108] sdhci-am654 fa00000.mmc: Power on failed
[    2.771213] mmc1: SDHCI controller on fa00000.mmc [fa00000.mmc] using ADMA 64-bit
[    2.780358] mmc0: running CQE recovery
[    2.785587] mmc0: running CQE recovery
[    2.791585] mmc0: running CQE recovery

* Log 4 :

[    1.205849] mmc2: CQHCI version 5.10
[    1.230661] mmc0: SDHCI controller on fa10000.mmc [fa10000.mmc] using ADMA 64-bit
[    1.238247] mmc2: SDHCI controller on fa20000.mmc [fa20000.mmc] using ADMA 64-bit
[    1.316005] mmc0: Command Queue Engine enabled
[    1.320483] mmc0: new HS200 MMC card at address 0001
[    1.326164] mmcblk0: mmc0:0001 008GB1 1.16 GiB
[    1.332067] mmc0: running CQE recovery
[    1.337534] mmc0: running CQE recovery
[    1.343268] mmc0: running CQE recovery
[    1.347344] I/O error, dev mmcblk0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[    1.355843] Buffer I/O error on dev mmcblk0, logical block 0, async page read
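
To compare boards, the events in logs like the ones above can be tallied with a small script. The following is a minimal Python sketch and is not part of the SDK. The function name `summarize` and the two regular expressions are assumptions based only on the message formats that appear in Logs 1 to 4, so they may need adjusting for other kernel versions.

```python
import re
from collections import Counter

# Assumed format, taken from the logs above: "mmc0: running CQE recovery".
CQE_RE = re.compile(r"(mmc\d+): running CQE recovery")
# Assumed format: "I/O error, dev mmcblk0, sector 102744 op 0x0:(READ) ...".
IO_RE = re.compile(r"I/O error, dev (\w+), sector (\d+) op \S+?:\((\w+)\)")


def summarize(log_text):
    """Count CQE recoveries per host and list block-layer I/O errors."""
    recoveries = Counter()
    io_errors = []
    for line in log_text.splitlines():
        m = CQE_RE.search(line)
        if m:
            recoveries[m.group(1)] += 1
            continue
        m = IO_RE.search(line)
        if m:
            # (device, sector, operation), e.g. ("mmcblk0", 102744, "READ")
            io_errors.append((m.group(1), int(m.group(2)), m.group(3)))
    return recoveries, io_errors


# First lines of Log 1, used here as sample input.
SAMPLE = """\
[    4.711702] mmc0: running CQE recovery
[    4.735046] mmc0: running CQE recovery
[    4.758311] mmc0: running CQE recovery
[    4.766138] I/O error, dev mmcblk0, sector 102744 op 0x0:(READ) flags 0x3000 phys_seg 1 prio class 2
"""

if __name__ == "__main__":
    recoveries, io_errors = summarize(SAMPLE)
    print(dict(recoveries))
    print(io_errors)
```

Running the same function over a captured `dmesg` from each board gives a per-board recovery count and the failing sectors, which may help separate passing and failing units.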

A few questions:

(1) As far as I know, most eMMC "CQE recovery" errors were already resolved in SDK 10.01.10.04. Is that correct?

I believe SDK 10.01.10.04 should already include the patches for EXT_EP-12086:

https://sir.ext.ti.com/jira/browse/EXT_EP-12086?jql=text%20~%20%22running%20CQE%20recovery%22

(2) How can we track which modifications have been made to the U-Boot / Linux code for eMMC issues? I would like to check them one by one.

(3) What changed in the eMMC code between SDK 10.01.10.04 and SDK 11? I am also following the thread below, and it seems that eMMC issues are still being tracked with SDK 11.

https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1513257/am623-copy-files-to-emmc-trigger-cqe-error/5846867?tisearch=e2e-sitesearch&keymatch=am62%2525252520mmc0%2525252520running%2525252520CQE%2525252520recovery#5846867

We need some debug direction and would appreciate your suggestions.

Thank you very much.

Gibbs

3 months ago