AM625: eMMC running CQE recovery

Ryuuichi machida

Part Number: AM625
Other Parts Discussed in Thread: SK-AM62B

Hello

First, this question is related to the following thread:
https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1327667/am625-emmc-tranision-issue/5104544#5104544

My customer is trying to use SDK ver 09.02.01.10 with the "ti-linux-6.1.y" kernel (from: ://git.ti.com/ti-linux-kernel/ti-linux-kernel.git).
Previously, they used SDK ver 08.06.00.42 and faced an eMMC transition problem with a kernel panic.

After changing the SDK to the latest version, they no longer observe the kernel panic; however, they still observe the following message in the log.

"mmc0: running CQE recovery"

Could you comment on why this message is observed?
(Is it due to a CRC error?)

Here is what the customer did:

* Extracted a tar file (extension "gz") from SD to eMMC.

Thanks in advance,

over 1 year ago

0 Prashant Shivhare over 1 year ago

TI__Guru 70081 points

Hi Machida-san,

Ryuuichi machida said:
* Extracted a tar file (extension "gz") from SD to eMMC.

What is the size of this file?

Does the issue occur every time the file is extracted?

Regards,

Prashant

0 Ryuuichi machida over 1 year ago in reply to Prashant Shivhare

Guru 12655 points

Hello,

>What is the size of this file?
1.5 GB before extraction (4 GB after extraction).

>Does the issue occur every time the file is extracted?
Yes.

BR,

0 Prashant Shivhare over 1 year ago in reply to Ryuuichi machida

TI__Guru 70081 points

Hi Machida-san,

Thank you. Please allow me a day or two to test this and get back to you.

Regards,

Prashant

0 Andreas Dannenberg over 1 year ago

TI__Guru 69652 points

Hi Machida-san,

Ryuuichi machida said:
My customer is trying to use SDK ver 09.02.01.10 with the "ti-linux-6.1.y" kernel (from: ://git.ti.com/ti-linux-kernel/ti-linux-kernel.git).
Previously, they used SDK ver 08.06.00.42 and faced an eMMC transition problem with a kernel panic.

After changing the SDK to the latest version, they no longer observe the kernel panic; however, they still observe the following message in the log.

Great to hear they are using the latest kernel.

Can you please confirm they also absorbed the SDK v9.2-related changes to the `sdhci0` device tree node into their own DTS files, and re-built those? Specifically, check how it is defined in k3-am62-main.dtsi; see https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/tree/arch/arm64/boot/dts/ti/k3-am62-main.dtsi?h=ti-linux-6.1.y#n541. It is critical to absorb the latest changes to the `ti,*` properties, which define various settings and delays.
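One way to double-check this on the running board is to dump the `ti,*` properties the kernel actually received and compare them with k3-am62-main.dtsi. The following is a minimal shell sketch, not an official TI tool: the function name `list_ti_props` is made up for illustration, and it assumes the live device tree is exposed under /sys/firmware/devicetree/base and that `find` and `od` are available on the target.

```shell
# Hypothetical helper: print the ti,* properties of every mmc@* node.
# $1 is the device tree root; the default is where Linux exposes the live tree.
list_ti_props() {
    dt_root="${1:-/sys/firmware/devicetree/base}"
    find "$dt_root" -type d -name 'mmc@*' | sort | while read -r node; do
        # Show the node path relative to the tree root.
        echo "== ${node#"$dt_root"/}"
        for prop in "$node"/ti,*; do
            # Skip the unexpanded pattern when a node has no ti,* properties.
            [ -e "$prop" ] || continue
            # Property cells are stored big-endian; print them as hex bytes.
            bytes=$(od -An -tx1 "$prop" | tr -s ' \n' ' ' | sed 's/^ //; s/ $//')
            printf '%s = %s\n' "${prop##*/}" "$bytes"
        done
    done
}
```

Calling `list_ti_props` with no argument walks the live tree; passing a directory points it at another copy of the tree instead. Values are printed as raw big-endian bytes, so a single cell holding 1 shows up as `00 00 00 01`.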

Regards, Andreas

0 Ryuuichi machida over 1 year ago in reply to Andreas Dannenberg

Guru 12655 points

Hello Prashant-san,

Thank you for your response.
I will wait for your feedback.

Hello Andreas-san,

Thank you for your advice.

>Can you please confirm they also absorbed SDK v9.2 related changes to the `sdhci0` device tree node into their own DTS files, and re-built those.
The customer has shared with me the device tree patch file they apply for their custom board.
According to this patch file, it seems that they did not make any changes to the k3-am62-main.dtsi file.
This means that they use the original SDK v9.2 k3-am62-main.dtsi file; however, I will confirm this just in case.

Best Regards,

0 Ryuuichi machida over 1 year ago in reply to Ryuuichi machida

Guru 12655 points

Hello Andreas-san,

Here is additional information about the point below.
>This means that they use the original SDK v9.2 k3-am62-main.dtsi file; however, I will confirm this just in case.
I confirmed with the customer: as I mentioned above, the customer did NOT change "k3-am62-main.dtsi" at all. So the customer is using the original SDK v9.2 k3-am62-main.dtsi file.

Best Regards,

0 Andreas Dannenberg over 1 year ago in reply to Ryuuichi machida

TI__Guru 69652 points

Hi Machida-san,

Thanks for the confirmation. So this is a custom HW design/board I suppose, NOT a TI board?

One thing that should be done is to review the HW design/layout.

Regards, Andreas

0 Ryuuichi machida over 1 year ago in reply to Andreas Dannenberg

Guru 12655 points

Hello Andreas-san,

Yes, this is observed on their custom board.

How can we review it?

Should I send the custom board schematic and layout to you?

Best Regards,

0 Andreas Dannenberg over 1 year ago in reply to Ryuuichi machida

TI__Guru 69652 points

Hi Machida-san,

Ryuuichi machida said:
Should I send the custom board schematic and layout to you?

Two ways...

1) You can ask for the relevant section of the schematic and board layout, showing the AM62x SoC, the traces, and the eMMC (both schematic and board), leaving out all other stuff, and post it here to the forum to review, or

2) I can reach out to you via your email address stored in our E2E system, and you can send the information directly by email.

Let me know which way you prefer.

Ryuuichi machida said:
Yes, this is observed on their custom board.

Can they try to see if they can re-create this issue on a TI SK-AM62B board? If that's the case this will greatly help narrow down the issue and any analysis/debug efforts.

Regards, Andreas

0 Ryuuichi machida over 1 year ago in reply to Andreas Dannenberg

Guru 12655 points

Hello, Andreas-san,

>Can they try to see if they can re-create this issue on a TI SK-AM62B board?
In my case, I did not see this log message with SDK v9.2.

Here is the difference.
Command: $ dd if=/dev/mmcblk1 of=/dev/mmcblk0 bs=512

Result:
Ver 8.6: I got the above error with ver 8.6.
root@am62xx-evm:~# dd if=/dev/mmcblk1 of=/dev/mmcblk0 bs=512
[ 605.211905] mmc0: running CQE recovery
[ 605.220767] ------------[ cut here ]------------
[ 605.225400] mmc0: cqhci: spurious TCN for tag 1
[ 605.229988] WARNING: CPU: 0 PID: 171 at drivers/mmc/host/cqhci.c:742 cqhci_irq+0x318/0x4a0
[ 605.238231] Modules linked in: xt_conntrack xt_addrtype iptable_filter br_netfilter bridge stp llc overlay xfrm_user xfrm_algo md5 des_generic libdes cbc iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_tables x_tables wl18xx wlcore mac80211 xhci_plat_hcd xhci_hcd usbcore cfg80211 rfkill libarc4 rpmsg_char dwc3 udc_core usb_common cdns_csi2rx v4l2_fwnode pru_rproc irq_pruss_intc crct10dif_ce snd_soc_simple_card snd_soc_simple_card_utils wlcore_sdio pvrsrvkm(O) ti_k3_r5_remoteproc dwc3_am62 virtio_rpmsg_bus rti_wdt sa2ul ti_k3_m4_remoteproc sha512_generic tps6598x authenc mcrc typec snd_soc_tlv320aic3x roles j721e_csi2rx cdns_dphy videobuf2_dma_contig videobuf2_memops videobuf2_v4l2 videobuf2_common pruss sch_fq_codel cryptodev(O) ipv6
[ 605.306816] CPU: 0 PID: 171 Comm: kworker/0:1H Tainted: G O 5.10.168-g2c23e6c538 #1
[ 605.315751] Hardware name: Texas Instruments AM625 SK (DT)
[ 605.321230] Workqueue: kblockd blk_mq_run_work_fn
[ 605.325926] pstate: 40000085 (nZcv daIf -PAN -UAO -TCO BTYPE=--)
[ 605.331917] pc : cqhci_irq+0x318/0x4a0
[ 605.335654] lr : cqhci_irq+0x318/0x4a0
[ 605.339389] sp : ffff8000112cbd40
[ 605.342691] x29: ffff8000112cbd40 x28: ffff000001c40580
[ 605.347990] x27: ffff000001074880 x26: 0000000000000001
[ 605.353289] x25: ffff800010e63ff0 x24: ffff000001074898
[ 605.358589] x23: ffff80001122d730 x22: ffff000000ca3900
[ 605.363890] x21: ffff000001c40000 x20: 0000000000000002
[ 605.369189] x19: 0000000000000001 x18: 0000000000000010
[ 605.374488] x17: 0000000000000000 x16: 0000000000000000
[ 605.379788] x15: ffff000000ca3e50 x14: 00000000000001a8
[ 605.385088] x13: ffff000000ca3e50 x12: 00000000ffffffea
[ 605.390386] x11: ffff8000111b0770 x10: ffff800011198730
[ 605.395687] x9 : ffff800011198788 x8 : 0000000000017fe8
[ 605.400986] x7 : c0000000ffffefff x6 : 0000000000000001
[ 605.406285] x5 : ffff00007fb81ab8 x4 : 0000000000000000
[ 605.411584] x3 : 0000000000000027 x2 : 0000000000000023
[ 605.416884] x1 : 1e3d1bcf28029100 x0 : 0000000000000000
[ 605.422184] Call trace:
[ 605.424625] cqhci_irq+0x318/0x4a0
[ 605.428016] sdhci_am654_cqhci_irq+0x58/0x88
[ 605.432273] sdhci_irq+0xb0/0xe80
[ 605.435580] __handle_irq_event_percpu+0x54/0x178
[ 605.440270] handle_irq_event_percpu+0x34/0x90
[ 605.444699] handle_irq_event+0x48/0x90
[ 605.448526] handle_fasteoi_irq+0xb8/0x170
[ 605.452611] generic_handle_irq+0x30/0x48
[ 605.456608] __handle_domain_irq+0x64/0xc0
[ 605.460694] gic_handle_irq+0x58/0x128
[ 605.464432] el1_irq+0xcc/0x180
[ 605.467568] _raw_spin_unlock_irqrestore+0x14/0x48
[ 605.472345] cqhci_request+0xc8/0x4e8
[ 605.476001] mmc_cqe_start_req+0x58/0x68
[ 605.479914] mmc_blk_mq_issue_rq+0x49c/0x880
[ 605.484171] mmc_mq_queue_rq+0x118/0x2b0
[ 605.488082] blk_mq_dispatch_rq_list+0x104/0x7f0
[ 605.492687] __blk_mq_sched_dispatch_requests+0xd4/0x1a0
[ 605.497984] blk_mq_sched_dispatch_requests+0x38/0x78
[ 605.503021] __blk_mq_run_hw_queue+0xac/0x128
[ 605.507365] blk_mq_run_work_fn+0x20/0x30
[ 605.511366] process_one_work+0x1a0/0x348
[ 605.515363] worker_thread+0x4c/0x440
[ 605.519016] kthread+0x140/0x160
[ 605.522234] ret_from_fork+0x10/0x30
[ 605.525797] ---[ end trace 95c7690e7b7bb091 ]---
[ 605.534148] mmc0: running CQE recovery
[ 605.543928] mmc0: running CQE recovery
[ 605.553315] mmc0: running CQE recovery
[ 605.563928] mmc0: running CQE recovery
[ 605.574600] mmc0: running CQE recovery
[ 605.583716] mmc0: running CQE recovery
[ 605.620836] mmc0: running CQE recovery

Ver 9.2: I did NOT see the above error with ver 9.2. (The "No space left on device" error below occurs only because the eMMC is smaller than the SD card.)
root@am62xx-evm:~# dd if=/dev/mmcblk1 of=/dev/mmcblk0 bs=512
dd: error writing '/dev/mmcblk0': No space left on device
[ 1949.520168] mmcblk0: p1 p2
[ 1949.523523] mmcblk0: p2 size 60805120 extends beyond EOD, truncated
31080449+0 records in
31080448+0 records out
15913189376 bytes (16 GB, 15 GiB) copied, 1793.05 s, 8.9 MB/s
root@am62xx-evm:~#
--
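To make the comparison between the two SDK versions less dependent on reading the console by eye, the recovery messages could be counted per MMC host. This is only a sketch: `count_cqe_recovery` is a hypothetical helper name, and it assumes the kernel log uses exactly the message text shown in the ver 8.6 log above.

```shell
# Hypothetical helper: count "running CQE recovery" messages per MMC host.
# Reads a kernel log on stdin and prints one "<host> <count>" line per host.
count_cqe_recovery() {
    grep -o 'mmc[0-9]*: running CQE recovery' \
        | cut -d: -f1 \
        | sort \
        | uniq -c \
        | awk '{ print $2, $1 }'
}
```

For example, `dmesg | count_cqe_recovery` run after the `dd` test prints one line per host in the form `mmc0 <count>`, and prints nothing when no recovery was logged.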

Based on the quote below, I believe you are also checking whether you can observe the same phenomenon.
What was the result?

>Please allow me a day or two to test this and get back to you.

Best Regards,

0 Ryuuichi machida over 1 year ago in reply to Ryuuichi machida

Guru 12655 points

Hello Andreas-san,

I sent the schematic and layout to you by private message.
Please check my message.

BR,

0 Ryuuichi machida over 1 year ago in reply to Ryuuichi machida

Guru 12655 points

Hello Andreas-san and Prashant-san,

Do you have any feedback on the information I provided?

Best Regards,

0 Andreas Dannenberg over 1 year ago in reply to Ryuuichi machida

TI__Guru 69652 points

Hi Machida-san,

Actually, the team has been working on a support case that seems very much related (same error signature), and has developed a patch set that improves eMMC stability. The two relevant patches were posted at the link below just a few days ago. Can you please apply those (both!) to your SDK v9.2-based kernel, see if this solves the "running CQE recovery" issues you encountered, and report back here?

https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1366363/am623-problem-detecting-emmc-at-hs200-speed-during-1-8v-voltage-switch/5340995#5340995

Regards, Andreas
