AM625: eMMC transition issue

Ryuuichi machida

Part Number: AM625

Hello,

Since the following thread was locked, I am re-posting this as a related issue.
Could you please give me an update on the following thread? (Could you clarify the root cause?)
https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1299623/processor-sdk-am62x-emmc-transition-issue?tisearch=e2e-sitesearch&keymatch=%2525252520user%252525253A80765#

Best Regards,

10 months ago

Bin Liu (TI), 10 months ago

Hi Machida-san,

The fix patches were integrated just a few days ago. Could you please ask the customer to test the latest kernel git tag "09.02.00.005" on https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/log/?h=ti-linux-6.1.y.

The "SDHCI REGISTER DUMP" in the kernel log might still appear occasionally due to CRC errors from the eMMC device, but it can be ignored, as the data copy continues and finishes without data corruption.
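For reference, a minimal sketch of checking out that tag from a shell. The clone URL is an assumption (the cgit page lists the actual clone URLs), and `fetch_ti_kernel_tag` is a hypothetical helper name, not part of any TI tool.

```shell
# Hypothetical helper: clone the TI kernel tree at one tag only.
# ASSUMPTION: this clone URL; confirm it against the clone URLs listed
# on the cgit page at git.ti.com before relying on it.
TI_KERNEL_URL="${TI_KERNEL_URL:-https://git.ti.com/git/ti-linux-kernel/ti-linux-kernel.git}"

fetch_ti_kernel_tag() {
    tag="$1"                      # e.g. 09.02.00.005
    dest="${2:-ti-linux-kernel}"  # directory to clone into
    repo="${3:-$TI_KERNEL_URL}"   # override for a mirror or local copy
    # --branch also accepts a tag name; the checkout is a detached HEAD at
    # that tag. --depth 1 skips the history to keep the download small.
    git clone --depth 1 --branch "$tag" "$repo" "$dest"
}

# Usage (not run here, since it downloads the full kernel source):
#   fetch_ti_kernel_tag 09.02.00.005
#   git -C ti-linux-kernel log -1 --oneline
```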

Bin Liu (TI), 10 months ago, in reply to Bin Liu

Please also apply the patch attached below on top of tag "09.02.00.005". It went missing while resolving a git merge conflict.

0001-mmc-sdhci_am654-Add-ret-to.patch
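The attachment name follows the `0001-<subject>.patch` naming that `git format-patch` produces, so it can most likely be applied with `git am` on top of the checked-out tag. A sketch under that assumption; `apply_fix_patch` is a hypothetical helper. Note that `git am` records a commit, so `user.name` and `user.email` must be configured in git.

```shell
# Hypothetical helper: apply a git format-patch file on top of the
# currently checked-out tree, and abort cleanly if it does not apply.
apply_fix_patch() {
    tree="$1"   # kernel source directory, e.g. ti-linux-kernel
    pfile="$2"  # e.g. 0001-mmc-sdhci_am654-Add-ret-to.patch
    # Make the path absolute, because git -C runs inside $tree.
    case "$pfile" in
        /*) ;;
        *) pfile="$PWD/$pfile" ;;
    esac
    if ! git -C "$tree" am "$pfile"; then
        # Leave the tree as it was instead of in a half-applied am session.
        git -C "$tree" am --abort
        return 1
    fi
}
```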

Ryuuichi machida, 10 months ago, in reply to Bin Liu

Hello,

Thank you for your reply.

Q1. Can I conclude that this issue is caused by software rather than hardware?
Q2. It seems that your patch is for Linux kernel 6.x. My customer currently uses kernel 5.10 (SDK version 8.6).
Can we also apply the same patch in that environment?

Best Regards,

Bin Liu (TI), 10 months ago, in reply to Ryuuichi machida

Hi Machida-san,

Ryuuichi machida said:
Q1. Can I conclude that this issue is caused by software rather than hardware?

The original problem has two parts: first, the eMMC host controller received data with a CRC error; second, the kernel MMC driver tried to recover from the CRC error but failed, which stopped eMMC communication.

The root cause of the CRC error is unknown; it could be caused by hardware or by incorrect eMMC programming.

The kernel patches mentioned above fix the second part of the problem: CRC error recovery now succeeds and eMMC communication can complete.

Ryuuichi machida said:
Q2. It seems that your patch is for Linux kernel 6.x. My customer currently uses kernel 5.10 (SDK version 8.6).
Can we also apply the same patch in that environment?

Technically it can, but I am not sure whether you would hit patch merge conflicts on kernel 5.10.

I will backport these patches to kernel 5.10 and provide them to you. When do you need the v5.10 patches?

Ryuuichi machida, 10 months ago, in reply to Bin Liu

Hello Bin-san,

Thank you for your reply.
>I will backport these patches to kernel 5.10 and provide them to you.
I'm waiting for your feedback.
My customer does not have time to check this right now, even if we provide the patch.
However, they may have a chance to check it around 3/E. So I would like you to share your result by 3/M.

Best Regards,

Bin Liu (TI), 10 months ago, in reply to Ryuuichi machida

Ryuuichi machida said:
So I would like you to share your result by 3/M.

Do you mean by March 3rd, or by the middle of March?

Ryuuichi machida, 10 months ago, in reply to Bin Liu

>Do you mean by March 3rd, or by the middle of March?
Yes, correct.

BR,

Mukul Bhatnagar (TI), 10 months ago, in reply to Ryuuichi machida

Sorry, you did not clarify. Please confirm: if we share by March 18th, does that work for you?

Ryuuichi machida, 10 months ago, in reply to Mukul Bhatnagar

>Please confirm: if we share by March 18th, does that work for you?
Yes.

BR,

Ryuuichi machida, 9 months ago, in reply to Ryuuichi machida

Hello,

Is it possible to send your result by March 18th?

BR,

Bin Liu (TI), 9 months ago, in reply to Ryuuichi machida

Hi Machida-san,

Yes, I will send the patches by the end of 3/18.

Bin Liu (TI), 9 months ago, in reply to Ryuuichi machida

Hi Machida-san,

Attached below are the kernel MMC controller driver patches, backported from kernel 6.1 to 5.10.

k3-mmc-kernel-patch-backport-5.10.zip

However, while validating the patches on the SDK 8.6 kernel, I noticed that the eMMC kernel dump issue is improved but not completely resolved on kernel 5.10, whereas with the patches on kernel 6.1 the issue is fully resolved. After spending some time looking into it, I found that the kernel 6.1 MMC framework drivers contain many changes since kernel 5.10 related to timeout, reset, and CQE handling, made by the open source community. I believe these MMC framework changes, together with the MMC controller driver patches attached above, solved the eMMC kernel dump issue on kernel 6.1.

Backporting all the MMC framework changes from v6.1 to v5.10, which are 11 kernel versions apart, is not trivial. Can your customer migrate to the latest v6.1 kernel to get this issue resolved?

Ryuuichi machida, 9 months ago, in reply to Bin Liu

Hello Bin-san,

Thank you for your reply.

- 1 -
Backporting all the MMC framework changes from v6.1 to v5.10, which are 11 kernel versions apart, is not trivial. Can your customer migrate to the latest v6.1 kernel to get this issue resolved?
=> I will discuss this with the customer.

- 2 -
After applying the patch (7 files), I rebuilt the kernel with the following command. However, I got the same error.
Should I run "make linux" instead of the command below?
> $ make ARCH=arm64 CROSS_COMPILE=aarch64-none-linux-gnu- Image

- 3 -
>However, while validating the patches on the SDK 8.6 kernel, I noticed that the eMMC kernel dump issue is improved but not completely resolved on kernel 5.10.
What symptoms might we see in the log due to the above problem?

Best Regards,

Bin Liu (TI), 9 months ago, in reply to Ryuuichi machida

Hi Machida-san,

Ryuuichi machida said:
Should I run "make linux" instead of the command below?
> $ make ARCH=arm64 CROSS_COMPILE=aarch64-none-linux-gnu- Image

Assuming your kernel has already been built and the .config file exists, these two make commands do essentially the same thing as far as compiling the Image is concerned.
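As a sketch of the direct make route, the command from this thread can be wrapped so that it refuses to run on an unconfigured tree, which is the precondition stated above. The helper name `build_image` is hypothetical, and the toolchain prefix is simply the one used in this thread.

```shell
# Hypothetical helper: rebuild only the arm64 kernel Image in a tree that
# already has a .config (the precondition for the two commands to match).
build_image() {
    ksrc="${1:-.}"
    if [ ! -f "$ksrc/.config" ]; then
        echo "no .config in $ksrc; configure the kernel first" >&2
        return 2
    fi
    # Toolchain prefix as used in this thread; -j runs parallel jobs.
    make -C "$ksrc" ARCH=arm64 CROSS_COMPILE=aarch64-none-linux-gnu- \
        -j"$(nproc)" Image || return 1
    # The arm64 Image target is written to arch/arm64/boot/Image.
    ls -l "$ksrc/arch/arm64/boot/Image"
}
```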

Ryuuichi machida said:
What symptoms might we see in the log due to the above problem?

When the issue happens, the Linux console hangs for about a minute, and then the following messages are printed.

[  224.315917] mmc0: cqhci: timeout for tag 0
[  224.320036] mmc0: cqhci: ============ CQHCI REGISTER DUMP ===========
[  224.326493] mmc0: cqhci: Caps:      0x000030c8 | Version:  0x00000510
[  224.332945] mmc0: cqhci: Config:    0x00000101 | Control:  0x00000000
[  224.339384] mmc0: cqhci: Int stat:  0x00000000 | Int enab: 0x00000006
[  224.345817] mmc0: cqhci: Int sig:   0x00000006 | Int Coal: 0x00000000
...

In my test with the SDK 8.6 kernel, before applying the patches, the messages above appeared about 5 to 6 times during 'tar xvf /tisdk-default-image-am62xx-evm.tar.xz'. With the patches applied, they appeared only about 2 to 3 times.

But with the latest kernel 6.1 on git.ti.com, these messages never appeared.
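To compare runs like these (5 to 6 dumps before the patches, 2 to 3 after) without counting by eye, the kernel log can be searched for the first line of each dump. A minimal sketch, assuming every event starts with the `cqhci: timeout for tag` line shown above; `count_cqhci_timeouts` is a hypothetical name.

```shell
# Hypothetical helper: count CQHCI timeout events in a kernel log.
# Reads the file(s) given as arguments, or stdin when there are none.
count_cqhci_timeouts() {
    rc=0
    grep -c 'cqhci: timeout for tag' "$@" || rc=$?
    # grep -c prints 0 and exits 1 when nothing matches: not an error here,
    # but exit status 2 (e.g. unreadable file) is still reported.
    [ "$rc" -le 1 ]
}

# Usage on the board, after the tar extraction completes:
#   dmesg | count_cqhci_timeouts
```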

Ryuuichi machida, 9 months ago, in reply to Bin Liu

Hello Bin-san,

After building with both "make linux" and "make ARCH=arm64 CROSS_COMPILE=aarch64-none-linux-gnu- Image", I replaced the original "Image" file on the SD card with the newly built one, but I still observe the issue in each case.

I only replaced the kernel image. Should I do anything else?

Best Regards,

Bin Liu (TI), 9 months ago, in reply to Ryuuichi machida

Hi Machida-san,

Ryuuichi machida said:
I only replaced the kernel image. Should I do anything else?

No, for this specific eMMC test, replacing kernel Image itself is sufficient.

Running the 'uname -a' command on the board will show whether the new kernel Image has been installed properly.
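`uname -a` confirms which build is running; to tie that back to a specific Image file on the build host, one option is to look for the board's `uname -v` string inside the "Linux version ..." banner embedded in the Image. A sketch, assuming an uncompressed arm64 Image (what the `Image` make target produces) and GNU grep on the build host; `image_matches_running` is a hypothetical helper.

```shell
# Hypothetical helper: check that the build string reported on the board
# ("uname -v", e.g. "#3 SMP PREEMPT Thu Mar 21 07:52:42 JST 2024")
# appears in the "Linux version ..." banner embedded in an Image file.
# Assumes an uncompressed arm64 Image and GNU grep (-a, -o).
image_matches_running() {
    image="$1"            # e.g. arch/arm64/boot/Image on the build host
    running_version="$2"  # output of "uname -v" copied from the board
    banner=$(grep -a -o 'Linux version [[:print:]]*' "$image" | head -n 1)
    [ -n "$banner" ] && [ -n "$running_version" ] || return 1
    case "$banner" in
        *"$running_version"*) return 0 ;;
        *) return 1 ;;
    esac
}
```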

Ryuuichi machida, 9 months ago, in reply to Bin Liu

Hello Bin-san,

Thank you for your reply.
I compared the "Image" file size before and after applying the patch.
The file size was the same (only the timestamp changed after building).
Do you see the same result?

BR,

Bin Liu (TI), 9 months ago, in reply to Ryuuichi machida

Hi Machida-san,

Yes, interestingly, the 'Image' file size is the same before and after.

But the 'vmlinux' file size in the kernel source top directory is different, and the 'uname -a' command on the board also shows a different timestamp.
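Since identical sizes do not prove identical contents (the Image sizes matched here even though the kernels differed), a byte-for-byte comparison of the old and new Image is a more direct way to confirm that a rebuild picked up the patch. A minimal sketch; `builds_differ` is a hypothetical helper, and it assumes a copy of the old Image was saved before rebuilding.

```shell
# Hypothetical helper: report whether two build outputs differ in content.
# Prints "identical" or "different"; returns 0 only when they differ.
builds_differ() {
    rc=0
    cmp -s "$1" "$2" || rc=$?
    case "$rc" in
        0) echo "identical"; return 1 ;;
        1) echo "different"; return 0 ;;
        *) echo "cannot compare $1 and $2" >&2; return 2 ;;
    esac
}

# Usage:
#   cp arch/arm64/boot/Image /tmp/Image.before
#   ... apply the patches and rebuild ...
#   builds_differ /tmp/Image.before arch/arm64/boot/Image
```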

Ryuuichi machida, 9 months ago, in reply to Bin Liu

Hello Bin-san,

I confirmed that my Image file is the expected one.

* root@am62xx-evm:~# uname -a
Linux am62xx-evm 5.10.168-g2c23e6c538 #3 SMP PREEMPT Thu Mar 21 07:52:42 JST 2024 aarch64 aarch64 aarch64 GNU/Linux

Here is my test.

* dd if=/dev/mmcblk1 of=/dev/mmcblk0 bs=512

When I ran this before, I got an error.
After applying the patch (patch -p1 xxx.patch), I ran the same test and still got the same "Buffer I/O error" as before.

Could you perform the same test in your environment?

BR,
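About the `patch -p1 xxx.patch` form mentioned above: if it was run literally, note that GNU `patch` treats a bare file argument as the file to be patched and reads the diff itself from standard input, so the usual forms are `patch -p1 < xxx.patch` or `patch -p1 -i xxx.patch`. Running with `--dry-run` first checks that every hunk applies before anything is modified. A sketch; `apply_patch_checked` is a hypothetical helper and the patch directory in the usage comment is a placeholder.

```shell
# Hypothetical helper: apply a unified diff to a source tree only if every
# hunk applies cleanly, checked first with --dry-run.
apply_patch_checked() {
    tree="$1"       # top of the kernel source tree
    patchfile="$2"  # e.g. one of the backported .patch files
    case "$patchfile" in
        /*) ;;
        *) patchfile="$PWD/$patchfile" ;;  # the cd below would break a relative path
    esac
    # -p1 strips the leading a/ and b/ components used in git-style diffs.
    # -i names the patch file explicitly. --forward skips (and reports as a
    # failure) a patch that looks already applied instead of prompting.
    (cd "$tree" && patch -p1 --forward --dry-run -i "$patchfile" >/dev/null) || return 1
    (cd "$tree" && patch -p1 --forward -i "$patchfile")
}

# Usage, applying a set of patches in name order (placeholder path):
#   for p in /path/to/patches/*.patch; do
#       apply_patch_checked . "$p" || break
#   done
```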

Bin Liu (TI), 9 months ago, in reply to Ryuuichi machida

Hi Machida-san,

As I mentioned earlier:

"
After spending some time looking into it, I found that the kernel 6.1 MMC framework drivers contain many changes since kernel 5.10 related to timeout, reset, and CQE handling, made by the open source community. I believe these MMC framework changes, together with the MMC controller driver patches attached above, solved the eMMC kernel dump issue on kernel 6.1.
"

So these backported MMC controller driver patches alone do not solve this eMMC issue on kernel 5.10. Why are you still trying to test these backported patches anyway?

Ryuuichi machida, 9 months ago, in reply to Bin Liu

Hello Bin-san,

I was a little confused by your following description.

>I noticed that the eMMC kernel dump issue is improved but not completely resolved on kernel 5.10.
From the above reply, I thought I would be able to confirm some improvement on kernel 5.10 after applying the patch.

>In my test with the SDK 8.6 kernel, before applying the patches, the messages above appeared about 5 to 6 times during 'tar xvf /tisdk-default-image-am62xx-evm.tar.xz'. With the patches applied, they appeared only about 2 to 3 times.
Regarding the above, did you simply mean that the error message appears less often than on a kernel without the patch, but the error is not resolved?

BR,

Bin Liu (TI), 9 months ago, in reply to Ryuuichi machida

Hi Machida-san,

Ryuuichi machida said:
Regarding the above, did you simply mean that the error message appears less often than on a kernel without the patch, but the error is not resolved?

Correct.

The following MMC error happened less often with the patches on kernel 5.10, but did not completely disappear. On kernel 6.1 with these patches, this MMC error does not happen at all (it does happen without them).

[  220.219920] mmc0: cqhci: timeout for tag 0
[  220.224042] mmc0: cqhci: ============ CQHCI REGISTER DUMP ===========
[  220.230491] mmc0: cqhci: Caps:      0x000030c8 | Version:  0x00000510
[  220.236932] mmc0: cqhci: Config:    0x00000101 | Control:  0x00000000
[  220.243391] mmc0: cqhci: Int stat:  0x00000000 | Int enab: 0x00000006
...
[  220.282007] mmc0: cqhci: SSC2:      0x00000001 | DCMD rsp: 0x00000000
[  220.288440] mmc0: cqhci: RED mask:  0xfdf9a080 | TERRI:    0x992f1a2c
[  220.294873] mmc0: cqhci: Resp idx:  0x0000002f | Resp arg: 0x00000900
[  220.301307] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
[  220.307741] mmc0: sdhci: Sys addr:  0x00000400 | Version:  0x00001004
[  220.314178] mmc0: sdhci: Blk size:  0x00007080 | Blk cnt:  0x00000000
[  220.320612] mmc0: sdhci: Argument:  0x00000000 | Trn mode: 0x00000013
...
[  220.384966] mmc0: sdhci: Resp[2]:   0x328f5903 | Resp[3]:  0x00d07f01
[  220.391398] mmc0: sdhci: Host ctl2: 0x0000000b
[  220.395839] mmc0: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x00000000817db20c
[  220.402970] mmc0: sdhci: ============================================

Basically, there is no easy solution for kernel 5.10. I recommend migrating to the latest kernel on git.ti.com, or waiting for the SDK 9.2 release, which will be available in a few weeks.

Ryuuichi machida, 9 months ago, in reply to Bin Liu

Hello Bin-san,

I totally understand.
I do not have an SDK 9.x environment at this time, so I will try the patch later.
Is my understanding correct that users do not need to apply the patch to avoid this issue when they use SDK 9.2?
(Or do they still need to apply the patch even with the latest (9.2) SDK?)

BR,

Bin Liu (TI), 9 months ago, in reply to Ryuuichi machida

Hi Machida-san,

No, you don't need to apply any patch if you use the upcoming SDK 9.2 release, or today's latest kernel (git tag 09.02.00.008) on git.ti.com: https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/tag/?h=09.02.00.008

Ryuuichi machida, 9 months ago, in reply to Bin Liu

Hello Bin-san,

Understood. I will try to check the above, but I will close this thread.
(If I have questions about the latest SDK, I will open another thread.)

BR,

Andreas Dannenberg (TI), 9 months ago, in reply to Ryuuichi machida

Machida-san,

If you experiment with SDK v9.x, I want to point out one item; see my linked post below. Issues were recently discovered with HS200/HS400 modes in SDK v9.1 and SDK v9.2, in which ITAPDLY calibration values don't get applied, leading to various PHY-level breakdowns that manifest themselves as bit and CRC errors, just as shown in the OP. The post has a solution to try out.

https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1344972/am623-issues-with-sandisk-emmc-entering-hs200-mode/5124827#5124827

I'm not sure whether you want to use HS200 mode on your AM62x, so I wanted to point it out.

Regards, Andreas
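To check which bus timing the eMMC is actually using on a given kernel, the MMC core exposes an `ios` file in debugfs. A sketch, assuming debugfs is mounted at /sys/kernel/debug, the commands run as root, and the eMMC is mmc0 as in the logs above; `emmc_timing` is a hypothetical helper.

```shell
# Hypothetical helper: print the "timing spec" line from an MMC host's
# debugfs ios file; returns 0 for HS200/HS400, 1 for other modes,
# 2 if the line cannot be read.
# ASSUMPTION: debugfs is mounted at /sys/kernel/debug (root required).
emmc_timing() {
    ios="${1:-/sys/kernel/debug/mmc0/ios}"
    line=$(grep 'timing spec' "$ios") || return 2
    printf '%s\n' "$line"
    case "$line" in
        *HS200*|*HS400*) return 0 ;;
        *) return 1 ;;
    esac
}
```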
