AM625: eMMC transition issue

Part Number: AM625

Hello,

Since the following thread was locked, I am re-posting this as a related question.
Could you please give me an update on the following thread? (Could you clarify the root cause?)
https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1299623/processor-sdk-am62x-emmc-transition-issue?tisearch=e2e-sitesearch&keymatch=%2525252520user%252525253A80765#

Best Regards,


  • Hi Machida-san,

    The fix patches were integrated just a few days ago. Could you please ask the customer to test the latest kernel git tag "09.02.00.005" on https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/log/?h=ti-linux-6.1.y?

    The "SDHCI REGISTER DUMP" in the kernel log might still appear occasionally due to CRC errors from the eMMC device, but it can be ignored, as the data copy is still ongoing and will finish without data corruption.

  • Please also apply the patch attached below on top of the tag "09.02.00.005". It was dropped while resolving a git merge conflict.

    0001-mmc-sdhci_am654-Add-ret-to.patch

  • Hello,

    Thank you for your reply.

    Q1. Can I conclude that this issue is caused by software and not by hardware?
    Q2. It seems that your patch is for Linux kernel 6.x. My customer currently uses kernel 5.10 (they use SDK version 8.6).
    Can we also apply the same patch to that environment?

    Best Regards,

  • Hi Machida-san,

    Q1. Can I conclude that this issue is caused by software and not by hardware?

    The original problem has two parts: first, the eMMC host controller received data with a CRC error; second, the kernel MMC driver tried to recover from the CRC error but failed, which caused eMMC communication to stop.

    The root cause of the CRC error is unknown; it could be caused by hardware or by incorrect eMMC programming.

    The kernel patches mentioned above fix the second part of the problem: they make the CRC error recovery succeed, so eMMC communication can complete.

    Q2. It seems that your patch is for Linux kernel 6.x. My customer currently uses kernel 5.10 (they use SDK version 8.6).
    Can we also apply the same patch to that environment?

    Technically it can, but I am not sure whether you would hit patch merge conflicts on kernel 5.10.

    I will backport these patches to kernel 5.10 and provide them to you. When do you need the v5.10 patches?

  • Hello Bin-san,

    Thank you for your reply.
    >I will backport these patches to kernel 5.10 and provide them to you.
    Thank you for your response. I'm waiting for your feedback.
    My customer does not have time to check this right now, even if we can provide the patch.
    However, they may have a chance to check it around 3/E, so I would like you to share your results by 3/M.

    Best Regards,

  • So I wish you share your result by 3/M.

    Do you mean by March 3rd, or by the middle of March?

  • >Do you mean by March 3rd or middle of March?
    Yes, correct.

    BR,

  • Sorry, that did not clarify it. Please confirm: if we share the patches by March 18th, does that work for you?

    >Please confirm: if we share the patches by March 18th, does that work for you?
    Yes.

    BR,

  • Hello,

    Is it possible to send your results by March 18th?

    BR,

  • Hi Machida-san,

    Yes, I will send the patches by the end of 3/18.

  • Hi Machida-san,

    Attached below are the kernel MMC controller driver patches, backported from kernel 6.1 to 5.10.

    k3-mmc-kernel-patch-backport-5.10.zip

    However, while validating the patches on the SDK8.6 kernel, I noticed that the eMMC kernel dump issue is improved but not completely resolved on kernel 5.10, whereas with the patches on kernel 6.1 the issue is fully resolved. After spending some time looking into it, I found that the kernel 6.1 MMC framework drivers contain many changes since kernel 5.10 related to timeout/reset/CQE handling, made by the open-source community. I believe these MMC framework changes, together with the MMC controller driver patches attached above, are what solved the eMMC kernel dump issue on kernel 6.1.

    However, it is not trivial to backport all the MMC framework changes from v6.1 to v5.10, which are 11 kernel versions apart. Can your customer migrate to the latest v6.1 kernel to get this issue resolved?

  • Hello Bin-san,

    Thank you for your reply.

    - 1 -
    However, it is not trivial to backport all the MMC framework changes from v6.1 to v5.10, which are 11 kernel versions apart. Can your customer migrate to the latest v6.1 kernel to get this issue resolved?
    => I will discuss this with the customer.

    - 2 -
    After applying the patch (7 files), I rebuilt the kernel using the following command. However, I got the same error.
    Should I run "make linux" instead of the command below?
    > $ make ARCH=arm64 CROSS_COMPILE=aarch64-none-linux-gnu- Image

    - 3 -
    >However, while validating the patches in the SDK8.6 kernel, I noticed that the eMMC kernel dump issue gets improved but not completely resolved in kernel 5.10.
    What symptom would we see in the log due to the above problem?

    Best Regards, 

  • Hi Machida-san,

    Should I perform "make linux" instead of below ?
    > $ make ARCH=arm64 CROSS_COMPILE=aarch64-none-linux-gnu- Image

    Assuming your kernel has already been built once and the .config file exists, these two make commands do basically the same thing when compiling Image.

    What phenomenon we may see on the log due to above problem ?

    When the issue happens, the Linux console is stuck for about a minute, then the following messages are printed.

    [  224.315917] mmc0: cqhci: timeout for tag 0             
    [  224.320036] mmc0: cqhci: ============ CQHCI REGISTER DUMP ===========
    [  224.326493] mmc0: cqhci: Caps:      0x000030c8 | Version:  0x00000510
    [  224.332945] mmc0: cqhci: Config:    0x00000101 | Control:  0x00000000
    [  224.339384] mmc0: cqhci: Int stat:  0x00000000 | Int enab: 0x00000006
    [  224.345817] mmc0: cqhci: Int sig:   0x00000006 | Int Coal: 0x00000000
    ...
    

    In my test with the SDK8.6 kernel, before applying the patches, the messages above appeared about 5~6 times during 'tar xvf /tisdk-default-image-am62xx-evm.tar.xz'. With the patches applied, they appeared only about 2~3 times.

    But with the latest kernel 6.1 on git.ti.com, this message never appeared.
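To compare runs like these (5~6 occurrences before the patches, 2~3 after) without counting by hand, a small script can tally the CQHCI timeouts in a saved console log. This is a minimal sketch, assuming the log has been saved to a text file (for example from the serial console or `dmesg`); the helper name `count_cqhci_timeouts` is hypothetical, and the match pattern is taken only from the log lines quoted in this thread.

```python
import re
import sys
from collections import Counter

# Matches lines such as "[  224.315917] mmc0: cqhci: timeout for tag 0"
# (format taken from the logs quoted in this thread).
TIMEOUT_RE = re.compile(r"(mmc\d+): cqhci: timeout for tag (\d+)")


def count_cqhci_timeouts(log_text: str) -> Counter:
    """Return the number of 'cqhci: timeout for tag' events per MMC host."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python3 count_timeouts.py console.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for host, n in sorted(count_cqhci_timeouts(f.read()).items()):
            print(f"{host}: {n} timeout(s)")
```

Running it on the log captured with and without the patches gives a per-host count that can be compared directly.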

  • Hello Bin-san,

    After building with both "make linux" and "make ARCH=arm64 CROSS_COMPILE=aarch64-none-linux-gnu- Image", I replaced the original "Image" file on the SD card with the newly built "Image" file; however, I still observe the issue in each case.

    I only replaced the kernel image. Should I do anything else?

    Best Regards,

  • Hi Machida-san,

    I only replaced the kernel image. Should I do anything else?

    No, for this specific eMMC test, replacing kernel Image itself is sufficient.

    Running the 'uname -a' command on the board verifies whether the new kernel Image has been installed properly.
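As a quick cross-check, the `uname -a` line can be parsed to pull out the kernel release string, build number, and build timestamp, so they can be compared with the build that was just made. This is a minimal sketch, assuming the usual `uname -a` layout (the sample in the test is the output posted in this thread). The `-g<hash>` suffix is the git commit that is typically appended to the release string when the kernel is built from a git tree with `CONFIG_LOCALVERSION_AUTO` enabled; the helper name `parse_uname` is hypothetical.

```python
import re
from typing import Optional

# Expected layout, e.g.:
# Linux <host> <release> #<build> SMP PREEMPT Thu Mar 21 07:52:42 JST 2024 aarch64 ...
UNAME_RE = re.compile(
    r"^(?P<sysname>\S+) (?P<host>\S+) (?P<release>\S+) "
    r"#(?P<build>\d+) (?P<flags>.*?)"
    r"(?P<stamp>\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \S+ \d{4})"
)
# Git hash suffix such as "-g2c23e6c538", optionally followed by "-dirty".
HASH_RE = re.compile(r"-g([0-9a-f]{7,40})(?:-dirty)?$")


def parse_uname(line: str) -> Optional[dict]:
    """Split a 'uname -a' line into release, build number, timestamp and git hash."""
    match = UNAME_RE.match(line.strip())
    if not match:
        return None
    info = {
        "release": match.group("release"),
        "build": int(match.group("build")),
        "stamp": match.group("stamp"),
    }
    git = HASH_RE.search(info["release"])
    info["git_hash"] = git.group(1) if git else None
    return info
```

If the build number, timestamp, or git hash does not match the freshly built kernel, the board is most likely still booting the old Image.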

  • Hello Bin-san,

    Thank you for your reply.
    I compared the "Image" file size before and after applying the patch.
    The file size was the same (the timestamp changed after building).
    Do you see the same result?

    BR,

  • Hi Machida-san,

    Yes, interestingly, the 'Image' file size is the same before and after.

    But the 'vmlinux' file sizes in the kernel source top directory are different, and the 'uname -a' command on the board also shows a different timestamp.

  • Hello Bin-san,

    I confirmed that my Image file is the expected one.

    * root@am62xx-evm:~# uname -a
    Linux am62xx-evm 5.10.168-g2c23e6c538 #3 SMP PREEMPT Thu Mar 21 07:52:42 JST 2024 aarch64 aarch64 aarch64 GNU/Linux

    Here is my test.

    * dd if=/dev/mmcblk1 of=/dev/mmcblk0 bs=512

    When I ran this before, I got an error.
    After applying the patch (patch -p1 xxx.patch), I ran the same test, but I still get "Buffer I/O error" (the same error as before).

    Is it possible to run the same test in your environment?
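To tell a real data problem apart from a CRC error that the driver recovered from, the copy can be checked after `dd` finishes by comparing source and destination byte for byte. A minimal sketch, assuming both devices (or image files) are readable by the user running it and that only the first `length` bytes (the size of the smaller device) matter; `size_of` and `first_mismatch` are hypothetical helpers, not part of any TI tool.

```python
import os
import sys
from typing import Optional

CHUNK = 1 << 20  # compare 1 MiB at a time


def size_of(path: str) -> int:
    """Size in bytes; seeking to the end also works for Linux block devices."""
    with open(path, "rb") as f:
        return f.seek(0, os.SEEK_END)


def first_mismatch(src_path: str, dst_path: str, length: int) -> Optional[int]:
    """Return the offset of the first differing byte within `length` bytes, or None."""
    with open(src_path, "rb") as src, open(dst_path, "rb") as dst:
        offset = 0
        while offset < length:
            want = min(CHUNK, length - offset)
            a, b = src.read(want), dst.read(want)
            if a != b:
                # Locate the exact byte inside this chunk.
                for i in range(min(len(a), len(b))):
                    if a[i] != b[i]:
                        return offset + i
                return offset + min(len(a), len(b))  # one side ended early
            if len(a) < want:
                break  # both sides reached end of file at the same point
            offset += len(a)
    return None


if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g. (as root): python3 cmp_copy.py /dev/mmcblk1 /dev/mmcblk0
    src, dst = sys.argv[1], sys.argv[2]
    bad = first_mismatch(src, dst, min(size_of(src), size_of(dst)))
    print("copy OK" if bad is None else f"first mismatch at byte {bad}")
```

Running `sync` and then `echo 3 > /proc/sys/vm/drop_caches` before the comparison helps make sure the data is re-read from the devices rather than served from the page cache.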

    BR,

  • Hi Machida-san,

    As I mentioned earlier:

    "
    After spending some time looking into it, I found that the kernel 6.1 MMC framework drivers contain many changes since kernel 5.10 related to timeout/reset/CQE handling, made by the open-source community. I believe these MMC framework changes, together with the MMC controller driver patches attached above, are what solved the eMMC kernel dump issue on kernel 6.1.
    "

    So these backported MMC controller driver patches alone do not solve this eMMC issue on kernel 5.10. Why are you still testing these backported patches?

  • Hello Bin-san,

    I was a little confused by the following part of your description.

    >I noticed that the eMMC kernel dump issue gets improved but not completely resolved in kernel 5.10.
    From your reply above, I thought I would see some improvement on kernel 5.10 after applying the patch.

    >In my test with SDK8.6 kernel, before applying the patches, the messages above happened about 5~6 times during 'tar xvf /tisdk-default-image-am62xx-evm.tar.xz'. But it only happened about 2~3 times when the patches are applied.
    For the above, did you simply mean that the error messages occur less often than on the unpatched kernel, but the error is not resolved?

    BR,

  • Hi Machida-san,

    For the above, did you simply mean that the error messages occur less often than on the unpatched kernel, but the error is not resolved?

    Correct.

    The following MMC error happened fewer times with the patches on kernel 5.10, but it did not completely disappear. On kernel 6.1 with these patches, this MMC error doesn't happen at all (it does happen without these patches).

    [  220.219920] mmc0: cqhci: timeout for tag 0
    [  220.224042] mmc0: cqhci: ============ CQHCI REGISTER DUMP ===========
    [  220.230491] mmc0: cqhci: Caps:      0x000030c8 | Version:  0x00000510
    [  220.236932] mmc0: cqhci: Config:    0x00000101 | Control:  0x00000000
    [  220.243391] mmc0: cqhci: Int stat:  0x00000000 | Int enab: 0x00000006
    ...
    [  220.282007] mmc0: cqhci: SSC2:      0x00000001 | DCMD rsp: 0x00000000
    [  220.288440] mmc0: cqhci: RED mask:  0xfdf9a080 | TERRI:    0x992f1a2c
    [  220.294873] mmc0: cqhci: Resp idx:  0x0000002f | Resp arg: 0x00000900
    [  220.301307] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
    [  220.307741] mmc0: sdhci: Sys addr:  0x00000400 | Version:  0x00001004
    [  220.314178] mmc0: sdhci: Blk size:  0x00007080 | Blk cnt:  0x00000000
    [  220.320612] mmc0: sdhci: Argument:  0x00000000 | Trn mode: 0x00000013
    ...
    [  220.384966] mmc0: sdhci: Resp[2]:   0x328f5903 | Resp[3]:  0x00d07f01
    [  220.391398] mmc0: sdhci: Host ctl2: 0x0000000b
    [  220.395839] mmc0: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x00000000817db20c
    [  220.402970] mmc0: sdhci: ============================================

    Basically, there is no easy solution for kernel 5.10. I recommend that you migrate to the latest kernel on git.ti.com or wait for the SDK9.2 release, which will be available in a few weeks.

  • Hello Bin-san,

    I totally understand.
    I do not have an SDK9.x environment at this time, so I will try the patch later.
    Is my understanding correct that users do not need to apply any patch to avoid this issue when they use SDK 9.2?
    (Or do they still need to apply the patch even with the latest SDK, 9.2?)

    BR,

  • Hi Machida-san,

    No, you don't need to apply any patch if you use the upcoming SDK 9.2 release or today's latest kernel (git tag 09.02.00.008) on git.ti.com: https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/tag/?h=09.02.00.008.

  • Hello Bin-san,

    Understood. I will try to check the above, and I will close this thread.
    (If I have questions about the latest SDK, I will open another thread.)

    BR,

  • Machida-san,

    If you experiment with SDK v9.x, I want to point out one item; see my linked post below. Issues were recently discovered with HS200/HS400 modes in SDK v9.1 and SDK v9.2 in which the ITAPDLY calibration values don't get applied, leading to various PHY-level breakdowns that show up as bit and CRC errors, just as shown in the OP. The post has a solution to try out.

    https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1344972/am623-issues-with-sandisk-emmc-entering-hs200-mode/5124827#5124827

    I'm not sure whether you want to use HS200 mode on your AM62x, so I wanted to point it out.

    Regards, Andreas