AM625: SD Flash eMMC Problem

Part Number: AM625

Hi TI Experts,

The customer is working with SDK 9.2.

They have already built 20 boards based on the AM625.

With the default SDK, they initially (two months ago) found that none of the boards could flash the eMMC from the SD card because of the I/O errors below.

[ 34.100373] mmc0: running CQE recovery
[ 34.235006] mmc0: running CQE recovery
[ 34.243545] mmc0: running CQE recovery
[ 34.251916] mmc0: running CQE recovery
[ 34.257663] I/O error, dev mmcblk0, sector 14721024 op 0x1:(WRITE) flags 0x4000 phys_seg 9 prio class 2
[ 34.267831] EXT4-fs warning (device mmcblk0p2): ext4_end_bio:343: I/O error 10 writing to inode 259130 starting block 1840256)
[ 34.279362] Buffer I/O error on device mmcblk0p2, logical block 1806336
[ 34.286082] Buffer I/O error on device mmcblk0p2, logical block 1806337
[ 34.292776] Buffer I/O error on device mmcblk0p2, logical block 1806338
[ 34.299443] Buffer I/O error on device mmcblk0p2, logical block 1806339
[ 34.306145] Buffer I/O error on device mmcblk0p2, logical block 1806340
[ 34.312842] Buffer I/O error on device mmcblk0p2, logical block 1806341
[ 34.319527] Buffer I/O error on device mmcblk0p2, logical block 1806342
[ 34.326200] Buffer I/O error on device mmcblk0p2, logical block 1806343
[ 34.332865] Buffer I/O error on device mmcblk0p2, logical block 1806344
[ 34.339556] Buffer I/O error on device mmcblk0p2, logical block 1806345

We worked around this by keeping the eMMC in a low-speed mode with the "no-1-8-v" device tree property.

After that change, the I/O errors did not come back.

When the customer built more boards, they found 2 boards that still could not flash the eMMC from the SD card, even with "no-1-8-v". Flashing was attempted multiple times on these 2 boards, and every attempt failed with the following error log.

[ 11.420127] platform 2b300050.target-module: deferred probe pending
[ 25.735248] Unable to handle kernel execute from non-executable memory at virtual address ffff800009255f40
[ 25.744935] Mem abort info:
[ 25.747739] ESR = 0x000000008600000f
[ 25.751487] EC = 0x21: IABT (current EL), IL = 32 bits
[ 25.756799] SET = 0, FnV = 0
[ 25.759852] EA = 0, S1PTW = 0
[ 25.762982] FSC = 0x0f: level 3 permission fault
[ 25.767772] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000082f54000
[ 25.774467] [ffff800009255f40] pgd=10000000f7fff003, p4d=10000000f7fff003, pud=10000000f7ffe003, pmd=10000000f7ff9003, pte=0078000083255703
[ 25.786994] Internal error: Oops: 000000008600000f [#1] PREEMPT SMP
[ 25.793249] Modules linked in:
[ 25.796300] CPU: 2 PID: 204 Comm: tar Not tainted 6.1.83-dirty #2
[ 25.802384] Hardware name: Texas Instruments AM625 Gree Board V03 (DT)
[ 25.808897] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 25.815847] pc : ecap_perms+0x288/0x5e0
[ 25.819687] lr : rcu_core+0x22c/0x5e0
[ 25.823348] sp : ffff80000930bed0
[ 25.826651] x29: ffff80000930bef0 x28: ffff8000091e2680 x27: ffff8000080ce53c
[ 25.833781] x26: 0000000000000000 x25: ffff8000091e27c0 x24: ffff000077b9aaf8
[ 25.840910] x23: 000000000000000a x22: 0000000000000000 x21: ffff000077b9aa80
[ 25.848038] x20: ffff0000002bc740 x19: 0000000000000001 x18: 0000000000000000
[ 25.855165] x17: ffff80006eb95000 x16: ffff800009308000 x15: 000082a5e37196b6
[ 25.862294] x14: 0117f48048d64294 x13: 0000000000000001 x12: 0000000000000004
[ 25.869422] x11: 0000000000000001 x10: ffff80006eb95000 x9 : 0000000000000000
[ 25.876550] x8 : ffff000077b9ab08 x7 : ffff000077b9ab10 x6 : 0000000000000000
[ 25.883678] x5 : ffff0000047bf900 x4 : ffff000077b9ab00 x3 : ffff0000047bfa00
[ 25.890806] x2 : ffff0000047bf000 x1 : ffff800009255f40 x0 : ffff000003a7d000
[ 25.897936] Call trace:
[ 25.900376] ecap_perms+0x288/0x5e0
[ 25.903858] rcu_core_si+0x10/0x20
[ 25.907255] _stext+0x124/0x28c
[ 25.910391] ____do_softirq+0x10/0x20
[ 25.914046] call_on_irq_stack+0x24/0x4c
[ 25.917961] do_softirq_own_stack+0x1c/0x30
[ 25.922136] __irq_exit_rcu+0xb4/0xe0
[ 25.925793] irq_exit_rcu+0x10/0x20
[ 25.929274] el1_interrupt+0x38/0x70
[ 25.932846] el1h_64_irq_handler+0x18/0x2c
[ 25.936935] el1h_64_irq+0x64/0x68
[ 25.940328] _raw_spin_unlock_irqrestore+0xc/0x50
[ 25.945025] __wake_up_sync_key+0x20/0x30
[ 25.949028] pipe_read+0x3a4/0x3e4
[ 25.952425] vfs_read+0x27c/0x2a4
[ 25.955734] ksys_read+0xe4/0xfc
[ 25.958955] __arm64_sys_read+0x1c/0x30
[ 25.962783] invoke_syscall+0x48/0x114
[ 25.966527] el0_svc_common.constprop.0+0x44/0xfc
[ 25.971223] do_el0_svc+0x20/0x30
[ 25.974531] el0_svc+0x28/0xa0
[ 25.977579] el0t_64_sync_handler+0xbc/0x140
[ 25.981841] el0t_64_sync+0x18c/0x190
[ 25.985500] Code: 00000000 00000000 00000000 00000000 (00000000)
[ 25.991583] ---[ end trace 0000000000000000 ]---
[ 25.996190] Kernel panic - not syncing: Oops: Fatal exception in interrupt
[ 26.003050] SMP: stopping secondary CPUs
[ 26.006966] Kernel Offset: disabled
[ 26.010443] CPU features: 0x00000,00800084,0000420b
[ 26.015310] Memory Limit: none
[ 26.018363] ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt]---

We thought this was strongly related to the eMMC on those 2 boards. However, this afternoon we found that the other 18 "good" boards can also fail if we repeat the flashing test more often. We flashed the eMMC on 1 good board 8 times and it passed, but the 9th attempt failed with the similar error log below.

[  277.952523] Unable to handle kernel paging request at virtual address dead000000000122
[  277.960481] Mem abort info:
[  277.963272]   ESR = 0x0000000096000044
[  277.967012]   EC = 0x25: DABT (current EL), IL = 32 bits
[  277.972313]   SET = 0, FnV = 0
[  277.975357]   EA = 0, S1PTW = 0
[  277.978487]   FSC = 0x04: level 0 translation fault
[  277.983353] Data abort info:
[  277.986221]   ISV = 0, ISS = 0x00000044
[  277.990045]   CM = 0, WnR = 1
[  277.993002] [dead000000000122] address between user and kernel address ranges
[  278.000124] Internal error: Oops: 0000000096000044 [#1] PREEMPT SMP
[  278.006378] Modules linked in:
[  278.009434] CPU: 2 PID: 211 Comm: umount Not tainted 6.1.83-00005-gec2ea3e46d64-dirty #4
[  278.017512] Hardware name: Texas Instruments AM625 Gree Board V03 (DT)
[  278.024025] pstate: 800000c5 (Nzcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  278.030976] pc : free_pcppages_bulk+0x160/0x1fc
[  278.035513] lr : free_pcppages_bulk+0x134/0x1fc
[  278.040034] sp : ffff80000a3537c0
[  278.043337] x29: ffff80000a3537c0 x28: 0000000000000000 x27: 0000000000000210
[  278.050467] x26: 0000000000000001 x25: ffff000077b9d110 x24: 000000000000000c
[  278.057595] x23: ffff000077b9d100 x22: ffff800008ee6000 x21: ffff000077b9d128
[  278.064723] x20: 0000000000000001 x19: 0000000000000000 x18: 0000000000000000
[  278.071851] x17: 0000000000000000 x16: 0000000000000000 x15: fffffc00015cbf40
[  278.078979] x14: 00000000fffffffd x13: dead000000000100 x12: dead000000000122
[  278.086108] x11: 0000000000000001 x10: 00000000f0000080 x9 : ffff800009275368
[  278.093236] x8 : 00000000f0000000 x7 : 00000000000972fc x6 : 0000000000000002
[  278.100364] x5 : fffffc00005cbf88 x4 : 0000000000000001 x3 : ffff800009275438
[  278.107492] x2 : ffff000077b9d128 x1 : fffffc00015cbf48 x0 : dead000000000122
[  278.114621] Call trace:
[  278.117060]  free_pcppages_bulk+0x160/0x1fc
[  278.121237]  free_unref_page_commit+0x104/0x130
[  278.125760]  free_unref_page_list+0x274/0x3ec
[  278.130110]  release_pages+0x138/0x42c
[  278.133858]  __pagevec_release+0x28/0x70
[  278.137772]  truncate_inode_pages_range+0x140/0x4d0
[  278.142640]  truncate_inode_pages_final+0x50/0x80
[  278.147335]  ext4_evict_inode+0x60/0x4c0
[  278.151256]  evict+0xa4/0x190
[  278.154221]  evict_inodes+0x148/0x1e4
[  278.157876]  generic_shutdown_super+0x44/0x170
[  278.162315]  kill_block_super+0x20/0x70
[  278.166144]  deactivate_locked_super+0x44/0xe0
[  278.170581]  deactivate_super+0x88/0xa0
[  278.174410]  cleanup_mnt+0x98/0x130
[  278.177892]  __cleanup_mnt+0x14/0x20
[  278.181459]  task_work_run+0x80/0xe0
[  278.185029]  do_notify_resume+0x214/0xd9c
[  278.189036]  el0_svc+0x88/0xa0
[  278.192088]  el0t_64_sync_handler+0xbc/0x140
[  278.196351]  el0t_64_sync+0x18c/0x190
[  278.200011] Code: d100202f a9400022 b94021e4 f9000440 (f9000002)
[  278.206092] ---[ end trace 0000000000000000 ]---
[  278.210699] note: umount[211] exited with irqs disabled
[  278.215980] note: umount[211] exited with preempt_count 3
[  298.955134] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[  298.961241] rcu:     1-...0: (0 ticks this GP) idle=cf7c/1/0x4000000000000000 softirq=4198/4198 fqs=1211
[  298.970450] rcu:     2-...0: (3 ticks this GP) idle=76fc/1/0x4000000000000000 softirq=6576/6578 fqs=1211
[  298.979657]  (detected by 3, t=5253 jiffies, g=8697, q=265 ncpus=4)
[  298.985914] Task dump for CPU 1:
[  298.989132] task:kcompactd0      state:R  running task     stack:0     pid:38    ppid:2      flags:0x0000000a
[  298.999037] Call trace:
[  299.001475]  __switch_to+0xd4/0x130
[  299.004970]  contig_page_data+0x1900/0x1d00
[  299.009151] Task dump for CPU 2:
[  299.012367] task:umount          state:R  running task     stack:0     pid:211   ppid:182    flags:0x00000006
[  299.022269] Call trace:
[  299.024704]  __switch_to+0xd4/0x130
[  299.028188]  0xfffffc00000c5240
[  361.983133] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[  361.989238] rcu:     1-...0: (0 ticks this GP) idle=cf7c/1/0x4000000000000000 softirq=4198/4198 fqs=4365
[  361.998446] rcu:     2-...0: (3 ticks this GP) idle=76fc/1/0x4000000000000000 softirq=6576/6578 fqs=4365
[  362.007653]  (detected by 3, t=21015 jiffies, g=8697, q=276 ncpus=4)
[  362.013997] Task dump for CPU 1:
[  362.017214] task:kcompactd0      state:R  running task     stack:0     pid:38    ppid:2      flags:0x0000000a
[  362.027118] Call trace:
[  362.029556]  __switch_to+0xd4/0x130
[  362.033048]  contig_page_data+0x1900/0x1d00
[  362.037228] Task dump for CPU 2:
[  362.040445] task:umount          state:R  running task     stack:0     pid:211   ppid:182    flags:0x00000006
[  362.050345] Call trace:
[  362.052781]  __switch_to+0xd4/0x130
[  362.056265]  0xfffffc00000c5240
[  425.063132] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[  425.069227] rcu:     1-...0: (0 ticks this GP) idle=cf7c/1/0x4000000000000000 softirq=4198/4198 fqs=7516
[  425.078436] rcu:     2-...0: (3 ticks this GP) idle=76fc/1/0x4000000000000000 softirq=6576/6578 fqs=7516
[  425.087642]  (detected by 3, t=36785 jiffies, g=8697, q=277 ncpus=4)
[  425.093985] Task dump for CPU 1:
[  425.097202] task:kcompactd0      state:R  running task     stack:0     pid:38    ppid:2      flags:0x0000000a
[  425.107105] Call trace:
[  425.109542]  __switch_to+0xd4/0x130
[  425.113031]  contig_page_data+0x1900/0x1d00
[  425.117209] Task dump for CPU 2:
[  425.120426] task:umount          state:R  running task     stack:0     pid:211   ppid:182    flags:0x00000006
[  425.130326] Call trace:
[  425.132763]  __switch_to+0xd4/0x130
[  425.136247]  0xfffffc00000c5240

We now think that all the boards may have this problem, and that it only shows up once the eMMC is flashed often enough.

Could you suggest how we can debug this?

Many Thanks,

Kevin

  • Hi TI Experts,

    Could you please look at this problem and give us some suggestions?

    Many Thanks,

    Kevin

  • Hello,

    the other 18 "good" boards can also fail if we repeat the flashing test more often. We flashed the eMMC on 1 good board 8 times and it passed, but the 9th attempt failed

    If they see the issue even with "no-1-8-v", then it is most likely not an issue with the eMMC but with something else.

    Have they stress tested their DDR?
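One way to support or rule out the memory-corruption theory is to compare where each crash happens. The two Oops reports above fault in unrelated places (`ecap_perms` and `free_pcppages_bulk`), and the second faulting address, `dead000000000122`, matches the kernel's list-poison pattern on arm64 with the default `CONFIG_ILLEGAL_POINTER_VALUE` (an assumption about this kernel's configuration). The following is a minimal sketch for summarizing Oops reports from captured console logs across boards. The function name `summarize_oopses` and the record layout are hypothetical, and the regular expressions only cover the line formats seen in this thread.

```python
import re

# Patterns for the line formats that appear in the logs in this thread.
FAULT_RE = re.compile(r"Unable to handle kernel (.+?) at virtual address ([0-9a-f]+)")
PC_RE = re.compile(r"\bpc : (\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+")
COMM_RE = re.compile(r"\bComm: (\S+)")

# Assumption: arm64 defaults, where the list-poison constants are
# 0xdead000000000000 + 0x100 and 0xdead000000000000 + 0x122.
LIST_POISON = {
    "dead000000000100": "LIST_POISON1",
    "dead000000000122": "LIST_POISON2",
}


def summarize_oopses(log_text):
    """Return one dict per Oops: fault type, address, task name, pc symbol."""
    reports = []
    current = None
    for line in log_text.splitlines():
        m = FAULT_RE.search(line)
        if m:
            # A new fault line starts a new report.
            current = {
                "fault": m.group(1),
                "address": m.group(2),
                "poison": LIST_POISON.get(m.group(2)),
                "comm": None,
                "pc": None,
            }
            reports.append(current)
            continue
        if current is None:
            continue
        m = COMM_RE.search(line)
        if m and current["comm"] is None:
            current["comm"] = m.group(1)
        m = PC_RE.search(line)
        if m and current["pc"] is None:
            current["pc"] = m.group(1)
    return reports
```

If the summaries from many runs show faults scattered over unrelated symbols and tasks, that pattern is more consistent with random memory corruption than with a defect in one driver, although it does not prove it.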

    When the customer built more boards, they found 2 boards that still could not flash the eMMC from the SD card, even with "no-1-8-v". Flashing was attempted multiple times on these 2 boards, and every attempt failed with the following error log.

    What exact test setup reports the issue? Have you tried the same setup on the TI EVM, and do you see the issue there as well?
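A scripted, repeatable test makes it easier to compare the custom boards with the EVM than counting manual flash attempts. The following is a minimal sketch of a write-and-verify loop, not the customer's actual flashing procedure. The function names `write_and_verify` and `run_cycles` and the default sizes are hypothetical. Note that the read-back may be served from the page cache, so a mismatch can point at DDR as well as at the eMMC.

```python
import hashlib
import os
import tempfile


def write_and_verify(directory, size_bytes=4 * 1024 * 1024, chunk=1024 * 1024):
    """Write random data to a temporary file in `directory`, fsync it,
    read it back, and return True if the SHA-256 digests match."""
    written = hashlib.sha256()
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            remaining = size_bytes
            while remaining > 0:
                block = os.urandom(min(chunk, remaining))
                written.update(block)
                f.write(block)
                remaining -= len(block)
            f.flush()
            os.fsync(f.fileno())  # push the data towards the storage device
        read_back = hashlib.sha256()
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(chunk), b""):
                read_back.update(block)
        return written.digest() == read_back.digest()
    finally:
        os.remove(path)


def run_cycles(directory, cycles=20, **kwargs):
    """Run several cycles and return the 1-based indices of those that failed."""
    failures = []
    for i in range(1, cycles + 1):
        try:
            ok = write_and_verify(directory, **kwargs)
        except OSError:
            ok = False  # an I/O error counts as a failed cycle
        if not ok:
            failures.append(i)
    return failures
```

For example, `run_cycles("/mnt/emmc", cycles=50)` would report which of 50 cycles failed, where `/mnt/emmc` is a placeholder for wherever the eMMC partition is mounted. A kernel panic like the ones above would still stop the run, so the console log has to be captured alongside it.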

    -------------------------

    In any case, the recently released SDK v10.1 should be given a try:

    https://www.ti.com/tool/download/PROCESSOR-SDK-LINUX-AM62X/10.01.10.04

    Regards,

    Prashant

  • Hi Prashant,

    The same steps have worked many times on most of the other boards, so I do not think the procedure is the problem.

    Only two boards still have this problem after "no-1-8-v" is set, so I want to reduce the clock further. Their current MMC clock is 50 MHz and we want to try reducing it to 25 MHz. Could you give us some guidance on how to do that?
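Nobody in this thread confirmed how to lower the clock on this board. One candidate is the generic `max-frequency` device tree property of the MMC controller binding, which is an assumption here and not TI guidance. Whatever method is used, the result can be checked at runtime. The sketch below parses the `key: value` layout of the MMC debugfs `ios` file, assumed to be at `/sys/kernel/debug/mmc0/ios` with debugfs mounted and `mmc0` being the eMMC as in the logs above. The helper names `parse_mmc_ios` and `clock_hz` are hypothetical.

```python
import re


def parse_mmc_ios(text):
    """Parse the 'key: value' lines of an MMC debugfs ios file into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def clock_hz(fields, key="clock"):
    """Return the integer Hz value stored under `key`, or None if it is
    missing or not in the '<number> Hz' form."""
    m = re.match(r"(\d+)\s*Hz", fields.get(key, ""))
    return int(m.group(1)) if m else None
```

On the target, this could be used as `clock_hz(parse_mmc_ios(open("/sys/kernel/debug/mmc0/ios").read()))`. A result of `25000000` after the change would show that the lower limit took effect. The `timing spec` and `signal voltage` entries in the same file show which speed mode and I/O voltage the host actually negotiated.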

    Thanks,

    Kevin

  • Hi,

    As there has been no response for a long time, I am closing the thread.

    Feel free to ping back if you want to continue the discussion.

    Regards

    Ashwani