AM623: eMMC test fails with some pattern

Part Number: AM623

I have a custom Yocto image on an AM6232. I booted from the SD card, and it booted properly. I want to test the eMMC for bad blocks, and I used the script below to do that.

=========================== START ==============================

#!/bin/sh

EMMC_DEV="/dev/mmcblk0"
BLOCK_SIZE=512
START_BLOCK=0
TMP_FILE="/tmp/pattern.bin"
LOG_FILE="/tmp/emmc_bad_blocks.log"

# Check boot source

BOOT_DEV=$(cat /proc/cmdline | sed -n 's/.*root=\([^ ]*\).*/\1/p')
echo "Boot device: $BOOT_DEV"

# case keeps this check POSIX-compatible; [[ ... ]] is a bash extension and the shebang is /bin/sh
case "$BOOT_DEV" in
*mmcblk1*)
echo "Booted from SD card ($BOOT_DEV)"
;;
*)
echo "Not booted from SD card. Aborting test."
exit 1
;;
esac

# Generate 512-byte test pattern (0x55AA55AA)
echo "Generating test pattern..."
> "$TMP_FILE"
counter=0
while [ $counter -lt $((BLOCK_SIZE / 4)) ]; do
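# Octal escapes for 0x55 0xAA 0x55 0xAA; \x escapes are not portable across sh printf implementations
printf '\125\252\125\252' >> "$TMP_FILE"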
counter=$((counter + 1))
done

# Get total number of blocks
TOTAL_BLOCKS=$(cat /sys/block/mmcblk0/size)
if [ -z "$TOTAL_BLOCKS" ]; then
echo "Could not determine eMMC size."
exit 1
fi

echo "Starting block-by-block test from block $START_BLOCK to $TOTAL_BLOCKS..."
echo "Logging bad blocks to $LOG_FILE"
> "$LOG_FILE"

FAILED=0
TESTED=0
counter=$START_BLOCK

while [ $counter -lt $TOTAL_BLOCKS ]; do
echo "Testing block $counter / $TOTAL_BLOCKS"

# Write pattern
dd if="$TMP_FILE" of="$EMMC_DEV" bs=$BLOCK_SIZE count=1 seek=$counter 2>/dev/null
sync

# Read and verify
dd if="$EMMC_DEV" bs=$BLOCK_SIZE count=1 skip=$counter 2>/dev/null | cmp - "$TMP_FILE" > /dev/null
if [ $? -ne 0 ]; then
echo "Block $counter failed verification" | tee -a "$LOG_FILE"
FAILED=$((FAILED + 1))
fi

TESTED=$((TESTED + 1))
counter=$((counter + 1))
done

echo ""
echo "Test complete."
echo "Total blocks tested: $TESTED"
echo "Total failed blocks: $FAILED"
echo "Bad blocks logged in: $LOG_FILE"

# Cleanup
rm -f "$TMP_FILE"

====================== END ========================

The output of this script is below:

**************************** START *******************************

root@etn-cbc9000:~# flash-emmc-test
Generating test pattern...
Starting block-by-block test from block 0 to 62160896...
Logging bad blocks to /tmp/emmc_bad_blocks.log
Testing block 0 / 62160896
Testing block 1 / 62160896
Testing block 2 / 62160896
Block counter failed verification
Testing block 3 / 62160896
Block counter failed verification
Testing block 4 / 62160896
Block counter failed verification
Testing block 5 / 62160896
Block counter failed verification
Testing block 6 / 62160896
Block counter failed verification
Testing block 7 / 62160896
Block counter failed verification
Testing block 8 / 62160896
Testing block 9 / 62160896
Testing block 10 / 62160896
Block counter failed verification
Testing block 11 / 62160896
Block counter failed verification
Testing block 12 / 62160896
Block counter failed verification
Testing block 13 / 62160896
Block counter failed verification
Testing block 14 / 62160896
Block counter failed verification
Testing block 15 / 62160896
Block counter failed verification
Testing block 16 / 62160896
Testing block 17 / 62160896
Testing block 18 / 62160896
Block counter failed verification
Testing block 19 / 62160896
Block counter failed verification
Testing block 20 / 62160896
Block counter failed verification
Testing block 21 / 62160896
Block counter failed verification
Testing block 22 / 62160896
Block counter failed verification
Testing block 23 / 62160896
Block counter failed verification
Testing block 24 / 62160896
Testing block 25 / 62160896
Testing block 26 / 62160896
Block counter failed verification
Testing block 27 / 62160896
Block counter failed verification

**************************** END *******************************

Passing block pattern: 0, 1, 8, 9, 16, 17, 24, 25, 32, 33, 40, 41, 48, 49, ...

All other blocks failed the test.
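The log above shows blocks 0, 1, 8, 9, 16, 17, 24 and 25 passing and every other block up to 27 failing, which repeats every 8 blocks. A minimal sketch to check that observation, assuming the rule is "a block fails when bit 1 or bit 2 of its number is set, counting from bit 0" (mask 0x6). The helper names `predict_block` and `predicted_passes` are hypothetical and are not part of the original script.

```shell
#!/bin/sh
# Hypothetical helper: predict pass/fail for a block number under the
# assumption that a block fails when (block & 0x6) != 0.
predict_block() {
    if [ $(( $1 & 6 )) -eq 0 ]; then
        echo pass
    else
        echo fail
    fi
}

# Print the block numbers in 0..27 that the assumed rule predicts will pass,
# so the list can be compared by eye with the test log.
predicted_passes() {
    n=0
    out=""
    while [ "$n" -le 27 ]; do
        if [ "$(predict_block "$n")" = pass ]; then
            out="$out $n"
        fi
        n=$((n + 1))
    done
    echo "${out# }"
}

predicted_passes
```

If the printed list matches the passing blocks in the log, the failures follow the block number rather than appearing at random positions.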

On the AM62x-SK board the same test passed. I haven't changed anything in the DTS file for my custom hardware.

Attaching schematic for reference.

[Attached image: schematic]

  • In dmesg I find the errors below:

    [71919.793205] mmc0: running CQE recovery
    [71919.798323] mmc0: running CQE recovery
    [71919.802992] mmc0: running CQE recovery
    [71919.807674] I/O error, dev mmcblk0, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
    [71919.816631] Buffer I/O error on dev mmcblk0, logical block 0, lost async page write

    Looking for inputs to resolve this.
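To track whether a change makes things better or worse, it may help to count these messages before and after each test run. A minimal sketch, assuming the kernel log lines keep the wording shown above. The function name `count_mmc_errors` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print two numbers,
# "<CQE recovery lines> <block I/O error lines>".
# The patterns are taken from the dmesg excerpt in this thread.
count_mmc_errors() {
    awk '
        /mmc[0-9]+: running CQE recovery/ { cqe++ }
        /I\/O error, dev mmcblk[0-9]+/    { io++ }
        END { printf "%d %d\n", cqe, io }
    '
}

# Typical use on the target (not executed here):
#   dmesg | count_mmc_errors
```

Note that the "Buffer I/O error on dev ..." line is deliberately not counted, since it reports the same failed write a second time.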

  • Hello, I am leading this effort. Parvez has posted more details at my direction. I'd like to add some more. 

    I do see here some possibly related material:

    AM623: Custom Hardware - eMMC CQE recovery during boot (TI E2E Processors forum)

    Since eMMC memory corruption is such a critical matter, we are approaching this systematically and are looking for an exact fix. We have double-checked all our wiring, and although the traces are a bit different, the wiring appears to be identical to the PhyTech reference board.

    The above testing was on kernel 6.12.24. 

    If you look carefully at which 512-byte blocks fail above, they are the ones with bit 2 or bit 3 set in the block number, counting the least significant bit as bit 1 (blocks 2 to 7, 10 to 15, and so on). This is consistent up through the blocks. Some data patterns do not trigger an error, and others trigger an error and then succeed on a second try. I think this is probably a red herring, but it is worth mentioning. Linux is doing a lot of caching and queuing here.
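One way to take the page cache out of the picture is to do the write and the read-back with direct I/O. A minimal sketch, assuming GNU coreutils `dd` is available on the target (it provides `oflag=direct` and `iflag=direct`; other `dd` implementations may not). The function name `verify_block` and the `DIRECT` switch are hypothetical, and this is not what the original script does.

```shell
#!/bin/sh
# verify_block DEVICE BLOCK PATTERN_FILE [BLOCK_SIZE]
# Writes PATTERN_FILE to one block and reads it back for comparison.
# With DIRECT=yes (the default) both transfers use O_DIRECT through GNU dd,
# so the read-back should come from the device rather than the page cache.
# DIRECT=no keeps buffered I/O, for example for a dry run on a regular file.
# Returns 0 on match, 1 on mismatch, 2 if the write itself failed.
verify_block() {
    dev=$1
    blk=$2
    pat=$3
    bs=${4:-512}
    wflag=""
    rflag=""
    if [ "${DIRECT:-yes}" = yes ]; then
        wflag="oflag=direct"
        rflag="iflag=direct"
    fi
    # conv=notrunc stops dd from truncating a regular file used for a dry run.
    # $wflag and $rflag are intentionally unquoted so they vanish when empty.
    dd if="$pat" of="$dev" bs="$bs" count=1 seek="$blk" conv=notrunc $wflag 2>/dev/null || return 2
    dd if="$dev" bs="$bs" count=1 skip="$blk" $rflag 2>/dev/null | cmp -s - "$pat"
}

# Example on the target (destructive, not executed here):
#   verify_block /dev/mmcblk0 2 /tmp/pattern.bin || echo "Block 2 failed verification"
```

If the failure pattern changes under direct I/O, that would point at caching or queuing. If it stays the same, the bus or the device is the more likely suspect.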

    We are now going to look at the various drive options, such as adjusting the speed and disabling CQE, and see what might work. I understand that there are pin termination settings we can adjust in the DTS (our setup here should be the same as the dev board). Any suggestions are appreciated.

  • To follow up on this: after extensive testing, including reducing the bus width to 1 and 4 bits, we determined that this was in fact a hardware problem. We had inconsistent trace lengths for the eMMC.

  • we determined that this was in fact a hardware problem - we had inconsistent trace lengths for the eMMC. 

    Thanks, Peter, for the confirmation.

    Regards

    Ashwani