AM3354: SD card boot fails on some custom boards

Part Number: AM3354

Dear Champs,

My customer assembled 200 custom boards and found that 20 of them fail to boot from the SD card.

However, all 20 failing boards boot correctly in eMMC boot mode, and the SD card can be accessed without issue after booting from eMMC.

The SD card they use is a 16 GB SanDisk Ultra.

The following is the output of the DSS script, run after a boot failure in SD card boot mode.

CONTROL: device_id = 0x2b94402e
* AM335x family
* Silicon Revision 2.1

PRM_DEVICE: PRM_RSTST = 0x00000001
* Bit 0 : GLOBAL_COLD_RST

CONTROL: control_status = 0x00400338
* SYSBOOT[15:14] = 01b (24 MHz)
* SYSBOOT[11:10] = 00b No GPMC CS0 addr/data muxing
* SYSBOOT[9] = 0 GPMC CS0 Ignore WAIT input
* SYSBOOT[8] = 0 GPMC CS0 8-bit data bus
* Device Type = General Purpose (GP)
* SYSBOOT[7:6] = 00b MII (EMAC boot modes only)
* SYSBOOT[5] = 1 CLKOUT1 enabled
* Boot Sequence : SPI0 -> MMC0 -> USB0 -> UART0

ROM: Current tracing vector, word 1 = 0x0010009f
* Bit 0 : [General] Passed the public reset vector
* Bit 1 : [General] Entered main function
* Bit 2 : [General] Running after the cold reset
* Bit 3 : [Boot] Main booting routine entered
* Bit 4 : [Memory Boot] Memory booting started
* Bit 7 : [Boot] GP header found
* Bit 20 : [Configuration Header] CHSETTINGS found

ROM: Current tracing vector, word 2 = 0x00011000
* Bit 12 : [Memory Boot] Memory booting trial 0
* Bit 16 : [Memory Boot] Execute GP image

ROM: Current tracing vector, word 3 = 0x00001000
* Bit 12 : Memory booting device SPI

ROM: Current copy of PRM_RSTST = 0x00000000

ROM: Cold reset tracing vector, word 1 = 0x00000000

ROM: Cold reset tracing vector, word 2 = 0x00000000

ROM: Cold reset tracing vector, word 3 = 0x00000001
* Bit 0 : [Memory Boot] Memory booting device NULL

Cortex A8 Program Counter = 0x402f0440

ROM Exception Vectors
* 0x4030CE04 Undefined
* 0x4030CE08 SWI
* 0x4030CE0C Pre-fetch abort
* 0x4030CE10 Data abort
* 0x4030CE14 Unused
* 0x4030CE18 IRQ
* 0x4030CE1C FIQ

ROM Dead Loops
* 0x00020080 Undefined exception default handler
* 0x00020084 SWI exception default handler
* 0x00020088 Pre-fetch abort exception default handler
* 0x0002008C Data exception default handler
* 0x00020090 Unused exception default handler
* 0x00020094 IRQ exception default handler
* 0x00020098 FIQ exception default handler
* 0x0002009C Validation test PASS
* 0x000200A0 Validation test FAIL
* 0x000200A4 Reserved
* 0x000200A8 Image not executed or returned
* 0x000200AC Reserved
* 0x000200B0 Reserved
* 0x000200B4 Reserved
* 0x000200B8 Reserved
* 0x000200BC Reserved
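
As a cross-check on the decoded output above, the set bits of each tracing-vector word can be listed with a few lines of C. This is only a sketch: the helper name `set_bits` is hypothetical and is not part of the DSS script, and the meaning of each bit still has to be looked up in the ROM tracing tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Collect the positions of the set bits in a 32-bit tracing-vector word.
 * Stores at most `cap` positions in `out` (lowest bit first) and returns
 * the total number of set bits. `set_bits` is a hypothetical helper name. */
static size_t set_bits(uint32_t word, unsigned out[], size_t cap)
{
    size_t n = 0;

    assert(out != NULL || cap == 0);
    for (unsigned bit = 0; bit < 32; bit++) {
        if (word & (UINT32_C(1) << bit)) {
            if (n < cap)
                out[n] = bit;
            n++;
        }
    }
    return n;
}
```

For word 1 (0x0010009f) this yields bits 0, 1, 2, 3, 4, 7 and 20, and for word 2 (0x00011000) bits 12 and 16, which matches the lists printed by the script.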

They also connected CCS through JTAG, and it looked like the MLO had not booted.

Their hardware schematic for the SD card is shown below.

Thanks and Best Regards,

SI.

  • SI, it looks like code execution is getting through the ROM successfully but failing in the MLO.  Try using the tips here:  https://software-dl.ti.com/processor-sdk-linux/esd/docs/latest/linux/Foundational_Components_U-Boot.html?highlight=debugging%20tips#uboot-spl-debugging-tips

    to debug your MLO code and find out where it is failing.  Have you modified the MLO from the original SDK?

    Regards,

    James

  • Yes. They modified the SPL from the original Linux SDK.

  • Thanks to your helpful comments, I dug into their MLO code, found an issue in their DDR memory initialization, and resolved it.

    Thanks for your support.

    Thanks and Best Regards,

    SI.

  • Hi James,

    Can you check the output below, generated with am335x-ddr-analysis.dss?

    Switched to DAP_DebugSS
    Read value of 2b94402e from Device_ID register.
    CONTROL: device_id = 0x2b94402e
      * AM335x family
      * Silicon Revision 2.1
    
    CONTROL: control_status = 0x00400338
      * SYSBOOT[15:14] = 01b (24 MHz)
    CM_CLKSEL_DPLL_DDR = 0x00003202
      * DPLL_MULT = 50 (x50)
      * DPLL_DIV = 2 (/3)
    CM_DIV_M2_DPLL_DDR = 0x00000201
      * CLKST = 1: M2 output clock enabled
      * DIVHS = 1 (/1)
    
    DPLL_DDR Summary
     -> F_input = 24 MHz
     -> CLKOUT_M2 = DDR_PLL_CLKOUT = 400 MHz
    
    EMIF: SDRAM_CONFIG = 0x61a05332
      * Bits 31:29 (reg_sdram_type) set for DDR3
      * Bits 28:27 (reg_ibank_pos) set to 0
      * Bits 26:24 (reg_ddr_term) set for RZQ/4 (001b)
      * Bits 22:21 (reg_dyn_odt) DDR3 dynamic ODT set to RZQ / 4
      * Bit  20    (reg_ddr_disable_dll) set to 0, DDR3 DLL enabled
      * Bits 19:18 (reg_sdram_drive) set for RZQ/6 (00b)
      * Bits 17:16 (reg_cwl) set for 0, CWL = 5
      * Bits 15:14 (reg_narrow_mode) set to 1 -> 16-bit EMIF interface
      * Bits 13:10 (reg_cl) set to 4 -> CL = 6
      * Bits 09:07 (reg_rowsize) set to 6 -> 15 row bits
      * Bits 06:04 (reg_ibank) set to 3 -> 8 banks
      * Bits 02:00 (reg_pagesize) set to 2 -> 10 column bits
    
    EMIF: PWR_MGMT_CTRL = 0x00000000
     * Bits 10:8 reg_lp_mode set to 0, auto power management disabled
     * Warning: Bits 7:4 (reg_sr_tim) are in violation of Maximum Self-Refresh Command Limit
       -> Please see the silicon errata (DDR3: JEDEC Compliance for Maximum Self-Refresh Command Limit) for more details.
       -> This is only an issue if used in conjunction with reg_lp_mode=2.
    
    DDR PHY: DDR_PHY_CTRL_1 = 0x00100208
      * Bits 9:8 (reg_phy_rd_local_odt) to 2 -> full thevenin termination
      * Bits 4:0 (reg_read_latency) set to 8 -> Ok: CL+2 is typical with PHY_INVERT_CLKOUT=1.
    
    *********************
    *** Register Dump ***
    *********************
    
    *(0x4c000000) = 0x40443403
    *(0x4c000004) = 0x40000004
    *(0x4c000008) = 0x61a05332
    *(0x4c00000c) = 0x00000000
    *(0x4c000010) = 0x00000c30
    *(0x4c000014) = 0x00000c30
    *(0x4c000018) = 0x0aaae523
    *(0x4c00001c) = 0x0aaae523
    *(0x4c000020) = 0x246b7fda
    *(0x4c000024) = 0x246b7fda
    *(0x4c000028) = 0x50ffe67f
    *(0x4c00002c) = 0x50ffe67f
    *(0x4c000038) = 0x00000000
    *(0x4c00003c) = 0x00000000
    *(0x4c000054) = 0x00ffffff
    *(0x4c000058) = 0x8000140a
    *(0x4c00005c) = 0x00021616
    *(0x4c000080) = 0x00000075
    *(0x4c000084) = 0x00000026
    *(0x4c000088) = 0x00010000
    *(0x4c00008c) = 0x00000000
    *(0x4c000090) = 0xd7328928
    *(0x4c000098) = 0x00050000
    *(0x4c00009c) = 0x00050000
    *(0x4c0000a4) = 0x00000000
    *(0x4c0000ac) = 0x00000000
    *(0x4c0000b4) = 0x00000000
    *(0x4c0000bc) = 0x00000000
    *(0x4c0000c8) = 0x50074be1
    *(0x4c0000d4) = 0x00000000
    *(0x4c0000d8) = 0x00000000
    *(0x4c0000dc) = 0x00000000
    *(0x4c0000e4) = 0x00100208
    *(0x4c0000e8) = 0x00100208
    *(0x4c000100) = 0x00000000
    *(0x4c000104) = 0x00000000
    *(0x4c000108) = 0x00000000
    *(0x4c000120) = 0x00000305
    
    ************************
    *** IOCTRL Registers ***
    ************************
    
    CONTROL: DDR_CMD0_IOCTRL = 0x0000018b
      * ddr_ba2 Pullup/Pulldown disabled
      * ddr_wen Pullup/Pulldown disabled
      * ddr_ba0 Pullup/Pulldown disabled
      * ddr_a5 Pullup/Pulldown disabled
      * ddr_ck Pullup/Pulldown disabled
      * ddr_ckn Pullup/Pulldown disabled
      * ddr_a3 Pullup/Pulldown disabled
      * ddr_a4 Pullup/Pulldown disabled
      * ddr_a8 Pullup/Pulldown disabled
      * ddr_a9 Pullup/Pulldown disabled
      * ddr_a6 Pullup/Pulldown disabled
      * Bits 9:5 control ddr_ck and ddr_ckn
        - Slew slow
        - Drive Strength 9 mA
      * Bits 4:0 control ddr_ba0, ddr_ba2, ddr_wen, ddr_a[9:8], ddr_a[6:3]
        - Slew slow
        - Drive Strength 8 mA
    CONTROL: DDR_CMD1_IOCTRL = 0x0000018b
      * ddr_a15 Pullup/Pulldown disabled
      * ddr_a2 Pullup/Pulldown disabled
      * ddr_a12 Pullup/Pulldown disabled
      * ddr_a7 Pullup/Pulldown disabled
      * ddr_ba1 Pullup/Pulldown disabled
      * ddr_a10 Pullup/Pulldown disabled
      * ddr_a0 Pullup/Pulldown disabled
      * ddr_a11 Pullup/Pulldown disabled
      * ddr_casn Pullup/Pulldown disabled
      * ddr_rasn Pullup/Pulldown disabled
      * Bits 4:0 control ddr_15, ddr_a[12:10], ddr_a7, ddr_a2, ddr_a0, ddr_ba1, ddr_casn, ddr_rasn
        - Slew slow
        - Drive Strength 8 mA
    CONTROL: DDR_CMD2_IOCTRL = 0x0000018b
      * ddr_cke Pullup/Pulldown disabled
      * ddr_resetn Pullup/Pulldown disabled
      * ddr_odt Pullup/Pulldown disabled
      * ddr_a14 Pullup/Pulldown disabled
      * ddr_a13 Pullup/Pulldown disabled
      * ddr_csn0 Pullup/Pulldown disabled
      * ddr_a1 Pullup/Pulldown disabled
      * Bits 4:0 control ddr_cke, ddr_resetn, ddr_odt, ddr_csn0, ddr_[a14:13], ddr_a1
        - Slew slow
        - Drive Strength 8 mA
    CONTROL: DDR_DATA0_IOCTRL = 0x0000018b
      * ddr_d8 Pullup/Pulldown disabled
      * ddr_d9 Pullup/Pulldown disabled
      * ddr_d10 Pullup/Pulldown disabled
      * ddr_d11 Pullup/Pulldown disabled
      * ddr_d12 Pullup/Pulldown disabled
      * ddr_d13 Pullup/Pulldown disabled
      * ddr_d14 Pullup/Pulldown disabled
      * ddr_d15 Pullup/Pulldown disabled
      * ddr_dqm1 Pullup/Pulldown disabled
      * ddr_dqs1 and ddr_dqsn1 Pullup/Pulldown disabled
      * Bits 9:5 control ddr_dqs1, ddr_dqsn1
        - Slew slow
        - Drive Strength 9 mA
      * Bits 4:0 control ddr_d[15:8], ddr_dqm1
        - Slew slow
        - Drive Strength 8 mA
    CONTROL: DDR_DATA1_IOCTRL = 0x0000018b
      * ddr_d0 Pullup/Pulldown disabled
      * ddr_d1 Pullup/Pulldown disabled
      * ddr_d2 Pullup/Pulldown disabled
      * ddr_d3 Pullup/Pulldown disabled
      * ddr_d4 Pullup/Pulldown disabled
      * ddr_d5 Pullup/Pulldown disabled
      * ddr_d6 Pullup/Pulldown disabled
      * ddr_d7 Pullup/Pulldown disabled
      * ddr_dqm0 Pullup/Pulldown disabled
      * ddr_dqs0 and ddr_dqsn0 Pullup/Pulldown disabled
      * Bits 9:5 control ddr_dqs0, ddr_dqsn0
        - Slew slow
        - Drive Strength 9 mA
      * Bits 4:0 control ddr_d[7:0], dqm0
        - Slew slow
        - Drive Strength 8 mA
    CONTROL: DDR_IO_CTRL = 0x00000000
      * Bit 31: DDR_RESETn controlled by EMIF.
      * Bit 28 (mddr_sel) configured for SSTL, i.e. DDR2/DDR3/DDR3L operation.
    CONTROL: VTP_CTRL = 0x00010167
      * VTP not disabled (expected in normal operation, but not DS0).
    CONTROL: VREF_CTRL = 0x00000000
      * VREF supplied externally (typical).
    CONTROL: DDR_CKE_CTRL = 0x00000001
      * CKE controlled by EMIF (normal/ungated operation).
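
The SDRAM_CONFIG decode in the dump above can be reproduced by hand, which is a quick way to sanity-check the memory geometry against the DDR3 part. The sketch below uses the bit positions printed by the script. The helper names and the geometry encodings (row bits = rowsize + 9, column bits = pagesize + 8, banks = 2^ibank) are assumptions chosen because they reproduce the script's own interpretation; confirm them against the TRM before relying on them.

```c
#include <assert.h>
#include <stdint.h>

/* Extract bits [hi:lo] (inclusive) of a 32-bit register value. */
static uint32_t field(uint32_t reg, unsigned hi, unsigned lo)
{
    assert(hi < 32 && lo <= hi);
    unsigned width = hi - lo + 1;
    uint32_t mask = (width == 32) ? UINT32_MAX : ((UINT32_C(1) << width) - 1);
    return (reg >> lo) & mask;
}

/* Raw SDRAM_CONFIG fields, at the bit positions printed in the dump. */
struct sdram_config {
    uint32_t sdram_type;   /* bits 31:29 */
    uint32_t ddr_term;     /* bits 26:24 */
    uint32_t dyn_odt;      /* bits 22:21 */
    uint32_t cwl;          /* bits 17:16 */
    uint32_t narrow_mode;  /* bits 15:14 */
    uint32_t cl;           /* bits 13:10 */
    uint32_t rowsize;      /* bits  9:7  */
    uint32_t ibank;        /* bits  6:4  */
    uint32_t pagesize;     /* bits  2:0  */
};

static struct sdram_config decode_sdram_config(uint32_t reg)
{
    struct sdram_config c;

    c.sdram_type  = field(reg, 31, 29);
    c.ddr_term    = field(reg, 26, 24);
    c.dyn_odt     = field(reg, 22, 21);
    c.cwl         = field(reg, 17, 16);
    c.narrow_mode = field(reg, 15, 14);
    c.cl          = field(reg, 13, 10);
    c.rowsize     = field(reg, 9, 7);
    c.ibank       = field(reg, 6, 4);
    c.pagesize    = field(reg, 2, 0);
    return c;
}

/* Addressable bytes implied by the geometry fields (assumed encodings,
 * see the text above). Only the 16-bit bus case shown in the dump
 * (narrow_mode == 1) is handled. */
static uint64_t sdram_bytes(const struct sdram_config *c)
{
    assert(c->narrow_mode == 1);
    unsigned addr_bits = (c->rowsize + 9) + (c->pagesize + 8) + c->ibank;
    return (UINT64_C(1) << addr_bits) * 2; /* 2 bytes per 16-bit word */
}
```

For 0x61a05332 this gives 15 + 10 + 3 = 28 address bits on a 16-bit bus, i.e. 512 MiB, which is what a single 4 Gbit x16 device would provide.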
    

    I found that they were using the wrong DPLL settings, but booting still failed with the correct ones.

    I recommended the DPLL values below for DDR3 at 400 MHz (the 24 MHz row applies to their board).

    const struct dpll_params dpll_ddr3_400MHz[NUM_CRYSTAL_FREQ] = {
            {125, 5, 1, -1, -1, -1, -1},    /* 19.2 MHz */
            {50, 2, 1, -1, -1, -1, -1},     /* 24 MHz */
            {16, 0, 1, -1, -1, -1, -1},     /* 25 MHz */
            {200, 12, 1, -1, -1, -1, -1}    /* 26 MHz */
    };
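
Each row of the table above can be checked by hand. The sketch below assumes the first three fields of a row are the multiplier M, the pre-divider N and the post-divider M2, and that the output follows f_out = f_in * M / ((N + 1) * M2). The `dpll_mnm2` struct is a hypothetical, trimmed-down stand-in for illustration, not U-Boot's `struct dpll_params`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in holding only the first three fields of a row:
 * multiplier M, pre-divider N and post-divider M2. */
struct dpll_mnm2 {
    uint32_t m;
    uint32_t n;
    uint32_t m2;
};

/* Assumed formula: f_out = f_in * M / ((N + 1) * M2), all in kHz. */
static uint32_t dpll_out_khz(uint32_t clkin_khz, struct dpll_mnm2 p)
{
    assert(p.m2 != 0);
    return (uint32_t)(((uint64_t)clkin_khz * p.m) /
                      ((uint64_t)(p.n + 1) * p.m2));
}

/* Crystal frequencies in kHz, in the same order as the table rows. */
static const uint32_t crystal_khz[4] = { 19200, 24000, 25000, 26000 };

/* M, N, M2 copied from the table rows. */
static const struct dpll_mnm2 ddr3_400mhz[4] = {
    { 125,  5, 1 },  /* 19.2 MHz */
    {  50,  2, 1 },  /* 24 MHz   */
    {  16,  0, 1 },  /* 25 MHz   */
    { 200, 12, 1 },  /* 26 MHz   */
};
```

Under these assumptions all four rows evaluate to 400000 kHz; the 24 MHz row, for example, is 24 MHz * 50 / 3 = 400 MHz.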

    However, they still failed to boot with the above values, and I would like to find out what the issue is.

    Their DDR3 memory is a K4B4G1646E-BCK0; the datasheet is here:

    https://www.samsung.com/semiconductor/global.semi/file/resource/2017/11/DS_K4B4G1646E-BC_Rev101-0.pdf

    Their current DPLL values for DDR3 (24 MHz input) are below.

    Thanks and Best Regards,

    SI.

  • They need to follow the configuration instructions in this app note:  https://www.ti.com/lit/pdf/SPRACK4 to ensure they configure the DDR controller and PHY properly for their board design.  In the app note, there is a link to a spreadsheet which needs to be filled out; the output for U-Boot is provided on one of the tabs.

    Regards,

    James

  • Hi James,

    When we rechecked their EMIF values against the app note you mentioned, we could not find anything unusual.

    One odd thing they found today: there is no issue when they single-step through the code in CCS, but the issue occurs when they resume execution in CCS.

    So I asked them to modify the DDR reset timing, as you helped me do before in the E2E threads below, and we will check their result tomorrow.

    https://e2e.ti.com/support/processors/f/processors-forum/895362/am3352-emif-values-for-twin-die-ddr3?tisearch=e2e-sitesearch&keymatch=twin#

    https://e2e.ti.com/support/processors/f/processors-forum/893158/am3352-ddr-test-fails#pi320966

    In the meantime, it would be helpful if you could share any ideas on this issue.

    Thanks and Best Regards,

    SI.

  • Can you post the filled out spreadsheet?

    Regards,

    James

  • Hi James,

    1. Could you please check the filled-out spreadsheet below?

    AM335x_EMIF_Configuration_Tool_v3.zip

    2. They successfully ran their board after modifying the DDR reset timing as described above, but they still want to check whether there is any issue with their custom board or production process before mass production. Is there anything further they should check in their hardware?

    3. I found that their PLL values differ from those in our SDK; they look like old values.

    Do you think the PLL values below are OK to use for an 800 MHz CPU frequency?

    CLKIN : 24Mhz

    MPU_PLL - N : 23, M : 800, M2 : 1

    Core_PLL - N : 23, M : 1000, M4 : 10, M5 : 8, M6 : 4

    PERI_PLL - N : 23, M : 960, M2 : 5

    DISP_PLL - N : 1, M : 25, M2 : 1

    4. When I checked the GEL file for the AM335x SK board, I found the values below for 800 MHz, but I am not sure they are correct.

       MPU_PLL_Config(  CLKIN, 3, 100, 1);                     // 800 MHz

    • N : 3, M : 100,  M2 : 1 

    When I calculate it, the result is 600 MHz, as below.

    ((24 MHz / 4) * 100) / 1 = 600 MHz

    Thanks and Best Regards,

    SI.

  • SI, I looked over their spreadsheet, and everything looks fine.  Have they performed any memory tests?  Since they appear to be using Linux, a good test is the Linux memtester.  It would give confidence that the memory interface can handle stress cases.

    It looks like there is a typo in the SK configuration for MPU PLL.  Below are the suggested settings for 24MHz input clock, which are optimized for jitter performance:

    MPU_PLL - N : 2, M : 100, M2 : 1   (for 800MHz MPU, TURBO mode)

    Core_PLL - N : 2, M : 125, M4 : 10, M5 : 8, M6 : 4

    PERI_PLL - N : 9, M : 400, M2 : 5

    DISP_PLL - N : 1, M : 25, M2 : 1 

    DDR_PLL - N : 2, M : 50, M2 : 1   (for 400MHz DDR memory clock)
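
The MPU and DDR numbers in this thread can be compared with a small calculation. This is a sketch under the assumption that both PLLs follow f_out = CLKIN * M / ((N + 1) * M2), the same form used in the 600 MHz calculation earlier in the thread. The Core PLL is left out because its outputs pass through the additional M4/M5/M6 dividers. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Pre-divided reference clock: CLKIN / (N + 1), in kHz. */
static uint32_t pll_ref_khz(uint32_t clkin_khz, uint32_t n)
{
    return clkin_khz / (n + 1);
}

/* Assumed output clock: CLKIN * M / ((N + 1) * M2), in kHz. */
static uint32_t pll_out_khz(uint32_t clkin_khz, uint32_t n,
                            uint32_t m, uint32_t m2)
{
    assert(m2 != 0);
    return (uint32_t)(((uint64_t)clkin_khz * m) /
                      ((uint64_t)(n + 1) * m2));
}
```

With a 24000 kHz input, `pll_out_khz(24000, 3, 100, 1)` gives 600000 (the GEL file values), while `pll_out_khz(24000, 2, 100, 1)` gives 800000 (the suggested MPU values). The customer's original MPU values (N 23, M 800) also evaluate to 800000, but from a 1000 kHz pre-divided reference rather than 8000 kHz. The DDR values (N 2, M 50, M2 1) evaluate to 400000.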

    Regards,

    James

  • Hi James,

    Thanks for your response and for checking the spreadsheet.

    Even with the new PLL values, the DDR reset timing modification was still required.

    Do you think they can start production if the Linux memtester memory test shows no issues?

    When they ran the Linux memtester, they saw the error log below on the 10-inch LCD model, but no issue on the 4-inch LCD model. The two LCDs differ in pixel resolution (10-inch: 1024 x 600, 4-inch: 480 x 272).

    [ 1806.702278] tilcdc 4830e000.lcdc: tilcdc_crtc_irq(0x00000020): FIFO underflow

    [ 1806.716200] tilcdc 4830e000.lcdc: tilcdc_crtc_irq(0x00000004): Sync lost

    [ 1806.723012] tilcdc 4830e000.lcdc: tilcdc_crtc_irq(0x00000004): Sync lost

    I attached their full log below.

    log_10inch.txt

    The difference in the dtb file between the two models is shown below.

    I think this is not a hardware issue on their side and may be an issue with the Linux memtester. Do you agree?

    Thanks and Best Regards,

    SI.

  • SI,

    I checked the log, and there appear to be no DDR memory errors; the Linux memtester is passing all the tests.  So at least the memory configuration looks stable.

    The LCD sync issue is a completely different problem and probably requires a new question on the forum to get the software folks more involved.  To me, it looks like an arbitration or performance issue.  The Linux memtester kicks off threads that are very memory intensive, and the LCD controller is potentially getting starved.  That may be why it shows up only in the 10-inch LCD case: the LCD module requests much more data from memory than in the 4-inch case.
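
To put a rough number on the difference in memory traffic between the two panels, the scan-out rate can be estimated from the resolution. This is only a sketch: the function name is hypothetical, blanking intervals are ignored, and the 4 bytes per pixel and 60 Hz refresh used in the note below are assumptions, since the thread does not state the actual pixel format or refresh rate.

```c
#include <stdint.h>

/* Estimated bytes per second the LCD controller must fetch from memory
 * to scan out one panel. Blanking intervals are ignored. */
static uint64_t scanout_bytes_per_sec(uint32_t width, uint32_t height,
                                      uint32_t bytes_per_pixel,
                                      uint32_t refresh_hz)
{
    return (uint64_t)width * height * bytes_per_pixel * refresh_hz;
}
```

Under those assumptions the 1024 x 600 panel needs about 147 MB/s and the 480 x 272 panel about 31 MB/s. The ratio, roughly 4.7x, depends only on the pixel counts (614400 versus 130560) as long as both panels use the same pixel format and refresh rate.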

    Whether or not it is an issue for your customer depends on the use case.  Will there ever be a situation in their application that requires memory-intensive functions running on the ARM processor?  If so, they will need to re-evaluate their software arbitration schemes.  They could also possibly use Class of Service (COS) in the EMIF to control master priority (see section 7.3.3.5.4 in the TRM), but this could affect other aspects of their system.

    Regards,

    James