AM4376: Booting issues in new production batch

Part Number: AM4376
Other Parts Discussed in Thread: TPS65218, TMDSEVM437X

Hi,

Our custom board features an AM4376BZDN100, a TPS65218D0RSL, a single 1GB DDR3L chip in a point-to-point connection, 256Mbit QSPI flash and 8GB eMMC. The board has been in production since 08/2020, and more than 1500 pcs. have been produced so far without any issues related to the boot process.
Linux-4.19.59-215cd90 has been used since the start of production. We can also confirm that the DDR3 ICs now populated on the PCB come from a new batch (date code 2134: year 2021, week 34), whereas all previous boards used DDR3 ICs manufactured in 2019.
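
For comparing batches, the date code can be decoded mechanically. This is only a minimal sketch: it assumes the 4-digit YYWW format described above and ISO week numbering, which the DDR3 vendor may not follow exactly. The helper names are hypothetical.

```python
from datetime import date
from typing import Tuple


def decode_date_code(code: str) -> Tuple[int, int]:
    """Split a 4-digit YYWW date code into (year, week)."""
    if len(code) != 4 or not code.isdigit():
        raise ValueError("expected a 4-digit YYWW date code, got %r" % code)
    year = 2000 + int(code[:2])  # assumption: all parts are from 20xx
    week = int(code[2:])
    if not 1 <= week <= 53:
        raise ValueError("week out of range in date code %r" % code)
    return year, week


def week_start(code: str) -> date:
    """Monday of the coded week, assuming ISO week numbering."""
    year, week = decode_date_code(code)
    return date.fromisocalendar(year, week, 1)


print(decode_date_code("2134"), week_start("2134"))
```

With this, "2134" decodes to year 2021, week 34, matching the batch described above.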

We are now facing boot issues on new boards that use DDR3 ICs from the new batch, and these boards behave quite strangely.
We received 2 boards (SN2798 and SN2819) from our EMS partner company and have run some tests, comparing them with two boards (SN0011 and SN0008) with an older manufacturing date.

Here are some findings:   

  1. In some cases there is no response from the board on the serial debug port UART0 at power-up: nothing comes out, or the screen stays black.

  2. There are also some cases where the board starts the bootloader (SPL), but a CRC checksum error occurs.

  3. If the board is powered up with no response on the serial debug port (as described under 1) and the input power is then toggled relatively fast (hundreds of ms), the board boots with no errors and reaches the login prompt “….login:”
    See attached picture: SN2819_Scope17_5V_1V8_1V0RTC_DDR_Fast_ON_OFF_Startup.png

  4. The board doesn’t feature a push-button for manual reset, but if we apply a strong pull-down (e.g. 100R) on PMIC_PWR_ON_RST# (pin 19 of TPS65218D0RSL), the board goes into reset and then boots successfully.
    This was repeated more than 10 times on the same board.

  5. Comparing the MCPU_MPU_VDD rails of boards from the old and new batches, we observed that boards from the new batch take about 10x longer between power-up and the vcore scaling event, where the MPU voltage rail is switched from 1,1V to 1,325V.
    new batch ~ 3,6 seconds
    old batch ~ 0,1 … 0,34 ms
    See attached pictures:
    SN0011_MPU_VDD.png
    SN2798_MPU_VDD.png
    SN2819_MPU_VDD.png

    Do you have any idea why vcore_scaling takes so long on the new boards?

  6. Our EMS partner company informed us that they replaced the DDR3 ICs (batch 2134) with ICs from another batch on 4 boards where the boot issue had occurred.
    The outcome: the boards booted with no errors and reached the login prompt “….login:”, and they also managed to flash the eMMC memory.

    We have already asked them to check whether vcore_scaling still takes so long after the DDR3 ICs were swapped. We will let you know as soon as we receive their feedback.

  7. We have also checked the power-up sequence and it seems fine.
    See attached pictures SN2819_Scope01 to 17.

    Tomorrow we are going to check the I2C communication between the Sitara and the PMIC, and the power-up sequence on an old board, in the same way as was done for the new board.
    Afterwards the DDR signals will be checked.

  8. The “Power Sequencing With RTC Feature Enabled, All Dual-Voltage IOs Configured as 3.3 V” scheme is used, but the internal RTC is disabled (the RTC XTAL is not running). We are not sure whether this can cause any issues.


       
    Could you advise what else could be checked in order to find the root cause?

    If there were an issue with the EMIF settings for DDR3, would it be possible for booting to work every time after the second power-up or a manual reboot?
    We had a similar issue in the development phase, because the EMIF was not configured properly at all. But in that case, the DDR3 crashed somewhere in the middle of the Linux boot process and never reached the end of it (the Linux login prompt).

    We can also share more details (schematic, layout, EMIF config, etc.) via our local TI representative, if needed.

    Board_Test_Measurements.zip

    Br

    Josko

  • There is a mistake under point 5; the correct values are:
    new batch ~ 3,6 seconds
    old batch ~ 0,1 … 0,34 s

  • Update to point 3): 
    - Our EMS partner company tried this workaround with power supply toggling on 20 problematic boards, and they all behave the same. If the input power is toggled relatively fast (hundreds of ms), the board boots with no errors and reaches the login prompt “….login:”

     

    Update to point 5): 

    - We have also compared SDIO_CLK and SDIO_CMD during the power-up sequence on new and old boards.

    On new boards the ROM bootloader needs 3 seconds to access the SPL on the SD card (from the point where PMIC_PWR_ON_RST# is released to the point where SDIO_CLK starts to be generated). On old boards it takes less than 30 ms.

  • Update to point 6): 

    There is no change in the time from startup to vcore scaling after swapping the DDR3 ICs. Boards with DDR3 IC date code 2134, where booting doesn't work, take the same time to reach vcore scaling as boards with DDR3 IC date code 2111, where booting works fine.

       

  • Hi Josko,

    We are looking into this and will follow up with you shortly.

    Regards,

    Colin

  • Hi Colin, 

    Update regarding the vcore_scaling time: 

    - We made a wrong assumption in our analysis. All storage media on the new boards are empty, and we used an SD card in the tests with these boards.
    Since the SD card is last in the standard boot list, the ROM bootloader runs through the whole boot-device list, and this is what takes 3.6 seconds.
    Normally, QSPI is the first medium in the boot order list.

    Br

    Josko

  • Hi Josko,

    Thank you for the additional detail. To clarify, does this mean the boot issue is resolved?

    Regards,

    Colin

  • Hi Colin, 

    unfortunately no. 

    The issue with booting is still there! 

    Br

    Josko 

  • Hi Josko,

    Can you please clarify the boot order that is used? 

    Do you have JTAG-debug access?

    Do you see QSPI, MMC clocks, UART waveforms on the board during the attempted boot sequence? 

    Regards,

    Colin

  • Hi Colin,

    (1)  the boot order is as follows:

    SYSBOOT [4:0]:  11000b MMC0, USB1, USB0, QSPI

    (Initially the QSPI is empty, so we force the boot device to be the SD card slot MMC0 with a tactile switch connected via 10k to SYSBOOT4 and pulled up to VDDIO.)

    Afterwards, once the tactile switch is no longer pressed, this boot order applies:

    SYSBOOT [4:0]:  01000b QSPI, USB1, MMC0, USB0 (MMC0 = SD Card Slot)

    (2) Yes, we have JTAG debug access (TC2050-IDC-FP adapter).

    (3) Yes, we can see the QSPI and MMC0 CLK and UART TX (output from AM4376) during the attempted boot sequence.
    Please find waveforms attached: CLKs_at_Startup.zip.
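
The effect of the tactile switch on the boot order in (1) can be sketched as a small lookup. This is only an illustration: the table contains just the two SYSBOOT[4:0] values quoted in this post, not the full AM437x boot table, and the function names are hypothetical.

```python
# Only the two SYSBOOT[4:0] settings mentioned in this thread.
BOOT_ORDER = {
    0b11000: ("MMC0", "USB1", "USB0", "QSPI"),  # tactile switch pressed
    0b01000: ("QSPI", "USB1", "MMC0", "USB0"),  # default, switch released
}

SYSBOOT4_MASK = 1 << 4  # the switch pulls SYSBOOT4 up to VDDIO


def apply_switch(sysboot_default: int, pressed: bool) -> int:
    """Return SYSBOOT[4:0] with bit 4 set while the switch is pressed."""
    if pressed:
        return (sysboot_default | SYSBOOT4_MASK) & 0x1F
    return sysboot_default & ~SYSBOOT4_MASK & 0x1F


def boot_order(sysboot: int):
    """Look up the boot device list; raises KeyError for unlisted values."""
    return BOOT_ORDER[sysboot & 0x1F]


for pressed in (True, False):
    value = apply_switch(0b01000, pressed)
    print(format(value, "05b"), boot_order(value))
```

With the switch pressed, MMC0 (the SD card slot) is tried first. Released, QSPI comes first.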

    We have added some printf statements that run before the SPL is loaded, to see where the process stops.

    In the normal case, when the boot executes correctly, it looks like this:

    <debug_uart>
    early_system_init -> debug_uart_init() done
    spl_early_init() -> done
    do_board_detect() -> done
    scale_vcores() -> done
    sdram_init() -> Perform hardware leveling for DDR3
    done
    ram size: 40000000
    board_init_f done

    U-Boot SPL 2019.01-g5dab7a2-dirty (Dec 17 2021 - 07:46:07 +0100)
    Trying to boot from MMC1


    U-Boot 2019.01-g5dab7a2-dirty (Dec 17 2021 - 07:46:07 +0100)

    CPU : AM437X-GP rev 1.2
    Model: TI AM437x Industrial Development Kit
    DRAM: 1 GiB
    PMIC: TPS65218
    MMC: OMAP SD/MMC: 0, OMAP SD/MMC: 1
    Loading Environment from MMC... OK
    Hit any key to stop autoboot: 0
    Booting from mmc...
    49941 bytes read in 4 ms (11.9 MiB/s)
    21475384 bytes read in 1081 ms (18.9 MiB/s)
    ## Booting kernel from Legacy Image at 88080000 ...
    Image Name: Linux-4.19.59-215cd90
    Image Type: ARM Linux Kernel Image (uncompressed)
    Data Size: 21475320 Bytes = 20.5 MiB
    Load Address: 82000000
    Entry Point: 82000000
    Verifying Checksum ... OK
    ## Flattened Device Tree blob at 88000000
    Booting using the fdt blob at 0x88000000
    Loading Kernel Image ... OK
    Loading Device Tree to 8fff0000, end 8ffff314 ... OK

    Starting kernel ...

     

    In the failure case, where the board is powered up for the first time and the boot does not execute correctly, it looks like this:

    <debug_uart>
    early_system_init -> debug_uart_init() done
    spl_early_init() -> done
    do_board_detect() -> done
    scale_vcores() -> done
    sdram_init() -> Perform hardware leveling for DDR3
    done
    ram size: 00000800
    board_init_f done


    <debug_uart>
    early_system_init -> debug_uart_init() done
    spl_early_init() -> done
    do_board_detect() -> done
    scale_vcores() -> done
    sdram_init() -> Perform hardware leveling for DDR3
    done
    ram size: 00000200
    board_init_f done

    The boot process stops after board_init_f done. 

    We added these printouts because at the beginning we got nothing on UART0 and didn't know what was going on.

    Sometimes this occurs as well:

    <debug_uart>
    early_system_init -> debug_uart_init() done
    spl_early_init() -> done
    do_board_detect() -> done
    scale_vcores() -> done
    sdram_init() -> Perform hardware leveling for DDR3
    done
    ram size: 40000000
    board_init_f done

    Here everything looks fine, but it doesn't go any further.
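
When many boards have to be screened, the signatures above can be told apart automatically from the captured UART output. This is a rough sketch only: it assumes the "ram size:" value is printed in hex (0x40000000 = 1 GiB) and that a good boot continues with the "U-Boot SPL" banner, as in the logs above. The function name and result strings are hypothetical.

```python
import re

EXPECTED_RAM_SIZE = 0x40000000  # 1 GiB, as printed on a good boot

RAM_SIZE_RE = re.compile(r"ram size:\s*([0-9a-fA-F]+)")


def classify_boot_log(log: str) -> str:
    """Classify a captured UART0 log into one of the observed signatures."""
    match = RAM_SIZE_RE.search(log)
    if match is None:
        return "no ram size line"
    size = int(match.group(1), 16)
    if size != EXPECTED_RAM_SIZE:
        return "wrong ram size 0x%08x" % size
    if "U-Boot SPL" not in log:
        # RAM size looks right, but the SPL banner never appears.
        return "stalled after board_init_f"
    return "ok"


print(classify_boot_log("ram size: 00000200\nboard_init_f done\n"))
```

A log with "ram size: 00000800" or "ram size: 00000200" is reported as a wrong RAM size, and the third case above is reported as a stall after board_init_f.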

     

    CLKs_at_Startup.zip

    Br

    Josko

  • Compared with data sheet Rev. A (12/19/2021), ISSI has added the following notes in data sheet Rev. B1 (06/25/2021):

    33. The tRAS(min) is required in order to complete the internal Precharge during a Read or Write with Auto Precharge.
    34. During DLL off (Disable), tWR(min) is the greater of 4nCK or 15ns.
    35. For DLL off (Disable)
    a. Any Auto Refresh Command must have at least one NOP Command before the next Auto Refresh.
    b. If tCK < 40ns, any Auto Refresh Command must be followed by a Precharge All Command.
    c. For 85°C < tCASE ≤ 95°C, tCK(max) = 3.9μs. DLL off is not specified for tCASE > 95°C

    Could this have any impact on the EMIF calculation tool?
     

  • Hi Josko,

    This debug requires additional DDR experts, who are all out until 3-Jan. Unfortunately we will not be able to provide additional DDR debugging insight until then.

    Regards,

    Colin

  • Hi Josko,

    May we ask for a copy of the schematics? There should be a TI rep that you can forward NDA documents to that can reach us, without posting schematics on the public forums.

    Secondly, can you please disable vcore_scaling to confirm that it is unrelated to the failure signature?

    Additionally, can you show us oscilloscope captures of the QSPI signals and possibly the power signals at the moment of failure?

    Can you also please confirm that these voltage rails maintain a consistent 1.8V before, during and after the failure: VDDS_PLL_DDR, VDDS_PLL_CORE_LCD, VDDS_PLL_MPU, VDDS_SRAM_MPU_BB?

    Lastly, with JTAG access, can you please confirm where the A9's PC is at the point of failure, especially during the 3s delay referenced in '5.'? If you have the source code, we should be able to see where in the boot sequence things are failing.

    Regards,

    Colin 

  • Hi Colin, 

    Today is my first working day this year, hence the delayed response.

    (1) The schematic was sent to the TI rep before Christmas. We will check the status today.

    (2) OK, we will disable vcore_scaling and let you know the results.

    (3) I will send the scope captures of all QSPI signals. Have you checked the captures in Board_Test_Measurements.zip and CLKs_at_Startup.zip?

    (4) The 1.8V rails VDDS_PLL_DDR, VDDS_PLL_CORE_LCD, VDDS_PLL_MPU and VDDS_SRAM_MPU_BB will be captured today before, during and after the failure. However, we have already checked these signals, and it is quite difficult to notice any difference on any rail during power-up (and later) between a normal boot and the failure case.

    (5) Have you checked the previous post with these printouts?

    <debug_uart>
    early_system_init -> debug_uart_init() done
    spl_early_init() -> done
    do_board_detect() -> done
    scale_vcores() -> done
    sdram_init() -> Perform hardware leveling for DDR3
    done
    ram size: 40000000
    board_init_f done  

    I have already explained the reason for the delay in a previous post: "We made a wrong assumption in our analysis. All storage media on the new boards are empty, and we used an SD card in the tests with these boards. Since the SD card is last in the standard boot list, the ROM bootloader runs through the whole boot-device list, and this is what takes 3.6 seconds. Normally, QSPI is the first medium in the boot order list."


    Additional question: 

    What findings led TI to require the pull-up resistor R141 (1K) on the DDR_RESETn line in the reference design “TMDSEVM437X, AM437x high security evaluation module, AM437x General Purpose Evaluation Module (EVM) Schematic — SPRR396.ZIP (761 KB)”?
    By contrast, neither the AM437x Starter Kit (SK) Schematic — SPRR202.ZIP (588 KB) nor the AM437x/AMIC120 Industrial Development Kit (IDK) Schematic (Rev. A) — TIDRC79A.ZIP (1276 KB) has this pull-up even provisioned in the schematic.

     

  • Hi Colin,

    Have you seen my last question regarding the pull-up resistor on the DDR3_RESETn line?
    Why was it used in one design and not in the others?

    Br

    Josko 

  • Hi Josko,

    My last response to you from 6 days ago is not on this thread? I am not sure what is going on there. A group of experts held a long discussion on this last Thursday and we responded asking for details of things like:

    • did anything in the BOM change?
    • did the PCB material change?
      • these two questions were of most interest since the configuration was stable and working before the DDR3 IC change.
    • "our custom board features AM4376BZDN100, TPS65218D0RSL, single chip 1GB DDR3L single point-to-point connection"
    • "my last question regarding pull-up resistor on DDR3_RESETn line"
      • On AM437x this signal is driven, so it would overcome any pulls. The pullup implemented on your setup for this layout should be inconsequential.
    • lastly we never received any schematic files from any field team. Can you please ask the team you are communicating with to forward them to us?

    Apologies again for the delay.

    Regards,

    Colin

  • Hi Colin, 

    Unfortunately I can't see the reply you sent 6 days ago. It is simply not visible to me in this thread.

    Here are some updates from our side:

    • There was no change in the BOM from our side. The only change we have identified is the production date code of the DDR3 chip (DateCode2134).
      Chips with any date code other than 2134 work without reported malfunction.
    • There was no change in PCB material or PCB stack-up. Furthermore, the same PCB works without this boot issue if the existing BGA DDR3 chip (DateCode2134) is replaced with one that has a production date code other than DateCode2134.
    • VTT termination is not implemented.
    • The question regarding the pull-up resistor on DDR3_RESETn was asked because we found the following:
      • If the pull-up resistor on the DDR3_RESETn line (R400 in our schematic) is removed from the board, the issue is no longer present. The boot process runs from beginning to end without any issues.
      • Our EMS partner company tried this on 5 boards populated with the DDR chip (DateCode2134), and afterwards all boards were bootable.
      • This could be a workaround for the issue, but the concerning thing is that we still don’t know the real root cause.

      • Please check the attached file. There are two captures, one with and one without the pull-up on DDR3_RESETn.
        Interestingly, these signals from boards where no DateCode2134 chip is populated look exactly the same as those in capture "Scope49".

    • We have tried disabling vcore_scaling but didn’t see any difference in behavior.

     

    Please just confirm that you received the schematic excerpt showing the memories and the A9's EMIF pins. I sent it to you as an attachment in a private message.

    Regarding the QSPI signal measurements, could you tell us which combinations of signals you want to see in the scope captures and which signal we should trigger on?
    For example, we can always keep the 3.3V rail and the power-on reset signal in each scope capture and add the QSPI signals accordingly:

    Capture 1: 

    Ch1 - 3.3V rail
    Ch2 - PWR_ON_RSTn
    Ch3 - QSPI RESETn
    Ch4 - QSPI CLK

    Capture 2: 

    Ch1 - 3.3V rail
    Ch2 - PWR_ON_RSTn
    Ch3 - CSn
    Ch4 - D0

    Capture 3: 

    Ch1 - 3.3V rail
    Ch2 - PWR_ON_RSTn
    Ch3 - D1
    Ch4 - D2

    Capture 4: 

    Ch1 - 3.3V rail
    Ch2 - PWR_ON_RSTn
    Ch3 - D3
    Ch4 - empty

    etc. 

    For any further measurements it would be very helpful to agree in advance on which signals you want to see in the oscilloscope captures and which signal we should trigger on.

    Regards

    Josko

    Boards_Test_with_and_without_pull_up.zip

  • Hi Colin, 

    Additionally, have you perhaps checked the new notes in the DDR3 chip datasheet Rev. B1?
    Please find Rev. B1 of the data sheet attached.

    Update on page 25: 

    Update on page 71: 

    Can this cause any issues regarding leveling, the timing settings in the EMIF ConfigTool or similar?


    43-46TR16512B-81024BL_June2021.pdf

     

    Br

    Josko

  • Hi Colin,

    Could you let me know if there is any progress on your side? I don't see any reply to my last post.

    BR

    Josko

  •  Hi Colin,

    We have some news and questions.

    If the pull-up resistor on the DDR_RESETn line is removed, the problem disappears.

    However, it seems that the DDR_RESETn pin does not behave the same way on the AM43xx and the AM33xx.
    It looks like the AM43xx can’t keep the DDR_RESETn line low if the pull-up is present. Also, the “BALL RESET REL. STATE” is not defined for the AM43xx.
    Could you comment on this?

    Next, Errata note 3.1.2 "DDR3/DDR3L: JEDEC Specification Violation for DDR3 RESET Signal When Implementing RTC+DDR Mode" states that a 1K pull-up is required on the DDR_RESETn line when implementing RTC+DDR mode.
    It is now clear that the pull-up on the DDR_RESETn line impacts the DDR state machine initialization, because RESET# rises together with DDR_VDD.

    This may leave the initialization of the control logic incomplete and hence put the internal state machine in an unexpected state. According to the DDR manufacturer, the occurrence of the issue can vary depending on the process.
    What can be done to avoid this situation during power-up?
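
To make the concern measurable on scope captures, the RESET# low time after power-up can be checked against the JEDEC DDR3 power-up requirement that RESET# stays low for at least 200 µs once the supply is stable. The sketch below is only an illustration: the timestamps are hypothetical placeholders, not values measured on our boards.

```python
# JEDEC DDR3 power-up: RESET# must stay low for at least 200 us
# after the supply has become stable.
MIN_RESET_LOW_AFTER_POWER_S = 200e-6


def reset_low_time_ok(t_vdd_stable_s: float, t_reset_rise_s: float) -> bool:
    """True if RESET# stayed low long enough after DDR_VDD became stable.

    Both arguments are timestamps in seconds read from the same capture.
    """
    return (t_reset_rise_s - t_vdd_stable_s) >= MIN_RESET_LOW_AFTER_POWER_S


# Hypothetical example values, for illustration only:
print(reset_low_time_ok(0.0, 0.0))   # RESET# rising together with DDR_VDD
print(reset_low_time_ok(0.0, 1e-3))  # RESET# released 1 ms after DDR_VDD
```

If RESET# rises together with DDR_VDD, as suspected with the pull-up fitted, the low time is effectively zero and the check fails.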

    Regards

    Josko