AM3352: Calibration of DDR3 PHY FIFO WE slave ratio (aka gate leveling)

Part Number: AM3352

I have spent some time trying to understand the details of the calibration done with the DDR3_slave_ratio_search_auto.out binary. There are just two things that I don't fully understand:

1) On our board the DATA_PHY_FIFO_WE_SLAVE_RATIO search finds a minimum value of 0 and a maximum value of about 0x154, resulting in a final value of 0xAA. Why is a minimum of 0 considered valid if the maximum is found to be more than one full clock cycle away? As far as I know, any minimum less than the maximum minus tRPRE is wrong. In my case it still works because the average of minimum and maximum is less than tRPRE away from the maximum, but it is suboptimal because it is not centered within tRPRE.

2) The Ratio Seed spreadsheet calculates a seed value for DATA_PHY_FIFO_WE_SLAVE_RATIO that is twice as sensitive to the DQS length as to the CK length. Why? The FIFO WE slave ratio has to compensate for the time it takes the read command to travel from the CPU to the RAM chip and for the time it takes the result to travel back to the CPU. There is no second transfer on the DQ(S) lines, starting after the result has arrived, that has to be taken into account in the formula. The equivalent spreadsheet for Keystone I processors (SPRABL2) uses a different formula where DQS length and CK length have equal weight. Is the formula in the AM335x spreadsheet wrong?
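
The arithmetic behind question 1 can be sketched as follows. This is only an illustration, not the search program's code. It assumes that slave ratios are expressed in 1/256ths of a DDR clock cycle (so 0x100 is one tCK) and that tRPRE is 0.9 tCK. The function names are hypothetical.

```python
# Assumption: slave ratios count 1/256ths of a DDR clock, so 0x100 == one tCK.
UNITS_PER_CLOCK = 0x100


def midpoint(min_ratio, max_ratio):
    """Plain average of the passing range, as the search reports it."""
    return (min_ratio + max_ratio) // 2


def centered_in_preamble(max_ratio, trpre_clocks):
    """Hypothetical alternative: centre within the last tRPRE before max."""
    window = round(trpre_clocks * UNITS_PER_CLOCK)
    lower = max(0, max_ratio - window)
    return (lower + max_ratio) // 2


reported = midpoint(0x000, 0x154)
# Assumption: tRPRE = 0.9 tCK; the real value depends on the DRAM part.
alternative = centered_in_preamble(0x154, trpre_clocks=0.9)
print(hex(reported), hex(alternative))
```

With these assumptions the reported value is 0xAA, which is still less than one tRPRE below the maximum, while a value centered within the preamble window would sit noticeably higher.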

  • Hello Daniel,

    Due to the US holidays, on this particular E2E thread, our response may get delayed until the week of Jan 2, 2019.

    Warmest Wishes for Happy Holidays and a Happy New Year!

    best regards,
    David Zhou
  • Daniel, there is a slight bug in the algorithm that does not allow the search to go negative. Can you try the attached algorithm, which uses CMD_PHY_CTRL_SLAVE_RATIO = 0x100 and CMD_PHY_INVERT_CLKOUT = 0x1? We are in the process of updating the collateral with this new algorithm.

    On your second question, I will have to investigate more. This seems to have been a recommendation from the controller specification.

    DDR3_SlaveRatioSearch_ver2.zip

  • The attached binary gives me strange results:

    ***************************************************************
            The Slave Ratio Search Program Values are... 
    ***************************************************************
    PARAMETER                       MAX  |  MIN  | OPTIMUM |  RANGE 
    ***************************************************************
    DATA_PHY_RD_DQS_SLAVE_RATIO    0x3ff | 0x000 |  0x1ff  | 0x3ff
    DATA_PHY_FIFO_WE_SLAVE_RATIO   0x3ff | 0x000 |  0x1ff  | 0x3ff
    DATA_PHY_WR_DQS_SLAVE_RATIO    0x3ff | 0x000 |  0x1ff  | 0x3ff
    DATA_PHY_WR_DATA_SLAVE_RATIO   0x43f | 0x040 |  0x23f  | 0x3ff
    ***************************************************************
    rd_dqs_range = 1ff
    fifo_we_range = 1ff
    wr_dqs_range = 1ff
    wr_data_range = 0
    
    Optimal values not reached, rerunning program with new values...
    
    ***************************************************************
            The Slave Ratio Search Program Values are... 
    ***************************************************************
    PARAMETER                       MAX  |  MIN  | OPTIMUM |  RANGE 
    ***************************************************************
    DATA_PHY_RD_DQS_SLAVE_RATIO    0x3ff | 0x000 |  0x1ff  | 0x3ff
    DATA_PHY_FIFO_WE_SLAVE_RATIO   0x3ff | 0x000 |  0x1ff  | 0x3ff
    DATA_PHY_WR_DQS_SLAVE_RATIO    0x3ff | 0x000 |  0x1ff  | 0x3ff
    DATA_PHY_WR_DATA_SLAVE_RATIO   0x43f | 0x040 |  0x23f  | 0x3ff
    ***************************************************************
    rd_dqs_range = 0
    fifo_we_range = 0
    wr_dqs_range = 0
    wr_data_range = 0
    
    Optimal values have been found!!
    
    ***************************************************************
            The Slave Ratio Search Program Values are... 
    ***************************************************************
    PARAMETER                       MAX  |  MIN  | OPTIMUM |  RANGE 
    ***************************************************************
    DATA_PHY_RD_DQS_SLAVE_RATIO    0x3ff | 0x000 |  0x1ff  | 0x3ff
    DATA_PHY_FIFO_WE_SLAVE_RATIO   0x3ff | 0x000 |  0x1ff  | 0x3ff
    DATA_PHY_WR_DQS_SLAVE_RATIO    0x3ff | 0x000 |  0x1ff  | 0x3ff
    DATA_PHY_WR_DATA_SLAVE_RATIO   0x43f | 0x040 |  0x23f  | 0x3ff
    ***************************************************************
    
    ===== END OF TEST =====

    This is regardless of what I enter for PHY_INVERT_CLKOUT or *_SLAVE_RATIO. It's almost as if it never tries to change the DDR PHY registers.

  • Daniel, that's strange. Let me double check the binary.

    On your other question, we did look at the seed formula, and you are indeed correct: there should not be a *2 for DQS. The actual consequence of this is minimal, as the software leveling will converge to the optimum value anyway. We appreciate the comment and will get it corrected.

    Regards,
    James

  • Daniel, the binary works here on our EVM. A couple of things:

    1. Are you running the AM335x_Initialization GEL script before running the algorithm? Can you attach the GEL you are using?
    2. Ensure that you are in Supervisor mode after you load the algorithm using Run->Load->Load Program in CCS. In the bottom right corner of the CCS window, you should see SPV before attempting to run the algorithm. If you see something else (such as USR), you need to run the AM335xStartState script to return to Supervisor mode.

    Thanks,
    James
  • The first four instructions of your binary (entry point is at 0x40309718) drop from Supervisor mode into User mode. If I change the third instruction from "orr r0, r0, #16" to "orr r0, r0, #19" the processor stays in Supervisor mode and the calibration succeeds:

    Optimal values have been found!!
    
    ***************************************************************
            The Slave Ratio Search Program Values are...
    ***************************************************************
    PARAMETER                       MAX  |  MIN  | OPTIMUM |  RANGE
    ***************************************************************
    DATA_PHY_RD_DQS_SLAVE_RATIO    0x06b | 0x003 |  0x037  | 0x068
    DATA_PHY_FIFO_WE_SLAVE_RATIO   0x151 | 0x000 |  0x0a8  | 0x151
    DATA_PHY_WR_DQS_SLAVE_RATIO    0x080 | 0x000 |  0x040  | 0x080
    DATA_PHY_WR_DATA_SLAVE_RATIO   0x0c0 | 0x040 |  0x080  | 0x080
    ***************************************************************
    

    But as you can see the DATA_PHY_FIFO_WE_SLAVE_RATIO range still exceeds tRPRE. This is with PHY_INVERT_CLKOUT set to 0.
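
The mode change described at the start of this post can be illustrated with the ARM CPSR mode field, bits [4:0]. The sketch below assumes that the instruction before the `orr` cleared those five bits; the thread does not show it, and the starting CPSR value is made up. The helper names are hypothetical.

```python
MODE_MASK = 0x1F  # CPSR[4:0] selects the processor mode
MODE_NAMES = {0x10: "User", 0x11: "FIQ", 0x12: "IRQ", 0x13: "Supervisor"}


def orr(value, immediate):
    """Model of 'orr r0, r0, #immediate'."""
    return value | immediate


def mode_name(cpsr):
    """Decode the mode field of a CPSR value."""
    return MODE_NAMES.get(cpsr & MODE_MASK, "other")


# Assumption: an earlier instruction cleared the mode bits (illustrative CPSR).
cleared = 0x000001D3 & ~MODE_MASK

original = mode_name(orr(cleared, 16))  # mode bits become 0b10000
patched = mode_name(orr(cleared, 19))   # mode bits become 0b10011
print(original, patched)
```

Under that assumption, `#16` selects User mode and `#19` selects Supervisor mode, which matches the behavior observed with the unmodified and the patched binary.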

  • Daniel, what if you try with INVERT_CLKOUT=1?
  • Daniel, a few other things. Some of the results from the boards I have tested also show a similar range for FIFO_WE_SLAVE_RATIO. That equates to a range of about 1.5 clock cycles, or roughly a quarter cycle wider than tRPRE on each side. The algorithm sweeps the delay values, starting from the seed value and going positive, then from the seed value going negative, to find the full range of operation, and then chooses the middle as the optimal value.
    I took a look at a DDR3 datasheet, and the possible range for tRPRE is around 0.9 tCK to 1 tCK + 400 ps (for an 800 MHz clock). So I think the range is reasonable.
    Another thing I noticed in your log is that the minimum is 0, which doesn't seem right. So try with INVERT_CLKOUT=1, and also send the seeds you are using.

    Thanks,
    James
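
The sweep described in this post can be sketched roughly as below. This is not the code of the TI binary. `find_window` and `fake_memory_test` are hypothetical names, and the pass/fail function is a stand-in for a real memory test. The lower limit of 0 shows how a search that cannot go negative ends up reporting a minimum of 0.

```python
def find_window(seed, passes, lowest=0x000, highest=0x3FF):
    """Walk up, then down, from a passing seed; return (min, max, optimum)."""
    if not passes(seed):
        raise ValueError("seed must lie inside the passing window")
    upper = seed
    while upper < highest and passes(upper + 1):
        upper += 1
    lower = seed
    while lower > lowest and passes(lower - 1):
        lower -= 1
    return lower, upper, (lower + upper) // 2


def fake_memory_test(ratio):
    """Hypothetical stand-in: pretend every ratio up to 0x151 passes."""
    return ratio <= 0x151


# Assumption: 0x080 is an arbitrary seed inside the passing window.
low, high, optimum = find_window(0x080, fake_memory_test)
print(hex(low), hex(high), hex(optimum))
```

With this stand-in the sweep stops at the lower limit and reports a window of 0x000 to 0x151 with a midpoint of 0x0a8, the same shape as the FIFO_WE row in the log above.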
  • I have automated the process of entering the values and restarting the calibration and let it run for a few hours with random parameters.

    At 400 MHz without INVERT_CLKOUT the RAM is stable when
    2 < RD_DQS_SLAVE_RATIO < 110
    FIFO_WE_SLAVE_RATIO < 281+RD_DQS_SLAVE_RATIO
    0 < WR_DQS_SLAVE_RATIO < 142
    8+WR_DQS_SLAVE_RATIO < WR_DATA_SLAVE_RATIO < 116+WR_DQS_SLAVE_RATIO

    With INVERT_CLKOUT the RAM is stable when
    2 < RD_DQS_SLAVE_RATIO < 59
    19+RD_DQS_SLAVE_RATIO < FIFO_WE_SLAVE_RATIO < 410+RD_DQS_SLAVE_RATIO
    24 < WR_DQS_SLAVE_RATIO < 262
    8+WR_DQS_SLAVE_RATIO < WR_DATA_SLAVE_RATIO < 115+WR_DQS_SLAVE_RATIO

    But strangely it is also stable when RD_DQS_SLAVE_RATIO and FIFO_WE_SLAVE_RATIO instead satisfy
    RD_DQS_SLAVE_RATIO < 477
    288 < FIFO_WE_SLAVE_RATIO < 5+RD_DQS_SLAVE_RATIO

    It is also strange that the first RD_DQS_SLAVE_RATIO region does not include the expected seed value of 64. In theory INVERT_CLKOUT should not have an effect on the values allowed for RD_DQS_SLAVE_RATIO. When putting all measurements in a RD_DQS_SLAVE_RATIO/FIFO_WE_SLAVE_RATIO diagram, the line that limits RD_DQS_SLAVE_RATIO to values smaller than 59 is very jagged.

    For kicks I also ran it at 303 MHz. Here the RAM is stable when
    2 < RD_DQS_SLAVE_RATIO < 115
    FIFO_WE_SLAVE_RATIO < 274+RD_DQS_SLAVE_RATIO
    WR_DQS_SLAVE_RATIO < 137
    6+WR_DQS_SLAVE_RATIO < WR_DATA_SLAVE_RATIO < 117+WR_DQS_SLAVE_RATIO
    are satisfied without INVERT_CLKOUT and
    3 < RD_DQS_SLAVE_RATIO < 111
    13+RD_DQS_SLAVE_RATIO < FIFO_WE_SLAVE_RATIO < 403+RD_DQS_SLAVE_RATIO
    18 < WR_DQS_SLAVE_RATIO < 258
    7+WR_DQS_SLAVE_RATIO < WR_DATA_SLAVE_RATIO < 116+WR_DQS_SLAVE_RATIO
    with INVERT_CLKOUT.

    Again there is an unexpected, disjoint second region with INVERT_CLKOUT. But this time it exists for the write parameters:
    0 < WR_DQS_SLAVE_RATIO < 16
    WR_DATA_SLAVE_RATIO < 364+WR_DQS_SLAVE_RATIO

    Best regards,

    Daniel
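
The 400 MHz regions listed in this post can be written as predicates, which makes it easy to check whether a candidate set of ratios falls inside them. This is a sketch that assumes the bounds are decimal and the ratios are non-negative. The function names are hypothetical, and the sample point is the set of optimum values from the earlier log, used purely as an illustration.

```python
def stable_400mhz_no_invert(rd_dqs, fifo_we, wr_dqs, wr_data):
    """Stable region at 400 MHz without INVERT_CLKOUT, as listed above."""
    return (
        2 < rd_dqs < 110
        and fifo_we < 281 + rd_dqs
        and 0 < wr_dqs < 142
        and 8 + wr_dqs < wr_data < 116 + wr_dqs
    )


def stable_400mhz_invert(rd_dqs, fifo_we, wr_dqs, wr_data):
    """Stable region at 400 MHz with INVERT_CLKOUT, including the second
    read region."""
    read_main = 2 < rd_dqs < 59 and 19 + rd_dqs < fifo_we < 410 + rd_dqs
    read_second = rd_dqs < 477 and 288 < fifo_we < 5 + rd_dqs
    write_ok = 24 < wr_dqs < 262 and 8 + wr_dqs < wr_data < 115 + wr_dqs
    return (read_main or read_second) and write_ok


# Sample point: optimum values from the earlier log (illustrative only).
sample = (0x037, 0x0A8, 0x040, 0x080)
print(stable_400mhz_no_invert(*sample), stable_400mhz_invert(*sample))
```

A point such as RD_DQS_SLAVE_RATIO = 400 and FIFO_WE_SLAVE_RATIO = 300 passes only through the second read region, which is the unexpected case described above.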

  • Daniel

    I'm not sure what you are trying to accomplish running the script with random parameters. Basically, the objective of the leveling procedure is to make sure you have enough margin for your operating frequency. Based on the new binary that James provided, I recommend you still enter the seed ratio values based on your board layout and use invert_clkout=1; with invert_clkout=1, you also need to adjust the phy_ctrl_slave_ratio to a value of 0x100.

    Ultimately, with this setup, you want to ensure the following, all of which the binary accomplishes:
    - DQS vs. CLK board delay differences are compensated
    - DQS gate placement is done correctly to align with the Read DQS
    - Read eye training places the Read DQS optimally in the center of the RD Data eye

    Please do functional tests after you arrive at the optimal values from the binary to make sure the DDR interface is robust.

    Regards, Siva
  • Daniel

    Please let us know if you need any more information on this thread.

    Regards, Siva