How to debug SmartReflex hardware issues?

Other Parts Discussed in Thread: TPS65217

I have a board design, not too different from the white BeagleBone, on which some boards (but not all) crash in or around init.

I have found that if I disable SmartReflex, the board boots just fine, but the 600 and 720 MHz scaling frequencies are missing and the voltage stays at 1.10 V.

While SmartReflex scaling isn't absolutely necessary, I would like to fix this issue, since something is obviously wrong here, and staying at 500 MHz isn't optimal.

On the scope I can see that when the kernel boot reaches init, SmartReflex steps the voltage down in two steps (this seems to depend somewhat on the scaling governor), so SmartReflex seems to be doing its job. Then the crash follows (typically "Unable to handle kernel paging request at virtual address d2d98d9c" and similarly hard crashes).

After this, there's no way of getting in contact with the board.

I am currently running on a board with an AM3352ZCZD72, a TPS65217 and SDK 06.00.00.

Some boards have an AM3352BZCZ60 populated, and those boards seem far more stable. I'm not sure whether this is due to the 'B' revision, or possibly to it being a 600 MHz part.

Is there any documentation on how to proceed from here? Should I concentrate on the core or the MPU supply, etc.? Can I start the kernel with some argument to have SmartReflex disabled at boot time?

Any suggestions are welcome!

  • Hi Micael,
     
    The AM3352ZCZD72 is silicon revision 1.0 and probably has no SmartReflex values programmed in e-fuses. The 'B' parts definitely should have these values programmed. Check these posts for further information:
     
     
  • Hi Biser,

    Looking at the boot log (kernel with SR enabled, before the crash), the only relevant lines I can see are:

    [ 0.807641] tps65217 1-0024: TPS65217 ID 0xf version 1.1
    [ 0.815622] print_constraints: DCDC1: 900 <--> 1800 mV at 1800 mV
    [ 0.824574] print_constraints: DCDC2: 900 <--> 3300 mV at 1100 mV
    [ 0.833396] print_constraints: DCDC3: 900 <--> 1500 mV at 1100 mV
    [ 0.842209] print_constraints: LDO1: 1000 <--> 3300 mV at 1800 mV
    [ 0.851064] print_constraints: LDO2: 900 <--> 3300 mV at 3300 mV
    [ 0.859841] print_constraints: LDO3: 1800 <--> 3300 mV at 3300 mV
    [ 0.868676] print_constraints: LDO4: 1800 <--> 3300 mV at 3300 mV

     [..]

    [ 1.213046] Power Management for AM33XX family
    [ 1.217977] Trying to load am335x-pm-firmware.bin (60 secs timeout)
    [ 1.224732] Copied the M3 firmware to UMEM
    [ 1.229122] Cortex M3 Firmware Version = 0x181
    [ 1.234822] create_regulator: DCDC2: Failed to create debugfs directory
    [ 1.242538] smartreflex smartreflex: am33xx_sr_probe: Driver initialized
    [ 1.255162] clock: disabling unused clocks to save power

    This suggests to me that the part does have these EFUSE bits set, no?

    Here and there in the crash logs (which can be quite long), I see lines similar to these:

    [ 4.707174] ---[ end trace 186286eaa60e4f31 ]---
    [ 4.712030] Fixing recursive fault but reboot is needed!
    [ 4.962879] am33xx_sr_cpufreq_transition: prechange
    [ 5.043272] SR 1: curr=1000000, delta_v=-24425, calc=975575, act=1000000, gain=19
    [ 5.051475] am33xx_sr_cpufreq_transition: postchange
    [ 5.056792] am33xx_sr_cpufreq_transition: postchange, new opp=0
    [ 5.387911] SR 1: curr=1100000, delta_v=-55000, calc=1045000, act=1050000, gain=00
    [ 6.717906] SR 1: curr=1050000, delta_v=-92500, calc=957500, act=975000, gain=19

    I'm not sure what these mean, though...
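    Checking the numbers in these lines, each one is consistent with calc = curr + delta_v, with act being calc rounded up to the next 25 mV step (for example 1050000 - 92500 = 957500, reported as act=975000). That reading is an inference from the logged values, not from the driver source; the 900 mV base and 25 mV step are assumed from the TPS65217 DCDC range, the gain field is not modeled, and the helper names are hypothetical. A minimal C sketch of that assumed relationship:

```c
#include <assert.h>

/* Hypothetical helper: round a voltage (uV) up to the next regulator step.
 * The base and step are parameters; 900000 uV / 25000 uV are assumed
 * from the TPS65217 DCDC range of 0.9 V to 1.5 V in 25 mV steps. */
long round_up_to_step(long uv, long base_uv, long step_uv)
{
    assert(step_uv > 0);
    if (uv <= base_uv)
        return base_uv;
    long steps = (uv - base_uv + step_uv - 1) / step_uv;
    return base_uv + steps * step_uv;
}

/* Hypothetical model of one "SR 1:" log line under the assumed
 * relationships calc = curr + delta_v and act = calc rounded up
 * to the next 25 mV regulator step. */
void sr_step(long curr_uv, long delta_uv, long *calc_uv, long *act_uv)
{
    *calc_uv = curr_uv + delta_uv;
    *act_uv = round_up_to_step(*calc_uv, 900000, 25000);
}
```

    Fed the curr/delta_v pairs from the three log lines, sr_step() reproduces the calc and act values shown there, so under this reading the last line asks the rail handled by "SR 1" for 975 mV.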

  • Try disabling Smart Reflex on this part to see what happens.

  • If I disable it in the kernel build, the board starts fine, but the maximum speed is then 500 MHz; the 600 and 720 MHz OPPs are missing.

    I'm guessing this is because of these lines in board-am335xevm.c:

    pr_debug("%s: core regulator value %d\n", __func__, voltage_uv);
    if (voltage_uv > 0) {
        rev = omap_rev();
        switch (rev) {
        case AM335X_REV_ES1_0:
            if (voltage_uv <= AM33XX_VDD_CORE_OPP50_UV) {
                /*
                 * disable the higher freqs - we dont care about
                 * the results
                 */
                opp_disable(mpu_dev, AM33XX_OPP120_FREQ);
                opp_disable(mpu_dev, AM33XX_OPPTURBO_FREQ);
            }
            break;
        /* ... (rest of the function not quoted) */

    Thanks,

      Micael
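    The quoted snippet only gates OPPs on ES1.0 parts whose core regulator reads at or below AM33XX_VDD_CORE_OPP50_UV. The sketch below models just that branch outside the kernel to make the condition explicit. The 950000 uV threshold is taken from the TI reply later in this thread (0.95 V), the 600/720 MHz mapping for OPP120/OPPTURBO follows the frequencies discussed here, and the type and function names are hypothetical; this is not the TI board code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed value: 0.95 V, as stated by TI later in this thread. */
#define VDD_CORE_OPP50_UV 950000

/* Hypothetical silicon revision identifiers. */
enum silicon_rev { REV_ES1_0, REV_ES2_0 };

struct mpu_opp {
    unsigned int freq_mhz;
    bool enabled;
};

/* Mirror of the quoted ES1.0 branch: when the core rail is at or below
 * the OPP50 level, disable OPP120 (600 MHz) and OPPTURBO (720 MHz).
 * Other revisions and non-positive voltage readings are left alone. */
void gate_mpu_opps(enum silicon_rev rev, int core_uv,
                   struct mpu_opp *opps, size_t n)
{
    assert(opps != NULL);
    if (core_uv <= 0 || rev != REV_ES1_0)
        return;
    if (core_uv > VDD_CORE_OPP50_UV)
        return;
    for (size_t i = 0; i < n; i++) {
        if (opps[i].freq_mhz == 600 || opps[i].freq_mhz == 720)
            opps[i].enabled = false;
    }
}
```

    With a 1.1 V core, gate_mpu_opps() leaves all four OPPs (275, 500, 600, 720 MHz) enabled, so this branch alone would not remove 600 and 720 MHz at the 1.1 V that U-Boot programs.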

  • In addition, it seems I was mistaken about the 'B' part being much better: I realized that I had tested those boards with an earlier kernel (05.07, I think), which according to the boot log did not accept anything but ES1.0 for SR, and therefore disabled SR.

    So regardless of ES1.0 or ES2.0, I cannot get the boards up with SR enabled, which suggests this really is a board issue of some kind.

  • I'm still struggling with this issue. I find it very hard to find coherent information about SR. So far, my understanding is:

    1. Contrary to what has been written elsewhere in this forum, ES1.0 parts (at least mine) do support SR (the driver initializes just fine, with no message about missing EFUSE values).

    2. U-Boot sets both core and MPU to 1.1 V.

    3. This makes the kernel disable OPP120 and OPPTURBO for some reason (this happens in am335x_opp_update()), regardless of whether SR is enabled in the kernel.

    4. When the kernel boots, core and MPU go down to 1026 mV and 977 mV respectively.

    5. This seems too low for my board: after some time (even when idle at 275 MHz), and apparently depending on temperature, the board crashes. I'm not sure whether the core or the MPU is the problem; I'm guessing the MPU at this point. The reason I think temperature is involved is that the board has to be off for some ten minutes before it works fully again with SR enabled. No parts on the board get hot, though.

    My take on this:

    1. It is strange that my findings differ from what TI staff have said!?

    2, 3. Make U-Boot configure core and MPU to 1250 mV, regardless of whether SR is enabled. This will make the kernel accept 720 MHz.

    5. Is there any way to adjust the individual voltages?
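    To program a different boot voltage in U-Boot, the desired DCDC voltage has to be expressed as a TPS65217 selector code. The sketch below assumes the encoding of the 0.9 V to 1.5 V range (code 0 = 900 mV, 25 mV per step, up to code 24 = 1500 mV). The function name is hypothetical, higher ranges are deliberately rejected rather than guessed, and the TPS65217 datasheet remains the authority for the register layout.

```c
#include <assert.h>

/* Hypothetical helper: map a DCDC target in mV to a TPS65217 selector
 * code, assuming code = (mV - 900) / 25 for 900..1500 mV.
 * Returns -1 outside that range or for values not on a 25 mV step. */
int tps65217_dcdc_sel(int mv)
{
    if (mv < 900 || mv > 1500)
        return -1;            /* outside the assumed 25 mV range */
    if ((mv - 900) % 25 != 0)
        return -1;            /* not an exact regulator step */
    return (mv - 900) / 25;
}
```

    Under this assumption, tps65217_dcdc_sel(1250) gives 14 (0x0E), tps65217_dcdc_sel(1200) gives 12 (0x0C), and the current 1100 mV corresponds to code 8.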

  • Regarding Smart Reflex support on different silicon versions, this is the valid information:

    http://e2e.ti.com/support/arm/sitara_arm/f/791/p/234498/1106175.aspx#1106175

  • Biser Gatchev-XID said:

    Regarding Smart Reflex support on different silicon versions, this is the valid information:

    http://e2e.ti.com/support/arm/sitara_arm/f/791/p/234498/1106175.aspx#1106175

    Yes, I have read that thread 1000 times, but I do not get the EFUSE error: I get the normal print_constraints output, and the kernel reports that the SR driver initializes without error.

    The kernel debugfs is also populated with SR data.

    So everything suggests that my ES1.0 part has the fuses set.

    Or am I missing your point perhaps?

  • Now, I have looked around here in the office and found a board populated with an XAM3359.

    The XAM part gives the EFUSE error with the exact same boot disk and disables SR altogether; the part therefore runs.

    The AM3352ZCZD72 does not give the EFUSE error, enables SR and crashes after some time (if it survives init at all).

    The AM3352BZCZ60 does not give the EFUSE error, enables SR and crashes after some time (if it survives init at all).

  • Looking at sprs717f, the minimum core voltage for OPP100 is specified as 1056 mV.

    Looking at sprs717f, the minimum MPU voltage for OPP100 is specified as 1056 mV.

    I'm currently running my board at 500 MHz (which is what I get with the performance governor). As far as I can tell, 500 MHz corresponds to OPP100.

    In light of the figures above, maybe it is entirely expected that my boards crash at 1026/976 mV.

    Or can SmartReflex legitimately go below the minimums specified in sprs717f?
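    Putting the figures from this post side by side: the measured core voltage (1026 mV) sits 30 mV below the 1056 mV OPP100 minimum quoted from sprs717f, and the MPU voltage (976 mV) sits 80 mV below it. A trivial C helper makes the comparison explicit; the macro and function names are hypothetical, and the minimums are the ones quoted in this post rather than values re-checked against the datasheet.

```c
#include <assert.h>

/* OPP100 minimums as quoted from sprs717f in this post (mV). */
#define OPP100_MIN_CORE_MV 1056
#define OPP100_MIN_MPU_MV  1056

/* Hypothetical helper: how far (in mV) a measured rail is below its
 * minimum; returns 0 when the rail meets or exceeds the minimum. */
int shortfall_mv(int measured_mv, int min_mv)
{
    assert(min_mv > 0);
    return measured_mv < min_mv ? min_mv - measured_mv : 0;
}
```

    Whether a SmartReflex-controlled rail is expected to stay above these static minimums is exactly the open question here; the helper only quantifies the gap.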

  • Regarding the loss of 600MHz and 720MHz OPPs, there is another reason you might not see those.  If you are on USB power, there is code in the board file to disable 600MHz and 720MHz OPPs.  If you plug the board into the wall supply, you may see your missing OPPs.

    SmartReflex should not be related to this.  Regarding the board file code you mentioned, if the VDD_CORE voltage is configured to be <= 0.95V, we must disable the higher MPU frequencies.  SmartReflex being enabled will cause you to operate with a lower VDD_CORE voltage than normal, which could potentially trigger this.  However, you are saying that disabling SmartReflex causes your speed to be limited to 500MHz, and this doesn't make sense.

    Once we resolve why your frequencies get limited, then we can look at the SmartReflex side of it, which looks like an undervoltage condition to me.  It may be that your board has excessive "IR drop", causing the voltage presented at the processor to drop below what is expected.  We have a capability in the SR driver to add voltage margin to both the VDD_CORE and VDD_MPU supplies.
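    One way to picture the voltage margin mentioned above: add a fixed per-rail offset to the SR-calculated voltage, optionally clamp it to a floor, then round up to the regulator step. This is a hypothetical sketch of the idea only. The function name and parameters are illustrative and do not describe how the TI SR driver implements its margin, and the 25 mV step is an assumption based on the TPS65217 DCDC range.

```c
#include <assert.h>

/* Hypothetical sketch: SR-calculated voltage plus a per-rail margin,
 * clamped to a floor, then rounded up to a multiple of the regulator
 * step. All voltages are in uV and assumed positive. */
long sr_target_uv(long calc_uv, long margin_uv, long floor_uv, long step_uv)
{
    assert(step_uv > 0);
    long v = calc_uv + margin_uv;
    if (v < floor_uv)
        v = floor_uv;
    /* round up to the next multiple of step_uv */
    return ((v + step_uv - 1) / step_uv) * step_uv;
}
```

    For the last SR line in the crash log (calc=957500), no margin gives 975000 uV, a 25 mV margin gives 1000000 uV, and a 1056 mV floor gives 1075000 uV.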

  • I think the reason I do not get the 600/720 MHz OPPs is that U-Boot sets the core to 1100 mV. I have learned that SDK 06.00.00 has a new function, am335x_opp_update(), which disables OPP120 and above at this core voltage. So whether or not SR is enabled in the kernel, OPP120 and above are no longer available. I started looking into this with older kernels (without the am335x_opp_update() call), which is why those boards were running all OPPs with the core at 1100 mV, and I wrongly assumed that it was SR that made OPP120 and above disappear. I'm not running from USB.

    Looking at the reference manual, disabling OPP120+ when the core is at 1100 mV makes sense (for ES1.0 parts). So, as I understand it, I need to change U-Boot to start the board with the core at 1200 mV to get these OPPs?

    But should SR really be allowed to lower the core to 1026 mV and the MPU to 976 mV on ES1.0 silicon at OPP100? According to sprs717f, the minimum voltage at OPP100 is 1056 mV for both core and MPU. Or is going below the specified minimum simply what SR does? But then, why disable OPP120+ at 1100 mV (ES1.0) if SR is allowed to go below spec?

    I have done some investigation into IR drop, but there's nothing obvious. There is some switching noise from the primary 5 V regulator that is quite visible around the TPS65217, but at the AM335x it is rather smooth. Maybe the IR drops in question are so small that they are hard to make out with just a scope and a multimeter. How close to the limit will SR push the envelope? The multimeter shows about 1.5 mV of drop on both core and MPU (multimeters are obviously not the best tool here), and the scope shows really just noise of around 50 mV.

    Thanks,

     Micael