UCD9090A: Problem with all rails entering unrecoverable IDLE state during stand-by event

Part Number: UCD9090A

We are using the UCD9090A to sequence power supplies on a PCIe card that goes into servers.

The UCD is connected to PCIE_3V3_AUX. All supplies on the board that the UCD sequences are sourced from the PCIe 12V, so all sequencing waits for this 12V to reach its correct nominal value (the 12V rail has both voltage and current monitored). The card has been tried and tested in production and works in a number of standard ATX motherboards, servers, etc.

During a soft-off, where 12V drops off the card and 3V3 AUX is still available, we use the UCD to sequence off all supplies and continue to monitor. We never had problems with this until we tried it in a specific server. In this server we can power cycle a number of times, and each time during stand-by all of the rails leave REGULATION (the sequenced supplies go to SEQ_ON and the 12V goes to RAMP_UP) and then return to REGULATION. However, every now and then all of the supplies go from REGULATION to IDLE during the exact same event, and when 12V becomes available again the supplies remain in IDLE.

 

1 First Picture:

 

  • 1st Stand-by event 31:35-32:30.
  • All supplies floating. 12V @ 1.9V. All sequenced supplies in a SEQ_ON state, 12V monitored supply in RAMP_UP state. This all looks correct and as expected to me.
  • The Rail monitoring graph looks correct and shows the 0P85V rail drop from a nominal 0.85V to 0V at about 32:25.
      • Status Registers/Lines show that Vout#1 has an Under Voltage Fault (UVF). This is expected, as the supply has no 12V Vcc and therefore a floating output voltage. Slaved Fault is expected because the parent supply is low. Sequence Off Timeout just indicates that some sequence-off condition wasn't met before the supply dropped off.
      • Sequence Off Timeout checks that a rail sequences off within a certain period of time. The timeout detects the case where one of the rail's sequence-off dependencies is never met; when this occurs, a status bit is set.
        • Cannot read status of GPIO from this picture.

 

2 Second Picture

  • 1st Restart from Stand-by 32:10 - 33:30
  • All supplies nominal. 12V @ 12.538V. All sequenced supplies in a REGULATION state. All supplies appear to be present and working.
    • 0P85V shows the sequence off at approx 32:30 and then sequencing back on at 33:15.
      • Status Registers show that the POWER_GOOD# condition has been cleared but UVF and Sequence Off have not. However, I suspect these are historical, as the ADC on the supply reports 0.856V with 12P0V_SCALED showing the parent supply at 12.538V. I think the Clear Faults button would remove these.
        • GPIO shows that PB_RST_N, MARGIN_EN and CFGTR RST_N are sitting high and therefore PB and CFGTR are deasserted and MARGIN_EN is asserted. MARGIN_HI is deasserted.

 

 

3 Third Picture

  • 2nd Stand-by event and 2nd Restart from Stand-by 34:40-36:00
  • All supplies nominal. 12V @ 12.526V. All sequenced supplies in a REGULATION state. All supplies appear to be present and working.
    • 0P85V shows the sequence off at approx 35:10 and then sequencing back on at 35:45.
      • Status Registers show that the POWER_GOOD# condition has been cleared but UVF and Sequence Off have not. However, I suspect these are historical, as the ADC on the supply reports 0.856V with 12P0V_SCALED showing the parent supply at 12.538V. I think the Clear Faults button would remove these.
        • GPIO shows that PB_RST_N, MARGIN_EN and CFGTR RST_N are sitting high and therefore PB and CFGTR are deasserted and MARGIN_EN is asserted. MARGIN_HI is deasserted.

 

 

 

 

Pictures 4 and 5 show the same pattern for the 3rd Stand-by and Restart event.

 

Picture 4 shows that GPIs during this stage report Margin_en, Margin_hi, PB_RST and CFGTR as deasserted.

 

6 Sixth Picture

  • 4th Stand-by event 37:90-39:10.
  • All supplies floating. 12V @ 1.9V. All sequenced supplies in an IDLE state. This does not seem correct.
    • The Rail monitoring graph looks correct and shows the 0P85V rail drop from a nominal 0.85V to 0V at about 38:10.
      • Status Registers/Lines show that Vout#1 has an Under Voltage Fault (UVF). This is expected, as the supply has no 12V Vcc and therefore a floating output voltage. Slaved Fault is expected because the parent supply is low. Sequence Off Timeout just indicates that some sequence-off condition wasn't met before the supply dropped off. HOWEVER, Output Off is now shown, which does not seem correct.
        • GPIO is as expected.

 

 

7 Seventh Picture

  • Last Restart from Stand-by 38:30-39:50
  • All supplies nominal. 12V @ 12.6V. All sequenced supplies in an IDLE state. All sequenced supplies are off.
    • 0P85V shows the sequence off at approx 39:10 but then there is no sequence back on as expected with 12V coming back.
      • OUTPUT OFF is shown, which is not correct. POWER GOOD has not become OK, which is not correct.
        • Margin EN does not re-assert.

Bit 6 “output off” gets set in register 0x79 (STATUS_WORD), but there is no definition of what this is. Our Vin is 12 volts to the system, and all other supplies are dependent on it. In normal operation this supply has RAMP_UP status and moves to REGULATION. When the system fails, it is in IDLE and will not move to the RAMP_UP state. Why is that, since it has no dependencies?
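For reference, the register discussed above appears in the PMBus logs further down this thread as STATUS_WORD (code 0x79). The following is a minimal sketch of decoding it, assuming the standard PMBus STATUS_WORD bit layout; the helper name `decode_status_word` is hypothetical, and the short bit names simply mirror the decoded strings in the logs. As far as the PMBus specification goes, the OFF bit only indicates that the unit is not providing power to the output, for whatever reason, so it describes the symptom rather than its cause.

```python
# Standard PMBus STATUS_WORD bit names (bit 0 = least significant bit).
# Short names mirror the decoded strings used in the logs in this thread.
STATUS_WORD_BITS = {
    0: "NONE_OF_ABOVE",
    1: "CML",
    2: "TEMPERATURE",
    3: "VIN_UV",
    4: "IOUT_OC",
    5: "VOUT_OV",
    6: "OFF",
    7: "BUSY",
    8: "UNKNOWN",
    9: "OTHER",
    10: "FANS",
    11: "POWER_GOOD",  # POWER_GOOD#: a set bit means power is NOT good
    12: "MFR",
    13: "INPUT",
    14: "IOUT_POUT",
    15: "VOUT",
}


def decode_status_word(value):
    """Return the names of the bits set in a 16-bit STATUS_WORD value."""
    return [name for bit, name in sorted(STATUS_WORD_BITS.items())
            if value & (1 << bit)]


if __name__ == "__main__":
    good, bad = 0x9001, 0x9841  # values taken from the logs in this thread
    print(decode_status_word(good))
    print(decode_status_word(bad))
    # Bits that differ between the two readings
    print(decode_status_word(good ^ bad))
```

XOR-ing the two readings isolates the bits that changed, which is a quick way to compare a working and a failing capture.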

I've attached some tables. The first shows the PMBus log for a power down that successfully goes into the SEQ_ON and RAMP_UP states. For conciseness I've included only STATUS_WORD, MFR_STATUS and STATUS_VOUT.

'Good' Power Down
Timestamp Adapter Part_ID Address CommandID Code Page Phase Value_Encoded Value_Decoded
24:08.6 1 UCD9090A 104 STATUS_WORD 0x79     0x9001 NONE_OF_ABOVE,MFR,VOUT
24:08.6 1 UCD9090A 104 MFR_STATUS 0xF3 0   0x00000005 SLAVED_FAULT,SEQ_OFF_TIMEOUT
24:08.6 1 UCD9090A 104 STATUS_VOUT 0x7A 0   0x30 VOUT_UV_FAULT,VOUT_UV_WARN
24:08.6 1 UCD9090A 104 MFR_STATUS 0xF3 1   0x00000005 SLAVED_FAULT,SEQ_OFF_TIMEOUT
24:08.6 1 UCD9090A 104 STATUS_VOUT 0x7A 1   0x00 <EMPTY>
24:08.6 1 UCD9090A 104 MFR_STATUS 0xF3 2   0x00000001 SLAVED_FAULT
24:08.6 1 UCD9090A 104 STATUS_VOUT 0x7A 2   0x00 <EMPTY>
24:08.6 1 UCD9090A 104 MFR_STATUS 0xF3 3   0x00000001 SLAVED_FAULT
24:08.6 1 UCD9090A 104 STATUS_VOUT 0x7A 3   0x00 <EMPTY>
24:08.6 1 UCD9090A 104 MFR_STATUS 0xF3 4   0x00000005 SLAVED_FAULT,SEQ_OFF_TIMEOUT
24:08.6 1 UCD9090A 104 STATUS_VOUT 0x7A 4   0x00 <EMPTY>
24:08.6 1 UCD9090A 104 MFR_STATUS 0xF3 5   0x00000001 SLAVED_FAULT
24:08.6 1 UCD9090A 104 STATUS_VOUT 0x7A 5   0x20 VOUT_UV_WARN
24:08.6 1 UCD9090A 104 MFR_STATUS 0xF3 6   0x00000001 SLAVED_FAULT
24:08.6 1 UCD9090A 104 STATUS_VOUT 0x7A 6   0x30 VOUT_UV_FAULT,VOUT_UV_WARN
24:08.6 1 UCD9090A 104 MFR_STATUS 0xF3 7   0x00000005 SLAVED_FAULT,SEQ_OFF_TIMEOUT
24:08.6 1 UCD9090A 104 STATUS_VOUT 0x7A 7   0x00 <EMPTY>
24:08.6 1 UCD9090A 104 MFR_STATUS 0xF3 8   0x00000000 <EMPTY>
24:08.6 1 UCD9090A 104 STATUS_VOUT 0x7A 8   0x10 VOUT_UV_FAULT
24:08.6 1 UCD9090A 104 MFR_STATUS 0xF3 9   0x00000005 SLAVED_FAULT,SEQ_OFF_TIMEOUT
24:09.3 1 UCD9090A 104 STATUS_WORD 0x79     0x9001 NONE_OF_ABOVE,MFR,VOUT
24:09.3 1 UCD9090A 104 MFR_STATUS 0xF3 0   0x00000005 SLAVED_FAULT,SEQ_OFF_TIMEOUT
24:09.3 1 UCD9090A 104 STATUS_VOUT 0x7A 0   0x30 VOUT_UV_FAULT,VOUT_UV_WARN
24:09.3 1 UCD9090A 104 MFR_STATUS 0xF3 1   0x00000005 SLAVED_FAULT,SEQ_OFF_TIMEOUT
24:09.3 1 UCD9090A 104 STATUS_VOUT 0x7A 1   0x00 <EMPTY>
24:09.3 1 UCD9090A 104 MFR_STATUS 0xF3 2   0x00000001 SLAVED_FAULT
24:09.3 1 UCD9090A 104 STATUS_VOUT 0x7A 2   0x00 <EMPTY>
24:09.3 1 UCD9090A 104 MFR_STATUS 0xF3 3   0x00000001 SLAVED_FAULT
24:09.3 1 UCD9090A 104 STATUS_VOUT 0x7A 3   0x00 <EMPTY>
24:09.3 1 UCD9090A 104 MFR_STATUS 0xF3 4   0x00000005 SLAVED_FAULT,SEQ_OFF_TIMEOUT
24:09.3 1 UCD9090A 104 STATUS_VOUT 0x7A 4   0x00 <EMPTY>
24:09.3 1 UCD9090A 104 MFR_STATUS 0xF3 5   0x00000001 SLAVED_FAULT
24:09.3 1 UCD9090A 104 STATUS_VOUT 0x7A 5   0x20 VOUT_UV_WARN
24:09.3 1 UCD9090A 104 MFR_STATUS 0xF3 6   0x00000001 SLAVED_FAULT
24:09.3 1 UCD9090A 104 STATUS_VOUT 0x7A 6   0x30 VOUT_UV_FAULT,VOUT_UV_WARN
24:09.3 1 UCD9090A 104 MFR_STATUS 0xF3 7   0x00000005 SLAVED_FAULT,SEQ_OFF_TIMEOUT
24:09.3 1 UCD9090A 104 STATUS_VOUT 0x7A 7   0x00 <EMPTY>
24:09.3 1 UCD9090A 104 MFR_STATUS 0xF3 8   0x00000000 <EMPTY>
24:09.3 1 UCD9090A 104 STATUS_VOUT 0x7A 8   0x10 VOUT_UV_FAULT
24:09.3 1 UCD9090A 104 MFR_STATUS 0xF3 9   0x00000005 SLAVED_FAULT,SEQ_OFF_TIMEOUT

And here is a table for a 'bad' power down event. I compared both tables, and the only difference is that after the stand-by event on the bad power down, STATUS_WORD shows NONE_OF_ABOVE,OFF,POWER_GOOD,MFR,VOUT, i.e. OFF and POWER_GOOD are set, unlike in the 'good' table above. STATUS_VOUT on one page also shows VOUT_UV_FAULT,VOUT_UV_WARN.

'Bad' Power Down
Timestamp Adapter Part_ID Address CommandID Code Page Phase Value_Encoded Value_Decoded
28:59.2 1 UCD9090A 104 STATUS_WORD 0x79     0x9001 NONE_OF_ABOVE,MFR,VOUT
28:59.2 1 UCD9090A 104 MFR_STATUS 0xF3 0   0x00000005 SLAVED_FAULT,SEQ_OFF_TIMEOUT
28:59.2 1 UCD9090A 104 STATUS_VOUT 0x7A 0   0x30 VOUT_UV_FAULT,VOUT_UV_WARN
28:59.2 1 UCD9090A 104 MFR_STATUS 0xF3 1   0x00000005 SLAVED_FAULT,SEQ_OFF_TIMEOUT
28:59.2 1 UCD9090A 104 STATUS_VOUT 0x7A 1   0x00 <EMPTY>
28:59.2 1 UCD9090A 104 MFR_STATUS 0xF3 2   0x00000001 SLAVED_FAULT
28:59.2 1 UCD9090A 104 STATUS_VOUT 0x7A 2   0x00 <EMPTY>
28:59.2 1 UCD9090A 104 MFR_STATUS 0xF3 3   0x00000001 SLAVED_FAULT
28:59.2 1 UCD9090A 104 STATUS_VOUT 0x7A 3   0x00 <EMPTY>
28:59.2 1 UCD9090A 104 MFR_STATUS 0xF3 4   0x00000005 SLAVED_FAULT,SEQ_OFF_TIMEOUT
28:59.2 1 UCD9090A 104 STATUS_VOUT 0x7A 4   0x00 <EMPTY>
28:59.2 1 UCD9090A 104 MFR_STATUS 0xF3 5   0x00000001 SLAVED_FAULT
28:59.2 1 UCD9090A 104 STATUS_VOUT 0x7A 5   0x20 VOUT_UV_WARN
28:59.2 1 UCD9090A 104 MFR_STATUS 0xF3 6   0x00000001 SLAVED_FAULT
28:59.2 1 UCD9090A 104 STATUS_VOUT 0x7A 6   0x30 VOUT_UV_FAULT,VOUT_UV_WARN
28:59.2 1 UCD9090A 104 MFR_STATUS 0xF3 7   0x00000005 SLAVED_FAULT,SEQ_OFF_TIMEOUT
28:59.2 1 UCD9090A 104 STATUS_VOUT 0x7A 7   0x00 <EMPTY>
28:59.2 1 UCD9090A 104 MFR_STATUS 0xF3 8   0x00000000 <EMPTY>
28:59.2 1 UCD9090A 104 STATUS_VOUT 0x7A 8   0x10 VOUT_UV_FAULT
28:59.2 1 UCD9090A 104 MFR_STATUS 0xF3 9   0x00000005 SLAVED_FAULT,SEQ_OFF_TIMEOUT
29:00.1 1 UCD9090A 104 STATUS_WORD 0x79     0x9841 NONE_OF_ABOVE,OFF,POWER_GOOD,MFR,VOUT
29:00.1 1 UCD9090A 104 MFR_STATUS 0xF3 0   0x00000005 SLAVED_FAULT,SEQ_OFF_TIMEOUT
29:00.1 1 UCD9090A 104 STATUS_VOUT 0x7A 0   0x30 VOUT_UV_FAULT,VOUT_UV_WARN
29:00.1 1 UCD9090A 104 MFR_STATUS 0xF3 1   0x00000005 SLAVED_FAULT,SEQ_OFF_TIMEOUT
29:00.1 1 UCD9090A 104 STATUS_VOUT 0x7A 1   0x00 <EMPTY>
29:00.1 1 UCD9090A 104 MFR_STATUS 0xF3 2   0x00000001 SLAVED_FAULT
29:00.1 1 UCD9090A 104 STATUS_VOUT 0x7A 2   0x00 <EMPTY>
29:00.1 1 UCD9090A 104 MFR_STATUS 0xF3 3   0x00000001 SLAVED_FAULT
29:00.1 1 UCD9090A 104 STATUS_VOUT 0x7A 3   0x00 <EMPTY>
29:00.1 1 UCD9090A 104 MFR_STATUS 0xF3 4   0x00000005 SLAVED_FAULT,SEQ_OFF_TIMEOUT
29:00.1 1 UCD9090A 104 STATUS_VOUT 0x7A 4   0x00 <EMPTY>
29:00.1 1 UCD9090A 104 MFR_STATUS 0xF3 5   0x00000001 SLAVED_FAULT
29:00.1 1 UCD9090A 104 STATUS_VOUT 0x7A 5   0x30 VOUT_UV_FAULT,VOUT_UV_WARN
29:00.1 1 UCD9090A 104 MFR_STATUS 0xF3 6   0x00000001 SLAVED_FAULT
29:00.1 1 UCD9090A 104 STATUS_VOUT 0x7A 6   0x30 VOUT_UV_FAULT,VOUT_UV_WARN
29:00.1 1 UCD9090A 104 MFR_STATUS 0xF3 7   0x00000005 SLAVED_FAULT,SEQ_OFF_TIMEOUT
29:00.1 1 UCD9090A 104 STATUS_VOUT 0x7A 7   0x00 <EMPTY>
29:00.1 1 UCD9090A 104 MFR_STATUS 0xF3 8   0x00000000 <EMPTY>
29:00.1 1 UCD9090A 104 STATUS_VOUT 0x7A 8   0x10 VOUT_UV_FAULT
29:00.1 1 UCD9090A 104 MFR_STATUS 0xF3 9   0x00000005 SLAVED_FAULT,SEQ_OFF_TIMEOUT
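The comparison between two captures can also be done mechanically. The sketch below parses rows in the flattened form they take in this thread and reports which readings changed between two timestamps. The column positions are an assumption based only on how the rows appear here (a real log export may be laid out differently), and the function names are hypothetical. Only a few rows from the 'bad' log are included to keep the example short.

```python
from collections import defaultdict

# A few rows copied from the 'bad' power-down log above (not the full log).
LOG = """\
28:59.2 1 UCD9090A 104 STATUS_WORD 0x79     0x9001 NONE_OF_ABOVE,MFR,VOUT
28:59.2 1 UCD9090A 104 STATUS_VOUT 0x7A 5   0x20 VOUT_UV_WARN
28:59.2 1 UCD9090A 104 STATUS_VOUT 0x7A 6   0x30 VOUT_UV_FAULT,VOUT_UV_WARN
29:00.1 1 UCD9090A 104 STATUS_WORD 0x79     0x9841 NONE_OF_ABOVE,OFF,POWER_GOOD,MFR,VOUT
29:00.1 1 UCD9090A 104 STATUS_VOUT 0x7A 5   0x30 VOUT_UV_FAULT,VOUT_UV_WARN
29:00.1 1 UCD9090A 104 STATUS_VOUT 0x7A 6   0x30 VOUT_UV_FAULT,VOUT_UV_WARN
"""


def parse_row(line):
    """Split one flattened log row into (timestamp, command, page, value)."""
    tokens = line.split()
    timestamp, command = tokens[0], tokens[4]
    if command == "STATUS_WORD":  # not paged: the Page column is empty
        page, encoded = None, tokens[6]
    else:
        page, encoded = int(tokens[6]), tokens[7]
    return timestamp, command, page, int(encoded, 16)


def snapshots(log_text):
    """Group readings by timestamp: {timestamp: {(command, page): value}}."""
    result = defaultdict(dict)
    for line in log_text.splitlines():
        if line.strip():
            timestamp, command, page, value = parse_row(line)
            result[timestamp][(command, page)] = value
    return dict(result)


def changed(before, after):
    """Return {(command, page): (old, new)} for readings that differ."""
    return {key: (before[key], after[key])
            for key in before
            if key in after and before[key] != after[key]}


if __name__ == "__main__":
    snaps = snapshots(LOG)
    diff = changed(snaps["28:59.2"], snaps["29:00.1"])
    for (command, page), (old, new) in diff.items():
        print(command, page, hex(old), "->", hex(new))
```

On these sample rows the readings that change are STATUS_WORD and STATUS_VOUT with Page value 5 (0x20 to 0x30); the Page 6 reading is 0x30 in both snapshots.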

We also checked whether the fault was caused by supply page 6 with VOUT_UV_FAULT. I tried setting the VOUT_UV_WARN on page 6 to 0V but still saw the same log on a bad turn-off event.

Hopefully this covers everything.

Best regards,

Sean Suttie.

  • EDIT:

    We also checked whether the fault was caused by supply page 6 with VOUT_UV_FAULT. I tried setting the VOUT_UV_FAULT on page 6 to 0V but still saw the same log on a bad turn-off event.

  • Hi

    I assume that you have enabled the re-sequencing option in the fault response.

    For re-sequencing to succeed, all re-sequencing rails must stay below the POWER_GOOD_OFF threshold. If one of the rails is back to normal before the other rails reach the POWER_GOOD_OFF threshold, the re-sequencing fails (the rails stay at IDLE).

    My suspicion is that 12V comes back too quickly, before the remaining rails are off.

    Under Global Config->Misc Configure->Resequencing, you can select the checkbox for the 12V rail to exclude it from re-sequencing. This should help.

    If not, I would need your project file to further review this issue.

    Regards

    Yihe
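The condition described in the reply above can be written as a small check. This is only a simplified model of the rule as worded in that reply, not the UCD9090A firmware behaviour. The `Rail` class, the `resequence_can_start` function, the `excluded` flag, and all threshold and voltage values are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Rail:
    name: str
    voltage: float           # present reading in volts (illustrative)
    power_good_off: float    # POWER_GOOD_OFF threshold in volts (illustrative)
    excluded: bool = False   # hypothetical flag: rail excluded from re-sequencing


def resequence_can_start(rails):
    """True only if every participating rail is below its POWER_GOOD_OFF threshold."""
    return all(r.voltage < r.power_good_off for r in rails if not r.excluded)


if __name__ == "__main__":
    # 12V has already recovered while another rail has not yet decayed.
    early_return = [Rail("12V", 12.5, 10.0), Rail("0P85V", 0.8, 0.7)]
    print(resequence_can_start(early_return))   # False in this model

    # Same situation with 12V excluded and the other rail fully off.
    with_exclusion = [Rail("12V", 12.5, 10.0, excluded=True),
                      Rail("0P85V", 0.2, 0.7)]
    print(resequence_can_start(with_exclusion))  # True in this model
```

In this model, a 12V rail that recovers early blocks re-sequencing unless it is excluded, which is the reasoning behind the suggested checkbox.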

  • Hi Yihe,

    I tried this, I assume the checkbox I had to select was in this region?

    I tested 2 UCD configurations today. One with only Rail #9 selected (12V) and one with all checked.

    We still saw the same problem with these new configurations.

    Is there a secure way for me to send you the Fusion Digital Power Designer files?

    Thanks,

    Sean.

  • Hello

    You can send them to me via the local TI team (David).

    Looking at your PMBus log, the problem could come from the rail with VOUT_UV_FAULT. Ideally, when 12V is off, you should not see a UV fault from rails other than 12V. If you do, you may need to check the system.

    One possible solution is to enable the re-sequence option for the rail with UV_FAULT. This may help.

    Do you know which rail shows the extra UV_FAULT?

    Regards

    Yihe

  • Hi Yihe,

    The VOUT_UV_FAULT is shown on a rail for 1.2V. When I investigated this fault I tried to lower the VOUT_UV_FAULT threshold to 0 to see if it would remove this fault but had no success.

    I have sent the configuration files to David by email. I tried a build where I enabled re-sequencing for all rails, but regrettably had no success with it.

    Thanks,

    Sean.

  • Hello

    But the UV fault on the 1.2V rail did not show up in the case where re-sequencing worked correctly.

    Did this problem appear after multiple 12V UV faults? How many tries does it take to reproduce it? Has the UCD9090A been reset or power cycled during these tests? If you power cycle the UCD9090A, does it start working again?

    From the snapshot, the time between re-sequences is 0ms. Could you try increasing it?

    I will review the project file once receiving from Dave.

    Regards

    Yihe

  • Hi

    I received your files. To simplify the debugging, I would suggest the following:

    1. Disable the fault responses on all rails except 12V.

    2. Disable the glitch of 12V.

    3. Increase the resequencing interval to a large number.

    Regards

    Yihe

  • Hi Yihe,

    Thank you for your reply.

    I will try these items and create experimental configuration files.

    I'll report back on how the testing goes and what changes I've made to my configuration.

    Best regards,

    Sean.