Linux/TPS23861: TPS23861's Particular Batch has a High Failure Rate?

Part Number: TPS23861

Tool/software: Linux

  • We have a problem with the TPS23861 chipset. We have been using it in our project at normal ambient temperature. After 24 hours of continuous running, about 2.5% of our batch of 1000 pcs fail: the PoE port can no longer detect our PD device or supply power to our board.

     

    We initially thought that a large spike or fluctuation on the 48V rail might be causing the chipset to fail. We therefore added several protective circuits, including a 510-ohm resistor on the 48V rail, a protective transistor, and some capacitors. However, this did not reduce the failure rate of the chipset on our board.

     

    Attached is our Schematic for the PSE and the step-up voltage circuit:

    1. The external supply is 27V. We step it up to 48V through a DCDC boost converter.
    2. We control the TPS23861’s I2C and Reset from the MPU. The MPU’s IO voltage is 2.62V.

     

    This is our schematic:

     

     

    When we probed Drain1 to Drain4 with an oscilloscope, we could not see any of the detection signals that are supposed to be there (as illustrated in the picture below):

     

    (We could not see any of these detection signals. Our signal is always a flat line.)

     

    We have also compared the registers of spoilt and normal chipsets and found that all the spoilt chipsets share the same “Firmware Revision” (0x41) and “Silicon Revision Number” (0x43) values, which the normal chipsets do not have. This seems to imply that the batch of chipsets with these “Firmware Revision” and “Silicon Revision Number” values is more susceptible to failure, but we cannot be absolutely sure. We also do not know whether these values were rewritten after the chipset was spoilt, though from our understanding these 2 registers are not writable.

     

    Normal Chipset Register Dump

     

    Spoilt Chipset Register Dump

     

    We have analyzed the spoilt chipset as well, and do not see any short circuits or anything else abnormal.

     

    Currently, we are at a loss. Can somebody advise us on what to do next?

     

  • Hi Shuanglin,

    Can you describe how you ran your test for the 24 hours? Did you keep the 48V power always on and the ports always on? Or did you cycle your system consistently? Can you also check the TPS23861's power-on sequence (VPWR, VDD and RESET)? We need a waveform similar to Figure 1 in this app note: www.ti.com/.../slva723.pdf. Thanks.

    Best regards,
    Penny
  • Hi Penny:

    We are so sorry that we submitted so many different cases to you. Something went wrong during submission and our posts seemed to have been rejected.

    Below is the picture that illustrates how we test our boards for 24 hours.

    Answers are listed below:

    Question 1: Did you keep the 48V power always on and the ports always on?

    Answer 1: The 48V power is always on, and it is supplied by a DCDC on our board. This DCDC steps up the 27V that we feed in to 48V, which supplies the TPS23861 and an external PD device. While the boards are being tested, the PD device is not connected.

    Question 2: Or did you cycle your system consistently?

    Answer 2: Once the system is powered on, we do not cut the power supply, which means that we are not cycling the system at all.

    Question 3: Regarding the slva723.pdf waveforms, my test results are as below:

    I noticed that our power-on waveform might not satisfy the document’s requirements. This is mainly because the IO is high when our MPU powers on; the software can only change the IO output state once our software system is up. Therefore, we reset the chipset after power-on. Could this be the cause of the chipset’s failure?

  • Hi Shuanglin,

    No problem. We can focus on this thread and close the other threads. An improper power-on sequence could corrupt the TPS23861's trim bits, and this cannot be recovered. You need to find a way to keep the RESET pin low until VPWR reaches its UVLO.
    Can you also send me the schematic of how the TPS23861's 3.3V is generated? Thanks.

    Best regards,
    Penny
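The requirement in the reply above (RESET held low until VPWR reaches its UVLO) can be checked against an oscilloscope capture exported as samples. The following is only a minimal sketch: the function name, the sample layout, and both threshold values are hypothetical, and the real UVLO and logic-low limits should be taken from the TPS23861 datasheet.

```python
from typing import Iterable, Tuple

# One scope sample: (time in seconds, VPWR in volts, RESET pin in volts).
Sample = Tuple[float, float, float]


def reset_low_until_uvlo(samples: Iterable[Sample],
                         uvlo_v: float,
                         reset_low_max_v: float) -> bool:
    """Return True if RESET stays at or below reset_low_max_v on every
    sample taken before VPWR first reaches uvlo_v.

    If VPWR never reaches uvlo_v in the capture, the result only says
    that no violation was observed in the samples given.
    """
    for _t, vpwr_v, reset_v in samples:
        if vpwr_v >= uvlo_v:
            return True    # VPWR is valid; RESET was low on all earlier samples
        if reset_v > reset_low_max_v:
            return False   # RESET rose while VPWR was still below UVLO
    return True


# Hypothetical thresholds, for illustration only (not datasheet values).
UVLO_V = 30.0
RESET_LOW_MAX_V = 0.4

# RESET follows VPWR: the sequence Penny asks for.
good_capture = [(0.000, 0.0, 0.0), (0.010, 20.0, 0.1),
                (0.020, 40.0, 0.1), (0.030, 48.0, 2.6)]
# RESET is already high while VPWR is still ramping: the reported problem.
bad_capture = [(0.000, 0.0, 0.0), (0.010, 20.0, 2.6), (0.020, 40.0, 2.6)]

print(reset_low_until_uvlo(good_capture, UVLO_V, RESET_LOW_MAX_V))  # True
print(reset_low_until_uvlo(bad_capture, UVLO_V, RESET_LOW_MAX_V))   # False
```

A check like this only covers the RESET-versus-VPWR ordering; it says nothing about VDD timing, which the app note also covers.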
  • Hi Penny:

    Thank you very much for your advice and your professional assistance.

    Below are our power supply’s block diagram and schematic. They show how the TPS23861’s 3.3V is generated.

    We have already produced a small batch and sent it to our customers. It would be extremely difficult for us if we needed to make a major change to our board.

     

    1. Can you help us to think of an alternative way to resolve this problem?
    2. Also, in your opinion, can boards that have already passed the 24-hour aging process be expected not to exhibit this problem in the future?

     

    We also discovered that for the spoilt chipsets, the I2C registers “Firmware Revision” (0x41 = 0x00) and “Silicon Revision Number” (0x43 = 0xe2) all read the same. It seems that only this particular batch exhibits this type of failure. In your opinion, is this batch particularly sensitive to this kind of failure, or did our abnormal power-on sequence cause these 2 registers to be rewritten (normal values should be 0x41 = 0x02, 0x43 = 0xe3)? The chipsets are from the same batch and the same packaging, so these registers should originally have been the same.

     

    Our main intention is to find out whether this particular batch exhibits the problem more easily, so that we can pick out the boards carrying chipsets from this batch and destroy them instead of handing them over to our customer. This would be the most cost-effective solution for us right now. In the next batch, we would modify our hardware to eliminate the problem completely.

  • Hi Shuanglin,

    I think the firmware version should be the same originally. To solve this problem, I think you can change the default output of the GPIO to low. I know there's a way to do it with our MSP430, but I think there must be something similar with your MPU. Thanks.

    Best regards,
    Penny
  • Hi Penny:

     

    Our MPU is based on a MIPS processor. The software it runs is U-Boot + Linux. If it were a microcontroller, we could use software to control the power-on IO state. However, this particular processor has no way to control the power-on IO state. All we can do is wait until U-Boot boots and then take over control of the GPIO, but by then we are already past the power-on timing window.

    We have now found a way to modify the hardware to meet the power-on sequence. We used a GPIO and a pull-down resistor to change the reset and power supply sequence. About 80ms after power-on, the reset line shows a short spike (this spike is not controlled by us).

     

    After modification of the hardware, the waveform becomes as shown below:

    Can you help us to review if this power on sequence can satisfy TPS23861’s requirements?

     

    For our current products, we have already sent out 2K pieces to our customer. Recalling them would cause us great distress. Those 2K products have all been through the 24-hour aging process. Each product has been powered on at least 4 times (during feature testing and when burning in the MAC address). There are about 6 spoilt boards, which would correspond to a 3% failure rate. On all of the spoilt boards, the chipset’s “Silicon Revision Number” reads 0xe2. Do you think chipsets that read 0xe3 would not spoil as easily as 0xe2 chipsets in this situation? Or is it possible that 0xe3 chipsets would not run into this problem at all?

     

    We are asking this question because, if we can use software to determine which boards’ chipsets read 0xe2, then we can just pick out those boards. We have checked our products and found that 0xe2 chipsets are a minority in the pool of boards that we sent to customers.
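The software screening described here could look roughly like the sketch below. It assumes the register dump has already been read (for example over I2C from Linux) into a dictionary of address to value; the function names are hypothetical, and the expected values are simply the ones quoted as normal in this thread (0x41 = 0x02, 0x43 = 0xe3). Note that the replies later in this thread suggest a 0xe2 reading may be a symptom of an improper power-on sequence rather than a marker of a distinct batch, so a check like this would flag parts that already read abnormally, not parts that are at risk.

```python
from typing import Dict, Optional, Tuple

# Register values reported as normal in this thread (an assumption taken
# from the posts, not from the datasheet).
EXPECTED: Dict[int, int] = {
    0x41: 0x02,  # "Firmware Revision"
    0x43: 0xE3,  # "Silicon Revision Number"
}


def suspect_registers(dump: Dict[int, int]) -> Dict[int, Tuple[int, Optional[int]]]:
    """Return {address: (expected, actual)} for every checked register whose
    value differs from the expected one. A register missing from the dump
    is reported with actual=None."""
    return {
        addr: (want, dump.get(addr))
        for addr, want in EXPECTED.items()
        if dump.get(addr) != want
    }


def is_suspect(dump: Dict[int, int]) -> bool:
    """True if any checked register deviates from its expected value."""
    return bool(suspect_registers(dump))


# Values as reported for normal and spoilt chipsets in this thread.
normal_dump = {0x41: 0x02, 0x43: 0xE3}
spoilt_dump = {0x41: 0x00, 0x43: 0xE2}

print(is_suspect(normal_dump))  # False
print(is_suspect(spoilt_dump))  # True
```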

     

    In summary,

    1.  Please help us review whether our modified hardware proposal can satisfy the TPS23861’s requirements.

    2.  Is 0xe2 more susceptible to these kinds of problems? Can 0xe3 still work without any problems in our situation?

     

    Thank you so much for your help. We will definitely modify our PCB hardware in the future, but we first need to resolve the current situation. Hopefully, with your professional assistance, we will be able to resolve this problem.

     

    Thanks so much,

    Shuanglin

  • Hi Penny:

    Sorry for the rush, but we are in the middle of production and are also handing over to customers, so this is a very critical issue for us. We would be very grateful for any information or progress update as soon as you have one.

  • Hi Shuanglin,

    Sorry for the late response. I am still waiting for my development team to confirm. My initial reply is that all our devices should have the same firmware version, which is 0xE3. The improper power-on sequence may cause the firmware version problem. I think the power-on sequence after the rework should be fine, but it would be good to run the test again. The improper power-on sequence could cause the device to malfunction, and fixing it is highly recommended. Thanks.

    Best regards,

    Penny

  • Hi Penny:

     

    We have done some experiments on the modified boards and the unmodified boards:

    1. We put 10 boards from each group under standard operating temperatures and power cycle them.
    2. Each power cycle on every board is: power down for 10s, power on for 60s.
    3. During the experiment, the ambient temperature is 32 degrees Celsius and the humidity is 53%. There is no PoE load (no PD device is connected).
    4. Before the experiment, we dumped the registers, and every dump shows that the chipset’s firmware version is 0xe3.

     

    So far we have tested for 20 hours, and every board has been through about 1000 of the power cycles described above. We have not seen any TPS23861 failure. The tests are still ongoing.
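As a quick arithmetic check of the cycle count, each cycle in the procedure above takes 10s off plus 60s on, so the number of completed cycles follows from the elapsed time. This is just a small sketch; the function name is made up for illustration.

```python
def cycles_completed(elapsed_hours: float, off_s: float = 10, on_s: float = 60) -> int:
    """Number of full power cycles (off_s seconds off, then on_s seconds on)
    that fit into elapsed_hours of continuous cycling."""
    return int(elapsed_hours * 3600 // (off_s + on_s))


print(cycles_completed(20))  # 1028, consistent with "about 1000" cycles per board
```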

     

    We hope that through the above comparison tests, we can reproduce the failure on the unmodified boards and see no failure on the modified boards. This would confirm that our current analysis is correct and that modifying the boards resolves this type of failure.

     

    Please do continue to ask your technical support team whether there is anything we should take note of, and also why the read-only registers could be rewritten to 0xe2. We hope to eliminate this problem completely so that we can quickly move on and resolve the problems for our customers.

     

    Once again, thanks so much for being so patient with us.

     

    Thanks,

    Shuanglin

  • Hi Penny, our comparison tests are still running, and we have now tested for more than 96 hours (4 days). Neither set of boards has reproduced the PoE failure, so we are still very concerned about whether we have really resolved the PoE problem. Do you have any news on your side? Thanks, and we are so sorry for the rush.
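One way to reason about why the unmodified boards may not have reproduced the failure is to estimate how likely a failure-free run is for a small sample. The sketch below makes a strong simplifying assumption that is not established in this thread: each board fails independently with a fixed probability over the test, taken here as the roughly 2.5% rate from the first post. If the failure is instead triggered per power-on event, the 1000 cycles per board would change the picture, so treat the numbers as illustrative only.

```python
def prob_zero_failures(p_fail: float, n_trials: int) -> float:
    """Probability of observing no failures in n_trials independent trials,
    each failing with probability p_fail."""
    return (1.0 - p_fail) ** n_trials


# 10 unmodified boards at an assumed 2.5% per-board failure rate.
p10 = prob_zero_failures(0.025, 10)
print(round(p10, 3))  # 0.776

# The same assumption applied to a 1000-board run.
p1000 = prob_zero_failures(0.025, 1000)
print(p1000 < 1e-9)   # True
```

Under this model, a failure-free run on 10 unmodified boards is the more likely outcome even if nothing was fixed, while a failure-free run on about 1000 boards would be very unlikely unless the underlying rate had dropped.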

  • Hi Shuanglin,

    Our firmware sets register 0x42 to 0x2 and register 0x43 to 0xE3, so all the devices should have the same values in these 2 registers. Thanks.

    Best regards,

    Penny

  • Hi Penny,
    The Device ID should be only at 0x43, and at that register we see 0xe2 for the bad batch and 0xe3 for the good batch. Why is this so?
    Also, in your very first reply, you mentioned that “The improper power on sequence could cause TPS23861 trim bits corruption and it could not be recovered.” Can you let us know what “trim bits corruption” is?
     
    Thanks,
    Shuanglin


  • Hi Shuanglin,

    I think the 0xE2 might be caused by your improper power-on sequence. I have seen similar issues from another customer, and after they fixed the power-on sequence issue, the TPS23861 malfunction disappeared. If the reset is high (digital system is “enabled”) before its power supply is considered “valid”, the digital system can enter an unknown state. Thanks.

    Best regards,

    Penny

  • Hi Penny:

    We used the new method and rerouted our boards, and we are currently running the aging process on 1K boards. These tests will end on Aug 3rd. We are still very worried about whether the root cause has really been found, as we have not been able to confirm a definitive solution. If you do not mind, could we keep the case open until around Aug 3rd? We would be very grateful if you could do so. Thanks.

  • Hi Penny:

    We have completed production of these 1K pieces, and we did not find any defective PoE device. We believe we have successfully resolved the problem with your help.

    Thanks so much again!  ^_^