Other Parts Discussed in Thread: AM5728
Intro:
We have a board based around Phy-Tec's AM5728-based SOM which contains the recommended TPS659037 PMIC. We are using Phy-Tec's kernel (4.19.79) and u-boot (2019.01) which are from the TI-SDK.
What Happened:
I was debugging an unrelated driver a few days ago, which caused a kernel panic. After this happened, Linux would get to the point in boot at which the palmas-pmic driver is probed, and then the board would reset. I originally thought this was some sort of bug in the driver, so I added some printks to track down the problem. The driver does read-modify-write operations on the CTRL/status registers of each regulator on the PMIC, but the read operation was returning all zeros. When the driver wrote the modified value back, the regulator shut down (as expected, since the mode field was set to "off").
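To illustrate why a zero read is fatal here, this is a minimal Python model of a read-modify-write. It is not the palmas driver code. The field layout (mode in bits 1:0, status in bits 5:4) is inferred from the expected 0x33 value, and OTHER_FIELD_MASK is a hypothetical field chosen only for the example.

```python
# Minimal model of a read-modify-write on a regulator CTRL register.
# Assumed layout (inferred from the expected 0x33): mode in bits 1:0,
# status in bits 5:4. OTHER_FIELD_MASK is a hypothetical field that the
# driver wants to change; it is not taken from the datasheet.
MODE_MASK = 0x03
STATUS_MASK = 0x30
OTHER_FIELD_MASK = 0x40  # hypothetical


def read_modify_write(read_value, mask, new_bits):
    """Return the byte written back after updating only the bits in mask."""
    return (read_value & ~mask & 0xFF) | (new_bits & mask)


def mode(value):
    """Extract the mode field (0 means "off" in this model)."""
    return value & MODE_MASK


# Correct read: the mode bits survive the write-back.
good = read_modify_write(0x33, OTHER_FIELD_MASK, 0x40)
print(hex(good), mode(good))  # 0x73 3

# Faulty read of 0x00: the write-back carries mode 0, which turns the rail off.
bad = read_modify_write(0x00, OTHER_FIELD_MASK, 0x40)
print(hex(bad), mode(bad))  # 0x40 0
```

The write itself is doing what it was asked to do. The problem is that the bits it preserves come from a read that does not reflect the real register contents.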
I then thought that this was some sort of SW regression, so I went back to a commit that I knew worked and built completely from scratch, but the same thing happened. I then flashed a build that had been built before the issue started and it still happened.
So I decided to test out the issue in the u-boot console via the i2c command. Specifically, I issued "i2c md 0x58 0x20 1", which reads the CTRL register of SMPS12. I'd expect this to return 0x33, since this regulator should power up to mode 0b11 and report its status as 0b11 in bits 5:4. The first read returned 0. The next read returned the expected value, so I decided to see if it was some sort of random intermittent failure. Instead, I noticed a pattern. After the first read of 0, I get 23 reads of 0x33, then this sequence: 0x02, 0x00, 0x01, 0x01, 0x03, 0x00, 0x01, 0x01, 0x00, then another 23 reads of 0x33, and so on. Reading other CTRL registers does not reset the sequence; those reads return the next value in the sequence. The voltage select registers seem affected as well, and follow the same pattern (with exactly the same bad values).
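The pattern can be written down as a small Python generator, which makes it easier to compare against a captured log of reads. This is only a model of what was observed, assuming the sequence keeps repeating exactly as described; the function name expected_reads is made up for the example. One thing it makes obvious is that the repeating part is 23 + 9 = 32 reads long.

```python
# Model of the observed read pattern on the CTRL registers.
GOOD = 0x33
GOOD_RUN_LEN = 23
BAD_RUN = [0x02, 0x00, 0x01, 0x01, 0x03, 0x00, 0x01, 0x01, 0x00]

# One full cycle: 23 correct reads followed by the 9 bad values.
CYCLE = [GOOD] * GOOD_RUN_LEN + BAD_RUN


def expected_reads(n):
    """Return the first n values the pattern predicts, starting at power-up.

    The very first read is the lone 0x00 seen before the first good run.
    """
    reads = [0x00]
    i = 0
    while len(reads) < n:
        reads.append(CYCLE[i % len(CYCLE)])
        i += 1
    return reads[:n]


print(len(CYCLE))  # 32
print([hex(v) for v in expected_reads(3)])  # ['0x0', '0x33', '0x33']

# Index (0-based) of the first bad value after the initial 0x00.
first_bad = next(i for i, v in enumerate(expected_reads(40)) if i > 0 and v != GOOD)
print(first_bad)  # 24
```

To use it, collect the values returned by repeated "i2c md" reads into a list and compare that list against expected_reads(len(log)).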
Other notes:
This I2C bus is shared with an RTC @ 0x68 and an EEPROM @ 0x50. I disconnected the battery backup to the RTC but observed no change in symptoms. I don't think it's a bug in the other devices, though, since other registers on the PMIC, like the VENDOR_ID register, read back perfectly every time.
I found a board that hadn't been damaged yet and did the same probing in u-boot that I did with the damaged ones. The reads from the PMIC functioned normally.
Questions:
What are the possible causes? It happened with no relevant SW changes, but I suspect something like the OTP programming registers got written during the initial crash (but that's pure speculation).
Is this sort of thing recoverable, or does the PMIC need to be replaced? It seems to function somewhat normally aside from the faulty reads. For now, I'm going to patch the Linux driver to always assume that reads from the CTRL registers contain the correct mode bits. That is a safe assumption, because if it were not the case, there would be no power to the processor.
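The logic of that workaround, modeled in Python rather than as the actual kernel patch, would look roughly like this. The mask (mode in bits 1:0) is inferred from the expected 0x33 value, and the helper name assume_mode_on is hypothetical. Note that it only repairs the mode field, so the other bits of a bad read pass through unchanged.

```python
# Sketch of the workaround: never trust the mode field of a CTRL read.
MODE_MASK = 0x03      # assumed: mode field in bits 1:0
ASSUMED_MODE = 0x03   # the mode the rail is expected to power up in


def assume_mode_on(raw):
    """Replace the mode bits of a raw CTRL read with the assumed mode."""
    return (raw & ~MODE_MASK & 0xFF) | ASSUMED_MODE


# Apply it to the bad values seen in u-boot, plus one correct read.
for raw in [0x02, 0x00, 0x01, 0x03, 0x33]:
    fixed = assume_mode_on(raw)
    print(hex(raw), "->", hex(fixed))
# 0x2 -> 0x3
# 0x0 -> 0x3
# 0x1 -> 0x3
# 0x3 -> 0x3
# 0x33 -> 0x33
```

With this in place, a read-modify-write on top of a bad read can no longer write mode 0 back to the regulator, although the status bits and any other fields in the value are still whatever the faulty read returned.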