Dear TI team,
We saw an issue a few months ago (~April '21) with the U-Boot driver that initializes the LPDDR4 memory on our custom hardware.
At the time we found a workaround for this, and postponed looking into this further since we expected this issue to surface again with the SK-AM64, which is using LPDDR4, too.
I have now started looking into this again, and at first it seemed like the issue was gone. I checked the code, and while there is support for the SK-AM64 now, there are no changes in the U-Boot code that would suggest this problem is fixed. For a while I suspected that a newer version of the DDR register configuration tool might have changed some setting that avoided this problem (see related thread), but I've checked with a DDR .dtsi file generated with DDR tool version 0.08.10.00, and it still shows the same effect.
I tried to recreate the issue, and realized that it appears to be timing sensitive. I suspect that this is why the code "works" for the SK-AM64. The EVM is probably not affected, since the problematic code isn't used on the EVM with its DDR4 memory. I don't feel comfortable relying on the code being "fast enough" to avoid this issue.
With LPDDR4 memory the DDR controller and memory start at a much lower frequency, FSP0. As part of the initialization sequence the frequency is switched to FSP1 and FSP2. The necessary clock is generated outside of the DDR controller, and the controller apparently signals the desired frequency via some CTRLMMR register bits. Software needs to read these registers, reconfigure the DDRSS0 input clock, and signal completion back to the DDR controller via the CTRLMMR registers.
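As a rough illustration of the handshake described above, here is a minimal, self-contained simulation in C. It is not the driver code: the register layout, all names (fake_ctrlmmr, fake_clk, handle_freq_request) and the clock rates are hypothetical placeholders, since the real CTRLMMR bit definitions are not given here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the CTRLMMR handshake bits (not the real layout). */
struct fake_ctrlmmr {
    bool     req_pending;  /* controller requests a frequency change */
    uint32_t req_fsp;      /* requested frequency set point: 0, 1 or 2 */
    bool     ack;          /* software signals completion */
};

/* Hypothetical stand-in for the DDRSS0 input clock. */
struct fake_clk {
    unsigned long rate_hz;
};

/* Placeholder rates for FSP0..FSP2; made up, not real values for any board. */
static const unsigned long fsp_rate_hz[3] = {
    25000000UL, 400000000UL, 800000000UL
};

/*
 * Handle one pending frequency change request.
 * Returns 0 on success, -1 if no request is pending or the index is invalid.
 */
int handle_freq_request(struct fake_ctrlmmr *mmr, struct fake_clk *clk)
{
    /* Step 1: read the request from the (simulated) CTRLMMR bits. */
    if (!mmr->req_pending || mmr->req_fsp >= 3)
        return -1;

    /* Step 2: reconfigure the (simulated) DDRSS0 input clock. */
    clk->rate_hz = fsp_rate_hz[mmr->req_fsp];

    /* Step 3: signal completion back to the controller. */
    mmr->ack = true;

    /* Simulation only: on hardware the controller would drop its request. */
    mmr->req_pending = false;
    return 0;
}
```

Note that nothing in these three steps needs to touch the DDR controller's own register space; only the CTRLMMR bits and the clock are involved.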
The code for this sequence is implemented in the U-Boot driver:
k3_lpddr4_info_handler -> k3_lpddr4_ack_freq_upd_req -> clk_set_rate
The issue that I'm seeing is that during this sequence, the R5F can stall indefinitely on the read from the LPDDR4__DRAM_CLASS__REG register, which is used to determine whether the current system uses DDR4 or LPDDR4 memory. When my U-Boot is stuck, I can unfreeze it via the R5F debug registers, and I can see that it is always stuck on the load from LPDDR4__DRAM_CLASS__REG. In this situation none of the DDR controller registers appear to be accessible, and if I try to access them the core stalls. If I try reading any of these registers via CCS, the debugger loses its connection.
Originally we had logging and debug code enabled in U-Boot, and I'm guessing that this is what caused the problem to show. With a U-Boot compiled with little logging and no debug messages the problem doesn't show. If I halt the R5F BEFORE it starts initializing the DDR controller and then let it run, everything is fine. If I halt the R5F inside the k3_lpddr4_info_handler, the processor stalls once it tries to determine the type of DDR memory.
If I just put a small counting loop inside the info_handler, whether the code stalls or works depends on the length of the delay: counting to 100,000 works, counting to 1,000,000 fails.
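The exact loop is not shown here; one way such a counting delay could be written is sketched below. The helper name debug_delay_count is hypothetical, and the volatile counter is an assumption made so that the compiler does not optimize the loop away.

```c
#include <stdint.h>

/*
 * Hypothetical debug helper: burn time by counting up to n.
 * The counter is volatile so the compiler must keep every iteration.
 * Returns the final counter value.
 */
uint32_t debug_delay_count(uint32_t n)
{
    volatile uint32_t i;

    for (i = 0; i < n; i++)
        ;  /* intentionally empty */

    return i;
}
```

The resulting delay in wall-clock time depends on the core clock and compiler settings, so the iteration counts above are only meaningful relative to each other on the same build.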
I've been told that my colleagues have been seeing this issue even without any artificial delay loops - I'll need to ask them how U-Boot was configured at the time.
Is there any explanation why the R5F stalls indefinitely when trying to read from DDR controller registers during the FSP frequency update?
Is there any explanation why it works when the read from the DDR controller happens "soon enough", but fails when too much time has passed?
Our workaround modifies the driver to cache the content of this register in a data structure, because this is the only DDR controller register that is actually read during the FSP frequency update, and the type of memory is of course fixed for the execution of the bootloader. If there's a reason why reading the registers is problematic during the FSP frequency update I believe this fix would be "correct", but the DDR controller is basically undocumented, and it is impossible for us to be certain of this.
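The idea behind this workaround can be sketched as follows. This is a simplified simulation and not the actual patch: the structures, function names and the DRAM class value are hypothetical, and the register read is replaced by a counter so the behaviour can be checked off-target.

```c
#include <stdint.h>

/* Hypothetical stand-in for the DDR controller register block. */
struct fake_ddr_regs {
    uint32_t dram_class;  /* simulated DRAM class field (arbitrary value) */
    unsigned reads;       /* counts simulated register accesses */
};

/* Hypothetical driver state that holds the cached value. */
struct ddr_state {
    struct fake_ddr_regs *regs;
    uint32_t cached_dram_class;
    int      cache_valid;
};

/* Simulated register read; on real hardware this would be an MMIO load. */
uint32_t read_dram_class_reg(struct fake_ddr_regs *regs)
{
    regs->reads++;
    return regs->dram_class;
}

/* Call once during probe, before any FSP frequency change is in flight. */
void ddr_cache_dram_class(struct ddr_state *st)
{
    st->cached_dram_class = read_dram_class_reg(st->regs);
    st->cache_valid = 1;
}

/*
 * Used from the frequency update path. Once the cache is valid this
 * returns the stored value and does not access the controller.
 */
uint32_t ddr_get_dram_class(struct ddr_state *st)
{
    if (!st->cache_valid)
        ddr_cache_dram_class(st);
    return st->cached_dram_class;
}
```

The important part is the ordering: the cache has to be filled at a point where the controller registers are known to be accessible, so that the frequency update path itself never issues the read.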
Regards,
Dominic