We have a custom Keystone 2 (66AK2H06) board design for which a large percentage of boards (3 out of 12) hang in exactly the same way. The board is a very minor re-spin of a design that has never shown this problem across 25+ boards.
The hang occurs when U-Boot tries to access the network (e.g., DHCP). Using the emulator, we have isolated the specific instruction that causes the hang. It is a read of the MDIO control register:
ctl = readl(&adap_mdio->CONTROL);
in drivers/net/keystone_net.c:keystone2_eth_mdio_enable
When stepping that instruction (the assembler instruction that actually performs the read), the emulator reports that it cannot halt the CPU because the pipeline is stalled. A system reset is required to regain control.
I also have a debug print before and after the offending instruction. When I run it directly (without the emulator) and the board hangs, I see the pre-instruction message but not the post-instruction message.
I have also used the emulator to read that register directly (rather than stepping the instruction) and got a read error from the emulator. Reading other registers in the same area (the Ethernet switch subsystem) causes the same problem.
Note that the board does not hang every time it reads that register, but every time the board does hang, it is while reading that register. Also, I put two reads of that register in the code (with intervening messages), and while the first read sometimes succeeds, the second read has never failed after the first one succeeded. Note also that the routine writes to that register before reading it, and the write never hangs.
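For illustration, the instrumentation described above is roughly equivalent to the sketch below. It is a simplified, host-runnable stand-in, not the actual U-Boot code: the struct fake_mdio_regs, the fake_readl/fake_writel helpers, the probe_mdio_control function, and the enable-bit value are all assumptions made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the MDIO register block; only CONTROL is modeled. */
struct fake_mdio_regs {
    volatile uint32_t CONTROL;
};

/* Assumed value for the enable bit, for illustration only. */
#define FAKE_MDIO_CONTROL_ENABLE 0x40000000u

/* Host-side stand-ins for readl()/writel(): plain volatile accesses. */
static uint32_t fake_readl(const volatile uint32_t *addr) { return *addr; }
static void fake_writel(uint32_t val, volatile uint32_t *addr) { *addr = val; }

/*
 * Write CONTROL, then read it twice with a message around each read.
 * Returns the number of reads that completed (2 if nothing hung).
 */
static int probe_mdio_control(struct fake_mdio_regs *regs)
{
    int completed = 0;
    uint32_t ctl;

    assert(regs != NULL);

    /* The write comes first; on the real board it never hangs. */
    fake_writel(FAKE_MDIO_CONTROL_ENABLE, &regs->CONTROL);

    printf("before first read\n");
    ctl = fake_readl(&regs->CONTROL);   /* on failing boards, the hang is here */
    printf("after first read: 0x%08x\n", (unsigned)ctl);
    completed++;

    printf("before second read\n");
    ctl = fake_readl(&regs->CONTROL);   /* never seen to hang once the first read succeeded */
    printf("after second read: 0x%08x\n", (unsigned)ctl);
    completed++;

    return completed;
}
```

On a failing board, the last message printed is the one before the first read.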
Does any of this sound familiar?
A specific question I have is: could a problem in the network coprocessor subsystem cause the ARM core to hang this way when reading such a register?
Thanks,
Lance