Other Parts Discussed in Thread: DA8XX
Hi,
We are trying to boot the Linux kernel on the K2HK SoC, but we have run into a problem that appears to come down to memory management in the QMSS part of the NetCP driver. We are using kernel 4.19 because it is an LTS version.
More generally, what we are trying to achieve is to boot Linux over the 1 Gb Ethernet interface. U-Boot can already load the kernel image using TFTP over the same network interface, and therefore this should not be a hardware problem. The step we are working on now is to have the Linux kernel mount the rootfs over NFS. For that, the kernel driver for that interface needs to work, and that is where the problems start.
It seems that the problem is not in the driver for the interface but in the NetCP driver, or perhaps more specifically in the driver for the QMSS. That driver first allocates memory for a pool of DMA descriptors, then maps it to virtual addresses, and finally tries to access those virtual addresses. That last step leads to a failed paging request and a kernel Oops.
There is one interesting detail in this problem that is related to the amount of RAM in the hardware and the address ranges of the allocations/mappings. On the K2HK EVM, which has 2 GB of RAM, the driver works. The problems appear on custom hardware, which has the same SoC but 4 GB of RAM. Our debug prints show that the address ranges used are quite different depending on the amount of RAM.
Right now, our best guess is that something goes wrong in the handling of pointers. With 2 GB of RAM, 32-bit pointers are enough, but anything more requires more bits, and we suspect that the extra bits are not handled correctly.
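To illustrate the kind of pointer-handling error we suspect, here is a minimal user-space sketch in C. It is not taken from the driver: the type and function names (phys_addr64_t, fits_in_32_bits, truncate_to_32_bits, narrow_checked) are hypothetical, and the addresses mentioned in the comments are illustrative values only, not ones we have observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an address that needs more than 32 bits. */
typedef uint64_t phys_addr64_t;

/* True if the address can be stored in a 32-bit variable without loss. */
static inline bool fits_in_32_bits(phys_addr64_t addr)
{
    return addr <= UINT32_MAX;
}

/*
 * Mimics the suspected bug: the upper bits are silently dropped.
 * For example, 0x880000000 becomes 0x80000000, which is a different
 * (and possibly unmapped) address.
 */
static inline uint32_t truncate_to_32_bits(phys_addr64_t addr)
{
    return (uint32_t)addr;
}

/*
 * Defensive variant: in builds with assertions enabled, refuse to
 * narrow an address that does not fit instead of corrupting it.
 */
static inline uint32_t narrow_checked(phys_addr64_t addr)
{
    assert(fits_in_32_bits(addr));
    return (uint32_t)addr;
}
```

If something like truncate_to_32_bits happens anywhere between the allocation and the access, the result would look valid on a 2 GB system and only break once the addresses exceed 32 bits, which would match what we see.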
One thing we have tried is to limit the amount of RAM seen by the kernel with the kernel argument mem=2G. That fixed the issue and made the address ranges on the EVM and on our custom hardware similar, but it brought other issues. We are still analyzing them, but it looks as if not all parts of the kernel respect the memory limit.
At least for now, having the amount of RAM artificially limited to 2 GB would be an acceptable workaround if it did not bring in this other issue.
This leaves us with two alternative paths: fixing the original problem that leads to the kernel Oops, or making the RAM limit work completely.
The problem leading to the kernel Oops, at least, originates from the TI driver code, so we wonder if we could get any help with it on this forum. As for the problem that comes with the attempt to limit the amount of RAM, we are not yet sure of its origin, but if it rings a bell for anyone, we would be glad to hear any ideas.