Intermittent non-linefetch faults when accessing GPR0 as PCIe EP.

Hi,

We have a DM8148 card that we are using as an endpoint on a custom PCIe board.

We have been able to load it over PCIe as an EP from a host PC, using some of the examples from the TI PCIe user guide and wiki pages as a starting point. We have also been able to perform a fair amount of communication over PCIe from the host PC to the DM8148.

However, on occasion during boot, our DM8148 PCIe endpoint driver throws a fault when we try to read the GPR0 register in the PCIe peripheral core. The read is performed with ioread32() on the GPR0 address, which has been mapped using ioremap_nocache(). The fault generated by the kernel is:

Reading from 0xD7068070
Unhandled fault: external abort on non-linefetch (0x1008) at 0xd7068070
Internal error: : 1008 [#1]
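
For anyone decoding the log above: the value in parentheses (0x1008) is the fault status the kernel reports with the abort. The snippet below is a minimal sketch that splits it into fields, assuming the ARMv7-A short-descriptor DFSR layout applies to the Cortex-A8 in the DM8148. The helper name decode_dfsr and the FS_NAMES table are made up for illustration and are not part of any TI or kernel API.

```python
# Sketch only: assumes the ARMv7-A short-descriptor DFSR bit layout.
# FS[3:0] = bits 3:0, Domain = bits 7:4, FS[4] = bit 10,
# WnR (write-not-read) = bit 11, ExT (external abort type) = bit 12.

# Only the two external-abort encodings are listed here; other FS values
# are reported as "other".
FS_NAMES = {
    0b01000: "synchronous external abort",
    0b10110: "asynchronous external abort",
}


def decode_dfsr(fsr):
    """Split a fault status value into its assumed DFSR fields."""
    fs = (((fsr >> 10) & 0x1) << 4) | (fsr & 0xF)
    return {
        "fs": fs,
        "fs_name": FS_NAMES.get(fs, "other"),
        "domain": (fsr >> 4) & 0xF,
        "write": bool((fsr >> 11) & 0x1),  # False means the access was a read
        "ext": (fsr >> 12) & 0x1,
    }


if __name__ == "__main__":
    fields = decode_dfsr(0x1008)
    for name, value in fields.items():
        print(name, "=", value)
```

Under that assumption, 0x1008 decodes to FS = 0b01000 (a synchronous external abort) on a read access, with the ExT bit set. That matches the kernel's "external abort on non-linefetch" message: the bus returned an error response to the register read itself, rather than the MMU rejecting the address.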

When the fault occurs, it is always on the same ioread32() call. When it does not occur, our communications code seems to work without error. The error is very intermittent. Are there any suggestions on how to isolate or debug the problem? I have seen some posts mentioning that the read_rq() maximum size needs to be set to 256, but it's not clear how modifying any PCIe setting would fix this fault, which is caused by reading a register on the local DM8148 peripheral bus.

Thanks.

-Mike

  • Hello Mike,

    I wasn't able to find anyone who knows more about this; the issue seems to be a tough one.

    Did you manage to resolve the issue in the meantime?

    Can you post some kind of update, please?

    BR

    Vladimir

  • Hi Vladimir,
    This issue seems to be tied to one piece of hardware, and we suspect a signal integrity issue on the PCIe connection between the host and the DM8148-based card (we have one other card that has not exhibited this problem using the same load/kernel code).  If the PCIe link were faulty (perhaps the PLL in the DM8148 that is sinking the PCIe reference clock from the host is losing lock or something similar), then I could see why this would happen.  I've seen one or two other posts on E2E that suggest that a bad PCIe reference clock might result in these bus errors.
    We've also noticed that the host doesn't always detect that same card after a power-up (lspci on the host doesn't always show it).
    We are in the process of building up a couple more cards. If the problem shows up on any of them and the signal integrity looks OK, then I will revisit this issue/post.
    You can probably consider this "closed" for now.  Thanks for following up.
    -Mike
     
  • Hi Mike,

    I am seeing the same problem that you saw earlier.

    Can you share with me your solution?

    Thanks and regards,

    poh boon