Tool/software: TI-RTOS
Hello,
Not sure if this is an AM5728 question or TI-RTOS question, but I guess I need to start someplace....
We have an AM5728 custom board running TI-RTOS on the DSP cores, using PCIe to talk to an FPGA. The PCIe
link is running at Gen2 and we are using MSI interrupts from the EP (Altera FPGA) to the RC (DSP). The DSP
driver code is modeled very much after the PCIe sample code provided in the PDK.
The PCIe engine in the FPGA sends an MSI interrupt when a data transfer (EP->RC) completes. This lets
the RC know when data has arrived so it can process it. We can transfer data from two possible
streams. Data from a single stream seems to work just fine. However, when both streams are active the
DSP will stop getting the MSI interrupt (everything else remains working, just the MSI interrupt stops)
and a complete reload/restart of the DSP is required. I can see the MSI memory write coming across the
PCIe interface (we have a LeCroy protocol analyzer connected), so I know the EP is still sending the
MSI; it is just not getting into the DSP.
I checked all the RC MSI/Interrupt setup registers and none of them change between working and non-working cases.
I started to dig deeper into what happens when we get an MSI interrupt and found the following.
PCIECTRL_TI_CONF_IRQSTATUS_MSI - this register is set when you get an INTx or MSI interrupt.
PCIECTRL_PL_MSI_CTRL_INT_STATUS_N - this register is set to indicate the MSI interrupt number.
The example code uses these to read and clear the interrupts via the functions Pcie_getPendingFuncInts()
and Pcie_clrPendingFuncInts().
I've also found forum posts that reference the PCIECTRL_TI_CONF_IRQ_EOI register. It supposedly
must be written to enable further MSI interrupts, but the sample code does not appear to use it,
and it does not appear to be required since MSI interrupts are working in general.
So, I started looking at the IRQSTATUS_MSI and MSI_CTRL_INT_STATUS registers more closely when I was
in the failure case. It seemed like the MSI_CTRL_INT_STATUS was set but the IRQSTATUS_MSI was not. This
got me to look at the timing of the two interrupts. It appeared that if two MSI interrupts happened
within less than 4.5 usec of each other, the MSI interrupt would stop working. And, in fact, if I clear
the MSI_CTRL_INT_STATUS register (by writing its same value back to it), the MSI interrupt starts
working again.
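
The recovery step above relies on write-1-to-clear behaviour, which is what "writing the same value back" implies. Here is a minimal sketch of that behaviour, using a plain variable in place of the real register. The type and function names are made up for illustration, and the assumption that MSI_CTRL_INT_STATUS behaves this way rests only on the observation above, not on the TRM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a write-1-to-clear (W1C) status register.
 * This is plain memory, not the real MSI_CTRL_INT_STATUS register. */
typedef struct {
    uint32_t bits;
} w1c_reg_t;

/* Hardware side of the model: latch one or more status bits. */
static void w1c_hw_set(w1c_reg_t *reg, uint32_t mask)
{
    assert(reg != NULL);
    reg->bits |= mask;
}

/* Software read: returns the current status without side effects. */
static uint32_t w1c_read(const w1c_reg_t *reg)
{
    assert(reg != NULL);
    return reg->bits;
}

/* Software write: every 1 bit in value clears the matching status bit,
 * every 0 bit leaves the matching status bit untouched. */
static void w1c_write(w1c_reg_t *reg, uint32_t value)
{
    assert(reg != NULL);
    reg->bits &= ~value;
}

/* Clear exactly what was observed; returns the snapshot that was cleared. */
static uint32_t w1c_clear_snapshot(w1c_reg_t *reg)
{
    uint32_t snapshot = w1c_read(reg);
    w1c_write(reg, snapshot);
    return snapshot;
}
```

In this model, a bit that is latched between the read and the write-back survives the clear, whereas a second event on the same bit is merged with the first and cannot be told apart from it.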
So, that got me looking at the sample code function PlatformMsiIntxIsr(). The code starts with a call to
Pcie_getPendingFuncInts() to read the two registers IRQSTATUS_MSI and MSI_CTRL_INT_STATUS. Then bits from
those are checked to see if this was indeed an MSI interrupt and if it was the MSI interrupt expected. I
also added code to read a memory location to detect which of the two possible data transfers completed.
Then at the end of the function Pcie_clrPendingFuncInts() was called to clear the interrupt and presumably
enable future interrupts. Note that Pcie_clrPendingFuncInts() does not write to the EOI register.
So, I ran a test: I moved the call to Pcie_clrPendingFuncInts() from the end of the ISR function to
immediately after the call to Pcie_getPendingFuncInts() at the start of the ISR. This greatly improved things.
The code would run much longer before breaking, but would eventually stop servicing MSI
interrupts as before. So, then I changed the code to call Pcie_clrPendingFuncInts() both at the
start of the ISR and at the end. This allowed the code to continue running much longer, but I do
miss interrupts, probably because IRQSTATUS_MSI is not being set correctly, and it will still eventually stop getting MSIs.
It appears there can be a race condition when you get two MSI interrupts very close together. Perhaps this
is the purpose of the EOI register: to prevent more MSI interrupts until the SW is done servicing the
current MSI interrupt. However, the sample code does not seem to use it.
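
One way to picture the suspected race is with a toy model. The sketch below is not the PDK code and does not touch real registers. It assumes, without confirmation from the TRM, that the summary bit in IRQSTATUS_MSI is latched only when MSI_CTRL_INT_STATUS goes from all-zero to non-zero, and that the clear path clears the vector status before the summary bit. All names are hypothetical. Under those assumptions, an MSI that lands between the two clears leaves the vector bit set with no summary bit, which matches the stuck state described earlier, while an ISR that clears the summary first and then drains the vector status in a loop does not get stuck.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only; neither field is a real register access. */
typedef struct {
    uint32_t vec_status; /* models MSI_CTRL_INT_STATUS, one bit per vector, W1C */
    uint32_t summary;    /* models the MSI bit of IRQSTATUS_MSI, W1C */
} msi_model_t;

/* Test hook used to inject an MSI at a chosen point inside an ISR. */
typedef void (*race_hook_t)(msi_model_t *m);

/* Hardware side. ASSUMPTION: the summary bit is latched only on a
 * zero -> non-zero transition of the vector status. */
static void model_msi_arrives(msi_model_t *m, unsigned vec)
{
    assert(m != NULL && vec < 32u);
    bool was_idle = (m->vec_status == 0u);
    m->vec_status |= (1u << vec);
    if (was_idle) {
        m->summary = 1u;
    }
}

static void model_w1c_vec(msi_model_t *m, uint32_t mask) { m->vec_status &= ~mask; }
static void model_w1c_summary(msi_model_t *m) { m->summary = 0u; }

/* Stuck state: a vector is pending but nothing will raise the interrupt. */
static bool model_is_stuck(const msi_model_t *m)
{
    return (m->vec_status != 0u) && (m->summary == 0u);
}

/* Clear-once ISR. ASSUMPTION: vector status is cleared before the summary.
 * The hook fires between the two clears to model a closely spaced MSI. */
static uint32_t isr_clear_once(msi_model_t *m, race_hook_t hook)
{
    uint32_t snapshot = m->vec_status;
    model_w1c_vec(m, snapshot);
    if (hook != NULL) {
        hook(m);
    }
    model_w1c_summary(m);
    return snapshot;
}

/* Drain ISR: clear the summary first, then keep clearing and handling
 * vector bits until none remain. The hook fires once, after the first
 * vector clear. Returns the union of all vector bits handled. */
static uint32_t isr_drain(msi_model_t *m, race_hook_t hook)
{
    uint32_t handled = 0u;
    uint32_t pending;

    model_w1c_summary(m);
    while ((pending = m->vec_status) != 0u) {
        model_w1c_vec(m, pending);
        handled |= pending; /* per-stream work would go here */
        if (hook != NULL) {
            hook(m);
            hook = NULL;
        }
    }
    /* The summary may be set again here; that only costs one extra ISR
     * entry that finds nothing pending. */
    return handled;
}

/* Example hook: a second MSI on vector 0. */
static void second_msi_vector0(msi_model_t *m)
{
    model_msi_arrives(m, 0u);
}
```

In this model, calling isr_clear_once() with the second_msi_vector0 hook after a single model_msi_arrives() leaves model_is_stuck() true, and later MSIs do not set the summary bit again. Calling isr_drain() in the same scenario ends with no vector pending. Because two completions can be folded into one pass of the loop, the handler would still need to check the memory flags for both streams on every pass.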
So, I guess I am looking for suggestions on how to make this more reliable, and in particular on how
to handle closely spaced MSI interrupts.
Thanks,
Chris