Problems with simultaneous PCIe transactions (C6678 to Virtex 6 FPGA)

Hi,

We have a C6678 talking to a Xilinx Virtex-6 over PCIe and have a curious problem.  There are several kinds of PCIe operation involved:  the DSP reading 4k-ish blocks of data in bursts;  the DSP writing 600-byte blocks of data in bursts; and the FPGA raising an MSI interrupt, which underneath is a 4-byte payload write to the DSP.  (The blocks are done as EDMA operations on 128-byte DBS channels, and are therefore broken into 128-byte packets by the PCIe peripheral.)
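Just to make the fragmentation concrete, the packet counts for our block sizes work out as below (a hypothetical helper for illustration only, not from our code):

```c
#include <stdint.h>

/* Number of PCIe TLPs the C6678 PCIe peripheral emits for one block,
 * given the 128-byte default burst size (DBS).  Hypothetical helper,
 * for illustration only. */
static uint32_t tlp_count(uint32_t bytes, uint32_t dbs)
{
    return (bytes + dbs - 1u) / dbs;   /* ceiling division */
}

/* tlp_count(4096, 128) -> 32 packets per 4 KiB read    */
/* tlp_count(600, 128)  -> 5 packets per 600-byte write */
```

So each 4 KiB read is 32 back-to-back TLPs, and an MSI arriving mid-transfer has plenty of opportunity to interleave with them.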

If we do only one thing at a time (read, write or MSI) and make sure they don't overlap in time, everything works fine.  There is no evidence of link-level corruption and we have plenty of bandwidth.

If operations overlap - especially if MSIs arrive while read or write operations are going on - then things start to fall over.  The MSIs are every 40 usec and the problems typically take only a few dozen to a few hundred of these periods to appear.  The MSIs need to be acknowledged, which is a 4-byte write to a memory mapped register on the FPGA, otherwise they stop coming.  So this is also potentially a source of parallel transactions.

The symptoms are various:  sometimes we get Completion Timeout uncorrectable errors; more often there is no PCIe error at all but the MSIs just stop coming, possibly because one of them is not received.  If we shorten the frame lengths then things work for longer but the problems don't go away.

The DSP and FPGA are on the same custom PCB and there are no other devices on the bus.  The FPGA's PCIe interface is based on Xilinx's PCIe core but is providing a memory-mapped view into the output of a signal processing chain and some registers, i.e. it is not simply some "RAM on a stick" as in the Xilinx example application.  We also support burst transactions, which the Xilinx demo application did not.  So if we tried to go "back to the demo" we'd have to re-write not to use burst transactions, and the data rates would be a lot lower, so that isn't an attractive option.

We do not know if the problems are on the DSP end, the FPGA end, or some interaction of the two.  We cannot yet reproduce the problem in simulation on the FPGA.

So my questions are:

(a) has anyone seen something like this before?

(b) are there any settings on the DSP side that might help, or at least help diagnose the issue?

(c) we have a Blackhawk XDS560v2 System Trace pod, and I have been wondering if it could give us a detailed trace of PCIe transactions, which might help construct a simulation case to break the FPGA core.  Can we do this, and if so, how do I get that kind of capture?

Thanks in advance,

Gordon

  • Gordon,

    May I ask if the FPGA triggers the same MSI interrupt in the DSP (keeps writing the same vector value into the DSP PCIe register MSI_IRQ (0x21800054)) or does it write different vector values for different MSI events, please?

    And what happens when the DSP receives the MSI interrupt from the FPGA, please?

    I am wondering if you have an ISR for the MSI event received in the DSP, and whether, within the ISR, the CorePac clears the MSI event flag in the "MSIn_IRQ_STATUS" register (writing 1 to the bit field) and writes the appropriate value (such as 0x4 for MSI vector 0/8/16/24, please see table 2-10 in the PCIe user guide) into the IRQ_EOI (0x21800050) register to indicate the end of the MSI event in the DSP.

    I think after the steps above, the DSP will write something back to the FPGA via PCIe to acknowledge the FPGA as well, is that correct?

    Do you have anything else in the ISR, please? And will it all complete within the 40 usec period, even when there is concurrent traffic in parallel, please?

  • Yes, it's always the same MSI (interrupt 0) handled by core 0.  The core interrupt used is not shared with any other interrupts or host events.

    The DSP is running SYS/BIOS 6, with a HWI as follows:

    volatile CSL_Pciess_appRegs * const the_pciessAppRegisters = (volatile CSL_Pciess_appRegs *)CSL_PCIE_CONFIG_REGS;
    void PcieDriver_MsiHwi(UArg arg)
    {
        s_HwiCount++;
        (*s_msiCallback)(0);                                      /* application-level handling     */
        the_pciessAppRegisters->MSIX_IRQ[0].MSI_IRQ_STATUS = 0x1; /* clear the MSI 0 status flag    */
        the_pciessAppRegisters->IRQ_EOI = 0x4;                    /* end-of-interrupt, MSI vector 0 */
    }

    The MSI callback in normal use is as follows. 

    void Distributor_MsiCallback(int32_t channel)
    {
        (void)channel;
        EDMA3_DRV_enableTransfer (hEdma,
                s_pciEdmaChannel,
                EDMA3_DRV_TRIG_MODE_MANUAL);

        /* Read ONE word from NOR flash, to toggle CE#, since
         * we happen to have a test point on CE#.
         */
        gpioSetOutput(1);
        iDummy = *(volatile uint16_t*)(0x70000000);
    }

    The acknowledgement of the read is not actually done by the HWI.  It's done by a second DMA channel which is chained off the first on completion.  So that won't happen until after the read has completed and the MSI EOI cleared.
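    For reference, the chaining amounts to setting the TCCHEN bit and pointing the TCC field of the first channel's PaRAM OPT word at the acknowledgement channel.  A register-level sketch, with bit positions from my reading of the EDMA3 documentation and a made-up channel number, so please double-check against the user guide:

```c
#include <stdint.h>

/* EDMA3 PaRAM OPT bit positions (KeyStone EDMA3). */
#define OPT_TCC_SHIFT   12u         /* transfer completion code, bits 17:12 */
#define OPT_TCCHEN      (1u << 22)  /* chain on final transfer completion   */

/* Build an OPT word that, when this transfer completes, triggers
 * (chains to) channel 'ack_ch' -- the channel that performs the 4-byte
 * MSI acknowledgement write to the FPGA.  Channel number illustrative. */
static uint32_t opt_chain_to(uint32_t base_opt, uint32_t ack_ch)
{
    base_opt &= ~(0x3Fu << OPT_TCC_SHIFT);          /* clear old TCC       */
    base_opt |= (ack_ch & 0x3Fu) << OPT_TCC_SHIFT;  /* completion code     */
    base_opt |= OPT_TCCHEN;                         /* chain on completion */
    return base_opt;
}
```

    So the acknowledgement write is guaranteed not to start until the bulk read channel signals completion.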

    I've tried the following instead, and can still get the system to fall over.  This does no bulk reading at all, but just writes the acknowledgement directly from the HWI.  This also falls over pretty quickly.  That is, just doing bulk writes (and some small word writes) to the FPGA at the same time the FPGA is generating MSIs seems to cause the failure. 

    void Distributor_MsiTestCallback(int32_t channel)
    {
        volatile uint32_t* pLastRead = (volatile uint32_t *)FPGAREG_LATEST_READ_TIMESLICE;
        (void)channel;

        iDummy = *(volatile uint16_t*)(0x70000000);
        *pLastRead = (msiHwiCount) % 8;
        msiHwiCount++;
    }

    There are no other MSI events in the system, and cores 1-7 do not attempt to use the PCIe block in any way.  All the PCIe traffic is either from Core 0 or EDMA CC1.

    In the simplified scenario (only bulk writing and MSI between DSP and FPGA), may I ask what the traffic looks like, please?

    I think the MSI event is sent from the FPGA and then the DSP writes the acknowledge back to the FPGA each period (such as 40us).

    Then what about the bulk writing? Does the EDMA keep writing data from the DSP to the FPGA during the whole testing period, please? Is there any idle time between each bulk write, please?

    I am wondering if the EDMA write blocks the CPU write of the MSI acknowledge, since both masters target the same PCIe slave port.

    Could you try lowering the queue priority in the "QUEPRI" register for the EDMA CCn TCn that is being used for the bulk write, please? By default the EDMA TCs have the highest priority (0). Please try whether the lowest priority (7) on that TC makes things better.
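    The QUEPRI change is a 3-bit field per event queue.  A hedged sketch of the field manipulation, assuming the C6678 layout of 3-bit PRIQn fields on 4-bit boundaries (please verify the register address and layout against the EDMA3 user guide):

```c
#include <stdint.h>

/* QUEPRI holds one 3-bit PRIQn priority field per event queue,
 * on 4-bit boundaries (PRIQ0 = bits 2:0, PRIQ1 = bits 6:4, ...).
 * 0 = highest system priority, 7 = lowest. */
static uint32_t quepri_set(uint32_t quepri, unsigned queue, unsigned prio)
{
    uint32_t shift = queue * 4u;
    quepri &= ~(0x7u << shift);          /* clear old PRIQn      */
    quepri |=  (prio & 0x7u) << shift;   /* install new priority */
    return quepri;
}

/* e.g. drop event queue 1 (feeding the bulk-write TC) to lowest:
 *     *QUEPRI_REG = quepri_set(*QUEPRI_REG, 1, 7);
 * where QUEPRI_REG is the memory-mapped QUEPRI register address. */
```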

    Another thing: I do not see any limitation on the DSP side that would stop it receiving MSI events, so what about the FPGA side?

    Will the FPGA stop sending MSI events to the DSP if it does not receive the MSI acknowledge in time (such as within 40us)? What happens if you do not have this limitation on the FPGA side?

    And are you able to keep track of the time interval between the FPGA sending the MSI and the FPGA receiving the acknowledge? We can take a look to see if the interval becomes much longer once the issue happens. That may prove the guess that the EDMA write blocks the CPU write of the MSI acknowledge on the DSP side.

  • Hi,

    Did you reach a solution on this thread? Because I am having a similar problem right now.
    Here are the details:

    http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/t/307737.aspx

    Moreover, I kept track of the time interval between the FPGA sending the MSI and the FPGA receiving the acknowledge. It is just as you said: the time interval is longer at the last MSI_0 transaction. (We used the ChipScope tool of the FPGA for this measurement. I can send you a snapshot of the ChipScope signal pattern if you want.)

    Lastly, I applied the QUEPRI register solution but it didn't work, either.

    regards,
    koray. 

    No, we did not solve it.  We changed from an MSI to using a GPIO line from the FPGA to the DSP to signal data ready, as it is easy to trigger an interrupt on a GPIO level change and in our case the DSP and FPGA are on the same board.  So now all reads and writes are initiated by the DSP, and I control this with semaphores so that only one operation is ever taking place at a time.  It is a little constraining but it works well enough for our application.
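    The serialisation is just a binary semaphore taken around each whole PCIe transaction.  In the real system this is a SYS/BIOS Semaphore; the sketch below uses a pthread mutex purely for illustration, and all the names are made up:

```c
#include <pthread.h>

/* One-at-a-time gate for all PCIe operations (bulk reads, bulk writes,
 * register acks).  Illustrative only: the real system uses a SYS/BIOS
 * Semaphore, not pthreads. */
static pthread_mutex_t s_pcieLock = PTHREAD_MUTEX_INITIALIZER;
static int s_inFlight;   /* sanity check: should only ever be 0 or 1 */

static void pcie_op_begin(void) { pthread_mutex_lock(&s_pcieLock);   s_inFlight++; }
static void pcie_op_end(void)   { s_inFlight--; pthread_mutex_unlock(&s_pcieLock); }

/* usage:
 *     pcie_op_begin();
 *     ... trigger EDMA read/write, wait for completion ...
 *     pcie_op_end();
 */
```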

    Can I ask what FPGA chip and PCIe IP you are using?  We have a Xilinx Virtex-6, which has a PCIe hard core for the link layer but very little usable free code for the transaction layer, including DMA and MSI signalling.  So we wrote our own transaction layer on a tight budget, and I have always blamed this for messing up simultaneous transactions.  We have not investigated this with ChipScope as we found the workaround above.

    You can buy IP cores (e.g. Northwest Logic) for the Xilinx chips that provide a sophisticated DMA transaction engine, but they are a surprisingly expensive extra cost for an interface (PCIe) that Xilinx advertises as "built in".  We were quite disappointed with Xilinx over this.