Should I See SRIO Errors between C6455 and FPGA

Hi,

Our board has a C6455 connected to an Altera FPGA via SRIO.

In my simple tests I appear to be able to transfer large amounts of data without errors, but when I make the tests slightly more complicated I start seeing errors.

For my first test I transferred data from the C6455 to the FPGA, and this ran for a long time (several million transfers) without any errors or missed transfers.

To make the test more complex I started sending data in both directions at the same time, and now I see errors and some of the data (and/or doorbells) gets lost. The errors occur about once every few thousand transfers, which I don't think is a very long time between errors!

There is no synchronisation between the sending of data (and/or doorbells) in the 2 directions. In other words it is perfectly possible for both devices to start sending at the same time.

As far as I understand SRIO there should be no relationship between the 2 directions.

Does any of this make sense to anyone? If so, I would appreciate any thoughts on whether errors and lost packets/doorbells are to be expected!

Thanks,

Matt

  • One of my problems is that I don't always get the LSU transaction completion interrupt.

    I was trying to be interrupt driven, i.e. start an LSU transaction and pend until the transaction completion interrupt occurs.

    But I've just changed to polling the LSU status (LSU_REG6), i.e. looping until the BSY field indicates not busy, and now my system runs much more reliably.

    As far as I can tell, it only becomes a problem when the SRIO is busy, i.e. when there are a variety of asynchronous transactions going in both directions (although I'm not sending vast amounts of data).

    There are still other problems that I can't put my finger on so I shall carry on scratching my head!
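A minimal C sketch of the polling approach described above. It assumes that BSY is bit 31 of LSU_REG6 (check this against the C6455 SRIO documentation) and adds a poll limit, so a stuck LSU shows up as a timeout rather than a hang. `lsu_wait_idle` and the poll limit are hypothetical, not part of any TI library.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: BSY (busy) is bit 31 of LSU_REG6; verify for your device. */
#define LSU_REG6_BSY_MASK (1u << 31)

/* Hypothetical helper: read LSU_REG6 until BSY clears or max_polls reads
   have been made. Returns true if the LSU went idle, false on timeout. */
static bool lsu_wait_idle(const volatile uint32_t *lsu_reg6, uint32_t max_polls)
{
    for (uint32_t i = 0; i < max_polls; ++i) {
        if ((*lsu_reg6 & LSU_REG6_BSY_MASK) == 0u)
            return true;   /* transaction no longer busy */
    }
    return false;          /* still busy: report an error instead of spinning forever */
}
```

On hardware, `lsu_reg6` would point at the memory-mapped LSU_REG6 of the LSU being used.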

  • Hello, Mattb

    I am working on a similar task, but with Xilinx, and I'm at the start of your path (a "simple" and successful test). Since you are connecting the TMS and the Altera and there are problems, might the problem be in the Altera core/project? Does the Altera core/project have enough buffers, and does it have separate, independent interfaces for the initiator and response flows?

    BR, Serge.

  • Hi Serge,

    We have made some progress by removing a few problems between the Altera SRIO MegaCore and our FPGA logic.

    But:

    • I haven't worked out why I don't always get the LSU transaction complete interrupt. This would appear to have nothing to do with the FPGA.
    • When I let the system run, it doesn't take long for the SRIO error status to become non-zero on both sides.

    We are currently looking at a more serious problem! We have found that we can't successfully send a burst of 50 SRIO packets followed by a doorbell from the FPGA.

    Here's what we have tried:

    • Send 50 packets of data, wait a while, send the data again and repeat - this works fine
    • Send 50 packets of data, send a doorbell, wait a while, send data and doorbell, and keep repeating - we find that the data gets corrupted; specifically, it appears that the words in the packet have been rearranged.

    No errors are indicated in the error status registers.

    I can't understand why the doorbell makes a difference!
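One way to pin down rearranged words is to send a self-describing test pattern. The sketch below is hypothetical (the layout is an assumption, not the FPGA's actual test data): each 32-bit word carries the packet sequence number in its upper 16 bits and the word index in its lower 16 bits, so the checker reports the first out-of-place word, and a stray word from another packet is identifiable by its sequence number.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pattern: upper 16 bits = packet sequence, lower 16 = word index. */
static uint32_t pattern_word(uint16_t pkt_seq, uint16_t word_idx)
{
    return ((uint32_t)pkt_seq << 16) | word_idx;
}

/* Sender side: fill one packet's payload with the pattern. */
static void fill_packet(uint32_t *buf, size_t nwords, uint16_t pkt_seq)
{
    for (size_t i = 0; i < nwords; ++i)
        buf[i] = pattern_word(pkt_seq, (uint16_t)i);
}

/* Receiver side: return the index of the first mismatching word,
   or -1 if the packet arrived intact and in order. */
static long check_packet(const uint32_t *buf, size_t nwords, uint16_t pkt_seq)
{
    for (size_t i = 0; i < nwords; ++i)
        if (buf[i] != pattern_word(pkt_seq, (uint16_t)i))
            return (long)i;
    return -1;
}
```

A 256-byte NWRITE payload is 64 such words.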

    Our link is 1x/1x, 1.25 Gbps.

    sch said:
    Does the Altera core/project have enough buffers, and does it have separate, independent interfaces for the initiator and response flows?

    Good question, I'm not sure!

    We keep talking about the receive and transmit buffers; I think the FPGA transmit buffer is set to 8k. But as I understand it, the FPGA logic uses a ready signal, i.e. we don't pass data to the Altera MegaCore if it is not ready.

    Thanks,

    Matt

  • MattB said:
    I can't understand why the doorbell makes a difference!

    I now think the error occurs without the doorbell but much less often.

  • Hi,

    I am not sure, but could the problem be that a particular doorbell is lost because it is not registered at a particular clock edge? Are you using doorbells in a synchronous fashion? It could also be that the doorbell period is at the edge of the minimum needed to detect it correctly, in which case registering it might help ensure that the previous doorbell status is not lost.

    Also, there may need to be a synchronizer/phase-lock mechanism between the DSP and the FPGA FIFO so that their clock edges are synchronized. It may not be a problem for small transfers, but over a huge number of transfers the edges may drift far enough out of sync to generate random discrepancies.

    Hope some of these suggestions help.

    Regards,

    Sid

  • Hi Sid,

    Sid said:
    I am not sure, but could the problem be that a particular doorbell is lost because it is not registered at a particular clock edge? Are you using doorbells in a synchronous fashion? It could also be that the doorbell period is at the edge of the minimum needed to detect it correctly, in which case registering it might help ensure that the previous doorbell status is not lost.

    I don't think we're losing doorbells any more. That was a register synchronisation issue within the FPGA: if I asked for a doorbell (via the EMIF interface) just as the previous request was being serviced, the second request could get lost.
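This kind of lost-request race can be modelled in a few lines of C. The model is hypothetical (the real logic is in the FPGA, not C, and its exact cause is not stated here): if the servicing side takes a snapshot of the request register and then clears the whole register, a request that lands between the snapshot and the clear is lost; clearing only the bits that were serviced keeps it. In hardware the same idea is usually a sticky request bit where a new set wins over a clear in the same cycle.

```c
#include <stdint.h>

/* Hypothetical doorbell-request register: the DSP sets bit n (over EMIF)
   to ask for doorbell n, and the FPGA logic services the requests. */
typedef struct {
    uint32_t pending;
} doorbell_req_reg;

/* DSP side: request doorbell n without touching other bits. */
static void request_doorbell(doorbell_req_reg *r, unsigned n)
{
    r->pending |= (1u << n);
}

/* FPGA side, step 1: snapshot the requests to be serviced. */
static uint32_t service_begin(const doorbell_req_reg *r)
{
    return r->pending;
}

/* Racy step 2: clearing everything also discards any request that
   arrived after the snapshot was taken. */
static void service_end_clear_all(doorbell_req_reg *r, uint32_t taken)
{
    (void)taken;
    r->pending = 0u;
}

/* Safe step 2: clear only the bits that were actually serviced. */
static void service_end_clear_taken(doorbell_req_reg *r, uint32_t taken)
{
    r->pending &= ~taken;
}
```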

    Sid said:
    Also, there may need to be a synchronizer/phase-lock mechanism between the DSP and the FPGA FIFO so that their clock edges are synchronized. It may not be a problem for small transfers, but over a huge number of transfers the edges may drift far enough out of sync to generate random discrepancies.

    I don't understand what you mean!

    The FPGA logic is all driven from the SRIO clock so there should be no FPGA clock domain issues.

    But are you suggesting that the DSP's SRIO peripheral clock and the FPGA's SRIO clock have to be synchronised? I don't know anything about the design of our board or how the various clocks are generated and routed to the components, but I could talk to the board designer if I had some sensible questions!

    Thanks,

    Matt

  • Hello MattB,

    MattB said:
    we can't successfully send a burst of 50 SRIO packets followed by a doorbell from the FPGA.

    1. Who is the initiator: the FPGA or the DSP? Could you tell me the ftype/ttype fields (packet type)?

    2. Are the packets the maximum size (e.g. 256 bytes)?

    MattB said:
    Our link is 1x/1x, 1.25 Gbps.

    I'm not sure, but typically this is not a very heavy stream for modern FPGA logic, or for the DSP DMA engine.

    MattB said:
    But as I understand it, the FPGA logic uses a ready signal, i.e. we don't pass data to the Altera MegaCore if it is not ready.

    You should check it first!

    BR, Serge

  • Hi Serge,

    sch said:
    1. Who is the initiator: the FPGA or the DSP? Could you tell me the ftype/ttype fields (packet type)?

    In this example we are sending data from the FPGA to the DSP.

    We are requesting NWRITE (i.e. ftype = 5, ttype = 0100b?).
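For reference, that NWRITE encoding (ftype 5, ttype 0100b) matches the RapidIO logical-layer specification. The small C lookup table below covers the transactions mentioned in this thread; the values come from the RapidIO specification, while the type and function names are illustrative only.

```c
#include <stddef.h>
#include <string.h>

/* RapidIO transaction encodings used in this thread. ttype is only
   defined for some ftypes; it is 0 here where the packet has no ttype. */
typedef struct {
    const char *name;
    unsigned ftype;
    unsigned ttype;
} rio_txn_type;

static const rio_txn_type rio_types[] = {
    { "NREAD",    2u,  0x4u },  /* request class, ttype 0100b */
    { "NWRITE",   5u,  0x4u },  /* write class, ttype 0100b */
    { "NWRITE_R", 5u,  0x5u },  /* write with response, ttype 0101b */
    { "SWRITE",   6u,  0x0u },  /* streaming write, no ttype field */
    { "DOORBELL", 10u, 0x0u },  /* doorbell, no ttype field */
};

/* Look up a transaction by name; returns NULL if it is not in the table. */
static const rio_txn_type *rio_lookup(const char *name)
{
    for (size_t i = 0; i < sizeof rio_types / sizeof rio_types[0]; ++i)
        if (strcmp(rio_types[i].name, name) == 0)
            return &rio_types[i];
    return NULL;
}
```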

    sch said:
    2. Are the packets the maximum size (e.g. 256 bytes)?

    Yes.

    sch said:
    I'm not sure, but typically this is not a very heavy stream for modern FPGA logic, or for the DSP DMA engine.

    Yes, this is what we thought. Hence we were not expecting any problems and now we are not sure how to find out what's going wrong.

    sch said:
    You should check it first!

    Hm, yes. But we're not sure how to!

    I've checked the port error status and logical/transport layer error detect on both sides and no errors are reported.

    I tried changing the NWRITE packet priority (I started with priority 0) but that didn't make any difference.

    We are in the process of creating an Altera design that contains back-to-back SRIO MegaCores to run in simulation. Unfortunately this will take a couple of days.

    I think I might set up a similar test using an EVM6455.

    Thanks,

    Matt

    We appear to have fixed the problem of the data getting messed up! There was something wrong with the way one of the input FIFOs was being handled, something to do with the nearly-full level being set to FIFO size - 1, which did not give enough time to stop sending the data. The nearly-full level has been moved a couple of words further down the FIFO, and now it appears to work.
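A simple rule explains why this fix can work. If the producer only reacts to the nearly-full flag L cycles after it asserts, it can still write L more words, so the threshold must be at most depth - L. The cycle model below is a hypothetical sketch, not the actual FPGA design. It assumes one write per cycle and a stalled reader. With a 2-cycle reaction latency, a threshold of size - 1 overflows by one word and size - 2 does not.

```c
#include <limits.h>
#include <stdbool.h>

#define MAX_LATENCY 16u

/* Hypothetical cycle model of a FIFO with a nearly-full flag:
   - the producer writes one word per cycle while the flag it sees is low;
   - the flag is high when fill >= threshold, but the producer only sees it
     `latency` cycles later (register/pipeline delay);
   - the consumer is stalled (worst case), so the FIFO never drains.
   Returns the number of writes attempted while the FIFO was already full,
   or UINT_MAX if latency exceeds MAX_LATENCY. */
static unsigned count_overflows(unsigned depth, unsigned threshold,
                                unsigned latency, unsigned cycles)
{
    bool flag_hist[MAX_LATENCY + 1u] = { false }; /* ring buffer of past flags */
    unsigned fill = 0u, overflows = 0u;

    if (latency > MAX_LATENCY)
        return UINT_MAX;

    for (unsigned t = 0u; t < cycles; ++t) {
        flag_hist[t % (latency + 1u)] = (fill >= threshold);
        /* Flag as seen by the producer: the value from `latency` cycles ago
           (treated as low before any history exists). */
        bool seen = (t >= latency) ? flag_hist[(t - latency) % (latency + 1u)] : false;

        if (!seen) {
            if (fill < depth)
                ++fill;
            else
                ++overflows; /* this word would be dropped or corrupt the FIFO */
        }
    }
    return overflows;
}
```

In this model, moving the level down by a couple of words covers a reaction latency of up to about three cycles. That would be consistent with the fix, though the real latency in the design is not stated.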

    Thanks,

    Matt

  • Hi, Mattb,

    Was the doorbell sent by the FPGA, too? If so, could you explain how you use the LSU?

    MattB said:
    We are in the process of creating an Altera design that contains back-to-back SRIO MegaCores to run in simulation. Unfortunately this will take a couple of days.

    :-D

    BR, Serge

  • Hi Serge,

    sch said:
    Was the doorbell sent by the FPGA, too? If so, could you explain how you use the LSU?

    Er, I can try and explain but I'm not sure I understand myself!

    The Load/Store Unit is part of the SRIO peripheral on the C6455. It is used to initiate Direct I/O transfers and to send doorbells. I use LSU registers 0 to 6 to send data and doorbells to the FPGA.
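Here is a hedged C sketch of that register sequence. It assumes the following C6455 LSU layout:

- LSU_REG0/1 hold the RapidIO address.
- LSU_REG2 holds the DSP address.
- LSU_REG3 holds the byte count.
- DestID sits in the upper half of LSU_REG4.
- LSU_REG5 holds the doorbell info in its upper half and the packet type in its low byte, and the write to LSU_REG5 starts the transaction.

The packet-type byte is assumed to be (ftype << 4) | ttype. Every field position and value here is an assumption to check against the SRIO user's guide. The struct and function are hypothetical.

```c
#include <stdint.h>

/* Hypothetical view of one LSU register block, LSU_REG0..LSU_REG6. */
typedef struct {
    volatile uint32_t reg[7];
} lsu_block;

/* ASSUMED encodings; verify every value against the C6455 SRIO documentation. */
#define LSU_REG4_DESTID_SHIFT 16u
#define LSU_REG5_DRBLL_SHIFT  16u
#define LSU_PKT_NWRITE        0x54u /* (ftype 5 << 4) | ttype 0100b */
#define LSU_PKT_DOORBELL      0xA0u /* ftype 10 << 4, no ttype */

/* Program an LSU transaction: REG0..REG4 first, REG5 last, because (by
   assumption) writing REG5 starts the transfer. Completion would then be
   detected via LSU_REG6 or the completion interrupt. */
static void lsu_start(lsu_block *lsu, uint64_t rio_addr, uint32_t dsp_addr,
                      uint32_t byte_count, uint16_t dest_id,
                      uint8_t pkt_type, uint16_t drbll_info)
{
    lsu->reg[0] = (uint32_t)(rio_addr >> 32);   /* RapidIO address, upper 32 bits */
    lsu->reg[1] = (uint32_t)rio_addr;           /* RapidIO address, lower 32 bits */
    lsu->reg[2] = dsp_addr;                     /* local DSP buffer address */
    lsu->reg[3] = byte_count;                   /* transfer size (assumed unused for doorbells) */
    lsu->reg[4] = (uint32_t)dest_id << LSU_REG4_DESTID_SHIFT;
    lsu->reg[5] = ((uint32_t)drbll_info << LSU_REG5_DRBLL_SHIFT) | pkt_type;
}
```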

    I'm not sure which block within the SRIO peripheral receives doorbells but I use the doorbell routing registers to configure an interrupt and the interrupt handler uses the doorbell interrupt condition status register to decode which doorbell arrived.
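The handler's decode step can be sketched in C as below. This assumes 16 doorbell bits per interrupt condition status register (to be verified for the C6455) and that handled bits are cleared by writing a mask to the matching clear register. `decode_doorbells` is a hypothetical helper. Clearing only the bits that were handled, rather than all of them, avoids losing a doorbell that arrives while the handler is running.

```c
#include <stdint.h>

#define DOORBELL_BITS 16u /* ASSUMED number of doorbell bits per status register */

/* Hypothetical helper: list the doorbell numbers set in a status value,
   lowest first, in out[], and build the mask of handled bits that the
   caller would then write to the clear register. Returns the count. */
static unsigned decode_doorbells(uint32_t icsr, unsigned out[DOORBELL_BITS],
                                 uint32_t *handled_mask)
{
    unsigned n = 0u;
    *handled_mask = 0u;
    for (unsigned bit = 0u; bit < DOORBELL_BITS; ++bit) {
        if (icsr & (1u << bit)) {
            out[n++] = bit;
            *handled_mask |= (1u << bit);
        }
    }
    return n;
}
```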

    I use the linker to set aside areas of DSP memory that can receive data from the FPGA and the FPGA has configuration registers which I use to tell it where to send the data. The FPGA sends data using NWRITE transactions and then sends a doorbell.

    The Altera SRIO MegaCore has some similarities with the C6455 SRIO peripheral but the Altera terminology is different and it doesn't have anything called 'LSU'.

    The Altera SRIO MegaCore can be used to configure a memory-mapped interface, and somehow we managed to configure one that allows the FPGA and C6455 to exchange Direct I/O transfers!

    Matt

  • Hi, Matt,

    I'm sorry for the unclear question; of course, the LSU is the DSP engine. Thanks for your kind explanation of how you use it in general.

    I think the important things are the transfers (direction, size, packet type), their order, and the synchronization events that occur (like DSP interrupts and status lines/bits in the FPGA project; I assume you control the FPGA project through some interface like PCIe?). At the moment I understand it as follows (sorry for my English):

    1. FPGA sends 50 NWRITE packets followed by a doorbell.

    2. DSP gets the data in its memory and gets an interrupt when it receives the doorbell.

    3. DSP sends some data and then a doorbell to the FPGA.

    4. Go to step 1 (or something like that; during the second iteration there was corrupted data in DSP memory).

    BR, Serge

  • Hi Serge,

    sch said:
    sorry for my English

    Please don't apologise for your English! My English is pretty poor and it is the only language I speak!

    sch said:
    1. FPGA sends 50 NWRITE packets followed by a doorbell.

    2. DSP gets the data in its memory and gets an interrupt when it receives the doorbell.

    3. DSP sends some data and then a doorbell to the FPGA.

    4. Go to step 1 (or something like that; during the second iteration there was corrupted data in DSP memory).

    That's one of the things we're doing. Steps 1, 2 and 3 get performed and then we wait for the next video field and do it again.

    Step 1 was failing to send the data correctly but now the FPGA FIFO handling has been corrected the data appears to be correct when it arrives at the DSP in step 2.

    The amount of data sent from DSP to FPGA in step 3 is much smaller (2 or 3 packets) and it is difficult to be sure but so far it appears okay.

    We are also continually transferring raw video from FPGA to DSP but the data rate is much slower so there is a delay between packets.
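Considering only the per-field exchange, the DSP side of steps 1 to 3 can be sketched as a small state machine. The raw-video doorbells are ignored here. This is a hypothetical illustration, not the code running on this board. Any event that arrives out of order moves to a sticky error state, so ordering problems become visible instead of a doorbell being silently dropped.

```c
/* DSP-side states for one video field (hypothetical). */
typedef enum {
    WAIT_FPGA_DOORBELL, /* steps 1-2: FPGA NWRITEs the data, then rings a doorbell */
    SEND_TO_FPGA,       /* step 3: DSP sends its packets and a doorbell via the LSU */
    PROTOCOL_ERROR      /* sticky: an event arrived in the wrong state */
} field_state;

typedef enum {
    EV_FPGA_DOORBELL,   /* "field data ready" doorbell received from the FPGA */
    EV_LSU_DONE         /* the DSP's final doorbell to the FPGA has completed */
} field_event;

static field_state field_next(field_state s, field_event e)
{
    switch (s) {
    case WAIT_FPGA_DOORBELL:
        return (e == EV_FPGA_DOORBELL) ? SEND_TO_FPGA : PROTOCOL_ERROR;
    case SEND_TO_FPGA:
        return (e == EV_LSU_DONE) ? WAIT_FPGA_DOORBELL : PROTOCOL_ERROR;
    default:
        return PROTOCOL_ERROR;
    }
}
```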

    Matt