K2H SRIO throughput for same/different SRCID packets

We use the K2H SRIO to connect two FPGAs, configured as 5 Gbps per lane, two lanes per port, and two ports (one per FPGA).
Each FPGA transmits DirectIO packets (SWRITE) to the K2H, and the final destination of the packets
is DDR3A memory on the K2H.

Based on the application note and another post, we expect an overall SRIO throughput
of about 12 Gbps, which is the sum of the throughput on both ports.
- e2e.ti.com/.../322606
- Throughput Performance Guide for C66x KeyStone Devices [SPRABK5A1]

We observed following results:
- If both FPGAs transmit DirectIO packets with different SRCIDs, throughput reaches about 12 Gbps.
- If both FPGAs transmit DirectIO packets with the same SRCID, throughput reaches about 8-10 Gbps.

Questions:
1) Is this result reasonable? If yes, why?
2) Are there any configuration registers to increase the throughput for same SRCID case?
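
For context, the reported figures can be compared with the encoded link capacity. The following is a rough sketch, assuming 8b/10b line encoding on 5 Gbaud lanes and ignoring packet header and control-symbol overhead. The function and variable names are illustrative only and do not come from the thread.

```python
def link_data_rate_gbps(lanes: int, baud_gbps: float,
                        encoding_efficiency: float = 0.8) -> float:
    """Data rate after line encoding, before any packet overhead.

    encoding_efficiency=0.8 models 8b/10b encoding (an assumption here).
    """
    return lanes * baud_gbps * encoding_efficiency


def utilization(measured_gbps: float, capacity_gbps: float) -> float:
    """Fraction of the encoded capacity that the measured rate represents."""
    return measured_gbps / capacity_gbps


# Two x2 ports at 5 Gbaud per lane, as described in the post.
per_port = link_data_rate_gbps(lanes=2, baud_gbps=5.0)
total = 2 * per_port

print(f"capacity per port: {per_port:.1f} Gbps, total: {total:.1f} Gbps")
for label, measured in [("different SRCID", 12.0),
                        ("same SRCID (low)", 8.0),
                        ("same SRCID (high)", 10.0)]:
    print(f"{label}: {utilization(measured, total):.1%} of encoded capacity")
```

Under these assumptions the encoded capacity is 16 Gbps, so 12 Gbps corresponds to 75% of it and 8-10 Gbps to roughly 50-63%.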

  • Hi RY-san,
    Please refer below post from "Travis",
    e2e.ti.com/.../1398127
    Thank you.
  • Hi, Rajasekaran-san,

    Thanks for pointing the other post.

    Our understanding based on the post you mentioned is as follows:
    - K2H SRIO processes same SRCID packets, from multiple ports, sequentially.
    - As a result, our observed throughput degradation has occurred.
    If so, is there any method to increase the throughput,
    e.g. register settings or something similar?

    Best regards, RY

  • Hi RY-san,

    "If a read/write request comes in with a same SrcID, same DestID, and same or lower Priority as any pending transaction (pending completion status from DMA), then the MAU is stalled at this point to wait until those outstanding transactions (of same IDs, =< pri) are completed."

    We would like to clarify this description.

    If the SRIO peripheral continuously receives posted-write packets (SWRITE or NWRITE) with the same SrcID/DestID/Priority, does the MAU stall until the previous posted-write request reaches the final destination memory (such as DDR)? If yes, we think such a mechanism impacts throughput negatively.

    ANSWER:  Correct, this is the functionality that was built into the SRIO peripheral in order to maintain SRIO priority ordering on the chip.  What you have to remember is that the DMA bandwidth is very large, so the effects on the SRIO throughput are not huge.  If you take a look at the measured SRIO throughput performance in www.ti.com/.../sprabk5a.pdf, the numbers reflect the performance while using the same SRCID/DESTID on posted transactions.

    Note:

    I am not an expert on this, but I would like to highlight the above post from Travis about throughput. I have asked the experts to look into this thread. Thank you for your patience.
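
The stall condition quoted above can be written as a small predicate. This is only one reading of the quoted rule, not a model of the actual hardware: the `Transaction` type, the `must_stall` name, and the convention that a larger number means a higher priority are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Transaction:
    src_id: int
    dest_id: int
    priority: int  # assumption: larger value means higher priority


def must_stall(request: Transaction, pending: Iterable[Transaction]) -> bool:
    """Return True if the quoted rule would stall `request`.

    The request stalls when any pending transaction has the same SrcID and
    DestID and the request's priority is the same as or lower than that
    pending transaction's priority.
    """
    return any(
        p.src_id == request.src_id
        and p.dest_id == request.dest_id
        and request.priority <= p.priority
        for p in pending
    )


# One outstanding write from SrcID 0x10 to DestID 0x01 at priority 1.
pending = [Transaction(src_id=0x10, dest_id=0x01, priority=1)]

same_id = must_stall(Transaction(0x10, 0x01, 1), pending)       # same IDs, same priority
other_id = must_stall(Transaction(0x11, 0x01, 1), pending)      # different SrcID
higher_pri = must_stall(Transaction(0x10, 0x01, 2), pending)    # same IDs, higher priority
print(same_id, other_id, higher_pri)
```

In this reading, two FPGAs that share one SRCID contend for the same ordering queue, while distinct SRCIDs never match each other's pending transactions.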

  • Raja-san,

    Thanks for your comments.

    We think the mechanism you mentioned applies not only to x2 lane mode.
    The x4 lane mode seems to reach 12 Gbps in the other post, possibly with the same SRCID;
    however, our results show that the x2 lane, two-port configuration does not reach it.
    So we assume that there are other mechanisms that decrease the throughput in the two-port configuration.

    We are waiting for an additional explanation and a patch to increase the throughput.

    Best regards, RY
  • Hi RY,

    To my understanding, two things affect SRIO throughput when the same SRCID is used: scheduling and error handling.

    If the SRCID is already being used for a non-posted transaction and another non-posted transaction comes from the same SRCID, the request is suspended until the previous transaction for that SRCID is complete.

    When the software decides to flush a transaction, it writes to the flush bit for the LSU. All transactions originating from the same SRCID that are present in the shadow registers for that LSU are flushed. The flush can take more than one cycle.

    Thanks,

  • Ganapathi,

    Your explanation seems to be focused on non-posted transactions generated by K2H, since it mentions LSU operations.

    Our use case, however, is posted transactions generated by the FPGAs. To my understanding, the MAU handles this type of transaction.

    Could you explain this for our use case?

    Best regards,

    RY

  • Could you please reply?
    Best regards, RY
  • Are you using max size packets, i.e. 256B? If so, there really isn't going to be a lot you can do other than use different srcIDs from the two FPGAs. Is there a reason that you are trying to avoid doing this? I'm surprised that you are seeing that much degradation, but the srcID is the only change when you see 12-->10Gbps??

    This srcID limitation on the DMA transfers is in HW; there is no way to work around this specific implementation, though I'm very surprised you are seeing this.

    Regards,
    Travis
  • Travis-san,

    Yes, we use 256B packets, and the difference between the 12 Gbps and 10 Gbps results is caused by the SRCID change only.

    The reason we use the same SRCID is so that packets from both FPGAs are treated as having supervisor privilege on the K2H VBUS.
    Since the SUPRVSR_ID register in the K2H SRIO, which is used to grant supervisor privilege, can hold only one SRCID,
    we use the same SRCID on both FPGAs. If there is another way to grant supervisor privilege to both FPGAs, that would be a solution.

    Best regards, RY

    Understood.  Unfortunately, you are correct: in order to assign VBUS supervisor privilege, the SRCID must be set to the SUPRVSR_ID register value.  There is no other way.  What area of memory are you trying to access that needs supervisor permission?  Could you use another area of memory?

    I will discuss your situation with the SRIO designers and see if there is anything they can think of here.


    Regards,

    Travis

  • I spoke with the designer, and there really isn't any alternative here. Do you see the same degradation in throughput if both FPGA SRCIDs are the same but not equal to the SUPRVSR_ID value? Just curious.

    Regards,
    Travis
  • Travis-san,

    Thanks for your confirmation. That is an unfortunate answer for us...

    We need supervisor privilege to access the MSMC configuration registers.
    In our application, most packets access DDR3 memory and a few packets access MSMC registers.
    So we decided to change the SRCID dynamically, i.e. DDR3 access packets use a different SRCID for each
    FPGA, and MSMC register access packets use the same SRCID, equal to SUPRVSR_ID, from both FPGAs.

    We have no throughput data for the case where both FPGA SRCIDs are the same but not equal to the SUPRVSR_ID value.

    Best regards, RY
  • If the majority of the traffic is DDR, then it doesn't seem like a big issue.  Does the FPGA only support one SRCID?  If so, can it be changed dynamically during operation, so that most of the time it uses a unique ID, and only when accessing the MSMC registers does it use the SUPRVSR_ID?


    Regards,

    Travis

  • Travis-san,

    Yes, we have implemented such a dynamic SRCID change mechanism.
    The results show that this mechanism avoids the throughput degradation.

    Best regards, RY
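
The dynamic SRCID scheme described in this exchange can be sketched as a per-packet selection on the sending side. This is a minimal illustration, not the poster's FPGA logic: every numeric value below (the SUPRVSR_ID value, the per-FPGA IDs, the MSMC configuration window, and the example DDR3 address) is a placeholder assumption and should be checked against the device data manual and the actual system configuration.

```python
# All values are hypothetical placeholders for illustration.
SUPRVSR_ID = 0x00FF                                  # value programmed into SUPRVSR_ID (assumed)
UNIQUE_SRCID = {"fpga0": 0x0010, "fpga1": 0x0011}    # distinct per-FPGA IDs (assumed)
MSMC_CFG_BASE = 0x0BC00000                           # MSMC configuration window base (assumed)
MSMC_CFG_SIZE = 0x00100000                           # MSMC configuration window size (assumed)


def select_srcid(fpga: str, dest_addr: int) -> int:
    """Pick the SRCID for one packet based on its destination address.

    Packets that target the MSMC configuration window use the shared
    supervisor SRCID; all other packets (for example DDR3 traffic) use the
    FPGA's own unique SRCID so the two ports do not share an ordering queue.
    """
    in_msmc_cfg = MSMC_CFG_BASE <= dest_addr < MSMC_CFG_BASE + MSMC_CFG_SIZE
    return SUPRVSR_ID if in_msmc_cfg else UNIQUE_SRCID[fpga]


DDR3_EXAMPLE_ADDR = 0x80000000   # example DDR3 destination (assumed)

for fpga in ("fpga0", "fpga1"):
    ddr_id = select_srcid(fpga, DDR3_EXAMPLE_ADDR)
    msmc_id = select_srcid(fpga, MSMC_CFG_BASE + 0x10)
    print(f"{fpga}: DDR3 SRCID=0x{ddr_id:04X}, MSMC config SRCID=0x{msmc_id:04X}")
```

Since only a few packets target the MSMC registers, the shared SRCID is used rarely and the bulk DDR3 traffic keeps distinct SRCIDs.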

  • Excellent.