C6678 congestion control and transaction restart semantics

Michael P

Hi,

In my test system, a C6678 DSP sends SRIO SWRITEs to an FPGA, with a Byte_Count of several kilobytes per operation. The FPGA implements a FIFO at the destination region, and sends Type 7 XOFF and XON packets for flow control at the FIFO's high and low watermarks. After the FPGA sends an XOFF, we see the DSP send the first SRIO packet of the next LSU operation (which is fine). When the DSP software sees completion code 2 ("Packet not sent due to flow control blockade (Xoff)"), it writes the value 2 to LSUn_REG6 (Restart field = 1). After the FPGA sends XON, and presumably after the DSP triggers the Restart, we see the DSP perform the entire LSU operation -- including the first write. Because the FPGA treats the destination region like a FIFO, this causes the first 256 bytes of data to be written into the FIFO twice.

Is this kind of retransmission the expected behavior from TI's perspective? If so, is there any way to avoid it, or a bound on how many packets might be retransmitted in this scenario?

SPRUGW1B is not clear whether the logic in Figure 2-12 (selection of an LSU for the next transmission) occurs per LSU operation or per SRIO packet. If the former, I would have expected an entire LSU operation to complete before congestion control takes effect. If the latter, I would have expected behavior like the C6474's, where DSP address, remote address and Byte_Count fields in the LSU are updated as SRIO packets are sent (although I have not experimented with congestion control on a C6474). The DSP and FPGA here can implement a scheme to work around these retransmissions, but I want to make sure we understand the DSP's behavior so our workaround is complete.

Michael

over 10 years ago

0 Ganapathi Dhandapani95 over 10 years ago

TI__Mastermind 28085 points

Hi,

I will check with my team and get back to you.

Thanks,

0 Michael P over 10 years ago in reply to Ganapathi Dhandapani95

Expert 1810 points

I think at least part of what I was seeing was an error in my code. My IRQ handler for LSU completions would write to LSU_REG6's FLUSH bit, and then the application code would write to the RESTART bit. I can see how that would cause the same data to show up twice, but I would expect that both transmissions would be after the DSP received the XON message.

But there is something else going on that I do not understand. After I removed the flush, it looks like several DIO operations in a row all get completion code 2. For example:

1. My code issues an SWRITE to the first LSU, and gets LTID 1.
2. It gets an LSU interrupt; LSU_STAT_REG0 indicates that LTID 1 stopped with CC=2 (with matching LSU Context Bit).
3. It writes LSU0_REG6.RESTART=1.
4. It issues another SWRITE to the first LSU, and gets LTID 2.
5. It gets an LSU interrupt; LSU_STAT_REG0 indicates that LTID 1 and 2 are both stopped with CC=2.
6. Steps 3-5 might repeat with additional LTIDs, depending on how long it takes the FPGA to send XON, with LSU_STAT_REG0 showing all of them stopped with CC=2.
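
For reference, the status check in steps 2 and 5 can be sketched roughly as below. This is only an illustration, not TI's API. It assumes each LTID occupies a 4-bit field in the status word, with the 3-bit completion code in the upper bits and the LSU context bit in the lowest bit. That layout, the `slot` index, and the names `lsu_status_field` and `lsu_stopped_on_xoff` are all assumptions to be checked against SPRUGW1.

```c
#include <stdint.h>

/* ASSUMED layout (verify against SPRUGW1): 4 bits per LTID,
 * bits [3:1] = completion code, bit [0] = LSU context bit. */
#define LSU_CC_OK    0u  /* no error reported */
#define LSU_CC_XOFF  2u  /* packet not sent due to flow control blockade (Xoff) */

typedef struct {
    uint8_t cc;       /* 3-bit completion code */
    uint8_t context;  /* LSU context bit */
} lsu_status_t;

/* Extract the 4-bit status field at position `slot` (0..7) of a raw status
 * word. `slot` is a hypothetical index; how LTIDs map to fields is device
 * specific and not modeled here. */
static lsu_status_t lsu_status_field(uint32_t stat_word, unsigned slot)
{
    uint32_t nibble = (stat_word >> (4u * slot)) & 0xFu;
    lsu_status_t s;
    s.cc = (uint8_t)(nibble >> 1);
    s.context = (uint8_t)(nibble & 1u);
    return s;
}

/* Nonzero when the field reports an Xoff stop and its context bit matches
 * the one the software recorded when it submitted the transaction. */
static int lsu_stopped_on_xoff(uint32_t stat_word, unsigned slot,
                               unsigned expected_context)
{
    lsu_status_t s = lsu_status_field(stat_word, slot);
    return s.cc == LSU_CC_XOFF && s.context == (expected_context & 1u);
}
```

Checking the context bit along with the completion code is what lets the handler tell a fresh CC=2 apart from a stale one left by an earlier transaction that used the same LTID.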

The FPGA also sees "gaps" corresponding to entire DIO operations between its XOFF and XON transmissions.

Is it valid to restart a transfer that has Completion Code equal to 2? SPRUGW1B says "When the CPU issues a flush or restart command (described below) for an LSU, the context specific Completion Code bits for that LSU are reset back to '000'." I am not sure how to square that with the read from LSU_STAT_REG0 in step 5, which shows consecutive LTIDs failed due to XOFF.

Michael

0 Michael P over 10 years ago in reply to Michael P

Expert 1810 points

Can anyone tell me whether it is valid to restart a transfer that has Completion Code equal to 2? The documentation suggests (to me) that it is, but the behavior I see suggests otherwise.

0 tscheck over 10 years ago in reply to Michael P

TI__Mastermind 23525 points

Michael,

A couple of things to note. After an error condition like CC=2, you are doing the right thing by having the CPU jump-start the LSU again. The only real difference between RESTART and FLUSH is whether only the current transaction, or all transactions in the shadow registers from the same SRCID, are terminated. In either case the current transaction is killed, so the CPU will have to reprogram a shadow register to cause the transaction to resend. You should be able to have the CPU read LSU_REG3 to determine the number of bytes/packets already sent, but you must read this before issuing the RESTART/FLUSH command. This will be similar to the C64x implementation, as you noted.

The Xon/Xoff status is checked before sending each packet, so what you will run into is a race condition between when the actual Xoff/Xon packets are received on the DSP and how many packets get sent before an Xoff is logged. For example, say you are sending a 4KB transaction: the LSU will try to create and send the 16 TX packets to the physical layer for transmission as fast as possible, so you could, for instance, get 5 packets sent to the physical layer before the Xoff stops the LSU from sending the 6th packet. The good news is that the LSU will only show 5 packets of data sent via LSU_REG3, and the packets in the physical layer will be sent, so when the LSU is reprogrammed after restart/flush, you only have to program the LSU to send the 11 remaining packets.

Depending on your use case and the interactions between LSU transactions, you will need to decide on RESTART vs FLUSH.
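
The arithmetic in the 4KB example could be sketched as follows. This is a minimal illustration and not a driver: it assumes every packet already sent carried a full 256-byte payload, and that the packet count has already been obtained from LSU_REG3 before the RESTART/FLUSH write. How that count is encoded in the register is not covered here. The `dio_xfer_t` type and `dio_remaining` function are hypothetical names.

```c
#include <stdint.h>

#define SRIO_MAX_PAYLOAD 256u  /* bytes per packet, as in the 4KB / 16-packet example */

/* Hypothetical description of one DIO transfer. */
typedef struct {
    uint32_t dsp_addr;     /* local source address */
    uint32_t remote_addr;  /* destination address on the remote device */
    uint32_t byte_count;   /* bytes to send */
} dio_xfer_t;

/* Given the original transfer and the number of packets the LSU reported as
 * sent, return the transfer that covers only the unsent remainder.
 * ASSUMPTION: each sent packet carried SRIO_MAX_PAYLOAD bytes. */
static dio_xfer_t dio_remaining(dio_xfer_t orig, uint32_t packets_sent)
{
    uint32_t sent = packets_sent * SRIO_MAX_PAYLOAD;
    dio_xfer_t rest;

    if (sent > orig.byte_count) {
        sent = orig.byte_count;  /* clamp: nothing left to send */
    }
    rest.dsp_addr = orig.dsp_addr + sent;
    rest.remote_addr = orig.remote_addr + sent;
    rest.byte_count = orig.byte_count - sent;
    return rest;
}
```

With a 4096-byte transfer and 5 packets reported as sent, the remainder is 2816 bytes, i.e. the 11 packets mentioned above. If the destination is a FIFO-style region rather than linear memory, the remote address may need to stay fixed instead of advancing.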

Regards,

Travis

0 Michael P over 10 years ago in reply to tscheck

Expert 1810 points

Travis,

Thank you for the details on how the software should interact with the hardware -- it really helps clarify my mental model of how things are supposed to happen.

I think SPRUGW1 is quite misleading on this point, though, because its description of the RESTART bit (in Table 2-7 and Table 3-48) is: "If an LSU is frozen on an error condition, a write of '1' to this bit will restart the LSU from the transaction where it was working on before the error condition occurred." It sounds like a clearer description of its effect would be "[...] restart the LSU from the transaction after the one that detected the error."

Michael

0 tscheck over 10 years ago in reply to Michael P

TI__Mastermind 23525 points

I can see your confusion. Section 2.3.2.7 does a better job describing the whole error handling and is clearer on the point above:
- The software fixes the issue (for e.g. enables the port that is XOffed), and sets a restart bit for the specific LSU. The LSU will terminate the current transaction and load the next set of shadow registers.
- The software decides to flush the transaction. It writes to the flush bit for the LSU. All transactions originating from the same SRCID that are present in the shadow registers for the specific LSU will be flushed. This can take more than one cycle to do the flush.
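
As a way to picture the two options, here is a toy software model of the shadow-register queue. It only restates the behavior described in this thread and does not touch hardware. The queue depth, the `lsu_model_t` type, and the function names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define SHADOW_DEPTH 8  /* arbitrary depth chosen for the model */

typedef struct {
    uint8_t  ltid;
    uint16_t srcid;
} shadow_entry_t;

typedef struct {
    shadow_entry_t q[SHADOW_DEPTH];  /* q[0] models the current transaction */
    size_t count;
} lsu_model_t;

/* RESTART: the current transaction is terminated and the next set of
 * shadow registers becomes the current transaction. */
static void model_restart(lsu_model_t *lsu)
{
    size_t i;
    if (lsu->count == 0) {
        return;
    }
    for (i = 1; i < lsu->count; i++) {
        lsu->q[i - 1] = lsu->q[i];
    }
    lsu->count--;
}

/* FLUSH: the current transaction and every queued transaction from the
 * same SRCID are dropped; entries from other SRCIDs are kept in order. */
static void model_flush(lsu_model_t *lsu)
{
    size_t i, kept = 0;
    uint16_t srcid;
    if (lsu->count == 0) {
        return;
    }
    srcid = lsu->q[0].srcid;
    for (i = 1; i < lsu->count; i++) {
        if (lsu->q[i].srcid != srcid) {
            lsu->q[kept++] = lsu->q[i];
        }
    }
    lsu->count = kept;
}
```

In both cases the transaction that hit the Xoff is gone from the queue afterwards, which is why the software has to reprogram it if the data still needs to be sent.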
