Serial RapidIO Streaming Synchronisation

jeremie veyret

Hi,

I need some advice on the best way to implement SRIO streaming synchronisation.

I have an FPGA streaming data directly into DSP memory (C6472) using SWRITE (ftype 6), and I need the DSP to use this data. What's the best way for the DSP to be notified that a data block is available?

Can the DSP monitor this transfer from external device and be notified when it's completed?

Should the FPGA do an additional operation to signal that it has finished writing the data, and can we be sure the data has been written into memory at that stage?

The ideas we have had so far are:

External interrupt line but there's no way to know the packets have reached destination.

Do a subsequent write to DSP registers to trigger an interrupt.

Doorbell interrupt, that's another packet type to add but what about ordering.

From experience, maybe doing a read after a write ensures write packets reached destination but that's not ideal for data flow.

Anyway the best way must be the same with two DSPs exchanging data over srio so I'm sure this has been dealt with before.

Thanks for your advice,

Jeremie Veyret

over 13 years ago

0 Peter Robertson over 13 years ago

Expert 2770 points

Hi Jeremie,

This is the problem with mechanisms like SRIO which have two fundamental issues:

1: Transfers have to use external mechanisms (essentially a side channel) to know that a transmitter is permitted to write to a receiver, so what looks like an efficient mechanism has to be degraded to make it work; and

2: A transmitter has to have dangerous intimate knowledge of the receiver (such as its memory addresses or the messaging structures it has set up). For large systems, because one processor can randomly write to another's memory, there are serious problems in debugging memory corruption errors.

I shall write to you separately about how Diamond works round these problems.

0 Evan52341 over 13 years ago in reply to Peter Robertson

Prodigy 160 points

Hi Peter - I think this is a little unfair on SRIO. Transmit handshaking is handled by control symbols; if the receiver can't accept a packet, it sends back a PNA, and the transmitter retries at a later time. No side channels are required, unless you have a requirement for one higher up the comms stack, but that's not a SRIO issue. Yes, the transmitter can write directly to a known address in the receiver, but how that's handled is up to the receiver. TI chooses to DMA directly to the requested address, but this could alternatively have been handled by a device driver which decided whether or not the transfer should be allowed, whether address mapping was required, and so on.

Jeremie's problem is not SRIO, but TI's implementation of SRIO (or, more likely, the fact that the SRIO manual (sprue13j) appears to be unfinished). Incoming DirectIO is not properly documented. In principle, an SWRITE is received in the MAU, the MAU sets up and initiates a DMA, the DMA completes and interrupts a user thread on the DSP, telling the user that X bytes have arrived at address Y. It should be easy, and this would allow proper incoming streaming operation.

The problem is that I can't find any documentation or example code for this. 2.3.3.3/p44 is meant to document "Direct IO RX Operation", but it doesn't - it actually documents the response to Direct I/O transmit ops, ie. type 13 packets. No mention of incoming SWRITEs, or even of the MAU.

To make matters worse, none of the Interrupt Condition Routing Registers, or the Interrupt Status Decode registers, even mention the MAU. They only refer to the Doorbells, RXU, TXU, LSU, and the err/rst/special events. So, at first sight, it's impossible to get an interrupt out of the MAU when an incoming SWRITE arrives. Or have I just missed something? Or does the MAU share CPPI interrupts? How? My reading of the manual is that TI's SRIO core did not initially support incoming SWRITE streaming, and someone added this to the manual at a later date, as an afterthought, without completing it.

So, if anyone from TI is reading this: I have an incoming SWRITE. The packet data has turned up in the correct location in DSP memory. I know it's there because, if I wait a bit, I can see it. However, I can't put a "wait a bit" in a real streaming application. How do I get an interrupt to tell me it's there? How do I find out where it was written to, and how much was written?

Jeremie - you could also try the Direct I/O library at http://processors.wiki.ti.com/index.php/DIO_Library. I haven't looked at it yet.

0 Peter Robertson over 13 years ago in reply to Evan52341

Expert 2770 points

"if the receiver can't accept a packet, it sends back a PNA, and the transmitter retries at a later time."

This is exactly what should be avoided at all costs; active waits are not a good solution to the problem of transferring data from one place to another. Consider a transmitter that is ready to send, but the receiver is not going to be ready to accept the data for a significant time (a very common circumstance in my customers' applications). The transmitter will have to keep retrying at this undefined 'later time'. How long does it wait? Too long and the other end will be held up waiting for the data when they are actually needed; too short and the wasted time actively retrying goes up. The only requirement is for efficiency, and the SRIO mechanism you describe doesn't give that.

"TI chooses to DMA directly to the requested address"

The actual movement of data is not the issue. However the transfer is done, the transmitter still is forced to make assumptions about how the receiver is handling its memory, something that should be no business of the transmitter. It's exactly the same issue as one function assuming the data structures used locally inside another function, something competent software developers abandoned decades ago.

I agree with you that TI's documentation is inadequate, but this is common. Writers seem to be more interested in simply dumping all the information (or showing how clever they are) rather than organising it in a useful way. As an example, just try assuming you're unfamiliar with the C6000 and work out how to get a device to interrupt a C6678. You end up chasing your tail through several separate documents, each of which assumes you know everything about the others and occasionally makes arbitrary changes to the names used elsewhere to refer to entities. All it would take is a one-page diagram showing how interrupts work and explicitly referencing the document that describes each stage. I make such diagrams as they're the only way my small brain can understand what's going on.

To be fair, I find TI's documentation, annoying as it is, usually to be better than most.

0 Evan52341 over 13 years ago in reply to Peter Robertson

Prodigy 160 points

Peter Robertson said:

"if the receiver can't accept a packet, it sends back a PNA, and the transmitter retries at a later time."

This is exactly what should be avoided at all costs; active waits are not a good solution to the problem of transferring data from one place to another. Consider a transmitter that is ready to send, but the receiver is not going to be ready to accept the data for a significant time (a very common circumstance in my customers' applications). The transmitter will have to keep retrying at this undefined 'later time'. How long does it wait? Too long and the other end will be held up waiting for the data when they are actually needed; too short and the wasted time actively retrying goes up. The only requirement is for efficiency, and the SRIO mechanism you describe doesn't give that.

But it's not actually an "active wait", unless we're using different terminology. The receiver sends back a PNA control symbol if there's an error in the packet, or the packet can't be accepted, or whatever. The transmitter doesn't have to wait for this symbol to be returned; it can immediately and continuously send another packet - in other words, it's pipelined. All the transmitter has to do is to manage its buffering so that it can re-use a tx buffer when it receives a Packet-Accepted symbol with the correct AckID for that buffer. This is zero-overhead streaming, with no side-channel, using the otherwise unused return path for control symbols. It can't be done any better in a system which guarantees packet delivery. You haven't suggested an alternative, but any other alternative has to be fire-and-forget, which has any number of other problems.

And, in streaming applications, which is what this thread is about, the system design must guarantee that the receiver is going to be ready. If you run out of buffering, then the system design is broken. I really can't see that SRIO is in any way "inefficient" for SWRITE-based streaming apps, apart from the obvious issues to do with encoding and packet overhead.

Peter Robertson said:

"TI chooses to DMA directly to the requested address"

The actual movement of data is not the issue. However the transfer is done, the transmitter still is forced to make assumptions about how the receiver is handling its memory, something that should be no business of the transmitter. It's exactly the same issue as one function assuming the data structures used locally inside another function, something competent software developers abandoned decades ago.

I don't see this, and you'd need to give me a specific example to convince me. You're presumably not referring to the fact that the transmitter puts a destination address in an SWRITE, since you seem to be saying that that's not the issue. I don't make any assumptions about the receiver in my Tx hardware (apart from inserting a destination address in a circular buffer); I just send numbered packets. Either the receiver can handle continuous SWRITEs, or it can't, in which case the receiver TI DSP code has to be rewritten.

This is of course not addressing Jeremie's original question, which is how the receiver thread on the DSP actually knows that a packet has been received. Unfortunately, with TI's limited bandwidth, I suspect that they may ignore questions which appear to have answers to them.

0 Peter Robertson over 13 years ago in reply to Evan52341

Expert 2770 points

Although the particular case that Jeremie mentioned involved streaming, I have been working with him on a more generalised non-streaming data transfer mechanism; that was what I was talking about.

"You're presumably not referring to the fact that the transmitter puts a destination address in an SWRITE, since you seem to be saying that that's not the issue."

That is precisely the issue.

"The receiver sends back a PNA control symbol if there's an error in the packet, or the packet can't be accepted".

You mention your "Tx hardware". Is that how you manage to send these control symbols? How does s/w on a C6000-C6000 connection arrange for this to happen?

"This is of course not addressing Jeremie's original question, which is how the receiver thread on the DSP actually knows that a packet has been received."

The receiver is completely passive and is only notified when the transmitter explicitly pings it to show that the data have been sent. It also begs the question about your PNA symbol suggestion. How can a receiver that doesn't know that a transfer into its memory is being initiated by a remote transmitter know to say no before anything is transmitted and memory is potentially corrupted?

0 Evan52341 over 13 years ago in reply to Peter Robertson

Prodigy 160 points

You mention your "Tx hardware". Is that how you manage to send these control symbols? How does s/w on a C6000-C6000 connection arrange for this to happen?

"My" hardware (and Jeremie's, as it happens) is an FPGA. I talk to the Xilinx SRIO core to send SWRITE packets. The Xilinx core handles control symbols, so they're not exposed at my level. Control Symbol handling is defined in Part 4 of the 'RapidIO Interconnect Specification'. A couple of quick extracts:

The RapidIO Part 4: 8/16 LP-LVDS Physical Layer Specification defines an exchange of packet and acknowledgment control symbols in which a destination or intermediate processing element (such as a switch) acknowledges receipt of a request or response packet from a source. If a packet cannot be accepted for any reason, an acknowledgment control symbol indicates that the original packet and any already transmitted subsequent packets should be resent. This behavior provides a flow control and transaction ordering mechanism between processing elements.

...

An end point device shall transmit an acknowledge control symbol for a request before the response transaction corresponding to that request.

...

A packet requires an identifier to uniquely identify its acknowledgment. This identifier, known as the acknowledge ID (or ackID), is three bits, allowing for a range of one to eight outstanding unacknowledged request or response packets between adjacent processing elements, however only up to seven outstanding unacknowledged packets are allowed at any one time.

This is also all under the hood on the DSP; the TI hardware can't do anything without control symbols. Two control symbols delimit an SWRITE on the wire, for example, which is why it's not necessary to put a byte count in the packet. You do get some access to control symbols on TI's core; you may need to do this to re-synchronise the identifiers covered above, for example, if something goes wrong.
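The ackID bookkeeping described in the extracts above can be sketched in C. This is only an illustration of the idea (3-bit identifiers, at most seven packets outstanding, buffers released on packet-accepted, resend on packet-not-accepted). In practice this logic lives inside the SRIO core, not in user code; the type and function names here are hypothetical, the sketch assumes acknowledgments arrive in ackID order, and real physical-layer error recovery involves more steps than shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ACKID_COUNT     8u                  /* 3-bit ackID: values 0..7 */
#define MAX_OUTSTANDING (ACKID_COUNT - 1u)  /* at most 7 unacknowledged */

/* Hypothetical transmitter-side window state (not a TI or Xilinx type). */
typedef struct {
    uint8_t next_ackid;      /* ackID to attach to the next packet sent */
    uint8_t oldest_unacked;  /* ackID of the oldest packet awaiting an ack */
    uint8_t outstanding;     /* packets sent but not yet acknowledged */
} tx_window_t;

static void tx_window_init(tx_window_t *w)
{
    assert(w != NULL);
    w->next_ackid = 0;
    w->oldest_unacked = 0;
    w->outstanding = 0;
}

/* Claim an ackID for a new packet; fails when 7 are already in flight. */
static bool tx_window_send(tx_window_t *w, uint8_t *ackid)
{
    if (w->outstanding >= MAX_OUTSTANDING)
        return false;
    *ackid = w->next_ackid;
    w->next_ackid = (uint8_t)((w->next_ackid + 1u) % ACKID_COUNT);
    w->outstanding++;
    return true;
}

/* Packet-accepted: the buffer for the oldest packet may be reused. */
static bool tx_window_accepted(tx_window_t *w, uint8_t ackid)
{
    if (w->outstanding == 0 || ackid != w->oldest_unacked)
        return false;
    w->oldest_unacked = (uint8_t)((w->oldest_unacked + 1u) % ACKID_COUNT);
    w->outstanding--;
    return true;
}

/* Packet-not-accepted: rewind so the rejected packet and every packet
 * sent after it are transmitted again. Returns how many must be resent,
 * or 0 if the ackID is not in the outstanding window. */
static uint8_t tx_window_not_accepted(tx_window_t *w, uint8_t ackid)
{
    assert(ackid < ACKID_COUNT);
    uint8_t distance =
        (uint8_t)((ackid + ACKID_COUNT - w->oldest_unacked) % ACKID_COUNT);
    if (distance >= w->outstanding)
        return 0;
    uint8_t resend = (uint8_t)(w->outstanding - distance);
    w->next_ackid = ackid;
    w->outstanding = distance;
    return resend;
}
```

The transmitter never blocks on an individual acknowledgment here; it only stalls when all seven slots are in flight, which is the pipelining behaviour described earlier in the thread.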

Physical destination address in SWRITEs:

E: You're presumably not referring to the fact that the transmitter puts a destination address in an SWRITE, since you seem to be saying that that's not the issue.

P: That is precisely the issue.

Ok, but as I said, it was TI's implementation decision to DMA incoming SWRITEs directly into user memory. It doesn't have to be done that way. A device driver could have intercepted the SWRITE and done whatever processing it considered necessary, or rejected the packet. I don't have a problem with TI's implementation, and it makes life easier for me when sending data from the FPGA. I put the address of a portion of a physical circular buffer in the SWRITE packet, and it all just works, if the receiver can keep up. You don't have to do it this way if you don't like it; you can just use the mailboxes instead. This doesn't make RapidIO 'flawed' in any way. In a system where you have full control of both end-points, it makes a lot of sense.
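As a rough illustration of the circular-buffer addressing mentioned above, the following C sketch shows how a transmitter might compute the destination address for each successive SWRITE. The structure, the function name, and the example base address are all hypothetical; the only assumption carried over from the post is that the transmitter writes fixed-size portions of a physical circular buffer in the receiver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of the receiver-side circular buffer as the
 * transmitter sees it: a physical base address and a total size. */
typedef struct {
    uint32_t base;    /* physical address of the buffer in receiver memory */
    uint32_t size;    /* total buffer size in bytes */
    uint32_t offset;  /* offset of the next portion to be written */
} ring_t;

/* Return the destination address for the next SWRITE of `chunk` bytes
 * and advance the write position, wrapping at the end of the buffer. */
static uint32_t ring_next_addr(ring_t *r, uint32_t chunk)
{
    assert(r != NULL);
    assert(chunk > 0 && r->size % chunk == 0); /* chunks tile the buffer */
    uint32_t addr = r->base + r->offset;
    r->offset = (r->offset + chunk) % r->size;
    return addr;
}
```

Requiring the chunk size to divide the buffer size means no single write ever straddles the wrap point, which keeps the receiver-side bookkeeping simple.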

The receiver is completely passive and is only notified when the transmitter explicitly pings it to show that the data have been sent.

This is the crux of the problem. If this is true, then TI's SRIO implementation cannot handle incoming streaming via SWRITEs, and is broken. That's what this thread is about. The statement is false if the TI hardware can generate an interrupt when the incoming DMA has completed, and can tell you where the incoming packet is, and how long it is. If it cannot generate an interrupt, then it is possible that there is no general solution to using incoming SWRITEs. The problem is that the receiver thread only cares about DMA completion; knowing that the transmitter has sent an additional transmit flag is completely useless. Note also that your statement is not true in general. Incoming messages are handled exactly as you would expect, by filling out buffer descriptors and generating an interrupt. The issue here is whether this also happens for Direct I/O (ie. incoming SWRITEs).

How can a receiver that doesn't know that a transfer into its memory is being initiated by a remote transmitter know to say no before anything is transmitted and memory is potentially corrupted?

The receiver always knows what's going on. It gets a control symbol when a packet is coming, and one when it finishes. The rest is down to how TI implemented the receiver. They almost certainly have a hardware receive buffer (the maximum packet size is 276 bytes), and they calculate and check a CRC on the fly, and they only decide to DMA into user memory when they complete the checks. The hardware may, or may not, care what the DMA target address is. Caveat emptor.

0 Evan52341 over 13 years ago in reply to Evan52341

Prodigy 160 points

Just found the preview button - I'll use that next time :)

0 Peter Robertson over 13 years ago in reply to Evan52341

Expert 2770 points

As I deal with the DSPs mainly, I see SRIO through the lens of TI's implementation. It may well be that the underlying physical mechanism is much better, but as implemented by TI it's a complete pain. Most of the control you see ("The receiver always knows what's going on") simply isn't there when you have to use what TI provides. That said, it's quite possible that there's an obscure way of getting the effect that's effectively-hidden in the documentation.

You seem to be implying that having destination addresses known to the transmitter is a TI thing and not inherent to SRIO. My objection is not to what the receiver does with incoming data, simply that the transmitter has to know about anything within the receiver.

Many of the problems I experience with TI can be explained by them having a fixed (and extremely limited) view of what can or should be done (dare I say they have a "we know better" attitude?).

0 jeremie veyret over 13 years ago in reply to Peter Robertson

Prodigy 30 points

I've looked through the examples for Direct I/O to see how an inter-DSP exchange is set up.

The Receiver has said where the data should go (or it's fixed)
The transmitter writes into that then sends a doorbell interrupt
The receiver waits for that doorbell interrupt, uses the data and sends a doorbell interrupt back.
The transmitter waits for that doorbell interrupt before sending more.

The question is: can we safely assume that, when a data write is followed by a doorbell interrupt, all the data will be in memory by the time the DSP receives the doorbell interrupt? (It might not have been DMAed yet.)
As long as the doorbell doesn't overtake the DMA transfer to memory, it should be enough to monitor a streaming transfer.
The doorbell interrupt sent back is for flow control.
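The handshake described above can be modelled in a few lines of C. This is a single-threaded simulation, not driver code: `memcpy` stands in for the SWRITE, two booleans stand in for the doorbells in each direction, and every name is hypothetical. It relies on exactly the assumption being asked about, namely that the "data ready" doorbell is never seen before the data has landed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_BYTES 256u

/* Hypothetical model of the link: one agreed destination buffer in the
 * receiver plus one doorbell in each direction. */
typedef struct {
    unsigned char rx_buffer[BLOCK_BYTES]; /* agreed location in RX memory */
    size_t        rx_len;         /* block length (agreed or signalled) */
    bool          data_doorbell;  /* TX -> RX: a complete block is there */
    bool          free_doorbell;  /* RX -> TX: buffer may be overwritten */
} sim_link_t;

static void sim_link_init(sim_link_t *l)
{
    memset(l, 0, sizeof *l);
    l->free_doorbell = true;      /* buffer starts out free */
}

/* Transmitter: write the block, then ring the doorbell. Returns false
 * if the receiver has not yet released the buffer (flow control). */
static bool tx_send_block(sim_link_t *l, const void *data, size_t len)
{
    assert(len <= BLOCK_BYTES);
    if (!l->free_doorbell)
        return false;
    l->free_doorbell = false;
    memcpy(l->rx_buffer, data, len);  /* stands in for the SWRITE(s) */
    l->rx_len = len;
    l->data_doorbell = true;          /* doorbell strictly after the data */
    return true;
}

/* Receiver: if the doorbell has fired, consume the block and ring the
 * doorbell back. Returns the number of bytes taken, 0 if none pending. */
static size_t rx_take_block(sim_link_t *l, void *out, size_t out_size)
{
    if (!l->data_doorbell)
        return 0;
    assert(out_size >= l->rx_len);
    l->data_doorbell = false;
    memcpy(out, l->rx_buffer, l->rx_len);
    l->free_doorbell = true;          /* tell the transmitter to continue */
    return l->rx_len;
}
```

With a single buffer the transmitter idles while the receiver works; using two or more buffers with one doorbell each would let the two sides overlap.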

0 tscheck over 13 years ago in reply to jeremie veyret

TI__Mastermind 23525 points

Sorry for the delay, I was made aware of this post just today. I will try to answer your questions and give you some background info. If I miss something, let me know, this is a large thread.

- First and foremost, there is an RX hardware mechanism in the peripheral that will make sure that any data transfer arrives in memory before a DOORBELL interrupt is fired. So for directIO packet types like SWRITE, where the complete transfer can be any number of packets, as long as the SWRITE packets and DOORBELL are sent on the same priority (and by the same LSU if the DSP is the TX device) such that there is no reordering, the payloads will have landed in memory before the interrupt is fired.

- DOORBELLs can be used for generic interrupt purposes. Since they are generic and their use is defined by the application, there is no reference to MAU or any specific data transfer. The use of DOORBELLs for the interrupt in your scenario is the best way to handle it. If the length of transfer is not agreed upon by TX and RX device in the system and the RX device needs to determine the total payload size, then you can do things such as send the total byte count in the payload of the message (first word for example, like a custom header), or if there are a finite number of payload sizes, you could use the particular DOORBELL interrupt source to indicate the size (for example, DOORBELL bit 1 = 256B transfer interrupt, bit 2 = 4096B). You have 64 to choose from...

- Direct mapping of the packet's DST address to actual memory address instead of a RX windowing/mapping scheme was an implementation decision for simplicity. Right or wrong, that is what we have. I can tell you that we've debugged many customer issues on FPGAs and other processors that have used memory windowing. Messaging (type 11) on the other hand allows the RX device to store transfers where it wants to.

- There was discussion on control symbols and physical layer flow control... SRIO serial physical layer spec defines the use and handshaking of idles, control symbols, etc between link partners. We implemented receiver based flow control (which is industry standard unless there are new devices on the market with things such as VC capability), which basically means that the receiving device accepts incoming packets as long as it has available buffering. This mode is a requirement for all SRIO serial compliant devices.
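The two ways of signalling the transfer length mentioned in the second point above (a doorbell bit per payload size, or a byte count in the first word) might look like this on the receiving side. It is a minimal sketch, not TI code: the function names are hypothetical, the bit-to-size table simply copies the example values given above, and the header variant assumes the transmitter writes a 32-bit count in the DSP's byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scheme A: each doorbell bit stands for one agreed transfer size.
 * Mapping taken from the example above (bit 1 = 256 B, bit 2 = 4096 B);
 * returns 0 for a bit with no size assigned. */
static size_t size_from_doorbell_bit(unsigned bit)
{
    switch (bit) {
    case 1:  return 256;
    case 2:  return 4096;
    default: return 0;
    }
}

/* Scheme B: the first 32-bit word of the block is a custom header that
 * holds the number of payload bytes following it. Returns 0 if the
 * count cannot fit in the buffer, which suggests a corrupt header. */
static size_t size_from_header(const void *block, size_t block_capacity)
{
    uint32_t count;
    assert(block != NULL && block_capacity >= sizeof count);
    memcpy(&count, block, sizeof count); /* no alignment assumptions */
    if (count > block_capacity - sizeof count)
        return 0;
    return count;
}
```

Scheme A costs nothing in the payload but only suits a small fixed set of sizes; scheme B handles arbitrary lengths at the price of one word per block.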

I know there were comments and topics touched on, let me know if I can help clarify.

Regards,

Travis

0 Evan52341 over 13 years ago in reply to tscheck

Prodigy 160 points

Thanks Travis. I think, with the benefit of hindsight, that SWRITE is not a particularly good way to get externally-streamed data into the 6472. Stopping the streaming op in the FPGA to send a DOORBELL occasionally just seems like a poor way to do this.

Given that the FPGA needs re-working anyway, what do you think is the best way to handle this? There is a point-to-point dedicated connection between the FPGA and the 6472, so no issues with priorities and so on.

1. FPGA streams via SWRITE, and occasionally sends a DOORBELL, as you suggested. There's no handshaking back to the FPGA (ie. a DOORBELL from the 6472); this would simply move the buffering problem from the 6472 to the FPGA.
2. The FPGA continuously 'streams' type 11 messages to the 6472, ignoring responses. This has the advantage that the 6472 CPPI hardware handles buffering and interrupts, but is more complex in 6472 software.
3. The 6472 issues READ ops via the LSU, and the streaming data is returned in the response. Interrupts and buffering are handled by the LSU (I think).
4. Stay with SWRITEs from the FPGA but, instead of using a DOORBELL interrupt, a thread on the 6472 busy-waits on changes in a flag location which is written by the FPGA.
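The last option in the list above (busy-waiting on a flag location) could be sketched as follows. The function name is hypothetical, and the sketch assumes the FPGA writes a changing value, such as a block sequence number, to the flag after the data, and that this write lands after the data in the same way as a same-priority doorbell would; that ordering would need confirming for the device. Depending on where the flag and buffer live, a cache invalidate may also be needed before reading; that step is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Poll a flag word that the remote device overwrites after each block.
 * Returns true and stores the new flag value once it differs from
 * `last_seen`; gives up and returns false after `max_spins` reads so
 * the caller can do other work or detect a stalled stream. */
static bool poll_for_new_block(const volatile uint32_t *flag,
                               uint32_t last_seen,
                               uint32_t max_spins,
                               uint32_t *new_value)
{
    assert(flag != NULL && new_value != NULL);
    for (uint32_t i = 0; i < max_spins; i++) {
        uint32_t v = *flag;   /* volatile: re-read memory on every pass */
        if (v != last_seen) {
            *new_value = v;
            return true;
        }
    }
    return false;
}
```

Using a sequence number rather than a simple 0/1 flag lets the receiver notice if it has fallen behind by more than one block.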

Is there any demonstration software that handles any of these? The single example program for the 6472 really is of very little use.

Thanks.

0 tscheck over 13 years ago in reply to Evan52341

TI__Mastermind 23525 points

Honestly, all those methods will work and it comes down to preference. For simplicity, I always recommend DIO if it works in your system. SWRITE/NWRITE will give you the best bandwidth, so I'd stick with option 1 or 4 if you can. Again the difference there is whether you want to use an RX interrupt approach or polling. As far as examples, did you look at the DIO lib http://processors.wiki.ti.com/index.php/DIO_Library? That is probably the best. I may be able to dig up other internal examples, however I'm not sure what state they are in and hence prefer not to give them out.

Regards,

Travis
