Rules for multiple initiators accessing SRIO LSUs

Michael P

I am trying to understand the rules for multiple CorePACs to share SRIO LSUs.

SPRUGW1 (section 2.3.2, page 2-20 in SPRUGW1B) is clear that any LSU that is programmed by EDMA must be programmed only by a single EDMA channel, and cannot be simultaneously used by any other core or EDMA channel.

A post by tscheck (on 2011-10-4, subject "C6678 SRIO User's Guide") explains that software is responsible for ensuring that each SRCID is used by at most one core, and also that LSU1_ICSR should only be used for EDMA, while LSU0_ICSR should be used for Direct I/O by software.

SPRUGW1 section 2.3.2.5.3 discusses error handling using LSU0_ICSR. It suggests suppressing good interrupts (LSU_REG4.SUP_GINT=1) to reduce CPU loading, and using LSU_STAT_REGn to determine which LSU and LTID failed (and determine the LCB and completion code for that transaction).

In the example discussed in section 2.3.2.5.3, CorePAC0 and CorePAC1 are using LSU0, which is configured to have 4 shadow register sets. Each CorePAC has a list of eight transactions it sent to LSU0 (the last one for each of LTID=0..3 x LCB=0..1). Further suppose that CorePAC0 also uses LSU1. Is it allowed to use the same SRCID with both LSU0 and LSU1?

Suppose it is, and that LSU0 and LSU1 both report errors at some point -- for example, due to flow control (Xoff). CorePAC0 reads LSU_STAT_REG0 to find out which transaction failed on LSU0; suppose it is LTID=x, LCB=y. If SUP_GINT=1, is there a way for CorePAC0 to tell whether that was its last transaction with LTID=x and LCB=y, rather than a transaction programmed by CorePAC1? Or must the software choose to either use SUP_GINT=0 or limit itself to only one LSU per SRCID? (In the latter case, reading LSU0_ICSR indicates which SRCID failed, which would indicate which LSU has the relevant failure. That would be an unfortunate restriction on usage, especially with the recommendation to only use one priority per LSU: Then each SRCID is only usable by a single core, for a single priority, with a single LSU, and one could easily use up all 16 SRCIDs.)

over 7 years ago

0 Raja over 7 years ago

TI__Guru* 81335 points

Hi Michael,
I have assigned this thread to an expert and will get back to you shortly. Thank you for your patience.

0 tscheck over 7 years ago

TI__Mastermind 23525 points

Yes, you can use the same SRCID on different LSUs, but you shouldn’t use the same SRCID with different cores.

Suppressing the interrupt will stop the interrupt from happening if it is a good completion, but the LSU_STAT register will still be written. The interrupt bit from an error condition will be based on the SRCID and as stated above, needs to be routed to one core (one SRCID/core). If an error occurs, only the core with the error is notified, and the LSU that had the error is stopped (can either be flushed or restarted by SW). The core with the error can then read the LSU_STAT register and find the errored transaction. If the LSU is flushed the CC is set to 111, otherwise the CC can be maintained. No other transactions from that LSU are sent out until restart or flush command is given, so the LTID and LCB don’t keep advancing.
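As a rough illustration of "read the LSU_STAT register and find the errored transaction", here is a minimal C sketch. The bit layout is an assumption that should be checked against SPRUGW1: eight 4-bit fields per 32-bit status word, one per LTID, with the LCB in bit 0 of each field and the completion code in bits 3:1. The names `lsu_stat_t` and `lsu_stat_decode` are hypothetical, and the only completion codes listed are the ones mentioned in this thread.

```c
#include <stdint.h>

/* Completion codes quoted in this thread (not the full list). */
enum {
    LSU_CC_TIMEOUT      = 0x1, /* 0b001: timeout on non-posted transaction */
    LSU_CC_ERR_RESPONSE = 0x3, /* 0b011: error response received */
    LSU_CC_FLUSHED      = 0x7  /* 0b111: set when the LSU is flushed */
};

/* Hypothetical decoded view of one status field. */
typedef struct {
    uint8_t cc;  /* completion code */
    uint8_t lcb; /* LSU context bit */
} lsu_stat_t;

/*
 * Extract the status field for one slot from a raw status word.
 * ASSUMED layout: slot n occupies bits [4n+3 : 4n], LCB is the lowest bit
 * of the field, and the completion code is the upper three bits.
 */
lsu_stat_t lsu_stat_decode(uint32_t stat_word, unsigned slot)
{
    uint32_t field = (stat_word >> (4u * (slot & 7u))) & 0xFu;
    lsu_stat_t s;
    s.cc  = (uint8_t)(field >> 1);
    s.lcb = (uint8_t)(field & 1u);
    return s;
}
```

A caller would pass the value read from the relevant LSU_STAT_REGn and the LTID's position within that word, then compare the returned LCB with the one recorded when the shadow register was locked.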

If you are not suppressing the good interrupt, it is easy to track all completions. If you are suppressing the good interrupts, you can tell the status of previous transactions of the LSU regardless of which core(s) wrote to the LSU by…

Example:

LSU = 8 total shadow registers

2 cores are accessing the LSU

Core 1 already has grabbed three shadow registers corresponding to (‘LCB’LTID) ‘0’1; ‘0’3; ‘0’5 which it needs to keep track of.

At the next read of the LSU for lock

if LCB changes to ‘1’, and LTID is less than 5,

e.g. ‘1’2 => ‘0’1 is complete; ‘0’3 and ‘0’5 are still pending

or if LTID is greater than or equal to 5,

e.g. ‘1’6 => ‘0’1, ‘0’3, and ‘0’5 are all complete

if LCB does not change

if LTID > 5, e.g. ‘0’6 => none of ‘0’1, ‘0’3, ‘0’5 are complete

if LTID <= 5, e.g. ‘0’2 => ‘0’1, ‘0’3, and ‘0’5 are all complete
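The worked cases in the example above can be sketched in C as follows. This is only a transcription of those cases, not a verified model of the hardware. The type `lsu_tag_t` and the function `lsu_infer_done` are hypothetical names, and the sketch assumes that all tracked tags share one LCB value, that they are sorted by ascending LTID, and that the LSU has not wrapped further than the example considers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tag for one shadow-register grab: context bit plus LTID. */
typedef struct {
    uint8_t lcb;  /* LSU context bit seen when the lock was obtained */
    uint8_t ltid; /* LSU transaction ID, i.e. the shadow register index */
} lsu_tag_t;

/*
 * Mark which tracked transactions can be treated as complete, given the
 * tag returned by the latest lock read of the same LSU.
 * ASSUMPTIONS: tracked[] holds tags with one common LCB, sorted by
 * ascending LTID; done[] has room for n entries.
 */
void lsu_infer_done(const lsu_tag_t *tracked, size_t n,
                    lsu_tag_t latest, bool *done)
{
    if (n == 0) {
        return;
    }
    uint8_t highest = tracked[n - 1].ltid;
    for (size_t i = 0; i < n; i++) {
        if (latest.lcb != tracked[0].lcb) {
            /* LCB flipped: registers up to latest.ltid have been reused. */
            done[i] = tracked[i].ltid <= latest.ltid;
        } else {
            /* Same LCB: either nothing has wrapped yet (latest LTID is
             * beyond the highest tracked one) or the LSU has gone all the
             * way around, in which case everything tracked is done. */
            done[i] = latest.ltid <= highest;
        }
    }
}
```

With tracked tags ‘0’1, ‘0’3, ‘0’5, a latest read of ‘1’2 marks only the first as done, ‘1’6 and ‘0’2 mark all three, and ‘0’6 marks none.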

0 Michael P over 7 years ago in reply to tscheck

Expert 1810 points

I don't think your suggestion resolves my underlying concern.

Suppose Core 0 is writing to LSU 0 and LSU 1, using SRCID 0. Core 1 is also writing to LSU 0, but uses SRCID 1. Both suppress good completion interrupts (to reduce DSP core loading). LSU 0 is set up with 8 shadow registers.

Core 0 initiates an LSU 0 transaction with `LCB'LTID `0'1, and an LSU 1 transaction with `0'4.

Core 1 performs many transactions on LSU 0, starting with `0'2 and eventually getting back to `0'1.

The SRIO fabric sends an XOFF that makes LSU 0 report a failure for `0'1 and LSU 1 report a failure for `0'4.

Core 0 and core 1 both get error interrupts for their respective SRCIDs. How does core 0 know that the only failure relevant to it is for LSU 1?

0 tscheck over 7 years ago in reply to Michael P

TI__Mastermind 23525 points

I think what you are missing is that the completion codes, including error codes, happen right away. Everything happens in order per LSU. Let's take your example...

Michael P said:
Core 0 initiates an LSU 0 transaction with `LCB'LTID `0'1, and an LSU 1 transaction with `0'4.

So here, when the core gets an LSU, the LTID corresponds to the shadow register; the context bit simply indicates whether it is the latest or the previous one. So core 0 has '0'1, and when that shadow register has its turn to send, the XOFF is present. If core 0 also has '0'2, that transaction is not sent; it is blocked until core 0 responds to the error condition and does the restart/flush.

Regards,

Travis

0 Michael P over 7 years ago in reply to tscheck

Expert 1810 points

As I understood it, some completion codes do not occur as soon as the transaction comes to the front of the shadow register queue. For example, 0b001 (Transaction Timeout occurred on Non-posted transaction) must wait for the timeout. If a transaction is broken into multiple packets, an XOFF received between two of them can cause the transaction to fail part-way through. (I believe I have seen the latter occur in my testing.)

In my imaginary scenario, core 0 issues `0'1 to LSU 0, and it succeeds. Just afterwards, core 1 issues `0'2 through `0'7, then `1'0 through `1'7, then `0'0 to LSU 0. These all succeed. Core 0 issues `0'4 to LSU 1, and core 1 issues `0'1 to LSU 0. These fail (say, because an XOFF was received before they issued the last pair of transactions, or between packets for the two transactions).

So when core 0 sees that its SRCID encountered an error, it doesn't know which LSU the error was for. It sees errors for both LSU 0 and LSU 1, in both cases for the last `LCB'LTID it used for that LSU. How does it know which error is the one it cares about?

[Edited to add: Even if an error occurs "right away", how would core 0 know which transaction was processed first by the respective LSU? LSU 0 could have been processing a "slow" transaction from some other initiator that delayed core 0's `0'1 transaction.]

0 tscheck over 7 years ago in reply to Michael P

TI__Mastermind 23525 points

Michael P said:
As I understood it, some completion codes do not occur as soon as the transaction comes to the front of the shadow register queue. For example, 0b001 (Transaction Timeout occurred on Non-posted transaction) must wait for the timeout. If a transaction is broken into multiple packets, an XOFF received between two of them can cause the transaction to fail part-way through. (I believe I have seen the latter occur in my testing.)

True, even for a non-posted transaction, when an error response occurs (CC=0b011), it means that at least one request segment was sent out and the response has been received. But what I was referring to is that before an LSU can move to the next shadow register, all preceding segments of the current LSU have to be completed. So if an error does happen, the LSU is stalled, and nothing else is sent out on that LSU, so the shadow register with the issue is at the front.

Michael P said:
In my imaginary scenario, core 0 issues `0'1 to LSU 0, and it succeeds. Just afterwards, core 1 issues `0'2 through `0'7, then `1'0 through `1'7, then `0'0 to LSU 0. These all succeed. Core 0 issues `0'4 to LSU 1, and core 1 issues `0'1 to LSU 0. These fail (say, because an XOFF was received before they issued the last pair of transactions, or between packets for the two transactions).

In this possible scenario, where you are suppressing good interrupts (although it may require a number of things to line up perfectly), there would need to be a mechanism outside the LSU to track when the LCB has changed among cores. What I'm thinking here is that every time the LCB changes for a given LTID, the previous owner of the LTID needs to have its ownership of that LTID cleared. So in your scenario, if core 0 is tracking '0'1, then in a table accessible by both core 0 and core 1 you have '0'1 = Core 0 SRCID. As soon as core 1 gets the '1'1 lock, it sets '1'1 = Core 1 SRCID, which effectively sets '0'1 = Null in the table. So when core 0 gets the error interrupt, it would need to check this table to see which transactions, if any, are pending on the various LSUs it is tracking. You would need a table per LSU that is editable by all cores using that LSU.
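A minimal C sketch of such a per-LSU ownership table might look like the following. All names here (`lsu_owner_table_t`, `lsu_table_claim`, `lsu_table_owns`, `SRCID_NONE`) are hypothetical, and the sketch assumes 8 shadow registers as in the earlier example. It deliberately leaves out the mutual exclusion and cache handling that a table shared between cores would need on real hardware.

```c
#include <stdbool.h>
#include <stdint.h>

#define LSU_SHADOW_REGS 8     /* ASSUMED: matches the 8-register example */
#define SRCID_NONE      0xFFu /* hypothetical "no owner" marker */

/* One table per LSU, placed in memory visible to every core using it. */
typedef struct {
    uint8_t owner[LSU_SHADOW_REGS]; /* SRCID that last locked each LTID */
    uint8_t lcb[LSU_SHADOW_REGS];   /* LCB returned with that lock */
} lsu_owner_table_t;

void lsu_table_init(lsu_owner_table_t *t)
{
    for (unsigned i = 0; i < LSU_SHADOW_REGS; i++) {
        t->owner[i] = SRCID_NONE;
        t->lcb[i] = 0;
    }
}

/* Record a new lock. Overwriting the slot drops the previous owner's
 * claim on that LTID, which is the "set to Null" step described above. */
void lsu_table_claim(lsu_owner_table_t *t, uint8_t lcb, uint8_t ltid,
                     uint8_t srcid)
{
    if (ltid < LSU_SHADOW_REGS) {
        t->owner[ltid] = srcid;
        t->lcb[ltid] = lcb;
    }
}

/* True if srcid is still the recorded owner of the tag (lcb, ltid). */
bool lsu_table_owns(const lsu_owner_table_t *t, uint8_t lcb, uint8_t ltid,
                    uint8_t srcid)
{
    return ltid < LSU_SHADOW_REGS &&
           t->owner[ltid] == srcid &&
           t->lcb[ltid] == lcb;
}
```

On an error interrupt, a core would call `lsu_table_owns` for each tag it is tracking on each LSU and ignore status entries whose ownership has since passed to another SRCID.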

I'll run this by the designer and ask if there is a better way.

Regards,

Travis

0 Michael P over 7 years ago in reply to tscheck

Expert 1810 points

Thanks. I realize the scenario I described should take a lot of things lining up inconveniently, but I am writing a library that I expect my co-workers to use, and want to know any provisos that might attach.

I am using an older part, so I think I can assign each LSU to a single core, although this changes how I can allocate shadow register sets across the LSUs. That would let me avoid the ambiguity without needing a critical section in the software.
