SRIO silently drops Type 11 messages

Hello,

My application is running on a board with multiple TI C6678 chips and 2 SRIO Switches.
I am using SRIO communication between C6678s either directly (chip-to-chip via SRIO Ports) or indirectly (via SRIO Switches).
I have several problems, but the biggest of them is that Type 11 messages are lost during SRIO communication.

Here are the configuration and test details.

1. Configuration - SRIO Device
- Four 1x SRIO Ports are configured; SRIO Port 3 is not used.
- Each SRIO Port operates at 1.25Gbps
- Each SRIO device is connected to the neighbouring SRIO devices via SRIO Ports 1 and 2.
- Each SRIO device is connected to the remote (non-neighbouring) SRIO devices via Port 0 which is connected to a SRIO Switch.
- 8 Device IDs (one per Core) are defined as follows: one ID is defined using the standard CSR and the remaining 7 IDs are defined using the TLM Port Base Routing registers (see the sketch after this list).
- 6 Garbage queues are defined to collect descriptors in error situations.
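
For reference, the extra IDs are programmed following the call pattern in the PDK's device_srio.c. A minimal sketch (the device ID macros are placeholders; the parameter meanings are those documented in the CSL header):

    /* Base Device ID via the standard CSR (8-bit and 16-bit IDs) */
    CSL_SRIO_SetDeviceIDCSR(hSrioCSL, CORE0_ID_8BIT, CORE0_ID_16BIT);

    /* One additional Device ID via a TLM Port Base Routing entry
       (entry 0 on port 0); repeated per entry for the other IDs. */
    CSL_SRIO_SetTLMPortBaseRoutingInfo(hSrioCSL, 0, 0, 1, 1, 0);
    CSL_SRIO_SetTLMPortBaseRoutingPatternMatch(hSrioCSL, 0, 0, CORE1_ID_8BIT, 0xFF);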

2. Configuration - Queues, SRIO Drivers, Sockets, etc.
- We have a "total communication" requirement for SRIO: each Core on any C6678 chip should be able to communicate via SRIO with any Core on any other C6678 chip on the board.
- Type 11 messages are used for communication
- Given the requirement above and various limitations (16 SRIO queues per device, etc.), I came up with the SRIO topology below.
- 3 SRIO Queues/C6678 chip: each queue is assigned (CSL_SRIO_SetTxQueueSchedInfo) to one active SRIO Port (0, 1 and 2).
- 1 TX Free Queue/Core: 1 descriptor
- 1 RX Free Queue/Core: 1023 descriptors
- 1 RX Completion Queue/Core
- 4 SRIO Drivers/Core: Application Managed, Polling Mode
  - 3 SRIO TX Drivers are associated (1:1) to the 3 SRIO Queues above. Each of the 3 Drivers manages SRIO transmission on a separate SRIO Port. These 3 Drivers are NOT configured for the receive operation.
  - 1 SRIO RX Driver configured for the receive operation: It defines a Receive Flow which accepts messages for a given Device ID (CSL_SRIO_SetFlowControl). The Receive Flow uses the RX Free and RX Completion Queues above.
- 4 SRIO Sockets: Type 11, Raw, Non-Blocking, Multi Segment, one for each SRIO Driver above.
  - the 3 SRIO sockets associated to the TX drivers have Pending Packet Count set to 8.
  - the SRIO socket associated to the RX driver is bound to the Core's Device ID, accepts ANY Mailbox/Letter, and has Pending Packet Count set to 1023 (the maximum size of the RX Free Queue).
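
As a concrete reference, a minimal sketch of opening and binding the RX socket above, based on the SRIO LLD example code (the device ID macro and the "any" letter/mailbox values are placeholders, not actual LLD symbols):

    /* Raw, non-blocking Type 11 socket on the RX driver */
    Srio_SockHandle       rxSock;
    Srio_SockBindAddrInfo bindInfo;

    rxSock = Srio_sockOpen(hRxDriver, Srio_SocketType_RAW_TYPE11, FALSE);

    bindInfo.type11.tt      = FALSE;           /* 8-bit device IDs (assumed)     */
    bindInfo.type11.id      = CORE_DEVICE_ID;  /* this Core's Device ID          */
    bindInfo.type11.letter  = ANY_LETTER;      /* placeholder: accept any letter */
    bindInfo.type11.mailbox = ANY_MAILBOX;     /* placeholder: any mailbox       */
    bindInfo.type11.segMap  = 0x1;             /* multi-segment messages         */
    Srio_sockBind(rxSock, &bindInfo);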

3. SRIO Driver
- I made the following change to the standard driver as supplied by TI (pdk_C6678_1_1_2_5/packages/ti/drv/srio): when starting an SRIO Driver, if the TX Queue is specified, open the associated CPDMA channel number
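
In outline, the change looks like this (CPPI LLD calls; the queue-to-channel mapping and the surrounding driver variables are paraphrased, not the actual driver source):

    /* If the application specified a TX queue, open the CPDMA TX channel
       with the matching channel number (assumed 1:1 queue/channel mapping). */
    Cppi_TxChInitCfg txChCfg;
    uint8_t          isAllocated;

    memset(&txChCfg, 0, sizeof(txChCfg));
    txChCfg.channelNum = txQueueNum;                  /* paraphrased */
    txChCfg.txEnable   = Cppi_ChState_CHANNEL_ENABLE;
    hTxChannel = Cppi_txChannelOpen(hCppi, &txChCfg, &isAllocated);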

4. Tests
- Set-up:
   - 8 Cores on the same C6678 chip wake up every 10 ms and each Core sends a 1500-byte Type 11 message to a remote (non-neighbouring) Core.
   - Since the destination Core is remote, SRIO Port 0 and the SRIO Switch will be used.
- Execution:
   - Test 1: the 8 sending Cores do the sending operation in 100 consecutive 10 ms time slots. Results are correct: each sending Core sends 100 messages and the destination Core receives a total of 800 messages.
   - Test 2: the 8 sending Cores do the sending operation in 200 consecutive 10 ms time slots. Results are NOT correct: each sending Core sends 200 messages, but the destination Core receives fewer than the expected number of messages (e.g. 1222 instead of 1600).
- Investigation:
   - When messages are "lost", no errors are reported (TX, RX, Garbage queues, etc.) and all the TX and RX queues on all Cores have the expected number of descriptors.
   - If I "space" the sending (e.g. send only in every 3rd 10 ms time slot) and repeat Test 2, the result is correct: 1600 messages are received.
   - If I increase the number of RX descriptors and the RX socket Pending Packet Count (e.g. 2047 instead of 1023) and repeat Test 2, the result is correct again: 1600 messages received.
   - The bitrate of Tests 1 and 2 is the same; the only difference is that Test 2 runs twice as long. Message size (100 bytes or 1500 bytes) seems irrelevant.
 
5. Request
   - I tried different investigation paths and I am looking for new ideas.
   - I would appreciate any suggestions/help to solve this problem.

6. Please note that
- I read the SRIO User Guide, SRIO LLD, Silicon Errata, etc.
- I followed some of the related Forum discussions.
- I successfully executed most of the SRIO test programs provided by TI (pdk_C6678_1_1_2_5/packages/ti/drv/srio), including the test using 2 EVMs and a Break Out Card.

Thanks,
Sergiu

  • Hi,

    My understanding is that SRIO drops packets because the message queues are full; that is why the test case works properly only when you add a delay between transactions.

    Most of the TI SRIO examples use 4 separate receive queues because it takes time to update a queue when a message is received. With multiple receive queues, additional SRIO messages can be accepted while the other queues are being updated.

    It would be better to increase the TX and RX queue sizes in your test case and test again. Also try changing the SRIO lane rate to the maximum frequency.

    Thanks,
  • Sergiu,


    A couple of questions...


    1) You mentioned garbage queues, so are you seeing anything showing up in these queues from the TX side? Timeouts or Retries would be the most important ones. Are the response timer value and the corresponding clock prescaler value set appropriately?

    2) Are you checking the CC of the RX descriptor every time?  Do you see any timeout indicator on the RX side?  If so, the descriptor and corresponding buffer should be treated as incomplete/invalid.

    3) When you set the TX queue priority, how are you doing this?  We see this done incorrectly by many folks.  In order to program the RIO_TX_QUEUE_SCH_INFO registers, the TXU has to be disabled, i.e. RIO_BLK3_EN.  Most importantly, it should be noted that 'txChPriority' reflects the CDMA channel priority, where

    • Priority 3 is the lowest
    • Priority 0 is the highest
    • This is the inverse of the SRIO priority scheme; the translation is performed automatically in hardware

    We've seen weird things happen when type 11 messages are incorrectly sent out using SRIO priority 3 (CDMA priority 0).
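
    A minimal sketch of the disable/program/enable sequence (CSL calls in the style of the PDK; the queue, port and priority values are illustrative):

        CSL_SRIO_DisableBlock(hSrio, 3);               /* disable the TXU (block 3)      */
        CSL_SRIO_SetTxQueueSchedInfo(hSrio, 0, 0, 1);  /* queue 0 -> port 0, CDMA prio 1 */
        CSL_SRIO_SetTxQueueSchedInfo(hSrio, 1, 1, 1);  /* queue 1 -> port 1, CDMA prio 1 */
        CSL_SRIO_SetTxQueueSchedInfo(hSrio, 2, 2, 1);  /* queue 2 -> port 2, CDMA prio 1 */
        CSL_SRIO_EnableBlock(hSrio, 3);                /* re-enable the TXU              */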

    The RapidIO type 11 packets will never be dropped.  However, when Retries or RX buffer starvation comes into play, things can slow down and time out.  Retries will only occur when the 16 RX segmentation contexts are used up. You may want to use your own self-imposed RX context count so the transmitting devices don't exceed the available number.  If there is a valid RX context, but no free descriptors available, everything just stalls until one becomes available.  You should really try to avoid both of these conditions.
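
    For example, a self-imposed cap can be as simple as an in-flight counter around the send path (sketch only; all names are illustrative):

        #define MAX_INFLIGHT   8   /* stay below the 16 RX segmentation contexts */

        static uint32_t inflight = 0;

        /* Back off when too many messages are outstanding. */
        if (inflight < MAX_INFLIGHT) {
            if (Srio_sockSend(txSock, hBuffer, msgSize, &to) == 0)
                inflight++;
        }
        /* ...and decrement 'inflight' each time a TX completion
           descriptor is popped back from the TX completion queue. */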


    Regards,

    Travis

  • Hi Ganapathi,

    Thanks for your quick response.

    My understanding of this "4 separate receive queues" configuration is this:
    The 4 queues are specified in the Receive Flow definition in fields rx_fdq0_sz0_qnum, rx_fdq1_qnum, rx_fdq2_qnum and rx_fdq3_qnum.
    Please confirm.

    My initial configuration had 1 RX Free Queue specified in both rx_fdq0_sz0_qnum and rx_fdq1_qnum.
    rx_fdq2_qnum and rx_fdq3_qnum were set to 0.
    This is the configuration I used to execute the tests I reported in my initial posting.

    With this understanding, yesterday I ran the following tests, as you suggested:
    1. I specified the single RX Free Queue in all 4 fields above.
    - The Result is the same: missed messages.
    2. I specified 3 distinct RX Free Queues as follows:
    - Queue 1 in rx_fdq0_sz0_qnum and rx_fdq1_qnum
    - Queue 2 in rx_fdq2_qnum
    - Queue 3 in rx_fdq3_qnum
    - The Result is the same: missed messages (slightly worse compared to my initial report).
    3. I specified 4 distinct RX Free Queues as follows:
    - Queue 1 in rx_fdq0_sz0_qnum
    - Queue 2 in rx_fdq1_qnum
    - Queue 3 in rx_fdq2_qnum
    - Queue 4 in rx_fdq3_qnum
    - The Result is the same: missed messages (slightly worse compared to my initial report).
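
    For reference, the 4-distinct-queue configuration in test 3 maps onto the CPPI receive flow fields roughly like this (a sketch; the queue-number macros are placeholders):

        Cppi_RxFlowCfg flowCfg;
        uint8_t        isAllocated;

        memset(&flowCfg, 0, sizeof(flowCfg));
        flowCfg.flowIdNum        = CPPI_PARAM_NOT_SPECIFIED; /* let CPPI allocate */
        flowCfg.rx_dest_qnum     = RX_COMPLETION_QNUM;       /* placeholder       */
        flowCfg.rx_fdq0_sz0_qnum = RX_FREE_QNUM_1;
        flowCfg.rx_fdq1_qnum     = RX_FREE_QNUM_2;
        flowCfg.rx_fdq2_qnum     = RX_FREE_QNUM_3;
        flowCfg.rx_fdq3_qnum     = RX_FREE_QNUM_4;
        Cppi_configureRxFlow(hCppi, &flowCfg, &isAllocated);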

    I also checked the TI SRIO Projects for the RX Queue configuration:
    1. SRIO Drivers using Driver Managed Configuration:
    - 1 RX Free Queue set in all 4 fields above.
    2. SRIO Drivers using Application Managed Configuration:
    - 1 RX Free Queue set in rx_fdq0_sz0_qnum and rx_fdq1_qnum
    - rx_fdq2_qnum and rx_fdq3_qnum are set to 0.


    Travis replied as well. I'll start investigating the areas he suggested.

    Regards,
    Sergiu
  • Hi Travis,
    Thanks for your reply: interesting points... I'll reply using the same numbering scheme.
    1. Garbage Queues
    ---------------------------
    - I don't see anything in these error queues. In the past I did see Timeout errors (i.e. descriptors in the Timeout Queue), but they were legitimate and they disappeared once I fixed the mistakes.
    - As a code base I used a TI project (pdk_C6678_1_1_2_5/packages/ti/drv/srio/test/tput_benchmarking)
    - Response timer: CSL_SRIO_SetPortLinkTimeoutCSR and CSL_SRIO_SetPortResponseTimeoutCSR: same as the TI Project above
    - Prescaler: CSL_SRIO_SetLLMPortIPPrescalar: same as the TI Project above
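
    For reference, these are set along the lines of the PDK device init code (the literal values here are illustrative, in the style of that code base):

        /* Link/response timeouts and the matching prescaler */
        CSL_SRIO_SetPortLinkTimeoutCSR(hSrioCSL, 0x000FFF);
        CSL_SRIO_SetPortResponseTimeoutCSR(hSrioCSL, 0x000FFF);
        CSL_SRIO_SetLLMPortIPPrescalar(hSrioCSL, 0x21);
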
    2. Completion Code - Descriptor
    --------------------------------------------
    - I monitor them indirectly via a SRIO Socket counter (ccRxErrorCounter)
    - I don't see any error in the descriptor completion codes
    3. RIO_TX_QUEUE_SCH_INFO Registers
    ---------------------------------------------------------
    Block 3:
    - I did NOT disable Block 3 when programming this register. I missed the information in Table 2-33, SPRUGW1 - SRIO User Guide.
    - I made the fix (Disable Block 3, Program RIO_TX_QUEUE_SCH_INFO, Enable Block 3), but the result is the same.
    - txChPriority: I was not aware of this priority issue (SRIO priority 3).
    - In fact all the TI example projects use this value for the Application Managed SRIO Drivers (see symbolic value Srio_PktDma_Prio_Low, which maps to numeric value 3).
    - I made the fix (I used Srio_PktDma_Prio_MediumHigh, which maps to numeric value 1), but the result is the same.
    Type 11 and TX/RX Segmentation Contexts
    ------------------------------------------------------------
    - The test I am running should use at most 8 TX Segmentation Contexts at any point in time: Cores 0 to 7 on the sending chip send a message to one core (Core 0) on a different chip, then each core waits for the TX descriptor to be returned before initiating the next send.
    - Given that there are at most 8 concurrent TX Segmentation Contexts on the sending chip, I would expect at most 8 concurrent RX Segmentation Contexts on the destination chip.
    - It looks like some accumulation/congestion on the RX side which eases up if I introduce more spacing between send operations or increase the number of RX descriptors and/or Socket Pending Packets.
    - Note that message size is irrelevant for the test: same result for messages 120 bytes long or 1500 bytes long. Doesn't this observation take the number of RX Contexts out of the picture?
    - The initial test had all 8 sending cores on the same C6678 chip. I replaced them with Core 0 on 8 different chips: the result is the same.
    - I don't think bandwidth is an issue either: I am using 1% of it.
    Next
    ------
    - I will look into RXU programming and Block 4 enable/disable.
    - Investigate Processing Element features.
    - I mentioned in my initial posting that I made some changes to the SRIO Driver. Do you want to discuss them?
    Thanks for your help,
    Sergiu
  • These are the relevant e2e threads that I'd like you to go through.  The third one has a link to the debug gel.

    http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/p/170264/752157.aspx#752157 - Software error recovery, SRIO for beginners

    http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/t/255031.aspx - Refclk port_ok discussions, VRANGE, MSYNC Serdes settings

    http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/p/264325/927003.aspx#927003 - Error status, and debug gel

    http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/p/267043/937560.aspx#937560 - disable C66x port-writes

  • Hi Travis,

    I think I fixed the problem. Here are the details:

    1. I had a mistake in my test application and not all SRIO messages were "consumed" as soon as they were available in the Receive SRIO socket.

    2. As a consequence, messages were accumulating in the first matching Receive SRIO socket.

    3. My application is using multiple Receive SRIO sockets: the reason is the routine Srio_listCat(), which has linear complexity in the size of its input. More sockets with shorter pending lists are better than one socket with one long pending list.

    4. When a new SRIO message is received, the SRIO Driver looks only for the first matching Receive SRIO socket. If this socket is full, the message is dropped silently, even if more matching, non-full Receive SRIO sockets are available.

    5. I changed the SRIO Driver to consider all matching Receive SRIO sockets when looking for a matching, non-full Receive SRIO socket.

    6. With this change, all is well: no SRIO messages are dropped (i.e. several sockets are able to take care of the accumulation).

    7. I also changed my test application to "consume" SRIO messages ASAP. This will reduce the chances of accumulation.

    I would like to confirm with you the change I made to the SRIO Driver.
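
    In pseudo-C, the change is essentially this (illustrative names, not the actual LLD internals):

        /* Old behavior: stop at the first bind match and drop if it is full.
           New behavior: keep scanning for a matching socket with room. */
        for (sock = rxSocketListHead; sock != NULL; sock = sock->next) {
            if (!bindMatches(sock, ptrRxDesc))
                continue;                        /* not bound to this ID */
            if (sock->pendingCnt < sock->maxPendingPkts) {
                enqueuePending(sock, ptrRxDesc); /* deliver and stop     */
                return 0;
            }
            /* matching but full: previously dropped here; now keep looking */
        }
        return -1;                               /* every matching socket full */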

    ------------------------------------

    While investigating this issue using the GEL script you pointed to, I noticed problems with Port 1 and Port 2:

    1. Port Error Status CSR shows "input/output error-stopped" state and "Port Unititialized" state.

    2. Initially these 2 ports are OK (I verify their state after the initial configuration).

    3. I tried the recovery suggested by the "Software Assisted Error Recovery" document (write 0x2003F044 into the PLM Port n Control Symbol Transmit 1 register), but I couldn't recover the ports.

    Any idea about the error and the method to recover?

    I collected the output of the GEL script execution according to the "Bug Report Template".

    Please note that Port 0 is always in "Port OK" state.

    At the application level, I noticed a while ago that Ports 1 and 2 are not reliable, so I used Port 0 in my test application.

    Thanks,

    Sergiu

  • Sergiu,

    Good to hear that you figured out the missing packet issue.

    As far as the port not being initialized, or in your case being initialized but then going out of initialization, I'd look at the VMIN setting...

    http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/p/196080/850001.aspx#850001

    And if you ever have to force reinitialization because of differences in link partner bringup, the Force_reinit bit is bit 26 of the RIO_PLM_SP(n)_IMP_SPEC_CTL register.
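
    For reference, a one-line sketch (the register name follows the C6678 CSL overlay, quoted from memory; 'n' is the port number):

        /* Force link reinitialization on port n (FORCE_REINIT = bit 26) */
        hSrio->RIO_PLM[n].RIO_PLM_SP_IMP_SPEC_CTL |= (1u << 26);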

    Also, can you make sure the following order of events is adhered to:

    http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/t/258414

    Regards,
    Travis
  • Hi Travis,
    Setting the number of VALID code-groups required for synchronization (VMIN) to 15 as suggested in your reply/Forum posting solved the problem with SRIO Ports 1 and 2.
    Thanks,
    Sergiu