C66 SRIO full duplex bandwidth?

Other Parts Discussed in Thread: TMS320C6678

Greetings!

Thought I'd run a general SRIO question out there to the experts at large.  First, my specifics for background:

- custom board with TMS320C6678 connected to an FPGA

- simple point to point link, no other devices connected

- 4 SRIO 3.125 Gbps channels. 

- DSP is master, FPGA is slave.  We only use NREAD and NWRITE

- payload size is on the order of 8K Bytes both directions

- the TX and RX data are independent, there are no system level data concurrency requirements

- the links / data passing is all working fine, it's only the bandwidth that is an issue
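
For reference, the per-direction ceiling of the link described above can be estimated from the lane count and baud rate. The following is a minimal sketch in C, assuming the 8b/10b line coding that Serial RapidIO uses at 3.125 Gbaud. The function names are illustrative, and the 16-byte per-packet overhead used in the note after the code is an assumed figure, not something measured on this board.

```c
#include <stdio.h>

/* Data rate of a Serial RapidIO port in Mbit/s, for one direction.
 * Assumes 8b/10b line coding: 8 data bits for every 10 bits on the wire. */
static double srio_data_rate_mbps(int lanes, double gbaud_per_lane)
{
    return lanes * gbaud_per_lane * 1000.0 * 8.0 / 10.0;
}

/* Fraction of the data rate left for payload after per-packet overhead.
 * overhead_bytes is an assumption supplied by the caller; the real value
 * depends on packet type and addressing mode. */
static double payload_efficiency(int payload_bytes, int overhead_bytes)
{
    return (double)payload_bytes / (double)(payload_bytes + overhead_bytes);
}
```

With these assumptions, `srio_data_rate_mbps(4, 3.125)` gives 10000 Mbit/s per direction, and `payload_efficiency(256, 16)` scales that to roughly 9400 Mbit/s of payload. Each lane has separate transmit and receive pairs, so this ceiling applies to each direction independently.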

What I am seeing is that the bandwidth is fine in either direction when the other direction is in a 
gap, but when the RX and TX activity line up, I see the throughput go down.  For example, RX /
NREAD data causes large gaps in the TX / NWRITE data (looking at FPGA signals on a logic
analyzer).  We tried increasing the priority of the NWRITEs, but this made no difference.  I suspect
that the NWRITE packets are getting stalled in the FPGA physical layer FIFO (perhaps while a 
bunch of NREADs wait to be serviced via the transport layer, etc), so I am looking into that. Note
that the source / sink in the FPGA can handle the full channel bandwidth in both directions, so
my custom FPGA logic should not be the cause.  The FPGA SRIO core on the other hand could be.

But, what I am now curious about is whether I am seeking a realistic goal?

I have looked at SPRABK5A and the SRIO throughput it shows makes sense, but it only shows
RX or TX alone. 

Does anyone know of any real world measurements of throughput during SRIO full duplex with a
C66x DSP as master?  I'd just like to be able to rule out a fundamental issue with the DSP.

Then of course, if examples exist where the throughput on both RX and TX are good in full duplex,  
then the next question is does anyone have suggestions as to what is the best way to architect  
the system / structure the data requests so as to not lose bandwidth? 

Do I need to manage the data requests in the DSP so that NREADs don't block NWRITEs, or
vice versa?  This would be unfortunate, since the RX and TX data streams are fully independent,
and there is no system requirement to have one side's behavior depend upon the other.  I am
considering changing the RX side of the DSP to be a slave to the FPGA, but that's a fair amount of
design activity I'd prefer to avoid if I don't have to.
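
If request management does turn out to be necessary, one option to experiment with is splitting each transfer into smaller requests and alternating between the two directions, so that neither stream queues all of its requests ahead of the other. The sketch below only builds such an alternating schedule. `dio_req_t` and `interleave_requests` are hypothetical names, not part of the TI SRIO driver, and actually submitting the requests is left out. Whether this helps depends on where the stall really is.

```c
#include <stddef.h>

/* Hypothetical request descriptor, for illustration only. */
typedef enum { REQ_NREAD, REQ_NWRITE } req_kind_t;

typedef struct {
    req_kind_t kind;
    size_t offset;  /* byte offset into that stream's buffer */
    size_t length;  /* bytes covered by this request */
} dio_req_t;

/* Split a read stream and a write stream into chunk-sized requests and
 * alternate between them. Returns the number of entries written to out,
 * which never exceeds max_out. */
static size_t interleave_requests(size_t read_bytes, size_t write_bytes,
                                  size_t chunk, dio_req_t *out, size_t max_out)
{
    size_t r = 0, w = 0, n = 0;

    if (chunk == 0)
        return 0;

    while ((r < read_bytes || w < write_bytes) && n < max_out) {
        if (r < read_bytes) {
            size_t len = (read_bytes - r < chunk) ? read_bytes - r : chunk;
            out[n++] = (dio_req_t){ REQ_NREAD, r, len };
            r += len;
        }
        if (w < write_bytes && n < max_out) {
            size_t len = (write_bytes - w < chunk) ? write_bytes - w : chunk;
            out[n++] = (dio_req_t){ REQ_NWRITE, w, len };
            w += len;
        }
    }
    return n;
}
```

For two 8192-byte streams and a 2048-byte chunk, this yields eight requests in the order NREAD, NWRITE, NREAD, NWRITE, and so on. The chunk size is a tuning knob, and 2048 is just an example value.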

Any insight would be mucho appreciated. 


Dale

  • Hi Dale,

    Please take a look at the thread below:

    http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/p/322606/1154810.aspx#1154810

    I tested the SRIO example project (SRIO_TputBenchmarkingTestProject) in loopback mode and compared the results with the values in the throughput document; the two mostly agree. I have attached the log files for your reference.

    [C66xx_0] ********************************
    [C66xx_0] *********** CONSUMER ***********
    [C66xx_0] ********************************
    [C66xx_0] WARNING: Please ensure that the CONSUMER is executing before running the PRODUCER!!
    [C66xx_0] Debug: Waiting for module reset...
    [C66xx_0] Debug: Waiting for module local reset...
    [C66xx_0] Debug: Waiting for SRIO ports to be operational...  
    [C66xx_0] Debug: SRIO port 0 is operational.
    [C66xx_0] Debug:   Lanes status shows lanes formed as one 4x port
    [C66xx_0] Debug: AppConfig Tx Queue: 0x2a0 Flow Id: 0
    [C66xx_0] Debug: SRIO Driver Instance 0x@00861840 has been created
    [C66xx_0] Debug: Running test in polled mode.
    [C66xx_0] Debug: SRIO Driver handle 0x861840.
    [C66xx_0] 
    [C66xx_0] 
    [C66xx_1] ********************************
    [C66xx_1] *********** PRODUCER ***********
    [C66xx_1] ********************************
    [C66xx_1] WARNING: Please ensure that the CONSUMER is executing before running the PRODUCER!!
    [C66xx_1] Debug(Core 1): Waiting for SRIO to be initialized.
    [C66xx_1] Debug: AppConfig Tx Queue: 0x2a1 Flow Id: 1
    [C66xx_1] Debug: SRIO Driver Instance 0x@00861750 has been created
    [C66xx_1] Debug: Running test in polled mode.
    [C66xx_1] Debug: SRIO Driver handle 0x861750.
    [C66xx_1] 
    [C66xx_1] 
    [C66xx_1] Latency: (DIO_NW, 5.000GBaud, 4X, tab delimited)
    [C66xx_1] Core	Lanes	Speed	Conn	MsgType	PktSize	NumPkts	MnLCycs	AgLCycs	MxLCycs
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NW	4	100	683	684	736
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NW	8	100	683	683	695
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NW	16	100	698	698	710
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NW	32	100	719	728	749
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NW	64	100	791	791	802
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NW	128	100	899	911	917
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NW	256	100	1150	1151	1151
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NW	512	100	1311	1313	1313
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NW	1024	100	1601	1619	1636
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NW	2048	100	2213	2222	2231
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NW	4096	100	3437	3437	3453
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NW	8192	100	5867	5867	5883
    [C66xx_1] 
    [C66xx_0] Throughput: (RX side, DIO_NW, 5.000GBaud, 4X, tab delimited)
    [C66xx_0] Core	Lanes	Speed	Conn	MsgType	OHBytes	PktSize	Pacing	Thruput	PktsSec.	NumPkts 	PktLoss	AgPCycs	AgLCycs	AgICycs	AgOCycs	Seconds
    [C66xx_0] 0	4	5.000	C-I-C	DIO_NW	16	4	0	85.33	2666666.75	12600000	No	375	21	354	0	4.73
    [C66xx_0] 0	4	5.000	C-I-C	DIO_NW	16	8	0	170.67	2666666.75	12600000	No	375	21	354	0	4.73
    [C66xx_0] 0	4	5.000	C-I-C	DIO_NW	16	16	0	341.33	2666666.75	12600000	No	375	21	354	0	4.73
    [C66xx_0] 0	4	5.000	C-I-C	DIO_NW	16	32	0	682.67	2666666.75	12600000	No	375	21	354	0	4.73
    [C66xx_0] 0	4	5.000	C-I-C	DIO_NW	16	64	0	1365.33	2666666.75	12600000	No	375	21	354	0	4.73
    [C66xx_0] 0	4	5.000	C-I-C	DIO_NW	16	128	0	2730.67	2666666.75	12600000	No	375	21	354	0	4.73
    [C66xx_0] 0	4	5.000	C-I-C	DIO_NW	16	256	0	5461.33	2666666.75	12600000	No	375	21	354	0	4.73
    [C66xx_0] 0	4	5.000	C-I-C	DIO_NW	16	512	0	10922.67	2666666.75	12600000	No	375	21	354	0	4.73
    [C66xx_0] 0	4	5.000	C-I-C	DIO_NW	16	1024	0	13191.63	1610306.00	7800000	No	621	21	600	0	4.84
    [C66xx_0] 0	4	5.000	C-I-C	DIO_NW	16	2048	0	13223.57	807102.50	4000000	No	1239	21	1218	0	4.96
    [C66xx_0] 0	4	5.000	C-I-C	DIO_NW	16	4096	0	13277.15	405186.38	2000000	No	2468	21	2447	0	4.94
    [C66xx_0] 0	4	5.000	C-I-C	DIO_NW	16	8192	0	13323.03	203293.36	1200000	No	4919	21	4898	0	5.90
    [C66xx_0] 
    [C66xx_1] Throughput: (TX side, DIO_NW, 5.000GBaud, 4X, tab delimited)
    [C66xx_1] Core	Lanes	Speed	Conn	MsgType	OHBytes	PktSize	Pacing	Thruput	PktsSec.	NumPkts 	PktLoss	AgPCycs	AgLCycs	AgICycs	AgOCycs	Seconds
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NW	16	4	0	85.33	2666666.75	12600000	No	375	319	41	15	4.73
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NW	16	8	0	170.67	2666666.75	12600000	No	375	319	41	15	4.73
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NW	16	16	0	341.33	2666666.75	12600000	No	375	319	41	15	4.73
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NW	16	32	0	682.67	2666666.75	12600000	No	375	319	41	15	4.73
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NW	16	64	0	1365.33	2666666.75	12600000	No	375	319	41	15	4.73
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NW	16	128	0	2730.67	2666666.75	12600000	No	375	319	41	15	4.73
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NW	16	256	0	5461.33	2666666.75	12600000	No	375	319	41	15	4.73
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NW	16	512	0	10922.67	2666666.75	12600000	No	375	319	41	15	4.73
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NW	16	1024	215	13191.63	1610306.00	7800000	No	621	317	289	15	4.84
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NW	16	2048	831	13223.57	807102.50	4000000	No	1239	319	905	15	4.96
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NW	16	4096	2061	13271.77	405022.28	2000000	No	2469	317	2137	15	4.94
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NW	16	8192	4513	13320.33	203252.03	1200000	No	4920	318	4587	15	5.90
    [C66xx_1] 
    [C66xx_1] Latency: (DIO_NR, 5.000GBaud, 4X, tab delimited)
    [C66xx_1] Core	Lanes	Speed	Conn	MsgType	PktSize	NumPkts	MnLCycs	AgLCycs	MxLCycs
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NR	4	100	893	894	983
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NR	8	100	893	895	983
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NR	16	100	893	962	997
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NR	32	100	893	970	1032
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NR	64	100	893	1034	1068
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NR	128	100	893	1185	1214
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NR	256	100	893	1414	1455
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NR	512	100	893	1545	1632
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NR	1024	100	893	1866	1879
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NR	2048	100	893	2470	2505
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NR	4096	100	893	3758	3793
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NR	8192	100	893	6597	6599
    [C66xx_1] 
    [C66xx_1] Throughput: (TX side, DIO_NR, 5.000GBaud, 4X, tab delimited)
    [C66xx_1] Core	Lanes	Speed	Conn	MsgType	OHBytes	PktSize	Pacing	Thruput	PktsSec.	NumPkts	PktLoss	AgPCycs	AgLCycs	AgICycs	AgOCycs	Seconds
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NR	28	4	0	47.20	1474926.25	7200000	No	678	324	339	15	4.88
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NR	28	8	0	94.40	1474926.25	7200000	No	678	324	339	15	4.88
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NR	28	16	0	188.51	1472754.00	7200000	No	679	324	340	15	4.89
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NR	28	32	0	353.59	1381215.50	6800000	No	724	327	382	15	4.93
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NR	28	64	0	656.41	1282051.25	6400000	No	780	324	441	15	5.00
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NR	28	128	0	1137.78	1111111.13	5600000	No	900	324	561	15	5.04
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NR	28	256	0	1771.63	865051.88	4200000	No	1156	325	816	15	4.86
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NR	28	512	0	3133.89	765110.94	3800000	No	1307	324	968	15	4.97
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NR	28	1024	0	5072.45	619195.06	3200000	No	1615	328	1272	15	5.17
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NR	28	2048	0	7383.51	450653.44	2400000	No	2219	323	1881	15	5.33
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NR	28	4096	0	9389.11	286532.94	1600000	No	3490	324	3151	15	5.58
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NR	28	8192	0	10400.89	158704.97	800000	No	6301	322	5964	15	5.04
    [C66xx_1] 
    [C66xx_1] Latency: (Type-11, 5.000GBaud, 4X, tab delimited)
    [C66xx_1] Core	Lanes	Speed	Conn	MsgType	PktSize	NumPkts	MnLCycs	AgLCycs	MxLCycs
    [C66xx_1] 1	4	5.000	C-I-C	Type-11	16	100	1424	1500	1612
    [C66xx_1] 1	4	5.000	C-I-C	Type-11	32	100	1458	1549	1581
    [C66xx_1] 1	4	5.000	C-I-C	Type-11	64	100	1607	1632	1662
    [C66xx_1] 1	4	5.000	C-I-C	Type-11	128	100	1731	1801	1836
    [C66xx_1] 1	4	5.000	C-I-C	Type-11	256	100	2055	2099	2105
    [C66xx_1] 1	4	5.000	C-I-C	Type-11	512	100	2112	2165	2226
    [C66xx_1] 1	4	5.000	C-I-C	Type-11	1024	100	2435	2499	2537
    [C66xx_1] 1	4	5.000	C-I-C	Type-11	2048	100	3191	3207	3230
    [C66xx_1] 1	4	5.000	C-I-C	Type-11	4096	100	4512	4530	4646
    [C66xx_1] 
    [C66xx_0] Throughput: (RX side, Type-11, 5.000GBaud, 4X, tab delimited)
    [C66xx_0] Core	Lanes	Speed	Conn	MsgType	OHBytes	PktSize	Pacing	Thruput	PktsSec.	NumPkts	PktLoss	AgPCycs	AgLCycs	AgICycs	AgOCycs	Seconds
    [C66xx_0] 0	4	5.000	C-I-C	Type-11	24	16	0	175.10	1367989.00	6800000	No	731	589	32	110	4.98
    [C66xx_0] 0	4	5.000	C-I-C	Type-11	24	32	0	346.88	1355013.50	6800000	No	738	596	32	110	5.02
    [C66xx_0] 0	4	5.000	C-I-C	Type-11	24	64	0	696.60	1360544.25	6800000	No	735	593	32	110	5.00
    [C66xx_0] 0	4	5.000	C-I-C	Type-11	24	128	0	1398.91	1366120.25	6800000	No	732	589	33	110	4.98
    [C66xx_0] 0	4	5.000	C-I-C	Type-11	24	256	0	2801.64	1367989.00	6800000	No	731	589	32	110	4.98
    [C66xx_0] 0	4	5.000	C-I-C	Type-11	24	512	0	5565.22	1358695.63	6800000	No	736	591	33	112	5.00
    [C66xx_0] 0	4	5.000	C-I-C	Type-11	24	1024	0	11055.33	1349527.63	6800000	No	741	592	39	110	5.04
    [C66xx_0] 0	4	5.000	C-I-C	Type-11	24	2048	0	11636.36	710227.25	3600000	No	1408	598	700	110	5.07
    [C66xx_0] 0	4	5.000	C-I-C	Type-11	24	4096	0	11640.50	355239.78	1800000	No	2815	594	2110	111	5.07
    [C66xx_0] 
    [C66xx_1] Throughput: (TX side, Type-11, 5.000GBaud, 4X, tab delimited)
    [C66xx_1] Core	Lanes	Speed	Conn	MsgType	OHBytes	PktSize	Pacing	Thruput	PktsSec.	NumPkts	PktLoss	AgPCycs	AgLCycs	AgICycs	AgOCycs	Seconds
    [C66xx_1] 1	4	5.000	C-I-C	Type-11	24	16	214	175.10	1367989.00	6800000	No	731	140	574	17	4.98
    [C66xx_1] 1	4	5.000	C-I-C	Type-11	24	32	214	346.88	1355013.50	6800000	No	738	140	581	17	5.02
    [C66xx_1] 1	4	5.000	C-I-C	Type-11	24	64	230	696.60	1360544.25	6800000	No	735	140	578	17	5.00
    [C66xx_1] 1	4	5.000	C-I-C	Type-11	24	128	215	1398.91	1366120.25	6800000	No	732	140	575	17	4.98
    [C66xx_1] 1	4	5.000	C-I-C	Type-11	24	256	216	2801.64	1367989.00	6800000	No	731	140	574	17	4.98
    [C66xx_1] 1	4	5.000	C-I-C	Type-11	24	512	228	5565.22	1358695.63	6800000	No	736	140	579	17	5.00
    [C66xx_1] 1	4	5.000	C-I-C	Type-11	24	1024	233	11055.33	1349527.63	6800000	No	741	144	580	17	5.04
    [C66xx_1] 1	4	5.000	C-I-C	Type-11	24	2048	0	11644.63	710732.06	3600000	No	1407	145	1245	17	5.07
    [C66xx_1] 1	4	5.000	C-I-C	Type-11	24	4096	0	11640.50	355239.78	1800000	No	2815	144	2654	17	5.07
    [C66xx_1] 
    
      
    [C66xx_0] ********************************
    *********** CONSUMER ***********
    ********************************
    WARNING: Please ensure that the CONSUMER is executing before running the PRODUCER!!
    Debug: Waiting for module reset...
    Debug: Waiting for module local reset...
    Debug: Waiting for SRIO ports to be operational...  
    Debug: SRIO port 0 is operational.
    Debug:   Lanes status shows lanes formed as one 4x port
    Debug: AppConfig Tx Queue: 0x2a0 Flow Id: 0
    Debug: SRIO Driver Instance 0x@00861780 has been created
    Debug: Running test in polled mode.
    Debug: SRIO Driver handle 0x861780.
    
    
    [C66xx_1] ********************************
    *********** PRODUCER ***********
    ********************************
    WARNING: Please ensure that the CONSUMER is executing before running the PRODUCER!!
    Debug(Core 1): Waiting for SRIO to be initialized.
    Debug: AppConfig Tx Queue: 0x2a1 Flow Id: 1
    Debug: SRIO Driver Instance 0x@00861690 has been created
    Debug: Running test in polled mode.
    Debug: SRIO Driver handle 0x861690.
    
    
    Latency: (DIO_NW, 5.000GBaud, 4X, tab delimited)
    Core	Lanes	Speed	Conn	MsgType	PktSize	NumPkts	MnLCycs	AgLCycs	MxLCycs
    1	4	5.000	C-I-C	DIO_NW	4	100	683	699	735
    1	4	5.000	C-I-C	DIO_NW	8	100	683	698	702
    1	4	5.000	C-I-C	DIO_NW	16	100	698	709	717
    1	4	5.000	C-I-C	DIO_NW	32	100	737	737	755
    1	4	5.000	C-I-C	DIO_NW	64	100	791	808	817
    1	4	5.000	C-I-C	DIO_NW	128	100	917	918	935
    1	4	5.000	C-I-C	DIO_NW	256	100	1151	1155	1169
    1	4	5.000	C-I-C	DIO_NW	512	100	1313	1313	1324
    1	4	5.000	C-I-C	DIO_NW	1024	100	1601	1619	1619
    1	4	5.000	C-I-C	DIO_NW	2048	100	2213	2213	2225
    1	4	5.000	C-I-C	DIO_NW	4096	100	3401	3412	3419
    1	4	5.000	C-I-C	DIO_NW	8192	100	5795	5795	5813
    
    [C66xx_0] Throughput: (RX side, DIO_NW, 5.000GBaud, 4X, tab delimited)
    Core	Lanes	Speed	Conn	MsgType	OHBytes	PktSize	Pacing	Thruput	PktsSec.	NumPkts 	PktLoss	AgPCycs	AgLCycs	AgICycs	AgOCycs	Seconds
    0	4	5.000	C-I-C	DIO_NW	16	4	0	85.33	2666666.75	12600000	No	375	21	354	0	4.73
    0	4	5.000	C-I-C	DIO_NW	16	8	0	170.67	2666666.75	12600000	No	375	21	354	0	4.73
    0	4	5.000	C-I-C	DIO_NW	16	16	0	341.33	2666666.75	12600000	No	375	21	354	0	4.73
    0	4	5.000	C-I-C	DIO_NW	16	32	0	682.67	2666666.75	12600000	No	375	21	354	0	4.73
    0	4	5.000	C-I-C	DIO_NW	16	64	0	1365.33	2666666.75	12600000	No	375	21	354	0	4.73
    0	4	5.000	C-I-C	DIO_NW	16	128	0	2730.67	2666666.75	12600000	No	375	21	354	0	4.73
    0	4	5.000	C-I-C	DIO_NW	16	256	0	5461.33	2666666.75	12600000	No	375	21	354	0	4.73
    0	4	5.000	C-I-C	DIO_NW	16	512	0	10922.67	2666666.75	12600000	No	375	21	354	0	4.73
    0	4	5.000	C-I-C	DIO_NW	16	1024	0	13191.63	1610306.00	7800000	No	621	21	600	0	4.84
    0	4	5.000	C-I-C	DIO_NW	16	2048	0	13223.57	807102.50	4000000	No	1239	21	1218	0	4.96
    0	4	5.000	C-I-C	DIO_NW	16	4096	0	13277.15	405186.38	2000000	No	2468	21	2447	0	4.94
    0	4	5.000	C-I-C	DIO_NW	16	8192	0	13323.03	203293.36	1200000	No	4919	21	4898	0	5.90
    
    [C66xx_1] Throughput: (TX side, DIO_NW, 5.000GBaud, 4X, tab delimited)
    Core	Lanes	Speed	Conn	MsgType	OHBytes	PktSize	Pacing	Thruput	PktsSec.	NumPkts 	PktLoss	AgPCycs	AgLCycs	AgICycs	AgOCycs	Seconds
    1	4	5.000	C-I-C	DIO_NW	16	4	0	85.33	2666666.75	12600000	No	375	319	41	15	4.73
    1	4	5.000	C-I-C	DIO_NW	16	8	0	170.67	2666666.75	12600000	No	375	319	41	15	4.73
    1	4	5.000	C-I-C	DIO_NW	16	16	0	341.33	2666666.75	12600000	No	375	319	41	15	4.73
    1	4	5.000	C-I-C	DIO_NW	16	32	0	682.67	2666666.75	12600000	No	375	319	41	15	4.73
    1	4	5.000	C-I-C	DIO_NW	16	64	0	1365.33	2666666.75	12600000	No	375	319	41	15	4.73
    1	4	5.000	C-I-C	DIO_NW	16	128	0	2730.67	2666666.75	12600000	No	375	319	41	15	4.73
    1	4	5.000	C-I-C	DIO_NW	16	256	0	5461.33	2666666.75	12600000	No	375	319	41	15	4.73
    1	4	5.000	C-I-C	DIO_NW	16	512	0	10922.67	2666666.75	12600000	No	375	319	41	15	4.73
    1	4	5.000	C-I-C	DIO_NW	16	1024	215	13191.63	1610306.00	7800000	No	621	317	289	15	4.84
    1	4	5.000	C-I-C	DIO_NW	16	2048	831	13223.57	807102.50	4000000	No	1239	319	905	15	4.96
    1	4	5.000	C-I-C	DIO_NW	16	4096	2061	13271.77	405022.28	2000000	No	2469	317	2137	15	4.94
    1	4	5.000	C-I-C	DIO_NW	16	8192	4513	13320.33	203252.03	1200000	No	4920	318	4587	15	5.90
    
    Latency: (DIO_NR, 5.000GBaud, 4X, tab delimited)
    Core	Lanes	Speed	Conn	MsgType	PktSize	NumPkts	MnLCycs	AgLCycs	MxLCycs
    1	4	5.000	C-I-C	DIO_NR	4	100	924	961	997
    1	4	5.000	C-I-C	DIO_NR	8	100	924	959	997
    1	4	5.000	C-I-C	DIO_NR	16	100	924	963	997
    1	4	5.000	C-I-C	DIO_NR	32	100	924	990	1000
    1	4	5.000	C-I-C	DIO_NR	64	100	924	1049	1081
    1	4	5.000	C-I-C	DIO_NR	128	100	924	1177	1214
    1	4	5.000	C-I-C	DIO_NR	256	100	924	1414	1488
    1	4	5.000	C-I-C	DIO_NR	512	100	924	1547	1587
    1	4	5.000	C-I-C	DIO_NR	1024	100	924	1867	1900
    1	4	5.000	C-I-C	DIO_NR	2048	100	924	2466	2503
    1	4	5.000	C-I-C	DIO_NR	4096	100	924	3702	3762
    1	4	5.000	C-I-C	DIO_NR	8192	100	924	6456	6496
    
    Throughput: (TX side, DIO_NR, 5.000GBaud, 4X, tab delimited)
    Core	Lanes	Speed	Conn	MsgType	OHBytes	PktSize	Pacing	Thruput	PktsSec.	NumPkts	PktLoss	AgPCycs	AgLCycs	AgICycs	AgOCycs	Seconds
    1	4	5.000	C-I-C	DIO_NR	28	4	0	46.11	1440922.25	7200000	No	694	323	356	15	5.00
    1	4	5.000	C-I-C	DIO_NR	28	8	0	92.22	1440922.25	7200000	No	694	323	356	15	5.00
    1	4	5.000	C-I-C	DIO_NR	28	16	0	180.79	1412429.38	6800000	No	708	325	368	15	4.82
    1	4	5.000	C-I-C	DIO_NR	28	32	0	344.09	1344086.00	6800000	No	744	325	404	15	5.06
    1	4	5.000	C-I-C	DIO_NR	28	64	0	635.24	1240694.75	6000000	No	806	328	463	15	4.84
    1	4	5.000	C-I-C	DIO_NR	28	128	0	1102.26	1076426.25	5400000	No	929	325	589	15	5.02
    1	4	5.000	C-I-C	DIO_NR	28	256	0	1750.43	854700.88	4200000	No	1170	328	827	15	4.92
    1	4	5.000	C-I-C	DIO_NR	28	512	0	3075.07	750750.75	3800000	No	1332	330	987	15	5.06
    1	4	5.000	C-I-C	DIO_NR	28	1024	0	5056.79	617283.94	3200000	No	1620	326	1279	15	5.19
    1	4	5.000	C-I-C	DIO_NR	28	2048	0	7400.18	451671.19	2400000	No	2214	328	1871	15	5.31
    1	4	5.000	C-I-C	DIO_NR	28	4096	0	9495.22	289771.09	1600000	No	3451	324	3112	15	5.52
    1	4	5.000	C-I-C	DIO_NR	28	8192	0	10532.95	160720.03	800000	No	6222	325	5882	15	4.98
    
    Latency: (Type-11, 5.000GBaud, 4X, tab delimited)
    Core	Lanes	Speed	Conn	MsgType	PktSize	NumPkts	MnLCycs	AgLCycs	MxLCycs
    1	4	5.000	C-I-C	Type-11	16	100	1451	1473	1589
    1	4	5.000	C-I-C	Type-11	32	100	1455	1509	1581
    1	4	5.000	C-I-C	Type-11	64	100	1539	1599	1663
    1	4	5.000	C-I-C	Type-11	128	100	1763	1796	1839
    1	4	5.000	C-I-C	Type-11	256	100	2051	2094	2181
    1	4	5.000	C-I-C	Type-11	512	100	2106	2158	2183
    1	4	5.000	C-I-C	Type-11	1024	100	2352	2446	2499
    1	4	5.000	C-I-C	Type-11	2048	100	3110	3113	3222
    1	4	5.000	C-I-C	Type-11	4096	100	4465	4514	4530
    
    [C66xx_0] Throughput: (RX side, Type-11, 5.000GBaud, 4X, tab delimited)
    Core	Lanes	Speed	Conn	MsgType	OHBytes	PktSize	Pacing	Thruput	PktsSec.	NumPkts	PktLoss	AgPCycs	AgLCycs	AgICycs	AgOCycs	Seconds
    0	4	5.000	C-I-C	Type-11	24	16	0	172.04	1344086.00	6800000	No	744	601	32	111	5.06
    0	4	5.000	C-I-C	Type-11	24	32	0	342.70	1338688.13	6800000	No	747	604	32	111	5.08
    0	4	5.000	C-I-C	Type-11	24	64	0	693.77	1355013.50	6800000	No	738	595	32	111	5.02
    0	4	5.000	C-I-C	Type-11	24	128	0	1395.10	1362397.88	6800000	No	734	591	32	111	5.00
    0	4	5.000	C-I-C	Type-11	24	256	0	2797.81	1366120.25	6800000	No	732	588	33	111	4.98
    0	4	5.000	C-I-C	Type-11	24	512	0	5565.22	1358695.63	6800000	No	736	593	32	111	5.01
    0	4	5.000	C-I-C	Type-11	24	1024	0	11025.57	1345895.00	6800000	No	743	590	42	111	5.05
    0	4	5.000	C-I-C	Type-11	24	2048	0	11829.60	722021.69	3600000	No	1385	598	676	111	4.99
    0	4	5.000	C-I-C	Type-11	24	4096	0	11838.15	361271.69	1800000	No	2768	596	2061	111	4.98
    
    [C66xx_1] Throughput: (TX side, Type-11, 5.000GBaud, 4X, tab delimited)
    Core	Lanes	Speed	Conn	MsgType	OHBytes	PktSize	Pacing	Thruput	PktsSec.	NumPkts	PktLoss	AgPCycs	AgLCycs	AgICycs	AgOCycs	Seconds
    1	4	5.000	C-I-C	Type-11	24	16	228	172.04	1344086.00	6800000	No	744	141	586	17	5.06
    1	4	5.000	C-I-C	Type-11	24	32	228	342.70	1338688.13	6800000	No	747	141	589	17	5.08
    1	4	5.000	C-I-C	Type-11	24	64	226	693.77	1355013.50	6800000	No	738	141	580	17	5.02
    1	4	5.000	C-I-C	Type-11	24	128	214	1395.10	1362397.88	6800000	No	734	141	576	17	5.00
    1	4	5.000	C-I-C	Type-11	24	256	214	2797.81	1366120.25	6800000	No	732	141	574	17	4.98
    1	4	5.000	C-I-C	Type-11	24	512	228	5565.22	1358695.63	6800000	No	736	141	578	17	5.01
    1	4	5.000	C-I-C	Type-11	24	1024	228	11025.57	1345895.00	6800000	No	743	145	581	17	5.05
    1	4	5.000	C-I-C	Type-11	24	2048	0	11829.60	722021.69	3600000	No	1385	147	1221	17	4.99
    1	4	5.000	C-I-C	Type-11	24	4096	0	11838.15	361271.69	1800000	No	2768	146	2605	17	4.98
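
When reading the throughput tables above, it helps to see how the columns relate to each other. The sketch below reproduces the PktsSec. and Thruput columns from AgPCycs and PktSize. It assumes a 1 GHz core clock, which is inferred from the logged numbers rather than stated in the log, and the function names are illustrative.

```c
#include <stdio.h>

/* Assumed core clock of 1 GHz, inferred from the log (not stated in it). */
#define CPU_HZ 1.0e9

/* Packets per second from the average cycles per packet (AgPCycs column). */
static double pkts_per_sec(double avg_pkt_cycles)
{
    return CPU_HZ / avg_pkt_cycles;
}

/* Payload throughput in Mbit/s, as reported in the Thruput column. */
static double thruput_mbps(int pkt_size_bytes, double pkts_sec)
{
    return pkt_size_bytes * 8.0 * pkts_sec / 1.0e6;
}
```

For the 8192-byte DIO_NW row, `thruput_mbps(8192, pkts_per_sec(4919))` comes to about 13323 Mbit/s, in line with the logged value. Note that these runs use 5.000 GBaud lanes, so the absolute figures are not directly comparable to a 3.125 Gbaud link.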
    
    

    Thanks,

  • You can change the DDR configuration.

    Look at table 4-1 of SPRUGV8D (April 2014), the KeyStone Architecture DDR3 Memory Controller User Guide.  It lists all the registers that affect the priority of DDR accesses. In particular, look at line 120h (section 4.30) for the read/write threshold.  This may help in your case.

     

    Ran

  • Hi Ganapathi,

    Thanks for the fast response.  The throughput numbers you obtained with the benchmark are encouraging.  I'll look further into it / try the DSP in loopback to see what kind of throughput I can get.

    Cheers,

    Dale

  • Hi Ran,

    Not sure where our bottleneck is -> could very well be the DDR.  Will check it out.  Thanks.


    Cheers,

    Dale

  • Hi again,

    I'll mark this as closed, since I need to dig into the suggestions given and look for
    insight into the bottleneck.  We have a growing suspicion that the way we access DDR
    is the problem.

    Thanks for the help!

    Dale