Increase SRIO performance (C6678)

I'm running some tests on the SRIO connection between two C6678 DSPs.

I've configured the peripheral for 5 Gbps per lane. Allowing for 8b/10b encoding, that gives an effective rate of 4 Gbps per lane.
In 4-lane mode the theoretical speed is 16 Gbps = 2 GB/s.

In my tests I've measured an effective speed of 1.568 GB/s.

Is there any way to improve this speed?
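
As a quick check of the arithmetic above, the link budget can be sketched in a few lines of Python. The constants are the figures quoted in the post. The function and variable names are made up for illustration, and the result ignores all packet and protocol overhead, so it is an upper bound rather than an achievable target.

```python
# Back-of-the-envelope SRIO link budget using the figures quoted in the post.
LANE_RATE_GBAUD = 5.0          # configured per-lane signalling rate
ENCODING_EFFICIENCY = 8 / 10   # 8b/10b: 8 data bits carried per 10 line bits
LANES = 4                      # 4x port
MEASURED_GBYTES_PER_S = 1.568  # throughput measured on the board


def raw_data_rate_gbps(lane_rate_gbaud, lanes, efficiency=ENCODING_EFFICIENCY):
    """Data rate after line encoding, before any packet overhead."""
    return lane_rate_gbaud * efficiency * lanes


link_gbps = raw_data_rate_gbps(LANE_RATE_GBAUD, LANES)  # Gbps across all lanes
link_gbytes = link_gbps / 8                             # GB/s
utilisation = MEASURED_GBYTES_PER_S / link_gbytes       # fraction of the link used

print(f"link: {link_gbps:.1f} Gbps = {link_gbytes:.2f} GB/s")
print(f"measured: {MEASURED_GBYTES_PER_S * 8:.3f} Gbps ({utilisation:.1%} of link)")
```

With these inputs the measured rate works out to roughly 78% of the encoded link rate, so the remaining gap is what packet headers, flow control and software overhead have to account for.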

  • Hi Mark,

    Take a look at the Throughput Performance Guide for C66x KeyStone Devices (Rev. A). This will give you the maximum throughput measured in an ideal test environment. This will give you a benchmark to compare with your measurements.

    Regards, Bill

  • Thanks Bill.

    I've seen that with DirectIO the optimal performance should be about 14 Gb/s, while on my board I'm measuring 12.548 Gb/s.

    I'd like to improve this. Is there any way to optimize it?

  • Hi Mark,

    What payload size are you using for your testing?

    Regards, Bill

  • Hi Bill.

    I've run several tests to find the optimal payload size, also taking into account the application that will use the SRIO peripheral.

    So far I've tested payloads of 8 KB, 16 KB, 32 KB, 64 KB, 128 KB and 256 KB, transmitting from L2 of the first DSP to L2 of the second DSP, and the result is always the same: 12.548 Gb/s.
    (Most probably the firmware will use the 16 KB payload size.)

  • Hi Mark,

    I'm not surprised that you're seeing the same data rate for all the payload sizes you've tested since they all exceed the maximum SRIO message payload size. All your transfers will be segmented into the same number of SRIO messages with an equal amount of overhead and you'll be in the flat part of the throughput curve. There might be some optimizations to your code that could increase your throughput. I'll nudge the software experts to see if they can help. 

    Regards, Bill
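
The segmentation argument can be illustrated with a toy model. This is only a sketch under two assumptions that are not stated in the reply: a 256-byte maximum payload per RapidIO packet, and a fixed 16 bytes of overhead per packet (the value shown in the OHBytes column for DIO_NW in the benchmark log posted later in this thread). The function name `segment` is hypothetical. The model ignores control symbols, LSU setup and memory latency, so it shows why the curve is flat, not what the absolute throughput should be.

```python
import math

MAX_PACKET_PAYLOAD = 256  # bytes; assumed maximum payload of one RapidIO packet
OVERHEAD_BYTES = 16       # bytes; assumed per-packet overhead (OHBytes for DIO_NW)


def segment(transfer_bytes, max_payload=MAX_PACKET_PAYLOAD, overhead=OVERHEAD_BYTES):
    """Return (packet_count, payload_efficiency) for one large transfer."""
    packets = math.ceil(transfer_bytes / max_payload)
    wire_bytes = transfer_bytes + packets * overhead
    return packets, transfer_bytes / wire_bytes


# The six payload sizes tested in the thread.
for size_kb in (8, 16, 32, 64, 128, 256):
    packets, efficiency = segment(size_kb * 1024)
    print(f"{size_kb:>3} KB -> {packets:>4} packets, payload efficiency {efficiency:.4f}")
```

Every size in the list is a whole multiple of the assumed packet payload, so the packet count scales linearly with the transfer size and the payload-to-wire ratio is identical in all six cases, which matches the constant rate reported above.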

  • Hi Mark,

    I have tested the SRIO example project (SRIO_TputBenchmarkingTestProject) in loopback mode and compared the results with the values in the throughput document; the two mostly agree. I have attached the log file for your reference.

    [C66xx_0] ********************************
    [C66xx_0] *********** CONSUMER ***********
    [C66xx_0] ********************************
    [C66xx_0] WARNING: Please ensure that the CONSUMER is executing before running the PRODUCER!!
    [C66xx_0] Debug: Waiting for module reset...
    [C66xx_0] Debug: Waiting for module local reset...
    [C66xx_0] Debug: Waiting for SRIO ports to be operational...  
    [C66xx_0] Debug: SRIO port 0 is operational.
    [C66xx_0] Debug:   Lanes status shows lanes formed as one 4x port
    [C66xx_0] Debug: AppConfig Tx Queue: 0x2a0 Flow Id: 0
    [C66xx_0] Debug: SRIO Driver Instance 0x@00861840 has been created
    [C66xx_0] Debug: Running test in polled mode.
    [C66xx_0] Debug: SRIO Driver handle 0x861840.
    [C66xx_0] 
    [C66xx_0] 
    [C66xx_1] ********************************
    [C66xx_1] *********** PRODUCER ***********
    [C66xx_1] ********************************
    [C66xx_1] WARNING: Please ensure that the CONSUMER is executing before running the PRODUCER!!
    [C66xx_1] Debug(Core 1): Waiting for SRIO to be initialized.
    [C66xx_1] Debug: AppConfig Tx Queue: 0x2a1 Flow Id: 1
    [C66xx_1] Debug: SRIO Driver Instance 0x@00861750 has been created
    [C66xx_1] Debug: Running test in polled mode.
    [C66xx_1] Debug: SRIO Driver handle 0x861750.
    [C66xx_1] 
    [C66xx_1] 
    [C66xx_1] Latency: (DIO_NW, 5.000GBaud, 4X, tab delimited)
    [C66xx_1] Core	Lanes	Speed	Conn	MsgType	PktSize	NumPkts	MnLCycs	AgLCycs	MxLCycs
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NW	4	100	683	684	736
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NW	8	100	683	683	695
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NW	16	100	698	698	710
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NW	32	100	719	728	749
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NW	64	100	791	791	802
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NW	128	100	899	911	917
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NW	256	100	1150	1151	1151
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NW	512	100	1311	1313	1313
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NW	1024	100	1601	1619	1636
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NW	2048	100	2213	2222	2231
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NW	4096	100	3437	3437	3453
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NW	8192	100	5867	5867	5883
    [C66xx_1] 
    [C66xx_0] Throughput: (RX side, DIO_NW, 5.000GBaud, 4X, tab delimited)
    [C66xx_0] Core	Lanes	Speed	Conn	MsgType	OHBytes	PktSize	Pacing	Thruput	PktsSec.	NumPkts 	PktLoss	AgPCycs	AgLCycs	AgICycs	AgOCycs	Seconds
    [C66xx_0] 0	4	5.000	C-I-C	DIO_NW	16	4	0	85.33	2666666.75	12600000	No	375	21	354	0	4.73
    [C66xx_0] 0	4	5.000	C-I-C	DIO_NW	16	8	0	170.67	2666666.75	12600000	No	375	21	354	0	4.73
    [C66xx_0] 0	4	5.000	C-I-C	DIO_NW	16	16	0	341.33	2666666.75	12600000	No	375	21	354	0	4.73
    [C66xx_0] 0	4	5.000	C-I-C	DIO_NW	16	32	0	682.67	2666666.75	12600000	No	375	21	354	0	4.73
    [C66xx_0] 0	4	5.000	C-I-C	DIO_NW	16	64	0	1365.33	2666666.75	12600000	No	375	21	354	0	4.73
    [C66xx_0] 0	4	5.000	C-I-C	DIO_NW	16	128	0	2730.67	2666666.75	12600000	No	375	21	354	0	4.73
    [C66xx_0] 0	4	5.000	C-I-C	DIO_NW	16	256	0	5461.33	2666666.75	12600000	No	375	21	354	0	4.73
    [C66xx_0] 0	4	5.000	C-I-C	DIO_NW	16	512	0	10922.67	2666666.75	12600000	No	375	21	354	0	4.73
    [C66xx_0] 0	4	5.000	C-I-C	DIO_NW	16	1024	0	13191.63	1610306.00	7800000	No	621	21	600	0	4.84
    [C66xx_0] 0	4	5.000	C-I-C	DIO_NW	16	2048	0	13223.57	807102.50	4000000	No	1239	21	1218	0	4.96
    [C66xx_0] 0	4	5.000	C-I-C	DIO_NW	16	4096	0	13277.15	405186.38	2000000	No	2468	21	2447	0	4.94
    [C66xx_0] 0	4	5.000	C-I-C	DIO_NW	16	8192	0	13323.03	203293.36	1200000	No	4919	21	4898	0	5.90
    [C66xx_0] 
    [C66xx_1] Throughput: (TX side, DIO_NW, 5.000GBaud, 4X, tab delimited)
    [C66xx_1] Core	Lanes	Speed	Conn	MsgType	OHBytes	PktSize	Pacing	Thruput	PktsSec.	NumPkts 	PktLoss	AgPCycs	AgLCycs	AgICycs	AgOCycs	Seconds
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NW	16	4	0	85.33	2666666.75	12600000	No	375	319	41	15	4.73
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NW	16	8	0	170.67	2666666.75	12600000	No	375	319	41	15	4.73
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NW	16	16	0	341.33	2666666.75	12600000	No	375	319	41	15	4.73
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NW	16	32	0	682.67	2666666.75	12600000	No	375	319	41	15	4.73
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NW	16	64	0	1365.33	2666666.75	12600000	No	375	319	41	15	4.73
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NW	16	128	0	2730.67	2666666.75	12600000	No	375	319	41	15	4.73
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NW	16	256	0	5461.33	2666666.75	12600000	No	375	319	41	15	4.73
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NW	16	512	0	10922.67	2666666.75	12600000	No	375	319	41	15	4.73
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NW	16	1024	215	13191.63	1610306.00	7800000	No	621	317	289	15	4.84
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NW	16	2048	831	13223.57	807102.50	4000000	No	1239	319	905	15	4.96
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NW	16	4096	2061	13271.77	405022.28	2000000	No	2469	317	2137	15	4.94
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NW	16	8192	4513	13320.33	203252.03	1200000	No	4920	318	4587	15	5.90
    [C66xx_1] 
    [C66xx_1] Latency: (DIO_NR, 5.000GBaud, 4X, tab delimited)
    [C66xx_1] Core	Lanes	Speed	Conn	MsgType	PktSize	NumPkts	MnLCycs	AgLCycs	MxLCycs
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NR	4	100	893	894	983
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NR	8	100	893	895	983
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NR	16	100	893	962	997
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NR	32	100	893	970	1032
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NR	64	100	893	1034	1068
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NR	128	100	893	1185	1214
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NR	256	100	893	1414	1455
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NR	512	100	893	1545	1632
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NR	1024	100	893	1866	1879
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NR	2048	100	893	2470	2505
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NR	4096	100	893	3758	3793
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NR	8192	100	893	6597	6599
    [C66xx_1] 
    [C66xx_1] Throughput: (TX side, DIO_NR, 5.000GBaud, 4X, tab delimited)
    [C66xx_1] Core	Lanes	Speed	Conn	MsgType	OHBytes	PktSize	Pacing	Thruput	PktsSec.	NumPkts	PktLoss	AgPCycs	AgLCycs	AgICycs	AgOCycs	Seconds
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NR	28	4	0	47.20	1474926.25	7200000	No	678	324	339	15	4.88
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NR	28	8	0	94.40	1474926.25	7200000	No	678	324	339	15	4.88
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NR	28	16	0	188.51	1472754.00	7200000	No	679	324	340	15	4.89
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NR	28	32	0	353.59	1381215.50	6800000	No	724	327	382	15	4.93
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NR	28	64	0	656.41	1282051.25	6400000	No	780	324	441	15	5.00
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NR	28	128	0	1137.78	1111111.13	5600000	No	900	324	561	15	5.04
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NR	28	256	0	1771.63	865051.88	4200000	No	1156	325	816	15	4.86
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NR	28	512	0	3133.89	765110.94	3800000	No	1307	324	968	15	4.97
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NR	28	1024	0	5072.45	619195.06	3200000	No	1615	328	1272	15	5.17
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NR	28	2048	0	7383.51	450653.44	2400000	No	2219	323	1881	15	5.33
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NR	28	4096	0	9389.11	286532.94	1600000	No	3490	324	3151	15	5.58
    [C66xx_1] 1	4	5.000	C-I-C	DIO_NR	28	8192	0	10400.89	158704.97	800000	No	6301	322	5964	15	5.04
    [C66xx_1] 
    [C66xx_1] Latency: (Type-11, 5.000GBaud, 4X, tab delimited)
    [C66xx_1] Core	Lanes	Speed	Conn	MsgType	PktSize	NumPkts	MnLCycs	AgLCycs	MxLCycs
    [C66xx_1] 1	4	5.000	C-I-C	Type-11	16	100	1424	1500	1612
    [C66xx_1] 1	4	5.000	C-I-C	Type-11	32	100	1458	1549	1581
    [C66xx_1] 1	4	5.000	C-I-C	Type-11	64	100	1607	1632	1662
    [C66xx_1] 1	4	5.000	C-I-C	Type-11	128	100	1731	1801	1836
    [C66xx_1] 1	4	5.000	C-I-C	Type-11	256	100	2055	2099	2105
    [C66xx_1] 1	4	5.000	C-I-C	Type-11	512	100	2112	2165	2226
    [C66xx_1] 1	4	5.000	C-I-C	Type-11	1024	100	2435	2499	2537
    [C66xx_1] 1	4	5.000	C-I-C	Type-11	2048	100	3191	3207	3230
    [C66xx_1] 1	4	5.000	C-I-C	Type-11	4096	100	4512	4530	4646
    [C66xx_1] 
    [C66xx_0] Throughput: (RX side, Type-11, 5.000GBaud, 4X, tab delimited)
    [C66xx_0] Core	Lanes	Speed	Conn	MsgType	OHBytes	PktSize	Pacing	Thruput	PktsSec.	NumPkts	PktLoss	AgPCycs	AgLCycs	AgICycs	AgOCycs	Seconds
    [C66xx_0] 0	4	5.000	C-I-C	Type-11	24	16	0	175.10	1367989.00	6800000	No	731	589	32	110	4.98
    [C66xx_0] 0	4	5.000	C-I-C	Type-11	24	32	0	346.88	1355013.50	6800000	No	738	596	32	110	5.02
    [C66xx_0] 0	4	5.000	C-I-C	Type-11	24	64	0	696.60	1360544.25	6800000	No	735	593	32	110	5.00
    [C66xx_0] 0	4	5.000	C-I-C	Type-11	24	128	0	1398.91	1366120.25	6800000	No	732	589	33	110	4.98
    [C66xx_0] 0	4	5.000	C-I-C	Type-11	24	256	0	2801.64	1367989.00	6800000	No	731	589	32	110	4.98
    [C66xx_0] 0	4	5.000	C-I-C	Type-11	24	512	0	5565.22	1358695.63	6800000	No	736	591	33	112	5.00
    [C66xx_0] 0	4	5.000	C-I-C	Type-11	24	1024	0	11055.33	1349527.63	6800000	No	741	592	39	110	5.04
    [C66xx_0] 0	4	5.000	C-I-C	Type-11	24	2048	0	11636.36	710227.25	3600000	No	1408	598	700	110	5.07
    [C66xx_0] 0	4	5.000	C-I-C	Type-11	24	4096	0	11640.50	355239.78	1800000	No	2815	594	2110	111	5.07
    [C66xx_0] 
    [C66xx_1] Throughput: (TX side, Type-11, 5.000GBaud, 4X, tab delimited)
    [C66xx_1] Core	Lanes	Speed	Conn	MsgType	OHBytes	PktSize	Pacing	Thruput	PktsSec.	NumPkts	PktLoss	AgPCycs	AgLCycs	AgICycs	AgOCycs	Seconds
    [C66xx_1] 1	4	5.000	C-I-C	Type-11	24	16	214	175.10	1367989.00	6800000	No	731	140	574	17	4.98
    [C66xx_1] 1	4	5.000	C-I-C	Type-11	24	32	214	346.88	1355013.50	6800000	No	738	140	581	17	5.02
    [C66xx_1] 1	4	5.000	C-I-C	Type-11	24	64	230	696.60	1360544.25	6800000	No	735	140	578	17	5.00
    [C66xx_1] 1	4	5.000	C-I-C	Type-11	24	128	215	1398.91	1366120.25	6800000	No	732	140	575	17	4.98
    [C66xx_1] 1	4	5.000	C-I-C	Type-11	24	256	216	2801.64	1367989.00	6800000	No	731	140	574	17	4.98
    [C66xx_1] 1	4	5.000	C-I-C	Type-11	24	512	228	5565.22	1358695.63	6800000	No	736	140	579	17	5.00
    [C66xx_1] 1	4	5.000	C-I-C	Type-11	24	1024	233	11055.33	1349527.63	6800000	No	741	144	580	17	5.04
    [C66xx_1] 1	4	5.000	C-I-C	Type-11	24	2048	0	11644.63	710732.06	3600000	No	1407	145	1245	17	5.07
    [C66xx_1] 1	4	5.000	C-I-C	Type-11	24	4096	0	11640.50	355239.78	1800000	No	2815	144	2654	17	5.07
    [C66xx_1] 
    

    I think the existing example project is the most efficient code for an SRIO performance test. Please use the MCSDK SRIO example code for your testing and let me know the results.

    Thanks,
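
For anyone comparing their own numbers against the log above, the throughput rows can be cross-checked with a short script. This is a sketch, not part of the MCSDK example: the field names are taken from the log header, the two sample rows are copied from the RX-side DIO_NW table, and the 1 GHz core clock is an assumption inferred from the fact that cycles per packet and packets per second only agree at that frequency. The Thruput column is treated as Mbps, which is what the arithmetic is consistent with.

```python
CPU_HZ = 1_000_000_000  # assumption: 1 GHz core clock, inferred from the log
LINK_MBPS = 16_000      # 4 lanes x 5 GBaud x 8/10 encoding

FIELDS = ("Core Lanes Speed Conn MsgType OHBytes PktSize Pacing Thruput "
          "PktsSec NumPkts PktLoss AgPCycs AgLCycs AgICycs AgOCycs Seconds").split()

# Two RX-side DIO_NW rows copied from the log (tab delimited).
ROWS = [
    "0\t4\t5.000\tC-I-C\tDIO_NW\t16\t256\t0\t5461.33\t2666666.75\t12600000\tNo\t375\t21\t354\t0\t4.73",
    "0\t4\t5.000\tC-I-C\tDIO_NW\t16\t8192\t0\t13323.03\t203293.36\t1200000\tNo\t4919\t21\t4898\t0\t5.90",
]


def parse_row(line):
    """Map one tab-delimited throughput row onto the header field names."""
    return dict(zip(FIELDS, line.split("\t")))


def summarise(row):
    """Recompute throughput two ways and report the link utilisation."""
    size = int(row["PktSize"])
    reported = float(row["Thruput"])                              # Mbps, as logged
    from_rate = size * 8 * float(row["PktsSec"]) / 1e6            # Mbps from packets/s
    from_cycles = size * 8 * CPU_HZ / int(row["AgPCycs"]) / 1e6   # Mbps from cycles/packet
    return size, reported, from_rate, from_cycles, reported / LINK_MBPS


for line in ROWS:
    size, reported, from_rate, from_cycles, used = summarise(parse_row(line))
    print(f"{size:>5} B: logged {reported:.2f}, from pkts/s {from_rate:.2f}, "
          f"from cycles {from_cycles:.2f} Mbps, {used:.1%} of link")
```

Both recomputed values match the logged throughput, and the 8192-byte DIO_NW row comes out at roughly 83% of the 16 Gbps encoded link rate. That makes it a reasonable reference point for a measurement taken on custom hardware.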