How to improve C6678 EVM Ethernet speed

Other Parts Discussed in Thread: SYSBIOS

Based on the demo image-processing project (IPC version), my measured Ethernet throughput is ~55 Mb/s when transmitting data from the DSP to the PC and ~220 Mb/s when receiving data from the PC.

Does anyone have experience improving the Ethernet speed on the C6678 EVM board?

Thanks a lot

  • Someone must be using a 1.0 Gb connection with the C6678 EVM; what Ethernet speed are you getting?

    Since my receive speed is 4x my transmit speed, there must be some way to improve the TX speed. Can anyone give some ideas about where the problem might be: the configuration, the queue number, etc.?

    Thanks

  • Increasing the TX buffer size from 8192 to 64000 doubles the speed to ~100 Mb/s.

    Increasing the RX buffer size has no effect.

    Increasing PKT_NUM_FRAMEBUF in pbm_data.c also has no effect.

    Could the TI experts give any suggestions?

    Thanks

  • Hi,

    Any idea what could be causing this? I'm getting similar results.

    Thanks

  • Hi,

    There are multiple problems with the NIMU driver for the C6678:

    If you increase the TCP transmit buffer to 32 kB or more, the driver will start dropping packets, because the TX packet queue can hold only 16 packets.
    See also:
    http://e2e.ti.com/support/embedded/bios/f/355/p/253488/891759.aspx
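
    One way to avoid this (a sketch only, not necessarily the exact fix from the linked thread) is to make EmacSend() in nimu_eth.c wait briefly for a free TX descriptor instead of silently dropping the packet. The queue handle and QMSS calls below follow the PDK NIMU example and may differ in your version:

    #include <stdint.h>
    #include <ti/drv/qmss/qmss_drv.h>      /* Qmss_getQueueEntryCount, Qmss_queuePop */
    #include <ti/drv/cppi/cppi_desc.h>     /* Cppi_HostDesc                          */
    #include <ti/sysbios/knl/Task.h>       /* Task_sleep                             */

    extern Qmss_QueueHnd gTxFreeQHnd;      /* free TX descriptor queue (nimu_eth.c)  */

    /* Pop a free TX host descriptor, waiting (bounded) while the free queue is
     * empty so that a full queue does not turn into a dropped packet. */
    static Cppi_HostDesc* waitForFreeTxDesc(void)
    {
        uint32_t retries = 0;

        while ((Qmss_getQueueEntryCount(gTxFreeQHnd) == 0) && (retries++ < 100)) {
            Task_sleep(1);                 /* let the CPSW recycle descriptors; briefly blocks the caller */
        }

        /* The low 4 bits of the popped address encode a size hint; mask them off. */
        return (Cppi_HostDesc *)(((uint32_t)Qmss_queuePop(gTxFreeQHnd)) & ~0xFu);
    }

    If this still returns NULL, EmacSend() should report an error to the stack rather than dropping the packet.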

    The NIMU driver uses the CSL cache API, which doesn't implement a workaround for a silicon bug:
    http://e2e.ti.com/support/embedded/bios/f/355/t/253237.aspx

    There is also a potential problem with the prefetch buffer:
    http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/p/214649/762847.aspx

    After fixing these issues and making some other improvements, I'm getting about 920 Mbit/s for TCP transmission.

    Ralf

  • Hi Ralf, can you explain exactly how you changed the driver to use the SYS/BIOS API instead of the CSL API?

    Can I just replace the CSL cache functions with the analogous SYS/BIOS functions?


    Thanks

  • Hi,

    First, you have to include:

    #include <ti/sysbios/family/c66/Cache.h>

    Then, you can replace the function call.
    Example:

    Cache_inv((void *)pHostDesc, sizeof(Cppi_HostDesc), Cache_Type_L2, 1);

    Ralf

  • Thanks Ralf.

    I changed the cache API calls and added the prefetch buffer invalidate, rebuilt the corresponding NDK library, and recompiled my .out. Did you get a significant improvement by doing that?

    Here's a snippet of my code; did I change the cache API correctly?

    /* Invalidate cache based on where the memory is */
    if ((uint32_t)(pHostDesc) & EMAC_EXTMEM) {
        // CACHE_invL2((void *)pHostDesc, sizeof(Cppi_HostDesc), CACHE_WAIT);
        Cache_inv((void *)pHostDesc, sizeof(Cppi_HostDesc), Cache_Type_L2, CACHE_WAIT);
    }

    if ((uint32_t)(pHostDesc) & EMAC_MSMCSRAM) {
        // CACHE_invL1d((void *)pHostDesc, sizeof(Cppi_HostDesc), CACHE_WAIT);
        Cache_inv((void *)pHostDesc, sizeof(Cppi_HostDesc), Cache_Type_L1D, CACHE_WAIT);
    }

    if ((uint32_t)(pHostDesc->buffPtr) & EMAC_EXTMEM) {
        CSL_XMC_invalidatePrefetchBuffer();

        // CACHE_invL2((void *)pHostDesc->buffPtr, pHostDesc->buffLen, CACHE_WAIT);
        Cache_inv((void *)pHostDesc->buffPtr, pHostDesc->buffLen, Cache_Type_L2, CACHE_WAIT);
    }

    I only got 4 or 5 Mbit/s more by doing that. Can you give more tips on what you have done? It seems like you're the only person on the forum who has gotten a decent Ethernet speed.

    Thanks in advance.

  • Don't forget to change the other cache calls for 'accum_list_ptr' and replace CACHE_wbL2() too.
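
    For example (a sketch only; take the actual pointer and size arguments from the original CACHE_invL2()/CACHE_wbL2() calls in nimu_eth.c, 'listSize' is just a placeholder here):

    Cache_inv((void *)accum_list_ptr, listSize, Cache_Type_L2, 1);
    Cache_wb ((void *)accum_list_ptr, listSize, Cache_Type_L2, 1);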

    The next thing you can do is to decrease the interrupt delay in Setup_Rx():

    accCfg.timerLoadCount = 5;
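
    For context (a sketch, not the complete configuration): accCfg is the accumulator command structure that Setup_Rx() already fills in and passes to the QMSS PDSP, and a smaller timerLoadCount simply makes the RX interrupt fire sooner:

    Qmss_AccCmdCfg accCfg;
    /* ...all other fields stay exactly as Setup_Rx() sets them... */
    accCfg.timerLoadCount = 5;                           /* driver default is 40 */
    Qmss_programAccumulator(Qmss_PdspId_PDSP1, &accCfg); /* PDSP used by the driver's RX accumulator */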

    After this, I would try to increase the TCP transmit and receive buffer size (e.g. 64000). But first you have to make sure that the driver doesn't drop any outgoing packets if the transmit buffer is full:
    http://e2e.ti.com/support/embedded/bios/f/355/p/253488/891759.aspx#891759

    I also would change the ratio between RX and TX descriptors to get more room for outgoing packets. Simply put the following lines underneath the included header files:

    #if NIMU_NUM_TX_DESC + NIMU_NUM_RX_DESC != 126
    #error Sum of NIMU_NUM_TX_DESC and NIMU_NUM_RX_DESC must be same as defined in resource_mgr.h
    #endif
    #undef NIMU_NUM_TX_DESC
    #undef NIMU_NUM_RX_DESC
    #define NIMU_NUM_TX_DESC                48u /**< Maximum number of TX descriptors used by NIMU */
    #define NIMU_NUM_RX_DESC                78u /**< Maximum number of RX descriptors used by NIMU */

    Another modification I didn't mention yet is that I'm directly using PBM buffers as buffers for the RX descriptors. This removes an additional copy operation for each received packet in EmacRxPktISR(). But this change is more complicated.

    Ralf

  • I had already changed the cache calls for accum_list_ptr and replaced CACHE_wbL2().

    It's weird what's happening with the interrupt delay. If I change it from 40 (the default) to 5, the speed is almost the same. So I tried 2 ticks, and the speed increased by ~150 Mbit/s. Then I tried 1 tick and the speed increased by ~180 Mbit/s. Is that normal? Anyway, thanks for that tip, it's better now, but I'm hitting 480 Mbit/s, still far from what it should be.

    I tried the other suggestions too, but the speed didn't change very much.

    About the PBM buffers: did you replace all the buffer functions with the corresponding PBM packet functions? Or something like that?


    Thanks again!

  • Can you verify with Wireshark the window size the NDK is using? It seems that it is still using a smaller window, like 8 kB, which would require more interrupts.

    About the PBM Buffers:
    What the driver normally does is that it allocates memory for the RX descriptors in Setup_Rx(). Then, if it receives a packet, it allocates a PBM Buffer and copies the descriptor buffer into the PBM buffer, which is then processed by the stack.
    In my version, I'm directly allocating PBM buffers used by the RX descriptors. If a new packet arrives, I can directly give the PBM buffer to the stack. I only have to allocate a new PBM buffer and assign it to the RX descriptor.
    I think this modification will only lower the CPU load a little bit.
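
    In code, the ISR side of that change looks roughly like this (a sketch only, not my exact code; RX_BUF_SIZE is a placeholder, and using softwareInfo0 to remember the PBM handle is just one way to do it):

    #include <ti/ndk/inc/stkmain.h>            /* PBM_alloc, PBM_getDataBuffer, ... */

    #define RX_BUF_SIZE 1536u                  /* placeholder: use the driver's RX buffer size */

    /* In EmacRxPktISR(), for each filled RX descriptor: */
    PBM_Handle hPkt = (PBM_Handle)pHostDesc->softwareInfo0;   /* attached in Setup_Rx() */

    PBM_setValidLen(hPkt, pHostDesc->buffLen); /* length written by the CPSW */
    /* ...then hand hPkt to the stack via PBM_setIFRx()/PBMQ_enq()/STKEVENT_signal(),
     * exactly like the original ISR, just without the memcpy(). */

    /* Re-arm the descriptor with a fresh PBM buffer for the next packet. */
    PBM_Handle hNew = PBM_alloc(RX_BUF_SIZE);  /* check for NULL in real code */
    pHostDesc->buffPtr       = (uint32_t)PBM_getDataBuffer(hNew);
    pHostDesc->origBuffPtr   = pHostDesc->buffPtr;
    pHostDesc->origBufferLen = RX_BUF_SIZE;
    pHostDesc->softwareInfo0 = (uint32_t)hNew;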

    Ralf

  • I checked the window size, and it is indeed 8 kB (see the figure below).

    Is it possible to change this size to a more suitable value?

    Thanks!

  • If you are using the legacy API to configure the NDK, you can set the RX and TX buffers like this:

        int rc = 64000;
        // TCP Transmit buffer size
        CfgAddEntry( hCfg, CFGTAG_IP, CFGITEM_IP_SOCKTCPTXBUF,
                     CFG_ADDMODE_UNIQUE, sizeof(uint), (UINT8 *)&rc, 0 );

        // TCP Receive buffer size (copy mode)
        CfgAddEntry( hCfg, CFGTAG_IP, CFGITEM_IP_SOCKTCPRXBUF,
                     CFG_ADDMODE_UNIQUE, sizeof(uint), (UINT8 *)&rc, 0 );

    If you are using RTSC instead, it should work like this:

    var Tcp = xdc.useModule('ti.ndk.config.Tcp');
    Tcp.transmitBufSize = 64000;
    Tcp.receiveBufSize = 64000;
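
    The same sizes can also be set per socket with setsockopt(), if you only want the larger buffers on the data connection (a sketch; 's' is your TCP data socket):

    int bufSize = 64000;

    /* Per-socket override of the TCP send/receive buffer sizes. */
    setsockopt(s, SOL_SOCKET, SO_SNDBUF, &bufSize, sizeof(bufSize));
    setsockopt(s, SOL_SOCKET, SO_RCVBUF, &bufSize, sizeof(bufSize));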

    Ralf

  • That's it, now I'm getting more than 900 Mb/s.

    First I tried your suggestion of 64000 and got 620 Mb/s, then I increased the buffer size until I got more than 900 Mb/s. The buffer size is now 150000, and I think I have reached the maximum speed.

    Thanks for the help!!

  • Hi Ralf,

    I followed all your suggestions except the RX buffer changes. I got over 900 Mb/s too. The window size plays a big role here, on both the DSP and PC sides.

    Appreciate your replies.

  • Hello experts,

    Did anyone take a look at the load of the core running the IP stack? I found that when I receive data (approx. 170 Mbit/s) via Ethernet, the core0 load (measured by the Load module) reaches almost 40% (correcting the 70% in my previous statement), and even that seems too high to me. Core0 only receives data over Ethernet and sends it to core1 via MessageQ.

    I measured some execution times, and it looks like the core spends most of its time in code related to the IP stack (recv etc.); MessageQ also costs much more CPU time than I expected. Could you tell me whether the problems solved above have something to do with core load, or whether this is a matter of other IP-stack problems? If you have any experience with excessive load, I would appreciate any help.
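
    For reference, the execution times were measured roughly like this (a simplified sketch using xdc.runtime.Timestamp; 'sock', 'buf', 'bufLen', 'msg' and the MessageQ queue ID are placeholders for my actual objects):

    #include <stdint.h>
    #include <xdc/runtime/Timestamp.h>
    #include <xdc/runtime/Types.h>
    #include <ti/ipc/MessageQ.h>
    #include <ti/ndk/inc/netmain.h>

    Types_FreqHz freq;
    uint32_t t0, t1, rxCycles = 0, ipcCycles = 0;
    int len;

    Timestamp_getFreq(&freq);                        /* tick rate, to convert counts to time */

    t0  = Timestamp_get32();
    len = recv(sock, buf, bufLen, 0);                /* NDK socket receive */
    t1  = Timestamp_get32();
    rxCycles += (t1 - t0);

    t0 = Timestamp_get32();
    MessageQ_put(remoteQueueId, (MessageQ_Msg)msg);  /* hand the data to core1 */
    t1 = Timestamp_get32();
    ipcCycles += (t1 - t0);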

    Thanks Ondrej