
IP packets transmitted with bad source MAC and/or destination MAC, or with skipped bytes.

This issue is having a severe impact at customer sites. Our DM8168-based products are installed at secure installations that use Ethernet MAC port security: the switch port shuts down if a packet with an unknown source MAC address is transmitted. This is occurring fairly regularly, at a number of sites and on different systems, in the middle of product use.

I created a private LAN between two systems, with a PC also connected running Wireshark so that I could filter on any unknown MAC. During a video call using UDP packets, there are multiple instances a day where the destination MAC address in a packet is bad. The source MAC is correct and the rest of the packet looks OK. Sometimes other parts of the packet are also bad, such as the length field, or Wireshark detects an FCS error. A bad source MAC is rarer; occurrences can be days to a week apart, and in that case the whole packet looks bad: mostly zeros, but with some non-zero bytes, including the source and destination MAC bytes. I can't detect a pattern in any of the bad data.

I added code to the davinci_emac driver in emac_dev_xmit, right before the call to cpdma_chan_submit, to check the first 12 bytes of data for the correct destination and source MACs, and it does not see a problem. I also re-checked the addresses after xmit_complete, and the data still looks OK.
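A minimal userspace sketch of that kind of header check, in C. The function name `check_eth_macs` and the `ETH_HDR_*` flags are hypothetical (they are not part of davinci_emac); in the driver, `frame` would presumably be `skb->data` just before `cpdma_chan_submit`, and the expected addresses would come from the local interface and the peer.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical result flags: which MAC field in the header looks wrong. */
enum {
    ETH_HDR_OK      = 0,
    ETH_HDR_BAD_DST = 1 << 0,
    ETH_HDR_BAD_SRC = 1 << 1,
};

/*
 * Compare the first 12 bytes of an Ethernet frame (6-byte destination MAC
 * followed by 6-byte source MAC) against the expected addresses.
 * Returns a bitmask of ETH_HDR_BAD_* flags; 0 means both MACs match.
 */
static int check_eth_macs(const uint8_t *frame,
                          const uint8_t expected_dst[6],
                          const uint8_t expected_src[6])
{
    int flags = ETH_HDR_OK;

    if (memcmp(frame, expected_dst, 6) != 0)      /* bytes 0..5 */
        flags |= ETH_HDR_BAD_DST;
    if (memcmp(frame + 6, expected_src, 6) != 0)  /* bytes 6..11 */
        flags |= ETH_HDR_BAD_SRC;
    return flags;
}
```

Returning a bitmask keeps the two failure modes seen in the capture (bad destination only vs. bad source) distinguishable in a log message.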

This looks like a problem in CPDMA or somewhere below it.

I also ran a test with iperf over TCP and see another problem: a run of bytes at the start of the packet appears to be skipped, so packet payload data sits at the start of the frame where the destination and source MACs should be.
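For the skipped-bytes case, a cheap extra check is to look at the EtherType field at offset 12: when payload has slid into the header position, those two bytes are unlikely to be an EtherType the system actually sends. This is a heuristic sketch, not a guarantee; `eth_type_looks_sane` is a hypothetical name, and the accepted list assumes a box that only sends IPv4, IPv6, ARP, and 802.1Q-tagged frames.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Heuristic: return 1 if the EtherType at bytes 12..13 is one this system is
 * assumed to send, 0 otherwise (including frames too short for a header).
 * The accepted list is an assumption; adjust it for the real traffic mix.
 */
static int eth_type_looks_sane(const uint8_t *frame, size_t len)
{
    uint16_t type;

    if (len < 14)                                     /* no full Ethernet header */
        return 0;
    type = (uint16_t)((frame[12] << 8) | frame[13]);  /* network byte order */
    switch (type) {
    case 0x0800:  /* IPv4 */
    case 0x0806:  /* ARP */
    case 0x86DD:  /* IPv6 */
    case 0x8100:  /* 802.1Q VLAN tag */
        return 1;
    default:
        return 0;
    }
}
```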

This looks similar to these issues:
e2e.ti.com/.../389291
e2e.ti.com/.../388303

Our products have different Ethernet configurations. One has a directly connected Realtek PHY and the other has a direct, PHY-less GMII connection to an embedded switch. The problem happens on both systems. Link speed and duplex do not matter. I think the related issues were seen on an EVM. The kernel we are using is in sync with the current Arago linux-omap kernel.

We have tuned tcp_rmem and tcp_wmem for our application. Using the defaults does not change the behavior.
cat /proc/sys/net/ipv4/tcp_wmem
4096 16384 1208320
cat /proc/sys/net/ipv4/tcp_rmem
4096 87380 1208320
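For reference, the three numbers in tcp_rmem and tcp_wmem are, in bytes, the minimum buffer size each socket is guaranteed, the default (initial) buffer size, and the maximum that autotuning may grow the buffer to. A small C sketch that parses such a line follows; `struct tcp_buf_limits` and `parse_tcp_buf_limits` are hypothetical names, and reading the actual /proc file is left out so the example stays self-contained.

```c
#include <stdio.h>

/* The three fields of tcp_rmem / tcp_wmem, all in bytes. */
struct tcp_buf_limits {
    long min;  /* minimum buffer size guaranteed to each socket */
    long def;  /* initial (default) buffer size */
    long max;  /* upper bound for automatically tuned buffers */
};

/*
 * Parse a line such as "4096 16384 1208320" as read from
 * /proc/sys/net/ipv4/tcp_wmem. Returns 0 on success, -1 on malformed input.
 */
static int parse_tcp_buf_limits(const char *text, struct tcp_buf_limits *out)
{
    if (sscanf(text, "%ld %ld %ld", &out->min, &out->def, &out->max) != 3)
        return -1;
    return 0;
}
```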

  • Hello,

    A similar thread was recently updated with the post below. Can you check whether it helps?

    e2e.ti.com/.../1494217

    Regards,
    Pavel

  • Have you also checked the wiki page below?

    processors.wiki.ti.com/.../TI81XX_PSP_04.04.00.02_Feature_Performance_Guide

    Increase the network stack's software queue sizes to eliminate network stack packet drops, using the commands below:

    sysctl -w net.core.rmem_max=33554432
    sysctl -w net.core.wmem_max=33554432
    sysctl -w net.core.rmem_default=33554432
    sysctl -w net.core.wmem_default=33554432
    sysctl -w net.ipv4.udp_mem='4096 87380 33554432'
    sysctl -w net.ipv4.route.flush=1

    Regards,
    Pavel
  • It looks like flushing the cache might be the fix. The test has been running for over 4 days without a failure; normally there are 3-5 failures per day, although I have seen it go a day or two without one. I wasn't sure which kernel API to use for the flush. I looked around the kernel code and did not see much use of any of the various flush calls that are available, so I used v7_flush_kern_dcache_area from cache-v7.S. Of course, flushing every packet has some compute overhead. I may flush only the first cache line, to make sure the MACs are good, and let the application deal with data loss when it occurs.

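    /*
     * v7_flush_kern_dcache_area(void *addr, size_t size)
     *
     * Ensure that the data held in the page kaddr is written back
     * to the page in question.
     *
     * - addr - kernel address
     * - size - region size
     */
    ENTRY(v7_flush_kern_dcache_area)
        dcache_line_size r2, r3         @ r2 = D-cache line size in bytes
        add     r1, r0, r1              @ r1 = end address (addr + size)
        sub     r3, r2, #1              @ r3 = line size - 1
        bic     r0, r0, r3              @ round start down so a partial last line is not skipped
    1:
        mcr     p15, 0, r0, c7, c14, 1  @ clean & invalidate D line / unified line
        add     r0, r0, r2              @ advance to the next line
        cmp     r0, r1
        blo     1b                      @ loop while r0 < end
        dsb                             @ wait for the maintenance to complete
        mov     pc, lr
    ENDPROC(v7_flush_kern_dcache_area)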
  • Hello,

    I see that the Linux kernel also uses these two functions for Cortex-A8 cache flushing:

    flush_cache_all()
    v7_flush_dcache_all

    Regards,
    Pavel
  • Yes, thanks. I had seen those being used but did not think flushing the whole cache on every packet was the best thing to do.
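On the idea of flushing only the first cache line: whether the 12-byte MAC header fits in one line depends on where the header starts in memory. The helper below is hypothetical (`cache_lines_for` is not a kernel function); it computes the line-aligned start and the number of lines a byte range covers, assuming a power-of-two line size such as the 64-byte lines of the Cortex-A8 in the DM8168. It also shows why the start address has to be rounded down before a per-line loop like the one in v7_flush_kern_dcache_area; otherwise the last partially covered line can be missed.

```c
#include <stddef.h>
#include <stdint.h>

/* Line-aligned start address and count of cache lines covering a range. */
struct line_span {
    uintptr_t first;  /* address of the first cache line (line-aligned) */
    unsigned  count;  /* number of lines that must be cleaned/flushed */
};

/*
 * Cache lines touched by [addr, addr + len), for a power-of-two line size.
 * A zero-length range covers no lines.
 */
static struct line_span cache_lines_for(uintptr_t addr, size_t len, unsigned line)
{
    struct line_span s = {0, 0};
    uintptr_t mask = (uintptr_t)line - 1;
    uintptr_t end;

    if (len == 0)
        return s;
    s.first = addr & ~mask;               /* round start down to a line boundary */
    end = (addr + len + mask) & ~mask;    /* round end up to a line boundary */
    s.count = (unsigned)((end - s.first) / line);
    return s;
}
```

For example, a 14-byte Ethernet header starting at offset 0x38 within a 64-byte line spans two lines, so flushing a single line would leave the last bytes of the source MAC and the EtherType unflushed.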