AM5726: PRU Ethernet Tx queue issue

Part Number: AM5726
Other Parts Discussed in Thread: AM5728

Hi,

We are using AM5726 on one of our products and are using the Processor SDK for Linux RT: 04_03_00_05

We are seeing an issue sometimes where the PRU ethernet's (MII0 in our case) Tx queue gets full and never recovers from this situation unless the box is rebooted.

When the issue occurs, it is always right after a reboot of the system. We have never seen it appear while the box has been up for a long time.

I am trying to understand what could be causing this issue. Why does the Tx queue never get emptied, and what is preventing it from draining?

Is the firmware stuck and no longer dequeuing packets from the Tx queue?

One debug log that I added to the PRU ethernet driver shows me the following:

prueth_tx_enqueue: out of queue space, pkt_block_size: 5, free_blocks: 2

Can you please help me understand what could be causing this issue and how can I debug further to know the exact reason for this issue to occur?
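
A note on the log line above: it reports that a packet needing 5 queue blocks was rejected because only 2 blocks were free. The following is a minimal sketch of that kind of free-space check on a circular block queue. The struct, the function names, the 32-byte block size, and the one-block-reserved convention are assumptions made for illustration; they are not taken from the PRU Ethernet driver source.

```c
#define BLOCK_SIZE 32u /* assumed block size in bytes, for illustration only */

/* Hypothetical circular queue of fixed-size blocks: the consumer (firmware)
 * advances rd, the producer (driver) advances wr. One block always stays
 * unused so that rd == wr unambiguously means "empty". */
struct blk_queue {
	unsigned int rd;      /* next block the consumer will read */
	unsigned int wr;      /* next block the producer will write */
	unsigned int nblocks; /* total number of blocks in the ring */
};

/* Number of blocks a packet of pkt_len bytes occupies (rounded up). */
unsigned int blocks_needed(unsigned int pkt_len)
{
	return (pkt_len + BLOCK_SIZE - 1) / BLOCK_SIZE;
}

/* Number of blocks the producer may still fill. */
unsigned int free_blocks(const struct blk_queue *q)
{
	if (q->wr >= q->rd)
		return q->nblocks - (q->wr - q->rd) - 1;
	return q->rd - q->wr - 1;
}

/* Returns 0 on success, -1 when the packet does not fit. */
int enqueue(struct blk_queue *q, unsigned int pkt_len)
{
	unsigned int need = blocks_needed(pkt_len);

	if (need > free_blocks(q))
		return -1; /* "out of queue space" */
	q->wr = (q->wr + need) % q->nblocks;
	return 0;
}
```

With an 8-block ring, enqueueing one 160-byte packet (5 blocks) leaves 2 free blocks, so a second 160-byte packet is refused until the consumer advances `rd`. If the consumer never advances its read pointer, every later enqueue fails the same way, which would match the pattern the log shows.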

Thanks,

Mukul

  • Hello Mukul,

    We are taking a look and will have another post in a day or so.

    Regards,

    Nick

  • Thanks Nick.

    I will wait for your reply.

    -Mukul

  • Hi Nick,

    This is a very high priority issue for us: it renders the box useless, and we need to find a solution as soon as possible.

    Because of this issue, our system goes into a state where only a power reboot recovers the box.

    One point which might be important to you is that we have the AM5726 (1.1 silicon) on our box on which we see this issue.

    As a result, we are using the am437x related firmware files as recommended by TI.

    We have another box with the 2.0 silicon, on which we use the am57xx firmware, and we have not observed this issue on the 2.0 silicon box.

    Not sure if this helps in your investigation.

    Please suggest what we can do to figure out the cause of this issue.

    Thanks,

    Mukul

  • Hi Nick,

    As mentioned in my previous posts, we are using the am437x firmware because we have the 1.1 silicon on our box.

    Our test scenario is such that we have a loopback stub inserted into the port and we send custom ethernet packets to stress the link and see if there are any drops seen.

    I found that if I disabled IPv6 on the ports or globally, then I don't see this Tx out-of-queue issue or lockups.

    My suspicion is that the IPv6 stack keeps sending its protocol messages (MLD, NS, etc.) and, because they are looped back into the port, this somehow causes the firmware to lock up and never recover.

    Disabling IPv6 stops these messages from going out and the firmware stays happy.

    I have tried about 100 reboots with IPv6 disabled and I haven't seen the Tx out-of-space issue so far. However, this is only on 1 box. I will try on multiple boxes and let you know if I see the issue again.
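
    For reference, one way to apply this workaround is through the standard Linux `disable_ipv6` sysctl files under /proc/sys/net/ipv6/conf. The helper below is only a sketch: the function name and the `conf_root` parameter (added so the code can be tried against a scratch directory) are illustrative assumptions, and this is not necessarily how IPv6 was disabled on the box described here.

```c
#include <stdio.h>

/* Write "1" (disable) or "0" (enable) to <conf_root>/<ifname>/disable_ipv6.
 * On a real system conf_root is "/proc/sys/net/ipv6/conf" and ifname is an
 * interface name, "all", or "default". Writing there requires root.
 * Returns 0 on success, -1 on failure. */
int set_ipv6_disabled(const char *conf_root, const char *ifname, int disabled)
{
	char path[256];
	FILE *f;
	int n;

	n = snprintf(path, sizeof(path), "%s/%s/disable_ipv6", conf_root, ifname);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1; /* path would have been truncated */

	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fputs(disabled ? "1\n" : "0\n", f) == EOF) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}
```

    For example, `set_ipv6_disabled("/proc/sys/net/ipv6/conf", "dp1", 1)` would disable IPv6 on the dp1 interface only, while passing "all" as the interface name disables it globally.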

    Do you know of any such limitations in the firmware where if IPv6 protocol packets are looped back in, they can cause the firmware to lock up?

    Maybe you can try the same test scenario on your box and see if you also see lock ups.

    Thanks,

    Mukul

  • Hello Mukul,

    I apologize for the delayed response. Thank you for the additional debug information, that is helpful.

    Yes, using AM5726 SR1.1 with AM437x PRUETH firmware does make a difference for debugging this issue.

    We have ongoing discussions on this side about potential causes and how to best support you. So far I do not have anything to report, but we will keep you updated.

    To confirm: This is an issue with a product that is currently deployed?

    Are there any more details we should know about how to reproduce the issue?

    Regards,

    Nick

  • Hi Nick,

    Yes, the product is deployed, but it's running an old version of the PRU Ethernet driver. Our old deployed kernel is the 3.14 Linux kernel.

    We are in the process of updating the software to a 4.9 based Linux kernel and SDK 04_03_00_05.

    It is with this version that we see the issue.

    One additional piece of information is that we also have VLANs set up on the PRU Ethernet ports. I think the rest of the information I have provided in my previous posts gives an overview of our test setup.

    Please let me know if things are not clear or you need any more explanation about our test setup.

    Thanks,

    Mukul

  • Hello Mukul,

    We are starting to look at replicating this issue on our side.

    1) Could you provide us with the test app you are using and the exact steps you are running to set up the Ethernet port and run the tests?

    2) You said that if the issue is going to show up, then it will show up sometime soon after a reboot. About how frequently should we expect to see the issue? (e.g., about half the time? 1 in 100 times? etc)

    3) What does the issue look like? e.g., is there specific terminal output? Does this freeze the entire system, or does the Ethernet port just stop sending packets? Etc.

    4) Is there any other hardware setup required other than plugging a loopback cable into the one port that is being tested?

    Regards,

    Nick

  • Hi Nick,

    1. The test application is embedded within our framework. To get a standalone app to do the same thing might require changes on our end. I will see what I can do.

    Our app just sends custom broadcast Ethernet packets at line rate to stress the port. Those packets are looped back from the loopback stub and we count the difference between sent and received to see if any packets were lost.

    I am not sure if the standalone app will lead to the issue as in our system, we have other applications also starting up after reboot.

    2. In our case, we see the issue every 1 out of 3 or 4 times.

    3. In our case, we see the kernel thread ktimersoftd running at 100% (there are 2 ktimersoftd threads, one for each core, and we see the issue randomly on core 0 or core 1). Debugging that led to the prueth_tx_enqueue() function. Adding debug prints showed that prueth_tx_enqueue() was printing the out-of-space error and never recovered from this situation. Another observation was that the output of 'cat /proc/softirqs' showed the NET_TX softirq on either CPU0 or CPU1 incrementing at a rate of about 71000 softirqs per second.

    4. No, we only have a loopback stub inserted into the port.
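
    The NET_TX rate mentioned in point 3 can be measured by sampling /proc/softirqs twice and subtracting the counters. The sketch below parses the per-CPU counters of one row; the function names are illustrative and not part of any existing tool.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse one line of /proc/softirqs. If it is the row for `name`
 * (e.g. "NET_TX"), store up to max_cpus per-CPU counters in counts and
 * return how many were read; otherwise return 0. */
int parse_softirq_row(const char *line, const char *name,
		      unsigned long long *counts, int max_cpus)
{
	size_t name_len = strlen(name);
	const char *p = line;
	char *end;
	int n = 0;

	while (*p == ' ' || *p == '\t')
		p++;
	if (strncmp(p, name, name_len) != 0 || p[name_len] != ':')
		return 0;
	p += name_len + 1;

	while (n < max_cpus) {
		unsigned long long v = strtoull(p, &end, 10);
		if (end == p)
			break; /* no more numbers on this line */
		counts[n++] = v;
		p = end;
	}
	return n;
}

/* Read the current counters for `name` from a softirqs-format file.
 * Returns the number of CPUs read, 0 if the row is missing, -1 on error. */
int read_softirq(const char *path, const char *name,
		 unsigned long long *counts, int max_cpus)
{
	char line[512];
	FILE *f = fopen(path, "r");
	int n = 0;

	if (!f)
		return -1;
	while (n == 0 && fgets(line, sizeof(line), f))
		n = parse_softirq_row(line, name, counts, max_cpus);
	fclose(f);
	return n;
}
```

    Calling `read_softirq("/proc/softirqs", "NET_TX", counts, 2)` twice, one second apart, and subtracting the two results per CPU gives the softirqs-per-second figure for each core.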

    As mentioned in my previous posts, if I disable IPv6 in Linux then I don't observe this issue. So my suspicion is that looping IPv6 protocol packets back into the same port might be doing something to the firmware.

    Thanks,

    Mukul

  • Hi Nick,

    Please find attached the application that we use to generate packets.

    The usage for this application:

    ./packetloop <interface-name>  <num_packets> <packet_size> <speed>

    For example, ./packetloop dp1 666 1470 100
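
    As a worked example of what the `<speed>` argument does: the ploop() function in the attached source inserts a delay after each transmitted packet when the speed is below 1000 Mbps. The standalone function below mirrors that integer arithmetic; its name is made up for this illustration.

```c
/* Mirrors the delay calculation in ploop(): integer microseconds needed
 * to put one packet on the wire at speed_mbps, minus the time the same
 * packet takes at 1000 Mbps. Returns 0 when no throttling applies. */
int after_packet_delay_us(int packet_size, int speed_mbps)
{
	int txtime_1g, txtime_speed;

	if (speed_mbps <= 0 || speed_mbps >= 1000)
		return 0;
	/* 1 Mbps is 1 bit per microsecond, so bits / Mbps yields microseconds */
	txtime_1g = (packet_size * 8) / 1000;
	txtime_speed = (packet_size * 8) / speed_mbps;
	return txtime_speed - txtime_1g;
}
```

    For the example command above (1470-byte packets at 100 Mbps), this gives 11760 / 100 = 117 uS minus 11760 / 1000 = 11 uS, i.e. a 106 uS delay after each packet.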

    Thanks,

    Mukul

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <stdarg.h>
    #include <stdlib.h>
    #include <errno.h>
    #include <string.h>
    #include <unistd.h>
    #include <stdint.h>
    #include <sys/types.h>
    #include <sys/time.h>
    #include <sys/select.h>
    #include <sys/socket.h>
    #include <sys/ioctl.h>
    #include <net/if.h>
    #include <net/if_arp.h>
    #include <arpa/inet.h>
    #include <netpacket/packet.h>
    #include <net/ethernet.h>
    #include "packetloop.h"
    
    
    // line length for printdata routine
    #define PDATA_LINELEN 16
    
    #define SOCK_BUF_LEN_TX 10000
    #define SOCK_BUF_LEN_RX 1000000
    #define SOCK_BUF_LEN_DEFAULT 100
    
    // A structure for recording packet counts
    struct ifdata {
    	int recpackets;
    	int sentpackets;
    	int recerrors;
    	int senterrors;
    	int recframe;
    	int sentcarrier;
    };
    
    struct ifdata getifdata(char *interface);
    int  verifymessage(char *message, int messagelength, int headerlength, int len, const u_char * data);
    
    int ploop(char *output, int outlen, char *interface, int passes, int messagelength, int speed, LoopMode mode)
    {
    	char message[2048];
    	char buffer[2048];
    	struct timeval starttime, finishtime;
    	struct ifreq ifrequest;
    	struct sockaddr_ll ethsockaddr;
    	float transfertime, sentbytes;
    	int ethertype;
    	signed int useconds;
    	int headerlength = ETH_HLEN;
    	int debug = 0;
    	int i, seconds;
    	char mac_addr[6];
    
    	int ifsocket;
    	unsigned int outoforderrcvd = 0;
    
    	// represents our output buffer passed into the zprintf function. when the
    	// pointer is null (due to output == NULL), this function will output to
    	// stdout. otherwise, the zprintf routines will append to the output buffer
    	struct prdata* p = NULL;
    
    	// allocate the print buffer on the stack, but only use it if we are
    	// provided an output buffer
    	struct prdata print_buff;
    	if ((NULL != output) && (outlen > 0)) {
    		print_buff.buflen = outlen;
    		print_buff.outbuf = output;
    		print_buff.slength = 0;
    		p = &print_buff;
    	}
    	else {
    		// When running as a standalone program, derive debug from here.
    		debug = outlen;
    	}
    
    	// Previously we differentiated socket options based on interface, the type
    	// of which was derived from the name. Now we just use the same options for all.
    	ethertype = ETH_P_IP;
    
    	// Open a packet socket to catch frames, but don't specify a protocol yet
    	ifsocket = socket(AF_PACKET, SOCK_RAW, 0);
    	if (ifsocket == -1) {
    		BISTMODE_ZPRINTF(mode, p, "FAIL: socket");
    		return 1;
    	}
    	memset(&ethsockaddr, 0, sizeof(ethsockaddr));
    	memset(&mac_addr, 0, sizeof(mac_addr));
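    	// Zero the request first so the interface name is always NUL-terminated
    	memset(&ifrequest, 0, sizeof(ifrequest));
    	strncpy(ifrequest.ifr_name, interface, IFNAMSIZ - 1);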
    	if (ioctl(ifsocket, SIOCGIFINDEX, &ifrequest) < 0) {
    		BISTMODE_ZPRINTF(mode, p, "ioctl 1 %s", strerror(errno));
    		close(ifsocket);
    		return 1;
    	}
    	ethsockaddr.sll_ifindex = ifrequest.ifr_ifindex;
    	ethsockaddr.sll_family = AF_PACKET;
    	ethsockaddr.sll_protocol = htons(ethertype);
    
    	if (ioctl(ifsocket, SIOCGIFHWADDR, &ifrequest) < 0) {
    		BISTMODE_ZPRINTF(mode, p, "ioctl 2 %s", strerror(errno));
    		close(ifsocket);
    		return 1;
    	}
    	memcpy(mac_addr, ifrequest.ifr_hwaddr.sa_data, ETH_ALEN);
    
    	// Now bind, catching all IP frames on the given interface
    	if (bind(ifsocket, (struct sockaddr *)&ethsockaddr, sizeof(ethsockaddr))) {
    		BISTMODE_ZPRINTF(mode, p, "bind %s", strerror(errno));
    		close(ifsocket);
    		return 1;
    	}
    
    	// Run with reasonable send and receive buffer sizes
    	int sock_buf_size = SOCK_BUF_LEN_TX;
    	if (setsockopt(ifsocket, SOL_SOCKET, SO_SNDBUF, (char *)&sock_buf_size, sizeof(sock_buf_size))) {
    		BISTMODE_ZPRINTF(mode, p, "Failed to increase socket send buffer (%d:%s)", errno, strerror(errno));
    		close(ifsocket);
    		return 1;
    	}
    
    	sock_buf_size = SOCK_BUF_LEN_RX;
    	if (setsockopt(ifsocket, SOL_SOCKET, SO_RCVBUF, (char *)&sock_buf_size, sizeof(sock_buf_size))) {
    		BISTMODE_ZPRINTF(mode, p, "Failed to increase socket receive buffer (%d:%s)", errno, strerror(errno));
    		close(ifsocket);
    		return 1;
    	}
    
    	// Prepare message
    #ifdef USE_RANDOM_PACKETS
    	srand(42);
    	for (i = 0; i < messagelength - headerlength; i++) {
    		message[i + headerlength] = rand() & 0xff;
    	}
    #else
    	for (i = 0; i < messagelength - headerlength; i++) {
    		message[i + headerlength] = i & 0xff;
    	}
    #endif
    
    	// Overwrite first few characters with interface name, this ensures
    	// that anything not flushed from the rx queue will cause a failure.
    	memcpy(&message[headerlength + 2], interface, strlen(interface));
    
    	struct ifdata olddata, newdata;
    	struct timeval timer;
    	fd_set set_rx;
    	fd_set set_tx;
    	int length, rc;
    	int rcvd, sent;
    	int busycount;
    	int time_after_sent;
    	int time_from_start;
    	int delay = 0;
    
    	// Flush the stream before we start
    	while (recv(ifsocket, buffer, messagelength, MSG_DONTWAIT) > 0) usleep(1);
    
    	rcvd = sent = busycount = time_after_sent = time_from_start = 0;
    
    	// Accommodate the 1 Gbit to 100 Mbit switch interface problem by delaying after tx based on speed.
    	// The following delay is based on sending 1000 bytes to a 100 Mbit link:
    	// 1000 bytes ==  8000 bits -> 100x10^6 /  8000 = 12,500 fps = 1 frame every  80 uS
    	// 1500 bytes == 12000 bits -> 100x10^6 / 12000 =  8,333 fps = 1 frame every 120 uS
    	if (speed > 0 && speed < 1000) {
    		int txtime_1G = (messagelength * 8) / 1000;
    		int txtime_speed = (messagelength * 8) / speed;
    		delay = txtime_speed - txtime_1G;
    		//delay = txtime_speed;
    	}
    	if (debug) {
    		zprintf(p, "After packet delay set to %d uS\n", delay);
    	}
    
    	olddata = getifdata(interface);
    	gettimeofday(&starttime, NULL);
    	uint16_t tseq = 0, rseq = 0, nseq = 0;
    
    	/* Destination MAC address */
    	message[0] = 0xff;
    	message[1] = 0xff;
    	message[2] = 0xff;
    	message[3] = 0xff;
    	message[4] = 0xff;
    	message[5] = 0xff;
    
    	/* Source MAC address. Deliberately different from the real source MAC address,
    	 * otherwise some MAC controllers (e.g. PRUETH) will drop the packets.
    	 */
    	message[6] = mac_addr[0];
    	message[7] = mac_addr[1];
    	message[8] = mac_addr[2];
    	message[9] = mac_addr[3];
    	message[10] = mac_addr[4];
    	message[11] = (mac_addr[5] + 1) & 255;
    
    	/* ethertype protocol */
    	message[12] = ethertype >> 0x08;
    	message[13] = ethertype & 0xff;
    
    	while (1) {
    		FD_ZERO(&set_rx);
    		FD_ZERO(&set_tx);
    		FD_SET(ifsocket, &set_rx);
    		if (sent < passes) {
    			FD_SET(ifsocket, &set_tx);
    		}
    		timer.tv_sec = 0;
    		timer.tv_usec = 500000;
    
    		if (sent < passes) {
    			rc = select(ifsocket + 1, &set_rx, &set_tx, NULL, &timer);
    		} else {
    			rc = select(ifsocket + 1, &set_rx, NULL, NULL, &timer);
    		}
    		if (rc > 0) {
    			// Handle the receiver
    			if (FD_ISSET(ifsocket, &set_rx)) {
    				length = recv(ifsocket, buffer, messagelength, 0);
    				if (length == -1) {
    					BISTMODE_ZPRINTF(mode, p, "Receive error\n");
    				} else {
    					if (debug > 1) {
    						zprintf(p, "Got a packet of length %d\n", length);
    					}
    					if (length == messagelength) {
    						int errlocn = verifymessage(message, messagelength, headerlength,
    							length, (unsigned char *)buffer);
    
    						if (debug) zprintf(p, "r");
    
    						if (errlocn >= 0) {
    							BISTMODE_ZPRINTF(mode, p, "Bad verify at offset %d in packet of length %d..\n",
    											 errlocn, length);
    							BISTMODE_PRINTDATA(mode, p, length, (u_char *) buffer, PDATA_LINELEN);
    							break;
    						}
    
    						// Packet length and data verify, is it the correct sequence number?
    						// Mask both bytes so the result does not depend on whether char is signed
    						nseq = ((buffer[headerlength] & 255) * 256) + (buffer[headerlength + 1] & 255);
    						if (nseq == rseq) {
    							++rcvd; /* Move this line before the nseq == rseq conditional to allow out-of-order */
    							++rseq;
    						} else {
    							BISTMODE_ZPRINTF(mode, p, "Expected packet %d, got %d\n", rseq, nseq);
    							rseq = ++nseq;
    							++outoforderrcvd;
    						}
    
    						// Are we done?
    						if (rcvd == passes) {
    							// We are done
    							break;
    						}
    					} else {
    						BISTMODE_ZPRINTF(mode, p, "Unknown packet of length %d..\n", length);
    						BISTMODE_PRINTDATA(mode, p, length, (u_char *) buffer, PDATA_LINELEN);
    						break;
    					}
    					if (debug > 1) {
    						printdata(p, length, (u_char *) buffer, PDATA_LINELEN);
    					}
    				}
    			}
    
    			// Handle the transmitter
    			if ((sent < passes) && FD_ISSET(ifsocket, &set_tx)) {
    				if (delay) usleep(delay);
    
    				/* Sequence number */
    				message[headerlength] = tseq >> 8;
    				message[headerlength + 1] = tseq & 255;
    
    				++tseq;
    				if (-1 == sendto(ifsocket, message, messagelength, 0, NULL, 0)) {
    					if (errno != EBUSY) {
    						BISTMODE_ZPRINTF(mode, p, "sendto %s\n", strerror(errno));
    						break;
    					} else {
    						BISTMODE_ZPRINTF(mode, p, "Transmit EBUSY\n");
    						++busycount;
    						usleep(10000);		// Delay a bit before retrying to let queue empty a bit.
    					}
    				} else {
    					if (debug) zprintf(p, "s");
    					++sent;
    				}
    			}
    
    		} else {
    			// Timeout
    			if (debug) zprintf(p, "t");
    			if (busycount >= 50) {
    				BISTMODE_ZPRINTF(mode, p, "Tried sending %d times and got busy each time.  Giving up...\n", busycount);
    				close(ifsocket);
    				return(1);
    			}
    
    			// Ensure that the transmitter finishes its job..
    			if (sent == passes && (++time_after_sent > 2)) {
    				BISTMODE_ZPRINTF(mode, p, "Timeout waiting for receive\n");
    				break;
    			}
    			// When sending thousands of frames, the interface eventually fills and
    			// no longer goes transmit ready.  Give the interface an absolute limit of 5 seconds.
    			if (++time_from_start > 10) {
    				//zprintf(p, "sent = %d rcvd = %d\n", sent, rcvd);
    				BISTMODE_ZPRINTF(mode, p, "Transmitter did not finish its job within 5 seconds!\n");
    				break;
    			}
    		}
    	}
    	if (debug) zprintf(p, "\n");
    
    	gettimeofday(&finishtime, NULL);
    
    	// We should have received exactly what the transmitter sent..
    	if (mode == LOOPMODE_BIST && rcvd != passes) {
    		int fail = passes - rcvd;
    		zprintf(p, "Lost %d out of %d packets (%.2f%% loss)", fail, passes, (float)fail / (float)passes * 100.0);
    		// There should be no frame errors
    		newdata = getifdata(interface);
    		if (newdata.recframe > olddata.recframe) {
    			zprintf(p, " frame errors: %d", newdata.recframe - olddata.recframe);
    		}
    
    		// There should be no carrier failures
    		if (newdata.sentcarrier > olddata.sentcarrier) {
    			zprintf(p, " carrier failures: %d", newdata.sentcarrier - olddata.sentcarrier);
    		}
    		zprintf(p, "\n");
    	} else {
    		BISTMODE_ZPRINTF(mode, p, "%d/%d passes successful\n", rcvd, passes);
    	}
    
    	sentbytes = (float)messagelength *(float)passes;
    
    	// Calculate transfer speed
    	if (mode == LOOPMODE_BIST)
    	{
    		seconds = finishtime.tv_sec - starttime.tv_sec;
    		useconds = finishtime.tv_usec - starttime.tv_usec;
    		if (useconds < 0) {			// Fix carry
    			useconds += 1000000;
    			seconds -= 1;
    		}
    		transfertime = seconds;
    		transfertime += (float)useconds / 1000000.0;
    		zprintf(p, "Sent %.0f bytes in %.3fs (%.6f Mbps)\n",
    			sentbytes, transfertime, 8.0 * sentbytes / 1000000.0 / transfertime);
    	}
    	sock_buf_size = SOCK_BUF_LEN_DEFAULT;
    	setsockopt(ifsocket, SOL_SOCKET, SO_SNDBUF, (char *)&sock_buf_size, sizeof(sock_buf_size));
    	setsockopt(ifsocket, SOL_SOCKET, SO_RCVBUF, (char *)&sock_buf_size, sizeof(sock_buf_size));
    	close(ifsocket);
    
    	return passes - rcvd;
    }
    
    
    struct ifdata getifdata(char *interface)
    {
    	struct ifdata ret = { 0, 0, 0, 0, 0, 0 };
    	FILE *fh;
    	char *line;
    	size_t length;
    	char *tmp;
    	char *tmp2;
    	int t1, t2, t3, t4, t5, t6, t7, t8, t9, t10, t11, t12, t13, t14, t15, t16;
    
    	fh = fopen("/proc/net/dev", "r");
    	if (!fh) {
    		// Without /proc/net/dev we can only report zeroed counters
    		return ret;
    	}
    
    	while (!feof(fh)) {
    		line = NULL;
    		length = 1024;
    		getline(&line, &length, fh);
    		if (!feof(fh) && line) {
    			for (tmp = line; tmp[0] == ' '; tmp++) ;
    			tmp2 = index(tmp, ':');
    			if (tmp2) {
    				tmp2[0] = '\0';
    				tmp2++;
    				if (!strcmp(tmp, interface)) {
    					sscanf(tmp2, "%d %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d",
    						&t1, &t2, &t3, &t4, &t5, &t6, &t7, &t8, &t9, &t10, &t11, &t12, &t13, &t14, &t15, &t16);
    					ret.recpackets = t2;
    					ret.recerrors = t3;
    					ret.recframe = t6;
    					ret.sentpackets = t10;
    					ret.senterrors = t11;
    					ret.sentcarrier = t15;
    				}
    			}
    		}
    		free(line);
    	}
    
    	fclose(fh);
    	return ret;
    }
    
    int verifymessage(char *message, int messagelength, int headerlength, int len, const u_char * data)
    {
    	int offset;
    
    	offset = len - messagelength + headerlength;
    	//zprintf(p, "offset: %d len: %d headerlength: %d messagelength: %d\n", offset, len, headerlength, messagelength);
    
    	// Account for the two byte sequence number
    	offset += 2;
    	messagelength -= 2;
    
    	int remaining = messagelength - headerlength;
    	if (memcmp(&message[offset], &(data[offset]), remaining) == 0) return(-1);
    
    	// Report the absolute offset of the first mismatching byte, staying
    	// within the region that was compared
    	for (int i = 0; i < remaining; i++) {
    		if ((u_char)message[offset + i] != data[offset + i]) return(offset + i);
    	}
    	return(len);
    }
    
    void zprintf(struct prdata *p, char *fmt, ...)
    {
    	va_list argv;
    
    	va_start(argv, fmt);
    
    	if (NULL == p) {
    		vprintf(fmt, argv);
    	}
    	else {
    		if (p->outbuf) {
    			p->slength += vsnprintf(&p->outbuf[p->slength], p->buflen - p->slength - 1, fmt, argv);
    			// vsnprintf honors the size limit but returns the length the full output would have had, so clamp slength
    			if (p->slength > p->buflen - 1) p->slength = p->buflen - 1;
    		}
    	}
    
    	va_end(argv);
    }
    
    /**
     * @copydoc printdata
     *
     * See testloop_print_utils.h for details.
     */
    void printdata(struct prdata *p, int len, const u_char *data, int linelen)
    {
    	u_char *buffer;
    	int i, j;
    
    	buffer = (u_char*) malloc(linelen+1);
    
    	if (NULL == buffer) {
    		return;
    	}
    
    	for (i = 0; i < len; i++) {
    		// On new line boundaries, clear our ascii display buffer
    		if (i % linelen == 0) {
    			for (j = 0; j < linelen + 1; ++j) {
    				buffer[j] = '\0';
    			}
    			zprintf(p, "%.4x: ", i);
    		}
    
    		// First print the data as hex digits
    		zprintf(p, "%.2x ", data[i]);
    
    		// Build the ascii display buffer as we go
    		if (data[i] >= 32 && data[i] < 127) {
    			// Character is printable
    			buffer[i % linelen] = data[i];
    		} else {
    			buffer[i % linelen] = '.';
    		}
    
    		// And print it after each linelen hex digits or when we reach the end of data
    		if (i % linelen == linelen-1
    		 || i + 1 == len) {
    			for (j = i % linelen; j < linelen; j++) {
    				zprintf(p, "   ");
    			}
    			zprintf(p, "%s\n", buffer);
    		}
    	}
    	zprintf(p, "\n");
    
    	free(buffer);
    }
    
    int main(int argc, char **argv)
    {
    	int messagelength = 1000;
    	int debug = 0;
    	int passes;
    	int speed = 0;
    	char *interface;
    
    	if (argc < 3) {
    		fprintf(stderr, "Usage: %s <interface> <passes> [packetsize [speed_in_mbps]]\n", argv[0]);
    		fprintf(stderr, "  adding multiple -d to the end of the line invokes successively higher debugging\n");
    		return 1;
    	}
    	interface = argv[1];
    
    	passes = atoi(argv[2]);
    	if (passes <= 0) {
    		fprintf(stderr, "passes must be at least 1\n");
    		return 1;
    	}
    
    	// Keep adding "-d"s to the end of the command line to get ever more debug
    	if (argc > 3) {
    		if (!strcmp(argv[3], "-d")) {
    			++debug;
    		} else {
    			messagelength = atoi(argv[3]);
    			if (messagelength < 64 || messagelength > 1500) {
    				fprintf(stderr, "packet size must be at least 64 and at most 1500\n");
    				return 1;
    			}
    		}
    	}
    	if (argc > 4) {
    		if (!strcmp(argv[4], "-d")) {
    			++debug;
    		} else {
    			speed = atoi(argv[4]);
    			if (speed > 10000) {
    				fprintf(stderr, "speed must be <= 10000 mbps\n");
    				return 1;
    			}
    		}
    	}
    	if (argc > 5 && !strcmp(argv[5], "-d")) ++debug;
    	if (argc > 6 && !strcmp(argv[6], "-d")) ++debug;
    
    	setvbuf(stdout, (char *)NULL, _IONBF, 0);
    	return(ploop(NULL, debug, interface, passes, messagelength, speed, LOOPMODE_BIST) ? 1 : 0);
    }
    
    // vim:ts=4:sw=4:

    packetloop.h

  • Now that I think of it again, I don't think you need our application to see this issue.

    We see the issue after a reboot when we have not yet initiated our application.

    Hopefully, looping IPv6 packets back into the port should be enough to make the issue occur.

    Please note we are using the RT version of the linux kernel.

  • Hello Mukul,

    We were not able to see your issue on an AM437x IDK board. We will try an AM5728 SR1.1 IDK board next. Please note that the Thanksgiving holidays might slow down responses next week.

    Regards,

    Nick

  • Hi,

    We are still working on reproducing this issue. Can you provide some additional details that might help us narrow down the issue:

    - Firmware version: are you using am437x firmware from 4.3.0.5 release now, or the same firmware from the previous release? (if you can provide the hash of the binaries we can exactly make sure we are testing the same firmware e.g. 'sha256sum /lib/firmware/ti-pruss/am437x-pru0-prueth-fw.elf')
    - Kernel version: this should be set for the 4.3.0.5 release, but you can verify with 'uname -a' output
    - Device tree files: I believe you may have needed to modify the device tree files to use a different firmware, can you share any modifications you have made?

    As for your testing, do you have an AM572x SR1.1 IDK board that you can reproduce this on? This would give us a common point to test and rule out hardware differences. Additionally, if you could provide a full log from boot until you observe this issue, it may help us identify any other information that could lead to an issue.

    Regards,
    Aaron

  • Hi Aaron,

    Sorry for the late reply.

    Below is my response to your questions:

    - Here is the sha256sum for the prueth firmware:

    BIST:~# sha256sum /lib/firmware/ti-pruss/am437x-pru0-prueth-fw.elf
    a2a893fecc31ae795078c710bb4ca82ca63c23871f503dc5b732cc39840eb6de  /lib/firmware/ti-pruss/am437x-pru0-prueth-fw.elf

    - For the kernel version, we took the 4.3.0.5 release as our base and updated its kernel to version 4.9.82. So our uname -a output shows the 4.9.82 version that we use.

      I am not sure if that should affect the PRU functionality.

    - Instead of modifying the device tree, we added code to identify if it was a 1.1 version or 2.0 version of the silicon and then load the correct firmware.

    - I am not sure we have the AM572x SR1.1 board with us. I will check internally, if I find it I will try to reproduce the issue on it.

    - There are quite a few custom drivers/applications that we use on our box, so I am not sure if the boot logs will help you much. I have verified that there were no failures reported from the PRU module during bootup.
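
    To illustrate the kind of revision-based firmware selection described in the list above, here is a minimal sketch. The function name and the revision strings are hypothetical, and the am57xx file name is an assumption made by analogy with the am437x file name quoted in this thread; the actual code used on the box is not shown here.

```c
#include <string.h>

/* Hypothetical helper: map a silicon revision string to the PRUETH
 * firmware image to load for PRU0. Returns NULL for unknown revisions. */
const char *prueth_fw_for_revision(const char *silicon_rev)
{
	if (strcmp(silicon_rev, "1.1") == 0)
		return "ti-pruss/am437x-pru0-prueth-fw.elf"; /* name quoted in this thread */
	if (strcmp(silicon_rev, "2.0") == 0)
		return "ti-pruss/am57xx-pru0-prueth-fw.elf"; /* assumed name */
	return NULL;
}
```

    Returning NULL for an unrecognized revision lets the caller fail loudly instead of silently loading firmware meant for a different silicon revision.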

    As I mentioned previously, disabling IPv6 seems to have resolved our problem. We have not seen the Tx stuck issue on any of our boxes after disabling IPv6.

    Still, we would like to know what could cause the Tx lockup to occur in the first place.

    Thanks,

    Mukul

  • Hello Mukul,

    I am not going to root cause this since you have found a suitable workaround.

    Regards,

    Nick