
Linux/AM3352: Looking for quick AND efficient UART handling in userspace

Part Number: AM3352


Tool/software: Linux

Hi

I'm running a custom board based on the BeagleBone, with kernel 4.4.19:

root@am335x-evm:~/cm# uname -a
Linux am335x-evm 4.4.19-gdb0b54cdad #238 PREEMPT Fri Aug 25 16:25:18 AEST 2017 armv7l GNU/Linux

I'm wondering what is the best way to get fast and efficient UART comms.

The board has been laid out such that there is a UART for comms between the am3352 and a DSP.
There is also a GPIO that serves as an IRQ from the DSP to the am3352.

The process flow is as follows.
- DSP asserts IRQ GPIO line
- DSP sends 22 bytes on UART
- am3352 detects IRQ GPIO and sends 12 bytes to DSP
- am3352 receives bytes from UART

There is an exchange like this every 10 ms.

Presently I am calling poll() to watch the file descriptors for both the GPIO and UART device nodes.
The UART port is configured for 115200 baud, raw mode, and low latency.

With this configuration I find that the 22 bytes are normally read in 3 blocks,
i.e. every 10 ms the poll returns once for the GPIO and three times for the UART.

Watching with top, this process is using 93% of the CPU, which seems quite excessive.

Ideally I would like the UART to notify me of incoming bytes only after an inter-byte "timeout" of about 200 µs, so that I could read the whole packet out in one go.

However, VTIME in the termios struct only has a resolution of 0.1 s.
- Does anyone know of a way to configure the behaviour of a tty node in raw mode with finer resolution than this?
- Or is there another way to do this?
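
For reference, a minimal raw-mode setup of the kind described here (my own sketch, not code from this thread) looks like the following. VTIME counts in tenths of a second, so 0.1 s is the finest inter-byte timeout termios itself can express; setting VMIN to the packet size lets read() return a whole packet in one call when all bytes arrive in time.

```c
#include <fcntl.h>
#include <termios.h>
#include <unistd.h>

/* Open a UART at 115200 baud in raw mode with VMIN/VTIME packetising.
 * Returns the fd, or -1 on any failure. */
static int open_uart_raw(const char *dev)
{
    int fd = open(dev, O_RDWR | O_NOCTTY);
    if (fd < 0)
        return -1;

    struct termios tio;
    if (tcgetattr(fd, &tio) < 0) {
        close(fd);
        return -1;
    }
    cfmakeraw(&tio);              /* raw mode: no echo, no line discipline */
    cfsetispeed(&tio, B115200);
    cfsetospeed(&tio, B115200);
    tio.c_cc[VMIN]  = 22;         /* wait for a full 22-byte packet ...   */
    tio.c_cc[VTIME] = 1;          /* ... or 0.1 s between bytes           */
    if (tcsetattr(fd, TCSANOW, &tio) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

With VMIN > 0 and VTIME > 0, the timer only starts after the first byte arrives, so the 0.1 s granularity is exactly the limitation described above.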

Thanks for any suggestions.


All the best,
Richard

  • Hi Richard,

    I went through the AM335x TRM from www.ti.com/.../spruh73p.pdf. Page 4319 has details about DMA requests for the UART. I am curious whether you can look into this; it could reduce the CPU load and let your application run more smoothly.
    Please share your observations.

    Thanks,
    Prabhuraj
    BlackPepper Technologies
  • Thanks

    I had a look at the TRM around the UART handling:

    • I tried using the DMA channels that had been allocated to another (unused) UART, but it quickly gave me resource-busy errors when trying to read from the file.
    • So I have returned to the non-DMA method for now.

    I have been experimenting with the VTIME and VMIN and have some unexpected observations.

    • It looks like the VTIME threshold is significantly less than the claimed 0.1 s.
    • With some tinkering I can receive my 24-byte packets as single reads.

    Implementing timers within this loop, I can see that I am spending >95% of the time waiting for the poll call to return.

    However, if I run top in another shell, I can see that the simple test application is taking >92% of the CPU.

    I would be grateful for any insight as to why this is happening.

    Thanks for any advice.

    Best regards,

    Richard

  • Hi Richard,

    Thanks for sharing your observations. Are you sure that the >92% CPU load is due to your ARM-DSP UART code? I can see that you are using a GPIO for IRQ every 10ms. Is the IRQ getting raised every time or is it in polling mode?

    Thanks,
    Prabhuraj
    BlackPepper Technologies
    I have removed the GPIO poll for now, to try to isolate where the cycles are going.

    The guts of the code look like:

    static inline void _handleGet(int fd){
    	const uint16_t kReadBufferSize = 512;
    	uint8_t buffer[kReadBufferSize];
    
    	int bytes_available = 0;          // initialise in case ioctl fails
    	ioctl(fd, FIONREAD, &bytes_available);
    
    	if (bytes_available > kReadBufferSize){
    		InCounters.overrun++;
    		// game over - clear out
    		tcflush(fd, TCIFLUSH);
    	}
    	int n = read(fd, buffer, kReadBufferSize);   // buffer, not &buffer
    	if (n > 0){
    		FIFOBuffer* fb = comms_getRX();
    		for (int i = 0; i < n; i++) {
    			fifo_put(fb, buffer[i]);
    		}
    		_logIncomingBytes(n);
    	}
    }
    
    // called in an infinite while loop

    // processing handled outside based on return code
    TriggerEvent _handleTriggers(const int portFile) {
    	uint8_t gpioByte;
    	const uint8_t kDescriptorCount = 1;
    	struct pollfd polls[2] = {
    		{ .fd = portFile, .events = (POLLIN | POLLPRI | POLLERR) }
    	};
    	struct pollfd* port = &polls[0];
    	int rv = 0;
    	TriggerEvent result = kNoEvent;
    
    	while (result == kNoEvent) {
    		_stampTime(kPoll);
    		rv = poll(polls, kDescriptorCount, 20000);
    		_stampTime(kPollStops);
    		if (rv > 0) {
    			if ((port->revents & POLLIN) || (port->revents & POLLPRI)) {
    				result |= kMessageEvent;
    				_handleGet(port->fd);
    			}
    		} else {
    			sToCount++;
    		}
    	}
    	return result;
    }

    I timestamp before and after the poll function and keep running histograms of the differences between these two stamps that I reset periodically:

    Time in Poll (us): bin upper limit, count
            Bin  0    9000      55
            Bin  1    9250       6
            Bin  2    9500       4
            Bin  3    9750      55
            Bin  4   10000    2822
            Bin  5 > 10000      58
    Time outside Poll (us): bin upper limit, count
            Bin  0     200    2463
            Bin  1     500     528
            Bin  2    1000       1
            Bin  3    2000       0
            Bin  4    5000       7
            Bin  5 >  5000       0
    

    The bytes are being sent 10 ms apart, and the byte logging confirms I am receiving 24 every time _handleGet is called.

    So for the vast majority of the time, this is just waiting for the poll call to return.

    But running top gives:

      PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND                                                       
     1375 root      20   0    9620   1100   1024 R 94.2  0.4   0:41.61 dspwatch    << this is the simple test app                                                  
     1380 root      20   0    3036   1776   1388 R  3.4  0.7   0:00.64 top                                                           
      749 root      20   0    2292    836    748 S  0.9  0.3   7:59.32 rngd     

    This is what I cannot explain.

    I see no noticeable change in the CPU usage between using select, poll, or epoll.
    It also does not change if I add/remove polling for the GPIO file descriptor, or remove the transmit of the outgoing packet.
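
One way to narrow this down (my suggestion, not something tried in the thread) is to split the process's CPU time into user and system components with getrusage(). If most of the 94% shows up as system time, the cycles are going into kernel-side per-byte interrupt and syscall overhead rather than the userspace loop itself:

```c
#include <stdio.h>
#include <sys/resource.h>

/* Print the calling process's accumulated user vs. system CPU time.
 * A large system-time share points at kernel-side UART/syscall cost
 * rather than the application loop. */
static void report_cpu_split(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) == 0) {
        printf("user %ld.%06ld s  sys %ld.%06ld s\n",
               (long)ru.ru_utime.tv_sec, (long)ru.ru_utime.tv_usec,
               (long)ru.ru_stime.tv_sec, (long)ru.ru_stime.tv_usec);
    }
}
```

Calling this periodically from the loop would show whether the user or system component is growing.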

    Thanks for any suggestions.

    All the best,

    Richard