
Linux/AM3352: Looking for quick AND efficient UART handling in userspace

Part Number: AM3352


Tool/software: Linux

Hi

I'm running a custom board based on the BeagleBone, with kernel 4.4.19:

root@am335x-evm:~/cm# uname -a
Linux am335x-evm 4.4.19-gdb0b54cdad #238 PREEMPT Fri Aug 25 16:25:18 AEST 2017 armv7l GNU/Linux

I'm wondering what is the best way to get fast and efficient UART comms.

The board has been laid out such that there is a UART for comms between the am3352 and a DSP.
There is also a GPIO that serves as an IRQ from the DSP to the am3352.

The process flow is as follows.
- DSP asserts IRQ GPIO line
- DSP sends 22 bytes on UART
- am3352 detects IRQ GPIO and sends 12 bytes to DSP
- am3352 receives bytes from UART

There is an exchange like this every 10 ms.

Presently I am calling poll() to watch the file descriptors for both the GPIO and UART device nodes.
The UART port is configured for 115200 baud, raw mode, and low latency.

With this configuration I find that the 22 bytes are normally read in 3 blocks,
i.e. every 10 ms the poll returns once for the GPIO and three times for the UART.

Watching with top, this process is using 93% of the CPU, which seems quite excessive.

Ideally I would like the UART to notify me of incoming bytes only after an inter-byte "timeout" of about 200 µs, so that I could read the whole packet out in one go.

However, VTIME in the termios struct only has a resolution of 0.1 s.
- Does anyone know of a way to configure the behaviour of a tty node in raw mode with finer resolution than this?
- Or is there another way to do this?
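
For reference, a minimal raw-mode setup of the kind described here (my own sketch, not code from this thread) looks like the following. VTIME counts in tenths of a second, so 0.1 s is the finest inter-byte timeout termios itself can express; setting VMIN to the packet size lets read() return a whole packet in one call when all bytes arrive in time.

```c
#include <fcntl.h>
#include <termios.h>
#include <unistd.h>

/* Open a UART at 115200 baud in raw mode with VMIN/VTIME packetising.
 * Returns the fd, or -1 on any failure. */
static int open_uart_raw(const char *dev)
{
    int fd = open(dev, O_RDWR | O_NOCTTY);
    if (fd < 0)
        return -1;

    struct termios tio;
    if (tcgetattr(fd, &tio) < 0) {
        close(fd);
        return -1;
    }
    cfmakeraw(&tio);              /* raw mode: no echo, no line discipline */
    cfsetispeed(&tio, B115200);
    cfsetospeed(&tio, B115200);
    tio.c_cc[VMIN]  = 22;         /* wait for a full 22-byte packet ...   */
    tio.c_cc[VTIME] = 1;          /* ... or 0.1 s between bytes           */
    if (tcsetattr(fd, TCSANOW, &tio) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

With VMIN > 0 and VTIME > 0, the timer only starts after the first byte arrives, so the 0.1 s granularity is exactly the limitation described above.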

Thanks for any suggestions.


All the best,
Richard

  • Hi Richard,

    I went through the AM335x TRM from www.ti.com/.../spruh73p.pdf. Page 4319 has details about DMA requests for the UART. I am curious whether you can look into this; it could reduce the CPU load and let your application run more smoothly.
    Please share your observations.

    Thanks,
    Prabhuraj
    BlackPepper Technologies
  • Thanks

    I had a look at the TRM around the UART handling:

    • I tried using the DMA channels that had been allocated to another (unused) UART, but it quickly gave me resource-busy errors when trying to read from the file.
    • So I have returned to the non-DMA method for now.

    I have been experimenting with the VTIME and VMIN and have some unexpected observations.

    • It looks like the VTIME threshold is significantly less than the claimed 0.1 s.
    • With some tinkering I can receive my 24-byte packets as single reads.

    Implementing timers within this loop, I can see that I am spending >95% of the time waiting for the poll call to return.

    However, if I run top in another shell, I can see that the simple test application is taking >92% of the CPU.

    I would be grateful for any insight as to why this is happening.

    Thanks for any advice.

    Best regards,

    Richard

  • Hi Richard,

    Thanks for sharing your observations. Are you sure that the >92% CPU load is due to your ARM-DSP UART code? I can see that you are using a GPIO for IRQ every 10ms. Is the IRQ getting raised every time or is it in polling mode?

    Thanks,
    Prabhuraj
    BlackPepper Technologies
    I have removed the GPIO poll for now, to try to isolate where the cycles are going.

    The guts of the code look like:

    static inline void _handleGet(int fd){
    	const uint16_t kReadBufferSize = 512;
    	uint8_t buffer[kReadBufferSize];
    
    	int bytes_available = 0;          // initialise in case ioctl fails
    	ioctl(fd, FIONREAD, &bytes_available);
    
    	if (bytes_available > kReadBufferSize){
    		InCounters.overrun++;
    		// game over - clear out
    		tcflush(fd, TCIFLUSH);
    	}
    	int n = read(fd, buffer, kReadBufferSize);   // buffer, not &buffer
    	if (n > 0){
    		FIFOBuffer* fb = comms_getRX();
    		for (int i = 0; i < n; i++) {
    			fifo_put(fb, buffer[i]);
    		}
    		_logIncomingBytes(n);
    	}
    }
    
    // called in an infinite while loop

    // processing handled outside based on return code
    TriggerEvent _handleTriggers(const int portFile) {
    	uint8_t gpioByte;
    	const uint8_t kDescriptorCount = 1;
    	struct pollfd polls[2] = {
    		{ .fd = portFile, .events = (POLLIN | POLLPRI | POLLERR) }
    	};
    	struct pollfd* port = &polls[0];
    	int rv = 0;
    	TriggerEvent result = kNoEvent;
    
    	while (result == kNoEvent) {
    		_stampTime(kPoll);
    		rv = poll(polls, kDescriptorCount, 20000);
    		_stampTime(kPollStops);
    		if (rv > 0) {
    			if ((port->revents & POLLIN) || (port->revents & POLLPRI)) {
    				result |= kMessageEvent;
    				_handleGet(port->fd);
    			}
    		} else {
    			sToCount++;
    		}
    	}
    	return result;
    }

    I timestamp before and after the poll function and keep running histograms of the differences between these two stamps that I reset periodically:

    Time in Poll (us): bin upper limit, count
            Bin  0    9000      55
            Bin  1    9250       6
            Bin  2    9500       4
            Bin  3    9750      55
            Bin  4   10000    2822
            Bin  5 > 10000      58
    Time outside Poll (us): bin upper limit, count
            Bin  0     200    2463
            Bin  1     500     528
            Bin  2    1000       1
            Bin  3    2000       0
            Bin  4    5000       7
            Bin  5 >  5000       0
    

    The bytes are being sent 10 ms apart, and the byte logging confirms I am receiving 24 every time _handleGet is called.

    So for the vast majority of the time, this is just waiting for the poll call to return.

    But running top gives:

      PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND                                                       
     1375 root      20   0    9620   1100   1024 R 94.2  0.4   0:41.61 dspwatch    << this is the simple test app                                                  
     1380 root      20   0    3036   1776   1388 R  3.4  0.7   0:00.64 top                                                           
      749 root      20   0    2292    836    748 S  0.9  0.3   7:59.32 rngd     

    This is what I cannot explain.

    I see no noticeable change in the CPU usage between using select, poll, or epoll.
    It also does not change if I add/remove polling for the GPIO file descriptor, or remove the transmit of the outgoing packet.
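
One way to narrow this down (my suggestion, not something tried in the thread) is to split the process's CPU time into user and system components with getrusage(). If most of the 94% shows up as system time, the cycles are going into kernel-side per-byte interrupt and syscall overhead rather than the userspace loop itself:

```c
#include <stdio.h>
#include <sys/resource.h>

/* Print the calling process's accumulated user vs. system CPU time.
 * A large system-time share points at kernel-side UART/syscall cost
 * rather than the application loop. */
static void report_cpu_split(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) == 0) {
        printf("user %ld.%06ld s  sys %ld.%06ld s\n",
               (long)ru.ru_utime.tv_sec, (long)ru.ru_utime.tv_usec,
               (long)ru.ru_stime.tv_sec, (long)ru.ru_stime.tv_usec);
    }
}
```

Calling this periodically from the loop would show whether the user or system component is growing.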

    Thanks for any suggestions.

    All the best,

    Richard