Linux latency/performance issues when using Ethernet

Hi,

We are currently running into a performance issue on the DM8147 with the Linux DVR-RDK (kernel 2.6.37).
It prevents us from doing our periodic processing within a reasonably deterministic window. From our investigation, the Ethernet (cpsw) driver appears to be hogging the processor.

Is this a known issue when using DVR-RDK?

Details:

I've written a simplified version of our driver/userspace application that I used to reproduce the problem (code below).

The driver wakes up the userspace RT thread every 20 ms. The performance is measured by toggling gpio[27] in the ISR and driving gpio[28] in the context of the RT thread.

Issue:

The wake-up is often delayed by several ms (sometimes up to 180 ms) when an Ethernet cable is connected. If we disconnect the Ethernet cable, we can run for multiple days without any problems.

Typical Latency:
ISR: 10 us
Wake Up RT Thread: 18 us  <<< This is the figure that changes depending on whether Ethernet is connected

Driver Code

static irqreturn_t isr(int irq, void *device)
{
   gpio_set_value(27, 1);
   gpio_set_value(27, 0);

   if (atomic_read(&device_data->awake) == 0) // Someone sleeping on it?
   {
      atomic_set(&device_data->awake, 1);
      wake_up_interruptible(&device_data->irqWaitq);
   }
   else
   {
      // Bad: the FPGA raised an IRQ but no one was waiting to catch it.
      device_data->irqUnservicedCount++;
      printk(KERN_WARNING "%s(): %d LOST IRQ\n", __func__, device_data->irqUnservicedCount);
   }

   return IRQ_HANDLED;
}

 

static unsigned int device_poll(struct file *filp, struct poll_table_struct * wait)
{
   unsigned int mask = 0;

   poll_wait(filp, &device_data->irqWaitq,  wait);

   mutex_lock(&device_data->lock);

   if(atomic_read(&device_data->awake) == 1)
   {
      gpio_set_value(28, 1);
      mask |= POLLIN | POLLRDNORM;
      atomic_set(&device_data->awake, 0);
   }
   else
   {
      gpio_set_value(28, 0);
   }

   mutex_unlock(&device_data->lock);

   return mask;
}

Userspace Application

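#include <fcntl.h>
#include <poll.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>
#include <unistd.h>

bool end_thread = false;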
const char *dev_name = "/dev/irq_issue";

void *executeThread(void *arg)
{
   struct sched_param param;
   param.sched_priority = sched_get_priority_max(SCHED_FIFO);
   pthread_setschedparam( pthread_self(), SCHED_FIFO, &param);

   dev_t dev = makedev(233, 0);
   mknod(dev_name, S_IFCHR | S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH, dev);

   int fd = open(dev_name, O_RDWR);

   struct pollfd poll_fd;
   poll_fd.fd = fd;
   poll_fd.events = POLLIN;
   poll_fd.revents = 0;

   while (!end_thread)
   {
      poll(&poll_fd, 1, -1);
   };

   close(fd);

   return NULL;
}

int main(void)
{
   pthread_t thread_id;
   pthread_create(&thread_id, NULL, executeThread, NULL);
   pthread_join(thread_id, NULL);

   return 0;
}

 

 

  • Hi Stephane,

    Please see if the wiki pages below are of any help:

    http://processors.wiki.ti.com/index.php/TI81XX_PSP_04.04.00.02_Feature_Performance_Guide#Ethernet_Switch_Driver

    http://processors.wiki.ti.com/index.php/Demystifying_Ethernet_Performance

    Regards,
    Pavel

  • Thank you, but the wiki pages don't help me.

    Currently the Ethernet interface (100 FD) is not used to transfer any data. The only data received/transmitted is the 'network noise'. I looked at how much that represents: over 20 minutes we saw 3.3 interrupts/s and 484 bytes/s, so it is very low traffic.

    But every now and then the network driver seems to hold on to the CPU for a long time (from 25 ms up to 180 ms), and this, as I said in my initial post, prevents my thread from executing every 20 ms.

  • Stephane,

    Check whether all the network/Ethernet/EMAC-related Linux kernel patches are in your Linux kernel:

    http://arago-project.org/git/projects/?p=linux-dvr-rdk-dm81xx.git;a=shortlog;h=refs/heads/dvrrdk_kernel_int_branch

    See also if the e2e thread below is of any help:

    http://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/716/t/252988.aspx

    Regards,
    Pavel

  • Hi,

    We are using dvrrdk_kernel_rel_04.01.00.00. I tried with the code on dvrrdk_kernel_int_branch and the problem is still present.

    Thanks,

    Stephane

  • Stephane,

    I am afraid you are hitting the real-time performance limitations of the stock 2.6 kernel. One thing you can try (if you have not already) is to change the preemption model in the kernel configuration (make menuconfig -> Kernel Features -> Preemption Model). By default it is set to "No Forced Preemption". You can try voluntary preemption (CONFIG_PREEMPT_VOLUNTARY) or low-latency desktop (CONFIG_PREEMPT) and see if it helps your issue without breaking something else.

    At the same time, this may or may not be sufficient. Real-time patches are known to deliver single-digit millisecond latencies, but they are not supported in the kernel that comes with the DVR RDK. This is explained here:

    https://rt.wiki.kernel.org/index.php/Frequently_Asked_Questions#What_are_real-time_capabilities_of_the_stock_2.6_linux_kernel.3F

    If the above does not help, the only way forward I can think of is moving your periodic processing into kernel space.

    Regards,

    Michael

  • Hi Michael,

    We have been using the CONFIG_PREEMPT mode.  Moving our processing into kernel space is not practical, as we use external user-space libraries.  I want to stress again that no data is actually transferred to or from our application on the ETH port.  Just plugging in the network cable causes the issue, so it is not a throughput issue.

    I have attached the complete simplified code to reproduce the problem.  You may need to provide an interrupt source to reproduce our timing constraints.

    7737.irq_issue.tar.gz

    Thanks

  • Hi Robert,

    According to the article I quoted earlier, in CONFIG_PREEMPT mode the Linux kernel should not cause 180 ms of latency. This means the latency is likely introduced by the network driver. The driver code is in linux/drivers/net/cpsw.c. One thing you can do is profile the code to confirm the source of the latency.

    Regards,

    Michael