CPU load with encodedecode pass through

The attached logs were taken on a DM365 EVM board running the encodedecode application: ./encodedecode -y 2 -p -r 720x576.

color space UYVY planar

bootargs: mem=70M console=ttyS0,115200n8 root=/dev/nfs rw noinitrd nfsroot=10.0.0.230:/home/sudhanshu/work/evm/target video=davincifb:vid0=OFF:vid1=OFF:osd0=720x576x16,4050K dm365_imp.oper_mode=0 davinci_capture.device_type=4 davinci_enc_mngr.ch0_mode=pal davinci_enc_mngr.ch0_output=COMPOSITE davinci_enc_mngr.ch0_mode=pal

The attached log shows CPU consumption rising to about 25% for nearly 2 minutes before dropping back to 0-1%. The same is observed with top. Most of the load comes from the display thread, which is only doing DQBUF and QBUF.

Has anyone noticed this issue, and are there any suggestions?

 

Regards,

Sudhanshu

 

  • This does seem a bit unusual. The CPU load is generally very low all the time, since all the data movement is done by the capture/display hardware and all the codec processing is done by the accelerator. I have not noticed these bumps in CPU load on my end.

    Is this something that happens reliably every time you run or is it more intermittent?

    When it does happen is the CPU load following the same pattern or is the CPU usage more random?

    What software version are you using, DVSDK 2.10.01.18?

    If this is happening within the display thread, then it sounds like it may be some form of display driver issue. The display driver datasheet shows very low CPU load at run time, and the driver should not put this kind of overhead on the system if all the display thread is truly doing is queueing and dequeueing buffers.

  • Yes, my DVSDK version is 2.10.01.18.

    It is not intermittent. I plotted the load using top (top -b -d 10 -H) to list the display thread load. It is periodic (most of the time): I read around 20% CPU for the display thread for at least 10 consecutive readings (one reading every 10 seconds). It happens roughly every 40 minutes; at all other times the load is near 0-1%.

    I tried to isolate the problem: without the display, the spike does not occur.

    In one run I removed the DMA copies so that only DQBUF and QBUF are done, and I still see the spike.

    Have you tried logging top over a run of several hours?
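As an alternative to parsing top output, the per-thread load can be sampled directly from /proc/<pid>/task/<tid>/stat, which is also where top gets its numbers. The following is only a minimal sketch: the function names are hypothetical, and the field positions (utime is field 14, stime is field 15) follow the proc(5) manual page.

```c
#include <stdio.h>
#include <string.h>

/* Extract utime + stime (in clock ticks) from one line of
 * /proc/<pid>/task/<tid>/stat. Returns 0 on success, -1 on parse failure. */
int parse_stat_ticks(const char *line, unsigned long *ticks)
{
    /* Field 2 (comm) is wrapped in parentheses and may itself contain
     * spaces or parentheses, so resume parsing after the last ')'. */
    const char *p = strrchr(line, ')');
    unsigned long utime, stime;

    if (p == NULL)
        return -1;

    /* Skipped after ')': state, ppid, pgrp, session, tty_nr, tpgid,
     * flags, minflt, cminflt, majflt, cmajflt. Then utime and stime. */
    if (sscanf(p + 1,
               " %*c %*d %*d %*d %*d %*d %*u %*u %*u %*u %*u %lu %lu",
               &utime, &stime) != 2)
        return -1;

    *ticks = utime + stime;
    return 0;
}

/* Read the current tick count of one thread. Returns 0 on success. */
int read_thread_ticks(int pid, int tid, unsigned long *ticks)
{
    char path[64];
    char line[512];
    FILE *f;

    snprintf(path, sizeof(path), "/proc/%d/task/%d/stat", pid, tid);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(line, sizeof(line), f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return parse_stat_ticks(line, ticks);
}

/* Convert a tick delta measured over interval_s seconds into a percentage
 * of one CPU. ticks_per_s is normally sysconf(_SC_CLK_TCK). */
double cpu_percent(unsigned long ticks_before, unsigned long ticks_after,
                   double interval_s, long ticks_per_s)
{
    return 100.0 * (double)(ticks_after - ticks_before)
           / (interval_s * (double)ticks_per_s);
}
```

Calling read_thread_ticks() twice, 10 seconds apart, and passing both values to cpu_percent() should give a figure comparable to one line of top -b -d 10 -H for that thread.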

     

     

  • 2860.cpu_load_ntsc_pal_semiplanar.xls

     

    I did more experiments on this jump in CPU usage to 25% in the display thread on DM365.

    1. Raised kernel memory to 110 MB and reduced CMEM to 18 MB

    2. Removed all other threads; only the display get and put is running, with constant data (no capture, not even a single DMA).

    3. There are no EDMA interrupts in the run, and the rate of davinci_osd interrupts is also constant

    4. Display is running in interlaced mode

    I recorded the output of top for the following configurations:

    1. NTSC + YUV 4:2:0  -- average ARM load 3-4 % no spike

    2. NTSC + YUV 4:2:2 -- average ARM load 3-4 % no spike

    3. PAL + YUV 4:2:0 -- average ARM load 0-1 %, with a spike to 25% for nearly a minute every 40-60 minutes

    4. PAL + YUV 4:2:2 -- average ARM load 0-1 %, with a spike to 25% for nearly a minute every 40-60 minutes

    I am not sure whether the driver/kernel has a problem related to PAL, or whether it is just the rate at which the interrupt arrives: 30 FPS for NTSC as opposed to 25 FPS for PAL.

    Please find attached an xls report of the CPU usage (in %) for all 4 configurations.

    Has anyone observed such a difference in CPU load between PAL and NTSC?

    This CPU usage can be critical for my use case, given all the other processes in the system. Any help or input on debugging is most welcome.

    -Sudhanshu

     

  • When you say you removed all other threads, does this mean the system is not doing any encode or decode (as in the original encodedecode demo you started with)? If so, this would seem to be a problem with the Linux driver. If you are doing encode and decode, then it could be the codecs as well.

    Nevertheless, your findings are very interesting.

  • Juan,

     

    Yes, there is no encode/decode; nothing is running other than ssh, top, and the display thread doing QBUF and DQBUF.

    I have verified the sysfs settings for PAL and NTSC.

    Also, there are no EDMA completion interrupts in the run, so it is not related to DMA either.

    I will try to post a code snippet as well. Did anyone else notice such behavior?

     

    -Sudhanshu

  • I was busy with other issues, so I was not able to post a small application which causes the bug.

    The following code can be saved as v4l2_userptr_loopback.c (I have cut the original PSP application down to a bare minimum of about 250 lines).

    This is run on the DM365 EVM; there is nothing running on capture.

    I see spikes of 25% CPU usage. Please try running it and capture top logs for an hour.

     

    /****************************************************************************

     * This application is taken from PSP 02.10.00.14. Parts of application are cut down

     * to just run DQBUF and QBUF for COMPOSITE,PAL, YUV 422 mode.

     * This helps in seeing the spike in CPU usage because of display.

     * DVSDK version: 2.10.01.18

     *

     * Build steps

     * arm_v5t_le-gcc  -c v4l2_userptr_loopback.c

     * arm_v5t_le-gcc -o v4l2_userptr_loopback v4l2_userptr_loopback.o

     * **************************************************************************/

     

    /*

     * Test setup

     * ***********************************************************************************************************

     * Boot Args:

     * setenv bootargs mem=70M console=ttyS0,115200n8 root=/dev/nfs rw noinitrd nfsroot=<NFS Server>:<NFS root> video=davincifb:vid0=OFF:vid1=OFF:osd0=720x576x16,4050K dm365_imp.oper_mode=0 davinci_capture.device_type=4 davinci_enc_mngr.ch0_mode=pal davinci_enc_mngr.ch0_output=COMPOSITE davinci_enc_mngr.ch0_mode=pal ip=10.0.0.233:10.0.0.230:10.0.0.1:255.255.255.0:dart::off

     * 

     * ************************************************************************************************************

     * Run loadmodules.sh

     * Contents of loadmodules.sh

      #!/bin/sh

      rmmod cmemk 2>/dev/null

      rmmod irqk 2>/dev/null

      rmmod edmak 2>/dev/null

      rmmod dm365mmap 2>/dev/null

     

      # Pools configuration

      insmod cmemk.ko phys_start=0x85000000 phys_end=0x88000000 pools=6x4096,2x8192,1x11908,2x13184,1x2697152,6x4096,1x30720,3x81920,1x3185664,64x56,1x320,1x640,1x81920,1x6650880,2x608,1x296,1x28,2x24,1x154288,10x1658880 allowOverlap=1 phys_start_1=0x00001000 phys_end_1=0x00008000 pools_1=1x28672

      insmod irqk.ko 

      insmod edmak.ko

      insmod dm365mmap.ko

     *

     *

     * run v4l2_userptr_loopback

     ************************************************************************************************************

     * Capture output of top

     *

     *

     * Linux bash shell on PC --->

     *

     * script evm_display_top.log

     * ssh root@<board_ip>

     *

     * on ssh:

     * top -b -d 1

     *

     * This command will generate a log file evm_display_top.log. Keep the board in this state for few hours.

     * grep for v4l2_userptr_loopback on this file for CPU usage, copy and plot graph using Open Office/MS Excel

     *

     * mailto: sudhanshu.saxena@gmail.com (for any questions/help)

     */

     

    /*******************************************************************************

     * HEADER FILES

     */

    #include <stdio.h>

    #include <string.h>

    #include <stdlib.h>

    #include <sys/ioctl.h>

    #include <fcntl.h>

    #include <unistd.h>   /* close, read, write, lseek */

    #include <strings.h>  /* bzero */

    #include <linux/fb.h>

    #include <asm/types.h>

     

    /* Kernel header file, prefix path comes from makefile */

    #include <linux/videodev2.h>

    #include <linux/videodev.h>

     

    /*******************************************************************************

     * LOCAL DEFINES

     */

    #define WIDTH_PAL 720

    #define HEIGHT_PAL 576

    #define MIN_BUFFERS 4

     

     

    /* Device parameters */

    #define VID0_DEVICE "/dev/video2"

    #define OSD0_DEVICE "/dev/fb/0"

    #define OSD1_DEVICE "/dev/fb/2"

     

    /* Function error codes */

    #define SUCCESS 0

    #define FAILURE -1

     

    #define DISPLAY_INTERFACE_COMPOSITE "COMPOSITE"

    #define DISPLAY_MODE_PAL "PAL"

     

    #define word_align(address) (((((unsigned int)address) & ~(0x3)) + 4))

     

    #define VIDEO_NUM_BUFS 4

     

    int *buffers[VIDEO_NUM_BUFS];

     

    /* Standards and output information */

    #define ATTRIB_MODE "mode"

    #define ATTRIB_OUTPUT "output"

     

    /*******************************************************************************

     * FILE GLOBALS

     */

     

    static int fd_vid, fd_osd0, fd_osd1;

    static struct v4l2_requestbuffers reqbuf;

    static int numbuffers = VIDEO_NUM_BUFS;

    int display_image_size = 0;

     

     

    static int put_display_buffer(int, int);

    static int get_display_buffer(int);

    static int start_display(int, int, int);

    static int init_vid_device();

    static int change_sysfs_attrib(char *attribute, char *value);

     

    /*******************************************************************************

     * FUNCTION DEFINITIONS

     */

     

    /*******************************************************************************

     * Takes the index

     * of the buffer, and QUEUEs the buffer to display

     */

    static int put_display_buffer(int vid_win, int idx)

    {

    struct v4l2_buffer buf;

    int i = 0;

    int ret;

    memset(&buf, 0, sizeof(buf));

     

    buf.type = V4L2_BUF_TYPE_VIDEO_OUTPUT;

    buf.memory = V4L2_MEMORY_USERPTR;

    buf.index = idx;

    buf.length = display_image_size;

    buf.m.userptr = (unsigned long)buffers[idx];

    ret = ioctl(vid_win, VIDIOC_QBUF, &buf);

    return ret;

    }

     

    /*******************************************************************************

     * Does a DEQUEUE and gets/returns the address of the

     * dequeued buffer

     */

    static int get_display_buffer(int vid_win)

    {

    int ret;

    struct v4l2_buffer buf;

    memset(&buf, 0, sizeof(buf));

    buf.type = V4L2_BUF_TYPE_VIDEO_OUTPUT;

    ret = ioctl(vid_win, VIDIOC_DQBUF, &buf);

    if (ret < 0) {

    perror("VIDIOC_DQBUF\n");

    return -1;

    }

    return buf.index;

    }

     

     

    /******************************************************************************/

    static int init_vid_device()

    {

    int mode = O_RDWR;

    int i = 0, ret = 0;

     

    struct v4l2_format fmt, setfmt;

    struct v4l2_fmtdesc format;

    struct v4l2_capability capability;

        if (change_sysfs_attrib(ATTRIB_OUTPUT, DISPLAY_INTERFACE_COMPOSITE))

    return FAILURE;

    if (change_sysfs_attrib(ATTRIB_MODE, DISPLAY_MODE_PAL))

    return FAILURE;

     

    /* open osd0, osd1 devices and disable */

    fd_osd0 = open(OSD0_DEVICE, mode);

    ioctl(fd_osd0, FBIOBLANK, 1);

     

    fd_osd1 = open(OSD1_DEVICE, mode);

    ioctl(fd_osd1, FBIOBLANK, 1);

     

    fd_vid = open(VID0_DEVICE, mode);

    if (-1 == fd_vid) {

    printf("failed to open VID1 display device\n");

    return -1;

    }

    printf("done\n");

     

    reqbuf.type = V4L2_BUF_TYPE_VIDEO_OUTPUT;

    reqbuf.count = numbuffers;

    reqbuf.memory = V4L2_MEMORY_USERPTR;

    ret = ioctl(fd_vid, VIDIOC_REQBUFS, &reqbuf);

    if (ret < 0) {

    return -1;

    }    

     

     

        /*allocate buffers*/

        {

            int i = 0;

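            for(i=0; i<VIDEO_NUM_BUFS; i++) {

                /* braces are required: both statements must run for every buffer */

                buffers[i]=(int *)malloc(736*576*2);

                if (buffers[i] == NULL)

                    return FAILURE;

                buffers[i] = (int *)word_align(buffers[i]);

            }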

        }

     

     

        memset(&setfmt, 0x00, sizeof(struct v4l2_format));

    setfmt.type = V4L2_BUF_TYPE_VIDEO_OUTPUT;

    setfmt.fmt.pix.pixelformat = V4L2_PIX_FMT_UYVY;

    setfmt.fmt.pix.bytesperline = WIDTH_PAL * 2;

    setfmt.fmt.pix.sizeimage = setfmt.fmt.pix.bytesperline * HEIGHT_PAL;

    setfmt.fmt.pix.field = V4L2_FIELD_ANY;

     

    display_image_size = setfmt.fmt.pix.sizeimage; 

    ret = ioctl(fd_vid, VIDIOC_S_FMT, &setfmt);

    if (ret < 0) {

    perror("VIDIOC_S_FMT\n");

    close(fd_vid);

    return -1;

    }

     

        /* PRIME PUT VIDEO_NUM_BUFS BUFFERS AND STREAMON */

        {

       struct v4l2_buffer buf;

       enum v4l2_buf_type type;

            int idx = 0;

     

            for(idx=0;idx<VIDEO_NUM_BUFS;idx++){

            bzero(&buf, sizeof(buf));

                buf.type = V4L2_BUF_TYPE_VIDEO_OUTPUT;

           buf.memory = V4L2_MEMORY_USERPTR;

             buf.index = idx;

           buf.length = display_image_size;

            buf.m.userptr = (unsigned long)buffers[idx];

     

            ret = ioctl(fd_vid, VIDIOC_QBUF, &buf);

             if (ret < 0) {

              printf("QBUF failed fd = %d\n", fd_vid);

             return -1;

           }else{

                    printf("primed buffer number [%d] \n", idx);

                }

             }

     

            type = V4L2_BUF_TYPE_VIDEO_OUTPUT;

           ret = ioctl(fd_vid, VIDIOC_STREAMON, &type);

             if (ret < 0) {

            perror("VIDIOC_STREAMON\n");

           return -1;

            }   

     

        }

    return SUCCESS;

    }

     

     

    static int change_sysfs_attrib(char *attribute, char *value)

    {

    int sysfd = -1;

    char init_val[32];

    char attrib_tag[128];

     

    bzero(init_val, sizeof(init_val));

    strcpy(attrib_tag, "/sys/class/davinci_display/ch0/");

    strcat(attrib_tag, attribute);

     

    sysfd = open(attrib_tag, O_RDWR);

    if (sysfd < 0) {

    printf("Error: cannot open %s\n", attrib_tag);

    return FAILURE;

    }

    printf("%s was opened successfully\n", attrib_tag);

     

    read(sysfd, init_val, 32);

    lseek(sysfd, 0, SEEK_SET);

    printf("Current %s value is %s\n", attribute, init_val);

     

    write(sysfd, value, 1 + strlen(value));

    lseek(sysfd, 0, SEEK_SET);

     

    memset(init_val, '\0', 32);

    read(sysfd, init_val, 32);

    lseek(sysfd, 0, SEEK_SET);

    printf("Changed %s to %s\n", attribute, init_val);

     

    close(sysfd);

    return SUCCESS;

    }

     

     

    /******************************************************************************/

    /* main function */

    int main(int argc, char *argv[])

    {

        int display_index = 0;

        int ret;

        int count=VIDEO_NUM_BUFS;

     

        /* change sysfs, open video, prime with VIDEO_NUM_BUFS buffers */

        if (init_vid_device() != SUCCESS)

            return FAILURE;

        while(1){

    display_index = get_display_buffer(fd_vid);

    if (display_index < 0) {

    printf("Error in getting the  display buffer:VID1\n");

    return FAILURE;  /* ret is not yet assigned on this path */

    }

     

            ret = put_display_buffer(fd_vid, display_index);

    if (ret < 0) {

    printf("Error in putting the display buffer\n");

    return ret;

    }

            count++;

            if ( !(count%15000) ){

                printf("buffers put count [%d] \n", count);

            }

     

        }        

        return ret;

    }
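One way to narrow this down, offered only as a debugging sketch and not something established in this thread, is to record the interval between successive DQBUF returns. If the loop is paced by the display, the intervals should stay near 40 ms for PAL; if they collapse during the spike, the loop is spinning. The helper names below are hypothetical.

```c
#include <time.h>

/* Running statistics over intervals, in microseconds. */
struct interval_stats {
    long min_us;
    long max_us;
    long long sum_us;
    unsigned long n;
};

void interval_stats_init(struct interval_stats *s)
{
    s->min_us = 0;
    s->max_us = 0;
    s->sum_us = 0;
    s->n = 0;
}

/* Elapsed time from *start to *end in microseconds. */
long timespec_diff_us(const struct timespec *start, const struct timespec *end)
{
    return (long)(end->tv_sec - start->tv_sec) * 1000000L
           + (end->tv_nsec - start->tv_nsec) / 1000L;
}

/* Fold one interval into the statistics. */
void interval_stats_add(struct interval_stats *s, long us)
{
    if (s->n == 0 || us < s->min_us)
        s->min_us = us;
    if (s->n == 0 || us > s->max_us)
        s->max_us = us;
    s->sum_us += us;
    s->n++;
}

/* Mean interval in microseconds, or 0 if nothing was recorded. */
long interval_stats_mean_us(const struct interval_stats *s)
{
    return s->n ? (long)(s->sum_us / (long long)s->n) : 0;
}
```

In the loop above, one would take a clock_gettime(CLOCK_MONOTONIC, ...) timestamp after each get_display_buffer() call, pass the difference from the previous timestamp to interval_stats_add(), and print and reset the statistics in the existing count%15000 branch.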

  • Hi Sudhanshu,

    I am also seeing the same behaviour with the video capture driver (DQBUF and QBUF only) on the 2.6.18 MVL kernel on DM6467. Did you find a solution for this issue? If so, please share.

    This is also posted at http://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/100/t/131644.aspx but no one has replied.

    Regards,

    Santosh

     

     

     

  • Hi Santosh

    On DM368 we solved this problem by opening the V4L2 device in nonblocking mode and queueing/dequeueing buffers every 40 ms (the PAL frame period at 25 fps).

    Regards

    George
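George's workaround could be sketched roughly as follows. This is not his actual code: the function names are hypothetical, and it assumes the display driver honours O_NONBLOCK by returning EAGAIN from VIDIOC_DQBUF when no buffer is ready, as the V4L2 API describes. The period is 1 s / 25 fps = 40 ms.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* for clock_gettime / clock_nanosleep */
#endif

#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <time.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

/* Open the display device so that DQBUF never blocks. */
int open_nonblocking(const char *path)
{
    return open(path, O_RDWR | O_NONBLOCK);
}

/* Frame period in nanoseconds: 25 fps (PAL) gives 40 ms. */
long frame_period_ns(int fps)
{
    return 1000000000L / fps;
}

/* Advance an absolute deadline by ns, keeping tv_nsec below one second. */
void deadline_advance(struct timespec *t, long ns)
{
    t->tv_nsec += ns;
    while (t->tv_nsec >= 1000000000L) {
        t->tv_nsec -= 1000000000L;
        t->tv_sec++;
    }
}

/* Try to dequeue one buffer without blocking.
 * Returns the buffer index, -1 if none is ready (EAGAIN), -2 on error. */
int try_dequeue(int fd)
{
    struct v4l2_buffer buf;

    memset(&buf, 0, sizeof(buf));
    buf.type = V4L2_BUF_TYPE_VIDEO_OUTPUT;
    buf.memory = V4L2_MEMORY_USERPTR;
    if (ioctl(fd, VIDIOC_DQBUF, &buf) < 0)
        return (errno == EAGAIN) ? -1 : -2;
    return (int)buf.index;
}

/* Queue user-pointer buffer idx back to the display. */
int requeue(int fd, int idx, void *addr, unsigned int length)
{
    struct v4l2_buffer buf;

    memset(&buf, 0, sizeof(buf));
    buf.type = V4L2_BUF_TYPE_VIDEO_OUTPUT;
    buf.memory = V4L2_MEMORY_USERPTR;
    buf.index = (unsigned int)idx;
    buf.length = length;
    buf.m.userptr = (unsigned long)addr;
    return ioctl(fd, VIDIOC_QBUF, &buf);
}

/* Once per frame period: dequeue if a buffer is ready, requeue it, then
 * sleep until the next absolute deadline. Returns -1 on an ioctl error. */
int paced_display_loop(int fd, void *bufs[], unsigned int length, int fps)
{
    struct timespec next;
    long period = frame_period_ns(fps);

    clock_gettime(CLOCK_MONOTONIC, &next);
    for (;;) {
        int idx = try_dequeue(fd);

        if (idx == -2)
            return -1;
        if (idx >= 0 && requeue(fd, idx, bufs[idx], length) < 0)
            return -1;
        deadline_advance(&next, period);
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
    }
}
```

Sleeping to an absolute deadline rather than for a fixed 40 ms keeps the loop from drifting against the display clock. With glibc older than 2.17, clock_gettime and clock_nanosleep live in librt, so the build needs -lrt.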

  • Hi Santosh,

    I am no longer working on TI Davinci boards. I have forwarded the query to relevant members in that team.

    My understanding is that this issue keeps occurring on DM365 and DM368 boards. We could not find a resolution and just lived with the CPU spike, reducing all other CPU consumption to leave headroom for it.

    Regards,

    Sudhanshu

  • I am also running it successfully the way George does. Jinh T.

  • Hi All,

    Thanks for the solution. I will try this as well.

    Regards,

    Santosh