High CPU workload issue with 3.10.00.19 DVSDK and vanilla 2.6.32 kernel

Hi,

After upgrading the DVSDK from 2.10.01.18 to 3.10.00.19 and the kernel from MontaVista 2.6.18 to vanilla 2.6.32, we observed that, running the same application, the CPU load on the new system is almost twice the CPU load on the old system (2.10.01.18 DVSDK and 2.6.18 kernel).

Does anyone have any ideas or suggestions?

  • Under which use case are you checking the CPU load?

    If it is related to the H.264 encoder: there was an issue where excess EDMA error interrupts were generated, which took a significant hit on ARM utilization. This is fixed in the new release: http://software-dl.ti.com/dsps/dsps_public_sw/codecs/DM36x/index_FDS.html, H.264 Encoder, Version 2.10.00.09 and beyond.

    Can you confirm the codec version being used?

    regards

    Yashwant

  • Hi Yashwant,

    Thanks for your response.

    We are using the "top" command to check the CPU load.

    The codec version is h264-02.20.00.05.

    Do you need any more information from me?

    This issue is very critical for us. Thanks a lot.

  • Hi William,

    Which process/thread is taking a higher load compared to the previous version, and what is the difference?

    regards

    Yashwant.

  • Hi Yashwant,

    From our observation, it looks like memcpy is taking the higher load. Please refer to the information below.

    New SDK/Kernel

    # opreport -l
    samples  %        app name                 symbol name
    12427    25.9463  libc-2.8.so              memcpy
    1980      4.1340  vmlinux                  schedule
    1746      3.6455  vmlinux                  __do_softirq
    1287      2.6871  vmlinux                  csum_partial_copy_from_user
    990       2.0670  vmlinux                  emac_poll
    928       1.9376  wis-streamer             UCServerMediaSubsession::doRTPSend(int, unsigned char*, int, unsigned char*, int)
    778       1.6244  vmlinux                  ip_append_data
    713       1.4887  vmlinux                  udp_sendmsg
    698       1.4574  vmlinux                  sub_preempt_count
    683       1.4260  libc-2.8.so              memset
    588       1.2277  vmlinux                  add_preempt_count
    587       1.2256  vmlinux                  fget_light
    503       1.0502  vmlinux                  handle_IRQ_event
    499       1.0419  vmlinux                  default_idle
    489       1.0210  vmlinux                  net_rx_action
    454       0.9479  vmlinux                  vector_swi
    451       0.9416  vmlinux                  ip_push_pending_frames
    398       0.8310  vmlinux                  dev_queue_xmit
    367       0.7663  vmlinux                  udp_push_pending_frames
    332       0.6932  vmlinux                  emac_dev_xmit
    331       0.6911  vmlinux                  __mutex_lock_slowpath
    320       0.6681  vmlinux                  ip_finish_output

    Old SDK/Kernel:

    # opreport -l
    samples  %        app name                 symbol name
    27223    46.4350  vmlinux-720p             default_idle
    6896     11.7627  encode_stream            /stream/bin/encode_stream
    3337      5.6920  libc-2.5.90.so           memcpy
    1148      1.9582  vmlinux-720p             csum_partial_copy_from_user
    1136      1.9377  wis-streamer             /stream/bin/wis-streamer
    1048      1.7876  vmlinux-720p             __schedule
    731       1.2469  vmlinux-720p             __delay
    643       1.0968  vmlinux-720p             emac_dev_tx
    495       0.8443  vmlinux-720p             fget_light
    462       0.7880  vmlinux-720p             vector_swi
    440       0.7505  vmlinux-720p             udp_sendmsg
    427       0.7283  vmlinux-720p             ip_append_data
    398       0.6789  vmlinux-720p             ip_push_pending_frames
    363       0.6192  cmemk                    /cmemk
    344       0.5868  libpthread-2.5.90.so     pthread_mutex_lock
    287       0.4895  vmlinux-720p             emac_tx_bdproc
    275       0.4691  vmlinux-720p             arm926_dma_clean_range
    252       0.4298  vmlinux-720p             dev_queue_xmit
    246       0.4196  libc-2.5.90.so           memset

     

  • William,

    Is it possible to find which component is calling this memcpy? I will move this thread to the Linux forum, as it does not seem to be a codec issue.

    regards

    Yashwant

     

  • I’ve figured out that the memcpy workload was so high because I was copying unaligned data. I’ve fixed that issue. Now the bottleneck seems to lie in the kernel.

     

    Here’s the general report. When sending 18 Mbps of data, the kernel consumes most of the time slices.

    [root@localhost ]# opreport
        44124 70.2858 vmlinux
         6129  9.7630 libc-2.8.so
         5857  9.3297 encode_stream
         2867  4.5669 wis-streamer
         2315  3.6876 libpthread-2.8.so
          411  0.6547 cmemk
          301  0.4795 web_server
          300  0.4779 libgcc_s.so.1
           83  0.1322 dm365mmap
           76  0.1211 dm365_vpbe
           50  0.0796 dm365_vpfe

     

    The report on the kernel shows that schedule and __do_softirq are the culprits.

    [root@localhost ]# opreport -l -t 1 /vmlinux
    samples  %        symbol name
    3144      7.1254  schedule
    2769      6.2755  __do_softirq
    2153      4.8794  csum_partial_copy_from_user
    1676      3.7984  emac_poll
    1285      2.9122  ip_append_data
    1067      2.4182  udp_sendmsg
    1065      2.4137  sub_preempt_count
    1022      2.3162  add_preempt_count
    931       2.1100  fget_light
    758       1.7179  net_rx_action
    752       1.7043  ip_push_pending_frames
    730       1.6544  dev_queue_xmit
    703       1.5932  handle_IRQ_event
    691       1.5660  vector_swi
    617       1.3983  sys_sendmsg
    612       1.3870  __copy_from_user

     

    These are the top statistics; the kernel-space and softirq workload is very high.

    Cpu(s): 30.8%us, 49.4%sy,  0.0%ni,  0.0%id,  0.0%wa,  1.8%hi, 18.0%si,  0.0%st

     

    I also checked the number of network interrupts during 10 seconds:

    All interrupts:

    new      old
    23623    8713
    23798    9497
    23852    9339
    24655    8848

    eth0 interrupts:

    new      old
    16691    1101
    16530    1168
    16768    1207
    16479    1103

    When sending the same amount of data (18 Mbps), the number of network interrupts on the new system is over 10 times that on the old system. Could there be a problem in the NIC driver?

  • Hi William,

    I am facing a similar issue. I am using DVSDK 4.02.00.06. The H.264 encoder is taking around 25% of the CPU time.

    Of that, 12% to 15% is used for the actual encoding at VGA resolution and a 2 Mbps bit rate, and 10% to 12% is used to send the encoded data over the network.

    I am also using memcpy while sending the data over the network. Can you please tell me how you fixed this issue?

    And have you fixed the encoder issue as well? Is there any patch we need to apply here?

    Thanks and Regards,

    Avneet

  • Hi,

    Not sure whether this will help, but we observed a higher CPU load when we upgraded the IPNC from 2.6.18 to 2.6.37. Adding 'highres=off' and 'nohz=off' to the bootargs significantly reduced the CPU load.
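    In case it helps others, here is a sketch of where those flags go in a U-Boot environment; everything other than 'highres=off' and 'nohz=off' (console, root device, etc.) is a board-specific placeholder:

```
setenv bootargs 'console=ttyS0,115200n8 root=/dev/mmcblk0p2 rw rootwait highres=off nohz=off'
saveenv
boot
```

    'highres=off' disables high-resolution timers and 'nohz=off' disables tickless (dynamic tick) mode, so the kernel falls back to a plain periodic tick; on these SoCs that can noticeably reduce timer and softirq overhead.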

    Akshita K A