USB Issue in the 3.14 kernel with AM335x EVMSK board

Hi Experts,

While running Linux kernel 3.14 on the AM335x EVMSK board with a USB device connected, the kernel suddenly crashes with the following log.

[  225.367465] omap_i2c 44e0b000.i2c: controller timed out
[  226.667472] omap_i2c 44e0b000.i2c: controller timed out

[  284.237116] [sched_delayed] sched: RT throttling activated
[  286.506988] omap_i2c 44e0b000.i2c: controller timed out

  • Any response ??
  • Hi lyf,

    Could you post how you are compiling the kernel? I built and ran kernel version 3.14 without any problem.

    BR
    Ivan
  • Hi Ivan,

    I am compiling the kernel as described at processors.wiki.ti.com/index.php

    The issue occurs when I stress the USB device with heavy transfers; under normal load it works fine.
    It shows up specifically during peak data transfer.
  • Hi Ivan,

    This issue also seems to be fairly common on the BeagleBone Black, which uses the same AM335x.

    Please see these links:

    groups.google.com/forum
    groups.google.com/forum

    It would be great if you could help me to sort this out.
  • Hi Lyf,

    I can't recreate the issue. Could you post step-by-step instructions for compiling the kernel and inserting the module, and describe exactly how you run this stress transfer?

    BR
    Ivan
  • Hi Ivan,

    I enabled the MUSB-related configuration options and configured the controller in host-only mode.

    Then run active data transfers, such as reads and writes, against any test device that is powered from the host. It is better if the device is connected through a hub.
  • We are getting this same issue with stock EZSDK 8.00.00.00 kernel.  It seems the RT throttling is killing the I2C and USB transfers for a time.

    We are trying to use USB with a hub plus 2 video capture dongles.

    I am going to try a preemptible kernel, but I think this issue comes up because the SMP kernel is selected...

  • I will confirm, switching to preemptible kernel did stop the RT throttling messages. Also, I just noted that trying dual video stream with TI CPPI DMA engine locks up the system. Using the Inventra DMA engine seems to work, if a bit choppy.
  • Hi David,

    Thanks for posting the additional information here.
  • OK, I made several changes to our kernel config and now I am happy with the performance.

    First: Changed the default CPU frequency governor to "performance" (instead of "ondemand"). It seems like changing frequencies while the USB is under load causes hiccups on the USB. Or, maybe it was something to do with SGX driver, since we are using both USB and SGX driver here.

    Second: Changed kernel config to "Preemptible Kernel" instead of "Voluntary Preemption."

    Third: Unchecked "SMP" kernel.

    Fourth (only matters if using two video streams): Modified MUSB FIFO Config to provide 2 4KB endpoints.

    I am running my test with TI's CPPI DMA and it's working well right now. I haven't seen the mentioned issues since making these changes.
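The first three changes listed above map to Kconfig symbols, so a build's .config can be checked against them with a short script. This is a minimal sketch, assuming the usual mainline symbol names (CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE, CONFIG_PREEMPT, CONFIG_SMP); the function names are hypothetical, and the MUSB FIFO change is not covered because it is not described as a config option in this thread.

```shell
#!/bin/sh
# Hypothetical helpers: compare a kernel .config with the settings described
# in the post above. Symbol names are the usual mainline ones (assumption).

# Succeeds if option $2 is set to "y" in config file $1.
config_enabled() {
    grep -q "^$2=y\$" "$1"
}

# Prints one line per checked option; returns non-zero if any option differs.
check_kernel_config() {
    cfg=$1
    status=0
    for opt in CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE CONFIG_PREEMPT; do
        if config_enabled "$cfg" "$opt"; then
            echo "ok: $opt=y"
        else
            echo "differs: $opt is not set to y"
            status=1
        fi
    done
    if config_enabled "$cfg" CONFIG_SMP; then
        echo "differs: CONFIG_SMP=y"
        status=1
    else
        echo "ok: CONFIG_SMP is not set"
    fi
    return $status
}
```

Example: `check_kernel_config /path/to/linux/.config` prints one line per option and returns non-zero if any of them differs.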
  • David,

    Thanks for the update. Have you tried leaving the second setting unchanged and applying only the other three? I'd like to understand whether the preemptible kernel setting is relevant.
  • Honestly, I have no idea what's going on. There's definitely something broken.

    When I switched back to "Voluntary Preemption" our UI performance dropped dramatically. I switched back to Preemptible, and the UI starts up OK, but if I swipe the interface several times, the Qt application starts using 60% or more of the CPU and becomes unusable. So, I think this means the preemption setting definitely has an overall effect on system performance.

    Regarding the UI, we never had this kind of behavior on the 3.2 kernel and SGX 04.09.00.01.

    I also had enabled the 2 CAN busses, and while they were working, I noticed the kernel's soft IRQs were taking about 50% of the CPU. These went away as soon as I turned off (ifconfig can0 down) the CAN bus. The application was already turned off, but the sirqs didn't go away until shutting down CAN.
  • Hi David/Luf,

    Yesterday I copied about 2 GB of files over USB more than 15 times and had no problems. The only change I made to the .config file was to build the USB host driver into the kernel. I need to reproduce the problem before I can explain what is happening. I tested on my TI EVM platform. Could you explain how to reproduce the issue?

    BR
    Ivan
  • As for me, I've identified a separate issue with Qt. When the application gets slow, I see many repeated futex calls to the same location in strace. When the application is running reasonably well, these repeated calls do not happen (instead, we just get 1 call at a time).
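One way to quantify the repeated futex calls described above is to summarise an strace log by futex address. This is a minimal sketch, assuming a log captured with something like `strace -f -o app.log <command>`; the function name `futex_hotspots` and the log file name are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count futex() calls per futex address in an strace
# log ($1) and list the busiest addresses first.
futex_hotspots() {
    grep -o 'futex(0x[0-9a-f]*' "$1" | sort | uniq -c | sort -rn
}
```

A single address with a very large count while the application is slow would match the pattern described in the post.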

    I've heard of this being caused by Kernel changes, libc changes, etc. So I'm going to check and see if it's worth while to use SDK 8's toolchain (I've made a lot of additions to ours from SDK 6) and RFS, and rebuild everything.

    It will take some time before I can get back to the USB specific issue.
  • Hello Ivan,

    Something I remembered made me think of this. I tended to see DMA issues specifically when connecting more than one high-bandwidth device through an external USB hub. Are you doing this?

    Luf suggested you should test through an external hub, too, so I just wanted to mention it as well for your test setup.

    I haven't seen the RT throttling issues directly related to the USB system on our board since updating everything to the new RFS/Toolchain.
  • That's great work, David.

    I would like to put more emphasis on high-power-consuming devices. I believe a hub and a modem are typical examples.

  • Hello,

    I do not get the RT Throttling messages, but I am able to consistently get system slow downs with high CPU utilization.  I'm not sure if this is due to MUSB system.  Here are the steps to reproduce:

    1) Hook up UVC camera to USB (does not have to be through hub, I've duplicated the problem with a direct connection)

    2) launch UVC stream with gstreamer: gst-launch-0.10 v4l2src device=/dev/video0 ! image/jpeg,width=320 ! jpegdec ! ffmpegcolorspace ! fbdevsink &

    3) Run top, monitor CPU utilization.  In my case typical utilization looks like ~40% for gst-launch, 0% for just about everything else

    4) Wait about 1 hour, usually the slow down occurs between 30-45 minutes, though I had it happen right around 1 hour once.  When this happens, I had 78% CPU utilization for gstreamer, 16% for top, and 2% for ksoftirqd.  The sirq value at the top was reading 17%.

    I was able to narrow this down to gstreamer only, so I've ruled out the SGX drivers.  Only factors should be MUSB, V4L2/UVC driver, gstreamer.  I've checked and both gstreamer-0.10 and gstreamer-1.0 have the same issue.

    I have had this issue with both TI CPPI DMA and PIO only mode.  When I run strace, I see that gstreamer is trying repeatedly to open the stream.  When I look at tracers for softirq_entry, I see v4l2src is getting a ton of repeated irqs, which do not normally show up.  I've attached various logs/traces I've captured during testing.
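Because the slowdown described above can take up to an hour to appear, it may help to log the softirq share of CPU time unattended rather than watching top. The following is a minimal sketch that samples the aggregate "cpu" line of /proc/stat; the function names are hypothetical, and the value should roughly correspond to the sirq figure reported by top.

```shell
#!/bin/sh
# Hypothetical helpers for logging the softirq share of CPU time.

# Print the softirq share (integer percent) between two "cpu ..." lines
# taken from /proc/stat ($1 = earlier sample, $2 = later sample).
softirq_percent() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            # Sum user..steal (fields 2-9); guest time is already part of user.
            for (i = 2; i <= NF && i <= 9; i++) total += $i
            sirq = $8
        }
        NR == 1 { t0 = total; s0 = sirq }
        NR == 2 {
            dt = total - t0
            if (dt > 0) printf "%d\n", (sirq - s0) * 100 / dt
            else print 0
        }'
}

# Sample /proc/stat every $1 seconds (default 10) and print the softirq share.
monitor_softirq() {
    interval=${1:-10}
    prev=$(head -n 1 /proc/stat)
    while sleep "$interval"; do
        cur=$(head -n 1 /proc/stat)
        echo "$(date +%T) sirq=$(softirq_percent "$prev" "$cur")%"
        prev=$cur
    done
}
```

Running `monitor_softirq 10 > sirq.log &` before starting the gstreamer pipeline gives a timestamped record of when the softirq load jumps.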

    Thanks,

    Dave

    5775.gst_v4l2_high_cpu_logs.zip

  • Hello,

    David Paden said:
    gst-launch-0.10 v4l2src device=/dev/video0 ! image/jpeg,width=320 ! jpegdec ! ffmpegcolorspace ! fbdevsink &

    I am not familiar with your board or the SDK that you are using.

    You are using ffmpegcolorspace, which is a software element that converts video from one colorspace to another, so a high CPU load is normal.

    Could you check whether you observe the same issue with this pipeline:

    gst-launch-0.10 v4l2src device=/dev/video0 ! image/jpeg,width=320 ! jpegdec ! fakesink silent=true

    I would also recommend checking the caps image/jpeg,width=320. Is there a particular reason you are not passing the height in the caps?

    Best Regards,

    Margarita

  • Hi Margarita,

    I just tested with BeagleBone Black (Rev C) and TI EZSDK 8.00.00.00 (used files from the "bin" tarball).  Same issue.  I tested after killing matrix_browser and selecting performance governor (echo performance > /sys/bus/cpu/devices/cpu0/cpufreq/scaling_governor).

    Gstreamer is smart enough to figure out the aspect ratio of the source stream.  The uvc adapter supports 640x480, 320x240, and 160x120, so specifying width=320 is enough for it to infer height=240.

    Normally, I get CPU usage of around 30% usr and 0% sirq while running this command.  When the issue occurs (an hour or so after I start gstreamer), the CPU usage jumps to 57% usr and 26% sirq, and only then does the console (and other processes) become slow.

    I ran the fakesink command you provided.  The issue occurred in less than 5 minutes.  It started at 7% usr, 0% sys, 0% sirq.  When the slowdown occurred, I had 56% usr, 15% sys, 27% sirq.

    It is NOT the normal CPU usage of gstreamer that's the issue, it's the CPU usage after some event occurs while gstreamer is running.

    Note that we tried the Debian image from BeagleBoard community.  We did NOT see the slow-down issue with their kernel+rfs.

    Am about to try the EZSDK 8 zip file listed as "Only used when creating an SD card on Windows".  My colleague indicated using this zip file he does not get the slow down issue.  I used the tarball and extracted the files to the SD card in Linux.

  • Alright, so the issue still occurs with the zip file version, so that was a dead lead.

    But I found the difference between my test and my colleague's: performance governor. If the system is set to "performance" the system will slow down during testing. If set to "ondemand" (the default in EZSDK 8), the problem did not occur over a weekend test. I have reached over 1 hour of testing in "ondemand" mode and it hasn't slowed down yet.
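Since the governor turned out to be the deciding factor, switching it in a controlled way makes A/B testing easier. This is a minimal sketch using the standard cpufreq sysfs files; the function names and the CPUFREQ_DIR variable are hypothetical, and the default path shown is the common /sys/devices/system/cpu location rather than the /sys/bus/cpu path used earlier in the thread.

```shell
#!/bin/sh
# Hypothetical helpers for reading and switching the cpufreq governor.
# CPUFREQ_DIR can be overridden if the cpufreq directory lives elsewhere.
CPUFREQ_DIR=${CPUFREQ_DIR:-/sys/devices/system/cpu/cpu0/cpufreq}

# Print the governor currently in use.
get_governor() {
    cat "$CPUFREQ_DIR/scaling_governor"
}

# Switch to governor $1, but only if the kernel lists it as available.
set_governor() {
    want=$1
    for g in $(cat "$CPUFREQ_DIR/scaling_available_governors"); do
        if [ "$g" = "$want" ]; then
            echo "$want" > "$CPUFREQ_DIR/scaling_governor"
            return 0
        fi
    done
    echo "governor '$want' is not available" >&2
    return 1
}
```

For example, `set_governor ondemand && get_governor` run as root before a test run records which mode the test was actually performed in.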

    I need to go back and do more testing with the BeagleBoard/Debian kernel. We can make this kind of behavior happen with different applications, not just streaming video over USB, so we are worried there is a deeper kernel issue here.
  • David,

    We've reproduced this issue and we believe you are correct. There appears to be an issue, possibly with the cpuidle subsystem. It is very apparent when using the performance governor.

    If you'd like to try something that we are trying as a validation/root cause, you can simply rename the PM firmware, in /lib/firmware, and run without it. In our testing, this makes the problem go away.

    We are working on a better workaround patch, and of course, a permanent solution. We will continue to monitor this thread and post updates as we have them.

    I'm sorry you've hit this issue and we certainly want to get it corrected ASAP. I also don't want you to lose anymore valuable development time trying to hunt it down.
  • Hi RonB,

    Glad you are able to reproduce it.  I don't know if this helps, but I did confirm that the issue does not appear to show up using the BeagleBoard Debian setup, even with the CPU frequency governor set to "performance."

    I was about to build their kernel and test on our platform.  I guess the next step is to find out if it's the kernel, the PM firmware, or both that make the issue go away.

    While I will verify again without the PM firmware, our customer requires the standby function, so that won't work for our software release.

    Thanks!

  • David,

    We are still working on the correct solution. As a suggestion to keep you going without hitting the slow state, can you modify arch/arm/mach-omap2/cpuidle33xx.c and comment out the C1 state in am33xx_ddr3_states:

    //      {
    //              .exit_latency = 130,
    //              .target_residency = 200,
    //              .power_usage = 497,
    //              .flags = CPUIDLE_FLAG_TIME_VALID | AM33XX_FLAG_MPU_PLL,
    //              .enter = am33xx_enter_idle,
    //              .name = "C1",
    //              .desc = "Bypass MPU PLL",
    //      },

    This will stop the slowdown but should still allow standby/resume.
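For a quick experiment without rebuilding the kernel, the generic cpuidle sysfs interface can list the idle states and disable one at runtime. This is a minimal sketch under the assumption that the AM335x cpuidle driver exposes its states through the standard sysfs files (name, usage, disable); it is not confirmed in this thread to behave the same as commenting out the state in the source, and the function names and CPUIDLE_DIR variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers around the generic cpuidle sysfs interface.
CPUIDLE_DIR=${CPUIDLE_DIR:-/sys/devices/system/cpu/cpu0/cpuidle}

# Print one line per idle state: index, name, entry count, disable flag.
list_idle_states() {
    for d in "$CPUIDLE_DIR"/state*; do
        [ -d "$d" ] || continue
        echo "${d##*state} $(cat "$d/name") usage=$(cat "$d/usage") disable=$(cat "$d/disable")"
    done
}

# Disable every idle state whose name equals $1 (for example C1).
# Returns non-zero if no state with that name exists.
disable_idle_state() {
    found=1
    for d in "$CPUIDLE_DIR"/state*; do
        [ -d "$d" ] || continue
        if [ "$(cat "$d/name")" = "$1" ]; then
            echo 1 > "$d/disable"
            found=0
        fi
    done
    return $found
}
```

Running `list_idle_states` before and after a test also shows, through the usage counters, whether the C1 state was entered at all.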

    Steve K.

  • Hi Steve,

    I also noticed that I was testing with BeagleBoard 3.8 kernel with no issue. The BeagleBoard 3.14 kernel set to performance DOES have the slow down issue. I thought I was using BeagleBoard 3.14 for my test, but I wasn't.

    Testing your patch seems to work OK so far. Will prepare to test our application with it tomorrow.

    Thanks,
    Dave P.
  • Hi!

    Is there any correct solution?

    I've got the same issue (SDK_8.0, kernel 3.14.26, on our custom board and the AM335x Starter Kit) under heavy Ethernet and/or UART load. With the "performance" governor the system slows down completely; with "ondemand", frequency switching helps the system stay alive (but slow).

    I fixed the issue with this kernel config change:

     #
     # CPU Idle
     #
    -CONFIG_CPU_IDLE=y
    -# CONFIG_CPU_IDLE_MULTIPLE_DRIVERS is not set
    -CONFIG_CPU_IDLE_GOV_LADDER=y
    -CONFIG_CPU_IDLE_GOV_MENU=y
    -
    -#
    -# ARM CPU Idle Drivers
    -#
    +# CONFIG_CPU_IDLE is not set
     # CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED is not set

  • Hi Brad, may I ask what changing from 0xd to 0x10 in the fix really means? Is it the same as commenting out the C1 state of am33xx_ddr3_states? Your reply is greatly appreciated.
  • You're sending a different command to the M3 firmware:

           CMD_ID_CPUIDLE_V2       = 0xd,  // corresponds to a8_cpuidle_v2_handler

           CMD_ID_CPUIDLE          = 0x10, // corresponds to a8_cpuidle_handler

    Here's a snippet of the corresponding code:

    void a8_cpuidle_handler(struct cmd_data *data)
    {
            struct deep_sleep_data *local_cmd = &data->data->deep_sleep;
    
            configure_wake_sources(local_cmd->wake_sources);
    
            clkdm_sleep(CLKDM_MPU);
    }
    
    void a8_cpuidle_v2_handler(struct cmd_data *data)
    {
            struct deep_sleep_data *local_cmd = &data->data->deep_sleep;
            unsigned int per_st;
            unsigned int mpu_st;
    
            pll_bypass(DPLL_MPU);
    
            a8_i2c_sleep_handler(data->i2c_sleep_offset);
    
            configure_wake_sources(local_cmd->wake_sources);
    
            per_st = get_pd_per_stctrl_val(local_cmd);
            mpu_st = get_pd_mpu_stctrl_val(local_cmd);
    
            /* MPU power domain state change */
            pd_state_change(mpu_st, PD_MPU);
    
            /* PER power domain state change */
            pd_state_change(per_st, PD_PER);
    
            if (local_cmd->pd_mpu_state != PD_ON)
                    hwmod_disable(HWMOD_IEEE5000);
    
            clkdm_sleep(CLKDM_MPU);
    }
    

  • Hi Brad,
    After applying the patch, I don't see the system slowdown issue on my board anymore. But I have a question: if a board is configured to have the M3 disabled, do we still need to apply this patch (i.e. use a different idle state available on the current wkup_m3 firmware)? Thanks a lot.
  • You shouldn't hit the issue with the M3 disabled. However, in that case adding this one line patch would have no impact. I would recommend doing so in case anyone ever decides to enable the M3.
  • Thank you for the reply, Brad. Is there a way to check in "make menuconfig" or in rootfs of target platform to see if M3 is enabled or disabled? Also, are there any documents on how to disable the M3 coprocessor for am335x? Thanks.
  • Just delete the firmware and it won't get loaded.
  • So by removing the .bin file from /lib/firmware, the M3 coprocessor will be disabled?  Is this the only way to disable/enable the M3?  Thank you.

  • Yes. You're not generally expected to disable the M3. Normally you would disable features such as cpuidle if you didn't want that capability for some reason.