OMAP3530 lock up problem

We are having a problem with our device - a VoIP desk phone based on OMAP 3530.  We are running TI's Android 2.3 ver 1 (Linux 2.6.32) from the Rowboat project.

Out of the blue, the OMAP will simply lock up.  By "lock up", I mean that it stops executing instructions and stops responding to JTAG.  Here are some salient features of the lock up failure:

  1. No messages are written to the serial console: no kernel panic, nothing.  The serial port is completely unresponsive, so one cannot use SysRq to send commands to the kernel.

  2. We are unable to use the JTAG interface to halt the processor.  We are using TI Code Composer with a Blackhawk XDS100v2 JTAG emulator to access the JTAG interface.  Prior to the failure it is possible to halt the processor, look at registers, and then restart the processor.  Once the failure happens we get this message when connecting to the target:

    Trouble Halting Target CPU:
    Error 0x80001820/-2062
    Fatal Error during: Execution, Timeout, Target,
    Cannot halt the processor


  3. The display remains frozen: the screen stays lit and shows whatever was on it when the failure happened.  In other words, the phone's DRAM (framebuffer) is still alive.

  4. No process is running in Linux any more.  We have a very simple program that writes a "0" to /dev/watchdog every 30 seconds.  The purpose of this program is to reboot the phone after a lockup.  On lockup, the program no longer runs, and eventually the OMAP's Watchdog Timer triggers a reset.

  5. We have an app that can trigger the lockup.  This app scrolls image thumbnails left/right quickly and the lockup happens in less than a minute.  However, when we ran this app on an older version of our software based on Android Eclair (Linux 2.6.29) it took about 4-5 hours before the lock up happened.

  6. We are using the PowerVR/SGX library.  However, the lock up occurs even on a build that doesn't use these libraries.

  7. We are not using the OMAP 3530's DSP.  In fact, our build uses software acceleration for video playback.

  8. Other than the app that scrolls images, we can make the failure occur in several ways, all of which involve using the screen.  Phones will sometimes (although rarely) lock up even when no one is using them (the clock on the screen shows minutes, so there is some screen activity even when the phone is idle).

  9. From user reports, we feel strongly that there is a correlation of the lock up with heavier phone usage.
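
The watchdog feeder described in item 4 can be sketched roughly as below.  This is a minimal shell sketch, not our actual program: the function name feed_watchdog and its three parameters are made up for illustration, and it assumes the device should be held open between writes (depending on the driver and kernel configuration, closing /dev/watchdog can stop the timer).

```shell
#!/bin/sh
# Hypothetical watchdog feeder (illustration only).
#   $1  watchdog device node       (default /dev/watchdog)
#   $2  seconds between writes     (default 30)
#   $3  number of writes to make   (default 0 = run forever)
feed_watchdog() {
    dev=${1:-/dev/watchdog}
    interval=${2:-30}
    count=${3:-0}

    # The redirection on the brace group opens the device once on fd 3
    # and keeps it open for the whole loop.
    {
        n=0
        while [ "$count" -eq 0 ] || [ "$n" -lt "$count" ]; do
            printf '0' >&3      # each write restarts the watchdog countdown
            n=$((n + 1))
            sleep "$interval"
        done
    } 3>"$dev"
}
```

On the phone this would be started in the background as feed_watchdog /dev/watchdog 30 &.  Once the kernel stops scheduling it, the writes stop and the hardware timer eventually expires.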

So far, we have not been able to figure out why such a lock up would happen or how to debug it, given that JTAG isn't available.

Any help on this would be greatly appreciated.

Thanks.

  • Hi Ravin,

    I would like to suggest some workarounds, but before doing anything else you should use the latest version of CCS to make sure that this is not a CCS bug.

    1. Try to prevent the CPU from going to sleep using the command:

    echo nosleep > /sys/power/wake_lock

    2. Check the available governors with the command cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors and, if the performance governor is present, set it:

    echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

    This should solve the problem, but if the problem still exists, some logs and more detailed debugging will be needed.
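
    Step 2 could be scripted along these lines. It is only a sketch: the function name set_performance_governor and its optional directory argument are hypothetical, and it assumes a grep that supports -q and -w (BusyBox or GNU).

```shell
#!/bin/sh
# Switch to the "performance" governor only if the kernel offers it.
#   $1  cpufreq sysfs directory (hypothetical parameter, defaults to cpu0)
set_performance_governor() {
    cpufreq_dir=${1:-/sys/devices/system/cpu/cpu0/cpufreq}

    # -w matches "performance" as a whole word in the space-separated list
    if grep -qw performance "$cpufreq_dir/scaling_available_governors"; then
        echo performance > "$cpufreq_dir/scaling_governor"
    else
        echo "performance governor not available" >&2
        return 1
    fi
}
```

    Calling set_performance_governor with no argument acts on cpu0; a non-zero return means the governor was left unchanged.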

    BR

    Tsvetolin Shulev

  • Hi Ravin,

    We have exactly the same issue with the OMAP 3530. We run a Java application on Linux 2.6.32; when it crashes we cannot halt the CPU and the video freezes. But it takes 1 to 5 days to get a crash, so it is very difficult to debug: with such a long wait for feedback, it is hard to see whether any change has a positive impact.

    Have you been able to find the reason of the crash?

    We would like to write an application like yours to accelerate crash occurrence. Can you give me a little more information about how you perform the scroll? Do you display one image, then the next one, and so on, or do you really scroll pixel column by pixel column in order to move to the next image?

    Thanks

    Rejean

  • Hi Rejean,

    We have two kinds of apps that exhibit the failure.  One is an Android app, where the failure occurs in as little as 2 sec.

    The other is a native app.  Here the failure occurs in 3-4 hours consistently (we had 170 failures on 12 devices over a 48-hour period).
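
    As a rough cross-check of the 3-4 hour figure, the mean time between failures per device can be computed from those totals. This sketch assumes all 12 devices ran for the full 48 hours and ignores reboot time; the function name is made up for illustration.

```shell
#!/bin/sh
# Mean hours between failures per device.
#   $1 devices, $2 hours of testing, $3 total failures
mean_hours_between_failures() {
    awk -v d="$1" -v h="$2" -v f="$3" 'BEGIN { printf "%.1f\n", d * h / f }'
}

# 12 devices * 48 h = 576 device-hours, divided by 170 failures
mean_hours_between_failures 12 48 170   # prints 3.4
```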

    Is your device based on Android?  I'd be happy to provide you either app. 

    Please provide a description of your device.

    Thanks.

  • Hi Tsvetolin,

    We've tried both suggestions as well as Code Composer 5.2.1.00018 (earlier we were using CC 4).

    echo nosleep > /sys/power/wake_lock
    echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

    The lockup still happens and we can't access the processor through JTAG after the lockup.

    Thanks.

  • Hi Ravin

    We use a Java application running on Linux, so we are not compatible with Android. There must be a way to port your application to our system. Is your Android app written in Java?

    Thanks for your reply. It is greatly appreciated, and good to know that we are not alone with this bug.

    Rejean

  • Hi Ravin,

    It should be possible to get the contents of the OMAP embedded trace buffer by direct access through the DAP (Debug Access Port). The DAP accesses all registers directly, without the need of the CPU. Refer to http://processors.wiki.ti.com/index.php/Debug_Access_Port_(DAP) for more info. The CPU does not need to be halted to read the trace buffer. Have you tried this?

    Section 25.6.5 of the OMAP Technical Reference Manual shows the mapping for the ETB (embedded trace buffer) on the ARM, and it clearly states that the ETB is accessible through the DAP.

    I tried connecting to the DAP on a working unit and it works: I can read all registers, even memory, but I cannot read the ETB. There must be some missing configuration to enable the trace buffer. I still need to investigate why.

    About your program that accelerates crash occurrence: it would be very useful for me to get the code, since as soon as I am able to read the ETB I will want many crashes in order to collect and analyze data. Is it still possible for you to send me the source code for both apps (native and Android)? I don't know how we can exchange this information through this forum; there seems to be no way to email something directly to you like we used to do on other forums.

    Thanks

    Rejean

  • Rejean,

    Please email me at ravin.suri@cloudtc.com and I will send you the source code for the programs.

    Thanks,

    Ravin

  • Did you ever figure out the cause of the lock up?

    Do you have CONFIG_OMAP_32K_TIMER=y ?

    If yes, then try using the MPU timer instead (CONFIG_OMAP_MPU_TIMER).
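
    One way to check which timer a running kernel was built with is to read its configuration. The sketch below assumes the kernel exposes /proc/config.gz (which requires CONFIG_IKCONFIG_PROC) and that zcat and grep -E are available on the target; the function name timer_source is hypothetical. Otherwise the same grep can be run against the .config in the kernel build tree.

```shell
#!/bin/sh
# Print the OMAP timer options from a kernel config.
#   $1  config file (hypothetical parameter; default /proc/config.gz)
timer_source() {
    cfg=${1:-/proc/config.gz}

    # Decompress only when the file looks gzipped
    case "$cfg" in
        *.gz) reader=zcat ;;
        *)    reader=cat ;;
    esac

    # Matches both "CONFIG_X=y" and "# CONFIG_X is not set" lines
    "$reader" "$cfg" | grep -E '^(# )?CONFIG_OMAP_(32K|MPU)_TIMER[= ]'
}
```

    Running timer_source on the target, or timer_source .config in the build tree, shows which of the two options is enabled.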