OMAP-L138 Internal Reboot

Other Parts Discussed in Thread: OMAP-L138

We've been working on a very tough problem over the last few days.  We're running applications on both the DSP and the ARM9, communicating through shared memory.  The application crashes after anywhere from 10 minutes to over an hour.  When the crash occurs, the DSP is held in reset and the ARM9 is stuck in the data access error interrupt.  The bizarre thing we're seeing is that our proprietary bootloader appears to have been loaded into shared RAM by the TI ROM bootloader.  The ARM9 MMU translation table is in the region of shared RAM that gets overwritten.  We're monitoring the reset input pin, and it is not causing the issue.  We also verified that timer1, which could act as a watchdog if its configuration were corrupted, is intact: its registers are all zero because it is not used.  Any ideas about what else could cause the ROM bootloader to run without the reset pin toggling?  Also, all ARM9 processor registers appear to be intact, which seems to rule out an externally triggered reset as the cause.

At this point we're not even sure which core is the initial trigger, or whether the problem originates in software at all.  We're adding monitoring of the input voltages to see whether any transient noise is present when the problem occurs.

Any ideas would be greatly appreciated.

Thanks,

Jim

  • Dear James,
    I presume that you are not running Linux.
    To narrow down the problem, try running simple LED blink code on the DSP core with the actual application on the ARM core.
    Then try the reverse: simple LED code on the ARM core with the actual code on the DSP core.
    Finally, run simple LED code on both the ARM and DSP cores to make sure there are no HW issues.

    I think it is most likely a SW problem.
  • The power rails are certainly the first place I would check.  Try to measure as close to the OMAP-L138 as possible.

    Systematically simplifying your applications is another useful method.  It will likely be a slow process given the length of time it takes to see a failure.  If possible, I recommend testing several boards in parallel.  That way, when you make a change (e.g. eliminating some piece of your software) you can test it on multiple boards.  That will help offset the statistical nature of some of the failures, e.g. you won't have to wait as long in general to see a failure, and when you go a long time without a failure on any of the boards you'll know you're making progress.
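
To put rough numbers on the parallel-board suggestion above, here is a minimal sketch of the arithmetic. It rests on assumptions that are not from this thread: failures are modeled as independent and memoryless (exponentially distributed), every board has the same mean time to failure, and the 30-minute mean used in the usage note is purely a placeholder inside the reported range of 10 minutes to over an hour. The function names are hypothetical.

```python
import math


def expected_first_failure(mean_minutes: float, boards: int) -> float:
    """Mean time until the first of `boards` boards fails.

    Assumes independent, exponentially distributed failure times with the
    same mean on every board (an assumption, not a measured property).
    """
    if mean_minutes <= 0 or boards < 1:
        raise ValueError("mean_minutes must be > 0 and boards >= 1")
    return mean_minutes / boards


def prob_no_failure(mean_minutes: float, boards: int, test_minutes: float) -> float:
    """Chance that no board fails in `test_minutes` even though the bug is still present."""
    if mean_minutes <= 0 or boards < 1 or test_minutes < 0:
        raise ValueError("invalid arguments")
    return math.exp(-boards * test_minutes / mean_minutes)


def minutes_for_confidence(mean_minutes: float, boards: int, confidence: float) -> float:
    """Failure-free run time needed before a clean run is unlikely to be luck.

    `confidence` is 1 minus the accepted chance of a clean run occurring
    while the bug is still present, e.g. 0.99.
    """
    if mean_minutes <= 0 or boards < 1 or not 0.0 < confidence < 1.0:
        raise ValueError("invalid arguments")
    return -mean_minutes * math.log(1.0 - confidence) / boards
```

Under these assumptions, with a 30-minute mean time to failure, `minutes_for_confidence(30.0, 1, 0.99)` comes to a little over two hours of failure-free running on one board, while `minutes_for_confidence(30.0, 4, 0.99)` is roughly 35 minutes across four boards. If the real failures are not memoryless (for example, a slow leak that always takes a while to build up), these figures will be optimistic.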