AM5728: Roughly 10% of the time "rmmod omap-remoteproc" crashes when unloading an IPU module based on ex02_messageq

Part Number: AM5728
Other Parts Discussed in Thread: SYSBIOS

Hello,

We have ipu code based on AM57x8-IPC/Main/ipc_3_47_02_00/examples/DRA7XX_linux_elf/ex02_messageq/ipu1

The code works and usually starts up and shuts down correctly. However, about 10% of the time it crashes during system shutdown. This can be reproduced with these commands:

rmmod -v omap-remoteproc
modprobe -v omap-remoteproc

At this point, it appears the problem is that "MessageQ_get" doesn't handle the shutdown. It's also possible the BIOS code itself doesn't handle the shutdown. There appears to be a missing call to BIOS_exit at the end of "smain" in MainIpu1.c; without that call, the system hangs when smain ends. However, even with that call added, the system still hangs when unloading the module.

Note: this problem happens with the unchanged example code. I'm using u-boot "loglevel=9" and there are no messages sent to the serial console when the system hangs. Also, the "ipu1" code is the only one loaded. (No ipu2, dsp1, or dsp2)

I need help diagnosing this problem and/or suggestions how to fix it.

Thanks,

Scott

  • Hello,

    This "unload" is during system shutdown. It can be simulated with the "rmmod" or "modprobe -r" commands. This happens with the unmodified example code.

    Does anyone know how to fix the code so the system can reliably shut down?

    Thanks,

    Scott

  • Hello Apps team,

    Just wanted to give a quick update based on my call yesterday with Scott.

    First, as Scott mentioned, only IPU1 is used, but Scott has found that IPU2 must be loaded with firmware before the IPU1 code will execute. Is there a way to load only one IPU and have its firmware run?

    Second, the IPU2 firmware is just ex02_messageq. When running this example, IPU1 executes as expected. However, to enable a clean shutdown the PM module was added to the ex02 example, and then the IPU1 code seems to stop working, with no interrupts generated by the timer modules. Is there any known interaction between PM and the IPU cores that could cause this?

    Regards,

    Munan

  • Scott,

    Can you try with the messageq_single firmware image and confirm if you see the same issues or if the stability is improved?

    I'm looking into this along with Munan's clarification points and will follow up further.

    Best regards,

    Dave

  • Hi Dave,

    I copied

    ipu/ipc_3_47_02_00/packages/ti/ipc/tests/bin/ti_platforms_evmDRA7XX_ipu2/messageq_single.xem4

    to the unit

    /lib/firmware/ti-coproc/server_ipu2.xem4

    and rebooted.

    This binary was created from the original, unchanged source files. I made no other system changes besides copying the file.

    Same result: the timer in the ipu1 code ran for a while and then stopped incrementing.

    Thanks,

    Scott

  • Scott,

    Thanks for confirming.

    best regards,

    Dave

  • Can you also confirm your kernel version?

    -Dave

  • ~# uname -a
    Linux sg3 4.14.54-g10c0ccef20 #1 SMP PREEMPT Fri Sep 4 14:39:01 UTC 2020 armv7l GNU/Linux

  • Scott,

    Thanks for confirming. You are on an older release, and there have been some updates since. Could you try two additional checks with your version?

    1. Is the issue reproducible with test_omx_ipuX_vayu.xem4 image?
    2. Is the issue reproducible with either of messageq_single or test_omx image on IPU2 instead of IPU1?

    We're looking through the update and test history in the meantime.

    Best regards,

    Dave

  • Hello Dave,


    With idle power management enabled in the code running on both ipu1 and ipu2, both can be unloaded and reloaded successfully as many times as we want. However, enabling idle power management on ipu2 stops a timer on ipu1. This happens with the example code on ipu1 plus the addition of creating and starting a timer (this source code was provided). On ipu2 we're using the unmodified example code. For both ipu1 and ipu2, the lines that enable idle power management were appended to the *.cfg file.

    It would be a lot of work to change test_omx_ipuX_vayu to use MessageQ_* and start a timer. Similarly, our code was written for ipu1, so changing it to use ipu2 would take some effort.

    We don't need code running on ipu2. In fact, messages are never sent to it nor received so, in theory, it's sitting idle. How can ipu2 be completely disabled? If we just remove server_ipu2.xem4 then ipu1 doesn't work. I've tried simply exiting the ipu2 loop in server.c, but it doesn't look like ipu2 stops running. Is there a way to exit the BIOS code? I've tried BIOS_exit but ipu2 still doesn't exit. What is the correct way to exit the ipu code or just disable it?


    Thanks,

    Scott

  • Scott,

    Can you confirm your timer information for me? Specifically, which timer are you using in your IPU1 image, and did you add this timer to the IPU1 timers list on the kernel dts side? The same question applies to IPU2, even though it is not used.

    Each core primarily uses a BIOS tick timer and a watchdog timer. Any additional timers can be requested on the kernel side by adding them to the remoteproc timers property in the DT.

    https://git.ti.com/gitweb?p=ti-linux-kernel/ti-linux-kernel.git;a=blob;f=Documentation/devicetree/bindings/remoteproc/ti%2Comap-remoteproc.txt;h=873481b552a2a4d74ca50c56cc23cd0142561bdb;hb=refs/heads/ti-linux-4.14.y 

    Default timers are already assigned in a common dtsi file:
    https://git.ti.com/gitweb?p=ti-linux-kernel/ti-linux-kernel.git;a=blob;f=arch/arm/boot/dts/dra7-ipu-dsp-common.dtsi;h=90132c9d8fd120868fa6994f3a345119ec7e8cba;hb=refs/heads/ti-linux-4.14.y

    You can adjust them in your custom board dts file if any of them don't match your application's needs.

    Best regards,

    Dave

  • Hi Dave,

    The timer in the ipu1 code that stops running when ipu2 idle power management is enabled is accessed through the Timer_* APIs. Specifically, the code calls Timer_create, Timer_setPeriod, and Timer_start. (This is all in Server.c in the source code provided.)

    To enable the idle power management, these lines are added to the bottom of Ipu1.cfg & Ipu2.cfg:

    xdc.loadPackage('ti.pm');
    var Power = xdc.useModule('ti.sysbios.family.arm.ducati.smp.Power');
    Power.loadSegment = "PM_DATA";

    /* Idle Power Management functions for each core */
    Idle.addCoreFunc('&IpcPower_idle', 0); /* IpcPower_idle must be at the end */
    Idle.addCoreFunc('&IpcPower_idle', 1); /* IpcPower_idle must be at the end */

    If the last line in Ipu2.cfg is commented out (just prepend "/*"), the timer in ipu1 continues working. If the last line in Ipu2.cfg is not commented out, then the timer in ipu1 stops running after about 10-12 seconds. (The ISR stops being called)

    The only difference between the working code (the ISR is called at regular intervals) and the non-working code is the last line of Ipu2.cfg.

    Thanks,

    Scott

  • Scott,

    To re-confirm, you are not making any changes to the dts, correct? By default, the IPU2 examples are using gptimer 3 for os tick, and gptimers 9 and 4 for watchdog (if that is enabled).

    When you call Timer_create in your IPU1 code, you will be passing a timerId. Can you confirm what that timerId is?

    Best regards,

    Dave

  • Hi Dave,

    Do you have the source files that were provided?

    Timer_create is called with "Timer_ANY".

    From Ipu1.cfg:

    var Clock = xdc.useModule('ti.sysbios.knl.Clock');
    Clock.tickSource = Clock.TickSource_USER;
    /* Configure GPTimer11 as BIOS clock source */
    Clock.timerId = 10;

    var WD = xdc.useModule('ti.deh.Watchdog');
    WD.timerIds.length = 2;
    WD.timerSettings.length = 2;
    WD.timerIds[0] = "GPTimer7";
    WD.timerSettings[0].intNum = 60;
    WD.timerSettings[0].eventId = -1;
    WD.timerIds[1] = "GPTimer2";
    WD.timerSettings[1].intNum = 61;
    WD.timerSettings[1].eventId = -1;

    The code doesn't enable the watchdog.

    Thanks,

    Scott

  • Scott,

    It looks like you have changed one of the Watchdog timers, so you will need to make a corresponding adjustment in your board dts-file for the watchdog-timers property under ipu1.

    We are not sure whether the way you are using the additional timer will work. Could you try a fixed timer instead, and add that timer to the timers list in the ipu1 dtsi file? It is advisable for Linux to know about the timer as a resource, since clock management is handled by Linux.

    Best regards,

    Dave

  • Scott,

    Regarding IPU2, if it is truly unused in your application, you can either disable its node in the dts (and the associated reserved-memory node, to give the memory back to the kernel) or remove the default firmware file from the filesystem. The power management scheme will also shut down any core automatically after 10 seconds without communication activity between the MPU and IPU. As a third option, you can use the remoteproc sysfs 'state' file to shut down the core explicitly.

    Best regards,

    Dave

  • Hi Dave,

    Which dts file should be changed? What statements should I look for?

    The arch/arm/boot/dts/dra7-ipu-dsp-common.dtsi file contains:

    &ipu1 {
            mboxes = <&mailbox5 &mbox_ipu1_ipc3x>;
            timers = <&timer11>;
            watchdog-timers = <&timer7>, <&timer2>;
    };

    &ipu2 {
            mboxes = <&mailbox6 &mbox_ipu2_ipc3x>;
            timers = <&timer3>;
            watchdog-timers = <&timer4>, <&timer1>;
    };

    The two watchdog timers match the lines from Ipu1.cfg:

    WD.timerIds.length = 2;
    WD.timerSettings.length = 2;
    WD.timerIds[0] = "GPTimer7";
    WD.timerSettings[0].intNum = 60;
    WD.timerSettings[0].eventId = -1;
    WD.timerIds[1] = "GPTimer2";
    WD.timerSettings[1].intNum = 61;
    WD.timerSettings[1].eventId = -1;

    I don't understand what you mean about using an additional timer. Are you referring to the BIOS Timer_* APIs, including Timer_create with the documented parameter "Timer_ANY"? The code works correctly unless ipu2 has idle power management enabled; in that case, the timer in ipu1 works for 10-12 seconds and then stops. Without the idle power management added to Ipu1.cfg and Ipu2.cfg, everything works, but the ipu module crashes 2%-3% of the time during system shutdown. (This shutdown crash can be simulated with "modprobe -r omap-remoteproc" to unload and "modprobe omap-remoteproc" to reload.)

    Thanks,

    Scott

  • Hi Dave,

    We tried disabling ipu2, but then ipu1 stops working. As the easy change, we removed /lib/firmware/ti-coproc/server_ipu2.xem4. I've also tried disabling the ipu in the dts, but I'll try again.

    I just tried "echo stop > /sys/class/remoteproc/remoteproc1/state" with no other changes. It appeared to work (the state was "offline"), but when I started our app and it sent a message, ipu1 crashed and restarted. I know that happened because when testing I usually run "tail -n 99 -F /sys/kernel/debug/remoteproc/remoteproc0/trace0". It showed the usual startup messages, and when I started our app, the tail command reported "tail: /sys/kernel/debug/remoteproc/remoteproc0/trace0 has been replaced; following end of new file".

    Thanks,

    Scott

  • After a lot of testing and experimentation, I found the answers.

    According to the documentation, Timer_create(Timer_ANY, ...) allocates and uses a BIOS timer. As it turns out, that timer is really a general-purpose timer, and the BIOS code doesn’t know which timers are used by the main CPU, the other IPU, or the DSPs. General-purpose timer 2 was reserved in the device tree for ipu1, but the BIOS didn’t know that, so it allocated and used timer 3. (The Timer_getStatus API showed the same result: timer 2 wasn’t available but timer 3 was.) However, timer 3 was the BIOS timer for ipu2. When power management was enabled for ipu2, ipu2 could go idle, which stopped timer 3, and because ipu1 was also using it, the timer stopped in ipu1 too. Solution: in the device tree, manually allocate timer 3 to ipu1, and have the ipu1 code request that timer explicitly with Timer_create(2, ...). (Use 2 because the timer id is zero-based.)

    Since we weren’t using the watchdog timer in ipu1, it was suggested that we remove the lines starting with WD.timer* from Ipu1.cfg. Usually we then saw two “watchdog disabled” messages, but occasionally we saw a message that a watchdog was enabled and then disabled. Apparently, if the watchdog code isn’t told which timers to use, it defaults to timer 9. Therefore, when ipu1 stopped its watchdog timers, it stopped timer 9, which interfered with other code that had timer 9 assigned in the device tree. Similarly, when the ipu went idle it stopped timer 9. To correct this, I removed the line “var WD = xdc.useModule('ti.deh.Watchdog')” from Ipu1.cfg. Now the log has no watchdog messages.

    Before those changes, ipu1 was using timer 3 (from Timer_create) while ipu2 was initializing that timer. If ipu2 wasn’t loaded, the ipu1 code crashed because the timer had not been initialized by the Timer_* routines. With the device tree and code updated to use timer 3 for ipu1, the ipu1 code works correctly. Since ipu1 now initializes timer 3 itself, the status of ipu2 could be set to “disabled”.

    Now that power management is enabled in ipu1, crashes while unloading the code during system shutdown and loading it during system boot have been greatly reduced.

  • Scott,

    That's great. Thanks for sharing the update and glad to see the positive outcome.

    Best regards,

    Dave