AM62P: mailbox channel is not freed during R5 remoteproc stop call

Part Number: AM62P

Hi everyone,

On ti-linux-6.6.y, in drivers/remoteproc/ti_k3_r5_remoteproc.c, I noticed that k3_r5_rproc_stop() sends a message to the remote core (Cortex-R5). If the firmware does not reply to this message, it stays in the mailbox and prevents the system from entering suspend (echo mem > /sys/power/state):

    platform 79000000.r5f: k3_r5_suspend: timedout waiting for rproc completion event
    omap-mailbox 29010000.mailbox: fifo 1 has unexpected unread messages
    omap-mailbox 29010000.mailbox: PM: dpm_run_callback(): omap_mbox_suspend+0x0/0xc4 [omap_mailbox] returns -16

If 'stop' is not called, then the suspend works fine.
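
To make the failure mode concrete, here is a minimal sketch in plain C. It is not the omap-mailbox code: `model_fifo`, `model_fifo_send`, `model_fifo_read` and `model_mbox_suspend` are hypothetical names, and the only behavior assumed is what the log shows, namely that suspend is refused with -EBUSY (the -16 above) while a message is still unread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of one mailbox FIFO; not the omap-mailbox driver. */
#define MODEL_FIFO_DEPTH 4

struct model_fifo {
    unsigned int msgs[MODEL_FIFO_DEPTH];
    size_t count; /* number of unread messages */
};

/* Sender side (Linux in this thread): queue one message. */
static int model_fifo_send(struct model_fifo *f, unsigned int msg)
{
    if (f->count == MODEL_FIFO_DEPTH)
        return -ENOSPC;
    f->msgs[f->count++] = msg;
    return 0;
}

/* Receiver side: consume the oldest message, as responsive firmware would. */
static int model_fifo_read(struct model_fifo *f, unsigned int *msg)
{
    if (f->count == 0)
        return -EAGAIN;
    *msg = f->msgs[0];
    for (size_t i = 1; i < f->count; i++)
        f->msgs[i - 1] = f->msgs[i];
    f->count--;
    return 0;
}

/* Suspend check: refuse to suspend while any message is still unread. */
static int model_mbox_suspend(const struct model_fifo *f)
{
    assert(f->count <= MODEL_FIFO_DEPTH);
    return f->count ? -EBUSY : 0;
}
```

In this model, a message that nobody reads keeps `model_mbox_suspend()` returning -EBUSY; once it is consumed, the check passes again.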

The 'ti_k3_m4_remoteproc' driver frees the mailbox channel in its stop function, so I added the same call to the 'ti_k3_r5_remoteproc' driver to fix suspend on the AM62P SoC:

```diff
diff --git a/drivers/remoteproc/ti_k3_r5_remoteproc.c b/drivers/remoteproc/ti_k3_r5_remoteproc.c
index 42a1afed490c..95bd276995a7 100644
--- a/drivers/remoteproc/ti_k3_r5_remoteproc.c
+++ b/drivers/remoteproc/ti_k3_r5_remoteproc.c
@@ -836,6 +836,8 @@ static int k3_r5_rproc_stop(struct rproc *rproc)
                        return ret;
                }

+               mbox_free_channel(kproc->mbox);
+
                ret = wait_for_completion_timeout(&kproc->shut_comp, to);
```

I would like to know whether this is the correct approach and, if so, whether the fix can be merged into the ti-linux kernel.

Thanks for your time,

Hiago.

  • Hello Hiago,

    You are observing graceful shutdown, which is the expected behavior. By default, our Linux remoteproc driver will not shut down a non-Linux core unless the non-Linux core responds to the shutdown notification with an ACK (i.e., the non-Linux core says "I am in a good state to shut down").
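
    As a rough illustration of that handshake, here is a sketch with hypothetical names (`model_core`, `model_wait_for_ack`, `model_graceful_stop`). It is not the ti_k3_r5_remoteproc code; the polling loop stands in for a timed wait, and returning -ETIMEDOUT while leaving the core running is an assumption made for the example.

    ```c
    #include <assert.h>
    #include <errno.h>
    #include <stdbool.h>

    /* Hypothetical model of a remote core; not the real remoteproc driver. */
    struct model_core {
        bool running;      /* core is executing firmware */
        bool answers_acks; /* firmware implements the shutdown handshake */
        bool ack_received; /* set once the firmware acknowledges */
    };

    /* Stand-in for "send request, wait with timeout": each poll is one tick. */
    static bool model_wait_for_ack(struct model_core *c, int max_polls)
    {
        assert(max_polls >= 0);
        for (int i = 0; i < max_polls; i++) {
            if (c->answers_acks) {
                c->ack_received = true;
                return true;
            }
        }
        return false; /* timed out without an ACK */
    }

    /* Graceful stop: halt the core only after it has acknowledged. */
    static int model_graceful_stop(struct model_core *c, int max_polls)
    {
        if (!c->running)
            return 0;
        if (!model_wait_for_ack(c, max_polls))
            return -ETIMEDOUT; /* assumption: core is left running */
        c->running = false;
        return 0;
    }
    ```

    In this model, firmware that never answers makes the stop request time out, while firmware that acknowledges is stopped normally.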

    Please find more information about graceful shutdown in the AM62Px academy:
    Multicore > booting & disabling Processor cores > Graceful shutdown
    https://dev.ti.com/tirex/explore/node?node=A__Aa9uP63kd2xbTRZBzxKauA__AM62P-ACADEMY__fp5YxRM__LATEST

    Regards,

    Nick

  • Hi Nick, thanks for your reply.

    I understand graceful shutdown, but this goes deeper than that. In this case, the stop command has already been issued (echo stop > /sys/class/remoteproc/remoteproc0/state) and the remote core, the R5F, is already stopped and in the offline state. Not every firmware replies to this request; a simple hello-world firmware, for example, does not. So the core is already stopped, yet the message in the mailbox was never consumed, which keeps the system from suspending until the message is consumed, possibly forever.

    Best regards,
    Hiago.

  • Hello Hiago,

    What might be going on

    Do you have custom code running on the MCU R5F core?

    The DM R5F core is running the ipc_rpmsg_echo_linux example (i.e., the DM task is running, and the IPC_Echo task is also running). This firmware sends a mailbox message to the other non-Linux cores and, once they respond, runs an IPC test with them. The IPC_Echo task is just demo code; it is not used by any of your applications at runtime.

    However, if you have your own custom code running on the MCU R5F (or no code), then your code is probably not listening for a mailbox message from the demo code. In that case, the mailbox message would go unread.

    Then, when Linux tries to do a low power mode transition, it checks for unread mailbox messages, sees the unread message, and refuses to suspend.

    You probably want to update the DM R5F code anyway 

    Since you are using SDK 10.x, your design may be susceptible to this DM R5F crash:
    [FAQ] [Alert] DM R5F can crash in certain conditions: AM62x, AM62Ax, AM62Dx, AM62Px, AM67, AM67A

    Preventing the DM R5F 49 day crash involves updating the DM R5F firmware. If you want, I can provide patches or code for the DM R5F to both fix the 49 day crash, and get rid of the IPC_Echo task so that a mailbox message is not sent to the MCU R5F core (basically, switch to using the empty project).

    Regards,

    Nick

  • Hello Nick,

    In this case I am not running any custom firmware; this is the hello-world demo from the TI MCU SDK. Thanks for pointing out the SDK version. I downloaded it and the issue is still present.

    My concern is that the driver enforces something (the mailbox ping) that the firmware is not necessarily prepared to respond to, which stops the CPU from suspending even when the core is stopped. The message should therefore be dropped, or saved for later use, rather than preventing the whole CPU from suspending. From my point of view, this is a bug that needs to be addressed in the kernel driver. Am I correct?

    Best regards,
    Hiago.

  • Hello Hiago,

    The "Hello World" demo does not have any code to listen for mailbox messages and respond to them. So I would expect that the DM R5F core's IPC_Echo task would send a mailbox message to the MCU R5F core, and that message would go unread.

    It is not a kernel bug for a low power mode transition to abort if Linux thinks that other parts of the system have unfinished work.

    Here is how to verify you are seeing the issue that I am thinking of:
    1) Do you see the same behavior if you do not load any firmware into the MCU R5F? (just rename the firmware binary and it will fail to load)
    2) Does the behavior go away if you load the default ipc_rpmsg_echo_linux project into the MCU R5F?

    If the answer to both of those questions is "yes", then what I described above is what you are seeing.

    Easiest workarounds 

    1) Load a different DM R5F project (I can provide) that does not send a mailbox to MCU R5F

    2) Figure out which mailbox is getting used to send the message from DM R5F to MCU R5F. Then remove that mailbox from the Linux devicetree. That way Linux will not check to see if there is an unread message in the mailbox. I can provide an example code patch.

    Regards,

    Nick

  • Hi Nick,

    Sorry for the delay.

    I understand your points; they are correct for this case. However, this change was also accepted into mainline Linux. See the discussion and patch here:

    - https://lore.kernel.org/all/20250806-v1-fix-am62-hmp-suspend-v1-1-1c4a81bb5dde@toradex.com/

    - https://lore.kernel.org/all/prvj5e2y3ruqgn35auolaia5zwoahtfecosumwshappy32ylrq@eivdt62vp7rh/

    This change was also supported by Andrew Davis <afd@ti.com> and Beleswar Prasad Padhi <b-padhi@ti.com>, both from TI.

    I was wondering if you could maybe talk to them and check if we could do the same for the TI downstream kernel?

    I again understand your points. However, from my point of view (and per the discussion on the Linux kernel mailing list), the driver should not prevent the whole CPU from going into sleep mode if the firmware did not touch the mailbox. Otherwise, a simple firmware cannot be used, and the system cannot suspend even though the Cortex-R5 has already stopped.

    Best regards,
    Hiago.

  • Hello Hiago,

    More background

    Thank you for continuing to dig into this, and for linking to your kernel patches. We were actually talking about two different mailbox messages that are automatically sent to non-Linux cores. I was unaware of your exact mailbox issue until today (apparently I have not tried low power modes at the same time that I was running MCU+ code that did not have RPMsg enabled).

    I was talking about a mailbox message that is sent from the DM R5F ipc_rpmsg_echo_linux project in IpcNotify_syncAll():
    https://github.com/TexasInstruments/mcupsdk-core-k3/blob/k3_main/examples/drivers/ipc/ipc_rpmsg_echo_linux/ipc_rpmsg_echo.c 

    For future readers, Hiago found that the Linux kernel also automatically sends a mailbox message to a non-Linux core while initializing the IPC infrastructure. So if the non-Linux core does not have Linux IPC configured, it will not read the mailbox message. Low power mode transitions will be blocked, just like when the ipc_rpmsg_echo_linux project sends a mailbox that goes unread.

    Running a couple more experiments 

    I am the author of the multicore academy. When I wrote the pages you read, I was not aware of a big difference between an empty resource table and a resource table with IPC enabled. If I gave you test code, could I get you to run a few more tests? I'll update the academy as needed based on the results.

    I want to try:

    1) zero out the resource table region of DDR (no data)

    2) empty resource table (I am doing a project right now for the AM62Px DM R5F; I could share the patches, which should be easy to adapt to the MCU R5F)

    3) autogenerated resource table with Linux IPC enabled (sounds like you already tested, is this where you saw a message from k3_rproc_kick())?

    Confirming your solution

    Your patch looks good for kernel 6.6 and 6.12. There is no reason for Linux to send a mailbox message to the non-Linux core during initialization on these kernel releases.

    I am not sure if it makes sense to upstream the patch at this point in time. The remoteproc infrastructure is still in development, and at least some of the proposals for future features would involve sending mailbox messages (e.g., to establish a list of "supported features" for interacting between Linux and the non-Linux core). I will continue discussing with our dev team.

    Regards,

    Nick

  • By the way, please check that your design is not impacted by the DM R5F 49 day crash 

    Since you are using AM62Px on SDK 10.x, I want to make sure that your design is not affected:
    [FAQ] [Alert] DM R5F can crash in certain conditions: AM62x, AM62Ax, AM62Dx, AM62Px, AM67, AM67A

    Regards,

    Nick