AM625: M4F FW Load Retry

Part Number: AM625

Hello,

When we load firmware onto the M4F of the AM625 from the main cores after boot, the load sometimes fails. So far, we have had to reset the entire device to retry. Is there a way to retry loading the firmware on the M4F without resetting the AM625?

Thank you and best regards,

Ambroise

  • Hello Ambroise,

    Thanks for your query.

    Can you please tell us which SDK you are using?

    Is it the Processor SDK or the MCU+ SDK?

    Regards,

    Tushar

  • Hi Tushar,

    For the MCU part, they are following this section of the documentation with a BeaglePlay (AM625):

    https://dev.ti.com/tirex/explore/node?devtools=SK-AM62&node=A__AVjm7chph.4Q-bCWodAr.w__AM62-ACADEMY__uiYMDcq__LATEST

     

    Specifically, the "Running the RPMsg userspace example" part, with this sample code: https://git.ti.com/git/rpmsg/ti-rpmsg-char.git

    I am concerned about the "Graceful shutdown" section, in particular this part of the explanation:

    graceful shutdown only works if the remote core is able to respond. If the remote core has crashed or entered a bad state, the Linux driver will throw a timeout error instead of forcing the remote core off. This preserves the remote core state if debugging is required. In a multicore system, a bad state can come from outside the remote core (e.g., if the remote core is waiting for data from another core). So turning off and restarting the remote core may not actually address the source of the issue. Throwing a timeout error instead of blindly shutting off the core allows customers to handle the timeout error and then take whatever action is appropriate for their specific usecase.

    Is there a way to properly reset the M4F if it fails during operation without resetting the entire device?

    Thank you and best regards,
    Ambroise

  • Hello Ambroise,

    Thanks for sharing the above details.

    I am routing your query to our domain expert. Please expect a reply in a day or two.

    Regards,

    Tushar

  • Hello Ambroise,

    FYI: Collecting feedback on the multicore academy module

    I am happy to see customers using the AM62x academy! I am currently updating the multicore academy module for SDKs 9.1 and 9.2, and then I will restructure the pages and add additional information to improve the academy based on customer feedback. If your customer has any questions or constructive feedback, please encourage them to post that for me here on the forums so that I can make those changes.

    How to "properly" reset the M4F? 

    This depends on the customer's exact usecase and situation. I am NOT an RTOS developer, so my responses here will be more "system design" focused instead of "RTOS expert" focused.

    In general, the "safest" option is to reset the entire processor if your M4F core actually crashes (i.e., the core is in an irrecoverable bad state, instead of just waiting on something else). 

    The customer COULD just modify the Linux driver to shut down the remote core anyway after the timeout by removing the "return" line. But for most usecases, I would NOT suggest doing that, for all the detailed reasons in section "Why does graceful shutdown matter?":

     static int k3_m4_rproc_stop(struct rproc *rproc)
     {
     ...
             ret = wait_for_completion_timeout(&kproc->shut_comp, to);
             if (ret == 0) {
                     dev_err(dev, "%s: timedout waiting for rproc completion event\n", __func__);
    -                return -EBUSY; // this is the line that keeps us from shutting down the M4F
             };
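
    For context, a stop/start request from Linux userspace normally goes through the remoteproc sysfs "state" file. Below is a minimal sketch in C of requesting a stop and only restarting the core if the stop succeeded. It assumes the M4F shows up as something like /sys/class/remoteproc/remoteproc0/state (the index varies by board and device tree) and that the driver's timeout error is passed back to the failing write. write_attr and restart_remote_core are hypothetical helper names, not part of any TI library.

    ```c
    #include <errno.h>
    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    /* Write a string to a sysfs-style attribute file.
     * Returns 0 on success or a negative errno value on failure. */
    static int write_attr(const char *path, const char *value)
    {
        int fd = open(path, O_WRONLY);
        if (fd < 0)
            return -errno;

        size_t len = strlen(value);
        ssize_t n = write(fd, value, len);
        int ret = 0;
        if (n < 0)
            ret = -errno;          /* e.g. the driver refused the request */
        else if ((size_t)n != len)
            ret = -EIO;            /* short write */

        close(fd);
        return ret;
    }

    /* Ask Linux to stop the remote core, then start it again.
     * state_path is an assumption: the remoteproc "state" file of the M4F.
     * If the stop request fails (for example, a graceful-shutdown timeout),
     * the core is NOT restarted; the caller decides what to do next. */
    static int restart_remote_core(const char *state_path)
    {
        int ret = write_attr(state_path, "stop");
        if (ret < 0)
            return ret;
        return write_attr(state_path, "start");
    }
    ```

    Returning the stop error instead of forcing a restart follows the same reasoning as the driver: the application sees the failure and can choose between debugging, resetting another part of the system, or rebooting the whole processor.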
    

    What if the M4F is using any peripherals?

    See this part of the IPC page you linked:

    If the remote core is powered off during Linux runtime without warning the remote core, then the remote core is not able to tell the DM core to release its peripherals. When the remote core requests its peripherals after being rebooted, the DM core will refuse the request, because the DM core will think that the peripherals are already in use. At that point, the remote core typically stalls, and the entire processor needs to be rebooted.

    Why is the M4F becoming nonresponsive? 

    What is your customer's exact situation?

    For example, if the core is running fine but is just waiting on an input from another part of the system, it would make more sense to reset that other part of the system, add a timeout to your M4F code, etc., instead of rebooting the M4F core.
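
    As a rough illustration of the "add a timeout to your M4F code" idea, here is a generic sketch in C of a bounded wait. It is not taken from the MCU+ SDK: wait_with_timeout, ready_fn and ticks_fn are hypothetical names, and fake_now / ready_after are stand-ins for a real tick counter and a real "input arrived" check so the sketch can be tried on a host machine.

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    typedef bool (*ready_fn)(void *ctx);   /* returns true once the input has arrived */
    typedef uint32_t (*ticks_fn)(void);    /* returns a free-running tick count */

    /* Poll until is_ready() returns true or timeout_ticks have elapsed.
     * Returns true if the input arrived, false on timeout.
     * Unsigned subtraction keeps the comparison valid across counter wraparound. */
    static bool wait_with_timeout(ready_fn is_ready, void *ctx,
                                  ticks_fn now, uint32_t timeout_ticks)
    {
        uint32_t start = now();
        while (!is_ready(ctx)) {
            if ((uint32_t)(now() - start) >= timeout_ticks)
                return false;
        }
        return true;
    }

    /* Hypothetical stand-ins for illustration only. */
    static uint32_t fake_tick;
    static uint32_t fake_now(void) { return fake_tick++; }

    /* Becomes ready after the number of polls stored in *ctx. */
    static bool ready_after(void *ctx)
    {
        uint32_t *polls_left = ctx;
        if (*polls_left == 0)
            return true;
        (*polls_left)--;
        return false;
    }
    ```

    On a timeout, the M4F code can report the problem over RPMsg or fall back to a safe state, so the core stays responsive and can still answer a graceful shutdown request from Linux.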

    Regards,

    Nick