[FAQ] How do I trigger the DM R5F crash? How do I verify that the crash is actually fixed?


This is a companion FAQ for

 [FAQ] [Alert] DM R5F can crash in certain conditions: AM62x, AM62Ax, AM62Dx, AM62Px, AM67, AM67A  

Please read the alert first. You can also find more information in the main FAQ here:

 [FAQ] DM R5F can crash in certain conditions: AM62x, AM62Ax, AM62Dx, AM62Px, AM67, AM67A  

This FAQ applies to AM62x, AM62Ax, AM62Dx, AM62Px.

It takes 49 days for the counter to roll over and crash the DM R5F.
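
The 49-day figure is consistent with a 32-bit tick counter that increments once per millisecond. Both the counter width and the 1 ms tick period are assumptions here, not values stated in this FAQ. A quick check of the arithmetic, as a minimal sketch:

```python
# Assumption (not stated in this FAQ): the counter is 32 bits wide and
# advances by one tick every millisecond.
COUNTER_BITS = 32
TICK_PERIOD_MS = 1

MS_PER_DAY = 1000 * 60 * 60 * 24


def rollover_days(bits: int = COUNTER_BITS, tick_ms: int = TICK_PERIOD_MS) -> float:
    """Return the number of days until a free-running counter wraps around."""
    total_ms = (2 ** bits) * tick_ms
    return total_ms / MS_PER_DAY


print(f"{rollover_days():.2f} days")  # prints "49.71 days"
```

If your tick period is different, pass it as `tick_ms` to see how the rollover time scales.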

1) How do I test within minutes if the DM R5F will crash? (instead of waiting 49 days)

2) How can I verify that this patch actually fixes the crash?

3) Oooh, great! There is a patch! Can I just patch my SDK 8.x or SDK 9.x MCU+ SDK and use that version of the DM R5F? (no)

This FAQ should apply to SDK 8.6 through SDK 10.1.

  • How do I get the DM R5F to crash in a minute? (instead of waiting 49 days)

    Usually, TI tries to provide "known good" examples as a starting point for customer development. This is a "known bad" example that should let you see the DM R5F crash within 60 seconds. You can then compare your design against this "known bad" example.

    Let's test what actually happens when elapsedTicks is allowed to increment up to SystemP_TIMEOUT. Does the DM R5F actually crash, or does the timer just roll over harmlessly?

    Modify the Linux devicetree file so that Linux no longer attaches to the DM R5F or initializes the RPMsg infrastructure with the DM R5F

    This is needed so that Linux never initializes the RPMsg infrastructure. That way, the DM R5F will stay in function RPMessage_waitForLinuxReady long enough to trigger the crash.

    https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/791/2477.0001_2D00_remove_2D00_DM_2D00_R5F_2D00_from_2D00_the_2D00_Linux_2D00_devicetree_2D00_file.patch

    Modify the MCU+ example and driver so that elapsedTicks = SystemP_TIMEOUT after a minute instead of 49 days

    Apply this patch to the MCU+ SDK:

    https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/791/0001_2D00_SITSW_2D00_6185_2D00_Trigger_2D00_waitForLinuxReady_2D00_timeout_2D00_after_2D00_6.patch

    Rebuild tispl.bin with the updated code 

    Because the patch modifies the MCU+ driver, make sure to rebuild the RTOS libraries before rebuilding the IPC RPMsg Echo example:
    https://software-dl.ti.com/mcu-plus-sdk/esd/AM62X/10_01_00_33/exports/docs/api_guide_am62x/GETTING_STARTED_BUILD.html 

    For steps to rebuild the DM R5F firmware and package the output binary into tispl.bin, refer to
    AM62x/AM62Ax/AM62Px academy > Multicore > Application Development > Developing on the DM R5F
    https://dev.ti.com/tirex/explore/node?node=A__AZNhqJdyJ3LM.YBw-Z2UAw__AM62-ACADEMY__uiYMDcq__LATEST 

    NOTE: If you build the IPC RPMsg Echo example from CCS, the output binary may not be stripped the way it is when you build with the makefile commands. You can strip the CCS binary like this:

    /path/to/ti/ti-cgt-armllvm_3.2.2.LTS/bin/tiarmstrip -p filename.release.out -o filename.tiarmstrip.out

    Summary of results

    Yes, the DM R5F crashed after about a minute on all 5 of 5 test runs (100% crashes).

    Terminal output when a DM R5F crash occurs 

    Sometimes there was no terminal output immediately after the DM R5F crash. However, I could verify that the DM R5F was no longer responding by trying to go into a low power mode, and seeing that the system did not return after 3 seconds.

    Sample terminal output. In this boot, there were no error prints before the low power mode test:

    |  _  |___ ___ ___ ___   |  _  |___ ___  |_|___ ___| |_
    |     |  _| .'| . | . |  |   __|  _| . | | | -_|  _|  _|
    |__|__|_| |__,|_  |___|  |__|  |_| |___|_| |___|___|_|
                  |___|                    |___|
    
    Arago Project am62xx-evm ttyS2
    
    Arago 2023.10 am62xx-evm ttyS2
    
    am62xx-evm login: ***************************************************************
    ***************************************************************
    ...
    ***************************************************************
    ***************************************************************
    [   16.633947] kauditd_printk_skb: 11 callbacks suppressed
    [   16.633966] audit: type=1701 audit(23.148:27): auid=4294967295 uid=0 gid=0 ses=429
    4967295 subj=kernel pid=408 comm="ti-apps-launche" exe="/usr/bin/ti-apps-launcher" si
    g=11 res=1
    [   16.670777] audit: type=1334 audit(23.188:28): prog-id=21 op=LOAD
    [   16.676990] audit: type=1334 audit(23.196:29): prog-id=22 op=LOAD
    [   16.684280] audit: type=1334 audit(23.200:30): prog-id=23 op=LOAD
    [   21.150809] audit: type=1334 audit(27.668:31): prog-id=23 op=UNLOAD
    [   21.157226] audit: type=1334 audit(27.668:32): prog-id=22 op=UNLOAD
    [   21.163515] audit: type=1334 audit(27.668:33): prog-id=21 op=UNLOAD
    
    // logged in after 200 seconds
    am62xx-evm login: root
    [  200.367045] audit: type=1006 audit(206.884:34): pid=1254 uid=0 subj=kernel old-aui
    d=4294967295 auid=0 tty=(none) old-ses=4294967295 ses=7 res=1
    [  200.380066] audit: type=1300 audit(206.884:34): arch=c00000b7 syscall=64 success=y
    es exit=1 a0=8 a1=fffff9b79bf8 a2=1 a3=1 items=0 ppid=1 pid=1254 auid=0 uid=0 gid=0 e
    uid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=7 comm="(systemd)" exe="/us
    r/lib/systemd/systemd-executor" subj=kernel key=(null)
    [  200.407137] audit: type=1327 audit(206.884:34): proctitle="(systemd)"
    [  200.413639] audit: type=1334 audit(206.900:35): prog-id=24 op=LOAD
    [  200.419911] audit: type=1300 audit(206.900:35): arch=c00000b7 syscall=280 success=
    yes exit=8 a0=5 a1=fffff2321528 a2=90 a3=0 items=0 ppid=1 pid=1254 auid=0 uid=0 gid=0
     euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=7 comm="systemd" exe="/us
    r/lib/systemd/systemd" subj=kernel key=(null)
    [  200.446240] audit: type=1327 audit(206.900:35): proctitle="(systemd)"
    [  200.452759] audit: type=1334 audit(206.924:36): prog-id=24 op=UNLOAD
    [  200.459163] audit: type=1300 audit(206.924:36): arch=c00000b7 syscall=57 success=y
    es exit=0 a0=8 a1=1 a2=0 a3=ffff8ec28c60 items=0 ppid=1 pid=1254 auid=0 uid=0 gid=0 e
    uid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=7 comm="systemd" exe="/usr/
    lib/systemd/systemd" subj=kernel key=(null)
    [  200.485243] audit: type=1327 audit(206.924:36): proctitle="(systemd)"
    [  200.491826] audit: type=1334 audit(206.924:37): prog-id=25 op=LOAD
    root@am62xx-evm:~#
    
    // no error outputs until running low power test
    root@am62xx-evm:~# rtcwake -m mem -s 3
    rtcwake: wakeup from "mem" using /dev/rtc0 at Thu Jan  1 00:03:50 1970
    [  220.339122] PM: suspend entry (deep)
    [  220.342902] Filesystems sync: 0.000 seconds
    [  220.363634] Freezing user space processes
    [  220.369553] Freezing user space processes completed (elapsed 0.001 seconds)
    [  220.376559] OOM killer disabled.
    [  220.379786] Freezing remaining freezable tasks
    [  220.385629] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
    [  220.393030] printk: Suspending console(s) (use no_console_suspend to debug)
    
    ** 8057 printk messages dropped **
    [  269.787509] pca953x 1-0022: failed reading register
    [  269.787534] pca953x 1-0022: failed reading register
    [  269.787562] pca953x 1-0022: failed reading register
    ...
    etc

    Sample terminal output. In this boot, there were error prints before the low power mode test:

    // we can do normal things like low power modes before 60 seconds elapses
    root@am62xx-evm:~# rtcwake -m mem -s 3
    rtcwake: wakeup from "mem" using /dev/rtc0 at Thu Jan  1 00:00:34 1970
    [   23.983395] PM: suspend entry (deep)
    [   23.987166] Filesystems sync: 0.000 seconds
    [   24.006693] Freezing user space processes
    [   24.012608] Freezing user space processes completed (elapsed 0.001 seconds)
    [   24.019646] OOM killer disabled.
    [   24.022885] Freezing remaining freezable tasks
    [   24.028746] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
    [   24.036218] printk: Suspending console(s) (use no_console_suspend to debug)
    [   24.052755] ti-sci 44043000.system-controller: ti_sci_cmd_set_device_constraint: device: 179: state: 1: ret 0
    [   24.052941] ti-sci 44043000.system-controller: ti_sci_cmd_set_device_constraint: device: 178: state: 1: ret 0
    [   24.053815] am65-cpsw-nuss 8000000.ethernet eth0: Link is Down
    [   24.062152] omap8250 2800000.serial: PM domain pd:146 will not be powered off
    [   24.062745] ti-sci 44043000.system-controller: ti_sci_cmd_set_device_constraint: device: 117: state: 1: ret 0
    [   24.092541] remoteproc remoteproc0: stopped remote processor 5000000.m4fss
    [   24.096490] Disabling non-boot CPUs ...
    [   24.098153] psci: CPU1 killed (polled 4 ms)
    [   24.101946] psci: CPU2 killed (polled 0 ms)
    [   24.105357] psci: CPU3 killed (polled 0 ms)
    [   24.106810] Enabling non-boot CPUs ...
    [   24.107238] Detected VIPT I-cache on CPU1
    [   24.107293] GICv3: CPU1: found redistributor 1 region 0:0x00000000018a0000
    [   24.107357] CPU1: Booted secondary processor 0x0000000001 [0x410fd034]
    [   24.108558] CPU1 is up
    [   24.108833] Detected VIPT I-cache on CPU2
    [   24.108861] GICv3: CPU2: found redistributor 2 region 0:0x00000000018c0000
    [   24.108901] CPU2: Booted secondary processor 0x0000000002 [0x410fd034]
    [   24.109771] CPU2 is up
    [   24.110054] Detected VIPT I-cache on CPU3
    [   24.110092] GICv3: CPU3: found redistributor 3 region 0:0x00000000018e0000
    [   24.110142] CPU3: Booted secondary processor 0x0000000003 [0x410fd034]
    [   24.111217] CPU3 is up
    [   24.111741] ti-sci 44043000.system-controller: ti_sci_resume: wakeup source: 0x50
    [   24.127611] am65-cpsw-nuss 8000000.ethernet: set new flow-id-base 19
    [   24.136948] am65-cpsw-nuss 8000000.ethernet eth0: PHY [8000f00.mdio:00] driver [TI DP83867] (irq=POLL)
    [   24.136973] am65-cpsw-nuss 8000000.ethernet eth0: configuring for phy/rgmii-rxid link mode
    [   24.143702] am65-cpsw-nuss 8000000.ethernet eth1: PHY [8000f00.mdio:01] driver [TI DP83867] (irq=POLL)
    [   24.143714] am65-cpsw-nuss 8000000.ethernet eth1: configuring for phy/rgmii-rxid link mode
    [   24.331834] OOM killer enabled.
    [   24.334978] Restarting tasks ... done.
    [   24.341035] random: crng reseeded on system resumption
    [   24.346610] k3-m4-rproc 5000000.m4fss: Core is off in resume
    [   24.352387] remoteproc remoteproc0: powering up 5000000.m4fss
    [   24.358226] remoteproc remoteproc0: Booting fw image am62-mcu-m4f0_0-fw, size 496356
    [   24.367864] rproc-virtio rproc-virtio.2.auto: assigned reserved memory node m4f-dma-memory@9cb00000
    [   24.381964] virtio_rpmsg_bus virtio0: rpmsg host is online
    [   24.383656] virtio_rpmsg_bus virtio0: creating channel ti.ipc4.ping-pong addr 0xd
    [   24.387746] rproc-virtio rproc-virtio.2.auto: registered virtio0 (type 7)
    [   24.395338] virtio_rpmsg_bus virtio0: creating channel rpmsg_chrdev addr 0xe
    [   24.402010] remoteproc remoteproc0: remote processor 5000000.m4fss is now up
    [   24.428423] PM: suspend exit
    [   27.190799] am65-cpsw-nuss 8000000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
    
    // now the DM R5F times out
    root@am62xx-evm:~# [   56.918065] ti-sci 44043000.system-controller: Mbox timedout in resp(caller: ti_sci_cmd_put_device+0x18/0x24)
    [   56.928037] ti-sci 44043000.system-controller: Mbox send fail -110
    [   60.182079] ti-sci 44043000.system-controller: Mbox timedout in resp(caller: ti_sci_cmd_put_device+0x18/0x24)
    [   60.192034] ti-sci 44043000.system-controller: Mbox send fail -110
    [   64.566052] ti-sci 44043000.system-controller: Mbox timedout in resp(caller: ti_sci_cmd_put_device+0x18/0x24)
    [   64.576005] ti-sci 44043000.system-controller: Mbox send fail -110
    
    // let's try to do another low power mode transition
    root@am62xx-evm:~# rtcwake -m mem -s 3
    rtcwake: wakeup from "mem" using /dev/rtc0 at Thu Jan  1 00:01:23 1970
    [   69.601253] PM: suspend entry (deep)
    [   69.604987] Filesystems sync: 0.000 seconds
    [   69.623345] Freezing user space processes
    [   69.629221] Freezing user space processes completed (elapsed 0.001 seconds)
    [   69.636241] OOM killer disabled.
    [   69.639469] Freezing remaining freezable tasks
    [   69.645321] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
    [   69.652731] printk: Suspending console(s) (use no_console_suspend to debug)
    ** 10183 printk messages dropped **
    [  123.962076] pca953x 1-0022: failed reading register
    [  123.962081] pca953x 1-0022: failed reading register
    [  123.962086] pca953x 1-0022: failed reading register
    [  123.962091] pca953x 1-0022: failed reading register
    ...
    etc
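
If you capture the serial console to a file, a small script can flag the crash symptoms for you. The sketch below is only a convenience, not a TI tool. The signature strings are copied from the two sample logs above, and the function names are hypothetical. Other boards or SDK versions may print different messages (for example, the pca953x line depends on the EVM hardware), so treat the list as a starting point.

```python
# Signature strings copied from the sample terminal output in this FAQ.
# Assumption: your own logs show the same messages when the DM R5F stops
# responding. Adjust the list for your board.
CRASH_SIGNATURES = (
    "Mbox timedout in resp",
    "Mbox send fail -110",
    "pca953x 1-0022: failed reading register",
)


def find_crash_signatures(log_text: str) -> dict:
    """Count how many log lines contain each known crash signature."""
    counts = {signature: 0 for signature in CRASH_SIGNATURES}
    for line in log_text.splitlines():
        for signature in CRASH_SIGNATURES:
            if signature in line:
                counts[signature] += 1
    return counts


def looks_crashed(log_text: str) -> bool:
    """Return True if at least one crash signature appears in the log."""
    return any(find_crash_signatures(log_text).values())


if __name__ == "__main__":
    sample = (
        "[   56.918065] ti-sci 44043000.system-controller: Mbox timedout in "
        "resp(caller: ti_sci_cmd_put_device+0x18/0x24)\n"
        "[   56.928037] ti-sci 44043000.system-controller: Mbox send fail -110\n"
    )
    print(find_crash_signatures(sample))
    print("crashed:", looks_crashed(sample))
```

Keep in mind that some crashed boots printed nothing until the low power test was run, so a clean log by itself does not prove that the DM R5F is still alive.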

  • How can I verify that this patch actually fixes the crash?

    Apply the updates from the patch, then re-run the tests from before. You should be able to successfully enter and exit low power modes after a minute has passed, without running into any errors.
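
One way to turn this into a pass/fail check on a captured console log is sketched below. It assumes that a healthy suspend/resume cycle ends with a "PM: suspend exit" line, as in the sample output above, and that a crashed DM R5F shows the "Mbox timedout in resp" message. The 60 second threshold comes from the test setup in this FAQ. The function names and the timestamp in the usage example are hypothetical.

```python
import re

# Matches kernel log lines such as "[   24.428423] PM: suspend exit"
# and captures the timestamp in seconds.
SUSPEND_EXIT_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+PM: suspend exit")


def suspend_exits_after(log_text: str, threshold_s: float = 60.0) -> list:
    """Return timestamps of 'PM: suspend exit' lines later than threshold_s."""
    timestamps = [float(t) for t in SUSPEND_EXIT_RE.findall(log_text)]
    return [t for t in timestamps if t > threshold_s]


def patch_looks_good(log_text: str, threshold_s: float = 60.0) -> bool:
    """Pass only if a suspend completed after the threshold with no Mbox timeout."""
    no_mbox_timeout = "Mbox timedout in resp" not in log_text
    resumed_late = bool(suspend_exits_after(log_text, threshold_s))
    return no_mbox_timeout and resumed_late


if __name__ == "__main__":
    # Hypothetical log line for illustration; not taken from a real test run.
    healthy = "[   75.100000] PM: suspend exit\n"
    print("patch looks good:", patch_looks_good(healthy))
```

Run the low power test a few seconds after the one minute mark, so that the suspend cycle you check clearly starts after the point where the unpatched firmware would have timed out.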

  • Oooh, great! There is a patch! Can I just patch my SDK 8.x or SDK 9.x MCU+ SDK and use that version of the DM R5F?

    No.

    If you are going to production with SDK 8.x or SDK 9.x, you still need to update your DM R5F and TIFS firmware to the versions from SDK 10.0 or later, because a critical PLL instability issue was fixed starting in SDK 10.0. For more information, refer to the Linux SDK section "PLL Programing Sequence Update To Avoid PLL Instability"
    https://software-dl.ti.com/processor-sdk-linux/esd/AM62X/10_00_07_04/exports/docs/devices/AM62X/linux/Release_Specific_Migration_Guide.html