Hi,
With SDK 10.1, we've observed that when we attempt to put the EVM (revision PROC135E3) into deep sleep via "rtcwake -s 30 -m mem" or "echo mem > /sys/power/state", the DM R5 often (always?) attempts to print an assertion failure and the system appears to hang. Power readings across the sense resistors in this state largely match the values when awake. Some further debug details for SDK 10.1 follow below, after the divider.
With console_suspend disabled, the other symptoms we see match those reported in "PROCESSOR-SDK-AM62A: Deep Sleep Error in SDK 10.00": first a mailbox timeout in ti_sci_suspend(), followed either by an error in the e5010_jpeg_enc kernel module during resume, or by further ti-sci errors and MMC timeouts if e5010_jpeg_enc is unloaded before attempting to sleep. Perhaps the e5010_jpeg_enc and MMC errors are simply a result of the system getting into a bad state from the failed suspend and attempted resume?
Although the symptoms in Linux are similar, the DM behavior in SDKs 10.0 and 10.1 does appear to be different. In 10.0, we never observe the assertion, and the DM does appear to reach WFI (whether or not the e5010_jpeg_enc driver is unloaded first): when we attach to the R5 via JTAG (or subsequently resume and pause its execution), its PC is always one instruction after a WFI; presumably the debugger wakes it from the WFI? However, even though the DM appears to reach WFI in SDK 10.0, power readings across the sense resistors stay much closer to the awake values than to those of a successful sleep.
In SDK 9.2, after unloading the DSP remoteproc module (since that version of the DSP firmware doesn't support graceful shutdown), sleep does appear to succeed (and combined, I read ~20-30 mW across the sense resistors), so this seems like a regression in SDKs 10.0/10.1. In 9.2, it's not necessary to unload the e5010_jpeg_enc driver and no mailbox or other errors are printed. As in SDK 10.0, attaching to the R5 via JTAG shows it does appear to make it to WFI.
All three SDK versions we tested on the EVM for this were the edgeai images from here with no modifications.
Note also that the DM appears to have a placeholder version string in at least SDKs 9.2, 10.0, and 10.1.
9.2:
##DM Built On: Apr 2 2024 18:17:23
##Sciserver Version: v2023.11.0.0REL.MCUSDK.MM.NN.PP.bb
##RM_PM_HAL Version: vMM.NN.PP
10.0:
##DM Built On: Aug 13 2024 21:19:51
##Sciserver Version: v2023.11.0.0REL.MCUSDK.MM.NN.PP.bb
##RM_PM_HAL Version: vMM.NN.PP
10.1:
##DM Built On: Dec 10 2024 20:25:19
##Sciserver Version: v2023.11.0.0REL.MCUSDK.MM.NN.PP.bb
##RM_PM_HAL Version: vMM.NN.PP
Unfortunately, the assertion message in SDK 10.1 is truncated quite consistently to ~36 or 37 characters, which is only enough to print the "FreeRTOS-Kernel/t" part of the file where it occurred. Once, however, it did manage to print "FreeRTOS-Kernel/ta", so it seems the assertion may be in tasks.c.
Inspecting the R5 via JTAG, a few of the core registers contain addresses of relevant-sounding strings that could be the rest of the assertion message:
- R1: 0x9D051194 -> "FreeRTOS-Kernel/queue.c"
- R2: 0x9D050E79 -> "xQueueGiveMutexRecursive"
- R5: 0x9D0524E6 -> "vTaskSwitchContext"
- R6: 0x9D0511AC -> "FreeRTOS-Kernel/tasks.c"
- R9: 0x9D04A559 -> "(uint32_t)(( ( &( pxReadyTasksLists[ uxTopPriority ] ) )->uxNumberOfItems ) > 0)"
Based on the sources in MCU+ SDK 10.01.00.33, the assertion condition pointed to by R9 does appear to occur in vTaskSwitchContext() via a call to the macro taskSELECT_HIGHEST_PRIORITY_TASK(), so it seems plausible that this was indeed what the DM was trying to print.
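To make the suspected failure mode concrete, here is a minimal standalone model of the check that the R9 string describes. It is a sketch, not the FreeRTOS source: it assumes the bitmap-based (port-optimised) form of task selection is the one in use, and every name in it (model_sched_t, model_highest_priority, model_select_highest_priority, MODEL_MAX_PRIORITIES) is hypothetical. The idea is that the scheduler picks the highest priority marked ready in a bitmap and then asserts that the ready list for that priority is not empty; if the bitmap and the lists ever disagree, the assertion fires.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical priority count for this model; the real value is configuration-specific. */
#define MODEL_MAX_PRIORITIES 32u

/* Hypothetical stand-in for the kernel's ready-task bookkeeping. */
typedef struct {
    uint32_t top_ready_priority;                       /* bit n set: priority n claims a ready task */
    uint32_t ready_list_length[MODEL_MAX_PRIORITIES];  /* models uxNumberOfItems per priority */
} model_sched_t;

/* Index of the highest set bit. The caller guarantees bitmap != 0. */
static uint32_t model_highest_priority(uint32_t bitmap)
{
    uint32_t prio = 0u;
    while (bitmap > 1u) {
        bitmap >>= 1;
        prio++;
    }
    return prio;
}

/*
 * Returns true when task selection would succeed, and false in the
 * situation where the modelled assertion (list length > 0) would fail.
 * *prio_out is set to the selected priority whenever the bitmap is non-zero.
 */
static bool model_select_highest_priority(const model_sched_t *s, uint32_t *prio_out)
{
    if (s->top_ready_priority == 0u) {
        return false; /* nothing is marked ready at all */
    }
    uint32_t prio = model_highest_priority(s->top_ready_priority);
    *prio_out = prio;
    /* The modelled assertion: the list for the selected priority must not be empty. */
    return s->ready_list_length[prio] > 0u;
}
```

In this model, a state with bit 5 set in top_ready_priority but ready_list_length[5] == 0 makes model_select_highest_priority() return false, which corresponds to the condition in the R9 string evaluating to 0. Whether the DM actually reaches such an inconsistent state during suspend, and how, is something we have not been able to confirm.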
I see that the logging function uses a semaphore and that releasing it could have called xQueueGiveMutexRecursive(); perhaps a second assertion occurred there, which could be why the first didn't finish printing? The DM ends up in a tight loop polling a memory location on the stack (with value 1, comparing to 0); maybe one of the forever loops in _DebugP_assertNoLog() or _DebugP_assert()?
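For reference, the kind of loop we suspect looks roughly like the sketch below. This is only a generic model of an assert handler that parks the core on a volatile stack flag; it is not copied from the SDK, and the names model_assert_spin, model_poll_hook_t, and model_release_after_three are hypothetical. The hook exists only so the model can terminate: it stands in for a debugger writing 0 to the flag's stack location, which on the real target would be one way to check whether execution continues past such a loop.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical hook, called once per poll. It stands in for an external
 * agent (for example a debugger) that may write 0 to the flag.
 */
typedef void (*model_poll_hook_t)(volatile uint32_t *flag, uint32_t polls);

/*
 * Model of an assert handler that spins on failure.
 * Returns the number of polls performed (0 when the expression holds).
 */
static uint32_t model_assert_spin(int expression, model_poll_hook_t hook)
{
    uint32_t polls = 0u;
    if (expression == 0) {
        /* Lives on the stack, holds 1, and is compared against 0 on every pass. */
        volatile uint32_t spin = 1u;
        while (spin != 0u) {
            polls++;
            hook(&spin, polls);
        }
    }
    return polls;
}

/* Example hook: clears the flag on the third poll so the model can exit. */
static void model_release_after_three(volatile uint32_t *flag, uint32_t polls)
{
    if (polls >= 3u) {
        *flag = 0u;
    }
}
```

With the example hook, model_assert_spin(0, model_release_after_three) returns after three polls, while a passing expression never enters the loop. On the target there is no such hook, so a loop of this shape would spin until something external clears the flag.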