AM62P: low probability abnormal thermal reset

Biao Li

Part Number: AM62P

Hi Expert,

A customer reports an abnormal warm reset. The MCU_CTRL_MMR_CFG0_RST_SRC register reads 0x10 (bit 4 set), and according to the TRM this bit indicates a thermal reset. I asked the customer to print the VTM sensor temperature, and they also added cooling; the VTM sensor consistently reads between 48C and 50C. I also checked WKUP_VTM_MISC_CTRL2[25-16] MAXT_OUTRG_ALERT_THR: the value is 0x2f8, which represents 123C, so a genuine thermal reset should not be possible. I also had the customer disable the VTM out-of-range alert by clearing the WKUP_VTM_MISC_CTRL[0] ANYMAXT_OUTRG_ALERT_EN bit, and change the source code to disable the thermal reset. This bit stayed 0 throughout the test, yet the low-probability (about 1 in 3000) abnormal reset still occurs.

So although the reset reason reports a thermal reset, I believe that is wrong: it cannot be a real thermal reset, but I cannot find the actual cause. I also asked the customer to test on the TI EVM, where the issue reproduces with the default TI SDK after around 700 power cycles. They use the OSPI + eMMC boot mode, with the eMMC running at DDR52. I need your help to find out what could cause the SoC to trigger a warm reset and record it as a thermal reset.
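The reset-source check described above can be sketched as follows. This is only an illustration, assuming (as the post states from the TRM) that bit 4 of RST_SRC is the thermal reset flag; the helper names are hypothetical and are not part of any TI SDK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption taken from the post: bit 4 of RST_SRC flags a thermal reset. */
#define RST_SRC_THERMAL_BIT 4u

/* Hypothetical helper: true when bit `bit` is set in a raw register value. */
static inline bool reg_bit_is_set(uint32_t value, unsigned bit)
{
    assert(bit < 32u);
    return ((value >> bit) & 1u) != 0u;
}

/* True when a raw RST_SRC value reports a thermal reset. */
static inline bool rst_src_is_thermal(uint32_t rst_src)
{
    return reg_bit_is_set(rst_src, RST_SRC_THERMAL_BIT);
}
```

With the values reported in this thread, `rst_src_is_thermal(0x10)` is true, while a reading of 0 is not flagged.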

BR,

Biao

3 months ago

0 Tao Han 3 months ago

TI__Expert 6465 points

Hi Experts:

We can reproduce the problem the customer reported on the AM62P EVM board.

The reproduction method is below.

HW: AM62P-SK EVM board running at 25C room temperature

SW: SD card boot using the RDK10.1 AM62P prebuilt file system, plus a U-Boot that prints the MMR0_RST_SRC register (0x0451 8178h).

The attached env.c file contains our modification.

Test method: power on the EVM board and let it run for 30 seconds to ensure the AM62P has booted into Linux, then power it off for 5 seconds.

At log line 1342306 a thermal reset occurs. We believe it is the same abnormal thermal reset seen on the customer side.

Please help reproduce this on your side and debug it together with us.

Best Regards!

Han Tao

Log from the 3000-cycle test run:

https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/791/minicom_2D00_20250709.7z

env.c:

// SPDX-License-Identifier: GPL-2.0+
/*
 * Copyright (C) 2017 Google, Inc
 * Written by Simon Glass <sjg@chromium.org>
 */

#include <common.h>
#include <env.h>
#include <env_internal.h>
#include <log.h>
#include <asm/global_data.h>
#include <linux/bitops.h>
#include <linux/bug.h>

DECLARE_GLOBAL_DATA_PTR;

static struct env_driver *_env_driver_lookup(enum env_location loc)
{
	struct env_driver *drv;
	const int n_ents = ll_entry_count(struct env_driver, env_driver);
	struct env_driver *entry;

	drv = ll_entry_start(struct env_driver, env_driver);
	for (entry = drv; entry != drv + n_ents; entry++) {
		if (loc == entry->location)
			return entry;
	}

	/* Not found */
	return NULL;
}

static enum env_location env_locations[] = {
#ifdef CONFIG_ENV_IS_IN_EEPROM
	ENVL_EEPROM,
#endif
#ifdef CONFIG_ENV_IS_IN_EXT4
	ENVL_EXT4,
#endif
#ifdef CONFIG_ENV_IS_IN_FAT
	ENVL_FAT,
#endif
#ifdef CONFIG_ENV_IS_IN_FLASH
	ENVL_FLASH,
#endif
#ifdef CONFIG_ENV_IS_IN_MMC
	ENVL_MMC,
#endif
#ifdef CONFIG_ENV_IS_IN_NAND
	ENVL_NAND,
#endif
#ifdef CONFIG_ENV_IS_IN_NVRAM
	ENVL_NVRAM,
#endif
#ifdef CONFIG_ENV_IS_IN_REMOTE
	ENVL_REMOTE,
#endif
#ifdef CONFIG_ENV_IS_IN_SPI_FLASH
	ENVL_SPI_FLASH,
#endif
#ifdef CONFIG_ENV_IS_IN_UBI
	ENVL_UBI,
#endif
#ifdef CONFIG_ENV_IS_NOWHERE
	ENVL_NOWHERE,
#endif
};

static bool env_has_inited(enum env_location location)
{
	return gd->env_has_init & BIT(location);
}

static void env_set_inited(enum env_location location)
{
	/*
	 * We're using a 32-bits bitmask stored in gd (env_has_init)
	 * using the above enum value as the bit index. We need to
	 * make sure that we're not overflowing it.
	 */
	BUILD_BUG_ON(ENVL_COUNT > BITS_PER_LONG);

	gd->env_has_init |= BIT(location);
}

/**
 * arch_env_get_location() - Returns the best env location for an arch
 * @op: operations performed on the environment
 * @prio: priority between the multiple environments, 0 being the
 *        highest priority
 *
 * This will return the preferred environment for the given priority.
 * This is overridable by architectures if they need to and has lower
 * priority than board side env_get_location() override.
 *
 * All implementations are free to use the operation, the priority and
 * any other data relevant to their choice, but must take into account
 * the fact that the lowest priority (0) is the most important location
 * in the system. The following locations should be returned by order
 * of descending priorities, from the highest to the lowest priority.
 *
 * Returns:
 * an enum env_location value on success, a negative error code otherwise
 */
__weak enum env_location arch_env_get_location(enum env_operation op, int prio)
{
	if (prio >= ARRAY_SIZE(env_locations))
		return ENVL_UNKNOWN;

	return env_locations[prio];
}

/**
 * env_get_location() - Returns the best env location for a board
 * @op: operations performed on the environment
 * @prio: priority between the multiple environments, 0 being the
 *        highest priority
 *
 * This will return the preferred environment for the given priority.
 * This is overridable by boards if they need to.
 *
 * All implementations are free to use the operation, the priority and
 * any other data relevant to their choice, but must take into account
 * the fact that the lowest priority (0) is the most important location
 * in the system. The following locations should be returned by order
 * of descending priorities, from the highest to the lowest priority.
 *
 * Returns:
 * an enum env_location value on success, a negative error code otherwise
 */
__weak enum env_location env_get_location(enum env_operation op, int prio)
{
	return arch_env_get_location(op, prio);
}

/**
 * env_driver_lookup() - Finds the most suited environment location
 * @op: operations performed on the environment
 * @prio: priority between the multiple environments, 0 being the
 *        highest priority
 *
 * This will try to find the available environment with the highest
 * priority in the system.
 *
 * Returns:
 * NULL on error, a pointer to a struct env_driver otherwise
 */
static struct env_driver *env_driver_lookup(enum env_operation op, int prio)
{
	enum env_location loc = env_get_location(op, prio);
	struct env_driver *drv;

	if (loc == ENVL_UNKNOWN)
		return NULL;

	drv = _env_driver_lookup(loc);
	if (!drv) {
		debug("%s: No environment driver for location %d\n", __func__,
		      loc);
		return NULL;
	}

	return drv;
}

int env_load(void)
{
	struct env_driver *drv;
	int best_prio = -1;
	int prio;

	uint32_t  RST_SRC;

	if (CONFIG_IS_ENABLED(ENV_WRITEABLE_LIST)) {
		/*
		 * When using a list of writeable variables, the baseline comes
		 * from the built-in default env. So load this first.
		 */
		env_set_default(NULL, 0);
	}

	for (prio = 0; (drv = env_driver_lookup(ENVOP_LOAD, prio)); prio++) {
		int ret;

		if (!env_has_inited(drv->location))
			continue;

		printf("Loading Environment from %s... ", drv->name);
		/*
		 * In error case, the error message must be printed during
		 * drv->load() in some underlying API, and it must be exactly
		 * one message.
		 */
		ret = drv->load();
		if (!ret) {
			printf("OK\n");
			gd->env_load_prio = prio;
			/* Debug addition: read and print the reset source register */
			RST_SRC = *(volatile uint32_t *)0x43018178;
			printf("The WKUP_MMR0_RST_SRC value is 0X %x\n", RST_SRC);
			return 0;
		} else if (ret == -ENOMSG) {
			/* Handle "bad CRC" case */
			if (best_prio == -1)
				best_prio = prio;
		} else {
			debug("Failed (%d)\n", ret);
		}
	}

	/*
	 * In case of invalid environment, we set the 'default' env location
	 * to the best choice, i.e.:
	 *   1. Environment location with bad CRC, if such location was found
	 *   2. Otherwise use the location with highest priority
	 *
	 * This way, next calls to env_save() will restore the environment
	 * at the right place.
	 */
	if (best_prio >= 0)
		debug("Selecting environment with bad CRC\n");
	else
		best_prio = 0;

	gd->env_load_prio = best_prio;

	return -ENODEV;
}

int env_reload(void)
{
	struct env_driver *drv;

	drv = env_driver_lookup(ENVOP_LOAD, gd->env_load_prio);
	if (drv) {
		int ret;

		printf("Loading Environment from %s... ", drv->name);

		if (!env_has_inited(drv->location)) {
			printf("not initialized\n");
			return -ENODEV;
		}

		ret = drv->load();
		if (ret)
			printf("Failed (%d)\n", ret);
		else
			printf("OK\n");

		if (!ret)
			return 0;
	}

	return -ENODEV;
}

int env_save(void)
{
	struct env_driver *drv;

	drv = env_driver_lookup(ENVOP_SAVE, gd->env_load_prio);
	if (drv) {
		int ret;

		printf("Saving Environment to %s... ", drv->name);
		if (!drv->save) {
			printf("not possible\n");
			return -ENODEV;
		}

		if (!env_has_inited(drv->location)) {
			printf("not initialized\n");
			return -ENODEV;
		}

		ret = drv->save();
		if (ret)
			printf("Failed (%d)\n", ret);
		else
			printf("OK\n");

		if (!ret)
			return 0;
	}

	return -ENODEV;
}

int env_erase(void)
{
	struct env_driver *drv;

	drv = env_driver_lookup(ENVOP_ERASE, gd->env_load_prio);
	if (drv) {
		int ret;

		if (!drv->erase) {
			printf("not possible\n");
			return -ENODEV;
		}

		if (!env_has_inited(drv->location)) {
			printf("not initialized\n");
			return -ENODEV;
		}

		printf("Erasing Environment on %s... ", drv->name);
		ret = drv->erase();
		if (ret)
			printf("Failed (%d)\n", ret);
		else
			printf("OK\n");

		if (!ret)
			return 0;
	}

	return -ENODEV;
}

int env_init(void)
{
	struct env_driver *drv;
	int ret = -ENOENT;
	int prio;

	for (prio = 0; (drv = env_driver_lookup(ENVOP_INIT, prio)); prio++) {
		if (!drv->init || !(ret = drv->init()))
			env_set_inited(drv->location);
		if (ret == -ENOENT)
			env_set_inited(drv->location);

		debug("%s: Environment %s init done (ret=%d)\n", __func__,
		      drv->name, ret);

		if (gd->env_valid == ENV_INVALID)
			ret = -ENOENT;
	}

	if (!prio)
		return -ENODEV;

	if (ret == -ENOENT) {
		gd->env_addr = (ulong)&default_environment[0];
		gd->env_valid = ENV_VALID;

		return 0;
	}

	return ret;
}

int env_select(const char *name)
{
	struct env_driver *drv;
	const int n_ents = ll_entry_count(struct env_driver, env_driver);
	struct env_driver *entry;
	int prio;
	bool found = false;

	printf("Select Environment on %s: ", name);

	/* search ENV driver by name */
	drv = ll_entry_start(struct env_driver, env_driver);
	for (entry = drv; entry != drv + n_ents; entry++) {
		if (!strcmp(entry->name, name)) {
			found = true;
			break;
		}
	}

	if (!found) {
		printf("driver not found\n");
		return -ENODEV;
	}

	/* search priority by driver */
	for (prio = 0; (drv = env_driver_lookup(ENVOP_INIT, prio)); prio++) {
		if (entry->location == env_get_location(ENVOP_LOAD, prio)) {
			/* when priority change, reset the ENV flags */
			if (gd->env_load_prio != prio) {
				gd->env_load_prio = prio;
				gd->env_valid = ENV_INVALID;
				gd->flags &= ~GD_FLG_ENV_DEFAULT;
			}
			printf("OK\n");
			return 0;
		}
	}
	printf("priority not found\n");

	return -ENODEV;
}

0 hehahe hehahe 3 months ago

Prodigy 180 points

Hello Expert,
The environment we reproduced on the TI development board is as follows:
Boot method: OSPI + eMMC
MCU program resides in Flash
SoC program is stored in eMMC
MCU software: Uses the Hello World program from the SDK with additional register printing (source code provided in the attachment).
Main domain software: Our proprietary product software.

Please let us know if further details are needed.

0 hehahe hehahe 3 months ago in reply to hehahe hehahe

Prodigy 180 points

/*
 *  Copyright (C) 2023-2024 Texas Instruments Incorporated
 *
 *  Redistribution and use in source and binary forms, with or without
 *  modification, are permitted provided that the following conditions
 *  are met:
 *
 *    Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 *
 *    Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the
 *    distribution.
 *
 *    Neither the name of Texas Instruments Incorporated nor the names of
 *    its contributors may be used to endorse or promote products derived
 *    from this software without specific prior written permission.
 *
 *  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 *  "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 *  LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
 *  A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
 *  OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 *  SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
 *  LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
 *  DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
 *  THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 *  (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 *  OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 */

#include <stdlib.h>
#include <kernel/dpl/DebugP.h>
#include "ti_drivers_config.h"
#include "ti_board_config.h"
#include "ti_drivers_open_close.h"
#include "ti_board_open_close.h"
#include "FreeRTOS.h"
#include "task.h"

#define MAIN_TASK_PRI  (configMAX_PRIORITIES-1)

#define MAIN_TASK_SIZE (16384U/sizeof(configSTACK_DEPTH_TYPE))
StackType_t gMainTaskStack[MAIN_TASK_SIZE] __attribute__((aligned(32)));

StaticTask_t gMainTaskObj;
TaskHandle_t gMainTask;

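/* Register names in the comments follow the names used elsewhere in this thread */
#define REG_RESET_WKUP  0x43018178  /* WKUP_MMR0_RST_SRC */
#define REG_RESET_MCU   0x04518178  /* MCU MMR0_RST_SRC */
#define REG_TEMPER      0x00B01010  /* VTM_CFG2_MISC_CTRL2 */
#define VTM_REGISTER (*(volatile uint32_t *)0x00B0100C)  /* VTM_CFG2_MISC_CTRL */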

volatile uint32_t *reg_wkup_ptr = (volatile uint32_t *)REG_RESET_WKUP;
volatile uint32_t *reg_mcu_ptr = (volatile uint32_t *)REG_RESET_MCU;
volatile uint32_t *reg_temp_ptr = (volatile uint32_t *)REG_TEMPER;

void hello_world_main(void *args);
void hello_world_main01(void *args);

void freertos_main(void *args)
{
    int32_t status = SystemP_SUCCESS;

    /* Open drivers */
    Drivers_open();
    /* Open flash and board drivers */
    status = Board_driversOpen();
    DebugP_assert(status==SystemP_SUCCESS);

    hello_world_main01(NULL);

    /* Close board and flash drivers */
    Board_driversClose();
    /* Close drivers */
    Drivers_close();

    vTaskDelete(NULL);
}


int main()
{
    /* init SOC specific modules */
    System_init();
    Board_init();

    /* This task is created at highest priority, it should create more tasks and then delete itself */
    gMainTask = xTaskCreateStatic( freertos_main,   /* Pointer to the function that implements the task. */
                                  "freertos_main", /* Text name for the task.  This is to facilitate debugging only. */
                                  MAIN_TASK_SIZE,  /* Stack depth in units of StackType_t typically uint32_t on 32b CPUs */
                                  NULL,            /* We are not using the task parameter. */
                                  MAIN_TASK_PRI,   /* task priority, 0 is lowest priority, configMAX_PRIORITIES-1 is highest */
                                  gMainTaskStack,  /* pointer to stack base */
                                  &gMainTaskObj ); /* pointer to statically allocated task object memory */
    configASSERT(gMainTask != NULL);

    /* Start the scheduler to start the tasks executing. */
    vTaskStartScheduler();

    /* The following line should never be reached because vTaskStartScheduler()
    will only return if there was not enough FreeRTOS heap memory available to
    create the Idle and (if configured) Timer tasks.  Heap management, and
    techniques for trapping heap exhaustion, are described in the book text. */
    DebugP_assertNoLog(0);

    return 0;
}


void hello_world_main01(void *args)
{
    const TickType_t delay_1s = pdMS_TO_TICKS(1000); 
    DebugP_log("h----\r\n");
    while (1) 
    {
        // DebugP_log("Hello World111!\r\n");
        uint32_t reg_mcu = *reg_mcu_ptr;
        uint32_t reg_wkup = *reg_wkup_ptr;
        uint32_t reg_temp = *reg_temp_ptr;
        DebugP_log(" %x %x %x %x\r\n", reg_mcu, reg_wkup, reg_temp, VTM_REGISTER);
        vTaskDelay(delay_1s);
    }
}

0 Swargam Anil 3 months ago in reply to hehahe hehahe

TI__Mastermind 47276 points

Hello Han Tao,

I have assigned this thread to the right expert to comment on the issue above.

Regards,

Anil.

0 Tao Han 2 months ago in reply to Swargam Anil

TI__Expert 6465 points

Hi Anshu:

Have you reproduced the thermal reset on your side?

I ran another 6000 test cycles over the weekend; the test log is attached. This time I could not reproduce the room-temperature thermal reset on my board. Could you please try to set up the same test environment on your side?

So far, in roughly 12,000 test cycles, the issue has reproduced only once on the EVM board.

Best Regards!

Han Tao

https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/791/onoff120000.7z

0 Anshu Madwesh 2 months ago in reply to Tao Han

TI__Mastermind 18590 points

Hi Tao,

There are many components involved in this setup, and each one is a variable that could influence this reset observation.

From what I noted:

Are you using Linux SDK 10.0 or SDK 10.1? Please confirm.
Boot Method of OSPI and eMMC
There are changes to k3_j72xx_bandgap.c driver
Changes to the MCU Firmware

We need to isolate which of these components could cause the reset.

Can you confirm whether the SoC is actually resetting, or whether only the register indicates a thermal reset?

Thanks,

Anshu

0 Anshu Madwesh 2 months ago in reply to Anshu Madwesh

TI__Mastermind 18590 points

Hi Tao,

What does the _RST_SRC register return on a successful test cycle?

Thanks,

Anshu

0 Biao Li 2 months ago in reply to Anshu Madwesh

TI__Expert 6470 points

Hi Anshu,

Before this issue occurred, no changes had been made to the k3_j72xx_bandgap.c driver or to the MCU firmware; the original SDK10.1 also reproduces the issue on the EVM. The issue is also not related to the boot mode, because we have reproduced it with both SD card and NOR flash boot.

One correction: after the customer cleared the WKUP_VTM_MISC_CTRL[0] ANYMAXT_OUTRG_ALERT_EN bit, the abnormal reset went away (confirmed over more than 30000 test cycles). The customer's earlier feedback was based on a misunderstanding.

So, Anshu, we need your help to understand why the VTM still asserts THERM_MAXTEMP_OUTRANGE_ALERT and triggers the warm reset when the temperature reads about 48C (both the on-chip sensor and the sensor on the custom board report about 48C).

Anshu Madwesh said:
What does the _RST_SRC register return on a successful test cycle?

The value is 0.

Anshu Madwesh said:
Are you using Linux SDK 10.0 or SDK 10.1? Please confirm.

The customer is using SDK10.0 on the EVM and on the custom board; the field team tested SDK10.1 on the EVM. All of them reproduce the issue.

BR,

Biao

0 Biao Li 2 months ago in reply to Biao Li

TI__Expert 6470 points

Hi Anshu,

The log below proves that a warm reset actually occurred, not just a change in the register value.

warm reset log.txt.

BR,

Biao

0 Tao Han 2 months ago in reply to Biao Li

TI__Expert 6465 points

Hi Anshu:

Keeping the AM62P die temperature at about 100C does not appear to increase the probability of reproducing the issue. We ran the customer image for about 5 hours and no thermal reset was reported. It still takes about 3000 cycles to trigger the problem at random.

Best Regards!

Han Tao

For reference, this is the log printed by the EVM board.

[BOOTLOADER PROFILE] App_loadSelfcoreImage : 4242us
[BOOTLOADER_PROFILE] SBL Total Time Taken : 44668us

Image loading done, switching to application ...
Starting 2nd stage bootloader
[BOOTLOADER_PROFILE] Boot Media : FLASH
[BOOTLOADER_PROFILE] Boot Media Clock : 166.667 MHz
[BOOTLOADER_PROFILE] Boot Image Size : 909 KB
[BOOTLOADER_PROFILE] Cores present :
hsm-m4f0-0
mcu-r5f0-0
a530-0
[BOOTLOADER PROFILE] System_init : 2148us
[BOOTLOADER PROFILE] Board_init : 2us
[BOOTLOADER PROFILE] FreeRtosTask Create : 258us
[BOOTLOADER PROFILE] SBL Drivers_open : 1010us
[BOOTLOADER PROFILE] SBL Board_driversOpen : 130us
[BOOTLOADER PROFILE] App_loadImages : 3692us
[BOOTLOADER PROFILE] App_loadMCUImages : 6048us
[BOOTLOADER PROFILE] App_loadLinuxstart /usr/bin/start-dra ...
net.core.wmem_max = 4194304
net.core.wmem_default = 1048576
start_idrive ...
start_idrive finish !!!
start /usr/bin/start-dra done.
_rpmsg_char_find_rproc: 79000000.r5f does not have any virtio devices!
_rpmsg_char_find_rproc: 79000000.r5f does not have any virtio devices!

_____ _____ _ _
| _ |___ ___ ___ ___ | _ |___ ___ |_|___ ___| |_
| | _| .'| . | . | | __| _| . | | | -_| _| _|
|__|__|_| |__,|_ |___| |__| |_| |___|_| |___|___|_|
|___| |___|

root@Linux:~# cat /sys/class/thermal/thermal_zone*/temp
98445
99147
100195
root@Linux:~# cat /sys/class/thermal/thermal_zone*/temp
101064
101064
100891
root@Linux:~# cat /sys/class/thermal/thermal_zone*/temp
102275
103134
102963
root@Linux:~# cat /sys/class/thermal/thermal_zone*/temp
103819
103648
103648
root@Linux:~# cat /sys/class/thermal/thermal_zone*/temp
103819
105012
104672
root@Linux:~# cat /sys/class/thermal/thermal_zone*/temp
104502
105012
104842
root@Linux:~# cat /sys/class/thermal/thermal_zone*/temp
105690
105521
105860
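
The thermal_zone values above are in millidegrees Celsius, which is the unit the Linux thermal sysfs `temp` attribute uses. A minimal sketch for converting and summarising such samples follows; the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* sysfs thermal_zone "temp" values are in millidegrees Celsius. */
static inline double millicelsius_to_celsius(long mc)
{
    return (double)mc / 1000.0;
}

/* Hypothetical helper: highest reading in a batch of thermal-zone samples. */
static inline long max_millicelsius(const long *samples, size_t count)
{
    assert(samples != NULL && count > 0);
    long max = samples[0];
    for (size_t i = 1; i < count; i++) {
        if (samples[i] > max)
            max = samples[i];
    }
    return max;
}
```

Applied to the last group of readings in the log, the highest sample is 105860, which is about 105.86C.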

0 Anshu Madwesh 2 months ago in reply to Tao Han

TI__Mastermind 18590 points

Hi Tao,

Can you please provide a dump of these registers?

VTM_CFG2_CLK_CTRL Register (0x00B0_1008)
VTM_CFG2_MISC_CTRL (0x00B0_100C)
VTM_CFG2_MISC_CTRL2 (0x00B0_1010)
VTM_CFG2_SAMPLE_CTRL (0x00B0_1020)

We can review these register outputs with the hardware team.

If the VTM driver is unloaded before running these tests, what behavior occurs?

Thanks,

Anshu

0 Tao Han 2 months ago in reply to Anshu Madwesh

TI__Expert 6465 points

Hi Anshu:

0x00b01008 value is 0x14

0x00b0100c value is 0x1

0x00b01010 value is 0x28802F8

0x00b01020 value is 0xAA9

These are the values set after the customer software has booted Linux.

Best Regards!

Han Tao

root@Linux:~# devmem2 0x00b01008
/dev/mem opened.
Memory mapped at address 0xffff9edf0000.
Read at address 0x00B01008 (0xffff9edf0008): 0x00000014
root@Linux:~# devmem2 0x00b0100c
/dev/mem opened.
Memory mapped at address 0xffffa04c7000.
Read at address 0x00B0100C (0xffffa04c700c): 0x00000001
root@Linux:~# devmem2 0x00b01010
/dev/mem opened.
Memory mapped at address 0xffff9daf4000.
Read at address 0x00B01010 (0xffff9daf4010): 0x028802F8
root@Linux:~# devmem2 0x00b01020
/dev/mem opened.
Memory mapped at address 0xffffb741b000.
Read at address 0x00B01020 (0xffffb741b020): 0x00000AA9
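
To make the dump above easier to compare against the TRM, the bit fields can be pulled out of the raw values. The sketch below is a generic field extractor with a hypothetical name; it does not encode any field names, so which field sits in which bit range should be confirmed against the TRM.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: extract bits [hi:lo] (inclusive) from a 32-bit value. */
static inline uint32_t reg_field(uint32_t value, unsigned hi, unsigned lo)
{
    assert(hi < 32u && lo <= hi);
    uint32_t width = hi - lo + 1u;
    /* Avoid an undefined 32-bit shift when the field spans the whole word. */
    uint32_t mask = (width == 32u) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (value >> lo) & mask;
}
```

For the 0x00B01010 reading of 0x028802F8, `reg_field(v, 25, 16)` gives 0x288 and `reg_field(v, 9, 0)` gives 0x2F8. So in this dump the 0x2F8 value quoted in the first post sits in bits [9:0], and bits [25:16] hold 0x288.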

0 Anshu Madwesh 2 months ago in reply to Tao Han

TI__Mastermind 18590 points

Hi Tao,

The simplest way to unload the VTM module is to remove it from the device tree, then recompile the device tree and test.

diff --git a/arch/arm64/boot/dts/ti/k3-am62p.dtsi b/arch/arm64/boot/dts/ti/k3-am62p.dtsi
index 809e1dbf2..9151fc9de 100644
--- a/arch/arm64/boot/dts/ti/k3-am62p.dtsi
+++ b/arch/arm64/boot/dts/ti/k3-am62p.dtsi
@@ -137,7 +137,7 @@ dss1_vp1_clk: clock-divider-oldi-dss1 {
                clock-mult = <1>;
        };

-       #include "k3-am62p-j722s-common-thermal.dtsi"
+       //#include "k3-am62p-j722s-common-thermal.dtsi"
 };

 /* Now include peripherals for each bus segment */

I've run the default Linux SDK for about 100 iterations, but I didn't observe this behavior.

For each cycle, how was the device power cycled? Was the PCB's power supply removed and inserted? Or is there another method?

In the case that the device sits idle for a long period of time (meaning no cycling), will the device see a thermal reset?
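
For context on the 100-iteration run: if each power cycle fails independently with the roughly 1-in-3000 probability reported earlier in this thread, a short run will usually show nothing. A minimal sketch of that arithmetic follows; the function name is hypothetical and the independence of cycles is an assumption.

```c
#include <assert.h>

/*
 * Probability of seeing no failure in `cycles` independent power cycles,
 * each failing with probability p. Computed as (1 - p)^cycles by repeated
 * multiplication so that no math library is needed.
 */
static inline double prob_no_failure(double p, unsigned long cycles)
{
    assert(p >= 0.0 && p <= 1.0);
    double q = 1.0;
    for (unsigned long i = 0; i < cycles; i++)
        q *= 1.0 - p;
    return q;
}
```

Under these assumptions, `prob_no_failure(1.0 / 3000.0, 100)` is about 0.967, so a clean 100-cycle run says little. The same function gives about 0.135 for 6000 cycles and about 4.5e-5 for 30000 cycles.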

Thanks,

Anshu

0 Mukul Bhatnagar 2 months ago in reply to Anshu Madwesh

TI__Guru* 84075 points

Hi Tao and team

Anshu has shared how to disable the VTM by removing it from the device tree; hopefully that helps.

We are trying to understand further:

Does the reset still occur if the VTM reset/interrupt is masked?
Does the reset happen close to boot/initialization time, or later while the program is running?
Does the failure occur if the program simply runs for long enough, or does it require power cycling in between to reproduce? It is unclear whether any other "glitches" are causing an issue.

For die-id read out you can use the following e2e post to read out the register values

https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1436394/am67a-die-registers-of-the-am67a/5520800

I want to summarize the observations so far, please correct and add as needed

The issue is present on 100% of the boards the customer has tested. Is the failure rate about 1 in 3000?
The issue can be reproduced on the EVM using the customer's test case.
The issue seems to be sensitive to the software load/interaction; a simple test may not reproduce it.
The reported temperature when the reset happens is ~55C, even though the reset is configured to trigger at ~123C.
Raising the temperature of the EVM to ~100C by blowing hot air does not increase the frequency of the failure/reset.

Let us know if we missed anything.

0 Tao Han 2 months ago in reply to Mukul Bhatnagar

TI__Expert 6465 points

Hi Anshu:

You can use an Agilent E3634A power supply to power the AM62P EVM board on and off automatically.

I have attached the Linux script. From the PC's UART interface you can control the Agilent E3634A power supply with the command below:

# source poweronoff.txt

This is the DIE ID captured from the customer board and the customer-side and field EVM boards. Please check it.

Best Regards!

Han Tao

Register and address	Customer Board #1	Customer side EVM	TI EVM
WKUP_CTRL_MMR0_CFG0_DIE_ID0 0x43000020	0x0C8A0001	0x10520000	0x5F120000
WKUP_CTRL_MMR0_CFG0_DIE_ID1 0x43000024	0x00000000	0x00000000	0x00000000
WKUP_CTRL_MMR0_CFG0_DIE_ID2 0x43000028	0x0800FC69	0x08331FB6	0x0833E338
WKUP_CTRL_MMR0_CFG0_DIE_ID3 0x4300002C	0x0601B016	0x01036ECD	0x01038918

poweronoff.txt:

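#!/bin/bash

echo "Automatic power on/off of the E3634A power supply:"

# Query the instrument identity twice before starting.
# '*IDN?' is quoted so the shell does not treat it as a filename pattern.
echo '*IDN?' >/dev/ttyUSB0
sleep 1
echo '*IDN?' >/dev/ttyUSB0
sleep 1

i=1

while [[ $i -le 12000 ]] ; do
    echo "$i"

    # Power off for 5 seconds.
    echo 'Output off' >/dev/ttyUSB0
    echo '*IDN?' >/dev/ttyUSB0
    sleep 5

    (( i += 1 ))

    # Power on for 30 seconds.
    echo 'Output on' >/dev/ttyUSB0
    echo '*IDN?' >/dev/ttyUSB0
    sleep 30
done

#echo Output on >/dev/ttyUSB0
#echo Output off >/dev/ttyUSB0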
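
As a rough planning aid, the wall-clock time of such a run follows directly from the loop above, which spends 5 seconds off and 30 seconds on per iteration. The sketch below uses a hypothetical helper name and ignores the small overhead of the serial writes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: total seconds for a power-cycling run. */
static inline uint64_t run_seconds(uint64_t cycles, uint32_t on_s, uint32_t off_s)
{
    uint64_t per_cycle = (uint64_t)on_s + off_s;
    /* Guard against overflow of the 64-bit product. */
    assert(per_cycle == 0 || cycles <= UINT64_MAX / per_cycle);
    return cycles * per_cycle;
}
```

For the 12000 iterations in the script, `run_seconds(12000, 30, 5)` is 420000 seconds, or a little under five days of continuous cycling.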

0 Anshu Madwesh 2 months ago in reply to Tao Han

TI__Mastermind 18590 points

Hi Tao,

We're discussing the DIE ID results with the hardware team. I'm also working on a better setup to run iterative tests on my EVM.

What is the result of the test when the VTM is removed? Please see my previous post for instructions on how to disable the VTM.

I would like to see what the VTM's value is before a thermal reset. Can you please run this script upon booting?

#!/bin/bash


while true; do
        echo "========Thermal Zone Output========="
        cat /sys/class/thermal/thermal_zone*/temp
        echo "===================================="
        sleep 4
done

Can you share the A53 logs of a failed test cycle? Please include the shutdown logs and the next boot log with the RST_SRC output. Also please include the Thermal zone output in the logs. I would like to see if the Linux logs indicate any information.

Thanks,

Anshu

0 Tao Han 2 months ago in reply to Anshu Madwesh

TI__Expert 6465 points

Hi Anshu:

Thanks for helping us check the DIE ID information with the factory.

Please check the log I sent last week; the AM62P die temperature was captured through minicom.

That log is from the AM62P EVM board in the TI lab, tested with Tj raised to about 105C. Raising Tj to 105C does not accelerate reproduction of the issue.

In more than 2000 test cycles I could not reproduce it.

soclog.txt is the log captured on the customer board when the AM62P thermal reset was triggered. The reset happened about 4.7 seconds after the AM62P started.

Please check it.

Best Regards!

Han Tao

https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/791/100ctemp.7z

热复位soclog.txt (thermal reset soclog.txt)

0 Anshu Madwesh 2 months ago in reply to Tao Han

TI__Mastermind 18590 points

Hi Tao,

Please share the latest device tree source file so we can review it with the software development team.

Thanks,

Anshu

0 Anshu Madwesh 2 months ago in reply to Anshu Madwesh

TI__Mastermind 18590 points

Hi Tao,

Please provide the output of these registers:

0x00B01304
0x00B01324
0x00B01344

These registers hold the VTM's Trim values (Gain and Offset). After you share this output, the hardware team can compare it against other AM62P samples to make sure the values are within reason.

Thanks,

Anshu

0 Biao Li 2 months ago in reply to Anshu Madwesh

TI__Expert 6470 points

Hi Anshu,

I don't think the trim values will help, because this is not a single-chip issue; it affects all chips. If the trim were wrong, the temperature sensor reading would always be wrong, but the reading is correct and the thermal reset is still triggered at random. In any case, please see the register values from different boards below:

board1:

root@am62pxx-evm:~# devmem2 0x00B01304
/dev/mem opened.
Memory mapped at address 0xffff9e874000.
Read at address 0x00B01304 (0xffff9e874304): 0x0000080F
root@am62pxx-evm:~# devmem2 0x00B01324
/dev/mem opened.
Memory mapped at address 0xffffba89f000.
Read at address 0x00B01324 (0xffffba89f324): 0x0000080F
root@am62pxx-evm:~# devmem2 0x00B01344
/dev/mem opened.
Memory mapped at address 0xffff95b2b000.
Read at address 0x00B01344 (0xffff95b2b344): 0x0000080F

board2:

BR,

Biao

0 Biao Li 2 months ago in reply to Biao Li

TI__Expert 6470 points

Hi Anshu,

customer board:

root@Linux:~# devmem2 0x00B01304
/dev/mem opened.
Memory mapped at address 0xffffa017d000.
Read at address 0x00B01304 (0xffffa017d304): 0x0000090F
root@Linux:~# devmem2 0x00B01324
/dev/mem opened.
Memory mapped at address 0xffff93f21000.
Read at address 0x00B01324 (0xffff93f21324): 0x0000090F
root@Linux:~# devmem2 0x00B01344
/dev/mem opened.
Memory mapped at address 0xffffa40ab000.
Read at address 0x00B01344 (0xffffa40ab344): 0x0000090F
root@Linux:~#

EVM board on the customer side:
root@Linux:~# devmem2 0x00B01304
/dev/mem opened.
Memory mapped at address 0xffffb6fa4000.
Read at address 0x00B01304 (0xffffb6fa4304): 0x0000090F
root@Linux:~# devmem2 0x00B01324
/dev/mem opened.
Memory mapped at address 0xffffb85cf000.
Read at address 0x00B01324 (0xffffb85cf324): 0x0000090F
root@Linux:~# devmem2 0x00B01344
/dev/mem opened.
Memory mapped at address 0xffffa0df9000.
Read at address 0x00B01344 (0xffffa0df9344): 0x0000090F
root@Linux:~#

BR,

Biao

0 Tao Han 2 months ago in reply to Anshu Madwesh

TI__Expert 6465 points

Hi Anshu:

The EVM board runs the RDK10.1 U-Boot and file system, and I use the default dtb file.

Below are the dtb used on my EVM board and the decompiled dts file. The dtb is located at ti-processor-sdk-linux-am62pxx-evm-10.01.10.04/board-support/ti-linux-kernel-6.6.58+git-ti/arch/arm64/boot/dts/ti. I decompiled it with the dtc command below, for your reference.

#dtc -I dtb -O dts k3-am62p5-sk.dtb -o am62p-evm.dts

Best Regards!

Han Tao

https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/791/am62p_2D00_evm.dts

https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/791/2148.k3_2D00_am62p5_2D00_sk.dtb

0 Tao Han 2 months ago in reply to Tao Han

TI__Expert 6465 points

Hi experts:

Below are the power on/off waveform test results for the customer board. Please review them.

For power-on, the start point is when the AM62P PMIC 3.3V supply reaches 100% of its voltage.

For power-off, the start point is when PMIC NRSTOUT goes to 0.

Best Regards!

han tao

Sequence	PMIC Power net name	Test point name	Power on/off reference power rail	Test case	Result (ms)
Power Up sequence	VCCA PVIN EN/PB/VSENSE GPIO4/VMON2/ADC_IN/nSLEEP1 (SYS_3V3)	TD9	Power on start point	Reference	-
	LDO2 (DVDD_3V3)↑100%- SYS_3V3↑100%	TD16	DVDD_3V3 compare with SYS_3V3 delay	Tdelay(ms)	1.184
	BUCK3 (VDD_IO_1V8)↑0%- DVDD_3V3↑100%	TD13	VDD_IO_1V8 compare with DVDD_3V3 delay	Tdelay(ms)	1.066
	LDO1 (VDDA_1V8)↑0%-VDD_IO_1V8↑0%	TD15	VDDA_1V8 compare with VDD_IO_1V8 delay time	Tdelay(ms)	-0.038763
	BUCK4 (VDD_DDR_1V1)↑0%- VDDA_1V8↑0%	TD14	VDD_DDR_1V1 compare with VDDA_1V8 delay time	Tdelay(ms)	2.347
	BUCK1 (VDD_CORE)↑0%- VDD_DDR_1V1↑0%	TD12	VDD_CORE compare with VDD_DDR_1V1 delay time	Tdelay(ms)	0.599353
	LDO3 (VCC_0V85)↑0%- VDD_CORE↑0%	TD17	VCC_0V85 compare with VDD_CORE delay time	Tdelay(ms)	0.547117
	VMON1/nINT (VDD_IO_3V3)↑0%-DVDD_3V3↑0%	TP30	VDD_IO_3V3 compare with DVDD_3V3 delay time	Tdelay(ms)	1.831
	NRSTOUT↑100%-VCC_0V85↑100%	R118	NRSTOUT compare with VCC_0V85 delay time	Tdelay(ms)	10.756
	NRSTOUT	R118	Reference start point	Reference	-
Power down sequence	VMON1 (VDD_IO_3V3)↓100%- NRSTOUT↓100%	TP30	VDD_IO_3V3 compare with NRSTOUT delay time	Tdelay(ms)	-0.039628
	LDO3 (VCC_0V85)↓100%- VDD_IO_3V3↓100%	TD17	VCC_0V85 compare with VDD_IO_3V3 delay time	Tdelay(ms)	0
	BUCK1 (VDD_CORE)↓100%- VCC_0V85↓100%	TD12	VDD_CORE compare with VCC_0V85 delay time	Tdelay(ms)	0
	BUCK4 (VDD_DDR_1V1)↓100%- VDD_CORE↓100%	TD14	VDD_DDR_1V1 compare with VDD_CORE delay time	Tdelay(ms)	0
	LDO1 (VDDA_1V8)↓100%- VDD_DDR_1V1↓100%	TD15	VDDA_1V8 compare with VDD_DDR_1V1 delay time	Tdelay(ms)	0
	BUCK3 (VDD_IO_1V8)↓100%- VDDA_1V8↓100%	TD13	VDD_IO_1V8 compare with VDDA_1V8 delay time	Tdelay(ms)	0
	LDO2 (DVDD_3V3)↓100%-VDD_IO_1V8↓100%	TP16	DVDD_3V3 compare with VDD_IO_1V8 delay time	Tdelay(ms)	0
	SYS_3V3↓100%-DVDD_3V3↓100%	TD9	SYS_3V3 compare with DVDD_3V3 delay time	Tdelay(ms)	0

0 Tao Han 2 months ago in reply to Tao Han

TI__Expert 6465 points

Hi experts:

Below are the modifications used to disable the MCU R5F (AM62p@0x79000000) in RDK10.1 and to print the reset source register in U-Boot for the power on/off test.

We want to use this to check whether the MCU R5F is what triggers the thermal reset on the AM62P.

Disable method:

1. Modify the ./arch/arm64/boot/dts/ti/k3-am62p5-sk.dts file in the RDK10.1 kernel folder to disable the R5F node.

https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/791/k3_2D00_am62p5_2D00_sk_2D00_dts.patch

2. Delete all R5F firmware contained in the file system.
a. root@am62pxx-evm:/lib/firmware# rm am62p-mcu-r5f0_0-fw
root@am62pxx-evm:/lib/firmware# rm am62p-mcu-r5f0_0-fw-sec
b. usr/lib/opkg/alternatives# rm am62p-mcu-r5f0_0-fw
usr/lib/opkg/alternatives# rm am62p-mcu-r5f0_0-fw-sec

Run the test case with power on for 30 seconds and off for 10 seconds.

Best Regards!

Han Tao

0 Tao Han 2 months ago in reply to Tao Han

TI__Expert 6465 points

Hi experts:

Please check the logs below. It looks like disabling the MCU R5F does not resolve this problem.

Best Regards!

Han Tao

https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/791/disable_2D00_mcu_2D00_r5f.7z

0 Tao Han 1 month ago in reply to Tao Han

TI__Expert 6465 points

Hi teams:

We want to reproduce the issue and monitor whether enabling the VTM driver triggers the thermal reset.

Please set the kernel log level to 7 and provide the kernel boot log. We will use this to test on the EVM board again.

Best Regards!

Han Tao
