AM62P: low probability abnormal thermal reset

Part Number: AM62P

Hi Expert,

A customer reports an abnormal warm reset. After the reset, the MCU_CTRL_MMR_CFG0_RST_SRC register reads 0x10 (bit 4 set), which according to the TRM indicates a thermal reset. I asked the customer to print the VTM sensor temperature, and they also added cooling; the VTM sensor reading always stays around 48 to 50C. I also checked WKUP_VTM_MISC_CTRL2[25-16] MAXT_OUTRG_ALERT_THR: the value is 0x2f8, which represents 123C, so a real thermal reset should not be possible. I also asked the customer to disable the VTM out-of-range alert by clearing the WKUP_VTM_MISC_CTRL[0] ANYMAXT_OUTRG_ALERT_EN bit, and to change the source code to disable the thermal reset. This bit stays 0 throughout the test, yet the low-probability (1/3000) abnormal reset still occurs.

So although the reset reason is reported as a thermal reset, I think that must be wrong: it cannot be a real thermal reset, but I cannot find the actual cause. I also asked the customer to test on a TI EVM, where the issue reproduces with the default TI SDK in around 700 power on/off cycles. They use the OSPI+eMMC boot mode, and the eMMC speed is DDR52. I need your help to find out what could cause the SoC to trigger a warm reset and record it as a thermal reset.

BR,

Biao 
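
The bit test described in the post above can be written down as a small C helper. This is only a sketch: the bit position (bit 4 for the thermal source) is taken from the post, the macro and function names are hypothetical, and the remaining RST_SRC bits are deliberately left undecoded because the thread does not describe them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit position as reported in the post (0x10 == bit 4 == thermal reset).
 * The macro name is hypothetical, not taken from the TRM or SDK headers. */
#define RST_SRC_THERMAL_BIT 4u

/* Returns true when the thermal-reset bit is set in a raw RST_SRC value. */
static bool rst_src_is_thermal(uint32_t rst_src)
{
    return ((rst_src >> RST_SRC_THERMAL_BIT) & 1u) != 0u;
}

/* Returns true when any bit other than the thermal bit is set, so that
 * additional reset sources in the same reading are not overlooked. */
static bool rst_src_has_other_source(uint32_t rst_src)
{
    return (rst_src & ~(1u << RST_SRC_THERMAL_BIT)) != 0u;
}
```

For the reported value, `rst_src_is_thermal(0x10)` is true and `rst_src_has_other_source(0x10)` is false, which matches the reading that the thermal source is the only one flagged.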

  • Hi Experts:

    We can reproduce the customer-reported problem on the AM62P EVM board.

    The reproduction method is below.

    HW: AM62P-SK EVM board running at 25C room temperature

    SW: SD card boot, running the SDK 10.1 AM62P prebuilt file system plus a U-Boot that prints the MMR0_RST_SRC register (0x0451 8178h).

          The attached env.c file contains our modifications.

    Test method: power on the EVM board and let it run for 30 seconds to ensure the AM62P has entered Linux, then power it off for 5 seconds.

    At log line 1342306 a thermal reset occurs. We believe it is the same abnormal thermal reset seen on the customer side.

    Please help us reproduce it on your side and debug it together with us.

    Best Regards!

    Han Tao

    The log from the 3000-cycle test run is at:

    https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/791/minicom_2D00_20250709.7z

    // SPDX-License-Identifier: GPL-2.0+
    /*
     * Copyright (C) 2017 Google, Inc
     * Written by Simon Glass <sjg@chromium.org>
     */
    
    #include <common.h>
    #include <env.h>
    #include <env_internal.h>
    #include <log.h>
    #include <asm/global_data.h>
    #include <linux/bitops.h>
    #include <linux/bug.h>
    
    DECLARE_GLOBAL_DATA_PTR;
    
    static struct env_driver *_env_driver_lookup(enum env_location loc)
    {
    	struct env_driver *drv;
    	const int n_ents = ll_entry_count(struct env_driver, env_driver);
    	struct env_driver *entry;
    
    	drv = ll_entry_start(struct env_driver, env_driver);
    	for (entry = drv; entry != drv + n_ents; entry++) {
    		if (loc == entry->location)
    			return entry;
    	}
    
    	/* Not found */
    	return NULL;
    }
    
    static enum env_location env_locations[] = {
    #ifdef CONFIG_ENV_IS_IN_EEPROM
    	ENVL_EEPROM,
    #endif
    #ifdef CONFIG_ENV_IS_IN_EXT4
    	ENVL_EXT4,
    #endif
    #ifdef CONFIG_ENV_IS_IN_FAT
    	ENVL_FAT,
    #endif
    #ifdef CONFIG_ENV_IS_IN_FLASH
    	ENVL_FLASH,
    #endif
    #ifdef CONFIG_ENV_IS_IN_MMC
    	ENVL_MMC,
    #endif
    #ifdef CONFIG_ENV_IS_IN_NAND
    	ENVL_NAND,
    #endif
    #ifdef CONFIG_ENV_IS_IN_NVRAM
    	ENVL_NVRAM,
    #endif
    #ifdef CONFIG_ENV_IS_IN_REMOTE
    	ENVL_REMOTE,
    #endif
    #ifdef CONFIG_ENV_IS_IN_SPI_FLASH
    	ENVL_SPI_FLASH,
    #endif
    #ifdef CONFIG_ENV_IS_IN_UBI
    	ENVL_UBI,
    #endif
    #ifdef CONFIG_ENV_IS_NOWHERE
    	ENVL_NOWHERE,
    #endif
    };
    
    static bool env_has_inited(enum env_location location)
    {
    	return gd->env_has_init & BIT(location);
    }
    
    static void env_set_inited(enum env_location location)
    {
    	/*
    	 * We're using a 32-bits bitmask stored in gd (env_has_init)
    	 * using the above enum value as the bit index. We need to
    	 * make sure that we're not overflowing it.
    	 */
    	BUILD_BUG_ON(ENVL_COUNT > BITS_PER_LONG);
    
    	gd->env_has_init |= BIT(location);
    }
    
    /**
     * arch_env_get_location() - Returns the best env location for an arch
     * @op: operations performed on the environment
     * @prio: priority between the multiple environments, 0 being the
     *        highest priority
     *
     * This will return the preferred environment for the given priority.
     * This is overridable by architectures if they need to and has lower
     * priority than board side env_get_location() override.
     *
     * All implementations are free to use the operation, the priority and
     * any other data relevant to their choice, but must take into account
     * the fact that the lowest prority (0) is the most important location
     * in the system. The following locations should be returned by order
     * of descending priorities, from the highest to the lowest priority.
     *
     * Returns:
     * an enum env_location value on success, a negative error code otherwise
     */
    __weak enum env_location arch_env_get_location(enum env_operation op, int prio)
    {
    	if (prio >= ARRAY_SIZE(env_locations))
    		return ENVL_UNKNOWN;
    
    	return env_locations[prio];
    }
    
    /**
     * env_get_location() - Returns the best env location for a board
     * @op: operations performed on the environment
     * @prio: priority between the multiple environments, 0 being the
     *        highest priority
     *
     * This will return the preferred environment for the given priority.
     * This is overridable by boards if they need to.
     *
     * All implementations are free to use the operation, the priority and
     * any other data relevant to their choice, but must take into account
     * the fact that the lowest prority (0) is the most important location
     * in the system. The following locations should be returned by order
     * of descending priorities, from the highest to the lowest priority.
     *
     * Returns:
     * an enum env_location value on success, a negative error code otherwise
     */
    __weak enum env_location env_get_location(enum env_operation op, int prio)
    {
    	return arch_env_get_location(op, prio);
    }
    
    /**
     * env_driver_lookup() - Finds the most suited environment location
     * @op: operations performed on the environment
     * @prio: priority between the multiple environments, 0 being the
     *        highest priority
     *
     * This will try to find the available environment with the highest
     * priority in the system.
     *
     * Returns:
     * NULL on error, a pointer to a struct env_driver otherwise
     */
    static struct env_driver *env_driver_lookup(enum env_operation op, int prio)
    {
    	enum env_location loc = env_get_location(op, prio);
    	struct env_driver *drv;
    
    	if (loc == ENVL_UNKNOWN)
    		return NULL;
    
    	drv = _env_driver_lookup(loc);
    	if (!drv) {
    		debug("%s: No environment driver for location %d\n", __func__,
    		      loc);
    		return NULL;
    	}
    
    	return drv;
    }
    
    int env_load(void)
    {
    	struct env_driver *drv;
    	int best_prio = -1;
    	int prio;
    
    	uint32_t  RST_SRC;
    
    	if (CONFIG_IS_ENABLED(ENV_WRITEABLE_LIST)) {
    		/*
    		 * When using a list of writeable variables, the baseline comes
    		 * from the built-in default env. So load this first.
    		 */
    		env_set_default(NULL, 0);
    	}
    
    	for (prio = 0; (drv = env_driver_lookup(ENVOP_LOAD, prio)); prio++) {
    		int ret;
    
    		if (!env_has_inited(drv->location))
    			continue;
    
    		printf("Loading Environment from %s... ", drv->name);
    		/*
    		 * In error case, the error message must be printed during
    		 * drv->load() in some underlying API, and it must be exactly
    		 * one message.
    		 */
    		ret = drv->load();
    		if (!ret) {
    			printf("OK\n");
    			gd->env_load_prio = prio;
    			RST_SRC = *(volatile uint32_t *)0x43018178;
    			printf("The WKUP_MMR0_RST_SRC value is 0X %x\n", RST_SRC);
    			return 0;
    		} else if (ret == -ENOMSG) {
    			/* Handle "bad CRC" case */
    			if (best_prio == -1)
    				best_prio = prio;
    		} else {
    			debug("Failed (%d)\n", ret);
    		}
    	}
    
    	/*
    	 * In case of invalid environment, we set the 'default' env location
    	 * to the best choice, i.e.:
    	 *   1. Environment location with bad CRC, if such location was found
    	 *   2. Otherwise use the location with highest priority
    	 *
    	 * This way, next calls to env_save() will restore the environment
    	 * at the right place.
    	 */
    	if (best_prio >= 0)
    		debug("Selecting environment with bad CRC\n");
    	else
    		best_prio = 0;
    
    	gd->env_load_prio = best_prio;
    
    	return -ENODEV;
    }
    
    int env_reload(void)
    {
    	struct env_driver *drv;
    
    	drv = env_driver_lookup(ENVOP_LOAD, gd->env_load_prio);
    	if (drv) {
    		int ret;
    
    		printf("Loading Environment from %s... ", drv->name);
    
    		if (!env_has_inited(drv->location)) {
    			printf("not initialized\n");
    			return -ENODEV;
    		}
    
    		ret = drv->load();
    		if (ret)
    			printf("Failed (%d)\n", ret);
    		else
    			printf("OK\n");
    
    		if (!ret)
    			return 0;
    	}
    
    	return -ENODEV;
    }
    
    int env_save(void)
    {
    	struct env_driver *drv;
    
    	drv = env_driver_lookup(ENVOP_SAVE, gd->env_load_prio);
    	if (drv) {
    		int ret;
    
    		printf("Saving Environment to %s... ", drv->name);
    		if (!drv->save) {
    			printf("not possible\n");
    			return -ENODEV;
    		}
    
    		if (!env_has_inited(drv->location)) {
    			printf("not initialized\n");
    			return -ENODEV;
    		}
    
    		ret = drv->save();
    		if (ret)
    			printf("Failed (%d)\n", ret);
    		else
    			printf("OK\n");
    
    		if (!ret)
    			return 0;
    	}
    
    	return -ENODEV;
    }
    
    int env_erase(void)
    {
    	struct env_driver *drv;
    
    	drv = env_driver_lookup(ENVOP_ERASE, gd->env_load_prio);
    	if (drv) {
    		int ret;
    
    		if (!drv->erase) {
    			printf("not possible\n");
    			return -ENODEV;
    		}
    
    		if (!env_has_inited(drv->location)) {
    			printf("not initialized\n");
    			return -ENODEV;
    		}
    
    		printf("Erasing Environment on %s... ", drv->name);
    		ret = drv->erase();
    		if (ret)
    			printf("Failed (%d)\n", ret);
    		else
    			printf("OK\n");
    
    		if (!ret)
    			return 0;
    	}
    
    	return -ENODEV;
    }
    
    int env_init(void)
    {
    	struct env_driver *drv;
    	int ret = -ENOENT;
    	int prio;
    
    	for (prio = 0; (drv = env_driver_lookup(ENVOP_INIT, prio)); prio++) {
    		if (!drv->init || !(ret = drv->init()))
    			env_set_inited(drv->location);
    		if (ret == -ENOENT)
    			env_set_inited(drv->location);
    
    		debug("%s: Environment %s init done (ret=%d)\n", __func__,
    		      drv->name, ret);
    
    		if (gd->env_valid == ENV_INVALID)
    			ret = -ENOENT;
    	}
    
    	if (!prio)
    		return -ENODEV;
    
    	if (ret == -ENOENT) {
    		gd->env_addr = (ulong)&default_environment[0];
    		gd->env_valid = ENV_VALID;
    
    		return 0;
    	}
    
    	return ret;
    }
    
    int env_select(const char *name)
    {
    	struct env_driver *drv;
    	const int n_ents = ll_entry_count(struct env_driver, env_driver);
    	struct env_driver *entry;
    	int prio;
    	bool found = false;
    
    	printf("Select Environment on %s: ", name);
    
    	/* search ENV driver by name */
    	drv = ll_entry_start(struct env_driver, env_driver);
    	for (entry = drv; entry != drv + n_ents; entry++) {
    		if (!strcmp(entry->name, name)) {
    			found = true;
    			break;
    		}
    	}
    
    	if (!found) {
    		printf("driver not found\n");
    		return -ENODEV;
    	}
    
    	/* search priority by driver */
    	for (prio = 0; (drv = env_driver_lookup(ENVOP_INIT, prio)); prio++) {
    		if (entry->location == env_get_location(ENVOP_LOAD, prio)) {
    			/* when priority change, reset the ENV flags */
    			if (gd->env_load_prio != prio) {
    				gd->env_load_prio = prio;
    				gd->env_valid = ENV_INVALID;
    				gd->flags &= ~GD_FLG_ENV_DEFAULT;
    			}
    			printf("OK\n");
    			return 0;
    		}
    	}
    	printf("priority not found\n");
    
    	return -ENODEV;
    }
    

  • Hello Expert,
    The environment we reproduced on the TI development board is as follows:
    Boot method: OSPI + eMMC
    MCU program resides in Flash
    SoC program is stored in eMMC
    MCU software: Uses the Hello World program from the SDK with additional register printing (source code provided in the attachment).
    Main domain software: Our proprietary product software.

    Please let us know if further details are needed.

  • /*
     *  Copyright (C) 2023-2024 Texas Instruments Incorporated
     *
     *  Redistribution and use in source and binary forms, with or without
     *  modification, are permitted provided that the following conditions
     *  are met:
     *
     *    Redistributions of source code must retain the above copyright
     *    notice, this list of conditions and the following disclaimer.
     *
     *    Redistributions in binary form must reproduce the above copyright
     *    notice, this list of conditions and the following disclaimer in the
     *    documentation and/or other materials provided with the
     *    distribution.
     *
     *    Neither the name of Texas Instruments Incorporated nor the names of
     *    its contributors may be used to endorse or promote products derived
     *    from this software without specific prior written permission.
     *
     *  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
     *  "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
     *  LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
     *  A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
     *  OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
     *  SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
     *  LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
     *  DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
     *  THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
     *  (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
     *  OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
     */
    
    #include <stdlib.h>
    #include <kernel/dpl/DebugP.h>
    #include "ti_drivers_config.h"
    #include "ti_board_config.h"
    #include "ti_drivers_open_close.h"
    #include "ti_board_open_close.h"
    #include "FreeRTOS.h"
    #include "task.h"
    
    #define MAIN_TASK_PRI  (configMAX_PRIORITIES-1)
    
    #define MAIN_TASK_SIZE (16384U/sizeof(configSTACK_DEPTH_TYPE))
    StackType_t gMainTaskStack[MAIN_TASK_SIZE] __attribute__((aligned(32)));
    
    StaticTask_t gMainTaskObj;
    TaskHandle_t gMainTask;
    
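    #define REG_RESET_WKUP  0x43018178  /* WKUP reset-source register (RST_SRC) */
    #define REG_RESET_MCU   0x04518178  /* MCU reset-source register (RST_SRC) */
    #define REG_TEMPER      0x00B01010  /* VTM_CFG2_MISC_CTRL2 (alert thresholds, not a temperature reading) */
    #define VTM_REGISTER (*(volatile uint32_t *)0x00B0100C)  /* VTM_CFG2_MISC_CTRL */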
    
    volatile uint32_t *reg_wkup_ptr = (volatile uint32_t *)REG_RESET_WKUP;
    volatile uint32_t *reg_mcu_ptr = (volatile uint32_t *)REG_RESET_MCU;
    volatile uint32_t *reg_temp_ptr = (volatile uint32_t *)REG_TEMPER;
    
    void hello_world_main(void *args);
    void hello_world_main01(void *args);
    
    void freertos_main(void *args)
    {
        int32_t status = SystemP_SUCCESS;
    
        /* Open drivers */
        Drivers_open();
        /* Open flash and board drivers */
        status = Board_driversOpen();
        DebugP_assert(status==SystemP_SUCCESS);
    
        hello_world_main01(NULL);
    
        /* Close board and flash drivers */
        Board_driversClose();
        /* Close drivers */
        Drivers_close();
    
        vTaskDelete(NULL);
    }
    
    
    int main()
    {
        /* init SOC specific modules */
        System_init();
        Board_init();
    
        /* This task is created at highest priority, it should create more tasks and then delete itself */
        gMainTask = xTaskCreateStatic( freertos_main,   /* Pointer to the function that implements the task. */
                                      "freertos_main", /* Text name for the task.  This is to facilitate debugging only. */
                                      MAIN_TASK_SIZE,  /* Stack depth in units of StackType_t typically uint32_t on 32b CPUs */
                                      NULL,            /* We are not using the task parameter. */
                                      MAIN_TASK_PRI,   /* task priority, 0 is lowest priority, configMAX_PRIORITIES-1 is highest */
                                      gMainTaskStack,  /* pointer to stack base */
                                      &gMainTaskObj ); /* pointer to statically allocated task object memory */
        configASSERT(gMainTask != NULL);
    
        /* Start the scheduler to start the tasks executing. */
        vTaskStartScheduler();
    
        /* The following line should never be reached because vTaskStartScheduler()
        will only return if there was not enough FreeRTOS heap memory available to
        create the Idle and (if configured) Timer tasks.  Heap management, and
        techniques for trapping heap exhaustion, are described in the book text. */
        DebugP_assertNoLog(0);
    
        return 0;
    }
    
    
    void hello_world_main01(void *args)
    {
        const TickType_t delay_1s = pdMS_TO_TICKS(1000); 
        DebugP_log("h----\r\n");
        while (1) 
        {
            // DebugP_log("Hello World111!\r\n");
            uint32_t reg_mcu = *reg_mcu_ptr;
            uint32_t reg_wkup = *reg_wkup_ptr;
            uint32_t reg_temp = *reg_temp_ptr;
            DebugP_log(" %x %x %x %x\r\n", reg_mcu, reg_wkup, reg_temp, VTM_REGISTER);
            vTaskDelay(delay_1s);
        }
    }

  • Hello Han Tao,

    I have routed this thread to the right expert to comment on the above issue.

    Regards,

    Anil.

  • Hi Anshu:

    Have you reproduced the thermal reset on your side?

    I ran another 6000 test cycles over the weekend; the test log is attached. This time I could not reproduce the room-temperature thermal reset on my board. Could you please try to set up the same test environment on your side?

    So far, in more than ~12,000 test cycles, the issue has reproduced only once on the EVM board.

    Best Regards!

    Han Tao

    https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/791/onoff120000.7z

  • Hi Tao,

    There are a lot of components involved in this setup, and each of them is a variable that could influence this reset observation.

    From what I noted:

    • Are you using Linux SDK 10.0 or SDK 10.1? Please confirm.
    • Boot Method of OSPI and eMMC
    • There are changes to k3_j72xx_bandgap.c driver
    • Changes to the MCU Firmware

    We need to isolate which of these components could cause the reset.

    Can you confirm whether the SoC is actually resetting, or whether it is just the register reporting a thermal reset?

    Thanks,

    Anshu

  • Hi Tao,

    What does the _RST_SRC register return on a successful test cycle?

    Thanks,

    Anshu

  • Hi Anshu,

    Before this issue occurred, there were no changes to the k3_j72xx_bandgap.c driver or the MCU firmware; the original SDK10.1 reproduces the issue on the EVM as well. The issue is also not related to the boot mode, because we have reproduced it with SD card boot and with NOR flash boot.

    One correction: after the customer cleared the WKUP_VTM_MISC_CTRL[0] ANYMAXT_OUTRG_ALERT_EN bit, the customer says the abnormal reset is gone (confirmed over more than 30000 test cycles). The customer's earlier feedback was based on a misunderstanding.

    So Anshu, we need your help to understand why, when the temperature sensor value is ~48C (both the on-chip sensor and the sensor on the custom board report ~48C), the VTM still outputs THERM_MAXTEMP_OUTRANGE_ALERT and triggers the warm reset.

    What does the _RST_SRC register return on a successful test cycle?

    the value is 0.

    Are you using Linux SDK 10.0 or SDK 10.1? Please confirm.

    The customer is using SDK10.0 on both the EVM and the custom board, and the field team tested SDK10.1 on the EVM; all of them reproduce the issue.

    BR,

    Biao

  • Hi Anshu,

    The log below proves that a warm reset actually occurred, not just a change in the register value.

    warm reset log.txt.

    BR,

    Biao

  • Hi Anshu:

    Keeping the AM62P die temperature at ~100C does not appear to increase the probability of reproducing the issue. We ran for about 5 hours with the customer image and no thermal reset was reported. The problem is only triggered randomly, about once in ~3000 cycles.

    Best Regards!

    Han Tao

    This is the log printed by the EVM board, FYI.

    [BOOTLOADER PROFILE] App_loadSelfcoreImage : 4242us
    [BOOTLOADER_PROFILE] SBL Total Time Taken : 44668us

    Image loading done, switching to application ...
    Starting 2nd stage bootloader
    [BOOTLOADER_PROFILE] Boot Media : FLASH
    [BOOTLOADER_PROFILE] Boot Media Clock : 166.667 MHz
    [BOOTLOADER_PROFILE] Boot Image Size : 909 KB
    [BOOTLOADER_PROFILE] Cores present :
    hsm-m4f0-0
    mcu-r5f0-0
    a530-0
    [BOOTLOADER PROFILE] System_init : 2148us
    [BOOTLOADER PROFILE] Board_init : 2us
    [BOOTLOADER PROFILE] FreeRtosTask Create : 258us
    [BOOTLOADER PROFILE] SBL Drivers_open : 1010us
    [BOOTLOADER PROFILE] SBL Board_driversOpen : 130us
    [BOOTLOADER PROFILE] App_loadImages : 3692us
    [BOOTLOADER PROFILE] App_loadMCUImages : 6048us
    [BOOTLOADER PROFILE] App_loadLinuxstart /usr/bin/start-dra ...
    net.core.wmem_max = 4194304
    net.core.wmem_default = 1048576
    start_idrive ...
    start_idrive finish !!!
    start /usr/bin/start-dra done.
    _rpmsg_char_find_rproc: 79000000.r5f does not have any virtio devices!
    _rpmsg_char_find_rproc: 79000000.r5f does not have any virtio devices!

    _____ _____ _ _
    | _ |___ ___ ___ ___ | _ |___ ___ |_|___ ___| |_
    | | _| .'| . | . | | __| _| . | | | -_| _| _|
    |__|__|_| |__,|_ |___| |__| |_| |___|_| |___|___|_|
    |___| |___|

    root@Linux:~# cat /sys/class/thermal/thermal_zone*/temp
    98445
    99147
    100195
    root@Linux:~# cat /sys/class/thermal/thermal_zone*/temp
    101064
    101064
    100891
    root@Linux:~# cat /sys/class/thermal/thermal_zone*/temp
    102275
    103134
    102963
    root@Linux:~# cat /sys/class/thermal/thermal_zone*/temp
    103819
    103648
    103648
    root@Linux:~# cat /sys/class/thermal/thermal_zone*/temp
    103819
    105012
    104672
    root@Linux:~# cat /sys/class/thermal/thermal_zone*/temp
    104502
    105012
    104842
    root@Linux:~# cat /sys/class/thermal/thermal_zone*/temp
    105690
    105521
    105860
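
The readings in this log are in millidegrees Celsius, which is the unit the Linux thermal sysfs `temp` files use. As a rough illustration of how far these values sit from the 123C threshold quoted earlier in the thread, a sketch like the following could be used; the function and macro names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* 123C alert threshold quoted in the thread, in millidegrees Celsius.
 * The macro name is hypothetical. */
#define MAXT_ALERT_MILLI_C 123000L

/* Returns the hottest reading in a non-empty array of millidegree samples. */
static long hottest_milli_c(const long *samples, size_t count)
{
    assert(samples != NULL && count > 0);
    long hottest = samples[0];
    for (size_t i = 1; i < count; i++) {
        if (samples[i] > hottest)
            hottest = samples[i];
    }
    return hottest;
}

/* Remaining headroom (millidegrees C) between a reading and the threshold. */
static long headroom_milli_c(long reading_milli_c)
{
    return MAXT_ALERT_MILLI_C - reading_milli_c;
}
```

For the last three samples in the log (105690, 105521, 105860), the hottest value is 105860 and the headroom is 17140, so the die is still about 17C below the configured threshold.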

  • Hi Tao,

    Can you please provide a dump of these registers?

    • VTM_CFG2_CLK_CTRL Register (0x00B0_1008)
    • VTM_CFG2_MISC_CTRL (0x00B0_100C)
    • VTM_CFG2_MISC_CTRL2 (0x00B0_1010)
    • VTM_CFG2_SAMPLE_CTRL (0x00B0_1020)

    We can review these register outputs with the hardware team.

    If the VTM driver is unloaded before running these tests, what behavior occurs?

    Thanks,

    Anshu

  • Hi Anshu:

    0x00b01008 value is 0x14

    0x00b0100c value is 0x1

    0x00b01010 value is 0x28802F8

    0x00b01020 value is 0xAA9

    These are the values set by the customer software after Linux boots up.

    Best Regards!

    Han Tao


    root@Linux:~# devmem2 0x00b01008
    /dev/mem opened.
    Memory mapped at address 0xffff9edf0000.
    Read at address 0x00B01008 (0xffff9edf0008): 0x00000014
    root@Linux:~# devmem2 0x00b0100c
    /dev/mem opened.
    Memory mapped at address 0xffffa04c7000.
    Read at address 0x00B0100C (0xffffa04c700c): 0x00000001
    root@Linux:~# devmem2 0x00b01010
    /dev/mem opened.
    Memory mapped at address 0xffff9daf4000.
    Read at address 0x00B01010 (0xffff9daf4010): 0x028802F8
    root@Linux:~# devmem2 0x00b01020
    /dev/mem opened.
    Memory mapped at address 0xffffb741b000.
    Read at address 0x00B01020 (0xffffb741b020): 0x00000AA9
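
To read these dumps field by field, a generic bit-field extractor is enough. The sketch below is not taken from the SDK: the helper names are hypothetical, the [25:16] position and the bit 0 enable come from the first post, and treating bits [9:0] as a second 10-bit field is an assumption that should be checked against the TRM.

```c
#include <assert.h>
#include <stdint.h>

/* Extracts `width` bits starting at bit `lsb` from a 32-bit register value. */
static uint32_t reg_field(uint32_t value, unsigned lsb, unsigned width)
{
    assert(width >= 1u && width <= 32u && lsb + width <= 32u);
    uint32_t mask = (width == 32u) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (value >> lsb) & mask;
}

/* Field positions as cited in the thread; helper names are hypothetical. */
static uint32_t misc_ctrl2_bits_25_16(uint32_t v) { return reg_field(v, 16u, 10u); }
static uint32_t misc_ctrl2_bits_9_0(uint32_t v)   { return reg_field(v, 0u, 10u); }
static uint32_t misc_ctrl_bit_0(uint32_t v)       { return reg_field(v, 0u, 1u); }
```

Applied to the dumped MISC_CTRL2 value 0x028802F8, this gives 0x288 in bits [25:16] and 0x2F8 in bits [9:0]. The first post quotes 0x2f8 for the [25:16] field, so it may be worth confirming against the TRM which of the two fields actually holds the 123C code. The MISC_CTRL dump of 0x1 has bit 0 set.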

  • Hi Tao,

    The simplest way to unload the VTM module is to remove it from the device tree. Then recompile the device tree and test.

    diff --git a/arch/arm64/boot/dts/ti/k3-am62p.dtsi b/arch/arm64/boot/dts/ti/k3-am62p.dtsi
    index 809e1dbf2..9151fc9de 100644
    --- a/arch/arm64/boot/dts/ti/k3-am62p.dtsi
    +++ b/arch/arm64/boot/dts/ti/k3-am62p.dtsi
    @@ -137,7 +137,7 @@ dss1_vp1_clk: clock-divider-oldi-dss1 {
                    clock-mult = <1>;
            };
    
    -       #include "k3-am62p-j722s-common-thermal.dtsi"
    +       //#include "k3-am62p-j722s-common-thermal.dtsi"
     };
    
     /* Now include peripherals for each bus segment */
    

    I've run the default Linux SDK for about 100 iterations, but I didn't observe this behavior.

    For each cycle, how was the device power cycled? Was the PCB's power supply removed and inserted? Or is there another method?

    In the case that the device sits idle for a long period of time (meaning no cycling), will the device see a thermal reset?

    Thanks,

    Anshu
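
For context on how many iterations a reproduction attempt needs, the following sketch estimates the chance of seeing at least one failure in a given number of power cycles. It assumes every cycle fails independently with the same probability, which the thread does not establish; the function name is hypothetical.

```c
#include <assert.h>

/* Probability of seeing at least one failure in `cycles` independent power
 * cycles when each cycle fails with probability `p_fail`. Uses repeated
 * multiplication instead of pow() so no math library is needed. */
static double p_at_least_one(double p_fail, unsigned long cycles)
{
    assert(p_fail >= 0.0 && p_fail <= 1.0);
    double p_none = 1.0;
    for (unsigned long i = 0; i < cycles; i++)
        p_none *= (1.0 - p_fail);
    return 1.0 - p_none;
}
```

With the 1/3000 rate quoted earlier in the thread, `p_at_least_one(1.0 / 3000.0, 100)` comes out at roughly 3%, so a clean 100-iteration run says little either way. Under the same assumption, 30000 clean cycles would be very unlikely if the failure rate were unchanged.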

  • Hi Tao and team

    Anshu has shared how to disable the VTM by removing it from the device tree; hopefully that helps.

    We are trying to further understand:

    1. Does the reset still occur if the VTM reset/interrupt is masked?
    2. Does the reset happen close to boot/initialization time, or later while the program is running?
    3. Will the failure occur if you run the program for a long enough duration, or does it require power cycling in between to reproduce? It is unclear whether any other "glitches" are causing an issue.

    To read out the die ID, you can use the following e2e post, which shows how to read the register values:

    https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1436394/am67a-die-registers-of-the-am67a/5520800

    I want to summarize the observations so far; please correct and add as needed.

    1. The issue is present on 100% of the boards the customer has tested. The failure rate is ~1/3000?
    2. The issue can be reproduced on the EVM using the customer's test case.
    3. The issue seems to be sensitive to the software load/interaction; a simple test may not reproduce it.
    4. The reported temperature when the reset happens is ~55C, even though the reset is configured for ~123C.
    5. Raising the temperature of the EVM to ~100C by blowing hot air does not increase the frequency of failures/resets.

    Let us know if we missed anything.

  • Hi Anshu:

    You can use an Agilent E3634A power supply to power the AM62P EVM board on and off automatically.

    I have attached the Linux script for you. From a PC, you can control the Agilent E3634A power supply over its UART interface with the command below:

    # source poweronoff.txt

    This is the die ID captured from the customer and field EVM boards. Please check it.

    Best Regards!

    Han Tao

    Register (address)                        Customer Board #1   Customer side EVM   TI EVM
    WKUP_CTRL_MMR0_CFG0_DIE_ID0 (0x43000020)  0x0C8A0001          0x10520000          0x5F120000
    WKUP_CTRL_MMR0_CFG0_DIE_ID1 (0x43000024)  0x00000000          0x00000000          0x00000000
    WKUP_CTRL_MMR0_CFG0_DIE_ID2 (0x43000028)  0x0800FC69          0x08331FB6          0x0833E338
    WKUP_CTRL_MMR0_CFG0_DIE_ID3 (0x4300002C)  0x0601B016          0x01036ECD          0x01038918

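    #!/bin/bash

    # Cycle the E3634A power supply output over the serial port to power the EVM on and off.
    echo "Automatic power on off E3634A power supply:"

    # Query the instrument identity twice before starting.
    # The command is quoted so the shell does not glob-expand *IDN?.
    echo '*IDN?' >/dev/ttyUSB0
    sleep 1
    echo '*IDN?' >/dev/ttyUSB0
    sleep 1

    i=1

    while [[ $i -le 12000 ]] ; do
        echo "$i"

        # Power off for 5 seconds.
        echo 'Output off' >/dev/ttyUSB0
        echo '*IDN?' >/dev/ttyUSB0
        sleep 5

        (( i += 1 ))

        # Power on for 30 seconds so the board can boot into Linux.
        echo 'Output on' >/dev/ttyUSB0
        echo '*IDN?' >/dev/ttyUSB0
        sleep 30
    done

    #echo Output on >/dev/ttyUSB0
    #echo Output off >/dev/ttyUSB0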

  • Hi Tao,

    We're discussing the DIE ID results with the hardware team. I'm also working on a better setup to run iterative tests on my EVM.

    What is the result of the test when the VTM is removed? Please see my previous post for instructions on how to disable the VTM.

    I would like to see what the VTM's value is before a thermal reset. Can you please run this script upon booting?

    #!/bin/bash
    
    
    while true; do
            echo "========Thermal Zone Output========="
            cat /sys/class/thermal/thermal_zone*/temp
            echo "===================================="
            sleep 4
    done
    


    Can you share the A53 logs of a failed test cycle? Please include the shutdown logs and the next boot log with the RST_SRC output. Also please include the Thermal zone output in the logs. I would like to see if the Linux logs indicate any information.



    Thanks,

    Anshu

  • Hi Anshu:

    Thanks for helping us check the die ID information with the factory.

    Please check the log I sent last week. The AM62P die temperature was captured through minicom.

    This log is from the AM62P EVM board in the TI lab, tested with Tj raised to ~105C. Raising Tj to 105C does not accelerate reproduction of the issue.

    In more than 2000 test cycles I did not reproduce it.

    soclog.txt is the log captured on the customer board when the AM62P thermal reset was triggered. The reset happened about 4.7 seconds after the AM62P started.

    Please check it.

    Best Regards!

    Han Tao

    https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/791/100ctemp.7z
    热复位soclog.txt (thermal reset soclog.txt)

  • Hi Tao,

    Please share the latest device tree source file so we can review it with the software development team.

    Thanks,

    Anshu

  • Hi Tao,

    Please provide the output of these registers:

    • 0x00B01304
    • 0x00B01324
    • 0x00B01344

    These registers hold the VTM's Trim values (Gain and Offset). After you share this output, the hardware team can compare it against other AM62P samples to make sure the values are within reason.

    Thanks,

    Anshu

  • Hi Anshu,

    I don't think the trim values will help, because this is not a single-chip issue; it affects all chips. If the trim were wrong, the temperature sensor value would always be wrong, but here the sensor value is correct and the thermal reset is still triggered randomly. Anyway, please see the register values below from different boards:

    board1:

    root@am62pxx-evm:~# devmem2 0x00B01304
    /dev/mem opened.
    Memory mapped at address 0xffff9e874000.
    Read at address 0x00B01304 (0xffff9e874304): 0x0000080F
    root@am62pxx-evm:~# devmem2 0x00B01324
    /dev/mem opened.
    Memory mapped at address 0xffffba89f000.
    Read at address 0x00B01324 (0xffffba89f324): 0x0000080F
    root@am62pxx-evm:~# devmem2 0x00B01344
    /dev/mem opened.
    Memory mapped at address 0xffff95b2b000.
    Read at address 0x00B01344 (0xffff95b2b344): 0x0000080F

    board2:

    root@am62pxx-evm:~# devmem2 0x00B01304
    /dev/mem opened.
    Memory mapped at address 0xffff9e874000.
    Read at address 0x00B01304 (0xffff9e874304): 0x0000080F
    root@am62pxx-evm:~# devmem2 0x00B01324
    /dev/mem opened.
    Memory mapped at address 0xffffba89f000.
    Read at address 0x00B01324 (0xffffba89f324): 0x0000080F
    root@am62pxx-evm:~# devmem2 0x00B01344
    /dev/mem opened.
    Memory mapped at address 0xffff95b2b000.
    Read at address 0x00B01344 (0xffff95b2b344): 0x0000080F

    BR,

    Biao

  • Hi Anshu,

    customer board:

    root@Linux:~# devmem2 0x00B01304
    /dev/mem opened.
    Memory mapped at address 0xffffa017d000.
    Read at address 0x00B01304 (0xffffa017d304): 0x0000090F
    root@Linux:~# devmem2 0x00B01324
    /dev/mem opened.
    Memory mapped at address 0xffff93f21000.
    Read at address 0x00B01324 (0xffff93f21324): 0x0000090F
    root@Linux:~# devmem2 0x00B01344
    /dev/mem opened.
    Memory mapped at address 0xffffa40ab000.
    Read at address 0x00B01344 (0xffffa40ab344): 0x0000090F
    root@Linux:~#

    EVM board at the customer site:
    root@Linux:~# devmem2 0x00B01304
    /dev/mem opened.
    Memory mapped at address 0xffffb6fa4000.
    Read at address 0x00B01304 (0xffffb6fa4304): 0x0000090F
    root@Linux:~# devmem2 0x00B01324
    /dev/mem opened.
    Memory mapped at address 0xffffb85cf000.
    Read at address 0x00B01324 (0xffffb85cf324): 0x0000090F
    root@Linux:~# devmem2 0x00B01344
    /dev/mem opened.
    Memory mapped at address 0xffffa0df9000.
    Read at address 0x00B01344 (0xffffa0df9344): 0x0000090F
    root@Linux:~#
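
    When several boards are dumped this way, comparing the values by eye gets error-prone. The following is a minimal Python sketch, not a TI tool, that parses the `Read at address` lines printed by devmem2 and lists the addresses whose values differ between boards. The function names `parse_devmem2` and `compare_boards` are made up for illustration, and the sketch does not decode the gain/offset fields, because their bit layout is not given in this thread.

```python
from __future__ import annotations
import re

# Matches devmem2 lines such as:
#   Read at address 0x00B01304 (0xffff9e874304): 0x0000080F
READ_LINE = re.compile(
    r"Read at address\s+(0x[0-9A-Fa-f]+)\s+\(0x[0-9A-Fa-f]+\):\s+(0x[0-9A-Fa-f]+)"
)

def parse_devmem2(log: str) -> dict[int, int]:
    """Return {physical_address: value} for every 'Read at address' line."""
    return {int(addr, 16): int(val, 16) for addr, val in READ_LINE.findall(log)}

def compare_boards(dumps: dict[str, dict[int, int]]) -> dict[int, dict[str, int]]:
    """Return only the addresses whose value is not identical on all boards."""
    addresses = sorted({a for dump in dumps.values() for a in dump})
    diffs = {}
    for addr in addresses:
        values = {board: dump[addr] for board, dump in dumps.items() if addr in dump}
        if len(set(values.values())) > 1:
            diffs[addr] = values
    return diffs

# Values copied from the dumps posted in this thread.
ti_evm = parse_devmem2("""
Read at address 0x00B01304 (0xffff9e874304): 0x0000080F
Read at address 0x00B01324 (0xffffba89f324): 0x0000080F
Read at address 0x00B01344 (0xffff95b2b344): 0x0000080F
""")
customer = parse_devmem2("""
Read at address 0x00B01304 (0xffffa017d304): 0x0000090F
Read at address 0x00B01324 (0xffff93f21324): 0x0000090F
Read at address 0x00B01344 (0xffffa40ab344): 0x0000090F
""")

diffs = compare_boards({"ti_evm": ti_evm, "customer": customer})
for addr, values in diffs.items():
    print(hex(addr), {board: hex(v) for board, v in values.items()})
```

    With the values above, each board reports the same value for all three registers, while the two groups of boards differ from each other (0x080F versus 0x090F). Whether that difference matters is for the hardware team to judge.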

    BR,

    Biao

  • Hi Anshu:

    The EVM board runs the RDK10.1 U-Boot and filesystem, and I use the default dtb file.

    Below are the dtb used by the EVM board I have on hand and the decompiled dts file. The dtb is located at ti-processor-sdk-linux-am62pxx-evm-10.01.10.04/board-support/ti-linux-kernel-6.6.58+git-ti/arch/arm64/boot/dts/ti. I decompiled it with the dtc command for your reference:

    #dtc -I dtb -O dts  k3-am62p5-sk.dtb -o am62p-evm.dts

    Best Regards!

    Han Tao

    https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/791/am62p_2D00_evm.dts
    https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/791/2148.k3_2D00_am62p5_2D00_sk.dtb

  • Hi experts:

    Below are the power on/off waveform test results for the customer board. Please review them.

    The power-on start point is when the AM62P PMIC 3.3V power supply reaches 100% of its voltage.

    The power-off start point is when PMIC NRSTOUT changes to 0.

    Best Regards!

    han tao

    Power-up sequence

    | PMIC | Power net name | Test point name | Power on/off reference power rail | Test case | Result (ms) |
    |---|---|---|---|---|---|
    | VCCA, PVIN, EN/PB/VSENSE, GPIO4/VMON2/ADC_IN/nSLEEP1 | (SYS_3V3) | TD9 | Power on start point | Reference | - |
    | LDO2 | (DVDD_3V3)↑100%-SYS_3V3↑100% | TD16 | DVDD_3V3 compare with SYS_3V3 delay | Tdelay(ms) | 1.184 |
    | BUCK3 | (VDD_IO_1V8)↑0%-DVDD_3V3↑100% | TD13 | VDD_IO_1V8 compare with DVDD_3V3 delay | Tdelay(ms) | 1.066 |
    | LDO1 | (VDDA_1V8)↑0%-VDD_IO_1V8↑0% | TD15 | VDDA_1V8 compare with VDD_IO_1V8 delay time | Tdelay(ms) | -0.038763 |
    | BUCK4 | (VDD_DDR_1V1)↑0%-VDDA_1V8↑0% | TD14 | VDD_DDR_1V1 compare with VDDA_1V8 delay time | Tdelay(ms) | 2.347 |
    | BUCK1 | (VDD_CORE)↑0%-VDD_DDR_1V1↑0% | TD12 | VDD_CORE compare with VDD_DDR_1V1 delay time | Tdelay(ms) | 0.599353 |
    | LDO3 | (VCC_0V85)↑0%-VDD_CORE↑0% | TD17 | VCC_0V85 compare with VDD_CORE delay time | Tdelay(ms) | 0.547117 |
    | VMON1/nINT | (VDD_IO_3V3)↑0%-DVDD_3V3↑0% | TP30 | VDD_IO_3V3 compare with DVDD_3V3 delay time | Tdelay(ms) | 1.831 |
    | | NRSTOUT↑100%-VCC_0V85↑100% | R118 | NRSTOUT compare with VCC_0V85 delay time | Tdelay(ms) | 10.756 |

    Power-down sequence

    | PMIC | Power net name | Test point name | Power on/off reference power rail | Test case | Result (ms) |
    |---|---|---|---|---|---|
    | | NRSTOUT | R118 | Reference start point | Reference | - |
    | VMON1 | (VDD_IO_3V3)↓100%-NRSTOUT↓100% | TP30 | VDD_IO_3V3 compare with NRSTOUT delay time | Tdelay(ms) | -0.039628 |
    | LDO3 | (VCC_0V85)↓100%-VDD_IO_3V3↓100% | TD17 | VCC_0V85 compare with VDD_IO_3V3 delay time | Tdelay(ms) | 0 |
    | BUCK1 | (VDD_CORE)↓100%-VCC_0V85↓100% | TD12 | VDD_CORE compare with VCC_0V85 delay time | Tdelay(ms) | 0 |
    | BUCK4 | (VDD_DDR_1V1)↓100%-VDD_CORE↓100% | TD14 | VDD_DDR_1V1 compare with VDD_CORE delay time | Tdelay(ms) | 0 |
    | LDO1 | (VDDA_1V8)↓100%-VDD_DDR_1V1↓100% | TD15 | VDDA_1V8 compare with VDD_DDR_1V1 delay time | Tdelay(ms) | 0 |
    | BUCK3 | (VDD_IO_1V8)↓100%-VDDA_1V8↓100% | TD13 | VDD_IO_1V8 compare with VDDA_1V8 delay time | Tdelay(ms) | 0 |
    | LDO2 | (DVDD_3V3)↓100%-VDD_IO_1V8↓100% | TP16 | DVDD_3V3 compare with VDD_IO_1V8 delay time | Tdelay(ms) | 0 |
    | | SYS_3V3↓100%-DVDD_3V3↓100% | TD9 | SYS_3V3 compare with DVDD_3V3 delay time | Tdelay(ms) | 0 |

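
    To make the measurements above easier to review, here is a small Python sketch that flags every rail whose measured delay is negative, meaning the rail moved before the rail it is referenced to. The tuples are copied from the measurements above. The helper name `early_rails` is illustrative only, and the sketch does not encode the AM62P datasheet sequencing limits, so a flagged rail is something to look at, not a confirmed violation.

```python
# (rail, reference rail, measured delay in ms), copied from the measurements above.
POWER_UP = [
    ("DVDD_3V3", "SYS_3V3", 1.184),
    ("VDD_IO_1V8", "DVDD_3V3", 1.066),
    ("VDDA_1V8", "VDD_IO_1V8", -0.038763),
    ("VDD_DDR_1V1", "VDDA_1V8", 2.347),
    ("VDD_CORE", "VDD_DDR_1V1", 0.599353),
    ("VCC_0V85", "VDD_CORE", 0.547117),
    ("VDD_IO_3V3", "DVDD_3V3", 1.831),
    ("NRSTOUT", "VCC_0V85", 10.756),
]
POWER_DOWN = [
    ("VDD_IO_3V3", "NRSTOUT", -0.039628),
    ("VCC_0V85", "VDD_IO_3V3", 0.0),
    ("VDD_CORE", "VCC_0V85", 0.0),
    ("VDD_DDR_1V1", "VDD_CORE", 0.0),
    ("VDDA_1V8", "VDD_DDR_1V1", 0.0),
    ("VDD_IO_1V8", "VDDA_1V8", 0.0),
    ("DVDD_3V3", "VDD_IO_1V8", 0.0),
    ("SYS_3V3", "DVDD_3V3", 0.0),
]

def early_rails(rows):
    """Return the rows whose rail transitioned before its reference rail."""
    return [(rail, ref, delay) for rail, ref, delay in rows if delay < 0]

for name, rows in (("power-up", POWER_UP), ("power-down", POWER_DOWN)):
    for rail, ref, delay in early_rails(rows):
        print(f"{name}: {rail} leads {ref} by {abs(delay) * 1000:.1f} us")
```

    On this data the sketch flags two entries: VDDA_1V8 rising slightly before VDD_IO_1V8 at power-up, and VDD_IO_3V3 falling slightly before NRSTOUT at power-down.
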
  • Hi experts:

    Below are the modifications used to disable the MCU R5F (AM62P@0x79000000) in RDK10.1 and to print the reset source register in U-Boot, followed by the power on/off test case.

    The goal is to check, by disabling the R5F core, whether the MCU R5F is what triggers the thermal reset on the AM62P.
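
    For anyone post-processing the reset source values printed at U-Boot, a minimal Python sketch of the check could look like this. It relies only on what was stated earlier in this thread, namely that a value of 0x10 (bit 4 set) is reported as a thermal reset. The names `THERMAL_RESET_BIT`, `set_bits` and `is_thermal_reset` are illustrative, and the other bits are deliberately left undecoded because they are not described here.

```python
# Assumption taken from this thread: bit 4 of the reset source register
# (value 0x10) is the thermal reset flag. Other bits are not decoded.
THERMAL_RESET_BIT = 4

def set_bits(value: int) -> list[int]:
    """Return the positions of all bits set in a 32-bit register value."""
    return [bit for bit in range(32) if (value >> bit) & 1]

def is_thermal_reset(rst_src: int) -> bool:
    """True if the thermal reset flag (bit 4) is set in the register value."""
    return bool((rst_src >> THERMAL_RESET_BIT) & 1)

value = 0x10  # value reported for the abnormal reset in this thread
print(hex(value), "bits set:", set_bits(value), "thermal:", is_thermal_reset(value))
```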

    Disable method:

    1. Modify ./arch/arm64/boot/dts/ti/k3-am62p5-sk.dts in the RDK10.1 kernel folder to disable the R5F node.

    https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/791/k3_2D00_am62p5_2D00_sk_2D00_dts.patch

    2. Delete all R5F firmware contained in the filesystem.
       a. root@am62pxx-evm:/lib/firmware# rm am62p-mcu-r5f0_0-fw
          root@am62pxx-evm:/lib/firmware# rm am62p-mcu-r5f0_0-fw-sec
       b. usr/lib/opkg/alternatives# rm am62p-mcu-r5f0_0-fw
          usr/lib/opkg/alternatives# rm am62p-mcu-r5f0_0-fw-sec

    Then run the power-cycle case: power on for 30 seconds, power off for 10 seconds.

    Best Regards!

    Han Tao

  • Hi experts:

    Please check the logs below. It looks like disabling the MCU R5F does not resolve this problem.

    Best Regards!

    Han Tao

    https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/791/disable_2D00_mcu_2D00_r5f.7z

  • Hi teams:

    We want to reproduce the issue and check whether enabling the VTM driver triggers the thermal reset.

    Please set the kernel log level to 7 and provide the kernel boot log. We will use it to test on the EVM board again.

    Best Regards!

    Han Tao