TMS570LS0914: Unknown Program Failure during Operation

Javier Plaza

Part Number: TMS570LS0914
Other Parts Discussed in Thread: HALCOGEN

Hi Everyone!

First of all, this is my first post on the forum, so if this isn't the right place for it, please let me know and I will correct it. I'm not an expert, so thank you in advance for your time and help.

Some context:

I'm working on a device application based on the TMS570LS0914. The application uses a custom bootloader and FreeRTOS, which I added manually (HALCoGen has no FreeRTOS option for this microcontroller). The application also uses the FEE library to emulate EEPROM and save some data. These are the parts of the code I consider most likely to be at fault.

The problem I'm experiencing is that the microcontroller hangs after roughly 1 h of normal operation. The failure doesn't always occur at exactly the same time, but in most cases it happens around that point. I need help debugging in more detail to find the exact cause of the failure.

I have been debugging and trying to halt the debugger when I detect the system failure. In the first tests, I loaded the bootloader onto the microcontroller and then configured CCS not to erase it while loading the application image. Based on these tests, I suspect the failure is related to the FEE driver. Sometimes, when it fails, the system jumps to the data abort handler and then to dabort.asm. When I checked register R14 and subtracted 8 to find the cause, I saw that FEE functions were executing at the moment of the break. I have used the FEE driver in other projects with the same configuration: 2 virtual sectors of the same size and 1 or 2 data blocks. I write the data to the EEPROM when there is an update, and I call TI_Fee_MainFunction periodically from an OS task. This works as it should. I can save information to memory, turn off the device and recover it without any problem. If I modify the virtual sector and block configuration, the application sometimes seems to hang sooner.
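The R14 - 8 approach can be taken a step further. On a data abort, the Cortex-R4 also records the cause in the CP15 Data Fault Status Register (DFSR). For synchronous aborts, it also records the faulting address in the Data Fault Address Register (DFAR). The abort handler can read both registers with mrc p15, 0, r0, c5, c0, 0 (DFSR) and mrc p15, 0, r0, c6, c0, 0 (DFAR).

The sketch below only decodes a DFSR value that has already been read. It uses the ARMv7-R (PMSA) fault-status encoding. The function names are hypothetical and are not part of HALCoGen or the F021 library.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* FS field of the ARMv7-R DFSR: FS[4] is bit 10, FS[3:0] are bits 3..0. */
uint32_t dfsr_fault_status(uint32_t dfsr)
{
    return ((dfsr >> 6) & 0x10U) | (dfsr & 0x0FU);
}

/* WnR (bit 11): 1 if the aborted access was a write, 0 if it was a read. */
int dfsr_is_write(uint32_t dfsr)
{
    return (int)((dfsr >> 11) & 1U);
}

/* Human-readable name for the fault-status encodings used by PMSA cores. */
const char *dfsr_describe(uint32_t dfsr)
{
    switch (dfsr_fault_status(dfsr)) {
    case 0x00U: return "background fault (no MPU region)";
    case 0x01U: return "alignment fault";
    case 0x02U: return "debug event";
    case 0x08U: return "synchronous external abort";
    case 0x0DU: return "permission fault (MPU)";
    case 0x16U: return "asynchronous external abort";
    case 0x18U: return "asynchronous parity/ECC error";
    case 0x19U: return "synchronous parity/ECC error";
    default:    return "other or reserved encoding";
    }
}

/* For a data abort, LR_abt = address of the aborted instruction + 8.
 * This is only meaningful for synchronous (precise) aborts. */
uint32_t dabort_faulting_pc(uint32_t lr_abt)
{
    return lr_abt - 8U;
}
```

For example, a DFSR of 0x00000409 decodes to a synchronous parity/ECC error, which would support the ECC theory. If DFSR reports an asynchronous abort instead, R14 - 8 does not identify the instruction that caused the fault. That could explain why the location looks different between failures.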

On the other hand, the debugger sometimes ends up outside the application code (address 0x14C98) and hangs there. This address is not part of the application code, so I began to suspect the bootloader and some ECC self-test functions. I have read a lot of forum threads and checked for possible inconsistencies between the ECC test configuration of the bootloader and that of the application. I also read that a possible cause of the abort could be incorrect addressing in the bootloader's intvecs, which might not point to the correct application address when a data abort happens. However, today I tested the code starting at address 0x0 and it also hangs, so I think it has to be related to the FEE driver or the stack size. Unfortunately, every test takes at least 1 h of operation just to trigger the failure, so this method is not very efficient...

The last important thing to highlight is that after I power-cycle the microcontroller, it works perfectly again. So, if I enable the watchdog, the system is able to reset and continue operating. However, I need to understand the cause of the problem and fix it.

I have attached several files that could help in understanding the error (sys_intvecs, sys_link, sys_startup.c, ti_fee_cfg, ...) for both the bootloader and the application. I can share more information if needed. I also want to point out that I have very similar code (both bootloader and application) on the same microcontroller in another device with different functionality, and it works perfectly. It uses the same critical parts mentioned above. So I think I'm suffering from some overflow or stack problem that I'm not seeing, but I need a more advanced level of debugging to catch it and fix it.

Thank you for your time, I hope you can help me to move forward with this problem.

Best regards,

CodeFiles.rar

over 2 years ago

0 QJ Wang over 2 years ago

TI__Guru**** 199426 points

Hi JP,

In your application linker cmd file, the starting addresses of the VECTORS and FLASH1 ECC sections are not correct, and no ECC section is included for KERNEL.

Change the ECC sections to:

ECC_VEC (R) : origin=(0xf0400000 + (start(VECTORS) >> 3)) length=(size(VECTORS) >> 3) ECC={algorithm=algoL2R5F021, input_range=VECTORS}

ECC_KNL (R) : origin=(0xf0400000 + (start(KERNEL) >> 3)) length=(size(KERNEL) >> 3) ECC={algorithm=algoL2R5F021, input_range=KERNEL }

ECC_FLA1 (R) : origin=(0xf0400000 + (start(FLASH1) >> 3)) length=(size(FLASH1) >> 3) ECC={algorithm=algoL2R5F021, input_range=FLASH1 }
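The expressions above assume that ECC for flash bank 0 lives at 0xF0400000, with one ECC byte for every 64-bit (8-byte) flash word. The ECC address of a flash address is therefore 0xF0400000 + (address >> 3), and the ECC length is size >> 3.

The helper below is a sketch that only reproduces that arithmetic; the names are hypothetical. It shows two things:

- An ECC origin of 0xF0400000 describes flash address 0x0 (the bootloader's vectors), not VECTORS at 0x00020100, whose ECC starts at 0xF0404020.
- An ECC origin of 0xF0440000 corresponds to flash address 0x00200000, not to FLASH1.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption (taken from the linker lines above): bank-0 ECC space starts at
 * 0xF0400000 and holds one ECC byte per 64-bit (8-byte) flash word. */
#define ECC_BASE 0xF0400000UL

typedef struct {
    uint32_t origin;   /* first ECC byte address */
    uint32_t length;   /* number of ECC bytes */
} region_t;

/* Mirrors origin=(0xf0400000 + (start >> 3)) length=(size >> 3). */
region_t ecc_region_for(uint32_t flash_start, uint32_t flash_size)
{
    region_t ecc;
    /* ECC covers whole 64-bit words, so both values should be 8-byte aligned. */
    assert((flash_start % 8U) == 0U && (flash_size % 8U) == 0U);
    ecc.origin = (uint32_t)ECC_BASE + (flash_start >> 3);
    ecc.length = flash_size >> 3;
    return ecc;
}
```

The same arithmetic gives 0xF0402993 as the ECC byte for the 64-bit word at 0x00014C98, the address where execution was seen hanging.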

0 QJ Wang over 2 years ago in reply to QJ Wang

TI__Guru**** 199426 points

Hi JP,

I don't recommend generating the application's ECC in the linker cmd file. The application ECC can be calculated by the F021 Flash API when the application code is programmed to flash.

0 Javier Plaza over 2 years ago in reply to QJ Wang

Prodigy 30 points

Hi QJ Wang,

Thank you for your fast response.

I will try changing the addresses and come back to you with an update. I'm not quite sure how these sections are calculated. Could you explain it to me?

Regarding the ECC calculation with the Fapi API, where should I do it? What would the steps be?

Finally, to add some more information, the code has the following lines above the main code:

    esmREG->SR1[2] = 0x00000008U; 
    esmREG->SSR2 = 0x00000008U;
    esmREG->EKR = 0x0000000A;
    esmREG->EKR = 0x00000005;

These were added by my colleague, and I would like to clarify their purpose and confirm whether they should be there.

Thank you again for your help and time.

Best regards

0 QJ Wang over 2 years ago in reply to Javier Plaza

TI__Guru**** 199426 points

Javier Plaza said:
Regarding the ECC calculation with the Fapi API, where should I do it? Which would be the steps?

Please refer to the example used in the TI bootloaders:

7360.bl_flash.c

//*****************************************************************************
//
// bl_flash.c     : The file holds the main control loop of the boot loader.
// Author         : QJ Wang. qjwang@ti.com
// Date           : 9-19-2012
//
// Copyright (c) 2006-2011 Texas Instruments Incorporated.  All rights reserved.
// Software License Agreement
//
// Texas Instruments (TI) is supplying this software for use solely and
// exclusively on TI's microcontroller products. The software is owned by
// TI and/or its suppliers, and is protected under applicable copyright
// laws. You may not combine this software with "viral" open-source
// software in order to form a larger program.
//
// THIS SOFTWARE IS PROVIDED "AS IS" AND WITH ALL FAULTS.
// NO WARRANTIES, WHETHER EXPRESS, IMPLIED OR STATUTORY, INCLUDING, BUT
// NOT LIMITED TO, IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
// A PARTICULAR PURPOSE APPLY TO THIS SOFTWARE. TI SHALL NOT, UNDER ANY
// CIRCUMSTANCES, BE LIABLE FOR SPECIAL, INCIDENTAL, OR CONSEQUENTIAL
// DAMAGES, FOR ANY REASON WHATSOEVER.
//
//*****************************************************************************

#include "bl_config.h"
#include "bl_flash.h"
#include "F021.h"
#include "flash_defines.h"

//#define Freq_In_MHz = SYS_CLK_FREQ;

//*****************************************************************************
//
// Returns the size of the first flash sector in bytes (computed as
// start + length, which assumes the first sector starts at address 0).
//
//*****************************************************************************
uint32_t
BLInternalFlashFirstSectorSizeGet(void)
{
	uint32_t firstSectorSize;
	firstSectorSize = (uint32_t)(flash_sector[0].start) + flash_sector[0].length;
    return (firstSectorSize);
}
//*****************************************************************************
//
// Returns the size of the internal flash in bytes.
//
// This function returns the total number of bytes of internal flash in the
// current part.  No adjustment is made for any sections reserved via
// options defined in bl_config.h.
//
// \return Returns the total number of bytes of internal flash.
//
//*****************************************************************************
uint32_t
BLInternalFlashSizeGet(void)
{
	uint32_t flashSize;
	flashSize = (uint32_t)flash_sector[NUMBEROFSECTORS-1].start + flash_sector[NUMBEROFSECTORS-1].length;
    return (flashSize);
}

//*****************************************************************************
//
//! Checks whether a given start address is valid for a download.
//!
//! This function checks to determine whether the given address is a valid
//! download image start address given the options defined in bl_config.h.
//!
//! \return Returns non-zero if the address is valid or 0 otherwise.
//
//*****************************************************************************
uint32_t
BLInternalFlashStartAddrCheck(uint32_t ulAddr, uint32_t ulImgSize)
{
    uint32_t count=0, i;

	uint32_t ulWholeFlashSize;

    //
    // Determine the size of the flash available on the part in use.
    //
    ulWholeFlashSize = (uint32_t)flash_sector[NUMBEROFSECTORS-1].start + flash_sector[NUMBEROFSECTORS-1].length;  /* 3MB */

	/* The start address must lie inside one of the flash sectors */
    for (i = 0; i < NUMBEROFSECTORS; i++){
		if ((ulAddr >= (uint32_t)(flash_sector[i].start)) && (ulAddr < ((uint32_t)flash_sector[i].start + flash_sector[i].length)))
		{
			count++;
		}
	}
    if (count == 0){
    	return(0);
    }

    //
    // Is the address we were passed a valid start address?  We allow:
    //
    // 1. Address 0 if configured to update the boot loader.
    // 2. The start of the reserved block if parameter space is reserved (to
    //    allow a download of the parameter block contents).
    // 3. The application start address specified in bl_config.h.
    //
    // The function fails if the address is not one of these, if the image
    // size is larger than the available space or if the address is not word
    // aligned.
    //
    if((
#ifdef ENABLE_BL_UPDATE
                       (ulAddr != 0) &&
#endif
                        (ulAddr != APP_START_ADDRESS)) ||
                       ((ulAddr + ulImgSize) > ulWholeFlashSize) ||
                       ((ulAddr & 3) != 0))
    {
    	return(0);
    }
    else  {
        return(1);
    }
}


uint32_t Fapi_BlockErase( uint32_t Bank, uint32_t ulAddr, uint32_t Size)
{
	uint8_t  i=0, ucStartBank, ucEndBank, ucStartSector, ucEndSector;
    uint32_t EndAddr, status;

	EndAddr = ulAddr + Size;
	for (i = 0; i < NUMBEROFSECTORS; i++){
		if ((ulAddr >= (uint32_t)(flash_sector[i].start)) && (ulAddr < ((uint32_t)flash_sector[i].start + flash_sector[i].length)))
		{
			ucStartBank     = flash_sector[i].bankNumber;
		    ucStartSector   = i;
		    break;
		}
	}

	for (i = ucStartSector; i < NUMBEROFSECTORS; i++){
		if (EndAddr <= (((uint32_t)flash_sector[i].start) + flash_sector[i].length))
		{
			ucEndBank   = flash_sector[i].bankNumber;
			ucEndSector = i;
		    break;
		}
	}

	status=Fapi_initializeFlashBanks((uint32_t)SYS_CLK_FREQ); /* used for API Rev2.01 */

    for (i = ucStartBank; i < (ucEndBank + 1); i++){
        Fapi_setActiveFlashBank((Fapi_FlashBankType)i);
        Fapi_enableMainBankSectors(0xFFFF);                 /* used for API 2.01*/
        while( FAPI_CHECK_FSM_READY_BUSY != Fapi_Status_FsmReady );
    }

    for (i=ucStartSector; i<(ucEndSector+1); i++){
		Fapi_issueAsyncCommandWithAddress(Fapi_EraseSector, flash_sector[i].start);
    	while( FAPI_CHECK_FSM_READY_BUSY == Fapi_Status_FsmBusy );
    	while(FAPI_GET_FSM_STATUS != Fapi_Status_Success);
    }

    status =  Flash_Erase_Check((uint32_t)ulAddr, Size);

	return (status);
}

// Bank is passed to Fapi_setActiveFlashBank(), so it must be the bank that
// contains Flash_Address.
uint32_t Fapi_BlockProgram( uint32_t Bank, uint32_t Flash_Address, uint32_t Data_Address, uint32_t SizeInBytes)
{
	register uint32_t src = Data_Address;
	register uint32_t dst = Flash_Address;
	uint32_t bytes;

	if (SizeInBytes < 16)
		bytes = SizeInBytes;
	else
		bytes = 16;

	if ((Fapi_initializeFlashBanks((uint32_t)SYS_CLK_FREQ)) == Fapi_Status_Success){
		 (void)Fapi_setActiveFlashBank((Fapi_FlashBankType)Bank);
	     (void)Fapi_enableMainBankSectors(0xFFFF);                    /* used for API 2.01*/
	}else {
         return (1);
	}

	while( FAPI_CHECK_FSM_READY_BUSY != Fapi_Status_FsmReady );
	while( FAPI_GET_FSM_STATUS != Fapi_Status_Success );

    while( SizeInBytes > 0)
	{
		Fapi_issueProgrammingCommand((uint32_t *)dst,
									 (uint8_t *)src,
									 (uint32_t) bytes,
									 0,
									 0,
									 Fapi_AutoEccGeneration);

 		while( FAPI_CHECK_FSM_READY_BUSY == Fapi_Status_FsmBusy );
//        while(FAPI_GET_FSM_STATUS != Fapi_Status_Success);

		src += bytes;
		dst += bytes;
		SizeInBytes -= bytes;
        if ( SizeInBytes < 16){
           bytes = SizeInBytes;
        }
    }
	return (0);
}


uint32_t Fapi_UpdateStatusProgram( uint32_t Bank, uint32_t Flash_Start_Address, uint32_t Data_Start_Address, uint32_t Size_In_Bytes)
{
	register uint32_t src = Data_Start_Address;
	register uint32_t dst = Flash_Start_Address;
	unsigned int bytes, status;

	if (Size_In_Bytes < 16)
		bytes = Size_In_Bytes;
	else
		bytes = 16;

	Fapi_initializeAPI((Fapi_FmcRegistersType *)F021_CPU0_REGISTER_ADDRESS, (uint32_t)SYS_CLK_FREQ);
	Fapi_setActiveFlashBank((Fapi_FlashBankType)Bank);
	Fapi_issueProgrammingCommand((uint32_t *)dst,
									 (uint8_t *)src,
									 (uint32_t) bytes,   //8,
									 0,
									 0,
									 Fapi_AutoEccGeneration);

 	while( Fapi_checkFsmForReady() == Fapi_Status_FsmBusy );
	status =  Flash_Program_Check(Flash_Start_Address, Data_Start_Address, Size_In_Bytes);
	return (status);
}



uint32_t Flash_Program_Check(uint32_t Program_Start_Address, uint32_t Source_Start_Address, uint32_t No_Of_Bytes)
{
	register uint32_t *src1 = (uint32_t *) Source_Start_Address;
	register uint32_t *dst1 = (uint32_t *) Program_Start_Address;
	register uint32_t bytes = No_Of_Bytes;

	while(bytes >= 4)	/* avoids an unsigned wrap if No_Of_Bytes is not a multiple of 4 */
	{	
		if(*dst1++ != *src1++)
			return (1);   //error

		bytes -= 0x4;
	}
	return(0);
}	


uint32_t Flash_Erase_Check(uint32_t Start_Address, uint32_t Bytes)
{
	uint32_t error=0;
	register uint32_t *dst1 = (uint32_t *) Start_Address;
	register uint32_t bytes = Bytes;

	while(bytes >= 4)	/* avoids an unsigned wrap if Bytes is not a multiple of 4 */
	{	
		if(*dst1++ != 0xFFFFFFFF){
			error = 2;
		}
		bytes -= 0x4;
	}
	return(error);
}



uint32_t Fapi_BlockRead( uint32_t Bank, uint32_t Flash_Start_Address, uint32_t Data_Start_Address, uint32_t Size_In_Bytes)
{
	register uint32_t src = Data_Start_Address;
	register uint32_t dst = Flash_Start_Address;
	register uint32_t bytes_remain = Size_In_Bytes;
	int bytes;

	if (Size_In_Bytes < 16)
		bytes = Size_In_Bytes;
	else
		bytes = 16;
	Fapi_initializeAPI((Fapi_FmcRegistersType *)F021_CPU0_REGISTER_ADDRESS, (uint32_t)SYS_CLK_FREQ);

 	while( bytes_remain > 0)
	{
		Fapi_doMarginReadByByte((uint8_t *)src,
								(uint8_t *)dst,
								(uint32_t) bytes,                //16
								Fapi_NormalRead);
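		src += bytes;
		dst += bytes;
		bytes_remain -= bytes;
		/* shrink the last chunk so bytes_remain cannot wrap below zero */
		if (bytes_remain < 16){
			bytes = bytes_remain;
		}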
    }
	return (0);
}

Javier Plaza said:
These were added for my colleague and I would like to clarify what is their goal and corroborate if they should have to be there.

I think those instructions, which clear the ESM error status, were added for debugging purposes.

0 Javier Plaza over 2 years ago in reply to QJ Wang

Prodigy 30 points

Hi QJ Wang,

Thank you for your quick response, and sorry for the delay.

I will read more about ECC programming through the Fapi API and come back to you if I have new questions.

For now, I have changed the linker cmd file following your instructions. The program now runs longer without getting stuck (about 4 hours). However, when I debug the code, the debugger loses its connection with the microcontroller before the failure occurs. So I can't confirm exactly what is happening or whether the failure is the same (it seems to be). The debugger reports that it has lost the connection with the Cortex core, but the device keeps running until it gets stuck.

Any ideas of what could be happening?

Could it be related to the stack size?

Does it make sense to write those ESM registers cyclically to clear the error status continuously? Could that be why the debugger stops working?

On the other hand, I use an operating system function to keep track of the operating time. This function should run every 1 s. However, I noticed that the microcontroller counts faster than expected, so maybe my clock configuration is wrong. I will check and come back with news.
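One way to quantify "counting faster than expected" is to compare OS time with real time. Suppose the OS tick reload value is computed from an assumed clock constant (in many FreeRTOS ports this involves configCPU_CLOCK_HZ and configTICK_RATE_HZ), but the timer actually runs from a different clock. Then OS time advances by actual/assumed seconds per real second.

The hypothetical helper below estimates the real timer clock from a measured drift. It is a sketch that assumes the drift comes only from that clock mismatch.

```c
#include <assert.h>
#include <stdint.h>

/* OS seconds per real second = actual_hz / assumed_hz, so
 * actual_hz = assumed_hz * os_seconds / real_seconds.
 * 64-bit intermediate avoids overflow of assumed_hz * os_seconds. */
uint32_t estimate_actual_clock_hz(uint32_t assumed_hz,
                                  uint32_t os_seconds,
                                  uint32_t real_seconds)
{
    assert(real_seconds != 0U);
    return (uint32_t)(((uint64_t)assumed_hz * os_seconds) / real_seconds);
}
```

For instance, if the configuration assumes 50 MHz and the OS counts 90 s while a stopwatch measures 60 s, the timer is probably being clocked at about 75 MHz.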

Thanks again for your time and help, it is very much appreciated.

Best regards,

0 QJ Wang over 2 years ago in reply to Javier Plaza

TI__Guru**** 199426 points

It might be caused by an insufficient stack size, or by a speculative fetch from a location with invalid ECC.

Your bootloader programs the ECC values for the entire flash memory space. When the bootloader programs the application to flash, it uses the F021 Flash API to erase several flash sectors and then loads the application image into them. The unused space in those erased sectors is left with invalid ECC values.

You can use the linker cmd file to generate ECC for the erased sectors:

FLASH_CODE (RX) : origin=0x00200040 length=0x8000 - 0x40 fill=0xFFFFFFFF /*assume the application size < 0x8000*/
FLASH0 (RX) : origin=0x00028000 length=0x00200000 - 0x28000

0 Javier Plaza over 2 years ago in reply to QJ Wang

Prodigy 30 points

Hi QJ Wang,

Thank you for the quick reply.

I understand the problem, and it makes sense to me. However, I'm not sure how to apply that modification to my code. Should I modify the link.cmd of the bootloader or the one of the application? Also, I don't fully understand why the origin is 0x00200040. In the application code the vectors start at 0x00020100, and in the bootloader FLASH1 starts at 0x0002000, so why this address? Could you apply the required modification using my code as an example? Sorry, I'm a beginner and know very little about link.cmd files and ECC.

Also, I would like to understand why this happens with this code and not on other devices. I have exactly the same configuration and OS on other devices with the same microcontroller and more complex functions, and I'm not facing this kind of issue...

In the meantime, I will try reserving more space for the stack to check whether it could be the cause of the failure.

Thanks again.

Best regards

0 Javier Plaza over 2 years ago in reply to Javier Plaza

Prodigy 30 points

Update:

I have changed the clock configuration of the operating system, and the timing is now correct.

I have also increased the stack size of one of the most stack-hungry OS tasks, and the stack size of the whole application (from 1500 to 2500). It seems to have no effect: the code still stops after around 1 h of operation. Now I can debug again. I checked, and the code got stuck twice at address 0x14C98 (outside the application code), and the other times in dabort.asm and then at the corresponding 0x10 address.

Does that add any more information?

I think the best idea is to modify the unused flash space as you said. However, it is really strange to me that this failure only happens on this device. As I explained, I have exactly the same memory and FEE configuration on other devices that have no problem. If the unused flash space were the cause, shouldn't it happen on the other microcontrollers too?

Best regards,

0 QJ Wang over 2 years ago in reply to Javier Plaza

TI__Guru**** 199426 points

Javier Plaza said:
I don't fully understand why the origin is 0x00200040

I assumed that the application is loaded starting at 0x200020.

0 QJ Wang over 2 years ago in reply to Javier Plaza

TI__Guru**** 199426 points

Javier Plaza said:
in the application code, the vectors start at 0x00020100, and in the bootloader, the Flash 1 starts at 0x0002000, so why this address? Could you apply the required modification using my code as an example?

Javier Plaza said:
so I checked and the code has stacked two times in the address 0x14C98 (outside code),

Your application is located at 0x20100, but the fault address is 0x14C98. So the fault is caused by the bootloader, right? Does the content at 0x14C98 have a correct ECC value?

0 Javier Plaza over 2 years ago in reply to QJ Wang

Prodigy 30 points

Hi QJ Wang,

Thank you for the information.

I have changed the .cmd file and added fill=0xFFFFFFFF. It now looks like this:

    VECTORS (X)  : origin=0x00020100 length=0x00000020
    //KERNEL  (RX) : origin=0x00020120 length=0x00008000
    KERNEL  (RX) : origin=0x00020120 length=0x00008000 - 0x120 fill=0xFFFFFFFF
    FLASH1  (RX) : origin=0x00028120 length=0x100000-0x20000 -0x8000 -0x20-0x100
    STACKS  (RW) : origin=0x08000000 length=0x00002000
    KRAM    (RW) : origin=0x08002000 length=0x00000800
    RAM     (RW) : origin=(0x08002000+0x800) length=(0x20000 - 0x2000-0x800)

    /*STACKS  (RW) : origin=0x08000000 length=0x00002000
    KRAM    (RW) : origin=0x08002000 length=0x00000800
    RAM     (RW) : origin=(0x08002000+0x800) length=(0x20000 - 0x800-0x800)*/

    //ECC_VEC (R)  : origin=0xf0400000 length=0x4 ECC={algorithm=algoL2R5F021, input_range=VECTORS }
    //ECC_FLA1 (R) : origin=0xf0440000 length=0x40000 ECC={algorithm=algoL2R5F021, input_range=FLASH1 }

    ECC_VEC (R)   : origin=(0xf0400000 + (start(VECTORS) >> 3))  length=(size(VECTORS) >> 3)  ECC={algorithm=algoL2R5F021, input_range=VECTORS}

    ECC_KNL (R)   : origin=(0xf0400000 + (start(KERNEL) >> 3))     length=(size(KERNEL) >> 3)     ECC={algorithm=algoL2R5F021, input_range=KERNEL }

    ECC_FLA1 (R) : origin=(0xf0400000 + (start(FLASH1) >> 3))      length=(size(FLASH1) >> 3)      ECC={algorithm=algoL2R5F021, input_range=FLASH1 }
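The ECC lines only cover address ranges the linker knows about, so it can help to check where each flash region ends. The sketch below copies the region values from the memory map above; the struct and function names are hypothetical.

If I have evaluated the expressions correctly, the modified KERNEL length (0x8000 - 0x120) makes KERNEL end at 0x00028000, while FLASH1 starts at 0x00028120. That leaves a 0x120-byte hole for which these lines generate no ECC. With the commented-out length of 0x8000, the two regions were contiguous.

```c
#include <assert.h>
#include <stdint.h>

/* One MEMORY entry from the linker command file. */
typedef struct {
    const char *name;
    uint32_t origin;
    uint32_t length;
} mem_region_t;

/* Bytes between the end of region a and the start of region b
 * (0 means contiguous). Assumes b does not start before a ends. */
uint32_t gap_between(const mem_region_t *a, const mem_region_t *b)
{
    uint32_t a_end = a->origin + a->length;   /* first address after a */
    assert(b->origin >= a_end);
    return b->origin - a_end;
}
```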

I have also added the following lines to read the problematic memory address. I can read the data, and the result is Success. I am sharing a screenshot of the memory browser at that location. This is the code I used:

            Fapi_FlashReadMarginModeType a = Fapi_NormalRead;
            uint32_t addr = 0x00014C98;
            Fapi_StatusType result = Fapi_doMarginRead((uint32_t*) addr ,(uint32_t*) &buffer_rx, 1, a);

However, when I ran the new code with the debugger, I received the following error in the CCS console after around 30-45 minutes of operation, although the code was still running:

"IcePick: Error: (Error -150 @ 0x0) One of the FTDI driver functions used during configuration returned a invalid status or an error. (Emulation package 9.9.0.00040)
Dap: Error: (Error -154 @ 0x0) One of the FTDI driver functions used to write data returned bad status or an error. (Emulation package 9.9.0.00040)
IcePick: Unable to determine target status after 20 attempts
IcePick: Failed to remove the debug state from the target before disconnecting. There may still be breakpoint op-codes embedded in program memory. It is recommended that you reset the emulator before you connect and reload your program before you continue debugging
CortexR4: JTAG Communication Error: (Error -2063 @ 0x0) Unable to reset device. Power-cycle the board. If error persists, confirm configuration and/or try more reliable JTAG settings (e.g. lower TCLK). (Emulation package 9.9.0.00040)
CortexR4: Failed to remove the debug state from the target before disconnecting. There may still be breakpoint op-codes embedded in program memory. It is recommended that you reset the emulator before you connect and reload your program before you continue debugging"

It seems that the microcontroller loses its connection with the debugger. This failure happens very often. Do you have any idea what could be happening? Could it be hardware-related?

I have run the code with the new modifications, and this time it hung after 3 h of operation. When I reset the device after the hang and turned it on again (without erasing or modifying memory), it started in the bootloader but didn't jump to the application by itself, although it should go directly to the application code. So I think the link between the application code and the bootloader may not be correct. It seems that a data abort occurs, execution jumps into the bootloader, and the program hangs there without being able to continue with the application or relaunch it.

In the first post, I shared a few files: the bootloader and application .cmd files and the sys_startup file. I would like to know whether I should add or modify anything there to link both programs correctly. I don't think it is necessary, because these configurations have always worked for me. For example, the ECC configuration and self-tests are more exhaustive during the bootloader startup but simpler in the application code. Also, in one post I read this:

"Enable ESRAM ECC Check" will add checkRAMECC(); to sys_startup.c file. The details of the function is given below. So there will be a deliberate data abort caused which requires _dabort function in dabort.asm.. In your case since the booloader's intvecs is loaded to address 0, which does not have branch to _dabort routine at data abort since it is part of application code. Either you remove Enable ESRAM ECC Check from application or find a way to branch to _dabort during data abort."

I'm sharing the link to that post because it discusses ECC, data aborts and the ESM, which are the parts of this code I understand least, so I think it could be useful. However, I tried many of the things proposed there and couldn't fix the problem.

Another thing: could you please explain how I could detect whether this is a stack overflow problem? When I change the stack size, the program sometimes hangs later or sooner than with the normal size (0x1500), so it is probably related to the problem. I can check the stack usage of every function, but I don't know how to see the real usage at run time or detect whether the failure is caused by it.

Finally, I have one last suspicion: if it were a hardware failure (faulty memory locations), would that explain all these different errors? I suspect this because if I change the virtual sectors I use and the block size for the EEPROM emulation, the time the program is able to run changes. So maybe some memory location is corrupted and the program doesn't fail until it reaches that point... At this moment I don't know what else it could be. (In the past, we had to replace the microcontroller on this device because an SPI pin had broken.)

Sorry for the long message, but I think it is better to share as much information as possible. Thank you for your patience and help.

Best regards,

0 Javier Plaza over 2 years ago in reply to Javier Plaza

Prodigy 30 points

Update: I have finished a new cycle of operation. This time it lasted 2 h and the debugger didn't lose the connection, so I was able to halt the program and debug the problem a little more. Here is exactly what the program does, in case it gives you more information:

First, when I halt the debugger after the program hangs, it is at the _dabort line in sys_intvecs.asm. After that, it goes to dabort.asm.

Second, the code steps through all the instructions in _dabort and then in noRAMerror (during this instruction, the code jumps to the custom_dabort handler).

The next step is the following screen:

It seems the debugger can't find the library source. I understand this could be because the library is loaded into memory by the bootloader, or something like that (see the bootloader's sys_link.cmd for details). Yesterday I saw the same failure when I ran the application from address 0x0 without the bootloader (to rule the bootloader out). When I saw the error, I thought it happened because the bootloader wasn't present and the file couldn't be loaded... However, now I'm using the bootloader configuration and seeing the same error...

Then, the code jumps to address 0x10 (the data abort vector), and if I continue execution, the whole process repeats.

I hope this gives you some more useful information.

0 Javier Plaza over 2 years ago in reply to Javier Plaza

Prodigy 30 points

Final update: I have been reviewing the build and debugger configuration. I will post some screenshots so you can check whether it is correct.

Heap and stack size: should I increase them here, or only in the .cmd file?

ARM linker file search options: I have included the F021 API library. However, if I go to its path on my computer, there are no .c files in the source folder. I suppose they are inside the library.

In the ARM linker output options I had written 2 lines. I don't know why they are there, so I would like to confirm that the problem isn't here...

Finally, I don't have any path set in the Debug/Source Lookup Path settings. Should I add the F021 library there too, to avoid the last problem? Note that the file Blanck_Check.c does not exist anywhere on my computer.

Thank you again for your time and help.

Best regards.

0 QJ Wang over 2 years ago in reply to Javier Plaza

TI__Guru**** 199426 points

Hi Javier,

The stack memory is typically used in the following constructs:

1. On function calls to save register content (such as the link register (LR) for the return address)

2. Local function variables are stored on the stack when no CPU registers are available.

3. During interrupt service routine (ISR) execution, the registers are stored on the stack.

You should analyze the stack usage to determine how large the stack should be for your application.

The Stack Usage View in Code Composer Studio (available in CCS 6.2 and higher) provides a static view of stack usage for your application. The information is generated on project build and displayed as a function call tree with stack usages for each function in a horizontal bar graph.
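The Stack Usage View is a static analysis, so it typically cannot account for interrupt nesting, recursion, or calls through function pointers. A common run-time complement is stack painting:

1. Fill a stack area with a known pattern before it is used.
2. Let the system run.
3. Scan for the deepest word that was overwritten.

Below is a minimal sketch of that technique. It assumes a downward-growing stack (as on the Cortex-R4) and a hypothetical fill value, and it is not part of HALCoGen.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xA5A5A5A5UL  /* hypothetical fill pattern */

/* Fill a stack area that is not currently in use with the pattern.
 * 'words' is the stack size in 32-bit words; base is its lowest address. */
void stack_paint(uint32_t *base, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        base[i] = (uint32_t)STACK_FILL;
    }
}

/* The stack grows down from base + words toward base, so the lowest words
 * that still hold the pattern were never touched. Returns the peak number
 * of bytes used; a result equal to the full size means the bottom word was
 * overwritten, i.e. the stack overflowed or came very close. */
size_t stack_peak_usage_bytes(const uint32_t *base, size_t words)
{
    size_t untouched = 0;
    while (untouched < words && base[untouched] == (uint32_t)STACK_FILL) {
        untouched++;
    }
    return (words - untouched) * sizeof(uint32_t);
}
```

Paint only stacks that are not in use at that moment, for example the IRQ, ABT and UND mode stacks before interrupts are enabled. Then read the result from the debugger after a long run. A used word that happens to equal the pattern makes the result slightly optimistic.

FreeRTOS task stacks are handled by the kernel itself:
- uxTaskGetStackHighWaterMark() (enabled with INCLUDE_uxTaskGetStackHighWaterMark) reports the minimum free stack a task has had.
- configCHECK_FOR_STACK_OVERFLOW enables the vApplicationStackOverflowHook() callback.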

0 Javier Plaza over 2 years ago in reply to QJ Wang

Prodigy 30 points

Hi QJ Wang,

Thank you for the information. I have already used that view (Stack Usage). In it, I can see a function that uses 100% of its inclusive stack (2444 units). This function is called very often by the OS and calls another function that uses "sprintf", which requires a lot of stack. However, I don't think the problem is here, because I have the same usage on other devices and it is not a problem there.

To estimate the required stack, should I add up the inclusive or the exclusive size of each function?
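My understanding of the two columns, which is worth confirming in the CCS documentation: the exclusive size is a function's own frame, and the inclusive size is that frame plus the deepest call chain below it. The worst case for one task is therefore the inclusive size of the task's entry function, not the sum over all functions, plus whatever the interrupt and context-switch code needs.

The sketch below uses hypothetical types and numbers. It computes inclusive sizes from exclusive ones for an acyclic call graph.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CALLEES 4

/* Hypothetical call-graph node: 'exclusive' is the function's own frame. */
typedef struct func {
    uint32_t exclusive;
    size_t n_callees;
    const struct func *callees[MAX_CALLEES];
} func_t;

/* Inclusive = own frame + the largest inclusive size among its callees,
 * because only one call chain is active at a time.
 * Assumes no recursion (the graph must be acyclic). */
uint32_t inclusive_stack(const func_t *f)
{
    uint32_t deepest = 0;
    for (size_t i = 0; i < f->n_callees; i++) {
        uint32_t c = inclusive_stack(f->callees[i]);
        if (c > deepest) {
            deepest = c;
        }
    }
    return f->exclusive + deepest;
}
```

Also keep in mind that xTaskCreate() takes the stack depth in words, not bytes. On a 32-bit port, a depth of 1000 reserves 4000 bytes.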

Could you please give more information about the other questions I asked in this thread? It would be really useful. I'm really stuck on this problem and need some help.

Finally, do you think it could be a hardware problem? Could the microcontroller's memory be corrupted or something like that?

Thank you for your help.

0 QJ Wang over 2 years ago in reply to QJ Wang

TI__Guru**** 199426 points

The stack region shown in the map file (in the Debug folder) is what you defined in the linker cmd file: a starting address and a length. It defines where the stack is located. _coreInitStackPointer_() in sys_core.asm defines the stack size for each CPU mode. The --stack_size linker option in CCS should match the total size defined in _coreInitStackPointer_().
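The consistency rule above can be checked at compile time. The sketch below uses placeholder values, not the ones in this project's sys_core.asm, for three things:

- the per-mode stack sizes set up in _coreInitStackPointer_();
- the --stack_size linker option;
- the STACKS region length (0x2000 in the linker file posted earlier in this thread).

Copy the real numbers in, and the build fails if they disagree. All macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder per-mode stack sizes in bytes (copy from sys_core.asm). */
#define USER_STACK_SIZE   0x1000U
#define SVC_STACK_SIZE    0x0400U
#define FIQ_STACK_SIZE    0x0100U
#define IRQ_STACK_SIZE    0x0400U
#define ABORT_STACK_SIZE  0x0100U
#define UNDEF_STACK_SIZE  0x0100U

/* Placeholder: value given to the linker with --stack_size. */
#define LINKER_STACK_SIZE    0x1B00U
/* Length of the STACKS region in the linker cmd file. */
#define STACKS_REGION_LENGTH 0x2000U

#define TOTAL_MODE_STACKS (USER_STACK_SIZE + SVC_STACK_SIZE + FIQ_STACK_SIZE + \
                           IRQ_STACK_SIZE + ABORT_STACK_SIZE + UNDEF_STACK_SIZE)

_Static_assert(TOTAL_MODE_STACKS == LINKER_STACK_SIZE,
               "--stack_size does not match _coreInitStackPointer_()");
_Static_assert(TOTAL_MODE_STACKS <= STACKS_REGION_LENGTH,
               "mode stacks do not fit in the STACKS region");

/* Unused bytes left in the STACKS region. */
uint32_t stack_region_headroom(void)
{
    return STACKS_REGION_LENGTH - TOTAL_MODE_STACKS;
}
```

As far as I know, HALCoGen's _coreInitStackPointer_() loads each mode's stack pointer from constants in sys_core.asm. If so, changing --stack_size alone does not make any mode stack larger.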

When the stack overflows, you will get unexpected errors.

0 QJ Wang over 2 years ago in reply to QJ Wang

TI__Guru**** 199426 points

If you use the linker cmd script to generate ECC, the ECC option in the project properties should be set to "ON".

0 QJ Wang over 2 years ago in reply to QJ Wang

TI__Guru**** 199426 points

BlankCheck is an API defined in F021 Flash Library.

If FEE is used, TI_FeeInternal_BlankCheck(..) is used to perform the blank check.

QJ Wang over 2 years ago in reply to QJ Wang

As I mentioned, I don't recommend generating ECC with the linker cmd file for the application firmware. You can use the linker cmd file to generate ECC for the bootloader and the whole flash memory.

In my bootloader example, the application firmware has to be in binary format and is programmed into flash, together with its ECC, by the bootloader. If the linker cmd file is used to generate ECC for the application, the binary file becomes very large (>3GB).
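The size follows from the memory map: a flat binary covers every address from the lowest to the highest section, so once ECC sections are included it must span from program flash up to the flash ECC region. The sketch assumes the ECC region starts at 0xF0400000 and the image at 0x00000000; both addresses are assumptions to verify against the memory map in the device datasheet.

```c
#include <stdint.h>

/* Assumed addresses; verify against the device datasheet memory map. */
#define APP_IMAGE_START 0x00000000u
#define FLASH_ECC_START 0xF0400000u

/* Bytes a flat binary must contain to span [lowest, highest_plus_one). */
uint64_t flat_binary_bytes(uint32_t lowest, uint32_t highest_plus_one)
{
    return (uint64_t)highest_plus_one - lowest;
}
```

Under these assumptions the result is 4030726144 bytes (3844 MiB), consistent with the >3GB figure; moving the start address to a typical application offset after the bootloader barely changes it.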

Javier Plaza over 2 years ago in reply to QJ Wang

Dear QJ,

Thank you for all the information, and sorry for the late reply. I have been working on this problem, trying to dig deeper and understand what is happening.

I created a new project with all the FEE driver functions removed, and this project works perfectly. Together with the information in the data abort registers, this indicates that the problem is directly related to this driver. Below is more information about the problem, in case it helps you find the cause or an explanation.
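For anyone reading the data abort registers in a similar situation: on the Cortex-R4, the Data Fault Status Register (DFSR) encodes the abort type in FS[4] (bit 10) and FS[3:0] (bits 3:0), bit 11 indicates a write, and the Data Fault Address Register (DFAR) holds the faulting address. On the target both are read through CP15 (DFSR: c5, c0, 0; DFAR: c6, c0, 0), which is not shown here. The decoder below is a sketch based on the ARMv7-R fault status encodings as I understand them; verify the values against the Cortex-R4 Technical Reference Manual before relying on them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Abort categories (subset of the ARMv7-R PMSA fault status encodings). */
typedef enum {
    ABORT_BACKGROUND,
    ABORT_ALIGNMENT,
    ABORT_PERMISSION,
    ABORT_SYNC_EXTERNAL,
    ABORT_ASYNC_EXTERNAL,
    ABORT_SYNC_ECC,      /* synchronous parity/ECC error */
    ABORT_ASYNC_ECC,     /* asynchronous parity/ECC error */
    ABORT_OTHER
} abort_kind;

/* Combine FS[4] (bit 10) with FS[3:0] (bits 3:0) into a 5-bit status. */
static uint32_t dfsr_status(uint32_t dfsr)
{
    return ((dfsr >> 6) & 0x10u) | (dfsr & 0x0Fu);
}

abort_kind dfsr_decode(uint32_t dfsr)
{
    switch (dfsr_status(dfsr)) {
    case 0x00: return ABORT_BACKGROUND;
    case 0x01: return ABORT_ALIGNMENT;
    case 0x0D: return ABORT_PERMISSION;
    case 0x08: return ABORT_SYNC_EXTERNAL;
    case 0x16: return ABORT_ASYNC_EXTERNAL;
    case 0x19: return ABORT_SYNC_ECC;
    case 0x18: return ABORT_ASYNC_ECC;
    default:   return ABORT_OTHER;
    }
}

/* Bit 11 is set when the aborting access was a write. */
bool dfsr_is_write(uint32_t dfsr)
{
    return (dfsr & (1u << 11)) != 0u;
}
```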

The device works properly and saves information to Flash Bank 7 through the FEE driver. However, at some point during operation, a data abort occurs and the program gets stuck. The watchdog (WD) then resets the microprocessor and the program runs again. The first time this happens, the device can sometimes recover the information from the flash, but other times it cannot. I have also observed other behavior that could help find the error: after the program hangs and restarts a few times, it begins to work properly (even loading the data from the flash correctly), but when it reaches the first instruction that writes data through the FEE, it hangs again. So the program can read from the flash but cannot write to it.

I want to highlight some important points:

I have worked with many devices using this FEE configuration and have never seen this behavior (data abort + flash being erased). I have checked in the manual that I'm following all the required steps to update the data properly: calling the FEE main function cyclically, running the FreeRTOS task that performs this process in privileged mode, configuring at least 2 virtual sectors, etc. I don't think this is a configuration or ECC problem.
I use FreeRTOS added externally, because HalCoGen does not provide a version with an OS for this microprocessor.
I'm using very fast serial communication (1 Mb/s). I thought it could play a role in the problem, because I constantly communicate with the device, sending instructions, so it may affect the flash writing process. To try to avoid any timing problems, I raised the microprocessor frequency to 150 MHz (from 100 MHz).

So finally, after months of debugging, I started to think it could be a HW problem. I have read your post (TMS570LS3137: TI FEE - Struck in TI_FeeInternal_PollFlashStatus() - Arm-based microcontrollers forum - Arm-based microcontrollers - TI E2E support forums) and the microprocessor datasheet, and I'm aware that improper powering of the device could cause problems in the FEE driver. I would like to better understand what this problem could be and how I could identify it. I have been measuring the current consumption and the voltage, and I don't see the voltage drop below 3 V, but maybe something is happening during the start-up or power-off sequence, or with the slew rates. Could you please tell me which kinds of power malfunction could cause this kind of problem? I have replaced the microprocessor to confirm it isn't damaged, but the behavior is the same.

Finally, I have been reading about the different error functions in the FEE driver. I would like to use some of them to get a better understanding of the problem, and maybe you could help me with that. Which function would you use in this case? Would it be right to call them inside the task in charge of the FEE main function? How should I handle the results to really understand the problem? Note that this problem is hard to debug because it does not always happen at the same time, and if it takes many hours, the debugger may lose the connection. So I would prefer to implement some method to send the error codes over the serial communication, to guarantee that I can identify them. I think I could also identify the problem in the data abort handler and report it from there.
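One way to approach the serial reporting: keep a small record of the last FEE job result and error code, update it in the task that runs the FEE main function, and format it as a text line for the UART. The sketch below covers only the formatting; the fee_diag structure, its fields and the message format are hypothetical, and in a real build the values would come from the FEE driver's job-result and error-code query functions (check the TI FEE driver user guide for their exact names and return types). snprintf is used for simplicity; since stack usage is already a concern, a lighter formatter may be preferable on the target.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical snapshot of FEE state captured after each job. */
typedef struct {
    uint32_t timestamp_ms;  /* e.g. RTOS tick converted to ms */
    uint16_t block_number;  /* FEE block the job targeted */
    uint8_t  job_result;    /* raw job result value from the driver */
    uint8_t  error_code;    /* raw error code value from the driver */
} fee_diag;

/* Format one line such as "FEE t=12000 blk=1 job=1 err=4\r\n".
 * Returns the number of characters written (excluding the NUL),
 * or -1 if the buffer was too small. */
int fee_diag_format(const fee_diag *d, char *buf, size_t len)
{
    int n = snprintf(buf, len, "FEE t=%lu blk=%u job=%u err=%u\r\n",
                     (unsigned long)d->timestamp_ms,
                     (unsigned)d->block_number,
                     (unsigned)d->job_result,
                     (unsigned)d->error_code);
    if (n < 0 || (size_t)n >= len) {
        return -1;
    }
    return n;
}
```

Logging the raw numeric values keeps the target code small; they can be mapped back to the driver's enum names on the PC side.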

I really need to fix this problem. As I said, I have worked with the FEE driver on this microprocessor before and have never seen anything like this.

Thank you again for your time and effort.

Best regards

Javier Plaza over 2 years ago in reply to Javier Plaza

Dear QJ,

I am still working on this problem. I have a few updates, and it would be great to get some feedback from you.

We are testing the problem board intensively. We have disconnected all the peripherals to rule out any external power issues. Here is what we found:

We have tested the same code on another device with the same microprocessor and a similar architecture, and the FEE driver works well there (it does not get stuck or lose data).

The original board, which suffers from this problem, has the same configuration around the microprocessor (same capacitor and resistor network, same power components, etc.). However, we are seeing the following issue: we power on the board, and it starts to work with the FEE driver. If we turn the device off and on again at the beginning of operation, the FEE driver can find the "reference data" (a known number saved in the first position of the block). However, if we wait a couple of minutes, the driver can no longer recover the reference data, even though the device still works properly. On no occasion has the driver been able to recover the reference data after a power cycle in which it had been lost.

So, from our point of view, it seems to be a HW problem, because it isn't reproduced on another device with the same microprocessor. We are measuring the power-on and power-off sequences and comparing them to see whether there is any problem with the voltage or current supplied to the microprocessor. We need to know and understand which kinds of power issues could affect the FEE driver. We haven't found any information in the driver's documentation, but in the post I cited in my last comment, you said that a voltage below 3 V could affect the driver. Please explain all the causes that could affect the FEE driver. The information is not being deleted, but the microprocessor can't find it.

Thank you again for your time.

Best regards

QJ Wang over 2 years ago in reply to Javier Plaza

Javier Plaza said:
The information is not being deleted, but the microprocessor can't find it.

Does the data block have the correct block status, 0xFFFFFFFF00000000? If the status is invalid (0xFFFF000000000000), the FEE driver will not be able to read the block data. If the device is reset while the status is being programmed, the bits being programmed when reset is asserted are indeterminate, so the status becomes invalid.
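To check this on the board, one option is to read the 64-bit status word at the start of the block (for example in the CCS memory browser, or from code) and compare it with the two values above. The sketch classifies a raw status word; only the two patterns quoted above come from this thread, every other value is reported as unknown, and the word order used by the hypothetical status_from_words() helper is an assumption to check against the FEE documentation.

```c
#include <stdint.h>

typedef enum {
    FEE_BLOCK_VALID,    /* 0xFFFFFFFF00000000 */
    FEE_BLOCK_INVALID,  /* 0xFFFF000000000000 */
    FEE_BLOCK_UNKNOWN   /* anything else, e.g. indeterminate bits after a reset mid-program */
} fee_block_status;

fee_block_status classify_block_status(uint64_t status)
{
    if (status == 0xFFFFFFFF00000000ull) {
        return FEE_BLOCK_VALID;
    }
    if (status == 0xFFFF000000000000ull) {
        return FEE_BLOCK_INVALID;
    }
    return FEE_BLOCK_UNKNOWN;
}

/* Hypothetical helper: build the 64-bit value from two 32-bit flash words,
 * assuming the lower-address word holds the upper half. */
uint64_t status_from_words(uint32_t first, uint32_t second)
{
    return ((uint64_t)first << 32) | second;
}
```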

Javier Plaza said:
We power on the board, and it starts to work with the FEE driver. If we turn off and turn off the device at the beginning of the operation, the FEE driver is capable of finding the "reference data"(a known number saved on the first position of the block). However, if we wait a couple of minutes, the driver isn't capable of recovering the reference data, but the device still works properly. On any occasion, the driver has been capable to recover th

This is very strange. Does the board follow the power-up sequence? The minimum nPORRST hold time is 1ms.

The minimum I/O supply and flash pump supply voltage is 3 V.
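When reviewing scope captures against these two numbers, a small helper can automate the check. The sketch takes hypothetical samples of the supply voltage and the nPORRST level at a fixed interval and checks that nPORRST stays low for at least 1 ms after the supply first reaches 3.0 V. Measuring the hold time from that point is an assumption; confirm the exact reference points in the timing diagram of the device datasheet.

```c
#include <stdbool.h>
#include <stddef.h>

/* One scope sample: supply voltage and whether nPORRST is released (high). */
typedef struct {
    double vcc;
    bool   nporrst_high;
} scope_sample;

/* Returns true if nPORRST is released at least 1 ms after vcc first
 * reaches 3.0 V, and not before. dt_us is the sample period in us. */
bool porrst_hold_ok(const scope_sample *s, size_t n, double dt_us)
{
    const double vmin = 3.0;
    const double hold_us = 1000.0;
    size_t supply_ok = n;  /* index where vcc first reaches vmin; n = not yet */

    for (size_t i = 0; i < n; ++i) {
        if (s[i].nporrst_high && supply_ok == n) {
            return false;  /* released before the supply was valid */
        }
        if (supply_ok == n && s[i].vcc >= vmin) {
            supply_ok = i;
        }
        if (s[i].nporrst_high) {
            return (double)(i - supply_ok) * dt_us >= hold_us;
        }
    }
    return false;          /* nPORRST never released in this capture */
}
```

This only checks the single hold-time rule; it does not verify that the supply stays above 3 V afterwards or any other datasheet sequencing requirement.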

Javier Plaza over 2 years ago in reply to QJ Wang

Dear QJ,

Thank you for your fast response.

We have checked all the signals involved in the power-on and power-off sequences. We changed a capacitor to make sure the PORRST signal rises slowly enough to meet all the timing requirements. However, the microprocessor is still not able to recover the data after a few cycles of operation.

Let me give you some more information: during testing we discovered that if we save information to memory less often, it takes longer to lose the reference. Some specific times: if we save information every 15 seconds, it takes around 5 minutes to lose the data reference (around 20 write operations). However, if we call the FEE WriteSync function every 5 s, it takes around 1:30 to 2 minutes to lose the reference (again close to 20 write operations). Note that after this time we perform a power off/on cycle to check whether the reference data has been lost. We call the TEE_main function from FreeRTOS every 10 ms. If we select other virtual sectors, this number of cycles changes a bit: we are currently using virtual sectors 0 and 1; if we use 5 and 6 with a write interval of 15 s, it takes around 7 minutes to lose the reference.
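The timings above point to the loss being tied to the number of writes rather than to elapsed time. Using only the intervals and times quoted in this post:

```latex
\frac{5\,\text{min}}{15\,\text{s}} = \frac{300\,\text{s}}{15\,\text{s}} = 20 \text{ writes}, \qquad
\frac{90\text{--}120\,\text{s}}{5\,\text{s}} = 18\text{--}24 \text{ writes}, \qquad
\frac{7\,\text{min}}{15\,\text{s}} = \frac{420\,\text{s}}{15\,\text{s}} = 28 \text{ writes}
```

The first two cases land on roughly the same write count, while the sectors 5/6 case needs somewhat more. One hypothesis worth checking (not established in this thread) is whether the loss coincides with the active virtual sector filling up and the driver switching to the next one.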

This code works properly on another board with the same microprocessor, so we really don't know what could be causing this problem. Any thoughts?

Could the microprocessor's flash memory be damaged or have some malfunction? If so, what could cause a malfunction of this kind? In the past we replaced the microprocessor, and the new one behaved exactly the same, so before replacing it again we want to confirm the possible causes of the malfunction so as not to repeat it.

We have read that it is not recommended to write the flash too many times, and that it is important to wait long enough between writes. Do you have any information about this topic? Is there a specified limit on the number of times the flash can be written? Any information about this would be really useful.

Best regards.

QJ Wang over 2 years ago in reply to Javier Plaza

Javier Plaza said:
We are currently using Virtual sectors 0 and 1, if we try 5 and 6, and a time of writing of 15 s, it takes around 7 minutes to lose the reference.

How many virtual sectors are used in your configuration? HalCoGen doesn't support more than 4 virtual sectors.

Can you run a simple FEE example (halcogen example) without using freeRTOS to see if you have the same problem on this MCU?

Javier Plaza said:
We have read that is not recommended to write the flash too many times, and that is important to wait enough between writing. Do you have any information about this topic?

For bank 7, it supports up to 100K write/erase cycles. After 100K cycles, write/erase takes longer to complete and is not guaranteed.
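To relate the 100K figure to a write interval, a rough estimate helps. The simplifying assumptions are mine, not from the FEE documentation: each write appends one copy of the block to the active virtual sector, each virtual sector is erased once per rotation through all of them, and the copy overhead when switching sectors is ignored. All sizes are hypothetical; take the real virtual sector, block and header sizes from your FEE configuration.

```c
#include <stdint.h>

/* Hypothetical FEE layout parameters (bytes). */
typedef struct {
    uint32_t vs_bytes;      /* size of one virtual sector */
    uint32_t vs_header;     /* virtual-sector header overhead */
    uint32_t block_bytes;   /* data block size */
    uint32_t block_header;  /* per-write block header overhead */
    uint32_t num_vs;        /* virtual sectors in rotation */
} fee_layout;

/* Writes that fit in one virtual sector before the FEE must switch. */
uint32_t writes_per_fill(const fee_layout *l)
{
    return (l->vs_bytes - l->vs_header) / (l->block_bytes + l->block_header);
}

/* Rough number of writes before each virtual sector reaches 'cycles'
 * erases, ignoring copy overhead. */
uint64_t writes_until_limit(const fee_layout *l, uint32_t cycles)
{
    return (uint64_t)cycles * l->num_vs * writes_per_fill(l);
}
```

With the made-up layout {4096, 32, 8, 24, 2}, one write every 5 s would reach the limit after about 1469 days. More importantly, even in the worst case of one erase per write, around 20 writes are far below 100K cycles, so wear-out alone does not explain losing the reference after a few minutes.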