TMS570 Cortex-R5F Core - All Fatal bus error events

Laura Yindra1

Other Parts Discussed in Thread: NOWECC, UNIFLASH

I need to understand why I am seeing the Error Signaling Module (ESM) Group 2 channel 3 event "Cortex-R5F Core - All fatal bus error events" (event reference 0x71, event description bus ECC, EVNTBUSm bit 48). The error happens most often when I include SPI2 writes to DAT1 and reads from BUF and FLG. I am building the binary with Code Composer Studio 6.0.1 and working with a TMS570LC43xx.

over 9 years ago

0 QJ Wang over 9 years ago

TI__Guru**** 186196 points

Hi Laura,

Could you provide some more information about your code, your compiler, your flashing tools, etc.?

BTW, please refer to another thread which talks about "all fatal bus error events".

Regards,

0 Laura Yindra1 over 9 years ago in reply to QJ Wang

Prodigy 200 points

Thank you for your response.

I saw the referenced thread before I created my post, so I am overwriting 128 entries of VIM RAM (0xFFF82000U) with phantom function pointers.
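
A minimal sketch of the VIM RAM initialization described above, assuming the table holds 128 handler pointers. The names `vim_fill_table` and `phantom_isr` are hypothetical and not taken from any TI library; on the target the table would be the VIM RAM at 0xFFF82000U, while here it is an ordinary array so the logic can be tried on a host.

```c
#include <assert.h>
#include <stddef.h>

#define VIM_CHANNELS 128U  /* number of VIM RAM entries mentioned in the post */

typedef void (*isr_t)(void);

/* Hypothetical phantom handler: on a real target it might log or trap. */
static void phantom_isr(void) { }

/* Write the same handler into every entry so no entry is left uninitialized. */
static void vim_fill_table(volatile isr_t *table, size_t entries, isr_t handler)
{
    assert(table != NULL && handler != NULL);
    for (size_t i = 0U; i < entries; i++) {
        table[i] = handler;
    }
}
```

On the target the call might look like `vim_fill_table((volatile isr_t *)0xFFF82000U, VIM_CHANNELS, phantom_isr);`.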

My question is why do I get a fatal bus error event only when using the DAT1, BUF and FLG registers of SPI2 after running for only 5 or 10 minutes. When I remove one function call to SPI2, the error doesn't happen for hours, if ever.

I am compiling with code composer studio 6.0.1.00040, compiler version TI V5.1.6, output format: eabi (ELF), Device endianness: be32, and Runtime support Library C:/ti/ccsv6/tools/compiler/arm_5.1.6/lib/rtsv7R4_T_be_v3D16_eabi.lib. The ECC section was commented out in the linker file. I am flashing and running from Trace32.

0 Sindhu Krishna over 9 years ago in reply to Laura Yindra1

Expert 1350 points

Hello TI !

Is it okay to use "rtsv7R4_T_be_v3D16_eabi.lib" as runtime support library for R5F-Core ?

0 Charles Tsai over 9 years ago in reply to Laura Yindra1

TI__Guru**** 158945 points

Hi Laura,

Can you please tell me whether you have filled all the holes in the flash with known data such as 0xFFFFFFFF and calculated the corresponding ECC? You can do this using the vfill specifier; please see the example linker file below. I suspect some type of speculative access by the CPU to a hole location where the corresponding ECC is not programmed in the flash.

One more thing I would suggest is changing the reset vector so that it branches to itself, as shown below. You will then need to use the debugger to manually move the program counter to _c_int00. Sometimes the code has already been running when you connect to the device, and we don't know whether it has put the device into an unrecoverable state that might later prevent the emulator from connecting. With this change, once you reset, the device stays at the reset vector, branching to itself. This is just a suggestion while you are still debugging your software.

from:

resetEntry
b _c_int00

to:

resetEntry
        b   #-8

MEMORY
{
/* USER CODE BEGIN (2) */
/* USER CODE END */
    VECTORS (X)  : origin=0x00000000 length=0x00000020 fill=0xffffffff
    FLASH0  (RX) : origin=0x00000020 length=0x001FFFE0 vfill=0xffffffff
    FLASH1  (RX) : origin=0x00200000 length=0x00200000 vfill=0xffffffff
    STACKS  (RW) : origin=0x08000000 length=0x00001500
    RAM     (RW) : origin=0x08001500 length=0x0007EB00

/* USER CODE BEGIN (3) */
	ECC_VEC  (R)   : origin=0xf0400000            length=0x4             ECC={ input_range=VECTORS }
	ECC_FLA0 (R)   : origin=0xf0400000 + 0x4      length=0x3FFFC         ECC={ input_range=FLASH0  }
	ECC_FLA1 (R)   : origin=0xf0400000 + 0x40000  length=0x40000         ECC={ input_range=FLASH1  }
/* USER CODE END */
}

/* USER CODE BEGIN (4) */
/* USER CODE END */


/*----------------------------------------------------------------------------*/
/* Section Configuration                                                      */

SECTIONS
{
/* USER CODE BEGIN (5) */
/* USER CODE END */
    .intvecs : {} > VECTORS
    .text   align(8) : {} > FLASH0 | FLASH1
    .const  align(8) : {} > FLASH0 | FLASH1
    .cinit  align(8) : {} > FLASH0 | FLASH1
    .pinit  align(8) : {} > FLASH0 | FLASH1
    .bss     : {} > RAM
    .data    : {} > RAM
    .sysmem  : {} > RAM
	

/* USER CODE BEGIN (6) */
/* USER CODE END */
}

/* USER CODE BEGIN (7) */
/* USER CODE END */


/*----------------------------------------------------------------------------*/
/* Misc                                                                       */

/* USER CODE BEGIN (8) */
ECC {
	algo_name : address_mask = 0xfffffff8
	hamming_mask             = R4
	parity_mask              = 0x0c
	mirroring                = F021
}
/* USER CODE END */

0 Laura Yindra1 over 9 years ago in reply to Charles Tsai

Prodigy 200 points

In case anyone is curious
resetEntry
b #-8
means loop back to the same instruction. This statement is the same thing as while(1).
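
Why an offset of -8 lands on the same instruction can be sketched as follows. This assumes ARM (not Thumb) state, where the PC reads as the address of the current instruction plus 8, and assumes the `#-8` operand is applied to that PC value. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define ARM_PC_READ_AHEAD 8U  /* ARM state: PC reads as current instruction + 8 */

/* Target of a PC-relative branch located at insn_addr with the given byte offset. */
static uint32_t arm_branch_target(uint32_t insn_addr, int32_t offset)
{
    assert((insn_addr & 3U) == 0U);  /* ARM instructions are word aligned */
    /* Unsigned arithmetic wraps modulo 2^32, so a negative offset subtracts. */
    return insn_addr + ARM_PC_READ_AHEAD + (uint32_t)offset;
}
```

With the branch at the reset vector (address 0) and an offset of -8, the target is address 0 again, which is the endless loop described above.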

If you do try vfill and your processor continuously errors a few minutes after boot up, then change all vfill statements in your linker file to fill statements. A fill=0xFFFFFFFF statement only sets all empty flash to the default value. This is good because fill will reach the areas of flash corrupted by the vfill statement. You will need to program flash with no vfill or fill if you are using Lauterbach to calculate the ECC.

0 Charles Tsai over 9 years ago in reply to Laura Yindra1

TI__Guru**** 158945 points

Hi Laura,
It is not clear whether your problem was resolved by using either vfill or fill. If you use fill, the program image will be large. If you have a small program but fill the rest of the flash with 0xFFFFFFFF, the flash loader will program your image as if your program occupied the entire 4MB. This is the reason I suggested vfill. Of course, if your program is already large, i.e. taking up almost the entire 4MB, then using fill will not add much extra programming time.

0 Laura Yindra1 over 9 years ago in reply to Charles Tsai

Prodigy 200 points

I see there are several methods of calculating the ECC in section 7.1.3 "F021 Flash Tools" of the technical reference manual. I am unclear why my ECC calculation method is not working with SPI registers and fill commands.

0 Charles Tsai over 9 years ago in reply to Laura Yindra1

TI__Guru**** 158945 points

Hi Laura,
nowECC, Uniflash or CCS all should calculate the same ECC for a given program image. Are you saying you have tried all of them and you still have the bus ECC error when accessing SPI registers?

0 Charles Tsai over 9 years ago in reply to Charles Tsai

TI__Guru**** 158945 points

Hi Laura,

Could you also clarify one point for me? You said the program runs for 5-10 minutes without a problem until it accesses the SPI2 registers. During those first 5-10 minutes, while it was running normally, did your code ever access the SPI2 registers? Of course, you may have various sections of code that access SPI2. My question is about the particular code sequence whose SPI2 access causes the bus ECC error: does the error occur the first time that sequence executes, or has the same sequence executed before without any bus ECC error?

Some time back I had a bus ECC error in one of my programs for the LC4357. I was in an ISR, and as soon as I exited it I saw the bus ECC error. The root cause was that the holes in the flash were not filled with proper ECC. When the CPU exits the ISR it tries to prefetch the code after the ISR. This is especially true if the CPU is refilling a cache line: the words in the same cache line as the last word of a code section are read by the cache controller as part of the refill operation. If the empty space after the ISR, or after the end of any other section, does not have proper ECC, it can cause a bus ECC error. So I'm wondering whether you have a similar situation.
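
The refill behavior described above can be illustrated with a little address arithmetic: given where a code section ends, how many bytes past its end sit in the same cache line and would be read by a refill? This is only a sketch; the 32-byte line size is an assumption that should be checked against the Cortex-R5 documentation for your device, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 32U  /* assumed line size; verify for your device */

/* First address past the cache line that holds the last byte of a section.
 * section_end is the address one past the section's last byte. */
static uint32_t line_refill_end(uint32_t section_end)
{
    assert(section_end > 0U);
    uint32_t last_byte = section_end - 1U;
    return (last_byte & ~(CACHE_LINE_BYTES - 1U)) + CACHE_LINE_BYTES;
}

/* Bytes beyond the section that a refill of its last line would still read. */
static uint32_t bytes_read_past_section(uint32_t section_end)
{
    return line_refill_end(section_end) - section_end;
}
```

A section that ends at 0x1008 shares its last line with 24 bytes that lie outside the section. If those bytes are an unfilled hole without valid ECC, the refill would still read them.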

0 Laura Yindra1 over 9 years ago in reply to Charles Tsai

Prodigy 200 points

I did add the ECC code from your example to my linker file, but that problem with fill and vfill only went away after I re-installed Code Composer Studio.

The SPI registers are accessed once every 0.01 seconds after 1 second of boot time. This could be coincident with the R5F all fatal bus error events because SPI register accesses were loops in the newest files added to this project.

I'm still seeing the issue with SPI register access by the CPU causing Cortex-R5F Core - All fatal bus error events

0 Charles Tsai over 9 years ago in reply to Laura Yindra1

TI__Guru**** 158945 points

Hi Laura,

So after the fill or vfill the bus ECC error will still occur, correct?

Sorry to ask some more questions again below.

1. Do you know for sure that the bus error only happens after you access the SPI registers, or can it happen slightly before the SPI access?

2. Since the SPI registers are accessed every 0.01 seconds, do you know whether the bus error happens every 0.01 seconds? Can you clear the bus error flag and the nERROR pin when you see it, and find out whether it recurs every 0.01 seconds due to the same SPI register access?

3. Can you send me the function that you use to access the spi registers?
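
One way to approach questions 1 and 2 is to sample the ESM Group 2 status immediately before and after the SPI access. The sketch below assumes the status word is readable through a pointer and that channel 3 corresponds to bit 3; the register address is deliberately not shown, all names are hypothetical, and the `fake_*` items are host-side stand-ins so the logic can be run without hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ESM_G2_CH3_MASK (1U << 3)  /* Group 2 channel 3, the event named in this thread */

typedef enum { ERR_NONE, ERR_BEFORE_ACCESS, ERR_AFTER_ACCESS } err_when_t;

/* Sample the status word before and after running `access`. */
static err_when_t classify_bus_error(const volatile uint32_t *g2_status,
                                     void (*access)(void))
{
    assert(g2_status != NULL && access != NULL);
    if ((*g2_status & ESM_G2_CH3_MASK) != 0U) {
        return ERR_BEFORE_ACCESS;  /* flag was already set before the access */
    }
    access();
    return ((*g2_status & ESM_G2_CH3_MASK) != 0U) ? ERR_AFTER_ACCESS : ERR_NONE;
}

/* Host-side stand-ins (not real hardware) used to exercise the sketch. */
static uint32_t fake_status;
static void fake_access_ok(void) { }
static void fake_access_faulting(void) { fake_status |= ESM_G2_CH3_MASK; }
```

On the target, `access` would wrap the SPI register sequence under suspicion, and the flag would be cleared between samples so that each occurrence is counted separately.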

0 Laura Yindra1 over 9 years ago in reply to Charles Tsai

Prodigy 200 points

1) Yes, the error still occurs, probably because adding ECC_FLA1 (R) : origin=0xf0400000 + 0x40000 length=0x40000 ECC={ input_range=FLASH1 } to the linker file creates a .bin file that is larger than one bank of flash.

2) the error happens once after a minute or five

3)
uint16 sampleADCRegister(uint16 Register)
{
    uint32 *ptr;
    uint16 G_status = 0xBADD;
    uint16 G_status2 = 0xBADD;

    TxData.Command.TX_Entry.spi_TX_Bits.CSHOLD = 1;
    TxData.Command.TX_Entry.spi_TX_Bits.WDEL = 0;
    TxData.Command.TX_Entry.spi_TX_Bits.DFSEL = DATA_FORMAT0;
    TxData.Command.TX_Entry.spi_TX_Bits.CSNR = DUALA2D_DEVICE_CS;
    TxData.Command.TXDATA = (uint16)Register;
    ptr = (uint32 *) &TxData.Command;

    TxData.reserved0.TX_Entry.spi_TX_Bits.CSHOLD = 0;
    TxData.reserved0.TX_Entry.spi_TX_Bits.WDEL = 1;
    TxData.reserved0.TX_Entry.spi_TX_Bits.DFSEL = DATA_FORMAT0;
    TxData.reserved0.TX_Entry.spi_TX_Bits.CSNR = DUALA2D_DEVICE_CS;
    TxData.reserved0.TXDATA = 0xFFFF;

    SPI_BUS->DAT1 = *ptr;

    while (!(SPI_BUS->FLG & 0x200))
    {   // Wait for TXINTFLG to be set, indicating data was transmitted
        timer = os_timer_is_done();
    }

    ptr = (uint32 *) &TxData.reserved0;
    SPI_BUS->DAT1 = *ptr;

    while (!(SPI_BUS->FLG & 0x200))
    {   // Wait for TXINTFLG to be set, indicating data was transmitted
    }

    while (!(SPI_BUS->FLG & 0x100))
    {   // Wait for RXINTFLG to be set, indicating data was received
    }

    G_status = (uint16)((SPI_BUS->BUF & 0x00001FFE) >> 1);

    while (!(SPI_BUS->FLG & 0x100))
    {   // Wait for RXINTFLG to be set, indicating data was received
    }
    G_status2 = (uint16)((SPI_BUS->BUF & 0x00001FFE) >> 1);

    return G_status2;
}
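
The function above builds each DAT1 word by casting a bit-field structure to uint32, and bit-field layout depends on the compiler and endianness. As a sketch of an alternative, the same word can be assembled with explicit shifts. The bit positions used here (CSHOLD bit 28, WDEL bit 26, DFSEL bits 25:24, CSNR bits 23:16, TXDATA bits 15:0) are an assumption about the SPIDAT1 layout and should be verified against the technical reference manual; `spi_dat1_pack` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Assemble a DAT1 word from its fields using shifts instead of bit-fields. */
static uint32_t spi_dat1_pack(uint32_t cshold, uint32_t wdel, uint32_t dfsel,
                              uint32_t csnr, uint16_t txdata)
{
    assert(cshold <= 1U && wdel <= 1U && dfsel <= 3U && csnr <= 0xFFU);
    return (cshold << 28)   /* chip select hold */
         | (wdel   << 26)   /* delay after this word */
         | (dfsel  << 24)   /* data format select */
         | (csnr   << 16)   /* chip select pattern */
         | (uint32_t)txdata;
}
```

For example, a command word with CSHOLD set, chip select pattern 0xFE and data 0x1234 packs to 0x10FE1234 under these assumptions.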

0 Charles Tsai over 9 years ago in reply to Laura Yindra1

TI__Guru**** 158945 points

Hi Laura,
Can this problem be reproduced on the EVM board, or only on your custom board with different sensors connected to it? If it is reproducible on the EVM, I can try to reproduce it on my side. You can send me the project privately.

0 Aaron Spaete over 8 years ago in reply to Laura Yindra1

TI__Expert 4275 points

Charles,

Will moving the SPI implementation to Transfer Groups, and away from DMA compatibility mode, allow me to process interrupts during SPI transactions without generating an ESM Group 2 ECC bus error?

This question is from Laura.

Regards,
Aaron

0 Laura Yindra1 over 8 years ago in reply to Aaron Spaete

Prodigy 200 points

Real-time clock interrupts are causing SPI compatibility-mode DMA to hang and never complete. Real-time clock interrupts are also causing CPU reads and writes to SPI DAT1 and SPIBUF to generate an ESM Group 2 ECC bus error (Cortex-R5F Core - All Fatal bus error events).

Do you recommend SPI Transfer Groups or Errata VIM#28 Workaround?

0 Charles Tsai over 8 years ago in reply to Laura Yindra1

TI__Guru**** 158945 points

Hi Laura, Aaron,

I don't think your problem is related to VIM#28. VIM#28 can only occur when you have a "real" ECC error in the VIM RAM. If you can see multiple parts with the same issue on your bench setup then it is extremely unlikely what you are facing today is related to VIM#28.

As far as SPI transfer groups, I think you are asking to change to MultiBuffer SPI mode, correct? You can try it but I'd like to understand the root cause of the problem. Without understanding the root cause it is hard to know if the problem will occur in other circumstances.

You just mentioned, for the first time, that you are using the DMA. Can you please confirm whether you are using the DMA and, if so, how you are using it?

Is the problem reproducible in the EVM? Or is this only possible with your full ECU?

0 Laura Yindra1 over 8 years ago in reply to Charles Tsai

Prodigy 200 points

For some reason the real-time clock interrupt was causing an ECC all fatal bus error when there were reads and writes to SPI BUF and SPI DAT1.

There were two things:

1) using fill to erase all of flash (vfill was not sufficient)

2) started using SPI multi-buffer mode (see following example)

rxDescriptor sampleADCRegister(uint16 Register)
{
    rxDescriptor rval;
    rval.G_status = 0xBADD;
    rval.G_status2 = 0xBADD;

    // - initialize transfer groups
    MIBSPI4->TGCTRL[1U] = (uint32)((uint32)1U << 30U)            // oneshot
                        | (uint32)((uint32)0U << 29U)            // pcurrent reset
                        | (uint32)((uint32)TRG_ALWAYS << 20U)    // trigger event
                        | (uint32)((uint32)TRG_DISABLED << 16U)  // trigger source
                        | (uint32)((uint32)1U << 8U);            // start buffer

    tfg_command.TX_Entry.spi_TFG_Bits.BUFMODE = 4U;               // buffer mode
    tfg_command.TX_Entry.spi_TFG_Bits.CSHOLD = 1U;                // chip select hold
    tfg_command.TX_Entry.spi_TFG_Bits.WDEL = 0U;                  // enable WDELAY
    tfg_command.TX_Entry.spi_TFG_Bits.LOCK = 0U;                  // lock transmission
    tfg_command.TX_Entry.spi_TFG_Bits.DFSEL = DATA_FORMAT0;       // data format
    tfg_command.TX_Entry.spi_TFG_Bits.CSNR = DUALA2D_DEVICE_CS;   // chip select
    MIBSPI4RAM->tx[1].control = tfg_command.TX_Entry.spi_TFG_Config;

    tfg_command.TX_Entry.spi_TFG_Bits.BUFMODE = 4U;               // buffer mode
    tfg_command.TX_Entry.spi_TFG_Bits.CSHOLD = 0U;                // chip select hold
    tfg_command.TX_Entry.spi_TFG_Bits.WDEL = 1U;                  // enable WDELAY
    tfg_command.TX_Entry.spi_TFG_Bits.LOCK = 0U;                  // lock transmission
    tfg_command.TX_Entry.spi_TFG_Bits.DFSEL = DATA_FORMAT0;       // data format
    tfg_command.TX_Entry.spi_TFG_Bits.CSNR = DUALA2D_DEVICE_CS;   // chip select
    MIBSPI4RAM->tx[2].control = tfg_command.TX_Entry.spi_TFG_Config;
    // Fill the data to be transmitted in TXDATA field in TXRAM buffers.
    MIBSPI4RAM->tx[1].data = Register;
    MIBSPI4RAM->tx[2].data = 0xFFFF;

    // write 1 to clear the RX full flag
    MIBSPI4->FLG = 0x00000100;

    // Configure TGENA bit to enable the required Transfer groups. (In case of Trigger event always, setting
    // TGENA will trigger the transfer group).
    MIBSPI4->TGCTRL[1U] = MIBSPI4->TGCTRL[1U] | 0x80000000;
    // At the occurrence of the correct trigger event the Transfer group will be triggered and data gets
    // transmitted and received one after the other without any CPU intervention.
    // User can poll Transfer-group interrupt flag or wait for a transfer-completed interrupt to read and write
    // new data to the buffers.
    while( 0 == (MIBSPI4->FLG & 0x00000100) )
    {
        // wait until transfer complete
    }
    rval.G_status = MIBSPI4RAM->rx[1].data;
    rval.G_status2 = MIBSPI4RAM->rx[2].data;
    return rval;
}
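
The final wait loop above spins forever if the flag never sets, which would make a stalled transfer look like a hang. A bounded poll is one way to turn that into a detectable timeout. This is only a sketch: `wait_flag_set` is a hypothetical helper, and the poll limit is an arbitrary value that would need tuning for the real clock rates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Poll *reg until any bit in mask is set, giving up after max_polls reads.
 * Returns true if the flag was seen, false on timeout. */
static bool wait_flag_set(const volatile uint32_t *reg, uint32_t mask,
                          uint32_t max_polls)
{
    assert(reg != NULL);
    for (uint32_t i = 0U; i < max_polls; i++) {
        if ((*reg & mask) != 0U) {
            return true;
        }
    }
    return false;
}
```

In the function above, the loop could become something like `if (!wait_flag_set(&MIBSPI4->FLG, 0x00000100U, 100000U)) { /* report timeout */ }`, where 100000U is a placeholder.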

0 Charles Tsai over 8 years ago in reply to Laura Yindra1

TI__Guru**** 158945 points

Hi Laura,
Let me ask again whether you can reproduce the all fatal ECC fault on a standalone EVM board. At this point I think I will need your project to recreate the problem, provided the project can run on the EVM.
