TMS570 Cortex-R5F Core - All Fatal bus error events

I need to understand why I am seeing the Error Signaling Module (ESM) Group 2, channel 3 event "Cortex-R5F Core - All fatal bus error events" (event reference 0x71, event description bus ECC, EVNTBUSm bit 48). The error happens most often when I include SPI2 writes to DAT1 and reads from BUF and FLG. I am building the binary with Code Composer Studio 6.0.1 and working with a TMS570LC43xx.

  • Hi Laura,

    Could you provide some more information about your code, your compiler, your flashing tools, etc.?

    BTW, please refer to another thread which talks about "all fatal bus error events".

    Regards,

    QJ

  • Thank you for your response.

    I saw the referenced thread before I created my post, so I am overwriting 128 entries of VIM RAM (0xFFF82000U) with phantom function pointers.  

    My question is why I get a fatal bus error event only when using the DAT1, BUF and FLG registers of SPI2, after running for only 5 or 10 minutes. When I remove one function call to SPI2, the error doesn't happen for hours, if ever.

    I am compiling with Code Composer Studio 6.0.1.00040, compiler version TI V5.1.6, output format: eabi (ELF), device endianness: be32, and runtime support library C:/ti/ccsv6/tools/compiler/arm_5.1.6/lib/rtsv7R4_T_be_v3D16_eabi.lib. The ECC section was commented out in the linker file. I am flashing and running from Trace32.

  • Hello TI !

    Is it okay to use "rtsv7R4_T_be_v3D16_eabi.lib" as runtime support library for R5F-Core ?
  • Hi Laura,

    Can you please tell me whether you have filled all the holes in the flash with known data such as 0xFFFFFFFF, with the corresponding ECC calculated? You can do this with the vfill specifier; please see the example linker file below. I suspect some type of speculative access by the CPU to a hole location whose ECC is not programmed in the flash.

    One more suggestion, while you are still debugging your software: change the reset vector so that it branches to itself, as shown below. Sometimes, by the time you connect to the device, the code has already been running, and we don't know whether it has put the device into an unrecoverable state that might later prevent the emulator from connecting. With the reset vector branching to itself, the device stays at the reset vector after a reset, and you will need to use the debugger to manually move the program counter to _c_int00.

    from:

    resetEntry
            b   _c_int00

    to:

    resetEntry
            b   #-8

     
    MEMORY
    {
    /* USER CODE BEGIN (2) */
    /* USER CODE END */
        VECTORS (X)  : origin=0x00000000 length=0x00000020 fill=0xffffffff
        FLASH0  (RX) : origin=0x00000020 length=0x001FFFE0 vfill=0xffffffff
        FLASH1  (RX) : origin=0x00200000 length=0x00200000 vfill=0xffffffff
        STACKS  (RW) : origin=0x08000000 length=0x00001500
        RAM     (RW) : origin=0x08001500 length=0x0007EB00
    
    /* USER CODE BEGIN (3) */
    	ECC_VEC  (R)   : origin=0xf0400000            length=0x4             ECC={ input_range=VECTORS }
    	ECC_FLA0 (R)   : origin=0xf0400000 + 0x4      length=0x3FFFC         ECC={ input_range=FLASH0  }
    	ECC_FLA1 (R)   : origin=0xf0400000 + 0x40000  length=0x40000         ECC={ input_range=FLASH1  }
    /* USER CODE END */
    }
    
    /* USER CODE BEGIN (4) */
    /* USER CODE END */
    
    
    /*----------------------------------------------------------------------------*/
    /* Section Configuration                                                      */
    
    SECTIONS
    {
    /* USER CODE BEGIN (5) */
    /* USER CODE END */
        .intvecs : {} > VECTORS
        .text   align(8) : {} > FLASH0 | FLASH1
        .const  align(8) : {} > FLASH0 | FLASH1
        .cinit  align(8) : {} > FLASH0 | FLASH1
        .pinit  align(8) : {} > FLASH0 | FLASH1
        .bss     : {} > RAM
        .data    : {} > RAM
        .sysmem  : {} > RAM
    	
    
    /* USER CODE BEGIN (6) */
    /* USER CODE END */
    }
    
    /* USER CODE BEGIN (7) */
    /* USER CODE END */
    
    
    /*----------------------------------------------------------------------------*/
    /* Misc                                                                       */
    
    /* USER CODE BEGIN (8) */
    ECC {
    	algo_name : address_mask = 0xfffffff8
    	hamming_mask             = R4
    	parity_mask              = 0x0c
    	mirroring                = F021
    }
    /* USER CODE END */
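As a sanity check on the ECC_VEC, ECC_FLA0 and ECC_FLA1 lines above, the lengths and origins can be recomputed from the data regions. This is only a sketch: it assumes one ECC byte per 64-bit (8-byte) flash word, which is consistent with address_mask = 0xfffffff8 in the ECC block, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: each 8 bytes of flash data are protected by 1 byte of ECC. */
#define DATA_BYTES_PER_ECC_BYTE 8u

/* Hypothetical helper: ECC bytes needed for a data region of data_len bytes. */
static uint32_t ecc_length(uint32_t data_len)
{
    /* Regions must cover whole 64-bit words for the mapping to be exact. */
    assert(data_len % DATA_BYTES_PER_ECC_BYTE == 0u);
    return data_len / DATA_BYTES_PER_ECC_BYTE;
}

/* Hypothetical helper: ECC-space address that corresponds to a data address. */
static uint32_t ecc_origin(uint32_t ecc_base, uint32_t data_origin)
{
    return ecc_base + (data_origin / DATA_BYTES_PER_ECC_BYTE);
}
```

With the regions from the linker file, ecc_length(0x20) gives 0x4, ecc_length(0x1FFFE0) gives 0x3FFFC and ecc_length(0x200000) gives 0x40000, and ecc_origin(0xf0400000, 0x200000) gives 0xf0440000, matching the three ECC lines.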

     

     

  • In case anyone is curious,
    resetEntry
    b #-8
    means branch back to the same instruction. This statement is the same thing as while(1).

    If you do try vfill and your processor repeatedly raises the error a few minutes after boot-up, then change all vfill statements in your linker file to fill statements. A fill=0xFFFFFFFF statement only sets all empty flash to the default (erased) value. This is good, since fill will reach the areas of flash corrupted by the vfill statement. You will need to program flash with neither vfill nor fill if you are using Lauterbach to calculate the ECC.
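For anyone wondering why an offset of -8 lands on the same instruction: in ARM state, the PC reads as the address of the current instruction plus 8. The sketch below works through that arithmetic. It assumes the assembler treats the immediate in b #-8 as an offset from that PC value and that the reset entry executes in ARM state; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* In ARM state, the PC reads as the current instruction address + 8. */
#define ARM_PC_READ_OFFSET 8u

/* Hypothetical helper: target address of a PC-relative branch located at
 * instr_addr with the given byte offset. */
static uint32_t arm_branch_target(uint32_t instr_addr, int32_t offset)
{
    /* ARM-state branch offsets are multiples of 4 bytes. */
    assert((offset % 4) == 0);
    /* Unsigned arithmetic wraps, so a negative offset subtracts as expected. */
    return instr_addr + ARM_PC_READ_OFFSET + (uint32_t)offset;
}
```

For the reset vector at address 0, arm_branch_target(0, -8) is 0 again, so the core spins at resetEntry until the debugger moves it on.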

  • Hi Laura,
    It is not clear to me whether your problem was resolved by using either vfill or fill. If you use fill, the program image will be large: even if your program is small, filling the rest of the flash with 0xFFFFFFFF makes the flash loader program the image as if it occupied the entire 4MB. This is why I suggested vfill. Of course, if your program is already large, i.e. taking up almost the entire 4MB, then using fill will not add much extra programming time.
  • I see there are several methods of calculating the ECC in section 7.1.3 "F021 Flash Tools" of the technical reference manual. I am unclear why my ECC calculation method is not working with SPI registers and fill commands.
  • Hi Laura,
    nowECC, Uniflash and CCS should all calculate the same ECC for a given program image. Are you saying that you have tried all of them and still see the bus ECC error when accessing SPI registers?
  • Hi Laura,

    Could you also clarify one thing for me? You said the program runs for 5-10 minutes without a problem before it fails on an access to the SPI2 registers. During those first 5-10 minutes, while it was running normally, did your code ever access the SPI2 registers? Of course, you may have various sections of code that access SPI2. My question is about the particular code sequence whose SPI2 access causes the bus ECC error: does it fail the first time it executes, or has the same sequence executed before without any bus ECC error?

    Some time ago I had a bus ECC error in one of my programs for the LC4357. I was in an ISR, and as soon as I exited the ISR I saw the bus ECC error. The root cause was that the holes in the flash were not filled with proper ECC. When the CPU exits the ISR, it tries to prefetch code after the ISR. This is especially true if the CPU is refilling a cache line: the words in the same cache line as the last word of a code section are read by the cache controller as part of the cache refill operation. If the empty space after the ISR, or after the end of a section, does not have proper ECC, it can cause a bus ECC error. So I'm wondering whether you have a similar situation.
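To make the cache refill point concrete, here is a rough sketch that counts how many 64-bit flash words beyond the end of a code section fall inside the same cache line and would therefore be read by a line fill. It assumes the 32-byte L1 cache line of the Cortex-R5 and 64-bit ECC granularity in flash; the function names and the example address are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 32u /* Cortex-R5 L1 cache line length */
#define ECC_WORD_BYTES   8u  /* assumption: flash ECC covers 64-bit words */

/* Hypothetical helper: first address past the cache line that holds the last
 * byte of a section. section_end is the address one past the section. */
static uint32_t line_end_for_section(uint32_t section_end)
{
    assert(section_end > 0u);
    uint32_t last_byte = section_end - 1u;
    return (last_byte & ~(CACHE_LINE_BYTES - 1u)) + CACHE_LINE_BYTES;
}

/* Hypothetical helper: number of 64-bit words that lie entirely in the hole
 * after the section but are still read by a fill of its last cache line. */
static uint32_t hole_words_in_last_line(uint32_t section_end)
{
    uint32_t first_hole_word =
        (section_end + ECC_WORD_BYTES - 1u) & ~(ECC_WORD_BYTES - 1u);
    return (line_end_for_section(section_end) - first_hole_word) / ECC_WORD_BYTES;
}
```

For a section that ends at 0x1008, the last cache line spans 0x1000 to 0x101F, so three 64-bit words of the hole (0x1008, 0x1010, 0x1018) are fetched along with the code. If their ECC was never programmed, that fetch is where a bus ECC error could come from.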

  • I did add the ECC code from your example to my linker file, but that problem with fill and vfill only went away after I re-installed Code Composer Studio. 

    The SPI registers are accessed once every 0.01 seconds after 1 second of boot time.  This could be coincident with the R5F all fatal bus error events because SPI register accesses were loops in the newest files added to this project.

    I'm still seeing the issue with SPI register access by the CPU causing Cortex-R5F Core - All fatal bus error events

  • Hi Laura,

    So the bus ECC error still occurs after using fill or vfill, correct?

    Sorry to ask some more questions:

    1. Do you know for sure that the bus error happens only after you access the SPI registers, or can it happen slightly before the SPI access?

    2. Since the SPI registers are accessed every 0.01 seconds, do you know whether the bus error happens every 0.01 seconds? Can you clear the bus error flag and the nERROR pin when you see the error, and find out whether it recurs every 0.01 seconds due to the same SPI register access?

    3. Can you send me the function that you use to access the SPI registers?

  • 1) Yes, the error still occurs, probably because adding ECC_FLA1 (R) : origin=0xf0400000 + 0x40000 length=0x40000 ECC={ input_range=FLASH1 } to the linker file creates a .bin file that is larger than one bank of flash.

    2) The error happens once, after a minute or five.

    3)
    uint16 sampleADCRegister(uint16 Register)
    {
        uint32 *ptr;
        uint16 G_status = 0xBADD;
        uint16 G_status2 = 0xBADD;

        TxData.Command.TX_Entry.spi_TX_Bits.CSHOLD = 1;
        TxData.Command.TX_Entry.spi_TX_Bits.WDEL = 0;
        TxData.Command.TX_Entry.spi_TX_Bits.DFSEL = DATA_FORMAT0;
        TxData.Command.TX_Entry.spi_TX_Bits.CSNR = DUALA2D_DEVICE_CS;
        TxData.Command.TXDATA = (uint16)Register;
        ptr = (uint32 *) &TxData.Command;

        TxData.reserved0.TX_Entry.spi_TX_Bits.CSHOLD = 0;
        TxData.reserved0.TX_Entry.spi_TX_Bits.WDEL = 1;
        TxData.reserved0.TX_Entry.spi_TX_Bits.DFSEL = DATA_FORMAT0;
        TxData.reserved0.TX_Entry.spi_TX_Bits.CSNR = DUALA2D_DEVICE_CS;
        TxData.reserved0.TXDATA = 0xFFFF;

        SPI_BUS->DAT1 = *ptr;

        while (!(SPI_BUS->FLG & 0x200))
        {
            // Wait for TXINTFLG to be set, indicating data was transmitted
            timer = os_timer_is_done();
        }

        ptr = (uint32 *) &TxData.reserved0;
        SPI_BUS->DAT1 = *ptr;

        while (!(SPI_BUS->FLG & 0x200))
        {
            // Wait for TXINTFLG to be set, indicating data was transmitted
        }

        while (!(SPI_BUS->FLG & 0x100))
        {
            // Wait for RXINTFLG to be set, indicating data was received
        }

        G_status = (uint16)((SPI_BUS->BUF & 0x00001FFE) >> 1);

        while (!(SPI_BUS->FLG & 0x100))
        {
            // Wait for RXINTFLG to be set, indicating data was received
        }
        G_status2 = (uint16)((SPI_BUS->BUF & 0x00001FFE) >> 1);

        return G_status2;
    }
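The magic numbers in the function above can be read as follows: 0x200 and 0x100 are the FLG bits that the comments call TXINTFLG and RXINTFLG, and (BUF & 0x00001FFE) >> 1 extracts bits 12:1 of the received word as a 12-bit value. The sketch below only restates that arithmetic on plain integers so it can be checked off-target; the macro and function names are hypothetical and it does not touch any hardware register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Masks copied from the posted function; the names are hypothetical. */
#define FLG_RXINT_MASK 0x00000100u /* bit 8: data received */
#define FLG_TXINT_MASK 0x00000200u /* bit 9: data transmitted */
#define BUF_FIELD_MASK 0x00001FFEu /* bits 12:1 of the received word */

static bool flg_tx_done(uint32_t flg) { return (flg & FLG_TXINT_MASK) != 0u; }
static bool flg_rx_done(uint32_t flg) { return (flg & FLG_RXINT_MASK) != 0u; }

/* Same computation as (SPI_BUS->BUF & 0x00001FFE) >> 1 in the post. */
static uint16_t buf_to_field(uint32_t buf)
{
    uint16_t field = (uint16_t)((buf & BUF_FIELD_MASK) >> 1);
    assert(field <= 0x0FFFu); /* 12 mask bits remain after the shift */
    return field;
}
```

For example, a BUF value of 0x00001FFE yields 0x0FFF, and any upper status bits in BUF are masked off before the shift.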
  • Hi Laura,
    Can this problem be reproduced on the EVM board, or only on your custom board with different sensors connected to it? If it is reproducible on the EVM, I can try to reproduce it on my side. You can send me the project privately.
  • Charles,

    Will moving the SPI implementation to Transfer Groups and away from DMA compatibility mode allow me to process interrupts during SPI transactions without generating an ESM Group 2 ECC bus error?

    This question is from Laura.

    Regards,
    Aaron
  • Real-time clock interrupts are causing SPI compatibility mode DMA to hang and never complete. Real-time clock interrupts are also causing CPU reads and writes to SPI DAT1 and SPIBUF to generate an ESM Group 2 ECC bus error (Cortex-R5F Core - All fatal bus error events).

    Do you recommend SPI Transfer Groups or Errata VIM#28 Workaround?

  • Hi Laura, Aaron,

      I don't think your problem is related to VIM#28. VIM#28 can only occur when you have a "real" ECC error in the VIM RAM. If you can see multiple parts with the same issue on your bench setup then it is extremely unlikely what you are facing today is related to VIM#28.

      As far as SPI transfer groups, I think you are asking to change to MultiBuffer SPI mode, correct? You can try it but I'd like to understand the root cause of the problem. Without understanding the root cause it is hard to know if the problem will occur in other circumstances. 

      This is the first time you have mentioned using the DMA. Can you please confirm whether you are using the DMA? If so, how are you using it?

      Is the problem reproducible in the EVM? Or is this only possible with your full ECU?  

  • For some reason, the real-time clock interrupt was causing an ECC all fatal bus error when there were reads and writes to SPI BUF and SPI DAT1.

    I made two changes:

    1) used fill to erase all of flash (vfill was not sufficient)

    2) started using SPI multi-buffer mode (see the following example)

    rxDescriptor sampleADCRegister(uint16 Register)
    {
        rxDescriptor rval;
        rval.G_status = 0xBADD;
        rval.G_status2 = 0xBADD;

        // - initialize transfer groups
        MIBSPI4->TGCTRL[1U] = (uint32)((uint32)1U << 30U)            // oneshot
                            | (uint32)((uint32)0U << 29U)            // pcurrent reset
                            | (uint32)((uint32)TRG_ALWAYS << 20U)    // trigger event
                            | (uint32)((uint32)TRG_DISABLED << 16U)  // trigger source
                            | (uint32)((uint32)1U << 8U);            // start buffer

        tfg_command.TX_Entry.spi_TFG_Bits.BUFMODE = 4U;                // buffer mode
        tfg_command.TX_Entry.spi_TFG_Bits.CSHOLD = 1U;                 // chip select hold
        tfg_command.TX_Entry.spi_TFG_Bits.WDEL = 0U;                   // enable WDELAY
        tfg_command.TX_Entry.spi_TFG_Bits.LOCK = 0U;                   // lock transmission
        tfg_command.TX_Entry.spi_TFG_Bits.DFSEL = DATA_FORMAT0;        // data format
        tfg_command.TX_Entry.spi_TFG_Bits.CSNR = DUALA2D_DEVICE_CS;    // chip select
        MIBSPI4RAM->tx[1].control = tfg_command.TX_Entry.spi_TFG_Config;

        tfg_command.TX_Entry.spi_TFG_Bits.BUFMODE = 4U;                // buffer mode
        tfg_command.TX_Entry.spi_TFG_Bits.CSHOLD = 0U;                 // chip select hold
        tfg_command.TX_Entry.spi_TFG_Bits.WDEL = 1U;                   // enable WDELAY
        tfg_command.TX_Entry.spi_TFG_Bits.LOCK = 0U;                   // lock transmission
        tfg_command.TX_Entry.spi_TFG_Bits.DFSEL = DATA_FORMAT0;        // data format
        tfg_command.TX_Entry.spi_TFG_Bits.CSNR = DUALA2D_DEVICE_CS;    // chip select
        MIBSPI4RAM->tx[2].control = tfg_command.TX_Entry.spi_TFG_Config;

        // Fill the data to be transmitted in TXDATA field in TXRAM buffers.
        MIBSPI4RAM->tx[1].data = Register;
        MIBSPI4RAM->tx[2].data = 0xFFFF;

        // write 1 to clear the RX full flag
        MIBSPI4->FLG = 0x00000100;

        // Configure TGENA bit to enable the required Transfer groups. (In case of
        // Trigger event always, setting TGENA will trigger the transfer group.)
        MIBSPI4->TGCTRL[1U] = MIBSPI4->TGCTRL[1U] | 0x80000000;

        // At the occurrence of the correct trigger event the Transfer group will be
        // triggered and data gets transmitted and received one after the other
        // without any CPU intervention.
        // User can poll Transfer-group interrupt flag or wait for a
        // transfer-completed interrupt to read and write new data to the buffers.
        while (0 == (MIBSPI4->FLG & 0x00000100))
        {
            // wait until transfer complete
        }
        rval.G_status = MIBSPI4RAM->rx[1].data;
        rval.G_status2 = MIBSPI4RAM->rx[2].data;
        return rval;
    }
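The TGCTRL word built at the top of that function is easier to check if the shifts are given names. The sketch below packs the same fields using the bit positions from the post (30, 29, 20, 16, 8, plus bit 31 for TGENA). The macro and function names are hypothetical, the 4-bit width of the trigger event field is an assumption, and the numeric values used for TRG_ALWAYS (7) and TRG_DISABLED (0) are assumptions that should be checked against your own mibspi.h.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions taken from the shifts used in the post. */
#define TG_TGENA_BIT     0x80000000u
#define TG_ONESHOT_SHIFT 30u
#define TG_PRST_SHIFT    29u
#define TG_TRIGEVT_SHIFT 20u
#define TG_TRIGSRC_SHIFT 16u
#define TG_PSTART_SHIFT  8u

/* Assumed enum values; verify against the driver header in use. */
#define EXAMPLE_TRG_ALWAYS   7u
#define EXAMPLE_TRG_DISABLED 0u

/* Hypothetical helper: build a TGCTRL word with TGENA still cleared. */
static uint32_t tgctrl_pack(uint32_t oneshot, uint32_t prst, uint32_t event,
                            uint32_t source, uint32_t pstart)
{
    assert(oneshot <= 1u && prst <= 1u);
    assert(event <= 0xFu);   /* assumed 4-bit field */
    assert(source <= 0xFu);  /* bits 19:16, below the event field */
    assert(pstart <= 0xFFu); /* bits 15:8, below the source field */
    return (oneshot << TG_ONESHOT_SHIFT) | (prst << TG_PRST_SHIFT)
         | (event << TG_TRIGEVT_SHIFT) | (source << TG_TRIGSRC_SHIFT)
         | (pstart << TG_PSTART_SHIFT);
}
```

Under those assumptions, tgctrl_pack(1, 0, EXAMPLE_TRG_ALWAYS, EXAMPLE_TRG_DISABLED, 1) is 0x40700100, and OR-ing in TG_TGENA_BIT to start the group gives 0xC0700100.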

  • Hi Laura,
    Let me ask again whether you can reproduce the all fatal ECC fault on a standalone EVM board. At this point I think I will need your project to recreate the problem on my side, provided the project can run on the EVM.