RTOS/AMIC110: Reducing boot time

Part Number: AMIC110
Other Parts Discussed in Thread: TMDXICE110, UNIFLASH

Tool/software: TI-RTOS

Hi,

Booting the AMIC takes really long, around 10 seconds.
I have now looked into why it takes so long. I see that the flash is read at around 25 MHz, but only one byte is read every 6 µs: the AMIC reads 8 bits at 25 MHz and then pauses for 6 µs before it reads the next eight bits.

It doesn't make much sense to me that the AMIC waits 6 µs before it reads the next byte. It only needs to write the data to its memory, so that shouldn't be a big deal, right?
Is there anything I can change in the bootloader to make it read the image faster?

The same question applies to writing the flash from the PC. Is there a way to make this faster? For me it takes approximately 30 minutes to write the image.

Best Regards,
Stefan

  • Stefan,

    How big is your application image? I am assuming that you are using the SPI flash on the AMIC110 ICE to boot, or is this custom hardware? I am checking internally whether there are ways to tune the MMU/cache settings to improve app load times. On a custom board you could choose a faster boot medium such as GPMC NOR/NAND to speed up boot, but I will need to see what we can provide in terms of boot and flash time optimization for SPI. Have you also checked whether flashing and read back use the 25 MHz clock? I would imagine the SPI flash writer uses a slower clock setting.

    Can you please also indicate the target boot time requirement for your application from SPI?

    Regards,
    Rahul
  • Hi Rahul,

    Thanks for your quick reply. Yes, I'm using the TMDXICE110 and booting the system from its Winbond SPI flash.

    The image is around 3 MByte and takes 20 seconds to boot. I just measured it again.
    To give you an idea of what I'm talking about, here are some pictures.

    This is almost the whole boot process on the SPI bus. The signal names are wrong; see the pictures further down for the right labeling.

    So it looks like the boot process is split into two phases. The second one takes much longer than the first, so I think the main image is transferred during the second phase.

    To see what is going on while the flash is read, I took two more pictures, zoomed into the second phase:

    So what you see here is that the data is transferred at 25 MHz, as intended. But between two bytes the AMIC waits 6 µs for nothing... Why is it doing that?

    If it waited only 1 µs, that would speed up the process from 20 s to approximately 3 s. That would be OK for me; 1 s would be even better.
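The timing argument in this post can be checked with a small calculation. The sketch below is only an illustration and assumes an image of exactly 3 MiB (3145728 bytes), a 25 MHz SPI clock, and a fixed idle gap after every byte; the function names are hypothetical and not part of any TI SDK.

```c
#include <assert.h>
#include <stdint.h>

/* Time to clock one byte (8 bits) over SPI, in nanoseconds. */
uint64_t byte_time_ns(uint64_t spi_clock_hz)
{
    assert(spi_clock_hz > 0);
    return (8ULL * 1000000000ULL) / spi_clock_hz;
}

/* Total read time in milliseconds when each byte is followed by an idle gap.
 * image_bytes, spi_clock_hz and gap_ns are assumed inputs, not measured values. */
uint64_t read_time_ms(uint64_t image_bytes, uint64_t spi_clock_hz, uint64_t gap_ns)
{
    uint64_t per_byte_ns = byte_time_ns(spi_clock_hz) + gap_ns;
    return (image_bytes * per_byte_ns) / 1000000ULL;
}
```

Under these assumptions, `read_time_ms(3145728, 25000000, 6000)` gives about 19.9 s, which matches the measured 20 s boot, and a 1 µs gap gives roughly 4.2 s. With no gap at all, the raw transfer would take about 1 s.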

    I had a look at the write_flash program and couldn't see any wait command. Maybe you have an idea?

    In my opinion it is not the fault of the flash, but a problem in the AMIC boot process.

    Best Regards,
    Stefan

  • Since several days have passed, here is a more detailed question.

    I added a short delay to the bootloader's SPI function to see what is going on.

            for(i=0;i<20;i++) {}

    When I count to 100, the gap between bytes grows by 70 µs(!); with a count of 20 it grows by 19 µs(!). So it looks like the processor is only running at about 1 MHz. How is that possible? I checked the PLL settings, but they look OK to me.
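The two measurements above can be turned into a per-iteration cost. This is a minimal sketch using only the numbers quoted in this post; the function names are hypothetical. It also declares the loop counter `volatile`, because an optimizing compiler is allowed to remove an empty loop with a plain counter, which would distort such a measurement.

```c
#include <assert.h>

/* Busy-wait loop; 'volatile' keeps an optimizing compiler from deleting it. */
void delay_loop(unsigned int count)
{
    volatile unsigned int i;
    for (i = 0; i < count; i++) { }
}

/* Cost of one loop iteration in microseconds, derived from two
 * (loop count, added delay) measurements taken with different counts. */
double us_per_iteration(unsigned int count_a, double delay_a_us,
                        unsigned int count_b, double delay_b_us)
{
    assert(count_a != count_b);
    return (delay_a_us - delay_b_us) / ((double)count_a - (double)count_b);
}
```

With the values from the post, `us_per_iteration(100, 70.0, 20, 19.0)` is about 0.64 µs, or roughly 1.6 million iterations per second. Each iteration is more than one instruction, so this is an iteration rate rather than the core clock, but it is still far below what a core running at several hundred MHz would be expected to deliver.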

    Best Regards

  • Stefan,

    Are you using the default PLL setup code from Processor SDK RTOS, or have you changed any settings? If you have modified the settings, can you share the PLL settings for us to review?

    The team internally pointed me to some patches that were used in Industrial SDK to optimize the Cache and MMU settings that improved the performance but I am trying to figure out how this can be applied to code in Processor SDK RTOS. I will provide the patch here if there is a simple way to apply it to the current baseline.

    Regards
    Rahul
  • I am just using the default bootloader from the PDK StarterWare. At the moment I am using the default clock settings that are preset in the bootloader. I only experimented a little with the settings.

    But the bootloader inits the whole board, so it should take 1us per command, doesn't it?
  • Stefan,

    Earlier in the year, we measured the boot times of a DDR-less EtherCAT app using the size-optimized bootloader (from Processor SDK RTOS 4.1) and found the following results:

    Size of the SBL/MLO:                                              9 KB
    Size of the application:                                          108 KB
    Time taken by RBL to load SBL (ARM at 500 MHz):                   12.3 ms
    Time taken by SBL to load app + ECAT firmware (ARM at 600 MHz):   343.8 ms

    Based on this, I believe that a 3 MB image would take about 10 seconds, since scaling the numbers above linearly gives roughly that figure. However, I looked at your original post and find it odd that you indicate that flashing the app and MLO to SPI and reading them back takes up to 30 minutes. Is this a typo? Did you mean 30 seconds? If you meant 30 minutes for flashing, can you confirm that you used the GEL file to configure the SoC clock and that the file was read over the emulator? Did you check whether the majority of the time was spent reading the file over the emulator, or in the erase and flash write operations? The flash operation involves reading the images over the emulator, erasing, writing, and reading back to validate, so I can see how this can take up to 30 seconds.
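The linear scaling mentioned above can be written out as follows. This is only a sketch of the arithmetic: it assumes load time grows in proportion to image size and takes 3 MB as 3072 KB; the function name is hypothetical.

```c
#include <assert.h>

/* Scale a measured load time linearly to a different image size.
 * ref_ms and ref_kb come from the measurement; image_kb is the target size. */
double estimate_load_ms(double ref_ms, double ref_kb, double image_kb)
{
    assert(ref_kb > 0.0);
    return ref_ms * (image_kb / ref_kb);
}
```

With the measured 343.8 ms for a 108 KB application, `estimate_load_ms(343.8, 108.0, 3072.0)` comes to about 9779 ms, which is where the figure of roughly 10 seconds comes from.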

    Internally it was indicated to me that there was a fast boot patch to support the TI design, which for some reason was reverted in subsequent Industrial SDK/Processor SDK RTOS releases:

    http://www.ti.com/lit/ug/tidual8a/tidual8a.pdf

    Creating a patch for PRSDK with those updates is going to need more time, as the bootloader code has evolved to support multiple platforms and more features.

    Can you try some simpler fixes before we go down that route? Let us first start with updating the PLL settings in the bootloader to check for improvements. The first update to try would be to match the AMIC110 PLL setting with AM335x. I have highlighted the change in PLL.

    if(BOARD_AMIC110 == boardId)
    {
    mpuDpllMult = 24U;
    mpuDpllPostDivM2 = 1U;

    The second thing to try would be to measure the boot time with MMU and cache enabled, as shown in the file provided below. You can trace the macro ENABLEMMU_CACHE in the file sbl_main.c. Note that the file provided can't be used as is, since it is from a different SDK baseline; you can extract the code enclosed by #ifdef ENABLEMMU_CACHE.

    /**
     * \file  bl_main.c
     *
     * \brief Implements main function for StarterWare bootloader
     *
    */
    
    /*
    * Copyright (C) 2012 Texas Instruments Incorporated - http://www.ti.com/
    *
    *  Redistribution and use in source and binary forms, with or without
    *  modification, are permitted provided that the following conditions
    *  are met:
    *
    *    Redistributions of source code must retain the above copyright
    *    notice, this list of conditions and the following disclaimer.
    *
    *    Redistributions in binary form must reproduce the above copyright
    *    notice, this list of conditions and the following disclaimer in the
    *    documentation and/or other materials provided with the
    *    distribution.
    *
    *    Neither the name of Texas Instruments Incorporated nor the names of
    *    its contributors may be used to endorse or promote products derived
    *    from this software without specific prior written permission.
    *
    *  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    *  "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    *  LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    *  A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    *  OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    *  SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    *  LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    *  DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    *  THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    *  (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    *  OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
    */
    
    //#include "uartStdio.h"
    #include "bl_copy.h"
    #include "bl_platform.h"
    #include "bl.h"
    #ifdef  XIP_NOR
    #include "bl_norexec.h"
    #include "soc_AM335x.h"
    #endif
    #include "hw_types.h"
    #include "hw_cm_per.h"
    #include "gpio_v2.h"
    #include "hw_control_AM335x.h"
    #include "hw_cm_wkup.h"
    
    #ifdef ENABLEMMU_CACHE
    #include "mmu.h"
    #include "cache.h"
    #ifdef __TMS470__
    #pragma DATA_ALIGN(pageTable, 16384);
    static volatile unsigned int pageTable[4*1024];
    #endif
    #endif
    
    
    #define APPL_BUILD_VER  "1.0.8"
    
    /*
     *  1.0.8 - Version released with ISDK 1.1.0.5 (Fast Boot)
     *  1.0.7 - Version released with ISDK 1.1.0.3
     *  1.0.6 - Starterware 2.0.1.1 merged
     *  1.0.5 - Version released with ISDK 1.1.0.1
     *  1.0.4 - Version released with ISDK 1.0.0.6
     *  1.0.3 - Version released with ISDK 1.0.0.5
     */
    
    /******************************************************************************
    **                    External Variable Declarations
    *******************************************************************************/
    extern char *deviceType;
    #ifndef FASTBOOT_MODE
    extern void bl_UARTPuts(char* buffer);
    extern void bl_UARTInit(void);
    #else
    #define bl_UARTPuts(x) ;
    #endif
    
    
    /******************************************************************************
    **                     Local Function Declarations
    *******************************************************************************/
    
    static void (*appEntry)();
    
    
    /******************************************************************************
    **                     Global Variable Definitions
    *******************************************************************************/
    
    unsigned int entryPoint;
    unsigned int DspEntryPoint;
    
    
    /******************************************************************************
    **                     Global Function Definitions
    *******************************************************************************/
    #if (defined  XIP_NOR) && (!defined HW_ICE_V2)
    extern void* ram_code_load_start;
    extern void* ram_code_size;
    extern void* ram_code_run_start;
    
    void copy_ram_code()
    {
    	unsigned int cnt = 0;
    	unsigned int len = (unsigned int)&ram_code_size;
    	unsigned int * src = (unsigned int *)&ram_code_load_start;
    	unsigned int * dst = (unsigned int *)&ram_code_run_start;
    
    
    	for(cnt = 0 ; cnt < (len/4); cnt++)
    	{
    		dst[cnt] = src[cnt];
    	}
    }
    
    #endif
    #ifdef ENABLEMMU_CACHE
    
    #define START_ADDR_DDR                     (0x80000000)
    #define START_ADDR_DEV                     (0x44000000)
    #define START_ADDR_OCMC                    (0x402f0000)
    #define NUM_SECTIONS_DDR                   (512)
    #define NUM_SECTIONS_DEV                   (1280)
    #define NUM_SECTIONS_OCMC                  (2)
    
    
    /*
    ** Function to setup MMU. This function Maps three regions (1. DDR
    ** 2. OCMC and 3. Device memory) and enables MMU.
    */
    void MMUConfigAndEnable(void)
    {
        /*
        ** Define DDR memory region of AM335x. DDR can be configured as Normal
        ** memory with R/W access in user/privileged modes. The cache attributes
        ** specified here are,
        ** Inner - Write through, No Write Allocate
        ** Outer - Write Back, Write Allocate
        */
        REGION regionDdr = {
                            MMU_PGTYPE_SECTION, START_ADDR_DDR, NUM_SECTIONS_DDR,
                            MMU_MEMTYPE_NORMAL_NON_SHAREABLE(MMU_CACHE_WT_NOWA,
                                                             MMU_CACHE_WB_WA),
                            MMU_REGION_NON_SECURE, MMU_AP_PRV_RW_USR_RW,
                            (unsigned int*)pageTable
                           };
        /*
        ** Define OCMC RAM region of AM335x. Same Attributes of DDR region given.
        */
        REGION regionOcmc = {
                             MMU_PGTYPE_SECTION, START_ADDR_OCMC, NUM_SECTIONS_OCMC,
                             MMU_MEMTYPE_NORMAL_NON_SHAREABLE(MMU_CACHE_WT_NOWA,
                                                              MMU_CACHE_WB_WA),
                             MMU_REGION_NON_SECURE, MMU_AP_PRV_RW_USR_RW,
                             (unsigned int*)pageTable
                            };
    
        /*
        ** Define Device Memory Region. The region between OCMC and DDR is
        ** configured as device memory, with R/W access in user/privileged modes.
        ** Also, the region is marked 'Execute Never'.
        */
        REGION regionDev = {
                            MMU_PGTYPE_SECTION, START_ADDR_DEV, NUM_SECTIONS_DEV,
                            MMU_MEMTYPE_STRONG_ORD_SHAREABLE,
                            MMU_REGION_NON_SECURE,
                            MMU_AP_PRV_RW_USR_RW  | MMU_SECTION_EXEC_NEVER,
                            (unsigned int*)pageTable
                           };
    
        /* Initialize the page table and MMU */
        MMUInit((unsigned int*)pageTable);
    
        /* Map the defined regions */
        //MMUMemRegionMap(&regionDdr);
        MMUMemRegionMap(&regionOcmc);
        MMUMemRegionMap(&regionDev);
    
        /* Now Safe to enable MMU */
        MMUEnable((unsigned int*)pageTable);
    }
    
    
    #endif
    
    
    
    #ifdef MEASURE_BOOT_TIME
    void EnableBootTimeMessurement()
    {
            //Powerup GPIO module 
        if( BOOTTIME_GPIO_BASE_ADDR == SOC_GPIO_0_REGS )
        {
            HWREG( SOC_CM_WKUP_REGS + CM_WKUP_GPIO0_CLKCTRL )  |=
                    CM_WKUP_GPIO0_CLKCTRL_MODULEMODE_ENABLE;
        }
        if( BOOTTIME_GPIO_BASE_ADDR == SOC_GPIO_1_REGS)
        {
            HWREG( SOC_PRCM_REGS + CM_PER_GPIO1_CLKCTRL )      |=
                    CM_PER_GPIO1_CLKCTRL_MODULEMODE_ENABLE;
        }
    	if( BOOTTIME_GPIO_BASE_ADDR == SOC_GPIO_2_REGS)
        {
            HWREG( SOC_PRCM_REGS + CM_PER_GPIO2_CLKCTRL )      |=
                    CM_PER_GPIO2_CLKCTRL_MODULEMODE_ENABLE;
        }
        if( BOOTTIME_GPIO_BASE_ADDR == SOC_GPIO_3_REGS)
        {
            HWREG( SOC_PRCM_REGS + CM_PER_GPIO3_CLKCTRL )      |=
                    CM_PER_GPIO3_CLKCTRL_MODULEMODE_ENABLE;
        }
    
    
    
        GPIOModuleEnable(BOOTTIME_GPIO_BASE_ADDR);
        //Mux the pin
        HWREG( SOC_CONTROL_REGS + CONTROL_CONF_UART_TXD(4)) = (7);
        //Set pin direction
        GPIODirModeSet(BOOTTIME_GPIO_BASE_ADDR,BOOTTIME_GPIO_NUM,GPIO_DIR_OUTPUT);
        //Enable pin
        GPIOPinWrite(BOOTTIME_GPIO_BASE_ADDR,BOOTTIME_GPIO_NUM,GPIO_PIN_LOW);
    
    }
    #endif
    
    /*
     * \brief This function initializes the system and copies the image. 
     *
     * \param  none
     *
     * \return none 
    */
    int main(void)
    {
    #ifdef ENABLEMMU_CACHE
        volatile int itr = 0;
        MMUConfigAndEnable();
        CacheEnable(CACHE_ALL);
    #endif
    #ifdef MEASURE_BOOT_TIME
        EnableBootTimeMessurement();
    #endif
        /* Configures PLL and DDR controller*/
        BlPlatformConfig();
        *(unsigned int*)0x44E00508 = 2;			// select 32kHz
    
        /* UART Initialization */
    #ifndef FASTBOOT_MODE
        bl_UARTInit();
    #endif
        bl_UARTPuts("\n\r*** StarterWare ");
        //bl_UARTPuts(deviceType);
        bl_UARTPuts(" Boot Loader. Build - 1.0.7 - ");
        #ifdef HW_ICE_V2
        bl_UARTPuts(" for ICE V2 ");
        #elif defined (HW_ICE)
        bl_UARTPuts(" for ICE V1 ");
        #elif defined (HW_IDK)
        bl_UARTPuts(" for IDK ");
        #endif
        
        #ifdef XIP_NOR
        bl_UARTPuts(" to run from NOR ");
        #endif
    
    	//bl_UARTPuts(APPL_BUILD_VER);
    
        /* Copies application from non-volatile flash memory to RAM */
    
        if(ImageCopy() != E_FAIL)
        {
            bl_UARTPuts(" Image Copy Successful, Executing Application..\n\r");
    
            /* Do any post-copy config before leaving boot loader */
            BlPlatformConfigPostBoot();
    
            /* Giving control to the application */
            appEntry = (void (*)(void)) entryPoint;
    #ifdef MEASURE_BOOT_TIME
            GPIOPinWrite(BOOTTIME_GPIO_BASE_ADDR,BOOTTIME_GPIO_NUM,GPIO_PIN_LOW);
    #endif
        #ifdef ENABLEMMU_CACHE
            CacheDataCleanInvalidateAll();
            CacheInstInvalidateAll();
    
            CacheDisable(CACHE_ALL);
            MMUDisable();
            //for(itr = 0; itr < 1000000 ; itr++);
        #endif
            (*appEntry)( );
        }
    
    #ifdef XIP_NOR
        #if (defined HW_ICE) && (!defined HW_ICE_V2)
        copy_ram_code();
        #endif
        find_NORImage_exec();
    #endif
    #ifdef MEASURE_BOOT_TIME
        GPIOPinWrite(BOOTTIME_GPIO_BASE_ADDR,BOOTTIME_GPIO_NUM,GPIO_PIN_LOW);
    #endif    
        bl_UARTPuts(" Application image not found, Aborting..\n\r");
        BootAbort();
    
    }
    
    void BootAbort(void)
    {
    #ifdef ENABLEMMU_CACHE
        CacheDataCleanInvalidateAll();
        CacheInstInvalidateAll();
    
        CacheDisable(CACHE_ALL);
        MMUDisable();
    #endif
        while(1);
    }
    
    /******************************************************************************
    **                              END OF FILE
    *******************************************************************************/

    Please note that when you apply the patch and rebuild the MLO, you don't need to change the app image on the flash, so that should save you some time as you are only replacing about 45 KB on the flash. Let us know if you see improvements in your boot times.

    I will continue to look at SPI flashing tool and see if I can provide similar updates based on your response.

    Regards,

    Rahul 

  • Hi Rahul,

    I really meant 30 minutes. I measured it again, and it is about 90 minutes including verification.

    I use the emulator to flash, and reading the file is really fast. Only writing the flash takes so long. It looks like it is the same problem. I will try this tomorrow morning and report whether it helped.

    Maybe you can have a look at my other topic about the I2C bus as well.
  • Yes, we were able to obtain similar results by using AMIC110 based SPI diagnostics. I will look to see if there is something glaring about the flashing utility but I agree with your observation. It appears that the flash writer is performing SPI transfers 1 byte at a time and operating in polling mode where it keeps writing and reading 1 byte to MCSPI register until done. This is quite inefficient and seems to be slowing the flashing process.

    There is potential for improvement using interrupt mode and EDMA, as is done in the MCSPI examples:
    pdk_am335x_1_0_11\packages\ti\starterware\examples\mcspi\flash
    or even pdk_am335x_1_0_11\packages\ti\drv\spi\test\mcspi_serial_flash\src

    However, both examples perform only a single sector write, so the code will need to be modified to read the image from the host, compute its size in sectors, and program all the sectors. I have contacted the development team, and they are currently preparing a release of the Uniflash tool with coverage for the RTOS use case.
    processors.wiki.ti.com/.../Sitara_Uniflash_Quick_Start_Guide
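The step of computing the image size in sectors, mentioned above, could look roughly like this. It is a sketch under stated assumptions, not code from the PDK examples: the helper names are hypothetical, and the 4096-byte sector size used in the usage note is an assumption that should be checked against the flash datasheet.

```c
#include <assert.h>
#include <stdint.h>

/* Number of sectors needed to hold image_bytes, rounding up. */
uint32_t sectors_needed(uint32_t image_bytes, uint32_t sector_bytes)
{
    assert(sector_bytes > 0);
    return image_bytes / sector_bytes + (image_bytes % sector_bytes != 0);
}

/* Bytes to program into sector 'index'. The last sector may be partial,
 * and an index past the end of the image yields 0. */
uint32_t sector_payload(uint32_t image_bytes, uint32_t sector_bytes, uint32_t index)
{
    assert(sector_bytes > 0);
    if (index >= sectors_needed(image_bytes, sector_bytes)) {
        return 0;
    }
    uint32_t offset = index * sector_bytes;
    uint32_t remaining = image_bytes - offset;
    return remaining < sector_bytes ? remaining : sector_bytes;
}
```

For a 3 MiB image and an assumed 4096-byte sector, `sectors_needed(3145728, 4096)` is 768. A flashing loop would then iterate over sector indices 0 to 767 and pass `sector_payload(...)` bytes to whatever erase and write routine the chosen example provides.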

    I have asked for an early release candidate to test the flash timing and work with Frank to provide the update if it satisfies your requirement.

    Regards,
    Rahul
  • Hi,
    is there any new information?
    I tried the MMU init, but the processor just hangs. It looks like the registers are not defined correctly.
    Do you have any other example for enabling the MMU and cache?
  • Hi Rahul,

    I'm still trying to enable the MMU and cache. Do you have any idea why it doesn't work?
    Or do you have any other example that enables them for the AM335x or AMIC110?

    Best Regards,
    Stefan
  • Stefan,

    This issue is currently being assigned to the development team to re-enable the fast boot patch. For enabling MMU and cache in the bootloader, the only other code that I can provide for reference is:
    pdk_am335x_1_0_10\packages\ti\starterware\examples\cache_mmu

    You had indicated that your application takes about 10 seconds to boot. Can you indicate your target boot time requirement for an image of 4 MB?

    For flashing, the Processor SDK RTOS release 5.x at the end of this month will support a unified flashing tool for AMIC110 similar to processors.wiki.ti.com/.../Sitara_Uniflash_Quick_Start_Guide

    We have asked the team to evaluate flash times with that tool, and they have indicated that it will be several times faster than the CCS-based tools, as it reads the images over a peripheral interface.

    Regards,
    Rahul
  • I tried to run the cache_mmu example, but even this doesn't work. Can you try it on an AMIC board?
    A boot time of around 1 s or 2 s would be great. At the moment it takes around 20 s.

    I just tried flashing the AMIC with a Blackhawk USB560 V2 debug probe instead of the XDS110 programmer, and this reduced my flash time from 90 minutes to 10 minutes. 10 minutes is OK for me, so I would say this particular problem is solved.