Speed up the code by running from SARAM (TMS320F28069)

I am trying to run time-critical code from SARAM to reduce its execution time. After moving the code from Flash into SARAM (the L4 SARAM section), I measured the execution time to see how much I had saved, but the time is the same whether the code runs from Flash or from SARAM. Does anybody know why? I have checked the PC register to verify that I am actually running from SARAM (PC = 0xA000), and I have run the code both with and without a debugger. The normal execution time of my code running from Flash at a 32 MHz clock with one wait state is 22 uSec; I expected to see a lower execution time when running from SARAM.
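For reference, the timing quoted above can be restated in CPU cycles, which is often easier to compare across clock settings. The following is only a small arithmetic sketch; the helper name `us_to_cycles` is hypothetical and not part of any TI library.

```c
/* Convert an execution time in microseconds to CPU cycles.
 * clock_mhz is the CPU clock in MHz, so cycles = time_us * clock_mhz.
 * Hypothetical helper, for illustration only. */
static double us_to_cycles(double time_us, double clock_mhz)
{
    return time_us * clock_mhz;
}
```

With the figures from the question, `us_to_cycles(22.0, 32.0)` gives 704 cycles for the routine.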

  • Zeke,

    With only 1 wait-state, you will not see much, if any, difference between flash and RAM execution times.  There is a flash fetch pipeline that fetches 64 bits of instruction at a time.  This is 2 to 4 instructions (depending on whether they are 16- or 32-bit instructions).  The CPU will then execute these instructions while the fetch module grabs the next 64 bits.  So, you grab 2 to 4 instructions every 2 cycles.  Worst case is 2 instructions every 2 cycles (compared to RAM execution, which is 1 instruction every cycle: the same effective throughput).  The only time you will see flash performance drop here is when there is a PC discontinuity (e.g., branch, call, or return).  This may cause the flash pipeline to toss out one or two instructions and start fetching a new 64-bit packet from the new target address.  You will probably not pick this difference up just by doing coarse timing of your code.

    If you just want to see that the flash truly is running with the wait-states, try increasing your flash wait-state settings (make them large, say 15!).

    Regards,

    David
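The throughput argument in the reply above can be sketched as a small calculation. This is a simplified model under stated assumptions, not something taken from the device documentation: it assumes each 64-bit fetch takes (wait states + 1) cycles, that the CPU retires at most one instruction per cycle, and that the code is straight-line. It ignores PC discontinuities and any data reads from flash, so it is a lower bound on the real cost. The function name is hypothetical.

```c
/* Simplified model of prefetched flash execution (assumptions, see text):
 *  - each 64-bit fetch takes (wait_states + 1) CPU cycles
 *  - a fetch delivers instr_per_fetch instructions
 *    (2 if all are 32-bit, 4 if all are 16-bit)
 *  - the CPU executes at most one instruction per cycle
 * Returns the average cycles per instruction for straight-line code. */
static double flash_cycles_per_instr(unsigned wait_states,
                                     unsigned instr_per_fetch)
{
    double fetch_cycles = (double)wait_states + 1.0;
    double cpi = fetch_cycles / (double)instr_per_fetch;

    /* Fetching faster than the CPU can execute does not help:
     * clamp to one cycle per instruction, the same as RAM. */
    return (cpi < 1.0) ? 1.0 : cpi;
}
```

With 1 wait-state the model returns 1.0 cycle per instruction for both 2 and 4 instructions per fetch, which matches the reply's point that straight-line code runs at the same rate from flash as from RAM.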

  • Thanks David,
    I tried changing the wait state while running from Flash from 1 to 15 as you suggested, but I did not see much difference in execution time (maybe 0.5 uSec out of 22 uSec). Here are my changes:

    FlashRegs.FBANKWAIT.bit.RANDWAIT = 15; // (3) Random access waitstates
    FlashRegs.FBANKWAIT.bit.PAGEWAIT = 15; // (3) Paged access waitstates

    Am I missing some settings?

    The entire Flash initialization setup:
    asm(" EALLOW"); // Enable EALLOW protected register access
    FlashRegs.FPWR.bit.PWR = 3; // (3) Pump and bank set to active mode
    FlashRegs.FSTATUS.bit.V3STAT = 1; // (1) Clear the 3VSTAT bit
    FlashRegs.FSTDBYWAIT.bit.STDBYWAIT = 0x01FF; // Sleep to standby transition cycles
    FlashRegs.FACTIVEWAIT.bit.ACTIVEWAIT = 0x01FF; // Standby to active transition cycles
    FlashRegs.FBANKWAIT.bit.RANDWAIT = 15; // (3) Random access wait states ** Changed from 1 to 15
    FlashRegs.FBANKWAIT.bit.PAGEWAIT = 15; // (3) Paged access wait states ** Changed from 1 to 15
    FlashRegs.FOTPWAIT.bit.OTPWAIT = 5; // (5) OTP waitstates
    FlashRegs.FOPT.bit.ENPIPE = 1; // Enable the flash pipeline
    asm(" EDIS"); // Disable EALLOW protected register access

    Thanks for any help I can get
  • Sorry,
    I was still running out of RAM when I changed the wait state to 15, even though I had commented out the "memcpy". I recompiled, and now the execution time from Flash has jumped from 22 uSec to 920 uSec with 15 wait states. So it seems that executing from Flash with 1 wait state takes the same time as executing out of RAM.
    Is there any other way that I can lower the execution time without changing the clock speed?
  • Sorry, my code was still running from RAM even though I had commented out the "memcpy". Running out of Flash with 15 wait-states increased the execution time from 22 uSec (1 wait-state) to 920 uSec. From this test I conclude that running from Flash with 1 wait-state has the same execution time as running out of SARAM with 0 wait-states.

    I need to reduce the execution time of the code. Do you know of any change that I can implement that will reduce the execution time (except for changing the clock speed)?
  • Zeke,

    I don't know what your code is doing, but if the core algorithm is written in C, make sure the compiler optimizer is fully engaged for speed. Beyond that, maybe you can do better with hand assembly. Also, look at your memory linkage and make sure you don't have any conflicts between code and data. For example, don't put code and the data used by that code in the same RAM block. That will cause a memory stall, since the RAM is single-access.

    Regards,
    David
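One way to act on the code/data separation advice is to give the time-critical function and its data their own named sections, so the linker command file can map them to different RAM blocks. The sketch below assumes the TI C28x compiler's `CODE_SECTION` and `DATA_SECTION` pragmas; the section names `fast_code` and `fast_data`, the function, and the sample buffer are all hypothetical. The pragmas are guarded with `__TMS320C28XX__` (assumed to be predefined by the C28x compiler) so the file also builds on a host compiler.

```c
#include <stdint.h>

#define N_SAMPLES 8

#ifdef __TMS320C28XX__
/* TI C28x compiler only. The section names are hypothetical: the linker
 * command file must map "fast_code" and "fast_data" to DIFFERENT RAM
 * blocks (for example, code in L4 and data in another block) so that
 * instruction fetches and data reads do not compete for one
 * single-access RAM. */
#pragma CODE_SECTION(sum_of_squares, "fast_code")
#pragma DATA_SECTION(samples, "fast_data")
#endif

/* Example data used by the time-critical routine. */
int16_t samples[N_SAMPLES] = {1, 2, 3, 4, 5, 6, 7, 8};

/* Example time-critical routine: sum of squares over a buffer. */
int32_t sum_of_squares(const int16_t *buf, uint16_t count)
{
    int32_t acc = 0;
    uint16_t i;

    for (i = 0; i < count; i++) {
        acc += (int32_t)buf[i] * buf[i];
    }
    return acc;
}
```

Code placed in RAM this way still has to be copied from flash at startup, as with the "memcpy" mentioned earlier in the thread. The linker map file shows whether the two sections actually ended up in separate blocks.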