How to make sure I utilized FPU?

I need to do some heavy calculation with many sin() and cos() calls. I found that each sin() or cos() call costs about 20 us, far more than the roughly 50 cycles stated in the documentation. I ran some experiments and found that floating-point addition and multiplication are fine at about 0.4 us each, but floating-point division, sin() and cos() each cost about 20 us. This makes me suspect that I am not using the FPU at all.

I have "-v28 -ml -mt --float_support=fpu32" in my compiling options, I have rts2800_fpu32_fast_supplement.lib and rts2800_fpu32.lib in project options->Build->C2000 Linker->File Search Path. Is there anything else I need to set?

The MCU is an F28335, clocked at 40 MHz.

I use the code below to check the timing on an oscilloscope (uncomment one line in the loop at a time to measure it).

====================

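int ii;                                  /* loop counter (was declared as i but used as ii) */
float xr = 0.5f, yr, csr = 0.0f, ssr;    /* xr is an arbitrary test input; csr accumulates results */
GpioDataRegs.GPASET.bit.GPIO10 = 1;      /* GPIO10 high: start of the timed region */
for(ii = 0; ii < 1000; ii++)
{
//csr += cos(xr);
//csr += ii / 10.3;
//csr += ii * 10.3;

}

GpioDataRegs.GPACLEAR.bit.GPIO10 = 1;    /* GPIO10 low: end of the timed region */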
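
A quick sanity check on these numbers is to convert the scope pulse width into CPU cycles per loop pass. The sketch below is plain host-side C rather than F28335 code. The helper name `cycles_per_iteration` is hypothetical, and it assumes the pulse covers exactly `iterations` passes through the loop, with loop overhead and any interrupt time not subtracted.

```c
#include <stdint.h>

/* Convert a measured pulse width into CPU cycles per loop iteration.
 *   pulse_us   : total GPIO-high time read from the scope, in microseconds
 *   iterations : number of loop passes inside the pulse
 *   sysclk_mhz : CPU clock in MHz (40 for the setup described in this post)
 * Hypothetical helper for illustration; loop overhead is not subtracted. */
static double cycles_per_iteration(double pulse_us, uint32_t iterations,
                                   double sysclk_mhz)
{
    double us_per_iteration = pulse_us / (double)iterations;
    return us_per_iteration * sysclk_mhz;   /* us * (cycles per us) */
}
```

With the figures quoted here, 20 us per call at 40 MHz corresponds to roughly 800 cycles, and 0.4 us to roughly 16 cycles.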

  • Xiaomin Lin said:
    I have "-v28 -ml -mt --float_support=fpu32" in my compiling options, I have rts2800_fpu32_fast_supplement.lib and rts2800_fpu32.lib in project options->Build->C2000 Linker->File Search Path. Is there anything else I need to set?

    In the project properties->C2000 Linker->File Search Path, is rts2800_fpu32_fast_supplement.lib listed above rts2800_fpu32.lib? If not, the linker will pull the sin/cos functions from the standard library (rts2800_fpu32). Also, are you running these functions out of flash?
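
Why the library order matters can be shown with a toy model: when two libraries define the same symbol, the one searched first supplies it. This is only an illustration in host-side C, not how the TI linker is implemented. The `struct lib` type, the `first_definer` helper and any symbol lists passed to it are assumptions made up for the example.

```c
#include <stddef.h>
#include <string.h>

/* Toy model of a library: a name plus the symbols it defines. */
struct lib {
    const char *name;
    const char *const *symbols;
    size_t count;
};

/* Return the name of the first library, in search order, that defines sym.
 * Returns NULL if no library in the list defines it. */
static const char *first_definer(const struct lib *libs, size_t nlibs,
                                 const char *sym)
{
    for (size_t i = 0; i < nlibs; i++) {
        for (size_t j = 0; j < libs[i].count; j++) {
            if (strcmp(libs[i].symbols[j], sym) == 0) {
                return libs[i].name;
            }
        }
    }
    return NULL;
}
```

In this model, listing the fast supplement library first makes it the source of cos, while reversing the order hands cos to the standard library even though both define it.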

  • Hi, Vishal,
    Thanks very much!
    Yes, rts2800_fpu32_fast_supplement.lib is listed above rts2800_fpu32.lib.
    I also enabled [Search libraries in priority order (--priority)]
    I placed this function in RAM using [#pragma CODE_SECTION(ukf_calculate, "ramfuncs");]
    So I believe it runs from RAM, not flash.
  • Hmm, do you have interrupts running? The way I benchmark these functions is to open the disassembly window and look for the LCR (call) instruction. So if you are calling the sin function, you look for

    LCR #_sin

    or something like that. I then enable the CCS clock (Run->Clock->Enable) and step over this instruction; that gives you the exact cycle count for the function.
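
Where stepping in the debugger is inconvenient, a free-running hardware timer read before and after the call can give a similar count. The F28335 CPU timers count down, so the elapsed count is start minus stop. Below is a minimal host-testable sketch of just that arithmetic. It assumes a 32-bit timer with no prescaling whose period is 0xFFFFFFFF, so that it wraps modulo 2^32. `elapsed_cycles` is a hypothetical helper, and the timer register reads themselves are left out.

```c
#include <stdint.h>

/* Elapsed ticks of a 32-bit DOWN-counting timer.
 *   start : counter value captured before the code under test
 *   stop  : counter value captured after it
 * Unsigned subtraction wraps modulo 2^32, so a single rollover between the
 * two captures is handled without any special case. */
static uint32_t elapsed_cycles(uint32_t start, uint32_t stop)
{
    return start - stop;
}
```

The result still includes the cost of the two timer reads and any interrupt that fires in between, so it is an upper bound rather than an exact figure.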

  • Hi, Vishal,

    Thank you very much! This is really a fantastic way to check calculation performance.
    I followed your method and got really bad numbers.
    My cos() is shown in disassembly as [LCR cos], and it costs 80,645 cycles!
    My float division is shown in disassembly as [LCR FS$$DIV], and it costs 91,271 cycles!
    The weird thing is that this does not happen with multiplication, which is shown as [MPYF32 R1H, R0H, R1H] and costs only 1 cycle.
    I do have other interrupts running. I will clean up the interrupts and do this test again.
  • I cleaned things up, moved the test to main, and used Vishal's method to dig into the assembly. Right now I see about 8000 cycles for sin/cos or div. I am fairly sure that I am linking rts2800_fpu32 and not rts2800_fpu32_fast_supplement.

    The build command generated by CCS5 is

    ======================

    'Invoking: C2000 Linker'

    "C:/ti/ccsv5/tools/compiler/c2000_6.2.0/bin/cl2000"

    -v28 -ml -mt --float_support=fpu32 -g --gcc --define=xdc__strict --diag_warning=225

    --display_error_number --diag_wrap=off --gen_func_subsections=on --fp_mode=relaxed

    -z -m"micropulse-directional.map" --stack_size=0x380 --warn_sections

    ...

    ...

    "C:/ti/ccsv5/tools/compiler/c2000_6.2.0/lib/rts2800_fpu32.lib"

    "../rts2800_fpu32_fast_supplement.lib"

    -l"D:\############\rts2800_fpu32_fast_supplement.lib"

    -l"C:\ti\ccsv5\tools\compiler\c2000_6.2.0\lib\rts2800_fpu32.lib"

    <Linking>

    =======================

    Why do those libs show up twice? What is the difference between listing a library with and without the '-l' switch? And why is the order scrambled?

  • Even using rts2800_fpu32.lib, each instruction in cos or div costs me 15~20 cycles. Is that because it is running from flash? Is there a way to put part of a lib, or the whole lib, in RAM, like RAM functions?
  • I know you said fastRTS comes first and you have "Search libraries in priority order" enabled, so I can't really explain why it's pulling from the standard library. When you open up the properties in CCS, under General->Main tab, set the runtime support library to <none>.

    Xiaomin Lin said:
    Even using rts2800_fpu32.lib, each instruction in cos or div costs me 15~20 cycles. Is that because it is running from flash? Is there a way to put part of a lib, or the whole lib, in RAM, like RAM functions?

    Yep, you can choose specific functions from the library to place into RAM in the linker command file. For example,

       ramfuncs         :  LOAD = FLASHC,
                           RUN = RAMLS1,
                           RUN_START(_RamfuncsRunStart),
                           LOAD_START(_RamfuncsLoadStart),
                           LOAD_SIZE(_RamfuncsLoadSize),
                           PAGE = 0,
    {
      --library=rts2800_fpu32_fast_supplement<sin_f32.obj>(.text)
      --library=rts2800_fpu32_fast_supplement<cos_f32.obj>(.text)
      --library=rts2800_fpu32_fast_supplement<div_f32.obj>(.text)
    }

    and then you memcpy the ramfuncs section over to ram.

  • Hi, Vishal,

    Now it seems to be using rts2800_fpu32_fast_supplement.lib. In disassembly, cos() now shows:
    LCR $C:/ti/controlSUITE/libs/math/FPUfastRTS/V100/source/cos_f32.asm:62:96$

    But the improvement is only from 800 cycles to 500 cycles. I need to try putting sin/cos into RAM.
    It looks a little scary. I have already defined the ramfuncs section in FLASHD, PAGE 0, and placed some of my functions in "ramfuncs" by using
    #pragma CODE_SECTION(blahblah_func, "ramfuncs");
    If I memcpy these 3 functions (sin/cos/div), what source address, target address and size should I use so that I don't disturb my existing RAM functions?

    Your reply helped a lot! Thanks!
  • Hi Xiaomin,

    If you declared ramfuncs the way I showed in the earlier post, you can use the linker-generated variables to memcpy the entire ramfuncs section (containing sin/cos/div), and you can use the linker-derived size to make sure you are copying the right amount.

    In your C code you would do the following,

        memcpy((uint32_t *)&RamfuncsRunStart, (uint32_t *)&RamfuncsLoadStart,
                (uint32_t)&RamfuncsLoadSize );

    You can check the .map file for these functions and it will show you their run locations (in RAM). I suggest you make a note of these. Then, when you run the code, "step into" the function at the call (LCR) in the disassembly window and make sure the code is running from the RAM location (from the .map file) and not from flash.
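
The copy step itself can be tried on a PC with ordinary arrays standing in for the flash load image and the RAM run area. This is a minimal sketch under that assumption. `copy_section` is a hypothetical wrapper, and on the target its arguments would be the addresses of the linker-generated symbols from the memcpy call in this reply.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy a code section from its load address to its run address.
 * size_words counts 16-bit words. On the C28x a char is 16 bits, so
 * sizeof(uint16_t) is 1 there and memcpy counts words directly; on a PC
 * sizeof(uint16_t) is 2, and the multiplication keeps the sketch portable. */
static void copy_section(uint16_t *run_start, const uint16_t *load_start,
                         size_t size_words)
{
    memcpy(run_start, load_start, size_words * sizeof(uint16_t));
}
```

Because the size comes from the linker for the whole ramfuncs section, functions already placed there with #pragma CODE_SECTION are copied in the same call, provided the copy runs before any of them is called.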