CCS/TMS320F28335: floating point calculation time consumption

Part Number: TMS320F28335
Other Parts Discussed in Thread: C2000WARE

Tool/software: Code Composer Studio

Hi TI experts,

I am using CCS 8.2 and TI F28335 experimental kit for my project.

First, I experimented with the example code from C2000Ware to see how many cycles atan() takes with fastRTS.

The directory of the example is C:\ti\c2000\C2000Ware_1_00_05_00\libraries\math\FPUfastRTS\c28\examples.

I found that the atan() call at line 140 of the example takes 58 cycles. I counted the cycles by setting a breakpoint before atan() and comparing CPUTIMER/TIMER0TIM before and after pressing F5 (I am not sure whether cycles can be measured this way). The result is close to the figure given in SPRCA75 (51 cycles for atan).

However, when I do the same thing in my own project, atan takes 150 cycles, measured the same way.

I followed the instructions for enabling fastRTS and set the link order properly. I am fairly sure it is using fastRTS rather than the normal RTS: if I change the link order to put the normal RTS first, pressing F5 on the atan() line steps into atanf(float y, float x), and it takes even longer. With the correct order (fastRTS first), pressing F5 does not step into atanf(float y, float x). That means it is using fastRTS.

My map file is shown below. The atan() call is in pll1.c. I have copied all functions in pll1.c to RAM, so pll1.obj appears only in the ramFuncs section, not in .text.

ramFuncs   0    00305987    000011c3     RUN ADDR = 00009000
.....................

....................
                  003065a0    00000104     pll1.obj (ramFuncs)
.................

.text      0    0030000a    0000566d    
                 
...................
                  003046e1    0000012b     rts2800_fpu32.lib : e_logf.c.obj (.text)
.......................
                  00304a47    00000110     rts2800_fpu32.lib : e_expf.c.obj (.text)
                  00304b57    00000107                       : ll_div28.asm.obj (.text)
.........................
                  00304fa8    00000083     rts2800_fpu32.lib : fd_mpy28.asm.obj (.text)
......................
                  00305206    00000056     rts2800_fpu32.lib : boot28.asm.obj (.text)
                  0030525c    0000004f     rts2800_fpu32_fast_supplement.lib : atan2_f32.obj (.text)
.......................
                  00305438    00000034     rts2800_fpu32_fast_supplement.lib : cos_f32.obj (.text)
                  0030546c    00000034                                       : sin_f32.obj (.text)
                  003054a0    0000002d     ePwm.obj (.text:retain)
                  003054cd    0000002a     rts2800_fpu32.lib : l_div28.asm.obj (.text)
                  003054f7    00000029                       : exit.c.obj (.text)
                  00305520    00000024                       : cpy_tbl.c.obj (.text)
                  00305544    00000021     rts2800_fpu32_fast_supplement.lib : sqrt_f32.obj (.text)
                  00305565    00000020     rts2800_fpu32.lib : ll_tofsfpu32.asm.obj (.text)
                  00305585    0000001f                       : fd_tol28.asm.obj (.text)
                  003055a4    0000001e                       : ll_cmp28.asm.obj (.text)
                  003055c2    0000001d                       : memcpy.c.obj (.text)
                  003055df    0000001c                       : fs_tofdfpu32.asm.obj (.text)
                  003055fb    00000019                       : args_main.c.obj (.text)
                  00305614    00000019     rts2800_fpu32_fast_supplement.lib : div_f32.obj (.text)
                  0030562d    00000018     rts2800_fpu32.lib : strncmp.c.obj (.text)
                  00305645    00000013     canRW.obj (.text)
                  00305658    0000000b     rts2800_fpu32.lib : u_div28.asm.obj (.text)
                  00305663    00000009                       : _lock.c.obj (.text)
                  0030566c    00000008     CodeStartBranch.obj (.text)
                  00305674    00000002     rts2800_fpu32.lib : pre_init.c.obj (.text)
                  00305676    00000001                       : startup.c.obj (.text)

       Module                            code    initialized data   uninitialized data
       ------                            ----    ----------------   ------------------
    .\src\
 .................................     
       pll1.obj                          520     0                  0   

My questions are:

(1) Is taking the difference of the CPU timer counts the right way to measure cycles?

(2) My own project is based on an old project that used only fixed-point IQmath, not floating point. I believe I have already turned on the floating-point unit, because when I press F5 on the atan() line, it steps into atanf(float y, float x).

I am wondering which registers or settings I have missed that result in 150 cycles for atan. How should I check? I am willing to spend a few days looking into the cause and reading up on it.

Thank you in advance.

  • Without knowing anything about the two projects you are comparing, the first place I would look is whether the fast atan is actually coming from the fastRTS support library.

  • Hi Mitja,

    Thank you for your time. Yes, I agree that it is important to check whether fastRTS is being used. I have done the following checks, as mentioned before:

    (1) The map file is shown below (only lines with RTS are included). It seems to be using fastRTS, as shown by the line "0030525c 0000004f rts2800_fpu32_fast_supplement.lib : atan2_f32.obj (.text)".

    (2) If I change the link order to put the normal RTS first, pressing F5 on the atan() line steps into atanf(float y, float x), and it takes even longer. With the correct order (fastRTS first), pressing F5 does not step into atanf(float y, float x). That means it is using fastRTS.

    Could you please advise whether there is any other way to check whether fastRTS is being used?

    If it is already using fastRTS, what else can I check to find out why it is slow?

    I have been struggling for several days, and it is of vital importance to the project. Could you please help me?

    Thank you so much.

    .text      0    0030000a    0000566d

                      ...................
                      003046e1    0000012b     rts2800_fpu32.lib : e_logf.c.obj (.text)
                      .......................
                      00304a47    00000110     rts2800_fpu32.lib : e_expf.c.obj (.text)
                      00304b57    00000107                       : ll_div28.asm.obj (.text)
                      .........................
                      00304fa8    00000083     rts2800_fpu32.lib : fd_mpy28.asm.obj (.text)
                      ......................
                      00305206    00000056     rts2800_fpu32.lib : boot28.asm.obj (.text)
                      0030525c    0000004f     rts2800_fpu32_fast_supplement.lib : atan2_f32.obj (.text)
                      .......................
                      00305438    00000034     rts2800_fpu32_fast_supplement.lib : cos_f32.obj (.text)
                      0030546c    00000034                                       : sin_f32.obj (.text)
                      ......................
                      003054cd    0000002a     rts2800_fpu32.lib : l_div28.asm.obj (.text)
                      003054f7    00000029                       : exit.c.obj (.text)
                      00305520    00000024                       : cpy_tbl.c.obj (.text)
                      00305544    00000021     rts2800_fpu32_fast_supplement.lib : sqrt_f32.obj (.text)
                      00305565    00000020     rts2800_fpu32.lib : ll_tofsfpu32.asm.obj (.text)
                      00305585    0000001f                       : fd_tol28.asm.obj (.text)
                      003055a4    0000001e                       : ll_cmp28.asm.obj (.text)
                      003055c2    0000001d                       : memcpy.c.obj (.text)
                      003055df    0000001c                       : fs_tofdfpu32.asm.obj (.text)
                      003055fb    00000019                       : args_main.c.obj (.text)
                      00305614    00000019     rts2800_fpu32_fast_supplement.lib : div_f32.obj (.text)
                      0030562d    00000018     rts2800_fpu32.lib : strncmp.c.obj (.text)
                      ......................
                      00305658    0000000b     rts2800_fpu32.lib : u_div28.asm.obj (.text)
                      00305663    00000009                       : _lock.c.obj (.text)
                      ...................
                      00305674    00000002     rts2800_fpu32.lib : pre_init.c.obj (.text)
                      ..................
  • OK, since you have confirmed that you are using the fast implementation, are you running it from RAM or flash? Check the linker command files of both projects to see whether the fastRTS library is placed in the ramfuncs section and run from RAM.
  • Hi Ivan,

    You are justified in using the CPU timers to measure execution time the way you are doing. The result will also include the function-call overhead.

    Mitja is quite right to suggest checking the linker command file. The first column in the .map file tells you the start address where the module is linked:
    0030525c 0000004f rts2800_fpu32_fast_supplement.lib : atan2_f32.obj (.text)

    On F28335, 0x30525C corresponds to XINTF zone 7. It seems you have external memory and have placed this code off-chip? Nothing wrong with that, but you will pay an execution-cycle penalty for going off-chip, so 150 cycles is not too surprising. The linker file in the example doesn't use the XINTF, so I think this is a feature of your old project.

    Regards,

    Richard
  • Hi Richard and Mitja,

    Thank you for your advice. You are right that I had not placed the fastRTS library in RAM. That is why atan() took 150 cycles.

    1. The map file shows: 0030525c 0000004f rts2800_fpu32_fast_supplement.lib : atan2_f32.obj (.text).

    I think the address is in flash, not XINTF zone 7. The flash address range starts at 0x300000.

    Question (1): Am I right?

    2. I have copied the fastRTS library into RAM by adding the following to the .cmd file, as suggested online. I also copy this section into RAM during initialization.

      .text               : {rts2800_fpu32_fast_supplement.lib(.text)},
                            LOAD = FLASHHTOC,      PAGE = 0
                            RUN = RAML1L2L3,       PAGE = 0
                            LOAD_START(_fastRTS_loadstart),
                            LOAD_END(_fastRTS_loadend),
                            RUN_START(_fastRTS_runstart)

    After I build the project, the following appears in the map file.

    .text      0    00308ba2    000000f1     RUN ADDR = 0000b3c2
                      00308ba2    0000004f     rts2800_fpu32_fast_supplement.lib : atan2_f32.obj (.text)
                      00308bf1    00000034                                       : cos_f32.obj (.text)
                      00308c25    00000034                                       : sin_f32.obj (.text)
                      00308c59    00000021                                       : sqrt_f32.obj (.text)
                      00308c7a    00000019                                       : div_f32.obj (.text)

    I checked that it now takes around 60 cycles to finish atan2, which is very reasonable.

    However, I see the following warning when I build:

    warning: duplicate section name (project name).out(.text) (ignored)

    Question (2):

    Am I copying the library into RAM the right way? Can I ignore the warning? Is there any other way to do it without the warning? I do not want to run into other weird issues later because of it.

    Thank you so much.

  • 1. Yes, this is in flash. Different MCUs in the C2000 family have slightly different memory maps, so you have to check the documentation (page 159 in tms320f28335.pdf).
    2. You have obviously achieved what you wanted. I do it differently: I merge several input sections into one output section, which makes copying from flash to RAM easier. Here is my solution:

    ramfuncs: {
        rts2800_fpu32_fast_supplement.lib<atan2_f32.obj>(.text)
        rts2800_fpu32_fast_supplement.lib<div_f32.obj>(.text)
        rts2800_fpu32_fast_supplement.lib<cos_f32.obj>(.text)
        rts2800_fpu32_fast_supplement.lib<sin_f32.obj>(.text)
        rts2800_fpu32_fast_supplement.lib<sqrt_f32.obj>(.text)
        rts2800_fpu32.lib<l_div.obj>(.text)
        *(ramfuncs)
        *(.TI.ramfunc)
        *(IQmath)
    }
    LOAD = P_FF,
    RUN = P_L78,
    LOAD_START(_RamfuncsLoadStart),
    LOAD_SIZE(_RamfuncsLoadSize),
    LOAD_END(_RamfuncsLoadEnd),
    RUN_START(_RamfuncsRunStart),
    RUN_SIZE(_RamfuncsRunSize),
    RUN_END(_RamfuncsRunEnd),
    PAGE = 0, ALIGN(4)
  • Hi Mitja,

    Thank you for your comment. Your method is better than the ones I found in other posts, since it builds without any warning.

    However, I had to replace the line "rts2800_fpu32.lib<l_div.obj>(.text)" in your code with "rts2800_fpu32.lib<l_div28.asm.obj>(.text)", because my map file only has the one with "asm".

    It seems to be IQdiv28. I cannot find rts2800_fpu32.lib<l_div.obj>(.text); I only have rts2800_fpu32.lib<l_div28.asm.obj>(.text) and rts2800_fpu32.lib<ll_div28.asm.obj>(.text). I do use floating-point division in the code.

    Question 1:

    Why do I get so many odd object files, like rts2800_fpu32.lib : u_div28.asm.obj (.text), rts2800_fpu32.lib<l_div28.asm.obj>(.text), rts2800_fpu32.lib<ll_div28.asm.obj>(.text), and fd_mpy28.asm.obj? Does this mean I am actually using fixed point for floating-point calculations, or vice versa? The project uses both IQmath and floating-point math. Here are all the IQmath- and RTS-related lines from the map file.

    Question 2:

    Floating-point division should use fastRTS, but multiplication should still use the normal RTS. Which exact library file does multiplication use? I want to copy the normal RTS multiplication code into RAM as well, since I have far more multiplications than any other operation.

    .cinit     0    0030782a    00000a45    
                     ....................
                      00308249    0000000e     rts2800_fpu32.lib : exit.c.obj (.cinit)
    ................
                      0030825f    00000005     rts2800_fpu32.lib : _lock.c.obj (.cinit:__lock)
                      00308264    00000005                       : _lock.c.obj (.cinit:__unlock)
                      00308269    00000004                       : errno.c.obj (.cinit)
                      0030826d    00000002     --HOLE-- [fill = 0]
    .econst    0    00308910    00000266    
                      .....................
                      00308af4    00000010     rts2800_fpu32.lib : e_expf.c.obj (.econst)
                      00308b04    00000010                       : e_logf.c.obj (.econst)
                     ..........................
                      00308b46    00000004     rts2800_fpu32.lib : e_expf.c.obj (.econst:_halF)
                      00308b4a    00000004                       : e_expf.c.obj (.econst:_ln2HI)
                      00308b4e    00000004                       : e_expf.c.obj (.econst:_ln2LO)
    .text      0    0030000a    00005545    
                     ......................
                      00304719    0000012b     rts2800_fpu32.lib : e_logf.c.obj (.text)
                      .......................
                      00304a7b    00000110     rts2800_fpu32.lib : e_expf.c.obj (.text)
                      00304b8b    00000107                       : ll_div28.asm.obj (.text)
                     ........................
                      00304fdc    00000083     rts2800_fpu32.lib : fd_mpy28.asm.obj (.text)
                      ..................
                      0030523a    00000056     rts2800_fpu32.lib : boot28.asm.obj (.text)
                      ......................
                      00305409    00000029     rts2800_fpu32.lib : exit.c.obj (.text)
                      00305432    00000024                       : cpy_tbl.c.obj (.text)
                      00305456    00000020                       : ll_tofsfpu32.asm.obj (.text)
                      00305476    0000001f                       : fd_tol28.asm.obj (.text)
                      00305495    0000001e                       : ll_cmp28.asm.obj (.text)
                      003054b3    0000001d                       : memcpy.c.obj (.text)
                      003054d0    0000001c                       : fs_tofdfpu32.asm.obj (.text)
                      003054ec    00000019                       : args_main.c.obj (.text)
                      00305505    00000018                       : strncmp.c.obj (.text)
                     ..............
                      00305530    0000000b     rts2800_fpu32.lib : u_div28.asm.obj (.text)
                      0030553b    00000009                       : _lock.c.obj (.text)
                     ................
                      0030554c    00000002     rts2800_fpu32.lib : pre_init.c.obj (.text)
                      0030554e    00000001                       : startup.c.obj (.text)

    .ebss      1    0000c000    00000fa6     UNINITIALIZED
                     ...................
                      0000c93e    00000002     rts2800_fpu32.lib : _lock.c.obj (.ebss:__lock)
                      ................
                      0000cbbe    00000002     rts2800_fpu32.lib : _lock.c.obj (.ebss:__unlock)
                      ..............
                      0000cc7c    00000001     rts2800_fpu32.lib : errno.c.obj (.ebss)
                     ....................
                      0000cdee    00000006     rts2800_fpu32.lib : exit.c.obj (.ebss)
     

    IQmath     0    0030585f    000002a8     RUN ADDR = 00008310
                      0030585f    0000009d     IQmath_fpu32.lib : IQ24log.obj (IQmath)
                      003058fc    00000088                      : IQ24atan2PU.obj (IQmath)
                      00305984    00000060                      : IQ24mag.obj (IQmath)
                      003059e4    00000047                      : IQ16div.obj (IQmath)
                      00305a2b    00000047                      : IQ24div.obj (IQmath)
                      00305a72    00000042                      : IQ24sqrt.obj (IQmath)
                      00305ab4    00000027                      : IQ24cosPU.obj (IQmath)
                      00305adb    00000016                      : IQ16toF.obj (IQmath)
                      00305af1    00000016                      : IQ24toF.obj (IQmath)
    IQmathTables
    *          0    00306cda    00000b50     RUN ADDR = 0000a1d4
                      00306cda    00000b50     IQmath_fpu32.lib : IQmathTables.obj (IQmathTables)
    FPUmathTables
    *          0    00308270    000006a0     RUN ADDR = 0000ad24
                      00308270    000006a0     rts2800_fpu32_fast_supplement.lib : FPUmathTables.obj (FPUmathTables)
    fastRTS    0    00308b76    0000011b     RUN ADDR = 0000b3c4
                      00308b76    0000004f     rts2800_fpu32_fast_supplement.lib : atan2_f32.obj (.text)
                      00308bc5    00000019                                       : div_f32.obj (.text)
                      00308bde    00000034                                       : cos_f32.obj (.text)
                      00308c12    00000034                                       : sin_f32.obj (.text)
                      00308c46    00000021                                       : sqrt_f32.obj (.text)
                      00308c67    0000002a     rts2800_fpu32.lib : l_div28.asm.obj (.text)
    .reset     0    003fffc0    00000002     DSECT
                      003fffc0    00000002     rts2800_fpu32.lib : boot28.asm.obj (.reset)



    Thank you again for your help.

  • As for the difference in the object names, it seems that your library was built with different options or with a different compiler version, which packaged the library differently.

    1. The library is built from multiple source files (evidently one .asm file per operation). Without looking into the library (the sources are available, so you are free to look), u_div28 is used when dividing two unsigned integers, l_div28 when dividing two longs, and ll_div28 when dividing two long longs. You have to search the complete project to find where you are using them. I recommend setting the compiler to keep the generated .asm files (--keep_asm) with the C source interlisted (--c_src_interlist). Then you can search those files for calls to these functions and see where you actually use them.
    2. Multiplication does not need a library call, because there is a direct hardware instruction for it. The same goes for addition and subtraction.