TMS320F28069: Sine and cosine function - long execution

Part Number: TMS320F28069
Other Parts Discussed in Thread: C2000WARE, CONTROLSUITE

Hello,

I have started a new project with the F28069.

As far as I know, I am using the Fast RTS library.

However, sine and cosine function calls each take 463 cycles, which is excessive.

The map file shows that the sine and cosine functions are pulled from the fast library.

What could be going wrong?

Thank you,

Tomas

  • Tomas,

    I see that there are some warnings on lines 471 and 472. Can you tell me what they are?

    Have you correctly included FPUmathTables in the linker cmd file if you are not using the ROMTABLES configuration?

    From the map file, it looks like you are using the FastRTS APIs in ROM. In that case, you must also link the ROM symbols library.

    If you plan to use the FLASH configuration, you must properly specify the load, run, and start regions and memcpy the tables at run time. Please refer to the examples for these configurations.

    Please get back to me with this information so we can narrow down the root cause.

    -Shantanu

  • I see that there are some warnings on lines 471 and 472. Can you tell me what they are?

    Hello Shanty,

    The warnings read: function "cos" declared implicitly (Dyno_main.c, /Dyno_F28069, line 471, C/C++ Problem).

    I fixed those warnings by including the math header file (#include <math.h>). This has not changed the performance.

    Have you correctly included FPUmathTables in the linker cmd file if you are not using the ROMTABLES configuration?

    The FPUmath table is included in the default linker file. From the file it seems that these tables are not copied to the target but are instead used from ROM (NOLOAD).

    I do not know what the ROMTABLES configuration is. I searched the F28069 datasheet as well as the technical manual for this keyword, but nothing came up.

    I am using FLASH, again using the standard memcpy function:

    In that case, you must also link the ROM symbols library.

    If you plan to use the FLASH configuration, you must properly specify the load, run, start regions and memcpy the tables in runtime. Please refer to the examples for these configurations

    Which are the specific examples I should look at? The FLASH_F28069 does not seem to have any pertinent code.

    Thank you.

    EDIT: it seems that even if running from ROM, the functions should not take 460 cycles but only <50.

  • Tomas,

    I think I see the problem. You specified NOLOAD when allocating the FPUmathTables section, which means that you are using the tables in ROM. For this, you need to add the ROM symbols library as well so that your application can use those tables. In this case, you do not need to memcpy from FLASH to RAM because you are using the tables in ROM. The ROM symbols library can be found in C2000Ware.

    This is different when you load the tables in FLASH. In this case, you need to allocate the tables as follows: (Do not use NOLOAD in this case)

    FPUmathTables : LOAD = FLASH_BANK0_SEC1,
                    RUN = RAMLS4,
                    RUN_START(FPUmathTablesRunStart),
                    LOAD_START(FPUmathTablesLoadStart),
                    LOAD_SIZE(FPUmathTablesLoadSize),

    Then memcpy in the code as you specified. 
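
    A minimal sketch of the copy step described above, not taken from any TI example. The helper name copy_load_to_run is hypothetical; the symbol names in the comment are the ones generated by the linker fragment above, and declaring them as extern unsigned int is an assumption. On C28x a char is 16 bits, so the size passed to memcpy is a count of 16-bit words.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy a section image from its load address (flash) to its run
 * address (RAM). 'size' is in units of char, which is 16 bits on C28x.
 * Hypothetical helper name, used here for illustration only.
 */
static void copy_load_to_run(void *run_start, const void *load_start, size_t size)
{
    memcpy(run_start, load_start, size);
}

/*
 * Assumed usage on the target, before the first sin()/cos() call:
 *
 *   extern unsigned int FPUmathTablesRunStart;
 *   extern unsigned int FPUmathTablesLoadStart;
 *   extern unsigned int FPUmathTablesLoadSize;
 *
 *   copy_load_to_run(&FPUmathTablesRunStart,
 *                    &FPUmathTablesLoadStart,
 *                    (size_t)&FPUmathTablesLoadSize);
 *
 * The linker symbols carry their value in their address, which is why
 * the address of FPUmathTablesLoadSize is cast to a size.
 */
```

    This copy is only needed when the tables are loaded to flash and run from RAM; with NOLOAD and the ROM symbols library it is not required.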

    In your case, I suspect that the APIs may be using the standard RTS trig functions, which do not use the FPU and therefore take more cycles. You can verify this as follows:

    Single-step in assembly into the cosine function call. Check whether FPU32 instructions are being executed. Also check whether the APIs are reading values from the FPUmathTables locations for the computation. If both are true, we may need to debug this further.

    -Shantanu

  • FPUmathTables : LOAD = FLASH_BANK0_SEC1,
                    RUN = RAMLS4,
                    RUN_START(FPUmathTablesRunStart),
                    LOAD_START(FPUmathTablesLoadStart),
                    LOAD_SIZE(FPUmathTablesLoadSize),

    Hello Shanty,

    This has resulted in a number of issues during linking - apparently FLASH_BANK0_SEC1 and RAMLS4 are not defined in my default linker file provided by TI.

    " 0
    RUN = RAMLS4,
    "../cmd/F28069.cmd", line 156: error: no valid memory range available for
    placement of "FPUmathTables"
    FPUmathTables : LOAD = FLASH_BANK0_SEC1,
    "../cmd/F28069.cmd", line 156: error: program will not fit into available
    memory, or the section contains a call site that requires a trampoline that
    can't be generated for this section. run placement with alignment/blocking"

    And below is the print out of the assembly - it seems to execute FPU instructions.

    I started my program from the code provided for the C2000 LaunchPad in ControlSuite.

    However, you are referencing C2000Ware, which is unknown to me.

    Can you provide me with a minimal functional example that I can load to see how to configure my DSP correctly?

    Thanks,

    Tomas

  • This has resulted in a number of issues during linking - apparently FLASH_BANK0_SEC1 and RAMLS4 are not defined in my default linker file provided by TI.

    What I provided was an example. ControlSuite is deprecated. Please refer to the examples in C2000Ware. You can find them in <c2000ware>/libraries/math/FPUFastRTS/c28/examples. The out-of-the-box examples are not for F28069x, so you will have to take one of the examples and port it over (by changing the device variable in the project settings and using the memory configuration from the sample linker command files in device_support). Everything will remain the same except the specific memory ranges.

    Regarding your screenshot, it looks correct and the correct API is being referenced. Can you single-step through the function in assembly and watch the cycle count (in CCS, Run -> Clock) to identify which instructions are contributing to the high cycle count?
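
    As a complement to the CCS clock, cycle counts can also be taken in code with a free-running CPU timer. The sketch below only shows the arithmetic; the function names are hypothetical, and the register name in the comment (CpuTimer0Regs.TIM.all) assumes the F2806x device header files with the timer configured for a period of 0xFFFFFFFF and no prescaling. The C28x CPU timers count down, so elapsed time is start minus stop.

```c
#include <stdint.h>

/*
 * Elapsed ticks between two reads of a free-running 32-bit down-counter.
 * Unsigned subtraction gives the right answer across one wrap through zero.
 */
static uint32_t elapsed_ticks(uint32_t start, uint32_t stop)
{
    return (uint32_t)(start - stop);
}

/*
 * Subtract the fixed cost of the two timer reads themselves.
 * 'overhead' is measured once by timing an empty region.
 */
static uint32_t net_cycles(uint32_t start, uint32_t stop, uint32_t overhead)
{
    uint32_t raw = elapsed_ticks(start, stop);
    return (raw > overhead) ? (raw - overhead) : 0u;
}

/*
 * Assumed usage on the target:
 *
 *   uint32_t t0 = CpuTimer0Regs.TIM.all;
 *   y = cos(x);
 *   uint32_t t1 = CpuTimer0Regs.TIM.all;
 *   uint32_t cycles = net_cycles(t0, t1, overhead);
 */
```

    Measuring the overhead separately keeps the timer reads from inflating small counts such as the sub-100-cycle figures expected here.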

    -Shantanu

  • What I provided was an example. ControlSuite is deprecated. Please refer to the examples in C2000ware. You can find it in <c2000ware>/libraries/math/FPUFastRTS/c28/examples. The out of box examples are not for F28069x. So you will have to take any of the examples and port them over (by changing the device variable in project settings and using the memory configuration from sample linker command files in device_support). Everything will remain the same except the specific memory ranges. 

    I have tried doing so and the results are not good.

    The project is now using the correct linker files, though there was an error (TI ram funcs) so I had to google a bit and copy some code to the linker files.

    The project now compiles.

    However, when debugging, the example never returns from the setup function:

    //
    // Setup system clocking
    //
    FastRTS_Example_setupSysCtrl();

    This is likely caused by the example itself (C:\ti\c2000\C2000Ware_3_02_00_00\libraries\math\FPUfastRTS\c28\examples\sin_f32) since it calls for a header file for a completely different processor (F28002x).

    I have done exactly as you suggested but it seems that the examples cannot be easily ported to a different processor?

    ====

    Coming back to the assembly, I have done as you asked - it seems that each assembly instruction takes much longer than anticipated.

    Also notice the location of these instructions: 3F6xxx - they are all located in the flash memory. It seems that these functions need to be copied to RAM from flash.

      

    But what is worse, the output is wrong too - the input angle is 0.0 so the cos output should be 1.0 but it is 0.0.

    EDIT: I was actually able to fix this by including the 28069M header file instead of the 28069 header file. The trig functions now compute correctly but the execution still takes ~450 cycles each.

  • The issue is now fixed.

    This is what went wrong: 

    1) The CPU tick count between assembly steps was far too large, considering that the flash wait states should be about 3. As it turns out, InitFlash() was not being called correctly, so the flash was left at its default configuration with large wait states (15).

    2) Fixing this sped up execution from ~960 cycles to ~140 cycles. However, that is still much slower than the figures in the FastRTS documentation. To fix that, I used the provided assembly files and changed their section to "ramfuncs", which is copied from flash to RAM. Luckily, these assembly files take precedence over the functions in the built-in library. The cycle count is now 88, which matches the benchmark section of the FastRTS documentation.

    Based on these numbers, I am happy. I do not need to load the FPU tables from ROM to RAM. ROM is plenty fast for me.
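
    The thread does not say exactly how InitFlash() was being called incorrectly. One ordering that is commonly required, sketched here as an assumption, is to copy the "ramfuncs" section to RAM first and only then call the flash setup routine, because that routine itself lives in "ramfuncs". The helper copy_then_init_flash and the demo_* stand-ins are hypothetical; the symbol names in the comment assume the F2806x device-support files.

```c
#include <stddef.h>
#include <string.h>

typedef void (*init_fn)(void);

/*
 * Copy RAM-resident code to its run address, then call the flash setup
 * routine. The copy must come first so that the routine executes from RAM.
 * Hypothetical helper, for illustration only.
 */
static void copy_then_init_flash(void *run_start, const void *load_start,
                                 size_t size, init_fn init_flash)
{
    memcpy(run_start, load_start, size);
    init_flash();
}

/*
 * Host-only stand-ins used to exercise the ordering off-target.
 * demo_init_flash records whether the copy had already happened.
 */
static unsigned int demo_run[2];
static int demo_saw_copied_code;

static void demo_init_flash(void)
{
    demo_saw_copied_code = (demo_run[1] == 0xBEEFu);
}

/*
 * Assumed usage on the target:
 *
 *   copy_then_init_flash(&RamfuncsRunStart, &RamfuncsLoadStart,
 *                        (size_t)&RamfuncsLoadSize, InitFlash);
 */
```

    The same copy also brings any other code placed in "ramfuncs" into RAM, which covers the relocated sine and cosine routines from step 2.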