Stellaris LM4F120; how to generate floating point instructions.

CJ Wilkerson


Hello,

I am trying to generate floating point instructions for a Stellaris LM4F120 board. Can someone give me a simple example of how to do this?

System info:

OS: Ubuntu 12.04 LTS

Microcontroller: Stellaris LM4F120

Software: Code Composer Studio 5.4

Thanks very much

over 12 years ago

0 Petrei over 12 years ago

Guru 26105 points

Hi,

Difficult task... I'm joking. There are some examples in TivaWare. One that uses floating point operations is the qs-rgb application for the LaunchPad, but it is probably too limited for your purposes.

Another one is sine-demo for the ek-lm4f232 board, but it uses the graphical display. One idea is to look at that example, take the formula found there ("SIN(2pi*t/4)*0.5"), generate a number of samples per period yourself (32, 64, or 128), and then printf them to a console.

One hint if you seem to be lost with this: try the formula first in a small PC application written in C (GCC may be the only option on Linux), and then just move the code into main on your micro.

Petrei

0 CJ Wilkerson over 12 years ago in reply to Petrei

Prodigy 180 points

For clarification: since I am looking to perform a lot of calculations efficiently, I want to verify that I am generating hardware floating point instructions. How can I verify that in the disassembly listing?

Thanks.

0 Petrei over 12 years ago in reply to CJ Wilkerson

Guru 26105 points

Hi,

Section 2.8 of the data sheet for your micro lists all the asm instructions. Those for floating point start with v..., so if you link with the right library you will be able to see instructions in the listing such as vadd.f32 {Sx,} Sy, Sm

Petrei

0 CJ Wilkerson over 12 years ago in reply to Petrei

Prodigy 180 points

This is what I'm looking at:

....
00005864:   EE000A10 FMSR            S0, R0
00005868:   4858     LDR             R0, $C$CON104
0000586a:   ED800A00 FSTS            S0, [R0, #0]
828                  accelerations_in[1] = y/1716.00407747 ;
0000586e:   9805     LDR             R0, [SP, #0x14]
00005870:   F006F9BA BL              __aeabi_f2d
00005874:   A49B     ADD             R4, PC, #0x26C $C$FL6
00005876:   E894000C LDMIA.W         R4, {R2, R3}
0000587a:   F004FC07 BL              __aeabi_ddiv
0000587e:   F006F811 BL              __aeabi_d2f
00005882:   EE000A10 FMSR            S0, R0
00005886:   489C     LDR             R0, $C$CON105
00005888:   ED800A00 FSTS            S0, [R0, #0]
829                  accelerations_in[2] = z/1716.00407747 ;
0000588c:   9806     LDR             R0, [SP, #0x18]
0000588e:   F006F9AB BL              __aeabi_f2d
00005892:   A494     ADD             R4, PC, #0x250 $C$FL6
....

I see that there is a divide call after one of the commands. This is not a floating point instruction, right? How can I get hardware floating point instructions generated?

Thanks very much, Petrei, for your help.

0 Chester Gillon over 12 years ago in reply to CJ Wilkerson

Guru 92251 points

Which TI ARM compiler version are you using, and what are the "Target processor version (--silicon_version, -mv)", "Specify floating point support (--float_support)" and "Optimization level (--opt_level, -O)" options set to?

Looking at the following code targeting a LM4F120H5QR with a target processor version of "7M4", floating point support of "FPv4SPD16" and optimization level of "2":

float w, e, mu, energy;
w = (e * mu) / (energy + 0.000000119209289f);

The TI ARM compiler v5.0.5 generated the following floating point instructions:

VADD.F32 S5, S1, S0 ; [DPU_LIN_PIPE] |260|
VMUL.F32 S4, S2, S4 ; [DPU_LIN_PIPE] |260|
VDIV.F32 S4, S4, S5 ; [DPU_LIN_PIPE] |260|

0 Chester Gillon over 12 years ago in reply to CJ Wilkerson

Guru 92251 points

CJ Wilkerson said:
accelerations_in[1] = y/1716.00407747 ;

On further consideration, those constants will be implicitly treated as double by the compiler, which will then promote the divide to double precision. The LM4F only has single precision hardware floating point support, so the implicit double constants will force the software floating point library to be used. Try defining the constants as single precision (with an 'f' suffix) to allow the compiler to use hardware floating point:

accelerations_in[1] = y/1716.00407747f ;
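
One way to check this promotion without reading a listing is to compare the size of each expression's type. The sketch below is illustrative only: IS_SINGLE, scale_single, scale_via_double and check_constant_types are hypothetical names, and it assumes float is smaller than double, as it is with the usual 4-byte float and 8-byte double.

```c
#include <assert.h>
#include <math.h>

/* True when the expression has a type the size of float.
   sizeof does not evaluate its operand, so this costs nothing at run time. */
#define IS_SINGLE(expr) (sizeof(expr) == sizeof(float))

/* Single precision divide: float / float constant. */
static float scale_single(float y)
{
    return y / 1716.00407747f;
}

/* The unsuffixed constant is a double, so y is promoted,
   the divide is done in double, and the result is narrowed back to float. */
static float scale_via_double(float y)
{
    return (float)(y / 1716.00407747);
}

/* Confirms which of the two expressions is evaluated as float. */
static void check_constant_types(float y)
{
    (void)y;   /* only used inside sizeof, which is not evaluated */
    assert(sizeof(float) < sizeof(double));   /* assumption of this sketch */
    assert(IS_SINGLE(y / 1716.00407747f));
    assert(!IS_SINGLE(y / 1716.00407747));
}
```

The two functions return values that agree to within float rounding, so the 'f' suffix mainly changes how the result is computed rather than the result itself.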

0 CJ Wilkerson over 12 years ago in reply to Chester Gillon

Prodigy 180 points

Chester Gillon said:

Which TI ARM compiler version are you using, and what are the "Target processor version (--silicon_version, -mv)", "Specify floating point support (--float_support)" and "Optimization level (--opt_level, -O)" options set to?

Looking at the following code targeting a LM4F120H5QR with a target processor version of "7M4", floating point support of "FPv4SPD16" and optimization level of "2":
float w, e, mu, energy;

w = (e * mu) / (energy + 0.000000119209289f);

The TI ARM compiler v5.0.5 generated the following floating point instructions:

VADD.F32 S5, S1, S0 ; [DPU_LIN_PIPE] |260|
VMUL.F32 S4, S2, S4 ; [DPU_LIN_PIPE] |260|
VDIV.F32 S4, S4, S5 ; [DPU_LIN_PIPE] |260|


Target processor: 7M4

Floating point support: FPv4SPD16

Optimization level "0"

Compiler Version TI v5.0.4

It did not generate those instructions; instead it generated FMUL, FADD, and FDIV.

0 CJ Wilkerson over 12 years ago in reply to CJ Wilkerson

Prodigy 180 points

Hello again.

We still have not been able to generate hardware floating point instructions. We have tinkered with a lot of settings and tried many other things but are still unable to make any progress on this issue.

Does anyone have any other ideas or things to try?

Thanks.

0 Petrei over 12 years ago in reply to CJ Wilkerson

Guru 26105 points

Hi,

Do you have a small test program showing what you tried to do or test? Zip it and post it, and don't forget to include the Debug folder (generated code and listing). I can help, first by looking at the results and then by compiling it under Windows to see the differences.

Petrei

0 Chester Gillon over 12 years ago in reply to CJ Wilkerson

Guru 92251 points

CJ Wilkerson said:
We still have not been able to generate hardware floating point instructions. We have tinkered with a lot of settings and tried many other things but are still unable to make any progress on this issue.

To follow up on my previous comment about implicit double conversions, I created the following example for a LM4F120H5QR:

float float_divide (float y)
{
    return y / 1716.00407747f;
}
float implicit_double_divide (float y)
{
    return y / 1716.00407747;
}

int main(void)
{
    return float_divide (1000.0f) + implicit_double_divide (1000.0f);
}

Hardware floating point instructions were created for the float_divide function:

float_divide:
;* --------------------------------------------------------------------------*
        SUB       SP, SP, #8            ; [DPU_3_PIPE]
        VSTR.32   S0, [SP, #0]          ; [DPU_LIN_PIPE] |6|
;----------------------------------------------------------------------
;   7 | return y / 1716.00407747f;
;----------------------------------------------------------------------
        LDR       A1, $C$FL1            ; [DPU_3_PIPE] |7|
        VMOV      S1, A1                ; [DPU_LIN_PIPE] |7|
        VLDR.32   S0, [SP, #0]          ; [DPU_LIN_PIPE] |7|
        VDIV.F32 S0, S0, S1            ; [DPU_LIN_PIPE] |7|
        ADD       SP, SP, #8            ; [DPU_3_PIPE]
        BX        LR                    ; [DPU_3_PIPE]

Whereas for implicit_double_divide the double constant caused the divide to be implicitly performed as double, which caused the software double precision library to be called:

implicit_double_divide:
;* --------------------------------------------------------------------------*
        PUSH      {A4, LR}              ; [DPU_3_PIPE]
        VSTR.32   S0, [SP, #0]          ; [DPU_LIN_PIPE] |11|
;----------------------------------------------------------------------
; 12 | return y / 1716.00407747;
;----------------------------------------------------------------------
        LDR       A1, [SP, #0]          ; [DPU_3_PIPE] |12|
        BL        __aeabi_f2d           ; [DPU_3_PIPE] |12|
        ; CALL OCCURS {__aeabi_f2d }     ; [] |12|
        ADR       A3, $C$FL2            ; [DPU_3_PIPE] |12|
        LDMIA     A3, {A3,A4}           ; [DPU_3_PIPE] |12|
        BL        __aeabi_ddiv          ; [DPU_3_PIPE] |12|
        ; CALL OCCURS {__aeabi_ddiv }    ; [] |12|
        BL        __aeabi_d2f           ; [DPU_3_PIPE] |12|
        ; CALL OCCURS {__aeabi_d2f }     ; [] |12|
        VMOV      S0, A1                ; [DPU_LIN_PIPE] |12|
        POP       {A4, PC}              ; [DPU_3_PIPE]

Does your C code for which floating point instructions are not being generated contain any implicit or explicit double precision variables, constants or functions?

0 CJ Wilkerson over 12 years ago in reply to Petrei

Prodigy 180 points

Alright. I took the project that I was trying to get to generate hardware floating point instructions, copied it, and gutted the copy. All that is left is a simple test made up of the code that Chester provided (which he has shown generates hardware floating point instructions) plus the initialization of the FPU and the clock. I compiled it and ran it to double check that it did NOT generate hardware FP instructions before attaching it here. Any information you may need is probably stated in previous replies, but if you need something, tell me where to find it and I'll post it here.

Thank you both for your help so far. Hopefully we can get this all figured out.

Attached is the compressed testproject

4834.testproject.zip

0 Chester Gillon over 12 years ago in reply to CJ Wilkerson

Guru 92251 points

CJ Wilkerson said:
I compiled it and ran it to double check that it did NOT generate hardware FP instructions before attaching it here

How did you determine that it did NOT generate hardware FP instructions?

I imported your testproject into a CCS 5.4 workspace under Windows XP, changed the Unix style StellarisWare references to point to my Windows installation, and ran it on a Stellaris LaunchPad. The CCS disassembler showed floating point instructions being generated for the single precision calculations:

40        void testfpu(float arg) {
          testfpu:
00000f44:   B500     PUSH            {LR}
00000f46:   F1AD0D14 SUB.W           R13, R13, #20
00000f4a:   ED8D0A00 FSTS            S0, [R13, #0]
42        e = arg;
00000f4e:   9800     LDR             R0, [SP]
00000f50:   9002     STR             R0, [SP, #0x8]
43        mu = arg;
00000f52:   9800     LDR             R0, [SP]
00000f54:   9003     STR             R0, [SP, #0xC]
44        energy = arg;
00000f56:   9800     LDR             R0, [SP]
45        w = (e * mu) / (energy + 0.000000119209289f);
00000f58:   ED9D1A03 FLDS            S2, [R13, #12]
44        energy = arg;
00000f5c:   9004     STR             R0, [SP, #0x10]
45        w = (e * mu) / (energy + 0.000000119209289f);
00000f5e:   482E     LDR             R0, $C$FL1
00000f60:   EE000A90 FMSR            S1, R0
00000f64:   ED9D0A04 FLDS            S0, [R13, #16]
00000f68:   EE300A80 FADDS           S0, S1, S0
00000f6c:   EDDD0A02 FLDS            S1, [R13, #8]
00000f70:   EE610A20 FMULS           S1, S2, S1
00000f74:   EE800A80 FDIVS           S0, S1, S0
00000f78:   ED8D0A01 FSTS            S0, [R13, #4]
46        w = sqrtf(e);
00000f7c:   ED9D0A02 FLDS            S0, [R13, #8]
00000f80:   F001F8A0 BL              sqrtf
00000f84:   ED8D0A01 FSTS            S0, [R13, #4]
00000f88:   B005     ADD             SP, #0x14
00000f8a:   BD00     POP             {PC}
49        {
          float_divide:
00000f8c:   F1AD0D08 SUB.W           R13, R13, #8
00000f90:   ED8D0A00 FSTS            S0, [R13, #0]
50         return y / 1716.00407747f;
00000f94:   4821     LDR             R0, $C$FL2
00000f96:   EE000A10 FMSR            S0, R0
00000f9a:   EDDD0A00 FLDS            S1, [R13, #0]
00000f9e:   EE800A80 FDIVS           S0, S1, S0
00000fa2:   B002     ADD             SP, #0x8
00000fa4:   4770     BX              R14
54        {
          implicit_double_divide:
00000fa6:   B51C     PUSH            {R2, R3, R4, LR}
00000fa8:   ED8D0A00 FSTS            S0, [R13, #0]
55         return y / 1716.00407747;
00000fac:   9800     LDR             R0, [SP]
00000fae:   F000FF29 BL              __aeabi_f2d
00000fb2:   A41B     ADD             R4, PC, #0x6C $C$FL3
00000fb4:   E894000C LDMIA.W         R4, {R2, R3}
00000fb8:   F7FFFF28 BL              __aeabi_ddiv
00000fbc:   F000FD55 BL              __aeabi_d2f
00000fc0:   EE000A10 FMSR            S0, R0
00000fc4:   BD1C     POP             {R2, R3, R4, PC}
58        int main(void) {
          main:
00000fc6:   B508     PUSH            {R3, LR}
60        init() ;
00000fc8:   F000F817 BL              init
61        testfpu(0.5f);
00000fcc:   EEB60A00 VMOVS           S0, #5.000000e-01
00000fd0:   F7FFFFB8 BL              testfpu
62        return float_divide (1000.0f) + implicit_double_divide (1000.0f);
00000fd4:   4814     LDR             R0, $C$FL4
00000fd6:   EE000A10 FMSR            S0, R0
00000fda:   F7FFFFD7 BL              float_divide
00000fde:   4812     LDR             R0, $C$FL4
00000fe0:   EEF00A40 VMOVS           S1, S0
00000fe4:   EE000A10 FMSR            S0, R0
00000fe8:   F7FFFFDD BL              implicit_double_divide
00000fec:   EE300A20 FADDS           S0, S0, S1
00000ff0:   EEBD0AC0 FTOSIZS         S0, S0
00000ff4:   EE100A10 FMRS            R0, S0
00000ff8:   BD08     POP             {R3, PC}
67        void init() {

Notes:

a) Single stepping into the sqrtf library function shows the hardware square root instruction FSQRTS being used, after the input has been validated (an invalid argument will cause an exception to be raised)

b) CCS doesn't currently display the contents of the floating point registers, which is the subject of the enhancement request "CCS5.4 doesn't display the floating point registers for a Stellaris LM4F120H5QR"

0 Chester Gillon over 12 years ago in reply to CJ Wilkerson

Guru 92251 points

CJ Wilkerson said:
Did not generate those instructions, instead it generated FMUL, FADD, FDIV.

I now realise that some of the confusion may be that I have been quoting instructions from a mixture of the CCS debugger disassembler and the assembler listing produced by the TI ARM compiler, and the two can display the same instructions differently!

E.g. the TI ARM compiler assembler listing was displaying the following, which matches the instruction names given in the TI LM4F datasheet and the ARM Cortex®-M4 Technical Reference Manual:

        VMOV      S1, A1                ; [DPU_LIN_PIPE] |45|
        VLDR.32   S0, [SP, #16]         ; [DPU_LIN_PIPE] |45|
        VADD.F32 S0, S1, S0            ; [DPU_LIN_PIPE] |45|
        VLDR.32   S1, [SP, #8]          ; [DPU_LIN_PIPE] |45|
        VMUL.F32 S1, S2, S1            ; [DPU_LIN_PIPE] |45|
        VDIV.F32 S0, S1, S0            ; [DPU_LIN_PIPE] |45|
        VSTR.32   S0, [SP, #4]          ; [DPU_LIN_PIPE] |45|

Whereas the CCS debugger disassembler displays the same instructions as:

00000f60:   EE000A90 FMSR            S1, R0
00000f64:   ED9D0A04 FLDS            S0, [R13, #16]
00000f68:   EE300A80 FADDS           S0, S1, S0
00000f6c:   EDDD0A02 FLDS            S1, [R13, #8]
00000f70:   EE610A20 FMULS           S1, S2, S1
00000f74:   EE800A80 FDIVS           S0, S1, S0
00000f78:   ED8D0A01 FSTS            S0, [R13, #4]

0 CJ Wilkerson over 12 years ago in reply to Chester Gillon

Prodigy 180 points

So the instructions FADD, FMUL, etc... are hardware floating point instructions?

0 Chester Gillon over 12 years ago in reply to CJ Wilkerson

Guru 92251 points

CJ Wilkerson said:
So the instructions FADD, FMUL, etc... are hardware floating point instructions?

Yes, from looking at the ARMv7-M Architecture Reference Manual:

a) The assembler listing from the TI ARM compiler is displaying the Unified Assembler Language (UAL) mnemonics

b) The CCS disassembly is displaying the "legacy" Pre-UAL assembler mnemonics.
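
For reading a CCS disassembly against the data sheet, a small lookup table can help. The sketch below is illustrative: the struct and the ual_name function are hypothetical names. The first six pairs are taken from the two listings quoted earlier in this thread; the FSQRTS, FMRS and FTOSIZS entries are added from the ARM architecture naming and are worth checking against the ARMv7-M Architecture Reference Manual.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One pre-UAL mnemonic and the UAL mnemonic for the same instruction. */
struct mnemonic_pair {
    const char *pre_ual;
    const char *ual;
};

static const struct mnemonic_pair vfp_names[] = {
    /* Pairs visible in the listings in this thread */
    { "FMSR",    "VMOV"         },
    { "FLDS",    "VLDR.32"      },
    { "FSTS",    "VSTR.32"      },
    { "FADDS",   "VADD.F32"     },
    { "FMULS",   "VMUL.F32"     },
    { "FDIVS",   "VDIV.F32"     },
    /* Not paired in the thread; added from the architecture naming */
    { "FSQRTS",  "VSQRT.F32"    },
    { "FMRS",    "VMOV"         },
    { "FTOSIZS", "VCVT.S32.F32" },
};

/* Return the UAL name for a pre-UAL mnemonic, or NULL if it is not in the table. */
static const char *ual_name(const char *pre_ual)
{
    size_t i;

    assert(pre_ual != NULL);
    for (i = 0; i < sizeof vfp_names / sizeof vfp_names[0]; i++) {
        if (strcmp(vfp_names[i].pre_ual, pre_ual) == 0) {
            return vfp_names[i].ual;
        }
    }
    return NULL;
}
```

A NULL result only means the mnemonic is not in this short table; a branch such as BL __aeabi_ddiv is a call into the software floating point library rather than a hardware floating point instruction.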
