TMS320F28377D: Question about occupying computing cycle in PI MACRO(V)

user6301134

Prodigy 50 points

Part Number: TMS320F28377D

Regarding the number of cycles a PI calculation takes, the official routine is:
The measured calculation time is:
As shown in the figure, a single PI calculation takes 50 cycles in total. On the FPU, however, the saturation function alone takes about 40 cycles. How is a total of 50 cycles achieved?

over 3 years ago

0 Cherry Zhou over 3 years ago

TI__Mastermind 22235 points

Part Number: TMS320F28377D

Hi team,

Here's an issue from a customer that may need your help:

The official PI routine is as follows:

Its cycle count is measured as:

As shown above, a single PI calculation takes 50 cycles in total, yet on the FPU the saturation function alone takes more than 40 cycles. How is this achieved? Is IQmath required to reach a total of 50 cycles for a single PI calculation?

Could you help check this case? Thanks.

Best Regards,

Cherry

0 MatthewPate over 3 years ago

TI__Guru* 77510 points

I've assigned your post to the subject matter expert. Because of the US holiday, please expect our response by the end of the day on Monday.

Best,

Matthew

0 user6301134 over 3 years ago in reply to MatthewPate

Prodigy 50 points

Thank you for your help. I have now summarized my earlier questions based on my own tests:

1. The same statements take 130 cycles when placed in a .c file and 65 cycles when placed in a .h file. My earlier figure of 40 or more cycles for the saturation function was measured in a .c file. Why is there a difference?

2. Which is faster, FPU+TMU or IQmath?

3. Is there room to further reduce the cycle count of the PI_MACRO I rewrote? If so, please show me how.

0 Shanty over 3 years ago in reply to user6301134

TI__Expert 5410 points

Hi,

1) Can you share the disassembly of the code in both cases? It is possible that there is some compiler optimisation that's happening. What are the compiler options?

2) It depends on the operation. In the case of multiplication, IQmpy can take between 6 and 24 cycles depending on the type of multiplication and the arguments (see the benchmarks section of the IQMath library documentation). On FPU+TMU, a 32-bit multiplication takes 2 pipeline cycles. Keep in mind that you need to factor in the overhead of converting from fixed point to floating point and back.

3) Apart from using the '?' operator which you have in your comments, I don't see any other way to optimise that code further.
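For reference, here is a minimal sketch of what a '?'-based saturation might look like inside a floating-point PI step. The struct layout and the names `pi_t`, `sat` and `pi_step` are hypothetical and are not taken from the customer's PI_MACRO; the anti-windup rule (integrate only while the previous output was not clamped) mirrors the conditional visible in the posted source line. Whether the compiler turns the conditionals into branch-free code depends on the optimization level, so the disassembly should be checked.

```c
/* Hypothetical PI state; field names are illustrative only. */
typedef struct {
    float kp;       /* proportional gain */
    float ki_tc;    /* integral gain pre-multiplied by the sample time */
    float ui;       /* integrator state */
    float v1;       /* last output before saturation */
    float out;      /* last output after saturation */
    float out_min;  /* lower output limit */
    float out_max;  /* upper output limit */
} pi_t;

/* Clamp x into [lo, hi] using only conditional expressions. */
static inline float sat(float x, float lo, float hi)
{
    x = (x > hi) ? hi : x;
    return (x < lo) ? lo : x;
}

/* One PI update. The integrator only advances when the previous
 * output was not clamped (out == v1), which limits windup. */
static inline float pi_step(pi_t *p, float ref, float fdb)
{
    float err = ref - fdb;

    p->ui  = (p->out == p->v1) ? (p->ui + p->ki_tc * err) : p->ui;
    p->v1  = p->ui + p->kp * err;
    p->out = sat(p->v1, p->out_min, p->out_max);
    return p->out;
}
```

Calling `pi_step(&pi, ref, fdb)` once per control period returns the clamped output; the two conditional expressions in `sat` replace the pair of `if` statements used for limiting.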

-Shantanu

0 user6301134 over 3 years ago in reply to Shanty

Prodigy 50 points

Thank you for your response.

1. I am sorry, I do not know how to get the disassembly of the code in both cases. I simply wrote these two functions in the files. After calling them, I found that the same statements take significantly different numbers of cycles in the .c file and the .h file: about twice as many in the .c file. The compiler options are shown below:

2. If FPU+TMU is faster, why does the TI F2837x project (IDDK_PM_Servo_F2837x) use IQmath for multiplication and so on? Is there a special consideration?

3. As for the optimization with '?', could you please give the specific steps?

0 Shanty over 3 years ago in reply to user6301134

TI__Expert 5410 points

Hi,

Sorry for the late response.

1) You can open the disassembly window during a debug session from view -> disassembly. You can take a screenshot of both cases and paste it here.

2) They may be using other operations, such as trig functions, which are faster.

3) The code in your comments in the screenshot above is the optimised version.

-Shantanu

0 user6301134 over 3 years ago in reply to Shanty

Prodigy 50 points

Thank you for your practical advice.

1. For the .h file:

337 PI_MACRO(pi_isq_4);
014db8: E2AF001C MOV32 R0H, @0x1c, UNCF
014dba: E2AF011A MOV32 R1H, @0x1a, UNCF
014dbc: E7200008 SUBF32 R0H, R1H, R0H
014dbe: 7700 NOP
014dbf: E2030028 MOV32 @0x28, R0H
014dc1: E2AF011E MOV32 R1H, @0x1e, UNCF
014dc3: E2AF002C MOV32 R0H, @0x2c, UNCF
014dc5: E6940001 CMPF32 R1H, R0H
014dc7: AD14 MOVST0 NF,ZF
014dc8: 6010 SB C$L30, NEQ
014dc9: E2AF0022 MOV32 R0H, @0x22, UNCF
014dcb: E2AF0128 MOV32 R1H, @0x28, UNCF
014dcd: E7000008 MPYF32 R0H, R1H, R0H
014dcf: E2AF0132 MOV32 R1H, @0x32, UNCF
014dd1: E7000008 MPYF32 R0H, R1H, R0H
014dd3: E2AF012E MOV32 R1H, @0x2e, UNCF
014dd5: E7100040 ADDF32 R0H, R0H, R1H
014dd7: 6F03 SB C$L31, UNC
C$L30:
014dd8: E2AF002E MOV32 R0H, @0x2e, UNCF
C$L31:
014dda: E203002A MOV32 @0x2a, R0H
014ddc: 062A MOVL ACC, @0x2a
014ddd: 1E2E MOVL @0x2e, ACC
014dde: E2AF012A MOV32 R1H, @0x2a, UNCF
014de0: E2AF0028 MOV32 R0H, @0x28, UNCF
014de2: E7100040 ADDF32 R0H, R0H, R1H
014de4: 7700 NOP
014de5: E203002C MOV32 @0x2c, R0H
014de7: 062C MOVL ACC, @0x2c
014de8: 1E1E MOVL @0x1e, ACC
014de9: E2AF012C MOV32 R1H, @0x2c, UNCF
014deb: E2AF0024 MOV32 R0H, @0x24, UNCF
014ded: E6940001 CMPF32 R1H, R0H
014def: AD14 MOVST0 NF,ZF
014df0: 6503 SB C$L32, LEQ
014df1: 0624 MOVL ACC, @0x24
014df2: 1E1E MOVL @0x1e, ACC
C$L32:
014df3: E2AF0026 MOV32 R0H, @0x26, UNCF
014df5: E2AF012C MOV32 R1H, @0x2c, UNCF
014df7: E6940001 CMPF32 R1H, R0H
014df9: AD14 MOVST0 NF,ZF
014dfa: 6303 SB C$L33, GEQ
014dfb: 0626 MOVL ACC, @0x26
014dfc: 1E1E MOVL @0x1e, ACC

For the .c file:

C$L2:
015eb9: FE84 SUBB SP, #4
015eba: 0006 LRETR
57 {
pi_fun_calc():
015ebb: FE0A ADDB SP, #10
015ebc: A844 MOVL *-SP[4], XAR4
59 err = v->pi_ref - v->pi_fdb;
015ebd: 8344 MOVL XAR5, *-SP[4]
015ebe: E2AF00C4 MOV32 R0H, *+XAR4[0], UNCF
015ec0: E2AF01D5 MOV32 R1H, *+XAR5[2], UNCF
015ec2: E7200008 SUBF32 R0H, R1H, R0H
015ec4: 7700 NOP
015ec5: E2030048 MOV32 *-SP[8], R0H
60 v->ui=(pi_out==ui_delta)?(v->Ki*err*v->Tc+ui_delta):ui_delta;
015ec7: E2AF0146 MOV32 R1H, *-SP[6], UNCF
015ec9: E2AF004A MOV32 R0H, *-SP[10], UNCF
015ecb: E6940001 CMPF32 R1H, R0H
015ecd: AD14 MOVST0 NF,ZF
015ece: 600F SB C$L3, NEQ
015ecf: E2AF0148 MOV32 R1H, *-SP[8], UNCF
015ed1: E2AF00F4 MOV32 R0H, *+XAR4[6], UNCF
015ed3: E7000008 MPYF32 R0H, R1H, R0H
015ed5: E2AF03E5 MOV32 R3H, *+XAR5[4], UNCF
015ed7: E7000018 MPYF32 R0H, R3H, R0H
015ed9: E2AF024A MOV32 R2H, *-SP[10], UNCF
015edb: E7100080 ADDF32 R0H, R0H, R2H
C$L3:
015edd: 0212 MOVB ACC, #18
015ede: 0744 ADDL ACC, *-SP[4]
015edf: 8AA9 MOVL XAR4, @ACC
015ee0: E20300C4 MOV32 *+XAR4[0], R0H
61 ui_delta=v->ui;
015ee2: 0212 MOVB ACC, #18
015ee3: 0744 ADDL ACC, *-SP[4]
015ee4: 8AA9 MOVL XAR4, @ACC
015ee5: 06C4 MOVL ACC, *+XAR4[0]
015ee6: 1E4A MOVL *-SP[10], ACC
62 pi_out = v->ui +v->Kp*err;
015ee7: 0212 MOVB ACC, #18
015ee8: 0744 ADDL ACC, *-SP[4]
015ee9: 8AA9 MOVL XAR4, @ACC
015eea: 0208 MOVB ACC, #8
015eeb: E2AF00C4 MOV32 R0H, *+XAR4[0], UNCF
015eed: 0744 ADDL ACC, *-SP[4]
015eee: 8AA9 MOVL XAR4, @ACC
015eef: E2AF0248 MOV32 R2H, *-SP[8], UNCF
015ef1: E2AF01C4 MOV32 R1H, *+XAR4[0], UNCF
015ef3: E7000051 MPYF32 R1H, R2H, R1H
015ef5: 7700 NOP
015ef6: E7100040 ADDF32 R0H, R0H, R1H
015ef8: 7700 NOP
015ef9: E2030046 MOV32 *-SP[6], R0H
63 v->pi_out=pi_out;
015efb: C446 MOVL XAR6, *-SP[6]
015efc: 0216 MOVB ACC, #22
015efd: 0744 ADDL ACC, *-SP[4]
015efe: 8AA9 MOVL XAR4, @ACC
015eff: C2C4 MOVL *+XAR4[0], XAR6
64 if(pi_out>v->pi_out_max)
015f00: 020A MOVB ACC, #10
015f01: 0744 ADDL ACC, *-SP[4]
015f02: 8AA9 MOVL XAR4, @ACC
015f03: E2AF0146 MOV32 R1H, *-SP[6], UNCF
015f05: E2AF00C4 MOV32 R0H, *+XAR4[0], UNCF
015f07: E6940001 CMPF32 R1H, R0H
015f09: AD14 MOVST0 NF,ZF
015f0a: 6509 SB C$L4, LEQ
65 v->pi_out=v->pi_out_max;
015f0b: 020A MOVB ACC, #10
015f0c: 0744 ADDL ACC, *-SP[4]
015f0d: 8AA9 MOVL XAR4, @ACC
015f0e: C4C4 MOVL XAR6, *+XAR4[0]
015f0f: 0216 MOVB ACC, #22
015f10: 0744 ADDL ACC, *-SP[4]
015f11: 8AA9 MOVL XAR4, @ACC
015f12: C2C4 MOVL *+XAR4[0], XAR6
66 if(pi_out<-v->pi_out_max)
C$L4:
015f13: 020A MOVB ACC, #10
015f14: 0744 ADDL ACC, *-SP[4]
015f15: 8AA9 MOVL XAR4, @ACC
015f16: E2AF0146 MOV32 R1H, *-SP[6], UNCF
015f18: E2AF00C4 MOV32 R0H, *+XAR4[0], UNCF
015f1a: E6AF0000 NEGF32 R0H, R0H, UNCF
015f1c: E6940001 CMPF32 R1H, R0H
015f1e: AD14 MOVST0 NF,ZF
015f1f: 630D SB C$L5, GEQ
67 v->pi_out=-v->pi_out_max;
015f20: 020A MOVB ACC, #10
015f21: 0744 ADDL ACC, *-SP[4]
015f22: 8AA9 MOVL XAR4, @ACC
015f23: 0216 MOVB ACC, #22
015f24: E2AF00C4 MOV32 R0H, *+XAR4[0], UNCF
015f26: 0744 ADDL ACC, *-SP[4]
015f27: 8AA9 MOVL XAR4, @ACC
015f28: E6AF0000 NEGF32 R0H, R0H, UNCF
015f2a: E20300C4 MOV32 *+XAR4[0], R0H
68 }

2. For trig functions, isn't TMU+FPU faster than IQmath?

0 Shanty over 3 years ago in reply to user6301134

TI__Expert 5410 points

1. The number of instructions differs between the two cases, which leads to the difference in cycle count.

2. The cycle counts for TMU+FPU and IQMath are comparable. However, IQmath offers more resolution in some cases, such as Q30, which gives about 9 decimal places of resolution, whereas 32-bit floating point under the IEEE 754 standard gives 6 to 7 significant decimal digits. It also depends on how the rest of the application is implemented. For I/O reads and writes, it is quicker to use fixed point directly than floating point. Hope this helps.
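The resolution figures can be checked with a short host-side sketch in plain C. It does not use the IQmath library: `q30_from_double` and `q30_mul` are hypothetical helpers written only for illustration, and the constants follow from the definition of Q30 (30 fractional bits) and from `FLT_EPSILON` in `<float.h>`. Note that the Q30 step is an absolute step over the range [-2, 2), while the float step is relative to the magnitude of the value.

```c
#include <float.h>
#include <stdint.h>

#define Q30_ONE ((int32_t)1 << 30)   /* 1.0 in Q30 */

/* Smallest step of a Q30 value: 2^-30, about 9.3e-10. */
static double q30_resolution(void)
{
    return 1.0 / (double)Q30_ONE;
}

/* Relative step of IEEE 754 single precision near 1.0: 2^-23, about 1.2e-7. */
static double float_resolution(void)
{
    return (double)FLT_EPSILON;
}

/* Convert a value in [-2, 2) to Q30, rounding to nearest (hypothetical helper). */
static int32_t q30_from_double(double x)
{
    return (int32_t)(x * (double)Q30_ONE + (x >= 0.0 ? 0.5 : -0.5));
}

/* Q30 multiply through a 64-bit intermediate (hypothetical helper).
 * The right shift of a negative product is assumed to be arithmetic,
 * which is the usual behaviour but is implementation-defined in C. */
static int32_t q30_mul(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * (int64_t)b) >> 30);
}
```

The Q30 step is 128 times finer than the single-precision step near 1.0, at the cost of a fixed range and an extra shift after every multiplication.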

-Shantanu

0 Shanty over 3 years ago in reply to Shanty

TI__Expert 5410 points

I have reassigned this to someone from the compiler team to explain why the instructions differ between the .h and .c versions.

0 George Mock over 3 years ago in reply to Shanty

TI__Guru**** 244930 points

The compiler option -Ooff is being used, which means optimization is disabled. It does not make sense to consider the performance of code built with optimization disabled. Please change -Ooff to -O2 or higher, and then compare. What do you see then?
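As a rough illustration of the two forms being compared, here is a minimal sketch with the same arithmetic written once as a macro expanded on a named object and once as a function taking a pointer. The type `ctrl_t` and the names `P_CLAMP_MACRO` and `p_clamp_func` are hypothetical and much simpler than the customer's code. In the listings posted earlier in this thread, the macro version reads its operands with direct addressing (for example `@0x1c`), while the function version rebuilds each member address from the pointer saved on the stack (`MOVB ACC, #18`, `ADDL ACC, *-SP[4]`, `MOVL XAR4, @ACC`), which is typical of unoptimized code. Both forms compute the same result; the gap would be expected to narrow with optimization enabled, but that has not been measured here.

```c
/* Hypothetical controller state; names are illustrative only. */
typedef struct {
    float ref;      /* reference input */
    float fdb;      /* feedback input */
    float kp;       /* proportional gain */
    float out;      /* clamped output */
    float out_max;  /* symmetric output limit */
} ctrl_t;

/* Form 1: macro expanded on a named object, as with a header-based macro. */
#define P_CLAMP_MACRO(v)                                        \
    do {                                                        \
        float e_ = (v).ref - (v).fdb;                           \
        float u_ = (v).kp * e_;                                 \
        u_ = (u_ > (v).out_max) ? (v).out_max : u_;             \
        (v).out = (u_ < -(v).out_max) ? -(v).out_max : u_;      \
    } while (0)

/* Form 2: the same arithmetic as a function that takes a pointer. */
void p_clamp_func(ctrl_t *v)
{
    float e = v->ref - v->fdb;
    float u = v->kp * e;

    u = (u > v->out_max) ? v->out_max : u;
    v->out = (u < -v->out_max) ? -v->out_max : u;
}
```

Building both forms at the same optimization level and comparing the disassembly of each is a direct way to see whether the instruction counts still differ.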

Thanks and regards,

-George
