TMS320F28377D: Question about the computing cycles occupied by PI_MACRO(v)

Part Number: TMS320F28377D
  • Regarding the calculation cycles consumed by the PI, the official routine is as follows:

  • The measured calculation time is:

  • As shown in the figure, a single PI calculation takes 50 cycles in total, yet with the FPU the saturation function alone takes about 40 cycles. How is this achieved?  
  • Part Number: TMS320F28377D

    Hi team,

    Here's an issue from a customer that may need your help:

    Regarding the calculation cycles used by the PI, the official routine is as follows:

    It is calculated as:

    As shown above, a single PI calculation takes 50 cycles in total, while with the FPU the saturation function alone takes approximately 40+ cycles. How is this achieved? Is IQmath required to get a single PI calculation down to 50 cycles in total?

    Could you help check this case? Thanks.

    Best Regards,

    Cherry

  • I've assigned your post to the subject matter expert, but due to the US holiday, please expect our response by the end of day Monday.

    Best,

    Matthew

  • Thank you for your help. Based on my own tests, I have summarized my earlier questions:

    1. The same statements take 130 cycles when placed in a .c file but 65 cycles when placed in a .h file, and the 40+ cycles I quoted for the saturation function were measured in the .c file. Why?

    2. Which is faster: FPU+TMU or IQmath?

    3. Is there room to optimize the PI_MACRO I rewrote any further? If so, please let me know how.

  • Hi,

    1) Can you share the disassembly of the code in both cases? It is possible that there is some compiler optimisation that's happening. What are the compiler options?

    2) It depends on the operation. For multiplication, IQmpy takes between 6 and 24 cycles depending on the type of multiplication and its arguments (see the benchmarks section of the IQmath library documentation). On FPU+TMU, a 32-bit floating-point multiplication takes 2 pipeline cycles. Keep in mind that you also need to factor in the overhead of converting from fixed point to floating point and back.

    3) Apart from using the '?' operator which you have in your comments, I don't see any other way to optimise that code further. 
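    As an illustration of the conversion overhead mentioned in 2), here is a portable C sketch of a Q24 fixed-point multiply and of the same multiply done by converting to float and back. The helper names (q24_t, q24_mpy, float_to_q24, q24_to_float) are hypothetical and are not the TI IQmath API; the real IQmath macros are implemented differently and are optimized for the C28x, and the cycle counts above come from the IQmath documentation, not from this code.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; NOT the TI IQmath API.
 * Q24 in an int32_t: sign + 7 integer bits + 24 fraction bits, range [-128, 128). */
typedef int32_t q24_t;

#define Q24_ONE (1L << 24)

/* Fixed-point multiply: form the 64-bit product, then drop 24 fraction bits.
 * Assumes >> on a negative value is an arithmetic shift (implementation-defined in C). */
static inline q24_t q24_mpy(q24_t a, q24_t b)
{
    return (q24_t)(((int64_t)a * (int64_t)b) >> 24);
}

/* Q24 -> float: one conversion plus one scale. */
static inline float q24_to_float(q24_t x)
{
    return (float)x / (float)Q24_ONE;
}

/* float -> Q24: truncates toward zero, no saturation (caller keeps |x| < 128). */
static inline q24_t float_to_q24(float x)
{
    return (q24_t)(x * (float)Q24_ONE);
}

/* The same multiply on the FPU: two conversions in, one multiply, one conversion out. */
static inline q24_t q24_mpy_via_float(q24_t a, q24_t b)
{
    return float_to_q24(q24_to_float(a) * q24_to_float(b));
}
```

    For a single multiply, the float path needs two conversions in and one conversion out, so in a design that keeps its data in fixed point, those conversions can eat into the FPU's per-multiply advantage.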

    -Shantanu

  • Thank you for your response.

    1. Sorry, I don't know how to get the disassembly of the code in both cases. I simply wrote these two functions in the files. After calling them, I found that the cycle count for statements with the same functionality differed significantly between the .c file and the .h file: it was about twice as long in the .c file. The compiler options are shown below:

    2. If FPU+TMU is faster, why does the TI F2837x project (IDDK_PM_Servo_F2837x) use IQmath for multiplication and similar operations? Is there a special consideration?

    3. Regarding the '?' optimization, could you please give the specific steps?

  • Hi,

    Sorry for the late response.

    1) You can open the Disassembly window during a debug session from View -> Disassembly. Take a screenshot of both cases and paste it here.

    2) They may be using other operations, such as trig functions, which are faster.

    3) The code in the comments of your screenshot above is already the optimised version.
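    For readers without the screenshot, here is a minimal float sketch of a PI step whose output limiting uses the conditional operator instead of separate if statements that write through the pointer. The structure and names (PI_F32, sat_f32, pi_step, and the v1 field holding the pre-saturation output) are hypothetical, loosely modeled on the fields visible in the .c disassembly later in this thread; this is not the customer's exact code or TI's PI_MACRO.

```c
/* Hypothetical float PI state; field names are assumptions for illustration. */
typedef struct {
    float pi_ref;      /* input: reference */
    float pi_fdb;      /* input: feedback */
    float Kp;          /* proportional gain */
    float Ki;          /* integral gain */
    float Tc;          /* sample period */
    float pi_out_max;  /* symmetric output limit */
    float ui;          /* integrator state */
    float v1;          /* last pre-saturation output */
    float pi_out;      /* last (saturated) output */
} PI_F32;

/* Saturation written with the conditional operator: clamp to hi, then to lo. */
static inline float sat_f32(float x, float hi, float lo)
{
    x = (x > hi) ? hi : x;
    return (x < lo) ? lo : x;
}

static inline void pi_step(PI_F32 *v)
{
    float err = v->pi_ref - v->pi_fdb;

    /* Anti-windup: integrate only if the previous output was not clamped,
     * i.e. the saturated output equals the pre-saturation output. */
    v->ui = (v->pi_out == v->v1) ? (v->ui + v->Ki * v->Tc * err) : v->ui;

    v->v1 = v->ui + v->Kp * err;
    v->pi_out = sat_f32(v->v1, v->pi_out_max, -v->pi_out_max);
}
```

    The integrator is simply frozen while the output is clamped (a basic anti-windup scheme). With optimization enabled, the compiler may be able to turn the two conditional assignments into the FPU's MAXF32/MINF32 instructions; check the disassembly rather than assuming it.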

    -Shantanu

  • Thank you for your practical advice.

    1. For the .h file:

    337 PI_MACRO(pi_isq_4);
    014db8: E2AF001C MOV32 R0H, @0x1c, UNCF
    014dba: E2AF011A MOV32 R1H, @0x1a, UNCF
    014dbc: E7200008 SUBF32 R0H, R1H, R0H
    014dbe: 7700 NOP
    014dbf: E2030028 MOV32 @0x28, R0H
    014dc1: E2AF011E MOV32 R1H, @0x1e, UNCF
    014dc3: E2AF002C MOV32 R0H, @0x2c, UNCF
    014dc5: E6940001 CMPF32 R1H, R0H
    014dc7: AD14 MOVST0 NF,ZF
    014dc8: 6010 SB C$L30, NEQ
    014dc9: E2AF0022 MOV32 R0H, @0x22, UNCF
    014dcb: E2AF0128 MOV32 R1H, @0x28, UNCF
    014dcd: E7000008 MPYF32 R0H, R1H, R0H
    014dcf: E2AF0132 MOV32 R1H, @0x32, UNCF
    014dd1: E7000008 MPYF32 R0H, R1H, R0H
    014dd3: E2AF012E MOV32 R1H, @0x2e, UNCF
    014dd5: E7100040 ADDF32 R0H, R0H, R1H
    014dd7: 6F03 SB C$L31, UNC
    C$L30:
    014dd8: E2AF002E MOV32 R0H, @0x2e, UNCF
    C$L31:
    014dda: E203002A MOV32 @0x2a, R0H
    014ddc: 062A MOVL ACC, @0x2a
    014ddd: 1E2E MOVL @0x2e, ACC
    014dde: E2AF012A MOV32 R1H, @0x2a, UNCF
    014de0: E2AF0028 MOV32 R0H, @0x28, UNCF
    014de2: E7100040 ADDF32 R0H, R0H, R1H
    014de4: 7700 NOP
    014de5: E203002C MOV32 @0x2c, R0H
    014de7: 062C MOVL ACC, @0x2c
    014de8: 1E1E MOVL @0x1e, ACC
    014de9: E2AF012C MOV32 R1H, @0x2c, UNCF
    014deb: E2AF0024 MOV32 R0H, @0x24, UNCF
    014ded: E6940001 CMPF32 R1H, R0H
    014def: AD14 MOVST0 NF,ZF
    014df0: 6503 SB C$L32, LEQ
    014df1: 0624 MOVL ACC, @0x24
    014df2: 1E1E MOVL @0x1e, ACC
    C$L32:
    014df3: E2AF0026 MOV32 R0H, @0x26, UNCF
    014df5: E2AF012C MOV32 R1H, @0x2c, UNCF
    014df7: E6940001 CMPF32 R1H, R0H
    014df9: AD14 MOVST0 NF,ZF
    014dfa: 6303 SB C$L33, GEQ
    014dfb: 0626 MOVL ACC, @0x26
    014dfc: 1E1E MOVL @0x1e, ACC

    For the .c file:

    C$L2:
    015eb9: FE84 SUBB SP, #4
    015eba: 0006 LRETR
    57 {
    pi_fun_calc():
    015ebb: FE0A ADDB SP, #10
    015ebc: A844 MOVL *-SP[4], XAR4
    59 err = v->pi_ref - v->pi_fdb;
    015ebd: 8344 MOVL XAR5, *-SP[4]
    015ebe: E2AF00C4 MOV32 R0H, *+XAR4[0], UNCF
    015ec0: E2AF01D5 MOV32 R1H, *+XAR5[2], UNCF
    015ec2: E7200008 SUBF32 R0H, R1H, R0H
    015ec4: 7700 NOP
    015ec5: E2030048 MOV32 *-SP[8], R0H
    60 v->ui=(pi_out==ui_delta)?(v->Ki*err*v->Tc+ui_delta):ui_delta;
    015ec7: E2AF0146 MOV32 R1H, *-SP[6], UNCF
    015ec9: E2AF004A MOV32 R0H, *-SP[10], UNCF
    015ecb: E6940001 CMPF32 R1H, R0H
    015ecd: AD14 MOVST0 NF,ZF
    015ece: 600F SB C$L3, NEQ
    015ecf: E2AF0148 MOV32 R1H, *-SP[8], UNCF
    015ed1: E2AF00F4 MOV32 R0H, *+XAR4[6], UNCF
    015ed3: E7000008 MPYF32 R0H, R1H, R0H
    015ed5: E2AF03E5 MOV32 R3H, *+XAR5[4], UNCF
    015ed7: E7000018 MPYF32 R0H, R3H, R0H
    015ed9: E2AF024A MOV32 R2H, *-SP[10], UNCF
    015edb: E7100080 ADDF32 R0H, R0H, R2H
    C$L3:
    015edd: 0212 MOVB ACC, #18
    015ede: 0744 ADDL ACC, *-SP[4]
    015edf: 8AA9 MOVL XAR4, @ACC
    015ee0: E20300C4 MOV32 *+XAR4[0], R0H
    61 ui_delta=v->ui;
    015ee2: 0212 MOVB ACC, #18
    015ee3: 0744 ADDL ACC, *-SP[4]
    015ee4: 8AA9 MOVL XAR4, @ACC
    015ee5: 06C4 MOVL ACC, *+XAR4[0]
    015ee6: 1E4A MOVL *-SP[10], ACC
    62 pi_out = v->ui +v->Kp*err;
    015ee7: 0212 MOVB ACC, #18
    015ee8: 0744 ADDL ACC, *-SP[4]
    015ee9: 8AA9 MOVL XAR4, @ACC
    015eea: 0208 MOVB ACC, #8
    015eeb: E2AF00C4 MOV32 R0H, *+XAR4[0], UNCF
    015eed: 0744 ADDL ACC, *-SP[4]
    015eee: 8AA9 MOVL XAR4, @ACC
    015eef: E2AF0248 MOV32 R2H, *-SP[8], UNCF
    015ef1: E2AF01C4 MOV32 R1H, *+XAR4[0], UNCF
    015ef3: E7000051 MPYF32 R1H, R2H, R1H
    015ef5: 7700 NOP
    015ef6: E7100040 ADDF32 R0H, R0H, R1H
    015ef8: 7700 NOP
    015ef9: E2030046 MOV32 *-SP[6], R0H
    63 v->pi_out=pi_out;
    015efb: C446 MOVL XAR6, *-SP[6]
    015efc: 0216 MOVB ACC, #22
    015efd: 0744 ADDL ACC, *-SP[4]
    015efe: 8AA9 MOVL XAR4, @ACC
    015eff: C2C4 MOVL *+XAR4[0], XAR6
    64 if(pi_out>v->pi_out_max)
    015f00: 020A MOVB ACC, #10
    015f01: 0744 ADDL ACC, *-SP[4]
    015f02: 8AA9 MOVL XAR4, @ACC
    015f03: E2AF0146 MOV32 R1H, *-SP[6], UNCF
    015f05: E2AF00C4 MOV32 R0H, *+XAR4[0], UNCF
    015f07: E6940001 CMPF32 R1H, R0H
    015f09: AD14 MOVST0 NF,ZF
    015f0a: 6509 SB C$L4, LEQ
    65 v->pi_out=v->pi_out_max;
    015f0b: 020A MOVB ACC, #10
    015f0c: 0744 ADDL ACC, *-SP[4]
    015f0d: 8AA9 MOVL XAR4, @ACC
    015f0e: C4C4 MOVL XAR6, *+XAR4[0]
    015f0f: 0216 MOVB ACC, #22
    015f10: 0744 ADDL ACC, *-SP[4]
    015f11: 8AA9 MOVL XAR4, @ACC
    015f12: C2C4 MOVL *+XAR4[0], XAR6
    66 if(pi_out<-v->pi_out_max)
    C$L4:
    015f13: 020A MOVB ACC, #10
    015f14: 0744 ADDL ACC, *-SP[4]
    015f15: 8AA9 MOVL XAR4, @ACC
    015f16: E2AF0146 MOV32 R1H, *-SP[6], UNCF
    015f18: E2AF00C4 MOV32 R0H, *+XAR4[0], UNCF
    015f1a: E6AF0000 NEGF32 R0H, R0H, UNCF
    015f1c: E6940001 CMPF32 R1H, R0H
    015f1e: AD14 MOVST0 NF,ZF
    015f1f: 630D SB C$L5, GEQ
    67 v->pi_out=-v->pi_out_max;
    015f20: 020A MOVB ACC, #10
    015f21: 0744 ADDL ACC, *-SP[4]
    015f22: 8AA9 MOVL XAR4, @ACC
    015f23: 0216 MOVB ACC, #22
    015f24: E2AF00C4 MOV32 R0H, *+XAR4[0], UNCF
    015f26: 0744 ADDL ACC, *-SP[4]
    015f27: 8AA9 MOVL XAR4, @ACC
    015f28: E6AF0000 NEGF32 R0H, R0H, UNCF
    015f2a: E20300C4 MOV32 *+XAR4[0], R0H
    68 }

    2. For trig functions, isn't TMU+FPU faster than IQmath?

  • 1. The number of instructions differs between the two cases, which leads to the difference in cycle count.

    2. The cycle counts for TMU+FPU and IQmath are comparable. However, IQmath offers more resolution in some cases, such as Q30, which gives about 9 decimal places of resolution, while 32-bit IEEE 754 floating point offers 6 to 7 significant decimal digits. It also depends on how the rest of the application is implemented: it's quicker for I/O reads and writes to use fixed point directly than floating point. Hope this helps.
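    The resolution comparison in 2) can be made concrete with standard arithmetic (not a benchmark). Q30 stores a value as a 32-bit integer scaled by 2^30, so its step size is constant over its range [-2, 2); IEEE 754 single precision has a 24-bit significand, so its step size grows with the magnitude of the value:

```latex
\begin{aligned}
\Delta_{\mathrm{Q30}} &= 2^{-30} \approx 9.31\times10^{-10}, && x \in [-2,\,2) \\
\Delta_{\mathrm{f32}}(x) &= 2^{\lfloor \log_2 |x| \rfloor - 23}, && x \text{ normal} \\
\Delta_{\mathrm{f32}}(x) &= 2^{-23} \approx 1.19\times10^{-7} = 128\,\Delta_{\mathrm{Q30}}, && |x| \in [1,\,2) \\
\Delta_{\mathrm{f32}}(x) &\le 2^{-31} < \Delta_{\mathrm{Q30}}, && |x| < 2^{-7}
\end{aligned}
```

    So Q30 resolves finer steps than float32 near full scale, while float32 resolves finer steps for small magnitudes and covers a far larger range. A 24-bit significand corresponds to roughly 24 log10(2) ≈ 7.2 significant decimal digits.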

    -Shantanu

  • I have reassigned this to someone from the compiler team to explain why the instructions differ between the .h and .c versions.

  • The compiler option -Ooff is being used, which means optimization is disabled.  It does not make sense to consider the performance of code built with optimization disabled.  Please change -Ooff to -O2 or higher, and then compare.  What do you see then?

    Thanks and regards,

    -George
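    A note on the listings above: the .h version is a macro expanded in place, so it accesses the structure with direct addressing (for example MOV32 R0H, @0x1c), while the .c version receives a pointer that it saves to *-SP[4] and, with -Ooff, reloads to recompute many field addresses (MOVB ACC, #18 / ADDL ACC, *-SP[4] / MOVL XAR4, @ACC). A common alternative to a macro is a static inline function defined in the header, which keeps type checking and, with optimization enabled, can usually be inlined into code comparable to the macro. The minimal sketch below shows the two forms computing the same result; P_DEMO, P_MACRO_DEMO and p_calc_inline are hypothetical names, and whether the compiler actually inlines the function should be confirmed in the disassembly.

```c
/* Hypothetical minimal state; not the customer's or TI's structure. */
typedef struct {
    float ref, fdb, kp, out_max, out;
} P_DEMO;

/* Macro form: expanded textually at each call site, operating on a struct lvalue. */
#define P_MACRO_DEMO(v)                                          \
    do {                                                         \
        float e_ = (v).ref - (v).fdb;                            \
        float o_ = (v).kp * e_;                                  \
        (v).out = (o_ > (v).out_max) ? (v).out_max               \
                : (o_ < -(v).out_max) ? -(v).out_max : o_;       \
    } while (0)

/* Inline-function form: same arithmetic, but type-checked and easier to debug.
 * Placed in a header, each .c file that includes it can inline it. */
static inline void p_calc_inline(P_DEMO *v)
{
    float e = v->ref - v->fdb;
    float o = v->kp * e;
    v->out = (o > v->out_max) ? v->out_max
           : (o < -v->out_max) ? -v->out_max : o;
}
```

    Comparing cycle counts between the two forms only makes sense with optimization enabled (for example -O2), as noted above.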