Increase CPU Speed TM4C1294 and compare to CMSIS DSP library



Dear all

CPU: TM4C1294 with 120MHz

Compiler: TI v5.2.4 (CCS6.1)

CMSIS: CMSIS-SP-00300-r4p3-00rel0

I plan to implement a two-phase stepper motor vector regulator. That means every 50us I need two PI regulators for the vector control plus the Park and inverse Park (I-Park) transformations. A position PID regulator also runs every 50us.

If I implement the vector regulator only with the CMSIS DSP library, it takes 7.8us (much too long).

If I implement the vector regulator with my own optimized code, it takes 2.8us. Even 2.8us is a high load for the final application.

Question:

Is it possible to shorten this 2.8us by running the code from RAM (RAMCODE), similar to the TMS320F280x DSP processors? (Can the TM4C do the same, and does code run faster from RAM? Where can I find application notes for beginners?)

Can the DSP core of the TM4C work in parallel with the normal CPU code? If parallel processing is possible, where can I find implementation examples?

Are there other ways to speed up the TM4C core?

This is my own 2.8us vector regulator. The code generally uses a 16-bit range for all variables, and multiplication results are 32 bits wide.

      sin_ang = sinus(uiAngle);
      cos_ang = cosinus(uiAngle);
      //Park transformation
      park_Ds = (si32CurrPhD * cos_ang + si32CurrPhQ * sin_ang) >> 15;
      park_Qs = (si32CurrPhQ * cos_ang - si32CurrPhD * sin_ang) >> 15;
      //D-Q Field PI-regulator
      si32Error = park_Ds -  MotPar.si32PowerOut;
      CommutPar.Curr_D_Ui = Sint32sat(CommutPar.Curr_D_Ui + si32Error, CommutPar.Curr_DQ_UiMax, - CommutPar.Curr_DQ_UiMax); //PI
      si32CurrPhD = (si32Error * CommutPar.Curr_D_Kp + CommutPar.Curr_D_Ui * CommutPar.Curr_D_Ki) >> 8;
      si32Error = park_Qs - CommutPar.Curr_Q_Set_Val;
      CommutPar.Curr_Q_Ui = Sint32sat(CommutPar.Curr_Q_Ui + si32Error, CommutPar.Curr_DQ_UiMax, - CommutPar.Curr_DQ_UiMax); //PI
      si32CurrPhQ = (si32Error * CommutPar.Curr_Q_Kp + CommutPar.Curr_Q_Ui * CommutPar.Curr_Q_Ki) >> 8;

      si32CurrPhD = Sint32sat(si32CurrPhD, CURR_AMPL_OUT_OF_RANGE, -CURR_AMPL_OUT_OF_RANGE);
      si32CurrPhQ = Sint32sat(si32CurrPhQ, CURR_AMPL_OUT_OF_RANGE, -CURR_AMPL_OUT_OF_RANGE);

      //I-Park transformation
      si32AmplPhD = (si32CurrPhD * cos_ang - si32CurrPhQ * sin_ang) >> 15; //Alpha
      si32AmplPhQ = (si32CurrPhQ * cos_ang + si32CurrPhD * sin_ang) >> 15; //Beta

This is the 7.8us vector regulator based on CMSIS DSP:

      q31theta = ((Uint32)uiAngle<<16);
      arm_sin_cos_q31(q31theta,&q31sinVal,&q31cosVal);
      q31Ialpha = ((Uint32)si32CurrPhD<<16);
      q31Ibeta = ((Uint32)si32CurrPhQ<<16);
      arm_park_q31(q31Ialpha,q31Ibeta,&q31pId,&q31pIq,q31sinVal,q31cosVal);
      arm_pid_init_q31(&PID_park_D,0);
      arm_pid_init_q31(&PID_park_Q,0);
      arm_inv_park_q31(q31pId,q31pIq,&q31Ialpha,&q31Ibeta,q31sinVal,q31cosVal);

I hope somebody has some input on how to speed up the slow TM4C duck.

Franz

  • Hello Franz,

    Let me clarify a point here: there is no DSP core in the TM4C device. The TM4C has a single CPU core (Cortex-M4F).

    What system clock frequency is the TM4C configured for?

    Regards
    Amit
  • Poster likely confuses "DSP" w/Floating Point.     Indeed the M4F has such "floating" - yet not DSP - capability...

  • Dear Amit

    Thank you for the reply. My system runs at 120MHz.

    Can I increase the CPU speed if the code runs from RAM? Do you have any samples, if that is possible?

    The idea that the TM4C has DSP performance comes from Texas Instruments. I found this:

    Floating Point Unit

    • Example End Applications
    – Data compression, sensor array processing, statistical signal processing, multi-band graphic equalizers
    – Measure, filter, compress real-world analog signals
    – Control systems such as motor control, solar inverters, lighting control
    – Digital signal control applications that demand an efficient, easy-to-use blend of control and signal processing capabilities

    • Combined multiply and accumulate (MAC) functions for increased precision
    – 32 x 32 multiply accumulate (MAC) with 64-bit result

    • Conversions between fixed-point and floating-point data formats, and floating-point constant instructions
    • 32-bit instructions for single-precision data-processing operations
    – Single instruction, multiple data (SIMD) for 16-bit data types
    • Decoupled three-stage pipeline
    • Hardware support for denormals and all IEEE rounding modes
    • Supports saturation math
    • FPU may be disabled to conserve power

    Franz

  • Hello Franz,

    120MHz is the maximum, regardless of whether the code runs from Flash or SRAM. Execution from SRAM may be faster because the Flash wait states are not involved, while SRAM always has a single wait state. However, SRAM sits on the System Bus, so performance may end up almost the same as Flash execution. I don't have ready-to-use code, but there are examples on the Code Composer forum and the TM4C forum that show how code can be relocated to SRAM.

    Regards
    Amit
  • May I add that very recently - a detailed series of such focused (code in RAM vs Flash) posts arrived - and the findings revealed, "Not to warrant the time & effort!"     This post centered upon locating ISR w/in SRAM - but the general findings should hold...

  • Hi Franz,

    Some points:

    • Please use the code formatter. Especially with the ugly Hungarian notation, it's really hard to read!
    • Are you certain you need to run your position loop at 20 kHz? The mechanical time constant of the system probably voids any benefit you get from running that fast. The current loop I get, but position, no...
    • As Amit said, there is no separate DSP core - rather some hardware instructions aimed at DSP operations not present on lesser cores.
    • You hide much of your code still - for example the "sinus" and "cosinus" functions are probably eating a lot of cycles - unless they use a look-up table. With the amount of RAM on the 129, you could even generate the LUT into RAM on startup. Or, you can calculate the LUT beforehand and just make a const table of it so it'll end up in Flash.
    • You seem to have knowledge that there is a separate FPU in the M4F core. Yet you write your code in fixed-point. Suggest you re-write to take full advantage of the FPU - simplifies a lot when you don't have to juggle around different precisions. Most of the operations are single-cycle IIRC, so no reason to "hold back".
    • And last but not least: have you enabled code optimization in the compile options? At least the example projects I've based my "clean project" template on, don't have any kind of optimization enabled - which of course is good for debugging but results in really inefficient code. I had an ISR come down from ~25us to ~8us just by enabling optimization. As it happens that very ISR was for doing FOC of a BLDC.
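    The floating-point rewrite suggested above could look roughly like the following. This is only a sketch: the names `pi_f32_t`, `park_f32`, `inv_park_f32`, `pi_step_f32` and `clampf` are hypothetical and do not come from the thread, the sine and cosine values are passed in by the caller, and the PI step mirrors the clamped-integrator structure of the fixed-point code in the original post. Whether it is faster than the fixed-point version on a given build has to be measured.

```c
/* All names here are hypothetical; they are not taken from the thread. */
typedef struct {
    float kp;         /* proportional gain */
    float ki;         /* integral gain */
    float integ;      /* integrator state */
    float integ_max;  /* symmetric integrator limit (anti-windup) */
} pi_f32_t;

/* Clamp v to the range [lo, hi]. */
static inline float clampf(float v, float lo, float hi)
{
    if (v > hi) return hi;
    if (v < lo) return lo;
    return v;
}

/* Park transform: stationary (alpha, beta) -> rotating (d, q).
 * s and c are sin and cos of the rotor angle, supplied by the caller. */
static inline void park_f32(float alpha, float beta, float s, float c,
                            float *d, float *q)
{
    *d = alpha * c + beta * s;
    *q = beta * c - alpha * s;
}

/* Inverse Park transform: rotating (d, q) -> stationary (alpha, beta). */
static inline void inv_park_f32(float d, float q, float s, float c,
                                float *alpha, float *beta)
{
    *alpha = d * c - q * s;
    *beta  = q * c + d * s;
}

/* One PI step: accumulate the error into a clamped integrator,
 * then combine the proportional and integral terms. */
static inline float pi_step_f32(pi_f32_t *pi, float error)
{
    pi->integ = clampf(pi->integ + error, -pi->integ_max, pi->integ_max);
    return error * pi->kp + pi->integ * pi->ki;
}
```

    No bit shifts are needed because the scaling is carried by the float format; output limiting can reuse `clampf`.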

  • Dear Veikko

    Thank you for your input.

    The sine and cosine functions are 16-bit adaptations of the old Stellaris library code. It is simple, and I think the precision is fine for my needs (without interpolation between two values).

    Here is the code for sinus:

    //*****************************************************************************
    //
    // A table of the value of the sine function for the first ninety degrees with
    // 129 entries (that is, [0] = 0 degrees, [128] = 90 degrees).  Each entry is
    // in 0.16 fixed point notation.
    //
    //*****************************************************************************
    static const uint16_t g_pui16FixedSineTable[] =
    {
        0x0000, 0x0324, 0x0648, 0x096C, 0x0C8F, 0x0FB2, 0x12D5, 0x15F6, 0x1917,
        0x1C37, 0x1F56, 0x2273, 0x2590, 0x28AA, 0x2BC4, 0x2EDB, 0x31F1, 0x3505,
        0x3817, 0x3B26, 0x3E33, 0x413E, 0x4447, 0x474D, 0x4A50, 0x4D50, 0x504D,
        0x5347, 0x563E, 0x5931, 0x5C22, 0x5F0E, 0x61F7, 0x64DC, 0x67BD, 0x6A9B,
        0x6D74, 0x7049, 0x7319, 0x75E5, 0x78AD, 0x7B70, 0x7E2E, 0x80E7, 0x839C,
        0x864B, 0x88F5, 0x8B9A, 0x8E39, 0x90D3, 0x9368, 0x95F6, 0x987F, 0x9B02,
        0x9D7F, 0x9FF6, 0xA267, 0xA4D2, 0xA736, 0xA994, 0xABEB, 0xAE3B, 0xB085,
        0xB2C8, 0xB504, 0xB73A, 0xB968, 0xBB8F, 0xBDAE, 0xBFC7, 0xC1D8, 0xC3E2,
        0xC5E4, 0xC7DE, 0xC9D1, 0xCBBB, 0xCD9F, 0xCF7A, 0xD14D, 0xD318, 0xD4DB,
        0xD695, 0xD848, 0xD9F2, 0xDB94, 0xDD2D, 0xDEBE, 0xE046, 0xE1C5, 0xE33C,
        0xE4AA, 0xE60F, 0xE76B, 0xE8BF, 0xEA09, 0xEB4B, 0xEC83, 0xEDB2, 0xEED8,
        0xEFF5, 0xF109, 0xF213, 0xF314, 0xF40B, 0xF4FA, 0xF5DE, 0xF6BA, 0xF78B,
        0xF853, 0xF912, 0xF9C7, 0xFA73, 0xFB14, 0xFBAC, 0xFC3B, 0xFCBF, 0xFD3A,
        0xFDAB, 0xFE13, 0xFE70, 0xFEC4, 0xFF0E, 0xFF4E, 0xFF84, 0xFFB1, 0xFFD3,
        0xFFEC, 0xFFFB, 0xFFFF
    };

    //*****************************************************************************
    //
    //! Computes an approximation of the sine of the input angle.
    //!
    //! \param ui16Angle is an angle expressed as a 0.16 fixed-point value that is
    //! the percentage of the way around a circle.
    //!
    //! This function computes the sine for the given input angle.  The angle is
    //! specified in 0.16 fixed point format, and is therefore always between 0 and
    //! 360 degrees, inclusive of 0 and exclusive of 360.
    //!
    //! \return Returns the sine of the angle, in 16.16 fixed point format.
    //
    //*****************************************************************************
    int16_t sinus(uint16_t ui16Angle)
    {
        uint16_t ui16Idx;

        //
        // Add 0.5 to the angle.  Since only the upper 9 bits are used to compute
        // the sine value, adding one to the tenth bit is 0.5 from the point of
        // view of the sine table.
        //
        ui16Angle += 0x0040;

        //
        // Get the index into the sine table from bits 14:7.
        //
        ui16Idx = (ui16Angle >> 7) & 255;

        //
        // If bit 14 is set, the angle is between 90 and 180 or 270 and 360.  In
        // these cases, the sine value is decreasing from one instead of increasing
        // from zero.  The indexing into the table needs to be reversed.
        //
        if(ui16Angle & 0x4000)
        {
            ui16Idx = 256 - ui16Idx;
        }

        //
        // Get the value of the sine.
        //
        ui16Idx = g_pui16FixedSineTable[ui16Idx];

        //
        // If bit 15 is set, the angle is between 180 and 360.  In this case, the
        // sine value is negative; otherwise it is positive.
        //
        if(ui16Angle & 0x8000)
        {
            return(0 - ui16Idx);
        }
        else
        {
            return(ui16Idx);
        }
    }

    Cosinus:

    #define cosinus(ui16Angle)         sinus((ui16Angle) + 0x4000)

    I also think a 20kHz position regulator is not needed. But at the moment I am doing a study of a closed-loop stepper together with the Profinet library from http://www.port.de/en/products/profinet.html

    For this study I need to calculate with the maximum CPU load.

    Optimization: I checked the optimization settings. With my code the result ranges from 2.2us to 7us, so the 2.8us was a compromise between speed and size. The time with the CMSIS DSP library was measured with the same settings as the 2.8us.

    Conclusion:

    I will drop the CMSIS library because it is too slow for my 50us IRQ code. My optimized code is faster in all cases, and I can only shave a very small amount of time off it. Agree?

    Franz

  • Franz,


    The code formatter is the button with the "</>" symbol on the toolbar. If the editing toolbar is not visible, click "use rich formatting" on the bottom right of the compose box. Please use it.

    So the sine/cosine functions are using a look-up table, that's good. You could still further optimize speed (and sacrifice size) by making a full 360 degree table, then the indexing could still be simplified. Also, ensure that the optimized code inlines the sine/cosine functions (check the produced assembly code) - if the compiler doesn't do that automatically, you can "suggest" it to do so by using the inline keyword.

    What is the Sint32sat function like? Is it inlined? Every non-inlined function call causes a penalty of some 20-30 cycles (Amit might remember the exact amount).

    I don't fully understand your time measurements optimized vs. non-optimized. Could you make a table of those? And did you crank the "speed vs size" slider all the way to max speed?

    And I still urge you to switch to "real" floating point calculations, as you have hardware that supports it. You might be able to simplify some operations - at least you'll get rid of the bitshifts.

    In the end, 2.2us is only 264 cycles - there's not too much to take away anymore!

    Franz Boeni said:
    Also this 2.8us are a high load for the final application.

    I meant to comment on this earlier, but forgot. How can 5.6 % (2.8/50) of CPU resources used to control a motor be too much in a motor controller application?
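    The two figures quoted in this post (264 cycles and 5.6 %) follow from simple arithmetic on the 120MHz clock and the 50us period. A small sketch using integer math, with hypothetical helper names `ns_to_cycles` and `load_permille`:

```c
#include <stdint.h>

/* Convert an execution time in nanoseconds to CPU cycles
 * for a clock frequency given in MHz (result is truncated). */
static inline uint32_t ns_to_cycles(uint32_t time_ns, uint32_t clock_mhz)
{
    return (uint32_t)(((uint64_t)time_ns * clock_mhz) / 1000u);
}

/* CPU load in tenths of a percent (per mille):
 * execution time relative to the interrupt period. */
static inline uint32_t load_permille(uint32_t time_ns, uint32_t period_ns)
{
    return (uint32_t)(((uint64_t)time_ns * 1000u) / period_ns);
}
```

    With these helpers, `ns_to_cycles(2200, 120)` gives 264 cycles and `load_permille(2800, 50000)` gives 56, that is 5.6 %.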

  • Dear Veikko

    Thank you for pushing me.

    Here are the CPU speed measurement results.

    These times are based on my original code. Measured execution time under different settings:

    Optimization off:

    0: 7.9us, 1: 7.6us, 2: 7.4us, 3: 4.3us, 4: 4.3us, 5: 4.3us

    Register optimization:

    5: 2.5us

    Local optimization:

    5: 2.25us

    Global optimization:

    5: 2.25us

    Interprocedure optimization:

    5: 2.05us

    Whole program optimization:

    5: 1.45us

    I then continued with the last setting: whole program optimization, 5 (speed optimized).

    Sine and cosine calculation with a full 360° table (see new code below):

    0.81us

    Additionally with Sint32sat as inline (see new code below):

    0.81us (whole program optimization inlines these automatically)

    Here is the sinus routine:

    static const uint16_t g_pui16FixedSineTable[] =
    {
            0x0000, 0x0324, 0x0648, 0x096C, 0x0C8F, 0x0FB2, 0x12D5, 0x15F6, 0x1917,
            0x1C37, 0x1F56, 0x2273, 0x2590, 0x28AA, 0x2BC4, 0x2EDB, 0x31F1, 0x3505,
            0x3817, 0x3B26, 0x3E33, 0x413E, 0x4447, 0x474D, 0x4A50, 0x4D50, 0x504D,
            0x5347, 0x563E, 0x5931, 0x5C22, 0x5F0E, 0x61F7, 0x64DC, 0x67BD, 0x6A9B,
            0x6D74, 0x7049, 0x7319, 0x75E5, 0x78AD, 0x7B70, 0x7E2E, 0x80E7, 0x839C,
            0x864B, 0x88F5, 0x8B9A, 0x8E39, 0x90D3, 0x9368, 0x95F6, 0x987F, 0x9B02,
            0x9D7F, 0x9FF6, 0xA267, 0xA4D2, 0xA736, 0xA994, 0xABEB, 0xAE3B, 0xB085,
            0xB2C8, 0xB504, 0xB73A, 0xB968, 0xBB8F, 0xBDAE, 0xBFC7, 0xC1D8, 0xC3E2,
            0xC5E4, 0xC7DE, 0xC9D1, 0xCBBB, 0xCD9F, 0xCF7A, 0xD14D, 0xD318, 0xD4DB,
            0xD695, 0xD848, 0xD9F2, 0xDB94, 0xDD2D, 0xDEBE, 0xE046, 0xE1C5, 0xE33C,
            0xE4AA, 0xE60F, 0xE76B, 0xE8BF, 0xEA09, 0xEB4B, 0xEC83, 0xEDB2, 0xEED8,
            0xEFF5, 0xF109, 0xF213, 0xF314, 0xF40B, 0xF4FA, 0xF5DE, 0xF6BA, 0xF78B,
            0xF853, 0xF912, 0xF9C7, 0xFA73, 0xFB14, 0xFBAC, 0xFC3B, 0xFCBF, 0xFD3A,
            0xFDAB, 0xFE13, 0xFE70, 0xFEC4, 0xFF0E, 0xFF4E, 0xFF84, 0xFFB1, 0xFFD3,
            0xFFEC, 0xFFFB, 0xFFFF,
            0xFFFB, 0xFFEC,
            0xFFD3, 0xFFB1, 0xFF84, 0xFF4E, 0xFF0E, 0xFEC4, 0xFE70, 0xFE13, 0xFDAB,
            0xFD3A, 0xFCBF, 0xFC3B, 0xFBAC, 0xFB14, 0xFA73, 0xF9C7, 0xF912, 0xF853,
            0xF78B, 0xF6BA, 0xF5DE, 0xF4FA, 0xF40B, 0xF314, 0xF213, 0xF109, 0xEFF5,
            0xEED8, 0xEDB2, 0xEC83, 0xEB4B, 0xEA09, 0xE8BF, 0xE76B, 0xE60F, 0xE4AA,
            0xE33C, 0xE1C5, 0xE046, 0xDEBE, 0xDD2D, 0xDB94, 0xD9F2, 0xD848, 0xD695,
            0xD4DB, 0xD318, 0xD14D, 0xCF7A, 0xCD9F, 0xCBBB, 0xC9D1, 0xC7DE, 0xC5E4,
            0xC3E2, 0xC1D8, 0xBFC7, 0xBDAE, 0xBB8F, 0xB968, 0xB73A, 0xB504, 0xB2C8,
            0xB085, 0xAE3B, 0xABEB, 0xA994, 0xA736, 0xA4D2, 0xA267, 0x9FF6, 0x9D7F,
            0x9B02, 0x987F, 0x95F6, 0x9368, 0x90D3, 0x8E39, 0x8B9A, 0x88F5, 0x864B,
            0x839C, 0x80E7, 0x7E2E, 0x7B70, 0x78AD, 0x75E5, 0x7319, 0x7049, 0x6D74,
            0x6A9B, 0x67BD, 0x64DC, 0x61F7, 0x5F0E, 0x5C22, 0x5931, 0x563E, 0x5347,
            0x504D, 0x4D50, 0x4A50, 0x474D, 0x4447, 0x413E, 0x3E33, 0x3B26, 0x3817,
            0x3505, 0x31F1, 0x2EDB, 0x2BC4, 0x28AA, 0x2590, 0x2273, 0x1F56, 0x1C37,
            0x1917, 0x15F6, 0x12D5, 0x0FB2, 0x0C8F, 0x096C, 0x0648, 0x0324,
            0x0000,
            0x0324, 0x0648, 0x096C, 0x0C8F, 0x0FB2, 0x12D5, 0x15F6, 0x1917,
            0x1C37, 0x1F56, 0x2273, 0x2590, 0x28AA, 0x2BC4, 0x2EDB, 0x31F1, 0x3505,
            0x3817, 0x3B26, 0x3E33, 0x413E, 0x4447, 0x474D, 0x4A50, 0x4D50, 0x504D,
            0x5347, 0x563E, 0x5931, 0x5C22, 0x5F0E, 0x61F7, 0x64DC, 0x67BD, 0x6A9B,
            0x6D74, 0x7049, 0x7319, 0x75E5, 0x78AD, 0x7B70, 0x7E2E, 0x80E7, 0x839C,
            0x864B, 0x88F5, 0x8B9A, 0x8E39, 0x90D3, 0x9368, 0x95F6, 0x987F, 0x9B02,
            0x9D7F, 0x9FF6, 0xA267, 0xA4D2, 0xA736, 0xA994, 0xABEB, 0xAE3B, 0xB085,
            0xB2C8, 0xB504, 0xB73A, 0xB968, 0xBB8F, 0xBDAE, 0xBFC7, 0xC1D8, 0xC3E2,
            0xC5E4, 0xC7DE, 0xC9D1, 0xCBBB, 0xCD9F, 0xCF7A, 0xD14D, 0xD318, 0xD4DB,
            0xD695, 0xD848, 0xD9F2, 0xDB94, 0xDD2D, 0xDEBE, 0xE046, 0xE1C5, 0xE33C,
            0xE4AA, 0xE60F, 0xE76B, 0xE8BF, 0xEA09, 0xEB4B, 0xEC83, 0xEDB2, 0xEED8,
            0xEFF5, 0xF109, 0xF213, 0xF314, 0xF40B, 0xF4FA, 0xF5DE, 0xF6BA, 0xF78B,
            0xF853, 0xF912, 0xF9C7, 0xFA73, 0xFB14, 0xFBAC, 0xFC3B, 0xFCBF, 0xFD3A,
            0xFDAB, 0xFE13, 0xFE70, 0xFEC4, 0xFF0E, 0xFF4E, 0xFF84, 0xFFB1, 0xFFD3,
            0xFFEC, 0xFFFB, 0xFFFF,
            0xFFFB, 0xFFEC,
            0xFFD3, 0xFFB1, 0xFF84, 0xFF4E, 0xFF0E, 0xFEC4, 0xFE70, 0xFE13, 0xFDAB,
            0xFD3A, 0xFCBF, 0xFC3B, 0xFBAC, 0xFB14, 0xFA73, 0xF9C7, 0xF912, 0xF853,
            0xF78B, 0xF6BA, 0xF5DE, 0xF4FA, 0xF40B, 0xF314, 0xF213, 0xF109, 0xEFF5,
            0xEED8, 0xEDB2, 0xEC83, 0xEB4B, 0xEA09, 0xE8BF, 0xE76B, 0xE60F, 0xE4AA,
            0xE33C, 0xE1C5, 0xE046, 0xDEBE, 0xDD2D, 0xDB94, 0xD9F2, 0xD848, 0xD695,
            0xD4DB, 0xD318, 0xD14D, 0xCF7A, 0xCD9F, 0xCBBB, 0xC9D1, 0xC7DE, 0xC5E4,
            0xC3E2, 0xC1D8, 0xBFC7, 0xBDAE, 0xBB8F, 0xB968, 0xB73A, 0xB504, 0xB2C8,
            0xB085, 0xAE3B, 0xABEB, 0xA994, 0xA736, 0xA4D2, 0xA267, 0x9FF6, 0x9D7F,
            0x9B02, 0x987F, 0x95F6, 0x9368, 0x90D3, 0x8E39, 0x8B9A, 0x88F5, 0x864B,
            0x839C, 0x80E7, 0x7E2E, 0x7B70, 0x78AD, 0x75E5, 0x7319, 0x7049, 0x6D74,
            0x6A9B, 0x67BD, 0x64DC, 0x61F7, 0x5F0E, 0x5C22, 0x5931, 0x563E, 0x5347,
            0x504D, 0x4D50, 0x4A50, 0x474D, 0x4447, 0x413E, 0x3E33, 0x3B26, 0x3817,
            0x3505, 0x31F1, 0x2EDB, 0x2BC4, 0x28AA, 0x2590, 0x2273, 0x1F56, 0x1C37,
            0x1917, 0x15F6, 0x12D5, 0x0FB2, 0x0C8F, 0x096C, 0x0648, 0x0324, 0x0000,
    };

    /**********************************************************************
      function sinus
      input   :  ui16Angle 0..0xFFFF (0..360°)
      output  :  sin(ui16Angle)
      purpose : get Sine
      call    : --
    ***********************************************************************/
    int16_t sinus(uint16_t ui16Angle)
    {
        return g_pui16FixedSineTable[ui16Angle >> 7];
    }

    #define cosinus(ui16Angle)         sinus((ui16Angle) + 0x4000)
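    One point to watch with a full-circle table: the table as posted stores unsigned 0.16 magnitudes and repeats the positive values for 180° to 360°, while the Park code scales with `>> 15`, which suggests signed Q15 sine values. A minimal sketch of a signed full-circle table, built once at startup from the quarter-wave table posted earlier, is shown here. The Q15 output format and the names `quarter_sine_u16`, `sine_q15`, `sine_q15_init`, `sinus_q15` and `cosinus_q15` are assumptions for illustration, not the code that was measured.

```c
#include <stdint.h>

/* Quarter-wave magnitudes in unsigned 0.16 format, copied from the table
 * posted earlier in the thread ([0] = 0 degrees, [128] = 90 degrees). */
static const uint16_t quarter_sine_u16[129] = {
    0x0000, 0x0324, 0x0648, 0x096C, 0x0C8F, 0x0FB2, 0x12D5, 0x15F6, 0x1917,
    0x1C37, 0x1F56, 0x2273, 0x2590, 0x28AA, 0x2BC4, 0x2EDB, 0x31F1, 0x3505,
    0x3817, 0x3B26, 0x3E33, 0x413E, 0x4447, 0x474D, 0x4A50, 0x4D50, 0x504D,
    0x5347, 0x563E, 0x5931, 0x5C22, 0x5F0E, 0x61F7, 0x64DC, 0x67BD, 0x6A9B,
    0x6D74, 0x7049, 0x7319, 0x75E5, 0x78AD, 0x7B70, 0x7E2E, 0x80E7, 0x839C,
    0x864B, 0x88F5, 0x8B9A, 0x8E39, 0x90D3, 0x9368, 0x95F6, 0x987F, 0x9B02,
    0x9D7F, 0x9FF6, 0xA267, 0xA4D2, 0xA736, 0xA994, 0xABEB, 0xAE3B, 0xB085,
    0xB2C8, 0xB504, 0xB73A, 0xB968, 0xBB8F, 0xBDAE, 0xBFC7, 0xC1D8, 0xC3E2,
    0xC5E4, 0xC7DE, 0xC9D1, 0xCBBB, 0xCD9F, 0xCF7A, 0xD14D, 0xD318, 0xD4DB,
    0xD695, 0xD848, 0xD9F2, 0xDB94, 0xDD2D, 0xDEBE, 0xE046, 0xE1C5, 0xE33C,
    0xE4AA, 0xE60F, 0xE76B, 0xE8BF, 0xEA09, 0xEB4B, 0xEC83, 0xEDB2, 0xEED8,
    0xEFF5, 0xF109, 0xF213, 0xF314, 0xF40B, 0xF4FA, 0xF5DE, 0xF6BA, 0xF78B,
    0xF853, 0xF912, 0xF9C7, 0xFA73, 0xFB14, 0xFBAC, 0xFC3B, 0xFCBF, 0xFD3A,
    0xFDAB, 0xFE13, 0xFE70, 0xFEC4, 0xFF0E, 0xFF4E, 0xFF84, 0xFFB1, 0xFFD3,
    0xFFEC, 0xFFFB, 0xFFFF
};

/* Hypothetical full-circle table in signed Q15, 512 entries for 0..360 degrees. */
static int16_t sine_q15[512];

/* Build the full-circle table once at startup. */
static void sine_q15_init(void)
{
    for (uint32_t i = 0; i < 512u; i++) {
        uint32_t k = i & 127u;        /* position within the quadrant */
        uint32_t quadrant = i >> 7;   /* 0..3 */
        /* Quadrants 1 and 3 read the quarter table backwards. */
        uint16_t mag = (quadrant & 1u) ? quarter_sine_u16[128u - k]
                                       : quarter_sine_u16[k];
        int16_t q15 = (int16_t)(mag >> 1);  /* unsigned 0.16 -> Q15, max 0x7FFF */
        /* Quadrants 2 and 3 (180..360 degrees) are negative. */
        sine_q15[i] = (quadrant & 2u) ? (int16_t)-q15 : q15;
    }
}

/* Angle is 0..0xFFFF for 0..360 degrees; the upper 9 bits index the table. */
static inline int16_t sinus_q15(uint16_t angle)
{
    return sine_q15[angle >> 7];
}

/* Cosine is sine advanced by 90 degrees; the cast wraps the angle at 360. */
static inline int16_t cosinus_q15(uint16_t angle)
{
    return sinus_q15((uint16_t)(angle + 0x4000u));
}
```

    After a single call to `sine_q15_init()`, each lookup is one shift and one indexed load, at the cost of 1 KB of RAM for the table. The same values could instead be precomputed and placed in a const table in Flash.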

    Here is the saturation function:

    /**********************************************************************
      function Sint32sat
      input   :  value and max, min
      output  :  value saturated between min and max
      purpose : Saturation
      call    :
    ***********************************************************************/
    static inline Sint32 Sint32sat(Sint32 value, Sint32 max, Sint32 min)
    {
        if (value > max)
            value = max;
        if (value < min)
            value = min;
        return value;
    }

    Code section on which I measured the CPU execution time (part of the 20kHz IRQ at the center of the PWM pulse):

          MAP_GPIOPinWrite(GPIO_PORTN_BASE,GPIO_PIN_1,0x02);

          sin_ang = sinus(uiAngle);
          cos_ang = cosinus(uiAngle);
          //Park transformation
          park_Ds = (si32CurrPhD * cos_ang + si32CurrPhQ * sin_ang) >> 15;
          park_Qs = (si32CurrPhQ * cos_ang - si32CurrPhD * sin_ang) >> 15;
          //D-Q Field PI-regulator
          si32Error = park_Ds -  MotPar.si32PowerOut;
          CommutPar.Curr_D_Ui = Sint32sat(CommutPar.Curr_D_Ui + si32Error, CommutPar.Curr_DQ_UiMax, - CommutPar.Curr_DQ_UiMax); //PI
          si32CurrPhD = (si32Error * CommutPar.Curr_D_Kp + CommutPar.Curr_D_Ui * CommutPar.Curr_D_Ki) >> 8;
          si32Error = park_Qs - CommutPar.Curr_Q_Set_Val;
          CommutPar.Curr_Q_Ui = Sint32sat(CommutPar.Curr_Q_Ui + si32Error, CommutPar.Curr_DQ_UiMax, - CommutPar.Curr_DQ_UiMax); //PI
          si32CurrPhQ = (si32Error * CommutPar.Curr_Q_Kp + CommutPar.Curr_Q_Ui * CommutPar.Curr_Q_Ki) >> 8;

          si32CurrPhD = Sint32sat(si32CurrPhD, CURR_AMPL_OUT_OF_RANGE, -CURR_AMPL_OUT_OF_RANGE);
          si32CurrPhQ = Sint32sat(si32CurrPhQ, CURR_AMPL_OUT_OF_RANGE, -CURR_AMPL_OUT_OF_RANGE);

          //I-Park transformation
          si32AmplPhD = (si32CurrPhD * cos_ang - si32CurrPhQ * sin_ang) >> 15; //Alpha
          si32AmplPhQ = (si32CurrPhQ * cos_ang + si32CurrPhD * sin_ang) >> 15; //Beta
          MAP_GPIOPinWrite(GPIO_PORTN_BASE,GPIO_PIN_1,0x00);

    Veikko: And I still urge you to switch to "real" floating point calculations, as you have hardware that supports it. You might be able to simplify some operations - at least you'll get rid of the bitshifts.

    Franz: No, I will not go back to float. I still have projects where this 16-bit and 32-bit arithmetic (interpreted like IQ16) works very well. This was only a study and a comparison with float. My own code is around 3 times faster than float, does the same job, and is tested for these motor applications.

    Veikko: I was supposed to comment on this earlier, but forgot. How can 5.6 % (2.8/50) of CPU resources used to control a motor be too much in a motor controller application?

    Franz: The Profinet implementation is the problem. If the interrupt CPU load of the motor is too high, we get too much jitter in the Profinet communication, so I want to reduce the interrupt CPU load as much as possible. In my case I am developing a vector stepper motor drive where the AD conversion and current regulation must happen exactly at the center of every PWM cycle. So the Profinet application must have lower priority than my central motor code, and I expect problems if the load approaches the 20% CPU load region.

    I hope this example helps to optimize the CPU speed of TM4C processors for a motor vector regulator. Thank you to all writers for your input.

    Franz