TMS320F28384S: Performance of PWM driverlib function call

Part Number: TMS320F28384S
Other Parts Discussed in Thread: C2000WARE

Hi TI Team,

We are using the 28384S processor and driverlib from C2000Ware_4_01_00_00, and the driverlib calls seem to be slow.

Background: We are running a very high-speed control loop at several tens of kHz. In previous projects, before driverlib was introduced, we set the duty-cycle count by writing directly to the EPWM registers, e.g. EPwm1Regs.CMPA.all = count;. With driverlib we now do this by calling EPWM_setCounterCompareValue.

Ran a test to set 10 PWM registers like this:

HWREGH(EPWM5_BASE + EPWM_O_CMPA + EPWM_COUNTER_COMPARE_A + 0x1U) = myCountForDutyCycle;

vs.

EPWM_setCounterCompareValue(EPWM5_BASE, EPWM_COUNTER_COMPARE_A, myCountForDutyCycle);

The 10 direct writes take ~270nsec, whereas the 10 calls to the driverlib function take ~1.8usec, roughly a 6x timing increase.

Are we using the driverlib correctly?

Are we missing something in terms of optimization for this library? 

  • Rafael,

    Really sorry for the delay in response. Some webpage maintenance activities and holidays in India delayed the response.

    This is the correct implementation. I do not have an exact comparison table for bitfield versus driverlib, but since driverlib includes some additional checks, a timing difference is expected. I will look for a better comparison between the bitfield and driverlib implementations and get back to you.

    In the meantime, if the operation is timing critical, I recommend staying with the direct register write approach that you used before.

    Aditya

  • Also, are you using any optimization options? The only additional code inside the driverlib function is an ASSERT check, and that should not affect the final code anyway when you build in RELEASE mode instead of debug.

  • Hello Aditya,

    Measurements above were done with a release build fully optimized for speed.

  • Can you please clarify which compiler optimization level is used, e.g. O1, O2, etc.?

    You can refer to the optimization guide: section 4.2, "Optimization levels", of the C2000™ C28x Optimization Guide.

    You can try the O2 optimization level. When optimization is off, the timing of a function call will be noticeably worse than that of a direct HWREG write.

  • Can you please help Rafael here on the optimization?

  • Hi Rafael,

    I looked at the function implementation of EPWM_setCounterCompareValue and it involves an if/else condition.

    If the parameter passed is a constant, the compiler optimizes the call and removes the if condition entirely. In your case it is a variable, so a comparison and a branch remain. One branch takes 4 cycles, plus the cycles for the comparison and the offset computation.

    You can take a look at the generated assembly code for more details.

    I believe you do not have the DEBUG macro set under predefined symbols. If it is set, the ASSERT checks are included as well.
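
    A minimal host-side sketch of the situation described above, not the actual driverlib source: the sim_ and SIM_ names are hypothetical, the register space is a plain array, and the offset and enum values are assumptions modelled on driverlib that should be checked against epwm.h and hw_epwm.h. It shows the same inline function called once with a constant module and once with a run-time module.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated 16-bit register space (assumption: on the target, HWREGH
   dereferences a real peripheral address instead of this array). */
static uint16_t sim_regs[0x100];
#define SIM_HWREGH(offset) (sim_regs[(offset)])

/* Illustrative values only; verify against your C2000Ware headers. */
#define SIM_O_CMPA 0x6AU
typedef enum {
    SIM_COMPARE_A = 0,
    SIM_COMPARE_B = 2,
    SIM_COMPARE_C = 5,
    SIM_COMPARE_D = 7
} SimCompareModule;

/* Same shape as the driverlib call: which word is written depends on
   the compare module, hence the if/else. */
static inline void sim_setCounterCompareValue(SimCompareModule compModule,
                                              uint16_t compCount)
{
    uint32_t registerOffset = SIM_O_CMPA + (uint32_t)compModule;

    assert(registerOffset + 0x1U < 0x100U);  /* stands in for ASSERT */

    if ((compModule == SIM_COMPARE_A) || (compModule == SIM_COMPARE_B)) {
        SIM_HWREGH(registerOffset + 0x1U) = compCount;  /* upper word */
    } else {
        SIM_HWREGH(registerOffset) = compCount;
    }
}

/* Constant module: after inlining, the comparison can be folded away. */
static void write_with_constant(uint16_t count)
{
    sim_setCounterCompareValue(SIM_COMPARE_A, count);
}

/* Run-time module: the comparison and branch must stay in the code. */
static void write_with_variable(SimCompareModule module, uint16_t count)
{
    sim_setCounterCompareValue(module, count);
}
```

    Comparing the generated assembly of the two wrappers with the target compiler would be one way to see whether the branch survives.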

    Regards,

    Veena

  • Correct, we are not using the DEBUG macro.

    I ran the test myself to verify the numbers that were previously reported to me.

    Test Logic

    Set GPIO for measurement

    EPWM_setCounterCompareValue(base, counterCompareModule, count);

    ... another 9 calls with different bases, compare modules and counts to ensure nothing is optimized out.

    Clear GPIO for measurement

    This takes ~1.9usec to run.

    Then, with direct writes:

    Set GPIO for measurement

    HWREGH(base + registerOffset + 0x1U) = compCount;

    ... followed by another 9 of these with different data to ensure it does not get optimized out

    Clear GPIO for measurement

    This takes ~514nsec, so the gap is not as drastic as with the previously reported ~270nsec. I think the previous test had some code optimized out.

    These results confirm that calling the driverlib API to set this counter compare value takes almost 4 times longer than writing to the register directly.
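
    As a rough cross-check, the measured times can be converted to CPU cycles. The sketch below assumes a 200 MHz SYSCLK, which is not stated in this thread, and the helper name is hypothetical. The results also include whatever else runs inside the measured window, so they are upper bounds per operation.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed CPU clock in MHz; the thread does not state the actual value. */
#define ASSUMED_SYSCLK_MHZ 200U

/* Whole CPU cycles per operation, given the total time in nanoseconds
   measured for `ops` back-to-back operations (result is truncated). */
static uint32_t cycles_per_op(uint32_t total_ns, uint32_t ops)
{
    assert(ops > 0U);
    /* ns * MHz / 1000 = cycles for the whole window, then divide by ops */
    return (total_ns * ASSUMED_SYSCLK_MHZ) / (1000U * ops);
}
```

    Under that assumption, cycles_per_op(514, 10) gives about 10 cycles per direct write and cycles_per_op(1900, 10) gives about 38 cycles per driverlib call.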

    For the sake of performance, it sounds like we will have to go with a direct write.

    Is HWREGH the right way to go, or is it better to include the .h files under the device support files and use the EPwm1Regs.CMPA.all = value approach?  What is recommended?

  • Our project currently uses driverlib exclusively to set up and control the hardware.

    But if I have to use direct register writes to get the best performance, then it looks like we would have to include files from the device support:

    C:\ti\c2000\C2000Ware_4_01_00_00\device_support\f2838x\headers\include

    Is this the only way to get max performance?

    Or am I missing something else?

  • Is HWREGH the right way to go, or is it better to include the .h files under the device support files and use the EPwm1Regs.CMPA.all = value approach?  What is recommended?

    Some observations from our performance analysis of driverlib versus direct register access:

    1) The majority of driverlib functions are inline functions that perform direct register writes without conditions. In such cases driverlib and HWREG take exactly the same number of cycles.

    2) Inline functions that contain conditional branches: there are two scenarios here. If the input parameter is a constant, the compiler can remove the conditional branch completely and write directly to the required register, so the cycle count stays the same. If the compiler does not know the input parameter at compile time, it has to keep the conditional branch, which costs more cycles.

    Comparing HWREG with the bitfield approach:

    1) Performance: The compiler uses different addressing modes for HWREG and bitfields. We have seen bitfields work slightly better when there are back-to-back accesses to registers of the same peripheral.

    2) Ease of use: In my personal opinion, bitfield code is a bit cleaner and more intuitive than HWREG.

    3) Safety standard aspect: How bitfields are packed within a structure is compiler specific and is not defined by the C standard. C2000 devices have a single compiler, from TI, so this may not be an issue. I am not sure whether any safety standard prohibits the use of bitfields, but I believe the MISRA standard advises against unions, which our bitfield implementation uses.

    C:\ti\c2000\C2000Ware_4_01_00_00\device_support\f2838x\headers\include

    In addition to these headers, you would also need to add globalvariabledefs.c and the cmd files that place the structs at the correct memory addresses.
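
    To illustrate the two access styles compared above, here is a small host-side sketch, not the real device_support headers: the Sim names are hypothetical, the register is ordinary memory rather than a peripheral address, and the field order (CMPAHR in the low word, CMPA in the high word) is an assumption modelled on the CMPA register. It shows the bitfield write and the HWREGH-style "+ 1" write landing in the same 16-bit word.

```c
#include <assert.h>
#include <stdint.h>

/* Shape of a bit-field register definition (assumed field order). */
struct SimCmpaBits {
    uint16_t CMPAHR : 16;   /* low word: high-resolution extension */
    uint16_t CMPA   : 16;   /* high word: compare value */
};

union SimCmpaReg {
    uint32_t           all;
    struct SimCmpaBits bit;
};

static_assert(sizeof(union SimCmpaReg) == 4,
              "register must span two 16-bit words");

/* One simulated register viewed two ways: as a named structure
   (bit-field style) and as raw 16-bit words (HWREGH style). */
static union {
    union SimCmpaReg reg;
    uint16_t         words[2];
} sim_cmpa;

/* Bit-field style: the field name selects the word. */
static void write_bitfield_style(uint16_t count)
{
    sim_cmpa.reg.bit.CMPA = count;
}

/* HWREGH style: the "+ 1" word offset selects the same word. */
static void write_hwregh_style(uint16_t count)
{
    sim_cmpa.words[0 + 1] = count;
}
```

    On the target, the globalvariabledefs.c and cmd files mentioned above take the place of the simulated storage by locating the real structures at the peripheral addresses.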

    Regards,

    Veena

  • Hi Veena,

    I appreciate the information.

    I wanted to make sure I was not using the library incorrectly or missing a simple optimization setting.

    I suspected all along that, given the high speed of our control sequence, we would have to stay with direct register access.

    I agree with your advice that bitfield access is cleaner and makes the code more readable.

    Thank you.

    Rafael