TMS320F280038C-Q1: Clock Cycle Count discrepancy when setting register

Part Number: TMS320F280038C-Q1


Hello,

From my previous post (Found Here), I was under the impression that setting registers from the CPU was potentially much quicker than from the CLA, as the CPU has access to a larger instruction set. On that notion, I changed my code to transfer PWM settings from the CLA to the CPU, hoping to avoid the 8-10 cycles per instruction I was previously seeing. I need to set about 12 different registers, so the additional clock cycles quickly add up. However, when setting the PWM registers using CLA2CPU messages, I was again seeing 8-10 cycles, now on the CPU. Rather than debug the CLA code, I reproduced similar behavior with the dummy code below, where the code on the left takes 3 cycles to set PHSDIR and the code on the right takes 8-10 cycles.

The left code sets a uint16 variable ('set_reset_bit') to 200, increments by 1, and then masks it to set the PHSDIR bit. The right code is a running uint16 counter ('set_bit_count') that increments and then masks to set the PHSDIR bit. Can you help me understand why the right code takes 3X longer? 
 

  • Hello,

    Still looking for help on this.

    Thanks

  • For the code on the left, the compiler knows the value of set_reset_bit is always 2, so it uses this knowledge: the C28x doesn't have to read a variable and can use the number 2 directly.

    For the code on the right, set_bit_count is a variable, so the compiler has to do a bit more work. The DP moves load the data page pointer (DP), which is needed to access first the variable and then the register.
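
The difference between the two cases can be sketched roughly as follows. This is a hypothetical reconstruction from the prose in the question, not the original listing: the function names, the starting value 200, and the `& 1u` mask are assumptions, and the register write itself is left out so the sketch compiles on any C compiler.

```c
#include <stdint.h>

/* Running counter kept in RAM; its value is only known at run time. */
static uint16_t set_bit_count;

/* "Left" case: every operand is a compile-time constant, so the compiler
   is free to fold the increment and the mask into one immediate value. */
static inline uint16_t phsdir_from_constant(void)
{
    uint16_t set_reset_bit = 200;
    set_reset_bit++;
    return (uint16_t)(set_reset_bit & 1u);
}

/* "Right" case: the counter has to be read from memory, updated, and
   written back before the masked bit is available. */
static inline uint16_t phsdir_from_counter(void)
{
    set_bit_count++;
    return (uint16_t)(set_bit_count & 1u);
}
```

In the first function nothing depends on memory contents, which is what lets the compiler emit an immediate operand. In the second, the variable access is what introduces the extra data page setup described in the reply.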

  • Hi Lori,

    Thanks for responding. I see the 0x2000 constant now for the 13th bit.

    Is 8 clock ticks the best I can do, then, if I want to set PHSDIR from a variable that alternates between 0 and 1? I am aware I can do a set/clear like EPwm1Regs.TBCTL.all |= (1 << 13) and EPwm1Regs.TBCTL.all &= ~0x2000, which is faster, but it is not much of an improvement because I would need to include the logic around it.
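
One way to avoid the branching logic around a separate set and clear is a single read-modify-write that clears bit 13 and ORs in the new value. The sketch below is only an illustration: the helper name `tbctl_with_phsdir` and the mask macro are hypothetical, the register is modelled as a plain 16-bit value, and no cycle count is claimed for the code the C28x compiler would generate from it.

```c
#include <stdint.h>

/* PHSDIR is bit 13 of TBCTL (0x2000), as noted in the thread. */
#define TBCTL_PHSDIR_MASK 0x2000u

/* Return tbctl with PHSDIR replaced by the low bit of dir.
   All other bits of tbctl are passed through unchanged. */
static inline uint16_t tbctl_with_phsdir(uint16_t tbctl, uint16_t dir)
{
    uint16_t cleared = (uint16_t)(tbctl & (uint16_t)~TBCTL_PHSDIR_MASK);
    uint16_t bit = (uint16_t)((dir & 1u) << 13);
    return (uint16_t)(cleared | bit);
}
```

On the target this would presumably be used as `EPwm1Regs.TBCTL.all = tbctl_with_phsdir(EPwm1Regs.TBCTL.all, set_bit_count);`, assuming the bit-field header's `.all` member. Whether it beats the 8-10 cycles reported above would have to be measured.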

  • Jason, 

    One idea: the C28x supports MOV and MOV32 (when the FPU is enabled) instructions that can load from and store to addresses in the range 0-64K. This is very useful for operations on peripherals located in low memory. It allows the following cases to be optimized:

    // Define a struct for the peripheral register(s) or data. Must be in <= 64k of memory
    // for the optimization
    
    typedef struct {
        int x;
        int y;
    } PERIPHERAL;
    
    // Use the attribute to tell the compiler where the struct is
    // must match the registers or data struct location
    
    __attribute__((location((<location here>)))) PERIPHERAL peripheral;
    
    // Or address it directly with a pointer as shown
    
    #define peripheral (*(PERIPHERAL*)<location here>)

    This enables loads/stores to avoid the DP for “peripheral” or data struct accesses:

    global_int1 = peripheral.x;
    global_int2 = peripheral.y;
    peripheral.x = global_int1;
    peripheral.y = global_int2;