Compiler/TMS320F28069F: RPTB GPIO

Part Number: TMS320F28069F
Other Parts Discussed in Thread: TMS320F28379D

Tool/software: TI C/C++ Compiler

Hi,

We are trying to drive the GPIO as a parallel bus (as fast as possible) by reading from a table. The code is simple:

for(j=0;j<30;j++)
{
        GpioDataRegs.GPADAT.all    = tab[j];
}

With this code we get 2 MHz at most, probably because of the loop overhead.

By unrolling the loop manually, we get about 22 MHz.

At that point, we thought RPTB would help, because the iteration count is a constant (30 in this case).

We tried different optimization levels, but the precious RPTB instruction doesn't appear.

- Do you know why?

- How can we force this instruction to be used from C?

Regards,

Marc

  • For the source file which contains this loop, please follow the directions in the article How to Submit a Compiler Test Case.

    Thanks and regards,

    -George

  • The code is based on TI GPIO example, there is a small modification to update the GPIO port by going through a table.

    Compiler version : v20.2.1.LTS

    Optimisation: level 3; speed vs. size: 5 (max); floating point: relaxed

    but I tried many different levels with the same result.

    Here is the .pp file zipped.

    1411.Example_2806xGpioToggle.zip

  • Thank you for submitting a test case.  

    Part of the problem is that the loop control variable j ...

    for(j=0;j<30;j++)

    ... is a global variable.  Within the function, add a local variable also called j ...

        int j;

    If you build with --opt_level=3 --opt_for_speed=5, then the compiler completely unrolls the loop.  That may solve your problem.

    I also built with just --opt_level=3.  In that case, the generated code is wrong.  I filed the entry EXT_EP-9940 to have this investigated.  You are welcome to follow it with the link below in my signature.  Please see that entry for the details.

    I did not find a way to have the compiler use the repeat block instruction RPTB for the inner loop.  I suspect that is because the assignment in the loop is to a volatile variable.  A volatile expression, especially an assignment, often prevents compiler optimization.  Even so, I added a note to the entry I filed to look into whether it might be done in this case.

    Thanks and regards,

    -George

  • Thank you George,

    This does unroll the loop indeed, however, the frequency is not stable/even.

    I tried a basic loop:

    int N=0;
    for (N=0; N<25; N++)
    {
        b+=a*N;
    }

    222         for (N=0; N<25; N++)
    008545:   B50900A6    RPTB         #9, @AR6
            C$L1:
    008547:   2DA9        MOV          T, @AL
    008548:   761F0284    MOVW         DP, #0x284
    224             b+=a*N;
    00854a:   123C        MPY          ACC, T, @0x3c
    00854b:   723D        ADD          @0x3d, AL
    00854c:   92AC        MOV          AL, @T
    222         for (N=0; N<25; N++)
    00854d:   9C01        ADDB         AL, #1
    00854e:   7700        NOP          
    00854f:   7700        NOP          

    The RPTB instruction does appear here, so you must be right: the volatile register may prevent the compiler from optimizing the code further.

    I tried to trick it by using a constant pointer

    Uint32* const pGpio = 0x00006FC0;
    for(j=0;j<30;j++)
    {
        *(pGpio)    = tab[j];
    }

    but this time the compiler doesn't unroll the loop, and the RPTB instruction is still not there. However, the frequency is a stable 3.5 MHz.
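A caveat on the non-volatile pointer: without `volatile`, C permits a compiler to merge or drop all but the last store to the same address, so this trick may not be safe at every optimization level. The following is a minimal sketch, not the code from the attached project, that keeps the `volatile` qualifier and adds the explicit cast the pointer initializer needs. The function name `write_table` and the macro `GPADAT_ADDR` are hypothetical; the address is the one quoted in the post.

```c
#include <stdint.h>

/* Address quoted in the post above; only meaningful on the target. */
#define GPADAT_ADDR 0x00006FC0UL

/* Write n table entries to one fixed register address.
 * 'reg' is volatile-qualified, so every store must be emitted, in order. */
void write_table(volatile uint32_t *reg, const uint32_t *table, int n)
{
    int j;
    for (j = 0; j < n; j++) {
        *reg = table[j];
    }
}

/* Hypothetical call on the target:
 *   write_table((volatile uint32_t *)GPADAT_ADDR, tab, 30);
 */
```

Off-target, the same function can be exercised by passing the address of an ordinary `volatile uint32_t` variable instead of the register address.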

    The goal is to reach 10 MHz to drive one or two parallel DACs.



    Best regards,
    Marc

  • In my previous post I noted a bug I found in the compiler.  Please take a close look at that entry.  Now look at the assembly code generated by the compiler for the latest variants of your loop.  Is it possible this bug, or one nearly like it, is affecting things?  

    Thanks and regards,

    -George

  • Marc,

    Even at opt_level=3 without volatile keyword, I'm not seeing RPTB usage.

    I'll ask one of our application experts for input on best practice for this.

    Thanks
    Greg

  • Thanks Greg,

    We look forward to your answer.

    Marc

  • Marc,

    The feedback I received suggested possibly using the approach below:

    https://e2e.ti.com/support/microcontrollers/c2000/f/171/t/572480?TMS320F28069-TMS320F28069

    Greg

  • Hello Greg,

    Sorry, I'm not sure where to look in that thread, which is about SPI and doesn't mention the RPTB instruction.

    Did you mean to point to this post?

    Marc

  • Marc,

    I've asked for more apps support with your particular use case for the near term. 

    For tracking the ticket submitted earlier, and possible future compiler performance improvements, please see below:

    Thanks
    Greg

  • Hi Marc, 

    I haven't been able to locate an example that fits your situation exactly.  The post Greg mentioned is just one I found which uses IO to emulate a peripheral.  I understand it is not the same as your specific application.  

    Best Regards

    Lori

  • Marc Fournier said:
    We try to drive the GPIO as a parallel bus (as fast as possible) by reading a table

    Rather than trying to do that with the CPU, could the DMA controller be used instead?

    I must admit I haven't tried to write any code to program the DMA controller in the TMS320x2806x, but looking at the TRM there are independent step sizes for the source and destination addresses which could allow the source address to increment through a table while the destination address remains fixed for the GPIO data register.

  • Chester Gillon said:
    Rather than trying to do that with the CPU, could the DMA controller be used instead?

    That's an interesting idea!   

  • Hello Chester and Lori,

    The DMA would be perfect, but according to the datasheet it doesn't have access to the GPIO (or does it?)

    Marc

  • Marc,

    Your observation is correct. DMA does not have access to GPIO registers.

    Regards,

    Vivek Singh

  • Marc,

    Aside from more near term solution suggestions from apps experts, you can track progress on optimization improvements with below:

    https://sir.ext.ti.com/jira/browse/EXT_EP-9940

    Regards,
    Greg

  • We tried the TMS320F28379D without success (same code, same results).

    This one has a uPP peripheral, but it is limited to 8 bits. DMA doesn't have access to the GPIOs.

  • Hi Marc,

    I am joining the discussion a bit late, so my apologies if this has already been covered. I saw this in the code you sent to George: are you able to get the results you need if the GPIO is written one access after the other? For example:

        
    my_GPIOMsg()
    {    
        GpioDataRegs.GPADAT.all    = tab[0];
        GpioDataRegs.GPADAT.all    = tab[1];
        GpioDataRegs.GPADAT.all    = tab[2];
    ....
        GpioDataRegs.GPADAT.all    = tab[29];
    }
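If typing out the 30 assignments is the obstacle, the preprocessor can generate them. This is only a sketch under assumptions: `mock_gpadat` and `GPIO_WRITE` are hypothetical stand-ins so the code builds off-target (on the device, `GPIO_WRITE(v)` would expand to `GpioDataRegs.GPADAT.all = (v)`), and the macro names `WR1`, `WR5`, and `WR30` are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical stand-ins so the sketch builds off-target. */
volatile uint32_t mock_gpadat;
uint32_t tab[30];
#define GPIO_WRITE(v) (mock_gpadat = (v))

/* Each macro expands to complete statements, so no loop remains at run time. */
#define WR1(i)  GPIO_WRITE(tab[(i)]);
#define WR5(i)  WR1(i) WR1((i) + 1) WR1((i) + 2) WR1((i) + 3) WR1((i) + 4)
#define WR30()  WR5(0) WR5(5) WR5(10) WR5(15) WR5(20) WR5(25)

/* Expands to 30 consecutive writes, tab[0] through tab[29]. */
void my_GPIOMsg_unrolled(void)
{
    WR30()
}
```

The same pattern scales by nesting one more macro level, although for a table of thousands of points the code size grows with every entry.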

  • Marc Fournier said:
    We tried the TMS320F28379D without success (same code, same results).

    Hi Marc,

    I understand the compiler is currently not generating a solution that meets your needs.  I'd like to understand whether you have any solution that meets your performance needs.  In the original post you mentioned unrolling the loop manually.  Did that yield code that met the requirements?  In other words, do you have a workaround?

    -Lori

  • Hello Lori,

    Our goal is to feed a 12-bit DAC at 10 MHz with up to 4096 points from RAM. I think this is usually done with an FPGA, but we don't use FPGAs; we use the 28069 in almost every project.

    Unrolling the loop manually shows the real capability of the MCU: it can reach a 22 MHz throughput on the GPIO port. That would cover our needs, but doing it manually is impractical.

    Unrolling automatically gives an uneven output frequency: with the optimization, there is not the same number of instructions between successive GPIODAT updates.

    I think the RPTB instruction would solve this and allow fast "bit-bang" possibilities (which might be useful for other people). It could probably be done with a simple ASM function, something like this:

    RPTB #4096, label

    MOV32 GPIODATA, tab[i++]

    NOP (optional ?)

    label

    But I'm not proficient with ASM (as you can see).

    Best would be a compiler update, so we could stay in C.
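The numbers in this post can be turned into a rough cycle budget. The sketch below assumes a 90 MHz system clock (the F28069 maximum; the actual clock in this project is not stated in the thread), and the function names are hypothetical. It also checks whether a given number of writes per repeat block divides the table length evenly.

```c
#include <stdint.h>

/* Assumption: SYSCLK runs at the F28069 maximum of 90 MHz. */
#define SYSCLK_HZ 90000000UL

/* Whole CPU cycles available per output sample at a given update rate. */
unsigned long cycles_per_sample(unsigned long sysclk_hz, unsigned long rate_hz)
{
    return sysclk_hz / rate_hz;
}

/* Number of block iterations needed to send 'points' samples when each
 * block performs 'unroll' writes; returns 0 if the split is not exact. */
unsigned long block_iterations(unsigned long points, unsigned long unroll)
{
    if (unroll == 0UL || points % unroll != 0UL) {
        return 0UL;
    }
    return points / unroll;
}
```

Under that clock assumption, a 10 MHz update rate leaves 9 cycles per sample. A block of 5 writes does not divide 4096 points evenly, whereas a block of 8 writes would need 512 iterations.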

  • Marc,

    Thank you for the details and summary.  I will look at this a bit closer.  I agree that the end goal would be to enhance the compiler's capabilities to handle this type of scenario; it may have to be in assembly until that happens.

    Regards

    Lori

  • Marc,

    This C code with pragma UNROLL(5) forms an RPTB block.  Did you still need the infinite loop?

    void Gpio_example1_a(void)
    {
       int j = 0;
    #pragma UNROLL(5)
       for(j = 0; j<30; j++)
          GpioDataRegs.GPADAT.all = tab[j];
    }

    _Gpio_example1_a:

        MOVB XAR6,#5             ; [CPU_ALU]
        MOVL XAR4,#_tab          ; [CPU_ARAU]
        RPTB $C$L4,AR6           ; [CPU_ALU] |8687|
    $C$L3:
        MOVL XAR7,*XAR4++        ; [CPU_ALU] |8689|
        MOVW DP,#_GpioDataRegs   ; [CPU_ARAU]
        MOVL ACC,*XAR4++         ; [CPU_ALU] |8689|
        MOVL @_GpioDataRegs,XAR7 ; [CPU_ALU] |8689|
        MOVL P,*XAR4++           ; [CPU_ALU] |8689|
        MOVL XAR7,*XAR4++        ; [CPU_ALU] |8689|
        MOVL @_GpioDataRegs,ACC  ; [CPU_ALU] |8689|
        MOVL @_GpioDataRegs,P    ; [CPU_ALU] |8689|
        MOVL ACC,*XAR4++         ; [CPU_ALU] |8689|
        MOVL @_GpioDataRegs,XAR7 ; [CPU_ALU] |8689|
        MOVL @_GpioDataRegs,ACC  ; [CPU_ALU] |8689|
    $C$L4:
        LRETR                    ; [CPU_ALU]
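Read back into C, the listing above has the following shape: the repeat count of 5 loaded into XAR6 gives 6 passes, and each pass performs 5 writes, for 30 in total. The sketch below mirrors that structure only; `gpio_write`, `written`, and `n_written` are hypothetical mocks that record the writes so the order can be checked off-target, and they are not part of the TI example.

```c
#include <stdint.h>

#define TABLE_LEN 30
#define UNROLL     5

uint32_t tab[TABLE_LEN];

/* Hypothetical mock of the GPIO data register: records every write. */
uint32_t written[TABLE_LEN];
int      n_written;

void gpio_write(uint32_t v)
{
    written[n_written++] = v;
}

/* Six passes of five writes each, mirroring the RPTB block structure. */
void gpio_example_blocked(void)
{
    const uint32_t *p = tab;   /* plays the role of XAR4 in the listing */
    int block;

    for (block = 0; block < TABLE_LEN / UNROLL; block++) {
        gpio_write(*p++);
        gpio_write(*p++);
        gpio_write(*p++);
        gpio_write(*p++);
        gpio_write(*p++);
    }
}
```

Note that in the generated assembly the five stores are not separated by the same number of instructions, so the output timing within a block may still be uneven.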

    Regards,

    Greg

  • Marc,

    Here is a hand-coded assembly function you could try.  It assumes 30 entries in myTable, that myTable is global, and that the address of the data register is hard-coded (0x7F00).  I did a quick check on F2837x, as I do not have an F2806x with me; I haven't tested it fully.

    void sendGPIO(void)

       .if __TI_EABI__
       .asg         sendGPIO, _sendGPIO
       .asg         myTable, _myTable
       .asg         MY_GPIO_LOOP, _MY_GPIO_LOOP
       .endif
    
        .ref _myTable
        .global _sendGPIO
    GPIO_DATA_REGISTER .set 0x7F00
    
        .text
    _sendGPIO:
        MOVL XAR6, #_myTable        ; matches the .ref above; .asg maps it to myTable under EABI
        MOVL XAR7, #GPIO_DATA_REGISTER
    
        ; RPTB requires blocksize of 9 words minimum
        ; Each MOVL is a 16-bit opcode
        RPTB _MY_GPIO_LOOP, #(6-1)  ; Loop 6 times, total = 30 writes
        MOVL ACC, *XAR6++
        MOVL *XAR7, ACC             ; write 1
        MOVL ACC, *XAR6++
        MOVL *XAR7, ACC             ; write 2
        MOVL ACC, *XAR6++
        MOVL *XAR7, ACC             ; write 3
        MOVL ACC, *XAR6++
        MOVL *XAR7, ACC             ; write 4
        MOVL ACC, *XAR6++
        MOVL *XAR7, ACC             ; write 5, blocksize 10
    _MY_GPIO_LOOP