Implementing software delays despite the optimizer

I am trying to implement some simple software delays
using MSP430GCC compiler (-Os optimization).

void DELAY_uS(int count) //** Microsecond delay
{
    do {
    // Assume four clocks/loop (~1uS @ 4MHz)
    } while (--count);
}

void DELAY_mS(int count) //** Millisecond delay
{
    while (count--) {
        DELAY_uS(250);
        DELAY_uS(250);
        DELAY_uS(250);
        DELAY_uS(250);
     }     
}

I could not stop DELAY_uS from being either optimized away to a bare "reta"
or bloated with memory accesses caused by volatiles,
so I tried an assembly approach.

void DELAY_uS(int count) //** Microsecond delay
{
#define ASM(string)  __asm__ __volatile__(string)
    //ASM("add    #-1,r12");                  // Tried this
    //ASM("add    #-1,%0" : "=r" (count));    // Tried this        
    ASM("add    #-1,%0" : "=r" (count) : "r" (count));  // And tried this
    ASM("jnz    $-2");
}

This produces the following output.
DELAY_uS is implemented as expected, with r12 as the parameter
(as per SLAU646A section 6), but the calls to DELAY_uS are broken:
they are inlined with r14 as an uninitialized parameter.

00010358 <DELAY_uS>:
   10358:    3c 53           add    #-1,    r12    ;r3 As==11
   1035a:    fe 23           jnz    $-2          ;abs 0x10358
   1035c:    10 01           reta            ;

0001035e <DELAY_mS>:
   1035e:    3d 40 fa 00     mov    #250,    r13    ;#0x00fa
   10362:    3c 53           add    #-1,    r12    ;r3 As==11
   10364:    3c 93           cmp    #-1,    r12    ;r3 As==11
   10366:    0a 24           jz    $+22         ;abs 0x1037c
   10368:    3e 53           add    #-1,    r14    ;r3 As==11
   1036a:    fe 23           jnz    $-2          ;abs 0x10368
   1036c:    3e 53           add    #-1,    r14    ;r3 As==11
   1036e:    fe 23           jnz    $-2          ;abs 0x1036c
   10370:    3e 53           add    #-1,    r14    ;r3 As==11
   10372:    fe 23           jnz    $-2          ;abs 0x10370
   10374:    3e 53           add    #-1,    r14    ;r3 As==11
   10376:    fe 23           jnz    $-2          ;abs 0x10374
   10378:    80 01 62 03     mova    #66402,    r0    ;0x10362
   1037c:    10 01           reta            ;

The optimizer decides that since DELAY_uS is such a small function it can
be inlined (using r14) and sets up r13 with the call parameter (250),
but does not move this to r14.

It gets even weirder if I try to sprinkle around volatiles.
For instance it can bloat out DELAY_uS as shown below,
while DELAY_mS still uses the same broken inline as above.

00010358 <DELAY_uS>:
   10358:    b1 00 02 00     suba    #2,    r1    ;
   1035c:    81 4c 00 00     mov    r12,    0(r1)    ;
   10360:    2c 41           mov    @r1,    r12    ;
   10362:    3c 53           add    #-1,    r12    ;r3 As==11
   10364:    81 4c 00 00     mov    r12,    0(r1)    ;
   10368:    fe 23           jnz    $-2          ;abs 0x10366
   1036a:    a1 00 02 00     adda    #2,    r1    ;
   1036e:    10 01           reta            ;


Can anyone suggest how to pass a function parameter to an inline assembly statement?
Unfortunately 'volatile' does not seem to be the solution here, since it turns the volatile
item into a memory location, which causes unnecessary bloat. I also tried declaring count as
'register', but this seems to be ignored.
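
A likely explanation for the uninitialized r14, with a sketch of one way around it. In `"=r" (count) : "r" (count)` the output (%0) and the input (%1) are separate operands, so the compiler may put them in different registers. The template only names %0, which matches the listing above: 250 is loaded into r13 while the loop decrements r14. Using a single read-write operand ("+r") and keeping the decrement and the branch in one asm statement with a local label should avoid this. The `__MSP430__` guard and the host fallback branch are assumptions added so the sketch also builds off-target, and the cycle count per iteration has not been verified here.

```c
/* Sketch only: busy-wait loop with the counter as a read-write register
 * operand. Assumes count >= 1; count == 0 wraps around, as it does in
 * the original do/while version. */
static inline void DELAY_uS(unsigned int count)
{
#if defined(__MSP430__)
    /* One asm statement, so the compiler cannot place code between the
     * decrement and the branch. "+r" makes %0 both input and output. */
    __asm__ __volatile__(
        "1:  add  #-1, %0 \n\t"
        "    jnz  1b      \n\t"
        : "+r" (count)   /* read-write operand in any general register */
        :                /* no separate inputs */
        : "cc");         /* the status flags are modified */
#else
    /* Host fallback (assumption, for off-target builds): an empty asm
     * that claims to modify count keeps the loop from being removed. */
    do {
        __asm__ __volatile__("" : "+r" (count));
    } while (--count);
#endif
}
```

An equivalent spelling of the operand list is `: "=r" (count) : "0" (count)`, where "0" ties the input to the same register as output %0.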

  • You should consider using __delay_cycles().
  • As far as I can tell, __delay_cycles() cannot achieve as tight a delay as required here.
    It only takes a constant argument, so the best possible is
    do {
    __delay_cycles(0); // or __delay_cycles(1) with same result
    } while (--count);

    which generates
    10358: 03 43 nop
    1035a: 3c 53 add #-1, r12 ;r3 As==11
    1035c: 0c 93 cmp #0, r12 ;r3 As==00
    1035e: fc 23 jnz $-6 ;abs 0x10358

    It would be OK if it didn't add the redundant nop and cmp.
    But the shortcomings of __delay_cycles() are not the
    main issue here.

    The main point here is that I SHOULD be able to resort to ASM
    when the compiler does not do exactly what is required in
    cases like this.
    But there is a problem in passing a function argument
    to an ASM statement. This is either a compiler shortcoming,
    or else I am not getting the syntax quite right.

    i.e
    void DELAY_uS(int count) //** Microsecond delay
    {
    #define ASM(string) __asm__ __volatile__(string)
    //ASM("add #-1,r12"); // Tried this
    //ASM("add #-1,%0" : "=r" (count)); // Tried this
    ASM("add #-1,%0" : "=r" (count) : "r" (count)); // And tried this
    ASM("jnz $-2");
    }
    Fails in several different ways as described previously.

    NB: I know I can put all such code into separate modules which are
    compiled without optimizer, or use assembly source modules
    but this is a hassle.
  • __delay_cycles() generates code to wait the specified number of cycles. If you want to create a loop that waits a number of milliseconds, you can do something like the following, assuming you run at 16 MHz:

    do
    {
        __delay_cycles(16000);
    }
    while(--count);

    The overhead created by the loop is very small compared to code generated by the intrinsic.

    An alternative to using __delay_cycles() is to put the device into low-power-mode and use a timer to wake it up.

        -- Anders Lindgren, IAR Systems, Author of the IAR compiler for MSP430
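
    A minimal sketch of the loop described in this reply, with the cycle count derived from the clock frequency instead of being hard-coded. The names CPU_HZ and CYCLES_PER_MS are hypothetical, and the host stand-in for __delay_cycles() is an assumption added only so the arithmetic can be checked off-target; it relies on the toolchain predefining `__MSP430__` or `__ICC430__`.

```c
#define CPU_HZ        16000000UL          /* assumed MCLK frequency */
#define CYCLES_PER_MS (CPU_HZ / 1000UL)   /* 16000 cycles at 16 MHz */

#if !defined(__MSP430__) && !defined(__ICC430__)
/* Host stand-in (hypothetical, not part of any toolchain): adds up the
 * requested cycles instead of waiting. */
static unsigned long simulated_cycles;
#define __delay_cycles(n) ((void)(simulated_cycles += (n)))
#endif

/* Millisecond delay: one fixed-length intrinsic delay per iteration. */
static void DELAY_mS(unsigned int count)
{
    while (count--) {
        __delay_cycles(CYCLES_PER_MS);  /* argument must be a constant */
    }
}
```

    Using while (count--) rather than do/while means a count of zero returns immediately instead of wrapping around.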

  • See initial post.

    I was trying to implement a configurable delay in units of uSec as per DELAY_uS(count)
    using a 4MHz clock.

    Of course this is only possible down to a few uSec due to call and loop overhead
    but can get close with: add #-1,r12 ; jnz $-2 ; reta

    But I can not get msp430gcc to implement this in a function as detailed previously.

    The old mspgcc-2007 could generate this asm code from 'C' loop even with optimization.
    But the msp430gcc optimizer prevents this as well as preventing the asm solutions I have tried.
  • I don't know your toolchain/compiler in detail, but check if it has a #pragma to switch off/on optimization for a given code section.
    Another solution would be to place your delay function in a separate module (*.c file), and set its optimization level to zero at the IDE level. Most IDEs allow specific build settings for individual files.
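
    For GCC-based toolchains, a per-function setting along these lines exists as a function attribute (there is also a `#pragma GCC optimize` form). The sketch below assumes msp430gcc honours it the same way mainline GCC does, which has not been checked here; the function name DELAY_uS_O0 is hypothetical. Note that at -O0 the counter is typically kept on the stack, so this is likely to reproduce the memory-access bloat described in the original post rather than the tight two-instruction loop.

```c
/* noinline keeps the call from being merged into its callers;
 * optimize("O0") asks GCC to compile only this function unoptimized. */
__attribute__((noinline, optimize("O0")))
void DELAY_uS_O0(unsigned int count)
{
    do {
        /* empty body: the loop itself is the delay */
    } while (--count);
}
```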
  • >>The main point here is that I SHOULD be able to resort to ASM when the compiler does not do exactly what is required in cases like this.

    The main point here was that you did not read the documentation of the function in question, because it is indeed what you were looking for.
  • If you mean that I can implement DELAY_uS as a macro as per
    #define DELAY_uS(count)  __delay_cycles((count) * 4)   /* 1uS = 4 clocks @ 4MHz */
    then yes, this is one option (it tends to bloat the code a bit, but that is a minor issue).

    It would be more helpful if your replies were a little more detailed
    rather than the classically arrogant RTFM response.

    I am not sure what documentation (FM) you are referring to here.
    I cannot find in-depth documentation for msp430gcc.
    Documents like slau646a, 644 and 534 don't provide much detail on all the compiler features.
    Can get some ideas from slau32 - although that is not the same compiler and much does not apply.
    __delay_cycles() is briefly mentioned there - nothing much, as there is not much to know really.
    I can't find anything like the source forge manual on the old mspgcc.
    If you know of anything, please indicate.

    While the subject line here is about software delays, which can be 'solved' in several ways
    the bigger issue is the problem I found with passing an argument to ASM.
    But perhaps it is best I start a separate query for this.
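
    A sketch of the macro option mentioned in this reply, with the argument parenthesized so that expressions such as 100 + 150 scale correctly. CPU_MHZ and US_TO_CYCLES are hypothetical names, and the 4 MHz figure is the clock assumed in the thread. Because __delay_cycles() needs a compile-time constant, this form only suits fixed delays and cannot take a run-time count.

```c
#define CPU_MHZ 4UL   /* assumed clock: 4 cycles per microsecond */

/* Convert microseconds to CPU cycles; the cast avoids 16-bit overflow. */
#define US_TO_CYCLES(us) ((unsigned long)(us) * CPU_MHZ)

/* Fixed delay in microseconds; us must be a compile-time constant. */
#define DELAY_uS(us) __delay_cycles(US_TO_CYCLES(us))
```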

  • I am not sure what documentation (FM) you are referring to here.
    I can not find in depth documentation for msp430gcc.

    The compiler (and preprocessor) documentation, of course.

    Pragmas are compiler-dependent. Actually, they are a way to implement compiler-specific features. Most probably msp430gcc is compatible with 'plain' gcc in this regard.

    But having your critical function in a separate source file and applying specific build settings ("-O0") does not depend on pragmas.

    While the subject line here is about software delays, which can be 'solved' in several ways
    the bigger issue is the problem I found with passing an argument to ASM.

    I would write an empty template function in C, use the compiler to generate an assembler template ("-S"), and fill it with my code. But I usually don't mess with assembler: I'm mainly using Cortex-M controllers, and don't think I can beat the compiler in efficiency. And for an issue like yours, I would either use a timer or, for very short delays, insert NOPs and measure the result with a scope.

  • >It would be more helpful if your replies were a little more detailed
    >rather than the classically arrogant RTFM response.

    I provided a solution for you, though you dare to complain. There is indeed not much to tell: it is an intrinsic function which generates a delay for a given count of CPU cycles. What else do you want? A programming class in the forum? :) Please don't forget that this is the e2e forum; nobody is obligated to do anything here.

    >passing an argument to ASM.

    Well, as you have __delay_cycles() you don't need assembler anymore. Most probably that's why nobody cared to answer it.
