
MSP-EXP430F5529LP: C code for selecting bits correctly

Part Number: MSP-EXP430F5529LP

I've been writing code for selecting bits from registers. My code in C is:

  int16_t nextPhI = ((RES2 << 1) | ((RES1 & 0x8000) == 0x8000))
                    + ((RES1 & 0x4000) == 0x4000);

This compiles to the assembly

  40:	5f 02       	rlam	#1,	r15	;
  42:	4e 19 0a 10 	rpt #15 { rrux.w	r10		;
  46:	0f da       	bis	r10,	r15	;
  48:	4d 19 0e 10 	rpt #14 { rrux.w	r14		;
  4c:	5e f3       	and.b	#1,	r14	;r3 As==01

The result then gets merged in lower down.

But it should be possible to implement this as

BIT #8000, RES1
RLC RES2
BIT #4000, RES1
ADDC #0, RES2

Or

RLC RES1
RLC RES2
RLC RES1
ADDC #0, RES2

Where RES1 and RES2 are whichever registers they end up in.

Now I can't find how many instruction cycles "rpt #15 { rrux.w" takes - but it can't be good.

So is there a way to get gcc to compile C code down to the simple code above, or do I have to do it as "asm" code?

  • First off, I would not use "==" in that way. What it returns is a true flag which can of course be any non-zero value. These days it would seem that compilers use a value of 1 but that could change so I wouldn't count on it.

    Second, the compiler has no mechanism for propagating condition codes and has no clue that they exist. At least on a top level. They do get used by the code generator and libraries for extended types like a long integer.

    I will admit that the bit shifting is wildly inefficient and do not know why it was done that way. Perhaps some misguided notion that it was more efficient to move that bit around. That might be true with an architecture that has a barrel shifter but on the MSP430 it is going to take one cycle per shift. (Plus more cycles for the instruction fetch.)

    Unless there are serious time constraints on this code I would leave it as is. Ugly but not worth the effort to do it in assembly.

  • Thanks David. Alas this is in a tight PWM interrupt loop - I only have 400 cycles between interrupts - so yes, it's kind of sensitive to being as fast as possible.

    If I get time this weekend I'll look at doing it in assembler - I've a uart attached right now, so I can check the assembler is bit equivalent. What's useful is that the inputs "RES1" and "RES2" are actually memory-mapped registers, so I can just move the values out of memory in assembler.

    What's behind it is that I'm doing the calculation:

    out = (in + 0x40000000) >> 31

    Where "in" is a cryptic int47_t. Anyway if you look at that, the output only depends on the 17 most significant bits (or for me the 18 most significant, as this is living in an int48_t).

    Now given that only 17 bits matter, the question is how to use just those bits efficiently, rather than doing the painful >>31.

    I can do some simplifications like:

    out = ((in >> 30) + 1) >> 1

    So now the addition is simple.

    Oh yes - on the "==": what I used goes right back to K&R days; "A==A" has in the C language always been defined to return 1. Now yes, "if (A) B" runs "B" if "A" is any non-zero value, but you can still use "==" to get a "1" or "0" value. I may try coding it as "A?1:0" - but I doubt it will make any difference.

    For what it's worth I did start implementing the "+0x40000000" - I could do that by just writing it to the MPY32 accumulator before doing any calculation. That has the problem that I have to write 64 bits, or 4 words, to initialise the MPY32 accumulator - which is also painful - and then I still have the >>31 to do at the end ...

  • Hi David Summers,

    The compiler is going to give you what it gives you, but it looks like you have a path forward by using the direct assembly code. 

    If you have any specific questions, please respond back to this thread.

    Regards,

    Evan

  • Are you using the optimizer? The compiler often doesn't do as well as I might have, but sometimes it surprises me. I probably would have just written it:

    nextPhI = (RES2 << 1);
    if (RES1 & 0x8000) nextPhI |= 1;
    if (RES1 & 0x4000) nextPhI += 1;

    Jumps are cheap enough -- certainly cheaper than all that shifting. And I can see what I'm doing.

  • I tried an alternate version using "?:" with identical results. The compiler shifted the bit it had, using an even slower library call on a non-CPUX part. This was at -O2 and -Os with GCC. I see no reason why your variation would be treated any differently.

    The construct makes perfect sense for targets that have single cycle barrel shifters but is wildly inefficient for anything else.

    It can be coded much better in assembly by using the condition codes which the compiler mostly ignores. By shifting the needed bits into the carry flag there is no need for the bit testing which saves the 2 cycles required to fetch those bit masks.

  • When I do it (TI compiler) I don't see any shifting, just test/jump/or/test/jump/add. The only difference with optimization is that it does this in registers rather than in memory. I count something like 8 clocks for the longest path.

    Maybe there's something I'm missing here.

  • Yes that works better - it gives the assembly:

      26:	1d 42 e8 04 	mov	&0x04e8,r13	;0x04e8
      2a:	5d 02       	rlam	#1,	r13	;
      2c:	1c 42 e6 04 	mov	&0x04e6,r12	;0x04e6
      30:	0c 93       	cmp	#0,	r12	;r3 As==00
      32:	00 34       	jge	$+2      	;abs 0x34
      34:	1d d3       	bis	#1,	r13	;r3 As==01
    
    00000036 <.L12>:
      36:	b2 b0 00 40 	bit	#16384,	&0x04e6	;#0x4000
      3a:	e6 04 
      3c:	00 24       	jz	$+2      	;abs 0x3e
      3e:	1d 53       	inc	r13		;
    

    I guess I'm showing that I came from a world in the 1980s when "if" was quite expensive; and also this would force nextPhI into a memory location, which then had to be accessed from memory - also expensive. You can still see a bit of that in the assembly: the test of bit 0x4000 of RES1 is done by reading the memory location it came from again, when it could have just used the value already in r12 - which I'd expect to be quicker.

    Anyway, a quick look at that and it must be close to what I could hand code. By hand coding I'd go by setting the carry bit in the tests, and using RLC and ADDC to pull the carry into nextPhI. E.g. noting that "inc" is implemented as "add #1", you could just remove the "jz" jump and instead use "addc #0", which works because "bit" sets the C flag when it gets a non-zero result (e.g. when bit 14 is set).

    Oh yes - I am using the optimiser, varying between -Os and -O3; but with my code there's almost no difference in the compiled code.

  • And just to close this thread down, I can do the same in assembler with:

      int16_t nextPhI;
      asm volatile ("mov &0x4e8, %0\n\t" // RES2
    		"mov &0x4e6, r13\n\t" // RES1
    		"rla r13\n\t" // bit 15 into Carry
    		"rlc %0\n\t" // times 2 on right bits
    		"rla r13\n\t" // bit 14 into Carry
    		"adc %0" // and rounding fraction part
    		: "=r" (nextPhI)
    		:
    		: "r13", "memory" // inputs are read from memory-mapped addresses
    		);

    And as you would expect - it compiles to:

      2a:   15 42 e8 04     mov     &0x04e8,r5      ;0x04e8
      2e:   1d 42 e6 04     mov     &0x04e6,r13     ;0x04e6
      32:   0d 5d           rla     r13             ;
      34:   05 65           rlc     r5              ;
      36:   0d 5d           rla     r13             ;
      38:   05 63           adc     r5              ;

    And if you look at the operational bits of the assembly, there are 4 instructions, vs. the 7 that the best C code could do, and the something like 40 that the worst C code did.

    So what I learnt here is that gcc really isn't great with the MSP430: it doesn't understand the cost of shifts or how to optimise them, so C code needs to be written very carefully - and hand-written assembler wins hands down on code quality.

    Just want to say thanks to everyone that replied to this thread. It really has helped my understanding of these little CPUs.
