Missing inline function code when compiling with cgt 6.4.9

Dieter Massa

Other Parts Discussed in Thread: TMS320F28377D

Hi,

we are running the following filter function on an F2833x Delfino:

static inline FLOAT32 filter3rdOrder(FilterStruct* FilterToCalc)
{
    FilterToCalc->fl32out =
      FilterToCalc->fl32b0 * FilterToCalc->fl32in
    + FilterToCalc->fl32b1 * FilterToCalc->fl32in_k1
    + FilterToCalc->fl32b2 * FilterToCalc->fl32in_k2
    + FilterToCalc->fl32b3 * FilterToCalc->fl32in_k3
    - FilterToCalc->fl32a1 * FilterToCalc->fl32out_k1
    - FilterToCalc->fl32a2 * FilterToCalc->fl32out_k2
    - FilterToCalc->fl32a3 * FilterToCalc->fl32out_k3;

    //Save outputs
    FilterToCalc->fl32in_k3 = FilterToCalc->fl32in_k2;
    FilterToCalc->fl32in_k2 = FilterToCalc->fl32in_k1;
    FilterToCalc->fl32in_k1 = FilterToCalc->fl32in;
    FilterToCalc->fl32out_k3 = FilterToCalc->fl32out_k2;
    FilterToCalc->fl32out_k2 = FilterToCalc->fl32out_k1;
    FilterToCalc->fl32out_k1 = FilterToCalc->fl32out;

    return FilterToCalc->fl32out;
}

This function uses the following data structure:

typedef struct
{
    FLOAT32 fl32in_k1; // 0x00
    FLOAT32 fl32in_k2; // 0x02
    FLOAT32 fl32in_k3; // 0x04
    FLOAT32 fl32in_k4; // 0x06
    FLOAT32 fl32in;     // 0x08
    FLOAT32 fl32out_k1; // 0x0a
    FLOAT32 fl32out_k2; // 0x0c
    FLOAT32 fl32out_k3; // 0x0e
    FLOAT32 fl32out_k4; // 0x10
    FLOAT32 fl32out;    // 0x12
    FLOAT32 fl32a0;
    FLOAT32 fl32a1;
    FLOAT32 fl32a2;
    FLOAT32 fl32a3;
    FLOAT32 fl32a4;
    FLOAT32 fl32b0;
    FLOAT32 fl32b1;
    FLOAT32 fl32b2;
    FLOAT32 fl32b3;
    FLOAT32 fl32b4;
} FilterStruct;
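For reference, filter3rdOrder evaluates a third-order IIR difference equation in direct form I. Write x[n] for fl32in, y[n] for fl32out, x[n-1]..x[n-3] for fl32in_k1..fl32in_k3 and y[n-1]..y[n-3] for fl32out_k1..fl32out_k3, with a_0 taken as 1 because fl32a0 is not used:

```latex
y[n] = b_0\,x[n] + b_1\,x[n-1] + b_2\,x[n-2] + b_3\,x[n-3]
       - a_1\,y[n-1] - a_2\,y[n-2] - a_3\,y[n-3]
```

The six assignments after the sum then shift both delay lines by one sample, so the next call sees x[n], x[n-1], x[n-2] in fl32in_k1..fl32in_k3 and y[n], y[n-1], y[n-2] in fl32out_k1..fl32out_k3. The order of the copies matters: fl32out_k3 must take the old fl32out_k2 before fl32out_k2 is overwritten. The _k4 members and fl32a0, fl32a4 and fl32b4 are not used by this function.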

We apply the following compiler optimization options:

--opt_level=2
--opt_for_speed

With cgt 6.1.6, the compiler generates correct and complete code for the filter function.
However, with cgt 6.4.9 (or cgt 15.12.0), the compiled code (checked by disassembling the .obj or .out file) is missing the instruction for the C statement "FilterToCalc->fl32out_k2 = FilterToCalc->fl32out_k1;".
When the filter function is not inlined, correct and complete code is generated even with cgt 6.4.9.

Is this known and expected behavior?

Thx and regards,
Dieter

over 9 years ago

George Mock over 9 years ago

I cannot reproduce this result. The difference is probably due to other code that is not shown.

Please preprocess the source file that contains the problematic call to filter3rdOrder and attach it to your next post. Also show the build options exactly as the compiler sees them.

Thanks and regards,

-George

stevenh over 9 years ago

Hello Dieter,

Is FilterToCalc->fl32out_k2 used anywhere else in the code? The compiler probably optimized the assignment out because FilterToCalc->fl32out_k2 is not used anywhere else.

Change the function to static inline FLOAT32 filter3rdOrder(volatile FilterStruct * FilterToCalc); and pass it a pointer to a volatile FilterStruct.
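Stephen's suggestion, spelled out as a minimal compilable sketch (assuming FLOAT32 is float, as in the later test case; the member layout is unchanged). Only the volatile qualifiers differ from the original. volatile makes the compiler perform every member access as written and in program order, which should prevent it from merging the history copies, at the cost of less optimization inside the filter.

```c
typedef float FLOAT32;

/* Same member order as the FilterStruct in the thread. */
typedef struct
{
    FLOAT32 fl32in_k1, fl32in_k2, fl32in_k3, fl32in_k4, fl32in;
    FLOAT32 fl32out_k1, fl32out_k2, fl32out_k3, fl32out_k4, fl32out;
    FLOAT32 fl32a0, fl32a1, fl32a2, fl32a3, fl32a4;
    FLOAT32 fl32b0, fl32b1, fl32b2, fl32b3, fl32b4;
} FilterStruct;

/* Workaround sketch: all state is accessed through a volatile pointer. */
static inline FLOAT32 filter3rdOrder(volatile FilterStruct *f)
{
    f->fl32out = f->fl32b0 * f->fl32in
               + f->fl32b1 * f->fl32in_k1
               + f->fl32b2 * f->fl32in_k2
               + f->fl32b3 * f->fl32in_k3
               - f->fl32a1 * f->fl32out_k1
               - f->fl32a2 * f->fl32out_k2
               - f->fl32a3 * f->fl32out_k3;

    /* Shift input and output history by one sample (oldest first). */
    f->fl32in_k3  = f->fl32in_k2;
    f->fl32in_k2  = f->fl32in_k1;
    f->fl32in_k1  = f->fl32in;
    f->fl32out_k3 = f->fl32out_k2;
    f->fl32out_k2 = f->fl32out_k1;
    f->fl32out_k1 = f->fl32out;

    return f->fl32out;
}

/* The filter object itself is declared volatile as well. */
volatile FilterStruct stFilter_1;
```

Passing &stFilter_1 works directly because the parameter type matches the object's qualification.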

Stephen

Dieter Massa over 9 years ago in reply to stevenh

Hello Stephen,
fl32out_k2 is not used at any other place in the code.
The filter function is called periodically, and fl32out_k2 is used the next time the filter function is called. Therefore it cannot be optimized out.
fl32out_k2 is used no differently from fl32out_k1 or fl32out_k3 (or from the fl32in_kx members).

Declaring FilterStruct "inline" resolves the problem. But we are wondering which other code might be affected by the related compiler change.

For providing more code I would need a direct contact.

Dieter

Dieter Massa over 9 years ago in reply to Dieter Massa

Sorry, I meant: declaring FilterStruct "volatile" resolves the problem.

stevenh over 9 years ago in reply to Dieter Massa

You are correct. It doesn't seem like the optimizer should have removed that line. I am also using 6.4.9, so I am interested in finding out what is causing this issue.

George Mock over 9 years ago in reply to Dieter Massa

Dieter Massa said:
For providing more code I would need a direct contact.

You can send the code to me privately. Hover your mouse over my forum avatar or user name. A window with buttons pops up. Click Send a Private Message. A message compose interface comes up. Use the paper clip icon to attach the file.

Thanks and regards,

-George

George Mock over 9 years ago in reply to George Mock

Thank you for submitting a test case. I can reproduce the problem with the missing assignment to some structure members. I filed SDSCM00052737 in the SDOWP system to have this investigated. You are welcome to follow it with the SDOWP link below in my signature.

Thanks and regards,

-George

Dieter Massa over 9 years ago in reply to George Mock

Meanwhile, I investigated further and found the missing instruction:

MOVD32 is used to read the old values of fl32out_k1 and fl32in_k1.

According to SPRUE02B, the assembler instruction "MOVD32 RaH, mem32" performs these operations:

RaH = [mem32]
[mem32 + 2] = [mem32]

The problem lies deeper.
Compiler version 6.4.9 generates the following code for reading fl32out_k2 and fl32out_k1:
0000032f   e318       MOV32        R4H, @0xc
00000330   040c
00000331   e223       MOVD32       R3H, @0xa
00000332   030a
What we finally see is that both fl32out_k2 and fl32out_k3 receive the old value of fl32out_k1.
This can only happen if R4H receives the new value of fl32out_k2, that is, the value after the partial operation [mem32 + 2] = [mem32] of MOVD32 has finished.
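The effect described here can be reproduced in a small host-side C model, independent of whether the cause is the pipeline or the instruction order. This is a minimal sketch under stated assumptions: memory is an array of 16-bit words (a FLOAT32 occupies two, matching the 0x0a/0x0c/0x0e offsets in FilterStruct), values are plain 32-bit bit patterns, and MOVD32 follows the two SPRUE02B operations quoted above. The names load32, store32, movd32, shift_in_order and shift_k2_read_late are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model: a 32-bit value at word address a occupies words
   a and a+1 (low half at the lower address in this model). */
enum { OUT_K1 = 0x0a, OUT_K2 = 0x0c, OUT_K3 = 0x0e, MEM_WORDS = 0x14 };

/* 32-bit load, as in MOV32 RaH, mem32. */
static uint32_t load32(const uint16_t *mem, unsigned a)
{
    return (uint32_t)mem[a] | ((uint32_t)mem[a + 1] << 16);
}

/* 32-bit store, as in MOV32 mem32, RaH. */
static void store32(uint16_t *mem, unsigned a, uint32_t v)
{
    mem[a]     = (uint16_t)(v & 0xFFFFu);
    mem[a + 1] = (uint16_t)(v >> 16);
}

/* MOVD32 RaH, mem32: RaH = [mem32]; [mem32 + 2] = [mem32]. */
static uint32_t movd32(uint16_t *mem, unsigned a)
{
    uint32_t v = load32(mem, a);
    store32(mem, a + 2, v);
    return v;
}

/* Intended order: read k2, then MOVD32 on k1 (copies k1 into k2),
   then store the saved k2 into k3. The new output is written to k1
   by a separate store and is omitted here. */
static void shift_in_order(uint16_t *mem)
{
    uint32_t old_k2 = load32(mem, OUT_K2);  /* MOV32  R4H, @0xc */
    (void)movd32(mem, OUT_K1);              /* MOVD32 R3H, @0xa */
    store32(mem, OUT_K3, old_k2);           /* store of R4H to @0xe */
}

/* Same operations, but k2 is read after the MOVD32 has already
   overwritten it with the old k1. */
static void shift_k2_read_late(uint16_t *mem)
{
    (void)movd32(mem, OUT_K1);
    uint32_t k2 = load32(mem, OUT_K2);      /* now holds the old k1 */
    store32(mem, OUT_K3, k2);
}
```

With fl32out_k1..k3 holding A, B, C, shift_in_order leaves k2 = A and k3 = B, while shift_k2_read_late leaves k2 = k3 = A, which is exactly the pattern observed on the target.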
The code generated by cgt 6.1.6 does not contain any MOVD32 instructions.
fl32in_k2 and fl32in_k3 are handled correctly. In that case there are a couple of other instructions between reading fl32in_k2 ("MOVL ACC, @0x2") and reading fl32in_k1 and moving it to fl32in_k2 ("MOVD32 R5H, @0x0").
So, this looks like a pipeline error.
Dieter

Archaeologist over 9 years ago in reply to Dieter Massa

Yes, the compiler is generating an MOVD32 instruction, which is why you don't see an explicit write to fl32out_k2, and this is OK.

I cannot reproduce the error. I cannot find any problem with the generated assembly code, and the code executes correctly on an actual TMS320F28377D device.

I am able to reproduce code which has the instructions "MOV32 R3H,@_stFilter_1+12 || ADDF32 R0H,R0H,R3H" and "MOVD32 R2H,@_stFilter_1+10" back-to-back, but I can't quite reproduce exactly the code sequence you are seeing. We need a test case which demonstrates the error. It would help to see the complete generated assembly code for function CtrlFunc.

Dieter Massa over 9 years ago in reply to Archaeologist

The real problem is not visible in the assembly code.
It is visible only in the realtime data.
Expected behavior:
FilterToCalc->fl32out_k3 = FilterToCalc->fl32out_k2;
FilterToCalc->fl32out_k2 = FilterToCalc->fl32out_k1;
FilterToCalc->fl32out_k1 = FilterToCalc->fl32out;

Observed behavior:
FilterToCalc->fl32out_k3 = FilterToCalc->fl32out_k1; // <- here is the problem
FilterToCalc->fl32out_k2 = FilterToCalc->fl32out_k1;
FilterToCalc->fl32out_k1 = FilterToCalc->fl32out;

So the real problem is not missing code; it is incorrect execution of the code.
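Because the failure shows up only in the data, a runtime check can catch it without reading the disassembly. A minimal sketch, using the hypothetical names OutHistory and history_shift_ok: snapshot fl32out_k1..fl32out_k3 before calling filter3rdOrder, snapshot them again afterwards, and compare both against the returned output.

```c
#include <stdbool.h>

typedef float FLOAT32;

/* Hypothetical snapshot of the output history of one FilterStruct. */
typedef struct
{
    FLOAT32 k1;  /* fl32out_k1 */
    FLOAT32 k2;  /* fl32out_k2 */
    FLOAT32 k3;  /* fl32out_k3 */
} OutHistory;

/* A correct call shifts the history by one sample:
   k1 <- new output, k2 <- old k1, k3 <- old k2.
   The observed failure leaves k3 == old k1 instead. */
static bool history_shift_ok(OutHistory before, OutHistory after, FLOAT32 out)
{
    return after.k1 == out
        && after.k2 == before.k1
        && after.k3 == before.k2;
}
```

Exact float comparison is intended: the shift only copies values, so a correct build reproduces them bit for bit (NaN values would need a bitwise comparison instead). The check cannot detect the failure on calls where the old fl32out_k1 and fl32out_k2 happen to be equal.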

I believe there is a problem with executing the following sequence, at least on an F28335 microcontroller:
MOV32 R4H, @0xc
MOVD32 R3H, @0xa
R4H is loaded from offset 0xc (fl32out_k2) after the "[mem+2] = [mem]" partial microcode of the MOVD32 R3H, @0xa instruction has executed, not before it as intended. It therefore receives the original data from offset 0xa (fl32out_k1), which is finally written to offset 0xe (fl32out_k3). fl32out_k2 receives the data from fl32out_k1 through the "[mem+2] = [mem]" partial microcode.

Dieter

Archaeologist over 9 years ago in reply to Dieter Massa

I'm sorry, I cannot reproduce an error, nor can I reproduce your assembly exactly.

Could you please post the full assembly code (or object file) for function CtrlFunc? I would also like to see the complete command-line options used to compile that file.

Can you set a breakpoint in CtrlFunc and step through the function? Does the problem still happen? At which point do the values diverge from what you expect? Does the problem happen every time the function is called, or just sometimes?

Dieter Massa over 9 years ago in reply to Dieter Massa

We have reproduced the problem also on a F28377D microcontroller.

Here is the complete object file disassembly for the simplified code:

extern FilterStruct stFilter_1;

interrupt void CtrlFunc(void)
{
    filter3rdOrder(&stFilter_1);
} 

.sect ".text:retain"
000000:              _CtrlFunc:
000000:              .text:retain:
00000000   761b       ASP
00000001   fff0       PUSH         RB
00000002   abbd       MOVL         *SP++, XT
00000003   a0bd       MOVL         *SP++, XAR5
00000004   c2bd       MOVL         *SP++, XAR6
00000005   c3bd       MOVL         *SP++, XAR7
00000006   e200       MOV32        *SP++, STF
00000007   00bd
00000008   e203       MOV32        *SP++, R0H
00000009   00bd
0000000a   e203       MOV32        *SP++, R1H
0000000b   01bd
0000000c   e203       MOV32        *SP++, R2H
0000000d   02bd
0000000e   e203       MOV32        *SP++, R3H
0000000f   03bd
00000010   e203       MOV32        *SP++, R4H
00000011   04bd
00000012   e203       MOV32        *SP++, R5H
00000013   05bd
00000014   e203       MOV32        *SP++, R6H
00000015   06bd
00000016   e203       MOV32        *SP++, R7H
00000017   07bd
00000018   e630       SETFLG       RNDF32=1,RNDF64=1
00000019   0600
0000001a   2942       CLRC         OVM|PAGE0
0000001b   5616       CLRC         AMODE
0000001c   761f       MOVW         DP, #0x0
0000001d   0000
0000001e   0602       MOVL         ACC, @0x2
0000001f   e2af       MOV32        R4H, @0x8, UNCF
00000020   0408
00000021   e2af       MOV32        R0H, @0x1e, UNCF
00000022   001e
00000023   e2af       MOV32        R1H, @0x20, UNCF
00000024   0120
00000025   bda9       MOV32        R6H, ACC
00000026   0f2a
00000027   e223       MOVD32       R5H, @0x0
00000028   0500
00000029   e301       MPYF32       R0H, R4H, R0H
                   || MOV32        R3H, @0x22
0000002a   0322
0000002b   e303       MPYF32       R2H, R5H, R1H
                   || MOV32        R7H, @0x4
0000002c   5704
0000002d   e2af       MOV32        R1H, @0x24, UNCF
0000002e   0124
0000002f   e741       MPYF32       R3H, R6H, R3H
                   || ADDF32       R0H, R0H, R2H
00000030   00f3
00000031   e2af       MOV32        R6H, @0x16, UNCF
00000032   0616
00000033   e316       ADDF32       R0H, R0H, R3H
                   || MOV32        R3H, @0xc
00000034   030c
00000035   e223       MOVD32       R2H, @0xa
00000036   020a
00000037   e700       MPYF32       R6H, R2H, R6H
00000038   0196
00000039   7700       NOP
0000003a   7700       NOP
0000003b   e700       MPYF32       R1H, R7H, R1H
0000003c   0079
0000003d   bfa7       MOV32        XAR7, R6H
0000003e   0f2a
0000003f   e710       ADDF32       R0H, R0H, R1H
00000040   0040
00000041   bda7       MOV32        R1H, XAR7
00000042   0f16
00000043   7700       NOP
00000044   7700       NOP
00000045   7700       NOP
00000046   c40e       MOVL         XAR6, @0xe
00000047   e720       SUBF32       R1H, R0H, R1H
00000048   0041
00000049   bda6       MOV32        R0H, XAR6
0000004a   0f12
0000004b   7700       NOP
0000004c   e2af       MOV32        R7H, @0x18, UNCF
0000004d   0718
0000004e   e00e       MPYF32       R7H, R3H, R7H
                   || MOV32        @0x0, R4H
0000004f   fc00
00000050   e2af       MOV32        R6H, @0x1a, UNCF
00000051   061a
00000052   e753       MPYF32       R0H, R0H, R6H
                   || SUBF32       R1H, R1H, R7H
00000053   9380
00000054   1e04       MOVL         @0x4, ACC
00000055   e020       SUBF32       R0H, R1H, R0H
                   || MOV32        @0xe, R3H
00000056   430e
00000057   e2af       MOV32        R7H, *--SP, UNCF
00000058   07be
00000059   e203       MOV32        @0x12, R0H
0000005a   0012
0000005b   e203       MOV32        @0xa, R0H
0000005c   000a
0000005d   e2af       MOV32        R6H, *--SP, UNCF
0000005e   06be
0000005f   e2af       MOV32        R5H, *--SP, UNCF
00000060   05be
00000061   e2af       MOV32        R4H, *--SP, UNCF
00000062   04be
00000063   e2af       MOV32        R3H, *--SP, UNCF
00000064   03be
00000065   e2af       MOV32        R2H, *--SP, UNCF
00000066   02be
00000067   e2af       MOV32        R1H, *--SP, UNCF
00000068   01be
00000069   e2af       MOV32        R0H, *--SP, UNCF
0000006a   00be
0000006b   e280       MOV32        STF, *--SP
0000006c   00be
0000006d   c5be       MOVL         XAR7, *--SP
0000006e   c4be       MOVL         XAR6, *--SP
0000006f   83be       MOVL         XAR5, *--SP
00000070   87be       MOVL         XT, *--SP
00000071   fff1       POP          RB
00000072   7617       NASP
00000073   7602       IRET

This was generated with the following cl2000 command-line options:

--keep_asm --quiet --asm_listing --c_src_interlist --optimizer_interlist --large_memory_model --silicon_version=28 --float_support=fpu32 --unified_memory --symdebug:dwarf --opt_level=2 --opt_for_speed

and with compiler:

TMS320C2000 C/C++ Compiler v6.4.9

So far, we have observed this problem every time.

In other words, there was no case in which the intended operation

FilterToCalc->fl32out_k3 = FilterToCalc->fl32out_k2;

was executed.

Dieter

Dieter Massa over 9 years ago in reply to Dieter Massa

Further investigation showed that the problem is caused not by the microcontroller but by code generation.
In the complete application we see the sequence:
00ce51: 761F05CC MOVW DP, #0x5cc
00ce53: E70001D7 MPYF32 R7H, R2H, R7H
00ce55: A60A MOVDL XT, @0xa
00ce56: BDA20F1A MOV32 R2H, @XAR2
00ce58: BDAC0F16 MOV32 R1H, @XT
00ce5a: 7700 NOP
00ce5b: 7700 NOP
00ce5c: 7700 NOP
00ce5d: 7700 NOP
00ce5e: E7000008 MPYF32 R0H, R1H, R0H
00ce60: 7700 NOP
00ce61: 7700 NOP
00ce62: 7700 NOP
00ce63: BFA60F12 MOV32 @XAR6, R0H
00ce65: BDA00F12 MOV32 R0H, @XAR0
00ce67: C50C MOVL XAR7, @0xc
00ce68: A318 MOVL P, @0x18
...
00ce97: C30E MOVL @0xe, XAR7

Address ce55: the partial (second) operation of MOVDL copies the old fl32out_k1 to fl32out_k2.

Address ce67: fl32out_k2 (now equal to the old fl32out_k1) is read.

Address ce97: fl32out_k3 = fl32out_k2 (equal to the old fl32out_k1) is stored.

So the problem is triggered by the surrounding code. In that context another filter function is executed, and the instructions for both filter calls are interleaved in the generated assembly code.

We are trying to find simple context code that reproduces the problem.

Dieter

Archaeologist over 9 years ago in reply to Dieter Massa

I think that the disassembly fragment you're showing as the location of the bug is from an entirely different function. I cannot find any sequence of instructions resembling that fragment in the disassembly for _CtrlFunc. Are you absolutely sure that this code is from function _CtrlFunc?

As evidence of my claim:

There is no sequence of 4 NOP instructions in _CtrlFunc.
There is no MPYF32 followed immediately by an MOVD32 in _CtrlFunc
There is no MOVDL instruction in _CtrlFunc

Dieter Massa over 9 years ago in reply to Archaeologist

Yes, the disassembly fragment is from our original code, not from CtrlFunc.
Our original assumption that the problem is caused by the MOV32, MOVD32 sequence was wrong.
It really is a code generation issue.
We are trying to find new, simple CtrlFunc code that shows the problem.

Dieter

Archaeologist over 9 years ago in reply to Dieter Massa

Understood. Please keep in mind that even if we could say for sure that the problem exists in the compiler, we're not going to be able to analyze or fix it without a C test case that demonstrates the problem.

Dieter Massa over 9 years ago in reply to Archaeologist

Here is the extracted code that causes the problem:
/*
Testcode for cgt649 ff issue
*/

typedef float FLOAT32;

typedef struct
{
    FLOAT32 fl32in_k1;  // 0x00
    FLOAT32 fl32in_k2;  // 0x02
    FLOAT32 fl32in_k3;  // 0x04
    FLOAT32 fl32in_k4;  // 0x06
    FLOAT32 fl32in;     // 0x08
    FLOAT32 fl32out_k1; // 0x0a
    FLOAT32 fl32out_k2; // 0x0c
    FLOAT32 fl32out_k3; // 0x0e
    FLOAT32 fl32out_k4; // 0x10
    FLOAT32 fl32out;    // 0x12
    FLOAT32 fl32a0;
    FLOAT32 fl32a1;
    FLOAT32 fl32a2;
    FLOAT32 fl32a3;
    FLOAT32 fl32a4;
    FLOAT32 fl32b0;
    FLOAT32 fl32b1;
    FLOAT32 fl32b2;
    FLOAT32 fl32b3;
    FLOAT32 fl32b4;
} FilterStruct;

typedef struct
{
    FLOAT32 fl32A;
    FLOAT32 fl32B;
    FLOAT32 fl32C;
} ABC;

FilterStruct filter1;
ABC ABC1;
ABC ABC2;
FilterStruct filter2;

static inline FLOAT32 filter3rdOrder(FilterStruct* FilterToCalc)
//FLOAT32 filter3rdOrder(FilterStruct* FilterToCalc)
{
    FilterToCalc->fl32out =
      FilterToCalc->fl32b0 * FilterToCalc->fl32in
    + FilterToCalc->fl32b1 * FilterToCalc->fl32in_k1
    + FilterToCalc->fl32b2 * FilterToCalc->fl32in_k2
    + FilterToCalc->fl32b3 * FilterToCalc->fl32in_k3
    - FilterToCalc->fl32a1 * FilterToCalc->fl32out_k1
    - FilterToCalc->fl32a2 * FilterToCalc->fl32out_k2
    - FilterToCalc->fl32a3 * FilterToCalc->fl32out_k3;

    //Save outputs
    FilterToCalc->fl32in_k3 = FilterToCalc->fl32in_k2;
    FilterToCalc->fl32in_k2 = FilterToCalc->fl32in_k1;
    FilterToCalc->fl32in_k1 = FilterToCalc->fl32in;
    FilterToCalc->fl32out_k3 = FilterToCalc->fl32out_k2;
    FilterToCalc->fl32out_k2 = FilterToCalc->fl32out_k1;
    FilterToCalc->fl32out_k1 = FilterToCalc->fl32out;

    return FilterToCalc->fl32out;
}

void CtrlFunc(void)
{
    ABC2.fl32A = ABC1.fl32A;
    ABC2.fl32B = ABC1.fl32B;
    ABC2.fl32C = ABC1.fl32C;

    filter3rdOrder(&filter1);

    filter3rdOrder(&filter2);
}

Dieter

Archaeologist over 9 years ago in reply to Dieter Massa

Are you compiling the new test case with the same options?

Archaeologist over 9 years ago in reply to Dieter Massa

Okay! I was able to reproduce and analyze this bug. Thank you for the reproducible test case. It was definitely a novel failure mode. Basically, at a late stage of compilation, the compiler forgets that MOVDL writes to addr+2. I'll have to think about how to fix it.
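To make "forgets that MOVDL writes to addr+2" concrete, here is a toy dependence check. It is a hypothetical sketch, not a description of TI's compiler internals: each instruction's memory footprint is a range of 16-bit word addresses, and a load that precedes a MOVD32/MOVDL in program order may only be scheduled after it if it reads none of the words the MOVD writes (a write-after-read, or anti-, dependence). All names are illustrative.

```c
#include <stdbool.h>

/* Inclusive range [lo, hi] of 16-bit word addresses. */
typedef struct { unsigned lo, hi; } WordRange;

static bool ranges_overlap(WordRange a, WordRange b)
{
    return a.lo <= b.hi && b.lo <= a.hi;
}

/* A 32-bit load at word address a reads words a and a+1. */
static WordRange load32_reads(unsigned a) { return (WordRange){ a, a + 1 }; }

/* MOVD32/MOVDL at word address a also writes words a+2 and a+3. */
static WordRange movd_writes(unsigned a) { return (WordRange){ a + 2, a + 3 }; }

/* May a load that comes before a MOVD in program order be moved after it?
   Only if it reads nothing the MOVD writes. */
static bool load_may_sink_below_movd(unsigned load_addr, unsigned movd_addr)
{
    return !ranges_overlap(load32_reads(load_addr), movd_writes(movd_addr));
}

/* The failure mode: the write to addr+2 is forgotten, so the MOVD looks
   like a plain load, and two reads never conflict. */
static bool load_may_sink_below_movd_forgetful(unsigned load_addr, unsigned movd_addr)
{
    (void)load_addr;
    (void)movd_addr;
    return true;
}
```

For the filter, the load of fl32out_k2 at 0x0c overlaps the write of a MOVD at 0x0a (fl32out_k1), so it must stay in front of it; the forgetful variant lets it sink, which matches the MOVDL at ce55 followed by the read of 0xc at ce67 in the application disassembly.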

Dieter Massa over 9 years ago in reply to Archaeologist

Hi,
please let me know which cgt version will contain the fix for this bug and when it will be available.

thx
Dieter

Archaeologist over 9 years ago in reply to Dieter Massa

This bug was fixed in C2000 compiler versions 6.4.10 and 15.12.2.LTS. I don't know exactly when they will be available, but they should both be available in 1-4 weeks.
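If a project cannot move to a fixed compiler right away, a build-time guard can at least flag the affected tools. This is a hypothetical sketch: it relies on the TI predefined macros __TI_COMPILER_VERSION__ (encoded as major*1000000 + minor*1000 + patch, e.g. 6.4.9 as 6004009) and __TMS320C2000__, both of which should be confirmed in the C2000 compiler manual for your version. The version ranges are an assumption: this thread only confirms 6.4.9 and 15.12.0 as affected, 6.1.6 as unaffected, and 6.4.10 and 15.12.2.LTS as fixed.

```c
/* Encode a version the way __TI_COMPILER_VERSION__ does. */
#define CGT_VERSION(major, minor, patch) \
    ((major) * 1000000L + (minor) * 1000L + (patch))

/* Assumption: treat 6.4.x before 6.4.10 and 15.12.x before 15.12.2 as
   possibly affected by SDSCM00052737. Only 6.4.9 and 15.12.0 were
   actually reported. */
#define CGT_MAYBE_AFFECTED(v) \
    (((v) >= CGT_VERSION(6, 4, 0)   && (v) < CGT_VERSION(6, 4, 10)) || \
     ((v) >= CGT_VERSION(15, 12, 0) && (v) < CGT_VERSION(15, 12, 2)))

#if defined(__TI_COMPILER_VERSION__) && defined(__TMS320C2000__)
#if CGT_MAYBE_AFFECTED(__TI_COMPILER_VERSION__)
#error "C2000 compiler may mis-schedule around MOVD32/MOVDL (SDSCM00052737); use 6.4.10 or 15.12.2.LTS or later"
#endif
#endif
```

On any other compiler the guard is inactive, so the header can be shared with host-side builds.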
