C6000-CGT: CGT 8.5.0 erroneously schedules load before store

Markus Moll

Part Number: C6000-CGT

We ran into a problem with CGT 8.5.0 that we did not observe with earlier compilers. I tried to extract a relatively minimal example, which I attached. If you compile the code with -O2 optimizations, the compiler emits code that tries to read a pointer before it is initialized. In our case, random stack content was read and treated as a pointer, which led to various crashes further down the line.

This is the code in question:

#include <algorithm>

struct spi
{
	void callme();
};

struct tag {};

class sp
{
	void* p;
	spi* i;

public:
	sp() : p(nullptr), i(nullptr) {}
	sp(void* p, spi* i) : p(p), i(i) {} // not strictly necessary to trigger the bug (but before the compiler jumps to conclusions...)

	/*
	 * Replace the following initialization of i with some non-null value and the bug disappears
	 * 
	 * static spi gspi;
	 * sp(void* p, const tag&) : p(p), i(&gspi) {}
	 * 
	 */
	sp(void* p, const tag&) : p(p), i(nullptr) {}

	~sp() { if (i) i->callme(); }

	void swap(sp& o)
	{
		std::swap(p, o.p);
		std::swap(i, o.i);
	}
};

// some (incomplete) struct
struct inc;

void somefunc(void *, void (inc::*)()) {}

class fn
{
private:
	void (inc::*mf)(); 
	void (*minv)(void *, void (inc::*)());

	sp s;

	int type;

public:
	#pragma FUNC_CANNOT_INLINE
	fn(void (inc::*f)(), inc* o)
		: type((f && o) ? 3 : 0) // condition needs to be present to trigger bug
	{
		if(f && o) // condition must be present to trigger bug (because otherwise construction of s is optimized)
		{
			// at least minv must be present and written to to trigger the bug
			mf = f;
			minv = &somefunc;
			sp(o, tag{}).swap(s);
		}
	}
};

fn buildfn(void (inc::*f)(), inc *o)
{
	return fn(f, o);
}

The fn constructor should default-initialize the sp member, which in turn sets both sp members to nullptr. If both arguments are valid, the sp member is then swapped with another object that has the first pointer set to some non-null value while the second pointer is still null (sp is a stripped-down shared_ptr implementation). The fact that the second pointer is still null seems to be relevant.
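
For reference, the intended semantics can be checked with any host compiler. The following is only a sketch: the _probe types, the call counter and the function name callme_calls_after_swap are hypothetical stand-ins that are not part of the original test case. It mirrors the swap in the fn constructor and reports how often callme() runs once the temporary has been destroyed; with correct code generation that count should be zero.

```cpp
#include <cassert>
#include <utility>

// Hypothetical counter, added only so the check can observe calls.
static int g_callme_calls = 0;

// Stand-in for spi: counts calls instead of doing real work.
struct spi_probe
{
    void callme() { ++g_callme_calls; }
};

struct tag_probe {};

// Stand-in for sp with the same constructors, destructor and swap.
class sp_probe
{
    void* p;
    spi_probe* i;

public:
    sp_probe() : p(nullptr), i(nullptr) {}
    sp_probe(void* p, const tag_probe&) : p(p), i(nullptr) {}
    ~sp_probe() { if (i) i->callme(); }

    void swap(sp_probe& o)
    {
        std::swap(p, o.p);
        std::swap(i, o.i);
    }

    void* get() const { return p; }
    bool has_spi() const { return i != nullptr; }
};

// Mirrors the body of the fn constructor: a default-initialized object is
// swapped with a temporary, and the temporary is destroyed right away.
// Returns how many times callme() ran.
int callme_calls_after_swap(void* o)
{
    g_callme_calls = 0;
    sp_probe s;                        // both pointers start out null
    sp_probe(o, tag_probe{}).swap(s);  // the temporary now holds the two nulls
    assert(s.get() == o);              // s took over the object pointer
    assert(!s.has_spi());              // the second pointer is still null
    return g_callme_calls;
}
```

Every sp involved has a null second pointer when it is destroyed, so callme_calls_after_swap(&some_object) should return 0 regardless of the argument.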

The temporary used for swapping is then destroyed. Note that it should have both pointers set to zero, so its destructor should do nothing (the compiler doesn't see that, although GCC and Clang do, but that's another story). The compiler emits code to load s.i and then call callme() on that object (if non-null):

...
[ A0]   LDW     .D2T1   *B6(4),A0         ; [B_D64P] |4459| 
...
MV      .L1     A0,A4             ; [A_L64P] |28| 
...
[ A0]   CALL    .S1     _ZN3spi6callmeEv  ; [A_S64P] |28|

However, it doesn't bother to initialize B6(4) with zero before the load but does so only after the load:

ADD     .L2X    12,A4,B6          ; [B_L64P] |16| 
...
ZERO    .S2     B8                ; [B_Sb64P] |55| 
ZERO    .D2     B9                ; [B_D64P] |55| 
...
[ A0]   LDW     .D2T1   *B6(4),A0         ; [B_D64P] |4459| 
...
STNDW   .D1T2   B9:B8,*A10(12)    ; [A_D64P] |16| 
...
MV      .L1     A0,A4             ; [A_L64P] |28| 
...
[ A0]   CALL    .S1     _ZN3spi6callmeEv  ; [A_S64P] |28|

The STNDW instruction is used to set both sp members to 0. A10(12) is the address of the first pointer; storing a double word there also overwrites B6(4), which lies within that range.
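
The overlap can be spelled out as plain byte-range arithmetic. This is a sketch under stated assumptions: A4 and A10 both hold the address of the fn object, the sp member starts at byte offset 12, and pointers are 4 bytes wide. The names byte_range, overlaps and load_depends_on_store are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Half-open byte range [begin, end), relative to the start of the fn object.
struct byte_range
{
    std::size_t begin;
    std::size_t end;
};

// Two half-open ranges overlap if each one starts before the other ends.
constexpr bool overlaps(byte_range a, byte_range b)
{
    return a.begin < b.end && b.begin < a.end;
}

// Assumption: the sp member sits at offset 12 (ADD 12,A4,B6).
constexpr std::size_t s_offset = 12;

// STNDW B9:B8,*A10(12) writes a double word: 8 bytes starting at offset 12.
constexpr byte_range stndw_store = {s_offset, s_offset + 8};

// LDW *B6(4),A0 reads one word: 4 bytes starting at offset 12 + 4.
constexpr byte_range ldw_load = {s_offset + 4, s_offset + 8};

// True if the load reads bytes that the store writes, in which case the
// load must not be scheduled ahead of the store.
constexpr bool load_depends_on_store()
{
    return overlaps(stndw_store, ldw_load);
}
```

Under these assumptions the store covers bytes 12 to 19 and the load reads bytes 16 to 19, so load_depends_on_store() is true and the LDW has to stay behind the STNDW.
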
I am not sure what exactly triggers this behavior, but it appears to be a rather severe bug. We decided to revert to 8.3.x for now: we can work around this specific problem (by tweaking this code and inspecting the output), but without understanding what causes the issue we do not trust the rest of the binary either.

I'd be grateful if anyone could have a closer look at this issue.

Regards

Markus Moll

22 days ago

0 George Mock 22 days ago

TI__Guru**** 250490 points

Thank you for the test case. I generate code that is similar, but not identical, to what you show. To be certain I see the same thing for the same reason, I need to generate identical code. The problem is I guessed at the build options. Please show the build options you use, exactly as the compiler sees them. Please copy and paste the text of the options, and do not use a screenshot.

Thanks and regards,

-George

0 Markus Moll 22 days ago in reply to George Mock

Expert 1830 points

I forgot the opt_for_speed part, I guess. Anyway, here's my complete build command:

cl6x.exe -mv64+ -O2 -mf5 -c -I C:\ti\ti-cgt-c6000_8.5.0.LTS\include -k test.cpp

I used -k to directly output assembly, but the result is the same (minus some mnemonics) if I generate a .obj file and disassemble it.

The complete disassembled output (i.e. using only -c and then dis6x on the output) for the fn constructor is:

TEXT Section .text:_ZN2fnC1EM3incFvvEPS0_ (Little Endian), 0xc0 bytes at 0x00000000 
00000000            _ZN2fnC1EM3incFvvEPS0_:
00000000       05a6           MVK.L1        0,A3
00000002       0292 ||        MVK.S1        0,A5
00000004       0356 ||        MV.D1         A6,A0
00000006       0227 ||        CMPEQ.L2      0,B4,B0
00000008   0880002b ||        MVK.S2        0x0000,B17
0000000c       1b77 ||        MVK.D2        0,B6
0000000e       29ee    [ A0]  MVK.S1        1,A3
00000010   00901a59 ||        CMPEQ.L1X     0,B4,A1
00000014       2b67 || [ A0]  MVK.L2        1,B6
00000016       25f7 ||        STW.D2T1      A11,*B15--[2]
00000018   388000ab || [!B0]  MVK.S2        0x0001,B17
0000001c   e5600c8f           .fphead       n, l, W, BU, nobr, nosat, 0101011b
00000020   03806040 ||        MVK.D1        3,A7
00000024   9284a359    [!A1]  MVK.L1        1,A5
00000028   0044c7e3 ||        AND.S2        B6,B17,B0
0000002c   0311905b ||        ADD.L2X       12,A4,B6
00000030   053c22f5 ||        STW.D2T1      A10,*+B15[1]
00000034       464e ||        MV.S1         A4,A10
00000036       75d6 ||        MV.D1X        B3,A11
00000038       6688           AND.L1        A3,A5,A0
0000003a       f347 ||        MV.L2X        A6,B7
0000003c   ec003400           .fphead       n, l, W, BU, nobr, nosat, 1100000b
00000040   23a8a275 || [ B0]  STW.D1T1      A7,*+A10[5]
00000044   0400002b ||        MVK.S2        0x0000,B8
00000048   04800043 ||        MVK.D2        0,B9
0000004c       0312 ||        MVK.S1        0,A6
0000004e       2cba    [!A0]  BNOP.S1       $C$RL2 (PC+100 = 0x000000a4),1
00000050   c01822e4 || [ A0]  LDW.D2T1      *+B6[1],A0
00000054   3328a274    [!B0]  STW.D1T1      A6,*+A10[5]
00000058   0800002a           MVK.S2        0x0000,B16
0000005c   e1008080           .fphead       n, l, W, BU, br, nosat, 0001000b
00000060   0800006a           MVKH.S2       0x0000,B16
00000064   04298376           STNDW.D1T2    B9:B8,*+A10(12)
00000068   d0000b11    [!A0]  B.S1          $C$L3 (PC+88 = 0x000000b8)
0000006c   d1ac1fdb || [!A0]  MV.L2X        A11,B3
00000070       8046 ||        MV.L1         A0,A4
00000072       83cf ||        MV.S2         B7,B4
00000074   02280276 ||        STW.D1T2      B4,*+A10[0]
00000078   c0000011    [ A0]  B.S1          0x000060
0000007c   e2000300           .fphead       n, l, W, BU, nobr, nosat, 0010000b
00000080   d2280fd9 || [!A0]  MV.L1         A10,A4
00000084   02a82276 ||        STW.D1T2      B5,*+A10[1]
00000088   d00c0363    [!A0]  B.S2          B3
0000008c   08284277 ||        STW.D1T2      B16,*+A10[2]
00000090   d53c22e4 || [!A0]  LDW.D2T1      *+B15[1],A10
00000094   d5bc52e4    [!A0]  LDW.D2T1      *++B15[2],A11
00000098       a407           MV.L2         B8,B5
0000009a       1155           STNDW.D2T2    B5:B4,*B6(0)
0000009c   e8040000           .fphead       n, l, DW/NDW, W, nobr, nosat, 1000000b
000000a0   01810162           ADDKPC.S2     $C$RL2 (PC+4 = 0x000000a4),B3,0
000000a4            $C$RL2:
000000a4            $C$L2:
000000a4   01ac1fdb           MV.L2X        A11,B3
000000a8   02280fd9 ||        MV.L1         A10,A4
000000ac   053c22e4 ||        LDW.D2T1      *+B15[1],A10
000000b0   008c6363           BNOP.S2       B3,3
000000b4   05bc52e4 ||        LDW.D2T1      *++B15[2],A11
000000b8            $C$L3:
000000b8   00002000           NOP           2
000000bc   00000000           NOP

0 Markus Moll 18 days ago in reply to George Mock

Expert 1830 points

Hi George,

do you need any more information? Please let me know if there is anything I can do to assist you.

Regards

Markus

+1 George Mock 18 days ago in reply to Markus Moll

TI__Guru**** 250490 points

I apologize for the delay.

Thank you for the detailed test case. I am able to reproduce the behavior. I'm sure this test case was harder to create than most. I appreciate all of your effort.

I filed the issue EXT_EP-12967 to have this investigated. You are welcome to follow it with that link. Currently the headline and description are vague. Once this bug is characterized, that information will be updated to be more precise and specific.

Thanks and regards,

-George

0 Markus Moll 17 days ago in reply to George Mock

Expert 1830 points

Thanks George,

I just wanted to make sure this issue doesn't get lost and that you weren't waiting for me for some reason.

Actually, while the test case might look complicated, it was easier to create than most others as I simply had no clue what was going on and had to stop minimizing it early ;)
