Compiler: CGT8 optimizations for K2 and Nyquist

Risto Alasaarela1

Tool/software: TI C/C++ Compiler

Hello!

We have observed a large increase in code size when moving from CGT 7.3.23 to CGT 8.3.4. Both builds use the same compiler flags:

--mem_model:data=far -pdse9 -pdse48 -pdse190 -pdse225 -pdse262 -pdse849 -pdse994 -mi1000 -mv6600 -mo --strip_coff_underscore --disable_push_pop -o3 -ms0

One reason the binary has grown is that the compiler adds unnecessary NOPs to the code to align to fetch packet boundaries.

This is visible where the compiler uses so-called compact instructions, i.e. 16-bit opcodes:

CGT 8.3.x

840006f0 10102413 CALLP.S2 $Tramp$S$$AaMemCheckTag (PC+33056 = 0x84008800),B3
840006f4 0c6e || NOP 1
840006f6 0c6e || NOP 1
840006f8 0c6e || NOP 1
840006fa 0c6e || NOP 1
840006fc ec401c0c .fphead n, l, W, BU, nobr, nosat, 1100010b
84000700 $C$RL204:
84000700 2226 CMPEQ.L1 1,A4,A0
84000702 3a76 || MVK.D1 1,A4
84000704 0473902b || MVK.S2 0xffffe720,B8
84000708 04737c29 || MVK.S1 0xffffe6f8,A8
8400070c 0727 || MVK.L2 0,B6

CGT 7.3.x

821f616c 100c1612 CALLP.S2 $Tramp$S$$AaMemCheckTag (PC+24752 = 0x821fc210),B3
821f6170 2226 CMPEQ.L1 1,A4,A0
821f6172 48aa [ A0] BNOP.S1 $C$L63 (PC+68 = 0x821f61a4),2
821f6174 0446aca9 MVK.S1 0xffff8d59,A8
821f6178 020017aa || MVK.S2 0x002f,B4
821f617c e2208003 .fphead n, l, W, BU, br, nosat, 0010001b

There is another visible problem that also adds unnecessary NOPs and that does NOT occur with CGT 7.3.x compilations either:

1081ec50 0fff6410 B.S1 odo_send_helper (PC-1248 = 0x1081e760) -> here we call "odo_send_helper" and the actual "return" is done inside odo_send_helper
1081ec54 71f7 LDW.D2T2 *++B15[2],B3
1081ec56 8047 MV.L2 B0,B4
1081ec58 1313 MVK.S2 16,B6
1081ec5a 16c6 MV.L1X B5,A8
1081ec5c ec000000 .fphead n, l, W, BU, nobr, nosat, 1100000b
1081ec60 00000000 NOP

And the compiler generates an extra return symbol + NOPs, which is never used!!

1081ec64 $C$RL24:
1081ec64 00000000 NOP
1081ec68 00000000 NOP
1081ec6c 00000000 NOP
1081ec70 00000000 NOP
1081ec74 00000000 NOP
1081ec78 00000000 NOP
1081ec7c 00000000 NOP
1081ec80 send:
1081ec80 .text:send:
1081ec80 01ab0228 MVK.S1 0x5604,A3
1081ec84 018843e8 MVKH.S1 0x10870000,A3
1081ec88 018c0264 LDW.D1T1 *+A3[0],A3
1081ec8c d246 MV.L1X B4,A6
1081ec8e cf27 MVK.L2 14,B6
1081ec90 0400a358 MVK.L1 0,A8
1081ec94 0fff5c10 B.S1 odo_send_helper (PC-1312 = 0x1081e760)
1081ec98 020c4266 LDW.D1T2 *+A3[2],B4
1081ec9c e1000000 .fphead n, l, W, BU, nobr, nosat, 0001000b
1081eca0 00006000 NOP 4
1081eca4 $C$RL26:
1081eca4 00000000 NOP
1081eca8 00000000 NOP
1081ecac 00000000 NOP
1081ecb0 00000000 NOP
1081ecb4 00000000 NOP
1081ecb8 00000000 NOP
1081ecbc 00000000 NOP
1081ecc0 odo_send_w_s_safe:

Would you be able to explain why the new compiler generates such suboptimal code?

A second problem is that we are seeing unnecessary symbols (visible in the code but used nowhere) in the DWARF structure:

0x00000000 0x00000004 poolIdPrivate
0x00000000 0x00000004 DSP2ARMSender1_
0x00000000 0x00000004 DSP2ARMReceiver1_
0x00000000 0x00000019 $P$T1$2
0x00000000 0x00000004 ret_addr
0x00000000 0x00000004 bufferSize
0x00000000 0x00000004 bufferSize
0x00000000 0x00000004 CHIPDSP_MASK
0x00000000 0x00000800 fftcHostDesc
0x00000000 0x00000004 TEST_LENGTH_IN_SECS
0x00000000 0x00000004 HwSemProcess5_
0x00000000 0x00000004 TBTS_TEST_COMPLETE_IND_MSG
0x00000000 0x00000018 gEthernetLoopbackStatsLastPeriod
0x00000000 0x00000008 IpAddr
0x00000000 0x00000400 gEthFrameBuffer
0x00000000 0x00000002 fragmentIdentification

Br,

Risto Alasaarela

over 4 years ago

George Mock over 4 years ago

TI__Guru**** 236405 points

The compiler option -ms0 (the long form equivalent is --opt_for_space=0) says you prefer to optimize for speed over size. I suspect that has something to do with it.

For each problem that causes unnecessary NOP instructions, I presume you can identify one source file with that problem. For each such source file, please follow the directions in the article How to Submit a Compiler Test Case. In case it is not obvious which function contains the problem, please add the comment // PROBLEM FUNCTION.

Regarding ...

Risto Alasaarela1 said:
A second problem is that we are seeing unnecessary symbols (visible in the code but used nowhere) in the DWARF structure

I presume that by comparing the DWARF output for the source files you submit, I will see the difference. What problem is caused by these extra symbols?

Thanks and regards,

-George

Risto Alasaarela1 over 4 years ago in reply to George Mock

Intellectual 450 points

Hello!

Just to emphasize: we are seeing tens of kilobytes more NOPs in targets compiled with CGT8 than in targets compiled with CGT7.3.23.

Unfortunately I cannot provide my source files, because that would reveal Nokia IP. However, we have found the problems mentioned above in the RTS compilation too:

[ralasaar@ouling36 lib]$ ./../bin/dis6x catrigf.c.obj |grep "RL50" -A 16 | head -n 16
00000090 01888163 ADDKPC.S2 $C$RL50 (PC+32 = 0x000000a0),B3,4
00000094 0c6e || NOP 1
00000096 0c6e || NOP 1
00000098 0c6e || NOP 1
0000009a 0c6e || NOP 1
0000009c ec201c00 .fphead n, l, W, BU, nobr, nosat, 1100001b
000000a0 $C$RL50:
000000a0 02341fdb MV.L2X A13,B4
000000a4 05100fd9 || MV.L1 A4,A10
000000a8 10000013 || CALLP.S2 $C$RL50 (PC+0 = 0x000000a0),B3
000000ac 023006a0 || MV.S1 A12,A4
000000b0 $C$RL52:
000000b0 10000013 CALLP.S2 $C$RL50 (PC+0 = 0x000000a0),B3
000000b4 02101fdb || MV.L2X A4,B4
000000b8 02280fd8 || MV.L1 A10,A4
000000bc $C$RL54:

And another type:

[ralasaar@ouling36 lib]$ ./../bin/dis6x algorithm.cpp.obj |grep 000000bc -A 10
000000bc c8180344 [ A0] STDW.D1T1 A17:A16,*+A6[0]
000000c0 $C$L2:
000000c0 008c8363 BNOP.S2 B3,4
000000c4 020c0fd8 || MV.L1 A3,A4
000000c8 $C$L3:
000000c8 00000000 NOP
000000cc 00000000 NOP
000000d0 00000000 NOP
000000d4 00000000 NOP
000000d8 00000000 NOP
000000dc 00000000 NOP
--

Then for the second problem:

The unnecessary DWARF symbols are not a fatal problem for us, but they somehow interfere with our internal tool that analyses the compiler output. We would just like some kind of explanation for the phenomenon...

Br,

Risto

George Mock over 4 years ago in reply to Risto Alasaarela1

TI__Guru**** 236405 points

I tried a similar experiment on a program I have. I see about 16% more single cycle NOP instructions when building with version 8.3.4 than when building with version 7.3.23. I filed the entry CODEGEN-6929 to have this investigated. You are welcome to follow it with the SDOWP link below in my signature.
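
A comparison like this can be approximated by counting the NOP instructions in the disassembly of each build. The sketch below is only an illustration: it assumes the line layout seen in the dis6x listings quoted in this thread (address, raw opcode, optional ||, mnemonic), and the names NOP_LINE, count_nops and SAMPLE are made up for the example.

```python
import re

# Assumed layout, taken from the dis6x listings in this thread:
#   <8-hex address> <4- or 8-hex opcode> [||] NOP [count]
NOP_LINE = re.compile(
    r"^\s*[0-9a-fA-F]{8}\s+([0-9a-fA-F]{8}|[0-9a-fA-F]{4})\s+(?:\|\|\s+)?NOP\b"
)


def count_nops(disassembly):
    """Return (number of NOP instructions, bytes they occupy)."""
    count = 0
    size = 0
    for line in disassembly.splitlines():
        match = NOP_LINE.match(line)
        if match:
            count += 1
            size += len(match.group(1)) // 2  # two hex digits per byte
    return count, size


SAMPLE = """\
840006f0 10102413 CALLP.S2 $Tramp$S$$AaMemCheckTag (PC+33056 = 0x84008800),B3
840006f4 0c6e || NOP 1
840006f6 0c6e || NOP 1
840006f8 0c6e || NOP 1
840006fa 0c6e || NOP 1
840006fc ec401c0c .fphead n, l, W, BU, nobr, nosat, 1100010b
000000c0 008c8363 BNOP.S2 B3,4
000000c8 00000000 NOP
"""

print(count_nops(SAMPLE))  # (5, 12)
```

Running the same function over the full disassembly of both builds gives a count and a byte total per build. BNOP is deliberately not counted, since it is a branch and not padding.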

Thanks and regards,

-George

Archaeologist over 4 years ago in reply to Risto Alasaarela1

TI__Guru* 84225 points

Please add the option --no_compress, rebuild your project for both versions, and compare the code size. If they are about the same, then what we're looking at is a deficiency in opcode compression. If they are significantly different, I think the problem lies elsewhere.

Risto Alasaarela1 over 4 years ago in reply to George Mock

Intellectual 450 points

I cannot find any plan in the referenced CODEGEN ticket for addressing my finding. Are you able to estimate when a new CGT8.4 version would be available?

Br,

Risto

George Mock over 4 years ago in reply to Risto Alasaarela1

TI__Guru**** 236405 points

At this time, there are no plans for a future release of the C6000 compiler. That being the case, the point of this investigation is to find the root cause, and recommend a workaround.

Thanks and regards,

-George

Risto Alasaarela1 over 4 years ago in reply to George Mock

Intellectual 450 points

Hello!

Would it be possible for you to provide a plan for the WA? Or at least update the status more actively? This finding is becoming a blocker for Nokia in taking this new toolset into use.

Br,

Risto

George Mock over 4 years ago in reply to Risto Alasaarela1

TI__Guru**** 236405 points

If we determine that the only way to solve your problem is by issuing a compiler release, then we'll discuss it. However, we are not at that point yet.

What about the experiment with --no_compress requested by Archaeologist? What happened?

Risto Alasaarela1 said:
Would it be possible for you to provide a plan for the WA?

Sorry, but what does WA stand for?

Thanks and regards,

-George

Risto Alasaarela1 over 4 years ago in reply to George Mock

Intellectual 450 points

Hello!

I can confirm that the generated code sizes of CGT7.3.23 and CGT8.3.4 differ considerably when the --no_compress option is used in the compilation. The CGT8 code size is bigger.

WA = workaround

Br,

Risto

George Mock over 4 years ago in reply to Risto Alasaarela1

TI__Guru**** 236405 points

Then it is clear your code is quite different from the substitute test case I submitted with CODEGEN-6929. Building it shows different results.

Rather than focus on the difference in NOPs, it is better to focus on understanding the reason for the overall code size difference. The only way to pursue that is with a test case from you. To avoid you sending me the entire project, I need you to do a bit of work to identify one file to send.

Please use the technique described in the article Find Source of Code Size Increase to determine which functions increased in size the most. For one source file that contains some of those functions, please follow the directions in the article How to Submit a Compiler Test Case. Especially note the part about protecting intellectual property.
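
Once per-function sizes have been extracted from both builds, ranking the growth is straightforward. The sketch below assumes the sizes are already available as name-to-bytes mappings; how they are obtained is covered by the article mentioned above and is not shown here. The function names and numbers are hypothetical.

```python
def rank_growth(old_sizes, new_sizes, top=10):
    """Return (name, old, new, delta) rows, largest growth first."""
    rows = []
    for name, new in new_sizes.items():
        old = old_sizes.get(name, 0)  # a function missing from the old build counts as 0
        rows.append((name, old, new, new - old))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows[:top]


# Hypothetical sizes in bytes, for illustration only.
cgt7 = {"func_a": 320, "func_b": 128, "func_c": 64}
cgt8 = {"func_a": 352, "func_b": 256, "func_c": 64}

for name, old, new, delta in rank_growth(cgt7, cgt8, top=2):
    print(f"{name}: {old} -> {new} ({delta:+d} bytes)")
```

The source file containing the top few entries would be the natural candidate for a single-file test case.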

Thanks and regards,

-George

George Mock over 4 years ago

TI__Guru**** 236405 points

In this post, I explain the reasons for many of the NOP instructions you see. I hope it will convince you the cause of the code size increase must lie elsewhere.

The reason for these NOP instructions ...

Risto Alasaarela1 said:

840006f0 10102413 CALLP.S2 $Tramp$S$$AaMemCheckTag (PC+33056 = 0x84008800),B3
840006f4 0c6e || NOP 1
840006f6 0c6e || NOP 1
840006f8 0c6e || NOP 1
840006fa 0c6e || NOP 1
840006fc ec401c0c .fphead n, l, W, BU, nobr, nosat, 1100010b
84000700 $C$RL204:

and for these NOP instructions ...

Risto Alasaarela1 said:

And the compiler generates an extra return symbol + NOPs, which is never used!!

1081ec64 $C$RL24:
1081ec64 00000000 NOP
1081ec68 00000000 NOP
1081ec6c 00000000 NOP
1081ec70 00000000 NOP
1081ec74 00000000 NOP
1081ec78 00000000 NOP
1081ec7c 00000000 NOP
1081ec80 send:
1081ec80 .text:send:

... is related.

C6000 instructions are organized into execute packets and fetch packets. An execute packet is a set of 1-8 instructions that are all in parallel. A fetch packet is a group of 8 instructions, on a 32-byte boundary, that are fetched for execution all at once. An execute packet that is the target of a branch may not span a fetch packet boundary (mostly). For the details, search the C6600 CPU manual for the section titled Execute Packet Restrictions.
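
To make the boundary rule concrete, here is a rough sketch of the geometric check only. It ignores the exceptions hinted at by "(mostly)" and ignores the .fphead word, which in the listings quoted in this thread sits in the last four bytes of a fetch packet that contains compact instructions. The function name is made up for the example.

```python
FETCH_PACKET_BYTES = 32  # eight 32-bit words


def spans_fetch_packet_boundary(start, size):
    """True if bytes [start, start + size) lie in more than one fetch packet."""
    if size <= 0:
        raise ValueError("size must be positive")
    first_packet = start // FETCH_PACKET_BYTES
    last_packet = (start + size - 1) // FETCH_PACKET_BYTES
    return first_packet != last_packet


# The five instructions visible after $C$RL204 in the 8.3.x listing take
# 2 + 2 + 4 + 4 + 2 = 14 bytes.
print(spans_fetch_packet_boundary(0x84000700, 14))  # False
# The same 14 bytes placed at 0x840006f4 would cross 0x84000700.
print(spans_fetch_packet_boundary(0x840006F4, 14))  # True
```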

The assembler enforces this restriction. A partial explanation of how this works is in appendix section A.10 of the application note Advanced Linker Techniques for Convenient and Efficient Memory Usage. That explains the NOP instructions, not in parallel, contained in the second example I quote above. They fill out that text subsection to a multiple of 32 bytes.
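
The arithmetic can be checked against the addresses in the second example. This is a small sketch, assuming the padding consists of 32-bit NOPs as shown in that listing; the function name is made up for the example.

```python
FETCH_PACKET_BYTES = 32
NOP_BYTES = 4  # the padding NOPs in the listing are 32-bit instructions


def subsection_padding(end_address):
    """Return (padding bytes, 32-bit NOPs) needed to reach the next 32-byte boundary."""
    pad = -end_address % FETCH_PACKET_BYTES
    return pad, pad // NOP_BYTES


print(subsection_padding(0x1081EC64))  # (28, 7)  $C$RL24 up to send at 0x1081ec80
print(subsection_padding(0x1081ECA4))  # (28, 7)  $C$RL26 up to 0x1081ecc0
print(subsection_padding(0x1081EC80))  # (0, 0)   already aligned
```

Both results match the seven NOP lines shown after $C$RL24 and $C$RL26.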

The NOP 1 instructions in the first example I quote, all in parallel, are added by the assembler to cause the next execute packet, which always has a label, to start on a fetch packet boundary. Another way to see this is to search the disassembly for other instances of .fphead. The .fphead directives that are not followed by a label do not have the extra NOP 1 instructions. The .fphead directives that are followed by a label often have the extra NOP 1 instructions.
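
The search described above can be automated. The following is a sketch that assumes the dis6x text layout quoted in this thread; the regular expressions and the function name are made up for the example and may need adjusting for other listings.

```python
import re

# A label line: "<8-hex address> <name>:"
LABEL = re.compile(r"^\s*[0-9a-fA-F]{8}\s+\S+:\s*$")
# A parallel 16-bit padding NOP: "<8-hex address> <4-hex opcode> || NOP 1"
PAD_NOP = re.compile(r"^\s*[0-9a-fA-F]{8}\s+[0-9a-fA-F]{4}\s+\|\|\s+NOP 1\s*$")


def fphead_padding(disassembly):
    """For each .fphead line, return (followed_by_label, parallel NOP 1 count before it)."""
    lines = [line for line in disassembly.splitlines() if line.strip()]
    results = []
    for i, line in enumerate(lines):
        if ".fphead" not in line:
            continue
        # Count the parallel "NOP 1" lines directly before the header.
        nops = 0
        j = i - 1
        while j >= 0 and PAD_NOP.match(lines[j]):
            nops += 1
            j -= 1
        followed = i + 1 < len(lines) and bool(LABEL.match(lines[i + 1]))
        results.append((followed, nops))
    return results


LISTING = """\
840006f0 10102413 CALLP.S2 $Tramp$S$$AaMemCheckTag (PC+33056 = 0x84008800),B3
840006f4 0c6e || NOP 1
840006f6 0c6e || NOP 1
840006f8 0c6e || NOP 1
840006fa 0c6e || NOP 1
840006fc ec401c0c .fphead n, l, W, BU, nobr, nosat, 1100010b
84000700 $C$RL204:
84000700 2226 CMPEQ.L1 1,A4,A0
821f6178 020017aa || MVK.S2 0x002f,B4
821f617c e2208003 .fphead n, l, W, BU, br, nosat, 0010001b
"""

print(fphead_padding(LISTING))  # [(True, 4), (False, 0)]
```

The first header is followed by a label and preceded by four padding NOPs; the second is not followed by a label and has none.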

The C6000 compiler tools have behaved this way from the beginning. In particular, the version 7.3.x tools and the 8.3.x tools are no different regarding this detail. Thus, it is unlikely this is the cause of the increase in code size.

One experiment to consider ... As the appendix in the linker application note explains, using the compiler feature to put functions in subsections may cause a code size increase. The compiler option is --gen_func_subsections, or -mo for short. You build with this option enabled. Consider turning it off. While I am skeptical it will solve the problem, it can't hurt to try.

That being the case, I renew the request for a single file test case I made in the post dated Dec 13. I continue to think that is the best way forward.

Thanks and regards,

-George
