Tool/software: TI C/C++ Compiler
Hello,
We found a very well hidden compiler bug while testing CGT 8.3.4. The problem manifested at startup, when C++ constructors are called, so it was very hard to identify the root cause and find a temporary workaround, but we managed to prepare a local reproduction based on the ASM listings. I'm attaching sample code with the class/struct hierarchy required to reproduce the issue:
#define SIZE_OF_BIG_STRUCT 10200
#define NUM_OF_BIG_MEMBERS 4

typedef unsigned char u8;
typedef unsigned int u32;
typedef u8 TUsedSizeType_t;

typedef enum
{
    EContext1 = 0,
    EContext2,
    EContext3,
    EMaxContexts
} EContextTypes_t;

struct SimpleStructWithConstructor
{
    SimpleStructWithConstructor(u32 arg = 9) { y = arg; }
    u32 y;
};

template <typename SIZE_TYPE>
struct CommonContextPart
{
    SIZE_TYPE own;
    SIZE_TYPE nextFreeIndex;
    u8 group : 7;
    u8 inUse : 1;
};

template <u32 ContextId, typename s_t>
struct Context_t
{
};

template <typename size_type>
struct Context_t <(u32)EContext1, size_type> : public CommonContextPart<size_type>
{
    u32 bigArrayToSimulateBigContent[SIZE_OF_BIG_STRUCT];
    SimpleStructWithConstructor m_simpleStructWithConstructor;
};

typedef Context_t<(u32)EContext1, TUsedSizeType_t > BigStruct;

struct SmallStruct
{
    void init() { x = 3; }
    u32 x;
};

struct TestClassCtx
{
    TestClassCtx();
    BigStruct m_bigMember[NUM_OF_BIG_MEMBERS];
    SmallStruct m_smallMember;
};

TestClassCtx::TestClassCtx() : m_bigMember()
{
    m_smallMember.init();
}

int main(void)
{
    TestClassCtx testVar;
}
In the ASM file, please don't focus on the "main" function, as it isn't the problematic one (in our original code we have pools where placement new is called, so stack usage in "main" is not a problem in our software). Instead, we are interested in the compiler-generated auxiliary function $P$F0. It isn't affected with CGT7, but with CGT8 its stack consumption is enormous. Our ASM experts translated the affected code into C++ as something like:
#include <stdint.h>
#include <string.h>

void constructor(int8_t *ptr)
{
    int8_t tempSpace[40808];                    /* large temporary on the stack */
    memset(tempSpace, 0, sizeof(tempSpace));
    memcpy(ptr, tempSpace, sizeof(tempSpace));
}
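For reference, the 40808-byte temporary appears to correspond to the size of one BigStruct element. Below is a minimal host-side sketch of that arithmetic, assuming a 4-byte unsigned int and typical struct layout rules. The names BigStructLayout, bigStructSize and printBigStructSize are hypothetical and not part of the original test case, and the layout is ABI-dependent, so the value may differ with other compilers or targets.

```cpp
#include <cstddef>
#include <cstdio>

#define SIZE_OF_BIG_STRUCT 10200

typedef unsigned char u8;
typedef unsigned int  u32;

struct SimpleStructWithConstructor
{
    SimpleStructWithConstructor(u32 arg = 9) { y = arg; }
    u32 y;
};

template <typename SIZE_TYPE>
struct CommonContextPart
{
    SIZE_TYPE own;
    SIZE_TYPE nextFreeIndex;
    u8 group : 7;
    u8 inUse : 1;
};

// Hypothetical stand-in with the same members as the EContext1
// specialization of Context_t from the test case.
struct BigStructLayout : public CommonContextPart<u8>
{
    u32 bigArrayToSimulateBigContent[SIZE_OF_BIG_STRUCT];
    SimpleStructWithConstructor m_simpleStructWithConstructor;
};

// Size of one element of the big member array, as laid out by the
// compiler that builds this file.
std::size_t bigStructSize()
{
    return sizeof(BigStructLayout);
}

// Print the size so it can be compared with the temporary seen in the
// generated code.
void printBigStructSize()
{
    std::printf("sizeof(BigStructLayout) = %lu\n",
                static_cast<unsigned long>(bigStructSize()));
}
```

Under those assumptions the three one-byte members of the base are padded to 4 bytes, the array takes 10200 * 4 = 40800 bytes, and the trailing member takes 4 more, which gives 40808.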
We prepared some findings regarding sample code:
A fix for this bug is extremely important for us, because we cannot easily detect this type of problem until a crash occurs. We would like to receive the correction with RSA intrinsics at the latest.
Best Regards,
ZD
Thank you for reporting the problem, and supplying a test case. I am able to reproduce the problem. I'm sure this particular test case was quite difficult to develop. I appreciate all the effort.
I filed the entry EXT_EP-9683 to have this problem investigated. You are welcome to follow it with the link below in my signature.
Thanks and regards,
-George
As a workaround, add an explicit constructor to object Context_t which calls memset to set the large array to zero instead of relying on default initializations. This will prevent the compiler from creating the local temporary.
template <typename size_type>
struct Context_t <(u32)EContext1, size_type> : public CommonContextPart<size_type>
{
u32 bigArrayToSimulateBigContent[SIZE_OF_BIG_STRUCT];
SimpleStructWithConstructor m_simpleStructWithConstructor;
Context_t() { memset(bigArrayToSimulateBigContent, 0, sizeof bigArrayToSimulateBigContent); }
};
Yes, we're aware that an explicit constructor solves the problem (it's slightly more difficult in our production code due to the number of different members, but it is still doable). However, the main problem here is different. We can use the workaround when we know that we're suffering from this issue. But if a crash occurs at startup, we have to spend a lot of time debugging, and a stack overflow caused by an incorrectly generated constructor isn't the first obvious thing to check. We can verify this by scanning the asm/lst files generated by the compiler, but that still requires awareness that the generated constructor is the root cause. That's why this issue is blocking for us and we would like to wait for the correction.
I noted your additional difficulties in the entry I filed for this bug.
Thanks and regards,
-George
Zbigniew Duszeńczuk said: We can verify this by scanning asm/lst files generated by compiler, but it's still requires awareness about fact, that generated constructor is root cause.
Would the call_graph utility from cg_xml help to check for the functions suffering from excessive stack size?
See "Finding out static stack usage" for some information.
Well, the initial result isn't satisfying (both XMLs generated using tutorial from link you suggested to visit):
C:\test>"C:\Program Files (x86)\ti\cgxml-2.61.00\bin\call_graph.exe" --stack_max cgraph1_1.xml
Out of memory!
C:\test>"C:\Program Files (x86)\ti\cgxml-2.61.00\bin\call_graph.exe" --stack_max cgraph1_1_2.xml
Out of memory!
C:\test>
I will try the Linux version later...
Those XML files must be very large. There is a way to make them much smaller, and then call_graph is likely to work.
To see the documentation for the cg_xml package, load the file cg_xml_install_root/index.htm into a web browser. Click on the entry for call_graph. Find this information ...
OFD OPTIONS
Recent releases of OFD support options for filtering the XML output down
to what is strictly of interest. When processing a .out file, the best
options to use in combination with this script are:
-xg --xml_indent=0 --obj_display=none,header,optheader,symbols,battrs --dwarf_display=none,dinfo
When processing a library, symbols are not needed, so the best options
to use are:
-xg --xml_indent=0 --obj_display=none,header,optheader,battrs --dwarf_display=none,dinfo
Filtering the XML in this way reduces the amount of data processed by
this script, thus making it run faster.
Please try using those options when using ofd6x to create the XML file.
Thanks and regards,
-George
Zbigniew Duszeńczuk said:
C:\test>"C:\Program Files (x86)\ti\cgxml-2.61.00\bin\call_graph.exe" --stack_max cgraph1_1.xml
Out of memory!
I tried running call_graph under both Windows 10 and Ubuntu on the example program in your original post and couldn't reproduce the crash.
The output from call_graph, without the --stack_max option, was:
$ ~/ti/ccs930/ccs/tools/compiler/ti-cgt-c6000_8.3.5/bin/ofd6x -xg ~/workspace_v9/C6000_constructor_issue/Debug/C6000_constructor_issue.out | /home/mr_halfword/ti/ti-processor-sdk-rtos-am335x-evm-05.01.00.11/cg_xml/bin/call_graph
Reading from stdin ...
Call Graph for /home/mr_halfword/workspace_v9/C6000_constructor_issue/Debug/C6000_constructor_issue.out
**********************************************************************
_c_int00 : wcs = 163352
| _args_main : wcs = 163352
| | main : wcs = 163344
| | | _ZN12TestClassCtxC1Ev : wcs = 96
| | | | _ZN11SmallStruct4initEv : wcs = 8
| | | | __cxa_vec_ctor : wcs = 88
| | | | | _Znaj : wcs = 48
| | | | | | _Znwj : wcs = 40
| | | | | | | malloc : wcs = 32
| | | | | | | | minsert : wcs = 0
| | | | | | | | mremove : wcs = 0
| _auto_init_elf : wcs = ???
| _system_pre_init : wcs = 0
| exit : wcs = 8
| | abort : wcs = 0
The roots of the following graphs are functions that:
- Are never called ... OR ...
- Are called indirectly and are not listed among the functions called
  indirectly in the configuration file specified with --i_cfg=file.
  Run "perldoc call_graph.pl" for more information.
======================================================================
$P$F0 : wcs = 40832
| _ZN9Context_tILj0EhEC1Ev : wcs = 16
| | _ZN27SimpleStructWithConstructorC1Ej : wcs = 8
| ( __c6xabi_strasgi_64plus __strasgi_64plus ) : wcs = 0
| memset : wcs = 0
__TI_auto_init_nobinit_nopinit : wcs = 56
| __TI_tls_init : wcs = 32
| _system_post_cinit : wcs = 0
__TI_decompress_none : wcs = 8
| memcpy : wcs = 0
__TI_decompress_rle24 : wcs = 40
| __TI_decompress_rle_core : wcs = 32
| | memset : wcs = 0
_nop : wcs = 0
The following functions are known to contain indirect function calls, but
do not contain any information about those indirect calls in the
configuration file specified with --i_cfg=file. Run "perldoc call_graph.pl"
for more information.
======================================================================
_Znwj
__TI_auto_init_nobinit_nopinit
__TI_tls_init
__cxa_vec_ctor
exit
malloc
The stack usage of the $P$F0 function generated by the v8.3.5 compiler is 40832. The issue is that $P$F0 is called indirectly, so without help call_graph can't tell where it is called from. As a result, the stack usage of $P$F0 isn't counted towards the worst-case stack usage reported by the --stack_max option, unless the indirect calls are specified in a configuration file.
From a static analysis of the program using dis6x, I wasn't able to tell where $P$F0 was called from. By running the example program in the debugger and setting a breakpoint on $P$F0, I was able to determine that it was called via a function pointer, with the following call stack:
$P$F0() [/home/mr_halfword/workspace_v9/C6000_constructor_issue/Debug/C6000_constructor_issue.out] at 0x800804C0
__cxa_vec_ctor(void *)() at vec_newdel.c:659 0x80080270
TestClassCtx::TestClassCtx() at main.cpp:66 0x8008056E
main() at main.cpp:73 0x80080534
_c_int00() at boot.c:142 0x80080ACC (the entry point was reached)
I.e. if call_graph is to be used to check for problematic functions, rather than just looking at the output of the --stack_max option, you should look at the raw output and check for any indirect functions.
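One way to automate that check is to scan the raw call_graph text for unindented entries with a large wcs value, since every graph root (including functions reached only through function pointers, such as $P$F0) is printed without a leading '|'. The following is only a sketch under that assumption about the text layout; RootEntry and findLargeRoots are hypothetical names, not part of cg_xml.

```cpp
#include <cstdlib>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One unindented (root) entry of the call_graph text output.
struct RootEntry
{
    std::string name;
    long wcs;
};

// Collect every unindented "name : wcs = N" line whose wcs is at least
// minWcs. Assumption: called functions are always prefixed with '|',
// so an unindented line is the root of a graph.
std::vector<RootEntry> findLargeRoots(std::istream &in, long minWcs)
{
    const std::string marker = " : wcs = ";
    std::vector<RootEntry> result;
    std::string line;
    while (std::getline(in, line))
    {
        if (line.empty() || line[0] == '|' || line[0] == ' ')
            continue;                          // a callee, not a root
        const std::string::size_type pos = line.find(marker);
        if (pos == std::string::npos)
            continue;                          // banner or explanatory text
        const std::string value = line.substr(pos + marker.size());
        char *end = nullptr;
        const long wcs = std::strtol(value.c_str(), &end, 10);
        if (end == value.c_str())
            continue;                          // "???": stack size unknown
        if (wcs >= minWcs)
            result.push_back(RootEntry{line.substr(0, pos), wcs});
    }
    return result;
}
```

Fed with the output shown earlier and a threshold of, say, 4096, this would report both _c_int00 and $P$F0; the second one is the entry that --stack_max does not account for.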
Hello all,
Chester Gillon said: I tried running call_graph under both Windows 10 and Ubuntu on the example program in your original post and couldn't repeat the crash.
I wasn't precise. I tried to verify our commercial product, and the XML files from there are very big. I think such output from our product's out files could help us deal with the issue until the final correction appears (your findings from the call_graph results are exactly the same as ours).
George Mock, I already used the commands from the tutorial. I repeated the steps with the additional tips from your post. Here are the sizes of the XML files generated by ofd6x:
Only the last file doesn't trigger the crash (verified on Windows 10). However, I don't see many of the functions from our component there (including the affected code), so I assume the content is cut too much or some additional files are missing (like the cfg mentioned in the tutorial).
Best Regards,
ZD
One update: I tried call_graph on Linux and it works OK for big files. However, I don't see the affected function in the call graphs, and the "--stack_max" option shows only 3104. On the other hand, the results from call_graph seem to be much smaller than the content of the input XML, so additional suggestions from TI are welcome.
EDIT:
In the affected code from our product, we have a static variable that needs to be initialized from a function, which calls placement new to trigger the constructor. The affected call graph should include the function calls that use function pointers from the ".init_array" section. I see the function that uses them in both the input XML and the call graph result, but its WCS is extremely low there. I assume this is an indirect call, so analysis will be harder in this case.
I recommend you not worry too much about getting call_graph to construct a completely connected graph. That would be nice, but I don't think it is necessary in this case. You want to find any function that uses a large amount of stack. call_graph does not directly tell you that. It tells you how much stack is used by a function, plus the amount of stack used by all the functions it calls. However, you can infer it. It would be a computation that looks like ...
<wcs of current function> - <largest wcs from any function called by current function>
Perhaps it makes sense to write a script that computes that from the output of call_graph. Or, maybe it makes more sense to modify the Perl code for call_graph and have it print it out. Or, some other approach along these lines.
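A rough sketch of such a script, written here as a small C++ helper rather than a change to the Perl code. It assumes the text layout shown earlier in this thread, where each call level adds one '|' to the line prefix; Node, parseCallGraph and ownStack are hypothetical names. Entries printed as "???" are treated as unknown and ignored when looking for the largest callee.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct Node
{
    std::string name;
    int depth;      // number of '|' markers in front of the name
    long wcs;       // -1 when call_graph printed "???"
    long maxChild;  // largest wcs among direct callees, 0 if none
};

// Parse "| | name : wcs = N" lines and record, for every function, the
// largest wcs of the functions listed directly beneath it.
std::vector<Node> parseCallGraph(std::istream &in)
{
    const std::string marker = " : wcs = ";
    std::vector<Node> nodes;
    std::vector<std::size_t> open;  // open[d]: index of latest node at depth d
    std::string line;
    while (std::getline(in, line))
    {
        const std::string::size_type pos = line.find(marker);
        if (pos == std::string::npos)
            continue;                          // not a call graph entry
        std::string::size_type i = 0;
        int depth = 0;
        while (i < pos && (line[i] == '|' || line[i] == ' '))
        {
            if (line[i] == '|')
                ++depth;
            ++i;
        }
        const std::string value = line.substr(pos + marker.size());
        char *end = nullptr;
        long wcs = std::strtol(value.c_str(), &end, 10);
        if (end == value.c_str())
            wcs = -1;                          // unknown stack size

        nodes.push_back(Node{line.substr(i, pos - i), depth, wcs, 0});
        if (depth > 0 && open.size() >= static_cast<std::size_t>(depth))
        {
            Node &parent = nodes[open[depth - 1]];
            parent.maxChild = std::max(parent.maxChild, wcs);
        }
        open.resize(static_cast<std::size_t>(depth) + 1);
        open[depth] = nodes.size() - 1;
    }
    return nodes;
}

// <wcs of current function> - <largest wcs of any function it calls>,
// or -1 when the wcs of the function itself is unknown.
long ownStack(const Node &n)
{
    return n.wcs < 0 ? -1 : n.wcs - n.maxChild;
}
```

For the listing posted earlier this would give 40832 - 16 = 40816 for $P$F0 and 163344 - 96 = 163248 for main. Sorting the nodes by ownStack and printing the top few would be a natural next step.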
Thanks and regards,
-George
Hello after a longer break,
We verified how this tool could help us in our daily work. The results are promising, but we also have some comments to share with you.
First of all, we still suffer from incomplete call graph content for, let's say, regular out files. As I described before, we see only some low-level calls (I mean our base subsystem) and no calls from "higher-level" components. Because of this, we can't use the method suggested in the previous post. However, we managed to get a full call graph when we compiled our project with the "-g" option, which contrasts with the description in the tutorial (I mean this: ).
Analysis of this output shows that the information from the call graph is very valuable. We were able to easily detect the extremely large stack usage discussed here and got all the function calls leading to it. To get a second data point, we had an opportunity to analyse a stack overflow issue in a different system component, which had started to manifest recently. The root cause was unknown, because the crash was visible only when an unrelated change in the code was introduced. In the end, the call graph showed which call path caused the problem and what we should fix (by how much the stack size could be increased, or which part of the code requires optimization). So our final opinion is that this tool can be helpful. Thank you for your suggestions.
As for our comments, we would like to report one strange issue we met in the second case. When we tried to use the "optimized" version of the ofd6x command (parameters like "-xg --xml_indent=0 --obj_display=none,header,optheader,symbols,battrs --dwarf_display=none,dinfo"), we got a segmentation fault and were unable to get the XML and the call graph. The processed out file was big, approximately 390 MB. However, we decided to process it with the "-xg" parameter only, and after a long time (I'm not sure, but it was approximately half an hour) we received an XML file of enormous size: over 16.5 GB! call_graph took 2-3 hours to process this XML file, but the output was valuable (all calls were visible). Unfortunately, I'm not allowed to give you more details, such as the out file used. If it helps, the cgxml tools version used is cgxml-2.61.00 (Linux version).
Since the increased stack usage can now be detected during static analysis, we are able to apply workarounds to deal with this issue. It's still very important for us to get the final correction (and we expect it will be published with the next compiler release), but we're no longer blocked in our further activity.
BR,
ZD
Zbigniew Duszeńczuk said: When we tried to use "optimized" version of command for ofd6x (parameters like "-xg --xml_indent=0 --obj_display=none,header,optheader,symbols,battrs --dwarf_display=none,dinfo"), we suffered from segmentation fault issue and we were unable to get xml and call graph.
What crashed? ofd6x? Or the call_graph utility?
Zbigniew Duszeńczuk said: we decided to process it with "-xg" parameters only and after long time (I'm not sure, but it was approximately half an hour) we received xml file of crazy size: over 16.5 GB!
At least add --xml_indent=0. Then there is no space at the start of each line. That will reduce the file size by a few gigs. And it won't have any effect on the output.
Zbigniew Duszeńczuk said: This xml file was processed by call graph after 2-3 hours, but output was valuable (all calls were visible).
I do not understand how that happened. I'll try to reproduce it myself. Though I'm skeptical I'll see it.
Thanks and regards,
-George
Hello,
ofd6x crashed. Here are some details (but I'm afraid that without debug symbols in ofd6x they aren't very helpful). Some file paths were changed to avoid publishing them (marked yellow to avoid misunderstandings).
[XXX]$ ulimit -c 2000000 && ofd6x -xg --xml_indent=0 --obj_display=none,header,optheader,symbols,battrs --dwarf_display=none,dinfo ./usedOut.out > testData.xml
Segmentation fault (core dumped)
[XXX]$ du -sh core.41712
1.5G core.41712
[XXX]$ gdb ofd6x core.41712
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-92.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <gnu.org/.../gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<www.gnu.org/.../>...
Reading symbols from ofd6x...(no debugging symbols found)...done.
[New Thread 41712]
Core was generated by `ofd6x -xg --xml_indent=0 --obj_display=no'.
Program terminated with signal 11, Segmentation fault.
#0 0x08080b63 in DWE::DW_INFO_DIE::save_xml(XMLGEN&) ()
(gdb) bt
#0 0x08080b63 in DWE::DW_INFO_DIE::save_xml(XMLGEN&) ()
#1 0x08080cdc in DWE::DW_INFO_DIE::save_xml(XMLGEN&) ()
#2 0x0808157f in DWE::DW_INFO_CU::save_xml(XMLGEN&) ()
#3 0x080816a8 in DWE::DW_INFO::save_xml(XMLGEN&) ()
#4 0x080671ac in DWE::DWARF_EDITOR::save_xml(XMLGEN&) ()
#5 0x080fd67d in ?? ()
#6 0x080fdf5a in ?? ()
#7 0x080fea6f in main ()
(gdb)
Best Regards,
ZD
Thank you for that core dump information. However, we cannot make immediate use of it.
Did adding --xml_indent=0 make a useful difference?
Regarding these options ...
--obj_display=none,header,optheader,symbols,battrs
This means to disable the output of all information about the object file, except the file header, optional file header, the symbol table, and the build attributes. The script call_graph uses all of that information, but not any of the rest. You could try reversing the sense of this option. For instance ...
--obj_display=norelocs,nostrings
This disables relocation entries and the string table. For more detail on how this works, run this command ...
% ofd6x --obj_display=help
TMS320C6x Object File Display v8.3.4
Tools Copyright (c) 1996-2018 Texas Instruments Incorporated
The --obj_display option controls display filter settings by specifying
a comma-delimited list of display attributes. When prefixed with the
word "no", an attribute is disabled instead of enabled. The following
attributes are available:
battrs: build attributes (default: on)
dynamic: ELF .dynamic section (default: on)
groups: ELF groups (default: on)
header: file header (default: on)
lnnos: COFF line number entries (default: on)
optheader: COFF optional file header (default: on)
rawdata: section raw data (default: off)
relocs: relocation entries (default: on)
sections: sections (default: on)
segments: ELF segments (default: on)
strings: string tables (default: on)
symbols: symbols (default: on)
symhash: ELF symbol hash table (default: on)
symver: Symbol version information (default: on)
all: enables all attributes
none: disables all attributes
Examples:
--obj_display=battrs,nodynamic
--obj_display=all,nobattrs
--obj_display=none,dynamic
Feel free to disable anything call_graph does not need. You can probably find some combination of things that makes call_graph work well, while reducing the XML, and making the whole process faster.
Regarding the options ...
--dwarf_display=none,dinfo
This means disable all Dwarf output except for debug information. The general interface is the same as with --obj_display. Feel free to experiment with this as well.
Thanks and regards,
-George
Hello,
To begin, I would like to show a summary regarding ofd debugging:
I suggest closing the discussion regarding the cgxml tools here and going back to the original issue. Thank you for the support.
Let's go back to the original topic.
Time is running out and we would like to know what the chances are of the stack issue being fixed in the next compiler releases. The workaround is used in the testing/verification phase (which is positive at this point), but it can't be a long-term solution. I took a look at EXT_EP-9683, but there is no information about the release in which the stack issue is planned to be fixed. I assume the status "Planned" means that there is still no development activity in this respect (please correct me if I'm wrong). Could you tell us when we should expect the fix?
Best Regards,
ZD
And to avoid spamming the forum, I would also like to ask about the rest of our reports (the corresponding threads are locked now):
008342dc 320ca35b || [!B0] MVK.L2 3,B4
008342e0 258e || MV.S1 A11,A1
008342e2 0616 || MV.D1 A12,A0
008342e4 03146c02 || MPY.M2 3,B5,B6
008342e8 9208a35a [!A1] MVK.L2 2,B4
008342ec 220c1fdb [ B0] MV.L2X A3,B4
008342f0 0c6e || NOP 1
008342f2 0c6e || NOP 1
008342f4 0c6e || NOP 1
008342f6 0c6e || NOP 1
008342f8 0c6e || NOP 1
008342fa 0c6e || NOP 1
008342fc ee201f03 .fphead n, l, W, BU, nobr, nosat, 1110001b
00834300 $C$L12:
00834300 d0001491 [!A0] B.S1 $C$L16 (PC+164 = 0x008343a4)
we're wasting approx. 2kB from INTMEM
And overall 12kB, because of parallel nops.
Unfortunately, the out files we research can't be delivered publicly...
Thank you for your answers in advance.
Best Regards,
ZD
Hello!
Could it be possible for your development team to pass more detailed info about the actual plans and development status in EXT_EP-* tickets? I would assume that this would be easiest approach for Nokia to follow the situation...
Br,
Risto
Regarding ...
Zbigniew Duszeńczuk said: Enumerations (e2e.ti.com/.../881731):
It's not critical, but correct enum handling is required to keep our code portable. As I mentioned in the corresponding thread, GCC works OK, and so does Clang, but CGT8.3.X doesn't, and this is not behaviour introduced by the C++11/14 standards (we don't see a reasonable difference between the widely used GCC/Clang and CGT in the field of the C++ standard regarding enums). I also see that EXT_EP-9680 is under development, so I believe there is a chance to get a correction.
The referenced thread was unlocked. Further posts were added which resolved the problem.
Regarding ...
Zbigniew Duszeńczuk said: Increased code size (e2e.ti.com/.../861859):
- The issue is quite old, but some deadlines are also welcome. In the meantime, our experts made an analysis which shows that such NOP instructions are visible even in CGT7.3.8. Our expert's statement is as follows:
[...] we're wasting binary space (and polluting L1P cache...) because of the following stupidity:
The referenced thread was unlocked, and I added a post to it. This post explains the reasons behind these NOP instructions, and why they are probably not the cause of the code size increase. I didn't add it here because I wanted to keep all the related details in the same thread.
Regarding ...
Zbigniew Duszeńczuk said: Map file size (e2e.ti.com/.../882107):
This is the minor one and EXT_EP-9684 shows low priority. The status is still "New". Because it doesn't seem complicated to solve (you mentioned in the original thread that it is not even a bug, but inefficient behaviour), I assume "efficient behaviour" could appear soon, right?
It is my understanding that, through other channels, it was communicated to you that, while this entry is still outstanding against the linker, it is considered a low priority, and there are no plans to address it in the short term.
Thanks and regards,
-George
Regarding ...
Zbigniew Duszeńczuk said: Let's go back to the original topic.
Time is running out and we would like to know what the chances are of the stack issue being fixed in the next compiler releases. The workaround is used in the testing/verification phase (which is positive at this point), but it can't be a long-term solution. I took a look at EXT_EP-9683, but there is no information about the release in which the stack issue is planned to be fixed. I assume the status "Planned" means that there is still no development activity in this respect (please correct me if I'm wrong). Could you tell us when we should expect the fix?
A fix is in progress. It will appear in version 8.3.7, which is scheduled to release in mid-May. It will also appear in 8.5.0, for which the schedule is not yet set.
Thanks and regards,
-George