
MSP430 + deque/map binary size issues with large data model

Other Parts Discussed in Thread: MSP430F2618, CODECOMPOSER

Hi!

We're using an MSP430F2618 in an embedded application, with Code Composer v4.1.2.00027, and we happened to notice some strange behaviour in the resulting binary size when compiling with "large data model" enabled.

Example:

main.cpp:
#include <deque>
using namespace std;

int main()
{
    deque<int> *myDeque = new deque<int>();
    int i = 0;
    myDeque->push_back(i++);
    myDeque->push_back(i++);
    myDeque->push_back(i++);
    myDeque->push_back(i++);
    myDeque->push_back(i++);
    myDeque->push_back(i++);
    myDeque->push_back(i++);
    myDeque->push_back(i++);
    myDeque->push_back(i++);
    myDeque->push_back(i++);
    myDeque->push_back(i++);
    myDeque->push_back(i++);
    myDeque->push_back(i++);
    myDeque->push_back(i++);
    myDeque->push_back(i++);
    myDeque->push_back(i++);
    myDeque->push_back(i++);
    myDeque->push_back(i++);
    myDeque->push_back(i++);
    myDeque->push_back(i++);
}

In a default release build (i.e. "large data model" disabled), this results in a 7.6kB binary loaded into flash. Compiling with "large data model" enabled, the same code results in a 12.4kB binary loaded into flash. Is this supposed to happen?

If so: our application is quite large (well beyond 65kB) and depends on having "large data model" enabled. What can be done to decrease the binary size? Are there any options we can enable or disable?

Having a look at the resulting *.map file for the build with "large data model" enabled, it seems (at least in the map file for our application) that each object file calling into the STL deque gets its own copy of those functions in the binary; new calls to a deque function do not reuse the copy already present.

 

  • When the large data model is used, all pointers to variables are 20 bit (32 bit in memory) instead of 16 bit.

    This means that any memory access requires an X-instruction, which is 2 bytes longer than the normal instruction, and the same applies to most other operations (including push and pop). With the small data model, registers hold only 16 bit values, so apart from function calls (which of course require a 20 bit target address), none of the X-instructions is needed. With the large data model, caution must be taken, as any register can contain a 20 bit value. Luckily this does not mean that a pointer requires two registers (unless you do a lot of typecasting between long and pointer types). But depending on what you do, the size impact may very well be more than the 10% stated in the TI datasheet.

    It might be a better (and faster!) solution to force all code into the upper flash and all constants into the lower 64k, so that the small data model is sufficient.
    It's even possible to move the interrupt functions to upper flash by providing a 'stub': basically a jump instruction (assembly) that sits in lower flash and jumps to the 'real' ISR in upper flash. It adds 5 cycles to the interrupt latency.
    Don't ask me how to do it in IAR/CCS; I've only done it for the mspgcc compiler.
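    For what it's worth, the stub idea could be sketched roughly as below for a GCC-based MSP430 toolchain. This is target code (not runnable on a host), and the section name `.upper.text`, the vector macro, and the behaviour of the plain `interrupt` attribute are assumptions that need checking against your compiler manual and linker script:

    ```cpp
    #include <msp430.h>

    /* Real handler, placed into upper flash via a linker-script section.
     * ".upper.text" is an assumed section name; the attribute (here the
     * vector-less 'interrupt' form) must make the compiler end the
     * function with RETI instead of RET. */
    __attribute__((interrupt, section(".upper.text")))
    void timer_isr_body(void)
    {
        /* ... actual interrupt work ... */
    }

    /* Stub in lower flash, registered in the 16-bit vector table.
     * 'naked' suppresses the prologue/epilogue; BRA performs the 20-bit
     * jump to upper flash, adding roughly 5 cycles of latency. */
    __attribute__((interrupt(TIMERA0_VECTOR), naked))
    void timer_isr_stub(void)
    {
        __asm__ __volatile__("BRA #timer_isr_body");
    }
    ```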

  • Hi!

    We are aware of, and do expect, the binary size to increase with the large data model. However, we don't expect to see such a huge increase as we do: the binary shouldn't grow 71%, from 44.5kB without the large data model to about 76kB with it enabled.

    That being said, our main problem, duplicate copies of a function/method in the binary, doesn't seem to be confined to the large data model. Here is an example from the .map file generated by a release build without the large data model.

    00005614 00000330 Com.obj
     (.text:__CPR403___Insert__Q2_3std239_Tree__tm__225_Q2_3std214_Tmap_traits__tm__193_PCcPQ2_3std42deque__
    tm__29_cQ2_3std18allocator__tm__2_cQ2_3std15less__tm__4_PCcQ2_3std96allocator__tm__79_Q2_3std69pair__tm__
    57_CPCcPQ2_3stdJ77JXCbL_1_0FbQ5_Z1Z14allocator_type50rebind__tm__36_Q3_3std20_Tree_nod__tm__4_Z1Z5_
    Node5other7pointerRCQ2_Z1Z10value_type_Q3_3std16_Tree__tm__J323J8iterator)

    00005944 00000312 Tnc.obj
    (.text:__CPR403___Insert__Q2_3std239_Tree__tm__225_Q2_3std214_Tmap_traits__tm__193_PCcPQ2_3std42deque__
    tm__29_cQ2_3std18allocator__tm__2_cQ2_3std15less__tm__4_PCcQ2_3std96allocator__tm__79_Q2_3std69pair__tm__
    57_CPCcPQ2_3stdJ77JXCbL_1_0FbQ5_Z1Z14allocator_type50rebind__tm__36_Q3_3std20_Tree_nod__tm__4_Z1Z5_
    Node5other7pointerRCQ2_Z1Z10value_type_Q3_3std16_Tree__tm__J323J8iterator)

    Apart from the differing origin (first hex number), size (second hex number) and object file, these two entries are exactly the same. Both Tnc and Com inherit from the same class, and both contain multiple lines looking like this:

    commandMap->insert(make_pair( "ping", tmp));

    commandMap is a pointer to a map<const char*, deque<char>*> inherited from Node (which both Tnc and Com derive from), and tmp is a pointer to a locally created deque<char>. Both classes (Tnc and Com) have multiple calls like this with different input, and we believe this is the source of both allocations in the binary. What is quite puzzling is that this isn't emitted as one block of code that both can use; that would, after all, be the most logical thing to do. Looking at the same lines in the .map file for a debug build, the only differences are the origin and the object file, as indicated by the lines below:

    00005e4e 0000039e Com.obj
    (.text:__CPR403___Insert__Q2_3std239_Tree__tm__225_Q2_3std214_Tmap_traits__tm__193_PCcPQ2_3std42deque__
    tm__29_cQ2_3std18allocator__tm__2_cQ2_3std15less__tm__4_PCcQ2_3std96allocator__tm__79_Q2_3std69pair__
    tm__57_CPCcPQ2_3stdJ77JXCbL_1_0FbQ5_Z1Z14allocator_type50rebind__tm__36_Q3_3std20_Tree_nod__tm__4_
    Z1Z5_Node5other7pointerRCQ2_Z1Z10value_type_Q3_3std16_Tree__tm__J323J8iterator)

    000061ec 0000039e Tnc.obj
    (.text:__CPR403___Insert__Q2_3std239_Tree__tm__225_Q2_3std214_Tmap_traits__tm__193_PCcPQ2_3std42deque__
    tm__29_cQ2_3std18allocator__tm__2_cQ2_3std15less__tm__4_PCcQ2_3std96allocator__tm__79_Q2_3std69pair__
    tm__57_CPCcPQ2_3stdJ77JXCbL_1_0FbQ5_Z1Z14allocator_type50rebind__tm__36_Q3_3std20_Tree_nod__tm__4_
    Z1Z5_Node5other7pointerRCQ2_Z1Z10value_type_Q3_3std16_Tree__tm__J323J8iterator)

    Creating a new function in Node for inserting something into commandMap, i.e.

    void Node::setCommand(const char* name, deque<char> *command)
    {
         commandMap->insert(make_pair(name, command));
    }

    quite drastically reduces the size of the binary. What we don't get is that this is a crystal-clear candidate for inlining, which would produce exactly the same code as before. Are we just thinking about this the wrong way, or is there a problem in the compilation? There seem to be similar problems with deque, where creating wrapper functions that operate on a deque reduces the binary size by several kB.

    All release builds are compiled with the default release settings plus --static_template_instantiation and --opt_for_speed=0 to decrease the binary size as much as possible.
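    For reference, the wrapper technique can be reproduced with a self-contained sketch on a host compiler; the free function setCommand below is a stand-in for the Node::setCommand member described above. The idea is that one out-of-line function owns the large map<...>::insert instantiation, so every caller references that single symbol instead of expanding the templated code at each call site:

    ```cpp
    #include <cassert>
    #include <deque>
    #include <map>
    #include <utility>

    using namespace std;

    typedef map<const char*, deque<char>*> CommandMap;

    // Single out-of-line wrapper around the templated insert.
    // (Beware: the default less<const char*> compares pointer addresses,
    // not string contents, so lookups must use the same pointer.)
    void setCommand(CommandMap& commandMap, const char* name, deque<char>* command)
    {
        commandMap.insert(make_pair(name, command));
    }

    int main()
    {
        CommandMap commandMap;
        deque<char> ping;
        const char* key = "ping";

        setCommand(commandMap, key, &ping);
        setCommand(commandMap, key, &ping);  // duplicate key: insert is a no-op

        assert(commandMap.size() == 1);
        assert(commandMap[key] == &ping);
        return 0;
    }
    ```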

  • benyamin said:
    What is quite puzzling is that this isn't allocated as one block of code which they both can operate on.


    Maybe this would be possible if both classes (and the class they are derived from) were in the same source file. But since they aren't, and the compiler sees only the current compilation unit, it HAS to generate separate code in each object file; the compiler cannot know that the other class even exists. And to the linker, both code blocks are just two code blocks with no relation, even if they are 100% identical.

    But that's not related to the memory model. What is, is the heavy use of pointers, and C++ classes rely heavily on them. Unfortunately, the compiler (and the C++ language too) does not distinguish between constant data (which needs 20 bit pointers) and variable data (which on the MSP would only require 16 bit pointers, since RAM is entirely in the lower 64k). Depending on the use of these pointers, code size can very well increase that much. 71% is a lot, and more than I would have expected; 10% is 'normal' for plain C programs, and for C++ classes 20-30% would have been my guess.

    benyamin said:
    What we don't get is the fact that this is a crystal clear candidate for inlining

    Inlining is a compiler-dependent thing. Most compilers only inline when optimizing for speed (inlined code is usually larger than a function call, yet much faster), and you don't optimize for speed.
    Also, even with inlining, a copy of the function still needs to exist as a standalone function, just in case someone from outside calls it (which the compiler cannot know).
    I don't know what --static_template_instantiation means, but just from the name it also seems to have some impact on code size, and everything 'static' usually means 'larger but faster'.

    Sorry, but I can't tell you more without taking a closer look at the code, and I don't have the time to read into this somewhat complex task.

    If you cannot figure out what's going on (a look at the generated assembly code is sometimes very enlightening), you should ask the compiler manufacturer. I think this is far too complex for an MSP forum, which is about problems with the MSP hardware rather than with the C++ compiler itself.
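    As a footnote for later toolchains: C++11 added explicit control over where a template is instantiated, which addresses exactly this per-translation-unit duplication. This is newer than the CCS 4.1 compiler discussed here, so treat it as a sketch of the general technique (Command is a hypothetical stand-in type, and the header/source split is shown in one file for brevity):

    ```cpp
    #include <cassert>
    #include <map>
    #include <utility>

    struct Command { int id; };  // stand-in for the deque<char>* payload

    // In a shared header: 'extern template' suppresses implicit
    // instantiation in every translation unit that includes it.
    extern template class std::map<int, Command>;

    // In exactly one source file: the single explicit instantiation
    // that all users link against.
    template class std::map<int, Command>;

    int main()
    {
        std::map<int, Command> commands;  // uses the one shared instantiation
        Command ping = { 1 };
        commands.insert(std::make_pair(0, ping));

        assert(commands.size() == 1);
        assert(commands[0].id == 1);
        return 0;
    }
    ```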
