How to force double alignment to 4 bytes with gcc for ARM?



Hello,

I've run into a data alignment problem in my code running on the AM389x Sitara (ARM Cortex A8).  The code is compiled using the CodeSourcery gcc compiler.  

My problem is that I have a structure that contains 13 int members followed by a double.  The double data in memory starts at the address immediately after the 13th int member.  I can see, though, that the code thinks the double starts 4 bytes past that address (if I read the double from the address the code reports minus 4 bytes, it is printed correctly).  Alternatively, if I add a dummy int member immediately before the double, the code prints the double correctly.

It looks like the processor aligns doubles on 4-byte boundaries (1 word), but the compiler thinks they are on 8-byte boundaries, and so assumes a 4-byte pad between the 13 ints and the double.  As a solution, it seems the gcc option -mno-align-double would fix it, but the CodeSourcery gcc doesn't accept it (and it isn't listed in the CodeSourcery compiler manual under ARM options; it is listed as an Intel 386 option).

Is there a way to make the compiler align doubles on 4 byte boundaries with ARM and this compiler?  Is there another way to fix this?  Inserting pads is not really a viable option because I have many structures like this and anybody could change one and ignore the need for the pad.

My data structure is shown below, followed by printfs of the start address of each structure member.  Also, __alignof__ (double) returns 8.

typedef struct input_timing_s
{
int chan;
int src_pwr_det;
int clk_det;
int de_det;
int sig_stable;
unsigned int time_locked;
int mode;
int hdcp_authenticated;
int hdcp_src_dev_auth_start;
int horiz_active;
int horiz_blank;
int vert_active;
int vert_blank;
double tmds_frequency;
int horiz_sync_polarity;
int vert_sync_polarity;
int interlace;
} input_timing_t;

addr of chan=0xbe9ca780
src_pwr_det=0xbe9ca784
clk_det=0xbe9ca788
de_det=0xbe9ca78c
sig_stable=0xbe9ca790
time_locked=0xbe9ca794
mode=0xbe9ca798
hdcp_authenticated=0xbe9ca79c
hdcp_src_dev_auth_start=0xbe9ca7a0
horiz_active=0xbe9ca7a4
horiz_blank=0xbe9ca7a8
vert_active=0xbe9ca7ac
vert_blank=0xbe9ca7b0
tmds_frequency=0xbe9ca7b8
horiz_sync_polarity=0xbe9ca7c0
vert_sync_polarity=0xbe9ca7c4
interlace=0xbe9ca7c8

Thanks,

Tim

  • Tim Sax said:
    The code is compiled using the CodeSourcery gcc compiler.

    This forum is for compilers released by TI, and that's not one of them.  However, CCS does come with the Linaro ARM gcc compiler, which is closely related.  I'm pretty sure it is OK to assume those two compilers work the same with regard to your issue.

    Tim Sax said:
    The double data in memory starts at the address immediately after the 13th int member.

    That's not what you show.

    Tim Sax said:
    int vert_blank;
    double tmds_frequency;

    Tim Sax said:
    vert_blank=0xbe9ca7b0
    tmds_frequency=0xbe9ca7b8

    The field vert_blank takes up 4 bytes starting at 0xbe9ca7b0, followed by a 4-byte hole, then tmds_frequency starts at 0xbe9ca7b8, an 8-byte aligned address.  I don't see the problem here.
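For anyone who wants to confirm this layout on their own toolchain, here is a minimal sketch that asks the compiler directly instead of printing member addresses. The helper names (tmds_offset, hole_before_tmds, print_layout) are made up for illustration and are not from the original post; the struct is copied from the question.

```c
#include <stddef.h>
#include <stdio.h>

/* Struct copied from the original question. */
typedef struct input_timing_s
{
    int chan;
    int src_pwr_det;
    int clk_det;
    int de_det;
    int sig_stable;
    unsigned int time_locked;
    int mode;
    int hdcp_authenticated;
    int hdcp_src_dev_auth_start;
    int horiz_active;
    int horiz_blank;
    int vert_active;
    int vert_blank;
    double tmds_frequency;
    int horiz_sync_polarity;
    int vert_sync_polarity;
    int interlace;
} input_timing_t;

/* Hypothetical helper: byte offset of the double inside the struct. */
size_t tmds_offset(void)
{
    return offsetof(input_timing_t, tmds_frequency);
}

/* Hypothetical helper: size of the padding hole between vert_blank
   and tmds_frequency (0 if the compiler inserted none). */
size_t hole_before_tmds(void)
{
    return offsetof(input_timing_t, tmds_frequency)
         - (offsetof(input_timing_t, vert_blank) + sizeof(int));
}

/* Hypothetical helper: print what the compiler decided. */
void print_layout(void)
{
    printf("alignof(double)        = %zu\n", (size_t)_Alignof(double));
    printf("offset of vert_blank   = %zu\n", offsetof(input_timing_t, vert_blank));
    printf("offset of tmds_freq    = %zu\n", tmds_offset());
    printf("hole before tmds_freq  = %zu\n", hole_before_tmds());
    printf("sizeof(input_timing_t) = %zu\n", sizeof(input_timing_t));
}
```

Calling print_layout() from main should reproduce the addresses posted above: tmds_frequency at 0xbe9ca7b8 is 0x38 (56) bytes past chan at 0xbe9ca780, which is 4 bytes more than the 52 bytes the 13 ints occupy.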

    Thanks and regards,

    -George

  • Thanks George for your response.  I may have interpreted why this is happening wrongly, but I can see that something is mismatched.  I'll try to clarify:

    George Mock said:

    The field vert_blank takes up 4 bytes starting at 0xbe9ca7b0, followed by a 4-byte hole, then tmds_frequency starts at 0xbe9ca7b8, an 8-byte aligned address.  I don't see the problem here.

    Yes, I agree about the 4-byte hole after vert_blank and before tmds_frequency, as shown by the printfs.  The problem is that whatever determines the location of members in the structure doesn't seem to know about the 4-byte hole.  When I access tmds_frequency by

    input_timing_t intm;

    printf( "tmds_freq=%f\n", intm.tmds_frequency );

    I get 0 instead of the expected 135198032.000000.

    When I make a char * p point to tmds_freq - 4 bytes, I get the correct 135198032.000000:

    char *p;
    double *p_dbl;

    p = (char *)&intm.tmds_frequency;
    p -= 4;
    p_dbl = (double *)p;
    printf( "tmds_freq in rec - 4 = %f\n", *p_dbl );

    tmds_freq in rec - 4 = 135198032.000000

    Data for the whole structure looks like:

    (hex)
    1 0 0 0 1 0 0 0 1
    0 0 0 1 0 0 0 1 0 0
    0 0 0 0 0 1 0 0 0 0
    0 0 0 0 0 0 0 0 5 0
    0 98 1 0 0 0 4 0 0 2a
    0 0 0 0 0 0 a0 ea 1d a0
    41 1 0 0 0 1 0 0 0 0
    0 0 0 0 0 0 0 0 0 0
    0

    Then printing out each byte in hex starting at the location of tmds_frequency,

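    p = (char *)&intm.tmds_frequency;
    printf( "\ntmds_freq bytes in rec: " );
    /* print exactly sizeof( double ) bytes, one per iteration */
    for( j = 0; j < sizeof( double ); j++ )
    {
        /* cast so the byte prints as two hex digits at most, even where char is signed */
        printf( "%x ", (unsigned char)*p++ );
    }
    printf( "\n" );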

    I get:

    tmds_freq bytes in rec: ea 1d a0 41 1 0 0 0

    instead of the expected  0 0 0 a0 ea 1d a0 41

    So my question is why is this failure to determine the location of tmds_frequency happening and how do I fix it?

    I hope that clarifies what is going on.

    Regards,

    Tim

  • Hi,

    How do you populate the struct contents?

    The compiler seems to access the field at the correct offset, so maybe the problem is related to the way you fill the structure (note also that the presence of the double forces the alignment of the whole structure to 8 bytes, so if you cast to input_timing_t * somewhere, it can lead to alignment problems).

    Anyway, consider using __attribute__((packed)).

  • Thank you Alberto.  Your post was key to my understanding what was going on.

    Actually someone suggested __attribute__((packed)) yesterday and it fixed the problem (I was planning on updating the post today anyway), but I didn't understand why there was a discrepancy between what was in memory and what the compiler thought was in memory.

    I looked at how the data was getting put into memory, and then the cause was clear.  To put the data in memory, I am reading fields from a database one by one, incrementing a pointer to memory by the size of the data type of the field each time.  So when I put the data in memory there is no 4 byte gap before the double because I didn't create one.
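To make the mismatch concrete, here is a minimal sketch of the two ways of filling a struct from a field-by-field source. It uses a cut-down stand-in struct (sample_t) and hypothetical function names, not my real database code. The first function advances a pointer by the size of each field, which is what I was doing; the second places each field at the offset the compiler actually chose.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Cut-down stand-in for input_timing_t: an int followed by a double,
   so the compiler may insert padding before the double. */
typedef struct sample_s
{
    int vert_blank;
    double tmds_frequency;
    int interlace;
} sample_t;

/* What went wrong: advance by sizeof each field, ignoring padding.
   The double lands right after the int, not where the compiler reads it. */
void fill_sequential(sample_t *dst, int vb, double freq, int il)
{
    unsigned char *p = (unsigned char *)dst;
    assert(dst != NULL);
    memcpy(p, &vb, sizeof vb);     p += sizeof vb;
    memcpy(p, &freq, sizeof freq); p += sizeof freq;
    memcpy(p, &il, sizeof il);
}

/* One possible fix without packing: write each field at its real offset. */
void fill_by_offset(sample_t *dst, int vb, double freq, int il)
{
    unsigned char *p = (unsigned char *)dst;
    assert(dst != NULL);
    memcpy(p + offsetof(sample_t, vert_blank), &vb, sizeof vb);
    memcpy(p + offsetof(sample_t, tmds_frequency), &freq, sizeof freq);
    memcpy(p + offsetof(sample_t, interlace), &il, sizeof il);
}
```

After fill_by_offset, reading dst->tmds_frequency gives back the value that was stored. After fill_sequential it only does so on a target where the compiler happens to insert no padding before the double.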

    Rather than trying to guess where the compiler is going to insert gaps and create them when I put the data into memory, it seems like the better solution is to use __attribute__((packed)).
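For reference, a minimal sketch of the packed version, assuming gcc (or a compatible compiler such as clang) since __attribute__((packed)) is a compiler extension. The type name input_timing_packed_t and the helpers packed_tmds_offset and from_stream are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same members as input_timing_t, but packed: no padding is inserted,
   so the double sits immediately after the 13th int. */
typedef struct __attribute__((packed)) input_timing_packed_s
{
    int chan;
    int src_pwr_det;
    int clk_det;
    int de_det;
    int sig_stable;
    unsigned int time_locked;
    int mode;
    int hdcp_authenticated;
    int hdcp_src_dev_auth_start;
    int horiz_active;
    int horiz_blank;
    int vert_active;
    int vert_blank;
    double tmds_frequency;
    int horiz_sync_polarity;
    int vert_sync_polarity;
    int interlace;
} input_timing_packed_t;

/* Hypothetical helper: offset of the double in the packed layout. */
size_t packed_tmds_offset(void)
{
    return offsetof(input_timing_packed_t, tmds_frequency);
}

/* Hypothetical helper: build the struct from a byte stream in which the
   fields were written back to back, with no gaps. */
input_timing_packed_t from_stream(const unsigned char *raw)
{
    input_timing_packed_t t;
    assert(raw != NULL);
    memcpy(&t, raw, sizeof t);
    return t;
}
```

One caveat with this approach: read the member by value (t.tmds_frequency) rather than taking its address into a plain double *, because the member is no longer guaranteed to be 8-byte aligned and the compiler only generates alignment-safe accesses when it knows it is dealing with the packed type.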