Is the restrict qualifier supported in structures?



Hi All.

The restrict qualifier on structure members does not seem to work. Is it supposed to be supported?

Here is a simple example:

typedef struct
{
float *restrict pipo1;
float *restrict pipo2;
float *restrict pipo3;
}MyStruct;


void main()
{
int i;
MyStruct* st = malloc(sizeof(MyStruct));
st->pipo1 = malloc(256*4);
st->pipo2 = malloc(256*4);
st->pipo3 = malloc(256*4);

_nassert ( (int) st->pipo1 %8 == 0 );
_nassert ( (int) st->pipo2 %8 == 0 );
_nassert ( (int) st->pipo3 %8 == 0 );
#pragma MUST_ITERATE(256,,8)
#pragma UNROLL(2)
 for (i=0; i<256; i++) // First Loop
st->pipo3[i]=st->pipo1[i]+st->pipo2[i];

float *restrict cpipo1 = st->pipo1;
float *restrict cpipo2 = st->pipo2;
float *restrict cpipo3 = st->pipo3;
_nassert ( (int) cpipo1 %8 == 0 );
_nassert ( (int) cpipo2 %8 == 0 );
_nassert ( (int) cpipo3 %8 == 0 );
#pragma MUST_ITERATE(256,,8)
#pragma UNROLL(2)
 for (i=0; i<256; i++) //Second Loop
cpipo3[i]=cpipo1[i]+cpipo2[i];
 }

 

First Loop: ii = 3 Schedule found with 4 iterations in parallel, Unroll (4x)

Second Loop: ii = 2 Schedule found with 6 iterations in parallel, Unroll(4x)

To optimize my code, I have to work with copies of the pointers declared with the restrict qualifier.

I am in the Release configuration: Symbolic debug for program analysis, Optimization level 3, Optimize fully in the presence of debug directives.

 

Regards ,

Pierre

  • The problem is not restrict.  It is that the compiler does not understand that the pointers from within the structure (the first loop) are aligned on 8-byte boundaries.  The second loop uses LDDW and STDW instructions because it knows the pointers are aligned.  And that is how it gets the lower II.  You correctly use _nassert on both loops.  So, I don't know why you are not getting the same performance.  Thus, I filed SDSCM00040838 in the SDOWP system.  Feel free to track it with the SDOWP link in my sig.

    Thanks and regards,

    -George

  • Hello George

     

    _nassert has been inserted before both loops. The problem really does involve the restrict qualifier. (I am also the author of the thread about repeating _nassert before each loop.)

    I am at home now and cannot paste the assembly listing, but I guarantee that both loops use long-word instructions.

    I will post the generated assembly listing on Monday.

    Thanks for your answer

     

  • Though you didn't say, I deduce that you're using C674 or C66, because there are floating-point values.  I also needed to include <stdlib.h> before your example would compile.

    That said, I do see ii=3 and ii=2 for your two loops.  (But nothing about unrolling 4X, which would be weird because you specified UNROLL(2).)

    The loop-carried dependence bound noted in the .asm file is 0 for both loops, indicating that restrict is "working."  However, the resource bound is 2 for one and 3 for the other;  that's what is determining the ii.

    Indeed, the ii=3 loop is using LDW and STW and the ii=2 loop is using LDDW and STDW;  the extra instructions account for the resource difference and the ii.

    The _nasserts on st->pipo1 et al are not of a form that the optimiser can exploit.  It can understand that a scalar variable has certain properties, but not that an indirect memory access does.  (In a sense, it contains multiple assertions:  that st is non-NULL as well as that st->pipo1 is aligned a certain way.)  Use the pointer temps.
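The "use the pointer temps" advice can be sketched as follows. This is a minimal illustration, not code from the thread: the function name add_buffers and the constant N are hypothetical, and the _nassert stub and the __TI_COMPILER_VERSION__ guards are only there so that the sketch also builds with a non-TI compiler, where the alignment hints have no effect.

```c
#define N 256   /* hypothetical buffer length, matching the 256-element example */

/* _nassert is a TI compiler intrinsic. On other compilers, stub it out so
   the sketch still builds (assumption made for portability only). */
#ifndef __TI_COMPILER_VERSION__
#define _nassert(cond) ((void)0)
#endif

typedef struct
{
    float *restrict pipo1;
    float *restrict pipo2;
    float *restrict pipo3;
} MyStruct;

/* Hypothetical helper: computes pipo3[i] = pipo1[i] + pipo2[i] for N elements. */
void add_buffers(const MyStruct *st)
{
    /* Copy the structure members into scalar pointer temporaries first.
       The alignment assertions below then refer to plain local variables,
       which is the form the optimiser can exploit. */
    float *restrict in1 = st->pipo1;
    float *restrict in2 = st->pipo2;
    float *restrict out = st->pipo3;
    int i;

    /* Promise the compiler that each pointer is 8-byte aligned. */
    _nassert((int) in1 % 8 == 0);
    _nassert((int) in2 % 8 == 0);
    _nassert((int) out % 8 == 0);

#ifdef __TI_COMPILER_VERSION__
#pragma MUST_ITERATE(256, , 8)
#pragma UNROLL(2)
#endif
    for (i = 0; i < N; i++)
        out[i] = in1[i] + in2[i];
}
```

The caller remains responsible for making the assertions true: every buffer has to be 8-byte aligned and must not overlap the others, otherwise the restrict and _nassert promises are violated.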

     

  • Thank you for your answer.

    I use C674 (the physical target is an OMAP-L137 EVM board).

    pf said:
    That said, I do see ii=3 and ii=2 for your two loops.  (But nothing about unrolling 4X, which would be weird because you specified UNROLL(2).)

    Yes, it is a typo. The loop is unrolled twice, as expected.

     

  • Using compiler optimization is really painful!

    In Release mode:

     

    _nassert ( (int) (st->frame) %8 ==0);
    _nassert ( (int) (st->inbuf) %8 ==0);
    #pragma MUST_ITERATE(256,,8)
    #pragma UNROLL(8)
    for (i=0;i<N3;i++)
    st->frame[i]=st->inbuf[i];

    ;* ii = 4 Schedule found with 2 iterations in parallel

    ;* Loop will be splooped

     

    But

     {
    _nassert ( (int) (st->frame) %8 ==0);
    _nassert ( (int) (st->inbuf) %8 ==0);
    #pragma MUST_ITERATE(256,,8)
    #pragma UNROLL(8)
    for (i=0;i<N3;i++)
    st->frame[i]=st->inbuf[i];
    }

    (the same code, wrapped in curly brackets)

    ;* ii = 24 Schedule found with 1 iterations in parallel

    and the loop is not splooped

     

     

    (It seems that loop optimization strongly depends on the other loops in the function?)

     

    By moving temporary pointers cframe and cinbuf