Is restrict Qualifier in Structure supported?

Pierre_

Hi All.

The restrict qualifier on structure members does not seem to work. Is this supposed to be supported?

Here is a simple example:

typedef struct
{
    float *restrict pipo1;
    float *restrict pipo2;
    float *restrict pipo3;
} MyStruct;

void main()
{
    int i;
    MyStruct* st = malloc(sizeof(MyStruct));
    st->pipo1 = malloc(256*4);
    st->pipo2 = malloc(256*4);
    st->pipo3 = malloc(256*4);

    _nassert ( (int) st->pipo1 %8 == 0 );
    _nassert ( (int) st->pipo2 %8 == 0 );
    _nassert ( (int) st->pipo3 %8 == 0 );
    #pragma MUST_ITERATE(256,,8)
    #pragma UNROLL(2)
    for (i=0; i<256; i++) // First Loop
        st->pipo3[i]=st->pipo1[i]+st->pipo2[i];

    float *restrict cpipo1 = st->pipo1;
    float *restrict cpipo2 = st->pipo2;
    float *restrict cpipo3 = st->pipo3;
    _nassert ( (int) cpipo1 %8 == 0 );
    _nassert ( (int) cpipo2 %8 == 0 );
    _nassert ( (int) cpipo3 %8 == 0 );
    #pragma MUST_ITERATE(256,,8)
    #pragma UNROLL(2)
    for (i=0; i<256; i++) // Second Loop
        cpipo3[i]=cpipo1[i]+cpipo2[i];
}

First Loop: ii = 3 Schedule found with 4 iterations in parallel, Unroll (4x)

Second Loop: ii = 2 Schedule found with 6 iterations in parallel, Unroll(4x)

To optimize my code, I have to work with restrict-qualified copies of the pointers.

I am in the Release configuration: Symbolic debug for program analysis, Optimization level 3, Optimize fully in the presence of debug directives.

Regards,

Pierre

over 14 years ago

0 George Mock over 14 years ago

TI__Guru**** 251530 points

The problem is not restrict. It is that the compiler does not understand that the pointers from within the structure (the first loop) are aligned on 8-byte boundaries. The second loop uses LDDW and STDW instructions because it knows the pointers are aligned, and that is how it gets the lower ii. You correctly use _nassert on both loops, so I don't know why you are not getting the same performance. I have therefore filed SDSCM00040838 in the SDOWP system. Feel free to track it with the SDOWP link in my sig.

Thanks and regards,

-George

0 Pierre_ over 14 years ago in reply to George Mock

Intellectual 480 points

Hello George

_nassert has been inserted on both loops. The problem really does involve the restrict qualifier. (I am also the author of the thread about copying _nassert before each loop.)

I am at home now and cannot paste the assembly listing, but I guarantee that both loops use long-word instructions.

I will post the produced assembly listing Monday.

Thanks for your answer

0 pf over 14 years ago in reply to Pierre_

TI__Expert 4930 points

Though you didn't say, I deduce that you're using C674 or C66, because there are floating-point values. I also needed to include <stdlib.h> before your example would compile.

That said, I do see ii=3 and ii=2 for your two loops. (But nothing about unrolling 4X, which would be weird because you specified UNROLL(2).)

The loop-carried dependence bound noted in the .asm file is 0 for both loops, indicating that restrict is "working." However, the resource bound is 2 for one and 3 for the other; that's what is determining the ii.

Indeed, the ii=3 loop is using LDW and STW and the ii=2 loop is using LDDW and STDW; the extra instructions account for the resource difference and the ii.
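
The resource argument can be checked with a little arithmetic. This is only a sketch under stated assumptions, not something read from the listing: it assumes the two .D units each issue one memory access per cycle, that memory accesses are the limiting resource, and that the body unrolled by 2 issues 4 LDW + 2 STW in one case and 2 LDDW + 1 STDW in the other. The function name resource_bound is hypothetical.

```c
/* Lower bound on ii imposed by one class of functional unit:
   ceil(ops_per_iteration / units), in integer arithmetic. */
int resource_bound(int ops_per_iteration, int units)
{
    return (ops_per_iteration + units - 1) / units;
}

/* Assumed counts for the loop body unrolled by 2:
   word accesses:        4 LDW  + 2 STW  = 6 ops on 2 .D units
   double-word accesses: 2 LDDW + 1 STDW = 3 ops on 2 .D units */
int bound_word_accesses(void)   { return resource_bound(6, 2); }
int bound_double_accesses(void) { return resource_bound(3, 2); }
```

Under those assumptions the bounds come out as 3 and 2, which is consistent with the ii values reported for the two loops.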

The _nasserts on st->pipo1 et al are not of a form that the optimiser can exploit. It can understand that a scalar variable has certain properties, but not that an indirect memory access does. (In a sense, it contains multiple assertions: that st is non-NULL as well as that st->pipo1 is aligned a certain way.) Use the pointer temps.
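
A minimal sketch of the pointer-temp pattern, for illustration only. The helper name add_buffers, the constant N, and the non-TI stub for _nassert are assumptions added here so the code builds on other compilers; the struct and the pragmas are taken from the example at the top of the thread. Since _nassert is a promise to the optimiser rather than a check, the sketch also verifies the alignment at run time with assert.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* _nassert is a TI compiler intrinsic. On other compilers this stub
   (an assumption of this sketch) discards its argument. */
#ifndef __TI_COMPILER_VERSION__
#define _nassert(cond) ((void)0)
#endif

#define N 256 /* hypothetical buffer length, matching the example */

typedef struct {
    float *restrict pipo1;
    float *restrict pipo2;
    float *restrict pipo3;
} MyStruct;

/* Hypothetical helper: pipo3[i] = pipo1[i] + pipo2[i] via pointer temps. */
void add_buffers(MyStruct *st)
{
    int i;

    /* Copy the members into scalar locals so that the assertions
       describe variables rather than indirect memory accesses. */
    float *restrict a = st->pipo1;
    float *restrict b = st->pipo2;
    float *restrict c = st->pipo3;

    /* Run-time check that the promise made below actually holds. */
    assert(((uintptr_t)a & 7) == 0);
    assert(((uintptr_t)b & 7) == 0);
    assert(((uintptr_t)c & 7) == 0);

    /* Tell the optimiser the temps are 8-byte aligned. */
    _nassert((int)a % 8 == 0);
    _nassert((int)b % 8 == 0);
    _nassert((int)c % 8 == 0);

#ifdef __TI_COMPILER_VERSION__
#pragma MUST_ITERATE(256, , 8)
#pragma UNROLL(2)
#endif
    for (i = 0; i < N; i++)
        c[i] = a[i] + b[i];
}
```

The struct members stay restrict-qualified, but the alignment facts are attached to a, b and c, which is the form described above as one the optimiser can exploit.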

0 Pierre_ over 14 years ago in reply to pf

Intellectual 480 points

Thank you for your answer.

I use C674 (the physical target is an OMAP-L137 EVM board).

pf said:
That said, I do see ii=3 and ii=2 for your two loops. (But nothing about unrolling 4X, which would be weird because you specified UNROLL(2).)

Yes, it is a typo. The loop is unrolled twice, as expected.

0 Pierre_ over 14 years ago in reply to Pierre_

Intellectual 480 points

Using compiler optimization is really painful!

In Release mode:

_nassert ( (int) (st->frame) %8 ==0);
_nassert ( (int) (st->inbuf) %8 ==0);
#pragma MUST_ITERATE(256,,8)
#pragma UNROLL(8)
for (i=0;i<N3;i++)
    st->frame[i]=st->inbuf[i];

;* ii = 4 Schedule found with 2 iterations in parallel

;* Loop will be splooped

But

{
    _nassert ( (int) (st->frame) %8 ==0);
    _nassert ( (int) (st->inbuf) %8 ==0);
    #pragma MUST_ITERATE(256,,8)
    #pragma UNROLL(8)
    for (i=0;i<N3;i++)
        st->frame[i]=st->inbuf[i];
}

(the same code, wrapped in curly brackets)

;* ii = 24 Schedule found with 1 iterations in parallel

not splooped

(It seems that loop optimization depends strongly on the later loops in the function?)

By moving temporary pointers cframe and cinbuf
