Use MUST_ITERATE in loop nesting condition

Peter010

Hello everyone

I plan to compute dot products for multiple vectors, so I modified the example function DSPF_sp_dotp_cplx. The modified function is below. I want to know whether I am using MUST_ITERATE(48,,48) correctly. Will the outer for loop be unrolled 48n times? Will the inner for loop be unrolled at the same time? If I want the inner for loop to be unrolled, how can I do that? I am using compiler version 8.0.4, with the optimization level set to 3.

Thanks

Xining Yu

void DSPF_sp_dotp_cplx_new(const float * x, const float * y, unsigned int nx, unsigned int ny,
                       float * restrict re, float * restrict im)
{
    unsigned int i, j;
    __float2_t x0_im_re, y0_im_re, result0 = 0;
    __float2_t x1_im_re, y1_im_re, result1 = 0;
    __float2_t x2_im_re, y2_im_re, result2 = 0;
    __float2_t x3_im_re, y3_im_re, result3 = 0;
    __float2_t result;

    _nassert(nx % 4 == 0);
    _nassert(nx > 0);
    _nassert((int)x % 8 == 0);
    _nassert((int)y % 8 == 0);
#pragma MUST_ITERATE(48,,48);

    for (j = 0; j < nx; j += 48)
    {
    	for(i = 0; i < 2 * ny; i += 8)
    	{
    		/* load 4 sets of input data */
    		x0_im_re = _amem8_f2((void*)&x[i+j]);
    		y0_im_re = _amem8_f2((void*)&y[i]);

    		x1_im_re = _amem8_f2((void*)&x[i+2+j]);
    		y1_im_re = _amem8_f2((void*)&y[i+2]);

    		x2_im_re = _amem8_f2((void*)&x[i+4+j]);
    		y2_im_re = _amem8_f2((void*)&y[i+4]);

    		x3_im_re = _amem8_f2((void*)&x[i+6+j]);
    		y3_im_re = _amem8_f2((void*)&y[i+6]);

    		/* calculate 4 running sums */
    		result0 = _daddsp(_complex_mpysp(x0_im_re, y0_im_re), result0);
    		result1 = _daddsp(_complex_mpysp(x1_im_re, y1_im_re), result1);
    		result2 = _daddsp(_complex_mpysp(x2_im_re, y2_im_re), result2);
    		result3 = _daddsp(_complex_mpysp(x3_im_re, y3_im_re), result3);
    	}

    	result = _daddsp(_daddsp(result0,result1),_daddsp(result2,result3));
    	result0 = 0; result1 = 0; result2 = 0; result3 = 0;
    	*re = -_hif2(result);
    	*im =  _lof2(result);
    	re += 2;
    	im += 2;

    }

}

over 9 years ago

George Mock, over 9 years ago

TI Guru, 244440 points

Xining Yu said:
will the Outside FOR loop be unrolled for 48n times

No. That's a bad idea. If the loop is unrolled too many times, then too many values are being computed at once. The compilation would take a long time, then it would finally give up and emit a schedule for the loop that is not software pipelined. That means it will perform very poorly.

Xining Yu said:
Will the inside FOR loop be unrolled at the same time?

No. A MUST_ITERATE pragma applies only to the next loop, and not any subsequent loops.
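To illustrate, here is a minimal sketch of a loop nest in which each loop gets its own pragma. It is not the DSPLIB code: `dot_rows` and its parameters are hypothetical names, it uses plain floats instead of the `__float2_t` intrinsics, and the trip-count promises (at least one row, inner length a multiple of 4) are assumptions made for the example. As I read the compiler manual, the MUST_ITERATE arguments describe the loop's trip count (minimum, maximum, multiple), not an unroll factor. The pragmas are guarded by `__TI_COMPILER_VERSION__` so the same file also builds with a host compiler.

```c
/* Dot product of each of `rows` vectors (each `len` floats, stored back to
 * back in x) with the vector y. Hypothetical example, not DSPLIB code. */
void dot_rows(const float *x, const float *y, float *out, int rows, int len)
{
    int r, i;

#ifdef __TI_COMPILER_VERSION__
#pragma MUST_ITERATE(1, , 1)   /* applies to the outer loop only: >= 1 row */
#endif
    for (r = 0; r < rows; r++) {
        float acc = 0.0f;

#ifdef __TI_COMPILER_VERSION__
#pragma MUST_ITERATE(4, , 4)   /* applies to the inner loop only:
                                  assumed trip count >= 4, multiple of 4 */
#endif
        for (i = 0; i < len; i++) {
            acc += x[r * len + i] * y[i];
        }
        out[r] = acc;
    }
}
```

If the inner pragma were left out, the compiler would know nothing extra about the inner trip count, whatever was promised for the outer loop.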

Note the inner loop is already manually unrolled 4 times. In my experiments, I did not find any way to improve on that.

You can force the compiler to unroll a loop with the UNROLL pragma. Read about it in the C6000 compiler manual. Generally speaking, forcing unrolling is not a good idea, but it is a useful way to experiment. You can use it in this case to see that unrolling the outer or inner loop does not improve performance. Start by using #pragma UNROLL(1) on the inner loop, then increase it by multiples of 2. Use the compiler build switch --debug_software_pipeline. After each build, inspect the resulting .asm file. There is a large block comment before the inner loop. Focus on two numbers: the ii and the Loop Unroll Multiple. ii stands for initiation interval. If the Loop Unroll Multiple is not present, presume it is one. You want the ratio ii / Loop Unroll Multiple to be as small as possible. In the experiments I tried, that number never improved as I increased the unrolling of the inner loop.
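A rough sketch of that experiment on a simple loop follows. The function `sum_f32`, the helper `cycles_per_iteration`, and the trip-count promise are all hypothetical and only serve the illustration. The pragmas are again guarded so the file also compiles on a host. Change the UNROLL argument between builds and compare the numbers reported in the .asm comment block.

```c
/* Sum of n floats. Assumption for this sketch: n is a positive multiple of 4. */
float sum_f32(const float *x, int n)
{
    float acc = 0.0f;
    int i;

#ifdef __TI_COMPILER_VERSION__
#pragma MUST_ITERATE(4, , 4)   /* promise: trip count >= 4, multiple of 4 */
#pragma UNROLL(2)              /* experiment: try 1 first, then 2, 4, ... */
#endif
    for (i = 0; i < n; i++) {
        acc += x[i];
    }
    return acc;
}

/* Hypothetical helper: cycles per original loop iteration, computed from the
 * ii and Loop Unroll Multiple values read out of the .asm comment block.
 * A missing Loop Unroll Multiple (passed here as 0) is treated as 1. */
double cycles_per_iteration(int ii, int unroll_multiple)
{
    if (unroll_multiple < 1) {
        unroll_multiple = 1;
    }
    return (double)ii / (double)unroll_multiple;
}
```

With made-up numbers, an ii of 4 and a Loop Unroll Multiple of 2 gives 2 cycles per original iteration. If a larger UNROLL value does not lower that figure, the extra unrolling is not helping.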

Thanks and regards,

-George

Peter010, over 9 years ago, in reply to George Mock

Genius, 4165 points

Thanks for your reply. I appreciate it.

Regards
Xining
