TMS320C6678: why the function for matrix operations requires rows and columns to be multiples of 2 in dsplib

nacy jae

Part Number: TMS320C6678

For example, for the DSPF_sp_mat_mul function, the Assumptions section states: "All r1, c1, c2 are assumed to be multiple of 2 and >=2."


void	DSPF_sp_mat_mul (float *x1, const int r1, const int c1, float *x2, const int c2, float *restrict y)

This function computes the expression “y = x1*x2” for the matrices x1 and x2. The column dimension of x1 must match the row dimension of x2. The resulting matrix has the same number of rows as x1 and the same number of columns as x2. The values stored in the matrices are assumed to be single-precision floating-point values. This code is suitable for dense matrices. No optimizations are made for sparse matrices.

Parameters:

	x1	= Pointer to r1 by c1 input matrix.
	r1	= Number of rows in x1.
	c1	= Number of columns in x1. Also number of rows in x2.
	x2	= Pointer to c1 by c2 input matrix.
	c2	= Number of columns in x2.
	y	= Pointer to r1 by c2 output matrix.

Algorithm: DSPF_sp_mat_mul.c is the natural C equivalent of the optimized intrinsic C code without restrictions. Note that the intrinsic C code is optimized and restrictions may apply.

Assumptions: The arrays ‘x1’, ‘x2’, and ‘y’ are stored in distinct arrays. That is, in-place processing is not allowed.
All r1, c1, c2 are assumed to be multiple of 2 and >=2.

Implementation Notes: Interruptibility: The code is interruptible.
Endian support: supports both Little and Big endian modes.

11 months ago

0 Shankari G 11 months ago

TI__Mastermind 25535 points

Nacy jae,

Just an assumption for ease of calculation.

Please try out the non-multiples of two and observe the results too.

Please get back with the results.

Regards

Shankari G

0 nacy jae 11 months ago in reply to Shankari G

Prodigy 10 points

Thanks for your reply!

I tried the DSPF_sp_mat_mul function with a 5*5 matrix and a 6*6 matrix. The 6*6 multiplication result is correct, but the 5*5 result is wrong.

Perhaps these assumptions are intended to improve memory access efficiency?

0 nacy jae 11 months ago in reply to nacy jae

Prodigy 10 points

I read the source code and found that the _nassert statements seem to specify the memory alignment of the arrays. I am a beginner, so I am not very clear about the actual effect of these statements. I think the reason may be that these functions limit the matrix dimensions to multiples of 2, so dimensions that are not multiples of 2 result in calculation errors.

void DSPF_sp_mat_mul(float *x1, const int r1, const int c1,
    float *x2, const int c2, float *restrict y)
{
    int i, j, k;
    double sum0, sum1, sum2, sum3;
    float *ptr_x, *ptr_y;
    double x_10, x_32, y_20, y_31, y_10, y_32;
    unsigned int xoff;

    _nassert(r1 > 0);
    _nassert(c1 > 0);
    _nassert(c2 > 0);
    _nassert( (int) x1 % 8 == 0 );
    _nassert( (int) x2 % 8 == 0 );
    _nassert( (int) y  % 8 == 0 );
    _nassert( r1 % 2 == 0 );
    _nassert( c2 % 2 == 0 );
    _nassert( c1 % 2 == 0 );

    /* ---------------------------------------------------- */
    /*  Multiply each row in x1 by each column in x2.  The  */
    /*  product of row m in x1 and column n in x2 is placed */
    /*  in position (m,n) in the result.                    */
    /* ---------------------------------------------------- */

    #pragma MUST_ITERATE(1,,)
    for (i = 0; i < r1; i+=2) {
        xoff = i*c1;

        #pragma MUST_ITERATE(1,,)
        for (j = 0; j < c2; j+=2) {                                                     
            sum0 = 0;
            sum1 = 0;
            sum2 = 0;
            sum3 = 0;
            
            ptr_x = &x1[xoff];
            ptr_y = &x2[j];
            
            #pragma MUST_ITERATE(1,,)
            for (k = 0; k < c1; k+=2,ptr_x+=2,ptr_y+=2*c2) {
                _amemd8(&x_10) = _amemd8_const(&ptr_x[0]);
                _amemd8(&x_32) = _amemd8_const(&ptr_x[c1]);
                _amemd8(&y_20) = _amemd8_const(&ptr_y[0]);
                _amemd8(&y_31) = _amemd8_const(&ptr_y[c2]);
              
                y_10 = _ftod(_lof(y_31), _lof(y_20));
                y_32 = _ftod(_hif(y_31), _hif(y_20));
              
                sum0 = _daddsp(sum0, _dmpysp(x_10, y_10));
                sum1 = _daddsp(sum1, _dmpysp(x_10, y_32));
                sum2 = _daddsp(sum2, _dmpysp(x_32, y_10));
                sum3 = _daddsp(sum3, _dmpysp(x_32, y_32));
            }
           
            _amemd8(&y[(i + 0)*c2 + j]) = _ftod(_hif(sum1) + _lof(sum1),
                                                _hif(sum0) + _lof(sum0));
            _amemd8(&y[(i + 1)*c2 + j]) = _ftod(_hif(sum3) + _lof(sum3),
                                                _hif(sum2) + _lof(sum2));
        }
    }                                                     
}
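
One way to see why the 5*5 case fails is to look only at the address arithmetic of the loops above. Each `_amemd8` / `_amemd8_const` access is an aligned 8-byte access that moves two floats at once, the loops advance `i`, `j` and `k` in steps of 2, and `_nassert` only passes information to the optimizer without checking anything at run time. The following is a minimal host-side sketch in portable C, not DSPLIB code. The helper names `row_start_is_8byte_aligned` and `highest_x1_index_read` are hypothetical, and the sketch assumes 4-byte floats and a base pointer that is itself 8-byte aligned.

```c
#include <assert.h>

/* Hypothetical helper: returns 1 if row `row` of a row-major float matrix
 * with `cols` columns starts on an 8-byte boundary, assuming 4-byte floats
 * and an 8-byte-aligned base pointer. This models the address used by
 * _amemd8_const(&ptr_x[c1]) at the start of row i + 1. */
static int row_start_is_8byte_aligned(int row, int cols)
{
    assert(row >= 0 && cols > 0);
    return ((row * cols * 4) % 8) == 0;
}

/* Hypothetical helper: highest element index of x1 that the loop structure
 * touches. The loops use i += 2 and k += 2, and each pass reads elements
 * k and k + 1 of rows i and i + 1. */
static int highest_x1_index_read(int r1, int c1)
{
    int last_i, last_k;

    assert(r1 > 0 && c1 > 0);
    last_i = ((r1 - 1) / 2) * 2;  /* last i in: for (i = 0; i < r1; i += 2) */
    last_k = ((c1 - 1) / 2) * 2;  /* last k in: for (k = 0; k < c1; k += 2) */
    return (last_i + 1) * c1 + last_k + 1;
}
```

For a 6*6 matrix, `highest_x1_index_read(6, 6)` is 35, the last valid element, and every row starts on an 8-byte boundary. For a 5*5 matrix, `highest_x1_index_read(5, 5)` is 30 although the last valid index is 24, and `row_start_is_8byte_aligned(1, 5)` is 0: every odd-numbered row starts 4 bytes off an 8-byte boundary, so the aligned two-float accesses no longer line up with the row data.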

0 Shankari G 11 months ago in reply to nacy jae

TI__Mastermind 25535 points

Nacy,

Let me experiment and get back in a day or two.

Regards

Shankari G

0 Shankari G 11 months ago in reply to Shankari G

TI__Mastermind 25535 points

Nacy,

I was on leave for the past few days and have yet to work on your query.

Thanks for your patience.

Regards

Shankari G

0 Shankari G 11 months ago in reply to Shankari G

TI__Mastermind 25535 points

Nacy,

Yes, you are right.

It is implemented that way and is expected to behave that way.

I can see a few other matrix multiplication functions, in both the dp (double precision) and sp (single precision) variants, such as "DSPF_sp_mat_mul_cplx" and "DSPF_sp_mat_mul_gemm_cplx".

They all have limitations to multiples of either 2 or 4.

These functions can only be used with inputs that satisfy the assumptions.
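
If odd dimensions are needed, one possible workaround is to zero-pad each dimension up to the next even value, multiply, and copy back only the valid part of the result. The following is a minimal sketch under stated assumptions, not a DSPLIB routine. The names `round_up_even`, `pad_copy`, `mat_mul_even_only` and `mat_mul_padded` are hypothetical, and `mat_mul_even_only` is a plain C stand-in so the sketch runs on a host; on the target it would be replaced by a call to DSPF_sp_mat_mul, and the padded buffers would additionally have to be 8-byte aligned.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Round n up to the next multiple of 2. */
static int round_up_even(int n) { return (n + 1) & ~1; }

/* Copy a rows x cols matrix into a zero-filled prows x pcols buffer. */
static void pad_copy(const float *src, int rows, int cols,
                     float *dst, int prows, int pcols)
{
    int r;
    memset(dst, 0, sizeof(float) * prows * pcols);
    for (r = 0; r < rows; r++)
        memcpy(&dst[r * pcols], &src[r * cols], sizeof(float) * cols);
}

/* Host stand-in (assumption) for DSPF_sp_mat_mul: same argument order,
 * natural C loops, and the same even-dimension precondition. */
static void mat_mul_even_only(const float *x1, int r1, int c1,
                              const float *x2, int c2, float *y)
{
    int i, j, k;
    assert(r1 % 2 == 0 && c1 % 2 == 0 && c2 % 2 == 0);
    for (i = 0; i < r1; i++)
        for (j = 0; j < c2; j++) {
            float sum = 0.0f;
            for (k = 0; k < c1; k++)
                sum += x1[i * c1 + k] * x2[k * c2 + j];
            y[i * c2 + j] = sum;
        }
}

/* Multiply r1 x c1 by c1 x c2 for any positive dimensions by padding with
 * zeros. Returns 0 on success, -1 if a temporary buffer cannot be allocated. */
static int mat_mul_padded(const float *x1, int r1, int c1,
                          const float *x2, int c2, float *y)
{
    int pr1 = round_up_even(r1), pc1 = round_up_even(c1), pc2 = round_up_even(c2);
    float *px1 = malloc(sizeof(float) * pr1 * pc1);
    float *px2 = malloc(sizeof(float) * pc1 * pc2);
    float *py  = malloc(sizeof(float) * pr1 * pc2);
    int r, status = -1;

    if (px1 && px2 && py) {
        pad_copy(x1, r1, c1, px1, pr1, pc1);
        pad_copy(x2, c1, c2, px2, pc1, pc2);
        mat_mul_even_only(px1, pr1, pc1, px2, pc2, py);
        for (r = 0; r < r1; r++)   /* keep only the valid r1 x c2 part */
            memcpy(&y[r * c2], &py[r * pc2], sizeof(float) * c2);
        status = 0;
    }
    free(px1); free(px2); free(py);
    return status;
}
```

The padding does not change the valid results: the extra column of x1 and the extra row of x2 are zero, so they contribute nothing to any sum, and the extra row and column of the padded product are discarded. The cost is the extra copies and temporary buffers, so for fixed sizes it may be simpler to store the matrices with even dimensions from the start.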

Regards

Shankari G

0 nacy jae 11 months ago in reply to Shankari G

Prodigy 10 points

thanks! I think I understand now.
