TMS320C6678: Why do the matrix functions in DSPLIB require the row and column counts to be multiples of 2?

Part Number: TMS320C6678

For example, the documentation for the DSPF_sp_mat_mul function states under Assumptions: "All r1, c1, c2 are assumed to be multiple of 2 and >=2."

void  DSPF_sp_mat_mul (float *x1, const int r1, const int c1, float *x2, const int c2, float *restrict y)

This function computes the expression “y = x1*x2” for the matrices x1 and x2. The column dimension of x1 must match the row dimension of x2. The resulting matrix has the same number of rows as x1 and the same number of columns as x2. The values stored in the matrices are assumed to be single-precision floating-point values. This code is suitable for dense matrices. No optimizations are made for sparse matrices.

Parameters:
x1  = Pointer to r1 by c1 input matrix.
r1  = Number of rows in x1.
c1  = Number of columns in x1. Also number of rows in x2.
x2  = Pointer to c1 by c2 input matrix.
c2  = Number of columns in x2.
y  = Pointer to r1 by c2 output matrix.
Algorithm:
DSPF_sp_mat_mul.c is the natural C equivalent of the optimized intrinsic C code without restrictions. Note that the intrinsic C code is optimized and restrictions may apply.
Assumptions:
The arrays ‘x1’, ‘x2’, and ‘y’ are stored in distinct arrays. That is, in-place processing is not allowed.
All r1, c1, c2 are assumed to be multiple of 2 and >=2.
Implementation Notes:
Interruptibility : The code is interruptible.
Endian support : supports both Little and Big endian modes.
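
For comparison, a minimal sketch of what a "natural C" version without the size restrictions could look like. The name `mat_mul_ref` is hypothetical (it is not a DSPLIB symbol), and row-major storage is assumed, which matches the indexing in the DSPLIB source.

```c
#include <assert.h>

/* Plain triple-loop product y = x1 * x2 for row-major float matrices.
 * x1 is r1 x c1, x2 is c1 x c2, y is r1 x c2.
 * Hypothetical reference routine: works for any positive sizes,
 * odd or even, but is not optimized for the C66x. */
void mat_mul_ref(const float *x1, int r1, int c1,
                 const float *x2, int c2, float *y)
{
    assert(r1 > 0 && c1 > 0 && c2 > 0);
    for (int i = 0; i < r1; i++) {
        for (int j = 0; j < c2; j++) {
            float sum = 0.0f;
            /* Dot product of row i of x1 with column j of x2. */
            for (int k = 0; k < c1; k++)
                sum += x1[i * c1 + k] * x2[k * c2 + j];
            y[i * c2 + j] = sum;
        }
    }
}
```

A routine like this can serve as a check on the optimized function's output for even sizes, and as a fallback when a dimension is odd.
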
  • Nacy jae,

    It is just an assumption for ease of calculation.

    Please try dimensions that are not multiples of two and observe the results.

    Please get back with the results.

    Regards

    Shankari G

  • Thanks for your reply!

    I tried the DSPF_sp_mat_mul function with a 5*5 matrix and a 6*6 matrix. The 6*6 multiplication result is correct, but the 5*5 result is wrong.


    Perhaps these assumptions are intended to improve memory access efficiency?

  • I read the source code and found that the _nassert calls seem to specify the memory alignment of the arrays. I am a beginner, so I am not clear about what these calls actually do. I think these functions assume the matrix dimensions are multiples of 2, and a dimension that is not a multiple of 2 results in calculation errors.


    void DSPF_sp_mat_mul(float *x1, const int r1, const int c1,
        float *x2, const int c2, float *restrict y)
    {
        int i, j, k;
        double sum0, sum1, sum2, sum3;
        float *ptr_x, *ptr_y;
        double x_10, x_32, y_20, y_31, y_10, y_32;
        unsigned int xoff;
    
        _nassert(r1 > 0);
        _nassert(c1 > 0);
        _nassert(c2 > 0);
        _nassert( (int) x1 % 8 == 0 );
        _nassert( (int) x2 % 8 == 0 );
        _nassert( (int) y  % 8 == 0 );
        _nassert( r1 % 2 == 0 );
        _nassert( c2 % 2 == 0 );
        _nassert( c1 % 2 == 0 );
    
        /* ---------------------------------------------------- */
        /*  Multiply each row in x1 by each column in x2.  The  */
        /*  product of row m in x1 and column n in x2 is placed */
        /*  in position (m,n) in the result.                    */
        /* ---------------------------------------------------- */
    
        #pragma MUST_ITERATE(1,,)
        for (i = 0; i < r1; i+=2) {
            xoff = i*c1;
    
            #pragma MUST_ITERATE(1,,)
            for (j = 0; j < c2; j+=2) {
                sum0 = 0;
                sum1 = 0;
                sum2 = 0;
                sum3 = 0;
                
                ptr_x = &x1[xoff];
                ptr_y = &x2[j];
                
                #pragma MUST_ITERATE(1,,)
                for (k = 0; k < c1; k+=2,ptr_x+=2,ptr_y+=2*c2) {
                    _amemd8(&x_10) = _amemd8_const(&ptr_x[0]);
                    _amemd8(&x_32) = _amemd8_const(&ptr_x[c1]);
                    _amemd8(&y_20) = _amemd8_const(&ptr_y[0]);
                    _amemd8(&y_31) = _amemd8_const(&ptr_y[c2]);
                  
                    y_10 = _ftod(_lof(y_31), _lof(y_20));
                    y_32 = _ftod(_hif(y_31), _hif(y_20));
                  
                    sum0 = _daddsp(sum0, _dmpysp(x_10, y_10));
                    sum1 = _daddsp(sum1, _dmpysp(x_10, y_32));
                    sum2 = _daddsp(sum2, _dmpysp(x_32, y_10));
                    sum3 = _daddsp(sum3, _dmpysp(x_32, y_32));
                }
               
                _amemd8(&y[(i + 0)*c2 + j]) = _ftod(_hif(sum1) + _lof(sum1),
                                                    _hif(sum0) + _lof(sum0));
                _amemd8(&y[(i + 1)*c2 + j]) = _ftod(_hif(sum3) + _lof(sum3),
                                                    _hif(sum2) + _lof(sum2));
            }
        }
    }
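
The loop structure above can be emulated in portable scalar C to show where the even-size assumption comes from. This is only a sketch: `mat_mul_blocked2` is a hypothetical name, and it uses plain float arithmetic in place of the intrinsics. Each (i, j) step writes a 2x2 tile of y, and each k step reads two columns of x1 and two rows of x2, so an odd r1, c1, or c2 would make the original code touch row i+1, column j+1, or index k+1 outside the matrix. In addition, `_amemd8` is an aligned 8-byte access, so `&ptr_x[c1]`, `&ptr_y[c2]`, and `&y[(i + 1)*c2 + j]` are 8-byte aligned only when c1 and c2 are even. As far as the TI compiler documentation describes it, `_nassert` does not check anything at run time; it only tells the optimizer that the condition holds.

```c
#include <assert.h>

/* Scalar emulation of the 2x2-blocked loop structure (hypothetical helper).
 * Row-major storage: x1 is r1 x c1, x2 is c1 x c2, y is r1 x c2. */
void mat_mul_blocked2(const float *x1, int r1, int c1,
                      const float *x2, int c2, float *y)
{
    /* Unlike _nassert, these are real run-time checks. */
    assert(r1 >= 2 && c1 >= 2 && c2 >= 2);
    assert(r1 % 2 == 0 && c1 % 2 == 0 && c2 % 2 == 0);

    for (int i = 0; i < r1; i += 2) {
        for (int j = 0; j < c2; j += 2) {
            float s00 = 0.0f, s01 = 0.0f, s10 = 0.0f, s11 = 0.0f;
            for (int k = 0; k < c1; k += 2) {
                /* 2x2 tile of x1: rows i, i+1 and columns k, k+1. */
                float a00 = x1[i * c1 + k];
                float a01 = x1[i * c1 + k + 1];
                float a10 = x1[(i + 1) * c1 + k];
                float a11 = x1[(i + 1) * c1 + k + 1];
                /* 2x2 tile of x2: rows k, k+1 and columns j, j+1. */
                float b00 = x2[k * c2 + j];
                float b01 = x2[k * c2 + j + 1];
                float b10 = x2[(k + 1) * c2 + j];
                float b11 = x2[(k + 1) * c2 + j + 1];

                s00 += a00 * b00 + a01 * b10;   /* y[i][j]     */
                s01 += a00 * b01 + a01 * b11;   /* y[i][j+1]   */
                s10 += a10 * b00 + a11 * b10;   /* y[i+1][j]   */
                s11 += a10 * b01 + a11 * b11;   /* y[i+1][j+1] */
            }
            y[i * c2 + j]           = s00;
            y[i * c2 + j + 1]       = s01;
            y[(i + 1) * c2 + j]     = s10;
            y[(i + 1) * c2 + j + 1] = s11;
        }
    }
}
```

The intrinsic version keeps two partial sums per output element and adds them at the end, so its rounding may differ slightly from this emulation.
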

  • Nacy,

    Let me experiment and get back in a day or two.

    Regards

    Shankari G

  • Nacy,

    I have been on leave for the past few days.

    I have not yet worked on your query.

    Thanks for your patience.

    Regards

    Shankari G

  • Nacy,

    Yes, you are right.

    The function is implemented that way and is expected to behave that way.

    --

    I can see a few other matrix multiplication functions, in both the dp (double-precision) and sp (single-precision) variants, such as "DSPF_sp_mat_mul_cplx" and "DSPF_sp_mat_mul_gemm_cplx".

    They all restrict the dimensions to multiples of either 2 or 4.

    --

    These functions can be used only when the assumptions are satisfied.

    Regards

    Shankari G
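
One possible way to satisfy the assumptions with odd-sized data is to zero-pad each dimension up to the next even number, run the even-only kernel, and copy back the valid part of the result. The sketch below is an illustration under assumptions, not a DSPLIB facility: `mat_mul_padded`, `pad_copy`, `round_up_even`, and `standin_even_kernel` are hypothetical names, and the stand-in kernel exists only so the example runs on a host. On the DSP, `DSPF_sp_mat_mul` has the same parameter list and could be passed instead, provided the allocator returns 8-byte-aligned buffers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Same parameter list as DSPF_sp_mat_mul. */
typedef void (*even_matmul_fn)(float *x1, int r1, int c1,
                               float *x2, int c2, float *y);

/* Hypothetical host-side stand-in for an even-dimensions-only kernel. */
void standin_even_kernel(float *x1, int r1, int c1,
                         float *x2, int c2, float *y)
{
    assert(r1 % 2 == 0 && c1 % 2 == 0 && c2 % 2 == 0);
    for (int i = 0; i < r1; i++)
        for (int j = 0; j < c2; j++) {
            float sum = 0.0f;
            for (int k = 0; k < c1; k++)
                sum += x1[i * c1 + k] * x2[k * c2 + j];
            y[i * c2 + j] = sum;
        }
}

static int round_up_even(int n) { return (n + 1) & ~1; }

/* Copy a rows x cols matrix into a zero-filled prows x pcols buffer. */
static float *pad_copy(const float *src, int rows, int cols,
                       int prows, int pcols)
{
    float *dst = calloc((size_t)prows * pcols, sizeof *dst);
    if (!dst) return NULL;
    for (int i = 0; i < rows; i++)
        memcpy(dst + (size_t)i * pcols, src + (size_t)i * cols,
               (size_t)cols * sizeof *dst);
    return dst;
}

/* y = x1 * x2 for any positive sizes; returns 0 on success, -1 on
 * allocation failure. Zero padding adds only zero terms to each sum. */
int mat_mul_padded(even_matmul_fn kernel,
                   const float *x1, int r1, int c1,
                   const float *x2, int c2, float *y)
{
    int pr1 = round_up_even(r1);
    int pc1 = round_up_even(c1);
    int pc2 = round_up_even(c2);
    float *px1 = pad_copy(x1, r1, c1, pr1, pc1);
    float *px2 = pad_copy(x2, c1, c2, pc1, pc2);
    float *py  = calloc((size_t)pr1 * pc2, sizeof *py);
    int ok = (px1 && px2 && py);

    if (ok) {
        kernel(px1, pr1, pc1, px2, pc2, py);
        /* Keep only the top-left r1 x c2 part of the padded result. */
        for (int i = 0; i < r1; i++)
            memcpy(y + (size_t)i * c2, py + (size_t)i * pc2,
                   (size_t)c2 * sizeof *y);
    }
    free(px1);
    free(px2);
    free(py);
    return ok ? 0 : -1;
}
```

For a 5*5 product this means multiplying 6*6 padded copies, which costs extra memory and copies; whether that is acceptable depends on the application.
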

  • Thanks! I think I understand now.