How to use CMATMPY intrinsic

Hi,

I'd like to know how to use the cmatmpy intrinsic.

I read in "TMS320C6000 Optimizing Compiler v7.4" that the intrinsic is __x128_t _cmatmpy (long long src1, __x128_t src2);

I'm really new at this and I don't understand how it works... I tried _cmatmpy(array,matrix); but the argument types don't match.

Can you clear that up for me please ?

Thank you,

Alex

  • Hi Alex,

    Please check the document sprugh7.pdf (TMS320C66x DSP CPU and Instruction Set), paragraph 4.35, CCMATMPY.

    Thanks,

    HR

  • Hi HR,

    I already checked this document; it is the one I started with.

    The thing is that the description and examples are in assembly code, and I'd like to use the intrinsic in order to implement C code in CCS...

    The function takes two numbers as input, but we are dealing with a 1x2 array and a 2x2 matrix. I don't understand why we need the conversion from array/matrix to long long/__x128_t. I also checked the C code example in the document "Optimizing Loops on the C66x DSP" without fully understanding it... Again, I'm really new at this; I'm doing an internship where I learn day after day...

    Best regards,

    Alex

  • Hi again,

    I found how to use the function but it doesn't give me the right results...

    Here is my code :

    void main(){

        int matrix[2][2] = {-3,7,2,1};
        int array[2] = {-5,6};
        long long arr,arr1,arr2;
        __x128_t matrix_128,output;
        long A8,A9,A10,A11;

        arr=_itoll(array[0],array[1]); // conversion of the array in long long
        arr1=_itoll(matrix[0][0],matrix[0][1]); // conversion of the first line of the input matrix into long long
        arr2=_itoll(matrix[1][0],matrix[1][1]); // conversion of the second line of the input matrix into long long
        matrix_128=_llto128(arr1,arr2); // conversion of the input matrix  in __x128_t

        output=_cmatmpy(arr,matrix_128);

        A8=_get32_128 (output, 0);
        A9=_get32_128 (output, 1);
        A10=_get32_128 (output, 2);
        A11=_get32_128 (output, 3);

        printf("A11 = %ld, A10 = %ld, A9 = %ld, A8 = %ld",A11,A10,A9,A8);
    }

    In the console I get :
    A11 = -26, A10 = 8, A9 = 29, A8 = -7    instead of 27, 0, -29, 0

    Can someone tell me what's wrong ?

    Thanks,

    Alex

  • Alexandre,

    It is a complex multiply, with each complex number represented as a 32-bit entity (two packed 16-bit numbers).

    In your code I don't see any reference to complex numbers, nor to 16-bit numbers packed into a 32-bit one.

    Are you trying to do

    [ A  B ] x [ C  D ] = [ G  H ]
               [ E  F ]

    with A = -5, B = 6, C = -3, D = 7, E = 2, F = 1 ?
    in that case G = 27 H = -29

    The code doesn't seem wrong per se, it's just that your understanding of what CMATMPY does is confused.

  • Hi Clement,

    Thank you for your answer.

    That's exactly what I want to do, I try the function with some easy input values for understanding it... It doesn't matter if I don't use complex values, right ?

    But the output doesn't seem correct. Instead of 27, 0, -29, 0 (for real and imaginary parts) I get -26, 8, 29, -7.

    Do you know why ?

  • The output is correct. The input isn't.

    The problem is: the intrinsic expects a 32-bit complex number, with the low 16 bits being the imaginary part and the high 16 bits being the real part.

    When you write int A = 5; you have in memory 00000000 00000000 ; 00000000 00000101 (high part = re ; low part = im).

    If you pass that value as an input of the CMATMPY intrinsic, it is read as 0 + 5i.

    Do you understand your problem ?
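The point about the high and low halves can be checked off-target. The following is a minimal sketch in portable C, not TI code: `cplx16`, `as_s16` and `unpack_cplx16` are hypothetical names, and the layout (real part in the upper 16 bits, imaginary part in the lower 16 bits) is the one described in the reply above.

```c
#include <stdint.h>

/* Hypothetical helper type: one 16-bit real and one 16-bit imaginary part. */
typedef struct {
    int16_t re;
    int16_t im;
} cplx16;

/* Reinterpret the low 16 bits of v as a signed 16-bit value. */
static int16_t as_s16(uint32_t v)
{
    v &= 0xFFFFu;
    return (int16_t)((v & 0x8000u) ? (int32_t)v - 65536 : (int32_t)v);
}

/* Split a 32-bit word: upper half is the real part, lower half the imaginary part. */
static cplx16 unpack_cplx16(uint32_t word)
{
    cplx16 c;
    c.re = as_s16(word >> 16);
    c.im = as_s16(word);
    return c;
}
```

Calling `unpack_cplx16(5)` gives re = 0 and im = 5, which is why a plain `int A = 5` is read as 0 + 5i.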

  • Thank you, I think I have understood indeed.

    I have one last question though : what do we have in memory for negative numbers, when I do int A = -5 ?

    I thought we had something like 00000000 00000000 ; 10000000 00000101, but when I do the array/matrix computation by hand, I do not get the values that CCS gives me.

  • Assuming you are in a debug configuration,

    use the memory browser view in CCS and/or the variable view too. You can then see your 'int' in memory.

    It's a good idea to debug step by step and check in memory how things are stored (when you use itoll for example).

    CM

    Ok so I saw that two's complement is used to represent a negative number.

    For example, -5 is 11111111 11111111 ; 11111111 11111011 in binary.

    But when I use the intrinsic, the compiler doesn't understand it as 65535 + i*65531 right ?
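One way to answer this off-target is to model the arithmetic in plain C. The sketch below rests on assumptions and is not the TI intrinsic: `cmatmpy_model`, `decode`, `cplx32_mul` and `cplx32` are hypothetical names, each half-word is read as a signed 16-bit value, and no saturation or rounding is modelled. Under those assumptions 0xFFFFFFFB decodes to -1 - 5i rather than 65535 + 65531i, and the original int inputs give exactly the values printed earlier in the thread.

```c
#include <stdint.h>

/* Hypothetical type holding a decoded operand or a 32-bit result. */
typedef struct {
    int32_t re;
    int32_t im;
} cplx32;

/* Read the low 16 bits of v as a signed (two's complement) value. */
static int32_t s16(uint32_t v)
{
    v &= 0xFFFFu;
    return (v & 0x8000u) ? (int32_t)v - 65536 : (int32_t)v;
}

/* Interpret a 32-bit word as packed complex: high half = re, low half = im. */
static cplx32 decode(int32_t word)
{
    uint32_t u = (uint32_t)word;
    cplx32 c;
    c.re = s16(u >> 16);
    c.im = s16(u);
    return c;
}

/* Ordinary complex product (a.re + i a.im) * (b.re + i b.im). */
static cplx32 cplx32_mul(cplx32 a, cplx32 b)
{
    cplx32 r;
    r.re = a.re * b.re - a.im * b.im;
    r.im = a.re * b.im + a.im * b.re;
    return r;
}

/* Model of [v0 v1] x [[m00 m01],[m10 m11]] on packed complex words. */
static void cmatmpy_model(int32_t vec[2], int32_t mat[2][2], cplx32 out[2])
{
    for (int j = 0; j < 2; j++) {
        cplx32 p0 = cplx32_mul(decode(vec[0]), decode(mat[0][j]));
        cplx32 p1 = cplx32_mul(decode(vec[1]), decode(mat[1][j]));
        out[j].re = p0.re + p1.re;
        out[j].im = p0.im + p1.im;
    }
}
```

With vec = {-5, 6} and mat = {{-3, 7}, {2, 1}}, the model returns -26 + 8i and 29 - 7i, matching A11 = -26, A10 = 8, A9 = 29, A8 = -7.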

     

  • Well I don't know

    you have to try something like that :

    short re, im; //16-bit
    int complex; //32-bit

    re = -5;
    im = 0;

    complex = _thegoodintrinsic(re,im);

    and see what happens in memory.

    or you can use a struct too

    pseudo code

    typedef struct {
        short re; // 16-bit real part
        short im; // 16-bit imaginary part
    } complex;

    complex myComplex;

    myComplex.re = -5;
    myComplex.im = 0;
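The intrinsic name is left as a placeholder above. To know what to look for in the memory browser, the packing can be written as a shift and a mask in plain C. This is only a sketch: `complex16` and `pack_re_im` are hypothetical names, not TI intrinsics, and it assumes the real part goes in the upper 16 bits.

```c
#include <stdint.h>

/* Hypothetical 16-bit complex type, mirroring the struct idea above. */
typedef struct {
    short re;
    short im;
} complex16;

/* Pack re into bits 31..16 and im into bits 15..0 of one 32-bit word. */
static uint32_t pack_re_im(complex16 c)
{
    uint32_t hi = (uint32_t)c.re & 0xFFFFu;  /* keep the 16-bit two's complement pattern */
    uint32_t lo = (uint32_t)c.im & 0xFFFFu;
    return (hi << 16) | lo;
}
```

For re = -5 and im = 0 this gives 0xFFFB0000, so the expected memory content is the 16-bit pattern of -5 in the upper half and zeros in the lower half.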

  • Hi,

    I'll try with your structure today. I see that you wanted me to work it out by myself, but all of this is really new for me.

    Anyway thanks a lot for your help, I'll come back if necessary but I hope I won't need to... !

    Alexandre

  • Hi,

    I have another problem with the function. Below is the code I wrote for multiplying a 1x2 vector by a 2x2 matrix when all elements are of type int :

     

    typedef struct {
        int re; // 32 bit
        int im; // 32 bit
    } complex;

    complex arr1, arr2, mat1, mat2, mat3, mat4;

    arr1.re = -5; arr1.im = 0;
    arr2.re = 6;  arr2.im = 0;

    mat1.re = -3; mat1.im = 0;
    mat2.re = 7;  mat2.im = 0;
    mat3.re = 2;  mat3.im = 0;
    mat4.re = 1;  mat4.im = 0;

    int array_1, array_2, matrix_1, matrix_2, matrix_3, matrix_4;

    array_1 = _spack2(arr1.re, arr1.im);
    array_2 = _spack2(arr2.re, arr2.im);
    matrix_1 = _spack2(mat1.re, mat1.im);
    matrix_2 = _spack2(mat2.re, mat2.im);
    matrix_3 = _spack2(mat3.re, mat3.im);
    matrix_4 = _spack2(mat4.re, mat4.im);

    long long array, line1_matrix, line2_matrix;

    array = _itoll(array_1, array_2);
    line1_matrix = _itoll(matrix_1, matrix_2);
    line2_matrix = _itoll(matrix_3, matrix_4);

    __x128_t matrix, product;

    matrix = _llto128(line1_matrix, line2_matrix);
    product = _cmatmpy(array, matrix);

    long A8, A9, A10, A11;

    A8 = _get32_128(product, 0);
    A9 = _get32_128(product, 1);
    A10 = _get32_128(product, 2);
    A11 = _get32_128(product, 3);

    printf("A11 = %ld, A10 = %ld, A9 = %ld, A8 = %ld",A11,A10,A9,A8);

     

     

    This code works correctly for matrices of type int. I'd like to use the intrinsic for matrices of float. I tried to do something similar but I didn't succeed because of the types of the arguments.

    Can you help me to adapt this ? I think that there shouldn't be a lot to modify...

     

    Thanks,

    Alex
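For readers without the hardware at hand, here is a rough plain-C model of what each `_spack2` call above does to its pair of inputs, as the instruction is usually described: each 32-bit input is saturated to the signed 16-bit range, then the first goes to the upper half and the second to the lower half of the result. `spack2_model` and `sat16` are hypothetical names and this is not TI's implementation.

```c
#include <stdint.h>

/* Clamp a 32-bit value to the signed 16-bit range. */
static int32_t sat16(int32_t x)
{
    if (x > 32767) return 32767;
    if (x < -32768) return -32768;
    return x;
}

/* Saturate both inputs, then pack: re in bits 31..16, im in bits 15..0. */
static uint32_t spack2_model(int32_t re, int32_t im)
{
    uint32_t hi = (uint32_t)sat16(re) & 0xFFFFu;
    uint32_t lo = (uint32_t)sat16(im) & 0xFFFFu;
    return (hi << 16) | lo;
}
```

Values that already fit in 16 bits, such as -5 or 6, pass through unchanged, which is why the small test inputs give the expected 27, 0, -29, 0. Anything outside that range would be clamped before the multiply.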

     

  • Hmm I'm wondering if that's possible.

    float numbers are coded on 32 bits, and all of the bits are used...

    With integers it was easier since I just took the 16 LSBs of each integer to form the long long.

    I can't keep only some of the bits of a float number... Otherwise the number will not be the same.

     

    But I don't think TI has forgotten the possibility to treat float numbers matrix...

    So how can I do this ?
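The concern about dropping bits can be made concrete. A small sketch in standard C (the helper name `float_bits` is made up for illustration, and it assumes `float` is a 32-bit IEEE 754 single) shows that the sign, exponent and leading mantissa bits of a float sit in the upper bits, so keeping only the 16 LSBs, as was done with the integers, loses the value.

```c
#include <stdint.h>
#include <string.h>

/* Return the raw 32-bit pattern of a float (assumes a 32-bit IEEE 754 float). */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);  /* well-defined way to reinterpret the bytes */
    return u;
}
```

For example, `float_bits(-5.0f)` is 0xC0A00000: the low 16 bits are all zero, so truncating to them would turn -5.0 into 0.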

  • Bumping the topic...

    To restate my problem: I'd like to use the complex matrix multiply when my matrices are made of float numbers. I succeeded for int numbers, but for floats, I really don't see which conversions to do to get a correct input.

    Can someone help me please ?

  • Alexandre NGUYEN said:

    To restate my problem: I'd like to use the complex matrix multiply when my matrices are made of float numbers. I succeeded for int numbers, but for floats, I really don't see which conversions to do to get a correct input.

    Can someone help me please ?

    The C6678 doesn't have instructions for floating-point matrix multiply, so you cannot do that directly with intrinsics. You have to code the multiply yourself, maybe trying to optimize it with some other intrinsic such as _complex_conjugate_mpysp. See the source code of the function DSPF_sp_mat_mul_cplx in DSPLIB, in the MCSDK.
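A hand-coded version in plain C might look like the sketch below. It is not the DSPLIB routine and uses no intrinsics; `cplxf`, `cplxf_mul`, `cplxf_add` and `cvecmat_1x2_2x2` are hypothetical names chosen for this example, and it only covers the 1x2 by 2x2 case discussed in this thread.

```c
/* Hypothetical single-precision complex type for this example. */
typedef struct {
    float re;
    float im;
} cplxf;

/* (a.re + i a.im) * (b.re + i b.im) */
static cplxf cplxf_mul(cplxf a, cplxf b)
{
    cplxf r;
    r.re = a.re * b.re - a.im * b.im;
    r.im = a.re * b.im + a.im * b.re;
    return r;
}

static cplxf cplxf_add(cplxf a, cplxf b)
{
    cplxf r;
    r.re = a.re + b.re;
    r.im = a.im + b.im;
    return r;
}

/* out = vec (1x2) times mat (2x2): out[j] = vec[0]*mat[0][j] + vec[1]*mat[1][j] */
static void cvecmat_1x2_2x2(const cplxf vec[2], cplxf mat[2][2], cplxf out[2])
{
    for (int j = 0; j < 2; j++) {
        out[j] = cplxf_add(cplxf_mul(vec[0], mat[0][j]),
                           cplxf_mul(vec[1], mat[1][j]));
    }
}
```

With the values used earlier (vector -5, 6 and matrix rows -3, 7 and 2, 1, all with zero imaginary parts) this yields 27 + 0i and -29 + 0i. Once it gives correct results, the inner products are the natural place to try intrinsics.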

  • Hi Alberto,

     

    Thank you for your answer !

    That's what I thought... I could have looked for the solution for a long time !

    Problem solved !

    Bye