Compiler/TMS320C6678: Where can I find C/C++ SIMD intrinsic instructions (SSE)?

Mike ITGeek

Part Number: TMS320C6678

Tool/software: TI C/C++ Compiler

Hi All,

I'd like to speed up some vector-matrix operations on a TMDSEVM6678LE using SIMD intrinsics (SSE-style). I have already done this with Intel-based instructions for GCC, as you can see below; however, our main purpose is to apply the same approach on a TMDSEVM6678LE.

Please let me know if you are aware of a document/tutorial.

I wasn't able to find any. I could only find the two below:

http://www.ti.com/lit/an/spraa14/spraa14.pdf

http://www.ti.com/lit/an/spra666/spra666.pdf?ts=1591768687912

Regards

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <assert.h>

#include <pmmintrin.h>

#ifndef __SSE3__
#error This example requires SSE3
#endif

#include "util.h"

/* Size of the matrices to multiply */
#define SIZE 256
#define ITERS 100

#define XMM_ALIGNMENT_BYTES 16

static float mat_a[SIZE][SIZE] __attribute__((aligned (XMM_ALIGNMENT_BYTES)));
static float mat_b[SIZE][SIZE] __attribute__((aligned (XMM_ALIGNMENT_BYTES)));
static float mat_c[SIZE][SIZE] __attribute__((aligned (XMM_ALIGNMENT_BYTES)));
static float mat_ref[SIZE][SIZE] __attribute__((aligned (XMM_ALIGNMENT_BYTES)));


/**
 * Matrix multiplication. This is the procedure you should try to
 * optimize.
 */
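static void
matmul_sse(void)
{
    int i, j, k;

    for (i = 0; i < SIZE; i++) {
        for (j = 0; j < SIZE; j++) {
            float c_sum_float[4] = {0, 0, 0, 0};
            __m128 c_sum = _mm_set_ps1(0.f);
            for (k = 0; k < SIZE; k+=4) {
                // load four consecutive elements of row i of A (16-byte aligned)
                const __m128 a = _mm_load_ps(&mat_a[i][k]);
                // gather four elements of column j of B; they are SIZE floats
                // apart, so a single vector load cannot fetch them
                const __m128 b = _mm_set_ps(mat_b[k+3][j], mat_b[k+2][j],
                                            mat_b[k+1][j], mat_b[k][j]);
                // mul
                const __m128 c = _mm_mul_ps(a, b);
                // sum & add: accumulate the products
                c_sum = _mm_add_ps(c_sum, c);
            }
            // horizontal sum of the four partial sums
            _mm_storeu_ps(&c_sum_float[0], c_sum);
            mat_c[i][j] = c_sum_float[0] + c_sum_float[1] + c_sum_float[2] + c_sum_float[3];
        }
    }

}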
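
The listing declares a mat_ref array but does not show how it is filled. A plain scalar version is a handy check for any SIMD rewrite, on x86 or on the C6678. The following is only a minimal sketch: matmul_ref and mat_close are hypothetical helper names that are not part of the original post, and the matrices are passed as flat row-major arrays to keep the example short.

```c
#include <stddef.h>

/* Hypothetical reference: c = a * b for n x n row-major matrices. */
void matmul_ref(const float *a, const float *b, float *c, size_t n)
{
    size_t i, j, k;

    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++) {
            float sum = 0.0f;
            for (k = 0; k < n; k++) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
}

/* Hypothetical check: returns 1 if all element pairs differ by at most tol. */
int mat_close(const float *x, const float *y, size_t count, float tol)
{
    size_t i;

    for (i = 0; i < count; i++) {
        float diff = x[i] - y[i];
        if (diff < 0.0f) {
            diff = -diff;
        }
        if (diff > tol) {
            return 0;
        }
    }
    return 1;
}
```

With the arrays from the post, this could be called as matmul_ref(&mat_a[0][0], &mat_b[0][0], &mat_ref[0][0], SIZE) and compared against mat_c with a small tolerance, since the SIMD version adds the products in a different order.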

over 5 years ago

Victor Kazmirenko over 5 years ago

Guru 13202 points

Hello!

The most comprehensive reference is SPRUGH7, TMS320C66x DSP CPU and Instruction Set: http://www.ti.com/lit/ug/sprugh7/sprugh7.pdf.

It does not map SSE instructions to C66x intrinsics, but the document itself is a complete reference for the instruction set.

I find it easier to first search the CCS help for possible candidates, then read about them thoroughly in that manual.
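
To give a rough idea of what such a mapping can look like, here is a minimal sketch of a dot product using two-lane single-precision operations. The names _amem8_f2, _dmpysp, _daddsp, _ftof2, _hif2, _lof2, the type __float2_t, the header c6x.h, and the macro _TMS320C6600 are written from memory of the TI C6000 compiler documentation and should be verified against SPRUGH7 and the compiler user's guide before use. The function name dot_sp is hypothetical. On any other compiler the code falls back to an equivalent scalar loop.

```c
#include <stddef.h>

#ifdef _TMS320C6600
#include <c6x.h>   /* TI compiler intrinsics (assumed header name) */
#endif

/* Hypothetical helper: dot product of two float arrays.
 * Assumes n is a multiple of 2 and both pointers are 8-byte aligned. */
float dot_sp(const float *a, const float *b, size_t n)
{
    size_t k;
#ifdef _TMS320C6600
    /* Two-lane accumulator, both lanes start at zero. */
    __float2_t acc = _ftof2(0.0f, 0.0f);

    for (k = 0; k < n; k += 2) {
        /* Aligned 8-byte loads, two floats each. */
        __float2_t va = _amem8_f2((void *)&a[k]);
        __float2_t vb = _amem8_f2((void *)&b[k]);
        /* Lane-wise multiply, then lane-wise accumulate. */
        acc = _daddsp(acc, _dmpysp(va, vb));
    }
    /* Horizontal sum of the two lanes. */
    return _hif2(acc) + _lof2(acc);
#else
    /* Scalar fallback with the same two-accumulator structure. */
    float acc0 = 0.0f;
    float acc1 = 0.0f;

    for (k = 0; k < n; k += 2) {
        acc0 += a[k] * b[k];
        acc1 += a[k + 1] * b[k + 1];
    }
    return acc0 + acc1;
#endif
}
```

The structure mirrors the SSE loop in the question: aligned load, lane-wise multiply, lane-wise add, then a horizontal sum at the end, only with two lanes per operation instead of four.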

Mike ITGeek over 5 years ago in reply to Victor Kazmirenko

Expert 1320 points

Thanks rrlagic,

I was able to find them here too.

George Mock over 5 years ago in reply to Mike ITGeek

TI Guru 249420 points

Another resource to consider is DSPLIB. You can either call the functions in the library, or explore the source code to those functions to understand the best way to implement them.

Thanks and regards,

-George
