Compiler/TMS320C6678: How to load and store four 32-bit floats from and to memory using SIMD intrinsics?

Feng P.

Part Number: TMS320C6678

Tool/software: TI C/C++ Compiler

Hi Everyone,

I am trying to load from and store to memory using SIMD intrinsics. I have already done this with Intel intrinsics, and I need something like the following:

__m128 _mm_load_ps (float const* mem_addr)

void _mm_store_ps (float* mem_addr, __m128 a)

I also checked "TMS320C6000 Optimizing Compiler v8.3.x User's Guide" section "8.6.7 The __x128_t Container Type" ...

but I can't wrap my head around it ... I guess I have an incorrect perception of C6000 intrinsics compared to that of Intel.

Any comments are much appreciated.

Regards

over 5 years ago

0 George Mock over 5 years ago

TI__Guru**** 249340 points

The compiler manual has this example near the description of the __x128_t intrinsics.

#include <c6x.h>
#include <stdio.h>

__x128_t mpy_four_way_example(__x128_t s, int a, int b, int c, int d)
{
    __x128_t t = _ito128(a, b, c, d); // Pack values into a __x128_t
    __x128_t results = _qmpy32(s, t); // Perform a four-way SIMD multiply

    int lowest32 = _get32_128(results, 0); // Extract lowest reg of __x128_t
    int highest32 = _get32_128(results, 3); // Extract highest reg of __x128_t

    printf("lowest = %d\n", lowest32);
    printf("highest = %d\n", highest32);

    return results;
}

Notice how the _ito128 and _get32_128 intrinsics are used. You need to do something similar, except with the float variants of those intrinsics: _fto128 and _get32f_128.
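As a rough illustration of that suggestion, here is a minimal sketch of load/store helpers built on _fto128 and _get32f_128. The names load4f and store4f are hypothetical, not TI APIs. The sketch assumes the first argument of _fto128 lands in the highest 32-bit lane (please check this against the intrinsics table in the compiler manual), and the #else branch is only a host stand-in so the code can be tried off-target.

```c
#include <assert.h>

#if defined(_TMS320C6600)
#include <c6x.h>   /* real __x128_t, _fto128, _get32f_128 on the TI toolchain */
#else
/* Host-only stand-ins (assumption: for trying the sketch off-target only). */
typedef struct { float lane[4]; } __x128_t;

static inline __x128_t _fto128(float f3, float f2, float f1, float f0)
{
    /* Assumed argument order: first argument is the highest lane. */
    __x128_t v = { { f0, f1, f2, f3 } };
    return v;
}

static inline float _get32f_128(__x128_t v, unsigned lane)
{
    assert(lane < 4);
    return v.lane[lane];
}
#endif

/* Hypothetical helper: gather p[0..3] so that p[0] ends up in lane 0. */
__x128_t load4f(const float *p)
{
    return _fto128(p[3], p[2], p[1], p[0]);
}

/* Hypothetical helper: write lanes 0..3 back to p[0..3]. */
void store4f(float *p, __x128_t v)
{
    /* Literal lane indices, matching how the manual's example calls it. */
    p[0] = _get32f_128(v, 0);
    p[1] = _get32f_128(v, 1);
    p[2] = _get32f_128(v, 2);
    p[3] = _get32f_128(v, 3);
}
```

Unlike _mm_load_ps, these helpers take no pointer inside the intrinsic itself: the four floats are read with ordinary C loads and then packed, which is the pattern the manual's example follows.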

Thanks and regards,

-George

0 Feng P. over 5 years ago in reply to George Mock

Intellectual 570 points

Thanks for the quick response. I really appreciate it.

I come from an Intel SIMD background, so I thought there would be one that takes a float* argument too.

How about greater than (or equal) and less than (or equal)? Maybe I should do it via subtraction? What are the arithmetic operations in that case?

__m128 _mm_cmpgt_ps (__m128 a, __m128 b)

__m128 _mm_cmplt_ps (__m128 a, __m128 b)

Sorry for the basic questions. I only have experience with Intel intrinsics and don't have a good understanding of TI's yet.

I wish this part of the manual had been expanded further ... something like this. I will put one together as soon as I understand what is going on ... ;)

0 George Mock over 5 years ago in reply to Feng P.

TI__Guru**** 249340 points

Please search the C6000 compiler manual for the sub-chapter titled The __x128_t Container Type. It gives a list of what operations are, or are not, supported for __x128_t type variables.

Feng P. said:
How about greater than (or equal) and less than (or equal)? Maybe I should do it via subtraction? What are the arithmetic operations in that case?

No built-in operations, like comparison or subtraction, are supported. Though you can assign them, pass them to functions, and return them from functions.

The general pattern is to:

Use creation intrinsics to form some __x128_t variables
Operate on those variables with intrinsics which are specifically documented to accept them as operands, or return them as results
Use extraction intrinsics to copy parts of a __x128_t variable to other variables of a built-in type like int or float.
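The three steps above can be sketched for floats as follows. This is only an illustration: mul4f is a hypothetical helper name, _qmpysp is used as the four-way single-precision multiply that I believe the C6600 intrinsics table lists, the argument order of _fto128 (highest lane first) is an assumption to verify, and the #else branch is a host stand-in rather than TI code.

```c
#include <assert.h>

#if defined(_TMS320C6600)
#include <c6x.h>   /* real intrinsics on the TI toolchain */
#else
/* Host-only stand-ins (assumption: for trying the sketch off-target only). */
typedef struct { float lane[4]; } __x128_t;

static inline __x128_t _fto128(float f3, float f2, float f1, float f0)
{
    /* Assumed argument order: first argument is the highest lane. */
    __x128_t v = { { f0, f1, f2, f3 } };
    return v;
}

static inline float _get32f_128(__x128_t v, unsigned lane)
{
    assert(lane < 4);
    return v.lane[lane];
}

static inline __x128_t _qmpysp(__x128_t a, __x128_t b)
{
    /* Lane-wise multiply, mimicking the four-way SIMD multiply. */
    __x128_t r;
    int i;
    for (i = 0; i < 4; i++)
        r.lane[i] = a.lane[i] * b.lane[i];
    return r;
}
#endif

/* Hypothetical helper: out[i] = x[i] * g[i] for i = 0..3. */
void mul4f(const float *x, const float *g, float *out)
{
    /* Step 1: creation intrinsics form the __x128_t operands. */
    __x128_t vx = _fto128(x[3], x[2], x[1], x[0]);
    __x128_t vg = _fto128(g[3], g[2], g[1], g[0]);

    /* Step 2: operate with an intrinsic documented to take __x128_t. */
    __x128_t vr = _qmpysp(vx, vg);

    /* Step 3: extraction intrinsics copy lanes back to plain floats. */
    out[0] = _get32f_128(vr, 0);
    out[1] = _get32f_128(vr, 1);
    out[2] = _get32f_128(vr, 2);
    out[3] = _get32f_128(vr, 3);
}
```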

Thanks and regards,

-George

0 Feng P. over 5 years ago in reply to George Mock

Intellectual 570 points

George Mock said:

No built-in operations, like comparison or subtraction, are supported. Though you can assign them, pass them to functions, and return them from functions.

So there is no way to compare, add, subtract, or divide __x128_t variables? We can only multiply?

Thanks,

0 George Mock over 5 years ago in reply to Feng P.

TI__Guru**** 249340 points

Feng P. said:
So no way to compare, sum, sub, div __x128 variables.

Correct.

For those operations, you are limited to what intrinsics are available. The intrinsics, in turn, are based on the instructions available on the C66x CPU. Since there is no instruction which compares 128-bit wide values, there is no corresponding intrinsic.

The purpose of the __x128_t type is to provide a way to get data in and out of the SIMD (single instruction multiple data) instructions on the C66x CPU. It is not intended to be yet another built-in type like int or long.
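Since there is no 128-bit compare, one possible fallback is to extract each lane and compare it as an ordinary float. The sketch below is an assumption about how one might do that, not a TI-provided equivalent of _mm_cmpgt_ps: cmpgt4f_mask is a hypothetical helper, the comparisons are scalar rather than SIMD, the result is a 4-bit mask instead of a vector, and the #else branch is only a host stand-in.

```c
#include <assert.h>

#if defined(_TMS320C6600)
#include <c6x.h>   /* real intrinsics on the TI toolchain */
#else
/* Host-only stand-ins (assumption: for trying the sketch off-target only). */
typedef struct { float lane[4]; } __x128_t;

static inline __x128_t _fto128(float f3, float f2, float f1, float f0)
{
    /* Assumed argument order: first argument is the highest lane. */
    __x128_t v = { { f0, f1, f2, f3 } };
    return v;
}

static inline float _get32f_128(__x128_t v, unsigned lane)
{
    assert(lane < 4);
    return v.lane[lane];
}
#endif

/* Hypothetical helper: bit i of the result is set when lane i of a
 * is strictly greater than lane i of b. Each compare is a scalar one. */
unsigned cmpgt4f_mask(__x128_t a, __x128_t b)
{
    unsigned mask = 0u;
    if (_get32f_128(a, 0) > _get32f_128(b, 0)) mask |= 1u << 0;
    if (_get32f_128(a, 1) > _get32f_128(b, 1)) mask |= 1u << 1;
    if (_get32f_128(a, 2) > _get32f_128(b, 2)) mask |= 1u << 2;
    if (_get32f_128(a, 3) > _get32f_128(b, 3)) mask |= 1u << 3;
    return mask;
}
```

Swapping the operands gives the less-than case, and replacing > with >= gives greater-than-or-equal.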

Thanks and regards,

-George

0 Feng P. over 5 years ago in reply to George Mock

Intellectual 570 points

I am just curious what it is meant for. It looks as if it is meant for complex multiplication.

Optimizing compiler v8.3.x "Table 8-7. TMS320C6600 C/C++ Compiler Intrinsics"

And also dot product ...

"Table 8-7. TMS320C6600 C/C++ Compiler Intrinsics (continued)"

The only addition I can see is for __float2_t under table "Table 8-7. TMS320C6600 C/C++ Compiler Intrinsics"

And one subtraction ...

It seems it is mainly meant for fast load and store of the register values ...
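For what it is worth, the __float2_t addition mentioned above could in principle be applied to each 64-bit half of a __x128_t. The following is a minimal sketch under several assumptions: add4f is a hypothetical helper, and the names _daddsp, _lof2_128, _hif2_128 and _f2to128 (with the high half as the first argument of _f2to128) are what I believe the C6600 intrinsics table lists, so please verify them against the manual. The #else branch is only a host stand-in.

```c
#include <assert.h>

#if defined(_TMS320C6600)
#include <c6x.h>   /* real intrinsics on the TI toolchain */
#else
/* Host-only stand-ins (assumption: for trying the sketch off-target only). */
typedef struct { float lane[4]; } __x128_t;
typedef struct { float lo, hi; } __float2_t;

static inline __x128_t _fto128(float f3, float f2, float f1, float f0)
{
    __x128_t v = { { f0, f1, f2, f3 } };   /* first argument = highest lane */
    return v;
}
static inline float _get32f_128(__x128_t v, unsigned lane)
{
    assert(lane < 4);
    return v.lane[lane];
}
static inline __float2_t _lof2_128(__x128_t v)
{
    __float2_t r = { v.lane[0], v.lane[1] };
    return r;
}
static inline __float2_t _hif2_128(__x128_t v)
{
    __float2_t r = { v.lane[2], v.lane[3] };
    return r;
}
static inline __x128_t _f2to128(__float2_t hi, __float2_t lo)
{
    __x128_t v = { { lo.lo, lo.hi, hi.lo, hi.hi } };
    return v;
}
static inline __float2_t _daddsp(__float2_t a, __float2_t b)
{
    __float2_t r = { a.lo + b.lo, a.hi + b.hi };
    return r;
}
#endif

/* Hypothetical helper: lane-wise a + b using two 2-way float additions. */
__x128_t add4f(__x128_t a, __x128_t b)
{
    __float2_t lo = _daddsp(_lof2_128(a), _lof2_128(b));   /* lanes 0..1 */
    __float2_t hi = _daddsp(_hif2_128(a), _hif2_128(b));   /* lanes 2..3 */
    return _f2to128(hi, lo);
}
```

The same shape should work for subtraction if the two-way subtract intrinsic is substituted for _daddsp.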

0 George Mock over 5 years ago in reply to Feng P.

TI__Guru**** 249340 points

Feng P. said:
looks as if it is meant for complex multiplication.

That is partially correct. The type __x128_t is meant for use with the SIMD instructions that operate on 128 bits of data in a single instruction. It is not meant for anything else.

Thanks and regards,

-George
