TDA2SX: OpenCL on C66

Marco Lopes

Part Number: TDA2SX
Other Parts Discussed in Thread: TDA2

Hello,

I would like to know where to find documentation on OpenCL for the C66, since nothing really useful seems to exist. I also have the following questions:

Is using OpenCL recommended over using intrinsics directly?
Should OpenCL give the same or better performance as intrinsics, or can it worsen performance?
Where can I find some examples of how to use it on the C66?
How do I properly load data into floatn data types with respect to memory alignment?

I have tried to rewrite some code, but the outputs differ in some cases, which I suspect is due to memory alignment.

With the following test code:

void test_func(float * restrict in_ptr)
{
    __float2_t vec_a, vec_b;
    __float2_t vec_c = _ftof2(0.0f,0.0f);
    float test_in_1[] = {0.678513,0.75461321};
    float test_in_2[] = {847.3125,684351.13};
    float2 vec_af2, vec_bf2, vec_c2f;

    printf("\n\nTEST LOCAL ARRAY\n");
    vec_a = _mem8_f2(&test_in_1[0]);
    vec_b = _mem8_f2(&test_in_2[0]);
    vec_c = _dmpysp(vec_a,vec_b);
    vec_c = _daddsp(_dmpysp(vec_a,vec_b),vec_c);

    printf("Pointer Input: %f %f %f %f\n\n",test_in_1[0],test_in_1[1],test_in_2[0],test_in_2[1]);

    printf("Intrisics input: %f %f\n",_hif2(vec_a),_lof2(vec_a));
    printf("Intrinsics output: %f\n",_hif2(vec_c)+_lof2(vec_c));

    vec_af2 = *(float2*)(&test_in_1[0]);
    vec_bf2 = *(float2*)(&test_in_2[0]);
    vec_c2f = vec_af2 * vec_bf2;
    vec_c2f += vec_af2 * vec_bf2;

    printf("OpenCL input: %f %f\n",vec_af2.hi, vec_af2.lo);
    printf("OpenCL output: %f\n",vec_c2f.hi + vec_c2f.lo);


    printf("\n\nTEST INPUT POINTER\n");
    vec_a = _mem8_f2(&in_ptr[0]);
    vec_b = _mem8_f2(&in_ptr[2]);
    vec_c = _dmpysp(vec_a,vec_b);
    vec_c = _daddsp(_dmpysp(vec_a,vec_b),vec_c);

    printf("Pointer Input: %f %f %f %f\n\n",in_ptr[0],in_ptr[1],in_ptr[2],in_ptr[3]);
    printf("Intrisics input: %f %f %f %f\n",_hif2(vec_a),_lof2(vec_a),_hif2(vec_b),_lof2(vec_b));
    printf("Intrinsics output: %f\n",_hif2(vec_c)+_lof2(vec_c));

    vec_af2 = *(float2*)(&in_ptr[0]);
    vec_bf2 = *(float2*)(&in_ptr[2]);
    vec_c2f = vec_af2 * vec_bf2;
    vec_c2f += vec_af2 * vec_bf2;

    printf("OpenCL input: %f %f %f %f\n",vec_af2.hi, vec_af2.lo,vec_bf2.hi, vec_bf2.lo);
    printf("OpenCL output: %f\n",vec_c2f.hi + vec_c2f.lo);

}

Here, in_ptr points to some location within a const float in_data[] = {...}

I get the following results:
[HOST] [DSP1 ]     97.967547 s: TEST LOCAL ARRAY
[HOST] [DSP1 ]     97.967577 s: Pointer Input: 0.678513 0.754613 847.312500 684351.125000
[HOST] [DSP1 ]     97.967608 s:
[HOST] [DSP1 ]     97.967638 s: Intrisics input: 0.754613 0.678513
[HOST] [DSP1 ]     97.967669 s: Intrinsics output: 1033990.625000
[HOST] [DSP1 ]     97.967699 s: OpenCL input: 0.754613 0.678513
[HOST] [DSP1 ]     97.967699 s: OpenCL output: 1033990.625000
[HOST] [DSP1 ]     97.967730 s:
[HOST] [DSP1 ]     97.967730 s:
[HOST] [DSP1 ]     97.967760 s: TEST INPUT POINTER
[HOST] [DSP1 ]     97.967791 s: Pointer Input: -0.000000 0.662687 0.932101 -0.000000
[HOST] [DSP1 ]     97.967821 s:
[HOST] [DSP1 ]     97.967821 s: Intrisics input: 0.662687 -0.000000 -0.000000 0.932101
[HOST] [DSP1 ]     97.967852 s: Intrinsics output: -0.000000
[HOST] [DSP1 ]     97.967913 s: OpenCL input: -0.000000 0.294224 0.932101 0.662687
[HOST] [DSP1 ]     97.967913 s: OpenCL output: 0.389957

With local arrays the data is loaded correctly, but in TEST INPUT POINTER the OpenCL load is shifted by one float: 0.294224 is the float just before &in_ptr[0]. This leads to a wrong output.
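
The shift is consistent with in_ptr not being 8-byte aligned: _mem8_f2 is documented as an unaligned access, while dereferencing a float2 pointer presumably assumes 8-byte alignment. A minimal, portable C sketch of how this hypothesis could be checked follows. It is not TI-specific, and pair_f32, is_aligned8, align_down8 and load_pair_unaligned are hypothetical names introduced only for illustration. The rounding in align_down8 is an assumption about how an aligned load treats a misaligned address, not confirmed behavior.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a two-lane float vector; NOT the TI float2 type. */
typedef struct {
    float lo; /* element at the lower address */
    float hi; /* element at the higher address */
} pair_f32;

/* Returns 1 if p lies on an 8-byte boundary, 0 otherwise. */
static int is_aligned8(const void *p)
{
    return ((uintptr_t)p & (uintptr_t)7) == 0;
}

/* Rounds p down to the previous 8-byte boundary. This models the ASSUMED
   behavior of an aligned 8-byte load that ignores the low address bits. */
static const float *align_down8(const float *p)
{
    return (const float *)((uintptr_t)p & ~(uintptr_t)7);
}

/* Loads two consecutive floats from any float-aligned address.
   memcpy makes no assumption about 8-byte alignment. */
static pair_f32 load_pair_unaligned(const float *p)
{
    pair_f32 v;
    assert(((uintptr_t)p % sizeof(float)) == 0);
    memcpy(&v.lo, p, sizeof v.lo);
    memcpy(&v.hi, p + 1, sizeof v.hi);
    return v;
}
```

If is_aligned8(&in_ptr[0]) returns 0 on the target, then align_down8(in_ptr) equals in_ptr - 1, which would match the 0.294224 value seen in the OpenCL input line above.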

over 1 year ago

0 Jared McArthur over 1 year ago

TI__Mastermind 25230 points

Hi Marco,

Our expert is currently out of office. Please expect a delay in our response until next week.

Best,
Jared

0 Praveen Rao over 1 year ago in reply to Jared McArthur

TI__Mastermind 49983 points

Hello Marco,

Can you provide more details on the TDA2SX project for which you are asking this question? Is it an existing or old project?

Note that the TDA2 is a previous-generation device, and the TI SDK was last released in 2019 (Link). The accompanying SDK source and documentation are your best bet for answering your query. However, the SDK is provided "as-is", and there will be limited support.

Thanks.

0 Marco Lopes over 1 year ago in reply to Praveen Rao

Prodigy 40 points

Hello Praveen,

I am not sure I can provide more details. The project has existed for a while.

As far as I can tell, TDA2SX devices still have an active status and are not obsolete.

Is the support for OpenCL any different for newer generations?

+1 Praveen Rao over 1 year ago in reply to Marco Lopes

TI__Mastermind 49983 points

Hi,

The device is still active to buy; however, SDK releases stopped in 2019, so there is no new development or new release planned.

For any software support on this device, we suggest you look into the existing SDK and its documentation.

Alternatively, if it is a new project or a new revision of an existing product, you can look into our next-generation devices, which are actively supported software-wise. See: https://www.ti.com/microcontrollers-mcus-processors/arm-based-processors/overview.html and look into their SDK documentation for details. Note that OpenCL is not planned to be supported on our latest devices.

Thanks.

0 Marco Lopes over 1 year ago in reply to Praveen Rao

Prodigy 40 points

Alright, thanks for the clarification.
