Compiler/TDA2PXEVM: [EVE_SW] VCOP calculation result is not reflected

Part Number: TDA2PXEVM

Tool/software: TI C/C++ Compiler

Hello TI-san,

I am trying to run the following VCOP kernel code:

---

void vcop_kernel
(
    __vptr_uint8 in_ptr,
    __vptr_uint32 optr0_Y_ptr,
    __vptr_uint32 optr1_Y_ptr,
    __vptr_uint16 optr2_Y_ptr,
    __vptr_uint16 wptr_C,
    unsigned short in1_num,
    unsigned short in0_num_vld,
    unsigned short in1_num_vld
)
{
    __agen Addrc;
    __vector Vidxi10, Vc16, Vc8;

    Addrc = 0;

    Vc16 = 16;
    Vc8 = 8;
    Vidxi10 = wptr_C[Addrc];

    for (int I0 = 0; I0 < 8; I0++)
    {
        __agen Addr2;
        __vector Vidxoff, Vnum0, Vnum1, Vivld, Vidx0;

        Addr2 = I0 * sizeof(*optr2_Y_ptr);

        Vidxoff = -8;
        Vnum0 = in0_num_vld;
        Vnum1 = in1_num_vld;
        Vivld = 0xFFFFFFFF;

        Vidx0 = optr2_Y_ptr[Addr2].onept();

        for (int I1 = 0; I1 < in1_num/VCOP_SIMD_WIDTH; I1++)
        {
            __agen Addri, Addr0, Addr1;
            __vector Vin0, Vcur0, Vmin1st0, Vmin2nd0, Vidx10, Vflag0, Vflag10;

            Addri = (I0 * sizeof(*in_ptr) * 512) + (I1 * sizeof(*in_ptr) * VCOP_SIMD_WIDTH);

            Addr0 = (I1 * sizeof(*optr0_Y_ptr) * VCOP_SIMD_WIDTH);
            Addr1 = (I1 * sizeof(*optr1_Y_ptr) * VCOP_SIMD_WIDTH);

            Vin0 = in_ptr[Addri];

            Vmin1st0 = optr0_Y_ptr[Addr0];
            Vmin2nd0 = optr1_Y_ptr[Addr1];

            Vidxoff += Vc8;

            Vcur0 = Vidx0;
            Vcur0 |= Vin0 << Vc16; // (Vin0 << 16) | Vidx0

            Vidx10 = Vidxoff + Vidxi10;

            Vflag0 = (Vidx0 >= Vnum0);
            Vflag10 = (Vidx10 >= Vnum1);
            Vflag10 = Vflag10 | Vflag0;

            Vcur0 = select(Vflag10, Vivld, Vcur0);

            (Vmin1st0, Vcur0).minmax();
            (Vmin2nd0, Vcur0).minmax();

            optr0_Y_ptr[Addr0] = Vmin1st0;
            optr1_Y_ptr[Addr1] = Vmin2nd0;
        }
    }

    for (int I0 = 0; I0 < 1; I0++)
    {
        __agen Addr2;
        __vector Vidx0, Vc;

        Addr2 = I0;

        Vidx0 = optr2_Y_ptr[Addr2];
        Vc = 8;

        Vidx0 += Vc;

        optr2_Y_ptr[Addr2] = Vidx0;
    }
}

---

However, this kernel often produces incorrect results in the outputs "optr0_Y_ptr" and "optr1_Y_ptr" when "in1_num" is small (e.g., 16 or 32).
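
For comparison, the computation the kernel is meant to perform can be sketched as a scalar model in plain C that runs on the host. This is only a sketch under several assumptions that do not come from TI documentation: VCOP_SIMD_WIDTH is taken to be 8, the comparisons and minmax() are treated as unsigned 32-bit operations, and the function name ref_two_smallest and the constants SIMD_WIDTH, ROW_STRIDE, NUM_ROWS and INVALID are made up for illustration. The model processes rows strictly in order, so each row sees the minima stored by the previous row.

```c
#include <stdint.h>

#define SIMD_WIDTH 8u            /* assumed value of VCOP_SIMD_WIDTH */
#define ROW_STRIDE 512u          /* input row stride used for Addri in the kernel */
#define NUM_ROWS   8u            /* trip count of the outer I0 loop */
#define INVALID    0xFFFFFFFFu   /* same constant as Vivld */

/* Hypothetical scalar reference model of vcop_kernel (not TI code).
 * Keeps, per column, the two smallest packed values (in << 16) | row_index. */
void ref_two_smallest(const uint8_t *in, uint32_t *min1st, uint32_t *min2nd,
                      uint16_t *row_idx, const uint16_t *lane_idx,
                      unsigned in1_num, unsigned in0_num_vld,
                      unsigned in1_num_vld)
{
    for (unsigned row = 0; row < NUM_ROWS; row++) {
        /* .onept() in the kernel: one element broadcast to all lanes */
        uint32_t idx0 = row_idx[row];

        for (unsigned blk = 0; blk < in1_num / SIMD_WIDTH; blk++) {
            for (unsigned lane = 0; lane < SIMD_WIDTH; lane++) {
                unsigned col  = blk * SIMD_WIDTH + lane;
                /* Vidxoff is blk * 8 inside iteration blk of the inner loop */
                uint32_t idx1 = blk * SIMD_WIDTH + lane_idx[lane];
                uint32_t cur  = ((uint32_t)in[row * ROW_STRIDE + col] << 16) | idx0;

                /* Entries outside the valid range are replaced by INVALID */
                if (idx0 >= in0_num_vld || idx1 >= in1_num_vld)
                    cur = INVALID;

                /* (a, b).minmax(): a receives the minimum, b the maximum */
                if (cur < min1st[col]) {
                    uint32_t t = min1st[col]; min1st[col] = cur; cur = t;
                }
                if (cur < min2nd[col]) {
                    uint32_t t = min2nd[col]; min2nd[col] = cur; cur = t;
                }
            }
        }
    }

    /* Second loop of the kernel: advance the first 8 row indices by 8 */
    for (unsigned lane = 0; lane < SIMD_WIDTH; lane++)
        row_idx[lane] = (uint16_t)(row_idx[lane] + 8u);
}
```

Running this model on the same input buffers and comparing its min1st and min2nd arrays element by element with "optr0_Y_ptr" and "optr1_Y_ptr" after the kernel finishes would show which columns differ for a given "in1_num".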

Needless to say, _vcop_vloop_done() is called after _vloops() for this kernel.

Do you know why this kind of failure happens?

Also, would you please let me know if there is anything wrong with this kernel code?

The version of the EVE compiler in use is arp32_1.0.7.

Best regards,

Yudai ISHIBASHI