Hi TI,
I programmed the SE to read scattered cfloat data. By that I mean that, in the SE's inner loop, each get_adv() jumps far in memory (I know this is probably bad for the cache).
I read this way 8 times to build a cfloat8, so the code looks something like:
cfloat d0 = ..get_adv();
...
cfloat d7 = ..get_adv();
cfloat8 v = cfloat8(d0,...,d7);
Is there an intrinsic that does this in one instruction, something like:
cfloat8 v = ...get_adv(); // gets and advances 8 times
d0 through d7 and v are all local variables of my kernel loop, and in the assembly I see that they all live in registers, so there is no memory write to the stack there (good).
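To make the access pattern concrete, here is a minimal host-side C++ sketch of what my loop body does. It is only a model under stated assumptions, not the real SE API: `ScatterStream`, `gather8`, `cfloat_model`, `cfloat8_model` and the `stride` parameter are names I made up for illustration, and `std::complex<float>` stands in for cfloat.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Stand-ins for the device types (assumption: not the TI definitions).
using cfloat_model  = std::complex<float>;
using cfloat8_model = std::array<cfloat_model, 8>;

// Hypothetical model of a stream whose every get_adv() jumps `stride`
// elements ahead in memory. This is NOT the TI streaming engine API.
class ScatterStream {
public:
    ScatterStream(const std::vector<cfloat_model>& mem, std::size_t stride)
        : mem_(mem), stride_(stride), pos_(0) {}

    // Return the current element, then advance by one stride.
    cfloat_model get_adv() {
        assert(pos_ < mem_.size());  // stay inside the modeled memory
        cfloat_model v = mem_[pos_];
        pos_ += stride_;
        return v;
    }

private:
    const std::vector<cfloat_model>& mem_;
    std::size_t stride_;
    std::size_t pos_;
};

// Eight scalar fetches packed into one 8-lane value, as in my loop body.
cfloat8_model gather8(ScatterStream& s) {
    cfloat8_model v;
    for (std::size_t lane = 0; lane < v.size(); ++lane) {
        v[lane] = s.get_adv();
    }
    return v;
}
```

What I am hoping for is a single call that does the job of `gather8` here, so that one fetch returns all 8 scattered elements already packed.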
(I am new to all of this.)
Thanks.
