Hi,
I have two arrays of float32 input data and a DSP with 256-bit vector units. I'm trying to vectorize my calculation so that it works on vectors of 8 floats (8 * 32 bits).
My first array,
A = [1,2,3,4,5,6,7,8,...]
has a size of 4N, and the second one,
B=[10,20,30,40,...]
has a size of N. I want to configure my DSP streaming engines so that each B element is multiplied by four consecutive A elements and the four products are summed. So if C is the result (also of size N), it would look like
C=[1*10+2*10+3*10+4*10,5*20+6*20+7*20+8*20, ...].
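To pin down the operation, here is a minimal scalar sketch in plain C of what I am trying to compute. The function name `group4_mul_sum` is only made up for this post, and nothing here uses the DSP's vector types or streaming engines.

```c
#include <stddef.h>

/* Scalar reference (hypothetical helper name, for illustration only).
 * a holds 4*n floats, b and c hold n floats.
 * c[i] = a[4i]*b[i] + a[4i+1]*b[i] + a[4i+2]*b[i] + a[4i+3]*b[i]
 */
void group4_mul_sum(const float *a, const float *b, float *c, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        const float *g = a + 4 * i;   /* the four A values paired with b[i] */
        c[i] = g[0] * b[i] + g[1] * b[i] + g[2] * b[i] + g[3] * b[i];
    }
}
```

With the example data above (first 8 elements of A, first 2 of B), this gives C = [100, 520].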
I know how to do this operation with float4. However, I have 256 bits available, and I was wondering whether there is a grouping configuration that would let me do this operation on full 8-element vectors.
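To make the 8-wide case concrete, below is a plain C model of the lane layout I think I need. It is only a sketch: it emulates the 8 lanes with small arrays and does not use any real intrinsics or streaming-engine settings. The name `group4_mul_sum_x8` and the constants `LANES` and `GROUP` are assumptions for illustration. The idea is that each step reads 8 contiguous A values while each of 2 B values is repeated 4 times, so the B-side vector looks like [b0,b0,b0,b0,b1,b1,b1,b1].

```c
#include <stddef.h>

enum { LANES = 8, GROUP = 4 };   /* 8 float lanes, 4 A values per B value */

/* Lane-by-lane model of the 256-bit version (hypothetical, illustration only).
 * a holds 4*n floats, b and c hold n floats.
 */
void group4_mul_sum_x8(const float *a, const float *b, float *c, size_t n)
{
    size_t i = 0;

    /* Each iteration consumes 8 A values and 2 B values, produces 2 C values. */
    for (; i + 2 <= n; i += 2) {
        float prod[LANES];
        for (int l = 0; l < LANES; ++l) {
            float va = a[4 * i + l];        /* contiguous read of A */
            float vb = b[i + l / GROUP];    /* each B value repeated 4 times */
            prod[l] = va * vb;
        }
        /* Horizontal add inside each group of 4 lanes. */
        c[i]     = prod[0] + prod[1] + prod[2] + prod[3];
        c[i + 1] = prod[4] + prod[5] + prod[6] + prod[7];
    }

    /* Scalar tail for odd n. */
    if (i < n) {
        const float *g = a + 4 * i;
        c[i] = g[0] * b[i] + g[1] * b[i] + g[2] * b[i] + g[3] * b[i];
    }
}
```

What I am really asking is whether the streaming engine can produce that repeated B pattern for me, so that I do not have to build it by hand.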
Thanks