AM62A7: Config grouping duplication in DSP streaming engine

Iman Fakhari

Part Number: AM62A7

Hi,

I have two float32 input arrays and a DSP with 256-bit vector units. I'm trying to vectorize my calculations to operate on vectors of 8 floats (8 x 32 bits).

My first array,

A = [1,2,3,4,5,6,7,8,...]

has a size of 4N, and the second one,

B=[10,20,30,40,...]

has a size of N. I want to configure the DSP streaming engines so that each B element is multiplied by four consecutive A elements. If C is the result, it would look like

C=[1*10+2*10+3*10+4*10,5*20+6*20+7*20+8*20, ...].

I know how to do this operation with float4. However, I have 256 bits available, and I was wondering whether there is a grouping configuration that can do this operation for me.

Thanks

over 1 year ago

0 Qutaiba Saleh over 1 year ago

TI__Expert 4640 points

Hello Iman,

It is our pleasure to help.

Yes, you can use streaming engine group duplication. Here is some initial code to configure the streaming engine for your use case:

//Initialize template to default value
__SE_TEMPLATE_v1 seTemplate = __gen_SE_TEMPLATE_v1();

//Specify element type
seTemplate.ELETYPE = se_eletype<float_vec>::value;

//Turn on Group Duplication
seTemplate.GRPDUP = __SE_GRPDUP_ON;

//Specify group size using VECLEN field.
seTemplate.VECLEN = __SE_VECLEN_2ELEMS; // This is specific to your use case: a group of two B elements is duplicated to fill an entire vector.

//Specify the type of transfer
seTemplate.DIMFMT = __SE_DIMFMT_1D;

seTemplate.ICNT0 = ARRAY_LENGTH;
.
.
// SE open
.
.
//Fetch input data
// Loop over 4 x the size of B = 4N elements

Please, let me know if this answers your question.

Best regards,

Qutaiba

0 Iman Fakhari over 1 year ago in reply to Qutaiba Saleh

Prodigy 10 points

Thanks for your quick reply.

The config you mentioned will turn B from [10,20,30,..] into [10,20,10,20,10,20,10,20]. However, I need it to be [10,10,10,10,20,20,20,20]. In other words, I need each of two elements duplicated 4 times, rather than the group of two repeated 4 times.
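To make the difference between the two lane patterns concrete, here is a minimal sketch in plain C++. It is only a host-side model, not C7x streaming engine code, and the function names `group_dup` and `element_dup` are hypothetical helpers introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain C++ model of the two lane patterns (not C7x code).
// `lanes` is the number of 32-bit lanes in one vector (8 for 256 bits).

// Group duplication: the first `group` elements are repeated as a block.
std::vector<float> group_dup(const std::vector<float>& src,
                             std::size_t group, std::size_t lanes) {
    assert(group > 0 && group <= src.size() && lanes % group == 0);
    std::vector<float> out(lanes);
    for (std::size_t i = 0; i < lanes; ++i) {
        out[i] = src[i % group];  // index wraps around inside the group
    }
    return out;
}

// Element duplication: each element is repeated `factor` times in place.
std::vector<float> element_dup(const std::vector<float>& src,
                               std::size_t factor, std::size_t lanes) {
    assert(factor > 0 && lanes % factor == 0 && lanes / factor <= src.size());
    std::vector<float> out(lanes);
    for (std::size_t i = 0; i < lanes; ++i) {
        out[i] = src[i / factor];  // index advances once every `factor` lanes
    }
    return out;
}
```

With B = [10,20,30,40] and 8 lanes, `group_dup(B, 2, 8)` gives the pattern I am getting now, and `element_dup(B, 4, 8)` gives the pattern I need.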

Do you know how I can address this?

Best,

0 Qutaiba Saleh over 1 year ago in reply to Iman Fakhari

TI__Expert 4640 points

Hi Iman,

Sorry for the confusion. Please use the Element duplication property instead. See this example:

//Initialize template to default value

__SE_TEMPLATE_v1 seTemplate = __gen_SE_TEMPLATE_v1();

//Specify element type

seTemplate.ELETYPE = se_eletype<float_vec>::value;

seTemplate.VECLEN = se_veclen<float_vec>::value;

//Duplicate each 32-bit element 4 times, so two source elements fill one 256-bit vector

seTemplate.ELEDUP = __SE_ELEDUP_4X;

//Specify the type of transfer

seTemplate.DIMFMT = __SE_DIMFMT_1D;

seTemplate.ICNT0 = ARRAY_LENGTH;

This should work for your use case.
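As a rough way to check results from the DSP, the computation from the original question can be modeled on a host in portable C++. This is a sketch under assumptions, not C7x intrinsics: the function name `grouped_mac` is hypothetical, the inner loop simply mimics one 8-lane vector with B arriving element-duplicated 4 times, and N is assumed to be even so that only whole vectors are processed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side reference model: C[i] = sum over k of A[4*i + k] * B[i], k = 0..3.
// Each outer iteration stands in for one 256-bit (8 x float32) vector step.
std::vector<float> grouped_mac(const std::vector<float>& a,
                               const std::vector<float>& b) {
    assert(a.size() == 4 * b.size());
    assert(b.size() % 2 == 0);  // assumption: whole 8-lane vectors only

    std::vector<float> c(b.size(), 0.0f);
    for (std::size_t base = 0; base < a.size(); base += 8) {
        float prod[8];
        for (std::size_t lane = 0; lane < 8; ++lane) {
            // Lane pattern of B after 4x element duplication:
            // b0,b0,b0,b0,b1,b1,b1,b1
            const float b_dup = b[(base + lane) / 4];
            prod[lane] = a[base + lane] * b_dup;
        }
        // Reduce each group of four products into one output element.
        c[base / 4]     = prod[0] + prod[1] + prod[2] + prod[3];
        c[base / 4 + 1] = prod[4] + prod[5] + prod[6] + prod[7];
    }
    return c;
}
```

For A = [1,...,8] and B = [10,20], this yields C = [100, 520], which matches the expression given in the original question.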

Best regards,

Qutaiba
