AM62A7: Config grouping duplication in DSP streaming engine

Part Number: AM62A7

Hi,

I have two arrays of float32 input data and a DSP with 256-bit vector units. I'm trying to vectorize my calculations to operate on vectors of 8 floats (8 x 32 bits).

My first array,

A = [1,2,3,4,5,6,7,8,...]

has a size of 4N, and the second one,

B=[10,20,30,40,...]

has a size of N. I want to configure my DSP streaming engines to multiply each B element by four consecutive A elements and sum the products. So if C is the result, it would be like

C=[1*10+2*10+3*10+4*10,5*20+6*20+7*20+8*20, ...].
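For reference, the target computation can be written as a plain scalar loop. This is only a host-side sketch in standard C++ (no C7x intrinsics); the function name `scale_and_sum_groups` is made up for illustration, and it assumes A holds exactly four elements for every element of B.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar reference: C[i] = A[4i]*B[i] + A[4i+1]*B[i] + A[4i+2]*B[i] + A[4i+3]*B[i].
// Hypothetical helper, useful as a golden model when checking a vectorized version.
std::vector<float> scale_and_sum_groups(const std::vector<float>& A,
                                        const std::vector<float>& B) {
    assert(A.size() == 4 * B.size());  // A has size 4N, B has size N
    std::vector<float> C(B.size());
    for (std::size_t i = 0; i < B.size(); ++i) {
        float acc = 0.0f;
        for (std::size_t k = 0; k < 4; ++k) {
            acc += A[4 * i + k] * B[i];
        }
        C[i] = acc;
    }
    return C;
}
```

With A = [1,...,8] and B = [10,20], this yields C = [100, 520], matching the expression above.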

I know how to do this operation with float4; however, I have 256 bits available, and I was wondering whether there is some grouping configuration that can do this operation for me.

Thanks

  • Hello Iman,

     

    It is our pleasure to help. 

    Yes, you can use streaming engine grouping duplication. This is an initial code to set the streaming engine for your use case:

     

    //Initialize template to default value
    __SE_TEMPLATE_v1 seTemplate = __gen_SE_TEMPLATE_v1();

    //Specify element type
    seTemplate.ELETYPE = se_eletype<float_vec>::value;

    //Turn on Group Duplication
    seTemplate.GRPDUP = __SE_GRPDUP_ON;

    //Specify group size using VECLEN field.
    seTemplate.VECLEN = __SE_VECLEN_2ELEMS; // This is specific to your use case: duplicate a group of two elements of the B array to fill an entire vector.

    //Specify the type of transfer
    seTemplate.DIMFMT = __SE_DIMFMT_1D;

    seTemplate.ICNT0 = ARRAY_LENGTH;
    .
    .
    // SE open
    .
    .
    //Fetch input data
    Loop for 4xSize of B = 4N

    Please, let me know if this answers your question.

     

    Best regards,

     

    Qutaiba

  • Thanks for your quick reply.

    The config you mentioned will turn B from [10,20,30,...] into [10,20,10,20,10,20,10,20]; however, I need it to be [10,10,10,10,20,20,20,20]. That is, I need each of the two elements to be duplicated 4 times, rather than the group of 2 being duplicated 4 times.
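The two lane patterns described here can be modelled on a host machine. The sketch below is plain C++, not the C7x streaming engine API; `group_dup`, `element_dup`, and `kLanes` are hypothetical names, and the 8-lane width assumes float32 elements in a 256-bit vector.

```cpp
#include <array>
#include <cstddef>
#include <vector>

constexpr std::size_t kLanes = 8;  // float32 lanes in a 256-bit vector (assumed)

// Group duplication model: a group of `group` consecutive elements is
// repeated as a whole until the vector is full.
std::array<float, kLanes> group_dup(const std::vector<float>& src,
                                    std::size_t start, std::size_t group) {
    std::array<float, kLanes> v{};
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
        v[lane] = src[start + lane % group];
    }
    return v;
}

// Element duplication model: each element is repeated `factor` times in
// place before moving on to the next element.
std::array<float, kLanes> element_dup(const std::vector<float>& src,
                                      std::size_t start, std::size_t factor) {
    std::array<float, kLanes> v{};
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
        v[lane] = src[start + lane / factor];
    }
    return v;
}
```

For B = [10,20,30,40], `group_dup(B, 0, 2)` gives [10,20,10,20,10,20,10,20], while `element_dup(B, 0, 4)` gives the wanted [10,10,10,10,20,20,20,20].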

    Do you know how I can address this?

    Best,

  • Hi Iman,

     

    Sorry for the confusion. Please use the Element duplication property instead. See this example:

     

    //Initialize template to default value
    __SE_TEMPLATE_v1 seTemplate = __gen_SE_TEMPLATE_v1();

    //Specify element type and vector length
    seTemplate.ELETYPE   = se_eletype<float_vec>::value;
    seTemplate.VECLEN    = se_veclen<float_vec>::value;

    //Duplicate each 32-bit element 4 times, so that two elements of B fill one 256-bit vector
    seTemplate.ELEDUP    = __SE_ELEDUP_4X;

    //Specify the type of transfer
    seTemplate.DIMFMT = __SE_DIMFMT_1D;

    seTemplate.ICNT0 = ARRAY_LENGTH;
    .
    .

     

    This should work for your use case.
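To show how the element-duplicated B stream lines up with A, here is a rough host-side model of the vector loop in plain C++. It does not use C7x intrinsics. The name `model_vector_loop`, the constants, and the per-group summation step are assumptions for illustration; the thread does not say how the four products are reduced on the DSP.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kLanes = 8;  // float32 lanes in a 256-bit vector (assumed)
constexpr std::size_t kDup = 4;    // duplication factor, mirroring __SE_ELEDUP_4X

// Each outer iteration stands in for one vector fetch: 8 elements of A and
// 2 elements of B, with each B element repeated across 4 adjacent lanes.
std::vector<float> model_vector_loop(const std::vector<float>& A,
                                     const std::vector<float>& B) {
    assert(A.size() == kDup * B.size());
    assert(A.size() % kLanes == 0);
    std::vector<float> C(B.size());
    for (std::size_t base = 0; base < A.size(); base += kLanes) {
        float prod[kLanes];
        for (std::size_t lane = 0; lane < kLanes; ++lane) {
            // Lane-wise multiply against the element-duplicated B stream.
            prod[lane] = A[base + lane] * B[(base + lane) / kDup];
        }
        // Sum each group of 4 lanes into one output (assumed reduction step).
        for (std::size_t g = 0; g < kLanes / kDup; ++g) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < kDup; ++k) {
                acc += prod[g * kDup + k];
            }
            C[base / kDup + g] = acc;
        }
    }
    return C;
}
```

Each modelled vector iteration consumes two B elements and produces two C elements, so a B stream of N elements needs N/2 fetches.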

     

    Best regards,

     

    Qutaiba