[FAQ] TDA4AH-Q1: C7x Scalable Vector Programming

Betsy Varughese

Part Number: TDA4AH-Q1

Hi Team,

Can you explain scalable vector programming with an example?

Regards,

Betsy Varughese

What is the purpose of scalable vector programming?

The C7x family has several C7000 variants, and the vector length differs between them. On some variants (e.g., C7100 and C7120) a vector can be up to 512 bits wide; on other variants (e.g., C7504 and C7524) a vector can be up to 256 bits wide. It is therefore very helpful to write vector code in a vector-length-agnostic way, i.e., to write the C++ code for a particular algorithm once and have it compile and run on each C7000 variant without changes to the C++ code, using the maximum vector size available on that variant. To support this, the C7000 C++ compiler and C7000 host emulation provide a feature called the Scalable Vector Programming Model.

The scalable vector programming model consists of scalable vector types and associated C++ type traits.

Scalable Vector Types:

The scalable vector types, along with the associated C++ type traits, allow the programmer to write code that compiles and runs seamlessly on all C7x variants. They can be used only in C++ code, not in C code. When a scalable vector type is used, the size of the type depends on the C7x variant being compiled for. For example, the c7x::float_vec type holds 16 float elements (512 bits) on C7100 and C7120, but only 8 float elements (256 bits) on C7504 and C7524.
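
To make the size dependence concrete, here is a small host-side sketch in plain C++ that computes how many elements fit in one full vector for the two widths mentioned above. It does not use the TI headers; the names lanes_in_full_vector, kWideVectorBits and kNarrowVectorBits are hypothetical and exist only for this illustration. On a real C7x build, c7x::element_count_of<T>::value gives you this number for a scalable vector type.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical constants: the two vector widths (in bits) quoted in the text.
constexpr std::size_t kWideVectorBits   = 512;  // e.g. C7100, C7120
constexpr std::size_t kNarrowVectorBits = 256;  // e.g. C7504, C7524

// Number of elements of type T that fit in one full vector of the given width.
template <typename T>
constexpr std::size_t lanes_in_full_vector(std::size_t vector_bits)
{
    return vector_bits / (8 * sizeof(T));
}

// The float case from the text: 16 lanes at 512 bits, 8 lanes at 256 bits.
static_assert(lanes_in_full_vector<float>(kWideVectorBits) == 16, "512-bit float vector");
static_assert(lanes_in_full_vector<float>(kNarrowVectorBits) == 8, "256-bit float vector");
```

The same arithmetic gives 64 or 32 lanes for 8-bit elements and 32 or 16 lanes for 16-bit elements, which are the counts that show up in the sample code further down.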

To add scalability to a loop that uses the streaming engine, follow these steps:

1. Include c7x_scalable.h in the source code, i.e. #include <c7x_scalable.h>

Note: These utilities are available for use in C++ code only, because their implementation relies on C++ language features.

2. Based on the implementation logic, use the available APIs from c7x_scalable.h

Sample code using C7x scalable vector programming concepts:

Below is an example of a function accepting two integer vectors, adding them element-wise, and returning an integer vector. C7000 scalable vector types can be accessed by including the c7x_scalable.h file in your source file.

#include <c7x.h>
#include <c7x_scalable.h>

c7x::int_vec add_two_int_vectors(c7x::int_vec a, c7x::int_vec b)
{
    return a + b;
}

Here, c7x::int_vec will have 16 elements on the C7100 and C7120 variants, and 8 elements on the C7504 and C7524 variants.

Sample code:

1. Without using streaming engine

#include <c7x.h>

#define ARRAY_SIZE (64)
#define SIMD_WIDTH (64)

void vadd_exec_c7x(int8_t *pInA, int8_t *pInB, int8_t *pOutC)
{   
    for(int32_t ctr = 0; ctr < ARRAY_SIZE; ctr += SIMD_WIDTH) {
 
        // Read a vector of 64-8b elements from input Array A
        uchar64 vInA = *stov_ptr(uchar64, (int8_t *)(pInA + ctr));
        // Reinterprets it as ushort_vec 
        ushort32 InA = __as_ushort32(vInA);
        
        // Read a vector of 64-8b elements from input Array B
        uchar64 vInB = *stov_ptr(uchar64, (int8_t *)(pInB + ctr));
        // Reinterprets it as ushort_vec 
        ushort32 InB = __as_ushort32(vInB);

        // Added 32-16b elements in parallel
        ushort32 vOutC = (InA + InB);
        // Converts back to uchar_vec
        uchar64  OutC = __as_uchar64(vOutC);

        //Store 64-8b elements to output array C
        *stov_ptr(uchar64, (int8_t *)(pOutC + ctr)) = OutC;
    }

}
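
One detail of the listing above is worth spelling out: because the bytes are reinterpreted as 16-bit lanes before the add, a carry out of the lower byte of a pair goes into the upper byte of the same pair. The following host-side sketch in plain C++ shows this for a single lane. It assumes little-endian lane layout, and the helper names (pack_lane, add_lane, low_byte, high_byte) are hypothetical, not TI APIs.

```cpp
#include <cstdint>

// Pack two adjacent bytes into one 16-bit lane, assuming little-endian order:
// the byte at the lower address becomes the low half of the lane.
constexpr std::uint16_t pack_lane(std::uint8_t low, std::uint8_t high)
{
    return static_cast<std::uint16_t>(low | (high << 8));
}

// One lane of a 16-bit vector add: the sum wraps modulo 65536.
constexpr std::uint16_t add_lane(std::uint16_t a, std::uint16_t b)
{
    return static_cast<std::uint16_t>(a + b);
}

// Split a lane back into its two bytes, as the reinterpret to 8-bit lanes does.
constexpr std::uint8_t low_byte(std::uint16_t lane)
{
    return static_cast<std::uint8_t>(lane & 0xFF);
}

constexpr std::uint8_t high_byte(std::uint16_t lane)
{
    return static_cast<std::uint8_t>(lane >> 8);
}
```

For example, adding the byte pairs (200, 1) and (100, 2) this way gives (44, 4), whereas a strictly per-byte add would give (44, 3). When no low byte overflows, both approaches give the same bytes.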

   
With scalable vector programming concepts:

#include <c7x_scalable.h>

using namespace  c7x;

#define ARRAY_SIZE (64)

void vadd_exec_c7x_scalable(int8_t *pInA, int8_t *pInB, int8_t *pOutC)
{  
    typedef typename c7x::make_full_vector<c7x::uchar_vec>::type  vec1;
    typedef typename c7x::make_full_vector<c7x::ushort_vec>::type vec2;
    int32_t eleCount = c7x::element_count_of<vec1>::value;
    for(int32_t ctr = 0; ctr < ARRAY_SIZE; ctr += eleCount){
        // Read one full vector of 8-bit elements from input array A
        vec1 vInA = *stov_ptr(vec1, (int8_t *)(pInA + ctr));
        // Reinterpret it as a vector of 16-bit elements.
        // c7x::reinterpret is assumed here as the width-independent
        // replacement for __as_ushort32, which is fixed at 512 bits.
        vec2 InA = c7x::reinterpret<vec2>(vInA);

        // Read one full vector of 8-bit elements from input array B
        vec1 vInB = *stov_ptr(vec1, (int8_t *)(pInB + ctr));
        // Reinterpret it as a vector of 16-bit elements
        vec2 InB = c7x::reinterpret<vec2>(vInB);

        // Add the 16-bit elements in parallel
        vec2 vOutC = (InA + InB);
        // Reinterpret the result back as 8-bit elements
        vec1 OutC = c7x::reinterpret<vec1>(vOutC);

        // Store one full vector of 8-bit elements to output array C
        *stov_ptr(vec1, (int8_t *)(pOutC + ctr)) = OutC;
    }

}
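
To see what "the same source, different vector width" means in practice, here is a host-side sketch in plain C++ that mirrors the loop above: load one vector of bytes from each input, treat the bytes as 16-bit lanes, add, and store. The lane count N stands in for c7x::element_count_of<vec1>::value (64 on a 512-bit variant, 32 on a 256-bit variant). The names vadd_emulated and kArraySize are hypothetical, little-endian lane layout is assumed, and the sketch only handles sizes that are a multiple of N.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kArraySize = 64;  // matches ARRAY_SIZE in the listing

// Host-side stand-in for the scalable loop. N is the number of 8-bit lanes in
// one full vector (64 for a 512-bit machine, 32 for a 256-bit machine).
// Returns the number of loop iterations that were executed.
template <std::size_t N>
std::size_t vadd_emulated(const std::uint8_t* in_a, const std::uint8_t* in_b,
                          std::uint8_t* out_c)
{
    static_assert(N % 2 == 0, "bytes are paired into 16-bit lanes");
    static_assert(kArraySize % N == 0, "this sketch has no tail handling");
    assert(in_a != nullptr && in_b != nullptr && out_c != nullptr);

    std::size_t iterations = 0;
    for (std::size_t ctr = 0; ctr < kArraySize; ctr += N) {
        // One "vector" of N bytes, processed as N/2 little-endian 16-bit lanes.
        for (std::size_t lane = 0; lane < N / 2; ++lane) {
            const std::size_t i = ctr + 2 * lane;
            const std::uint16_t a =
                static_cast<std::uint16_t>(in_a[i] | (in_a[i + 1] << 8));
            const std::uint16_t b =
                static_cast<std::uint16_t>(in_b[i] | (in_b[i + 1] << 8));
            const std::uint16_t c = static_cast<std::uint16_t>(a + b);
            out_c[i]     = static_cast<std::uint8_t>(c & 0xFF);
            out_c[i + 1] = static_cast<std::uint8_t>(c >> 8);
        }
        ++iterations;
    }
    return iterations;
}
```

Calling vadd_emulated<64> and vadd_emulated<32> on the same 64-byte inputs produces identical output; only the trip count changes (one iteration versus two). That is the property the scalable types give you on the real hardware without touching the source.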


2. Using Streaming Engine

#include <c7x.h>

#define ARRAY_SIZE (64)
#define SIMD_WIDTH (64)

void vadd_exec_c7x (int8_t *pInA, int8_t *pInB, int8_t *pOutC)
{   
    // SE- SA config params
    __SE_TEMPLATE_v1 se0Params;
    __SA_TEMPLATE_v1 sa0Params;

    __SE_ELETYPE SE_ELETYPE;
    __SE_VECLEN  SE_VECLEN;
    __SA_VECLEN  SA_VECLEN;

    SE_VECLEN  = __SE_VECLEN_64ELEMS;
    SA_VECLEN  = __SA_VECLEN_64ELEMS;
    SE_ELETYPE = __SE_ELETYPE_8BIT;

    /**********************************************************************/
    /* Prepare streaming engine 1 to fetch the input                      */
    /**********************************************************************/
    se0Params = __gen_SE_TEMPLATE_v1();
    se0Params.ICNT0   = ARRAY_SIZE;
    se0Params.ELETYPE = SE_ELETYPE;
    se0Params.VECLEN  = SE_VECLEN;
    se0Params.DIMFMT  = __SE_DIMFMT_1D;

    /**********************************************************************/
    /* Prepare SA template to store output                                */
    /**********************************************************************/
    sa0Params = __gen_SA_TEMPLATE_v1();
    sa0Params.ICNT0  = ARRAY_SIZE;
    sa0Params.DIM1   = ARRAY_SIZE;
    sa0Params.VECLEN = SA_VECLEN;
    sa0Params.DIMFMT = __SA_DIMFMT_1D;

    __SE0_OPEN((int8_t *)pInA, se0Params);
    __SE1_OPEN((int8_t *)pInB, se0Params);
    __SA0_OPEN(sa0Params);

    for(int32_t ctr = 0; ctr < ARRAY_SIZE; ctr += SIMD_WIDTH){

        // Read a vector of 64-8b elements from input Array A
        uchar64 vInA = __SE0ADV(uchar64);
        // Reinterprets it as ushort_vec
        ushort32 InA = __as_ushort32(vInA);

        // Read a vector of 64-8b elements from input Array B
        uchar64 vInB = __SE1ADV(uchar64);
        // Reinterprets it as ushort_vec
        ushort32 InB = __as_ushort32(vInB);

        // Added 32-16b elements in parallel
        ushort32 vOutC = (InA + InB);
        // Converts back to uchar_vec
        uchar64 OutC = __as_uchar64(vOutC);

        // Store 64-8b elements to output array C and advance the address
        // generator, so that a later iteration writes the next vector
        *__SA0ADV(uchar64, pOutC) = OutC;
    }
    
    __SE0_CLOSE();
    __SE1_CLOSE();
    __SA0_CLOSE();

}

   
With scalable vector programming concepts:

#include <c7x_scalable.h>

using namespace  c7x;

#define ARRAY_SIZE (64)

void vadd_exec_c7x_scalable (int8_t *pInA, int8_t *pInB, int8_t *pOutC)
{   
    typedef typename c7x::make_full_vector<c7x::uchar_vec>::type  vec1;
    typedef typename c7x::make_full_vector<c7x::ushort_vec>::type vec2;
    int32_t eleCount = c7x::element_count_of<vec1>::value;

    // SE- SA config params
    __SE_TEMPLATE_v1 se0Params;
    __SA_TEMPLATE_v1 sa0Params;

    __SE_ELETYPE SE_ELETYPE;
    __SE_VECLEN  SE_VECLEN;
    __SA_VECLEN  SA_VECLEN;

    SE_VECLEN  = c7x::se_veclen<vec1>::value;
    SA_VECLEN  = c7x::sa_veclen<vec1>::value;
    SE_ELETYPE = c7x::se_eletype<vec1>::value;

    /**********************************************************************/
    /* Prepare streaming engine 1 to fetch the input                      */
    /**********************************************************************/
    se0Params = __gen_SE_TEMPLATE_v1();
    se0Params.ICNT0   = ARRAY_SIZE;
    se0Params.ELETYPE = SE_ELETYPE;
    se0Params.VECLEN  = SE_VECLEN;
    se0Params.DIMFMT  = __SE_DIMFMT_1D;

    /**********************************************************************/
    /* Prepare SA template to store output                                */
    /**********************************************************************/
    sa0Params = __gen_SA_TEMPLATE_v1();
    sa0Params.ICNT0  = ARRAY_SIZE;
    sa0Params.DIM1   = ARRAY_SIZE;
    sa0Params.VECLEN = SA_VECLEN;
    sa0Params.DIMFMT = __SA_DIMFMT_1D;

    __SE0_OPEN((int8_t *)pInA, se0Params);
    __SE1_OPEN((int8_t *)pInB, se0Params);
    __SA0_OPEN(sa0Params);

    for(int32_t ctr = 0; ctr < ARRAY_SIZE; ctr += eleCount){
        // Read one full vector of 8-bit elements from input array A
        vec1 vInA = c7x::strm_eng<0, vec1>::get_adv();
        // Reinterpret it as a vector of 16-bit elements.
        // c7x::reinterpret is assumed here as the width-independent
        // replacement for __as_ushort32, which is fixed at 512 bits.
        vec2 InA = c7x::reinterpret<vec2>(vInA);

        // Read one full vector of 8-bit elements from input array B
        vec1 vInB = c7x::strm_eng<1, vec1>::get_adv();
        // Reinterpret it as a vector of 16-bit elements
        vec2 InB = c7x::reinterpret<vec2>(vInB);

        // Add the 16-bit elements in parallel
        vec2 vOutC = (InA + InB);
        // Reinterpret the result back as 8-bit elements
        vec1 OutC = c7x::reinterpret<vec1>(vOutC);

        // Store one full vector of 8-bit elements to output array C.
        // Fetch the predicate first, then advance the address generator.
        __vpred tmp = c7x::strm_agen<0, vec1>::get_vpred();
        vec1 *VB1 = c7x::strm_agen<0, vec1>::get_adv(pOutC);
        __vstore_pred(tmp, VB1, OutC);

    }

    __SE0_CLOSE();
    __SE1_CLOSE();
    __SA0_CLOSE();
}
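
The predicated store at the end of the loop above is what keeps the code safe when the element count is not a multiple of the vector length: the address generator supplies a predicate that switches off the lanes past the end of the data. In this example ARRAY_SIZE is a multiple of both 64 and 32, so every lane is enabled. The host-side sketch below, in plain C++, emulates the idea for a stream whose length is not a multiple of the lane count. LaneMask, make_mask and store_masked are hypothetical names for this illustration and are not TI APIs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a vector predicate: one flag per 8-bit lane.
template <std::size_t N>
using LaneMask = std::array<bool, N>;

// Build the mask for the vector that starts at element `offset` of a stream
// holding `total` elements: lanes at or past the end are switched off.
template <std::size_t N>
LaneMask<N> make_mask(std::size_t offset, std::size_t total)
{
    LaneMask<N> mask{};
    for (std::size_t lane = 0; lane < N; ++lane) {
        mask[lane] = (offset + lane) < total;
    }
    return mask;
}

// Store only the enabled lanes; disabled lanes leave memory untouched.
template <std::size_t N>
void store_masked(const LaneMask<N>& mask, std::uint8_t* dst,
                  const std::array<std::uint8_t, N>& value)
{
    assert(dst != nullptr);
    for (std::size_t lane = 0; lane < N; ++lane) {
        if (mask[lane]) {
            dst[lane] = value[lane];
        }
    }
}
```

With 32 lanes and 40 elements, the first vector is stored in full and the second one writes only its first 8 lanes, so nothing beyond element 39 is touched.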


Regards,
Betsy Varughese
