TDA4VM: C7X Streaming Engine: Extracting interleaved data

Part Number: TDA4VM


I want to extract interleaved data from an incoming buffer using the Streaming Engine.

The data is formatted as below:

[a,b,c,d],[a,b,c,d],...[a,b,c,d]     (HEIGHT rows)

I want to extract 16 elements at a time into 3 vectors A, B, C (I don't care about the 4th element).
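To make the goal concrete, here is a scalar reference for one group of 16, written as plain host C rather than C7x code. It is only a sketch: the name `deinterleave16_ref` is made up for illustration, and it assumes the buffer holds tightly packed 32-bit floats in a, b, c, d order.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar reference: split 16 interleaved [a,b,c,d] groups (64 floats)
 * into three 16-element arrays. The d component is skipped.
 * Hypothetical helper, not part of any TI library. */
void deinterleave16_ref(const float *src, float a[16], float b[16], float c[16])
{
    assert(src != NULL);
    for (size_t i = 0; i < 16; i++) {
        a[i] = src[4 * i + 0];  /* first component of group i  */
        b[i] = src[4 * i + 1];  /* second component of group i */
        c[i] = src[4 * i + 2];  /* third component of group i  */
        /* src[4 * i + 3] (d) is intentionally ignored */
    }
}
```

The Streaming Engine version should produce the same three vectors per group of 16, with one advance per vector.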

Here's my descriptor setup:

static inline __SE_TEMPLATE_v1 InterleaveTemplate(const int WIDTH, const int HEIGHT)
{
    __SE_TEMPLATE_v1 cfg = __gen_SE_TEMPLATE_v1();
    cfg.VECLEN = __SE_VECLEN_16ELEMS;   // Stream 16 elements per advance
    cfg.ELETYPE = __SE_ELETYPE_32BIT;   // Each element is 32 bits (float)
    cfg.ICNT0 = 1;                      // Get one interleaved element (a, b, or c. Skip d.)
    cfg.DIM1 = 4;                       // Go to next interleaved element (a, b, or c. Skip d.)
    cfg.ICNT1 = 16;                     // 16 for each a, b, and c
    cfg.DIM2 = 1;                       // Go to next element (a, b, or c. Skip d.)
    cfg.ICNT2 = 3;                      // Do a, b, c
    cfg.DIM3 = 16 * 4;                  // Advance to the next 16 elements
    cfg.ICNT3 = WIDTH * HEIGHT / 16;    // Repeat for each element, 16 at a time
    cfg.DIMFMT = __SE_DIMFMT_4D;        // Streaming 4 dimensions
    return cfg;
}
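
For reference, the order of element offsets I expect from this descriptor can be modeled on the host with plain nested loops. This is a sketch under assumptions: dimension 0 is treated as the innermost loop with unit stride, the DIMn values are taken to be in elements, and the helper name `se_offsets_4d` is invented for illustration. It models only the address order, not how the hardware packs elements into a vector on each advance, which is exactly what I am unsure about.

```c
#include <assert.h>
#include <stddef.h>

/* Host-side model of a 4-D access pattern: writes the element offsets
 * visited by nested loops (dimension 0 innermost) into out[].
 * Returns the number of offsets written, at most max_out.
 * Hypothetical helper, not a TI API. */
size_t se_offsets_4d(int icnt0,
                     int icnt1, int dim1,
                     int icnt2, int dim2,
                     int icnt3, int dim3,
                     int *out, size_t max_out)
{
    assert(out != NULL);
    size_t n = 0;
    for (int i3 = 0; i3 < icnt3; i3++)
        for (int i2 = 0; i2 < icnt2; i2++)
            for (int i1 = 0; i1 < icnt1; i1++)
                for (int i0 = 0; i0 < icnt0; i0++) {
                    if (n == max_out)
                        return n;   /* output buffer full */
                    out[n++] = i3 * dim3 + i2 * dim2 + i1 * dim1 + i0;
                }
    return n;
}
```

Calling it with the values above (ICNT0 = 1, ICNT1 = 16, DIM1 = 4, ICNT2 = 3, DIM2 = 1, DIM3 = 64) yields offsets 0, 4, ..., 60, then 1, 5, ..., 61, then 2, 6, ..., 62, and then continues from 64 for the next group. That is the a, b, c ordering I am after.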

Then the code looks like this:

__SE_TEMPLATE_v1 cfg = InterleaveTemplate(BLOCK_WIDTH, BLOCK_HEIGHT);
__SE0_OPEN(ptr, cfg);

for (int i = 0; i < N_ITERATIONS; i++)
{
    float16 a = __SE0ADV(float16);
    float16 b = __SE0ADV(float16);
    float16 c = __SE0ADV(float16);
    /* rest of code */
}


However, after testing, it looks like ICNT0 must be 16 in order to get 16 elements in the destination vector with a single __SE0ADV(). Is that correct? If so, what would be a better way to achieve my goal?