Hi,
I want to extract interleaved data from an incoming buffer using the Streaming Engine.
The data is formatted as below:
WIDTH
[a,b,c,d],[a,b,c,d],...[a,b,c,d],
[a,b,c,d],[a,b,c,d],...[a,b,c,d],
HEIGHT ...
[a,b,c,d],[a,b,c,d],...[a,b,c,d]
I want to extract 16 elements at a time into 3 vectors A, B, C (I don't care about the 4th element).
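For reference, the intended result can be written as a plain scalar loop. The sketch below is host-side C, not Streaming Engine code. It assumes the buffer holds 32-bit floats in groups of four and that the group count is a multiple of 16. The names `deinterleave16`, `deinterleave_all`, `LANES`, and `STRIDE` are hypothetical and used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define LANES  16  /* elements per output vector (matches a float16) */
#define STRIDE 4   /* floats per interleaved group: a, b, c, d */

/* Scalar reference: split one block of 16 [a,b,c,d] groups into A, B, C.
 * src points at the first group of the block; d is never stored. */
static void deinterleave16(const float *src, float *A, float *B, float *C)
{
    for (size_t i = 0; i < LANES; i++) {
        A[i] = src[i * STRIDE + 0];
        B[i] = src[i * STRIDE + 1];
        C[i] = src[i * STRIDE + 2];
        /* src[i * STRIDE + 3] (d) is skipped */
    }
}

/* Walk the whole buffer, 16 groups at a time.
 * n_groups is WIDTH * HEIGHT and is assumed to be a multiple of 16. */
static void deinterleave_all(const float *src, size_t n_groups,
                             float *A, float *B, float *C)
{
    assert(n_groups % LANES == 0);
    for (size_t blk = 0; blk < n_groups / LANES; blk++) {
        deinterleave16(src + blk * LANES * STRIDE,
                       A + blk * LANES,
                       B + blk * LANES,
                       C + blk * LANES);
    }
}
```

Each call to `deinterleave16` corresponds to one loop iteration that would read three vectors (A, then B, then C) from the stream.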
Here's my descriptor setup:
static inline __SE_TEMPLATE_v1 InterleaveTemplate(const int WIDTH, const int HEIGHT)
{
    __SE_TEMPLATE_v1 cfg = __gen_SE_TEMPLATE_v1();
    cfg.VECLEN  = __SE_VECLEN_16ELEMS;  // Stream 16 elements per advance
    cfg.ELETYPE = __SE_ELETYPE_32BIT;   // Each element is 32 bits (float)
    cfg.ICNT0   = 1;                    // Get one interleaved element (a, b, or c. Skip d.)
    cfg.DIM1    = 4;                    // Go to next interleaved element (a, b, or c. Skip d.)
    cfg.ICNT1   = 16;                   // 16 for each a, b, and c
    cfg.DIM2    = 1;                    // Go to next element (a, b, or c. Skip d.)
    cfg.ICNT2   = 3;                    // Do a, b, c
    cfg.DIM3    = 16 * 4;               // Advance to the next 16 elements
    cfg.ICNT3   = WIDTH * HEIGHT / 16;  // Repeat for each element, 16 at a time
    cfg.DIMFMT  = __SE_DIMFMT_4D;       // Streaming 4 dimensions
    return cfg;
}
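To make the intended access order explicit, here is a small host-side C sketch that walks the same four-level loop nest and records the element offset visited at each step, computed as i0 + i1*DIM1 + i2*DIM2 + i3*DIM3. It models only the order of addresses, not how the hardware packs those elements into a vector per advance, which is the open question here. The `LoopNest` struct and `walk_offsets` function are hypothetical helpers, not part of any TI header.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of the template fields used above, in element units. */
typedef struct {
    int icnt0, icnt1, icnt2, icnt3;  /* iteration counts, innermost first */
    int dim1, dim2, dim3;            /* strides for loop levels 1..3 */
} LoopNest;

/* Write element offsets in the order the loop nest visits them.
 * Stops when `cap` entries have been written; returns the number written. */
static size_t walk_offsets(const LoopNest *n, int *out, size_t cap)
{
    size_t k = 0;
    assert(n != NULL && out != NULL);
    for (int i3 = 0; i3 < n->icnt3; i3++)
        for (int i2 = 0; i2 < n->icnt2; i2++)
            for (int i1 = 0; i1 < n->icnt1; i1++)
                for (int i0 = 0; i0 < n->icnt0; i0++) {
                    if (k == cap)
                        return k;
                    /* The innermost level always has a stride of one element. */
                    out[k++] = i0 + i1 * n->dim1 + i2 * n->dim2 + i3 * n->dim3;
                }
    return k;
}
```

With ICNT0 = 1, ICNT1 = 16, DIM1 = 4, ICNT2 = 3, DIM2 = 1 and DIM3 = 64, the walk visits offsets 0, 4, ..., 60 (all the a values), then 1, 5, ..., 61 (b), then 2, 6, ..., 62 (c), and then moves on to offset 64 for the next block.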
Then the code looks like this:
__SE_TEMPLATE_v1 cfg = InterleaveTemplate(BLOCK_WIDTH, BLOCK_HEIGHT);
__SE0_OPEN(ptr, cfg);
const int N_ITERATIONS = BLOCK_WIDTH * BLOCK_HEIGHT / 16;
for (int i = 0; i < N_ITERATIONS; i++)
{
    float16 a = __SE0ADV(float16);
    float16 b = __SE0ADV(float16);
    float16 c = __SE0ADV(float16);
    /* rest of code */
}
__SE0_CLOSE();
However, after testing, it looks like ICNT0 must be 16 in order to get 16 elements in the destination vector with a single __SE0ADV(). Is that correct? If so, what would be a better way to achieve my goal?
Thanks,
Fred