PROCESSOR-SDK-J784S4: Size of C++ source files

Thierry BERNIER

Prodigy 80 points

Part Number: PROCESSOR-SDK-J784S4
Other Parts Discussed in Thread: MATHLIB

Hi TI,

In the same C++ source file (around 800 lines), I have :

a C7X kernel code inside a C++ kernel() function,
and an init() function to prepare a context for that kernel.

The init() function evaluates an array of constants for the kernel, and prepares a bunch of SE/SA templates. (FFTlib-like pattern of API)

The init() function also runs something like a kernel loop to evaluate the array of constants, and this loop uses sin/cos inline functions from MATHlib.

The execution time of the kernel() function is 67 units of time.

If I put the part of the init() function dealing with the array of constants calculation, inside another source file (separate compilation), without changing the kernel() function, then the execution time of kernel() is better, down to 58 units.

I suspect that the compiler/optimizer generates different code for kernel() depending on the amount of code in the same source file, even though that code is in independent functions.
Does that sound possible to you?

I see in MATHlib inline functions that their code is put in a dedicated text section. (something like "text;optci"). What is the reason for that, and should I do the same?

over 1 year ago

0 Asha Bhandarkar over 1 year ago

TI__Genius 10170 points

Hi Thierry,

Just to confirm - this is how the code is roughly structured? And the change you are making would be removing the for loop in the init function to another file?

init() {
    SE/SA param initialization
    for() {
        MATHLIB_sin()
        MATHLIB_cos()
    }
}

kernel() {
    for() {
        Processing code use SE/SA
    }
}

Thierry BERNIER said:
I see in MATHlib inline functions that their code is put in a dedicated text section. (something like "text;optci")

Can you point to where you are seeing this?

Best,

Asha

0 Thierry BERNIER over 1 year ago in reply to Asha Bhandarkar

Prodigy 80 points

Hi Asha,

Yes, that is my scheme. In more detail, the code looks like this in a "file.cpp" (C++):

namespace {  // C++ ODR

static struct Tab {
  float k[xxx];
  xxx sesa_tmpl[xxx];
  void init();
} tab;

static void init_constants() {
  { fill tab.k[] using a small kernel made of copy/mix of MATHlib inlines }
}

void Tab::init() {
  init_constants(); // Call a function of the same file to fill tab.k
  { here-code to prepare SE/SA templates, filling tab.sesa_tmpl[] }
}

static void kernel() {
  { use tab.k[] and tab.sesa_tmpl[] }
}

} // namespace

void API () {   // Public/exported C-function
  tab.init();
  for (...) kernel();
}

In the "fast version", init_constants() is moved out into another source file and receives tab.k[] as a pointer argument, which it fills.
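
A minimal sketch of that split, shown in a single block for readability. The names fill_constants and kCount, the table size, and the formula are hypothetical, and std::sin only stands in for the MATHlib-derived inline code; none of this is the actual test case.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// --- would live in a second source file (separate compilation) ---
// Hypothetical helper: fills the caller's table through a pointer.
void fill_constants(float* k, std::size_t n) {
    assert(k != nullptr && n > 0);
    for (std::size_t i = 0; i < n; ++i) {
        // std::sin is a placeholder for the MATHlib-style inline math.
        k[i] = std::sin(6.2831853f * static_cast<float>(i) / static_cast<float>(n));
    }
}

// --- would stay in file.cpp, next to kernel() ---
// In file.cpp only the declaration of fill_constants() would be visible.
namespace {

constexpr std::size_t kCount = 8;  // hypothetical table size

struct Tab {
    float k[kCount];
    void init() { fill_constants(k, kCount); }  // table is filled by the other file
} tab;

// Placeholder kernel: only reads the precomputed table.
float kernel() {
    float acc = 0.0f;
    for (std::size_t i = 0; i < kCount; ++i) acc += tab.k[i];
    return acc;
}

}  // namespace
```

With this layout the compiler no longer sees the constant-generation loop when it compiles kernel(), which is the only difference between the two variants being compared.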

Extract of MATH_sin from MATHlib_09_02_00_04 :

(...)

static inline float sinsp_i(float a);

#ifndef __cplusplus /* FOR PROTECTION PURPOSE - C++ NOT SUPPORTED. */
#pragma CODE_SECTION(sinsp_i, ".text:optci");
#endif

(...)
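
For comparison, a hedged sketch of what placing a function in a named section could look like from C++. It assumes that TI compilers define __TI_COMPILER_VERSION__ and that the C++ form of the CODE_SECTION pragma takes only the section name and applies to the next function declared; both points should be checked against the compiler user's guide for the CGT version in use. The function name sin_table_entry is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Assumption: under a TI compiler, the C++ form of the pragma names only the
// section and binds to the function declared right after it. Other compilers
// never see the pragma because of the guard.
#if defined(__TI_COMPILER_VERSION__)
#pragma CODE_SECTION(".text:optci")
#endif
float sin_table_entry(int i, int n) {
    assert(n > 0);
    // std::sin is a placeholder for the MATHlib-derived code.
    return std::sin(6.2831853f * static_cast<float>(i) / static_cast<float>(n));
}
```

The C form shown in the MATHlib extract names the symbol explicitly, which would explain why it is guarded by #ifndef __cplusplus there.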

0 Thierry BERNIER over 1 year ago in reply to Thierry BERNIER

Prodigy 80 points

NB: I am using CGT version 3.1.0. I will try with 4.1.0.

0 Asha Bhandarkar over 1 year ago in reply to Thierry BERNIER

TI__Genius 10170 points

Hi Thierry,

Yes, if you are using the 9.2 SDK, use the 4.1.0.LTS compiler, as this is what is packaged with the SDK (and what we will use for validation).

As a side note, the sinsp_i function is a scalar function derived from the C66x implementation of sin, and not the "vectorized" version that is optimized for the C7x processor.

I'm pulling in our compiler team to understand the main issue. Are you measuring the difference in terms of cycle counts?

Best,

Asha
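
On the measurement question above, a minimal host-side sketch of a timing harness that could be used to compare the two builds. It uses std::chrono::steady_clock as a portable stand-in; on the target, a cycle counter would normally replace it. The names min_elapsed_ns and kernel_stub are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the kernel under test.
inline float kernel_stub(const float* k, std::size_t n) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i) acc += k[i] * k[i];
    return acc;
}

// Calls fn() `runs` times and returns the smallest elapsed time in nanoseconds.
// Taking the minimum reduces the influence of cache warm-up and interrupts.
template <typename F>
std::int64_t min_elapsed_ns(F&& fn, int runs) {
    assert(runs > 0);
    std::int64_t best = -1;
    for (int r = 0; r < runs; ++r) {
        const auto t0 = std::chrono::steady_clock::now();
        fn();
        const auto t1 = std::chrono::steady_clock::now();
        const std::int64_t ns =
            std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
        if (best < 0 || ns < best) best = ns;
    }
    return best;
}
```

Running the same harness against both variants of the source layout keeps the measurement method identical, so any difference comes from the generated code for the kernel.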

0 George Mock over 1 year ago

TI__Guru**** 253170 points

For this source file ...

Thierry BERNIER said:
In the same C++ source file (around 800 lines), I have :

a C7X kernel code inside a C++ kernel() function,

and an init() function to prepare a context for that kernel.

... please follow the directions in the article How to Submit a Compiler Test Case. Please do it twice: once in the original form, and again with these changes ...

Thierry BERNIER said:
If I put the part of the init() function dealing with the array of constants calculation, inside another source file (separate compilation), without changing the kernel() function, then the execution time of kernel() is better, down to 58 units.

Thanks and regards,

-George

0 Thierry BERNIER over 1 year ago in reply to George Mock

Prodigy 80 points

I sent you a private message with the requested information.

0 George Mock over 1 year ago in reply to Thierry BERNIER

TI__Guru**** 253170 points

I apologize for the delay.

Thank you for the test case. I am able to reproduce the behavior, but I cannot explain the cause, so I filed EXT_EP-11846 to have this investigated. You are welcome to follow it with that link.

Thanks and regards,

-George
