PROCESSOR-SDK-J784S4: Size of C++ source files

Part Number: PROCESSOR-SDK-J784S4
Other Parts Discussed in Thread: MATHLIB

Hi TI,

In the same C++ source file (around 800 lines), I have:

  • C7x kernel code inside a C++ kernel() function,
  • and an init() function that prepares a context for that kernel.

The init() function computes an array of constants for the kernel and prepares a set of SE/SA (streaming engine / streaming address generator) templates, following an FFTLIB-like API pattern.

To compute the array of constants, init() also runs a kernel-like loop, which calls sin/cos inline functions from MATHLIB.
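For readers less familiar with this pattern, the kind of constant table init() computes can be sketched in portable C++. This is a minimal sketch under stated assumptions: it assumes the constants are FFT-style twiddle factors stored as interleaved (cos, -sin) pairs, make_twiddles is a hypothetical name, and std::cos/std::sin stand in for the MATHLIB inlines.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical constant table: FFT-style twiddle factors stored as
// interleaved (cos, -sin) pairs. std::cos/std::sin stand in for MATHLIB.
std::vector<float> make_twiddles(std::size_t n)
{
    assert(n > 0 && "table size must be non-zero");
    const double two_pi = 6.283185307179586;
    std::vector<float> k(2 * n);
    for (std::size_t i = 0; i < n; ++i) {
        const double angle = two_pi * static_cast<double>(i) / static_cast<double>(n);
        k[2 * i]     = static_cast<float>(std::cos(angle));   // real part
        k[2 * i + 1] = static_cast<float>(-std::sin(angle));  // imaginary part
    }
    return k;
}
```

In the structure described here, a loop like this runs once in init(), and kernel() only reads the resulting table.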

The execution time of the kernel() function is 67 units of time.

If I move the part of init() that computes the array of constants into another source file (compiled separately), without changing kernel(), the execution time of kernel() improves to 58 units.

I suspect that the compiler/optimizer generates different code for kernel() depending on how much code is in the same source file, even though the functions are independent.
Does that sound plausible to you?

I see that the MATHLIB inline functions place their code in a dedicated text section (something like ".text:optci"). What is the reason for that, and should I do the same?

  • Hi Thierry,

    Just to confirm, is this roughly how the code is structured? And is the change you are making to move the for loop in the init function to another file?

    init() {
        SE/SA param initialization
        for() {
            MATHLIB_sin()
            MATHLIB_cos()
        }
    }
    
    kernel() {
        for() {
            Processing code use SE/SA
        }
    }

    I see in MATHlib inline functions that their code is put in a dedicated text section. (something like "text;optci")

    Can you point to where you are seeing this?

    Best,

    Asha

  • Hi Asha,

    Yes, that is my scheme. In more detail, the code in "file.cpp" (C++) looks like this:

    namespace {  // C++ ODR

    static struct Tab {
      float k[xxx];
      xxx sesa_tmpl[xxx];
      void init();
    } tab;

    static void init_constants() {
      /* fill tab.k[] using a small kernel made of copy/mix of MATHLIB inlines */
    }

    void Tab::init() {
      init_constants(); // call a function in the same file to fill tab.k
      /* prepare the SE/SA templates here, filling tab.sesa_tmpl[] */
    }

    static void kernel() {
      /* use tab.k[] and tab.sesa_tmpl[] */
    }

    } // namespace

    void API() {   // public/exported C function
      tab.init();
      for (...) kernel();
    }

    In the "fast version", init_constants() is moved to another file, and tab.k[] is passed to it as a pointer argument to be filled.
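A host-compilable sketch of the "fast version" structure, assuming an init_constants(float*, size_t) signature as implied by "passing it tab.k[] as ptr argument". The header/file split is marked with comments so the block stays self-contained; in the real build the init_constants definition lives in its own .cpp file. kNumConsts, the kernel body, the cosine formula, and API returning a float are hypothetical placeholders. The sketch shows only the structure, not why the split changes kernel() timing.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// constants.h (hypothetical header shared by both source files):
// external linkage, so the definition can be compiled in its own file.
void init_constants(float* k, std::size_t n);

// file.cpp: context setup and kernel, following the structure in the post.
namespace {

constexpr std::size_t kNumConsts = 8;  // hypothetical size (the post's xxx)

struct Tab {
    float k[kNumConsts];
    void init()
    {
        init_constants(k, kNumConsts);  // fill the table through a pointer
        // SE/SA template preparation would follow here.
    }
} tab;

// Placeholder for the C7x kernel: it only reads the precomputed table.
float kernel(float x)
{
    float acc = 0.0f;
    for (std::size_t i = 0; i < kNumConsts; ++i)
        acc += tab.k[i] * x;
    return acc;
}

}  // namespace

// Public entry point (returns a value here only so the sketch can be checked).
float API(float x)
{
    tab.init();
    return kernel(x);
}

// constants.cpp (separate translation unit in the real build).
void init_constants(float* k, std::size_t n)
{
    assert(k != nullptr);
    for (std::size_t i = 0; i < n; ++i)
        k[i] = static_cast<float>(std::cos(0.5 * static_cast<double>(i)));
}
```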

    Extract of MATH_sin from MATHlib_09_02_00_04:

    (...)

    static inline float sinsp_i(float a);

    #ifndef __cplusplus /* FOR PROTECTION PURPOSE - C++ NOT SUPPORTED. */
    #pragma CODE_SECTION(sinsp_i, ".text:optci");
    #endif

    (...)
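On the pragma question: the MATHLIB extract applies the C form #pragma CODE_SECTION(symbol, "section") only when __cplusplus is not defined. As far as I know, the TI compilers also accept a C++ form that takes only the section name and applies to the next function declared. A minimal sketch of placing an init-only routine in its own section, guarded by __TI_COMPILER_VERSION__ so it also builds with a host compiler; fill_sin_table and the section name ".text:init_consts" are hypothetical. Which memory a named section ends up in is decided by the linker command file, and nothing in this thread shows that such placement affects kernel() performance.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical init-only routine. With the TI compiler, the C++ form of the
// CODE_SECTION pragma takes just the section name and applies to the next
// function declaration; other compilers never see the pragma.
#ifdef __TI_COMPILER_VERSION__
#pragma CODE_SECTION(".text:init_consts")
#endif
void fill_sin_table(float* out, std::size_t n, float step)
{
    assert(out != nullptr);
    for (std::size_t i = 0; i < n; ++i)
        out[i] = std::sin(step * static_cast<float>(i));
}
```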
  • NB: I am using CGT version 3.1.0. I will try with 4.1.0.

  • Hi Thierry,

    Yes, if you are using the 9.2 SDK, use the 4.1.0.LTS compiler, as this is what is packaged with the SDK (and what we will use for validation).

    Also, as a side note, the sinsp_i function is a scalar function derived from the C66x implementation of sin, not the "vectorized" version that is optimized for the C7x processor.
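To illustrate the scalar versus vectorized distinction: a vectorized routine works on whole buffers, so the constant-generation loop would first build all the angles and then make a single array call, rather than calling a scalar inline such as sinsp_i per element. In this sketch batch_sin and sines_of_multiples are hypothetical, and batch_sin uses std::sin as a scalar stand-in; the actual signature of MATHLIB's vectorized sin should be taken from the MATHLIB documentation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical array-in/array-out helper: one call processes a whole buffer,
// the shape that vectorized library routines typically expect.
// std::sin is only a scalar stand-in here.
void batch_sin(const float* in, float* out, std::size_t n)
{
    assert(n == 0 || (in != nullptr && out != nullptr));
    for (std::size_t i = 0; i < n; ++i)
        out[i] = std::sin(in[i]);
}

// Build all angles first, then compute every sine in one batch call instead
// of calling a scalar inline inside the constant-generation loop.
std::vector<float> sines_of_multiples(float step, std::size_t n)
{
    std::vector<float> angles(n), result(n);
    for (std::size_t i = 0; i < n; ++i)
        angles[i] = step * static_cast<float>(i);
    batch_sin(angles.data(), result.data(), n);
    return result;
}
```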

    I'm pulling in our compiler team to understand the main issue. Are you measuring the difference in terms of cycle counts? 

    Best,

    Asha

  • For this source file ...

    In the same C++ source file (around 800 lines), I have :

    • a C7X kernel code inside a C++ kernel() function,
    • and an init() function to prepare a context for that kernel.

    ... please follow the directions in the article How to Submit a Compiler Test Case. But please do it twice: once in the original form, and again with these changes ...

    If I put the part of the init() function dealing with the array of constants calculation, inside another source file (separate compilation), without changing the kernel() function, then the execution time of kernel() is better, down to 58 units.

    Thanks and regards,

    -George

  • I sent you a private message with the requested information.

  • I apologize for the delay.

    Thank you for the test case. I am able to reproduce the behavior, but I cannot explain the cause, so I filed EXT_EP-11846 to have this investigated. You are welcome to follow it with that link.

    Thanks and regards,

    -George