TDA4VM: egl initialization time optimization problem

Part Number: TDA4VM

Hi TI

TDA4VM
SDK: 7.2


We currently have a strict optimization requirement: EGL and rendering need to start quickly in our project, while the other modules must also load as soon as possible. Our implementation is roughly as follows:
1. Thread A: loads all module .so files
load module 1 (render) -> run render initialization (EGL ...)
load module 2 ... -> run its initialization after render initialization
load module 3 ... -> run its initialization after render initialization
....

2. Thread B :
eglInitialize -> (dlopen ...x4)
/usr/lib/dri/tls/pvr_dri.so
/usr/lib/dri/pvr_dri.so
libpvr_dri_support.so
libGLESv1_CM_PVR_MESA.so
libGLESv2_PVR_MESA.so
libGL.so

eglCreateContext -> (dlopen ...x1)
libGLESv2_PVR_MESA.so
...


Because EGL calls dlopen so many times, and dlopen calls are serialized, thread B is very easily blocked.
I have already consulted our FAE, but the EGL source code cannot be provided. Could TI provide a version of the libraries that does not use dlopen, so that we can link the dynamic libraries explicitly in CMake?

Thank you very much

  • Hello,

    Thank you for the explanation. I am checking with our development team and our partners whether this solution can be implemented.

    Regards,

    Erick

  • Hello,

    After discussion with our partner, it looks like this is not a trivial change in the driver implementation to force the driver to dynamically link the libraries. Below are some highlights from our discussion:

    - It is possible to produce a 'static' linked binary but the downside to that approach is that the physical size (as in disk-size) will be increased for every application as it will require static copies of the libraries that it uses to also be linked into the final binary image. Providing a 'static' DDK will likely cause space issues on the "disk" space that is available for vendor utilities etc.

    - The other 'cost' to running static applications is that there is more data to load from the underlying FS and so that will take 'longer' - also, because it is a static binary each invocation of such an application will eat another X-MB of RAM as there is no sharing available - each app is completely self-contained.

    Are you currently trying to start this application as quickly as possible from booting up the device? Or just booting up the application?

    What is your underlying storage holding your filesystem?

    Regards,

    Erick

  • Hello Erick Narvaez

    Good to see your reply. My answers to your questions are as follows.

    Are you trying to start this application as quickly as possible from booting up the device?

    : Yes. The application is designed to start one thread that loads the .so files, and we don't want other threads that load .so files to conflict with it. Currently EGL causes such a conflict.

    Storage device

    : eMMC

    Do you mean that TI can provide a static EGL library that does not contain an explicit dlopen call?

    We think this is a very good proposal. The dynamic libraries mentioned in the question are about 4 MB in size.

    Regards

  • Hello,

    Can you describe how you are measuring the latency for your application startup?

    I've created a simple test to measure the time it takes to run eglInitialize, but it does not account for the application startup time, which can vary from different trials.

    To understand your system setup, can you also let me know if you are using the default kernel from the SDK 7.2? Have you made any changes to this?

    Regards,

    Erick

  • Hello

    Can you describe how you are measuring the latency for your application startup?

    The application contains two threads, A and B.
    Thread A: continuously calls dlopen to load .so files
    Thread B: initializes EGL
    We measure the eglInitialize and eglCreateContext times in thread B.

    As we discussed recently in another issue, the GPU SDK is version 7.3:

    TDA4VM: Coredump appears when glBindFramebuffer is executed - Processors forum - Processors - TI E2E support forums

    Regards

  • Hello,

    This ticket states SDK 7.2, so is your Linux kernel from SDK 7.2? I will deliver under that assumption.

    Below, please find new libraries with updated GPU DDK binaries. We have added an optimization to the library loading time; please let me know if you see an improvement in your loading times.

    Regards,

    Erick

    sdk-7.2-gpu-ddk-lib-loading-optimization.tar.gz

  • Hello 

    Should I replace only usr/lib, or do I also need to replace the .ko?

  • Hello,

    Please try just replacing the usr/lib for now. There could be issues with the .ko since I built it against the SDK 7.2 Linux kernel, which you may have made modifications to.

    Let me know if there are any issues in the boot process and driver initialization.

    Regards,

    Erick

  • Hello 

    Thanks, Erick. I compared the test libraries with the libraries currently in use and saw no noticeable effect.

    [Test libraries]

    eglInitialize time = 319

    eglCreateContext time = 366

    [Libraries currently in use]

    eglInitialize time = 319

    eglCreateContext time = 318

    Can you tell us what changes have been made to the test library?

    Regards

  • Hello Li,

    Currently, the new libraries disable the check for whether the GLES libraries are present. On my side, during the test, there was a slight improvement in the run time of the eglInitialize function, but it seems as though you are not seeing substantial improvements.

    I will update our partner with this information to see if they have any other suggestions.

    Regards,

    Erick

  • Hello

    Thanks Erick,

    “Currently, the libraries disabled the check that is made for libraries” 

    Which function or operation does this refer to?

    We look forward to further improvements.

    Regards

  • Hello Li,

    Can you please run this test application on your device so that we can get comparable numbers? Please share the output from running it a few times, and let me know if there are any issues running it.

    https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/791/a.out

    Regards,

    Erick

  • Hello Erick

    Sorry for the late reply. Running a.out gives the output below:

    error: XDG_RUNTIME_DIR not set in the environment.
    a.out: simple_egl.c:678: main: Assertion `display.display' failed.
    Aborted

      

    Regards

  • Hello Li,

    Are you running Weston when you boot up your board? Or is your application running differently?

    Regards,

    Erick

  • Hello

    Does Weston mean the window system? At present, Weston is not deployed in our targetfs. Our application renders to shared memory that is bound to a texture through eglCreateImageKHR.

    Regards

  • Ok, we are trying to get a similar application to run so we can compare the performance between our systems.

    In your test, you have reported times of 319, 366 and 318. What unit are these times in: milliseconds, microseconds, or something else?

    In the test application I was running on my system, eglInitialize takes 14 ms, or 18 ms at worst.

    Regards,

    Erick

  • Hello

    The EGL initialization times measured inside the application are as follows, in milliseconds:

    eglInitialize time = 319

    eglCreateContext time = 366

    As communicated earlier, when we test EGL initialization on its own, it takes about 10 milliseconds.

    Regards

  • Hello,

    Thank you.

    I've modified my application to spin up extra threads, and I need ~20-30 of them before I see this impact. How many threads are you running?

    1. Thread A: loads all module .so files
    load module 1 (render) -> run render initialization (EGL ...)
    load module 2 ... -> run its initialization after render initialization
    load module 3 ... -> run its initialization after render initialization

    Regards,

    Erick

  • Hello

    One thread is always loading .so files.

    About 10 threads may be running at the same time.

    Regards

  • Li,

    At what granularity are you measuring the delay of eglInitialize and eglCreateContext? Only from outside the functions? Do you have any more internal statistics on what is taking the longest? Or are you measuring the time delay of the dlopen() directly?

    Regards,

    Erick

  • Hello

    That is an important question. All of our tests are at the interface level and only record the run time of the eglCreateContext and eglInitialize functions.

    I have two suggestions; please see whether either would work:
    1. Please provide a library that prints its internal timing, and I will check the results.
    2. Alternatively, is there a tool that can capture the time spent inside these functions?

    Regards

  • Erick,

    Unlocking this thread for further communication.

    - Keerthy