OpenMPAcc K2H memory leak?

Hi,

Unfortunately, in a late phase of development we encountered Linux stalls on the
K2H ARM cores. After about two hours the system simply vanishes, although the BMC
status still reports it as running. On the serial terminal we occasionally saw
kernel traces indicating that two ARM cores were stalled; otherwise the system is
completely unreachable.

Since our project already uses many subsystems of the SoC, it took quite a while
to narrow the problem down to an OpenMP memory leak. Unfortunately, valgrind is not
available in the root filesystem, but using libc's mtrace we saw suspicious (and
suspiciously many) allocations before entering an OpenMP target region.
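
For illustration, the instrumentation was essentially the following (a minimal
sketch with a placeholder kernel, not our actual code; run with MALLOC_TRACE
pointing at a log file, then inspect the log with the mtrace utility):

    /* Minimal mtrace sketch: bracket the offload with mtrace()/muntrace()
     * so that allocations made when entering the target region show up in
     * the log named by the MALLOC_TRACE environment variable. */
    #include <mcheck.h>
    #include <stdlib.h>

    #define N 1024

    int main(void)
    {
        float *a = (float *)malloc(N * sizeof(float));
        for (int i = 0; i < N; i++)
            a[i] = (float)i;

        mtrace();                      /* start logging malloc/free */

        #pragma omp target map(tofrom: a[0:N])
        {
            for (int i = 0; i < N; i++)
                a[i] += 1.0f;
        }

        muntrace();                    /* stop logging */

        free(a);
        return 0;
    }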

To demonstrate the problem clearly and reproducibly, we took a simple native TI
OpenMP example, vecadd from ti-openmpacc_1.2.0/openmpacc-examples, and modified it
slightly to run the test repeatedly inside a simple while(1) loop. The test itself
runs flawlessly, but 'top' shows rapidly growing memory consumption. Eventually the
process gets killed or the whole system stalls. So it seems every OpenMP target
call has a serious price. Needless to say, we cannot overcome this difficulty by
ourselves, and we have run out of ideas. The problem is twofold: first the memory
leak itself, and second the fact that it leads to a complete disaster, namely the
system stall.
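
The modification is essentially the following (a sketch along the lines of the
shipped vecadd, not the verbatim example source):

    /* vecadd in an endless loop: the same target region is executed over
     * and over, and resident memory grows with every pass until the
     * process is killed or the system stalls. */
    #include <stdio.h>
    #include <stdlib.h>

    #define NumElements (1024 * 1024)

    int main(void)
    {
        float *in1 = (float *)malloc(NumElements * sizeof(float));
        float *in2 = (float *)malloc(NumElements * sizeof(float));
        float *out = (float *)malloc(NumElements * sizeof(float));

        for (int i = 0; i < NumElements; i++) {
            in1[i] = (float)i;
            in2[i] = (float)(2 * i);
            out[i] = 0.0f;
        }

        while (1) {
            #pragma omp target map(to: in1[0:NumElements], in2[0:NumElements]) \
                               map(from: out[0:NumElements])
            {
                for (int i = 0; i < NumElements; i++)
                    out[i] = in1[i] + in2[i];
            }
            printf("pass done, out[0] = %f\n", out[0]);
        }

        return 0;   /* never reached */
    }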

Could you please address this?

Regards,

Janos Balazs

HPC Ver: hpc_03_00_01_12

  • Moved this thread over to the HPC forum for an appropriate and faster response. Thank you.
  • Hi Janos,

    I've reproduced the problem and I'm currently investigating.

    Best Regards,

    Eric Stotzer
  • 4087.libOpenMPAcc.so.1.2.1.tar.gz

    Hi Janos,

    I've located the source of the memory leak in the OpenMPAcc runtime library. I have attached a patched library. Install this library on your ARM filesystem in /usr/lib and update /usr/lib/libOpenMPAcc.so.1 to point at it. Please confirm that your system is working with this fix.

    Best Regards, Eric

  • Please make sure that you update /usr/lib/libOpenMPAcc.so.1. I had the wrong library name in my original post, which I have since edited.

    Thanks, Eric
  • Thanks for the quick response and the patch!

    It really seems to solve the leak with 'vecadd', but while testing it with our
    project we experienced a significant performance degradation. What usually runs
    in 160 ms now takes over 500 ms. I double-checked, and the slowdown positively
    correlates with the patch. We checked whether something similar happens with
    vecadd, but for 10^6 elements that computation takes 16 ms and seems indifferent
    to the patch. Our computations are somewhat more complex. One simple part of
    them merely uses the FFT provided by TI for the DSP side; there we use cmemk's
    contiguous memory blocks and also the EDMA extensively to prefetch data from DDR
    into local L2. So the computation itself is simple (calling the TI FFT) and all
    we added is the prefetch from cmem DDR into L2, yet the measured time for this
    part is also significantly longer. Moreover, we of course use many map to/from
    target clauses, but mostly for cmem-located arrays. I don't think I will find a
    similar test among the stock examples to check against, so I cannot show or
    understand what is wrong. All the parts running on the OMP target take more
    than twice as long to finish. Since you know what has changed, maybe you have
    some ideas about the source of the problem.
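
    To make the pattern concrete, what we time is roughly the following (a
    minimal sketch with a placeholder kernel; the real code calls the TI FFT
    and uses EDMA prefetch into L2, both omitted here, and I use __malloc_ddr()
    simply as a stand-in for our CMEM allocation):

        /* Sketch of timing a target region over a CMEM-backed DDR buffer.
         * __malloc_ddr()/__free_ddr() are provided by the TI OpenMPAcc
         * runtime; the kernel body below is only a placeholder. */
        #include <stdio.h>
        #include <stdlib.h>
        #include <omp.h>

        extern void *__malloc_ddr(size_t size);
        extern void  __free_ddr(void *ptr);

        #define N (1024 * 1024)

        int main(void)
        {
            float *buf = (float *)__malloc_ddr(N * sizeof(float));
            for (int i = 0; i < N; i++)
                buf[i] = (float)i;

            double t0 = omp_get_wtime();

            #pragma omp target map(tofrom: buf[0:N])
            {
                for (int i = 0; i < N; i++)   /* placeholder for FFT + EDMA */
                    buf[i] *= 2.0f;
            }

            double t1 = omp_get_wtime();
            printf("target region took %.3f ms\n", (t1 - t0) * 1e3);

            __free_ddr(buf);
            return 0;
        }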

    Thanks a lot!
  • 4213.libOpenMPAcc.so.1.2.2.tar.gz

    Hi Janos,

    I was able to reproduce the performance degradation when using large buffers allocated via __malloc_ddr(). I tracked the issue down to a build environment problem: the build was using an older version of the ti/cmem.h file. In any case, I've attached an updated libOpenMPAcc.so. Please give it a try.

    Thanks, Eric

  • Hi, sorry for the delay. We've just finished a long-running test and we can
    now confirm that you have indeed made a significant contribution to the next
    MCSDK HPC release: original performance without the memory leak.
    
    
    Thanks a lot for your support!