OpenMPAcc K2H memory leak?

Hi,

Unfortunately, in a late phase of development we encountered Linux stalls on the
K2H ARM cores. After about two hours the system simply vanishes, although the BMC
status still reports it as running. On the serial terminal we occasionally saw
kernel traces indicating that two ARM cores were stalled; otherwise the system is
completely unreachable.

Since our project already uses many subsystems of the SoC, it took quite a while
to narrow the problem down to an OpenMP memory leak. Unfortunately, valgrind is not
available in the root filesystem, but using libc's mtrace we saw suspicious (and
suspiciously many) allocations before entering an OpenMP target region.
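
For illustration, the instrumentation was essentially the following (a minimal
sketch with a placeholder kernel, not our actual code; run with MALLOC_TRACE
pointing at a log file, then inspect the log with the mtrace utility):

    /* Minimal mtrace sketch: bracket the offload with mtrace()/muntrace()
     * so that allocations made when entering the target region show up in
     * the log named by the MALLOC_TRACE environment variable. */
    #include <mcheck.h>
    #include <stdlib.h>

    #define N 1024

    int main(void)
    {
        float *a = (float *)malloc(N * sizeof(float));
        for (int i = 0; i < N; i++)
            a[i] = (float)i;

        mtrace();                      /* start logging malloc/free */

        #pragma omp target map(tofrom: a[0:N])
        {
            for (int i = 0; i < N; i++)
                a[i] += 1.0f;
        }

        muntrace();                    /* stop logging */

        free(a);
        return 0;
    }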

To demonstrate the problem clearly and reproducibly, we took a simple native TI
OpenMP example, vecadd from ti-openmpacc_1.2.0/openmpacc-examples, and modified it
slightly to run the test repeatedly inside a simple while(1) loop. The test itself
runs flawlessly, but 'top' shows rapidly growing memory consumption. Eventually the
process gets killed or the whole system stalls. So it seems every OpenMP target
call has a serious price. Needless to say, we cannot overcome this difficulty by
ourselves, and we have run out of ideas. The problem is twofold: first the memory
leak itself, and second the fact that it leads to a complete disaster, namely the
system stall.
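
The modification is essentially the following (a sketch along the lines of the
shipped vecadd, not the verbatim example source):

    /* vecadd in an endless loop: the same target region is executed over
     * and over, and resident memory grows with every pass until the
     * process is killed or the system stalls. */
    #include <stdio.h>
    #include <stdlib.h>

    #define NumElements (1024 * 1024)

    int main(void)
    {
        float *in1 = (float *)malloc(NumElements * sizeof(float));
        float *in2 = (float *)malloc(NumElements * sizeof(float));
        float *out = (float *)malloc(NumElements * sizeof(float));

        for (int i = 0; i < NumElements; i++) {
            in1[i] = (float)i;
            in2[i] = (float)(2 * i);
            out[i] = 0.0f;
        }

        while (1) {
            #pragma omp target map(to: in1[0:NumElements], in2[0:NumElements]) \
                               map(from: out[0:NumElements])
            {
                for (int i = 0; i < NumElements; i++)
                    out[i] = in1[i] + in2[i];
            }
            printf("pass done, out[0] = %f\n", out[0]);
        }

        return 0;   /* never reached */
    }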

Could you please address this?

Regards,

Janos Balazs

HPC Ver: hpc_03_00_01_12

  • Moved this thread over to the HPC forum for an appropriate and faster response. Thank you.
  • Hi Janos,

    I've reproduced the problem and I'm currently investigating.

    Best Regards,

    Eric Stotzer
  • 4087.libOpenMPAcc.so.1.2.1.tar.gz

    Hi Janos,

    I've located the source of the memory leak in the OpenMPAcc runtime library. I have attached a patched library. Install this library on your ARM filesystem in /usr/lib and update /usr/lib/libOpenMPAcc.so.1 to point at it. Please confirm that your system is working with this fix.

    Best Regards, Eric

  • Please make sure that you update /usr/lib/libOpenMPAcc.so.1. I had the wrong library name in my original post, which I have since edited.

    Thanks, Eric
  • Thanks for the quick response and the patch!

    It really seems to solve the leak with 'vecadd', but while testing it with our
    project we experienced a significant performance degradation. What usually runs
    in 160 ms now takes over 500 ms. I double-checked, and the slowdown positively
    correlates with the patch. We checked whether something similar happens with
    vecadd, but for 10^6 elements that computation takes 16 ms and seems indifferent
    to the patch. Our computations are somewhat more complex. One simple part of
    them merely uses the FFT provided by TI for the DSP side; there we use cmemk's
    contiguous memory blocks and also the EDMA extensively to prefetch data from DDR
    into local L2. So the computation itself is simple (calling the TI FFT) and all
    we added is the prefetch from cmem DDR into L2, yet the measured time for this
    part is also significantly longer. Moreover, we of course use many map to/from
    target clauses, but mostly for cmem-located arrays. I don't think I will find a
    similar test among the stock examples to check against, so I cannot show or
    understand what is wrong. All the parts running on the OMP target take more
    than twice as long to finish. Since you know what has changed, maybe you have
    some ideas about the source of the problem.
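
    To make the pattern concrete, what we time is roughly the following (a
    minimal sketch with a placeholder kernel; the real code calls the TI FFT
    and uses EDMA prefetch into L2, both omitted here, and I use __malloc_ddr()
    simply as a stand-in for our CMEM allocation):

        /* Sketch of timing a target region over a CMEM-backed DDR buffer.
         * __malloc_ddr()/__free_ddr() are provided by the TI OpenMPAcc
         * runtime; the kernel body below is only a placeholder. */
        #include <stdio.h>
        #include <stdlib.h>
        #include <omp.h>

        extern void *__malloc_ddr(size_t size);
        extern void  __free_ddr(void *ptr);

        #define N (1024 * 1024)

        int main(void)
        {
            float *buf = (float *)__malloc_ddr(N * sizeof(float));
            for (int i = 0; i < N; i++)
                buf[i] = (float)i;

            double t0 = omp_get_wtime();

            #pragma omp target map(tofrom: buf[0:N])
            {
                for (int i = 0; i < N; i++)   /* placeholder for FFT + EDMA */
                    buf[i] *= 2.0f;
            }

            double t1 = omp_get_wtime();
            printf("target region took %.3f ms\n", (t1 - t0) * 1e3);

            __free_ddr(buf);
            return 0;
        }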

    Thanks a lot!
  • 4213.libOpenMPAcc.so.1.2.2.tar.gz

    Hi Janos,

    I was able to reproduce the performance degradation when using large buffers allocated via __malloc_ddr(). I tracked the issue down to a build environment problem: the build was using an older version of the ti/cmem.h file. In any case, I've attached an updated libOpenMPAcc.so. Please give it a try.

    Thanks, Eric

  • Hi, sorry for the delay. We've just finished a long-running test and we can
    now confirm that you have indeed made a significant contribution to the next
    MCSDK HPC release: original performance without the memory leak.
    
    
    Thanks a lot for your support!