Hi,
Unfortunately, late in development we have encountered Linux stalls on the
K2H ARM cores.
After about two hours the system simply becomes unreachable, although the BMC
status still reports it as running. On the serial terminal we occasionally saw
kernel traces indicating that two of the ARM cores had stalled; otherwise the
system just stops responding.
Since our project already uses many subsystems of the SoC, it took quite a while
to narrow the problem down to an OpenMP memory leak. Alas, valgrind is not available
in the root filesystem, but using glibc's mtrace we saw suspicious (and suspiciously
many) allocations before entering an OpenMP target region.
To demonstrate the problem clearly, we took one of TI's own simple OpenMP
examples, vecadd from ti-openmpacc_1.2.0/openmpacc-examples, and modified it
slightly to run the test repeatedly in a simple while(1) loop. The test itself
runs flawlessly, but 'top' shows rapidly growing memory consumption. Eventually
the process gets killed or the whole system stalls. So it seems that each OpenMP
target call comes at a serious price. Needless to say, we cannot overcome this
difficulty by ourselves, and we have run out of ideas. The problem is twofold: first
the memory leak, and second the fact that it leads to a complete disaster, namely
the system stall.
Could you please address this?
Regards,
Janos Balazs
HPC Ver: hpc_03_00_01_12