This is Seunghun at Stradvision.
I am encountering runtime errors when running the executable built with BUILD_WITH_OPENACC.
1. HPC SDK 23.7
2. PSDK 9.2 for J721S2
3. Docker image: nvidia/cuda:11.8.0-devel-ubuntu22.04 provided by NVIDIA
4. NVIDIA-related environment (tested on two different PCs):
   (4-1) RTX 4080
      - NVIDIA Graphics Driver: 535.183
      - CUDA Driver: Pre-installed in HPC SDK (12.2)
   (4-2) RTX 3070, TITAN X (two GPUs in the same machine)
      - NVIDIA Graphics Driver: 530.41
      - CUDA Driver: Pre-installed in HPC SDK (12.2)
5. Modified build settings: In our concerto build configuration for the executable that uses the TIDL library, we added the following link information:
      LDIRS += /opt/nvidia/hpc_sdk/Linux_x86_64/23.7/compilers/lib
      LDIRS += /opt/nvidia/hpc_sdk/Linux_x86_64/23.7/cuda/12.2/lib64
      SHARED_LIBS += acccuda acchost accdevice accdevaux cudart
The executable fails at runtime with:

Accelerator Fatal Error: No CUDA device code available
File: /home/seunghun/strad/svnet3/src_tda4x/platforms/92_j721s2/c7x-mma-tidl/ti_dl/algo/src/tidl_conv2d_base.c
Function: _Z24TIDL_refConv2dKernelFastILi3EaaiiEvPT0_PT1_PT2_PT3_S7_S7_iiiiiiiiiiiiiiiiiiiiiiiiii:463
Line: 473

(after changing -gpu=ccall to -gpu=cc86)
Accelerator Fatal Error: No CUDA device code available
File: /home/seunghun/strad/svnet3/src_tda4x/platforms/92_j721s2/c7x-mma-tidl/ti_dl/algo/src/tidl_conv2d_base.c
Function: _Z24TIDL_refConv2dKernelFastILi3EaaiiEvPT0_PT1_PT2_PT3_S7_S7_iiiiiiiiiiiiiiiiiiiiiiiiii:463
Line: 473

(after changing -gpu=ccall to -gpu=cc86)
Accelerator Fatal Error: This file was compiled: -acc=gpu -gpu=cc80 -gpu=cc86 -acc=host or -acc=multicore
Rebuild this file with -gpu=cc61 to use NVIDIA Tesla GPU 0
File: /home/seunghun/strad/svnet3/src_tda4x/platforms/92_j721s2/c7x-mma-tidl/ti_dl/algo/src/tidl_conv2d_base.c
Function: _Z24TIDL_refConv2dKernelFastILi3EaaiiEvPT0_PT1_PT2_PT3_S7_S7_iiiiiiiiiiiiiiiiiiiiiiiiii:463
Line: 473
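The last error lists the targets the file was built for (cc80 and cc86) and asks for cc61. As a rough aid for reading such messages, here is a minimal C sketch, assuming the OpenACC runtime selects device code according to CUDA's documented compatibility rules: SASS built for compute capability X.Y runs only on devices with the same major version X and minor version ≥ Y, while PTX for X.Y can be JIT-compiled on any device with compute capability ≥ X.Y. The helper names (parse_cc_targets, sass_runs_on, ptx_runs_on, device_code_usable) are hypothetical and not part of the HPC SDK; compute capabilities are encoded as major*10 + minor (e.g. 86 for 8.6).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: collect every "-gpu=ccNN" target from a flag list such
 * as the text after "This file was compiled:". Flags without digits (for
 * example "-gpu=ccall") are skipped. Returns the number of targets stored. */
static int parse_cc_targets(const char *msg, int *out, int max_out)
{
    assert(msg != NULL && out != NULL);
    int n = 0;
    const char *p = msg;
    while (n < max_out && (p = strstr(p, "-gpu=cc")) != NULL) {
        p += strlen("-gpu=cc");
        char *end;
        long cc = strtol(p, &end, 10);
        if (end != p)               /* digits found, e.g. "86" */
            out[n++] = (int)cc;
        p = end;                    /* continue after this flag */
    }
    return n;
}

/* Assumed rule: SASS for X.Y runs on devices with major == X and minor >= Y. */
static bool sass_runs_on(int target_cc, int device_cc)
{
    return target_cc / 10 == device_cc / 10 && device_cc % 10 >= target_cc % 10;
}

/* Assumed rule: PTX for X.Y can be JIT-compiled on any device with cc >= X.Y. */
static bool ptx_runs_on(int target_cc, int device_cc)
{
    return device_cc >= target_cc;
}

/* True if any compiled target could run on the device under the rules above. */
static bool device_code_usable(const int *targets, int n, int device_cc, bool has_ptx)
{
    for (int i = 0; i < n; ++i) {
        if (sass_runs_on(targets[i], device_cc))
            return true;
        if (has_ptx && ptx_runs_on(targets[i], device_cc))
            return true;
    }
    return false;
}
```

Fed the flag list from that error, parse_cc_targets yields {80, 86}; under these rules device_code_usable is true for 86 (RTX 3070) and 89 (RTX 4080) and false for 61, the compute capability the runtime requests for GPU 0.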
Running ldd on the executable shows that the OpenACC and CUDA-related libraries are linked as follows:
libacccuda.so => /opt/nvidia/hpc_sdk/Linux_x86_64/23.7/compilers/lib/libacccuda.so (0x00007fe543400000)
libacchost.so => /opt/nvidia/hpc_sdk/Linux_x86_64/23.7/compilers/lib/libacchost.so (0x00007fe543000000)
libaccdevice.so => /opt/nvidia/hpc_sdk/Linux_x86_64/23.7/compilers/lib/libaccdevice.so (0x00007fe542800000)
libaccdevaux.so => /opt/nvidia/hpc_sdk/Linux_x86_64/23.7/compilers/lib/libaccdevaux.so (0x00007fe542400000)
libcudart.so.12 => /opt/nvidia/hpc_sdk/Linux_x86_64/23.7/cuda/12.2/lib64/libcudart.so.12 (0x00007f19bac00000)
The compiler environment (from nvc++ -v) confirms:
Export PGI_CURR_CUDA_HOME=/opt/nvidia/hpc_sdk/Linux_x86_64/23.7/cuda/12.2
Export NVHPC_CURRENT_CUDA_HOME=/opt/nvidia/hpc_sdk/Linux_x86_64/23.7/cuda/12.2
Export NVHPC_CURRENT_CUDA_VERSION=12.2.53
Export NVCOMPILER=/opt/nvidia/hpc_sdk/Linux_x86_64/23.7
Export PGI=/opt/nvidia/hpc_sdk
At runtime no CUDA device code is found for TIDL_refConv2dKernelFast, despite the build logs showing “Generating NVIDIA GPU code” and the generation of .ptx and fat binary files:

void TIDL_refConv2dKernelFast<1, unsigned short, signed char, int, int>(unsigned short*, signed char*, int*, int*, int*, int*, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int):
    473, Generating present(pCoeffs[:((numInChannels-1)*(coeffsWidth*coeffsHeight))+((coeffsWidth*(coeffsHeight*(numInChannels*(numOutChannels-1))))+(numOutChannels*(coeffsWidth*((numInChannels*(numGroups-1))*coeffsHeight))))+1],pInChannel[:((width%strideWidth)+(width-strideWidth))+((inImPitch*((height%strideHeight)+(height-strideHeight)))+((inChPitch*(numInChannels-1))+((inBatchPitch*(numBatches-1))+(inChPitch*(numInChannels*(numGroups-1))))))+1],pBias[:numOutChannels+((numGroups-1)*numOutChannels)],accPtr[:(((width%strideWidth)+(width-strideWidth))/strideWidth)+((((height%strideHeight)+(height-strideHeight))*outImPitch)+(((numOutChannels-1)*outChPitch)+(((numBatches-1)*outBatchPitch)+(((numGroups-1)*numOutChannels)*outChPitch))))+1])
         Generating implicit firstprivate(numGroups,strideHeight,topPad,width,pInChannel,numInChannels,numBatches,leftPad,inWidth,isOTFpad,inHeight,strideWidth,inImPitch,height,numOutChannels)
         Generating NVIDIA GPU code
        496, #pragma acc loop gang, vector(128) collapse(5) /* blockIdx.x threadIdx.x */
        498,   /* blockIdx.x threadIdx.x collapsed */
        500,   /* blockIdx.x threadIdx.x collapsed */
        502,   /* blockIdx.x threadIdx.x collapsed */
        504,   /* blockIdx.x threadIdx.x collapsed */
             Generating reduction(min:_min)
             Generating reduction(max:_max)
        519, #pragma acc loop seq
        524, #pragma acc loop seq
        527, #pragma acc loop seq
    504, Generating implicit firstprivate(enableBias,inBatchPitch,inChPitch,outBatchPitch,outImPitch,outChPitch)
    519, Generating implicit firstprivate(coeffsHeight,coeffsWidth)
    527, Generating implicit firstprivate(dilationHeight,startRowNumberInTensor,padVal,dilationWidth)
...................
/opt/nvidia/hpc_sdk/Linux_x86_64/23.7/compilers/bin/tools/nvdd -dcuda /opt/nvidia/hpc_sdk/Linux_x86_64/23.7/cuda/12.2 -usenvvm -nvvm70 -reloc /tmp/nvacceWgemkX4NEtn.gpu -computecap 86 -ptx /tmp/nvacceWgem5n6NRPb.ptx -o /tmp/nvaccKWgeSc99ei3K.bin -ftz -cuda12020
/opt/nvidia/hpc_sdk/Linux_x86_64/23.7/compilers/bin/tools/nvdd -dcuda /opt/nvidia/hpc_sdk/Linux_x86_64/23.7/cuda/12.2 -reloc -cuda12020 -fat src/tidl_conv2d_base.c -sm 86 /tmp/nvaccKWgeSc99ei3K.bin -compute 86 /tmp/nvacceWgem5n6NRPb.ptx -o /tmp/nvacceWgemyLVfjWr.fat
NVC++/x86-64 Linux 23.7-0: compilation successful
Any help or suggestions you can provide would be greatly appreciated.