This thread has been locked.

If you have a related question, please click the "Ask a related question" button in the top right corner. The newly created question will be automatically linked to this question.

HELP:The segmentation fault issue in the OpenCL kernel function

Other Parts Discussed in Thread: 66AK2H12

When I implemented the function based on OpenCL, the segmentation fault occurred within the kernel function.
The background is that I just made the static lib (.a file) including the OpenCL code, and integrate such lib in my project and call the public interface from the lib.
When I gdb the program, the callstack is as follows:

#0  0x000ce780 in ?? ()
#1  0xb6d1d3ec in Coal::BufferEvent::isSubBufferAligned(Coal::MemObject const*, Coal::DeviceInterface const*) () from /usr/lib/libOpenCL.so.1
#2  0xb6d1f768 in Coal::KernelEvent::KernelEvent(Coal::CommandQueue*, Coal::Kernel*, unsigned int, unsigned int const*, unsigned int const*, unsigned int const*, unsigned int, _cl_event* const*, int*) () from /usr/lib/libOpenCL.so.1
#3  0xb6d17664 in clEnqueueNDRangeKernel () from /usr/lib/libOpenCL.so.1
#4  0x0008efee in cl::CommandQueue::enqueueNDRangeKernel (this=0xc7dcc <kernelFunc+4>, kernel=..., offset=..., global=..., local=..., events=0x0,
    event=0xbeffed94)
    at /opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/include/CL/cl.hpp:2955
#5  0x0008fc70 in cl::KernelFunctor::operator()<int, int, int, float, cl::Buffer*, int, cl::Buffer*, int, float, cl::Buffer*, int, int, int, cl::LocalSpaceArg, cl::Buffer> (this=0xc7dc8 <kernelFunc>, a1=@0xbeffed14: 16, a2=@0xbeffed10: 200704, a3=@0xbeffedb8: 27, a4=@0xbeffed0c: 1,
    a5=@0xbeffed34: 0x122ea0, a6=@0xbeffed14: 16, a7=@0xbeffed30: 0x117ba8, a8=@0xbeffedb8: 27, a9=@0xbeffed08: 1, a10=@0xbeffed2c: 0x10a4b8,
    a11=@0xbeffed14: 16, a12=@0xc7da0: 32, a13=@0xc7da4: 8, a14=..., a15=..., events=0x0)
    at /opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/include/CL/cl.hpp:3936
   ...

(PS: a13 means constant value 8, not the memory address)

And the error information is:
Cannot access memory at address 0x8
Program received signal SIGSEGV, Segmentation fault.
Cannot access memory at address 0x8

I see that the program visited the illegal address 0x8 in the "KernelEvent". But I've checked all the parameters of kernel function, they are either allocated at the normal address or the fixed values. Since the last function is "isSubBufferAligned" that exists in OpenCL lib. I didn't know how to go further on this issue.

Could anyone give some idea on how to debug such OpenCL run-time error?
Thanks so much!!

  • Hi,

    Could you please share which board is this? I see that you use ti-processor-sdk-linux-k2hk-evm-03.00.00.04, does this mean that you use the K2HK EVM or is this a custom board? If this is the K2H EVM, can you share the steps to reproduce the problem on my side?

    Also if you're using the TI reference design (K2H EVM), have you tried with the latest ti-processor-sdk-linux-k2hk-evm-03.02.00.05 (kernel 4.4.32)?

    Best Regards,
    Yordan
  • Hi, Yordan

    Sorry I missed some information. I'm using 66AK2H12 board.

    From my point of view, the issue is almost irrelevant to the older version of processor-sdk. I've attached the source code and building script as a zip package here.

    code_segmentation_fault.zip


    Just run "sh build_lib_test.sh" and "./main-g" to start the program.

    Thank you so much for helping me on this topic!!

  • Hi,

    I'm trying to compile the source you attach, but in the opencl_sgemm_lib directory I have the following files:

    x0155517@mms:~/ti-processor-sdk-linux-k2hk-evm-03.02.00.05/seg_fault/code_segmentation_fault/opencl_sgemm_lib$ ls

    cblas.h data.h data_move.c edmamgr.h gemm.cpp gemm_dsp.h kernel.cl libcblas_atlas_arm main.c Makefile sgemm.c sgemm_kernel.c sgemm_kernel.h


    In the libcblas_atlas_arm directory I have the following files: 
    x0155517@mms:~/ti-processor-sdk-linux-k2hk-evm-03.02.00.05/seg_fault/code_segmentation_fault/opencl_sgemm_lib/libcblas_atlas_arm$ ls

    libatlas.a libcblas.a libptcblas.a

    Respectively in  sgemm.c,  line 45 there is the following include: #include "dsp_c.h"   & in build_lib_test.sh, there is the following command: 
    "arm-linux-gnueabihf-gcc  -I/home/x0155517/ti-processor-sdk-linux-k2hk-evm-03.02.00.05/linux-devkit/sysroots/x86_64-arago-linux/usr/include -I/home/x0155517/ti-processor-sdk-linux-k2hk-evm-03.02.00.05/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/include -idirafter /usr/include -Wall -Wfatal-errors -pg  -O0 -g main.o -o main -lrt -lpthread -lm  -L/home/x0155517/ti-processor-sdk-linux-k2hk-evm-03.02.00.05/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/lib -L/home/x0155517/ti-processor-sdk-linux-k2hk-evm-03.02.00.05/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/lib -Wl,-rpath-link,/home/x0155517/ti-processor-sdk-linux-k2hk-evm-03.02.00.05/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/lib -Wl,-rpath-link,/home/x0155517/ti-processor-sdk-linux-k2hk-evm-03.02.00.05/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/lib -lOpenCL -locl_util /home/x0155517/ti-processor-sdk-linux-k2hk-evm-03.02.00.05/seg_fault/code_segmentation_fault/opencl_sgemm_lib/libcblas_atlas_arm/libcblas.a /home/x0155517/ti-processor-sdk-linux-k2hk-evm-03.02.00.05/seg_fault/code_segmentation_fault/opencl_sgemm_lib/libcblas_atlas_arm/libatlas.a libgemm_dsp.a -lstdc++ -locl_util "

    And as you see by the output of ls, there are no dsp_c.h & libgemm_dsp.a files, so my build fails with: 

    "sgemm.c", line 45: fatal error: cannot open source file "dsp_c.h"
    1 catastrophic error detected in the compilation of "sgemm.c".
    Compilation terminated.

    >> Compilation failure
    make: *** [sgemm.obj] Error 1
    Building test code ...
    arm-linux-gnueabihf-gcc: error: libgemm_dsp.a: No such file or directory

    Could you provide those two missing files? 

    Best Regards, 
    Yordan


       

  • Hi, Yordan

    I've checked your two missing files. The first "dsp_c.h" is included in the SDK directory, I marked it with red color in the following log. And "libgemm_dsp.a" is just one of the generating files of the script. The "build_lib_test.sh" will generate "libgemm_dsp.a" and "main" sequentially.

    Here is my log of compilation:

    Building lib ...
    Compiling sgemm.c
    cl6x -mv6600 --abi=eabi -I/opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/x86_64-arago-linux/usr/share/ti/cgt-c6x/include  -I/usr/share/ti/opencl -I/opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/x86_64-arago-linux/usr/share/ti/opencl -I/opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/x86_64-arago-linux/usr/share/ti/cgt-c6x/include -c -o0  -g sgemm.c
    "sgemm.c", line 31: warning: #warning directive: Using EDMA to transfer data
    Compiling sgemm_kernel.c
    cl6x -mv6600 --abi=eabi -I/opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/x86_64-arago-linux/usr/share/ti/cgt-c6x/include  -I/usr/share/ti/opencl -I/opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/x86_64-arago-linux/usr/share/ti/opencl -I/opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/x86_64-arago-linux/usr/share/ti/cgt-c6x/include -c -o0  -g sgemm_kernel.c
    Compiling data_move.c
    cl6x -mv6600 --abi=eabi -I/opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/x86_64-arago-linux/usr/share/ti/cgt-c6x/include  -I/usr/share/ti/opencl -I/opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/x86_64-arago-linux/usr/share/ti/opencl -I/opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/x86_64-arago-linux/usr/share/ti/cgt-c6x/include -c -o0  -g data_move.c
    clocl  -g --txt kernel.cl sgemm.obj sgemm_kernel.obj data_move.obj
    Compiling gemm.cpp
    arm-linux-gnueabihf-ar rcs  libgemm_dsp.a gemm.o
    Building test code ...

  • Hi,

    I tested this on my side, and here is the result.

    Upon first run of ./main -g, I get the bellow console output:

    root@k2hk-evm:~# ./main -g

    first line for the project!!!!

    Generating Input Data ...C[16,200704] = alpha * A[16,27] * B[27,200704] + beta * C[16,200704], use row-major storage

    alpha=1.000000, beta=1.000000

    Illegal instruction (core dumped)

    As you can see it differs from the behaviour on your side. I get Illegal instruction.  However on a consequent try the output is as follows:

    root@k2hk-evm:~# ./main -g

    first line for the project!!!!

    [   45.093763] misc debugss: drv_error: debugss_remap_mmap: Requested debugss physical address range is already mapped by another process with pid6

    Generating Input Data ...C[16,200704] = alpha * A[16,27] * B[27,200704] + beta * C[16,200704], use row-major storage

    alpha=1.000000, beta=1.000000

    Segmentation fault (core dumped)

    root@k2hk-evm:~#  

    The drv_error says it is conflicting with another process pid6, which was a kworker.  Anyway, in both cases I see that the program crashes around alpha=1.000000, beta=1.000000, that is right before the call of __cblas_sgemm(). So I started narrowing down the cause of the crash, by adding the following prints in  gemm.cpp:

    __cblas_sgemm.c
    extern "C" void __cblas_sgemm(int TA, int TB, int M, int N, int K, float ALPHA, 
            float* A, int lda, 
            float* B, int ldb,
            const float BETA,
            float* C, int ldc)
    /*(int TA, int TB, int M, int N, int K, float ALPHA, 
            float* A, void* A_cl, int lda, 
            float* B, void* B_cl, int ldb,
            const float BETA,
            float* C, void* C_cl, int ldc)*/
    {
        enum CBLAS_ORDER order = CblasRowMajor; /*Fixed value*/
    /*
        Buffer* a = (Buffer*)A_cl;
        Buffer* b = (Buffer*)B_cl;
        Buffer* c = (Buffer*)C_cl;
    */
        int i = 0;
        printf("print before first dsp_mem_alloc_copy()\n");
        Buffer* a = dsp_mem_alloc_copy(A, (M*K), CL_MEM_READ_ONLY);
        i++;
        printf("print %d", i); 
        Buffer* b = dsp_mem_alloc_copy(B, (N*K), CL_MEM_READ_ONLY);
        i++;
        printf("print %d", i);
        Buffer* c = dsp_mem_alloc_copy(C, (M*N), CL_MEM_READ_WRITE);
        i++;
        printf("print %d", i);
        
        printf("print before OOP allocation new. most likely breaking point, since we have conflict with kworker!\n");
        kernel = new Kernel(*program, "K_ocl_sgemm_dsp");
        printf("random print after new Kernel()\n");
        NDRange* global = new NDRange(NUMCOMPUNITS);//test
        printf("print after second OOP memory allocation!\n");
        NDRange* local = new NDRange(1);
        printf("print after third OOP memory allocation!\n");
        kernelFunc = kernel->bind(*commandQueue, *global, *local);
        printf("print after kernel bind!!!\n");
    
        if (TA)
        {
            transpose_matrix(A, M, K);
            printf("print after TA! \n");
        }
    
        if (TB)
        {
            transpose_matrix(B, K, N);
            printf("print after TB! \n");
        }
    
        try
        {
           if (order == CblasRowMajor) {
               kernelFunc(N, M, K, ALPHA, b, N, a, K, BETA, c, N,
                      NUMAPANELS, NUMBPANELS,
                      __local(L2_BUF_SIZE), *bufMsmc).wait();
               printf("Print after CBlasRowMajor kernel func in if statment in try! \n");
           }
    
           else {
               kernelFunc(M, N, K, ALPHA, a, M, b, K, BETA, c, M,
                      NUMAPANELS, NUMBPANELS,
                      __local(L2_BUF_SIZE), *bufMsmc).wait();
               printf("Print after  kernel func in else statment in try! \n");
            }
        }
        catch (Error err)
       {
           cerr << "ERROR: " << err.what() << "(" << err.err() << ", "
                << ocl_decode_error(err.err()) << ")" << endl;
           printf("Print in CATCH! \n");
           exit(-1);
       }
    
        RELEASE_OBJ(kernel);
        RELEASE_OBJ(global);
        RELEASE_OBJ(local);
        dsp_free(a);
        dsp_free(b);
        dsp_free(c);
        printf("print upon exit of __cblas_sgemm!\n");
        return;
    }
    
    

    And ran the program again.. the output is: 

    root@k2hk-evm:~# ./main -g
    first line for the project!!!!
    [ 620.492314] misc debugss: drv_error: debugss_remap_mmap: Requested debugss physical address range is already mapped by another process with pid:844
    Generating Input Data ...C[16,200704] = alpha * A[16,27] * B[27,200704] + beta * C[16,200704], use row-major storage
    alpha=1.000000, beta=1.000000

    Debug print before the __cblas_sgemm call
    print before first dsp_mem_alloc_copy()
    print 1print 2print 3print before OOP allocation new. most likely breaking point, since we have conflict with kworker!
    random print after new Kernel()
    print after second OOP memory allocation!
    print after third OOP memory allocation!
    print after kernel bind!!!
    Segmentation fault (core dumped)

    As you can see the kernel crashes right after calling kernelFunc = kernel->bind(*commandQueue, *global, *local);. The program does NOT go into the If (TA), if (TB) or the try catch statements. 

    Also this time the conflicting process with pid 844 did NOT appear in top. My guess is that it was a daemon started with the ./main & was killed after its exectuion.. Currently I am not sure what exactly is the problem.. Have you tried locking the execution of __cblas_sgemm() (i.e. with a semaphor) to see if the problem will persist?  Also did you get the same behavior with the unmodified opencl_sgemm_lib provided in the SDK? 

    Best Regards, 
    Yordan

  • Hi,

    Ok, it seems that the Segmentation call is caused by the following call:
    kernelFunc = kernel->bind(*commandQueue, *global, *local);

    If I remove this line from gemm.cpp the code works fine & catches an error:
    root@k2hk-evm:~# ./main
    first line for the project!!!!
    Generating Input Data ...C[16,200704] = alpha * A[16,27] * B[27,200704] + beta * C[16,200704], use row-major storage
    alpha=1.000000, beta=1.000000

    Debug print before the __cblas_sgemm call
    print before first dsp_mem_alloc_copy()
    print 1print 2print 3print before OOP allocation new. most likely breaking point, since we have conflict with kworker!
    random print after new Kernel()
    print after second OOP memory allocation!
    print after third OOP memory allocation!
    ERROR: clSetKernelArg(-48, Invalid kernel)

    If i have the kernel-bind() line in the code it executes, prints my debug message (printf("print after kernel bind!!!\n"); -> refer to the attached __cblas_sgemm.c file in my previous post), but results in Segmentaiton fault & cannot print the last debug message in the gemm.cpp file (printf("print upon exit of __cblas_sgemm!\n");) and cannot exit the __cblas_sgemm() function ==> execution of main is terminated with Segmentation Fault.

    EDIT: FYI, I tested the locking of __cblas_sgemm() with sem_init(), sem_wait(), sem_post() & sem_destroy() Also tried using lock() & unlock() in the gemm.c around  kernel-bind() in gemm.cpp. They don't work, still have the segmentation fault.

    Best Regards,
    Yordan

  • Hi, Yordan

    The current debug result is my side is like this:

    I used gdb to execute each line in the function and "kernel->bind" could be passed, the program crashed in the "kernelFunc".
     I showed my gdb log as below:

    k2hk-evm:/home/mcw/opencl_example# gdb main-g
    GNU gdb (Linaro GDB) 7.8-2014.09
    Copyright (C) 2014 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <gnu.org/.../gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "arm-linux-gnueabi".
    Type "show configuration" for configuration details.
    For bug reporting instructions, please see:
    <http://bugs.linaro.org>.
    Find the GDB manual and other documentation resources online at:
    <www.gnu.org/.../>.
    For help, type "help".
    Type "apropos word" to search for commands related to "word"...
    Reading symbols from main-g...done.

    (gdb) r
    The program being debugged has been started already.
    Start it from the beginning? (y or n) y

    Starting program: /home/mcw/opencl_example/main-g
    warning: Could not load shared library symbols for linux-vdso.so.1.
    Do you need "set solib-search-path" or "set sysroot"?
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/libthread_db.so.1".
    warning: File "/lib/libstdc++.so.6.0.21-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
    first line for the project!!!!
    [New Thread 0x9f294440 (LWP 842)]
    [New Thread 0x9ea94440 (LWP 843)]
    Generating Input Data ...C[16,200704] = alpha * A[16,27] * B[27,200704] + beta * C[16,200704], use row-major storage
    alpha=1.000000, beta=1.000000


    Breakpoint 1, __cblas_sgemm (TA=0, TB=0, M=16, N=200704, K=27, ALPHA=1,
        A=0x11a410, lda=27, B=0x9cde8008, ldb=200704, BETA=1, C=0x9c1a7008,
        ldc=200704) at gemm.cpp:256

    Since kernelFunc is just pointer function and wrapped by OpenCL, I went into its implementation inside on "cl.hpp", which exists in "opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/include/CL/".

    256        enum CBLAS_ORDER order = CblasRowMajor; /*Fixed value*/
    (gdb) n
    262        Buffer* a = dsp_mem_alloc_copy(A, (M*K), CL_MEM_READ_ONLY);
    (gdb)
    263        Buffer* b = dsp_mem_alloc_copy(B, (N*K), CL_MEM_READ_ONLY);
    (gdb)
    264        Buffer* c = dsp_mem_alloc_copy(C, (M*N), CL_MEM_READ_WRITE);
    (gdb)
    266        kernel = new Kernel(*program, "K_ocl_sgemm_dsp");
    (gdb)
    267        NDRange* global = new NDRange(NUMCOMPUNITS);//test
    (gdb)
    268        NDRange* local = new NDRange(1);
    (gdb)
    269        kernelFunc = kernel->bind(*commandQueue, *global, *local);
    (gdb)
    271        if (TA)
    (gdb)
    276        if (TB)
    (gdb)
    283           if (order == CblasRowMajor)
    (gdb)
    286                      __local(L2_BUF_SIZE), *bufMsmc).wait();
    (gdb)
    284               kernelFunc(N, M, K, ALPHA, b, N, a, K, BETA, c, N,
    (gdb) s
    cl::KernelFunctor::operator()<int, int, int, float, cl::Buffer*, int, cl::Buffer*, int, float, cl::Buffer*, int, int, int, cl::LocalSpaceArg, cl::Buffer> (
        this=0x91530 <kernelFunc>, a1=@0xbefffab0: 200704, a2=@0xbefffab4: 16,
        a3=@0xbefffb58: 27, a4=@0xbefffaac: 1, a5=@0xbefffad0: 0xd98e0,
        a6=@0xbefffab0: 200704, a7=@0xbefffad4: 0xd9830, a8=@0xbefffb58: 27,
        a9=@0xbefffaa8: 1, a10=@0xbefffacc: 0xdba60, a11=@0xbefffab0: 200704,
        a12=@0x91508: 32, a13=@0x9150c: 8, a14=..., a15=..., events=0x0)
        at /opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/include/CL/cl.hpp:3918
    3918    /opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/include/CL/cl.hpp: No such file or directory.
    (gdb) n
    3920    in /opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/include/CL/cl.hpp

    I continued the program until "cl::CommandQueue::enqueueNDRangeKernel". From the call stack at the end of the following log we can see it died in the "Coal::BufferEvent::isSubBufferAligned", which is kind of lib function from libOpenCL.so.

    (gdb)
    3921    in /opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/include/CL/cl.hpp
    (gdb)
    3922    in /opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/include/CL/cl.hpp
    (gdb)
    3923    in /opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/include/CL/cl.hpp
    (gdb)
    3924    in /opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/include/CL/cl.hpp
    (gdb)
    3925    in /opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/include/CL/cl.hpp
    (gdb)
    3926    in /opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/include/CL/cl.hpp
    (gdb)
    3927    in /opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/include/CL/cl.hpp
    (gdb)
    3928    in /opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/include/CL/cl.hpp
    (gdb)
    3929    in /opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/include/CL/cl.hpp
    (gdb)
    3930    in /opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/include/CL/cl.hpp
    (gdb) n
    3931    in /opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/include/CL/cl.hpp
    (gdb)
    3932    in /opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/include/CL/cl.hpp
    (gdb)
    3933    in /opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/include/CL/cl.hpp
    (gdb)
    3934    in /opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/include/CL/cl.hpp
    (gdb)
    3936    in /opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/include/CL/cl.hpp
    (gdb) s
    cl::CommandQueue::enqueueNDRangeKernel (this=0x91534 <kernelFunc+4>,
        kernel=..., offset=..., global=..., local=..., events=0x0,
        event=0xbefffb2c)
        at /opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/include/CL/cl.hpp:2957
    2957    in /opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/include/CL/cl.hpp
    (gdb) s
    2955    in /opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/include/CL/cl.hpp
    (gdb) s
    cl::detail::Wrapper<_cl_kernel*>::operator() (this=0x91530 <kernelFunc>)
        at /opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/include/CL/cl.hpp:1122
    1122    in /opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/include/CL/cl.hpp
    (gdb) s
    cl::NDRange::dimensions (this=0x91550 <kernelFunc+32>)
        at /opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/include/CL/cl.hpp:2252
    2252    in /opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/include/CL/cl.hpp
    (gdb) s
    cl::CommandQueue::enqueueNDRangeKernel (this=0x91534 <kernelFunc+4>,
        kernel=..., offset=..., global=..., local=..., events=0x0,
        event=0xbefffb2c)
        at /opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/include/CL/cl.hpp:2958
    2958    in /opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/include/CL/cl.hpp
    (gdb) s
    cl::NDRange::dimensions (this=0x91538 <kernelFunc+8>)
        at /opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/include/CL/cl.hpp:2252
    2252    in /opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/include/CL/cl.hpp
    (gdb) s
    cl::CommandQueue::enqueueNDRangeKernel (this=0x91534 <kernelFunc+4>,
        kernel=..., offset=..., global=..., local=..., events=0x0,
        event=0xbefffb2c)
        at /opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/include/CL/cl.hpp:2955
    2955    in /opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/include/CL/cl.hpp
    (gdb) s
    cl::NDRange::operator unsigned int const* (this=0x91550 <kernelFunc+32>)
        at /opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/include/CL/cl.hpp:2251
    2251    in /opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/include/CL/cl.hpp
    (gdb) s
    cl::vector<unsigned int, 3u>::operator unsigned int const* (
        this=0x91550 <kernelFunc+32>)
        at /opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/include/CL/cl.hpp:528
    528    in /opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/include/CL/cl.hpp
    (gdb) s
    cl::CommandQueue::enqueueNDRangeKernel (this=0x91534 <kernelFunc+4>,
        kernel=..., offset=..., global=..., local=..., events=0x0,
        event=0xbefffb2c)
        at /opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/include/CL/cl.hpp:2960
    2960    in /opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/include/CL/cl.hpp
    (gdb) s
    cl::NDRange::dimensions (this=0x91568 <kernelFunc+56>)
        at /opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/include/CL/cl.hpp:2252
    2252    in /opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/include/CL/cl.hpp
    (gdb) s
    cl::CommandQueue::enqueueNDRangeKernel (this=0x91534 <kernelFunc+4>,
        kernel=..., offset=..., global=..., local=..., events=0x0,
        event=0xbefffb2c)
        at /opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/include/CL/cl.hpp:2955
    2955    in /opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/include/CL/cl.hpp
    (gdb) s
    cl::NDRange::operator unsigned int const* (this=0x91568 <kernelFunc+56>)
        at /opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/include/CL/cl.hpp:2251
    2251    in /opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/include/CL/cl.hpp
    (gdb) s
    cl::vector<unsigned int, 3u>::operator unsigned int const* (
        this=0x91568 <kernelFunc+56>)
        at /opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/include/CL/cl.hpp:528
    528    in /opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/include/CL/cl.hpp
    (gdb) s
    Cannot access memory at address 0x8

    Program received signal SIGILL, Illegal instruction.
    Cannot access memory at address 0x8
    0x000a4c1c in ?? ()
    (gdb) bt
    #0  0x000a4c1c in ?? ()
    #1  0xb6d1d3ec in Coal::BufferEvent::isSubBufferAligned(Coal::MemObject const*, Coal::DeviceInterface const*) () from /usr/lib/libOpenCL.so.1
    #2  0xb6d1f768 in Coal::KernelEvent::KernelEvent(Coal::CommandQueue*, Coal::Kernel*, unsigned int, unsigned int const*, unsigned int const*, unsigned int const*, unsigned int, _cl_event* const*, int*) () from /usr/lib/libOpenCL.so.1
    #3  0xb6d17664 in clEnqueueNDRangeKernel () from /usr/lib/libOpenCL.so.1
    #4  0x0005d8fe in cl::CommandQueue::enqueueNDRangeKernel (
        this=0x91534 <kernelFunc+4>, kernel=..., offset=..., global=...,
        local=..., events=0x0, event=0xbefffb2c)
        at /opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/include/CL/cl.hpp:2955
    #5  0x0005e580 in cl::KernelFunctor::operator()<int, int, int, float, cl::Buffer*, int, cl::Buffer*, int, float, cl::Buffer*, int, int, int, cl::LocalSpaceArg, cl::Buffer> (this=0x91530 <kernelFunc>, a1=@0xbefffab0: 200704,
        a2=@0xbefffab4: 16, a3=@0xbefffb58: 27, a4=@0xbefffaac: 1,
        a5=@0xbefffad0: 0xd97e0, a6=@0xbefffab0: 200704, a7=@0xbefffad4: 0xd88e8,
        a8=@0xbefffb58: 27, a9=@0xbefffaa8: 1, a10=@0xbefffacc: 0xd9ca0,
        a11=@0xbefffab0: 200704, a12=@0x91508: 32, a13=@0x9150c: 8, a14=...,
        a15=..., events=0x0)
        at /opt/ti-processor-sdk-linux-k2hk-evm-03.00.00.04/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/include/CL/cl.hpp:3936
    #6  0x0005cb38 in __cblas_sgemm (TA=0, TB=0, M=16, N=200704, K=27, ALPHA=1,
    ---Type <return> to continue, or q <return> to quit---  
        A=0x11a418, lda=27, B=0x9ce68008, ldb=200704, BETA=1, C=0x9c227008,
        ldc=200704) at gemm.cpp:284
    #7  0x00011da8 in main () at main.c:224

    From the crash information, it's obvious that some function pointer or data pointer is set by 0x8, a illegal instruction address to go on. In another word, If we can gdb the lib, I think we can easily find which pointer have the illegal value.


    Yordan, could you please do me a favor to request help to your collegue if they had any experience or idea on debugging such issue?


    Thanks so much!!

  • Hi,

    I also came across this:
    #1 0xb6d1d3ec in Coal::BufferEvent::isSubBufferAligned(Coal::MemObject const*, Coal::DeviceInterface const*) () from /usr/lib/libOpenCL.so.1
    when debugging the app with gdb, yesterday. Unfortunately I don't have the sources of the .so library and cannot elaborate further.

    The design team is notified, and should elaborate directly here.

    Best Regards,
    Yordan
  • Hi Hao,

    I looked at your code briefly. It appears that your code has a bug. The kernel functor is expecting MemObject/Buffer references, yet you passed in pointers.

    - Yuan
  • Dear Hao Yang,

    Please take a look on following guide for TI' OPEN CL.

    The 8GB of DDR3 exists in the K2x 36-bit physical address space at addresses 8:0000:0000 to 9:FFFF:FFFF. The K2x device boots with the C66x DSP’s mapping the upper 2GB of its address space 8000:0000 to FFFF:FFFF to the beginning of that physical range. For the remainder of this section, the physical range from 8:0000:0000 to 8:7FFF:FFFF will be referred to as the low 2GB and the range from 8:8000:0000 to 9:FFFF:FFFF will be referred to as the upper 6GB.

    If the entire upper 6GB of memory are configured as Linux system memory and are therefore unavailable to OpenCL, then OpenCL will have 1488MB of memory in the lower 2GB available for OpenCL C programs and Buffers and no further constraints are necessary and the remainder of this section is not applicable. Additionally, if the environment variable TI_OCL_DSP_NOMAP is set, then OpenCL will ignore any CMEM region that is defined in the upper 6GB, and OpenCL operation will be restricted to the lower 2GB and again the remainder of this section is not applicable.

    If there is memory in the upper 6GB that is given to CMEM to manage, then that memory will be available to OpenCL as well and understanding how OpenCL will use that memory is important so an application can maximize resource utilization. The figure below illustrates a potential DDR partition with CMEM in the upper 6GB.

     

    Best regards,

  • Yes, you're right. Yuan.

    When I used buffer reference instead of buffer pointer, the crash disappears. But I'm still confused that the kernel functor ("K_ocl_sgemm_dsp") has the "float*" parameter, why I have to use "Buffer" reference instead of the pointer?

    This is the kernel functor declaration:

    K_ocl_sgemm_dsp(
                    int m, int n, int k,
                    float alpha,
                    global float *a, int lda,
                    global float *b, int ldb,
                    float beta,
                    global float *c, int ldc,
                    int NUMAPANELS, int NUMBPANELS,
                    local  float *L2_buf, global float *Msmc_buf)

    Moreover, I found another issue at the end of program. The program still ends with "Segmentation fault.".  And when I remove "delete program", the issue disappear, but this will cause memory leakage because "program" is allocated by "new" operator. I think the reason may be some resource is still in use by kernel functor. Because when I insert "delete program" before the kernel functor call, there's no crash.

    Is there any other release task to do before delete program?

  • Hi Hao,

    That is just how OpenCL C++ header file (/usr/include/CL/cl.hpp) defines OpenCL C++ objects and APIs. You are welcome to use OpenCL C API, should you prefer.

    - Yuan