I'm hoping to find some pointers on debugging an issue I'm having running OpenCL code on an AM5728.
The gist of the problem is that the example code compiles and runs fine up until the point of validating the data produced by the DSP kernels. Each of the examples I've tried seems to have the same problem. Work is being done on the DSP, so the firmware is loading correctly and LAD is queueing up messages just fine, but the large chunks of data are all corrupt. I would guess that there's something wrong with the CMEM configuration, but everything looks to be configured correctly.
I'm running kernel 4.1.13 on a Compulab CL-SOM-AM57x (AM5728) with SDK version 2.00.01.07. Unfortunately, that's the most recent SDK they have "officially" running, and I'm not 100% sure their documentation is complete; I've had to fill in some holes here and there.
I noticed that for some other folks having issues with earlier versions of the SDK, the recommendation is to update the SDK. I'd be happy to update to a newer version of the SDK if anyone can comment on which versions will work with a 4.1 kernel. I can manage recompiling everything in the SDK, but it will be more difficult to upgrade the kernel to 4.4 at this point since I don't have all of Compulab's configurations for that kernel.
To illustrate some of the issues, here is the output from a few different examples.
The one with the most useful output seems to be vecadd_md. Here's what it looks like:
$ ./vecadd_md
DEVICE 0: TI Multicore C66 DSP
=== Method 1: Using ReadBuffer/WriteBuffer APIs ===
Failed at Element 1: 8 != 4
Failed at Element 2: 16 != 8
Failed at Element 3: 24 != 12
Failed at Element 4: 32 != 16
Failed at Element 5: 40 != 20
Failed at Element 6: 48 != 24
Failed at Element 7: 56 != 28
Failed at Element 8: 64 != 32
Failed at Element 9: 72 != 36
Method 1: 1130 micro seconds
DEVICE 0:
Write BufA : Queue to Submit: 9 us
Write BufA : Submit to Start : 26 us
Write BufA : Start to End : 42 us
Write BufB : Queue to Submit: 58 us
Write BufB : Submit to Start : 34 us
Write BufB : Start to End : 25 us
Kernel Exec : Queue to Submit: 1 us
Kernel Exec : Submit to Start : 18 us
Kernel Exec : Start to End : 279 us
Read BufDst : Queue to Submit: 276 us
Read BufDst : Submit to Start : 125 us
Read BufDst : Start to End : 23 us
Fail with 8191 errors!
=== Method 2: Using MapBuffer/UnmapBuffer APIs ===
Failed at Element 1: 8 != 4
Failed at Element 2: 16 != 8
Failed at Element 3: 24 != 12
Failed at Element 4: 32 != 16
Failed at Element 5: 40 != 20
Failed at Element 6: 48 != 24
Failed at Element 7: 56 != 28
Failed at Element 8: 64 != 32
Failed at Element 9: 72 != 36
Method 2: 859 micro seconds
DEVICE 0:
Map BufA : Queue to Submit: 1 us
Map BufA : Submit to Start : 16 us
Map BufA : Start to End : 2 us
Map BufB : Queue to Submit: 115 us
Map BufB : Submit to Start : 5 us
Map BufB : Start to End : 1 us
Unmap BufA : Queue to Submit: 1 us
Unmap BufA : Submit to Start : 18 us
Unmap BufA : Start to End : 10 us
Unmap BufB : Queue to Submit: 1 us
Unmap BufB : Submit to Start : 14 us
Unmap BufB : Start to End : 6 us
Kernel Exec : Queue to Submit: 1 us
Kernel Exec : Submit to Start : 12 us
Kernel Exec : Start to End : 234 us
Map BufDst : Queue to Submit: 240 us
Map BufDst : Submit to Start : 49 us
Map BufDst : Start to End : 4 us
Unmap BufDst : Queue to Submit: 2 us
Unmap BufDst : Submit to Start : 15 us
Unmap BufDst : Start to End : 2 us
Fail with 8191 errors!
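One thing worth checking in the vecadd_md output is whether the mismatches are random or follow a pattern. Here is a minimal sketch in Python, assuming the "Failed at Element N: X != Y" format shown above; mismatch_ratios is a hypothetical helper of my own, not part of the SDK, and it makes no assumption about which of the two printed values is the expected one.

```python
import re

# Matches lines such as "Failed at Element 3: 24 != 12".
# The format is assumed from the vecadd_md output quoted in this post.
FAIL_RE = re.compile(r"Failed at Element (\d+): (-?\d+) != (-?\d+)")

def mismatch_ratios(log_text):
    """Return a list of (element index, left/right ratio) for each failure line."""
    ratios = []
    for m in FAIL_RE.finditer(log_text):
        index, left, right = (int(g) for g in m.groups())
        if right != 0:  # skip entries where the ratio is undefined
            ratios.append((index, left / right))
    return ratios

# Example input: the first three failure lines from the output above.
sample = (
    "Failed at Element 1: 8 != 4\n"
    "Failed at Element 2: 16 != 8\n"
    "Failed at Element 3: 24 != 12\n"
)
for index, ratio in mismatch_ratios(sample):
    print(index, ratio)
```

For the nine failure lines printed by each method, every ratio comes out as 2.0, so the left value is consistently double the right one rather than arbitrary garbage.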
Something like dsplib_fft doesn't report any errors, but I don't think the code validates its results. However, it does show that at least some work is being done by the DSPs, so that much is working.
$ ./dsplib_fft
Offloading FFT (SP,Complex) of 64K elements...
Write X : Queue to Submit: 8 us
Write X : Submit to Start : 26 us
Write X : Start to End : 935 us
Twiddle : Queue to Submit: 953 us
Twiddle : Submit to Start : 42 us
Twiddle : Start to End : 823 us
FFT : Queue to Submit: 1271 us
FFT : Submit to Start : 5 us
FFT : Start to End : 13169 us
Read Y : Queue to Submit: 14430 us
Read Y : Submit to Start : 134 us
Read Y : Start to End : 858 us
Done!
Okay, so some of the examples report this type of error:
sudo ./dspheap
[host ] DDR heap size 16384k
recvfrom failed: Link has been severed (67)
rpmsgThreadFxn: transportGet failed on fd 12, returned -20
TIOCL FATAL: Communication to a DSP has been lost (likely due to an MMU fault).
Please wait while the DSPs are reset and the runtime attempts to terminate.
A reboot may be required before running another OpenCL application if this fails.
See the kernel log for fault information.
Looking in the kernel log:
[ 95.542808] omap-iommu 40d01000.mmu: iommu fault: da 0xc0011540 flags 0x0
[ 95.549636] remoteproc2: crash detected in 40800000.dsp: type mmufault
[ 95.556289] omap-iommu 40d01000.mmu: 40d01000.mmu: errs:0x00000002 da:0xc0011540 pgd:0xec09b000 *pgd:px00000000
[ 95.566472] remoteproc2: handling crash #1 in 40800000.dsp
[ 95.572107] remoteproc2: recovering 40800000.dsp
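To make repeated faults easier to compare, the faulting device addresses can be pulled out of the kernel log. This is only a sketch in Python, assuming the "iommu fault: da 0x..." line format shown above; iommu_fault_addresses is a hypothetical helper name, and the sketch does not know which address ranges the DSP is supposed to have mapped.

```python
import re

# Matches lines such as
# "omap-iommu 40d01000.mmu: iommu fault: da 0xc0011540 flags 0x0".
# The format is assumed from the kernel log quoted in this post.
FAULT_RE = re.compile(r"omap-iommu (\S+): iommu fault: da (0x[0-9a-fA-F]+)")

def iommu_fault_addresses(kernel_log):
    """Return a list of (mmu name, faulting device address) pairs."""
    return [(m.group(1), int(m.group(2), 16)) for m in FAULT_RE.finditer(kernel_log)]

# Example input: the first line of the kernel log above.
sample = "[ 95.542808] omap-iommu 40d01000.mmu: iommu fault: da 0xc0011540 flags 0x0"
for mmu, addr in iommu_fault_addresses(sample):
    print(mmu, hex(addr))
```

The extracted addresses can then be compared by hand against whatever regions the board configuration maps for the DSP.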
I'm not sure if that's directly related to the other issue, but it would indicate that something is wrong.
Thanks for any pointers.
Scott