
TMDSEVM572X: Linear Algebra Library examples

Part Number: TMDSEVM572X

Hi everyone,

I have downloaded the SDK "ti-processor-sdk-linux-am57xx-evm-03.03.00.04".

I want to run the LINALG examples on my AM572x board, so I followed the instructions in ti/ti-linalg-tree/examples/arm+dsp/readme.txt to build the example source code.

This is the content of readme.txt:

Build Instructions of LINALG examples:

1. The following environment variables must be set in order to build the LINALG examples:
export TARGET_ROOTDIR= <Processor-SDK-Linux installation root>/linux-devkit/sysroots/cortexa15hf-vfp-neon-linux-gnueabi
export LINALG_DIR= <LINALG_INSTALLATION_ROOT>

2. Just type "make" in examples root folder.
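
For what it's worth, the two steps amount to something like this (a sketch; both paths are placeholders for your actual installation directories, and the sysroot folder name can differ between SDK versions):

```shell
# Step 1: set the environment variables from the readme.
# NOTE: both paths below are placeholders; point them at your actual
# Processor-SDK-Linux and LINALG installation directories.
export TARGET_ROOTDIR=/path/to/processor-sdk-linux/linux-devkit/sysroots/cortexa15hf-vfp-neon-linux-gnueabi
export LINALG_DIR=/path/to/linalg

# Step 2: build from the examples root folder.
cd "$LINALG_DIR/examples/arm+dsp"
make
```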

But there is a problem: there is no folder named 'cortexa15hf-vfp-neon-linux-gnueabi' in the SDK.

There are only two folders under 'sysroots': 'armv7ahf-neon-linux-gnueabi' and 'x86_64-arago-linux'.

I know 'x86_64-arago-linux' holds the ARM GCC cross compiler, so is 'armv7ahf-neon-linux-gnueabi' equivalent to 'cortexa15hf-vfp-neon-linux-gnueabi'?

Assuming it is, I compiled the examples with TARGET_ROOTDIR= <Processor-SDK-Linux installation root>/linux-devkit/sysroots/armv7ahf-neon-linux-gnueabi. Then I ran the examples on my AM572x board, but it seems the example does not run correctly. This is the output:

/arm+dsp/dgemm_test# ./dgemm_test
[ 210.860949] NET: Registered protocol family 41
[ 211.578120] omap-iommu 40d01000.mmu: iommu fault: da 0xc45dd780 flags 0x0
[ 211.584942] remoteproc2: crash detected in 40800000.dsp: type mmufault
[ 211.591588] omap-iommu 40d01000.mmu: 40d01000.mmu: errs:0x00000002 da:0xc45dd780 pgd:0xdfab7114 *pgd:px00000000
[ 211.601765] remoteproc2: handling crash #1 in 40800000.dsp
[ 211.607358] remoteproc2: recovering 40800000.dsp
recvfrom failed: Link has been severed (67)
rpmsgThreadFxn: transportGet failed on fd 18, returned -20
dg[ 211.640365] omap_hwmod: mmu1_dsp1: _wait_target_disable failed
emm_test: /home/gtbldadm/processor-sdk-linux-fido-build/build-CO[ 211.652837] omap_hwmod: mmu0_dsp1: _wait_target_disable failed
RTEX_1/arago-tmp-external-linaro-toolchain/work/am57xx_evm-linux[ 211.658891] remoteproc2: stopped remote processor 40800000.dsp
-gnueabi/opencl/1.1.7.2-r0.0.tisdk0/git/host/src/core/dsp/mbox_impl_msgq.cpp:121: uint32_t MBoxMsgQ::read(uint8_t*, uint32_t*, uint8_t): Assertion `msg != __null' failed.
TransportRpmsg_put: send failed: 108 (Cannot send after transport endpoint shutdown)
dgemm_test: /home/gtbldadm/processor-sdk-linux-fido-build/build-CORTEX_1/arago-tmp-external-linaro-toolchain/work/am57xx_evm-l[ 211.700359] remoteproc2: powering up 40800000.dsp
inux-gnueabi/opencl/1.1.7.2-r0.0.tisdk0/git/host/src/core/dsp/mb[ 211.709337] remoteproc2: Booting fw image dra7-dsp1-fw.xe66, size 21591344
ox_impl_msgq.cpp:107: void MBoxMsgQ::write(uint8_t*, uint32_t, u[ 211.728101] omap_hwmod: mmu0_dsp1: _wait_target_disable failed
int32_t, uint8_t): Assertion `status == (0)' failed.
[ 211.734063] omap-iommu 40d01000.mmu: 40d01000.mmu: version 3.0
[ 211.744644] omap-iommu 40d02000.mmu: 40d02000.mmu: version 3.0
[ 211.757427] remoteproc2: RSC_INTMEM is deprecated. Please do not use this resource type to support loading into internal memories.
[ 211.774302] remoteproc2: remote processor 40800000.dsp is now up
[ 211.781138] virtio_rpmsg_bus virtio3: rpmsg host is online
[ 211.786694] remoteproc2: registered virtio3 (type 7)

The board just hangs there and never continues. I don't know the reason, please help me, thank you very much!

  • Thank you for your reply. I have read this wiki, but it cannot resolve my problem.
    The printed messages show some errors, such as:

    [ 211.578120] omap-iommu 40d01000.mmu: iommu fault: da 0xc45dd780 flags 0x0
    dg[ 211.640365] omap_hwmod: mmu1_dsp1: _wait_target_disable failed

    Does this suggest my board has some problem?
  • Hi,

    >>omap-iommu 40d01000.mmu: iommu fault: da 0xc45dd780 flags 0x0
    This indicates that you might have a wrong DSP1 image. Can you list the files under /lib/firmware?

    Regards,
    Garrett
  • Thank you for your reply. These are the files under /lib/firmware:

    root@am57xx-evm:/lib/firmware# ls -al
    total 132736
    drwxr-xr-x 4 root root 4096 Jan 28 20:53 .
    drwxr-xr-x 8 root root 4096 Jan 28 23:59 ..
    -rw-r--r-- 1 root root 4831660 Jan 28 19:56 NameServerApp.xe66
    lrwxrwxrwx 1 root root 49 Jan 28 20:53 am57xx-pru1_0-fw -> /lib/firmware/pru/PRU_RPMsg_Echo_Interrupt1_0.out
    lrwxrwxrwx 1 root root 49 Jan 28 20:53 am57xx-pru1_1-fw -> /lib/firmware/pru/PRU_RPMsg_Echo_Interrupt1_1.out
    lrwxrwxrwx 1 root root 49 Jan 28 20:53 am57xx-pru2_0-fw -> /lib/firmware/pru/PRU_RPMsg_Echo_Interrupt2_0.out
    lrwxrwxrwx 1 root root 49 Jan 28 20:53 am57xx-pru2_1-fw -> /lib/firmware/pru/PRU_RPMsg_Echo_Interrupt2_1.out
    lrwxrwxrwx 1 root root 46 Jan 28 20:53 dra7-dsp1-fw.xe66 -> /lib/firmware/dra7-dsp1-fw.xe66.opencl-monitor
    -rw-r--r-- 1 root root 863516 Jan 28 20:03 dra7-dsp1-fw.xe66.dspdce-fw
    -rwxr-xr-x 1 root root 6937149 Jan 28 19:14 dra7-dsp1-fw.xe66.ipc-test-fw
    -rw-r--r-- 1 root root 21591344 Jan 28 20:17 dra7-dsp1-fw.xe66.opencl-monitor
    lrwxrwxrwx 1 root root 46 Jan 28 20:53 dra7-dsp2-fw.xe66 -> /lib/firmware/dra7-dsp2-fw.xe66.opencl-monitor
    -rwxr-xr-x 1 root root 6937149 Jan 28 19:14 dra7-dsp2-fw.xe66.ipc-test-fw
    -rw-r--r-- 1 root root 21591344 Jan 28 20:24 dra7-dsp2-fw.xe66.opencl-monitor
    lrwxrwxrwx 1 root root 43 Jan 28 20:53 dra7-ipu1-fw.xem4 -> /lib/firmware/dra7-ipu1-fw.xem4.ipc-test-fw
    -rwxr-xr-x 1 root root 6142375 Jan 28 19:14 dra7-ipu1-fw.xem4.ipc-test-fw
    -rw-r--r-- 1 root root 3485072 Jan 28 20:00 dra7-ipu2-fw.xem4
    -rwxr-xr-x 1 root root 6142375 Jan 28 19:14 dra7-ipu2-fw.xem4.ipc-test-fw
    lrwxrwxrwx 1 root root 46 Jan 28 20:53 ducati-m3-core0.xem3 -> /lib/firmware/ducati-m3-core0.xem3.ipc-test-fw
    -rwxr-xr-x 1 root root 5818304 Jan 28 19:14 ducati-m3-core0.xem3.ipc-test-fw
    -rw-r--r-- 1 root root 4725924 Jan 28 19:56 fault.xe66
    -rw-r--r-- 1 root root 5647192 Jan 28 19:56 gatempapp.xe66
    -rw-r--r-- 1 root root 4757056 Jan 28 19:56 messageq_multi.xe66
    -rw-r--r-- 1 root root 4767792 Jan 28 19:56 messageq_multimulti.xe66
    -rw-r--r-- 1 root root 4744640 Jan 28 19:56 messageq_single.xe66
    -rw-r--r-- 1 root root 4492548 Jan 28 19:56 ping_rpmsg.xe66
    -rw-r--r-- 1 root root 4526940 Jan 28 19:56 ping_tasks.xe66
    drwxr-xr-x 2 root root 4096 Jan 28 20:49 pru
    lrwxrwxrwx 1 root root 41 Jan 28 20:53 tesla-dsp.xe64T -> /lib/firmware/tesla-dsp.xe64T.ipc-test-fw
    -rwxr-xr-x 1 root root 6029943 Jan 28 19:14 tesla-dsp.xe64T.ipc-test-fw
    -rw-r--r-- 1 root root 5837608 Jan 28 19:56 test_omx_dsp1_vayu.xe66
    -rw-r--r-- 1 root root 5727288 Jan 28 19:56 test_omx_dsp2_vayu.xe66
    drwxr-xr-x 2 root root 4096 Jan 28 20:51 ti-connectivity
    -rw-r--r-- 1 root root 4002 Jan 28 19:28 vpdma-1b8.bin
    root@am57xx-evm:/lib/firmware#

    And now I realize the output from my board was garbled (kernel log messages interleaved with the application output). I have untangled it; it should be this:

    /arm+dsp/dgemm_test# ./dgemm_test
    [ 210.860949] NET: Registered protocol family 41
    [ 211.578120] omap-iommu 40d01000.mmu: iommu fault: da 0xc45dd780 flags 0x0
    [ 211.584942] remoteproc2: crash detected in 40800000.dsp: type mmufault
    [ 211.591588] omap-iommu 40d01000.mmu: 40d01000.mmu: errs:0x00000002 da:0xc45dd780 pgd:0xdfab7114 *pgd:px00000000
    [ 211.601765] remoteproc2: handling crash #1 in 40800000.dsp
    [ 211.607358] remoteproc2: recovering 40800000.dsp
    recvfrom failed: Link has been severed (67)
    rpmsgThreadFxn: transportGet failed on fd 18, returned -20
    [ 211.640365] omap_hwmod: mmu1_dsp1: _wait_target_disable failed

    dgemm_test: /home/gtbldadm/processor-sdk-linux-fido-build/build-CORTEX_1/arago-tmp-external-linaro-toolchain/ \
    work/am57xx_evm-linux-gnueabi/opencl/1.1.7.2-r0.0.tisdk0/git/host/src/core/dsp/ \
    mbox_impl_msgq.cpp:121: uint32_t MBoxMsgQ::read(uint8_t*, uint32_t*, uint8_t): Assertion `msg != __null' failed.

    [ 211.652837] omap_hwmod: mmu0_dsp1: _wait_target_disable failed
    [ 211.658891] remoteproc2: stopped remote processor 40800000.dsp
    TransportRpmsg_put: send failed: 108 (Cannot send after transport endpoint shutdown)

    dgemm_test: /home/gtbldadm/processor-sdk-linux-fido-build/build-CORTEX_1/arago-tmp-external-linaro-toolchain/work/ \
    am57xx_evm-linux-gnueabi/opencl/1.1.7.2-r0.0.tisdk0/git/host/src/core/dsp/ \
    mbox_impl_msgq.cpp:107: void MBoxMsgQ::write(uint8_t*, uint32_t,uint32_t, uint8_t): Assertion `status == (0)' failed.

    [ 211.700359] remoteproc2: powering up 40800000.dsp
    [ 211.709337] remoteproc2: Booting fw image dra7-dsp1-fw.xe66, size 21591344
    [ 211.728101] omap_hwmod: mmu0_dsp1: _wait_target_disable failed
    [ 211.734063] omap-iommu 40d01000.mmu: 40d01000.mmu: version 3.0
    [ 211.744644] omap-iommu 40d02000.mmu: 40d02000.mmu: version 3.0
    [ 211.757427] remoteproc2: RSC_INTMEM is deprecated. Please do not use this resource type to support loading into internal memories.
    [ 211.774302] remoteproc2: remote processor 40800000.dsp is now up
    [ 211.781138] virtio_rpmsg_bus virtio3: rpmsg host is online
    [ 211.786694] remoteproc2: registered virtio3 (type 7)
    /arm+dsp/dgemm_test#

    I realize that it did not hang at the end, but quit because of errors; the assertions indicate that something failed in mbox_impl_msgq.cpp.

    But I cannot find the path "/home/gtbldadm/processor-sdk-linux-fido-build/......" on my board. Can you give me some suggestions?

    Thank you very much!

  • Hi, I have checked the board and my steps, but I don't know how to solve this problem. Can you give me any suggestions? Thank you very much.
  • Hi,

    Can you try a fresh installation of the latest PLSDK 4.1 to see if the problem persists?

    software-dl.ti.com/.../index_FDS.html

    Thanks,
    Garrett
  • Thank you very much, this resolved my issue, but I have another problem.
    I ran the dgemm (matrix multiplication) example on my AM572x board (two ARM cores and two DSP cores), and the performance is not the same as the table shows:
    www.ti.com/.../linear-algebra.html

    The LINALG documentation explains that BLAS can be configured to run on either the ARM or the DSP (offloading). When BLAS runs on the ARM, it can be configured to run on 1 or more cores. When BLAS runs on the DSP, it always runs on all cores.
    When I configure the dgemm example to run on one ARM core, it takes more time than running on the DSP, which matches the table's 1.4x speedup when moving the code from ARM to DSP.
    But when I configure the dgemm example to run on two ARM cores, it takes less time than running on the DSP.
    I guessed that only one DSP core was working, so I rebuilt LINALG, but that did not resolve the issue.
    Can you give me any suggestions? Thank you!
  • Hi,

    Can you provide the exact numbers from your testing: the time when BLAS runs on the DSP, and the time when it runs on the ARM only? Also, did you use the dgemm example as is, i.e. with a 1000x1000 matrix?

    Thanks.

    Jianzhong Xu

  • dgemm_time_ARM_1_core.dat

    dgemm_time_ARM_2_cores.dat
    dgemm_time_DSP.dat

    These three files are output by the dgemm example.

    The contents of 'dgemm_time_ARM_1_core.dat', 'dgemm_time_ARM_2_cores.dat', and 'dgemm_time_DSP.dat' show the time consumed on the ARM (one core), the ARM (two cores), and the DSP, respectively.

  • The table shows the dgemm example taking 0.786s on 2 ARM cores and 0.55s on 2 DSP cores with matrix size M=N=K=1000. But the result I get is different: running the dgemm example on my AM572x, it takes 0.435s on 2 ARM cores and 0.64s on 2 DSP cores with matrix size M=N=K=1024. Running on only one ARM core takes 0.844s.
  • Hi,

    Which platform are you running dgemm on? Is it the Industrial SDK or the GP EVM (www.ti.com/.../TMDSEVM572X)?
    Also, are you running general Linux or RT-Linux on the board?
    The benchmark on ti.com was collected with a very early PLSDK release on the GP EVM.

    Did you rebuild the dgemm example, or did you use the runtime options BLIS_IC_NT and TI_CBLAS_OFFLOAD to configure 1 ARM / 2 ARM / 2 DSP?

    Regards,
    Garrett
  • I am sure the board is the TMDSEVM572X (two ARM cores and two DSP cores). It is running general Linux, installed from the SDK.
    I used the SDK PROCESSOR-SDK-LINUX-AM57X 04_02_00_09:
    software-dl.ti.com/.../index_FDS.html
    I rebuilt the dgemm example.
    With BLIS_IC_NT=1 and TI_CBLAS_OFFLOAD=000, the dgemm example takes 0.84s on one ARM core when the matrix size is 1024 (M=N=K=1024).
    With BLIS_IC_NT=2 and TI_CBLAS_OFFLOAD=000, it takes 0.43s on two ARM cores with the same matrix size.
    When running on the DSP (TI_CBLAS_OFFLOAD=001), it always takes 0.64s with the same matrix size.
    I have uploaded the output files of this example; you can see them in a previous reply.
    Maybe the benchmark on ti.com does not match the TMDSEVM572X:
    www.ti.com/.../linear-algebra.html
    Thank you very much!
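
    For reference, those three configurations can be selected at run time with the environment variables named above, something like the following (a sketch; `./dgemm_test` is the example binary built earlier, and the variable semantics are as described in this thread):

```shell
# 1 ARM core, no DSP offload
BLIS_IC_NT=1 TI_CBLAS_OFFLOAD=000 ./dgemm_test

# 2 ARM cores, no DSP offload
BLIS_IC_NT=2 TI_CBLAS_OFFLOAD=000 ./dgemm_test

# Offload the level-3 BLAS call (dgemm) to the DSPs
TI_CBLAS_OFFLOAD=001 ./dgemm_test
```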

  • Hi,

    The dgemm benchmark result varies depending on the Linux release (RT-Linux vs. Linux). You will see a performance boost when offloading the computation to the DSP under RT-Linux, which typically runs on the GP EVM.

    1 Arm:
    1024 1024 1024 1.45012045e+00

    2 Arm:
    1024 1024 1024 7.45487332e-01

    DSP:
    1024 1024 1024 6.46519780e-01

    Regards, Garrett

    Thank you for your reply. My board is running Linux (not RT-Linux). These are my dgemm results:

    1 ARM

    1024 1024 1024  8.44199836e-01

    2 ARM

    1024 1024 1024  4.35683638e-01

    DSP

    1024 1024 1024  6.45739794e-01

    So on (standard) Linux there is no performance boost when offloading the computation to the DSP; the DSP time is always 0.64s.

    It's just that RT-Linux doesn't fully exploit the performance of the ARM (A15) cores.
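
    For a rough sanity check, dgemm performs about 2·M·N·K floating-point operations, so the measured times above translate into GFLOPS like this (a quick sketch using the timings posted earlier in this thread):

```shell
# dgemm cost is ~2*M*N*K flops; convert each measured time to GFLOPS.
M=1024; N=1024; K=1024
for t in 0.844199836 0.435683638 0.645739794; do
  awk -v m="$M" -v n="$N" -v k="$K" -v t="$t" \
    'BEGIN { printf "time %.3fs -> %.2f GFLOPS\n", t, 2*m*n*k/(t*1e9) }'
done
# 1 ARM core  -> ~2.54 GFLOPS
# 2 ARM cores -> ~4.93 GFLOPS
# 2 DSP cores -> ~3.33 GFLOPS
```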

  • Hi,

    Yes, with this particular dgemm example, the computation time on the DSP is fixed regardless of whether you run Linux or RT-Linux, but the ARM time varies depending on which one you run.

    Regards,
    Garrett