Linux/AM3354: Crash of SGX PVR driver on processor SDK 03.02.00.05

Part Number: AM3354

Tool/software: Linux

Hi

We have a product running on an AM3354 with Processor SDK 03.02.00.05.

Recently I encountered a problem that produced the following error log:

|  _  |___ ___ ___ ___   |  _  |___ ___  |_|___ ___| |_
|     |  _| .'| . | . |  |   __|  _| . | | | -_|  _|  _|
|__|__|_| |__,|_  |___|  |__|  |_| |___|_| |___|___|_|  
              |___|                    |___|            

Arago Project http://arago-project.org am335x-evm ttyO0

Arago 2016.10 am335x-evm ttyO0

am335x-evm login: random: nonblocking pool is initialized
cpsw 4a100000.ethernet eth0: Link is Up - 100Mbps/Full - flow control off
IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
can: controller area network core (rev 20120528 abi 9)
NET: Registered protocol family 29
can: raw protocol (rev 20120528)
PVR_K:(Error): SGXOSTimer() detected SGX lockup (0x218a tasks)
PVR_K: HWRecoveryResetSGX: SGX Hardware Recovery triggered
PVR_K: SGX debug (SGX_DDK sgxddk MAIN@3699939)
PVR_K:(Error): SGX Register Base Address (Linear):   0xe0c60000
PVR_K:(Error): SGX Register Base Address (Physical): 0x56000000
PVR_K: Running SGXREG Debug Scripts:
PVR_K: (P0)
PVR_K:  (SGXREG) 0x00000000 : 0x00222220
PVR_K:  (SGXREG) 0x00000004 : 0x00101100
PVR_K:  (SGXREG) 0x00000118 : 0x00000010
PVR_K:  (SGXREG) 0x0000012C : 0x20000000
PVR_K:  (SGXREG) 0x000004E0 : 0x00000000
PVR_K:  (SGXREG) 0x000004E4 : 0x00000000
PVR_K:  (SGXREG) 0x00000658 : 0x00000000
PVR_K:  (SGXREG) 0x00000A74 : 0x09BF8200
PVR_K:  (SGXREG) 0x00000C04 : 0x00004002
PVR_K:  (SGXREG) 0x00000C08 : 0x00224000
PVR_K:  (SGXREG) 0x00000E04 : 0x00000000
PVR_K:  (SGXREG) 0x00000624 : 0x00000200
PVR_K:  (SGXREG) 0x00000628 : 0x000103ED
PVR_K:  (SGXREG) 0x00000630 : 0x00000005
PVR_K:  (SGXREG) 0x00000734 : 0x00000000
PVR_K:  (SGXREG) 0x00000AA4 : 0xAAAAAAAA
PVR_K:  (SGXREG) 0x00000AA8 : 0xAAAAAAAA
PVR_K:  (SGXREG) 0x00000B08 : 0x000142A0
PVR_K:  (SGXREG) 0x00000B14 : 0x00010A3A
PVR_K:  (SGXREG) 0x00000B0C : 0x00008907
PVR_K:  (SGXREG) 0x00000B18 : 0x0000212C
PVR_K:  (SGXREG) 0x00000B10 : 0x0001218A
PVR_K:  (SGXREG) 0x00000B1C : 0x00010000
PVR_K: SGX Register Dump:
PVR_K: (P0) EUR_CR_CORE_ID:          01120000
PVR_K: (P0) EUR_CR_CORE_REVISION:    00010205
PVR_K: (P0) EUR_CR_EVENT_STATUS:     20000000
PVR_K: (P0) EUR_CR_EVENT_STATUS2:    00000010
PVR_K: (P0) EUR_CR_BIF_CTRL:         00000000
PVR_K: (P0) EUR_CR_BIF_INT_STAT:     00004002
PVR_K: (P0) EUR_CR_BIF_FAULT:        00224000
PVR_K: (P0) EUR_CR_BIF_MEM_REQ_STAT: 00000000
PVR_K: (P0) EUR_CR_CLKGATECTL:       00222220
PVR_K: (P0) EUR_CR_PDS_PC_BASE:      00302A0C
PVR_K: Found MMU context for page fault 0x00224000
PVR_K: GPU memory context is for PID=443 (eN-Display)
PVR_K: PDE valid: PTE = 0x00000000 (PhysAddr = 0x00000000, Invalid)
PVR_K:  Host Ctl flags= 00000006
PVR_K: SGX Host control:
PVR_K:  (HC-0) 0x00000001 0x00000000 0x00000000 0x00000001
PVR_K:  (HC-10) 0x00000000 0x00000001 0x0000000A 0x00030D40
PVR_K:  (HC-20) 0x00000000 0x00000000 0x00000001 0x00000000
PVR_K:  (HC-30) 0x00000000 0x03ACA371 0xEF1E40D0 0x00000000
PVR_K:  (HC-40) 0x00000000 0x00000000 0x01AF6E21 0x00000000
PVR_K: SGX TA/3D control:
PVR_K:  (T3C-0) 0x0F003000 0x0F003140 0x0F002000 0x00000000
PVR_K:  (T3C-10) 0x00000000 0x00000000 0x00000002 0x00000000
PVR_K:  (T3C-20) 0x00000000 0x00000000 0x00000000 0x00000000
PVR_K:  (T3C-30) 0x00000000 0x00000000 0x00000000 0x00000000
PVR_K:  (T3C-40) 0x00000000 0x00000000 0x00000000 0x00000000
PVR_K:  (T3C-50) 0x00000000 0x00000000 0x00000000 0x00000000
PVR_K:  (T3C-60) 0x00000000 0x00000000 0x00000000 0x00000000
PVR_K:  (T3C-70) 0x00000000 0x00000000 0x00000000 0x00000000
PVR_K:  (T3C-80) 0x00000000 0x00000000 0x0F00AF60 0x0F000000
PVR_K:  (T3C-90) 0x9B366000 0x0F090880 0x0F00AF60 0x0F08B920
PVR_K:  (T3C-A0) 0x0F00AEA0 0x0F00AFA4 0x0F08B920 0x00000000
PVR_K:  (T3C-B0) 0x00000000 0x00000000 0x00000000 0x00000000
PVR_K:  (T3C-C0) 0x00000000 0x00000000 0x002BF841 0x002BF840
PVR_K:  (T3C-D0) 0x0F000000 0x8000B000 0x8004B000 0x0F004000
PVR_K:  (T3C-E0) 0x0F00A420 0x0F00A740 0x0F08B000 0x0F08B000
PVR_K:  (T3C-F0) 0x00000000 0x000001BB 0x000001BB 0x00000000
PVR_K:  (T3C-100) 0x00000003 0x00000000 0x00000000 0x00000001
PVR_K:  (T3C-110) 0x00000000 0x00000000 0x00000000 0x00000000
PVR_K:  (T3C-120) 0x0F00AEA0 0x0F090880 0x00000000 0x00000000
PVR_K: SGX Kernel CCB WO:0x74 RO:0x74
PVR_K: Active syncs
PVR_K:  SyncInfo 5:
PVR_K:          Write ops (0x0d8010cc): P/C = 960534/960533 (0x000ea816/0x000ea815)
PVR_K:          Read ops (0x0d8010d4): P/C = 0/0 (0x00000000/0x00000000)
PVR_K:          Read ops 2 (0x0d8010dc): P/C = 0/0 (0x00000000/0x00000000)
PVR_K:  SyncInfo 2:
PVR_K:          Write ops (0x0d801054): P/C = 66/66 (0x00000042/0x00000042)
PVR_K:          Read ops (0x0d80105c): P/C = 2881613/2881612 (0x002bf84d/0x002bf84c)
PVR_K:          Read ops 2 (0x0d801064): P/C = 0/0 (0x00000000/0x00000000)
PVR_K:  SyncInfo 0:
PVR_K:          Write ops (0x0d801004): P/C = 2881613/2881612 (0x002bf84d/0x002bf84c)
PVR_K:          Read ops (0x0d80100c): P/C = 0/0 (0x00000000/0x00000000)
PVR_K:          Read ops 2 (0x0d801014): P/C = 0/0 (0x00000000/0x00000000)

The Qt5 application keeps running normally, though.

Can anyone tell me what's wrong here? What do these PVR_K errors mean?

thanks

semiyd

  • Hello Semiyd,

    As the logs indicate, the PVR_K errors mean that the SGX has crashed (the driver reports a lockup) and the kernel is attempting to reset it. Could you please share your steps for running the Qt5 application? Is your application using the SGX?

    Regards,
    Krunal
  • Hi Krunal
    Yes, our application uses the SGX (eglfs) and runs on Qt 5.6.2.

    Here are the steps:


    export QTDIR=/usr/lib
    export LD_LIBRARY_PATH=$QTDIR/
    export QT_QPA_PLATFORM=eglfs:/dev/fb0
    export QT_QPA_EVDEV_KEYBOARD_PARAMETERS="/dev/input/event0"
    export QT_QPA_FONTDIR=$QTDIR/fonts
    export QT_PLUGIN_PATH=$QTDIR/qt5/plugins

    #Start up Application

    cd /home/application
    ./application >> /var/bk/log_dsp.txt 2>&1 &
  • Hello Semiyd,

    Are you observing any crashes while running the default TI-provided Qt examples? I am able to run the "hellogl2" example located under "/usr/share/qt5/examples/opengl/hellogl2" without any issues.

    Regards,
    Krunal
  • Hi Krunal:

         Yes, I ran many of the TI demos during development.

         The crash is rare. Day to day our application runs fine, just like the demos.

         But occasionally this crash occurs.

         I would like to know:

         What detailed information can be extracted from the crash log?

         What are the possible causes that might be triggering this crash?

         That way we can take some measures against it.

    thanks

    semiyd
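
    As a starting point for reading the dump, here is a minimal sketch that pulls out the fields most likely to matter: the lockup report, the page fault address, the process that owns the faulting memory context, and any sync object whose two "P/C" counters differ. It assumes "P/C" means pending/complete operation counts, which matches the usual reading of this dump but is not confirmed in this thread. `pvr_summary` is a hypothetical helper name, not part of the SDK.

```shell
#!/bin/sh
# pvr_summary: hypothetical helper that condenses a PVR_K dump read from stdin.
# Assumption: "P/C = a/b" is pending/complete; a != b marks an unfinished op.
pvr_summary() {
    awk '
        /detected SGX lockup/              { print "lockup reported" }
        /Found MMU context for page fault/ { print "page fault address: " $NF }
        /GPU memory context is for PID=/   {
            line = $0
            sub(/.*PID=/, "", line)        # keep "443 (eN-Display)"
            print "memory context: PID " line
        }
        /SyncInfo [0-9]+:/                 {
            sync = $NF
            sub(/:$/, "", sync)            # "5:" -> "5"
        }
        /(Write|Read) ops.*P\/C = /        {
            kind = $2                      # "Write" or "Read"
            split($0, a, "P/C = ")
            split(a[2], b, " ")            # b[1] is "pending/complete"
            split(b[1], pc, "/")
            if (pc[1] != pc[2])
                print "sync " sync " " kind ": pending " pc[1] ", complete " pc[2]
        }
    '
}
```

    On the target this could be run as `dmesg | pvr_summary`. For the log above it would point at page fault address 0x00224000 in the memory context of PID 443 (eN-Display), where the dump shows an invalid page table entry, so the GPU appears to have accessed an unmapped address on behalf of that process.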

  • semiyd,

    Can you try running your application on a TI development board such as the AM335x EVM?  I would start by using binaries from SDK 3.02:

    http://software-dl.ti.com/processor-sdk-linux/esd/AM335X/03_02_00_05/index_FDS.html

    If you can reproduce the issue on a TI EVM, then that suggests a software issue.  In that case we can take a newer SDK on the same EVM and see if the issue is still reproduced.  That would at least give you confidence as to whether upgrading might help with this issue.

    On the other hand, if you cannot reproduce the issue on the TI EVM, that might suggest a hardware issue.  In that case we might need to look at things like stability of vdd_core and/or whether you might have a very subtle DDR issue.  Do all of your boards exhibit this issue infrequently, or do only some boards seem to have the issue?

    Best regards,
    Brad

  • Hi Brad:

         Thank you for the detailed suggestion!

         I thought there was no update on this thread, so I hadn't checked it for quite a while and only saw your reply today.

         So far we have experienced this only once, on a single board, so it happens very rarely.

         I do have a small AM335x EVM from TI.

         I will have our test team run our own application on the TI EVM to see whether the problem occurs there.

    regards

    semiyd

  • Hi Semiyd,

    An SGX lockup can also be caused by memory corruption or other system issues. Because the SGX driver is verbose, it prints the error. Other IPs in the system may also be misbehaving without printing any errors. I would also recommend doing DDR testing on the board.

    Regards,
    Manisha
  • Thank you Manisha

    I checked, and it seems we have already done DDR testing, including:

    TI EDMA test in the CCS IDE

    memtest application test after Linux is up and running

  • Hi Semiyd,

    As mentioned earlier, there can be other issues in the system, such as memory corruption, a power-related issue, or something else. You first need to identify whether this is a hardware issue (occurring consistently, and only on certain boards) or a software issue, and take it from there.
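
    One way to gather evidence for the hardware-versus-software question is to record how often each board reports a lockup. The sketch below is only an illustration under assumptions: `count_sgx_lockups` and `log_lockup_record` are hypothetical helper names, and the board label is something you would assign per unit (for example a serial number).

```shell
#!/bin/sh
# Count SGX lockup reports in a kernel log read from stdin.
count_sgx_lockups() {
    # grep -c prints 0 but exits non-zero when nothing matches;
    # "|| true" keeps that from aborting scripts run with "set -e".
    grep -c 'detected SGX lockup' || true
}

# Print one comma-separated record: board label, UTC timestamp, lockup count.
# $1 is a hypothetical per-board label chosen by you.
log_lockup_record() {
    board_id=$1
    count=$(count_sgx_lockups)
    printf '%s,%s,%s\n' "$board_id" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$count"
}
```

    For example, `dmesg | log_lockup_record board-017 >> /var/bk/sgx_lockups.csv` (the file name is made up) could run periodically on each unit. If the counts cluster on a few boards, that leans toward hardware; if they are spread evenly, it leans toward software.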