OMAP3530 ES2.1 Graphics SDK 4_03_00_01 problem

Hello,

We are having problems with the latest graphics driver release: GLES programs hang shortly after startup. 4.00.00.01 and earlier releases don't have this problem. Tested on Arago 2.6.37 and our custom 2.6.27 kernels with exactly the same results. Does the driver still support OMAP3530?

Here is gfx_check output:

WSEGL settings
[default]
WindowSystem=libpvrPVR2D_FRONTWSEGL.so
------
ARM CPU information
Processor       : ARMv7 Processor rev 2 (v7l)
BogoMIPS        : 498.87
Features        : swp half thumb fastmult vfp edsp thumbee neon vfpv3
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x1
CPU part        : 0xc08
CPU revision    : 2

Hardware        : Pandora Handheld Console
Revision        : 0020
Serial          : 0000000000000000
------
SGX driver information
Version 1.6.16.3977 (release) /home/notaz/stuff/sgx_4_03/GFX_Linux_KM
System Version String: SGX revision = 1.0.3
------
Framebuffer settings

mode "800x480"
    geometry 800 480 800 480 16
    timings 0 0 0 0 0 0 0
    rgba 5/11,6/5,5/0,0/0
endmode

Frame buffer device information:
    Name        : omapfb
    Address     : 0x8f800000
    Size        : 3072000
    Type        : PACKED PIXELS
    Visual      : TRUECOLOR
    XPanStep    : 1
    YPanStep    : 1
    YWrapStep   : 0
    LineLength  : 1600
    Accelerator : No
------
Rotation settings
0
------
Kernel Module information
Module                  Size  Used by
omaplfb                 8929  0
bufferclass_ti          5200  0
pvrsrvkm              163187  2 omaplfb,bufferclass_ti
------
Boot settings
debug root=/dev/mmcblk0p2 rw rootdelay=2 console=ttyO2,115200n8 vram=6272K omapfb.vram=0:3000K
------
Linux Kernel version
Linux omap3-pandora 2.6.37-rc7-25612-g08e6a91-dirty #5 Fri Feb 25 00:09:45 EET 2011 armv7l GNU/Linux

  • Thank you for the detailed post. It really helps to know the details before we dive in.

    Can you also confirm whether the same issue exists in FLIP mode (this needs to be changed in /etc/powervr.ini)?

  • Yes, although it took a bit longer to hang (a bit over a minute instead of several seconds).

    A bit more info: when the hung app is killed, all other GLES apps hang as soon as they are started. Reloading the kernel modules and rerunning pvrsrvinit makes everything work again, but only for a limited time (up to a minute or two). Attempting to re-run GLES apps too many times when the driver is in a bad state results in a kernel crash:

    [ 1094.079803] Unable to handle kernel NULL pointer dereference at virtual address 0000000c
    ...
    [ 1094.116485] PC is at PVRSRVProcessQueues+0x100/0x2d8 [pvrsrvkm]
    [ 1094.122436] LR is at 0xcc740120
    [ 1094.433410] [<bf006434>] (PVRSRVProcessQueues+0x0/0x2d8 [pvrsrvkm]) from [<bf00cbf0>] (PVRSRVMISR+0x30/0x68 [pvrsrvkm])
    [ 1094.444427] [<bf00cbc0>] (PVRSRVMISR+0x0/0x68 [pvrsrvkm]) from [<bf0151d0>] (MISRWrapper+0x10/0x14 [pvrsrvkm])
    [ 1094.454681]  r4:00000000
    [ 1094.457244] [<bf0151c0>] (MISRWrapper+0x0/0x14 [pvrsrvkm]) from [<c005889c>] (tasklet_action+0x70/0xb0)
    [ 1094.466796] [<c005882c>] (tasklet_action+0x0/0xb0) from [<c0058d80>] (__do_softirq+0x60/0xc0)
    [ 1094.475402]  r7:c04a3300 r6:0000000a r5:00000001 r4:c04a3368
    [ 1094.481140] [<c0058d20>] (__do_softirq+0x0/0xc0) from [<c0058e24>] (irq_exit+0x44/0x4c)
    [ 1094.489227] [<c0058de0>] (irq_exit+0x0/0x4c) from [<c002d050>] (__exception_text_start+0x50/0x68)
    [ 1094.498168] [<c002d000>] (__exception_text_start+0x0/0x68) from [<c002d8b0>] (__irq_svc+0x30/0x80)
    [ 1094.507202] Exception stack(0xcf04fd38 to 0xcf04fd80)
    [ 1094.512298] fd20:                                                       00001e4c 00000001
    [ 1094.520660] fd40: 00000728 00000000 2ea2e174 00000001 00000000 00000064 000f4240 00000001
    [ 1094.529022] fd60: 00000001 cf04fd8c cf04fd90 cf04fd80 bf0153b4 c01d73b0 20000013 ffffffff
    [ 1094.537414]  r6:00000000 r5:d8200000 r4:ffffffff
    [ 1094.542083] [<bf0153a4>] (OSWaitus+0x0/0x14 [pvrsrvkm]) from [<bf00d394>] (PollForValueKM+0x5c/0x98 [pvrsrvkm])
    [ 1094.552429] [<bf00d338>] (PollForValueKM+0x0/0x98 [pvrsrvkm]) from [<bf0175ec>] (SGXScheduleCCBCommand+0x94/0x1f4 [pvrsrvkm])
    [ 1094.563995] [<bf017558>] (SGXScheduleCCBCommand+0x0/0x1f4 [pvrsrvkm]) from [<bf0177ac>] (SGXScheduleCCBCommandKM+0x60/0xac [pvrsrvkm])
    [ 1094.576354] [<bf01774c>] (SGXScheduleCCBCommandKM+0x0/0xac [pvrsrvkm]) from [<bf015b98>] (SGXSubmitTransferKM+0x228/0x2b4 [pvrsrvkm])
    [ 1094.588653] [<bf015970>] (SGXSubmitTransferKM+0x0/0x2b4 [pvrsrvkm]) from [<bf01dbcc>] (SGXSubmitTransferBW+0x160/0x170 [pvrsrvkm])
    [ 1094.600677]  r6:cf0c0000 r5:00000002 r4:cf0c0000
    [ 1094.605316] [<bf01da6c>] (SGXSubmitTransferBW+0x0/0x170 [pvrsrvkm]) from [<bf019c10>] (BridgedDispatchKM+0xe0/0x138 [pvrsrvkm])
    [ 1094.617095] [<bf019b30>] (BridgedDispatchKM+0x0/0x138 [pvrsrvkm]) from [<bf015800>] (PVRSRV_BridgeDispatchKM+0x11c/0x174 [pvrsrvkm])
    [ 1094.629302]  r8:c002dea8 r7:0000051d r6:cdeea3c0 r5:c01c675c r4:bea1adac
    [ 1094.636077] [<bf0156e4>] (PVRSRV_BridgeDispatchKM+0x0/0x174 [pvrsrvkm]) from [<c00af9a8>] (vfs_ioctl+0x38/0x7c)
    [ 1094.646362]  r7:cdeea3c0 r6:bea1adac r5:c01c675c r4:cdeea3c0
    [ 1094.652069] [<c00af970>] (vfs_ioctl+0x0/0x7c) from [<c00afc48>] (do_vfs_ioctl+0x25c/0x274)
    [ 1094.660430]  r6:00000005 r5:bea1adac r4:cdeea3c0
    [ 1094.665100] [<c00af9ec>] (do_vfs_ioctl+0x0/0x274) from [<c00afca0>] (sys_ioctl+0x40/0x64)
    [ 1094.673339]  r6:c01c675c r5:bea1adac r4:00000005
    [ 1094.678009] [<c00afc60>] (sys_ioctl+0x0/0x64) from [<c002dd00>] (ret_fast_syscall+0x0/0x2c)
    [ 1094.686431]  r7:00000036 r6:000d4258 r5:bea1b380 r4:403168d0
    [ 1094.692169] Code: ea000011 e5923000 e5920004 e5933000 (e593c00c)
    [ 1094.698394] Kernel panic - not syncing: Fatal exception in interrupt

     

  • Thanks for all the details.

    Have you tried the debug build as well? If not, can you please try it and send us the console logs?

    The debug build prints more information, which might help us understand the problem better.

    Also, can you try the two things below:

    1) Increase the parameter buffer size (ParamBufferSize) in /etc/powervr.ini and then run the application:

    ParamBufferSize = 16777216 (in the /etc/powervr.ini file).

    2) Try a resolution of 640x480, as follows:

    rmmod omaplfb.ko

    fbset -vxres 640 -vyres 480 -xres 640 -yres 480

    insmod omaplfb.ko

    Now run the application.

    Please let us know the observations.

    Thanks,

    Prathap.
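
Step 1 above can be scripted. The following is a minimal sketch, assuming /etc/powervr.ini holds plain key=value lines like the WindowSystem entry in the gfx_check output earlier in the thread. `set_ini_key` is a hypothetical helper, not part of the Graphics SDK.

```shell
# set_ini_key FILE KEY VALUE
# Hypothetical helper (not part of the Graphics SDK): replace an existing
# KEY=... line in FILE, or append one if the key is not present yet.
set_ini_key() {
    file=$1
    key=$2
    value=$3
    if grep -q "^[[:space:]]*${key}[[:space:]]*=" "$file"; then
        # Rewrite the existing line in place, keeping a backup as FILE.bak.
        sed -i.bak "s|^[[:space:]]*${key}[[:space:]]*=.*|${key}=${value}|" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}
```

It would be called as `set_ini_key /etc/powervr.ini ParamBufferSize 16777216` before rerunning the application. Whether the driver accepts spaces around the `=` is not stated in the thread, so the sketch writes the line without them.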

     

  • Sorry for the delay, I'm involved in several projects so it takes time.

    I've built the debug driver and after starting the program it prints this:

    [  322.218505] PVR: Installing MISR with cookie bf0541f4
    [  322.223937] PVR: Installing device LISR SGX ISR on IRQ 21 with cookie c7d2eb00
    [  322.231689] PVR: OSUnMapPhysToLin: unmapping 16384 bytes from d0be8000
    [  322.238677] PVR_K:(Warning): SysFinalise: Version string: SGX revision = 1.0.3 [575, home/notaz/stuff/sgx_4_03/GFX_Linux_KM/services4/system/omap3/sysconfig.c]

     

    Now the program runs and hangs somewhere between 1 and 2 minutes in. Nothing was printed in the kernel log; I waited for 5 minutes, then killed it, which produced this output:

    [ 3411.773986] PVR_K:(Error): PollForValueKM: Timeout. Expected 0x1 but found 0x0 (mask 0x1). [638, home/notaz/stuff/sgx_4_03/GFX_Linux_KM/services4/srvkm/common/pvrsrv.c]
    [ 3411.789886] PVR_K:(Error): SGXCleanupRequest: Wait for uKernel to clean up (2) failed [569, home/notaz/stuff/sgx_4_03/GFX_Linux_KM/services4/srvkm/devices/sgx/sgxutils.c]
    [ 3412.813049] PVR_K:(Error): PollForValueKM: Timeout. Expected 0x1 but found 0x0 (mask 0x1). [638, home/notaz/stuff/sgx_4_03/GFX_Linux_KM/services4/srvkm/common/pvrsrv.c]
    [ 3412.828918] PVR_K:(Error): SGXCleanupRequest: Wait for uKernel to clean up (3) failed [569, home/notaz/stuff/sgx_4_03/GFX_Linux_KM/services4/srvkm/devices/sgx/sgxutils.c]
    [ 3413.852081] PVR_K:(Error): PollForValueKM: Timeout. Expected 0x1 but found 0x0 (mask 0x1). [638, home/notaz/stuff/sgx_4_03/GFX_Linux_KM/services4/srvkm/common/pvrsrv.c]
    [ 3413.867950] PVR_K:(Error): SGXCleanupRequest: Wait for uKernel to clean up (3) failed [569, home/notaz/stuff/sgx_4_03/GFX_Linux_KM/services4/srvkm/devices/sgx/sgxutils.c]

     

    Attempting to run the program again resulted in this:

    [ 3532.664611] PVR_K:(Error): PollForValueKM: Timeout. Expected 0x1 but found 0x0 (mask 0x1). [638, home/notaz/stuff/sgx_4_03/GFX_Linux_KM/services4/srvkm/common/pvrsrv.c]
    [ 3532.680511] PVR_K:(Error): SGXScheduleCCBCommand: Wait for uKernel to Invalidate BIF cache failed [204, home/notaz/stuff/sgx_4_03/GFX_Linux_KM/services4/srvkm/devices/sgx/sgxutils.c]
    [ 3532.851989] PVR_K:(Error): SGXOSTimer() detected SGX lockup (0x1d66f tasks) [1220, home/notaz/stuff/sgx_4_03/GFX_Linux_KM/services4/srvkm/devices/sgx/sgxinit.c]
    [ 3532.867126] PVR_K: HWRecoveryResetSGX: SGX Hardware Recovery triggered
    [ 3532.874023] PVR_K: SGX debug (1.6.16.3977)
    [ 3532.878417] PVR_K:(Error): SGX Register Base Address (Linear):   0xD0BE0000 [958, home/notaz/stuff/sgx_4_03/GFX_Linux_KM/services4/srvkm/devices/sgx/sgxinit.c]
    [ 3532.893432] PVR_K:(Error): SGX Register Base Address (Physical): 0x50000000 [959, home/notaz/stuff/sgx_4_03/GFX_Linux_KM/services4/srvkm/devices/sgx/sgxinit.c]
    [ 3532.908477] PVR_K: (P0) EUR_CR_EVENT_STATUS:     20000000
    [ 3532.914154] PVR_K: (P0) EUR_CR_EVENT_STATUS2:    00000000
    [ 3532.919860] PVR_K: (P0) EUR_CR_BIF_CTRL:         00000000
    [ 3532.925567] PVR_K: (P0) EUR_CR_BIF_INT_STAT:     00004002
    [ 3532.931304] PVR_K: (P0) EUR_CR_BIF_FAULT:        0EBFE000
    [ 3532.936981] PVR_K: (P0) EUR_CR_BIF_MEM_REQ_STAT: 00000002
    [ 3532.942687] PVR_K: (P0) EUR_CR_CLKGATECTL:       00212120
    [ 3532.948394] PVR_K: (P0) EUR_CR_PDS_PC_BASE:      00000000
    [ 3532.954132] PVR_K: Flip Command Complete Data 0 for display device 2:
    [ 3532.960906] PVR_K: SGX Host control:
    [ 3532.964721] PVR_K:   (HC-0) 0x00000001 0x00000000 0x00000000 0x00000000
    [ 3532.971618] PVR_K:   (HC-10) 0x00000000 0x00000001 0x0000000A 0x0001B04A
    [ 3532.978607] PVR_K:   (HC-20) 0x00000002 0x00000000 0x00000001 0x00000000
    [ 3532.985626] PVR_K:   (HC-30) 0x0550D4C7 0x00000000 0x9AD04984 0x00001A02
    [ 3532.992614] PVR_K: SGX TA/3D control:
    [ 3532.996459] PVR_K:   (T3C-0) 0x0F003000 0x0F003120 0x0F002000 0x00000000
    [ 3533.003479] PVR_K:   (T3C-10) 0x00000000 0x00000002 0x00000000 0x00000000
    [ 3533.010559] PVR_K:   (T3C-20) 0x00000000 0x00000000 0x00000000 0x00000000
    [ 3533.017639] PVR_K:   (T3C-30) 0x00000000 0x00000000 0x00000000 0x00000000
    [ 3533.024749] PVR_K:   (T3C-40) 0x00000000 0x00000000 0x00000000 0x00000000
    [ 3533.031829] PVR_K:   (T3C-50) 0x00000000 0x00000000 0x00000000 0x00000000
    [ 3533.038879] PVR_K:   (T3C-60) 0x00000000 0x00000000 0x00000000 0x00000000
    [ 3533.045959] PVR_K:   (T3C-70) 0x00000000 0x00000000 0x00000000 0x00000000
    [ 3533.053039] PVR_K:   (T3C-80) 0x00000000 0x00000000 0x0F000000 0x8E3E8000
    [ 3533.060119] PVR_K:   (T3C-90) 0x0F08C800 0x00000000 0x0F088060 0x0F007F40
    [ 3533.067230] PVR_K:   (T3C-A0) 0x0F0BC454 0x0F088060 0x00000000 0x00000000
    [ 3533.074310] PVR_K:   (T3C-B0) 0x00000003 0x0000002B 0x00000000 0x00000143
    [ 3533.081390] PVR_K:   (T3C-C0) 0x00000000 0x00000000 0x00000000 0x00000000
    [ 3533.088470] PVR_K:   (T3C-D0) 0x00000000 0x00000000 0x00000000 0x00000000
    [ 3533.095581] PVR_K:   (T3C-E0) 0x00000000 0x00000000 0x00000000 0x00000000
    [ 3533.102661] PVR_K:   (T3C-F0) 0x00001C63 0x00000EF0 0x00000EF0 0x0F000000
    [ 3533.109771] PVR_K:   (T3C-100) 0x80008000 0x80048000 0x0F004000 0x0F007C20
    [ 3533.116912] PVR_K:   (T3C-110) 0x0F002020 0x0F088000 0x0F088000 0x00000000
    [ 3533.124084] PVR_K: SGX Kernel CCB WO:0xF3 RO:0xED
    [ 3533.129211] PVR_K:(Fatal): Debug assertion failed! [504, home/notaz/stuff/sgx_4_03/GFX_Linux_KM/services4/srvkm/devices/sgx/sgxinit.c]
    [ 3533.141937] kernel BUG at /home/notaz/stuff/sgx_4_03/GFX_Linux_KM/services4/srvkm/env/linux/pvr_debug.c:174!
    [ 3533.152313] Unable to handle kernel NULL pointer dereference at virtual address 00000000
    [ 3533.160858] pgd = c0004000
    [ 3533.404937] [<c0043138>] (__bug+0x18/0x24) from [<bf01d888>] (PVRSRVTrace+0x0/0x124 [pvrsrvkm])
    [ 3533.414398] [<bf01d888>] (PVRSRVTrace+0x0/0x124 [pvrsrvkm]) from [<bf02bb6c>] (HWRecoveryResetSGX+0xac/0x108 [pvrsrvkm])
    [ 3533.425994] [<bf02bb6c>] (HWRecoveryResetSGX+0xac/0x108 [pvrsrvkm]) from [<ce3e9f00>] (0xce3e9f00)

    Prathap Srinivas said:

    Also, can you try the two things below:

    1) Increase the parameter buffer size (ParamBufferSize) in /etc/powervr.ini and then run the application:

    ParamBufferSize = 16777216 (in the /etc/powervr.ini file).

    2) Try a resolution of 640x480, as follows:

    rmmod omaplfb.ko

    fbset -vxres 640 -vyres 480 -xres 640 -yres 480

    insmod omaplfb.ko

    Now run the application.

    Tried both and got the same results.
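
The `PollForValueKM: Timeout. Expected 0x1 but found 0x0 (mask 0x1)` lines in the log above read most naturally as a masked comparison: the host polls a value and waits until `(found & mask)` equals the expected value. The sketch below shows that check. It is an interpretation of the message text, not code taken from the driver, and `poll_matches` is a hypothetical name.

```shell
# poll_matches FOUND MASK EXPECTED
# Succeeds (exit status 0) when (FOUND & MASK) == EXPECTED.
# Arguments may be decimal or 0x-prefixed hexadecimal.
poll_matches() {
    [ $(( ($1 & $2) == $3 )) -eq 1 ]
}
```

With the values from the log, `poll_matches 0x0 0x1 0x1` fails: the bit the host was waiting for was never set before the timeout expired.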

  • Hi,

    We are seeing a similar issue with some OpenGL ES 2.0 demos on OMAP35x with SGX core revision 1.0.3. However, we are not seeing any issues on OMAP35x with SGX core revision 1.2.1.

    We have identified a possible root cause and a fix for it.

    We are planning to incorporate this fix in the next release (04.03.00.02), which is planned for early next week.

    We hope next week's release will solve the problem you are facing.

    Thanks,

    Prathap.

  • Hi,

    Please find the details of the latest graphics SDK release, 04.03.00.02, below:

    Download link - http://software-dl.ti.com/dsps/dsps_public_sw/gfxsdk/4_03_00_02/index_FDS.html

    TI Graphics Blog - http://tigraphics.blogspot.com/2011/03/new-sgx-graphics-driver-release-4030002.html

    Please use the latest Graphics SDK release and let us know your test results.

    Thanks,

    Prathap.

  • Hello,

    The new driver works a lot better and the program is stable for 5-20 minutes; however, the problem still has not gone away.

    When the program hangs in release mode, 'SGX Hardware Recovery triggered' is printed along with a register dump. In debug mode I got this output on one try:

    [ 1493.774017] PVR_K:(Error): PollForValueKM: Timeout. Expected 0x8 but found 0x0 (mask 0x8). [638, home/notaz/stuff/sgx_4_03_2/GFX_Linux_KM/services4/srvkm/common/pvrsrv.c]
    [ 1493.790100] PVR_K:(Error): SGXPrePowerState: Wait for SGX ukernel power transition failed. [269, home/notaz/stuff/sgx_4_03_2/GFX_Linux_KM/services4/srvkm/devices/sgx/sgxpower.c]
    [ 1493.808593] PVR_K:(Fatal): Debug assertion failed! [504, home/notaz/stuff/sgx_4_03_2/GFX_Linux_KM/services4/srvkm/devices/sgx/sgxinit.c]
    [ 1493.821594] kernel BUG at /home/notaz/stuff/sgx_4_03_2/GFX_Linux_KM/services4/srvkm/env/linux/pvr_debug.c:174!
    [ 1493.832183] Unable to handle kernel NULL pointer dereference at virtual address 00000000
    [ 1493.840728] pgd = cebe8000
    [ 1493.843597] [00000000] *pgd=8e02a031, *pte=00000000, *ppte=00000000
    [ 1493.850250] Internal error: Oops: 817 [#1]
    [ 1493.854522] last sysfs file: /sys/devices/platform/omap/omap_i2c.3/i2c-3/3-0055/power_supply/bq27500-0/status
    [ 1493.864959] Modules linked in: omaplfb bufferclass_ti pvrsrvkm
    [ 1493.871124] CPU: 0    Not tainted  (2.6.37-rc7-25612-g08e6a91-dirty #5)
    ...
    [ 1494.200073] [<c0043138>] (__bug+0x18/0x24) from [<bf01d8d8>] (PVRSRVTrace+0x0/0x124 [pvrsrvkm])
    [ 1494.209533] [<bf01d8d8>] (PVRSRVTrace+0x0/0x124 [pvrsrvkm]) from [<bf033cbc>] (SGXPostPowerState+0xb8/0x240 [pvrsrvkm])
    [ 1494.221038] [<bf033cbc>] (SGXPostPowerState+0xb8/0x240 [pvrsrvkm]) from [<ce2de600>] (0xce2de600)

    On another try the program hung without messages, but after killing it dmesg reported this:

    [ 2180.797637] PVR_K:(Error): PollForValueKM: Timeout. Expected 0x1 but found 0x0 (mask 0x1). [638, home/notaz/stuff/sgx_4_03_2/GFX_Linux_KM/services4/srvkm/common/pvrsrv.c]
    [ 2180.813690] PVR_K:(Error): SGXCleanupRequest: Wait for uKernel to clean up (2) failed [569, home/notaz/stuff/sgx_4_03_2/GFX_Linux_KM/services4/srvkm/devices/sgx/sgxutils.c]
    [ 2182.836547] PVR_K:(Error): PollForValueKM: Timeout. Expected 0x1 but found 0x0 (mask 0x1). [638, home/notaz/stuff/sgx_4_03_2/GFX_Linux_KM/services4/srvkm/common/pvrsrv.c]
    [ 2182.852630] PVR_K:(Error): SGXCleanupRequest: Wait for uKernel to clean up (3) failed [569, home/notaz/stuff/sgx_4_03_2/GFX_Linux_KM/services4/srvkm/devices/sgx/sgxutils.c]
    [ 2184.875701] PVR_K:(Error): PollForValueKM: Timeout. Expected 0x1 but found 0x0 (mask 0x1). [638, home/notaz/stuff/sgx_4_03_2/GFX_Linux_KM/services4/srvkm/common/pvrsrv.c]
    [ 2184.891754] PVR_K:(Error): SGXCleanupRequest: Wait for uKernel to clean up (3) failed [569, home/notaz/stuff/sgx_4_03_2/GFX_Linux_KM/services4/srvkm/devices/sgx/sgxutils.c]

    The fastest way I found to reproduce this is to start OGLESSkybox, let it run for some seconds, kill it and start it again. Then something bad usually happens (a hang or a kernel crash) within something like 5 minutes.
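
The reproduction steps above can be sketched as a small helper. `run_then_kill` is a hypothetical name, and the post does not say which signal was used to kill the demo, so SIGTERM (the default for `kill`) is an assumption.

```shell
# run_then_kill SECONDS COMMAND [ARGS...]
# Start COMMAND in the background, let it run for SECONDS, then send it
# SIGTERM (an assumption; the post does not name the signal) and reap it.
run_then_kill() {
    secs=$1
    shift
    "$@" &
    pid=$!
    sleep "$secs"
    # The process may already have exited; ignore errors from kill and wait.
    kill "$pid" 2>/dev/null || true
    wait "$pid" 2>/dev/null || true
}
```

A run along the lines of `run_then_kill 10 ./OGLESSkybox` followed by a plain `./OGLESSkybox` would mirror the description; the 10-second run time and the path to the demo binary are placeholders.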

     

  • Hi,

    Thanks for all the details. It is good to know that the new driver is working better for you.

    The issue with the skybox demo is a known problem, and it is already listed in the release notes as a known issue:

    http://processors.wiki.ti.com/index.php/RN_4_03_00_02

    We have raised a bug for this and will be working on it.

    Also, I hope you are quitting the demos with 'q'. There is a known issue with Ctrl+C, and it's mentioned in the release notes as well.

    Meanwhile, please let us know if you see this problem with any other application, demo, or use case.

    Thanks,

    Prathap. 

  • Hello,

    Prathap Srinivas said:

    The issue with the skybox demo is a known problem, and it is already listed in the release notes as a known issue:

    http://processors.wiki.ti.com/index.php/RN_4_03_00_02

    We have raised a bug for this and will be working on it.

    That's good to know, thanks.

    Prathap Srinivas said:

    Also, I hope you are quitting the demos with 'q'. There is a known issue with Ctrl+C, and it's mentioned in the release notes as well.

    Right, without killing, the demos are rather stable, but they still hang under some rare conditions that are really hard to reproduce. Once I was running a demo for a while and started an scp transfer from a PC in the background, which caused a hang with the 'SGX Hardware Recovery triggered' message in dmesg. Maybe it's some race condition in the code.

    Prathap Srinivas said:

    Meanwhile, please let us know if you see this problem with any other application, demo, or use case.

    I don't know if this is related, but this game ( http://notaz.gp2x.de/misc/pnd/briquolo.tar.gz ; extract and run ./run.sh under X) always causes a crash with the debug driver:

    [  201.341491] PVR_K:(Fatal): Debug assertion failed! [263, home/notaz/stuff/sgx_4_03_2/GFX_Linux_KM/services4/srvkm/common/buffer_manager.c]
    [  201.354675] kernel BUG at /home/notaz/stuff/sgx_4_03_2/GFX_Linux_KM/services4/srvkm/env/linux/pvr_debug.c:174!
    [  201.365295] Unable to handle kernel NULL pointer dereference at virtual address 00000000

    It works with the release driver, though. I'll keep looking for test cases.

  • Hi,

    Once I was running a demo for a while and started scp transfer in the background from PC, which caused a hang with 'SGX Hardware Recovery triggered' message in dmesg.

    It's not clear from the sentence above whether this was with the earlier release (04.03.00.01) or the latest release.

    Have you tried the scp test with the latest release (04.03.00.02)? Do you still see any problem with it?

    As already mentioned, the skybox demo and killing the demos abruptly with Ctrl+C are known issues. It would be helpful if you could provide a use case, or any other demo or application, that can be used to reproduce the issue in a null window environment.

    Again, thanks for all the information; we appreciate the details provided.

    Thanks,

    Prathap.

  • Prathap Srinivas said:

    It's not clear from the sentence above whether this was with the earlier release (04.03.00.01) or the latest release.

    Have you tried the scp test with the latest release (04.03.00.02)? Do you still see any problem with it?

    I updated to 04.03.00.02 as soon as it was released, so yes, it was with 04.03.00.02. As I said, this is very rare and I'm still looking for a reliable way to reproduce it (simply repeating the scp test does not trigger it).

    Prathap Srinivas said:

    As already mentioned, the skybox demo and killing the demos abruptly with Ctrl+C are known issues. It would be helpful if you could provide a use case, or any other demo or application, that can be used to reproduce the issue in a null window environment.

    Will do as soon as I find what's causing it.

  • Here is a relatively small X program that causes an endless hardware recovery loop:

    http://panic.cs-bristol.org.uk/~jules/rtt-crash.tar.gz

    We also have a large program that suffers from a similar issue on the null window system, but we haven't been able to make a test case out of it yet. It's an ES 2.0 app, so it could be related to the skybox demo issue you already know about. It works correctly with the 4.00.00.01 driver.

  • Hi,

    I'm the author of the small test program in the previous post, http://e2e.ti.com/support/dsp/omap_applications_processors/f/447/p/95950/359479.aspx#359479. I'm very interested to know if there's been any progress on this issue, in particular whether TI engineers have been able to reproduce the kernel crash/endless hardware recovery loop behaviour?

    Thanks,

    Julian

  • Hi Julian,

    We are not seeing any kernel crash or endless SGX hardware recovery messages. We could compile and run the test program on OMAP35x. Details below:

    In release mode it runs fine, with the following logs on the console:

    root@omap3evm:/usr/local/bin# ./rtt-crash
    GL extensions provided: GL_OES_rgb8_rgba8 GL_OES_depth24 GL_OES_vertex_half_float GL_OES_texture_float GL_OES_texture_half_float GL_OES_element_index_uint GL_OES_mapbuffer GL_OES_fragment_precision_high GL_OES_compressed_ETC1_RGB8_texture GL_OES_EGL_image GL_OES_required_internalformat GL_OES_depth_texture GL_OES_get_program_binary GL_OES_packed_depth_stencil GL_OES_standard_derivatives GL_OES_vertex_array_object GL_OES_egl_sync GL_EXT_multi_draw_arrays GL_EXT_texture_format_BGRA8888 GL_EXT_discard_framebuffer GL_EXT_shader_texture_lod GL_IMG_shader_binary GL_IMG_texture_compression_pvrtc GL_IMG_texture_stream2 GL_IMG_texture_npot GL_IMG_texture_format_BGRA8888 GL_IMG_read_format GL_IMG_program_binary GL_IMG_multisampled_render_to_texture
    fps: 73.700546
    fps: 52.006920
    fps: 47.472885
    fps: 45.503529
    fps: 44.371025
    fps: 43.652641
    fps: 43.154076
    fps: 42.792313
    fps: 42.509350
    fps: 42.291660
    fps: 42.109249
    fps: 41.961617
    fps: 41.841198
    fps: 41.730675
    fps: 41.639381

    PVR:(Warning): Kicking render due to frag buffer space [702, /buffers.c]
    fps: 0.000005

    In debug mode, warnings are printed by the SGX driver and the frame rate is very low, but still no kernel hang or crash is observed.

    Setup details:

    root@omap3evm:/usr/local/bin# cat /proc/pvr/version
    Version 1.6.16.3977 (debug) omap3430_linux
    System Version String: SGX revision = 1.2.1

    Thanks,

    Prathap.

     

  • Hi Prathap,

    Thanks very much for the reply, and for trying out my test case! We might now have established that the problem is specific to the 1.0.3 SGX revision. Do you by any chance have hardware available with the older SGX revision that you could try with the test case? What kernel version are you using for your testing?

    What is the meaning and cause of the warning "Kicking render due to frag buffer space"?

    Thanks,

    Julian

  • Julian,

    I will check whether I can get access to hardware with the older SGX revision. I am using kernel 2.6.32.

    Also, the warning is seen only in debug mode, so for now we can consider it low priority.

    I would be interested to know whether the application crashes on 1.0.3 are also seen in release mode.

    I would also be interested to know whether the crashes are seen in a null window environment, for example in front buffer mode rather than under X. This would help us rule out any X-specific issues.

    Thanks,

    Prathap.

  • Yes, I see the crashes in release mode. I will try to get my test running in framebuffer mode, though I might not have time to make those changes for a couple of days.

    Thanks very much for looking into this!

    Julian

  • I've now made modifications to the test case to run without X. Please download the new version of the test from:

    http://panic.cs-bristol.org.uk/~jules/rtt-crash-nox.tar.gz

    I'm still seeing the "SGX Hardware Recovery triggered"/kernel crash behaviour with this version, which eliminates X from the equation. Again, this is running with release-mode drivers.

    This is the output from dmesg after the crash occurs (the system becomes unresponsive very soon after this appears):

    [  243.243804] PVR_K: HWRecoveryResetSGX: SGX Hardware Recovery triggered
    [  243.243865] PVR_K: SGX debug (1.6.16.3977)
    [  243.243896] PVR_K: (P0) EUR_CR_EVENT_STATUS:     00000000
    [  243.243927] PVR_K: (P0) EUR_CR_EVENT_STATUS2:    00000000
    [  243.243957] PVR_K: (P0) EUR_CR_BIF_CTRL:         00000000
    [  243.243957] PVR_K: (P0) EUR_CR_BIF_INT_STAT:     00000000
    [  243.243988] PVR_K: (P0) EUR_CR_BIF_FAULT:        00000000
    [  243.244018] PVR_K: (P0) EUR_CR_BIF_MEM_REQ_STAT: 00000000
    [  243.244018] PVR_K: (P0) EUR_CR_CLKGATECTL:       00212120
    [  243.244049] PVR_K: (P0) EUR_CR_PDS_PC_BASE:      00000000
    [  243.244079] PVR_K: Flip Command Complete Data 0 for display device 1:
    [  243.244110] PVR_K:   SRC 0: (Not in use)
    [  243.244110] PVR_K:   SRC 1: (Not in use)
    [  243.244140] PVR_K: SGX Host control:
    [  243.244171] PVR_K:   (HC-0) 0x00000001 0x00000000 0x00000000 0x00000000
    [  243.244201] PVR_K:   (HC-10) 0x00000001 0x0000000A 0x0001B04A 0x00000002
    [  243.244201] PVR_K:   (HC-20) 0x00000000 0x00000001 0x00000000 0x00000EA6
    [  243.244232] PVR_K:   (HC-30) 0x00007AF6 0xFC9DFFFC 0x00000000 0x00000000
    [  243.244262] PVR_K: SGX TA/3D control:
    [  243.244293] PVR_K:   (T3C-0) 0x0F003000 0x0F003120 0x0F002000 0x0F09A500
    [  243.244323] PVR_K:   (T3C-10) 0x00000001 0x00000002 0x00000001 0x0F007F40
    [  243.244323] PVR_K:   (T3C-20) 0x00000000 0x00000000 0x00000000 0x00000000
    [  243.244354] PVR_K:   (T3C-30) 0x00000000 0x00000000 0x00000000 0x00000000
    [  243.244384] PVR_K:   (T3C-40) 0x00000000 0x00000000 0x00000000 0x00000000
    [  243.244415] PVR_K:   (T3C-50) 0x0F007F40 0x00000000 0x00000000 0x0F007F40
    [  243.244445] PVR_K:   (T3C-60) 0x00000000 0x00000000 0x0F007F40 0x00000000
    [  243.244476] PVR_K:   (T3C-70) 0x00000000 0x0F007F40 0x00000000 0x00000000
    [  243.244506] PVR_K:   (T3C-80) 0x00000000 0x00000000 0x0F000000 0x8B47C000
    [  243.244537] PVR_K:   (T3C-90) 0x0F08A640 0x00000000 0x0F088200 0x0F007F40
    [  243.244537] PVR_K:   (T3C-A0) 0x00000000 0x0F088060 0x00000000 0x00000000
    [  243.244567] PVR_K:   (T3C-B0) 0x00000003 0x00000001 0x00000000 0x00000001
    [  243.244598] PVR_K:   (T3C-C0) 0x00000000 0x00000000 0x00000000 0x00000000
    [  243.244628] PVR_K:   (T3C-D0) 0x00000000 0x00000000 0x00000000 0x00000EBC
    [  243.244659] PVR_K:   (T3C-E0) 0x00000EBA 0x0F000000 0x80008000 0x80048000
    [  243.244689] PVR_K:   (T3C-F0) 0x0F004000 0x0F007C20 0x0F002020 0x00000000
    [  243.244720] PVR_K:   (T3C-100) 0x00000000 0x00000000 0x00000000 0x00000000
    [  243.244720] PVR_K: SGX Kernel CCB WO:0x3E RO:0x3C

    [  243.431427] PVR_K: HWRecoveryResetSGX: SGX Hardware Recovery triggered
    [  243.431457] PVR_K: SGX debug (1.6.16.3977)
    [  243.431488] PVR_K: (P0) EUR_CR_EVENT_STATUS:     20000000
    [  243.431518] PVR_K: (P0) EUR_CR_EVENT_STATUS2:    00000000
    [  243.431549] PVR_K: (P0) EUR_CR_BIF_CTRL:         00000000
    [  243.431549] PVR_K: (P0) EUR_CR_BIF_INT_STAT:     00000000
    [  243.431579] PVR_K: (P0) EUR_CR_BIF_FAULT:        00000000
    [  243.431610] PVR_K: (P0) EUR_CR_BIF_MEM_REQ_STAT: 00000000
    [  243.431640] PVR_K: (P0) EUR_CR_CLKGATECTL:       00212120
    [  243.431640] PVR_K: (P0) EUR_CR_PDS_PC_BASE:      00000000
    [  243.431671] PVR_K: Flip Command Complete Data 0 for display device 1:
    [  243.431701] PVR_K:   SRC 0: (Not in use)
    [  243.431701] PVR_K:   SRC 1: (Not in use)
    [  243.431732] PVR_K: SGX Host control:
    [  243.431762] PVR_K:   (HC-0) 0x00000001 0x0000001C 0x00000000 0x00000000
    [  243.431793] PVR_K:   (HC-10) 0x00000002 0x0000000A 0x0001B04A 0x00000002
    [  243.431793] PVR_K:   (HC-20) 0x00000000 0x00000001 0x00000000 0x00000EA6
    [  243.431823] PVR_K:   (HC-30) 0x00007AF7 0xFCA0DC5C 0x00000000 0x00000000
    [  243.431854] PVR_K: SGX TA/3D control:
    [  243.431884] PVR_K:   (T3C-0) 0x0F003000 0x0F003120 0x0F002000 0x00000000
    [  243.431915] PVR_K:   (T3C-10) 0x00000001 0x00000002 0x00000001 0x0F007F40
    [  243.431915] PVR_K:   (T3C-20) 0x00000000 0x00000000 0x00000000 0x00000000
    [  243.431945] PVR_K:   (T3C-30) 0x00000003 0x00000000 0x00000000 0x00000000
    [  243.431976] PVR_K:   (T3C-40) 0x00000000 0x00000000 0x00000000 0x00000000
    [  243.432006] PVR_K:   (T3C-50) 0x0F007F40 0x00000000 0x00000000 0x0F007F40
    [  243.432037] PVR_K:   (T3C-60) 0x00000000 0x00000000 0x0F007F40 0x00000000
    [  243.432067] PVR_K:   (T3C-70) 0x00000000 0x0F007F40 0x00000000 0x00000000
    [  243.432098] PVR_K:   (T3C-80) 0x00000000 0x00000000 0x0F000000 0x8B47C000
    [  243.432128] PVR_K:   (T3C-90) 0x00000000 0x00000000 0x00000000 0x00000000
    [  243.432128] PVR_K:   (T3C-A0) 0x00000000 0x00000000 0x00000000 0x00000000
    [  243.432159] PVR_K:   (T3C-B0) 0x00000003 0x00000001 0x00000000 0x00000001
    [  243.432189] PVR_K:   (T3C-C0) 0x00000000 0x00000000 0x00000000 0x00000000
    [  243.432220] PVR_K:   (T3C-D0) 0x00000000 0x00000000 0x00000000 0x00000EBC
    [  243.432250] PVR_K:   (T3C-E0) 0x00000EBA 0x0F000000 0x80008000 0x80048000
    [  243.432281] PVR_K:   (T3C-F0) 0x0F004000 0x0F007C20 0x0F002020 0x00000000
    [  243.432312] PVR_K:   (T3C-100) 0x00000000 0x00000000 0x00000000 0x00000000
    [  243.432312] PVR_K: SGX Kernel CCB WO:0x3F RO:0x3F

    Thanks,
    Julian
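
When several recovery dumps follow each other, as in the log above, it can help to list only the registers that changed between dumps. The sketch below does that for the `EUR_CR_*` lines. `diff_sgx_dumps` is a hypothetical helper written for this log format, not a tool shipped with the Graphics SDK.

```shell
# diff_sgx_dumps: read kernel log text on stdin and print every EUR_CR_*
# register whose value differs from the previous recovery dump.
# Assumes each dump starts with a "SGX Hardware Recovery triggered" line
# and that the register value is the last field on its line.
diff_sgx_dumps() {
    awk '
        /SGX Hardware Recovery triggered/ { dump++ }
        match($0, /EUR_CR_[A-Z0-9_]+:/) {
            # Strip the trailing colon from the matched register name.
            name = substr($0, RSTART, RLENGTH - 1)
            value = $NF
            if (dump > 1 && (name in last) && last[name] != value)
                printf "%s: %s -> %s (dump %d)\n", name, last[name], value, dump
            last[name] = value
        }
    '
}
```

It could be run as `dmesg | diff_sgx_dumps`. For the two dumps in this post, the only `EUR_CR_*` register that differs is EUR_CR_EVENT_STATUS, which goes from 00000000 to 20000000. The host control and TA/3D control words are not compared by this sketch.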

  • Hi Prathap,

    Has there been any progress on this issue? I'm still interested in seeing it resolved. Will there be a new release of the TI Graphics SDK for OMAP3530 chips at some point? Is there anything else I can do to help?

    Thanks,

    Julian

  • Julian,

    The planned new releases will not support the older SGX cores, so for these you can continue to use an older Graphics SDK release that is working for you. Otherwise, we suggest you move to 37xx.

    Thanks,

    Prathap.

  • Hi Prathap,

    I understand from your reply that TI are unable to fix the hardware or software bug which is causing the lockup I have been seeing on my OMAP3530 device. I did a little research, and found an errata document on the following web page:

    http://focus.ti.com/docs/prod/folders/print/omap3530.html

    The advisory 3.1.1.197 looks like a very likely cause of the problem. Can you please comment on whether you agree that this erratum is the likely cause of the issue described in this thread, and can you also let me know whether the workaround mentioned has been implemented in the Graphics SDK driver?

    It's quite impossible for me to move to a 37xx device (I'm just a software developer), and I've not yet found a version of the TI Graphics SDK which can reliably render to a texture on my hardware, as demonstrated by the test case I provided for you upthread.

    Thanks,

    Julian

  • Julian,

    If you trace back to the start of this thread, the problem was reported with the latest Graphics SDK release, 04.03.00.02. The thread clearly mentions that the older Graphics SDK release 04.00.00.01 works fine for SGX core 1.0.3. The latest Graphics SDK release underwent only sanity tests on older SGX cores like 1.0.3 and, as already mentioned, future releases will not support them. So we advise sticking with the older Graphics SDK releases, which underwent more testing on older SGX cores like 1.0.3.

    Thanks,

    Prathap.

  • Hi Prathap,

    Thanks for the reply, and apologies for the tone of my previous post! I will try again with the older driver version when I have time, and let you know how I get on. I did try several driver versions previously, but I don't remember now if I tried exactly the 04.00.00.01 release. I did see very similar crashes with an older driver though (3.01.00.02), and may have been guilty of assuming that intermediate versions would also suffer from the problem.

    Note that the original poster and I hit problems with the driver independently, and the fact that his case worked correctly with 04.00.00.01 doesn't necessarily imply that my case will also work.

    Thanks,

    Julian

  • Hi Prathap,

    FYI, the 04.00.00.01 driver appears to suffer from exactly the same problem. I don't really have time to work on this further now, but I wonder if you can comment on whether the erratum fix in later driver versions (4.03.00.02) controlled by the preprocessor define "FIX_HW_BRN_28889" might relate to the "TLB overflow" issue (advisory 3.1.1.197) in the document I linked to upthread? I'm hoping there might be some possibility to work around the problem in the open-source kernel-module part of the SGX driver, but obviously that's very hard without any useful documentation available.

    Thanks,

    Julian

  • Hi Julian,

    Thanks for the update. The FIX_HW_BRN_28889 takes care of the cache invalidation appropriately. 

    To see the code path taken when this fix is defined, refer to GFX_Linux_KM/services4/srvkm/devices/sgx/sgxutils.c.

    For your tests, you can define the fix in GFX_Linux_KM/services4/srvkm/hwdefs/sgxerrata.h (add the define under #if SGX_CORE_REV == 103 for SGX530), then build and load the kernel modules.
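
    As a rough illustration of the edit described above, the fragment below shows the general shape of adding the define under the revision check. It is only a sketch: the surrounding layout is an assumption and is not copied from sgxerrata.h, the fallback definition of SGX_CORE_REV stands in for whatever the real build passes in, and brn_28889_workaround_enabled() is a hypothetical helper that exists only to show the define took effect.

    ```c
    /*
     * Sketch only: models the kind of edit described for sgxerrata.h.
     * The surrounding layout is assumed, not copied from the SDK.
     */

    /* Assumption: the build selects core revision 1.0.3 (encoded as 103).
     * In the real driver this value comes from the build system. */
    #ifndef SGX_CORE_REV
    #define SGX_CORE_REV 103
    #endif

    #if SGX_CORE_REV == 103
        /* Added line: compile in the BRN 28889 workaround for this core. */
        #define FIX_HW_BRN_28889
    #endif

    /* Hypothetical helper, not part of the driver: returns 1 when the
     * workaround was compiled in and 0 otherwise. */
    static inline int brn_28889_workaround_enabled(void)
    {
    #if defined(FIX_HW_BRN_28889)
        return 1;
    #else
        return 0;
    #endif
    }
    ```

    Code guarded by #if defined(FIX_HW_BRN_28889) elsewhere in the kernel module is then compiled in once the modules are rebuilt.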

    With this fix enabled, the skybox demo in flip mode, which used to freeze the system, now runs OK. I still see some H/W recovery messages, but the system recovers fine.

    Please check whether this works for you. As already mentioned, though, this core (1.0.3) is no longer supported, and it's advised to move to higher versions, as suggested earlier.

    Thanks,

    Prathap.

  • Hi Prathap,

    Good news, at least partly -- the FIX_HW_BRN_28889 erratum workaround does seem to make the full-system lockups I had been seeing disappear, but unfortunately, as far as I can tell, it doesn't serve to allow my code (the larger program the "rtt-crash" test case was derived from) to run correctly. That is, my program will render a number of frames, but it will then freeze, although at that point it can be killed with ctrl-C without bringing down the rest of the system. The "rtt-crash" test itself generally seems to freeze before printing any output, although again it doesn't lock up the system.

    (That is using a 2.6.37-derived kernel. I also tried a 2.6.27-derived kernel. I had trouble getting the newer SGX driver to work reliably with either of those: I saw intermittent errors with EGL initialisation, failure to obtain the right kind of visual, or freezing before actually rendering anything. I may try to look further into those problems at some point.)

    The erratum workaround as discussed definitely seems to be a step in the right direction though. Perhaps the #define adding the workaround can be added for the 1.0.3 core revision for your next driver release, even if you're not "officially" supporting the older version.

    Thanks,

    Julian

  • Hi Julian,

    Thanks for the update. Good to know that you are no longer seeing full system lockups. We will add the fix for 1.0.3 in the next release but, as you already mentioned, it won't be officially supported.

    I ran rtt-crash and it seems to run, but I don't see anything on the display. What is expected on the display? It runs continuously, printing the fps on the console, and stops only with ctrl-C. It does not freeze.

    I also ran it in an X environment with the latest Graphics SDK release (04_04_00_02), which supports the Xorg driver with DRI acceleration, and it runs OK.

    Logs attached for your reference.

    Thanks,

    Prathap.

    2235.rtt_crash_logs.txt