TDA4VM: 8.2 TISDK Failed to create GPU Context after PVR error

Part Number: TDA4VM


Hi,

We observed the error below in some scenarios when launching the surround view application (svm). We are not able to reproduce the issue consistently. The kernel log with all details is included below. We are using the 8.2 TI Processor SDK on a custom TDA4x platform (not the EVM).

Failed to create GPU Context after PVR error

Dec 20 18:04:54 j7-evm user.err kernel: [    1.906475] PVR_K:(Error):   360: PollForValueKM: Timeout. Expected 0x1 but found 0x0 (mask 0xffffffff). [2927]
Dec 20 18:04:54 j7-evm user.err kernel: [    1.906487] PVR_K:(Error):   360: PVRSRVPollForValueKM: Failed! Error(PVRSRV_ERROR_TIMEOUT) CPU linear address(000000001e276fa0) Expected value(1) [2954]
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906493] PVR_K:  360: ------------[ PVR DBG: START (High) ]------------
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906498] PVR_K:  360: OS kernel info: Linux 5.10.65 #1 SMP PREEMPT Wed Nov 8 19:43:08 PST 2023 aarch64
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906502] PVR_K:  360: DDK info: Rogue_DDK_Linux_WS rogueddk 1.15@6133109 (release) j721e_linux
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906506] PVR_K:  360: Time now: 1906504us
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906509] PVR_K:  360: Services State: OK
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906511] PVR_K:  360: Server Errors: 2
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906515] PVR_K:  360: Connections: No Devices: No active connections
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906518] PVR_K:  360: ------[ Driver Info ]------
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906520] PVR_K:  360: Comparison of UM/KM components: MATCHING
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906523] PVR_K:  360: KM Arch: 32 Bit
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906527] PVR_K:  360: UM info: 0.0 @        0 (debug) build options: 0x00000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906530] PVR_K:  360: KM info: 0.0 @        0 (debug) build options: 0x00000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906533] PVR_K:  360: Window system: wayland
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906539] PVR_K:  360: ------[ RGX Device ID:0 Start ]------
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906542] PVR_K:  360: ------[ RGX Info ]------
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906545] PVR_K:  360: Device Node (Info): 0000000005f496e4 (000000007cf6d705)
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906548] PVR_K:  360: RGX BVNC: 22.104.208.318 (rogue)
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906552] PVR_K:  360: RGX Device State: Initialising
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906555] PVR_K:  360: RGX Power State: OFF
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906558] PVR_K:  360: FW info: UNINITIALIZED
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906563] PVR_K:  360: RGX FW State: OK (HWRState 0x00000000:)
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906569] PVR_K:  360: RGX FW Power State: RGXFWIF_POW_OFF (APM disabled: 0 ok, 0 denied, 0 non-idle, 0 retry, 0 other, 0 total. Latency: 100 ms)
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906575] PVR_K:  360: RGX DVFS: 0 frequency changes. Current frequency: 100.000 MHz (sampled at 1406138685 ns). FW frequency: 100.000 MHz.
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906579] PVR_K:  360: RGX FW OS 0 - State: offline; Freelists: Not Ok; Priority: 0; MTS on;
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906598] PVR_K:  360: Number of HWR: GP(0/0+0), 2D(0/0+0), TA(0/0+0), 3D(0/0+0), CDM(0/0+0), RAY(0/0+0), FALSE(0,0,0,0,0,0)
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906603] PVR_K:  360: DM 0 (GP)
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906611] PVR_K:  360: DM 1 (HWRflags 0x00000000: working;)
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906616] PVR_K:  360: DM 2 (HWRflags 0x00000000: working;)
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906621] PVR_K:  360: DM 3 (HWRflags 0x00000000: working;)
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906626] PVR_K:  360: DM 4 (HWRflags 0x00000000: working;)
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906632] PVR_K:  360: DM 5 (HWRflags 0x00000000: working;)
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906639] PVR_K:  360: RGX Kernel CCB WO:0x0 RO:0x0
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906642] PVR_K:  360: RGX Firmware CCB WO:0x0 RO:0x0
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906644] PVR_K:  360: RGX Kernel CCB commands executed = 0
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906647] PVR_K:  360: RGX SLR: Forced UFO updates requested = 0
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906650] PVR_K:  360: RGX Errors: WGP:0, TRP:0
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906655] PVR_K:  360: FW System config flags = 0x00020000 (Ctx switch options: Medium CSW profile; VDM CS INDEX mode;)
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906659] PVR_K:  360: FW OS config flags = 0x0000000F (Ctx switch: TDM; TA; 3D; CDM;)
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906662] PVR_K:  360:  (!) RGX power is down. No registers dumped
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906664] PVR_K:  360: ------[ RGX FW Trace Info ]------
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906667] PVR_K:  360: Debug log type: none
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906670] PVR_K:  360: RGX FW thread 0: Trace buffer not yet allocated
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906672] PVR_K:  360: ------[ Full CCB Status ]------
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906678] PVR_K:  360: ------[ RGX Device ID:0 End ]------
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906680] PVR_K:  360: ------[ System Summary Device ID:0 ]------
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906683] PVR_K:  360: Device System Power State: ON
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906686] PVR_K:  360: MaxHWTOut: 500000us, WtTryCt: 10000, WDGTOut(on,off): (10000ms,3600000ms)
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906689] PVR_K:  360: ------[ Server Thread Summary ]------
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906692] PVR_K:  360:   pvr_defer_free : Running
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906695] PVR_K:  360:     Number of deferred cleanup items : 0
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906697] PVR_K:  360:   pvr_device_wdg : Running
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906700] PVR_K:  360:   pvr_cacheop : Running
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906705] PVR_K:  360:     Configuration: QSZ: 16, UKT: -1, KDFT: 131072, LINESIZE: 64, PGSIZE: 4096, KDF: Yes, URBF: Yes
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906708] PVR_K:  360:     Pending deferred CacheOp entries : 0
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906711] PVR_K:  360: ------[ AppHint Settings ]------
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906714] PVR_K:  360:   Build Vars
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906718] PVR_K:  360:     EnableTrustedDeviceAceConfig: N
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906721] PVR_K:  360:     CleanupThreadPriority: 0x00000005
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906724] PVR_K:  360:     CacheOpThreadPriority: 0x00000001
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906727] PVR_K:  360:     WatchdogThreadPriority: 0x00000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906729] PVR_K:  360:     HWPerfClientBufferSize: 0x000c0000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906731] PVR_K:  360:   Module Params
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906741] PVR_K:  360:     none
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906743] PVR_K:  360:   Debug Info Params
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906749] PVR_K:  360:     CacheOpConfig: 0x0000000c
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906752] PVR_K:  360:     CacheOpUMKMThresholdSize: 0xffffffff
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906755] PVR_K:  360:   Debug Info Params Device ID: 0
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906761] PVR_K:  360:     none
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906764] PVR_K:  360: ------[ HTB Log state: Off ]------
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906768] PVR_K:  360: ------[ Active Sync Checkpoints ]------
Dec 20 18:04:54 j7-evm user.err kernel: [    1.906773] ------[ Native Fence Sync: timelines ]------
Dec 20 18:04:54 j7-evm user.err kernel: [    1.906776] foreign_sync: @0 ctx=1 refs=1
Dec 20 18:04:54 j7-evm user.info kernel: [    1.906779] PVR_K:  360: ------------[ PVR DBG: END ]------------
Dec 20 18:04:54 j7-evm user.warn kernel: [    1.906836] ------------[ cut here ]------------
Dec 20 18:04:54 j7-evm user.warn kernel: [    1.906954] WARNING: CPU: 1 PID: 360 at PVRSRVDebugRequest+0x4d0/0x660 [pvrsrvkm]
Dec 20 18:04:54 j7-evm user.warn kernel: [    1.906956] Modules linked in: sch_htb optee_rng rng_core ti_am65_cpsw_nuss ti_am335x_adc m_can_platform m_can can_dev phy_can_transceiver kfifo_buf ti_am335x_tscadc sha512_generic lm75 rti_wdt ti_j721e_cpsw_virt_mac pvrsrvk
Dec 20 18:04:54 j7-evm user.warn kernel: [    1.907056] CPU: 1 PID: 360 Comm: svm Tainted: G           O      5.10.65 #1
Dec 20 18:04:54 j7-evm user.warn kernel: [    1.907059] Hardware name: Texas Instruments K3 J721E SoC (DT)
Dec 20 18:04:54 j7-evm user.warn kernel: [    1.907064] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
Dec 20 18:04:54 j7-evm user.warn kernel: [    1.907100] pc : PVRSRVDebugRequest+0x4d0/0x660 [pvrsrvkm]
Dec 20 18:04:54 j7-evm user.warn kernel: [    1.907132] lr : PVRSRVDebugRequest+0x4d0/0x660 [pvrsrvkm]
Dec 20 18:04:54 j7-evm user.warn kernel: [    1.907134] sp : ffff80001241b7e0
Dec 20 18:04:54 j7-evm user.warn kernel: [    1.907137] x29: ffff80001241b7e0 x28: ffff800008ccb000 
Dec 20 18:04:54 j7-evm user.warn kernel: [    1.907142] x27: 0000000000000000 x26: ffff800008c85098 
Dec 20 18:04:54 j7-evm user.warn kernel: [    1.907147] x25: ffff000814c4d600 x24: 0000000000000009 
Dec 20 18:04:54 j7-evm user.warn kernel: [    1.907151] x23: 0000000000000002 x22: 0000000000000000 
Dec 20 18:04:54 j7-evm user.warn kernel: [    1.907155] x21: 0000000000000000 x20: ffff000814c4d6d8 
Dec 20 18:04:54 j7-evm user.warn kernel: [    1.907160] x19: ffff000814c4ada0 x18: 0000000000000000 
Dec 20 18:04:54 j7-evm user.warn kernel: [    1.907165] x17: 0000000000000000 x16: 0000000000000002 
Dec 20 18:04:54 j7-evm user.warn kernel: [    1.907169] x15: ffff000810449a00 x14: 000000000000021a 
Dec 20 18:04:54 j7-evm user.warn kernel: [    1.907174] x13: ffff000810449e50 x12: 00000000ffffffea 
Dec 20 18:04:54 j7-evm user.warn kernel: [    1.907178] x11: ffff800010f0fd60 x10: ffff800010ef7d20 
Dec 20 18:04:54 j7-evm user.warn kernel: [    1.907183] x9 : ffff800010ef7d78 x8 : 0000000000017fe8 
Dec 20 18:04:54 j7-evm user.warn kernel: [    1.907187] x7 : c0000000ffffefff x6 : 0000000000000001 
Dec 20 18:04:54 j7-evm user.warn kernel: [    1.907192] x5 : ffff00087f9f4ab0 x4 : 0000000000000000 
Dec 20 18:04:54 j7-evm user.warn kernel: [    1.907197] x3 : 0000000000000000 x2 : ffffffffffffff00 
Dec 20 18:04:54 j7-evm user.warn kernel: [    1.907201] x1 : 0000000000000000 x0 : ffff000814c52d80 
Dec 20 18:04:54 j7-evm user.warn kernel: [    1.907206] Call trace:
Dec 20 18:04:54 j7-evm user.warn kernel: [    1.907241]  PVRSRVDebugRequest+0x4d0/0x660 [pvrsrvkm]
Dec 20 18:04:54 j7-evm user.warn kernel: [    1.907273]  PVRSRVPollForValueKM+0x17c/0x180 [pvrsrvkm]
Dec 20 18:04:54 j7-evm user.warn kernel: [    1.907305]  RGXPostPowerState.part.0+0x78/0x138 [pvrsrvkm]
Dec 20 18:04:54 j7-evm user.warn kernel: [    1.907338]  RGXPostPowerState+0x20/0x38 [pvrsrvkm]
Dec 20 18:04:54 j7-evm user.warn kernel: [    1.907369]  PVRSRVSetDevicePowerStateKM+0x17c/0x2e0 [pvrsrvkm]
Dec 20 18:04:54 j7-evm user.warn kernel: [    1.907402]  PVRSRVDeviceFinalise.part.0+0x94/0x340 [pvrsrvkm]
Dec 20 18:04:54 j7-evm user.warn kernel: [    1.907433]  PVRSRVCommonDeviceInitialise+0x94/0x330 [pvrsrvkm]
Dec 20 18:04:54 j7-evm user.warn kernel: [    1.907465]  PVRSRVDeviceOpen+0xac/0x170 [pvrsrvkm]
Dec 20 18:04:54 j7-evm user.warn kernel: [    1.907498]  pvr_drm_open+0x44/0x98 [pvrsrvkm]
Dec 20 18:04:54 j7-evm user.warn kernel: [    1.907508]  drm_file_alloc+0x144/0x230
Dec 20 18:04:54 j7-evm user.warn kernel: [    1.907511]  drm_open+0x14c/0x290
Dec 20 18:04:54 j7-evm user.warn kernel: [    1.907515]  drm_stub_open+0xa8/0x160
Dec 20 18:04:54 j7-evm user.warn kernel: [    1.907523]  chrdev_open+0xa4/0x1a0
Dec 20 18:04:54 j7-evm user.warn kernel: [    1.907528]  do_dentry_open+0x12c/0x398
Dec 20 18:04:54 j7-evm user.warn kernel: [    1.907531]  vfs_open+0x2c/0x38
Dec 20 18:04:54 j7-evm user.warn kernel: [    1.907535]  path_openat+0x818/0xca8
Dec 20 18:04:54 j7-evm user.warn kernel: [    1.907538]  do_filp_open+0x78/0x100
Dec 20 18:04:54 j7-evm user.warn kernel: [    1.907541]  do_sys_openat2+0x1f0/0x2a0
Dec 20 18:04:54 j7-evm user.warn kernel: [    1.907543]  do_sys_open+0x58/0xa0
Dec 20 18:04:54 j7-evm user.warn kernel: [    1.907546]  __arm64_sys_openat+0x24/0x30
Dec 20 18:04:54 j7-evm user.warn kernel: [    1.907553]  el0_svc_common.constprop.0+0x78/0x1c8
Dec 20 18:04:54 j7-evm user.warn kernel: [    1.907556]  do_el0_svc+0x24/0x90
Dec 20 18:04:54 j7-evm user.warn kernel: [    1.907564]  el0_svc+0x14/0x20
Dec 20 18:04:54 j7-evm user.warn kernel: [    1.907567]  el0_sync_handler+0xb0/0xb8
Dec 20 18:04:54 j7-evm user.warn kernel: [    1.907570]  el0_sync+0x180/0x1c0
Dec 20 18:04:54 j7-evm user.warn kernel: [    1.907573] ---[ end trace 4a0f5da23426b327 ]---
Dec 20 18:04:54 j7-evm user.err kernel: [    1.907579] PVR_K:(Error):   360: RGXPostPowerState: Polling for 'FW started' flag failed. [1003]
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907583] PVR_K:  360: BIF0 - OK
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907587] PVR_K:  360: RGX FW State: OK (HWRState 0x00000000:)
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907591] PVR_K:  360: RGX FW Power State: RGXFWIF_POW_OFF (APM disabled: 0 ok, 0 denied, 0 non-idle, 0 retry, 0 other, 0 total. Latency: 100 ms)
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907597] PVR_K:  360: RGX DVFS: 0 frequency changes. Current frequency: 100.000 MHz (sampled at 1406138685 ns). FW frequency: 100.000 MHz.
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907601] PVR_K:  360: RGX FW OS 0 - State: offline; Freelists: Not Ok; Priority: 0; MTS on;
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907616] PVR_K:  360: Number of HWR: GP(0/0+0), 2D(0/0+0), TA(0/0+0), 3D(0/0+0), CDM(0/0+0), RAY(0/0+0), FALSE(0,0,0,0,0,0)
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907620] PVR_K:  360: DM 0 (GP)
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907628] PVR_K:  360: DM 1 (HWRflags 0x00000000: working;)
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907633] PVR_K:  360: DM 2 (HWRflags 0x00000000: working;)
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907638] PVR_K:  360: DM 3 (HWRflags 0x00000000: working;)
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907643] PVR_K:  360: DM 4 (HWRflags 0x00000000: working;)
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907648] PVR_K:  360: DM 5 (HWRflags 0x00000000: working;)
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907653] PVR_K:  360: ------[ RGX registers ]------
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907656] PVR_K:  360: RGX Register Base Address (Linear):   0x00000000ac654bcf
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907658] PVR_K:  360: RGX Register Base Address (Physical): 0x4E20000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907662] PVR_K:  360: CORE_ID                       : 0x0000000008470000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907665] PVR_K:  360: CORE_REVISION                 : 0x00D0013E
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907668] PVR_K:  360: DESIGNER_REV_FIELD1           : 0x00000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907671] PVR_K:  360: DESIGNER_REV_FIELD2           : 0x00000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907674] PVR_K:  360: CHANGESET_NUMBER              : 0x0000000000000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907677] PVR_K:  360: CLK_CTRL                      : 0x0aaaaa002a2aaaaa
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907680] PVR_K:  360: CLK_STATUS                    : 0x0000000000600000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907683] PVR_K:  360: CLK_CTRL2                     : 0x0000000000000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907686] PVR_K:  360: CLK_STATUS2                   : 0x0000000000000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907689] PVR_K:  360: EVENT_STATUS                  : 0x00000400
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907692] PVR_K:  360: TIMER                         : 0x00000000002ed79a
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907695] PVR_K:  360: BIF_FAULT_BANK0_MMU_STATUS    : 0x00000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907698] PVR_K:  360: BIF_FAULT_BANK0_REQ_STATUS    : 0x0000000000000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907701] PVR_K:  360: BIF_FAULT_BANK1_MMU_STATUS    : 0x00000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907704] PVR_K:  360: BIF_FAULT_BANK1_REQ_STATUS    : 0x0000000000000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907707] PVR_K:  360: BIF_MMU_STATUS                : 0x00000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907709] PVR_K:  360: BIF_MMU_ENTRY                 : 0x00000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907712] PVR_K:  360: BIF_MMU_ENTRY_STATUS          : 0x0000000000000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907715] PVR_K:  360: BIF_STATUS_MMU                : 0x00000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907718] PVR_K:  360: BIF_READS_EXT_STATUS          : 0x00000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907721] PVR_K:  360: BIF_READS_INT_STATUS          : 0x00000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907724] PVR_K:  360: BIFPM_STATUS_MMU              : 0x00000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907727] PVR_K:  360: BIFPM_READS_EXT_STATUS        : 0x00000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907729] PVR_K:  360: BIFPM_READS_INT_STATUS        : 0x00000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907733] PVR_K:  360: BIF_CAT_BASE_INDEX            : 0x0000000000000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907736] PVR_K:  360: BIF_CAT_BASE0                 : 0x0000000000000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907739] PVR_K:  360: BIF_CAT_BASE1                 : 0x0000000000000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907742] PVR_K:  360: BIF_CAT_BASE2                 : 0x0000000000000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907745] PVR_K:  360: BIF_CAT_BASE3                 : 0x0000000000000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907748] PVR_K:  360: BIF_CAT_BASE4                 : 0x0000000000000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907752] PVR_K:  360: BIF_CAT_BASE5                 : 0x0000000000000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907755] PVR_K:  360: BIF_CAT_BASE6                 : 0x0000000000000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907758] PVR_K:  360: BIF_CAT_BASE7                 : 0x0000000000000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907761] PVR_K:  360: BIF_CTRL_INVAL                : 0x00000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907764] PVR_K:  360: BIF_CTRL                      : 0x000000C0
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907767] PVR_K:  360: BIF_PM_CAT_BASE_VCE0          : 0x0000000000000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907770] PVR_K:  360: BIF_PM_CAT_BASE_TE0           : 0x0000000000000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907774] PVR_K:  360: BIF_PM_CAT_BASE_ALIST0        : 0x0000000000000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907777] PVR_K:  360: BIF_PM_CAT_BASE_VCE1          : 0x0000000000000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907780] PVR_K:  360: BIF_PM_CAT_BASE_TE1           : 0x0000000000000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907783] PVR_K:  360: BIF_PM_CAT_BASE_ALIST1        : 0x0000000000000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907786] PVR_K:  360: PERF_TA_PHASE                 : 0x00000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907789] PVR_K:  360: PERF_TA_CYCLE                 : 0x00000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907792] PVR_K:  360: PERF_3D_PHASE                 : 0x00000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907794] PVR_K:  360: PERF_3D_CYCLE                 : 0x00000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907798] PVR_K:  360: PERF_TA_OR_3D_CYCLE           : 0x00000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907800] PVR_K:  360: PERF_TA_AND_3D_CYCLE          : 0x00000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907803] PVR_K:  360: PERF_COMPUTE_PHASE            : 0x00000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907805] PVR_K:  360: PERF_COMPUTE_CYCLE            : 0x00000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907808] PVR_K:  360: PM_PARTIAL_RENDER_ENABLE      : 0x00000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907811] PVR_K:  360: ISP_RENDER                    : 0x00000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907814] PVR_K:  360: TLA_STATUS                    : 0x0000000000000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907817] PVR_K:  360: MCU_FENCE                     : 0x0000000000000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907820] PVR_K:  360: VDM_CONTEXT_STORE_STATUS      : 0x00000001
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907823] PVR_K:  360: VDM_CONTEXT_STORE_TASK0       : 0x0000000000000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907826] PVR_K:  360: VDM_CONTEXT_STORE_TASK1       : 0x0000000000000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907829] PVR_K:  360: VDM_CONTEXT_STORE_TASK2       : 0x0000000000000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907833] PVR_K:  360: VDM_CONTEXT_RESUME_TASK0      : 0x0000000000000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907836] PVR_K:  360: VDM_CONTEXT_RESUME_TASK1      : 0x0000000000000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907839] PVR_K:  360: VDM_CONTEXT_RESUME_TASK2      : 0x0000000000000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907842] PVR_K:  360: ISP_CTL                       : 0x00000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907844] PVR_K:  360: ISP_STATUS                    : 0x00000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907847] PVR_K:  360: MTS_INTCTX                    : 0x00000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907850] PVR_K:  360: MTS_BGCTX                     : 0x00000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907854] PVR_K:  360: MTS_BGCTX_COUNTED_SCHEDULE    : 0x00000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907857] PVR_K:  360: MTS_SCHEDULE                  : 0x00000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907859] PVR_K:  360: MTS_GPU_INT_STATUS            : 0x00000400
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907863] PVR_K:  360: CDM_CONTEXT_STORE_STATUS      : 0x00000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907866] PVR_K:  360: CDM_CONTEXT_PDS0              : 0x0000000000000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907869] PVR_K:  360: CDM_CONTEXT_PDS1              : 0x0000000000000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907872] PVR_K:  360: CDM_TERMINATE_PDS             : 0x0000000000000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907875] PVR_K:  360: CDM_TERMINATE_PDS1            : 0x0000000000000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907878] PVR_K:  360: SIDEKICK_IDLE                 : 0x0000007E
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907881] PVR_K:  360: SLC_IDLE                      : 0x000000FF
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907883] PVR_K:  360: SLC_STATUS0                   : 0x00000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907886] PVR_K:  360: SLC_STATUS1                   : 0x0000000000000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907890] PVR_K:  360: SLC_STATUS2                   : 0x0000000000000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907892] PVR_K:  360: SLC_CTRL_BYPASS               : 0x00000000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907896] PVR_K:  360: SLC_CTRL_MISC                 : 0x0000000000200003
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907900] PVR_K:  360: MIPS_ADDR_REMAP1_CONFIG1      : 0x1FC00001
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907903] PVR_K:  360: MIPS_ADDR_REMAP1_CONFIG2      : 0x0000000894a0e00c
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907906] PVR_K:  360: MIPS_ADDR_REMAP2_CONFIG1      : 0x1FC01001
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907909] PVR_K:  360: MIPS_ADDR_REMAP2_CONFIG2      : 0x0000000894a3a00c
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907912] PVR_K:  360: MIPS_ADDR_REMAP3_CONFIG1      : 0x1FC02001
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907915] PVR_K:  360: MIPS_ADDR_REMAP3_CONFIG2      : 0x0000000894a0d00c
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907918] PVR_K:  360: MIPS_ADDR_REMAP4_CONFIG1      : 0x1FC00000
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907921] PVR_K:  360: MIPS_ADDR_REMAP4_CONFIG2      : 0x000000000000000c
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907924] PVR_K:  360: MIPS_ADDR_REMAP5_CONFIG1      : 0x00000001
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907927] PVR_K:  360: MIPS_ADDR_REMAP5_CONFIG2      : 0x0000000894a0e00c
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907930] PVR_K:  360: MIPS_WRAPPER_CONFIG           : 0x000000000001cf80
Dec 20 18:04:54 j7-evm user.info kernel: [    1.907933] PVR_K:  360: MIPS_EXCEPTION_STATUS         : 0x00000020
Dec 20 18:04:54 j7-evm user.warn kernel: [    2.315044] HTB: quantum of class 10001 is big. Consider r2q change.
Dec 20 18:04:54 j7-evm user.warn kernel: [    2.334394] HTB: quantum of class 10010 is big. Consider r2q change.
Dec 20 18:04:54 j7-evm user.warn kernel: [    2.354394] HTB: quantum of class 10011 is big. Consider r2q change.
Dec 20 18:04:54 j7-evm user.info kernel: [    2.377117] u32 classifier
Dec 20 18:04:54 j7-evm user.info kernel: [    2.377122]     input device check on
Dec 20 18:04:54 j7-evm user.info kernel: [    2.377124]     Actions configured
Dec 20 18:04:54 j7-evm user.info kernel: [    2.408064] PVR_K:  360: ---- [ MIPS internal state ] ----
Dec 20 18:04:54 j7-evm user.info kernel: [    2.408071] PVR_K:  360: MIPS extra debug not available
Dec 20 18:04:54 j7-evm user.info kernel: [    2.408074] PVR_K:  360: --------------------------------
Dec 20 18:04:54 j7-evm user.err kernel: [    2.408086] PVR_K:(Error):   360: PVRSRVDeviceFinalise: Failed to set device 0000000005f496e4 power state to 'on' (PVRSRV_ERROR_TIMEOUT) [2774]
Dec 20 18:04:54 j7-evm user.err kernel: [    2.408092] PVR_K:(Error):   360: PVRSRVDeviceFinalise() failed (PVRSRV_ERROR_TIMEOUT) in PVRSRVCommonDeviceInitialise() [2225]
Dec 20 18:04:54 j7-evm user.err kernel: [    2.408101] PVR_K:(Error):   360: PVRSRVDeviceOpen: Failed to initialise device (PVRSRV_ERROR_TIMEOUT) [476]
Dec 20 18:04:54 j7-evm user.err kernel: [    2.410721] PVR_K:(Error):   360: PVRSRVDeviceOpen: Driver already in bad state. Device open failed. [464]
Dec 20 18:04:54 j7-evm user.err kernel: [    2.411816] PVR_K:(Error):   360: PVRSRVDeviceOpen: Driver already in bad state. Device open failed. [464]
Dec 20 18:04:54 j7-evm user.err kernel: [    2.412145] PVR_K:(Error):   360: PVRSRVDeviceOpen: Driver already in bad state. Device open failed. [464]
Dec 20 18:04:54 j7-evm user.err svm[360]: [0000000002412386][a2ecb010][OVX_BUFFER_DISPLAY][E] Failed to create GPU Context
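
In a dump of this size it can help to pull out only the `PVR_K:(Error)` lines and read them in order; in the log above, the `PollForValueKM` timeout comes first and the `PVRSRVDeviceOpen` failures follow it. The following is a minimal Python sketch, not part of the SDK. The function name `pvr_errors` and the regular expression are illustrative assumptions based only on the line format seen in this log.

```python
import re

# Matches lines such as:
#   ... PVR_K:(Error):   360: PollForValueKM: Timeout. ... [2927]
# Groups: the reporting PID, the message text, and the trailing bracketed number.
# (Assumption: this layout is inferred from the log in this thread only.)
PVR_ERROR = re.compile(
    r"PVR_K:\(Error\):\s+(?P<pid>\d+):\s+(?P<msg>.*?)\s*\[(?P<tag>\d+)\]\s*$"
)

def pvr_errors(lines):
    """Return (pid, message, tag) tuples for every PVR_K error line, in order."""
    found = []
    for line in lines:
        match = PVR_ERROR.search(line)
        if match:
            found.append((int(match["pid"]), match["msg"], int(match["tag"])))
    return found
```

For example, `pvr_errors(open("kernel.log"))` (with `kernel.log` as a hypothetical file name) would return the error lines of a saved log in the order they were emitted.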

  • Hello,

    This is quite an old SDK. There are some known issues with this GPU driver. Are you tied to this SDK version, or can you upgrade to the latest?

    Please let me know so we can decide how to proceed.

    Thank you,

    Erick

  • We are tied to this SDK and cannot upgrade to the latest.

  • Gajanan,

    Understood. Since you are already on version 1.15 of the GPU driver, please see this FAQ on upgrading the driver with the latest patches. You only need to replace the UM libs and rebuild your kernel driver with the patches mentioned. Please let me know if you have any questions. This will solve most known issues, and we can proceed from there if any other GPU driver HWR events occur.

    If you are unfamiliar with the GPU driver, the FAQ below links to another FAQ that explains how to rebuild it.

    https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1316731/faq-tda4vl-q1-what-are-the-gpu-driver-bug-fixes-for-sdk-8-6-or-earlier

    Regards,

    Erick

  • Hi Erick,

    Thanks for the quick response and for providing the solution. I have the following follow-up queries:

    1 - Based on the above link:
    Apply CL6529585_Enable_cached_mappings_in_KM_on_ARM64_for_DDK_1.15.patch

    Revert c901804e8221d477983a6f7224a9cdc6e832f050

    I hope this is correct, since I am already on version 1.15.

    2 - The latest-1.15-umlibs provided in the above patch look like a different version from the one I am already using.
    I am using - EGL_EGLEXT_VERSION 20220525
    In patch - EGL_EGLEXT_VERSION 20200220

    So should I retain my current umlibs as they are, with no changes applied there?
    Are any changes required in the umlibs to support the patches in point 1?

    3 - In the makefile "ti-img-rogue-driver_1.15/build/linux/j721e_linux/Makefile", I see the additional lines highlighted below. I hope this is fine and will not create any issues?

    KERNEL_COMPONENTS := srvkm $(DISPLAY_CONTROLLER)

    HWR_DEFAULT_ENABLED := 1

    # Should be last
    include ../config/core.mk
    -include ../common/lws.mk
    include ../common/3rdparty.mk

    $(eval $(call TunableUserConfigC,SGLTRACE,))



    Regards

    Gajanan

  • Hi,

    1 - Based on the above link:
    Apply CL6529585_Enable_cached_mappings_in_KM_on_ARM64_for_DDK_1.15.patch

    Revert c901804e8221d477983a6f7224a9cdc6e832f050

    I hope this is correct, since I am already on version 1.15.

    Yes, this should apply smoothly.

    2 - The latest-1.15-umlibs provided in the above patch look like a different version from the one I am already using.
    I am using - EGL_EGLEXT_VERSION 20220525
    In patch - EGL_EGLEXT_VERSION 20200220

    So should I retain my current umlibs as they are, with no changes applied there?
    Are any changes required in the umlibs to support the patches in point 1?

    Where did you get the "EGL_EGLEXT_VERSION" string from? Was this after adding the new libraries to your system?
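
In the Khronos headers, `EGL_EGLEXT_VERSION` is a `#define` in `EGL/eglext.h`, so the value describes the date of the extension header and is not necessarily a version number of the UM libraries themselves. A minimal Python sketch for reading it from a header's text, assuming the headers of each library drop are available; the function name `eglext_version` is hypothetical:

```python
import re

# Matches a line such as: #define EGL_EGLEXT_VERSION 20220525
_EGLEXT_DEFINE = re.compile(
    r"^\s*#\s*define\s+EGL_EGLEXT_VERSION\s+(\d+)", re.MULTILINE
)

def eglext_version(header_text):
    """Return the EGL_EGLEXT_VERSION value found in header_text, or None."""
    match = _EGLEXT_DEFINE.search(header_text)
    return int(match.group(1)) if match else None
```

Calling it on the text of `eglext.h` from each package would show whether the two values quoted above come from the headers rather than from the installed libraries.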


    3 - In the makefile "ti-img-rogue-driver_1.15/build/linux/j721e_linux/Makefile", I see the additional lines highlighted below. I hope this is fine and will not create any issues?

    KERNEL_COMPONENTS := srvkm $(DISPLAY_CONTROLLER)

    HWR_DEFAULT_ENABLED := 1

    # Should be last
    include ../config/core.mk
    -include ../common/lws.mk
    include ../common/3rdparty.mk

    $(eval $(call TunableUserConfigC,SGLTRACE,))

    These changes are OK. HWR_DEFAULT_ENABLED enables the hardware recovery (HWR) mechanism in the GPU driver by default, which I believe is already the case.

    Regards,

    Erick

  • Hi Erick,

    Where did you get the "EGL_EGLEXT_VERSION" string from? Was this after adding the new libraries to your system?

    I am already using the patches from the E2E ticket below; this ticket is a follow-up to it -

    e2e.ti.com/.../tda4vm-8-2-ti-sdk-gpu-error-and-gpu-application-hang

    Regards

    Gajanan

  • Please don't use the libraries from this E2E ticket, they are quite old:

    e2e.ti.com/.../tda4vm-8-2-ti-sdk-gpu-error-and-gpu-application-hang

    Instead, please use the ones I linked in the FAQ, those are fresh with the latest bug fixes.

    Regards,

    Erick

  • Thanks Erick. I will use the shared link details as per your suggestions

  • Hi Erick,

    After applying the patch, we see the PVR error below many times, and it causes the stream to drop many frames -

    j7-evm user.err kernel: [ 1178.332141] PVR_K:(Error):   297: RGXUpdateHealthStatus: LISR has not received the last 21748 interrupts [5835]

    Can you give more details about this error?

    Regards

    Gajanan

  • Gajanan,

    No, this seems to be worse than before.

    I wonder if it is only when you run your application or when other applications run. Can you please send some information to me:

    1) When you get these errors, can you please send the output of "cat /sys/kernel/debug/pvr/status"?

    2) Can you run another GPU application, or is this in the RTOS+LINUX sdk setup?
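
    A minimal sketch for pulling the relevant fields out of that status file, assuming it has the "Name: value" layout of a typical /sys/kernel/debug/pvr/status dump. The helper names fw_status and pvr_counter are hypothetical and are not part of the driver package.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with the PVR driver).
# They assume the status file uses "Name: value" lines.

# fw_status FILE: print the text after "Firmware Status:".
fw_status() {
    sed -n 's/^[[:space:]]*Firmware Status:[[:space:]]*//p' "$1"
}

# pvr_counter FILE NAME: print the value of a counter line,
# for example "HWR Event Count" or "FWF Event Count".
pvr_counter() {
    sed -n "s/^[[:space:]]*$2:[[:space:]]*//p" "$1"
}

# Example use on the target (debugfs must be mounted):
#   fw_status /sys/kernel/debug/pvr/status
#   pvr_counter /sys/kernel/debug/pvr/status "HWR Event Count"
```

    Saving these values next to the kernel log each time the error appears makes it easier to compare runs.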

    Regards,

    Erick

  • Hi Erick,

    1 - Please find complete log details along with status values for 2 cases we observed with latest changes -

    A ==> 

    Feb  1 03:33:19 j7-evm user.err kernel: [ 3451.616281] PVR_K:(Error):   299: CheckForStalledCCB (force): CCCB has not progressed (ROFF=32480 DOFF=32480 WOFF=672) for "TA-P359-T359-svm" [2295]
    Feb  1 03:33:19 j7-evm user.err kernel: [ 3451.616291] PVR_K:(Error):   299: CheckForStalledCCB (force): CCCB has not progressed (ROFF=37296 DOFF=37296 WOFF=39312) for "3D-P359-T359-svm" [2295]
    Feb  1 03:33:19 j7-evm user.info kernel: [ 3451.616298] PVR_K:  299: Possible stalled client RGX contexts detected: TA 3D 
    Feb  1 03:33:19 j7-evm user.info kernel: [ 3451.616301] PVR_K:  299: Trying to identify stalled context...(force) [0]
    Feb  1 03:33:19 j7-evm user.info kernel: [ 3451.616307] PVR_K:  299: Fence found on context 0xc0028040 'TA-P359-T359-svm' @ 32480 has 2 UFOs
    Feb  1 03:33:19 j7-evm user.info kernel: [ 3451.616317] PVR_K:  299:   1/2 FWAddr 0xc002d008 requires 0xa774
    Feb  1 03:33:19 j7-evm user.info kernel: [ 3451.616329] PVR_K:  299:   2/2 FWAddr 0xc002b051 requires 0x519 (currently 0x519)
    Feb  1 03:33:29 j7-evm user.err kernel: [ 3461.852282] PVR_K:(Error):   299: CheckForStalledCCB (force): CCCB has not progressed (ROFF=32480 DOFF=32480 WOFF=672) for "TA-P359-T359-svm" [2295]
    Feb  1 03:33:29 j7-evm user.err kernel: [ 3461.852293] PVR_K:(Error):   299: CheckForStalledCCB (force): CCCB has not progressed (ROFF=37296 DOFF=37296 WOFF=39312) for "3D-P359-T359-svm" [2295]
    Feb  1 03:33:40 j7-evm user.err kernel: [ 3472.092287] PVR_K:(Error):   299: CheckForStalledCCB (force): CCCB has not progressed (ROFF=32480 DOFF=32480 WOFF=672) for "TA-P359-T359-svm" [2295]
    Feb  1 03:33:40 j7-evm user.err kernel: [ 3472.092297] PVR_K:(Error):   299: CheckForStalledCCB (force): CCCB has not progressed (ROFF=37296 DOFF=37296 WOFF=39312) for "3D-P359-T359-svm" [2295]
    Feb  1 03:33:50 j7-evm user.err kernel: [ 3482.336282] PVR_K:(Error):   299: CheckForStalledCCB (force): CCCB has not progressed (ROFF=32480 DOFF=32480 WOFF=672) for "TA-P359-T359-svm" [2295]


    No pvr status log was collected in this case.

    "Error ==> CCCB has not progressed"

    B==> systemlog2.log

    Error ==> "RGXUpdateHealthStatus: LISR has not received the last 307 interrupts [5835]"

    Below logs for pvr status for this case--
    cat /sys/kernel/debug/pvr/status
    Driver Status:   OK
    Firmware Status: NOT RESPONDING (Missing interrupts)
    Server Errors:   27
    HWR Event Count: 1
    CRR Event Count: 0
    SLR Event Count: 0
    WGP Error Count: 0
    TRP Error Count: 0
    FWF Event Count: 7
    APM Event Count: 0

    2 - We are running on the RTOS + LINUX SDK setup. No application other than the surround view application is using the GPU.

    Regards

    Gajanan

  • Gajanan,

    This is not expected behavior, something does seem to be incorrect.

    To start off, can you please provide logs of this issue by providing a pvrlogdump as follows when you see the issue:

    "pvrdebug -loggroups main,mts,hwr"

    And provide the output files, alongside the console log as you did with these two cases? Then we can analyze and come back with suggestions.

    Regards,

    Erick

  • Hi Erick,

    Our filesystem is read-only, and the pvrlogdump app dumps its logs in /tmp/, which is also read-only.

    Kindly suggest whether it is possible to change the dump path of the pvrlogdump app.
    Meanwhile, we are working to make the filesystem writable and reproduce the issue.

    Regards

    Gajanan

  • Gajanan,

    The pvrlogdump is actually a bash script. If you have a directory that is not read-only, you can modify these lines:

     95     # output file
     96     if [ -d /tmp ]; then
     97         OUT=/tmp/`date +pvrlogdump_%y%m%d%H%M.txt`
     98     else
     99         OUT=`date +pvrlogdump_%y%m%d%H%M.txt`
    100     fi

    You can see that the script checks whether /tmp exists. If it does not, the file is written to the current directory. So you can either:

    1) Remove lines 96,97,98,100 OR

    2) Replace /tmp on both lines 96 and 97 with the directory you want to write to.
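
    As a rough sketch of option 2, the directory can be taken from a variable instead of being hard-coded. PVRLOGDUMP_DIR and pick_pvrlogdump_out are hypothetical names used only for illustration; the shipped script does not define them.

```shell
#!/bin/sh
# Hypothetical replacement for the output-file lines of pvrlogdump.
# PVRLOGDUMP_DIR is an assumed variable name, not part of the original script.

pick_pvrlogdump_out() {
    dir="${PVRLOGDUMP_DIR:-/tmp}"
    name=$(date +pvrlogdump_%y%m%d%H%M.txt)
    if [ -d "$dir" ] && [ -w "$dir" ]; then
        # Directory exists and is writable: write the dump there.
        printf '%s/%s\n' "$dir" "$name"
    else
        # Fall back to the current directory, as the original script does.
        printf '%s\n' "$name"
    fi
}

OUT=$(pick_pvrlogdump_out)
```

    With this change the script could be started with PVRLOGDUMP_DIR set to any writable mount point, and /tmp stays the default.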

    Let me know if that works for you.

    Thanks,

    Erick

  • Hi Erick,

    Thanks for the suggestion.

    Please find below logs collected as per your suggestions -

    PVR_K:(Error):   286: RGXUpdateHealthStatus: LISR has not received the last 483 interrupts [5835]


    pvrlogdump_2402072029.txt


    root@j7-evm:/tmp# cat /sys/kernel/debug/pvr/status
    Driver Status: OK
    Firmware Status: NOT RESPONDING (Missing interrupts)
    Server Errors: 22
    HWR Event Count: 0
    CRR Event Count: 0
    SLR Event Count: 0
    WGP Error Count: 0
    TRP Error Count: 0
    FWF Event Count: 110
    APM Event Count: 0
    root@j7-evm:/tmp# pvrdebug -loggroups main,mts,hwr
    ----------------------- Start -----------------------
    Set FW Log type to TRACE ( main mts hwr )
    Connecting to first (0) default pvr device
    ------------------------ End ------------------------


    Regards

    Gajanan

  • Gajanan,

    How often does this re-occur? The behavior looks like a misconfiguration in the GPU device tree or elsewhere in your system.

    If it occurs on every boot, a misconfiguration would be the most likely cause. The HWR (hardware recovery) counter is 0, so the driver/firmware is not reporting a critical issue.

    Can you share your device tree, specifically around the GPU? And if you have made any changes in the interrupt mapping/board resource partition?

    Thanks,

    Erick

  • Hi Erick,

    Thanks for the feedback.

    We have an earlier stable version on which this issue was observed rarely (say 1 out of 50 reboots), but after applying your patches we observe the issue in 1-2 out of 20 reboots.

    I will collect the details and share it with you for further analysis

    Regards

    Gajanan

  • Gajanan,

    Understood. Seems the issue is persistent across the GPU libraries, so it seems unrelated to known bugs. Most likely it is an issue with your system setup. Please let us know if you have made any changes in the device tree.

    Regards,

    Erick

  • Hi Erick,

    I am a colleague of Gajanan.

    We have not made changes to the device tree, specifically related to the GPU. Are you looking for anything specific in the device tree?

    Just for context, we have used the GPU firmware with 4KB pages instead of 64KB pages. That is one of the differences we have.

  • Dwarakesh,

    Thanks for the info. 4KB pages should be usable, and the latest driver updates include that feature. Usually, when there is an issue with the page size, you'll see a "version mismatch" error during the GPU driver initialization.

    We have not made changes to the device tree, specifically related to the GPU. Are you looking for anything specific in the device tree?

    We were looking for anything that might have changed the interrupts for the GPU.

    Regards,

    Erick

  • Hi Erick,

    No changes related to interrupts for the GPU

    regards

    Gajanan

  • Gajanan,

    Can you please provide your dmesg log when your board boots? And it would be helpful to see it before and after you run a graphics application.
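
    One possible way to compare the two captures is sketched below. It assumes both logs were saved from the same boot, so the later log only appends to the earlier one, and that the kernel ring buffer has not wrapped in between. The function names and the LOGDIR variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for comparing two saved dmesg captures.

# Print only the PVR error lines from a saved kernel log.
pvr_errors() {
    grep -F 'PVR_K:(Error)' "$1" || true
}

# Print the PVR error lines that appear in the "after" log
# but were not yet present in the "before" log.
new_pvr_errors() {
    n_before=$(pvr_errors "$1" | wc -l)
    pvr_errors "$2" | tail -n +"$((n_before + 1))"
}

# Example use on the target (LOGDIR must be a writable directory):
#   dmesg > "$LOGDIR/dmesg_before.log"
#   ... start the graphics application ...
#   dmesg > "$LOGDIR/dmesg_after.log"
#   new_pvr_errors "$LOGDIR/dmesg_before.log" "$LOGDIR/dmesg_after.log"
```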

    Thanks,

    Erick

  • Hi Erick,

    Attached are the bootup logs for the working and non-working cases

     - bootup prints

    - svm (surround view monitoring) application start

    - Crash prints



    For your reference, provided working and nonworking bootup sequence -

    notworking_svcu_system_f1.log
    notworking_svcu_system_f2.log
    notworking_svcu_system_f3.log
    working_svcu_system_f.log




    Regards

    Gajanan

  • Hi Erick,

    Kindly let me know whether the shared boot details are sufficient for your analysis to root-cause the problem.

    Regards

    Gajanan

  • Gajanan,

    This should be sufficient for now. I'm working with the team to see what could be the issue here.

    Regards,

    Erick

  • Gajanan,

    To clarify, does this happen on all boards? How many boards have you tested this on?

    Regards,

    Erick

  • Hi Erick,

    The issue is easily reproducible on many boards after applying the latest shared patches.
    With our existing release, the issue is rarely reproducible.

    Regards

    Gajanan

  • Hi Erick,

    Are the logs useful to root-cause the problem?

    Regards

    Gajanan

  • Gajanan,

    The team is working to identify the issue currently. I've forwarded the information you sent about the reproducibility happening on many boards.

    Thanks,

    Erick

  • Hi Erick,

    Kindly let me know if there are any updates here

    Regards

    Gajanan

  • Gajanan,

    The team believes this is related to the patch CL6529585_Enable_cached_mappings_in_KM_on_ARM64_for_DDK_1.15. Do you have this applied correctly in your kernel driver, and have you updated the kernel driver on your system?

    Regards,

    Erick

  • Hi Erick,

    I verified, and the applied patch looks good.

    Just FYI - the patches shared in the thread below were more stable than the recent patches.

    https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1136749/tda4vm-8-2-ti-sdk-gpu-error-and-gpu-application-hang

    I request you to analyze the link above along with the changes suggested there.

    Also, do you see any possibility of other conflicts that could be causing or increasing the PVR issue?


    PFA my applied patches, along with the revert of commit ID c901804e8221d477983a6f7224a9cdc6e832f050.
    I observed a difference of about 30 lines of code between your patch and my patch.
    Can you cross-check this as well -


    diff --git a/services/server/common/physmem.c b/services/server/common/physmem.c
    index 277f804..4a8124f 100644
    --- a/services/server/common/physmem.c
    +++ b/services/server/common/physmem.c
    @@ -448,6 +448,23 @@ static PVRSRV_ERROR _DevPhysHeapFromFlags(PVRSRV_MEMALLOCFLAGS_T uiFlags,
     	return PVRSRV_OK;
     }
     
    +static INLINE void _PromoteToCpuCached(PVRSRV_MEMALLOCFLAGS_T *puiFlags)
    +{
    +	if ((*puiFlags & (PVRSRV_MEMALLOCFLAG_CPU_READABLE |
    +	                  PVRSRV_MEMALLOCFLAG_CPU_WRITEABLE |
    +	                  PVRSRV_MEMALLOCFLAG_KERNEL_CPU_MAPPABLE)) == 0)
    +	{
    +		/* We don't need to upgrade if we don't map into the CPU */
    +		return;
    +	}
    +
    +	/* Clear the existing CPU cache flags */
    +	*puiFlags &= ~(PVRSRV_MEMALLOCFLAG_CPU_CACHE_MODE_MASK);
    +
    +	/* Add CPU cached flags */
    +	*puiFlags |= PVRSRV_MEMALLOCFLAG_CPU_CACHE_INCOHERENT;
    +}
    +
     PVRSRV_ERROR
     PhysmemNewRamBackedPMR_direct(CONNECTION_DATA *psConnection,
                            PVRSRV_DEVICE_NODE *psDevNode,
    @@ -471,6 +488,12 @@ PhysmemNewRamBackedPMR_direct(CONNECTION_DATA *psConnection,
     
     	PVR_UNREFERENCED_PARAMETER(uiAnnotationLength);
     
    +	if (PVRSRVSystemSnoopingOfCPUCache(psDevNode->psDevConfig) &&
    +		psDevNode->pfnGetDeviceSnoopMode(psDevNode) == PVRSRV_DEVICE_SNOOP_CPU_ONLY)
    +	{
    +		_PromoteToCpuCached(&uiFlags);
    +	}
    +
     	eError = _ValidateParams(ui32NumPhysChunks,
     	                         ui32NumVirtChunks,
     	                         uiFlags,
    diff --git a/services/server/devices/rogue/rgxinit.c b/services/server/devices/rogue/rgxinit.c
    index 5357ab8..0e9f7a3 100644
    --- a/services/server/devices/rogue/rgxinit.c
    +++ b/services/server/devices/rogue/rgxinit.c
    @@ -1227,6 +1227,26 @@ static MMU_DEVICEATTRIBS *RGXDevMMUAttributes(PVRSRV_DEVICE_NODE *psDeviceNode,
     	return psMMUDevAttrs;
     }
     
    +/*
    +	RGXDevSnoopMode
    +*/
    +static PVRSRV_DEVICE_SNOOP_MODE RGXDevSnoopMode(PVRSRV_DEVICE_NODE *psDeviceNode)
    +{
    +	PVRSRV_RGXDEV_INFO *psDevInfo;
    +
    +	PVR_ASSERT(psDeviceNode != NULL);
    +	PVR_ASSERT(psDeviceNode->pvDevice != NULL);
    +
    +	psDevInfo = (PVRSRV_RGXDEV_INFO *) psDeviceNode->pvDevice;
    +
    +	if (RGX_IS_FEATURE_SUPPORTED(psDevInfo, AXI_ACELITE))
    +	{
    +		return PVRSRV_DEVICE_SNOOP_CPU_ONLY;
    +	}
    +
    +	return PVRSRV_DEVICE_SNOOP_NONE;
    +}
    +
     /*
      * RGXInitDevPart2
      */
    @@ -4516,6 +4536,7 @@ PVRSRV_ERROR RGXRegisterDevice(PVRSRV_DEVICE_NODE *psDeviceNode)
     
     	psDeviceNode->pfnValidateOrTweakPhysAddrs = NULL;
     
    +	psDeviceNode->pfnGetDeviceSnoopMode = RGXDevSnoopMode;
     	psDeviceNode->pfnMMUCacheInvalidate = RGXMMUCacheInvalidate;
     
     	psDeviceNode->pfnMMUCacheInvalidateKick = RGXMMUCacheInvalidateKick;
    diff --git a/services/server/devices/volcanic/rgxinit.c b/services/server/devices/volcanic/rgxinit.c
    index d8172a0..ab7eae6 100644
    --- a/services/server/devices/volcanic/rgxinit.c
    +++ b/services/server/devices/volcanic/rgxinit.c
    @@ -1124,6 +1124,17 @@ static MMU_DEVICEATTRIBS *RGXDevMMUAttributes(PVRSRV_DEVICE_NODE *psDeviceNode,
     	return psMMUDevAttrs;
     }
     
    +
    +/*
    +	RGXDevSnoopMode
    +*/
    +static PVRSRV_DEVICE_SNOOP_MODE RGXDevSnoopMode(PVRSRV_DEVICE_NODE *psDeviceNode)
    +{
    +	PVR_UNREFERENCED_PARAMETER(psDeviceNode);
    +
    +	return PVRSRV_DEVICE_SNOOP_NONE;
    +}
    +
     /*
      * RGXInitDevPart2
      */
    @@ -4090,6 +4101,7 @@ PVRSRV_ERROR RGXRegisterDevice (PVRSRV_DEVICE_NODE *psDeviceNode)
     
     	psDeviceNode->pfnValidateOrTweakPhysAddrs = NULL;
     
    +	psDeviceNode->pfnGetDeviceSnoopMode = RGXDevSnoopMode;
     	psDeviceNode->pfnMMUCacheInvalidate = RGXMMUCacheInvalidate;
     
     	psDeviceNode->pfnMMUCacheInvalidateKick = RGXMMUCacheInvalidateKick;
    diff --git a/services/server/include/device.h b/services/server/include/device.h
    index 1a92688..badc2ea 100644
    --- a/services/server/include/device.h
    +++ b/services/server/include/device.h
    @@ -337,6 +337,8 @@ typedef struct _PVRSRV_DEVICE_NODE_
     
     	MMU_DEVICEATTRIBS* (*pfnGetMMUDeviceAttributes)(struct _PVRSRV_DEVICE_NODE_ *psDevNode, IMG_BOOL bKernelMemoryCtx);
     
    +	PVRSRV_DEVICE_SNOOP_MODE (*pfnGetDeviceSnoopMode)(struct _PVRSRV_DEVICE_NODE_ *psDevNode);
    +
     	PVRSRV_DEVICE_CONFIG	*psDevConfig;
     
     	/* device post-finalise compatibility check */
    


    I used branch "linuxws/dunfell/k5.10/1.15.6133109_unified_fw_pagesize" to apply patches. I hope this is correct.

    Regards

    Gajanan

  • Gajanan,

    There were many patches and updates in the thread, I'm assuming you took the last ones on the last post? These shouldn't have any more bug fixes that aren't present in the FAQ already.

    Also, do you see any possibility of other conflicts that could be causing or increasing the PVR issue?

    I think the biggest concern is the application of the libraries and patches. In your kernel driver, could you please change the following line of code:

    static void SysDevFeatureDepInit(PVRSRV_DEVICE_CONFIG *psDevConfig, IMG_UINT64 ui64Features)
    {
    #if defined(SUPPORT_AXI_ACE_TEST)
            if( ui64Features & RGX_FEATURE_AXI_ACELITE_BIT_MASK)
            {
                gsDevices[0].eCacheSnoopingMode     = PVRSRV_DEVICE_SNOOP_CPU_ONLY;
            }
            else
    #endif
            {
                psDevConfig->eCacheSnoopingMode = PVRSRV_DEVICE_SNOOP_NONE;
            }
    }

    And change it to this:

    static void SysDevFeatureDepInit(PVRSRV_DEVICE_CONFIG *psDevConfig, IMG_UINT64 ui64Features)
    {
    #if defined(SUPPORT_AXI_ACE_TEST)
            if( ui64Features & RGX_FEATURE_AXI_ACELITE_BIT_MASK)
            {
                gsDevices[0].eCacheSnoopingMode     = PVRSRV_DEVICE_SNOOP_CPU_ONLY;
            }
            else
    #endif
            {
                /* Changed from PVRSRV_DEVICE_SNOOP_NONE */
                psDevConfig->eCacheSnoopingMode = PVRSRV_DEVICE_SNOOP_CPU_ONLY;
            }
    }

    This may be missing in the patch. Let me know if behavior changes.

    Thanks,

    Erick

  • Hi Erick

    There were many patches and updates in the thread, I'm assuming you took the last ones on the last post? These shouldn't have any more bug fixes that aren't present in the FAQ already.

    Yes, I am using the last one. I shared the same info earlier, but you suggested that the patches at the new E2E link have more bug fixes, so I continued with your suggestion.

    So can you confirm that the link below has all the required patches, and that no other changes are needed apart from the suggestion listed below -
    https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1136749/tda4vm-8-2-ti-sdk-gpu-error-and-gpu-application-hang

    Are you suggesting the change below for the current stable version (as per the link above)?
    psDevConfig->eCacheSnoopingMode = PVRSRV_DEVICE_SNOOP_CPU_ONLY;

    Regards

    Gajanan

  • Gajanan,

    Yes, I am using the last one. I shared the same info earlier, but you suggested that the patches at the new E2E link have more bug fixes, so I continued with your suggestion.

    So can you confirm that the link below has all the required patches, and that no other changes are needed apart from the suggestion listed below -
    https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1136749/tda4vm-8-2-ti-sdk-gpu-error-and-gpu-application-hang

    Ok, I understand your query better now. In the past, we had used the libraries linked here because of a debug that was ongoing where there were performance issues and other bugs. I'm assuming you are using those libraries because you are trying to keep that fix in your code?

    There are 2 options here:

    1) Keep the libraries from here (https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1136749/tda4vm-8-2-ti-sdk-gpu-error-and-gpu-application-hang) and make sure you have the QOS workaround.

    2) Use the latest FAQ libraries, and make sure you take the latest fix instead of the QOS workaround: https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1316731/faq-tda4vl-q1-what-are-the-gpu-driver-bug-fixes-for-sdk-8-6-or-earlier

    The libraries I linked in the FAQ do not have that performance fix, but they do have the latest functional bug fixes. The performance fix is not actually in the driver; it is part of the Yocto build in the filesystem, but it is tightly coupled with the driver and will be hard to rebuild.

    Let me know which option you need.

    Regards,

    Erick

  • Hi Erick,

    After the change below, the latest FAQ libraries look stable compared to the earlier libraries shared in the same thread. We are still running long-duration tests and will post here if we find any issues -

    #if defined(SUPPORT_AXI_ACE_TEST)
            if( ui64Features & RGX_FEATURE_AXI_ACELITE_BIT_MASK)
            {
                gsDevices[0].eCacheSnoopingMode     = PVRSRV_DEVICE_SNOOP_CPU_ONLY;
            }
            else
    #endif
            {
                psDevConfig->eCacheSnoopingMode = PVRSRV_DEVICE_SNOOP_CPU_ONLY;
            }

    Regarding your suggestion, please suggest which option is more suitable for us from a performance and stability perspective.

    Regards

    Gajanan

  • Gajanan,

    Regarding your suggestion, please suggest which option is more suitable for us from a performance and stability perspective.

    The more suitable option for stability is the latest FAQ libraries, which you are currently testing, and that is the recommended option for your system.

    For better performance, there was a complex patch introduced in the last E2E post; I'll call these the MESA patched libraries. They are going to be nearly impossible to reproduce, but if performance becomes an issue, we can discuss what the solution would look like. It would require taking the latest FAQ libraries and patching them with that performance patch, which breaks other aspects of the GPU that you do not use but that other customers might.

    Regards,

    Erick

  • Hi Erick,

    Thanks for your suggestion. We will evaluate more and get back to you.

    Regards

    Gajanan