TDA4VM-Q1: GPU crash during power on/off stress test

Part Number: TDA4VM-Q1
Other Parts Discussed in Thread: TDA4VM

Hi, Erick

The customer has run into a GPU issue. They are running a power-cycle stress test: 30 seconds powered on, then 10 seconds powered off. The issue reproduces about once every 2000 cycles.
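
To give a sense of how long one reproduction takes, here is a rough calculation from the numbers above. It assumes each cycle is exactly 30 s on plus 10 s off with no extra boot or settle time, which is a simplification; the variable names are just for illustration.

```python
# Stress-test timing taken from the description above.
ON_SECONDS = 30            # powered-on time per cycle
OFF_SECONDS = 10           # powered-off time per cycle
CYCLES_PER_FAILURE = 2000  # approximate reproduction rate (about 1 in 2000)

# One full power cycle, assuming no additional overhead between cycles.
cycle_seconds = ON_SECONDS + OFF_SECONDS

# Expected wall-clock time on one board before the issue shows up once.
hours_per_failure = cycle_seconds * CYCLES_PER_FAILURE / 3600

print(f"{hours_per_failure:.1f} hours per expected failure")
```

So on a single board, one reproduction is expected roughly once per day of continuous cycling.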

Scenario: after power on, the issue occurs and nothing is shown on the screen. About 17 minutes later, a core dump is generated. The display stays blank for that entire period.

When the issue occurred, I had them capture the PVR log (pvrlogdump_error.txt):

------------[ PVR DBG: START (High) ]------------
OS kernel info: Linux 5.10.120 #1 SMP PREEMPT Thu Mar 21 20:35:58 CST 2024 aarch64
DDK info: Rogue_DDK_Linux_WS rogueddk 1.15@6133109 (release) j721e_linux
Time now: 425641748us
Services State: OK
Server Errors: 0
Connections Device ID:0(128) P289-V289-T313-avmMain, P353-V353-T391-mv_psd, P435-V435-T444-mv_remote
------[ Driver Info ]------
Comparison of UM/KM components: MATCHING
KM Arch: 64 Bit
UM Connected Clients: 64 Bit
UM info: 1.15 @  6133109 (release) build options: 0x80000810
KM info: 1.15 @  6133109 (release) build options: 0x00000810
Window system: lws-generic
------[ RGX Device ID:0 Start ]------
------[ RGX Info ]------
Device Node (Info): 0000000094b10562 (000000007d2b5739)
RGX BVNC: 22.104.208.318 (rogue)
RGX Device State: Active
RGX Power State: ON
FW info: 1.15 @  6133109 (release) build options: 0x80000810
BIF0 - OK
RGX FW State: NOT RESPONDING - KCCB stalled (HWRState 0x00000001: HWR OK;)
RGX FW Power State: RGXFWIF_POW_ON (APM disabled: 0 ok, 0 denied, 0 non-idle, 0 retry, 0 other, 0 total. Latency: 100 ms)
RGX DVFS: 0 frequency changes. Current frequency: 750.000 MHz (sampled at 420828061435 ns). FW frequency: 100.000 MHz.
RGX FW OS 0 - State: active; Freelists: Ok; Priority: 0; MTS on;
RGX PHR configuration: (1) reset RD hardware
RGX Kernel CCB WO:0xE RO:0x0
RGX Firmware CCB WO:0x0 RO:0x0
RGX Kernel CCB commands executed = 0
RGX SLR: Forced UFO updates requested = 0
RGX Errors: WGP:0, TRP:0
FW System config flags = 0x00020000 (Ctx switch options: Medium CSW profile; VDM CS INDEX mode;)
FW OS config flags = 0x0000000F (Ctx switch: TDM; TA; 3D; CDM;)
------[ RGX registers ]------
RGX Register Base Address (Linear):   0x00000000f8d2fc51
RGX Register Base Address (Physical): 0x4E20000000
CORE_ID                       : 0x0000000008470000
CORE_REVISION                 : 0x00D0013E
DESIGNER_REV_FIELD1           : 0x00000000
DESIGNER_REV_FIELD2           : 0x00000000
CHANGESET_NUMBER              : 0x0000000000000000
CLK_CTRL                      : 0x0aaaaa002a2aaaaa
CLK_STATUS                    : 0x0000000000600000
CLK_CTRL2                     : 0x0000000000000000
CLK_STATUS2                   : 0x0000000000000000
EVENT_STATUS                  : 0x00000400
TIMER                         : 0x000000004a209206
BIF_FAULT_BANK0_MMU_STATUS    : 0x00000000
BIF_FAULT_BANK0_REQ_STATUS    : 0x0000000000000000
BIF_FAULT_BANK1_MMU_STATUS    : 0x00000000
BIF_FAULT_BANK1_REQ_STATUS    : 0x0000000000000000
BIF_MMU_STATUS                : 0x00000000
BIF_MMU_ENTRY                 : 0x00000000
BIF_MMU_ENTRY_STATUS          : 0x0000000000000000
BIF_STATUS_MMU                : 0x00000000
BIF_READS_EXT_STATUS          : 0x00000000
BIF_READS_INT_STATUS          : 0x00000000
BIFPM_STATUS_MMU              : 0x00000000
BIFPM_READS_EXT_STATUS        : 0x00000000
BIFPM_READS_INT_STATUS        : 0x00000000
BIF_CAT_BASE_INDEX            : 0x0000000000000000
BIF_CAT_BASE0                 : 0x0000000000000000
BIF_CAT_BASE1                 : 0x0000000000000000
BIF_CAT_BASE2                 : 0x0000000000000000
BIF_CAT_BASE3                 : 0x0000000000000000
BIF_CAT_BASE4                 : 0x0000000000000000
BIF_CAT_BASE5                 : 0x0000000000000000
BIF_CAT_BASE6                 : 0x0000000000000000
BIF_CAT_BASE7                 : 0x0000000000000000
BIF_CTRL_INVAL                : 0x00000000
BIF_CTRL                      : 0x000000C0
BIF_PM_CAT_BASE_VCE0          : 0x0000000000000000
BIF_PM_CAT_BASE_TE0           : 0x0000000000000000
BIF_PM_CAT_BASE_ALIST0        : 0x0000000000000000
BIF_PM_CAT_BASE_VCE1          : 0x0000000000000000
BIF_PM_CAT_BASE_TE1           : 0x0000000000000000
BIF_PM_CAT_BASE_ALIST1        : 0x0000000000000000
PERF_TA_PHASE                 : 0x00000000
PERF_TA_CYCLE                 : 0x00000000
PERF_3D_PHASE                 : 0x00000000
PERF_3D_CYCLE                 : 0x00000000
PERF_TA_OR_3D_CYCLE           : 0x00000000
PERF_TA_AND_3D_CYCLE          : 0x00000000
PERF_COMPUTE_PHASE            : 0x00000000
PERF_COMPUTE_CYCLE            : 0x00000000
PM_PARTIAL_RENDER_ENABLE      : 0x00000000
ISP_RENDER                    : 0x00000000
TLA_STATUS                    : 0x0000000000000000
MCU_FENCE                     : 0x0000000000000000
VDM_CONTEXT_STORE_STATUS      : 0x00000001
VDM_CONTEXT_STORE_TASK0       : 0x0000000000000000
VDM_CONTEXT_STORE_TASK1       : 0x0000000000000000
VDM_CONTEXT_STORE_TASK2       : 0x0000000000000000
VDM_CONTEXT_RESUME_TASK0      : 0x0000000000000000
VDM_CONTEXT_RESUME_TASK1      : 0x0000000000000000
VDM_CONTEXT_RESUME_TASK2      : 0x0000000000000000
ISP_CTL                       : 0x00000000
ISP_STATUS                    : 0x00000000
MTS_INTCTX                    : 0x00000000
MTS_BGCTX                     : 0x00000001
MTS_BGCTX_COUNTED_SCHEDULE    : 0x00000000
MTS_SCHEDULE                  : 0x00000000
MTS_GPU_INT_STATUS            : 0x00000400
CDM_CONTEXT_STORE_STATUS      : 0x00000000
CDM_CONTEXT_PDS0              : 0x0000000000000000
CDM_CONTEXT_PDS1              : 0x0000000000000000
CDM_TERMINATE_PDS             : 0x0000000000000000
CDM_TERMINATE_PDS1            : 0x0000000000000000
SIDEKICK_IDLE                 : 0x0000007A
SLC_IDLE                      : 0x000000FF
SLC_STATUS0                   : 0x00000000
SLC_STATUS1                   : 0x0000000000000000
SLC_STATUS2                   : 0x0000000000000000
SLC_CTRL_BYPASS               : 0x00000000
SLC_CTRL_MISC                 : 0x0000000000200003
MIPS_ADDR_REMAP1_CONFIG1      : 0x1FC00001
MIPS_ADDR_REMAP1_CONFIG2      : 0x00000008abd5f00c
MIPS_ADDR_REMAP2_CONFIG1      : 0x1FC01001
MIPS_ADDR_REMAP2_CONFIG2      : 0x00000008abd4200c
MIPS_ADDR_REMAP3_CONFIG1      : 0x1FC02001
MIPS_ADDR_REMAP3_CONFIG2      : 0x00000008abd6000c
MIPS_ADDR_REMAP4_CONFIG1      : 0x1FC00000
MIPS_ADDR_REMAP4_CONFIG2      : 0x000000000000000c
MIPS_ADDR_REMAP5_CONFIG1      : 0x00000001
MIPS_ADDR_REMAP5_CONFIG2      : 0x00000008abd5f00c
MIPS_WRAPPER_CONFIG           : 0x000000000001cf80
MIPS_EXCEPTION_STATUS         : 0x00000000
---- [ MIPS internal state ] ----
PC                            : 0xC00073BC
STATUS_REGISTER               : 0x00481004
CAUSE_REGISTER                : 0x40800C08
BAD_REGISTER                  : 0xC0007934
EPC                           : 0xC0007934
SP                            : 0xCF600F40
BAD_INSTRUCTION               : 0x00000000
TLB                           :
 0) VA 0xCF800000 ( 64k) -> PA0 0xe20000000 DV  , PA1 0x00000000    C
 1) VA 0xCF000000 ( 16k) -> PA0 0x8abfb0000 DVGC, PA1 0x8abfb4000 DVGC
 2) VA 0xCF600000 (  4k) -> PA0 0x8abd41000 DV C, PA1 0x00000000    C
 3) VA 0xC0032000 (  4k) -> PA0 0x8abd45000 DVGC, PA1 0x8abd44000 DVGC
 4) VA 0xC0006000 (  4k) -> PA0 0x8abd72000 DVGC, PA1 0x8abd71000 DVGC
 5) VA 0xC0016000 (  4k) -> PA0 0x8abd62000 DVGC, PA1 0x8abd61000 DVGC
 6) VA 0xC1FF0000 (  4k) -> PA0 0x8abd96000 DVGC, PA1 0x8abd97000 DVGC
 7) VA 0xC0020000 (  4k) -> PA0 0x8abd57000 DVG , PA1 0x8abd30000 DVG
 8) VA 0x00000000 (  4k) -> PA0 0x8abd78000 DVGC, PA1 0x8abd77000 DVGC
BRN63553 WA present with a valid TLB entry mapping address 0x0.
 9) VA 0xC001E000 (  4k) -> PA0 0x8abd92000  VGC, PA1 0x8abd94000 DVG
10) VA 0xF0014000 (  4k) -> PA0 0x00000000    C, PA1 0x00000000    C
11) VA 0xC1FD0000 (  4k) -> PA0 0x8abd47000 DVG , PA1 0x8abd48000 DVG
12) VA 0xC0008000 (  4k) -> PA0 0x8abd70000 DVGC, PA1 0x8abd6f000 DVGC
13) VA 0xF001A000 (  4k) -> PA0 0x00000000    C, PA1 0x00000000    C
14) VA 0xC1FE0000 (  4k) -> PA0 0x8abd7b000 DVGC, PA1 0x8abd7c000 DVGC
15) VA 0xC001A000 (  4k) -> PA0 0x8abd8b000 DVG , PA1 0x8abd8d000 DVG
--------------------------------
------[ RGX FW Trace Info ]------
Debug log type: trace ( main )
------[ RGX FW thread 0 trace START ]------
FWT[traceptr]: 0
FWT[tracebufsize]: 2EE0
FWT[00000000]: 00000000 ... 00000000
FWT[END]: 400 lines were all zero
------[ RGX FW thread 0 trace END ]------
------[ Full CCB Status ]------
FWCtx 0xC0028300 (TQ_3D-P289-T313-avmMain)
  |--Waiting TQ_3D @ 0 Int=1 Ext=1
  |--Waiting UPDATE @ 200 Int=1 Ext=1
  |  |--Addr:0xc002b000 Val=0x00000001
  |  `--Addr:0xc002e001 Val=0x00000519
  |--Waiting TQ_3D @ 256 Int=2 Ext=2
  |--Waiting UPDATE @ 456 Int=2 Ext=2
  |  `--Addr:0xc002b000 Val=0x00000002
  |--Waiting TQ_3D @ 504 Int=3 Ext=3
  |--Waiting UPDATE @ 704 Int=3 Ext=3
  |  `--Addr:0xc002b000 Val=0x00000003
  |--Waiting TQ_3D @ 752 Int=4 Ext=4
  |--Waiting UPDATE @ 952 Int=4 Ext=4
  |  `--Addr:0xc002b000 Val=0x00000004
  |--Waiting TQ_3D @ 1000 Int=5 Ext=5
  |--Waiting UPDATE @ 1200 Int=5 Ext=5
  |  `--Addr:0xc002b000 Val=0x00000005
  |--Waiting TQ_3D @ 1248 Int=6 Ext=6
  `--Waiting UPDATE @ 1448 Int=6 Ext=6
     |--Addr:0xc002b000 Val=0x00000006
     `--Addr:0xc002e009 Val=0x00000519
FWCtx 0xC0028040 (TA-P289-T313-avmMain)
  |--Waiting FENCE @ 0 Int=8 Ext=0
  |  |--Addr:0xc002d000 Val=0x00000000
  |  `--Addr:0xc002a000 Val=0x00000000
  |--Waiting TA @ 56 Int=8 Ext=0
  `--Waiting UPDATE @ 168 Int=8 Ext=0
     |--Addr:0xc002d000 Val=0x00000001
     |--Addr:0xc002a000 Val=0x00000001
     `--Addr:0xc002e031 Val=0x00000519
FWCtx 0xC00280E0 (3D-P289-T313-avmMain)
  |--Waiting FENCE_PR @ 0 Int=8 Ext=0
  |  `--Addr:0xc002d000 Val=0x00000001
  |--Waiting 3D @ 48 Int=8 Ext=0
  `--Waiting UPDATE @ 440 Int=8 Ext=0
     |--Addr:0xc002d000 Val=0x00000002
     |--Addr:0xc002a000 Val=0x00000002
     |--Addr:0xc002e029 Val=0x00000519
     `--Addr:0xc002e039 Val=0x00000519
FWCtx 0xC002F000 (TA-P353-T391-mv_psd)
  |--Waiting FENCE @ 0 Int=7 Ext=0
  |  |--Addr:0xc0031000 Val=0x00000000
  |  `--Addr:0xc0030000 Val=0x00000000
  |--Waiting TA @ 56 Int=7 Ext=0
  `--Waiting UPDATE @ 168 Int=7 Ext=0
     |--Addr:0xc0031000 Val=0x00000001
     |--Addr:0xc0030000 Val=0x00000001
     `--Addr:0xc002e019 Val=0x00000519
FWCtx 0xC002F0A0 (3D-P353-T391-mv_psd)
  |--Waiting FENCE_PR @ 0 Int=7 Ext=0
  |  `--Addr:0xc0031000 Val=0x00000001
  |--Waiting 3D @ 48 Int=7 Ext=0
  `--Waiting UPDATE @ 440 Int=7 Ext=0
     |--Addr:0xc0031000 Val=0x00000002
     |--Addr:0xc0030000 Val=0x00000002
     |--Addr:0xc002e011 Val=0x00000519
     `--Addr:0xc002e021 Val=0x00000519
FWCtx 0xC002F500 (TA-P435-T444-mv_remote)
  `--<Empty>
FWCtx 0xC002F5A0 (3D-P435-T444-mv_remote)
  `--<Empty>
------[ RGX Device ID:0 End ]------
------[ System Summary Device ID:0 ]------
Device System Power State: ON
MaxHWTOut: 500000us, WtTryCt: 10000, WDGTOut(on,off): (10000ms,3600000ms)
------[ Server Thread Summary ]------
  pvr_defer_free : Running
    Number of deferred cleanup items : 0
  pvr_device_wdg : Running
  pvr_cacheop : Running
    Configuration: QSZ: 16, UKT: -1, KDFT: 131072, LINESIZE: 64, PGSIZE: 4096, KDF: Yes, URBF: Yes
    Pending deferred CacheOp entries : 0
------[ AppHint Settings ]------
  Build Vars
    EnableTrustedDeviceAceConfig: N
    CleanupThreadPriority: 0x00000005
    CacheOpThreadPriority: 0x00000001
    WatchdogThreadPriority: 0x00000000
    HWPerfClientBufferSize: 0x000c0000
  Module Params
    none
  Debug Info Params
    CacheOpConfig: 0x0000000c
    CacheOpUMKMThresholdSize: 0xffffffff
  Debug Info Params Device ID: 0
    EnableLogGroup: main
------[ HTB Log state: Off ]------
------[ Active Sync Checkpoints ]------
        - ID = 7, FWAddr = 0xc002e038, r1:e1:f0: es3_DoKick3D_0
        - ID = 6, FWAddr = 0xc002e030, r1:e1:f0: es3_DoKickTA_0
        - ID = 5, FWAddr = 0xc002e028, r1:e1:f0: update fence
        - ID = 4, FWAddr = 0xc002e020, r1:e1:f0: es3_DoKick3D_0
        - ID = 3, FWAddr = 0xc002e018, r1:e1:f0: es3_DoKickTA_0
        - ID = 2, FWAddr = 0xc002e010, r1:e1:f0: update fence
        - ID = 1, FWAddr = 0xc002e008, r1:e1:f0: TQM
        - ID = 0, FWAddr = 0xc002e000, r1:e1:f0: TQM
------[ Native Fence Sync: timelines ]------
foreign_sync: @0 ctx=1 refs=1
sw: RM_SWTimeline-v_avm-avmMain-289 @0 cur=0
rogue-ta3d: @1 ctx=3 refs=2
 @0: (+-) refs=5 fwaddr=0xc002e029 enqueue=1 status=Active    0-update fence
rogue-tq3d: @0 ctx=5 refs=1
QE-mv_avm-avmMain-289: @2 ctx=6 refs=3
 @0: (+-) refs=2 fwaddr=0xc002e001 enqueue=1 status=Active    0-TQM
 @1: (+-) refs=2 fwaddr=0xc002e009 enqueue=1 status=Active    1-TQM
sw: RM_SWTimeline-mv_psd-353 @0 cur=0
rogue-ta3d: @1 ctx=8 refs=2
 @0: (+-) refs=6 fwaddr=0xc002e011 enqueue=1 status=Active    0-update fence
V3-mv_psd-353: @1 ctx=10 refs=2
 @0: (+-) refs=2 fwaddr=0xc002e019 enqueue=1 status=Active    0-es3_DoKickTA_0
P3-mv_psd-353: @1 ctx=11 refs=2
 @0: (+-) refs=2 fwaddr=0xc002e021 enqueue=1 status=Active    0-es3_DoKick3D_0
V3-mv_avm-avmMain-289: @1 ctx=12 refs=2
 @0: (+-) refs=2 fwaddr=0xc002e031 enqueue=1 status=Active    0-es3_DoKickTA_0
P3-mv_avm-avmMain-289: @1 ctx=13 refs=2
 @0: (+-) refs=2 fwaddr=0xc002e039 enqueue=1 status=Active    0-es3_DoKick3D_0
sw: RM_SWTimeline-mv_remote-435 @0 cur=0
rogue-ta3d: @0 ctx=15 refs=1
------------[ PVR DBG: END ]------------
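
For quicker triage across many captures, the few fields that stand out in this dump can be pulled out with a short script. This is only a sketch: `summarize_pvr_dump` is a hypothetical helper, not part of the PVR DDK, and the regular expressions are written against the line formats seen in this one dump, so they may need adjusting for other DDK versions. Reading WO/RO as the Kernel CCB write and read offsets is my assumption.

```python
import re

# Excerpt copied from the dump above; in practice, read the whole log file instead.
SAMPLE = """\
RGX FW State: NOT RESPONDING - KCCB stalled (HWRState 0x00000001: HWR OK;)
RGX Kernel CCB WO:0xE RO:0x0
RGX Firmware CCB WO:0x0 RO:0x0
RGX Kernel CCB commands executed = 0
FWT[END]: 400 lines were all zero
"""


def summarize_pvr_dump(text):
    """Extract a few firmware-health fields from a PVR debug dump (hypothetical helper)."""
    summary = {}

    # Firmware state string, up to the opening parenthesis of the HWR details.
    m = re.search(r"RGX FW State:\s*(.+?)\s*\(", text)
    if m:
        summary["fw_state"] = m.group(1)

    # Kernel CCB offsets (assumed: WO = write offset, RO = read offset).
    m = re.search(r"RGX Kernel CCB WO:(0x[0-9A-Fa-f]+) RO:(0x[0-9A-Fa-f]+)", text)
    if m:
        summary["kccb_write_offset"] = int(m.group(1), 16)
        summary["kccb_read_offset"] = int(m.group(2), 16)

    # Number of Kernel CCB commands the firmware reports as executed.
    m = re.search(r"RGX Kernel CCB commands executed = (\d+)", text)
    if m:
        summary["kccb_executed"] = int(m.group(1))

    # True when the firmware trace buffer contains no entries at all.
    summary["fw_trace_empty"] = "lines were all zero" in text
    return summary


summary = summarize_pvr_dump(SAMPLE)
for key, value in summary.items():
    print(f"{key}: {value}")
```

In this capture the host has queued commands (WO at 0xE) while RO is still 0x0, zero commands are reported as executed, and the firmware trace is empty. That pattern looks like the firmware never started processing commands after this boot, but that is my reading of the dump, not a confirmed root cause.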

Here is the GPU log in terminal.

Here is the function call in their code project:

Here is the core dump log:

Please help us debug this further.

Regards

Zekun