TDA4VE-Q1: Rendering artifacts while using opengl + R5F display node on SDK9.1

Part Number: TDA4VE-Q1

Hi, 

Setup

Question


We are currently facing display corruption when using the R5F display node. We use a custom OpenGL node based on the mosaic node from vision_apps, and we pass a vx_image directly to the display node.

The graph is executed with a vxProcessGraph() call and looks like this:
inputs -> [opengl openvx node on A72] -> [display node on R5F] -> display port

The problem is that at random times (~1/400) we see corruption along both the X and Y axes.

  

Here is a non-corrupted image for reference.

Is this a behavior that TI has observed in certain cases, or a known issue in SDK 9.1?

Thanks,

  • Hi,

    inputs -> [opengl openvx node on A72] -> [display node on R5F] -> display port

    If your flow is as above, could you please confirm that the output of the OpenGL-based node is correct by saving it to a file, i.e. before sending it to the display?

    I believe you have a complete, ready output buffer at the output of the OpenGL node, and you then use one video pipe of the display node to send it to the display, right?

    In that case, if you replace the display node with a file save, as seen in the OpenGL mosaic node from vision_apps, do you still see this issue?

    May I know if you are using the eDP output from the display here?

    Regards,

    Nikhil

  • 1. I'll validate the image before the display node; putting the file save in place shouldn't be a problem.
    2. Yes, this is how we do it. Once we have everything, we send it to the opengl node.
    3. We use [tda4-eDP] -> [active displayport to hdmi adapter] -> [Display]

    I just checked dmesg, and we also see this driver log a few seconds after our application starts.

    [  275.343261] PVR_K:  228: ------------[ PVR DBG: START (High) ]------------
    [  275.350212] PVR_K:  228: OS kernel info: Linux 6.1.46-g5892b80d6b #1 SMP PREEMPT Wed Apr  3 19:34:28 UTC 2024 aarch64
    [  275.360827] PVR_K:  228: DDK info: Rogue_DDK_Linux_WS rogueddk 23.2@6460340 (release) j721s2_linux
    [  275.369798] PVR_K:  228: Time now: 275369792us
    [  275.374260] PVR_K:  228: Services State: OK
    [  275.378483] PVR_K:  228: Server Errors: 0
    [  275.382800] PVR_K:  228: Connections Device ID:0(128) P1128-V1-T1153-VayaDriveConsol
    [  275.390613] PVR_K:  228: ------[ Driver Info ]------
    [  275.395699] PVR_K:  228: Comparison of UM/KM components: MATCHING
    [  275.405345] PVR_K:  228: KM Arch: 64 Bit
    [  275.410821] PVR_K:  228: Driver Mode: Native
    [  275.415527] PVR_K:  228: UM Connected Clients: 64 Bit
    [  275.420814] PVR_K:  228: UM info: 23.2 @  6460340 (release) build options: 0x80000810
    [  275.428899] PVR_K:  228: KM info: 23.2 @  6460340 (release) build options: 0x00000810
    [  275.437396] PVR_K:  228: Window system: wayland
    [  275.442353] PVR_K:  228: ------[ Server Thread Summary ]------
    [  275.448690] PVR_K:  228:   pvr_defer_free : Running
    [  275.453976] PVR_K:  228:     Number of deferred cleanup items: QUEUED: 00000  CONNECTION : 00000 MMU : 00000 OSMEM : 00000 PMR : 00000
    [  275.466696] PVR_K:  228:     Number of deferred cleanup items dropped after retry limit reached : 0
    [  275.476102] PVR_K:  228:   pvr_device_wdg : Running
    [  275.481396] PVR_K:  228: ------[ RGX Device ID:0 Start ]------
    [  275.487504] PVR_K:  228: ------[ RGX Info ]------
    [  275.492539] PVR_K:  228: Device Node (Info): 00000000078fb2ef (00000000f5bd2391)
    [  275.500309] PVR_K:  228:     DevmemHistoryRecordStats - None
    [  275.506401] PVR_K:  228: RGX BVNC: 36.53.104.796 (rogue)
    [  275.511990] PVR_K:  228: RGX Device State: ACTIVE
    [  275.517009] PVR_K:  228: RGX Power State: ON
    [  275.521464] PVR_K:  228: FW info: 23.2 @  6460340 (release) build options: 0x80000810
    [  275.529415] PVR_K:  228: TRP: HW support - Yes; SW disabled
    [  275.535300] PVR_K:  228: WGP: HW support - Yes; SW disabled
    [  275.543149] PVR_K:  228: BIF0 - OK
    [  275.547033] PVR_K:  228: BIF1 - OK
    [  275.555602] PVR_K:  228: FWCORE - OK
    [  275.559198] PVR_K:  228: RGX FW State: OK (HWRState 0x00000001: HWR OK;)
    [  275.566278] PVR_K:  228: RGX FW Power State: RGXFWIF_POW_IDLE (APM enabled: 310 ok, 1 denied, 0 non-idle, 24408 retry, 0 other, 24719 total. Latency: 100 ms)
    [  275.580619] PVR_K:  228: RGX DVFS: 0 frequency changes. Current frequency: 799.999 MHz (sampled at 274993121725 ns). FW frequency: 800.000 MHz.
    [  275.595579] PVR_K:  228: RGX FW OS 0 - State: active; Freelists: Ok; Priority: 0; Isolation group: 0; MTS on;
    [  275.607947] PVR_K:  228: Number of HWR: GP(0/0+0), 2D(0/0+0), TA(1/1+0), 3D(0/0+0), CDM(0/0+0), RAY(0/0+0), TA2(0/0+0), FALSE(0,0,0,0,0,0,0)
    [  275.620873] PVR_K:  228: DM 0 (GP)
    [  275.624476] PVR_K:  228: DM 1 (HWRflags 0x00000000: working;)
    [  275.631189] PVR_K:  228: DM 2 (HWRflags 0x00000000: working;)
    [  275.637255] PVR_K:  228:   Recovery 1: Core = 0, PID = 1128 / VayaDriveConsol, frame = 0, HWRTData = 0x60046640, EventStatus = 0x00000000, Guilty Lockup
    [  275.651162] PVR_K:  228:               CRTimer = 0x0000000A398A, OSTimer = 275.207505085, CyclesElapsed = 33024
    [  275.661636] PVR_K:  228:               PreResetTimeInCycles = 30720, HWResetTimeInCycles = 109568, FreelistReconTimeInCycles = 157440, TotalRecoveryTimeInCycles = 297728
    [  275.677034] PVR_K:  228:     BIF0 - FAULT:
    [  275.681429] PVR_K:  228:       * MMU status (0x0000000000003041): PC = 3, Page Size = 0 (Page Catalog).
    [  275.691143] PVR_K:  228:       * Request (0x00101494e6a574d0): MCU PDS USCA (-), Reading from 0x94E6A574D0.
    [  275.701167] PVR_K:  228:     PC index (595) out of bounds (0)
    [  275.707417] PVR_K:  228: DM 3 (HWRflags 0x00000000: working;)
    [  275.713329] PVR_K:  228: DM 4 (HWRflags 0x00000000: working;)
    [  275.719375] PVR_K:  228: DM 5 (HWRflags 0x00000000: working;)
    [  275.725506] PVR_K:  228: DM 6 (HWRflags 0x00000000: working;)
    [  275.731523] PVR_K:  228: RGX Kernel CCB WO:0x24F RO:0x24F
    [  275.737103] PVR_K:  228: RGX Firmware CCB WO:0x3 RO:0x3
    [  275.742460] PVR_K:  228: RGX Kernel CCB commands executed = 10831
    [  275.748674] PVR_K:  228: RGX SLR: Forced UFO updates requested = 0
    [  275.755009] PVR_K:  228: RGX Errors: WGP:0, TRP:0
    [  275.759883] PVR_K:  228: Thread0: FW IRQ count = 10825
    [  275.765169] PVR_K:  228: Last sampled IRQ count in LISR = 10825
    [  275.771269] PVR_K:  228: FW System config flags = 0x00020000 (Ctx switch options: Medium CSW profile;)
    [  275.780822] PVR_K:  228: FW OS config flags = 0x0000000F (Ctx switch: TDM; GEOM; 3D; CDM;)
    [  275.789271] PVR_K:  228: ------[ RGX registers ]------
    [  275.794525] PVR_K:  228: RGX Register Base Address (Linear):   0x00000000d9f65986
    [  275.802162] PVR_K:  228: RGX Register Base Address (Physical): 0x4E20000000
    [  275.809262] PVR_K:  228: CORE_ID__PBVNC                : 0x002400350068031C
    [  275.816399] PVR_K:  228: DESIGNER_REV_FIELD1           : 0x00000000
    [  275.822824] PVR_K:  228: DESIGNER_REV_FIELD2           : 0x00000000
    [  275.829200] PVR_K:  228: CHANGESET_NUMBER              : 0x0000000000000000
    [  275.836317] PVR_K:  228: MULTICORE_SYSTEM              : 0x00000001
    [  275.842742] PVR_K:  228: MULTICORE_GPU                 : 0x00000078
    [  275.849380] PVR_K:  228: CLK_CTRL                      : 0x002AAA002A22AAAA
    [  275.856610] PVR_K:  228: CLK_STATUS                    : 0x0000000000600000
    [  275.863761] PVR_K:  228: CLK_CTRL2                     : 0x0000000000000000
    [  275.870962] PVR_K:  228: CLK_STATUS2                   : 0x0000000000000000
    [  275.878051] PVR_K:  228: EVENT_STATUS                  : 0x00000000
    [  275.884470] PVR_K:  228: TIMER                         : 0x0000000000240AE9
    [  275.891576] PVR_K:  228: BIF_FAULT_BANK0_MMU_STATUS    : 0x00000000
    [  275.897948] PVR_K:  228: BIF_FAULT_BANK0_REQ_STATUS    : 0x0000000000000000
    [  275.905065] PVR_K:  228: BIF_FAULT_BANK1_MMU_STATUS    : 0x00000000
    [  275.911651] PVR_K:  228: BIF_FAULT_BANK1_REQ_STATUS    : 0x0000000000000000
    [  275.918846] PVR_K:  228: BIF_MMU_STATUS                : 0x00000000
    [  275.925304] PVR_K:  228: BIF_MMU_ENTRY                 : 0x00000000
    [  275.931702] PVR_K:  228: BIF_MMU_ENTRY_STATUS          : 0x0000000000000000
    [  275.938799] PVR_K:  228: BIF_STATUS_MMU                : 0x00000000
    [  275.945186] PVR_K:  228: BIF_READS_EXT_STATUS          : 0x00000000
    [  275.951611] PVR_K:  228: BIF_READS_INT_STATUS          : 0x00000000
    [  275.958019] PVR_K:  228: BIFPM_STATUS_MMU              : 0x00000000
    [  275.964452] PVR_K:  228: BIFPM_READS_EXT_STATUS        : 0x00000000
    [  275.970858] PVR_K:  228: BIFPM_READS_INT_STATUS        : 0x00000000
    [  275.977477] PVR_K:  228: BIF_CAT_BASE_INDEX            : 0x0000000000000303
    [  275.984718] PVR_K:  228: BIF_CAT_BASE0                 : 0x00000008AFE8C000
    [  275.991863] PVR_K:  228: BIF_CAT_BASE1                 : 0x0000000000000000
    [  275.998953] PVR_K:  228: BIF_CAT_BASE2                 : 0x0000000000000000
    [  276.006053] PVR_K:  228: BIF_CAT_BASE3                 : 0x00000008E474F000
    [  276.013324] PVR_K:  228: BIF_CAT_BASE4                 : 0x0000000000000000
    [  276.020446] PVR_K:  228: BIF_CAT_BASE5                 : 0x0000000000000000
    [  276.027599] PVR_K:  228: BIF_CAT_BASE6                 : 0x0000000000000000
    [  276.034697] PVR_K:  228: BIF_CAT_BASE7                 : 0x0000000000000000
    [  276.042008] PVR_K:  228: BIF_CTRL_INVAL                : 0x00000000
    [  276.048514] PVR_K:  228: BIF_CTRL                      : 0x00000000
    [  276.054944] PVR_K:  228: BIF_PM_CAT_BASE_VCE0          : 0x000000093F513001
    [  276.062080] PVR_K:  228: BIF_PM_CAT_BASE_TE0           : 0x000000093F51E001
    [  276.069155] PVR_K:  228: BIF_PM_CAT_BASE_ALIST0        : 0x000000093F523001
    [  276.076252] PVR_K:  228: BIF_PM_CAT_BASE_VCE1          : 0x000000093F513001
    [  276.083390] PVR_K:  228: BIF_PM_CAT_BASE_TE1           : 0x000000093F51E001
    [  276.090484] PVR_K:  228: BIF_PM_CAT_BASE_ALIST1        : 0x000000093F523001
    [  276.097546] PVR_K:  228: MULTICORE_GEOMETRY_CTRL_COMMON: 0x00000000
    [  276.103950] PVR_K:  228: MULTICORE_FRAGMENT_CTRL_COMMON: 0x00000001
    [  276.110384] PVR_K:  228: MULTICORE_COMPUTE_CTRL_COMMON : 0x00000101
    [  276.116974] PVR_K:  228: PERF_TA_PHASE                 : 0x00000001
    [  276.123480] PVR_K:  228: PERF_TA_CYCLE                 : 0x00015BD1
    [  276.129937] PVR_K:  228: PERF_3D_PHASE                 : 0x00000001
    [  276.136348] PVR_K:  228: PERF_3D_CYCLE                 : 0x0020DCE4
    [  276.142751] PVR_K:  228: PERF_TA_OR_3D_CYCLE           : 0x002238B5
    [  276.149214] PVR_K:  228: PERF_TA_AND_3D_CYCLE          : 0x00000000
    [  276.155696] PVR_K:  228: PERF_COMPUTE_PHASE            : 0x00000000
    [  276.162146] PVR_K:  228: PERF_COMPUTE_CYCLE            : 0x00000000
    [  276.168536] PVR_K:  228: PM_PARTIAL_RENDER_ENABLE      : 0x00000000
    [  276.175073] PVR_K:  228: ISP_RENDER                    : 0x00000000
    [  276.181582] PVR_K:  228: TLA_STATUS                    : 0x0000000000000000
    [  276.188689] PVR_K:  228: MCU_FENCE                     : 0x0000018000000000
    [  276.195798] PVR_K:  228: VDM_CONTEXT_STORE_STATUS      : 0x00000001
    [  276.202233] PVR_K:  228: VDM_CONTEXT_STORE_TASK0       : 0x0000000000000000
    [  276.209332] PVR_K:  228: VDM_CONTEXT_STORE_TASK1       : 0x0000000000000000
    [  276.216449] PVR_K:  228: VDM_CONTEXT_STORE_TASK2       : 0x0000000000000000
    [  276.223560] PVR_K:  228: VDM_CONTEXT_RESUME_TASK0      : 0x0000000000000000
    [  276.230655] PVR_K:  228: VDM_CONTEXT_RESUME_TASK1      : 0x0000000000000000
    [  276.237706] PVR_K:  228: VDM_CONTEXT_RESUME_TASK2      : 0x0000000000000000
    [  276.245050] PVR_K:  228: ISP_CTL                       : 0x8003500F
    [  276.251592] PVR_K:  228: ISP_STATUS                    : 0x00000003
    [  276.258153] PVR_K:  228: MTS_INTCTX                    : 0x00000000
    [  276.264575] PVR_K:  228: MTS_BGCTX                     : 0x00000000
    [  276.270981] PVR_K:  228: MTS_BGCTX_COUNTED_SCHEDULE    : 0x00000000
    [  276.277365] PVR_K:  228: MTS_SCHEDULE                  : 0x00000000
    [  276.283767] PVR_K:  228: MTS_GPU_INT_STATUS            : 0x00000110
    [  276.290158] PVR_K:  228: CDM_CONTEXT_STORE_STATUS      : 0x00000000
    [  276.296560] PVR_K:  228: CDM_CONTEXT_PDS0              : 0x0000000000000000
    [  276.303680] PVR_K:  228: CDM_CONTEXT_PDS1              : 0x0000000000000000
    [  276.310898] PVR_K:  228: CDM_TERMINATE_PDS             : 0x0000000000000000
    [  276.318110] PVR_K:  228: CDM_TERMINATE_PDS1            : 0x0000000000000000
    [  276.325246] PVR_K:  228: CDM_CONTEXT_LOAD_PDS0         : 0x0000000000000000
    [  276.332364] PVR_K:  228: CDM_CONTEXT_LOAD_PDS1         : 0x0000000000000000
    [  276.339447] PVR_K:  228: SIDEKICK_IDLE                 : 0x0000007F
    [  276.345843] PVR_K:  228: SLC_IDLE                      : 0x000000FF
    [  276.352264] PVR_K:  228: SLC_STATUS0                   : 0x00000000
    [  276.358685] PVR_K:  228: SLC_STATUS1                   : 0x0000000000000000
    [  276.365778] PVR_K:  228: SLC_STATUS2                   : 0x0000000000000000
    [  276.372889] PVR_K:  228: SLC_CTRL_BYPASS               : 0x00001D1F00000000
    [  276.380286] PVR_K:  228: SLC_CTRL_MISC                 : 0x0000000000200003
    [  276.387795] PVR_K:  228: SAFETY_EVENT_STATUS__ROGUEXE  : 0x00000000
    [  276.394348] PVR_K:  228: MTS_SAFETY_EVENT_ENABLE__ROGUEXE: 0x000000FF
    [  276.401034] PVR_K:  228: FWCORE_WDT_CTRL               : 0x00001F01
    [  276.407578] PVR_K:  228: SCRATCH0                      : 0x00000000
    [  276.414072] PVR_K:  228: SCRATCH1                      : 0x00000000
    [  276.420786] PVR_K:  228: SCRATCH2                      : 0x00000000
    [  276.427224] PVR_K:  228: SCRATCH3                      : 0x00000000
    [  276.433763] PVR_K:  228: SCRATCH4                      : 0x00000000
    [  276.440715] PVR_K:  228: SCRATCH5                      : 0x00000000
    [  276.447389] PVR_K:  228: SCRATCH6                      : 0x00000000
    [  276.453926] PVR_K:  228: SCRATCH7                      : 0x00000000
    [  276.460479] PVR_K:  228: SCRATCH8                      : 0x00000000
    [  276.466914] PVR_K:  228: SCRATCH9                      : 0x00000000
    [  276.473458] PVR_K:  228: SCRATCH10                     : 0x00000000
    [  276.479959] PVR_K:  228: SCRATCH11                     : 0x00000000
    [  276.486366] PVR_K:  228: SCRATCH12                     : 0x00000000
    [  276.493316] PVR_K:  228: SCRATCH13                     : 0x00000000
    [  276.499891] PVR_K:  228: SCRATCH14                     : 0x00000000
    [  276.506369] PVR_K:  228: SCRATCH15                     : 0x00000000
    [  276.513292] PVR_K:  228: FWCORE_MEM_CAT_BASE0          : 0x00000008AFE8C000
    [  276.520710] PVR_K:  228: FWCORE_MEM_CAT_BASE1          : 0x0000000000000000
    [  276.528056] PVR_K:  228: FWCORE_MEM_CAT_BASE2          : 0x0000000000000000
    [  276.535327] PVR_K:  228: FWCORE_MEM_CAT_BASE3          : 0x00000008E474F000
    [  276.542512] PVR_K:  228: FWCORE_MEM_CAT_BASE4          : 0x0000000000000000
    [  276.549707] PVR_K:  228: FWCORE_MEM_CAT_BASE5          : 0x0000000000000000
    [  276.556904] PVR_K:  228: FWCORE_MEM_CAT_BASE6          : 0x0000000000000000
    [  276.564132] PVR_K:  228: FWCORE_MEM_CAT_BASE7          : 0x0000000000000000
    [  276.571305] PVR_K:  228: FWCORE_ADDR_REMAP_CONFIG4     : 0x120000E1C0002000
    [  276.578673] PVR_K:  228: FWCORE_ADDR_REMAP_CONFIG5     : 0x220000E1C001E000
    [  276.586099] PVR_K:  228: FWCORE_ADDR_REMAP_CONFIG6     : 0x220000E1C0000000
    [  276.593432] PVR_K:  228: FWCORE_ADDR_REMAP_CONFIG12    : 0x120000E1C0000000
    [  276.600635] PVR_K:  228: FWCORE_ADDR_REMAP_CONFIG13    : 0x220000E1C0000000
    [  276.607800] PVR_K:  228: FWCORE_ADDR_REMAP_CONFIG14    : 0x0000000000000000
    [  276.615006] PVR_K:  228: FWCORE_MEM_FAULT_MMU_STATUS   : 0x00000000
    [  276.621488] PVR_K:  228: FWCORE_MEM_FAULT_REQ_STATUS   : 0x0000000000000000
    [  276.628622] PVR_K:  228: FWCORE_MEM_MMU_STATUS         : 0x00000000
    [  276.635108] PVR_K:  228: FWCORE_MEM_READS_EXT_STATUS   : 0x00000000
    [  276.641765] PVR_K:  228: FWCORE_MEM_READS_INT_STATUS   : 0x00000000
    [  276.648689] PVR_K:  228: ---- [ RISC-V internal state ] ----
    [  276.654908] PVR_K:  228: pc                            : 0x40004004
    [  276.661585] PVR_K:  228: ra                            : 0x4001BDEA
    [  276.668100] PVR_K:  228: sp                            : 0x50001010
    [  276.674617] PVR_K:  228: mepc                          : 0x40004004
    [  276.681124] PVR_K:  228: mcause                        : 0x8000000B
    [  276.687555] PVR_K:  228: mdseac                        : 0x00000000
    [  276.694024] PVR_K:  228: mstatus                       : 0x00001888
    [  276.700510] PVR_K:  228: mie                           : 0x40000888
    [  276.707264] PVR_K:  228: mip                           : 0x00000000
    [  276.714009] PVR_K:  228: mscratch                      : 0x00000000
    [  276.720649] PVR_K:  228: mbvnc0                        : 0x00010001
    [  276.727188] PVR_K:  228: mbvnc1                        : 0x0032000B
    [  276.733692] PVR_K:  228: micect                        : 0x10000000
    [  276.740199] PVR_K:  228: mdcect                        : 0x10000000
    [  276.746616] PVR_K:  228: mdcrfct                       : 0x10000000
    [  276.753104] PVR_K:  228: TFBC_VERSION                  : 0x0000000A
    [  276.759651] PVR_K:  228: ------[ RGX FW Trace Info ]------
    [  276.765327] PVR_K:  228: Debug log type: none
    [  276.769939] PVR_K:  228: RGX FW thread 0: Trace buffer not yet allocated
    [  276.777139] PVR_K:  228: ------[ Full CCB Status ]------
    [  276.782848] PVR_K:  228: FWCtx 0x60046040 (TA-P1128-T1153-VayaDriveConso)
    [  276.789973] PVR_K:  228:   `--<Empty>
    [  276.793807] PVR_K:  228: FWCtx 0x600460F0 (3D-P1128-T1153-VayaDriveConso)
    [  276.800766] PVR_K:  228:   `--<Empty>
    [  276.804576] PVR_K:  228: ------[ RGX Device ID:0 End ]------
    [  276.810476] PVR_K:  228: ------[ Device ID: 128 - Phys Heaps ]------
    [  276.817151] PVR_K:  228: 0x000000007359d691 -> PdMs: SYSMEM, Type: UMA, Usage Flags: 0x00000004 (GPU_LOCAL), Refs: 11, Free Size: 12971237376B, Total Size: 14997815296B
    [  276.832720] PVR_K:  228: PMR Zombie Count: 2, PMR Zombie Count In Cleanup: 0
    [  276.840223] PVR_K:  228: PMR Live Count: 208
    [  276.845181] PVR_K:  228: ------[ System Summary Device ID:0 ]------
    [  276.852208] PVR_K:  228: Device System Power State: ON
    [  276.857833] PVR_K:  228: MaxHWTOut: 500000us, WtTryCt: 10000, WDGTOut(on,off): (10000ms,3600000ms)
    [  276.868040] PVR_K:  228: ------[ AppHint Settings ]------
    [  276.873824] PVR_K:  228:   Build Vars
    [  276.877938] PVR_K:  228:     EnableTrustedDeviceAceConfig: N
    [  276.884063] PVR_K:  228:     CleanupThreadPriority: 0x00000005
    [  276.890257] PVR_K:  228:     WatchdogThreadPriority: 0x00000000
    [  276.896609] PVR_K:  228:     HWPerfClientBufferSize: 0x000c0000
    [  276.902889] PVR_K:  228:     DevmemHistoryBufSizeLog2: 0x0000000b
    [  276.909284] PVR_K:  228:     DevmemHistoryMaxEntries: 0x00002710
    [  276.915650] PVR_K:  228:   Module Params
    [  276.919899] PVR_K:  228:     none
    [  276.923213] PVR_K:  228:   Debug Info Params
    [  276.927993] PVR_K:  228:     none
    [  276.931314] PVR_K:  228:   Debug Info Params Device ID: 0
    [  276.937289] PVR_K:  228:     none
    [  276.940907] PVR_K:  228: ------[ Active Sync Checkpoints ]------
    [  276.951630] PVR_K:  228: (SyncCP Counts: InUse:5 Max:9)
    [  276.957224] sw: RM_SWTimeline-VayaDriveConsole- @0 cur=0
    [  276.963087] ------[ Native Fence Sync: timelines ]------
    [  276.971596] foreign_sync: @0 ctx=1 refs=1
    [  276.975976] rogue-ta3d: @669 ctx=41 refs=5
    [  276.980076]  @665: (++) refs=1 fwaddr=0xd0044001 enqueue=1 status=Signalled 665-update fence
    [  276.988498]  @666: (++) refs=1 fwaddr=0xd0044081 enqueue=1 status=Signalled 666-update fence
    [  276.996918]  @667: (++) refs=1 fwaddr=0xd0044099 enqueue=1 status=Signalled 667-update fence
    [  277.005335]  @668: (++) refs=4 fwaddr=0xd0044061 enqueue=1 status=Signalled 668-update fence
    [  277.016547] V3-iveConsole-VayaDriveConsol-1: @669 ctx=42 refs=1
    [  277.027581] P3-iveConsole-VayaDriveConsol-1: @669 ctx=43 refs=1
    [  277.035347] PVR_K:  228: ------------[ PVR DBG: END ]------------
    [  277.043511] ------------[ cut here ]------------
    [  277.048148] WARNING: CPU: 0 PID: 228 at /workspace/build/j721s2/Release/system/pvrsrvkm-module/target_aarch64/kbuild/services/server/common/pvr_notifier.c:641 PVRSRVDebugRequest+0x514/0x6b0 [pvrsrvkm]
    [  277.066051] Modules linked in: can_raw can xt_conntrack xt_MASQUERADE iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c xt_addrtype iptable_filter ip_tables x_tables br_netfilter bridge stp llc xsens_mt usbserial cdc_acm xhci_plat_hcd rpmsg_ctrl rpmsg_char omap_rng cdns_csi2rx ti_am335x_adc v4l2_fwnode kfifo_buf cdns3 cdns_usb_common crct10dif_ce display_connector overlay phy_can_transceiver cfg80211 bluetooth ecdh_generic ecc rfkill ti_k3_r5_remoteproc cdns_mhdp8546 pvrsrvkm(O) cdns_dsi k3_j72xx_bandgap drm_display_helper ti_am335x_tscadc wave5 sa2ul cdns_dphy_rx ti_k3_dsp_remoteproc drm_kms_helper virtio_rpmsg_bus syscopyarea sysfillrect rpmsg_ns ti_k3_common sysimgblt j721e_csi2rx videobuf2_dma_contig fb_sys_fops d3_serdes(O) cdns_dphy videobuf2_memops cdns3_ti v4l2_async v4l2_mem2mem videobuf2_v4l2 videobuf2_common at24 videodev rtc_ds1307 pci_j721e_host m_can_platform pci_j721e mc m_can pcie_cadence_host pcie_cadence can_dev pwm_tiehrpwm optee_rng rti_wdt
    [  277.066217]  rng_core fuse drm drm_panel_orientation_quirks ipv6
    [  277.158978] CPU: 0 PID: 228 Comm: pvr_device_wdg Tainted: G           O       6.1.46-g5892b80d6b #1
    [  277.168002] Hardware name: Texas Instruments J721S2 EVM (DT)
    [  277.173644] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
    [  277.180588] pc : PVRSRVDebugRequest+0x514/0x6b0 [pvrsrvkm]
    [  277.186124] lr : PVRSRVDebugRequest+0x514/0x6b0 [pvrsrvkm]
    [  277.191655] sp : ffff80000a6fbca0
    [  277.194955] x29: ffff80000a6fbca0 x28: 0000000000000000 x27: ffff00083074b9a0
    [  277.202074] x26: ffff00082e275608 x25: ffff00082fe74a08 x24: ffff00082fe74b20
    [  277.209192] x23: ffff00082e275720 x22: ffff00082fe74b20 x21: 0000000000000002
    [  277.216310] x20: 0000000000000000 x19: 0000000000000000 x18: 0000000000000000
    [  277.223427] x17: 0000000000000000 x16: 0000000000000000 x15: 0000b183104cde6c
    [  277.230545] x14: 0000000000000196 x13: 0000000000000001 x12: 0000000000000000
    [  277.237663] x11: 0000000000000002 x10: 00000000000009b0 x9 : ffff80000a6fbb00
    [  277.244780] x8 : ffff00082d81df90 x7 : ffff000b7e196340 x6 : 0000000000000000
    [  277.251898] x5 : 00000000410fd080 x4 : 0000000000c0000e x3 : 0000000000100000
    [  277.259014] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff00082d81d580
    [  277.266131] Call trace:
    [  277.268565]  PVRSRVDebugRequest+0x514/0x6b0 [pvrsrvkm]
    [  277.273758]  DevicesWatchdogThread_ForEachVaCb+0x108/0x170 [pvrsrvkm]
    [  277.280240]  List_PVRSRV_DEVICE_NODE_ForEach_va+0x70/0xac [pvrsrvkm]
    [  277.286635]  DevicesWatchdogThread+0x9c/0x204 [pvrsrvkm]
    [  277.291990]  OSThreadRun+0x24/0x60 [pvrsrvkm]
    [  277.296390]  kthread+0x10c/0x110
    [  277.299616]  ret_from_fork+0x10/0x20
    [  277.303180] ---[ end trace 0000000000000000 ]---

  • Hi,

    I'll validate the image before the display node; putting the file save in place shouldn't be a problem.

    Yes, this would be great. With this experiment, if the problem still appears in the saved files, we can rule out the display node and focus only on the GPU.

    Please let me know the results for the same.

    Regards,

    Nikhil

  • Hi Nikhil,

    I was able to reproduce the issue, and it seems that the image I capture between the nodes is corrupted when the issue happens.

    I added the following to our current code to produce the images:

    	if (success)
    	{
    		success = RunOpenGLESRenderGraph();
    	}
    
    	// Debug only: dump the OpenGL node output to a numbered .bmp
    	// before it is handed to the display graph.
    	static int framenum = 0;
    	char filename[128] = {'\0'};
    
    	snprintf(filename, sizeof(filename), "/host/home/root/inFrame_%d.bmp", framenum);
    	tivx_utils_save_vximage_to_bmpfile(filename, m_openGLESRGBX->image);
    	framenum++;
    
    	if (success)
    	{
    		success = RunDisplayGraph();
    	}

    Here we run our OpenGL graph and then the display node graph. Both use the pipelining mechanism, and we call vxWaitGraph in each Run[...]Graph() function.

    The interesting thing is that the header of the saved .bmp file is corrupted.
    In the screenshot below, frame 209 is corrupted and frame 208 is not.

  • David,

    So when saving the image, even the header of the .bmp was corrupted? If my understanding is correct, that seems to be outside the GPU and more like something happening at the system level.

    I would assume tivx_utils_save_vximage_to_bmpfile() does something simple for the header part of the BMP. The 'B' and 'M' being corrupted at the same time as the frames are incorrect is highly suspicious. We'll need to discuss what to explore next.
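
    To triage a batch of dumped frames, the header check could be automated. The sketch below is only an illustration under stated assumptions: it assumes the standard 14-byte BMP file header ('B' and 'M' at offset 0, little-endian pixel-data offset at byte 10). The names bmp_header_looks_valid and read_le32 are hypothetical helpers, not part of the SDK.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 32-bit value from a byte buffer. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: returns 1 if the first bytes look like a BMP
 * file header, 0 otherwise.
 *   buf, len  : the first bytes read from the file
 *   file_size : total size of the file on disk, in bytes
 */
int bmp_header_looks_valid(const uint8_t *buf, size_t len, size_t file_size)
{
    if (buf == NULL || len < 14) {          /* BMP file header is 14 bytes */
        return 0;
    }
    if (buf[0] != 'B' || buf[1] != 'M') {   /* magic bytes */
        return 0;
    }
    /* The pixel data must start after the file header and inside the file. */
    uint32_t pixel_offset = read_le32(buf + 10);
    if (pixel_offset < 14 || pixel_offset >= file_size) {
        return 0;
    }
    return 1;
}
```

    Running such a check on the first bytes of every dumped inFrame_N.bmp would list which frame numbers have a damaged header, which can then be compared against the frames that looked wrong on the display.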

    Regards,

    Erick

  • David,

    Here is the link for the PVRTune download, although it sounds like you already have it: https://developer.imaginationtech.com/solutions/pvrtune/

    The only test to do with this for now is to see if there is a performance gap when the issue happens (stall in the pipeline or something similar). We can explore other options after that test.

  • Thanks for the link.

    Here is another type of artifact that we see.

    Also, it would be appreciated if you could point us to a JTAG debugger that we could use with the EVM + TDA4VE.

  • Regarding the corruption in the .bmp:
    I opened the image and we can clearly see it. There is only a bit of corruption at the beginning of the top row.

    The image is mirrored because of the tool I'm using.

  • Hi Erick,

    I'm trying to use PVRTune. I can connect to the server, but I get no GPU data.
    I used the prebuilt SGX version on the TDA4VE, from here: https://developer.imaginationtech.com/downloads/

    root@j721s2-evm:~/david.khouya/PVRPerfServer/Linux_armv8_64# ./PVRPerfServerDeveloper 
    PVRPerfServerDeveloper v14.143 64-bits - Build 17.1@4676419.
    Copyright (C) Imagination Technologies Ltd. All rights reserved.
    * Support:            DevTech@imgtec.com
    * OS:                 Linux version 6.1.46-g5892b80d6b (aarch64-none-linux-gnu-gcc (Arm GNU Toolchain 11.3.Rel1) 11.3.1 20220712, GNU ld (Arm GNU Toolchain 11.3.Rel1) 2.38.20220708) #1 SMP PREEMPT Wed Apr  3 19:34:28 UTC 2024
    * Time (local):       Tue, 23 Apr 2024 15:36:34 +0000 (15:36:34)
    * Time (UTC):         Tue, 23 Apr 2024 15:36:34 +0000 (15:36:34)
    Error: failed to initialise services connection (driver support query info failed).
    * Not connected to PowerVR driver.
    * Processor count:    2
    This server is j721s2-evm:6520 (lo:127.0.0.1,eth0:x.x.x.x,docker0:172.17.0.1)...
    Waiting for connection (press q to quit)...
    200.0ps [GPU data unavailable!] : 00/00,T

    Do you have a specific version that works with SDK 9.1?

    Thanks,

  • David,

    Please use one of the other versions. The SGX version will not work, since our drivers are for the newer GPU architectures. SGX was used on previous devices, but TDA4 devices are on the newer architecture, so you can use any of the other download links.

  • David,

    This is the debugger we spoke of: https://www.lauterbach.com/products/debugger/powerdebug-system

    There are variations of this debugger and cable, as seen on their site. Our boards support the 60-pin MIPI connector, and there are adapters to connect to this debugger. If you are interested in purchasing, feel free to ask the question and we can check if the one you are looking at is compatible.

    Regards,

    Erick

  • Thanks for the link. 

    As an update on the output image of the opengl node: I did some validation on other corrupted .bmp files, and in every image there is no sign of corruption similar to what we see on the display.

    At this point, I think this points to the display node as the place where the issue could occur.

    I will still investigate other things tomorrow as we discussed, such as adding a vertical line to the images before the display node. If you have other ideas for display node tests that we could do, please let me know.

    Thanks,

  • Hi,

    I did some validation on other corrupted .bmp files, and in every image there is no sign of corruption similar to what we see on the display.

    Could you elaborate here? What does the saved RGB data look like in the case of a corrupted .bmp file?

    At this point, I think this points to the display node as the place where the issue could occur.

    Also, regarding the display node, could you share the configuration of this node, i.e. the display_params?

    Regards,

    Nikhil

  • Could you elaborate here? What does the saved RGB data look like in the case of a corrupted .bmp file?

    The .bmp header is corrupted, but the image seems to be intact. By analysing the content of the image, we are able to see the picture without the major corruption that we see at the display level.

    Also, regarding the display node, could you share the configuration of this node, i.e. the display_params?

    Here is what we do:

    		tivx_display_params_t params;
    
    		// Zero-initialize so that every field not set below stays 0.
    		memset(&params, 0, sizeof(params));
    
    		params.outHeight = m_height;
    		params.outWidth = m_width;
    		params.opMode = TIVX_KERNEL_DISPLAY_ZERO_BUFFER_COPY_MODE;

  • Hi,

    I see that you are using pipe 0 of the display node here (it is not explicitly set, so pipe 0 is used by default).

    Are you using any other pipes as well, or is it just one display node?

    I just want to understand whether any overlays are being used by the display node, or whether all the different layers are positioned into the frame by the OpenGL node itself.

    Regards,

    Nikhil

  • Hi Nikhil and Erick,

    There's only one application using the display in our system.

    All the items are built by the opengl node, which outputs a complete vximage, and we feed it to the display node as the complete image. We don't add overlays through another path.

    We added lines to the output image in the GPU node, and we can see a slight shift.

    Right now we are working on adding those lines using the CPU instead of the GPU, so we can isolate the work done by the GPU.
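
    As an illustration of what drawing such reference lines with the CPU can look like, here is a minimal sketch. It assumes a 4-byte-per-pixel RGBX buffer whose base pointer and stride in bytes were obtained by mapping the vx_image (for example with vxMapImagePatch). The names draw_reference_cross and put_pixel are hypothetical, chosen for illustration only; this is not SDK code and not the code from this thread.

```c
#include <stddef.h>
#include <stdint.h>

#define BYTES_PER_PIXEL 4u  /* assumption: RGBX, one byte per channel */

/* Write one pixel; rgbx is packed as 0xRRGGBBXX. */
static void put_pixel(uint8_t *base, uint32_t stride,
                      uint32_t x, uint32_t y, uint32_t rgbx)
{
    uint8_t *p = base + (size_t)y * stride + (size_t)x * BYTES_PER_PIXEL;
    p[0] = (uint8_t)(rgbx >> 24);  /* R */
    p[1] = (uint8_t)(rgbx >> 16);  /* G */
    p[2] = (uint8_t)(rgbx >> 8);   /* B */
    p[3] = (uint8_t)rgbx;          /* X (unused channel) */
}

/*
 * Hypothetical helper: draw a full-width horizontal line at row y and a
 * full-height vertical line at column x.
 * Returns 0 on success, -1 if the arguments are out of range.
 */
int draw_reference_cross(uint8_t *base, uint32_t width, uint32_t height,
                         uint32_t stride, uint32_t x, uint32_t y,
                         uint32_t rgbx)
{
    if (base == NULL || x >= width || y >= height ||
        (uint64_t)stride < (uint64_t)width * BYTES_PER_PIXEL) {
        return -1;
    }
    for (uint32_t col = 0; col < width; col++) {
        put_pixel(base, stride, col, y, rgbx);   /* horizontal line */
    }
    for (uint32_t row = 0; row < height; row++) {
        put_pixel(base, stride, x, row, rgbx);   /* vertical line */
    }
    return 0;
}
```

    Because the line positions are known exactly, any shift seen on the display can be measured in pixels against the values passed in for x and y.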

    Also, Erick, we talked about having commands to load the GPU and the DMA. Is that something you can share with us? It would help to see if we can reproduce the issue without a live sensor setup.

  • David,

    Thanks! This is quite interesting. A vertical line would be nice too, but I think this is sufficient for the moment.

    I've got a command that can load the GPU quite easily. You can put it in a for loop so it keeps running:

    rgx_compute_test -w 64 -k 16 -s 1024 -f 4098 &> /dev/null

    If you want more or less loading, we can tune the "-k" argument to another power of 2.

    For loading up the memory with some module that can create large transactions, I'm checking with the team, or Nikhil may be able to suggest here.

    Regards,

    Erick

  • Hi David,

    Right now we are working on adding those lines using the CPU instead of the GPU, so we can isolate the work done by the GPU.

    By the above, you mean writing a horizontal and a vertical line using the CPU/A72 on m_openGLESRGBX->image (I believe this is the image being fed to the display node), before giving it to the display node but after the GPU has filled this buffer.

    Is my understanding correct here?

    Regards,

    Nikhil

  • 3. We use [tda4-eDP] -> [active displayport to hdmi adapter] -> [Display] 

    Additionally, could you check the eDP output directly instead of using the convertor?

    Is it feasible to test this on display supporting eDP?

    Regards,

    Nikhil

    By the above, do you mean writing a horizontal and a vertical line with the CPU (A72) on the m_openGLESRGBX->image (I believe this is the image being fed to the display node) before giving it to the display node, but after the GPU has filled this buffer?

    Exactly. We are working on drawing the lines with the CPU at the moment.

    Additionally, could you check the eDP output directly instead of using the convertor?

    Is it feasible to test this on display supporting eDP?

    Yes, we will be able to do that tomorrow. 

    And thanks Erick, we will see if we can reproduce in playback while putting load on the GPU. 

    Nikhil, could you share the display statistics example, so we can check whether there is an underflow when the issue occurs?

    Thanks

  • Hi Nikhil,

    We were able to reproduce the issue by drawing lines with CPU, using ‘Draw2D_drawLine()’

    Also, I tried loading the GPU; it doesn't seem to affect how often the display issue reproduces.
  • Hi,

    Please find the patch below for display statistics.

    The patch has to be applied in the video_io folder in the SDK.

    /cfs-file/__key/communityserver-discussions-components-files/791/disp_5F00_stats.patch

    You can add the following to your application to print out the display stats.

    if (status == VX_SUCCESS)
    {
        /* Ask the display node to print its current statistics. */
        status = tivxNodeSendCommand(obj->disp_node, 0u,
                    TIVX_DISPLAY_GET_CURRENT_STATUS,
                    NULL, 0u);
    }

    After applying this patch, build video_io from sdk_builder folder as shown below and then build vision_apps (or your application)

    cd ${PSDKRA}/sdk_builder/
    make video_io -j
    make vision_apps -j

    We were able to reproduce the issue by drawing lines with CPU, using ‘Draw2D_drawLine()’

    In the attached image, where is the stream coming from the GPU? I believe you are writing the lines on top of the output image from the GPU right?

    Regards,

    Nikhil

  • Thanks, I'll try the patch

    In the attached image, where is the stream coming from the GPU? I believe you are writing the lines on top of the output image from the GPU right?

    We created a new vx_image with the lines, and this is the one that we display.
    The OpenGL node is still running, but we don't forward its vx_image to the display. So we are seeing a vx_image that is modified only by the CPU.

  • David,

    Just summarizing the results so far from our meeting today, and the follow on tests that are planned:

    1) Running the Draw Lines CPU-only version to Display at different points of the pipeline, to see if there is a specific part of the pipeline that is affecting the display node.

    2) Testing Nikhil's suggestion:

    Additionally, could you check the eDP output directly instead of using the convertor?

    Is it feasible to test this on display supporting eDP?

    3) Testing playback mode with other loading, GPU loading did not seem to do anything. Other types of loading underway.

    4) Enabling display statistics to see if the DSS is reporting over/underflows or anything else.

    Regards,

    Erick

  • Hi,

    Here is the status:

    1) Running the Draw Lines CPU-only version to Display at different points of the pipeline, to see if there is a specific part of the pipeline that is affecting the display node.

    We ran the most minimal CPU-only code we could on the display side, and we still see the offset.

    The OnFrame callback runs the following:
    
    DrawCPU();
    vxProcessGraph(m_displayGraph);
    

    2) Testing Nikhil's suggestion:

    Additionally, could you check the eDP output directly instead of using the convertor?

    Is it feasible to test this on display supporting eDP?

    We can reproduce the issue with a direct DisplayPort connection.

    3) Testing playback mode with other loading, GPU loading did not seem to do anything. Other types of loading underway.

    Exactly, we haven't tried anything else on our side yet.

    4) Enabling display statistics to see if the DSS is reporting over/underflows or anything else.

    We can see some underflows when the issue happens.

    Since there are underflows, could it be something that we are not doing correctly with the OpenVX framework?

    Lastly, here are two videos of the issue.

  • Hi David,

    That is most likely caused by the underflow error. What else are you running along with the display?

    Can you try the changes I shared in the FAQ below?

    [FAQ] PROCESSOR-SDK-J721S2: How to enable QoS for DSS in SBL or in SPL boot flow?

    Regards,

    Brijesh

  • Brijesh,

    Do these settings apply to SDK 9.1?

    Regards,

    Erick

  • That's most likely coming due to underflow error. What else are you running along with the display? 

    We run our application, which uses the VPAC, C7X + MMA (TIDL), C7X operations, and Linux interface drivers such as CAN, V4L, and probably a few others such as I2C.

  • Hi David,

    Can you please check this FAQ and depending on the bootflow you are using, can you please try applying these changes? 

    Regards,

    Brijesh

  • Brijesh,

    It seems the QoS settings for the DSS are already set in SDK 9.1. Are any of the other QoS settings important?

    arch/arm/mach-k3/j721s2/j721s2_qos_data.c

    struct k3_qos_data j721s2_qos_data[] = {
        /* modules_qosConfig0 - 2 endpoints, 10 channels */
        {
            .reg = K3_DSS_MAIN_0_DSS_INST0_VBUSM_DMA + 0x100 + 0x4 * 0,
            .val = ATYPE_3 | ORDERID_15,
        },
        {
            .reg = K3_DSS_MAIN_0_DSS_INST0_VBUSM_DMA + 0x100 + 0x4 * 1,
            .val = ATYPE_3 | ORDERID_15,
        },
        {
            .reg = K3_DSS_MAIN_0_DSS_INST0_VBUSM_DMA + 0x100 + 0x4 * 2,
            .val = ATYPE_3 | ORDERID_15,
        },
        {
            .reg = K3_DSS_MAIN_0_DSS_INST0_VBUSM_DMA + 0x100 + 0x4 * 3,
            .val = ATYPE_3 | ORDERID_15,
        },
        {
            .reg = K3_DSS_MAIN_0_DSS_INST0_VBUSM_DMA + 0x100 + 0x4 * 4,
            .val = ATYPE_3 | ORDERID_15,
        },
        {
            .reg = K3_DSS_MAIN_0_DSS_INST0_VBUSM_DMA + 0x100 + 0x4 * 5,
            .val = ATYPE_3 | ORDERID_15,
        },
        {
            .reg = K3_DSS_MAIN_0_DSS_INST0_VBUSM_DMA + 0x100 + 0x4 * 6,
            .val = ATYPE_3 | ORDERID_15,
        },
        {
            .reg = K3_DSS_MAIN_0_DSS_INST0_VBUSM_DMA + 0x100 + 0x4 * 7,
            .val = ATYPE_3 | ORDERID_15,
        },
        {
            .reg = K3_DSS_MAIN_0_DSS_INST0_VBUSM_DMA + 0x100 + 0x4 * 8,
            .val = ATYPE_3 | ORDERID_15,
        },
        {
            .reg = K3_DSS_MAIN_0_DSS_INST0_VBUSM_DMA + 0x100 + 0x4 * 9,
            .val = ATYPE_3 | ORDERID_15,
        },
        {
            .reg = K3_DSS_MAIN_0_DSS_INST0_VBUSM_FBDC + 0x100 + 0x4 * 0,
            .val = ATYPE_3 | ORDERID_15,
        },
        {
            .reg = K3_DSS_MAIN_0_DSS_INST0_VBUSM_FBDC + 0x100 + 0x4 * 1,
            .val = ATYPE_3 | ORDERID_15,
        },
        {
            .reg = K3_DSS_MAIN_0_DSS_INST0_VBUSM_FBDC + 0x100 + 0x4 * 2,
            .val = ATYPE_3 | ORDERID_15,
        },
    ...

    The team will test with the other QoS settings as well, since they are missing from SDK 9.1.
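
    To see what the "other QoS settings" amount to, the per-channel words that the QoS patch posted in this thread writes can be worked out by hand. The following is a minimal sketch, assuming the field positions used by that patch's writel() calls (ATYPE at bit 28, priority at bit 12, order ID at bit 4) and the parameter values from its j721s2_qos_params.h. The names qos_word(), struct qos_master, patch_masters, and print_patch_qos_words() are hypothetical and exist only for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Field positions as used by the writel() calls in the QoS patch
 * posted in this thread (an assumption; not checked against the TRM). */
#define QOS_ATYPE_SHIFT    28u
#define QOS_PRIORITY_SHIFT 12u
#define QOS_ORDER_ID_SHIFT 4u

/* Hypothetical helper: build one CBASS map register word. */
static uint32_t qos_word(uint32_t atype, uint32_t priority, uint32_t order_id)
{
    return (atype << QOS_ATYPE_SHIFT) |
           (priority << QOS_PRIORITY_SHIFT) |
           (order_id << QOS_ORDER_ID_SHIFT);
}

/* Hypothetical table type; values copied from the patch's parameter header. */
struct qos_master {
    const char *name;
    uint32_t atype, priority, order_id;
};

static const struct qos_master patch_masters[] = {
    { "DSS0 DMA",    3u, 1u, 15u },
    { "DSS0 FBDC",   0u, 1u, 10u },
    { "VPAC0 LDC0",  0u, 3u,  1u },
    { "GPU0 M0 RD",  0u, 7u,  7u },
    { "ENCODER0 RD", 0u, 6u,  6u },
};

/* Print the word each listed master would be programmed with. */
static void print_patch_qos_words(void)
{
    size_t i;

    for (i = 0; i < sizeof(patch_masters) / sizeof(patch_masters[0]); ++i) {
        const struct qos_master *m = &patch_masters[i];

        printf("%-12s 0x%08x\n", m->name,
               (unsigned)qos_word(m->atype, m->priority, m->order_id));
    }
}
```

    Calling print_patch_qos_words() from a main() prints one line per master. Under this layout, the DSS0 DMA word differs from a word that carries only ATYPE 3 and order ID 15 in the priority field alone.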

    Thanks,

    Erick

  • Hi Brijesh,

    Can you please check this FAQ and depending on the bootflow you are using, can you please try applying these changes? 

    We are in the process of doing it.

    To add to Erick's comment: for the underflow, is there anything else we could check, or any other debugging information that could be enabled, to help debug the issue?

    Thanks,

  • Hi David,

    Right now, the suspected cause is QoS tuning: the DSS might not be getting enough priority given the other pipelines that are running.

    In parallel, another test was suggested: running only the DSS graph (i.e. without the other components such as VPAC, C7X + MMA (TIDL), C7X operations, and Linux interface drivers such as CAN, V4L, and probably a few others such as I2C).

    Could you let me know the result of this experiment? Did you still see the issue with this DSS-only graph?

    If yes, could you share the source code for the same so that I could reproduce it at my end?

    Regards,

    Nikhil

  • Hi,

    We are still working on the priority patch. We currently apply the SPL patch in U-Boot, but once we are in Linux, we don't see the priority bit set.

    Are you aware of a module that could override that once Linux is booted?

    We also ran a test with our video input lowered to 5 FPS, to put less pressure on the data transfer, and it seems to work. We no longer saw the display issue.

  • David,

    Are you aware of a module that could override that once linux is booted? 

    There should not be a module that overrides this once Linux boots, because the drivers get initialized and it is bad practice to set the bits once the modules are up and running. Perhaps the function that sets the QoS bits is not setting them in U-Boot? You could stop U-Boot before it starts Linux and check the register values by connecting through CCS/debugger.
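
    As a rough aid for that check, the addresses of the DSS QoS map registers can be listed ahead of time. This is a sketch only: the base addresses, the 0x100 offset, and the channel count are copied from the j721s2_qos_params.h file in the QoS patch posted in this thread, while dss_qos_map_addr() and print_dss_qos_map_addrs() are hypothetical helper names.

```c
#include <stdint.h>
#include <stdio.h>

/* Values copied from the QoS patch in this thread
 * (QOS_DSS0_DMA, QOS_DSS0_FBDC, QOS_DSS0_*_NUM_I_CH). */
#define QOS_DSS0_DMA_BASE   0x45dc2000u
#define QOS_DSS0_FBDC_BASE  0x45dc2400u
#define QOS_DSS0_NUM_I_CH   10u

/* Hypothetical helper mirroring the patch's QOS_DSS0_*_CBASS_MAP(i) macros. */
static uint32_t dss_qos_map_addr(uint32_t base, uint32_t channel)
{
    return base + 0x100u + channel * 4u;
}

/* Print the per-channel addresses to inspect from the debugger. */
static void print_dss_qos_map_addrs(void)
{
    uint32_t ch;

    for (ch = 0; ch < QOS_DSS0_NUM_I_CH; ++ch) {
        printf("ch %2u  DMA 0x%08x  FBDC 0x%08x\n", (unsigned)ch,
               (unsigned)dss_qos_map_addr(QOS_DSS0_DMA_BASE, ch),
               (unsigned)dss_qos_map_addr(QOS_DSS0_FBDC_BASE, ch));
    }
}
```

    Reading these locations once at the U-Boot prompt and once after Linux has booted would show whether anything changes the values in between.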

    Regards,

    Erick

  • David, team,

    It looks like the QOS_DSS0_DMA_PRIORITY bit is not settable, so the patch you currently have should be applying correctly, even if the DMA_PRIORITY_BIT does not get set. Let's see whether the underflow behavior changes at all once all of the patch settings are applied.
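
    One way to confirm that the rest of the word was applied is to decode a read-back value and compare only the fields that are expected to stick. The sketch below assumes the ATYPE mask from the patch (0x30000000) and, as an inference from the largest values the patch uses, a 3-bit priority field at bit 12 and a 4-bit order ID field at bit 4. The names struct qos_fields, qos_decode(), and qos_matches_ignoring_priority() are hypothetical.

```c
#include <stdint.h>

/* Hypothetical decoded view of one CBASS map register word. */
struct qos_fields {
    uint32_t atype;
    uint32_t priority;
    uint32_t order_id;
};

/* ATYPE mask is QOS_ATYPE_MASK from the patch; the priority and
 * order ID widths are assumptions (values up to 7 and 15 in the patch). */
static struct qos_fields qos_decode(uint32_t word)
{
    struct qos_fields f;

    f.atype    = (word & 0x30000000u) >> 28;
    f.priority = (word >> 12) & 0x7u;
    f.order_id = (word >> 4) & 0xFu;
    return f;
}

/* Returns 1 when ATYPE and order ID match, ignoring the priority field. */
static int qos_matches_ignoring_priority(uint32_t readback,
                                         uint32_t atype, uint32_t order_id)
{
    struct qos_fields f = qos_decode(readback);

    return f.atype == atype && f.order_id == order_id;
}
```

    For example, a DSS0 DMA read-back of 0x300000F0 decodes to ATYPE 3, priority 0, and order ID 15, which would match the patch values apart from the priority field.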

    Regards,

    Erick

    1. I applied the patch settings using this file here. This patch has the same changes as the one provided by Brijesh (/cfs-file/__key/communityserver-discussions-components-files/791/5305.0001_2D00_Added_2D00_QoS_2D00_parameters_2D00_settings.patch), but applicable directly to SDK 9.1:
        diff --git a/arch/arm/mach-k3/include/mach/hardware.h b/arch/arm/mach-k3/include/mach/hardware.h
        index 51389f36ea..6134caf304 100644
        --- a/arch/arm/mach-k3/include/mach/hardware.h
        +++ b/arch/arm/mach-k3/include/mach/hardware.h
        @@ -19,6 +19,7 @@
         #ifdef CONFIG_SOC_K3_J721S2
         #include "j721s2_hardware.h"
         #include "j721s2_qos.h"
        +#include "j721s2_qos_params.h"
         #endif
         
         #ifdef CONFIG_SOC_K3_AM642
        diff --git a/arch/arm/mach-k3/include/mach/j721s2_qos_params.h b/arch/arm/mach-k3/include/mach/j721s2_qos_params.h
        new file mode 100644
        index 0000000000..a79a9a9878
        --- /dev/null
        +++ b/arch/arm/mach-k3/include/mach/j721s2_qos_params.h
        @@ -0,0 +1,221 @@
        +/* SPDX-License-Identifier: GPL-2.0+ */
        +/*
        + * K3: J721E QoS params definitions
        + *
        + * (C) Copyright (C) 2021-2022 Texas Instruments Incorporated - http://www.ti.com/
        + */
        +#ifndef __ASM_ARCH_J721E_QOS_PARAMS_H
        +#define __ASM_ARCH_J721E_QOS_PARAMS_H
        +
        +#define QOS_C66SS0_MDMA_ATYPE                (0U)
        +#define QOS_C66SS1_MDMA_ATYPE                (0U)
        +#define QOS_VPAC0_DATA0_ATYPE                (0U)
        +#define QOS_VPAC0_DATA1_ATYPE                (0U)
        +#define QOS_VPAC0_LDC0_ATYPE                 (0U)
        +#define QOS_DMPAC0_DATA_ATYPE                (0U)
        +#define QOS_DSS0_DMA_ATYPE                   (3U)
        +#define QOS_DSS0_FBDC_ATYPE                  (0U)
        +#define QOS_GPU0_M0_RD_ATYPE                 (0U)
        +#define QOS_GPU0_M0_WR_ATYPE                 (0U)
        +#define QOS_GPU0_M1_RD_ATYPE                 (0U)
        +#define QOS_GPU0_M1_WR_ATYPE                 (0U)
        +#define QOS_ENCODER0_RD_ATYPE                (0U)
        +#define QOS_ENCODER0_WR_ATYPE                (0U)
        +#define QOS_DECODER0_RD_ATYPE                (0U)
        +#define QOS_DECODER0_WR_ATYPE                (0U)
        +#define QOS_R5FSS0_CORE0_MEM_RD_ATYPE        (0U)
        +#define QOS_R5FSS0_CORE0_MEM_WR_ATYPE        (0U)
        +#define QOS_R5FSS0_CORE1_MEM_RD_ATYPE        (0U)
        +#define QOS_R5FSS0_CORE1_MEM_WR_ATYPE        (0U)
        +
        +#define QOS_VPAC0_LDC0_ORDER_ID              (1U)
        +#define QOS_C66SS0_MDMA_ORDER_ID             (5U)
        +#define QOS_C66SS1_MDMA_ORDER_ID             (5U)
        +#define QOS_ENCODER0_RD_ORDER_ID             (6U)
        +#define QOS_ENCODER0_WR_ORDER_ID             (6U)
        +#define QOS_DECODER0_RD_ORDER_ID             (6U)
        +#define QOS_DECODER0_WR_ORDER_ID             (6U)
        +#define QOS_GPU0_M0_RD_ORDER_ID              (7U)
        +#define QOS_GPU0_M0_WR_ORDER_ID              (7U)
        +#define QOS_GPU0_M1_RD_ORDER_ID              (7U)
        +#define QOS_GPU0_M1_WR_ORDER_ID              (7U)
        +#define QOS_DSS0_DMA_ORDER_ID                (15U)
        +#define QOS_DSS0_FBDC_ORDER_ID               (10U)
        +#define QOS_R5FSS0_CORE0_MEM_RD_ORDER_ID     (11U)
        +#define QOS_R5FSS0_CORE0_MEM_WR_ORDER_ID     (11U)
        +#define QOS_R5FSS0_CORE1_MEM_RD_ORDER_ID     (11U)
        +#define QOS_R5FSS0_CORE1_MEM_WR_ORDER_ID     (11U)
        +
        +#define QOS_DSS0_DMA_PRIORITY                (1U)
        +#define QOS_DSS0_FBDC_PRIORITY               (1U)
        +#define QOS_VPAC0_LDC0_PRIORITY              (3U)
        +#define QOS_C66SS0_MDMA_PRIORITY             (5U)
        +#define QOS_C66SS1_MDMA_PRIORITY             (5U)
        +#define QOS_ENCODER0_RD_PRIORITY             (6U)
        +#define QOS_ENCODER0_WR_PRIORITY             (6U)
        +#define QOS_DECODER0_RD_PRIORITY             (6U)
        +#define QOS_DECODER0_WR_PRIORITY             (6U)
        +#define QOS_GPU0_M0_RD_MMU_PRIORITY          (3U)
        +#define QOS_GPU0_M0_RD_PRIORITY              (7U)
        +#define QOS_GPU0_M0_WR_PRIORITY              (7U)
        +#define QOS_GPU0_M1_RD_PRIORITY              (7U)
        +#define QOS_GPU0_M1_RD_MMU_PRIORITY          (3U)
        +#define QOS_GPU0_M1_WR_PRIORITY              (7U)
        +#define QOS_R5FSS0_CORE0_MEM_RD_PRIORITY     (2U)
        +#define QOS_R5FSS0_CORE0_MEM_WR_PRIORITY     (2U)
        +#define QOS_R5FSS0_CORE1_MEM_RD_PRIORITY     (2U)
        +#define QOS_R5FSS0_CORE1_MEM_WR_PRIORITY     (2U)
        +
        +#define QOS_ATYPE_MASK         0x30000000
        +#define QOS_VIRTID_MASK            0x0fff0000
        +#define QOS_PVU_CTX(virtid)        ((0x1 << 28) | (virtid << 16))
        +#define QOS_SMMU_CTX(virtid)       ((0x2 << 28) | (virtid << 16))
        +
        +/* CBASS */
        +
        +#define QOS_C66SS0_MDMA                             0x45d81000
        +#define QOS_C66SS0_MDMA_NUM_J_CH                    3
        +#define QOS_C66SS0_MDMA_NUM_I_CH                    1
        +#define QOS_C66SS0_MDMA_CBASS_GRP_MAP1(j)           (QOS_C66SS0_MDMA + 0x0 + (j) * 8)
        +#define QOS_C66SS0_MDMA_CBASS_GRP_MAP2(j)           (QOS_C66SS0_MDMA + 0x4 + (j) * 8)
        +#define QOS_C66SS0_MDMA_CBASS_MAP(i)                (QOS_C66SS0_MDMA + 0x100 + (i) * 4)
        +
        +#define QOS_C66SS1_MDMA                             0x45d81400
        +#define QOS_C66SS1_MDMA_NUM_J_CH                    3
        +#define QOS_C66SS1_MDMA_NUM_I_CH                    1
        +#define QOS_C66SS1_MDMA_CBASS_GRP_MAP1(j)           (QOS_C66SS1_MDMA + 0x0 + (j) * 8)
        +#define QOS_C66SS1_MDMA_CBASS_GRP_MAP2(j)           (QOS_C66SS1_MDMA + 0x4 + (j) * 8)
        +#define QOS_C66SS1_MDMA_CBASS_MAP(i)                (QOS_C66SS1_MDMA + 0x100 + (i) * 4)
        +
        +#define QOS_R5FSS0_CORE0_MEM_RD                     0x45d84000
        +#define QOS_R5FSS0_CORE0_MEM_RD_NUM_J_CH            3
        +#define QOS_R5FSS0_CORE0_MEM_RD_NUM_I_CH            1
        +#define QOS_R5FSS0_CORE0_MEM_RD_CBASS_GRP_MAP1(j)   (QOS_R5FSS0_CORE0_MEM_RD + 0x0 + (j) * 8)
        +#define QOS_R5FSS0_CORE0_MEM_RD_CBASS_GRP_MAP2(j)   (QOS_R5FSS0_CORE0_MEM_RD + 0x4 + (j) * 8)
        +#define QOS_R5FSS0_CORE0_MEM_RD_CBASS_MAP(i)        (QOS_R5FSS0_CORE0_MEM_RD + 0x100 + (i) * 4)
        +
        +#define QOS_R5FSS0_CORE1_MEM_RD                     0x45d84400
        +#define QOS_R5FSS0_CORE1_MEM_RD_NUM_J_CH            3
        +#define QOS_R5FSS0_CORE1_MEM_RD_NUM_I_CH            1
        +#define QOS_R5FSS0_CORE1_MEM_RD_CBASS_GRP_MAP1(j)   (QOS_R5FSS0_CORE1_MEM_RD + 0x0 + (j) * 8)
        +#define QOS_R5FSS0_CORE1_MEM_RD_CBASS_GRP_MAP2(j)   (QOS_R5FSS0_CORE1_MEM_RD + 0x4 + (j) * 8)
        +#define QOS_R5FSS0_CORE1_MEM_RD_CBASS_MAP(i)        (QOS_R5FSS0_CORE1_MEM_RD + 0x100 + (i) * 4)
        +
        +#define QOS_R5FSS0_CORE0_MEM_WR                     0x45d84800
        +#define QOS_R5FSS0_CORE0_MEM_WR_NUM_J_CH            3
        +#define QOS_R5FSS0_CORE0_MEM_WR_NUM_I_CH            1
        +#define QOS_R5FSS0_CORE0_MEM_WR_CBASS_GRP_MAP1(j)   (QOS_R5FSS0_CORE0_MEM_WR + 0x0 + (j) * 8)
        +#define QOS_R5FSS0_CORE0_MEM_WR_CBASS_GRP_MAP2(j)   (QOS_R5FSS0_CORE0_MEM_WR + 0x4 + (j) * 8)
        +#define QOS_R5FSS0_CORE0_MEM_WR_CBASS_MAP(i)        (QOS_R5FSS0_CORE0_MEM_WR + 0x100 + (i) * 4)
        +
        +#define QOS_R5FSS0_CORE1_MEM_WR                     0x45d84C00
        +#define QOS_R5FSS0_CORE1_MEM_WR_NUM_J_CH            3
        +#define QOS_R5FSS0_CORE1_MEM_WR_NUM_I_CH            1
        +#define QOS_R5FSS0_CORE1_MEM_WR_CBASS_GRP_MAP1(j)   (QOS_R5FSS0_CORE1_MEM_WR + 0x0 + (j) * 8)
        +#define QOS_R5FSS0_CORE1_MEM_WR_CBASS_GRP_MAP2(j)   (QOS_R5FSS0_CORE1_MEM_WR + 0x4 + (j) * 8)
        +#define QOS_R5FSS0_CORE1_MEM_WR_CBASS_MAP(i)        (QOS_R5FSS0_CORE1_MEM_WR + 0x100 + (i) * 4)
        +
        +#define QOS_ENCODER0_WR                             0x45dc1000
        +#define QOS_ENCODER0_WR_NUM_J_CH                    2
        +#define QOS_ENCODER0_WR_NUM_I_CH                    5
        +#define QOS_ENCODER0_WR_CBASS_GRP_MAP1(j)           (QOS_ENCODER0_WR + 0x0 + (j) * 8)
        +#define QOS_ENCODER0_WR_CBASS_GRP_MAP2(j)           (QOS_ENCODER0_WR + 0x4 + (j) * 8)
        +#define QOS_ENCODER0_WR_CBASS_MAP(i)                (QOS_ENCODER0_WR + 0x100 + (i) * 4)
        +
        +#define QOS_DECODER0_RD                             0x45dc0400
        +#define QOS_DECODER0_RD_NUM_J_CH                    2
        +#define QOS_DECODER0_RD_NUM_I_CH                    1
        +#define QOS_DECODER0_RD_CBASS_GRP_MAP1(j)           (QOS_DECODER0_RD + 0x0 + (j) * 8)
        +#define QOS_DECODER0_RD_CBASS_GRP_MAP2(j)           (QOS_DECODER0_RD + 0x4 + (j) * 8)
        +#define QOS_DECODER0_RD_CBASS_MAP(i)                (QOS_DECODER0_RD + 0x100 + (i) * 4)
        +
        +#define QOS_DECODER0_WR                             0x45dc0800
        +#define QOS_DECODER0_WR_NUM_J_CH                    2
        +#define QOS_DECODER0_WR_NUM_I_CH                    1
        +#define QOS_DECODER0_WR_CBASS_GRP_MAP1(j)           (QOS_DECODER0_WR + 0x0 + (j) * 8)
        +#define QOS_DECODER0_WR_CBASS_GRP_MAP2(j)           (QOS_DECODER0_WR + 0x4 + (j) * 8)
        +#define QOS_DECODER0_WR_CBASS_MAP(i)                (QOS_DECODER0_WR + 0x100 + (i) * 4)
        +
        +#define QOS_VPAC0_DATA0                             0x45dc1500
        +#define QOS_VPAC0_DATA0_NUM_I_CH                    32
        +#define QOS_VPAC0_DATA0_CBASS_MAP(i)                (QOS_VPAC0_DATA0 + (i) * 4)
        +
        +#define QOS_DMPAC0_DATA                             0x45dc0100
        +#define QOS_DMPAC0_DATA_NUM_I_CH                    32
        +#define QOS_DMPAC0_DATA_CBASS_MAP(i)                (QOS_DMPAC0_DATA + (i) * 4)
        +
        +#define QOS_ENCODER0_RD                             0x45dc0c00
        +#define QOS_ENCODER0_RD_NUM_J_CH                    2
        +#define QOS_ENCODER0_RD_NUM_I_CH                    5
        +#define QOS_ENCODER0_RD_CBASS_GRP_MAP1(j)           (QOS_ENCODER0_RD + 0x0 + (j) * 8)
        +#define QOS_ENCODER0_RD_CBASS_GRP_MAP2(j)           (QOS_ENCODER0_RD + 0x4 + (j) * 8)
        +#define QOS_ENCODER0_RD_CBASS_MAP(i)                (QOS_ENCODER0_RD + 0x100 + (i) * 4)
        +
        +#define QOS_VPAC0_DATA1                             0x45dc1900
        +#define QOS_VPAC0_DATA1_NUM_I_CH                    64
        +#define QOS_VPAC0_DATA1_CBASS_MAP(i)                (QOS_VPAC0_DATA1 + (i) * 4)
        +
        +#define QOS_VPAC0_LDC0                              0x45dc1c00
        +#define QOS_VPAC0_LDC0_NUM_J_CH                     2
        +#define QOS_VPAC0_LDC0_NUM_I_CH                     3
        +#define QOS_VPAC0_LDC0_CBASS_GRP_MAP1(j)            (QOS_VPAC0_LDC0 + 0x0 + (j) * 8)
        +#define QOS_VPAC0_LDC0_CBASS_GRP_MAP2(j)            (QOS_VPAC0_LDC0 + 0x4 + (j) * 8)
        +#define QOS_VPAC0_LDC0_CBASS_MAP(i)                 (QOS_VPAC0_LDC0 + 0x100 + (i) * 4)
        +
        +#define QOS_DSS0_DMA                                0x45dc2000
        +#define QOS_DSS0_DMA_NUM_J_CH                       2
        +#define QOS_DSS0_DMA_NUM_I_CH                       10
        +#define QOS_DSS0_DMA_CBASS_GRP_MAP1(j)              (QOS_DSS0_DMA + 0x0 + (j) * 8)
        +#define QOS_DSS0_DMA_CBASS_GRP_MAP2(j)              (QOS_DSS0_DMA + 0x4 + (j) * 8)
        +#define QOS_DSS0_DMA_CBASS_MAP(i)                   (QOS_DSS0_DMA + 0x100 + (i) * 4)
        +
        +#define QOS_DSS0_FBDC                               0x45dc2400
        +#define QOS_DSS0_FBDC_NUM_J_CH                      2
        +#define QOS_DSS0_FBDC_NUM_I_CH                      10
        +#define QOS_DSS0_FBDC_CBASS_GRP_MAP1(j)             (QOS_DSS0_FBDC + 0x0 + (j) * 8)
        +#define QOS_DSS0_FBDC_CBASS_GRP_MAP2(j)             (QOS_DSS0_FBDC + 0x4 + (j) * 8)
        +#define QOS_DSS0_FBDC_CBASS_MAP(i)                  (QOS_DSS0_FBDC + 0x100 + (i) * 4)
        +
        +#define QOS_GPU0_M0_RD                              0x45dc5000
        +#define QOS_GPU0_M0_RD_NUM_J_CH                     2
        +#define QOS_GPU0_M0_RD_NUM_I_CH                     48
        +#define QOS_GPU0_M0_RD_CBASS_GRP_MAP1(j)            (QOS_GPU0_M0_RD + 0x0 + (j) * 8)
        +#define QOS_GPU0_M0_RD_CBASS_GRP_MAP2(j)            (QOS_GPU0_M0_RD + 0x4 + (j) * 8)
        +#define QOS_GPU0_M0_RD_CBASS_MAP(i)                 (QOS_GPU0_M0_RD + 0x100 + (i) * 4)
        +
        +#define QOS_GPU0_M0_WR                              0x45dc5800
        +#define QOS_GPU0_M0_WR_NUM_J_CH                     2
        +#define QOS_GPU0_M0_WR_NUM_I_CH                     48
        +#define QOS_GPU0_M0_WR_CBASS_GRP_MAP1(j)            (QOS_GPU0_M0_WR + 0x0 + (j) * 8)
        +#define QOS_GPU0_M0_WR_CBASS_GRP_MAP2(j)            (QOS_GPU0_M0_WR + 0x4 + (j) * 8)
        +#define QOS_GPU0_M0_WR_CBASS_MAP(i)                 (QOS_GPU0_M0_WR + 0x100 + (i) * 4)
        +
        +#define QOS_GPU0_M1_RD                              0x45dc6000
        +#define QOS_GPU0_M1_RD_NUM_J_CH                     2
        +#define QOS_GPU0_M1_RD_NUM_I_CH                     48
        +#define QOS_GPU0_M1_RD_CBASS_GRP_MAP1(j)            (QOS_GPU0_M1_RD + 0x0 + (j) * 8)
        +#define QOS_GPU0_M1_RD_CBASS_GRP_MAP2(j)            (QOS_GPU0_M1_RD + 0x4 + (j) * 8)
        +#define QOS_GPU0_M1_RD_CBASS_MAP(i)                 (QOS_GPU0_M1_RD + 0x100 + (i) * 4)
        +
        +#define QOS_GPU0_M1_WR                              0x45dc6800
        +#define QOS_GPU0_M1_WR_NUM_J_CH                     2
        +#define QOS_GPU0_M1_WR_NUM_I_CH                     48
        +#define QOS_GPU0_M1_WR_CBASS_GRP_MAP1(j)            (QOS_GPU0_M1_WR + 0x0 + (j) * 8)
        +#define QOS_GPU0_M1_WR_CBASS_GRP_MAP2(j)            (QOS_GPU0_M1_WR + 0x4 + (j) * 8)
        +#define QOS_GPU0_M1_WR_CBASS_MAP(i)                 (QOS_GPU0_M1_WR + 0x100 + (i) * 4)
        +
        +#define QOS_MMC0_RD_CBASS_MAP(i)                    (0x45d9a100 + (i) * 4)
        +#define QOS_MMC0_WR_CBASS_MAP(i)                    (0x45d9a500 + (i) * 4)
        +#define QOS_MMC1_RD_CBASS_MAP(i)                    (0x45d82100 + (i) * 4)
        +#define QOS_MMC1_WR_CBASS_MAP(i)                    (0x45d82500 + (i) * 4)
        +
        +#define QOS_D5520_RD_CBASS_MAP(i)                   (0x45dc0500 + (i) * 4)
        +#define QOS_D5520_WR_CBASS_MAP(i)                   (0x45dc0900 + (i) * 4)
        +
        +/* NAVSS North Bridge (NB) */
        +#define NAVSS0_NBSS_NB0_CFG_MMRS                    0x3702000
        +#define NAVSS0_NBSS_NB1_CFG_MMRS                    0x3703000
        +#define NAVSS0_NBSS_NB0_CFG_NB_THREADMAP            (NAVSS0_NBSS_NB0_CFG_MMRS + 0x10)
        +#define NAVSS0_NBSS_NB1_CFG_NB_THREADMAP            (NAVSS0_NBSS_NB1_CFG_MMRS + 0x10)
        +
        +#endif
        diff --git a/arch/arm/mach-k3/j721s2_init.c b/arch/arm/mach-k3/j721s2_init.c
        index a6f789f035..3c7ecef786 100644
        --- a/arch/arm/mach-k3/j721s2_init.c
        +++ b/arch/arm/mach-k3/j721s2_init.c
        @@ -146,6 +146,280 @@ static void setup_qos(void)
         		writel(j721s2_qos_data[i].val, (uintptr_t)j721s2_qos_data[i].reg);
         }
         
        +void setup_navss_nb(void)
        +{
        +        /* Map orderid 8-15 to VBUSM.C thread 2 (real-time traffic) */
        +        writel(2, NAVSS0_NBSS_NB0_CFG_NB_THREADMAP);
        +        writel(4, NAVSS0_NBSS_NB1_CFG_NB_THREADMAP);
        +}
        +
        +void setup_vpac_qos(void)
        +{
        +       unsigned int channel, group;
        +
        +       /* vpac data master 0  */
        +       for (channel = 0; channel < QOS_VPAC0_DATA0_NUM_I_CH; ++channel) {
        +
        +               writel((QOS_VPAC0_DATA0_ATYPE << 28), (uintptr_t)QOS_VPAC0_DATA0_CBASS_MAP(channel));
        +       }
        +
        +       /* vpac data master 1  */
        +       for (channel = 0; channel < QOS_VPAC0_DATA1_NUM_I_CH; ++channel) {
        +
        +               writel((QOS_VPAC0_DATA1_ATYPE << 28), (uintptr_t)QOS_VPAC0_DATA1_CBASS_MAP(channel));
        +       }
        +
        +       /* vpac ldc0  */
        +       for (group = 0; group < QOS_VPAC0_LDC0_NUM_J_CH; ++group) {
        +               writel(0x76543210, (uintptr_t)QOS_VPAC0_LDC0_CBASS_GRP_MAP1(group));
        +               writel(0xfedcba98, (uintptr_t)QOS_VPAC0_LDC0_CBASS_GRP_MAP2(group));
        +       }
        +
        +       for (channel = 0; channel < QOS_VPAC0_LDC0_NUM_I_CH; ++channel) {
        +
        +               writel((QOS_VPAC0_LDC0_ATYPE << 28) | (QOS_VPAC0_LDC0_PRIORITY << 12) | (QOS_VPAC0_LDC0_ORDER_ID << 4), (uintptr_t)QOS_VPAC0_LDC0_CBASS_MAP(channel));
        +       }
        +
        +}
        +
        +void setup_dmpac_qos(void)
        +{
        +       unsigned int channel;
        +
        +       /* dmpac data  */
        +       for (channel = 0; channel < QOS_DMPAC0_DATA_NUM_I_CH; ++channel) {
        +
        +               writel((QOS_DMPAC0_DATA_ATYPE << 28), (uintptr_t)QOS_DMPAC0_DATA_CBASS_MAP(channel));
        +       }
        +}
        +
        +void setup_dss_qos(void)
        +{
        +       unsigned int channel, group;
        +
        +       /* two master ports: dma and fbdc */
        +       /* two groups: SRAM and DDR */
        +       /* 10 channels: (pipe << 1) | is_second_buffer */
        +
        +       /* master port 1 (dma) */
        +       for (group = 0; group < QOS_DSS0_DMA_NUM_J_CH; ++group) {
        +               writel(0x76543210, (uintptr_t)QOS_DSS0_DMA_CBASS_GRP_MAP1(group));
        +               writel(0xfedcba98, (uintptr_t)QOS_DSS0_DMA_CBASS_GRP_MAP2(group));
        +       }
        +
        +       for (channel = 0; channel < QOS_DSS0_DMA_NUM_I_CH; ++channel) {
        +
        +               writel((QOS_DSS0_DMA_ATYPE << 28) | (QOS_DSS0_DMA_PRIORITY << 12) | (QOS_DSS0_DMA_ORDER_ID << 4), (uintptr_t)QOS_DSS0_DMA_CBASS_MAP(channel));
        +       }
        +
        +       /* master port 2 (fbdc) */
        +       for (group = 0; group < QOS_DSS0_FBDC_NUM_J_CH; ++group) {
        +               writel(0x76543210, (uintptr_t)QOS_DSS0_FBDC_CBASS_GRP_MAP1(group));
        +               writel(0xfedcba98, (uintptr_t)QOS_DSS0_FBDC_CBASS_GRP_MAP2(group));
        +       }
        +
        +       for (channel = 0; channel < QOS_DSS0_FBDC_NUM_I_CH; ++channel) {
        +
        +               writel((QOS_DSS0_FBDC_ATYPE << 28) | (QOS_DSS0_FBDC_PRIORITY << 12) | (QOS_DSS0_FBDC_ORDER_ID << 4), (uintptr_t)QOS_DSS0_FBDC_CBASS_MAP(channel));
        +       }
        +}
        +
        +void setup_gpu_qos(void)
        +{
        +       unsigned int channel, group;
        +
        +       /* gpu m0 rd */
        +       for (group = 0; group < QOS_GPU0_M0_RD_NUM_J_CH; ++group) {
        +               writel(0x76543210, (uintptr_t)QOS_GPU0_M0_RD_CBASS_GRP_MAP1(group));
        +               writel(0xfedcba98, (uintptr_t)QOS_GPU0_M0_RD_CBASS_GRP_MAP2(group));
        +       }
        +
        +       for (channel = 0; channel < QOS_GPU0_M0_RD_NUM_I_CH; ++channel) {
        +
        +               if(channel == 0)
        +               {
        +                       writel((QOS_GPU0_M0_RD_ATYPE << 28) | (QOS_GPU0_M0_RD_MMU_PRIORITY << 12) | (QOS_GPU0_M0_RD_ORDER_ID << 4), (uintptr_t)QOS_GPU0_M0_RD_CBASS_MAP(channel));
        +               }
        +               else
        +               {
        +                       writel((QOS_GPU0_M0_RD_ATYPE << 28) | (QOS_GPU0_M0_RD_PRIORITY << 12) | (QOS_GPU0_M0_RD_ORDER_ID << 4), (uintptr_t)QOS_GPU0_M0_RD_CBASS_MAP(channel));
        +               }
        +       }
        +
        +       /* gpu m0 wr */
        +       for (group = 0; group < QOS_GPU0_M0_WR_NUM_J_CH; ++group) {
        +               writel(0x76543210, (uintptr_t)QOS_GPU0_M0_WR_CBASS_GRP_MAP1(group));
        +               writel(0xfedcba98, (uintptr_t)QOS_GPU0_M0_WR_CBASS_GRP_MAP2(group));
        +       }
        +
        +       for (channel = 0; channel < QOS_GPU0_M0_WR_NUM_I_CH; ++channel) {
        +
        +               writel((QOS_GPU0_M0_WR_ATYPE << 28) | (QOS_GPU0_M0_WR_PRIORITY << 12) | (QOS_GPU0_M0_WR_ORDER_ID << 4), (uintptr_t)QOS_GPU0_M0_WR_CBASS_MAP(channel));
        +       }
        +
        +       /* gpu m1 rd */
        +       for (group = 0; group < QOS_GPU0_M1_RD_NUM_J_CH; ++group) {
        +               writel(0x76543210, (uintptr_t)QOS_GPU0_M1_RD_CBASS_GRP_MAP1(group));
        +               writel(0xfedcba98, (uintptr_t)QOS_GPU0_M1_RD_CBASS_GRP_MAP2(group));
        +       }
        +
        +       for (channel = 0; channel < QOS_GPU0_M1_RD_NUM_I_CH; ++channel) {
        +
        +               if(channel == 0)
        +               {
        +                       writel((QOS_GPU0_M1_RD_ATYPE << 28) | (QOS_GPU0_M1_RD_MMU_PRIORITY << 12) | (QOS_GPU0_M1_RD_ORDER_ID << 4), (uintptr_t)QOS_GPU0_M1_RD_CBASS_MAP(channel));
        +               }
        +               else
        +               {
        +                       writel((QOS_GPU0_M1_RD_ATYPE << 28) | (QOS_GPU0_M1_RD_PRIORITY << 12) | (QOS_GPU0_M1_RD_ORDER_ID << 4), (uintptr_t)QOS_GPU0_M1_RD_CBASS_MAP(channel));
        +               }
        +       }
        +
        +       /* gpu m1 wr */
        +       for (group = 0; group < QOS_GPU0_M1_WR_NUM_J_CH; ++group) {
        +               writel(0x76543210, (uintptr_t)QOS_GPU0_M1_WR_CBASS_GRP_MAP1(group));
        +               writel(0xfedcba98, (uintptr_t)QOS_GPU0_M1_WR_CBASS_GRP_MAP2(group));
        +       }
        +
        +       for (channel = 0; channel < QOS_GPU0_M1_WR_NUM_I_CH; ++channel) {
        +
        +               writel((QOS_GPU0_M1_WR_ATYPE << 28) | (QOS_GPU0_M1_WR_PRIORITY << 12) | (QOS_GPU0_M1_WR_ORDER_ID << 4), (uintptr_t)QOS_GPU0_M1_WR_CBASS_MAP(channel));
        +       }
        +}
        +
        +void setup_encoder_qos(void)
        +{
        +       unsigned int channel, group;
        +
        +       /* encoder rd */
        +       for (group = 0; group < QOS_ENCODER0_RD_NUM_J_CH; ++group) {
        +               writel(0x76543210, (uintptr_t)QOS_ENCODER0_RD_CBASS_GRP_MAP1(group));
        +               writel(0xfedcba98, (uintptr_t)QOS_ENCODER0_RD_CBASS_GRP_MAP2(group));
        +       }
        +
        +       for (channel = 0; channel < QOS_ENCODER0_RD_NUM_I_CH; ++channel) {
        +
        +               writel((QOS_ENCODER0_RD_ATYPE << 28) | (QOS_ENCODER0_RD_PRIORITY << 12) | (QOS_ENCODER0_RD_ORDER_ID << 4), (uintptr_t)QOS_ENCODER0_RD_CBASS_MAP(channel));
        +       }
        +
        +       /* encoder wr */
        +       for (group = 0; group < QOS_ENCODER0_WR_NUM_J_CH; ++group) {
        +               writel(0x76543210, (uintptr_t)QOS_ENCODER0_WR_CBASS_GRP_MAP1(group));
        +               writel(0xfedcba98, (uintptr_t)QOS_ENCODER0_WR_CBASS_GRP_MAP2(group));
        +       }
        +
        +       for (channel = 0; channel < QOS_ENCODER0_WR_NUM_I_CH; ++channel) {
        +
        +               writel((QOS_ENCODER0_WR_ATYPE << 28) | (QOS_ENCODER0_WR_PRIORITY << 12) | (QOS_ENCODER0_WR_ORDER_ID << 4), (uintptr_t)QOS_ENCODER0_WR_CBASS_MAP(channel));
        +       }
        +}
        +
        +void setup_decoder_qos(void)
        +{
        +       unsigned int channel, group;
        +
        +       /* decoder rd */
        +       for (group = 0; group < QOS_DECODER0_RD_NUM_J_CH; ++group) {
        +               writel(0x76543210, (uintptr_t)QOS_DECODER0_RD_CBASS_GRP_MAP1(group));
        +               writel(0xfedcba98, (uintptr_t)QOS_DECODER0_RD_CBASS_GRP_MAP2(group));
        +       }
        +
        +       for (channel = 0; channel < QOS_DECODER0_RD_NUM_I_CH; ++channel) {
        +
        +               writel((QOS_DECODER0_RD_ATYPE << 28) | (QOS_DECODER0_RD_PRIORITY << 12) | (QOS_DECODER0_RD_ORDER_ID << 4), (uintptr_t)QOS_DECODER0_RD_CBASS_MAP(channel));
        +       }
        +
        +       /* decoder wr */
        +       for (group = 0; group < QOS_DECODER0_WR_NUM_J_CH; ++group) {
        +               writel(0x76543210, (uintptr_t)QOS_DECODER0_WR_CBASS_GRP_MAP1(group));
        +               writel(0xfedcba98, (uintptr_t)QOS_DECODER0_WR_CBASS_GRP_MAP2(group));
        +       }
        +
        +       for (channel = 0; channel < QOS_DECODER0_WR_NUM_I_CH; ++channel) {
        +
        +               writel((QOS_DECODER0_WR_ATYPE << 28) | (QOS_DECODER0_WR_PRIORITY << 12) | (QOS_DECODER0_WR_ORDER_ID << 4), (uintptr_t)QOS_DECODER0_WR_CBASS_MAP(channel));
        +       }
        +}
        +
        +void setup_c66_qos(void)
        +{
        +       unsigned int channel, group;
        +
        +       /* c66_0 mdma */
        +       for (group = 0; group < QOS_C66SS0_MDMA_NUM_J_CH; ++group) {
        +               writel(0x76543210, (uintptr_t)QOS_C66SS0_MDMA_CBASS_GRP_MAP1(group));
        +               writel(0xfedcba98, (uintptr_t)QOS_C66SS0_MDMA_CBASS_GRP_MAP2(group));
        +       }
        +
        +       for (channel = 0; channel < QOS_C66SS0_MDMA_NUM_I_CH; ++channel) {
        +
        +               writel((QOS_C66SS0_MDMA_ATYPE << 28) | (QOS_C66SS0_MDMA_PRIORITY << 12) | (QOS_C66SS0_MDMA_ORDER_ID << 4), (uintptr_t)QOS_C66SS0_MDMA_CBASS_MAP(channel));
        +       }
        +
        +       /* c66_1 mdma */
        +       for (group = 0; group < QOS_C66SS1_MDMA_NUM_J_CH; ++group) {
        +               writel(0x76543210, (uintptr_t)QOS_C66SS1_MDMA_CBASS_GRP_MAP1(group));
        +               writel(0xfedcba98, (uintptr_t)QOS_C66SS1_MDMA_CBASS_GRP_MAP2(group));
        +       }
        +
        +       for (channel = 0; channel < QOS_C66SS1_MDMA_NUM_I_CH; ++channel) {
        +
        +               writel((QOS_C66SS1_MDMA_ATYPE << 28) | (QOS_C66SS1_MDMA_PRIORITY << 12) | (QOS_C66SS1_MDMA_ORDER_ID << 4), (uintptr_t)QOS_C66SS1_MDMA_CBASS_MAP(channel));
        +       }
        +}
        +
        +void setup_main_r5f_qos(void)
        +{
        +       unsigned int channel, group;
        +
        +       /* R5FSS0 core0 - read */
        +       for (group = 0; group < QOS_R5FSS0_CORE0_MEM_RD_NUM_J_CH; ++group) {
        +               writel(0x76543210, (uintptr_t)QOS_R5FSS0_CORE0_MEM_RD_CBASS_GRP_MAP1(group));
        +               writel(0xfedcba98, (uintptr_t)QOS_R5FSS0_CORE0_MEM_RD_CBASS_GRP_MAP2(group));
        +       }
        +
        +       for (channel = 0; channel < QOS_R5FSS0_CORE0_MEM_RD_NUM_I_CH; ++channel) {
        +
        +               writel((QOS_R5FSS0_CORE0_MEM_RD_ATYPE << 28) | (QOS_R5FSS0_CORE0_MEM_RD_PRIORITY << 12) | (QOS_R5FSS0_CORE0_MEM_RD_ORDER_ID << 4), (uintptr_t)QOS_R5FSS0_CORE0_MEM_RD_CBASS_MAP(channel));
        +       }
        +
        +       /* R5FSS0 core0 - write */
        +       for (group = 0; group < QOS_R5FSS0_CORE0_MEM_WR_NUM_J_CH; ++group) {
        +               writel(0x76543210, (uintptr_t)QOS_R5FSS0_CORE0_MEM_WR_CBASS_GRP_MAP1(group));
        +               writel(0xfedcba98, (uintptr_t)QOS_R5FSS0_CORE0_MEM_WR_CBASS_GRP_MAP2(group));
        +       }
        +
        +       for (channel = 0; channel < QOS_R5FSS0_CORE0_MEM_WR_NUM_I_CH; ++channel) {
        +
        +               writel((QOS_R5FSS0_CORE0_MEM_WR_ATYPE << 28) | (QOS_R5FSS0_CORE0_MEM_WR_PRIORITY << 12) | (QOS_R5FSS0_CORE0_MEM_RD_ORDER_ID << 4), (uintptr_t)QOS_R5FSS0_CORE0_MEM_WR_CBASS_MAP(channel));
        +       }
        +
        +       /* R5FSS0 core1 - read */
        +       for (group = 0; group < QOS_R5FSS0_CORE1_MEM_RD_NUM_J_CH; ++group) {
        +               writel(0x76543210, (uintptr_t)QOS_R5FSS0_CORE1_MEM_RD_CBASS_GRP_MAP1(group));
        +               writel(0xfedcba98, (uintptr_t)QOS_R5FSS0_CORE1_MEM_RD_CBASS_GRP_MAP2(group));
        +       }
        +
        +       for (channel = 0; channel < QOS_R5FSS0_CORE1_MEM_RD_NUM_I_CH; ++channel) {
        +
        +               writel((QOS_R5FSS0_CORE1_MEM_RD_ATYPE << 28) | (QOS_R5FSS0_CORE1_MEM_RD_PRIORITY << 12) | (QOS_R5FSS0_CORE0_MEM_RD_ORDER_ID << 4), (uintptr_t)QOS_R5FSS0_CORE1_MEM_RD_CBASS_MAP(channel));
        +       }
        +
        +       /* R5FSS0 core1 - write */
        +       for (group = 0; group < QOS_R5FSS0_CORE1_MEM_WR_NUM_J_CH; ++group) {
        +               writel(0x76543210, (uintptr_t)QOS_R5FSS0_CORE1_MEM_WR_CBASS_GRP_MAP1(group));
        +               writel(0xfedcba98, (uintptr_t)QOS_R5FSS0_CORE1_MEM_WR_CBASS_GRP_MAP2(group));
        +       }
        +
        +       for (channel = 0; channel < QOS_R5FSS0_CORE1_MEM_WR_NUM_I_CH; ++channel) {
        +
        +               writel((QOS_R5FSS0_CORE1_MEM_WR_ATYPE << 28) | (QOS_R5FSS0_CORE1_MEM_WR_PRIORITY << 12) | (QOS_R5FSS0_CORE1_MEM_RD_ORDER_ID << 4), (uintptr_t)QOS_R5FSS0_CORE1_MEM_WR_CBASS_MAP(channel));
        +       }
        +
        +}
        +
        +
        +
         void k3_spl_init(void)
         {
         	struct udevice *dev;
        @@ -243,6 +517,15 @@ void k3_mem_init(void)
         
         	setup_qos();
         
        +	setup_navss_nb();
        +	setup_c66_qos();
        +	setup_main_r5f_qos();
        +	setup_vpac_qos();
        +	setup_dmpac_qos();
        +	setup_dss_qos();
        +	setup_gpu_qos();
        +	setup_encoder_qos();
        +
         	spl_enable_dcache();
         }
         
        
    2. The patch was applied properly. I added a print in the code (not included in this patch) which showed, at U-Boot load time, that the changes were present.
    3. While testing with the QoS patch and the display statistics patch (suggested above), I could still see the bug, so this doesn't resolve our issue. The display statistics also showed display 'underflows'.
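
    For reference, every per-channel write in the patch above builds its register word the same way. The sketch below mirrors that packing. The shift positions (28, 12, 4) come from the patch itself; the helper name `qos_map_word`, the field masks and the example values are assumptions made for illustration and are not confirmed by the SDK or the TRM in this thread.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the packing used by the writel() calls in the
 * patch: (ATYPE << 28) | (PRIORITY << 12) | (ORDER_ID << 4).
 * The masks are assumed field widths, not values confirmed in this thread. */
static inline uint32_t qos_map_word(uint32_t atype, uint32_t priority,
                                    uint32_t order_id)
{
    return ((atype & 0x3u) << 28) |
           ((priority & 0x7u) << 12) |
           ((order_id & 0xFu) << 4);
}
```

    Under these assumptions, `qos_map_word(0, 7, 15)` evaluates to 0x000070F0. Printing the word that is actually written for the DSS channels at U-Boot time is one way to double-check which priority and order ID the display is really getting.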

  • PL,

    Thanks for checking this. Another important function to have is setup_navss_nb(), and it looks like you have it already (it was also already in SDK 9.1 when I checked).

    Looping in Brijesh to comment on next steps to address the underflows.

    Thanks,

    Erick

  • Tested with lower resolution (720p). Issue persists (visual and underflows)

  • Hi Brijesh, Erick and Nikhil,

    Any updates this morning regarding the next steps that we could try? Any other way to investigate the display kernel underflows?

    Thanks,

  • David,

    One suggestion the team came back with was to try using 64K page size instead of 4K page size. Is this something simple you could try?

    The process is straightforward. You will need to rebuild the Linux kernel with the configuration changed from 4K page size to 64K page size. If you are not familiar with this, let me know. Changing this option through menuconfig can automatically modify other configuration options as well. After that, rebuild, copy the result to the SD card and boot again.
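
    Once the rebuilt kernel boots, it is worth confirming that the new page size is really in effect before re-running the display test. A minimal sketch, assuming a Linux userspace on the A72; the function names are hypothetical, while `sysconf(_SC_PAGESIZE)` is the standard POSIX call:

```c
#include <unistd.h>

/* Returns the page size reported by the running kernel, in bytes,
 * or -1 if sysconf() cannot determine it. */
long kernel_page_size_bytes(void)
{
    return sysconf(_SC_PAGESIZE);
}

/* Returns 1 when the running kernel uses 64K pages, 0 otherwise. */
int kernel_uses_64k_pages(void)
{
    return kernel_page_size_bytes() == 65536L;
}
```

    On a 4K kernel `kernel_page_size_bytes()` returns 4096. If it does not return 65536 after the rebuild, the old kernel image is probably still the one being booted.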

    Regards,

    Erick

  • Hi Erick,

    Yes, we can do that; we will try it on our side.

    Thanks,

  • Sure, thank you

  • As another test, I tried to feed the display node with a YUV NV12 vx image instead of a RGBX vx image. I did this by adding a preliminary convert node (RGBX->NV12) in the display graph. 

    Result

    I could see the display underflowCount incrementing at a significantly lower rate. I didn't see the display offset bug once during a 13-minute run. However, since there are still underflows, this doesn't guarantee we wouldn't see the bug in a longer run.

    Edit:

    Also tried YUV NV12 with lower resolution (720p). Here is the number of display underflows I had compared to YUV NV12 1080p:

    • 1080p-nv12: 18 underflows after 13 minutes

    • 720p-nv12: 5 underflows after 13 minutes
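
    To put the format and resolution changes in perspective, here is a rough sketch of the memory read rate the display needs for a single full-screen layer. It assumes a 60 Hz refresh rate (not stated in this thread), 4 bytes per pixel for RGBX and an average of 1.5 bytes per pixel for NV12, and it ignores blanking, stride padding and any other layers. The function name is hypothetical.

```c
#include <stdint.h>

/* Approximate bytes per second the display must fetch for one full-screen layer.
 * bpp_num / bpp_den is the average bytes per pixel (4/1 for RGBX, 3/2 for NV12).
 * Ignores blanking, stride padding and any other layers. */
uint64_t display_fetch_bytes_per_sec(uint32_t width, uint32_t height,
                                     uint32_t bpp_num, uint32_t bpp_den,
                                     uint32_t fps)
{
    return (uint64_t)width * height * bpp_num / bpp_den * fps;
}
```

    Under these assumptions, 1080p RGBX needs about 498 MB/s (`display_fetch_bytes_per_sec(1920, 1080, 4, 1, 60)`), 1080p NV12 about 187 MB/s and 720p NV12 about 83 MB/s. The underflow counts above decrease in the same order, which would fit a bandwidth or latency limitation, although it does not prove one.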

  • Erick,

    Quick update regarding the 64K pages, we are still testing this change.

    Regarding the statistics, is there something else that we could monitor which would give us more information about the memory bandwidth used at different places in the SoC?

    On the other hand, if PL's tests give you other ideas, let us know.

    Thanks,

  • David,

    Regarding the statistics, is there something else that we could monitor which would give us more information about the memory bandwidth used at different places in the SoC?

    Bandwidth measurements can be done in a few different ways. The simplest is to use a Lauterbach debugger, since we have scripts to generate the data. It is also possible through CCS, although I am currently figuring out why it is not working on CCS 12.7; 12.6 looks stable.

    I've got instructions for both that I can share, let me know which you would prefer.

    On the other hand, if PL's tests give you other ideas, let us know.

    Edit:

    Also tried YUV NV12 with lower resolution (720p). Here is the number of display underflows I had compared to YUV NV12 1080p:

    • 1080p-nv12: 18 underflows after 13 minutes

    • 720p-nv12: 5 underflows after 13 minutes

    This does follow the idea that you are bandwidth limited and we need to somehow increase the priority of DSS, since the QoS settings do not seem to take effect. After the page size test, let me speak with Nikhil and Brijesh to see what other options we have to improve the memory bandwidth for the DSS.

    Regards,

    Erick

  • Bandwidth measurements can be done in a few different ways. The simplest is to use a Lauterbach debugger, since we have scripts to generate the data. It is also possible through CCS, although I am currently figuring out why it is not working on CCS 12.7; 12.6 looks stable.

    I've got instructions for both that I can share, let me know which you would prefer.

    Perfect, can you send me the instructions for CCS please?

    Thank you,

  • Hi Erick, 

    Regarding this test:

    This does follow the idea that you are bandwidth limited and we need to somehow increase the priority of DSS, since the QoS settings do not seem to take effect. After the page size test, let me speak with Nikhil and Brijesh to see what other options we have to improve the memory bandwidth for the DSS.

    When we apply the kernel configuration change below

    Subject: [PATCH] [LV-30983] Set kernel page size 64K
    ---
     kernel/configs/ti_arm64_prune.config | 24 ++++++++++++++++++++++++
     1 file changed, 24 insertions(+)
    diff --git a/kernel/configs/ti_arm64_prune.config b/kernel/configs/ti_arm64_prune.config
    index c678a9b67..5b1b7c189 100644
    --- a/kernel/configs/ti_arm64_prune.config
    +++ b/kernel/configs/ti_arm64_prune.config
    @@ -131,6 +131,30 @@ CONFIG_CAN_VCAN=m
     
     # Generic kernel
     
    +# Kernel page size 64K
    +CONFIG_ARM64_PAGE_SHIFT=16
    +CONFIG_ARM64_CONT_PTE_SHIFT=5
    +CONFIG_ARM64_CONT_PMD_SHIFT=5
    +CONFIG_ARCH_MMAP_RND_BITS_MIN=14
    +CONFIG_ARCH_MMAP_RND_BITS_MAX=29
    +CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=7
    +CONFIG_PGTABLE_LEVELS=3
    +
    +CONFIG_ARM64_4K_PAGES=n
    +CONFIG_ARM64_16K_PAGES=n
    +CONFIG_ARM64_64K_PAGES=y
    +CONFIG_ARM64_VA_BITS_42=n
    +CONFIG_ARM64_VA_BITS_52=n
    +CONFIG_ARM64_PA_BITS_52=n
    +
    +CONFIG_ARCH_FORCE_MAX_ORDER=14
    +CONFIG_ARCH_WANT_HUGE_PMD_SHARE=n
    +
    +CONFIG_PAGE_SIZE_LESS_THAN_64KB=n
    +CONFIG_ARCH_WANTS_THP_SWAP=n
    +CONFIG_THP_SWAP=n
    +CONFIG_VMXNET3=n
    +
     # We recommend to turn off Real-Time group scheduling in the
     # kernel when using systemd. RT group scheduling effectively
     # makes RT scheduling unavailable for most userspace, since it
    -- 
    2.34.1
    

    We recompiled the kernel, the GPU driver and the firmware (vision_apps based), and when running we end up with EGL errors:

    GL: after glReadPixels_ExtractFboData() glError (0x502)

    Do you have an idea of where it could come from? In the dmesg, there are no errors regarding the PVR driver.

    Thanks,

  • David,

    Yes, since you re-compiled your kernel, the GPU driver was not loaded on boot-up. It looks like it was trying to use SW rasterization. You will also need to compile the GPU kernel driver again against the kernel you are currently using. You should then see it under /lib/modules/<kernel_version>/, where the file is called pvrsrvkm.ko.

    If it's not there, re-compile it from the Linux SDK using "make ti-img-rogue-driver". Then you can install it as follows:

    cd board-support/extra-drivers/ti-img-rogue-driver-23.3.6512818/binary_j721s2_linux_lws-generic_release

    sudo install.sh --root <your rootfs>
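
    After installing and rebooting, a quick way to confirm that the module was actually loaded (rather than falling back to SW rasterization) is to look for it in /proc/modules. A minimal sketch, assuming a Linux target with /proc mounted; the helper name `module_is_loaded` is hypothetical:

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if `name` is listed in /proc/modules, 0 if it is not,
 * and -1 if /proc/modules cannot be opened. */
int module_is_loaded(const char *name)
{
    FILE *fp = fopen("/proc/modules", "r");
    char line[512];
    size_t len = strlen(name);
    int found = 0;

    if (fp == NULL)
        return -1;

    while (fgets(line, sizeof line, fp) != NULL) {
        /* Each entry starts with the module name followed by a space. */
        if (strncmp(line, name, len) == 0 && line[len] == ' ') {
            found = 1;
            break;
        }
    }

    fclose(fp);
    return found;
}
```

    Calling `module_is_loaded("pvrsrvkm")` on the target should return 1 once the driver built against the new kernel is in place. A return value of 0 means the module file was not found or failed to load, in which case dmesg is the next place to look.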

    Regards,

    Erick

  • Thanks Erick. We do recompile the GPU driver with the kernel changes. We see that there are some errors when loading it.

    Also, when you have time, could you send me the documentation for the CCS 12.6 setup to monitor the bandwidth?

    Regards,

  • David,

    Let me try this myself and I'll get back to you.

    I also owe you the calculations for measuring the DDR bandwidth in CCS.

    Regards,

    Erick