[FAQ] TDA4VH-Q1: How do I control the quality of service (QoS) and class of service (CoS) settings on TDA4 devices to balance transaction loads and priorities?

Part Number: TDA4VH-Q1
Other Parts Discussed in Thread: TDA4VH, J784S4XEVM

Tool/software:

I'm seeing the display report sync lost errors when running the TIDL AVP demo. How do I balance the transaction loads and priorities so that the DSS isn't stalled by the C7x cores?

    1. Problem Description

    This FAQ will reference the TDA4VH (J784S4 TRM), but the same principles apply to other TDA4 devices. 

    How do I control the quality of service (QoS) and class of service (CoS) settings on TDA4 devices to balance transaction loads and priorities?

    The case study used as an example to explain the concepts is: display sync lost errors caused by high C7x bandwidth.


    2. System Interconnects within the TDA4VH

    Read section 3. System Interconnect of the TRM for more details.

    To understand how transactions are balanced and prioritized, it's important to have a general grasp on the system interconnects. In the following case study, we will focus primarily on the data routing of the display subsystem (DSS) and C7x to the DDR subsystem (DDRSS), but it's good to have a general understanding of all the initiators and targets.

    Figure 3-1. shows a high-level diagram of initiators and the direction of their requests. The DSS falls within the "Initiator" block. Section 3.2.6 Initiator-Target Connections contains connectivity matrices that further elaborate the relationships between the initiators and targets within the system.

    A C7x transaction request generally takes the following path:

    C7x -> MSMC -> DDRSS

    A DSS transaction request generally takes the following path:

    DSS -> Main CBASS -> NAVSS -> MSMC -> DDRSS

    2.1. The Navigator Subsystems (NAVSS)

    Read section 10.2.10 NAVSS North Bridge (NB) of the TRM for more details.

    This section will discuss the Main NAVSS (NAVSS0), not the MCU NAVSS.

    The NAVSS0 consists of the following components:

    • Unified DMA subsystem (UDMASS)
    • Module subsystem (MODSS)
    • North bridge subsystem (NBSS)
    • Virtualization subsystem (VirtSS)
    • ECC aggregators

    Figure 10-14. shows the NAVSS0 hardware components and their integration.

    2.1.1. NAVSS North Bridge (NB)

    Read section 10.2.10 NAVSS North Bridge (NB) of the TRM for more details.

    The NB bridges between VBUSM interfaces and a VBUSM.C interface. The NAVSS contains two north bridges: NB0 and NB1.

    Figure 10-34. displays the high level structure of the NAVSS0's NB0 and NB1.

    Figure 10-34. is taken from the DRA829/TDA4VM TRM.

    The NB's quality of service mechanisms will be explained in later sections.

    2.2. The Multicore Shared Memory Controller (MSMC)

    Read section 8.1 Multicore Shared Memory Controller (MSMC) of the TRM for more details.

    The MSMC provides high-bandwidth data-movement and resource access to and from the internal processing elements of the compute cluster and the rest of the system.

    Figure 8-1. shows an overview of the MSMC and its surrounding modules.

    The MSMC's quality of service mechanisms will be explained in later sections.


    3. Quality of Service (QoS)

    3.1. General Information

    Read section 3.2.1 Quality of Service (QoS) of the TRM for more details.

    Quality of service is the use of mechanisms to control traffic within a network. In this case, that network is the TDA4VH device. There are two (system-wide) methods of quality of service offered: order ID and arbitration by priority. Order ID controls the mapping of a transaction's master and that master's channels (if there are multiple); this allows for balancing traffic across different paths. Priority offers arbitration capabilities to improve latency and bandwidth.

    3.1.1. NAVSS0

    The interconnect inside NAVSS0 uses order ID to provide multiple (at least two) parallel paths to DDR and a separate set of multiple (at least two) parallel paths to SRAM. NAVSS0 also provides multiple (at least two) parallel paths for the DMA traffic to SoC level, which can provide isolated DMA traffic paths.

    3.1.2. MSMC

    The MSMC does not use order ID to provide separate routing paths. Instead, it provides two threads to isolate two classes of transactions: thread 0 and thread 2 (these names come from the NAVSS0 north bridge). The arbitration between these two threads is based on credits, and thread 2 has priority over thread 0 when both threads have credits available for transfer. In addition, there is a bandwidth management scheme based on transaction priority and a bandwidth starvation prevention mechanism. More details will be given in later sections.

    3.1.3. General Order ID Information

    By default, all masters send transactions with order ID = 0. All transactions with the same order ID execute in order if they are sent to the same slave or go through a common bridge. On the SoC level interconnect, the order ID is used to partition the transactions to MSMC and DDR data space into parallel routing paths. Further, the north bridge inside NAVSS0 provides multiple parallel paths to the compute cluster. Each path is separated by order ID value. All read commands towards MSMC and DDR that share the same order ID and the same master path from the SoC side (including NAVSS0) return read data in order to the master port. The write response can be returned out of order for the same order ID value. Therefore, programming the order ID has implications for overall system performance as well as for achieving QoS for certain classes of traffic. Multiple configurations are needed to make sure that the QoS goal is met.

    3.2. NAVSS0 NB

    Read section 10.2.10.2.10 Quality of Service of the TRM for more details.

    3.2.1. Normal vs Real Time Traffic

    The north bridge uses 3 threads to separate traffic:

    • Thread 0: commands to VBUSM.C
    • Thread 1: commands from VBUSM.C
    • Thread 2: realtime commands to VBUSM.C

    The north bridge supports user programming of sources to either the normal thread (0) or RT thread (2). Any source mapped to the real time thread will be arbitrated before the normal thread. If there are multiple sources mapped to the same thread then they are arbitrated based on priority, and if they are the same priority, then by round robin.

    3.2.2. Order ID

    To route traffic from the VBUSM.C to the correct VBUSM source, the order ID is used. This means if multiple sources are mapped to the same thread (normal or real time), they must have different order IDs.

    3.3. Multicore Shared Memory Controller (MSMC)

    3.3.1. Resource Arbitration

    Read section 8.1.2.6 MSMC Resource Arbitration of the TRM for more details.

    If access requests target the same endpoint, they must be arbitrated. The endpoint arbitrators implement a multi-layer arbitration scheme with starvation bounds to allow the user to control/partition system bandwidth.

    The arbitration scheme is as follows:

    1. Transaction-tagged priority - Each active requestor pushes out a priority based on all of the currently active
      transactions in its pipeline. The highest priority requestors are chosen.
    2. Fair-Share State - The arbiter selects the requestor(s) with the highest internal Fairshare state.
    3. Static Priority - Finally, the arbiter uses a static priority to break any remaining ties after levels 1 and 2.
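
    As a rough illustration of the three levels above, the sketch below picks a winner by comparing the transaction-tagged priority first, then the fair-share state, then the static priority. It is a simplified software model, not the MSMC hardware logic: the Requestor fields, the use of 0 as the highest priority, and the conventions that a larger fair-share value and a smaller static rank win are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requestor:
    name: str
    priority: int     # transaction-tagged priority; 0 = highest (assumed encoding)
    fair_share: int   # internal fair-share state; larger wins (assumed scale)
    static_rank: int  # static priority; smaller wins (assumed convention)


def arbitrate(requestors):
    """Pick a winner: priority first, then fair-share state, then static rank."""
    return min(requestors, key=lambda r: (r.priority, -r.fair_share, r.static_rank))


if __name__ == "__main__":
    contenders = [
        Requestor("A", priority=3, fair_share=1, static_rank=0),
        Requestor("B", priority=1, fair_share=0, static_rank=2),
        Requestor("C", priority=1, fair_share=2, static_rank=1),
    ]
    # B and C tie on priority; C has the higher fair-share state.
    print(arbitrate(contenders).name)  # C
```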

    3.3.1.1. Starvation Bounds

    To ensure a bound on the minimum bandwidth of a request, the MSMC implements starvation bounds. Every time a request wins arbitration, all loser requests decrement their starvation count. Each time a requestor wins, all of its requests' starvation counts reset to their initial states. If the starvation count reaches zero, the requestor priority level is elevated to the highest level. Once the starved request wins, the requestor's priority level reverts back to its normal level. The starvation bounds can be programmed by the user to change the bound on a requestor's minimum bandwidth.
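
    The effect of a starvation bound can be seen in a toy model with two requestors that are always active: a high-priority "hog" and a low-priority "victim". This is only a sketch of the counting behavior described above; the requestor names and the single shared counter are assumptions for illustration and do not reflect the actual MSMC register interface.

```python
def simulate(bound, rounds):
    """Return the winner of each arbitration round.

    'hog' always has the higher normal priority. 'victim' only wins once its
    starvation count has reached zero, which elevates it to the highest level.
    """
    count = bound  # victim's starvation count, starts at the programmed bound
    winners = []
    for _ in range(rounds):
        if count == 0:
            winners.append("victim")  # starved request is elevated and wins
            count = bound             # count resets after the win
        else:
            winners.append("hog")
            count -= 1                # victim lost this round
    return winners


if __name__ == "__main__":
    # With a bound of 3, the victim wins one round in every four.
    print(simulate(bound=3, rounds=8))
```

    A smaller bound raises the victim's guaranteed minimum share of the endpoint; a larger bound lowers it.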

    3.3.2. Non-real Time vs Real Time Traffic

    Read section 8.1.2.11 MSMC Quality-of-Service of the TRM for more details.

    The MSMC provides two classes of traffic: real-time (RT) and non-real time (NRT). The MSMC provides dedicated buffering at each arbitration point that can only be consumed by RT traffic, so NRT traffic cannot completely starve out RT requests.

    There is no software control over the MSMC QoS hardware.

    Interfaces which do not support QoS features denote all traffic as non-real time.

    3.3.2.1. Non-real Time vs Real Time Way Partitioning

    Read section 8.1.2.3.1 Way Partitioning of the TRM for more details.

    The MSMC attempts to minimize allocation bias, but it's sometimes useful to keep the state of tasks resident within the cache of the MSMC. To do this, the MSMC supports way-partitioning based on NRT vs RT traffic.

    The MSMC offers software control over the number of cache way groups non-real and real time traffic can allocate into: only NRT traffic, only RT traffic, or both. In this way, you can increase/decrease the amount of cache available for NRT and RT traffic.

    3.4. DDRSS

    3.4.1. MSMC2DDR Bridge

    Read sections 8.2.3.1 DDRSS MSMC2DDR Bridge and 8.2.3.1.1 VBUSM.C Threads of the TRM for more details.

    The MSMC2DDR bridge supports 2 threads:

    • High Priority Thread (HPT): traffic received on VBUSM.C thread 2 belongs to HPT
    • Low Priority Thread (LPT): traffic received on VBUSM.C thread 0 belongs to LPT

    HPT has priority over LPT, and execution of commands from the command queue can be out-of-order. This ensures the HPT is guaranteed execution even when the LPT is blocked.

    Because the MSMC2DDR bridge maintains data coherency across threads, priority inversion is possible. Any HPT transactions that depend on LPT transactions due to address conflicts are blocked until execution of those corresponding LPT transactions.

    3.4.2. Class of Service (CoS)

    Read section 8.2.3.1.2 Class of Service (CoS) of the TRM for more details.

    Class of service is specific to the DDRSS, and controls how the system (VBUSM.C) priorities map to the DDRSS internal priorities. The MSMC2DDR bridge has the following registers to map VBUSM.C priorities to DDR controller priorities:

    • Range match registers:
      • DDRSS_V2A_R1_MAT_REG
      • DDRSS_V2A_R2_MAT_REG
      • DDRSS_V2A_R3_MAT_REG
    • Priority map registers:
      • DDRSS_V2A_LPT_DEF_PRI_MAP_REG
      • DDRSS_V2A_LPT_R1_PRI_MAP_REG
      • DDRSS_V2A_LPT_R2_PRI_MAP_REG
      • DDRSS_V2A_LPT_R3_PRI_MAP_REG
      • DDRSS_V2A_HPT_DEF_PRI_MAP_REG
      • DDRSS_V2A_HPT_R1_PRI_MAP_REG
      • DDRSS_V2A_HPT_R2_PRI_MAP_REG
      • DDRSS_V2A_HPT_R3_PRI_MAP_REG

    Figure 8-7. shows how priority map registers map incoming priorities to the appropriate DDR priorities. 

     

    3.5. QoS Summary

    Now that we've discussed the mechanisms and IPs that are responsible for controlling QoS, it's important to put all of the details together and understand what they mean from a high-level point of view.

    3.5.1. Order ID

    The order ID is a programmable field for requests and impacts how the request is handled between different IPs. The order ID's primary purpose is to provide a mechanism to balance how data flows between different paths.

    How to program the order ID will be expanded upon within the case study.

    3.5.2. Non-real Time vs Real Time Requests

    There are 2 classes of requests that can be made within the system: non-real time (NRT) (or normal) requests and real time (RT) requests. RT requests have priority over NRT requests, and will be serviced before NRT requests. The name for the NRT and RT attribute varies IP to IP:

    | NRT or RT | MSMC | NAVSS North Bridge | DDR Controller |
    |---|---|---|---|
    | Non-real time (NRT) | Non-real time | Thread 0 (normal) | Low priority thread (LPT) |
    | Real time (RT) | Real time | Thread 2 (real time) | High priority thread (HPT) |

    Programming whether an order ID is mapped to the NRT or RT thread is done within the NAVSS north bridge registers, specifically within the NAVSS_NORTH_x_NBSS_NBx_MMRS_threadmap (x signifies 0 or 1) register.

    | Register | Field | Bit(s) | Description |
    |---|---|---|---|
    | NAVSS_NORTH_0_NBSS_NB0_MMRS_threadmap | RESERVED | 31:3 | Reserved |
    | | THREADMAP | 1 | Maps order IDs 8-15 to a VBUSM.C thread number. 0: VBUSM.C thread 0 (non-real time traffic). 1: VBUSM.C thread 2 (real time traffic). |
    | | | 0 | Maps order IDs 0-7 to a VBUSM.C thread number. 0: VBUSM.C thread 0 (non-real time traffic). 1: VBUSM.C thread 2 (real time traffic). |
    | NAVSS_NORTH_1_NBSS_NB1_MMRS_threadmap | RESERVED | 31:3 | Reserved |
    | | THREADMAP | 2 | Maps order IDs 10-15 to a VBUSM.C thread number. 0: VBUSM.C thread 0 (non-real time traffic). 1: VBUSM.C thread 2 (real time traffic). |
    | | | 1 | Maps order IDs 5-9 to a VBUSM.C thread number. 0: VBUSM.C thread 0 (non-real time traffic). 1: VBUSM.C thread 2 (real time traffic). |
    | | | 0 | Maps order IDs 0-4 to a VBUSM.C thread number. 0: VBUSM.C thread 0 (non-real time traffic). 1: VBUSM.C thread 2 (real time traffic). |

    The NAVSS_NORTH_x_NBSS_NBx_MMRS_threadmap register's fields vary between the J7 devices. The above table is representative of the J784S4/TDA4VH.

    More details about setting the NRT/RT attribute of a request will be given in the case study.
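
    To make the THREADMAP decoding concrete, the sketch below looks up which THREADMAP bit covers a given order ID and reports the resulting VBUSM.C thread. The order ID ranges are taken from the J784S4/TDA4VH table above; the function names and the "NB0"/"NB1" keys are hypothetical, and the code only does arithmetic on a register value that you supply (it does not read hardware).

```python
# Order ID range (low, high) covered by each THREADMAP bit on J784S4/TDA4VH.
THREADMAP_RANGES = {
    "NB0": {0: (0, 7), 1: (8, 15)},
    "NB1": {0: (0, 4), 1: (5, 9), 2: (10, 15)},
}


def threadmap_bit(nb, order_id):
    """Return the THREADMAP bit index that governs this order ID."""
    for bit, (low, high) in THREADMAP_RANGES[nb].items():
        if low <= order_id <= high:
            return bit
    raise ValueError(f"order ID {order_id} is outside 0-15")


def vbusmc_thread(nb, threadmap, order_id):
    """Return 2 (real time) or 0 (non-real time) for the given register value."""
    bit = threadmap_bit(nb, order_id)
    return 2 if (threadmap >> bit) & 1 else 0


if __name__ == "__main__":
    # Order ID 15 goes to the RT thread on NB0 only if THREADMAP bit 1 is set.
    print(vbusmc_thread("NB0", 0b10, 15))   # 2
    print(vbusmc_thread("NB1", 0b010, 15))  # 0 (bit 2 would be needed on NB1)
```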

    3.5.3. Priority

    The possible priority of a request varies between 0 and 7, with 0 being the highest and 7 being the lowest. The way to set the priority of a request varies from IP to IP, but there is usually a CTRL MMR or register within the IP to set the priority.

    For example: the DSS' DSS_DISPC_0_COMMON_M_DSS_CBA_CFG register controls the priority level of DSS requests:

    | Field | Bits | Description |
    |---|---|---|
    | PRI_HI | 5:3 | The value sent out on the PRI_HI bus from DSS to CBA. Indicates the priority level for high-priority [MFLAG] transactions. A value of 0x0 indicates highest priority; a value of 0x7 indicates lowest priority. |
    | PRI_LO | 2:0 | The value sent out on the PRI_LO bus from DSS to CBA. Indicates the priority level for normal [non-MFLAG] transactions. A value of 0x0 indicates highest priority; a value of 0x7 indicates lowest priority. |
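
    As an illustration of the field layout above, the following sketch packs and unpacks PRI_HI (bits 5:3) and PRI_LO (bits 2:0). The helper names are hypothetical, and the code only computes register values; writing the result to DSS_DISPC_0_COMMON_M_DSS_CBA_CFG is left to whatever register access method your software uses. Bits outside 5:0 are preserved because their meaning is not covered here.

```python
PRI_LO_SHIFT = 0
PRI_HI_SHIFT = 3
PRI_MASK = 0x7  # each field is three bits wide


def pack_cba_cfg(pri_hi, pri_lo, current=0):
    """Return 'current' with PRI_HI and PRI_LO replaced (0 = highest priority)."""
    for value in (pri_hi, pri_lo):
        if not 0 <= value <= 7:
            raise ValueError("priority must be between 0 and 7")
    cleared = current & ~((PRI_MASK << PRI_HI_SHIFT) | (PRI_MASK << PRI_LO_SHIFT))
    return cleared | (pri_hi << PRI_HI_SHIFT) | (pri_lo << PRI_LO_SHIFT)


def unpack_cba_cfg(value):
    """Return (PRI_HI, PRI_LO) extracted from a register value."""
    return (value >> PRI_HI_SHIFT) & PRI_MASK, (value >> PRI_LO_SHIFT) & PRI_MASK


if __name__ == "__main__":
    print(hex(pack_cba_cfg(pri_hi=1, pri_lo=2)))  # 0xa
    print(unpack_cba_cfg(0x0A))                   # (1, 2)
```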

    3.5.4. Combining NRT/RT and Priority

    When combining the NRT/RT attribute and priority of a request, the hierarchy of requests is essentially as follows:

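    | NRT or RT | Request Priority | Priority Level |
    |---|---|---|
    | Real time | 0 | Highest priority |
    | | 1 | Priority decreases down the list |
    | | 2 | |
    | | 3 | |
    | | 4 | |
    | | 5 | |
    | | 6 | |
    | | 7 | |
    | Non-real time | 0 | |
    | | 1 | |
    | | 2 | |
    | | 3 | |
    | | 4 | |
    | | 5 | |
    | | 6 | |
    | | 7 | Lowest priority |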
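
    The hierarchy above can be expressed as a single rank, where every real time request outranks every non-real time request regardless of its priority value. The sketch below is only a way to reason about the ordering; the function name and the example requests are hypothetical.

```python
def service_rank(is_real_time, priority):
    """Return 0 (serviced first) through 15 (serviced last)."""
    if not 0 <= priority <= 7:
        raise ValueError("priority must be between 0 and 7")
    return priority if is_real_time else 8 + priority


if __name__ == "__main__":
    # Hypothetical requests: (name, is_real_time, priority).
    requests = [("nrt-p0", False, 0), ("rt-p7", True, 7), ("rt-p1", True, 1)]
    ordered = sorted(requests, key=lambda r: service_rank(r[1], r[2]))
    print([name for name, _, _ in ordered])  # ['rt-p1', 'rt-p7', 'nrt-p0']
```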

    4. Case Study: Display Sync Lost Issue

    4.1. Problem Statement

    When running the AVP demo with a 4k display plugged in, the display suffers from frequent sync lost errors.

    4.2. Setup and Recreation

    4.2.1. Requirements

    | Item | Link | Comments |
    |---|---|---|
    | J784S4XEVM | Link | N/A |
    | SD Card | N/A | N/A |
    | 4k monitor | N/A | Connect the monitor to Display Port1 |
    | PROCESSOR-SDK-RTOS-J784S4 | ti-processor-sdk-rtos-j784s4-evm-09_02_00_05.tar.gz | Version 09.02 |
    | SOC generic TI sample input data set | psdk_rtos_ti_data_set_09_02_00.tar.gz | Listed within the PROCESSOR-SDK-RTOS-J784S4 page |
    | SOC specific tidl models | psdk_rtos_ti_data_set_09_02_00_j784s4.tar.gz | Listed within the PROCESSOR-SDK-RTOS-J784S4 page |
    | PROCESSOR-SDK-LINUX-J784S4 | ti-processor-sdk-linux-adas-j784s4-evm-09_02_00_05-Linux-x86-Install.bin | Version 09.02 |
    | RTOS patches | https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/791/7356.rtos_2D00_patches.tar.xz | Patches to apply to the vision_apps repository |
    | | • https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/791/0001_2D00_vision_5F00_apps_2D00_Remove_2D00_the_2D00_DSS_2D00_application_2D00_from_2D00_MCU2_5F00_0.patch | |
    | | • https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/791/0002_2D00_vision_5F00_apps_2D00_Remove_2D00_display_2D00_use_2D00_from_2D00_the_2D00_AVP_2D00_demo.patch | |
    | Linux patches | https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/791/1070.linux_2D00_patches.tar.xz | Patches to apply to the ti-linux-kernel repository |
    | | • https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/791/0001_2D00_arm64_2D00_dts_2D00_ti_2D00_k3_2D00_j784s4_2D00_vision_2D00_apps_2D00_Re_2D00_enable_2D00_DSS_2D00_for.patch | |

    4.2.2.  Host Setup

    Commands to execute on host
    # install the PROCESSOR-SDK-RTOS-J784S4
    # install the PROCESSOR-SDK-LINUX-J784S4
    # download the data set tar files
    # download and untar the patch tars
    # insert SD card
     
    export PSDKR_PATH=<path-to-rtos-sdk>
    export PSDKL_PATH=<path-to-linux-sdk>
    export DATA_SET_PATH=<path-to-directory-where-data-sets-are-stored>
    export RTOS_PATCHES=<path-to-rtos-patches>
    export LINUX_PATCHES=<path-to-linux-patches>
     
    # set up PSDK RTOS
    cd $PSDKR_PATH
    ./sdk_builder/scripts/setup_psdk_rtos.sh
     
    # set up SD card (example assumes the SD card is at /dev/sdb)
    umount /dev/sdb?*
    cd $PSDKR_PATH
    sudo sdk_builder/scripts/mk-linux-card.sh /dev/sdb
    ./sdk_builder/scripts/install_to_sd_card.sh
    cd /media/$USER/rootfs/
    mkdir -p opt/vision_apps
    cd opt/vision_apps
    tar --strip-components=1 -xf $DATA_SET_PATH/psdk_rtos_ti_data_set_09_02_00.tar.gz
    tar --strip-components=1 -xf $DATA_SET_PATH/psdk_rtos_ti_data_set_09_02_00_j784s4.tar.gz
    sync
     
    # edit and build demo app
    cd $PSDKR_PATH/vision_apps
    git init
    git add -A
    git commit -m "SDK 09.02.00.05 release"
    git am $RTOS_PATCHES/*.patch
    cd ../sdk_builder
    ./make_sdk.sh
    make linux_fs_install_sd
     
    # edit and rebuild device tree
    export PATH=$PATH:$PSDKL_PATH/linux-devkit/sysroots/x86_64-arago-linux/usr/bin/aarch64-oe-linux/
    cd $PSDKL_PATH/board-support/ti-linux-kernel-6.1.80+gitAUTOINC+2e423244f8-ti
    git add -A
    git commit -m "SDK 09.02.00.05 release"
    git am $LINUX_PATCHES/*.patch
    make ARCH=arm64 CROSS_COMPILE=aarch64-oe-linux- defconfig ti_arm64_prune.config
    make ARCH=arm64 CROSS_COMPILE=aarch64-oe-linux- DTC_FLAGS=-@ ti/k3-j784s4-vision-apps.dtbo
    sudo mv /media/$USER/rootfs/boot/dtb/ti/k3-j784s4-vision-apps.dtbo /media/$USER/rootfs/boot/dtb/ti/k3-j784s4-vision-apps.dtbo.old
    sudo cp arch/arm64/boot/dts/ti/k3-j784s4-vision-apps.dtbo /media/$USER/rootfs/boot/dtb/ti/

    4.2.3.  Target Setup 

    Commands to execute on target
    cd /opt/vision_apps
    source ./vision_apps_init.sh
    ./run_app_tidl_avp.sh

    4.2.4. Recreation

    The following video was taken with kmstest running in tandem with the AVP demo.

    Running kmstest with AVP demo
    systemctl stop weston
    kmstest & ./run_app_tidl_avp.sh

    4.3. Debugging QoS

    4.3.1. CPTracer

    CCS (and Lauterbach) offer a tool called CPTracer, which allows developers to profile traffic within the system. In this case, we'll profile the DDR traffic.

    4.3.1.1.  Setup

    1. Follow the steps for setting up CCS within section 7.4. Debugging with HLOS running on A72 (Linux / QNX) of the Processor SDK RTOS documentation. The specific version of the PSDK RTOS referenced here is: PSDK RTOS J784S4 10.01.00
    2. The traffic profiling window can be opened from the SoC analysis tab.
    3. This is the window that should appear when opening CPTracer:
    4. There are many filters available to prune which transactions are profiled and can be edited by clicking the gear icon of a corresponding initiator.

    4.3.1.2. Profiling Throughput

    CPTracer offers a way to profile the total throughput of transactions within a period. The following is output when a csv is exported:

    • Master ID: ID of the initiator making the transaction
    • Master Name: name of the initiator (corresponds with the Master ID)
    • Data Message: description of the row (contains the period in clock cycles)
    • Global Timestamp: time of sample window's closing (based on GTC (200MHz clock))
    • Trace Status: notes when the trace starts and ends
    • Byte Transactions: total number of bytes (sent within a period) observed at the probe that match the set filters
    • Matched: total number of transactions (within a period) observed at the probe that match the set filters
    • Avg. Length: average number of bytes (sent within a period) per transaction observed at the probe that match the set filters

    4.3.1.3. Profiling Latency

    CPTracer also offers a way to profile the latency of transactions within a period. The following is output when a csv is exported:

    • Master ID: ID of the initiator making the transaction
    • Master Name: name of the initiator (corresponds with the Master ID)
    • Data Message: description of the row (contains the period in clock cycles)
    • Global Timestamp: time of sample window's closing (based on GTC (200MHz clock))
    • Trace Status: notes when the trace starts and ends
    • Tracked: total number of transactions within a period from the initiator
    • Matched Transactions: total number of transactions (within a period) observed at the probe that match the set filters
    • Max Wait: maximum latency measurement of a single transaction (within a period) observed at the probe that match the set filters
    • Total Wait: total latency (all latency measurements summed together) (within a period) observed at the probe that match the set filters
    • Credit Wait: total credit latency (within a period) observed at the probe for credit-based buses that match the set filters

    4.3.1.4. Profiling Transactions

    CPTracer gives detailed information on the transactions that are observed. The following columns are output when a csv is exported (this is a pared-down list of relevant columns):

    • Route ID: route ID of the transaction
    • Byte Count: burst size of the transaction
    • Priority: the priority of the transaction
    • QOS: the quality of service of the transaction
    • Order ID: the order ID of the transaction

    The complete list of columns can be found in the Advanced Probe Filters section of the CPTracer documentation.

    4.3.1.5. Profiling Relevant Routes

    The issue we're investigating is caused by the C7x disrupting the DSS, so we want to profile the C7x and DSS routes.

    The route IDs for DSS and C7x transactions are as follows:

    | Initiator | Route ID |
    |---|---|
    | C7x_1 Core | 0x20 |
    | C7x_1 DRU0 | 0x21 |
    | C7x_1 DRU1 | 0x22 |
    | C7x_1 CMMU | 0x23 |
    | C7x_2 Core | 0x24 |
    | C7x_2 DRU0 | 0x25 |
    | C7x_2 DRU1 | 0x26 |
    | C7x_2 CMMU | 0x27 |
    | C7x_3 Core | 0x28 |
    | C7x_3 DRU0 | 0x29 |
    | C7x_3 DRU1 | 0x2A |
    | C7x_3 CMMU | 0x2B |
    | C7x_4 Core | 0x2C |
    | C7x_4 DRU0 | 0x2D |
    | C7x_4 DRU1 | 0x2E |
    | C7x_4 CMMU | 0x2F |
    | DSS_INST0_VBUSM_DMA | 0xA20 |
    | DSS_INST0_VBUSM_FBDC | 0xA21 |

    Route IDs can be found within the appendices of the TRM.

    The CCS version of CPTracer only contains EMIF0 and EMIF1 (missing EMIF2 and 3) within the MSMC_1 probe domain, but you're able to approximate the total throughput by multiplying one EMIF's throughput by 4.

    4.3.1.5.1. Profiling DSS Throughput

    The filters used to profile only DSS transactions are the following:

    • Route ID Value: 0xA20
    • Route ID Mask: 0xFFE
    • Sampling Window: 0x4000

    Only EMIF0 was profiled.
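
    To see why a value of 0xA20 with a mask of 0xFFE selects both DSS routes, it helps to model the filter as a masked comparison. The sketch below assumes the filter matches when (route ID AND mask) equals (value AND mask), which is the usual meaning of a value/mask pair; the function names are hypothetical and the route table is rebuilt from the route IDs listed earlier.

```python
# Route IDs from the table above: four routes per C7x, numbered from 0x20.
ROUTES = {
    f"C7x_{core} {unit}": 0x20 + (core - 1) * 4 + offset
    for core in range(1, 5)
    for offset, unit in enumerate(["Core", "DRU0", "DRU1", "CMMU"])
}
ROUTES["DSS_INST0_VBUSM_DMA"] = 0xA20
ROUTES["DSS_INST0_VBUSM_FBDC"] = 0xA21


def matches(route_id, value, mask):
    """Assumed filter rule: only the bits set in 'mask' are compared."""
    return (route_id & mask) == (value & mask)


def matching_routes(value, mask):
    """Return the names of all known routes that pass the filter."""
    return [name for name, rid in ROUTES.items() if matches(rid, value, mask)]


if __name__ == "__main__":
    # Clearing bit 0 of the mask makes 0xA20 and 0xA21 indistinguishable.
    print(matching_routes(0xA20, 0xFFE))
```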

    The total bytes sent per frame were calculated with: 

    # Author: Jared McArthur
    
    import csv
    import matplotlib.pyplot as plt
    from argparse import ArgumentParser
    from textwrap import dedent
    
    # The Global Timestamp column counts ticks of the 200 MHz GTC.
    GTC_HZ = 200_000_000
    
    def main():
        parser = ArgumentParser(prog="dss-frame-thru-calc.py")
        parser.add_argument("file", type=str, help="CSV exported from CPTracer")
        parser.add_argument("frame_start", type=float, help="segment start in seconds")
        parser.add_argument("frame_end", type=float, help="segment end in seconds")
    
        args = parser.parse_args()
        start = args.frame_start
        end = args.frame_end
    
        total_bytes = []
        time_stamps = []
    
        with open(args.file, "r", newline="") as csvfile:
            first_stamp = None
    
            for row in csv.DictReader(csvfile):
                # Skip rows that carry no sample (empty Master ID).
                if not row.get("Master ID"):
                    continue
    
                # Both columns are parsed as hexadecimal strings.
                total_byte = int(row["Byte Transactions"], base=16)
                time_stamp = int(row["Global Timestamp"], base=16)
    
                # Time is measured relative to the first sample in the file.
                if first_stamp is None:
                    first_stamp = time_stamp
    
                # Keep only the samples that fall inside [start, end].
                seconds = (time_stamp - first_stamp) / GTC_HZ
                if start <= seconds <= end:
                    total_bytes.append(total_byte)
                    time_stamps.append(seconds)
    
        if not time_stamps:
            parser.error("no samples found between frame_start and frame_end")
    
        print(dedent(f"""
            Num periods in segment: {len(time_stamps)}
            Time elapsed in segment: {time_stamps[-1] - time_stamps[0]}
            Bytes sent in segment: {sum(total_bytes)}"""))
    
        plt.plot(time_stamps, total_bytes)
        plt.show()
    
    if __name__ == "__main__":
        main()
    

    4.3.1.5.1.1. Theoretical DSS Throughput

    The theoretical DSS throughput is calculated for a 3840x2160@30fps XR32-888 display.

    Theoretical DSS Throughput

    | Parameter | Value |
    |---|---|
    | width | 3840 |
    | height | 2160 |
    | fps | 30 |
    | bits per pixel | 32 |
    | bits per frame | 265420800 |
    | Mb per frame | 253.125 |
    | data rate (bps) | 7962624000 |
    | data rate (Mbps) | 7593.75 |

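
    The arithmetic behind these figures can be reproduced with a few lines. The sketch below assumes that the "Mb" and "Mbps" values use a divisor of 2^20 (1048576), since that is the divisor that reproduces 253.125 and 7593.75; the function name is hypothetical.

```python
MIB = 1 << 20  # 1048576; assumed divisor behind the "Mb" figures


def display_throughput(width, height, fps, bits_per_pixel):
    """Return (bits per frame, bits per second) for an uncompressed display."""
    bits_per_frame = width * height * bits_per_pixel
    return bits_per_frame, bits_per_frame * fps


if __name__ == "__main__":
    bits_per_frame, bps = display_throughput(3840, 2160, 30, 32)
    print(bits_per_frame)        # 265420800
    print(bits_per_frame / MIB)  # 253.125
    print(bps)                   # 7962624000
    print(bps / MIB)             # 7593.75
```
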
    4.3.1.5.1.2. Normal DSS Throughput

    The following image displays the throughput of DSS transactions without the AVP demo running.

    The total throughput of a single frame matches 1/4 of the expected throughput for a 3840x2160@30fps screen.

    | Parameter | Measured DSS Throughput | Corrected Values for 4 EMIFs |
    |---|---|---|
    | bytes per frame | 8294400 | 33177600 |
    | bits per byte | 8 | 8 |
    | fps | 30 | 30 |
    | num emifs | 1 | 4 |
    | bits per frame | 66355200 | 265420800 |
    | Mb per frame | 63.28125 | 253.125 |
    | data rate (bps) | 1990656000 | 7962624000 |
    | data rate (Mbps) | 1898.4375 | 7593.75 |

    4.3.1.5.1.3. DSS Throughput with the AVP Demo Running

    The following image displays the throughput of DSS transactions with the AVP demo running.

    The total throughput falls short of what is required to display 3840x2160@30fps. This shortfall is what causes the sync lost errors.

    | Parameter | Measured DSS Throughput | Corrected Values for 4 EMIFs |
    |---|---|---|
    | bytes per frame | 8257536 | 33030144 |
    | bits per byte | 8 | 8 |
    | fps | 30 | 30 |
    | num emifs | 1 | 4 |
    | bits per frame | 66060288 | 264241152 |
    | Mb per frame | 63 | 252 |
    | data rate (bps) | 1981808640 | 7927234560 |
    | data rate (Mbps) | 1890 | 7560 |

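
    To put a number on the shortfall, the measured bytes per frame on one EMIF can be scaled to four EMIFs and compared with the size of a full 3840x2160 XR32-888 frame. This sketch relies on the approximation used above (traffic spread evenly over four EMIFs); the function names are hypothetical.

```python
NUM_EMIFS = 4
REQUIRED_BYTES_PER_FRAME = 3840 * 2160 * 4  # 32 bits per pixel = 4 bytes


def scale_to_all_emifs(bytes_on_one_emif):
    """Approximate the total by assuming an even split across the EMIFs."""
    return bytes_on_one_emif * NUM_EMIFS


def shortfall_bytes(bytes_on_one_emif):
    """Bytes per frame missing relative to a full frame."""
    return REQUIRED_BYTES_PER_FRAME - scale_to_all_emifs(bytes_on_one_emif)


if __name__ == "__main__":
    print(shortfall_bytes(8294400))  # 0 (without the AVP demo)
    missing = shortfall_bytes(8257536)
    print(missing)                   # 147456 (with the AVP demo)
    print(round(100 * missing / REQUIRED_BYTES_PER_FRAME, 2))  # 0.44
```

    Even a shortfall of well under one percent per frame is enough to underflow the display pipeline, which is why the latency of individual DSS transactions is examined next.
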
    4.3.1.5.2. Profiling DSS Latency

    Our hypothesis is that the DSS transactions are being stalled, which causes a large spike in the total latency. To verify this hypothesis, we profile the latency of the DSS with and without the AVP demo running.

    | Description | CSV | Plot |
    |---|---|---|
    | DSS transaction latencies without the AVP demo running | | |
    | DSS transaction latencies with the AVP demo running | | |

    Now that we've verified that the DSS transactions are being stalled by the C7x transactions, we can profile the C7x transactions and try to narrow down the source of the stalls.

    4.3.1.5.3. Profiling C7x Throughput

    Profiling the throughput of each separate C7x core can illuminate what exactly is causing the DSS stalls, but the results should be taken with a grain of salt.

    The sections with high throughput and a high number of transactions could be causing the stalls, but they could also simply be red herrings.

    | C7x Core | Route IDs | Route ID Value | Route ID Mask | Throughput Plot |
    |---|---|---|---|---|
    | All cores | 0x20 to 0x2F | 0x020 | 0xFF0 | |
    | 1 | 0x20 to 0x23 | 0x020 | 0xFFC | |
    | 2 | 0x24 to 0x27 | 0x024 | 0xFFC | |
    | 3 | 0x28 to 0x2B | 0x028 | 0xFFC | |
    | 4 | 0x2C to 0x2F | 0x02C | 0xFFC | |

    Looking at the above plots, it appears that C7x_4 is causing the stalls in the DSS. The high throughput sections appear to match when the DSS throughput is limited. 

    It appears that the DSS is stalling due to the high throughput transactions sent from the C7x_4; this is only partially true, but for now, let's take a closer look at C7x_4. 

    4.3.1.5.4. Profiling C7x Throughput and DSS Latency

    Unfortunately, CCS' CPTracer doesn't allow you to visualize transaction throughputs and transaction latencies at the same time. Lauterbach's CPTracer does, however.

    You can profile throughput on EMIF0 and latency on EMIF1 to view both the throughput and latency (of different routes) at the same time.

    Using this to our advantage, we can verify whether the high DSS latency actually does correlate to the DSS throughput throttling and which specific C7x_4 route has a non-zero throughput at the same time.

    | Description | Plots |
    |---|---|
    | DSS throughput and DSS latency | |
    | C7x_4 core (0x2C) throughput and DSS latency | |
    | C7x_4 DRU0 (0x2D) throughput and DSS latency | |
    | C7x_4 DRU1 (0x2E) throughput and DSS latency | |
    | C7x_4 CMMU (0x2F) throughput and DSS latency | |

    Looking at the above plots, the issue appears to be with the C7x_4 core (0x2D) transactions.

    4.3.1.5.5. Profiling C7x_4 Core Transactions

    Using CPTracer's transaction profiling capabilities, we can look at the QoS settings for the C7x_4 core and DSS transactions.

    C7x_4 core transactions (c7x-4-core-trans.csv):

    • Route ID: 0x002D
    • Priority: 0x03
    • QoS: 0x00
    • Order ID: 0x00 

    DSS transactions (dss-trans.csv):

    • Route ID: 0xA20
    • Priority: 0x00 or 0x01
    • QoS: 0x00
    • Order ID: 0x0F

    Looking at the priorities, the C7x_4 core transactions shouldn't be superseding the DSS transactions.

    4.3.2.  Editing QoS Settings

    Although all of the QoS settings appear to be correct, for clarity's sake, we will still cover how to set/change them. We also need to verify that the DSS transactions are being routed to the RT thread instead of the NRT thread.

    After setting the relevant QoS settings, you can verify your changes with the transaction profiling feature of CPTracer.

    Code to edit QoS settings can be generated with the Keystone3 Resource Partitioning Tool that comes packaged with SYSCONFIG.

    4.3.2.1. Order ID

    Changing the order ID of specific transactions allows you to route them to the NRT or RT thread. This is done within CBASS configuration registers and within the independent IPs (depending on the IP).

    4.3.2.1.1. DSS

    The order ID for the DSS is set within the CBASS configuration registers. The order ID for each DSS channel must be set and mapped to the corresponding VBUSM order ID.

    DSS_PIPE_VID1 corresponds to channels 0 and 1.
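
    As a worked example of the per-channel map registers listed in the Order ID Registers table (base address 0x45DC2100, one 32-bit register per channel x = 0 to 9, ORDERID in bits 7:4), the sketch below computes a register address and the value to write for a chosen order ID. It only performs the arithmetic; the helper names are hypothetical, and the actual register write depends on your environment.

```python
DMA_MAP_BASE = 0x45DC2100  # CBASS_AC_NONSAFE_QOS_..._vbusm_dma_mapx, x = 0 to 9
ORDERID_SHIFT = 4          # ORDERID occupies bits 7:4
ORDERID_MASK = 0xF


def dma_map_address(channel):
    """Return the address of the map register for a DSS DMA channel."""
    if not 0 <= channel <= 9:
        raise ValueError("channel must be between 0 and 9")
    return DMA_MAP_BASE + channel * 4


def with_order_id(reg_value, order_id):
    """Return reg_value with the ORDERID field replaced, other bits untouched."""
    if not 0 <= order_id <= 15:
        raise ValueError("order ID must be between 0 and 15")
    cleared = reg_value & ~(ORDERID_MASK << ORDERID_SHIFT)
    return cleared | (order_id << ORDERID_SHIFT)


if __name__ == "__main__":
    # DSS_PIPE_VID1 uses channels 0 and 1; give both order ID 15.
    for channel in (0, 1):
        print(hex(dma_map_address(channel)), hex(with_order_id(0x00, 15)))
```
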
    4.3.2.1.1.1. Order ID Registers
    Address | Register | Bits | Field | Description

    0x45DC2000 | CBASS_AC_NONSAFE_QOS_Ik3_dss_main_0_dss_inst0_vbusm_dma_slv_linkgrp_1_grp_map1 | N/A | N/A | The Group Map Register defines the final orderid for the initiator Ik3_dss_main_0.dss_inst0_vbusm_dma for group slv_linkgrp_1.
        31:28 | ORDERID7 | Order ID signal for 7
        27:24 | ORDERID6 | Order ID signal for 6
        23:20 | ORDERID5 | Order ID signal for 5
        19:16 | ORDERID4 | Order ID signal for 4
        15:12 | ORDERID3 | Order ID signal for 3
        11:8 | ORDERID2 | Order ID signal for 2
        7:4 | ORDERID1 | Order ID signal for 1
        3:0 | ORDERID0 | Order ID signal for 0

    0x45DC2004 | CBASS_AC_NONSAFE_QOS_Ik3_dss_main_0_dss_inst0_vbusm_dma_slv_linkgrp_1_grp_map2 | N/A | N/A | The Group Map Register defines the final orderid for the initiator Ik3_dss_main_0.dss_inst0_vbusm_dma for group slv_linkgrp_1.
        31:28 | ORDERID15 | Order ID signal for 15
        27:24 | ORDERID14 | Order ID signal for 14
        23:20 | ORDERID13 | Order ID signal for 13
        19:16 | ORDERID12 | Order ID signal for 12
        15:12 | ORDERID11 | Order ID signal for 11
        11:8 | ORDERID10 | Order ID signal for 10
        7:4 | ORDERID9 | Order ID signal for 9
        3:0 | ORDERID8 | Order ID signal for 8

    0x45DC2100 + x * 4, where x = 0 to 9 | CBASS_AC_NONSAFE_QOS_Ik3_dss_main_0_dss_inst0_vbusm_dma_mapx | 7:4 | ORDERID | Order ID signal for channel x. Selects route for load balancing (0-7 uses one route, 8-15 another). Also used by DDR4/LPDDR4 re-ordering to maximize throughput. Order of transactions is only guaranteed with the same order ID.

    0x45DC2400 | CBASS_AC_NONSAFE_QOS_Ik3_dss_main_0_dss_inst0_vbusm_fbdc_slv_linkgrp_1_grp_map1 | N/A | N/A | The Group Map Register defines the final orderid for the initiator Ik3_dss_main_0.dss_inst0_vbusm_fbdc for group slv_linkgrp_1.
        31:28 | ORDERID7 | Order ID signal for 7
        27:24 | ORDERID6 | Order ID signal for 6
        23:20 | ORDERID5 | Order ID signal for 5
        19:16 | ORDERID4 | Order ID signal for 4
        15:12 | ORDERID3 | Order ID signal for 3
        11:8 | ORDERID2 | Order ID signal for 2
        7:4 | ORDERID1 | Order ID signal for 1
        3:0 | ORDERID0 | Order ID signal for 0

    0x45DC2404 | CBASS_AC_NONSAFE_QOS_Ik3_dss_main_0_dss_inst0_vbusm_fbdc_slv_linkgrp_1_grp_map2 | N/A | N/A | The Group Map Register defines the final orderid for the initiator Ik3_dss_main_0.dss_inst0_vbusm_fbdc for group slv_linkgrp_1.
        31:28 | ORDERID15 | Order ID signal for 15
        27:24 | ORDERID14 | Order ID signal for 14
        23:20 | ORDERID13 | Order ID signal for 13
        19:16 | ORDERID12 | Order ID signal for 12
        15:12 | ORDERID11 | Order ID signal for 11
        11:8 | ORDERID10 | Order ID signal for 10
        7:4 | ORDERID9 | Order ID signal for 9
        3:0 | ORDERID8 | Order ID signal for 8

    0x45DC2500 + x * 4, where x = 0 to 9 | CBASS_AC_NONSAFE_QOS_Ik3_dss_main_0_dss_inst0_vbusm_fbdc_mapx | 7:4 | ORDERID | Order ID signal for channel x. Selects route for load balancing (0-7 uses one route, 8-15 another). Also used by DDR4/LPDDR4 re-ordering to maximize throughput. Order of transactions is only guaranteed with the same order ID.

    4.3.2.1.1.2. Code to set Order ID
    • arch/arm/mach-k3/r5/j784s4/j784s4_qos_uboot.c: set general QoS settings for the DSS_PIPE_VID1 and ensure the DSS' internal and external order ID mappings match:
      arch/arm/mach-k3/r5/j784s4/j784s4_qos_uboot.c
      ...
       
      struct k3_qos_data qos_data[] = {
              /* DSS_PIPE_VID1 - 2 endpoints, 2 channels */
              {
                      .reg = K3_QOS_REG(K3_DSS_MAIN_0_DSS_INST0_VBUSM_DMA, 0),
                      .val = K3_QOS_VAL(0, 15, 0, 0, 0, 0),
              },
              {
                      .reg = K3_QOS_REG(K3_DSS_MAIN_0_DSS_INST0_VBUSM_DMA, 1),
                      .val = K3_QOS_VAL(0, 15, 0, 0, 0, 0),
              },
              {
                      .reg = K3_QOS_REG(K3_DSS_MAIN_0_DSS_INST0_VBUSM_FBDC, 0),
                      .val = K3_QOS_VAL(0, 15, 0, 0, 0, 0),
              },
              {
                      .reg = K3_QOS_REG(K3_DSS_MAIN_0_DSS_INST0_VBUSM_FBDC, 1),
                      .val = K3_QOS_VAL(0, 15, 0, 0, 0, 0),
              },
       
      ...
       
              /* Following registers set 1:1 mapping for orderID MAP1/MAP2
               * remap registers. orderID x is remapped to orderID x again
               * This is to ensure orderID from MAP register is unchanged
               */
       
              /* K3_DSS_MAIN_0_DSS_INST0_VBUSM_DMA - 1 groups */
              {
                      .reg = K3_QOS_GROUP_REG(K3_DSS_MAIN_0_DSS_INST0_VBUSM_DMA, 0),
                      .val = K3_QOS_GROUP_DEFAULT_VAL_LOW,
              },
              {
                      .reg = K3_QOS_GROUP_REG(K3_DSS_MAIN_0_DSS_INST0_VBUSM_DMA, 1),
                      .val = K3_QOS_GROUP_DEFAULT_VAL_HIGH,
              },
       
              /* K3_DSS_MAIN_0_DSS_INST0_VBUSM_FBDC - 1 groups */
              {
                      .reg = K3_QOS_GROUP_REG(K3_DSS_MAIN_0_DSS_INST0_VBUSM_FBDC, 0),
                      .val = K3_QOS_GROUP_DEFAULT_VAL_LOW,
              },
              {
                      .reg = K3_QOS_GROUP_REG(K3_DSS_MAIN_0_DSS_INST0_VBUSM_FBDC, 1),
                      .val = K3_QOS_GROUP_DEFAULT_VAL_HIGH,
              },
       
      ...
      • K3_QOS_REG, K3_QOS_VAL, K3_QOS_GROUP_REG, and K3_QOS_GROUP_DEFAULT_VAL_LOW/HIGH are defined in arch/arm/mach-k3/include/mach/k3-qos.h:
        arch/arm/mach-k3/include/mach/k3-qos.h
        ...
         
        /* K3_QOS_REG: Registers to configure the channel for a given endpoint */
         
        #define K3_QOS_REG(base_reg, i)     (base_reg + 0x100 + (i) * 4)
         
        #define K3_QOS_VAL(qos, orderid, asel, epriority, virtid, atype) \
            (qos        << 0  | \
             orderid    << 4  | \
             asel       << 8  | \
             epriority  << 12 | \
             virtid     << 16 | \
             atype      << 28)
         
        /*
         * K3_QOS_GROUP_REG: Registers to set 1:1 mapping for orderID MAP1/MAP2
         * remap registers.
         */
        #define K3_QOS_GROUP_REG(base_reg, i)   (base_reg + (i) * 4)
         
        #define K3_QOS_GROUP_DEFAULT_VAL_LOW    0x76543210
        #define K3_QOS_GROUP_DEFAULT_VAL_HIGH   0xfedcba98
        struct k3_qos_data {
            u32 reg;
            u32 val;
        };
         
        ...
    4.3.2.1.2. C7x
    4.3.2.1.2.1. Order ID Registers
    Compute Cluster
    Address
    Register
    Bits
    Field
    Description
    COMPUTE_CLUSTER0_C71SS0_DRU_QUEUE 0x68A08000 + formula 0 + (j * 8); where j = 0 to 6 7:4 ORDERID This configures the order ID for QUEUE0.
    COMPUTE_CLUSTER0_C71SS1 0x69A08000 + formula 0 + (j * 8); where j = 0 to 6 7:4 ORDERID This configures the order ID for QUEUE0.
    COMPUTE_CLUSTER0_C71SS2 0x6AA08000 + formula 0 + (j * 8); where j = 0 to 6 7:4 ORDERID This configures the order ID for QUEUE0.
    COMPUTE_CLUSTER0_C71SS3 0x6BA08000 + formula 0 + (j * 8); where j = 0 to 6 7:4 ORDERID This configures the order ID for QUEUE0.
    4.3.2.1.2.2. Code to Set Order ID

    The code that sets the order ID is found in the respective main.c in the vision_apps repository (platform/j784s4/rtos/c7x_4/main.c):

    platform/j784s4/rtos/c7x_4/main.c
    ...
     
    /* DRU configuration */
    uint32_t gDruQoS_Enable    = 1;
    uint32_t gQoS_DRU_Priority = 3;
    uint32_t gQoS_DRU_OrderID  = 0;
     
    void setup_dru_qos(void)
    {
       uint64_t DRU_BASE = CSL_COMPUTE_CLUSTER0_MMR_DRU7_MMR_CFG_DRU_BASE;
       volatile uint64_t* queue0CFG     = (uint64_t*)(DRU_BASE + 0x8000);
     
       if(gQoS_DRU_Priority > 7 || (gDruQoS_Enable == 0))
       {
         gQoS_DRU_Priority = 0;
       }
       if(gQoS_DRU_OrderID > 15 || (gDruQoS_Enable == 0))
       {
         gQoS_DRU_OrderID = 0;
       }
     
       uint64_t queue0CFG_VAL = 0x0;
       queue0CFG_VAL |= ((uint64_t)gQoS_DRU_OrderID)<<4;
       queue0CFG_VAL |= ((uint64_t)gQoS_DRU_Priority);
     
       *queue0CFG = queue0CFG_VAL;
    }
     
    ...

    4.3.2.2. NRT and RT Routing

    Looking at the order IDs for the C7x_4 core (0) and DSS (15) transactions, it is possible to set DSS transactions as RT while keeping the C7x_4 core transactions as NRT.

    It's simple to check whether DSS is routed to the RT thread; just read the NAVSS_NORTH_x_NBSS_NBx_MMRS_threadmap registers (using devmem2).

    Address | Register | Value
    0x03702010 | NAVSS_NORTH_0_NBSS_NB0_MMRS_threadmap | 0x00000002
    0x03703010 | NAVSS_NORTH_1_NBSS_NB1_MMRS_threadmap | 0x00000004

    These values mean that transactions with order ID 10-15 are mapped to the RT thread. In other words, the DSS (order ID 15) transactions should have greater priority than the C7x_4 (order ID 0) transactions at all times.

    4.3.2.2.1. NRT and RT Routing in U-Boot

    For the J784S4, the routing was added into the ti-u-boot-2025.01 branch. In previous releases, the code was added through uncommitted edits packaged within the SDK. The relevant code can be found within:

    • arch/arm/mach-k3/j784s4/j784s4_init.c: route order IDs 10-15 to the RT thread
      arch/arm/mach-k3/j784s4/j784s4_init.c
      ...
       
      /* NAVSS North Bridge (NB) */
      #define NAVSS0_NBSS_NB0_CFG_MMRS                0x03702000
      #define NAVSS0_NBSS_NB1_CFG_MMRS                0x03703000
      #define NAVSS0_NBSS_NB0_CFG_NB_THREADMAP        (NAVSS0_NBSS_NB0_CFG_MMRS + 0x10)
      #define NAVSS0_NBSS_NB1_CFG_NB_THREADMAP        (NAVSS0_NBSS_NB1_CFG_MMRS + 0x10)
      /*
       * Thread Map for North Bridge Configuration
       * Each bit is for each VBUSM source.
       * Bit[0] maps orderID 0-3 to VBUSM.C thread number
       * Bit[1] maps orderID 4-9 to VBUSM.C thread number
       * Bit[2] maps orderID 10-15 to VBUSM.C thread number
       * When bit has value 0: VBUSM.C thread 0 (non-real time traffic)
       * When bit has value 1: VBUSM.C thread 2 (real time traffic)
       */
      #define NB_THREADMAP_BIT0                               BIT(0)
      #define NB_THREADMAP_BIT1                               BIT(1)
      #define NB_THREADMAP_BIT2                               BIT(2)
       
      ...
       
      /* Setup North Bridge registers to map ORDERID 10-15 to RT traffic */
      static void setup_navss_nb(void)
      {
              writel(NB_THREADMAP_BIT1, (uintptr_t)NAVSS0_NBSS_NB0_CFG_NB_THREADMAP);
              writel(NB_THREADMAP_BIT2, (uintptr_t)NAVSS0_NBSS_NB1_CFG_NB_THREADMAP);
      }
       
      ...

    4.3.2.3. Priority

    The priority for each IP is set within its respective driver.

    4.3.2.3.1. DSS
    4.3.2.3.1.1. Priority Registers
    Address | Register | Bits | Field | Description
    0x04A000A4 | DSS_DISPC_0_COMMON_M_DSS_CBA_CFG | 5:3 | PRI_HI | The value sent out on the PRI_HI bus from DSS to CBA. Indicates the priority level for high-priority [MFLAG] transactions. A value of 0x0 indicates the highest priority; a value of 0x7 indicates the lowest priority.
    0x04A000A4 | DSS_DISPC_0_COMMON_M_DSS_CBA_CFG | 2:0 | PRI_LO | The value sent out on the PRI_LO bus from DSS to CBA. Indicates the priority level for normal [non-MFLAG] transactions. A value of 0x0 indicates the highest priority; a value of 0x7 indicates the lowest priority.
    4.3.2.3.1.2. Code to Set Priority

    The code that sets the priority for the DSS is found in its Linux driver (drivers/gpu/drm/tidss/tidss_dispc.c):

    drivers/gpu/drm/tidss/tidss_dispc.c
    ...
     
            u32 cba_lo_pri = 1;
            u32 cba_hi_pri = 0;
     
            dev_dbg(dispc->dev, "%s()\n", __func__);
     
            REG_FLD_MOD(dispc, DSS_CBA_CFG, cba_lo_pri, 2, 0);
            REG_FLD_MOD(dispc, DSS_CBA_CFG, cba_hi_pri, 5, 3);
     
    ...

    This code sets the priority level of MFLAG transactions to 0 and non-MFLAG transactions to 1.

    4.3.2.3.2. C7x
    4.3.2.3.2.1. Priority Registers
    Compute Cluster
    Address
    Register
    Bits
    Field
    Description
    COMPUTE_CLUSTER0_C71SS0_DRU_QUEUE 0x68A08000 + formula 0 + (j * 8); where j = 0 to 6 2:0 PRI This configures the priority for QUEUE0. This will be the priority that will be presented on the External bus for all commands from this queue.
    COMPUTE_CLUSTER0_C71SS1 0x69A08000 + formula 0 + (j * 8); where j = 0 to 6 2:0 PRI This configures the priority for QUEUE0. This will be the priority that will be presented on the External bus for all commands from this queue.
    COMPUTE_CLUSTER0_C71SS2 0x6AA08000 + formula 0 + (j * 8); where j = 0 to 6 2:0 PRI This configures the priority for QUEUE0. This will be the priority that will be presented on the External bus for all commands from this queue.
    COMPUTE_CLUSTER0_C71SS3 0x6BA08000 + formula 0 + (j * 8); where j = 0 to 6 2:0 PRI This configures the priority for QUEUE0. This will be the priority that will be presented on the External bus for all commands from this queue.
    4.3.2.3.2.2. Code to Set Priority

    Like the order ID, the code that sets the priority level is found in the respective main.c in the vision_apps repository (platform/j784s4/rtos/c7x_4/main.c):

    platform/j784s4/rtos/c7x_4/main.c
    ...
     
    /* DRU configuration */
    uint32_t gDruQoS_Enable    = 1;
    uint32_t gQoS_DRU_Priority = 3;
    uint32_t gQoS_DRU_OrderID  = 0;
     
    void setup_dru_qos(void)
    {
       uint64_t DRU_BASE = CSL_COMPUTE_CLUSTER0_MMR_DRU7_MMR_CFG_DRU_BASE;
       volatile uint64_t* queue0CFG     = (uint64_t*)(DRU_BASE + 0x8000);
     
       if(gQoS_DRU_Priority > 7 || (gDruQoS_Enable == 0))
       {
         gQoS_DRU_Priority = 0;
       }
       if(gQoS_DRU_OrderID > 15 || (gDruQoS_Enable == 0))
       {
         gQoS_DRU_OrderID = 0;
       }
     
       uint64_t queue0CFG_VAL = 0x0;
       queue0CFG_VAL |= ((uint64_t)gQoS_DRU_OrderID)<<4;
       queue0CFG_VAL |= ((uint64_t)gQoS_DRU_Priority);
     
       *queue0CFG = queue0CFG_VAL;
    }
     
    ...

    4.3.3.  Editing CoS Mappings

    Since all of the QoS settings are correct and give preference to the DSS transactions (rather than the C7x_4 core transactions), the issue must lie within the DDR controller's priority mappings.

    4.3.3.1. CoS Mapping Registers

    The DDRSS contains a series of muxes to map VBUSM priorities to AXI priorities. The registers controlling the mappings are the following:

    • Route ID filters:
      • emif_ew_sscfg_V2A_R1_MAT_REG: allows for filtering and routing of route IDs to range 1 mappings 
      • emif_ew_sscfg_V2A_R2_MAT_REG: allows for filtering and routing of route IDs to range 2 mappings
      • emif_ew_sscfg_V2A_R3_MAT_REG: allows for filtering and routing of route IDs to range 3 mappings
    • Priority mappings:
      • LPT (low priority thread)
        • emif_ew_sscfg_V2A_LPT_DEF_PRI_MAP_REG: default VBUSM to AXI priority mappings
        • emif_ew_sscfg_V2A_LPT_R1_PRI_MAP_REG: range 1 VBUSM to AXI priority mappings
        • emif_ew_sscfg_V2A_LPT_R2_PRI_MAP_REG: range 2 VBUSM to AXI priority mappings
        • emif_ew_sscfg_V2A_LPT_R3_PRI_MAP_REG: range 3 VBUSM to AXI priority mappings
      • HPT (high priority thread)
        • emif_ew_sscfg_V2A_HPT_DEF_PRI_MAP_REG: default VBUSM to AXI priority mappings
        • emif_ew_sscfg_V2A_HPT_R1_PRI_MAP_REG: range 1 VBUSM to AXI priority mappings
        • emif_ew_sscfg_V2A_HPT_R2_PRI_MAP_REG: range 2 VBUSM to AXI priority mappings
        • emif_ew_sscfg_V2A_HPT_R3_PRI_MAP_REG: range 3 VBUSM to AXI priority mappings

    The exact register addresses and fields can be found within the TRM.

    Take a hypothetical LPT transaction. If its route ID falls within a filter, say range 1, it will have its priority mapped using the emif_ew_sscfg_V2A_LPT_R1_PRI_MAP_REG mappings. If it doesn't fall within any filter, it will have its priority mapped using the emif_ew_sscfg_V2A_LPT_DEF_PRI_MAP_REG register. This muxing is represented in Figure 8-7.

    4.3.3.2. Checking CoS Mappings

    Let's reiterate the QoS settings for the DSS and C7x_4 core transactions:

    • DSS:
      • Route ID: 0xA20
      • Order ID: 0x0F
      • NRT or RT: RT
      • Priority: 0x00 or 0x01
    • C7x_4 core:
      • Route ID: 0x02D
      • Order ID: 0x00
      • NRT or RT: NRT
      • Priority: 0x03

    The following table contains the values of the CoS registers (the register groups are terms I came up with to categorize the registers):

    Register Group | Register | Value
    Route ID filters | emif_ew_sscfg_V2A_R1_MAT_REG | 0x00000000
    Route ID filters | emif_ew_sscfg_V2A_R2_MAT_REG | 0x00000000
    Route ID filters | emif_ew_sscfg_V2A_R3_MAT_REG | 0x00000000
    LPT priority mappings | emif_ew_sscfg_V2A_LPT_DEF_PRI_MAP_REG | 0x00000000
    LPT priority mappings | emif_ew_sscfg_V2A_LPT_R1_PRI_MAP_REG | 0x23456677
    LPT priority mappings | emif_ew_sscfg_V2A_LPT_R2_PRI_MAP_REG | 0x23456677
    LPT priority mappings | emif_ew_sscfg_V2A_LPT_R3_PRI_MAP_REG | 0x23456677
    HPT priority mappings | emif_ew_sscfg_V2A_HPT_DEF_PRI_MAP_REG | 0x00000000
    HPT priority mappings | emif_ew_sscfg_V2A_HPT_R1_PRI_MAP_REG | 0x00112345
    HPT priority mappings | emif_ew_sscfg_V2A_HPT_R2_PRI_MAP_REG | 0x00112345
    HPT priority mappings | emif_ew_sscfg_V2A_HPT_R3_PRI_MAP_REG | 0x00112345

    None of the Route ID filters are enabled. This means that all priorities will be mapped using the default priority mappings.

    The default priority mappings for both the LPT and HPT transactions are equalized to 0. This means that all transactions have the same priority and explains the behavior that we've seen.

    Due to the priority mappings, both the DSS and C7x_4 core transactions have the same priorities within the DDR; this is how the C7x_4 core transactions are stalling the DSS transactions.

    4.4. Fixing the DSS Sync Losts

    Now that the issue has been root caused, we need to decide how to fix it.

    Equalizing all of the priorities within the DDR controller has a benefit: no threads are evicted when a higher-priority thread enters the queue. Honoring priorities adds latency to the system, but in this case, not honoring them does more harm than good.

    There are a couple of options to fix the sync lost issue:

    1. Remap C7x_4 core transactions

    2. Honor priority across all transactions

    Patches are provided for both the ti-u-boot-2023.04 (SDK 9.2 release) and ti-u-boot-2025.01 (most recent at time of writing) branches.

    4.4.1. Remap C7x_4 Core Transactions

    The following U-Boot patches map the C7x_4 core transactions to the LPT range 1 priority mappings. This avoids changing the priorities of any other transactions.

    To map all C7x transactions to the LPT range 1 priority mappings instead, replace 0xf02d0000 with 0x80208028.

    4.4.1.1. ti-u-boot-2023.04

    https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/791/3833.0001_2D00_arm_2D00_mach_2D00_k3_2D00_j784s4_2D00_Remove_2D00_priority_2D00_equalization_2D00_for_2D00_.patch

    4.4.1.2. ti-u-boot-2025.01

    https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/791/7658.0001_2D00_arm_2D00_mach_2D00_k3_2D00_j784s4_2D00_Remove_2D00_priority_2D00_equalization_2D00_for_2D00_.patch

    4.4.2. Honor All Priorities

    The following U-Boot patches change the default LPT and HPT priority mappings to match the range 1-3 mappings. This is also the default for the J721E.

    4.4.2.1. ti-u-boot-2023.04

    https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/791/5734.0001_2D00_arm_2D00_mach_2D00_k3_2D00_j784s4_2D00_Remove_2D00_priority_2D00_equalization_2D00_and_2D00_.patch

    4.4.2.2. ti-u-boot-2025.01

    https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/791/1362.0001_2D00_arm_2D00_mach_2D00_k3_2D00_j784s4_2D00_Remove_2D00_priority_2D00_equalization_2D00_and_2D00_.patch