
[FAQ] TDA4VM: Are there any known bugs and patches that I should use in my GPU driver?

Part Number: TDA4VM

I've encountered a GPU error. Are there known issues with my GPU driver version, and what steps should I take to fix them?

  • THIS GUIDE IS NOW OBSOLETE. FIXES FOR THE BUGS DESCRIBED HERE, FOR SDK 8.6 AND OLDER, ARE NOW COLLECTED IN THIS FAQ: https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1316731/faq-tda4vl-q1-what-are-the-gpu-driver-bug-fixes-for-sdk-8-6-or-earlier


    The GPU driver for the TDA4x and AM6x devices is available in the Linux SDK release. Over the life cycle of each release, various GPU driver bugs and system-level issues are identified that need to be addressed if you do not move to the latest SDK release. There are several reasons to stay on an older SDK release, such as a production freeze or external software dependencies.

    For GPU driver-related topics, I will keep this page updated with the latest patches for known issues and their descriptions. Since most questions come from SDK releases after SDK 7.0, the earliest GPU driver version covered here is 1.13.

    For more information on the GPU driver basics, please see this FAQ on GPU 101: <Link to FAQ>

    Critical Bug - GPU Cache Coherency Issue

    Affected Versions: 1.15 or earlier

    A known bug in earlier versions of the GPU driver is that the driver has a different view of the cache policy than the Linux kernel. It affects devices with an MSMC (L3 cache) and a Snoop Filter. In some cases, the GPU driver may allocate memory from the kernel with one cache policy but then use it in a way that conflicts with the original allocation. The result is random kernel panics or other unexpected interactions with the kernel. The following command can be used to check whether the patches were applied correctly; it does not pass when the patches are not in place:

    rgx_kicksync_test -ver -nc 16 -loop 100 -n 10000 -r -seed 81576
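If you want to run this check from a script, a minimal wrapper sketch follows. It assumes, without confirmation from this FAQ, that `rgx_kicksync_test` exits with a nonzero status when the run does not pass; if that is not the case on your build, inspect its printed output instead. The function name `run_coherency_check` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the coherency check command from this FAQ.
# ASSUMPTION: rgx_kicksync_test returns a nonzero exit status on failure.
run_coherency_check() {
	if rgx_kicksync_test -ver -nc 16 -loop 100 -n 10000 -r -seed 81576; then
		echo "PASS: cache coherency check completed"
		return 0
	else
		echo "FAIL: cache coherency check did not pass (patches missing?)" >&2
		return 1
	fi
}
```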

    To mitigate this issue, the two patches below were issued. Both change the memory cache policy, but at different levels of the hardware/software stack, and one is more effective than the other. The L3 cache must also be disabled, as it conflicts with the patches.

    GPU Kernel Driver Patch – Covers the majority of transactions, but not all

    In SDK 8.5, we introduced a kernel driver patch that covers the majority of GPU memory allocations. The patch is titled “HACK: server: Make CCB allocations incoherent” and is included in both SDK 8.5 and SDK 8.6. While this patch fixes the issue in most cases, the bug can still occur depending on how the application is implemented. If you are facing issues on GPU driver version 1.15 or earlier, please revert this patch and use the following patch instead, which covers all cases.
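Reverting the patch could look roughly like the sketch below, assuming the GPU kernel driver source is a git checkout in which the patch was committed with exactly this subject line. The function names are hypothetical; matching on the exact subject avoids picking up a later “Revert ...” commit.

```shell
#!/usr/bin/env bash
# Sketch: locate and revert the SDK 8.5/8.6 CCB kernel driver patch.
# ASSUMPTION: the driver source is a git repo containing a commit with
# exactly this subject line.
CCB_PATCH_SUBJECT='HACK: server: Make CCB allocations incoherent'

# Print the hash of every commit whose subject equals CCB_PATCH_SUBJECT.
find_ccb_patch() {
	git -C "$1" log --format='%H%x09%s' |
		awk -F'\t' -v s="$CCB_PATCH_SUBJECT" '$2 == s { print $1 }'
}

# Revert the most recent matching commit; fail if none is found.
revert_ccb_patch() {
	local repo="$1" commit
	commit=$(find_ccb_patch "$repo" | head -n 1)
	if [ -z "$commit" ]; then
		echo "CCB patch not found in $repo" >&2
		return 1
	fi
	git -C "$repo" revert --no-edit "$commit"
}
```

For example, `revert_ccb_patch path/to/gpu-kernel-driver` creates a revert commit that you can then rebuild from.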

    GPU QoS Register Patch - Covers all GPU transactions

    The SoC contains QoS registers that control a variety of attributes of GPU transactions in the system. One of these is the atype register, which configures the “type” of each GPU transaction. The details are outside the scope of this document, but setting the atype register to “3” makes the transactions bypass the Snoop Filter, which avoids the issue. The QoS register settings must be applied before the GPU driver is loaded in Linux. There are two alternatives:

    1) Apply the register settings in u-boot. This guarantees that the settings are applied before the Linux kernel boots, and therefore before the GPU driver is loaded.

    2) Apply the register settings from the Linux command line. This is possible, but it requires the GPU driver to be blacklisted so that it does not auto-load before the settings are applied.
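To illustrate how the atype setting of “3” relates to the value 0x30000000 written by the scripts in this FAQ, here is a small helper. The bit position is an assumption inferred only from that value (3 << 28 = 0x30000000); confirm the register layout in the device TRM. Note that writing the full 32-bit value, as the scripts here do, also sets every other bit of the register to 0. `qos_value_for_atype` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: build a QoS register value from an atype setting.
# ASSUMPTION: the atype field starts at bit 28, inferred from 0x30000000.
ATYPE_SHIFT=28

qos_value_for_atype() {
	local atype="$1"
	# Shift atype into place; all other register bits are left at 0.
	printf '0x%08X\n' $(( atype << ATYPE_SHIFT ))
}
```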

    Below are instructions on how to implement both options.

    U-boot Patch:

    Apply the following patch to u-boot.

    J721E:

    0001-HACK-j721e-QoS-workaround-for-GPU-cache-incoherency.patch

    J721S2:

    3056.0001-HACK-j721s2-QoS-workaround-for-GPU-cache-incoherency.patch
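Applying the attachment to a u-boot tree could be sketched as follows. This assumes the patch files are in git format-patch form (as their 0001- prefix suggests) and have been downloaded into the current directory under the names above; the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: apply the matching QoS workaround patch to a u-boot git tree.
# ASSUMPTION: the attachments sit in the current directory under these names.
uboot_qos_patch_for() {
	case "$1" in
		j721e)  echo "0001-HACK-j721e-QoS-workaround-for-GPU-cache-incoherency.patch" ;;
		j721s2) echo "3056.0001-HACK-j721s2-QoS-workaround-for-GPU-cache-incoherency.patch" ;;
		*)      echo "unsupported SoC: $1" >&2; return 1 ;;
	esac
}

apply_uboot_qos_patch() {
	local uboot_dir="$1" soc="$2" patch
	patch=$(uboot_qos_patch_for "$soc") || return 1
	# Use an absolute path because git -C changes the working directory.
	git -C "$uboot_dir" am "$PWD/$patch"
}
```

For example, `apply_uboot_qos_patch ~/ti-u-boot j721s2`, then rebuild and deploy u-boot as usual.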

    To verify that the patch was applied successfully, read back any of the modified registers; each should contain 0x30000000.
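A read-back check could be scripted along these lines. It assumes devmem2 in read mode prints the register value as the last token of its final output line (for example `... : 0x30000000`), which matches common devmem2 builds but is not confirmed by this FAQ. `verify_qos_regs` is a hypothetical helper that takes a base address and a register count.

```shell
#!/usr/bin/env bash
# Sketch: read back QoS registers with devmem2 and compare to 0x30000000.
# ASSUMPTION: the last token of devmem2's last output line is the value read.
EXPECTED=0x30000000

# verify_qos_regs BASE COUNT: check COUNT consecutive 32-bit registers.
verify_qos_regs() {
	local base="$1" count="$2" i addr value bad=0
	for (( i = 0; i < count; i++ )); do
		addr=$(( base + i * 4 ))
		value=$(devmem2 "$addr" w | tail -n 1 | awk '{ print $NF }')
		# Case-insensitive string compare; empty or garbled output counts as a mismatch.
		if [[ "${value^^}" != "${EXPECTED^^}" ]]; then
			printf 'mismatch at 0x%X: %s\n' "$addr" "$value" >&2
			bad=1
		fi
	done
	return "$bad"
}
```

For example, run as root, `verify_qos_regs 0x45DC5100 8` checks the eight J721E registers written by the Linux command line script.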

    Linux Command Line Patch:

    This workaround will require that the GPU driver be blacklisted so that it is not auto-loaded. Please follow the instructions in this FAQ on how to blacklist the GPU driver and load it manually: https://e2e.ti.com/f/791/t/1218307
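The linked FAQ remains the reference for the TI-specific steps. As a generic illustration only, the standard modprobe.d mechanism and a load-order guard could look like this. The module name `pvrsrvkm` is an assumption (confirm it with `lsmod`), and if your filesystem's init scripts call `modprobe` on the module explicitly, a blacklist entry alone will not stop them. The function and variable names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: modprobe.d blacklist plus a guard for the QoS load order.
# ASSUMPTION: the GPU kernel module is named pvrsrvkm.
GPU_MODULE="${GPU_MODULE:-pvrsrvkm}"
MODPROBE_DIR="${MODPROBE_DIR:-/etc/modprobe.d}"
SYSFS_MODULE_DIR="${SYSFS_MODULE_DIR:-/sys/module}"

# Prevent alias-based auto-loading (for example by udev) of the GPU module.
blacklist_gpu_module() {
	mkdir -p "$MODPROBE_DIR"
	echo "blacklist $GPU_MODULE" > "$MODPROBE_DIR/$GPU_MODULE-blacklist.conf"
}

# Succeed only while the GPU module is not yet loaded.
require_gpu_driver_unloaded() {
	if [ -d "$SYSFS_MODULE_DIR/$GPU_MODULE" ]; then
		echo "error: $GPU_MODULE already loaded; apply QoS settings first" >&2
		return 1
	fi
}

# Load by name after the QoS registers are written; an explicit
# modprobe by module name is not blocked by the blacklist entry.
load_gpu_module() {
	modprobe "$GPU_MODULE"
}
```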

    Once you have booted to the Linux console and logged in, please run the following script for your device:

    J721E

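    #!/bin/bash
    # Write atype = 3 (0x30000000) into the eight J721E GPU QoS registers at
    # 0x45DC5100-0x45DC511C. Must be run as root (devmem2 needs /dev/mem).
    
    devmem2 0x45dc5100 w 0x30000000
    devmem2 0x45dc5104 w 0x30000000
    devmem2 0x45dc5108 w 0x30000000
    devmem2 0x45dc510C w 0x30000000
    devmem2 0x45dc5110 w 0x30000000
    devmem2 0x45dc5114 w 0x30000000
    devmem2 0x45dc5118 w 0x30000000
    devmem2 0x45dc511C w 0x30000000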

    J721S2

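    #!/usr/bin/env bash
    # Write atype = 3 (0x30000000) into the 160 J721S2 GPU QoS registers starting
    # at BASE_ADDR_READ and the 160 starting at BASE_ADDR_WRITE. Must be run as root.
    
    loops=160
    
    BASE_ADDR_READ=0x45DC5100
    BASE_ADDR_WRITE=0x45DC5900
    
    for (( i = 0; i < loops; i++ )); do
    	# Registers are 32 bits wide, so consecutive addresses are 4 bytes apart.
    	devmem2 "$(( BASE_ADDR_READ + i * 4 ))" w 0x30000000
    	devmem2 "$(( BASE_ADDR_WRITE + i * 4 ))" w 0x30000000
    done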
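Both scripts above can be folded into a single helper with a dry-run mode, which lets you review the exact devmem2 commands before writing to hardware. This is a hypothetical restructuring, not a TI-provided script: `DRY_RUN`, `emit_qos_writes`, and `apply_qos_workaround` are made-up names, and for J721S2 it writes all registers at 0x45DC5100 before those at 0x45DC5900 rather than interleaving them, which leaves the same final register contents.

```shell
#!/usr/bin/env bash
# Sketch: shared QoS workaround helper with a dry-run mode (hypothetical names).
QOS_VALUE=0x30000000

# emit_qos_writes BASE COUNT: write QOS_VALUE to COUNT consecutive registers,
# or only print the devmem2 commands when DRY_RUN=1.
emit_qos_writes() {
	local base="$1" count="$2" i addr
	for (( i = 0; i < count; i++ )); do
		printf -v addr '0x%08X' $(( base + i * 4 ))
		if [ "${DRY_RUN:-0}" = 1 ]; then
			echo "devmem2 $addr w $QOS_VALUE"
		else
			devmem2 "$addr" w "$QOS_VALUE"
		fi
	done
}

# Register ranges taken from the J721E and J721S2 scripts in this FAQ.
apply_qos_workaround() {
	case "$1" in
		j721e)  emit_qos_writes 0x45DC5100 8 ;;
		j721s2) emit_qos_writes 0x45DC5100 160
		        emit_qos_writes 0x45DC5900 160 ;;
		*)      echo "unsupported SoC: $1" >&2; return 1 ;;
	esac
}
```

For example, `DRY_RUN=1 apply_qos_workaround j721s2` prints the 320 commands; running it as root without `DRY_RUN` performs the writes.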

    Other known bugs

    There are other known bugs in these DDK versions. Below are updated GPU libraries for the respective GPU driver you are using (1.13 or 1.15). To apply the patches, replace the GPU libraries in your filesystem with the ones below.

    GPU Driver 1.15

    Contains 3 patches:

    1. Fix_use_of_incorrect_3D_buffers
    2. Fix_TA_kick_occasionally_being_missed
    3. Reduce_tiles_in_flight_to_3

    J721E - ti-img-1.15-patched-umlibs-j721e.tar.xz

    J721S2 - ti-img-1.15-patched-umlibs-j721s2.tar.xz
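Before replacing the libraries, it is worth backing up the files that will be overwritten. The sketch below assumes the archive member paths are relative to the root of the filesystem (for example `usr/lib/...`); check with `tar -tf` first, because this FAQ does not describe the archive layout. `install_patched_umlibs` and its default backup directory are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: back up files a patched-umlibs tarball would overwrite, then extract it.
# ASSUMPTION: archive member paths are relative to the rootfs root.
install_patched_umlibs() {
	local tarball="$1" root="${2:-/}" backup="${3:-/tmp/umlibs-backup}" f
	mkdir -p "$backup"
	# GNU tar detects xz compression automatically when reading an archive
	# file (the xz tool must be installed).
	while IFS= read -r f; do
		case "$f" in */) continue ;; esac   # skip directory entries
		if [ -e "$root/$f" ] || [ -L "$root/$f" ]; then
			mkdir -p "$backup/$(dirname "$f")"
			cp -a "$root/$f" "$backup/$f"   # -a keeps symlinks as symlinks
		fi
	done < <(tar -tf "$tarball")
	tar -xf "$tarball" -C "$root"
}
```

For example, as root: `install_patched_umlibs ti-img-1.15-patched-umlibs-j721e.tar.xz / /home/root/umlibs-backup`.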