TDA4VM: TDA4VM video abnormality issue

Part Number: TDA4VM

Hi TI team,

On the TDA4VM T5 machine, there was a video abnormality issue.

The pvr debug log is attached.

We are using SDK 8.00 (with the tivx_queue.c and dss_m2mApi.c bugs fixed).

Can you help investigate what the problem is? 

Best Regards,

xftupvr.zip

  • Hello,

    Thank you for the information. It looks like this is a follow-on to the previous work done. Can we put together a summary of the state of the GPU driver we are seeing here?

    1) Are you enabling the PHR (Periodic Hardware Recovery)?

    2) Do you see any abnormal GPU hardware recovery log when you see this corruption?

    3) How frequent is this video abnormality?

    4) How many different boards/SoC show this behavior?

    Regards,

    Erick

  • Hello, Erick,

    The answers to these questions are as follows:

    1) Are you enabling the PHR (Periodic Hardware Recovery)?

    ----Yes.

    2) Do you see any abnormal GPU hardware recovery log when you see this corruption?

    ----I did not see any abnormal GPU hardware recovery log.

    3) How frequent is this video abnormality?

    ----The board with a high frequency of problems has been used for approximately 500 hours and has encountered problems twice.

    4) How many different boards/SoC show this behavior?

    ----There are two.

    Best Regards,

    xftu

  • Hello,

    At first glance this does not seem like a GPU issue; we usually see a GPU hardware recovery when there are issues with the GPU. Can you please confirm that this is related to the GPU by analyzing the input frame and output frame from the GPU when the error occurs? Has this analysis already been done?

    3) How frequent is this video abnormality?

    ----The board with a high frequency of problems has been used for approximately 500 hours and has encountered problems twice.

    4) How many different boards/SoC show this behavior?

    ----There are two.

    Two boards out of how many? Are you testing many more boards in production?

    Regards,

    Erick

  • Hello,

    When the problem occurred, the “HWR Event Count” had reached 28486. Does this mean the boards experienced GPU hardware recovery? I have analyzed the input and output frames of the GPU: the input frame is normal and the output frame from the GPU is abnormal, so the issue appears to be related to the GPU.

    GPU debug log: 2210.pvr.zip

    Approximately 100 boards. We are testing many more boards in production.

    Best Regards,

    xftu
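The input/output frame check described in this reply could be automated along the following lines. This is only a sketch under stated assumptions: frames are dumped as raw byte buffers of equal size, and a known-good reference output exists for the same input. The names `diff_ratio` and `is_abnormal` and the 1% threshold are hypothetical and are not part of the TI SDK or the GPU driver.

```python
def diff_ratio(reference: bytes, candidate: bytes) -> float:
    """Return the fraction of bytes that differ between two raw frame dumps."""
    if len(reference) != len(candidate):
        raise ValueError("frame dumps must have the same size")
    if not reference:
        return 0.0
    differing = sum(1 for a, b in zip(reference, candidate) if a != b)
    return differing / len(reference)


def is_abnormal(reference: bytes, candidate: bytes, threshold: float = 0.01) -> bool:
    """Flag a frame whose difference from the reference exceeds the threshold.

    The threshold is an assumed value; pick one that matches how much
    variation a correct render can legitimately show.
    """
    return diff_ratio(reference, candidate) > threshold


# Synthetic example: a 100-byte "frame" in which the last 10 bytes are corrupted.
good_frame = bytes(100)
bad_frame = bytes(90) + b"\xff" * 10
print(diff_ratio(good_frame, bad_frame), is_abnormal(good_frame, bad_frame))
```

Running the same comparison on the GPU input side as well helps separate corruption that arrives from upstream from corruption introduced during rendering.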

  • Xftu,

    The HWR event counter measures the number of times HWRs have happened. Since you are periodically issuing the HWR through the PHR, we would expect this counter to continue increasing.

    To summarize, 2 of about 100 boards show this issue, and the most frequently affected board has reproduced it twice in approximately 500 hours of use.

    I'll need to check with IMG on how this issue can be debugged while the PHR is enabled. I'm sending them this information.

    Regards,

    Erick
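To put the reported reproducibility in perspective, here is a rough back-of-the-envelope sketch. It assumes failures arrive at a constant average rate (a Poisson model), which is an assumption of this example and not something established in the thread. With only two observed events the estimate is very uncertain. The function names are hypothetical.

```python
import math


def failure_rate_per_hour(failures: int, hours: float) -> float:
    """Point estimate of the failure rate from observed failures over observed hours."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return failures / hours


def prob_at_least_one(rate_per_hour: float, test_hours: float) -> float:
    """Probability of seeing at least one failure in test_hours under a Poisson model."""
    return 1.0 - math.exp(-rate_per_hour * test_hours)


# Figures from the thread: two occurrences in roughly 500 hours on one board.
rate = failure_rate_per_hour(2, 500.0)
mean_hours_between = 1.0 / rate
p_week = prob_at_least_one(rate, 7 * 24)

print(f"mean hours between failures: {mean_hours_between:.0f}")
print(f"chance of at least one failure in a one-week run: {p_week:.2f}")
```

Under these assumptions a single board has a little under a 50% chance of showing the issue in a week of continuous testing, which is why each debug iteration is slow unless several affected boards run in parallel.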

  • Hello, Erick,

    Is there a response from IMG?

    Best Regards,

    xftu

  • xftu,

    We were gathering data from our previous experiments and your current setup to decide on the next steps. My first suggestion is to take the latest GPU drivers, which include all the patches for bugs that we have found until now. They also contain a critical bug fix in the GPU kernel driver which fixes the cache coherency bug that required the QoS register workaround.

    Would you be open to upgrading to the GPU 1.15 driver, or do you need to stay with the 1.13 driver?

    If you would be willing to upgrade the driver to the 1.15 version, I have posted instructions here: e2e.ti.com/.../faq-tda4vl-q1-what-are-the-gpu-driver-bug-fixes-for-sdk-8-6-or-earlier

    The other option is to update the 1.13 driver with the required patches. The investigation will be tedious due to the reproducibility rate you have mentioned. In the past, I understood that some issues showed much more frequently for you in the 1.15 driver. If the issue takes on the order of 500 hours to reproduce, we won't be able to get many logs for a timely debug.

    How would you like to proceed?

    Regards,

    Erick

  • Hello, Erick,

    Considering that we have previously attempted to use version 1.15 of the GPU driver on our platform, but it did not go smoothly, we would like to update version 1.13 with the required patch.

    Best Regards,

    xftu

  • xftu,

    OK, the first step in this activity will be to update your 1.13 GPU Kernel driver by adding this patch: CL6529840_Enable_cached_mappings_in_KM_on_ARM64_for_DDK_1.13.patch

    You will also need to revert the QoS settings workaround, since this patch addresses the underlying bug.

    Can you please do this first, and replicate the issue?

    Regards,

    Erick

  • Hello, Erick,

    OK, I will follow these two guidelines to update the software and then replicate the issue.

    Best Regards,

    xftu

  • xftu,

    Ok, thank you. Did the patch apply correctly? Or were there issues?

    Thanks,

    Erick

  • Hello, Erick,

    We attempted to apply the patch you provided using the patch command but failed with the following error message:

    So we manually made the modifications based on the patch content you provided. The code modifications are shown in the "enable_cached_map_240126.patch" file (the file has been compressed): enable_cached_map_240126.zip.

    Can you check if this modification is correct? If it is not correct, we need your help to provide another patch.

    Best Regards,

    xftu

  • Hi xftu,

    Yes, this application of the patch looks good. The variable name PhysMemValidateParams changed to _ValidateParams; otherwise the patch is the same. Please let me know how the testing goes.

    Regards,

    Erick

  • Hello, Erick,

    After patching, the issue still occurred after about a week of testing. At the same time, we are constantly iterating on the app code, and the problem cannot be reproduced with the new version of the app software. So we will focus on investigating the app code next. Thank you.

    Best Regards,

    xftu