
AM335x garbled LCD and SGX messages

Other Parts Discussed in Thread: AM3354, DA8XX, STRIKE

I am working on a board that uses a SOM with an AM3354. It has an LCD touchscreen driven by a Qt app; there is no X server. I am seeing a behavior where, apparently at random, the LCD gets totally garbled and cycles through what looks like random data.

After a few minutes of going through this, the system recovers and continues to function normally. Whenever this happens, I see a lot of SGX messages being dumped to the kernel log:

[ 2624.874797] PVR_K: User requested SGX debug info
[ 2624.879696] PVR_K: SGX debug (SGX_DDK_Linux_CustomerTI sgxddk 1.10@2359475)
[ 2624.887071] PVR_K: Flip Command Complete Data 0 for display device 11:
[ 2624.893958] PVR_K: SRC 0: ROC DevVAddr:0xD8010DC ROP:0x53df ROC:0x53dc, WOC DevVAddr:0xD8010CC WOP:0x29f0 WOC:0x29ef
[ 2624.905153] PVR_K: SRC 1: ROC DevVAddr:0xD8010B4 ROP:0x53de ROC:0x53db, WOC DevVAddr:0xD8010A4 WOP:0x29ef WOC:0x29ee
[ 2624.916345] PVR_K: Host Ctl flags= 00000006
[ 2624.920840] PVR_K: SGX Host control:
[ 2624.924609] PVR_K: (HC-0) 0x00000001 0x00000000 0x00000000 0x00000001
[ 2624.931502] PVR_K: (HC-10) 0x00000000 0x00000000 0x0000000A 0x00030D40
[ 2624.938485] PVR_K: (HC-20) 0x00000000 0x00000004 0x00000000 0x00000000
[ 2624.945468] PVR_K: (HC-30) 0x00000000 0x00279E4B 0xEFD0A0E0 0x00000000
[ 2624.952437] PVR_K: (HC-40) 0x00000000 0x00000000 0x0006131B 0x00000000
[ 2624.959412] PVR_K: SGX TA/3D control:
[ 2624.963273] PVR_K: (T3C-0) 0x0F003000 0x0F003120 0x0F002000 0x00000000
[ 2624.970253] PVR_K: (T3C-10) 0x00000000 0x00000000 0x00000002 0x00000000
[ 2624.977325] PVR_K: (T3C-20) 0x00000000 0x00000000 0x00000000 0x00000000
[ 2624.984386] PVR_K: (T3C-30) 0x00000000 0x00000000 0x00000000 0x00000000
[ 2624.991457] PVR_K: (T3C-40) 0x00000000 0x00000000 0x00000000 0x00000000
[ 2624.998528] PVR_K: (T3C-50) 0x00000000 0x00000000 0x00000000 0x00000000
[ 2625.005601] PVR_K: (T3C-60) 0x00000000 0x00000000 0x00000000 0x00000000
[ 2625.012661] PVR_K: (T3C-70) 0x00000000 0x00000000 0x00000000 0x00000000
[ 2625.019734] PVR_K: (T3C-80) 0x00000000 0x00000000 0x0F0AD518 0x0F000000
[ 2625.026809] PVR_K: (T3C-90) 0x9C2F3000 0x0F095DC0 0x00000000 0x0F08B920
[ 2625.033870] PVR_K: (T3C-A0) 0x0F00AEA0 0x0F0AD55C 0x0F08B920 0x00000000
[ 2625.040941] PVR_K: (T3C-B0) 0x00000000 0x00000000 0x00000000 0x00000000
[ 2625.048015] PVR_K: (T3C-C0) 0x00000000 0x00000000 0x000053DD 0x000053DC
[ 2625.055092] PVR_K: (T3C-D0) 0x0F000000 0x8000B000 0x8004B000 0x0F004000
[ 2625.062153] PVR_K: (T3C-E0) 0x0F00A420 0x0F00A740 0x0F08B000 0x0F08B000
[ 2625.069228] PVR_K: (T3C-F0) 0x00000000 0x000001FD 0x000001FD 0x00000000
[ 2625.076302] PVR_K: (T3C-100) 0x00000003 0x00000000 0x00000000 0x00000001
[ 2625.083454] PVR_K: (T3C-110) 0x00000000 0x00000000 0x00000000 0x00000000
[ 2625.090613] PVR_K: SGX Kernel CCB WO:0x94 RO:0x94
[ 2683.954773] PVR_K: User requested SGX debug info
[ 2683.959687] PVR_K: SGX debug (SGX_DDK_Linux_CustomerTI sgxddk 1.10@2359475)
[ 2683.967052] PVR_K: Flip Command Complete Data 0 for display device 11:
[ 2683.973942] PVR_K: SRC 0: ROC DevVAddr:0xD8010DC ROP:0x53f3 ROC:0x53f0, WOC DevVAddr:0xD8010CC WOP:0x29fa WOC:0x29f9
[ 2683.985141] PVR_K: SRC 1: ROC DevVAddr:0xD8010B4 ROP:0x53f2 ROC:0x53ef, WOC DevVAddr:0xD8010A4 WOP:0x29fa WOC:0x29f8
[ 2683.996339] PVR_K: Host Ctl flags= 00000006
[ 2684.000836] PVR_K: SGX Host control:
[ 2684.004606] PVR_K: (HC-0) 0x00000001 0x00000000 0x00000000 0x00000001
[ 2684.011501] PVR_K: (HC-10) 0x00000000 0x00000000 0x0000000A 0x00030D40
[ 2684.018484] PVR_K: (HC-20) 0x00000000 0x00000004 0x00000000 0x00000000
[ 2684.025468] PVR_K: (HC-30) 0x00000000 0x00288513 0xEFD0A0E0 0x00000000
[ 2684.032439] PVR_K: (HC-40) 0x00000000 0x00000000 0x000614E1 0x00000000
[ 2684.039416] PVR_K: SGX TA/3D control:
[ 2684.043278] PVR_K: (T3C-0) 0x0F003000 0x0F003120 0x0F002000 0x00000000
[ 2684.050264] PVR_K: (T3C-10) 0x00000000 0x00000000 0x00000002 0x00000000
[ 2684.057340] PVR_K: (T3C-20) 0x00000000 0x00000000 0x00000000 0x00000000
[ 2684.064402] PVR_K: (T3C-30) 0x00000000 0x00000000 0x00000000 0x00000000
[ 2684.071479] PVR_K: (T3C-40) 0x00000000 0x00000000 0x00000000 0x00000000
[ 2684.078555] PVR_K: (T3C-50) 0x00000000 0x00000000 0x00000000 0x00000000
[ 2684.085629] PVR_K: (T3C-60) 0x00000000 0x00000000 0x00000000 0x00000000
[ 2684.092690] PVR_K: (T3C-70) 0x00000000 0x00000000 0x00000000 0x00000000
[ 2684.099765] PVR_K: (T3C-80) 0x00000000 0x00000000 0x0F0AD55C 0x0F000000
[ 2684.106841] PVR_K: (T3C-90) 0x9C2F3000 0x0F094A00 0x00000000 0x0F08B4C0
[ 2684.113902] PVR_K: (T3C-A0) 0x0F00AEA0 0x0F0AD55C 0x0F08B920 0x00000000
[ 2684.120977] PVR_K: (T3C-B0) 0x00000000 0x00000000 0x00000000 0x00000000
[ 2684.128053] PVR_K: (T3C-C0) 0x00000000 0x00000000 0x000053F2 0x000053F0
[ 2684.135130] PVR_K: (T3C-D0) 0x0F000000 0x8000B000 0x8004B000 0x0F004000
[ 2684.142193] PVR_K: (T3C-E0) 0x0F00A420 0x0F00A740 0x0F08B000 0x0F08B000
[ 2684.149268] PVR_K: (T3C-F0) 0x00000000 0x000001FD 0x000001FD 0x00000000
[ 2684.156342] PVR_K: (T3C-100) 0x00000003 0x00000000 0x00000000 0x00000000
[ 2684.163496] PVR_K: (T3C-110) 0x00000000 0x00000000 0x00000000 0x00000000
[ 2684.170658] PVR_K: SGX Kernel CCB WO:0x3B RO:0x3B

 I have tried exhaustively to reproduce this behavior, but I simply cannot trigger it at will; it seems to be completely non-deterministic. I would like to understand the messages the kernel spits out, as that may help me debug this.

Thank you.

  • Hi,

    Can you please post which SDK version you use?

    Have you tried configuring more VRAM? See: e2e.ti.com/.../178539

    Best Regards,
    Yordan
  • Graphics SDK version is 5.01.01.02.
    Kernel is from TI SDK 8.0 (3.14.26).

    I don't have a VRAM-related option set in the kernel configuration, and no vram option in the kernel arguments either. The screen size is 480x272.
  • Hi, I'm taking over for Bruno.

    Recapitulating:

    - we are using kernel 3.14

    - we are using Graphics SDK 5.01.01.02

    - we are using Qt 5.4 (QML, with eglfs backend) and WindowSystem=libpvrPVR2D_FLIPWSEGL.so.1 in /etc/powervr.ini

    We are still seeing this issue, and I got very hopeful when I saw our sgx_flip_test fail in the way described here: https://e2e.ti.com/support/arm/sitara_arm/f/791/t/438583 .

    However, even with the FBDEV numbers bumped to 4, which takes the buffer size from

    # fbset -i
    
    mode "480x272"
        geometry 480 272 480 816 32
        timings 0 0 0 0 0 0 0
        accel true
        rgba 8/16,8/8,8/0,8/24
    endmode
    
    Frame buffer device information:
        Name        :
        Address     : 0x9d900000
        Size        : 1566720
        Type        : PACKED PIXELS
        Visual      : TRUECOLOR
        XPanStep    : 1
        YPanStep    : 1
        YWrapStep   : 0
        LineLength  : 1920
        Accelerator : No

    to

    # fbset -i
    
    mode "480x272"
        geometry 480 272 480 848 32
        timings 0 0 0 0 0 0 0
        accel true
        rgba 8/16,8/8,8/0,8/24
    endmode
    
    Frame buffer device information:
        Name        :
        Address     : 0x9d900000
        Size        : 2088960
        Type        : PACKED PIXELS
        Visual      : TRUECOLOR
        XPanStep    : 1
        YPanStep    : 1
        YWrapStep   : 0
        LineLength  : 1920
        Accelerator : No

    and makes sgx_flip_test run successfully, we still saw the crash, with the following kernel log:

    [ 1904.956316] ------------[ cut here ]------------
    [ 1904.961308] WARNING: CPU: 0 PID: 12404 at drivers/gpu/drm/drm_flip_work.c:75 flip_worker+0x54/0xb0()
    [ 1904.970941] Modules linked in: g_gilchrist(O) libcomposite configfs bess_pwrbtn(O) ti_am335x_adc st_accel_i2c st_sensors_i2c st_accel st_sensors industrialio_triggered_buffer kfifo_buf rtkb(O) wl18xx(O) wlcore(O) w1_ds2431 ds2482 mac80211(O) omaplfb(O) musb_dsps cfg80211(O) musb_hdrc bufferclass_ti(O) pvrsrvkm(O) musb_am335x tmp006 industrialio wlcore_sdio(O) compat(O) pixcir_i2c_ts lm75 [last unloaded: configfs]
    [ 1905.009716] CPU: 0 PID: 12404 Comm: kworker/u2:3 Tainted: G           O 3.14.26 #1
    [ 1905.017708] Workqueue: tilcdc flip_worker
    [ 1905.021956] Backtrace: 
    [ 1905.024562] [<c0012538>] (dump_backtrace) from [<c00127ac>] (show_stack+0x20/0x24)
    [ 1905.032551]  r6:0000004b r5:00000009 r4:00000000 r3:c08490a8
    [ 1905.038544] [<c001278c>] (show_stack) from [<c058fc24>] (dump_stack+0x24/0x28)
    [ 1905.046188] [<c058fc00>] (dump_stack) from [<c0039378>] (warn_slowpath_common+0x80/0x98)
    [ 1905.054731] [<c00392f8>] (warn_slowpath_common) from [<c003944c>] (warn_slowpath_null+0x2c/0x34)
    [ 1905.063997]  r8:00000000 r7:dd3b4f38 r6:00000003 r5:00000002 r4:dd3b4f48
    [ 1905.071119] [<c0039420>] (warn_slowpath_null) from [<c03689f0>] (flip_worker+0x54/0xb0)
    [ 1905.079568] [<c036899c>] (flip_worker) from [<c00504c4>] (process_one_work+0x128/0x3c0)
    [ 1905.088025]  r7:dd074b00 r6:dd070a00 r5:d2353700 r4:dd3b4f48
    [ 1905.094028] [<c005039c>] (process_one_work) from [<c0051304>] (worker_thread+0x124/0x3b8)
    [ 1905.102656]  r10:dd070a00 r9:c088499f r8:00000001 r7:d2508000 r6:d2353718 r5:dd070a10
    [ 1905.110939]  r4:d2353700
    [ 1905.113622] [<c00511e0>] (worker_thread) from [<c0056f1c>] (kthread+0xdc/0xf4)
    [ 1905.121241]  r10:00000000 r9:00000000 r8:00000000 r7:c00511e0 r6:d2353700 r5:00000000
    [ 1905.129509]  r4:d236adc0
    [ 1905.132198] [<c0056e40>] (kthread) from [<c000ea58>] (ret_from_fork+0x14/0x20)
    [ 1905.139813]  r7:00000000 r6:00000000 r5:c0056e40 r4:d236adc0
    [ 1905.145789] ---[ end trace c0853e06cc171a24 ]---
    [ 1942.074324] wlan0: authenticate with 24:a4:3c:59:e1:60

    I tried playing around with the vram and omapfb.vram options, but neither seems to have any effect I can observe in Linux itself (how do I verify the VRAM size?).

    Our suspicion is that the FLIP buffer overlays kernel memory, but we are unsure how to track it down. The bug happens very rarely (once a day at most on a single machine).

    Switching back to FRONTWSEGL is not easy because vsync doesn't work with it, and the tearing and uncapped FPS are a problem.

  • Hi Manuel,

    I see that you use the latest Graphics SDK.

    Is it possible to test with the latest kernel as well (from SDK 01.00.00.03 -> kernel 3.14.43)?

    Best Regards,
    Yordan

  • Hi,

    Currently I don't have many ideas on how to debug this, since as far as I understand you use a custom board and the kernel panic is random...

    Have you noticed any pvr issues in the boot log? 

    Is it possible to provide steps to try and recreate this on one of my boards? I can use either the AM335x Starter Kit or the AM335x EVM; which of the two would be best to test on?

    Best Regards,
    Yordan

  • There are no pvr issues in the boot log, everything starts up fine.

    We are also a bit at a loss on how to reproduce it; we have currently set up a farm of 8-16 devices with a camera pointed at them so we can A/B test. We have a build with 3.14.43 that we are currently testing. We still have the issue that sgx_flip_test doesn't work out of the box (it does when we bump DRM_NUM_FBDEV_BUFFERS to 4 as described here: https://e2e.ti.com/support/arm/sitara_arm/f/791/t/438583 , but the graphical glitches still happen).

    A colleague wrote a tool that displays the content of fb0 (I can send you the dumps in a private email, as the content is not public for now). fbset -i reports enough memory for 3 buffers; however, due to alignment, fb0 seems to contain only 2. (Please note that the whitespace is part of the image.)

    When running with BLITWSEGL, the content is the following:

    My suspicion is that FLIPWSEGL doesn't realize that the framebuffer memory allocated by drm_fb_cma_helper.c is not enough to fit 3 aligned buffers, and writes over the boundaries, leading to the spurious issues.

    Do you know how FLIPWSEGL communicates the memory region to the GPU, and whether the GPU could access non-CMA memory?

    When bumping DRM_NUM_FBDEV_BUFFERS to 4, sgx_flip_test works, and when running the UI with FLIPWSEGL, /dev/fb0 contains 3 copies of the display. I suspect that in this case, FLIPWSEGL thinks it has room for 4 copies of the buffer, and again writes over the allocated boundaries.

    This is turning into a critical problem for us, more testing will be done in the following days.

  • Here is a video of the glitches happening. Usually the UI comes back to normal, and the function of the machine is not affected, but sometimes the kernel starts outputting a lot of messages and the system bogs down to a halt. If you look closely, you can see the "real UI" flash by on certain frames (at 0:09 for example).

  • Manuel Odendahl32 said:
    Do you know how FLIPWSEGL communicates the memory region to the GPU, and if the GPU could access non-cma-ed memory?

    The GPU can access any memory, so it has to rely on receiving proper buffer allocation information.

    But... it would be quite easy to configure the L3 firewall of EMIF to allow GPU writes only within a specific contiguous memory region. Then, if it oversteps its bounds you will immediately get an error in the firewall log (and presumably the driver will complain as well).

    Unfortunately TI didn't bother documenting the interconnect on the AM335x, but I do have the necessary info in my notes somewhere. I will look them up for you...

  • Matthijs van Duin said:
    Unfortunately TI didn't bother documenting the interconnect on the AM335x, but I do have the necessary info in my notes somewhere. I will look them up for you...

    Attached is a header file describing the L3 firewall. firewall.h

    For this test you'd do something like:

    // firewall.h is the attached header describing the Firewall register layout
    #include <fcntl.h>
    #include <sys/mman.h>
    #include "firewall.h"
    
    #define FW_EMIF 0x47c0c000
    
    // 6-bit initiator ids (used by L3 interconnect)
    #define IID_SGX 0x20
    
    // 4-bit initiator ids (used by L3 firewalls)
    #define IID4_SGX (IID_SGX >> 2)
    
    
    // fill in CMA_START_ADDR and CMA_SIZE for your board
    int fd = open( "/dev/mem", O_RDWR | O_DSYNC );
    volatile Firewall *fw = mmap( NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, FW_EMIF );
    
    // permit everything
    fw->region[1].permissions = 0xffffffff;
    fw->region[1].initiators = 0xffffffff;
    // or, if nobody but SGX is supposed to be writing to this region:
    //fw->region[1].initiators = 0x55555555 | 2 << (2 * IID4_SGX);
    
    fw->region[1].start = CMA_START_ADDR & 0x7ffff000;
    fw->region[1].end = ((CMA_START_ADDR + CMA_SIZE - 1) & 0x7ffff000) | 1;
    
    // region 1 now grants SGX write access to the CMA region;
    // revoke its write access elsewhere
    fw->region[0].initiators &= ~( 2 << (2 * IID4_SGX) );

    and then harvest errors from fw->error[0]. Note that the addresses are in EMIF's local address space, which means that physical RAM starts at 0 rather than 0x80000000, which is why I'm masking off the top bit. (I hope I remember this part correctly, otherwise try it without masking off the top bit.)

    Once an error is logged, you need to clear it before another error can be logged.

  • Hi Bruno,

    Please find below an interpretation of the kernel log that you shared. In summary, it appears that some other system-level instability is impacting SGX at the time of the garbled display.

    << [ 2624.874797] PVR_K: User requested SGX debug info

    This is a timeout message. SGX is waiting for some event to happen (e.g. vsync or some other interrupt) that hasn't happened within the expected period of time, causing the above print.

    <<[ 2624.893958] PVR_K: SRC 0: ROC DevVAddr:0xD8010DC ROP:0x53df ROC:0x53dc, WOC DevVAddr:0xD8010CC WOP:0x29f0 WOC:0x29ef
    << [ 2624.905153] PVR_K: SRC 1: ROC DevVAddr:0xD8010B4 ROP:0x53de ROC:0x53db, WOC DevVAddr:0xD8010A4 WOP:0x29ef WOC:0x29ee 

    This is an output of the sequence numbers for Read and Write Operations, Pending and Complete. Each surface has one write operation pending (e.g. a 3D render) and three pending read operations (e.g. a flip). The SGX drivers cannot submit any more data until these operations complete and release the buffers. A bit later, we see that the operation numbers have progressed: the device isn't stuck.

    <<[ 2683.973942] PVR_K: SRC 0: ROC DevVAddr:0xD8010DC ROP:0x53f3 ROC:0x53f0, WOC DevVAddr:0xD8010CC WOP:0x29fa WOC:0x29f9

    << [ 2683.985141] PVR_K: SRC 1: ROC DevVAddr:0xD8010B4 ROP:0x53f2 ROC:0x53ef, WOC DevVAddr:0xD8010A4 WOP:0x29fa WOC:0x29f8

    It seems operations are being queued faster than they can be consumed (displayed), possibly because the display has stopped or some other delays in the system.

    For the flickering, it appears that wrong buffers are being displayed: not old buffers (with the contents of previous frames) but totally wrong, initialised buffers. So somehow the display seems to be showing a different swap chain than expected. If SGX is still running, it is rendering wherever it can; if the buffers had been freed, SGX would page-fault.

    A couple of debugging steps:

    1. When bad things happen, dump the content of the buffers and verify that SGX is still drawing good buffers.

    2. Run the application in FRONT mode. We understand this won't display perfect video, but if the application still crashes in this mode, the problem is likely at the application layer.

     

    Regards,

    Manisha


  • We have been verifying that the framebuffers, at least on the host side, are good. However, as referenced here, we only seem to have double buffering enabled, not triple buffering. Bumping the number makes 3 buffers show up in /dev/fb0.

    This, however, leads us to our discovery this weekend. We set up a little crisis room with 4 machines and a camera pointed at them, and wrote some image recording / image recognition software to track when this happens. We have 2 programs exercising the system, which helps make the issue appear: a loop that copies files on the filesystem, and a copy of our software doing mostly USB transfers. We can get a "snowcrash" (that's what we call the issue) to happen every 35 minutes on average (click on the image; it's actually an animated GIF).

    Here is an example of such a timeline. Using this, we were able to identify one kernel message that correlates fairly often with the issue:

    [11648.420206] alloc_contig_range test_pages_isolated(9d850, 9d858) failed
    [11897.292976] alloc_contig_range test_pages_isolated(9d850, 9d858) failed
    [12857.577992] PVR_K: User requested SGX debug info
    [12857.582933] PVR_K: SGX debug (SGX_DDK_Linux_CustomerTI sgxddk 1.10@2359475)
    [12857.590320] PVR_K: Flip Command Complete Data 0 for display device 11:
    [12857.597210] PVR_K:     SRC 0: ROC DevVAddr:0xD8010B4 ROP:0x2d280 ROC:0x2d27c, WOC DevVAddr:0xD8010A4 WOP:0x16940 WOC:0x1693f
    [12857.608778] PVR_K:     SRC 1: ROC DevVAddr:0xD801104 ROP:0x2d280 ROC:0x2d27d, WOC DevVAddr:0xD8010F4 WOP:0x16940 WOC:0x1693f
    [12857.620344] PVR_K:  Host Ctl flags= 00000006
    [12857.624843] PVR_K: SGX Host control:
    [12857.628628] PVR_K:     (HC-0) 0x00000001 0x00000000 0x00000000 0x00000001
    [12857.635509] PVR_K:     (HC-10) 0x00000000 0x00000000 0x0000000A 0x00030D40
    [12857.642494] PVR_K:     (HC-20) 0x00000000 0x00000004 0x00000000 0x00000000
    [12857.649479] PVR_K:     (HC-30) 0x00000000 0x00C3FBC4 0xEEED45C0 0x00000000
    [12857.656450] PVR_K:     (HC-40) 0x00000000 0x00000000 0x002CB791 0x00000000
    [12857.663430] PVR_K: SGX TA/3D control:
    [12857.667293] PVR_K:     (T3C-0) 0x0F003000 0x0F003120 0x0F002000 0x00000000
    [12857.674280] PVR_K:     (T3C-10) 0x00000000 0x00000000 0x00000002 0x00000000
    [12857.681358] PVR_K:     (T3C-20) 0x00000000 0x00000000 0x00000000 0x00000000
    [12857.688434] PVR_K:     (T3C-30) 0x00000000 0x00000000 0x00000000 0x00000000
    [12857.695497] PVR_K:     (T3C-40) 0x00000000 0x00000000 0x00000000 0x00000000
    [12857.702572] PVR_K:     (T3C-50) 0x00000000 0x00000000 0x00000000 0x00000000
    [12857.709647] PVR_K:     (T3C-60) 0x00000000 0x00000000 0x00000000 0x00000000
    [12857.716710] PVR_K:     (T3C-70) 0x00000000 0x00000000 0x00000000 0x00000000
    [12857.723785] PVR_K:     (T3C-80) 0x00000000 0x00000000 0x0F0AD518 0x0F000000
    [12857.730862] PVR_K:     (T3C-90) 0x9C41A000 0x0F094F00 0x00000000 0x0F08B4C0
    [12857.737940] PVR_K:     (T3C-A0) 0x0F00AEA0 0x0F0AD518 0x0F08B920 0x00000000
    [12857.745001] PVR_K:     (T3C-B0) 0x00000000 0x00000000 0x00000000 0x00000000
    [12857.752077] PVR_K:     (T3C-C0) 0x00000000 0x00000000 0x00043BBF 0x00043BBD
    [12857.759153] PVR_K:     (T3C-D0) 0x0F000000 0x8000B000 0x8004B000 0x0F004000
    [12857.766217] PVR_K:     (T3C-E0) 0x0F00A420 0x0F00A740 0x0F08B000 0x0F08B000
    [12857.773292] PVR_K:     (T3C-F0) 0x00000000 0x00000307 0x00000307 0x00000000
    [12857.780378] PVR_K:     (T3C-100) 0x00000003 0x00000000 0x00000000 0x00000001
    [12857.787534] PVR_K:     (T3C-110) 0x00000000 0x00000000 0x00000000 0x00000000
    [12857.794775] PVR_K: SGX Kernel CCB WO:0x3D RO:0x3D

    Searching for "Flip Command Complete Data 0", I was able to find this thread: https://e2e.ti.com/support/arm/sitara_arm/f/791/t/242278 , which has a video showing exactly the issue we are seeing. This is an "LCD SYNC LOST" issue, with a patch against the fbdev driver in the (then current) kernel.

    This is the referenced patch: https://git.ti.com/bms-linux/bms-kernel/commit/c6c1033cb38d85b242178cc379fc3bd5d6e403be/diffs (which reinitializes the raster engine on SYNC_LOST)

    There are a couple of similar patches floating around: http://processors.wiki.ti.com/index.php/DA8xx_LCDC_Linux_FB_FAQs (scroll to "Flicker due to underflow error"; this one changes the check from SYNC_LOST && UNDERFLOW to SYNC_LOST || UNDERFLOW and reinitializes the raster engine in both cases).

    https://reviews.facebook.net/rUPSTREAMKERNELe4008e2240605a64d595f3b9e070bfaf5a6950ff  (this one actually enables the SYNC_LOST interrupt!).

    Now, we are using kernel 3.14 with DRM tilcdc, which seems to have been ported over from the da8xx fbdev driver but has none of those patches. Full of hope, we enabled the SYNC_LOST interrupt, but instead of reinitializing the raster we decided to just log SYNC_LOST and see if it correlates with the snowcrashes. And indeed it does: every one of the long snowcrashes in this timeline has a corresponding plethora of messages in the kernel log:

    literategoat
    [ 5403.878240] tilcdc 4830e000.lcdc: tilcdc_crtc_irq: snowcrash?  stat: 00000004
    [ 5403.895962] tilcdc 4830e000.lcdc: tilcdc_crtc_irq: snowcrash?  stat: 00000004
    [ 5403.913656] tilcdc 4830e000.lcdc: tilcdc_crtc_irq: snowcrash?  stat: 00000004
    economicbush
    [41849.267241] tilcdc 4830e000.lcdc: tilcdc_crtc_irq: snowcrash?  stat: 00000004
    [41849.284951] tilcdc 4830e000.lcdc: tilcdc_crtc_irq: snowcrash?  stat: 00000004
    [41849.302661] tilcdc 4830e000.lcdc: tilcdc_crtc_irq: snowcrash?  stat: 00000004
    [41849.320379] tilcdc 4830e000.lcdc: tilcdc_crtc_irq: snowcrash?  stat: 00000004
    [41849.338087] tilcdc 4830e000.lcdc: tilcdc_crtc_irq: snowcrash?  stat: 00000004
    [41849.355813] tilcdc 4830e000.lcdc: tilcdc_crtc_irq: snowcrash?  stat: 00000004
    [41849.373521] tilcdc 4830e000.lcdc: tilcdc_crtc_irq: snowcrash?  stat: 00000004
    [41849.391242] tilcdc 4830e000.lcdc: tilcdc_crtc_irq: snowcrash?  stat: 00000004
    [41849.408948] tilcdc 4830e000.lcdc: tilcdc_crtc_irq: snowcrash?  stat: 00000004
    [41849.423181] tilcdc 4830e000.lcdc: tilcdc_crtc_irq: snowcrash?  stat: 00000004
    lyricalshark
    [33991.289131] tilcdc 4830e000.lcdc: tilcdc_crtc_irq: snowcrash?  stat: 00000004
    [33991.306844] tilcdc 4830e000.lcdc: tilcdc_crtc_irq: snowcrash?  stat: 00000004
    [33991.324557] tilcdc 4830e000.lcdc: tilcdc_crtc_irq: snowcrash?  stat: 00000004
    [33991.342271] tilcdc 4830e000.lcdc: tilcdc_crtc_irq: snowcrash?  stat: 00000004
    [33991.359995] tilcdc 4830e000.lcdc: tilcdc_crtc_irq: snowcrash?  stat: 00000004
    [33991.377707] tilcdc 4830e000.lcdc: tilcdc_crtc_irq: snowcrash?  stat: 00000004
    [33991.395426] tilcdc 4830e000.lcdc: tilcdc_crtc_irq: snowcrash?  stat: 00000004
    [33991.413133] tilcdc 4830e000.lcdc: tilcdc_crtc_irq: snowcrash?  stat: 00000004
    [33991.430839] tilcdc 4830e000.lcdc: tilcdc_crtc_irq: snowcrash?  stat: 00000004
    [33991.445087] tilcdc 4830e000.lcdc: tilcdc_crtc_irq: snowcrash?  stat: 00000004
    superbanteater
    [40458.114179] tilcdc 4830e000.lcdc: tilcdc_crtc_irq: snowcrash?  stat: 00000004
    [40458.131912] tilcdc 4830e000.lcdc: tilcdc_crtc_irq: snowcrash?  stat: 00000004
    [40458.149602] tilcdc 4830e000.lcdc: tilcdc_crtc_irq: snowcrash?  stat: 00000004
    [40458.167319] tilcdc 4830e000.lcdc: tilcdc_crtc_irq: snowcrash?  stat: 00000004
    [40458.185035] tilcdc 4830e000.lcdc: tilcdc_crtc_irq: snowcrash?  stat: 00000004
    [40458.202748] tilcdc 4830e000.lcdc: tilcdc_crtc_irq: snowcrash?  stat: 00000004
    [40458.216978] tilcdc 4830e000.lcdc: tilcdc_crtc_irq: snowcrash?  stat: 00000004
    [47706.280239] tilcdc 4830e000.lcdc: tilcdc_crtc_irq: snowcrash?  stat: 00000004
    [47706.297951] tilcdc 4830e000.lcdc: tilcdc_crtc_irq: snowcrash?  stat: 00000004
    [47706.315661] tilcdc 4830e000.lcdc: tilcdc_crtc_irq: snowcrash?  stat: 00000004

    Here is our patch:

    diff --git a/drivers/gpu/drm/tilcdc/tilcdc_crtc.c b/drivers/gpu/drm/tilcdc/tilcdc_crtc.c
    index 1185f31..59c70bb 100644
    --- a/drivers/gpu/drm/tilcdc/tilcdc_crtc.c
    +++ b/drivers/gpu/drm/tilcdc/tilcdc_crtc.c
    @@ -134,9 +134,11 @@ static void start(struct drm_crtc *crtc)
    
     	if (priv->rev == 2) {
     		tilcdc_set(dev, LCDC_CLK_RESET_REG, LCDC_CLK_MAIN_RESET);
    -		msleep(1);
    +        /* JMF: Switching to mdelay (from msleep) because this is called
    +         * from an interrupt context, which is still pretty bad */
    +		mdelay(1);
     		tilcdc_clear(dev, LCDC_CLK_RESET_REG, LCDC_CLK_MAIN_RESET);
    -		msleep(1);
    +		mdelay(1);
     	}
    
     	tilcdc_crtc->dma_completed_channel = 0;
    @@ -660,9 +662,14 @@ irqreturn_t tilcdc_crtc_irq(struct drm_crtc *crtc)
     	uint32_t stat = tilcdc_read_irqstatus(dev);
     	unsigned long irq_flags;
    
    +    if (stat & LCDC_SYNC_LOST)
    +    {
    +        dev_err(dev->dev, "%s: snowcrash?  stat: %08x", __func__, stat);
    +    }
    +
     	if ((stat & LCDC_SYNC_LOST) && (stat & LCDC_FIFO_UNDERFLOW)) {
     		stop(crtc);
    -		dev_err(dev->dev, "error: %08x\n", stat);
    +		dev_err(dev->dev, "%s: error: %08x\n", __func__, stat);
     		tilcdc_clear_irqstatus(dev, stat);
     		start(crtc);
     	} else if (stat & LCDC_PL_LOAD_DONE) {
    diff --git a/drivers/gpu/drm/tilcdc/tilcdc_drv.c b/drivers/gpu/drm/tilcdc/tilcdc_drv.c
    index d5179da..9ca0260 100644
    --- a/drivers/gpu/drm/tilcdc/tilcdc_drv.c
    +++ b/drivers/gpu/drm/tilcdc/tilcdc_drv.c
    @@ -377,7 +377,7 @@ static int tilcdc_irq_postinstall(struct drm_device *dev)
     			   LCDC_V2_UNDERFLOW_INT_ENA |
     			   LCDC_V2_END_OF_FRAME0_INT_ENA |
     			   LCDC_V2_END_OF_FRAME1_INT_ENA |
    -			   LCDC_FRAME_DONE);
    +			   LCDC_FRAME_DONE | LCDC_SYNC_LOST);
    
     	return 0;
     }

    We will try to handle LCDC_SYNC_LOST correctly by reinitializing the raster engine, but it seems that path is never really exercised: our first try oopsed on msleep being called from an irq context (hence the mdelay patch above). Also, this doesn't solve the actual underlying issue of the LCD losing sync, even though we are supposedly using triple buffering.

  • Manuel Odendahl32 said:
    (this one actually enables the SYNC_LOST interrupt!).

    If you need to enable SYNC_LOST separately, that means it's occurring without a FIFO_UNDERFLOW, which is... bad.

    While an underflow could be caused by temporary memory stalls, a lone sync lost means the DMA controller successfully transferred the wrong amount of data [1] to the raster controller, thereby causing the raster controller to shut down. This implies the DMA engine's start/end pointers for the front buffer were changed, causing DMA to either halt prematurely or fail to halt the transfer at the end of the framebuffer.
    [1] or a bogus palette, not applicable here.

    Unless the end-of-buffer / vsync irq somehow got delayed by a full frame (which seems rather improbable), this means the driver isn't updating the back buffer pointers at that time, and indeed I see conditions on performing the update at that time. This is folly: if the pointers are updated at a later time, then unless one tracks very carefully how much later, the presumed back buffer may already be the front buffer again, resulting in a lost sync.

    To avoid lost syncs without complicated logic the driver should update the pointers at exactly two sites:
    1. both buffers should point to the initial frame before the raster controller is (re)enabled.
    2. the vsync irq should update the back buffer pointers to the most recent available frame, even if that means repeating the current frame.

    Updating the back buffer pointers in response to the arrival of a more recent frame is very tricky to do safely: one would have to use the time elapsed since the last vsync to estimate whether the opportunity still exists, since no progress indication is available from the LCDC peripheral. Alternatively one could reduce the display latency by speculatively updating the back buffer pointers to point to a frame that hasn't been drawn yet, at the risk of tearing if drawing takes too much time, but I'm not really familiar with DRI and don't know whether such a thing is even possible.

  • Matthijs thanks so much for your knowledge.

    If the 32 bytes at the beginning of our 12-bit FB are not 0x4000 0...0, wouldn't that cause a LOST SYNC?

    I don't know much about display controllers / feeding an LCD controller (enough to maybe cause a minimal amount of damage), but I will read the couple of lines I have in that light. More importantly, though, we are kind of looking for a quick fix. If we get a single LOST SYNC, is disabling the raster, clearing the pointers, and re-enabling the raster enough to stop us showing garbage on the display (at the cost of a lost frame, maybe)?

    The other alternative I see is to revert to fbdev which is maybe more robust (?)

    Regards, Manuel
  • Matthijs van Duin said:
    Unless the end-of-buffer / vsync irq somehow got delayed by a full frame (which seems rather improbable)

    Note that it's probably wise to test for it anyway: if LCDC_END_OF_FRAME0 and LCDC_END_OF_FRAME1 are both set in the irq handler, yell loudly about a lost vsync. To recover, either restart, or speculate that only one vsync was lost and the front buffer hasn't changed. If the speculation is wrong, a sync-lost will occur, but this is probably not worse than a dma restart. It looks like the current driver logic would always conclude that buffer 0 is now the front buffer, which is equivalent to flipping a coin. If both dirty bits are set it would "catch up", thereby overwriting the front buffer pointers and guaranteeing a sync-lost.

    As for restarting, the TRM says with emphasis that after stopping (clearing enabled-bit) one should wait for the frame-done irq before resetting (TRM section 13.4.6). The choice of words seems to suggest the concern lies with the L3 master interface, which means that ignoring this rule (as the driver currently does) could have consequences ranging from a confused DMA controller that won't restart properly up to a deadlock of the whole system, depending on design details. Unfortunately it isn't clear whether a frame-done irq is still generated if the raster controller is disabled after an underflow or sync lost, which complicates things...

    As you point out, waiting two milliseconds in an interrupt handler is pretty bad. Only "several cycles" of delay are prescribed after setting or clearing the reset, and I can't imagine any benefit to waiting more than a few microseconds.

  • This is slightly above my head. However, looking at the log, only the SYNC_LOST flag is set when this happens, no LCDC_END_OF_FRAME0 nor LCDC_END_OF_FRAME1.

    I am trying to understand how the start and end buffers of the DMA transfer are set. I was reading the tilcdc_drm.c driver in the main branch, but I now realized that it has been modified in the ti kernel we are using, and that the omaplfb_linux.c from the SDK drivers hooks itself into it, at which point I mostly gave up.
  • Manuel Odendahl32 said:
    If the 32 bytes at the beginning of our 12-bit FB are not 0x4000 0...0, wouldn't that cause a LOST SYNC?

    The tilcdc driver uses LCDC_PALETTE_LOAD_MODE(DATA_ONLY), i.e. palette loading is disabled. Assuming you didn't change that, no palette is present at the start of the framebuffer (and I see no reason to enable it unless you actually use palette-mapping, i.e. have 8 bits/pixel or less, and benefit from being able to change the palette on-the-fly.)

    If a palette is present, then an invalid one can also cause a sync-lost yes.

    Manuel Odendahl32 said:
    More importantly though we are kind of looking for a quick fix.

    I can imagine, but I have trouble suggesting one, especially since your problem lies with race conditions, which are always subtle. I also don't know enough about the interaction with the direct rendering manager.

    Manuel Odendahl32 said:
    is disabling the raster, clearing the pointers and reenabling the raster enough to make us not show garbage on the display

    I don't know, it depends greatly on how the raster controller and the display cope with this failure condition, i.e. will the raster controller immediately halt on sync lost or will it display the bogus data? if it halts immediately, will the display properly hold the last frame and properly show the next frame after the reset?

    Also, as my previous post indicates, properly resetting the LCDC is not entirely trivial either. I would personally lean more towards fixing the driver to prevent the sync lost in the first place, which should really be seen as a "driver is broken" irq. (Unless the cause is a lost vsync, in which case the question becomes what the hell is delaying the vsync irq by a whole frame time.)

    Manuel Odendahl32 said:
    The other alternative I see is to revert to fbdev which is maybe more robust (?)

    It's possible, I don't know. It doesn't really look more correct to me at first sight.

  • Manuel Odendahl32 said:
    However, looking at the log, only the SYNC_LOST flag is set when this happens, no LCDC_END_OF_FRAME0 nor LCDC_END_OF_FRAME1

    That would be expected even if a lost vsync is happening: the irq handler doesn't test for it, hence proceeds to (potentially) overwrite the front buffer pointers, resulting in a SYNC_LOST irq afterwards.

    To catch lost vsyncs, add an explicit test for both EOF0 and EOF1 being set at the same time and treat it as an error condition.

    Manuel Odendahl32 said:
    I am trying to understand how the start and end buffers of the DMA transfer are set. I was reading the tilcdc_drm.c driver in the main branch, but I now realized that it has been modified in the ti kernel we are using, and that the omaplfb_linux.c from the SDK drivers hooks itself into it, at which point I mostly gave up.

    Yeah so I have a similar problem: I understand the interaction with the hardware well enough, but not the interaction with the rest of the kernel.

    I think all of the relevant interaction here is happening in tilcdc_crtc.c though. For the correctness of the logic updating the buffer pointers it doesn't really matter how it is decided what they should point to, just when exactly they are updated.

  • Hi Manisha, we did

    1., and even when the effect is happening, the content of the framebuffer is good.

    2. we ran our machines with FRONTWSEGL for 24 hours without an issue.

    We are pretty confident it is an issue with the pingpong transfer of flip buffers to the LCDC, as was pointed out by Matthijs.

    ---- steps taken ----

    We have investigated a bit in the tilcdc_drm driver, and I think I possibly have an idea where the bug might be, but it needs more investigation. We also reverted to the old da8xx-fb.c driver from the 3.12 branch (forward-ported to 3.14), and are in the process of testing it overnight. But we are not too confident, as we capture something that looks like a snowcrash.

    These are the runs of our image recognition software. This is a standard 3.14 kernel with FLIPWSEGL, tilcdc-drm and SDK 5.01.01.02:

    We then modified tilcdc_drm to printk when a single LOST_SYNC interrupt was happening, we got less snowcrashes (maybe because we handled that interrupt, but without resetting the raster engine), but in any case:

    This is tilcdc_drm with FRONTWSEGL (the signals at the start and end are when we restarted the UI):

    And this is fbdev without FRONTWSEGL (again, only hits when we restarted the UI):

    And just because it looks so pretty, here are a few of those lost sync sequences:

    Hope this might help you to take further steps on your side.

  • We are definitely seeing this on 3.14, with da8xx-fb and graphics sdk 5.01.01.02 .

    I suspect that there is a race condition between the IRQ handler and the user land (which seems to schedule DMA ping pongs as well; I don't exactly understand why, for performance's sake?). It seems easier to debug and trace this with da8xx-fb than with the DRM based system.

  • Manuel Odendahl32 said:
    I suspect that there is a race condition between the IRQ handler and the user land (which seems to schedule DMA ping pongs as well)

    I think the race condition isn't so much with the irq handler but rather with the peripheral itself. The problem, I think, lies in this call and its sibling a few lines below, since update_scanout is called outside irq context (most importantly in tilcdc_crtc_page_flip), and set_scanout directly writes the framebuffer "base" and "ceil" registers of the DMA controller. It relies on tilcdc_crtc->dma_completed_channel (which is updated by the irq handler) to know which of the two buffers is the back buffer, but this is inherently subject to a race condition: the DMA controller might finish the current buffer and flip to the other one after update_scanout has tested dma_completed_channel but before it updates the framebuffer pointers, resulting in SYNC_LOST.

    The only simple way to be sure it's safe to update one of the two sets of framebuffer pointers is by doing so only in direct response to the corresponding EOF irq, since then you know you have a full frame-time (minus irq latency).
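
    That rule can be written down as a small pure function (a sketch; the EOF bit positions are assumptions borrowed from da8xx-fb.c): the only descriptor set that may be rewritten is the one whose EOF irq just fired.

```c
#include <stdbool.h>
#include <stdint.h>

#define LCDC_END_OF_FRAME0 (1u << 8)   /* bit positions as in da8xx-fb.c */
#define LCDC_END_OF_FRAME1 (1u << 9)

/* Return which buffer's base/ceil registers are safe to rewrite given
 * the irq status, or -1 if none: outside an EOF irq there is no safe
 * window, and both bits set means a lost vsync (see earlier post). */
static int safe_buffer_to_update(uint32_t stat)
{
    bool eof0 = !!(stat & LCDC_END_OF_FRAME0);
    bool eof1 = !!(stat & LCDC_END_OF_FRAME1);

    if (eof0 == eof1)
        return -1;
    /* EOF0 means the controller just switched away from buffer 0,
     * leaving a full frame-time (minus irq latency) to update it. */
    return eof0 ? 0 : 1;
}
```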

    And actually now that you mention it there also seem to be multiple race conditions with the irq handler... for example dirty_bits doesn't seem to be protected by anything, and tilcdc_crtc_page_flip updates event without taking event_lock.

    Manuel Odendahl32 said:
    I don't exactly understand why, for performance sake?

    Yes. The safe way I mention above means that when the DMA controller completes frame N and switches to frame N+1, frame N+2 already needs to be available to the driver to program it into the DMA controller. This implies a display latency of at least a full frame, probably closer to two. Also, once frame N+2 is submitted the driver would hold on to all three until EOF is signalled for frame N, which means that drawing can't start yet in advance of that unless there are at least 4 framebuffers.

    Unfortunately there doesn't seem to be any way to tell how far along the DMA controller is in the current framebuffer, or whether it's close to finishing. As I mentioned before, this means you'd need to (conservatively) estimate this based on time elapsed since the last EOF to determine whether it's still safe to submit a frame at a later time.

    Anyway, since reliable display is important for us also I've been exploring how DRM works, and if time permits I may attempt a patch. However, since we haven't had problems yet ourselves[1], I'm afraid I cannot really prioritize this, and since this is definitely a driver bug I feel the primary responsibility for fixing this lies with TI.
    [1] That's always the insidious thing about race conditions... for most people it works most of the time.

    So, time for you to wake up again. :-)

  • Matthijs, as always, thanks for the informative reply.

    Since the issue happens with fbdev as well, and da8xx-fb.c is considerably simpler to understand, that's where I have been investigating. I think the issue can be traced down to this patch:

    https://github.com/torvalds/linux/commit/deb95c6c958f5ba97b6b89ab18917bf79cb8ce7b, which got ported over to the DRM driver by Mr. Etheridge at TI. This is the patch that introduces userland writing to the DMA registers, to ensure that userland writes to the DMA buffer that is not being transferred. I think the race condition is the following, assuming two buffers: FB0, from address 0 to address 1, and FB1, from address 1 to address 2.

    1. userland filled FB1, schedules it for transfer, and writes par->dma_start = 1.

    2. At this point, an interrupt happens, and the kernel schedules a DMA transfer from par->dma_start = 1 to par->dma_end = 1 (because of race condition).

    3. The system is busy, and LCDC indeed tries to transfer from 1 to 1 and results in SYNC_LOST

    4. Control returns to userland; now it correctly updates par->dma_end = 2 and writes it to the DMA registers, but it's already too late.

    As the raster engine is not reset on a simple SYNC_LOST irq, the system goes out of sync for a few minutes.

    Note that 2. in combination with 3. is exceedingly rare, and that the patch does indeed seem to fix the tearing issue.

    3. also explains why we only see the issue when we increase the kernel load by transferring lots of data over USB and doing a lot of disk IO, which puts the system at load 3-4, and makes it quite probable that control doesn't return to the pan function for the duration of a frame transfer.
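
    The invariant that the race in steps 1-2 violates can be stated as a tiny check (a model, not driver code): for a fixed frame length, dma_start and dma_end describe the same buffer only if end = start + len - 1. In step 2 the LCDC is programmed with a start from the new frame paired with an end from the old one, which fails this check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of the dma_start/dma_end consistency the irq handler relies
 * on: for a frame of frame_len bytes, a coherent pair always has
 * end = start + frame_len - 1.  The half-updated pair from step 2
 * (new start, stale end) fails this, and programming it into the
 * LCDC yields a bogus transfer length and hence SYNC_LOST. */
static bool pair_consistent(uint32_t start, uint32_t end, uint32_t frame_len)
{
    return end == start + frame_len - 1;
}
```

    The fix implied by this invariant is to update both fields (and, if needed, the registers) inside one critical section, so no reader can ever observe a mixed pair.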

  • Manuel Odendahl32 said:
    1. userland filled FB1, schedules it for transfer, and writes par->dma_start = 1.
    2. At this point, an interrupt happens, and the kernel schedules a DMA transfer from par->dma_start = 1 to par->dma_end = 1 (because of race condition).

    This seems very unlikely, it would require an interrupt hitting a window of a few nanoseconds *and* LCDC DMA needs to finish the current frame during that time.

    Note that the second part already means userspace is trying to submit a frame roughly around the time LCDC DMA is switching over to the other frame buffer, which also sets it up for hitting the race condition I mentioned (which has a much larger window of opportunity, at least a few dozen microseconds.)

    Note also that "userspace" isn't writing to DMA registers, the kernel is doing so on its request.

    Fixing tearing at the cost of sporadically corrupting the DMA controller state does not sound like a good tradeoff to me. If tearing is happening, then the GPU is drawing into framebuffers without regard to whether the framebuffer is in use. The commit message actually shows the real problem:

    1)Initially assume both buffers FB0 and FB1 are programmed for buffer-0.
    2)On EOF0: Program FB0 for buffer-1, indicate(wake up) application to fill up buffer-0.

    If buffer-0 is still pending display (as FB1), the application should not be notified to fill it up, or tearing may indeed occur. (It would be nice to make this an option though, since some people may prefer the lower latency despite the risk of tearing if drawing takes too long. This is actually the "speculative drawing" setup I mentioned a few posts ago.)
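
    The notification rule in that last paragraph fits in one hypothetical helper (names are mine, not the driver's):

```c
#include <stdbool.h>
#include <stdint.h>

/* On EOF for one DMA channel, 'completed' is the buffer address that
 * just finished scanning out, and 'other_channel' is what the other
 * set of base/ceil registers currently points at.  If they are the
 * same buffer (e.g. at startup, or after a missed frame), the buffer
 * is still pending display and the app must not be told to redraw it. */
static bool notify_app(uint32_t completed, uint32_t other_channel)
{
    return completed != other_channel;
}
```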

  • I don't understand this part of your post:

    "the DMA controller might finish the current buffer and flip to the other one after update_scanout has tested dma_completed_channel but before it updates the framebuffer pointers, resulting in SYNC_LOST."

    If the framebuffer pointer pairs (taken as a pair) point to a valid framebuffer when userland updates them (by userland I mean the kernel on behalf of a syscall, outside of the irq context, sorry if that was confusing), where is the SYNC_LOST coming from? I made up my elaborate and improbable race condition so that DMA BASE and CEIL would point to a short 0 buffer.
  • I will admit I made an assumption here: namely that the DMA controller doesn't make a "shadow copy" of the ceil-pointer of the framebuffer when starting a transfer, but simply compares the active transfer pointer against the content of the ceil-pointer register. In this case, changing the ceil-pointer of the actively transferred buffer suffices to make the DMA controller transfer the wrong amount of data, resulting in SYNC_LOST.

    If it does make a shadow copy, that would make the whole issue a lot stranger.

    The TRM gives no indication either way, so this would need to be tested to be certain.

    Addendum: I've already written a simple baremetal LCDC test a while ago. I need to brush some dust off and bring it back in sync with my current baremetal codebase, then I'll run some tests to settle some uncertainties w.r.t. the precise behaviour of LCDC.

  • Thanks Matthijs that's great. We are currently running the fbdev without having pan write to the DMA registers, we'll have more definitive results tonight. It shows pretty terrible (but very consistent and reproducible) tearing artifacts. We are currently looking into actually fixing the issue, with some trace output.
  • We traced the sequence of IRQs and pans in the kernel, to see why the tearing happens. It seems that even with triple buffering it chooses the worst possible combination, which is always writing into the buffer that is currently transferred. This shows which buffer is passed as next to be scheduled on pan calls, which means that after that pan call, the GPU will start rendering to the following buffer. 

    Here at first, userland pans in FB3, and starts to draw to FB1, while DMA1 is transferring FB1 (!). After having drawn to FB1, userland calls wait_for_vsync and goes to sleep. Once the FB1 transfer is complete, an END1 irq is signaled, DMA0 starts transferring the scheduled FB2, FB3 is set as the next transfer for DMA1, and userland is woken up. Userland immediately pans in FB1, and starts writing to FB2, which is being transferred by DMA0 (!).

    And so on. We will try to synchronize transfers with userland by letting it lose a few vsyncs if necessary tomorrow.

    The following patch doesn't trigger any lost syncs, but does tear reproducibly (as explained above).

    From 7ea595326a0b1c48f3df590e06193422a9057beb Mon Sep 17 00:00:00 2001
    From: Jonathan Fisher <jonathanmarkfisher@gmail.com>
    Date: Tue, 13 Oct 2015 10:25:11 -0400
    Subject: [PATCH] Added debug statements to da8xx-fb.c
    
    ---
     drivers/video/fbdev/da8xx-fb.c | 7 +++++++
     1 file changed, 7 insertions(+)
    
    diff --git a/drivers/video/fbdev/da8xx-fb.c b/drivers/video/fbdev/da8xx-fb.c
    index df6b679..324272c 100644
    --- a/drivers/video/fbdev/da8xx-fb.c
    +++ b/drivers/video/fbdev/da8xx-fb.c
    @@ -974,6 +974,11 @@ static irqreturn_t lcdc_irq_handler_rev02(int irq, void *arg)
     	struct da8xx_fb_par *par = arg;
     	u32 stat = lcdc_read(LCD_MASKED_STAT_REG);
     
    +    if (stat & LCD_SYNC_LOST)
    +    {
    +        pr_err("%s: snowcrash? stat: 0x%08x\n", __func__, stat);
    +    }
    +
     	if ((stat & LCD_SYNC_LOST) && (stat & LCD_FIFO_UNDERFLOW)) {
     		lcd_disable_raster(DA8XX_FRAME_NOWAIT);
     		lcdc_write(stat, LCD_MASKED_STAT_REG);
    @@ -1391,6 +1396,7 @@ static int da8xx_pan_display(struct fb_var_screeninfo *var,
     			end	= start + fbi->var.yres * fix->line_length - 1;
     			par->dma_start	= start;
     			par->dma_end	= end;
    +#if 0
     			spin_lock_irqsave(&par->lock_for_chan_update,
     					irq_flags);
     			if (par->which_dma_channel_done == 0) {
    @@ -1406,6 +1412,7 @@ static int da8xx_pan_display(struct fb_var_screeninfo *var,
     			}
     			spin_unlock_irqrestore(&par->lock_for_chan_update,
     					irq_flags);
    +#endif
     		}
     	}

    The correct way to approach this, and this might be something TI has already done for Processor SDK 2.0, is to fix the flip race in the DRM driver, which allows userland to implement proper double/triple buffering (because the driver tells it which buffer is FRONT).

    This is not the case in the fbdev driver. For the sake of getting this out of our way (we have a product to ship), I will hack proper double-buffered synchronization into our fbdev to remove the tearing, and probably call it a day, at least until SDK 2.0 is out. I will post our hackfix patch here.

    What are further steps that I should take to address this? I am actually surprised not more people are seeing this.
  • Manuel Odendahl32 said:
    It seems that even with triple buffering it chooses the worst possible combination, which is always writing into the buffer that is currently transferred.

    Haha, that's some nice buffer management there.

    With DRM it's still unclear to me how ownership of buffers is handled. You may be right that userspace is ultimately in charge of deciding when to show a buffer, but it will rely on the kernel for notification that the display driver has finished using a framebuffer. The commit message you showed seemed to suggest the driver indicates this too early: whenever a framebuffer is programmed twice into the LCDC (e.g. during startup, or whenever a new frame wasn't delivered on time) no notification should be given on dma completion, since the buffer is in fact still (or rather, again) being displayed.

    Anyway, I've gotten my lcd test to work again, finally... it was covered in a bit more dust than I anticipated (and got side-tracked now and then). I haven't done much testing yet but I can share already the first few observations:

    The DMA controller is completely oblivious to frame timing; it just reads data, alternating between the two buffers, and writes it to the async fifo to the raster controller. Only when the (2 KB) FIFO has filled up will the DMA controller begin to approximately track the pixel display rate, due to backpressure from the FIFO. This was most visible when I had a really tiny framebuffer configured, which resulted in an initial rapid-fire of many EOF0 / EOF1 events before even the first frame had been displayed. This also means that when transfer is briefly stalled during the vertical blanking interval, the DMA controller will in fact already have read the first 2 KB of the next buffer.

    The vsync period is definitely located at the start of a frame (before the "back porch"), not at the end (after the "front porch"). Ditto for the hsync period. Not really important, but I noticed it.

    SYNC_LOST is not a fatal error condition and does not halt the raster controller nor the DMA controller. The DMA controller merely embeds start of buffer / end of buffer markers in the data stream but is otherwise oblivious to how things are going and just keeps feeding data. The raster controller notices the misplaced markers and signals SYNC_LOST, and I get the impression it actually attempts some kind of recovery but I need to study it a bit more.

  • Manuel Odendahl32 said:
    What are further steps that I should take to address this. I am actually surprised not more people are seeing this.

    There's still a race condition aspect to it. There may be something about your particular configuration (display timings, graphics workload) that triggers it more often than average.

    Also plenty of people who use no graphical UI at all, or a mostly-static one.

    BTW, it appears LCDC does in fact make a shadow copy of the buffer end-pointer: I've been unable to trigger a SYNC-LOST by updating both sets of pointers while DMA was running. It still seems like bizarrely bad luck to strike in between two stores, but there's a really easy way to prevent it: use a 64-bit store to write both fields at the same time. The L3 interconnect will still break it up into two 32-bit stores, but at least you exclude the possibility of an interrupt inbetween.

    If such a store is done at an arbitrary time rather than in response to an EOF interrupt there's still some risk though that LCDC switches to the new buffer precisely in between those two 32-bit stores. As mentioned before, the simple solution is to only update the pointers in response to an EOF interrupt.
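
    A model of the packing for such a combined store (the actual MMIO access in the driver would be a single 64-bit store, e.g. an STRD on the Cortex-A8, to the adjacent base/ceiling registers; little-endian layout with the base register at the lower address is assumed):

```c
#include <stdint.h>

/* Pack base and ceiling into one 64-bit value so that a single store
 * publishes both.  The L3 interconnect still splits it into two
 * 32-bit writes, but no interrupt can land in between them. */
static uint64_t pack_fb_pointers(uint32_t base, uint32_t ceiling)
{
    return (uint64_t)ceiling << 32 | base;
}

/* What the two registers would receive, for a little-endian bus. */
static void unpack_fb_pointers(uint64_t v, uint32_t *base, uint32_t *ceiling)
{
    *base    = (uint32_t)v;
    *ceiling = (uint32_t)(v >> 32);
}
```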

    W.r.t. sync-lost recovery, here's what happens if a single frame has only half the expected size or double the expected size:

    In case of a short framebuffer the raster engine emits a sync-lost upon encountering the unexpected end. The raster timing remains unaffected though, so I wonder what it's outputting... (note: there's a mistake in the first picture since the short framebuffer should make the total data transferred increase by only 1/2 frame worth of data. Once fixed that will place the dma-line below the framebuffer line: the raster engine must be outputting something, but not data from dma.)

    In case of a long framebuffer I get three sync-lost irqs: one right after it expected end of buffer but didn't get one; then after the vertical blanking period it complains again, presumably about the lack of a start-of-buffer indicator. I have no idea why it's complaining the third time: there should be an end-of-buffer marker there. In this situation both frame timing and dma rate appear unaffected (this might however be due to lack of data points; I manually edited in the vertical blanking periods; dma would show a similar shape of course, but 2 KB later).

    Horizontal axis is in frame-times, vertical in frames worth of data.

  • Hi all,

    I made some tilcdc patches to solve this issue (still work-in-progress). You can find them from:

    git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux.git ti/lcdc-fixing-3.14

    It'd be interesting to hear if they a) work b) help with the issue seen.

    The "drm/tilcdc: use single fb mode" patch is the important one, others are mostly cleanup.
  • Thank you Tomi and Matthijs.

    We will put your driver to the test in the following days (crunch time!), and let you know how it goes.

    I have come up with a "fix" for the fbdev driver to address the tearing issue, which I didn't get round to testing yet; that might be our option B. I have let our "fixed" fbdev driver (which disables writing to the DMA registers in the syscall) run for a few days, and we are seeing this situation (very rarely, on a machine with load 8) where I think the interrupt gets delayed so long that the irq handler gets to run when the next DMA flip is done, which also leads to long chains of LOST_SYNC. I don't know if your driver handles that case, but it looks like it does (?).

    I definitely want to get to the bottom of this where I can exactly tell what causes the long chains of LOST_SYNC, just to be sure that the fix is indeed a correct one. One idea was to then reset the raster, and restart afresh as soon as a LOST_SYNC is caught to avoid cascading.

  • Note that based on my (admittedly limited) tests, LCDC seems to auto-recover from lost sync. It may just be the driver that needs recovering. It would probably help to avoid writing the LCDC dma registers unless you get either an EOF0 or EOF1 (but not both). The DMA registers really shouldn't be written to outside the irq handler, but if they are, at the very least make sure this is done under spinlock. Writing both start- and end-pointer with a single 64-bit write should also greatly minimize the window of opportunity for trouble.

    Manuel Odendahl32 said:
    where I think the interrupt gets delayed so long that the irq handler gets to run when the next DMA flip is done

    You can explicitly test for this: if the irq handler is run much too late, both EOF0 and EOF1 will be set. To detect if you were borderline too late, inspect the irq status register again after updating the dma pointers: if EOF0 was set at the start of the irq handler and EOF1 after updating the pointers, or vice versa, you've raced against the DMA flip.

    I'm not sure what the right way to deal with this is. A frame has already been dropped at that time, so you need to catch up somehow.
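
    The borderline-late detection might be sketched as follows, with `before` sampled at irq entry and `after` re-read right after the DMA pointers were written (EOF bit positions assumed from da8xx-fb.c):

```c
#include <stdbool.h>
#include <stdint.h>

#define LCDC_END_OF_FRAME0 (1u << 8)   /* bit positions as in da8xx-fb.c */
#define LCDC_END_OF_FRAME1 (1u << 9)

/* If the opposite EOF bit shows up in 'after', the DMA flip happened
 * mid-update: a frame has been dropped and the driver must
 * resynchronize rather than carry on as if nothing happened. */
static bool raced_against_flip(uint32_t before, uint32_t after)
{
    return ((before & LCDC_END_OF_FRAME0) && (after & LCDC_END_OF_FRAME1)) ||
           ((before & LCDC_END_OF_FRAME1) && (after & LCDC_END_OF_FRAME0));
}
```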

  • Note that I took a slightly different approach in my DRM driver patch I posted earlier. It uses LCDC in single frame mode, which simplifies the driver nicely. I couldn't figure out any benefit in using the LCDC in dual frame mode.


    It writes the DMA pointers outside the irqs, but inside a spinlock. And it tracks the time after vblank, and skips writing to the DMA pointer if the next vblank is coming near.

  • Tomi Valkeinen said:
    Note that I took a slightly different approach in my DRM driver patch I posted earlier. It uses LCDC in single frame mode, which simplifies the driver nicely. I couldn't figure out any benefit in using the LCDC in dual frame mode.

    Most likely it was the very existence of dual frame mode that made me assume you weren't allowed to change the pointers of the active buffer. Since, if you are allowed to, then as you say there doesn't seem to be much use for dual frame mode. I guess it's probably for drivers that just use two fixed framebuffers and never update the pointers at all.

    In general your approach seems like a good one to me. Simplification++

    The time-based test doesn't look quite right to me though, it seems to test (now - last_vsync <= margin) while it should be (now - last_vsync <= frame_time - margin), or equivalently (next_vsync - now <= margin) where next_vsync = last_vsync + frame_time.  Or rather, considering the large value of the "safety threshold" it seems it does perform this test, but for a hardcoded frame rate of 60 fps.

    Also, the irq handler is writing vblank_time without having taken irq_lock; that's unsafe on RT kernels.
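
    Written out as plain arithmetic (times in microseconds, with now >= last_vsync assumed), the corrected test is:

```c
#include <stdbool.h>
#include <stdint.h>

/* Safe iff next_vsync - now >= margin, with
 * next_vsync = last_vsync + frame_time; rearranged so it only
 * depends on the elapsed time since the last vsync. */
static bool safe_to_flip(uint64_t now, uint64_t last_vsync,
                         uint64_t frame_time, uint64_t margin)
{
    return now - last_vsync <= frame_time - margin;
}
```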

  • Matthijs van Duin said:

    The time-based test doesn't look quite right to me though, it seems to test (now - last_vsync <= margin) while it should be (now - last_vsync <= frame_time - margin), or equivalently (next_vsync - now <= margin) where next_vsync = last_vsync + frame_time.  Or rather, considering the large value of the "safety threshold" it seems it does perform this test, but for a hardcoded frame rate of 60 fps.

    Also, the irq handler is writing vblank_time without having taken irq_lock; that's unsafe on RT kernels.


    Thanks for the review! Your points are valid, I'll make the changes.

  • Thanks for the patch Tomi! We have rolled it out to our machines this weekend. Our test booth was dismantled but we are setting up a new monitoring system, and will let you know about results soon!

  • Manuel Odendahl32 said:

    Thanks for the patch Tomi! We have rolled it out to our machines this weekend. Our test booth was dismantled but we are setting up a new monitoring system, and will let you know about results soon!


    Ok, good. Please note that the current patches have been designed for 60 fps displays. So if your display is not 60 fps, you should adjust TILCDC_VBLANK_SAFETY_TRESHOLD_MS, although that's not strictly required if the display's fps is less than 60.

  • Yes I've been following on that, we indeed use 60 fps.
  • And sadly... we saw a snowcrash on one of the machines we rolled out the patch to. This is not yet our webcam monitoring setup, but we had a plethora of SYNC LOSTs in the logs. The machine wasn't even under heavy load.

    Which leads me to wonder what actually is causing the sync lost. Here is the state of my (limited) knowledge: the reason a sync lost happens is that the DMA registers are wrong at *exactly* the time a VBLANK interrupt happens, leading the LCDC DMA engine to latch a wrong pair of registers, with the floor register pointing to the new buffer, and ceil still pointing to the old buffer (in the order we do our writes). Is the only way to do this correctly to use an atomic 64-bit write to both registers at once? Does the interconnect/LCD controller allow for that?

    But this explanation confuses me. The fact that Tomi's patch uses only one channel, which is written to inside a spinlock_irqsave, on a non-RT kernel, with a safety interval after the previous VBLANK, makes it hard for me to understand how the above race condition happens (and I think it has to happen not once, because the LCDC seems to recover after a single SYNC LOST, but multiple times in a row). Here is my tentative explanation: the VBLANK interrupt gets handled late (around 3 ms if we assume 60 fps), which leads the page_flip method, which runs at a later time (say 14 ms), to write to the dma registers inside the danger time interval. As far as I can tell, page_flip is however going to be called from that same irq handler through the vblank_callback handler leading into omaplfb, which makes the 14 ms delay there very improbable. That the VBLANK irq gets serviced 16.67 ms late also seems improbable (?).

    We do indeed have a high load of i2c interrupts at all times because of sensors we have to poll. But we don't see a really bad increase in timing variability at all.

    Here are the steps we are going to take for further testing:

    1. disable set_scanout except from the irq handler, with page_flip only setting next_fb;
    2. move the vblank_time write inside the spinlock (though as far as I know the vanilla kernel doesn't allow nested interrupts);
    3. reinitialize the controller on SYNC_LOST.

    But this feels like stabbing in the dark, because I don't understand what might be going on.
  • Ok, that's bad news...

    Yes, I find it rather unlikely too that the driver would be writing to the registers JUST at the same time the HW latches the addresses.

    I guess there's a small chance that bad addresses are programmed (somehow) to the registers (outside of RAM or such), leading to sync lost. So one thing to test would be to add BUG_ON() and check that the start and end addresses written to the registers are inside the RAM address space.
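
    That sanity check might look like the following sketch. The AM335x maps DDR starting at 0x80000000; the RAM size here is an assumption (256 MB) and should come from the board's actual memory map:

```c
#include <stdbool.h>
#include <stdint.h>

#define DDR_BASE 0x80000000u
#define DDR_SIZE (256u * 1024 * 1024)   /* assumed board RAM size */

/* Check that the base/ceiling pair about to be programmed into the
 * LCDC describes a non-empty range entirely inside RAM; in the
 * driver this would back a BUG_ON() before the register writes. */
static bool fb_range_sane(uint32_t start, uint32_t end)
{
    return start >= DDR_BASE &&
           end > start &&
           end < DDR_BASE + DDR_SIZE;
}
```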

    I don't know how the interconnect works, but the LCDC registers are 32 bit, so I presume even if 64bit write works on the CPU side, they are still written one by one to the registers.

    Do you have any ideas how we could reproduce the problem on our end?

    Sadly, not really. We have a qt application usually redrawing at full fps (triggering the buffer flips on most vblanks), and then we kind of try to increase the load by copying files and exercising other parts of the system. A coworker is trying to emulate the behaviour on a beaglebone.

    The bug happens very rarely; we use a setup of 4 machines with a webcam to try to catch it. We will do a logging version to print out the last written DMA addresses. We did that for the previous driver, and didn't find anything suspicious going on. The bug really seems to be tied to writing the addresses to the registers, as we never saw it happen in FRONTWSEGL (single-buffered) mode.
  • Manuel Odendahl32 said:
    the reason a sync lost happens is due to the DMA registers being wrong at *exactly* the time a VBLANK interrupt happens, leading for the LCDC DMA engine to end up latching in a wrong pair of registers, with the floor register pointing to the new buffer, and ceil still pointing to the old buffer

    That's the only way so far I've been able to generate a sync lost on purpose. It still seems like an awfully narrow window to hit...

    Manuel Odendahl32 said:
    Is the only correct way to do this correctly using an atomic 64 bit write to write to both registers at once? Does the interconnect/LCD controller allow for that?

    You can use a 64-bit write to set both registers "at once" in the sense that it'll be atomic for the cpu and L3 interconnect, although it will still get broken up into two word-writes for delivery to the L4 Initiator Agent and ultimately LCDC. It will get you the tightest gap possible between the writes, although a gap remains (min 20 ns, might be more... hard to say).

    Manuel Odendahl32 said:
    The fact that Tomi's patch uses only one channel, which is written inside a spinlock_irqsave, on a non-RT kernel, with a safety interval after the previous VBLANK, makes it hard for me to understand how the above race condition happens (and I think it has to happen not just once but multiple times in a row, because the LCDC seems to recover after a single SYNC LOST)

    Really really weird indeed.

    Though, maybe changing the framebuffer again during the sync-recovery time somehow confuses the recovery process? That's something I hadn't tested. The whole sync-recovery process is still unclear to me; I have no idea why an oversized framebuffer resulted in 3 sync lost IRQs...

  • Hmm, it does seem like a bad idea to set vblank_time in the IRQ handler, which may be delayed.

    *idea*  If the display PLL output is no higher than 200 MHz (i.e. your pixel clock is max 100 MHz), you can switch PRUSS to that clock (CLKSEL_PRU_ICSS_OCP_CLK) and use it to obtain a timebase that's synchronous to the display subsystem. If you configure its "Industrial Ethernet Timer" (after switching it from iep_clk to ocp_clk, see TRM 4.4.3) to wrap modulo the number of PLL clocks per frame (or some multiple), then reading the counter should give you an uncalibrated but drift-free time relative to vsync. This may come in handy to debug timing issues.
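
    The arithmetic behind that timebase can be sketched as follows. If the timer runs at the display PLL output and wraps modulo the number of clocks per frame, a raw counter read is directly a drift-free (if uncalibrated) phase within the frame. All numbers below are illustrative, not from this thread:

    ```c
    #include <stdint.h>
    #include <stdio.h>

    /* clocks per frame = (timer clock / pixel clock) * htotal * vtotal */
    static uint32_t clocks_per_frame(uint32_t timer_hz, uint32_t pixclk_hz,
                                     uint32_t htotal, uint32_t vtotal)
    {
        return (uint32_t)((uint64_t)timer_hz * htotal * vtotal / pixclk_hz);
    }

    /* nanoseconds since the last wrap, i.e. since vsync up to a fixed offset */
    static uint32_t count_to_ns(uint32_t count, uint32_t timer_hz)
    {
        return (uint32_t)((uint64_t)count * 1000000000u / timer_hz);
    }

    int main(void)
    {
        /* assumed: 200 MHz timer, 50 MHz pixel clock, 1056x525 total (~60 Hz) */
        uint32_t wrap = clocks_per_frame(200000000u, 50000000u, 1056, 525);
        printf("wrap=%u\n", wrap);                 /* timer counts per frame */
        printf("phase_ns=%u\n", count_to_ns(wrap / 2, 200000000u)); /* mid-frame */
        return 0;
    }
    ```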

  • Never calling set_scanout from the page_flip function would eliminate the need for the vblank_time "safe region" entirely, and would only add a possible 1-frame latency, which we can certainly live with.
  • Never calling set_scanout from page_flip will halve the fps. It's true that if we had a queue of buffers given to the driver, it would just add one frame of latency. But we don't have that: we have page_flip() and a page-flip event; there's no queueing done.
  • Matthijs van Duin said:

    You can use a 64-bit write to set both registers "at once" in the sense that it'll be atomic for the cpu and L3 interconnect, although it will still get broken up into two word-writes for delivery to the L4 Initiator Agent and ultimately LCDC. It will get you the tightest gap possible between the writes, although a gap remains (min 20 ns, might be more... hard to say).


    I don't have a clue about the ARM instruction set, but it's a 32-bit CPU. Are you saying it has a store instruction for 64-bit data, which it manages to write atomically? Isn't the usual rule of thumb that on a 32-bit CPU only 32-bit values are handled atomically, not 64-bit ones?

  • You can use the STRD instruction to store two words with one instruction. I'm not an expert at all, but that's what I found after looking into it; I haven't put it to use though. As for halving the FPS: wouldn't the IRQ write next_fb to the DMA registers, then call the vsync callback, which would update next_fb, which would be swapped in at the next IRQ? That would give the full fps as long as a new buffer gets flipped in on every vblank.
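
    A minimal sketch of the paired write from C: on ARMv7, a doubleword-aligned 64-bit store can compile down to a single STRD, which is how you'd get the narrowest possible gap between the two word writes Matthijs described. The helper name is hypothetical, and plain memory stands in for the two adjacent 32-bit LCDC registers here:

    ```c
    #include <stdint.h>
    #include <stdio.h>

    /* Write the base and ceiling registers as one 64-bit store. On ARMv7 the
     * compiler may emit a single STRD for this; on the bus it still becomes
     * two word writes, as discussed above. */
    static inline void lcdc_set_dma_frame(volatile uint64_t *base_ceil_pair,
                                          uint32_t base, uint32_t ceiling)
    {
        /* little-endian: low word = base register, high word = ceiling */
        *base_ceil_pair = ((uint64_t)ceiling << 32) | base;
    }

    int main(void)
    {
        /* emulate the adjacent register pair with aligned memory */
        uint64_t regs __attribute__((aligned(8))) = 0;
        lcdc_set_dma_frame(&regs, 0x80000000u, 0x8012BFFCu);
        printf("base=0x%08x ceiling=0x%08x\n",
               (uint32_t)(regs & 0xFFFFFFFFu), (uint32_t)(regs >> 32));
        return 0;
    }
    ```

    Note that STRD only behaves as a single access when the address is 8-byte aligned, which conveniently holds when the two registers sit at consecutive word offsets starting on a doubleword boundary.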
  • 1. the app calls page_flip()
    2. the driver sets the new fb to next_fb
    3. at vblank, the driver calls set_scanout() with the next_fb
    4. at next vblank, the driver sends the vblank event
    5. the app gets the vblank event, and can call page_flip() again

    So that's two vblanks that need to happen.
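
    The five steps above can be sketched as a small state machine (names like skip_event follow the thread; the rest is illustrative, not the actual driver code). The point is that the vblank which latches next_fb must not deliver the flip event, so the event arrives on the second vblank:

    ```c
    #include <stdbool.h>
    #include <stdio.h>

    struct flip_state {
        int scanout_fb;     /* fb currently programmed into the DMA registers */
        int next_fb;        /* fb queued by page_flip(), -1 if none */
        bool event_pending; /* vblank event owed to the application */
        bool skip_event;    /* suppress the event on the latching vblank */
    };

    /* step 1-2: app calls page_flip(), driver queues the new fb */
    static void page_flip(struct flip_state *s, int fb)
    {
        s->next_fb = fb;
        s->event_pending = true;
        s->skip_event = true;  /* first vblank only latches; event comes later */
    }

    /* steps 3-4: returns true when the vblank event is sent to the app */
    static bool vblank_irq(struct flip_state *s)
    {
        if (s->next_fb >= 0) {
            s->scanout_fb = s->next_fb;  /* set_scanout() with next_fb */
            s->next_fb = -1;
        }
        if (s->event_pending) {
            if (s->skip_event) {         /* latching vblank: hold the event */
                s->skip_event = false;
                return false;
            }
            s->event_pending = false;    /* second vblank: deliver the event */
            return true;
        }
        return false;
    }

    int main(void)
    {
        struct flip_state s = { .scanout_fb = 0, .next_fb = -1 };
        page_flip(&s, 1);
        printf("vblank1 event=%d\n", vblank_irq(&s)); /* latches fb 1, no event */
        printf("vblank2 event=%d\n", vblank_irq(&s)); /* event delivered */
        printf("scanout=%d\n", s.scanout_fb);
        return 0;
    }
    ```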
  • I see, I missed the skip_event flag (I was doing this from memory).