AM335x garbled LCD and SGX messages

Bruno Morais

Other Parts Discussed in Thread: AM3354, DA8XX, STRIKE

I am working on a board which uses a SOM with an AM3354. There is a LCD touchscreen which is used with a QT app, there is no X server. I am seeing a behavior where apparently randomly, the LCD screen gets totally garbled and goes through a series of apparently random data being displayed.

After a few minutes of going through this, the system recovers and continues to function normally. Whenever this happens, I see a lot of SGX messages being dumped to the kernel log:

[ 2624.874797] PVR_K: User requested SGX debug info
[ 2624.879696] PVR_K: SGX debug (SGX_DDK_Linux_CustomerTI sgxddk 1.10@2359475)
[ 2624.887071] PVR_K: Flip Command Complete Data 0 for display device 11:
[ 2624.893958] PVR_K: SRC 0: ROC DevVAddr:0xD8010DC ROP:0x53df ROC:0x53dc, WOC DevVAddr:0xD8010CC WOP:0x29f0 WOC:0x29ef
[ 2624.905153] PVR_K: SRC 1: ROC DevVAddr:0xD8010B4 ROP:0x53de ROC:0x53db, WOC DevVAddr:0xD8010A4 WOP:0x29ef WOC:0x29ee
[ 2624.916345] PVR_K: Host Ctl flags= 00000006
[ 2624.920840] PVR_K: SGX Host control:
[ 2624.924609] PVR_K: (HC-0) 0x00000001 0x00000000 0x00000000 0x00000001
[ 2624.931502] PVR_K: (HC-10) 0x00000000 0x00000000 0x0000000A 0x00030D40
[ 2624.938485] PVR_K: (HC-20) 0x00000000 0x00000004 0x00000000 0x00000000
[ 2624.945468] PVR_K: (HC-30) 0x00000000 0x00279E4B 0xEFD0A0E0 0x00000000
[ 2624.952437] PVR_K: (HC-40) 0x00000000 0x00000000 0x0006131B 0x00000000
[ 2624.959412] PVR_K: SGX TA/3D control:
[ 2624.963273] PVR_K: (T3C-0) 0x0F003000 0x0F003120 0x0F002000 0x00000000
[ 2624.970253] PVR_K: (T3C-10) 0x00000000 0x00000000 0x00000002 0x00000000
[ 2624.977325] PVR_K: (T3C-20) 0x00000000 0x00000000 0x00000000 0x00000000
[ 2624.984386] PVR_K: (T3C-30) 0x00000000 0x00000000 0x00000000 0x00000000
[ 2624.991457] PVR_K: (T3C-40) 0x00000000 0x00000000 0x00000000 0x00000000
[ 2624.998528] PVR_K: (T3C-50) 0x00000000 0x00000000 0x00000000 0x00000000
[ 2625.005601] PVR_K: (T3C-60) 0x00000000 0x00000000 0x00000000 0x00000000
[ 2625.012661] PVR_K: (T3C-70) 0x00000000 0x00000000 0x00000000 0x00000000
[ 2625.019734] PVR_K: (T3C-80) 0x00000000 0x00000000 0x0F0AD518 0x0F000000
[ 2625.026809] PVR_K: (T3C-90) 0x9C2F3000 0x0F095DC0 0x00000000 0x0F08B920
[ 2625.033870] PVR_K: (T3C-A0) 0x0F00AEA0 0x0F0AD55C 0x0F08B920 0x00000000
[ 2625.040941] PVR_K: (T3C-B0) 0x00000000 0x00000000 0x00000000 0x00000000
[ 2625.048015] PVR_K: (T3C-C0) 0x00000000 0x00000000 0x000053DD 0x000053DC
[ 2625.055092] PVR_K: (T3C-D0) 0x0F000000 0x8000B000 0x8004B000 0x0F004000
[ 2625.062153] PVR_K: (T3C-E0) 0x0F00A420 0x0F00A740 0x0F08B000 0x0F08B000
[ 2625.069228] PVR_K: (T3C-F0) 0x00000000 0x000001FD 0x000001FD 0x00000000
[ 2625.076302] PVR_K: (T3C-100) 0x00000003 0x00000000 0x00000000 0x00000001
[ 2625.083454] PVR_K: (T3C-110) 0x00000000 0x00000000 0x00000000 0x00000000
[ 2625.090613] PVR_K: SGX Kernel CCB WO:0x94 RO:0x94
[ 2683.954773] PVR_K: User requested SGX debug info
[ 2683.959687] PVR_K: SGX debug (SGX_DDK_Linux_CustomerTI sgxddk 1.10@2359475)
[ 2683.967052] PVR_K: Flip Command Complete Data 0 for display device 11:
[ 2683.973942] PVR_K: SRC 0: ROC DevVAddr:0xD8010DC ROP:0x53f3 ROC:0x53f0, WOC DevVAddr:0xD8010CC WOP:0x29fa WOC:0x29f9
[ 2683.985141] PVR_K: SRC 1: ROC DevVAddr:0xD8010B4 ROP:0x53f2 ROC:0x53ef, WOC DevVAddr:0xD8010A4 WOP:0x29fa WOC:0x29f8
[ 2683.996339] PVR_K: Host Ctl flags= 00000006
[ 2684.000836] PVR_K: SGX Host control:
[ 2684.004606] PVR_K: (HC-0) 0x00000001 0x00000000 0x00000000 0x00000001
[ 2684.011501] PVR_K: (HC-10) 0x00000000 0x00000000 0x0000000A 0x00030D40
[ 2684.018484] PVR_K: (HC-20) 0x00000000 0x00000004 0x00000000 0x00000000
[ 2684.025468] PVR_K: (HC-30) 0x00000000 0x00288513 0xEFD0A0E0 0x00000000
[ 2684.032439] PVR_K: (HC-40) 0x00000000 0x00000000 0x000614E1 0x00000000
[ 2684.039416] PVR_K: SGX TA/3D control:
[ 2684.043278] PVR_K: (T3C-0) 0x0F003000 0x0F003120 0x0F002000 0x00000000
[ 2684.050264] PVR_K: (T3C-10) 0x00000000 0x00000000 0x00000002 0x00000000
[ 2684.057340] PVR_K: (T3C-20) 0x00000000 0x00000000 0x00000000 0x00000000
[ 2684.064402] PVR_K: (T3C-30) 0x00000000 0x00000000 0x00000000 0x00000000
[ 2684.071479] PVR_K: (T3C-40) 0x00000000 0x00000000 0x00000000 0x00000000
[ 2684.078555] PVR_K: (T3C-50) 0x00000000 0x00000000 0x00000000 0x00000000
[ 2684.085629] PVR_K: (T3C-60) 0x00000000 0x00000000 0x00000000 0x00000000
[ 2684.092690] PVR_K: (T3C-70) 0x00000000 0x00000000 0x00000000 0x00000000
[ 2684.099765] PVR_K: (T3C-80) 0x00000000 0x00000000 0x0F0AD55C 0x0F000000
[ 2684.106841] PVR_K: (T3C-90) 0x9C2F3000 0x0F094A00 0x00000000 0x0F08B4C0
[ 2684.113902] PVR_K: (T3C-A0) 0x0F00AEA0 0x0F0AD55C 0x0F08B920 0x00000000
[ 2684.120977] PVR_K: (T3C-B0) 0x00000000 0x00000000 0x00000000 0x00000000
[ 2684.128053] PVR_K: (T3C-C0) 0x00000000 0x00000000 0x000053F2 0x000053F0
[ 2684.135130] PVR_K: (T3C-D0) 0x0F000000 0x8000B000 0x8004B000 0x0F004000
[ 2684.142193] PVR_K: (T3C-E0) 0x0F00A420 0x0F00A740 0x0F08B000 0x0F08B000
[ 2684.149268] PVR_K: (T3C-F0) 0x00000000 0x000001FD 0x000001FD 0x00000000
[ 2684.156342] PVR_K: (T3C-100) 0x00000003 0x00000000 0x00000000 0x00000000
[ 2684.163496] PVR_K: (T3C-110) 0x00000000 0x00000000 0x00000000 0x00000000
[ 2684.170658] PVR_K: SGX Kernel CCB WO:0x3B RO:0x3B

I have tried to reproduce this behavior exhaustively but I simply cannot reproduce it at will. It seems to be completely non-deterministic. I would like to understand those messages that are spit out by the kernel, as may be that will help me debug this.

Thank you.

over 10 years ago

0 Matthijs van Duin over 9 years ago in reply to Tomi Valkeinen

Mastermind 8040 points

Tomi Valkeinen said:
I don't have a clue about ARM instruction set, but it's a 32 bit CPU. Are you saying it has a store instruction for 64 bit data, which it manages to write atomically? Isn't usually the rule of thumb that on 32 bit cpu, only 32 bit values are handled atomically, not 64 bit.

ARM has 64-bit load/store. I don't think they're architectually guaranteed to be atomic unless you use LDREX / STREX. Aligned 64-bit load/store is definitely atomic on the cortex-a8 though. All memory paths internally are 64-bit or 128-bit wide, as is the external bus interface, so there's simply no opportunity whatsoever to break it into words.

On a cortex-A8 you can actually make much bigger transfers that are still atomic from the cpu's point of view: an LDM / STM instruction cannot be not interrupted. An STM however completes once the data is in the write buffer*, and I don't exclude the possiblity you can get other transfers inserted afterward, though aligned 64-bit transfers will stay atomic all the way to the L3. It's also quite possible bursts stay atomic in the A8. Once a burst goes out the bus interface, I'm reasonably sure it remains atomic while travelling on the L3 interconnect.

(* except stores to strongly-ordered memory)

0 Jonathan Fisher90 over 9 years ago in reply to Tomi Valkeinen

Prodigy 50 points

Hello once again,

We've started circling back on this, Tomi, as the issue has not yet gone away. We applied your patches from the branch listed above, but it doesn't appear to have solved the issue.

However, I've seen some more patches it looks like you've created in this area (e.g. lists.freedesktop.org/.../100029.html), and was curious if these are patches that could or should be applied in the 3.14 branch?

Jonathan

0 Matthijs van Duin over 9 years ago in reply to Jonathan Fisher90

Mastermind 8040 points

Jonathan Fisher90 said:
the issue has not yet gone away

It may be interesting to note there are more LCDC problems, with a possibility they may be linked:

1. When using the framebuffer console, the screen content may jump 32 pixels to the right (with wrap to next line) and jump back again. Both jumps seem to require some sort of trigger (seems fairly random, like typing a command or triggering a vbell) but appears stable when fully idle. Robert Nelson reports he's seen this one all the way back to the 3.8 kernels.

2. Someone with severe garbling during some kinds of intensive memory access from the cpu. This was apparently traced to fifo underruns occurring on lcdc (see also this). Interestingly, 32-pixel shift was also seen, often as a prelude to full-blown garbling.

3. I have acquired a truly bizarre one: on a kernel with no framebuffer console (CONFIG_VT=n) and a 1024x768@60 lcd panel attached (not via HDMI) the last pixel (1023) of each line is corrupted, specifically it displays a copy of pixel 991 = 1023 - 32 of that line. The phenomenon was first noticed with a Qt5 application (QPA=linuxfb) but it also appears if I manually write to /dev/fb0. The content of /dev/fb0 still reads back correctly. Since this variant of the problem is completely reproducible and stable it should be easier to debug, although I'm still not sure where to begin. Unfortunately the flimsy prototype setup broke and the problem didn't manifest itself on a small low-res HDMI screen I attached. I haven't had time yet to dig further into this since other stuff has priority for me now, but I'll certainly need to face it again once the new prototype shows up on my desk.

All these have in common that they're highly mysterious, there's a "32-pixel" thing that shows up, although in different ways, and no obvious fault in software or kernel driver. The more blatent garbling in problem 2 also seems to link it to your problem.

Another thing they have in common is that all of them could be caused by problems with the async fifo between LCDC's dma controller and the raster controller...

0 Manuel Odendahl32 over 9 years ago in reply to Matthijs van Duin

Intellectual 280 points

Hey people,

I am very happy to report that I think we finally nailed it.

We had a UI update that was pushing us to constantly be at 60 fps, and snowcrash was basically making that release unshippable (it would happen every couple of hours on every single printer). We refined out testing farm setup, because we now have IoT reporting and a script to recognize dashboard, meaning we could run the UI headless on a simple board. This allowed us to test iterations of our patches very quickly (< 2 hours turnaround).

Tomi Valkeinen s patch was very very close, but we realized that the libEGL was using tilcdc_crtc_mode_set_base to induce a page flip, not tilcdc_crtc_page_flip. After that it was fairly easy to add the spinlocking + safety threshold to that method as well.

We also added a 64 bit bus write for the FB addresses, we didn't go entirely crazy on the atomicity but this also seems to help (mostly it removes the unnecessary memory barrier between the 2 register writes).

We would like to really thank everybody that helped on this thread and especially Tomi Valkeinen and Matthijs van Duin for their tremendous help.

I am attaching our fairly short patch. We have sinced shipped this firmware, and had no in field crashes whatsoever, with it being deployed to multiple thousands machines.

Thanks again! Manuel

Fullscreen snowcrash-fix.diff Download

diff --git a/drivers/gpu/drm/tilcdc/tilcdc_crtc.c b/drivers/gpu/drm/tilcdc/tilcdc_crtc.c
index 47a7baf..a70f37b 100644
--- a/drivers/gpu/drm/tilcdc/tilcdc_crtc.c
+++ b/drivers/gpu/drm/tilcdc/tilcdc_crtc.c
@@ -57,29 +57,44 @@ static void unref_worker(struct drm_flip_work *work, void *val)
 	mutex_unlock(&dev->mode_config.mutex);
 }
 
+static inline void __raw_writell(u64 val, volatile void __iomem *addr)
+{
+	asm volatile("strd %1, %0"
+		     : "+Qo" (*(volatile u64 __force *)addr)
+		     : "r" (val));
+}
+
+#define writell(v,c)		({ __iowmb(); __raw_writell(v,c); })
+
 static void set_scanout(struct drm_crtc *crtc, struct drm_framebuffer *fb)
 {
 	struct tilcdc_crtc *tilcdc_crtc = to_tilcdc_crtc(crtc);
 	struct drm_device *dev = crtc->dev;
+	struct tilcdc_drm_private *priv = dev->dev_private;
 	struct drm_gem_cma_object *gem;
 	unsigned int depth, bpp;
 	dma_addr_t start, end;
+	u64 *dma_regs0 = priv->mmio + LCDC_DMA_FB_BASE_ADDR_0_REG;
+
+	u64 fb_addrs;
 
 	drm_fb_get_bpp_depth(fb->pixel_format, &depth, &bpp);
 	gem = drm_fb_cma_get_gem_obj(fb, 0);
 
 	start = gem->paddr + fb->offsets[0] +
-		crtc->y * fb->pitches[0] +
-		crtc->x * bpp / 8;
+			crtc->y * fb->pitches[0] +
+			crtc->x * bpp / 8;
 
 	end = start + (crtc->mode.vdisplay * fb->pitches[0]);
 
-	tilcdc_write(dev, LCDC_DMA_FB_BASE_ADDR_0_REG, start);
-	tilcdc_write(dev, LCDC_DMA_FB_CEILING_ADDR_0_REG, end);
+
+	fb_addrs = (u64)(end) << 32 | start;
+	// pure quality
+	writell(fb_addrs, dma_regs0);
 
 	if (tilcdc_crtc->curr_fb)
 		drm_flip_work_queue(&tilcdc_crtc->unref_work,
-			tilcdc_crtc->curr_fb);
+				    tilcdc_crtc->curr_fb);
 
 	tilcdc_crtc->curr_fb = fb;
 }
@@ -438,10 +453,13 @@ static int tilcdc_crtc_mode_set(struct drm_crtc *crtc,
 }
 
 static int tilcdc_crtc_mode_set_base(struct drm_crtc *crtc, int x, int y,
-		struct drm_framebuffer *old_fb)
+				     struct drm_framebuffer *old_fb)
 {
+	struct tilcdc_crtc *tilcdc_crtc = to_tilcdc_crtc(crtc);
 	struct drm_device *dev = crtc->dev;
 	int r;
+	unsigned long flags;
+	s64 tdiff;
 
 	r = tilcdc_verify_fb(crtc, crtc->fb);
 	if (r)
@@ -451,7 +469,16 @@ static int tilcdc_crtc_mode_set_base(struct drm_crtc *crtc, int x, int y,
 
 	pm_runtime_get_sync(dev->dev);
 
-	set_scanout(crtc, crtc->fb);
+	spin_lock_irqsave(&tilcdc_crtc->irq_lock, flags);
+
+	tdiff = ktime_to_ms(ktime_sub(ktime_get(), tilcdc_crtc->vblank_time));
+
+	if (tdiff <= TILCDC_VBLANK_SAFETY_TRESHOLD_MS)
+		set_scanout(crtc, crtc->fb);
+	else
+		tilcdc_crtc->next_fb = crtc->fb;
+
+	spin_unlock_irqrestore(&tilcdc_crtc->irq_lock, flags);
 
 	pm_runtime_put_sync(dev->dev);
 
@@ -672,12 +699,12 @@ irqreturn_t tilcdc_crtc_irq(struct drm_crtc *crtc)
 		unsigned long flags;
 		bool skip_event = false;
 
-		tilcdc_crtc->vblank_time = ktime_get();
-
 		drm_flip_work_commit(&tilcdc_crtc->unref_work, priv->wq);
 
 		spin_lock_irqsave(&tilcdc_crtc->irq_lock, flags);
 
+		tilcdc_crtc->vblank_time = ktime_get();
+
 		if (tilcdc_crtc->next_fb) {
 			set_scanout(crtc, tilcdc_crtc->next_fb);
 			tilcdc_crtc->next_fb = NULL;

0 Manisha Agrawal over 9 years ago in reply to Manuel Odendahl32

TI__Mastermind 22946 points

Hi Manuel,

Really glad to know that the issue is been resolved at your end. Thank you for sharing the root cause and fix for the problem. We will take care to make the tilcdc driver further robust for calls from libEGL.

Regards,
Manisha

0 Tomi Valkeinen over 9 years ago in reply to Manuel Odendahl32

TI__Prodigy 625 points

Manuel Odendahl32 said:
we realized that the libEGL was using tilcdc_crtc_mode_set_base to induce a page flip, not tilcdc_crtc_page_flip.

Hmm that's interesting. What How did libEGL do the page flip? What API functions it calls? I was under the impression the mode_set_base will be called only for full mode-sets, which is a "heavy" operation (i.e. no guarantee that you can get full fps with it).

In any case, just for your information, I think all this has changed quite a bit with the later releases, which have the updated SGX driver. And tilcdc is about to be converted to support the so called atomic modesetting, which also changes how the driver works (for better, I think). We need to verify that all the code paths in the new code handle this problem correctly.

Manuel Odendahl32 said:
We also added a 64 bit bus write for the FB addresses, we didn't go entirely crazy on the atomicity but this also seems to help (mostly it removes the unnecessary memory barrier between the 2 register writes).

So you actually saw a diff in your testing with and without 64 bit write? It sounds very unlikely that the HW would read the registers exactly between the two 32 bit writes, but if you actually saw it affect, I guess it's safer to add that to the mainline driver too.

0 Manuel Odendahl over 9 years ago in reply to Tomi Valkeinen

Prodigy 50 points

I'm not sure what libEGL methods are called, we use Qt and basically defer all the low-level things to it. Sadly libEGL itself is a binary blob.

As for 64 bit, I am fairly sure we saw a significant difference between 32 bit and 64 bit writes. (the 32 bit write also has memory barriers and a function call iirc). I don't exactly remember the duration of our tests, but it looks from our database records that we had around 70 individual snowcrashes with normal kernel vs 3 with 64 bit writes, in approximately the same period of time. I really don't remember what this means exactly however.

We will stick to our current setup for a bit now I think, but good to hear that the new system will improve things further.

Cheers, Manuel

0 Tomi Valkeinen over 9 years ago in reply to Manuel Odendahl

TI__Prodigy 625 points

Manuel Odendahl said:
I'm not sure what libEGL methods are called, we use Qt and basically defer all the low-level things to it. Sadly libEGL itself is a binary blob.

Ok, no prob. I can dig it up from somewhere.

Manuel Odendahl said:
As for 64 bit, I am fairly sure we saw a significant difference between 32 bit and 64 bit writes. (the 32 bit write also has memory barriers and a function call iirc). I don't exactly remember the duration of our tests, but it looks from our database records that we had around 70 individual snowcrashes with normal kernel vs 3 with 64 bit writes, in approximately the same period of time. I really don't remember what this means exactly however.

Ok. Were those tests made without the mode_set_base improvement? In other words, did you need both fixes (64bit write and the mode_set_base) to get a reliable system, or would only mode_set_base be enough?

I think both improvements make sense, though, I'm just curious.

0 Manuel Odendahl over 9 years ago in reply to Tomi Valkeinen

Prodigy 50 points

Only the mode_set_base improvement is most probably enough. Because we saw a tangible improvement in the baseline by just switching to the 64bit transfer, we thought "let's throw that in as well". :) Thanks!

0 Matthijs van Duin over 9 years ago in reply to Tomi Valkeinen

Mastermind 8040 points

Tomi Valkeinen said:
I was under the impression the mode_set_base will be called only for full mode-sets, which is a "heavy" operation (i.e. no guarantee that you can get full fps with it).

I'd presume full mode-sets call mode_set rather than mode_set_base?

My current impression is that (atomic) modesetting basically replaces a whole pile of older calls, including page-flip, which is now essentially a special case of drm_mode_atomic but just to update the FB_ID property of one given CRTC.

0 Matthijs van Duin over 9 years ago in reply to Matthijs van Duin

Mastermind 8040 points

Matthijs van Duin said:
page-flip, which is now essentially a special case of drm_mode_atomic but just to update the FB_ID property of one given CRTC.

Just to add an example of when the basic page-flip call just doesn't suffice: say you have a windowed UI that uses double-buffering, and you're using an overlay for video playback inside a window. Now when that window is being moved, the page flip of the primary layer (containing the window decoration) needs to coincide with an update of the overlay coordinates to avoid undesirable artifacts. With the atomic modesetting API, that's just two extra properties to include in the call.

Obviously not applicable with tilcdc which has no overlays, but it was just an example. (It does apply to omapdrm of course, which fortunately seems to handle it well: I recently made a little test that animated the position and size of an overlay using atomic modesetting, and it ran flawlessly on my omap5 uevm at 60 fps with no significant CPU load).

Update: actually, a far more serious problem is that page_flip only works for the primary layer. Comments in drm_atomic.c and drm_atomic_helper.c actually refer to it as "legacy page_flip", so my guess would be that eventually the call will be removed from drivers and will just be emulated using modesetting by the DRM core for userspace compat.

0 Tomi Valkeinen over 9 years ago in reply to Matthijs van Duin

TI__Prodigy 625 points

Matthijs van Duin said:
I'd presume full mode-sets call mode_set rather than mode_set_base?

Well, with "full mode-set" I was referring to modesetting in general, versus page-flipping. mode_set_base is part of that.

Matthijs van Duin said:
My current impression is that (atomic) modesetting basically replaces a whole pile of older calls, including page-flip, which is now essentially a special case of drm_mode_atomic but just to update the FB_ID property of one given CRTC.

Atomic modesetting does replace the driver's configuration machinery almost totally. The legacy API will still remain, of course.

0 Tomi Valkeinen over 9 years ago in reply to Matthijs van Duin

TI__Prodigy 625 points

Matthijs van Duin said:
actually, a far more serious problem is that page_flip only works for the primary layer. Comments in drm_atomic.c and drm_atomic_helper.c actually refer to it as "legacy page_flip", so my guess would be that eventually the call will be removed from drivers and will just be emulated using modesetting by the DRM core for userspace compat.

Yes, driver's page_flip goes away when the driver is converted to atomic modesetting. See mainline omap_crtc.c for example. Same will happen with tilcdc when it's converted to atomic modesetting.

But this is going quite out of topic =).

0 user731153 over 8 years ago in reply to Manuel Odendahl32

Prodigy 150 points

Hello , i also make this mistake, but, i patch failed by using your 'snowcrash.diff' file. can u give me your source code after patching. thank u

0 user731153 over 8 years ago

Prodigy 150 points

Hello , i also make this mistake, but, i patch failed by using your 'diff' file. can u give me your source code after patching. thank u

Processors

Processors forum

AM335x garbled LCD and SGX messages