TDA4VM: [TDA4 OpenGL] glReadPixels takes unusual longer time to finish.

Mark Kang

Prodigy 230 points

Part Number: TDA4VM

Hi,

When rendering our surround view with OpenGL, we use glReadPixels to read image pixels from the framebuffer into memory.

The image we read back is 120*40 pixels, and the pixel format is BGRA.

Normally glReadPixels takes around 0.397ms to read the 120*40 pixels, as shown in the red box below:

However, glReadPixels sometimes takes more than 100ms to finish, which causes a 3-frame lag in our 30-fps surround view rendering.

Why does glReadPixels sometimes take such an unusually long time (> 100ms) to read 120*40 pixels?
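
For scale, the numbers in this question can be worked out as below. This is only an illustrative calculation, assuming 4 bytes per BGRA pixel; the function names are made up for this sketch and are not from any SDK.

```c
#include <stddef.h>

/* Bytes moved by one readback: width * height * bytes per pixel. */
size_t readback_bytes(int width, int height, int bytes_per_pixel) {
    return (size_t)width * (size_t)height * (size_t)bytes_per_pixel;
}

/* Time budget of a single frame, in milliseconds, at the given frame rate. */
double frame_budget_ms(double fps) {
    return 1000.0 / fps;
}

/* Whole frame periods that elapse during a stall of stall_ms milliseconds. */
int frames_lost(double stall_ms, double fps) {
    return (int)(stall_ms * fps / 1000.0);
}
```

With these helpers, `readback_bytes(120, 40, 4)` is 19200 bytes, `frame_budget_ms(30.0)` is about 33.3 ms, and `frames_lost(100.0, 30.0)` is 3, which matches the 3-frame lag described above. A 19200-byte copy is small, so the copy itself is unlikely to account for a 100ms stall.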

Regards

Mark Kang.

over 2 years ago

0 Erick Narvaez over 2 years ago

TI__Mastermind 36307 points

Hello Mark,

Can you please let us know which SDK you have tried this with? And if you have seen this issue across multiple SDK versions?

The first thing we can do is try and investigate what is different during the lag-spikes. Are you familiar with collecting PVRTune data?

Regards,

Erick

0 Mark Kang over 2 years ago in reply to Erick Narvaez

Prodigy 230 points

Hi Erick

(1) We used Vision SDK v08.04.00.02 and found this issue while implementing the OpenGL-based view function for our product. We have not tried other SDK versions.

(2) We have used PVRTune to watch the CPU/GPU load. We did not find any clue about the 3-frame lag when inspecting the PVRTune data.

Regards.

Mark Kang.

0 Erick Narvaez over 2 years ago in reply to Mark Kang

TI__Mastermind 36307 points

Hello Mark,

What is the FPS you are getting? Is there a stutter in the FPS?

Sometimes, it is difficult to measure the time around functions due to the deferred-rendering nature of the GPU we use.
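
A minimal sketch of how such a measurement could be split up, assuming a Linux target where `clock_gettime` is available. A plain glReadPixels into client memory has to wait for all previously queued rendering before it can return pixels, so a timer placed around it also includes that pending GPU work. The helper names below (`now_ms`, `time_call_ms`, `count_call`) are hypothetical and not part of any SDK.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* for clock_gettime under strict C modes */
#endif
#include <time.h>

/* Monotonic wall-clock time in milliseconds. */
double now_ms(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1000.0 + (double)ts.tv_nsec / 1.0e6;
}

/* Run fn(arg) once and return how long it took, in milliseconds. */
double time_call_ms(void (*fn)(void *), void *arg) {
    double t0 = now_ms();
    fn(arg);
    return now_ms() - t0;
}

/* Stand-in workload so the helper can be tried without a GL context:
 * it only counts how often it was called. */
void count_call(void *arg) {
    ++*(int *)arg;
}
```

In a render loop, one wrapper passed to `time_call_ms` would call glFinish() and a second would call glReadPixels(). Timing the two separately gives a rough idea of how much of a spike is spent waiting for queued rendering and how much is the copy itself.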

Could you share the PVRTune you captured, the actual file, so we can take a look? Hopefully you are able to capture for a stretch of time to see how the frames are behaving.

Thanks,

Erick

0 Mark Kang over 2 years ago

Prodigy 230 points

Hi Erick,

(1) Normally we get 30 fps, but when glReadPixels takes >100ms to finish, the frame rate drops to 26 fps at that moment.

(2) The PVRTune data we captured is at https://drive.google.com/drive/folders/1Tu9gelR9IaUAu7HW8xvlvPBT8S3c8vz3 . We observe that the issue occurs at time 2414.5 (see the figure below), and most of the load readings are 0%.

Mark Kang.

0 Erick Narvaez over 2 years ago in reply to Mark Kang

TI__Mastermind 36307 points

Mark,

I am unable to access the Google Drive link from here. Would you be able to upload this file to the ticket? You can use the "Insert" tab to do so.

Thanks,

Erick

0 Mark Kang over 2 years ago in reply to Erick Narvaez

Prodigy 230 points

Hi Erick,

Sorry, I have now enabled access rights on https://drive.google.com/drive/folders/1Tu9gelR9IaUAu7HW8xvlvPBT8S3c8vz3 . Please try again.

Mark Kang.

0 Erick Narvaez over 2 years ago in reply to Mark Kang

TI__Mastermind 36307 points

Mark,

It's not the access rights on Google Drive. My company network blocks traffic to Google Drive, so I'll need another way to share. The E2E ticket allows uploads if the file is not too big; if you zip it up and upload it here, that would be the best way.

Regards,

Erick

0 Mark Kang over 2 years ago in reply to Erick Narvaez

Prodigy 230 points

Hi Erick,

Please refer to the following .rar file. The file size is 26,713,011 bytes.

PVRTune_data.rar

Mark Kang.

0 Erick Narvaez over 2 years ago in reply to Mark Kang

TI__Mastermind 36307 points

Mark,

Thank you, I can see this issue clearly here. I'm requesting some support from our vendor as well on this topic, hope to get an explanation for this soon.

Meanwhile, a few questions for you:

1) Reproducing the issue on our side is usually the best way to get a bug solved quickly. Is there any opportunity to share the application that exhibits this behavior?

2) Are there any other applications running in the background that could interfere with this app? Is this app part of a larger use case? We want to see if the lag is due to something else that could preempt the driver from running. I see this dip in the memory utilization happening periodically:

Any idea what this could be? It looks like the big time-gap happens around the time one of these happens.

Regards,

Erick

0 Erick Narvaez over 2 years ago in reply to Erick Narvaez

TI__Mastermind 36307 points

Hello,

Any updates on the queries above? I've spoken to our GPU vendor, and we have an advanced set of tools that will allow us to check more values/counters that can help in our debug. Currently you are using PVRTuneDeveloper, but to have access to the advanced tool you will need to sign an NDA with our GPU vendor, Imagination Technologies.

If you do not wish to go this route, we can run the tool for you, but we will need access to at least the binary of this application. Would this be a possibility?

Thanks,

Erick

+1 Mark Kang over 2 years ago in reply to Erick Narvaez

Prodigy 230 points

Hello Erick,

(1) Providing an application that reproduces the issue requires some effort from our software team to port the code from our own board to the TDA4 EVM (mostly changing the BSP from our own board to the TDA4 EVM and trying to reproduce the issue again). Sorry, we have not been able to prepare this TDA4 EVM application so far, since our software team has other, higher-priority TDA4-related SDK issues to solve.

(2) Instead, last week our engineer tried the OpenGL ES3 PBO (Pixel Buffer Object) method to work around this issue. As shown in the figure below, OpenGL ES3 PBO uses an asynchronous DMA transfer and can avoid blocking the CPU for a long time (in our case, 100ms). The good news is that we have run this workaround for 3 hours and did not see any 100ms CPU wait. We will run an overnight test and hope that no 100ms CPU wait occurs. Could you also confirm with the GPU vendor whether our PBO method is the right choice to solve this issue?
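
For readers unfamiliar with the pattern, a minimal sketch of a double-buffered PBO readback follows. It is not the code used in our product, only an illustration under stated assumptions: the helper names and the `HAVE_GLES3` macro are hypothetical, the frame counter is assumed to start at 0, and the sketch reads GL_RGBA because reading BGRA directly typically depends on an extension such as GL_EXT_read_format_bgra. The GL part is compiled only when `HAVE_GLES3` is defined and needs a current OpenGL ES 3.0 context.

```c
#include <stddef.h>
#include <string.h>

/* Size in bytes of one readback: width * height * bytes per pixel. */
size_t pbo_bytes(int width, int height, int bytes_per_pixel) {
    return (size_t)width * (size_t)height * (size_t)bytes_per_pixel;
}

/* PBO that receives this frame's glReadPixels. */
unsigned pbo_write_index(unsigned frame, unsigned count) {
    return frame % count;
}

/* PBO to map this frame: the oldest one, filled (count - 1) frames ago. */
unsigned pbo_read_index(unsigned frame, unsigned count) {
    return (frame + 1u) % count;
}

#ifdef HAVE_GLES3 /* hypothetical macro: define when building against OpenGL ES 3.0 */
#include <GLES3/gl3.h>

#define PBO_COUNT 2u
static GLuint g_pbos[PBO_COUNT];

/* Allocate the PBOs once, after the GL context is current. */
void readback_init(int w, int h) {
    glGenBuffers((GLsizei)PBO_COUNT, g_pbos);
    for (unsigned i = 0; i < PBO_COUNT; ++i) {
        glBindBuffer(GL_PIXEL_PACK_BUFFER, g_pbos[i]);
        glBufferData(GL_PIXEL_PACK_BUFFER, (GLsizeiptr)pbo_bytes(w, h, 4),
                     NULL, GL_STREAM_READ);
    }
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
}

/* Call once per frame after drawing. Returns 1 when dst received pixels
 * (from the frame rendered PBO_COUNT - 1 frames earlier), else 0. */
int readback_frame(unsigned frame, int w, int h, void *dst) {
    size_t bytes = pbo_bytes(w, h, 4);
    int got = 0;

    /* With a pack buffer bound, the last argument is a byte offset into the
     * PBO, and the call can return before the copy has finished. */
    glBindBuffer(GL_PIXEL_PACK_BUFFER, g_pbos[pbo_write_index(frame, PBO_COUNT)]);
    glReadPixels(0, 0, w, h, GL_RGBA, GL_UNSIGNED_BYTE, (void *)0);

    if (frame + 1u >= PBO_COUNT) { /* skip until the oldest PBO has been filled */
        glBindBuffer(GL_PIXEL_PACK_BUFFER, g_pbos[pbo_read_index(frame, PBO_COUNT)]);
        void *src = glMapBufferRange(GL_PIXEL_PACK_BUFFER, 0,
                                     (GLsizeiptr)bytes, GL_MAP_READ_BIT);
        if (src != NULL) {
            memcpy(dst, src, bytes);
            glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
            got = 1;
        }
    }
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
    return got;
}
#endif
```

The trade-off is one frame of latency on the pixels that are read back. Mapping the buffer can still wait if the transfer has not completed, so the extra frame of delay is what makes a long CPU stall unlikely rather than impossible.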

Mark Kang.

+1 Erick Narvaez over 2 years ago in reply to Mark Kang

TI__Mastermind 36307 points

Hello,

Mark Kang said:
Could you also confirm with the GPU vendor whether our PBO method is the right choice to solve this issue?

Yes, I've requested they verify this solution for you, will reply as soon as they do.

Thanks,

Erick

0 Erick Narvaez over 2 years ago in reply to Erick Narvaez

TI__Mastermind 36307 points

Hello,

Yes, Imagination has confirmed that this is a valid approach.

Regards,

Erick
