Linux/TMDSEVM572X: OpenCV performance with DSP has strange behavior.

user5989957

Prodigy 50 points

Part Number: TMDSEVM572X

Tool/software: Linux

Hello,

I am using the AM5728-EVM with OpenCV (DSP offload enabled). I am not sure whether my test programs are working correctly.

My questions are as follows ("API" means the OpenCV C++ API):

1. With DSP enabled, processing is too fast:
- The test image (A4 at 300 dpi, about 2600 x 3700 px, JPEG) is processed in less than 1 ms, which seems too fast.
- Could this be asynchronous processing? Do the other API calls then include the wait time for synchronization?

2. After an accelerated API call, the following API calls are slow:
- For example, "Canny" is accelerated, but the next API call, such as "imwrite" on the current image (or "imread" on the next image, even if "imwrite" is skipped), is slow.
- As a result, the total time with DSP enabled is longer than with DSP disabled.

3. If 1 and 2 are normal behavior, is the following sequence an unsuitable use case?
- Receive image from imaging unit -> Image read -> Image processing 1 -> Image processing 2 -> ... -> Image write -> Send image to host
- (I think this flow is very common...)

The following table shows my test results.

- File processing order:
enable DSP (by changing an environment variable) -> DSP+image1 -> DSP+image2 -> DSP+image3 -> disable DSP -> CPU+image1 -> CPU+image2 -> CPU+image3.
- Internal processing order:
imread -> cvtColor -> GaussianBlur -> Canny -> imwrite.

over 6 years ago

Rex Chang over 6 years ago

TI__Guru 50170 points

Hi,

Could you try not running the three images in sequence before and after enabling DSP offload, and instead test one image at a time?

OpenCL in OpenCV has a somewhat different call flow. It submits a task to the DSP OpenCL queue and returns immediately. This allows the accelerator (DSP) and the CPU to work in parallel, as long as data dependencies are satisfied. So when an action that needs the result is requested, the CPU must wait for the submitted task to finish before it can actually perform that action. In many cases the DSP is slower than the A15 (using NEON). The time cost of offloading an action to the DSP includes the waiting time (for the DSP to finish), the actual action time, and any format conversion (done on the CPU). It also depends heavily on the data types used and on whether floating-point operations are involved. This can be accelerated if a DSP-optimized implementation of that action is created.

Rex

Rex Chang over 6 years ago in reply to Rex Chang

TI__Guru 50170 points

Hi,

Have you tried running the test separately with each image? Do you get different results, or are they still the same?
If you don't have any more questions, I'll close this thread. Please open a new one if you have other issues.

Rex

user5989957 over 6 years ago in reply to Rex Chang

Prodigy 50 points

Hi,

Thank you for your reply. I'll try the method you proposed.

By the way, if I want to accelerate the use case I described*, what is the solution? I think the device can't keep many images open at the same time because of the memory size limit...

I imagine something like this: divide the work into separate processes and use IPC? One process would only process image memory with the DSP, and the other would only read/write image files with the CPU?

*: Receive image from imaging unit -> Image read -> Image processing 1 -> Image processing 2 -> ... -> Image write -> Send image to host

Rex Chang over 6 years ago in reply to user5989957

TI__Guru 50170 points

Hi,

The behavior you see is related to the difference between asynchronous (non-blocking) and synchronous (blocking) operations.
With DSP OpenCL offload, a call submits a job and returns right away if all input operands are available.
If an input operand is not yet available, the call waits, as in the imwrite case, which I think is always a synchronous (blocking) operation.

Rex
