
PROCESSOR-SDK-AM62A: Process CSI-2 RX Data on the fly for a low latency application

Part Number: PROCESSOR-SDK-AM62A


Hi everyone!

Using different high-level libraries such as GStreamer or OpenVX, I have always measured at least 3 frames of delay between a light stimulus on my MIPI CSI-2 sensor and the resulting "stimulated frame" in Linux user space. I am using the ISP to debayer the images coming from my sensor.
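To narrow down where these frames of delay accumulate, one approach is to compare the driver's frame-completion timestamp with the time at which user space actually dequeues the buffer. Below is a minimal sketch of that idea in plain C; it assumes capture is already configured and streaming, and that the driver stamps buffers with CLOCK_MONOTONIC (indicated by V4L2_BUF_FLAG_TIMESTAMP_MONOTONIC in buf.flags):

```c
/* Sketch: measure the delay between the driver finishing a frame and user
 * space getting it. Assumes capture is already configured and streaming. */
#include <linux/videodev2.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <time.h>

static void print_dequeue_delay(int fd)
{
    struct v4l2_buffer buf;
    struct timespec now;

    memset(&buf, 0, sizeof(buf));
    buf.type   = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    buf.memory = V4L2_MEMORY_MMAP;
    if (ioctl(fd, VIDIOC_DQBUF, &buf) < 0) {   /* blocks until a frame is complete */
        perror("VIDIOC_DQBUF");
        return;
    }
    clock_gettime(CLOCK_MONOTONIC, &now);

    double t_done = buf.timestamp.tv_sec + buf.timestamp.tv_usec / 1e6;
    double t_user = now.tv_sec + now.tv_nsec / 1e9;
    printf("frame %u: driver-to-userspace delay = %.3f ms\n",
           buf.sequence, (t_user - t_done) * 1e3);

    ioctl(fd, VIDIOC_QBUF, &buf);              /* re-queue to keep streaming */
}
```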

A delay of 1 frame could be attributed to my sensor, but the rest should be on the AM62A side. Looking at the J721E documentation, I found this interesting table:

Instance        Configuration                     Time taken to receive one frame   ISR latency
CSI2Rx Inst 0   1CH 1080P30 IMX390 Sensor Raw12   33.3 ms (MCU2_0)                  9 us (MCU2_0)

Does that mean that another frame of delay comes from how j721e-csi2rx handles the incoming MIPI stream? The last frame of delay would then come from the V4L2 driver?

I need to develop a piece of software that processes some of the MIPI data "on the fly", with the lowest possible latency. I don't need the ISP to process the data. Where should I start? Am I missing something?

Any information would be greatly appreciated,

Thanks!

  • Hi Aymeric,

    I have assigned your query to our expert. He is currently out of the office for two weeks, so please expect a response when he returns.

    Apologies for the delay.

    Best Regards,

    Suren

  • Hello Aymeric,

    Please check this camera mirror system app note: https://www.ti.com/lit/wp/spradc4/spradc4.pdf. It includes a latency analysis of the CSI2-RX and the ISP. I should be able to provide further help when I'm back in the office.

    Regards,

    Jianzhong

  • Hi Jianzhong,

    Thank you for your quick reply! I know mine isn't as quick, but I wanted to verify my claims before posting. I looked at that very interesting white paper and came across this table:

    Stage           Data      Time (ms)
    Sensor @60 fps  Frame 0    0
    CSI2-RX         Frame 0   16.67
    VISS            Frame 0   24.67
    LDC             Frame 0   32.67
    MSC             Frame 0   40.67
    Deep Learning   Frame 0   48.67
    Display         Frame 0   65.33



    If I understand correctly, each element is triggered by a new frame. This means that in my application, the best-case scenario, if I use the VISS module, would be this one:

    Stage           Data      Time (ms)
    Sensor @30 fps  Frame 0    0
    CSI2-RX         Frame 0   33.33
    VISS            Frame 0   66.66
    App (OpenVX)    Frame 0   99.99


    Or this one without the VISS:

    Stage           Data      Time (ms)
    Sensor @30 fps  Frame 0    0
    CSI2-RX         Frame 0   33.33
    App (OpenVX)    Frame 0   66.66

         

    If I don't use OpenVX or GStreamer, just plain Video4Linux in C with V4L2_MEMORY_MMAP or V4L2_MEMORY_DMABUF, the time between starting the stream (VIDIOC_STREAMON) and the moment I can successfully dequeue my first buffer does indeed appear to be one frame.
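    For reference, here is a minimal sketch of that plain-V4L2 path (the /dev/video0 node, the buffer count, and the pixel format being pre-configured, e.g. via media-ctl, are all assumptions):

    ```c
    /* Minimal V4L2 MMAP capture: request buffers, map and queue them,
     * stream on, then block until the first full frame can be dequeued. */
    #include <fcntl.h>
    #include <linux/videodev2.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define NBUF 4

    int main(void)
    {
        int fd = open("/dev/video0", O_RDWR);   /* assumed device node */
        if (fd < 0) { perror("open"); return 1; }

        struct v4l2_requestbuffers req = {0};
        req.count  = NBUF;
        req.type   = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        req.memory = V4L2_MEMORY_MMAP;
        if (ioctl(fd, VIDIOC_REQBUFS, &req) < 0) { perror("REQBUFS"); return 1; }

        void *map[NBUF];
        for (unsigned i = 0; i < req.count && i < NBUF; i++) {
            struct v4l2_buffer buf = {0};
            buf.type   = V4L2_BUF_TYPE_VIDEO_CAPTURE;
            buf.memory = V4L2_MEMORY_MMAP;
            buf.index  = i;
            if (ioctl(fd, VIDIOC_QUERYBUF, &buf) < 0) { perror("QUERYBUF"); return 1; }
            map[i] = mmap(NULL, buf.length, PROT_READ, MAP_SHARED, fd, buf.m.offset);
            if (map[i] == MAP_FAILED) { perror("mmap"); return 1; }
            if (ioctl(fd, VIDIOC_QBUF, &buf) < 0) { perror("QBUF"); return 1; }
        }

        enum v4l2_buf_type type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        if (ioctl(fd, VIDIOC_STREAMON, &type) < 0) { perror("STREAMON"); return 1; }

        /* This blocks for roughly one frame period (33.3 ms at 30 fps):
         * the driver only completes the buffer once the frame is fully written. */
        struct v4l2_buffer buf = {0};
        buf.type   = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        buf.memory = V4L2_MEMORY_MMAP;
        if (ioctl(fd, VIDIOC_DQBUF, &buf) < 0) { perror("DQBUF"); return 1; }
        printf("frame 0: %u bytes, first byte 0x%02x\n",
               buf.bytesused, *(unsigned char *)map[buf.index]);

        ioctl(fd, VIDIOC_STREAMOFF, &type);
        close(fd);
        return 0;
    }
    ```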

    For my use case, I don't need a full frame to start processing, so I'd be interested in accessing the mapped buffer before it is completely filled and ready to be dequeued. Using MMAP, I tried reading part of the buffer currently being filled, by queueing only two buffers and reading the one that hasn't been dequeued yet. Indeed, I get part of the image that I can later dequeue. I still have to run more tests and modify my MIPI pattern generator card to make sure there isn't another level of frame buffering, but I think I'm getting closer.

    If there is no other buffering layer, then I'm happy with what I've got: I will start my operations on the first part of the image, which means my first result will be available after only 8.33 ms if I start once 1/4 of the frame has been received (33.33 ms / 4; see the sketch after the table below).

    Stage                      Data            Time (ms)
    Sensor @30 fps             Frame 0           0
    V4L2 MMAP (partial read)   Lines [0:270]     8.33
    V4L2 MMAP VIDIOC_DQBUF     Frame 0          33.33
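    For reference, a sketch of that "peek before VIDIOC_DQBUF" experiment. Everything here is an assumption: the line size, the fill pattern the buffer is memset() to before being queued, and above all that in-flight DMA writes are visible to the CPU at all, since V4L2 defines no semantics for a buffer that hasn't been dequeued:

    ```c
    #include <stdint.h>
    #include <time.h>

    #define LINE_BYTES   (1920 * 2)   /* assumption: 1080p, 16 bits per pixel */
    #define LINES_WANTED 270          /* first quarter of a 1080-line frame   */

    /* Busy-wait until the last byte of the wanted region changes from the
     * fill pattern the buffer was memset() to before VIDIOC_QBUF (0x00 is
     * an assumption; pick a value the sensor data cannot produce). */
    static const uint8_t *wait_for_lines(const volatile uint8_t *bufmem)
    {
        const volatile uint8_t *last = bufmem + LINE_BYTES * LINES_WANTED - 1;
        while (*last == 0x00) {
            struct timespec ts = { 0, 100000 };   /* poll every 100 us */
            nanosleep(&ts, NULL);
        }
        /* Lines [0, LINES_WANTED) are (probably) in place; process them. */
        return (const uint8_t *)bufmem;
    }
    ```

    Note that nothing here is guaranteed by V4L2: DMA write ordering and CPU cache coherency are platform details, so on AM62A the mapping may need to be uncached (or the cache invalidated) for the probed bytes to be fresh. That is exactly what the pattern generator test should confirm.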

    My conclusion is that for this application it's going to be hard to use OpenVX or GStreamer, as they are designed to work with whole frames and therefore have no choice but to wait for at least one full frame.

    Just to be sure I'm not reinventing the wheel: does anyone know of another API, or a way to configure Video4Linux, that would let me access the CSI video input as a stream of pixels or lines instead of whole frames?

    Regards,

    Aymeric

  • If I understand correctly, each element is triggered by a new frame. This means that in my application, the best-case scenario, if I use the VISS module, would be this one:

    No, that's not the case. The VISS latency is not the frame duration but its processing time: as soon as the VISS has finished processing, the data is available to the next component in the pipeline. The VISS processing time depends on the frame size; in the app-note table above, for example, the VISS adds 24.67 − 16.67 = 8 ms for a 1080p frame.

    Just to be sure I'm not reinventing the wheel: does anyone know of another API, or a way to configure Video4Linux, that would let me access the CSI video input as a stream of pixels or lines instead of whole frames?

    I don't think there is such a mechanism.