This thread has been locked.

If you have a related question, please click the "Ask a related question" button in the top right corner. The newly created question will be automatically linked to this question.

Ping-Pong coordination

Anonymous
Other Parts Discussed in Thread: TVP5146

Hi All,

 

I would like to ask a basic question on Ping-Pong implementation.

 

The CCDC of the VPFE accepts a continuous stream from a BT.656-output image sensor, does the de-interlacing, and puts it somewhere in DDR2 memory…

 

But does the write address ever wrap back around?

 

If it keeps writing to memory starting from a certain address, incrementing the target address after every field (or frame), then after a short while there would be no free memory left, with past (historical) frames occupying all the space.

 

Obviously this cannot be the way the VPFE works. As a beginner, I am simply trying to understand how the VPFE coordinates with the VPBE, or more generally with frame-processing algorithms, to make efficient use of the memory space.

 

I would imagine that a possible scheme is to allocate two blocks of memory: while the VPFE is writing to one, the VPBE reads from the other, and this would be a continually alternating process.

 

Pinning this down in detail, one needs:

1.    A way of alternately changing the VPFE’s target address and the VPBE’s source address.

2.    A way for the VPFE to notify the VPBE of its completion, and vice versa.

 

In the scenario of a more general video-processing algorithm, replacing the VPBE above with the “algorithm function” (see the sketch after this list):

1.    A way of alternately changing the VPFE’s target address and the function’s read address.

2.    A way for the VPFE to notify the function of its completion, and vice versa.
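
To make my question concrete, here is a minimal C sketch of the kind of swap I imagine. The buffer addresses and the two helper functions are purely hypothetical placeholders I made up for illustration; they are not actual driver calls:

#include <stdint.h>

/* Two capture buffers somewhere in DDR2 (addresses assumed to be free). */
#define BUF0  0x81000000u
#define BUF1  0x81200000u

/* Hypothetical helpers that would reprogram the VPFE write address and the
 * consumer (VPBE or algorithm) read address. */
void set_capture_target(uint32_t addr);
void set_consumer_source(uint32_t addr);

static volatile uint32_t write_buf   = BUF0;  /* VPFE writes here             */
static volatile uint32_t read_buf    = BUF1;  /* VPBE / algorithm reads here  */
static volatile int      frame_ready = 0;     /* completion notification flag */

/* Called from an end-of-frame interrupt: swap the two roles and notify. */
void on_frame_complete(void)
{
    uint32_t tmp = write_buf;
    write_buf = read_buf;
    read_buf  = tmp;

    set_capture_target(write_buf);
    set_consumer_source(read_buf);
    frame_ready = 1;    /* the consumer clears this when it finishes */
}

Is something along these lines how it is normally done?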

 

How could this be done? I would like to know how the registers and interrupt vectors need to be set up to achieve this. And, best of all, is there any code example for the process described above?

 

I appreciate any help on this.

 

Sincerely,

Zheng

  • Hi All,

     

    I googled and found Paul Yin’s answer to a similar question, saying that changing SDR_ADDR is one way of doing this.

     

    What is the VPFE’s default mode of operation? If one does not change SDR_ADDR after setting it to a free address at initialization, what will happen? Will SDR_ADDR automatically increment after each frame/field?

     

    If SDR_ADDR does not increment itself after the completion of each field/frame, it looks as though the VPFE will continually overwrite the previous data. In that case, how can the VPBE output the data?

     

    Does the VPBE retrieve the data quickly, before the VPFE writes the next frame/field? Does it have an internal buffer for this?

     

    I also read on page 74 of the 643x EDMA user’s guide (SPRU987A) that the EDMA can do Ping-Pong buffering. Is this Ping-Pong the same in purpose as the algorithmic one I described above? Is it imperative to use the EDMA for Ping-Pong, or can this be done entirely in the VPFE? Comparatively, which one is more efficient?

     

     

    Thanks,

    Zheng

  • Zheng,

    If you don't rewrite SDR_ADDR with some other value, it won't change. The value is latched at the beginning of each frame, and the VPFE will keep overwriting the previous data.

    The VPBE and VPFE shouldn't use the same buffer; that's why you need at least double buffering.

    The VPSS uses its own internal DMA, so there is no need to involve the system EDMA.

  • Anonymous in reply to Paul.Yin

    Dear Paul,

     

    I reckon that the necessity of doing Ping-Pong (double buffering) depends on the execution time of the ISR (interrupt service routine) code.

     

    Please have a look at the picture, which is taken from page 24 of SPRU977A and to which I have added some annotation.

    [annotated picture from page 24 of SPRU977A]

    Consider two cases:

    1.    At the end of one frame (EAV), the interrupt (VDINT0, 1 or 2) triggers ISR1. ISR1 does an extremely simple computation on the frame just finished, for example merely adding up the luminance value of each pixel and nothing else, which takes almost no time to finish. This is quick enough to complete before the next SAV.

     

    In this case there is no competition: by the time the new frame, marked by the next SAV, arrives, the processing of the previous frame has already finished. Nothing prevents the VPFE from overwriting the previous frame at the same DDR2 address, and there is no reason it should not.

     

    2.    At the end of one frame the interrupt triggers ISR2. The difference between ISR2 and ISR1 is that ISR2 performs a fairly complicated computation on the frame just written and takes much longer than the interval between EAV and the next SAV.

     

    In this case there is clearly competition. If SDR_ADDR remains unchanged, then when the next frame starts being written to the address occupied by the previous frame, the ISR2 computation on the previous frame has not yet finished. This is of course a serious problem.

     

    So Ping-Pong is needed only in cases like this, where the defining characteristic is that the ISR’s processing time for the previous frame is longer than the interval between an ending EAV and the next SAV. The ISR can change SDR_ADDR at its very beginning so that the next frame will be written to a different address. There can be more than two buffers, but quantitatively, if the time needed by the ISR is shorter than one frame time, for example 0.8 of a frame, then double buffering is sufficient.
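
    To check my own understanding, this is roughly the ISR I have in mind, as a C sketch only. SDR_ADDR_REG, the buffer addresses and process_frame() are placeholders I made up; they are not taken from any TI header or driver:

    #include <stdint.h>

    /* Stand-in for the memory-mapped SDR_ADDR register; the real address and
     * access method must come from the VPFE register map in SPRU977. */
    extern volatile uint32_t SDR_ADDR_REG;

    #define FRAME_BUF0  0x81000000u        /* assumed free DDR2 regions */
    #define FRAME_BUF1  0x81200000u

    void process_frame(uint32_t addr);     /* the "ISR2"-style computation */

    static volatile int next_is_buf1 = 0;  /* which buffer the next frame uses */

    /* VDINT ISR: first redirect the next frame, then process the frame that
     * was just completed in the other buffer. */
    void vdint_isr(void)
    {
        next_is_buf1 ^= 1;                 /* flip ping <-> pong */
        SDR_ADDR_REG = next_is_buf1 ? FRAME_BUF1 : FRAME_BUF0;

        /* As long as this takes less than one frame time (e.g. 0.8 of a
         * frame), two buffers are sufficient. */
        process_frame(next_is_buf1 ? FRAME_BUF0 : FRAME_BUF1);
    }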

     

    This is my reasoning. Is it correct? Are there any other practical considerations I need to take into account?

     

    Sincerely,

    Zheng

  • Anonymous in reply to Paul.Yin

    Dear Paul,

     

    I have seen several example programs, such as Spectrum’s video_loopback.c, in which the essential part of the program merely includes:

    1.    tvp5146_init( );

    2.    vpfe_init( 0x81000000, 720, 480 );      // Setup Front-End

    3.    vpbe_init( 0x81000000, 720, 480, 0 );   // Setup Back-End

     

    I once thought that there must be double buffering to avoid competition. More concretely, while the VPBE is reading the previous frame from DDR2 memory, the VPFE would need to write to a different address, and this would need to be an alternating process.

     

    But this turns out to be incorrect, because I failed to think about it quantitatively. Of the two cases I described above, the process of

    1.    VPFE DMA writes to DDR2

    2.    VPBE DMA reads from DDR2

    taken together, thanks to the bandwidth of DDR2 and the DMA, is a burst operation and takes very little time for an amount of data as small as a single frame. So the two combined, in terms of execution time, can be classified into the first case (the fast ISR1) which I described in the reply above, and indeed in that situation no Ping-Pong is needed.
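
    To put rough numbers on this (my own back-of-the-envelope estimate, not from the datasheet): a D1 frame of 720 × 480 pixels at 2 bytes per pixel is about 691 kB, so even if the effective DDR2 bandwidth available to the VPSS were only a few hundred MB/s, writing one frame and reading it back would take on the order of a few milliseconds, well inside the ~33 ms frame period at 30 fps.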

     

    Is this understanding correct?

     

    I would also like to verify with you my guess about the internal coordination between the VPFE and the VPBE:

     

    When does the VPFE send an interrupt once a frame is finished? I think there should be a communication process like this:

    1.    The VPFE (the CCDC alone, or the CCDC followed by the resizer or previewer, etc.) finishes sending the BT.656 stream of a complete frame to the DMA.

    2.    The DMA writes the frame into DDR2 in a very short time.

    3.    The DMA sends the VPBE an interrupt (are there other ways?). The VPBE reads the frame into its internal buffer or outputs it to the screen.

    4.    The VPBE sends the DMA an interrupt indicating that it has finished reading the frame (so that the address occupied by the frame in DDR2 can now be written again).

    5.    The DMA sends the VPFE an interrupt to tell it that the VPBE has finished reading the frame.

    6.    Now the VPFE can start writing to the address again (overwriting it).

     

    The details of steps 1–6, as I understand it, are all logically necessary. But in practice, due to the high speed and bandwidth of the DMA and DDR2, the whole back-and-forth can always be finished in a very short time, shorter than the interval between EAV and the next SAV. So in the circuit design, several of the intermediate interrupt acknowledgements can probably be omitted.

     

    Again, is this correct?

     

     

    Sincerely,

    Zheng

  • Zheng,

    BTW, the VPBE and VPFE are not always using the same pixel clock; even if they do, they are not always synced; even when they are synced, the delay is "random" for each run, and the delay is almost never an integer multiple of frames (not even of lines).

    Let's take a step back and revisit your application. Please explain to me what you plan to do in your application. We can discuss further after that.

    For certain applications it is possible to manage without multi-buffering (ping-pong = double buffering); but for the majority of applications, double buffering is not even enough, and sometimes you need triple buffering (some of our applications use 5-8 buffers). Also, most applications are multi-threaded, especially those with OS support. For multi-threaded applications, multi-buffering makes even more sense.
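
    A bare-bones sketch of what an N-buffer ring could look like (generic C, not taken from any TI driver; all names are made up). The capture side publishes a slot when a frame completes, the display/processing side consumes from the oldest published slot, and the count tells you when the consumer has fallen behind:

    #include <stdint.h>

    #define NUM_BUFS  4                        /* 3-8 buffers in real apps */

    static uint32_t buf_addr[NUM_BUFS];        /* DDR2 addresses, set at init   */
    static volatile int head  = 0;             /* slot the capture is filling   */
    static volatile int tail  = 0;             /* oldest published slot         */
    static volatile int count = 0;             /* published, not yet consumed   */

    /* Capture side, called from the end-of-frame interrupt: publish the slot
     * that was just filled and return the address for the next capture. */
    uint32_t capture_frame_done(void)
    {
        if (count == NUM_BUFS - 1) {
            /* Consumer has fallen behind: drop this frame and recapture
             * into the same slot rather than overwrite unconsumed data. */
            return buf_addr[head];
        }
        count++;                               /* publish the filled slot */
        head = (head + 1) % NUM_BUFS;
        return buf_addr[head];
    }

    /* Consumer side: get the oldest published buffer, or 0 if none is ready. */
    uint32_t consume_peek(void)
    {
        return (count > 0) ? buf_addr[tail] : 0;
    }

    /* Consumer side: release the buffer returned by consume_peek(). */
    void consume_done(void)
    {
        tail = (tail + 1) % NUM_BUFS;
        count--;
    }

    In a real multi-threaded application the head/tail/count updates would of course be protected by the OS (semaphores, or briefly disabled interrupts).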

    Is there a field application engineer supporting you directly?

  • Anonymous in reply to Paul.Yin

    Dear Paul,

    Is there any way I can send you a private message via the E2E system? I cannot discuss everything publicly.

    Zheng

  • You will need to add me as a friend first, then we will be able to talk privately. Do you have a sales rep or FAE helping you at the moment?

  • Anonymous in reply to Paul.Yin


    Dear Paul,

     

    “BTW, the VPBE and VPFE are not always using the same pixel clock; even if they do, they are not always synced; even when they are synced, the delay is "random" for each run, and the delay is almost never an integer multiple of frames (not even of lines).”

     

    I don’t quite understand this. Whatever scheme the internal circuitry is running, I think we can basically be certain of two external behaviors:

    1.    The input coming into the VPFE, usually from a camera or decoder, is a BT.656 stream of constant rate.

    2.    The output coming out of the VPBE, which would be supplied to either a VGA or an analog TV display, also has a constant frame rate; otherwise the frame rate at the display device could not be kept constant (unless the display devices were so “intelligent” that they had their own internal buffering to mitigate this, which I think is very rare).

     

    Therefore, if we regard the VPSS (video processing subsystem) as a black box, we should basically observe a time relationship like the one in the table above. Since both the VPFE input and the VPBE output maintain a constant rate, the delay between each corresponding pair of frames is also constant. The delay of five frames between corresponding frames in lines 1 and 3 is assumed only for convenience, and it might not be an integer multiple of the frame time at all.
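
    To attach a number to this (assuming a 30 fps D1 stream purely for illustration): one frame time is 1/30 s ≈ 33.3 ms, so a constant five-frame delay would correspond to roughly 5 × 33.3 ms ≈ 167 ms between a frame entering the VPFE and the corresponding frame leaving the VPBE.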

     

    For the VPSS to work properly, I think it is imperative that this “constant delay” be satisfied.

     

    Nevertheless, we may still have no knowledge of the internal coordination between the VPFE and the VPBE. Perhaps they are not synced in the way I described in my previous reply? This is why I didn’t align the frame numbers in the second line (VPFE output) precisely with the other two lines.

     

    In fact, my ideal expectation would be like this:

     

    The lag of each frame between neighboring stages is always constant. This is very idealistic and fails to account for the intricacies of the chip architecture, so it might be incorrect.

     

    Do I understand you correctly? Could you expand more on this?

     

    Sincerely,

    Zheng

  • Zheng Zhao said:
    “BTW, the VPBE and VPFE are not always using the same pixel clock; even if they do, they are not always synced; even when they are synced, the delay is "random" for each run, and the delay is almost never an integer multiple of frames (not even of lines).”

    Here are some examples of what I meant in the above statement. It probably is not an issue, sorry about the confusion.

    For example: you capture D1 (60 fields per second, 30 frames per second, 8-bit Y/C muxed @ 27 MHz clock), but display 720P60 (60 frames per second, 16-bit Y/C non-muxed @ 74.25 MHz clock); that's not even the same frame rate. The two streams are almost independent, hence not synced.

    Example 2: 720P60 in and 720P60 out. The clock and frame rate are the same, except you don't know the exact delay (hence "random"). By the time you start to display the first captured frame, you may be capturing the 50th pixel on the 3rd row of the 2nd frame, or the 6th pixel on the 50th row of the 3rd frame. The exact delay is especially nondeterministic in a multithreaded application. Afterwards the delay is constant, but you will not know exactly what it is.

  • Anonymous in reply to Paul.Yin

    Dear Paul,

     

    The second example is exactly the question I am asking about: same VPFE/VPBE rate, nondeterministic but constant delay.

     

    Thanks very much.

     

     

    Zheng