FileRead->Encode->FileWrite Usecase

bing liu

Expert 2900 points

Hi,

I created a FileRead->Encode->FileWrite usecase. I use the following steps to feed YUV data to the encoder input:

0. VdecVdis_ipcFramesFillBufInfo()

1. read yuv data from input file

2. Vdis_putFullVideoFrames()

3. Vdis_getEmptyVideoFrames()

4. VdecVdis_ipcFramesFreeFrameBuf()

I use the following steps to get H.264 data from the encoder output:

0. Venc_getBitstreamBuffer()

1. write h264 data to file

2. Venc_releaseBitstreamBuffer()

The program now runs. However, when I feed one frame per channel to the encoder input, I get the new-data-available callback twice, and if there are more than 2 encode channels, the channels with index greater than 1 receive each frame twice. The following figure shows the situation.

I don't know why this happens. Could someone give me some advice?

over 12 years ago

0 Badri Narayanan over 12 years ago

TI__Guru 59700 points

Attach the usecase file where you create and connect the links, and the application file that calls steps 0 - 4 you mentioned above.

0 bing liu over 12 years ago in reply to Badri Narayanan

Expert 2900 points

Hi Badri,

Thanks for your reply! I have attached the files at the end of this post. I referred to the file demo_vdec_vdis_frames_send.c to feed YUV data to the encoder. I don't quite understand the following code in the VdecVdis_ipcFramesFillBufInfo() function:

bufList->frames[bufList->numFrames].addr[0][0] =frameObj[i].bufVirt;

bufList->frames[bufList->numFrames].phyAddr[0][0] = (Ptr)frameObj[i].bufPhy;

Since each frame in bufList corresponds to one encode channel, why is the address of every frame set to the same value? I modified this in my usecase: I use multiple frameObj entries, and each one holds one frame.

4846.Desktop.zip

0 bing liu over 12 years ago in reply to Badri Narayanan

Expert 2900 points

Hi Badri,

Did you find any issues in my code?

0 Badri Narayanan over 12 years ago in reply to bing liu

TI__Guru 59700 points

Your application code is wrong.

IpcFramesFillBufInfo will populate only those channels which have frameObj[i].refCnt == 0.

But when reading from file you are doing

for(i=0; i<ENC_CH; i++)
{
    fread((unsigned char *)(bufList.frames[i].addr[0][0]), 1, gReadYUVConfig.width * gReadYUVConfig.height, gReadYUVConfig.fin[i]);
    fread((unsigned char *)(bufList.frames[i].addr[0][1]), 1, gReadYUVConfig.width * gReadYUVConfig.height * 0.5, gReadYUVConfig.fin[i]);
    if(feof(gReadYUVConfig.fin[i]))
    {
        fseek(gReadYUVConfig.fin[i], 0, SEEK_SET);
    }
}

Correct code should be

if(bufList.numFrames)
{
    //printf("0rfcnt is %d\n", frmObj[1].refCnt);
    for(i=0; i<bufList.numFrames; i++)
    {
        fread((unsigned char *)(bufList.frames[i].addr[0][0]), 1, gReadYUVConfig.width * gReadYUVConfig.height, gReadYUVConfig.fin[bufList.frames[i].channelNum]);
        fread((unsigned char *)(bufList.frames[i].addr[0][1]), 1, gReadYUVConfig.width * gReadYUVConfig.height * 0.5, gReadYUVConfig.fin[bufList.frames[i].channelNum]);
        if(feof(gReadYUVConfig.fin[bufList.frames[i].channelNum]))
        {
            fseek(gReadYUVConfig.fin[bufList.frames[i].channelNum], 0, SEEK_SET);
        }
    }
    printf("feed buflist to encode\n");
    status = Venc_putFullVideoFrames(&bufList);
    OSA_assert(0 == status);
}
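
One detail in the read loop may be worth tightening separately: feof() only becomes true after a read has already come up short, so the loop can submit a frame whose buffer was not refilled when the file wraps around. The sketch below is one possible helper, not code from the RDK. It assumes a two-plane YUV420 semi-planar layout (a luma plane of width*height bytes followed by a chroma plane of half that size), which is what the two fread calls above suggest. The name read_yuv420sp_frame and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read one YUV420 semi-planar frame from fin into the
 * luma and chroma planes. On a short read (end of file) it rewinds and
 * retries once, so the caller always gets a complete frame.
 * Returns 0 on success, -1 if no complete frame could be read. */
int read_yuv420sp_frame(FILE *fin, unsigned char *luma, unsigned char *chroma,
                        size_t width, size_t height)
{
    const size_t lumaSize   = width * height;
    const size_t chromaSize = lumaSize / 2;   /* integer math instead of * 0.5 */
    int attempt;

    assert(fin != NULL && luma != NULL && chroma != NULL);

    for (attempt = 0; attempt < 2; attempt++) {
        if (fread(luma, 1, lumaSize, fin) == lumaSize &&
            fread(chroma, 1, chromaSize, fin) == chromaSize) {
            return 0;                         /* got a complete frame */
        }
        /* Short read: go back to the first frame and try again. */
        if (fseek(fin, 0, SEEK_SET) != 0) {
            return -1;
        }
    }
    return -1;   /* the file holds less than one full frame */
}
```

In the loop above, the two fread calls and the feof check for a frame could then be replaced by one call that passes fin[bufList.frames[i].channelNum] and the two plane addresses, with the return value checked before the frame is submitted.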

0 bing liu over 12 years ago in reply to Badri Narayanan

Expert 2900 points

Hi Badri,

Thanks for your reply!

I applied the changes you suggested, but the result is the same: the channels with index 2 and above still get each frame twice. In my code I assign one frameObj to each channel, so bufList.frames[i] corresponds to channel i, and the two code sections above are equivalent.

I wonder whether something is wrong with the IpcFramesFreeFrameBuf() function. It seems that the channels that get each frame twice do not release their input buffers properly.

0 Badri Narayanan over 12 years ago in reply to bing liu

TI__Guru 59700 points

The application logic you are currently using is highly error-prone.

It is better to keep a separate queue of free buffers per channel.

1. Allocate an empty buffer from the channel-specific free queue.

2. Fill it with data.

3. Put the full frames.

4. Get back the empty frame and return it to the channel-specific free queue.

This way there is no chance of reading content wrongly.
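
As a rough illustration of the per-channel free queue idea, here is a minimal sketch in plain C. It does not use OSA_que or any RDK API. The names FreeQueue, freeq_put and freeq_get, and the values of NUM_CH and BUFS_PER_CH, are all hypothetical. The queue is a small fixed-size ring of buffer pointers and is not thread-safe, so a real application that fills and frees buffers from different threads would need locking or a library queue instead.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_CH       4   /* hypothetical number of encode channels */
#define BUFS_PER_CH  3   /* hypothetical number of frame buffers per channel */

/* Fixed-size FIFO of free buffer pointers for one channel. */
typedef struct {
    void  *slot[BUFS_PER_CH];
    size_t head;    /* index of the next buffer to hand out */
    size_t count;   /* number of free buffers currently queued */
} FreeQueue;

static FreeQueue gFreeQ[NUM_CH];   /* zero-initialized: all queues start empty */

/* Return a buffer to its channel's free queue. 0 on success, -1 if full. */
int freeq_put(int ch, void *buf)
{
    FreeQueue *q;

    assert(ch >= 0 && ch < NUM_CH);
    q = &gFreeQ[ch];
    if (q->count == BUFS_PER_CH) {
        return -1;
    }
    q->slot[(q->head + q->count) % BUFS_PER_CH] = buf;
    q->count++;
    return 0;
}

/* Take a free buffer for the channel, or NULL if none is free right now. */
void *freeq_get(int ch)
{
    FreeQueue *q;
    void *buf;

    assert(ch >= 0 && ch < NUM_CH);
    q = &gFreeQ[ch];
    if (q->count == 0) {
        return NULL;
    }
    buf = q->slot[q->head];
    q->head = (q->head + 1) % BUFS_PER_CH;
    q->count--;
    return buf;
}
```

With this shape, step 1 becomes freeq_get(ch) and step 4 becomes freeq_put(ch, buf) using the channel number of the returned frame. A channel whose queue is empty is skipped for that iteration rather than having a buffer reused while the encoder still holds it.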

0 bing liu over 12 years ago in reply to Badri Narayanan

Expert 2900 points

Hi Badri,

Thanks so much for your reply! What do you mean by a separate queue of free buffers per channel? Should I use OSA_que? Could you give an example of how to use OSA_que?

0 bing liu over 12 years ago in reply to Badri Narayanan

Expert 2900 points

Hi Badri,

I found the issue. In the FileFD_ipcBitsProcessFullBufs() function, after getting the encoder output bufList, I should use fullBufList.numBufs as the loop bound. I had used ENC_CH by mistake.

Thanks so much for your kind help!

Another question: we now want to do the video mixing on the ARM side instead of using the SwMs link. Since this needs a lot of memcpy operations, we would like to know whether we could use assembly code.
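
A minimal sketch of that kind of fix, using hypothetical stand-in types rather than the real RDK bitstream structures: the loop visits only the entries the list reports as valid, so stale entries left over from an earlier call are never written out a second time. BitBuf, BitBufList, process_full_bufs and the field names are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>

#define ENC_CH 4   /* configured number of encode channels (hypothetical value) */

/* Hypothetical stand-ins for one bitstream buffer and the buffer list. */
typedef struct {
    int    chnId;        /* channel that produced this buffer */
    size_t filledSize;   /* number of valid bitstream bytes */
} BitBuf;

typedef struct {
    int    numBufs;              /* number of valid entries in bufs[] */
    BitBuf bufs[ENC_CH * 2];
} BitBufList;

/* Add up the bytes received per channel, visiting only the valid entries.
 * Returns the number of buffers processed. */
int process_full_bufs(const BitBufList *list, size_t bytesPerCh[ENC_CH])
{
    int i;
    int processed = 0;

    for (i = 0; i < list->numBufs; i++) {   /* bound is numBufs, not ENC_CH */
        const BitBuf *b = &list->bufs[i];

        assert(b->chnId >= 0 && b->chnId < ENC_CH);
        bytesPerCh[b->chnId] += b->filledSize;
        processed++;
    }
    return processed;
}
```

If the loop bound were ENC_CH instead, a call that returned fewer buffers than channels would also process whatever the remaining slots still held from the previous call, which matches the symptom of some channels seeing each frame twice.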

0 Badri Narayanan over 12 years ago in reply to bing liu

TI__Guru 59700 points

You can use memcpy_neon, which is part of DVR RDK 4.0, but that too is not suitable for copying video frames. You should use DMA if you intend to copy video frames. Doing video frame processing on ARM will result in very poor performance. Why can't you use SwMs?

0 bing liu over 12 years ago in reply to Badri Narayanan

Expert 2900 points

Hi Badri,

Thanks for your reply!

When we use SwMs for video mixing, the latency of the whole data path is about 200 ms. I measured the decode and encode latency and found both to be very small, around 2 ms each. So most of the latency comes from dupLink and swMsLink, is that right? My boss thinks 200 ms is a bit long and wants to do the video mixing on the ARM side. The data flow is as follows:

RTPRecv->decode->ARM side video mix->encode->RTPSend

What do you mean by "doing video frame processing on ARM will result in very poor performance"? Is it because the ARM is too slow, or is there some other reason?

You mentioned that we could use DMA to copy video frames. Could you give an example of how to use DMA on the ARM side?

0 bing liu over 12 years ago in reply to Badri Narayanan

Expert 2900 points

Hi Badri,

Since the memcpy size in my app is 320 bytes, I tested memcpy_neon and memcpy with a data size of 320 bytes. The results show that memcpy_neon is almost exactly as efficient as memcpy.

memcpy_neon : (320 bytes copy) = 1751.0 MB/s
memcpy_arm : (320 bytes copy) = 1746.4 MB/s

I read the post at http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka13544.html . But I don't quite understand under what conditions memcpy_neon achieves higher efficiency. Could you give some tips?

0 Badri Narayanan over 11 years ago in reply to bing liu

TI__Guru 59700 points

We have seen that memcpy_neon gives 2x better performance when the memcpy block size is greater than 10K bytes. For a small size like 320 bytes it will not give a noticeable performance improvement, although you are seeing a 5 MB/s improvement, which is significant.
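
To see how block size affects copy throughput on a given board, a small timing harness along these lines may help. This is only a sketch. It times the standard memcpy with clock(), and the function name memcpy_throughput_mbs is hypothetical. To compare against memcpy_neon, the memcpy call would be swapped for it, assuming it takes the same arguments, which is not confirmed here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Copy blockSize bytes `iterations` times and return the throughput in MB/s
 * (1 MB = 1e6 bytes). Returns 0.0 if allocation fails or if the run was too
 * short for clock() to measure. */
double memcpy_throughput_mbs(size_t blockSize, size_t iterations)
{
    unsigned char *src;
    unsigned char *dst;
    volatile unsigned char sink = 0;   /* keeps the copies from being optimized out */
    clock_t start, end;
    double seconds;
    size_t i;

    assert(blockSize > 0);

    src = malloc(blockSize);
    dst = malloc(blockSize);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return 0.0;
    }
    memset(src, 0xA5, blockSize);

    start = clock();
    for (i = 0; i < iterations; i++) {
        src[0] = (unsigned char)i;      /* vary the input on every pass */
        memcpy(dst, src, blockSize);
        sink ^= dst[blockSize - 1];     /* consume the copied data */
    }
    end = clock();

    free(src);
    free(dst);

    seconds = (double)(end - start) / CLOCKS_PER_SEC;
    if (seconds <= 0.0) {
        return 0.0;
    }
    return ((double)blockSize * (double)iterations) / seconds / 1e6;
}
```

Calling it with block sizes such as 320, 4096, 16384 and 65536 and a large iteration count would show where a copy routine starts to pull ahead. At 320 bytes the per-call overhead probably dominates, which would be consistent with the near-identical numbers reported above.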
