memory usage in RDK

ranchu

Guru 20755 points

Hello,

Is there a guideline page or something similar that describes how memory usage is calculated when using the RDK? I mean something that gives a general idea of memory usage, taking into account the complete frames (not headers) held in the buffering mechanism.

Thanks,

Ran

over 12 years ago

0 Yogesh Marathe over 12 years ago

TI__Expert 7765 points

Ran,

You can look at MemAnalysis.zip placed in dvr_rdk\docs\. It is for exactly this purpose.

0 ranchu over 12 years ago in reply to Yogesh Marathe

Guru 20755 points

Hi Yogesh,

Thank you very much for the reference. It helps a lot in understanding this issue. I have some questions about it, though, if I may ask. I will refer to the HD demo Excel sheet:

1. Are the Excel fields all calculated in theory, or do they contain measurements done with a software tool?

2. BITBUF_MEM shows a size of 96.5M without any reference. How was it calculated or measured?

3. Was the memory map configuration file for the demo built according to "alloc size" in the summary tab?

4. The FRAME_MEM tab shows 3 columns: total size, actual measured frame buf, and actual measured total. "actual measured total" shows some calculation relating to Tiler memory, which is not used. Why is that? In FRAME_MEM, the DEI and ENC Link "measured frame buffer" is much smaller than the "total size" of these links. How is it calculated?

Thank you very much,

Ran

0 Yogesh Marathe over 12 years ago in reply to ranchu

TI__Expert 7765 points

Ran,

I'm not the expert in this area, so I will forward the question to the experts, but I have the following comments:

#1 These are manually measured and calculated numbers based on the 'i' print.

#2 System integrators decide section sizes in the memory map based on the requirements of the processing block and of the use case.

#3 Memory map configuration files are present in dvr_rdk\mcfw\src_bios6\cfg\ti816x\. Different configurations are available based on the combination of total DDR memory and Linux memory.

#4 I'm not sure

Also, you can refer to the app note on the memory map: http://ap-fpdsp-swapps.dal.design.ti.com/index.php/DVR_RDK_V3.0_GA_App_Notes_Memory_Map

0 Badri Narayanan over 12 years ago in reply to Yogesh Marathe

TI__Guru 59700 points

1. Are the Excel fields all calculated in theory, or do they contain measurements done with a software tool?

- The memory requirement for each link is calculated from parameters like resolution and format (e.g. 420SP/422I). The value determined from the formula is then compared with the actual create-time memory allocation, and any mismatches are reconciled. The Excel sheet formulas are meant for extrapolation to new use cases.

2. BITBUF_MEM shows a size of 96.5M without any reference. How was it calculated or measured?

- This is the total memory allocated in the memory map. The number is set such that the memory is sufficient for all use cases supported in DVR RDK. When running a particular use case, the actual used size will differ based on resolution and number of channels.

3. Was the memory map configuration file for the demo built according to "alloc size" in the summary tab?

- Yes, that is correct. The alloc size is the actual size of the memory segment in the memory map configuration file.

4. The FRAME_MEM tab shows 3 columns: total size, actual measured frame buf, and actual measured total. "actual measured total" shows some calculation relating to Tiler memory, which is not used. Why is that? In FRAME_MEM, the DEI and ENC Link "measured frame buffer" is much smaller than the "total size" of these links. How is it calculated?

- As you are probably aware, DM816x supports two forms of memory:

- TILED

- This memory is managed by the TILER IP and this IP makes 2D memory accesses in this region efficient.

- NON_TILED

- This is the regular memory.

FRAME_MEM is the NON_TILED portion. TILED memory is allocated separately from the dedicated tiled memory section that you will see in the memory map file.

The DEI output buffers and ENC Link reference buffers are allocated from TILED memory for performance reasons, and these buffer allocations are not accounted for in the FRAME_MEM size.

0 ranchu over 12 years ago in reply to Badri Narayanan

Guru 20755 points

Hi Badri, Yogesh,

I would like to know, please:

1. Tiled memory is allocated, as we see in the frame_mem tab of the Excel sheet, but the line showing it in the memory map (summary tab) is missing. Why was it not included in the summary?
2. The bitbuf_mem tab in the Excel sheet shows that bytes per pixel in the enclink is 1. As I understand it, the output is 420, which is 12 bits/pixel (1.5 bytes). Why does it appear as 1 byte/pixel?
3. What is the strategy for choosing the buffer length in each Link?

Thanks,

Ran

0 Badri Narayanan over 12 years ago in reply to ranchu

TI__Guru 59700 points

-- Yes, this is a mistake. 256MB of tiled memory is allocated. You can refer to the /dvr_rdk/mcfw/src_bios6/cfg/ti816x/config_1G_256MLinux.bld file for the actual size allocated for TILED memory.

2. The bitbuf_mem tab in the Excel sheet shows that bytes per pixel in the enclink is 1. As I understand it, the output is 420, which is 12 bits/pixel (1.5 bytes). Why does it appear as 1 byte/pixel?

-- The bitbuf mem is the size of the encoded video frame, not the raw video frame. We allocate (w X h) per bitstream buffer. The size of each bitstream buffer is determined by the UTILS_ENCDEC_GET_BITBUF_SIZE macro (#define) in /dvr_rdk/mcfw/src_bios6/links_m3video/codec_utils/utils_encdec.h.

3. What is the strategy for choosing the buffer length in each Link?

- Are you referring to the number of buffers per channel?

Generally, for all memory2memory links (links other than capture and display):

3 is the minimum (1 buffer in the producer link / 1 buffer in the consumer link / 1 in the pipeline).

4 is a safe number, averaging out momentary peaks in processing.

If buffers have to travel through a longer chain (merge/dup/ipcIn/ipcOut etc.), additional buffers are probably required.

Also, if the A8 application does not free buffers in a timely fashion, additional buffers are occasionally required.

Generally, set a conservative number like 4 or 6, and if you require memory optimization, reduce the number (min 3) while ensuring performance is still realtime using the Vsys_printDetailedStatistics() API.
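
To get a feel for what the buffer count costs in memory, here is a rough sketch in Python. It is only arithmetic, not RDK code: the function name, the 16-channel count and the 720x576 D1 size are assumptions chosen for illustration, and real allocations may include alignment padding.

```python
def link_buffer_memory(num_channels, buffers_per_channel, bytes_per_buffer):
    """Total bytes a link would allocate for its per-channel buffers."""
    if buffers_per_channel < 3:
        # 3 is the minimum quoted above for memory2memory links.
        raise ValueError("fewer than 3 buffers per channel is below the stated minimum")
    return num_channels * buffers_per_channel * bytes_per_buffer


# Assumed example: one D1 frame (720x576) in YUV420SP at 1.5 bytes/pixel.
D1_420SP_BYTES = 720 * 576 * 3 // 2

for count in (6, 4, 3):
    total = link_buffer_memory(16, count, D1_420SP_BYTES)
    print(f"{count} buffers/channel -> {total / (1024 * 1024):.1f} MiB")
```

Going from 6 to 3 buffers per channel halves the link's buffer memory, which is why it is worth checking the realtime statistics after each reduction.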

Regards

Badri

0 ranchu over 12 years ago in reply to Badri Narayanan

Guru 20755 points

Hi Badri,

The answer is most helpful! I have some more questions, if I may:

>We allocate (w X h) per bitstream buffer.

I supposed the bitstream buffer should take much less space than the original frame. Why is it allocated as w x h and not something much smaller?

>3 is minimum (1 buffer in producer link/1 buffer in consumer link/1 in the pipeline)

Does pipeline here mean the hardware subsystem? I didn't totally understand the calculation of the link buffer size here. Is there some doc or page on this subject?

3.
Just to complete the question/answer, I would like to understand the strategy for the length of the buffer requested with the dec API (Vdec_requestBitstreamBuffer).

Thank you very much for your time!

Ran

0 Badri Narayanan over 12 years ago in reply to ranchu

TI__Guru 59700 points

>We allocate (w X h) per bitstream buffer.

I supposed the bitstream buffer should take much less space than the original frame. Why is it allocated as w x h and not something much smaller?

-- Yes, that is correct. We have seen a size of (W x H)/2 to be sufficient for the maximum encoded frame size. The correct size of each bitstream buffer should actually be determined from the maximum bit rate: size of each bitstream buffer = (MaxBitRate / FPS) * Ratio_Of_IFrame_to_AvgEncodedFrameSize. Assuming an I-frame can be 10x the size of an average encoded frame, the maximum encoded frame size would be (MaxBitRate / FPS) * 10. We use a simpler formula based on the input frame width and height.
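
As a rough comparison of the two sizing approaches, here is a small Python sketch. It is not the UTILS_ENCDEC_GET_BITBUF_SIZE macro: the function names are made up for illustration, the bit rate is assumed to be in bits per second (hence the division by 8 to get bytes), and the 2 Mbps / 25 fps / 720x576 numbers are only an example.

```python
def bitbuf_size_from_bitrate(max_bitrate_bps, fps, iframe_ratio=10):
    """Bytes for one bitstream buffer: (MaxBitRate / FPS) * I-frame ratio."""
    avg_frame_bits = max_bitrate_bps / fps
    # Assumption: bit rate is in bits/s, so convert the result to bytes.
    return int(avg_frame_bits * iframe_ratio / 8)


def bitbuf_size_from_resolution(width, height, divisor=1):
    """Bytes for one bitstream buffer using the simpler W x H (or W x H / 2) rule."""
    return (width * height) // divisor


by_bitrate = bitbuf_size_from_bitrate(2_000_000, 25)          # 2 Mbps at 25 fps
by_resolution = bitbuf_size_from_resolution(720, 576)         # W x H
by_half_resolution = bitbuf_size_from_resolution(720, 576, 2) # (W x H) / 2

print(by_bitrate, by_resolution, by_half_resolution)
```

For this assumed stream, the bit-rate formula gives a smaller buffer than either resolution-based rule, which illustrates why W x H is a conservative choice.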

2. 3 is minimum (1 buffer in producer link/1 buffer in consumer link/1 in the pipeline)

Does pipeline here mean the hardware subsystem? I didn't totally understand the calculation of the link buffer size here. Is there some doc or page on this subject?

-- The pipeline is the queue between two links, as shown below.

Link0 ---Queue--> Link1.

Are you referring to the size of each buffer or the number of buffers per channel?

Size of each buffer = MaxWidth * MaxHeight * Number_Of_Bytes_Per_Pixel.

Number of Bytes per pixel depends on the YUV format. Two formats are supported:

YUV422I : Bytes per pixel == 2

YUV420SP : Bytes per pixel == 1.5
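
The buffer size formula can be written out as a short Python sketch. The function name and the 1920x1080 example are assumptions for illustration; the actual RDK allocation may add pitch or height alignment on top of this.

```python
from fractions import Fraction

# Bytes per pixel for the two supported YUV formats.
BYTES_PER_PIXEL = {
    "YUV422I": Fraction(2),
    "YUV420SP": Fraction(3, 2),
}


def frame_buffer_size(max_width, max_height, fmt):
    """Size of one frame buffer in bytes: MaxWidth * MaxHeight * bytes per pixel."""
    return int(max_width * max_height * BYTES_PER_PIXEL[fmt])


for fmt in BYTES_PER_PIXEL:
    print(fmt, frame_buffer_size(1920, 1080, fmt))
```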

3. Just to complete the question/answer, I would like to understand the strategy for the length of the buffer requested with the dec API (Vdec_requestBitstreamBuffer).

-- This depends on your application. Vdec_requestBitstreamBuffer returns the free bitstream buffers available. Typically you would want to request 1 empty buffer per decode channel every time you invoke Vdec_requestBitstreamBuffer. Suppose 16 D1 decode channels @ 30 fps are present: you would probably have an application thread that wakes up every 16 ms, requests 16 empty buffers (1 per channel), fills the buffers with data and invokes Vdec_putBitstreamBuffer. This will ensure there is no underrun in input data for the channels.
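
The 16 ms figure is roughly half of the 33.3 ms frame period at 30 fps. The timing can be sketched in Python as below. This is only arithmetic, not a call into the Vdec API: the function name and the wakeups_per_frame parameter are assumptions for illustration.

```python
def feeder_schedule(fps, num_channels, wakeups_per_frame=2):
    """Wake-up period and per-wake-up request count for a feeder thread."""
    frame_period_ms = 1000.0 / fps
    return {
        "frame_period_ms": frame_period_ms,
        # Waking up more often than the frame period leaves slack for delays.
        "wake_period_ms": frame_period_ms / wakeups_per_frame,
        # One empty buffer requested per decode channel on each wake-up.
        "buffers_requested_per_wakeup": num_channels,
    }


schedule = feeder_schedule(fps=30, num_channels=16)
print(schedule)
```

For a different expected frame rate, the same calculation gives the corresponding wake-up period.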

0 ranchu over 12 years ago in reply to Badri Narayanan

Guru 20755 points

Hi Badri,

The information you gave me is very valuable and saves me a lot of confusion, I have just one more thing to ask...

>Suppose 16 D1 decode channels @ 30 fps are present, you probably have a application thread that wakes up every 16 ms requests 16 empty buffers (1 per channel)

1. The 16 ms wake-up period is half the frame period at 30 fps (i.e. twice the frame rate) on purpose, right? I guess that if the expected frame rate is different, we will choose another period.
2. Should each buffer be filled with one frame only? And can the frame size be smaller than the maximum buffer size?

Thank you very much !!!!

Ran

0 Badri Narayanan over 12 years ago in reply to ranchu

TI__Guru 59700 points

1. The 16 ms wake-up period is half the frame period at 30 fps (i.e. twice the frame rate) on purpose, right? I guess that if the expected frame rate is different, we will choose another period.

-- Yes. That is correct.

2. Should each buffer be filled with one frame only? And can the frame size be smaller than the maximum buffer size?

-- Yes, each buffer should be filled with one frame only, and filledBufSize should be set to the correct frame size. The filledBufSize should be less than the max buffer size.

0 ranchu over 12 years ago in reply to Badri Narayanan

Guru 20755 points

Thank you very much Badri,

One more thing... "Typically you would want to request 1 empty buffer per decode channel every time you invoke Vdec_requestBitstreamBuffer". What will happen if we request more than one buffer? If we put, for example, 2 buffers per channel every 2/frame_rate, will the decoder decode them one after the other, with an interval of 1/frame_rate between each decode?

Thank you very much,

Ran

0 Badri Narayanan over 12 years ago in reply to ranchu

TI__Guru 59700 points

You can request more than one buffer. If an empty buffer is available, it will be returned to the application.

If multiple frames are queued, they will be decoded one after the other. Decoding happens as fast as possible and is not dependent on frame rate in the decoder link.

i.e. if input and output buffers are available, the frame will be queued for decoding. Decode process time depends on HDVICP2 process time (for example, 2 ms per D1 frame).

Frame rate is controlled by display rate.

If AVSYNC is disabled, a new frame will be given for display at the rate of the SwMs output fps.

If AVSYNC is enabled, a new frame will be given for display depending on its presentation timestamp (PTS).

The availability of output buffers in the decoder link (which depends on display rate) will indirectly control the decode rate.

0 ranchu over 12 years ago in reply to Badri Narayanan

Guru 20755 points

>If multiple frames are queued they will be decoded one after the other. Decoding happens as fast as possible and is not dependent on frame rate in the decoder link.

As I understand it, if we request more than one buffer, we should check the pipeline design in terms of the buffer length of each queue, to verify that it can handle a burst of buffers. For example, if the pipeline is decode -> swms -> display, we should check the processing delay of each stage for the expected frame. Let's say that for HD frames the delays are:
decode(2)->swms (5)->display(frame-rate), and then see, for this burst of frames, how many frames are expected in each buffer. But if we decide that only one frame is delivered to the decoder every 1/frame_rate, then we can stay with the minimum buffer length, and there is no need to add buffers to deal with bursts, right?

Thanks,

Ran

0 Badri Narayanan over 12 years ago in reply to ranchu

TI__Guru 59700 points

I am sorry, I don't understand your comment. Are you proposing a change to how many buffers Vdec_requestBitstreamBuffer should return?

Since frame processing depends on OS scheduling of the M3 thread and IP processing time (HDVICP2/HDVPSS), there is no guaranteed upper limit for completion of frame processing. On average, a frame is expected to take a certain time based on resolution, but a single frame may take more than 16 ms/33 ms depending on the complexity of the bitstream (for decoding) and the instantaneous DDR bandwidth and M3 load.

The main thing the application has to ensure is that there is no underrun on input. To avoid this, the application feeds data as fast as possible, ensuring enough input buffering inside mcfw at all links to avoid input underrun. As mentioned previously, consumption of frames on the playback side is determined by the display fps or by the presentation timestamp of the frame (if AVSYNC is enabled).

0 ranchu over 12 years ago in reply to Badri Narayanan

Guru 20755 points

>Are you proposing change to how many buffers Vdec_requestBitstreamBuffer should return ?
No, I am referring to Vdec_putBitstreamBuffer
>The main thing application has to ensure is there is no underrun on input.
The application feeds frames to the decoder with Vdec_putBitstreamBuffer at an average rate equal to the frame rate, but it can feed 1 frame every 1/frame_rate or 2 frames every 2/frame_rate.
In general, the application should feed X frames every X/frame_rate.
Now, from the point of view of designing the pipeline buffering length, it seems best for the application to feed only 1 frame every 1/frame_rate. This way there is no burst of frames from the feeder of the pipeline.

Any comment is helpful,

Thanks,

Ran

0 Badri Narayanan over 12 years ago in reply to ranchu

TI__Guru 59700 points

Is there any reason why you want to avoid bursty input? From the mcfw point of view, bursty input will get averaged out, and there is no issue as long as the application ensures no underrun.

If you want to avoid bursty input for some reason, then yes, it is better to feed 1 frame every 1/frame_rate, but this is risky: any small processing delay can result in missing the realtime deadline and so cause an underrun.
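
The underrun risk can be illustrated with a toy model in Python. This is a simplified sketch, not a model of mcfw internals: the function name, the jitter values and the idea of a fixed "lead" of buffered frames are all assumptions made for illustration.

```python
def count_underruns(frame_period_ms, jitter_ms, lead_frames):
    """Count frames that arrive after their display deadline.

    Frame k is due at k * frame_period_ms. The feeder aims to deliver it
    lead_frames periods early (never before t = 0), and jitter_ms[k] is the
    extra delay the feeder suffers on that frame.
    """
    underruns = 0
    for k, jitter in enumerate(jitter_ms):
        deadline = k * frame_period_ms
        target = max(0, k - lead_frames) * frame_period_ms
        if target + jitter > deadline:
            underruns += 1
    return underruns


# Hypothetical feeder delays in milliseconds, one entry per frame.
jitter = [0, 0, 5, 0, 12, 0, 0, 3]

just_in_time = count_underruns(33.3, jitter, lead_frames=0)
buffered = count_underruns(33.3, jitter, lead_frames=2)
print(just_in_time, buffered)
```

With no lead, every delayed frame misses its deadline; with two frames of input buffering, the same delays are absorbed.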
