[OMAP3530] VLIB and cache memory allocation

I'm working on an image processing application that operates on VGA images (640x480).  The on-chip memory (up to 176 KB in our case) isn't large enough to hold complete images.  What is the preferred scheme for getting the best performance out of VLIB using the L1 and L2 cache?  The options include:

  • Placing all of the image data in external memory and depending on automatic caching
  • Breaking the image data into small chunks and manually placing them in the L1/L2 memories configured as memory-mapped SRAM
  • Something in between

Thanks

Michael

  • Michael,

    The answer will depend on which device you are using, because the architecture varies between devices.  Generally speaking, you would probably get fairly good performance by setting up a ping-pong buffer in L2 for chunks of image data that are transferred between the EMIF and L2 via EDMA (see the sketch below).  The cache can take care of moving data between L1 and L2.

    -Tommy
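
    Roughly, such a ping-pong scheme might look like the sketch below.  The dma_copy_async()/dma_wait() helpers are hypothetical stand-ins for whatever EDMA3 driver or CSL calls you actually use, and process_strip(), STRIP_HEIGHT and the ".l2ram" section name are illustrative, not real VLIB or tool names:

    ```c
    /* Ping-pong strip processing: DDR <-> L2 SRAM via EDMA (sketch only).
     * dma_copy_async()/dma_wait() are hypothetical wrappers around your
     * EDMA3 driver; process_strip() stands for the VLIB kernel(s) you run. */
    #include <stdint.h>

    #define IMG_WIDTH     640
    #define STRIP_HEIGHT  16                            /* rows per chunk */
    #define STRIP_BYTES   (IMG_WIDTH * STRIP_HEIGHT)

    /* Two strip buffers placed in L2 SRAM (section mapped in the linker .cmd). */
    #pragma DATA_SECTION(strip_buf, ".l2ram")
    static uint8_t strip_buf[2][STRIP_BYTES];

    extern int  dma_copy_async(void *dst, const void *src, uint32_t bytes);
    extern void dma_wait(int channel);
    extern void process_strip(uint8_t *buf, uint32_t bytes);

    void process_image(const uint8_t *src_ddr, uint8_t *dst_ddr, int height)
    {
        int n_strips = height / STRIP_HEIGHT;
        int i, ch, ping = 0;

        /* Prime the pipeline: start fetching strip 0 into the "ping" buffer. */
        ch = dma_copy_async(strip_buf[ping], src_ddr, STRIP_BYTES);

        for (i = 0; i < n_strips; i++) {
            dma_wait(ch);                        /* strip i is now in L2 */

            /* Start fetching strip i+1 into the other ("pong") buffer. */
            if (i + 1 < n_strips)
                ch = dma_copy_async(strip_buf[ping ^ 1],
                                    src_ddr + (uint32_t)(i + 1) * STRIP_BYTES,
                                    STRIP_BYTES);

            /* Process strip i out of L2 while strip i+1 streams in. */
            process_strip(strip_buf[ping], STRIP_BYTES);

            /* Write the result back to DDR (could also be double-buffered). */
            dma_wait(dma_copy_async(dst_ddr + (uint32_t)i * STRIP_BYTES,
                                    strip_buf[ping], STRIP_BYTES));

            ping ^= 1;                           /* swap ping and pong */
        }
    }
    ```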

  • Tommy,

    Thanks for the advice.  We're using the OMAP3530 (C64x+ core, 80 KB L1 data cache, 96 KB L2 cache); does that change anything?  Do you know the approximate access latencies for the various types of memory (L1D, L2, DDR)?

    Michael

  • Michael,

    I'm not very familiar with the OMAP3530 architecture.  I've moved this thread to the OMAP forum, where the OMAP35xx experts should see it.

    -Tommy

  • Please read through this thread, which is very similar to your question.

  • Indeed it is!  Thanks, Brad.

    A summary for anyone who finds this thread instead of that one (please correct me if any of it is inaccurate):

    • Enabling automatic caching often gives good enough performance.  This is done by setting the appropriate MAR bits, the cache configuration registers (L1PCFG/L1DCFG/L2CFG), and (if applicable) the DSPLink memory table entries; a register-level sketch follows after this list.
    • The 'touch' method in section 3.2.2 of the cache user's guide (SPRU862B) provides a relatively fast way to pull an external buffer into cache before processing it; see the touch-loop sketch below.
    • For maximum performance, pipelined DMA transfers can be used to move chunks of data between L1D SRAM and external memory (along the lines of the ping-pong scheme Tommy described above), if you're willing to go to that much effort.
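
    Here is a register-level sketch of what "enabling automatic caching" amounts to on the C64x+ megamodule.  The register addresses and size encodings are the ones documented for the C64x+ cache (SPRU862), so double-check them for your silicon; DSP/BIOS, the CSL CACHE API, or a DSPLink memory table entry can do the same job for you, and the 0x8000_0000 DDR range is only an assumption about the DSP-side memory map:

    ```c
    /* Enable L1/L2 caches and mark external memory cacheable (sketch only).
     * Register addresses per the C64x+ megamodule documentation; verify for
     * your device.  Do this early, before the caches are heavily in use.   */
    #include <stdint.h>

    #define L2CFG   (*(volatile uint32_t *)0x01840000)   /* L2 cache size/mode  */
    #define L1PCFG  (*(volatile uint32_t *)0x01840020)   /* L1P cache size/mode */
    #define L1DCFG  (*(volatile uint32_t *)0x01840040)   /* L1D cache size/mode */
    #define MAR     ((volatile uint32_t *)0x01848000)    /* MAR0..MAR255        */

    void enable_external_caching(void)
    {
        uint32_t addr;

        /* Cache size encodings per SPRU862: 7 = maximum for L1; a smaller
         * L2 cache leaves the rest of L2 usable as mapped SRAM.           */
        L1PCFG = 7u;                              /* max L1P cache         */
        L1DCFG = 7u;                              /* max L1D cache         */
        L2CFG  = (L2CFG & ~0x7u) | 2u;            /* e.g. 64 KB L2 cache   */
        (void)L2CFG;                              /* read back so the mode
                                                     change completes      */

        /* One MAR bit per 16 MB region: set the PC bit to permit caching.
         * Assumes DDR is visible to the DSP at 0x8000_0000..0x87FF_FFFF;
         * adjust to your actual memory map / DSPLink memory table.        */
        for (addr = 0x80000000u; addr < 0x88000000u; addr += 0x01000000u)
            MAR[addr >> 24] = 1u;
    }
    ```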
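
    And a plain-C illustration of the 'touch' idea: the routine in SPRU862 is hand-optimized assembly that issues two parallel reads per iteration, so this sketch only shows the principle (one read per L2 line allocates the whole line, pulling the buffer into cache before the VLIB kernel runs):

    ```c
    /* "Touch" a DDR buffer into cache before processing it (sketch only).
     * Reading one byte per cache line is enough to allocate the line.    */
    #include <stdint.h>

    #define CACHE_LINE_BYTES 128u        /* L2 line size on the C64x+ */

    void touch_buffer(const void *buf, uint32_t num_bytes)
    {
        const volatile uint8_t *p = (const volatile uint8_t *)buf;
        uint32_t i, sum = 0;

        for (i = 0; i < num_bytes; i += CACHE_LINE_BYTES)
            sum += p[i];                 /* volatile read pulls the line in */

        (void)sum;                       /* keep the loop from being removed */
    }
    ```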

  • Nice summary!  Looks great.  Thanks!