[OMAP3530] VLIB and cache memory allocation

I'm working on an image processing application that operates on VGA images (640x480).  The on-chip memory (up to 176 KB in our case) isn't large enough to hold complete images.  What is the preferred scheme for getting the best performance out of VLIB using the L1 and L2 cache?  The options include:

  • Placing all of the image data in external memory and depending on automatic caching
  • Breaking the image data into small chunks and manually placing them in the L1/L2 memories configured as memory-mapped SRAM
  • Something in between

Thanks

Michael

  • Michael,

    The answer will depend on which device you are using, because the architecture varies between devices.  Generally speaking, you would probably get fairly good performance by setting up a ping-pong buffer in L2 for chunks of image data that are transferred between the EMIF and L2 via EDMA (see the sketch below).  The cache can take care of moving data between L1 and L2.

    -Tommy
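
    Roughly, such a ping-pong scheme might look like the sketch below.  The dma_copy_async()/dma_wait() helpers are hypothetical stand-ins for whatever EDMA3 driver or CSL calls you actually use, and process_strip(), STRIP_HEIGHT and the ".l2ram" section name are illustrative, not real VLIB or tool names:

    ```c
    /* Ping-pong strip processing: DDR <-> L2 SRAM via EDMA (sketch only).
     * dma_copy_async()/dma_wait() are hypothetical wrappers around your
     * EDMA3 driver; process_strip() stands for the VLIB kernel(s) you run. */
    #include <stdint.h>

    #define IMG_WIDTH     640
    #define STRIP_HEIGHT  16                            /* rows per chunk */
    #define STRIP_BYTES   (IMG_WIDTH * STRIP_HEIGHT)

    /* Two strip buffers placed in L2 SRAM (section mapped in the linker .cmd). */
    #pragma DATA_SECTION(strip_buf, ".l2ram")
    static uint8_t strip_buf[2][STRIP_BYTES];

    extern int  dma_copy_async(void *dst, const void *src, uint32_t bytes);
    extern void dma_wait(int channel);
    extern void process_strip(uint8_t *buf, uint32_t bytes);

    void process_image(const uint8_t *src_ddr, uint8_t *dst_ddr, int height)
    {
        int n_strips = height / STRIP_HEIGHT;
        int i, ch, ping = 0;

        /* Prime the pipeline: start fetching strip 0 into the "ping" buffer. */
        ch = dma_copy_async(strip_buf[ping], src_ddr, STRIP_BYTES);

        for (i = 0; i < n_strips; i++) {
            dma_wait(ch);                        /* strip i is now in L2 */

            /* Start fetching strip i+1 into the other ("pong") buffer. */
            if (i + 1 < n_strips)
                ch = dma_copy_async(strip_buf[ping ^ 1],
                                    src_ddr + (uint32_t)(i + 1) * STRIP_BYTES,
                                    STRIP_BYTES);

            /* Process strip i out of L2 while strip i+1 streams in. */
            process_strip(strip_buf[ping], STRIP_BYTES);

            /* Write the result back to DDR (could also be double-buffered). */
            dma_wait(dma_copy_async(dst_ddr + (uint32_t)i * STRIP_BYTES,
                                    strip_buf[ping], STRIP_BYTES));

            ping ^= 1;                           /* swap ping and pong */
        }
    }
    ```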

  • Tommy,

    Thanks for the advice.  We're using the OMAP3530 (C64x+ core, 80 KB L1 data cache, 96 KB L2 cache); does that change anything?  Do you know the approximate access latencies for the various types of memory (L1D, L2, DDR)?

    Michael

  • Michael,

    I'm not very familiar with the OMAP3530 architecture.  I've moved this thread to the OMAP forum, where the OMAP35xx experts should see it.

    -Tommy

  • Please read through this thread, which is very similar to your question.

  • Indeed it is!  Thanks, Brad.

    A summary for anyone who finds this thread instead of that one (please correct me if any of it is inaccurate):

    • Enabling automatic caching often gives good enough performance.  This is done by setting the appropriate MAR bits, the cache configuration registers (L1PCFG/L1DCFG/L2CFG), and (if applicable) the DSPLink memory table entries; a register-level sketch follows after this list.
    • The 'touch' method in section 3.2.2 of the cache user's guide (SPRU862B) provides a relatively fast way to pull an external buffer into cache before processing it; see the touch-loop sketch below.
    • For maximum performance, pipelined DMA transfers can be used to move chunks of data between L1D SRAM and external memory (along the lines of the ping-pong scheme Tommy described above), if you're willing to go to that much effort.
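
    Here is a register-level sketch of what "enabling automatic caching" amounts to on the C64x+ megamodule.  The register addresses and size encodings are the ones documented for the C64x+ cache (SPRU862), so double-check them for your silicon; DSP/BIOS, the CSL CACHE API, or a DSPLink memory table entry can do the same job for you, and the 0x8000_0000 DDR range is only an assumption about the DSP-side memory map:

    ```c
    /* Enable L1/L2 caches and mark external memory cacheable (sketch only).
     * Register addresses per the C64x+ megamodule documentation; verify for
     * your device.  Do this early, before the caches are heavily in use.   */
    #include <stdint.h>

    #define L2CFG   (*(volatile uint32_t *)0x01840000)   /* L2 cache size/mode  */
    #define L1PCFG  (*(volatile uint32_t *)0x01840020)   /* L1P cache size/mode */
    #define L1DCFG  (*(volatile uint32_t *)0x01840040)   /* L1D cache size/mode */
    #define MAR     ((volatile uint32_t *)0x01848000)    /* MAR0..MAR255        */

    void enable_external_caching(void)
    {
        uint32_t addr;

        /* Cache size encodings per SPRU862: 7 = maximum for L1; a smaller
         * L2 cache leaves the rest of L2 usable as mapped SRAM.           */
        L1PCFG = 7u;                              /* max L1P cache         */
        L1DCFG = 7u;                              /* max L1D cache         */
        L2CFG  = (L2CFG & ~0x7u) | 2u;            /* e.g. 64 KB L2 cache   */
        (void)L2CFG;                              /* read back so the mode
                                                     change completes      */

        /* One MAR bit per 16 MB region: set the PC bit to permit caching.
         * Assumes DDR is visible to the DSP at 0x8000_0000..0x87FF_FFFF;
         * adjust to your actual memory map / DSPLink memory table.        */
        for (addr = 0x80000000u; addr < 0x88000000u; addr += 0x01000000u)
            MAR[addr >> 24] = 1u;
    }
    ```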
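
    And a plain-C illustration of the 'touch' idea: the routine in SPRU862 is hand-optimized assembly that issues two parallel reads per iteration, so this sketch only shows the principle (one read per L2 line allocates the whole line, pulling the buffer into cache before the VLIB kernel runs):

    ```c
    /* "Touch" a DDR buffer into cache before processing it (sketch only).
     * Reading one byte per cache line is enough to allocate the line.    */
    #include <stdint.h>

    #define CACHE_LINE_BYTES 128u        /* L2 line size on the C64x+ */

    void touch_buffer(const void *buf, uint32_t num_bytes)
    {
        const volatile uint8_t *p = (const volatile uint8_t *)buf;
        uint32_t i, sum = 0;

        for (i = 0; i < num_bytes; i += CACHE_LINE_BYTES)
            sum += p[i];                 /* volatile read pulls the line in */

        (void)sum;                       /* keep the loop from being removed */
    }
    ```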

  • Nice summary!  Looks great.  Thanks!