
Audio SoC example performance on L138

Other Parts Discussed in Thread: OMAP-L138

Hello,

I've got the audio SoC sample example running, and I have a few questions that a DSP expert could probably answer easily:

On the L138 there is 128 KB of shared RAM between the DSP and the ARM, mapped at 0x80000000.

DSPLink does not use it; it uses DDR2 instead, so data exchanges go through the EMIFA.

Wouldn't it be more efficient to use the shared RAM instead (assuming we only need 128 KB to communicate)?

Where can I find the shared RAM access time?

Is it possible to define, in DSPLink, one pool based on shared RAM and one pool based on DDR?

Below is what I found in my current config:

/** ============================================================================
 *  @name   RESETCTRLADDR
 *
 *  @desc   Indicates the start address of the Reset Ctrl memory region.
 *          The last two nibbles must be zero, i.e. aligned to a 256-byte boundary.
 *  ============================================================================
 */
#define  RSTENTRYID         0
#define  RESETCTRLADDR      0xC3E00000
#define  RESETCTRLSIZE      0x80

/** ============================================================================
 *  @name   CODEMEMORYADDR/CODEMEMORYSIZE
 *
 *  @desc   Indicates the start address/size of the DSPLink code region.
 *  ============================================================================
 */
#define  CODEENTRYID        1
#define  CODEMEMORYADDR     (RESETCTRLADDR + RESETCTRLSIZE)
#define  CODEMEMORYSIZE     0xFFF80u

/** ============================================================================
 *  @name   SHAREDENTRYID0/SHAREDMEMORYADDR0/SHAREDMEMORYSIZE0
 *
 *  @desc   Indicates the start address/size of the first DSPLink shared memory region.
 *  ============================================================================
 */
#define  SHAREDENTRYID0     2
#define  SHAREDMEMORYADDR0  (CODEMEMORYADDR + CODEMEMORYSIZE)
#define  SHAREDMEMORYSIZE0  0x5000

/** ============================================================================
 *  @name   SHAREDENTRYID1/SHAREDMEMORYADDR1/SHAREDMEMORYSIZE1
 *
 *  @desc   Indicates the start address/size of the second DSPLink shared memory region.
 *  ============================================================================
 */
#define  SHAREDENTRYID1     3
#define  SHAREDMEMORYADDR1  (SHAREDMEMORYADDR0 + SHAREDMEMORYSIZE0)
#define  SHAREDMEMORYSIZE1  0x2B000

/** ============================================================================
 *  @name   POOLMEMORYADDR/POOLMEMORYSIZE
 *
 *  @desc   Indicates the start address/size of the DSPLink POOL memory region.
 *  ============================================================================
 */
#define  POOLENTRYID        4
#define  POOLMEMORYADDR     (SHAREDMEMORYADDR1 + SHAREDMEMORYSIZE1)
#define  POOLMEMORYSIZE     0x000D0000u
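
To the "one pool in shared RAM, one in DDR" question: DSPLink's memory map is just a table of entries, so in principle one more entry can describe the on-chip shared RAM and a second POOL can be opened on it. The names, ID, and sizes below are assumptions for illustration only; the same region must also be declared in the DSP-side memory map (TCF) and in the ARM-side DSPLink configuration so that both processors agree on it.

```c
/** ============================================================================
 *  @name   SHRAMPOOLADDR/SHRAMPOOLSIZE
 *
 *  @desc   Sketch (assumed names/ID): a memory entry for the 128 KB on-chip
 *          shared RAM of the OMAP-L138 (0x80000000), usable as a second POOL.
 *          Keep the entry consistent between the ARM and DSP configurations.
 *  ============================================================================
 */
#define  SHRAMPOOLENTRYID   5
#define  SHRAMPOOLADDR      0x80000000u
#define  SHRAMPOOLSIZE      0x00020000u   /* 128 KB */
```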


Regards,

Pep

  • Pep said:

    Wouldn't it be more efficient to use the shared RAM instead (assuming we only need 128 KB to communicate)?

    Where can I find the shared RAM access time?

    Is it possible to define, in DSPLink, one pool based on shared RAM and one pool based on DDR?

    The shared RAM is only a little faster than the DDR in terms of access time.  At one point I saw some benchmarks, though I can't remember where (sorry)!  Maybe someone else can add more quantitative info, but I wanted to at least get you this much.  So in my opinion it probably isn't worth rearranging the memory map.  You can likely get a bigger improvement by changing your compiler options or fine-tuning a loop!

  • So, a bonus question:

    What is the shared RAM used for? :)

    Have you ever seen a use case?

    Anyway, thanks for your quick answer.

  • Pep said:

    What is the shared RAM used for? :)

    1. It provides a parallel path so that simultaneous accesses from multiple bus masters (e.g. ARM, DSP, uPP, EDMA) can occur; one master can access DDR while another accesses the shared SRAM at the same time.
    2. For ultra-low-cost, low-power designs I've seen a few people forgo external memory altogether and run entirely out of internal RAM.
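
    A concrete way to use the shared SRAM is the TI C6000 toolchain's placement pragmas. Everything below is an assumption for illustration: the section name ".sharedram" and the buffer name are made up, and the section must also be mapped onto the shared-SRAM range in the linker command file.

```c
/* Sketch (TI C6000 toolchain, not host-compilable): place a buffer in
 * the 128 KB on-chip shared SRAM at 0x80000000 so ARM, DSP and DMA
 * masters can reach it without going out to DDR.
 *
 * Linker command file fragment (assumed names):
 *   MEMORY   { SHRAM: org = 0x80000000, len = 0x20000 }
 *   SECTIONS { .sharedram > SHRAM }
 */
#pragma DATA_SECTION(sharedBuf, ".sharedram")
#pragma DATA_ALIGN(sharedBuf, 128)      /* align to a cache line */
unsigned char sharedBuf[0x4000];        /* 16 KB scratch buffer */
```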
  • Thanks.

    Just for info, I found this document describing the bandwidth of each bus:

    http://processors.wiki.ti.com/index.php/OMAP-L1x/C674x/AM1x_SoC_Constraints

  • Correct me if I am wrong, but:

    Shared DDR is usually uncached on the ARM side, right?
    Uncached access to DDR is usually very slow.

    Putting things in internal RAM can also reduce cache pressure.

  • bandini said:
    Shared DDR is usually uncached on the ARM side, right?

    If the ARM is doing any processing on such a buffer, the performance will be horrible.  There are many cases where the ARM allocates buffers through CMEM as non-cacheable; however, in those cases the ARM is generally not writing to the buffers (e.g. a frame buffer that is output by the LCD controller).  Other times the ARM writes only a small amount of control data, where the impact is also small due to the size.
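
    To illustrate the cached vs. non-cached distinction, here is a sketch of allocating both kinds of buffer through TI's CMEM module on the ARM/Linux side. It needs the cmemk kernel module loaded with a board-specific pool configuration, so treat it as a fragment showing the flags rather than a drop-in program.

```c
/* Sketch: cached vs. non-cached CMEM allocations on the ARM (Linux).
 * Requires the cmemk kernel module; not runnable on a plain host. */
#include <stdio.h>
#include <cmem.h>

int main(void)
{
    CMEM_AllocParams params = CMEM_DEFAULTPARAMS;
    void *cached, *noncached;

    if (CMEM_init() < 0)
        return 1;

    params.type  = CMEM_HEAP;
    params.flags = CMEM_CACHED;        /* ARM accesses go through the cache */
    cached = CMEM_alloc(0x1000, &params);

    params.flags = CMEM_NONCACHED;     /* every ARM access goes out to DDR */
    noncached = CMEM_alloc(0x1000, &params);

    printf("cached=%p noncached=%p\n", cached, noncached);

    /* CMEM_free() must see the same params used at allocation time */
    if (noncached)
        CMEM_free(noncached, &params);
    params.flags = CMEM_CACHED;
    if (cached)
        CMEM_free(cached, &params);

    CMEM_exit();
    return 0;
}
```

    Remember that a cached buffer shared with the DSP needs explicit writeback/invalidate (CMEM_cacheWb()/CMEM_cacheInv()) around each hand-off.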

    bandini said:
    Uncached access to DDR is usually very slow.

    Yes, but so is the L3 memory on the OMAP-L138.