C6748 L3 on-chip RAM vs external DDR

Other Parts Discussed in Thread: TMS320C6748, OMAPL138

Good day experts,

We are using the C6748 DSP, and I noticed that access times for the external DDR memory (clocked at 150 MHz) are actually shorter than for the on-chip L3 RAM of the C6748.

Is this to be expected, or am I doing something wrong? I can't seem to find any specific information on the L3 RAM. Can someone please point me to the correct documentation?

Thanks in advance!

  • Reinier,

    The DSP internal memories are:
    L2 RAM
    L1P RAM
    L1D RAM

    Please refer to the TMS320C6748 datasheet for details of the DSP internal memory.

  • Reinier,

    If you are comparing internal (L2 RAM) and external memory, here are some details:

    In a hierarchical memory architecture, a fast but small memory is placed close to the CPU and can be accessed without stalls.
    The lower memory levels are progressively larger but also slower the further they are from the CPU.
    See "Figure 1-1. Flat Versus Hierarchical Memory Architecture" in the TMS320C674x DSP Cache User's Guide:
    http://www.ti.com/lit/sprug82

  • Pubesh,

    In the data sheet you are referring to (SPRS590D.pdf), the functional block diagram in section 1.3 indicates that there is 128KB of internal RAM besides the L1 and L2 RAM. DSP/BIOS frequently refers to this RAM as L3 internal RAM, although it is not actually part of the C6748 DSP subsystem. In Table 2.1, the On-Chip Memory row again refers to it as "ADDITIONAL MEMORY 128KB RAM".

    To rephrase my question: Why is this 128KB on-chip RAM slower than the external mDDR?

  • Reinier,

    Yes, the Shared RAM is named L3RAM in DSP/BIOS (in the cfg file).
    The 128KB RAM is shared memory; it is available for use by other hosts without affecting DSP performance.
    128KB Shared RAM address range: 80000000h to 8001FFFFh
    See section "4.3 Shared RAM Memory" in the TMS320C6748 TRM (spruh79).
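
As a rough illustration of the address range quoted above, the following C sketch encodes the Shared RAM window and checks whether an address falls inside it. The macro names and the `in_shared_ram` helper are hypothetical and not taken from any TI header; only the two boundary addresses come from the post.

```c
#include <stdbool.h>
#include <stdint.h>

/* Shared RAM window as quoted in the thread (end address is inclusive). */
#define SHARED_RAM_BASE 0x80000000u
#define SHARED_RAM_END  0x8001FFFFu
#define SHARED_RAM_SIZE (SHARED_RAM_END - SHARED_RAM_BASE + 1u)

/* Hypothetical helper: true if addr lies inside the Shared RAM window. */
static inline bool in_shared_ram(uint32_t addr)
{
    return addr >= SHARED_RAM_BASE && addr <= SHARED_RAM_END;
}
```

The window works out to 0x20000 bytes, which matches the 128KB figure given in the data sheet.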

  • Pubesh,

    Please refer me to someone with a better understanding of the C6748 DSP. 

  • Hi Reinier,

    To address your rephrased question in the post above, please see the points below:

    1. Both the shared memory (128KB RAM) and the DDR2 memory controller are external to the C674x DSP CPU (see Section 1.3, Figure 1-1, TMS320C6748 DSP Block Diagram, in the C6748 TRM, spruh79a.pdf). Both exchange data with the CPU through SCRs (switched central resources) and BRs (bridges), and both are driven by the same PLL output clock (PLL0_SYSCLK2); see Table 7-1, System PLLC Output Clocks, in the C6748 TRM. This shows that both operate at the same clock speed.

    2. The DSP can exchange data with an external peripheral or the shared memory only through SCRs and BRs. In addition, the SCRs provide priority-based data movement between master and slave peripherals; for example, the DSP can send data to the EMIF module without impacting a data transfer between a device peripheral and the internal shared memory. Each BR may also introduce additional latency in the data exchange between the DSP and the peripherals or shared memory.

    3. With the above in mind, Section 3.2, System Interconnect Block Diagram, in the TRM shows the path for both DDR2 and shared RAM. An access from SCR1 to the 128KB Shared RAM has to pass through bridge BRF6, which introduces additional latency, and then SCRF4 and MPU1, whereas an access to DDR2 through the EMIF passes directly to SCRF3 and MPU2. Bridges perform bus-width and bus-frequency conversion, and this adds delay to the data exchange between the CPU and the peripherals or shared memory.

    For these reasons, the access time for the external DDR2 is shorter than for the internal shared memory of the C6748 device.

    Please see the C6748 TRM below:

    http://www.ti.com/lit/spruh79

    Hope it clarifies your query.

    Thanks & regards,

    Sivaraj K


  • Reinier

    The L3 RAM is the internal name for the 128 KB Shared RAM. It is unfortunate that the old naming convention of L3 is still present in some documentation. From a DSP perspective it is actually a level 3 RAM, but on the OMAPL138, the ARM+DSP device in the same family as the C6748 (and pin compatible with it), this memory block is not at the L3 level for the ARM, so we decided to call it Shared RAM instead.

    In general, access latency to Shared RAM and DDR is expected to be on par; if anything, DDR should have more delay due to additional self refresh cycles, external setup/hold time considerations, and so on.

    So while you will not see significantly different performance between Shared RAM and DDR accesses, I do expect them to be on par, or Shared RAM to be slightly better.

    There is some bench data on Shared RAM access latency posted on an uncategorized wiki page (we have not collected similar data for DDR):

    http://processors.wiki.ti.com/index.php/Shared_RAM_Access_Considerations_on_OMAPL1x/C674x/AM1x

    You might find this useful.

    It might be better to provide additional details on your testing and setup, if the above answer is not sufficient.

    Regards

    Mukul

  • Reinier,

    I have had a different experience when running a series of benchmarks to understand the performance of accessing code and data in the various memory options available to the DSP.  This includes using the tightly coupled memory L1P, L1D and L2 as both RAM and cache, or a combination.

    In these tests, accesses to the on-chip shared memory resource, a.k.a. L3 RAM, took fewer cycles than DDR, provided that the L3 RAM was marked cacheable. This is an important point, and cacheability is not enabled by default.
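
For readers wondering what "marked cacheable" involves, here is a minimal C sketch. It assumes the usual C674x megamodule layout, which is not stated in this thread and should be verified against the megamodule reference guide: one 32-bit MAR register per 16MB region, starting at 0x01848000, with bit 0 (PC) enabling caching for that region. All names below are hypothetical.

```c
#include <stdint.h>

/* Assumptions (verify against the C674x megamodule documentation):
 * MAR registers start at 0x01848000, one per 16MB of address space,
 * and bit 0 (PC, "permit copies") makes the region cacheable. */
#define MAR_BASE   0x01848000u
#define MAR_PC_BIT 0x1u

/* Index of the MAR register covering addr (each MAR spans 16MB). */
static inline uint32_t mar_index(uint32_t addr)
{
    return addr >> 24;
}

/* Memory-mapped address of the MAR register covering addr. */
static inline uint32_t mar_reg_addr(uint32_t addr)
{
    return MAR_BASE + 4u * mar_index(addr);
}

/* Target-only sketch: set the PC bit for the region containing addr.
 * This writes to a hardware register, so it is only meaningful when
 * running on the DSP itself; do not call it on a host PC. */
static inline void mar_enable_cache(uint32_t addr)
{
    volatile uint32_t *mar = (volatile uint32_t *)(uintptr_t)mar_reg_addr(addr);
    *mar |= MAR_PC_BIT;
}
```

Under these assumptions, the Shared RAM at 80000000h falls under MAR128. In a DSP/BIOS project the same setting would more likely be made through the cache configuration in the cfg file than by writing the register directly.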

  • Hi Sivaraj and Mukul,

    Thank you for your detailed explanations; they are much appreciated. It is now clear why the shared RAM runs at more or less the same speed as the DDR.

  • Brandon,

    Thank you, this is indeed an important point that I have overlooked. However, in the simple memory copy benchmark that I ran on shared RAM vs DDR, the results were pretty much the same. I suppose the results might differ in other benchmarks, but I would not expect the difference to be too significant?

    In any case, I think the main point for me is that there is no real substitute for L2 internal RAM.
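
The benchmark code itself was not posted in this thread. As a sketch of what a simple memory copy benchmark of this kind might look like, the following C helper times repeated `memcpy` calls using the standard `clock()` function. The name `time_copy` and its parameters are hypothetical.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: copy n bytes from src to dst `reps` times and
 * return the elapsed processor time in clock() ticks. */
static inline clock_t time_copy(void *dst, const void *src, size_t n, unsigned reps)
{
    clock_t start = clock();
    for (unsigned i = 0; i < reps; ++i) {
        memcpy(dst, src, n);
    }
    return clock() - start;
}
```

On the DSP, one would presumably place the source and destination buffers in Shared RAM, DDR, or L2 through the linker configuration, run the same call for each placement, and compare the results. A cycle counter would give finer resolution than `clock()`, and the cache settings in effect during the run should be recorded alongside the numbers, since they can change the outcome.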