TMS320C6742: L1 cache configuration with AIS NOR boot

Part Number: TMS320C6742
Other Parts Discussed in Thread: TMS320C6748

Hi,

We are developing an application for a production custom board based on the C6742 DSP, and we are having trouble fitting our application code into the DSP's internal SRAM. We launch the application through an AIS NOR boot, which by default configures the internal L1D and L1P memories as cache (32KB each) and L2 as RAM (64KB). In addition to these internal memories, our board has an external SDRAM connected through the EMIF (running at around 24MHz).

During development we have been using the TMS320C6748 DSP Development Kit (LCDK), whose C6748 DSP has a larger L2 memory (256KB) than the C6742. We have run several tests with all the code allocated in this larger L2 memory, and they satisfy the timing constraints defined for the application.

The problem is that our application code (around 200KB) is far too big to fit entirely in the 64KB L2 SRAM of the final C6742 DSP, and we want to minimize the code placed in external SDRAM because of the timing constraints mentioned above. We have roughly identified the critical sections of the application code, where those timing constraints mostly apply, and they may not fit completely into L2 memory. So, assuming we will need to move part of this code out of L2, we have considered two possibilities:

- Place the most critical sections of the code in internal L2 memory and move the rest to the external SDRAM. This leaves part of the critical code (not the most critical, but still critical to some degree) in a significantly slower memory, which may affect timing performance.

- Configure part of the L1D and L1P memories as RAM so that all critical code fits within internal memory (L1 and L2). The goal is to keep enough L1 space available as cache so that code execution does not slow down, while avoiding placing critical code in the external SDRAM.
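The first option amounts to a budgeting exercise: fill L2 with sections in order of criticality and let the remainder spill to SDRAM. A minimal sketch in C of that greedy plan follows. The `section_t` type, the `plan_placement` function, and any section names or sizes fed to it are hypothetical, used only for illustration; the 64KB budget is the C6742 L2 RAM size quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define L2_RAM_BYTES (64u * 1024u)   /* C6742 L2 RAM size quoted in the post */

/* Hypothetical description of one linker section. */
typedef struct {
    const char *name;   /* illustrative section name */
    uint32_t    bytes;  /* section size in bytes */
    int         in_l2;  /* output: 1 = placed in L2, 0 = left for SDRAM */
} section_t;

/*
 * Greedy plan: the caller orders sections from most to least critical.
 * Each section goes to L2 if it still fits in the remaining budget,
 * otherwise it is left for external SDRAM. Returns the bytes used in L2.
 */
static uint32_t plan_placement(section_t *s, size_t n, uint32_t l2_budget)
{
    uint32_t used = 0u;

    assert(s != NULL || n == 0u);
    for (size_t i = 0; i < n; ++i) {
        s[i].in_l2 = (s[i].bytes <= l2_budget - used);
        if (s[i].in_l2) {
            used += s[i].bytes;
        }
    }
    return used;
}
```

The result only tells which sections to assign to the L2 memory range in the linker command file; it does not account for data, stack, or alignment padding, which also compete for L2.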

The second possibility could be the better one, but it raises several questions:

1. As far as I know, there are two ways to change the default L1 configuration from cache to RAM: either instructing the bootloader to change it, or doing it at runtime in the application code itself. I do not know whether the former is possible with the AIS NOR boot; at least I did not see any reference to it in the bootloader app note. I have seen some alternatives in similar threads on this topic, such as a secondary bootloader, but I think this would mean discarding the AIS generation tool. Am I wrong? On the other hand, I have seen an example of the runtime approach in the Cache User Guide, section 2.7, but I do not know whether it is applicable to our case. In any case, I am wondering how tricky this would be. Do you have any advice on that?

2. Assuming the configuration can be done, what would be the criteria for allocating code to the L1P and L1D memories? Is it enough to copy, for instance, .text sections (executable code) to L1P and .far sections (static variables) to L1D, or should other considerations be taken into account?

3. In some GEL and .cmd files, and also in the C6742 datasheet, the L2, L1P, and L1D memories are often referred to by two different sets of addresses: [0x0080 0000, 0x00E0 0000, 0x00F0 0000] and [0x1180 0000, 0x11E0 0000, 0x11F0 0000]. The second group is referred to as "shared" or "mirror". What is the difference between them?

4. Do you have any estimate of how much performance gain the L1 SRAM approach would offer compared with keeping the whole L1 as cache?
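To make the trade-off behind the runtime option concrete, here is a minimal sketch in C of how the cache mode setting splits each 32KB L1 memory between cache and SRAM. It assumes the mode encoding I understand the C674x L1PCFG/L1DCFG registers to use (0 = no cache, 1 = 4KB, 2 = 8KB, 3 = 16KB, 4 and above = 32KB); that encoding should be checked against the megamodule reference guide. The function names are hypothetical, and the code only does the arithmetic, without touching any register.

```c
#include <assert.h>
#include <stdint.h>

#define L1_TOTAL_BYTES (32u * 1024u)   /* L1P and L1D are 32KB each on this device */

/*
 * Cache size selected by a 3-bit mode value.
 * Assumed encoding: 0 -> 0KB, 1 -> 4KB, 2 -> 8KB, 3 -> 16KB, 4..7 -> 32KB.
 */
static uint32_t l1_cache_bytes(uint32_t mode)
{
    assert(mode <= 7u);
    if (mode == 0u) {
        return 0u;
    }
    if (mode >= 4u) {
        return 32u * 1024u;
    }
    return (4u * 1024u) << (mode - 1u);
}

/* Bytes of the same L1 memory left usable as SRAM for a given mode. */
static uint32_t l1_sram_bytes(uint32_t mode)
{
    return L1_TOTAL_BYTES - l1_cache_bytes(mode);
}
```

For example, under this assumed encoding mode 3 would leave 16KB of cache and 16KB of addressable SRAM in the memory being configured, while mode 4 (the bootloader default described above) leaves no SRAM at all.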

Any advice or help will be appreciated.

Thanks,

David

  • David,

    Sorry for the delayed response. L1D and L1P for the DSP are set up as cache by the bootloader, as described in section 9.1 of the Using C674x Bootloader application note, and there is no way to modify this setting through the AIS image format.

    http://www.ti.com/lit/an/spraat2f/spraat2f.pdf

    One thing you can try, though, is to use the compiler settings to optimize your application for size and check the tradeoff between performance and size. With the -O3 setting the compiler is known to inline a lot of functions, which increases code size, so keeping that in check would be a good starting point:

    http://processors.wiki.ti.com/index.php/C6000_Compiler:_Recommended_Compiler_Options#If_code_size_is_a_concern.2C_consider_using_the_following: 

    In newer versions of the compiler this option is also referred to as --opt_for_space.

    We typically do not recommend configuring L1 as SRAM, simply because the effort involved in enabling it is high, due to the need to use IDMA for data movement, and also because of the performance penalty it imposes on code running out of L2/L3/SRAM memory.

    Regards,

    Rahul

  • Hi,

    First, thanks for your reply.

    So, as I understand it, our best chance is to fit all the critical code in L2, leaving only non-critical sections in external memory. As you proposed, I have been exploring the compiler optimization options these past few days (-O, --opt_for_speed, --opt_for_space), and I have managed to find a tradeoff between timing performance and code size that could be sufficient for that purpose.

    Just a couple of questions to close the thread. The first is question 3 of the initial post. I have included the part of the .cmd file that refers to those memory sections:

    MEMORY
    {
    DSPL2ROM o = 0x00700000 l = 0x00100000 /* 1MB L2 Internal ROM */
    DSPL2RAM o = 0x00800000 l = 0x00040000 /* 256kB L2 Internal RAM */
    DSPL1PRAM o = 0x00E00000 l = 0x00008000 /* 32kB L1 Internal Program RAM */
    DSPL1DRAM o = 0x00F00000 l = 0x00008000 /* 32kB L1 Internal Data RAM */


    SHDSPL2ROM o = 0x11700000 l = 0x00100000 /* 1MB L2 Shared Internal ROM */
    SHDSPL2RAM o = 0x11800000 l = 0x00040000 /* 256kB L2 Shared Internal RAM */
    SHDSPL1PRAM o = 0x11E00000 l = 0x00008000 /* 32kB L1 Shared Internal Program RAM */
    SHDSPL1DRAM o = 0x11F00000 l = 0x00008000 /* 32kB L1 Shared Internal Data RAM */


    EMIFACS0 o = 0x40000000 l = 0x20000000 /* 512MB SDRAM Data (CS0) */
    EMIFACS2 o = 0x60000000 l = 0x02000000 /* 32MB Async Data (CS2) */
    EMIFACS3 o = 0x62000000 l = 0x02000000 /* 32MB Async Data (CS3) */
    EMIFACS4 o = 0x64000000 l = 0x02000000 /* 32MB Async Data (CS4) */
    EMIFACS5 o = 0x66000000 l = 0x02000000 /* 32MB Async Data (CS5) */
    SHRAM o = 0x80000000 l = 0x00020000 /* 128kB Shared RAM */
    DDR2 o = 0xC0000000 l = 0x20000000 /* 512MB DDR2 Data */
    }

    SECTIONS
    {
    .text > SHDSPL2RAM
    .stack > SHDSPL2RAM
    .bss > SHDSPL2RAM
    .cio > SHDSPL2RAM
    .const > SHDSPL2RAM
    .data > SHDSPL2RAM
    .switch > SHDSPL2RAM
    .sysmem > SHDSPL2RAM
    .far > SHDSPL2RAM
    .args > SHDSPL2RAM
    .ppinfo > SHDSPL2RAM
    .ppdata > SHDSPL2RAM

    /* COFF sections */
    .pinit > SHDSPL2RAM
    .cinit > SHDSPL2RAM

    /* EABI sections */
    .binit > SHDSPL2RAM
    .init_array > SHDSPL2RAM
    .neardata > SHDSPL2RAM
    .fardata > SHDSPL2RAM
    .rodata > SHDSPL2RAM
    .c6xabi.exidx > SHDSPL2RAM
    .c6xabi.extab > SHDSPL2RAM
    }

    My question is simply why the code is allocated to this "shared" section instead of the base one, DSPL2RAM.

    Also, regarding the final memory layout, in which some code will necessarily be placed in external memory: I have seen in the Cache User's Guide that external memory addresses are noncacheable by default and must be explicitly set as cacheable. I understand this is done to avoid caching memory regions that are likely to be accessed externally, but I am not sure whether this concern applies to our setup. That is, wouldn't the same cache coherence mechanisms that apply between L1 cache and L2 RAM also apply between L1 cache and external SDRAM through the EMIF, provided there are no unexpected accesses to the external memory? In which situations would this coherence be violated?
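For reference, cacheability of external memory on this DSP family is, as far as I understand it, controlled per 16MB region through the memory attribute registers (MARs). The sketch below, in C, only computes which MAR covers a given address. It assumes MAR0 sits at 0x01848000 with one 32-bit register per 16MB region, which should be verified against the megamodule reference guide; the function names are hypothetical. The SDRAM base 0x40000000 used in the usage note is the EMIFACS0 origin from the .cmd file above.

```c
#include <assert.h>
#include <stdint.h>

#define MAR_BASE 0x01848000u   /* assumed address of MAR0 */

/* Index of the MAR covering addr: one register per 16MB (2^24 byte) region. */
static uint32_t mar_index(uint32_t addr)
{
    return addr >> 24;
}

/* Address of the MAR register covering addr (registers are 4 bytes apart). */
static uint32_t mar_reg_addr(uint32_t addr)
{
    return MAR_BASE + 4u * mar_index(addr);
}

/* Number of MAR registers needed to cover [start, start + len). */
static uint32_t mar_count(uint32_t start, uint32_t len)
{
    uint32_t last;

    assert(len > 0u);
    last = start + (len - 1u);
    assert(last >= start);   /* the range must not wrap around 32 bits */
    return mar_index(last) - mar_index(start) + 1u;
}
```

With these assumptions, `mar_index(0x40000000u)` is 64, so making the first 16MB of the CS0 SDRAM cacheable would mean setting the cacheability bit in MAR64 on the target. The write itself is left out here because it only makes sense on the DSP.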

    Thanks in advance,

    David

  • David,

    I would like to clarify a couple of things based on your linker command file.

    1. Your linker command file instructs the linker to put the code in that region; see the SECTIONS directive. Each "> SHDSPL2RAM" places the corresponding section in SHDSPL2RAM.

    2. Please note that the shared L2 RAM at address 0x1180_0000 and the L2 RAM at address 0x0080_0000 refer to the same memory on the device. The first is a global address used by other masters on the device, such as EDMA, peripherals, or co-processors, while the second is the DSP-internal address that the DSP itself uses to access the memory. In either case the code goes to the L2 RAM of the DSP.
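The relationship between the two address views can be sketched in C as a fixed offset of 0x11000000, which is what the pairs in the linker command file above suggest (0x00800000 and 0x11800000, 0x00E00000 and 0x11E00000, and so on). The window bounds below are taken from that file's MEMORY entries, and the function names are hypothetical; a typical use would be converting a DSP-local buffer address before handing it to another master such as EDMA.

```c
#include <assert.h>
#include <stdint.h>

#define DSP_GLOBAL_OFFSET 0x11000000u
#define DSP_LOCAL_START   0x00700000u   /* DSPL2ROM origin in the .cmd file */
#define DSP_LOCAL_END     0x00F08000u   /* end of DSPL1DRAM (0x00F00000 + 32KB) */

/* True if addr lies in the DSP-local view of L2/L1P/L1D. */
static int is_dsp_local(uint32_t addr)
{
    return addr >= DSP_LOCAL_START && addr < DSP_LOCAL_END;
}

/* True if addr lies in the global ("shared") alias of the same memories. */
static int is_dsp_global_alias(uint32_t addr)
{
    return addr >= DSP_LOCAL_START + DSP_GLOBAL_OFFSET &&
           addr <  DSP_LOCAL_END + DSP_GLOBAL_OFFSET;
}

/* Local -> global; addresses outside the local window are returned unchanged. */
static uint32_t dsp_local_to_global(uint32_t addr)
{
    return is_dsp_local(addr) ? addr + DSP_GLOBAL_OFFSET : addr;
}

/* Global alias -> local; the caller must pass an address inside the alias window. */
static uint32_t dsp_global_to_local(uint32_t addr)
{
    assert(is_dsp_global_alias(addr));
    return addr - DSP_GLOBAL_OFFSET;
}
```

The window check is deliberately coarse: it treats the whole span from the L2 ROM origin to the end of L1D as one range, even though the MEMORY entries leave gaps inside it.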

    Regards,

    Rahul