cache configuration in DSP 64x+ of OMAP3530

Hi everyone!

I'm currently developing a vocoder algorithm that must run at a
certain speed on the C64x+ DSP of the OMAP3530 on the BeagleBoard platform.
After updating my code and optimizing it as much as I could,
I found out that part of the memory can be configured as a 2-way associative cache.

Specifically, I have configured the DSP with this linker command (.cmd) file:

-c                    /* ROM autoinitialization model */
-m boot.map
--stack_size=4096

/* SPECIFY THE SYSTEM MEMORY MAP */

MEMORY
{
       L2_BOOT   : origin = 0x107F8000, length = 0x000020  /* 32 bytes (0x20) for boot code */

       L2        : origin = 0x107F8020, length = 0x017FE0  /* nearly 96kB program or data RAM */

       L1P       : origin = 0x10E00000, length = 0x004000  /* 16kB program RAM */

       L1P_cache : origin = 0x10E04000, length = 0x004000  /* 16kB program cache */

       L1D       : origin = 0x10F04000, length = 0x014000  /* 80kB (0x14000 bytes) data RAM */
}

/* SPECIFY THE SECTIONS ALLOCATION INTO MEMORY */

SECTIONS
{

       .etext: {} > L2_BOOT

       .text >> L1P | L2    align=32
       .cio: {}                >  L2     align=32
       .switch: {} >  L2    align=32

       GROUP  > L1D
       {
               .comm2arm_mem_sect: align=32
               .epdebug :          align=32
               .shared_mem_sect :  align=32
               .bss                align=32
               .const: {}          align=32
               .far:   {}          align=32
               .cinit: {}          align=32
       }

       .sysmem                 > L1D align=32


       GROUP  > L1D (HIGH)
       {
               .stack_protection_sect
               .stack
       }

 

}

// ==================================================================================

As you can see, I reserve half of L1P as cache memory.
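
The MEMORY block above can be sanity-checked with a few lines of host-side C. This is only a sketch: the origins and lengths are copied from the listing, while the struct and function names are made up for illustration. It checks that no two regions overlap; working the numbers through also shows that 0x20 is 32 bytes and 0x14000 is 80 KB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One MEMORY region from the .cmd file: name, origin, length in bytes. */
typedef struct {
    const char *name;
    uint32_t origin;
    uint32_t length;
} mem_region;

/* Values copied from the MEMORY block in the post. */
static const mem_region regions[] = {
    { "L2_BOOT",   0x107F8000u, 0x000020u },
    { "L2",        0x107F8020u, 0x017FE0u },
    { "L1P",       0x10E00000u, 0x004000u },
    { "L1P_cache", 0x10E04000u, 0x004000u },
    { "L1D",       0x10F04000u, 0x014000u },
};

#define NUM_REGIONS (sizeof regions / sizeof regions[0])

/* First address past the end of a region. */
static uint32_t region_end(const mem_region *r)
{
    assert(r != NULL);
    return r->origin + r->length;
}

/* Returns 1 if any two regions share at least one address, else 0. */
static int regions_overlap(const mem_region *r, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (r[i].origin < region_end(&r[j]) &&
                r[j].origin < region_end(&r[i]))
                return 1;
    return 0;
}
```

Calling `regions_overlap(regions, NUM_REGIONS)` on the table above returns 0, and `region_end()` of L1P equals the origin of L1P_cache, so the two halves of L1P are back to back.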

Once that is done, you must also configure your ARM driver so that it
properly sets up the space you want to use for cache.

The L1P memory in the IVA2.2 subsystem can be split between cache and mapped RAM
as 4/28, 8/24 or 16/16 KB, or used as full cache, 2-way associative.

So, by setting the XMC registers correctly, you can configure the L1P
DSP memory as 16K/16K: 16K of 2-way associative cache and 16K of
memory-mapped RAM.
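
The cache/RAM splits listed above can be written out as a small lookup. This is only a sketch: the mode numbers 0 to 4 are an assumption based on the L1PCFG description in spru871 (only 0x3 = 16K is stated in this thread), and the function names are hypothetical.

```c
#include <stdint.h>

/* Total L1P on this DSP: 16 KB RAM + 16 KB cache in the map above. */
#define L1P_TOTAL_KB 32

/*
 * Cache size in KB for an L1P mode value.
 * Assumed encoding (check spru871): 0 = no cache, 1 = 4K, 2 = 8K,
 * 3 = 16K, 4 = 32K. Other values are not covered by this sketch
 * and return -1.
 */
static int l1p_cache_kb(uint32_t mode)
{
    switch (mode) {
    case 0: return 0;
    case 1: return 4;
    case 2: return 8;
    case 3: return 16;
    case 4: return 32;
    default: return -1;
    }
}

/* Memory-mapped L1P RAM left over in KB, or -1 for an unknown mode. */
static int l1p_sram_kb(uint32_t mode)
{
    int cache = l1p_cache_kb(mode);
    return (cache < 0) ? -1 : L1P_TOTAL_KB - cache;
}
```

With mode 3, both functions return 16, which is the 16K/16K split described above.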

Having done all this, I compile and link my code. When I
execute it, I get an error that looks like a memory
overflow, which doesn't make sense!

Error 1008 appears:

#Error 1008 : external abort on non-linefetch

Has anyone dealt with cache configuration before?
Please send me your answers, and if there's anything you don't
understand I'll give you more details.

Thanks in advance,

Ignacio.

  • In reply to Ignacio Fernandez:

    If you're using OMAP3530 then at a bare minimum I HIGHLY recommend that you use DSP/BIOS RTOS on the DSP and dsplink for intercore communication.  You may also wish to use Codec Engine.  There's a bit more work on your own part to use the framework, but you get a lot of benefit in return, e.g. an algorithm that can be used on any TI processor including DSP-only processors as well as SoC devices like OMAP.

    Please post your code that configures the cache.  Cache configuration is done through the L1PCFG, L1DCFG, and L2CFG registers as documented in spru871 and this wiki page:

    http://tiexpressdsp.com/index.php?title=Enabling_64x%2B_Cache

    If you're using BIOS then you can configure BIOS to set up the cache for you, as described in the wiki article above.

    How are you loading your code into L1P?  Is the ARM loading the code or are you using CCS?  Do you get any linker warnings during the build?

     

  • I finally managed to set up the L1P memory as 16K cache and 16K memory-mapped RAM.

    My mistake was doing the cache configuration in the ARM code. The register addresses I used aliased onto ARM addresses, so the writes went to the wrong place.

    I needed to set the XMC register L1PCFG=0x3 so that L1P works in the half-cache configuration (16K cache), but the pointer to the register must be set and written on the DSP, never on the ARM. That is why an error message appeared when executing: I was writing to the wrong area.
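
    A minimal sketch of that register write, meant to run on the DSP side. The address 0x01840020 for L1PCFG and the 3-bit mode field are assumptions taken from spru871, so check them against your device; the function name is hypothetical. The register is passed in as a pointer so the same code can be tried on a host against an ordinary variable.

```c
#include <stdint.h>

/* DSP-side address of L1PCFG (assumption from spru871, not from this thread). */
#define L1PCFG_ADDR      0x01840020u
/* Width of the mode field (assumption from spru871). */
#define L1PCFG_MODE_MASK 0x7u
/* Value reported in this thread for 16K cache + 16K mapped RAM. */
#define L1P_MODE_16K     0x3u

/*
 * Write the L1P mode into the register, keeping the other bits,
 * then read the register back. Returns the mode it now reports.
 */
static uint32_t l1p_set_mode(volatile uint32_t *l1pcfg, uint32_t mode)
{
    uint32_t value = *l1pcfg;

    value = (value & ~L1PCFG_MODE_MASK) | (mode & L1PCFG_MODE_MASK);
    *l1pcfg = value;

    /* Read back so the caller can confirm the write took effect. */
    return *l1pcfg & L1PCFG_MODE_MASK;
}
```

    On the DSP the call would look like `l1p_set_mode((volatile uint32_t *)L1PCFG_ADDR, L1P_MODE_16K);`. The same numeric address dereferenced from ARM code refers to something else in the ARM's address map, which matches the failure described above.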

    If you want to configure L2 or L1D cache, don't forget to take the MARi registers into account so that the corresponding memory area is cacheable.
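
    As a rough illustration of the MARi bookkeeping: the sketch below assumes, per spru871, that there are 256 MAR registers starting at 0x01848000, that each one covers a 16 MB block of the address space, and that bit 0 marks the block as cacheable. None of these numbers come from this thread, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MAR_BASE_ADDR 0x01848000u  /* address of MAR0 (assumption, spru871) */
#define MAR_COUNT     256u         /* one register per 16 MB block (assumption) */
#define MAR_PC_BIT    0x1u         /* "permit caching" bit (assumption) */

/* Index i of the MARi register that covers a given address. */
static uint32_t mar_index(uint32_t addr)
{
    return addr >> 24;             /* 16 MB = 2^24 bytes per register */
}

/* DSP-side address of the MARi register that covers a given address. */
static uint32_t mar_reg_addr(uint32_t addr)
{
    return MAR_BASE_ADDR + 4u * mar_index(addr);
}

/*
 * Set or clear the cacheable bit for every 16 MB block touched by
 * [start, start + len). `mar` points at MAR0; on a host it can be
 * an ordinary array of MAR_COUNT words.
 */
static void mar_set_cacheable(volatile uint32_t *mar, uint32_t start,
                              uint32_t len, int cacheable)
{
    assert(len > 0u);
    assert(len - 1u <= UINT32_MAX - start);   /* range must not wrap */

    uint32_t first = mar_index(start);
    uint32_t last  = mar_index(start + (len - 1u));

    for (uint32_t i = first; i <= last; i++) {
        if (cacheable)
            mar[i] |= MAR_PC_BIT;
        else
            mar[i] &= ~MAR_PC_BIT;
    }
}
```

    For example, a 32 MB buffer starting at 0x80000000 (an address picked only for illustration) touches MAR128 and MAR129, and `mar_reg_addr(0x80000000u)` evaluates to 0x01848200.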

    I hope this helps everybody.

    By the way, I got a performance improvement of about 12 ms, which hopefully is enough for our purpose.

     

    Thanks