Alternative placement of DSP function (into cache memory)

malden2507

Hello,
I have a question about placing a function into the fast memory of the C6747 that is normally used as cache.

To take advantage of the speed of L1D memory (0 wait states), I need to place one routine (function) there that must run very efficiently. My idea is to move it from IRAM directly into the area intended for caching.

Moving an algo from (L2) IRAM into fast L1D memory.

Here’s how I proceeded:

TConf: reduced the size of the “CACHE_L1D” memory section used for data cache (0x8000 > 0x6800)
TConf: created a new memory section in the freed part (named “CACHE_L1D_SRAM”) intended for code/data
In my own linker command file (link_custom.cmd), I defined the following mapping:
SECTIONS
{
SectForCacheSRAM > CACHE_L1D_SRAM
SectForCacheSRAM: LOAD = IRAM, RUN = CACHE_L1D_SRAM
}
In the .c source file, I placed the function into the new section as follows:
#pragma CODE_SECTION(target_function, "SectForCacheSRAM")
void target_function(void)
{
LOG_printf(&trace, "TARGET-FUNCTION call performed");
}

So here my questions:

Am I supposed to manually copy the function from its load address to the final area from which it will run (when called)?
If yes, where do I get the symbols for the function's source (load) address, its destination (run) address and its length, so that I can do the copying?
If not, did I get something wrong or misunderstand something?

Using .map file (function load and run addresses) information, I tried to perform copying, but without success: application crashes.

I use CGT 6.1.23, and DSP/BIOS 5.42.01.09. CCS 5.5. The DSP is a C6747.

I've prepared a small DSP/BIOS-based example to demonstrate the described problem. It's very basic (based on a TI example) and contains only one .c file, one custom .cmd file and one .tcf file.

I'd be grateful for any help!

Regards,
Mladen

APPENDIX:

Memory layout (out of .cmd, compiled .tcf file):
MEMORY {
CACHE_L1P : origin = 0x11e00000, len = 0x8000
CACHE_L1D : origin = 0x11f00000, len = 0x6800
DDR2 : origin = 0x80000000, len = 0x10000000
ARM_RAM : origin = 0x10010000, len = 0x8000
IRAM : origin = 0x11818000, len = 0x20000
CACHE_L1D_SRAM : origin = 0x11f06800, len = 0x1800 <<----------- function placed into proper memory-section
}

MEMORY CONFIGURATION (out of .map file)

name             origin     length     used       unused     attr   fill
---------------- --------   --------   --------   --------   ----   --------
ARM_RAM          10010000   00008000   00000000   00008000   RWIX
IRAM             11818000   00020000   0000cbcd   00013433   RWIX
CACHE_L1P        11e00000   00008000   00000000   00008000   RWIX
CACHE_L1D        11f00000   00006800   00000000   00006800   RWIX
CACHE_L1D_SRAM   11f06800   00001800   00000060   000017a0   RWIX   <<----------- function placed into proper memory-section
DDR2             80000000   10000000   00000000   10000000   RWIX

MAP file:
SectForCacheSRAM
* 0 11f06800 00000060
11f06800 00000040 tsk.obj (SectForCacheSRAM)
11f06840 00000020 bios.a64P : log_printf.o64P ($Tramp$L$PI$$_LOG_printf)

FAR CALL TRAMPOLINES

callee name trampoline name
callee addr tramp addr call addr call info
-------------- ----------- --------- ----------------
$SectForCacheSRAM:tsk.obj$0x0 $Tramp$L$PI$$_target_function
11f06800 118226e0 118223d8 tsk.obj (.text)
$.bios:bios.a64P<log_printf.o64P>$0x0 $Tramp$L$PI$$_LOG_printf
1181bc40 11f06840 11f06810 tsk.obj (SectForCacheSRAM)

Cache_as_SRAM.zip

over 8 years ago

0 RandyP over 8 years ago

TI__Guru* 84110 points

Malden,

It will help you a lot to review some of the training material from this Wiki page at .
It will tell you about the basics of using memory as cache.

Your terminology blurs the difference between SRAM and cache. The 32KB memory for L1D is SRAM. It can be used as SRAM or as cache, and the power-on reset default is 100% cache since that is the most common way to use it.

There are a finite number of configurations by which you can select some of L1D to be used as cache and some to be used as SRAM. I do not believe that 0x6800 is one of those choices, so you may have requested an invalid configuration. The linker and compiler do not know that you have set the starting addresses or sizes to invalid values, so those tools will try to create an executable file that works the way you asked. Nothing at that stage checks whether your settings are valid.

If you use the BIOS GUI to set your cache sizes, it will only allow you to choose the allowed values, so that will keep you from selecting invalid settings. You can put any value into Tconf directly without checks being made.

malden2507 said:

Am I supposed to manually copy function from its load-placement to final area from which it would run (when called)?

If yes, where do I get the symbols for the function's source (load) address, its destination (run) address and its length, so that I can do the copying?

If not, did I get something wrong or misunderstand something?

Create an L1D_SRAM region in the MEMORY section of the linker command file, similar to what you have done above. It must start at the correct address at the beginning of the L1D SRAM region 0x11f00000 and must be of a valid length (which you will find in the training material and somewhere in the documentation: datasheet, megamodule guide, cache guide).

Then, just place the code there like you do in the first of the two lines below.

malden2507 said:
SECTIONS
{
SectForCacheSRAM > CACHE_L1D_SRAM
SectForCacheSRAM: LOAD = IRAM, RUN = CACHE_L1D_SRAM
}

Leave off the second line unless you just really need to do a copy instead of letting the loader put the code there for you, like it puts everything else into L2. Those two lines both assign the same section to memory, so you do not want to use both lines. The Assembly Language Tools User Guide is where you will find descriptions of usage of the linker command file. You will also find how to create and access the symbols in the event you do want to copy a section of code.
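If a run-time copy is wanted after all, the load/run approach could look roughly like the sketch below. The linker fragment is shown only as a comment; the symbol names (`fast_load_start`, `fast_run_start`, `fast_size`) are hypothetical, and the `LOAD_START`, `RUN_START` and `SIZE` operators should be checked against the Assembly Language Tools User's Guide for the CGT version in use. Whether a plain CPU copy is suitable for the destination memory also needs checking; IDMA is mentioned later in this thread for moving code into L1P SRAM.

```c
#include <stddef.h>
#include <string.h>

/*
 * Linker command file side (comment only, not compiled). Use ONE
 * assignment for the section, not two:
 *
 *   SECTIONS
 *   {
 *       SectForCacheSRAM: LOAD = IRAM, RUN = CACHE_L1D_SRAM,
 *                         LOAD_START(_fast_load_start),
 *                         RUN_START(_fast_run_start),
 *                         SIZE(_fast_size)
 *   }
 *
 * C side on the target (hypothetical names; with COFF the C compiler
 * adds the leading underscore, and the size is the symbol's address):
 *
 *   extern char fast_load_start, fast_run_start, fast_size;
 *   copy_section(&fast_run_start, &fast_load_start, (size_t)&fast_size);
 */

/* Copy 'size' bytes of a section from its load address to its run address. */
void *copy_section(void *run_addr, const void *load_addr, size_t size)
{
    if (run_addr == load_addr || size == 0) {
        return run_addr;    /* nothing to copy */
    }
    return memcpy(run_addr, load_addr, size);
}
```

The copy has to happen before the first call to the relocated function, for example early in `main()`.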

Make sure you have configured L1D for the correct SRAM/Cache split prior to loading the code. This would be done in a bootloader for a production device or in a GEL file prior to loading the code in CCS. Look for examples for setting the cache in the GEL file you are using now.

After going through the training material, please let us know if you have any questions.

Regards,
RandyP

0 malden2507 over 8 years ago in reply to RandyP

Intellectual 450 points

Hello Randy,

many thanks for the quick response; I'll need some time to download the extensive set of material, which I've started.

Meanwhile, let me give you a few updates:

I did re-configure the cache size using the graphical tool (DSP/BIOS Configuration Tool), so I expect it to warn me if I select an invalid new cache size.
From your comment I realized that the beginning part of L1D must be used for RAM purposes.
OK, my Section-for-SRAM now starts at 0x11f00000. Cache itself at say 0x11f02000.
>> Now the application doesn't cause the debugging session to terminate, but - my application still does not run.
In the document TMS320C674x DSP Cache User's Guide (SPRUG82A–February 2009) on page 24 I found the following information:
Note: Do not define memory that is to be used or boots up as cache under the MEMORY directive. This memory is not valid for the linker to place code or data in. If L1D SRAM and/or L1P SRAM is to be used, it must first be made available by reducing the cache size. Data or code must be linked into L2 SRAM or external memory and then copied to L1 at run-time.
So, it seems like the code really must be present in an "ordinary" memory at load time, run from another area - in my case, "stolen from cache".

But as I mentioned (referring to point 3 in my very first post), I do not know how to get the symbols to use in the manual copy routine (moving the desired function from load to run memory).

Regards,
Mladen

0 RandyP over 8 years ago in reply to malden2507

TI__Guru* 84110 points

Mladen,

There were 3 documents that were mentioned plus training material. The Student Guide is well suited for showing you how to do things. The student labs have examples that can help with understanding the course material. I hope your download process is complete soon so you will have a chance to read these helpful documents.

malden2507 said:

I did re-configure cache-size using graphical tool (DSP/BIOS Configuration Tool), so I expect it to warn me if I select invalid new cache-size.

My recollection from the DSP/BIOS GUI is that it has a drop-down box with a finite set of selections. What selection did you choose?

Which version of DSP/BIOS are you using? Which version of CCS are you using?

malden2507 said:
OK, my Section-for-SRAM now starts at 0x11f00000. Cache itself at say 0x11f02000.

Since 24K is not a valid cache size, 0x11f02000 will not be correct. When you find the valid cache sizes in section 3 of the Megamodule Reference Guide, you can subtract that from 32K to find the size of SRAM and then know where the cache starts.
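The subtraction described above can be sketched as a small helper. The set of valid sizes used here (0/4/8/16/32 kB) is the one reported later in this thread from the DSP/BIOS configuration tool; the function name and the base address macro are illustrative, so check them against the Megamodule Reference Guide and the device memory map.

```c
#include <stdint.h>

#define L1D_BASE   0x11f00000u   /* L1D base address from the MEMORY map above */
#define L1D_BYTES  0x8000u       /* 32 KB of L1D in total */

/*
 * For a requested L1D cache size in KB, compute the SRAM size and the
 * address where the cache portion starts (SRAM sits at the base).
 * Returns 0 on success, -1 if the cache size is not an allowed value.
 */
int l1d_split(unsigned cache_kb, uint32_t *sram_bytes, uint32_t *cache_start)
{
    switch (cache_kb) {
    case 0: case 4: case 8: case 16: case 32:
        break;                    /* allowed cache sizes */
    default:
        return -1;                /* e.g. 24 KB or 26 KB (0x6800) */
    }
    *sram_bytes  = L1D_BYTES - cache_kb * 1024u;
    *cache_start = L1D_BASE + *sram_bytes;
    return 0;
}
```

For example, a 16 KB cache leaves 0x4000 bytes of SRAM at 0x11f00000, with the cache portion starting at 0x11f04000.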

malden2507 said:
it seems like the code really must be present in an "ordinary" memory at load time, run from another area - in my case, "stolen from cache".

Using what method are you loading your program? My assumption is that you are using CCS to load your program. Please confirm or correct my assumption.

If you are using CCS to load your program, then you use GEL commands to set the cache to the size you want prior to loading the code. This is how you cause it to "first be made available by reducing the cache size."
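As a rough illustration of what such a setup step writes, the sketch below maps a cache size to a mode value for the L1 configuration registers. The register addresses and the mode encoding are assumptions recalled from the C64x+/C674x megamodule documentation and must be verified there before use; the function name is hypothetical. The actual register write is shown only as a comment because it is meaningful on the target alone.

```c
#include <stdint.h>

/* Assumed register addresses; verify in the Megamodule Reference Guide. */
#define L1PCFG_ADDR 0x01840020u
#define L1DCFG_ADDR 0x01840040u

/*
 * Map a cache size in KB to the assumed L1 mode field value
 * (0 = no cache, 1 = 4 KB, 2 = 8 KB, 3 = 16 KB, 4 = 32 KB).
 * Returns -1 for sizes that are not allowed.
 */
int l1_mode_from_cache_kb(unsigned cache_kb)
{
    switch (cache_kb) {
    case 0:  return 0;
    case 4:  return 1;
    case 8:  return 2;
    case 16: return 3;
    case 32: return 4;
    default: return -1;
    }
}

/*
 * On the target (not compiled here), the write might look like:
 *
 *   volatile uint32_t *l1dcfg = (volatile uint32_t *)L1DCFG_ADDR;
 *   *l1dcfg = (uint32_t)l1_mode_from_cache_kb(16);
 *   (void)*l1dcfg;    // read back so the new mode has taken effect
 *
 * A GEL file would do the equivalent write before the program is loaded.
 */
```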

malden2507 said:
do not know how to get to the symbols to use in the manual copy routine (placing desired function from load to run-memory).

Sorry that I did not make this more clear: The Assembly Language Tools User Guide is where you will find descriptions of usage of the linker command file. You will also find how to create and access the symbols in the event you do want to copy a section of code.

malden2507 - original post said:
Moving an algo from (L2) IRAM into fast L1D memory.

You cannot do this. The D in L1D means Data memory. Your algorithm code belongs in Program memory. You can do the same things we have been talking about to change the size of L1P cache to 50% or less so you will have SRAM to use for your algorithm. But you cannot put your program code into L1D. It might be technically or conceivably possible, but it will not operate at the 0-wait state speed, and it is not tested in this way so it is not guaranteed to operate reliably.

Or did you mean that you want to put your algorithm's data in L1D? Some clarification will be helpful.

Regards,
RandyP

0 malden2507 over 8 years ago in reply to RandyP

Intellectual 450 points

Hello Randy,

many thanks for your last inputs. Now I see things more clearly.

To confirm, my intention was to use L1D (not L1P) since I've noticed that reducing data-cache size in my application case has almost no influence on overall code-efficiency.

In my previous attempts I made two mistakes, having failed to understand that:

1. Re-configuring cache-size using graphical tool (DSP/BIOS Configuration Tool) must be done in an appropriate way:
System > Global Settings > 64PLUS > 64P L1P/DCFG Mode (offering only valid cache-sizes of 0/4/8/16/32 kB)
and not directly:
System > MEM > CACHE_L1D or CACHE_L1P memory section start-address and size (latter gets re-adjusted automatically).
Thus only valid cache-sizes can be used.

2. According to your information, for code execution only L1P can be used.
What a pity! In my application, it means that the program cache size must be drastically reduced (by 2/4/8/16..), which in turn decreases code efficiency (by 5% when halving the cache size); thus my main idea of placing the code into 0-wait-state RAM to increase code efficiency dies immediately.

To summarize, I can now execute the program from L1P (gained by reducing the cache size in the proper way), but performance decreases (due to the cache-size reduction).
Regards,
Mladen

0 RandyP over 8 years ago in reply to malden2507

TI__Guru* 84110 points

Mladen,

Typically in large applications, using part of L1P as SRAM only helps for critical routines that execute once or twice and then go dormant for a while before needing to execute again. Leaving all of L1P as cache usually gives the best overall performance. Your case may have very particular aspects that make this less helpful.

You may want to look closely at the addresses where your various program components are loaded. This may be impractical, and is very tedious, but you could take advantage of the L1P Direct Mapping (1-way) configuration to minimize how often your critical sections are evicted from cache.
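To make the direct-mapped idea concrete, the sketch below computes which cache line an address maps to and whether two addresses would compete for the same line. The 32-byte L1P line size is an assumption to confirm in the cache guide, and the function names are illustrative.

```c
#include <stdint.h>

#define L1P_LINE_BYTES 32u   /* assumed L1P line size; verify in the cache guide */

/* Line index of 'addr' in a direct-mapped cache of 'cache_bytes' bytes. */
uint32_t l1p_line_index(uint32_t addr, uint32_t cache_bytes)
{
    uint32_t lines = cache_bytes / L1P_LINE_BYTES;
    if (lines == 0) {
        return 0;            /* no cache configured */
    }
    return (addr / L1P_LINE_BYTES) % lines;
}

/* Nonzero if two addresses map to the same line and can evict each other. */
int l1p_conflict(uint32_t a, uint32_t b, uint32_t cache_bytes)
{
    return l1p_line_index(a, cache_bytes) == l1p_line_index(b, cache_bytes);
}
```

With a 32 KB cache, code at 0x11818000 and 0x11820000 (0x8000 apart) lands on the same line, so two hot routines placed at such a distance would repeatedly evict each other. Comparing the addresses in the .map file this way is the tedious part mentioned above.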

If you split L1P into cache and SRAM, you can also manage overlays within the SRAM portion by using the IDMA1 function to transfer code in from L2 SRAM.

Good luck on your program. You seem to have a very strong grasp of the requirements and capabilities, so your company is fortunate you are on this task.

Regards,
RandyP
