Allocating memory for arrays on TMS320C6670

I have started working a bit with the TMS320C6670, and I am having trouble using malloc.

Could anybody explain, or give an example of, how malloc is used on this DSP?

My guess is that either I need to tweak the memory linker command file, or a different programming approach is required.

 

Thank you in advance!

  • Yes, you can allocate variables in the linker command file. Section 5.3.5 of the Optimizing Compiler User's Guide (SPRU187T) has more information, including a sample linker command file.
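
    For instance, a minimal sketch of placing one array in a named section; the section name and the MSMCSRAM memory range are assumptions for illustration (the pragma itself is documented in SPRU187):

        /* C source: put this array in a user-named section */
        #pragma DATA_SECTION(bigBuffer, ".myBufSection")
        int bigBuffer[0x10000];

        /* linker command file: map that section to a chosen memory range */
        SECTIONS
        {
            .myBufSection > MSMCSRAM   /* MSMCSRAM assumed to be defined under MEMORY */
        }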

  • Using #include <stdlib.h> and calling malloc returns memory from the heap, which has a default size of 0x400 bytes (if I recall correctly); the heap is placed in the ".sysmem" section.

    You can vary the size of the heap with a linker option, and you can also specify the memory region where ".sysmem" is placed (see the sketch at the end of this reply).

     

    If you are using tools such as BIOS and XDC, you can create your own heaps at runtime and use Memory_alloc().  Please refer to the SYS/BIOS documentation for more information on that.
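
    As mentioned above, a rough sketch of what the linker command file changes might look like; the heap size and the MSMCSRAM range name are assumptions for illustration:

        -heap 0x8000                     /* grow the malloc heap to 32 KB */

        SECTIONS
        {
            .sysmem > MSMCSRAM           /* place the heap in the chosen memory range */
        }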

  • Thank you, Tim and Aditya, for the helpful answers!

    I have another question, though, if you don't mind. This one is related to OpenMP.

    How can I use the OpenMP library? I have read and heard that TI is planning to add an OpenMP library in the near future; for now, are there any documents that explain how to use it?

     

  • I believe TI's OpenMP implementation is still in testing.  Using it would be just like using OpenMP anywhere else: pragma directives and OpenMP runtime functions such as omp_get_thread_num() (see the sketch at the end of this reply) -- but some cache coherence maintenance may be needed if you explicitly use other resources outside of "normal" conditions.

     

    Once it becomes publicly available, I'd be glad to help with anything; OpenMP will be a great tool for programming multicore DSPs.
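
    The sketch below is plain, generic OpenMP C (not TI's then-unreleased implementation), just to show the programming model:

        #include <stdio.h>
        #include <omp.h>

        int main(void)
        {
            int i;
            int a[1000];

            /* Each thread (typically one per core on a multicore DSP) reports its ID */
            #pragma omp parallel
            {
                printf("hello from thread %d of %d\n",
                       omp_get_thread_num(), omp_get_num_threads());
            }

            /* Loop iterations are partitioned across the threads automatically */
            #pragma omp parallel for
            for (i = 0; i < 1000; i++) {
                a[i] = 2 * i;
            }

            printf("a[999] = %d\n", a[999]);
            return 0;
        }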

  • I see, but how about now: can you explain a bit about partitioning data over the CPUs?

     

    Thank you for your time!

     

  • Feruz,

    I assume you are asking about partitioning data across CPUs, still with regard to malloc.  If you want malloc to allocate memory from locations shared by the 4 cores, you need to put the .sysmem section for each core in a different physical location.  For example, assuming you are using shared memory at 0x0C000000 - 0x0C100000, you might place the .sysmem sections as below.

    Core   Location
    0      0x0C000000 - 0x0C03FFFF
    1      0x0C040000 - 0x0C07FFFF
    2      0x0C080000 - 0x0C0BFFFF
    3      0x0C0C0000 - 0x0C0FFFFF

    You don't want the .sysmem sections from different cores to overlap in physical shared memory, as one core won't know what's been allocated by another core.   

    Regards,

    Dan

  • Hi Dan,

    Can you give me an example?

    I don't know whether I should create separate linker.cmd files (e.g. linker0.cmd -> core0, linker1.cmd -> core1, etc.). In Project Properties I found that one can add several linker.cmd files (Link Order); is that correct?

    Regards,

    /F

  • If you are going to go that route (which I guess is your only option right now), then you could make a different linker.cmd file for each core (specifying the desired exclusive memory region for that single core) and have multiple build configurations, one for each core.  For each build configuration, you can use the resource filter to exclude the other linker.cmd files so that it will only use the one for the core you want.

    As I just suggested, this method would involve swapping build configurations for each core and compiling a different binary/image for each that would be loaded separately.

  • Feruz,

    Yes, I agree with Tim here.  You would have a linker command file for each of core 0 - core 3.  The example below is arbitrary, without looking at the memory map for the 6670.  Assume that 0x0C000000 - 0x0C1FFFFF is shared memory, and you want to dedicate 1/4 of that memory to each core's .sysmem section.

    The memory ranges in all of the linker command files might look like the ones below:

    MEMORY
    {
        ...
        SYSMEM0:   o = 0x0C000000   l = 0x00080000
        SYSMEM1:   o = 0x0C080000   l = 0x00080000
        SYSMEM2:   o = 0x0C100000   l = 0x00080000
        SYSMEM3:   o = 0x0C180000   l = 0x00080000
        ...
    }

     

    Then, the difference in each would be in the SECTIONS specification

    Core 0 would have

    SECTIONS
    {
        ...
        .sysmem > SYSMEM0
        ...
    }

     

    Core 1 would have

    SECTIONS
    {
        ...
        .sysmem > SYSMEM1
        ...
    }

     

    And so on.....

  • I see; how about attaching the linker command files to the cores? Besides, making a different build configuration for each core, on top of partitioning data over the cores, might not be the optimal solution, I guess. Could you suggest a way of avoiding a different output for each core (e.g. running one output file on all cores, with each core computing the part that is allocated on that core)?

    Sorry if most of my questions are confusing!

    By the way, I am connected with an XDS100v1 USB emulator, using CCSv5.

  • If you move .sysmem to private memory (L2SRAM), then each core can call plain "malloc" and keep consistency, but other measures would then need to be taken to copy that data into shared memory so that it can be shared. Keeping .sysmem in the same shared memory section for all cores (i.e. all cores have the same start and end address, which would be the case for a single image) will have undesired results.

    Alternatively, as I previously mentioned, you can use custom heaps, in particular the Heap*MP modules in the IPC package, which can use shared memory as the allocation pool.  The Heap*MP modules provide multiprocessor-safe access, so you can call "Memory_alloc" without needing to maintain consistency yourself (a rough sketch follows below).
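
    A rough sketch only; Ipc_start()/Ipc_attach() and error handling are omitted, and the heap name, MSMC address, and pool size are assumptions for illustration:

        #include <xdc/std.h>
        #include <xdc/runtime/Memory.h>
        #include <xdc/runtime/IHeap.h>
        #include <ti/ipc/HeapMemMP.h>
        #include <ti/ipc/MultiProc.h>

        Ptr allocFromSharedHeap(Void)
        {
            HeapMemMP_Handle heap;
            HeapMemMP_Params hp;

            if (MultiProc_self() == 0) {
                /* Core 0 creates the heap in shared (MSMC) memory */
                HeapMemMP_Params_init(&hp);
                hp.name          = "mySharedHeap";   /* made-up name */
                hp.sharedAddr    = (Ptr)0x0C100000;  /* assumed MSMC address */
                hp.sharedBufSize = 0x00080000;       /* 512 KB pool */
                heap = HeapMemMP_create(&hp);
            } else {
                /* Other cores open the same heap by name once core 0 has created it */
                while (HeapMemMP_open("mySharedHeap", &heap) < 0) { }
            }

            /* Multiprocessor-safe allocation: 1 KB, aligned to a 128-byte cache line */
            return Memory_alloc((IHeap_Handle)heap, 1024, 128, NULL);
        }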

  • You did mention OpenMP a while ago. The way we've got that set up is that there would indeed be a single image for all cores, and I believe (not 100% sure, but I think so) you can call plain "malloc" on each core without fear of overlap -- and sharing works just as it would in any other OpenMP implementation.

    It really makes for simple programming (and definitely saves the hassle of multiple images).  I think that would fit your bill exactly.

  • Hi,

    You suggest using the IPC package, but when I add RTSC support to my project, I get an error: target is not supported.

    What is the ti.target for ti.platform.evmc6670?

     

    Thanks,

    /F

     

  • That's just a check that's hardcoded into the cfg file (at least if you're using the example projects from IPC).

    If you look at the cfg file, you should see something like:

    switch (Program.platformName) {
        case "ti.sdo.ipc.examples.platforms.evm6670.core0":
            var nameList = ["CORE0", "CORE1", "CORE2", "CORE3"];
            break;
        case "ti.sdo.ipc.examples.platforms.evm6678.core0":
            var nameList = ["CORE0", "CORE1", "CORE2", "CORE3",
                            "CORE4", "CORE5", "CORE6", "CORE7"];
            break;
        default:
            throw("Platform " + Program.platformName + " not supported by this example");
            break;
    }

     

    So you can work around the problem by switching to evm6670.core0, which should be exactly the same except that it includes the DDR3 region -- or just change the cfg file so it does not do the hardcoded check, and specify the "procNameList" yourself.

     

    Protip: 

    var nameList = MultiProc.getDeviceProcNames();

    This gets all the cores on your device (so the same call returns CORE0-CORE3 on the 6670 and CORE0-CORE7 on the 6678).
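
    For example, a possible replacement for the hardcoded switch in the .cfg (assuming MultiProc has been pulled in with xdc.useModule):

        var MultiProc = xdc.useModule('ti.sdo.utils.MultiProc');

        /* Build the processor name list from the device instead of hardcoding it */
        var nameList = MultiProc.getDeviceProcNames();

        /* First argument null: each core determines its own id at startup */
        MultiProc.setConfig(null, nameList);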

  • Hi,

    Thanks for reply!

    Could you give me a hello-world example, say where every core prints "hello world, core #"?

     

    I am on a Linux machine, and there is still no MCSDK package available for Linux!

    /F

  • But presumably you have the IPC and BIOS packages through other means.

    The example projects for IPC are under ipc_1_2x_xx_xx/packages/ti/sdo/ipc/examples/multicore/evm667x

     

    message_multicore.c uses the MessageQ module to pass messages between cores and is probably the most useful example (one core creates a HeapBufMP, the other cores attach to it, and then MessageQ_alloc allocates Messages using the heap)
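
    Not the actual example source, but a rough sketch of that flow; the queue and heap names are made up, and Ipc_start()/error handling are omitted:

        #include <xdc/std.h>
        #include <ti/ipc/MessageQ.h>
        #include <ti/ipc/HeapBufMP.h>
        #include <ti/ipc/MultiProc.h>

        #define HEAP_NAME   "exampleHeap"   /* made-up name */
        #define HEAP_ID     0

        Void messageSketch(Void)
        {
            HeapBufMP_Handle heap;
            MessageQ_Handle  rxQueue;
            MessageQ_QueueId txQueueId;
            MessageQ_Msg     msg;

            if (MultiProc_self() == 0) {
                /* Core 0 creates the shared heap used for message allocation */
                HeapBufMP_Params hp;
                HeapBufMP_Params_init(&hp);
                hp.name      = HEAP_NAME;
                hp.numBlocks = 8;
                hp.blockSize = 64;
                heap = HeapBufMP_create(&hp);
            } else {
                /* The other cores attach to it by name */
                while (HeapBufMP_open(HEAP_NAME, &heap) < 0) { }
            }
            MessageQ_registerHeap((Ptr)heap, HEAP_ID);

            /* Each core creates its own receive queue (two-core ping shown here) */
            rxQueue = MessageQ_create(MultiProc_self() == 0 ? "CORE0_Q" : "CORE1_Q", NULL);

            /* To send: open the remote queue by name, allocate from the heap, put */
            while (MessageQ_open(MultiProc_self() == 0 ? "CORE1_Q" : "CORE0_Q", &txQueueId) < 0) { }
            msg = MessageQ_alloc(HEAP_ID, sizeof(MessageQ_MsgHeader));
            MessageQ_put(txQueueId, msg);

            /* To receive: block on the local queue */
            MessageQ_get(rxQueue, &msg, MessageQ_FOREVER);
            MessageQ_free(msg);
        }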

  • After building the example I get the following error:

    Platform ti.platforms.evm6670 not supported by this example message_multicore.cfg /TMessageQ line 42 XDCTools Configuration Marker

     

    41:default:

    42:  throw("Platform " + Program.platformName + " not supported by this example"); 

    43: break;

    What might be causing the problem? As I asked earlier, the ti target is unclear to me; I have tried different ones, but whenever I build I get the same error: platform not supported by this example or code!

     

    /F

  • See the "Platform" line? That is what you need to change, and what I was talking about in my earlier post. ti.sdo.ipc.examples.platforms.evm6670.core0 should be an option since you have IPC included as a repository (both 6678.core0 and 6670.core0 should be available; I just happen to be using the 6678).

  • Thank you for your response. After a few days off, I tried to run the example code from the IPC package.

    Please take a look at this video on YouTube, where I demonstrate it step by step:

    http://www.youtube.com/watch?v=B_DsnqxX1CM

     

    /F

  • All of those errors are from the RTS.

    I noticed you chose "Custom C6000" -- I have no idea what that option does -- the typical option is "Generic C66xx".

    Please try that.  Also, newer versions don't use "whole_program" -- and I think that might be one of the reasons it takes so long. See also this:

    http://processors.wiki.ti.com/index.php/Code_Generation_Tools_FAQ#Q:_What_can_I_do_about_the_linker_taking_so_long.3F

    I myself have never had that problem with the linker step taking so long (either newer versions don't have that step, or the Windows versions don't), but that might be what is happening for you.

  • Dear Tim, Aditya and Dan,

    I have another question regarding the same topic.

    Suppose I have a large array and want to do some calculation on each core, so that every core has some work to do: each core takes a part of the array, does some calculation on it, and returns its part to one core, which merges the parts back into the large array and saves the output or prints part of it.

    My questions:

    1. Is there a package to handle sending and receiving the array?

    2. Since there is both communication time and computation time, what would be the best way to measure performance?

     

    Thank you in advance!

    I look forward to hearing from you.

    /F

  • On the C6678 (I suppose it is more or less the same on the C6670) I use the PAX registers to remap each core's data segment to a private area, so each core runs with the same logical address but a different physical address (as mentioned in http://www.ti.com/lit/an/sprab27a/sprab27a.pdf, section 6). The program must boot from a routine that first initializes the PAX registers and then calls the standard _c_int00() (a structural sketch follows at the end of this reply). So far, I fix the per-core memory in the linker file, but I suppose it could be made adaptable to the size actually used (although the PAX granularity has some limitations).

    The main disadvantage is that you have to convert every memory address when you pass data to another core (if you pass plain pointers). Since I don't use SYS/BIOS, I don't know whether this solution is incompatible with it.

    The heap size is the same for each core, but it could be modified by "hacking" the libc malloc code.
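
    Not the poster's actual code, just a structural sketch of that boot flow. configure_private_mapping() is a hypothetical helper (the real MPAX programming is described in sprab27a), the addresses are example values only, and the exact C-level name of the RTS entry point depends on the ABI:

        #include <c6x.h>                     /* DNUM: this core's number */

        extern void _c_int00(void);          /* standard RTS entry point */

        /* Hypothetical helper: remap one logical window to a per-core
         * physical window using the XMC MPAX registers (see sprab27a). */
        extern void configure_private_mapping(unsigned int logical_base,
                                              unsigned long long physical_base,
                                              unsigned int size);

        /* Boot entry used instead of _c_int00: remap first, then start the RTS */
        void my_boot(void)
        {
            /* Example values only: each core gets its own 1 MB physical slice
             * behind the same logical window. */
            configure_private_mapping(0x90000000u,
                                      0x830000000ull + DNUM * 0x100000ull,
                                      0x100000u);

            _c_int00();                      /* never returns; runs main() as usual */
        }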

     

     

  • Hi,

    Could anyone give an example of how to allocate at a physical address so that all cores can access the same data?

    Thanks,

    /F