Allocating memory for arrays on TMS320C6670

I have started working a bit with the TMS320C6670, and I am having trouble using malloc.

Could anybody explain, or give an example of, how malloc is used on this DSP?

My guess is that either I need to tweak the memory linker command file, or a different programming approach is required.

 

Thank you in advance!

  • Yes, you can allocate variables in the linker command file. Section 5.3.5 of the Optimizing Compiler User's Guide (SPRU187T) has more information, including a sample linker command file.
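
    For instance, a minimal sketch of placing one array in a named section; the section name and the MSMCSRAM memory range are assumptions for illustration (the pragma itself is documented in SPRU187):

        /* C source: put this array in a user-named section */
        #pragma DATA_SECTION(bigBuffer, ".myBufSection")
        int bigBuffer[0x10000];

        /* linker command file: map that section to a chosen memory range */
        SECTIONS
        {
            .myBufSection > MSMCSRAM   /* MSMCSRAM assumed to be defined under MEMORY */
        }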

  • Using #include <stdlib.h> and calling malloc returns memory from the heap, which has a default size of 0x400 bytes (if I recall correctly); the heap is placed in the ".sysmem" section.

    You can vary the size of the heap with a linker option, and you can also specify the memory region where ".sysmem" is placed (see the sketch at the end of this reply).

     

    If you are using tools such as BIOS and XDC, you can create your own heaps at runtime and use Memory_alloc().  Please refer to the SYS/BIOS documentation for more information on that.
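
    As mentioned above, a rough sketch of what the linker command file changes might look like; the heap size and the MSMCSRAM range name are assumptions for illustration:

        -heap 0x8000                     /* grow the malloc heap to 32 KB */

        SECTIONS
        {
            .sysmem > MSMCSRAM           /* place the heap in the chosen memory range */
        }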

  • Thank you, Tim and Aditya, for the helpful answers!

    I have another question, though, if you don't mind. This one is related to OpenMP.

    How can I use the OpenMP library? I have read and heard that TI is planning to add an OpenMP library in the near future; for now, are there any documents that explain how to use it?

     

  • I believe TI's OpenMP implementation is still in testing.  Using it would be just like using OpenMP anywhere else: pragma directives and OpenMP runtime functions such as omp_get_thread_num() (see the sketch at the end of this reply) -- but some cache coherence maintenance may be needed if you explicitly use other resources outside of "normal" conditions.

     

    Once it becomes publicly available, I'd be glad to help with anything; OpenMP will be a great tool for programming multicore DSPs.
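
    The sketch below is plain, generic OpenMP C (not TI's then-unreleased implementation), just to show the programming model:

        #include <stdio.h>
        #include <omp.h>

        int main(void)
        {
            int i;
            int a[1000];

            /* Each thread (typically one per core on a multicore DSP) reports its ID */
            #pragma omp parallel
            {
                printf("hello from thread %d of %d\n",
                       omp_get_thread_num(), omp_get_num_threads());
            }

            /* Loop iterations are partitioned across the threads automatically */
            #pragma omp parallel for
            for (i = 0; i < 1000; i++) {
                a[i] = 2 * i;
            }

            printf("a[999] = %d\n", a[999]);
            return 0;
        }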

  • I see, but how about now: can you explain a bit about partitioning data over the CPUs?

     

    Thank you for your time!

     

  • Feruz,

    I assume you are asking about partitioning data across CPUs, still with regard to malloc.  If you want malloc to allocate memory from locations shared by the 4 cores, you need to put the .sysmem section for each core in a different physical location.  For example, assuming you are using shared memory at 0x0C000000 - 0x0C100000, you might place the .sysmem sections as below.

    Core   Location
    0      0x0C000000 - 0x0C03FFFF
    1      0x0C040000 - 0x0C07FFFF
    2      0x0C080000 - 0x0C0BFFFF
    3      0x0C0C0000 - 0x0C0FFFFF

    You don't want the .sysmem sections from different cores to overlap in physical shared memory, as one core won't know what's been allocated by another core.   

    Regards,

    Dan

  • Hi Dan,

    Can you give me an example?

    I don't know whether I should create separate linker.cmd files (e.g. linker0.cmd -> core0, linker1.cmd -> core1, etc.). In Project Properties I found that one can add several linker.cmd files (Link Order); is that correct?

    Regards,

    /F

  • If you are going to go that route (which I guess is your only option right now), then you could make a different linker.cmd file for each core (specifying the desired exclusive memory region for that single core) and have multiple build configurations, one for each core.  For each build configuration, you can use the resource filter to exclude the other linker.cmd files so that it will only use the one for the core you want.

    As I just suggested, this method would involve swapping build configurations for each core and compiling a different binary/image for each that would be loaded separately.

  • Feruz,

    Yes, I agree with Tim here.  You would have a linker command file for each of core 0 - core 3.  The example below is arbitrary, without looking at the memory map for the 6670.  Assume that 0x0C000000 - 0x0C1FFFFF is shared memory, and you want to dedicate 1/4 of that memory to each core's .sysmem section.

    The memory ranges in all of the linker command files might look like the ones below:

    MEMORY
    {
        ...
        SYSMEM0:   o = 0x0C000000   l = 0x00080000
        SYSMEM1:   o = 0x0C080000   l = 0x00080000
        SYSMEM2:   o = 0x0C100000   l = 0x00080000
        SYSMEM3:   o = 0x0C180000   l = 0x00080000
        ...
    }

     

    Then, the difference in each would be in the SECTIONS specification

    Core 0 would have

    SECTIONS
    {
        ...
        .sysmem > SYSMEM0
        ...
    }

     

    Core 1 would have

    SECTIONS
    {
        ...
        .sysmem > SYSMEM1
        ...
    }

     

    And so on.....

  • I see; how about attaching the linker command files to the cores? Besides, making a different build configuration for each core, on top of partitioning data over the cores, might not be the optimal solution, I guess. Could you suggest a way of avoiding a different output for each core (e.g. running one output file on all cores, with each core computing the part that is allocated on that core)?

    Sorry if most of my questions are confusing!

    By the way, I am connected with an XDS100v1 USB emulator, using CCSv5.

  • If you move .sysmem to private memory (L2SRAM), then each core can call plain "malloc" and keep consistency, but other measures would then need to be taken to copy that data into shared memory so that it can be shared. Keeping .sysmem in the same shared memory section for all cores (i.e. all cores have the same start and end address, which would be the case for a single image) will have undesired results.

    Alternatively, as I previously mentioned, you can use custom heaps, in particular the Heap*MP modules in the IPC package, which can use shared memory as the allocation pool.  The Heap*MP modules provide multiprocessor-safe access, so you can call "Memory_alloc" without needing to maintain consistency yourself (a rough sketch follows below).
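
    A rough sketch only; Ipc_start()/Ipc_attach() and error handling are omitted, and the heap name, MSMC address, and pool size are assumptions for illustration:

        #include <xdc/std.h>
        #include <xdc/runtime/Memory.h>
        #include <xdc/runtime/IHeap.h>
        #include <ti/ipc/HeapMemMP.h>
        #include <ti/ipc/MultiProc.h>

        Ptr allocFromSharedHeap(Void)
        {
            HeapMemMP_Handle heap;
            HeapMemMP_Params hp;

            if (MultiProc_self() == 0) {
                /* Core 0 creates the heap in shared (MSMC) memory */
                HeapMemMP_Params_init(&hp);
                hp.name          = "mySharedHeap";   /* made-up name */
                hp.sharedAddr    = (Ptr)0x0C100000;  /* assumed MSMC address */
                hp.sharedBufSize = 0x00080000;       /* 512 KB pool */
                heap = HeapMemMP_create(&hp);
            } else {
                /* Other cores open the same heap by name once core 0 has created it */
                while (HeapMemMP_open("mySharedHeap", &heap) < 0) { }
            }

            /* Multiprocessor-safe allocation: 1 KB, aligned to a 128-byte cache line */
            return Memory_alloc((IHeap_Handle)heap, 1024, 128, NULL);
        }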

  • You did mention OpenMP a while ago. The way we've got that set up is that there would indeed be a single image for all cores, and I believe (not 100% sure, but I think so) you can call plain "malloc" on each core without fear of overlap -- and sharing works just as it would in any other OpenMP implementation.

    It really makes for simple programming (and definitely saves the hassle of multiple images).  I think that would fit your bill exactly.

  • Hi,

    You suggest using the IPC package, but when I add RTSC support to my project, I get an error: target is not supported.

    What is the ti.target for ti.platform.evmc6670?

     

    Thanks,

    /F

     

  • That's just a check that's hardcoded into the cfg file (at least if you're using the example projects from IPC).

    If you look at the cfg file, you should see something like:

    switch (Program.platformName) {
        case "ti.sdo.ipc.examples.platforms.evm6670.core0":
            var nameList = ["CORE0", "CORE1", "CORE2", "CORE3"];
            break;
        case "ti.sdo.ipc.examples.platforms.evm6678.core0":
            var nameList = ["CORE0", "CORE1", "CORE2", "CORE3",
                            "CORE4", "CORE5", "CORE6", "CORE7"];
            break;
        default:
            throw("Platform " + Program.platformName + " not supported by this example");
            break;
    }

     

    So you can work around the problem by switching to evm6670.core0, which should be exactly the same except that it includes the DDR3 region -- or just change the cfg file so it does not do the hardcoded check, and specify the "procNameList" yourself.

     

    Protip: 

    var nameList = MultiProc.getDeviceProcNames();

    This gets all the cores on your device (so the same call returns CORE0-CORE3 on the 6670 and CORE0-CORE7 on the 6678).
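
    For example, a possible replacement for the hardcoded switch in the .cfg (assuming MultiProc has been pulled in with xdc.useModule):

        var MultiProc = xdc.useModule('ti.sdo.utils.MultiProc');

        /* Build the processor name list from the device instead of hardcoding it */
        var nameList = MultiProc.getDeviceProcNames();

        /* First argument null: each core determines its own id at startup */
        MultiProc.setConfig(null, nameList);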

  • Hi,

    Thanks for reply!

    Could you give me a hello-world example, say where every core prints "hello world, core #"?

     

    I am on a Linux machine, and there is still no MCSDK package available for Linux!

    /F

  • But presumably you have the IPC and BIOS packages through other means.

    The example projects for IPC are under ipc_1_2x_xx_xx/packages/ti/sdo/ipc/examples/multicore/evm667x

     

    message_multicore.c uses the MessageQ module to pass messages between cores and is probably the most useful example (one core creates a HeapBufMP, the other cores attach to it, and then MessageQ_alloc allocates Messages using the heap)
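
    Not the actual example source, but a rough sketch of that flow; the queue and heap names are made up, and Ipc_start()/error handling are omitted:

        #include <xdc/std.h>
        #include <ti/ipc/MessageQ.h>
        #include <ti/ipc/HeapBufMP.h>
        #include <ti/ipc/MultiProc.h>

        #define HEAP_NAME   "exampleHeap"   /* made-up name */
        #define HEAP_ID     0

        Void messageSketch(Void)
        {
            HeapBufMP_Handle heap;
            MessageQ_Handle  rxQueue;
            MessageQ_QueueId txQueueId;
            MessageQ_Msg     msg;

            if (MultiProc_self() == 0) {
                /* Core 0 creates the shared heap used for message allocation */
                HeapBufMP_Params hp;
                HeapBufMP_Params_init(&hp);
                hp.name      = HEAP_NAME;
                hp.numBlocks = 8;
                hp.blockSize = 64;
                heap = HeapBufMP_create(&hp);
            } else {
                /* The other cores attach to it by name */
                while (HeapBufMP_open(HEAP_NAME, &heap) < 0) { }
            }
            MessageQ_registerHeap((Ptr)heap, HEAP_ID);

            /* Each core creates its own receive queue (two-core ping shown here) */
            rxQueue = MessageQ_create(MultiProc_self() == 0 ? "CORE0_Q" : "CORE1_Q", NULL);

            /* To send: open the remote queue by name, allocate from the heap, put */
            while (MessageQ_open(MultiProc_self() == 0 ? "CORE1_Q" : "CORE0_Q", &txQueueId) < 0) { }
            msg = MessageQ_alloc(HEAP_ID, sizeof(MessageQ_MsgHeader));
            MessageQ_put(txQueueId, msg);

            /* To receive: block on the local queue */
            MessageQ_get(rxQueue, &msg, MessageQ_FOREVER);
            MessageQ_free(msg);
        }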

  • After building the example I get the following error:

    Platform ti.platforms.evm6670 not supported by this example message_multicore.cfg /TMessageQ line 42 XDCTools Configuration Marker

     

    41:default:

    42:  throw("Platform " + Program.platformName + " not supported by this example"); 

    43: break;

    What might be causing the problem? As I asked earlier, the ti target is unclear to me; I have tried different ones, but whenever I build I get the same error: platform not supported by this example or code!

     

    /F

  • See the "Platform" line? That is what you need to change, and what I was talking about in my earlier post. ti.sdo.ipc.examples.platforms.evm6670.core0 should be an option since you have IPC included as a repository (both 6678.core0 and 6670.core0 should be available; I just happen to be using the 6678).

  • Thank you for your response. After a few days off, I tried to run the example code from the IPC package.

    Please take a look at this video on YouTube, where I demonstrate it step by step:

    http://www.youtube.com/watch?v=B_DsnqxX1CM

     

    /F

  • All of those errors are from the RTS.

    I noticed you chose "Custom C6000" -- I have no idea what that option does -- the typical option is "Generic C66xx".

    Please try that.  Also, newer versions don't use "whole_program" -- and I think that might be one of the reasons it takes so long. See also this:

    http://processors.wiki.ti.com/index.php/Code_Generation_Tools_FAQ#Q:_What_can_I_do_about_the_linker_taking_so_long.3F

    I myself have never had that problem with the linker step taking so long (either newer versions don't have that step, or the Windows versions don't), but that might be what is happening for you.

  • Dear Tim, Aditya and Dan,

    I have another question regarding the same topic.

    Suppose I have a large array and want to do some calculation on each core, so that every core has some work to do: each core takes a part of the array, does some calculation on it, and returns its part to one core, which merges the parts back into the large array and saves the output or prints part of it.

    My questions:

    1. Is there a package to handle sending and receiving the array?

    2. Since there is both communication time and computation time, what would be the best way to measure performance?

     

    Thank you in advance!

    I look forward to hearing from you.

    /F

  • On the C6678 (I suppose it is more or less the same on the C6670) I use the PAX registers to remap each core's data segment to a private area, so each core runs with the same logical address but a different physical address (as mentioned in http://www.ti.com/lit/an/sprab27a/sprab27a.pdf, section 6). The program must boot from a routine that first initializes the PAX registers and then calls the standard _c_int00() (a structural sketch follows at the end of this reply). So far, I fix the per-core memory in the linker file, but I suppose it could be made adaptable to the size actually used (although the PAX granularity has some limitations).

    The main disadvantage is that you have to convert every memory address when you pass data to another core (if you pass plain pointers). Since I don't use SYS/BIOS, I don't know whether this solution is incompatible with it.

    The heap size is the same for each core, but it could be modified by "hacking" the libc malloc code.
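
    Not the poster's actual code, just a structural sketch of that boot flow. configure_private_mapping() is a hypothetical helper (the real MPAX programming is described in sprab27a), the addresses are example values only, and the exact C-level name of the RTS entry point depends on the ABI:

        #include <c6x.h>                     /* DNUM: this core's number */

        extern void _c_int00(void);          /* standard RTS entry point */

        /* Hypothetical helper: remap one logical window to a per-core
         * physical window using the XMC MPAX registers (see sprab27a). */
        extern void configure_private_mapping(unsigned int logical_base,
                                              unsigned long long physical_base,
                                              unsigned int size);

        /* Boot entry used instead of _c_int00: remap first, then start the RTS */
        void my_boot(void)
        {
            /* Example values only: each core gets its own 1 MB physical slice
             * behind the same logical window. */
            configure_private_mapping(0x90000000u,
                                      0x830000000ull + DNUM * 0x100000ull,
                                      0x100000u);

            _c_int00();                      /* never returns; runs main() as usual */
        }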

     

     

  • Hi,

    Could anyone give an example of how to allocate at a physical address so that all cores can access the same data?

    Thanks,

    /F