Hi,
How can I allocate memory for a variable so that all cores can access that data when the same code runs on all cores?
P.S. The preferred variable types are arrays and pointers.
Thanks,
F
There are a couple ways to do this.
Does the allocation need to be static (global array) or dynamic (malloc)?
Are you using a linker file? Are you using an RTSC .cfg file?
Are you using SYS/BIOS? Are you using IPC?
How is your memory set up? Do you already have a region of shared memory? (typically MSMCSRAM or DDR). Is the code running out of core-local L2SRAM for each core?
Hi,
Does the allocation need to be static (global array) or dynamic (malloc)?
A: dynamic (malloc)
Are you using a linker file? Are you using an RTSC .cfg file?
A: linker file
Are you using SYS/BIOS? Are you using IPC?
A: Neither, but that depends on what is needed.
How is your memory set up? Do you already have a region of shared memory? (typically MSMCSRAM or DDR). Is the code running out of core-local L2SRAM for each core?
A: All memory regions are defined in the linker file.
Thanks!
Regards,
Feruz
I am having difficulty imagining how you would need to use dynamic memory allocation with data shared between cores. You can either have the situation where:
1) One core allocates a previously-unknown size of memory. Then all cores access this memory. In this case, a large static buffer works just as well and makes things simpler.
or
2) Each core allocates a chunk of memory from shared memory. There may or may not be sharing between the sections after they have been allocated.
In both cases:
You need to have a small scratch buffer in shared memory so that cores can communicate addresses of the buffers after they have been allocated.
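A minimal sketch of such a scratch buffer, written in portable C so it can be tried on a host: one core publishes the address and size of the buffer it allocated, and the other cores spin until a ready flag is set. The struct layout and function names are hypothetical. The C11 atomics only model ordering on a cache-coherent host; on the C66x the mailbox would have to sit at the same address in a shared region (for example MSMCSRAM) and either be kept non-cacheable or be maintained with explicit cache writeback/invalidate, because the caches of different cores are not kept coherent by hardware.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mailbox layout; on real hardware this struct would be placed
 * in a shared region at the same address for every core. */
typedef struct {
    atomic_uint ready;     /* 0 = empty, 1 = address published */
    uintptr_t   buf_addr;  /* (global) address of the allocated buffer */
    size_t      buf_size;  /* size of the buffer in bytes */
} shared_mailbox_t;

/* Called by the allocating core once the buffer exists. */
void mailbox_publish(shared_mailbox_t *mb, void *buf, size_t size)
{
    mb->buf_addr = (uintptr_t)buf;
    mb->buf_size = size;
    /* Release ordering: address and size are visible before 'ready'. */
    atomic_store_explicit(&mb->ready, 1u, memory_order_release);
}

/* Called by the other cores; spins until the address has been published. */
void *mailbox_wait(shared_mailbox_t *mb, size_t *size_out)
{
    while (atomic_load_explicit(&mb->ready, memory_order_acquire) == 0u) {
        /* busy-wait; real code might back off or use an IPC notification */
    }
    if (size_out != NULL)
        *size_out = mb->buf_size;
    return (void *)mb->buf_addr;
}
```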
In the first case:
Using SYS/BIOS, you can change the default heap instance used by BIOS (so you can still call malloc() without changing your code; I believe this is the same as relocating .sysmem), or you can create a SYS/BIOS Heap object (HeapBuf, HeapMem, HeapMultiBuf) at a designated memory address and then call Memory_alloc() on that Heap object.
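To illustrate the idea of a heap placed at a designated address, here is a minimal fixed-block pool in plain C that behaves roughly like a HeapBuf: the region is split into equal blocks and an allocation returns the first free one. This is not the SYS/BIOS API (there you would create the HeapBuf statically or at run time and call Memory_alloc() on it); BLOCK_SIZE, NUM_BLOCKS, and all function names are hypothetical. It has no locking, so it only fits the first case, where a single core does the allocating.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of a HeapBuf-style pool: fixed-size blocks carved out of one
 * designated region. Not the SYS/BIOS API; illustration only. */
#define BLOCK_SIZE 256u  /* hypothetical block size in bytes */
#define NUM_BLOCKS 16u   /* hypothetical number of blocks */

typedef struct {
    uint8_t *base;               /* start of the designated region */
    uint8_t  used[NUM_BLOCKS];   /* 1 if the block is allocated */
} block_pool_t;

void pool_init(block_pool_t *p, void *region)
{
    p->base = (uint8_t *)region;
    for (size_t i = 0; i < NUM_BLOCKS; i++)
        p->used[i] = 0;
}

void *pool_alloc(block_pool_t *p, size_t size)
{
    if (size == 0 || size > BLOCK_SIZE)
        return NULL;             /* fixed-size pool: request must fit one block */
    for (size_t i = 0; i < NUM_BLOCKS; i++) {
        if (!p->used[i]) {
            p->used[i] = 1;
            return p->base + i * BLOCK_SIZE;
        }
    }
    return NULL;                 /* pool exhausted */
}

void pool_free(block_pool_t *p, void *ptr)
{
    size_t idx = (size_t)((uint8_t *)ptr - p->base) / BLOCK_SIZE;
    if (idx < NUM_BLOCKS)
        p->used[idx] = 0;
}

/* Stand-in for the designated region; aligned to a 128-byte cache line. */
static _Alignas(128) uint8_t shared_region[BLOCK_SIZE * NUM_BLOCKS];
```

On the DSP, shared_region would be placed in shared memory through a linker section rather than wherever the host compiler puts it.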
In the second case:
You'll need to use the Heap*MP objects (HeapBufMP, HeapMemMP, HeapMultiBufMP in IPC), since multiple cores need to perform allocations from the same memory pool.
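The reason for the multi-processor variants is that the pool's bookkeeping is itself shared, so every allocation has to happen inside an inter-core lock (IPC uses GateMP for this). The sketch below models that with a C11 atomic_flag spinlock around a simple bump allocator. The atomic_flag stands in for a hardware semaphore, and the names and the bump strategy are illustrative assumptions, not how HeapMemMP is actually implemented. The region's base is assumed to be aligned to the largest alignment requested.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Several cores allocate from one pool, so the bookkeeping is locked.
 * atomic_flag stands in for an inter-core gate; this is not the IPC API. */
typedef struct {
    atomic_flag lock;    /* stand-in for a hardware semaphore / GateMP */
    uint8_t    *base;    /* start of the shared pool */
    size_t      size;    /* total pool size in bytes */
    size_t      offset;  /* next free byte (bump allocator) */
} mp_pool_t;

static void gate_enter(mp_pool_t *p)
{
    while (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire)) {
        /* spin until the other core leaves the gate */
    }
}

static void gate_leave(mp_pool_t *p)
{
    atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

void mp_pool_init(mp_pool_t *p, void *region, size_t size)
{
    atomic_flag_clear(&p->lock);
    p->base = (uint8_t *)region;
    p->size = size;
    p->offset = 0;
}

/* Returns NULL when the pool is exhausted; 'align' must be a power of two. */
void *mp_pool_alloc(mp_pool_t *p, size_t bytes, size_t align)
{
    void *result = NULL;
    gate_enter(p);
    size_t start = (p->offset + align - 1) & ~(align - 1);
    if (start <= p->size && bytes <= p->size - start) {
        result = p->base + start;
        p->offset = start + bytes;
    }
    gate_leave(p);
    return result;
}
```

A bump allocator cannot free individual blocks; a real shared heap also has to handle freeing and fragmentation under the same lock.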
There are more details, but I'm not certain what your needs are.
All right, I understand your points.
Let's say I have a large variable or array (I mean a large block of memory) and I need to perform some calculation on it.
What do you think is the most efficient way to do it?
Put it in shared memory? What if it doesn't fit there?
Allocate it from DDR so that all cores can access it? From an efficiency point of view, that would be slower.
At the same time, I don't know whether there is a way to exchange parts of the array (for example, the first core does some calculation on its part and sends it to the others, and at the end the parts from all cores are collected and combined into the result).
Thanks,
F
Both of those issues you raise are fundamental problems of any computer architecture, not just DSPs.
667x has 512KB * # of cores of shared memory. It's not huge, but it's not tiny.
Variables in DDR are cached in L2 and L1 unless caching is turned off. However, when data is cached, it must be written back to memory before another core can read the updated value, and the reading core must not be holding a stale copy of that data in its own cache.
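In practice, a producer core writes back the cache lines of a shared buffer before signaling, and a consumer core invalidates them before reading. Block cache operations work on whole lines, so the byte range has to be widened to line boundaries. A sketch assuming a 128-byte L2 line, as on the C66x; cache_writeback() and cache_invalidate() are hypothetical no-op placeholders for the real block operations (the CSL provides calls such as CACHE_wbL2 and CACHE_invL2; check its documentation for the exact signatures).

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 128u  /* C66x L2 line size; L1D lines are 64 bytes */

typedef struct {
    uintptr_t start;   /* first byte of the first affected cache line */
    size_t    length;  /* whole number of cache lines, in bytes */
} cache_range_t;

/* Widen [addr, addr + len) to whole cache lines so a block writeback or
 * invalidate covers every line the buffer touches. */
cache_range_t cache_line_range(uintptr_t addr, size_t len)
{
    cache_range_t r;
    uintptr_t mask = ~(uintptr_t)(CACHE_LINE - 1);
    uintptr_t end = addr + len;
    r.start = addr & mask;
    r.length = (size_t)(((end + CACHE_LINE - 1) & mask) - r.start);
    return r;
}

/* Hypothetical placeholders: on the DSP these would call the CSL/BIOS
 * block cache operations; on a host they do nothing. */
static void cache_writeback(uintptr_t start, size_t len) { (void)start; (void)len; }
static void cache_invalidate(uintptr_t start, size_t len) { (void)start; (void)len; }

/* Producer core: push its updates out to DDR/MSMC before signaling. */
void publish_buffer(void *buf, size_t len)
{
    cache_range_t r = cache_line_range((uintptr_t)buf, len);
    cache_writeback(r.start, r.length);
}

/* Consumer core: drop stale cached copies before reading. */
void acquire_buffer(void *buf, size_t len)
{
    cache_range_t r = cache_line_range((uintptr_t)buf, len);
    cache_invalidate(r.start, r.length);
}
```

Because an invalidate discards whole lines, shared buffers are best aligned to the line size and padded to a multiple of it, so that invalidating one buffer never throws away unrelated data sharing its first or last line.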
If only one core is working on an independent set of data for a long period of time, then caching data in L2 and L1 should be decent. If multiple cores are frequently sharing data, then the data should be located in faster memory such as MSMCSRAM or core-local L2SRAM (as you are discussing in the other thread). If your data is too large for this, then you would need to partition the problem and move parts of the data from DDR to shared memory, work on shared memory, and then move the data back to DDR and bring in another part of the data.
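The partitioning described above (bring part of the data from DDR into faster memory, work on it there, move it back, then bring in the next part) can be sketched for one core as follows. memcpy stands in for what would normally be an EDMA transfer, fast_buf stands in for a buffer in MSMCSRAM or L2SRAM, and the tile size and the scaling operation are arbitrary choices for illustration.

```c
#include <stddef.h>
#include <string.h>

#define TILE_ELEMS 1024u  /* hypothetical tile size that fits in fast memory */

/* Stand-in for a buffer in MSMCSRAM or core-local L2SRAM. */
static float fast_buf[TILE_ELEMS];

/* Example computation on one tile: scale every element. */
static void process_tile(float *tile, size_t n, float gain)
{
    for (size_t i = 0; i < n; i++)
        tile[i] *= gain;
}

/* Stream a large DDR array through the small fast buffer:
 * copy in, compute, copy back, move on to the next tile. */
void process_large_array(float *ddr_data, size_t n, float gain)
{
    for (size_t off = 0; off < n; off += TILE_ELEMS) {
        size_t chunk = (n - off < TILE_ELEMS) ? (n - off) : TILE_ELEMS;
        memcpy(fast_buf, ddr_data + off, chunk * sizeof(float));  /* DDR -> fast */
        process_tile(fast_buf, chunk, gain);
        memcpy(ddr_data + off, fast_buf, chunk * sizeof(float));  /* fast -> DDR */
    }
}
```

In practice the transfers are usually double-buffered (ping-pong) so that the DMA moves the next tile while the CPU works on the current one. With several cores, each core would run this loop over its own range of the array with its own fast buffer.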
twentz,
twentz said: "667x has 512KB * # of cores of shared memory. It's not huge, but it's not tiny."
I would like to clarify this on some points:
1. We at TI use "667x" confusingly; it usually means the devices for which x = 1, 2, 4, 8. In FeruzM's other thread, the device is specified as the C6670. For the C6670 (x = 0), each CorePac has 1 MB of L2; when x = 1-8, each CorePac has 0.5 MB of L2.
2. The MSMC SRAM and the DDR3 are both shared memory. Each CorePac's L2 can be accessed by the other cores, but even so I do not usually consider it shared memory, for what that is worth. On older TI multicore devices, there was a speed penalty for accessing another core's L2. I have not measured this myself on the C667x devices, but there is probably a speed penalty here too when another core's L2 is used as shared memory.
3. For C667x when x=0, the MSMC SRAM is 2MB; when x=1-8, the MSMC SRAM is 4MB.
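Item 2 notes that each CorePac's L2 can be accessed by the other cores. As I understand the KeyStone memory map (C6670, C6678), each core sees its own L2SRAM at a local address starting at 0x00800000, and every core can reach core N's L2SRAM through the global alias 0x1N800000. A pointer to a local L2 buffer therefore has to be converted before it is handed to another core. A small sketch of that conversion; the function name is hypothetical, and the memory map in your device's data manual should be checked before relying on it.

```c
#include <stdint.h>

/* Convert a core-local address (below 0x01000000, e.g. L2SRAM at
 * 0x00800000) into the global alias other cores can use: 0x1N......,
 * where N is the owning core number. Addresses that are already global
 * (MSMCSRAM at 0x0C000000, DDR3, ...) are returned unchanged. */
uint32_t l2_local_to_global(uint32_t local_addr, uint32_t core_num)
{
    if (local_addr >= 0x01000000u)
        return local_addr;
    return 0x10000000u | (core_num << 24) | (local_addr & 0x00FFFFFFu);
}
```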
FeruzM,
To build on how twentz is trying to help, I would ask: What type of processing do you want to do with the shared memory that you would allocate?
Regards,
RandyP
Hi,
Thank you for the valuable information.
As an example, let's consider a simple matrix multiplication: shared memory holds the larger matrices, each core gets its part and does its calculation, and the final result is returned to shared memory.
Thanks,
Feruz
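The matrix-multiplication scheme described above can be sketched by splitting C = A * B into bands of rows: each core computes its own rows of C from its rows of A and all of B. Because no two cores write the same element, no combining step is needed; the only coordination required is a barrier (or per-core done flags) so that one core knows when the whole result is ready. The function name and the row-major layout are assumptions; on the DSP, core_id would come from the core number (for example DNUM), and each core would write back its rows of C from cache before signaling completion.

```c
#include <stddef.h>

/* Compute this core's band of rows of C = A * B.
 * A is M x K, B is K x N, C is M x N, all row-major in shared memory. */
void matmul_core_part(const float *A, const float *B, float *C,
                      size_t M, size_t K, size_t N,
                      unsigned core_id, unsigned num_cores)
{
    /* Rows [row_begin, row_end) belong to this core; when M is not a
     * multiple of num_cores, the extra rows go to the lowest cores. */
    size_t base = M / num_cores;
    size_t extra = M % num_cores;
    size_t row_begin = core_id * base + (core_id < extra ? core_id : extra);
    size_t row_end = row_begin + base + (core_id < extra ? 1 : 0);

    for (size_t i = row_begin; i < row_end; i++) {
        for (size_t j = 0; j < N; j++) {
            float acc = 0.0f;
            for (size_t k = 0; k < K; k++)
                acc += A[i * K + k] * B[k * N + j];
            C[i * N + j] = acc;
        }
    }
}
```

For matrices too large for MSMCSRAM, this combines with tiling: each core would bring blocks of A, B, and C into L2SRAM instead of reading them straight from DDR.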