Hello,
I want to build a multicore application on the C6472 using the shared memory. I am using CCS v4 as the compiler. I have read the document SPRUEG5C. The arbitration logic seems quite complex to me. I have some questions about it (I'm sorry if they appear silly).
1. Does using optimization level-3 in CCSV4 imply that the shared memory access is configured to be pre-fetchable always? In that case I think I cannot use atomic access monitor in optimization level-3. Because SPRUEG5C says "Atomic access should go only to non-prefetchable address spaces." Am I correct?
2. Other than configuring the prefetchable/non-prefetchable parts of the shared memory, the power-down issues and the fault indications, do I need to use the SMC memory-mapped registers manually from my code, or are they used only by the arbitration logic hardware?
3. Reading the document for SMC controller it seems to me that the user 'talks' to the atomic access monitor and the atomic access monitor controls the arbitration logic hardware. The user cannot directly control the arbitration logic hardware without using atomic access monitor. Am I right?
4. While programming, do I need to specify the per-bank SMC controller through which I am accessing the shared memory, or does the address of the shared memory location itself tell the hardware which per-bank controller the request should go through?
5. If I access a shared memory location from C code with a declaration like "#define VALUE (*((volatile unsigned int *) 0x00200000))" followed by "VALUE=0x1234", will the compiled code automatically pass the write (or read) request through the atomic access monitor (by using the LL, SL, CMTL instructions in assembly), or do I have to take some other step to ensure atomic access? If so, what are those steps?
6. Can anyone please send me an example project where shared memory is used by multiple cores preferably without using DSP-BIOS?
Regards,
AC.
I don't know if you are already aware of this, but here is the link to SMMQT: http://software-dl.ti.com/dsps/dsps_registered_sw/sdo_sb/targetcontent/MQT/index.html
It does use DSP/BIOS though, but the source code might answer many of your questions.
AC,
Is there a particular reason that you would like to stay away from BIOS for your multi-core application? If not, TI offers a product called 'IPC' that facilitates developing multicore BIOS applications on devices including C6472. Low-level hardware operations like interacting with Atomic Access Monitors on C6472 are abstracted away by IPC modules such as GateMP, which is used for protection of shared resources including shared memory. Other functionality that is offered includes inter-processor notifications, multicore heaps and data structures.
Regarding your questions, I'll have to get back to you regarding questions #1 & 2.
Regarding question #3, yes you are right. The user only interacts with the atomic access monitor. When using IPC, the user interacts with the GateMP module which itself interacts with AAM's at a lower level.
Regarding question #4, the location of the shared memory itself determines the mapping to hardware.
Regarding question #5, no--operations on volatile variables aren't automatically made atomic between multiple processors. You would have to protect these operations using atomic access monitors (or GateMP if you are using IPC).
Regarding question #6--IPC ships with a couple multi-core applications that can be built for C6472. However, these applications do use BIOS.
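To illustrate the answer to question #5, here is a small host-side sketch (not C6472 code) of why a plain volatile read-modify-write is not atomic. The variable `shared_counter` and the function `lost_update_demo` are hypothetical names, and the two "cores" are simulated by writing out the worst-case interleaving by hand in a single thread.

```c
/* Stand-in for a word in shared memory. On the device this would be a
   fixed SL2 address such as the 0x00200000 used in question #5. */
static volatile unsigned int shared_counter = 0;

/* Simulate two cores that each execute "counter = counter + 1" with the
   worst-case interleaving: both load before either one stores.
   Returns the final value of the counter. */
unsigned int lost_update_demo(void)
{
    unsigned int core0_tmp;
    unsigned int core1_tmp;

    shared_counter = 0;
    core0_tmp = shared_counter;      /* core 0 loads 0 */
    core1_tmp = shared_counter;      /* core 1 loads 0 before core 0 stores */
    shared_counter = core0_tmp + 1;  /* core 0 stores 1 */
    shared_counter = core1_tmp + 1;  /* core 1 also stores 1: one update is lost */
    return shared_counter;
}
```

Two increments were attempted but the counter ends at 1. `volatile` only forces each load and store to happen; it does not stop another core from slipping in between them.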
Shreyas
Hello Shreyas,
Do you have any particular reason to recommend IPC over SMMQT? SMMQT also uses the AAMs for intercore arbitration, and provides simplified API calls via the BIOS MSGQ module.
I would be greatly interested to know about any differentiation between the two choices, as I have already some progress with SMMQT.
Viswa.
Thanks a lot Viswanath L and Shreyas Prasad for your prompt and precise replies. I am going through the SMMQT examples to decipher them. As Shreyas has asked, the reasons I am interested in building the application without DSP/BIOS are:
1. I want to learn and observe the actions going on in the register level. I know sometimes that sounds a bit impractical and time-inefficient, notwithstanding. Actually previously I stumbled in the same way for firing an ISR in a single core without using DSP BIOS. But later I managed to do that by mixing some C and assembly codes. I am also able to generate interprocessor interrupt from one core and service that from other core without BIOS using IPCGR registers and the corresponding event numbers. So now I am targeting the shared memory access.
2. I want the whole code to be visible to me (as much as possible). So I am trying to avoid API-based abstractions as much as possible.
3. I want to avoid any overhead in the application code. Although DSP/BIOS is said to add only a little overhead, if the application is manageable without BIOS then that is fine.
Thanks once again for the replies. Regards,
Viswanath, the main difference between IPC and SMMQT is that SMMQT works with BIOS 5.x and IPC works with BIOS 6.x. Also, SMMQT has limited functionality and device support compared to IPC. It does offer message passing via MSGQ and it ships with code to use Atomic Access Monitors on C6472. IPC is also more portable between multiple devices since hardware details (e.g. atomic access monitors, hardware semaphores, etc.) are abstracted away in top-level modules.
AC, I understand your motivations to avoid BIOS and operate at the register level. FYI (regarding point #2), BIOS6 and IPC are both open source and ship with the source code as well.
Thanks Shreyas for the information.
1. Can you please tell me the path where the open source libraries are there for BIOS or IPC?
2. What is the concept of memory bank? Is it there inside the Shared Memory Controller only(seems like that from the indication of SMC boundary in figure 2 of SPRUEG5C)? Or the whole shared L2 RAM (768KB) is divided into 4 physically separate address spaces called banks in case of C6472(seems like that from sect. 4.2 of SPRUEG5C which says: "SMC divides SL2 RAMs address space into 4 physical pages.")? What is the meaning of "256 bits wide memory bank" mentioned in the SMC controller user guide(figure 2, SPRUEG5C)?
3. If the memory banks are really 4 segments of the physical memory, then
(i) What are the address boundaries? Are they (bank-0, bank-1, bank-2, bank-3) equally spaced within the total 0xBFFFF locations of the SL2 RAM of the C6472, or something else?
(ii) Can four different cores read/write different SL2 RAM locations in/through four different banks at the same time(assuming there is no previous request pending)? Or the arbitration logic will come into picture to resolve the conflict and sequentially arrange the requests and give the cores a feeling that the read/write are done simultaneously? Assuming the accesses are not made atomic because they are accessing different locations.
You can download SYS/BIOS at http://software-dl.ti.com/dsps/dsps_public_sw/sdo_sb/targetcontent/sysbios/index.html and IPC at http://software-dl.ti.com/dsps/dsps_public_sw/sdo_sb/targetcontent/ipc/index.html. Note that you will also need XDCTools since both BIOS and IPC depend on this product. XDCTools can be downloaded at http://software-dl.ti.com/dsps/dsps_public_sw/sdo_sb/targetcontent/rtsc/index.html.
All 3 products contain both built libraries and source code.
I'm not familiar enough with the memory architecture of C6472 to answer your remaining questions. I will forward this post to someone more knowledgeable about this topic.
I forwarded your remaining questions and obtained the following details regarding the C6472 SMC:
Hi AC.
I have EXACTLY the same issues as you with the understanding of the SMC. I have been studying the SMC (SPRUEG5C) as well as the CSL user's guide with slow progress towards understanding how to use the SMC and ensure atomicity. There is clearly also a shortage in proper simple example projects that work 'out-of-the-box'.
My reason for avoiding SYS/BIOS is that my multi-core application uses the SRIO peripheral in DirectIO mode, and this mode is not supported under SYS/BIOS. Only message-passing mode is supported under SYS/BIOS. There are MANY other perfectly legitimate and sensible reasons to avoid an OS. Generally you can get much better optimization and performance out of an application by programming it in bare-board (i.e. NO SYS/BIOS), especially if that application involves the execution of repetitive single tasks, regardless of whether it is multi-core or not. As soon as your application becomes more multi-task oriented, it is strongly advised to move to an OS (like SYS/BIOS) of some sort.
I have a bare-board (i.e. NO SYS/BIOS) multi-core application in which all the cores try to access (i.e. read AND write) a common integer variable in SL2 RAM. This variable is used as a semaphore between the cores to arbitrate access to other resources, and simply has the value 1 for a 'busy' condition, and 0 for 'non-busy'. This variable has the simple purpose of indicating to the cores that certain resources are blocked from being accessed, because another core is busy working on them. However, if atomic access to this variable is not guaranteed, it could obviously happen that two cores simultaneously read the value as 0 (non-busy), then simultaneously assert the variable to 1 (busy), and then simultaneously access the other resources, thus defeating the purpose of atomicity to the resources. According to the specifications of the SL2 controller, it supports atomic access monitoring, but I have had no success in understanding how this is accomplished.
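The requirement described above is that "check for 0" and "write 1" must happen as one indivisible step. As a rough host-side sketch only, the snippet below uses C11 `<stdatomic.h>` as a stand-in for whatever the device provides; it is an assumption that a C11 host compiler is available, this is not C6472 code, and `busy_flag`, `try_acquire` and `release` are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical busy flag: 0 = non-busy, 1 = busy (values taken from the post).
   A static atomic object starts out as 0. */
static atomic_uint busy_flag;

/* Try to claim the protected resources. The compare-and-exchange tests for 0
   and writes 1 as a single indivisible operation, so two callers can never
   both observe "non-busy" and both proceed. Returns true on success. */
bool try_acquire(void)
{
    unsigned int expected = 0u;
    return atomic_compare_exchange_strong(&busy_flag, &expected, 1u);
}

/* Mark the resources as non-busy again. Only the current owner should call this. */
void release(void)
{
    atomic_store(&busy_flag, 0u);
}
```

A core would spin or back off while `try_acquire()` returns false, use the resources once it returns true, and then call `release()`.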
My question is the following:
* Is the atomic access to SL2 RAM supposed to be transparent to the end-user, or does it have to be controlled manually?
In my opinion it should be transparent.
I have been successful in developing this code for the C6474 by using the on-chip SEMAPHORE module, but now I want to migrate the code to the C6472 for performance comparison purposes.
Have you had any success in the mean-time? Do you have any advice on how this (seemingly simple) task mentioned above can be accomplished/guaranteed? Maybe a different approach? Your help would be greatly appreciated.
Regards.
Estian.
Hi,
Atomic access to the SL2 RAM is supported via the instruction set of the device. You'll find an explanation and examples in the TMS320C64x/C64x+ DSP CPU and Instruction Set Reference Guide (SPRU732) in the chapter "C64x+ CPU Atomic Operations".
We offer 3 instructions that work together with shared L2 memory on the TMS320C6472:
LL — Load Linked Word from Memory
SL — Store Linked Word to Buffer
CMTL — Commit Store Linked Word to Memory Conditionally
How does it work?
The LL instruction reads a word of memory and prepares to execute an SL instruction. It reads the word from memory with a side effect: a link valid flag is set true and the address is monitored. If any other process stores to that address, the link valid flag is cleared. The link valid flag is also cleared if the SL instruction is executed with a different address. The SL instruction buffers a word to be stored to memory by the CMTL instruction. It does not commit the change. Finally, the CMTL instruction reads the value of the link valid flag. If the link valid flag is true, the data buffered by the SL instruction is written to memory. If the commit fails, the update must be retried.
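The sequence described above can be pictured with a small software model in plain C. This is only a toy that runs on a host; it is not the real instructions and not TI code. All names (`monitor_t`, `model_ll`, `model_sl`, `model_cmtl`, `model_other_store`, `model_atomic_increment`) are hypothetical, and clearing the flag after a commit attempt is an assumption of the model rather than something stated in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy per-core monitor state (hypothetical, for illustration only). */
typedef struct {
    uint32_t *linked_addr;  /* address monitored since the last LL */
    uint32_t  buffered;     /* word buffered by SL, not yet in memory */
    bool      link_valid;   /* the "link valid" flag */
} monitor_t;

/* LL: read the word, set the link valid flag, start monitoring the address. */
uint32_t model_ll(monitor_t *m, uint32_t *addr)
{
    m->linked_addr = addr;
    m->link_valid = true;
    return *addr;
}

/* SL: buffer the word; a different address than the LL clears the flag. */
void model_sl(monitor_t *m, uint32_t *addr, uint32_t value)
{
    if (addr != m->linked_addr) {
        m->link_valid = false;
    }
    m->buffered = value;
}

/* CMTL: write the buffered word only if the link is still valid.
   Model assumption: a link is consumed by one commit attempt. */
bool model_cmtl(monitor_t *m)
{
    bool ok = m->link_valid;
    if (ok) {
        *m->linked_addr = m->buffered;
    }
    m->link_valid = false;
    return ok;
}

/* A store by some other core: it reaches memory and breaks a matching link. */
void model_other_store(monitor_t *m, uint32_t *addr, uint32_t value)
{
    *addr = value;
    if (addr == m->linked_addr) {
        m->link_valid = false;
    }
}

/* Retry loop: repeat LL / SL / CMTL until the commit succeeds. */
uint32_t model_atomic_increment(monitor_t *m, uint32_t *addr)
{
    uint32_t old;
    do {
        old = model_ll(m, addr);
        model_sl(m, addr, old + 1u);
    } while (!model_cmtl(m));
    return old + 1u;
}
```

In the model, a store from another core between `model_ll` and `model_cmtl` makes the commit fail and leaves the other core's value in memory, which is the case the retry loop exists for.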
I hope that helps.
Kind regards,
one and zero
As per my understanding (developed through reading and the discussion in this thread), the user 'talks' to the atomic access monitor, and the atomic access monitor controls the arbitration logic hardware. The user cannot directly control the arbitration logic hardware without using the atomic access monitor. In my application more than one core was trying to read a location simultaneously, but none of them was writing, so I could bypass the requirement of atomicity. I used the shared memory location as a simple memory-mapped register and that worked. Unless told otherwise, the code generation tools will not generate LL, SL, CMTL instructions (in my case I did not want atomicity, so the SL2 accesses used simple load/store instructions). Also, the post by Shreyas Prasad helped me understand the interleaved memory structure, which is helpful for a VLIW architecture. For a large chunk of data (~2000) I measured the time required for different cores to write this chunk to different non-intersecting regions in SL2. The overall time is almost the same when 1, 2, 3 or 4 cores write simultaneously. For more cores the time increases. This reconfirms the 4-bank structure.
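One possible way to give each core its own non-intersecting SL2 region for such an experiment is sketched below. The base address is the 0x00200000 used earlier in this thread and the 768 KB size is the figure quoted from SPRUEG5C; the helper names, the word-sized elements and the idea of passing the core number as a parameter are assumptions made so the sketch can run on a host.

```c
#include <stdbool.h>
#include <stdint.h>

#define SL2_BASE   0x00200000u      /* address used earlier in the thread */
#define SL2_SIZE   (768u * 1024u)   /* 768 KB shared L2, as quoted above */
#define NUM_CORES  6u               /* the C6472 has six cores */
#define WORD_BYTES 4u               /* assumption: the chunk is made of 32-bit words */

/* Start address of the region owned by core_id when every core owns
   words_per_core consecutive words. On the device the core number would
   typically be read from the DNUM register; here it is a plain parameter. */
uint32_t region_address(uint32_t core_id, uint32_t words_per_core)
{
    return SL2_BASE + core_id * words_per_core * WORD_BYTES;
}

/* True if the regions of all cores fit inside the shared L2 RAM. */
bool regions_fit(uint32_t words_per_core)
{
    return NUM_CORES * words_per_core * WORD_BYTES <= SL2_SIZE;
}
```

With 2000 words per core, consecutive regions start 8000 bytes apart, so no two cores touch the same word and no atomic access is needed for the writes themselves.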
But as per my understanding, and in line with what one and zero said, if DSP/BIOS is to be avoided then the only way to ensure atomicity is to use LL, SL, CMTL. The next question is whether to embed these assembly instructions inside the C code using 'asm'. I was advised in the forum not to do so, because doing that needs other registers as well, and I don't know whether the compiled C code is already using those registers. That raises the question of pushing and popping them on the stack, or else knowing how the code generation tools handle the registers. Things get complicated. So I think writing functions using these assembly instructions and calling them from C code to maintain atomicity is the better option. In that case there are also some restrictions on register usage (they can be found in the 'optimizing compiler' doc), but I think (not tried) that will be the simpler way.
Hi AC,
you're absolutely right. You shouldn't use asm() in your C code. What you can do is copy the examples into an .asm file. You can then call the assembler functions from C.
For more info on how to mix C and assembly please have a look in TMS320C6000 Optimizing Compiler (SPRU187), Chapter 7.5
Hi One and Zero / AC,
Thank you for your prompt replies and advice!
Wow! I really can't believe that things can be so complicated for something so simple. Atomicity is certainly much simpler with a hardware semaphore like the C6474...
Have the TI developers considered developing a CSL API that would make this process a little simpler, and perhaps include in a new release of the CSL?
Ok, so before I spend a few weeks attempting to get this working the way you suggest, I would like to know:
Hi.
Ok, so based on the examples in the documentation you suggested, I have attempted a very simple approach and it seems I am stumbling at the very first hurdle. Here is the simple C code for my main routine:
#include <stdio.h>
extern void asmfunc(void);
void main(void)
{
    asmfunc();
}
And here is the simple assembly function (.asm file) I created, and simply included into my CCS 4.0 project:
.global _asmfunc
_asmfunc:
NOP 4
I am getting the following compiler error:
"../asmfunc.asm", ERROR! at line 1: [E0002] Illegal mnemonic specified
What am I doing wrong? I am also unsure which Build Options settings I have to fiddle with in CCS 4.0. There are so many parameters, it's making my head spin... :)
Please help!
Hi Estian,
please try like that:
        .global _asmfunc
_asmfunc:
        NOP 4
Kind regards,
one and zero