Multi-core programming in C6472 using shared memory

Hello,

I want to build a multicore application on the C6472 using the shared memory. I am using CCS v4 as the compiler. I have read the document SPRUEG5C. The arbitration logic seems quite complex to me. In this regard I have some questions (I'm sorry for posting them if they appear silly).

1. Does using optimization level 3 in CCS v4 imply that shared memory access is always configured as prefetchable? If so, I think I cannot use the atomic access monitor at optimization level 3, because SPRUEG5C says "Atomic access should go only to non-prefetchable address spaces." Am I correct?

2. Other than the configuration of prefetchable/nonprefetchable part of the shared memory, the power down issues and the fault indications, do I need to use the SMC memory mapped registers manually from my code or they will be used by the arbitration logic hardware only?

3. Reading the document for SMC controller it seems to me that the user 'talks' to the atomic access monitor and the atomic access monitor controls the arbitration logic hardware. The user cannot directly control the arbitration logic hardware without using atomic access monitor. Am I right?

4. While programming do I need to specify somehow the per-bank SMC controller through which I am trying to access the shared memory or the location of the shared memory I'm trying to access itself indicates the hardware about the per-bank controller through which the request should go?

5. If I access a shared memory location from C code with a declaration like "#define VALUE (*((volatile unsigned int *) 0x00200000))" followed by "VALUE=0x1234", will the compiled code automatically pass the write (or read) request through the atomic access monitor (by using the LL, SL, and CMTL instructions in assembly), or do I have to take some other step to ensure atomic access? If so, what are those steps?

6. Can anyone please send me an example project where shared memory is used by multiple cores preferably without using DSP-BIOS?

 

Regards,

AC.

 

 

19 Replies

  • I don't know if you are already aware of this, but here is the link to SMMQT: http://software-dl.ti.com/dsps/dsps_registered_sw/sdo_sb/targetcontent/MQT/index.html

    It does use DSP/BIOS though, but the source code might answer many of your questions.

  • AC,

    Is there a particular reason that you would like to stay away from BIOS for your multi-core application?  If not, TI offers a product called 'IPC' that facilitates developing multicore BIOS applications on devices including C6472.  Low-level hardware operations like interacting with Atomic Access Monitors on C6472 are abstracted away by IPC modules such as GateMP, which is used for protection of shared resources including shared memory.  Other functionality offered includes inter-processor notifications, multicore heaps and data structures.

    Regarding your questions,  I'll have to get back to you regarding questions #1 & 2.

    Regarding question #3, yes you are right.  The user only interacts with the atomic access monitor.  When using IPC, the user interacts with the GateMP module which itself interacts with AAM's at a lower level.

    Regarding question #4, the location of the shared memory itself determines the mapping to hardware.

    Regarding question #5, no--operations on volatile variables aren't automatically made atomic between multiple processors.  You would have to protect these operations using atomic access monitors (or GateMP if you are using IPC).

    Regarding question #6--IPC ships with a couple multi-core applications that can be built for C6472.  However, these applications do use BIOS.

    Regards,

    Shreyas

  • In reply to Shreyas Prasad:

    Hello Shreyas,

    Do you have any particular reason to recommend IPC over SMMQT? SMMQT also uses the AAMs for intercore arbitration, and provides simplified API calls via the BIOS MSGQ module.

    I would be greatly interested to know about any differentiation between the two choices, as I have already some progress with SMMQT.

    Regards,

    Viswa.

  • In reply to Viswanath L:

    Thanks a lot, Viswanath L and Shreyas Prasad, for your prompt and precise replies. I am going through the SMMQT examples to decipher them. As Shreyas has asked, the reasons I am interested in building an application without DSP/BIOS are:

    1. I want to learn and observe the actions going on in the register level. I know sometimes that sounds a bit impractical and time-inefficient, notwithstanding. Actually previously I stumbled in the same way for firing an ISR in a single core without using DSP BIOS. But later I managed to do that by mixing some C and assembly codes. I am also able to generate interprocessor interrupt from one core and service that from other core without BIOS using IPCGR registers and the corresponding event numbers. So now I am targeting the shared memory access.

    2. I want the whole code to be visible to me (as much as possible). So I am trying to avoid API-based abstractions as much as possible.

    3. I want to avoid any overhead in the application code. Although DSP/BIOS is said to add only a little overhead, if the application is manageable without BIOS, then that is fine.

    Thanks once again for the replies. Regards,

    AC.

  • In reply to AC53351:

    Viswanath, the main difference between IPC and SMMQT is that SMMQT works with BIOS 5.x and IPC works with BIOS 6.x.  Also, SMMQT has limited functionality and device support compared to IPC.  It does offer message passing via MSGQ and it ships with code to use Atomic Access Monitors on C6472.  IPC is also more portable between multiple devices since hardware details (i.e. atomic access monitors, hardware semaphores, etc) are abstracted away in top-level modules.

    AC, I understand your motivations to avoid BIOS and operate at the register level. FYI (regarding point #2), BIOS6 and IPC are both open source and ship with the source code as well.

  • In reply to Shreyas Prasad:

    Thanks Shreyas for the information.

    1. Can you please tell me the path where the open source libraries are there for BIOS or IPC?

    2. What is the concept of a memory bank? Does it exist inside the Shared Memory Controller only (it seems so from the SMC boundary shown in figure 2 of SPRUEG5C)? Or is the whole shared L2 RAM (768KB) divided into 4 physically separate address spaces called banks in the case of C6472 (it seems so from sect. 4.2 of SPRUEG5C, which says: "SMC divides SL2 RAMs address space into 4 physical pages.")? What is the meaning of "256 bits wide memory bank" mentioned in the SMC controller user guide (figure 2, SPRUEG5C)?

    3. If the memory banks are really 4 segments of the physical memory, then

        (i) What are the address boundaries? Are they (bank-0, bank-1, bank-2, bank-3) equally spaced within the total 0xBFFFF locations of the SL2 RAM of C6472, or something else?

        (ii) Can four different cores read/write different SL2 RAM locations in/through four different banks at the same time (assuming there is no previous request pending)? Or will the arbitration logic come into the picture to resolve the conflict, sequentially arrange the requests, and give the cores the impression that the reads/writes are done simultaneously? I am assuming the accesses are not made atomic, because they target different locations.

     

    Regards,

    AC.

  • In reply to AC53351:

    You can download SYS/BIOS at http://software-dl.ti.com/dsps/dsps_public_sw/sdo_sb/targetcontent/sysbios/index.html and IPC at http://software-dl.ti.com/dsps/dsps_public_sw/sdo_sb/targetcontent/ipc/index.html.  Note that you will also need XDCTools since both BIOS and IPC depend on this product.  XDCTools can be downloaded at http://software-dl.ti.com/dsps/dsps_public_sw/sdo_sb/targetcontent/rtsc/index.html.

    All 3 products contain both built libraries and source code.

    I'm not familiar enough with the memory architecture of C6472 to answer your remaining questions.  I will forward this post to someone more knowledgeable about this topic.

    Regards,

    Shreyas

     

  • In reply to Shreyas Prasad:

    I forwarded your remaining questions and obtained the following details regarding the C6472 SMC:

    C6472 SMC controls 4 banks of memory.  Each bank is 256 bits wide and the banks are interleaved as follows:
    - bank 0: base address + 0:31, 128:159, ...
    - bank 1: base address + 32:63, 160:191, ...
    - bank 2: base address + 64:95, 192:223, ...
    - bank 3: base address + 96:127, 224:255, ...
     
    SMC allows 4 concurrent accesses to 4 different banks.  If there is a bank conflict, it will select one access and let the others wait.
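
    The interleaving listed above can be sketched as a small address-to-bank helper. This is only an illustration: it assumes the quoted ranges (0:31, 32:63, ...) are byte offsets from the SL2 base address, so that one 256-bit bank line is 32 bytes. The function and macro names are made up for this sketch and are not part of any TI library.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a 256-bit bank line is 32 bytes, and the ranges in the
 * post are byte offsets from the SL2 base address. */
#define SL2_BANK_LINE_BYTES 32u
#define SL2_NUM_BANKS       4u

/* Return the bank (0..3) that a byte offset from the SL2 base falls in. */
unsigned sl2_bank(uint32_t byte_offset)
{
    return (byte_offset / SL2_BANK_LINE_BYTES) % SL2_NUM_BANKS;
}

/* Nonzero if two offsets map to the same bank, i.e. simultaneous
 * accesses to them would have to be serialized by the SMC. */
int sl2_bank_conflict(uint32_t offset_a, uint32_t offset_b)
{
    return sl2_bank(offset_a) == sl2_bank(offset_b);
}

/* Self-check against the ranges listed in the post. */
void sl2_bank_selfcheck(void)
{
    assert(sl2_bank(0)   == 0 && sl2_bank(31)  == 0 && sl2_bank(128) == 0);
    assert(sl2_bank(32)  == 1 && sl2_bank(63)  == 1 && sl2_bank(160) == 1);
    assert(sl2_bank(64)  == 2 && sl2_bank(95)  == 2 && sl2_bank(192) == 2);
    assert(sl2_bank(96)  == 3 && sl2_bank(127) == 3 && sl2_bank(224) == 3);
}
```

    Under that assumption, buffers used by different cores at the same time would ideally start in different 32-byte lines so that their accesses tend to land in different banks.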


    Regards,
    Shreyas
  • In reply to Shreyas Prasad:

    Hi AC.

    I have EXACTLY the same issues as you with the understanding of the SMC. I have been studying the SMC (SPRUEG5C) as well as the CSL user's guide with slow progress towards understanding how to use the SMC and ensure atomicity. There is clearly also a shortage in proper simple example projects that work 'out-of-the-box'.

    My reason for avoiding SYS/BIOS is that my multi-core application uses the SRIO peripheral in DirectIO mode, and this mode is not supported under SYS/BIOS. Only message-passing mode is supported under SYS/BIOS. There are MANY other perfectly legitimate and sensible reasons to avoid an OS. Generally you can get much better optimization and performance out of an application by programming it in bare-board (i.e. NO SYS/BIOS), especially if that application involves the execution of repetitive single tasks, regardless of whether it is multi-core or not. As soon as your application becomes more multi-task oriented, it is strongly advised to move to an OS of some sort (like SYS/BIOS).

    I have a bare-board (i.e. NO SYS/BIOS) multi-core application in which all the cores try to access (i.e. read AND write) a common integer variable in SL2 RAM. This variable is used as a semaphore between the cores to arbitrate access to other resources, and simply has the value 1 for a 'busy' condition, and 0 for 'non-busy'. Its purpose is to indicate to the cores that certain resources are blocked from being accessed, because another core is busy working on them. However, if atomic access to this variable is not guaranteed, two cores could simultaneously read the value as 0 (non-busy), then simultaneously set the variable to 1 (busy), and then simultaneously access the other resources, thus defeating the purpose of the semaphore. According to the specifications of the SL2 controller, it supports atomic access monitoring, but I have had no success in understanding how this is accomplished.
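
    The failure case described above can be replayed in plain C. The sketch below is a single-threaded replay of the bad interleaving, not real multicore code: `busy` is a hypothetical stand-in for the shared flag in SL2 RAM, and the two "cores" are just two pairs of function calls.

```c
#include <stdint.h>

/* Hypothetical stand-in for the shared 'busy' word in SL2 RAM. */
static volatile uint32_t busy = 0;

/* Step 1 of the naive protocol: read the flag. */
uint32_t naive_read(void)
{
    return busy;
}

/* Step 2: if the value read earlier was 0, claim the resource.
 * Returns 1 if the caller now believes it owns the resource. */
int naive_claim(uint32_t seen)
{
    if (seen == 0) {
        busy = 1;
        return 1;
    }
    return 0;
}

/* Replay the bad interleaving: both cores read before either writes.
 * Returns how many cores believe they own the resource. */
int replay_race(void)
{
    busy = 0;
    uint32_t seen_a = naive_read();   /* core A reads 0 */
    uint32_t seen_b = naive_read();   /* core B reads 0 as well */
    return naive_claim(seen_a) + naive_claim(seen_b);
}
```

    With this ordering `replay_race()` returns 2: both cores claim the resource, because the read and the write are separate steps and `volatile` does nothing to join them.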

     

    My question is the following:

    * Is the atomic access to SL2 RAM supposed to be transparent to the end-user, or does it have to be controlled manually?

    In my opinion it should be transparent.

     

    I have been successful in developing this code for the C6474 by using the on-chip SEMAPHORE module, but now I want to migrate the code to the C6472 for performance comparison purposes.

    Have you had any success in the mean-time? Do you have any advice on how this (seemingly simple) task mentioned above can be accomplished/guaranteed? Maybe a different approach? Your help would be greatly appreciated.

     

    Regards.

    Estian.

     

     

  • In reply to Estian Malan:

    Hi,

    Atomic access to the SL2 RAM is supported via the instruction set of the device. You'll find an explanation and examples in the TMS320C64x/C64x+ DSP CPU and Instruction Set Reference Guide (SPRU732) in the chapter "C64x+ CPU Atomic Operations".


    We offer 3 instructions that work together with shared L2 memory on the TMS320C6472:

    • LL — Load Linked Word from Memory

    • SL — Store Linked Word to Buffer

    • CMTL — Commit Store Linked Word to Memory Conditionally

    How does it work?

    The LL instruction reads a word from memory and prepares for a following SL instruction. The read has a side effect: a link valid flag is set to true and the address is monitored. If any other process stores to that address, the link valid flag is cleared. The link valid flag is also cleared if the SL instruction is executed with a different address. The SL instruction buffers a word to be stored to memory by the CMTL instruction. It does not commit the change. Finally, the CMTL instruction reads the value of the link valid flag. If the link valid flag is true, the data buffered by the SL instruction is written to memory. If the commit fails, the update must be retried.
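
    To make the sequence concrete, here is a small software model of the link valid flag in portable C. It is not the hardware instructions and not TI code: the `link_monitor` struct and all `model_*` names are invented for this sketch, and the behaviour is only what the paragraph above describes, plus one labeled assumption.

```c
#include <stdint.h>

/* Software model of one core's link state (hypothetical, for illustration). */
typedef struct {
    uint32_t *linked_addr; /* address being monitored after LL */
    int       link_valid;  /* the "link valid flag" */
    uint32_t  buffered;    /* word buffered by SL, not yet in memory */
} link_monitor;

/* LL: read the word, set the link valid flag, monitor the address. */
uint32_t model_ll(link_monitor *m, uint32_t *addr)
{
    m->linked_addr = addr;
    m->link_valid = 1;
    return *addr;
}

/* SL: buffer the word only; a different address clears the link. */
void model_sl(link_monitor *m, uint32_t *addr, uint32_t value)
{
    if (addr != m->linked_addr)
        m->link_valid = 0;
    m->buffered = value;
}

/* CMTL: write the buffered word only if the link is still valid.
 * Returns the flag value that was read (1 = committed, 0 = retry). */
int model_cmtl(link_monitor *m)
{
    int ok = m->link_valid;
    if (ok)
        *m->linked_addr = m->buffered;
    m->link_valid = 0; /* assumption of this model; the post does not say */
    return ok;
}

/* A store by any other process to the monitored address clears the flag. */
void model_other_store(link_monitor *m, uint32_t *addr, uint32_t value)
{
    *addr = value;
    if (addr == m->linked_addr)
        m->link_valid = 0;
}

/* One attempt to take a 0/1 busy flag; returns 1 if the caller owns it. */
int model_try_lock(link_monitor *m, uint32_t *flag)
{
    if (model_ll(m, flag) != 0)
        return 0;              /* already busy */
    model_sl(m, flag, 1);
    return model_cmtl(m);      /* 0 means another store got in between */
}
```

    A caller would loop on `model_try_lock()` until it returns 1, which mirrors the "the update must be retried" rule. On the real device the three steps would be the LL, SL and CMTL instructions themselves, as documented in SPRU732.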

     

    I hope that helps.

     

    Kind regards,

    one and zero