
Multi-core programming in C6472 using shared memory

Hello,

I want to build a multicore application on the C6472 using the shared memory. I am using CCS v4 as the compiler. I have read the document SPRUEG5C, and the arbitration logic seems quite complex to me. I have some questions about it (sorry if they appear silly).

1. Does using optimization level 3 in CCS v4 imply that shared memory access is always configured as prefetchable? In that case I think I cannot use the atomic access monitor at optimization level 3, because SPRUEG5C says "Atomic access should go only to non-prefetchable address spaces." Am I correct?

2. Apart from configuring the prefetchable/non-prefetchable parts of the shared memory, the power-down issues and the fault indications, do I need to use the SMC memory-mapped registers manually from my code, or are they used by the arbitration logic hardware only?

3. Reading the document for the SMC controller, it seems to me that the user 'talks' to the atomic access monitor and the atomic access monitor controls the arbitration logic hardware. The user cannot directly control the arbitration logic hardware without using the atomic access monitor. Am I right?

4. While programming, do I need to specify somehow the per-bank SMC controller through which I am trying to access the shared memory, or does the address of the shared memory location itself tell the hardware which per-bank controller the request should go through?

5. If I access a shared memory location from C code with a declaration like "#define VALUE (*((volatile unsigned int *) 0x00200000))" and then "VALUE=0x1234", will the compiled code automatically pass the write (or read) request through the atomic access monitor (by using the LL, SL, CMTL instructions in assembly), or do I have to take some other step to ensure atomic access? If so, what are those steps?

6. Can anyone please send me an example project where shared memory is used by multiple cores preferably without using DSP-BIOS?

 

Regards,

AC.

 

 

19 Replies

  • In reply to one and zero:

    Hi,

    As per my understanding (developed by reading and by the discussion in this thread), the user 'talks' to the atomic access monitor, and the atomic access monitor controls the arbitration logic hardware. The user cannot directly control the arbitration logic hardware without using the atomic access monitor. In my application more than one core was trying to read a location simultaneously, but there were no write attempts, so I could bypass the requirement of atomicity. I used the shared memory location as a simple memory-mapped register and that worked. Unless told otherwise, the code generation tools will not generate LL, SL, CMTL instructions (in my case I did not want atomicity, so the SL2 accesses used simple load/store instructions).

    Also, the post by Shreyas Prasad helped me understand the interleaved memory structure, which is helpful for a VLIW architecture. For a large chunk of data (~2000) I measured the time required for different cores to write this chunk to different non-intersecting regions in SL2. The overall time is almost the same when 1, 2, 3 or 4 cores write simultaneously. For more cores the time increases. This reconfirms the 4-bank structure.
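For readers who want to see the plain load/store style of access described above, here is a minimal sketch. The names `sl2_standin`, `SHARED_WORD`, `shared_write` and `shared_read` are made up for illustration, and the array is only a host-side stand-in so the code can run on a PC; on the C6472 the base would instead be a fixed SL2 address taken from your linker command file (an assumption, not something confirmed in this thread).

```c
#include <stdint.h>

/* Host stand-in for a few words of SL2 (hypothetical). On the target the
   base address would come from the linker command file instead. */
volatile uint32_t sl2_standin[4];

/* Access word `idx` relative to `base` like a memory-mapped register. */
#define SHARED_WORD(base, idx) (((volatile uint32_t *)(base))[(idx)])

/* Plain store. `volatile` stops the compiler from caching or dropping the
   access, but it does NOT make a read-modify-write atomic across cores. */
void shared_write(volatile uint32_t *base, unsigned idx, uint32_t value)
{
    SHARED_WORD(base, idx) = value;
}

/* Plain load of one shared word. */
uint32_t shared_read(volatile uint32_t *base, unsigned idx)
{
    return SHARED_WORD(base, idx);
}
```

This is enough when only one core writes a location and the others only read it. As soon as two cores may modify the same word, something stronger than `volatile` is needed.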

    But as per my understanding, harping on the same string as one and zero, if DSP/BIOS is to be avoided then the only way to ensure atomicity is to use LL, SL, CMTL. The next question is whether to embed these assembly instructions inside the C code using 'asm'. I was advised in the forum not to do so, because the instructions need other registers as well, and I don't know whether the compiled C code is already using those registers. That raises the question of pushing and popping them on the stack, or else knowing exactly how the code generation tools handle the registers, and things get complicated. So I think writing functions with these assembly instructions and calling them from C code to maintain atomicity is the better option. There are still some restrictions on register usage in that case (they can be found in the 'optimizing compiler' document), but I think (not tried) that will be the simpler way.
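A rough sketch of how the C side of such a split could look: a small try-lock/unlock pair that the rest of the C code calls, with the atomic primitive kept behind a function boundary. All names here (`sl2_try_lock`, `sl2_unlock`, `locked_increment`) are hypothetical. On the C6472 the body of `sl2_try_lock` is the part that would live in a .asm file and use LL/SL/CMTL; in this sketch it is modelled with a C11 `atomic_flag` purely so the example can run on a host compiler.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical primitive. Host model only: on the target this would be an
   assembly routine built on LL/SL/CMTL, not C11 atomics. */
bool sl2_try_lock(atomic_flag *lock)
{
    /* test_and_set returns the previous state: false means we got the lock */
    return !atomic_flag_test_and_set_explicit(lock, memory_order_acquire);
}

void sl2_unlock(atomic_flag *lock)
{
    atomic_flag_clear_explicit(lock, memory_order_release);
}

/* Example critical section: bump a shared counter while holding the lock. */
int locked_increment(atomic_flag *lock, int *counter)
{
    while (!sl2_try_lock(lock)) {
        /* spin until the current owner releases the lock */
    }
    int value = ++*counter;
    sl2_unlock(lock);
    return value;
}
```

Keeping the primitive behind a normal function call means the C compiler's register conventions apply at the call boundary, which is the reason given above for preferring a separate assembly function over inline 'asm'.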

    Regards,

    AC.

  • In reply to AC53351:

    Hi AC,

    you're absolutely right. You shouldn't use asm() in your C code. What you can do is copy the examples into an .asm file. You can then call the assembly functions from C.

    For more info on how to mix C and assembly please have a look in TMS320C6000 Optimizing Compiler (SPRU187), Chapter 7.5

    Kind regards,

    one and zero

  • In reply to one and zero:

    Hi One and Zero / AC,

    Thank you for your prompt replies and advice!

    Wow! I really can't believe that things can be so complicated for something so simple. Atomicity is certainly much simpler with a hardware semaphore like the C6474...

    Have the TI developers considered developing a CSL API that would make this process a little simpler, and perhaps include in a new release of the CSL?

     

    Ok, so before I spend a few weeks attempting to get this working the way you suggest, I would like to know:

    1. Is there perhaps another way of developing a semaphore that guarantees atomicity towards common resources for the C6472? I'll have you know that I have studied all the available documents on the TI website that address this, but to me, none of them seem as sound and reliable as a hardware solution.
    2. Which 'examples' are you referring to before, and where exactly can I get hold of them?

    Regards.

    Estian.

     

  • In reply to Estian Malan:

    Hi.

     

    Ok, so based on the examples in the documentation you suggested, I have attempted a very simple approach and it seems I am stumbling at the very first hurdle. Here is the simple C code for my main routine:

     

    #include <stdio.h>

    extern void asmfunc(void);   /* implemented in asmfunc.asm */

    void main(void)
    {
        asmfunc();
    }

     

    And here is the simple assembly function (.asm file) I created, and simply included into my CCS 4.0 project:

     

    .global _asmfunc

    _asmfunc:

    NOP 4

     

    I am getting the following compiler error:

     

    "../asmfunc.asm", ERROR!   at line 1: [E0002] Illegal mnemonic specified

    .global _asmfunc


     

    What am I doing wrong? I am also unsure as to which Build Options settings I have to fiddle with in CCS 4.0. There are so many parameters, it's making my head spin... :)

     

    Please help!

    Estian.

     

     

     

  • In reply to Estian Malan:

    Hi Estian,

    please try like that:

         .global _asmfunc

    _asmfunc:

    NOP 4

     

    Kind regards,

    one and zero

  • In reply to one and zero:

    Hi one and zero.

    I cannot see the difference between what you suggested and my previous post, but your code seems to work!

    I copied your code directly into my .asm file and recompiled.

    What is the difference? The spacing?

    Estian.

  • In reply to Estian Malan:

    Hi Estian,

    Yes, it's the spacing. Only labels can start in the first column; directives and instructions must be indented.

    Kind regards,

    one and zero

  • In reply to one and zero:

    Hi.

    Ok, so I'm officially lost again.

    I have managed to replicate the simple examples of SPRU187Q (Ch. 7) with success.

    However, I am unable to successfully implement the shared accumulator SL2 atomic operations of SPRU732H, (Ch 9, sec. 9.3.2).

    Once again, here is my C code with the main routine:

     

    #include <stdio.h>

    extern void asmfunc(int *);   /* implemented in asmfunc.asm */

    #pragma DATA_SECTION(gvar, ".SL2");  // SL2 in the linker command file starts at 0x10200000
    int gvar = 0;

    void main(void)
    {
        asmfunc(&gvar);
    }

     

    And here is the .asm file:

     

    .global _asmfunc

    _asmfunc:

    LL *A8, A6 ;load linked (lock) and store in A6

    NOP 4

    ADD A6,1,A6 ;add one to A6 and store back into A6

    SL A6, *A8 ;new value to store back

    CMTL *A8, A1 ;commit the store

    NOP 4

    [!A1] B _asmfunc ;if commit failed, try again

     

    I'm getting the following compiler errors:

     

    "../asmfunc.asm", ERROR!   at line 4: [E0004] Memory operand must be B side register LL *A8, A6 ;load linked (lock) and store in A6

    "../asmfunc.asm", ERROR!   at line 7: [E0004] Memory operand must be B side register SL A6, *A8 ;new value to store back

    "../asmfunc.asm", ERROR!   at line 8: [E0004] Memory operand must be B side register CMTL *A8, A1 ;commit the store

     

    I naively attempted to change all the A registers to B registers, which seems to compile. However, the function simply hangs during execution, and according to Table 7-2 (SPRU187Q) these changes do not make sense in any case.

    Please help and explain!

    Estian.

  • In reply to Estian Malan:

    Estian,

    You need to be very careful when using registers in assembly called from C.  When you call asmfunc and pass the address of gvar, the compiler and assembler have an agreement as to which register the address of gvar is going to be stored in.  You can't just switch all of the registers to B registers and hope for this to work, because the compiler is still going to store the gvar address in the same location.  From the looks of this code, the assembly routine expects the address of gvar to be in register A8.  Whether or not this is the correct register, I am not sure.  Check the Compiler Guide for the parameter passing conventions, or debug this on a simulator or hardware and look at the assembly instructions generated by the compiler in main just before calling asmfunc to see where the address of gvar is being stored.

    Regarding the errors that you are getting from the assembler, you also need to be familiar with the instructions that you are using.  See the C64x+ CPU Instruction Set User's Guide for details on the available instructions for the 6472 (I believe you referenced this document).  When you look at the LL instruction, it specifies that it needs to use the .D2 unit.  This unit gets its operand from the B register file.  So, one of the operands must be in a B register.  I _think_ it's the pointer value, but I'm not 100% sure.  Again, it never hurts to try this, compile it, and then step through and debug it to see if the registers get updated as expected.
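When stepping through in the debugger, it may help to have a reference for what the routine is supposed to do to gvar. The sketch below is a host-side model only: it uses a C11 compare-exchange loop as a stand-in for the LL / SL / CMTL sequence, which is a different mechanism from the C64x+ atomic access monitor. The function name `model_atomic_increment` is made up for illustration. As far as I recall from the compiler guide, the first pointer argument arrives in A4 and the return address in B3, but please verify that against SPRU187 rather than taking it from here.

```c
#include <stdatomic.h>

/* Host-side model of the intended behaviour of asmfunc(&gvar):
   read the value, add one, try to commit, retry if someone else wrote
   in between. Returns the value that was committed. */
int model_atomic_increment(atomic_int *p)
{
    int old = atomic_load(p);   /* plays the role of LL */

    /* compare_exchange plays the role of SL + CMTL: it stores old + 1 only
       if *p still equals old. On failure it refreshes `old` with the
       current value and the loop tries again. */
    while (!atomic_compare_exchange_weak(p, &old, old + 1)) {
        /* nothing to do here: `old` was updated, old + 1 is recomputed */
    }
    return old + 1;
}
```

With a single core running and gvar starting at 0, one successful call should leave gvar at 1. If the real assembly routine never gets there, the pointer register or the missing return branch are the first things worth checking.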

    Regards,
    Dan