Running an OpenMP application on the TMDXEVM6678L with Desktop-linux-sdk

Hi all,

   I'm using a TMDXEVM6678L board for a project. Previously I connected the board to the PC via JTAG and wrote applications in CCS with the MCSDK. To run an OpenMP application, I had to specify in the board's configuration file how many cores to use and group those cores together before loading the .out file onto them. That is what I did, and it worked.

  Now the board is connected to the PC via PCIe and the application is developed under Desktop-linux-sdk, so I would like to load the application onto the DSP through PCIe as well. (I have already asked how to run an application with a PCIe connection and Desktop-linux-sdk, but I have not figured it out yet: http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/p/331999/1158899.aspx#1158899 ). Supposing I can run one application on one core, what about an OpenMP application in which I would like to use multi-threading across multiple cores? Is there a way to do this, and if so, how?

  Thank you very much for any help!

  Regards,

  Jie

  • Does this post help you at all?

    http://e2e.ti.com/support/development_tools/compiler/f/343/p/297898/1039270.aspx#1039270

     

    Are you able to load a program to your DSP via PCIe? On the 8681e board that I have, Advantech has provided software to do this (based on TI software).  I use the Linux driver (v08) from here:

    http://support.advantech.com.tw/Support/SearchResult.aspx?keyword=DSPC-8681&searchtabs=BIOS,Certificate,Datasheet,Driver,Firmware,Manual,Online%20Training,Specification,Utility,FAQ,Installation,Software%20API,Software%20API%20Manual&select_tab=Driver

  • Chris Jagielski said:

    Does this post help you at all?

    http://e2e.ti.com/support/development_tools/compiler/f/343/p/297898/1039270.aspx#1039270

     

    Are you able to load a program to your DSP via PCIe? On the 8681e board that I have, Advantech has provided software to do this (based on TI software).  I use the Linux driver (v08) from here:

    http://support.advantech.com.tw/Support/SearchResult.aspx?keyword=DSPC-8681&searchtabs=BIOS,Certificate,Datasheet,Driver,Firmware,Manual,Online%20Training,Specification,Utility,FAQ,Installation,Software%20API,Software%20API%20Manual&select_tab=Driver

    Yes, I think the OpenMP post can help.

    I don't think the EVM6678 has a driver like the one for the 8681e. So by using the Linux driver, you can load any applications to DSP on top of a SYS/BIOS?

    Jie

  • Jie Liao- said:

    So by using the Linux driver, you can load any applications to DSP on top of a SYS/BIOS?

    No, it is not loaded on top of SYS/BIOS.  What is loaded is an application that also has SYS/BIOS with it.  Are you familiar with the OS?  It's not quite like how the Linux kernel works.  I suggest a little reading:

    • http://www.ti.com/tool/sysbios
    • http://www.ti.com/lit/pdf/spruhd4

    These videos are great!

    • http://processors.wiki.ti.com/index.php/TI-RTOS_Workshop#Intro_to_TI-RTOS_Kernel_Workshop_Online_Video_Tutorials
    • http://processors.wiki.ti.com/index.php/C6000_Embedded_Design_Workshop#C6000_Embedded_Design_Workshop_Videos_-_NOW_AVAILABLE
  • Chris Jagielski said:

    So by using the Linux driver, you can load any applications to DSP on top of a SYS/BIOS?

    No, it is not loaded on top of SYS/BIOS.  What is loaded is an application that also has SYS/BIOS with it. 

    OK, I think I know what you mean. I have some experience with FreeRTOS on a Cortex-M3, which I think is similar to SYS/BIOS. Even if the application needs to be integrated with SYS/BIOS, can you create a task in SYS/BIOS, of which the job is to 1) wait for a message from the host, 2) get to the memory to run the application which has been loaded to DSP memory according to address specified by the message, 3) store the results of the application and write back a message to host with the address of the results, 4) resume back to wait for a message from host?

    Are you familiar with the OS?  It's not quite like how the Linux kernel works.  I suggest a little reading:

    • http://www.ti.com/tool/sysbios
    • http://www.ti.com/lit/pdf/spruhd4

    No, SYS/BIOS is not like Linux. These are great reading resources.

  • Jie Liao- said:

    can you create a task in SYS/BIOS, of which the job is to 1) wait for a message from the host, 2) get to the memory to run the application which has been loaded to DSP memory according to address specified by the message, 3) store the results of the application and write back a message to host with the address of the results, 4) resume back to wait for a message from host?

    I'm not sure about #2.  Why would you be interested in doing this, do you want to have the option to run multiple "applications" in your DSP?

    The way that I run multiple applications/algorithms in my DSP is this.  My DDR starts at 0x80000000, so I have my Linux PC send a flag to the DSP via PCIE at that address.  Depending on the flag, the DSP will run the corresponding algorithm.  If the flag is not set, then the DSP will just poll forever.

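    #include <xdc/std.h>            /* Void, Int, Char, UArg */
    #include <ti/sysbios/BIOS.h>    /* BIOS_start() */
    
    /* Flag word at the start of DDR, written by the Linux PC over PCIe */
    #define HOST_FLAG (*(volatile int *)0x80000000)
    
    Void my_function1()
    {
       //do something 1
    }
    
    Void my_function2()
    {
       //do something 2
    }
    
    /* Task body; the Task object is assumed to be created statically in the .cfg file */
    Void my_SysBios_task(UArg arg0, UArg arg1)
    {
       while(1)
       {
          int flag = HOST_FLAG;   //read the flag once per pass
    
          if (flag == 0xface)
             my_function1();
          else if (flag == 0xcade)
             my_function2();
          //any other value: keep polling
       }
    }
    
    Int main(Int argc, Char* argv[])
    {
       BIOS_start();   //starts the scheduler and does not return
       return (0);     //never gets here
    }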
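
    Building on the flag idea above, here is a minimal, host-testable sketch of the four-step loop asked about earlier in the thread (wait for a command, run it, store the result, go back to waiting). The `mailbox_t` layout, the `STATUS_*` codes, and the `algo1`/`algo2` stubs are assumptions made purely for illustration; they do not come from TI or Advantech software. On the DSP the struct would be placed at the shared DDR address. The sketch also assumes that the region is non-cached or that the cache is invalidated before each poll, which is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mailbox shared between host and DSP (layout is an assumption). */
typedef struct {
    volatile uint32_t command;  /* written by the host, cleared by the DSP */
    volatile uint32_t status;   /* written by the DSP, one of STATUS_* */
    volatile uint32_t result;   /* written by the DSP before status is updated */
} mailbox_t;

enum { CMD_NONE = 0, CMD_ALGO1 = 0xface, CMD_ALGO2 = 0xcade };
enum { STATUS_IDLE = 0, STATUS_DONE = 1, STATUS_UNKNOWN_CMD = 2 };

/* Placeholder algorithms; real code would do the actual processing. */
static uint32_t algo1(void) { return 1u; }
static uint32_t algo2(void) { return 2u; }

/* One polling step. Returns 1 if a command was consumed, 0 if none was pending. */
int mailbox_poll_once(mailbox_t *mb)
{
    uint32_t cmd;

    assert(mb != NULL);
    cmd = mb->command;              /* read the command word exactly once */
    if (cmd == CMD_NONE) {
        return 0;                   /* nothing to do, caller keeps polling */
    }

    switch (cmd) {
    case CMD_ALGO1:
        mb->result = algo1();
        mb->status = STATUS_DONE;
        break;
    case CMD_ALGO2:
        mb->result = algo2();
        mb->status = STATUS_DONE;
        break;
    default:
        mb->status = STATUS_UNKNOWN_CMD;
        break;
    }

    mb->command = CMD_NONE;         /* cleared last: tells the host we are finished */
    return 1;
}
```

    On a PC this can be exercised by filling in a local `mailbox_t`, setting `command` to `CMD_ALGO1`, and calling `mailbox_poll_once()`. On the DSP it would be called from the task's `while(1)` loop. Clearing `command` last means the host can treat a zero command word as "status and result are ready", so the same algorithm is not run again on the next pass.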

  • Chris Jagielski said:

    can you create a task in SYS/BIOS, of which the job is to 1) wait for a message from the host, 2) get to the memory to run the application which has been loaded to DSP memory according to address specified by the message, 3) store the results of the application and write back a message to host with the address of the results, 4) resume back to wait for a message from host?

    I'm not sure about #2.  Why would you be interested in doing this, do you want to have the option to run multiple "applications" in your DSP?

    Yes, exactly. I'd like to run multiple applications, but I don't want the applications to be built into the DSP program. I want to make the DSP a general-purpose compute endpoint with SYS/BIOS running on it. You have reminded me to check whether SYS/BIOS can do this the way Linux can. I'm not sure, so I'll need to look into it.

    What is the flag? Sorry I didn't get it from the code. So from the above basically one dsp core could choose what to run according to the message from the host PC? and all the algorithms are implemented on DSP side right?  If so, I think this method would be one of my options too.

  • The flag for algorithm #1 is "0xface", and the flag for #2 is "0xcade".

    This code, in the DSP, will read in what's located at memory location 0x80000000 and store it to a variable "test_var".

    volatile int test_var = *(volatile int *)0x80000000;

    This code will write the integer 12 to that memory location:

    *(volatile int *)0x80000000 = 12;
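
    The two one-liners above can be wrapped in small helpers, sketched here under a few assumptions. The helper names `flag_read` and `flag_write` are hypothetical. The sketch uses `uint32_t` instead of `int` so that host and DSP agree on the width of the flag word, and it takes the address as a parameter so that the same code can be pointed at a local variable when testing on a PC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FLAG_ALGO1 0xfaceu   /* flag value for algorithm #1 */
#define FLAG_ALGO2 0xcadeu   /* flag value for algorithm #2 */

/* Read the flag word through a volatile pointer so the access is not optimized away. */
static inline uint32_t flag_read(const volatile uint32_t *flag)
{
    assert(flag != NULL);
    return *flag;
}

/* Write a new value to the flag word. */
static inline void flag_write(volatile uint32_t *flag, uint32_t value)
{
    assert(flag != NULL);
    *flag = value;
}

/*
 * On the DSP the pointer would be the start of DDR, for example:
 *     volatile uint32_t *flag = (volatile uint32_t *)0x80000000u;
 * That address is only valid on the target, so it is not dereferenced here.
 */
```

    Note that `volatile` only stops the compiler from caching the value in a register. It does not by itself make a data cache see writes that arrive over PCIe, so the flag region is assumed to be non-cached or explicitly invalidated.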

      

    Jie Liao- said:

    So from the above basically one dsp core could choose what to run according to the message from the host PC? and all the algorithms are implemented on DSP side right?

    This is exactly correct.  It works very well for my applications, and is so simple.

    Also - note that this method makes it easy for multiple cores to communicate too.  They can talk to each other "across" the DDR, since all 8 cores on your c6678 share the same DDR memory.

    So for my application, I actually have a few cores doing the same thing for one algorithm.  So, they all receive the same flag and run the same code, and they all set their own unique flags when they're done.  Then, another separate core can read those new flags, and run the next algorithm once the previous set of cores are finished.

    I had to use this approach because of the problems I faced getting OpenMP to play nice with the other software.
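
    The "unique done flag per core" scheme described above could look roughly like the following. This is only a sketch: the number of workers, the `FLAG_DONE` value, and all function names are assumptions, and on real hardware each flag would probably need its own cache line plus explicit cache write-back and invalidate calls, since (as far as I know) the C66x cores do not keep shared DDR coherent in hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_WORKERS 4u        /* assumed number of cores sharing one algorithm */
#define FLAG_DONE   0xd00eu   /* arbitrary "finished" marker */

/* One done flag per worker core; on the DSP this would live in shared DDR. */
typedef struct {
    volatile uint32_t done[NUM_WORKERS];
} stage_flags_t;

/* Called by a worker core when its share of the algorithm is complete. */
void worker_signal_done(stage_flags_t *f, unsigned worker)
{
    assert(f != NULL && worker < NUM_WORKERS);
    f->done[worker] = FLAG_DONE;
}

/* Called by the next-stage core: returns 1 only when every worker has finished. */
int stage_all_done(const stage_flags_t *f)
{
    unsigned i;

    assert(f != NULL);
    for (i = 0; i < NUM_WORKERS; i++) {
        if (f->done[i] != FLAG_DONE) {
            return 0;
        }
    }
    return 1;
}

/* Clear all flags before the next frame so stale values are not mistaken for new ones. */
void stage_reset(stage_flags_t *f)
{
    unsigned i;

    assert(f != NULL);
    for (i = 0; i < NUM_WORKERS; i++) {
        f->done[i] = 0u;
    }
}
```

    The next-stage core would poll `stage_all_done()` in its task loop, start its own algorithm once it returns 1, and then call `stage_reset()`. Giving each worker its own word avoids two cores ever writing the same location.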

  • Chris Jagielski said:

    This is exactly correct.  It works very well for my applications, and is so simple.

    Also - note that this method makes it easy for multiple cores to communicate too.  They can talk to each other "across" the DDR, since all 8 cores on your c6678 share the same DDR memory.

    So for my application, I actually have a few cores doing the same thing for one algorithm.  So, they all receive the same flag and run the same code, and they all set their own unique flags when they're done.  Then, another separate core can read those new flags, and run the next algorithm once the previous set of cores are finished.

    I had to use this approach because of the problems I faced getting OpenMP to play nice with the other software.

    Chris, thank you for your clear explanation. I think I got it. Your method provides a really very good reference for my project.

    I have one more question, with your method you'll need to load the specific application code to each specific core before you sending any data to run. Since you have up to 32 cores, in theory you can spread your algorithm across that many cores and get very fast computation. Will you reload some of the application code in case of that what have been on the DSP cores are not enough for your algorithm?

  • Jie Liao- said:

    with your method you'll need to load the specific application code to each specific core before you sending any data to run

    This is correct.

    Jie Liao- said:

    Will you reload some of the application code in case of that what have been on the DSP cores are not enough for your algorithm?

    For my applications, I won't be able to "re-load" the DSP cores during execution.  I'm adding all of my algorithms at once, and all that C code is stored on the stack.  So really I just have to make sure that my DSP code fits in the memory allocated for the stack, which I've chosen as L2 memory.  So my code and its small variables (and scratch space) are stored on the stack, and I put my larger variables into various heaps (MSMCSRAM, or DDR3 usually, since they have so much memory space).

    The other consideration with my 32 cores is execution speed.  Each core must finish its part of the overall algorithm in a certain amount of time, such that my high-level frame rate is preserved.  So some of my code is "spread out" over a few cores, and other code runs on just one core to compute one single algorithm "task".

    This is the fun part of development!

  • Also I should mention that TI has many ways to facilitate IPC via software in the c6678, and they will be faster and more efficient than my method.  I chose my methods mainly due to my time constraints in development.

    -->    http://processors.wiki.ti.com/index.php/IPC_Users_Guide

  • Yes, I see. Thanks a lot for your information.