C6run, c6runapp, c6runlib, C6xxx, multithreading support.

Other Parts Discussed in Thread: C6RUN-DSPARMTOOL

Is there a significant limitation in the C6Run/C6runapp/C6runlib implementation with regard to multi-threading?

 

 

I tried to modify the “emqbit” example both under the “c6runapp” and “c6runlib” subdirectories to take advantage of multiple threads.    The idea is to have the DSP perform multiple independent FFT calculations simultaneously invoked from different Linux threads within a single application.  I create three threads that make three independent calls to the FFT functions.  I modified the global variables used by the FFT functions so that each instance of the FFT functions would have a different set of global variables.

 

The ARM version of the modified code runs the multiple threads concurrently with no problems.   The output of the ARM version is shown below:

 

root@arago:~# ./cfft_arm

--->Thread number: 2

--->Thread number: 1

N=16,nTimes=100: 0.00554 s

--->Thread number: 0

N=16,nTimes=100: 0.005627 s

N=16,nTimes=100: 0.02564 s

N=32,nTimes=100: 0.033943 s

N=32,nTimes=100: 0.054209 s

N=32,nTimes=100: 0.054368 s

N=64,nTimes=100: 0.093689 s

N=64,nTimes=100: 0.094466 s

N=64,nTimes=100: 0.09396 s

N=128,nTimes=100: 0.239032 s

N=128,nTimes=100: 0.239148 s

N=128,nTimes=100: 0.240635 s

N=256,nTimes=100: 0.5425 s

N=256,nTimes=100: 0.562583 s

N=256,nTimes=100: 0.545791 s

N=512,nTimes=100: 1.23677 s

N=512,nTimes=100: 1.25717 s

N=512,nTimes=100: 1.26549 s

N=1024,nTimes=100: 2.82154 s

N=1024,nTimes=100: 2.82273 s

N=1024,nTimes=100: 2.90611 s

N=2048,nTimes=100: 6.36212 s

N=2048,nTimes=100: 6.36486 s

N=2048,nTimes=100: 6.53283 s

N=4096,nTimes=100: 14.2879 s

N=4096,nTimes=100: 14.3137 s

N=4096,nTimes=100: 14.6409 s

N=8192,nTimes=100: 31.1867 s

N=8192,nTimes=100: 31.248 s

N=8192,nTimes=100: 31.8663 s

N=16384,nTimes=100: 67.0542 s

N=16384,nTimes=100: 67.2279 s

N=16384,nTimes=100: 66.8199 s

---->Thread 0 returns: 0

---->Thread 1 returns: 0

---->Thread 2 returns: 0

root@arago:~#

 

However, the DSP version fails identically in the case of C6runapp and also C6runlib. The failed output is shown below.

 

root@arago:~# ./cfft_dsp

--->Thread number: 2

--->Thread number: 1

--->Thread number: 0

PROC_setup () failed. Status = [0x8000800b]

C6RUN_IPC_create() failed!

PROC_setup () failed. Status = [0x8000800b]

C6RUN_IPC_create() failed!

 

So my question is: is it possible to use the c6xxxx interfaces to execute more than one function at a time on the DSP?

If not how difficult would it be to add this functionality?

 

I also tried renaming the FFT associated functions so that each thread function had a unique name, but the results were identical.

 

It seems like this should be a basic feature of any software/hardware interface that allows a second CPU (the DSP) to act as a co-processor for a multi-threaded / multi-tasking OS (Linux).

 

 

  • Orlando,

    A few thoughts

    Orlando Perdomo said:
    Is there a significant limitation in the C6Run/C6runapp/C6runlib implementation with regard to multi-threading?
     

     

    In a sense, yes.  The current implementation only supports one function call in flight at a time.  Lifting this restriction is actually next on the roadmap, as well as adding APIs for an asynchronous calling mechanism.

     

    Orlando Perdomo said:
     
    I tried to modify the “emqbit” example both under the “c6runapp” and “c6runlib” subdirectories to take advantage of multiple threads.    The idea is to have the DSP perform multiple independent FFT calculations simultaneously invoked from different Linux threads within a single application.  I create three threads that make three independent calls to the FFT functions.  I modified the global variables used by the FFT functions so that each instance of the FFT functions would have a different set of global variables.
     

     

    I assume that you are using pthreads to do this, but I'm curious to see exactly how that looks, especially in the c6runapp case (where all the code sits on the DSP and pthread calls wouldn't make sense).

     

    Orlando Perdomo said:
    However, the DSP version fails identically in the case of C6runapp and also C6runlib. The failed output is shown below.
     
    root@arago:~# ./cfft_dsp
    --->Thread number: 2
    --->Thread number: 1
    --->Thread number: 0
    PROC_setup () failed. Status = [0x8000800b]
    C6RUN_IPC_create() failed!
    PROC_setup () failed. Status = [0x8000800b]
    C6RUN_IPC_create() failed!
     


    The above is particularly curious, since it would seem each thread was trying to initialize the DSP (load the DSP image, start the IPC interface, etc.).  That should happen only once in an application, regardless of how many threads are trying to call the DSP.  The behavior you show seems like three separate ARM/Linux processes trying to make use of the DSP, rather than three separate threads of the same process.

    Would you mind sharing your code or outlining exactly how you made the application multithreaded?

    Regards, Daniel

     

  • 8715.thread_emqbittar.doc

    Thanks for the quick response.   I have a tar file with the modified C6runlib code, again not elegant but I believe it should have worked.   Please rename file from .doc to .tar.   It is multiple threads using pthreads library on Linux.  Perhaps there is a way of preventing it from trying to initialize the DSP for each thread.

    By the way I'm using c6run_0_94_05_06 . Are there newer versions?

    Thanks

    - Orlando

  • Orlando,

    Thanks for posting this.  I will take a look at it and get back to you.

    Yes there are newer versions, the most recent being 0.97.01.01.  You can get it from the Gforge file release page (https://gforge.ti.com/gf/project/dspeasy/frs/) or from the TI download site (http://focus.ti.com/docs/toolsw/folders/print/c6run-dsparmtool.html).

    Instructions for how to integrate these into an SDK or to use in a stand-alone fashion can be found in the wiki documentation (http://processors.wiki.ti.com/index.php/Getting_Started_With_C6Run#System_Setup)

    Regards, Daniel

  • Thanks again,

    Here is another less complicated example called "simple".

    It basically creates three trivial threads.  One adds 5 to a given number, another subtracts 5, and the third multiplies the given number by 5.  This is done 5000000 times in each thread.  After each thread returns, the result is printed.  Again, rename simpletar.doc to simple.tar.

    7343.simpletar.doc

     

    ARM version run:

    root@arago:~# ./simple_arm
    --->Thread number: 2
    --->Thread number: 1
    --->Thread number: 0
    ---->Thread 0 returns: 0
    -----> Add: 15
    ---->Thread 1 returns: 0
    -----> Sub: 5
    ---->Thread 2 returns: 0
    -----> Mult: 50

    DSP version run:

    root@arago:~# ./simple_dsp
    --->Thread number: 2
    --->Thread number: 1
    --->Thread number: 0
    PROC_setup () failed. Status = [0x8000800b]
    C6RUN_IPC_create() failed!
    PROC_setup () failed. Status = [0x8000800b]
    C6RUN_IPC_create() failed!
    root@arago:~#

     

  • Orlando,

    It seems the initialization routines are not (yet) thread-safe. I guess this shouldn't be too surprising, since we haven't had a test case for this until now.  I've adapted your simple app as a test case and will use it to check this feature when it is fully implemented.  In the meantime, the workaround for this should be to init the C6Run framework in the main thread/function before starting any threads that will make use of the DSP.  You can do this by calling the following function:

    int C6RUN_libInit( void );

    I will file a bug on this so it can be tracked to completion.
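
    The workaround can be sketched roughly as follows. This is a minimal illustration, not the checked-in test case: `C6RUN_libInit()` is replaced by a counting stub so the sketch builds on any host, and `worker` and `run_workers` are hypothetical names. The thread does not state the return-value convention of `C6RUN_libInit()`, so the sketch ignores it.

```c
#include <pthread.h>

/* Stand-in for the real C6Run entry point so this sketch builds anywhere.
 * In a c6runlib build, remove this stub, declare
 * `int C6RUN_libInit(void);` and let the C6Run library provide it. */
static int init_calls = 0;
int C6RUN_libInit(void) { init_calls++; return 0; }

/* Hypothetical worker: a real application would call a DSP-side
 * function here instead of doing the arithmetic locally. */
static void *worker(void *arg)
{
    int *slot = arg;
    *slot += 5;
    return NULL;
}

/* Call this from main(). The framework is initialized exactly once,
 * while the process is still single-threaded, and only then are the
 * threads that use the DSP started. */
int run_workers(int values[3])
{
    pthread_t tid[3];
    int started = 0;
    int i;

    (void)C6RUN_libInit();   /* no worker thread exists yet */

    for (i = 0; i < 3; i++) {
        if (pthread_create(&tid[i], NULL, worker, &values[i]) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    return (started == 3) ? 0 : -1;
}
```

    The point of the ordering is that no two threads can race inside the initialization code, because only the main thread ever runs it.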

    Regards, Daniel

  • Orlando,

    I've filed a bug to track this issue: https://gforge.ti.com/gf/project/dspeasy/tracker/?action=TrackerItemEdit&tracker_item_id=1330&start=0

    I've also checked this example into the trunk showing a suggested use based on your simple.tar: https://gforge.ti.com/gf/project/dspeasy/scmsvn/?action=browse&path=%2Ftrunk%2Ftest%2Fc6runlib%2Fmulti_threaded%2F

    Note that one important issue with your sample was corrected, since you were trying to pass pointers to ARM global variables to the DSP.
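
    One way to avoid handing ARM addresses to the DSP is to pass inputs by value and take the result back as a return value. The sketch below assumes a hypothetical library function `dsp_add5` (implemented locally here so the code runs on a host) and a hypothetical `struct job`; the checked-in example may solve the problem differently.

```c
#include <pthread.h>

/* Hypothetical function that would live in the c6runlib-built library.
 * Input and result travel by value, so no ARM-side address is ever
 * dereferenced on the DSP. */
int dsp_add5(int x) { return x + 5; }

/* Per-thread job record; it stays in ARM memory. */
struct job {
    int input;
    int output;
};

static void *add_thread(void *arg)
{
    struct job *j = arg;              /* pointer is used on the ARM only */
    j->output = dsp_add5(j->input);   /* only plain ints cross the call */
    return NULL;
}

/* Runs one job on its own thread and stores the result in *out. */
int run_add(int input, int *out)
{
    struct job j = { input, 0 };
    pthread_t tid;

    if (pthread_create(&tid, NULL, add_thread, &j) != 0)
        return -1;
    pthread_join(tid, NULL);
    *out = j.output;
    return 0;
}
```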

    To resolve the bug, we will need to remove the requirement to explicitly make the init call in the main routine, either by forcing the call to happen upon program load or by making the init thread-safe (through something like pthread_once()).  The approach we take will likely depend on whether we expect users to explicitly shut down and then restart the DSP during the application runtime.  If this usage is supported, it will still have to be done in a single thread, and any other threads would have to be prevented from calling the DSP while it is off.
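
    As an illustration of the pthread_once() option, here is a minimal sketch of a once-only init wrapper. It is not the C6Run implementation: `C6RUN_libInit()` is a counting stub, and `ensure_dsp_ready`, `dsp_init_result`, and `run_three_threads` are hypothetical names. It also does not cover the shutdown-and-restart case discussed above.

```c
#include <pthread.h>

/* Stand-in for the real C6Run entry point (assumption, for host builds). */
static int init_calls = 0;
int C6RUN_libInit(void) { init_calls++; return 0; }

static pthread_once_t dsp_once = PTHREAD_ONCE_INIT;
static int dsp_init_result = 0;   /* whatever C6RUN_libInit() returned */

static void dsp_init_once(void)
{
    dsp_init_result = C6RUN_libInit();
}

/* Hypothetical helper: any thread may call this before using the DSP.
 * pthread_once() runs dsp_init_once() a single time, and every other
 * caller waits until that run has completed. Returns the result of
 * pthread_once(), which is 0 on success. */
int ensure_dsp_ready(void)
{
    return pthread_once(&dsp_once, dsp_init_once);
}

static void *worker(void *arg)
{
    int *status = arg;
    *status = ensure_dsp_ready();
    /* calls to DSP-side functions would follow here */
    return NULL;
}

/* Starts three threads that all request initialization and returns how
 * many times the init routine actually ran, or -1 on a thread error. */
int run_three_threads(void)
{
    pthread_t tid[3];
    int status[3] = { -1, -1, -1 };
    int started = 0;
    int i;

    for (i = 0; i < 3; i++) {
        if (pthread_create(&tid[i], NULL, worker, &status[i]) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    return (started == 3) ? init_calls : -1;
}
```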

    Regards, Daniel

  •  

    Thanks for your help, this is great news. Keep me posted when the bug is resolved...

    A couple of related issues.

    1.- I was trying to use a #define to trigger the call to C6RUN_libInit()

    main()
    {
    ...
    #ifdef  _TMS320C6X
              C6RUN_libInit();
    #endif
    ...
    }

    But it does not seem to work. What #define is reliable for use with the L138 DSP?  I looked around and found __TMS320C6X__, but it does not work either.

     

    2.- What debugging facilities are available when creating a c6runlib library/application?  Any special logging?  Any debugger support?  Can we look at both the ARM and DSP side?

     

    Thanks,

    - Orlando

  • Regarding the #define to identify DSP-specific code: we can use the compile-time -D option to define, for example, -D DSP_ONLY and then check for it in the code.  But my concern is: how is it that the system include files and other assorted examples use _TMS320C6X and __TMS320C6X__?

     

    Another item that is an issue: if I have more than one thread executing code on the DSP at the same time, how do I set the priority on the DSP side?  I can use the Pthreads library on the ARM side to set priority and define other real-time behaviors, but that will only affect the ARM side (or ARM portion of the thread).  Is there even a crude way to control the priority of a remote function on the DSP side?  If priority on the DSP side is not yet implemented, is there a workaround?

     

    - Orlando 

  • Orlando,

    I'm going to refer to the example I referenced earlier, which has been checked into the source tree.  If you look at that example you can see that we use a compile-time #define to select different behavior.  The reason we do this is that the main() function is always run on the ARM and therefore is always compiled with GCC.  We are using the c6runlib scenario, where only a library of functions have been moved to the DSP.  If we used c6runapp, moving all the code to the DSP, we could not use the pthreads library because it doesn't exist for the DSP.
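
    A compile-time switch of this kind might look like the sketch below. `_TMS320C6X` is predefined by TI's C6000 compiler, not by GCC, so it is never defined while GCC compiles main(); a user-supplied macro is needed instead. The macro name `DSP_ONLY` is taken from the earlier post, while `app_init` and `build_target` are hypothetical helpers; the checked-in example may use different names.

```c
/* Build with -DDSP_ONLY when linking against the c6runlib-built library,
 * and without it for the ARM-only build. */
#ifdef DSP_ONLY
int C6RUN_libInit(void);   /* provided by the C6Run library */
#endif

/* Hypothetical helper called at the top of main(), before any thread
 * that uses the DSP is created. */
int app_init(void)
{
#ifdef DSP_ONLY
    return C6RUN_libInit();
#else
    return 0;              /* ARM-only build: nothing to initialize */
#endif
}

/* Reports which variant was compiled, e.g. for a startup banner. */
const char *build_target(void)
{
#ifdef DSP_ONLY
    return "DSP";
#else
    return "ARM";
#endif
}
```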

    Orlando Perdomo said:

    Another item that is an issue: if I have more than one thread executing code on the DSP at the same time, how do I set the priority on the DSP side?  I can use the Pthreads library on the ARM side to set priority and define other real-time behaviors, but that will only affect the ARM side (or ARM portion of the thread).  Is there even a crude way to control the priority of a remote function on the DSP side?  If priority on the DSP side is not yet implemented, is there a workaround?

    Right now, only one ARM thread can call the DSP at a time.  The first thread that calls the DSP will block waiting for the DSP to return the function result.  The other threads will attempt to call the DSP, but they will block since we only allow one function to be in flight at a time.  So right now this is not an issue.  When we truly add multi-threaded support, where multiple functions can be queued-up and be in flight for processing, then we will need to also deal with thread priority.  This might be tricky though since the DSP OS is a real-time preemptive scheduling kernel, so higher priority threads would always run over lower-priority ones, since there is no time-slicing.  But I think as an initial design, thread priority from the ARM would get transmitted to the DSP so associated function calls would be made at a higher priority.
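
    The "one call in flight" behavior can be modeled on the ARM side with a single lock around the remote call. The sketch below is only a model of the blocking described above, not C6Run's actual mechanism; `fake_dsp_call`, `caller`, and `run_callers` are hypothetical names, and the counters exist purely to show that calls never overlap.

```c
#include <pthread.h>

static pthread_mutex_t dsp_lock = PTHREAD_MUTEX_INITIALIZER;
static int in_flight = 0;       /* calls currently "on the DSP" */
static int max_in_flight = 0;   /* highest value in_flight ever reached */

/* Stand-in for a remote DSP function (assumption, runs locally). */
static int fake_dsp_call(int x) { return x * 5; }

static void *caller(void *arg)
{
    int *value = arg;

    pthread_mutex_lock(&dsp_lock);      /* later callers block here */
    in_flight++;
    if (in_flight > max_in_flight)
        max_in_flight = in_flight;
    *value = fake_dsp_call(*value);     /* the single call in flight */
    in_flight--;
    pthread_mutex_unlock(&dsp_lock);
    return NULL;
}

/* Starts three threads that all try to call the DSP at once. */
int run_callers(int values[3])
{
    pthread_t tid[3];
    int started = 0;
    int i;

    for (i = 0; i < 3; i++) {
        if (pthread_create(&tid[i], NULL, caller, &values[i]) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    return (started == 3) ? 0 : -1;
}
```

    Because the calls are strictly serialized, the order in which blocked threads get the DSP is decided on the ARM side, which is why DSP-side priority does not come into play yet.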

    Regards, Daniel

     

  • Any chance you can give me an idea of when an early release of the true multithread and eventually priority based threading support may be available?  I don't mind starting a project without support for this if I have an idea of when this will be available.  At least is it one of the next items in the immediate queue to be worked on?

  • Orlando,

    We will be publishing a formal roadmap to the wiki soon, but this is the main feature to be added for the next major quarterly release (will be 0.98.xx.yy).  That is currently planned for near the end of Q2, so we are talking about a June or July timeframe.

    Regards, Daniel

  • Thanks,

    One last question regarding debugging.  Are there any techniques or tools that can be used to debug both sides (ARM and DSP), or just the DSP side, while running under C6runlib?  Relatedly, how can I take advantage of the --C6Run:debug option?

    - Orlando

  • Orlando,

    Until now, debugging has been somewhat rudimentary.  To see what the DSP is doing, you can connect with an emulator to the DSP core and monitor its progress.  You can also use printf statements on the DSP to output to the ARM console, which is a rather heavy, but often effective, way to track the DSP code.

    The --C6Run:Debug option (when used with the c6runlib-ar tool) will cause the built library to include all debug versions of the various underlying components.  So you'll see a lot of debug output on the console, but it is all framework debug info, so it may not be especially useful for application debug (though it can be very useful for the C6Run developer).

    Regards, Daniel

  • Where can I find more detailed information about the DSP compiler used with C6run, C6runlib?    Specifically, details about any C extensions, pragmas, or other features specific to the capabilities of the DSP.     Is this compiler the same as that provided with Code Composer?      Can code developed with the Code Composer compiler be easily compiled with the DSP compiler used with c6runlib?  Basically, are the features and other extensions the same?

    Thanks,

     - Orlando Perdomo

  • Orlando,

    The C6Run tools wrap around the standard Texas Instruments C6000 code generation tools (the TI CGT). This is the only compiler TI produces for the C6000 architecture and is the same one that is used under Code Composer Studio.  The actual compiler executable is cl6x.

    So anything you do (code-wise) under Code Composer Studio, you can do with code compiled with the front-end C6Run tools.  The main difference is that the C6Run tools present an interface that more closely matches GCC, and we apply some default options to the TI compiler when building code to keep up that appearance.

    Regards, Daniel

  • Daniel,

    One more idea/question regarding multithreading.   I understand that currently only one call from the ARM side is serviced at a time on the DSP side.  However, can that one DSP call start multiple worker threads (completely within the DSP, without any involvement of the ARM CPU) that eventually return data to the main DSP thread, after which the main DSP thread returns data to the ARM?   That is, as far as the ARM CPU is concerned, there is only one DSP thread.

     

    ARM -> Main DSP thread starts
               -> DSP thread 1
               -> DSP thread 2
               -> DSP thread 3
           All sub DSP threads finish
    ARM <- Main DSP thread returns to ARM

    Notice how there is only one remote procedure call from the ARM into the DSP to activate the main DSP thread.  That is, from the ARM's point of view, only one thread is executing on the DSP.

    Thanks,

     

     - Orlando

  • Orlando,

    Yes, this should work.  Note that you'll have to make use of the BIOS task APIs (and other thread synchronization APIs), which means that you will need to reference the BIOS header files when compiling the code and make sure the tool knows where to find them (using the -I option).  I have not tried to do this personally, but can't think of any reason why it wouldn't work.

    Regards, Daniel