OpenMP System Analyzer

Other Parts Discussed in Thread: SYSBIOS

Hi,

I am working with a C6678 and tested OpenMP example 1 in CCS: "Example that demonstrates using UIA with OMP". Unfortunately, when I debug the program I get unexpected messages in the System Analyzer instead of the ones I expected.

I checked other posts and tried the following:

  • switched to CGT 7.4.0
  • added the gel file in the target configuration
  • unchecked the box in the Auto Run Options "On a program load or restart"

I attached the program and the log file.

What can I do to get OpenMP working together with the System Analyzer?

 

Kind regards,

 

Timon

2727.Logs.zip

5001.OpenMP_Example1_UIA.zip

 

Used configuration:

  • TMS320C6678
  • Windows 7
  • CCS: 5.3.0.00090
  • Compiler: 7.4.0
  • IPC: 1.25.0.04
  • MCSDK 2.01.02.05
  • System Analyzer (UIA Target) 1.00.03.25
  • Hello,

    The above mentioned problem is not resolved, yet. I tried different configurations, but I was unable to run an OpenMP program together with the system analyzer.

    Can anyone help?

    Kind regards,

    Timon

  • Hello, 

    Please, could anyone answer my question?

    Regards,

    Timon

  • Timon,

    Sorry for the slow response. This was redirected to me but I was on vacation.

    You mentioned 'getting weird messages but not the expected ones'. Can you clarify what you're expecting? I opened the attached log file and it seems to have valid data (per screenshot below).

    Regards,

    Imtaz.

  • Hi Imtaz,

    Thanks for your reply. The example code contains some Log_write2 calls, for example:

    /* GENERATE REFERENCE TIME */
    Log_write2(UIABenchmark_start, (xdc_IArg)"Reference time start", NULL);
    for (i = 0; i < NUMLOOPS; i++) {
        delay(DELAYLENGTH);
    }
    Log_write2(UIABenchmark_stop, (xdc_IArg)"Reference time stop", NULL);

    So, I expected to see a message 'Reference time start' in the System Analyzer, and I would expect to see the reference time somewhere. Or am I completely wrong?
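As a rough illustration of what a host tool has to do with such start/stop pairs, the sketch below finds the first stop record that follows a start record and reports the elapsed ticks between them. The `Record` type, the `elapsed_ticks` function, and the timestamps are hypothetical names made up for this example; this is not how System Analyzer is actually implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-side record; not a UIA data structure. */
typedef enum { EVT_START, EVT_STOP } EvtKind;

typedef struct {
    uint64_t    timestamp; /* ticks, assumed to increase monotonically */
    EvtKind     kind;
    const char *label;     /* e.g. "Reference time start" */
} Record;

/* Returns the ticks between the most recent START and the first STOP that
 * follows it, or -1 if the log holds no complete start/stop pair. */
int64_t elapsed_ticks(const Record *recs, size_t n)
{
    int      have_start = 0;
    uint64_t start_ts   = 0;

    assert(recs != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (recs[i].kind == EVT_START) {
            have_start = 1;
            start_ts   = recs[i].timestamp; /* remember the latest start */
        } else if (have_start) {
            return (int64_t)(recs[i].timestamp - start_ts);
        }
    }
    return -1; /* a stop without a preceding start is ignored */
}
```

If the start event never reaches the log, for instance because it is filtered out on the target, a function like this has nothing to pair and no reference time can be shown.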

    In the meantime, a second problem arose.

    I started from the OpenMP Example 3 (Matrix Vector Multiplication), which was working. Then I extended this program with some System Analyzer logging calls, as shown below:

    /* Logging */
    #include <xdc/runtime/Log.h>
    #include <xdc/runtime/Diags.h>
    #include <ti/uia/events/UIAEvt.h>
    #include <ti/uia/events/UIABenchmark.h>

    ....

    Log_write1(UIABenchmark_start, (IArg)"Parallel_Part");

          /* parallel part of the code */

    Log_write1(UIABenchmark_stop, (IArg)"Parallel_Part");


    I added the lines below to the *.cfg file.

    /* ----------------------------------System Analyzer --------------------------------------*/

    var LoggingSetup = xdc.useModule('ti.uia.sysbios.LoggingSetup');
    LoggingSetup.eventUploadMode = LoggingSetup.UploadMode_JTAGRUNMODE;
    LoggingSetup.loadLogging = false;
    LoggingSetup.mainLoggerSize = 0x40000;
    LoggingSetup.sysbiosTaskLogging = false;
    LoggingSetup.overflowLoggerSize = 0x8000;

    var UIAEvt = xdc.useModule('ti.uia.events.UIAEvt');
    var UIABenchmark = xdc.useModule('ti.uia.events.UIABenchmark');


    But when I ran the program, I did not get the output described in this guide. There were no more print statements in the console and no logs in the System Analyzer. How can I get the System Analyzer and OpenMP working together?

    Kind Regards,

    Timon

  • Timon,

    I am not familiar with "Example that demonstrates using UIA with OMP". Can you point me to where you got this?

    You are correct to expect the benchmark events in the SA Log view. The log file you provided only had 437 records. Did you collect a larger dataset and still not see the benchmark events? By the way, if you do not need semaphore/task switch events etc., you can turn them off to reduce the traffic.

    Regards,

    Imtaz.

  • Hi Imtaz,

    The example I refer to can be loaded in CCS (File/New/CCS Project) as you can see in the following print screen:

      

    I turned off the taskswitch events by adding the following line in the *.cfg file:

    LoggingSetup.sysbiosTaskLogging = false;

    As a result, I got no messages at all. The expected log messages did not appear either.

    Why is this program (or the System Analyzer) not executing correctly? What do I have to change?

    Second problem

    Could you please suggest how to solve the second problem I mentioned in my last post?

    (Summary: I worked with the OpenMP Example 3 (matrix vector multiplication), the third example in the print screen above. Then I added some Log_writexxx statements to benchmark the program with the System Analyzer. As a result, I had neither output on the console nor events in the System Analyzer log.)

    I attached the program: 8547.OpenMP_MatVecMul.zip

    Could you please have a look at this? What do I have to change to get this OpenMP example program running together with the System Analyzer?

    Kind regards,

    Timon

  • Timon --

    Which version of OMP (and BIOS, UIA, etc.) are you using?

    Thanks,
    -Karl-

  • Hi Karl,

    I am using the following configuration:

    • TMS320C6678
    • Windows 7
    • CCS: 5.3.0.00090
    • Compiler: 7.4.0
    • IPC: 1.25.0.04
    • MCSDK 2.01.02.05
    • System Analyzer (UIA Target) 1.1.1.14
    • OMP: 1.1.3.2
    • SYS/BIOS: 6.34.4.22

    Regards,

    Timon

  • Hi Timon,

    Benchmarking events are controlled by the Diags_ANALYSIS flag, which is disabled by default. The default setting for LoggingSetup.mainLogging is true, which should enable the Diags_ANALYSIS flag for the Main module (which covers all non-XDC code); however, all other modules will still have this flag disabled. Could you please add the following to your .cfg file to ensure that the default setting for the Diags_ANALYSIS flag is configured to enable logging of these events?

    var Diags = xdc.useModule('xdc.runtime.Diags');
    var Main = xdc.useModule('xdc.runtime.Main');
    var Defaults = xdc.useModule('xdc.runtime.Defaults');

    Main.common$.diags_ANALYSIS = Diags.ALWAYS_ON;
    Defaults.common$.diags_ANALYSIS = Diags.ALWAYS_ON;

    Section 5.4.2 of the System Analyzer 1.1 User's guide (spruh43d.pdf in the docs folder of the UIA package) has more info on this topic.
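The gating described in this reply can be pictured with a small host-side model. It is only a sketch under stated assumptions: `Module`, `CAT_ANALYSIS`, `CAT_USER1`, and `log_if_enabled` are hypothetical names with made-up values, and the real xdc.runtime implementation differs in detail. The point is that an event is recorded only when its category bit is set in the mask of the module that logs it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CAT_ANALYSIS 0x1u /* stands in for Diags_ANALYSIS; value is made up */
#define CAT_USER1    0x2u /* stands in for some other category */

/* Hypothetical per-module state; not the xdc.runtime layout. */
typedef struct {
    const char *name;
    uint32_t    diags_mask; /* categories currently enabled for this module */
    unsigned    logged;     /* number of events accepted so far */
} Module;

/* Record the event only if the module has the category enabled.
 * Returns 1 if the event was logged, 0 if it was dropped. */
int log_if_enabled(Module *m, uint32_t category)
{
    assert(m != NULL);

    if ((m->diags_mask & category) == 0) {
        return 0; /* dropped: nothing reaches the logger */
    }
    m->logged++;
    return 1;
}
```

In this model, a benchmark event logged from a module whose mask lacks the analysis bit simply disappears, which matches the symptom of start/stop events missing from the log.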

    Does this resolve the problem you are seeing?

    Regards,

      Brian

    ( I'm in the process of setting up a system with the same configuration you are using to try to get a better understanding of the problems you are encountering and will post again when I have more info).

  • Hi Brian,

    Thanks for your reply. I tested your suggestions, but I could not see any difference. The expected log messages do not appear in the System Analyzer log.

    Do you have further suggestions to solve this problem?

     

    Kind regards,

    Timon

     

  • Hi Timon,

         I've got a C6670 board with your software configuration up and running, and it does look like there are some things you will need to be careful with.

    1) adding a 'do forever' loop to avoid 'abort' when the CPU0 program completes.

    The way the omp_uia_example.c file in the "Example that demonstrates using UIA with OMP" project is written, it will abort once the for (n=1; n <=numProcs; n++) loop completes.    I added the following to the main() function after the for loop in order to prevent  CPU0 exiting with abort so that I wouldn't think it had crashed. 

        while(1){
            delay(DELAYLENGTH);
        }

    2) Ensure that the gel Global_Default_Setup script is run

    After I connect to all cores on the target, the following is displayed in the console window:

    C66xx_0: GEL Output: Setup_Memory_Map...
    C66xx_0: GEL Output: Setup_Memory_Map... Done.
    C66xx_0: GEL Output: No initialization performed since bootmode = 0x00000005
    C66xx_0: GEL Output: You can manually initialize with GlobalDefaultSetup

    Is this what you see? 

    In order to ensure that the CPU subsystems are properly initialized, you should run Scripts / EVMC668L_Init Functions/Global_Default_Setup .

    3)  Loading the program.

    The program runs from local L2 SRAM on each of the CPUs and uses a Shared Region for IPC.   The order that the program is loaded is important, since CPU0 handles the Shared Memory region initialization.   What generally works well is to select all CPUs, connect to them all, reset them all and then to load the program into all of the CPUs.  You should see CPU 0 run to main and halt, and the other CPUs will stay running.   Is this what you are seeing?

    4) Starting System Analyzer and Filtering events

    Once CPU0 has hit main, run System Analyzer (Tools / System Analyzer / Live,  Start)

    If you have not set LoggingSetup.sysbiosTaskLogging = false; in your .cfg file, you will see a lot of task switch events.  To filter these out so that you can easily see the benchmarking events, configure a filter for the "Live Session: Logs" view by clicking the Filter button at the top of the view  and configuring it to show only events logged by the logger named Main Logger:

    To run the filter, click the button in the dialog labelled "Filter", and then click the Close button to hide the dialog.  Now run CPU0 and you should see the events you are interested in displayed properly:

    To cut down on the number of events that are logged, you can add the following line to your .cfg script (as you mentioned in one of your posts):

    LoggingSetup.sysbiosTaskLogging = false;

    I've verified that this works with the "Example that demonstrates using UIA with OMP" project.

    I'm starting to look into the problems you were having with the OpenMP Example 3 (matrix vector multiplication) and will post again once I have that working (or not).

    Regards,

      Brian

  • Hi Timon,

    I made the changes that you described in your post to the OpenMP Example 3 (matrix vector multiplication) for the C6670 and it runs OK for me (output to console, etc.) when I do a fresh launch. I did notice a couple of problems when trying to reload and rerun the project, however. If you just do a CPU reset on all CPUs and then reload the project, sometimes you will get emulator errors such as "Trouble writing to Register PC" and "Failed to remove the debug state". Once that happened, I had to restart CCS and power-cycle the board in order to be able to work with the target. On another occasion, the following message was displayed on the console and CPU0 exited:

    ti.sdo.ipc.SharedRegion: line 331: assertion failure: A_idTooLarge: id cannot be larger than numEntries
    xdc.runtime.Error.raise: terminating execution

    I believe this may be caused by loading the project without clearing out the shared memory data structures.  I have not seen this when starting up properly.

    Here's my modified main function with the event logging added.  It's a bit different than the omp_matvec.c file that you posted - the benchmark start and stop are around the parallelized code instead of the for loop that prints out the starting values of the matrix.

    #include <ti/omp/omp.h>
    #include <stdio.h>
    /* Logging */
    #include <xdc/runtime/Log.h>
    #include <xdc/runtime/Diags.h>
    #include <ti/uia/events/UIAEvt.h>
    #include <ti/uia/events/UIABenchmark.h>

    #define SIZE 10


    void main()
    {

        float A[SIZE][SIZE], b[SIZE], c[SIZE], total;
        int i, j, tid;

        /* Initializations */
        total = 0.0;
        for (i=0; i < SIZE; i++)
        {
            for (j=0; j < SIZE; j++)
                A[i][j] = (j+1) * 1.0;
            b[i] = 1.0 * (i+1);
            c[i] = 0.0;
        }
        printf("\nStarting values of matrix A and vector b:\n");
        for (i=0; i < SIZE; i++)
        {
            printf("  A[%d]= ",i);
            for (j=0; j < SIZE; j++)
                printf("%.1f ",A[i][j]);
            printf("  b[%d]= %.1f\n",i,b[i]);
        }
        printf("\nResults by thread/row:\n");

        Log_write1(UIABenchmark_start, (IArg)"Parallel_Part");

        /* Create a team of threads and scope variables */
    #pragma omp parallel shared(A,b,c,total) private(tid,i)
        {
            tid = omp_get_thread_num();

            /* Loop work-sharing construct - distribute rows of matrix */
    #pragma omp for private(j)
            for (i=0; i < SIZE; i++)
            {
                for (j=0; j < SIZE; j++)
                    c[i] += (A[i][j] * b[i]);

                /* Update and display of running total must be serialized */
    #pragma omp critical
                {
                    total = total + c[i];
                    printf("  thread %d did row %d\t c[%d]=%.2f\t",tid,i,i,c[i]);
                    printf("Running total= %.2f\n",total);
                }

            }   /* end of parallel i loop */

        } /* end of parallel construct */

        Log_write1(UIABenchmark_stop, (IArg)"Parallel_Part");

        printf("\nMatrix-vector total - sum of all c[] = %.2f\n\n",total);

    }

    Once the target has been properly reset, the GEL script has been run, the project has been loaded on all cores, and CPU 0 reaches main(), I start Tools / System Analyzer / Live and click Start, and then select CPU0 and run it. The program then terminates normally (abort at exit.c on all CPUs). The following is shown in the log view. (The console window shows the same output as before the code modifications were made.)

    Does this order of operations, along with the source code modified as above, work for you?

    One thing to note: printing to the console using printf is implemented using CCS system breakpoints, so it will cause the processors to halt momentarily.  This skews the CPU timestamps that are logged so that you will not see a properly correlated view of the events across multiple cores.  In general, it is best to log these types of statements using UIA events instead of using printf statements.  This will also greatly reduce the CPU overhead associated with logging the information.

    Regards,

      Brian

  • Hi Brian,

    Thank you very much for your detailed answer!

    1) adding a 'do forever' loop to avoid 'abort' when the CPU0 program completes.

    I have done this.

    2) Ensure that the gel Global_Default_Setup script is run

    I added the initialization of the GEL file directly in the target configuration, as described in this post by Uday. With this procedure I get a slightly different output in the console window. I attached the output in a separate file, as it is too large to print directly in this post. Is this initialization correct, or do I have to change something?

    C66xx_0: GEL Output: Setup_Memory_Map...
    C66xx_0: GEL Output: Setup_Memory_Map... Done.
    C66xx_1: GEL Output: Setup_Memory_Map...
    C66xx_1: GEL Output: Setup_Memory_Map... Done.
    C66xx_2: GEL Output: Setup_Memory_Map...
    C66xx_2: GEL Output: Setup_Memory_Map... Done.
    C66xx_3: GEL Output: Setup_Memory_Map...
    C66xx_3: GEL Output: Setup_Memory_Map... Done.
    C66xx_0: GEL Output: 
    Connecting Target...
    C66xx_0: GEL Output: DSP core #0
    C66xx_0: GEL Output: C6678L GEL file Ver is 2.005 
    C66xx_0: GEL Output: Global Default Setup...
    C66xx_0: GEL Output: Setup Cache... 
    C66xx_0: GEL Output: L1P = 32K   
    C66xx_0: GEL Output: L1D = 32K   
    C66xx_0: GEL Output: L2 = ALL SRAM   
    C66xx_0: GEL Output: Setup Cache... Done.
    C66xx_0: GEL Output: Main PLL (PLL1) Setup ... 
    C66xx_0: GEL Output: PLL in Bypass ... 
    C66xx_0: GEL Output: PLL1 Setup for DSP @ 1000.0 MHz.
    C66xx_0: GEL Output:            SYSCLK2 = 333.3333 MHz, SYSCLK5 = 200.0 MHz.
    C66xx_0: GEL Output:            SYSCLK8 = 15.625 MHz.
    C66xx_0: GEL Output: PLL1 Setup... Done.
    C66xx_0: GEL Output: Power on all PSC modules and DSP domains... 
    C66xx_0: GEL Output: Security Accelerator disabled!
    C66xx_0: GEL Output: Power on all PSC modules and DSP domains... Done.
    C66xx_0: GEL Output: PA PLL (PLL3) Setup ... 
    C66xx_0: GEL Output: PA PLL Setup... Done.
    C66xx_0: GEL Output: DDR3 PLL (PLL2) Setup ... 
    C66xx_0: GEL Output: DDR3 PLL Setup... Done.
    C66xx_0: GEL Output: DDR begin (1333 auto)
    C66xx_0: GEL Output: XMC Setup ... Done 
    C66xx_0: GEL Output: 
    DDR3 initialization is complete.
    C66xx_0: GEL Output: DDR done
    C66xx_0: GEL Output: DDR3 memory test... Started
    C66xx_0: GEL Output: DDR3 memory test... Passed
    C66xx_0: GEL Output: PLL and DDR Initialization completed(0) ...
    C66xx_0: GEL Output: configSGMIISerdes Setup... Begin
    C66xx_0: GEL Output: 
    SGMII SERDES has been configured.
    C66xx_0: GEL Output: Enabling EDC ...
    C66xx_0: GEL Output: L1P error detection logic is enabled.
    C66xx_0: GEL Output: L2 error detection/correction logic is enabled.
    C66xx_0: GEL Output: MSMC error detection/correction logic is enabled.
    C66xx_0: GEL Output: Enabling EDC ...Done 
    C66xx_0: GEL Output: Configuring CPSW ...
    C66xx_0: GEL Output: Configuring CPSW ...Done 
    C66xx_0: GEL Output: Global Default Setup... Done.
    C66xx_0: GEL Output: Invalidate All Cache...
    C66xx_0: GEL Output: Invalidate All Cache... Done.
    C66xx_0: GEL Output: GEL Reset...
    C66xx_0: GEL Output: GEL Reset... Done.
    C66xx_0: GEL Output: Disable all EDMA3 interrupts and events.
    C66xx_1: GEL Output: 
    Connecting Target...
    C66xx_1: GEL Output: DSP core #1
    C66xx_1: GEL Output: C6678L GEL file Ver is 2.005 
    C66xx_1: GEL Output: Global Default Setup...
    C66xx_1: GEL Output: Setup Cache... 
    C66xx_1: GEL Output: L1P = 32K   
    C66xx_1: GEL Output: L1D = 32K   
    C66xx_1: GEL Output: L2 = ALL SRAM   
    C66xx_1: GEL Output: Setup Cache... Done.
    C66xx_1: GEL Output: Global Default Setup... Done.
    C66xx_1: GEL Output: Invalidate All Cache...
    C66xx_1: GEL Output: Invalidate All Cache... Done.
    C66xx_1: GEL Output: GEL Reset...
    C66xx_1: GEL Output: GEL Reset... Done.
    C66xx_2: GEL Output: 
    Connecting Target...
    C66xx_2: GEL Output: DSP core #2
    C66xx_2: GEL Output: C6678L GEL file Ver is 2.005 
    C66xx_2: GEL Output: Global Default Setup...
    C66xx_2: GEL Output: Setup Cache... 
    C66xx_2: GEL Output: L1P = 32K   
    C66xx_2: GEL Output: L1D = 32K   
    C66xx_2: GEL Output: L2 = ALL SRAM   
    C66xx_2: GEL Output: Setup Cache... Done.
    C66xx_2: GEL Output: Global Default Setup... Done.
    C66xx_2: GEL Output: Invalidate All Cache...
    C66xx_2: GEL Output: Invalidate All Cache... Done.
    C66xx_2: GEL Output: GEL Reset...
    C66xx_2: GEL Output: GEL Reset... Done.
    C66xx_3: GEL Output: 
    Connecting Target...
    C66xx_3: GEL Output: DSP core #3
    C66xx_3: GEL Output: C6678L GEL file Ver is 2.005 
    C66xx_3: GEL Output: Global Default Setup...
    C66xx_3: GEL Output: Setup Cache... 
    C66xx_3: GEL Output: L1P = 32K   
    C66xx_3: GEL Output: L1D = 32K   
    C66xx_3: GEL Output: L2 = ALL SRAM   
    C66xx_3: GEL Output: Setup Cache... Done.
    C66xx_3: GEL Output: Global Default Setup... Done.
    C66xx_3: GEL Output: Invalidate All Cache...
    C66xx_3: GEL Output: Invalidate All Cache... Done.
    C66xx_3: GEL Output: GEL Reset...
    C66xx_3: GEL Output: GEL Reset... Done.
    

    3)  Loading the program.

    My procedure:

    I start the program by pressing the "Debug" button, as you can see in the print screen below. This automatically builds the program, connects to the target, and loads the program. I also unchecked the box "Run to symbol main on a program load or restart" (Tools/Debugger Options/Auto Run and Launch Options), as other posts recommend. But with these settings my program does not run to main and stop; instead, all cores are suspended and I start them all together (I grouped the cores). Is that wrong?

    Your procedure:

    I also tried your procedure, which means manually connecting the cores, resetting them, and downloading the code. Indeed, that works! But is it possible to start debugging with the "Debug" button and have a reset performed before the code is downloaded?

    This works properly for the first run, but a restart is not possible. Is it necessary to reload the code every time I want to restart?

    OpenMP Example 3 (matrix vector multiplication)

    I followed the debug procedure you recommended. With a completely fresh launch the program executes properly, and the correct results are visible in the Console window as well as in the System Analyzer logs. However, as you also described, a "Restart" with a preceding "CPU Reset" is not possible. Various error messages appear. Doing a fresh launch every time is a bit silly. Is there no reliable and easy way to do a proper restart?

    Thanks for your answers.

    Kind regards,

    Timon

  • Hi Timon,

        It looks like the best way to automate the program load on all cores and then run on all cores is to write a DSS (Debug Server Scripting) script that can be invoked as the 'initialization script' in the Debug Launch configuration.  I'll try to put a first cut of one together as a proof of concept and will post when I have more info.   Hopefully the script will allow you to cleanly reload the project on an existing launch without having to do a fresh launch.   This is going to take a bit of time to put together and test - I hope to be able to post an update tomorrow to let you know how it's going.  Are you ok with this approach?

    Regards

      Brian

  • Hi Brian,

     

    That sounds like it will be a very useful workaround! Thank you for investing your time in this.

     

    Regards,

     

    Timon

  • Hi Timon,

    It looks like the root of the problem is that there are a couple of global variables that are defined in UIA to provide an interface between the target and the host, so that the JTAG transport can find the data structures it needs to interface with:

    1. ti_uia_runtime_QueueDescriptor_gPtrToFirstDescriptor - head of a list of buffers containing events to be uploaded for that CPU
    2. ti_uia_runtime_QueueDescriptor_gUpdateCount - counts how many times the list of buffers has been updated, allowing the host to determine when it needs to walk the list to update its own local copy of the list

    Each CPU needs to have its own instance of these global variables. Because a shared image is being used in this application, the global variables need to be allocated to local memory (L2SRAM) so that they don't collide, but what is happening is that they are all located in DDR3 at the same address. This creates a race condition where all of the CPUs are trying to access the same global variable at the same time without any locking. Manually starting the CPUs avoids the race condition, but when the CPUs start together (e.g. via a group run), the second CPU that accesses ti_uia_runtime_QueueDescriptor_gPtrToFirstDescriptor finds it is not null and ends up setting the 'next' pointer in its own descriptor to the address of the descriptor that it is adding. The next time it accesses the linked list, it gets stuck in a loop while traversing the list, which locks up the CPU.
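One way to picture this failure is the host-side simulation below. It is a sketch built on assumptions that the thread does not confirm: that each core's descriptor sits at the same local L2 address (because the image is shared), that each core's start-up code clears the head variable before adding its descriptor, and that descriptors are pushed onto the front of the list. All type and function names are hypothetical, and the actual UIA code may behave differently.

```c
#include <assert.h>
#include <string.h>

#define NUM_CORES 2
#define L2_SLOTS  4
#define NULL_ADDR 0 /* local address 0 plays the role of a null pointer */
#define DESC_ADDR 1 /* same local address on every core (shared image) */

/* Hypothetical stand-in for a queue descriptor; 'next' is a local address. */
typedef struct { int next; } Descriptor;

typedef struct {
    Descriptor l2[NUM_CORES][L2_SLOTS]; /* private L2 of each core */
    int        head[NUM_CORES];         /* list head variable(s) */
    int        shared_head;             /* nonzero: all cores use head[0] */
} System;

/* Index of the head variable that 'core' actually reads and writes. */
static int head_index(const System *s, int core)
{
    assert(core >= 0 && core < NUM_CORES);
    return s->shared_head ? 0 : core;
}

void system_init(System *s, int shared_head)
{
    memset(s, 0, sizeof *s);
    s->shared_head = shared_head;
}

/* Assumed start-up step: the core's C initialization clears the head. */
void boot_clear_head(System *s, int core)
{
    s->head[head_index(s, core)] = NULL_ADDR;
}

/* The core pushes its own descriptor onto the front of the list. */
void add_descriptor(System *s, int core)
{
    int *head = &s->head[head_index(s, core)];

    s->l2[core][DESC_ADDR].next = *head; /* link to whatever the head holds */
    *head = DESC_ADDR;
}

/* Walk the list as 'core' sees it. Returns the node count, or -1 if the
 * walk exceeds max_steps, which indicates a cycle. */
int walk_list(const System *s, int core, int max_steps)
{
    int addr  = s->head[head_index(s, core)];
    int count = 0;

    while (addr != NULL_ADDR) {
        if (++count > max_steps) {
            return -1;
        }
        addr = s->l2[core][addr].next; /* resolved in the core's own L2 */
    }
    return count;
}
```

With a shared head, the staggered order clear(0), add(0), clear(1), add(1) leaves core 1 with a well-formed list, while the interleaved order clear(0), clear(1), add(0), add(1) makes core 1's descriptor point to itself, so `walk_list` reports a cycle. With one head per core, both orders are harmless in this model.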

    I am trying to find a way to force allocation of these global variables to L2SRAM to work around this problem, but am hitting some other problems.  The variables are allocated to the .neardata section, but the linker.cmd file that is auto-generated by XDC does not honor assigning this section to L2SRAM. 

    It looks like there will need to be a fix submitted to the UIA package to address this problem.  What is your timeline for needing this?

    Regards,

      Brian


  • Hi Brian,

    That looks like a serious problem which isn't easy to fix...

    My project is progressing quickly, so I would be happy to have a solution to this problem soon. I guess that you need at least one week to fix it. In week four I will be on holiday, so my project will not progress then anyway. It would be very helpful if I could continue working afterwards. That means it would be good to have your solution on Monday, 21 January.

    Kind regards,

    Timon

  • Hi Timon,

        We've implemented a fix and will be releasing it later this week.  I'll post again when the release is available for download.

    Regards,

      Brian

  • Hi Brian,

    I am pleased to hear that you have implemented a fix. I am looking forward to testing it.

     

    Regards,

     

    Timon

  • Hi Timon,

        We have an engineering release available now - the official GA release is expected later next week (target date is Wednesday).   If you are ok working with the eng release, please send me an email and I will email you back a link that you can download the package from.

    Regards,

      Brian

  • Hi Brian,

    Unfortunately, the engineering release didn't work for us. Sent you an email regarding the details.

    Thanks, Clemens

  • Hi Clemens and Timon,

       I've finished a first cut at a tutorial on OpenMP that includes workarounds for a number of bugs in the OpenMP project, CCS and System Analyzer.  It shows how to use the CCS Sync Group feature to work with OpenMP projects, and includes an updated OpenMP example project, a link to the UIA engineering release that fixes the problems mentioned earlier in this thread, and step-by-step instructions on how to build, load, run the program and analyze the results. 

    Thanks for your patience and help with this.  We're working on getting bug fixes for the problems we've encountered into the upcoming CCSv5.4 release.  Please post to this thread if you have any questions or run into problems.

    Regards,

      Brian

  • Hi Clemens and Timon,
       The more I work with this, the more instability I'm finding.  The program posted on the tutorial page sometimes crashes upon launch after power cycling the board.  I'll keep working on trying to identify what is causing this to occur and will post here when I have more information.

    Regards,
      Brian

  • Thanks for keeping us up-to-date regarding the current progress.

  • Hi Clemens and Timon,

      It turns out that there are a number of bugs that can make running an OpenMP project challenging.  Here's a summary of what you will need to know in order to avoid them:

    • use v7.4.0 and only v7.4.0 of the C6000 compiler (as mentioned in several other E2E posts :) )
    • use the set of packages that ship with the MCSDK - don't mix and match the versions  
    • there are a number of bugs in CCSv5.3 relating to the way groups and sync groups work.  E.g. don't halt one core in a group and then halt the group.   These bugs will be fixed in CCSv5.4
    • there is a bug in System Analyzer which prevents some start/stop events from being displayed in the execution graph.  This will be fixed in CCSv5.4
    • there are a number of bugs in the OMP examples.  A version of the OMP example project (C6678 Examples / Example that demonstrates using UIA with OMP) that fixes these bugs is posted on the Tutorial7 page
    • there is a bug in the way UIA and the System Analyzer JTAG transport handle shared global variables as mentioned previously.  The UIA fixes are available in UIA 1.2.1.8_eng and later.  The JTAG transport will be fixed in CCSv5.4.

      I've updated UIA Tutorial 7 with instructions on how to work around these bugs by creating a Launch Configuration that will automate many of the steps involved in getting an OpenMP project to run properly.   You will still need to terminate the session instead of reloading the project, however, in order to avoid running into some bugs which can cause CCS to crash.

    FWIW, I've also written a DSS script (3377.OpenMP_LoadScript.zip) that provides an alternative to using the Launch Configuration to auto-load the symbols and the project.  The Launch Configuration approach is faster and automates more steps, but takes longer to configure and change when you want to, e.g., change the number of cores the project runs on or switch from Debug to Release.

    • The DSS script assumes you have configured the Launch configuration the same as outlined in the Tutorial, except no project is specified in the Program tab.
    • You will need to customize the script to load in the .out file for your particular project.  It contains some tips on how to customize the script and where to go to get more info on the DSS APIs.
    • To use the script, launch the target and wait for it to show all of the cores as 'disconnected'.  Open the Scripting console (View / Scripting console) and enter loadJSFile "<path to the location of the OpenMP_Load.js file, with forward slashes instead of back slashes>".  e.g. loadJSFile "c:/ti/OpenMP_Load.js"
    • This will  load the project into CPU 0 and symbols into the other cores. 
    • The launch configuration for CPU 0 should be configured to auto-run to main with software breakpoints disabled, so CPU 0 should start running automatically. 
    • Group all of the slave cores and run them.  (One or more slave cores may halt at c_int00 before CPU 0 halts at main.  If so, run the group again)
    • CPU 0 should halt at main. 
    • Halt the group, run system analyzer, then run CPU 0 and then the group. 
    • View the results, then terminate the session.
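The forward-slash requirement in the loadJSFile step is easy to get wrong when a path is copied from Windows. A minimal C sketch of the conversion follows; `to_forward_slashes` is a hypothetical helper written for this illustration and is not part of CCS or DSS.

```c
#include <assert.h>
#include <stddef.h>

/* Replace every backslash in 'path' with a forward slash, in place.
 * Returns the number of characters that were replaced. */
size_t to_forward_slashes(char *path)
{
    size_t replaced = 0;

    assert(path != NULL);

    for (; *path != '\0'; path++) {
        if (*path == '\\') {
            *path = '/';
            replaced++;
        }
    }
    return replaced;
}
```

Applied to a buffer holding c:\ti\OpenMP_Load.js, the helper rewrites it to c:/ti/OpenMP_Load.js, the form used in the loadJSFile example above.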

    Thanks for your patience with this.   Things will be a lot more robust in CCSv5.4.

    Regards,

      Brian

    [Note: The OpenMP_Load.js script has been updated to clean up documentation that was causing javadoc parsing errors]

  • Hello Brian,

    I've tried CCS-5.4beta1 to see if it fixes our UIA+OpenMP issues; however, it still causes problems.

    Here is what I did:
    - Used CGT-7.4.0
    - Used UIA-1.2.1.8_eng
    - Used the OpenMP example provided by you
    - Followed the steps mentioned in Tutorial 7

    I see some events logged from multiple cores; unfortunately, I still don't get any graphs generated.
    Worse, when I tried to stop System Analyzer, CCS hung.

    I again recorded a screencast to illustrate which steps we've taken in which order:
    http://www.youtube.com/watch?v=eb2orGir5mE&feature=youtu.be

    Regards, Clemens

    PS: We disabled sysbiosTaskLogging to reduce the amount of events logged.

  • Hi Clemens,

        Thank you for your feedback and providing the screencast.  It was very helpful.

    Re: I see some events logged from multiple cores, unfortunately I still don't get any graphs generated.

    The Concurrency graph is derived from SysBios Task switch events.  You will need to enable sysbiosTaskLogging to see anything in this graph.

    The Load graph is derived from the SysBios Load module, which instruments the Idle task and determines the percentage of time a CPU spends running anything other than the SYS/BIOS Idle loop.  From what I can tell, OMP doesn't appear to spend time in the idle loop, so there are no statistics collected to enable the Load events.  I'll follow up with the OMP development team to see if this is the case and whether CPU load information can be generated some other way.
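The load figure described here is simple arithmetic: the share of a measurement window that was not spent in the idle loop. The helper below is a hypothetical illustration of that arithmetic only; `cpu_load_percent` and its parameters are made-up names, and this is not the SYS/BIOS Load module's code.

```c
#include <assert.h>
#include <stdint.h>

/* CPU load in percent for one measurement window.
 * idle_ticks:   ticks spent in the idle loop during the window
 * window_ticks: total length of the window in ticks
 * Returns -1 for an empty window, where no load can be computed. */
int cpu_load_percent(uint32_t idle_ticks, uint32_t window_ticks)
{
    if (window_ticks == 0) {
        return -1;
    }
    assert(idle_ticks <= window_ticks); /* idle time cannot exceed the window */

    uint64_t busy = (uint64_t)window_ticks - idle_ticks;
    return (int)((busy * 100u) / window_ticks);
}
```

Note that this sketch only covers the calculation. According to the reply above, when the idle loop is never entered the statistics appear not to be collected at all, so no load value would be produced in the first place.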

    The version of the execution graph that ships in CCSv5.3 and in the CCSv5.4 Beta1 contains a bug fix for function profiling which, as a side-effect, prevents benchmark start/stop events from being displayed unless there is a SysBios task switch event logged in the data.  We have a fix for this but it was not implemented in time for Beta1.

    Re: Worse, when I tried to stop system analyzer, CCS hanged.

    This is a new bug, related to a recent bug fix that was introduced to handle discontinuities introduced by dropped events.   Thank you very much for reporting this!  It's identified a hole in our test cases as well.  We are working on a fix for this and will ensure that it is in place for the GA release.

    For Beta1, a workaround is to close the Concurrency graph view prior to closing System Analyzer or shutting down CCS.  There may be other side-effects when using this view that cause slow-downs in CCS, however, so it would be safest to avoid using it if possible.

    Regards,

      Brian

  • Hi Brian,

    Thanks for your continued support regarding this issue. I should not have toyed with the default settings. I'll give it another try tomorrow.

    Glad I was of help regarding the bug-report ;)

    Thanks, Clemens