
combining OpenMP and pre-optimized multicore codecs / algorithms (such as H.264)

Other Parts Discussed in Thread: SYSBIOS

TI C66x Experts-

We have ported OpenCV to c66x and it's working well (we replaced mem management in rts6600_elf.lib and we mapped some OpenCV functions to VLIB functions, among other things).  Now we need to enable HAVE_OPENMP and use the c66x compiler's OpenMP capability under these conditions:

  -cores 0 to N-1 are running H.264 encoder, which
   is provided by TI as highly optimized for multicore
   operation (N is 2 to 6)

  -we need to be able to control the number of
   cores allocated for OpenMP threads

We basically only need nested for-loop support.  OpenCV has a file called parallel.cpp that's used across several modules if HAVE_OPENMP is defined.  Essentially it's a C++ class that re-factors for-loops depending on the platform's available parallelism options (OpenMP, OpenCL, CUDA, etc.).  I'd say at least 80% of the full OpenMP capability is not required.
 
We've been looking at the omp_hello example and have the following questions:

1) Why is MCSM mapped to DDR3 memory in the omp_hello example RTSC platform file? Is it possible to not do this?  MCSM memory is used extensively by H.264.

2) It looks like L2 mem is reserved for stack and .threadprivate section.  We're currently using 64 KB for L2 cache, and most of remaining L2 mem for H.264, streaming, and network I/O code (we have about 90 KB available).  If we don't have enough L2 mem for OpenMP threads, is it possible to map .threadprivate to DDR3 mem; i.e. to separate areas used by each core?
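For context, the redirection we have in mind is just a linker command file placement; the memory-range name and addresses below are hypothetical placeholders for a per-core DDR3 carve-out, and whether the OpenMP runtime tolerates .threadprivate in cached DDR3 is exactly what we're asking:

```
/* hypothetical linker cmd fragment -- range name/addresses are placeholders */
MEMORY
{
   DDR3_PRIVATE_C2: origin = 0x9F000000, length = 0x00100000  /* core 2's slice */
}
SECTIONS
{
   .threadprivate > DDR3_PRIVATE_C2
}
```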

3) To make fundamental changes in how OpenMP uses memory, is there library source code that we can modify?  The OpenMP user guide mentions rebuilding the OMP lib with different settings / paths, following steps in the IPC Users Guide.  Is this sufficient or are there additional libs we may need to modify?

Thanks.

-Jeff
Signalogic

PS. We're about to start working with H.265, and we'll need the same combined OpenMP functionality there as well.

  • Hi Jeff, which OpenMP version are you using?
    Not sure if you have already seen the link below for porting the newer version to KeyStone 1 (C6678): processors.wiki.ti.com/.../Porting_OpenMP_2.x_to_KeyStone_1
    In any case, I will try to find an OpenMP expert who can reply to your questions.
    Thank you, Paula
  • We don’t support OpenMP Runtime 1.0 and expect you to follow the steps outlined in the wiki to build your own OpenMP Runtime 2.0.

    Please get back to us if you have any issues with it.
    Thank you.
  • Raja-

    > We don’t support OpenMP Runtime 1.0 and expect you to follow the
    > steps outlined in the wiki to build your own OpenMP Runtime 2.0.

    Thanks for your reply to me (and Paula).   So let me confirm what we should be doing:

    1) Build OpenMP runtime 2.0 library.

    2) Make source code changes to reserve cores 0 and 1 (force starting core to be 2).

    3) Test, combined with TI H.264 codec.

    Does this sound correct ?

    -Jeff

  • Raja-

    I ported OpenMP 2.x for KeyStone 1 following the instructions at http://processors.wiki.ti.com/index.php/Porting_OpenMP_2.x_to_KeyStone_1, which involved rebuilding the C6678 PDK and openmp_dsp_2_01_16_02.

    Do I have to make any changes beyond the cfg change below to guarantee OpenMP uses only cores 2 through 7?

    var OpenMP = xdc.useModule('ti.runtime.ompbios.OpenMP');
    OpenMP.masterCoreIdx = 2;
    OpenMP.numCores = 6;

    Thanks
    Anish

  • Raja-

    Can you let me and Anish know on this ?  Thanks.

    -Jeff

  • Jeff/Anish,

    If you don't need to reserve cores 0 and 1, you can specify OpenMP.masterCoreIdx = 0 and OpenMP.numCores = 6 to test your application on 6 cores. With OpenMP runtime 2.0 release 2_01_16_03 you can set OpenMP.numCores = 6; however, you will see the XDC runtime error 'index out of range' if you set OpenMP.masterCoreIdx to any value other than 0. We are working on a fix and will notify you as soon as it's available. Since 'cores 0 to N-1 are running H.264 encoder, which is provided by TI as highly optimized for multicore operation (N is 2 to 6)', I hope the index error doesn't block your progress.

    Regards,
    Garrett
  • Garrett-

    Thanks for your answer. We'll try the 6-core test. Do you have a timeframe on the fix to allow OpenMP.masterCoreIdx to be zero? The H.264 multicore codec depends on core 0 for housekeeping and "master core" tasks, so I would be nervous moving that to another core.

    -Jeff
  • Jeff,

    The current release allows you to set OpenMP.masterCoreIdx to zero without any issue; the index error occurs when OpenMP.masterCoreIdx is **non** zero (from Anish's post). The index error has been fixed and we will push to git.ti.com/.../ti-openmp-dsp-runtime shortly (in a day or so). If you need immediate access, here are the changes:

    In OpenMP.xs

    --- a/src/ti/runtime/ompbios/OpenMP.xs
    +++ b/src/ti/runtime/ompbios/OpenMP.xs
    @@ -133,17 +133,20 @@ function configureIpc(masterCoreIdx, numCores)
         // Configure MultiProc with the base index and number of cores
         var MultiProc   = xdc.useModule('ti.sdo.utils.MultiProc');
         MultiProc.baseIdOfCluster = masterCoreIdx;
    -    // Total number of processors
    -    MultiProc.numProcessors = numCores;
    +
    +    // Total number of processors available in the system
    +    MultiProc.numProcessors = dspNames.length;
         // Number of processors in cluster set by setConfig
    +    var endIndex = MultiProc.baseIdOfCluster + numCores;
         MultiProc.setConfig(null,
    -                 dspNames.slice(MultiProc.baseIdOfCluster, numCores));
    +                 dspNames.slice(MultiProc.baseIdOfCluster, endIndex));
    +

         // Avoid wasting shared memory for MessageQ transports or notify.
         // Ipc is used exclusively to set up a HeapMemMP in the specified
         // Shared Region to support a shared heap for malloc's
         var Ipc         = xdc.useModule('ti.sdo.ipc.Ipc');
    -    for (var i = 0; i < MultiProc.numProcessors; i++) {
    +    for (var i = 0; i < MultiProc.numProcsInCluster; i++) {
             Ipc.setEntryMeta({
                     remoteProcId: i,
                     setupMessageQ: false,


    In the config file omp_config.cfg:
         SharedRegion.setEntryMeta( sharedRegionId,
                                    {   base: ddr3.base,
                                        len:  sharedHeapSize,
    -                                   ownerProcId: 0,
    +                                   ownerProcId: OpenMP.masterCoreIdx,
                                        cacheEnable: true,
                                        createHeap: true,
                                        
    Regards,
    Garrett

  • The index error fix for non zero of OpenMP.masterCoreIdx is available in git.ti.com/.../ti-openmp-dsp-runtime now. You can view the diff there or clone the OpenMP DSP runtime from git://git.ti.com/openmp/ti-openmp-dsp-runtime.git with tag v02.01.17.02.

    -Garrett
  • Thanks Garrett. I will try OpenMP with the new code changes.

    -Anish

  • Garrett,

    We got the OpenMP hello and basic convolution programs running with the following settings:

    OpenMP.masterCoreIdx = 0;
    OpenMP.numCores = 8;

    Next we will try with 

    OpenMP.masterCoreIdx = 2;
    OpenMP.numCores = 6;

    -Anish

  • Hello,

    When I tried to use OpenMP with master core = 2 and OpenMP.numCores = 4, core 0 gets stuck at the module initialization step in the SYS/BIOS startup sequence. I have tried adding Startup firstFxns, and core 0 reaches that function with no problem. I have already made the OpenMP.xs and omp_config.cfg changes suggested by Garrett and rebuilt the library.

    I am using OpenMP in RTSC mode and loading the same code to cores 0 through 5 using the TI dsp_utils program.  The __StartCores function in omplib.c is used by core 0 to kickstart the other cores.  I am able to run OpenMP without any issues with master core = 0.

    I was reading about module initialization at http://rtsc.eclipse.org/docs-tip/Using_xdc.runtime_Startup and my program is also not reaching main.

    I am loading the same .out file to all cores. Do I need a different .out file, one without OpenMP code, just for core 0?

    Does the OpenMP module have initialization functions? What else can I do to debug this problem?

    Thanks
    Anish

    1184.omp_config.cfg

    omplib.c
    #include <stdio.h>
    #include <string.h>
    #include <stdarg.h>  /* va_list / va_start / va_end */
    #include <stdint.h>  /* uint16_t */
    #include "/opt/ti/ti-cgt-c6000_8.0.1/include/c6x.h" 
    #include <ti/sysbios/family/c66/Cache.h>
    #include <ti/csl/csl_bootcfgAux.h>
    #include <ti/csl/csl_ipcAux.h>
    
    #pragma DATA_SECTION (mainProg,"L2SRAM");
    volatile int mainProg = 0;
    #pragma DATA_SECTION (ompProg,"L2SRAM");
    volatile int ompProg = 0;
    #define LOG_LEN 0x100000
    #pragma DATA_SECTION(log_buffer, "DDR3")
    #pragma DATA_ALIGN(log_buffer,8)
    volatile unsigned char log_buffer[8][LOG_LEN];
    volatile unsigned int log_idx = 0;
    void core_print(uint16_t loglevel, char *fmt, ...)
    {
       char outputString[1024];
       size_t len;
       va_list va;  /* local, so core_print is re-entrant */
    
       va_start(va, fmt);
       vsnprintf(outputString, sizeof(outputString), fmt, va);
       va_end(va);
    
       len = strlen(outputString);
       if (log_idx + len <= LOG_LEN) {
          memcpy((void *)&log_buffer[DNUM][log_idx], outputString, len);
          log_idx += len;
       }
    }
    
    void __StartCores(void){
        
       int core = 0;
       volatile unsigned int numCores = 0;
       //volatile int nCoreList = 0x1f;
       volatile int nCoreList = 0x03f;
       unsigned int tmp_coreList;
        
       mainProg |= 0x1;   
       tmp_coreList = nCoreList;
       mainProg |= 0x2;
       
       TSCL = 0; /* start TSC register counting */
       
    /* calculate number of cores */
    
       do {
          numCores++;
          tmp_coreList >>= 1;
       } while (tmp_coreList > 0);
       
       if (DNUM == 0) {  /* core zero only */
          CSL_BootCfgUnlockKicker();
          tmp_coreList = nCoreList;
          do  {
             if (tmp_coreList & 1) {
                *(unsigned int *)(0x0C3FFF00 + core*4) = 0;
                if (core != 0) CSL_IPC_genGEMInterrupt( core, 0 );
                while(TSCL < core * 1000000);
             }
             core++;
             tmp_coreList >>= 1;
            } while(tmp_coreList > 0);
        }
    
       mainProg |= 0x4;
    
       while(TSCL < 10000000);
    
       mainProg |= 0x8;
    
    /* The core has reached __StartCores */
    
       *(unsigned int *)(0x0C3FFF00 + DNUM*4) = 1;
    
       mainProg |= 0x10;
      
    /* All cores wait here to sync up */
    
       while(1) {
            
          int i = 0;
          core = 0;
          tmp_coreList = nCoreList;
          mainProg |= 0x20;
          Cache_wbInvAll();
          mainProg |= 0x40;
    
          do {
             if (*(unsigned int *)(0x0C3FFF00 + core*4) == 1) i++;
             core++;
             tmp_coreList >>= 1;
          } while (tmp_coreList > 0);
    
          if (i == numCores) break;
       }
    	
       mainProg |= 0x80;   /* all cores synced (distinct bit; 0x20 already used above) */
    }
    

  • Anish,

    Which version of the OpenMP runtime are you using? v2.01.17.02 fixes a defect related to using a non zero master core index in RTSC mode.


    Also, if you have any cores that are not configured to be part of the OpenMP runtime, you'll have to use your own binaries for those cores.

    Ajay

  • Ajay,

    The version I am running is openmp_dsp_2_01_16_03, which got installed when I ran mcsdk-hpc_03_00_01_12_setuplinux.bin (MCSDK HPC Package for Linux).

    I am not able to find where to download openmp_dsp_2_01_17_02 from. Can you please provide the download url for openmp_dsp_2_01_17_02?

    Thanks

  • Ajay-

    We understand the need for separate .out (binaries) and that's our long-term objective: one .out with multimedia codecs and another with our c66x OpenCV port, accelerated with TI OpenMP.

    However, for immediate test purposes, isn't there a way to use the same OpenMP-enabled .out for all cores?  For example, if master core index = 2, then cores 0 and 1 should skip any OpenMP-related initialization and not hang during SYS/BIOS module init.  Can you guys add that?

    Thanks.

    -Jeff
    Signalogic

  • Jeff,

    The intent is that any core not part of the OpenMP runtime will have its own custom binary. 

    As a quick test, you can override the default definition of __TI_omp_initialize_rtsc_mode provided by the OpenMP runtime and supply your own version. In this version, you can disable OpenMP initialization for cores not participating in the OpenMP runtime.

    Ajay

  • Ajay,

    As far as I can tell, OpenMP runtime version 2.01.17.02 is not publicly available; I am not able to find that version on the TI website.
    Where can I download OpenMP runtime version 2.01.17.02 from?


    Thanks
    Anish

  • Anish,

    Source for v2.01.17.02 is available under tag v02.01.17.02 in the git repository. You will have to build the OpenMP Runtime for KeyStone following the instructions outlined here: 

    Ajay