This thread has been locked.

If you have a related question, please click the "Ask a related question" button in the top right corner. The newly created question will be automatically linked to this question.

OpenMP QMSS manual setup

Other Parts Discussed in Thread: TMS320C6670, SYSBIOS

Hi, everyone,

I'm using TMS320C6670 DSP and trying to implement OpenMP in my project. But I need to do a manual QMSS initialization since it's used for other peripherals (Ethernet, FFTC). So I took OpenMP 2_01_16_03 Hello_world example and modify it a bit:

#define PAD     (64)
#define PNUM    (16384)
#define OPAD    (8)
#define TRNUM   (4)

#pragma DATA_SECTION(iqs, ".inputbuff");
#pragma DATA_SECTION(data, ".inputbuff");
#pragma DATA_ALIGN(iqs, 128);
#pragma DATA_ALIGN(data, 128);
#pragma DATA_ALIGN(win, 128);
float    data[2*(PNUM+PAD)];
float   iqs[2*PNUM];
float   win[PNUM];

inline static float * restrict windowing(
    int         n          // number of sampled data
)
{
    short i;
    float ampl = (float)(1./(1<<0));

    for (i=0; i < n; i++)
    win[i] = (float) ampl;

    return win;
}
int main (int argc, char *argv[]) {

  int i, n_treds,c_l_n,cl_pc,num_p_pc,iq_pairs;
  double st_o=0.0,fn_o=0.0;
  float *win;

  win = windowing(PNUM);

  omp_set_num_threads(TRNUM);

  iq_pairs=64/sizeof(float); // number of IQ pairs in one cache line (128 bytes)
  c_l_n= (PNUM+iq_pairs-1)/iq_pairs;// cache lines number
  cl_pc= (c_l_n +3)/4; // cache lines per core number.
  num_p_pc = cl_pc*iq_pairs; // number of IQ pairs per core

  st_o = omp_get_wtime();
#pragma omp parallel shared (data,iqs,win) private (i)
  {
	  int id;
      float * restrict pout;
      const float * restrict pin,* restrict pw;
	  id=omp_get_thread_num();

      pout=&data[PAD+id*num_p_pc];
      pin=&iqs[id*num_p_pc];
      pw=&win[id*num_p_pc];
#pragma omp for
      for (i=0;i<num_p_pc/2;i++)
      {
    	  pout[2*i]=pin[2*i]*pw[i];
    	  pout[2*i+1]=pin[2*i+1]*pw[i];
      }
  }
  fn_o = omp_get_wtime();
  printf("Calculation time = %f mus\n", (fn_o-st_o)*1000000);
}

If i using standard example platform and config file with this code everything calculating just fine, ~4 times faster than sequential code.

But I need to specify additional regions for other peripherals, so I made some changes to the config file (as they given in the OpenMP user guide):

    var ompSettings = xdc.useModule("ti.runtime.openmp.Settings");
    ompSettings.runtimeInitializesQmss  =  false;
    var OpenMP = xdc.useModule('ti.runtime.ompbios.OpenMP');    
    OpenMP.qmssMemRegionIndex = 1; // region 0 is occupied by Ethernet
    OpenMP.qmssFirstDescIdxInLinkingRam = 32; // 0 region is 32 descriptor long


// after shared region and heapOM config added that lines

// __TI_omp_start_rtsc_mode configures the runtime and calls main
    var Startup = xdc.useModule('xdc.runtime.Startup');
    
    Startup.lastFxns.$add('&qmssInitOmp');
    
    Startup.lastFxns.$add('&__TI_omp_initialize_rtsc_mode');

Where the QMSS initialization function look like:

int qmssInitOmp (void)
{
    int               result = 0;
    Qmss_MemRegInfo     memCfg;
    Qmss_InitCfg        qmssInitConfig;

    // Set up QMSS configuration
    if (DNUM==0)
    {

    memset (&qmssInitConfig, 0, sizeof (Qmss_InitCfg));
    // Use internal linking RAM
    qmssInitConfig.linkingRAM0Base  =  0;
    qmssInitConfig.linkingRAM0Size  =  0;
    qmssInitConfig.linkingRAM1Base  =  0x0;
    qmssInitConfig.maxDescNum       =  NUM_HOST_DESC+256;

    qmssInitConfig.pdspFirmware[0].pdspId = Qmss_PdspId_PDSP1;
    qmssInitConfig.pdspFirmware[0].firmware = (void *) &acc48_le;
    qmssInitConfig.pdspFirmware[0].size = sizeof (acc48_le);

    // Initialize the Queue Manager
    result = Qmss_init (&qmssInitConfig, qmssGblCfgParams);
    if(result != QMSS_SOK) {
        #if DEBUG_ERRORS
        System_printf("Error initializing Queue Manager SubSystem, Error code : %d\n", result);
        #endif // DEBUG_ERRORS
        return -1;
    }
    // Initialize and setup CPSW Host Descriptors
    memset (gHostDesc, 0, SIZE_HOST_DESC * NUM_HOST_DESC);
    memCfg.descBase         =  (unsigned int *) Convert_CoreLocal2GlobalAddr ((unsigned int) gHostDesc);
    memCfg.descSize         =  SIZE_HOST_DESC;
    memCfg.descNum          =  NUM_HOST_DESC;
    memCfg.manageDescFlag   =  Qmss_ManageDesc_MANAGE_DESCRIPTOR;
    memCfg.memRegion        =  Qmss_MemRegion_MEMORY_REGION0;
    memCfg.startIndex       =  0;

    // Insert Host Descriptor memory region
    result = Qmss_insertMemoryRegion(&memCfg);
    if(result == QMSS_MEMREGION_ALREADY_INITIALIZED) {
        #if DEBUG_ERRORS
        System_printf("Memory Region %d already Initialized \n", memCfg.memRegion);
        #endif // DEBUG_ERRORS
    } else if(result < QMSS_SOK) {
        #if DEBUG_ERRORS
        System_printf("Error: Inserting memory region for Eth %d, Error code : %d\n", memCfg.memRegion, result);
        #endif // DEBUG_ERRORS
        return -1;
    }

    Qmss_start();
    }
    return 0;
}

So when I made all includes and so on for the QMSS the project builds without errors, but during run time, then the core 0 enters #pragma omp zone it just hangs at the lines 228, 229 of the tomp_util.h file :

        while (mysense != barrier->sense)
            tomp_completePendingTasks();

I think I made some mistake with QMSS init but I can't find it by m own. Thank you in adwance.

Best Regards,

Pavlo!

  • Hi Pavlo,

    It is very difficult to comment on this issue. Are you able to run the OpenMP example successfully?

    Thank you.

  • Hi Raja,

    thank you for the interest to this topic. I will share a project with you for more convenience.

    This is a working copy of the project with QMSS init by OpenMP+ the .gel file for DDR3 init. To reproduce the problem, please uncomment the following lines in config file:

    //ompSettings.runtimeInitializesQmss  =  false;    
    //OpenMP.qmssMemRegionIndex = 1;//4; //1
    //OpenMP.qmssFirstDescIdxInLinkingRam = 32;//224; //160
    //Startup.lastFxns.$add('&qmssInitOmp');

    OMP2exam.zip

    ompExam.gel

  • Hi,

    Is there any chance to get this problem resolved anytime soon? Because I'am stuck here and the clock is ticking. Unfortunately there is nothing I can do, because I've done every step according to the OpenMP UG. Thank you in advance.

    Best regards,
    Pavlo!
  • Hi Pavlo,

    Please confirm you have followed the procedure in processors.wiki.ti.com/.../Porting_OpenMP_2.x_to_KeyStone_1 to rebuild PDK:

    - Modify the pdk_C66xx_1_1_2_6/packages/ti/drv/qmss/src/qmss_drv.c file in all three of the PDKs (C6678, C6670, and C6657) to add DATA_SECTION pragmas for the qmssLObj and qmssLObjIsValid variables, like below. Place each DATA_SECTION pragma on the line above each variable declaration.

        #pragma DATA_SECTION (qmssLObj, ".far:local");
        #pragma DATA_SECTION (qmssLObjIsValid, ".far:local");
        
    In addition, under "OpenMP.ddrSize = ddr3.len;", add the followings in your omp_config.cfg file:
         var Cache   = xdc.useModule('ti.sysbios.family.c66.Cache');
        Cache.setMarMeta(msmcNcVirt.base, msmcNcVirt.len, 0);
        Cache.setMarMeta(OpenMP.ddrBase, OpenMP.ddrSize,
                                            Cache.PC|Cache.PFX|Cache.WTE);
        Cache.setMarMeta(OpenMP.msmcBase, OpenMP.msmcSize,
                                            Cache.PC|Cache.PFX|Cache.WTE);
    Which will over-ride the BIOS defaults and ensure that the non-cached region of MSMC is set up correctly, and can resolve the barrier issue you observed.

    Regards,
    Garrett

  • Dear Garrett,

    Thank you for your response. The lines with pragma I have made during the PDK rebuild procedure. I confirm that. I just check qmss_drv.c files in all installed PDK directories, this lines are in place.

    When I tried to add the "cache" lines I'am getting this error :INTERNAL ERROR: QMSS queue operation failed - src/omp_init.c, 182.

    I dig in in that problem and error occurs during region insertion in the tomp_qmss.c file line:

        Qmss_Result region = Qmss_insertMemoryRegion (memRegInfo);

    I had no time to wait for response anymore, so I have solved this problem with writing my own __TI_omp_initialize_rtsc_mode custom function, which is exactly the same as standard one, except custom tomp_initQmss function where I specify and insert all the region I need. So this custom initialize function called instead of __TI_omp_initialize_rtsc_mode from .cfg file and working correctly. So the bottom line is that I'am using autoinit routine with QMSS init function replacement, rather then  ompSettings.runtimeInitializesQmss  =  false.

    Best regards,

    Pavlo!

  • Hi Pavlo,

    Thanks for your update and good to know you have resolved the issue.
    Not sure why you were seeing the "INTERNAL ERROR: QMSS queue operation failed", for your reference, attached is the omp_config.cfg file I modified and verified with your example project and patched PDK QMSS library.

    Regards,
    Garrettomp_config.cfg

  • Pavlo,

    Is your CCS set up to load the OpenMP program on all 4 cores before it starts running on any core? One way to do this is to uncheck the "Run to symbol" functionality in the Debug configuration for all 4 cores.

    _images/omp_disabling_run_to_main.png

    If one of the cores starts running before the other cores have completed loading the program, the CCS loader can overwrite initialized data shared across the cores, leading to undefined behavior from the running core. I have seen the "INTERNAL ERROR: QMSS queue operation failed" occur in this scenario. 

    To avoid this race condition, let the loader complete loading on all cores and then start the cores running.

    Ajay

  • Hi, Ajay,
    I can confirm that your suggested method resolved QMSS init issue.
    So to sum up if we are using all the stuff suggested by Garrett and Ajay we can make the program work. Maybe it should be mentioned in the UG for OpenMP.
    But one more problem occurred. The program run correctly, but at the end I get this exception. Is this related problem?
    6f0
    A14=0x7ff93cff A15=0xd5ebfdfd
    A16=0x0 A17=0xfecc410e
    A18=0x0 A19=0x0
    A20=0xc040100 A21=0xc050200
    A22=0xc0601f0 A23=0x880100a0
    A24=0x88010090 A25=0x88010080
    A26=0x88010070 A27=0x88010050
    A28=0x88000000 A29=0x70
    A30=0x58 A31=0x1
    B0=0x0 B1=0x0
    B2=0x1 B3=0x88013c0d
    B4=0x4 B5=0x4
    B6=0xa000000c B7=0xffffffff
    B8=0xa0000010 B9=0x0
    B10=0x80000000 B11=0x8014
    B12=0x0 B13=0xf8
    B14=0x88014f90 B15=0x1081ffd8
    B16=0x820961 B17=0x40240000
    B18=0x6 B19=0x1d
    B20=0xa0000440 B21=0x82a610
    B22=0x82a610 B23=0x82a610
    B24=0x88010000 B25=0x8800fff8
    B26=0x8800fff0 B27=0xf37f2fb3
    B28=0x88007fdc B29=0xffffffff
    B30=0x25 B31=0xffffffff
    NTSR=0x1000c
    ITSR=0x0
    IRP=0x0
    SSR=0x0
    AMR=0x0
    RILC=0x0
    ILC=0x0
    Exception at 0x88013c2c
    EFR=0x2 NRP=0x88013c2c
    Internal exception: IERR=0x10
    Resource conflict exception
    ti.sysbios.family.c64p.Exception: line 248: E_exceptionMin: pc = 0x88013c2c, sp = 0x1081ffd8.
    To see more exception detail, use ROV or set 'ti.sysbios.family.c64p.Exception.enablePrint = true;'
    xdc.runtime.Error.raise: terminating execution
  • Pavlo,

    The OpenMP documentation at does mention having to load all cores before running, but its not very obvious. I've modified the documentation to add a separate section on "Running OpenMP applications within CCS" for the next release. I've also updated the "Integrating QMSS" section to include the caching setup steps.

    Regarding the exception, I'm unable to reproduce it with your example. I've attached a version of your example modified to perform QMSS initialization via qmssInitOmp function. It works fine on my EVM. Note: I built the example outside of CCS - I've included the Makefile I used to build. 

    AjayopenMP2_guide_exampl.zip

  • Ajay,

    I dig in a little more to that problem and find out that this exception appears when I change the Auto Run options with unchecking the box "On a program load or reset" only for core 0.

    When I have unchecked "On a program load or reset" box for every core from 4 I was using and problem with exception was solved. Sorry I missed that in your previous post

    Best Regards,

    Pavlo!

  • Pavlo,

    Glad that the problem has been resolved. Thanks for the update!

    Ajay