Extending IpcBenchmark example to all cores[4 or 8]?

Hi All,

As far as I can see, the IpcBenchmark example is built for two cores only: it is meant to run on exactly two cores, each calling Ipc_attach() to the other with the parameters below. The build generates a <filename>_pe66.c that pulls in the functions configured in the .cfg file, and these are called as part of Ipc_attach().

=================================================================================

/* use IPC over QMSS */
MessageQ.SetupTransportProxy = xdc.useModule(Settings.getMessageQSetupDelegate());
var TransportQmssSetup = xdc.useModule('ti.transport.ipc.qmss.transports.TransportQmssSetup');
MessageQ.SetupTransportProxy = TransportQmssSetup;

TransportQmssSetup.descMemRegion = 0;
Program.global.descriptorMemRegion = TransportQmssSetup.descMemRegion;

Program.global.numDescriptors = 2048;

Program.global.descriptorSize = cacheLineSize; // multiple of cache line size

TransportQmss.numDescriptors = Program.global.numDescriptors;
TransportQmss.descriptorIsInSharedMem = true;
TransportQmss.descriptorSize = Program.global.descriptorSize;
TransportQmss.useAccumulatorLogic = false;
TransportQmss.pacingEnabled = false;
TransportQmss.intThreshold = 1;
TransportQmss.timerLoadCount = 0; // timer ticks. This value only has effect when pacingEnabled is true.
TransportQmss.accuHiPriListSize = 1100; // this number should be >= twice the threshold+2

==============================================================================

The example above covers only two cores, CORE0 and CORE1 (point to point). With the same .cfg settings, can the underlying TransportQmss mechanism also be used for CORE0 to CORE2 or CORE0 to CORE3? (Note that the .cfg settings are identical on all cores, so every core gets the same underlying transport.)

I tried this and could not get the example project to work on more than two cores. What I did:

1. I extended the processor list (procNameList) to four cores.

2. MultiProc now has four cores.

With this in place, attaching CORE0 to CORE1 works, but attaching CORE0 to CORE2 does not. My understanding is that a core can attach to another core over the QMSS transport only once [please refer to http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/p/192645/689667.aspx#689667]. In that case, how do I attach IPC over the QMSS transport from every core to every other core? Please answer with respect to this example: is it possible at the .cfg level?
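To make the "all cores with all cores" requirement concrete, here is a minimal sketch in plain C that only enumerates the connections a full mesh needs. It makes no IPC calls; `mesh_remote_ids` and `mesh_link_count` are hypothetical helper names introduced for illustration and are not part of IPC or the PDK.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of IPC: list every remote core that
 * `self` needs a transport connection to in a full mesh of `num_cores`.
 * Returns the number of ids written to `out` (num_cores - 1). */
size_t mesh_remote_ids(unsigned self, unsigned num_cores,
                       unsigned *out, size_t out_len)
{
    size_t n = 0;
    unsigned id;

    assert(out != NULL);
    assert(self < num_cores);

    for (id = 0; id < num_cores; id++) {
        if (id == self) {
            continue;               /* a core never attaches to itself */
        }
        assert(n < out_len);        /* caller must supply enough room */
        out[n++] = id;
    }
    return n;
}

/* Number of directed core-to-core connections in a full mesh. */
unsigned mesh_link_count(unsigned num_cores)
{
    return (num_cores == 0u) ? 0u : num_cores * (num_cores - 1u);
}
```

For four cores this gives three remote ids per core and twelve directed connections in total; for eight cores it gives 56.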

Thanks

RC Reddy

  • 0602.bench_qmss.cfg

    /* --COPYRIGHT--,BSD
     * Copyright (c) 2011, Texas Instruments Incorporated
     * All rights reserved.
     *
     * Redistribution and use in source and binary forms, with or without
     * modification, are permitted provided that the following conditions
     * are met:
     *
     * *  Redistributions of source code must retain the above copyright
     *    notice, this list of conditions and the following disclaimer.
     *
     * *  Redistributions in binary form must reproduce the above copyright
     *    notice, this list of conditions and the following disclaimer in the
     *    documentation and/or other materials provided with the distribution.
     *
     * *  Neither the name of Texas Instruments Incorporated nor the names of
     *    its contributors may be used to endorse or promote products derived
     *    from this software without specific prior written permission.
     *
     * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
     * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
     * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
     * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
     * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
     * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
     * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
     * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
     * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
     * OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
     * EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
     * --/COPYRIGHT--*/
    
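    #include <string.h>   /* memset */
    
    #include <xdc/std.h>
    #include <xdc/cfg/global.h>
    
    /* XDC.RUNTIME module Headers */
    #include <xdc/runtime/System.h>
    #include <xdc/runtime/IHeap.h>
    #include <xdc/runtime/Timestamp.h>
    #include <xdc/runtime/Types.h>   /* Types_FreqHz, Types_Timestamp64 */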
    
    /* IPC module Headers */
    #include <ti/ipc/Notify.h>
    #include <ti/ipc/Ipc.h>
    #include <ti/ipc/MultiProc.h>
    #include <ti/ipc/MessageQ.h>
    #include <ti/ipc/SharedRegion.h>
    
    /* PDK module Headers */
    #include <ti/platform/platform.h>
    
    /* BIOS6 module Headers */
    #include <ti/sysbios/BIOS.h>
    #include <ti/sysbios/knl/Task.h>   /* Task_yield */
    #include <ti/sysbios/family/c66/Cache.h>
    
    /* CSL modules */
    #include <ti/csl/csl_cacheAux.h>
    #include <ti/csl/csl_chip.h>
    
    /* QMSS LLD*/
    #include <ti/drv/qmss/qmss_drv.h>
    #include <ti/drv/qmss/qmss_firmware.h>
    
    /* CPPI LLD */
    #include <ti/drv/cppi/cppi_drv.h>
     
    #include <ti/transport/ipc/examples/common/bench_common.h>
    
    #include <ti/transport/ipc/qmss/transports/TransportQmss.h>
    
    /************************ EXTERN VARIABLES ********************/
    /* QMSS device specific configuration */
    extern Qmss_GlobalConfigParams  qmssGblCfgParams;
    /* CPPI device specific configuration */
    extern Cppi_GlobalConfigParams  cppiGblCfgParams;
    /**************************************************************/
    
    #define NUM_MONOLITHIC_DESC         numDescriptors
    #define SIZE_MONOLITHIC_DESC        descriptorSize
    #define MONOLITHIC_DESC_DATA_OFFSET 16
    
    #define HEAP_ID         0
    
    /* Number of times to run the loop */
    #define NUMLOOPS        100
    #define NUMIGNORED      (5)
    #define NUM_MSGS        (10)
    
    /* Benchmark parameters */
    Char localQueueName[6];
    Char nextQueueName[6];
    Char prevQueueName[6];
    
    UInt numCores = 0;
    UInt16 prevCoreId;
    
    UInt16 selfId;
    UInt64 timeAdj = 0;
    Types_FreqHz timerFreq, cpuFreq;
    
    /* Results */
    UInt32 rawtimestamps[NUMLOOPS];
    UInt32 latencies[NUMLOOPS - 1];
    
    MessageQ_Handle messageQ = NULL;
    MessageQ_QueueId nextQueueId, prevQueueId;
    
    UInt64 timeLength = 0;
    
    Float cpuTimerFreqRatio;
    Statistics latencyStats;
    
    /* Descriptor pool [Size of descriptor * Number of descriptors] */
    /* place this monolithic descriptor pool in shared memory */
    #pragma DATA_SECTION (monolithicDesc, ".desc");
    #pragma DATA_ALIGN (monolithicDesc, 16)
    UInt8               monolithicDesc[SIZE_MONOLITHIC_DESC * NUM_MONOLITHIC_DESC];
    
    #define NUM_MSGS_TO_PREALLOC (2000)
    
    #pragma DATA_SECTION (txMsgPtrs, ".msgQ_ptrs");
    TstMsg *txMsgPtrs[NUM_MSGS_TO_PREALLOC];
    
    #pragma DATA_SECTION (rxMsgPtrs, ".msgQ_ptrs");
    TstMsg *rxMsgPtrs[NUM_MSGS_TO_PREALLOC];
    
    /**
     *  @b Description
     *  @n  
     *      This function prints the statistics gathered for the transport during
     *      the latency test.
     */
    Void printStatistics()
    {
        UInt32 timeElapsed;
        UInt i;
    
        /* Convert timestamps to CPU time */
        for (i = 0; i < NUMLOOPS; i++) {
            rawtimestamps[i] *= cpuTimerFreqRatio;
        }
        
        for (i = 0; i < NUMLOOPS - 1; i++) {
            latencies[i] = (rawtimestamps[i + 1] - rawtimestamps[i]) / numCores;
        }
    
        getStats(latencies + NUMIGNORED, NUMLOOPS - NUMIGNORED - 2, &latencyStats);
        
        timeElapsed =  rawtimestamps[NUMLOOPS - NUMIGNORED - 2] -
                rawtimestamps[NUMIGNORED];
        /* Throughput = time elapsed divided by total number of hops */
        
        System_printf("======== SYSTEM ATTRIBUTES ======== \n");
        System_printf("Device name:                  %s\n", DEVICENAME);
        System_printf("Processor names:              %s\n", PROCNAMES);
        System_printf("CPU Freq:                     %d MHz\n", 
            cpuFreq.lo / 1000000);
        System_printf("Timer Freq:                   %d MHz\n\n", 
            timerFreq.lo / 1000000);
    
        System_printf("======== BENCHMARK ATTRIBUTES ======== \n");
        System_printf("MessageQ setup delegate:      %s\n", TRANSPORTSETUP);
        System_printf("Number of processors:         %d\n", numCores);
        System_printf("Number of messages received:  %d\n", latencyStats.numVals);
        System_printf("Build profile:                %s\n\n", BUILDPROFILE);
    
        System_printf("======== MESSAGEQ BENCHMARK RESULTS ======== \n");    
        System_printf("Average 1-way latency:        %10d (cycles/msg)           %10d (ns/msg)\n", 
            (UInt32)latencyStats.mean, CYCLES_TO_NS(latencyStats.mean, cpuFreq.lo));
        System_printf("Maximum 1-way latency:        %10d (cycles/msg) (#%5d)  %10d (ns/msg)\n", 
            latencyStats.max, latencyStats.maxIndex, CYCLES_TO_NS(latencyStats.max, cpuFreq.lo));
        System_printf("Minimum 1-way latency:        %10d (cycles/msg) (#%5d)  %10d (ns/msg)\n", 
            latencyStats.min, latencyStats.minIndex, CYCLES_TO_NS(latencyStats.min, cpuFreq.lo)); 
        System_printf("Standard deviation:           %10d (cycles/msg)\n", 
            (UInt32)latencyStats.stddev);
        System_printf("Total time elapsed:           %10d (cycles)     %10d (us)\n",
            timeElapsed, CYCLES_TO_US(timeElapsed, cpuFreq.lo));
    }
    
    /**
     *  @b Description
     *  @n  
     *      This function initializes the platform.  It is called at startup.  This is defined in the
     *      .cfg file via the Startup.firstFxns.$add('&initPlatform'); definition.
     */
    void initPlatform(void)
    {
      platform_init_flags  pFormFlags;
      platform_init_config pFormConfig;
      /* Status of the call to initialize the platform */
      UInt32 pFormStatus;
    
      /* Only run on single core */
      if (CSL_chipReadReg (CSL_CHIP_DNUM) == 0)
      {
        /*
         * You can choose what to initialize on the platform by setting the following
         * flags. Things like the DDR, PLL, etc should have been set by the boot loader.
        */
        memset( (void *) &pFormFlags,  0, sizeof(platform_init_flags));
        memset( (void *) &pFormConfig, 0, sizeof(platform_init_config));
    
        pFormFlags.pll = 0; /* PLLs for clocking  	*/
        pFormFlags.ddr  = 0; /* External memory 		*/
        pFormFlags.tcsl = 1; /* Time stamp counter 	*/
        pFormFlags.phy  = 0; /* Ethernet 			*/
        pFormFlags.ecc  = 0; /* Memory ECC 			*/
    
        pFormConfig.pllm = 0;	/* Use libraries default clock divisor */
    
        pFormStatus = platform_init(&pFormFlags, &pFormConfig);
    
        /* If we initialized the platform okay */
        if (pFormStatus != Platform_EOK)
        {
      	 /* Initialization of the platform failed. */
      	 System_printf("Platform failed to initialize. Error code %d \n", pFormStatus);
        }
      }
    }
    
    /**
     *  @b Description
     *  @n  
     *      This function measures latency by sending a message from core 0 to core 1.
     *      Core 1 relays all received messages back to core 0.  Core 0 measures the roundtrip latency.
     */
    static void measure_latency()
    {
        Int              status;
        UInt numReceived;
        MessageQ_Msg     msg;
    
        System_printf("tsk0. selfproc=%d nextQueueName (%s) opened, nextQueueId=%d\n", CSL_chipReadReg (CSL_CHIP_DNUM), nextQueueName, nextQueueId);
    
        //if (selfId == 0)
        {
            msg = MessageQ_alloc(HEAP_ID, MESSAGE_SIZE_IN_BYTES);
            if (msg == NULL)
            {
               System_abort("MessageQ_alloc failed\n");
            }
    
            System_printf("tsk0. selfProc=%d calling MessageQ_put(nextQueueName=%s). msg=0x%x\n", CSL_chipReadReg (CSL_CHIP_DNUM), nextQueueName, msg);
            /* Kick off the loop */
            status = MessageQ_put(nextQueueId, msg);
            if (status < 0)
            {
                System_abort("MessageQ_put failed\n");
            }
        }
    
        //for (numReceived = 0; numReceived < 1; numReceived++)
        {
        //while (1) {
            /* Get a message */
            status = MessageQ_get(messageQ, &msg, MessageQ_FOREVER);
            if (status < 0)
            {
                System_abort("MessageQ_get failed\n");
            }
    
    //        if (selfId == 0)
    //        {
    //            rawtimestamps[numReceived] = Timestamp_get32();
    //
    //            if (numReceived == NUMLOOPS - 1)
    //            {
    //                printStatistics();
    //
    //                // free the Message.
    //                MessageQ_free(msg);
    //                break;
    //            }
    //        }
    
    //        status = MessageQ_put(nextQueueId, msg);
    //        if (status < 0) {
    //            System_abort("MessageQ_put failed\n");
    //        }
        }
    }
    
    /**
     *  @b Description
     *  @n  
     *      This function allocates all messages to be sent up front on core 0, then synchronizes core 0 and core 1.
     *      Core 0 sends all messages to core 1 in a burst.  Core 1 receives all the messages
     *      and measures the throughput over the time it took to send all the messages.
     */
    static void thruputTxRxPairPreallocFullLoad(void)
    { 
      /* Source to be executed on Core 0 */
      if (selfId == 0)
      {
        Int16 receiveCores[2] = {1, CORE_SYNC_NULL_CORE};  /* Last entry must be CORE_SYNC_NULL_CORE */
        UInt32 numTxMsgs =  NUM_MSGS_TO_PREALLOC;
        Int status;
    #if VERBOSE_MODE    
        UInt32 numSends = 0;
    #endif
        System_printf("\nThroughput via upfront allocation: Allocate all messages up front, sync cores, send all messages from core 0 to core 1\n");
    
        status = allocateMessages(numTxMsgs, HEAP_ID, &txMsgPtrs[0]);
        if (status != 0)
        {
          System_printf("Message Preallocation failed for core %d after allocating %d messages.\n", selfId, status);
          detachAll(MultiProc_getNumProcessors());
          System_exit(0);
        }
    
        /* simplified for now since only one core */
        syncSendCore (&receiveCores[0], messageQ, &nextQueueId, &txMsgPtrs[0], FALSE);
    
        /* Send all messages to core 1.  The last message sent will have a flag signifying to core 1 that
          * all messages have been sent. */
    #if VERBOSE_MODE
        numSends = sendMessages(1, &numTxMsgs, &nextQueueId, &txMsgPtrs[0]);
        System_printf ("Core %d: Sent a total of %d messages.\n", selfId, numSends);
    #else
        sendMessages(1, &numTxMsgs, &nextQueueId, &txMsgPtrs[0]);
    #endif
      }
    
      /* Source to be executed on Core 1 */
      if (selfId == 1)
      {
        Int16 sendCores[2] = {0, CORE_SYNC_NULL_CORE};  /* Last entry must be CORE_SYNC_NULL_CORE */
        UInt32 numReceived = 0;
        UInt32 delay = 0;
        Int status;
        UInt64 timeStamp;
    
    #if VERBOSE_MODE
        System_printf("Core %d: Per message work delay is %dus\n", selfId, delay);
    #endif
    
        /* Synchronize cores prior to starting test */
        syncReceiveCore (&sendCores[0], messageQ, &nextQueueId);
    
        /* Take time at start of test */
        timeStamp = getStartTime64();
    
        numReceived = receiveMessages(&sendCores[0], messageQ, &rxMsgPtrs[0], delay);
    
        /* Get execution time to transfer all messages */
        timeLength = getExecutionTime64(timeStamp, timeAdj);
    
    #if VERBOSE_MODE
        System_printf("Core %d: Received a total of %d messages.\n", selfId, numReceived);
    #endif
    
        /* Calculate throughput over all messages */
        calculateThroughput (numReceived, timeLength, cpuFreq);
    
        /* Free the messages received */
        status = freeMessages(numReceived, &rxMsgPtrs[0]);
        if (status < 0)
        {
          System_printf("Message free failed for Core %d\n", selfId);
        }
      }
    }
    
    /**
     *  @b Description
     *  @n  
     *      This configures the descriptor region and initializes CPPI, and QMSS.
     *      This function should only be called once per chip.
     *
     *  @retval
     *      Success     - 0
     *  @retval
     *      Error       - <0
     */
    Int32 systemInit (Void)
    {
      Qmss_InitCfg qmssInitConfig;   /* QMSS configuration */
      Qmss_MemRegInfo memInfo; /* Memory region configuration information */
      Qmss_Result result;
      UInt32 coreNum;
      
      coreNum = CSL_chipReadReg (CSL_CHIP_DNUM);
    
      System_printf ("\n-----------------------Initializing---------------------------\n");
      
      System_printf ("Core %d : L1D cache size %d. L2 cache size %d.\n", coreNum, CACHE_getL1DSize(), CACHE_getL2Size());
    
      memset ((Void *) &qmssInitConfig, 0, sizeof (Qmss_InitCfg));
      
      /* Set up the linking RAM. Use the internal Linking RAM. 
       * LLD will configure the internal linking RAM address and maximum internal linking RAM size if 
       * a value of zero is specified.
       * Linking RAM1 is not used */
      qmssInitConfig.linkingRAM0Base = 0;
      qmssInitConfig.linkingRAM0Size = 0;
      qmssInitConfig.linkingRAM1Base = 0;
      qmssInitConfig.maxDescNum      = NUM_MONOLITHIC_DESC /*+ total of other descriptors here */;
    
    #ifdef xdc_target__bigEndian
      qmssInitConfig.pdspFirmware[0].pdspId = Qmss_PdspId_PDSP1;
      qmssInitConfig.pdspFirmware[0].firmware = (void *) &acc48_be;
      qmssInitConfig.pdspFirmware[0].size = sizeof (acc48_be);
    #else
      qmssInitConfig.pdspFirmware[0].pdspId = Qmss_PdspId_PDSP1;
      qmssInitConfig.pdspFirmware[0].firmware = (void *) &acc48_le;
      qmssInitConfig.pdspFirmware[0].size = sizeof (acc48_le);
    #endif
    
      /* Initialize Queue Manager SubSystem */
      result = Qmss_init (&qmssInitConfig, &qmssGblCfgParams);
      if (result != QMSS_SOK)
      {
          System_printf ("Error Core %d : Initializing Queue Manager SubSystem error code : %d\n", coreNum, result);
          return -1;
      }
    
      result = Cppi_init (&cppiGblCfgParams);
      if (result != CPPI_SOK)
      {
          System_printf ("Error Core %d : Initializing CPPI LLD error code : %d\n", coreNum, result);
          return -1;
      }
    
      System_printf ("address of monolithicDesc[] = 0x%x. Converted=0x%x\n", monolithicDesc, l2_global_address ((UInt32) monolithicDesc));
    
      /* Setup memory region for monolithic descriptors */
      memset ((Void *) &monolithicDesc, 0, SIZE_MONOLITHIC_DESC * NUM_MONOLITHIC_DESC);
      memInfo.descBase       = (UInt32 *) monolithicDesc;	/* descriptor pool is in MSMC */
      memInfo.descSize       = SIZE_MONOLITHIC_DESC;
      memInfo.descNum        = NUM_MONOLITHIC_DESC;
      memInfo.manageDescFlag = Qmss_ManageDesc_MANAGE_DESCRIPTOR;
      memInfo.memRegion      = (Qmss_MemRegion) descriptorMemRegion;
      memInfo.startIndex     = 0;
    
      result = Qmss_insertMemoryRegion (&memInfo);
      if (result < QMSS_SOK)
      {
          System_printf ("Error Core %d : Inserting memory region %d error code : %d\n", coreNum, memInfo.memRegion, result);
          return -1;
      }
      else
      {
          System_printf ("Core %d : Memory region %d inserted\n", coreNum, result);
      }
    
      /* Writeback the descriptor pool.  Writeback all data cache.
        * Wait until operation is complete. */    
      Cache_wb (monolithicDesc, 
                         SIZE_MONOLITHIC_DESC * NUM_MONOLITHIC_DESC,
                         Cache_Type_ALLD, TRUE);
      
      return 0;
    }
    
    /**
     *  @b Description
     *  @n  
     *      Task which kicks off the latency and throughput tests
     */
    Void tsk0(UArg arg0, UArg arg1)
    {
        Int status;
    
        System_printf("tsk0 starting\n");
    
        /* Register this heap with MessageQ */
        MessageQ_registerHeap((IHeap_Handle)SharedRegion_getHeap(0), HEAP_ID);
    
        /* Open the 'next' remote message queue. Spin until it is ready. */
        do {
            status = MessageQ_open(nextQueueName, &nextQueueId);
            Task_yield();
        }
        while (status < 0);
    
        measure_latency();
    
        thruputTxRxPairPreallocFullLoad();
    
        detachAll(MultiProc_getNumProcessors());
        System_exit(0);
    }
    
    /**
     *  @b Description
     *  @n  
     *      Main - Initialize the system and start BIOS
     */
    Int main(Int argc, Char* argv[])
    {
      Int32 result = 0,status = 0;
      Types_Timestamp64 time64;
      UInt64 timeStamp = 0;
    
      Timestamp_getFreq(&timerFreq);
      System_printf("timerFreq.lo = %d. timerFreq.hi = %d\n", timerFreq.lo, timerFreq.hi);
    
      BIOS_getCpuFreq(&cpuFreq);
      System_printf("cpuFreq.lo = %d. cpuFreq.hi = %d\n", cpuFreq.lo, cpuFreq.hi);
      
      cpuTimerFreqRatio = (Float)cpuFreq.lo / (Float)timerFreq.lo;
    
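      /* Take two back-to-back timestamps; their difference is the overhead of reading the timer */
      Timestamp_get64(&time64);
      timeStamp = TIMESTAMP64_TO_UINT64(time64.hi,time64.lo);
      Timestamp_get64(&time64);
      timeAdj = TIMESTAMP64_TO_UINT64(time64.hi,time64.lo) - timeStamp;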
    
      selfId = CSL_chipReadReg (CSL_CHIP_DNUM);
      
      System_printf("Core (\"%s\") starting\n", MultiProc_getName(selfId));
      
      if (numCores == 0) {
          numCores = MultiProc_getNumProcessors();
      }
    
      if (selfId == 0)
      {
        /* QMSS, and CPPI system wide initializations are run on
          * this core */
        result = systemInit();
        if (result != 0) 
        {
        System_printf("Error (%d) while initializing QMSS\n", result);
        } 
      }
    
      /* Attach all cores. */
      //attachAll(numCores);
      status = Ipc_start();
      if (status < 0) {
          System_abort("Ipc_start failed!\n");
      }
    
         
      prevCoreId = (selfId - 1 + numCores) % numCores;    
    
      System_sprintf(localQueueName, "CORE%d", selfId);
      System_sprintf(nextQueueName, "CORE%d", 
          ((selfId + 1) % numCores));
      System_sprintf(prevQueueName, "CORE%d", prevCoreId);
    
      System_printf("localQueueName=%s. nextQueueName=%s. prevQueueName=%s\n", 
                      localQueueName,  nextQueueName, prevQueueName);
            
      /* Create a message queue. */
      messageQ = MessageQ_create(localQueueName, NULL);    
      if (messageQ == NULL) {
          System_abort("MessageQ_create failed\n" );
      }
    
      BIOS_start();
    
      System_printf("done BIOS_start\n");
    
      return (0);
    }
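The queue-name setup at the end of main() links each core to its ring neighbours: the next core is (selfId + 1) mod numCores and the previous core is (selfId - 1 + numCores) mod numCores. A minimal standalone sketch of that arithmetic follows; the function names `ring_next`, `ring_prev` and `ring_queue_name` are hypothetical and the code uses plain C types rather than the XDC ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Id of the core that receives messages from `self` in the ring. */
unsigned ring_next(unsigned self, unsigned num_cores)
{
    assert(num_cores > 0 && self < num_cores);
    return (self + 1u) % num_cores;
}

/* Id of the core that sends messages to `self` in the ring. */
unsigned ring_prev(unsigned self, unsigned num_cores)
{
    assert(num_cores > 0 && self < num_cores);
    /* Adding num_cores before subtracting avoids unsigned wrap-around
     * when self is 0. */
    return (self + num_cores - 1u) % num_cores;
}

/* Format the queue name "CORE<n>" into buf; returns the snprintf result,
 * i.e. the length the full name needs (excluding the terminator). */
int ring_queue_name(char *buf, size_t len, unsigned core)
{
    return snprintf(buf, len, "CORE%u", core);
}
```

Note that the 6-byte name buffers declared in the attached source hold "CORE0" through "CORE9" including the terminator, which is enough for 4 or 8 cores but not for a two-digit core number.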
    
    

    Hi All,

    After some experiments, my understanding is that

    "IPC QMSS Benchmark cannot be extended to all cores"

    where "all cores" means:

    [Core0 should be on the QMSS transport to Core1, 2, 3]

    [Core1 should be on the QMSS transport to Core0, 2, 3]

    [Core2 should be on the QMSS transport to Core0, 1, 3]

    [Core3 should be on the QMSS transport to Core0, 1, 2]

    I have attached the files I used for the experiments (a modified version of the existing ipcqmssbenchmark example).

    In ROV, transport instances are created only for Core0 to Core1, Core1 to Core0, Core2 to Core0 and Core3 to Core0. In short, ipcqmssbenchmark is strictly point (core) to point (core); the example has no flexibility.

    I had assumed that IPC over QMSS could be extended to all cores, based on the cycle figures published for the IPC benchmark example. Why doesn't TI mention that it is only point (core) to point (core)? That would save a lot of time for many others.

    In the picture above, the TransportQmss instance between Core3 and Core0 has valid parameters, while the instances for Core3 to Core1 and Core3 to Core2 were never set up and are filled with nulls, because of the following piece of code in TransportQmss.c:

    /* Configure and start the QMSS queues - One receive queue created per core.
     * A single free queue will be created for all cores */
    /* START: Once per core configuration */
    if (TransportQmss_module->qmssInitialized == 0)
    {
        /* Increment module variable each time this core attaches to another.
         * Will be used to track when to close this core's socket. */
        TransportQmss_module->qmssInitialized++;
        /* Restore interrupts */
        Hwi_restore(hwiKey);

    If my understanding is wrong, please let me know how to extend IPC over QMSS to all cores. By "all cores" I mean that the following should hold:

    [Core0 should be on the IPC/QMSS transport to Core1, 2, 3]

    [Core1 should be on the IPC/QMSS transport to Core0, 2, 3]

    [Core2 should be on the IPC/QMSS transport to Core0, 1, 3]

    [Core3 should be on the IPC/QMSS transport to Core0, 1, 2]

    I also request that TI extend this example to point to multipoint.

    Thanks

    RC Reddy

  • RC,

    Extending the example to more than two cores should be fairly simple so I tried it quickly and found a bug in the QMSS transport initialization code preventing communication between all cores except Core0 and Core1.   The bug affects both the QPEND and Accumulator configurations of the QMSS transport.  The bug is being tracked and will be resolved in a future MCSDK release.  There is a workaround that allows the Accumulator configuration of the QMSS transport to work properly.  However, the QPEND configuration will still not work.

    Additionally, we're also tracking two more items regarding IPC transport example enhancements for a future MCSDK release.

    1) Extension of all IpcBenchmark examples (SHM, QMSS, and SRIO) to all cores on a target EVM.  A new test will be added which passes messages between all cores on a given device in a round-robin fashion.  This will provide an example of how to setup connections and pass messages between all cores on a device.

    2) A new example project which uses the IPC SRIO transport to send messages between cores on different chips and the IPC QMSS transport to send messages between cores within a chip.  Based on one of your other threads (http://e2e.ti.com/support/embedded/bios/f/355/p/187454/675085.aspx#675085) an example like this will greatly help you and anyone else who is trying to achieve this type of functionality on our EVMs.

    The QMSS transport bugs and additional example features are tracked here:

    https://cqweb.ext.ti.com/cqweb/#/SDO-Web/SDOWP/QUERY/34439429&format=HTML&noframes=false&format=HTML&loginId=readonly&password=&version=cqwj

    The IRs in question are:

    SDOCM00092946 - Cannot communicate between Core 0 and Core 1 using IPC QMSS Transport

    SDOCM00092952 - IPC QMSS Transport QPEND functionality does not work for Core2+

    SDOCM00092953 - Extend Current ipcBenchmark Examples to all cores on a given device

    SDOCM00092954 - Add new IPC Example Project that Uses IPC SRIO for chip to chip and IPC QMSS for core to core

    To work around the bug in the QMSS transport, and allow the use of the Accumulator configuration for all cores, you can do the following:

    1)  To work around the bug affecting the QMSS transport's storing of connection parameters, make the following changes to C:\<c6670_pdk_install_dir>\packages\ti\transport\ipc\qmss\transports\TransportQmss.c.  The attached diff shows the original on the left, and the workaround on the right.

    In summary, the change moves code within the TransportQmss_instance_init function from lines 482 - 487 to line 280.  Essentially, the transport object parameter storage code is moved out of the "once only" portion of the initialization code.  This storage should be done each time the Instance_init function is called.
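The shape of the change described in step 1 can be sketched in standalone C. This is only an illustration of the pattern, assuming a module-wide "initialized" flag and per-instance parameters; `ModuleState`, `TransportInstance` and both functions are hypothetical stand-ins and do not reproduce the real TransportQmss.c code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the module-wide state. */
typedef struct {
    int initialized;   /* nonzero once the once-per-core set-up has run */
    int hw_setups;     /* how many times the shared set-up was performed */
} ModuleState;

/* Hypothetical stand-in for one transport instance (one remote core). */
typedef struct {
    unsigned remote_proc_id;
    int      params_valid;
} TransportInstance;

/* Problem shape: per-instance storage sits inside the once-only block,
 * so only the first instance created on a core gets valid parameters. */
void instance_init_buggy(ModuleState *mod, TransportInstance *inst,
                         unsigned remote)
{
    assert(mod != NULL && inst != NULL);
    if (mod->initialized == 0) {
        mod->hw_setups++;
        mod->initialized = 1;
        inst->remote_proc_id = remote;
        inst->params_valid = 1;
    }
}

/* Workaround shape: store per-instance parameters on every call and
 * keep only the shared set-up inside the once-only block. */
void instance_init_fixed(ModuleState *mod, TransportInstance *inst,
                         unsigned remote)
{
    assert(mod != NULL && inst != NULL);
    inst->remote_proc_id = remote;
    inst->params_valid = 1;
    if (mod->initialized == 0) {
        mod->hw_setups++;
        mod->initialized = 1;
    }
}
```

With the first variant, the second and third instances created on a core keep zeroed parameters, which matches the null-filled instances reported earlier in this thread; with the second variant every instance is populated while the shared set-up still runs once.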

    2) Rebuild the TransportQmss transport

    Open a Windows command window and navigate to:

    >cd c:\<pdk_install_path>\packages\ti\transport\ipc\qmss\transports

    Set up the build environment with the following commands.  Make sure to fill in xx_yy_zz with the BIOS, IPC, and XDC versions used in your environment, and replace the <bios_install_path>, <ipc_install_path>, and <xdc_install_path> placeholders with the directories where your components are installed.

    set XDCPATH=c:\<bios_install_path>\bios_6_xx_yy_zz\packages\
    set XDCPATH=%XDCPATH%;c:\<ipc_install_path>\ipc_1_24_yy_zz\packages\

    rem Path to directory which contains the compiler \bin directory
    set XDCCGROOT=c:\ti\ccsv5\tools\compiler\c6000

    set PATH=%PATH%;c:\<xdc_install_path>\xdctools_3_23_yy_zz\

    rem Use the following command to rebuild the QMSS transport
    > xdc -PR .

    3)  After rebuilding the QMSS transport with the workaround the QPEND configuration will still not work due to an interrupt configuration bug.  So the transport needs to be configured to use the Accumulator queues instead of the QPEND queues.  This is done in the example .cfg file.

    Change

    TransportQmss.numDescriptors = Program.global.numDescriptors;            
    TransportQmss.descriptorIsInSharedMem = true;        
    TransportQmss.descriptorSize = Program.global.descriptorSize;
    TransportQmss.useAccumulatorLogic = false;
    TransportQmss.pacingEnabled = false;
    TransportQmss.intThreshold = 1;
    TransportQmss.timerLoadCount = 0; // timer ticks. This value only has effect when pacingEnabled is true.
    TransportQmss.accuHiPriListSize = 1100;  // this number should be >= twice the threshold+2

    to

    TransportQmss.numDescriptors = Program.global.numDescriptors;            
    TransportQmss.descriptorIsInSharedMem = true;        
    TransportQmss.descriptorSize = Program.global.descriptorSize;
    TransportQmss.useAccumulatorLogic = true;
    TransportQmss.pacingEnabled = false;
    TransportQmss.intThreshold = 1;
    TransportQmss.timerLoadCount = 0; // timer ticks. This value only has effect when pacingEnabled is true.
    TransportQmss.accuHiPriListSize = 1100;  // this number should be >= twice the threshold+2

    4) Extend the example project to all cores on the EVM (c6670 used as an example).  This is done in the .cfg file as well.

    Change

        case "ti.platforms.evm6670":    
            Program.global.USING_C6670 = 1;    
            procNameList = ["CORE0", "CORE1"];
            Program.global.shmBase = 0x0C000000;
            Program.global.shmSize = 0x00050000; /* Sized for greater than 2000 128 byte messageQ messages */
            break;

    to
        case "ti.platforms.evm6670":    
            Program.global.USING_C6670 = 1;    
            procNameList = ["CORE0", "CORE1","CORE2", "CORE3"];
            Program.global.shmBase = 0x0C000000;
            Program.global.shmSize = 0x00050000; /* Sized for greater than 2000 128 byte messageQ messages */
            break;

    5) Rebuild the CCS project. 

    The latency test will send a MessageQ message from Core0 to Core1 to Core2 to Core3 and then back to Core0 in a round-robin fashion.  The throughput part of the example will probably still fail because adding cores increases memory usage in MSMC, but this is a minor error that can most likely be fixed by increasing shmSize.
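As a rough starting point for choosing a larger shmSize, the arithmetic behind the .cfg comment ("sized for greater than 2000 128 byte messageQ messages") can be checked with a small standalone helper. This is a sketch only: it counts message payload bytes and ignores heap bookkeeping, alignment padding and anything else placed in the region, and the thread does not state how usage grows per core. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Payload bytes needed for num_msgs messages of msg_size bytes each.
 * Heap bookkeeping and alignment padding are deliberately ignored. */
uint32_t shm_payload_bytes(uint32_t num_msgs, uint32_t msg_size)
{
    /* Guard against 32-bit overflow of the product. */
    assert(msg_size == 0u || num_msgs <= UINT32_MAX / msg_size);
    return num_msgs * msg_size;
}

/* Nonzero if the payload alone fits in a region of shm_size bytes. */
int shm_payload_fits(uint32_t num_msgs, uint32_t msg_size, uint32_t shm_size)
{
    return shm_payload_bytes(num_msgs, msg_size) <= shm_size;
}
```

With the values from the example, 2000 messages of 128 bytes need 256000 bytes, which fits in the 0x00050000 (327680 byte) region; doubling the message count to 4000 would need 512000 bytes and would not.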

    Justin

  • Hi Justin,

                     Thanks [to guys who will be resolving all 4 bug tickets] for fixing the bugs/issues and for the new example to be added.I have few requests in addition to what Ti has planned

    1. Please don't restrict testing [of fixes/newcode for all four bugs] to c6670, extend it to c6678.

    2. Apart from round-robin fashion [This test is only for IpcqmssBenchmark 4 or 8 cores example], please add following test case [since, i am seeing a issue in shared memory core to core communication, i will confirm it after due testing..this issue doesn't happen in round-robin fashion way of testing].

    Init: All parameters/settings/configs/heaps/MessageQs/CPPI/QMSS are created and initialized [no SRIO init here, since I am talking about SDOCM00092953].

    Step A: Core0 sends messages to all other cores [Core1, Core2 and Core3].

    Step B: Core1 sends messages to all other cores [Core0, Core2 and Core3].

    Step C: Core2 sends messages to all other cores [Core0, Core1 and Core3].

    Step D: Core3 sends messages to all other cores [Core0, Core1 and Core2].

    Step E: Now each MessageQ [on each core] should hold messages from all cores (except itself). At this stage, read the MessageQs [MessageQ_get] iteratively to get each message.

    This test stresses the system in both the vertical [MessageQ length/depth] and horizontal [point-to-multipoint] directions. Here is some pseudo-code:

    for(CoreIndx = 0;CoreIndx < numCores;CoreIndx++)
    {
        if(selfId == CoreIndx)
        {
           continue;
        }

        HeapOpen(CoreIndx);

        MessageQOpen(CoreIndx);

        /* Send a message from CORE<X> [X is the proc on which this code is running] to CORE<CoreIndx> (every core except itself) */
        SendData(CoreIndx);

    }

    ReceiveData();
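
    To make the expected outcome of steps A to E concrete, here is a host-side simulation in plain C. It is only a sketch: it does not call the IPC API, the queues are simple in-memory ring buffers, and all names (SimMsg, SimQueue, sim_put, sim_get, send_to_all_others, run_all_to_all_test) as well as NUM_CORES and QUEUE_DEPTH are assumptions made for illustration. sim_put and sim_get only stand in for MessageQ_put and MessageQ_get.

```c
#include <stdint.h>
#include <string.h>

#define NUM_CORES   4   /* assumed core count (C6670); use 8 for C6678 */
#define QUEUE_DEPTH 8   /* assumed per-core queue depth for this sketch */

typedef struct {
    uint16_t src_core;  /* core that sent the message */
} SimMsg;

typedef struct {
    SimMsg slots[QUEUE_DEPTH];
    int    head;        /* index of the oldest message */
    int    count;       /* number of queued messages */
} SimQueue;

static SimQueue sim_queues[NUM_CORES];  /* one receive queue per core */

/* Stand-in for MessageQ_put: returns 0 on success, -1 if the queue is full. */
static int sim_put(uint16_t dst_core, SimMsg msg)
{
    SimQueue *q = &sim_queues[dst_core];

    if (q->count == QUEUE_DEPTH) {
        return -1;
    }
    q->slots[(q->head + q->count) % QUEUE_DEPTH] = msg;
    q->count++;
    return 0;
}

/* Stand-in for MessageQ_get: returns 0 and fills *msg, or -1 if empty. */
static int sim_get(uint16_t core, SimMsg *msg)
{
    SimQueue *q = &sim_queues[core];

    if (q->count == 0) {
        return -1;
    }
    *msg = q->slots[q->head];
    q->head = (q->head + 1) % QUEUE_DEPTH;
    q->count--;
    return 0;
}

/* Steps A-D for one core: send one message to every core except itself. */
static int send_to_all_others(uint16_t self_id)
{
    uint16_t dst;
    SimMsg msg;

    msg.src_core = self_id;
    for (dst = 0; dst < NUM_CORES; dst++) {
        if (dst != self_id && sim_put(dst, msg) != 0) {
            return -1;
        }
    }
    return 0;
}

/* Runs steps A-E; returns 1 if every core received exactly one message
 * from each of the other cores, 0 otherwise. */
static int run_all_to_all_test(void)
{
    uint16_t core;

    memset(sim_queues, 0, sizeof(sim_queues));

    for (core = 0; core < NUM_CORES; core++) {      /* steps A-D */
        if (send_to_all_others(core) != 0) {
            return 0;
        }
    }

    for (core = 0; core < NUM_CORES; core++) {      /* step E */
        int seen[NUM_CORES] = {0};
        uint16_t src;
        SimMsg msg;

        while (sim_get(core, &msg) == 0) {
            seen[msg.src_core]++;
        }
        for (src = 0; src < NUM_CORES; src++) {
            if (seen[src] != (src == core ? 0 : 1)) {
                return 0;
            }
        }
    }
    return 1;
}
```

    Calling run_all_to_all_test() from a small main() on a PC returns 1 when every queue ends up with NUM_CORES-1 messages, one per sender. On the real target, the same check would have to be done per core after the MessageQ_get loop.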

    3. So far, I haven't received a confirmed reply on whether the producer/consumer SRIO example is bidirectional [http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/t/192639.aspx].

    Please test whether the existing IPC SRIO producer/consumer framework is bidirectional. If it is not, does the example have to be made bidirectional? Please suggest what has to be done to make it bidirectional.

    4. In the new implementation [i.e. IPC over QMSS+SRIO], SRIO has to be made bidirectional [since I assume at this point that QMSS would be point-to-multipoint and full duplex at the same time].

    A. Core0_ChipX ==> bidirectional ==> Core0_ChipY

    Core0_ChipX <== bidirectional <== Core0_ChipY

    B. Core0_ChipX ==> Full duplex <== Core0_ChipY [Since these two chips operate on MessageQ, the expectation is that send/receive can be asynchronous at any point in time, which may result in both chips opening the SRIO simultaneously and sending messages to each other.] [This works with shared-memory inter-core communication, but I haven't tested it with IpcqmssBenchmark.]

    5. Combined testing [for IPC over QMSS + SRIO] of Point 4 (sub-point B) and Point 2.

    What I have requested would stress the system in all directions, and it would be a robust test framework for anyone to use [including myself]. Thanks again for responding to my queries.

    Meanwhile, I will test the fixes [the code which you pasted] and will reply if I find any further issues.

    Thanks

    RC Reddy.

  • Hi Justin,

    I set all the paths and other settings as below, and I am stuck on the following error:

    =====================ERROR=====================================

    C:\Program Files (x86)\Texas Instruments\pdk_C6670_1_0_0_19\packages\ti\transport\ipc\qmss\transports>xdc -PR .
    C:/PROGRA~2/TEXASI~1/xdctools_3_23_01_43/bin/xdcenv: can't open '.xdcenv.mak', because: Permission denied
    xdctools_3_23_01_43\gmake.exe: *** No rule to make target `ûPR'. Stop.

    ====================STEPS====================================

    set XDCPATH = C:\Program Files (x86)\Texas Instruments\bios_6_33_02_31\packages\

    set XDCPATH = %XDCPATH%;C:\Program Files (x86)\Texas Instruments\ipc_1_24_02_27\packages\

    set XDCCGROOT = c:\ti\ccsv5\tools\compiler\c6000

    set PATH = %PATH%;C:\Program Files (x86)\Texas Instruments\xdctools_3_23_01_43\

    xdc –PR .

    I followed the above steps and ran into the permission error. Please let me know of any workaround. I tried enabling read and write permissions and other Windows operating system settings, but everything failed. Kindly help.

    CCS V5 -> 5.1.1.00031 

    MCSDK -> 1.0.0.19

    Windows 7 -> 64 bit operating system 

    Thanks

    RC Reddy

  • RC,

    Windows 7 doesn't like when you try to modify stuff in the c:\Program Files\ or c:\Program Files (x86)\ directories.  Try reinstalling your components into a c:\ti\ directory.  Then try to recompile the transport there.

    Justin

  • Hi Justin,

    1. Alternatively, can you give me already-built packages with these changes? Could I copy those and use them?

    2. Please also suggest any other alternative that is available.

    Thanks

    RC Reddy

  • Hi Justin,

    I tried a few more times and ran into this error. It doesn't look like a permission error.

    c:\Program Files (x86)\Texas Instruments\pdk_C6670_1_0_0_19\packages\ti\transport\ipc\qmss\transports>xdc -PR .
    xdctools_3_23_01_43\gmake.exe: *** No rule to make target `ûPR'. Stop.

    Kindly help in solving this

    Thanks

    RC Reddy

  • Hi All,

    I was able to build with xdc using the following steps:

    step1==========

    goto

    c:\Program Files (x86)\Texas Instruments\pdk_C6670_1_0_0_19\packages\ti\transport\ipc\qmss\transports>

    set XDCPATH = C:\Program Files (x86)\Texas Instruments\bios_6_33_02_31\packages\

    set XDCPATH = %XDCPATH%;C:\Program Files (x86)\Texas Instruments\ipc_1_24_02_27\packages\

    set XDCCGROOT = c:\ti\ccsv5\tools\compiler\c6000

    set PATH = %PATH%;C:\Program Files (x86)\Texas Instruments\xdctools_3_23_01_43\

    In C:\Program Files (x86)\Texas Instruments\xdctools_3_23_01_43\bin,

    set every .exe to "Run this program as administrator" [in the properties tab].

    "xdc clean -- removes all generated files"

    "xdc all -- builds all package files"
    "xdc release -- builds all release archives"
    "xdc test -- builds and runs all tests"
    "xdc .make -- builds just the makefiles"
    "xdc .interfaces -- builds all headers and schema files"
    "xdc .docs -- builds documentation files"
    "xdc .executables -- builds all executables "
    "xdc .dlls -- builds all DLLs"
    "xdc --help -- output make command options"
    "xdc -help -- output xdc command options"

    step2===========

    goto

    c:\Program Files (x86)\Texas Instruments\xdctools_3_23_01_43>

    xdc –PR .

    step3============

    Built the project in CCS.

    Thanks

    RC Reddy

  • Hi Justin,
    I followed the steps you provided EXACTLY and built the ipc-qmss transport, but when I try to
    insert a breakpoint, I don't see it being placed at the proper code in TransportQmss.c. From this
    I understand that the new code is still not being picked up [meaning the build of TransportQmss didn't happen
    properly]. Kindly provide me with the correct steps to build.

    Thanks
    RC Reddy