Which APIs to use for working with QMSS?

Other Parts Discussed in Thread: SYSBIOS

Hi,

What is the current state of APIs for writing multicore software using the QMSS?

So far we have had a look at:
- CSL (quite verbose, flexible, not well documented)
- OpenEM (not mature enough, very little documentation, horrible example projects)

Are there any further APIs that allow using the QMSS at a mid to high level, perhaps something positioned between OpenEM and the Chip Support Library?

Does it make sense to evaluate the QMSS transport of SYS/BIOS IPC for writing multithreaded code using the QMSS?
The MCSDK User's Guide mentions that a one-way communication takes about 1.6 kcycles using the QMSS transport and about 2 kcycles using shared memory.
Why is the benefit of QMSS so small (or, asked the other way round: where does the overhead come from)?

Thank you in advance, Clemens

  • Clemens,

    We have drivers that support QMSS running in SYS/BIOS. The drivers are part of the TI MCSDK. Under the directory \mcsdk_installed_directory\packages\ti\drv\qmss you can find the QMSS driver and related examples.

    The SYS/BIOS IPC has two options for its transport: shared memory and QMSS. You can configure your software to use either one as the IPC transport.
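
    Note that the application code looks the same with either transport; the shared-memory vs. QMSS choice is made in the application's .cfg file. As a rough, minimal MessageQ sketch (the queue name, heap ID, and function name below are placeholders, not taken from the MCSDK examples):

    #include <xdc/std.h>
    #include <ti/ipc/MessageQ.h>

    #define REMOTE_QUEUE_NAME  "CORE1_MSGQ"  /* placeholder: created on the receiving core */
    #define MSGQ_HEAP_ID       0             /* placeholder: heap registered via MessageQ_registerHeap() */

    /* Sender side: open the remote queue and post one (empty) message. */
    Void sendOneMessage(Void)
    {
        MessageQ_QueueId remoteQ;
        MessageQ_Msg     msg;

        /* The receiving core creates the queue with MessageQ_create(REMOTE_QUEUE_NAME, ...) */
        while (MessageQ_open(REMOTE_QUEUE_NAME, &remoteQ) < 0) {
            ; /* remote queue not created yet, retry */
        }

        msg = MessageQ_alloc(MSGQ_HEAP_ID, sizeof(MessageQ_MsgHeader));
        if (msg != NULL) {
            MessageQ_put(remoteQ, msg);   /* goes over shared memory or QMSS, depending on the .cfg */
        }
    }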

    Xiaohui

  • You can also use the QMSS (and CPPI, if you are using infrastructure DMA) LLDs, which are in

     c:\ti\pdk_C6670_1_1_2_5\packages\ti\drv\qmss

     c:\ti\pdk_C6670_1_1_2_5\packages\ti\drv\cppi

    They contain example projects and test projects in c:\ti\pdk_C6670_1_1_2_5\packages\ti\drv\exampleProjects.

    Substitute the device and version as appropriate for the MCSDK you have installed.
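
    To give an idea of the level the LLD operates at, here is a stripped-down sketch along the lines of those example projects. It assumes Qmss_init()/Qmss_start() and the descriptor memory-region setup (Qmss_insertMemoryRegion()) have already been done as in the examples, and exact signatures may differ slightly between PDK versions:

    #include <ti/drv/qmss/qmss_drv.h>

    /* myDesc must point into a descriptor region previously registered
     * with Qmss_insertMemoryRegion(); see the exampleProjects for the
     * full setup. Function and variable names here are placeholders. */
    void qmssQueueDemo (void *myDesc)
    {
        Qmss_QueueHnd hQueue;
        uint8_t       isAllocated;
        void         *poppedDesc;

        /* Let the LLD pick any free general-purpose queue */
        hQueue = Qmss_queueOpen (Qmss_QueueType_GENERAL_PURPOSE_QUEUE,
                                 QMSS_PARAM_NOT_SPECIFIED, &isAllocated);

        /* Producer: push the descriptor onto the hardware queue */
        Qmss_queuePushDesc (hQueue, myDesc);

        /* Consumer (typically another core/thread): pop it again.
         * NULL means the queue is empty; the examples also mask off the
         * low-order hint bits of the popped address before using it. */
        poppedDesc = Qmss_queuePop (hQueue);
        (void) poppedDesc;

        Qmss_queueClose (hQueue);
    }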

    Most of the overhead is due to cache writeback/invalidate operations and fences when shared structures are placed in DDR.

    There will be significantly less overhead if you place descriptors/buffers in the L2 of the individual cores, because you won't need the cache writeback/invalidate (and also because L2 is much faster than DDR).

    By using higher-level APIs such as IPC or OpenEM, you don't have to worry about getting the semaphores, cache operations, and fences in the right places, because it's already done for you.
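
    For illustration only, this is roughly the extra work a hand-rolled exchange needs when a descriptor lives in cached DDR (cache calls from the CSL csl_cacheAux.h, the _mfence() intrinsic on C66x; the size and function names are placeholders). None of it is needed when the descriptor sits in the local core's L2 or in a non-cached region:

    #include <c6x.h>                      /* _mfence() intrinsic (C66x)      */
    #include <ti/csl/csl_cacheAux.h>      /* CACHE_wbL2() / CACHE_invL2()    */
    #include <ti/drv/qmss/qmss_drv.h>

    #define DESC_SIZE  64                 /* placeholder descriptor size     */

    /* Producer: make the descriptor contents visible in DDR before the
     * pointer becomes visible to the consumer through the hardware queue. */
    void producerPush (Qmss_QueueHnd hQueue, void *desc)
    {
        CACHE_wbL2 (desc, DESC_SIZE, CACHE_WAIT);  /* write back cached copy   */
        _mfence ();                                /* order writeback vs. push */
        Qmss_queuePushDesc (hQueue, desc);
    }

    /* Consumer: discard any stale cached copy before reading the payload. */
    void *consumerPop (Qmss_QueueHnd hQueue)
    {
        void *desc = Qmss_queuePop (hQueue);

        if (desc != NULL) {
            CACHE_invL2 (desc, DESC_SIZE, CACHE_WAIT);
        }
        return desc;
    }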

  • Most of the overhead is due to cache writeback/invalidate operations and fences when shared structures are placed in DDR.

    I had a look at the IPC benchmark example projects mentioned at http://processors.wiki.ti.com/index.php/BIOS_MCSDK_2.0_User_Guide#Latency_Benchmark_Setup, and those examples place the shared structures either in MSM or in L2. So where does the high latency on the order of 1500-3000 cycles come from?
    Are there one-way latency benchmark results available when using CPPI directly?

    Thanks, Clemens