Which APIs to use for working with QMSS?

Other Parts Discussed in Thread: SYSBIOS

Hi,

What is the current state of APIs for writing multicore software using the QMSS?

So far we have had a look at:
- CSL (quite verbose, flexible, not well documented)
- OpenEM (not mature enough, very little documentation, horrible example projects)

Are there any further APIs that allow using the QMSS at a mid to high level, perhaps something positioned between OpenEM and the Chip Support Library?

Does it make sense to evaluate the QMSS transport of SYS/BIOS IPC for writing multithreaded code using the QMSS?
The MCSDK User's Guide mentions that a one-way communication takes about 1.6 kcycles using the QMSS transport and about 2 kcycles using shared memory.
Why is the benefit of QMSS so small (or, asked the other way round: where does the overhead come from)?

Thank you in advance, Clemens

  • Clemens,

    We have drivers that support QMSS running in SYS/BIOS. The drivers are part of the TI MCSDK. Under the directory \mcsdk_installed_directory\packages\ti\drv\qmss you can find the QMSS driver and related examples.

    The SYS/BIOS IPC has two options for its transport: shared memory and QMSS. You can configure your software to use either one as the IPC transport.
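
    Note that the application code looks the same with either transport; the shared-memory vs. QMSS choice is made in the application's .cfg file. As a rough, minimal MessageQ sketch (the queue name, heap ID, and function name below are placeholders, not taken from the MCSDK examples):

    #include <xdc/std.h>
    #include <ti/ipc/MessageQ.h>

    #define REMOTE_QUEUE_NAME  "CORE1_MSGQ"  /* placeholder: created on the receiving core */
    #define MSGQ_HEAP_ID       0             /* placeholder: heap registered via MessageQ_registerHeap() */

    /* Sender side: open the remote queue and post one (empty) message. */
    Void sendOneMessage(Void)
    {
        MessageQ_QueueId remoteQ;
        MessageQ_Msg     msg;

        /* The receiving core creates the queue with MessageQ_create(REMOTE_QUEUE_NAME, ...) */
        while (MessageQ_open(REMOTE_QUEUE_NAME, &remoteQ) < 0) {
            ; /* remote queue not created yet, retry */
        }

        msg = MessageQ_alloc(MSGQ_HEAP_ID, sizeof(MessageQ_MsgHeader));
        if (msg != NULL) {
            MessageQ_put(remoteQ, msg);   /* goes over shared memory or QMSS, depending on the .cfg */
        }
    }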

    Xiaohui

  • You can also use the QMSS (and CPPI, if you are using infrastructure DMA) LLDs, which are in

     c:\ti\pdk_C6670_1_1_2_5\packages\ti\drv\qmss

     c:\ti\pdk_C6670_1_1_2_5\packages\ti\drv\cppi

    They contain example projects and test projects in c:\ti\pdk_C6670_1_1_2_5\packages\ti\drv\exampleProjects.

    Substitute the device and version as appropriate for the MCSDK you have installed.
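
    To give an idea of the level the LLD operates at, here is a stripped-down sketch along the lines of those example projects. It assumes Qmss_init()/Qmss_start() and the descriptor memory-region setup (Qmss_insertMemoryRegion()) have already been done as in the examples, and exact signatures may differ slightly between PDK versions:

    #include <ti/drv/qmss/qmss_drv.h>

    /* myDesc must point into a descriptor region previously registered
     * with Qmss_insertMemoryRegion(); see the exampleProjects for the
     * full setup. Function and variable names here are placeholders. */
    void qmssQueueDemo (void *myDesc)
    {
        Qmss_QueueHnd hQueue;
        uint8_t       isAllocated;
        void         *poppedDesc;

        /* Let the LLD pick any free general-purpose queue */
        hQueue = Qmss_queueOpen (Qmss_QueueType_GENERAL_PURPOSE_QUEUE,
                                 QMSS_PARAM_NOT_SPECIFIED, &isAllocated);

        /* Producer: push the descriptor onto the hardware queue */
        Qmss_queuePushDesc (hQueue, myDesc);

        /* Consumer (typically another core/thread): pop it again.
         * NULL means the queue is empty; the examples also mask off the
         * low-order hint bits of the popped address before using it. */
        poppedDesc = Qmss_queuePop (hQueue);
        (void) poppedDesc;

        Qmss_queueClose (hQueue);
    }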

    Most of the overhead is due to cache writeback/invalidate operations and fences when shared structures are placed in DDR.

    There will be significantly less overhead if you place descriptors/buffers in the L2 of the individual cores, because you won't need the cache writeback/invalidate (and also because L2 is much faster than DDR).

    By using higher-level APIs such as IPC or OpenEM, you don't have to worry about getting the semaphores, cache operations, and fences in the right places, because it's already done for you.
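
    For illustration only, this is roughly the extra work a hand-rolled exchange needs when a descriptor lives in cached DDR (cache calls from the CSL csl_cacheAux.h, the _mfence() intrinsic on C66x; the size and function names are placeholders). None of it is needed when the descriptor sits in the local core's L2 or in a non-cached region:

    #include <c6x.h>                      /* _mfence() intrinsic (C66x)      */
    #include <ti/csl/csl_cacheAux.h>      /* CACHE_wbL2() / CACHE_invL2()    */
    #include <ti/drv/qmss/qmss_drv.h>

    #define DESC_SIZE  64                 /* placeholder descriptor size     */

    /* Producer: make the descriptor contents visible in DDR before the
     * pointer becomes visible to the consumer through the hardware queue. */
    void producerPush (Qmss_QueueHnd hQueue, void *desc)
    {
        CACHE_wbL2 (desc, DESC_SIZE, CACHE_WAIT);  /* write back cached copy   */
        _mfence ();                                /* order writeback vs. push */
        Qmss_queuePushDesc (hQueue, desc);
    }

    /* Consumer: discard any stale cached copy before reading the payload. */
    void *consumerPop (Qmss_QueueHnd hQueue)
    {
        void *desc = Qmss_queuePop (hQueue);

        if (desc != NULL) {
            CACHE_invL2 (desc, DESC_SIZE, CACHE_WAIT);
        }
        return desc;
    }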

  • Most of the overhead is due to cache writeback/invalidate operations and fences when shared structures are placed in DDR.

    I had a look at the IPC benchmark example projects mentioned at http://processors.wiki.ti.com/index.php/BIOS_MCSDK_2.0_User_Guide#Latency_Benchmark_Setup, and those examples place the shared structures either in MSM or in L2. So where does the high latency on the order of 1500-3000 cycles come from?
    Are there one-way latency benchmark results available when using CPPI directly?

    Thanks, Clemens