time consumption of MessageQ Module

Hi,

I tested the time consumption of sending and receiving messages with the MessageQ module on the EVM6678L. Core 0 is the master core, and all eight cores act as slave cores. One complete message exchange consists of the master core sending a message to every slave core and each slave core sending one back to the master. I ran this exchange several times and found that the first exchange takes 23 CPU cycles, while from the second time on each exchange takes roughly 121690 CPU cycles. Why is there such a big difference in cycle count between the first exchange and the later ones? Is there any way to make the later exchanges as cheap as the first one? What about the Notify module? I hear that Notify is faster than MessageQ, but I am not sure how much faster it can be. Can anyone tell me the time consumption of the Notify module?

Regards,

Yang

  • Hi,

    I did some tests on MessageQ and Notify.

    The master (core 0) sends a message to a slave (core 1); the slave then sends the message back to the master.

    I found:

    MessageQ: 16000 cycles

    Notify: 8000 cycles

    I think there must be a problem with your first message exchange; 23 cycles does not seem to be enough... but 121690 seems like far too many! Are you sure the communication is working correctly?

    To do my tests, I started from the examples:

    "IPC and I/O Examples / C6678 Examples / C6678: MessageQ (single image for all cores)"

    "IPC and I/O Examples / C6678 Examples / C6678: Notify (single image for all cores)"

  • Hi Benoit,

    Thank you for your reply. In your test, does the master (core 0) send a message to all eight slave cores, or just to one slave (core 1)?

    I have found that the cycle count for the first message exchange is also 121690, so the number 23 above was wrong. I think the communication works correctly, but the cycle count is too high, as you mentioned. Below is my code. I measure the CPU cycles by reading the TSCH and TSCL registers (a rough sketch of how I read the counter is at the end of this post). Would you please help check my code and see if there is anything inappropriate in it? Thank you very much!

    Regards,

    Yang

    7043.cycletest.rar
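
    In case it helps, here is roughly how I read the counter (the helper name is just for illustration, not necessarily what is in the attached code):

    #include <c6x.h>       /* declares the TSCL / TSCH control registers */
    #include <stdint.h>

    /* Read the free-running 64-bit time-stamp counter.
     * TSCL must be read first: reading it latches TSCH, so the two
     * halves belong to the same instant. */
    static inline uint64_t read_timestamp(void)
    {
        uint32_t lo = TSCL;    /* this read also latches TSCH */
        uint32_t hi = TSCH;
        return ((uint64_t)hi << 32) | lo;
    }

    /* The counter is started by writing any value to TSCL once:
     *
     *   TSCL = 0;                                 // enable the counter
     *   uint64_t t0 = read_timestamp();
     *   ... one complete message exchange ...
     *   uint64_t cycles = read_timestamp() - t0;
     */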

  • In my test, the master sends and receives a message only with core 1, so to send and receive messages with all 8 cores, according to my results, you should find 8 * 16000 = 128000 cycles, and that is almost what you found.

    If MessageQ is too expensive for you, you can try Notify. Unfortunately, it was also too expensive for me.

    To do the communication between cores, I am using flags in MSMCSRAM, which is faster.

  • Hi Benoit,

    Thank you for your fast reply! Yes, MessageQ and Notify are both too expensive for me as well. So I am considering the MSMCSRAM method you recommend, but I don't quite know how to do it.

    Is it also an IPC module? Do I still need to call the Ipc_start() function?

    Or do I just place my data in MSMCSRAM and make the region accessible to all the cores? I mean, the cores are no longer attached to each other, but they can all access the shared MSMCSRAM region, so they can write data to it as well as read data from it.

    Would it be convenient for you to attach some simple code, along with the linker file, to illustrate this method?

    Regards,

    Yang

    With this method, you don't need Ipc_start(). You don't need SYS/BIOS anymore either.

    But you have to be careful about cache coherency. When you tell a core to write something to MSMCSRAM, it will store the value in its cache, not in MSMCSRAM, so you have to tell it to "write back" its cache. (You can try without the cache, but it is very slow...)

    You also have to "invalidate" the cache before reading a shared variable, because once a core has read a value from MSMCSRAM, it keeps the value in its cache and does not fetch it from MSMCSRAM anymore.

    If you want, there is a lot of documentation about the cache in the cache user guide: http://www.ti.com/lit/ug/sprugy8/sprugy8.pdf

    Here is a little example:

    2437.flags in MSMCSRAM exemple.zip
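
    In case the archive cannot be opened, the idea fits in a few lines. The section name ".msmc_flags" is only illustrative and has to be mapped to MSMCSRAM in the linker command file, and this sketch assumes L2 is configured entirely as SRAM (otherwise the L2 cache operations are needed as well):

    #include <ti/csl/csl_cacheAux.h>

    /* A flag shared by two cores, placed in MSMCSRAM and kept on its own
     * L1D cache line (64 bytes on C66x) so that a write-back never drags
     * in neighbouring data. */
    #pragma DATA_SECTION(flag, ".msmc_flags")
    #pragma DATA_ALIGN(flag, 64)
    volatile unsigned int flag = 0;

    /* Writer core: update the flag, then push it out of L1D into MSMCSRAM. */
    void signal_ready(void)
    {
        flag = 1;
        CACHE_wbL1d((void *)&flag, sizeof(flag), CACHE_WAIT);
    }

    /* Reader core: throw away any cached copy before each re-read, otherwise
     * the core keeps returning the stale value from its own L1D. */
    void wait_ready(void)
    {
        do {
            CACHE_invL1d((void *)&flag, sizeof(flag), CACHE_WAIT);
        } while (flag != 1);
    }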

  • Hi Benoit,

    Thank you so much for your detailed answer and the example you provided! It's a great help! I will study this method and try to get it working.

    Regards,

    Yang

  • Hi Benoit,

    Here is one more question. On one slave core, say core 1, I read a whole array that is placed in MSMCSRAM and operate on it. When I write the array back, I only want to write some of its elements, without affecting the others. However, CACHE_wbL1d((void*)&array, CACHE_L1D_LINESIZE, CACHE_WAIT) seems to touch the whole array. I also tried giving a start address together with a write length. For example, to write back the third through the seventh elements of the array, I used CACHE_wbL1d((void*)&(array[2]), 5*sizeof(float), CACHE_WAIT). However, that doesn't work correctly either. So what should I do to modify only part of the array, without touching the other elements, in a write-back operation?

    Regards,

    Yang

  • Hi Benoit and Yang,

    I saw your discussion. Can both of you please confirm which IPC method you used [shared memory based, QMSS based, or SRIO based]?

    Actually, I am using QMSS based IPC after looking at the benchmarks here:

    http://processors.wiki.ti.com/index.php/BIOS_MCSDK_2.0_User_Guide#IPC_Benchmarks

    So, did you have a look at the figures above?

    Thanks

    RC Reddy

  • Hi,

    RC Reddy: I have seen the QMSS, but I have never worked with it. According to those figures, it could be a good solution for the communication.

    Yang: About cache handling, when you write back, you can only write back whole cache lines; you can't select the number of floats you want to write back. You will have to find a way to work around this. Maybe create a smaller array for each slave (shared with the master), so the other slaves never write into it. You can also add some unused padding data to your array so that it fills a whole cache line. For instance, see the sketch below.
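
    For instance, something like this; all the names are only illustrative, and 64 bytes is the L1D line size on C66x:

    #include <ti/csl/csl_cacheAux.h>

    #define NUM_SLAVES      8
    #define L1D_LINE_BYTES  64
    #define FLOATS_PER_SLOT (L1D_LINE_BYTES / sizeof(float))

    /* One slot per slave, each padded to a full cache line and aligned on a
     * line boundary, so a write-back from one slave never touches another
     * slave's data. */
    typedef struct {
        float data[FLOATS_PER_SLOT];
    } SlaveSlot;

    #pragma DATA_SECTION(slots, ".msmc_flags")
    #pragma DATA_ALIGN(slots, 64)
    SlaveSlot slots[NUM_SLAVES];

    /* A slave writes only into its own slot and writes back only that line. */
    void slave_publish(unsigned int coreId, float value)
    {
        slots[coreId].data[0] = value;
        CACHE_wbL1d((void *)&slots[coreId], sizeof(SlaveSlot), CACHE_WAIT);
    }

    /* The master invalidates a slave's slot before reading it back. */
    float master_collect(unsigned int coreId)
    {
        CACHE_invL1d((void *)&slots[coreId], sizeof(SlaveSlot), CACHE_WAIT);
        return slots[coreId].data[0];
    }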

  • Hi Benoit,

    Yeah, I am also considering the method you suggest. I now plan to give each core a dedicated array for the refreshed elements that the core writes back. Then, after all the slave cores have finished writing back, another array, which holds the complete set of elements, will be filled by copying the elements over and then written back to MSMCSRAM.

    Attached are my code and linker file. However, the code doesn't run correctly. The variable u1 is modified on core 1 and then written back to MSMCSRAM; however, when I print it on core 0, it still shows the original value it was initialized with. Would you please help check this simple code and tell me where I went wrong? Thank you!

    Regards,

    Yang

    2018.main.rar

  • Hi All,

    Sorry to interrupt your discussion. I have the following question: are you using QMSS qpend based communication? I am surprised that you are getting such high cycle counts for inter-core communication.

    Thanks

    RC Reddy

  • Hi RC Reddy,

    I think I am using shared-memory based communication, and my code is based on the image processing demo. With my method, sending a message from the master core to all eight cores and receiving the messages sent back by the slave cores costs 121690 CPU cycles. I haven't tried QMSS qpend based communication. Do you know how many cycles it would take?

    Now I am also considering dropping the IPC modules and using flags in MSMCSRAM as suggested by Benoit, but I haven't got that method working yet. Do you have any idea whether QMSS qpend based communication would be superior to the MSMCSRAM method?

    Regards,

    Yang

  • Hi Yang,

    If you look at this post:

    http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/t/200021.aspx

    you can get the actual cycle counts, and in that post I am asking the TI team why the cycles differ from the ones mentioned in the IPC Benchmarks wiki page.

    ======================================================================

    As of now, IPC using QMSS has one driver issue and the TI team is working on it:

    http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/p/193003/696978.aspx#696978

    ======================================================================

    http://e2e.ti.com/support/embedded/bios/f/355/p/151065/553836.aspx#553836

    In the post above, a TI member discusses QMSS qpend based communication and its cost in cycles per message.

    Thanks

    RC Reddy