time consumption of MessageQ Module

Hi,

I tested the time consumption of sending and receiving messages with the MessageQ module on the EVM6678L. Core 0 is the master core, and all eight cores act as slave cores. One complete message exchange consists of the master core sending a message to every slave core and each slave core sending one back to the master. I ran this exchange several times and found that the first exchange takes 23 CPU cycles, while from the second time on each exchange takes roughly 121690 CPU cycles. Why is there such a big difference in cycle count between the first exchange and the later ones? Is there any way to make the later exchanges as cheap as the first one? What about the Notify module? I hear that Notify is faster than MessageQ, but I am not sure how much faster it can be. Can anyone tell me the time consumption of the Notify module?

Regards,

Yang

  • Hi,

    I did some tests on MessageQ and Notify.

    The master (core 0) sends a message to a slave (core 1); the slave then sends the message back to the master.

    I found:

    MessageQ: 16000 cycles

    Notify: 8000 cycles

    I think there must be a problem with your first message exchange; 23 cycles does not seem to be enough... but 121690 seems like far too many! Are you sure the communication is working correctly?

    To do my tests, I started from the examples:

    "IPC and I/O Examples / C6678 Examples / C6678: MessageQ (single image for all cores)"

    "IPC and I/O Examples / C6678 Examples / C6678: Notify (single image for all cores)"

  • Hi Benoit,

    Thank you for your reply. In your test, does the master (core 0) send a message to all eight slave cores, or just to one slave (core 1)?

    I have found that the cycle count for the first message exchange is also 121690, so the number 23 above was wrong. I think the communication works correctly, but the cycle count is too high, as you mentioned. Below is my code. I measure the CPU cycles by reading the TSCH and TSCL registers (a rough sketch of how I read the counter is at the end of this post). Would you please help check my code and see if there is anything inappropriate in it? Thank you very much!

    Regards,

    Yang

    7043.cycletest.rar
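
    In case it helps, here is roughly how I read the counter (the helper name is just for illustration, not necessarily what is in the attached code):

    #include <c6x.h>       /* declares the TSCL / TSCH control registers */
    #include <stdint.h>

    /* Read the free-running 64-bit time-stamp counter.
     * TSCL must be read first: reading it latches TSCH, so the two
     * halves belong to the same instant. */
    static inline uint64_t read_timestamp(void)
    {
        uint32_t lo = TSCL;    /* this read also latches TSCH */
        uint32_t hi = TSCH;
        return ((uint64_t)hi << 32) | lo;
    }

    /* The counter is started by writing any value to TSCL once:
     *
     *   TSCL = 0;                                 // enable the counter
     *   uint64_t t0 = read_timestamp();
     *   ... one complete message exchange ...
     *   uint64_t cycles = read_timestamp() - t0;
     */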

  • In my test, the master sends and receives a message only with core 1, so to send and receive messages with all 8 cores, according to my results, you should find 8 * 16000 = 128000 cycles, and that is almost what you found.

    If MessageQ is too expensive for you, you can try Notify. Unfortunately, it was also too expensive for me.

    To do the communication between cores, I am using flags in MSMCSRAM, which is faster.

  • Hi Benoit,

    Thank you for your fast reply! Yes, MessageQ and Notify are both too expensive for me as well. So I am considering the MSMCSRAM method you recommend, but I don't quite know how to do it.

    Is it also an IPC module? Do I still need to call the Ipc_start() function?

    Or do I just place my data in MSMCSRAM and make the region accessible to all the cores? I mean, the cores are no longer attached to each other, but they can all access the shared MSMCSRAM region, so they can write data to it as well as read data from it.

    Would it be convenient for you to attach some simple code, along with the linker file, to illustrate this method?

    Regards,

    Yang

    With this method, you don't need Ipc_start(). You don't need SYS/BIOS anymore either.

    But you have to be careful about cache coherency. When you tell a core to write something to MSMCSRAM, it will store the value in its cache, not in MSMCSRAM, so you have to tell it to "write back" its cache. (You can try without the cache, but it is very slow...)

    You also have to "invalidate" the cache before reading a shared variable, because once a core has read a value from MSMCSRAM, it keeps the value in its cache and does not fetch it from MSMCSRAM anymore.

    If you want, there is a lot of documentation about the cache in the cache user guide: http://www.ti.com/lit/ug/sprugy8/sprugy8.pdf

    Here is a little example:

    2437.flags in MSMCSRAM exemple.zip
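
    In case the archive cannot be opened, the idea fits in a few lines. The section name ".msmc_flags" is only illustrative and has to be mapped to MSMCSRAM in the linker command file, and this sketch assumes L2 is configured entirely as SRAM (otherwise the L2 cache operations are needed as well):

    #include <ti/csl/csl_cacheAux.h>

    /* A flag shared by two cores, placed in MSMCSRAM and kept on its own
     * L1D cache line (64 bytes on C66x) so that a write-back never drags
     * in neighbouring data. */
    #pragma DATA_SECTION(flag, ".msmc_flags")
    #pragma DATA_ALIGN(flag, 64)
    volatile unsigned int flag = 0;

    /* Writer core: update the flag, then push it out of L1D into MSMCSRAM. */
    void signal_ready(void)
    {
        flag = 1;
        CACHE_wbL1d((void *)&flag, sizeof(flag), CACHE_WAIT);
    }

    /* Reader core: throw away any cached copy before each re-read, otherwise
     * the core keeps returning the stale value from its own L1D. */
    void wait_ready(void)
    {
        do {
            CACHE_invL1d((void *)&flag, sizeof(flag), CACHE_WAIT);
        } while (flag != 1);
    }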

  • Hi Benoit,

    Thank you so much for your detailed answer and the example you provided! It's a great help! I will study this method and try to get it working.

    Regards,

    Yang

  • Hi Benoit,

    Here is one more question. On one slave core, say core 1, I read a whole array that is placed in MSMCSRAM and operate on it. When I write the array back, I only want to write some of its elements, without affecting the others. However, CACHE_wbL1d((void*)&array, CACHE_L1D_LINESIZE, CACHE_WAIT) seems to touch the whole array. I also tried giving a start address together with a write length. For example, to write back the third through the seventh elements of the array, I used CACHE_wbL1d((void*)&(array[2]), 5*sizeof(float), CACHE_WAIT). However, that doesn't work correctly either. So what should I do to modify only part of the array, without touching the other elements, in a write-back operation?

    Regards,

    Yang

  • Hi Benoit and Yang,

    I saw your discussion. Can both of you please confirm which IPC method you used [shared memory based, QMSS based, or SRIO based]?

    Actually, I am using QMSS based IPC after looking at the benchmarks here:

    http://processors.wiki.ti.com/index.php/BIOS_MCSDK_2.0_User_Guide#IPC_Benchmarks

    So, did you have a look at the figures above?

    Thanks

    RC Reddy

  • Hi,

    RC Reddy: I have seen the QMSS, but I have never worked with it. According to those figures, it could be a good solution for the communication.

    Yang: About cache handling, when you write back, you can only write back whole cache lines; you can't select the number of floats you want to write back. You will have to find a way to work around this. Maybe create a smaller array for each slave (shared with the master), so the other slaves never write into it. You can also add some unused padding data to your array so that it fills a whole cache line. For instance, see the sketch below.
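
    For instance, something like this; all the names are only illustrative, and 64 bytes is the L1D line size on C66x:

    #include <ti/csl/csl_cacheAux.h>

    #define NUM_SLAVES      8
    #define L1D_LINE_BYTES  64
    #define FLOATS_PER_SLOT (L1D_LINE_BYTES / sizeof(float))

    /* One slot per slave, each padded to a full cache line and aligned on a
     * line boundary, so a write-back from one slave never touches another
     * slave's data. */
    typedef struct {
        float data[FLOATS_PER_SLOT];
    } SlaveSlot;

    #pragma DATA_SECTION(slots, ".msmc_flags")
    #pragma DATA_ALIGN(slots, 64)
    SlaveSlot slots[NUM_SLAVES];

    /* A slave writes only into its own slot and writes back only that line. */
    void slave_publish(unsigned int coreId, float value)
    {
        slots[coreId].data[0] = value;
        CACHE_wbL1d((void *)&slots[coreId], sizeof(SlaveSlot), CACHE_WAIT);
    }

    /* The master invalidates a slave's slot before reading it back. */
    float master_collect(unsigned int coreId)
    {
        CACHE_invL1d((void *)&slots[coreId], sizeof(SlaveSlot), CACHE_WAIT);
        return slots[coreId].data[0];
    }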

  • Hi Benoit,

    Yeah, I am also considering the method you suggest. I now plan to give each core a dedicated array for the refreshed elements that the core writes back. Then, after all the slave cores have finished writing back, another array, which holds the complete set of elements, will be filled by copying the elements over and then written back to MSMCSRAM.

    Attached are my code and linker file. However, the code doesn't run correctly. The variable u1 is modified on core 1 and then written back to MSMCSRAM; however, when I print it on core 0, it still shows the original value it was initialized with. Would you please help check this simple code and tell me where I went wrong? Thank you!

    Regards,

    Yang

    2018.main.rar

  • Hi All,

    Sorry to interrupt your discussion. I have the following question: are you using QMSS qpend based communication? I am surprised that you are getting such high cycle counts for inter-core communication.

    Thanks

    RC Reddy

  • Hi RC Reddy,

    I think I am using shared-memory based communication, and my code is based on the image processing demo. With my method, sending a message from the master core to all eight cores and receiving the messages sent back by the slave cores costs 121690 CPU cycles. I haven't tried QMSS qpend based communication. Do you know how many cycles it would take?

    Now I am also considering dropping the IPC modules and using flags in MSMCSRAM as suggested by Benoit, but I haven't got that method working yet. Do you have any idea whether QMSS qpend based communication would be superior to the MSMCSRAM method?

    Regards,

    Yang

  • Hi Yang,

    If you look at this post:

    http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/t/200021.aspx

    you can get the actual cycle counts, and in that post I am asking the TI team why the cycles differ from the ones mentioned in the IPC Benchmarks wiki page.

    ======================================================================

    As of now, IPC using QMSS has one driver issue and the TI team is working on it:

    http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/p/193003/696978.aspx#696978

    ======================================================================

    http://e2e.ti.com/support/embedded/bios/f/355/p/151065/553836.aspx#553836

    In the post above, a TI member discusses QMSS qpend based communication and its cost in cycles per message.

    Thanks

    RC Reddy