
[IPC, C6678] Multicast to slave cores via IPC Notify API

Hello,

I'm now checking IPC Notify performance on C6678.
From my investigation, the multicast takes at least 6009 CPU cycles (a much larger number than I expected!) -- Core 0 sends a notification to the slave cores (Cores 1-7), and each slave core acknowledges the notification in its notify handler (ISR context).

Here is a test code for your reference.

sample_ipcnotify_keystone.zip

To try this code in your environment, group all the cores and run the executable on a C6678 EVM.

As you can see in this test code, Notify_sendEvent() is called on the master core once per slave core, and I believe the overhead comes from the Notify API itself.
Please note:

  • The test has been verified with Release build configuration.
  • The last argument of Notify_sendEvent() is FALSE (no wait)
  • L1D/L1P are fully configured as cache and L2 is all SRAM. All code and data fit in L2 except the buffer for receiving acknowledgements from the slave cores, which is placed in MSMC RAM.
  • Because caches are enabled, the result is the average over 100 test iterations.
  • If the MYDEBUG definition is enabled, the benchmark timer starts only after all notifications have been sent. In that case I saw only 267 CPU cycles.
  • RTSC versions (from mcsdk_2_01_02_06)
    • IPC 1.24.3.32
    • SYS/BIOS 6.33.6.50
    • XDCtools 3.23.4.60

Now questions:

  1. Do you think this test result is reasonable? If not, do you have any idea how to get better performance?
  2. Do you have any way to invoke all slave cores with a single Notify call or another IPC API?
  3. Do you have a thinner API than Notify for invoking slave cores?

Best Regards,
Naoki

  • My answers:

    1. Do you think this test result is reasonable? If not, do you have any idea how to get better performance?

    >>> I am not sure. I assume you compiled the test program with full optimization.

    2. Do you have any way to invoke all slave cores with a single Notify call or another IPC API?

    >>> I do not think so. Notify has a single reader, so you need to call Notify_sendEvent() seven times (once for each of the seven slave cores).

    3. Do you have a thinner API than Notify for invoking slave cores?

    >>> No, but you can develop one yourself. Depending on your needs, you can use the IPC registers (see section 3.3.13) to send interrupts to the cores. If you write your own code, you can write to all of the IPC registers from the same routine and save a lot of time. If you need only a limited number of messages, and all messages go from Core 0 to the other cores, you can embed a message ID in bits 4-31 (28 bits) of IPCGRx -- basically, develop your own protocol. For example, you can assign 4 bits to each core, so you can send up to 15 different messages (you need at least one bit set) along with the core ID.

    But again, it depends on your requirements. Did you try MessageQ with QMSS? We have it for the C6678, and it may be faster than the shared-memory solution.

    Ran

  • Hi Ran,

    Thanks for your reply. I had verified the performance of MessageQ with QMSS before posting this thread; it was about 3500 cycles per core-to-core transfer (one-way). For our use case, that means roughly 3500 * 7 cycles, so I then measured a thinner API than MessageQ (i.e., Notify). To get better performance, we would need to consider creating our own IPC implementation.

    Best Regards,
    Naoki 

  • Will you close the thread?
  • Yes, you can regard this thread as closed.
    The numbers I mentioned come from my own experiments, and someone else might get better results, so I will leave the answers marked as suggested (not verified).

    Regards,
    Naoki