
RTOS/AM5728: RPMessage SWI Execution Time

Part Number: AM5728
Other Parts Discussed in Thread: SYSBIOS

Tool/software: TI-RTOS

AM5728 DSP running TI-RTOS on a Custom board.  Processor speed is 750 MHz.

PDK 1.0.5, IPC 3.44.0, XDC 3.32.1.22, BIOS 6.46.1.38

I've been using the Execution Graph tool to study the timeline of our project.  We have a hard real-time requirement to service a sample-data interrupt from an FPGA every 71 usec.  We are using MessageQ via IPC to communicate from the DSP to the Linux ARM host.


The system runs along fine until we send a message from the ARM to the DSP.  The picture below was captured showing the timeline of the processing.  Looking at the left side of the picture, Hwi.82633508 (blue bar at the top) is the realtime interrupt every 71 usec.  After that Hwi, the fpgaDataThread must run to completion; this normally takes 7-8 usec.  As you can see, after fpgaDataThread finishes, LogUtilThread runs, which is the lowest-priority thread in the system.  So that is good.

Moving now to the second execution of Hwi.82633508 (middle of the picture): after the Hwi, the fpgaDataThread does start to run, but then it gets interrupted by the EventCombiner Hwi, which must be the interrupt caused by the message arriving from the host.  X2-X1 is the Hwi time, or 15 usec (which seems like a huge amount of time for a Hwi).
After the Hwi runs, the RPMessage_swiFxn Swi runs.  Since all Swi's are higher priority than all Tasks, the RPMessage_swiFxn Swi runs to completion.  This takes (X4-X3) + (X6-X5), or 52 usec.  Considering the message is 4 32-bit words long and results in no actual processing in the Swi itself, 52 usec seems a very excessive amount of time to lock up the processor just to receive a short message.  If you add the time for the Hwi, we are at 67 usec out of 71 usec just for the receiving of the message.
As you can also see, during the RPMessage Swi we get another Hwi.82633508 interrupt, so we essentially miss the real-time data interrupt since we never finished the previous processing.

Hopefully, I have some unneeded logging enabled or some other issue, or we might need to abandon MessageQ for a more streamlined messaging implementation.

  • The RTOS team has been notified. They will respond here.
  • Update: I made some changes to my project and the .cfg file and was able to reduce the Hwi and Swi execution times a bit, which has improved the situation some, but not fixed it. The Hwi time went from 15 to 12 usec and the Swi went from 52 to 41 usec.

    I changed the RTSC build profile from debug to release.

    I added the following lines to the .cfg file:

    BIOS.libType = BIOS.LibType_Custom;

    var Diags = xdc.useModule("xdc.runtime.Diags");
    var Defaults = xdc.useModule('xdc.runtime.Defaults');
    Defaults.common$.diags_ASSERT = Diags.ALWAYS_OFF;
    Defaults.common$.logger = null;

    Maybe there are some additional settings I could apply to lean out the Hwi and Swi processing even more?
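    So far, the candidates I've found in the SYS/BIOS configuration reference are the following (untested on this project; option names per BIOS 6.46):

```javascript
/* Candidate .cfg additions to trim kernel overhead.
 * Untested here -- verify each against the SYS/BIOS 6.46 config reference. */
var BIOS = xdc.useModule('ti.sysbios.BIOS');
var Hwi  = xdc.useModule('ti.sysbios.hal.Hwi');
var Task = xdc.useModule('ti.sysbios.knl.Task');

BIOS.assertsEnabled = false;   /* compile Assert checks out of the custom lib */
BIOS.logsEnabled    = false;   /* compile Log calls out of the custom lib */
Hwi.checkStackFlag  = false;   /* skip ISR stack checking in the dispatcher */
Task.checkStackFlag = false;   /* skip task stack checking on context switches */
```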

    I was interested in section 2.3.9, Thread Synchronization, of the IPC User's Guide. Could I maybe change it from using the SWI to a Task? The MessageQ messages do not need to be handled in real time, so making the interface slower to respond is probably a good thing.

    Also, any way we can reduce the time of the Hwi in general would be a good thing. My HWI's do very little work and usually just post a semaphore to a task. It would be nice for the HWI dispatcher to be as lean as possible. Right now, HWI's are on the order of 5-6 usec, which accounts for almost 10% of my timeline just for the HWI. Because they post a semaphore, I was not able to use the HWI_plug() approach to avoid the HWI dispatcher.
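    One set of dispatcher knobs I've seen mentioned is in ti.sysbios.family.c64p.Hwi (hedged: I have not measured any of these yet, and dispatcherTaskSupport has to stay enabled because my ISRs post semaphores):

```javascript
/* Dispatcher feature trimming -- untested sketch, check the BIOS docs.
 * dispatcherTaskSupport must remain true: our ISRs post semaphores. */
var Hwi = xdc.useModule('ti.sysbios.family.c64p.Hwi');

Hwi.dispatcherAutoNestingSupport = false; /* no nested-interrupt bookkeeping */
Hwi.dispatcherIrpTrackingSupport = false; /* drop interrupt-return-pointer tracking */
/* Only safe if no ISR posts a Swi -- IPC's RPMessage does post one,
 * so this one likely has to stay true in our system: */
// Hwi.dispatcherSwiSupport = false;
```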
  • Chris,

    It appears you have read the wiki (processors.wiki.ti.com/.../Optimizing_IPC_Applications) to change BIOS.libType. Have you also tried building the SYS/BIOS libraries with various BIOS.customCCOpts options (-o3 should be a must), as the wiki explains?
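    For example, in the .cfg you can inspect the defaults at build time and then adjust them (illustrative; the exact option string depends on your target and BIOS version):

```javascript
/* Inspect and tune the compiler options used for the custom BIOS library */
var BIOS = xdc.useModule('ti.sysbios.BIOS');
BIOS.libType = BIOS.LibType_Custom;

/* Prints the current options in the build log */
print("customCCOpts = " + BIOS.customCCOpts);

/* e.g. remove -g/--optimize_with_debug so -o3 can optimize fully
 * (illustrative edit -- confirm against the wiki for your release) */
BIOS.customCCOpts = BIOS.customCCOpts.replace(" -g --optimize_with_debug", "");
```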

    For HWI_plug(), the thread (e2e.ti.com/.../372449) has an example C file and .cfg file that may help.

    For MessageQ threading model (SWI vs. thread), let me check internally and then get back to you.

    Regards,
    Garrett
  • Thank you for the reply Garrett.

    I had not looked at the CCOpts, but I did just now. As instructed I printed the existing options and I got this:

    customCCOpts = -mv6600 --abi=eabi -q -mi10 -mo -pdr -pden -pds=238 -pds=880 -pds1110 --program_level_compile -o3 -g --optimize_with_debug

    Since -o3 is already present, as is -mi10, I don't see anything else obvious to try. Can you recommend anything?

    As for HWI_plug, I did read that thread and tried to implement HWI_plug last week. Since my HWI's call sem_post to trigger a thread to run, HWI_plug did not seem to be a viable option, and it did indeed crash my code very effectively. LOL. Based on the comment in the thread, "Note that if your ISR needs to trigger a SWI or a semaphore, then you're better off sticking with a HWI (and in the case of semaphores you have no choice but to do it that way, according to the docs)," I gave up on that approach. It would seem that HWI_plug basically bypasses BIOS, so if you then call a BIOS function, i.e. sem_post, which triggers a BIOS task switch, you are effectively running BIOS inside your ISR, and boom!

    I think the real solution here would be a thread to handle MessageQ versus a SWI, so I hope you can find something out about that.

    Thanks,

    Chris
  • This is a follow-on to this post: https://e2e.ti.com/support/arm/sitara_arm/f/791/t/604259 (same board and software versions as above).

    I've been using the Execution Graph tool to study the timeline of our project.  We have a hard real-time requirement to service 14 sample-data interrupts from an FPGA every millisecond.

    As seen in the execution graph below, the Hwi execution time is about 9 usec.  For the 14 interrupts per 1 msec period, this adds up to 126 usec, or 12.6% of the processor time.  That is a large amount of time to give up just to Hwi processing.

    This Hwi is the result of an MSI interrupt from the PCIe subsystem.  The ISR is about as simple as one could hope for: clear the pending bits in the PCIe MSI registers and post a mailbox message to the fpgaDataThread, which, as you can see in the picture, runs immediately after the Hwi.  I have noticed that any changes I make to the ISR code do not affect the Hwi time.  Based on another thread, I moved all Hwi processing code from DDR3 to L2SRAM using the syntax,

    Program.sectMap[".hwi: { *.*(.text:*ti_sysbios*_Hwi_*) }"] = "L2SRAM";

    This did move the code to L2SRAM, but I did not see any change in Hwi execution time.  Very disappointing.

    Since I am posting a message to a Mailbox, using Hwi_plug() is not really an option.  So I am looking for suggestions on how to reduce the Hwi execution time.  9 usec seems excessive, but I am not sure exactly where all that time is being spent.

  • Chris,

    I discussed your application timing requirement internally with the team. It doesn't seem we have a quick solution for the RPMessage SWI execution time. Would it be possible for you to simplify the application so we can reproduce the issue on an AM572x GP EVM?

    >> It would seem that HWI_plug basically bypasses BIOS, and if you then call a BIOS function, ie sem_post, which triggers a BIOS task switch
    I think this makes sense and Hwi_plug() is not applicable in the scenario.

    In general the code execution seems slow. Are only the Hwi_ functions running from L2SRAM? Also, do you have the L2 cache enabled? Fine-tuning the cache configuration may help.
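    If you want to pin the cache sizes explicitly in the .cfg, something like the following works (a sketch only; the sizes are examples and must be checked against your L1/L2 SRAM usage):

```javascript
/* Example cache sizing for the C66x DSP -- sizes are illustrative only */
var Cache = xdc.useModule('ti.sysbios.family.c66.Cache');

Cache.initSize.l1pSize = Cache.L1Size_32K;  /* all of L1P as cache */
Cache.initSize.l1dSize = Cache.L1Size_32K;  /* all of L1D as cache */
Cache.initSize.l2Size  = Cache.L2Size_128K; /* part of L2 as cache, rest SRAM */
```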

    Regards,
    Garrett
  • Garrett,

    I will have to investigate how we can give you enough code, both ARM and DSP, to replicate the issue.  I have a DSP test build; I am not sure if we can make an ARM build.  In any case, I suspect it would have to be sent directly to TI, i.e. not as an attachment on this thread.

    About execution time.  We have begun to question this in general.  We need to explore this more.

    The following is from my map file showing what functions (including Hwi) are in L2SRAM:

    .hwi       0    00805120    00001500    

                     00805120    00000260     sysbios.ae66 : BIOS.obj (.text:ti_sysbios_family_c64p_Hwi_dispatchCore__I)

                     00805380    00000010                  : BIOS.obj ($Tramp$S$$ti_sysbios_knl_Swi_restoreHwi__E)

                     00805390    00000010     ti.targets.rts6000.ae66 : Error.oe66 ($Tramp$S$$xdc_runtime_Error_raiseX__E)

                     008053a0    00000220     sysbios.ae66 : BIOS.obj (.text:ti_sysbios_family_c64p_Hwi_reconfig__E)

                     008055c0    00000010     ti.targets.rts6000.ae66 : Error.oe66 ($Tramp$S$$xdc_runtime_Error_check__E)

                     008055d0    00000010                             : Core-mem.oe66 ($Tramp$S$$xdc_runtime_Core_createObject__I)

                     008055e0    00000200     sysbios.ae66 : c64p_Hwi_disp_always.obj (.text:_ti_sysbios_family_c64p_Hwi_dispatchAlways)

                     008057e0    00000140                  : BIOS.obj (.text:ti_sysbios_family_c64p_Hwi_Instance_init__E)

                     00805920    00000010     ti.targets.rts6000.ae66 : Core-mem.oe66 ($Tramp$S$$xdc_runtime_Core_deleteObject__I)

                     00805930    00000010     sysbios.ae66 : BIOS.obj ($Tramp$S$$ti_sysbios_knl_Task_schedule__I)

                     00805940    00000140                  : BIOS.obj (.text:ti_sysbios_family_c64p_Hwi_Module_startup__E)

                     00805a80    00000100                  : BIOS.obj (.text:ti_sysbios_family_c64p_Hwi_Instance_finalize__E)

                     00805b80    000000c0     VailLTEDsp1_pe66.oe66 (.text:ti_sysbios_family_c64p_Hwi_Object__create__S)

                     00805c40    00000020     sysbios.ae66 : BIOS.obj ($Tramp$S$$ti_sysbios_family_c66_Cache_wbInv__E)

                     00805c60    000000c0     VailLTEDsp1_pe66.oe66 (.text:ti_sysbios_family_c64p_Hwi_create)

                     00805d20    000000a0     sysbios.ae66 : c64p_Hwi_asm.obj (.text:_ti_sysbios_family_c64p_Hwi_plug__E)

                     00805dc0    00000020                  : BIOS.obj ($Tramp$S$$ti_sysbios_family_c66_Cache_invL1pAll__E)

                     00805de0    00000010     ti.targets.rts6000.ae66 : Core-smem.oe66 ($Tramp$S$$xdc_runtime_Core_constructObject__I)

                     00805df0    00000010                             : Core-params.oe66 ($Tramp$S$$xdc_runtime_Core_assignParams__I)

                     00805e00    000000a0     sysbios.ae66 : BIOS.obj (.text:ti_sysbios_family_c64p_Hwi_dispatchC__I)

                     00805ea0    00000020     --HOLE-- [fill = 0]

                     00805ec0    000000a0                  : BIOS.obj (.text:ti_sysbios_family_c64p_Hwi_eventMap__E)

                     00805f60    000000a0                  : BIOS.obj (.text:ti_sysbios_family_c64p_Hwi_getStackInfo__E)

                     00806000    000000a0     VailLTEDsp1_pe66.oe66 (.text:ti_sysbios_hal_Hwi_construct)

                     008060a0    00000020     --HOLE-- [fill = 0]

                     008060c0    00000080     sysbios.ae66 : BIOS.obj (.text:ti_sysbios_hal_Hwi_checkStack)

                     00806140    00000080                  : BIOS.obj (.text:ti_sysbios_hal_Hwi_initStack)

                     008061c0    00000060                  : c64p_Hwi_asm_switch.obj (.text:_ti_sysbios_family_c64p_Hwi_switchAndDispatch__I)

                     00806220    00000060                  : c64p_Hwi_asm_switch.obj (.text:_ti_sysbios_family_xxx_Hwi_switchAndRunFunc)

                     00806280    00000060     VailLTEDsp1_pe66.oe66 (.text:ti_sysbios_family_c64p_Hwi_Object__delete__S)

                     008062e0    00000060     sysbios.ae66 : BIOS.obj (.text:ti_sysbios_hal_Hwi_Instance_init__E)

                     00806340    00000040                  : BIOS.obj (.text:ti_sysbios_family_c64p_Hwi_disableInterrupt__E)

                     00806380    00000040                  : BIOS.obj (.text:ti_sysbios_family_c64p_Hwi_enableInterrupt__E)

                     008063c0    00000040     VailLTEDsp1_pe66.oe66 (.text:ti_sysbios_hal_Hwi_HwiProxy_create)

                     00806400    00000040     VailLTEDsp1_pe66.oe66 (.text:ti_sysbios_hal_Hwi_Object__destruct__S)

                     00806440    00000020     VailLTEDsp1_pe66.oe66 (.text:ti_sysbios_family_c64p_Hwi_Module__startupDone__F)

                     00806460    00000020     VailLTEDsp1_pe66.oe66 (.text:ti_sysbios_family_c64p_Hwi_Params__init__S)

                     00806480    00000020     --HOLE-- [fill = 0]

                     008064a0    00000020     sysbios.ae66 : BIOS.obj (.text:ti_sysbios_family_c64p_Hwi_clearInterrupt__E)

                     008064c0    00000020                  : BIOS.obj (.text:ti_sysbios_family_c64p_Hwi_startup__E)

                     008064e0    00000020                  : BIOS.obj (.text:ti_sysbios_family_c64p_Hwi_switchFromBootStack__E)

                     00806500    00000020                  : BIOS.obj (.text:ti_sysbios_family_c64p_Hwi_unPluggedInterrupt__I)

                     00806520    00000020     VailLTEDsp1_pe66.oe66 (.text:ti_sysbios_hal_Hwi_HwiProxy_Object__create__S)

                     00806540    00000020     VailLTEDsp1_pe66.oe66 (.text:ti_sysbios_hal_Hwi_HwiProxy_delete)

                     00806560    00000020     VailLTEDsp1_pe66.oe66 (.text:ti_sysbios_hal_Hwi_HwiProxy_disable__E)

                     00806580    00000020     VailLTEDsp1_pe66.oe66 (.text:ti_sysbios_hal_Hwi_HwiProxy_restore__E)

                     008065a0    00000020     sysbios.ae66 : BIOS.obj (.text:ti_sysbios_hal_Hwi_Instance_finalize__E)

                     008065c0    00000020                  : BIOS.obj (.text:ti_sysbios_hal_Hwi_Module_startup__E)

                     008065e0    00000020     VailLTEDsp1_pe66.oe66 (.text:ti_sysbios_hal_Hwi_Params__init__S)

                     00806600    00000020     VailLTEDsp1_pe66.oe66 (.text:ti_sysbios_hal_Hwi_destruct)

    .noncached_Program_Data  

    *          0    00806620    00000140    

                     00806620    000000c0     AM57xxPCIe.obj ( .noncached_Program_Data )

                     008066e0    00000010     ti.osal.ae66 : HwiP_tirtos.oe66 ($Tramp$S$$HwiP_disable)

                     008066f0    00000010     LogUtil.obj ($Tramp$S$$_ZN7LogUtil6logMsgEjPcz)

                     00806700    00000010     ti.drv.pcie.ae66 : pcie.oe66 ($Tramp$S$$Pcie_getPendingFuncInts)

                     00806710    00000010                      : pcie.oe66 ($Tramp$S$$Pcie_clrPendingFuncInts)

                     00806720    00000010     AM57xxPCIe.obj ($Tramp$S$$_ZN10AM57xxPCIe8pcieReadEj)

                     00806730    00000010     AM57xxPCIe.obj ($Tramp$S$$_ZN10AM57xxPCIe9pcieWriteEjj)

                     00806740    00000010     SystemEventManager.obj ($Tramp$S$$_ZN18SystemEventManager6handleE15SystemInterrupt)

                     00806750    00000010     --HOLE-- [fill = 00000000]

    I do believe I have L2 Caching enabled.  I have the following platform definition and have not disabled any cache memory regions.

  • Following up on the original post about the RPMessage_swiFxn() execution time.  We are pulling together an EVM version of the ARM and DSP code to allow TI to see the timing we are seeing.  Here is a capture of the execution times for the HWI and SWI when receiving a message from the ARM on the DSP via MessageQ.  While this is faster than on our custom board, it still seems very slow.  24 usec for the SWI seems an eternity for what is a very small (4 32-bit words) message.  The code that runs in runTskMsgQRx copies the payload from the message queue and forwards it into our message processor, which happens much, much faster than the HWI and SWI.  Having the SWI block out all other tasks for 24 usec is a big issue when we have a very full 71 usec processing budget.

    If we could send the ARM and DSP code/images to you offline, that would be best.  If someone could email me directions directly, that would be good.

    Thanks,  Chris

  • Chris, you can send me your EVM version code by email or through TI representative working with you. - Garrett
  • Yes, I can do that. Probably next week. We'll provide an ARM app and DSP code. You should be able to reproduce my last picture.

    In a meeting about this the other day, it was suggested that maybe TI could add instrumentation to the MessageQ example code provided, ex02_messageq, and see what the timing is of the RPMessage_swiFxn()?
  • After further discussion of the issues with sending our EVM code, we'd like to avoid that if possible. This thread contains two issues: 1) why does a HWI take so long, and 2) why does the RPMessage SWI take so long.

    1) The HWI takes about 9 usec to execute. That adds up to 12% of our timeline. Is that execution time what TI would expect for the HWI?

    2) I have run our test code on both the EVM and our custom hardware now, and RPMessage_swiFxn executes for 24 usec on both. Can TI run the TI-provided example code, ex02_messageq, and instrument it to look at RPMessage_swiFxn? I ask that TI do this so as not to have any chance of our code corrupting the test in any way. If TI sees a similar SWI time and decides that is "normal", then I am curious whether there is a way to change the SWI into a threaded function so that it can run at a lower priority and not block our tasks. If not, then I am afraid we will have to abandon this method completely and go to something more efficient and, I guess, custom.
  • Chris,

    I just had a chance to look into the ex02_messageq execution graph with a slightly later IPC maintenance release, 3.44.01. The RPMessage_swiFxn result (X4-X3 = 29 us) is close to what you observed. The Hwi event combiner takes about 11 us (X2-X1), as shown below. It's probably not trivial to change the SWI into a threaded function in the IPC framework, so you may need a custom 'IPC' solution.


    Regards, Garrett

  • Thanks for doing that test, Garrett, but that is pretty much worst-case news for us. We do have an implementation of shared-memory communication that we used on the C6482 via the HPI interface. It looks like we will resurrect that in place of MessageQ. One question, or should I say first question: what is a simple, bare-bones way for the ARM and DSP to interrupt each other, to let the other know a message was placed in shared memory? I found the Mailbox feature of the AM5728. Is that the simplest way?
  • Chris,

    Yes, the mailbox is the simplest way to notify cores (ARM or DSP) on AM57x when a message is placed in shared memory. In fact, this is also part of the message-passing mechanism in RPMsg. The wiki (processors.wiki.ti.com/.../PRU-ICSS_Remoteproc_and_RPMsg) talks about ARM and PRU, but the design concept is similar for ARM and DSP.

    The rpmsg mailbox Linux driver is here: . We briefly discussed the DSP IRQ crossbar setting for mailbox (NotifySetup) in the thread e2e.ti.com/.../1855549

    Regards, Garrett

  • Thank you Garrett, that should be helpful. I've also been looking at the code in ipc_*\packages\ti\sdo\ipc\family\vayu\InterruptDsp.c as an example. I am not sure if I could just call those functions directly myself.
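    To make the plan concrete, here is a rough portable sketch of the single-producer/single-consumer shared-memory handoff we have in mind. All names are hypothetical; on the target, the ring would live in shared (non-cached or cache-maintained) memory and notify() would write the AM5728 mailbox MESSAGE register instead of being a no-op.

```c
#include <stdint.h>
#include <string.h>

#define RB_SLOTS 8u  /* power of two; one slot always left empty */

/* SPSC ring: head is written only by the producer, tail only by the
 * consumer, so no lock is needed. On real hardware both fields must be
 * non-cached or explicitly cache-maintained; 'volatile' is the minimum. */
typedef struct {
    volatile uint32_t head;            /* next slot the producer fills  */
    volatile uint32_t tail;            /* next slot the consumer drains */
    uint32_t          slot[RB_SLOTS];  /* payload words                 */
    void            (*notify)(void);   /* doorbell, e.g. mailbox write  */
} shm_ring_t;

static void rb_init(shm_ring_t *rb, void (*notify)(void))
{
    memset(rb, 0, sizeof *rb);
    rb->notify = notify;
}

/* Producer side: returns 0 on success, -1 if the ring is full. */
static int rb_send(shm_ring_t *rb, uint32_t msg)
{
    uint32_t next = (rb->head + 1u) % RB_SLOTS;
    if (next == rb->tail)
        return -1;                     /* full */
    rb->slot[rb->head] = msg;
    rb->head = next;                   /* publish after payload is written */
    if (rb->notify)
        rb->notify();                  /* ring the doorbell (mailbox IRQ) */
    return 0;
}

/* Consumer side (e.g. in the mailbox ISR, or a task it wakes):
 * returns 0 and stores one message, or -1 if the ring is empty. */
static int rb_recv(shm_ring_t *rb, uint32_t *msg)
{
    if (rb->tail == rb->head)
        return -1;                     /* empty */
    *msg = rb->slot[rb->tail];
    rb->tail = (rb->tail + 1u) % RB_SLOTS;
    return 0;
}
```

    On the target, rb_send() on the ARM would be followed by the mailbox write that raises the DSP interrupt, and the DSP side would drain with rb_recv(); cache writeback/invalidate (or a non-cached mapping) around the head/tail updates is required on a cached region.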