We're running some benchmarks and trying to determine why the task context switch seems relatively slow when waking a task from an ISR (using a semaphore or a FreeRTOS task notification). It takes about 25 us with the CPU clock running at 80 MHz, which comes out to about 2000 CPU cycles; we were expecting the latency to be much lower. The task is set to the highest priority. We started with the mspm0_sdk_2_01_00_03\examples\rtos\LP_MSPM0G3507\kernel\posix_demo example and reconfigured it for an MSPM0G1107.
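As a quick sanity check on the arithmetic above (a trivial helper, not part of the project): 25 us at an 80 MHz core clock is 25 × 80 = 2000 cycles.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a latency in microseconds to CPU cycles at a given core clock
 * in MHz. Pure arithmetic, used only to double-check the figures quoted
 * above. */
static uint32_t latency_us_to_cycles(uint32_t latency_us, uint32_t cpu_mhz)
{
    return latency_us * cpu_mhz;
}
```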
Here's the entirety of the UART task and ISR code:
```c
/*
 * Includes
 */
// Project-specific
#include "ti_msp_dl_config.h"

// Standard C library
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// RTOS header files
#include <FreeRTOS.h>
#include <portmacro.h>
#include <semphr.h>

// TI
#include <ti/drivers/dpl/HwiP.h>

/*
 * Local variables
 */
static SemaphoreHandle_t m_semaphore;
volatile uint8_t v_rxByte;

/*
 * Global functions
 */
void UART3_IRQHandler(void)
{
    uint8_t rxByte;

    switch (DL_UART_Main_getPendingInterrupt(UART_RS485_INST)) {
        case DL_UART_IIDX_RX:
            while (DL_UART_receiveDataCheck(UART_RS485_INST, &v_rxByte)) {
                // Unloaded a byte.
            }
            // Signal the task to transmit.
            xSemaphoreGiveFromISR(m_semaphore, &(BaseType_t) {});
            break;

        default:
            break;
    }
}

void *RS845_thread(void *arg0)
{
    // Initialize semaphore to signal transmit operation.
    m_semaphore = xSemaphoreCreateBinary();

    /*
     * Reconfigure UART.
     */
    DL_UART_Main_disable(UART_RS485_INST);
    DL_UART_clearInterruptStatus(UART_RS485_INST, -1);

    // Enable UART interrupt in the NVIC.
    HwiP_clearInterrupt(UART3_INT_IRQn);
    HwiP_enableInterrupt(UART3_INT_IRQn);
    /*
     * End UART reconfiguration.
     */
    DL_UART_Main_enable(UART_RS485_INST);

    while (1) {
        // Wait for rx timeout.
        xSemaphoreTake(m_semaphore, portMAX_DELAY);

        // Transmit a character.
        DL_UART_transmitDataBlocking(UART_RS485_INST, '$');
    }
}
```
Here's a logic analyzer capture showing the latency between the received byte and the transmitted byte.
In main.c, the task stack size was increased and priority set to max:
```diff
diff --git a/main.c b/main.c
index 0f392b2..b1c096a 100644
--- a/main.c
+++ b/main.c
@@ -49,7 +49,7 @@ extern void *RS845_thread(void *arg0);
 
 /* Stack size in bytes */
-#define THREADSTACKSIZE 256
+#define THREADSTACKSIZE 1024
 
 /* Set up the hardware ready to run this demo */
 static void prvSetupHardware(void);
@@ -73,8 +73,8 @@ int main(void)
     pthread_attr_init(&attrs);
 
     /* Set priority, detach state, and stack size attributes */
-    priParam.sched_priority = 1;
-    retc = pthread_attr_setschedparam(&attrs, &priParam);
+    priParam.sched_priority = configMAX_PRIORITIES - 1;
+    retc = pthread_attr_setschedparam(&attrs, &priParam);
     retc |= pthread_attr_setdetachstate(&attrs, PTHREAD_CREATE_DETACHED);
     retc |= pthread_attr_setstacksize(&attrs, THREADSTACKSIZE);
     if (retc != 0) {
```
SysConfig setup:
```js
/**
 * These arguments were used when this file was generated. They will be automatically applied on subsequent loads
 * via the GUI or CLI. Run CLI with '--help' for additional information on how to override these arguments.
 * @cliArgs --device "MSPM0G110X" --part "Default" --package "VQFN-32(RHB)" --product "mspm0_sdk@2.01.00.03"
 * @v2CliArgs --device "MSPM0G1107" --package "VQFN-32(RHB)" --product "mspm0_sdk@2.01.00.03"
 * @versions {"tool":"1.21.1+3772"}
 */

/**
 * Import the modules used in this configuration.
 */
const GPIO          = scripting.addModule("/ti/driverlib/GPIO", {}, false);
const GPIO1         = GPIO.addInstance();
const SYSCTL        = scripting.addModule("/ti/driverlib/SYSCTL");
const TIMER         = scripting.addModule("/ti/driverlib/TIMER", {}, false);
const TIMER1        = TIMER.addInstance();
const UART          = scripting.addModule("/ti/driverlib/UART", {}, false);
const UART1         = UART.addInstance();
const ProjectConfig = scripting.addModule("/ti/project_config/ProjectConfig");

/**
 * Write custom configuration values to the imported modules.
 */
const divider6              = system.clockTree["PLL_CLK2X_DIV"];
divider6.divideValue        = 5;

const divider9              = system.clockTree["UDIV"];
divider9.divideValue        = 2;

const multiplier2           = system.clockTree["PLL_QDIV"];
multiplier2.multiplyValue   = 8;

const mux4                  = system.clockTree["EXHFMUX"];
mux4.inputSelect            = "EXHFMUX_XTAL";

const mux8                  = system.clockTree["HSCLKMUX"];
mux8.inputSelect            = "HSCLKMUX_SYSPLL2X";

const mux12                 = system.clockTree["SYSPLLMUX"];
mux12.inputSelect           = "zSYSPLLMUX_HFCLK";

const oscillator2           = system.clockTree["SYSOSC"];
oscillator2.enableSYSOSCFCL = true;

const pinFunction4          = system.clockTree["HFXT"];
pinFunction4.inputFreq      = 25;
pinFunction4.enable         = true;

GPIO1.$name                          = "GPIO_RS485";
GPIO1.port                           = "PORTA";
GPIO1.associatedPins.create(3);
GPIO1.associatedPins[0].initialValue = "SET";
GPIO1.associatedPins[0].ioStructure  = "SD";
GPIO1.associatedPins[0].$name        = "PIN_RS485_TX_EN";
GPIO1.associatedPins[0].pin.$assign  = "PA27";
GPIO1.associatedPins[1].$name        = "PIN_RS485_RX_EN";
GPIO1.associatedPins[1].pin.$assign  = "PA0";
GPIO1.associatedPins[2].$name        = "PIN_RS485_TERM_EN";
GPIO1.associatedPins[2].initialValue = "SET";
GPIO1.associatedPins[2].pin.$assign  = "PA1";

const Board                       = scripting.addModule("/ti/driverlib/Board", {}, false);
Board.peripheral.$assign          = "DEBUGSS";
Board.peripheral.swclkPin.$assign = "PA20";
Board.peripheral.swdioPin.$assign = "PA19";

SYSCTL.HFCLKSource       = "HFXT";
SYSCTL.HFCLK_Freq        = 25000000;
SYSCTL.enableSYSOSCFCL   = true;
SYSCTL.SYSPLL_Pdiv       = 4;
SYSCTL.SYSPLL_Qdiv       = 5;
SYSCTL.SYSPLL_CLK2XEn    = true;
SYSCTL.clockTreeEn       = true;
SYSCTL.validateClkStatus = true;

TIMER1.timerClkDiv = 8;
TIMER1.interrupts  = ["ZERO"];
TIMER1.$name       = "TIMER_RS485_RESPONSE";

UART1.$name                            = "UART_RS485";
UART1.ovsRate                          = "8";
UART1.targetBaudRate                   = 8000000;
UART1.rxFifoThreshold                  = "DL_UART_RX_FIFO_LEVEL_ONE_ENTRY";
UART1.interruptPriority                = "1";
UART1.enabledInterrupts                = ["RX"];
UART1.peripheral.$assign               = "UART3";
UART1.peripheral.rxPin.$assign         = "PA25";
UART1.peripheral.txPin.$assign         = "PA26";
UART1.txPinConfig.$name                = "ti_driverlib_gpio_GPIOPinGeneric0";
UART1.txPinConfig.direction            = scripting.forceWrite("OUTPUT");
UART1.txPinConfig.hideOutputInversion  = scripting.forceWrite(false);
UART1.txPinConfig.onlyInternalResistor = scripting.forceWrite(false);
UART1.txPinConfig.passedPeripheralType = scripting.forceWrite("Digital");
UART1.txPinConfig.enableConfig         = true;
UART1.rxPinConfig.$name                = "ti_driverlib_gpio_GPIOPinGeneric1";
UART1.rxPinConfig.hideOutputInversion  = scripting.forceWrite(false);
UART1.rxPinConfig.onlyInternalResistor = scripting.forceWrite(false);
UART1.rxPinConfig.passedPeripheralType = scripting.forceWrite("Digital");

ProjectConfig.genDisable = true;
ProjectConfig.deviceSpin = "MSPM0G1107";

/**
 * Pinmux solution for unlocked pins/peripherals. This ensures that minor changes to the automatic solver in a future
 * version of the tool will not impact the pinmux you originally saw. These lines can be completely deleted in order to
 * re-solve from scratch.
 */
pinFunction4.peripheral.$suggestSolution           = "SYSCTL";
pinFunction4.peripheral.hfxInPin.$suggestSolution  = "PA5";
pinFunction4.peripheral.hfxOutPin.$suggestSolution = "PA6";
TIMER1.peripheral.$suggestSolution                 = "TIMA0";
```
FreeRTOSConfig.h file is mostly unmodified from the example. Only the CPU clock was adjusted:
```diff
diff --git a/FreeRTOSConfig.h b/FreeRTOSConfig.h
index 20e4040..b5e2ac5 100644
--- a/FreeRTOSConfig.h
+++ b/FreeRTOSConfig.h
@@ -70,7 +70,7 @@
 #define configUSE_16_BIT_TICKS 0 /* Only for 8 and 16-bit hardware. */
 
 /* Constants that describe the hardware and memory usage. */
-#define configCPU_CLOCK_HZ ((unsigned long) 32000000)
+#define configCPU_CLOCK_HZ ((unsigned long) 80000000)
 
 /* Smallest stack size allowed in words */
 #define configMINIMAL_STACK_SIZE ((unsigned short) 128)
 #define configMAX_TASK_NAME_LEN (12)
```
The UART ISR latency, from trigger to execution, should not exceed 1 us when running the CPU at 80 MHz. You can verify this by toggling a GPIO at the end of the UART ISR.
Could you provide simple demo code based on the LP-MSPM0G3507 that reproduces this issue? I can test it on my side.
I uploaded a minimal test example to GitHub here: https://github.com/derrick-senva/mspm0_uart_latency. I used the LP-MSPM0G3507 dev kit (an early revision with the 48 MHz crystal) and based it on mspm0_sdk_2_01_00_03\examples\rtos\LP_MSPM0G3507\kernel\posix_demo.
The program simply responds to bytes transmitted from the integrated XDS backchannel UART. I used PuTTY to spam ASCII characters to the MSPM0. The clock tree is configured for the CPU to run at 80 MHz (from the 48 MHz crystal). UART0 is configured for 1 Mbaud, no parity, 1 stop bit, and assigned to pins PA10 and PA11. The single POSIX thread with maximum priority waits on a semaphore to be given from the UART's receive ISR and then immediately transmits a single ASCII character. I made sure to use xSemaphoreGiveFromISR from the UART ISR. I also assigned the interrupt an NVIC priority of 1 so that it doesn't exceed configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY.
The latency captured by a logic analyzer measured about 25 us between the stop bit from the XDS UART and the start bit of the MSPM0 UART's response, consistent with my previously posted capture. This latency seems to be caused by something in the FreeRTOS implementation in this example project. I speculated it might have something to do with power modes being toggled during the context switch but couldn't find any evidence for that, so now I'm out of ideas as to what's causing this large delay.
I also tried running from a bare metal example without the RTOS as you suggested. I started with the project from mspm0_sdk_2_01_00_03\examples\nortos\LP_MSPM0G3507\driverlib\uart_echo_interrupts_standby. I made some simple modifications to mimic the program above, but instead of waiting on a semaphore, I simply polled a variable that was toggled by the ISR. In this scenario, I measured a latency of about 1.5 us which is much closer to what I'd expect.
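For reference, the shape of that bare-metal comparison looks roughly like the sketch below: the ISR sets a volatile flag and the main loop polls it, so no scheduler sits in the response path. This is a compilable off-target sketch with a stubbed "ISR", not the actual modified example; on the real part the flag would be set from the UART receive interrupt and the response byte transmitted in the loop.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag shared between the (stubbed) ISR and the polling loop. */
static volatile bool g_rx_flag;

/* Stub standing in for the UART receive ISR: just raises the flag. */
static void uart_rx_isr_stub(void)
{
    g_rx_flag = true;
}

/* One pass of the polling loop; returns true when a byte was handled.
 * The real loop would transmit the response character at this point. */
static bool poll_once(void)
{
    if (g_rx_flag) {
        g_rx_flag = false;
        return true;
    }
    return false;
}
```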
Hi Derrick,
I can reproduce your issue on my side; the delay is about 25 us. With the optimization level set to fast, it improves to about 18 us.
That is still a large number compared with the 1.5 us you expected. Another option is to create FreeRTOS tasks directly instead of pthreads, since tasks have a direct-to-task notification feature that is about 45% faster than a semaphore. I did not find an equivalent in the pthread API, but you could search for something similar to the task notification feature. Please let me know.
I switched to a FreeRTOS task and notification and it reduced latency from 25 to 23 us (from around 2000 to 1840 CPU cycles). Branch is here: https://github.com/derrick-senva/mspm0_uart_latency/tree/task_notification. That's a decent improvement in terms of raw cycles, but it still seems like an issue with the FreeRTOS port/implementation for the MSPM0. Either that or I simply don't have the correct configuration/setup for fast context switching.
I was searching around for other people's experience with context switching times and it does seem like most people are getting better results. Specifically, the official FreeRTOS FAQ here: https://www.freertos.org/Why-FreeRTOS/FAQs/Memory-usage-boot-times-context#what-is-the-context-switch-time. They cite a much better switching time using a Cortex-M3 on Keil. Although it's not a direct one-to-one comparison, I would assume the M0 architecture is close enough. They measured 84 CPU cycles which is a huge disparity. Is there someone with deep knowledge on the implementation that can advise us on how to reduce the latency, even if it involves modifying the low-level port code? Our application specifically requires low latency and multi-tasking.
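To put that FAQ figure in the same units as the measurements above (again just arithmetic, not project code): 84 cycles at this 80 MHz clock would be roughly 1.05 us, versus the ~2000 cycles / 25 us measured here.

```c
#include <assert.h>

/* Convert CPU cycles to nanoseconds at a given clock in MHz (exact for
 * these inputs); used to compare the FreeRTOS FAQ figure against the
 * latency measured on the MSPM0. */
static unsigned cycles_to_ns(unsigned cycles, unsigned clk_mhz)
{
    return cycles * 1000u / clk_mhz;
}
```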
Here is the response from our FreeRTOS expert for your reference:
The reasons
Solution:
Thanks for the detailed response. I don't see any attachments for the project you mentioned(?), but I can attempt the changes you listed and report back here afterwards.
Looks like I need your reference project to see what else I have configured wrong.
I changed the optimization level of both projects to fast and also set #define configUSE_TICKLESS_IDLE 0, but that somehow made the context switching time worse: the task now appears to wake only on the configured 1 ms RTOS tick. You can see below that the M0 TX (response) edges are always spaced about 1 ms apart.
Derrick,
If I look at the example code from your original posting, it looks like you have omitted the call to portYIELD_FROM_ISR() which is required in order to force a context switch. When you have configUSE_TICKLESS_IDLE 1, I believe the scheduler is forced to run on every wakeup from low power mode, but when you disable it, you are no longer waking up from low power mode and forcing the scheduler to run.
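Concretely, the corrected ISR captures the "higher priority task woken" flag and passes it to portYIELD_FROM_ISR(). Below is an off-target sketch of that pattern; the FreeRTOS calls are replaced by minimal stubs so it compiles standalone (on the real build they come from semphr.h and portmacro.h), and the stub behavior only mirrors, but is not, the real implementation.

```c
#include <assert.h>

/* Minimal stand-ins for the FreeRTOS type and constants. */
typedef long BaseType_t;
#define pdFALSE ((BaseType_t) 0)
#define pdTRUE  ((BaseType_t) 1)

static int g_semaphore_given;
static int g_yield_requested;

/* Stub for xSemaphoreGiveFromISR(): the real call gives the semaphore
 * and sets *pxHigherPriorityTaskWoken to pdTRUE when the give unblocks
 * a higher-priority task. */
static void give_from_isr_stub(BaseType_t *pxHigherPriorityTaskWoken)
{
    g_semaphore_given = 1;
    *pxHigherPriorityTaskWoken = pdTRUE;
}

/* Stub for portYIELD_FROM_ISR(): the real macro pends a context switch
 * so it happens on interrupt exit rather than at the next tick. */
static void yield_from_isr_stub(BaseType_t xHigherPriorityTaskWoken)
{
    if (xHigherPriorityTaskWoken != pdFALSE) {
        g_yield_requested = 1;
    }
}

/* Corrected ISR shape: the woken flag is captured in a local variable
 * and handed to the yield macro, instead of being discarded via a
 * throwaway compound literal as in the original code. */
void uart_isr_fixed_pattern(void)
{
    BaseType_t xHigherPriorityTaskWoken = pdFALSE;
    give_from_isr_stub(&xHigherPriorityTaskWoken);
    yield_from_isr_stub(xHigherPriorityTaskWoken);
}
```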
The behavior you describe is consistent with omitting this call. See the following example:
Thanks,
Stuart
I don't see any attachments for the project you mentioned(?)
Please refer to this.