[FAQ] Linux: How do I test the real-time performance of an AM3x/AM4x/AM6x SoC?

Other Parts Discussed in Thread: AM625

I am trying to estimate the performance of real-time (RT) Linux running on one of TI's new SoCs. What's the best way to do that?

Please note that RT Linux offers lower and more predictable latency than standard Linux, but RT Linux is NOT a true real-time operating system (RTOS). For more information about real-time performance on different cores, please refer to the FAQ “Sitara multicore system design: How to ensure computations occur within a set cycle time?”

The performance of RT Linux is typically measured as interrupt latency (or interrupt response time). For more information about interrupt latency, refer to the Linux section of “Sitara multicore system design: How to ensure computations occur within a set cycle time?”

For more information about testing with cyclictest, please refer to https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1172055/faq-am625-how-to-measure-interrupt-latency-on-multicore-sitara-devices-using-cyclictest

For other FAQs about multicore subjects, please refer to “Sitara multicore development and documentation”.

  • Texas Instruments’ default SDK is a great starting point for your project and does a great job of showcasing all the capabilities of our chips. However, even in an idle state, the default SDK can keep the chip fairly busy, especially from a real-time perspective. Before we can get a reasonable view of the real-time performance of any of TI’s SoCs, we first need to create an environment that more closely resembles the one your real-time application will run in.
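    One rough way to see how busy an “idle” image keeps the SoC is to take two snapshots of /proc/interrupts a few seconds apart and compare the totals for each interrupt source. The irq_delta helper below is a hypothetical sketch and is not part of TI’s SDK. It assumes the usual /proc/interrupts layout: a header row of CPU names, followed by one row per source with one count column per CPU. The 10 s window in the usage comment is an arbitrary choice.

```shell
#!/bin/sh
# irq_delta BEFORE AFTER  (hypothetical helper, not part of TI's SDK)
# Compares two /proc/interrupts snapshots and prints "<irq>: <new interrupts>"
# for every interrupt source that fired between them.
irq_delta() {
    awk '
    FNR == 1 { ncpu = NF; next }            # header row: CPU0 CPU1 ... CPUn
    {
        total = 0
        # Sum the per-CPU count columns, stopping at the first non-numeric field
        # (rows such as "ERR:" have fewer columns than there are CPUs).
        for (i = 2; i <= ncpu + 1 && $i ~ /^[0-9]+$/; i++) total += $i
        if (FNR == NR) before[$1] = total   # first file: baseline snapshot
        else if (total > before[$1]) printf "%s %d\n", $1, total - before[$1]
    }' "$1" "$2"
}

# Example on the target (10 s window is an assumption):
#   cat /proc/interrupts > /tmp/irq.before; sleep 10
#   cat /proc/interrupts > /tmp/irq.after
#   irq_delta /tmp/irq.before /tmp/irq.after
```

    Sources that keep firing while the system is idle are good candidates for the user-space and device tree trimming described below.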

    Preparing User Space

    First, we’ll need to remove any unneeded applications from user space. Many of the applications and services in the SDK will never make it into a production build, and they only consume resources needed by the applications we do care about. Fortunately, TI’s SDK Yocto build environment makes generating a “tiny” image (essentially a bare BusyBox system) fairly straightforward via ‘bitbake tisdk-tiny-image’.

    Note: it’s easy to generate great performance numbers with TI’s tiny image that do not reflect reality. Be sure to add the applications you plan on shipping in your production image so you get the most accurate approximation of the real-time performance you can achieve with our SoCs.

    Preparing The Device Tree

    Next, we can begin streamlining the kernel’s device tree. TI’s Starter Kits and Evaluation Modules do a wonderful job of showcasing all the peripherals and capabilities of our SoCs. However, just like our user-space applications, many real-time applications have a more focused purpose. That leaves many of the peripherals on our starter kits unused, and the drivers the kernel runs for them ultimately hurt our real-time performance.

    In the ARM world, SoC manufacturers have the freedom to put peripherals wherever they please in the memory map. For example, we can ‘place’ a UART instance at 0x02800000 while Broadcom may put a UART device at 0x7E201000. This freedom to place anything anywhere in the memory map is why the Linux kernel adopted the device tree. It’s essentially a small binary file (the device tree blob, or DTB) that our bootloaders load and pass to the kernel to describe all the devices and peripherals the SoC and board have. The kernel uses this description to initialize the drivers it needs to configure and run each of the devices it lists.

    For example, if we’re evaluating the AM625 for our real-time application, we can start from the k3-am625-sk.dts device tree source file in the Linux kernel, which describes all the features of the AM625 starter kit. To get a more realistic metric for our application, we can begin removing devices from this file to mimic what we plan on supporting in our final product.

    For example, if we do not plan on having any LEDs we can simply remove the entire leds node from the device tree.

    diff --git a/arch/arm64/boot/dts/ti/k3-am625-sk.dts b/arch/arm64/boot/dts/ti/k3-am625-sk.dts
    index 4f179b146cabc..27c31ed3bf7da 100644
    --- a/arch/arm64/boot/dts/ti/k3-am625-sk.dts
    +++ b/arch/arm64/boot/dts/ti/k3-am625-sk.dts
    @@ -138,20 +138,6 @@ vdd_sd_dv: regulator-4 {
                    states = <1800000 0x0>,
                             <3300000 0x1>;
            };
    -
    -       leds {
    -               compatible = "gpio-leds";
    -               pinctrl-names = "default";
    -               pinctrl-0 = <&usr_led_pins_default>;
    -
    -               led-0 {
    -                       label = "am62-sk:green:heartbeat";
    -                       gpios = <&main_gpio1 49 GPIO_ACTIVE_HIGH>;
    -                       linux,default-trigger = "heartbeat";
    -                       function = LED_FUNCTION_HEARTBEAT;
    -                       default-state = "off";
    -               };
    -       };
     };
     
     &main_pmx0 {
    
     

    Alternatively, we can set the ‘status = “disabled”;’ property on a node. The node stays in the tree, but the kernel will not probe a driver for it, so the device behaves much as if it had been removed. For example, we can disable the OSPI controller by changing the status property inside the &ospi0 node from “okay” to “disabled”.

    diff --git a/arch/arm64/boot/dts/ti/k3-am625-sk.dts b/arch/arm64/boot/dts/ti/k3-am625-sk.dts
    index 4f179b146cabc..b9ee8994405e4 100644
    --- a/arch/arm64/boot/dts/ti/k3-am625-sk.dts
    +++ b/arch/arm64/boot/dts/ti/k3-am625-sk.dts
    @@ -406,7 +406,7 @@ mbox_m4_0: mbox-m4-0 {
     };
     
     &ospi0 {
    -       status = "okay";
    +       status = "disabled";
            pinctrl-names = "default";
            pinctrl-0 = <&ospi0_pins_default>;
    

    We can continue removing or disabling nodes for the AM625’s main, MCU, and wakeup domains. These are described in the following files in the kernel's source code (under arch/arm64/boot/dts/ti/):

    • k3-am62-main.dtsi
    • k3-am62-wakeup.dtsi
    • k3-am62-mcu.dtsi

    Keep going until we have a rough approximation of our final device tree.

    Note: make sure to keep any nodes that other nodes depend on. For example, the SD card node (&sdhci1) depends on several other nodes, linked via their phandles, to turn on its power regulator (&vdd_mmc1) and to configure its pinmux (&main_mmc1_pins_default).
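    Before deleting a node, a quick search for its label shows which other nodes reference it through a phandle. The dt_refs function below is a hypothetical helper that simply wraps grep. It only catches references written as &label in the files you pass it, so nodes pulled in from other .dtsi files or added by overlays need to be searched separately.

```shell
#!/bin/sh
# dt_refs LABEL FILE...  (hypothetical helper)
# Prints file:line for every reference to &LABEL in the given device tree
# sources, i.e. the nodes that would break if the node labelled LABEL were removed.
dt_refs() {
    label="$1"; shift
    # -n: line numbers, -H: always show the file name.
    # The trailing bracket expression keeps "&vdd_mmc1" from also matching a
    # longer label such as "&vdd_mmc1_x".
    grep -nH -E "&${label}([^A-Za-z0-9_]|\$)" "$@"
}

# Example, run from the top of the kernel source tree:
#   dt_refs vdd_mmc1 arch/arm64/boot/dts/ti/k3-am62*.dts*
```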

    Once we’re happy with the final device tree, we can rebuild the DTB by running ‘make dtbs’ from the top of the kernel source tree, using the same ARCH and CROSS_COMPILE settings used to build the kernel. Then copy the resulting k3-am625-sk.dtb (from arch/arm64/boot/dts/ti/) into the /boot/ directory of our SD card, along with the kernel Image.
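    Once the board boots with the new DTB, it is worth confirming that the running kernel actually sees the trimmed tree. The kernel exposes the device tree it booted with under /sys/firmware/devicetree/base, and /proc/device-tree is a symlink to the same directory. The hypothetical dt_disabled helper below lists every node whose status property is something other than “okay”. Nodes that were deleted outright, such as the leds node in the first diff, will not appear in the tree at all.

```shell
#!/bin/sh
# dt_disabled [ROOT]  (hypothetical helper, not part of TI's SDK)
# Lists "<node path> <status>" for every device tree node whose status is not
# "okay" (or the older spelling "ok"). ROOT defaults to the live tree the
# kernel booted with.
dt_disabled() {
    root="${1:-/sys/firmware/devicetree/base}"
    find "$root" -type f -name status | sort | while read -r f; do
        # Property values are NUL-terminated strings; strip the NUL before comparing.
        value=$(tr -d '\000' < "$f")
        case "$value" in
            okay|ok) ;;                       # enabled node: skip it
            *) node="${f%/status}"
               printf '%s %s\n' "${node#"$root"}" "$value" ;;
        esac
    done
}

# Example on the target:
#   dt_disabled
#   [ -d /sys/firmware/devicetree/base/leds ] || echo "leds node removed"
```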

    Finally, with a complete rootfs and kernel tailored to our real-time application, we can begin testing.

    Measuring Real-Time Performance

    Cyclictest is a very popular tool used by the wider real-time Linux community to evaluate the performance of many SoCs and to find places in the kernel that need attention from developers. Cyclictest runs a single non-real-time thread that starts a number of measuring threads at real-time priority. Each measuring thread is woken up periodically and calculates the difference between its programmed and its effective wake-up time; that difference is its latency. This latency is essentially the time it takes the SoC to switch from whatever background load is running to your real-time application once the wake-up is due.

    This metric can be used to get a rough estimate of how much time you have to run your real-time application. For example, if you had 500 us to perform some action, you would need to subtract the time the SoC took to switch contexts (e.g. 75 us) from that time, leaving 425 us for your real-time code to run. While RT Linux is not an RTOS, we can be reasonably comfortable that the vast majority of the time it will take around 75 us (in this made-up example) to switch.
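    The budget arithmetic is simple enough to script when comparing several configurations. The rt_budget function below is a hypothetical helper that just subtracts a measured latency from the period. Using the maximum latency from the histogram instead of a typical value gives a more conservative budget, because the rare long wake-ups are the ones that cause missed deadlines.

```shell
#!/bin/sh
# rt_budget PERIOD_US LATENCY_US  (hypothetical helper)
# Prints the time left for real-time work in each period after the wake-up
# latency is paid, in microseconds. Returns non-zero if nothing is left.
rt_budget() {
    remaining=$(( $1 - $2 ))
    echo "$remaining"
    [ "$remaining" -gt 0 ]
}

# Example from the text: a 500 us period with a 75 us latency
#   rt_budget 500 75     # prints 425
```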

    There can be a bit of a learning curve, especially for developers just starting out in the real-time world. However, a good test for the AM625 would be something along the lines of the following:

    $ cyclictest -l100000000 -m -S -p90 -i400 -h400 -q > am625-cyclictest.hist

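    This command instructs cyclictest to take 100M samples (-l100000000) per measuring thread at a 400 us wake-up interval (-i400). It saves a histogram of latencies up to 400 us (-h400) to a file called am625-cyclictest.hist. The remaining options do the following:

    • -m locks memory.
    • -S runs one measuring thread per CPU, all with the same priority.
    • -p90 sets real-time priority 90.
    • -q prints only a summary at exit.

    At 400 us per sample, 100M samples take at least 40,000 s (roughly 11 hours) to collect. However, the results will give us a fairly confident estimate of the performance the environment we developed above will deliver for our application.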
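    The runtime estimate is just the loop count multiplied by the interval. The hypothetical cyclictest_hours helper below does that arithmetic. Treat the result as a lower bound, since it ignores the latency added to every cycle and any longer per-thread intervals set through cyclictest's -d (distance) option.

```shell
#!/bin/sh
# cyclictest_hours LOOPS INTERVAL_US  (hypothetical helper)
# Lower bound on the wall-clock time of a cyclictest run: each measuring
# thread sleeps for INTERVAL_US on every one of its LOOPS cycles.
cyclictest_hours() {
    awk -v loops="$1" -v interval="$2" \
        'BEGIN { printf "%.1f\n", loops * interval / 1e6 / 3600 }'
}

# For the command above (-l100000000 -i400):
#   cyclictest_hours 100000000 400     # prints 11.1
```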

    Just as an experiment, I removed almost everything from the rootfs and the device tree, keeping only the console UART and the SD card, to find the absolute minimum latency the AM625 can produce. (Your device will most likely have larger latency numbers, and will hopefully do something more useful than logging cyclictest results.)

    Converting the histogram data into an OSADL-style plot with a simple script shows that the AM625 took 6 us on average to switch to the real-time task. As impressive as these results are, they should really be a lesson in how much control we have when evaluating SoCs for real-time applications. Both the background load we place on the SoC and the number of drivers the kernel must service have a dramatic impact on the latency plots we can generate here.
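    The histogram file itself is plain text. Each data row holds a latency bucket in microseconds followed by one sample count per measuring thread, and summary rows begin with #. The hypothetical hist_summary helper below reduces the file to a sample count, an average, and a maximum per thread, which makes a quick sanity check before plotting. An OSADL-style plot draws these same data rows, with latency on the x axis and sample count on a logarithmic y axis. The average is computed from 1 us buckets, and samples that overflowed the -h limit are not counted, so check the “# Histogram Overflows” line as well.

```shell
#!/bin/sh
# hist_summary FILE  (hypothetical helper)
# Summarises a cyclictest histogram (as written with -h) per measuring thread.
# Data rows: "<latency_us> <count_thread0> <count_thread1> ..."
hist_summary() {
    awk '
    $1 !~ /^[0-9]+$/ { next }                 # skip "#" summary rows and non-data lines
    {
        lat = $1 + 0
        for (i = 2; i <= NF; i++) {
            t = i - 2                         # thread index (0-based)
            n = $i + 0
            cnt[t] += n; sum[t] += lat * n
            if (n > 0 && lat > max[t]) max[t] = lat
            if (t + 1 > nt) nt = t + 1
        }
    }
    END {
        for (t = 0; t < nt; t++)
            printf "thread%d samples=%d avg=%.1f max=%d\n", t, cnt[t], (cnt[t] ? sum[t] / cnt[t] : 0), max[t]
    }' "$1"
}

# Example:
#   hist_summary am625-cyclictest.hist
```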

    Wrapping Up

    There is a vast body of knowledge and prior experience in real-time Linux that I haven't covered here, and it can help you approximate, evaluate, and debug your real-time applications. Hopefully this serves as a good starting point for your own real-time application.

    ~Bryan