# AM64X SOFTWARE DEVELOPMENT OVERVIEW

RUNNING ALL CORES ON THE DEVICE FOR FULL ENTITLEMENT



## **Agenda**

- Short Overview
- MCU+ SDK Development
- Demo
- Linux Development
- Summary

## **Sitara** overview



Scalable, cost-optimized portfolio with accelerators, analog integration, robust connectivity, security and functional safety designed for industrial markets

| Focus area                   | What the portfolio offers                                                                                   |
|------------------------------|-------------------------------------------------------------------------------------------------------------|
| Compute                      | Single-core to quad-core **Arm Cortex-A53**, **A9 and A8**                                                  |
| Control                      | Single-core to quad-core **Arm Cortex-R5F** with optional lock-step support                                 |
| Functional Safety & Security |                                                                                                             |
| Analog                       | High-level integration of high-performance ADC, DAC, comparators and PWM                                    |
| Deep Learning & Accelerators | Power-optimized neural network accelerators, audio DSP, and GPU                                             |
| Connect                      | USB, PCIe, Ethernet switch, industrial protocols, CAN-FD, and more                                          |
| Unified Software Platform    | Open source device enablement for mainline Linux, RTOS and bare metal; third-party (3P) software support    |

Across the portfolio:

- Simplified tools (SysConfig) and libraries (DSPLIB, TIDL) to accelerate development and performance entitlement
- Power-optimized design
- SIL2 functional safety with common software development
- Temperature range of -40 to 125 °C
- HiRel DSP



## AM64x (17mm x 17mm) Cortex®-A53 based processors

#### Cores & Memory

- Dual Cortex-A53 up to 1GHz
- Dual or Quad Cortex-R5F up to 800MHz
- >2MB on-chip SRAM
- ECC on all critical memories
- 16b LPDDR4/DDR4 controller with inline ECC

#### Functional safety features

- 400MHz Cortex-M4F subsystem has freedom from interference to enable usage as a safety monitor
  - Dedicated peripherals: I2C, SPI, UART and GPIO
  - Tightly coupled memory of 256KB
- Diagnostic tool kit for entire SoC voltage, temp, clock, ECC monitors and Error signaling

#### 2xPRU-ICSS-Gb

- Enables up to 2x Gb industrial Ethernet protocols
- 1x industrial Ethernet protocol + motor control current and position feedback

#### Peripheral / IO Highlight

- GPMC (32b parallel bus) and FSI (serial connection for use with TI's C2000 MCUs) offer low-latency interfaces to motor control front-end
- PCIe Gen2, USB3.0/2.0, and 2-port Gb Ethernet Switch (CPSW) provide high-speed (Gbps) connectivity options
- RS485 support on UART
- Octal/Quad-SPI with execution-in-place support

#### Integrated analog

- 8-channel, 12-bit ADC with 4 Msps
- Simplified power solution, Integrated Voltage Monitors

#### Package

- 17.2 x 17.2mm, 0.8mm ball pitch





## AM243x (17mm x 17mm) Cortex®-R5F based processors

#### Cores & Memory

- Dual or Quad Cortex-R5F up to 800MHz
- >2MB on-chip SRAM
- ECC on all critical memories
- 16b LPDDR4/DDR4 controller with inline ECC

#### Functional safety features

- 400MHz Cortex-M4F subsystem has freedom from interference to enable usage as a safety monitor
  - Dedicated Peripherals I2C, SPI, UART & GPIO
  - Tightly coupled memory of 256KB
- Diagnostic tool kit for entire SoC voltage, temp, clock, ECC monitors and Error signaling

#### 2xPRU-ICSS-Gb

- Enables up to 2x Gb industrial Ethernet protocols
- 1x industrial Ethernet protocol + motor control current and position feedback

#### Peripheral / IO Highlight

- GPMC (32b parallel bus) and FSI (serial connection for use with TI's C2000 MCUs) offer low-latency interfaces to motor control front-end
- PCIe Gen2, USB3.0/2.0, and 2-port Gb Ethernet Switch (CPSW) provide high-speed (Gbps) connectivity options
- RS485 support on UART
- Octal/Quad-SPI with execution-in-place support

#### Integrated analog

- 8-channel, 12-bit ADC with 4 Msps
- Simplified power solution, Integrated Voltage Monitors

#### Package

- 17.2 x 17.2mm, 0.8mm ball pitch







- Dual / Single Arm® Cortex®-A53 (only on AM64x devices)
  - Up to 1 GHz, Armv8-A instruction set
  - Dual core cluster with shared 256KB L2. Each core has 32KB L1 I\$ and D\$
  - AArch64 for 64b support and new architecture features
  - Backward compatible with code for previous Arm processors (AArch32)
  - Integrated Neon™ processing engine and VFPv4 compatible hardware
  - Hardware virtualization support





- 2x or 4x Arm® Cortex®-R5F (AM64x and AM243x)
  - Up to 800MHz, ARMv7-R instruction set
  - 2x dual clusters with total of 256KB TCM. Each core has 32KB I\$ and 32KB D\$
  - No Lock-step operation, only split mode. Optimized for real time operations
  - Integrated floating-point unit (VFPv3 compatible hardware)
  - Multi-processing extensions for multiprocessing functionality
  - Vectored Interrupt Manager (VIM)



- 1x Arm® Cortex®-M4F (AM64x and AM243x)
  - Up to 400MHz, Armv7E-M instruction set
  - Integrated single-precision floating point unit (FPU)
  - 256KB of local SRAM, 192KB I-code and 64KB D-code
  - Ability to execute code from unified memory (I-code/D-code) or external memory via System bus. SoC integration includes Region Address Translation (RAT) to enable contiguous memory
  - Nested Vectored Interrupt Controller (NVIC)
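
The Region Address Translation (RAT) item above can be pictured as a window remap: an address the CPU issues inside a configured window is shifted to a different base on the system bus. The following is a minimal Python model of that arithmetic, not driver code. The `RatRegion` class, the `translate` helper, the power-of-two size rule, the pass-through of unmatched addresses, and every address value are assumptions made for illustration; the device TRM is the reference for the real register layout.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RatRegion:
    """One translation window (hypothetical model, not a register map)."""
    local_base: int   # 32-bit address issued by the CPU
    size: int         # window size in bytes; power of two assumed
    system_base: int  # address the window maps to on the system bus

    def __post_init__(self) -> None:
        if self.size <= 0 or self.size & (self.size - 1):
            raise ValueError("size must be a power of two")
        if self.local_base % self.size:
            raise ValueError("local_base must be aligned to size")

    def contains(self, addr: int) -> bool:
        return self.local_base <= addr < self.local_base + self.size


def translate(regions: Sequence[RatRegion], addr: int) -> int:
    """Return the system-bus address for a CPU address.

    Addresses outside every window are returned unchanged
    (assumed pass-through behaviour).
    """
    for region in regions:
        if region.contains(addr):
            return region.system_base + (addr - region.local_base)
    return addr


# Example values only: map a 256 MB CPU window onto a wider system address.
regions = [RatRegion(local_base=0x8000_0000, size=0x1000_0000,
                     system_base=0x8_0000_0000)]
print(hex(translate(regions, 0x8000_1234)))  # inside the window
print(hex(translate(regions, 0x0000_1000)))  # outside every window
```

The offset inside the window is preserved, which is what lets a 32-bit core treat a remapped external buffer as ordinary contiguous memory.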



- 2x PRU\_ICSSG Industrial Communication Subsystems (AM64x and AM243x)
  - 2x general-purpose PRU cores with 12KB program RAM, 2 KB data RAM per CPU, MAC, CRC16/32 HW accelerator
  - 2x auxiliary Real-time transfer units with 8KB program RAM, 2KB data RAM, MAC, CRC16/32 HW accelerator
  - 2x transmit real-time transfer units with 6KB program RAM, 2KB data RAM, MAC, CRC16/32 HW accelerator
  - 64KB shared RAM
  - Two MDIO ports
  - Two PRU\_ICSSG support 2x (each) RGMII or MII\_RT. One PRU\_ICSSG supports 2x SGMII, RGMII, or MII\_RT
  - Two Industrial Ethernet Peripheral and industrial Ethernet timers (IEP)
  - 1 x 16550-compatible UART
  - Capable of supporting master and/or slave modes of protocols such as:
    - TSN, Profinet IRT, Ethernet/IP with DLR, Profibus, EtherCAT, POWERLINK, Sercos 3, Hiperface DSL, BiSS C, EnDat 2.2, HSR/PRP, and more
  - Capable of supporting operation as standard Gb Ethernet
  - Sigma-delta (SD) interface upgraded to support Manchester encoding
  - Load sharing across PRUs with concurrent sigma-delta and EnDat interfaces
  - Reset isolation interface

#### PRU-ICSS-Gb

Industrial Ethernet

Supported protocols: TSN, EtherCAT, PROFINET, EtherNet/IP, PROFIBUS, SERCOS 3 and more

#### PRU-ICSS-Gb

Motor control (or) Industrial Ethernet

- 9x sigma-delta decimation filters
- 3x absolute encoder interfaces
- Supported encoders: Hiperface DSL, EnDat 2.2, Tamagawa, BiSS C, etc.



## Sitara built around a unified software platform





## Processor SDK common development experience

#### Linux features:

- Updated to the latest long-term support (LTS) Linux kernel, boot loader and Yocto file system on an annual basis
- Robust, commercial-grade Arm® GNU Compiler Collection (GCC) toolchain
- Yocto Project™ OE-Core compatible file system support enables tailored Linux application support
- RT-Linux releases include a fully preemptible kernel for real-time applications





#### RTOS/No OS features:

- Robust real-time RTOS kernel (FreeRTOS)
- Includes network communications support, examples, and drivers
- Driver libraries can be used with or without an RTOS kernel
- Free and available as open source
- Available for AM64x and AM243x







## Software Enablement











- Cortex-A53
  - Processor SDK for Linux
  - Other HLOS commercial offerings
  - Bootloader: U-Boot
  - Load other cores via remoteproc
- Cortex-R5F
  - MCU+ SDK
  - No RTOS and RTOS capability, CCS IDE
  - Bootloader: SBL or load via Linux
- Cortex-M4F
  - MCU+ SDK
  - No RTOS and RTOS capability, CCS IDE
  - Bootloader: SBL or load via Linux
- PRU
  - PRU Software Support Package
  - No RTOS, bare metal code only, CCS IDE
  - Bootloader: SBL or load via Linux
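
"Load other cores via remoteproc" refers to the Linux remoteproc framework, which exposes each remote core under `/sys/class/remoteproc/remoteprocN` with `name`, `firmware` and `state` files. The sketch below shows one way a script on the A53 might use that interface. The helper names `find_remoteproc` and `reload_firmware` are hypothetical, and the assumption that a running core can be stopped cleanly depends on the platform and on the firmware cooperating.

```python
from pathlib import Path

# Standard Linux remoteproc sysfs location; pass another root for dry runs.
SYSFS_ROOT = Path("/sys/class/remoteproc")


def find_remoteproc(name_fragment: str, root: Path = SYSFS_ROOT) -> Path:
    """Return the remoteprocN directory whose 'name' file contains name_fragment."""
    for entry in sorted(root.glob("remoteproc*")):
        name_file = entry / "name"
        if name_file.is_file() and name_fragment in name_file.read_text():
            return entry
    raise LookupError(f"no remoteproc matching {name_fragment!r} under {root}")


def reload_firmware(rproc: Path, firmware: str) -> str:
    """Stop the core if it is running, select a firmware file, and start it.

    'firmware' is a file name that the kernel resolves under /lib/firmware.
    Returns whatever the 'state' file reports after the start request.
    """
    state_file = rproc / "state"
    if state_file.read_text().strip() == "running":
        state_file.write_text("stop")
    (rproc / "firmware").write_text(firmware)
    state_file.write_text("start")
    return state_file.read_text().strip()
```

On a target, this would run as root, for example `reload_firmware(find_remoteproc("r5f"), "am64-main-r5f0_0-fw")`; the `"r5f"` fragment is an example value, so check the `name` files on the actual board first.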

## **Linux (Kernel) Development Overview**



- Linux OS (Ubuntu)
- Largely Command Line
- make files
- kgdb/gdb
- printf()
- TFTP/NFS



## no RTOS/RTOS Development Overview



## **Software Development Summary**



## MCU+ SDK | Benefits for the end user

















#### MCU Simplicity

- Simple drivers with GUI configuration tool
- 110+ examples to run and debug with CCS
- Multi-Core Bootloader examples included



#### MCU Optimized

- Low Latency Drivers
- Low Memory Usage
- No-OS or FreeRTOS



#### **Tools**

- SysConfig Tool
- MCU+ Academy & TI Resource Explorer
- TI ARM CLANG compiler
- CCS + FreeRTOS live debugging
- Board flashing tools



#### Libraries

- TinyUSB, LwIP, TSN
- Industrial Networking Stacks Integrated
- Motor Position Encoders and Control Algorithms
- Inter-Processor Communication (IPC)



## Sitara MCU | MCU+ SDK Overview



- Simplified and easy to use MCU+ SDK for MCU+ applications on R5F, M4F and C66
- Open Source OS and middleware stacks – FreeRTOS, LwIP, tinyUSB
- Simplified, low memory, low latency optimized drivers
- SysConfig for easy system configuration like pinmux, clock, driver setup
- Pre-integrated industrial protocols and motor control protocols
- Lots of examples and a step-by-step "MCU+ Academy" to get started quickly
- Interfaces with other OSes such as Linux (on A53) to expand to more applications and end equipment



## OS Environment | FreeRTOS



#### **Overview**

- Well established RTOS with 15+ years of deployments and partnerships with leading semiconductor vendors
- MIT open source license, allows customer to deploy in production and also protect their IP

#### FreeRTOS Kernel

- Primary RTOS for MCU+ SDK on R5F, M4F, C66 CPUs
- Pre-integrated with all device drivers, middleware (LwIP, TinyUSB), industrial protocols
- RTOS aware debugging and state viewer integrated with CCS IDE
- Deterministic, sub-microsecond task switch and interrupt latency
- < 20 KB RTOS kernel size
- Driver porting layer (DPL) allows switching to no-RTOS and/or another RTOS, such as SafeRTOS
- Optional POSIX threading layer to allow application level portability



## Middleware | Inter Processor Communication (IPC)

Inter-Processor Communication (IPC) enables the different CPUs on the SoC to collaborate to realize the system use case.

#### IPC Notify

- Low-level API, < 1us latency, ~6KB code size
- Lets one CPU interrupt other CPUs using SoC hardware mechanisms

#### IPC RPMessage

- Higher-level abstracted API, < 5us latency, ~12KB code size
- Exchanges message packets between logical endpoints
- Used to talk to Linux when present on the SoC

#### Spinlock

HW mechanism to implement mutual exclusion across multiple CPUs
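
To make the mechanism concrete, here is a small Python model of a hardware lock register in the read-to-acquire, write-zero-to-release style. It is an illustration under stated assumptions, not the SDK API: the `HwSpinlockModel` class, the `acquire` and `release` helpers, and the exact read/write semantics are modeled here for explanation and should be checked against the device TRM.

```python
import threading
import time


class HwSpinlockModel:
    """Models one lock register: a read tries to take the lock, writing 0 frees it."""

    def __init__(self) -> None:
        self._taken = 0
        self._bus = threading.Lock()  # stands in for the atomic bus access

    def read(self) -> int:
        """Return 0 if this read acquired the lock, 1 if it was already taken."""
        with self._bus:
            previous = self._taken
            self._taken = 1
            return previous

    def write(self, value: int) -> None:
        """Writing 0 releases the lock; other values are ignored in this model."""
        if value == 0:
            with self._bus:
                self._taken = 0


def acquire(lock: HwSpinlockModel) -> None:
    """Spin until a read of the register reports that the lock was free."""
    while lock.read() != 0:
        time.sleep(0)  # yield so the current holder can make progress


def release(lock: HwSpinlockModel) -> None:
    lock.write(0)
```

Because the test and the set happen in a single register read, two CPUs that read at the same moment cannot both see "free", which is the property software-only flags in shared memory cannot provide by themselves.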

#### Shared memory

A shared memory architecture keeps the data in shared memory and passes only a pointer to it in the IPC message, avoiding memory copy overhead.
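
The pointer-passing idea can be sketched on a host with plain Python. In this model a `bytearray` stands in for the memory both CPUs can see and a `deque` stands in for the IPC message channel; `produce`, `consume`, `SHARED_SIZE` and the offsets are hypothetical names and values, and real firmware may also need cache maintenance on the shared buffer, which is not modeled here.

```python
from collections import deque
from typing import Deque, Tuple

SHARED_SIZE = 4096  # hypothetical size of the shared-memory window

shared = bytearray(SHARED_SIZE)            # memory visible to both CPUs
mailbox: Deque[Tuple[int, int]] = deque()  # IPC channel carrying descriptors


def produce(offset: int, payload: bytes) -> None:
    """Write the payload into shared memory once, then post only (offset, length)."""
    end = offset + len(payload)
    if offset < 0 or end > len(shared):
        raise ValueError("payload does not fit in the shared window")
    shared[offset:end] = payload
    mailbox.append((offset, len(payload)))


def consume() -> memoryview:
    """Pop a descriptor and return a view onto the data; no bytes are copied."""
    offset, length = mailbox.popleft()
    return memoryview(shared)[offset:offset + length]
```

The message itself stays a fixed, small size no matter how large the payload is, which is why this pattern pairs well with a low-latency notification such as IPC Notify.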



## **Boot** | Boot Flow and Bootloader

- SOC ROM boots a secondary bootloader (SBL). SBL included in MCU+ SDK
- Boot modes supported by SBL
  - UART
  - OSPI
  - MMC/SD (coming soon)
- Features
  - Ability to boot all MCU CPUs on the SOC
  - Optionally, do multi-stage booting, to reduce boot time for "early response" functions
  - Post build tools to convert compiler generated applications into format suitable for flashing and booting
  - Flashing tool to flash application binaries over UART







## Sitara MCU | Debug and trace tools

- Code Composer Studio from TI (free download)
  - Eclipse based IDE
  - C6x, R5F, M4F debug and trace via JTAG
    - single step, breakpoint, watch point, disassembly
  - FreeRTOS aware (Real-time Object View ROV)
  - Powerful scripting via Debug Server Scripting (DSS)
  - Access to system memory and peripheral registers through Debug Access Port
  - Multicore debugging
  - Multiprocessor debugging

- TRACE32 from Lauterbach
  - Support C6x, ARMv7 (R5F, M4F)
  - OS-aware debugging: FreeRTOS, AUTOSAR
  - Powerful script language
  - Easy high-level and assembler debugging
  - Support for CoreSight components like Debug Access Port, Trace Funnel, Trace Port Interface Unit, Embedded Trace Buffer, Cross Trigger Interface, Cross Trigger Matrix, System Trace Port, Trace Memory Controller
  - Real-time access to system memory and peripheral registers through Debug Access Port without halting the core
  - Multicore debugging
  - Multiprocessor debugging
  - Safety tool kit and certifications making it usable for ISO 26262 and DO-178C



http://www.ti.com/tool/download/CCSTUDIO



www.lauterbach.com/ Search for CHIP = TI





## **Tools** | SysConfig Features and Benefits

| Feature                                                                            | Benefit to end user                                                                                                                                                                                                   |
|------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| SW driver configuration                                                            | Quickly add a peripheral to your project with interactive configuration, without needing to refer to the TRM or API guide. Only the features and modules that are needed are included, saving code size and complexity. |
| Integrated pinmux configuration                                                    | No need for a separate pinmux tool; resolve pin conflicts interactively, without running on the EVM                                                                                                                    |
| Clock enable and clock frequency setup depending on the SW driver that is selected | Hides the complexity of DMSC / SYSFW API calls from the end user. Only modules that are needed get enabled.                                                                                                            |
| MPU / MMU / RAT configuration                                                      | Easily set memory access controls for the CPU without needing to read the TRM                                                                                                                                         |
| Debug log configuration to select UART, CCS or shared memory output                | Target logs to the console of interest; quickly enable/disable logs and log "zones"                                                                                                                                   |
| IPC configuration                                                                  | Tune shared memory usage and IPC type (low latency vs normal latency)                                                                                                                                                 |
| Board peripheral configuration (FLASH, EEPROM, LED, ETHPHY)                        | Pick and choose board-level peripherals and quickly start using them in your project                                                                                                                                  |
| Multi-core validation                                                              | Resolve resource conflicts, including across multiple R5F and M4F cores, before running on the EVM                                                                                                                    |





## Tools | MCU+ Academy

#### Easy-to-use training modules for Sitara MCU developers

- AM243x EVM Quick Start Guide
- SDK Setup Guide with LED Blink Example
- Learn MCU+ SDK Fundamentals
  - Projects and files structure
  - SysConfig Tool
  - Adding Drivers
  - Linker file, Multicore example
  - Debugging, Flashing tools
- Learn Advanced Topics
  - Benchmarking (CMSIS with Cycle Counter)
  - Multicore communication
  - Direct Memory Access

The MCU+ Academy is a great resource for developers to learn about the Sitara™ MCU Platform.



## R5F development using CCS

- Start from template CCS projects in AM64x MCU+ SDK:
  - <mcu\_plus\_sdk\_am64x\_x\_x\_x\_x>\examples\empty\am64x-evm
- Make sure to go through the EVM setup in the documentation
- MCU+ Academy:
  - Made for AM243x but applicable to AM64x in principle
  - Files that are specific to AM243x should be renamed for AM64x; e.g., library names referenced in Part 4 should replace "am243x" with "am64x"





## Demo



- No-RTOS LED blink on R5F0\_0
- Run Linux with A53 Cluster



- Linux OS (Ubuntu)
- Largely Command Line
- make files
- kgdb/gdb
- printf()
- TFTP/NFS

- Windows OS
- CCS GUI
- Projects
- GUI Debugging
- Console
- JTAG



## MCU+ Differences for Linux

[Screenshot: SysConfig view of an R5F no-RTOS example, showing a USB (TinyUSB) module instance]

## Launching R5F application from Linux

- Steps needed to make an R5F application launchable from Linux:
  1. Add IPC to syscfg, similarly to the benchmark demo:

     mcu\_plus\_sdk\_am64x\_x\_x\_x\_x\examples\motor\_control\benchmark\_demo\am64x-evm\r5fss0-0\_nortos\example.syscfg
  2. Make sure there is no resource conflict with Linux. For example, in the MCU+ Academy Part 4 project, UART should be removed from syscfg, or a UART instance other than 0 should be used.
  3. Add a .resource\_table section to DDR in the linker command file, or simply reuse the linker command file from the benchmark demo:

     mcu\_plus\_sdk\_am64x\_x\_x\_x\_x\examples\motor\_control\benchmark\_demo\am64x-evm\r5fss0-0\_nortos\ti-arm-clang\linker.cmd
  4. Build the CCS project and strip all symbols from the .out file:

     C:/ti/ccs1040/ccs/tools/compiler/ti-cgt-armllvm\_1.3.0.LTS/bin/tiarmstrip -o=<stripped binary file> <original .out file>
  5. Copy the stripped .out file to the Linux filesystem and point the symbolic link /lib/firmware/am64-main-r5f0\_0-fw (or am64-main-r5f0\_1-fw, etc.) at it.
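
Steps 4 and 5 are easy to script. The sketch below builds the strip command line in the form quoted in step 4 and creates the firmware symbolic link from step 5. The helper names `strip_command` and `install_firmware` and the `FIRMWARE_LINKS` table are hypothetical; the strip step is meant for the build host and the link step for the target, and the default `/lib/firmware` location follows the step above.

```python
from pathlib import Path
from typing import List

# Link names follow the pattern quoted in step 5.
FIRMWARE_LINKS = {
    "r5f0_0": "am64-main-r5f0_0-fw",
    "r5f0_1": "am64-main-r5f0_1-fw",
}


def strip_command(tiarmstrip: str, original_out: str, stripped_out: str) -> List[str]:
    """Build the step 4 command line; pass the result to subprocess.run()."""
    return [tiarmstrip, f"-o={stripped_out}", original_out]


def install_firmware(stripped_out: Path, core: str,
                     firmware_dir: Path = Path("/lib/firmware")) -> Path:
    """Point the firmware link for 'core' at the stripped image (step 5)."""
    link = firmware_dir / FIRMWARE_LINKS[core]
    if link.is_symlink() or link.exists():
        link.unlink()  # replace a link left over from an earlier image
    link.symlink_to(stripped_out)
    return link
```

Keeping the link name fixed and swapping only its target means the Linux side never needs to know the name of the application that was built.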



## **Processor SDK Linux Kernel**

- Based on kernel.org longterm maintenance kernels
  - The community provides bug fixes for about 2 years
- Moves to a new longterm kernel annually
  - Allows users to pick up new features and capabilities

[Figure: Processor SDK Linux stack, with layers labeled Kernel, SoC, and Hardware]

## Processor SDK Linux Bootloader





## Processor SDK Linux Filesystem



## **Processor SDK Linux Example Applications**





## Processor SDK Linux Toolchain



## **Processor SDK Linux Arago Distribution**



## **Processor SDK Linux Summary**



## **Summary**

- AM64x is a powerful, heterogeneous multicore processor that meets a variety of system needs
- SDKs and tools are provided to facilitate software development on all cores
- Training and examples are available to accelerate development

### **Future Webinars**

- AM64x Multiple Industrial Protocols
  - October 19 at 10 AM CST and October 21 at 10 PM CST
- AM64x IPC
  - November 16 at 10 AM CST and November 18 at 10 PM CST
- More information, registration, and videos: https://training.ti.com/process-monthly-webinar-series



# **Backup**



## References

AM64x Linux Software Development Kit (SDK)

https://www.ti.com/tool/PROCESSOR-SDK-AM64X

AM64x MCU+ Software Development Kit (SDK)

https://www.ti.com/tool/download/MCU-PLUS-SDK-AM64X

AM64x Linux SDK User Guide

https://software-dl.ti.com/processor-sdk-linux/esd/AM64X/latest/exports/docs/devices/AM64X/index.html

AM64x MCU+ SDK Documentation

https://software-dl.ti.com/mcu-plus-sdk/esd/AM64X/08_00_00_21/exports/docs/api_guide_am64x/index.html

#### AM64x EVM:

https://www.ti.com/tool/TMDS64GPEVM

#### AM64x SK board:

https://www.ti.com/tool/SK-AM64

