Sitara™ Processors: Running TI-RTOS on the ARM Cortex™-M4 Processor
Agenda
• Dual-core ARM Cortex-M4 Image Processing Unit (IPU) Subsystem
– Memory Map
– UNICACHE and MMU
– Bit-band
• Create Cortex-M4 Applications
• Load and Run Cortex-M4 Applications
• M4 Use Case: IVA-HD Support
– SMP/BIOS on Cortex-M4 Cores
– Multimedia Software Stack
• For More Information
Dual-core ARM Cortex-M4 IPU Subsystem
Dual-core ARM Cortex-M4 Overview
• The ARM Cortex-M4 processor targets high-performance, low-cost devices across a broad range of embedded market segments, including digital signal control.
[Figure: AM5728 block diagram showing the ARM M4 cores]
• Sitara AM57x processors instantiate two dual-core ARM Cortex-M4 Image Processing Unit (IPU) subsystems:
o IPU1 subsystem is available for general purpose usage.
o IPU2 subsystem is dedicated to IVA (Image & Video Accelerator) support and not available for other processing.
IPUx Subsystem Overview
• Each IPU subsystem contains two ARM Cortex-M4 processor cores:
– IPUx_C0
– IPUx_C1
• The two IPUx cores share a common Level 1 (L1) cache: IPUx_UNICACHE
• Level 2 (L2) master interface (MIF) splitter for access to memory or configuration port
• Configuration port: Used for unicache maintenance and
IPU Memory Map
• (1) At reset, the AMMU loads small page 0 so that the L2 RAM (physical address 0x5502_0000) appears at virtual address 0x0. Small page 1 maps the physical address of the shared cache MMU registers and the IPU_WUGEN registers to the virtual address 0x4000_0000.
var entry = AMMU.smallPages[0];
entry.pageEnabled = AMMU.Enable_YES;
entry.translationEnabled = AMMU.Enable_YES;
entry.logicalAddress = 0x00000000;
entry.translatedAddress = 0x55020000;
entry.size = AMMU.Small_16K;
entry.L1_cacheable = AMMU.CachePolicy_NON_CACHEABLE;
/* Overwrite smallPage[1] so that 16K is covered. H/w reset value configures only 4K */
entry = AMMU.smallPages[1];
entry.pageEnabled = AMMU.Enable_YES;
entry.translationEnabled = AMMU.Enable_YES;
entry.logicalAddress = 0x40000000;
entry.translatedAddress = 0x55080000;
entry.size = AMMU.Small_16K;
• (2) Can also be accessed from L3_MAIN by other initiators, such as the MPU and DSP.
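The reset-time mapping in note (1) can be illustrated with a small host-side model in C. This is only a sketch: the type `ammu_small_page_t` and the function `ammu_translate` are hypothetical names invented for illustration and are not part of any TI API. The page values are taken from the two smallPages entries shown above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of one AMMU small page (not a TI API). */
typedef struct {
    uint32_t logical_addr;     /* virtual base seen by the Cortex-M4 */
    uint32_t translated_addr;  /* physical base after translation    */
    uint32_t size;             /* page size in bytes                 */
} ammu_small_page_t;

#define AMMU_SMALL_16K 0x4000u

/* The two pages configured above: smallPages[0] and smallPages[1]. */
static const ammu_small_page_t reset_pages[] = {
    { 0x00000000u, 0x55020000u, AMMU_SMALL_16K },  /* page 0: L2 RAM */
    { 0x40000000u, 0x55080000u, AMMU_SMALL_16K },  /* page 1: cache MMU and WUGEN registers */
};

/* Translate a virtual address; returns false when no page covers it. */
static bool ammu_translate(const ammu_small_page_t *pages, size_t count,
                           uint32_t va, uint32_t *pa)
{
    for (size_t i = 0; i < count; i++) {
        /* Unsigned subtraction wraps when va is below the page base,
         * so a single comparison checks both bounds. */
        uint32_t offset = va - pages[i].logical_addr;
        if (offset < pages[i].size) {
            *pa = pages[i].translated_addr + offset;
            return true;
        }
    }
    return false;
}
```

With these two pages, virtual address 0x0000_0100 resolves to 0x5502_0100 and 0x4000_0010 resolves to 0x5508_0010. An address just past the first 16K (0x0000_4000) is not covered by either page.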
IPUx_UNICACHE and AMMU
• IPUx_UNICACHE allows basic maintenance operations, which are performed through a dedicated interface:
– Preload
– Lock
– Clean
– Invalidate
• The Attribute MMU (AMMU) for the UNICACHE, IPUx_UNICACHE_MMU, provides the multi-access cache with region-based address translation, read/write control, access
IPUx_UNICACHE Configuration
IPUx_UNICACHE_MMU Configuration
UNICACHE and AMMU TI-RTOS Configuration
ipc_3_43_01_03/examples/DRA7XX_linux_elf/ex02_messageq/ipu1/IpuAmmu.cfg
/* --- Cache ---*/
var Cache = xdc.useModule('ti.sysbios.hal.unicache.Cache');
Cache.enableCache = true;
/* --- AMMU ---*/
var AMMU = xdc.useModule('ti.sysbios.hal.ammu.AMMU');
/*********************** Small Pages *************************/
/* smallPages[0] & smallPages[1] are auto-programmed by h/w */
/* L2RAM: 64K mapped using 4 smallPages(16K); cacheable; translated */
/* config small page[2] to map 16K VA 0x20000000 to PA 0x55020000 */
AMMU.smallPages[2].pageEnabled = AMMU.Enable_YES;
AMMU.smallPages[2].logicalAddress = 0x20000000;
AMMU.smallPages[2].translatedAddress = 0x55020000;
AMMU.smallPages[2].translationEnabled = AMMU.Enable_YES;
AMMU.smallPages[2].L1_writePolicy = AMMU.WritePolicy_WRITE_BACK;
AMMU.smallPages[2].L1_allocate = AMMU.AllocatePolicy_ALLOCATE;
AMMU.smallPages[2].L1_posted = AMMU.PostedPolicy_POSTED;
AMMU.smallPages[2].L1_cacheable = AMMU.CachePolicy_CACHEABLE;
AMMU.smallPages[2].size = AMMU.Small_16K;
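The configuration comment says the 64K L2RAM is covered by four 16K small pages, while only smallPages[2] is shown. The C sketch below computes the base addresses such a set of pages would need. It assumes, without confirmation from the excerpt, that the remaining pages occupy consecutive slots and map contiguous 16K chunks. The names `l2ram_page_t` and `l2ram_pages` are hypothetical.

```c
#include <stdint.h>

#define L2RAM_VA_BASE   0x20000000u  /* logicalAddress of smallPages[2]    */
#define L2RAM_PA_BASE   0x55020000u  /* translatedAddress of smallPages[2] */
#define SMALL_PAGE_SIZE 0x4000u      /* AMMU.Small_16K                     */
#define L2RAM_SIZE      0x10000u     /* 64K of L2RAM                       */
#define L2RAM_NUM_PAGES (L2RAM_SIZE / SMALL_PAGE_SIZE)

/* Hypothetical record describing one small-page slot (not a TI type). */
typedef struct {
    unsigned index;            /* smallPages[] slot                */
    uint32_t logical_addr;     /* virtual base of this 16K chunk   */
    uint32_t translated_addr;  /* physical base of this 16K chunk  */
} l2ram_page_t;

/* Fill out[] with one entry per 16K chunk, starting at slot first_index.
 * Assumption: slots are consecutive and the mapping is contiguous.
 * Returns the number of entries written. */
static unsigned l2ram_pages(unsigned first_index,
                            l2ram_page_t out[L2RAM_NUM_PAGES])
{
    for (unsigned i = 0; i < L2RAM_NUM_PAGES; i++) {
        out[i].index           = first_index + i;
        out[i].logical_addr    = L2RAM_VA_BASE + i * SMALL_PAGE_SIZE;
        out[i].translated_addr = L2RAM_PA_BASE + i * SMALL_PAGE_SIZE;
    }
    return L2RAM_NUM_PAGES;
}
```

Calling `l2ram_pages(2, pages)` yields four entries. The first matches the smallPages[2] values above, and the last maps virtual 0x2000_C000 to physical 0x5502_C000.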
IPUx_MMU
• An additional MMU provides address translation for the accesses done from the IPUx
subsystem to the L3_MAIN interconnect. The main characteristics of this MMU:
− 32 entries
− Page-based or access-based endianness conversion
− Two-level descriptor hierarchy
− One intermediate page table
− Four page sizes (16 MiB, 1 MiB, 64 KiB, 4 KiB)
• Configuration: see the IPC resource table.
Bit-Banding Overview
• Bit-banding is an optional feature of the Cortex-M4 processor. Bit-banding maps a
complete word of memory onto a single bit in the bit-band region. For example, writing
to one of the alias words sets or clears the corresponding bit in the bit-band region.
– Enables every individual bit in the bit-band region to be directly accessible from a word-aligned address using a single LDR instruction.
– Enables individual bits to be toggled without performing a read-modify-write sequence of instructions.
• The two ARM Cortex-M4 cores share the same memory system, so the bit-band feature can be used to implement semaphore operations between them.
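The alias-word mapping described above follows the formula in the ARM Cortex-M4 documentation: alias address = alias base + (byte offset × 32) + (bit number × 4). The sketch below computes it in C. The region and alias base addresses are the standard Cortex-M4 values; the function name `bitband_alias` is a hypothetical helper. On the IPU these are virtual addresses, so the result is only usable if the AMMU maps memory into the bit-band region.

```c
#include <stdint.h>

/* Standard Cortex-M4 bit-band regions and their alias regions. */
#define BITBAND_SRAM_BASE     0x20000000u
#define BITBAND_SRAM_ALIAS    0x22000000u
#define BITBAND_PERIPH_BASE   0x40000000u
#define BITBAND_PERIPH_ALIAS  0x42000000u
#define BITBAND_REGION_SIZE   0x00100000u  /* 1 MiB per region */

/* Return the alias word address for bit `bit` (0..7) of the byte at `addr`.
 * Returns 0 if the address is outside both bit-band regions or the bit
 * number is out of range. */
static uint32_t bitband_alias(uint32_t addr, unsigned bit)
{
    uint32_t band_base;
    uint32_t alias_base;

    if (bit > 7u) {
        return 0u;
    }
    if (addr - BITBAND_SRAM_BASE < BITBAND_REGION_SIZE) {
        band_base  = BITBAND_SRAM_BASE;
        alias_base = BITBAND_SRAM_ALIAS;
    } else if (addr - BITBAND_PERIPH_BASE < BITBAND_REGION_SIZE) {
        band_base  = BITBAND_PERIPH_BASE;
        alias_base = BITBAND_PERIPH_ALIAS;
    } else {
        return 0u;
    }
    /* Each bit of the region gets its own 32-bit alias word. */
    return alias_base + (addr - band_base) * 32u + bit * 4u;
}
```

For example, bit 7 of the byte at 0x2000_0000 is aliased at 0x2200_001C, and bit 0 of the last byte in region 1 (0x200F_FFFF) is aliased at 0x23FF_FFE0. On the target, writing 1 or 0 to the alias word through a `volatile uint32_t *` sets or clears that single bit.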
Each ARM Cortex-M4 processor supports two bit-band regions:
• Bit-band Region 1 applies to the virtual address space 0x2000 0000–0x200F FFFF (1 MiB). It is recommended that the user map the L2 IPUx_RAM (64 KiB) to this virtual space and use it only for bit-banding operations.
• Bit-band Region 2 applies to the virtual address space 0x4000 0000–0x400F FFFF (1
Bit-Banding in ARM Cortex-M4
• Many of the components running on the IPUs, including IPC, must access peripherals physically located in this bit-band region 2. As a result, these accesses must be performed indirectly using a virtual memory address, mapped using the IPU's AMMU.
• By convention, all components map this memory using one large AMMU page that maps 512 MiB of physical memory beginning at 0x4000:0000 to virtual memory beginning at 0x6000:0000. The components then access the peripherals through the 0x6XXX:XXXX address space.
AMMU.largePages[3].pageEnabled = AMMU.Enable_YES;
AMMU.largePages[3].logicalAddress = 0x60000000;
AMMU.largePages[3].translatedAddress = 0x40000000;
AMMU.largePages[3].translationEnabled = AMMU.Enable_YES;
AMMU.largePages[3].size = AMMU.Large_512M;
/* configure the interrupt cross-bar mmr base address */
var IntXbar = xdc.useModule('ti.sysbios.family.shared.vayu.IntXbar');
IntXbar.mmrBaseAddr = 0x6A002000;
/* configure hardware spin lock base address to match ammu mapping of L3/L4 */
var GateHWSpinlock = xdc.useModule('ti.sdo.ipc.gates.GateHWSpinlock');
GateHWSpinlock.baseAddr = 0x6A0F6800;
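The two base addresses in this configuration follow from the large-page mapping: a peripheral at physical address P inside the 512M window is reached at P - 0x4000_0000 + 0x6000_0000. A minimal C sketch of that arithmetic is shown below. The helper name `periph_phys_to_ipu_virt` is hypothetical, and the physical addresses 0x4A00_2000 and 0x4A0F_6800 are inferred by applying the mapping in reverse to the configured values.

```c
#include <stdbool.h>
#include <stdint.h>

#define LARGE_PAGE_VA   0x60000000u  /* AMMU.largePages[3].logicalAddress    */
#define LARGE_PAGE_PA   0x40000000u  /* AMMU.largePages[3].translatedAddress */
#define LARGE_PAGE_SIZE 0x20000000u  /* AMMU.Large_512M                      */

/* Convert a peripheral physical address to the virtual address the IPU
 * must use under the large-page mapping above. Returns false when the
 * physical address lies outside the mapped 512M window. */
static bool periph_phys_to_ipu_virt(uint32_t pa, uint32_t *va)
{
    uint32_t offset = pa - LARGE_PAGE_PA;  /* wraps if pa is below the window */
    if (offset >= LARGE_PAGE_SIZE) {
        return false;
    }
    *va = LARGE_PAGE_VA + offset;
    return true;
}
```

Under this mapping, 0x4A00_2000 becomes 0x6A00_2000 (the IntXbar.mmrBaseAddr value) and 0x4A0F_6800 becomes 0x6A0F_6800 (the GateHWSpinlock.baseAddr value).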
Create Cortex-M4 Applications
Bare-metal Examples
http://processors.wiki.ti.com/index.php/Processor_SDK_Bare_Metal_Examples
More Bare-metal Examples
• For bare-metal firmware development, no functional CSL is provided for the IPU cores. However, register-level CSL for the IPU cores is available here:
– pdk_am57xx_1_0_3\packages\ti\csl\soc\am572x\src
– pdk_am57xx_1_0_3\packages\ti\csl\soc\am571x\src
• More bare-metal Cortex-M4 examples are available in the Processor SDK RTOS package.
TI-RTOS Examples
PDK Driver Examples
• PDK from Processor SDK contains Windows and Linux scripts to create example and
test CCS projects for all PDK sub-components:
pdkProjectCreate.bat [soc] [board] [endian] [module] [processor] [pdkDir]
Description: (first option is default)
soc - AM335x / AM437x / AM571x / AM572x
board - all
-or-
Refer to "pdk_<soc>_<version>\packages\ti\board\lib" for valid board inputs for the soc
endian - little / big
module - all
-or-
fatfs / gpio / i2c / icss_emac / mmcsd / nimu / nimu_icss /
IPC Examples
Cortex-M4 IPC examples are available in the IPC 3.x examples/ directory:
ipc_3_43_01_03/examples/DRA7XX_linux_elf/ex02_messageq/ipu1 $ ls
bin Ipu1.cfg IpuAmmu.cfg MainIpu1.c makefile Server.c Server.h
ipc_3_43_01_03/examples/DRA7XX_bios_elf/ex01_hello/ipu1 $ ls
ex01_hello_ipu1.gel HelloIpu1.c Ipu1.cfg makefile
Load and Run Cortex-M4 Applications
Wake Up the Cortex-M4 Core
• Typically, the slave cores (IPU, DSP) wait in reset until the master core (MPU A15) wakes them up to run code. When debugging with CCS, this can be done with the GEL script function “AM572x MULTICORE Initialization.”
See: http://processors.wiki.ti.com/index.php/AM572x_GP_EVM_Hardware_Setup#Multi-core_Initialization
• When booting the system from the ROM bootloader (for example, booting an image from the SD card), the secondary boot loader configures the device clocks and DDR, and wakes up the IPU. Therefore, there is no need to use GEL initialization scripts to redo the clock and DDR settings.
• In the IPC framework with Linux running on the Cortex-A15 and SYS/BIOS on the Cortex-M4, the Linux kernel remoteproc driver handles the IPU wake-up sequence. Clocks and DDR are configured by U-Boot.
Load and Run a Cortex-M4 Application
• The Cortex-M4 images are expected in the directory /lib/firmware/ and are loaded during Linux kernel boot-up via the remoteproc driver:

Core   Binary on the Host   Binary on the Target File System
IPU1   server_ipu1.xem4     /lib/firmware/dra7-ipu1-fw.xem4
IPU2   server_ipu2.xem4     /lib/firmware/dra7-ipu2-fw.xem4
DSP1   server_dsp1.xe66     /lib/firmware/dra7-dsp1-fw.xe66
DSP2   server_dsp2.xe66     /lib/firmware/dra7-dsp2-fw.xe66

• To check the Cortex-M4 core boot log:
Cortex-M4 Use Case: IVA-HD Support
SMP/BIOS on Cortex-M4 Cores
• Symmetric Multiprocessing (SMP) involves a single OS instance managing processing
on two or more identical processor cores that share a common view of memory and
peripherals.
• SMP/BIOS is an operational mode of SYS/BIOS that is supported on a few dual-core
Cortex-M3/M4 subsystems (IPU) and multi-core Cortex-A15 subsystems present on
several TI SoC devices.
Cortex-M4 Use Case: IVA-HD Support - ipumm
Software stack of accelerated codec encoding/decoding
Cortex-M4 Use Case: ipumm
• ipumm is the part of the TI multimedia components that uses the hardware-accelerated video codecs in IVA-HD. It contains the server side of the distributed codec engine that drives the HW codecs.
• ipumm is publicly available at https://git.ti.com/ivimm/ipumm
ipumm/build
common.bld config.bld
ipumm/platform/ti/configs/vayu
IpcCommon.cfg.xs Ipu2Smp.cfg IpuAmmu.cfg
/* Configure BIOS for SMP-mode */
var BIOS = xdc.useModule('ti.sysbios.BIOS');
BIOS.smpEnabled = true;
/* --- CORE0 ---*/
var MultiProc = xdc.useModule('ti.sdo.utils.MultiProc');
MultiProc.setConfig("IPU2", ["HOST", "IPU2", "IPU1", "DSP2", "DSP1"]);
/* We are IPU2 */
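The name list passed to MultiProc.setConfig determines each processor's numeric ID. To our understanding, the ID is the position of the name in the list, which is how the IPC documentation describes it, but this should be verified against the IPC Users Guide for the release in use. The C sketch below models that lookup on the host. It does not call the real MultiProc API; `proc_names`, `proc_id`, and `INVALID_ID` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Same order as the name list in the MultiProc.setConfig call above. */
static const char *const proc_names[] = {
    "HOST", "IPU2", "IPU1", "DSP2", "DSP1"
};
#define NUM_PROCS  (sizeof(proc_names) / sizeof(proc_names[0]))
#define INVALID_ID 0xFFFFu  /* hypothetical "not found" marker */

/* Host-side model: return the list position of `name`, or INVALID_ID.
 * Assumption: IDs are assigned by position in the name list. */
static unsigned proc_id(const char *name)
{
    for (size_t i = 0; i < NUM_PROCS; i++) {
        if (strcmp(proc_names[i], name) == 0) {
            return (unsigned)i;
        }
    }
    return INVALID_ID;
}
```

Under this assumption, HOST is 0, IPU2 (the local processor in this configuration) is 1, and DSP1 is 4. Every core in the system must use the same list order for the IDs to agree.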
For More Information:
• Processor SDK RTOS Software Developer Guide
• IPC Users Guide
• Cortex-M4 Technical Reference Manual: Bit-Banding
• Processor Training: Software Stack of Accelerated Codec Encoding/Decoding
• For questions about this training, refer to the E2E Community website:
http://e2e.ti.com