The Freescale Embedded Hypervisor


November, 2010

(2)

Agenda

AMP Considerations

Technical Overview of the Freescale Embedded Hypervisor

ePAPR & Device Trees Overview

Porting an OS to the Freescale Embedded Hypervisor

Hypervisor Boot Flow

(3)
(4)

Partitioning Multicore Systems

[Figure: multicore system hardware with four CPUs, a shared cache, memory, I/O, and an interrupt controller]

(5)

Symmetric Multiprocessing (SMP)

[Figure: a single Linux® instance and its applications running across all four CPUs of the multicore system hardware (shared cache, memory, I/O, interrupt controller)]

(6)

Asymmetric Multiprocessing (AMP)

[Figure: four CPUs divided among Linux®, an RTOS, and a legacy OS, each running its own applications with its own memory and I/O, on multicore system hardware with a shared cache, I/O, and an interrupt controller]

(7)

Unsupervised AMP (Asymmetric Multiprocessing)

Requires partitioning hardware resources:

Private resources: CPUs, memory, I/O devices

Shared resources: memory, devices

Doing this cooperatively (all operating systems well-behaved) presents challenges

[Figure: Linux®, an RTOS, and a legacy OS running directly on partitioned CPUs, memory, and I/O of the multicore system hardware, with the shared cache, I/O, and interrupt controller shared among them]

(8)

Supervised AMP

[Figure: Linux®, an RTOS, and a legacy OS each running in its own partition with its own CPUs, memory, and I/O; a hypervisor/supervisor layer sits between the partitions and the multicore system hardware (shared cache, I/O, interrupt controller)]

Hypervisor Software

Analogous to the role of an operating system kernel in managing user processes

More privileged than operating systems

Enforces system security

Manages globally shared resources

Virtualizes some resources– e.g.

(9)

Use Cases

Virtualization enables running multiple OSes on a system at the same time

Why do this?

Oversubscription or underutilized resources

Run multiple OSes on a single underutilized CPU

Consolidation

Multiple operating systems/partitions on a single multicore chip

Multiple homogeneous operating systems in an AMP configuration on multiple cores

Divided workload (e.g. control plane, data plane)

Multiple operating systems, possibly heterogeneous, need to work securely and seamlessly together. Isolation mechanisms are needed for safety and robustness. Efficient inter-partition communication mechanisms are needed for cooperation.

Security

Isolate untrusted software/sandboxes

Migration

Migrate functionality from legacy RTOS to another OS (e.g. Linux).

[Figure: example arrangements of guest OSes, CPUs, and memory running on a hypervisor]

(10)

Unsupervised AMP Considerations

Security: all OSes must be trusted and well behaved because any OS can map any physical address

How will global resources be initialized, shared, and/or used?

How are global events handled?

MPIC: who initializes?

PAMU: who will initialize and set up the PAACT?

(11)

Unsupervised AMP Considerations…continued

CPC (Platform Cache) partitioning: who initializes?

CoreNet coherency domains: how are they set up and initialized?

Scarce resources: P4080 has only 2 DUARTs, GPIO pins

Datapath initialization: how are Fman, Bman, Qman, PME initialized?

OS assumptions about physical address 0x0?

Debugging: how to debug multiple OSes at the same time?

(12)

Unsupervised AMP Considerations…continued

Error management

Who will set up configurable error parameters (e.g. single-bit ECC error threshold) in DDR, CPC, etc.?

Where is the platform error interrupt (interrupt 0) routed to? How is it handled?

Error conditions: global DDR, CPC, CCM, internal SRAM

How are device errors handled? PCIe, Qman, Bman, Fman

Who handles PAMU access violations (interrupt 8)?

If one OS crashes, how will recovery occur? How will other OSes know?

Boot

What is the boot sequence for starting all OSes? Do they all start at the same time?

Is there a 'primary' OS that takes care of global initialization?

How will the sequencing work?

(13)

Technical Overview of the Freescale Embedded Hypervisor

(14)

Freescale Embedded Hypervisor

[Figure: Linux®, an RTOS, and a legacy OS each running in its own partition with its own CPUs, memory, and I/O; the hypervisor sits between the partitions and the multicore system hardware (shared cache, I/O, interrupt controller)]

(15)

Freescale’s Embedded Hypervisor

A small hypervisor for embedded systems based on Power Architecture® technology (architecture version 2.06)

Initial version focuses on static partitioning

CPUs, memory and I/O devices can be divided into logical partitions

Partitions are isolated from one another

Configuration is fixed until a reconfigure and system reboot

Not addressing the problem of multiple operating systems on 1 CPU

Uses the Embedded Hypervisor feature in the QorIQ/e500mc, which makes virtualization efficient

Uses a combination of full virtualization and para-virtualization, which provides good performance and requires minimal changes to guest operating systems

(16)

Hypervisor Contrasts

Freescale Hypervisor Implementation

Requirement: supervised AMP (isolation, performance)

Implications: no more than one OS per core; OS has direct control of high-speed peripherals

Traditional Hypervisor Implementation

Requirement: high level of virtualization; solves the problem of under-utilized CPUs, plus isolation

Implications: more than one OS per core, complexity, performance implications

QorIQ™ P4080 hypervisor hardware assists in meeting both requirement sets

[Figure: guest OSes mapped onto CPUs under each implementation]

(17)

Hypervisor Features

[Figure: a guest operating system sees a virtual CPU (e500vcpu), boot services (ePAPR), emulation of privileged instructions, a debug stub, and a device tree; through hypercalls it reaches hypervisor services (byte-channels, doorbells, GPIO, PIC, IOMMU, partition management); the hypervisor, with its debug console and UART mux, runs on the system hardware, which also offers a UART and direct I/O]

Operating system sees a virtual core plus hypervisor services

Virtual CPU (like e500mc minus hypervisor features)

Services via hypercall

Debug stub interface for debugging guest operating systems

(18)

MMU – CPU to Memory Access Control

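[Figure: Linux® and an RTOS on separate CPUs; a CPU access to its own partition's memory is marked "Access OK", while an access to the other partition's memory is marked "Access Denied"]

The MMU is controlled by the hypervisor and restricts all CPU accesses to the physical address space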

(19)

IOMMU – Device to Memory Access Control

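[Figure: Linux® and an RTOS on separate CPUs; an I/O device access through the IOMMU to its own partition's memory is marked "Access OK", while an access to the other partition's memory is marked "Access Denied"]

The IOMMU enforces access control on I/O-to-memory accesses

A key component in a securely partitioned system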
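The kind of check an IOMMU performs can be sketched as a lookup of permitted memory windows per device. This is only an illustrative model: the device names, window addresses, and the `DMA_WINDOWS` table are hypothetical and do not reflect the actual PAMU table format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    base: int  # first physical address of the window
    size: int  # window length in bytes

    def contains(self, addr: int, length: int) -> bool:
        # The whole transfer must fall inside the window.
        return self.base <= addr and addr + length <= self.base + self.size


# Hypothetical table: device -> memory windows it may access.
DMA_WINDOWS = {
    "eth0": [Window(0x1000_0000, 0x0100_0000)],  # belongs to partition A
    "usb0": [Window(0x2000_0000, 0x0080_0000)],  # belongs to partition B
}


def dma_allowed(device: str, addr: int, length: int) -> bool:
    """Return True if the device may access [addr, addr + length)."""
    return any(w.contains(addr, length) for w in DMA_WINDOWS.get(device, []))
```

A device with no entry in the table is denied everything, which mirrors the default-deny behaviour a partitioned system would want.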

(20)

Direct I/O

[Figure: the guest's driver accesses the physical hardware directly, bypassing the hypervisor]

PCI

Serial RapidIO®

DMA

USB

SD/MMC

(21)

Direct I/O thru Portal

Network interfaces

Security

Pattern matcher

[Figure: drivers in Guest A and Guest B each access the shared physical hardware through their own portal, bypassing the hypervisor]

(22)

Virtual I/O – Hypervisor

[Figure: the guest's driver reaches the device through a hypercall or emulation; a driver in the hypervisor accesses the physical hardware]

Interrupt controller

I2C

GPIO

Byte-channels

(23)


(24)

ePAPR (Embedded Power Architecture® Platform Requirements)

Defines the boot program to client program interface

Boot program: firmware, hypervisor

Client program: bootloader, hypervisor, OS

Device tree

Data structure that represents a partition's hardware and virtual resources

CPUs, memory, I/O devices, hypervisor-provided resources

Address passed to boot CPU

Multi-CPU boot architecture

Single boot CPU

Mechanisms to start secondary CPUs

(25)

Hypervisor Device Trees– HW, HV, Guest

[Figure: u-boot passes the hardware device tree to the hypervisor, which creates a guest device tree for Guest #1 and for Guest #2]

HW dev tree: loaded into hypervisor memory by u-boot. The hardware dev tree /chosen node points to the HV config tree.

Guest dev tree: dynamically created and loaded into guest memory.

(26)

Guest Device Tree

[Figure: example guest device tree with nodes root, cpus (cpu0, cpu1), memory, virtual interrupt controller, shared memory, hypervisor, byte-channel, and doorbell]

A data structure used for representing a partition's physical and virtual devices

See application note AN3649, "Understanding Device Tree Files in Multicore Hypervisor/LWE Implementations"

AN3649 contains specific details on both hypervisor and partition device trees

(27)

Core Enumeration

From the perspective of a partition, the CPUs are contiguous. However, the actual physical CPUs being utilized may be non-contiguous.

For example, the master hypervisor tree may make physical CPUs 1, 3, and 5 available to a partition. However, from the perspective of the partition, the available CPUs are viewed as CPUs 0, 1, and 2.

Additionally, no CPUs other than those explicitly made available to a
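The renumbering described on this slide can be sketched as a small mapping. The function names are hypothetical, and assigning virtual numbers in ascending physical order is an assumption made for the illustration; the hypervisor may order them differently.

```python
def build_cpu_map(physical_cpus):
    """Map contiguous virtual CPU numbers (0, 1, 2, ...) to physical CPUs.

    Assumption: virtual numbers follow ascending physical CPU order.
    """
    return {virt: phys for virt, phys in enumerate(sorted(physical_cpus))}


def to_physical(cpu_map, virtual_cpu):
    """Translate a partition's CPU number; unassigned CPUs are rejected."""
    if virtual_cpu not in cpu_map:
        raise ValueError(f"CPU {virtual_cpu} is not part of this partition")
    return cpu_map[virtual_cpu]


# The example from the slide: physical CPUs 1, 3, and 5.
cpu_map = build_cpu_map([1, 3, 5])
```

With this map, the partition's CPU 2 resolves to physical CPU 5, and any CPU number outside 0 to 2 is rejected.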

(28)


(29)

Virtual CPU

The behavior of CPU facilities (instructions/registers/interrupts) as seen by an OS: an e500mc minus the hypervisor extensions

Full virtualization is used, i.e. the OS is not hypervisor-aware with respect to CPU behavior

There are some exceptions to this general rule

User and kernel mode privileged instructions and registers behave normally

Hypervisor privileged instructions and register accesses trap to the hypervisor and are emulated by the hypervisor

[Figure: privilege flow across modes. User mode runs ordinary instructions (lwz r3,0(r4); addi r3,r3,1); sc makes a system call into kernel mode; in kernel mode, mfspr r3,spr_dec causes a privilege trap into hypervisor mode, where the hypervisor performs decrementer emulation]
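The trap-and-emulate path for the decrementer read can be sketched as follows. This is a simplified model, not the hypervisor's code: `VirtualCpu`, `vdec`, and `emulate_mfspr` are hypothetical names, and only the DEC register (SPR 22 on Power Architecture) is handled.

```python
SPR_DEC = 22  # SPR number of the decrementer


class VirtualCpu:
    """Minimal guest-visible CPU state kept by a hypothetical hypervisor."""

    def __init__(self):
        self.gpr = [0] * 32  # general-purpose registers r0..r31
        self.vdec = 0        # virtual decrementer value for this guest


def emulate_mfspr(vcpu, rt, spr):
    """Emulate 'mfspr rt, spr' after a privilege trap.

    Returns True if the instruction was emulated, False if this
    sketch does not handle the requested SPR.
    """
    if spr == SPR_DEC:
        vcpu.gpr[rt] = vcpu.vdec & 0xFFFF_FFFF  # DEC is a 32-bit register
        return True
    return False


vcpu = VirtualCpu()
vcpu.vdec = 1234
handled = emulate_mfspr(vcpu, 3, SPR_DEC)  # guest executed: mfspr r3, DEC
```

After emulation the hypervisor would resume the guest at the instruction following the trapped one; that step is omitted here.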

(30)

[Figure: the hypervisor features diagram again, with the hypercall services shown as byte-channels, doorbells, power, MPIC, error, and partition management]

(31)

Hypervisor Services – hcalls

Interrupt controller (MPIC)

Byte-channels – character I/O stream

Inter-partition signaling – doorbell

Partition management

Start/stop/image-loading

Partition management interrupts

Reset

Power management– change clock frequency, power states

Error Management

Future

CPU hot plug/unplug

GPIO– supports partitioning of GPIO pins

IOMMU– supports create/destroy mappings

(32)

VMPIC API

Hypercall                  Description

FH_VMPIC_SET_INT_CONFIG    Configures the specified interrupt

FH_VMPIC_GET_INT_CONFIG    Returns the configuration of the specified interrupt

FH_VMPIC_SET_MASK          Sets the mask for the specified interrupt source

FH_VMPIC_GET_MASK          Returns the mask for the specified interrupt source

FH_VMPIC_GET_ACTIVITY      Returns a value indicating the activity status of an interrupt source, regardless of whether an interrupt has been requested or is in service

FH_VMPIC_IACK              Acknowledges an interrupt and retrieves the interrupt number

FH_VMPIC_EOI               Signals the end of processing for the highest-priority interrupt
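A guest's interrupt path built on these hypercalls would typically acknowledge, dispatch, and then signal end-of-interrupt. The sketch below models that sequence with an in-memory stand-in. `MockVmpic` and `service_one` are hypothetical, the real hypercall arguments and return codes are not shown in the slides, and this mock delivers interrupts in arrival order instead of by priority.

```python
class MockVmpic:
    """Hypothetical stand-in for the VMPIC hypercalls (not the real ABI)."""

    def __init__(self):
        self.unmasked = set()
        self.pending = []
        self.in_service = []

    def set_mask(self, irq, masked):   # models FH_VMPIC_SET_MASK
        if masked:
            self.unmasked.discard(irq)
        else:
            self.unmasked.add(irq)

    def get_mask(self, irq):           # models FH_VMPIC_GET_MASK
        return irq not in self.unmasked

    def raise_irq(self, irq):          # test helper: a device asserts irq
        if irq in self.unmasked and irq not in self.pending:
            self.pending.append(irq)

    def iack(self):                    # models FH_VMPIC_IACK
        if not self.pending:
            return None
        irq = self.pending.pop(0)
        self.in_service.append(irq)
        return irq

    def eoi(self):                     # models FH_VMPIC_EOI
        if self.in_service:
            self.in_service.pop()


def service_one(vmpic, handlers):
    """Acknowledge one interrupt, run its handler, then signal EOI."""
    irq = vmpic.iack()
    if irq is None:
        return None
    try:
        handlers[irq]()
    finally:
        vmpic.eoi()
    return irq
```

The interrupt number passed around here plays the role of the handle the guest obtains from its device tree.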

(33)

Byte-Channels

Byte-channel: a hypercall-based character I/O channel

Flexible endpoint configuration:

A physical UART on the QorIQ™ P4080

Another byte-channel endpoint

A byte-channel to UART multiplexer

A hypervisor debug stub

The hypervisor console

[Figure: partitions, the hypervisor debug console, and debug stubs connect through byte-channels to a byte-channel mux and UART; over RS232, a byte-channel mux server on the host exposes each channel to gdb and telnet sessions]

(34)

Debug Stubs

Hypervisor provides an internal API that allows debug stubs to be created and built into the hypervisor.

Currently mutually exclusive with guest debug mode, where a guest owns the CPU debug resources (debug interrupt and registers)

Two stubs supported today:

gdb

TRK (CodeWarrior)

(35)

Debug Stub Event Flow

[Figure: debug stubs in the hypervisor, one per CPU, sit beneath the OS partitions; they connect through the MUX and UART to a MUX server on the host, which carries the GDB remote serial protocol to GDB]

(36)

Inter-partition Signaling

Mechanism by which operating systems can signal each other

A one-way signal with no payload which results in an external interrupt in the destination partition

One-to-many, many-to-one supported

[Figure: partitions connected through the hypervisor by doorbells; send endpoints in some partitions are wired to receive endpoints in others]
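The doorbell semantics described above (no payload, an external interrupt at each receiver, one-to-many and many-to-one wiring) can be modelled in a few lines. The `Partition` and `Doorbell` classes are hypothetical illustrations and do not represent the hypervisor's interface.

```python
class Partition:
    """Hypothetical partition that counts doorbell interrupts it receives."""

    def __init__(self, name):
        self.name = name
        self.doorbell_interrupts = 0

    def external_interrupt(self):
        self.doorbell_interrupts += 1


class Doorbell:
    """A doorbell with any number of receive endpoints."""

    def __init__(self):
        self._receivers = []

    def add_receive_endpoint(self, partition):
        self._receivers.append(partition)

    def send(self):
        # No payload: every receiver simply gets an external interrupt.
        for partition in self._receivers:
            partition.external_interrupt()


# One-to-many: a single send reaches both receivers.
linux, rtos = Partition("linux"), Partition("rtos")
doorbell = Doorbell()
doorbell.add_receive_endpoint(linux)
doorbell.add_receive_endpoint(rtos)
doorbell.send()
```

Many-to-one follows from the same model: several senders hold a reference to one doorbell that has a single receive endpoint.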

(37)

Partition Management

[Figure: Linux®, an RTOS, and a legacy OS in separate running partitions, each with its own CPUs, memory, and I/O, above the hypervisor and the multicore system hardware (shared cache, I/O, interrupt controller)]

Capabilities

Copy data to/from another partition’s memory (e.g. loading OS images)

Starting other partitions

Rebooting other guests

Notifications: guest watchdog fires, guest requests reboot, error conditions

(38)

Privilege Levels – Guest State (GS) MSR bit

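Configuration      Privilege level      MSR

Under hypervisor   User                 MSR[PR=1][GS=1]

Under hypervisor   Kernel/Supervisor    MSR[PR=0][GS=1]

Under hypervisor   Hypervisor           MSR[PR=0][GS=0]

Bare metal         User                 MSR[PR=1][GS=0]

Bare metal         Kernel/Supervisor    MSR[PR=0][GS=0]

[Figure: in both configurations a partition contains an OS and its applications running on CPU, memory, and I/O; under the hypervisor an extra hypervisor layer sits beneath the OS]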
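The PR/GS combinations listed for this slide can be expressed as a lookup. The bit masks below (PR as bit 14 and GS as bit 28, counting from the least significant bit of the 32-bit MSR) are stated from memory of common Book E definitions and should be treated as assumptions; `privilege_level` is a hypothetical helper.

```python
MSR_PR = 1 << 14  # problem state (user mode); assumed bit position
MSR_GS = 1 << 28  # guest state; assumed bit position

# (under_hypervisor, PR, GS) -> privilege level, exactly as listed on the slide.
LEVELS = {
    (True, 1, 1): "user",
    (True, 0, 1): "kernel/supervisor",
    (True, 0, 0): "hypervisor",
    (False, 1, 0): "user",
    (False, 0, 0): "kernel/supervisor",
}


def privilege_level(msr, under_hypervisor):
    """Return the privilege level for an MSR value, or None if not listed."""
    pr = 1 if msr & MSR_PR else 0
    gs = 1 if msr & MSR_GS else 0
    return LEVELS.get((under_hypervisor, pr, gs))
```

Note that PR=0, GS=0 means kernel mode on bare metal but hypervisor mode once a hypervisor is present, which is why a guest kernel must run with GS=1.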

(39)

Standards

power.org ePAPR

1.0 complete in 8/2008

Resource discovery (device tree)

Multi-CPU boot

power.org Embedded Virtualization Committee

Virtual CPU standard: the behavior of instructions and registers under a hypervisor

Working on RFC to the Power ISA– targeting 2.07

Paravirtualization & standard hcalls

Device tree related– hypervisor node

Shared page mechanisms

(40)
(41)

Boot Program / Client program

Boot Program

Firmware

Second stage bootloader

Hypervisor

Client Program

Second stage bootloader

Hypervisor

Operating system

Other bare metal application

[Figure: boot firmware passes a device tree to an operating system; alternatively, boot firmware passes a device tree to the hypervisor, which passes a device tree to a guest operating system]

(42)

ePAPR 1.0

Power.org released the Power Architecture Platform Requirements (PAPR) specification in August 2006, for desktop/server platforms

ePAPR 1.0 for embedded systems addresses 'boot services': how a boot program initializes hardware and boots a client program

Benefits of standard interfaces

Reduced OS porting effort and cost

Enables the development of standard boot programs (firmware and hypervisors)

Key areas

State of machine when control is transferred to client (e.g. registers, MMU, state of interrupts)

Device discovery: definition of device tree

Multi-CPU boot architecture

(43)

Initial Machine State

ePAPR defines the state of the hardware when control is transferred to a client program

Registers

MMU

CPUs

Memory

(44)

Initial State of Registers

Register          Value

MSR               PR=0 supervisor state
                  EE=0 interrupts disabled
                  ME=0 machine check interrupt disabled
                  IP=0 interrupt prefix, low memory
                  IR=0, DR=0 real mode (see note 1)
                  IS=0, DS=0 address space 0 (see note 1)
                  SF=0, CM=0, ICM=0 32-bit mode

R3                Effective address of the device tree image

R4, R5, R8, R9    0

R7                Size of the boot or secondary IMA in bytes

TCR               WRC=0, no watchdog timer reset will occur

(45)

Initial Mapped Areas (IMA)

A client program's IMA is a region of memory that contains the entry points for a client program.

Requirements:

An IMA shall be virtually and physically contiguous

An IMA shall start at effective address zero (0), which shall be mapped to a naturally aligned physical address

The mapping shall not be invalidated except by a client program's explicit action

The Translation ID (TID) field in the TLB entry shall be zero.

The memory and cache access attributes (WIMGE) have the following requirements:

WIMG unspecified

E=0 (i.e., big-endian)

An IMA may be mapped by a TLB entry larger than the IMA size, provided the MMU guarded attribute is set (G=1)
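Several of the IMA requirements lend themselves to a mechanical check. The sketch below is a hypothetical validator, not part of ePAPR or the hypervisor. It assumes the IMA is covered by a single TLB entry and interprets "naturally aligned" as aligned to that entry's size; both are assumptions made for the illustration.

```python
def check_ima(phys_base, ima_size, tlb_size, tid, e_bit, g_bit):
    """Return a list of violated IMA requirements (empty if none found)."""
    problems = []
    if phys_base % tlb_size != 0:
        problems.append("physical base is not naturally aligned")
    if tlb_size < ima_size:
        problems.append("TLB entry does not cover the whole IMA")
    if tid != 0:
        problems.append("TID must be zero")
    if e_bit != 0:
        problems.append("E must be 0 (big-endian)")
    if tlb_size > ima_size and g_bit != 1:
        problems.append("oversized TLB entry requires G=1")
    return problems


MIB = 1 << 20
exact_fit = check_ima(0x1000_0000, 64 * MIB, 64 * MIB, tid=0, e_bit=0, g_bit=0)
oversized = check_ima(0x1000_0000, 48 * MIB, 64 * MIB, tid=0, e_bit=0, g_bit=0)
```

In this example the 64 MiB entry that exactly fits the IMA passes, while the 48 MiB IMA mapped by a 64 MiB entry is flagged because the guarded attribute is not set.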

(46)

Device Tree Overview

A device tree is a tree data structure with nodes that describe the physical devices in a system

Abstracts most hardware details out of the OS/client: enables firmware to provide an OS with a complete description of the physical hardware in a system (devices, hardware address map, interrupt routing)

Previously OSes were required to have hardcoded information about system hardware.

Provides a basis for booting an operating system under a hypervisor in a partitioned system

Each device node has property/value pairs that describe the device

All nodes have a "binding" which documents required properties– the

(47)

Examples

[Figure: example device tree with nodes /, cpus, cpu@0, cpu@1, soc, ethernet, and serial; the properties of the soc and serial nodes are shown below]

soc {
    compatible = "simple-bus";
    #address-cells = <1>;
    #size-cells = <1>;
    ranges = <0 e0000000 00100000>;
    reg = <e0000000 00000200>;

    serial {
        compatible = "ns16550";
        reg = <4600 100>;
        clock-frequency = <0>;
        interrupts = <a 8>;
        interrupt-parent = < &ipic >;
    };
};
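The soc node's ranges property tells a client how to turn a child's reg address into a parent bus address. As a worked example using the values from this slide (ranges = <0 e0000000 00100000> and the serial node's reg = <4600 100>), the sketch below performs that translation. `translate` is a hypothetical helper that assumes one address cell and one size cell on both sides.

```python
def translate(child_addr, ranges):
    """Translate a child bus address through a 'ranges' property.

    ranges is a list of (child_base, parent_base, size) tuples.
    Returns the parent address, or None if no range covers child_addr.
    """
    for child_base, parent_base, size in ranges:
        if child_base <= child_addr < child_base + size:
            return parent_base + (child_addr - child_base)
    return None


# ranges = <0 e0000000 00100000> from the soc node
soc_ranges = [(0x0, 0xE000_0000, 0x0010_0000)]

# reg = <4600 100> from the serial node
serial_phys = translate(0x4600, soc_ranges)
```

The serial registers therefore sit at 0xe0000000 + 0x4600 = 0xe0004600 in the parent address space; an address at or beyond 0x100000 falls outside the range and does not translate.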

(48)

Device Tree details

ePAPR defines requirements for:

Logical structure of the device tree – node names, paths, properties

Standard properties

Hierarchy & routing of interrupts (including cascaded interrupt controllers)

Representation of

CPUs

Memory

Caches

Device bindings for

PCI

Open PIC and ISA interrupt controllers

Serial devices

Network devices

Device Control Registers (DCR)

Binary format of device tree DTB

DTS syntax

(49)

Standard Properties

compatible

model

phandle

status

#address-cells and #size-cells

reg

virtual-reg

ranges

dma-ranges

interrupts

interrupt-parent

(50)

UART Node Definition Example

serial@11c500 {
    device_type = "serial";
    compatible = "fsl,ns16550", "ns16550";
    reg = <11c500 100>;
    clock-frequency = <0>;
    interrupts = <24 2>;
    interrupt-parent = <&mpic>;
};

(51)

Device Tree Compiler

The typical process of using a device tree in an embedded Linux system is:

An ASCII representation of the device tree is created in a 'device tree source' (DTS) file

DTS is compiled into a binary 'device tree blob' (DTB) file using a device tree compiler (DTC) tool

DTB format is specified in the ePAPR

Firmware loads the DTB into RAM and passes a pointer to the DTB to the operating system kernel when it is started
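A client that receives a DTB pointer usually begins by validating the blob's header. The sketch below parses the fixed-size header as ten big-endian 32-bit fields starting with the magic number 0xd00dfeed, which is my understanding of the version 17 layout; treat the field list as an assumption and check it against the specification. `parse_fdt_header` is a hypothetical helper.

```python
import struct

FDT_MAGIC = 0xD00DFEED
HEADER = struct.Struct(">10I")  # ten big-endian 32-bit words
FIELDS = (
    "magic", "totalsize", "off_dt_struct", "off_dt_strings",
    "off_mem_rsvmap", "version", "last_comp_version",
    "boot_cpuid_phys", "size_dt_strings", "size_dt_struct",
)


def parse_fdt_header(blob):
    """Return the DTB header fields as a dict, or raise ValueError."""
    if len(blob) < HEADER.size:
        raise ValueError("blob is too short to hold a DTB header")
    header = dict(zip(FIELDS, HEADER.unpack_from(blob)))
    if header["magic"] != FDT_MAGIC:
        raise ValueError("bad magic number; not a device tree blob")
    return header


# A synthetic header for demonstration (the offsets are placeholders).
sample = HEADER.pack(FDT_MAGIC, 40, 40, 40, 40, 17, 16, 0, 0, 0)
header = parse_fdt_header(sample)
```

On real hardware the same check would run against the memory the boot program's R3 value points to.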

(52)

Multicore boot architecture

ePAPR describes specifics on how secondary CPUs are booted for a system with multiple CPUs

Default boot architecture

The boot program releases all CPUs from hardware reset

1 CPU is designated to be the client program's boot CPU

All other CPUs are secondary and are placed into a loop where the CPUs spin, waiting for a spin table field to change that directs them where to go

Control is transferred to the client program on the boot CPU

When the client program is ready for secondary cores to start, it releases them by writing the spin table field with the desired address

The architecture allows for other custom-defined secondary CPU
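The spin table handshake can be modelled without real hardware. In the sketch below the entry address starts at 1, and a secondary CPU keeps spinning while the low bit is set; this is my recollection of the ePAPR convention and is labelled as an assumption. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SpinTableEntry:
    entry_addr: int = 1  # assumption: low bit set means "keep spinning"
    r3: int = 0          # value handed to the secondary CPU in r3


def secondary_poll(entry):
    """One iteration of a secondary CPU's spin loop.

    Returns the address to jump to, or None while still held.
    """
    if entry.entry_addr & 1:
        return None
    return entry.entry_addr


def release(entry, addr, r3=0):
    """Client program releases a secondary CPU to start at addr."""
    entry.r3 = r3
    entry.entry_addr = addr  # written last, so r3 is valid when the CPU leaves


entry = SpinTableEntry()
before = secondary_poll(entry)   # still spinning
release(entry, 0x0001_0000, r3=0x00F0_0000)
after = secondary_poll(entry)    # released
```

Real code also needs memory barriers and cache maintenance around the final write so the spinning CPU observes the fields in order; the model leaves that out.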

(53)

Device Tree and Virtualization/Partitioning

Each OS in a partitioned system is presented with a device tree describing that partition's subset of physical resources

CPU cores

Memory

I/O devices

(54)

Porting an OS to the Freescale Embedded Hypervisor

(55)

Porting Overview

Initial State and Boot

CPU

SOC Platform

(56)

Initial State and Boot

An OS must be minimally device tree aware

Interrupt numbers passed in the device tree are handles used in VMPIC API hcalls

In order to boot multiple CPUs, the spin table release mechanism must be used

(57)


(58)

CPU Considerations

The virtual CPU as seen by an OS behaves as a normal OS would expect

(59)

SOC Platform Devices

The following devices are hypervisor-owned and should not be accessed directly:

Interrupt controller (see vmpic services)

The MPIC timers are not available for guest use

Global Utilities

Power management

Clock control

Reset control

Peripheral Access Management Unit (PAMU)

DDR memory controllers

CPC (Platform cache)

(60)

Hypervisor Services

Interrupt controller (MPIC)

Byte-channels – character I/O stream

Inter-partition signaling – doorbell

Partition management

Start/stop/image-loading

Partition management interrupts

Reset

Power management– change clock frequency, power states

Error Management

Future

CPU hot plug/unplug

GPIO– supports partitioning of GPIO pins

IOMMU– supports create/destroy mappings

(61)
(62)

Hypervisor Device Trees– HW, HV, Guest

[Figure: u-boot passes the hardware device tree and the HV config tree to the hypervisor, which creates a guest device tree for Guest #1 and for Guest #2]

HW dev tree: loaded into hypervisor memory by u-boot. The hardware dev tree /chosen node points to the HV config tree.

Guest dev tree: dynamically created and loaded into guest memory.

(63)

U-boot

U-boot initialization

Configuration of physical memory map (LAWs)

Probing/initialization of DDR

Enabling all caches, including platform cache

Configuration of clocks

Setting up LIODNs

Sets up ePAPR spin table for secondary CPUs, releases secondary CPUs

Loads hypervisor image into memory

Updates hardware device tree, loads it into memory

(64)

Hypervisor

Hypervisor initialization

PAMU initialization

Coherence domain setup: set up LAWs, CSDIDs

Error Configuration– e.g. single bit ECC error thresholds

Driver initialization

DDR

CPC

CCM

UART

Release secondary CPUs

Partition instantiation/creation

The boot CPU for each partition takes care of partition creation

(65)

Hypervisor Relocation

[Figure: memory layout in two views, each starting at address 0x0. Boot time view: HV. Runtime view: HV and OS]

(66)

Summary

A common usage model for multicore systems will be to run multiple operating systems on a single processor; this requires partitioning.

A hypervisor provides a good solution to enforce partition boundaries and provide services to manage global resources (like the interrupt controller).

The Freescale Embedded Hypervisor in conjunction with hardware features of the QorIQ™ P4080 provides an efficient solution for

(67)

References
