• No results found

MxKernel. A System Software Architecture for Modern Hardware. Jan Mühlig. Michael Müller Jens Teubner. Olaf Spinczyk. mxkernel.org

N/A
N/A
Protected

Academic year: 2022

Share "MxKernel. A System Software Architecture for Modern Hardware. Jan Mühlig. Michael Müller Jens Teubner. Olaf Spinczyk. mxkernel.org"

Copied!
27
0
0

Loading.... (view fulltext now)

Full text

(1)

MxKernel

A System Software Architecture for Modern Hardware

mxkernel.org

Jan Mühlig Jens Teubner

Databases and Information Systems

Michael Müller Olaf Spinczyk

Embedded Software Systems

(2)

12/03/2021 Olaf Spinczyk: MxKernel 2

© Olaf Spinczyk

Traditional Operating Systems

… in the context of modern multicore and manycore systems

Single CPU

User/supervisor mode Uniform physical memory

MMU: Virtual memory Global I/O controllers

Hardware

Many CPU cores Heterogeneous cores Complex NUMA architecture

Non-volatile memory Non-uniform I/O architecture

Voltage/frequency islands Aging effects

Past

Future

CPU Multiplexing CPU Multiplexing Monolithic architecture Monolithic architecture Huge virtual address spaces Huge virtual address spaces Lock-based synchronization Lock-based synchronization

? ?

OS Support

Image sources (aspect ratios slightly modified):

https://en.wikipedia.org/wiki/Unix https://en.wikipedia.org/wiki/Supercomputer

(3)

12/03/2021 Olaf Spinczyk: MxKernel 3

© Olaf Spinczyk

Traditional Operating Systems

Example: Linux memory allocation benchmark [1]

(4)

12/03/2021 Olaf Spinczyk: MxKernel 4

© Olaf Spinczyk

Outline

Manycore Programming

Manycore OS Research

MxKernel Architecture

Preliminary Results

Conclusions

(5)

12/03/2021 Olaf Spinczyk: MxKernel 5

© Olaf Spinczyk

Outline

Manycore Programming

Manycore OS Research

MxKernel Architecture

Preliminary Results

Conclusions

(6)

12/03/2021 Olaf Spinczyk: MxKernel 6

© Olaf Spinczyk

Manycore Programming: Intel

®

TBB [2]

Instead of threads: “Task-based Programming”

Fine-grained units of work:

functions, functors, or C++ lambdas

Lightweight: No separate stack, register set, etc.

Task scheduler

Efficiently executes tasks from double ended queues

Automatic load balancing

Problems

Inefficient if tasks perform blocking operations

Tasks must be synchronized by classic mechanisms locks→ this

thread

other threads (cores)

“work stealing”

(7)

12/03/2021 Olaf Spinczyk: MxKernel 7

© Olaf Spinczyk

... Programming: HyPer Morsels [3]

Instead of threads: “Morsel-driven query execution”

Small DB operator pipelines, JIT compiled

Small chunks of input data

Input and output are NUMA-local

Scheduler (in user space)

Fixed number of pinned threads

Load balancing by work stealing

Excellent scalability:

30x performance on 32 core system

Problems

Special purpose solution; Does not re-use OS features

The HyPer DBMS

(8)

12/03/2021 Olaf Spinczyk: MxKernel 8

© Olaf Spinczyk

Outline

Manycore Programming

Manycore OS Research

MxKernel Architecture

Preliminary Results

Conclusions

(9)

12/03/2021 Olaf Spinczyk: MxKernel 9

© Olaf Spinczyk

Manycore OS: State-of-the-Art

Barrelfish [4]

Multikernel architecture

fos [1]

Microkernel

Server threads (or “fleets”)

Tesselation [5]

Cell concept

Gang scheduling

Still using threads. Optimizations done by app. programmer.

cores

monitoring

adaptation

app1 app2

OS

cell

(10)

12/03/2021 Olaf Spinczyk: MxKernel 10

© Olaf Spinczyk

... OS: Apple’s GCD Kernel Support

“Grand Central Dispatch”

Resembles TBB, but MacOS provides kernel-level support

Problems

Context switches for simple queue operations

Necessary to avoid priority inversion (task vs. thread priorities)

No clean layer structure in the kernel

Serial dispatch queue Dispatch source Queue hierarchy

queue

1 2 3

Implicit serialization

Worker thread creation on demand

worker thread

appl. threads

queue

1 2 3

worker thread

4

async. event

Seamless I/O integration

Automatic triggering of success/

failure handler

Restricted number of threads

Guaranteed partial order Q3

1 2 1 3

Q1 Q2

2 4

(11)

12/03/2021 Olaf Spinczyk: MxKernel 11

© Olaf Spinczyk

Outline

Manycore Programming

Manycore OS Research

MxKernel Architecture

Preliminary Results

Conclusions

(12)

12/03/2021 Olaf Spinczyk: MxKernel 12

© Olaf Spinczyk

The MxKernel: Design Goals

1) Fair and optimized partitioning of heterogeneous resources between multiple applications and OS components

2) Handle global concerns, such as power management, in a central component

3) Topology-aware placement of control flows and data to optimize performance

4) Global as well as application-specific (tailored) OS services

that can benefit from accelerators and many-core CPUs

(13)

12/03/2021 Olaf Spinczyk: MxKernel 13

The MxKernel: Key Features (1)

Goal 1:

Fair and optimized partitioning of heterogeneous resources between multiple applications and OS components

Solution: Elastic cells

Provide spatial isolation of applications and global OS services (based on priorities)

Optimized mapping (e.g. NUMA-aware)

Span over CPU cores, but also FPGA and GPU resources, etc.

CPU cores

Application or OS Service

Application

GPU ...

FPGA

cell (elastic)

cell (elastic)

+/- cores +/- memory Current

load or QoS per cell

+/- NICs +/- storage ...

(14)

12/03/2021 Olaf Spinczyk: MxKernel 14

The MxKernel: Key Features (2)

Goal 2:

Handle global concerns, such as power management, in a central component

Solution: Global resource management

Provisioning, monitoring and adaptation of cells (cores, memory, clock speed, etc.)

Enforcement of system-wide policies (low-power, anti aging, etc.)

monitoring adaptation

manycore resource management strategies

MxVisor

+/- cores +/- memory Current

load or QoS per cell

+/- NICs +/- storage ...

MxVisor

isolation of cells

priorities of applications

optimized app-to-core mapping (NUMA-aware)

power management

anti-aging

fault tolerance, e.g. app replication, handling damaged components) MxVisor

isolation of cells

priorities of applications

optimized app-to-core mapping (NUMA-aware)

power management

anti-aging

fault tolerance, e.g. app replication, handling damaged components)

(15)

12/03/2021 Olaf Spinczyk: MxKernel 15

The MxKernel: Key Features (3)

MxTasking

1 2 3 1 2 3

MxTasking

1 2 3 1 2 3

cell (elastic)

cell (elastic) brick (static)

+/- cores +/- memory Current

load or QoS per cell

+/- NICs +/- storage ...

Resource model

Resource model

Goal 3:

Topology-aware placement of control flows and data to optimize performance

Solution: Task-based programming model

Simplifies development of parallel programs

Unified programming model for heterogeneous compute units

Helps to avoid lock-based synchronization

Supports automatic load balancing, optimized task placement, and cell elasticity based on (physical) resource model MxTasking

task-based API

handles adaptations

topology-aware

optimizations (e.g. NUMA)

fine-grained application- specific mapping decisions

exploit heterogeneous computing resources

multiple specialized instances possible MxTasking

task-based API

handles adaptations

topology-aware

optimizations (e.g. NUMA)

fine-grained application- specific mapping decisions

exploit heterogeneous computing resources

multiple specialized instances possible

(16)

12/03/2021 Olaf Spinczyk: MxKernel 16

The MxKernel: Key Features (4)

cell (elastic) brick (static)

Solution: Global/local OS services built on top of MxTasking

Scalability provided automatically by MxTasking/MxVisor

Localized state

Thread model can be provided for legacy applications

Family-based design for code reuse

MxTasking MxOS Application

1 2 3 1 2 3

MxTasking MxOS

1 2 3 1 2 3

cell (elastic) Resource

model

Resource model

MxOS

device drivers

OS services, e.g. network protocols, filesystems, etc.

MxOS

device drivers

OS services, e.g. network protocols, filesystems, etc.

Goal 4:

Global as well as application-specific (tailored) OS services that can benefit from accelerators and many-core CPUs

(17)

12/03/2021 Olaf Spinczyk: MxKernel 17

The MxKernel: Architecture

CPU cores

monitoring adaptation

system software or application

(task-based) global OS

services

manycore resource management strategies

GPU ...

FPGA

MxTasking MxOS

1 2 3 1 2 3

legacy OS

& applications

MxTasking MxOS

1 2 3 1 2 3

MxVisor

cell (elastic)

cell (elastic) brick (static)

+/- cores +/- memory Current

load or QoS per cell

+/- NICs +/- storage ...

Resource model

Resource model

(18)

12/03/2021 Olaf Spinczyk: MxKernel 18

© Olaf Spinczyk

Outline

Manycore Programming

Manycore OS Research

MxKernel Architecture

Preliminary Results

Conclusions

(19)

12/03/2021 Olaf Spinczyk: MxKernel 19

© Olaf Spinczyk

Prototype Implementation

… comes in three flavors (all x86/64, all experimental):

Hardware Hardware Hardware

MxTasking + App MxVisor

(based on Xen Hypervisor)

MxTasking Cell MxTasking

Cell Xen Dom0

(Linux)

Linux

Intel TBB + App MxTasking

+ App

Single Application Multi Application Tasking Evaluation

library OS library OS

framework framework unikernel

unikernel

(20)

12/03/2021 Olaf Spinczyk: MxKernel 20

© Olaf Spinczyk

Favorite Benchmark: B-Tree Operations

Demonstrates advantages of tasks over threads

(link)

40 35

25 47 62

56

51 58

54 53

M

key: 53

<value>

... ... ... ... ...

... ...

...

...

r r r r

w

thread

node = root

while node is inner:

lock_r(node)

next = child(node,key) unlock_r(node)

node = next lock_w(node)

insert(node,key,value) unlock_w(node)

40 35

25 47 62

56

51 58

54 53

M

key: 53

<value>

... ... ... ... ...

... ...

...

...

r r r r

w

if node is inner:

lock_r(node)

next = child(node,key) unlock_r(node)

spawn

task(next,key,value) else:

lock_w(node)

insert(node,key,value) unlock_w(node)

sequence of tasks

“long” thread with unforeseeable access pattern

“long” thread with unforeseeable access pattern

“handy” tasks with known access and locking requirements

“handy” tasks with known access and locking requirements

(21)

12/03/2021 Olaf Spinczyk: MxKernel 21

© Olaf Spinczyk

Task Metadata Pays Off: Prefetching [6]

A glance into the future of memory accesses

Optimization is fully automatic

Performance impact:

Blink-tree insert and lookup operations on a 32 core Intel Xeon E5-2690

measured with MxTasking on top of Linux

Blink-tree insert and lookup operations on a 32 core Intel Xeon E5-2690

measured with MxTasking on top of Linux

(22)

12/03/2021 Olaf Spinczyk: MxKernel 22

© Olaf Spinczyk

... Pays Off: Task Synchronization [6]

Task-to-core-mapping

can implicitly avoid locking

Objects are assigned to cores

Tasks accessing an object are spawned on that core

Problem: Load balancing not trivial, but MxKernel can help

“Coarse-grained Work Stealing” Optimistic Locking with OLFIT [7]

reader/write locks root node bottleneck

(23)

12/03/2021 Olaf Spinczyk: MxKernel 23

© Olaf Spinczyk

Outline

Manycore Programming

Manycore OS Research

MxKernel Architecture

Preliminary Results

Conclusions

(24)

12/03/2021 Olaf Spinczyk: MxKernel 24

© Olaf Spinczyk

Conclusions

Fine-grained control flow abstractions: good for optimization

Challenge: Minimize overhead

Challenge: Exploit application knowledge

Challenge: Many mapping strategies possible, good theory missing

Heterogeneous computing resources can be integrated

Challenge: Lack of low-level hardware documentation

Hypervisor technology and OS might converge

It’s more than just fun: Compatibility layer possible!

(25)

12/03/2021 Olaf Spinczyk: MxKernel 25

© Olaf Spinczyk

References (1)

[1] A. Agarwal, J. Miller, D. Wentzlaff, H. Kasture, N. Beckmann, C.

Gruenwald III, and C. Johnson, FOS: A factored operating system for high assurance and scalability on multicores. Massachusetts Institue of Technology. Technical Report AFRL-RI-RS-TR-2012-205, August 2012.

[2] Intel® Threading Building Blocks – Tutorial, Document Number 319872-009US, http://www.intel.com

[3] V. Leis, P. Boncz, A. Kemper, and T. Neumann. 2014. Morsel-driven parallelism: a NUMA-aware query evaluation framework for the many- core age. In Proceedings of the 2014 ACM SIGMOD International

Conference on Management of Data (SIGMOD '14). ACM, New York, NY, USA, 743-754. DOI: https://doi.org/10.1145/2588555.2610507

(26)

12/03/2021 Olaf Spinczyk: MxKernel 26

© Olaf Spinczyk

References (2)

[4] A. Baumann, P. Barham, P.-E. Dagand, T. Harris, R. Isaacs, S. Peter, T.

Roscoe, A. Schüpbach, and A. Singhania. The Multikernel: A new OS architecture for scalable multicore systems. In Proceedings of the 22nd ACM Symposium on OS Principles, Big Sky, MT, USA, October 2009.

[5] J. A. Colmenares, G. Eads, S. Hofmeyr, S. Bird, M. Moretó, D. Chou, B.

Gluzman, E. Roman, D. B. Bartolini, N. Mor, K. Asanović, and J. D.

Kubiatowicz. 2013. Tessellation: refactoring the OS around explicit

resource containers with continuous adaptation. In Proceedings of the 50th Annual Design Automation Conference (DAC '13). ACM, New York, NY, USA, Article 76, 10 pages. DOI:

https://doi.org/10.1145/2463209.2488827

[6] J. Mühlig, M. Müller, O. Spinczyk, and J.Teubner. A novel System

Software Stack for Data Processing on Modern Hardware. Datenbank- Spektrum, 20(3):223-230, 2020.

(27)

12/03/2021 Olaf Spinczyk: MxKernel 27

© Olaf Spinczyk

References (3)

[7] Cha SK, Hwang S, Kim K, Kwon K. Cache-conscious concurrency control of main-memory indexes on shared-memory multiprocessor systems.

In: Proceedings of the 27th International Conference on Very Large Databases (VLDB). Morgan Kaufmann Publishers Inc, San Francisco, CA, USA, pp 181–190, 2001.

References

Related documents

This chapter also presents the 30 hypotheses to address the research questions that enabled examination of the effects of the nine independent variables (facilitating

Brak zna- czących różnic w  obrazach SEM uzyskanych dla próbki kontrolnej oraz próbek wstępnie trawio- nej ceramiki na bazie dwukrzemianu litu po jej kontaminacji

[14] In the document for 2020, PTD recommends that in people with diabetes, who despite the implementation of lifestyle modifications (weight reduction, increased physical activity

A multi-stage Deplhi-study was undertaken to explore key research questions and priorities regarding competitive advantage in the bioeconomy for the forest-based sector from

data flows Enterprise Guide Statistical analysis BDSTAT Eneco Css Analizer Parpinelli TECNON Refinery industry e-Views Econometric models Scenarios - Forecast PowerSim

Figure 1 Pre-ART patient LTFU experiences segregated by baseline (A) Cotrimoxazole (CPT) provision (B) baseline CD4 category and (C) HIV disclosure status among adult pre-ART

In all the major breakthrough in the analysis and ideas of economic thought from mercantilism, physiocracy and early economic liberals came from Monroe (1945).Thus this

The purpose of the study is to determine the technical feasibility and cost of a lake probe mission both as an element of a future Titan flagship mission and as a standalone