
Why the Datacenter needs an Operating System. Dr. Bernd Mathiske Senior Software Architect Mesosphere


Academic year: 2021


(1)

Dr. Bernd Mathiske

Senior Software Architect


Mesosphere

Why the Datacenter needs an

Operating System

(2)

Bringing

Google-Scale

Computing

to Everybody

(3)

A Slice of Google Tech Transfer History

2005: MapReduce -> Hadoop (Yahoo)

2007: Linux cgroups for lightweight isolation (Google)

2009: BigTable -> MongoDB

2009: “The Datacenter as a Computer” - Barroso, Hölzle (Google)


2009: Mesos - a distributed operating system kernel (UC Berkeley)

2010: Large scale production Mesos deployment (Twitter)

(4)

Notable Operating System Developments

Single-something => multi-something: user, tasking, threading, core, …

More: bits, memory, storage, bandwidth…

OS virtualization => lightweight virtualization (cgroups, LXCs, jails, …)

Packaging => containers (docker, rkt, lmctfy, …)

Static libraries => dynamic libraries => static libraries

(5)

Cluster Operating Systems (Hardware Clustering)

Researched since the 1980s

Trying to provide (the illusion of) a single system image

Aiming at HA, load balancing, location transparency (e.g. for storage)

Many systems: Amoeba, ChorusOS, GLUnix, Hurricane, MOSIX, Plan9, RHCS, Spring, Sprite, Sumo, QNX, Solaris MC, UnixWare, VAXclusters, …

Relatively low scale (up to 100s of nodes)

Complicated to manage, less dynamic than software clustering

(6)

From HPC Grid to Enterprise Cloud

Condor, LSF, Maui, Moab, Quartz, SLURM, …

Typically for batch jobs

Also cover services => SOA => more job schedulers

=> grid computing => grid middleware … => cloud stacks

(7)

From Server Virtualization to App Aggregation

Client-Server Era: Small apps, big servers => Server Virtualization

Cloud Era: Big apps, small servers => App Aggregation

(8)

Cloud Computing

SaaS: Salesforce demonstrated success, then many followed

PaaS: Deis, Dotcloud, OpenShift, Heroku, Pivotal, Stackato, …

IaaS: AWS, Azure, DigitalOcean, GCE…


Private cloud stacks including IaaS: Eucalyptus, CloudStack, Joyent, OpenStack, SmartCloud, vSphere, …

(9)

Datacenter

✴ A facility used to house computer systems and associated components (e.g. networking, storage, cooling, sensors)

✴ In this talk we focus on how to manage and use a single production cluster of networked computers in a datacenter

✴ Such clusters range in size from 10s to 10000s of nodes

✴ Why should we and how can we end up with just one production cluster?

(10)

Datacenter Services

✴ LAMP (Linux, Apache, MySQL, PHP) or similar

✴ MEAN (MongoDB, Express.js, Angular.js, Node.js) or similar

✴ Cassandra, ElasticSearch, Exelixi, Hadoop, Hypertable, Jenkins, Kafka, MPI, Spark, Storm, SSSP, Torque, …

✴ Private PaaS: Deis, …

✴ …

(11)
(12)

From Static Partitioning to Elastic Sharing

Static Partitioning

Elastic Sharing

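[Figure: utilization (0 to 100%) of the WEB, CACHE, and HADOOP workloads. Under static partitioning, the unused share of each partition is WASTED; under elastic sharing, unused capacity is FREE for any workload.]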
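As a rough illustration of the contrast between static partitioning and elastic sharing, the sketch below compares how many nodes each approach needs. Only the workload names come from the slide; the demand figures per time slot and the class name `SharingSketch` are invented for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration: node demand per workload over four time slots.
// The numbers are invented; only the workload names come from the slide.
final class SharingSketch {

    static final Map<String, int[]> DEMAND = new LinkedHashMap<>();
    static {
        DEMAND.put("WEB",    new int[] {30, 50, 20, 10});
        DEMAND.put("CACHE",  new int[] {10, 20, 10, 10});
        DEMAND.put("HADOOP", new int[] {10, 10, 40, 60});
    }

    // Static partitioning: every workload owns a partition sized for its own peak.
    static int staticPartitioningNodes() {
        int total = 0;
        for (int[] series : DEMAND.values()) {
            int peak = 0;
            for (int d : series) {
                peak = Math.max(peak, d);
            }
            total += peak;
        }
        return total;
    }

    // Elastic sharing: one cluster sized for the peak of the combined demand.
    static int elasticSharingNodes() {
        int slots = DEMAND.values().iterator().next().length;
        int peak = 0;
        for (int t = 0; t < slots; t++) {
            int sum = 0;
            for (int[] series : DEMAND.values()) {
                sum += series[t];
            }
            peak = Math.max(peak, sum);
        }
        return peak;
    }

    public static void main(String[] args) {
        System.out.println("static partitioning: " + staticPartitioningNodes() + " nodes");
        System.out.println("elastic sharing:     " + elasticSharingNodes() + " nodes");
    }
}
```

With these made-up numbers, static partitioning needs 130 nodes (50 + 20 + 60) while a shared cluster needs 80, because the workloads do not peak at the same time.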

(13)

Software Clustering

Layer between node OS and application frameworks


Scale

Multi-tenancy

High availability

(14)

Available Open Source Components

✴ 2-level scheduler: Apache Mesos

✴ Meta-frameworks / schedulers: Aurora, Chronos, Marathon, Kubernetes, Swarm, …

✴ Service discovery: Consul, HAProxy, Mesos DNS, …

✴ Highly available configuration: zk, etcd, …

✴ Storage: HDFS, Ceph, …

✴ Node OSs: lots of Linux variants

✴ Lots of app frameworks: Spark, Storm, Cassandra, Kafka, …

(15)

2-Level Scheduling

Scale: from 1 node to at least 10000s of nodes

Optimizing resource management

End-to-end principle: “application-specific functions ought to reside in the end nodes of a network rather than intermediary nodes”

-> Requirement for general multi-tenancy

-> Requirement for having only one production cluster
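The split of responsibilities in 2-level scheduling can be sketched with a toy model. This is not Mesos code: `TwoLevelSketch`, `Framework`, `BatchFramework`, and `allocate` are hypothetical names, and resources are reduced to a single CPU count. Level 1 only decides who is offered free resources; level 2 (the framework) decides what to run on them, in the spirit of the end-to-end principle.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of 2-level scheduling. All names are hypothetical and
// unrelated to the real Mesos classes.
final class TwoLevelSketch {

    // Level 2: a framework decides what to run on an offer it receives.
    interface Framework {
        String name();

        // Returns how many of the offered CPUs it uses (0 means "decline").
        int accept(int offeredCpus);
    }

    // A framework with a backlog of equally sized tasks.
    static final class BatchFramework implements Framework {
        private final String name;
        private final int cpusPerTask;
        private int pendingTasks;

        BatchFramework(String name, int cpusPerTask, int pendingTasks) {
            this.name = name;
            this.cpusPerTask = cpusPerTask;
            this.pendingTasks = pendingTasks;
        }

        @Override
        public String name() {
            return name;
        }

        @Override
        public int accept(int offeredCpus) {
            // Application-specific decision: launch as many pending tasks as fit.
            int launchable = Math.min(pendingTasks, offeredCpus / cpusPerTask);
            pendingTasks -= launchable;
            return launchable * cpusPerTask;
        }
    }

    // Level 1: the allocator offers free resources to each framework in turn.
    // It knows nothing about tasks.
    static List<String> allocate(int freeCpus, List<Framework> frameworks) {
        List<String> log = new ArrayList<>();
        for (Framework f : frameworks) {
            int used = f.accept(freeCpus);
            freeCpus -= used;
            log.add(f.name() + " used " + used + ", free " + freeCpus);
        }
        return log;
    }

    public static void main(String[] args) {
        List<Framework> frameworks = new ArrayList<>();
        frameworks.add(new BatchFramework("web", 2, 3));
        frameworks.add(new BatchFramework("batch", 4, 5));
        for (String line : allocate(16, frameworks)) {
            System.out.println(line);
        }
    }
}
```

A real allocator would also apply fairness policies and handle declined or rescinded offers; the sketch leaves those out to keep the two levels visible.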

(16)

How Mesos Works

[Diagram: an App runs on a Framework, whose Scheduler talks to the Master. Several Masters are coordinated through zk/etcd. Each Slave hosts Executors, and each Executor runs Tasks.]
(17)

Ways to Run an Application

1. Vanilla job

• Employ meta-framework for invocation: Chronos, Aurora, Kubernetes, …

2. Application of an adapted framework

• Hadoop, Spark, Storm, ElasticSearch, Cassandra, Kafka, many more…

3. Non-adapted services

• Employ meta-framework for invocation: Marathon, Aurora, Kubernetes, …

• Provide (select) a service discovery solution

4. Program your own scheduler (and executor)

(18)

The Mesos Framework API

✴ Currently like internal Mesos communication:

• protobuf messages over HTTP


✴ Soon:

• JSON messages over HTTP (stream)

=> no need to link with binary Mesos library and/or less to reimplement

ca. a dozen programming languages => any language

(19)

How to implement a framework

✴ Scheduler interface: 1 half of 2-level scheduling

• The framework knows best when to do what with what kind of resources

• About a dozen callbacks, main functionality in 2 of them:

- receive resource offers

- receive task status updates


✴ Executor interface: task life-cycle management and monitoring

• Command line executor included in Mesos

• Docker executor included in Mesos

• Custom executors often not needed

(20)

Scheduler SPI (implemented by Framework)

public interface Scheduler {
    void registered(SchedulerDriver driver, FrameworkID frameworkId,
                    MasterInfo masterInfo);
    void reregistered(SchedulerDriver driver, MasterInfo masterInfo);
    void resourceOffers(SchedulerDriver driver, List<Offer> offers);
    void offerRescinded(SchedulerDriver driver, OfferID offerId);
    void statusUpdate(SchedulerDriver driver, TaskStatus status);
    void frameworkMessage(SchedulerDriver driver, ExecutorID executorId,
                          SlaveID slaveId, byte[] data);
    void disconnected(SchedulerDriver driver);
    void slaveLost(SchedulerDriver driver, SlaveID slaveId);
    void executorLost(SchedulerDriver driver, ExecutorID executorId,
                      SlaveID slaveId, int status);
    void error(SchedulerDriver driver, String message);
}

(21)

Minimal Scheduler Implementation

class MyFrameworkScheduler implements Scheduler {
    …
    private TaskGenerator _taskGen;

    public void resourceOffers(SchedulerDriver driver, List<Offer> offers) {
        if (_taskGen.doneCreatingTasks()) {
            // Nothing left to launch: decline every offer.
            for (Offer offer : offers) {
                driver.declineOffer(offer.getId());
            }
        } else {
            // Otherwise launch the tasks generated for each offer.
            for (Offer offer : offers) {
                List<TaskInfo> taskInfos = _taskGen.generateTaskInfos(offer);
                driver.launchTasks(offer.getId(), taskInfos, _filters);
            }
        }
    }

    public void statusUpdate(SchedulerDriver driver, TaskStatus status) {
        _taskGen.observeTaskStatusUpdate(status);
        if (_taskGen.done()) {
            driver.stop();
        }
    }
    …
}

(22)

The Developer’s Perspective

✴ Focus on application logic, not datacenter structure

✴ Avoid networking-related code

✴ Reuse of built-in fault-tolerance and high availability

✴ Reuse distributed (infrastructure) frameworks (e.g., storage)

=> API, SDK for datacenter services

(23)

The Operations Engineer’s Perspective

✴ Ease of deployment/management

✴ Uniformity of deployment/management

✴ Hardware utilization rate

✴ Scaling up as business grows

✴ Scaling out sporadically

✴ Cost and time for moving to a different datacenter

✴ High availability and fault-tolerance of system services

✴ Monitoring

✴ Troubleshooting

(24)

Necessary Multi-Tenancy Features

Task containerization

Resource isolation

Resource and task attributes

Static and dynamic resource reservations

Reservation levels

Meta-frameworks

Dynamic scheduler update and reconfiguration

Security

(25)

Desirable Multi-Tenancy Features

Optimistic offers

Oversubscription

Task preemption, migration, resizing, reconfiguration

Rate limiting

Auto-scaling => hybrid cloud

Infrastructure frameworks

(26)

Using Docker Containers in Mesos

Mesos Master Server

init
|
+ mesos-master
|
+ marathon

Mesos Slave Server

init
|
+ docker
| |
| + lxc
|   |
|   + (user task, under container init system)
|
+ mesos-slave
  |
  + /var/lib/mesos/executors/docker
    |
    + docker run …

Docker Registry

When a user requests a container, Mesos, LXC, and Docker are tied together for launch.

(27)

Other Schedulers as Meta-Frameworks in a 2-level Scheduler

YARN => https://github.com/mesos/myriad

Kubernetes => https://github.com/mesosphere/kubernetes-mesos

Swarm => Swarm on Mesos (new project)

=> run everything in one cluster

(28)

Myriad : Virtual YARN Clusters on Mesos


◦ POST /api/clusters: Registers a new YARN cluster

◦ GET /api/clusters: Lists all registered clusters

◦ GET /api/clusters/{clusterId}: Lists the cluster with {clusterId}

◦ PUT /api/clusters/{clusterId}/flexup: Expands the size of cluster with {clusterId}

◦ PUT /api/clusters/{clusterId}/flexdown: Shrinks the size of cluster with {clusterId}

◦ DELETE /api/clusters/{clusterId}: Unregisters YARN cluster with {clusterId}. Also, kills all the nodes.

[Diagram: Mesos Master, Mesos Slave, YARN ResourceManager (RM) with the Myriad Scheduler, and the Myriad Executor. Step 1: launch NodeManager (NM). A YARN flexUp takes a 2.5 CPU / 2.5 GB resource slice, of which 2.0 CPU / 2.0 GB is available to containers C1 and C2.]
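To make the endpoint list concrete, here is a minimal sketch that builds (but does not send) requests for three of the endpoints using the JDK's `java.net.http` API. The base URL `http://myriad.example.com`, the class name `MyriadRequests`, and the absence of a request body on `flexup` are assumptions; the slide does not specify any of them.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch only: builds requests for endpoints listed on the slide.
// Nothing is sent over the network.
final class MyriadRequests {

    // Hypothetical base URL; the slide does not say where Myriad listens.
    static final String BASE = "http://myriad.example.com";

    // GET /api/clusters: list all registered clusters.
    static HttpRequest listClusters() {
        return HttpRequest.newBuilder(URI.create(BASE + "/api/clusters"))
                .GET()
                .build();
    }

    // PUT /api/clusters/{clusterId}/flexup: expand the cluster.
    // Assumption: no request body, since the slide does not show one.
    static HttpRequest flexUp(String clusterId) {
        return HttpRequest.newBuilder(
                        URI.create(BASE + "/api/clusters/" + clusterId + "/flexup"))
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    // DELETE /api/clusters/{clusterId}: unregister the cluster.
    static HttpRequest unregister(String clusterId) {
        return HttpRequest.newBuilder(
                        URI.create(BASE + "/api/clusters/" + clusterId))
                .DELETE()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = flexUp("c1");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Sending a request would go through `java.net.http.HttpClient`; that step is omitted here because the response format is not described on the slide.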

(29)


(30)

Portability


Mesos

Public Cloud

Managed Cloud

Your Own DC

Framework Apps

Meta-Frameworks

Vanilla Apps

(31)

The Application User’s Perspective

✴ Focus on apps, services, parameters, results

✴ Avoid dealing with datacenter operations/management

✴ Avoid adjusting system settings

✴ High availability

✴ Throughput

✴ Responsiveness

✴ Predictability

✴ Run everything I need

✴ Return on and safety of investment

(32)

The Datacenter is the new form factor

✴ 2-level scheduler => single production cluster

✴ scalability and portability => avoiding hardware/cloud lock-in

✴ built-in container support => running containers at scale

✴ automation => operator efficiency

✴ repositories => apps/services readily available

✴ API and SDK => productive/quick app/service development

(33)


Above the Clouds

https://github.com/mesosphere/kubernetes-mesos
