Overview

(1)

TWISTER2

Overview

of a

(2)

Twister2 Highlights

● Pure batch engine

a. Not on top of a streaming engine

● Pure streaming engine

a. Not on top of a batch engine

● Generic System

a. Can support many models of computation i. Supports DataFlow model

ii. BSP model

iii. Timely (Iterative) DataFlow will be added

● Component based architecture -- it is a toolkit

a. Clearly deﬁnes the important layers of a distributed processing engine

(3)

Big Data APIs

Started with Map-Reduce

Task Graph with computations on data

Different Data APIs

● Data transformation APIs

○ Apache Crunch PCollections

○ Apache Spark RDD

○ Apache Flink DataSet

○ Apache Beam PCollections ○ Apache Storm Streamlets ● Apache Storm Task Graph

● SQL based APIs

(4)

Execution as a Graph for Data Analytics

● The graph created by the user API can be executed using an event model

● The events ﬂow through the edges of the graph as messages

● The compute units are executed upon arrival of events

○ Supports Function as a Service

● Execution state can be checkpointed automatically

○ Fault tolerance

Graph Execution Graph (Plan) Task

Schedule

T T

R

(5)

HPC APIs

● Dominated by Message Passing Interface (MPI)

○ Provides the most fundamental requirements in the most eﬃcient ways possible

■ Communication between parallel workers

■ Managing of parallel processes

● HPC has task systems and Data APIs

○ They are all built on top of parallel communication libraries

○ HPX, Legion

Simple MPI Program

(6)

Twister2

● Open Source - Apache Licence Version 2.0

● Github - https://github.com/DSC-SPIDAL/twister2

● Documentation - https://twister2.gitbook.io/twister2 ● User List - [email protected]

● Started in the 4th Quarter of 2017

● About 80000 Lines of Code

(7)

Component Area Current Implementation Future Implementation Connected

DataFlow

Workflow or External Dataflow between different resources

Dynamic dataflows connected by data Ongoing

High Level APIs Distributed Data Set, SQL, Python, Scala, Graph

TSets, Java Dataflow optimizations, SQL, Python, Scala, Graph

Task System Task migration Not started Streaming job task migrations Streaming Streaming dataflow graph and execution FaaS

Task Execution Process, Threads More executors Task Scheduling Dynamic Scheduling, Static Scheduling; Pluggable

Scheduling Algorithms

More algorithms

Task Graph Static Graph, Dynamic Graph Generation Cyclic graphs for iteration as in Timely DataFlow Communication Internal DataFlow Operations Twister:Net; MPI Based, TCP, Batch and Streaming Integrate to other big data systems, Integrate with RDMA

BSP Operations Conventional MPI, Harp Native MPI Integration Job Submission Job Submission (Dynamic/Static)

Resource Allocation

Plugins for Slurm,Mesos, Kubernetes, Aurora, Nomad Yarn, Marathon

Data Access Static (Batch) Data File Systems including HDFS NoSQL, SQL Streaming Data Kafka Connector RabbitMQ, ActiveMQ

(8)

Functions going across components

Function Mechanism Implementation Futures

Architecture Specification Coordination Points DataFlow coordination points and BSP

Use for Learning nodes and fault tolerance control

Execution Semantics Both process based and thread based

Ongoing improvements

Fault tolerance Checkpointing Lightweight barriers, Checkpointing

Available in June Release

Security Messaging, FaaS, Storage Crosses all components

(9)

Runtime Components

Mesos Kubernetes Standalone BSP

Operations

Internal (fine grain) DataFlow Operations

Task Graph System TSet

Runtime

Resource API

HDFS NoSQL Message Brokers Atomic Job

Submission

Connected or External DataFlow

Data Access APIs

Streaming, Batch and ML Applications

Orchestration API User APIs SQL API Python API Local Slurm Future Features

(10)

Twister2 APIs in Detail

Worker BSP Operations Java API Worker DataFlow Operations Java API

Operator Level APIs

Worker DataFlow Operations Task Graph Java API Worker DataFlow Operations Task Graph TSet

Java API Python API

Worker

DataFlow Operations

Task Graph SQL

APIs built on top of Task Graph

Low level APIs with the most flexibility. Harder to program

Higher Level APIs based on Task Graph

(11)

(12)

Runtime Process View

1. Driver

a. User submits and controls the Job

2. Cluster Resources

a. Resources managed by a resource scheduler such as Mesos or Kubernetes

3. Resource Unit

a. A resource allocated by the scheduler: Core, Kubernetes Pod, Mesos Task, Compute Node

4. Worker Process

a. A Twister2 process that executes the user tasks

5. Task

a. Execution unit programmed by user

(13)

Anatomy of a Twister2 Job

1. Jobs are self contained

a. Two jobs don’t have any sharing of Twister2 system b. Each job has its own set of workers, job master, driver

c. Dashboard is shared

2. Two types of Jobs

a. Multiple dataﬂows controlled by a driver (Connected or External DataFlow) b. Single dataﬂow jobs (Internal DataFlow)

3. Jobs are started and controlled at the driver

(14)

Batch Jobs

● Jobs terminate after processing ﬁnite amount of input data

Word Counting Job

Tasks Programmed by user

Operation between

(15)

Batch Jobs

(16)

Streaming Jobs

● Continuously running jobs processing an unbounded stream of data

Configuring and running a streaming job

Tasks Programmed by user

Operation between

(17)

A Streaming Job

(18)

Streaming Jobs

● Similar to Storm - Diﬀerent terminology

● Twister2 Operators

○ Operators represent distributed operations

○ I.e. Reduce is a distributed operation

○ Twister:Net used to connect nodes

● Storm Groupings

○ Arranging of communication links

○ Not a distributed operation

(19)

Storm Compatibility

● Twister2 has the support to run topologies created using Storm’s topology API.

● If you have existing topologies created using the Storm API, you can make

them twister2 compatible by simply replacing storm-core dependencies with twister2-storm dependency.

● Twister2 currently supports following storm grouping techniques

○ Direct Grouping

○ Shuﬄe Grouping

○ Field Grouping

(20)

Iterative Jobs

Many approaches for iterative jobs

1. Iterations are coded in the DataFlow Graph

a. Flink - Doesn’t support nested iterations

b. Naiad - Timely DataFlow model (Internal DataFlow with Cyclic Graphs). This will be supported by Twister2 in the June release

2. Iterations happen outside of the DataFlow Graph

a. Spark - The driver controls/initiates the iterations, Iteration happens in a central place which sends instructions (the dataﬂow graph) to workers

b. Twister2 - The Worker does the iterations for internal dataﬂow; external dataﬂow is done like Spark

3. Iterations in BSP programs

(21)

Twister2 Iterative Jobs

Control instructions for iterations. Worker runs the iterations. The dataﬂow graph is

executed inside the iteration. We need to extend to support iteration inside a

(22)

Spark Kmeans

● Supported through MLLib

● The diagram below shows how Spark Kmeans execution ﬂows

(23)

Update New Centroids

New Centroids

Twister2 K-Means Iterative Execution

Calculate Nearest Centroids

Centroids Partitioned points

Worker

For i = 0 to maxIterations

AllReduce

(24)

Iterative Jobs - Futures

1. Supporting Naiad Model

(25)

Connected or External or Hierarchical DataFlows

● Multiple DataFlow jobs on diﬀerent resources connected by an overarching

dataﬂow

● The driver program submits separate Twister2 atomic jobs with internal

dataﬂows on diﬀerent resources

● The dataﬂows run in the same overall Job holding resources

● Ongoing implementation (end of January)

Driver

Atomic DataFlow

(26)

Job Submission

● Simple command to submit a Job to diﬀerent cluster resource managers

○ Kubernetes, Mesos, Slurm, Standalone, Nomad

○ Clearly deﬁned APIs for adding a resource manager

● Clearly deﬁned semantics at the Job Submission layer to upper layers

○ Obtaining cluster resources

○ Spawning and managing processors started on these resources

○ Bootstrapping the environment and discovery of workers and resources

○ Providing auxiliary functions

■ Logging

■ Persistent and Volatile Storage

■ Control Messages between workers

bin/twister2 submit <cluster> jar <jar ﬁle> <class name>

(27)

Job Submission

● Job submission only allocates and manages a set of processes started on the

provision resources

● Job master

○ Coordinate the workers

○ Coordinate the driver and the workers

○ Fault tolerance

● Driver

○ Executes the user program

○ Start and control the job

(28)

Operator APIs

● Lowest API level for programming Twister2 applications

● Implement a set of parallel operators

● Bulk Synchronous Parallel (BSP) style and DataFlow style provided

● Streaming and batch operations supported

○ Streaming - unbounded operations

○ Batch - bounded operations

(29)

DataFlow Operators

● HPC has MPI and its implementations for parallel operations

● Big data systems didn’t have a communication library

○ Now it does with Twister:Net

● DataFlow can use

○ MPI

○ TCP Sockets

● Optimized DataFlow operations with low latencies and high throughput

○ Described in detail in paper, Twister:Net - Communication Library for Big Data Processing in HPC and Cloud Environments

○ https://www.computer.org/csdl/proceedings/cloud/2018/7235/00/723501a383-abs.html

● Disk based operations for large data transfers

(30)

BSP Operators

● OpenMPI and Harp for BSP communication

● Currently doesn’t work with Twister2 task graphs

○ It should be possible to do this but not in plans for next 6 months

● Can be used to write BSP style programs or communicate between parallel

(31)

Operator Summary

● Status

○ Support various parallel operations

○ Optimized operations

○ Only library available for data operations

● Future

○ Direct RDMA support

○ C++ implementation

(32)

Task System

● Second level API for programming applications

● Deﬁnes a set of programmable units

○ Source

○ Compute

○ Sink

● User can program and connect these units using operations

○ These operations translate to communication operations

■ I.e. Reduce or Node to Node communication

○ Two compute units can be connected using the operation Reduce

○ Note we allow connection of nodes with diﬀerent levels of parallelism

● Once connected they form a Graph

● The graph is scheduled on to the processes obtained

(33)

Task System

● Three components

○ Task Graph

○ Task Scheduler

○ Task Executor

● Task Graph

○ Represents compute units and connections among them

● Task Scheduler

○ Assignment of user deﬁned compute units (tasks) to Workers

● Task Executor

(34)

How a user program is executed

1. User creates a Task Graph (dataflow) 2. This graph and resource information is

combined to create a task schedule

a. This schedule describes what tasks run on which resources

3. An execution graph is created

a. This graph has the connections created between tasks using operations

4. A suitable executor is used for executing this execution graph in an event driven manner

a. Streaming execution b. Batch execution

5. Controlled in workers for internal dataflow and in central driver for connected

(35)

Task System

● Current status

○ Able to write programs similar to Hadoop and Storm

● Future

○ Dynamically adapting task graph according to computing and data requirements

○ Direct support of Storm and Hadoop APIs

○ Cyclic Graphs with Timely DataFlow Model

■ This will be using the same Task system we have with DataFlow operations

■ Add new APIs to include timing information to graph

(36)

Twister Sets (TSet)

● Programming abstraction similar to RDD, PCollection or DataSets

(37)

TSet

● TSet based programs are translated to a Task Graph and executed

● TSet is a immutable distributed data set

○ We cannot modify a TSet

○ We can only apply an operation and produce another TSet

● Futures

○ Add support to existing APIs

■ Apache Beam

(38)

TSets, RDDs and Streamlets

Similar abstractions with small diﬀerences

Twister2 Spark Heron

Higher Level API Lowest Level API Heron has a Task Abstraction. So not the lowest.

Clear separation of data and operations

Data and operation separation not clear

Data and operations are not clearly separated

Streaming and Batch natively Batch and Mini-Batch streaming Only Streaming

More basic operations such as AllReduce and AllGather

More user focused operations

(39)

Dashboard

(40)

Major Ongoing Work & Planned Work

● Fault tolerance of Streaming and Batch applications

● Connected DataFlows

● Python API

● Improving APIs

○ We have several basic APIs

(41)

Future use cases

● DataFlows running from edge to cloud

○ We have streaming and batch components

○ Connecting batch and streaming dataﬂows running in cloud to a dataﬂow running in edge

○ A connected dataﬂow from edge to cloud?

● Function as a Service

○ Provide connectors to micro services

(42)

Upcoming Releases (Major Features)

1. End of February 2019

a. Connected DataFlow b. TSets

c. Dashboard

d. Heterogeneous Resources and Dynamic Scaling

2. End of June 2019

a. Fault tolerance b. Python API

c. Statistics gathering d. More applications e. Naiad Timely Dataﬂow

3. Not scheduled

(43)

Summary

We outlined Twister2 components working together to achieve data analytics tasks

● A ﬂexible system that can be adapted to user requirements

○ Diﬀerent APIs at diﬀerent levels of abstraction

● Batch, Streaming and Iterative computing

● A system designed to work in Clouds and HPC environments

(44)

Hands on

● We have prepared a cluster environment with Kubernetes

○ You can ssh to the Kubernetes cluster with the credentials provided separately

● If you like to try Twister2 on your laptop we have created a docker image

● We will add AWS instructions

● The instructions can be found at

○ https://twister2.gitbook.io/twister2/tutorial

● There are several examples you can try

○ https://twister2.gitbook.io/twister2/tutorial/developing

○ Streaming and Batch Jobs

○ K-Means algorithm

○ Advanced example with Join Twister2:Net operator

● These have descriptions on tutorial page; slides and links to code

○ Wordcount examples demonstrate batch and streaming.

○ Kmeans example demonstrate iteration and communication.

(45)

(46)

Thank You

You can reach us anytime

[email protected]

(47)

(48)

Data Analytics Architectures

API View

RDD, Streamlet, DataSet,

PCollection

Map Reduce, Spout Bolt

Analytics View

Lambda Architecture, Kappa Architecture

DataFlow

(49)