TWISTER2
Overview
of a
Twister2 Highlights
● Pure batch engine
a. Not on top of a streaming engine
● Pure streaming engine
a. Not on top of a batch engine
● Generic System
a. Can support many models of computation i. Supports DataFlow model
ii. BSP model
iii. Timely (Iterative) DataFlow will be added
● Component based architecture -- it is a toolkit
a. Clearly defines the important layers of a distributed processing engine
Big Data APIs
Started with Map-Reduce
Task Graph with computations on data
Different Data APIs
● Data transformation APIs
○ Apache Crunch PCollections
○ Apache Spark RDD
○ Apache Flink DataSet
○ Apache Beam PCollections ○ Apache Storm Streamlets ● Apache Storm Task Graph
● SQL based APIs
Execution as a Graph for Data Analytics
● The graph created by the user API can be executed using an event model
● The events flow through the edges of the graph as messages
● The compute units are executed upon arrival of events
○ Supports Function as a Service
● Execution state can be checkpointed automatically
○ Fault tolerance
Graph Execution Graph (Plan) Task
Schedule
T T
R
HPC APIs
● Dominated by Message Passing Interface (MPI)
○ Provides the most fundamental requirements in the most efficient ways possible
■ Communication between parallel workers
■ Managing of parallel processes
● HPC has task systems and Data APIs
○ They are all built on top of parallel communication libraries
○ HPX, Legion
Simple MPI Program
Twister2
● Open Source - Apache Licence Version 2.0
● Github - https://github.com/DSC-SPIDAL/twister2
● Documentation - https://twister2.gitbook.io/twister2 ● User List - [email protected]
● Started in the 4th Quarter of 2017
● About 80000 Lines of Code
Component Area Current Implementation Future Implementation Connected
DataFlow
Workflow or External Dataflow between different resources
Dynamic dataflows connected by data Ongoing
High Level APIs Distributed Data Set, SQL, Python, Scala, Graph
TSets, Java Dataflow optimizations, SQL, Python, Scala, Graph
Task System Task migration Not started Streaming job task migrations Streaming Streaming dataflow graph and execution FaaS
Task Execution Process, Threads More executors Task Scheduling Dynamic Scheduling, Static Scheduling; Pluggable
Scheduling Algorithms
More algorithms
Task Graph Static Graph, Dynamic Graph Generation Cyclic graphs for iteration as in Timely DataFlow Communication Internal DataFlow Operations Twister:Net; MPI Based, TCP, Batch and Streaming Integrate to other big data systems, Integrate with RDMA
BSP Operations Conventional MPI, Harp Native MPI Integration Job Submission Job Submission (Dynamic/Static)
Resource Allocation
Plugins for Slurm,Mesos, Kubernetes, Aurora, Nomad Yarn, Marathon
Data Access Static (Batch) Data File Systems including HDFS NoSQL, SQL Streaming Data Kafka Connector RabbitMQ, ActiveMQ
Functions going across components
Function Mechanism Implementation Futures
Architecture Specification Coordination Points DataFlow coordination points and BSP
Use for Learning nodes and fault tolerance control
Execution Semantics Both process based and thread based
Ongoing improvements
Fault tolerance Checkpointing Lightweight barriers, Checkpointing
Available in June Release
Security Messaging, FaaS, Storage Crosses all components
Runtime Components
Mesos Kubernetes Standalone BSP
Operations
Internal (fine grain) DataFlow Operations
Task Graph System TSet
Runtime
Resource API
HDFS NoSQL Message Brokers Atomic Job
Submission
Connected or External DataFlow
Data Access APIs
Streaming, Batch and ML Applications
Orchestration API User APIs SQL API Python API Local Slurm Future Features
Twister2 APIs in Detail
Worker BSP Operations Java API Worker DataFlow Operations Java APIOperator Level APIs
Worker DataFlow Operations Task Graph Java API Worker DataFlow Operations Task Graph TSet
Java API Python API
Worker
DataFlow Operations
Task Graph SQL
APIs built on top of Task Graph
Low level APIs with the most flexibility. Harder to program
Higher Level APIs based on Task Graph
Runtime Process View
1. Driver
a. User submits and controls the Job
2. Cluster Resources
a. Resources managed by a resource scheduler such as Mesos or Kubernetes
3. Resource Unit
a. A resource allocated by the scheduler: Core, Kubernetes Pod, Mesos Task, Compute Node
4. Worker Process
a. A Twister2 process that executes the user tasks
5. Task
a. Execution unit programmed by user
Anatomy of a Twister2 Job
1. Jobs are self contained
a. Two jobs don’t have any sharing of Twister2 system b. Each job has its own set of workers, job master, driver
c. Dashboard is shared
2. Two types of Jobs
a. Multiple dataflows controlled by a driver (Connected or External DataFlow) b. Single dataflow jobs (Internal DataFlow)
3. Jobs are started and controlled at the driver
Batch Jobs
● Jobs terminate after processing finite amount of input data
Word Counting Job
Tasks Programmed by user
Operation between
Batch Jobs
Streaming Jobs
● Continuously running jobs processing an unbounded stream of data
Configuring and running a streaming job
Tasks Programmed by user
Operation between
A Streaming Job
Streaming Jobs
● Similar to Storm - Different terminology
● Twister2 Operators
○ Operators represent distributed operations
○ I.e. Reduce is a distributed operation
○ Twister:Net used to connect nodes
● Storm Groupings
○ Arranging of communication links
○ Not a distributed operation
Storm Compatibility
● Twister2 has the support to run topologies created using Storm’s topology API.
● If you have existing topologies created using the Storm API, you can make
them twister2 compatible by simply replacing storm-core dependencies with twister2-storm dependency.
● Twister2 currently supports following storm grouping techniques
○ Direct Grouping
○ Shuffle Grouping
○ Field Grouping
Iterative Jobs
Many approaches for iterative jobs
1. Iterations are coded in the DataFlow Graph
a. Flink - Doesn’t support nested iterations
b. Naiad - Timely DataFlow model (Internal DataFlow with Cyclic Graphs). This will be supported by Twister2 in the June release
2. Iterations happen outside of the DataFlow Graph
a. Spark - The driver controls/initiates the iterations, Iteration happens in a central place which sends instructions (the dataflow graph) to workers
b. Twister2 - The Worker does the iterations for internal dataflow; external dataflow is done like Spark
3. Iterations in BSP programs
Twister2 Iterative Jobs
Control instructions for iterations. Worker runs the iterations. The dataflow graph is
executed inside the iteration. We need to extend to support iteration inside a
Spark Kmeans
● Supported through MLLib
● The diagram below shows how Spark Kmeans execution flows
Update New Centroids
New Centroids
Twister2 K-Means Iterative Execution
Calculate Nearest Centroids
Centroids Partitioned points
Worker
For i = 0 to maxIterations
AllReduce
Iterative Jobs - Futures
1. Supporting Naiad Model
Connected or External or Hierarchical DataFlows
● Multiple DataFlow jobs on different resources connected by an overarching
dataflow
● The driver program submits separate Twister2 atomic jobs with internal
dataflows on different resources
● The dataflows run in the same overall Job holding resources
● Ongoing implementation (end of January)
Driver
Atomic DataFlow
Atomic DataFlow
Job Submission
● Simple command to submit a Job to different cluster resource managers
○ Kubernetes, Mesos, Slurm, Standalone, Nomad
○ Clearly defined APIs for adding a resource manager
● Clearly defined semantics at the Job Submission layer to upper layers
○ Obtaining cluster resources
○ Spawning and managing processors started on these resources
○ Bootstrapping the environment and discovery of workers and resources
○ Providing auxiliary functions
■ Logging
■ Persistent and Volatile Storage
■ Control Messages between workers
bin/twister2 submit <cluster> jar <jar file> <class name>
Job Submission
● Job submission only allocates and manages a set of processes started on the
provision resources
● Job master
○ Coordinate the workers
○ Coordinate the driver and the workers
○ Fault tolerance
● Driver
○ Executes the user program
○ Start and control the job
Operator APIs
● Lowest API level for programming Twister2 applications
● Implement a set of parallel operators
● Bulk Synchronous Parallel (BSP) style and DataFlow style provided
● Streaming and batch operations supported
○ Streaming - unbounded operations
○ Batch - bounded operations
DataFlow Operators
● HPC has MPI and its implementations for parallel operations
● Big data systems didn’t have a communication library
○ Now it does with Twister:Net
● DataFlow can use
○ MPI
○ TCP Sockets
● Optimized DataFlow operations with low latencies and high throughput
○ Described in detail in paper, Twister:Net - Communication Library for Big Data Processing in HPC and Cloud Environments
○ https://www.computer.org/csdl/proceedings/cloud/2018/7235/00/723501a383-abs.html
● Disk based operations for large data transfers
BSP Operators
● OpenMPI and Harp for BSP communication
● Currently doesn’t work with Twister2 task graphs
○ It should be possible to do this but not in plans for next 6 months
● Can be used to write BSP style programs or communicate between parallel
Operator Summary
● Status
○ Support various parallel operations
○ Optimized operations
○ Only library available for data operations
● Future
○ Direct RDMA support
○ C++ implementation
Task System
● Second level API for programming applications
● Defines a set of programmable units
○ Source
○ Compute
○ Sink
● User can program and connect these units using operations
○ These operations translate to communication operations
■ I.e. Reduce or Node to Node communication
○ Two compute units can be connected using the operation Reduce
○ Note we allow connection of nodes with different levels of parallelism
● Once connected they form a Graph
● The graph is scheduled on to the processes obtained
Task System
● Three components
○ Task Graph
○ Task Scheduler
○ Task Executor
● Task Graph
○ Represents compute units and connections among them
● Task Scheduler
○ Assignment of user defined compute units (tasks) to Workers
● Task Executor
How a user program is executed
1. User creates a Task Graph (dataflow) 2. This graph and resource information is
combined to create a task schedule
a. This schedule describes what tasks run on which resources
3. An execution graph is created
a. This graph has the connections created between tasks using operations
4. A suitable executor is used for executing this execution graph in an event driven manner
a. Streaming execution b. Batch execution
5. Controlled in workers for internal dataflow and in central driver for connected
Task System
● Current status
○ Able to write programs similar to Hadoop and Storm
● Future
○ Dynamically adapting task graph according to computing and data requirements
○ Direct support of Storm and Hadoop APIs
○ Cyclic Graphs with Timely DataFlow Model
■ This will be using the same Task system we have with DataFlow operations
■ Add new APIs to include timing information to graph
Twister Sets (TSet)
● Programming abstraction similar to RDD, PCollection or DataSets
TSet
● TSet based programs are translated to a Task Graph and executed
● TSet is a immutable distributed data set
○ We cannot modify a TSet
○ We can only apply an operation and produce another TSet
● Futures
○ Add support to existing APIs
■ Apache Beam
TSets, RDDs and Streamlets
Similar abstractions with small differences
Twister2 Spark Heron
Higher Level API Lowest Level API Heron has a Task Abstraction. So not the lowest.
Clear separation of data and operations
Data and operation separation not clear
Data and operations are not clearly separated
Streaming and Batch natively Batch and Mini-Batch streaming Only Streaming
More basic operations such as AllReduce and AllGather
More user focused operations
Dashboard
Major Ongoing Work & Planned Work
● Fault tolerance of Streaming and Batch applications
● Connected DataFlows
● Python API
● Improving APIs
○ We have several basic APIs
Future use cases
● DataFlows running from edge to cloud
○ We have streaming and batch components
○ Connecting batch and streaming dataflows running in cloud to a dataflow running in edge
○ A connected dataflow from edge to cloud?
● Function as a Service
○ Provide connectors to micro services
Upcoming Releases (Major Features)
1. End of February 2019
a. Connected DataFlow b. TSets
c. Dashboard
d. Heterogeneous Resources and Dynamic Scaling
2. End of June 2019
a. Fault tolerance b. Python API
c. Statistics gathering d. More applications e. Naiad Timely Dataflow
3. Not scheduled
Summary
We outlined Twister2 components working together to achieve data analytics tasks
● A flexible system that can be adapted to user requirements
○ Different APIs at different levels of abstraction
● Batch, Streaming and Iterative computing
● A system designed to work in Clouds and HPC environments
Hands on
● We have prepared a cluster environment with Kubernetes
○ You can ssh to the Kubernetes cluster with the credentials provided separately
● If you like to try Twister2 on your laptop we have created a docker image
● We will add AWS instructions
● The instructions can be found at
○ https://twister2.gitbook.io/twister2/tutorial
● There are several examples you can try
○ https://twister2.gitbook.io/twister2/tutorial/developing
○ Streaming and Batch Jobs
○ K-Means algorithm
○ Advanced example with Join Twister2:Net operator
● These have descriptions on tutorial page; slides and links to code
○ Wordcount examples demonstrate batch and streaming.
○ Kmeans example demonstrate iteration and communication.
Data Analytics Architectures
API View
RDD, Streamlet, DataSet,
PCollection
Map Reduce, Spout Bolt
Analytics View
Lambda Architecture, Kappa Architecture
DataFlow