Aurora: a new model and architecture for data stream management
Daniel J. Abadi (1), Don Carney (2), Ugur Cetintemel (2), Mitch Cherniack (1), Christian Convey (2), Sangdon Lee (2), Michael Stonebraker (3), Nesime Tatbul (2), Stan Zdonik (2)
(1) Department of Computer Science, Brandeis University
(2) Department of Computer Science, Brown University
(3) Department of EECS and Laboratory of Computer Science, M.I.T.
Presenter: YongChul Kwon
2/15
Table of contents
One-line statement
Scenario
Critique
3/15
One-line statement
The authors designed a new DBMS model and system specialized for data stream management.
4/15
Scenario – A.D. 201x
Good evening, this is XXX Headline News!
KAIST has announced that it has developed a nationwide object monitoring system.
Daihyun Mobis has announced that it will launch an auto car diagnostic service next month as its first telematics service!
Today, the total number of daily stock trades set a new record!
5/15
Scenario
Rrrrr…
Mobis (YongChul): Hello, Daihyun Mobis research department, YongChul speaking.
goodday reporter: Hi, this is a reporter from goodday. May I interview the person who developed your new telematics service?
Mobis (YongChul): It’s me. How can I help you?
goodday reporter: Aha! Would you tell me the story of how you developed the service? I’ve heard it was quite a challenging task!
Mobis (YongChul): Oh, sure. Well, let me see…
6/15
Daihyun Motors car
Equipped with various sensors
  - Pressure, exchange date, …
  - Brightness, …
  - RPM, temperature, pressure, oil status, …
All parts are RFID tagged
Telematics agent collects data and transmits it to the center
Telematics agent can test the car and report the IDs of malfunctioning parts
7/15
Diagnostic service
4G Wireless Network
Service center
Repair center
Home visit service
Notify GPS
Immediate accident response
8/15
Implementation - trigger
[Diagram labels: Data Stream, ???, DBMS, Output, Data Submitter, Messaging Systems, Query register]
CHALLENGE - Trigger: triggers are not scalable
CHALLENGE - Data stream: data is sometimes lost or delivered late
CHALLENGE - Update query: millions of updates arrive in short bursts
CHALLENGE - Query management: third parties often request new or updated triggers and queries
CHALLENGE - History of values: no scalable way to track the latest location of the car
CHALLENGE - Optimization: is heavy optimization helpful during high load?
CHALLENGE - QoS: cannot guarantee service for premium customers
9/15
Implementation - middleware
[Diagram labels: Data Stream, ???, DBMS, Output, Data Submitter, Messaging Systems, Query register, query, Query Processor]
CHALLENGE - QoS: cannot guarantee service for premium customers
CHALLENGE - Query management: has to use a new query language
CHALLENGE - Data stream: data is sometimes lost or delivered late
CHALLENGE - History of values: no scalable way to find the latest location of the car
CHALLENGE - Optimization: cannot benefit from query optimization
CHALLENGE - Update query: millions of updates arrive in short bursts
CHALLENGE - Resource usage: are we using the system efficiently?
10/15
Implementation - Aurora
[Diagram labels: Data Stream, Output, DBMS, Data Submitter, Messaging Systems, Query register, query, Query Processor]
CHALLENGE - Data stream: data is sometimes lost or delivered late
  Aurora: new stream processing architecture
CHALLENGE - Update query: millions of updates arrive in short bursts
  Aurora: new stream processing architecture
CHALLENGE - History of values: no scalable way to find the latest location of the car
  Aurora: new stream processing architecture
CHALLENGE - Optimization: cannot benefit from query optimization
  Aurora: run-time optimization
CHALLENGE - Query management: has to use a new query language
  Aurora: intuitive stream algebra and GUI
CHALLENGE - QoS: cannot guarantee service for premium customers
  Aurora: QoS specified by the application administrator, plus load shedding
CHALLENGE - Resource usage: are we using the system efficiently?
  Aurora: train scheduling and feedback from/to QoS
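To make the QoS and load shedding entry more concrete, here is a minimal Python sketch. It is not Aurora's actual algorithm. It assumes each tuple carries a hypothetical integer priority (higher means more important, for example a premium customer) and that the system can process at most `capacity` tuples per cycle. The names `shed_load` and `StreamTuple` are illustrative only.

```python
from typing import List, Tuple

# (priority, payload): priority is a hypothetical QoS weight, higher = more important.
StreamTuple = Tuple[int, str]


def shed_load(batch: List[StreamTuple], capacity: int) -> List[StreamTuple]:
    """Keep at most `capacity` tuples, dropping the lowest-priority ones first.

    Arrival order is preserved among the tuples that survive.
    """
    if capacity <= 0:
        return []
    if len(batch) <= capacity:
        return list(batch)
    # Rank positions by priority (descending); ties favor earlier arrivals.
    ranked = sorted(range(len(batch)), key=lambda i: (-batch[i][0], i))
    keep = sorted(ranked[:capacity])  # restore arrival order
    return [batch[i] for i in keep]


batch = [(1, "basic-a"), (5, "premium-a"), (1, "basic-b"), (5, "premium-b")]
survivors = shed_load(batch, capacity=3)
print(survivors)
```

Under overload, the sketch drops a basic-tier tuple while both premium tuples survive, which is the behavior the QoS challenge asks for.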
11/15
Implementation - Aurora
[Figure: Aurora runtime architecture. Labels: Data Stream, inputs, outputs, Output, Router, Scheduler, Box Processors (σ, μ), Buffer manager, Storage Manager, Persistent Store, queues Q1 Q2 … Qn and Q1 Q2 … Qm, Load Shedder, QoS Monitor, Catalog]
12/15
Strong points
Solution approach itself
  - Rethinks everything around the requirements
Query model
  - Data-flow-style query specification
Optimization
  - Dynamic runtime optimization
  - Train scheduling
  - QoS-specification-based resource management
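The train scheduling idea can be illustrated with a small Python sketch. This is a simplification under stated assumptions, not Aurora's scheduler. A "box" is modeled as a plain per-tuple function, and the cost of scheduling is approximated by counting how often the scheduler has to pick a box. The function names are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple

Box = Callable[[int], int]  # simplification: a box is a per-tuple function


def run_tuple_at_a_time(boxes: List[Box], tuples: Iterable[int]) -> Tuple[List[int], int]:
    """One scheduling decision per (tuple, box) pair."""
    decisions, out = 0, []
    for t in tuples:
        for box in boxes:
            decisions += 1
            t = box(t)
        out.append(t)
    return out, decisions


def run_as_train(boxes: List[Box], tuples: Iterable[int]) -> Tuple[List[int], int]:
    """One scheduling decision per box: each box processes the whole train."""
    decisions, train = 0, list(tuples)
    for box in boxes:
        decisions += 1
        train = [box(t) for t in train]
    return train, decisions


boxes = [lambda x: x + 1, lambda x: x * 2]
slow_out, slow_decisions = run_tuple_at_a_time(boxes, range(4))
fast_out, fast_decisions = run_as_train(boxes, range(4))
print(slow_decisions, fast_decisions)
```

Both runs produce the same output, but the train version makes one scheduling decision per box instead of one per tuple per box.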
13/15
Weak points
Runs on a single computer
  - Aurora* project
No experimental results
  - Train scheduling
  - Various optimization techniques
14/15
New ideas
Q. The design looks fancy, but how can it be made more scalable?
A. A distributed Aurora runtime
Flux-style Aurora runtime coordination
  - Transfer an Aurora sub-network or query to another runtime instance
  - An external QoS scheduler will help
15/15
[Figure: Distributed Aurora run-time. Three copies of the Aurora runtime (inputs, outputs, Router, Scheduler, Box Processors (σ, μ), Buffer manager, Storage Manager, Persistent Store, queues, Load Shedder, QoS Monitor, Catalog) together with an External QoS Monitor]
Supplementary Slides
17/15
Monitoring application vs. traditional DBMS

                            | Traditional DBMS            | Monitoring application
Typical model               | Human active, data passive  | Data active, human passive
Real-time requirement       | Not supported               | Required
Approximate query result    | Not supported               | Required
Managing history of values  | Very hard or inefficient    | Required
18/15
Solution approach
Rethink about DBMS
  - System & query model
  - Architecture
System model
Runtime operation
Optimization
  - Algebra
19/15
Runtime system
[Figure: Aurora runtime system. Labels: inputs, outputs, Router, Scheduler, Box Processors (σ, μ), Buffer manager, Storage Manager, Persistent Store, queues Q1 Q2 … Qn and Q1 Q2 … Qm, Load Shedder, QoS Monitor, Catalog]
20/15
System model
[Figure: Aurora system model. Labels: External data source, Aurora System, Operator boxes, data flow, Historical Storage, User application, Continuous & ad hoc queries, Application administrator, Query spec, QoS spec]
21/15
Query model
Traditional
  - Structured Query Language
  - Declarative queries on static data
Aurora
  - Data flow model for data streams
    Application manager constructs queries using a GUI
  - Stream Query Algebra (SQuAl)
    Queries are processed by SQuAl operators on the data stream
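The data flow style of query can be sketched in Python with generators standing in for boxes. This is only an illustration of the boxes-and-arrows idea, not Aurora's GUI or SQuAl syntax. The helper names `filter_box` and `map_box`, the field names, and the 120-degree threshold are all hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, Iterator

Reading = Dict[str, Any]


def filter_box(pred: Callable[[Reading], bool], stream: Iterable[Reading]) -> Iterator[Reading]:
    """Pass through only the tuples that satisfy the predicate."""
    for t in stream:
        if pred(t):
            yield t


def map_box(fn: Callable[[Reading], Reading], stream: Iterable[Reading]) -> Iterator[Reading]:
    """Transform each tuple independently."""
    for t in stream:
        yield fn(t)


# Hypothetical sensor readings arriving from cars.
readings = [
    {"car": 1, "temp": 90.0},
    {"car": 2, "temp": 130.0},
    {"car": 3, "temp": 125.0},
]

# Wire the boxes together: readings -> Filter -> Map -> application.
hot = filter_box(lambda t: t["temp"] > 120.0, readings)
alerts = map_box(lambda t: {"car": t["car"], "alert": "overheat"}, hot)
result = list(alerts)
print(result)
```

The query is specified by connecting boxes rather than by writing a declarative statement, and tuples flow through as they arrive.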
22/15
Query model
[Figure: Aurora query network. Labels: data input, boxes b1 to b9, Connection point, continuous query, view, ad-hoc query, app (two outputs), QoS spec (three)]
23/15
Optimization
How can we fix some parts of a water supply system?
[Figure: water supply system with three parts marked X]
24/15
Optimization
[Figure: query network used to illustrate optimization. Labels: Continuous query, Ad hoc query, Static storage, pull data. Boxes: Filter, BSort, Union, Aggregate, Join, Aggregate, Map, Filter, Map, Join, Hold, Filter, Hold]
25/15
Optimization
Dynamic continuous query optimization
  - Inserting projections
  - Combining boxes
  - Reordering boxes
Ad hoc query optimization
  - 1st stage: replace implementation (Filter/Join)
  - 2nd stage: same as continuous query
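Combining and reordering boxes can be illustrated with two Filter boxes. The Python sketch below is an assumption-laden toy, not Aurora's optimizer. It counts predicate evaluations as a stand-in for cost and shows that running the more selective filter first, or fusing both filters into one box, leaves the output unchanged. All names are hypothetical.

```python
from typing import Callable, List, Tuple

Pred = Callable[[int], bool]


def combine_filters(p: Pred, q: Pred) -> Pred:
    """Fuse two adjacent Filter boxes into a single box."""
    return lambda t: p(t) and q(t)


def run_filters(preds: List[Pred], stream: List[int]) -> Tuple[List[int], int]:
    """Apply filters in order and count predicate evaluations as a cost proxy."""
    evals, out = 0, []
    for t in stream:
        ok = True
        for p in preds:
            evals += 1
            if not p(t):
                ok = False
                break  # a dropped tuple never reaches later boxes
        if ok:
            out.append(t)
    return out, evals


is_even = lambda t: t % 2 == 0  # passes half of the input below
is_big = lambda t: t >= 90      # passes a tenth of the input below
stream = list(range(100))

out_a, cost_a = run_filters([is_even, is_big], stream)
out_b, cost_b = run_filters([is_big, is_even], stream)  # reordered
out_c, _ = run_filters([combine_filters(is_even, is_big)], stream)  # combined
print(cost_a, cost_b)
```

Putting the more selective filter first reduces the number of evaluations while the result stays the same, which is the intuition behind reordering boxes at run time.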
26/15
SQuAl
Order-insensitive
  - Filter
  - Map
  - Union
Order-sensitive
  - BSort
  - Aggregate
  - Join
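To show why an operator such as BSort is order-sensitive, here is a hedged Python sketch of a buffer-based approximate sort. It assumes a `slack` parameter that bounds how far out of order a tuple may arrive. The slides do not give this definition, so treat the details as an assumption rather than the exact SQuAl semantics.

```python
import heapq
from typing import Iterable, Iterator, List


def bsort(stream: Iterable[int], slack: int) -> Iterator[int]:
    """Approximate sort over a stream using a buffer of slack + 1 tuples.

    Assumption for this sketch: once the buffer holds slack + 1 tuples,
    the smallest one is emitted, so a tuple can be at most `slack`
    positions late and still come out in order.
    """
    buf: List[int] = []
    for t in stream:
        heapq.heappush(buf, t)
        if len(buf) > slack:
            yield heapq.heappop(buf)
    while buf:  # end of stream: flush the remaining tuples in order
        yield heapq.heappop(buf)


mildly_disordered = [3, 1, 2, 6, 4, 5]
sorted_out = list(bsort(mildly_disordered, slack=2))
unsorted_out = list(bsort(mildly_disordered, slack=0))
print(sorted_out, unsorted_out)
```

With enough slack the output is fully ordered, and with no slack the stream passes through unchanged. Filter, Map, and Union need no such buffer because they handle each tuple independently of arrival order.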