Efficient Processing for Big Data Streams and their Context in Distributed Cyber‐Physical Systems
Marina Papatriantafilou
Department of Computer Science and Engineering
Chalmers University of Technology & Gothenburg University
Prelude …
PhD (1996), University of Patras, Greece
Computer Science and Engineering
Distributed Computing
‐ Center for Mathematics & Computer Science, Netherlands
‐ Max Planck Institute for Computer Science, Germany
‐ Chalmers: forskarassistent (assistant professor)
Assoc. Prof., Chalmers University of Technology & Gothenburg University, Sweden
Roadmap
• Cyberphysical systems, big data, streams and distributed systems: how they belong together
• At our research team
• Concluding discussion
Examples: Cyber‐Physical Systems (CPS)
[figures: www.energy‐daily.com/images/; Adaptive Electricity Grids; http://www.kapsch.net/se/]
Cyberphysical systems as layered systems
[figure labels: cyber system; communication link; physical system; sensing + computing + communicating device]
aka Internet‐of‐Things (IoT)
CPS/IoT => big numbers of devices and/or big data rates => big volumes of events/data!
Why this complexity?
(smart) adaptive use of resources …
“… possibilities of improvements: e.g. energy consumption, traffic bandwidth, early warnings, … improving systems quality”
[The 4th industrial (r)evolution, presentation by S. Jeschke, 2013]
Info needed in near‐real‐time
Is store & process (DB) a feasible option?
– high‐rate sensors, high‐speed networks, social media, financial records: up to millions of messages per second (Mmsg/sec); decisions must be taken really fast, e.g. within fractions of a msec, even μsecs
Data Streaming:
– In‐memory, in‐network, distributed
– Locality, use of available resources
– Efficient one‐pass analysis & filter
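A minimal sketch of what one‐pass analysis & filter can look like: the Python below keeps a running mean and variance in constant memory and flags readings that deviate strongly from what has been seen so far, without ever storing the stream. The slides do not name a specific algorithm; Welford's update rule is swapped in here for illustration, and the names `RunningStats` and `flag_outliers` as well as the threshold `k` are assumptions.

```python
import math
from typing import Iterable, Iterator


class RunningStats:
    """Welford's one-pass mean/variance: O(1) memory, one update per tuple."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        # Sample standard deviation; 0.0 until two values have been seen.
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0


def flag_outliers(stream: Iterable[float], k: float = 3.0) -> Iterator[float]:
    """Yield values more than k standard deviations from the running mean.

    Each value is tested against the statistics of the values seen before it
    and then folded into those statistics. Nothing is buffered.
    """
    stats = RunningStats()
    for x in stream:
        if stats.n >= 2 and abs(x - stats.mean) > k * stats.std:
            yield x
        stats.update(x)


# Hypothetical sensor readings; only the sixth one stands out.
flagged = list(flag_outliers([10.0, 10.2, 9.8, 10.1, 9.9, 50.0, 10.0]))
```

Because the filter is a generator, it can sit directly between a source and a downstream operator and processes each tuple exactly once.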
“as of today, of the available data from sensors only 0.1% is
analyzed, mainly offline (i.e., afterwards, not in or close to
real‐time)”
[Jonathan Ballon, Chief Strategy Officer, General Electric]
fig: V. Gulisano
Data streaming: components
Distributed input sources generating streams of data (unbounded sequences of tuples, time‐series)
Continuous query (or queries): a graph of data streaming operators/tasks. Can be used to:
• filter / modify tuples
• aggregate tuples, join streams
• … stateful operations computed over windows
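To make the operator graph concrete, here is a small sketch of a continuous query built from two chained operators: a stateless filter followed by a stateful aggregate over tumbling windows. The tuple layout `(timestamp, sensor id, value)`, the function names `filter_op` and `window_sum`, and the window size are assumptions for illustration; timestamps are assumed to arrive in non‐decreasing order.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

Reading = Tuple[int, str, float]  # (timestamp in seconds, sensor id, value)


def filter_op(stream: Iterable[Reading], min_value: float) -> Iterator[Reading]:
    """Stateless operator: forward only tuples with value >= min_value."""
    for t in stream:
        if t[2] >= min_value:
            yield t


def window_sum(
    stream: Iterable[Reading], size: int
) -> Iterator[Tuple[int, Dict[str, float]]]:
    """Stateful operator: per-sensor sum over tumbling windows of `size` seconds.

    Assumes non-decreasing timestamps, so a window can be emitted as soon as
    a tuple belonging to a later window arrives. Empty windows are skipped.
    """
    current_start = None
    sums: Dict[str, float] = defaultdict(float)
    for ts, key, value in stream:
        start = ts - ts % size
        if current_start is not None and start != current_start:
            yield current_start, dict(sums)
            sums.clear()
        current_start = start
        sums[key] += value
    if current_start is not None:
        yield current_start, dict(sums)  # flush: the example input is finite


readings = [(0, "a", 1.0), (3, "b", 5.0), (7, "a", 2.0), (12, "a", 4.0), (14, "b", 0.5)]
query = window_sum(filter_op(readings, min_value=1.0), size=10)
results = list(query)
```

Here `results` holds one entry per non‐empty window: the window start time and the per‐sensor sums of the tuples that passed the filter.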
fig: V. Gulisano
Input/output & processing can involve multiple parallel threads
Challenges: throughput, latency, determinism, load balancing, fault tolerance
[State‐of‐the‐art literature] Parallelization in operator implementations: but single‐point bottlenecks can still persist
Roadmap
• Cyberphysical systems, big data, streams and distributed systems: how they belong together
• At our research team
• Concluding discussion
Parallel Data Streaming …
… Fine‐grain parallelism
At CTH: enhanced parallelism by means of dedicated / semantic‐aware concurrent data objects and efficient algorithmic implementations of their fine‐grain synchronization
fig: V. Gulisano, R. Rodriguez
Examples of results with ScaleGate
• shifting the saturation point of the pipeline
• possible to process “heavier” streams with the same computing capacity, many times faster (Mtuples/sec)
[figure legend: Baseline (Borealis, StreamCloud) FIFO queue; Baseline lock‐free FIFO; ScaleGate‐based]
[CGNPT ACM SPAA2014, GNPT IEEE BigData2015]
Latency, throughput scaling (while remaining fault‐tolerant and deterministic)
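The slides do not spell out the ScaleGate interface. The sketch below only illustrates the general idea of a shared object that merges timestamp‐sorted input streams and hands out tuples in one deterministic order. It assumes that a tuple is "ready" once every source has delivered a tuple with an equal or later timestamp. It uses a single coarse lock and a single consumer, so it is not the fine‐grain synchronized implementation measured above; the names `ReadyMerge`, `add_tuple` and `get_next_ready` are hypothetical.

```python
import heapq
import itertools
import threading
from typing import Any, List, Optional, Tuple


class ReadyMerge:
    """Merge per-source timestamp-sorted streams into one deterministic order."""

    def __init__(self, num_sources: int) -> None:
        self._lock = threading.Lock()  # coarse lock, a stand-in for fine-grain sync
        self._heap: List[Tuple[int, int, int, Any]] = []  # (ts, source, seq, payload)
        self._seq = itertools.count()  # tie-breaker so payloads are never compared
        self._latest: List[Optional[int]] = [None] * num_sources

    def add_tuple(self, source: int, ts: int, payload: Any) -> None:
        """Insert a tuple; each source must call this with non-decreasing ts."""
        with self._lock:
            heapq.heappush(self._heap, (ts, source, next(self._seq), payload))
            self._latest[source] = ts

    def get_next_ready(self) -> Optional[Tuple[int, int, Any]]:
        """Return the earliest ready (ts, source, payload), or None if none is ready."""
        with self._lock:
            if not self._heap or any(t is None for t in self._latest):
                return None
            watermark = min(self._latest)  # no source can still send anything earlier
            if self._heap[0][0] <= watermark:
                ts, source, _, payload = heapq.heappop(self._heap)
                return ts, source, payload
            return None


gate = ReadyMerge(num_sources=2)
gate.add_tuple(0, 1, "a1")
gate.add_tuple(0, 4, "a4")
first = gate.get_next_ready()  # source 1 has delivered nothing yet
gate.add_tuple(1, 2, "b2")
out = list(iter(gate.get_next_ready, None))  # drain everything that is ready
```

The output order depends only on timestamps and source ids, not on thread interleaving, which is the determinism property the challenge list refers to. The tuple with timestamp 4 stays buffered until source 1 advances past it.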
Examples of use‐cases: Geospatial monitoring [GNWPT ACM DEBS 2015]
Deterministic Real‐Time Analytics of Geospatial Data Streams through ScaleGate Objects
http://www.chalmers.se/en/departments/cse/news/Pages/debs2015.aspx
Best Solution, Grand Challenge Award: 9th ACM SIGMOD‐SIGSOFT International Conference on Distributed Event‐Based Systems 2015
• Top‐k frequent routes, profitable cells (near‐real‐time window‐based streaming)
• > 110,000 tuples/sec throughput, < 46 msec latency
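A window‐based top‐k query such as "top k frequent routes" can be sketched as follows. This is a plain single‐threaded illustration, not the awarded solution: a route is assumed to be a pair of grid‐cell ids, the window is time‐based, and the class name `TopKRoutes` and all parameters are hypothetical.

```python
from collections import Counter, deque
from typing import Deque, List, Tuple

Route = Tuple[str, str]  # (pickup cell, drop-off cell); cell ids are placeholders


class TopKRoutes:
    """Top-k most frequent routes over a sliding time window."""

    def __init__(self, window: int, k: int) -> None:
        self.window = window
        self.k = k
        self._events: Deque[Tuple[int, Route]] = deque()
        self._counts: Counter = Counter()

    def add(self, ts: int, route: Route) -> List[Tuple[Route, int]]:
        """Add a trip that ended at time ts (non-decreasing) and return the top k."""
        self._events.append((ts, route))
        self._counts[route] += 1
        # Expire trips that fell out of the window (ts - window, ts].
        while self._events and self._events[0][0] <= ts - self.window:
            _, old = self._events.popleft()
            self._counts[old] -= 1
            if self._counts[old] == 0:
                del self._counts[old]
        return self._counts.most_common(self.k)


top = TopKRoutes(window=10, k=2)
for ts, route in [(0, ("A", "B")), (1, ("A", "B")), (2, ("C", "D")), (3, ("A", "B"))]:
    top.add(ts, route)
current = top.add(4, ("C", "D"))   # A->B seen 3 times, C->D twice
later = top.add(11, ("E", "F"))    # the trips at ts 0 and 1 have expired
```

Each update costs time proportional to the number of expired trips plus the cost of `most_common`; a high‐rate implementation would maintain the ranking incrementally instead.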
Examples of use‐cases: Advanced Metering Infrastructure
Efficient temporal‐spatial clustering for on‐line identification of critical events (even when the communication is unreliable)
[figure labels: time; sliding window]
Grid‐based Single‐Linkage Clustering (G‐SLC)
[FALP IEEE BigData2014]
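The details of G‐SLC are in the cited paper. The sketch below only illustrates the general idea the name suggests, under stated assumptions: events in the current window are hashed into grid cells, and non‐empty cells that touch each other are linked with a union‐find structure, which approximates single‐linkage clustering without comparing all pairs of events. The 2D coordinates, the cell size, the 8‐cell neighbourhood and the function name `grid_clusters` are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]


def grid_clusters(points: List[Point], cell: float) -> List[List[Point]]:
    """Group points whose grid cells are equal or adjacent (8-neighbourhood)."""
    cells: Dict[Cell, List[Point]] = defaultdict(list)
    for x, y in points:
        cells[(int(x // cell), int(y // cell))].append((x, y))

    parent: Dict[Cell, Cell] = {c: c for c in cells}

    def find(c: Cell) -> Cell:
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    # Link every non-empty cell with its non-empty neighbours.
    for cx, cy in cells:
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                n = (cx + dx, cy + dy)
                if n in cells:
                    parent[find(n)] = find((cx, cy))

    groups: Dict[Cell, List[Point]] = defaultdict(list)
    for c, pts in cells.items():
        groups[find(c)].extend(pts)
    return list(groups.values())


# Hypothetical event locations inside one sliding window.
events = [(0.1, 0.1), (0.9, 0.2), (1.5, 1.2), (10.0, 10.0)]
clusters = grid_clusters(events, cell=1.0)
```

The work per event is bounded by the fixed neighbourhood size, which is what makes a grid attractive for on‐line use; in a streaming setting the function would be re‐evaluated (or updated incrementally) as the window slides.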
Examples of use‐cases: Advanced Metering Infrastructure
Efficient data validation on‐the‐fly:
Noisy and lossy data: badly calibrated / faulty devices, lossy communication, …
E.g. scaling to 25 million meters (hourly readings) on a mainstream 6‐core platform
[GAP IEEE ISGT 2014]
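As an illustration of validation on‐the‐fly, the following sketch labels each meter reading in a single pass while keeping only one value of state per meter. The two rules (a range check and a jump check against the last accepted reading), their thresholds, the tuple layout and the function name `validate` are assumptions for illustration, not the rules used in the cited work.

```python
from typing import Dict, Iterable, Iterator, Tuple

Reading = Tuple[str, int, float]  # (meter id, hour index, consumption in kWh)


def validate(
    stream: Iterable[Reading], max_kwh: float = 50.0, max_jump: float = 20.0
) -> Iterator[Tuple[Reading, str]]:
    """Label each reading 'ok', 'out_of_range' or 'spike' in a single pass.

    State is one float per meter (the last accepted value), so memory grows
    with the number of meters, not with the length of the stream.
    """
    last_ok: Dict[str, float] = {}
    for reading in stream:
        meter, _, kwh = reading
        if not 0.0 <= kwh <= max_kwh:
            yield reading, "out_of_range"
        elif meter in last_ok and abs(kwh - last_ok[meter]) > max_jump:
            yield reading, "spike"
        else:
            last_ok[meter] = kwh
            yield reading, "ok"


sample = [
    ("m1", 0, 1.0),
    ("m1", 1, -3.0),   # negative consumption
    ("m1", 2, 30.0),   # jump of 29 kWh from the last accepted value
    ("m1", 3, 1.5),
    ("m2", 0, 60.0),   # above the assumed maximum
]
labels = [label for _, label in validate(sample)]
```

Because the per‐meter state is independent, the stream can be partitioned by meter id across cores. Note that this simple jump rule keeps flagging readings after a genuine level change; a real rule set would need a way to recover from that.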
+ differentially private aggregation [ongoing work]
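The ongoing work is not described further here. For orientation only, the sketch below shows the standard Laplace mechanism applied to a sum of clipped meter readings, which is one textbook way to make an aggregate differentially private. The clipping bound `upper`, the privacy parameter `epsilon` and the function names are assumptions and say nothing about the method actually pursued.

```python
import random
from typing import Iterable, Optional


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential samples."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_sum(
    values: Iterable[float],
    upper: float,
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> float:
    """Epsilon-differentially-private sum of readings clipped to [0, upper].

    Clipping bounds the contribution of any single reading by `upper`
    (the sensitivity), so Laplace noise with scale upper/epsilon suffices.
    """
    if rng is None:
        rng = random.Random()
    clipped = sum(min(max(v, 0.0), upper) for v in values)
    return clipped + laplace_noise(upper / epsilon, rng)


# Hypothetical hourly readings (kWh) from a neighbourhood of meters.
hourly = [1.2, 0.8, 3.5, 2.1, 0.0, 4.4]
noisy_total = private_sum(hourly, upper=5.0, epsilon=0.5, rng=random.Random(42))
```

A smaller `epsilon` gives stronger privacy and a noisier total; the relative error shrinks as more meters are aggregated, since the noise scale does not depend on their number.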
Roadmap
• Cyberphysical systems, big data, streams and distributed systems: how they belong together
• At our research team
• Concluding discussion
Summarizing & Concluding
Advancing the state of the art in big data stream analysis (context: IoT/CPS; relates to Cloud/“Fog” computing)
DS^2: DataStreaming * DataStructures, i.e. efficient multicore stream processing
Efficient algorithmic (in‐memory) stream analysis
“… important … to design algorithms that communicate as little as possible” … “… efficient processing and data analysis need to be unified …”
[J. Dongarra, D. Reed, CACM 2015]
In our ongoing/near‐future research:
‐ Elastic parallel & distributed, in‐network streaming (allowing e.g. embedded devices)
‐ More concurrent data structures & multicore algorithms for efficient in‐memory stream processing
‐ Processing high‐rate sensory data (e.g. LIDAR)
&
Thank you
EXCESS
Marina Papatriantafilou
Contact: [email protected]
Co‐authors in work mentioned here (from left to right):
At our research team (approx. 30 persons): Cyberphysical systems research
Data – Internet of Things
‐ Data processing: validation, monitoring, prediction
‐ Security, privacy
Demand‐response in energy
‐ Resource management, load shaping
‐ Microgrids demo/testbeds
Energy‐efficient computation
‐ streaming, parallel, multicore
‐ energy efficiency: estimated savings 30‐70%
Cooperative vehicular systems
‐ Communication & coordination
‐ data‐driven situation awareness (new postdoc, SAFER)
‐ Virtual traffic lights / safer crossings
‐ Gulliver demo/testbed