Efficient Processing for Big Data Streams and their Context in Distributed Cyber Physical Systems

Academic year: 2021

(1)

Efficient Processing for Big Data Streams and their Context in Distributed Cyber Physical Systems

Marina Papatriantafilou

Department of Computer Science and Engineering

Chalmers University of Technology & Gothenburg University

 

(2)

Prelude …

PhD (1996), University of Patras, Greece

Computer Science and Engineering

Distributed Computing

Center for Mathematics & Computer Science, Netherlands

Max Planck Institute for Computer Science, Germany

Chalmers: forskarassistent (assistant professor)

Assoc. prof., Chalmers Un. of Technology & Gothenburg University, Sweden

(3)

Roadmap

Cyberphysical systems, big data, streams and distributed systems: how they belong together

At our research team

Concluding discussion

(4)

Examples: Cyber Physical System (CPS)

Adaptive Electricity Grids

(images: www.energy-daily.com/images/, http://www.kapsch.net/se/)

(5)

Cyberphysical systems as layered systems

Cyber system

communication link

Physical system

Sensing+computing+communicating device

aka Internet of Things (IoT)

(6)

CPS/IoT => big numbers of devices and/or big data rates => big volumes of events/data!

Why this complexity?

(smart) adaptive use of resources ... possibilities of improvements: e.g. energy consumption, traffic bandwidth, early warnings, improving systems quality

[the 4th industrial (r)evolution, presentation S. Jeschke, 2013]

(7)

Info needed in near-real-time

Is store&process (DB) a feasible option?

high-rate sensors, high-speed networks, social media, financial records: up to Mmsg/sec; decisions must be taken really fast, e.g., fractions of msec, even μsecs.

Data Streaming:

In-memory, in-network, distributed

Locality, use of available resources

Efficient one-pass analysis & filter

“as of today, of the available data from sensors only 0.1% is analyzed, mainly offline (i.e., afterwards, not in or close to real-time)” [Jonathan Ballon, Chief Strategy Officer, General Electric]

fig: V. Gulisano
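
The "one-pass analysis & filter" idea can be illustrated with a minimal sketch: each reading is inspected once, folded into constant-size running statistics, and never stored. The class `RunningStats`, the function `one_pass_filter`, and the parameters `k` and `warmup` are hypothetical names chosen for this illustration and do not come from the talk.

```python
from dataclasses import dataclass
import math


@dataclass
class RunningStats:
    """Welford-style running mean and variance: one pass, O(1) memory."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        # Population standard deviation of everything seen so far
        return math.sqrt(self.m2 / self.n) if self.n > 0 else 0.0


def one_pass_filter(readings, k=3.0, warmup=5):
    """Yield readings lying more than k standard deviations from the running mean.

    Every reading is examined exactly once and then folded into the
    statistics; the stream itself is never stored.
    `k` and `warmup` are illustrative parameters (assumptions).
    """
    stats = RunningStats()
    for x in readings:
        if stats.n >= warmup and abs(x - stats.mean) > k * stats.std:
            yield x
        stats.update(x)
```

For example, `list(one_pass_filter([10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 50.0, 10.1]))` passes over the eight readings once and reports only the reading 50.0.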

(8)

Data streaming - components

Distributed input sources generating streams of data (unbounded sequences of tuples, time-series)

Continuous Query (-ies) (graph of data streaming operators/tasks). Can be used to:

filter / modify tuples

aggregate tuples, join streams

stateful operations computed over windows

fig: V. Gulisano

Input/output & processing can involve multiple parallel threads

Challenges: Throughput, Latency, Determinism, Load balancing, Fault Tolerance

[State-of-the-art literature] parallelization in operators implementations: but single-point bottlenecks can still persist
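
As a rough illustration of the components listed on this slide, the sketch below chains a stateless filter operator into a stateful sliding-window aggregate, forming a tiny continuous query. It is single-threaded and assumes timestamp-ordered input; the names `Reading`, `filter_op` and `sliding_avg` are hypothetical and are not taken from any particular stream processing engine.

```python
from collections import deque
from typing import NamedTuple


class Reading(NamedTuple):
    ts: int        # event timestamp, e.g. in seconds
    sensor: str
    value: float


def filter_op(stream, predicate):
    """Stateless operator: forward only the tuples that satisfy the predicate."""
    return (t for t in stream if predicate(t))


def sliding_avg(stream, size):
    """Stateful operator over a time-based sliding window.

    For every input tuple, emit (ts, average of the values whose
    timestamps lie in the half-open interval (ts - size, ts]).
    Assumes tuples arrive in timestamp order and size > 0.
    """
    window = deque()
    total = 0.0
    for t in stream:
        window.append(t)
        total += t.value
        # Evict tuples that slid out of the window
        while window[0].ts <= t.ts - size:
            total -= window.popleft().value
        yield t.ts, total / len(window)
```

A query is then a composition of operators, for instance `sliding_avg(filter_op(source, lambda t: t.value >= 0), size=3)`. Because both operators are generators, tuples flow through one at a time and the only state kept is the current window.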

 

(9)

Roadmap

Cyberphysical systems, big data, streams and distributed systems: how they belong together

At our research team

Concluding discussion

(10)

Parallel Data Streaming

Fine-grain parallelism

At CTH: enhanced parallelism by means of dedicated / semantic-aware concurrent data objects and their efficient algorithmic fine-grain synchronization implementations

fig: V. Gulisano, R. Rodriguez

(11)

Examples of results with ScaleGate

shifting the saturation point of the pipeline

possible to process "heavier" streams with same computing capacity, many times faster, Mtuples/sec

Figure legend: Baseline (Borealis, StreamCloud) FIFO queue; Baseline Lock-free FIFO; ScaleGate-based

[CGNPT ACM SPAA2014, GNPT IEEE BigData2015]

Latency, throughput scaling (while keeping fault-tolerant and deterministic)
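
To give a feel for what "deterministic" can mean when several sources feed one operator, here is a single-threaded sketch of timestamp-ordered merging. It is not the ScaleGate implementation, which the cited papers describe as a concurrent data object; it only illustrates one possible readiness rule, under the assumption (made for this sketch) that each source delivers strictly increasing timestamps. The class `ReadyMerger` and its methods are hypothetical names.

```python
import heapq


class ReadyMerger:
    """Sequential sketch of deterministic merging of timestamp-sorted streams.

    A tuple is released only once every source has delivered a tuple
    with an equal or larger timestamp, so the output order does not
    depend on how arrivals from different sources interleave.
    """

    def __init__(self, num_sources):
        self._latest = [None] * num_sources  # last timestamp seen per source
        self._heap = []                      # pending (ts, source, payload)

    def add(self, source, ts, payload):
        last = self._latest[source]
        if last is not None and ts <= last:
            # Assumption of this sketch: strictly increasing per source
            raise ValueError("timestamps must strictly increase per source")
        self._latest[source] = ts
        heapq.heappush(self._heap, (ts, source, payload))

    def ready(self):
        """Pop and return all tuples that are safe to emit, in (ts, source) order."""
        if any(ts is None for ts in self._latest):
            return []  # some source has not delivered anything yet
        watermark = min(self._latest)
        out = []
        while self._heap and self._heap[0][0] <= watermark:
            out.append(heapq.heappop(self._heap))
        return out
```

Feeding the same per-source sequences in different arrival orders yields the same emitted sequence, which is the property the slide refers to as determinism. A concurrent version would additionally need synchronization between adding and reading threads, which this sketch leaves out.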

(12)

Examples of use cases: Geospatial monitoring

DETERMINISTIC REAL-TIME ANALYTICS OF GEOSPATIAL DATA STREAMS THROUGH SCALEGATE OBJECTS

http://www.chalmers.se/en/departments/cse/news/Pages/debs2015.aspx

BEST SOLUTION GRAND CHALLENGE AWARD: 9th ACM SIGMOD-SIGSOFT International Conference on Distributed Event-Based Systems 2015

Top-k frequent routes, profitable cells (near-real-time window-based streaming)

> 110,000 tuples/sec throughput, < 46 msec latency

[GNWPT ACM DEBS 2015]
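
The "top-k frequent routes" query over a window can be sketched as follows. This is a simplified, sequential illustration and not the award-winning solution: the class `TopKRoutes`, the route representation (a pair of cell identifiers) and the tie-breaking rule are assumptions made for the example.

```python
from collections import Counter, deque


class TopKRoutes:
    """Sliding-window count of routes, reporting the k most frequent ones."""

    def __init__(self, k, window):
        self.k = k
        self.window = window     # window length, in the unit of the timestamps
        self._trips = deque()    # (ts, route) in arrival (timestamp) order
        self._counts = Counter()

    def add(self, ts, route):
        """Insert one trip, expire old trips, return the current top-k."""
        self._trips.append((ts, route))
        self._counts[route] += 1
        # Expire trips outside the window (ts - window, ts]
        while self._trips[0][0] <= ts - self.window:
            _, old = self._trips.popleft()
            self._counts[old] -= 1
            if self._counts[old] == 0:
                del self._counts[old]
        return self.top()

    def top(self):
        # Descending count, then route id, so that ties break deterministically
        ranked = sorted(self._counts.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked[: self.k]
```

Sorting all counters on every update keeps the sketch short; a high-throughput implementation would more likely maintain the ranking incrementally.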

 

(13)

Examples of use cases: Advanced Metering Infrastructure

Efficient temporal-spatial clustering for on-line identification of critical events (even when the communication is unreliable)

fig: sliding window over time

Grid-based Single-Linkage Clustering (G-SLC)

[FALP IEEE BigData2014]
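
The following sketch shows how a grid can speed up single-linkage clustering with a distance threshold. It is not the G-SLC algorithm of the cited paper: it clusters a static set of 2-D points (for instance, the events inside one window), and the function name `grid_single_linkage` and the parameter `eps` are assumptions for the example.

```python
import math
from collections import defaultdict
from itertools import product


def grid_single_linkage(points, eps):
    """Group 2-D points into clusters linked by chains of distance <= eps.

    Points are bucketed into square cells of side eps, so only points in
    the same or adjacent cells need to be compared.
    Returns a sorted list of clusters, each a list of point indices.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(math.floor(x / eps), math.floor(y / eps))].append(idx)

    for (cx, cy), members in cells.items():
        for dx, dy in product((-1, 0, 1), repeat=2):
            # .get avoids creating empty cells while iterating
            for j in cells.get((cx + dx, cy + dy), ()):
                for i in members:
                    if i < j and math.dist(points[i], points[j]) <= eps:
                        union(i, j)

    clusters = defaultdict(list)
    for idx in range(len(points)):
        clusters[find(idx)].append(idx)
    return sorted(clusters.values())
```

In a streaming setting the same idea would be applied to the contents of the sliding window, with cells updated as events arrive and expire; that incremental part is omitted here.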

(14)

Examples of use cases: Advanced Metering Infrastructure

Efficient Data Validation on-the-fly:

Noisy and lossy data: badly calibrated / faulty devices, lossy communication

E.g. scaling to 25 million meters / hourly readings on mainstream 6-core platform [GAP IEEE ISGT 2014]

differentially private aggregation [ongoing work]
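
On-the-fly validation of meter readings might look like the sketch below, which flags missing values, out-of-range values and gaps between consecutive readings while keeping only one timestamp per meter as state. The validation rules, the thresholds `min_kwh` and `max_kwh`, and the hourly `period` are hypothetical choices for illustration and are not taken from the cited work.

```python
def validate(readings, min_kwh=0.0, max_kwh=50.0, period=3600):
    """Annotate each reading with the validation issues found, in one pass.

    `readings` is an iterable of (meter_id, ts, value) tuples, assumed to
    be timestamp-ordered per meter; `value` may be None if it was lost.
    Yields (meter_id, ts, value, issues), where issues is a list of strings.
    """
    last_ts = {}  # meter_id -> timestamp of the previous reading
    for meter_id, ts, value in readings:
        issues = []
        if value is None:
            issues.append("missing value")
        elif not (min_kwh <= value <= max_kwh):
            issues.append("out of range")
        prev = last_ts.get(meter_id)
        if prev is not None and ts - prev > period:
            lost = (ts - prev) // period - 1
            issues.append(f"gap: {lost} reading(s) lost")
        last_ts[meter_id] = ts
        yield meter_id, ts, value, issues
```

Since the per-meter state is independent, such an operator could in principle be partitioned by meter identifier across cores, though how the cited work parallelizes validation is not stated on the slide.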

(15)

Roadmap

Cyberphysical systems, big data, streams and distributed systems: how they belong together

At our research team

Concluding discussion

(16)

Summarizing & Concluding

Advancing SoA BigDataStreamAnalysis (context IoT/CPS; relate with Cloud/"Fog" computing)

DS^2: DataStreaming*DataStructures, i.e. efficient multicore stream processing

Efficient algorithmic (in-memory) stream analysis

“… important to design algorithms that communicate as little as possible …” “… efficient processing and data analysis need to be unified …” [J. Dongarra, D. Reed, CACM 2015]

In our ongoing/near-future research:

Elastic parallel & distributed, in-network streaming (allowing e.g. embedded devices)

More concurrent data structures & multicore algos for efficient in-memory stream processing

Processing high-rate sensory data (e.g. LIDAR) &

 

(17)

Thank you

EXCESS

Marina Papatriantafilou

Contact: [email protected]

Co-authors in work mentioned here (from left to right):

(18)

At our research team (approx. 30 pers.): Cyberphysical systems research

Data - Internet of Things

Data processing: validation, monitoring, prediction

Security, privacy

Demand-response in energy

Resource management, load shaping

Microgrids demo/testbeds

Energy/efficient computation

streaming, parallel, multicore

energy efficiency: estimated savings 30-70%

Cooperative vehicular systems

Communication & coordination, data-driven situation awareness (new postdoc SAFER)

Virtual traffic lights/safer crossings

Gulliver demo/testbed

Systems Security

Distributed systems, IoT & stream

Parallel computing
