From Distributed Computing to Distributed Artificial Intelligence

(1)

From Distributed Computing to Distributed Artificial Intelligence

Dr. Christos Filippidis, NCSR Demokritos

(2)

Big Data and the Fourth Paradigm

The two dominant paradigms for scientific discovery:

● Theory

● Experiments

Large-scale computer simulations emerged as the third paradigm in the 20th century.

The fourth paradigm, which seeks to exploit information buried in massive datasets, has emerged as an essential complement to the three existing paradigms.

The complexity and challenge of the fourth paradigm arise from the increasing rate, heterogeneity, and volume of data generation.

● The Large Hadron Collider (LHC) experiments currently generate tens of petabytes of reduced data per year

● Observational and simulation data in the climate domain are expected to reach exabytes by 2021

(3)

LHC Data Challenge

Starting from this event (particle collision) …

You are looking for this “signature”…



[Figure labels: Data Collection, Data Storage, Data Processing]

● Selectivity: 1 in 10^13

  ● Like looking for 1 person in a thousand world populations!

  ● Or for a needle in 20 million haystacks!
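The two analogies can be checked with a few lines of arithmetic. This is only a rough sketch: the world population figure is an assumption that does not appear on the slide, and the haystack size is simply whatever the slide's numbers imply.

```python
SELECTIVITY = 1e13          # 1 interesting event in 10^13, from the slide
WORLD_POPULATION = 7.5e9    # assumption, not from the slide
HAYSTACKS = 20e6            # "20 million haystacks", from the slide

# How many world populations you would have to search to find 1 person.
world_populations = SELECTIVITY / WORLD_POPULATION

# How many straws each haystack must hold for the needle analogy to match.
straws_per_haystack = SELECTIVITY / HAYSTACKS

print(f"about {world_populations:,.0f} world populations")
print(f"about {straws_per_haystack:,.0f} straws per haystack")
```

Under that assumption the first figure comes out at a little over a thousand world populations, which is consistent with the slide's rounding.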

(4)

Experiments: CMS, ATLAS, LHCb

● ~15 petabytes / year

● ~10^10 events / year

● ~10^3 batch and interactive users

● ~20,000,000 CDs / year

Height comparison: Mt. Blanc (4.8 km), Concorde (15 km), CD stack with 1 year of LHC data (~20 km), balloon (30 km)
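The CD-stack comparison can be reproduced as a back-of-the-envelope calculation. The disc capacity and thickness below are assumed nominal values, not figures from the slide, so the result should only be read as an order-of-magnitude check.

```python
# Assumed constants (not from the slide): nominal CD-ROM capacity and thickness.
CD_CAPACITY_BYTES = 700e6   # 700 MB per disc
CD_THICKNESS_M = 1.2e-3     # 1.2 mm per disc, no case

lhc_bytes_per_year = 15e15  # ~15 petabytes / year, from the slide

# Number of discs needed to hold one year of data.
cds_per_year = lhc_bytes_per_year / CD_CAPACITY_BYTES

# Height of those discs stacked directly on top of each other.
stack_height_km = cds_per_year * CD_THICKNESS_M / 1000

print(f"{cds_per_year / 1e6:.1f} million CDs, stack of about {stack_height_km:.0f} km")
```

With these assumptions the estimate is roughly 21 million discs and a stack between 20 and 30 km high, in the same range as the slide's rounded values.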

(6)

Definition of Grid systems

● Collection of geographically distributed heterogeneous resources

“Most generalized, globalized form of distributed computing”

● “An infrastructure that enables flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions and resources”

(19)

Exascale Challenges

● Current petascale systems are unlikely to scale to exascale environments, due to the disparity among computational power, machine memory and I/O bandwidth

● Exascale simulations will not be able to write enough data out to permanent storage to ensure a reliable analysis

● Current Grid infrastructures are not user-friendly and are far from efficient for small groups and individuals

● Grid infrastructures, when implemented by HEP VOs, tend to be centralized from the data point of view

(20)

IKAROS Platform

[Diagram labels: Android .apk; Data/Metadata-Collector; Ikaros-EG plugin; “job” creation; Content provider; mobile devices; Wi-Fi, 3G; mobile-Grid]

(21)

Elastic Transfer (eT)

● Create your personal storage cloud

● Transfer files directly from your workstation to another PC

● Third-party data transfer

● Flexible data and storage sharing

● You are on the road, behind fifteen firewalls, and want to share a web application you're developing locally, or just quickly share a set of files with someone (reverse HTTP)

(22)

Nice! So, now can I...

● Discover whether corruption in politics is a location-based issue?

● Check what is the best route to a house by the sea, with low rent?

● Find the ideal husband/wife?

● Determine how to improve my

(23)

Well, you kind of can...

If you

● can read through petabytes of information

● can determine what is useful and what is not

● can contact 30 different organizations hosting the data

● have experts combining the data

● can visualize the data in a meaningful way

(25)

Bits and pieces

● If you had individual people producing simple statements

  ● People need food

  ● Souvlaki is food

  ● Souvlaki contains meat

● Decipherable by machines

  ● <people, need, food>

  ● <souvlaki, is, food>

  ● <souvlaki, contains, meat>

● Could computers combine knowledge to be “intelligent”?

  ● <?, need, meat>: Who needs meat?
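To make the idea concrete, the three statements can be stored as (subject, predicate, object) tuples and queried with a wildcard. The sketch below is a toy, not a real reasoner: the function names and the single inference rule are hypothetical, chosen only so that the slide's question gets an answer, and the rule is not sound in general.

```python
# Toy triple store: each statement is a (subject, predicate, object) tuple.
triples = {
    ("people", "need", "food"),
    ("souvlaki", "is", "food"),
    ("souvlaki", "contains", "meat"),
}

def match(pattern, store):
    """Return all triples matching a pattern; None plays the role of '?'."""
    return [t for t in store
            if all(p is None or p == v for p, v in zip(pattern, t))]

def infer_needs(store):
    """Hypothetical rule: X needs C, Y is C, Y contains Z  =>  X needs Z."""
    derived = set(store)
    for (x, _, c) in match((None, "need", None), store):
        for (y, _, _) in match((None, "is", c), store):
            for (_, _, z) in match((y, "contains", None), store):
                derived.add((x, "need", z))
    return derived

# The raw statements alone cannot answer <?, need, meat> ...
direct_answers = match((None, "need", "meat"), triples)

# ... but combining them through the rule can.
knowledge = infer_needs(triples)
who_needs_meat = sorted(s for (s, _, _) in match((None, "need", "meat"), knowledge))
print(who_needs_meat)  # ['people']
```

The point of the example is that no single statement mentions people and meat together; the answer only appears once the statements are combined.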

(26)

Distributed Artificial Intelligence to the rescue!

(28)

How does it work?

● You use MACHINES (agents will do fine...)!

● You query LOTS of resources...

● With BILLIONS of small statements

● You REASON upon them

● You provide answers in realistic time

(29)

Challenges

● Data providers speak different languages

● Data providers can go offline

● Even knowing who to ask is a problem

● Responding in time can be challenging
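Two of these challenges, providers going offline and answering in time, can be illustrated with a small sketch that asks several providers in parallel and keeps whatever arrives before a deadline. The providers here are hypothetical in-memory functions standing in for remote services, and the function names are assumptions made for illustration only.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

# Hypothetical in-memory "providers"; real ones would be remote endpoints.
def fast_provider(query):
    return [("people", "need", "food")]

def slow_provider(query):
    time.sleep(0.5)  # simulates a provider that misses the deadline
    return [("souvlaki", "is", "food")]

def offline_provider(query):
    raise ConnectionError("provider is offline")

def federated_query(query, providers, deadline_s):
    """Ask every provider in parallel; keep only answers that arrive in time."""
    pool = ThreadPoolExecutor(max_workers=len(providers))
    futures = {pool.submit(fn, query): name for name, fn in providers.items()}
    done, _ = wait(futures, timeout=deadline_s)
    results, skipped = [], []
    for fut, name in futures.items():
        if fut in done and fut.exception() is None:
            results.extend(fut.result())
        else:
            skipped.append(name)  # too slow or failed
    pool.shutdown(wait=False)     # do not block on stragglers
    return results, sorted(skipped)

providers = {
    "fast": fast_provider,
    "slow": slow_provider,
    "offline": offline_provider,
}
results, skipped = federated_query("<?, need, ?>", providers, deadline_s=0.1)
print(results, skipped)
```

The design choice is to return a partial answer plus the list of providers that were skipped, rather than failing the whole query when one source misbehaves.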

(30)

SemaGrow: Distributed, Heterogeneous, Semantic Query Processing

● Distributed queries over SPARQL endpoints

● On-the-fly mapping across data provider languages

● Adaptive to problematic data providers

● Allows complex queries
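The first two points, deciding which endpoint to ask and translating between provider vocabularies, can be sketched as a lookup over a catalogue. This is not SemaGrow's actual algorithm or API: the endpoint URLs, prefixes, catalogue and mapping table below are all hypothetical and exist only to show the shape of the problem.

```python
# Hypothetical catalogue: which predicates each endpoint can answer.
CATALOGUE = {
    "http://example.org/agri/sparql": {"ex:contains", "ex:isA"},
    "http://example.org/health/sparql": {"hx:requires"},
}

# Hypothetical vocabulary mapping from the query's terms to provider terms.
MAPPINGS = {"ex:needs": ["hx:requires"]}

def select_sources(predicate):
    """Return (endpoint, predicate-as-the-provider-spells-it) pairs."""
    candidates = [predicate] + MAPPINGS.get(predicate, [])
    return [
        (endpoint, term)
        for endpoint, offered in sorted(CATALOGUE.items())
        for term in candidates
        if term in offered
    ]

def plan(query_patterns):
    """Map each (s, p, o) pattern to the endpoints that can evaluate it."""
    return {pattern: select_sources(pattern[1]) for pattern in query_patterns}

query = [("?who", "ex:needs", "?what"), ("?what", "ex:contains", "ex:meat")]
query_plan = plan(query)
for pattern, sources in query_plan.items():
    print(pattern, "->", sources)
```

Here the first pattern is routed to the health endpoint under its own term `hx:requires`, while the second goes to the agriculture endpoint unchanged; a real system would also have to join the partial results.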

(31)

Summary

● Distributed computing allows

  ● Generating amazing amounts of data

  ● Handling amazing amounts of data

  ● Computational availability and fail-over

  ● On-demand computation power

  ● Security

● Distributed artificial intelligence allows

  ● Asking complex questions over data

  ● Combining data

  ● Generating knowledge

(32)

From Distributed Computing to Distributed Artificial Intelligence

Dr. Christos Filippidis, NCSR Demokritos

Dr. George Giannakopoulos, NCSR Demokritos