From Distributed Computing to Distributed
Artificial Intelligence
Dr. Christos Filippidis, NCSR Demokritos
Big Data and the Fourth Paradigm
The two dominant paradigms for scientific discovery:
● Theory
● Experiments
Large-scale computer simulations emerged as the third paradigm in the 20th century.
The fourth paradigm, which seeks to exploit information buried in massive datasets, has emerged as an essential complement to the three existing paradigms.
The complexity and challenge of the fourth paradigm arise from the increasing rate, heterogeneity, and volume of data generation.
● The Large Hadron Collider (LHC) currently generates tens of petabytes of reduced data per year
● Observational and simulation data in the climate domain are expected to reach exabytes by 2021
LHC Data Challenge
Starting from this event (particle collision) …
You are looking for this “signature”…
[Figure: Data Collection → Data Storage → Data Processing]
● Selectivity: 1 in 10^13. Like looking for 1 person in a thousand world populations!
Or for a needle in 20 million haystacks!
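The collection → storage → processing chain with extreme selectivity can be pictured as a streaming filter: only events matching the wanted "signature" survive to storage and analysis. This is a toy sketch under stated assumptions: the `Event` fields, the `matches_signature` predicate and its thresholds are hypothetical and unrelated to the real CMS, ATLAS or LHCb trigger systems.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass(frozen=True)
class Event:
    # Hypothetical event record: an ID plus two summary quantities.
    event_id: int
    n_muons: int
    energy_gev: float


def collect(n: int) -> Iterator[Event]:
    """Data collection: emit a deterministic stream of toy events."""
    for i in range(n):
        # Deterministic pseudo-variation keeps the example reproducible.
        yield Event(event_id=i, n_muons=i % 5, energy_gev=float((i * 37) % 1000))


def matches_signature(ev: Event) -> bool:
    """Hypothetical 'signature': at least 4 muons and high energy."""
    return ev.n_muons >= 4 and ev.energy_gev > 900.0


def trigger(events: Iterable[Event]) -> Iterator[Event]:
    """Selection step: lazily keep only events matching the signature."""
    return (ev for ev in events if matches_signature(ev))


def store(events: Iterable[Event]) -> List[Event]:
    """Data storage: materialise the (small) selected sample."""
    return list(events)


selected = store(trigger(collect(100_000)))
print(f"kept {len(selected)} of 100000 events "
      f"(selectivity ~ 1 in {100_000 // max(len(selected), 1)})")
```

With these toy thresholds 2000 events survive, about 1 in 50; the generators mean the full stream is never held in memory, only the selected sample is.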
CMS, ATLAS, LHCb:
● ~15 PetaBytes / year
● ~10^10 events / year
● ~10^3 batch and interactive users
● ~20,000,000 CDs / year
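The CD figures can be checked with a back-of-envelope calculation. The inputs beyond the slide's 15 PB/year are assumptions: decimal units, roughly 700 MB per CD and 1.2 mm per bare disc.

```python
# Back-of-envelope check of the LHC storage figures.
# Assumptions (not from the slides): decimal units, ~700 MB per CD,
# ~1.2 mm thickness per bare disc.
PETABYTE = 10**15          # bytes, decimal definition
CD_CAPACITY = 700 * 10**6  # bytes per CD (assumed)
CD_THICKNESS_M = 1.2e-3    # metres per disc (assumed)

data_per_year = 15 * PETABYTE
cds_per_year = data_per_year / CD_CAPACITY
stack_height_km = cds_per_year * CD_THICKNESS_M / 1000

print(f"CDs per year: {cds_per_year:,.0f}")
print(f"Stack height: {stack_height_km:.1f} km")
```

This gives about 21.4 million CDs and a stack of roughly 25.7 km, the same order of magnitude as the rounded ~20 million CDs and ~20 km quoted on the slide.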
[Figure: a CD stack holding 1 year of LHC data (~20 km) compared with Mt. Blanc (4.8 km), Concorde (15 km) and a balloon (30 km)]
Definition of Grid systems
● Collection of geographically distributed heterogeneous resources
● "Most generalized, globalized form of distributed computing"
● "An infrastructure that enables flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions and resources"
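To make "coordinated resource sharing" concrete, here is a minimal matchmaking sketch: heterogeneous sites advertise free cores and the virtual organizations (VOs) they accept, and a broker places each job on an authorized site with enough capacity. The `Site` and `Job` types, the site and VO names, and the "most free cores first" policy are hypothetical; real Grid middleware adds far richer matchmaking, security and accounting.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Site:
    name: str
    free_cores: int
    accepted_vos: Set[str]  # VOs this site's owners agree to serve


@dataclass
class Job:
    job_id: str
    vo: str      # virtual organization the submitter belongs to
    cores: int


def match(job: Job, sites: List[Site]) -> Optional[Site]:
    """Pick an authorized site with enough free cores (most free first)."""
    candidates = [s for s in sites
                  if job.vo in s.accepted_vos and s.free_cores >= job.cores]
    return max(candidates, key=lambda s: s.free_cores) if candidates else None


def schedule(jobs: List[Job], sites: List[Site]) -> Dict[str, Optional[str]]:
    """Place jobs in order, reserving cores on each chosen site."""
    placement: Dict[str, Optional[str]] = {}
    for job in jobs:
        site = match(job, sites)
        if site is not None:
            site.free_cores -= job.cores  # reserve capacity
        placement[job.job_id] = site.name if site else None
    return placement


sites = [
    Site("athens", free_cores=64, accepted_vos={"cms", "atlas"}),
    Site("geneva", free_cores=128, accepted_vos={"cms"}),
    Site("lab-pc", free_cores=16, accepted_vos={"biomed"}),
]
jobs = [
    Job("j1", vo="cms", cores=100),
    Job("j2", vo="cms", cores=40),
    Job("j3", vo="atlas", cores=32),
    Job("j4", vo="biomed", cores=8),
]
placement = schedule(jobs, sites)
print(placement)
```

Job `j3` stays unplaced: the only site accepting its VO has too few cores left, which shows both the authorization and the capacity sides of sharing.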
Exascale Challenges
● Current petascale systems are unlikely to scale to exascale environments, due to the disparity among computational power, machine memory and I/O bandwidth
● Exascale simulations will not be able to write enough data out to permanent storage to ensure reliable analysis
● Current Grid infrastructures are not user-friendly and are far from efficient for small groups and individuals
● Grid infrastructures, when implemented by high-energy physics (HEP) virtual organizations (VOs), tend to be centralized from the data point of view
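The I/O point can be made quantitative with a simple ratio: when a simulation produces data faster than storage can absorb it, only a fraction of the output ever reaches permanent storage. The rates below are purely hypothetical placeholders chosen to show the calculation, not measurements of any real system.

```python
def persisted_fraction(generation_rate_tb_s: float,
                       io_bandwidth_tb_s: float) -> float:
    """Fraction of generated data that can be written to permanent storage.

    If I/O bandwidth matches or exceeds the generation rate, everything
    fits (1.0); otherwise only bandwidth / generation rate can be kept.
    """
    if generation_rate_tb_s <= 0:
        raise ValueError("generation rate must be positive")
    return min(1.0, io_bandwidth_tb_s / generation_rate_tb_s)


# Hypothetical scenarios: compute grows much faster than I/O bandwidth.
balanced = persisted_fraction(generation_rate_tb_s=1.0, io_bandwidth_tb_s=1.0)
io_bound = persisted_fraction(generation_rate_tb_s=1000.0, io_bandwidth_tb_s=10.0)
print(f"balanced: {balanced:.0%} persisted")
print(f"io-bound: {io_bound:.0%} persisted")
```

In the I/O-bound scenario only 1% of the output can be kept, which is why in-situ analysis, rather than post-hoc analysis of stored data, becomes attractive at exascale.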
IKAROS Platform
[Figure: IKAROS mobile-Grid: Android apps (.apk) on mobile devices over Wi-Fi/3G, a Data/Metadata-Collector Ikaros-EG plugin, "job" creation and a content provider]
Elastic Transfer (eT)
● Create your personal storage cloud
● Transfer your files directly from your workstation to another PC
● Third-party data transfer
● Flexible data and storage sharing
● You are on the road, behind fifteen firewalls, and want to share a web application you are developing locally, or just quickly share a set of files with someone (reverse HTTP)
Nice! So, now can I...
● Discover whether corruption in politics is a location-based issue?
● Find the best route to a low-rent house by the sea?
● Find the ideal husband/wife?
● Determine how to improve my
Well, you kind of can...
If you can
● read through petabytes of information
● determine what is useful and what is not
● contact 30 different organizations hosting the data
● have experts combine the data
● visualize it in a meaningful way
Bits and pieces
● Suppose individual people produced simple statements:
● People need food
● Souvlaki is food
● Souvlaki contains meat
● Decipherable by machines:
● <people, need, food>
● <souvlaki, is, food>
● <souvlaki, contains, meat>
● Could computers combine knowledge to be "intelligent"?
● <?, need, meat>: Who needs meat?
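The triples can be stored and queried with wildcard patterns, but a direct lookup of <?, need, meat> finds nothing, because no single statement says it. Combining statements needs a rule. This sketch uses one hypothetical, deliberately naive rule (if X needs Y, Z is a Y, and Z contains W, then X needs W) applied until nothing new appears; it only illustrates chaining and is not the method of any particular reasoner.

```python
from typing import Iterable, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]

facts: Set[Triple] = {
    ("people", "need", "food"),
    ("souvlaki", "is", "food"),
    ("souvlaki", "contains", "meat"),
}


def match(pattern: Pattern, triples: Iterable[Triple]) -> Set[Triple]:
    """Return triples matching the pattern; None plays the role of '?'."""
    return {t for t in triples
            if all(p is None or p == v for p, v in zip(pattern, t))}


def apply_rule(triples: Set[Triple]) -> Set[Triple]:
    """Hypothetical rule: X need Y, Z is Y, Z contains W  =>  X need W."""
    derived: Set[Triple] = set()
    for x, _, y in match((None, "need", None), triples):
        for z, _, _ in match((None, "is", y), triples):
            for _, _, w in match((z, "contains", None), triples):
                derived.add((x, "need", w))
    return derived


def saturate(triples: Set[Triple]) -> Set[Triple]:
    """Apply the rule repeatedly until a fixpoint is reached."""
    known = set(triples)
    while True:
        new = apply_rule(known) - known
        if not new:
            return known
        known |= new


print(match((None, "need", "meat"), facts))            # direct lookup
print(match((None, "need", "meat"), saturate(facts)))  # after reasoning
```

The direct lookup prints an empty set; after saturation the same pattern returns `{('people', 'need', 'meat')}`.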
Distributed Artificial Intelligence to the rescue!
How does it work?
● You use MACHINES (agents will do fine...)!
● You query LOTS of resources...
● With BILLIONS of small statements
● You REASON upon them
● You provide answers in realistic time
Challenges
● Data providers speak different languages
● Data providers can go offline
● Even knowing who to ask is a problem
● Responding in time can be challenging
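These challenges can be sketched with a small federated lookup: a hypothetical index records which predicates each provider uses, only those providers are asked, requests run concurrently under a deadline, and offline or slow providers are skipped instead of failing the whole query. Provider names, latencies and the `Provider` class are invented for illustration; a real system would send SPARQL over HTTP rather than call Python methods. `cancel_futures` requires Python 3.9 or later.

```python
import concurrent.futures as cf
import time
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Provider:
    name: str
    triples: Set[Triple]
    latency_s: float = 0.0
    online: bool = True

    def query(self, predicate: str) -> Set[Triple]:
        time.sleep(self.latency_s)  # simulated network delay
        if not self.online:
            raise ConnectionError(f"{self.name} is offline")
        return {t for t in self.triples if t[1] == predicate}


def build_index(providers: List[Provider]) -> Dict[str, List[Provider]]:
    """Source selection: map each predicate to the providers that use it."""
    index: Dict[str, List[Provider]] = {}
    for p in providers:
        for pred in {t[1] for t in p.triples}:
            index.setdefault(pred, []).append(p)
    return index


def federated_query(predicate: str, index: Dict[str, List[Provider]],
                    deadline_s: float) -> Tuple[Set[Triple], List[str]]:
    """Query relevant providers in parallel; skip failed or late ones."""
    relevant = index.get(predicate, [])
    results: Set[Triple] = set()
    skipped: List[str] = []
    pool = cf.ThreadPoolExecutor(max_workers=max(len(relevant), 1))
    futures = {pool.submit(p.query, predicate): p.name for p in relevant}
    done, not_done = cf.wait(futures, timeout=deadline_s)
    for fut in done:
        try:
            results |= fut.result()
        except ConnectionError:
            skipped.append(futures[fut])      # provider went offline
    skipped.extend(futures[f] for f in not_done)  # missed the deadline
    pool.shutdown(wait=False, cancel_futures=True)
    return results, sorted(skipped)


providers = [
    Provider("cooking-db", {("souvlaki", "is", "food"),
                            ("souvlaki", "contains", "meat")}),
    Provider("census-db", {("people", "need", "food")}),
    Provider("mirror-db", {("gyros", "is", "food")}, online=False),
    Provider("slow-db", {("pita", "is", "food")}, latency_s=0.5),
]
index = build_index(providers)
answers, skipped = federated_query("is", index, deadline_s=0.2)
print(sorted(answers))
print("skipped:", skipped)
```

`census-db` is never contacted because the index shows it cannot answer an `is` query; the answer is partial but arrives within the deadline.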
SemaGrow: Distributed, Heterogeneous, Semantic Query Processing
● Distributed queries over SPARQL endpoints
● On-the-fly mapping across data provider languages
● Adaptive to problematic data providers
● Allows complex queries
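On-the-fly mapping across provider vocabularies can be illustrated with a lookup table that rewrites each provider's predicate names into one shared vocabulary before results are merged. This is a generic sketch, not SemaGrow's implementation; the provider names, predicate names and mapping table are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical per-provider mappings into a shared vocabulary.
MAPPINGS: Dict[str, Dict[str, str]] = {
    "greek-recipes": {"periexei": "contains", "einai": "is"},
    "food-ontology": {"hasIngredient": "contains", "subClassOf": "is"},
}


def to_shared(provider: str, triples: Iterable[Triple]) -> Set[Triple]:
    """Rewrite predicates into the shared vocabulary; unknown ones pass through."""
    mapping = MAPPINGS.get(provider, {})
    return {(s, mapping.get(p, p), o) for s, p, o in triples}


raw: Dict[str, List[Triple]] = {
    "greek-recipes": [("souvlaki", "periexei", "meat")],
    "food-ontology": [("gyros", "hasIngredient", "meat"),
                      ("gyros", "subClassOf", "food")],
}

merged: Set[Triple] = set()
for provider, triples in raw.items():
    merged |= to_shared(provider, triples)

# One query in the shared vocabulary now spans both providers.
contains = sorted(t for t in merged if t[1] == "contains")
print(contains)
```

Mapping at query time, instead of forcing providers to adopt one schema, lets each source keep its own vocabulary.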
Summary
● Distributed computing allows
● Generating amazing amounts of data
● Handling amazing amounts of data
● Computational availability and fail-over
● On-demand computation power
● Security
● Distributed artificial intelligence allows
● Asking complex questions over data
● Combining data
● Generating knowledge
Dr. Christos Filippidis, NCSR Demokritos
Dr. George Giannakopoulos, NCSR Demokritos