Big Data
© P. DELORT
Mines ParisTech, Centre de Recherche en Informatique.
Pierre Delort
President of the French National Association of CIOs (Association Nationale des DSI)
OECD
ICCP Technology Foresight Forum
What you have typed
Oscar nomination
The 45 queries and their topics
US, Mid-Atlantic region, 2007-08 season
Google Flu Trends (black)
CDC (red)
Results
United States: spread of the virus
Flu estimate
Google Flu Trends
CDC’s Sentinel
Induction: "the glory of science and the scandal of philosophy"
C. D. Broad
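The comparison between the query-based estimate and the CDC surveillance curve can be summarised by a correlation coefficient. The sketch below is only an illustration of that idea: the function name `pearson` and both series are assumptions, and the numbers are made up, not real Google Flu Trends or CDC data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up weekly values, for illustration only (NOT real data).
query_estimate = [1.0, 1.4, 2.1, 3.0, 2.6, 1.8]  # hypothetical query-based estimate
surveillance = [1.1, 1.3, 2.0, 3.2, 2.7, 1.7]    # hypothetical surveillance figures

r = pearson(query_estimate, surveillance)
print(f"correlation = {r:.3f}")
```

A high correlation on past seasons is an inductive argument: it says the two curves have moved together so far, not that they must continue to do so.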
NGS & Moore’s law
Ion proton sequencer
Bottleneck = Data Analysis
Cost per megabase
In 10 years, cost divided by 10,000
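To put the "divided by 10,000 in 10 years" figure next to Moore's law, the short calculation below converts it into a halving time. The 24-month doubling period used for Moore's law is an assumption (commonly quoted values range from 18 to 24 months); the variable names are illustrative.

```python
from math import log2

years = 10
cost_reduction = 10_000  # figure from the slide: cost divided by 10,000

# Number of successive halvings needed to reach a 10,000-fold reduction.
halvings = log2(cost_reduction)
months_per_halving = years * 12 / halvings

# Assumption: Moore's law taken as one doubling every 24 months.
moore_doubling_months = 24
moore_gain = 2 ** (years * 12 / moore_doubling_months)

print(f"sequencing cost halves about every {months_per_halving:.1f} months")
print(f"Moore's law over {years} years (assumed 24-month doubling): x{moore_gain:.0f}")
```

Under these assumptions, sequencing cost fell far faster than a Moore's-law pace would predict, which is consistent with data analysis becoming the bottleneck.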
NGS: impact on skills
A large (3x to 10x) increase in the need for:
• (bio)computer scientists;
• (bio)statisticians.
Source: Sboner A, et al.: The real cost of sequencing: higher than you think! Genome Biology 2011, 12:125.
[Figure: Genome's sequencing cost. Share of sequencing cost (0% to 100%) in 2000 (before NGS), 2010 (NGS) and 2020 (est.), split into: sampling & experiment design; sequencing; data management; data reduction & synthesis; downstream analysis.]
If you are looking for a career where your
services will be in high demand, you should
find something where you provide a scarce,
complementary service to something that is
getting ubiquitous and cheap. So what’s
getting ubiquitous and cheap? Data. And
what is complementary to data? Analysis. –
Prof. Hal Varian, UC Berkeley, Chief
Economist at Google, 2008.
The last ten years
This leads to three technologies that I believe will drive the future of Big Data computing:
• In-memory;
• SSD;
• MPP.
[Diagram: drivers over the last ten years]
Sharp decrease in solid-state device cost: DRAM ÷ 10^2; NAND flash ÷ 10^3
64-bit addressable memory (DRAM): × 4.2 × 10^9
Increase in computing power
Innovation in DB & software
Other factors shown on the diagram: ÷ 50 to ÷ 10; × 100
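The factor of roughly 4.2 × 10^9 for addressable memory is consistent with a move from 32-bit to 64-bit addressing; that reading is an assumption, since the slide does not state the baseline. A quick check of the arithmetic:

```python
# Assumption: the baseline is a 32-bit address space, the target a 64-bit one.
addr_32 = 2 ** 32  # bytes addressable with 32-bit pointers
addr_64 = 2 ** 64  # bytes addressable with 64-bit pointers

ratio = addr_64 // addr_32  # equals 2**32

print(f"32-bit address space: {addr_32 // 2**30} GiB")
print(f"64-bit address space: {addr_64 // 2**60} EiB")
print(f"ratio: {ratio:,}")
```

The exact ratio is 2^32 = 4,294,967,296. Real machines expose far fewer physical address bits than 64, so this is a theoretical ceiling rather than installable DRAM.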
Three technologies
• In-memory: very fast (I/O) and saves on Opex (DB tuning, energy…), but non-persistent, requires reformatting of the software, expensive and (today) of limited scalability;
• SSD (flash/NAND): fast (I/O), no reformatting required, persistent, scalable, extensible, cheap; saves on both Capex and Opex (energy);
• MPP (MapReduce): very scalable (n × 10^5), fit for "low-density data" (key/value), but requires new programming skills and Opex (energy).
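The key/value style that MapReduce imposes, and the "new programming skills" it calls for, can be sketched in a few lines. This is a single-process illustration of the map, shuffle and reduce steps, not a distributed implementation; all function names and the sample input are hypothetical.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record and yield its (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group values by key, as the framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Apply the reducer to each key and its list of values."""
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    """Emit (word, 1) for every word in a line of text."""
    for word in line.lower().split():
        yield word, 1


def count_reducer(key, values):
    """Sum the counts emitted for one word."""
    return sum(values)


# Hypothetical input, for illustration only.
lines = ["big data big analysis", "data is cheap"]
counts = reduce_phase(shuffle(map_phase(lines, word_mapper)), count_reducer)
print(counts)
```

Because each mapper call sees one record and each reducer call sees one key, the work can be spread over many nodes, which is where the scalability comes from.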
[Figure: cost versus bandwidth (MB/s), log scale, for tape, disk, flash, DRAM and CPU cache. Source: Objective Analysis.]
Inspiration
EXPERIMENTATION (2nd paradigm)
Publication
Protection
Valorisation
Conversion
Selection
Diffusion
(BIG) DATA MINING (4th paradigm)
Research
Firm / Innovation
Data
Hypothesis
Limiting factor
Validation
Publication
Protection
Valorisation
Exploration
Explanation
Idea generation
Impacts on Research & Innovation
Induction: "the glory of science and the scandal of philosophy"
Public data: a right for citizens as well as scientists?