CISE Overview and Big Data
Suzi Iacono
CISE Directorate
National Science Foundation
SI^2 Workshop
January 17, 2013
Economic Impact of IT
•
Growth of the IT industry, coupled with
productivity gains across the entire economy,
has had an enormous impact.
•
IT industries accounted for 25% of US
economic growth since 1995.
–
In 2010, IT industries grew 16% and
contributed 5% to overall US GDP
•
Use and production of IT accounted for ~2/3
of the post-1995 growth in labor productivity.
•
The IT sector generates jobs: IT jobs grew
125x faster than employment as a whole
between 2001 and 2011, and in 2011 IT
workers earned 74% more than the average
worker.
•
IT diversifies regional economies to include
idea-driven “creative” industries.
Sources: NRC (2009). Assessing the Impacts of Changes in the IT R&D Ecosystem;
NRC (2012). Continuing Innovation in Information Technology;
ITIF (2012). Looking for Jobs? Look to IT in 2010 and Beyond.
CISE Directorate
30%
Computing and Communication Foundations (CCF)
Susanne Hambrusch
Algorithmic Foundations
Communication and Information Foundations
Software and Hardware Foundations
Computer and Network Systems (CNS)
Keith Marzullo
Computer Systems Research
Networking Technology and Systems
Information and Intelligent Systems (IIS)
Howard Wactlar
Human-Centered Computing
Information Integration and Informatics
Robust Intelligence
Office of Cyberinfrastructure (OCI)
Alan Blatecky
70%
CISE Cross-Cutting Programs
CISE Core Programs
Cross-Foundation Programs
Who is the CISE community?
PI and Co-PI Departments for FY 2011 Awards Funded by CISE
Computer Science & Information Science & Computer Engineering (CISE), 65%
Engineering (excluding Computer Engineering), 11%
Interdisciplinary Centers, 3%
Sciences & Humanities, 21%
Research Frontiers
Data Explosion
Smart Systems: Sensing, Analysis and Decision
Expanding the Limits of Computation
Image Credit: CCC and SIGACT CATCS
Advances in information
technologies are transforming the
fabric of our society and data
represents a transformative new
currency for science, engineering,
Where do the data come from?
The Big Data Landscape I: Big Science
•
Science gathers data at an ever-increasing rate across all scales
and complexities of natural phenomena
•
Sloan Digital Sky Survey in 2000 collected more data in its first
few weeks than had been amassed in the entire history of
astronomy
–
Within a decade, over 140 terabytes of information
collected
•
Large Hadron Collider generates scores of petabytes a year
•
The proposed Large Synoptic Survey Telescope (3.3 gigapixel
digital camera) will generate 40 terabytes of data nightly
•
By 2015, the world will generate the equivalent of
Source: Sajal Das, Keith Marzullo
The Big Data Landscape II: Smart Sensing, Reasoning and
Decision-making
[Figure: People-Centric Sensing (Personal Sensing, Public Sensing, Social Sensing)]
[Figure: agent loop: Percepts (sensors), Agent (Reasoning), Actions (controllers)]
[Figure: stages Sense, Identify, Assess, Intervene, Evaluate; application areas Smart Health Care, Emergency Response, Environment Sensing; enabling fields Pervasive Computing and Social Informatics]
Situation Awareness: Humans as sensors feed multi-modal data streams
The Big Data Landscape III:
New Paradigms for Communications
Remarkable Pace of Innovation
[Figure: timeline from 1988 to today; labels: VOIP, blogs, mobile, social networks, video]
The Big Data Landscape IV: The Long
Tail of Science
•
Hundreds of thousands of scientists and engineers work individually or
in small, distributed, disconnected groups – all generating data that
collectively represent an enormous, largely untapped scientific
resource
–
From running simulations, experiments, etc.
•
Making heterogeneous data across many areas of science more
homogeneous could lead to breakthroughs across all areas of
science and engineering
•
Estimated 40 exabytes of unique new information generated worldwide
in 2010
•
However, only 5% of the information created is “structured,” that is, in a
standard format of words or numbers; the rest is unstructured text,
voice, images, etc.
How Big is Big?
•
“Big Data”: “Datasets whose size
is beyond the ability of typical
database software tools to
capture, store, manage, and
analyze”
- McKinsey Global Institute, Big data: The
next frontier for innovation, competition, and
productivity, May 2011.
…Not Just Volumes of Data
•
The science of big data is not just about
volumes and velocity of data, but also
–
Heterogeneity and diversity
•
Levels of granularity
•
Media formats
•
Scientific disciplines
–
Complexity
•
Uncertainty
•
Incompleteness
•
Representation types
Why is Big Data Important?
•
Critical to transforming how science is done and to
accelerating the pace of discovery in almost every
science and engineering discipline
•
Transformative implications for commerce and economy
•
Potential for addressing some of society’s most pressing
challenges
Paradigm Shift: from Hypothesis-driven
to Data-driven Discovery
http://www.sciencemag.org/site/special/data/
The Economist, The data deluge and how to handle it: A 14-page
special report (Feb 25, 2010).
http://www.economist.com/node/15579717
The Fourth Paradigm: Data-Intensive Scientific Discovery (2009,
Microsoft Corporation).
http://research.microsoft.com/en-us/collaboration/fourthparadigm/
The Age of Data:
From Data to Knowledge to Action
•
Data-driven discovery is
revolutionizing scientific exploration
and engineering innovations
•
Automatic extraction of new knowledge about the physical,
biological and cyber world continues
to accelerate
•
Multi-cores, concurrent and parallel
algorithms, virtualization and
advanced server architectures will
enable data mining and machine
learning, and discovery and
Potential for Transformational Science &
Engineering:
From Data to Knowledge to Action
•
Integrate discipline-specific (or media
format…) data and examine it for
relationships
–
Disaster informatics
•
3D toxic fume images
•
Simulations of gas spread
•
Maps of census
concentrations
•
First responder on-the-ground findings
Examples of Research Challenges
•
More data are being collected than we can store
–
Analyze the data as it becomes available
–
Decide what to archive and what to discard
•
Many data sets are too large to download
–
Analyze the data wherever it resides
•
Many data sets are too poorly organized to be usable
–
Better organize and retrieve data
•
Many data sets are heterogeneous in type, structure, semantics,
organization, granularity, accessibility …
–
Integrate and customize access to federate data
•
Utility of data is limited by our ability to interpret and use it
–
Extract and visualize actionable knowledge
–
Evaluate results
•
Large and linked datasets may be exploited to identify individuals
–
Design management and analysis with built-in privacy
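As an illustration of the first challenge (analyze data as it arrives, then decide what to archive and what to discard), here is a minimal Python sketch. It is not from the talk: the names `RunningStats` and `triage`, the `z_threshold` and `warmup` parameters, and the z-score rule for deciding what to keep are all assumptions chosen for the example. It keeps a constant-memory running summary (Welford's online mean and variance) and archives only readings that deviate strongly from what has been seen so far.

```python
import math


class RunningStats:
    """Welford's online algorithm: mean and variance in one pass, O(1) memory."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        # Sample standard deviation; defined as 0.0 until two values are seen.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def triage(stream, z_threshold=3.0, warmup=5):
    """Summarize every reading, but archive only the unusual ones.

    Hypothetical policy: after `warmup` readings, a value is archived when it
    lies more than `z_threshold` standard deviations from the running mean.
    """
    stats = RunningStats()
    archived = []
    for x in stream:
        sd = stats.std()
        if stats.n >= warmup and sd > 0 and abs(x - stats.mean) > z_threshold * sd:
            archived.append(x)
        stats.update(x)  # every reading still contributes to the summary
    return stats, archived


if __name__ == "__main__":
    readings = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 50.0, 10.1]
    stats, archived = triage(readings)
    print("readings seen:", stats.n)
    print("archived:", archived)
```

The point of the design is that the raw stream never has to be stored: the summary is updated in place and only flagged readings are retained. A real pipeline would need a domain-specific retention rule in place of the simple threshold assumed here.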
A National Imperative
Source: PCAST (December 2010), “Report to the President and Congress: Designing a Digital Future…”, a
periodic congressionally mandated review of the Federal Networking and Information Technology Research and
Development (NITRD) Program.
•
PCAST calls on the Federal government
to increase R&D investments for
collecting, storing, preserving,
managing, analyzing, and sharing the
increasing quantities of data.
•
Furthermore, PCAST observed that the
ability to gain new insights … to move
from data to knowledge to action has
tremendous potential to transform all
areas of national priority.
Administration’s Big Data Research and
Development Initiative
•
Big Data Senior Steering Group – chartered in spring 2011
under the Networking and Information Technology R&D
(NITRD) Program
–
Members from DARPA, DOD OSD, DHS, DOE-Science, HHS, NARA,
NASA, NIST, NOAA, NSA, OFR, USGS, etc.
–
Co-chaired by NSF (and NIH)
–
Initial charge was to come up
with a plan, a strategy
Image Credit: Fuqing Zhang and Yonghui Weng, Pennsylvania State University;
Frank Marks, NOAA; Gregory P. Johnson, Romy Schneider, John Cazes, Karl
Schulz, Bill Barth, The University of Texas at Austin
Big Data Membership
Last name | First name | Agency
Biven | Laura | DOE Science
Blatecky | Alan | NSF (National Science Foundation)
Collica | Leslie | NIST (National Institute of Standards and Technology)
Deift | Abby | NSF (National Science Foundation)
Downing | Gregory | HHS (Department of Health and Human Services)
Espina | Pedro | OSTP (White House Office of Science and Technology Policy)
Gerr | Neil | DARPA (Defense Advanced Research Projects Agency)
Gundersen | Linda | USGS (U.S. Geological Survey)
Hall | Alan | NOAA (National Oceanic and Atmospheric Administration)
Iacono | Suzanne | NSF (National Science Foundation)
Jakubek | David | OSD (Office of the Secretary of Defense) PATL
Kaufman | Daniel | DARPA (Defense Advanced Research Projects Agency)
Larson | Phillip P. | OSTP (White House Office of Science and Technology Policy)
Lee | Tsengdar | NASA (National Aeronautics and Space Administration)
Lipman | David | NIH (National Institutes of Health) / NLM (National Library of Medicine) / NCBI
Little | Michael | NASA (National Aeronautics and Space Administration)
Luker | Mark | NCO (National Coordination Office for NITRD) / NITRD (Networking and Information Technology Research and Development)
Marth | Lisa | NIST (National Institute of Standards and Technology)
Muoio | Patricia A. | DNI
Pantula | Sastry | NSF (National Science Foundation)
Preuss | Don | NIH (National Institutes of Health) / NLM (National Library of Medicine) / NCBI
Quade | Brittany | NSF (National Science Foundation)
Romine | Charles | NIST (National Institute of Standards and Technology)
Smith | Darren | NOAA (National Oceanic and Atmospheric Administration)
Spengler | Sylvia | NSF (National Science Foundation)
Statler | Tom | NSF (National Science Foundation)
Strawn | George | NCO (National Coordination Office for NITRD) / NITRD (Networking and Information Technology Research and Development)
Suskin | Mark | NSF (National Science Foundation)
Villani | Jennifer | NIH (National Institutes of Health) / NIGMS
Wigen | Wendy | NCO (National Coordination Office for NITRD) / NITRD (Networking and Information Technology Research and Development)
Zhao | Fen | NSF (National Science Foundation)