Understanding Global Change
from Data
Karsten Steinhaeuser
University of Minnesota
[email protected]
www.umn.edu/~ksteinha
Global Change:
A Defining Issue of our Era
Industrialization & Modernization
Population Growth & Demographic Shifts
Land Use Change: Urbanization, Deforestation
Climate Change
Biodiversity Loss & Natural Disasters: Monoculture, Ocean Acidification, Fires, Cyclones
THIS IS GLOBAL CHANGE
Responding to Societal Needs
• Where is population growth putting pressure on urban infrastructure and natural resources?
• What is the interplay between the global climate system, local ecosystems and natural disasters?
• How does increased biofuel production impact crop patterns and food availability?
• How do changing oceans affect the atmosphere and land climate?
• What are the major feedback mechanisms among eco-climatic processes?
Transformation: Data-Poor to Data-Rich
• Satellite Data
– Spectral Reflectance
– Elevation Models
– Nighttime Lights
– Aerosols
• Oceanographic Data
– Temperature
– Salinity
– Circulation
• Climate Models
• Reanalysis Data
• River Discharge
• Agricultural Statistics
• Population Data
• Air Quality
• …
“The future of science depends […] on cleverness being applied to data for their own sake, complementing scientific hypotheses as a basis for exploring today’s information cornucopia.”
Global Change is a Big Data Problem
Scale and nature of the data offer numerous challenges and opportunities for research in the computational analysis of large datasets.
Data-driven discovery methods hold great promise for advancing our understanding of the climate and ecosystem processes contributing to global change.
Advances are of scientific importance and societal relevance.
"data-intensive science [is] so different that
it is worth distinguishing [it] … as a new,
Computational & Data Challenges
• Spatio-temporal, non-i.i.d., non-stationary, heteroskedastic
• High-dimensionality, massive dataset size, increasing complexity
• Noise, measurement error, missing values
• Multi-scale, multi-resolution processes
• Long-range spatial dependencies
• Long memory temporal dependencies
• Low-frequency variability
• Heterogeneity, multiple data sources
• . . .
Active Research Projects
• GOPHER: Global Observatory for Planetary Health and Resources
Project Aim: Monitoring of global ecosystem for changes in land cover, land use, etc.
• NSF Expeditions: Understanding Climate Change – A Data Driven Approach
Project Aim: Develop novel data analysis methods to help improve understanding and prediction of climate change
GOPHER: Ecosystem Monitoring
What is the current state of the global forest ecosystems and how are they changing as a result of logging and natural disasters?
How are the demands of a growing population affecting agriculture, e.g., creation of new farmland, changes in cropping patterns, conversion to biofuels, etc.?
How is urbanization affecting the surrounding ecosystem resources and water supply?
Traditional Approach for Change Detection
• Requires high-quality imagery
– Available infrequently
• Requires high resolution
– No global coverage
• Requires training data
– Must be created manually
– Labor-intensive, time-consuming, expensive
Image-to-Image Comparison
Studies are limited to small regions and unable to identify change point or rate of change
Alternate Approach: Spatio-Temporal
Multi-Spectral Data
• Provides global coverage daily
• (Relatively) coarse resolution
• Sometimes poor quality
– Noisy
– Missing Data
Trade-Off: lower spatial resolution vs. higher frequency, increased coverage
Opportunities and challenges for spatio-temporal data mining
Time Series Change Detection
What happened here?
Sudden vegetation loss could indicate forest fire, logging, flooding, …?
Recovery period: Gradual recovery suggests a fire is a likely possibility.
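A minimal sketch of how a sudden vegetation loss might be located in a time series: compare the mean of a short window before each time step with the mean of the window that follows, and report the largest drop. The function name `largest_drop`, the window length, and the synthetic vegetation-index values are illustrative assumptions, not the GOPHER algorithm.

```python
from statistics import mean

def largest_drop(series, window=3):
    """Return (index, drop): the position where the mean of the next `window`
    values falls furthest below the mean of the previous `window` values."""
    best_index, best_drop = None, 0.0
    for i in range(window, len(series) - window + 1):
        before = mean(series[i - window:i])
        after = mean(series[i:i + window])
        drop = before - after
        if drop > best_drop:
            best_index, best_drop = i, drop
    return best_index, best_drop

# Synthetic vegetation index: stable, sudden loss at index 6, gradual recovery.
veg = [0.80, 0.82, 0.79, 0.81, 0.80, 0.78,
       0.30, 0.32, 0.38, 0.45, 0.52, 0.60]
index, drop = largest_drop(veg)
print(f"largest drop of {drop:.2f} starts at index {index}")
```

The slow climb after the drop is what the slide calls the recovery period; a real detector would also need to handle noise and gaps.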
Time Series Change Detection
Did a change happen here?
Data can be noisy (measurement error, interference, etc.) or missing (e.g., due to cloud cover or snow).
How much information is required to identify a change?
Time Series Change Detection
Is there a decrease in vegetation?
Changes can also be gradual, occurring over multiple years…
How to distinguish this subtle change from the dominant signal of seasonal phenology?
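One common way to separate a slow change from the seasonal cycle is to subtract the mean seasonal cycle and fit a straight line to what remains. The sketch below assumes monthly data with a 12-month period; the series is synthetic and the names `seasonal_anomalies` and `ols_slope` are hypothetical, not taken from the project.

```python
import math
from statistics import mean

def seasonal_anomalies(series, period=12):
    """Subtract the mean seasonal cycle (one mean per phase of the period)."""
    cycle = [mean(series[p::period]) for p in range(period)]
    return [x - cycle[i % period] for i, x in enumerate(series)]

def ols_slope(values):
    """Ordinary least-squares slope of the values against their index."""
    n = len(values)
    t_bar = (n - 1) / 2
    v_bar = mean(values)
    num = sum((t - t_bar) * (v - v_bar) for t, v in enumerate(values))
    den = sum((t - t_bar) ** 2 for t in range(n))
    return num / den

# Synthetic monthly index: 6 years of seasonal cycle plus a slow decline.
DECLINE_PER_MONTH = 0.002
veg = [0.5 + 0.3 * math.sin(2 * math.pi * t / 12) - DECLINE_PER_MONTH * t
       for t in range(72)]
anomalies = seasonal_anomalies(veg)
trend = ols_slope(anomalies)
print(f"estimated trend per month: {trend:.4f}")
```

The seasonal swing here is several times larger than the total decline, yet the anomaly series recovers a negative slope close to the one that was built in.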
Time Series Change Detection
And what if there is high variability instead of a clear seasonal cycle?
How does data quality affect change detection?
Change or natural variability?
Novel Change Detection Techniques
Current methods are not adequate to address these challenges. We focus on developing algorithms that are:
• Robust to missing data, noise and outliers
• Able to automatically characterize different types of changes
• Capable of incremental update and (near) real-time detection
• Aware of spatial context
Segmentation Approaches:
Divide time series into pieces and determine if a change occurred
[Figure: time series segments before and after the change]
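A segmentation approach can be sketched as follows: pick the split point that best divides the series into two constant-mean pieces, then call it a change only if the split explains most of the variance. The `threshold` value, the function names, and the sample data are assumptions made for illustration.

```python
from statistics import mean

def sse(values):
    """Sum of squared deviations from the segment mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(series, min_size=2):
    """Split index k that minimizes the SSE of series[:k] plus series[k:]."""
    candidates = range(min_size, len(series) - min_size + 1)
    return min(candidates, key=lambda k: sse(series[:k]) + sse(series[k:]))

def change_occurred(series, threshold=0.5, min_size=2):
    """Return (k, flag): flag is True if splitting at k removes more than
    `threshold` of the total squared error."""
    k = best_split(series, min_size)
    total = sse(series)
    if total == 0:
        return k, False
    reduction = 1 - (sse(series[:k]) + sse(series[k:])) / total
    return k, reduction > threshold

changed = [0.80, 0.82, 0.79, 0.81, 0.40, 0.42, 0.39, 0.41]
stable = [0.80, 0.82, 0.79, 0.81, 0.80, 0.82, 0.79, 0.81]
print(change_occurred(changed))
print(change_occurred(stable))
```

Two constant segments is the simplest possible model; seasonal data would need segments that model the cycle rather than a single mean.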
Prediction-Based Methods:
Build model of the “normal” behavior and predict, measure deviation
[Figure: model + predict]
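A prediction-based detector could look like this in its simplest form: learn a per-season mean and spread from a change-free history, predict each new observation from its season, and flag large standardized deviations. The seasonal-mean model, the cutoff of 3, and all names and data below are illustrative assumptions rather than the project's method.

```python
from statistics import mean, stdev

def fit_seasonal_model(history, period):
    """Per-phase (mean, standard deviation) from a change-free history."""
    return [(mean(history[p::period]), stdev(history[p::period]))
            for p in range(period)]

def deviations(model, new_values, start_phase=0, min_sigma=1e-6):
    """Standardized deviation of each new observation from its prediction."""
    period = len(model)
    scores = []
    for i, x in enumerate(new_values):
        mu, sigma = model[(start_phase + i) % period]
        scores.append((x - mu) / max(sigma, min_sigma))
    return scores

# Three "normal" years of quarterly values, then one new year to monitor.
history = [0.20, 0.60, 0.80, 0.40,
           0.22, 0.58, 0.82, 0.41,
           0.18, 0.62, 0.79, 0.39]
new_year = [0.21, 0.25, 0.30, 0.35]

model = fit_seasonal_model(history, period=4)
scores = deviations(model, new_year)
alerts = [i for i, z in enumerate(scores) if abs(z) > 3.0]
print(alerts)
```

Because each new value is scored as it arrives, this style of method suits incremental, near real-time detection.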
ALERTS: Automated Land change Evaluation,
Reporting and Tracking System
• Planetary Information System for interactive investigation of ecosystem disturbances discovered by GOPHER
– Forest Fires
– Deforestation
– Droughts
– Urbanization
– …
• Helps quantify carbon impact of changes, understand the relationship between climate variability and human activity
• Provides ubiquitous web-based access to changes occurring across the globe, creating public awareness
Illustrative Examples
Large forest fires in Canada have converted the forests from a sink into a source of carbon in the atmosphere.
Logging is legal in some parts of Canada, further reducing carbon sequestration.
Brazil accounts for almost 50% of all humid tropical forest clearing, nearly 4 times that of the next highest country.
Illustrative Examples
Examples of afforestation can be seen in several areas around the world, including this region near Beijing (China) where new trees have been planted to prevent dust storms and erosion.
One winter the Ob River caused massive flooding due to freezing of the Bay of Ob / Kara Sea.
Hurricane Katrina caused significant damage and vegetation loss along the US Gulf Coast.
Political conflict and the ensuing “land reform” resulted in widespread farm abandonment and loss of
Impact on REDD+
“The [Peru] government needs to spend more than $100m a year on high-resolution satellite pictures of its billions of trees. But … a computing facility developed by the Planetary Skin Institute (PSI) … might help cut that budget.”
“ALERTS, which was launched at Cancún, uses … data-mining algorithms developed at the University of Minnesota and a lot of computing power … to spot places where land use changed.”
Understanding Climate Change:
A Data Driven Approach
• 5-year / $10M NSF Expeditions in Computing
• Team led by UMN, consists of 15 senior personnel and ~50 students and post-docs
• Developing state-of-the-art computational methods
Climate Change is Real…
• The planet is warming
• Multiple lines of evidence
• Credible link to human GHG (greenhouse gas) emissions
• Consequences can be dire
• The potential societal cost of both action and inaction is large
• There is an urgency to act
• Adaptation: “Manage the unavoidable”
…but Understanding is Limited
[Figure: model grid cell with clouds, land and ocean]
Much of what we know is derived from computer simulations of general circulation models (mathematical equations describing the physical processes involved in climate)
“The sad truth of climate science is that the most crucial information is the least reliable” (Nature, 2010)
Physics-based models are essential but not adequate
• Relatively reliable for projections at global scale for
smooth fields such as temperature, pressure
• Less reliable for variables that are crucial for impact
Expeditions Project Highlights
• High-Performance Data Analytics
• Hurricane Intensity Prediction and Land-Fall Modeling
• Climate Extremes and Uncertainty
• Teleconnections & Sparse Predictive Modeling
[Figure: Correlation Between ANOM 1+2 and Land Temp (>0.2), by latitude; El Nino Events and Nino 1+2 Index]
Dipoles & Teleconnections
Crucial for understanding the climate system…
[Figures: Correlation of land temperature anomalies with NAO; correlation of land temperature anomalies with SOI]
SOI dominates tropical climate variability, affecting rainfall patterns over East Asia, Australia & South America, as well as droughts over North America and Africa, among others.
NAO influences sea level pressure (SLP) over most of the Northern Hemisphere. Strong positive phases (Icelandic Low/Azores High) are associated with above-avg. temperatures in the Eastern US.
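Maps like these can be produced by correlating one climate index with the anomaly time series at every land grid cell and keeping the cells above a cutoff. The sketch below uses the 0.2 cutoff that appears in the earlier figure title; the index values, grid cells, and function names are made up for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlated_cells(index_series, grid_anomalies, threshold=0.2):
    """Return {cell: r} for cells whose |r| with the index exceeds threshold."""
    result = {}
    for cell, series in grid_anomalies.items():
        r = pearson(index_series, series)
        if abs(r) > threshold:
            result[cell] = r
    return result

# Hypothetical index and anomalies at three (lat, lon) grid cells.
index = [1.0, -0.5, 0.3, -1.2, 0.8, -0.4]
grid = {
    (10.0, -80.0): [0.5 * v for v in index],    # moves with the index
    (-20.0, 130.0): [-0.8 * v for v in index],  # moves against the index
    (45.0, 10.0): [1.0, 1.0, -1.0, 0.0, -1.0, 0.0],  # weakly related
}
strong = correlated_cells(index, grid)
print(sorted(strong))
```

A constant series would make the correlation undefined, so real data would need a guard for zero variance.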
Automatic Dipole Discovery
Nodes in the graph correspond to grid points, edges weighted with correlation between anomaly time series.
Graph-based representation of the global climate system
Shared Reciprocal Nearest Neighbors (SRNN) Density
Overcomes limitations of traditional approaches such as EOF analysis
• Capable of identifying dynamic dipoles
• Discovers dipoles of different strength (not dominated by leading signals)
• Comprehensive global data analysis,
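The graph construction described on this slide can be sketched as below: every grid point is a node and every pair of nodes gets an edge weighted by the correlation of their anomaly series. Picking the most negative edge as a dipole candidate is a deliberately crude stand-in; it is not the SRNN density method, and the node names, series, and `min_abs_corr` cutoff are assumptions.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def build_graph(anomalies, min_abs_corr=0.5):
    """Edges {(node_a, node_b): r} for pairs with |r| >= min_abs_corr."""
    edges = {}
    for a, b in combinations(sorted(anomalies), 2):
        r = pearson(anomalies[a], anomalies[b])
        if abs(r) >= min_abs_corr:
            edges[(a, b)] = r
    return edges

def strongest_negative_edge(edges):
    """Most anti-correlated pair of nodes, or None if no edge is negative."""
    negative = {edge: r for edge, r in edges.items() if r < 0}
    return min(negative, key=negative.get) if negative else None

# Hypothetical anomaly series at three grid points.
anomalies = {
    "north": [1.0, -1.0, 2.0, -2.0, 1.0, -1.0],
    "south": [-1.1, 0.9, -2.0, 2.1, -0.8, 1.0],
    "east": [0.5, 0.4, -0.3, -0.2, 0.1, -0.5],
}
edges = build_graph(anomalies)
print(strongest_negative_edge(edges))
```

On a global grid the all-pairs loop grows quadratically with the number of grid points, which is one reason the analysis is a large-scale computing problem.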
Focus on Climate Extremes: Temp.
The World Climate Research Program (WCRP) ranked
“Prediction of Extremes” as the top Grand Science Challenge.
Shifted Mean
Increased Variability
Changed Shape
• Expect statistically significantly higher trends in warming with trend of increasing heat waves
• Larger uncertainty and regional variability than previously thought
• Even with a general warming trend, intensity and duration of cold extremes will persist
• Certain regions exhibit consistent cold extremes despite uncertainties
The uncertainty in projections grows with increasing precision and particularly over the tropics.
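The "Shifted Mean" and "Increased Variability" panels can be made concrete with a toy calculation: treat temperature anomalies as Gaussian and see how the probability of crossing fixed hot and cold thresholds changes. The Gaussian assumption, the thresholds of ±2, and the shift and spread values are illustrative only and are not results from the project.

```python
from statistics import NormalDist

def tail_probabilities(mean, stdev, hot=2.0, cold=-2.0):
    """Return (P[X > hot], P[X < cold]) for a normal distribution."""
    dist = NormalDist(mean, stdev)
    return 1.0 - dist.cdf(hot), dist.cdf(cold)

baseline = tail_probabilities(0.0, 1.0)           # reference climate
shifted = tail_probabilities(0.5, 1.0)            # warmer mean only
shifted_and_wider = tail_probabilities(0.5, 1.3)  # warmer and more variable

for name, (p_hot, p_cold) in [("baseline", baseline),
                              ("shifted", shifted),
                              ("shifted and wider", shifted_and_wider)]:
    print(f"{name}: hot {p_hot:.4f}, cold {p_cold:.4f}")
```

In this toy setting a warmer mean alone makes cold extremes rarer, but adding variability can bring them back to or above the baseline rate, which is consistent with the point that cold extremes can persist under general warming.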
Precipitation Extremes
No increasing trends in rainfall extremes in India during the last half-century
Trend of 30-year return levels Trend of 100-year return levels
Credit: Ganguly, Ghosh, Kao, Kodra, et al.
Significant growth in intensity and frequency of rainfall at aggregate scales in the extratropics
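A T-year return level is the value exceeded on average once every T years. As a rough sketch, the code below fits a Gumbel distribution to annual rainfall maxima by the method of moments and reads off the 30-year and 100-year levels. The Gumbel choice, the fitting method, and the sample maxima are assumptions; the slide does not say which extreme-value model the study used.

```python
import math
from statistics import mean, stdev

EULER_GAMMA = 0.5772156649015329

def gumbel_fit(annual_maxima):
    """Method-of-moments estimates of the Gumbel location and scale."""
    beta = stdev(annual_maxima) * math.sqrt(6) / math.pi
    mu = mean(annual_maxima) - EULER_GAMMA * beta
    return mu, beta

def return_level(mu, beta, years):
    """Value exceeded with probability 1/years in any single year."""
    return mu - beta * math.log(-math.log(1.0 - 1.0 / years))

# Hypothetical annual maximum daily rainfall (mm) for 20 years.
maxima = [92, 110, 85, 130, 101, 97, 145, 88, 120, 105,
          99, 160, 93, 115, 108, 90, 125, 102, 135, 96]
mu, beta = gumbel_fit(maxima)
level_30 = return_level(mu, beta, 30)
level_100 = return_level(mu, beta, 100)
print(f"30-year: {level_30:.1f} mm, 100-year: {level_100:.1f} mm")
```

A trend in return levels, as in the figure panels, would come from repeating such a fit over moving time windows or from a model whose parameters vary with time.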
Drought Detection
• Major Droughts
– Persistent over space and time
– Catastrophic consequences
• Examples
– 1930s North American Dust Bowl
– Late 1960s Sahel drought
• Dataset: Climate Research Unit
– http://data.giss.nasa.gov/precip_cru/
– Monthly precipitation over land
– Duration: 1901 to 2006
– Resolution: 0.5° latitude x 0.5° longitude
(Sparse) Predictive Modeling
Sparse High-Dimensional Regression
• Understanding extremes: atmospheric covariates to rainfall extremes
• Understanding teleconnections: long-range dependencies in space/time
• Understanding hurricanes: ocean properties, trends
• Multi-model ensembles: skill selection, regional processes vs global average
• Statistical downscaling: coarse to fine scale, capture dependencies
OLS: High RMSE, High Complexity
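To illustrate why a sparse model can be preferable to OLS, here is a small L1-penalized regression (lasso) solved by coordinate descent. The lasso is used as one representative sparse method; the slides do not state which sparse estimator the project uses. The design matrix, response, and penalty below are synthetic.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; values inside the band become 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0

def lasso(X, y, penalty, sweeps=50):
    """Coordinate descent for (1/2n)*||y - Xw||^2 + penalty*||w||_1."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    residual = list(y)  # equals y - Xw while w is all zeros
    for _ in range(sweeps):
        for j in range(d):
            col = [row[j] for row in X]
            norm = sum(c * c for c in col) / n
            if norm == 0.0:
                continue
            # Correlation of column j with the residual that excludes w[j].
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, residual)) / n
            new_w = soft_threshold(rho, penalty) / norm
            delta = new_w - w[j]
            if delta:
                residual = [r - c * delta for r, c in zip(residual, col)]
                w[j] = new_w
    return w

# Synthetic design: 8 samples, 5 mutually orthogonal +/-1 covariates.
X = [[(-1) ** bin(i & j).count("1") for j in (1, 2, 3, 4, 5)] for i in range(8)]
noise = [0.05, -0.05, 0.02, -0.02, 0.03, -0.03, 0.01, -0.01]
# Only covariates 0 and 3 truly drive the response.
y = [2.0 * row[0] - 3.0 * row[3] + e for row, e in zip(X, noise)]

dense = lasso(X, y, penalty=0.0)   # no penalty: ordinary least squares
sparse = lasso(X, y, penalty=0.1)
print([round(w, 4) for w in dense])
print([round(w, 4) for w in sparse])
```

With no penalty the fit spreads small weights onto irrelevant covariates by chasing noise; with the penalty only the two real drivers remain, at the cost of slightly shrunken coefficients.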
Hurricane Track Forecasting
Physics-based Models
Improving but have mean error >185 km beyond 48 h
What if the error gets extrapolated to a 10-15 day advance forecast?
~500 km
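Track error figures like these are distances between forecast and observed storm positions. A minimal sketch of that measurement, using the haversine great-circle formula on a spherical Earth, is shown below; the positions are invented and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def mean_track_error_km(forecast, observed):
    """Mean distance between matching forecast and observed positions."""
    errors = [great_circle_km(f[0], f[1], o[0], o[1])
              for f, o in zip(forecast, observed)]
    return sum(errors) / len(errors)

# Hypothetical (lat, lon) positions at two forecast lead times.
forecast = [(25.0, -80.0), (26.0, -81.0)]
observed = [(25.5, -80.5), (27.0, -82.5)]
print(f"mean track error: {mean_track_error_km(forecast, observed):.0f} km")
```

For scale, one degree of latitude is about 111 km, so an error above 185 km is more than a degree and a half.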
Extreme Event Forecasting via
Contrast-based Network Motif Discovery
[Figure panels (A)-(F): HURDAT Historic Data; SLP, SST, VWS; Phase: Land; Phase: Curve]
Intuition: If an extreme event (e.g. hurricane track) is in one of its key phases (e.g. land-hitting), then there exist network motifs (recurrent patterns in climate networks) that are specific to that phase.
Scaling I/O and Analytics
• Global Cloud Resolving Model (GCRM)
– Simulate circulation associated with large convective clouds
– Developed by David Randall (Colorado State U) and Karen Schuchardt (PNNL)
• Geodesic grid model
• 1.4 PB data per simulation
– 4 km resolution, 3 hourly, 1 simulated year
– 1.5 TB per checkpoint
• Parallel NetCDF I/O library
I/O was previously a major bottleneck: execution at this scale became possible only because of I/O scaling.
Illustrative Result: Parallel NetCDF
• Improved I/O throughput
– Using PnetCDF optimizations, massive scalability
– For 3.5 km grid resolution, grid size is 41.9M cells with 256 vertical layers
– Data analysis read and simulation checkpoint
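The grid numbers above can be checked with a little arithmetic. An icosahedral geodesic grid refined r times has 10 * 4^r + 2 cells; that r = 11 is the refinement level behind the 41.9M figure is an inference here, not something the slides state. The 4-byte value size and the Earth radius are also assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumption)
VERTICAL_LAYERS = 256
BYTES_PER_VALUE = 4       # assumes single-precision storage

def geodesic_cells(level):
    """Cell count of an icosahedral geodesic grid after `level` refinements."""
    return 10 * 4 ** level + 2

def mean_spacing_km(cells):
    """Square root of the average surface area per cell."""
    surface_km2 = 4 * math.pi * EARTH_RADIUS_KM ** 2
    return math.sqrt(surface_km2 / cells)

cells = geodesic_cells(11)
values_per_field = cells * VERTICAL_LAYERS
gigabytes_per_field = values_per_field * BYTES_PER_VALUE / 1e9

print(f"cells: {cells:,}")
print(f"mean spacing: {mean_spacing_km(cells):.2f} km")
print(f"one 3-D field: {gigabytes_per_field:.1f} GB")
```

Under these assumptions a single three-dimensional variable at one time step is tens of gigabytes, which gives a sense of why parallel I/O was the bottleneck.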
Credit: Choudhary, et al.
Thank You! Questions?
UMN Team Members
Vipin Kumar, Arindam Banerjee, Shyam Boriah, Snigdhansu Chatterjee, Jonathan Foley, Joseph Knight, Stefan Liess, Shashi Shekhar, Peter Snyder, Michael Steinbach, Karsten Steinhaeuser
UMN Students
Graduate: Ivan Brugere, Yashu Chamber, Xi Chen, James Faghmous, Jaya Kawale, Arjun Kumar, Michael Lau, Varun Mithal
Undergraduate: Kelly Cutler, Ryan Haasken, Zachary O’Connor, Dominick Ormsby
PSI Collaborators
NASA Ames: Christopher Potter
Cal State Monterey Bay: Stephen Klooster
Michigan State: Pang-Ning Tan
PSI: Juan Carlos Castilla-Rubio
website: gopher.cs.umn.edu
NSF Expeditions Collaborators
NCSU: Nagiza Samatova, Fredrick Semazzi
Northeastern: Auroop Ganguly
Northwestern: Alok Choudhary, Wei-keng Liao
North Carolina A&T: Abdollah Homaifar
website: climatechange.cs.umn.edu