Spatial Big Data
Shashi Shekhar
McKnight Distinguished University Professor
Department of Computer Science and Engineering, University of Minnesota
www.cs.umn.edu/~shekhar
AAG-NIH Symp. on Enabling a National Geospatial Cyberinfrastructure for Health Research (July 2012)
More details in S. Shekhar et al., Spatial Big Data Challenges Intersecting Mobility and Cloud Computing, ACM
Research Theme 1: Spatial Databases
[Figure legend: only in old plan; only in new plan; in both plans]
Evacuation Route Planning
Parallelize Range Queries
Storing graphs in disk blocks
Shortest Paths
Theme 2: Spatial Data Mining
[Figure panels: nest locations; distance to open water; vegetation durability; water depth]
Location prediction: nesting sites
Spatial outliers: sensor (#9) on I-35
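The spatial-outlier example (sensor #9 on I-35) can be illustrated with a neighborhood-difference test: compare each sensor's reading with the mean of its neighbors' readings, standardize those differences, and flag large z-scores. This is a minimal sketch under assumptions: the 12 readings, the sensor IDs, the up/downstream adjacency, and the threshold of 2.0 are hypothetical, not data from the I-35 study.

```python
from statistics import mean, pstdev


def spatial_outliers(readings, neighbors, threshold=2.0):
    """Flag sensors whose reading deviates strongly from the mean of their neighbors.

    readings:  dict sensor_id -> measured value
    neighbors: dict sensor_id -> list of adjacent sensor_ids
    """
    # Neighborhood difference S(x) = f(x) - mean of f over the neighbors of x
    diff = {s: readings[s] - mean(readings[n] for n in neighbors[s]) for s in readings}
    mu = mean(diff.values())
    sigma = pstdev(diff.values())
    # Standardize the differences and keep sensors beyond the threshold
    return sorted(s for s, d in diff.items() if abs((d - mu) / sigma) > threshold)


# Hypothetical readings from 12 consecutive sensors along one highway
values = [50, 52, 51, 53, 52, 51, 50, 52, 10, 51, 52, 50]
readings = {i + 1: v for i, v in enumerate(values)}
# Each sensor's neighbors are the sensors immediately up- and downstream
neighbors = {s: [n for n in (s - 1, s + 1) if n in readings] for s in readings}

print(spatial_outliers(readings, neighbors))  # [9]
```

Sensor 9's low reading also distorts the differences of sensors 8 and 10, but their z-scores stay below the threshold, which is why the test compares against neighbors rather than only against the global mean.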
Outline
• Motivation
• What is Spatial Big Data (SBD)?
• SBD and Science
• SBD Analytics
Big Data
Mining and analyzing these big new data sets can open the door to a new wave of innovation, accelerating productivity and economic growth. Some economists, academics and business executives see an opportunity to move beyond the payoff of the first stage of the Internet, which combined computing and low-cost communications to automate all kinds of commercial transactions.
Estimated value: > USD 1 trillion per year by 2020
• Location-based services: USD 600 B
• Health informatics: USD 300 B
• Manufacturing: …
Spatial Big Data Definitions
• Spatial datasets exceeding the capacity of current computing systems
  • to manage, process, or analyze the data with reasonable effort
  • due to Volume, Velocity, Variety, …
• SBD Components
  • Data-intensive computing: Cloud Computing
  • Middleware, e.g., Map-Reduce, Pregel, Big-Table, …
  • Big-Data analytics, e.g., data mining, machine learning, computational statistics, …
  • Big Data science and societal applications
• Ex. Social media datasets, e.g., Google Flu Trends
  • Which patterns may be detected in these datasets?
  • Flu outbreaks?
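Map-Reduce-style middleware splits a job into a map phase, a shuffle that groups emitted values by key, and a reduce phase. The following is a minimal single-process sketch, assuming geotagged posts are binned into 0.1-degree latitude/longitude cells (a hypothetical choice); it only simulates the three phases in plain Python and is not the API of Hadoop or any other framework. The posts are hypothetical.

```python
import math
from collections import defaultdict


def map_phase(post, cell_size=0.1):
    """Map: emit (grid_cell, 1) for one geotagged post given as (lat, lon, text)."""
    lat, lon, _text = post
    cell = (math.floor(lat / cell_size), math.floor(lon / cell_size))
    yield cell, 1


def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(cell, ones):
    """Reduce: total number of posts that fell into one grid cell."""
    return cell, sum(ones)


def posts_per_cell(posts):
    pairs = (pair for post in posts for pair in map_phase(post))
    return dict(reduce_phase(cell, ones) for cell, ones in shuffle(pairs).items())


# Hypothetical geotagged posts (lat, lon, text)
posts = [
    (44.97, -93.23, "fever and cough"),
    (44.98, -93.24, "flu again"),
    (44.95, -93.26, "feeling sick"),
    (40.71, -74.00, "nice weather"),
]
counts = posts_per_cell(posts)
```

Because map calls are independent per post and reduce calls are independent per cell, a real framework can spread both phases across many machines; per-cell counts like these are one simple input for spotting where posts about flu symptoms cluster.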
Traditional Spatial Data
• Spatial attribute:
  • Neighborhood and extent
  • Geo-reference: longitude, latitude, elevation
• Spatial data genre
  • Raster: geo-images, e.g., Google Earth
  • Vector: point, line, polygons
  • Graph, e.g., roadmap: node, edge, path
[Figures: raster data for UMN campus (courtesy: UMN); vector data for UMN campus (courtesy: MapQuest); graph data for UMN campus]
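The three genres map onto different data structures. A minimal sketch, assuming planar (x, y) coordinates; the class names Raster, LineString, Polygon, and RoadGraph and their methods are hypothetical illustrations, not a particular GIS library's API.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Raster:
    """Raster genre: a regular grid of cell values, e.g. one band of a geo-image."""
    origin_x: float   # x of the grid's upper-left corner
    origin_y: float   # y of the grid's upper-left corner
    cell_size: float
    values: list      # values[row][col], row 0 at the top

    def value_at(self, x: float, y: float) -> float:
        # Convert a map coordinate into a (row, col) grid index
        col = int((x - self.origin_x) // self.cell_size)
        row = int((self.origin_y - y) // self.cell_size)
        return self.values[row][col]


@dataclass
class LineString:
    """Vector genre: ordered (x, y) vertices, e.g. a road centerline."""
    coords: list

    def length(self) -> float:
        return sum(math.dist(a, b) for a, b in zip(self.coords, self.coords[1:]))


@dataclass
class Polygon:
    """Vector genre: a ring of (x, y) vertices (closed implicitly), e.g. a footprint."""
    ring: list

    def area(self) -> float:
        # Shoelace formula over consecutive vertex pairs, wrapping to the first vertex
        pairs = zip(self.ring, self.ring[1:] + self.ring[:1])
        return abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in pairs)) / 2


@dataclass
class RoadGraph:
    """Graph genre: nodes are intersections, edges are road segments with lengths."""
    edges: dict       # node -> {neighbor: segment length}

    def path_length(self, path: list) -> float:
        return sum(self.edges[u][v] for u, v in zip(path, path[1:]))


grid = Raster(origin_x=0.0, origin_y=2.0, cell_size=1.0, values=[[1, 2], [3, 4]])
road = LineString([(0, 0), (3, 4), (3, 10)])
block = Polygon([(0, 0), (4, 0), (4, 3), (0, 3)])
net = RoadGraph({"A": {"B": 2.0}, "B": {"C": 3.5}, "C": {}})
```

A raster answers "what value is at this location?", vector geometries support measurements such as length and area, and a graph supports connectivity and path queries.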
Raster SBD
[Figure panels: LiDAR & urban terrain; change detection; feature extraction; average monthly temperature (courtesy: Prof. V. Kumar)]
Data sets much larger than Google Earth:
• Geo-videos from UAVs, security cameras
• Satellite imagery (periodic scan), LiDAR, …
• Geo-sensor networks
• Climate simulation, EPA air quality
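Change detection on raster SBD compares co-registered images of the same area taken at different times. A minimal sketch of one simple technique, image differencing with a fixed threshold, assuming two already-aligned grids of equal shape; the 3x3 vegetation-index values and the 0.2 threshold are hypothetical.

```python
def detect_changes(before, after, threshold):
    """Image differencing: return (row, col) cells whose absolute change exceeds threshold."""
    if len(before) != len(after) or any(len(b) != len(a) for b, a in zip(before, after)):
        raise ValueError("rasters must share the same grid")
    return [
        (r, c)
        for r, (row_b, row_a) in enumerate(zip(before, after))
        for c, (vb, va) in enumerate(zip(row_b, row_a))
        if abs(va - vb) > threshold
    ]


# Hypothetical vegetation-index rasters from two satellite passes over the same area
before = [[0.61, 0.60, 0.58],
          [0.62, 0.59, 0.57],
          [0.60, 0.61, 0.58]]
after = [[0.60, 0.61, 0.57],
         [0.62, 0.21, 0.18],
         [0.59, 0.60, 0.58]]

changed = detect_changes(before, after, threshold=0.2)
```

The small fluctuations elsewhere stay under the threshold; only the two cells that dropped sharply are reported. At SBD scale the same per-cell comparison is trivially parallel across tiles.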
Example use cases
Patterns of Life
Weekday GPS track for 3 months
Patterns of life
• Activity space: usual places and visits
• Rare places, rare visits
[Map labels: Work, Home, Club, Farm]
Visits per place by time band: Morning (7am–12noon), Afternoon (12noon–5pm), Evening (5pm–12midnight), Midnight (12midnight–7am), Total
• Home: 10, 2, 15, 29; total 54
• Work: 19, 20, 10, 1; total 50
• Club: 4, 5, 4; total 15
• Farm: 1; total 1
• Total: 30, 30, 30, 30; total 120
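The activity-space idea (usual places and visits vs. rare places and rare visits) can be computed from visit counts. A minimal sketch, assuming each visit is a (place, time band) pair, that a place is "usual" if it receives at least 10% of all visits, and that a rare visit is a usual place seen in a time band where it occurs at most once; the thresholds and visit counts are hypothetical and are not the counts in the table.

```python
from collections import Counter


def activity_space(visits, usual_share=0.10, rare_visit_max=1):
    """Split places into usual and rare ones, and flag rare visits to usual places.

    visits: list of (place, time_band) pairs, one per visit
    """
    per_place = Counter(place for place, _band in visits)
    per_place_band = Counter(visits)
    total = len(visits)
    usual = {p for p, n in per_place.items() if n / total >= usual_share}
    rare_places = set(per_place) - usual
    # Rare visit: a usual place seen in a time band where it almost never occurs
    rare_visits = {
        (p, band) for (p, band), n in per_place_band.items()
        if p in usual and n <= rare_visit_max
    }
    return usual, rare_places, rare_visits


# Hypothetical weekday visits, one (place, time band) pair per visit
visits = (
    [("Home", "Midnight")] * 20 + [("Home", "Evening")] * 10
    + [("Work", "Morning")] * 12 + [("Work", "Afternoon")] * 12
    + [("Work", "Midnight")] * 1
    + [("Club", "Evening")] * 4 + [("Farm", "Afternoon")] * 1
)
usual, rare_places, rare_visits = activity_space(visits)
```

With these inputs Home and Work form the activity space, Club and Farm are rare places, and the single midnight stop at Work is a rare visit to an otherwise usual place.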
Vector SBD from Geo-Social Media
Vector data sub-genre
Point: location of a tweet, Ushahidi report, check-in, …
Line-strings, polygons: roads in OpenStreetMap
Use cases: Persistent Surveillance
Outbreaks of disease, Disaster, Unrest, Crime, …
Hot-spots, emerging hot-spots
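Hot-spots and emerging hot-spots can be sketched with grid counts over two time windows. A minimal sketch, assuming projected (x, y) event coordinates, square cells of side 1.0, a hot-spot threshold of 3 events, and a 2x growth rule for "emerging"; all parameters and points are hypothetical, and more rigorous hot-spot methods add statistical significance testing.

```python
import math
from collections import Counter


def grid_counts(points, cell_size):
    """Count events per square grid cell of side cell_size."""
    return Counter((math.floor(x / cell_size), math.floor(y / cell_size)) for x, y in points)


def hotspots(points, cell_size=1.0, min_count=3):
    """Cells with at least min_count events in one time window."""
    return sorted(cell for cell, n in grid_counts(points, cell_size).items() if n >= min_count)


def emerging_hotspots(previous, current, cell_size=1.0, min_count=3, growth=2.0):
    """Hot-spot cells in `current` whose count grew at least `growth`-fold since `previous`."""
    before = grid_counts(previous, cell_size)
    now = grid_counts(current, cell_size)
    return sorted(
        cell for cell, n in now.items()
        # max(..., 1) treats a previously empty cell as one event, avoiding division by zero
        if n >= min_count and n >= growth * max(before[cell], 1)
    )


# Hypothetical event locations (e.g., geotagged reports) in two consecutive weeks
last_week = [(0.5, 0.5), (0.6, 0.4), (0.2, 0.3), (5.2, 5.1)]
this_week = [(0.5, 0.5), (0.4, 0.6), (0.3, 0.2), (0.7, 0.8),
             (5.1, 5.3), (5.4, 5.2), (5.8, 5.9), (5.5, 5.5)]

hot = hotspots(this_week)
emerging = emerging_hotspots(last_week, this_week)
```

Cell (0, 0) is a persistent hot-spot (busy in both weeks), while cell (5, 5) jumped from one event to four and is reported as emerging, which is the signal persistent surveillance looks for.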
Persistent Surveillance at American Red Cross
• "Even before cable news outlets began reporting the tornadoes that ripped through Texas on Tuesday, a map of the state began blinking red on a screen in the Red Cross' new social media monitoring center, alerting weather watchers that something was happening in the hard-hit area." (AP, April 16th, 2012)
Graph SBDs: Temporally Detailed
Spatial Graphs, e.g., Roadmaps, Electric grid, Supply Chains, …
Temporally detailed roadmaps [Navteq]
Use cases: accessibility by time of week, best start time, best route at different start times
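Best route and best start time on a temporally detailed roadmap can be sketched with a time-dependent Dijkstra search, in which each edge's travel time depends on the departure time. A minimal sketch, assuming every edge is FIFO (departing later never means arriving earlier), the condition under which this label-setting search returns earliest arrivals; times are minutes since midnight, and the four-node network, the rush-hour profile peaking at 8:00, and the candidate start times are hypothetical.

```python
import heapq


def td_dijkstra(graph, source, target, start):
    """Earliest arrival at target when leaving source at `start`, plus the route taken.

    graph: node -> list of (neighbor, travel_time_fn); travel_time_fn(depart) -> minutes.
    """
    best = {source: start}
    prev = {}
    heap = [(start, source)]
    while heap:
        t, u = heapq.heappop(heap)
        if u == target:
            path = [u]
            while path[-1] in prev:
                path.append(prev[path[-1]])
            return t, path[::-1]
        if t > best.get(u, float("inf")):
            continue  # stale heap entry
        for v, travel in graph.get(u, []):
            arrive = t + travel(t)  # travel time evaluated at the departure time t
            if arrive < best.get(v, float("inf")):
                best[v] = arrive
                prev[v] = u
                heapq.heappush(heap, (arrive, v))
    return float("inf"), []


def freeway(depart):
    """Hypothetical freeway link: 10 min off-peak, rising linearly to 30 min at 8:00."""
    congestion = max(0.0, 1 - abs(depart - 480) / 60)  # slope 1/3 keeps the link FIFO
    return 10 + 20 * congestion


graph = {
    "A": [("B", freeway), ("D", lambda t: 18)],  # freeway vs. local street
    "B": [("C", lambda t: 5)],
    "D": [("C", lambda t: 5)],
}

# Travel time for each candidate start between 7:00 and 9:00, and the best start
times = {s: td_dijkstra(graph, "A", "C", s)[0] - s for s in range(420, 541, 30)}
best_start = min(times, key=times.get)
```

Under these assumptions the freeway route A-B-C wins off-peak, the local route A-D-C wins around 8:00, and scanning candidate departures yields the best start time (ties go to the earliest candidate).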
Outline
• Motivation
• What is Spatial Big Data (SBD)?
• SBD and Science
• SBD Analytics
Big Data and Science
Nature, 7209(4), September 4, 2008
"Above all, data on today's scale require scientific and computational intelligence. Google may now have its critics, but no one can deny its impact, which ultimately stems from the cleverness of its informatics. The future of science depends in part on such cleverness again being applied to data for their own sake, complementing scientific hypotheses as a basis for exploring today's information cornucopia."
Science in the Petabyte Era –
• Increasing Volume
• Heightened Complexity
Preparing Science for Big-Data
Nature, 7209(4), September 4, 2008
Big Data Translates into Big Opportunities...
and Big Responsibilities
Sudden influxes of data have transformed researchers' understanding of nature before — even back in the days when 'computer' was still a job description.
Unfortunately, the institutions and culture of science remain rooted in that pre-electronic era. Taking full advantage of electronic data will require a great deal of additional infrastructure, both technical and cultural.
Models in Science
• Science: understand the natural world
  • Subjective vs. objective (transparent, reproducible)
  • Methods: forward models, backward models
• Engineering: solve problems, optimizing cost, efficiency, etc.
Models: manual (paper, pencil, slide-rules, log-tables, …) vs. assisted by computers (HPCC, cyber-infrastructure, data-intensive, big-data)
• Forward
  • Manual: differential equations (D.E.), algebraic equations, …
  • Computer-assisted: computational simulations using D.E.s, agent-based models, etc.
• Backward
  • Manual: parametric models, e.g., regression, correlations, sampling, experiment design, hypothesis testing, …
  • Computer-assisted:
    • Bayesian: resampling, local regression, MCMC, kernel density estimation, neural networks, generalized additive models, …
    • Frequentist: frequent patterns, model ensembles, hypothesis generation, …
    • Exploratory data analysis: data visualization, visual analytics, geographic information science, spatial data mining, …
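The forward/backward distinction can be shown on a toy model. A minimal sketch, assuming exponential decay dy/dt = -k y as the model: the forward direction predicts observations from a known parameter, and the backward direction estimates the parameter from observations by least squares. The model, parameter values, and time points are hypothetical.

```python
import math


def forward(y0, k, times):
    """Forward model: from parameters (y0, k), predict y(t) = y0 * exp(-k t),
    the solution of the differential equation dy/dt = -k * y."""
    return [y0 * math.exp(-k * t) for t in times]


def backward(times, observations):
    """Backward model: from observations, estimate k by least squares on ln y = ln y0 - k t."""
    logs = [math.log(y) for y in observations]
    t_bar = sum(times) / len(times)
    l_bar = sum(logs) / len(logs)
    sxy = sum((t - t_bar) * (l - l_bar) for t, l in zip(times, logs))
    sxx = sum((t - t_bar) ** 2 for t in times)
    return -sxy / sxx  # fitted slope is -k


times = [0, 1, 2, 3, 4]
ys = forward(100.0, 0.3, times)   # simulate "observations" from a known k
k_hat = backward(times, ys)       # recover k from those observations
```

With noise-free simulated data the backward step recovers k = 0.3 up to rounding; with real observations it would return the least-squares estimate instead.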
Outline
• Motivation
• What is Spatial Big Data (SBD)?
• SBD and Science
• SBD Analytics
• SBD Infrastructure
Pre-Electronic Era Models: Example 1
1854 Cholera in London
Cases clustered around the Broad St. water pump, except at a nearby brewery
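The cholera example's reasoning can be sketched as assigning each case to its nearest water pump and tallying cases per pump. A minimal sketch, assuming straight-line distance (a simplification of walking distance along streets); the coordinates and the pump names other than Broad St. are hypothetical.

```python
import math
from collections import Counter


def nearest_pump(case, pumps):
    """Name of the pump closest (straight-line distance) to one case location."""
    return min(pumps, key=lambda name: math.dist(case, pumps[name]))


# Hypothetical map coordinates
pumps = {"Broad St.": (0.0, 0.0), "Pump B": (5.0, 1.0), "Pump C": (-3.0, 6.0)}
cases = [(0.2, 0.1), (-0.5, 0.4), (0.8, -0.3), (1.1, 0.9), (4.6, 1.2), (-0.3, -0.7)]

tally = Counter(nearest_pump(c, pumps) for c in cases)
```

Assigning every location to its nearest pump partitions the map into Voronoi cells; a tally dominated by one cell points to that pump, and exceptions inside the cell (such as the brewery) are the places worth investigating.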
Recent Decades
From Hotspots To Mean Streets
• Complicating dimensions
  • Spatial networks
  • Time
• Challenges: trade-off between
  • Semantic richness and …
[Figure: KMR routes (10): thick lines; CrimeStat K-Means (10): ellipses; roads: gray lines; burglaries: points]
Innovative technique: K Main Routes (KMR)
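The objective behind summarizing activities with k main routes can be sketched as: choose k routes so that the total network (not Euclidean) distance from each activity to its nearest chosen route is minimal. A minimal brute-force sketch, assuming activities sit on graph nodes and candidate routes are given as node sequences; it only illustrates the objective, is not the published KMR algorithm, and exhaustive search over route combinations is only feasible for tiny inputs. The graph, routes, and burglary locations are hypothetical.

```python
import heapq
from itertools import combinations


def network_distances(graph, sources):
    """Multi-source Dijkstra: shortest network distance from any source node to every node."""
    dist = {s: 0.0 for s in sources}
    heap = [(0.0, s) for s in sources]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def coverage_cost(graph, routes, activities):
    """Sum over activities of the network distance to the nearest node of any chosen route."""
    nodes = {n for route in routes for n in route}
    dist = network_distances(graph, nodes)
    return sum(dist.get(a, float("inf")) for a in activities)


def k_main_routes(graph, candidates, activities, k):
    """Exhaustively pick the k candidate routes with the lowest total coverage cost."""
    return min(combinations(candidates, k),
               key=lambda routes: coverage_cost(graph, routes, activities))


# Hypothetical road network: a main street 1-2-3-4-5 with a side street 3-6-7
graph = {1: {2: 1}, 2: {1: 1, 3: 1}, 3: {2: 1, 4: 1, 6: 1},
         4: {3: 1, 5: 1}, 5: {4: 1}, 6: {3: 1, 7: 1}, 7: {6: 1}}
candidates = [(1, 2, 3), (3, 4, 5), (3, 6, 7)]
burglaries = [1, 2, 2, 5, 5, 4, 7]   # node of each incident

best = k_main_routes(graph, candidates, burglaries, k=2)
```

Here the two main-street segments cover six of the seven incidents directly, leaving only the incident at node 7 two blocks away. An ellipse-based K-Means summary, by contrast, ignores the street network.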
Pre-Electronic Models: Example 2
Location Prediction
Models to predict location, time, path, …
Nest sites, minerals, earthquakes, tornadoes, …
Pre-electronic models, e.g., regression
Assumed i.i.d. (independent and identically distributed)
To simplify parameter estimation
Least squares: easy to hand-compute
Alternatives
Spatial Autoregression (SAR)
Geographically Weighted (Local) Regression
Parameter estimation is compute-intensive!
Next
Non-i.i.d. errors: distance-based
Spatio-temporal vector fields (e.g., flows, motion)
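Spatial autoregression (SAR) model: $y = \rho W y + x\beta + \varepsilon$
Log-likelihood: $\ln(L) = \ln|I - \rho W| - \frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln(\sigma^2) - \frac{SSE}{2\sigma^2}$
[Figure labels: River, Station]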
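For the spatial autoregression model y = ρWy + xβ + ε, fixing ρ gives closed-form estimates of β and σ² = SSE/n, leaving the concentrated log-likelihood ln L(ρ) = ln|I - ρW| - (n/2) ln(2π) - (n/2) ln(σ²) - SSE/(2σ²). The log-determinant term is what makes estimation compute-intensive: every candidate ρ needs an n x n determinant, O(n³) with dense elimination. A minimal pure-Python sketch, assuming one covariate without intercept, a row-standardized contiguity matrix W for 8 sites on a line, and a grid search over ρ; the data and noise values are hypothetical, and practical implementations rely on sparse or approximate log-determinants.

```python
import math


def log_abs_det(a):
    """log|det(A)| by Gaussian elimination with partial pivoting (O(n^3))."""
    m = [row[:] for row in a]
    n = len(m)
    total = 0.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0:
            return float("-inf")
        m[col], m[pivot] = m[pivot], m[col]
        total += math.log(abs(m[col][col]))
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= f * m[col][c]
    return total


def sar_loglik(rho, y, x, W):
    """Concentrated log-likelihood of y = rho*W*y + x*beta + eps for a fixed rho."""
    n = len(y)
    a = [[(1.0 if i == j else 0.0) - rho * W[i][j] for j in range(n)] for i in range(n)]
    z = [sum(a[i][j] * y[j] for j in range(n)) for i in range(n)]  # (I - rho W) y
    beta = sum(xi * zi for xi, zi in zip(x, z)) / sum(xi * xi for xi in x)  # OLS, no intercept
    sse = sum((zi - beta * xi) ** 2 for xi, zi in zip(x, z))
    sigma2 = sse / n
    ll = (log_abs_det(a) - n / 2 * math.log(2 * math.pi)
          - n / 2 * math.log(sigma2) - sse / (2 * sigma2))
    return ll, beta


def fit_sar(y, x, W, grid):
    """Grid search over rho; each candidate costs one n x n log-determinant."""
    best_rho = max(grid, key=lambda r: sar_loglik(r, y, x, W)[0])
    return best_rho, sar_loglik(best_rho, y, x, W)[1]


n = 8
# Row-standardized contiguity: each site's neighbors are the adjacent sites on a line
W = [[0.0] * n for _ in range(n)]
for i in range(n):
    nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n]
    for j in nbrs:
        W[i][j] = 1 / len(nbrs)
x = [1.0, 2.0, 1.5, 3.0, 2.5, 4.0, 3.5, 5.0]
eps = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3]
# Simulate y = 0.5*W*y + 2*x + eps by fixed-point iteration (a contraction since |rho| < 1)
y = [0.0] * n
for _ in range(200):
    y = [0.5 * sum(W[i][j] * y[j] for j in range(n)) + 2.0 * x[i] + eps[i] for i in range(n)]

grid = [r / 20 for r in range(-19, 20)]
rho_hat, beta_hat = fit_sar(y, x, W, grid)
```

Ordinary least squares corresponds to fixing ρ = 0, where the determinant term vanishes; allowing ρ to vary is what brings in the expensive determinant for every candidate value.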
Example 3: Global vs. Local Regression
Example: Lilac Phenology data
Yearly date of first leaf and first bloom at 1126 locations in the US & Canada
"Global" regression model shows a mystery
Positive slope => blooms delayed in recent years!
Spatial decomposition solves the mystery
East of Mississippi, West of Mississippi
Each half has a negative slope => blooms earlier in recent years! However, the slopes differ between east and west
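The lilac pattern is an instance of a global fit hiding opposite local trends. A minimal sketch, assuming one least-squares slope per region (the spatial decomposition described above, not geographically weighted regression); the six (region, year, day-of-year) observations are hypothetical, chosen so that each region has a negative slope while the pooled fit is positive.

```python
def ols_slope(points):
    """Least-squares slope of day-of-year on year for a list of (year, day) pairs."""
    n = len(points)
    t_bar = sum(t for t, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in points)
    sxx = sum((t - t_bar) ** 2 for t, _ in points)
    return sxy / sxx


# Hypothetical observations: (region, year, day of first bloom)
obs = [
    ("east", 1960, 130), ("east", 1965, 128), ("east", 1970, 126),
    ("west", 1990, 150), ("west", 1995, 147), ("west", 2000, 144),
]

global_slope = ols_slope([(t, y) for _, t, y in obs])
local_slopes = {
    region: ols_slope([(t, y) for r, t, y in obs if r == region])
    for region in ("east", "west")
}
```

Here the pooled slope is about +0.56 days per year while the east and west slopes are -0.4 and -0.6: the western sites both bloom later and are observed in later years, so pooling them with the eastern sites produces an upward trend that neither region shows.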
Outline
• Motivation
• What is Spatial Big Data (SBD)?
• SBD and Science
• SBD Analytics
Spatial Big Data (SBD) Summary
SBD are becoming available
Geo-social media, geo-sensor networks, geo-simulations, VGI (volunteered geographic information), …
Big Opportunities
Data:
Quicker detection of disease outbreaks, e.g., Google Flu Trends
Multi-decade large-area studies, e.g., Gulf Study, Exposomics, …
Intervention:
How can geo-social networks induce desired behavior?
Health effects of friends, e.g., smoking, drinking, exercise, nutrition, optimism, …
Large-scale collaboration on complex questions
Studies with thousands of doctors and a hundred million humans
... and Big Responsibilities
Institutions and culture of science remain rooted in that pre-electronic era.
Ex. Hotspots to Mean Streets