
Spatial Big Data

Shashi Shekhar

McKnight Distinguished University Professor

Department of Computer Science and Engineering, University of Minnesota

www.cs.umn.edu/~shekhar

AAG-NIH Symp. on Enabling a National Geospatial Cyberinfrastructure for Health Research (July 2012)

More details in S. Shekhar et al., Spatial Big Data Challenges Intersecting Mobility and Cloud Computing, ACM

(2)

Research Theme 1: Spatial Databases

Only in old plan, only in new plan, in both plans

Evacuation Route Planning

Parallelize Range Queries

Storing graphs in disk blocks

Shortest Paths

(3)

Theme 2 : Spatial Data Mining

Nest locations

Distance to open water

Vegetation durability

Water depth

Location prediction: nesting sites

Spatial outliers: sensor (#9) on I-35

(4)


Outline

Motivation

What is Spatial Big Data (SBD)?

SBD and Science

SBD Analytics

(5)

Big Data

Mining and analyzing these big new data sets can open the door to a new wave of innovation, accelerating productivity and economic growth. Some economists, academics and business executives see an opportunity to move beyond the payoff of the first stage of the Internet, which combined computing and low-cost communications to automate all kinds of commercial transactions.

Estimated value: > USD 1 trillion per year by 2020

• Location-based service: USD 600 B

• Health informatics: USD 300 B

• Manufacturing: …

(6)


Spatial Big Data Definitions

Spatial datasets exceeding capacity of current computing systems

• To manage, process, or analyze the data with reasonable effort

• Due to Volume, Velocity, Variety, …

SBD Components

• Data-intensive Computing: Cloud Computing

• Middleware, e.g., Map-Reduce, Pregel, Big-Table, …

• Big-Data analytics, e.g., data mining, machine learning, computational statistics, …

• Big Data science and societal applications

• Ex. Social media datasets, e.g., Google Flu Trends

• Which patterns may be detected in these datasets?

• Flu outbreaks ?

(7)


Traditional Spatial Data

Spatial attribute:

Neighborhood and extent

Geo-Reference: longitude, latitude, elevation

Spatial data genre

• Raster: geo-images, e.g., Google Earth

• Vector: point, line, polygons

• Graph, e.g., roadmap: node, edge, path

Raster Data for UMN Campus (Courtesy: UMN)

Vector Data for UMN Campus (Courtesy: MapQuest)

Graph Data for UMN Campus

(8)


Raster SBD

LiDAR & Urban Terrain

Change Detection

Feature Extraction

Average Monthly Temperature (Courtesy: Prof. V. Kumar)

Data Sets >> Google Earth

• Geo-videos from UAVs, security cameras

• Satellite imagery (periodic scan), LiDAR, …

• Geo-sensor networks

• Climate simulation, EPA Air Quality

Example use cases

Patterns of Life

(9)


Weekday GPS track for 3 months

Patterns of life

Activity Space: Usual places and visits

Rare places, rare visits

Places: Work, Home, Club, Farm

Visit counts by place and time slot. Columns, in order: Morning (7am – 12noon), Afternoon (12noon – 5pm), Evening (5pm – 12midnight), Midnight (12midnight – 7am), Total

Home: 10, 2, 15, 29, 54
Work: 19, 20, 10, 1, 50
Club: 4, 5, 4, 15
Farm: 1, 1
Total: 30, 30, 30, 30, 120
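A place-by-time-slot count of this kind can be derived from time-stamped place visits. The following is a minimal sketch, not the procedure used in the talk: the slot boundaries, the `observations` list, and the 5% threshold for a "rare place" are all assumptions chosen for illustration.

```python
from collections import Counter

# Assumed time slots as (name, first hour inclusive, last hour exclusive).
SLOTS = [("Midnight", 0, 7), ("Morning", 7, 12),
         ("Afternoon", 12, 17), ("Evening", 17, 24)]


def slot_of(hour):
    """Map an hour of day (0-23) to its time-slot name."""
    for name, start, end in SLOTS:
        if start <= hour < end:
            return name
    raise ValueError(f"hour out of range: {hour}")


def activity_table(observations):
    """Count visits per (place, slot) from (hour, place) observations."""
    table = Counter()
    for hour, place in observations:
        table[(place, slot_of(hour))] += 1
    return table


def rare_places(table, max_share=0.05):
    """Places whose share of all visits is at most max_share."""
    totals = Counter()
    for (place, _slot), count in table.items():
        totals[place] += count
    grand_total = sum(totals.values())
    return sorted(p for p, c in totals.items() if c / grand_total <= max_share)


# Hypothetical observations: (hour of day, place label).
observations = (
    [(8, "Work")] * 10 + [(14, "Work")] * 10
    + [(20, "Home")] * 8 + [(2, "Home")] * 10
    + [(19, "Club")] * 3 + [(9, "Farm")] * 1
)

table = activity_table(observations)
rare = rare_places(table)
```

With real GPS tracks, each fix would first have to be matched to a named place; that step is omitted here.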

(10)


Vector SBD from Geo-Social Media

Vector data sub-genre

Point: location of a tweet, Ushahidi report, check-in, …

Line-strings, Polygons: roads in OpenStreetMap

Use cases: Persistent Surveillance

Outbreaks of disease, Disaster, Unrest, Crime, …

Hot-spots, emerging hot-spots

(11)


Persistent Surveillance at American Red Cross

Even before cable news outlets began reporting the tornadoes that ripped through Texas on Tuesday, a map of the state began blinking red on a screen in the Red Cross' new social media monitoring center, alerting weather watchers that something was happening in the hard-hit area. (AP, April 16th, 2012)

(12)


Graphs SBDs: Temporally Detailed

Spatial Graphs, e.g., Roadmaps, Electric grid, Supply Chains, …

Temporally detailed roadmaps

[Navteq]

Use cases: Accessibility by time of week, Best start time, Best route at different start-times
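The use cases above (best start time, best route at different start times) can be sketched as an earliest-arrival search on a graph whose edge travel times depend on the departure time. This is a minimal sketch, not the method used in the talk: the three-node graph, the `congested` cost profile, and all travel times are hypothetical. Plain Dijkstra is exact here only because these cost functions satisfy the FIFO (no-overtaking) property, meaning that leaving later never lets a traveler arrive earlier.

```python
import heapq


def congested(base, extra, peak, width):
    """Travel time (minutes) that rises linearly to base+extra at `peak`.

    extra/width < 1 keeps the edge FIFO (no overtaking).
    """
    def cost(t):
        return base + extra * max(0.0, 1.0 - abs(t - peak) / width)
    return cost


def earliest_arrival(graph, source, target, depart):
    """Time-dependent Dijkstra: returns (arrival time, path), no waiting."""
    best = {source: depart}
    prev = {}
    heap = [(depart, source)]
    while heap:
        t, u = heapq.heappop(heap)
        if u == target:
            break
        if t > best.get(u, float("inf")):
            continue  # stale heap entry
        for v, cost in graph.get(u, []):
            arrive = t + cost(t)  # edge cost evaluated at departure time t
            if arrive < best.get(v, float("inf")):
                best[v] = arrive
                prev[v] = u
                heapq.heappush(heap, (arrive, v))
    if target not in best:
        return float("inf"), []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return best[target], path[::-1]


# Hypothetical roadmap; times are minutes after midnight.
highway = congested(base=10.0, extra=20.0, peak=480.0, width=60.0)


def side_street(t):
    """Constant 9-minute hop, independent of departure time."""
    return 9.0


graph = {
    "A": [("B", highway), ("C", side_street)],
    "C": [("B", side_street)],
}

# Best route at different start times.
early = earliest_arrival(graph, "A", "B", 360.0)  # 6:00
rush = earliest_arrival(graph, "A", "B", 480.0)   # 8:00

# Best start time among a few candidates: shortest door-to-door duration.
candidates = [360.0, 420.0, 450.0, 480.0, 540.0]
best_start = min(
    candidates, key=lambda s: earliest_arrival(graph, "A", "B", s)[0] - s
)
```

In this toy network the highway is chosen at 6:00 and the side streets at 8:00, which is the kind of start-time-dependent answer a temporally detailed roadmap makes possible.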

(13)


Outline

Motivation

What is Spatial Big Data (SBD)?

SBD and Science

SBD Analytics

(14)

Big Data and Science

Nature, 7209(4), September 4, 2008

"Above all, data on today's scale require scientific and computational intelligence. Google may now have its critics, but no one can deny its impact, which ultimately stems from the cleverness of its informatics. The future of science depends in part on such cleverness again being applied to data for their own sake, complementing scientific hypotheses as a basis for exploring today's information cornucopia."

Science in the Petabyte Era –

• Increasing Volume

• Heightened Complexity

(15)

Preparing Science for Big-Data

Nature, 7209(4), September 4, 2008

Big Data Translates into Big Opportunities...

and Big Responsibilities

Sudden influxes of data have transformed researchers' understanding of nature before — even back in the days when 'computer' was still a job description.

Unfortunately, the institutions and culture of science remain rooted in that pre-electronic era. Taking full advantage of electronic data will require a great deal of additional infrastructure, both technical and cultural.

(16)

Models in Science

Science: understand natural world

Subjective → Objective (transparent, reproducible)

Methods: Forward models, Backward models

Engineering: Solve problems optimizing cost, efficiency, etc.

Models: Manual (paper, pencil, slide-rules, log-tables, …) vs. Assisted by computers (HPCC, cyber-infrastructure, data-intensive, big-data)

Forward

• Manual: Differential Equations (D.E.), Algebraic equations, …

• Assisted by computers: Computational Simulations using D.E.s, Agent-based models, etc.

Backward

• Manual: Parametric models, e.g. Regression, Correlations, sampling, Experiment design, Hypothesis testing, …

• Assisted by computers:

Bayesian: resampling, local regression, MCMC, kernel density estimation, neural networks, generalized additive models, …

Frequentist: frequent patterns, Model ensembles, hypothesis generation, …

Exploratory Data Analysis: data visualization, visual analytics, geographic information science, spatial data mining, …

(17)


Outline

Motivation

What is Spatial Big Data (SBD)?

SBD and Science

SBD Analytics

SBD Infrastructure

(18)

Pre-Electronic Era Models: Example 1

1854 Cholera in London

Broad St. water pump except a brewery

Recent Decades

(19)

From Hotspots To Mean Streets


• Complication Dimensions

• Spatial Networks

Time

• Challenges: Trade-off between semantic richness and

(20)

Innovative Technique: K Main Routes (KMR)

Legend: KMR Routes (10): thick lines; CrimeStat K-Means (10): ellipses; Roads: gray lines; Burglaries: points

(21)

Pre-Electronic Models: Example 2

Location Prediction

Models to predict location, time, path, …

Nest sites, minerals, earthquakes, tornadoes, …

Pre-electronic models, e.g. Regression

Assumed i.i.d

To simplify parameter estimation

Least squares – easy to hand-compute

Alternatives

Spatial Autoregression,

Geographic Weighted (Local) Regression

Parameter estimation is compute-intensive!

Next

• Non-i.i.d. errors: Distance based

Spatio-temporal vector fields (e.g. flows, motion)

$$ y = \rho W y + X\beta + \varepsilon $$

$$ \ln(L) = \ln\lvert I - \rho W \rvert - \frac{n \ln(2\pi)}{2} - \frac{n \ln(\sigma^{2})}{2} - SSE $$
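To see why parameter estimation for spatial autoregression is compute-intensive, it helps to evaluate the likelihood directly. The sketch below assumes the standard spatial autoregressive (SAR) form y = ρWy + Xβ + ε with Gaussian errors and searches a grid of ρ values. It is an illustration, not the talk's algorithm: the ring-shaped neighbor matrix `W`, the simulated data, and the grid are assumptions, and the last likelihood term is written as SSE/(2σ²), assuming any SSE term on the slide absorbs that scaling. Each candidate ρ needs a log-determinant of an n-by-n matrix, which dominates the cost as n grows, whereas ordinary least squares needs none.

```python
import numpy as np


def sar_log_likelihood(rho, y, X, W):
    """Gaussian log-likelihood of y = rho*W*y + X*beta + eps.

    beta and sigma^2 are replaced by their maximum-likelihood values
    for the given rho (a concentrated likelihood).
    """
    n = len(y)
    A = np.eye(n) - rho * W
    Ay = A @ y
    beta, *_ = np.linalg.lstsq(X, Ay, rcond=None)  # least squares on filtered y
    resid = Ay - X @ beta
    sse = float(resid @ resid)
    sigma2 = sse / n
    _, logdet = np.linalg.slogdet(A)  # ln|I - rho*W|, the costly term
    return (logdet
            - 0.5 * n * np.log(2 * np.pi)
            - 0.5 * n * np.log(sigma2)
            - sse / (2 * sigma2))


# Hypothetical setup: n sites on a ring, each with two equally weighted neighbors.
rng = np.random.default_rng(0)
n = 50
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5

X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_rho = 0.6
noise = rng.normal(scale=0.5, size=n)
y = np.linalg.solve(np.eye(n) - true_rho * W, X @ np.array([1.0, 2.0]) + noise)

# Grid search over rho; every grid point costs one log-determinant.
rhos = np.linspace(-0.9, 0.9, 37)
best_rho = max(rhos, key=lambda r: sar_log_likelihood(r, y, X, W))
```

At ρ = 0 the log-determinant vanishes and the expression reduces to the ordinary least-squares likelihood, which is the i.i.d. special case.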

(22)

River Station

Example 3: Global vs. Local Regression

Example: Lilac Phenology data

Yearly date of first leaf and first bloom

1126 locations in US & Canada

"Global" regression model shows a mystery

Positive slope => blooms delayed in recent years!

Spatial decomposition solves the mystery

East of Mississippi, West of Mississippi

Each half has a negative slope => blooms earlier in recent years!

However, slopes are different across east & west
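The sign reversal between a pooled fit and per-region fits can be reproduced on a toy dataset. The sketch below uses synthetic numbers, not the lilac phenology data: the two regions, their observation years, and their bloom days are invented. Here the reversal arises because the later-blooming region is observed in later years, which is one possible mechanism and an assumption of this example.

```python
import numpy as np

# Synthetic data: x = years since 1960, y = bloom day of year.
east_x = np.arange(0, 20, dtype=float)    # observed 1960-1979
east_y = 100.0 - 0.5 * east_x             # blooms earlier over time
west_x = np.arange(20, 40, dtype=float)   # observed 1980-1999
west_y = 130.0 - 0.2 * (west_x - 20.0)    # later baseline, also trending earlier


def slope(x, y):
    """Least-squares slope of y on x."""
    return np.polyfit(x, y, 1)[0]


# "Global" model: one regression over all locations pooled together.
global_slope = slope(np.concatenate([east_x, west_x]),
                     np.concatenate([east_y, west_y]))

# Spatial decomposition: one regression per region.
east_slope = slope(east_x, east_y)
west_slope = slope(west_x, west_y)

print(f"global: {global_slope:+.3f}  east: {east_slope:+.3f}  west: {west_slope:+.3f}")
```

The pooled slope comes out positive while both regional slopes are negative and differ from each other, which is the pattern the slide describes.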

(23)


Outline

Motivation

What is Spatial Big Data (SBD)?

SBD and Science

SBD Analytics

(24)


Spatial Big Data (SBD) Summary

SBD are becoming available

Geo-social Media, Geo-Sensor Networks, Geo-Simulations, VGI, …

Big Opportunities

Data:

• Quicker detection of disease outbreaks, e.g., Google Flu Trends

• Multi-decade large-area studies, e.g., Gulf Study, Exposomics, …

Intervention:

• How can geo-social networks induce desired behavior?

• Health effects of friends, e.g., smoking, drinking, exercise, nutrition, optimism, …

Large scale Collaboration on Complex Questions

Studies with thousands of doctors and hundred million humans

... and Big Responsibilities

Institutions and culture of science remain rooted in the pre-electronic era.

• Ex. Hotspots to Mean Streets

(25)


CCC Workshop: Spatial Computing Visioning (9/10-11/2012)
