Adding Big Earth Data Analytics to GEOSS
GEO IX Plenary
Foz do Iguacu, 2012-nov-20
Peter Baumann, Stefano Nativi
Jacobs University Germany, CNR Italy
Research funded through EU FP7 283610 EarthServer – European Scalable Earth Science Service Environment
Features & Coverages
The basis of all: geographic feature
• = abstraction of a real-world phenomenon [OGC, ISO]
• associated with a location relative to Earth
Special kind of feature: coverage
• Typical representative: raster image
• ...but there is more!
"Big Data": The 4 Vs
Volume
Velocity
Variety
Veracity
Raster Data Volume
Social Networks
• Incidence matrix of size 10^8 x 10^8 ...now do linear algebra!
Satellite Imagery
• ngEO planning: 10^12 images under ESA custody
HPC
• "Even with multi-terabyte local disk sub-systems and multi-petabyte archives, I/O can become a bottleneck in HPC." -- Jeanette Jenness, LLNL, ASCI-Project, 1998
• "Users download 10x more data than needed" -- Kerstin Kleese van Dam, 2002
Raster Data Velocity
NASA MODIS instrument on board AQUA & TERRA
• ~ 1 TB per day
LOFAR: distributed sensor array farms for radio astronomy
• 3 GB per second per station sustained, consolidated into 2 – 3 PB per year
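To put the two rates above on a common scale, the following is a small back-of-the-envelope sketch. It assumes decimal units (1 TB = 1000 GB, 1 PB = 1000 TB) and a 365-day year; the variable names are illustrative only. Under these assumptions, a single LOFAR station at 3 GB per second produces roughly 259 TB per day, or about 95 PB per year before consolidation, compared with about 0.37 PB per year for MODIS.

```python
# Assumptions (not from the slides): decimal units and a 365-day year.
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365

# MODIS: ~1 TB per day
modis_pb_per_year = 1 * DAYS_PER_YEAR / 1000

# LOFAR: 3 GB per second per station, sustained
lofar_station_tb_per_day = 3 * SECONDS_PER_DAY / 1000
lofar_station_pb_per_year = lofar_station_tb_per_day * DAYS_PER_YEAR / 1000

print(f"MODIS:         {modis_pb_per_year:.3f} PB/year")
print(f"LOFAR station: {lofar_station_tb_per_day:.1f} TB/day, "
      f"{lofar_station_pb_per_year:.1f} PB/year before consolidation")
```

The gap between the raw per-station rate and the consolidated 2 – 3 PB per year shows how much reduction happens before archiving.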
Raster Data Variety
M. Stonebraker: Sensor, image, model, & statistics data
• Life Science: Pharma/chem, healthcare / bio research, bio statistics, genetics, ...
• Geo: Geodesy, geology, hydrology, oceanography, meteorology, earth system, ...
• Engineering & research: Simulation & experimental data in automotive/shipbuilding/aerospace industry, turbines, process industry, astronomy, high energy physics, ...
• Management/Controlling: Decision Support, OLAP, Data Warehousing, census, statistics in industry and public administration, ...
• Multimedia: e-learning, distance learning, prepress, ...
"80% of all data have some spatial connotation" [C&P Hane, 1992]
Raster Data Variety: Coverages
• n-D "space/time-varying phenomenon" [ISO 19123, OGC 09-146r2]
[Figure: coverage type diagram. «FeatureType» Abstract Coverage; MultiPoint Coverage, MultiCurve Coverage, MultiSurface Coverage, MultiSolid Coverage, Grid Coverage, Referenceable GridCoverage, Rectified GridCoverage]
Raster Data Veracity
Both measured and computed data need to carry quality information as part of provenance
Sometimes established (costly!) procedures for error estimation, sometimes not
Ex: Satellite image processing, from L0 to L2
• Many quality criteria determined, but hardwired
• Error propagation is far from customary
What to do with this information?
• Complicates life of data consumer dramatically!
Let's Take a Closer Look...
Remember? "Users download 10x more data than needed" [Kerstin Kleese van Dam, 2002]
Divergent access patterns for ingest and retrieval
Server must mediate between access patterns [Diedrich et al 2001]
The rasdaman Raster Analytics Server
Use Case: Raster DBMS for massive n-D raster data
rasql = SQL with integrated raster processing
• Tile-based architecture
• n-D array as a set of n-D tiles
Extensive optimization, hw/sw parallelization
In operational use
• dozen-Terabyte objects
• Analytics queries in 50 ms on a laptop
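The idea of storing an n-D array as a set of n-D tiles can be illustrated with a minimal in-memory sketch. This is not rasdaman's implementation: the functions `split_into_tiles` and `read_subset`, the dictionary keyed by tile origin, and the regular tiling are all assumptions made for illustration. The point it shows is that a subset request only has to touch the tiles it intersects.

```python
import itertools
import numpy as np

def split_into_tiles(array, tile_shape):
    """Cut an n-D array into tiles, keyed by each tile's origin coordinate."""
    tiles = {}
    ranges = [range(0, n, t) for n, t in zip(array.shape, tile_shape)]
    for origin in itertools.product(*ranges):
        window = tuple(slice(o, o + t) for o, t in zip(origin, tile_shape))
        tiles[origin] = array[window].copy()  # edge tiles may be smaller
    return tiles

def read_subset(tiles, tile_shape, lo, hi):
    """Assemble the half-open box [lo, hi) from only the tiles it intersects."""
    dtype = next(iter(tiles.values())).dtype
    out = np.empty([h - l for l, h in zip(lo, hi)], dtype=dtype)
    touched = 0
    # Origins of all tiles that overlap the requested box
    ranges = [range((l // t) * t, h, t) for l, h, t in zip(lo, hi, tile_shape)]
    for origin in itertools.product(*ranges):
        tile = tiles[origin]
        # Overlap between this tile and the request, in global coordinates
        start = [max(l, o) for l, o in zip(lo, origin)]
        stop = [min(h, o + s) for h, o, s in zip(hi, origin, tile.shape)]
        src = tuple(slice(a - o, b - o) for a, b, o in zip(start, stop, origin))
        dst = tuple(slice(a - l, b - l) for a, b, l in zip(start, stop, lo))
        out[dst] = tile[src]
        touched += 1
    return out, touched

cube = np.arange(10 * 12 * 6).reshape(10, 12, 6)   # toy 3-D array
tiles = split_into_tiles(cube, (4, 4, 4))
subset, touched = read_subset(tiles, (4, 4, 4), (2, 3, 1), (6, 9, 3))
print(f"read {touched} of {len(tiles)} tiles")
```

In a real server the tiles would live on disk or in a database, so the number of tiles touched translates directly into I/O.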
select img.green[x0:x1,y0:y1] > 130
from LandsatArchive as img
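For readers more familiar with array libraries, here is a rough NumPy analogue of what the query above computes: cut out a window of the green band and compare every cell against 130. The array `green` and the function `green_above` are hypothetical stand-ins, not part of rasdaman. One detail worth noting is that rasql intervals include both bounds, while Python slices exclude the upper one.

```python
import numpy as np

# Hypothetical stand-in for the green band of one LandsatArchive image
green = (np.arange(48).reshape(6, 8) * 5).astype(np.uint8)

def green_above(green, x0, x1, y0, y1, threshold=130):
    """Boolean mask of cells in [x0:x1, y0:y1] whose green value exceeds threshold."""
    # rasql intervals are closed, so add 1 to the upper bounds of the slices
    return green[x0:x1 + 1, y0:y1 + 1] > threshold

mask = green_above(green, 1, 3, 2, 5)
print(mask.shape, int(mask.sum()))
```

The difference is where the work happens: the query runs next to the data and ships only the mask, whereas the NumPy version needs the band in client memory first.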
Query Processing in a Federation
Heterogeneous federation / cloud
• Can optimize for data location, transport volume, node load, ...
Work in progress
select encode(
    ( (A.nir - A.red) / (A.nir + A.red)
    - (B.nir - B.red) / (B.nir + B.red)
    ), "HDF5" )
from A, B
select encode(
    (A.nir - A.red) / (A.nir + A.red), "array-compressed" )
from A
select encode(
    (B.nir - B.red) / (B.nir + B.red), "array-compressed" )
from B
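The split of the NDVI difference query into one subquery per array can be mimicked in a few lines. This is a sketch under stated assumptions, not the federation protocol itself: `node_query` and `coordinator` are hypothetical names, zlib stands in for the "array-compressed" transfer format, and both nodes are simulated in one process.

```python
import zlib
import numpy as np

def ndvi(nir, red):
    """(nir - red) / (nir + red), computed in floating point."""
    nir = nir.astype(np.float64)
    red = red.astype(np.float64)
    return (nir - red) / (nir + red)

def node_query(nir, red):
    """Runs where the data lives: compute NDVI locally, ship it compressed."""
    result = ndvi(nir, red).astype(np.float32)
    return result.shape, zlib.compress(result.tobytes())

def coordinator(reply_a, reply_b):
    """Combine the two partial results into the final difference."""
    def unpack(reply):
        shape, payload = reply
        flat = np.frombuffer(zlib.decompress(payload), dtype=np.float32)
        return flat.reshape(shape)
    return unpack(reply_a) - unpack(reply_b)

# Toy data for the two arrays A and B (values chosen so nir + red > 0)
a_nir = np.array([[0.8, 0.6], [0.5, 0.7]]); a_red = np.array([[0.2, 0.3], [0.4, 0.1]])
b_nir = np.array([[0.7, 0.5], [0.6, 0.9]]); b_red = np.array([[0.3, 0.2], [0.2, 0.3]])

diff = coordinator(node_query(a_nir, a_red), node_query(b_nir, b_red))
print(diff)
```

Each node ships one derived band instead of two raw bands, which is the kind of transport-volume saving the optimizer can aim for.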
Raster Query Language: ad-hoc navigation, extraction, aggregation, analytics
Time series
Image processing
Summary data
Sensor fusion & pattern mining
3D Clients: Experiments
Problem: coupling DB / visualization
Approach:
• deliver RGBA image to X3D client, transparency as height
• Feed directly into client GPU
select encode(
    { red:   (char) s.b7[x0:x1,x0:x1],
      green: (char) s.b5[x0:x1,x0:x1],
      blue:  (char) s.b0[x0:x1,x0:x1],
      alpha: (char) scale( d, 20 )
    },
    "png" )
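The "transparency as height" trick can be sketched with NumPy: three image bands go into the colour channels and an elevation grid is squeezed into the alpha channel. The function `pack_rgba_with_height` is a hypothetical illustration. It assumes the elevation grid already matches the image grid and is rescaled linearly to 0..255, which is not necessarily what the query above does; PNG encoding is left out.

```python
import numpy as np

def pack_rgba_with_height(red, green, blue, height):
    """Stack three bands plus a height grid into an (H, W, 4) uint8 array."""
    h = height.astype(np.float64)
    span = h.max() - h.min()
    # Assumption: linear rescale of heights to 0..255; flat terrain maps to 0
    alpha = np.zeros_like(h) if span == 0 else (h - h.min()) / span * 255.0
    return np.dstack([red, green, blue, np.rint(alpha)]).astype(np.uint8)

# Toy 2x2 bands and elevation values
band = np.full((2, 2), 10, dtype=np.uint8)
elevation = np.array([[0.0, 50.0], [100.0, 200.0]])
rgba = pack_rgba_with_height(band, band, band, elevation)
print(rgba[..., 3])
```

A client can then upload this single texture and read the fourth channel as a displacement value when rendering.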
EarthServer: Big Earth Data Analytics
Scalable On-Demand Analytics & Fusion for all Earth Sciences
• 11 partners (lead: JacobsU), US$ 7 million budget, 2011-sep-01 – 2014-aug-31
6 * 100+ TB databases for all Earth sciences + planetary science
www.earthserver.eu