
Adding Big Earth Data Analytics to GEOSS


(1)


GEO IX Plenary

Foz do Iguacu, 2012-nov-20

Peter Baumann, Stefano Nativi

Jacobs University Germany, CNR Italy


Research funded through EU FP7 283610 EarthServer – European Scalable Earth Science Service Environment

(2)

Features & Coverages

The basis of all: the geographic feature = abstraction of a real-world phenomenon associated with a location relative to the Earth [OGC, ISO]

A special kind of feature: the coverage

Typical representative: the raster image

...but there is more!
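ISO 19123 describes a coverage as a feature that acts as a function, returning a value from its range for any direct position within its spatiotemporal domain. A minimal Python sketch of the raster case only follows; the class name, its fields and the nearest-cell lookup are assumptions for illustration, not the ISO 19123 or GMLCOV schema.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GridCoverage:
    """Hypothetical 2-D gridded coverage: a function from positions to values."""
    origin: Tuple[float, float]      # (x, y) of the lower corner of cell (0, 0)
    resolution: Tuple[float, float]  # cell size along x and y
    values: List[List[float]]        # range set, indexed values[row][col]

    def evaluate(self, x: float, y: float) -> float:
        """Return the value of the grid cell that contains position (x, y)."""
        col = int((x - self.origin[0]) // self.resolution[0])
        row = int((y - self.origin[1]) // self.resolution[1])
        if not (0 <= row < len(self.values) and 0 <= col < len(self.values[0])):
            raise ValueError("position lies outside the coverage domain")
        return self.values[row][col]


cov = GridCoverage(origin=(0.0, 0.0), resolution=(10.0, 10.0), values=[[1, 2], [3, 4]])
print(cov.evaluate(15.0, 5.0))  # cell row 0, col 1
```

The other coverage types (multi-point, multi-curve, and so on) differ in how the domain is described, not in this "position in, value out" view.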

(3)

"Big Data": The 4 Vs

Volume

Velocity

Variety

Veracity

(4)

Raster Data Volume

Social Networks

Incidence matrix of size 10^8 x 10^8 ...now do linear algebra!

Satellite Imagery

ngEO planning: 10^12 images under ESA custody

HPC

"Even with multi-terabyte local disk sub-systems and multi-petabyte archives, I/O can become a bottleneck in HPC."

-- Jeanette Jenness, LLNL, ASCI Project, 1998

"Users download 10x more data than needed"

-- Kerstin Kleese van Dam, 2002
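To put the social-network example in perspective: a dense 10^8 x 10^8 matrix has 10^16 cells. The slide gives no cell size, so the two sizes below are assumptions; real incidence matrices are sparse, so these figures are an upper bound for naive dense storage.

```python
PB = 10**15  # decimal petabyte


def dense_matrix_bytes(rows: int, cols: int, bytes_per_cell: int) -> int:
    """Storage needed for a dense, uncompressed matrix."""
    return rows * cols * bytes_per_cell


# Assumed cell sizes: 1 byte (e.g. a 0/1 flag) and 8 bytes (a double).
for label, cell_size in [("1-byte cells", 1), ("8-byte doubles", 8)]:
    size = dense_matrix_bytes(10**8, 10**8, cell_size)
    print(f"{label}: {size / PB:.0f} PB")
```

Even in the cheapest case this is 10 PB before any linear algebra starts.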

(5)

Raster Data Velocity

NASA MODIS instrument on board Aqua & Terra: ~1 TB per day

LOFAR: distributed sensor array farms for radio astronomy

3 GB per second per station sustained, consolidated into 2-3 PB per year

M. Stonebraker:
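The velocity figures translate into sustained rates with simple arithmetic. The sketch assumes decimal units and a 365-day year; it only restates the numbers on this slide.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365
TB, PB = 10**12, 10**15


def sustained_rate(bytes_per_day: float) -> float:
    """Average bytes per second needed to keep up with a daily volume."""
    return bytes_per_day / SECONDS_PER_DAY


def yearly_volume(bytes_per_second: float) -> float:
    """Bytes accumulated in one year at a constant rate."""
    return bytes_per_second * SECONDS_PER_DAY * DAYS_PER_YEAR


modis_rate = sustained_rate(1 * TB)    # MODIS: ~1 TB per day
lofar_raw = yearly_volume(3 * 10**9)   # one LOFAR station at 3 GB/s
print(f"MODIS: {modis_rate / 10**6:.1f} MB/s sustained")
print(f"LOFAR, one station, raw: {lofar_raw / PB:.1f} PB per year")
```

A single LOFAR station would produce roughly 95 PB of raw data per year, so consolidating to 2-3 PB per year means the archive keeps well under a tenth of the raw stream.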

(6)

Raster Data Variety

Sensor, image, model, and statistics data

Life science: pharma/chem, healthcare/bio research, biostatistics, genetics, ...

Geo: geodesy, geology, hydrology, oceanography, meteorology, earth system, ...

Engineering & research: simulation & experimental data in the automotive/shipbuilding/aerospace industry, turbines, process industry, astronomy, high-energy physics, ...

Management/controlling: decision support, OLAP, data warehousing, census, statistics in industry and public administration, ...

Multimedia: e-learning, distance learning, prepress, ...

"80% of all data have some spatial connotation" [C&P Hane, 1992]

(7)

Raster Data Variety: Coverages

Coverage = n-D "space/time-varying phenomenon" [ISO 19123, OGC 09-146r2]

«FeatureType» AbstractCoverage, specialized into MultiPointCoverage, MultiCurveCoverage, MultiSurfaceCoverage, MultiSolidCoverage, GridCoverage, ReferenceableGridCoverage, RectifiedGridCoverage

(8)

Raster Data Veracity

Both measured and computed data need to carry quality information as part of their provenance

For error estimation there are sometimes established (costly!) procedures, sometimes not

Ex: satellite image processing, from L0 to L2

Many quality criteria are determined, but hardwired

Error propagation is by no means always customary

What to do with this information?

This complicates the data consumer's life dramatically!
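What carrying quality information through a computation can look like, in a minimal sketch: first-order (linearized) error propagation for the band ratio (nir - red) / (nir + red), i.e. NDVI, which appears in the federation example later in this deck. It assumes independent, uncorrelated errors in the two bands; the function name is hypothetical, and operational L0-L2 chains use their own, often more elaborate, error models.

```python
import math


def ndvi_with_uncertainty(nir, red, sigma_nir, sigma_red):
    """Return (NDVI, propagated standard deviation) for one pixel.

    Uses sigma_f^2 = (df/dnir * sigma_nir)^2 + (df/dred * sigma_red)^2,
    which holds to first order for independent errors.
    """
    s = nir + red
    if s == 0:
        raise ZeroDivisionError("nir + red must be non-zero")
    ndvi = (nir - red) / s
    d_nir = 2 * red / s**2    # partial derivative of NDVI w.r.t. nir
    d_red = -2 * nir / s**2   # partial derivative of NDVI w.r.t. red
    sigma = math.sqrt((d_nir * sigma_nir) ** 2 + (d_red * sigma_red) ** 2)
    return ndvi, sigma


value, err = ndvi_with_uncertainty(0.5, 0.1, 0.01, 0.01)
print(f"NDVI = {value:.3f} +/- {err:.4f}")
```

The point is that the uncertainty becomes a second output of the same query rather than a hardwired number the consumer has to track separately.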

(9)

Let’s Take a Closer Look...

Remember? "Users download 10x more data than needed" [Kerstin Kleese van Dam, 2002]


Divergent access patterns for ingest and retrieval

Server must mediate between access patterns
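Why the server has to mediate: images typically arrive one time slice at a time, while users often retrieve a time series at one location. The sketch counts how many tiles of a regularly tiled x/y/t cube a box query has to read under two tilings of equal tile size (2^20 cells each). Array extents and tile shapes are hypothetical; tiling in rasdaman is configurable, but this is not its code.

```python
from math import prod


def tiles_touched(query_lo, query_hi, tile_shape):
    """Number of aligned regular tiles intersected by an inclusive box query.

    query_lo / query_hi: per-axis inclusive cell bounds, e.g. (x, y, t).
    tile_shape: per-axis tile edge length in cells.
    """
    return prod(hi // edge - lo // edge + 1
                for lo, hi, edge in zip(query_lo, query_hi, tile_shape))


image_tiles = (1024, 1024, 1)   # one tile per ingested image
cube_tiles = (128, 128, 64)     # same cell count per tile, spread over time

# Time series of a single pixel over 1000 time steps.
series_lo, series_hi = (500, 500, 0), (500, 500, 999)
print(tiles_touched(series_lo, series_hi, image_tiles))  # 1000
print(tiles_touched(series_lo, series_hi, cube_tiles))   # 16
```

The trade-off runs the other way for a full 1024 x 1024 image at one time step, which reads 1 image-shaped tile but 64 cube-shaped ones; the tiling has to be chosen for the expected mix of queries.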

(10)

[Diedrich et al. 2001]

Use Case:

(11)

The rasdaman Raster Analytics Server

Raster DBMS for massive n-D raster data

rasql = SQL with integrated raster processing

Tile-based architecture: n-D array = set of n-D tiles

Extensive optimization, hw/sw parallelization

In operational use: dozen-terabyte objects

Analytics queries in 50 ms on a laptop

select img.green[x0:x1,y0:y1] > 130
from LandsatArchive as img
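The query above trims the green band of every image in LandsatArchive and compares each cell with 130, returning one Boolean array per image. A pure-Python emulation of that result for a single image follows; it assumes the band is a nested list indexed [x][y] and relies on rasql trim bounds being inclusive. It illustrates the semantics only, not how rasdaman evaluates the query (server-side, tile by tile).

```python
def green_above_threshold(green, x0, x1, y0, y1, threshold=130):
    """Emulate `img.green[x0:x1,y0:y1] > 130` for one image.

    green is indexed green[x][y]. rasql bounds are inclusive, Python slice
    ends are exclusive, hence the + 1 on both upper bounds.
    """
    return [[cell > threshold for cell in column[y0:y1 + 1]]
            for column in green[x0:x1 + 1]]


green = [[100, 140, 200],
         [131, 130, 90],
         [0, 255, 129]]
print(green_above_threshold(green, 0, 1, 1, 2))
```

Over a whole collection, the query corresponds to applying this function to each image in turn.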

(12)

Query Processing in a Federation

Heterogeneous federation / cloud

Can optimize for data location, transport volume, node load, ...

Work in progress

Original query:

select encode(
    ( (A.nir - A.red) / (A.nir + A.red)
    - (B.nir - B.red) / (B.nir + B.red) ),
    "HDF5" )
from A, B

Subqueries sent to the nodes holding array A and array B:

select encode( (A.nir - A.red) / (A.nir + A.red), "array-compressed" )
from A

select encode( (B.nir - B.red) / (B.nir + B.red), "array-compressed" )
from B
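The decomposition can be mimicked in a few lines: each node evaluates the band ratio on its own array and ships only that result, so the coordinator receives one array per node instead of two bands. Arrays are flattened to plain lists and "nodes" are ordinary function calls; this is a hypothetical illustration of the idea, not the EarthServer federation protocol.

```python
def ndvi(nir, red):
    """Cell-wise (nir - red) / (nir + red); assumes nir + red != 0."""
    return [(n - r) / (n + r) for n, r in zip(nir, red)]


def node_subquery(array):
    """Runs where the array lives; returns one derived array, not two bands."""
    return ndvi(array["nir"], array["red"])


def federated_difference(a, b):
    """Coordinator: combine the partial results shipped by the two nodes."""
    return [x - y for x, y in zip(node_subquery(a), node_subquery(b))]


A = {"nir": [0.5, 0.6], "red": [0.1, 0.2]}
B = {"nir": [0.4, 0.3], "red": [0.2, 0.1]}
print(federated_difference(A, B))
```

Because subtraction only happens after both ratios are computed, the federated result matches evaluating the original query on one machine.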

(13)

Raster Query Language: ad-hoc navigation, extraction, aggregation, analytics

Time series

Image processing

Summary data

Sensor fusion & pattern mining
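Time-series extraction and summary data both reduce to a subset followed by an aggregate. The Python sketch assumes a cube stored as nested lists indexed [x][y][t]; rasql's avg_cells and count_cells condensers compute these aggregates server-side, but the functions here are illustrative stand-ins, not rasdaman code.

```python
def time_series(cube, x, y, t0, t1):
    """One pixel over time: cube[x][y][t0..t1], bounds inclusive."""
    return cube[x][y][t0:t1 + 1]


def avg_cells(values):
    """Mean over all cells (raises ZeroDivisionError on an empty input)."""
    return sum(values) / len(values)


def count_cells(mask):
    """Number of true cells in a Boolean array."""
    return sum(1 for v in mask if v)


cube = [[[0, 0, 0, 0], [5, 5, 5, 5]],
        [[1, 2, 3, 4], [9, 9, 9, 9]]]
series = time_series(cube, 1, 0, 1, 3)
print(series, avg_cells(series), count_cells([v > 2 for v in series]))
```

Only the aggregate (here a single number) has to leave the server, which is the counterpoint to users downloading ten times more data than they need.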

(14)
(15)

3D Clients: Experiments

Problem: coupling DB and visualization

Approach: deliver an RGBA image to an X3D client, with transparency (alpha) encoding height

Feed directly into the client GPU

select encode(
    { red:   (char) s.b7[x0:x1,y0:y1],
      green: (char) s.b5[x0:x1,y0:y1],
      blue:  (char) s.b0[x0:x1,y0:y1],
      alpha: (char) scale( d, 20 )
    },
    "png" )
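On the client, the transparency channel is reinterpreted as elevation. The sketch shows the pixel layout this relies on: three bands become R, G and B and a height band becomes A, one byte each. Function names and the flat-list input are assumptions, and clamping to 0..255 is an assumed stand-in for the (char) cast; in the query above the server builds the image and PNG handles compression.

```python
def pack_rgba(red, green, blue, height):
    """Interleave four equal-length bands into RGBA bytes (R, G, B, A per pixel).

    Values are clamped to 0..255 (assumption: the slide does not say how an
    overflowing (char) cast behaves).
    """
    def to_byte(v):
        return max(0, min(255, int(v)))

    return bytes(to_byte(c) for px in zip(red, green, blue, height) for c in px)


def heights_from_rgba(buf):
    """Client side: every 4th byte (the alpha channel) carries the height."""
    return list(buf[3::4])


buf = pack_rgba([10, 300], [20, 40], [30, 50], [5, -3])
print(list(buf), heights_from_rgba(buf))
```

Packing height into alpha lets a single standard image transfer carry both the texture and the terrain, which the GPU can consume without a separate elevation request.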

(16)

EarthServer: Big Earth Data Analytics

Scalable On-Demand Analytics & Fusion for all Earth Sciences

11 partners (lead: Jacobs University), US$ 7 million budget, 2011-sep-01 to 2014-aug-31

6 databases of 100+ TB each for all Earth sciences + planetary science

www.earthserver.eu

Advisory board:

(17)

Web Coverage Service (WCS)

WCS Core: simple access to multi-dimensional coverages

subset = trim | slice

WCS Extensions for additional functionality facets: encodings, band extraction, scaling, reprojection, interpolation, query language, data upload, ...
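In WCS 2.0 a subset is either a trim, which restricts an axis to an interval and keeps the dimension, or a slice, which fixes an axis at one position and drops it. A minimal sketch on a coverage stored as a dictionary from integer grid indices to values; the representation and function names are hypothetical, and real WCS requests usually express subsets in CRS coordinates (e.g. Lat, Long, time) rather than grid indices.

```python
from itertools import product


def make_cube(shape):
    """Hypothetical coverage: maps index tuples to values (here, index sums)."""
    return {idx: sum(idx) for idx in product(*(range(n) for n in shape))}


def trim(axes, cells, axis, lo, hi):
    """Restrict one axis to [lo, hi]; the number of dimensions is unchanged."""
    k = axes.index(axis)
    return axes, {idx: v for idx, v in cells.items() if lo <= idx[k] <= hi}


def slice_(axes, cells, axis, pos):
    """Fix one axis at pos; that axis disappears from the result."""
    k = axes.index(axis)
    new_axes = axes[:k] + axes[k + 1:]
    return new_axes, {idx[:k] + idx[k + 1:]: v
                      for idx, v in cells.items() if idx[k] == pos}


axes = ("Lat", "Long", "time")
cells = make_cube((4, 4, 3))
t_axes, t_cells = trim(axes, cells, "Lat", 1, 2)
s_axes, s_cells = slice_(t_axes, t_cells, "time", 0)
print(t_axes, len(t_cells))   # ('Lat', 'Long', 'time') 24
print(s_axes, len(s_cells))   # ('Lat', 'Long') 8
```

Combining a trim on space with a slice on time is the typical way to cut a 2-D map out of a 3-D x/y/t cube.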

(18)

SWE O&M and SOS (+ friends): specialized for sensor acquisition, some complexity; upstream acquisition

GMLCOV and WCS (+WCPS): simple, uniform schema for all coverages; scalable; versatile processing; downstream services

Coverage server

O&M + SensorML | GMLCOV + WCS | Semantic Web

(19)

Proposal: EarthServer platform (rasdaman) as a contribution to the GEOSS Common Infrastructure (GCI)

Flexible ad-hoc processing & filtering

Working "in situ" on existing archives; no copying!

Integrated n-D coverage data / metadata search

Smooth integration with the GEOSS Broker

Scalable n-D interfaces using OGC standards

WMS, WCS suite including WCPS, WPS

n-D visual coverage client toolkit

1-D diagrams, 2-D maps, 3-D data cubes, 3-D time series sets, ...

Dynamically composed from query results
