
Big Data Infrastructures for Processing Sentinel Data


Academic year: 2021


Big Data Infrastructures for

Processing Sentinel Data

Wolfgang Wagner

Department for Geodesy and Geoinformation

Technische Universität Wien

Earth Observation Data Centre for Water

Resources Monitoring

What is Big Data?

Sven Schade (2015) describes the Big Data era as

• “a situation in where the volume, variety, velocity and veracity (3+1 Vs) in which data sets and streams become available challenges current management and processing capabilities”

Schade, S. (2015) Big Data breaking barriers - first steps on a long trail, ISPRS Archives, XL-7/W3, 691-697.

Big Data, Big Hype?

• Steve Dodson (2014) in

An intrusion of privacy

A successful business model of a few big, primarily American enterprises


Infrastructures for Processing Big Data

Google Council Bluffs data center (http://www.google.com/about/datacenters/gallery/#/all/2)

Sentinel Programme

A fleet of European earth observation satellites for environmental monitoring


Sentinel-1 – A Game Changer

C-band SAR satellite in continuation of ERS-1/2 and ENVISAT

High spatio-temporal coverage

• Spatial resolution 20-80 m

• Temporal resolution < 3 days over Europe and Canada – with 2 satellites

Excellent data quality

Highly dynamic land surface processes can be captured

• Impact on water management, health and other applications could be high if the challenges in the ground segment can be overcome

Solar panel and SAR antenna of Sentinel-1, launched 3 April 2014. Image was acquired by the satellite's onboard camera. © ESA

Sentinel-1 image of Upper Austria taken on 13/04/2015


Sentinel-1 Data Volume

From Byte to PetaByte

[Figure: size comparison with labels 1 Byte, 1 KiloByte, 1 MegaByte]


Speed of Data Transmission

Download of 500 Gigabyte

(daily Sentinel-1 data volume over land)

• Wireless with 7 Mbit/s

• Landline with 1 Gbit/s

Download of 1 Petabyte

(7 years of Sentinel-1 data over land)

• Landline with 1 Gbit/s
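The download times implied by these figures can be checked with simple arithmetic. The following is a minimal sketch, assuming decimal units (1 Gigabyte = 10^9 bytes, 1 Petabyte = 10^15 bytes) and an ideal link with no protocol overhead; the function name `transfer_time_seconds` is illustrative and not from the slides.

```python
def transfer_time_seconds(volume_bytes: float, rate_bits_per_s: float) -> float:
    """Ideal transfer time: data volume in bits divided by the link rate."""
    return volume_bytes * 8 / rate_bits_per_s


# Decimal unit definitions (an assumption; the slides do not state the convention).
GB = 10**9
PB = 10**15
MBIT = 10**6
GBIT = 10**9

daily_wireless = transfer_time_seconds(500 * GB, 7 * MBIT)
daily_landline = transfer_time_seconds(500 * GB, 1 * GBIT)
archive_landline = transfer_time_seconds(1 * PB, 1 * GBIT)

print(f"500 GB at 7 Mbit/s: {daily_wireless / 86400:.1f} days")    # 6.6 days
print(f"500 GB at 1 Gbit/s: {daily_landline / 3600:.1f} hours")    # 1.1 hours
print(f"1 PB at 1 Gbit/s:   {archive_landline / 86400:.1f} days")  # 92.6 days
```

Under these assumptions, one day of Sentinel-1 land data takes almost a week to download over the wireless link, so that link cannot keep up with acquisition.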

Speed of Data Processing

Assumed processing speed of Sentinel-1 data with one computer/node: ~4 Mbit/s

Processing of 500 Gigabyte

(daily Sentinel-1 data over land)

• 1 computer

Processing of 1 Petabyte

(7 years of Sentinel-1 data over land)

• 1 computer

• 100 nodes

• 1000 nodes
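The processing times for these cases follow from the assumed rate of 4 Mbit/s per node. This is a rough sketch, assuming decimal units and perfect linear scaling across nodes (no I/O or scheduling overhead, which real systems will not achieve); `processing_days` is an illustrative name.

```python
def processing_days(volume_bytes: float, node_rate_bits_per_s: float, n_nodes: int = 1) -> float:
    """Ideal processing time in days, assuming throughput scales linearly with node count."""
    seconds = volume_bytes * 8 / (node_rate_bits_per_s * n_nodes)
    return seconds / 86400


GB = 10**9
PB = 10**15
NODE_RATE = 4 * 10**6  # 4 Mbit/s per node, as assumed on the slide

print(f"500 GB, 1 computer: {processing_days(500 * GB, NODE_RATE):.1f} days")       # 11.6 days
print(f"1 PB, 1 computer:   {processing_days(1 * PB, NODE_RATE) / 365.25:.1f} years")  # 63.4 years
print(f"1 PB, 100 nodes:    {processing_days(1 * PB, NODE_RATE, 100):.1f} days")    # 231.5 days
print(f"1 PB, 1000 nodes:   {processing_days(1 * PB, NODE_RATE, 1000):.1f} days")   # 23.1 days
```

With a single computer, one day of data takes more than eleven days to process, which is why the workload has to be spread over many nodes.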

One needs supercomputers for processing Sentinel data!


Approaching Technological Frontiers?

Information and communications technology (ICT) has improved dramatically over the past decades

• Moore’s law, which states that the number of transistors in a dense integrated circuit doubles approximately every two years, still holds
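To illustrate what a doubling every two years means over longer periods, here is a small sketch. It assumes an exact two-year doubling period, which is an idealisation of the empirical trend; `moore_growth_factor` is an illustrative name.

```python
def moore_growth_factor(years: float, doubling_period_years: float = 2.0) -> float:
    """Growth factor of the transistor count after `years`, assuming steady doubling."""
    return 2 ** (years / doubling_period_years)


for span in (2, 10, 20):
    print(f"after {span:2d} years: x{moore_growth_factor(span):.0f}")
```

Under this idealisation, ten years correspond to a factor of 32 and twenty years to a factor of 1024.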

But there are physical limits to every technology!

• e.g. for any thermodynamic cycle operating between temperatures $T_H$ and $T_C$, none can exceed the efficiency of a Carnot cycle: $\eta = 1 - T_C / T_H$

Increasingly we face challenges related to

• Data volume

• Bandwidth and I/O

• Algorithmic complexity

Earth Observation Ground Segment


Earth Observation Ground Segment

Present

Earth Observation Ground Segment


A New Paradigm for Earth Observation

Reasons

• Fast-growing volume and increasing variety of EO data

• Increasing complexity of algorithms with increasing resolution

• Higher scientific standards

– Algorithms must be validated with big data sets and competing algorithms

– Algorithm ensembles needed

Solution

Consequence

Bring users and their software to the data

Need for cooperation & specialisation

An Opportunity for New Business Models

Business Model of Munich-based company CloudEO


Big Data Infrastructures for the Sentinels

Private Sector

• Google Earth Engine

• Amazon Web Services

– Offers Landsat data (complete from 2015 onwards) for its cloud users

• Helix Nebula Science Cloud

– Consortium of European ICT providers teaming up with ESA, CERN, etc.

• etc.

Public Sector

• Initiatives triggered mainly by national space programmes

– THEIA Land Data Centre (France)

– Climate, Environment and Monitoring from Space (CEMS) (UK)

– OPUS/Copernicus Centre (Germany)

• European Space Agency

– Thematic Exploitation Platforms

– Mission Exploitation Platforms

• etc.

Google Earth Engine

Premier platform for the scientific analysis of high-resolution imagery

• Combines the strength of an ICT giant with expertise in earth observation (team of > 100 programmers)

• Rolled out on three Google data centres (US, Europe, Asia)

• Access through JavaScript or Python API

• Programming in “Googlish”, i.e. code can only run on Google Earth Engine

• Image-oriented data structure, including image pyramids for interactive analysis

• Commercial applications are not free

• Data download possible (original and processed data)

– Landsat: complete archive

– MODIS: many geophysical variables

– Sentinel-1: already about 10,000 scenes

– Sentinel-2: will likely follow soon
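As an illustration of access through the Python API, the sketch below counts the Sentinel-1 scenes covering a point. It is not taken from the slides: it assumes the `earthengine-api` package is installed, that credentials (and, in recent client versions, a default project) are already configured, and that the Sentinel-1 archive is published under the collection ID `COPERNICUS/S1_GRD`. The function name `count_sentinel1_scenes` is hypothetical.

```python
def count_sentinel1_scenes(lon: float, lat: float, start: str, end: str) -> int:
    """Count Sentinel-1 GRD scenes that cover a point within a date range."""
    # Imported lazily so this module can be loaded without the Earth Engine client.
    import ee

    # Assumes authentication has already been set up for this environment.
    ee.Initialize()

    point = ee.Geometry.Point([lon, lat])
    collection = (
        ee.ImageCollection("COPERNICUS/S1_GRD")  # assumed collection ID
        .filterBounds(point)
        .filterDate(start, end)
    )
    # The count is computed on Google's servers; getInfo() fetches the result.
    return collection.size().getInfo()
```

A call such as `count_sentinel1_scenes(16.37, 48.21, "2015-01-01", "2016-01-01")` would query scenes over Vienna. The filtering runs on Earth Engine and only the final count is transferred to the client, in line with the idea of bringing the software to the data.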


Snapshot of Google Earth Engine Interface showing Sentinel-1 data holding as of 4/9/2016 (https://ee-api.appspot.com)

Earth Observation Data Centre (EODC)

Founded in May 2014 as a Public-Private Partnership

Mission

• EODC works together with its partners from science and the public and private sectors in order to foster the use of EO data for monitoring of water and land

EODC acts as a community facilitator

Joint developments

• Cloud infrastructure

• Operational data services

• Software

– Open Source

EODC works towards a federation of data


Work is done within the “Communities”

• Infrastructure

• Sentinel-1

• Sentinel-2

Already 13 Cooperation Partners from 6 countries

• Austria, Australia, Czech Republic, Italy, France, The Netherlands

EODC Cooperation Network

EODC Infrastructure in Vienna

24/7 Operations & Rolling Archive

• Petabyte-Scale Disk Storage

• Supercomputer

• Tape Storage

• Virtual Machines (VMs)

VSC-3: Rank 85 of the World's most powerful computers (11/2014)


EODC Status

Operations started in June 2015 after a one-year development phase

• Operational data reception and processing by ZAMG

• Computer cluster operated by EODC

– Virtual Machines via OpenStack

– Cloud Services

• Supercomputer VSC-3 operated by TU Wien

[Diagram: EODC platform (PaaS). Labels: VSC-3 Login Node, NORA Router, Continuous Integration, User VMs, Job Scheduler, Repositories, High Availability; Data and Platform Services; Various Inspection Tools; Community File Repository; Web Conferencing; Development, Collaboration, Community Building]

Sentinel-1 Data Availability @ EODC

Sentinel-1 data are currently available ~2.5 hours after their processing time and 6.25 hours after acquisition time (median value for August 2015)

54888 acquisitions with 39.65 TB (>1.5 times our 10-year ENVISAT ASAR archive)

Ramp-up of Sentinel-1 acquisition scenario to full


Supercomputing Experiment

Vienna Scientific Cluster 3

• High-performance computing (HPC) system with 2020 nodes

• Each node has 2 processors Intel Xeon E5-2650v2, 2.6 GHz, and 64 Gbytes of RAM

• Simple Linux Utility for Resource Management (SLURM)

Experiment

• Geocoding of 624 Sentinel-1 images from Austria, Sudan and Zambia with Sentinel-1 toolbox

• Each image is about 1 Gbyte in size

• Serial processing with one processor would take about two weeks

Approach

• Parallel processing on 312 nodes, with 2 images launched simultaneously on each computing node
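The distribution of work in this approach can be sketched as follows. This is a hedged illustration, not the actual job setup used on VSC-3: the scene names are hypothetical, and the wall-time figure is only an ideal lower bound derived from the "about two weeks" serial estimate, assuming each of the two images on a node runs at the single-processor serial speed with no I/O contention.

```python
def assign_images(images: list, n_nodes: int, per_node: int) -> list:
    """Split the image list into per-node batches of at most `per_node` images."""
    if len(images) > n_nodes * per_node:
        raise ValueError("not enough node slots for all images")
    return [images[i:i + per_node] for i in range(0, len(images), per_node)]


# Hypothetical scene identifiers standing in for the 624 Sentinel-1 images.
images = [f"S1_scene_{i:03d}" for i in range(624)]
batches = assign_images(images, n_nodes=312, per_node=2)

# Serial estimate from the slide: about two weeks for all 624 images.
serial_minutes = 14 * 24 * 60
per_image_minutes = serial_minutes / len(images)

print(f"{len(batches)} node batches, {len(batches[0])} images each")
print(f"approx. {per_image_minutes:.0f} minutes per image in serial mode")
print(f"ideal parallel wall time: approx. {per_image_minutes:.0f} minutes")
```

Because all 624 images start at once, the ideal wall time equals the duration of a single image; measured times will be longer wherever storage bandwidth is shared between nodes.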

Results


Conclusions

Earth Observation is entering the Big Data era

Big Data infrastructures for processing of Sentinel data are being developed along two main lines

• Deploying EO-specific services on general-purpose cloud computing environments

• Building new, or expanding existing, dedicated EO data centres

Acknowledgements

My colleagues at TU Wien and EODC: Christian Briese, Vahid Naeimi, Bernhard Bauer-Marschallinger, Christoph Paulik, Alena Dostalova, Stefano Elefante, Thomas Mistelbauer, Hans Thüminger, and Andreas Roncat

Austrian Space Application Programme: Projects 844350 “Prepare4EODC” and 88001 “WetMon”

European Space Agency: Contract No. 4000107319/12/I-BG “EODC Water Study”

Big Data is a broad term for data sets so large or complex that traditional data processing applications are inadequate.
