Big Data Infrastructures for Processing Sentinel Data
Wolfgang Wagner
Department for Geodesy and Geoinformation, Technische Universität Wien
Earth Observation Data Centre for Water Resources Monitoring
What is Big Data?
Sven Schade (2015) describes the Big Data era as
• “a situation in which the volume, variety, velocity and veracity (3+1 Vs) in which data sets and streams become available challenges current management and processing capabilities”
Schade, S. (2015) Big Data breaking barriers - first steps on a long trail, ISPRS Archives, XL-7/W3, 691-697.
Big Data, Big Hype?
• Steve Dodson (2014) in
An intrusion of privacy
A successful business model of a few big, primarily American enterprises
Infrastructures for Processing Big Data
Google Council Bluffs data center (http://www.google.com/about/datacenters/gallery/#/all/2)
Sentinel Programme
A fleet of European earth observation satellites for environmental monitoring
Sentinel-1 – A Game Changer
C-band SAR satellite in continuation of ERS-1/2 and ENVISAT
High spatio-temporal coverage
• Spatial resolution 20-80 m
• Temporal resolution < 3 days over Europe and Canada (with 2 satellites)
Excellent data quality
Highly dynamic land surface processes can be captured
• Impact on water management, health and other applications could be high if the challenges in the ground segment can be overcome
Solar panel and SAR antenna of Sentinel-1, launched 3 April 2014. Image was acquired by the satellite's onboard camera. © ESA
Sentinel-1 image of Upper Austria taken on 13/04/2015
Sentinel-1 Data Volume
From Byte to PetaByte
1 Byte
1 KiloByte
1 MegaByte
Speed of Data Transmission
Download of 500 Gigabyte (daily Sentinel-1 data volume over land)
• Wireless with 7 Mbit/s
• Landline with 1 Gbit/s
Download of 1 Petabyte (7 years of Sentinel-1 data over land)
• Landline with 1 Gbit/s
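The download times for these cases can be estimated with a few lines of arithmetic. The sketch below is only an idealised calculation: it assumes decimal units (1 Gigabyte = 10^9 bytes, 1 Petabyte = 10^15 bytes), a fully saturated link and no protocol overhead. The function name `transfer_time_seconds` is made up for this illustration.

```python
# Idealised transfer-time estimate (assumptions: decimal units, link fully
# saturated, no protocol overhead or retransmissions).

GIGABYTE = 1e9      # bytes
PETABYTE = 1e15     # bytes
MBIT_PER_S = 1e6    # bits per second
GBIT_PER_S = 1e9    # bits per second


def transfer_time_seconds(volume_bytes: float, rate_bits_per_s: float) -> float:
    """Return the ideal transfer time: data volume in bits divided by line rate."""
    return volume_bytes * 8 / rate_bits_per_s


cases = [
    ("500 GB, wireless 7 Mbit/s", 500 * GIGABYTE, 7 * MBIT_PER_S),
    ("500 GB, landline 1 Gbit/s", 500 * GIGABYTE, 1 * GBIT_PER_S),
    ("1 PB, landline 1 Gbit/s", 1 * PETABYTE, 1 * GBIT_PER_S),
]

for label, volume, rate in cases:
    seconds = transfer_time_seconds(volume, rate)
    print(f"{label}: {seconds / 3600:.1f} hours = {seconds / 86400:.1f} days")
```

Under these assumptions the wireless download of one day of data takes several days, while the landline needs roughly an hour; a full petabyte keeps even the landline busy for about three months.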
Speed of Data Processing
Assumed processing speed of Sentinel-1 data with one computer/node ~ 4 Mbit/s
Processing of 500 Gigabyte (daily Sentinel-1 data over land)
• 1 computer
Processing of 1 Petabyte (7 years of Sentinel-1 data over land)
• 1 computer
• 100 nodes
• 1000 nodes
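The processing times implied by the assumed 4 Mbit/s per node can be worked out in the same way. This is a rough sketch under two assumptions that are not stated on the slide: decimal units (1 Gigabyte = 10^9 bytes) and perfect linear scaling with the number of nodes, i.e. no I/O contention or scheduling overhead. The function name `processing_time_days` is hypothetical.

```python
# Idealised processing-time estimate (assumptions: decimal units and perfect
# linear scaling across nodes; real systems lose time to I/O and scheduling).

NODE_RATE_BITS_PER_S = 4e6   # 4 Mbit/s per computer/node, as assumed on the slide
SECONDS_PER_DAY = 86400


def processing_time_days(volume_bytes: float, nodes: int = 1) -> float:
    """Return the ideal processing time in days for a given number of nodes."""
    seconds = volume_bytes * 8 / (NODE_RATE_BITS_PER_S * nodes)
    return seconds / SECONDS_PER_DAY


daily_volume = 500e9      # 500 Gigabyte
archive_volume = 1e15     # 1 Petabyte

print(f"500 GB, 1 computer: {processing_time_days(daily_volume):.1f} days")
for n in (1, 100, 1000):
    days = processing_time_days(archive_volume, nodes=n)
    print(f"1 PB, {n} node(s): {days:.1f} days = {days / 365.25:.2f} years")
```

With these numbers a single computer needs more than eleven days for one day of data, so it can never catch up, and a petabyte would occupy it for decades; only with on the order of 1000 nodes does the archive become tractable within weeks.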
One needs supercomputers for processing Sentinel data!
Approaching Technological Frontiers?
Information and communications technology (ICT) has improved dramatically over the past decades
• Moore’s law, which states that the number of transistors in a dense integrated circuit doubles approximately every two years, still holds
But there are physical limits to every technology!
• e.g. of all thermodynamic cycles operating between temperatures $T_H$ and $T_C$, none can exceed the efficiency of a Carnot cycle: $\eta = 1 - T_C / T_H$
Increasingly we face challenges related to
• Data volume
• Bandwidth and I/O
• Algorithmic complexity
Earth Observation Ground Segment
Present
A New Paradigm for Earth Observation
Reasons
• Fast-growing volume and increasing variety of EO data
• Increasing complexity of algorithms with increasing resolution
• Higher scientific standards
  – Algorithms must be validated with big data sets and competing algorithms
  – Algorithm ensembles needed
Solution
• Bring users and their software to the data
Consequence
• Need for cooperation & specialisation
An Opportunity for New Business Models
Business Model of Munich-based company CloudEO
Big Data Infrastructures for the Sentinels
Private Sector
• Google Earth Engine
• Amazon Web Services
  – Offers Landsat data (complete from 2015 onwards) for its cloud users
• Helix Nebula Science Cloud
  – Consortium of European ICT providers teaming up with ESA, CERN, etc.
• etc.
Public Sector
• Initiatives triggered mainly by national space programmes
  – THEIA Land Data Centre (France)
  – Climate and Environmental Monitoring from Space (CEMS) (UK)
  – OPUS/Copernicus Centre (Germany)
• European Space Agency
  – Thematic Exploitation Platforms
  – Mission Exploitation Platforms
• etc.
Google Earth Engine
Premier platform for the scientific analysis of high-resolution imagery
• Combines the strength of an ICT giant with expertise in earth observation (team of > 100 programmers)
• Rolled out on three Google data centres (US, Europe, Asia)
• Access through JavaScript or Python API
• Programming in “Googlish”, i.e. code can only run on Google Earth Engine
• Image-oriented data structure, including image pyramids for interactive analysis
• Commercial applications are not free
• Data download possible (original and processed data)
  – Landsat: complete archive
  – MODIS: many geophysical variables
  – Sentinel-1: already about 10,000 scenes
  – Sentinel-2: will likely follow soon
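To illustrate what an image pyramid is, the sketch below builds one by repeatedly averaging 2x2 pixel blocks, so that each level has half the resolution of the one before. This is a generic textbook construction in plain Python, not Google Earth Engine's implementation or API; the function names `downsample_2x` and `build_pyramid` are made up for this example.

```python
# Generic image-pyramid sketch (not Google Earth Engine code): each level is
# produced by averaging non-overlapping 2x2 blocks of the previous level.

def downsample_2x(image):
    """Halve the resolution of a 2-D list of numbers by 2x2 block averaging.

    A trailing odd row or column is dropped.
    """
    rows, cols = len(image), len(image[0])
    return [
        [
            (image[r][c] + image[r][c + 1]
             + image[r + 1][c] + image[r + 1][c + 1]) / 4
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


def build_pyramid(image):
    """Return a list of levels, from full resolution down to the coarsest."""
    levels = [image]
    while len(levels[-1]) >= 2 and len(levels[-1][0]) >= 2:
        levels.append(downsample_2x(levels[-1]))
    return levels


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
pyramid = build_pyramid(image)
for level, data in enumerate(pyramid):
    print(f"level {level}: {len(data)} x {len(data[0])}")
```

An interactive viewer can then read the coarse levels when zoomed out and only touch full-resolution pixels when zoomed in, which is what makes such a structure attractive for interactive analysis.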
Snapshot of Google Earth Engine Interface showing Sentinel-1 data holding as of 4/9/2016 (https://ee-api.appspot.com)
Earth Observation Data Centre (EODC)
Founded in May 2014 as a Public-Private Partnership
Mission
• EODC works together with its partners from science and the public and private sectors in order to foster the use of EO data for monitoring of water and land
EODC acts as a community facilitator
Joint developments
• Cloud infrastructure
• Operational data services
• Software
  – Open Source
EODC works towards a federation of data
Work is done within the “Communities”
• Infrastructure
• Sentinel-1
• Sentinel-2
Already 13 Cooperation Partners from 6 countries
• Austria, Australia, Czech Republic, Italy, France, The Netherlands
EODC Cooperation Network
EODC Infrastructure in Vienna
24/7 Operations & Rolling Archive
Petabyte-Scale Disk Storage
Supercomputer
Tape Storage
Virtual Machines (VMs)
VSC-3: Rank 85 of the world's most powerful computers (11/2014)
EODC Status
Operations started in June 2015 after a one-year development phase
• Operational data reception and processing by ZAMG
• Computer cluster operated by EODC
  – Virtual Machines via OpenStack
  – Cloud Services
• Supercomputer VSC-3 operated by TU Wien
[Diagram: PaaS]
Components: VSC-3 Login Node, NORA Router, Continuous Integration, User VMs, Job Scheduler, Repositories, High Availability
Data and Platform Services: Various Inspection Tools, Community File Repository, Web Conferencing
Development, Collaboration, Community Building
Sentinel-1 Data Availability @ EODC
Sentinel-1 data are currently available ~2.5 hours after their processing time and 6.25 hours after acquisition time (median value for August 2015)
54888 acquisitions with 39.65 TB (> 1.5 times our 10-year ENVISAT ASAR archive)
Ramp-up of Sentinel-1 acquisition scenario to full
Supercomputing Experiment
Vienna Scientific Cluster 3
• High-performance computing (HPC) system with 2020 nodes
• Each node has 2 processors Intel Xeon E5-2650v2, 2.6 GHz, and 64 Gbytes of RAM
• Simple Linux Utility for Resource Management (SLURM)
Experiment
• Geocoding of 624 Sentinel-1 images from Austria, Sudan and Zambia with Sentinel-1 toolbox
• Each image is about 1 Gbyte in size
• Serial processing with one processor would take about two weeks
Approach
• Parallel processing on 312 nodes, with 2 images launched simultaneously on each computing node
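The set-up of the experiment suggests a simple back-of-the-envelope estimate of the best achievable wall-clock time. The sketch below assumes that "about two weeks" means 14 days of serial processing and that the parallel run scales perfectly (no I/O contention, no scheduling delays); it does not reproduce the measured results. The function name `ideal_wall_time_minutes` is hypothetical.

```python
# Idealised estimate for the geocoding experiment (assumptions: serial time of
# 14 days, perfect scaling, every image takes the same time to process).
import math


def ideal_wall_time_minutes(n_images, serial_minutes_total, nodes, images_per_node):
    """Return the ideal wall-clock time when images are processed in waves.

    One wave processes nodes * images_per_node images simultaneously.
    """
    minutes_per_image = serial_minutes_total / n_images
    slots = nodes * images_per_node
    waves = math.ceil(n_images / slots)
    return waves * minutes_per_image


serial_minutes = 14 * 24 * 60   # "about two weeks" taken as 14 days
estimate = ideal_wall_time_minutes(
    n_images=624, serial_minutes_total=serial_minutes, nodes=312, images_per_node=2
)
print(f"Ideal wall-clock time: {estimate:.1f} minutes")
```

With 312 nodes and 2 images per node, all 624 images fit into a single wave, so the ideal wall-clock time equals the time for one image, roughly half an hour under these assumptions. Any difference between this figure and a measured run would reflect shared I/O and other overheads.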
Results
Conclusions
Earth Observation is entering the Big Data era
Big Data infrastructures for processing of Sentinel data are being developed along two main lines
• Deployment of EO-specific services on general-purpose cloud computing environments
• Building of new, or expansion of existing, dedicated EO data centres
Acknowledgements
My colleagues at TU Wien and EODC: Christian Briese, Vahid Naeimi, Bernhard Bauer-Marschallinger, Christoph Paulik, Alena Dostalova, Stefano Elefante, Thomas Mistelbauer, Hans Thüminger, and Andreas Roncat
Austrian Space Application Programme: Projects 844350 “Prepare4EODC” and 88001 “WetMon”
European Space Agency: Contract No. 4000107319/12/I-BG “EODC Water Study”
Big Data is a broad term for data sets so large or complex that traditional data processing applications are inadequate.