Copernicus and Big Data:
Challenges and Opportunities
Alessandro Annoni
European Commission
Joint Research Centre
www.jrc.ec.europa.eu Serving society Stimulating innovation Supporting legislation
Big Data definition
Factors to be considered are:
• Volume: available data volumes are now so large that they outstrip traditional storage and analysis techniques.
• Velocity: data is collected and made available continuously at a high rate.
• Variety: big data comes from a great variety of sources, generally of three types: structured, semi-structured and unstructured.
• Veracity: data sources (even in the same domain) differ widely in quality, with significant differences in the coverage, accuracy and timeliness of the data provided.
Big Data are defined as massive data sets having large, varied and complex structures that make their further storage, analysis, visualization, and processing difficult.
What Big Data means for Copernicus
Can Copernicus be classified as Big Data?
• Volume = about 3,000 terabytes of EO data yearly
• Space Infrastructure (including ground segment)
• Velocity = several terabytes of new EO data per day
• Space and Services Infrastructures
• Veracity = Sentinel data and contributing missions (virtual
constellations, virtual global coverages, time series, …)
• Space Infrastructures
• Variety = depends on future implementation of Copernicus and
downstream Services (e.g. role and relevance of in situ-networks, citizen observatories, …)
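As a rough consistency check of the volume and velocity figures above, the yearly volume can be converted into an average daily rate. The sketch below assumes a 365-day year and an even spread of acquisitions over the year; the function name is illustrative only and is not taken from any Copernicus tool.

```python
def average_daily_volume_tb(yearly_volume_tb: float, days_per_year: int = 365) -> float:
    """Average daily volume in terabytes, assuming data arrives evenly over the year."""
    return yearly_volume_tb / days_per_year


# 3,000 TB per year is the volume figure quoted on the slide.
daily_tb = average_daily_volume_tb(3000)
print(f"about {daily_tb:.1f} TB per day")  # prints "about 8.2 TB per day"
```

An average of roughly 8 TB per day is in line with the "several terabytes of new EO data per day" quoted for velocity.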
EO landscape evolution: an example
Global Land-Cover Observation capacity (*)
Since the 1970s the proportion of missions failing within 3 years of launch has dropped from around 60% to less than 20%; the average operational life of a mission has almost tripled, increasing from 3.3 years in the 1970s to 8.6 years (and still lengthening); the average number of satellites launched per year/per decade has increased from 2 to 12; and spatial resolution has increased from around 80 meters to less than one meter multispectral and less than half a meter for panchromatic …
(*) Who launched what, when and why; trends in Global Land-Cover Observation capacity from civilian Earth Observation satellites. Alan S. Belward and Jon O. Skøien
Challenges
The EO user community has increased significantly:
the example of Google
Who will be the future users of Copernicus?
• EO Professionals
• Other Private Services (e.g. Transport, Tourism, …)
• Public Authorities
• Policy and Decision Makers
• Research
• Education (including schools)
• …
• Public (all citizens) ??
Open to new market opportunities – from few to many users
BUT
Each different category could have different requirements for data management
An example of a professional user: JRC support to the CAP
Community Image Data portal (CID)
online archive of satellite imagery
Contains satellite imagery
• Central catalogue to search data
• Web services to access data
• File-based access for internal use
Currently: approx. 80 TB of data
• 150,000 LR/MR images (km, 100’s m)
• 20,000 HR images (5 - 50 m)
• 7.6 M km² VHR (< 5 m)
Licensing:
• Use of INSPIRE and GEOSS definitions
• ‘Click to accept’ EULA (End User Licence Agreement) legally and technically implemented
Manage image acquisition for (CAP) area-based controls
• nearly 7 M euro annual budget
• planning, programming of satellites, liaison with stakeholders
• data acquisition, validation, QC, delivery, and storing/sharing.
Copernicus and INSPIRE
The INSPIRE Directive foresees the creation of a European Spatial Data
Infrastructure. The INSPIRE Geoportal is the central access point to the
infrastructure and its resources (>300,000)
“The face” of INSPIRE
• How to better connect Copernicus and MS data and services?
• How to use Copernicus data to create/update the information required by INSPIRE?
Extend INSPIRE to other areas and integrate with
Copernicus Services
Open Research Data
• Open Science requires Open Access to data
• Data should be easily accessible and usable within research
Copernicus Data Tsunami
Are we looking to protect ourselves?
Or do we want to profit from this opportunity to produce new energy?
Opportunities
Publiclaboratory.com
Waspmotes
• Mobile phones: sensing works better in some fields than in others (e.g. noise)
• Drones: some limitations relating to the regulatory framework
• Waspmotes: need programming and raise issues of calibration and response time, but opportunities are high
Massive diffusion of cheap sensors provides new
opportunities and challenges
More than 20 million Tweets and 1 million Flickr images were retrieved and analysed for fires in the South of France. Spatio-temporal clustering and analysis shows that 80% of fires were correctly detected.
Large scale experiment at JRC to assess quality of
social network data: forest fires
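To illustrate what spatio-temporal clustering of geotagged posts can look like, here is a minimal sketch. It is not the method used in the JRC experiment, which the slide does not describe. It assumes that two posts belong together when they are close in both space and time, and that a group needs a minimum number of posts to count as a candidate event. The `Post` type, the thresholds and the sample coordinates are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    lat: float    # latitude in degrees
    lon: float    # longitude in degrees
    hours: float  # posting time, in hours since an arbitrary reference


def haversine_km(a: Post, b: Post) -> float:
    """Great-circle distance between two posts in kilometres."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(h))


def cluster_posts(posts, max_km=10.0, max_hours=6.0, min_size=3):
    """Group posts linked by proximity in space AND time (single linkage, union-find)."""
    parent = list(range(len(posts)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Pairwise comparison: O(n^2), suitable only for small samples.
    for i in range(len(posts)):
        for j in range(i + 1, len(posts)):
            close_in_time = abs(posts[i].hours - posts[j].hours) <= max_hours
            if close_in_time and haversine_km(posts[i], posts[j]) <= max_km:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(posts)):
        groups.setdefault(find(i), []).append(i)
    # Keep only groups large enough to be treated as a candidate event.
    return [members for members in groups.values() if len(members) >= min_size]


# Hypothetical sample: three nearby posts within a few hours, one distant post,
# and one post at the same place but much later.
posts = [
    Post(43.30, 5.40, 1.0),
    Post(43.31, 5.41, 2.0),
    Post(43.29, 5.39, 3.0),
    Post(43.70, 7.26, 2.0),
    Post(43.30, 5.40, 100.0),
]
clusters = cluster_posts(posts)
print(clusters)
```

With these sample values only the first three posts form a cluster; the distant post and the much later post are left out. A real analysis at the scale of millions of posts would need spatial indexing and text filtering, which this sketch omits.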
From Data to Processes
If you have BIG Data from multiple sources, you cannot move the data for processing; you need to move the analysis and processing to the data.
To support multi-disciplinary research we also need to develop a shared understanding of what to do with the data: how do you frame a problem and a possible solution according to different disciplinary approaches?
This quest requires describing not just the data but also the processes or workflows, leading to new executable web services that are understood across disciplines.
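The idea of moving the analysis to the data can be sketched as follows. Each archive runs a small analysis function in place and returns only a compact summary, which a coordinator then merges. This is a minimal illustration under stated assumptions: `DataNode`, `Summary` and `summarise` are hypothetical names, and in practice `run` would be a call to a remote processing service rather than a local method.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Summary:
    """Compact result that is cheap to send back, unlike the raw data."""
    count: int
    total: float

    def merge(self, other: "Summary") -> "Summary":
        return Summary(self.count + other.count, self.total + other.total)

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else float("nan")


class DataNode:
    """Hypothetical archive that keeps its data local and runs analyses in place."""

    def __init__(self, name: str, values: Iterable[float]):
        self.name = name
        self._values = list(values)  # the raw data never leaves the node

    def run(self, analysis: Callable[[Iterable[float]], Summary]) -> Summary:
        return analysis(self._values)


def summarise(values: Iterable[float]) -> Summary:
    """The analysis shipped to each node: count and sum of the local values."""
    values = list(values)
    return Summary(len(values), sum(values))


# Two hypothetical archives holding different parts of a data set.
nodes = [
    DataNode("archive_a", [2.0, 4.0, 6.0]),
    DataNode("archive_b", [8.0, 10.0]),
]

overall = Summary(0, 0.0)
for node in nodes:
    overall = overall.merge(node.run(summarise))

print(overall.count, overall.mean)
```

Only the function and the two-number summaries cross the boundary between coordinator and archive, so the cost of the exchange does not grow with the size of the archives.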
Conclusions
• Copernicus is a revolution for Europe both in terms of the amount of
data and of open policies
• We need to consider a broader user community.
• To serve all users we need to identify new ways for
data dissemination (e.g. from small to large, from pixel extraction to area based) and processing (e.g. spatio-temporal time series,
distributed geo-processing, …)
• Copernicus will have a positive impact on INSPIRE, Research and
other relevant initiatives if it is able to interoperate with them
• EO Professionals could also benefit more if Copernicus does not
stop at the level of pure data dissemination
• This workshop will be an excellent opportunity to identify solutions and