Processing Biological Data in i-Marine
Gianpaolo Coro
ISTI-CNR
Facilities and Experience - Summary
We will show results about:
• Ecological Niche Modeling
• Environmental Features Analysis
• Time Series Analysis
• Climate changes and impact on species
• Vessels Monitoring Systems
• Occurrence Points Reconciliation
• Taxa Reconciliation
The Context of Data Processing in iMarine
[Architecture diagram. Labels: External Geospatial Repositories (MyOcean, World Ocean Atlas, GBIF, OBIS); Data Discovery (OGC CSW, Geonetwork); Data Access (OGC WCS, OGC WFS, OpenDAP); Data Processing (OGC WPS, WPS 52N with processes P1, P2, P.., WPS Hadoop on a Hadoop Cluster, D4S Statistical Manager on the D4Science Cluster and Windows Azure); Distributed Storage System; gCube Data staging; Data Visualization (OGC WMS, WFS, GeoServer)]
We will show experiments that have been performed by means of the collaborative iMarine Data Processing Facilities.
Latimeria chalumnae (Smith, 1939)
[Figure panels: Presence Points (FishBase); Aquamaps Native Distribution; Absence Points; Artificial Neural Network]
We used…
• Presence information from FishBase
• Absence information simulated through Aquamaps
• Environmental information from Aquamaps: Depth; Bottom and Mean Annual Salinity; Bottom and Surface Temperature; Mean Annual Primary Production; Distance from Land; Sea Ice Concentration.
To train an Artificial Neural Network and project a native and suitable environment for the Coelacanth.
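The training step described here (presence points labelled 1, simulated absence points labelled 0, each described by environmental features) can be sketched as a tiny feed-forward network. This is only an illustration in plain Python, not the i-Marine implementation: the network size, learning rate, and the two rescaled toy features are all assumptions.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(model, x):
    """Return (hidden activations, output probability) for one feature vector."""
    w1, w2 = model
    # The last weight of every row is the bias term.
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x)) + row[-1]) for row in w1]
    out = sigmoid(sum(w * h for w, h in zip(w2, hidden)) + w2[-1])
    return hidden, out

def train(samples, labels, n_hidden=3, epochs=2000, lr=0.5, seed=0):
    """Train a one-hidden-layer network by stochastic gradient descent."""
    rng = random.Random(seed)
    n_in = len(samples[0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    model = (w1, w2)
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            hidden, out = forward(model, x)
            d_out = out - y  # cross-entropy gradient at the output unit
            for j, h in enumerate(hidden):
                d_h = d_out * w2[j] * h * (1.0 - h)  # back-propagated error
                w2[j] -= lr * d_out * h
                for i, v in enumerate(x):
                    w1[j][i] -= lr * d_h * v
                w1[j][-1] -= lr * d_h
            w2[-1] -= lr * d_out
    return model

# Hypothetical toy data: two features rescaled to [0, 1].
presence = [[0.20, 0.30], [0.25, 0.35], [0.30, 0.25]]
absence = [[0.80, 0.90], [0.90, 0.70], [0.70, 0.80]]
model = train(presence + absence, [1, 1, 1, 0, 0, 0])
suitability = forward(model, [0.25, 0.30])[1]  # probability-like habitat score
```

Projecting the model then amounts to calling `forward` on the feature vector of every cell of the projection area and mapping the resulting scores.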
Projection
Habitat Representativeness Score
1. How representative is an environmental feature set with respect to the projection area?
2. Are the features independent of each other?
3. Do the features share hidden common characteristics?
HRS: measures the representativeness of a set of features with respect to a certain area.
An HRS that is too high could mean that the automatic maps are unreliable.
HRS = 10.58
HRS = 10.61
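To make the idea of representativeness concrete, the sketch below compares, feature by feature, the histogram of values at the sampled points with the histogram over the whole projection area, and sums the absolute differences. This is a deliberately simplified assumption, not the published HRS definition (see MacLeod, 2010) and not the i-Marine implementation; the bin count and function names are hypothetical. As with the scores above, a lower value means the samples cover the area better.

```python
def histogram(values, lo, hi, bins):
    """Normalised histogram of values over [lo, hi] with a fixed bin count."""
    counts = [0] * bins
    width = (hi - lo) / bins or 1.0  # guard against a constant feature
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]

def simple_hrs(sample_rows, area_rows, bins=10):
    """Sum over features of the L1 distance between sample and area histograms."""
    score = 0.0
    for f in range(len(area_rows[0])):
        s = [row[f] for row in sample_rows]
        a = [row[f] for row in area_rows]
        lo, hi = min(s + a), max(s + a)
        hs = histogram(s, lo, hi, bins)
        ha = histogram(a, lo, hi, bins)
        score += sum(abs(p - q) for p, q in zip(hs, ha))
    return score
```

With this simplified score each feature contributes between 0 (identical distributions) and 2 (no overlap at all), so the total grows with the number of poorly covered features.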
Features Clustering
Presence Points (FishBase + OBIS)
Density Based Clustering
DBSCAN (with outliers)
Other methods are also available …
K-Means
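As an illustration of density-based clustering with outliers, here is a minimal DBSCAN sketch in plain Python. It is not the platform's implementation: the Euclidean distance on raw coordinates, the parameter names `eps` and `min_pts`, and the toy points are assumptions made for the example.

```python
import math

def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for outliers."""
    NOISE = -1
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbours(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be re-labelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # j is a core point: keep expanding
                queue.extend(reachable)
    return labels

# Hypothetical presence points: two dense groups and one isolated record.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

Unlike K-Means, the number of clusters is not fixed in advance, and isolated records are reported as outliers instead of being forced into a cluster.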
Climate changes and Species Occupancy in Time
Impact of climate change over 20 years on 11549 species.
Goldback Anthias
The occupancy decreases in Area 71 and increases in Area 77
Analysis on the Aquamaps
Data Enrichment by means of the e-Infrastructure
Vessels information processing workflow to calculate Fishing Monthly Effort
Alternative ways for vessels activity classification
Environmental Signal Analysis
17.59; 41.37
We traced the Spectrogram
We automatically detected a periodicity in the trend
Frequencies in 10^-8 Hz
Periodicity of 12 months
We took data from the MyOcean repository (NetCDF Format)
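A periodicity of 12 months corresponds to a frequency of about 3.17 x 10^-8 Hz, which is why the frequency axis is expressed in units of 10^-8 Hz. The sketch below shows one way such a period could be detected automatically, using a plain discrete Fourier transform over the whole series (a periodogram, simpler than a spectrogram). It is an assumption-laden illustration: the synthetic monthly series and the function name are hypothetical, and reading the NetCDF file is left out.

```python
import cmath
import math

def dominant_period(series, dt_seconds):
    """Return (period in samples, frequency in Hz) of the strongest DFT component."""
    n = len(series)
    mean = sum(series) / n
    centred = [v - mean for v in series]  # remove the constant component
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centred))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return n / best_k, best_k / (n * dt_seconds)

# Hypothetical signal: 10 years of monthly values with a yearly cycle.
month_seconds = 365.25 * 86400 / 12
series = [15 + 3 * math.sin(2 * math.pi * t / 12) + 0.5 * math.cos(2 * math.pi * t / 6)
          for t in range(120)]
period_months, freq_hz = dominant_period(series, month_seconds)
```

For the synthetic series the strongest component has a period of 12 samples, that is 12 months.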
Occurrence Data from GBIF
Occurrence Data from OBIS
Occurrence Data from WoRMS
∩ Intersection-Difference
∪ Union
Record A: x,y; Event Date; Modif Date; Author; Species Scientific Name
Record B: x,y; Event Date; Modif Date; Author; Species Scientific Name
Evaluate: d(x,y) < Distance Thr; LexicalDistance(A.Author, B.Author), LexicalDistance(A.SciName, B.SciName) > Lexical Thr
<Take the most recent>
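The comparison of two occurrence records A and B can be sketched as follows. This is a hedged reading of the rule on the slide: the lexical score is treated as a similarity in [0, 1] (computed here with Python's `difflib`, which is an assumption), "most recent" is taken to mean the latest modification date, and the threshold values and field names are hypothetical.

```python
import math
from datetime import date
from difflib import SequenceMatcher

def similarity(a, b):
    """Case-insensitive lexical similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def is_duplicate(a, b, distance_thr=0.01, lexical_thr=0.8):
    """Two records match when they are close in space and lexically alike."""
    close = math.dist((a["x"], a["y"]), (b["x"], b["y"])) < distance_thr
    same_author = similarity(a["author"], b["author"]) > lexical_thr
    same_name = similarity(a["sci_name"], b["sci_name"]) > lexical_thr
    return close and same_author and same_name

def reconcile(a, b, **thresholds):
    """Keep both records, or only the most recently modified one if they match."""
    if not is_duplicate(a, b, **thresholds):
        return [a, b]
    return [max(a, b, key=lambda r: r["modif_date"])]

# Hypothetical records from two different sources.
rec_a = {"x": 17.59, "y": 41.37, "author": "Smith, 1939",
         "sci_name": "Latimeria chalumnae", "modif_date": date(2011, 3, 1)}
rec_b = {"x": 17.591, "y": 41.371, "author": "Smith 1939",
         "sci_name": "Latimeria chalumnae", "modif_date": date(2012, 6, 15)}
merged = reconcile(rec_a, rec_b)
```

Union, intersection, and difference of two occurrence sets can then be built on top of `is_duplicate`, for example by keeping every record of B that has no duplicate in A.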
FIN Taxa Match
Steps:
• Normalization
• Stemming
• Phonetic Transformation
• Lexical Distance
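The four steps can be chained as in the sketch below. It is not the FIN Taxa Match code: the suffix list used for stemming, the phonetic substitutions, and the final score (one minus the Levenshtein distance divided by the longer key length) are simplified assumptions chosen only to show how the pipeline fits together.

```python
import re

def normalize(name):
    """Lower-case, drop parenthesised authorship, digits and punctuation."""
    name = re.sub(r"\(.*?\)|\d+", " ", name.lower())
    name = re.sub(r"[^a-z ]", " ", name)
    return " ".join(name.split())

LATIN_ENDINGS = ("us", "um", "is", "ae", "a", "e", "i")  # hypothetical suffix list

def stem(word):
    """Strip one common Latin ending, keeping at least three letters."""
    for ending in LATIN_ENDINGS:
        if word.endswith(ending) and len(word) - len(ending) >= 3:
            return word[:-len(ending)]
    return word

PHONETIC_RULES = (("ph", "f"), ("ae", "e"), ("oe", "e"), ("y", "i"), ("k", "c"))

def phonetic(word):
    """Map similar-sounding spellings to one form and collapse doubled letters."""
    for src, dst in PHONETIC_RULES:
        word = word.replace(src, dst)
    return re.sub(r"(.)\1+", r"\1", word)

def levenshtein(a, b):
    """Classic edit distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def match_key(name):
    return " ".join(phonetic(stem(w)) for w in normalize(name).split())

def match_score(query, candidate):
    """Similarity in [0, 1] between two scientific names after all four steps."""
    q, c = match_key(query), match_key(candidate)
    return 1.0 - levenshtein(q, c) / (max(len(q), len(c)) or 1)

score = match_score("Latimerya chalumna", "Latimeria chalumnae (Smith, 1939)")
```

A misspelt query is matched against every name of a reference taxonomy, and the candidates with the highest scores are proposed to the user.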
Integration with the Infrastructure: 1h
Interface Generation Time: 0s
52.6% Match
The muzzle is short and moderately pointed. The nose does not extend much past the mouth, is not bulbous, and the nostrils point ahead. [..]
New Zealand fur seals have rather generic southern fur seal features. The muzzle is moderately long, flat, and pointed, with a fleshy, somewhat bulbous nose [..]
[Figure: pairwise match percentages among Antarctic Fur Seal, New Zealand Fur Seal, Killer Whale and Southern Elephant Seal; values shown: 52.6%, 37%, 31%, 17%, 14%, 14%]
Descriptions and Habitat Semantic Distances
Future Work
Native Today
Native 2050
Numerical comparison between remote distribution maps
Time Series Forecast and Anomaly Detection
Conclusions
The experiments show some of the methods the i-Marine Community can use.
We stress:
Collaboration: the results can be shared by one user with other users in the same VRE
Reproducibility: all the experiments can be easily reproduced by another user
Data Accessibility: all the data hosted or accessed by the e-Infrastructure are automatically available to be processed
Data Import: it is easy to make a user's own data available for processing
Transparent Computational Effort: the processing effort and the cloud computations are autonomously managed and are transparent to the user
Features Accessibility: the processing facilities are accessible from outside by means
1. G. Coro, A. Gioia, P. Pagano, L. Candela. A Service for Statistical Analysis of Marine Data in a Distributed e-Infrastructure. (Sub. to) International Conference on Marine Data and Information Systems (IMDIS 2013).
2. D. Castelli, P. Pagano, L. Candela, G. Coro. The iMarine Data Bonanza: Improving Data Discovery and Management through an Hybrid Data Infrastructure. (Sub. to) International Conference on Marine Data and Information Systems (IMDIS 2013).
3. G. Coro, P. Pagano, A. Ellenbroek. Automatic Procedures to Assist in Manual Review of Marine Species Distribution Maps. M. Tomassini et al. (Eds.): International Conference on Adaptive and Natural Computing Algorithms (ICANNGA’13), Springer, Heidelberg (2013).
4. G. Coro, P. Pagano, A. Ellenbroek. Combining Simulated Expert Knowledge with Neural Networks to Produce Niche Models for Latimeria Chalumnae. (accepted with rev.) Ecological Modeling Journal, Ed. Elsevier.
5. G. Coro, L. Fortunati, P. Pagano. Deriving Fishing Monthly Effort and Caught Species from Vessel Trajectories. To be published in Oceans 2013, Proceedings of MTS/IEEE.
6. L. Candela, D. Castelli, G. Coro, P. Pagano, F. Sinibaldi. Species Distribution Modeling in the Cloud. Concurrency and Computation: Practice and Experience, Ed. Wiley.
7. P. Pagano, G. Coro, D. Castelli, L. Candela, F. Sinibaldi, A. Manzi. Cloud Computing for Ecological Modeling in the D4Science Infrastructure. In Proceedings of EGI Community Forum 2013.
8. L. Candela, G. Coro, P. Pagano. Supporting Tabular Data Characterization in a Large Scale Data Infrastructure by Lexical Matching Techniques. In M. Agosti et al. (Eds.): IRCDL 2012, CCIS 354, pp. 21-32. Springer, Heidelberg (2012).
9. D. Castelli, P. Pagano, G. Coro. Variazioni Climatiche ed Effetto sulle Specie Marine (Climate Changes and Effect on Marine Species). In the book: “Le Tecnologie del CNR per il Mare” (CNR Technologies for the Sea) p. 139, Ed. CNR 2013 (Roma).
10. D. Castelli, P. Pagano, G. Coro, F. Sinibaldi. Modellazione della Nicchia Ecologica di Specie Marine (Marine Species Ecological Niche Modelling). In the book: “Le Tecnologie del CNR per il Mare” (CNR Technologies for the Sea) p. 140, Ed. CNR 2013 (Roma).
11. D. Castelli, P. Pagano, G. Coro. Elaborazione di Dati Trasmessi da Pescherecci (Processing of Vessel Transmitted Information). In the book: “Le Tecnologie del CNR per il Mare” (CNR Technologies for the Sea) p. 133, Ed. CNR 2013 (Roma).
12. C. MacLeod. Habitat representativeness score (HRS): a novel concept for objectively assessing the suitability of survey coverage for modelling the distribution of marine species. Journal of the Marine Biological Association of the United Kingdom 90 (07) (2010) 1269-1277.