A. Spinuso, L. Trani, A. Strollo and D. Bailo
Interaction with other IT projects: EUDAT2020, VLDATA, ENVRI PLUS, …
OUTLINE
§ WG1 and the “EIDA use case”
§ A modular and integrated development plan
§ Intra- and inter-community activities
§ EUDAT2020
§ VLDATA
§ ENVRI PLUS
WG1 Seismology and EIDA
EIDA is a distributed data centre established to securely archive seismic waveform data and metadata gathered by European research infrastructures, and to provide the geosciences research communities with transparent access to the archives.
WG1 Seismology and EIDA
EIDA started to operate formally under ORFEUS in 2013 with two bodies: the management board (EMB) and the technical commission (ETC). The EMB and ETC continually address challenges such as efficient data management, content metadata, quality maintenance, provenance and access services in a distributed network of archives.
The EIDA NG development plan
Intra-community activities
Discussing, proposing and implementing new standards within the seismological community:
§ The recent joint effort (EIDA-IRIS DMC) to define the usage of DOIs for seismic networks. “The FDSN through its WG III on Products, Tools, and Services has agreed upon a method to provide attribution to permanent and temporary seismic networks using FDSN Network Codes” http://www.fdsn.org/wgIII/V1.0-21Jul2014-DOIFDSN.pdf
§ The discussion continues within ORFEUS and EIDA about possible extensions of fdsnws and proposals for new web services as FDSN standards.
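For readers unfamiliar with fdsnws: the services are plain HTTP query interfaces. The sketch below shows how a client might assemble an fdsnws-dataselect request URL, assuming the standard path and parameter names of the FDSN web service specification. The host `eida.example.org`, the network code `XX` and the station `STA01` are placeholders that do not come from the slides, and no request is actually sent.

```python
from urllib.parse import urlencode


def dataselect_url(base_url, network, station, channel, start, end, location="*"):
    """Build an fdsnws-dataselect query URL (nothing is sent over the network)."""
    params = {
        "network": network,
        "station": station,
        "location": location,
        "channel": channel,
        "starttime": start,
        "endtime": end,
    }
    # urlencode percent-encodes reserved characters such as ':' in the timestamps
    return f"{base_url.rstrip('/')}/fdsnws/dataselect/1/query?{urlencode(params)}"


# Hypothetical node and station, for illustration only
url = dataselect_url(
    "http://eida.example.org",
    network="XX",
    station="STA01",
    channel="BHZ",
    start="2014-09-24T00:00:00",
    end="2014-09-24T01:00:00",
)
print(url)
```

Because every node exposes the same path and parameters, the same function could in principle be pointed at any node of the federation by changing only `base_url`.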
Inter-community activities
AAI, federated identity management:
§ Recently ORFEUS and EIDA started discussions with GEANT/eduGAIN about possible synergies. The EIDA developers are currently testing possible solutions supported by the eduGAIN experts.
§ A dedicated workshop (GEANT-ORFEUS-EIDA) is being organized to discuss the e-Infrastructure requirements of users, projects and collaborations in the broad field of Earth Sciences, and how to coordinate the support that e-Infrastructures give to users. Seismology is specifically targeted, and the workshop is open to other disciplines. http://www.geant.net/service/eduGAIN/Pages/home.aspx
Inter-community activities
E-Infrastructure providers (GEANT, EGI, PRACE, EUDAT) and Research Infrastructures (EIDA and EPOS), along with a small but representative number of users and projects, will meet to discuss the collaboration models of the projects in Earth Sciences and define how best the e-Infrastructures can support them.
From the perspective of the e-Infrastructures, topics which may be discussed in detail at the workshop include:
• Connectivity services and performance monitoring
• Federated Identity Management (AAI/eduGAIN)
• computing needs and types of computing (HTC, HPC, GPGPU, etc.)
• cloud provisioning of services: cloud computing and cloud storage, on-demand provisioning of virtual research environments
• user support model
Contact Luca and Alessandro if you would like to participate. To keep the workshop effective, the organizers would like to restrict the number of participants (1-2 representatives max for each EPOS TCS).
Workshop on E-Infrastructure support for Earth Sciences
Amsterdam, 22-23 January 2015
The EIDA and EUDAT interactions
[Timeline figure. Events shown: EUDAT project starts (Oct 2011); pilot projects with communities (KNMI/ORFEUS, SURFsara); call for collaboration; EIDA proposal submitted (led by INGV, KNMI, GFZ); EIDA proposal accepted; EUDAT2020 proposal submitted (INGV among the partners); EUDAT project ends. Further dates marked on the axis: Jan 2013, Jan 2014, Sep 2014, Oct 2014.]
Moving forward with EUDAT2020
- Consolidate and improve the CDI together with the scientific communities
- Provide full research data lifecycle support
- New services for data curation, provenance and workspaces
- Workflows and semantic web services
- Strengthen interoperability and collaboration with other e-infrastructure providers (e.g. PRACE, EGI, GEANT)
- Set up a robust, scalable, reliable and secure infrastructure to support the federation of data centres
- Enrich data streams with PIDs and detailed metadata
- Enable reproducibility and provenance (continue work done with DOI)
- Automatic safe replication of datasets on external data resources
- Effective exploitation of replicas: reliability, failover, disaster recovery, computation, optimisation
- Federated Discovery and Access
- Federated Identity Management (in discussion with eduGAIN/GEANT)
3rd EUDAT Conference, Amsterdam, The Netherlands | 24-25 Sep 2014
[Figure, shown in several build-up steps: the EIDA nodes IPGP, ETH/SED, GFZ, ODC, INGV and a new node, each attached to a “B2?!” service placeholder; shared functions listed: Data Discovery, Data Access, Data Computation, AAI, Citation, Workflows, …]
What data VERCE handles:
Synthetic seismograms, plots, 3D geometry, videos, KMZ packages, meshes and models.
(100 stations = 900 products and metadata)
6-10 GB for a single user run, on 1000 cores in ~1 hour.
Data volume will increase with the acquisition of observational data from EIDA, for the Misfit Calculation workflow.
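The figures above imply some rough per-unit numbers. The sketch below only restates the slide's arithmetic; the variable names are illustrative, "~1 hour" is treated as exactly one hour, and decimal units (1 GB = 1000 MB) are assumed.

```python
stations = 100
products = 900                 # products and metadata for those stations
cores = 1000
run_hours = 1.0                # "~1 hour" on the slide, taken as exactly 1 here
run_size_gb = (6.0, 10.0)      # lower and upper bound for a single user run

# Average number of products generated per station
products_per_station = products / stations

# Total compute budget of one run, in core-hours
core_hours = cores * run_hours

# Output volume per core-hour in MB, for the lower and upper bound
mb_per_core_hour = tuple(gb * 1000 / core_hours for gb in run_size_gb)

print(products_per_station)    # 9.0
print(mb_per_core_hour)        # (6.0, 10.0)
```

So a single run produces roughly 9 products per station and 6-10 MB of output per core-hour, before any observational data from EIDA is added.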
VLDATA Objectives
Create a secure, open and generic platform
- Efficient and cost-effective solutions for handling large-scale distributed data and heterogeneous infrastructures (EUDAT, EGI, PRACE)
Integrate and advance state-of-the-art e-Science technologies
- Integrate DIRAC and SCI-BUS and identify critical APIs helping users to unlock high scientific value from large data resources
Support a new generation of data scientists
- Co-design approach driven and validated by the user community.
Ensure the sustainability
- VLDATA will evolve around a reference model, demonstrating its effectiveness in
VLDATA and EPOS
Thirteen world-class large-scale scientific research infrastructures participate in VLDATA:
Astrophysics (GAIA, STARNET, LasMOG), Belle II, BESIII, CTA, DRIHM, EISCAT-3D, IceCube, LHCb, MolSim (MosGrid and CMMST), NA62, PAO and EPOS.
EPOS gains from VLDATA:
- Orchestrate and manage services provided by other initiatives (EUDAT, EGI, PRACE).
- Workflow management and enactment.
- Visualization services.
- Validation procedures via provenance exploration.
- Discovery services across catalogues (such as the EPOS catalogue) of observational and experimental data.
ENVRI+
Goal
Provide common solutions to shared challenges for European Environmental and Earth System Research Infrastructures (RIs) in their efforts to deliver new services for science and
[Figure: core work packages by discipline (Atmosphere, Oceans, Biological ecosystems, Solid Earth) crossed with user-oriented integration work packages (Integration and coordination; Users / data services; Users / services; Users / modelling; Outreach (internal + external) / dissemination).]
ENVRI+
Themes (WPs)
1. Sensors (new sensors, harmonize technologies)
2. Common solutions for data (discovery, use, access, workflows…)
3. Common policies for access
4. Interactions between RIs and society
5. Knowledge transfer and training best practices
ENVRI+ & EPOS
• 2.1 Reference model guided RI design
  – Reference model, ontological framework, design and implementation plan
• 2.3 Data processing and analysis
  – Efficiency of data processing, performance of the research infrastructures
• 2.4 Reference model guided RI design
  – Data curation, catalogue interoperability, data provenance and tracing
EGI Competence Centers
• BEFORE: EPOS archives provide high quality Earth Science data across Europe, but there is no federated computing infrastructure supporting the analysis of these data.
• AFTER: adoption of AAI and deployment of services to support the EPOS community at IaaS, PaaS and SaaS level, demonstrating that they fulfil the requirements of the researchers and enabling efficient data analysis that transparently uses distributed and federated resources.
EGI Competence Centers
• Contribution in EPOS ICS-d processing “modules”
• Results and requirements to be discussed between ICS-C and TCS
Potential contribution to EPOS (TCS, ICS)
EUDAT2020: focus on data curation services, via Common Data Services to various research communities, towards a Collaborative Data Infrastructure.
e.g. provide data management and PIDs to TCS nodes
VLDATA: gathers world-class expertise to investigate interoperable access to heterogeneous resources, flexible workflows and scientific gateways.
e.g. common interface to some ICS-d
ENVRI+: catalogue interoperability, data provenance and tracing, other data solutions
e.g. help EPOS ICS-C in producing an interoperable metadata catalogue
EGI CC: enabling efficient data analysis that transparently uses distributed and federated resources, AAAI solutions