www.bsc.es
Future computing platforms for biodiversity science
Daniele Lezzi
Motivation
Lack of service integration and interoperability of research
e-Infrastructure
– e-IRG 2013 White Paper: e-Infrastructure Commons
– Provide e-Infrastructure services following the Cloud model
– Lack of standardization and concerns about security and vendor lock-in
Importance of open e-Infrastructures, sharing of raw data and results,
and collaboration to enable more open science in Horizon 2020
– Neelie Kroes, Vice President, European Commission
– Involvement from the user communities, ensuring alignment between user needs and the development of the e-Infrastructures
– Re-use the results of successful projects
Lack of a common understanding on how best to deploy
e-infrastructures for biodiversity and ecosystem research
– “A decadal view of biodiversity informatics: challenges and priorities” White Paper
Clouds
Grids
Clusters
BSC Distributed Computing Activities
BSC expertise in Distributed Infrastructures and Programming Models
Helping user communities in the porting and optimized execution of scientific applications on several computing platforms
BSC Distributed Computing Activities
COMPSs: Platform-unaware programming framework that simplifies the development and execution of applications in distributed infrastructures
Same application runs on:
– Clusters, Grids, Clouds
Interoperability through standards
Cloud interoperability
– Commercial solutions: Azure, Amazon
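The "platform-unaware" idea above can be illustrated with a small sketch: the application is written once against a generic interface, and the execution backend is chosen separately. This is not the COMPSs API. It uses Python's standard `concurrent.futures` as a stand-in, and `simulate`, `run_application` and `SerialExecutor` are hypothetical names introduced only for illustration.

```python
from concurrent.futures import Executor, Future, ThreadPoolExecutor


class SerialExecutor(Executor):
    """Hypothetical backend that runs each task immediately in the caller."""

    def submit(self, fn, /, *args, **kwargs):
        future = Future()
        try:
            future.set_result(fn(*args, **kwargs))
        except Exception as exc:  # report failures through the future
            future.set_exception(exc)
        return future


def simulate(x: int) -> int:
    """Hypothetical unit of work standing in for one scientific task."""
    return x * x


def run_application(executor: Executor, inputs: list[int]) -> list[int]:
    """Application logic: it only sees the generic Executor interface."""
    futures = [executor.submit(simulate, x) for x in inputs]
    return [f.result() for f in futures]


if __name__ == "__main__":
    data = [1, 2, 3, 4]
    # Same application code, two interchangeable backends.
    print(run_application(SerialExecutor(), data))
    with ThreadPoolExecutor(max_workers=2) as pool:
        print(run_application(pool, data))
```

In this sketch only the object passed to `run_application` changes between runs. The slides describe COMPSs as playing the analogous role across clusters, grids and clouds.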
VENUS-C: e-Science as a service
[Figure (July 05, 2011): VENUS-C Execution Environments. Diagram labels: App, Hosting, Glue, Mgmt, Cluster, IaaS, PaaS]
E-Science: What the researcher cares about
Overhead: What the researcher has to do, but does not want to
Infrastructure: What the researcher expects from someone else
[Figure: layers (Systems Provisioning, Systems Administration, Platform Adaptation, E-Science applications) compared across Traditional on-premises, Custom solution on an IaaS Cloud, Custom solution on a PaaS Cloud, VENUS-C (on an IaaS Cloud) and VENUS-C (in a PaaS Cloud)]
VENUS-C: e-Science as a service
[Figure: VENUS-C Infrastructure. Diagram labels:
Cloud Paradigm: IaaS, PaaS
Cloud Provider: MSFT, ENG, KTH, BSC
Cloud Technology: Windows Azure, OpenNebula, EMOTIVE
Operating System: Windows, Linux
Execution Environments: EMIC Generic Worker, BSC COMPSs
Type of workload: Parameter sweep, Batch, HTC, Data flow, Workflow, Map/Reduce, CEP
Also shown: BSC Supercomputer (not in the cloud), On Premises (not in the cloud), Customer, Scenarios / Algorithm]
EU-Brazil OpenBio
Further EU-Brazil
collaboration in support
of the biodiversity area &
infrastructures
Computing resources & SW platforms
EU & Brazilian biodiversity scientific communities
Data and resource managers & Open Access community
European & Brazilian policy and funding bodies
Who benefits from EUBrazilOpenBio?
Combining Biodiversity Science and the Open Access Movement to deploy a joint European and Brazilian e-Infrastructure of open access resources supporting the needs of the biodiversity scientific community.
Two biodiversity use cases
EU-Brazil Open Data and Cloud Computing e-Infrastructure for Biodiversity
EU-Brazil OpenBio
Catalogue of Life
Integrating different technologies to make a large variety of services available for managing, manipulating and processing data and metadata within an autonomously-managed
infrastructure: gCube system, openModeller, COMPSs,
EasyGrid AMS, VENUS-C, HTCondor, u.store
Leveraging existing European, Brazilian and global data sources ranging from species data (species names, synonyms, taxonomical classifications) to literature, occurrence maps and images: Catalogue of Life, List of Species of the Brazilian Flora,
speciesLink, Biodiversity Heritage Library, Bioline
International, Global Biodiversity Information Facility (GBIF).
Two use cases: Taxonomy Management and Ecological Niche
Modelling
Use Case 2: Ecological Niche
Modelling
31 718 angiosperms (flowering plants)
Assuming that 30% will have enough points to generate models (~9 000 species):
– 495k models, 540k tests, 90k projections
10 months to generate all models!
But what if we want to generate models for
– All ~43 thousand plant species from Brazil?
– Using more than one spatial resolution?
– Projecting into different environmental climatic scenarios?
– With global coverage?
Note: models may be regenerated every time new data is available for each species...
43203 species (18 Sept. 2012)
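The workload figures on this slide can be checked with a little arithmetic. The sketch below assumes that the quoted totals apply to the ~9 000 modelled species and that the job counts scale linearly with the number of species. That linear scaling, and reusing the 30% fraction for the 43 203 species, are assumptions made here for illustration, not statements from the slides. The function names are hypothetical.

```python
# Totals quoted on the slide for ~9 000 modelled species.
BASE_SPECIES = 9_000
TOTALS = {"models": 495_000, "tests": 540_000, "projections": 90_000}


def per_species(totals: dict[str, int], species: int) -> dict[str, float]:
    """Average number of jobs of each kind per species."""
    return {kind: count / species for kind, count in totals.items()}


def scaled_totals(totals: dict[str, int], base: int, target: int) -> dict[str, int]:
    """Job counts for `target` species, ASSUMING linear scaling with species count."""
    return {kind: count * target // base for kind, count in totals.items()}


if __name__ == "__main__":
    print(per_species(TOTALS, BASE_SPECIES))
    # Hypothetical scenario: 30% of the 43 203 listed species have enough points.
    target = round(43_203 * 0.30)
    print(target, scaled_totals(TOTALS, BASE_SPECIES, target))
```

Under these assumptions the quoted totals correspond to 55 models, 60 tests and 10 projections per species. Adding more resolutions, climate scenarios or global coverage would multiply these counts further.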
EU-Brazil OpenBio
Multi-staging and multiparametric experiments implemented through COMPSs and the openModeller software and managed through a Virtual Research Environment (VRE) portal
COMPSs and the VENUS-C middleware are used to instantiate the workflows on cloud resources from different providers, dynamically deployed by COMPSs
More details of the application will be provided in Session 8: Training Courses, Hackathons and Workshops, this Friday
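As a rough illustration of what a "multi-staging and multiparametric" experiment means, the sketch below expands every combination of species, algorithm and parameter set into one run with an ordered list of stages. It does not use COMPSs or openModeller. The stage names, species labels, algorithm labels and parameter values are all hypothetical placeholders.

```python
from itertools import product

# Hypothetical stage names; each run executes them in this order.
STAGES = ("create_model", "test_model", "project_model")


def expand_experiment(species, algorithms, parameter_sets):
    """Return one multi-stage run per (species, algorithm, parameters) combination."""
    runs = []
    for sp, alg, params in product(species, algorithms, parameter_sets):
        runs.append(
            {
                "species": sp,
                "algorithm": alg,
                "parameters": params,
                "stages": list(STAGES),
            }
        )
    return runs


if __name__ == "__main__":
    # Placeholder inputs, not taken from the project.
    runs = expand_experiment(
        species=["species_a", "species_b"],
        algorithms=["algorithm_1", "algorithm_2", "algorithm_3"],
        parameter_sets=[{"resolution": "coarse"}, {"resolution": "fine"}],
    )
    print(len(runs), "independent runs")
    print(runs[0])
```

Runs for different combinations are independent of each other, which is what makes this kind of experiment a natural fit for distributed execution. Only the stages inside a single run must be ordered.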
Use Case 2: Ecological Niche
Modelling
EU-Brazil OpenBio
ENM Service (OMWS2) VENUS-C Cloud Middleware
COMPSs Workflow Orchestrator
OCCI CDMI
EGI Federated Cloud
Support to biodiversity
communities
Integration of the EUBrazilOpenBio
solution in the EGI Federated Cloud
Shared requirements between
EUBrazilOpenBio and BioVeL
The EUBrazilOpenBio ENM service is exposed through an extended openModeller Web Service interface (OMWS2 in the
picture).
The OMWS extensions are backwards compatible with the original specification, allowing existing clients, such as the Taverna Workflow Management System in BioVeL, to be fully supported in the new implementation
The EUBrazilOpenBio ENM Service is published in the BiodiversityCatalogue
EUBrazil Cloud Connect
The main objective is the creation of a federated e-infrastructure for research using a user-centric approach.
To achieve this, we need to pursue three objectives:
– Adaptation of existing applications to tackle new scenarios emerging from cooperation between Europe and Brazil relevant for both continents and with high social impact and
innovation.
– Integration of frameworks and programming models for scientific gateways and complex workflows addressing not only the requirements of the selected use cases, but a potentially much larger user community.
– Federation of resources, to build up a general-purpose infrastructure comprising existing and heterogeneous resources
Additionally, EUBrazilCC will run an active and intense dissemination campaign, analyse innovation, and foster the involvement of Brazilian institutions in cloud standards definition. It will also be the first example of the internationalisation of the EU Cloudscape series.
EUBrazil Cloud Connect
Use Case 1: Leishmania Virtual
Laboratory
Objective: Improve knowledge of the distribution of, and susceptibility to, epidemic outbreaks of leishmaniasis
Technical Challenge: Easy access to computing and data federation
for applications defined as workflows.
Means: Integrate access to distributed species/specimen databases of parasites and vectors from Fiocruz (CLIOC and COLFLEB) and ISCIII, and to species occurrence databases (speciesLink), in several processing pipelines (Bioinformatics and ENM) running on the cloud.
Existing resources: Databases, openModeller, EUBrazilOpenBio
ENM service, NGS pipelines.
Required developments: Integrate pipelines in the different workflow
EUBrazil Cloud Connect
Use Case 3: Biodiversity and
Climate Change
Objective: Understand the impact of climate change on
terrestrial biodiversity through two workflows based on Earth observation and ground level data.
Technical Challenge: Integrate parallel data analysis with other
processing workflows in a geographically distributed environment.
Means: Integration of models of plant species distribution with
multi-level image data and processing in a scientific gateway.
Existing resources: Parallel Data analysis service, satellite data,
vegetation images, openModeller, EUBrazilOpenBio ENM
service.
Required developments: Adapting both simulators to cooperate by
[Figure: EUBrazil Cloud Connect overview. Diagram labels:
Dissemination; Interoperability; Business Model & Exploitation
SaaS, PaaS, IaaS
UC1 Leishmania Virtual Lab; UC2 Heart Simulation; UC3 Climate Change
Validation & Requirements
Computing & Data Resources
JiT, OurGrid, CSGrid, ICFF, mc2, CloudScape, Parallel Data Analysis, COMPSs, e-Science Central]
EUBrazil Cloud Connect
Conclusions: Future computing platforms
Virtual Research Environments for biodiversity
• Open Data Services
• Sharing and exploitation of results
Platform Services
• Programming Environments
• Execution Services
Innovative infrastructures Services
• Hybrid Clouds
• Federated Services
• Interoperability
Future Infrastructures
• User friendly approach for the uptake by the wider research community
• Built on previous experience
• Integration of services
• Interoperability
• Multidisciplinary science
• Technology transfer
Thank you!
http://compss.bsc.es
For further information please contact