
Future computing platforms for biodiversity science


Academic year: 2021


(1)

www.bsc.es

Future computing platforms for biodiversity science

Daniele Lezzi

(2)

Motivation

Lack of service integration and interoperability of research e-Infrastructure

– e-IRG 2013 White Paper: e-Infrastructure Commons

– Provide e-Infrastructure services following the Cloud model

– Lack of standardization and concerns about security and vendor lock-in

Importance of open e-Infrastructures, sharing of raw data and results, and collaboration to enable more open science in Horizon 2020

– Neelie Kroes, Vice President, European Commission

– Involvement from the user communities, ensuring alignment between user needs and the development of the e-Infrastructures

– Re-use the results of successful projects

Lack of a common understanding of how best to deploy e-infrastructures for biodiversity and ecosystem research

– “A decadal view of biodiversity informatics: challenges and priorities” White Paper

(3)

BSC Distributed Computing Activities

[Figure: Clusters, Grids, Clouds]

– BSC expertise in distributed infrastructures and programming models

– Helping user communities with the porting and optimized execution of scientific applications on several computing platforms

(4)

BSC Distributed Computing Activities

COMPSs: a platform-unaware programming framework that simplifies the development and execution of applications on distributed infrastructures. The same application runs on:

– Clusters, Grids, Clouds

Interoperability through standards

Cloud interoperability:

– Commercial solutions: Azure, Amazon
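The following example makes the "same application runs on clusters, grids and clouds" idea concrete. In COMPSs the programmer writes an ordinary sequential program and marks some methods as tasks. The runtime detects data dependencies between task calls and schedules them on whatever infrastructure is configured. PyCOMPSs, the Python binding of COMPSs, expresses this with a `@task` decorator and `compss_wait_on`. The sketch below only imitates that style so it runs without COMPSs installed. `task` and `wait_on` are hypothetical stand-ins built on a local thread pool, not the COMPSs API, and a real runtime would ship tasks to remote workers instead.

```python
import functools
from concurrent.futures import Future, ThreadPoolExecutor

# Hypothetical stand-in for a COMPSs-like runtime: a local thread pool plays
# the role of the cluster, grid or cloud workers.
_POOL = ThreadPoolExecutor(max_workers=4)


def task(fn):
    """Turn fn into an asynchronous task: calling it returns a Future."""
    @functools.wraps(fn)
    def submit(*args):
        def run():
            # Wait for any Future arguments: this is the data dependency.
            values = [a.result() if isinstance(a, Future) else a for a in args]
            return fn(*values)
        return _POOL.submit(run)
    return submit


def wait_on(value):
    """Synchronisation point: block until a task result is available."""
    return value.result() if isinstance(value, Future) else value


@task
def square(x):
    return x * x


@task
def add(a, b):
    return a + b


def sum_of_squares(values):
    # Reads like sequential code, but the squares run in parallel.
    partials = [square(v) for v in values]
    total = partials[0]
    for p in partials[1:]:
        total = add(total, p)  # each add waits for its two inputs
    return wait_on(total)


print(sum_of_squares([1, 2, 3, 4]))  # prints 30
```

The application code (`sum_of_squares`) never mentions where tasks run. Only the stand-in runtime does, which is the property the slide attributes to COMPSs. A task is always submitted after the tasks it depends on, so the FIFO pool in this sketch cannot deadlock.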

(5)

VENUS-C: e-Science as a service

[Figure (July 05, 2011): VENUS-C Execution Environments on Cluster, IaaS and PaaS, each stack with Mgmt, Hosting, Glue and App layers]

[Figure: effort per deployment option (traditional on-premises; custom solution on an IaaS Cloud; custom solution on a PaaS Cloud; VENUS-C on an IaaS Cloud; VENUS-C in a PaaS Cloud), split into Systems Provisioning, Systems Administration, Platform Adaptation and E-Science applications]

Legend:

– E-Science: what the researcher cares about

– Overhead: what the researcher has to do, but does not want to

– Infrastructure: what the researcher expects from someone else

(6)

VENUS-C: e-Science as a service

[Figure: VENUS-C infrastructure, mapping cloud paradigms (IaaS, PaaS), cloud technologies (Windows Azure, OpenNebula), cloud providers (MSFT, EMOTIVE, ENG, KTH, BSC, plus the BSC supercomputer and customer on-premises resources, not in the cloud), operating systems (Windows, Linux), execution environments (EMIC Generic Worker, BSC COMPSs) and scenarios/algorithms by type of workload (parameter sweep, batch HTC, data flow, workflow, MapReduce, CEP)]

(7)

EU-Brazil OpenBio

EU-Brazil Open Data and Cloud Computing e-Infrastructure for Biodiversity

Combining Biodiversity Science and the Open Access Movement to deploy a joint European and Brazilian e-Infrastructure of open access resources supporting the needs of the biodiversity scientific community.

Two biodiversity use cases

Who benefits from EUBrazilOpenBio?

– EU & Brazilian biodiversity scientific communities

– Data and resource managers & Open Access community

– European & Brazilian policy and funding bodies

– Computing resources & software platforms

– Further EU-Brazil collaboration in support of the biodiversity area & infrastructures

(8)

EU-Brazil OpenBio

Integrating different technologies to make a large variety of services available for managing, manipulating and processing data and metadata within an autonomously managed infrastructure: gCube system, openModeller, COMPSs, EasyGrid AMS, VENUS-C, HTCondor, u.store.

Leveraging existing European, Brazilian and global data sources, ranging from species data (species names, synonyms, taxonomical classifications) to literature, occurrence maps and images: Catalogue of Life, List of Species of the Brazilian Flora, speciesLink, Biodiversity Heritage Library, Bioline International, Global Biodiversity Information Facility (GBIF).

Two use cases: Taxonomy Management and Ecological Niche Modelling

(9)

Use Case 2: Ecological Niche Modelling

31 718 angiosperms (flowering plants)

Assuming that 30% will have enough points to generate models (~9 000 species):

– 495k models, 540k tests, 90k projections: 10 months to generate all models!

But what if we want to generate models:

– For all ~43 thousand plant species from Brazil?

– Using more than one spatial resolution?

– Projecting into different environmental climatic scenarios?

– With global coverage?

Note: models may be regenerated every time new data is available for each species...

43 203 plant species from Brazil (as of 18 Sept. 2012)
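The slide's figures support a back-of-the-envelope estimate. Dividing the totals by the ~9 000 modelled species gives 55 models, 60 tests and 10 projections per species, or 1 125 000 jobs in 10 months. The sketch below extrapolates to the 43 203 Brazilian species. It relies on assumptions that are not in the slide: linear scaling, the same 30% share of modellable species, the same per-species job mix and a fixed throughput. `scaled_months` and its `resolutions` factor are hypothetical.

```python
# Figures taken from the slide; everything else is a labelled assumption.
ANGIOSPERMS = 31_718            # flowering-plant species considered
MODELLED_SPECIES = 9_000        # slide's "~9 000" (30% of 31 718 is about 9 515)
MODELS, TESTS, PROJECTIONS = 495_000, 540_000, 90_000
BASELINE_MONTHS = 10
BRAZIL_PLANT_SPECIES = 43_203   # count as of 18 Sept. 2012


def per_species_mix():
    """Average number of jobs of each kind per modelled species."""
    return {
        "models": MODELS / MODELLED_SPECIES,
        "tests": TESTS / MODELLED_SPECIES,
        "projections": PROJECTIONS / MODELLED_SPECIES,
    }


def scaled_months(species, resolutions=1):
    """Estimate run time by linear scaling from the 10-month baseline.

    Assumptions (not from the slide): the same 30% of species are modellable,
    the per-species job mix is unchanged, throughput is fixed, and every
    extra spatial resolution repeats the whole pipeline.
    """
    return BASELINE_MONTHS * (species / ANGIOSPERMS) * resolutions


total_jobs = MODELS + TESTS + PROJECTIONS
print(per_species_mix())
print(total_jobs, total_jobs / BASELINE_MONTHS)  # jobs overall and per month
print(round(scaled_months(BRAZIL_PLANT_SPECIES), 1))
print(round(scaled_months(BRAZIL_PLANT_SPECIES, resolutions=3), 1))
```

Under these assumptions, the 43 203 species would need roughly 13.6 months at a single spatial resolution. Each additional resolution would add about as much again, before any new climatic scenarios or global coverage are considered.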

(10)

EU-Brazil OpenBio

Use Case 2: Ecological Niche Modelling

Multi-stage and multi-parametric experiments are implemented through COMPSs and the openModeller software and managed through a Virtual Research Environment (VRE) portal.

COMPSs and the VENUS-C middleware are used to instantiate the workflows on cloud resources from different providers, dynamically deployed by COMPSs.

More details of the application will be provided in SESSION 8: TRAINING COURSES, HACKATHONS AND WORKSHOPS, this Friday.
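A multi-stage, multi-parametric ENM experiment can be pictured as a task graph. Every combination of species, algorithm and parameter set yields a model-creation task. Each model is followed by a test task and by one projection task per environmental scenario. The sketch below builds such a graph with `itertools.product` and runs it in dependency "waves" using Python's standard `graphlib.TopologicalSorter`. The sorter stands in for the orchestration COMPSs provides in the real system. The stage names, species, algorithms, scenarios and `run_task` are illustrative assumptions, not the openModeller or COMPSs interfaces.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter
from itertools import product


def build_enm_graph(species, algorithms, param_sets, scenarios):
    """Map each task to the set of tasks it depends on."""
    graph = {}
    for sp, alg, params in product(species, algorithms, param_sets):
        model = ("model", sp, alg, params)
        graph[model] = set()                        # needs only input data
        graph[("test", sp, alg, params)] = {model}  # evaluate the model
        for scenario in scenarios:
            # Project the model onto each environmental scenario.
            graph[("project", sp, alg, params, scenario)] = {model}
    return graph


def run_in_waves(graph, run_task, workers=4):
    """Run every task whose dependencies are done, wave by wave."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while sorter.is_active():
            ready = sorter.get_ready()
            list(pool.map(run_task, ready))  # independent tasks run in parallel
            sorter.done(*ready)
            waves.append(ready)
    return waves


graph = build_enm_graph(
    species=["species_a", "species_b"],         # hypothetical species
    algorithms=["algorithm_1", "algorithm_2"],  # hypothetical algorithm names
    param_sets=["p1"],
    scenarios=["present", "future"],
)
waves = run_in_waves(graph, run_task=lambda t: None)  # no-op task runner
print(len(graph), [len(w) for w in waves])
```

Processing in waves is simpler than the fully dynamic scheduling a workflow runtime performs, but it shows the same structure. All model creations are independent, and tests and projections become runnable as soon as their model exists.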

(11)

EU-Brazil OpenBio

[Figure: EUBrazilOpenBio architecture: ENM Service (OMWS2), COMPSs Workflow Orchestrator and VENUS-C Cloud Middleware, accessing the EGI Federated Cloud through the OCCI and CDMI interfaces]

Support for biodiversity communities

– Integration of the EUBrazilOpenBio solution in the EGI Federated Cloud

– Shared requirements between EUBrazilOpenBio and BioVeL

The EUBrazilOpenBio ENM service is exposed through an extended openModeller Web Service interface (OMWS2 in the figure).

The OMWS extensions are backwards compatible with the original specification, allowing existing clients, such as the Taverna Workflow Management System used in BioVeL, to be fully supported by the new implementation.

The EUBrazilOpenBio ENM Service is published in the BiodiversityCatalogue.
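OCCI (the Open Grid Forum's Open Cloud Computing Interface) and CDMI (SNIA's Cloud Data Management Interface) are the standards through which this stack reaches the EGI Federated Cloud, for compute and storage respectively. The sketch below only builds, without sending, the HTTP requests a client might issue in two cases. The first provisions a virtual machine through OCCI using the text/occi header rendering. The second stores a result file as a CDMI data object. The endpoint URLs, the VM title and the container name are placeholders. Authentication, which federated clouds require, is omitted.

```python
import json
from dataclasses import dataclass, field

# Hypothetical endpoints: a real site publishes its own OCCI and CDMI URLs.
OCCI_ENDPOINT = "https://occi.example.org:11443"
CDMI_ENDPOINT = "https://cdmi.example.org"
INFRA_SCHEME = "http://schemas.ogf.org/occi/infrastructure#"


@dataclass
class HttpRequest:
    method: str
    url: str
    headers: dict = field(default_factory=dict)
    body: bytes = b""


def occi_create_compute(title, cores, memory_gb):
    """Describe an OCCI 1.1 request (text/occi rendering) that creates a VM."""
    attributes = ", ".join([
        f'occi.core.title="{title}"',
        f"occi.compute.cores={cores}",
        f"occi.compute.memory={memory_gb}",  # memory is expressed in GB
    ])
    return HttpRequest(
        method="POST",
        url=f"{OCCI_ENDPOINT}/compute/",
        headers={
            "Content-Type": "text/occi",
            # The Kind of the resource being created.
            "Category": f'compute; scheme="{INFRA_SCHEME}"; class="kind"',
            "X-OCCI-Attribute": attributes,
        },
    )


def cdmi_put_object(container, name, text):
    """Describe a CDMI request that stores a text data object in a container."""
    return HttpRequest(
        method="PUT",
        url=f"{CDMI_ENDPOINT}/{container}/{name}",
        headers={
            "Content-Type": "application/cdmi-object",
            "Accept": "application/cdmi-object",
            "X-CDMI-Specification-Version": "1.0.2",
        },
        body=json.dumps({"mimetype": "text/plain", "value": text}).encode(),
    )


vm = occi_create_compute("enm-worker", cores=2, memory_gb=4.0)
result = cdmi_put_object("enm-results", "model.txt", "placeholder model output")
print(vm.method, vm.url)
print(result.method, result.url)
```

Either request could then be sent with `urllib.request.Request(r.url, data=r.body, headers=r.headers, method=r.method)` once credentials are added. Keeping request construction separate from transport makes the standard-specific parts easy to inspect.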

(12)

EUBrazil Cloud Connect

The main objective is the creation of a federated e-infrastructure for research using a user-centric approach.

To achieve this, we need to pursue three objectives:

– Adaptation of existing applications to tackle new scenarios emerging from cooperation between Europe and Brazil, relevant for both continents and with high social impact and innovation.

– Integration of frameworks and programming models for scientific gateways and complex workflows, addressing not only the requirements of the selected use cases but also a potentially much larger user community.

– Federation of resources, to build up a general-purpose infrastructure comprising existing and heterogeneous resources.

Additionally, EUBrazilCC will carry out an active and intensive dissemination campaign, analyse innovation, and foster the involvement of Brazilian institutions in the definition of cloud standards. It will also be the first example of the internationalisation of the EU Cloudscape series.

(13)

EUBrazil Cloud Connect

Use Case 1: Leishmania Virtual Laboratory

Objective: Improve knowledge of the distribution of, and susceptibility to, epidemic outbreaks of Leishmaniasis.

Technical Challenge: Easy access to computing and data federation for applications defined as workflows.

Means: Integrate access to distributed species/specimen databases of parasites and vectors from Fiocruz (CLIOC and COLFLEB) and ISCIII, and to species occurrence databases (speciesLink), into several processing pipelines (Bioinformatics and ENM) running on the cloud.

Existing resources: Databases, openModeller, EUBrazilOpenBio ENM service, NGS pipelines.

Required developments: Integrate pipelines in the different workflow
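One step such an integration needs is merging occurrence records from several databases before they feed an ENM pipeline. The minimal sketch below assumes each source has already been fetched into a list of records. Duplicates are dropped when the normalised species name and the coordinates, rounded to about 100 m, agree. A species is kept for modelling only if it has a minimum number of points. The `Occurrence` record, the rounding precision, the threshold and the sample records are hypothetical, not the schemas or contents of CLIOC, COLFLEB or speciesLink.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    species: str
    lat: float
    lon: float
    source: str  # which database the record came from


def normalise(name):
    """Collapse whitespace and case so spelling variants compare equal."""
    return " ".join(name.split()).lower()


def merge_occurrences(*sources, precision=3):
    """Concatenate sources, keeping the first record of each duplicate group."""
    seen = set()
    merged = []
    for records in sources:
        for rec in records:
            # 3 decimal places of a degree is roughly 100 m of latitude.
            key = (normalise(rec.species), round(rec.lat, precision), round(rec.lon, precision))
            if key not in seen:
                seen.add(key)
                merged.append(rec)
    return merged


def modellable_species(records, min_points):
    """Species with enough distinct points to attempt a niche model."""
    counts = Counter(normalise(r.species) for r in records)
    return sorted(sp for sp, n in counts.items() if n >= min_points)


# Made-up sample records from two hypothetical sources.
source_1 = [Occurrence("Species A", -22.9, -43.2, "db1"),
            Occurrence("Species B", -15.8, -47.9, "db1")]
source_2 = [Occurrence("species  a", -22.9, -43.2, "db2"),  # duplicate of the first
            Occurrence("Species A", -23.5, -46.6, "db2")]

merged = merge_occurrences(source_1, source_2)
print(len(merged), modellable_species(merged, min_points=2))
```

In practice the threshold would depend on the modelling algorithm. This is the "enough points to generate models" criterion from the ENM use case, expressed as a parameter.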

(14)

EUBrazil Cloud Connect

Use Case 3: Biodiversity and Climate Change

Objective: Understand the impact of climate change on terrestrial biodiversity through two workflows based on Earth observation and ground-level data.

Technical Challenge: Integrate parallel data analysis with other processing workflows in a geographically distributed environment.

Means: Integration of models of plant species distribution with multi-level image data and processing in a scientific gateway.

Existing resources: Parallel Data analysis service, satellite data, vegetation images, openModeller, EUBrazilOpenBio ENM service.

Required developments: Adapting both simulators to cooperate by
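A common pattern for parallel data analysis over distributed data is to compute a summary per chunk and then combine the partial results. This works because the combination step is associative. The sketch below illustrates the pattern with the per-tile sum, count, minimum and maximum of a gridded variable, computed in parallel and then merged with `functools.reduce`. The tile layout, the use of `None` for missing cells and the thread pool are assumptions for illustration. This is not the Parallel Data Analysis service named above.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

EMPTY = (0.0, 0, math.inf, -math.inf)  # identity element for combine()


def tile_stats(tile):
    """Partial summary (sum, count, min, max) of one tile; None = missing cell."""
    values = [v for v in tile if v is not None]
    if not values:
        return EMPTY
    return (float(sum(values)), len(values), min(values), max(values))


def combine(a, b):
    """Merge two partial summaries; associative, so order does not matter."""
    return (a[0] + b[0], a[1] + b[1], min(a[2], b[2]), max(a[3], b[3]))


def parallel_summary(tiles, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(tile_stats, tiles))  # map step, in parallel
    total, count, lo, hi = reduce(combine, partials, EMPTY)  # reduce step
    mean = total / count if count else math.nan
    return {"mean": mean, "count": count, "min": lo, "max": hi}


print(parallel_summary([[1, 2, None], [3], [None]]))
```

Only the small partial tuples need to move between sites, not the raw tiles. This is what makes the pattern attractive when the data sit in geographically distributed stores.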

(15)

EUBrazil Cloud Connect

[Figure: EUBrazilCC overview: use cases UC1 Leishmania Virtual Lab, UC2 Heart Simulation and UC3 Climate Change; validation & requirements; SaaS, PaaS and IaaS layers over computing & data resources; components JiT, OurGrid, CSGrid, ICFF, mc2, CloudScape, Parallel Data Analysis, COMPSs and e-Science Central; cross-cutting activities: dissemination, interoperability, business model & exploitation]

(16)

Conclusions: Future computing platforms

Virtual Research Environments for biodiversity

• Open Data Services
• Sharing and exploitation of results

Platform Services

• Programming Environments
• Execution Services

Innovative infrastructure services

• Hybrid Clouds
• Federated Services
• Interoperability

Future Infrastructures

• User-friendly approach for the uptake by the wider research community
• Built on previous experience
• Integration of services
• Interoperability
• Multidisciplinary science
• Technology transfer

(17)


Thank you!

http://compss.bsc.es

For further information please contact
