Scientific and Technical Applications as a Service in the Cloud
Swiss Distributed Computing Day
University of Bern, 28.11.2011 – adapted version
Wibke Sudholt
CloudBroker GmbH
Technoparkstrasse 1, CH-8005 Zurich, Switzerland
Phone: +41 44 633 79 34
Email: [email protected]
Web: http://www.cloudbroker.com
All rights reserved. © CloudBroker GmbH
Overview
• High performance computing (HPC) in the cloud
• CloudBroker Platform
• Example: Protein modeling in the IBM Cloud for
ETH Zurich
• EuroCloud Swiss
Cloud Terms
Utility Computing
Computing on Demand
Software as a Service
Infrastructure as a Service
Platform as a Service
Multi-Tenancy
Public Cloud
Private Cloud
Cloud Storage
Pay-per-Use
Elasticity
Scalability
Hybrid Cloud
Cloud Bursting
Self Service
Virtualization
SOA
Clusters
Grids
ASP
Web Services
Internet
Cloud Computing Definition
• Access to computer resources on demand
without much initial investment in time or
money (self service)
• Only pay for what is actually used in small
steps (OpEx instead of CapEx)
• Nearly unlimited scalability (elasticity)
= Change in business model
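The OpEx-versus-CapEx point can be made concrete with a break-even calculation. The following is a minimal sketch, not a pricing model from the talk: the function name `break_even_hours` and all cost figures in the usage note are hypothetical and chosen only for illustration.

```python
def break_even_hours(capex: float,
                     owned_cost_per_hour: float,
                     cloud_price_per_hour: float) -> float:
    """Return the usage (in hours) above which owning hardware is cheaper.

    capex                 -- one-time purchase cost (hypothetical figure)
    owned_cost_per_hour   -- running cost of owned hardware per hour of use
    cloud_price_per_hour  -- pay-per-use price of the equivalent cloud resource
    """
    margin = cloud_price_per_hour - owned_cost_per_hour
    if margin <= 0:
        # The cloud is never more expensive per hour, so owning never pays off.
        raise ValueError("cloud price must exceed the owned running cost")
    return capex / margin


def cheaper_option(hours: float, capex: float,
                   owned_cost_per_hour: float,
                   cloud_price_per_hour: float) -> str:
    """Compare total cost of both options for a given number of usage hours."""
    owned_total = capex + owned_cost_per_hour * hours
    cloud_total = cloud_price_per_hour * hours
    return "cloud" if cloud_total < owned_total else "owned"
```

With an assumed purchase cost of 10'000, an owned running cost of 0.10 per hour and a cloud price of 0.50 per hour, the break-even point lies at roughly 25'000 hours of use. Below that, as in short-term or project-based work, pay-per-use is the cheaper option.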
Cloud Services
Software as a Service (SaaS)
• Web / office / business applications, …
• Salesforce, Google Apps, ...
Platform as a Service (PaaS)
• Development / deployment frameworks, distribution / messaging / monitoring systems, databases, …
• Microsoft Windows Azure, Google App Engine, ...
Infrastructure as a Service (IaaS)
• Computing power, virtual machines, storage space, …
• Amazon Web Services, IBM SmartCloud Enterprise, ...
Problems of Traditional HPC
• Scientific and technical applications
– Complex algorithms and applications needing HPC resources (supercomputers, clusters, grids)
– Mainly used in research and development (R&D), often project-based, with increasing importance
• HPC computer infrastructure, middleware tools and application software
– Require expert knowledge
– Expensive, time-consuming and complex to buy, set up, use and maintain
– Hard to integrate with existing systems and processes
– Often operating at capacity limit
⇒ HPC is hardly accessible or affordable for SMEs / small research groups, specialized application purposes or short-term projects
Advantages of Cloud for HPC
• Immediate access to computer resources on
demand
• Availability of resources not existing in-house
• Possibility for spill-over / cloud bursting
• Temporary, non-binding utilization
• Pay-per-use with minimal initial investment
• Nearly unlimited scalability
• Hardware and partly software maintained by
cloud providers
Challenges of Cloud for HPC
• HPC infrastructure, middleware and applications
remain complex to set up, use and maintain, even in
the cloud
• Dynamic features of the cloud and pay-per-use
billing add to the complexity
• Performance limitations for some HPC calculations
due to virtualization and available hardware
• Security concerns for R&D because of outsourcing,
internationality, SLAs, multi-tenancy and potential
vendor lock-in
• Hardware and software vendors have to adapt to the
pay-per-use and self-service business model
CloudBroker Platform
Solutions of CloudBroker GmbH
• Easy, scalable, secure, integrable and pay-per-use access to scientific and technical applications in the cloud
• HPC application store / marketplace with direct deployment and execution of applications in the cloud and one bill for everything
• Using infrastructure as a service (IaaS) from cloud providers
• Offering platform as a service (PaaS) for software vendors
• Providing software as a service (SaaS) to end users
• Application parameters and files remain the same as for local execution and can be easily exported
CloudBroker Platform
[Figure: CloudBroker Platform architecture]
• Users: R&D end users and software vendors
• Access: web browser UI, web service API, CLI; CloudBroker integration through generic workbenches and domain-specific gateways
• Applications: bioinformatics, molecular modeling, fluid dynamics, … applications
• Clouds: Amazon Cloud, IBM Cloud, … Cloud
Business Model
[Figure: business model. Parties: cloud broker, end users, software vendors, cloud providers. Flows between them: resources, applications, usage, payments ($).]
Security Frame: Transport Layer Security, Access Rights Security
Functionality
[Figure: functional components of the CloudBroker Platform]
• Users: end users, software vendors, clients, portals
• Interfaces: web browser UI, web service API
• Platform components: application manager, process manager, process monitor, resource manager, queuing system, storage manager, image manager, user manager, accounting module, billing module, payment module, scalability and fault tolerance handler, cloud provider access manager
• Cloud adapters: Amazon adapter, IBM adapter, … adapter
• Clouds: Amazon Cloud, IBM Cloud, … Cloud
Security
[Figure: security architecture spanning customer, CloudBroker and cloud provider]
• Customer: corporate IT with client browser or application; corporate security policies and standards
• CloudBroker: CloudBroker Platform in an industry standard secure data center; industry standard server and application security technology
• Cloud provider: security certified data center; security certified compute and storage cloud technology; cloud instances as dedicated, secured and restricted virtual machines
• Connections: SSL secured connections with authentication, authentication to the cloud and authentication to the VM
Typical Calculation Lifecycle
1. Prepayment (user)
2. Software selection and job creation (user)
3. Data file upload (user) to cloud storage (platform)
4. Job submission (user)
5. Compute instance startup or reuse (platform)
6. Data file upload from cloud storage to master instance (platform)
7. Calculation on compute instances (platform, application)
8. Data file download from master instance to cloud storage (platform)
9. Compute instance shutdown or reuse (platform)
10. Data file download (user) from cloud storage (platform)
11. Billing (platform)
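The lifecycle above can be written down as a small data structure, which makes it easy to see who is responsible for each step. This is a minimal sketch for illustration only: the names `Step`, `LIFECYCLE` and `steps_for` are hypothetical and are not part of the CloudBroker Platform API.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Step:
    number: int
    action: str
    actors: Tuple[str, ...]  # who takes part in the step, as named on the slide


# The eleven steps of the typical calculation lifecycle, in order.
LIFECYCLE = (
    Step(1, "Prepayment", ("user",)),
    Step(2, "Software selection and job creation", ("user",)),
    Step(3, "Data file upload to cloud storage", ("user", "platform")),
    Step(4, "Job submission", ("user",)),
    Step(5, "Compute instance startup or reuse", ("platform",)),
    Step(6, "Data file upload from cloud storage to master instance", ("platform",)),
    Step(7, "Calculation on compute instances", ("platform", "application")),
    Step(8, "Data file download from master instance to cloud storage", ("platform",)),
    Step(9, "Compute instance shutdown or reuse", ("platform",)),
    Step(10, "Data file download from cloud storage", ("user", "platform")),
    Step(11, "Billing", ("platform",)),
)


def steps_for(actor: str) -> List[int]:
    """Return the numbers of all lifecycle steps in which the actor takes part."""
    return [step.number for step in LIFECYCLE if actor in step.actors]
```

Calling `steps_for("user")` returns steps 1, 2, 3, 4 and 10: the user prepays, prepares and submits the job and collects the results, while instance handling and data staging in between are left to the platform.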
Current Applications
Application   Domain                                Remarks
GAMESS        Quantum chemical calculations
BLAST         DNA and protein sequence alignment
AutoDock      Protein-ligand docking
Gromacs       Molecular dynamics simulations
X! Tandem     Mass spectrometry data matching
OpenFOAM      Computational fluid dynamics
Rosetta       Protein modeling                      Only with own license
???           Computational fluid dynamics          In preparation
???           Material science                      In preparation
???           DNA and protein sequence alignment    Requested
???           Protein modeling                      Requested
…             …
Application Requirements
• Software characteristics
– Scientific and technical applications, open source or commercial, independent of domain
– Compute-intensive, not data-intensive
– Batch-oriented, non-interactive, command line, running for hours or days
– Installable on Linux
– Single-threaded, multi-threaded or parallel / MPI
• Deployment in the cloud
– Installation shell script and software package
– Configuration through the platform
– Selection of pricing options
– Validation and execution by the CloudBroker team
– Several software versions possible
Integration into Third Party Tools
• Provide all platform and cloud advantages within an environment known to the user
• Public or private, generic or domain-specific clients, workflows, workbenches, portals, etc.
• Utilize platform as cloud middleware in the background
• KNIME
– Konstanz Information Miner
– http://www.knime.org
– Workflow framework
• SCI-BUS
– SCIentific gateway Based User Support
– http://www.sci-bus.eu
– EU FP7 project
– 11 user communities from different domains
SCI-BUS Project
SCI-BUS is supported by the FP7 Capacities Programme under contract no. RI-283481
Example: Protein Modeling in the IBM Cloud
Scientific Background
• R&D group: Institute of Molecular Systems Biology
(IMSB) at ETH Zurich (http://www.imsb.ethz.ch)
• Goal: Better understand the mechanisms of
infectious diseases to fight antibiotic resistance
• Example: Streptococcus bacterium
• Method: Computationally model the 3D structures of
important proteins from their 1D sequence
• Software: Rosetta (http://www.rosettacommons.org)
• Analysis: Find the important structural differences
between less and more harmful bacteria strains
Infrastructure
• Problem: Calculations would need several months
on ETH Zurich’s compute cluster due to long
queue waiting times and low job throughput
• Calculations: Embarrassingly parallel and thus
highly scalable, compute-intensive and not
data-intensive, can be automated and outsourced
⇒ Perfect fit for cloud computing
• Solution: Use the CloudBroker Platform to deploy
the Rosetta software and manage data and
calculations on IBM SmartCloud Enterprise cloud
resources
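"Embarrassingly parallel" means that every job can run without communicating with any other job, so the workload can be spread over as many workers as are available. The sketch below illustrates the pattern only; `run_job` is a hypothetical placeholder and does not call Rosetta or the CloudBroker Platform, and a thread pool stands in for what would be separate compute instances in the cloud.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List


def run_job(job_id: int) -> int:
    """Placeholder for one independent calculation (hypothetical workload).

    A real job would start a single-threaded modeling run; here the job
    just computes a value from its own input and touches no shared state.
    """
    return job_id * job_id


def run_all(job_ids: Iterable[int], workers: int = 4) -> List[int]:
    """Run all jobs on a pool of workers and return results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() hands each job to a free worker; jobs never wait on each other.
        return list(pool.map(run_job, job_ids))
```

Because the jobs are independent, `run_all(range(5), workers=1)` and `run_all(range(5), workers=4)` return the same results; only the elapsed time changes. This property is what lets the number of compute instances be chosen freely.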
Project Architecture
Showcase Results
• 249 Streptococcus target proteins modeled using special Rosetta client for automation
• Up to 63 compute instances with 1’008 virtual CPUs in parallel provided by the IBM SmartCloud Enterprise
• Number of instances in the cloud automatically adjusted to the workload by the CloudBroker Platform
• Optimized data transfer between ETH Zurich file server and compute and storage instances in the cloud
• About 36’000 single-threaded jobs created by the client, managed by the platform and computed in the cloud
• Almost 250’000 CPU hours utilized for the production calculations
• About 2.3 million 3D protein structure models created
• Calculations finished within two weeks
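The reported figures can be cross-checked with simple arithmetic. The sketch below uses only the rounded numbers from the list above ("almost", "about"), so all derived values are approximate; the variable names are chosen here for illustration.

```python
# Rounded figures taken from the showcase results.
INSTANCES = 63          # peak number of compute instances
VIRTUAL_CPUS = 1008     # peak number of virtual CPUs
CPU_HOURS = 250_000     # "almost 250'000", so an upper bound
JOBS = 36_000           # "about 36'000" single-threaded jobs
MODELS = 2_300_000      # about 2.3 million structure models
PROTEINS = 249          # target proteins

# Virtual CPUs per instance at peak.
vcpus_per_instance = VIRTUAL_CPUS / INSTANCES

# Shortest possible wall-clock time if all CPUs had been busy throughout.
min_wall_hours = CPU_HOURS / VIRTUAL_CPUS
min_wall_days = min_wall_hours / 24

# Averages per job and per protein.
cpu_hours_per_job = CPU_HOURS / JOBS
jobs_per_protein = JOBS / PROTEINS
models_per_protein = MODELS / PROTEINS

print(f"{vcpus_per_instance:.0f} vCPUs per instance")
print(f"{min_wall_days:.1f} days minimum at full peak capacity")
print(f"{cpu_hours_per_job:.1f} CPU hours per job")
print(f"{jobs_per_protein:.0f} jobs and {models_per_protein:.0f} models per protein")
```

The numbers are mutually consistent: 1'008 virtual CPUs on 63 instances is 16 per instance, and 250'000 CPU hours on 1'008 CPUs take at least about 10 days, which fits a completion within two weeks once ramp-up and partially loaded phases are allowed for.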
EuroCloud Swiss
• Swiss association for
cloud computing
• Platform and lobbying for
cloud computing in
Switzerland
• http://www.eurocloudswiss.ch
• Representative of
Switzerland in the
EuroCloud Europe
network
• Collaboration with simsa
• Swiss Cloud Conference:
21.03.2012, Technopark
Zurich
• Swiss Cloud Award 2012
• Code of practice
• Certification
• …
Thank You
• CloudBroker management team, in particular
Nicola Fantini
• CloudBroker development team
• SystemsX.ch, in particular Dr. Peter Kunszt
• ETH Zurich, in particular Dr. Lars Malmström
• IBM, in particular Marcel Lautenschlager,
Roland Reifler and Stefan Ruckstuhl
• EuroCloud Swiss
For More Information
Contact
Dr. Wibke Sudholt
CloudBroker GmbH
Technoparkstrasse 1
CH-8005 Zurich
Switzerland
Phone: +41 44 633 79 34
Email: [email protected]
Web: http://www.cloudbroker.com