Vision and Implementation of a Federated Cloud Infrastructure for Research
Matteo Turilli, David Wallom
• NGS – EPW2: EPSRC funded, 2 years, Oxford node
• FleSSR: JISC funded, 10 months
• MyTrustedCloud
The Oxford e-Research Centre
• 5 Years old, from 3 to >50 staff
• Building on the UK e-Science programme
• Hosting both research projects and infrastructure services
• Projects and Applications in:
• Astrophysical simulations
• Biochemistry
• Computational biology
• Classics digital assets
• Climate system simulation
• Computational finance
• Energy systems management
• Environment and climate science
• Medical data sharing
• Neuroscience data management
• Social science data analysis
• ....
Research Computing Services within the Oxford e-Research Centre
• Visualisation: Data
• 0.6PB Storage
• Grid Computing: NGS, EGI, Campus Grid
• High Performance Computing: Clusters, Shared memory
• Cloud Computing: Eucalyptus Clouds
• Volunteer Computing: Climate Prediction
• Training: HPC, Matlab, CUDA, Computational Chemistry
The UK-NGS
To enable coherent electronic access for all UK researchers to all computational and data-based resources and facilities required to carry out their research, independent of resource or researcher location.
UK e-Infrastructure
[Diagram: LHC; ISIS TS2; HPCx + HECToR; HEIs; Regional and Campus grids; Community Grids; VRE, VLE, IE; integrated internationally]
Users get common access, tools, information and nationally supported services through NGS.
Extracting Knowledge from the Data Deluge
Example: La2-xSrxNiO4 (H. Woo et al., Phys. Rev. B 72, 064437 (2005))
The Infrastructure-User Relationship
Infrastructure Provider
(Research & Commercial – National, European & Global)
End-Users & Community Technology Experts
(National, European & Global Collaborations)
The Infrastructure-User Relationship: The Problem
Infrastructure Co-ordinator
(Research & Commercial – National, European & Global)
[Diagram: five separate groups, each made up of Community End-Users and Technology Experts]
Multiple Independent Infrastructure Providers, each independent and supporting individual mixes of communities
Explosion of User Communities on e-infrastructure
• Increasing the diversity of the users
• Provide more diverse services for more users
• Scale out the support model
• Demanding an increase in the flexibility of the infrastructure
• Different communities have different technological demands
• Supported by a limited set of physical infrastructure and technology experts
• With a limited set of common interfaces
A Virtualised Future?
[Diagram built up over six slides. With thanks to Steven Newhouse.]
First slide:
• Infrastructure Providers (Research & Commercial – National, European & Global)
• End-Users & Community Technology Experts (National, European & Global Collaborations)
Final slide:
• End-Users (National, European & Global Collaborations)
• Experts (Communicating between users & providers)
• Community Specific Operations Staff
• Community Specific Applications
• Community Specific Appliances: utilising community specific applications in appliances
• STANDARDS
• Infrastructure Providers (Research & Commercial – National, European & Global)
What does this mean?
• Movement of current services to VM
• No ‘big-bang’ migration - gradual change transparent to the end-user
• Clearly identifying the role of the expert
• Already residing in the communities
• Increase the flexibility of the infrastructure
• Allow experts to configure resources
• Meet the immediate needs of their users
• Supporting interdisciplinary tool usage
• Experts from other communities have access, demonstrate
Cloud Infrastructure for Research
Centralisation Vs Federation
• Centralisation: one large, dedicated datacentre that serves the national HEI demand
• Federation: a heterogeneous set of local infrastructures is coordinated nationally to satisfy the HEI demand
Evaluation criteria
• Funding
• Scalability
• Flexibility
• Maintenance
• Support
• Accountability
• Obsolescence
• Competitiveness
• Security
UK Federated Cloud System
• Central core services
• Registration, Authn & Authz
• Accounting
• Monitoring
• Service Discovery
• Standard cloud interfaces
• Currently de facto, through a single IaaS provider
• Utilise a meta layer to abstract the exact cloud from the user
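A minimal sketch of what such a meta layer could look like, assuming each federated site is wrapped in an adapter with one common interface. The class names, the `free_slots` field and the "most free slots" placement policy are illustrative assumptions, not part of NGS, Eucalyptus or EoverI.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    provider: str      # federated site that runs the VM
    image: str         # image the VM was started from
    instance_id: str


class CloudProvider(ABC):
    """Common interface the meta layer sees for every federated cloud."""

    def __init__(self, name: str, free_slots: int) -> None:
        self.name = name
        self.free_slots = free_slots
        self.launched = 0

    @abstractmethod
    def run_instance(self, image: str) -> Instance:
        ...


class FakeSite(CloudProvider):
    """Stand-in for one site; a real adapter would call the site's own API."""

    def run_instance(self, image: str) -> Instance:
        if self.free_slots == 0:
            raise RuntimeError(f"{self.name} has no free slots")
        self.free_slots -= 1
        self.launched += 1
        return Instance(self.name, image, f"{self.name}-{self.launched}")


class MetaLayer:
    """Hides which cloud serves a request (hypothetical placement policy)."""

    def __init__(self, providers):
        self.providers = list(providers)

    def run_instance(self, image: str) -> Instance:
        # Assumed policy: pick the site with the most free slots.
        site = max(self.providers, key=lambda p: p.free_slots)
        return site.run_instance(image)


federation = MetaLayer([FakeSite("oxford", 2), FakeSite("reading", 5)])
vm = federation.run_instance("centos-5.4")
print(vm.provider, vm.instance_id)
```

The user only ever calls `federation.run_instance`; which site answers is decided behind the interface, which is the point of the abstraction.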
Resource providers
• Eucalyptus Instances
• University of Oxford (NGS/FleSSR)
• University of Edinburgh (NGS/FleSSR)
• University of Reading (FleSSR)
• Imperial College
• Manchester University
• Eduserv (Commercial/Charity)
• NGS Core Services
• EoverI Meta layer
• Distributed Storage to allow sharing of instances and data services
Resource Centre View
[Diagram (Cloudscape III - EGI Use Case): a Resource Centre used by Experts and End-Users, with VMs on a VM Management Layer over Compute Resources, Storage Resources and Network Resources, plus Information Services, VM Image Repository, Management Control, Wide Area Access Control and Wide Area Message Bus]
Standards shown on the diagram: GLUE 2.0, OCCI, CDMI, OVF, JMX, UsageRecord, SAML, SRM, xFTP (© 2009 Open Grid Forum)
IaaS Cloud Interoperability Profile (IaaSCIP)
Oxford e-Research Centre
NGS Cloud Activities
• NGS Agile Deployment Environments
EPSRC funded, 2 years
• Staff:
• David Wallom (OeRC, Oxford);
• David Fergusson (NeSC, Edinburgh);
• Steve Thorn (NeSC, Edinburgh);
• Matteo Turilli (OeRC, Oxford).
• Goals:
• EC2 compatible, open source solution;
• development of a dedicated pool of images;
• collecting data about feasibility, costs, stability;
Eucalyptus Vs Nimbus, OpenNebula, OpenStack
Eucalyptus Pros
• Very good implementation of EC2 and EBS APIs;
• Enterprise support offered by Canonical through UEC;
• Dedicated installation in UEC;
• Modular design;
• Xen and KVM compatible;
• Open source and commercial.
Eucalyptus Cons
• Design limitations;
• AAA.
The others
• Limited EC2 API implementation;
• No native support for EBS;
• Globus WS4 (Nimbus);
• Early development stage;
• Slow development.
To keep an eye on
• OpenNebula 2.2 (to be tested);
• OpenStack Compute and
Eucalyptus Architecture
Eucalyptus Network Architecture
[Diagram: Users, The Internet, Cloud, Cloud/Cluster Controller (eth0, eth1), Node Controller with hypervisor and vBridge, VM (eth0); addresses shown: 129.67.2.254, 192.168.2.1, 192.168.2.2, 129.67.2.1]
Eucalyptus Data Architecture
2 data storage systems:
• Walrus (S3): enables the creation of private ‘buckets’, repositories of OS, kernel and initrd images. Users can create them via euca-tools, s3curl, s3cmd and s3fs;
• Storage Controller (SC): enables the creation of EBS volumes. An EBS volume is attached to a VM as a persistent raw volume. If properly unmounted, EBS volumes persist across attachments (to different VMs) and VM reboots.
[Diagram: Storage Controller, Node Controller, VM, Walrus: scp image]
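The persistence rule for EBS volumes can be illustrated with a toy model. This is a sketch of the semantics only, assuming a volume is a block store with at most one attachment at a time; the `Volume` class and its methods are hypothetical and are not the Eucalyptus or EC2 API.

```python
class Volume:
    """Toy EBS-style volume: raw blocks that outlive any single VM."""

    def __init__(self, volume_id: str, size_gb: int) -> None:
        self.volume_id = volume_id
        self.size_gb = size_gb
        self.blocks = {}          # block number -> bytes, kept across detach
        self.attached_to = None   # id of the VM currently holding the volume

    def attach(self, vm_id: str) -> None:
        if self.attached_to is not None:
            raise RuntimeError(f"{self.volume_id} already attached to {self.attached_to}")
        self.attached_to = vm_id

    def detach(self) -> None:
        # Models a clean unmount followed by detach; the blocks are untouched.
        self.attached_to = None

    def _check(self, vm_id: str) -> None:
        if self.attached_to != vm_id:
            raise PermissionError(f"{vm_id} does not hold {self.volume_id}")

    def write(self, vm_id: str, block: int, data: bytes) -> None:
        self._check(vm_id)
        self.blocks[block] = data

    def read(self, vm_id: str, block: int) -> bytes:
        self._check(vm_id)
        return self.blocks[block]


vol = Volume("vol-demo", size_gb=10)
vol.attach("vm-a")
vol.write("vm-a", 0, b"results")
vol.detach()

vol.attach("vm-b")               # a different VM picks up the same volume
recovered = vol.read("vm-b", 0)  # data written by vm-a is still there
print(recovered)
```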
Users
• Login/password: users connect via https to the cloud controller web interface. Register, get approved, log into the web interface and download a zipped x509 certificate;
• x509: credentials used to interrogate the cloud controller about zone, images, instances, storage blocks, etc;
• Key pair: generated by the users to access running instances. The key is injected into the instance at creation time.
Cluster/Node Controllers
• Key pair: each NC is registered with the CC. A key is created on the CC and synchronised via scp.
Client Tools – Command Line Interface
Euca-tools
• WS-tools clone
Client Tools – Hybridfox
• Relatively intuitive GUI;
• Firefox extension based on ElasticFox -> multiplatform;
• Specifically tailored for Eucalyptus;
• Accepts multiple identities -> doubles as management tool for the cloud administrators;
• CLI interface still required to
Client Tools – RightScale Gems RightAws
• Ruby-language interface;
• Open-source;
• Not specifically tailored for Eucalyptus;
• Works with EC2 and EBS. To be tested with S3 clone.
• Flaky with authorisation of security groups;
• May require Hybridfox or
NGS Cloud Prototypes
Oxford III
6 x 2 AMD 2 core; 8GB RAM. 1 x 4 AMD 2 core; 32GB RAM.
• CentOS 5.4;
• Eucalyptus 1.6.2 installed from rpm repositories;
• Ganglia and Nagios monitoring systems;
• 5 default VM templates = 44/44/22/22/11 VMs (editable);
NGS Cloud Prototypes
Oxford IV
3 x 4 Xeon 6 core; 48GB RAM. 2 x 1 Xeon 2 core; 32GB RAM.
• Ubuntu 10.10;
• Ubuntu Enterprise Cloud;
• 2+2 bonded public NICs on CC;
• 12TB EBS, 12TB Walrus on SED disks;
NGS Cloud Prototypes
Edinburgh II
• 32 x Sun Fire X4100
• Dual-core, 2.8 GHz Opteron, 8 GB RAM, 70 GB RAID1
• 64 cores
• 1 Headnode (Cloud and Cluster controllers)
• 31 Nodes (Node controller)
• Max 2 VMs per core: 124 slots (2GB RAM)
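The slot count follows from the figures on the slide. The short calculation below reproduces it, assuming 2 cores per server (32 servers, 64 cores) and reading "2GB RAM" as the memory available per slot; it ignores any memory reserved for the host OS and hypervisor.

```python
servers = 32
head_nodes = 1
cores_per_server = 2      # assumption implied by 32 servers = 64 cores
vms_per_core = 2
ram_per_server_gb = 8

nodes = servers - head_nodes                      # servers running the Node Controller
slots_per_node = cores_per_server * vms_per_core  # VM slots on each node
total_slots = nodes * slots_per_node
ram_per_slot_gb = ram_per_server_gb / slots_per_node

print(nodes, total_slots, ram_per_slot_gb)  # 31 124 2.0
```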
Managing and Monitoring
Tools
• Hybridfox + euca-tools: overall cloud usage and status + testing;
• Landscape: Canonical's management solution for UEC, not open source. RightScale was not tried, as it is fairly expensive and hosted;
• Linux CLI: dedicated scripts to monitor logs and daemon status.
Issues
• Public IP Database corruption (addressed in version 2);
• No user quota on the open source version of Eucalyptus;
• No accounting on the open source version of Eucalyptus;
• VERY verbose, not persistent logs;
User Support
Tools
• Ticketing system: web-based platform (Footprints). Addressed around 200 tickets in 1 year;
• Web site: subscription instructions, links to Eucalyptus documentation and to the support e-mail;
• Mailing list: used mainly to announce new services, scheduled or unscheduled downtime, planned upgrades.
Issues
• Access through institutional firewall via proxy;
• Available resources (limitation of Eucalyptus design);
• Instructions on how to build a dedicated image;
NGS Cloud Usage 2010/2011
• 106 registered users: uptake has been very fast and constant throughout the whole testing period;
• 26 institutions: 23 HEIs (both universities and colleges), 3 companies;
• 30 projects;
• 10 research areas: Life sciences, Teaching, Mathematics, Cloud R&D, Physics, Ecology, Geography, Medicine, Social Science, Engineering.
Exemplar Case Studies
• Evolutionary Genomics: “analysis and Information management of Next Generation Sequencing (NGS) of Genomic data poses many challenges in terms of time and size. We are exploring the translation of high quality NGS scientific analysis pipelines to make best use of Cloud infrastructure”;
• Geospatial Science: “geospatial data is a mix of raster and vector data. As rasterizing is CPU-hungry process, and all maps displayed on the screen of the final user are rasters, it is more efficient to do the process on the server side. I am investigating how this process can be dispersed across many, if not unlimited instances in a cloud”;
• Agent-based modelling of crime: “at the moment I have a tomcat server that hosts some web services used to run social simulation model, it needs access to the file system to run fortran scripts, create files etc. There are loads of problems with running our own server at uni and I think a virtual machine that I could have control over would be much better”.
Oxford e-Research Centre
Research and Development
Flexible Services for the Support of Research (FleSSR)
6 Partners
• Academic and industrial;
• 3 cloud infrastructures.
Goals
Building a federated cloud infrastructure, extending the use of NGS central services with cloud brokering and accounting.
Use cases
• Multi Platform Software Development;
FleSSR Architecture
[Diagram: Oxford, Reading, Eduserv; Zeel/i Broker; STFC/NGS; Accounting Database]
FleSSR Infrastructure
• Local/Global: services depend on either local or global access. Cloud brokering is not mandatory for AWS-like service access;
• Multiple identities: every user may have multiple identities, both local and global;
• Only personal identities: group identities are not implemented. The management of every single identity is left to the legally responsible user;
• Multiple AA technologies: AA may differ depending on local and global policies/technologies;
• Multiple accounting: every single identity is accounted for its
FleSSR Use Case: Multi Platform Software Development
[Diagram: Zeel/i Broker; Instance configuration manager; Build manager; CVS / SVN repository; FleSSR cloud running Build instances 1 to 5]
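A rough sketch of the fan-out this use case implies: one revision is built in parallel on several build instances, one per platform. The platform names, the revision label and the `build_on_instance` placeholder are hypothetical; the actual FleSSR build manager is a Perl utility plus a Java client and is not reproduced here.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical platform labels, one per build instance in the diagram.
PLATFORMS = ["platform-1", "platform-2", "platform-3", "platform-4", "platform-5"]


def build_on_instance(platform: str, revision: str) -> dict:
    """Placeholder for: start a VM for `platform`, check out `revision`, build."""
    return {"platform": platform, "revision": revision, "status": "ok"}


def build_all(revision: str, platforms=PLATFORMS) -> list:
    """Run one build per platform concurrently; results keep platform order."""
    with ThreadPoolExecutor(max_workers=len(platforms)) as pool:
        return list(pool.map(lambda p: build_on_instance(p, revision), platforms))


results = build_all("example-revision")
for r in results:
    print(r["platform"], r["status"])
```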
FleSSR Use Case: On demand Research data storage
[Diagram: Zeel/i Broker; Volume Manager; FleSSR cloud; VM; EBS Interface; EBS Volume]
FleSSR Output
Code
• Instance configuration and build manager: Perl command line utility + Java client utilising the Zeel/i API;
• Personal EBS volume manager: web-based Java client for handling EBS volumes + tailored VM image with multiple data interfaces (SFTP, WebDAV, GlusterFS, rsync, ssh);
• Eucalyptus open-source accounting system: Perl aggregators and parsers for standard Eucalyptus open-source log files + MySQL accounting database + PHP accounting client.
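The parse, aggregate and store pipeline of the accounting system can be sketched as follows. This is an illustration in Python rather than the project's Perl code: the log line format is invented for the example (real Eucalyptus logs look different), and an in-memory SQLite database stands in for MySQL.

```python
import re
import sqlite3
from datetime import datetime

# Hypothetical log excerpt; the format is an assumption made for this sketch.
LOG = """\
2011-03-01 09:00:00 START user=alice instance=i-0001
2011-03-01 10:00:00 START user=bob instance=i-0002
2011-03-01 11:00:00 STOP user=bob instance=i-0002
2011-03-01 12:30:00 STOP user=alice instance=i-0001
"""

LINE = re.compile(
    r"(?P<ts>\S+ \S+) (?P<event>START|STOP) user=(?P<user>\S+) instance=(?P<inst>\S+)"
)


def parse(log: str):
    """Parser: yield (timestamp, event, user, instance) for each matching line."""
    for line in log.splitlines():
        m = LINE.fullmatch(line.strip())
        if m:
            ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S")
            yield ts, m["event"], m["user"], m["inst"]


def aggregate(events) -> dict:
    """Aggregator: sum instance-hours per user from START/STOP pairs."""
    started, hours = {}, {}
    for ts, event, user, inst in sorted(events):
        if event == "START":
            started[inst] = ts
        elif inst in started:
            elapsed = ts - started.pop(inst)
            hours[user] = hours.get(user, 0.0) + elapsed.total_seconds() / 3600
    return hours


def store(hours: dict) -> sqlite3.Connection:
    """Accounting database: one row per user (SQLite stands in for MySQL)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE usage_record (user TEXT PRIMARY KEY, instance_hours REAL)")
    db.executemany("INSERT INTO usage_record VALUES (?, ?)", list(hours.items()))
    db.commit()
    return db


db = store(aggregate(parse(LOG)))
rows = db.execute(
    "SELECT user, instance_hours FROM usage_record ORDER BY user"
).fetchall()
print(rows)  # [('alice', 3.5), ('bob', 1.0)]
```

An accounting client would then only need to query the `usage_record` table, which is the role the PHP client plays in the FleSSR output.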
Use cases
• SKA community testing of Use case 1;
MyTrustedCloud
4 Partners
• academic and industrial;
• 1 cloud infrastructure;
Goals
Integrating Trusted Computing technologies into Eucalyptus so as to guarantee and enforce the expected behaviour of data interfaces and reliable data handling.
Use cases
• Reliable and accountable data exchange;
MyTrustedCloud Architecture
[Diagram: Cluster Controller / Storage Controller with SAN and HW TPM; SED Disks for EBS volumes and S3 storage; Node Controller with HW TPM running three Trusted VMs]
Conclusions
• Utilisation of virtual infrastructure is the only scalable method to support a large number of disparate user communities;
• Federation as a robust and scalable model of national/European cloud infrastructure for research;
• Federation is only possible if standard interfaces are available;
• Very successful pilot test of multiple prototypes of cloud infrastructure;
• Crucial role played by Research & Development in customising open-source cloud infrastructure solutions to the specific needs of academic research.
Thank You
Matteo Turilli
David Wallom