Agenda
What is the Cloud?
Why would I want to use the cloud?
What will it cost?
Using LabKey in the cloud
How does LabKey use the cloud?
Other scientific tools in the Cloud
Introduction
Who am I
What is the cloud?
Wikipedia says:
Cloud computing is the delivery of computing as a service rather than a product, whereby shared resources, software, and information are provided to computers and other devices as a utility (like the electricity grid) over a network (typically the Internet).
- http://en.wikipedia.org/wiki/Cloud_computing
What does this mean to me?
rent vs buy
only pay for what you use
Types of Clouds
Datacenter as a Service*
Platform as a Service*
Software as a Service*
How non-IT folks see the cloud vendors.
Datacenter as a Service
From an API or GUI, you are able to provision and manage all pieces of a datacenter (servers, network hardware, storage, databases, etc).
What do you get?
Full control over:
servers (install and configure any way you want)
network hardware (firewall, load balancing)
Pay as you use, self-service, and support for custom configs
Access to all the vendor's services
Storage, Database, Message Queues, CDN, load balancers, firewall, etc.
Platform as a Service
“delivers a computing platform and/or solution stack as a service, often consuming cloud infrastructure and sustaining cloud applications.”
- Wikipedia (http://en.wikipedia.org/wiki/IaaS#Platform)
What do you get?
servers (or their equivalent)
storage service
database service, etc
no firewall, load balancing, etc
Failover, clustering, custom configs…. maybe
Platform as a Service (cont)
There are two types of Platform as a Service
You get a server(s)
essentially the old hosted server model
existing hosting and VPS companies are in this space
major vendors: Rackspace, GoGrid, IBM, etc
You write some code and hit the deploy button.
you never interact with the servers directly
your application code is bundled with “deployment descriptors” and sent to the “cloud” via API
Software as a Service
“deliver software over the Internet, eliminating the need to install and run the application on the customer's own computers and simplifying maintenance and support.”
- Wikipedia (http://en.wikipedia.org/wiki/IaaS#Application)
What do you get?
you are the end-user for this software
personalization available
pay as you use
Why would I want to use the cloud?
To meet a deadline….
A reviewer asked for the samples to be processed using a new method.
I need to process a large number of samples for a grant application
Prototyping: Try a new processing method
Proteomics: Use an updated FASTA file or additional parameter
Genomics: Reference sequence has changed
I have a new hypothesis and want to quickly re-process my data
Why would I want to use the cloud? (cont)
I want to try out new software to see if it meets my needs
LabKey Online
Galaxy’s free public server
UCSC Genome Browser
I want to automate my pipeline
Cyclecomputing.com (Push button HPC in the cloud)
Starcluster
CloudBioLinux
Why would I NOT want to use the cloud? (cont)
Processing huge amounts of data
Data transfer time is too long
small network pipe to the internet
transfer time + processing time in the cloud >= processing time on your laptop
I have a long-running study (a year or more) and I need the computing around 24x7
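The transfer-time trade-off above can be checked with a little arithmetic. The following is a minimal sketch; the data size, uplink speed, and processing times are hypothetical numbers chosen for illustration, not figures from this talk.

```python
def transfer_hours(data_gb: float, uplink_mbps: float) -> float:
    """Hours needed to upload data_gb gigabytes over an uplink_mbps link."""
    megabits = data_gb * 8 * 1000  # 1 GB = 8000 megabits (decimal units)
    return megabits / uplink_mbps / 3600


def cloud_is_faster(data_gb: float, uplink_mbps: float,
                    cloud_hours: float, local_hours: float) -> bool:
    """True when upload time plus cloud processing beats local processing."""
    return transfer_hours(data_gb, uplink_mbps) + cloud_hours < local_hours


# Hypothetical case: 500 GB of data over a 10 Mbps uplink.
upload = transfer_hours(500, 10)
print(f"upload takes about {upload:.1f} hours")
print("cloud wins:", cloud_is_faster(500, 10, cloud_hours=5, local_hours=48))
```

With a small network pipe the upload alone can dominate, which is the case the slide warns about.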
What will it cost?
What will I be charged for?
How will I be billed?
What will I get charged for?
When using the “cloud” you are renting the computers you need.
Most clouds bill by the hour
vendors: AWS, Rackspace, Windows Azure, Google App Engine, etc
some do not (Heroku)
SAAS usually bills by the month
vendors: Salesforce, etc
How will I be billed?
Billed monthly
Billed to credit card
Large institutions or large companies use purchase orders
Monitoring usage and cost during the month
How to estimate your costs
In general things you will get charged for are:
Servers (instances)
Network usage
i.e., what you send into and out of the cloud
Storage
Estimating Costs: Servers
What do I mean by Servers?
Called instances at AWS, Google App Engine and Windows Azure
Called Dynos at Heroku
Usage is charged per hour
Price goes up with the size of the server
How to estimate:
how many servers will you need?
what type of servers do you need?
windows or linux
what will they be doing?
how big a server do you need?
where should they be located
AWS: Spot instances
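The questions above reduce to a simple product: number of servers, hours used, and hourly rate. The following is a rough sketch; the hourly rate is a hypothetical placeholder, so check your vendor's current price list for the server size, operating system, and location you pick.

```python
def monthly_server_cost(num_servers: int, hourly_rate: float,
                        hours_per_day: float = 24, days: int = 30) -> float:
    """Server cost for one month: servers x hours used x hourly rate."""
    return num_servers * hourly_rate * hours_per_day * days


# Hypothetical rate in dollars per hour (an assumption, not a quoted price).
RATE = 0.34

always_on = monthly_server_cost(1, RATE)
burst = monthly_server_cost(10, RATE, hours_per_day=8, days=3)
print(f"one server, 24x7: ${always_on:.2f}")
print(f"ten servers, 8 h/day for 3 days: ${burst:.2f}")
```

Because usage is charged per hour, a short burst on many servers can cost less than one server left running all month.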
Estimating Costs: Network
What do I mean by Network?
Bandwidth into and out of the “cloud”
You are charged only for Bandwidth out of the “cloud”
Bandwidth into the “cloud” is generally free
Bandwidth between servers is generally free
Bandwidth between datacenters (not free)
For most scientific applications
This is usually small compared to Servers
100GB of traffic in a month = $15
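The "100GB of traffic in a month = $15" figure implies a rate of $0.15 per GB of outbound traffic. A minimal sketch of the network estimate, assuming that implied rate and treating inbound traffic as free (function and parameter names are illustrative):

```python
# Rate implied by the slide: $15 for 100 GB of outbound traffic.
EGRESS_RATE_PER_GB = 15.0 / 100


def monthly_network_cost(gb_in: float, gb_out: float,
                         rate_out: float = EGRESS_RATE_PER_GB) -> float:
    """Only outbound traffic is billed; gb_in is accepted but costs nothing."""
    return gb_out * rate_out


# Uploading 500 GB of raw data and downloading 100 GB of results.
print(f"${monthly_network_cost(gb_in=500, gb_out=100):.2f}")
```

Traffic between datacenters is not covered by this sketch and would need its own rate.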
Estimating Costs: Storage
What do I mean by Storage?
Amount of data you have stored in the cloud
Windows Azure
$0.15/GB per month based on daily average
You are charged for # of transactions
AWS
$0.10/GB per month
You are charged for # of I/O requests
For most scientific applications:
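As a rough sketch of the storage arithmetic, the per-GB monthly rates above can be applied to the average of the daily stored volume. The usage pattern below is hypothetical, applying the same daily averaging to the AWS rate is an assumption, and per-transaction and per-I/O-request charges are left out.

```python
def storage_cost(daily_gb: list[float], rate_per_gb_month: float) -> float:
    """Bill the average of the daily stored volume at a per-GB monthly rate."""
    average_gb = sum(daily_gb) / len(daily_gb)
    return average_gb * rate_per_gb_month


# Hypothetical month: 50 GB stored for 15 days, then 150 GB for 15 days.
usage = [50.0] * 15 + [150.0] * 15
print(f"Windows Azure rate: ${storage_cost(usage, 0.15):.2f}")
print(f"AWS rate:           ${storage_cost(usage, 0.10):.2f}")
```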
Using LabKey in the cloud
Who is doing it
Which clouds can run a LabKey Server
Installing LabKey from scratch
Who is running LabKey in the cloud?
LabKey
LabKey Online
Test servers
Non-Profit Research Institute
Seattle-based BioTech company
Which clouds can run LabKey Server?
Datacenter as a Service clouds
Amazon Web Services
Some Platform as a Service clouds
Rackspace
GoGrid
IBM Smartcloud
LabKey currently cannot be used on
Windows Azure
Google App Engine
Heroku
Installing LabKey in a cloud
1. Start a new instance at your cloud provider
2. Download the LabKey Server installer
Windows Installer
Linux Installer (coming in 11.3)
3. Install LabKey Server
Instructions at http://www.labkey.org
How does LabKey do it?
Use Amazon Web Services and Rackspace Cloud offerings
Operating Systems
Linux: Ubuntu 10.04 LTS
How does LabKey do it? (cont)
Installation/Configuration
Choose the latest Ubuntu AMI (http://uec-images.ubuntu.com/releases/10.04/release/)
Use EBS backed instances
AWS: Use CloudFormation to provision
Instances
Networks (firewalls)
Disks
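To give a feel for what provisioning instances, firewalls, and disks from one description looks like, here is a minimal sketch that builds a CloudFormation-style JSON template in Python. It is not LabKey's actual template. The resource names, AMI ID, port, disk size, and device name are placeholders, and the template has not been validated against a live AWS account. The instance type and zone are borrowed from the LabKey Online example elsewhere in this talk.

```python
import json

# All logical names and values below are illustrative placeholders.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Sketch: one instance, one firewall rule set, one data disk",
    "Resources": {
        # Network (firewall): allow inbound HTTPS only.
        "WebFirewall": {
            "Type": "AWS::EC2::SecurityGroup",
            "Properties": {
                "GroupDescription": "Allow HTTPS from anywhere",
                "SecurityGroupIngress": [
                    {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
                     "CidrIp": "0.0.0.0/0"}
                ],
            },
        },
        # Instance: the server itself.
        "Server": {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "ImageId": "ami-00000000",  # placeholder, not a real AMI
                "InstanceType": "m1.large",
                "AvailabilityZone": "us-east-1c",
                "SecurityGroups": [{"Ref": "WebFirewall"}],
            },
        },
        # Disk: a separate EBS volume attached to the server.
        "DataDisk": {
            "Type": "AWS::EC2::Volume",
            "Properties": {"Size": 50, "AvailabilityZone": "us-east-1c"},
        },
        "DataDiskMount": {
            "Type": "AWS::EC2::VolumeAttachment",
            "Properties": {
                "InstanceId": {"Ref": "Server"},
                "VolumeId": {"Ref": "DataDisk"},
                "Device": "/dev/sdf",
            },
        },
    },
}

print(json.dumps(template, indent=2))
```

Keeping the whole setup in one file like this means a server can be rebuilt the same way each time instead of being configured by hand.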
How does LabKey do it? (cont)
Data upload/download speeds
what do we see here at FHCRC
the “ship us a hard drive” option
Processor/memory combinations
test and measure
Pipelines in the Cloud
our experience
What does it cost us?
Let's use LabKey Online as an example:
Server stats
instance type: m1.large
(2) EBS volumes: 85GB total
Operating System: Linux
Datacenter: us-east-1c
Cost break-down (average monthly price: July->Oct 2011)
Cost Price Percentage of Total
Instance $250.92 95.8%
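The instance row alone is enough to back out the approximate monthly total. The sketch below uses only the two figures on that row. The implied hourly rate additionally assumes the instance ran every hour of July through October, which the slide does not state, and all results are approximate because the percentage is rounded.

```python
instance_cost = 250.92   # average monthly instance cost, from the table
instance_share = 0.958   # instance share of the total, from the table

implied_total = instance_cost / instance_share
other_costs = implied_total - instance_cost

# Average hours per month over July, August, September, October.
avg_hours = (31 + 31 + 30 + 31) * 24 / 4
implied_hourly = instance_cost / avg_hours  # assumes 24x7 operation

print(f"implied monthly total: ${implied_total:.2f}")
print(f"everything except the instance: ${other_costs:.2f}")
print(f"implied hourly instance rate: ${implied_hourly:.3f}")
```

The takeaway matches the earlier cost slides: for this workload the server dominates the bill, and storage and network together are a small remainder.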
Other scientific tools in the cloud
Galaxy
Both SAAS and install on your own instances in the cloud
GenomeSpace
Cytoscape, Galaxy, GenePattern, Genomica, Integrative Genomics Viewer (IGV), and the UCSC Browser in the cloud
The Gaggle
The Gaggle is a framework for exchanging data between independently developed software tools and databases….
CloudBioLinux
Starcluster
Key Messages
LabKey has been run successfully in the cloud by both LabKey and a number of other customers
We would love to help you get started using LabKey in the cloud
Any questions?
Brian Connolly, 206-667-7521
If you use LabKey Server for your research, please reference one of these publications about the platform:
General Use: Nelson EK, Piehler B, Eckels J, Rauch A, Bellew M, Hussey P, Ramsay S, Nathe C, Lum K, Krouse K, Stearns D, Connolly B, Skillman T, Igra M. LabKey Server: An open source platform for scientific data integration, analysis and collaboration. BMC Bioinformatics 2011 Mar 9; 12(1): 71.
Proteomics: Rauch A, Bellew M, Eng J, Fitzgibbon M, Holzman T, Hussey P, Igra M, Maclean B, Lin CW, Detter A, Fang R, Faca V, Gafken P, Zhang H, Whitaker J, States D, Hanash S, Paulovich A, McIntosh MW: Computational Proteomics Analysis System (CPAS): An Extensible, Open-Source Analytic System for Evaluating and Publishing Proteomic Data and High Throughput Biological Experiments. Journal of Proteome Research 2006, 5:112-121.
Flow Cytometry: Shulman N, Bellew M, Snelling G, Carter D, Huang Y, Li H, Self SG, McElrath MJ, De Rosa SC: Development of an automated analysis system for data from flow cytometric intracellular cytokine staining assays from clinical vaccine trials.