Brian Connolly Systems Engineer, LabKey Software LabKey Server in the Cloud


Academic year: 2021


LabKey Server in the Cloud

Brian Connolly

Systems Engineer, LabKey Software [email protected]

(2)

Agenda

What is the Cloud?

Why would I want to use the cloud?

What will it cost?

Using LabKey in the cloud

How does LabKey use the cloud?

Other scientific tools in the Cloud

(3)

Introduction

Who am I

(4)
(5)

What is the cloud?

Wikipedia says:

Cloud computing is the delivery of computing as a service rather than a product, whereby shared resources, software, and information are provided to computers and other devices as a utility (like the electricity grid) over a network (typically the Internet).

- http://en.wikipedia.org/wiki/Cloud_computing

What does this mean to me?

rent vs buy

only pay for what you use

(6)

Types of Clouds

Datacenter as a Service*

Platform as a Service*

Software as a Service*

* How non-IT folks see the cloud vendors.

(7)

Datacenter as a Service

From an API or GUI, you are able to provision and manage all pieces of a datacenter (servers, network hardware, storage, databases, etc).

What do you get?

Full control over:

servers (install and configure any way you want)

network hardware (firewall, load balancing)

Pay as you use, self-service, and support for custom configs

Access to all the vendor's services

Storage, Database, Message Queues, CDN, load balancers, firewall, etc.

(8)

Platform as a Service

“delivers a computing platform and/or solution stack as a service, often consuming cloud infrastructure and sustaining cloud applications.”

- Wikipedia (http://en.wikipedia.org/wiki/IaaS#Platform)

What do you get?

servers (or their equivalent)

storage service

database service, etc.

no firewall, load balancing, etc.

Failover, clustering, custom configs…. maybe

(9)

Platform as a Service (cont)

There are two types of Platform as a Service

1. You get a server(s)

essentially the old hosted server model

existing hosting and VPS companies are in this space

major vendors: Rackspace, GoGrid, IBM, etc.

2. You write some code and hit the deploy button

you never interact with the servers directly

your application code is bundled with “deployment descriptors” and sent to the “cloud” via API

(10)

Software as a Service

“deliver software over the Internet, eliminating the need to install and run the application on the customer's own computers and simplifying maintenance and support.”

- Wikipedia (http://en.wikipedia.org/wiki/IaaS#Application)

What do you get?

you are the end-user for this software

personalization available

pay as you use

(11)
(12)

Why would I want to use the cloud?

To meet a deadline….

A reviewer asked for the samples to be processed using a new method.

I need to process a large number of samples for a grant application.

Prototyping: Try a new processing method

Proteomics: Use an updated FASTA file or additional parameter

Genomics: Reference sequence has changed

I have a new hypothesis and want to quickly re-process my data

(13)

Why would I want to use the cloud? (cont)

I want to try out new software to see if it meets my needs

LabKey Online

Galaxy’s free public server

UCSC Genome Browser

I want to automate my pipeline

Cyclecomputing.com (Push button HPC in the cloud)

StarCluster

CloudBioLinux

(14)

Why would I NOT want to use the cloud? (cont)

Processing huge amounts of data

Data transfer time is too long

small network pipe to the internet

transfer time + processing time in the cloud >= processing time on your laptop

I have a long-running study (a year or more) and I need the computing around 24x7
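
The transfer-time trade-off above can be checked with a quick back-of-the-envelope calculation. The following is only a minimal sketch: the function and parameter names are hypothetical, decimal units are assumed (1 GB = 8000 megabits), and it ignores downloading results, queueing, and instance start-up time.

```python
def transfer_hours(data_gb: float, uplink_mbps: float) -> float:
    """Hours needed to upload data_gb gigabytes over an uplink of uplink_mbps megabits/s."""
    megabits = data_gb * 8 * 1000  # assumption: decimal units, 1 GB = 8000 megabits
    return megabits / uplink_mbps / 3600


def cloud_is_faster(data_gb: float, uplink_mbps: float,
                    cloud_hours: float, local_hours: float) -> bool:
    """True when upload time plus cloud processing time beats local processing time."""
    return transfer_hours(data_gb, uplink_mbps) + cloud_hours < local_hours


if __name__ == "__main__":
    # Hypothetical numbers: 100 GB of raw data on a 10 Mbit/s uplink.
    print(round(transfer_hours(100, 10), 1), "hours just to upload")
    print(cloud_is_faster(100, 10, cloud_hours=2, local_hours=20))
```

With a small network pipe the upload alone can take longer than the local run, which is the case the slide warns about.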

(15)
(16)

What will it cost?

What will I be charged for?

How will I be billed?

(17)

What will I get charged for?

When using the “cloud” you are renting the computers you need.

Most clouds bill by the hour

vendors: AWS, Rackspace, Windows Azure, Google App Engine, etc.

some do not (Heroku)

SAAS usually bills by the month

vendors: Salesforce, etc.

(18)

How will I be billed?

Billed monthly

Billed to credit card

Large institutions or large companies use purchase orders

Monitoring usage and cost during the month

(19)

How to estimate your costs

In general, the things you will be charged for are:

Servers (instances)

Network usage

i.e. what you send into and out of the cloud

Storage

(20)

Estimating Costs: Servers

What do I mean by Servers?

Called instances at AWS, Google App Engine and Windows Azure

Called Dynos at Heroku

Usage is charged per hour

Price goes up with the size of the server

How to estimate:

how many servers will you need?

what type of servers do you need?

Windows or Linux

what will they be doing?

how big a server do you need?

where should they be located?

AWS: Spot instances
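
The estimate above boils down to servers x hours x hourly price. Here is a minimal sketch of that arithmetic; the function name, the example rate, and the choice to round partial hours up to a full billed hour are assumptions for illustration, not any vendor's published billing rules.

```python
import math


def server_cost(n_servers: int, hours_each: float, hourly_rate: float) -> float:
    """Estimated cost of running n_servers for hours_each hours at hourly_rate dollars/hour.

    Assumption: a partial hour is billed as a full hour.
    """
    billed_hours = math.ceil(hours_each)
    return n_servers * billed_hours * hourly_rate


if __name__ == "__main__":
    # Hypothetical job: 4 servers for 47.5 hours at a made-up rate of $0.50/hour.
    print(server_cost(4, 47.5, 0.50))
```

Bigger servers raise `hourly_rate`; running more of them for less time changes `n_servers` and `hours_each` but often leaves the product about the same.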

(21)

Estimating Costs: Network

What do I mean by Network?

Bandwidth into and out of the “cloud”

You are charged only for bandwidth out of the “cloud”

Bandwidth into the “cloud” is generally free

Bandwidth between servers is generally free

Bandwidth between datacenters is not free

For most scientific applications:

This is usually small compared to Servers

100GB of traffic in a month = $15
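
The network figure above implies a per-GB price that can be reused for other traffic volumes. This sketch assumes the slide's $15 refers to 100 GB of outbound traffic and that inbound traffic is free; the names are hypothetical, and actual rates vary by vendor and over time.

```python
# Implied by the slide: 100 GB of traffic in a month costs $15.
OUT_RATE_PER_GB = 15.0 / 100


def network_cost(gb_in: float, gb_out: float,
                 out_rate: float = OUT_RATE_PER_GB,
                 in_rate: float = 0.0) -> float:
    """Estimated monthly bandwidth charge in dollars.

    Assumption: inbound traffic is free (in_rate = 0) and only outbound is billed.
    """
    return gb_in * in_rate + gb_out * out_rate


if __name__ == "__main__":
    # Hypothetical month: upload 500 GB of raw data, download 100 GB of results.
    print(network_cost(gb_in=500, gb_out=100))
```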

(22)

Estimating Costs: Storage

What do I mean by Storage?

Amount of data you have stored in the cloud

Windows Azure

$0.15/GB per month based on daily average

You are charged for # of transactions

AWS

$0.10/GB per month

You are charged for # of I/O requests

For most scientific applications:

(23)
(24)

Using LabKey in the cloud

Who is doing it

Which clouds can run a LabKey Server

Installing LabKey from scratch

(25)

Who is running LabKey in the cloud?

LabKey

LabKey Online

Test servers

Non-Profit Research Institute

Seattle-based BioTech company

(26)

Which clouds can run LabKey Server?

Datacenter as a Service clouds

Amazon Web Services

Some Platform as a Service clouds

Rackspace

GoGrid

IBM SmartCloud

LabKey currently cannot be used on

Windows Azure

Google App Engine

Heroku

(27)

Installing LabKey in a cloud

1. Start a new instance at your cloud provider

2. Download the LabKey Server installer

Windows Installer

Linux Installer (coming in 11.3)

3. Install LabKey Server

Instructions at http://www.labkey.org

(28)
(29)

How does LabKey do it?

Use Amazon Web Services and Rackspace Cloud offerings

Operating Systems

Linux: Ubuntu 10.04 LTS

(30)

How does LabKey do it? (cont)

Installation/Configuration

Choose the latest Ubuntu AMI (http://uec-images.ubuntu.com/releases/10.04/release/)

Use EBS-backed instances

AWS: Use CloudFormation to provision

Instances

Networks (firewalls)

Disks
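
To give a feel for what a CloudFormation template covering these three pieces might look like, here is a minimal sketch that builds one as a Python dictionary and prints it as JSON. It is not LabKey's actual template: the logical names (`WebFirewall`, `Server`, `DataVolume`, `DataMount`), the open HTTPS rule, the volume size, the device name, and the placeholder AMI ID are all assumptions, and the output has not been validated against AWS.

```python
import json


def labkey_stack_template(ami_id: str, instance_type: str = "m1.large",
                          zone: str = "us-east-1c", data_gb: int = 50) -> dict:
    """Build a hypothetical CloudFormation template: one firewall, one instance, one disk."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Sketch: single server with a firewall rule and a data disk",
        "Resources": {
            # Network (firewall): allow inbound HTTPS only (assumed rule).
            "WebFirewall": {
                "Type": "AWS::EC2::SecurityGroup",
                "Properties": {
                    "GroupDescription": "Allow HTTPS to the server",
                    "SecurityGroupIngress": [
                        {"IpProtocol": "tcp", "FromPort": 443,
                         "ToPort": 443, "CidrIp": "0.0.0.0/0"}
                    ],
                },
            },
            # Instance: the AMI ID is supplied by the caller.
            "Server": {
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "ImageId": ami_id,
                    "InstanceType": instance_type,
                    "AvailabilityZone": zone,
                    "SecurityGroups": [{"Ref": "WebFirewall"}],
                },
            },
            # Disk: an EBS volume in the same zone, attached to the instance.
            "DataVolume": {
                "Type": "AWS::EC2::Volume",
                "Properties": {"Size": data_gb, "AvailabilityZone": zone},
            },
            "DataMount": {
                "Type": "AWS::EC2::VolumeAttachment",
                "Properties": {
                    "InstanceId": {"Ref": "Server"},
                    "VolumeId": {"Ref": "DataVolume"},
                    "Device": "/dev/sdf",
                },
            },
        },
    }


if __name__ == "__main__":
    # "ami-00000000" is a placeholder, not a real image ID.
    print(json.dumps(labkey_stack_template("ami-00000000"), indent=2))
```

Keeping the instance, firewall, and disk in one template means the whole server can be recreated from a single file.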

(31)

How does LabKey do it? (cont)

Data upload/download speeds

what do we see here at FHCRC

the “ship us a hard drive” option

Processor/memory combinations

test and measure

Pipelines in the Cloud

our experience

(32)

What does it cost us?

Let's use LabKey Online as an example:

Server stats

instance type: m1.large

2 EBS volumes: 85GB total

Operating System: Linux

Datacenter: us-east-1c

Cost break-down (average monthly price: July->Oct 2011)

Cost        Price      Percentage of Total
Instance    $250.92    95.8%
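
Only the instance row of the cost break-down is available here, but the other figures on the slide allow a rough cross-check. The sketch below derives the implied monthly total from the instance cost and its share, and estimates storage from the 85GB of EBS volumes. Applying the $0.10/GB per month AWS storage price quoted earlier to EBS is an assumption, and the variable names are hypothetical.

```python
INSTANCE_COST = 250.92   # from the slide: average monthly instance cost, in dollars
INSTANCE_SHARE = 0.958   # from the slide: instance cost as a fraction of the total
EBS_GB = 85              # from the slide: total EBS storage
EBS_RATE = 0.10          # assumption: the $0.10/GB per month AWS price applies to EBS


def implied_total(part_cost: float, share: float) -> float:
    """Total bill implied by one line item and its share of that total."""
    return part_cost / share


total = implied_total(INSTANCE_COST, INSTANCE_SHARE)
other = total - INSTANCE_COST    # everything that is not instance cost
storage = EBS_GB * EBS_RATE      # estimated storage charge

if __name__ == "__main__":
    print(f"implied total:     ${total:.2f}")
    print(f"non-instance cost: ${other:.2f}")
    print(f"estimated storage: ${storage:.2f}")
```

With the slide's figures this gives a total of roughly $262 per month, of which about $11 is not instance cost. That is consistent with the earlier point that network and storage are usually small compared to servers.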

(33)
(34)

Other scientific tools in the cloud

Galaxy

Both SAAS and install on your own instances in the cloud

GenomeSpace

Cytoscape, Galaxy, GenePattern, Genomica, Integrative Genomics Viewer (IGV), and the UCSC Browser in the cloud

The Gaggle

The Gaggle is a framework for exchanging data between independently developed software tools and databases….

CloudBioLinux

StarCluster

(35)

Key Messages

LabKey has been run successfully in the cloud by both LabKey and a number of other customers

We would love to help you get started using LabKey in the

(36)

Any questions?

Brian Connolly

[email protected] 206-667-7521

(37)

If you use LabKey Server for your research, please reference one of these publications about the platform:

General Use: Nelson EK, Piehler B, Eckels J, Rauch A, Bellew M, Hussey P, Ramsay S, Nathe C, Lum K, Krouse K, Stearns D, Connolly B, Skillman T, Igra M. LabKey Server: An open source platform for scientific data integration, analysis and collaboration. BMC Bioinformatics 2011 Mar 9; 12(1): 71.

Proteomics: Rauch A, Bellew M, Eng J, Fitzgibbon M, Holzman T, Hussey P, Igra M, Maclean B, Lin CW, Detter A, Fang R, Faca V, Gafken P, Zhang H, Whitaker J, States D, Hanash S, Paulovich A, McIntosh MW: Computational Proteomics Analysis System (CPAS): An Extensible, Open-Source Analytic System for Evaluating and Publishing Proteomic Data and High Throughput Biological Experiments. Journal of Proteome Research 2006, 5:112-121.

Flow Cytometry: Shulman N, Bellew M, Snelling G, Carter D, Huang Y, Li H, Self SG, McElrath MJ, De Rosa SC: Development of an automated analysis system for data from flow cytometric intracellular cytokine staining assays from clinical vaccine trials.

References
