FREE computing using Amazon EC2
Seong-Hwan Jun
Department of Statistics
Univ of British Columbia
Outline
Basics of servers
Amazon EC2
Setup R on an EC2 instance
What is a “server”?
- Basically computers just like the one you have, but without a monitor
- A lot of computers
- Stacked on a rack
- Connected in a network, with fast transfer of data between the computers
- They share hard drive storage and, in many cases, memory (RAM) as well
- Their only purpose is to process jobs, such as crunching numbers; that's why there is no need for monitors
- Usually run a variant of Unix (e.g., Linux), because Unix-like systems are generally more stable for servers than Windows
When should you run jobs on the servers?
- Jobs that take a long time to finish
- When you have multiple jobs that can run in parallel
- Pretty much all the time, because you need your computer for other things like... Facebook
- Why not? These machines do nothing except crunch numbers and process jobs
When NOT to run jobs on the servers
- During the development stage of your code: run small jobs on your own computer to get quick results and verify that your code is correct as you develop it
Cloud computing
- Cloud computing is the concept of not having to know where your servers are physically located
- These computers are somewhere in the cloud of servers...
- When you launch a job into the cloud, one of the available computers gets the job and runs it; you won't know exactly which computer is running your job
Amazon Elastic Compute Cloud
- EC2 for short
- An individual computer in Amazon's cloud of computers is referred to as an instance
- There are many types of instances: micro, small, medium, large, extra large, and so on
- Only the micro instances are free... but the other instances are quite cheap if you ever need fast computers
How to use EC2 instances
1. Sign up for an account
2. You need to provide your credit card information, so read the rules carefully to make sure you don't get charged
3. Once you sign up, you get 750 free hours of computing per month!
4. You can use these hours any way you want: for example, you can launch 10 EC2 instances at once and run 10 jobs (1 job per instance) simultaneously for 75 hours, or launch one instance and run a job on it for 750 hours
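The trade-off in step 4 is simple arithmetic: instance count times wall-clock hours must stay within the monthly allowance. A minimal Python sketch, assuming the 750 free hours per month described above are pooled across all your free instances and that every instance runs for the same length of time (the function name is hypothetical):

```python
# Monthly free allowance described on this slide (assumption: pooled across all free instances).
FREE_HOURS_PER_MONTH = 750


def hours_per_instance(n_instances: int, budget: float = FREE_HOURS_PER_MONTH) -> float:
    """Wall-clock hours each of n_instances can run before the free budget is used up."""
    if n_instances <= 0:
        raise ValueError("n_instances must be positive")
    # Every hour that each instance runs consumes one hour of the shared budget.
    return budget / n_instances


if __name__ == "__main__":
    for n in (1, 10):
        print(f"{n} instance(s): {hours_per_instance(n):g} hours each")
    # prints "1 instance(s): 750 hours each" then "10 instance(s): 75 hours each"
```

Running more instances does not buy more computing, only the same hours finished sooner, which is why splitting parallel jobs across instances pays off.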
Other Amazon services
Amazon offers a wide variety of services under the brand name “Amazon Web Services” (details: http://aws.amazon.com/). The most useful services for us are EBS (Elastic Block Store) and S3 (Simple Storage Service):
- Storage for very large and frequently used data (GBs or even more)
- These data are easily accessible from the EC2 instances
- EBS is free up to 30 GB; S3 is not free but quite cheap
Here are some of the things you can do with AWS:
- MapReduce for natural language processing (e.g., counting n-grams)
- Any machine learning problem where the data are too large for your personal computer
- Scientific computing: R, MATLAB, Python, Java, C++, etc.
- Storing genome sequences (human and other species) on EBS or S3 and processing them using EC2 instances
- Amazon has many large datasets publicly available: http://aws.amazon.com/datasets?_encoding=UTF8&jiveRedirect=1
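To make the MapReduce bullet concrete, here is a minimal single-machine Python sketch of the map and reduce steps for counting word bigrams. It only imitates the MapReduce pattern locally and does not use any Amazon service; the function names are illustrative.

```python
from collections import Counter
from itertools import chain
from typing import Iterable, Iterator, Tuple

NGram = Tuple[str, ...]


def map_ngrams(line: str, n: int = 2) -> Iterator[Tuple[NGram, int]]:
    """Map step: emit (n-gram, 1) for every n-gram in one line of text."""
    words = line.lower().split()
    for i in range(len(words) - n + 1):
        yield tuple(words[i:i + n]), 1


def reduce_counts(pairs: Iterable[Tuple[NGram, int]]) -> Counter:
    """Reduce step: sum the counts emitted for each n-gram key."""
    totals: Counter = Counter()
    for key, count in pairs:
        totals[key] += count
    return totals


if __name__ == "__main__":
    lines = ["the cat sat", "the cat ran"]
    # On a cluster, each line (or file) would be mapped on a different machine.
    counts = reduce_counts(chain.from_iterable(map_ngrams(text) for text in lines))
    print(counts.most_common(1))  # [(('the', 'cat'), 2)]
```

The map step needs no shared state, so it can be spread across many machines; only the reduce step has to see all pairs with the same key.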
Creating a free instance
The instructions are well described here:
http://www.r-bloggers.com/automating-r-scripts-on-amazon-ec2/
You can also Google the keywords “Amazon EC2 R”, or other combinations of relevant keywords, for step-by-step instructions.
Key-pair
Logging in to our department server requires a username and a password.
- To log in to an EC2 instance, you instead use something called a key pair
- The key is a file (e.g., key.pem) that you download once, when you create the key pair, and keep on your computer
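Logging in
[Screenshot: the instance's Public DNS]
Logging in: commands
Now you can log in using the key-pair file and the public DNS:
1. chmod 400 key.pem
2. ssh -i key.pem ubuntu@public-dns
The first command makes the key file readable only by you (ssh refuses a private key that other users can read); in the second, replace public-dns with your instance's public DNS name.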
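The two login steps can also be scripted. A minimal Python sketch, assuming an Ubuntu image (whose default login user is ubuntu; other images use other users, e.g. ec2-user on Amazon Linux); the function names and the host name are hypothetical placeholders:

```python
import os
import stat
import tempfile
from typing import List


def prepare_key(key_path: str) -> int:
    """Equivalent of `chmod 400 key.pem`: owner read-only. Returns the new permission bits."""
    os.chmod(key_path, stat.S_IRUSR)  # S_IRUSR == 0o400
    return stat.S_IMODE(os.stat(key_path).st_mode)


def ssh_command(key_path: str, public_dns: str, user: str = "ubuntu") -> List[str]:
    """Build (but do not run) the argument list for `ssh -i key.pem user@public-dns`."""
    return ["ssh", "-i", key_path, f"{user}@{public_dns}"]


if __name__ == "__main__":
    # Demonstrate on a throwaway file instead of a real key.
    fd, demo_key = tempfile.mkstemp(suffix=".pem")
    os.close(fd)
    print(oct(prepare_key(demo_key)))  # 0o400
    print(" ".join(ssh_command(demo_key, "your-public-dns")))
    os.remove(demo_key)
```

Passing the list to `subprocess.run` would open the interactive session; building it separately keeps the host and user easy to change per instance.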
Installing R
Type the following commands:
1. sudo apt-get update
2. sudo apt-get -y install r-base
3. Type “R” at the command prompt to check that the installation was successful (type q() to quit)
The first command updates the package lists; the second installs R and may take a few minutes.
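Step 3 can also be automated, which helps if you set up R on several instances. A minimal Python sketch, assuming Rscript (installed along with R) is on the PATH; the function name is hypothetical:

```python
import shutil
import subprocess
from typing import Optional


def r_version_banner() -> Optional[str]:
    """Return the first line of `Rscript --version`, or None if R is not installed."""
    rscript = shutil.which("Rscript")
    if rscript is None:
        return None
    result = subprocess.run([rscript, "--version"], capture_output=True, text=True, check=False)
    # Depending on the R version, the banner may go to stderr instead of stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else ""


if __name__ == "__main__":
    banner = r_version_banner()
    print(banner if banner is not None else "R not found; rerun the apt-get commands")
```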
Running R jobs
Refer to Song Cai's slides or search Google yourself (most of you already know how to do it).
Things that you can do on an EC2 instance
- Run Java, C, C++, Fortran, and other jobs
- Host a web server, so you can get your results through your own private website
- One example use case:
1. Use C++, Java, or R to connect to your stock broker's trading platform (API)
2. Run your trading algorithm on multiple EC2 instances
3. Process the results at night using R on EC2
4. View the results on your website from your phone, on the bus to school or during a boring morning class
- Or you can just run your R code with multiple different inputs on different EC2 instances
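Running the same R code with different inputs on several instances starts with splitting the inputs into one batch per instance. A minimal Python sketch using round-robin assignment (a simple choice that balances batch sizes, not the only one); the inputs and the script name sim.R are hypothetical:

```python
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


def assign_round_robin(inputs: Sequence[T], n_instances: int) -> Dict[int, List[T]]:
    """Give input i to instance i % n_instances so every instance gets a similar share."""
    if n_instances <= 0:
        raise ValueError("n_instances must be positive")
    plan: Dict[int, List[T]] = {k: [] for k in range(n_instances)}
    for i, item in enumerate(inputs):
        plan[i % n_instances].append(item)
    return plan


if __name__ == "__main__":
    seeds = list(range(1, 8))  # hypothetical inputs, e.g. random seeds for an R simulation
    for instance, batch in assign_round_robin(seeds, 3).items():
        # Each batch could become the arguments of `Rscript sim.R <seed>` on one instance.
        print(instance, batch)
    # prints "0 [1, 4, 7]", "1 [2, 5]", "2 [3, 6]"
```

Batch sizes differ by at most one, which keeps the instances finishing at roughly the same time when every input takes about as long to run.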
How to use stat department servers
PhD student Song Cai gave a talk on this last year, and he asked me to give one this year. His slides are very good (concise), so we will just go over them together.