iCER Bioinformatics Support
Fall 2011
John B. Johnston
HPC Programmer
Institute for Cyber Enabled Research
Institute for Cyber Enabled Research (iCER)
•Hardware (HPCC)
•Software and Support
•Education
•Consulting
iCER: What is it?
The Institute for Cyber Enabled Research (iCER) at Michigan State University (MSU) was
established to coordinate and support
multidisciplinary resources for computation and
computational sciences.
The Center's goal is to enhance MSU's national and international presence and competitive edge in disciplines and research thrusts that rely on
HPCC: What is it?
The HPCC provides computational
hardware and support to MSU faculty,
students and researchers.
The HPCC is contained within iCER,
effectively representing the hardware,
systems and software “arm” of iCER’s
research support mission.
Bioinformatics Outreach
•HPCC hardware
•Software resources
•Help Desk
•Seminars
•One-on-one consulting
•Limited on-site systems setup and configuration
•Programming and scripting assistance
•FREE!
HPCC Cluster Overview
•Linux operating system
•Primary interface is text based, through Secure Shell (ssh)
•All machines in the main cluster are binary compatible (compile once, run anywhere)
•Each user has 50 GB of personal hard drive space.
–/mnt/home/username/
•Users have access to 33TB of scratch space.
–/mnt/scratch/username/
•A scheduler is used to manage jobs running on the cluster
•A submission script is used to tell the scheduler the resources required and how to run a job
•A Module system is used to manage the loading and unloading of software configurations
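To make the scheduler, submission script, and module bullets concrete, here is a minimal sketch that renders the text of a submission script. The slides do not name the scheduler, so the `#PBS` directive syntax (PBS/Torque style) is an assumption, as are the module name `blast`, the job name, and the BLAST command line; check the HPCC wiki for the real values.

```python
def render_submission_script(job_name, modules, command,
                             nodes=1, cores_per_node=1,
                             walltime="01:00:00", memory="2gb"):
    """Return the text of a PBS-style submission script.

    Assumption: the scheduler accepts PBS/Torque '#PBS' directives.
    """
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",                         # job name
        f"#PBS -l nodes={nodes}:ppn={cores_per_node}",  # nodes and cores per node
        f"#PBS -l walltime={walltime}",                 # maximum run time
        f"#PBS -l mem={memory}",                        # memory request
        "",
        "# Run from the directory the job was submitted from.",
        'cd "$PBS_O_WORKDIR"',
        "",
    ]
    # Load each software configuration through the module system.
    lines += [f"module load {name}" for name in modules]
    lines += ["", command, ""]
    return "\n".join(lines)


script = render_submission_script(
    job_name="blast_test",      # hypothetical job name
    modules=["blast"],          # hypothetical module name
    command="blastn -query input.fasta -db nt -out results.txt",
    cores_per_node=4,
    walltime="02:00:00",
)
print(script)
```

The resource lines are what the scheduler reads; everything after them is an ordinary shell script that runs once the job starts.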
Gateway to the System
•Access to HPCC is primarily through the
gateway machine:
–ssh
–ssh
–Access to all HPCC services uses MSU
username and password.
•Once in, you can go to the user-oriented
destination of choice.
Why the HPCC Cluster?
•Large data sets
•Lots of number crunching
•A need to run many simultaneous jobs with different data sets and/or configuration settings
•You need software you don’t have, or don’t want to (or can’t) set up
•Comprehensive readymade development environment that is actively administered
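The "many simultaneous jobs" case usually starts from a list of every data set and setting combination. The sketch below expands such a sweep into one job description per combination; the file names and the parameter names (`kmer`, `min_coverage`) are hypothetical placeholders, not HPCC conventions.

```python
import itertools


def expand_sweep(datasets, settings):
    """Return one job description per (dataset, parameter combination)."""
    names = sorted(settings)  # fixed order so the output is reproducible
    jobs = []
    for dataset in datasets:
        for values in itertools.product(*(settings[n] for n in names)):
            jobs.append({"dataset": dataset, **dict(zip(names, values))})
    return jobs


jobs = expand_sweep(
    datasets=["sample_a.fasta", "sample_b.fasta"],            # hypothetical inputs
    settings={"kmer": [21, 31], "min_coverage": [2, 5, 10]},  # hypothetical settings
)
print(len(jobs), "independent jobs")
print(jobs[0])
```

Each entry can then become its own cluster job, so the runs proceed side by side instead of one after another on a workstation.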
Linux? OH NOES!
•If you are a Linux pro, go ahead and take a short nap (you’ve got ~60
seconds)
•If you’re not, don’t worry! That’s why I get the (not so) big bucks.
•The Bioinformatics Help Desk is here to get you up and running.
Linux Support
•Client application selection
•Bring in your laptop (if you have one)
•Cookbook tutorials and cheat sheets (more on the way)
•One-on-one consultation
•Limited on-site support and training
•We also provide Samba support for Windows and Mac boxes so you can map your HPCC account directory to your workstation
HPCC Online Resources
•www.hpcc.msu.edu – HPCC home
•wiki.hpcc.msu.edu – Public/Private Wiki
•forums.hpcc.msu.edu – User forums
•rt.hpcc.msu.edu – Help desk request tracking
Available Software
•Center Supported Development Software
–Intel compilers, openmp, openmpi, mvapich, totalview, mkl, pathscale, gnu...
•Center Supported Research Software
–Matlab, R, amber, blast, charmm, emboss...
•Customer Software (module use.cus)
–Clustalw, QuEST, MEME, Velvet, mpiBLAST, bowtie, AMOS, ABySS, MUMmer, HMMER, phylip, SAMTools…
–For a more up-to-date list, see the documentation wiki: http://wiki.hpcc.msu.edu/
User Software
•50GB of initial user space provided
•Install your own in user space
•HPCC offers a rich build environment
•Quota increases can be made available
•Code installation and (modest) modification support is available through “moi”
Virtual Machines
•Virtual “Servers” expressed in software
•Available for research labs/working groups
•Flavors currently available:
– Galaxy
– BLAST (web browser based)
– UCSC Genome Browser
Database Offerings
•db-01: Internal MySQL database node attached to the cluster. Hosts user datasets of modest size.
•BLAST database repository
•VM-based – UCSC for example
•Up to 1TB total user space for free, $250/yr. per TB thereafter
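As a worked example of the pricing above, this sketch computes the yearly charge for a given amount of user space. The free allowance and the per-TB price come from the slide; rounding partial terabytes up to a whole TB is an assumption, since the slide does not say whether usage is prorated.

```python
import math

FREE_TB = 1              # from the slide: up to 1 TB is free
PRICE_PER_TB_YEAR = 250  # from the slide: $250/yr per TB thereafter


def annual_storage_cost(total_tb):
    """Return the yearly cost in dollars for total_tb of user space.

    Assumption: any partial TB beyond the free allowance is billed as a full TB.
    """
    billable_tb = math.ceil(max(0.0, total_tb - FREE_TB))
    return billable_tb * PRICE_PER_TB_YEAR


for size in (0.5, 1, 1.5, 3):
    print(f"{size} TB -> ${annual_storage_cost(size)}/yr")
```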
Multiprocessor Apps
Many bioinformatics applications are beginning to appear in multiprocessor-capable versions.
Workload can be divided to allow each processor to complete part of the job in parallel, decreasing run time.
HPC provides accessibility to a large number of processing cores, memory, and disk space.
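The divide-and-combine idea can be sketched in a few lines: split the input into chunks, let each worker process one chunk, then merge the partial results. This is only an illustration using GC content on toy sequences, not how BLAST or Velvet parallelize internally; the function names are made up for the example.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split items into n_chunks contiguous pieces of near-equal size."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def count_gc(sequences):
    """Return (gc_bases, total_bases) for one chunk of sequences."""
    gc = sum(s.upper().count("G") + s.upper().count("C") for s in sequences)
    total = sum(len(s) for s in sequences)
    return gc, total


def gc_fraction(sequences, workers=4, executor_cls=ProcessPoolExecutor):
    """Compute the overall GC fraction, one chunk per worker.

    Pass executor_cls=ThreadPoolExecutor for a quick check without
    starting separate processes.
    """
    chunks = split_into_chunks(sequences, workers)
    with executor_cls(max_workers=workers) as pool:
        partials = list(pool.map(count_gc, chunks))
    gc = sum(p[0] for p in partials)
    total = sum(p[1] for p in partials)
    return gc / total if total else 0.0
```

When using the default process pool, call `gc_fraction` from inside an `if __name__ == "__main__":` block of a script so worker processes can import the functions cleanly. Processes on one node correspond to the shared-memory case; spreading work across nodes needs something like MPI.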
Some Examples
•Multithreaded BLAST – shared memory
•mpiBLAST – distributed memory
•Velvet Assembler – multithreaded, shared memory
•MAKER2 – MPI, distributed memory
Cluster Developer Nodes
•Developer Nodes are accessible from gateway and used for testing.
–ssh dev-amd05 – Same hardware as amd05
–ssh dev-intel07 – Same hardware as intel07
–ssh dev-intel10 – Same hardware as intel10
–ssh dev-amd09 – Same hardware as amd09
–ssh dev-gfx10 – Same hardware as gfx10
•We periodically have some test boxes:
–ssh dev-gfx08 – Nvidia Graphics Processing Node
–ssh dev-cell08 – Playstation 3 Cell processor
–ssh dev-intel09 – 8 core Intel Xeon with 24GB of memory
•Jobs running on the developer nodes should be limited to two hours of walltime.
Steps in Using the HPCC
• Connect to HPCC
• Determine required software
• Transfer required input files and source code
• Compile programs (if needed)
• Test software/programs on a developer node
• Write a submission script
• Submit the job
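The transfer and test steps above can be scripted from a workstation. The sketch below only builds the `scp` and `ssh` command lines and prints them. The gateway host name `gateway.example.edu`, the user name `sparty`, and the test command are hypothetical placeholders; the home directory layout and the developer node name are taken from the earlier slides.

```python
import shlex


def transfer_command(local_path, user, gateway_host):
    """Build an scp command that copies a file into the user's HPCC home."""
    return ["scp", local_path, f"{user}@{gateway_host}:/mnt/home/{user}/"]


def dev_node_test_command(user, gateway_host, dev_node, test_command):
    """Build an ssh command that hops through the gateway to a developer node.

    -t requests a terminal so the inner ssh can prompt if it needs to.
    """
    return ["ssh", "-t", f"{user}@{gateway_host}", "ssh", dev_node, test_command]


steps = [
    transfer_command("reads.fasta", "sparty", "gateway.example.edu"),
    dev_node_test_command("sparty", "gateway.example.edu",
                          "dev-intel10", "./run_small_test.sh"),
]
for step in steps:
    # Print rather than execute; pass a step to subprocess.run() to run it.
    print(shlex.join(step))
```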
A couple of examples
•Biological model – long running, many similar but not identical runs
•Multiprocessor BLAST searches
•Multiprocessor Velvet assembly
•In each case, using the HPCC cluster produced more results in less time, with little or no active user management
But I don’t need a “cluster”
•Tool selection, setup
•Scripting assistance
•Data “browsing”, sharing, group analysis
•Lab help or training
Scripting
•Customize, standardize, modify
•Python, Perl, or ?
•We have a growing “collection” available as a Git repository.
•Perhaps you don’t know anything about
scripting; or maybe you do, but could use some help?
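For a sense of scale, the scripts in question are often this small. The sketch below is an illustrative example (not taken from the Git collection) that reads FASTA-formatted lines and reports each sequence's length; the sample records are made up.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, parts = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif header is not None:
            parts.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(parts)


# Made-up records standing in for the contents of a FASTA file.
example = """>seq1 demo
ACGTAC
GT
>seq2
GGG
""".splitlines()

for name, seq in read_fasta(example):
    print(name, len(seq))
```

To run it on a real file, pass an open file object instead of `example`, for instance `read_fasta(open("reads.fasta"))`.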
Tutorials
•Titus Brown's ANGUS-NGS tutorials, converted to run the examples on the HPCC instead of Amazon
•Using UCSC for certain tasks
•mpiBLAST
•Velvet and Oases
Seminars and Education
•NextGen Bioinformatics Seminars
•wiki.hpcc.msu.edu/display/Bioinfo/NextGen+Bioinformatics+Seminars
•HPCC Mid-Morning Break
Setting up an account
•All account requests must come via a PI.
•Have your PI fill in the form at:
–www.hpcc.msu.edu/request
•Once received, we will process your
request and notify you when your account
is ready.
Bioinformatics Contact
•John Johnston, HPC Programmer
–M-W, 1449 BPS, 884-2572
–Th-F, 505 BMB, 432-7177
•Ticket requests:
–https://rt.hpcc.msu.edu/index.html
–Please include “Bioinformatics Help” in the subject to more quickly route your request.