VPAC100: Introduction to High Performance Computing Using Linux
Welcome to the VPAC training course.
Moore's law, first observed by Gordon Moore of Fairchild Camera and Instrument Corp. in 1965, states that the number of transistors placed on a minimum-cost integrated circuit doubles approximately every twenty-four months. Moore was describing the trend in computing power over the six years preceding his observation, yet it has remained largely true since then and has indeed been extended to other metrics.
For the user, it has meant that the computing resources we can take advantage of grow by roughly an order of magnitude every six to seven years at that doubling rate. That, coupled with advances in parallel architectures and programming, has allowed researchers to tackle simulations and data sets of a magnitude never before thought possible. Your attendance here today is a good sign for us that you wish to be part of that trend.
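As a rough worked example of the arithmetic behind this trend, the sketch below computes the growth factor implied by a fixed doubling period and the time needed to reach a given increase. The function names and the idealised constant-doubling model are assumptions made purely for illustration; real hardware improvements are far less regular.

```python
import math


def growth_factor(years, doubling_months=24.0):
    """Return the multiplicative growth after `years` of steady doubling.

    Idealised model (an assumption): capacity doubles exactly once every
    `doubling_months` months.
    """
    return 2.0 ** (years * 12.0 / doubling_months)


def years_to_reach(factor, doubling_months=24.0):
    """Return the years of steady doubling needed to grow by `factor`."""
    return math.log2(factor) * doubling_months / 12.0


if __name__ == "__main__":
    # Growth after a few illustrative time spans.
    for years in (2, 4, 10):
        print(f"{years:2d} years -> x{growth_factor(years):.1f}")
    # Time needed for a tenfold (order of magnitude) increase.
    print(f"10x takes about {years_to_reach(10):.1f} years")
```

Under this model, with the twenty-four-month period quoted above, a tenfold increase takes between six and seven years. Shorter doubling periods, such as the eighteen months sometimes quoted for performance rather than transistor counts, shorten that time accordingly.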
The introductory course is designed to cater for a wide range of computer skill sets, covering basic concepts and command line interfaces for the absolute beginner through to setting environment variables with modules and submitting jobs. It is certainly not necessary for the beginner to dwell too much on the advanced material, nor for the expert to suffer familiar territory.
After working through the basic operations, job submission, and advanced sections, users will be given the chance to work through a tutorial where they can submit simple jobs to the cluster using a molecular dynamics program, NAMD; a finite element analysis package, Abaqus; and a cloud-enabled submission for the statistical package, R. Each of these will outline and emphasise the basic workflow for submitting high performance computing jobs.
By attending the training course today, we hope that you will learn what VPAC's high performance computing capabilities and services are, and be able to take full advantage of them. We understand that for many of you, the computing front end can be a little intimidating.
We are here to help!
There is no such thing as a bad question: a simple answer can make all the difference between swirling in a dark whirlpool and seeing the proverbial light.
Good luck &
Table of Contents
1.0 Introduction to VPAC
Who We Are
Creating an Account At VPAC
VPAC's Hardware
VPAC's Software
Adding Licensed Software
What Is An HPC System?
How to Use RT
2.0 An Introduction to Linux
Logging On
Exploring the Environment
Files and Editing Files
Transferring Files
Editing, Creating Directories, Moving Files
Copying Directories, File Differences
Searching for and within files, Wildcards
3.0 Environment Modules and the Portable Batch System
Environment Modules
Portable Batch System
Grid Australia
4.0 HPC Tutorials
NAMD and VMD
Abaqus
R and Grisu
Further HPC Exercises
1.0 An Introduction To VPAC
Who We Are
The Victorian Partnership for Advanced Computing (VPAC) is a not-for-profit registered research agency established in 2000 by a consortium of Victorian universities. VPAC's members include Deakin University, La Trobe University, Monash University, RMIT University, Swinburne University of Technology, the University of Melbourne, the University of Ballarat and Victoria University.
VPAC's purpose is to provide independent expert services, training, and support in advanced computing to its members. Persons employed or enrolled at any of the member universities are entitled to an account, subject to approval by their university's Operations Committee member. Such an account provides access to a number of supercomputers for High Performance Computing (HPC) in addition to technical support and training.
Our activities are in a number of fields:
• High Performance Computing (HPC)
• Grid computing (lead agent of ARCS)
• Engineering (work with Holden, AutoCRC, etc.)
• Life Sciences (our own pet molecular modeller)
• Geodynamics (with Monash and Caltech)
Each member has a share of the cycles, proportional to subscription. For users there is no charge for use. A share is also reserved for industry. There is also extra capacity for VLSCI Stage 0, and access includes programming support and advice for use of the facility.
What We Have (Hardware)
For researchers, VPAC's internationally recognised HPC facility provides advanced computing tools and capabilities, and is managed by an expert team of software engineers and system administrators who provide high-level support. These supercomputers run a wide variety of compilers, scientific and mathematical programs, and libraries.
VPAC manages three clusters: Wexstan, Edda and Tango. Each is a Linux-based system, and they are summarised as follows:
Wexstan
16 IBM e325 nodes with 32 64-bit CPUs, using 2 GHz Dual Opteron and Myrinet backbone interconnect, running CentOS 5 GNU/Linux as the operating system. 56 GB RAM and 0.5 terabytes of hard disk storage. Software on wexstan.vpac.org includes:
abaqus fftw fluent gcc gmp hdf5 hyperworks intel iprscan java latentgold mesa mpfr namd openmpi perl petsc pgi python tcl wine
Edda
188 CPUs in a 47-node Linux cluster, using Power5 1.67 GHz CPUs running SuSE 9.0 GNU/Linux. Half the nodes have 16 GB of RAM each and the other half 8 GB each. Edda's benchmark performance is estimated at 1 teraflop. Software on edda.vpac.org includes:
amber blender dock fds fftw gamess gcc gdal gmp gromacs ibmxlc ibmxlf imagemagick java lam lsdyna marmot mesa mpich mpj mrbayes namd ncarg neinastran openfoam openmpi perl petsc pgplot povray python scons srb stgermain underworld valgrind xmds
Tango
760 CPUs across 92 compute nodes with quad-core AMD Opteron CPUs, using InfiniBand interconnect (~2 µs latency) and running 64-bit CentOS 5 GNU/Linux. Each node has 32 GB RAM and four 320 GB disks. Software on Tango includes:
Abaqus ACML AMBER ANSYS Ant Atlas Autodock AutoGrow binutils BLAST Blender BLT Bonnie++ Boost BWA CDAT CUDA DAWN DDD DOCK Ecat EGSnrc EM3D EMBOSS FDS FFTW Fluent freeglut Freesurfer FSL Gamess GATE Gaussian GCC GD GDB GEANT GENREG GEOS Git GLUE GMP GotoBLAS Grace Graphviz GROMACS GSL H5utils Harminv HDF5 Hadoop HPMPI HyperWorks hypre IDL IMOD Intel Compilers Intel-mpi IPRSCAN ISP ITKSNAP JAIDA JAS3 JasPer Java lal lammps latentgold libctl libelf libfame libgdiplus libpng libsvm libtool ligplot lmf lp_solve lsdyna madymo maq marmot mash matio MATLAB meep meme mesa metaio mgltools minibaum molden molekel mono mopac mpfr mpiblast mpich mpiexec mrbayes mummer mvapich namd nauty neinastran netcdf ns octave openbabel openfoam openmpi padb pahole paraview pbsssh pcre perl petsc PGI compilers phaser povray python quilt R rosetta rsync sabre schrodinger scilab scons semtex smem speccpu spinner srb stata svm-perf szlib tau tcl tk tkcon torque ultrascan underworld valgrind velvet visualdoc vmd wien2k wine xfoil xmds zlib
What We Have (Software)
Abaqus: A package for finite element analysis, usually applied in mechanical engineering.
ABWT: The AB WT Analysis Pipeline is an off-instrument SOLiD data analysis software package for the analysis of experiments run. It maps reads from a transcript sample to a reference genome and assigns tag counts to features of the reference genome.
ACML: The AMD Core Math Library (ACML) is a set of optimised and threaded math routines, especially useful for computationally intensive tasks.
AMBER: Assisted Model Building with Energy Refinement (AMBER) is a family of force fields for molecular dynamics of biomolecules. AMBER is also the name for the molecular dynamics software package that simulates these force fields.
ANSYS: ANSYS is an engineering simulation package for general-purpose finite element analysis and computational fluid dynamics.
Ant: Apache Ant is a tool for automating software build processes, similar to Make, but it is implemented in the Java language, requires the Java platform, and is best suited to building Java projects.
Atlas: Automatically Tuned Linear Algebra Software (ATLAS) is a software library for linear algebra, providing an open source implementation of BLAS APIs for C and Fortran77.
Autodock: AutoDock is a suite of docking tools designed to predict how small molecules, such as substrates or drug candidates, bind to a receptor of known 3D structure.
AutoGrow: AutoGrow uses AutoDock as the selection operator. For each generation, all ligand files are docked to the target protein, and for each dock, AutoDock returns a predicted binding affinity. AutoGrow (Java DOCK) uses fragment-based growing, docking, and evolutionary techniques.
BEAM: BEAMnrc is a general purpose Monte Carlo simulation system for modelling radiotherapy sources which is based on the EGSnrcMP code system for modelling coupled electron and photon transport.
binutils: The GNU Binary Utilities, or binutils, is a collection of programming tools for the manipulation of object code in various object file formats. They are typically used in conjunction with GNU Compiler Collection, make, and GDB.
BLAST: The NCBI Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between DNA sequences and can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families.
Blender: Blender is a 3D graphics application used for modeling, UV unwrapping, texturing, rigging, water simulations, skinning, animating, rendering, particle and other simulations, non-linear editing, compositing, and creating interactive 3D applications. The image (right) was created with Blender.
BLT: The BLT library is an extension to Tcl/Tk that adds plotting widgets, a geometry manager, a canvas item, and several new commands to Tk.
Bonnie++: Bonnie++ is a benchmark suite that performs a number of simple tests of hard drive and file system performance.
Boost: The Boost C++ libraries are a collection of open source libraries that extend the functionality of C++. They range from general-purpose libraries like the smart_ptr library to libraries primarily aimed at other library developers and advanced C++ users, like the template metaprogramming library (MPL) and the DSL creation library (Proto).
BWA: The Burrows-Wheeler Alignment (BWA) Tool is a fast light-weight tool that aligns short sequences to a sequence database, such as the human reference genome.
Circuitscape: Circuitscape is a free, open-source program which borrows algorithms from electronic circuit theory to predict patterns of movement, gene flow, and genetic differentiation among plant and animal populations in heterogeneous landscapes.
CDAT: The Climate Data Analysis Tools (CDAT) is a software infrastructure that uses Python. The CDAT subsystems, implemented as Python modules, provide access to and management of gridded data (Climate Data Management System or CDMS); large-array numerical operations (Numerical Python); and visualization (Visualization and Control System or VCS). The image (left) is a composite of CDAT windows.
CLHEP: CLHEP (Class Library for High Energy Physics) is a C++ library that provides utility classes for general numerical programming, vector arithmetic, geometry, pseudorandom number generation, and linear algebra, specifically targeted for high energy physics simulation and analysis software.
CPMD: The Car-Parrinello Molecular Dynamics code is a parallelized plane wave/pseudopotential implementation of Density Functional Theory, particularly designed for ab-initio molecular dynamics.
CUDA: The NVIDIA CUDA Toolkit includes accelerated BLAS and FFT implementations, parallel thread execution and CUDA command line compiler. CUDA (an acronym for Compute Unified Device Architecture) is a parallel computing architecture developed by NVIDIA, the computing engine in NVIDIA graphics processing units (GPUs).
DAWN: Drawer for Academic WritiNgs (DAWN) is a renderer which reads 3D geometrical data and visualizes with a vectorized 3D PostScript processor with analytical hidden line/surface removal.
to perform high-speed molecular dynamics simulations of biological systems on parallel systems that is available as part of Schrodinger or as a stand-alone package.
DDD: Data Display Debugger (DDD) is a graphical front-end for command-line debuggers such as GDB, DBX, WDB, Ladebug, JDB, XDB, the Perl debugger, the bash debugger bashdb, the GNU Make debugger remake, or the Python debugger pydb. Besides typical features such as viewing source texts, DDD also provides interactive graphical data display, where data structures are displayed as graphs.
DOCK: DOCK simulates the problem of docking molecules to each other. In the field of molecular modeling, docking is a method which predicts the preferred orientation of one molecule to a second when bound to each other to form a stable complex.
Ecat: Comprehensive C library and utilities to handle Ecat, Interfile and Analyze datasets. Allows conversion and access to file internals.
EGSnrc: EGSnrc is a package for the Monte Carlo simulation of coupled electron-photon transport.
EM3D: EM3D is an integrated software application designed to facilitate the analysis and visualization of electron microscope (EM) tomography data by cellular and molecular biologists.
EMBOSS: The European Molecular Biology Open Software Suite (EMBOSS) is a molecular biology tool which copes with data in a variety of formats and even allows transparent retrieval of sequence data from the web.
FDS: Fire Dynamics Simulator (FDS) is a computational fluid dynamics model of fire-driven fluid flow. The software solves numerically a form of the Navier-Stokes equations appropriate for low-speed, thermally-driven flow, with an emphasis on smoke and heat transport from fires.
FFTW: "Fastest Fourier Transform in the West" (FFTW) is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions, of arbitrary input size, and of both real and complex data (as well as of even/odd data, i.e. the discrete cosine/sine transforms or DCT/DST).
FLUENT: FLUENT is a flexible general-purpose computational fluid dynamics package used for engineering simulations of all levels of complexity.
freeglut: freeglut is a completely open source alternative to the OpenGL Utility Toolkit (GLUT) library.
FreeSurfer: FreeSurfer is a set of automated tools for reconstruction of the brain’s cortical surface from structural MRI data, and overlay of functional MRI data onto the reconstructed surface. The image (left) is from the Surfstat page, a MATLAB toolbox that works with FreeSurfer.
FSL: FSL is a comprehensive library of analysis tools for FMRI, MRI and DTI brain imaging data.
G4beamline: G4Beamline is a particle tracking and simulation program based on the Geant4 toolkit that is specifically designed to easily simulate beamlines and other systems using single-particle tracking.
GAMESS: General Atomic and Molecular Electronic Structure System (GAMESS) is a program for ab initio molecular quantum chemistry. A variety of molecular properties, ranging from simple dipole moments to frequency dependent hyperpolarizabilities, may be computed.
GATE: The Geant4 Application for Emission Tomography (GATE) provides comprehensive physics modeling abilities of the general purpose codes while making it possible to intuitively configure an Emission Tomography simulation. GATE allows the accurate description of time-dependent phenomena such as source or detector movement and source decay kinetics.
Gaussian: Gaussian provides electronic structure modeling which can be applied to both stable species and compounds which are difficult or impossible to observe experimentally. Gaussian can be used for comprehensive investigations of molecules and reactions, predicting and interpreting spectra, exploring diverse chemical arenas, and complex modelling.
GCC: The GNU Compiler Collection (GCC) is a compiler system produced by the GNU Project supporting various programming languages including C and C++ with front ends for Fortran, Pascal, Objective-C, Java, Ada and others.
GD: The Graphics Draw (GD) Library is a graphics software library for dynamically manipulating images. Its native programming language is ANSI C, but it has interfaces for many other programming languages.
GDB: The GNU Debugger (GDB) is the standard debugger for the GNU software system. It is a portable debugger that runs on many Unix-like systems and works for many programming languages, including Ada, C, C++, FreeBASIC, FreePascal and Fortran.
GEANT: GEometry ANd Tracking (GEANT) is a simulation software designed to describe the passage of elementary particles through matter, using Monte Carlo methods.
GENREG: Generator fuer regulaere Graphen (GENREG) generates regular graphs for the chosen parameters and constructs them.
GEOS: GEOS (Geometry Engine - Open Source) is a C++ port of the Java Topology Suite (JTS). As such, it aims to contain the complete functionality of JTS in C++. This includes all the OpenGIS Simple Features for SQL spatial predicate functions and spatial operators, as well as specific JTS enhanced topology functions.
Git: Global information tracker (Git) is a fast, scalable, distributed revision control system with an unusually rich command set that provides both high-level operations and full access to internals.
GLUE: Grid LSC User Environment (GLUE) is a collection of utilities for running data analysis pipelines for online and offline analysis as well as accessing various grid utilities. It also provides the infrastructure for the segment database.
GMP: GNU MP (GMP) is a library for arbitrary precision arithmetic, operating on signed integers, rational numbers, and floating point numbers. It has a rich set of functions, and the functions have a regular interface. It is particularly designed for speed.
GotoBLAS: The GotoBLAS codes are a fast implementation of the Basic Linear Algebra Subroutines. The advantage is fast calculation which makes use of all instruction sets of modern processors.
Grace: Grace is a tool to make two-dimensional plots of numerical data. It combines the convenience of a graphical user interface with the power of a scripting language which enables it to do sophisticated calculations or perform automated tasks.
Graphviz: Graphviz is open source graph visualization software with several main graph layout programs, interactive graphical interfaces, auxiliary tools, libraries, and language bindings.
GROMACS: The GROningen MAchine for Chemical Simulations (GROMACS) is a molecular dynamics simulation package that is very fast and has support for different force fields. It is notable for being used for protein folding at Folding@Home. The image (right, below) is Gromacs in action.
GSL: The GNU Scientific Library (GSL) is a collection of numerical routines for scientific computing. The library provides a wide range of mathematical routines such as random number generators, special functions and least-squares fitting.
H5utils: The package h5utils is a set of utilities for visualization and conversion of scientific data in the free, portable HDF5 format along with programs to convert HDF5 datasets into the formats required by other free visualization software (e.g. plain text, Vis5d, and VTK).
Harminv: Harminv is used to solve problems of harmonic inversion: given a discrete-time, finite-length signal that consists of a sum of finitely-many sinusoids (possibly exponentially decaying) in a given bandwidth, it determines the frequencies, decay constants, amplitudes, and phases of those sinusoids. It can, in principle, provide much better accuracy than straightforwardly extracting FFT peaks.
HDF5: HDF5 is a data model, library, and file format for storing and managing data. It supports an unlimited variety of datatypes, and is designed for flexible and efficient I/O and for high volume and complex data.
Hadoop (module hod): Apache Hadoop is a framework for running applications on large clusters built of commodity hardware. The Hadoop framework transparently provides applications both reliability and data motion.
HPMPI: HP-MPI is a high performance and production quality implementation of the Message-Passing Interface (MPI) standard for HP servers and workstations.
HyperWorks: Altair HyperWorks is a computer-aided engineering simulation software platform that includes modeling, analysis, visualization and data management solutions for linear, nonlinear, structural optimization, fluid-structure interaction, and multi-body dynamics applications.
hypre: Hypre is a library of high performance preconditioners that features parallel multigrid methods for both structured and unstructured grid problems.
IDL: Interactive Data Language (IDL) is a programming language used for data analysis. IDL is vectorized, numerical, and interactive, and is commonly used for interactive processing of large amounts of data (including image processing).
IMOD: IMOD is a set of image processing, modeling and display programs used for tomographic reconstruction and for 3D reconstruction of EM serial sections and optical sections.
IMSL: The International Mathematics and Statistics Library (IMSL) is a commercial collection of software libraries of numerical analysis functionality, implemented in C, Java, C#.NET, and Fortran.
Intel Compilers: Intel compilers are optimised for Intel's hardware platforms to minimise stalls and produce code that executes in the fewest number of cycles. Intel's suite of compilers has front ends for C, C++, and Fortran.
Intel-mpi: Intel's MPI Library and runtime environment for Linux.
IPRSCAN: InterProScan (iprscan) is a tool that combines different protein signature recognition methods into one resource. InterProScan is more than a simple wrapping of sequence analysis applications since it requires performing considerable data look-ups from some databases and program outputs.
ISP: In-situ Partial Order (ISP) is a dynamic verifier for MPI Programs. ISP will help you debug your programs, and graphically show you all the possible send/receive matches, barrier synchronizations, etc.
ITKSNAP: ITK-SNAP is used to segment structures in 3D medical images, providing semi-automatic segmentation using active contour methods, as well as manual delineation and image navigation.
JAIDA: JAIDA is a Java (J) implementation of the Abstract Interfaces for Data Analysis (AIDA). JAIDA allows Java programmers to create histograms, scatter plots and tuples, perform fits, view plots and store and retrieve analysis objects from files.
JAS3: JAS3 is a follow-on from Java Analysis Studio (JAS), a general purpose data analysis tool offering histograms, XY plots, scatterplots, export of plots in a variety of formats, and an AIDA-compliant analysis system.
JasPer: JasPer is a collection of software (i.e., a library and application programs) for the coding and manipulation of images.
Java: The Java programming language is an object-orientated language similar to C and C++ in syntax, but with a simpler object model and fewer low-level facilities. It is designed for developing cross-platform applications.
LAL: The LSC Algorithm Library (LAL) is a collection of routines written in ANSI C99 for gravitational wave data analysis.
LAMMPS: Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) has potentials for soft materials and solid-state materials (metals, semiconductors) and coarse-grained or mesoscopic systems. It can be used to model atoms or as a parallel particle simulator at the atomic, meso, or continuum scale. The image (left) is atom-to-continuum coupling using LAMMPS.
LatentGOLD: Latent GOLD is a latent class and finite mixture program. Latent GOLD contains separate modules for estimating three different model structures: LC Cluster models, DFactor models, and LC Regression models.
libctl: libctl is a free Guile-based library implementing flexible control files, originally written to support the MIT Photonic Bands and Meep software.
libELF: LibELF lets you read, modify or create Executable and Linkage Format (ELF) files in an architecture-independent way. The library takes care of size and endian issues.
libFAME: libFAME is a real-time MPEG-1 and MPEG-4 rectangular and arbitrary shaped video encoding library.
libframe: libFrame is a toolkit that contains various tools useful for development in C++, ranging from a Config class to an Expression library, to a set of abstract tuple handling classes, to an event handling application framework with configurable thread pooling.
libgdiplus: Libgdiplus is the Mono library that provides a GDI+ compatible API on non-Windows operating systems.
libpng: libpng is a Portable Network Graphics (PNG) reference library. Portable Network Graphics (PNG) is a bitmapped image format with lossless data compression.
libsvm: libsvm is a library for Support Vector Machines (SVM). SVMs are a set of related supervised learning methods used for classification and regression.
libtool: GNU libtool is a generic library support script. Libtool hides the complexity of using shared libraries behind a consistent, portable interface by encapsulating both the platform-specific dependencies, and the user interface, in a single script.
ligplot: Ligplot is a program for automatically generating schematic diagrams of protein-ligand interactions for a given PDB file.
LMF: The Local Maximum Fitting (LMF) algorithm first finds local maxima within a certain time window, and then regenerates the time series data as a sum of harmonic curves. The number of harmonic curves is limited by the AIC (Akaike Information Criterion) to avoid over-fitting.
LP_SOLVE: LP_SOLVE is a linear programming code written in ANSI C, which has solved problems as large as 30,000 variables and 50,000 constraints. Lp_solve can also handle (smaller) integer and mixed-integer problems.
LSDYNA: LS-DYNA is a general-purpose multiphysics simulation software package typically used for highly nonlinear transient dynamic finite element analysis (FEA) using explicit time integration.
MADYMO: MAthematical DYnamic MOdels (MADYMO) is a multibody dynamics solver frequently used for automobile occupant safety/injury calculations.
Maq: Maq builds mapping assemblies from short reads generated by the next-generation sequencing machines. It is particularly designed for the Illumina-Solexa 1G Genetic Analyzer, and has preliminary functions to handle ABI SOLiD data.
Marmot: Marmot is a library written in C++, which has to be linked to your application in addition to the existing MPI library. It will check if your application conforms to the MPI standard and will issue warnings if there are errors or non-portable constructs.
Mash: Mash is a toolkit for multimedia using IP multicast. The Mash toolkit is an outgrowth of the MBone tools (e.g. sdr, vic, vat) developed to support streaming audio and video applications.
matio: Libmatio is an open-source library for reading/writing Matlab MAT files. This library is designed for use by programs/libraries that do not have access or do not want to rely on Matlab's libmat shared library.
MATLAB: MATLAB DCS is a numerical computing environment allowing matrix manipulation, plotting of functions and data, implementation of algorithms etc. on a cluster. The image (right) is the MATLAB desktop environment.
MEEP: MIT Electromagnetic Equation Propagation (MEEP) is a finite-difference time-domain (FDTD) simulation software package developed at MIT to model electromagnetic systems.
MEME: MEME provides tools for discovering and using protein and DNA sequence motifs, a pattern of nucleotides or amino acids that appears repeatedly in a group of related DNA or protein sequences.
Mesa: Mesa is an implementation of the OpenGL specification for rendering interactive 3D graphics, usable in a variety of environments from software emulation to GPUs.
Metaio: Metaio contains a library for parsing LIGO_LW Table files and can read XML files compressed with the gzip compression algorithm.
MGLTools: Developed by the Molecular Graphics Laboratory (MGL), MGLTools is used for visualization and analysis of molecular structures. It includes AutoDockTools (ADT), Python Molecular Viewer (PMV) and Vision, a visual-based programming environment.
Minibaum: Minibaum3 is a small C program which has been used for hypohamilton graphs and angular momentum graphs.
Molden: Molden displays molecular density from the ab initio packages GAMESS and GAUSSIAN and others. Molden reads all the required information from the GAMESS / GAUSSIAN output file. Molden is capable of displaying molecular orbitals, electron density and molecular minus atomic density.
Molekel: Molekel is a molecular visualization program that imports and exports data using OpenBabel and displays molecules with different rendering styles, generates isosurfaces, and animates.
Mono: Mono provides a .NET-compatible set of tools, including a C# compiler and a Common Language Runtime.
MOPAC: Molecular Orbital PACkage (MOPAC) is a semiempirical quantum chemistry program based on Dewar and Thiel's NDDO approximation.
MPFR: The GNU MPFR library is a C library for multiple-precision floating-point computations with correct rounding.
mpiBLAST: mpiBLAST is an implementation of the bioinformatics software NCBI BLAST, which finds regions of local similarity between sequences. Through database fragmentation, query segmentation, intelligent scheduling, and parallel I/O, it improves performance by several orders of magnitude.
MPICH: MPICH is a free and portable implementation of MPI, a standard for message-passing for distributed-memory applications used in parallel computing.
Mpiexec: Mpiexec is a replacement program for the script mpirun, which is part of the mpich package. It is used to initialize a parallel job from within a PBS batch or interactive environment.
MrBayes: MrBayes conducts Bayesian estimation of phylogeny based on the posterior probability distribution of trees, which is the probability of a tree conditioned on the observations.
MUMmer: MUMmer rapidly aligns entire genomes, including incomplete genomes and contigs from a shotgun sequencing program.
MVAPICH: MVAPICH implements MPI over InfiniBand, 10GigE/iWARP and RDMA over Ethernet.
NAMD: NAnoscale Molecular Dynamics (NAMD) is a molecular dynamics simulation package written using the Charm++ parallel programming model, often used to simulate large systems (e.g., millions of atoms).
nauty: nauty is a program for computing automorphism groups of graphs and digraphs, and can also produce a canonical labelling.
NEI Nastran: NEi Nastran is a finite element analysis (FEA) solver used to analyse linear and nonlinear stress, dynamics, and heat transfer characteristics of structures and mechanical components. The image (right, above) is a Nastran model.
netCDF: The Unidata network Common Data Form (netCDF) is an interface for scientific data access and a library that provides an implementation of the interface. The netCDF library also defines a machine-independent format for representing scientific data. Together, the interface, library, and format support the creation, access, and sharing of scientific data.
NS: Network Simulator (NS) is a discrete event simulator targeted at networking research. Ns provides substantial support for simulation of TCP, routing, and multicast protocols over wired and wireless (local and satellite) networks.
Octave: GNU Octave is a high-level language, primarily intended for numerical computations, and is highly compatible with MATLAB.
Open Babel: Open Babel is a chemical toolbox which can read, write and convert over 90 chemical file formats, and filter and search molecular files using SMARTS and other methods.
OpenFOAM: Open Field Operation and Manipulation (OpenFOAM) is primarily a C++ toolbox for the customisation and extension of numerical solvers for continuum mechanics problems, including computational fluid dynamics.
Open MPI: Open MPI is a merger of three major MPI implementations (FT-MPI, LA-MPI, and LAM/MPI) to create a complete MPI-2 implementation.
Padb: Padb works as a parallel front end to gdb allowing it to target parallel applications.
pahole: Analyzes your code and identifies unused memory holes in data structures, and suggests re-ordering to improve memory usage and speed.
Paraview: ParaView is a data analysis and visualisation application. The data exploration can be done interactively in 3D or programmatically using
ParaView's batch processing capabilities. ParaView was developed to analyze extremely large datasets.
PBSssh: PBSssh is a Bourne-Again shell executable for Portable Batch Script.
PCRE: Perl-Compatible Regular Expression (PCRE) library is a set of
functions that implement regular expression pattern matching using the same syntax and semantics as Perl 5.
Perl: Perl is a programming language designed as a general-purpose Unix scripting language to make report processing easier.
PETSc: PETSc is a suite of data structures and routines for solution of scientific applications modeled by partial differential equations.
PGI compilers: PGI compilers are a set of Fortran, C and C++ compilers for High Performance Computing Systems from Portland Group.
Phaser: Phaser consists of CCP4 and BLT. The former is used to determine macromolecular structures by X-ray crystallography and other biophysical techniques. BLT has been previously described.
POV-Ray: Persistence of Vision Raytracer (POV-Ray) is a high-quality tool for creating three-dimensional graphics. Features include radiosity, photon mapping, focal blur, and other photorealistic capabilities.
Python: Python is a general-purpose high-level programming language that aims for high levels of readability. It features a fully dynamic type system and automatic memory management. Like other dynamic languages it is often used as a scripting language.
Quilt: Quilt is a tool to manage large sets of patches by keeping track of the changes each patch makes. Patches can be applied, unapplied, refreshed, etc.
R: R is a programming language and software environment for statistical computing and graphics. It is a de facto standard for statistical computing.
Rosetta: Rosetta is a molecular modeling software package for understanding protein structures, protein design, protein docking, DNA and protein-protein interactions.
rsync: Rsync copies files either to or from a remote host, or locally on the current host. It is quick because it only copies changed files.
SABRE: Software for the Analysis of Recurrent Events (SABRE) is a program for the statistical analysis of multi-process random effect response data. These responses can take the form of binary, ordinal, count and linear recurrent events.
Schrodinger: Jaguar, produced by the Schrödinger company, is an ab initio quantum chemistry package for both gas and solution phase calculations, with strength in treating metal-containing systems.
Scilab: Scilab is a scientific software package for numerical computations which includes hundreds of mathematical functions, sophisticated data
structures (including lists, polynomials, rational functions, linear systems), an interpreter and a high level programming language.
SCons: SCons is a software construction tool (build tool, or make tool) implemented in Python, that uses Python scripts as "configuration files" for software builds.
Semtex: Semtex is a family of spectral element simulation codes. The spectral element method is a high-order finite element technique that combines the geometric flexibility of finite elements with the high accuracy of spectral methods.
smem: smem is a memory reporting tool, notable for its ability to report proportional set size (PSS), which is a more meaningful representation of the amount of memory used by libraries and applications in a virtual memory system.
SPEC CPU2006: The Standard Performance Evaluation Corporation (SPEC) maintains a standard set of relevant benchmarks for computer systems. SPEC CPU2006 measures the performance of the processor, memory architecture, and compilers.
Spinner: Spinner is an anti-idle program that displays a little "spinning" ASCII character in the top left corner of your terminal. Spinner is useful for keeping ssh links from dropping due to inactivity.
SRB: Storage Resource Broker (SRB) is a Data Grid Management System (DGMS) or simply a logical distributed file system based on a client-server architecture which presents the user with a single global logical namespace or file hierarchy.
Stata: Stata is an integrated statistical package that provides data analysis, data management, and graphics. It includes linear mixed models, multivariate methods, multinomial probit and Mata, a matrix language.
SVMperf: Support Vector Machine for Multivariate Performance Measures (SVMperf) is an implementation of the Support Vector Machine (SVM) formulation for optimizing multivariate performance measures. It also implements an alternative structural formulation of the SVM optimization problem for conventional binary classification with error rate and ordinal regression.
Szlib: Szip is a freeware portable general purpose lossless compression program.
TAU: Tuning and Analysis Utilities (TAU) is a program and performance analysis tool for high-performance parallel and distributed computing with a suite of tools for static and dynamic analysis of programs written in C, C++, FORTRAN 77/90, Python, High Performance FORTRAN, and Java.
Tcl: Tool command language (Tcl) is a scripting language commonly used for rapid prototyping, scripted applications, GUIs and testing.
Tk: Tk is a library of basic elements ("widgets") for building a graphical user interface. It is typically used with Tcl.
tkcon: tkcon is a replacement for the standard console that comes with Tk which provides many more features than the standard console.
TORQUE: Terascale Open-Source Resource and QUEue Manager (TORQUE) is a distributed resource manager with notable fault tolerance, scalability and a useful scheduling interface.
UltraScan: UltraScan is used for the analysis of ultracentrifugation data. The software features an integrated data editing and analysis environment in a graphical user interface, and popular sedimentation and equilibrium analysis methods with support for velocity and equilibrium experiments, single and multi-channel centerpieces, and absorbance and interference optics.
Underworld: Underworld is a 3D-parallel geodynamic modelling framework capable of deriving viscous / viscoplastic thermal, chemical and
thermochemical models consistent with tectonic processes, such as mantle convection and lithospheric deformation over long time scales.
Valgrind: Valgrind is an instrumentation framework for building dynamic analysis tools.
Velvet: Velvet is a set of algorithms manipulating de Bruijn graphs for genomic sequence assembly. It was designed for short read sequencing technologies, such as Solexa or 454 Sequencing. The tool takes in short read sequences, removes errors, then produces high quality unique contigs.
VisualDOC: VisualDOC is a general-purpose optimization tool that allows the user to quickly add design optimization capabilities to almost any analysis program.
VMD: Visual molecular dynamics (VMD) is a molecular modelling and visualization computer program. VMD is primarily developed as a tool for viewing and analyzing the results of molecular dynamics simulations.
WIEN2k: WIEN2k performs quantum mechanical calculations on periodic solids. It uses the full-potential (linearized) augmented plane-wave and local-orbitals basis set to solve the Kohn–Sham equations of density functional theory.
Wine: Wine Is Not an Emulator (Wine) allows Unix-like computer operating systems to execute programs written for Microsoft Windows.
XFOIL: XFOIL is an interactive program for the design and analysis of subsonic isolated airfoils.
XMDS: eXtensible Multi Dimensional Simulator (XMDS) is a numerical simulation package that converts XML files describing a set of equations into a C++ program that integrates the equations.
zlib: zlib is a general purpose data compression library with data formats specified by RFCs 1950 to 1952.
Adding Licensed Software
Additional software can be installed at a user's request, depending on software licensing and availability.
What Is A HPC System? Why Clusters? Why Linux? Why the Command Line?
High-performance computing (HPC) is the use of supercomputers and clusters to solve advanced computation problems. A supercomputer is a nebulous term for a computer that is at the frontline of current processing capacity, particularly speed of calculation. In contemporary machines this is measured in tera- and peta-FLOPS (floating point operations per second). One type of supercomputer architecture is the cluster. Simply put, a cluster is a collection of smaller computers strapped together with a high-speed local network. Applications can be parallelised across them through programming.
Clustered computing is when two or more computers serve a single resource. This improves performance and provides redundancy in case of system failure. Parallel computing refers to the submission of jobs or processes over one or more processors, splitting up the task between them. By way of analogy, consider a horse and cart as the computer system and the load as the computing tasks. If one wants to move a greater load there are essentially three options.
● Re-arrange the load so it is more efficiently arranged. This is analogous to improving the code. It can help, and help significantly, but it's ultimately limited.
● Purchase a bigger cart and a bigger horse to move the load. This is analogous to buying a bigger computer and getting better software. In computing, this rapidly yields diminishing returns.
● Distribute the load among several carts and horses, managed by a teamster. This is analogous to parallel processing in a cluster. It is the most cost-efficient and most scalable method.
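The third option can be sketched on a single machine with nothing more than the shell: split a task into pieces, run the pieces at the same time as background jobs, then combine the partial results. The task chosen here (summing the integers 1 to 1000), the four-way split and the helper name sum_range are illustrative assumptions, not part of the course material. On a real cluster the batch system and MPI distribute the work across separate machines instead.

```shell
# Illustrative only: sum the integers 1..1000 in four chunks run concurrently.
workdir=$(mktemp -d)

sum_range() {
    # Print the sum of the integers from $1 to $2 inclusive.
    seq "$1" "$2" | awk '{ s += $1 } END { print s }'
}

# Start each chunk as a background job (one "cart" each).
sum_range 1 250    > "$workdir/part1" &
sum_range 251 500  > "$workdir/part2" &
sum_range 501 750  > "$workdir/part3" &
sum_range 751 1000 > "$workdir/part4" &

# The "teamster": wait for every chunk to finish, then combine the partial sums.
wait
total=$(cat "$workdir"/part* | awk '{ s += $1 } END { print s }')
echo "total = $total"
```

The answer is the same as a single pass over the whole range; only the way the work is divided changes.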
VPAC is vendor-neutral, meaning that we use the best technology for the job. The best operating system technology for high performance, clustered systems and parallel computing is a UNIX-like operating system such as GNU/Linux. The reasons for this are manifold.
Firstly, GNU/Linux scales and does so with stability and efficiency. Secondly, critical software such as the Message Passing Interface (MPI) and nearly all scientific programs are designed to work with GNU/Linux. Thirdly, the operating system and many applications are provided as "free and open source", which means that not only are there some financial savings, we're also much better placed to improve, optimize and maintain specific programs.
Finally, there is the command line. For most users a Graphical User Interface (GUI) is how they interact with a computer system, and there are some advantages with this, not the least being a usually intuitive visual representation for actions. However, this takes up significant computer resources. In contrast, a command-line interface provides a great deal more power and is very resource efficient. Running supercomputers with a GUI is not a sound policy.
These reasons correlate with actual application in the real world. In the November 2009 list of the Top 500 Supercomputers worldwide, only about 1% did not use a "UNIX-like" operating system (a decline from 1.4% in November 2007) and nearly all used an operating system that is entirely "free and open source" (a small percentage use a combination of free and proprietary systems). It is essential, therefore, to become familiar with Linux and the command line if one wants to use High Performance Computing.
2.0 An Introduction to Linux
In this introduction to Linux part of the course we will engage in several tasks. The first will be to log into a Linux system and familiarise ourselves with the environment. We will then create some files on the local machine and copy those files to the supercomputer. We will then log on to the supercomputer, modify those files and copy them back to the local computer. Back on the local computer we'll create a directory, move the files to that directory and run some very basic search functions.
When we want you to enter a command or follow a menu path, the font will be in Courier 10 pitch with a grey background.
1. Logging On
From the Main Menu bar select Application > System Tools > Terminal. This will open a command line interface. When that is open, go to the terminal menu and select File > Open Terminal. You should now have two terminal windows open.
In the first terminal window we'll explore some of the basic commands on the local machine. In the second terminal window we'll do the same, but on the supercomputer.
To log on to a VPAC supercomputer, you will need a user account and password and a Secure Shell 2 (ssh) client. VPAC does not allow protocols such as Telnet, FTP or RSH as they insecurely send passwords in plain-text over the network.
Linux distributions almost always include SSH as part of the default installation as does Mac OS 10.x, although you may also wish to use the Fugu SSH client. For MS-Windows users, we recommend using the PuTTY SSH client.
If using Mac OS 10.x, you will probably want to add a terminal alias to your dock. From the Macintosh HD, go to the Applications folder, then Utilities from within that. Terminal is in the Utilities folder. Drag it to an empty space in the Dock, and the operating system will put an alias there. If you are using a graphic interface for Linux, like GNOME or KDE, you may wish to do the same with one of the terminal clients and panels.
If you're using MS-Windows, download PuTTY. In the 'Host Name' box, enter the server you want to connect to (e.g., tango.vpac.org) and select SSH from the 'Connection type' radio button. Verify the host key when connecting for the first time. You will also probably want to have X-forwarding enabled for any connections that require graphic forwarding. See the following image:
It's useful to enter a session name, "Tango" in the above case, and save it so you don't need to remember the details next time.
Generally, the other PuTTY settings will be fine as they are. One thing you might need if you are going to be using XWindows (to display a graphical interface from VPAC on your desktop) is to turn on XForwarding. You will also need some sort of "XWindows Server" installed on your desktop, perhaps XWin32 or Exceed3D. A possible free option is Xming, http://www.straightrunning.com/XmingNotes/
With Mac or Linux simply open the terminal client and enter your username and the machine you wish to connect to, followed by the password when connected. For example:
ssh <your username>@tango.vpac.org
Secure shell opens a connection to a remote system.
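If you want to enable graphical forwarding you can use the -X option, or -Y for trusted forwarding (which bypasses the X11 security extension controls, so use it only with hosts you trust), e.g., ssh -X <your username>@tango.vpac.org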
In our training course we are using Linux machines with the Fedora distribution installed. The ssh command above can be entered as it is written, replacing <your username> with the account you have been provided
(train01, train02 etc).
2. Exploring The Environment
The first thing we'll do is explore the environment of the command-line on both our local machine and the supercomputer. On both these systems, run the following commands.
whoami "Who Am I?"; prints the user name associated with the effective user id.
pwd "Print working directory"; prints the directory you're currently in. When a user logs in on a Linux or other UNIX-like system on the command line, they start in their home directory. The output of the above command should be: /home/<username>
Now let's run a listing for the directory on both the local computer and the supercomputer:
ls "List"; lists contents for particular directory, the current directory by default.
Linux commands often come with options expressed as:
<command> <option[s]>
Run this command on both the local computer and the supercomputer.
ls -lart "List" with long format, including file permissions ('l'), including hidden files ('a', for all), sorted in reverse order ('r') by modification time ('t'), so the most recently modified entries appear last.
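To see what each of these options contributes, it can help to try them in a scratch directory where the answers are known in advance. The sketch below is only an illustration: the file names and the timestamps set with touch are made up for the purpose.

```shell
# Scratch directory with hypothetical file names, so nothing real is touched.
demo=$(mktemp -d)
cd "$demo"
touch -t 202001010000 older.txt     # modification time set to 1 Jan 2020
touch -t 202101010000 newer.txt     # modification time set to 1 Jan 2021
touch -t 201901010000 .hidden.txt   # hidden: the name starts with a dot

ls          # plain listing: the hidden file is not shown
ls -a       # also shows ., .. and .hidden.txt
ls -t       # sorted by modification time, newest first
ls -rt      # the same sort reversed, so the newest comes last
ls -lart    # long format, all entries, sorted oldest to newest
```

Putting the most recently modified entries at the bottom is handy in a terminal, because the end of a long listing is the part left on screen.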
Linux also has very useful 'pipes' and redirect commands. To pipe one command through another use the '|' symbol.
The who command shows who is currently logged into the system. You may suspect that this will differ on the supercomputer and the local system! Run the command on the local computer, and then on the supercomputer 'pipe' the who command through less.
Run this command on the supercomputer.
who -u | less "Who" shows who is logged on and how long they've been idle ('u'); the output is piped through the less command.
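A pipe simply hands the output of one command to the next command as its input, and several pipes can be chained. The following sketch uses a made-up list of login names standing in for the output of who, so the result is predictable; the names and the variable names are assumptions for illustration only.

```shell
# Hypothetical login names, one per line, standing in for real who output.
logins=$(printf 'train03\ntrain01\ntrain02\ntrain01\n')

# sort orders the names, uniq drops adjacent duplicates, wc -l counts the lines.
unique_users=$(printf '%s\n' "$logins" | sort | uniq | wc -l | tr -d ' ')
echo "distinct users: $unique_users"

# The same idea on a live system (the number depends on who is logged in):
who | wc -l
```

Each command in the chain does one small job, which is why short pipelines like this are so common on the command line.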
Another environment feature to explore is the ps or process status command. A number of programs can be run by one or more users simultaneously, including helper programs called daemons. If no options are added, ps selects all processes with the same effective user ID (euid=EUID) as the current user and associated with the same terminal as the invoker. To see what is running, who is running it, the process ID, and how much CPU each process is using, use:
ps afux | less "ps" provides a list of current processes. The 'a' option lists the processes of all users, the 'f' option shows the job hierarchy, the 'u' option provides additional information about each process, and the 'x' option includes processes without a controlling terminal, such as daemons. This is piped through less.
To redirect output to a file use the '>' symbol. To redirect input (for example, to feed data to a command) use '<'. To append output to the end of an existing file, rather than overwrite it, use '>>'.
Run this command on the supercomputer.
w > list.txt The command 'w' acts like a combination of who, uptime and ps -a. It lists the current time, how long the system has been running, how many users are currently logged on, and the system load averages for the past 1, 5, and 15 minutes. Here that information is redirected to the file list.txt.
Run the ls command to make sure you have the file list.txt on the supercomputer, then concatenate the file and print it on the standard output (i.e., the terminal) using the cat command to ensure that the data from the w command is there.
ls
cat list.txt
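The three redirection symbols are easiest to tell apart by watching what happens to one small file. This is a sketch in a scratch directory; the file name notes.txt and its contents are made up for the illustration.

```shell
scratch=$(mktemp -d)
cd "$scratch"

echo "first line"  > notes.txt     # '>' creates the file (or overwrites it)
echo "second line" >> notes.txt    # '>>' appends to the end of the file
echo "third line"  >> notes.txt

wc -l < notes.txt                  # '<' feeds the file to wc as its input
cat notes.txt                      # print the file on the terminal

echo "replaced" > notes.txt        # '>' again: the three earlier lines are gone
cat notes.txt
```

The last step is the one to remember: '>' silently replaces whatever the file held before.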
3. Files and Editing Files
Linux expresses its files as words made up of pretty much any characters, excepting the slash (/) which is used for directory navigation. In general, however, it is a good idea to avoid filenames with punctuation marks or non-printing characters (including spaces), as these can lead to some difficulties and annoyances, especially on the command-line level. It is a convention to use underscores instead of spaces e.g.,
this_is_a_long_name.txt
Linux is case-sensitive with its filenames. This means that list.txt is a different file to LIST.TXT, or even lisT.txT. Files do not usually require a program association suffix (the C compiler is an exception, for example), although you may find this convenient. The file list can be opened by a text-editor just as easily as list.txt.
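The trouble with spaces is that the shell uses them to separate arguments. The sketch below makes this visible with a small helper, count_args, which is a hypothetical function written for this illustration and not a standard command.

```shell
# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

unquoted=$(count_args my long name.txt)       # split on spaces: three arguments
quoted=$(count_args "my long name.txt")       # quotes keep it as one argument
underscored=$(count_args my_long_name.txt)    # one argument, no quoting needed
echo "$unquoted $quoted $underscored"
```

A command given the unquoted name would go looking for three separate files, which is why underscores save a good deal of quoting.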
There are three text editors usually available on Linux systems. The first is nano, a very easy to use clone of Pico, the editor from the Pine email client, that uses control keys with the equivalent of a "shortcut bar". The hefty EMacs (Editor Macros) editor and environment is a feature-rich program that was first written in 1976 and is now up to version 22.1!
Also from 1976 is vi, the "screen orientated" text editor on which Vim (Vi IMproved) builds a series of enhancements. Vim is generally understood to be a modal editor, operating either in an insert mode (where typed text becomes part of the document) or command mode (where keystrokes are interpreted as commands that control the edit session). Vi or Vim are often installed as the default text editor.
In "UNIX culture" EMacs and Vim are considered favourites among experienced users, with nano considered the best for beginners. There are also long-running, and largely tongue-in-cheek, "editor wars" with various proponents debating the relative merits of different editors.
Today we will use the nano editor to make some changes to the file
list.txt which is on the supercomputer. To open the file for editing simply type:
nano list.txt
Use Ctrl-V or the cursor keys to navigate to the bottom of the page. Enter a blank line, then add these words of literary wisdom from Theodor Seuss Geisel.
One fish, two fish, red fish, blue fish
Nano: A Simple Text Editor
Nano is an easy to use text editor typically available on Linux systems. It is a clone of Pico, the editor from the Pine email client, and uses control keys with the equivalent of a "shortcut bar".
With nano, editing is very intuitive. Start with nano <filename> on the command prompt. One can type straight to the display, and editing is a simple function of simultaneously using Ctrl and a keystroke. The most commonly used key combinations are shown at the bottom of the screen, including cutting (^K) and pasting ("uncutting", ^U) lines of text, searching ("where is", ^W), opening ("read a file", ^R), saving files ("write out", ^O), and scrolling up and down the text (^Y, ^V). Further commands, such as search and replace (M-R, i.e., meta key, usually Esc, and 'R'), can be displayed by invoking help ("get help", ^G).
4. Transferring Files in General
To move files between the supercomputer and one's desktop you need to use SCP (secure copy protocol) or SFTP (secure file transfer protocol) over SSH. If you are using Linux or Mac, you will be able to do this with the standard command-line interface with the general procedure of:
scp source destination
This, however, doesn't quite give the full story. Either the source or the destination may include a username and address, although if one is running the command on the source or destination machine the account information does not have to be entered for that machine. Often a path to the files and directories will be required as well. However, remembering the order source then destination is good shorthand.
The following is a more elaborate version of scp:
scp source.address:/path/to/source destination.address:/path/to/destination/
If you are using MS-Windows, we recommend using WinSCP, or, if using certain applications such as MATLAB, the PuTTY Secure Copy client PSCP. WinSCP comes with an intuitive GUI that provides basic file management functionality in addition to Secure Shell and Secure Copy functions.
Linux and Mac users can also use a GUI for secure file transfers. For Linux users this is typically inbuilt with the file browser application. For Mac users you might wish to consider Fugu or Cyberduck (links below).
Transferring files With Rsync and Unison
Rsync provides a way to keep two repositories of files "in sync"; one of these repositories may be on your desktop, the other in your home directory at VPAC. The nice feature of rsync is that it is very fast after the initial backup. The reason for this is that it transfers only what has changed. There is no point copying and re-writing an entire file when only a handful of characters have changed. The following is the basic command for rsync between two Linux machines:
rsync -av -e ssh source/ <username>@destination.address:/path/to/destination/
The -av ensures that it is in archive mode (recursive, copies symlinks,
preserves permissions) and is verbose. The -e is to specify a transfer protocol, in this case ssh. Note that rsync is "trailing slash sensitive". A trailing / on a source means "copy the contents of this directory". Without a trailing slash it means "copy the directory".
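The trailing slash rule is easy to check locally before trying it over the network. The sketch below copies between scratch directories on one machine, so no ssh connection is involved; the directory and file names are made up, and the whole block is skipped if rsync is not installed.

```shell
# Local-only sketch with hypothetical directory names; skipped if rsync is absent.
if command -v rsync >/dev/null 2>&1; then
    work=$(mktemp -d)
    mkdir -p "$work/source" "$work/with_slash" "$work/without_slash"
    echo "data" > "$work/source/results.txt"

    rsync -a "$work/source/" "$work/with_slash/"      # copies the contents of source
    rsync -a "$work/source"  "$work/without_slash/"   # copies the directory itself

    # Show where results.txt ended up in each case.
    find "$work/with_slash" "$work/without_slash" -type f
fi
```

With the trailing slash the file lands directly in with_slash; without it the file sits one level deeper, inside a source directory created under without_slash.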
A tutorial on using rsync as a backup tool can be found at
http://www.mikerubel.org/computers/rsync_snapshots/
and a guide to using rsync for MS-Windows at
http://optics.ph.unimelb.edu.au/help/rsync/rsync_pc1.html
An alternative to rsync is Unison, maintained by Benjamin C. Pierce, professor in the Department of Computer and Information Science at the University of Pennsylvania. It is available for Linux, Mac and MS-Windows and can be operated through a GUI or with command-line tools. The main technical difference is that it is a two-way synchronisation tool, whereas rsync effectively offers one-way mirroring. For example, if you use rsync to mirror a file from the cluster to your desktop, then modify that file on the desktop, you risk overwriting the changes you made the next time you conduct an rsync. Unison can work out what has changed, keep different versions and even merge the changes.
The basic command line operation is achieved in a very similar syntax:
unison a.tmp ssh://remotehostname/a.tmp
The main disadvantage with Unison is that it is not as widely deployed as rsync and requires installation on both the local and remote machines, preferably of the same version.
PuTTY is available from:
http://www.chiark.greenend.org.uk/~sgtatham/putty/
WinSCP is available from:
http://winscp.net
Fugu is available from:
http://rsug.itd.umich.edu/software/fugu/
Cyberduck, an SCP client for Mac, is available from:
http://cyberduck.ch/
More information on OpenSSH and the latest version can be found at:
http://www.openssh.com/
A tutorial on using rsync as a backup tool
http://www.mikerubel.org/computers/rsync_snapshots/
Rsync for MS-Windows
http://optics.ph.unimelb.edu.au/help/rsync/rsync_pc1.html
Unison for Linux, Mac and MS-Windows
http://www.cis.upenn.edu/~bcpierce/unison/
5. Transferring Files from the Supercomputer to Local
We are going to copy the file list.txt from the supercomputer to the local machine. From the local machine enter the following command:
scp <username>@tango.vpac.org:list.txt .
Be sure to replace <username> with your username on the
supercomputer (train01, train02, train03 etc). When the transfer is complete check on the local machine that the file has transferred with ls.
Two questions might come to mind when entering the command; firstly, why aren't we running the command on the supercomputer (a "put", rather than a "get") and secondly, what is the "." for?
To answer the first question, it must be remembered that in order for a copy of files to occur, both machines have to know where the other one is, which means resolving hostnames to Internet Protocol (IP) addresses. In most cases, local machines use private IP addresses, not public addresses. For example, if you were on the supercomputer and wanted to copy the file to a local machine you might think the following could work, as it follows the suggested format of source and destination and uses the correct command:
scp list.txt <username>@192.168.1.100:
The problem is, which of the multitudes of computers out there with a private address of 192.168.1.100 do you want to copy list.txt to? How would the supercomputer know which machines have this private address? Even if it could find out, it would take a very long time to connect to all the switches in the world to find out!
Explaining the '.' is a little easier. It simply refers to the current directory. Thus in the example, the source is tango, the destination is the directory the command is being run in. One could use the command cd .
which would mean 'change directory to the directory you are currently in', which is pretty pointless. More useful however is cd .. which means change directory to the parent of the current directory.
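Both shorthands can be tried safely in a scratch directory. In this sketch the directory name seuss simply mirrors the one used later in the exercise, and the variable names are only for illustration.

```shell
base=$(mktemp -d)
mkdir "$base/seuss"
cd "$base/seuss"

before=$(pwd)
cd .                  # "change to where I already am": nothing happens
after_dot=$(pwd)

cd ..                 # move up to the parent directory
after_dotdot=$(pwd)

echo "$before"
echo "$after_dot"
echo "$after_dotdot"
```

The first two lines printed are identical, and the third is the same path with the final seuss component removed.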
6. Editing list.txt, Creating Directories, Moving Files
We now have list.txt on the local computer. Let's add some new material to it before sending it back to the supercomputer. On the local computer enter:
nano list.txt
Navigate to the bottom of the file using Ctrl-V or the cursor keys and enter the following lines:
This one has a little star.
This one has a little car.
Say! What a lot of fish there are.
Then write out the file (Ctrl-O) and exit (Ctrl-X).
Our next step will be to create a directory to put this file in and then move the file into that directory. Then navigate to the directory and make sure that the file is there. On the local computer enter:
mkdir seuss
mv list.txt seuss/
cd seuss
ls
pwd
The output should be the list.txt file from the ls command and
/home/<user>/seuss from the pwd command.
7. Copying Directories, File Differences
The next step will be to copy the directory and its contents from the local computer to the supercomputer. This uses the scp command again, but this time with the -r (recursive) option, which will copy the directory and all subdirectories and files within it. On the local computer enter the following commands:
cd ~
scp -r seuss/ <username>@tango.vpac.org:
Now on the supercomputer do a directory listing, but specify the file you want and use the long-format option, which shows the modification time. You should see a list.txt in your home directory (the original one) and a seuss directory. Move into the seuss directory and run a directory listing again with the same option. There should be another list.txt file, the one you just copied, but you will notice it has a different timestamp.
ls -l list.txt
cd seuss
ls -l list.txt
Sometimes you may wish to compare the content of files as well as when they were modified. To do this use the diff command. This compares files line-by-line and prints the differences to the screen. As usual there are a number of options which can be ascertained from the command man diff, but for now we'll just use the basic command. The command uses angle brackets to indicate which file each differing line comes from: '<' for the first file and '>' for the second. To illustrate this, let's add some lines to the first list.txt file and then run the diff comparison.
cd ~
nano list.txt
Add the following lines.
Yes, some are red. And some are blue.
Some are old. Some are new.
Write out (Ctrl-O) and exit (Ctrl-X) and run the diff command.
diff list.txt seuss/list.txt
The output should be something like the following:
< Yes, some are red. And some are blue.
< Some are old. Some are new.
---
> This one has a little star.
> This one has a little car.
> Say! What a lot of fish there are.
For a side-by-side representation use the command sdiff instead.
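A tiny pair of files shows how to read diff output. This is a sketch with made-up file names and contents, run in a scratch directory.

```shell
work=$(mktemp -d)
cd "$work"
printf 'One fish, two fish\nSome are old.\n'       > first.txt
printf 'One fish, two fish\nThis one has a car.\n' > second.txt

# diff exits with status 1 when the files differ; that is not an error here.
diff_output=$(diff first.txt second.txt || true)
echo "$diff_output"
```

The output begins with 2c2, meaning line 2 of the first file was changed into line 2 of the second. The line marked '<' comes from first.txt, the line marked '>' from second.txt, and a row of three dashes separates the two.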
8. Searching for and within files and Wildcards
Often you will want to search for files or search within files for a particular phrase. The find command will find files in the directory and subdirectories offered, by name, modification time, size etc., and with filter operations, all of which are described in man find. To find all files with the suffix .txt on your supercomputer account use the following command:
cd ~
find . -name '*.txt'
Note that the filter is within quotes, to ensure that the shell does not expand the wildcard before find receives it.
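The effect of the quotes can be seen without running an incorrect find at all, by using echo to show what the shell would hand over. The following is a sketch in a scratch directory with made-up file names.

```shell
top=$(mktemp -d)
mkdir "$top/seuss"
touch "$top/list.txt" "$top/other.txt" "$top/seuss/deep.txt"
cd "$top"

# Quoted: find receives the pattern *.txt itself and applies it in every directory.
found=$(find . -name '*.txt' | sort)
echo "$found"

# Unquoted: the shell expands the wildcard first, using only the current directory.
# echo shows the command line that find would actually be given.
echo find . -name *.txt
```

The quoted search reports all three files, including the one in the subdirectory. The unquoted version turns into find . -name list.txt other.txt, which is no longer the search that was intended.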
To search within a collection of files use the grep command. The name was originally an abbreviation of "global search for a regular expression and print to standard output". The command searches the named input files for lines containing a match to the given pattern, including regular expressions, and prints the matching lines. As usual there are a variety of options available through man grep. The following command will search for the pattern 'red', ignoring case, within the directory seuss. Enter the following on the supercomputer:
cd ~
grep -i red seuss/*
When multiple files are searched, grep will also display the filename with each result. Compressed or gzipped files can be searched with zgrep.
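The difference the -i option makes is clearest with two small files where the pattern appears in different cases. This sketch uses a scratch directory; the second file name, more.txt, and the file contents are made up for the illustration.

```shell
work=$(mktemp -d)
mkdir "$work/seuss"
printf 'One fish, two fish\nRed fish, blue fish\n' > "$work/seuss/list.txt"
printf 'Yes, some are red.\nSome are old.\n'       > "$work/seuss/more.txt"
cd "$work"

grep red seuss/*       # case-sensitive: only the lowercase "red" in more.txt
grep -i red seuss/*    # ignoring case: also matches "Red" in list.txt
```

Because more than one file is searched, each matching line is prefixed with the name of the file it came from.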
The wildcard you see most often is * (asterisk), but we'll start with something simpler: ? (question mark). When it appears in a filename, the ? matches any single character. For example, letter? refers to any filename that begins with letter and has one character after that. This would include letterA, letter1, as well as filenames with a non-printing character as their last letter, like letter^C. The * wildcard matches any character or group of zero or more characters. For example, *.txt matches all files whose names end with .txt, *.c matches all files whose names end with .c (by convention, source code for programs in the C language), and so on.
Wildcard  Matches
?         Any single character
*         Any group of zero or more characters
[ab]      Either a or b
[a-z]     Any character between a and z, inclusive
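Each row of the table can be tried with echo, which prints whatever the shell expands a pattern to without acting on the files. This is a sketch in a scratch directory with made-up file names. Setting LC_ALL=C is an assumption made so that [a-z] means exactly the lowercase ASCII letters; in some other locale settings the range can also match uppercase letters.

```shell
LC_ALL=C; export LC_ALL    # make [a-z] mean exactly the lowercase ASCII letters
work=$(mktemp -d)
cd "$work"
touch letterA letter1 letter12 a.txt b.txt c.txt notes.c

echo letter?       # exactly one character after "letter"
echo letter*       # zero or more characters after "letter"
echo [ab].txt      # a.txt or b.txt
echo [a-z].txt     # any single lowercase letter, then .txt
echo *.c           # everything ending in .c
```

Note that letter? leaves out letter12, which has two characters after letter, while letter* includes it.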
9. Delete files and directories
Sometimes you'll want to remove files and directories from your account. Be very careful and very selective with this because when you're operating on the command line there's no "trashcan" to easily undelete files. Somewhere, delete really means what it says, and that somewhere is here.
On the supercomputer we'll carefully delete the file in the home directory, then change directory to seuss and delete the file there. Finally we'll change out of that directory and delete the directory itself.
cd ~
rm list.txt
cd seuss
rm list.txt
cd ~
rmdir seuss
Then on the local computer we'll use a shortcut: a command which deletes the entire directory, all subdirectories and all files within the directory tree. This is remove with the recursive and force options.
cd ~
rm -rf seuss
Be very careful with rm, especially with the -rf option and especially with wildcards. Consider what would happen to someone who wishes to delete all the backup files in a directory, which have the helpful suffix .BAK. Combining a wildcard with the suffix, they intend to type rm *.BAK, but instead they mistype the command as rm * .BAK. The result of this typing error is that they have just deleted everything in that directory. Worse still, imagine a user running as root who thinks they are about to delete a directory and instead types rm -rf / ; a command that will delete everything or, more commonly, rm -rf ./ ; a command which deletes the current directory and all its contents.
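One cautious habit is to preview what a wildcard matches with ls before handing the same pattern to rm. The sketch below does this in a scratch directory. The filenames are made up for illustration.

```shell
# Scratch directory with one file to keep and two backups to remove.
demo=$(mktemp -d)
touch "$demo/data.csv" "$demo/old1.BAK" "$demo/old2.BAK"

# Preview: list exactly what the pattern matches before deleting anything.
preview=$(cd "$demo" && ls *.BAK)
echo "$preview"

# Only once the preview looks right, run rm with the same pattern.
(cd "$demo" && rm *.BAK)

# Check what is left behind.
remaining=$(ls "$demo")
echo "$remaining"

# Clean up the scratch directory.
rm -r "$demo"
```

Had the pattern been mistyped as * .BAK, the ls preview would have listed every file in the directory, making the mistake visible before anything was lost. The -i option to rm, which asks for confirmation before each removal, offers a similar safeguard.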
3.0 Environment Modules and the Portable Batch System
Environment Modules
Environment modules (not to be confused with kernel modules, another topic for another day) provide for the dynamic modification of the user's environment via module files. Each module contains the configuration information the user's session needs to operate according to the modules loaded, such as the location of the application, its manual path, LD_LIBRARY_PATH and so forth. This is a lot easier than having to set these every time an application is used!
Modulefiles also have the advantages of being shared by many users on a system (such as an HPC system) and of easily allowing multiple installations of the same application with different versions and compilation options.
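To make concrete what a modulefile saves you from, the sketch below sets the same kind of variables by hand for an imaginary application. The install prefix /usr/local/myapp/1.0 is hypothetical. This only approximates the effect of loading a module and is not how the module command itself is implemented.

```shell
# Hypothetical install location; a real modulefile would record this for you.
prefix=/usr/local/myapp/1.0

# Prepend the application's directories so they are searched first.
# The ${VAR:+...} form adds a separating colon only if the variable is already set.
export PATH="$prefix/bin${PATH:+:$PATH}"
export LD_LIBRARY_PATH="$prefix/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Show the first entry of each search path.
first_path=${PATH%%:*}
first_lib=${LD_LIBRARY_PATH%%:*}
echo "$first_path"
echo "$first_lib"
```

Doing this by hand for every application and version, and undoing it again when switching versions, is exactly the bookkeeping that modules take over.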
Module commands
Some basic module commands include the following:
1. Listing Available Modules
module avail
This option lists all the modules which are available to be loaded. Notice that many of them have version numbers associated with them. Modules makes it easy to switch between compiler or application versions. The module name without a version number is the production default.
2. Module Specific Help
module help [modulefile]
If a module looks interesting, use this command to display the 'help' information contained within the given module file. Note that this only works if the module file has help text associated with it. For example, module help namd will provide no additional information.